Message ID | 87v9qix001.fsf@gnu.org |
---|---|
Headers | show |
Ludovic Courtès <ludo@gnu.org> writes: > Hi Brett, > > Brett Gilio <brettg@posteo.net> skribis: > >> MLton is a little bit tricker to package properly as it runs in to the issue >> of requiring a non-source, pre-compiled bootstrap. There is a way to get it to >> build against SMLnj though (which will be my next WIP bug report). This patch >> series uses a series of patched ELFs to compile MLton from source. But obviously >> this will not be permissable upstream. > > “Not permissible” is strong (in the sense that Guix doesn’t have such a > strong policy because there are cases where there’s no known way to > bootstrap from source), but clearly, if there’s a documented way to > build MLton from source, this is what we should be aiming for. > > Thanks! > > Ludo’. It's a little off of the subject at hand, but to my knowledge (as provided by this issue: https://github.com/MLton/mlton/issues/350) I am going to cherry pick a few things of note. "In general, it has been the position of MLton developers that "build from source" is an unrealistic goal for a self-hosting compiler. While it is possible (albeit, very, very slow) to build MLton using another SML compiler, we are not aware of any other SML compiler that meets the spirit of "build from source" (although some might meet the law). In particular, Poly/ML simply includes pre-built compilers in their source repository (see https://github.com/polyml/polyml/blob/master/Makefile.am#L22 and https://github.com/polyml/polyml/tree/master/imports); admittedly these are (space inefficient) text files rather than binary blobs, but they are hardly artifacts meant for human consumption. Similarly, SML/NJ "builds from source" by downloading pre-built binary bootstrap images. We find the "solution" of MLton building by downloading a pre-built MLton to be satisfactory and both more performant and more robust." As Matthew Fluet notes here, even the PolyML repository has some blobs, though they are not ELF blobs. SMLnj repeats this same behavior. I follow this up with: "Taking from the spirit of what Aaron said in the Debian issue (back in 2003), "I agree. A larger circular dependency is worse than depending on oneself. The only way I could see depending on another SML compiler is if that compiler was written in another language such that no dependency cycle would exist." Perhaps a longer-term goal of mine would be to help accomplish this such that the dependency cycle is indeed broken. I know of projects like https://github.com/thepowersgang/mrustc doing this for the Rust toolchain, and https://www.gnu.org/software/mes/ for bootstrapping a seedless GCC." I wonder if the best long-term solution for melding Guix and the ML language community is to try and write an ML compiler in a language like C or Scheme based on a reduced-sized specification of the language. The reason I say all this is because I would really love to see the philosophies of ML/Formal Methods/Proof and the deterministic/functional philosophy of Guix work together. They seem naturally synergistic, but there seems be a difference in history here that is making this quite antagonistic. I'd really like to spark some insight and discussion here because it would be amazing if the formal methods community (like Coq, seL4, HOL/Isabelle, etc.) could really begin to benefit from the Guix model. Sorry for the extra long message. Thanks!
Hi Brett, Brett Gilio <brettg@posteo.net> skribis: > It's a little off of the subject at hand, but to my knowledge (as > provided by this issue: https://github.com/MLton/mlton/issues/350) I am > going to cherry pick a few things of note. Thanks for discussing it with upstream! Your reply summarizes current “best practices” pretty well: for Rust, for Guile (which contains an interpreter in C), for Common Lisp (GNU clisp is written in C), for C (MesCC), etc. > I wonder if the best long-term solution for melding Guix and the ML > language community is to try and write an ML compiler in a language like > C or Scheme based on a reduced-sized specification of the language. I vaguely recall reading about an SML implementation in Scheme. All I could find is a mention of it in <https://www.macs.hw.ac.uk/ultra/skalpel/html/sml.html>. > The reason I say all this is because I would really love to see the > philosophies of ML/Formal Methods/Proof and the deterministic/functional > philosophy of Guix work together. They seem naturally synergistic, but > there seems be a difference in history here that is making this quite > antagonistic. > > I'd really like to spark some insight and discussion here because it > would be amazing if the formal methods community (like Coq, seL4, > HOL/Isabelle, etc.) could really begin to benefit from the Guix model. I agree that the two should be synergistic. Eventually, all this comes down to how to design a programming language or its implementation in a way that allows is to be built incrementally from “nothing”, or from a different language (something janneke and I discussed at the R-B summit). This sounds very much like programming language research question. Thanks, Ludo’.
Ludovic Courtès <ludo@gnu.org> writes: > Hi Simon, > > zimoun <zimon.toutoune@gmail.com> skribis: > >> To my knowledge, the most advanced ML-family language able to >> bootstrap (and verified with prover etc.) is CakeML (subset of >> Standard ML). >> >> (And not packaged in Guix, AFAICT.) >> >> https://cakeml.org/ > > Right, thanks for the pointer! CakeML is truly impressive. It’s also > nice to see how the tiny diagram to the right of the web page clearly > and succinctly describes its compiler/language tower. > > Ludo’. I may be misspeaking, but I think CakeML is formally verified in HOL which bootstraps against PolyML or SMLnj, and also requires MLton. So the issue of cyclic binary-derived bootstrapping remains an issue. This is where I think Amin Bandali (CC) and I's idea of writing a C-based boostrapping compiler and dedicating it to the GNU project would be really valuable. Ultimately I think we need a vehicle for ML to be fledged out in Guix so that we can have a consistent set of utilities for the formal methods community to work in with Guix (see my formal methods working group proposal email). Maybe at some later date we could even fledge out that bootstrapping compiler to be its own distinct implementation of ML, but i'll save that for a later date. Regardless, the issue of binaries-on-binaries in ML seems to be reaching back to 1993! It's about time we ended that. Stay tuned for more information on what I think we could do about that.
Hi, zimoun <zimon.toutoune@gmail.com> skribis: > On Tue, 17 Dec 2019 at 03:21, Brett Gilio <brettg@posteo.net> wrote: > >> I may be misspeaking, but I think CakeML is formally verified in HOL >> which bootstraps against PolyML or SMLnj, and also requires MLton. So >> the issue of cyclic binary-derived bootstrapping remains an issue. This > > I have not checked myself and if I understand well your point: CakeML > requires one PolyML binary to bootstrap (see Bootstrapping locally in > [1] which points to [2]). And this PolyML binary is not small and > requires other not-so-small binaries to be produced. And I am not > talking about the HOL part. So their claim that CakeML bootstraps > really depends on how is defined "bootstrap". :-) Their claim is summarized on the home page (emphasis mine): The CakeML compiler has been bootstrapped inside HOL. By bootstrapped we mean that the compiler has compiled itself. […] The result is a verified binary that _provably_ implements the compiler itself IOW, their approach is to provide a formal proof that a given byte stream corresponds to the given source code. This approach is very different from everything we’ve been doing so far. In essence, we’ve been focusing on building everything from source, with the assumption that source code is auditable. Instead of doing that, they formally provide the source/binary correspondence, which is much stronger. Once you have this proof, it doesn’t matter how HOL or PolyML or any other dependency is built, AIUI. In theory, HOL could be the target of a trusting-trust attack. But then again, one could very well check it with paper-and-pen or with HOL plus manual verification. Food for thought! Ludo’.
Ludovic Courtès <ludo@gnu.org> writes: > Hi, > > zimoun <zimon.toutoune@gmail.com> skribis: > >> On Tue, 17 Dec 2019 at 03:21, Brett Gilio <brettg@posteo.net> wrote: >> >>> I may be misspeaking, but I think CakeML is formally verified in HOL >>> which bootstraps against PolyML or SMLnj, and also requires MLton. So >>> the issue of cyclic binary-derived bootstrapping remains an issue. This >> >> I have not checked myself and if I understand well your point: CakeML >> requires one PolyML binary to bootstrap (see Bootstrapping locally in >> [1] which points to [2]). And this PolyML binary is not small and >> requires other not-so-small binaries to be produced. And I am not >> talking about the HOL part. So their claim that CakeML bootstraps >> really depends on how is defined "bootstrap". :-) > > Their claim is summarized on the home page (emphasis mine): > > The CakeML compiler has been bootstrapped inside HOL. By bootstrapped > we mean that the compiler has compiled itself. […] The result is a > verified binary that _provably_ implements the compiler itself > > IOW, their approach is to provide a formal proof that a given byte > stream corresponds to the given source code. > > This approach is very different from everything we’ve been doing so > far. In essence, we’ve been focusing on building everything from > source, with the assumption that source code is auditable. > > Instead of doing that, they formally provide the source/binary > correspondence, which is much stronger. Once you have this proof, it > doesn’t matter how HOL or PolyML or any other dependency is built, AIUI. > > In theory, HOL could be the target of a trusting-trust attack. But then > again, one could very well check it with paper-and-pen or with HOL plus > manual verification. > > Food for thought! > > Ludo’. > > > > I kind of wonder if their method of having a HOL-implemented proof for their source/binary bytestream is a possibility for GNU Guix in the future? It makes for an interesting speculation, especially for system-level proofs of blob trust. Bandli?