mbox

[bug#38605,WIP,MLton,0/1] Add MLton

Message ID 87v9qix001.fsf@gnu.org
Headers show

Message

Ludovic Courtès Dec. 14, 2019, 5:58 p.m. UTC
Hi Brett,

Brett Gilio <brettg@posteo.net> skribis:

> MLton is a little bit tricker to package properly as it runs in to the issue
> of requiring a non-source, pre-compiled bootstrap. There is a way to get it to
> build against SMLnj though (which will be my next WIP bug report). This patch
> series uses a series of patched ELFs to compile MLton from source. But obviously
> this will not be permissable upstream.

“Not permissible” is strong (in the sense that Guix doesn’t have such a
strong policy because there are cases where there’s no known way to
bootstrap from source), but clearly, if there’s a documented way to
build MLton from source, this is what we should be aiming for.

Thanks!

Ludo’.

Comments

Brett Gilio Dec. 15, 2019, 10:32 p.m. UTC | #1
Ludovic Courtès <ludo@gnu.org> writes:

> Hi Brett,
>
> Brett Gilio <brettg@posteo.net> skribis:
>
>> MLton is a little bit tricker to package properly as it runs in to the issue
>> of requiring a non-source, pre-compiled bootstrap. There is a way to get it to
>> build against SMLnj though (which will be my next WIP bug report). This patch
>> series uses a series of patched ELFs to compile MLton from source. But obviously
>> this will not be permissable upstream.
>
> “Not permissible” is strong (in the sense that Guix doesn’t have such a
> strong policy because there are cases where there’s no known way to
> bootstrap from source), but clearly, if there’s a documented way to
> build MLton from source, this is what we should be aiming for.
>
> Thanks!
>
> Ludo’.

It's a little off of the subject at hand, but to my knowledge (as
provided by this issue: https://github.com/MLton/mlton/issues/350) I am
going to cherry pick a few things of note.

"In general, it has been the position of MLton developers that "build
from source" is an unrealistic goal for a self-hosting compiler. While
it is possible (albeit, very, very slow) to build MLton using another
SML compiler, we are not aware of any other SML compiler that meets the
spirit of "build from source" (although some might meet the law). In
particular, Poly/ML simply includes pre-built compilers in their source
repository (see
https://github.com/polyml/polyml/blob/master/Makefile.am#L22 and
https://github.com/polyml/polyml/tree/master/imports); admittedly these
are (space inefficient) text files rather than binary blobs, but they
are hardly artifacts meant for human consumption. Similarly, SML/NJ
"builds from source" by downloading pre-built binary bootstrap
images. We find the "solution" of MLton building by downloading a
pre-built MLton to be satisfactory and both more performant and more
robust."

As Matthew Fluet notes here, even the PolyML repository has some blobs,
though they are not ELF blobs. SMLnj repeats this same behavior.

I follow this up with:

"Taking from the spirit of what Aaron said in the Debian issue (back in
2003), "I agree. A larger circular dependency is worse than depending on
oneself. The only way I could see depending on another SML compiler is
if that compiler was written in another language such that no dependency
cycle would exist." Perhaps a longer-term goal of mine would be to help
accomplish this such that the dependency cycle is indeed broken. I know
of projects like https://github.com/thepowersgang/mrustc doing this for
the Rust toolchain, and https://www.gnu.org/software/mes/ for
bootstrapping a seedless GCC."

I wonder if the best long-term solution for melding Guix and the ML
language community is to try and write an ML compiler in a language like
C or Scheme based on a reduced-sized specification of the language.

The reason I say all this is because I would really love to see the
philosophies of ML/Formal Methods/Proof and the deterministic/functional
philosophy of Guix work together. They seem naturally synergistic, but
there seems be a difference in history here that is making this quite
antagonistic.

I'd really like to spark some insight and discussion here because it
would be amazing if the formal methods community (like Coq, seL4,
HOL/Isabelle, etc.) could really begin to benefit from the Guix model.

Sorry for the extra long message.

Thanks!
Ludovic Courtès Dec. 16, 2019, 10:02 a.m. UTC | #2
Hi Brett,

Brett Gilio <brettg@posteo.net> skribis:

> It's a little off of the subject at hand, but to my knowledge (as
> provided by this issue: https://github.com/MLton/mlton/issues/350) I am
> going to cherry pick a few things of note.

Thanks for discussing it with upstream!  Your reply summarizes current
“best practices” pretty well: for Rust, for Guile (which contains an
interpreter in C), for Common Lisp (GNU clisp is written in C), for C
(MesCC), etc.

> I wonder if the best long-term solution for melding Guix and the ML
> language community is to try and write an ML compiler in a language like
> C or Scheme based on a reduced-sized specification of the language.

I vaguely recall reading about an SML implementation in Scheme.  All I
could find is a mention of it in
<https://www.macs.hw.ac.uk/ultra/skalpel/html/sml.html>.

> The reason I say all this is because I would really love to see the
> philosophies of ML/Formal Methods/Proof and the deterministic/functional
> philosophy of Guix work together. They seem naturally synergistic, but
> there seems be a difference in history here that is making this quite
> antagonistic.
>
> I'd really like to spark some insight and discussion here because it
> would be amazing if the formal methods community (like Coq, seL4,
> HOL/Isabelle, etc.) could really begin to benefit from the Guix model.

I agree that the two should be synergistic.  Eventually, all this comes
down to how to design a programming language or its implementation in a
way that allows is to be built incrementally from “nothing”, or from a
different language (something janneke and I discussed at the R-B
summit).  This sounds very much like programming language research
question.

Thanks,
Ludo’.
Brett Gilio Dec. 17, 2019, 2:20 a.m. UTC | #3
Ludovic Courtès <ludo@gnu.org> writes:

> Hi Simon,
>
> zimoun <zimon.toutoune@gmail.com> skribis:
>
>> To my knowledge, the most advanced ML-family language able to
>> bootstrap (and verified with prover etc.) is CakeML (subset of
>> Standard ML).
>>
>> (And not packaged in Guix, AFAICT.)
>>
>> https://cakeml.org/
>
> Right, thanks for the pointer!  CakeML is truly impressive.  It’s also
> nice to see how the tiny diagram to the right of the web page clearly
> and succinctly describes its compiler/language tower.
>
> Ludo’.

I may be misspeaking, but I think CakeML is formally verified in HOL
which bootstraps against PolyML or SMLnj, and also requires MLton. So
the issue of cyclic binary-derived bootstrapping remains an issue. This
is where I think Amin Bandali (CC) and I's idea of writing a C-based
boostrapping compiler and dedicating it to the GNU project would be
really valuable.

Ultimately I think we need a vehicle for ML to be fledged out in Guix so
that we can have a consistent set of utilities for the formal methods
community to work in with Guix (see my formal methods working group
proposal email).

Maybe at some later date we could even fledge out that bootstrapping
compiler to be its own distinct implementation of ML, but i'll save that
for a later date. Regardless, the issue of binaries-on-binaries in ML
seems to be reaching back to 1993! It's about time we ended that. Stay
tuned for more information on what I think we could do about that.
Ludovic Courtès Dec. 18, 2019, 2:48 p.m. UTC | #4
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> On Tue, 17 Dec 2019 at 03:21, Brett Gilio <brettg@posteo.net> wrote:
>
>> I may be misspeaking, but I think CakeML is formally verified in HOL
>> which bootstraps against PolyML or SMLnj, and also requires MLton. So
>> the issue of cyclic binary-derived bootstrapping remains an issue. This
>
> I have not checked myself and if I understand well your point: CakeML
> requires one PolyML binary to bootstrap (see Bootstrapping locally in
> [1] which points to [2]). And this PolyML binary is not small and
> requires other not-so-small binaries to be produced. And I am not
> talking about the HOL part. So their claim that CakeML bootstraps
> really depends on how is defined "bootstrap". :-)

Their claim is summarized on the home page (emphasis mine):

  The CakeML compiler has been bootstrapped inside HOL.  By bootstrapped
  we mean that the compiler has compiled itself. […] The result is a
  verified binary that _provably_ implements the compiler itself

IOW, their approach is to provide a formal proof that a given byte
stream corresponds to the given source code.

This approach is very different from everything we’ve been doing so
far.  In essence, we’ve been focusing on building everything from
source, with the assumption that source code is auditable.

Instead of doing that, they formally provide the source/binary
correspondence, which is much stronger.  Once you have this proof, it
doesn’t matter how HOL or PolyML or any other dependency is built, AIUI.

In theory, HOL could be the target of a trusting-trust attack.  But then
again, one could very well check it with paper-and-pen or with HOL plus
manual verification.

Food for thought!

Ludo’.
Brett Gilio Dec. 21, 2019, 4:56 a.m. UTC | #5
Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> zimoun <zimon.toutoune@gmail.com> skribis:
>
>> On Tue, 17 Dec 2019 at 03:21, Brett Gilio <brettg@posteo.net> wrote:
>>
>>> I may be misspeaking, but I think CakeML is formally verified in HOL
>>> which bootstraps against PolyML or SMLnj, and also requires MLton. So
>>> the issue of cyclic binary-derived bootstrapping remains an issue. This
>>
>> I have not checked myself and if I understand well your point: CakeML
>> requires one PolyML binary to bootstrap (see Bootstrapping locally in
>> [1] which points to [2]). And this PolyML binary is not small and
>> requires other not-so-small binaries to be produced. And I am not
>> talking about the HOL part. So their claim that CakeML bootstraps
>> really depends on how is defined "bootstrap". :-)
>
> Their claim is summarized on the home page (emphasis mine):
>
>   The CakeML compiler has been bootstrapped inside HOL.  By bootstrapped
>   we mean that the compiler has compiled itself. […] The result is a
>   verified binary that _provably_ implements the compiler itself
>
> IOW, their approach is to provide a formal proof that a given byte
> stream corresponds to the given source code.
>
> This approach is very different from everything we’ve been doing so
> far.  In essence, we’ve been focusing on building everything from
> source, with the assumption that source code is auditable.
>
> Instead of doing that, they formally provide the source/binary
> correspondence, which is much stronger.  Once you have this proof, it
> doesn’t matter how HOL or PolyML or any other dependency is built, AIUI.
>
> In theory, HOL could be the target of a trusting-trust attack.  But then
> again, one could very well check it with paper-and-pen or with HOL plus
> manual verification.
>
> Food for thought!
>
> Ludo’.
>
>
>
>

I kind of wonder if their method of having a HOL-implemented proof for
their source/binary bytestream is a possibility for GNU Guix in the
future? It makes for an interesting speculation, especially for
system-level proofs of blob trust.

Bandli?