diff mbox series

[bug#51838,v6,03/41] guix: node-build-system: Add JSON utilities.

Message ID 20211230073919.30327-4-philip@philipmcgrath.com
State Accepted
Headers show
Series guix: node-build-system: Support compiling add-ons with node-gyp. | expand

Checks

Context Check Description
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch success View Laminar job
cbaines/issue success View issue
cbaines/applying patch success View Laminar job
cbaines/issue success View issue
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch success View Laminar job
cbaines/issue success View issue
cbaines/applying patch success View Laminar job
cbaines/issue success View issue
cbaines/applying patch success View Laminar job
cbaines/issue success View issue
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch success View Laminar job
cbaines/issue success View issue
cbaines/applying patch success View Laminar job
cbaines/issue success View issue

Commit Message

Philip McGrath Dec. 30, 2021, 7:38 a.m. UTC
This commit adds several utility functions for non-destructive
transformation of the JSON representation used by (guix build json),
particularly for purely functional update of JSON objects.  They should
eventually be exported, but most are private for now to allow for more
experience and consideration before commiting to the API.  The design
was largely inspired by the 'racket/dict' and 'racket/hash' libraries.
Liliana Marie Prikler proposed 'with-atomic-json-file-replacement'.

* guix/build/node-build-system.scm (jsobject-ref):
(alist-pop):
(alist-delete*):
(jsobject-delete):
(alist-set):
(jsobject-set):
(jsobject-set*):
(alist-update):
(jsobject-update):
(jsobject-update*):
(jsobject-union): New private procedures.
(with-atomic-json-file-replacement): New exported procedure.
(module-name): Use them.
(build): Use them.
(patch-dependencies): Use them.  Stop using 'assoc-set!' unsafely.

Co-authored-by: Liliana Marie Prikler <liliana.prikler@gmail.com>
---
 guix/build/node-build-system.scm | 275 ++++++++++++++++++++++++++++---
 1 file changed, 251 insertions(+), 24 deletions(-)

Comments

Liliana Marie Prikler Dec. 30, 2021, 4:56 p.m. UTC | #1
Am Donnerstag, dem 30.12.2021 um 02:38 -0500 schrieb Philip McGrath:
> This commit adds several utility functions for non-destructive
> transformation of the JSON representation used by (guix build json),
> particularly for purely functional update of JSON objects.  They
> should
> eventually be exported, but most are private for now to allow for
> more
> experience and consideration before commiting to the API.  The design
> was largely inspired by the 'racket/dict' and 'racket/hash'
> libraries.
> Liliana Marie Prikler proposed 'with-atomic-json-file-replacement'.
Given that this is a fair amount of procedures that you're proposing, I
think a new file would be appropriate.  Perhaps (guix build json-
utils)?  Adding that should IIUC not cause a world rebuild, so we could
do that on master.
Liliana Marie Prikler Dec. 30, 2021, 6:18 p.m. UTC | #2
Having argued for these procedures to be moved into their own file in a
separate mail, now it's time to bikeshed stylistic choices.

Am Donnerstag, dem 30.12.2021 um 02:38 -0500 schrieb Philip McGrath:
> +(define (jsobject-ref js key failure-result)
> +  "Return the value assosciated with KEY in the json object JS.  If
> KEY is not
> +found and FAILURE-RESULT is a procedure, it is called in tail position
> with
> +zero arguments.  Otherwise, FAILURE-RESULT is returned."
> +  ;; TODO: `failure-result` should be optional, but should the default
> +  ;; `failure-result` be #f (like `assoc-ref`), a thunk raising an
> exception,
> +  ;; '(@), or something else?  Keep it mandatory until we discuss and
> decide.
> +  (match js
> +    (('@ . alist)
> +     (match (assoc key alist)
> +       (#f
> +        (if (procedure? failure-result)
> +            (failure-result)
> +            failure-result))
> +       ((_ . value)
> +        value)))))
We can safely replace failure-result by Guile's DEFAULT and leave error
handling to the user.

> +(define (alist-pop alist key)
> +  "Return two values: the first pair in ALIST with the given KEY in
> its
> +'car' (or #f, if no such pair exists) and an assosciation list like
> (and
> +potentially sharing storage with) ALIST, but with no entry for KEY."
> +  (match (assoc key alist)
> +    ;; If key isn't present, we don't need to do any allocation
> +    (#f
> +     (values #f alist))
> +    (found
> +     (values found
> +             ;; Because we have `found`, we can find it more
> +             ;; efficiently this time with `eq?`. We avoid using
> +             ;; `delq` because it would copy pairs in a shared
> +             ;; tail. We assume a sufficiently smart compiler to
> +             ;; handle "tail recursion modulo cons" (vid. e.g. Indiana
> +             ;; University Technical Report No. 19, Friedman & Wise
> +             ;; 1975) at least as efficiently as a hand-written
> +             ;; tail-recursive implementation with an accumulator.
> +             (let loop ((alist alist))
> +               (match alist
> +                 ;; We know that `found` is present,
> +                 ;; so no need to check for '()
> +                 ((this . alist)
> +                  (if (eq? this found)
> +                      alist
> +                      (cons this (loop alist))))))))))
I think this can be more efficiently be done in a "single" loop.

  (let loop ((rest alist)
             (previous '()))
    (match rest
      (() (values #f alist))
      ((first . rest)
       (if (eq? (car first) key)
           (values first (reverse! previous rest))
           (loop rest (cons first previous))))))

Also, I don't think your version is tail-recursive.  (loop alist) is
not in tail position from what I can tell.

We should also look into SRFI-1 span.

> +;; Sadly, Guile's implementation of (@ (srfi srfi-1) alist-delete)
> +;; performs unnecessary allocation, e.g. this currently evaluates to
> #f:
> +;;
> +;;     (let ((alist `(("a" . 1)("b" . 2)("c" . 3))))
> +;;       (eq? alist (alist-delete "x" alist)))
> +;;
> +;; These functions generally choose to allocate a new outer pair
> (with the '@
> +;; tag), even though in unusual cases the resulting object might not
> have
> +;; changed, for the sake of simplicity and to avoid retaining a
> reference to
> +;; the original alist longer than necessary. But that is O(1)
> allocation that
> +;; could only rarely be avoided: `alist-delete` would allocate O(n)
> pairs,
> +;; which would only be necessary in the worst case.
> +(define (alist-delete* alist key)
> +  "Return an assosciation list like (and potentially sharing storage
> with)
> +ALIST, but with no entry for KEY."
> +  (define-values (_popped remaining)
> +    (alist-pop alist key))
> +  remaining)
That's a pretty long comment around something that could be done with
call-with-values or SRFI-71 let.  I think one of these two should be
preferred.

Note that both our versions of alist-pop only pop the first key (as
they should).  This means that alist-delete* should really be called
alist-delete-1 as in "remove the first pair in ALIST belonging to KEY".
For the larger JSON handling below, this makes no difference however.

> +(define (jsobject-delete js key)
> +  "Return a json object like JS, but with no entry for KEY."
> +  (cons '@ (match js
> +             (('@ . alist)
> +              (alist-delete* alist key)))))
Fair enough.

> +(define (alist-set alist key value)
> +  "Return an assosciation list like ALIST, but with KEY mapped to
> VALUE,
> +replacing any existing mapping for KEY."
> +  (acons key value (alist-delete* alist key)))
Is order relevant here?  Because we could just as well reimplement our
alist-delete* loop and cons the replacement onto the rest.  WDYT?

> +(define (jsobject-set js key value)
> +  "Return a json object like JS, but with KEY mapped to VALUE,
> replacing any
> +existing mapping for KEY."
> +  (cons '@ (match js
> +             (('@ . alist)
> +              (alist-set alist key value)))))
I think it'd be wiser to put the cons inside the match.

> +(define jsobject-set*
> +  (case-lambda
> +    "Return a json object like JS, but functionally extended by
> mapping each
> +KEY to each VALUE, replacing any existing mapping for each KEY.  The
> update
> +takes place from left to right, so later mappings overwrite earlier
> mappings
> +for the same KEY."
> +    ((js)
> +     js)
> +    ((js key value)
> +     (jsobject-set js key value))
> +    ((js . args)
> +     (cons '@ (match js
> +                (('@ . alist)
> +                 (let loop ((alist alist)
> +                            (args args))
> +                   (match args
> +                     (()
> +                     alist)
> +                     ((key value . args)
> +                      (loop (alist-set alist key value)
> +                            args))))))))))
I'm not sure if I like this "syntax".  I think I'd prefer
  (jsobject-set* obj (FIELD1 VALUE1) (FIELD2 VALUE2) ...)
with FIELD1, FIELD2 being identifiers
WDYT?
> +(define (alist-update alist key failure-result updater)
> +  "Return an assosciation list like ALIST, but with KEY mapped to
> the result
> +of applying UPDATER to the value to which KEY is mapped in ALIST. 
> When ALIST
> +does not have an existing mapping for KEY, FAILURE-RESULT is used as
> with
> +'jsobject-ref' to obtain the argument for UPDATER."
> +  ;; Often, `updater` will be a lambda expression, so making it the
> last
> +  ;; argument may help to makes the code legible, and the most
> likely
> +  ;; `failure-result` arguments are all shorter than the keyword
> +  ;; `#:failure-result`.  Plus, making `failure-result` mandatory
> helps make
> +  ;; `alist-update` consistent with `alist-update*`.
Which alist-update* are you referring to here?  Either way, the
failure-result to default argument from above applies, but we could
keyword it.
> +  (define-values (popped tail-alist)
> +    (alist-pop alist key))
> +  (acons key
> +         (updater (match popped
> +                    (#f
> +                     (if (procedure? failure-result)
> +                         (failure-result)
> +                         failure-result))
> +                    ((_ . value)
> +                     value)))
> +         tail-alist))
SRFI-71 let says hi.  Also the ordering question applies.  I'm starting
to think we should implement alist-pop, alist-set and alist-update in
terms of a single more powerful function producing three values (or
SRFI-1 span).

> +(define (jsobject-update js key failure-result updater)
> +  "Return a json object like JS, but with KEY mapped to the result
> of applying
> +UPDATER to the value to which KEY is mapped in JS.  When JS does not
> have an
> +existing mapping for KEY, FAILURE-RESULT is used as with 'jsobject-
> ref' to
> +obtain the argument for UPDATER."
> +  (cons '@ (match js
> +             (('@ . alist)
> +              (alist-update alist key failure-result updater)))))
Same default argument.  Cons inside.

> +(define jsobject-update*
> +  (case-lambda
> +    "Return a json object like JS, but functionally extended by
> replacing the
> +mapping for each KEY with the result of applying the corresponding
> UPDATER to
> +the value to which that KEY is mapped in JS---or, if no such mapping
> exists,
> +to a value based on the corresponding FAILURE-RESULT as with
> 'jsobject-ref'.
> +The update takes place from left to right, so later UPDATERs will
> receive the
> +values returned by earlier UPDATERs for the same KEY."
> +    ((js)
> +     js)
> +    ((js key failure-result updater)
> +     (jsobject-update js key failure-result updater))
> +    ((js . args)
> +     (cons '@ (match js
> +                (('@ . alist)
> +                 (let loop ((alist alist)
> +                            (args args))
> +                   (match args
> +                     (()
> +                     alist)
> +                     ((key failure-result updater . args)
> +                      (loop (alist-update alist key failure-result
> updater)
> +                            args))))))))))
Same default argument.  Cons inside.

> +(define* (jsobject-union #:key
> +                         (combine (lambda (a b) b))
> +                         (combine/key (lambda (k a b) (combine a
> b)))
> +                         #:rest json-objects)
> +  "Combine the given JSON-OBJECTS into a single json object.  The
> JSON-OBJECTS
> +are merged from left to right by adding each key/value pair of each
> object to
> +the aggregate object in turn.  When one of the JSON-OBJECTS contains
> a mapping
> +from some key KEY to a value VAL such that the aggregate object
> already
> +contains a mapping from KEY to a value VAL0, the aggregate object is
> +functionally updated to instead map KEY to the value of (COMBINE/KEY
> KEY VAL0
> +VAL).  The default COMBINE/KEY tail-calls (COMBINE VAL0 VAL), and
> the default
> +COMBINE simply returns its second argument, so, by default, mappings
> in later
> +JSON-OBJECTS supersede those in earlier ones."
> +  (match (filter (lambda (v)
> +                   (not (or (keyword? v)
> +                            (procedure? v))))
> +                 json-objects)
> +    (()
> +     '(@))
> +    (((and js0 ('@ . _)))
> +     js0)
> +    ((('@ . alist0) ('@ . alist*) ...)
> +     (cons '@ (fold (lambda (alist1 alist0)
> +                      (if (null? alist0)
> +                          alist1
> +                          (fold (lambda (k+v alist0)
> +                                  (match k+v
> +                                    ((k . v)
> +                                     (define-values (popped tail-
> alist)
> +                                       (alist-pop alist0 k))
> +                                     (match popped
> +                                       (#f
> +                                        (cons k+v tail-alist))
> +                                       ((_ . v0)
> +                                        (acons k
> +                                               (combine/key k v0 v)
> +                                               tail-alist))))))
> +                                alist0
> +                                alist1)))
> +                    alist0
> +                    alist*)))))
Same default argument.  Cons inside.
I think having a single combine function taking (k a b) would be less
confusing than having two.  Is there a rationale for the form you
chose?

> +
> +;;;
> +;;; Phases.
> +;;;
> +
>  (define (set-home . _)
>    (with-directory-excursion ".."
>      (let loop ((i 0))
> @@ -49,7 +281,7 @@ (define (set-home . _)
>  (define (module-name module)
>    (let* ((package.json (string-append module "/package.json"))
>           (package-meta (call-with-input-file package.json read-
> json)))
> -    (assoc-ref package-meta "name")))
> +    (jsobject-ref package-meta "name" #f)))
>  
>  (define (index-modules input-paths)
>    (define (list-modules directory)
> @@ -73,27 +305,24 @@ (define* (patch-dependencies #:key inputs
> #:allow-other-keys)
>  
>    (define index (index-modules (map cdr inputs)))
>  
> -  (define (resolve-dependencies package-meta meta-key)
> -    (fold (lambda (key+value acc)
> -            (match key+value
> -              ('@ acc)
> -              ((key . value) (acons key (hash-ref index key value)
> acc))))
> -          '()
> -          (or (assoc-ref package-meta meta-key) '())))
> +  (define resolve-dependencies
> +    (match-lambda
> +      (('@ . alist)
> +       (cons '@ (map (match-lambda
> +                       ((key . value)
> +                        (cons key (hash-ref index key value))))
> +                     alist)))))
>  
> -  (with-atomic-file-replacement "package.json"
> -    (lambda (in out)
> -      (let ((package-meta (read-json in)))
> -        (assoc-set! package-meta "dependencies"
> -                    (append
> -                     '(@)
> -                     (resolve-dependencies package-meta
> "dependencies")
> -                     (resolve-dependencies package-meta
> "peerDependencies")))
> -        (assoc-set! package-meta "devDependencies"
> -                    (append
> -                     '(@)
> -                     (resolve-dependencies package-meta
> "devDependencies")))
> -        (write-json package-meta out))))
> +  (with-atomic-json-file-replacement "package.json"
> +    (lambda (pkg-meta)
> +      (jsobject-update*
> +       pkg-meta
> +       "devDependencies" '(@) resolve-dependencies
> +       "dependencies" '(@) (lambda (deps)
> +                             (resolve-dependencies
> +                              (jsobject-union
> +                               (jsobject-ref pkg-meta
> "peerDependencies" '(@))
> +                               deps))))))
>    #t)
We should probably add a function to our js utils that "generates an
empty object", because '(@) is quite confusing to see in these
circumstances.  Otherwise LGTM with the aforementioned caveats. 

Cheers
Philip McGrath Dec. 31, 2021, 5:22 a.m. UTC | #3
Hi Liliana,

On 12/30/21 11:56, Liliana Marie Prikler wrote:
 > Am Donnerstag, dem 30.12.2021 um 02:38 -0500 schrieb Philip McGrath:
 >> This commit adds several utility functions for non-destructive
 >> transformation of the JSON representation used by (guix build json),
 >> particularly for purely functional update of JSON objects.  They
 >> should
 >> eventually be exported, but most are private for now to allow for
 >> more
 >> experience and consideration before commiting to the API.  The design
 >> was largely inspired by the 'racket/dict' and 'racket/hash'
 >> libraries.
 >> Liliana Marie Prikler proposed 'with-atomic-json-file-replacement'.
 > Given that this is a fair amount of procedures that you're proposing, I
 > think a new file would be appropriate.  Perhaps (guix build json-
 > utils)?  Adding that should IIUC not cause a world rebuild, so we could
 > do that on master.
 >

I agree that these functions ultimately belong in their own file, and 
I'd even had the name (guix build json-utils) in mind.

I put them in (guix build node-build-system) for now because, if they 
were in (guix build json-utils), they would have to be exported, at 
which point their API would have to be relatively stable, and I didn't 
want designing them to block, or to be rushed by, the rest of this 
series. Now, maybe consensus on the json-utils will turn out to be the 
easy part! But my high-level question is, in your view, do any of the 
points I'm about to respond to block this patch series?

On 12/30/21 13:18, Liliana Marie Prikler wrote:
> Having argued for these procedures to be moved into their own file in a
> separate mail, now it's time to bikeshed stylistic choices.
> 
> Am Donnerstag, dem 30.12.2021 um 02:38 -0500 schrieb Philip McGrath:
>> +(define (jsobject-ref js key failure-result)
>> +  "Return the value assosciated with KEY in the json object JS.  If
>> KEY is not
>> +found and FAILURE-RESULT is a procedure, it is called in tail position
>> with
>> +zero arguments.  Otherwise, FAILURE-RESULT is returned."
>> +  ;; TODO: `failure-result` should be optional, but should the default
>> +  ;; `failure-result` be #f (like `assoc-ref`), a thunk raising an
>> exception,
>> +  ;; '(@), or something else?  Keep it mandatory until we discuss and
>> decide.
>> +  (match js
>> +    (('@ . alist)
>> +     (match (assoc key alist)
>> +       (#f
>> +        (if (procedure? failure-result)
>> +            (failure-result)
>> +            failure-result))
>> +       ((_ . value)
>> +        value)))))
> We can safely replace failure-result by Guile's DEFAULT and leave error
> handling to the user.

I don't care whether we call it DEFAULT or FAILURE-RESULT.

I agree that it should not raise an exception by default. My current 
thinking is that '(@) would be a good default DEFAULT: it is useful for 
the common pattern of traversing or transforming nested objects, and, as 
you point out at the end of this email, explicitly typing #f (the other 
useful possibility) looks much more like normal Scheme than explicitly 
typing '(@).

In my experience with [1] and [2] (the purely-functional dictionary 
libraries I use most often), the special case for a procedure as DEFAULT 
is useful. I feel less strongly about it because it's relatively easy to 
work around for JSON, since you can pick some non-JSON signal value, but 
it also seems to have especially little downside for JSON, since (guix 
build json) will never have a procedure in a valid JSON object. In 
addition to raising exceptions and other control flow, it's useful for 
default values that are expensive to produce.

[1]: https://docs.racket-lang.org/reference/hashtables.html
[2]: https://docs.racket-lang.org/reference/dicts.html

> 
>> +(define (alist-pop alist key)
>> +  "Return two values: the first pair in ALIST with the given KEY in
>> its
>> +'car' (or #f, if no such pair exists) and an assosciation list like
>> (and
>> +potentially sharing storage with) ALIST, but with no entry for KEY."
>> +  (match (assoc key alist)
>> +    ;; If key isn't present, we don't need to do any allocation
>> +    (#f
>> +     (values #f alist))
>> +    (found
>> +     (values found
>> +             ;; Because we have `found`, we can find it more
>> +             ;; efficiently this time with `eq?`. We avoid using
>> +             ;; `delq` because it would copy pairs in a shared
>> +             ;; tail. We assume a sufficiently smart compiler to
>> +             ;; handle "tail recursion modulo cons" (vid. e.g. Indiana
>> +             ;; University Technical Report No. 19, Friedman & Wise
>> +             ;; 1975) at least as efficiently as a hand-written
>> +             ;; tail-recursive implementation with an accumulator.
>> +             (let loop ((alist alist))
>> +               (match alist
>> +                 ;; We know that `found` is present,
>> +                 ;; so no need to check for '()
>> +                 ((this . alist)
>> +                  (if (eq? this found)
>> +                      alist
>> +                      (cons this (loop alist))))))))))
> I think this can be more efficiently be done in a "single" loop.
> 
>    (let loop ((rest alist)
>               (previous '()))
>      (match rest
>        (() (values #f alist))
>        ((first . rest)
>         (if (eq? (car first) key)
>             (values first (reverse! previous rest))
>             (loop rest (cons first previous))))))
> 

I'll admit to a Racket bias, but, having just eliminated the use of 
'assoc-set!', I'm loathe to start mutating pairs (even correctly). To 
quote a bit from the SRFI-1 spec for 'append-reverse!', "note that this 
pattern of iterative computation followed by a reverse can frequently be 
rewritten as a recursion, dispensing with the reverse and append-reverse 
steps, and shifting temporary, intermediate storage from the heap to the 
stack, which is typically a win for reasons of cache locality and eager 
storage reclamation." (See how 'set-cdr!' can crash safe Chez Scheme! 
<https://github.com/cisco/ChezScheme/issues/599>)

IIUC, using SRFI-1's 'span' would lead to the same situation.

> Also, I don't think your version is tail-recursive.  (loop alist) is
> not in tail position from what I can tell.

Yes, "tail recursion modulo cons" refers to a compiler optimization for 
functions which are _not_ tail recursive. For full details, see the 
Friedman & Wise 1975 tech report I cited at 
<https://legacy.cs.indiana.edu/ftp/techreports/TR19.pdf> (or various 
other articles), but, as briefly as I can: The optimization rests on the 
observation that many recursive functions, like the classic definition 
of 'map':

     (define (map f lst)
       (match lst
         (()
          '())
         ((this . lst)
          (cons (f this)
                (map f lst)))))

are nearly tail-recursive, and the only real work remaining to be done 
in the continuation of the recursive call is to fill in the cdr of the 
pair. Thus, a compiler can safely transform this code into a truly 
tail-recursive implementation:

     (define (map f lst)
       (match lst
         (()
          '())
         ((this . lst)
          (define ret (list (f this)))
          (let loop ((dest ret)
                     (lst lst))
            (match lst
              ((this . lst)
               (define new (list (f this)))
               (set-cdr! dest new)
               (loop new lst))
              (()
               ret))))))

Unlike the Proper Implementation of Tail Calls (so-called "tail-call 
optimization"), handling "tail recursion modulo cons" truly is an 
optimization: it does not change the space complexity of the function. 
But it can allow the compiler to generate whatever code it thinks will 
work best with its collector/allocator and continuation/"call stack" 
implementation.

(The optimizations applies to constructors in general, not just 'cons', 
and a compiler can safely apply it to values that are immutable from the 
perspective of the source language.)

> 
>> +;; Sadly, Guile's implementation of (@ (srfi srfi-1) alist-delete)
>> +;; performs unnecessary allocation, e.g. this currently evaluates to
>> #f:
>> +;;
>> +;;     (let ((alist `(("a" . 1)("b" . 2)("c" . 3))))
>> +;;       (eq? alist (alist-delete "x" alist)))
>> +;;
>> +;; These functions generally choose to allocate a new outer pair
>> (with the '@
>> +;; tag), even though in unusual cases the resulting object might not
>> have
>> +;; changed, for the sake of simplicity and to avoid retaining a
>> reference to
>> +;; the original alist longer than necessary. But that is O(1)
>> allocation that
>> +;; could only rarely be avoided: `alist-delete` would allocate O(n)
>> pairs,
>> +;; which would only be necessary in the worst case.
>> +(define (alist-delete* alist key)
>> +  "Return an assosciation list like (and potentially sharing storage
>> with)
>> +ALIST, but with no entry for KEY."
>> +  (define-values (_popped remaining)
>> +    (alist-pop alist key))
>> +  remaining)
> That's a pretty long comment around something that could be done with
> call-with-values or SRFI-71 let.  I think one of these two should be
> preferred.
> 
> Note that both our versions of alist-pop only pop the first key (as
> they should).  This means that alist-delete* should really be called
> alist-delete-1 as in "remove the first pair in ALIST belonging to KEY".
> For the larger JSON handling below, this makes no difference however.

Here I was using '*' to mean "a slightly altered version of", as with 
'letrec' and 'letrec*', but, yes, since the other functions defined here 
use '*' to mean "zero or more times", the name is confusing: I think I'd 
just call it 'alist-delete' and not import (srfi srfi-1)'s version.

The comment may be unnecessarily long ... the essence of what I was 
trying to explain is that, in all of these implementations, I've tried 
to avoid unnecessary allocation. Being able to rely on 'alist-delete', 
and more generally 'alist-pop', not to needlessly copy the "spine" of 
the list lets later functions use them unconditionally.

Why would you prefer 'call-with-values' or SRFI-71 over 'define-values'? 
The style guide against which I'm used to working [3] generally prefers 
internal definitions, to avoid rightward drift.

[3]: 
https://docs.racket-lang.org/style/Choosing_the_Right_Construct.html#%28part._.Definitions%29

>> +(define (alist-set alist key value)
>> +  "Return an assosciation list like ALIST, but with KEY mapped to
>> VALUE,
>> +replacing any existing mapping for KEY."
>> +  (acons key value (alist-delete* alist key)))
> Is order relevant here?  Because we could just as well reimplement our
> alist-delete* loop and cons the replacement onto the rest.  WDYT?

Relying on order for JSON objects is non-interoperable, per RFC 8259 § 
4. I'm not intending for these alist procedures to be exported, so I'm 
not trying to handle any more general case than that, as I explain in 
the comments at the top of the file.

I'm not sure what the advantage would be to reimplementing the 
'alist-delete' loop here.

> 
>> +(define (jsobject-set js key value)
>> +  "Return a json object like JS, but with KEY mapped to VALUE,
>> replacing any
>> +existing mapping for KEY."
>> +  (cons '@ (match js
>> +             (('@ . alist)
>> +              (alist-set alist key value)))))
> I think it'd be wiser to put the cons inside the match.
> 

I don't care very much either way.

>> +(define jsobject-set*
>> +  (case-lambda
>> +    "Return a json object like JS, but functionally extended by
>> mapping each
>> +KEY to each VALUE, replacing any existing mapping for each KEY.  The
>> update
>> +takes place from left to right, so later mappings overwrite earlier
>> mappings
>> +for the same KEY."
>> +    ((js)
>> +     js)
>> +    ((js key value)
>> +     (jsobject-set js key value))
>> +    ((js . args)
>> +     (cons '@ (match js
>> +                (('@ . alist)
>> +                 (let loop ((alist alist)
>> +                            (args args))
>> +                   (match args
>> +                     (()
>> +                     alist)
>> +                     ((key value . args)
>> +                      (loop (alist-set alist key value)
>> +                            args))))))))))
> I'm not sure if I like this "syntax".  I think I'd prefer
>    (jsobject-set* obj (FIELD1 VALUE1) (FIELD2 VALUE2) ...)
> with FIELD1, FIELD2 being identifiers
> WDYT?

So you would make 'jsobject-set*' a macro? When you say, "with FIELD1, 
FIELD2 being identifiers", do you mean that the macro should convert 
them to strings at compile-time? While, if I were designing a JSON 
representation, I'd much prefer to use symbols for the object keys, I 
think it would be confusing to use strings everywhere else but magic 
symbols here.

I based this function on 'hash-set*' [4], 'dict-set*' [5], and numerous 
similar functions in the Racket world, so I have a high degree of 
confidence in the usability of the interface.

[4]: 
https://docs.racket-lang.org/reference/hashtables.html#%28def._%28%28lib._racket%2Fprivate%2Fbase..rkt%29._hash-set%2A%29%29
[5]: 
https://docs.racket-lang.org/reference/dicts.html#%28def._%28%28lib._racket%2Fdict..rkt%29._dict-set%2A%29%29


>> +(define (alist-update alist key failure-result updater)
>> +  "Return an assosciation list like ALIST, but with KEY mapped to
>> the result
>> +of applying UPDATER to the value to which KEY is mapped in ALIST.
>> When ALIST
>> +does not have an existing mapping for KEY, FAILURE-RESULT is used as
>> with
>> +'jsobject-ref' to obtain the argument for UPDATER."
>> +  ;; Often, `updater` will be a lambda expression, so making it the
>> last
>> +  ;; argument may help to makes the code legible, and the most
>> likely
>> +  ;; `failure-result` arguments are all shorter than the keyword
>> +  ;; `#:failure-result`.  Plus, making `failure-result` mandatory
>> helps make
>> +  ;; `alist-update` consistent with `alist-update*`.
> Which alist-update* are you referring to here?  Either way, the
> failure-result to default argument from above applies, but we could
> keyword it.

Ah, I guess read that as, "Plus, making 'default' mandatory helps make 
'jsobject-update' consistent with 'jsobject-update*'."

>> +  (define-values (popped tail-alist)
>> +    (alist-pop alist key))
>> +  (acons key
>> +         (updater (match popped
>> +                    (#f
>> +                     (if (procedure? failure-result)
>> +                         (failure-result)
>> +                         failure-result))
>> +                    ((_ . value)
>> +                     value)))
>> +         tail-alist))
> SRFI-71 let says hi.  Also the ordering question applies.  I'm starting
> to think we should implement alist-pop, alist-set and alist-update in
> terms of a single more powerful function producing three values (or
> SRFI-1 span).

Same question again re 'define-values'.

My intent in creating 'alist-pop' was to have a primitive that would 
work for both 'alist-update' and 'alist-delete', and thereby 
'alist-set'. Returning the prefix and the tail separately would involve 
either extra allocation or mutating pairs. Since order never matters in 
this context, why pay that price?

>> +(define* (jsobject-union #:key
>> +                         (combine (lambda (a b) b))
>> +                         (combine/key (lambda (k a b) (combine a
>> b)))
>> +                         #:rest json-objects)
>> +  "Combine the given JSON-OBJECTS into a single json object.  The
>> JSON-OBJECTS
>> +are merged from left to right by adding each key/value pair of each
>> object to
>> +the aggregate object in turn.  When one of the JSON-OBJECTS contains
>> a mapping
>> +from some key KEY to a value VAL such that the aggregate object
>> already
>> +contains a mapping from KEY to a value VAL0, the aggregate object is
>> +functionally updated to instead map KEY to the value of (COMBINE/KEY
>> KEY VAL0
>> +VAL).  The default COMBINE/KEY tail-calls (COMBINE VAL0 VAL), and
>> the default
>> +COMBINE simply returns its second argument, so, by default, mappings
>> in later
>> +JSON-OBJECTS supersede those in earlier ones."
>> +  (match (filter (lambda (v)
>> +                   (not (or (keyword? v)
>> +                            (procedure? v))))
>> +                 json-objects)
>> +    (()
>> +     '(@))
>> +    (((and js0 ('@ . _)))
>> +     js0)
>> +    ((('@ . alist0) ('@ . alist*) ...)
>> +     (cons '@ (fold (lambda (alist1 alist0)
>> +                      (if (null? alist0)
>> +                          alist1
>> +                          (fold (lambda (k+v alist0)
>> +                                  (match k+v
>> +                                    ((k . v)
>> +                                     (define-values (popped tail-
>> alist)
>> +                                       (alist-pop alist0 k))
>> +                                     (match popped
>> +                                       (#f
>> +                                        (cons k+v tail-alist))
>> +                                       ((_ . v0)
>> +                                        (acons k
>> +                                               (combine/key k v0 v)
>> +                                               tail-alist))))))
>> +                                alist0
>> +                                alist1)))
>> +                    alist0
>> +                    alist*)))))
> Same default argument.  Cons inside.
> I think having a single combine function taking (k a b) would be less
> confusing than having two.  Is there a rationale for the form you
> chose?

I based this function in particular on 'hash-union' from 'racket/hash' 
[6], which uses these keywords. (But in 'hash-union', collisions trigger 
an exception by default, and it requires at least one argument, because 
otherwise it would be unclear what key-comparison function the result 
should use.)

Having '#:combine' in addition to '#:combine/key' is ultimately just a 
convenience, but it is quite a nice convenience in practice. It is quite 
rare, in my experience, for a '#:combine' function to actually depend on 
the key: it might depend on the type of the value, but, very often, it 
unconditionally applies to all keys. Using '#:combine' is particularly 
nice when using a combination function that already exists, like 
'append' or '+'.

[6]: 
https://docs.racket-lang.org/reference/hashtables.html#%28def._%28%28lib._racket%2Fhash..rkt%29._hash-union%29%29

>> +  (with-atomic-json-file-replacement "package.json"
>> +    (lambda (pkg-meta)
>> +      (jsobject-update*
>> +       pkg-meta
>> +       "devDependencies" '(@) resolve-dependencies
>> +       "dependencies" '(@) (lambda (deps)
>> +                             (resolve-dependencies
>> +                              (jsobject-union
>> +                               (jsobject-ref pkg-meta
>> "peerDependencies" '(@))
>> +                               deps))))))
>>     #t)
> We should probably add a function to our js utils that "generates an
> empty object", because '(@) is quite confusing to see in these
> circumstances.  Otherwise LGTM with the aforementioned caveats.

I'm not sure what to call it: it would have to be short, or people (me, 
at least) might end up writing '(@) anyway. Also, IIUC Guile doesn't 
actually prevent you from mutating quoted constant pairs, so I function 
would have to allocate a fresh pair to be robust.

It's a somewhat odd idea, but how about this?

     (define-syntax |{}| (identifier-syntax '(@)))

It's as short as '(@), it looks like the JSON notation for the empty 
object, and IIUC people could only use it to mess up occurrences of '(@) 
within the same compilation unit, which we can't stop them from doing 
anyway.

Alternatively, if we give up the thunk special case for 'default' 
values, we could do:

     (define jsobject-update
       (case-lambda
         ((js key updater)
          (jsobject-update js key '(@) updater))
         ((js key default updater)
          ...)))
     (define jsobject-update*
       (case-lambda
         ...
         ((js . args)
          (match js
            (('@ . alist)
             (cons '@ (let loop ((alist alist)
                                 (args args))
                        (match args
                          (()
                           alist)
                          ((key (? procedure? updater) . args)
                           (loop (alist-update alist key '(@) updater)
                                 args))
                          ((key default updater . args)
                           (loop (alist-update alist key '(@) updater)
                                 args))))))))))

-Philip
Liliana Marie Prikler Dec. 31, 2021, 10:18 a.m. UTC | #4
Hi,

Am Freitag, dem 31.12.2021 um 00:22 -0500 schrieb Philip McGrath:
> I agree that these functions ultimately belong in their own file, and
> I'd even had the name (guix build json-utils) in mind.
> 
> I put them in (guix build node-build-system) for now because, if they
> were in (guix build json-utils), they would have to be exported, at 
> which point their API would have to be relatively stable, and I
> didn't  want designing them to block, or to be rushed by, the rest of
> this series. Now, maybe consensus on the json-utils will turn out to
> be the easy part! But my high-level question is, in your view, do any
> of the points I'm about to respond to block this patch series?
We can certainly do them inside node-build-system, but I'd much prefer
if they took up less vertical real-estate.  I hope we can agree on
that.

> On 12/30/21 13:18, Liliana Marie Prikler wrote:
> > Having argued for these procedures to be moved into their own file
> > in a separate mail, now it's time to bikeshed stylistic choices.
> > 
> > Am Donnerstag, dem 30.12.2021 um 02:38 -0500 schrieb Philip
> > McGrath:
> > > +(define (jsobject-ref js key failure-result)
> > > +  "Return the value assosciated with KEY in the json object JS. 
> > > If
> > > KEY is not
> > > +found and FAILURE-RESULT is a procedure, it is called in tail
> > > position
> > > with
> > > +zero arguments.  Otherwise, FAILURE-RESULT is returned."
> > > +  ;; TODO: `failure-result` should be optional, but should the
> > > default
> > > +  ;; `failure-result` be #f (like `assoc-ref`), a thunk raising
> > > an
> > > exception,
> > > +  ;; '(@), or something else?  Keep it mandatory until we
> > > discuss and
> > > decide.
> > > +  (match js
> > > +    (('@ . alist)
> > > +     (match (assoc key alist)
> > > +       (#f
> > > +        (if (procedure? failure-result)
> > > +            (failure-result)
> > > +            failure-result))
> > > +       ((_ . value)
> > > +        value)))))
> > We can safely replace failure-result by Guile's DEFAULT and leave
> > error handling to the user.
> 
> I don't care whether we call it DEFAULT or FAILURE-RESULT.
This is not just a question of naming, but also it doesn't really make
sense to call a thunk here.  Just take DEFAULT as a value.

> I agree that it should not raise an exception by default. My current 
> thinking is that '(@) would be a good default DEFAULT: it is useful
> for the common pattern of traversing or transforming nested objects,
> and, as you point out at the end of this email, explicitly typing #f
> (the other useful possibility) looks much more like normal Scheme
> than explicitly typing '(@).
The question here is whether '(@) is a good meaningful default or
whether we just want a different way of writing it, e.g. empty-object
or (empty-object).

> In my experience with [1] and [2] (the purely-functional dictionary 
> libraries I use most often), the special case for a procedure as
> DEFAULT is useful. I feel less strongly about it because it's
> relatively easy to work around for JSON, since you can pick some non-
> JSON signal value, but it also seems to have especially little
> downside for JSON, since (guix build json) will never have a
> procedure in a valid JSON object. In addition to raising exceptions
> and other control flow, it's useful for default values that are
> expensive to produce.
That's all nice and dandy, but given that a valued DEFAULT is simpler
both in terms of interface and implementation, I think it is The Right
Thing here.  For error handling purposes, the Guile way of doing things
is to produce a value outside of the expected range (such as #f,
*unspecified* or a locally defined gensym) and check against it.

> [1]: https://docs.racket-lang.org/reference/hashtables.html
> [2]: https://docs.racket-lang.org/reference/dicts.html
> 
> > 
> > > +(define (alist-pop alist key)
> > > +  "Return two values: the first pair in ALIST with the given KEY
> > > in its
> > > +'car' (or #f, if no such pair exists) and an assosciation list
> > > like (and
> > > +potentially sharing storage with) ALIST, but with no entry for
> > > KEY."
> > > +  (match (assoc key alist)
> > > +    ;; If key isn't present, we don't need to do any allocation
> > > +    (#f
> > > +     (values #f alist))
> > > +    (found
> > > +     (values found
> > > +             ;; Because we have `found`, we can find it more
> > > +             ;; efficiently this time with `eq?`. We avoid using
> > > +             ;; `delq` because it would copy pairs in a shared
> > > +             ;; tail. We assume a sufficiently smart compiler to
> > > +             ;; handle "tail recursion modulo cons" (vid. e.g.
> > > Indiana
> > > +             ;; University Technical Report No. 19, Friedman &
> > > Wise
> > > +             ;; 1975) at least as efficiently as a hand-written
> > > +             ;; tail-recursive implementation with an
> > > accumulator.
> > > +             (let loop ((alist alist))
> > > +               (match alist
> > > +                 ;; We know that `found` is present,
> > > +                 ;; so no need to check for '()
> > > +                 ((this . alist)
> > > +                  (if (eq? this found)
> > > +                      alist
> > > +                      (cons this (loop alist))))))))))
> > I think this can be more efficiently be done in a "single" loop.
> > 
> >    (let loop ((rest alist)
> >               (previous '()))
> >      (match rest
> >        (() (values #f alist))
> >        ((first . rest)
> >         (if (eq? (car first) key)
> >             (values first (reverse! previous rest))
> >             (loop rest (cons first previous))))))
> > 
> 
> I'll admit to a Racket bias, but, having just eliminated the use of 
> 'assoc-set!', I'm loathe to start mutating pairs (even correctly). To
> quote a bit from the SRFI-1 spec for 'append-reverse!', "note that
> this pattern of iterative computation followed by a reverse can
> frequently be rewritten as a recursion, dispensing with the reverse
> and append-reverse steps, and shifting temporary, intermediate
> storage from the heap to the stack, which is typically a win for
> reasons of cache locality and eager storage reclamation." (See how
> 'set-cdr!' can crash safe Chez Scheme! 
> <https://github.com/cisco/ChezScheme/issues/599>)
> 
> IIUC, using SRFI-1's 'span' would lead to the same situation.
For the record, we can use the non-destructive append and reverse here
at the expense of more copying.  If done in terms of SRFI-1 span, we
would not need reverse as far as I understand.

> > Also, I don't think your version is tail-recursive.  (loop alist)
> > is not in tail position from what I can tell.
> 
> Yes, "tail recursion modulo cons" refers to a compiler optimization
> for functions which are _not_ tail recursive. For full details, see
> the Friedman & Wise 1975 tech report I cited at 
> <https://legacy.cs.indiana.edu/ftp/techreports/TR19.pdf> (or various 
> other articles), but, as briefly as I can: The optimization rests on
> the observation that many recursive functions, like the classic
> definition of 'map':
> 
>      (define (map f lst)
>        (match lst
>          (()
>           '())
>          ((this . lst)
>           (cons (f this)
>                 (map f lst)))))
> 
> are nearly tail-recursive, and the only real work remaining to be
> done in the continuation of the recursive call is to fill in the cdr
> of the pair. Thus, a compiler can safely transform this code into a
> truly tail-recursive implementation:
> 
>      (define (map f lst)
>        (match lst
>          (()
>           '())
>          ((this . lst)
>           (define ret (list (f this)))
>           (let loop ((dest ret)
>                      (lst lst))
>             (match lst
>               ((this . lst)
>                (define new (list (f this)))
>                (set-cdr! dest new)
>                (loop new lst))
>               (()
>                ret))))))
> 
> Unlike the Proper Implementation of Tail Calls (so-called "tail-call 
> optimization"), handling "tail recursion modulo cons" truly is an 
> optimization: it does not change the space complexity of the
> function. But it can allow the compiler to generate whatever code it
> thinks will work best with its collector/allocator and
> continuation/"call stack" implementation.
> 
> (The optimizations applies to constructors in general, not just
> 'cons', and a compiler can safely apply it to values that are
> immutable from the perspective of the source language.)
I'm not aware to which extent Guile implements tail recursion modulo
cons and I'd argue neither are you until you dig down into disassembly.
I think it's better here to avoid patterns from Racket that would feel
foreign to Guilers, particularly if you have to explain them with
reference to a paper (we already get hate for referring to Wingo's fold
for XML handling).

In principle, what you're supposing is that a sufficiently smart
compiler could rewrite

  (let ((before after (span PRED mylist))) (append before after))

to (list-copy mylist), which as far as I'm aware Guile currently
doesn't.  It could be argued that it would start doing so once I cast
some magic incantations, but I wouldn't count on it without reading the
disassembly.

> > That's a pretty long comment around something that could be done
> > with call-with-values or SRFI-71 let.  I think one of these two
> > should be preferred.
> > 
> > Note that both our versions of alist-pop only pop the first key (as
> > they should).  This means that alist-delete* should really be
> > called alist-delete-1 as in "remove the first pair in ALIST
> > belonging to KEY".
> > For the larger JSON handling below, this makes no difference
> > however.
> 
> Here I was using '*' to mean "a slightly altered version of", as with
> 'letrec' and 'letrec*', but, yes, since the other functions defined
> here use '*' to mean "zero or more times", the name is confusing: I
> think I'd just call it 'alist-delete' and not import (srfi srfi-1)'s
> version.
That's not the issue here, the issue is that the behaviour also differs
from alist-delete!

> The comment may be unnecessarily long ... the essence of what I was 
> trying to explain is that, in all of these implementations, I've
> tried to avoid unnecessary allocation. Being able to rely on
> 'alist-delete', and more generally 'alist-pop', not to needlessly
> copy the "spine" of the list lets later functions use them
> unconditionally.
Which again would be simpler if we used SRFI-1 span or another
primitive that produced (head item spine) as three outputs.  WDYT?

> Why would you prefer 'call-with-values' or SRFI-71 over 'define-
> values'?  The style guide against which I'm used to working [3]
> generally prefers internal definitions, to avoid rightward drift.
> 
> [3]: 
> https://docs.racket-lang.org/style/Choosing_the_Right_Construct.html#%28part._.Definitions%29
Again, that's a racketism.  To avoid rightward drift, we either
carefully choose where to indent or simplify our definitions to no
longer drift.  A single let in a function spanning three lines is
hardly an issue w.r.t. rightward drift.
+(define (alist-set alist key value)

> > Is order relevant here?  Because we could just as well reimplement
> > our alist-delete* loop and cons the replacement onto the rest. 
> > WDYT?
> 
> Relying on order for JSON objects is non-interoperable, per RFC 8259
> §4. I'm not intending for these alist procedures to be exported, so
> I'm not trying to handle any more general case than that, as I
> explain in the comments at the top of the file.
> 
> I'm not sure what the advantage would be to reimplementing the 
> 'alist-delete' loop here.
Fair enough, the question was however not so much what is required per
RFC, but rather if there is a natural feel of order to package.json
that we ought not disturb.  Particularly, putting dependencies before
name and version could be confusing to whoever needs to debug a delete-
dependencies phase gone wrong.

> So you would make 'jsobject-set*' a macro? When you say, "with
> FIELD1, FIELD2 being identifiers", do you mean that the macro should
> convert them to strings at compile-time? 
Yes to both.

> While, if I were designing a JSON representation, I'd much prefer to
> use symbols for the object keys, I think it would be confusing to use
> strings everywhere else but magic symbols here.
I don't think I agree here.  Assuming that jsobject-set* is a primitive
we need at the moment to be defined (rather than the more generic
jsobject-update*), I think using identifiers for JSON keys in our
rewriters would be a good abstraction.  That being said, this is not
necessarily a blocker, just a weird interface imo.
> 

> > Which alist-update* are you referring to here?  Either way, the
> > failure-result to default argument from above applies, but we could
> > keyword it.
> 
> Ah, I guess read that as, "Plus, making 'default' mandatory helps
> make 'jsobject-update' consistent with 'jsobject-update*'."
Fair enough, it's okay to admit one's typos :)
> > 

> > SRFI-71 let says hi.  Also the ordering question applies.  I'm
> > starting to think we should implement alist-pop, alist-set and
> > alist-update in terms of a single more powerful function producing
> > three values (or SRFI-1 span).
> 
> Same question again re 'define-values'.
Same answer.

> My intent in creating 'alist-pop' was to have a primitive that would 
> work for both 'alist-update' and 'alist-delete', and thereby 
> 'alist-set'. Returning the prefix and the tail separately would
> involve either extra allocation or mutating pairs.  Since order never
> matters in this context, why pay that price?
Again, our consumers are not just machines, but also human readers who
might value that order.  It is nice that you're paying attention to
allocated objects, but I think you are blind to some allocations that
you are accustomed to being reasoned away by Racket.
> 

> > Same default argument.  Cons inside.
> > I think having a single combine function taking (k a b) would be
> > less confusing than having two.  Is there a rationale for the form
> > you chose?
> 
> I based this function in particular on 'hash-union' from
> 'racket/hash' [6], which uses these keywords. (But in 'hash-union',
> collisions trigger an exception by default, and it requires at least
> one argument, because otherwise it would be unclear what key-
> comparison function the result should use.)
> 
> Having '#:combine' in addition to '#:combine/key' is ultimately just
> a convenience, but it is quite a nice convenience in practice. It is
> quite rare, in my experience, for a '#:combine' function to actually
> depend on the key: it might depend on the type of the value, but,
> very often, it  unconditionally applies to all keys. Using
> '#:combine' is particularly nice when using a combination function
> that already exists, like 'append' or '+'.
Fair enough, but what about having #:combine be your #:combine/key and
#:default-combine be your #:combine in that keyword soup?  It takes up
more horizontal real-estate (not that we wouldn't write those
vertically anyway).

Alternatively, this would be a good place to use procedure generators
for once.  (ignore-key combinator) takes a procedure that takes two
arguments (such as 'append' or '+') and produces one that takes three
and ignores the initial key.  This would also help making the interface
and implementation simpler: (json-union combinator obj ...)
> > 

> > > +  (with-atomic-json-file-replacement "package.json"
> > > +    (lambda (pkg-meta)
> > > +      (jsobject-update*
> > > +       pkg-meta
> > > +       "devDependencies" '(@) resolve-dependencies
> > > +       "dependencies" '(@) (lambda (deps)
> > > +                             (resolve-dependencies
> > > +                              (jsobject-union
> > > +                               (jsobject-ref pkg-meta
> > > "peerDependencies" '(@))
> > > +                               deps))))))
> > >     #t)
> > We should probably add a function to our js utils that "generates
> > anempty object", because '(@) is quite confusing to see in these
> > circumstances.  Otherwise LGTM with the aforementioned caveats.
> 
> I'm not sure what to call it: it would have to be short, or people
> (me, at least) might end up writing '(@) anyway.
'none', 'epsilon', '@psilon' would be nice "short" ways of writing
"empty object".

> Also, IIUC Guile doesn't actually prevent you from mutating quoted
> constant pairs, so a function would have to allocate a fresh pair to
> be robust.
That is correct, so we do have to be careful with our other primitives
here.

> It's a somewhat odd idea, but how about this?
> 
>      (define-syntax |{}| (identifier-syntax '(@)))
We can currently do without the bar escapes, but I'd prefer not to do
either, since it'd appear as similar line noise to '(@).  Having a
readable name is in my opinion to be preferred.  That being said,
identifier-syntax is the right direction in general imo.

> Alternatively, if we give up the thunk special case for 'default' 
> values, we could do:
> 
>      (define jsobject-update
>        (case-lambda
>          ((js key updater)
>           (jsobject-update js key '(@) updater))
>          ((js key default updater)
>           ...)))
> [...]
For the record, I think the only JSON primitive we need here is

(rewrite-json-file "package.json"
  ((dependencies) [do stuff with dependencies bound])
  ((devDependencies) [do stuff with devDependencies bound])
  [...])

wherein we can convert dependencies and devDependencies to and from
alists using alist->json-object and json-object->alist.  The rest can
be done using [a]list stuff from Guile core and SRFI-1 with one or two
amendments of our own.  WDYT?

The delete-dependencies phase would then be defined as 
  (define (delete-dependencies . etc)
    (define (%delete-dependencies (obj)
              (let ((alist (json-object->alist obj)))
                (alist->json [actually delete the dependencies]))))
    (rewrite-json-file "package.json"
      ((dependencies) (%delete-dependencies dependencies))
      ((devDependencies) (%delete-dependencies devDependencies))))

Pardon me not writing out etc, but I think you get the idea.

As for what happens if a value is unbound, we could make it 
  ((key [default]) code)
so imagine writing (dependencies none) instead of (dependencies) for
safety.

Cheers
Philip McGrath Jan. 8, 2022, 4:13 a.m. UTC | #5
Hi,

On 12/31/21 05:18, Liliana Marie Prikler wrote:
> Am Freitag, dem 31.12.2021 um 00:22 -0500 schrieb Philip McGrath:
>>>> +(define (alist-pop alist key)
>>>> +  "Return two values: the first pair in ALIST with the given KEY
>>>> in its
>>>> +'car' (or #f, if no such pair exists) and an assosciation list
>>>> like (and
>>>> +potentially sharing storage with) ALIST, but with no entry for
>>>> KEY."
>>>> +  (match (assoc key alist)
>>>> +    ;; If key isn't present, we don't need to do any allocation
>>>> +    (#f
>>>> +     (values #f alist))
>>>> +    (found
>>>> +     (values found
>>>> +             ;; Because we have `found`, we can find it more
>>>> +             ;; efficiently this time with `eq?`. We avoid using
>>>> +             ;; `delq` because it would copy pairs in a shared
>>>> +             ;; tail. We assume a sufficiently smart compiler to
>>>> +             ;; handle "tail recursion modulo cons" (vid. e.g.
>>>> Indiana
>>>> +             ;; University Technical Report No. 19, Friedman &
>>>> Wise
>>>> +             ;; 1975) at least as efficiently as a hand-written
>>>> +             ;; tail-recursive implementation with an
>>>> accumulator.
>>>> +             (let loop ((alist alist))
>>>> +               (match alist
>>>> +                 ;; We know that `found` is present,
>>>> +                 ;; so no need to check for '()
>>>> +                 ((this . alist)
>>>> +                  (if (eq? this found)
>>>> +                      alist
>>>> +                      (cons this (loop alist))))))))))
>>> I think this can be more efficiently be done in a "single" loop.
>>>
>>>     (let loop ((rest alist)
>>>                (previous '()))
>>>       (match rest
>>>         (() (values #f alist))
>>>         ((first . rest)
>>>          (if (eq? (car first) key)
>>>              (values first (reverse! previous rest))
>>>              (loop rest (cons first previous))))))
>>>
>>
>> I'll admit to a Racket bias, but, having just eliminated the use of
>> 'assoc-set!', I'm loathe to start mutating pairs (even correctly). To
>> quote a bit from the SRFI-1 spec for 'append-reverse!', "note that
>> this pattern of iterative computation followed by a reverse can
>> frequently be rewritten as a recursion, dispensing with the reverse
>> and append-reverse steps, and shifting temporary, intermediate
>> storage from the heap to the stack, which is typically a win for
>> reasons of cache locality and eager storage reclamation." (See how
>> 'set-cdr!' can crash safe Chez Scheme!
>> <https://github.com/cisco/ChezScheme/issues/599>)
>>
>> IIUC, using SRFI-1's 'span' would lead to the same situation.
> For the record, we can use the non-destructive append and reverse here
> at the expense of more copying.  If done in terms of SRFI-1 span, we
> would not need reverse as far as I understand.
> 
>>> Also, I don't think your version is tail-recursive.  (loop alist)
>>> is not in tail position from what I can tell.
>>
>> Yes, "tail recursion modulo cons" refers to a compiler optimization
>> for functions which are _not_ tail recursive. For full details, see
>> the Friedman & Wise 1975 tech report I cited at
>> <https://legacy.cs.indiana.edu/ftp/techreports/TR19.pdf> (or various
>> other articles), but, as briefly as I can: The optimization rests on
>> the observation that many recursive functions, like the classic
>> definition of 'map':
>>
>>       (define (map f lst)
>>         (match lst
>>           (()
>>            '())
>>           ((this . lst)
>>            (cons (f this)
>>                  (map f lst)))))
>>
>> are nearly tail-recursive, and the only real work remaining to be
>> done in the continuation of the recursive call is to fill in the cdr
>> of the pair. Thus, a compiler can safely transform this code into a
>> truly tail-recursive implementation:
>>
>>       (define (map f lst)
>>         (match lst
>>           (()
>>            '())
>>           ((this . lst)
>>            (define ret (list (f this)))
>>            (let loop ((dest ret)
>>                       (lst lst))
>>              (match lst
>>                ((this . lst)
>>                 (define new (list (f this)))
>>                 (set-cdr! dest new)
>>                 (loop new lst))
>>                (()
>>                 ret))))))
>>
>> Unlike the Proper Implementation of Tail Calls (so-called "tail-call
>> optimization"), handling "tail recursion modulo cons" truly is an
>> optimization: it does not change the space complexity of the
>> function. But it can allow the compiler to generate whatever code it
>> thinks will work best with its collector/allocator and
>> continuation/"call stack" implementation.
>>
>> (The optimizations applies to constructors in general, not just
>> 'cons', and a compiler can safely apply it to values that are
>> immutable from the perspective of the source language.)
> I'm not aware to which extent Guile implements tail recursion modulo
> cons and I'd argue neither are you until you dig down into disassembly.
> I think it's better here to avoid patterns from Racket that would feel
> foreign to Guilers, particularly if you have to explain them with
> reference to a paper (we already get hate for referring to Wingo's fold
> for XML handling).

In a sense, "tail recursion modulo cons" was a red herring here. The 
essential requirement for implementing 'alist-pop' or 'map' as I did is 
that the language implementation be "safe for space", i.e. not have 
"stack overflow"s: Guile meets that requirement. [1]

In a safe-for-space language, the naturally recursive implementations 
and the implementations with explicit, non-destructive accumulators both 
allocate O(n) temporary storage. The difference is that the explicit 
accumulator versions allocate temporary pairs on the heap, while the 
naturally recursive version allocates its temporary space on the "stack" 
(i.e. additional frames of the (non-reified) continuation), which is 
generally, and specifically for Guile per [1], much better (though a 
sufficiently smart generational garbage collector with bump-pointer 
allocation in the nursery could mitigate the difference somewhat).

All of that relies just on the guarantees of Guile as a safe-for-space 
language. The optimization in "tail recursion modulo cons" is that a 
compiler could, if it chose to expend its effort this way, make the 
naturally recursive implementations work without the O(n) temporary 
"stack" storage by transforming transforming the non-tail recursion into 
tail recursion. In essence, it could achieve a similar effect to an 
explicit accumulator plus 'reverse!' without the many downsides (some of 
which [1] discusses).

But the naturally recursive implementation is preferable even if the 
optimization does not apply.

> 
> In principle, what you're supposing is that a sufficiently smart
> compiler could rewrite
> 
>    (let ((before after (span PRED mylist))) (append before after))
> 
> to (list-copy mylist), which as far as I'm aware Guile currently
> doesn't.  It could be argued that it would start doing so once I cast
> some magic incantations, but I wouldn't count on it without reading the
> disassembly.

In some sense that's true, but your example would require a lot of 
interprocedural analysis, not just a directly visible pattern with 
well-known primitives using analysis that has been well known since the 
'70s. But, again, the optimization isn't really relevant.
>>> Is order relevant here?  Because we could just as well reimplement
>>> our alist-delete* loop and cons the replacement onto the rest.
>>> WDYT?
>>
>> Relying on order for JSON objects is non-interoperable, per RFC 8259
>> §4. I'm not intending for these alist procedures to be exported, so
>> I'm not trying to handle any more general case than that, as I
>> explain in the comments at the top of the file.
>>
>> I'm not sure what the advantage would be to reimplementing the
>> 'alist-delete' loop here.
> Fair enough, the question was however not so much what is required per
> RFC, but rather if there is a natural feel of order to package.json
> that we ought not disturb.  Particularly, putting dependencies before
> name and version could be confusing to whoever needs to debug a delete-
> dependencies phase gone wrong.

I haven't noticed a consistent convention in "package.json" files (which 
IIUC may not be entirely hand-written).

For debugging, the biggest problem is that (guix build json) doesn't add 
any linebreaks or indentation.

If I were changing it, I'd want it to write object keys in 'string<?' 
order and to raise an exception if given duplicate keys.

-Philip

[1]: 
https://www.gnu.org/software/guile/docs/docs-2.2/guile-ref/Stack-Overflow.html
diff mbox series

Patch

diff --git a/guix/build/node-build-system.scm b/guix/build/node-build-system.scm
index dcaa719f40..e5c4da5091 100644
--- a/guix/build/node-build-system.scm
+++ b/guix/build/node-build-system.scm
@@ -26,6 +26,7 @@  (define-module (guix build node-build-system)
   #:use-module (ice-9 match)
   #:use-module (srfi srfi-1)
   #:export (%standard-phases
+            with-atomic-json-file-replacement
             node-build))
 
 ;; Commentary:
@@ -34,6 +35,237 @@  (define-module (guix build node-build-system)
 ;;
 ;; Code:
 
+;;;
+;;; JSON utilities.
+;;;
+;;; The following procedures facilitate transforming JSON values using the
+;;; representation from (guix build json), particularly purely functional
+;;; update of JSON objects.  If we decide to make more of them public, we
+;;; might instead put them in their own file or, eventually, add them to
+;;; (guix build json).
+;;;
+;;; JSON objects with duplicate keys are not interoperable: see RFC 8259 § 4.
+;;; These procedures assume, but generally do not check, that JSON objects
+;;; given to them as arguments do not have duplicate keys.  As long as that
+;;; precondition is satisfied, they will produce JSON objects without
+;;; duplicate keys.  Procedures that operate on unwrapped assosciation lists
+;;; may do likewise, which should be considered before exporting them for
+;;; general use.
+;;;
+
+(define (with-atomic-json-file-replacement file proc)
+  "Like 'with-atomic-file-replacement', but PROC is called with a single
+argument---the result of parsing FILE's contents as json---and should a value
+to be written as json to the replacement FILE."
+  (with-atomic-file-replacement file
+    (lambda (in out)
+      (write-json (proc (read-json in)) out))))
+
+(define (jsobject-ref js key failure-result)
+  "Return the value assosciated with KEY in the json object JS.  If KEY is not
+found and FAILURE-RESULT is a procedure, it is called in tail position with
+zero arguments.  Otherwise, FAILURE-RESULT is returned."
+  ;; TODO: `failure-result` should be optional, but should the default
+  ;; `failure-result` be #f (like `assoc-ref`), a thunk raising an exception,
+  ;; '(@), or something else?  Keep it mandatory until we discuss and decide.
+  (match js
+    (('@ . alist)
+     (match (assoc key alist)
+       (#f
+        (if (procedure? failure-result)
+            (failure-result)
+            failure-result))
+       ((_ . value)
+        value)))))
+
+(define (alist-pop alist key)
+  "Return two values: the first pair in ALIST with the given KEY in its
+'car' (or #f, if no such pair exists) and an assosciation list like (and
+potentially sharing storage with) ALIST, but with no entry for KEY."
+  (match (assoc key alist)
+    ;; If key isn't present, we don't need to do any allocation
+    (#f
+     (values #f alist))
+    (found
+     (values found
+             ;; Because we have `found`, we can find it more
+             ;; efficiently this time with `eq?`. We avoid using
+             ;; `delq` because it would copy pairs in a shared
+             ;; tail. We assume a sufficiently smart compiler to
+             ;; handle "tail recursion modulo cons" (vid. e.g. Indiana
+             ;; University Technical Report No. 19, Friedman & Wise
+             ;; 1975) at least as efficiently as a hand-written
+             ;; tail-recursive implementation with an accumulator.
+             (let loop ((alist alist))
+               (match alist
+                 ;; We know that `found` is present,
+                 ;; so no need to check for '()
+                 ((this . alist)
+                  (if (eq? this found)
+                      alist
+                      (cons this (loop alist))))))))))
+
+;; Sadly, Guile's implementation of (@ (srfi srfi-1) alist-delete)
+;; performs unnecessary allocation, e.g. this currently evaluates to #f:
+;;
+;;     (let ((alist `(("a" . 1)("b" . 2)("c" . 3))))
+;;       (eq? alist (alist-delete "x" alist)))
+;;
+;; These functions generally choose to allocate a new outer pair (with the '@
+;; tag), even though in unusual cases the resulting object might not have
+;; changed, for the sake of simplicity and to avoid retaining a reference to
+;; the original alist longer than necessary. But that is O(1) allocation that
+;; could only rarely be avoided: `alist-delete` would allocate O(n) pairs,
+;; which would only be necessary in the worst case.
+(define (alist-delete* alist key)
+  "Return an assosciation list like (and potentially sharing storage with)
+ALIST, but with no entry for KEY."
+  (define-values (_popped remaining)
+    (alist-pop alist key))
+  remaining)
+
+(define (jsobject-delete js key)
+  "Return a json object like JS, but with no entry for KEY."
+  (cons '@ (match js
+             (('@ . alist)
+              (alist-delete* alist key)))))
+
+(define (alist-set alist key value)
+  "Return an assosciation list like ALIST, but with KEY mapped to VALUE,
+replacing any existing mapping for KEY."
+  (acons key value (alist-delete* alist key)))
+
+(define (jsobject-set js key value)
+  "Return a json object like JS, but with KEY mapped to VALUE, replacing any
+existing mapping for KEY."
+  (cons '@ (match js
+             (('@ . alist)
+              (alist-set alist key value)))))
+
+(define jsobject-set*
+  (case-lambda
+    "Return a json object like JS, but functionally extended by mapping each
+KEY to each VALUE, replacing any existing mapping for each KEY.  The update
+takes place from left to right, so later mappings overwrite earlier mappings
+for the same KEY."
+    ((js)
+     js)
+    ((js key value)
+     (jsobject-set js key value))
+    ((js . args)
+     (cons '@ (match js
+                (('@ . alist)
+                 (let loop ((alist alist)
+                            (args args))
+                   (match args
+                     (()
+                     alist)
+                     ((key value . args)
+                      (loop (alist-set alist key value)
+                            args))))))))))
+
+(define (alist-update alist key failure-result updater)
+  "Return an assosciation list like ALIST, but with KEY mapped to the result
+of applying UPDATER to the value to which KEY is mapped in ALIST.  When ALIST
+does not have an existing mapping for KEY, FAILURE-RESULT is used as with
+'jsobject-ref' to obtain the argument for UPDATER."
+  ;; Often, `updater` will be a lambda expression, so making it the last
+  ;; argument may help to makes the code legible, and the most likely
+  ;; `failure-result` arguments are all shorter than the keyword
+  ;; `#:failure-result`.  Plus, making `failure-result` mandatory helps make
+  ;; `alist-update` consistent with `alist-update*`.
+  (define-values (popped tail-alist)
+    (alist-pop alist key))
+  (acons key
+         (updater (match popped
+                    (#f
+                     (if (procedure? failure-result)
+                         (failure-result)
+                         failure-result))
+                    ((_ . value)
+                     value)))
+         tail-alist))
+
+(define (jsobject-update js key failure-result updater)
+  "Return a json object like JS, but with KEY mapped to the result of applying
+UPDATER to the value to which KEY is mapped in JS.  When JS does not have an
+existing mapping for KEY, FAILURE-RESULT is used as with 'jsobject-ref' to
+obtain the argument for UPDATER."
+  (cons '@ (match js
+             (('@ . alist)
+              (alist-update alist key failure-result updater)))))
+
+(define jsobject-update*
+  (case-lambda
+    "Return a json object like JS, but functionally extended by replacing the
+mapping for each KEY with the result of applying the corresponding UPDATER to
+the value to which that KEY is mapped in JS---or, if no such mapping exists,
+to a value based on the corresponding FAILURE-RESULT as with 'jsobject-ref'.
+The update takes place from left to right, so later UPDATERs will receive the
+values returned by earlier UPDATERs for the same KEY."
+    ((js)
+     js)
+    ((js key failure-result updater)
+     (jsobject-update js key failure-result updater))
+    ((js . args)
+     (cons '@ (match js
+                (('@ . alist)
+                 (let loop ((alist alist)
+                            (args args))
+                   (match args
+                     (()
+                     alist)
+                     ((key failure-result updater . args)
+                      (loop (alist-update alist key failure-result updater)
+                            args))))))))))
+
+(define* (jsobject-union #:key
+                         (combine (lambda (a b) b))
+                         (combine/key (lambda (k a b) (combine a b)))
+                         #:rest json-objects)
+  "Combine the given JSON-OBJECTS into a single json object.  The JSON-OBJECTS
+are merged from left to right by adding each key/value pair of each object to
+the aggregate object in turn.  When one of the JSON-OBJECTS contains a mapping
+from some key KEY to a value VAL such that the aggregate object already
+contains a mapping from KEY to a value VAL0, the aggregate object is
+functionally updated to instead map KEY to the value of (COMBINE/KEY KEY VAL0
+VAL).  The default COMBINE/KEY tail-calls (COMBINE VAL0 VAL), and the default
+COMBINE simply returns its second argument, so, by default, mappings in later
+JSON-OBJECTS supersede those in earlier ones."
+  (match (filter (lambda (v)
+                   (not (or (keyword? v)
+                            (procedure? v))))
+                 json-objects)
+    (()
+     '(@))
+    (((and js0 ('@ . _)))
+     js0)
+    ((('@ . alist0) ('@ . alist*) ...)
+     (cons '@ (fold (lambda (alist1 alist0)
+                      (if (null? alist0)
+                          alist1
+                          (fold (lambda (k+v alist0)
+                                  (match k+v
+                                    ((k . v)
+                                     (define-values (popped tail-alist)
+                                       (alist-pop alist0 k))
+                                     (match popped
+                                       (#f
+                                        (cons k+v tail-alist))
+                                       ((_ . v0)
+                                        (acons k
+                                               (combine/key k v0 v)
+                                               tail-alist))))))
+                                alist0
+                                alist1)))
+                    alist0
+                    alist*)))))
+
+
+;;;
+;;; Phases.
+;;;
+
 (define (set-home . _)
   (with-directory-excursion ".."
     (let loop ((i 0))
@@ -49,7 +281,7 @@  (define (set-home . _)
 (define (module-name module)
   (let* ((package.json (string-append module "/package.json"))
          (package-meta (call-with-input-file package.json read-json)))
-    (assoc-ref package-meta "name")))
+    (jsobject-ref package-meta "name" #f)))
 
 (define (index-modules input-paths)
   (define (list-modules directory)
@@ -73,27 +305,24 @@  (define* (patch-dependencies #:key inputs #:allow-other-keys)
 
   (define index (index-modules (map cdr inputs)))
 
-  (define (resolve-dependencies package-meta meta-key)
-    (fold (lambda (key+value acc)
-            (match key+value
-              ('@ acc)
-              ((key . value) (acons key (hash-ref index key value) acc))))
-          '()
-          (or (assoc-ref package-meta meta-key) '())))
+  (define resolve-dependencies
+    (match-lambda
+      (('@ . alist)
+       (cons '@ (map (match-lambda
+                       ((key . value)
+                        (cons key (hash-ref index key value))))
+                     alist)))))
 
-  (with-atomic-file-replacement "package.json"
-    (lambda (in out)
-      (let ((package-meta (read-json in)))
-        (assoc-set! package-meta "dependencies"
-                    (append
-                     '(@)
-                     (resolve-dependencies package-meta "dependencies")
-                     (resolve-dependencies package-meta "peerDependencies")))
-        (assoc-set! package-meta "devDependencies"
-                    (append
-                     '(@)
-                     (resolve-dependencies package-meta "devDependencies")))
-        (write-json package-meta out))))
+  (with-atomic-json-file-replacement "package.json"
+    (lambda (pkg-meta)
+      (jsobject-update*
+       pkg-meta
+       "devDependencies" '(@) resolve-dependencies
+       "dependencies" '(@) (lambda (deps)
+                             (resolve-dependencies
+                              (jsobject-union
+                               (jsobject-ref pkg-meta "peerDependencies" '(@))
+                               deps))))))
   #t)
 
 (define* (delete-lockfiles #:key inputs #:allow-other-keys)
@@ -114,9 +343,7 @@  (define* (configure #:key outputs inputs #:allow-other-keys)
 
 (define* (build #:key inputs #:allow-other-keys)
   (let ((package-meta (call-with-input-file "package.json" read-json)))
-    (if (and=> (assoc-ref package-meta "scripts")
-               (lambda (scripts)
-                 (assoc-ref scripts "build")))
+    (if (jsobject-ref (jsobject-ref package-meta "scripts" '(@)) "build" #f)
         (let ((npm (string-append (assoc-ref inputs "node") "/bin/npm")))
           (invoke npm "run" "build"))
         (format #t "there is no build script to run~%"))