diff mbox series

[bug#42146,1/3] build: Relocate <regexp*> record and associated procedures here.

Message ID ff182177822369c7d31698ecbf0cb5dcbca37644.1697747385.git.maxim.cournoyer@gmail.com
State New
Headers show
Series [bug#42146,1/3] build: Relocate <regexp*> record and associated procedures here. | expand

Commit Message

Maxim Cournoyer Oct. 19, 2023, 8:33 p.m. UTC
* etc/teams.scm.in (<regexp*>, make-regexp*, regexp*-exec): Move to...
* guix/build/utils.scm: ... here.
(list-matches*): New procedure.

Change-Id: I566ac372f7d8ba08de94e19b54dcc68da2106a23
---
 etc/teams.scm.in     | 19 +------------------
 guix/build/utils.scm | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 18 deletions(-)


base-commit: d59653b7c9e43ebdbba20e2ca071429507f94c67

Comments

Ludovic Courtès Oct. 19, 2023, 8:40 p.m. UTC | #1
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> * etc/teams.scm.in (<regexp*>, make-regexp*, regexp*-exec): Move to...
> * guix/build/utils.scm: ... here.
> (list-matches*): New procedure.


[...]

> +++ b/guix/build/utils.scm

[...]

> +;;;
> +;;; Extend regexp objects with a pattern field.
> +;;;
> +(define-record-type <regexp*>
> +  (%make-regexp* pat flag rx)
> +  regexp*?
> +  (pat regexp*-pattern)                 ;the regexp pattern, a string
> +  (flag regexp*-flag)                   ;regexp flags
> +  (rx regexp*-rx))                      ;the compiled regexp object
> +
> +;;; Work around regexp implementation.
> +;;; This record allows to track the regexp pattern and then display it.
> +(define* (make-regexp* pat #:optional (flag regexp/extended))

I’m skeptical about the concrete benefits.  I would not include it in
(guix build utils), or at least not in this patch series.

(I tend to be super conservative about (guix build utils) because we
rarely get a chance to change it.)

Ludo’.
Maxim Cournoyer Oct. 19, 2023, 11:54 p.m. UTC | #2
Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> * etc/teams.scm.in (<regexp*>, make-regexp*, regexp*-exec): Move to...
>> * guix/build/utils.scm: ... here.
>> (list-matches*): New procedure.
>
>
> [...]
>
>> +++ b/guix/build/utils.scm
>
> [...]
>
>> +;;;
>> +;;; Extend regexp objects with a pattern field.
>> +;;;
>> +(define-record-type <regexp*>
>> +  (%make-regexp* pat flag rx)
>> +  regexp*?
>> +  (pat regexp*-pattern)                 ;the regexp pattern, a string
>> +  (flag regexp*-flag)                   ;regexp flags
>> +  (rx regexp*-rx))                      ;the compiled regexp object
>> +
>> +;;; Work around regexp implementation.
>> +;;; This record allows to track the regexp pattern and then display it.
>> +(define* (make-regexp* pat #:optional (flag regexp/extended))
>
> I’m skeptical about the concrete benefits.  I would not include it in
> (guix build utils), or at least not in this patch series.
>
> (I tend to be super conservative about (guix build utils) because we
> rarely get a chance to change it.)

The original users are substitute* and the teams.scm script.  Since
substitute* is from (guix build utils), it makes sense to add it there
as well, since they are coupled.

The benefit is concrete: it makes it possible to show which regexp
pattern failed to match (its textual representation), instead of
something much less useful such as a generic placeholder such as
"<unknown> (regexp)".

It's in actual use if you look at the definition of substitute:

--8<---------------cut here---------------start------------->8---

  (let ((rx+proc (map (match-lambda
                        (((or (? regexp? pattern) (? regexp*? pattern)) . proc)
                         (cons pattern proc))
                        ((pattern . proc)
                         (cons (make-regexp* pattern regexp/extended) proc)))
                      pattern+procs)))
--8<---------------cut here---------------end--------------->8---

The previous version followed a different approach, annotating the
rx+proc list with the raw pattern; I think the approach here is a bit
cleaner, and it should also enable users to pass pre-computed regexp*
objects to substitute* and have useful error messages produced.
Simon Tournier Oct. 20, 2023, 3:11 p.m. UTC | #3
Hi Ludo,

On Thu, 19 Oct 2023 at 22:40, Ludovic Courtès <ludo@gnu.org> wrote:

>> +;;;
>> +;;; Extend regexp objects with a pattern field.
>> +;;;
>> +(define-record-type <regexp*>
>> +  (%make-regexp* pat flag rx)
>> +  regexp*?
>> +  (pat regexp*-pattern)                 ;the regexp pattern, a string
>> +  (flag regexp*-flag)                   ;regexp flags
>> +  (rx regexp*-rx))                      ;the compiled regexp object
>> +
>> +;;; Work around regexp implementation.
>> +;;; This record allows to track the regexp pattern and then display it.
>> +(define* (make-regexp* pat #:optional (flag regexp/extended))
>
> I’m skeptical about the concrete benefits.  I would not include it in
> (guix build utils), or at least not in this patch series.
>
> (I tend to be super conservative about (guix build utils) because we
> rarely get a chance to change it.)

If I remember correctly, the record was introduced in #58660 [1].
Basically, if you have,

              (make-regexp "^gnu/packages/python(-.+|)\\.scm$")

then you only have access to some #<regexp 7f6315fb3500>.  Other said,
you lost the human-readable "^gnu/packages/python(-.+|)\\.scm$" regexp
pattern.  The workaround just stores this human-readable regexp pattern.
Later, it is thus possible to display it; for debugging or else.

For the location of such feature, I do not have an opinion.  For the
concrete benefits, I have one. :-)

Well, maybe the feature – keep an access to the human-readable regexp
pattern – could be implemented on Guile-side.

Or maybe there is another simpler way that I am not aware of?

Cheers,
simon


1: https://issues.guix.gnu.org/58660
Maxim Cournoyer Oct. 20, 2023, 3:39 p.m. UTC | #4
Hi,

Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi Ludo,
>
> On Thu, 19 Oct 2023 at 22:40, Ludovic Courtès <ludo@gnu.org> wrote:
>
>>> +;;;
>>> +;;; Extend regexp objects with a pattern field.
>>> +;;;
>>> +(define-record-type <regexp*>
>>> +  (%make-regexp* pat flag rx)
>>> +  regexp*?
>>> +  (pat regexp*-pattern)                 ;the regexp pattern, a string
>>> +  (flag regexp*-flag)                   ;regexp flags
>>> +  (rx regexp*-rx))                      ;the compiled regexp object
>>> +
>>> +;;; Work around regexp implementation.
>>> +;;; This record allows to track the regexp pattern and then display it.
>>> +(define* (make-regexp* pat #:optional (flag regexp/extended))
>>
>> I’m skeptical about the concrete benefits.  I would not include it in
>> (guix build utils), or at least not in this patch series.
>>
>> (I tend to be super conservative about (guix build utils) because we
>> rarely get a chance to change it.)
>
> If I remember correctly, the record was introduced in #58660 [1].
> Basically, if you have,
>
>               (make-regexp "^gnu/packages/python(-.+|)\\.scm$")
>
> then you only have access to some #<regexp 7f6315fb3500>.  Other said,
> you lost the human-readable "^gnu/packages/python(-.+|)\\.scm$" regexp
> pattern.  The workaround just stores this human-readable regexp pattern.
> Later, it is thus possible to display it; for debugging or else.
>
> For the location of such feature, I do not have an opinion.  For the
> concrete benefits, I have one. :-)
>
> Well, maybe the feature – keep an access to the human-readable regexp
> pattern – could be implemented on Guile-side.

Agreed, for the long haul (as with many cool bits that are Guix specific
currently).
Ludovic Courtès Oct. 23, 2023, 8:42 a.m. UTC | #5
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>>> +;;; Extend regexp objects with a pattern field.
>>> +;;;
>>> +(define-record-type <regexp*>
>>> +  (%make-regexp* pat flag rx)
>>> +  regexp*?
>>> +  (pat regexp*-pattern)                 ;the regexp pattern, a string
>>> +  (flag regexp*-flag)                   ;regexp flags
>>> +  (rx regexp*-rx))                      ;the compiled regexp object
>>> +
>>> +;;; Work around regexp implementation.
>>> +;;; This record allows to track the regexp pattern and then display it.
>>> +(define* (make-regexp* pat #:optional (flag regexp/extended))
>>
>> I’m skeptical about the concrete benefits.  I would not include it in
>> (guix build utils), or at least not in this patch series.

[...]

> The benefit is concrete: it makes it possible to show which regexp
> pattern failed to match (its textual representation), instead of
> something much less useful such as a generic placeholder such as
> "<unknown> (regexp)".

Yes, okay.

Can we keep the <regexp*> interface private, then?  To me it’s an
implementation detail and perhaps a bit of a hack that I’d rather not
commit to maintaining.

The cost might be to duplicate it in teams.scm, but that’s an acceptable
cost IMO.

> It's in actual use if you look at the definition of substitute:
>
>
>   (let ((rx+proc (map (match-lambda
>                         (((or (? regexp? pattern) (? regexp*? pattern)) . proc)
>                          (cons pattern proc))
>                         ((pattern . proc)
>                          (cons (make-regexp* pattern regexp/extended) proc)))
>                       pattern+procs)))
>
> The previous version followed a different approach, annotating the
> rx+proc list with the raw pattern; I think the approach here is a bit
> cleaner, and it should also enable users to pass pre-computed regexp*
> objects to substitute* and have useful error messages produced.

Note that ‘substitute*’ only takes string literals as patterns, not
regexp objects.  The pattern part of clauses has sometimes been abused
to pass code, typically ‘string-append’ calls, but that’s definitely not
the spirit.

Another approach would have been, in ‘substitute*’, to capture the
regexp-as-string as well as other useful debugging info, such as its
source location.

Thanks,
Ludo’.
Maxim Cournoyer Oct. 23, 2023, 3:05 p.m. UTC | #6
Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Maxim,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>>>> +;;; Extend regexp objects with a pattern field.
>>>> +;;;
>>>> +(define-record-type <regexp*>
>>>> +  (%make-regexp* pat flag rx)
>>>> +  regexp*?
>>>> +  (pat regexp*-pattern)                 ;the regexp pattern, a string
>>>> +  (flag regexp*-flag)                   ;regexp flags
>>>> +  (rx regexp*-rx))                      ;the compiled regexp object
>>>> +
>>>> +;;; Work around regexp implementation.
>>>> +;;; This record allows to track the regexp pattern and then display it.
>>>> +(define* (make-regexp* pat #:optional (flag regexp/extended))
>>>
>>> I’m skeptical about the concrete benefits.  I would not include it in
>>> (guix build utils), or at least not in this patch series.
>
> [...]
>
>> The benefit is concrete: it makes it possible to show which regexp
>> pattern failed to match (its textual representation), instead of
>> something much less useful such as a generic placeholder such as
>> "<unknown> (regexp)".
>
> Yes, okay.
>
> Can we keep the <regexp*> interface private, then?  To me it’s an
> implementation detail and perhaps a bit of a hack that I’d rather not
> commit to maintaining.

If you can think of a more elegant way to define it, I'm all ears.
Ideally, as Simon pointed out, that's something that Guile would support
natively (the ability to pull back the raw pattern from a regexp
object).  In Python for example, the re.Pattern object has a 'pattern'
property [0]

[0]  https://docs.python.org/3/library/re.html?highlight=regex#re.Pattern.pattern

Otherwise I don't think "the bit of a hack" is substantiated enough :-).

> The cost might be to duplicate it in teams.scm, but that’s an acceptable
> cost IMO.

I dislike copying bits around just to "hide" them from the "official"
interface, and have no fear maintaining that simple piece of code, so
I'd rather keep it at one place and expose it as a public API.

>> It's in actual use if you look at the definition of substitute:
>>
>>
>>   (let ((rx+proc (map (match-lambda
>>                         (((or (? regexp? pattern) (? regexp*? pattern)) . proc)
>>                          (cons pattern proc))
>>                         ((pattern . proc)
>>                          (cons (make-regexp* pattern regexp/extended) proc)))
>>                       pattern+procs)))
>>
>> The previous version followed a different approach, annotating the
>> rx+proc list with the raw pattern; I think the approach here is a bit
>> cleaner, and it should also enable users to pass pre-computed regexp*
>> objects to substitute* and have useful error messages produced.
>
> Note that ‘substitute*’ only takes string literals as patterns, not
> regexp objects.  The pattern part of clauses has sometimes been abused
> to pass code, typically ‘string-append’ calls, but that’s definitely not
> the spirit.
>

It's true that 'substitute*' only accepts literal patterns, but that's
not the case with 'substitute' itself, which is also exposed in the API
of (guix build utils): it accepts regexp objects as well, and thus
there's value in this change -- users could in theory use make-regexp*
objects and pass that to 'substitute' and benefit from more precises
error messages.

> Another approach would have been, in ‘substitute*’, to capture the
> regexp-as-string as well as other useful debugging info, such as its
> source location.

That'd be a nice solution, but it doesn't take into account
'substitute', as mentioned above.  As it has exactly zero (!) users
outside of substitute*, perhaps it could be yanked from the public API
and then we could consider doing the above for substitute* ?
diff mbox series

Patch

diff --git a/etc/teams.scm.in b/etc/teams.scm.in
index 55242caad1..8af25b9802 100644
--- a/etc/teams.scm.in
+++ b/etc/teams.scm.in
@@ -34,29 +34,12 @@ 
              (srfi srfi-9)
              (srfi srfi-26)
              (ice-9 format)
-             (ice-9 regex)
              (ice-9 match)
              (ice-9 rdelim)
+             (guix build utils)
              (guix ui)
              (git))
 
-(define-record-type <regexp*>
-  (%make-regexp* pat flag rx)
-  regexp*?
-  (pat regexp*-pattern)
-  (flag regexp*-flag)
-  (rx regexp*-rx))
-
-;;; Work around regexp implementation.
-;;; This record allows to track the regexp pattern and then display it.
-(define* (make-regexp* pat #:optional (flag regexp/extended))
-  "Alternative to `make-regexp' producing annotated <regexp*> objects."
-  (%make-regexp* pat flag (make-regexp pat flag)))
-
-(define (regexp*-exec rx* str)
-  "Execute the RX* regexp, a <regexp*> object."
-  (regexp-exec (regexp*-rx rx*) str))
-
 (define-record-type <team>
   (make-team id name description members scope)
   team?
diff --git a/guix/build/utils.scm b/guix/build/utils.scm
index 8e630ad586..2b3a8e278b 100644
--- a/guix/build/utils.scm
+++ b/guix/build/utils.scm
@@ -9,6 +9,7 @@ 
 ;;; Copyright © 2020, 2021 Maxim Cournoyer <maxim.cournoyer@gmail.com>
 ;;; Copyright © 2021, 2022 Maxime Devos <maximedevos@telenet.be>
 ;;; Copyright © 2021 Brendan Tildesley <mail@brendan.scot>
+;;; Copyright © 2022 Simon Tournier <zimon.toutoune@gmail.com>
 ;;; Copyright © 2023 Carlo Zancanaro <carlo@zancanaro.id.au>
 ;;;
 ;;; This file is part of GNU Guix.
@@ -28,6 +29,7 @@ 
 
 (define-module (guix build utils)
   #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-9)
   #:use-module (srfi srfi-11)
   #:use-module (srfi srfi-26)
   #:use-module (srfi srfi-34)
@@ -55,6 +57,14 @@  (define-module (guix build utils)
             package-name->name+version
             parallel-job-count
 
+            make-regexp*
+            regexp*-exec
+            regexp*?
+            regexp*-pattern
+            regexp*-flag
+            regexp*-rx
+            list-matches*
+
             compressor
             tarball?
             %xz-parallel-args
@@ -163,6 +173,35 @@  (define-syntax-rule (define-constant name val)
    (module-replace! (current-module) '(setvbuf)))
   (else #f))
 
+
+;;;
+;;; Extend regexp objects with a pattern field.
+;;;
+(define-record-type <regexp*>
+  (%make-regexp* pat flag rx)
+  regexp*?
+  (pat regexp*-pattern)                 ;the regexp pattern, a string
+  (flag regexp*-flag)                   ;regexp flags
+  (rx regexp*-rx))                      ;the compiled regexp object
+
+;;; Work around regexp implementation.
+;;; This record allows to track the regexp pattern and then display it.
+(define* (make-regexp* pat #:optional (flag regexp/extended))
+  "Alternative to `make-regexp' producing annotated <regexp*> objects."
+  (%make-regexp* pat flag (make-regexp pat flag)))
+
+(define (regexp*-exec rx* str)
+  "Execute the RX* regexp, a <regexp*> object."
+  (regexp-exec (regexp*-rx rx*) str))
+
+(define* (list-matches* regexp str #:optional (flags regexp/extended))
+  "Like 'list-matches', but also accepting a regexp* as REGEXP."
+  (match regexp
+    ((or (? string?) (? regexp?))
+     (list-matches regexp str flags))
+    ((? regexp*?)
+     (list-matches (regexp*-rx regexp) str flags))))
+
 
 ;;;
 ;;; Compression helpers.