diff mbox series

[bug#45893,v2,3/3] ui: Add command hint.

Message ID 20210116002634.10401-3-zimon.toutoune@gmail.com
State Accepted
Headers show
Series DRAFT: Hint command line typo | expand

Checks

Context Check Description
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch success View Laminar job
cbaines/issue success View issue

Commit Message

Simon Tournier Jan. 16, 2021, 12:26 a.m. UTC
* guix/ui.scm (run-guix-command): Add command hint.
---
 guix/ui.scm | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Ludovic Courtès Jan. 19, 2021, 5:38 p.m. UTC | #1
zimoun <zimon.toutoune@gmail.com> skribis:

> * guix/ui.scm (run-guix-command): Add command hint.

[...]

> +    (fold (lambda (name res)
> +            (if (string-null? res)
> +                (string-append  "@code{" name "}")
> +                (string-append "@code{" name "}, " res)))
> +          ""
> +          (string-closest (symbol->string guess) command-names)))

Hmm I thought ‘string-closest’ would return a single string, but
actually it returns a list of strings?

You cannot append strings together like this as this can break i18n.
The proper way would be to write:

  "Do you mean one of these: ~a?"

but then, thinking about it, wouldn’t it be more natural to suggest a
single command rather than several?

Also, it seems to me that there’s always at least one hit, is that
correct?  We should make sure that strings above a certain distance are
ignored, in which case there’s no hint to display.

Thanks!

Next up is package names, right?  :-)

Ludo’.
Simon Tournier Jan. 19, 2021, 6:01 p.m. UTC | #2
Hi,

On Tue, 19 Jan 2021 at 18:38, Ludovic Courtès <ludo@gnu.org> wrote:
>
> zimoun <zimon.toutoune@gmail.com> skribis:
>
> > * guix/ui.scm (run-guix-command): Add command hint.
>
> [...]
>
> > +    (fold (lambda (name res)
> > +            (if (string-null? res)
> > +                (string-append  "@code{" name "}")
> > +                (string-append "@code{" name "}, " res)))
> > +          ""
> > +          (string-closest (symbol->string guess) command-names)))
>
> Hmm I thought ‘string-closest’ would return a single string, but
> actually it returns a list of strings?
>
> You cannot append strings together like this as this can break i18n.

Hum?  But it is not (G_ "")...

> The proper way would be to write:
>
>   "Do you mean one of these: ~a?"
>
> but then, thinking about it, wouldn’t it be more natural to suggest a
> single command rather than several?

...but the real question is this: one or several hints.

Yeah, if we assume that it is about typo on the command line and the
option names are different enough, which are both 2 reasonable
assumptions, then it should always return one hint.

Well, it depends if we consider the case where the typo is at the
exact same distance as 2 different option names.

> Also, it seems to me that there’s always at least one hit, is that
> correct?  We should make sure that strings above a certain distance are
> ignored, in which case there’s no hint to display.

The hint reports all the options which are at the same minimum
distance; whatever the minimum is.  For instance "guix show --kikoo"
would hint something.

Instead, and including your remarks, maybe if the distance is greater
than a threshold, then return an error as usual.  Make more sense.


> Next up is package names, right?  :-)

Hehe!  I have tried...  But it is not "doable" in practise... well, I
find it too slow.  The natural improvement is to cut down the
levenhstein-distance by stopping if the score is greater than
threshold.  Well, I have not tried yet. :-)


Thanks for the feedback and the comment.

Cheers,
simon
Simon Tournier Jan. 19, 2021, 11:59 p.m. UTC | #3
Hi Ludo,

> Next up is package names, right?  :-)

As said I have tried to hint typo about packages for “guix show” but it
is too slow.  So, in procrastinating mood, I have tried to investigate a
bit.

Currently, the cache would be read 2 times, once by
’find-packages-by-name’ and then if the returned list is empty, again
with ’fold-available-packages’ to find the closest name.  Read the cache
only once implies a lot of work.

However, the first improvement is to speed up ’string-distance’.  Well,
the naive recursive implementation is well-known to be really slow.

Well, one question is: what is the status of Stream in Guile?  Without
drifting the initial topic, I am interested by  the answer because it
could be useful for “guix git log” (avoid to traverse all the history
tree before displaying but traverse when it is required, somehow).


Cheers,
simon

--8<---------------cut here---------------start------------->8---
scheme@(guix-user)> 
(use-modules (gnu packages)
             (guix utils))

(define (read-the-cache guess)
  (map (lambda (name)
         (identity name))
       (fold-available-packages
        (lambda* (name version result
                       #:key supported? deprecated?
                       #:allow-other-keys)
          (if (and supported? (not deprecated?))
              (cons name result)
              result))
        '())))

(define (compute-distance guess)
  (map (lambda (name)
         (string-distance guess name))
       (fold-available-packages
        (lambda* (name version result
                       #:key supported? deprecated?
                       #:allow-other-keys)
          (if (and supported? (not deprecated?))
              (cons name result)
              result))
        '())))
scheme@(guix-user)> ,time (define foo (read-the-cache "macs-mgit"))
;; 3.492591s real time, 4.523108s run time.  1.530055s spent in GC.
scheme@(guix-user)> ,time (define foo (read-the-cache "macs-mgit"))
;; 0.125346s real time, 0.123948s run time.  0.000000s spent in GC.
scheme@(guix-user)> ,time (define foo (compute-distance "macs-mgit"))
;; 3.813699s real time, 6.051472s run time.  3.256658s spent in GC.
scheme@(guix-user)> ,profile (define foo (compute-distance "macs-mgit"))
%     cumulative   self             
time   seconds     seconds  procedure
 44.68     51.86      1.83  guix/memoization.scm:100:0
 17.55      0.72      0.72  hash-set!
 12.23      0.54      0.50  guix/utils.scm:863:2:mproc
  9.04      0.37      0.37  hash-ref
  4.26     47.60      0.17  guix/utils.scm:863:2
  3.19      0.13      0.13  list?
  2.13      0.09      0.09  string->list
  1.60      0.07      0.07  min
  1.06      0.04      0.04  ice-9/popen.scm:183:0:reap-pipes
  1.06      0.04      0.04  length
  0.53      0.26      0.02  guix/combinators.scm:37:2:fold2
  0.53      0.02      0.02  equal?
  0.53      0.02      0.02  gnu/packages.scm:246:32
  0.53      0.02      0.02  char=?
  0.53      0.02      0.02  pointer->string
  0.53      0.02      0.02  srfi/srfi-1.scm:951:15
  0.00  30583.00      0.00  ice-9/boot-9.scm:220:5:map1
  0.00      4.08      0.00  <current input>:31:9
  0.00      0.15      0.00  <current input>:17:0:compute-distance
  0.00      0.13      0.00  guix/discovery.scm:177:0:fold-module-public-variables
  0.00      0.11      0.00  guix/discovery.scm:184:19
  0.00      0.09      0.00  gnu/packages.scm:224:21
  0.00      0.09      0.00  guix/utils.scm:860:0:string-distance
  0.00      0.04      0.00  guix/packages.scm:933:0:supported-package?
  0.00      0.04      0.00  srfi/srfi-1.scm:734:0:find-tail
  0.00      0.04      0.00  %after-gc-thunk
  0.00      0.04      0.00  anon #x227d190
  0.00      0.02      0.00  ice-9/boot-9.scm:1673:4:with-exception-handler
  0.00      0.02      0.00  guix/discovery.scm:43:0:scheme-files
  0.00      0.02      0.00  gnu/packages.scm:237:0:fold-packages
  0.00      0.02      0.00  srfi/srfi-1.scm:452:2:fold
  0.00      0.02      0.00  guix/discovery.scm:137:8
  0.00      0.02      0.00  guix/build/syscalls.scm:993:4
  0.00      0.02      0.00  guix/discovery.scm:59:14
  0.00      0.02      0.00  guix/discovery.scm:100:0:scheme-modules
  0.00      0.02      0.00  guix/discovery.scm:148:0:all-modules
  0.00      0.02      0.00  guix/build/syscalls.scm:1014:0:scandir*
  0.00      0.02      0.00  srfi/srfi-1.scm:487:0:fold-right
---
Sample count: 188
Total time: 4.084680487 seconds (2.659723098 seconds in GC)
scheme@(guix-user)>
--8<---------------cut here---------------end--------------->8---
Simon Tournier Jan. 20, 2021, 9:49 a.m. UTC | #4
Hi,

On Wed, 20 Jan 2021 at 00:59, zimoun <zimon.toutoune@gmail.com> wrote:

> However, the first improvement is to speed up ’string-distance’.  Well,
> the naive recursive implementation is well-known to be really slow.

For the interested reader, the patch implements the naive recursion
version.  And just to compare, I have quickly compared to the iterative
with full matrix.  See [1].

Roughly speaking, it is a factor of 5x.

--8<---------------cut here---------------start------------->8---
scheme@(guix-user)> ,time (define x (read-the-cache "macs-mgit"))
;; 3.425513s real time, 4.524607s run time.  1.619604s spent in GC.
scheme@(guix-user)> ,time (define x (compute-distance2 "macs-mgit"))
;; 0.870167s real time, 1.056089s run time.  0.271307s spent in GC.
scheme@(guix-user)> ,time (define x (compute-distance "macs-mgit"))
;; 3.637917s real time, 5.601863s run time.  2.847500s spent in GC.
--8<---------------cut here---------------end--------------->8---

The naive recursion version seems fast enough for options and commands
–– because there are few.  The key advantage of recursion implementation
is the readibility, IMHO.  Compare with the iteration with full matrix
below.

I am in favor to merge this naive recursion version for now.  And
postpone improvements.  Because to be competitive for package hints,
instead of plain “edit distance“ which scales poorly, there is 2
directions:

 - implement ’string-distance’ at the C level (in the standard library?)
 - pre-process the package names at package cache build time; with
   suffix tree or n-gram or <name-it> –– in the scope of “guix search”
   improvements.

Both are piece of works and I am not convinced the package name hint is
worth.
 
Cheers,
simon

1: <https://en.wikipedia.org/wiki/Levenshtein_distance#Computing_Levenshtein_distance>


--8<---------------cut here---------------start------------->8---
(define (edit-distance s1 s2)
  (let* ((as (string->list s1))
         (bs (string->list s2))
         (s (list->vector as))
         (t (list->vector bs))
         (m (length as))
         (n (length bs))
         (d (make-typed-array 'u32 0 (1+ m) (1+ n))))

    (do ((i 1 (1+ i))) ((> i m))
      (array-set! d i i 0))

    (do ((j 1 (1+ j))) ((> j n))
      (array-set! d j 0 j))

    (do ((j 1 (1+ j))) ((> j n))
      (do ((i 1 (1+ i))) ((> i m))
        (let* ((c1 (vector-ref s (1- i)))
               (c2 (vector-ref t (1- j)))
               (cost (if (char=? c1 c2)
                         0 1))
               (deletion  (1+ (array-ref d (1- i) j)))
               (insertion (1+ (array-ref d i (1- j))))
               (substitution (+ cost
                                (array-ref d (1- i) (1- j))))
               (v (min deletion
                       insertion
                       substitution)))
          (array-set! d v i j))))
    (array-ref d m n)))
--8<---------------cut here---------------end--------------->8---
Ludovic Courtès Jan. 26, 2021, 8:53 p.m. UTC | #5
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> On Tue, 19 Jan 2021 at 18:38, Ludovic Courtès <ludo@gnu.org> wrote:
>>
>> zimoun <zimon.toutoune@gmail.com> skribis:
>>
>> > * guix/ui.scm (run-guix-command): Add command hint.
>>
>> [...]
>>
>> > +    (fold (lambda (name res)
>> > +            (if (string-null? res)
>> > +                (string-append  "@code{" name "}")
>> > +                (string-append "@code{" name "}, " res)))
>> > +          ""
>> > +          (string-closest (symbol->string guess) command-names)))
>>
>> Hmm I thought ‘string-closest’ would return a single string, but
>> actually it returns a list of strings?
>>
>> You cannot append strings together like this as this can break i18n.
>
> Hum?  But it is not (G_ "")...

Yes, but here you’re building an enumeration like:

  ‘foo’, ‘bar’, ‘baz’

This should be i18n’d, and so it should all be in a single format
string.

> Hehe!  I have tried...  But it is not "doable" in practise... well, I
> find it too slow.  The natural improvement is to cut down the
> levenhstein-distance by stopping if the score is greater than
> threshold.  Well, I have not tried yet. :-)

Oh I see.  Perhaps instead of (or in addition to) ‘string-distance’, you
need something like (string-distance<? a b len) ?

Thanks,
Ludo’.
Ludovic Courtès Jan. 26, 2021, 9 p.m. UTC | #6
zimoun <zimon.toutoune@gmail.com> skribis:

> Well, one question is: what is the status of Stream in Guile?  Without
> drifting the initial topic, I am interested by  the answer because it
> could be useful for “guix git log” (avoid to traverse all the history
> tree before displaying but traverse when it is required, somehow).

As discussed on IRC, there’s (srfi srfi-41).

> (define (read-the-cache guess)
>   (map (lambda (name)
>          (identity name))
>        (fold-available-packages
>         (lambda* (name version result
>                        #:key supported? deprecated?
>                        #:allow-other-keys)
>           (if (and supported? (not deprecated?))
>               (cons name result)
>               result))
>         '())))

Why ‘map’ here?  :-)

> scheme@(guix-user)> ,time (define foo (read-the-cache "macs-mgit"))
> ;; 3.492591s real time, 4.523108s run time.  1.530055s spent in GC.

3.5s?!  I have:

--8<---------------cut here---------------start------------->8---
scheme@(guix-user)> ,use(gnu packages)
scheme@(guix-user)> ,time (define lst (fold-available-packages
        (lambda* (name version result
                       #:key supported? deprecated?
                       #:allow-other-keys)
          (if (and supported? (not deprecated?))
              (cons name result)
              result))
        '()))
;;; <stdin>:2:6: warning: possibly unused local top-level variable `lst'
;; 0.093728s real time, 0.130037s run time.  0.065544s spent in GC.
--8<---------------cut here---------------end--------------->8---

I assume you’re using ‘guix repl’ and the cache is authoritative,
meaning that GUIX_PACKAGE_PATH is unset and there’s no ‘-L’ flag, right?

> scheme@(guix-user)> ,profile (define foo (compute-distance "macs-mgit"))
> %     cumulative   self             
> time   seconds     seconds  procedure
>  44.68     51.86      1.83  guix/memoization.scm:100:0
>  17.55      0.72      0.72  hash-set!
>  12.23      0.54      0.50  guix/utils.scm:863:2:mproc
>   9.04      0.37      0.37  hash-ref

OK, the naive memoizing implementation is inefficient, now we know.  :-)

Thanks,
Ludo’.
Simon Tournier Jan. 26, 2021, 9:27 p.m. UTC | #7
Hi,

Thanks for the review.

On Tue, 26 Jan 2021 at 21:53, Ludovic Courtès <ludo@gnu.org> wrote:

>>> > +    (fold (lambda (name res)
>>> > +            (if (string-null? res)
>>> > +                (string-append  "@code{" name "}")
>>> > +                (string-append "@code{" name "}, " res)))
>>> > +          ""
>>> > +          (string-closest (symbol->string guess) command-names)))
>>>
>>> Hmm I thought ‘string-closest’ would return a single string, but
>>> actually it returns a list of strings?
>>>
>>> You cannot append strings together like this as this can break i18n.
>>
>> Hum?  But it is not (G_ "")...
>
> Yes, but here you’re building an enumeration like:
>
>   ‘foo’, ‘bar’, ‘baz’
>
> This should be i18n’d, and so it should all be in a single format
> string.

Are the options translated?  If yes, then I understand, else I miss.

Anyway, I have removed that since I agree with your practical argument:
hint is for typo.  Type a faulty option name at the same distance as 2
real option names is not a typo. ;-)


Cheers,
simon
Simon Tournier Jan. 26, 2021, 9:44 p.m. UTC | #8
Hi,

On Tue, 26 Jan 2021 at 22:00, Ludovic Courtès <ludo@gnu.org> wrote:

> As discussed on IRC, there’s (srfi srfi-41).

I am playing with it.  Thanks! :-)


>> (define (read-the-cache guess)
>>   (map (lambda (name)
>>          (identity name))
>>        (fold-available-packages
>>         (lambda* (name version result
>>                        #:key supported? deprecated?
>>                        #:allow-other-keys)
>>           (if (and supported? (not deprecated?))
>>               (cons name result)
>>               result))
>>         '())))
>
> Why ‘map’ here?  :-)

Good question!  Burn CPU? :-)


>> scheme@(guix-user)> ,time (define foo (read-the-cache "macs-mgit"))
>> ;; 3.492591s real time, 4.523108s run time.  1.530055s spent in GC.
>
> 3.5s?!  I have:
>
> scheme@(guix-user)> ,use(gnu packages)
> scheme@(guix-user)> ,time (define lst (fold-available-packages
>         (lambda* (name version result
>                        #:key supported? deprecated?
>                        #:allow-other-keys)
>           (if (and supported? (not deprecated?))
>               (cons name result)
>               result))
>         '()))
> ;;; <stdin>:2:6: warning: possibly unused local top-level variable `lst'
> ;; 0.093728s real time, 0.130037s run time.  0.065544s spent in GC.
>
> I assume you’re using ‘guix repl’ and the cache is authoritative,
> meaning that GUIX_PACKAGE_PATH is unset and there’s no ‘-L’ flag,
> right?

Yes.  GUIX_PACKAGE_PATH unset and no ’-L’ flag.  So, it should be worse
otherwise.  

Well, I am surprise by the timing difference.  I do not remember on
which machine I did: maybe an old desktop with spinning disks at work.


>> scheme@(guix-user)> ,profile (define foo (compute-distance "macs-mgit"))
>> %     cumulative   self             
>> time   seconds     seconds  procedure
>>  44.68     51.86      1.83  guix/memoization.scm:100:0
>>  17.55      0.72      0.72  hash-set!
>>  12.23      0.54      0.50  guix/utils.scm:863:2:mproc
>>   9.04      0.37      0.37  hash-ref
>
> OK, the naive memoizing implementation is inefficient, now we know.  :-)

’memoize’ or ’mlambda’?  Or both?  Well, the thread is mess up to I do
not remember which one had been used.

On the other hand, the naive recursive edit distance is well know to be
slow and ineffective.

Cheers,
simon
Ludovic Courtès Jan. 27, 2021, 10:09 p.m. UTC | #9
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

>>> scheme@(guix-user)> ,profile (define foo (compute-distance "macs-mgit"))
>>> %     cumulative   self             
>>> time   seconds     seconds  procedure
>>>  44.68     51.86      1.83  guix/memoization.scm:100:0
>>>  17.55      0.72      0.72  hash-set!
>>>  12.23      0.54      0.50  guix/utils.scm:863:2:mproc
>>>   9.04      0.37      0.37  hash-ref
>>
>> OK, the naive memoizing implementation is inefficient, now we know.  :-)
>
> ’memoize’ or ’mlambda’?  Or both?

Both.

> Well, the thread is mess up to I do not remember which one had been
> used.
>
> On the other hand, the naive recursive edit distance is well know to be
> slow and ineffective.

Yeah.

Ludo’.
diff mbox series

Patch

diff --git a/guix/ui.scm b/guix/ui.scm
index bd504c68da..43c2007594 100644
--- a/guix/ui.scm
+++ b/guix/ui.scm
@@ -2123,6 +2123,20 @@  Run COMMAND with ARGS.\n"))
 (define (run-guix-command command . args)
   "Run COMMAND with the given ARGS.  Report an error when COMMAND is not
 found."
+  (define (command-hint guess commands)
+    (define command-names
+      (map (lambda (command)
+             (match (command-name command)
+               ((head tail ...) head)))
+           commands))
+
+    (fold (lambda (name res)
+            (if (string-null? res)
+                (string-append  "@code{" name "}")
+                (string-append "@code{" name "}, " res)))
+          ""
+          (string-closest (symbol->string guess) command-names)))
+
   (define module
     (catch 'misc-error
       (lambda ()
@@ -2139,6 +2153,8 @@  found."
                 (load file)
                 (resolve-interface `(guix extensions ,command)))))
           (lambda _
+            (display-hint (format #f (G_ "Do you mean ~a?")
+                                  (command-hint command (commands))))
             (format (current-error-port)
                     (G_ "guix: ~a: command not found~%") command)
             (show-guix-usage))))))