diff mbox series

[bug#45893,v3,1/3] utils: Add string distance.

Message ID 20210119212810.20681-1-zimon.toutoune@gmail.com
State Accepted
Headers show
Series [bug#45893,v3,1/3] utils: Add string distance. | expand

Checks

Context Check Description
cbaines/submitting builds success
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch success View Laminar job
cbaines/issue success View issue

Commit Message

Simon Tournier Jan. 19, 2021, 9:28 p.m. UTC
* guix/utils.scm (string-distance): New procedure.
(string-closest): New procedure.
* tests/utils.scm: Test it.
---
 guix/utils.scm  | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
 tests/utils.scm | 18 ++++++++++++++++++
 2 files changed, 64 insertions(+), 1 deletion(-)


base-commit: 884f320e7ceb35cb8472510e47fc5f1944675d82
prerequisite-patch-id: 07abf72be0f4db9fbc19cb719d87bc1c69e8479d

Comments

Ludovic Courtès Jan. 26, 2021, 9:18 p.m. UTC | #1
zimoun <zimon.toutoune@gmail.com> skribis:

> +(define (string-distance s1 s2)
> +  "Compute the Levenshtein distance between two strings."
> +  ;; Naive implemenation
> +  (define loop
> +    (mlambda (as bt)

In general, ‘mlambda’ & co. are nice for prototyping, but for local
procedures like this, it’s a sledgehammer.  So instead, we should
probably manage memoization state explicitly, but that often leads to
code that’s much less nice.

Ludo’.
Ludovic Courtès Jan. 26, 2021, 9:20 p.m. UTC | #2
And the rest LGTM!

So I don’t know, should we try a more efficient-but-still-readable
variant right away, or should we first apply these three patches?

Thanks!

Ludo’.
Simon Tournier Jan. 26, 2021, 10:05 p.m. UTC | #3
Hi Ludo,

On Tue, 26 Jan 2021 at 22:20, Ludovic Courtès <ludo@gnu.org> wrote:
> And the rest LGTM!
>
> So I don’t know, should we try a more efficient-but-still-readable
> variant right away, or should we first apply these three patches?

Well, I have implemented [1] the full matrix version, almost copy/paste
from Wikipedia [2]. :-) Ugly, isn’t it!

Let merge and improve if required, IMHO.  As Arun mentioned in the
«improving “guix search”» thread, maybe it is worth to give a look at
the Guile string library.

However, there is a missing point not discussed and important: it only
works for ’parse-command-line’ and not ’args-fold*’.  The main reason
is: I have not found how to raise the hint for these both functions
without code duplication.

If there is no technical blocking point, I would like to replace (with
care and double-check) all the ’args-fold*’ by ’parse-command-line’.  An
unified CLI entry-point.

Well, extend what is done for “guix show” and “guix search” for all the
commands.   In one commit.

WDYT?

1: <http://issues.guix.gnu.org/issue/45893#16>
2:
<https://en.wikipedia.org/wiki/Levenshtein_distance#Computing_Levenshtein_distance>


Cheers,
simon
Ludovic Courtès Feb. 3, 2021, 11:28 a.m. UTC | #4
Hi!

zimoun <zimon.toutoune@gmail.com> skribis:

> Well, I have implemented [1] the full matrix version, almost copy/paste
> from Wikipedia [2]. :-) Ugly, isn’t it!

Yup! :-)

> Let merge and improve if required, IMHO.  As Arun mentioned in the
> «improving “guix search”» thread, maybe it is worth to give a look at
> the Guile string library.

I went ahead and applied the three patches.  I took the liberty to make
two changes:

  1. Changed “Do you mean” to “Did you mean”;

  2. Display hints after errors, as is done elsewhere.

It’s really pleasant!

(I thought: when one types “guix clone”, should we suggest “git clone”? :-))

Thanks!

Ludo’.
Simon Tournier Feb. 3, 2021, 12:15 p.m. UTC | #5
Hi Ludo,

On Wed, 3 Feb 2021 at 12:29, Ludovic Courtès <ludo@gnu.org> wrote:

> > Let merge and improve if required, IMHO.  As Arun mentioned in the
> > «improving “guix search”» thread, maybe it is worth to give a look at
> > the Guile string library.
>
> I went ahead and applied the three patches.  I took the liberty to make
> two changes:

Thanks!

> It’s really pleasant!

Cool!  However, it works for commands using 'parse-command-line' and
not 'args-fold*'.  For example, one patch of the series replace for
"guix show" and "guix search".  I would like to replace all the
args-fold* by parse-command-line, I think it makes sense.  WDYT?


> (I thought: when one types “guix clone”, should we suggest “git clone”? :-))

Hehe!  Maybe an extension could invoke Git under the hood when the
command is not found. ;-)

Cheers,
simon
Simon Tournier Feb. 4, 2021, 11:08 p.m. UTC | #6
Hi,

On Thu, 4 Feb 2021 at 22:39, zimoun <zimon.toutoune@gmail.com> wrote:

> …but not work for all the options:
>
> --8<---------------cut here---------------start------------->8---
> $ guix system vm --no-grafts ~/src/guix/guix/doc/os-config-bare-bones.texi --substitute-rls=https://ci.guix.gnu.org --dr-run --derivation
> guix system: error: substitute-rls=https://ci.guix.gnu.org: unrecognized option
> --8<---------------cut here---------------end--------------->8---

Fixed by patch#46308.  <http://issues.guix.gnu.org/issue/46308>

All the best,
simon
diff mbox series

Patch

diff --git a/guix/utils.scm b/guix/utils.scm
index f8b05e7e80..dc2259ef8c 100644
--- a/guix/utils.scm
+++ b/guix/utils.scm
@@ -8,6 +8,7 @@ 
 ;;; Copyright © 2017 Mathieu Othacehe <m.othacehe@gmail.com>
 ;;; Copyright © 2018, 2020 Marius Bakke <marius@gnu.org>
 ;;; Copyright © 2020 Efraim Flashner <efraim@flashner.co.il>
+;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -37,6 +38,7 @@ 
   #:use-module (guix memoization)
   #:use-module ((guix build utils) #:select (dump-port mkdir-p delete-file-recursively))
   #:use-module ((guix build syscalls) #:select (mkdtemp! fdatasync))
+  #:use-module ((guix combinators) #:select (fold2))
   #:use-module (guix diagnostics)           ;<location>, &error-location, etc.
   #:use-module (ice-9 format)
   #:use-module (ice-9 regex)
@@ -114,7 +116,10 @@ 
             call-with-decompressed-port
             compressed-output-port
             call-with-compressed-output-port
-            canonical-newline-port))
+            canonical-newline-port
+
+            string-distance
+            string-closest))
 
 
 ;;;
@@ -847,6 +852,46 @@  be determined."
           ;; raising an error would upset Geiser users
           #f))))))
 
+
+;;;
+;;; String comparison.
+;;;
+
+(define (string-distance s1 s2)
+  "Compute the Levenshtein distance between two strings."
+  ;; Naive implemenation
+  (define loop
+    (mlambda (as bt)
+      (match as
+        (() (length bt))
+        ((a s ...)
+         (match bt
+           (() (length as))
+           ((b t ...)
+            (if (char=? a b)
+                (loop s t)
+                (1+ (min
+                     (loop as t)
+                     (loop s bt)
+                     (loop s t))))))))))
+
+  (let ((c1 (string->list s1))
+        (c2 (string->list s2)))
+    (loop c1 c2)))
+
+(define* (string-closest trial tests #:key (threshold 3))
+  "Return the string from TESTS that is the closest from the TRIAL,
+according to 'string-distance'.  If the TESTS are too far from TRIAL,
+according to THRESHOLD, then #f is returned."
+  (identity                              ;discard second return value
+    (fold2 (lambda (test closest minimal)
+             (let ((dist (string-distance trial test)))
+               (if (and  (< dist minimal) (< dist threshold))
+                   (values test dist)
+                   (values closest minimal))))
+           #f +inf.0
+           tests)))
+
 ;;; Local Variables:
 ;;; eval: (put 'call-with-progress-reporter 'scheme-indent-function 1)
 ;;; End:
diff --git a/tests/utils.scm b/tests/utils.scm
index 9bce446d98..40eaf65bbc 100644
--- a/tests/utils.scm
+++ b/tests/utils.scm
@@ -2,6 +2,7 @@ 
 ;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2014 Eric Bavier <bavier@member.fsf.org>
 ;;; Copyright © 2016 Mathieu Lirzin <mthl@gnu.org>
+;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -265,6 +266,23 @@  skip these tests."
                      string-reverse)
     (call-with-input-file temp-file get-string-all)))
 
+(test-equal "string-distance"
+  '(0 1 1 5 5)
+  (list
+   (string-distance "hello" "hello")
+   (string-distance "hello" "helo")
+   (string-distance "helo" "hello")
+   (string-distance "" "hello")
+   (string-distance "hello" "")))
+
+(test-equal "string-closest"
+  '("hello" "hello" "helo" #f)
+  (list
+   (string-closest "hello" '("hello"))
+   (string-closest "hello" '("helo" "hello" "halo"))
+   (string-closest "hello" '("kikoo" "helo" "hihihi" "halo"))
+   (string-closest "hello" '("aaaaa" "12345" "hellohello" "h"))))
+
 (test-end)
 
 (false-if-exception (delete-file temp-file))