Message ID | 20210119212810.20681-1-zimon.toutoune@gmail.com |
---|---|
State | Accepted |
Headers | show |
Series | [bug#45893,v3,1/3] utils: Add string distance. | expand |
Context | Check | Description |
---|---|---|
cbaines/submitting builds | success | |
cbaines/comparison | success | View comparision |
cbaines/git branch | success | View Git branch |
cbaines/applying patch | success | View Laminar job |
cbaines/issue | success | View issue |
zimoun <zimon.toutoune@gmail.com> skribis: > +(define (string-distance s1 s2) > + "Compute the Levenshtein distance between two strings." > + ;; Naive implemenation > + (define loop > + (mlambda (as bt) In general, ‘mlambda’ & co. are nice for prototyping, but for local procedures like this, it’s a sledgehammer. So instead, we should probably manage memoization state explicitly, but that often leads to code that’s much less nice. Ludo’.
And the rest LGTM! So I don’t know, should we try a more efficient-but-still-readable variant right away, or should we first apply these three patches? Thanks! Ludo’.
Hi Ludo, On Tue, 26 Jan 2021 at 22:20, Ludovic Courtès <ludo@gnu.org> wrote: > And the rest LGTM! > > So I don’t know, should we try a more efficient-but-still-readable > variant right away, or should we first apply these three patches? Well, I have implemented [1] the full matrix version, almost copy/paste from Wikipedia [2]. :-) Ugly, isn’t it! Let merge and improve if required, IMHO. As Arun mentioned in the «improving “guix search”» thread, maybe it is worth to give a look at the Guile string library. However, there is a missing point not discussed and important: it only works for ’parse-command-line’ and not ’args-fold*’. The main reason is: I have not found how to raise the hint for these both functions without code duplication. If there is no technical blocking point, I would like to replace (with care and double-check) all the ’args-fold*’ by ’parse-command-line’. An unified CLI entry-point. Well, extend what is done for “guix show” and “guix search” for all the commands. In one commit. WDYT? 1: <http://issues.guix.gnu.org/issue/45893#16> 2: <https://en.wikipedia.org/wiki/Levenshtein_distance#Computing_Levenshtein_distance> Cheers, simon
Hi! zimoun <zimon.toutoune@gmail.com> skribis: > Well, I have implemented [1] the full matrix version, almost copy/paste > from Wikipedia [2]. :-) Ugly, isn’t it! Yup! :-) > Let merge and improve if required, IMHO. As Arun mentioned in the > «improving “guix search”» thread, maybe it is worth to give a look at > the Guile string library. I went ahead and applied the three patches. I took the liberty to make two changes: 1. Changed “Do you mean” to “Did you mean”; 2. Display hints after errors, as is done elsewhere. It’s really pleasant! (I thought: when one types “guix clone”, should we suggest “git clone”? :-)) Thanks! Ludo’.
Hi Ludo, On Wed, 3 Feb 2021 at 12:29, Ludovic Courtès <ludo@gnu.org> wrote: > > Let merge and improve if required, IMHO. As Arun mentioned in the > > «improving “guix search”» thread, maybe it is worth to give a look at > > the Guile string library. > > I went ahead and applied the three patches. I took the liberty to make > two changes: Thanks! > It’s really pleasant! Cool! However, it works for commands using 'parse-command-line' and not 'args-fold*'. For example, one patch of the series replace for "guix show" and "guix search". I would like to replace all the args-fold* by parse-command-line, I think it makes sense. WDYT? > (I thought: when one types “guix clone”, should we suggest “git clone”? :-)) Hehe! Maybe an extension could invoke Git under the hood when the command is not found. ;-) Cheers, simon
Hi, On Thu, 4 Feb 2021 at 22:39, zimoun <zimon.toutoune@gmail.com> wrote: > …but not work for all the options: > > --8<---------------cut here---------------start------------->8--- > $ guix system vm --no-grafts ~/src/guix/guix/doc/os-config-bare-bones.texi --substitute-rls=https://ci.guix.gnu.org --dr-run --derivation > guix system: error: substitute-rls=https://ci.guix.gnu.org: unrecognized option > --8<---------------cut here---------------end--------------->8--- Fixed by patch#46308. <http://issues.guix.gnu.org/issue/46308> All the best, simon
diff --git a/guix/utils.scm b/guix/utils.scm index f8b05e7e80..dc2259ef8c 100644 --- a/guix/utils.scm +++ b/guix/utils.scm @@ -8,6 +8,7 @@ ;;; Copyright © 2017 Mathieu Othacehe <m.othacehe@gmail.com> ;;; Copyright © 2018, 2020 Marius Bakke <marius@gnu.org> ;;; Copyright © 2020 Efraim Flashner <efraim@flashner.co.il> +;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com> ;;; ;;; This file is part of GNU Guix. ;;; @@ -37,6 +38,7 @@ #:use-module (guix memoization) #:use-module ((guix build utils) #:select (dump-port mkdir-p delete-file-recursively)) #:use-module ((guix build syscalls) #:select (mkdtemp! fdatasync)) + #:use-module ((guix combinators) #:select (fold2)) #:use-module (guix diagnostics) ;<location>, &error-location, etc. #:use-module (ice-9 format) #:use-module (ice-9 regex) @@ -114,7 +116,10 @@ call-with-decompressed-port compressed-output-port call-with-compressed-output-port - canonical-newline-port)) + canonical-newline-port + + string-distance + string-closest)) ;;; @@ -847,6 +852,46 @@ be determined." ;; raising an error would upset Geiser users #f)))))) + +;;; +;;; String comparison. +;;; + +(define (string-distance s1 s2) + "Compute the Levenshtein distance between two strings." + ;; Naive implemenation + (define loop + (mlambda (as bt) + (match as + (() (length bt)) + ((a s ...) + (match bt + (() (length as)) + ((b t ...) + (if (char=? a b) + (loop s t) + (1+ (min + (loop as t) + (loop s bt) + (loop s t)))))))))) + + (let ((c1 (string->list s1)) + (c2 (string->list s2))) + (loop c1 c2))) + +(define* (string-closest trial tests #:key (threshold 3)) + "Return the string from TESTS that is the closest from the TRIAL, +according to 'string-distance'. If the TESTS are too far from TRIAL, +according to THRESHOLD, then #f is returned." + (identity ;discard second return value + (fold2 (lambda (test closest minimal) + (let ((dist (string-distance trial test))) + (if (and (< dist minimal) (< dist threshold)) + (values test dist) + (values closest minimal)))) + #f +inf.0 + tests))) + ;;; Local Variables: ;;; eval: (put 'call-with-progress-reporter 'scheme-indent-function 1) ;;; End: diff --git a/tests/utils.scm b/tests/utils.scm index 9bce446d98..40eaf65bbc 100644 --- a/tests/utils.scm +++ b/tests/utils.scm @@ -2,6 +2,7 @@ ;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org> ;;; Copyright © 2014 Eric Bavier <bavier@member.fsf.org> ;;; Copyright © 2016 Mathieu Lirzin <mthl@gnu.org> +;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com> ;;; ;;; This file is part of GNU Guix. ;;; @@ -265,6 +266,23 @@ skip these tests." string-reverse) (call-with-input-file temp-file get-string-all))) +(test-equal "string-distance" + '(0 1 1 5 5) + (list + (string-distance "hello" "hello") + (string-distance "hello" "helo") + (string-distance "helo" "hello") + (string-distance "" "hello") + (string-distance "hello" ""))) + +(test-equal "string-closest" + '("hello" "hello" "helo" #f) + (list + (string-closest "hello" '("hello")) + (string-closest "hello" '("helo" "hello" "halo")) + (string-closest "hello" '("kikoo" "helo" "hihihi" "halo")) + (string-closest "hello" '("aaaaa" "12345" "hellohello" "h")))) + (test-end) (false-if-exception (delete-file temp-file))