diff mbox series

[bug#58136] ui: Improve sort order when searching package names.

Message ID Y0aj2GRdkZG6cFhs@zpidnb93
State New
Headers show
Series [bug#58136] ui: Improve sort order when searching package names. | expand

Checks

Context Check Description
cbaines/comparison success View comparision
cbaines/git-branch success View Git branch
cbaines/applying patch success
cbaines/issue success View issue
cbaines/comparison success View comparision
cbaines/git-branch success View Git branch
cbaines/applying patch success View Laminar job
cbaines/issue success View issue

Commit Message

Lars-Dominik Braun Oct. 12, 2022, 11:24 a.m. UTC
Hi simon,

> In addition to your proposal which LGTM, maybe we could also use the
> ’upstream-name’ properties.  Most of the time, the Guix name matches the
> upstream name, but sometimes not.  Although, it would not fix the issue
> for ggplot2 since there is no upstream-name for this package. :-)
I agree that using the upstream-name would be a good idea.

>  2. set the “namespace” weight to 1 (or 2 if you prefer)
> 
>     Otherwise, for example, generic name as CSV could artificially bump
>     the relevance and hide relevant packages.  For instance, compare
> 
>        guix search csv
The issue here is we don’t know what the user is searching for. If we
add more weight to the package name then usually libraries (rust-csv,
ghc-csv, …) win. Imo a search for “csv” should return tools to
manipulate CSV files like csvkit, csvdiff, xlsx2csv, … Just like
“json” should yield tools like jq, json.sh and possibly others which
I cannot find right now. But maybe I’m searching for a C library that
parses CSV instead. And then what…?

As for ggplot2, the particular issue seems to be that scores are added
for each match and the description for some of our packages contains
“ggplot2” alot. So I tried using MAX instead of +, which works,
but results in little variation of scores and thus weird sort order
(descending by name). It does not feel like an improvement either.

Cheers,
Lars

Comments

Ludovic Courtès Oct. 17, 2022, 7:46 a.m. UTC | #1
Hi!

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

> diff --git a/guix/packages.scm b/guix/packages.scm
> index 94e464cd01..9934501cdb 100644
> --- a/guix/packages.scm
> +++ b/guix/packages.scm
> @@ -86,6 +86,7 @@ (define-module (guix packages)
>              this-package
>              package-name
>              package-upstream-name
> +            package-upstream-name*
>              package-version
>              package-full-name
>              package-source
> @@ -657,6 +658,38 @@ (define (package-upstream-name package)
>    (or (assq-ref (package-properties package) 'upstream-name)
>        (package-name package)))
>  
> +(define (package-upstream-name* package)
> +  "Return the upstream name of PACKAGE, which could be different from the name
> +it has in Guix."

s/which could.*Guix/accounting for commonly-used package name prefixes
in addition to the @code{upstream-name} property/

Preferably make this addition in a separate commit.

> +++ b/guix/ui.scm
> @@ -1623,10 +1623,23 @@ (define (relevance obj regexps metrics)
>    (define (score regexp str)
>      (fold-matches regexp str 0
>                    (lambda (m score)
> -                    (+ score
> -                       (if (string=? (match:substring m) str)
> -                           5             ;exact match
> -                           1)))))
> +                    (let* ((start (- (match:start m) 1))
> +                           (end (match:end m))
> +                           (left (if (>= start 0) (string-ref str start) #f))
> +                           (right (if (< end (string-length str)) (string-ref str end) #f))
> +                           (delimiter-classes '(Cc Cf Pd Pe Pf Pi Po Ps Sk Zs Zl Zp))
> +                           (delim-left (or (member (and=> left char-general-category) delimiter-classes) (eq? left #f)))
> +                           (delim-right (or (member (and=> right char-general-category) delimiter-classes) (eq? right #f))))
> +                      (max score
> +                        (cond
> +                          ;; regexp is a full match for str.
> +                          ((and (eq? left #f) (eq? right #f)) 4)
> +                          ;; regexp matches a single word in str.
> +                          ((and delim-left delim-right) 3)
> +                          ;; regexp matches the beginning or end of a word in str.
> +                          ((or delim-left delim-right) 2)
> +                          ;; Everything else.
> +                          (#t 1)))))))

The intent is to have all regexps behave as if the user passed \<STR\>,
is that right?  Would be nice to have a comment clarifying that above
and perhaps making it a separate change?

Stylistic notes:

  (if cond consequent #f)  =>  (and cond consequent)
  (eq? x #f)               =>  (not x)
  (cond … (#t x))          =>  (cond … (else x))

> @@ -1635,10 +1648,11 @@ (define (regexp->score regexp)
>                  ((field . weight)
>                   (match (field obj)
>                     (#f  relevance)
> +                   ('() relevance)
>                     ((? string? str)
> -                    (+ relevance (* (score-regexp str) weight)))
> +                    (max relevance (* (score-regexp str) weight)))
>                     ((lst ...)
> -                    (+ relevance (* weight (apply + (map score-regexp lst)))))))))
> +                    (max relevance (* weight (apply max (map score-regexp lst)))))))))

Intuitively I would expect scores to add up, otherwise we’re kinda
losing information; so I would not make this change.  WDYT?

There’s a test for ‘package-relevance’ in tests/ui.scm.  Please make
sure it still passes and ideally add relevant tests such as the CSV
example you gave.

Thanks!

Ludo’.
Simon Tournier Oct. 17, 2022, 8:19 a.m. UTC | #2
Hi Lars, Ludo,

In short, I miss why the initial patch with a minor tweak is not enough
for covering the corner cases. :-)


On lun., 17 oct. 2022 at 09:46, Ludovic Courtès <ludo@gnu.org> wrote:
>> +++ b/guix/ui.scm
>> @@ -1623,10 +1623,23 @@ (define (relevance obj regexps metrics)
>>    (define (score regexp str)
>>      (fold-matches regexp str 0
>>                    (lambda (m score)
>> -                    (+ score
>> -                       (if (string=? (match:substring m) str)
>> -                           5             ;exact match
>> -                           1)))))
>> +                    (let* ((start (- (match:start m) 1))
>> +                           (end (match:end m))
>> +                           (left (if (>= start 0) (string-ref str start) #f))
>> +                           (right (if (< end (string-length str)) (string-ref str end) #f))
>> +                           (delimiter-classes '(Cc Cf Pd Pe Pf Pi Po Ps Sk Zs Zl Zp))
>> +                           (delim-left (or (member (and=> left char-general-category) delimiter-classes) (eq? left #f)))
>> +                           (delim-right (or (member (and=> right char-general-category) delimiter-classes) (eq? right #f))))
>> +                      (max score
>> +                        (cond
>> +                          ;; regexp is a full match for str.
>> +                          ((and (eq? left #f) (eq? right #f)) 4)
>> +                          ;; regexp matches a single word in str.
>> +                          ((and delim-left delim-right) 3)
>> +                          ;; regexp matches the beginning or end of a word in str.
>> +                          ((or delim-left delim-right) 2)
>> +                          ;; Everything else.
>> +                          (#t 1)))))))
>
> The intent is to have all regexps behave as if the user passed \<STR\>,
> is that right?  Would be nice to have a comment clarifying that above
> and perhaps making it a separate change?

All this appears to me overcomplicated.  Personally, I have to read it
many times to get the logic; while the initial patch was much clearer,
IMHO.

Other said, I am not convinced the complexity is worth the corner case.


The initial patch with the minor tweak I am proposing (maybe using
package-upstream-name*) appears to me enough for covering the corner
cases initially reported (as ggplot2).


>> @@ -1635,10 +1648,11 @@ (define (regexp->score regexp)
>>                  ((field . weight)
>>                   (match (field obj)
>>                     (#f  relevance)
>> +                   ('() relevance)
>>                     ((? string? str)
>> -                    (+ relevance (* (score-regexp str) weight)))
>> +                    (max relevance (* (score-regexp str) weight)))
>>                     ((lst ...)
>> -                    (+ relevance (* weight (apply + (map score-regexp lst)))))))))
>> +                    (max relevance (* weight (apply max (map score-regexp lst)))))))))
>
> Intuitively I would expect scores to add up, otherwise we’re kinda
> losing information; so I would not make this change.  WDYT?

I agree with Ludo that ’max’ is counterintuitive.


Well, could we list some examples (keyword and expectation)?  Because
the initial patch with a minor tweak LGTM and covers ggplot2, csv, and
some others.


Cheers,
simon
diff mbox series

Patch

diff --git a/guix/packages.scm b/guix/packages.scm
index 94e464cd01..9934501cdb 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -86,6 +86,7 @@  (define-module (guix packages)
             this-package
             package-name
             package-upstream-name
+            package-upstream-name*
             package-version
             package-full-name
             package-source
@@ -657,6 +658,38 @@  (define (package-upstream-name package)
   (or (assq-ref (package-properties package) 'upstream-name)
       (package-name package)))
 
+(define (package-upstream-name* package)
+  "Return the upstream name of PACKAGE, which could be different from the name
+it has in Guix."
+  (let ((namespaces (list "cl-"
+                          "ecl-"
+                          "emacs-"
+                          "ghc-"
+                          "go-"
+                          "guile-"
+                          "java-"
+                          "julia-"
+                          "lua-"
+                          "minetest-"
+                          "node-"
+                          "ocaml-"
+                          "perl-"
+                          "python-"
+                          "r-"
+                          "ruby-"
+                          "rust-"
+                          "sbcl-"
+                          "texlive-"))
+        (name (package-name package)))
+    (or (assq-ref (package-properties package) 'upstream-name)
+        (let loop ((prefixes namespaces))
+          (match prefixes
+            ('() name)
+            ((prefix rest ...)
+              (if (string-prefix? prefix name)
+                (substring name (string-length prefix))
+                (loop (cdr prefixes)))))))))
+
 (define (hidden-package p)
   "Return a \"hidden\" version of P--i.e., one that 'fold-packages' and thus,
 user interfaces, ignores."
diff --git a/guix/ui.scm b/guix/ui.scm
index dad2b853ac..da16a50f9f 100644
--- a/guix/ui.scm
+++ b/guix/ui.scm
@@ -1623,10 +1623,23 @@  (define (relevance obj regexps metrics)
   (define (score regexp str)
     (fold-matches regexp str 0
                   (lambda (m score)
-                    (+ score
-                       (if (string=? (match:substring m) str)
-                           5             ;exact match
-                           1)))))
+                    (let* ((start (- (match:start m) 1))
+                           (end (match:end m))
+                           (left (if (>= start 0) (string-ref str start) #f))
+                           (right (if (< end (string-length str)) (string-ref str end) #f))
+                           (delimiter-classes '(Cc Cf Pd Pe Pf Pi Po Ps Sk Zs Zl Zp))
+                           (delim-left (or (member (and=> left char-general-category) delimiter-classes) (eq? left #f)))
+                           (delim-right (or (member (and=> right char-general-category) delimiter-classes) (eq? right #f))))
+                      (max score
+                        (cond
+                          ;; regexp is a full match for str.
+                          ((and (eq? left #f) (eq? right #f)) 4)
+                          ;; regexp matches a single word in str.
+                          ((and delim-left delim-right) 3)
+                          ;; regexp matches the beginning or end of a word in str.
+                          ((or delim-left delim-right) 2)
+                          ;; Everything else.
+                          (#t 1)))))))
 
   (define (regexp->score regexp)
     (let ((score-regexp (lambda (str) (score regexp str))))
@@ -1635,10 +1648,11 @@  (define (regexp->score regexp)
                 ((field . weight)
                  (match (field obj)
                    (#f  relevance)
+                   ('() relevance)
                    ((? string? str)
-                    (+ relevance (* (score-regexp str) weight)))
+                    (max relevance (* (score-regexp str) weight)))
                    ((lst ...)
-                    (+ relevance (* weight (apply + (map score-regexp lst)))))))))
+                    (max relevance (* weight (apply max (map score-regexp lst)))))))))
             0 metrics)))
 
   (let loop ((regexps regexps)
@@ -1655,7 +1669,8 @@  (define (regexp->score regexp)
 (define %package-metrics
   ;; Metrics used to compute the "relevance score" of a package against a set
   ;; of regexps.
-  `((,package-name . 4)
+  `((,package-name . 5)
+    (,package-upstream-name* . 1)
 
     ;; Match against uncommon outputs.
     (,(lambda (package)