Message ID: cover.1644147246.git.public@yoctocell.xyz
Series: Add Repology updater
Xinglu Chen schreef op zo 06-02-2022 om 12:50 [+0100]:

> Because of the way ‘%updaters’ in (guix upstream) works, the Repology
> updater is the first or second updater that is used (since it
> technically works on every package), but because of the limitations I
> mentioned above, the result might not always be the best.  The
> Repology updater is mostly useful for things that don’t already have
> an updater, e.g., ‘maven-dependency-tree’.  Would it make sense to
> hard-code the ‘%updaters’ variable and put the Repology updater last
> in the list?

I would prefer not to hardcode %updaters and keep the current discovery
mechanism, such that people can experiment with updaters outside a git
checkout of Guix and in channels.  FWIW, it would be useful to have the
same mechanism for importers.

However, it might be a good idea to do some _postprocessing_ on the
discovered list of updaters, e.g., they could be sorted by 'genericity'
with 'stable-sort' (*):

  (define (genericity x)
    (cond ((it is "generic-SOMETHING") 1)
          ((it is repology) 2)
          (#true 0)))

  (define (less x y)
    (<= (genericity x) (genericity y)))

(*) stable-sort and not sort, to preserve alphabetical ordering for
updaters with the same genericity.

Greetings,
Maxime.
Maxime schrieb am Sonntag der 06. Februar 2022 um 13:41 +01:

> Xinglu Chen schreef op zo 06-02-2022 om 12:50 [+0100]:
>
>> Because of the way ‘%updaters’ in (guix upstream) works, the Repology
>> updater is the first or second updater that is used (since it
>> technically works on every package), but because of the limitations I
>> mentioned above, the result might not always be the best.  The
>> Repology updater is mostly useful for things that don’t already have
>> an updater, e.g., ‘maven-dependency-tree’.  Would it make sense to
>> hard-code the ‘%updaters’ variable and put the Repology updater last
>> in the list?
>
> I would prefer not to hardcode %updaters and keep the current discovery
> mechanism, such that people can experiment with updaters outside a git
> checkout of Guix and in channels.

Good point.

> FWIW, it would be useful to have the same mechanism for importers.
>
> However, it might be a good idea to do some _postprocessing_ on the
> discovered list of updaters, e.g., they could be sorted by 'genericity'
> with 'stable-sort' (*):
>
>   (define (genericity x)
>     (cond ((it is "generic-SOMETHING") 1)
>           ((it is repology) 2)
>           (#true 0)))
>
>   (define (less x y)
>     (<= (genericity x) (genericity y)))
>
> (*) stable-sort and not sort, to preserve alphabetical ordering for
> updaters with the same genericity.

That looks like a good idea.
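For concreteness, here is a minimal, runnable sketch of the
postprocessing Maxime describes.  It operates on updater names as plain
strings for illustration (in Guix they would come from the discovered
<upstream-updater> records), and the genericity heuristic itself is
hypothetical:

--8<---------------cut here---------------start------------->8---
(define (genericity name)
  ;; Hypothetical heuristic: 0 = package-specific, 1 = generic-*,
  ;; 2 = Repology.
  (cond ((string-prefix? "generic-" name) 1)
        ((string=? "repology" name) 2)
        (else 0)))

;; A strict ‘<’ keeps the stability guarantee of ‘stable-sort’
;; meaningful: updaters with equal genericity keep their original
;; (alphabetical) order.
(define (less x y)
  (< (genericity x) (genericity y)))

(stable-sort '("cran" "generic-git" "elpa" "repology" "generic-html" "gnu")
             less)
;; ⇒ ("cran" "elpa" "gnu" "generic-git" "generic-html" "repology")
--8<---------------cut here---------------end--------------->8---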
Hi!

Xinglu Chen <public@yoctocell.xyz> skribis:

> This patchset adds a new updater, which scans Repology[1] for updates.
> It should technically support all packages in Guix!  :-)

I wouldn’t want to spoil the party, but I’m only mildly enthusiastic.

Repology implements the same functionality as our updaters, so
repology.org is effectively “service as a software substitute” (SaaSS).

My preference would be to keep our existing updaters rather than
effectively ditch them and delegate the work to Repology.  It’s tempting
to think we can have both, but I’m not sure this would last long.

WDYT?

Ludo’.
Hi,

Ludovic schrieb am Dienstag der 08. Februar 2022 um 23:59 +01:

> Hi!
>
> Xinglu Chen <public@yoctocell.xyz> skribis:
>
>> This patchset adds a new updater, which scans Repology[1] for updates.
>> It should technically support all packages in Guix!  :-)
>
> I wouldn’t want to spoil the party, but I’m only mildly enthusiastic.
>
> Repology implements the same functionality as our updaters, so
> repology.org is effectively “service as a software substitute”
> (SaaSS).

Right, but it tracks a lot more repositories than our updaters do, so
why not take advantage of that?

> My preference would be to keep our existing updaters rather than
> effectively ditch them and delegate the work to Repology.  It’s tempting
> to think we can have both, but I’m not sure this would last long.

The point of the Repology updater is to act as a fallback if none of
the other updaters can update a package, e.g., ‘maven-dependency-tree’.
As I already mentioned, language-specific updaters usually provide more
accurate and detailed information, so they should be used when possible;
we aren’t losing anything here.
Hello,

Xinglu Chen <public@yoctocell.xyz> writes:

> The point of the Repology updater is to act as a fallback if none of
> the other updaters can update a package, e.g., ‘maven-dependency-tree’.
> As I already mentioned, language-specific updaters usually provide more
> accurate and detailed information, so they should be used when possible;
> we aren’t losing anything here.

One issue is that such an updater will introduce frequent false
positives.  It is common for Repology to get the latest release wrong,
because some distribution is doing fancy versioning, or because
different distributions disagree about what upstream is.

I don't think we can rely on Repology's "newest" status.  The updater
may need to provide its own version-comparison tool, because Repology's
tool and Guix versioning do not play nicely together, in particular when
using ‘git-version’.

Regards,
Nicolas schrieb am Mittwoch der 09. Februar 2022 um 15:29 +01:

> Hello,
>
> Xinglu Chen <public@yoctocell.xyz> writes:
>
>> The point of the Repology updater is to act as a fallback if none of
>> the other updaters can update a package, e.g., ‘maven-dependency-tree’.
>> As I already mentioned, language-specific updaters usually provide more
>> accurate and detailed information, so they should be used when possible;
>> we aren’t losing anything here.
>
> One issue is that such an updater will introduce frequent false
> positives.  It is common for Repology to get the latest release wrong,
> because some distribution is doing fancy versioning, or because
> different distributions disagree about what upstream is.

Yeah, I have noticed that it sometimes thinks that a version like
“20080323” is newer than something like “0.1.2-0.a1b2b3d”, even though
that might not necessarily be true.  This seems to be the case for a lot
of Common Lisp packages, which usually don’t have any proper releases.

> I don't think we can rely on Repology's "newest" status.  The updater
> may need to provide its own version-comparison tool, because Repology's
> tool and Guix versioning do not play nicely together, in particular when
> using ‘git-version’.

In my testing, the “newest” status does a pretty good job (besides the
problem I mentioned above).

Some other “bad” updates I found[*] are listed below (excluding Common
Lisp packages).

--8<---------------cut here---------------start------------->8---
guile-ac-d-bus would be upgraded from 1.0.0-beta.0 to 1.0.0-beta0
sic would be upgraded from 1.2 to 1.2+20210506_058547e
tla2tools would be upgraded from 1.7.1-0.6932e19 to 20140313
quickjs would be upgraded from 2021-03-27 to 2021.03.27
stow would be upgraded from 2.3.1 to 2.3.1+5.32
cube would be upgraded from 4.3.5 to 2005.08.29
python-ratelimiter would be upgraded from 1.2.0 to 1.2.0.post0
gr-osmosdr would be upgraded from 0.2.3-0.a100eb0 to 0.2.3.20210128
countdown would be upgraded from 1.0.0 to 20150606
http-parser would be upgraded from 2.9.4-1.ec8b5ee to 2.9.4.20201223
xlsx2csv would be upgraded from 0.7.4 to 20200427211949
keynav would be upgraded from 0.20110708.0 to 20150730+4ae486d
--8<---------------cut here---------------end--------------->8---

It seems like most of these could be solved by checking whether the
version scheme changed from semver to calver.  I think that’s a pretty
good result considering how many packages we have.

[*] Until I ran into <https://issues.guix.gnu.org/53923>
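For illustration, here is a rough sketch of the semver-versus-calver
check suggested above.  It is only a heuristic over version strings, not
part of the proposed updater, and the regular expressions are
assumptions about what counts as a calendar-style version:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 regex))

(define (calver-like? version)
  ;; Date-like versions: "20150606", "2021.03.27", "2005.08.29", …
  (and (string-match "^(19|20)[0-9]{2}([.-]?[0-9]{2}){2}" version) #t))

(define (semver-like? version)
  ;; "X.Y" or "X.Y.Z" with a small leading component.
  (and (string-match "^[0-9]{1,3}(\\.[0-9]+){1,2}" version) #t))

(define (suspicious-update? current proposed)
  ;; Flag updates that switch from a semver-style version to a
  ;; calendar-style one, as in most of the false positives above.
  (and (semver-like? current)
       (not (calver-like? current))
       (calver-like? proposed)))

(suspicious-update? "1.0.0" "20150606")        ;⇒ #t  (countdown)
(suspicious-update? "0.7.4" "20200427211949")  ;⇒ #t  (xlsx2csv)
(suspicious-update? "4.3.5" "2005.08.29")      ;⇒ #t  (cube)
(suspicious-update? "1.21.1" "1.21.2")         ;⇒ #f
--8<---------------cut here---------------end--------------->8---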
Hello,

Xinglu Chen <public@yoctocell.xyz> writes:

> Yeah, I have noticed that it sometimes thinks that a version like
> “20080323” is newer than something like “0.1.2-0.a1b2b3d”, even though
> that might not necessarily be true.  This seems to be the case for a lot
> of Common Lisp packages, which usually don’t have any proper releases.

[...]

> In my testing, the “newest” status does a pretty good job (besides the
> problem I mentioned above).
>
> Some other “bad” updates I found[*] are listed below (excluding Common
> Lisp packages).
>
> --8<---------------cut here---------------start------------->8---
> guile-ac-d-bus would be upgraded from 1.0.0-beta.0 to 1.0.0-beta0
> sic would be upgraded from 1.2 to 1.2+20210506_058547e
> tla2tools would be upgraded from 1.7.1-0.6932e19 to 20140313
> quickjs would be upgraded from 2021-03-27 to 2021.03.27
> stow would be upgraded from 2.3.1 to 2.3.1+5.32
> cube would be upgraded from 4.3.5 to 2005.08.29
> python-ratelimiter would be upgraded from 1.2.0 to 1.2.0.post0
> gr-osmosdr would be upgraded from 0.2.3-0.a100eb0 to 0.2.3.20210128
> countdown would be upgraded from 1.0.0 to 20150606
> http-parser would be upgraded from 2.9.4-1.ec8b5ee to 2.9.4.20201223
> xlsx2csv would be upgraded from 0.7.4 to 20200427211949
> keynav would be upgraded from 0.20110708.0 to 20150730+4ae486d
> --8<---------------cut here---------------end--------------->8---
>
> It seems like most of these could be solved by checking whether the
> version scheme changed from semver to calver.  I think that’s a pretty
> good result considering how many packages we have.

I think this would not cut it.  As I wrote, almost any package using
‘git-version’ is going to create a version mismatch.  This is because we
consider (git-version "X.Y" revision commit) to be greater than "X.Y",
whereas Repology either ignores the version or considers it to be a
pre-release before "X.Y".  See, e.g., the "emacs:circe" project, or
"joycond".  This is, I think, the most prominent category of comparison
failures.

Also, there are versions which are plain wrong, e.g., "emacs:csv-mode",
and disqualify the correct and up-to-date version.  There are also
version disagreements in, e.g., "colobot", or upstream disagreements,
e.g., "emacs:scala-mode".  See also "emacs:geiser-racket", "python:folium",
or "higan" for other projects with versioning issues.

Regards,
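To make the ‘git-version’ mismatch concrete, here is a small sketch
using ‘git-version’ from (guix git-download) and ‘version>?’ from
(guix utils); it assumes a Guix checkout or installation on the load
path and only illustrates the ordering Nicolas describes:

--8<---------------cut here---------------start------------->8---
(use-modules (guix git-download)   ;git-version
             (guix utils))         ;version>?

;; A snapshot version as Guix writes it, e.g. "1.2-0.c46f4bb".
(define snapshot
  (git-version "1.2" "0" "c46f4bb0123456789abcdef0123456789abcdef0"))

;; In Guix's ordering the snapshot counts as newer than the 1.2 release,
;; so a comparator that treats it as a pre-release (or ignores it)
;; reports a spurious “upgrade” back to 1.2.
(version>? snapshot "1.2")
--8<---------------cut here---------------end--------------->8---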
Hi,

Xinglu Chen <public@yoctocell.xyz> skribis:

> Ludovic schrieb am Dienstag der 08. Februar 2022 um 23:59 +01:

[...]

>> Repology implements the same functionality as our updaters, so
>> repology.org is effectively “service as a software substitute”
>> (SaaSS).
>
> Right, but it tracks a lot more repositories than our updaters do, so
> why not take advantage of that?

True, but this is kind of self-reinforcing: it will surely keep tracking
more if we stop maintaining our own code.  (IIRC, Repology was started
after ‘guix refresh’, and I believe it’s maintained by one person.)

>> My preference would be to keep our existing updaters rather than
>> effectively ditch them and delegate the work to Repology.  It’s tempting
>> to think we can have both, but I’m not sure this would last long.
>
> The point of the Repology updater is to act as a fallback if none of
> the other updaters can update a package, e.g., ‘maven-dependency-tree’.
> As I already mentioned, language-specific updaters usually provide more
> accurate and detailed information, so they should be used when possible;
> we aren’t losing anything here.

Hmm, yes, could be.  OTOH, like Nicolas writes, we would probably need
some filtering or post-processing to reduce false positives, right?

Do you have examples where our updaters perform poorly and where
Repology does a better job?  I wonder if there are lessons to be drawn
and bugs to be fixed.

Thanks,
Ludo’.
Hello,

Ludovic Courtès <ludo@gnu.org> writes:

> Do you have examples where our updaters perform poorly and where
> Repology does a better job?  I wonder if there are lessons to be drawn
> and bugs to be fixed.

As a data point, I'm sorry to say that our updaters are useless to me.

I watch over more than one thousand packages.  I would have a hard time
expressing which packages those are to the updater, other than by
writing and keeping up to date a huge manifest file.  Assuming I could
manage this, fetching all the version information would take
considerable time, and, since many packages are from GitHub, the party
would stop early anyway, with GitHub refusing to proceed and requesting
some token I don't have.

OTOH, using the Repology API, I get the information I want in about ten
seconds.  Sure, I need to eyeball the results and filter out false
positives (around 4% in my case), but it is still a practical solution.

IMO, to be useful, updaters may need to rely on an external service,
which may or may not belong to the Guix ecosystem.  They also need
a good UI.

I don't want to sound too negative, though.  The current updaters are
certainly good enough when watching over a couple of packages, which
might be the most common use case.

Cheers,
Nicolas Goaziou schreef op ma 14-02-2022 om 11:40 [+0100]:

> [...]  Assuming I could manage this, fetching all the version
> information would take considerable time, and, since many packages are
> from GitHub, the party would stop early anyway, with GitHub refusing to
> proceed and requesting some token I don't have.
>
> OTOH, using the Repology API, I get the information I want in about ten
> seconds.  Sure, I need to eyeball the results and filter out false
> positives (around 4% in my case), but it is still a practical solution.
>
> IMO, to be useful, updaters may need to rely on an external service,
> which may or may not belong to the Guix ecosystem.  They also need
> a good UI.

To avoid exceeding API limits and to reduce network traffic, I suggest
the following change: cache HTTP responses, using http-fetch/cached
instead of http-fetch.

When something is in the cache and not expired, this avoids some network
traffic and does not bring us closer to the API limits.  When it is
expired (but in the cache), http-fetch/cached at least makes a
conditional request with If-Modified-Since, which GitHub does not count
against the rate limit, assuming a ‘304 Not Modified’ response!

That does not address all your concerns, but it should help, I think.

Greetings,
Maxime.
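A minimal sketch of what Maxime suggests, assuming the updater fetches
per-project JSON from the Repology API and parses it with guile-json's
‘json->scm’; ‘http-fetch/cached’ comes from (guix http-client), but the
‘#:ttl’ keyword (and the exact cache behaviour) is an assumption here:

--8<---------------cut here---------------start------------->8---
(use-modules (guix http-client)   ;http-fetch/cached
             (web uri)
             (json))

(define (repology-project name)
  ;; Fetch, and cache, the Repology metadata for NAME; once the cached
  ;; entry is a day old, re-validate it with a conditional request.
  (json->scm
   (http-fetch/cached
    (string->uri (string-append "https://repology.org/api/v1/project/" name))
    #:ttl (* 24 60 60))))
--8<---------------cut here---------------end--------------->8---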
Hi Nicolas,

Nicolas Goaziou <mail@nicolasgoaziou.fr> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Do you have examples where our updaters perform poorly and where
>> Repology does a better job?  I wonder if there are lessons to be drawn
>> and bugs to be fixed.
>
> As a data point, I'm sorry to say that our updaters are useless to me.
>
> I watch over more than one thousand packages.  I would have a hard time
> expressing which packages those are to the updater, other than by
> writing and keeping up to date a huge manifest file.  Assuming I could
> manage this, fetching all the version information would take
> considerable time, and, since many packages are from GitHub, the party
> would stop early anyway, with GitHub refusing to proceed and requesting
> some token I don't have.
>
> OTOH, using the Repology API, I get the information I want in about ten
> seconds.  Sure, I need to eyeball the results and filter out false
> positives (around 4% in my case), but it is still a practical solution.

(I’m confused because my understanding of what you first wrote was that
Repology had too many false positives to be useful.)

You wrote about your feelings and that’s insightful, but can we focus on
specific examples where updaters are not helpful so we can better
understand and improve the situation?

> IMO, to be useful, updaters may need to rely on an external service,
> which may or may not belong to the Guix ecosystem.

All the updaters rely on an external service.  Relying on a centralized
SaaSS is different, though.

> They also need a good UI.

Do you have examples of what’s wrong on the UI side?

To me, the main shortcoming is that ‘guix refresh’ doesn’t tell you that
if you update X, you may also need to update Y and Z.  That info is not
always available, but it is available in repos such as PyPI and ELPA.

Thanks,
Ludo’.
Hello,

Ludovic Courtès <ludo@gnu.org> writes:

> (I’m confused because my understanding of what you first wrote was that
> Repology had too many false positives to be useful.)

Repology is okay for my use case because I've gotten accustomed to its
quirks.  I wouldn't recommend it as a fallback solution for Guix in its
current form, though, for the reason above.  Does that make sense?

> You wrote about your feelings and that’s insightful, but can we focus on
> specific examples where updaters are not helpful so we can better
> understand and improve the situation?

I wrote about the following facts:

- it is difficult to specify a large number of packages,
- when you have specified a large number of packages, the processing is
  slow,
- checking GitHub fails for me.

I don't see any feelings in there.

>> IMO, to be useful, updaters may need to rely on an external service,
>> which may or may not belong to the Guix ecosystem.
>
> All the updaters rely on an external service.  Relying on a centralized
> SaaSS is different, though.

Fair enough.  I meant an external centralized service.

> Do you have examples of what’s wrong on the UI side?

It has no Emacs interface.  Nuff said.  ;)

Again, I don't know how to specify many packages efficiently, e.g., all
Emacs packages, or all games.  Also, reading through a massive output in
the terminal is not very user-friendly, IMO.

> To me, the main shortcoming is that ‘guix refresh’ doesn’t tell you that
> if you update X, you may also need to update Y and Z.  That info is not
> always available, but it is available in repos such as PyPI and ELPA.

I don't think solving this is realistic.  Dependencies are sometimes
very loose.

Regards,
Hi!

Nicolas Goaziou <mail@nicolasgoaziou.fr> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> (I’m confused because my understanding of what you first wrote was that
>> Repology had too many false positives to be useful.)
>
> Repology is okay for my use case because I've gotten accustomed to its
> quirks.  I wouldn't recommend it as a fallback solution for Guix in its
> current form, though, for the reason above.  Does that make sense?

It sure does, thanks for explaining.

> I wrote about the following facts:
>
> - it is difficult to specify a large number of packages,
> - when you have specified a large number of packages, the processing is
>   slow,
> - checking GitHub fails for me.

Alright, I had missed that.

Regarding “specifying many packages”, do examples like these work for
you:

  • guix refresh -t elpa
  • guix refresh $(guix package -A ^emacs- | cut -f1)
  • guix refresh -r emacs-emms
  • guix refresh -s non-core -t generic-git
  • guix refresh -m packages-i-care-about.scm

If not, what kind of selection mechanism could help?  ‘-s’ currently
accepts only two values, but we could augment it.

Regarding slow processing, it very much depends on the updater.  For
example, on a warm cache, ‘guix refresh -t gnu’ is relatively fast
thanks to caching:

--8<---------------cut here---------------start------------->8---
$ time guix refresh -t gnu
gnu/packages/wget.scm:48:13: wget would be upgraded from 1.21.1 to 1.21.2
gnu/packages/tls.scm:86:13: libtasn1 would be upgraded from 4.17.0 to 4.18.0
[...]

real    0m38.314s
user    0m38.981s
sys     0m0.164s
--8<---------------cut here---------------end--------------->8---

It could be that some updaters do many HTTP round trips without any
caching, which slows things down.

[...]

>> Do you have examples of what’s wrong on the UI side?
>
> It has no Emacs interface.  Nuff said.  ;)

True!  :-)

I realize this is going off-topic, but let’s see if we can improve the
existing infrastructure to make it more convenient.

Thanks,
Ludo’.
Hello,

Ludovic Courtès <ludo@gnu.org> writes:

> Regarding “specifying many packages”, do examples like these work for
> you:
>
>   • guix refresh -t elpa

I don't find it very useful in practice.  As a user, the packages I'm
interested in probably rely on more than one updater.  I'm not even
supposed to know which updater relates to a given package.

I actually only use this when I know a GNU ELPA package is already
outdated, and I want it to compute the hash for me:

  ./pre-inst-env guix refresh -t elpa -u emacs-foo

>   • guix refresh $(guix package -A ^emacs- | cut -f1)

This one is interesting.  It illustrates that the UI is, from my point
of view, a bit lacking.  It would be a nice improvement to add a
built-in regexp mechanism, like in "guix search".

In any case, this fails after reporting the status of around 50
packages, with this timing:

  real    0m41,881s
  user    0m12,155s
  sys     0m0,726s

Assuming I don't get the "rate limit exceeded" error, at this rate, it
would take more than 15 minutes to check all the packages in
"emacs-xyz.scm".  This is a bit long.

I don't see how this could reasonably be made faster without relying on
an external centralized service doing the checks regularly (e.g., once
a day) before the user actually requests them.

>   • guix refresh -r emacs-emms

It also fails with "rate limit exceeded".  While this sounds
theoretically nice, I wouldn't know how to make use of it yet.

>   • guix refresh -s non-core -t generic-git

See above about "-t elpa".

>   • guix refresh -m packages-i-care-about.scm

Yes, obviously, this is nice, too.  However, it doesn't scale if you
need to specify 1000+ packages.

> If not, what kind of selection mechanism could help?  ‘-s’ currently
> accepts only two values, but we could augment it.

Besides regexp matching, it may be useful to filter packages per module
or source file name.  Package categories would be a bit awkward, though,
and probably not satisfying.

> I realize this is going off-topic, but let’s see if we can improve the
> existing infrastructure to make it more convenient.

Is it really off-topic?

Anyway, all of this is only one data point, and, as a reminder,
I certainly don't want to disparage either Xinglu Chen's work or the
current "guix refresh" functionality.

HTH,
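For the “all Emacs packages” case, a manifest along these lines could be
passed to ‘guix refresh -m’; it mirrors the fold-packages pattern shown
later in the thread for R packages, filtering on the package name prefix
instead (the prefix-based criterion is just an illustration):

--8<---------------cut here---------------start------------->8---
(use-modules (guix packages)
             (guix profiles)
             (gnu packages))

(packages->manifest
 (fold-packages (lambda (package result)
                  ;; Keep every package whose name starts with "emacs-".
                  (if (string-prefix? "emacs-" (package-name package))
                      (cons package result)
                      result))
                '()))
--8<---------------cut here---------------end--------------->8---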
Hi,

Nicolas Goaziou <mail@nicolasgoaziou.fr> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Regarding “specifying many packages”, do examples like these work for
>> you:
>>
>>   • guix refresh -t elpa
>
> I don't find it very useful in practice.  As a user, the packages I'm
> interested in probably rely on more than one updater.  I'm not even
> supposed to know which updater relates to a given package.

Right, that’s more for packagers than for users.  As a user, what works
better is:

  guix refresh -r $(guix package -I | cut -f1) -s non-core

… or simply ‘--with-latest’, if I’m not interested in updating package
definitions.

>>   • guix refresh $(guix package -A ^emacs- | cut -f1)
>
> This one is interesting.  It illustrates that the UI is, from my point
> of view, a bit lacking.  It would be a nice improvement to add a
> built-in regexp mechanism, like in "guix search".

Makes sense, we can do that.

> In any case, this fails after reporting the status of around 50
> packages, with this timing:
>
>   real    0m41,881s
>   user    0m12,155s
>   sys     0m0,726s

How does it fail?  If it’s the GitHub rate limit, then there’s only one
answer: you have to provide a token.

> Assuming I don't get the "rate limit exceeded" error, at this rate, it
> would take more than 15 minutes to check all the packages in
> "emacs-xyz.scm".  This is a bit long.
>
> I don't see how this could reasonably be made faster without relying on
> an external centralized service doing the checks regularly (e.g., once
> a day) before the user actually requests them.

Maybe you’re right, but before jumping to that conclusion, we have to
investigate a bit.  Like I wrote, the ‘gnu’ updater for instance fetches
a single file that remains in cache afterwards—the cost is constant.

We should identify updaters that have linear cost and check what can be
done.  ‘github’, ‘generic-html’, and ‘generic-git’ are of that kind.

Now, the command I gave above looks at 1,134 packages, so is it even
something you want to do as a packager?

>>   • guix refresh -r emacs-emms
>
> It also fails with "rate limit exceeded".  While this sounds
> theoretically nice, I wouldn't know how to make use of it yet.
>
>>   • guix refresh -s non-core -t generic-git
>
> See above about "-t elpa".
>
>>   • guix refresh -m packages-i-care-about.scm
>
> Yes, obviously, this is nice, too.  However, it doesn't scale if you
> need to specify 1000+ packages.

You can use ‘fold-packages’ and have three lines that return a manifest
of 10K packages if you want to.

Honestly, since I mostly rely on others these days :-), I’m no longer
sure what the packager’s workflow is.  Also, the level of coupling
varies greatly between, say, a C/C++ package and a set of
Python/Emacs/Rust packages.  I find that ‘guix refresh’ works fine for
loosely-coupled C/C++ packages, where often you’d want to upgrade
packages individually.  But for Python and Emacs packages, what do we
want?  Do packagers always want to check 1K+ packages at once?  Or are
there other patterns?

>> If not, what kind of selection mechanism could help?  ‘-s’ currently
>> accepts only two values, but we could augment it.
>
> Besides regexp matching, it may be useful to filter packages per module
> or source file name.  Package categories would be a bit awkward, though,
> and probably not satisfying.

We can add options to make it more convenient, but it’s already
possible:

  guix refresh $(guix package -A | grep emacs-xyz.scm | cut -f1)

>> I realize this is going off-topic, but let’s see if we can improve the
>> existing infrastructure to make it more convenient.
>
> Is it really off-topic?
>
> Anyway, all of this is only one data point, and, as a reminder,
> I certainly don't want to disparage either Xinglu Chen's work or the
> current "guix refresh" functionality.

Yup, same here!

I think we have nice infrastructure, but you raise important
shortcomings.  What Xinglu Chen did might in fact be one way to address
it, and there may also be purely UI issues that we could address.

Thanks,
Ludo’.
Hi,

On Thu, 17 Feb 2022 at 11:35, Ludovic Courtès <ludo@gnu.org> wrote:

>>> • guix refresh $(guix package -A ^emacs- | cut -f1)
>>
>> This one is interesting.  It illustrates that the UI is, from my point
>> of view, a bit lacking.  It would be a nice improvement to add a
>> built-in regexp mechanism, like in "guix search".
>
> Makes sense, we can do that.

I agree the UI is not nice.  At the command line, I never read the
complete output of “guix package -A”; I always pipe it through
“cut -f1”.  I think this complete display is only useful for third
parties; the only one I have in mind is emacs-guix.  Therefore, are we
maintaining this CLI for backward compatibility when we could change
both?  Something more useful as output would be:

  name version synopsis

Whatever. :-)  Even the internal etc/completion/bash/guix has to pipe:

--8<---------------cut here---------------start------------->8---
_guix_complete_available_package ()
{
    local prefix="$1"
    if [ -z "$_guix_available_packages" ]
    then
        # Cache the complete list because it rarely changes and makes
        # completion much faster.
        _guix_available_packages="$(${COMP_WORDS[0]} package -A 2> /dev/null \
                                    | cut -f1)"
    fi
    COMPREPLY+=($(compgen -W "$_guix_available_packages" -- "$prefix"))
}
--8<---------------cut here---------------end--------------->8---

Last, I am not convinced that “guix search” would help here, because:

 1. its output requires piping through recsel, and
 2. it is much slower than “guix package -A” [1].

1: <https://issues.guix.gnu.org/39258#119>

>>> • guix refresh -m packages-i-care-about.scm
>>
>> Yes, obviously, this is nice, too.  However, it doesn't scale if you
>> need to specify 1000+ packages.

[...]

>> In any case, this fails after reporting the status of around 50
>> packages, with this timing:
>>
>>   real    0m41,881s
>>   user    0m12,155s
>>   sys     0m0,726s
>
> How does it fail?  If it’s the GitHub rate limit, then there’s only one
> answer: you have to provide a token.

Let’s mimic a collection of 1000+ packages I care about.  Consider this
manifest for packages using r-build-system only…

--8<---------------cut here---------------start------------->8---
(use-modules (guix packages)
             (gnu packages)
             (guix build-system r))

(packages->manifest
 (fold-packages (lambda (package result)
                  (if (eq? (package-build-system package) r-build-system)
                      (cons package result)
                      result))
                '()))
--8<---------------cut here---------------end--------------->8---

…it hits the GitHub token issue…

--8<---------------cut here---------------start------------->8---
gnu/packages/bioconductor.scm:6034:13: 1.66.0 is already the latest version of r-plgem
gnu/packages/bioconductor.scm:6011:13: 1.22.0 is already the latest version of r-rots
gnu/packages/bioconductor.scm:12614:2: warning: 'bioconductor' updater failed to determine available releases for r-fourcseq
Backtrace:
          13 (primitive-load "/home/simon/.config/guix/current/bin/guix")
[...]
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Error downloading release information through the GitHub API.
This may be fixed by using an access token and setting the environment
variable GUIX_GITHUB_TOKEN, for instance one procured from
https://github.com/settings/tokens

real    10m27.306s
user    4m14.077s
sys     0m12.467s
--8<---------------cut here---------------end--------------->8---

…even though most R packages come from the CRAN or Bioconductor
archives.  Basically, ~5000 packages come from GitHub, which represents
~25% of the total.  Therefore, one needs to be really lucky when
updating many packages and not hit the GitHub rate limit.

Yes, large collections of packages cannot be updated easily.  Somehow,
it is an issue on the upstream side and it is hard to fix… except by
duplicating upstream or providing a token. :-)  Well, using the external
centralized Repology service is a first step toward updating at scale,
no?  A second step could be to have this feature included in the Data
Service; but before that, we have other fish to fry, IMHO. :-)

>> Assuming I don't get the "rate limit exceeded" error, at this rate, it
>> would take more than 15 minutes to check all the packages in
>> "emacs-xyz.scm".  This is a bit long.
>>
>> I don't see how this could reasonably be made faster without relying on
>> an external centralized service doing the checks regularly (e.g., once
>> a day) before the user actually requests them.
>
> Maybe you’re right, but before jumping to that conclusion, we have to
> investigate a bit.  Like I wrote, the ‘gnu’ updater for instance fetches
> a single file that remains in cache afterwards—the cost is constant.

Repology acts as this “external centralized service”, no?  On one hand,
it is a practical solution, especially by being fast enough.  On the
other hand, it serves a few false positives (say 4%, as a rough figure).

Nicolas, considering the complexity of packages and their origins, do
you think it would be possible to do better (fast and accurate) than
Repology at scale?

>>> • guix refresh -m packages-i-care-about.scm
>>
>> Yes, obviously, this is nice, too.  However, it doesn't scale if you
>> need to specify 1000+ packages.
>
> You can use ‘fold-packages’ and have three lines that return a manifest
> of 10K packages if you want to.

Yes, see the example above.

>>> If not, what kind of selection mechanism could help?  ‘-s’ currently
>>> accepts only two values, but we could augment it.
>>
>> Besides regexp matching, it may be useful to filter packages per module
>> or source file name.  Package categories would be a bit awkward, though,
>> and probably not satisfying.
>
> We can add options to make it more convenient, but it’s already
> possible:

Since these features are advanced, why not keep the CLI simple and
instead rely on manifest files for complex filtering?

>>> I realize this is going off-topic, but let’s see if we can improve the
>>> existing infrastructure to make it more convenient.

[...]

> I think we have nice infrastructure, but you raise important
> shortcomings.  What Xinglu Chen did might in fact be one way to address
> it, and there may also be purely UI issues that we could address.

All the points raised here are important, but they appear to me
orthogonal to the patch series. :-)

Cheers,
simon
Hello,

zimoun <zimon.toutoune@gmail.com> writes:

> On Thu, 17 Feb 2022 at 11:35, Ludovic Courtès <ludo@gnu.org> wrote:

>> How does it fail?  If it’s the GitHub rate limit, then there’s only one
>> answer: you have to provide a token.

IIUC, I have to register on GitHub to create this token.  This is a bit
sad as a prerequisite for using one core feature of Guix.

> Let’s mimic a collection of 1000+ packages I care about.  Consider this
> manifest for packages using r-build-system only…
>
> --8<---------------cut here---------------start------------->8---
> (use-modules (guix packages)
>              (gnu packages)
>              (guix build-system r))
>
> (packages->manifest
>  (fold-packages (lambda (package result)
>                   (if (eq? (package-build-system package) r-build-system)
>                       (cons package result)
>                       result))
>                 '()))
> --8<---------------cut here---------------end--------------->8---

I have to learn about fold-packages.

> Nicolas, considering the complexity of packages and their origins, do
> you think it would be possible to do better (fast and accurate) than
> Repology at scale?

It's not about doing better, but doing differently.  We do not need all
of Repology's features.

As far as the updater part is concerned, Repology pokes at various
package repositories, which are usually not upstream, extracts package
versions, applies some heuristics to normalize and compare them, then
decides what the newest version is and which repositories provide
outdated packages.

This has two obvious shortcomings:

1. the version number usually doesn't come from an official source, so
   it may be wrong—e.g., our emacs-csv-mode is "outdated" because
   Funtoo 1.4 chose a non-existing higher version number for the same
   package;

2. version comparison does not understand every local versioning
   scheme—e.g., our emacs-fold-dwim package is currently at
   "1.2-0.c46f4bb", which is, in Guix parlance, after "1.2", yet
   Repology thinks it is actually older than "1.2".

Therefore, I think a (theoretical) centralized Guix-centric version
checker could be fast, since it would only poke at what our packages
consider to be upstream, and accurate, since it would know about our
versioning rules.  Basically, it could boil down to calling the current
"guix refresh" on every package daily and serializing the results.

>>>> • guix refresh -m packages-i-care-about.scm
>>>
>>> Yes, obviously, this is nice, too.  However, it doesn't scale if you
>>> need to specify 1000+ packages.
>>
>> You can use ‘fold-packages’ and have three lines that return a manifest
>> of 10K packages if you want to.
>
> Yes, see the example above.

Point taken.

Regards,
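The “run it daily and serialize the results” idea could start as small
as the following sketch, meant to be run from cron or an mcron job; the
file layout is hypothetical and it simply captures the textual report of
the existing ‘guix refresh’ command:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 popen)
             (ice-9 textual-ports)
             (srfi srfi-19))

(define (refresh-report)
  ;; Capture the textual output of ‘guix refresh’ over all packages.
  (let* ((port   (open-input-pipe "guix refresh 2>/dev/null"))
         (output (get-string-all port)))
    (close-pipe port)
    output))

(define (store-report! directory)
  ;; Write today's report to DIRECTORY/refresh-YYYYMMDD.txt.
  (let ((file (string-append directory "/refresh-"
                             (date->string (current-date) "~Y~m~d")
                             ".txt")))
    (call-with-output-file file
      (lambda (port)
        (display (refresh-report) port)))))

(store-report! "/var/cache/guix-refresh")
--8<---------------cut here---------------end--------------->8---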
Hi!

Nicolas Goaziou <mail@nicolasgoaziou.fr> skribis:

> zimoun <zimon.toutoune@gmail.com> writes:
>
>> On Thu, 17 Feb 2022 at 11:35, Ludovic Courtès <ludo@gnu.org> wrote:
>
>>> How does it fail?  If it’s the GitHub rate limit, then there’s only one
>>> answer: you have to provide a token.
>
> IIUC, I have to register on GitHub to create this token.  This is a bit
> sad as a prerequisite for using one core feature of Guix.

I have some good news!

  https://issues.guix.gnu.org/54241

Granted, it’s not a revolution, but it should fix one of the main
annoyances of ‘guix refresh’.

Regarding the “sad prerequisite”, the ‘github’ updater predates the
‘generic-git’ updater by several years, during which it was the only way
to get data for matching packages.

Actually, I wonder whether it’s still useful to keep.  In theory it can
provide more accurate data than the ‘generic-git’ updater; I’m not sure
whether this is the case in practice.

Ludo’.