mbox series

[bug#44193,0/1] 'guix publish --cache' can publish items not yet cached

Message ID 20201024144929.4529-1-ludo@gnu.org
Headers show
Series 'guix publish --cache' can publish items not yet cached | expand

Message

Ludovic Courtès Oct. 24, 2020, 2:49 p.m. UTC
Hello!

The ‘--cache’ mode of ‘guix publish’ is nice for many reasons.  Until
now though, it would only return 200 to narinfo and nar requests
corresponding to items already in cache—already “baked”.  Thus, the
first narinfo request for an item would always return 404; one would
have to wait until the item is baked to get 200 and download the
substitute.

If you’re fetching substitutes for popular packages on a popular
instance, the extra delay is most likely invisible.  However, if you’re
using an instance with few users, or if you’re interesting in
substitutes for an package that’s not popular, the behavior described
above is a showstopper.  (Many people here have criticized it in the
past. :-))

This patch changes the behavior of ‘--cache’: if a store item is not
yet in cache, and if it’s “small enough”, then narinfo/nar requests for
it immediately return 200, as in the no-cache mode.  You’re trading
possibly increased server resource usage for reduced publication delay.

To put an upper bound on the extra resource usage, the
‘--cache-bypass-threshold’ allows users to control the maximum size
of a store item that can receive this treatment.


An interesting use case that can benefit from this change is .drv
substitutes: .drv themselves can be substitutes, so you can run:

  guix build /gnu/store/….drv

and the daemon will fetch the .drv and its closure and start
building it (since commit 9c9982dc0c8c38ce3821b154b7e92509c1564317).
It’s not something we use much so far, but maybe we could put it
to good use in the future (I know Chris has been playing with it
in the context of the Data Service).

Thoughts?

Ludo’.

Ludovic Courtès (1):
  publish: Add '--cache-bypass-threshold'.

 doc/guix.texi            | 24 +++++++++++-
 guix/scripts/publish.scm | 85 ++++++++++++++++++++++++++++++++--------
 tests/publish.scm        | 43 ++++++++++++++++++--
 3 files changed, 130 insertions(+), 22 deletions(-)

Comments

Miguel Arruga Vivas Oct. 25, 2020, 1:11 p.m. UTC | #1
Hi!

Just one general comment about this issue:

Ludovic Courtès <ludo@gnu.org> writes:
> Thus, the first narinfo request for an item would always return 404;
> one would have to wait until the item is baked to get 200 and download
> the substitute.

I'd argue that returning unconditionally the 404 is a problem.  If the
nar is getting baked, I guess that a 202[1] would be the appropriate
answer, and I'd leave the 404 for invalid store paths[2].  This way the
client could implement more policies: the classic timeout, but also, for
example, it might check other servers before checking once again if
nobody else has it, or directly wait until a 404 is reached.  WDYT?

Happy hacking!
Miguel

[1] Section 10.2.3 from https://www.ietf.org/rfc/rfc2616.txt
[2] I understand that it isn't at all a bad usage of the 404, as it
    explicitly says that the condition might be temporary, but on the
    other hand I don't know how could that extra information be used by
    a rogue client in any way worse than it could do right now, as the
    server process is doing the same computation more or less in both
    cases.
Ludovic Courtès Oct. 25, 2020, 4:49 p.m. UTC | #2
Hi!

Miguel Ángel Arruga Vivas <rosen644835@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>> Thus, the first narinfo request for an item would always return 404;
>> one would have to wait until the item is baked to get 200 and download
>> the substitute.
>
> I'd argue that returning unconditionally the 404 is a problem.  If the
> nar is getting baked, I guess that a 202[1] would be the appropriate
> answer, and I'd leave the 404 for invalid store paths[2].  This way the
> client could implement more policies: the classic timeout, but also, for
> example, it might check other servers before checking once again if
> nobody else has it, or directly wait until a 404 is reached.  WDYT?

Indeed, 202 seems more appropriate (and it’s precisely half of 404, that
tells something!).

Unfortunately (guix scripts substitute) currently explicitly checks for
404 and 200 and considers anything else to be a transient error with a
default TTL (in ‘handle-narinfo-response’).  So we would need to adapt
that first and then wait until some time has passed before ‘guix
publish’ can return 202.  :-/

I guess we can change (guix scripts substitute) with that in mind
already.  WDYT?

Ludo’.
Miguel Arruga Vivas Oct. 25, 2020, 5:30 p.m. UTC | #3
Hi, Ludo!

Ludovic Courtès <ludo@gnu.org> writes:
> Hi!
>
> Miguel Ángel Arruga Vivas <rosen644835@gmail.com> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>>> Thus, the first narinfo request for an item would always return 404;
>>> one would have to wait until the item is baked to get 200 and download
>>> the substitute.
>>
>> I'd argue that returning unconditionally the 404 is a problem.  If the
>> nar is getting baked, I guess that a 202[1] would be the appropriate
>> answer, and I'd leave the 404 for invalid store paths[2].  This way the
>> client could implement more policies: the classic timeout, but also, for
>> example, it might check other servers before checking once again if
>> nobody else has it, or directly wait until a 404 is reached.  WDYT?
>
> Indeed, 202 seems more appropriate (and it’s precisely half of 404, that
> tells something!).

:-)

> Unfortunately (guix scripts substitute) currently explicitly checks for
> 404 and 200 and considers anything else to be a transient error with a
> default TTL (in ‘handle-narinfo-response’).  So we would need to adapt
> that first and then wait until some time has passed before ‘guix
> publish’ can return 202.  :-/

I see, it uses 'max-age from the http response only when it's a 404.
Nonetheless, correct me if I'm wrong, the difference is 5 vs 10 minutes,
so I don't think we should wait too much to upgrade both sides. :-)

> I guess we can change (guix scripts substitute) with that in mind
> already.  WDYT?

I fully agree with that.  Adding 202 together with 404 would be enough
as an start, wouldn't it?
-------------------------------8<-----------------------------
            (cache-narinfo! url (hash-part->path hash-part) #f
                            (if (or (= 404 code) (= 202 code))
                                ttl
                                %narinfo-transient-error-ttl))
------------------------------->8-----------------------------

Happy hacking!
Miguel
Ludovic Courtès Oct. 26, 2020, 10:50 a.m. UTC | #4
Hi,

Miguel Ángel Arruga Vivas <rosen644835@gmail.com> skribis:

> I fully agree with that.  Adding 202 together with 404 would be enough
> as an start, wouldn't it?
> -------------------------------8<-----------------------------
>             (cache-narinfo! url (hash-part->path hash-part) #f
>                             (if (or (= 404 code) (= 202 code))
>                                 ttl
>                                 %narinfo-transient-error-ttl))
> ------------------------------->8-----------------------------

Yes, exactly!

Ludo’.
Miguel Arruga Vivas Oct. 27, 2020, 7:19 p.m. UTC | #5
Hi,

Ludovic Courtès <ludo@gnu.org> writes:
> Hi,
>
> Miguel Ángel Arruga Vivas <rosen644835@gmail.com> skribis:
>
>> I fully agree with that.  Adding 202 together with 404 would be enough
>> as an start, wouldn't it?
>> -------------------------------8<-----------------------------
>>             (cache-narinfo! url (hash-part->path hash-part) #f
>>                             (if (or (= 404 code) (= 202 code))
>>                                 ttl
>>                                 %narinfo-transient-error-ttl))
>> ------------------------------->8-----------------------------
>
> Yes, exactly!

Should I, or you, push this before the release?  It's probably worth
having it already for 1.2.

The optimization could be cool too: IIUC could be only the other if
branch the one returning the 202 when it's widely accepted, perhaps I
should have explicitly pointed out that earlier instead of driving too
much the conversation to the return code, sorry for that. :-(

Happy hacking!
Miguel
Ludovic Courtès Oct. 28, 2020, 9:39 a.m. UTC | #6
Hi,

Miguel Ángel Arruga Vivas <rosen644835@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>> Hi,
>>
>> Miguel Ángel Arruga Vivas <rosen644835@gmail.com> skribis:
>>
>>> I fully agree with that.  Adding 202 together with 404 would be enough
>>> as an start, wouldn't it?
>>> -------------------------------8<-----------------------------
>>>             (cache-narinfo! url (hash-part->path hash-part) #f
>>>                             (if (or (= 404 code) (= 202 code))
>>>                                 ttl
>>>                                 %narinfo-transient-error-ttl))
>>> ------------------------------->8-----------------------------
>>
>> Yes, exactly!
>
> Should I, or you, push this before the release?  It's probably worth
> having it already for 1.2.

Agreed, you can go ahead and push this change.

> The optimization could be cool too: IIUC could be only the other if
> branch the one returning the 202 when it's widely accepted, perhaps I
> should have explicitly pointed out that earlier instead of driving too
> much the conversation to the return code, sorry for that. :-(

No problem!

Ludo’.