diff mbox series

[bug#67245] store: Use a non-blocking socket for store connections.

Message ID 460cdfa67b473ea2f1593668b2d9d0fd159378d0.1700244314.git.mail@cbaines.net
State New
Headers show
Series [bug#67245] store: Use a non-blocking socket for store connections. | expand

Commit Message

Christopher Baines Nov. 17, 2023, 6:05 p.m. UTC
For some applications, it's important to do this here rather than just making
the socket non-blocking after the connection is established because there can
be I/O on the socket that will block during the handshake.

I've noticed this blocking during the handshake causing issues in the build
coordinator for example.

* guix/store.scm (open-unix-domain-socket, open-inet-socket): Pass
SOCK_NONBLOCK when calling socket.

Change-Id: I8225762b78448bc1f7b698c8de5d736e13f577bf
---
 guix/store.scm | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)


base-commit: e35b7c5386c1bfacf47ed31bac9b503373dd26fc

Comments

Ludovic Courtès Nov. 26, 2023, 10:16 p.m. UTC | #1
Hi Christopher,

Christopher Baines <mail@cbaines.net> skribis:

> For some applications, it's important to do this here rather than just making
> the socket non-blocking after the connection is established because there can
> be I/O on the socket that will block during the handshake.
>
> I've noticed this blocking during the handshake causing issues in the build
> coordinator for example.
>
> * guix/store.scm (open-unix-domain-socket, open-inet-socket): Pass
> SOCK_NONBLOCK when calling socket.
>
> Change-Id: I8225762b78448bc1f7b698c8de5d736e13f577bf

I feel we should really discuss on Guix + Fibers since we’ve apparently
been going through the exact same set of issues.  :-)

(The other thing that comes to mind is the resource pool!)

> +++ b/guix/store.scm
> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
>  '&store-connection-error' upon error."
>    (let ((s (with-fluids ((%default-port-encoding #f))
>               ;; This trick allows use of the `scm_c_read' optimization.
> -             (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
> +             (socket PF_UNIX
> +                     (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
> +                     0)))

We cannot do this here because callers have to be prepared to deal with
non-blocking sockets, and that’s not the case in Guix itself.

In Cuirass, I have this:

--8<---------------cut here---------------start------------->8---
(define (non-blocking-port port)
  "Make PORT non-blocking and return it."
  (let ((flags (fcntl port F_GETFL)))
    (when (zero? (logand O_NONBLOCK flags))
      (fcntl port F_SETFL (logior O_NONBLOCK flags)))
    port))

(define (ensure-non-blocking-store-connection store)
  "Mark the file descriptor that backs STORE, a <store-connection>, as
O_NONBLOCK."
  (match (store-connection-socket store)
    ((? file-port? port)
     (non-blocking-port port))
    (_ #f)))

(define-syntax-rule (with-store/non-blocking store exp ...)
  "Like 'with-store', bind STORE to a connection to the store, but ensure that
said connection is non-blocking (O_NONBLOCK).  Evaluate EXP... in that
context."
  (with-store store
    (ensure-non-blocking-store-connection store)
    (let ()
      exp ...)))
--8<---------------cut here---------------end--------------->8---

Then ‘with-store/non-blocking’ is used in fiberized context where I know
this is fine.

I think it’ll have to remain this way until Guix itself is fiberized or
something.

Does that make sense?

Ludo’.
Christopher Baines Nov. 27, 2023, 9:48 a.m. UTC | #2
Ludovic Courtès <ludo@gnu.org> writes:

> Hi Christopher,
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>> For some applications, it's important to do this here rather than just making
>> the socket non-blocking after the connection is established because there can
>> be I/O on the socket that will block during the handshake.
>>
>> I've noticed this blocking during the handshake causing issues in the build
>> coordinator for example.
>>
>> * guix/store.scm (open-unix-domain-socket, open-inet-socket): Pass
>> SOCK_NONBLOCK when calling socket.
>>
>> Change-Id: I8225762b78448bc1f7b698c8de5d736e13f577bf
>
> I feel we should really discuss on Guix + Fibers since we’ve apparently
> been going through the exact same set of issues.  :-)
>
> (The other thing that comes to mind is the resource pool!)

I'm mostly ignoring these issues then coping the code once you write it
:)

>> +++ b/guix/store.scm
>> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
>>  '&store-connection-error' upon error."
>>    (let ((s (with-fluids ((%default-port-encoding #f))
>>               ;; This trick allows use of the `scm_c_read' optimization.
>> -             (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
>> +             (socket PF_UNIX
>> +                     (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
>> +                     0)))
>
> We cannot do this here because callers have to be prepared to deal with
> non-blocking sockets, and that’s not the case in Guix itself.

I can see potential problems for programs outside of Guix which use
suspendable ports, but given Guix doesn't use suspendable ports, this
won't change behaviour, right?

Obviously Guile will be working a bit differently, using poll when it
needs to wait for I/O, but at the scheme level within Guix, things
should be no different.

I tried guix weather with this change, and things seemed fine. Is there
a specific bit of Guix you're concerned about?
Ludovic Courtès Nov. 30, 2023, 9:11 p.m. UTC | #3
Hi Chris,

Christopher Baines <mail@cbaines.net> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>> I feel we should really discuss on Guix + Fibers since we’ve apparently
>> been going through the exact same set of issues.  :-)
>>
>> (The other thing that comes to mind is the resource pool!)
>
> I'm mostly ignoring these issues then coping the code once you write it
> :)

Heh, so we’re already in sync maybe, not bad.  :-)

>>> +++ b/guix/store.scm
>>> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
>>>  '&store-connection-error' upon error."
>>>    (let ((s (with-fluids ((%default-port-encoding #f))
>>>               ;; This trick allows use of the `scm_c_read' optimization.
>>> -             (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
>>> +             (socket PF_UNIX
>>> +                     (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
>>> +                     0)))
>>
>> We cannot do this here because callers have to be prepared to deal with
>> non-blocking sockets, and that’s not the case in Guix itself.
>
> I can see potential problems for programs outside of Guix which use
> suspendable ports, but given Guix doesn't use suspendable ports, this
> won't change behaviour, right?
>
> Obviously Guile will be working a bit differently, using poll when it
> needs to wait for I/O, but at the scheme level within Guix, things
> should be no different.

Hmm yes, I think you’re right.

One issue is if we hand over the file descriptor to something that’s not
Guile.  Off the top of my head, this happens with inferiors and in the
‘build’ procedure of ‘build-self.scm’ (well, the process that receives
that file descriptor is Guile, but if it’s older than 3.0 (?), then it
may behave differently.)

So it should be safe indeed, but adds a bit of overhead (hopping via
‘current-{read,write}-waiter’) and needs good testing.

Laziness gives an incentive for the status quo, but I’m not opposed to
the change if we get more confidence (test suite passing, tests with
inferiors and ‘time-machine’, and some more auditing.)

Ludo’.
Christopher Baines May 12, 2024, 5:38 p.m. UTC | #4
Ludovic Courtès <ludo@gnu.org> writes:

>>>> +++ b/guix/store.scm
>>>> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
>>>>  '&store-connection-error' upon error."
>>>>    (let ((s (with-fluids ((%default-port-encoding #f))
>>>>               ;; This trick allows use of the `scm_c_read' optimization.
>>>> -             (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
>>>> +             (socket PF_UNIX
>>>> +                     (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
>>>> +                     0)))
>>>
>>> We cannot do this here because callers have to be prepared to deal with
>>> non-blocking sockets, and that’s not the case in Guix itself.
>>
>> I can see potential problems for programs outside of Guix which use
>> suspendable ports, but given Guix doesn't use suspendable ports, this
>> won't change behaviour, right?
>>
>> Obviously Guile will be working a bit differently, using poll when it
>> needs to wait for I/O, but at the scheme level within Guix, things
>> should be no different.
>
> Hmm yes, I think you’re right.
>
> One issue is if we hand over the file descriptor to something that’s not
> Guile.  Off the top of my head, this happens with inferiors and in the
> ‘build’ procedure of ‘build-self.scm’ (well, the process that receives
> that file descriptor is Guile, but if it’s older than 3.0 (?), then it
> may behave differently.)
>
> So it should be safe indeed, but adds a bit of overhead (hopping via
> ‘current-{read,write}-waiter’) and needs good testing.
>
> Laziness gives an incentive for the status quo, but I’m not opposed to
> the change if we get more confidence (test suite passing, tests with
> inferiors and ‘time-machine’, and some more auditing.)

Maybe we can just move the with-store/non-blocking in to Guix, as that
will solve the immediate issue.

I've sent a new patch for that.
diff mbox series

Patch

diff --git a/guix/store.scm b/guix/store.scm
index f8e77b2cd9..216be98c05 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -460,7 +460,9 @@  (define (open-unix-domain-socket file)
 '&store-connection-error' upon error."
   (let ((s (with-fluids ((%default-port-encoding #f))
              ;; This trick allows use of the `scm_c_read' optimization.
-             (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
+             (socket PF_UNIX
+                     (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
+                     0)))
         (a (make-socket-address PF_UNIX file)))
 
     (system-error-to-connection-error file
@@ -488,7 +490,8 @@  (define (open-inet-socket host port)
       ((ai rest ...)
        (let ((s (socket (addrinfo:fam ai)
                         ;; TCP/IP only
-                        (logior SOCK_STREAM SOCK_CLOEXEC) IPPROTO_IP)))
+                        (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
+                        IPPROTO_IP)))
 
          (catch 'system-error
            (lambda ()