Message ID | 460cdfa67b473ea2f1593668b2d9d0fd159378d0.1700244314.git.mail@cbaines.net |
---|---|
State | New |
Headers | show |
Series | [bug#67245] store: Use a non-blocking socket for store connections. | expand |
Hi Christopher, Christopher Baines <mail@cbaines.net> skribis: > For some applications, it's important to do this here rather than just making > the socket non-blocking after the connection is established because there can > be I/O on the socket that will block during the handshake. > > I've noticed this blocking during the handshake causing issues in the build > coordinator for example. > > * guix/store.scm (open-unix-domain-socket, open-inet-socket): Pass > SOCK_NONBLOCK when calling socket. > > Change-Id: I8225762b78448bc1f7b698c8de5d736e13f577bf I feel we should really discuss on Guix + Fibers since we’ve apparently been going through the exact same set of issues. :-) (The other thing that comes to mind is the resource pool!) > +++ b/guix/store.scm > @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file) > '&store-connection-error' upon error." > (let ((s (with-fluids ((%default-port-encoding #f)) > ;; This trick allows use of the `scm_c_read' optimization. > - (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0))) > + (socket PF_UNIX > + (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK) > + 0))) We cannot do this here because callers have to be prepared to deal with non-blocking sockets, and that’s not the case in Guix itself. In Cuirass, I have this: --8<---------------cut here---------------start------------->8--- (define (non-blocking-port port) "Make PORT non-blocking and return it." (let ((flags (fcntl port F_GETFL))) (when (zero? (logand O_NONBLOCK flags)) (fcntl port F_SETFL (logior O_NONBLOCK flags))) port)) (define (ensure-non-blocking-store-connection store) "Mark the file descriptor that backs STORE, a <store-connection>, as O_NONBLOCK." (match (store-connection-socket store) ((? file-port? port) (non-blocking-port port)) (_ #f))) (define-syntax-rule (with-store/non-blocking store exp ...) "Like 'with-store', bind STORE to a connection to the store, but ensure that said connection is non-blocking (O_NONBLOCK). Evaluate EXP... in that context." (with-store store (ensure-non-blocking-store-connection store) (let () exp ...))) --8<---------------cut here---------------end--------------->8--- Then ‘with-store/non-blocking’ is used in fiberized context where I know this is fine. I think it’ll have to remain this way until Guix itself is fiberized or something. Does that make sense? Ludo’.
Ludovic Courtès <ludo@gnu.org> writes: > Hi Christopher, > > Christopher Baines <mail@cbaines.net> skribis: > >> For some applications, it's important to do this here rather than just making >> the socket non-blocking after the connection is established because there can >> be I/O on the socket that will block during the handshake. >> >> I've noticed this blocking during the handshake causing issues in the build >> coordinator for example. >> >> * guix/store.scm (open-unix-domain-socket, open-inet-socket): Pass >> SOCK_NONBLOCK when calling socket. >> >> Change-Id: I8225762b78448bc1f7b698c8de5d736e13f577bf > > I feel we should really discuss on Guix + Fibers since we’ve apparently > been going through the exact same set of issues. :-) > > (The other thing that comes to mind is the resource pool!) I'm mostly ignoring these issues then coping the code once you write it :) >> +++ b/guix/store.scm >> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file) >> '&store-connection-error' upon error." >> (let ((s (with-fluids ((%default-port-encoding #f)) >> ;; This trick allows use of the `scm_c_read' optimization. >> - (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0))) >> + (socket PF_UNIX >> + (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK) >> + 0))) > > We cannot do this here because callers have to be prepared to deal with > non-blocking sockets, and that’s not the case in Guix itself. I can see potential problems for programs outside of Guix which use suspendable ports, but given Guix doesn't use suspendable ports, this won't change behaviour, right? Obviously Guile will be working a bit differently, using poll when it needs to wait for I/O, but at the scheme level within Guix, things should be no different. I tried guix weather with this change, and things seemed fine. Is there a specific bit of Guix you're concerned about?
Hi Chris, Christopher Baines <mail@cbaines.net> skribis: > Ludovic Courtès <ludo@gnu.org> writes: >> I feel we should really discuss on Guix + Fibers since we’ve apparently >> been going through the exact same set of issues. :-) >> >> (The other thing that comes to mind is the resource pool!) > > I'm mostly ignoring these issues then coping the code once you write it > :) Heh, so we’re already in sync maybe, not bad. :-) >>> +++ b/guix/store.scm >>> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file) >>> '&store-connection-error' upon error." >>> (let ((s (with-fluids ((%default-port-encoding #f)) >>> ;; This trick allows use of the `scm_c_read' optimization. >>> - (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0))) >>> + (socket PF_UNIX >>> + (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK) >>> + 0))) >> >> We cannot do this here because callers have to be prepared to deal with >> non-blocking sockets, and that’s not the case in Guix itself. > > I can see potential problems for programs outside of Guix which use > suspendable ports, but given Guix doesn't use suspendable ports, this > won't change behaviour, right? > > Obviously Guile will be working a bit differently, using poll when it > needs to wait for I/O, but at the scheme level within Guix, things > should be no different. Hmm yes, I think you’re right. One issue is if we hand over the file descriptor to something that’s not Guile. Off the top of my head, this happens with inferiors and in the ‘build’ procedure of ‘build-self.scm’ (well, the process that receives that file descriptor is Guile, but if it’s older than 3.0 (?), then it may behave differently.) So it should be safe indeed, but adds a bit of overhead (hopping via ‘current-{read,write}-waiter’) and needs good testing. Laziness gives an incentive for the status quo, but I’m not opposed to the change if we get more confidence (test suite passing, tests with inferiors and ‘time-machine’, and some more auditing.) Ludo’.
Ludovic Courtès <ludo@gnu.org> writes: >>>> +++ b/guix/store.scm >>>> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file) >>>> '&store-connection-error' upon error." >>>> (let ((s (with-fluids ((%default-port-encoding #f)) >>>> ;; This trick allows use of the `scm_c_read' optimization. >>>> - (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0))) >>>> + (socket PF_UNIX >>>> + (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK) >>>> + 0))) >>> >>> We cannot do this here because callers have to be prepared to deal with >>> non-blocking sockets, and that’s not the case in Guix itself. >> >> I can see potential problems for programs outside of Guix which use >> suspendable ports, but given Guix doesn't use suspendable ports, this >> won't change behaviour, right? >> >> Obviously Guile will be working a bit differently, using poll when it >> needs to wait for I/O, but at the scheme level within Guix, things >> should be no different. > > Hmm yes, I think you’re right. > > One issue is if we hand over the file descriptor to something that’s not > Guile. Off the top of my head, this happens with inferiors and in the > ‘build’ procedure of ‘build-self.scm’ (well, the process that receives > that file descriptor is Guile, but if it’s older than 3.0 (?), then it > may behave differently.) > > So it should be safe indeed, but adds a bit of overhead (hopping via > ‘current-{read,write}-waiter’) and needs good testing. > > Laziness gives an incentive for the status quo, but I’m not opposed to > the change if we get more confidence (test suite passing, tests with > inferiors and ‘time-machine’, and some more auditing.) Maybe we can just move the with-store/non-blocking in to Guix, as that will solve the immediate issue. I've sent a new patch for that.
diff --git a/guix/store.scm b/guix/store.scm index f8e77b2cd9..216be98c05 100644 --- a/guix/store.scm +++ b/guix/store.scm @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file) '&store-connection-error' upon error." (let ((s (with-fluids ((%default-port-encoding #f)) ;; This trick allows use of the `scm_c_read' optimization. - (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0))) + (socket PF_UNIX + (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK) + 0))) (a (make-socket-address PF_UNIX file))) (system-error-to-connection-error file @@ -488,7 +490,8 @@ (define (open-inet-socket host port) ((ai rest ...) (let ((s (socket (addrinfo:fam ai) ;; TCP/IP only - (logior SOCK_STREAM SOCK_CLOEXEC) IPPROTO_IP))) + (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK) + IPPROTO_IP))) (catch 'system-error (lambda ()