diff mbox series

[bug#40993] cuirass: Add build products download support.

Message ID 874ksz4w21.fsf@gmail.com
State Accepted
Headers show
Series [bug#40993] cuirass: Add build products download support. | expand

Checks

Context Check Description
cbaines/applying patch fail View Laminar job

Commit Message

Mathieu Othacehe May 1, 2020, 1:35 p.m. UTC
Hey Danny,

> very cool!

Thanks :)

> Though I agree using sendfile would be much better, especially since the user
> can download 800 MB image files there.
>
> The guile (web server) module allows passing a procedure as the #:body, but
> then it makes a bytevector out of the result and hard-codes the content-type :P.
>
> Eventually (web server http) http-write is reached, which only supports encoding
> bytevectors and #f, that's it.  No files.
>
> So we'd have to overwrite http-write.
>
> But we are using our own (web server fiberized) impl already.
>
> So our impl chould be extended to be able to get and process FDs.
>
> client-loop there has
>
>               (lambda (response body)
>                 (write-response response client)
>                 (when body
>                   (put-bytevector client body))
>
> which means the "when body" part should be extended to also handle files, not just bytevectors.

The problem is that even with our fiberized implementation, what we pass
as "body" is checked in "sanitize-response" procedure of Guile's (web
server) module.

With the (very) hacky patch attached, I fool sanitize-response, by
sending the file name as a bytevector. This allows me to save gigabytes
of RAM when downloading disk images.

WDYT?

Thanks,

Mathieu

Comments

Ludovic Courtès May 1, 2020, 9:17 p.m. UTC | #1
Hello!

Mathieu Othacehe <m.othacehe@gmail.com> skribis:

> With the (very) hacky patch attached, I fool sanitize-response, by
> sending the file name as a bytevector. This allows me to save gigabytes
> of RAM when downloading disk images.

Yay!  This is similar to what ‘guix publish’ does.  :-)

> From 0c5e91c170639d50d1cc339fa0b0e68ea4fba68c Mon Sep 17 00:00:00 2001
> From: Mathieu Othacehe <m.othacehe@gmail.com>
> Date: Fri, 1 May 2020 15:03:12 +0200
> Subject: [PATCH] cuirass: Use sendfiles instead of raw copies.
>
> * src/cuirass/http.scm (respond-file): Send the file name as an UTF8
> bytevector, instead of the raw file content,
> (respond-gzipped-file): ditto. Also set 'content-disposition header.
> * src/web/server/fiberized.scm (client-loop): Check if 'content-disposition is
> set. If it's the case, assume that the bytevector is the file name, and use
> sendfiles to send it. Otherwise, keep the existing behaviour and send directly
> the received bytevector.


> +(define extend-response (@@ (web server) extend-response))

@@ is evil and it’s not guaranteed to work with Guile 3: the procedure
might be inlined.

But you can use these ‘guix publish’ helper procedures, which rely on
(srfi srfi-9 gnu):

  (define (strip-headers response)
    "Return RESPONSE's headers minus 'Content-Length' and our internal headers."
    (fold alist-delete
          (response-headers response)
          '(content-length x-raw-file x-nar-compression)))

  (define (with-content-length response length)
    "Return RESPONSE with a 'content-length' header set to LENGTH."
    (set-field response (response-headers)
               (alist-cons 'content-length length
                           (strip-headers response))))

> +                      (call-with-input-file file
> +                        (lambda (port)
> +                          (write-response
> +                           (extend-response response 'content-length
> +                                            file-size)
> +                           client)
> +                          (sendfile client port file-size))))

I didn’t look at the other patches, but note that ‘sendfile’ blocks.
Since Cuirass is fiberized, you shouldn’t block a fiber.

‘guix publish’ doesn’t use Fibers but it shouldn’t block either while
sending a nar, so what it does is spawn a new thread for the ‘sendfile’
call.

HTH!

Ludo’.
diff mbox series

Patch

From 0c5e91c170639d50d1cc339fa0b0e68ea4fba68c Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe <m.othacehe@gmail.com>
Date: Fri, 1 May 2020 15:03:12 +0200
Subject: [PATCH] cuirass: Use sendfiles instead of raw copies.

* src/cuirass/http.scm (respond-file): Send the file name as an UTF8
bytevector, instead of the raw file content,
(respond-gzipped-file): ditto. Also set 'content-disposition header.
* src/web/server/fiberized.scm (client-loop): Check if 'content-disposition is
set. If it's the case, assume that the bytevector is the file name, and use
sendfiles to send it. Otherwise, keep the existing behaviour and send directly
the received bytevector.
---
 src/cuirass/http.scm         | 25 ++++++++++---------------
 src/web/server/fiberized.scm | 21 +++++++++++++++++++--
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/src/cuirass/http.scm b/src/cuirass/http.scm
index 79fa246..bdc780c 100644
--- a/src/cuirass/http.scm
+++ b/src/cuirass/http.scm
@@ -40,7 +40,8 @@ 
   #:use-module (web uri)
   #:use-module (fibers)
   #:use-module (fibers channels)
-  #:use-module ((rnrs bytevectors) #:select (utf8->string))
+  #:use-module ((rnrs bytevectors) #:select (utf8->string
+                                             string->utf8))
   #:use-module (sxml simple)
   #:use-module (cuirass templates)
   #:use-module (guix utils)
@@ -246,19 +247,14 @@  Hydra format."
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd")
        (sxml->xml body port))))
 
-  (define* (respond-file file
-                         #:key name)
+  (define* (respond-file file)
     (let ((content-type (or (assoc-ref %file-mime-types
                                        (file-extension file))
                             '(application/octet-stream))))
       (respond `((content-type . ,content-type)
-                 ,@(if name
-                       `((content-disposition
-                          . (form-data (filename . ,name))))
-                       '()))
-               ;; FIXME: FILE is potentially big so it'd be better to not load
-               ;; it in memory and instead 'sendfile' it.
-               #:body (call-with-input-file file get-bytevector-all))))
+                 (content-disposition
+                  . (form-data (filename . ,(basename file)))))
+               #:body (string->utf8 file))))
 
   (define (respond-static-file path)
     ;; PATH is a list of path components
@@ -273,10 +269,9 @@  Hydra format."
   (define (respond-gzipped-file file)
     ;; Return FILE with 'gzip' content-encoding.
     (respond `((content-type . (text/plain (charset . "UTF-8")))
-               (content-encoding . (gzip)))
-             ;; FIXME: FILE is potentially big so it'd be better to not load
-             ;; it in memory and instead 'sendfile' it.
-             #:body (call-with-input-file file get-bytevector-all)))
+               (content-encoding . (gzip))
+               (content-disposition . (form-data (filename . ,file))))
+             #:body (string->utf8 file)))
 
   (define (respond-build-not-found build-id)
     (respond-json-with-error
@@ -521,7 +516,7 @@  Hydra format."
 
     (('GET "download" id)
      (let ((path (db-get-build-product-path id)))
-       (respond-file path #:name (basename path))))
+       (respond-file path)))
 
     (('GET "static" path ...)
      (respond-static-file path))
diff --git a/src/web/server/fiberized.scm b/src/web/server/fiberized.scm
index 308b642..68ae132 100644
--- a/src/web/server/fiberized.scm
+++ b/src/web/server/fiberized.scm
@@ -37,6 +37,7 @@ 
   #:use-module (web request)
   #:use-module (web response)
   #:use-module (web server)
+  #:use-module ((rnrs bytevectors) #:select (utf8->string))
   #:use-module (ice-9 binary-ports)
   #:use-module (ice-9 match)
   #:use-module (fibers)
@@ -92,6 +93,8 @@ 
               ((0) (memq 'keep-alive (response-connection response)))))
            (else #f)))))
 
+(define extend-response (@@ (web server) extend-response))
+
 (define (client-loop client have-request)
   ;; Always disable Nagle's algorithm, as we handle buffering
   ;; ourselves.
@@ -119,9 +122,23 @@ 
                                               #:headers '((content-length . 0)))
                               #vu8()))))
               (lambda (response body)
-                (write-response response client)
                 (when body
-                  (put-bytevector client body))
+                  (let* ((headers (response-headers response))
+                         (file? (assq-ref headers 'content-disposition))
+                         (file (and file? (utf8->string body)))
+                         (file-size (and file? (stat:size (stat file)))))
+                    (cond
+                     (file?
+                      (call-with-input-file file
+                        (lambda (port)
+                          (write-response
+                           (extend-response response 'content-length
+                                            file-size)
+                           client)
+                          (sendfile client port file-size))))
+                     (else
+                      (write-response response client)
+                      (put-bytevector client body)))))
                 (force-output client)
                 (if (and (keep-alive? response)
                          (not (eof-object? (peek-char client))))
-- 
2.26.0