diff mbox

[bug#50040,0/2] publish: Always render nar/narinfo during backing.

Message ID 87zgsbuss7.fsf@gnu.org
State New
Headers show

Commit Message

Mathieu Othacehe Sept. 17, 2021, 3:27 p.m. UTC
Hey Ludo,

> The ‘mumi worker’ process also seems to be doing a lot of I/O and using
> quite a bit of CPU (presumably allocating a lot).  Its ‘perf’ profile is
> like this:
>
>    2.62%  .mumi-real  libguile-3.0.so.1.1.1  [.] scm_ilength
>    1.20%  .mumi-real  libguile-3.0.so.1.1.1  [.] scm_getc
>    1.12%  .mumi-real  libgc.so.1.4.3         [.] GC_add_to_black_list_normal.part.4
>    1.06%  .mumi-real  libgc.so.1.4.3         [.] GC_build_fl
>    0.98%  .mumi-real  libgc.so.1.4.3         [.] GC_reclaim_clear
>    0.97%  .mumi-real  libguile-3.0.so.1.1.1  [.] get_callee_vcode
>    0.91%  .mumi-real  libgc.so.1.4.3         [.] GC_generic_malloc_many
>    0.90%  .mumi-real  libguile-3.0.so.1.1.1  [.] peek_byte_or_eof
>    0.78%  .mumi-real  [JIT] tid 62601        [.] 0x00007f886964804d
>
> Food for thought…

Yep, interesting findings, thanks for sharing. I just discovered that
the Nginx server is logging in /var/log/nginx/error.log the errors we
are interested in:

--8<---------------cut here---------------start------------->8---
2021/09/17 14:13:37 [error] 129925#0: *3727055 upstream timed out (110: Connection timed out) while reading response header from upstream, client: XXX, server: ci.guix.gnu.org, request: "GET /3m1j4ddx11
prp6azw3rjdhljg5vchf1s.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/3m1j4ddx11prp6azw3rjdhljg5vchf1s.narinfo", host: "ci.guix.gnu.org"
2021/09/17 14:13:37 [error] 129925#0: *3727020 upstream timed out (110: Connection timed out) while reading response header from upstream, client: XXX, server: ci.guix.gnu.org, request: "GET /pdhr7fgql3
qm1x24yxkj4c2f6s7vffi5.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/pdhr7fgql3qm1x24yxkj4c2f6s7vffi5.narinfo", host: "ci.guix.gnu.org"
2021/09/17 14:13:37 [error] 129925#0: *3726717 upstream timed out (110: Connection timed out) while reading response header from upstream, client: XXX, server: ci.guix.gnu.org, request: "GET /19gv6sq
qcmn8q020bpm9jyl6n9bjpg05.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/19gv6sqqcmn8q020bpm9jyl6n9bjpg05.narinfo", host: "ci.guix.gnu.org"
2021/09/17 14:13:37 [error] 129925#0: *3725361 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.169, server: ci.guix.gnu.org, request: "GET /y8dp69gk
y4dn5vnn7wnx04j20q0yrdja.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/y8dp69gky4dn5vnn7wnx04j20q0yrdja.narinfo", host: "141.80.167.131"
--8<---------------cut here---------------end--------------->8---

Despite the following diff, extending the timeout delays to 10 seconds,
we still have a lot of those errors. Some happening on the build farm
(141.80.167.X), some directly on user machines.

--8<---------------cut here---------------start------------->8---

Thanks,

Mathieu

Comments

Mathieu Othacehe Oct. 5, 2021, 10:07 a.m. UTC | #1
Hey,

The guix-publish daemon has been updated on berlin to include this fix:
222f4661ed11b225f458cbe495a296f233129bec. Sadly, it looks like it
doesn't improve the situation.

I observed that the builds of the following evaluation failed because of
the typical: "cannot build missing derivation
?/gnu/store/4p6hgfkj11q5mqk39lb3kxczwqswzh97-iso-image-installer.drv?".

https://ci.guix.gnu.org/eval/27520

When looking at the Nginx logs, we have those concomitant errors:

--8<---------------cut here---------------start------------->8---
021/10/05 10:52:44 [info] 129925#0: *19790054 client 94.237.57.97 closed keepalive connection
2021/10/05 10:52:44 [error] 129925#0: *19787848 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.181, server: ci.guix.gnu.org, request: "GET /bhjk59akv02qibag0pxx6cjlrlgs1gjm.narinfo HTTP/1.1", upstream: "http://127.0.0.
1:3000/bhjk59akv02qibag0pxx6cjlrlgs1gjm.narinfo", host: "141.80.167.131"
2021/10/05 10:52:44 [error] 129925#0: *19788215 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.183, server: ci.guix.gnu.org, request: "GET /g243k85y7z7j6dq7nx74gp3d28hhad71.narinfo HTTP/1.1", upstream: "http://127.0.0.
1:3000/g243k85y7z7j6dq7nx74gp3d28hhad71.narinfo", host: "141.80.167.131"
2021/10/05 10:52:44 [error] 129925#0: *19787890 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.160, server: ci.guix.gnu.org, request: "GET /gxp6fhj7mszs90b24k630wprrkwsm5vb.narinfo HTTP/1.1", upstream: "http://127.0.0.
1:3000/gxp6fhj7mszs90b24k630wprrkwsm5vb.narinfo", host: "141.80.167.131"
2021/10/05 10:52:44 [error] 129925#0: *19788116 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.163, server: ci.guix.gnu.org, request: "GET /hyjiw3cnvsn2k5mpfafmczy68jmpb71x.narinfo HTTP/1.1", upstream: "http://127.0.0.
1:3000/hyjiw3cnvsn2k5mpfafmczy68jmpb71x.narinfo", host: "141.80.167.131"
2021/10/05 10:52:44 [error] 129925#0: *19788032 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.167, server: ci.guix.gnu.org, request: "GET /ngnnppl9ncixxjid0crfhwjn4h0xyg8c.narinfo HTTP/1.1", upstream: "http://127.0.0.
1:3000/ngnnppl9ncixxjid0crfhwjn4h0xyg8c.narinfo", host: "141.80.167.131"
2021/10/05 10:52:44 [error] 129925#0: *19787887 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.181, server: ci.guix.gnu.org, request: "GET /lvpmiq4p8wqyxi34q99qm7d0xfwq484a.narinfo HTTP/1.1", upstream: "http://127.0.0.
1:3000/lvpmiq4p8wqyxi34q99qm7d0xfwq484a.narinfo", host: "141.80.167.131"
2021/10/05 10:52:44 [error] 129925#0: *19788074 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.186, server: ci.guix.gnu.org, request: "GET /wxv8a2845d5zxq0yaxd57arjvdvhppib.narinfo HTTP/1.1", upstream: "http://127.0.0.
1:3000/wxv8a2845d5zxq0yaxd57arjvdvhppib.narinfo", host: "141.80.167.131"
2021/10/05 10:52:44 [error] 129925#0: *19787681 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.165, server: ci.guix.gnu.org, request: "GET /8kmc1fznb29rmbjiyvkbkixhzm2287al.narinfo HTTP/1.1", upstream: "http://127.0.0.
1:3000/8kmc1fznb29rmbjiyvkbkixhzm2287al.narinfo", host: "141.80.167.131"
2021/10/05 10:52:44 [error] 129925#0: *19787714 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.180, server: ci.guix.gnu.org, request: "GET /795mhps0sdli2ndvcv1aq54b7fg59d97.narinfo HTTP/1.1", upstream: "http://127.0.0.
1:3000/795mhps0sdli2ndvcv1aq54b7fg59d97.narinfo", host: "141.80.167.131"
2021/10/05 10:52:44 [error] 129925#0: *19788327 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.165, server: ci.guix.gnu.org, request: "GET /zsv7zg4xxr2hlirsbifka5y62394zzl8.narinfo HTTP/1.1", upstream: "http://127.0.0.
1:3000/zsv7zg4xxr2hlirsbifka5y62394zzl8.narinfo", host: "141.80.167.131"
2021/10/05 10:52:44 [error] 129925#0: *19787809 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.169, server: ci.guix.gnu.org, request: "GET /a8npwin71gbn9fprn18i5j26cl91l3rn.narinfo HTTP/1.1", upstream: "http://127.0.0.
--8<---------------cut here---------------end--------------->8---

Thanks,

Mathieu
diff mbox

Patch

diff --git a/hydra/nginx/berlin.scm b/hydra/nginx/berlin.scm
index 44ff28e..7a085e5 100644
--- a/hydra/nginx/berlin.scm
+++ b/hydra/nginx/berlin.scm
@@ -97,9 +97,9 @@ 
 
            ;; Do not tolerate slowness of hydra.gnu.org when fetching
            ;; narinfos: better return 504 quickly than wait forever.
-           "proxy_connect_timeout 2s;"
-           "proxy_read_timeout 2s;"
-           "proxy_send_timeout 2s;"
+           "proxy_connect_timeout 10s;"
+           "proxy_read_timeout 10s;"
+           "proxy_send_timeout 10s;"
--8<---------------cut here---------------end--------------->8---

I'll see if I can reproduce the I/O pressure on demand, to be able to
have more accurate strace/perf investigations.