[bug#41444] Fix guile-fibers resource leak

Message ID 87zha0zqft.fsf@cune.org
State Accepted
Series [bug#41444] Fix guile-fibers resource leak

Checks

Context                 Check  Description
cbaines/applying patch  fail   View Laminar job

Commit Message

Caleb Ristvedt May 22, 2020, 1:52 a.m. UTC
This adds a patch to guile-fibers to fix a resource leak that caused
file descriptors to be opened and never closed with each invocation of
`run-fibers'.  This is presumably what was causing the tests to fail, as
guile will abort when it gets EMFILE while attempting to create a new
thread.  I've verified that it builds on my system, but it's only a
4-core machine, and the rate at which file descriptors leak scales with
the number of cores, so it's possible it would have built successfully
here regardless.  Could someone with access to a system with more cores
verify that it now builds properly there?
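
As a rough illustration of why the core count matters, the arithmetic can be sketched in Guile.  The figures follow the patch description (one scheduler per core by default, three file descriptors per scheduler, only one scheduler destroyed per call).  The helper names and the limit of 1024 descriptors are assumptions made for this sketch, not values taken from fibers or from any particular system.

```scheme
;; Back-of-the-envelope estimate of the leak; all names are hypothetical.
(define fds-per-scheduler 3)          ; figure quoted in the patch description

(define (leaked-fds-per-call cores)
  ;; One scheduler per core is created, but only one is destroyed.
  (* fds-per-scheduler (- cores 1)))

(define (calls-until-limit cores fd-limit)
  ;; Number of run-fibers calls whose leaked descriptors still fit under
  ;; FD-LIMIT (an assumed soft limit; `ulimit -n' shows the real one).
  ;; Descriptors already open for other reasons are ignored.
  (let ((per-call (leaked-fds-per-call cores)))
    (if (zero? per-call)
        #f                            ; single core: nothing leaks
        (quotient fd-limit per-call))))

(for-each
 (lambda (cores)
   (format #t "~a cores: ~a fds leaked per call, ~a calls fit under 1024~%"
           cores
           (leaked-fds-per-call cores)
           (calls-until-limit cores 1024)))
 '(4 32))
```

Under these assumptions a 4-core machine leaks 9 descriptors per call and a 32-core machine leaks 93, so the larger machine runs out after roughly a tenth as many calls.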

Hopefully a bug fix release will show up soon enough and we can get rid
of this.

- reepca

Comments

Christopher Baines May 22, 2020, 5:44 p.m. UTC | #1
Caleb Ristvedt <caleb.ristvedt@cune.org> writes:

> This adds a patch to guile-fibers to fix a resource leak that caused
> file descriptors to be opened and never closed with each invocation of
> `run-fibers'.  This is presumably what was causing the tests to fail, as
> guile will abort when it gets EMFILE while attempting to create a new
> thread.  I've verified that it builds on my system, but it's only a
> 4-core machine, and the rate at which file descriptors leak scales with
> the number of cores, so it's possible it would have built successfully
> here regardless.  Could someone with access to a system with more cores
> verify that it now builds properly there?

I've tried this on bayfront.guix.gnu.org which has 32 cores and I'm very
glad to say it seems to work!
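
For anyone wanting to check this on another machine without running the whole test suite, one possible approach is to compare the number of open file descriptors before and after repeated `run-fibers' calls.  This is only a sketch: it is Linux-specific (it reads /proc/self/fd), it assumes the fibers module is on the load path, and the helper names are made up for illustration.

```scheme
;; Sketch only: Linux-specific, and assumes fibers is installed.
(use-modules (fibers)
             (ice-9 ftw))

(define (count-open-fds)
  ;; Each entry in /proc/self/fd is one open descriptor of this process.
  ;; "." and ".." are listed too, but they cancel out when two counts
  ;; are subtracted.
  (length (scandir "/proc/self/fd")))

(define (fd-growth thunk iterations)
  ;; Call THUNK ITERATIONS times and return how many more descriptors
  ;; are open afterwards than before.
  (let ((before (count-open-fds)))
    (let loop ((i 0))
      (when (< i iterations)
        (thunk)
        (loop (+ i 1))))
    (- (count-open-fds) before)))

(format #t "fd growth after 10 run-fibers calls: ~a~%"
        (fd-growth (lambda () (run-fibers (lambda () #t))) 10))
```

A result that grows in proportion to the iteration count would point to the leak.  The Guile runtime may open a few descriptors of its own, so small differences should be read with caution.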

Maybe tweak the capitalisation in the commit message, "New patch", "Add
it ...", "Use it", but yeah, I'm all for merging this.  With it I'll
hopefully be able to reconfigure bayfront.  Are you set to tweak the
commit message and push?

Thanks,

Chris

Caleb Ristvedt May 22, 2020, 7:38 p.m. UTC | #2
Christopher Baines <mail@cbaines.net> writes:

> I've tried this on bayfront.guix.gnu.org which has 32 cores and I'm very
> glad to say it seems to work!

Excellent.

> Maybe tweak the capitalisation in the commit message, "New patch", "Add
> it ...", "Use it",

Done.

> Are you set to tweak the commit message and push?

Indeed, tweaked and pushed as 9af90aafdfd8afd5fb7b5377ca5daf2215d38d7a.

- reepca

Caleb Ristvedt May 22, 2020, 7:39 p.m. UTC | #3
closed.

Patch

From 659fa6b70cb8364187753e240076cdb107320070 Mon Sep 17 00:00:00 2001
From: Caleb Ristvedt <caleb.ristvedt@cune.org>
Date: Thu, 21 May 2020 20:30:58 -0500
Subject: [PATCH] gnu: guile-fibers: Add patch to fix resource leak.

guile-fibers@1.0.0 has a resource leak where run-fibers will only destroy one
scheduler, but it creates as many as there are cpu cores by default (see
https://github.com/wingo/fibers/issues/36).  This causes the tests to fail on
systems with many cores, and can cause guile to crash under certain
circumstances.  This fixes that resource leak.  At present neither git master
nor the latest release has fixed this yet.

* gnu/packages/patches/guile-fibers-destroy-peer-schedulers.patch: new patch.
* gnu/local.mk: add it to the list of patches.
* gnu/packages/guile-xyz.scm (guile-fibers): use it.
---
 gnu/local.mk                                  |  1 +
 gnu/packages/guile-xyz.scm                    |  5 +++-
 ...guile-fibers-destroy-peer-schedulers.patch | 24 +++++++++++++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gnu/packages/patches/guile-fibers-destroy-peer-schedulers.patch

diff --git a/gnu/local.mk b/gnu/local.mk
index 1d9de9a57e..2f24f892b1 100644
--- a/gnu/local.mk
+++ b/gnu/local.mk
@@ -1053,6 +1053,7 @@  dist_patch_DATA =						\
   %D%/packages/patches/guile-3.0-relocatable.patch		\
   %D%/packages/patches/guile-linux-syscalls.patch		\
   %D%/packages/patches/guile-3.0-linux-syscalls.patch		\
+  %D%/packages/patches/guile-fibers-destroy-peer-schedulers.patch \
   %D%/packages/patches/guile-gdbm-ffi-support-gdbm-1.14.patch	\
   %D%/packages/patches/guile-present-coding.patch		\
   %D%/packages/patches/guile-rsvg-pkgconfig.patch		\
diff --git a/gnu/packages/guile-xyz.scm b/gnu/packages/guile-xyz.scm
index 674b1f922b..a1deee32d1 100644
--- a/gnu/packages/guile-xyz.scm
+++ b/gnu/packages/guile-xyz.scm
@@ -523,7 +523,10 @@  Unix-style DSV format and RFC 4180 format.")
                     (("#:use-module \\(fibers\\)")
                      (string-append "#:use-module (fibers)\n"
                                     "#:use-module (ice-9 threads)\n")))
-                  #t))))
+                  #t))
+              (patches
+               ;; fixes a resource leak that causes crashes in the tests
+               (search-patches "guile-fibers-destroy-peer-schedulers.patch"))))
     (build-system gnu-build-system)
     (arguments
      '(;; The code uses 'scm_t_uint64' et al., which are deprecated in 3.0.
diff --git a/gnu/packages/patches/guile-fibers-destroy-peer-schedulers.patch b/gnu/packages/patches/guile-fibers-destroy-peer-schedulers.patch
new file mode 100644
index 0000000000..8bb7153153
--- /dev/null
+++ b/gnu/packages/patches/guile-fibers-destroy-peer-schedulers.patch
@@ -0,0 +1,24 @@ 
+Fibers 1.0.0 has a bug in run-fibers in which peer schedulers aren't destroyed -
+so if you had 4 cores, 1 would be destroyed when run-fibers returned, but the
+other 3 would stay around.  Each scheduler uses 3 file descriptors, so for
+machines with many cores, this resource leak adds up quickly - quickly enough
+that the test suite can even fail because of it.
+
+See https://github.com/wingo/fibers/issues/36.
+
+This fixes that.  It should be safe to destroy the peer schedulers at the given
+point because the threads that could be running them are all either dead or the
+current thread.
+
+As of May 21, 2020, this bug still existed in the 1.0.0 (latest) release and in
+git master.
+--- a/fibers.scm	2020-05-21 18:38:06.890690154 -0500
++++ b/fibers.scm	2020-05-21 18:38:56.395686693 -0500
+@@ -137,5 +137,6 @@
+              (%run-fibers scheduler hz finished? affinity))
+            (lambda ()
+              (stop-auxiliary-threads scheduler)))))
++      (for-each destroy-scheduler (scheduler-remote-peers scheduler))
+       (destroy-scheduler scheduler)
+       (apply values (atomic-box-ref ret))))))
+
-- 
2.26.2