From patchwork Wed Feb 22 05:17:29 2023
X-Patchwork-Submitter: kyle
X-Patchwork-Id: 47211
From: kyle
Date: Wed, 22 Feb 2023 05:17:29 +0000
Subject: [bug#61701] [PATCH] doc: Propose new cookbook section for reproducible research.
To: 61701@debbugs.gnu.org
Cc: Kyle Andrews
Message-Id: <3ffea5b37541a6f3409299f3e8e6200bc1c9aef6.1677043049.git.kyle@posteo.net>

From: Kyle Andrews

The intent is to cover the most common cases in which researchers who
use R and Python can quickly reap the benefits of reproducibility.
---
 doc/guix-cookbook.texi       | 174 +++++++++++++++++++++++++++++++++++
 guix/build-system/python.scm |   1 +
 2 files changed, 175 insertions(+)

diff --git a/doc/guix-cookbook.texi b/doc/guix-cookbook.texi
index b9fb916f4a..8a10bcbec7 100644
--- a/doc/guix-cookbook.texi
+++ b/doc/guix-cookbook.texi
@@ -114,6 +114,7 @@ Top
 
 Environment management
 
+* Reproducible Research in Practice::  Write manifests to create reproducible environments
 * Guix environment via direnv::  Setup Guix environment with direnv
 
 Installing Guix on a Cluster
@@ -3538,9 +3539,182 @@ Environment management
 demonstrate such utilities.
 
 @menu
+* Reproducible Research in Practice::  Write manifests to create reproducible environments
 * Guix environment via direnv::  Setup Guix environment with direnv
 @end menu
 
+@node Reproducible Research in Practice
+@section Common scientific software environments
+
+Many researchers write applied scientific software that builds on more
+generic tools from the R and Python ecosystems, along with supporting
+shell utilities.  Even researchers who mostly stick to just R or just
+Python often have to use both at the same time when collaborating with
+others.  This tutorial covers strategies for writing manifests that
+handle such situations.
+
+Widely used R packages are hosted on CRAN, which employs a strict test
+suite backed by continuous integration infrastructure for the latest R
+version.  A positive result of this rigid discipline is that most R
+packages from the same period of time interoperate well when used with
+a particular R version.  This gives a clear, low-complexity target for
+achieving a reproducible environment.
+
+Writing a manifest for R code alone requires only minimal knowledge of
+the Guix infrastructure.  This stub should work for most cases
+involving R packages already in Guix:
+
+@example
+(use-modules
+  (gnu packages cran)
+  (gnu packages statistics))
+
+(packages->manifest
+  (list r r-tidyverse))
+@end example
+
+R packages are mostly defined in the @file{gnu/packages/cran.scm} and
+@file{gnu/packages/statistics.scm} files of the Guix source repository.
+
+This manifest can be used with the basic @command{guix shell} command:
+
+@example
+guix shell --manifest=manifest.scm --container
+@end example
+
+Remember to pin your channels when you are done, so that others can
+later recover your exact Guix environment:
+
+@example
+guix describe --format=channels > channels.scm
+@end example
+
+That environment can later be recovered with @command{guix time-machine}:
+
+@example
+guix time-machine --channels=channels.scm \
+  -- guix shell --manifest=manifest.scm --container
+@end example
+
+In contrast, the Python scientific ecosystem is far less standardized.
+No effort is made to integrate all Python packages with one another.
+While there is a latest Python version, it is less dominant, for
+various reasons such as the fact that Python tends to be used by much
+larger teams than R is.  This makes packaging reproducible Python
+environments much more difficult, and mixing R with Python complicates
+things still further.  However, we have to be mindful of the goals of
+reproducible research.
+
+If reproducibility becomes an end in itself and not a catalyst towards
+faster discovery, then Guix will be a non-starter for scientists.  Their
+goal is to develop useful understanding about particular aspects of the
+world.
+
+Thankfully, three common scenarios cover the vast majority of needs.
+These are:
+
+@itemize
+@item
+combining standard package definitions with custom package definitions
+@item
+combining package definitions from the current revision with other revisions
+@item
+combining package variants which need a modified build system
+@end itemize
+
+In the rest of the tutorial we develop a manifest which tackles all
+three of these common issues.  The hope is that once researchers see
+that even the hardest common situation can be solved without writing
+thousands of lines of code, they will consider it worth the effort and
+will not regard it as a significant detour from the main line of their
+research.
+
+@example
+(use-modules
+  (guix packages)
+  (guix download)
+  (guix licenses)
+  (guix profiles)
+  (gnu packages)
+  (gnu packages cran)
+  (gnu packages statistics)   ; provides the r package
+  (guix inferior)
+  (guix channels)
+  (guix build-system python))
+
+;; Custom package definition, generated with: guix import pypi APTED
+(define python-apted
+  (package
+    (name "python-apted")
+    (version "1.0.3")
+    (source (origin
+      (method url-fetch)
+      (uri (pypi-uri "apted" version))
+      (sha256
+        (base32
+          "1sawf6s5c64fgnliwy5w5yxliq2fc215m6alisl7yiflwa0m3ymy"))))
+    (build-system python-build-system)
+    (home-page "https://github.com/JoaoFelipe/apted")
+    (synopsis "APTED algorithm for the Tree Edit Distance")
+    (description "APTED algorithm for the Tree Edit Distance")
+    (license expat)))
+
+;; Pin the Guix revision named below, which still provides Python 3.6.
+(define last-guix-with-python-3.6
+  (list
+    (channel
+      (name 'guix)
+      (url "https://git.savannah.gnu.org/git/guix.git")
+      (commit "d66146073def03d1a3d61607bc6b77997284904b"))))
+
+(define connection-to-last-guix-with-python-3.6
+  (inferior-for-channels last-guix-with-python-3.6))
+
+(define first car)
+
+(define python-3.6
+  (first
+    (lookup-inferior-packages
+      connection-to-last-guix-with-python-3.6 "python")))
+
+(define python3.6-numpy
+  (first
+    (lookup-inferior-packages
+      connection-to-last-guix-with-python-3.6 "python-numpy")))
+
+(define included-packages (list r r-reticulate))
+
+(define inferior-packages (list python-3.6 python3.6-numpy))
+
+;; Procedure that rebuilds a Python package against the inferior Python 3.6.
+(define package-with-python-3.6
+  (package-with-explicit-python python-3.6
+    "python-" "python3.6-" #:variant-property 'python3-variant))
+
+(define custom-variant-packages
+  (list (package-with-python-3.6 python-apted)))
+
+(concatenate-manifests
+  (map packages->manifest
+    (list
+      included-packages
+      inferior-packages
+      custom-variant-packages)))
+@end example
+
+This should produce a profile with the latest R and an older Python
+3.6.  The two should be able to interoperate with R code like:
+
+@example
+library(reticulate)
+use_python("python")
+apted = import("apted")
+t1 = '@{a@{b@}@{c@}@}'
+t2 = '@{a@{b@{d@}@}@}'
+metric = apted$APTED(t1, t2)
+distance = metric$compute_edit_distance()
+@end example
+
 @node Guix environment via direnv
 @section Guix environment via direnv
 
diff --git a/guix/build-system/python.scm b/guix/build-system/python.scm
index c8f04b2298..d4aaab906d 100644
--- a/guix/build-system/python.scm
+++ b/guix/build-system/python.scm
@@ -36,6 +36,7 @@ (define-module (guix build-system python)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
   #:export (%python-build-system-modules
+            package-with-explicit-python
             package-with-python2
             strip-python2-variant
             default-python