From patchwork Sun Oct 6 15:42:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Tomas Volf <~@wolfsden.cz>
X-Patchwork-Id: 68640
Subject: [bug#73660] [PATCH] gexp: Improve support of Unicode characters.
To: 73660@debbugs.gnu.org
Cc: Tomas Volf <~@wolfsden.cz>, Christopher Baines, Florian Pelz,
 Josselin Poiret, Ludovic Courtès, Mathieu Othacehe, Maxim Cournoyer,
 Simon Tournier, Tobias Geerinckx-Rice
From: Tomas Volf <~@wolfsden.cz>
Date: Sun, 6 Oct 2024 17:42:26 +0200
X-Mailer: git-send-email 2.46.0

Support for non-ASCII characters was mixed.  Some gexp forms did support
them, while others did not.  Combined with the current value of
%default-port-conversion-strategy, that sometimes led to unpleasant
surprises.  For example:

    (scheme-file "utf8"
                 #~((λ _ (display "猫"))))

was written to the store as:

    ((? _ (display "\u732b")))

No, that is not a font issue on your part; that is an actual #\?
instead of the lambda character.  Which, surprisingly, does not do what
it should when executed.

The solution is to switch to the C.UTF-8 locale where possible, since it
is now always available, and otherwise to set the port encoding
explicitly.

No tests are provided, since the majority of tests/gexp.scm uses Guile 2,
under which this tends to work; the issues occur mostly with Guile 3.  I
did test it locally using:

    #!/bin/sh

    set -eu
    set -x

    [ -f guix.scm ] || { echo >&2 Run from root of Guix repo.; exit 1; }
    [ -f gnu.scm ] || { echo >&2 Run from root of Guix repo.; exit 1; }

    cat >猫.scm <<'EOF'
    (define-module (猫)
      #:export (say))

    (define (say)
      "nyaaaa~~~~!")
    EOF

    mkdir -p dir-with-utf8-file
    cp 猫.scm dir-with-utf8-file/

    cat >repro.scm <<'EOF'
    (use-modules (guix build utils)
                 (guix derivations)
                 (guix gexp)
                 (guix store)
                 (ice-9 ftw)
                 (ice-9 textual-ports))

    (define cat "猫")

    (define (drv-content drv)
      (call-with-input-file (derivation->output-path drv)
        get-string-all))

    (define (out-content out)
      (call-with-input-file out
        get-string-all))

    (define (drv-listing drv)
      (scandir (derivation->output-path drv)))

    (define (dir-listing dir)
      (scandir dir))

    (define-macro (test exp lower? report)
      (let ((type (car exp)))
        `(false-if-exception
          (let ((drv (with-store %store
                       (run-with-store %store
                         (,(if lower? lower-object identity) ,exp)))))
            (format #t "~%~a:~%" ',type)
            (when (with-store %store
                    (build-derivations %store (list drv)))
              (format #t "~a~%" (,report drv)))))))

    (test (computed-file "utf8"
                         #~(with-output-to-file #$output
                             (λ _ (display #$cat))))
          #t drv-content)

    (test (program-file "utf8"
                        #~((λ _ (display #$cat))))
          #t drv-content)

    (test (scheme-file "utf8"
                       #~((λ _ (display #$cat))))
          #t drv-content)

    (test (text-file* "utf8" cat cat cat)
          #f drv-content)

    (test (compiled-modules '((猫)))
          #f drv-listing)

    (test (file-union "utf8"
                      `((,cat ,(plain-file "utf8" cat))))
          #t drv-listing)

    ;;; No fix needed:

    (test (imported-modules '((猫)))
          #f dir-listing)

    (test (local-file "dir-with-utf8-file"
                      #:recursive? #t)
          #t dir-listing)

    (test (plain-file "utf8" cat)
          #t out-content)

    (test (mixed-text-file "utf8" cat cat cat)
          #t drv-content)

    (test (directory-union "utf8"
                           (list (local-file "dir-with-utf8-file"
                                             #:recursive? #t)))
          #t dir-listing)
    EOF

    guix shell -CWN -D guix glibc-locales -- \
         env LANG=C.UTF-8 ./pre-inst-env guix repl -- ./repro.scm

Before this commit, the output is:

    + '[' -f guix.scm ']'
    + '[' -f gnu.scm ']'
    + cat
    + mkdir -p dir-with-utf8-file
    + cp 猫.scm dir-with-utf8-file/
    + cat
    + guix shell -CWN -D guix glibc-locales -- env LANG=C.UTF-8 ./pre-inst-env guix repl -- ./repro.scm

    computed-file:
    ?

    program-file:
    #!/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile --no-auto-compile
    !#
    ((? _ (display "\u732b")))

    scheme-file:
    ((? _ (display "\u732b")))

    text-file*:
    ???

    compiled-modules:
    building path(s) `/gnu/store/ay3jifyvliigfgnz67jf0kgngzpya5a5-module-import-compiled'
    Backtrace:
               5 (primitive-load "/gnu/store/rn7b0dq6iqfmmqyqzamix2mjmfy?")
    In ice-9/eval.scm:
        619:8  4 (_ #f)
    In srfi/srfi-1.scm:
       460:18  3 (fold # ?)
    In ice-9/eval.scm:
       245:16  2 (_ #(#(#) # ?))
    In ice-9/boot-9.scm:
      1982:24  1 (_ _)
    In unknown file:
               0 (stat "./???.scm" #)

    ERROR: In procedure stat:
    In procedure stat: No such file or directory: "./???.scm"
    builder for `/gnu/store/dxg87135zcd6a1c92dlrkyvxlbhfwfld-module-import-compiled.drv' failed with exit code 1

    file-union:
    (. .. ?)

    imported-modules:
    (. .. 猫.scm)

    local-file:
    (. .. 猫.scm)

    plain-file:
    猫

    mixed-text-file:
    猫猫猫

    directory-union:
    (. .. 猫.scm)

Which I think you will agree is far from optimal.
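
The mangled results above are consistent with text being encoded in a character set that cannot represent λ or 猫, with each unencodable character replaced by #\?.  The sketch below is not part of the patch; it reproduces that effect outside the store.  It assumes Guile 2.2 or later and uses string->bytevector and bytevector->string from (ice-9 iconv) with an explicit 'substitute strategy standing in for the port conversion strategy.  The names text, round-trip, ascii-result and utf8-result are made up for this illustration.

```scheme
;; Sketch only, not part of the patch: mimic writing through a port whose
;; encoding cannot represent every character, using (ice-9 iconv).
(use-modules (ice-9 iconv))

;; The same expression that scheme-file wrote to the store above.
(define text "((λ _ (display \"猫\")))")

;; Encode TEXT to bytes in ENCODING, replacing each character that
;; ENCODING cannot represent (the 'substitute strategy), then decode
;; the bytes back into a string.
(define (round-trip encoding)
  (bytevector->string (string->bytevector text encoding 'substitute)
                      encoding))

(define ascii-result (round-trip "ASCII")) ; lossy: λ and 猫 are substituted
(define utf8-result (round-trip "UTF-8"))  ; lossless

;; Print only ASCII characters, so the report is readable under any locale.
(format #t "ASCII lossless? ~a~%UTF-8 lossless? ~a~%"
        (string=? ascii-result text)
        (string=? utf8-result text))
```

Loading it with guile should show that only the UTF-8 round trip preserves the text.  That is the property the patch aims to restore by running builders under LANG=C.UTF-8 and by calling set-port-encoding! in text-file*.
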
After my fix the output changes to:

    + '[' -f guix.scm ']'
    + '[' -f gnu.scm ']'
    + cat
    + mkdir -p dir-with-utf8-file
    + cp 猫.scm dir-with-utf8-file/
    + cat
    + guix shell -CWN -D guix glibc-locales -- env LANG=C.UTF-8 ./pre-inst-env guix repl -- ./repro.scm

    computed-file:
    猫

    program-file:
    #!/gnu/store/8kbmn359jqkgsbqgqxnmiryvd9ynz8w7-guile-3.0.9/bin/guile --no-auto-compile
    !#
    ((λ _ (display "猫")))

    scheme-file:
    ((λ _ (display "猫")))

    text-file*:
    猫猫猫

    compiled-modules:
    (. .. 猫.go)

    file-union:
    (. .. 猫)

    imported-modules:
    (. .. 猫.scm)

    local-file:
    (. .. 猫.scm)

    plain-file:
    猫

    mixed-text-file:
    猫猫猫

    directory-union:
    (. .. 猫.scm)

Which is actually what the user would expect.

I also added missing arguments to the documentation.

* guix/gexp.scm (computed-file): Set LANG to C.UTF-8 by default.
(compiled-modules): Try to `setlocale'.
(gexp->script), (gexp->file): New `locale' argument defaulting to C.UTF-8.
(text-file*): Set output port encoding to UTF-8.
* doc/guix.texi (G-Expressions)[computed-file]: Document the changes.  Use
@var.  Document #:guile.
[gexp->script]: Document #:locale.  Fix default value for #:target.
[gexp->file]: Document #:locale, #:system and #:target.

Change-Id: Ib323b51af88a588b780ff48ddd04db8be7c729fb
---
 doc/guix.texi | 11 +++++++----
 guix/gexp.scm | 24 ++++++++++++++++++------
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index 52e36e4354..683ba2f44b 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -12270,7 +12270,9 @@ G-Expressions
 This is the declarative counterpart of @code{text-file}.
 @end deffn
 
-@deffn {Procedure} computed-file name gexp [#:local-build? #t] [#:options '()]
+@deffn {Procedure} computed-file @var{name} @var{gexp} @
+  [#:local-build? #t] [#:guile] @
+  [#:options '(#:env-vars (("LANG" . "C.UTF-8")))]
 Return an object representing the store item @var{name}, a file or
 directory computed by @var{gexp}.  When @var{local-build?} is true (the
 default), the derivation is built locally.  @var{options} is a list of
@@ -12281,7 +12283,7 @@ G-Expressions
 
 @deffn {Monadic Procedure} gexp->script @var{name} @var{exp} @
        [#:guile (default-guile)] [#:module-path %load-path] @
-       [#:system (%current-system)] [#:target #f]
+       [#:system (%current-system)] [#:target 'current] [#:locale "C.UTF-8"]
 Return an executable script @var{name} that runs @var{exp} using
 @var{guile}, with @var{exp}'s imported modules in its search path.
 Look up @var{exp}'s modules in @var{module-path}.
@@ -12318,8 +12320,9 @@ G-Expressions
 
 @deffn {Monadic Procedure} gexp->file @var{name} @var{exp} @
             [#:set-load-path? #t] [#:module-path %load-path] @
-            [#:splice? #f] @
-            [#:guile (default-guile)]
+            [#:splice? #f] [#:guile (default-guile)] @
+            [#:system (%current-system)] [#:target 'current] @
+            [#:locale "C.UTF-8"]
 Return a derivation that builds a file @var{name} containing @var{exp}.
 When @var{splice?} is true, @var{exp} is considered to be a list of
 expressions that will be spliced in the resulting file.
diff --git a/guix/gexp.scm b/guix/gexp.scm
index e44aea6420..c8aba91779 100644
--- a/guix/gexp.scm
+++ b/guix/gexp.scm
@@ -597,7 +597,10 @@ (define-record-type <computed-file>
   (options    computed-file-options))             ;list of arguments
 
 (define* (computed-file name gexp
-                        #:key guile (local-build? #t) (options '()))
+                        #:key
+                        guile
+                        (local-build? #t)
+                        (options '(#:env-vars (("LANG" . "C.UTF-8")))))
   "Return an object representing the store item NAME, a file or directory
 computed by GEXP.  When LOCAL-BUILD? is #t (the default), it ensures the
 corresponding derivation is built locally.  OPTIONS may be used to pass
@@ -1700,6 +1703,9 @@ (define* (compiled-modules modules
                       (system base target)
                       (system base compile))
 
+         ;; Best effort.  The locale is not installed in all contexts.
+         (false-if-exception (setlocale LC_ALL "C.UTF-8"))
+
          (define modules
            (getenv "modules"))
 
@@ -1990,7 +1996,8 @@ (define* (gexp->script name exp
                        #:key (guile (default-guile))
                        (module-path %load-path)
                        (system (%current-system))
-                       (target 'current))
+                       (target 'current)
+                       (locale "C.UTF-8"))
   "Return an executable script NAME that runs EXP using GUILE, with EXP's
 imported modules in its search path.  Look up EXP's modules in MODULE-PATH."
   (mlet* %store-monad ((target (if (eq? target 'current)
@@ -2033,7 +2040,8 @@ (define* (gexp->script name exp
                       ;; These derivations are not worth offloading or
                       ;; substituting.
                       #:local-build? #t
-                      #:substitutable? #f)))
+                      #:substitutable? #f
+                      #:env-vars `(("LANG" . ,locale)))))
 
 (define* (gexp->file name exp #:key
                      (guile (default-guile))
@@ -2041,7 +2049,8 @@ (define* (gexp->file name exp #:key
                      (module-path %load-path)
                      (splice? #f)
                      (system (%current-system))
-                     (target 'current))
+                     (target 'current)
+                     (locale "C.UTF-8"))
   "Return a derivation that builds a file NAME containing EXP.  When SPLICE?
 is true, EXP is considered to be a list of expressions that will be spliced in
 the resulting file.
@@ -2081,7 +2090,8 @@ (define* (gexp->file name exp #:key
                           #:local-build? #t
                           #:substitutable? #f
                           #:system system
-                          #:target target)
+                          #:target target
+                          #:env-vars `(("LANG" . ,locale)))
         (gexp->derivation name
                           (gexp
                            (call-with-output-file (ungexp output)
@@ -2098,7 +2108,8 @@ (define* (gexp->file name exp #:key
                           #:local-build? #t
                           #:substitutable? #f
                           #:system system
-                          #:target target))))
+                          #:target target
+                          #:env-vars `(("LANG" . ,locale))))))
 
 (define* (text-file* name #:rest text)
   "Return as a monadic value a derivation that builds a text file containing
@@ -2108,6 +2119,7 @@ (define* (text-file* name #:rest text)
   (define builder
     (gexp (call-with-output-file (ungexp output "out")
             (lambda (port)
+              (set-port-encoding! port "UTF-8")
               (display (string-append (ungexp-splicing text)) port)))))
 
   (gexp->derivation name builder