Message ID | 3e187cf8646059a513b502a17abe9f88daae6f6b.1701943221.git.ludo@gnu.org |
---|---|
State | New |
Series | Update glibc to 2.38; make C.UTF-8 always available |
Ludovic Courtès <ludo@gnu.org> wrote:

> +                   (add-after 'install 'install-utf8-c-locale
> +                     (lambda* (#:key outputs #:allow-other-keys)
> +                       ;; Install the C.UTF-8 locale so there's always a UTF-8
> +                       ;; locale around.
> +                       (let* ((out (assoc-ref outputs "out"))
> +                              (bin (string-append out "/bin"))
> +                              (locale (string-append out "/lib/locale/"
> +                                                     ,(package-version
> +                                                       this-package))))
> +                         (mkdir-p locale)
> +                         (invoke (string-append bin "/localedef")
> +                                 "--no-archive" "--prefix" locale
> +                                 "-i" "C" "-f" "UTF-8"
> +                                 (string-append locale "/C.UTF-8")))))

I realize now that this cannot work when cross-compiling, because this
‘localedef’ binary is built for the target and is not executable on the
build machine.

I suspect libc builds an additional ‘localedef’ for the build machine
but I’m not sure where it is, hmm…

Ludo’.
Ludovic Courtès <ludo@gnu.org> wrote:

> Ludovic Courtès <ludo@gnu.org> wrote:
>
>> +                       ;; Install the C.UTF-8 locale so there's always a UTF-8
>> +                       ;; locale around.
>> +                       (let* ((out (assoc-ref outputs "out"))
>> +                              (bin (string-append out "/bin"))
>> +                              (locale (string-append out "/lib/locale/"
>> +                                                     ,(package-version
>> +                                                       this-package))))
>> +                         (mkdir-p locale)
>> +                         (invoke (string-append bin "/localedef")
>> +                                 "--no-archive" "--prefix" locale
>> +                                 "-i" "C" "-f" "UTF-8"
>> +                                 (string-append locale "/C.UTF-8")))))
>
> I realize now that this cannot work when cross-compiling, because
> this ‘localedef’ binary is not executable on the build machine.
>
> I suspect libc builds an additional ‘localedef’ for the build machine
> but I’m not sure where it is, hmm…

I was told on #glibc that (1) there’s no ‘localedef’ for the build
machine produced during cross-compilation, and (2) more generally,
there’s no way to cross-build locale data, because endianness and other
things may matter.

I suspect #2 was about the locale archive and not locale data, because
evidence suggests that locale data is system-independent:

--8<---------------cut here---------------start------------->8---
$ for s in aarch64-linux powerpc64le-linux armhf-linux i686-linux ; do diff -r $(guix build glibc-locales@2.35) $(guix build glibc-locales@2.35 -s "$s") && echo "$s same as x86_64-linux" ; done
aarch64-linux same as x86_64-linux
powerpc64le-linux same as x86_64-linux
armhf-linux same as x86_64-linux
i686-linux same as x86_64-linux
$ guix describe
guix 6e2dd51
  repository URL: https://git.savannah.gnu.org/git/guix.git
  branch: master
  commit: 6e2dd51df5f3f51e9056dd4f2e1b036195ab3caa
--8<---------------cut here---------------end--------------->8---

Efraim, could you check against powerpc-linux, which is the only
big-endian target we +/- support?

Ludo’.
On Thu, Dec 07, 2023 at 10:26:36PM +0100, Ludovic Courtès wrote:
> Ludovic Courtès <ludo@gnu.org> wrote:
>
> > Ludovic Courtès <ludo@gnu.org> wrote:
> >
> >> +                       ;; Install the C.UTF-8 locale so there's always a UTF-8
> >> +                       ;; locale around.
> >> +                       (let* ((out (assoc-ref outputs "out"))
> >> +                              (bin (string-append out "/bin"))
> >> +                              (locale (string-append out "/lib/locale/"
> >> +                                                     ,(package-version
> >> +                                                       this-package))))
> >> +                         (mkdir-p locale)
> >> +                         (invoke (string-append bin "/localedef")
> >> +                                 "--no-archive" "--prefix" locale
> >> +                                 "-i" "C" "-f" "UTF-8"
> >> +                                 (string-append locale "/C.UTF-8")))))
> >
> > I realize now that this cannot work when cross-compiling, because
> > this ‘localedef’ binary is not executable on the build machine.
> >
> > I suspect libc builds an additional ‘localedef’ for the build machine
> > but I’m not sure where it is, hmm…
>
> I was told on #glibc that (1) there’s no ‘localedef’ for the build
> machine produced during cross-compilation, and (2) that more generally,
> there’s no way to cross-build locale data, that endianness and other
> things may matter.
>
> I suspect #2 was about the locale archive and not locale data, because
> evidence suggests that locale data is system-independent:
>
> --8<---------------cut here---------------start------------->8---
> $ for s in aarch64-linux powerpc64le-linux armhf-linux i686-linux ; do diff -r $(guix build glibc-locales@2.35) $(guix build glibc-locales@2.35 -s "$s") && echo "$s same as x86_64-linux" ; done
> aarch64-linux same as x86_64-linux
> powerpc64le-linux same as x86_64-linux
> armhf-linux same as x86_64-linux
> i686-linux same as x86_64-linux
> $ guix describe
> guix 6e2dd51
>   repository URL: https://git.savannah.gnu.org/git/guix.git
>   branch: master
>   commit: 6e2dd51df5f3f51e9056dd4f2e1b036195ab3caa
> --8<---------------cut here---------------end--------------->8---
>
> Efraim, could you check against powerpc-linux, which is the only
> big-endian target we +/- support?
I found a difference in almost every file. The tarball of the locales
was too big to attach so I've uploaded it here¹. Looking at it in
diffoscope, most of the human-readable data was the same, but the
remaining binary data showed byte-order (endianness) swaps.

So, without actually checking other big-endian systems, it looks like we
could set target to #f for the locales, but only for targets that share
the build machine's endianness.

¹ https://flashner.co.il/~efraim/glibc-locales-2.35-powerpc-linux.tar.xz
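The observation above (readable text identical, other bytes swapped) matches what one would expect from integer fields written in native byte order. Here is a minimal Guile sketch of that effect, assuming nothing about the actual layout of glibc locale files: the helper `u32->bytes` and the sample value `#x01020304` are made up for illustration.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: serialize VALUE as an unsigned 32-bit integer
;; using byte order ORDER.
(define (u32->bytes value order)
  (let ((bv (make-bytevector 4 0)))
    (bytevector-u32-set! bv 0 value order)
    bv))

;; #x01020304 is an arbitrary sample value, not taken from any locale file.
(define le (u32->bytes #x01020304 (endianness little)))
(define be (u32->bytes #x01020304 (endianness big)))

;; The same integer yields different bytes depending on byte order...
(format #t "little: ~a~%big:    ~a~%"
        (bytevector->u8-list le) (bytevector->u8-list be))

;; ...and one layout is exactly the byte-reversed form of the other.
(format #t "byte-swapped match: ~a~%"
        (equal? (reverse (bytevector->u8-list le))
                (bytevector->u8-list be)))

;; UTF-8 text is a sequence of single bytes, so it has no byte order to flip.
(format #t "text bytes: ~a~%"
        (bytevector->u8-list (string->utf8 "C.UTF-8")))
```

Run with plain `guile`, this prints `(4 3 2 1)` for the little-endian layout and `(1 2 3 4)` for the big-endian one, while the text bytes are the same on any machine. Whether every differing byte in the powerpc-linux tarball is explained this way has not been checked here.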
Hello!

Efraim Flashner <efraim@flashner.co.il> wrote:

> On Thu, Dec 07, 2023 at 10:26:36PM +0100, Ludovic Courtès wrote:

[...]

>> I was told on #glibc that (1) there’s no ‘localedef’ for the build
>> machine produced during cross-compilation, and (2) that more generally,
>> there’s no way to cross-build locale data, that endianness and other
>> things may matter.
>>
>> I suspect #2 was about the locale archive and not locale data, because
>> evidence suggests that locale data is system-independent:
>>
>> --8<---------------cut here---------------start------------->8---
>> $ for s in aarch64-linux powerpc64le-linux armhf-linux i686-linux ; do diff -r $(guix build glibc-locales@2.35) $(guix build glibc-locales@2.35 -s "$s") && echo "$s same as x86_64-linux" ; done
>> aarch64-linux same as x86_64-linux
>> powerpc64le-linux same as x86_64-linux
>> armhf-linux same as x86_64-linux
>> i686-linux same as x86_64-linux
>> $ guix describe
>> guix 6e2dd51
>>   repository URL: https://git.savannah.gnu.org/git/guix.git
>>   branch: master
>>   commit: 6e2dd51df5f3f51e9056dd4f2e1b036195ab3caa
>> --8<---------------cut here---------------end--------------->8---
>>
>> Efraim, could you check against powerpc-linux, which is the only
>> big-endian target we +/- support?
>
> I found a difference in almost every file. The tarball of the locales
> was too big to attach so I've uploaded it here¹. Looking at it in
> diffoscope it looked like most of the data that looked human readable
> was the same, but there was some endian switching with the other data
> bits. So without actually checking other big endian systems it looks
> like we could set target #f for the locales, but for those that share
> their endianness.

OK, interesting, thanks for checking!

So we won’t be able to reliably provide C.UTF-8 in cross-compiled libcs.
Maybe not a big problem, but it does mean that cross-compiled code will
be “less capable” because of that.

Ludo’.
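One way to read the conclusion above is that the locale-installing phase would simply be left out of cross builds. The following standalone Guile sketch illustrates that selection only; `extra-glibc-phases` is a hypothetical name, and its `target` argument merely imitates the convention of Guix's `%current-target-system` (a triplet string when cross-compiling, `#f` for a native build). It is not the code that went into the patch.

```scheme
;; Hypothetical helper: return the names of extra phases to add to glibc.
;; TARGET is a triplet string when cross-compiling, or #f for a native build.
(define (extra-glibc-phases target)
  (if target
      '()                          ;cross build: the new 'localedef' cannot run
      '(install-utf8-c-locale)))   ;native build: keep the phase from the patch

(display (extra-glibc-phases #f))
(newline)
(display (extra-glibc-phases "aarch64-linux-gnu"))
(newline)
```

In a real package definition the same test would presumably guard the `add-after` clause itself, at the cost Ludovic describes: cross-compiled libcs would ship without C.UTF-8.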
On Sat, Dec 09, 2023 at 10:41:41PM +0100, Ludovic Courtès wrote:
> Hello!
>
> Efraim Flashner <efraim@flashner.co.il> wrote:
>
> > On Thu, Dec 07, 2023 at 10:26:36PM +0100, Ludovic Courtès wrote:
>
> [...]
>
> >> I was told on #glibc that (1) there’s no ‘localedef’ for the build
> >> machine produced during cross-compilation, and (2) that more generally,
> >> there’s no way to cross-build locale data, that endianness and other
> >> things may matter.
> >>
> >> I suspect #2 was about the locale archive and not locale data, because
> >> evidence suggests that locale data is system-independent:
> >>
> >> --8<---------------cut here---------------start------------->8---
> >> $ for s in aarch64-linux powerpc64le-linux armhf-linux i686-linux ; do diff -r $(guix build glibc-locales@2.35) $(guix build glibc-locales@2.35 -s "$s") && echo "$s same as x86_64-linux" ; done
> >> aarch64-linux same as x86_64-linux
> >> powerpc64le-linux same as x86_64-linux
> >> armhf-linux same as x86_64-linux
> >> i686-linux same as x86_64-linux
> >> $ guix describe
> >> guix 6e2dd51
> >>   repository URL: https://git.savannah.gnu.org/git/guix.git
> >>   branch: master
> >>   commit: 6e2dd51df5f3f51e9056dd4f2e1b036195ab3caa
> >> --8<---------------cut here---------------end--------------->8---
> >>
> >> Efraim, could you check against powerpc-linux, which is the only
> >> big-endian target we +/- support?
> >
> > I found a difference in almost every file. The tarball of the locales
> > was too big to attach so I've uploaded it here¹. Looking at it in
> > diffoscope it looked like most of the data that looked human readable
> > was the same, but there was some endian switching with the other data
> > bits. So without actually checking other big endian systems it looks
> > like we could set target #f for the locales, but for those that share
> > their endianness.
>
> OK, interesting, thanks for checking!
>
> So we won’t be able to reliably provide C.UTF-8 in cross-compiled libcs.
> Maybe not a big problem, but it does mean that cross-compiled code will
> be “less capable” because of that.

We should be able to create some monstrosity of a #:target field to say
that within an endianness group the target is #f, but otherwise it is
(%current-target-system). That should work for all the locale generators,
actually.

Untested:

  (if (and (target-little-endian? (%current-system))
           (target-little-endian? (%current-target-system)))
      #f
      (%current-target-system))

Although if we are going to rely on target-little-endian? we might want
to define that field in (guix platform) too, so we don't assign an
endianness to 8-bit controllers or embedded systems.
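The "endianness group" idea can be sketched in plain Guile without depending on Guix. Everything below is an assumption made for illustration: `%big-endian-prefixes`, `big-endian?` and `locale-build-target` are hypothetical names, and the prefix list only covers powerpc-linux, the one big-endian target mentioned in this thread. Unlike the untested snippet, the sketch also returns `#f` for a native build and when both sides are big-endian.

```scheme
(use-modules (srfi srfi-1))             ;for 'any'

;; Hypothetical, deliberately incomplete list: only 32-bit powerpc is
;; treated as big-endian, as in this thread.
(define %big-endian-prefixes '("powerpc-"))

;; Hypothetical stand-in for a predicate such as 'target-little-endian?'.
(define (big-endian? system)
  (any (lambda (prefix) (string-prefix? prefix system))
       %big-endian-prefixes))

;; Return #f (build the locales natively) when TARGET is #f or shares the
;; endianness of SYSTEM; otherwise return TARGET unchanged.
(define (locale-build-target system target)
  (cond ((not target) #f)
        ((eq? (big-endian? system) (big-endian? target)) #f)
        (else target)))

(display (locale-build-target "x86_64-linux" "aarch64-linux-gnu"))
(newline)
(display (locale-build-target "x86_64-linux" "powerpc-linux-gnu"))
(newline)
(display (locale-build-target "x86_64-linux" #f))
(newline)
```

With these definitions the first and third calls give `#f` and the second gives the powerpc triplet. Whether building natively is actually safe within an endianness group rests on the diff results earlier in the thread, which only covered a handful of systems.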
diff --git a/gnu/packages/base.scm b/gnu/packages/base.scm
index c5eac8a2da..985cd627fe 100644
--- a/gnu/packages/base.scm
+++ b/gnu/packages/base.scm
@@ -1023,6 +1023,21 @@ (define-public glibc
                                  (map (cut string-append slib "/" <>)
                                       files))))))
 
+                   (add-after 'install 'install-utf8-c-locale
+                     (lambda* (#:key outputs #:allow-other-keys)
+                       ;; Install the C.UTF-8 locale so there's always a UTF-8
+                       ;; locale around.
+                       (let* ((out (assoc-ref outputs "out"))
+                              (bin (string-append out "/bin"))
+                              (locale (string-append out "/lib/locale/"
+                                                     ,(package-version
+                                                       this-package))))
+                         (mkdir-p locale)
+                         (invoke (string-append bin "/localedef")
+                                 "--no-archive" "--prefix" locale
+                                 "-i" "C" "-f" "UTF-8"
+                                 (string-append locale "/C.UTF-8")))))
+
                    ,@(if (target-hurd?)
                          '((add-after 'install 'augment-libc.so
                              (lambda* (#:key outputs #:allow-other-keys)
@@ -1108,11 +1123,19 @@ (define-public glibc-2.35
                                        "glibc-hurd-clock_t_centiseconds.patch"
                                        "glibc-hurd-clock_gettime_monotonic.patch"
                                        "glibc-hurd-mach-print.patch"
-                                       "glibc-hurd-gettyent.patch"))))))
+                                       "glibc-hurd-gettyent.patch"))))
+    (arguments
+     (substitute-keyword-arguments (package-arguments glibc)
+       ((#:phases phases)
+        ;; The C.UTF-8 fails to build in glibc 2.35:
+        ;; <https://sourceware.org/bugzilla/show_bug.cgi?id=28861>.
+        ;; It is missing altogether in versions earlier than 2.35.
+        `(modify-phases ,phases
+           (delete 'install-utf8-c-locale)))))))
 
 (define-public glibc-2.33
   (package
-    (inherit glibc)
+    (inherit glibc-2.35)
     (name "glibc")
     (version "2.33")
     (source (origin
@@ -1139,7 +1162,7 @@ (define-public glibc-2.33
 
 (define-public glibc-2.32
   (package
-    (inherit glibc)
+    (inherit glibc-2.35)
     (version "2.32")
     (source (origin
               (inherit (package-source glibc))
@@ -1194,7 +1217,7 @@ (define-public glibc-2.32
 
 (define-public glibc-2.31
   (package
-    (inherit glibc)
+    (inherit glibc-2.35)
     (version "2.31")
     (source (origin
               (inherit (package-source glibc))