[bug#67686,core-updates,4/5] gnu: glibc: Install C.UTF-8 locale.

Message ID 3e187cf8646059a513b502a17abe9f88daae6f6b.1701943221.git.ludo@gnu.org
State New
Series Update glibc to 2.38; make C.UTF-8 always available

Commit Message

Ludovic Courtès Dec. 7, 2023, 10:22 a.m. UTC
* gnu/packages/base.scm (glibc)[arguments]: Add ‘install-utf8-c-locale’
phase.
(glibc-2.35)[arguments]: Delete ‘install-utf8-c-locale’ phase.
(glibc-2.33, glibc-2.32, glibc-2.31): Inherit from ‘glibc-2.35’.

Change-Id: I7ba515184c7b7c40eaefd355639ffef8eeca66d8
---
 gnu/packages/base.scm | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

Comments

Ludovic Courtès Dec. 7, 2023, 10:30 a.m. UTC | #1
Ludovic Courtès <ludo@gnu.org> skribis:

> +                     ;; Install the C.UTF-8 locale so there's always a UTF-8
> +                     ;; locale around.
> +                     (let* ((out (assoc-ref outputs "out"))
> +                            (bin (string-append out "/bin"))
> +                            (locale (string-append out "/lib/locale/"
> +                                                   ,(package-version
> +                                                     this-package))))
> +                       (mkdir-p locale)
> +                       (invoke (string-append bin "/localedef")
> +                               "--no-archive" "--prefix" locale
> +                               "-i" "C" "-f" "UTF-8"
> +                               (string-append locale "/C.UTF-8")))))

I realize now that this cannot work when cross-compiling, because
this ‘localedef’ binary is not executable on the build machine.

I suspect libc builds an additional ‘localedef’ for the build machine
but I’m not sure where it is, hmm…

Ludo’.
Ludovic Courtès Dec. 7, 2023, 9:26 p.m. UTC | #2
Ludovic Courtès <ludo@gnu.org> skribis:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> +                     ;; Install the C.UTF-8 locale so there's always a UTF-8
>> +                     ;; locale around.
>> +                     (let* ((out (assoc-ref outputs "out"))
>> +                            (bin (string-append out "/bin"))
>> +                            (locale (string-append out "/lib/locale/"
>> +                                                   ,(package-version
>> +                                                     this-package))))
>> +                       (mkdir-p locale)
>> +                       (invoke (string-append bin "/localedef")
>> +                               "--no-archive" "--prefix" locale
>> +                               "-i" "C" "-f" "UTF-8"
>> +                               (string-append locale "/C.UTF-8")))))
>
> I realize now that this cannot work when cross-compiling, because
> this ‘localedef’ binary is not executable on the build machine.
>
> I suspect libc builds an additional ‘localedef’ for the build machine
> but I’m not sure where it is, hmm…

I was told on #glibc that (1) there’s no ‘localedef’ for the build
machine produced during cross-compilation, and (2) that more generally,
there’s no way to cross-build locale data, that endianness and other
things may matter.

I suspect #2 was about the locale archive and not locale data, because
evidence suggests that locale data is system-independent:

--8<---------------cut here---------------start------------->8---
$ for s in aarch64-linux powerpc64le-linux armhf-linux i686-linux ; do diff -r $(guix build glibc-locales@2.35) $(guix build glibc-locales@2.35 -s "$s") && echo "$s same as x86_64-linux" ; done
aarch64-linux same as x86_64-linux
powerpc64le-linux same as x86_64-linux
armhf-linux same as x86_64-linux
i686-linux same as x86_64-linux
$ guix describe
  guix 6e2dd51
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 6e2dd51df5f3f51e9056dd4f2e1b036195ab3caa
--8<---------------cut here---------------end--------------->8---

Efraim, could you check against powerpc-linux, which is the only
big-endian target we +/- support?

Ludo’.
Efraim Flashner Dec. 9, 2023, 4:33 p.m. UTC | #3
On Thu, Dec 07, 2023 at 10:26:36PM +0100, Ludovic Courtès wrote:
> Ludovic Courtès <ludo@gnu.org> skribis:
> 
> > Ludovic Courtès <ludo@gnu.org> skribis:
> >
> >> +                     ;; Install the C.UTF-8 locale so there's always a UTF-8
> >> +                     ;; locale around.
> >> +                     (let* ((out (assoc-ref outputs "out"))
> >> +                            (bin (string-append out "/bin"))
> >> +                            (locale (string-append out "/lib/locale/"
> >> +                                                   ,(package-version
> >> +                                                     this-package))))
> >> +                       (mkdir-p locale)
> >> +                       (invoke (string-append bin "/localedef")
> >> +                               "--no-archive" "--prefix" locale
> >> +                               "-i" "C" "-f" "UTF-8"
> >> +                               (string-append locale "/C.UTF-8")))))
> >
> > I realize now that this cannot work when cross-compiling, because
> > this ‘localedef’ binary is not executable on the build machine.
> >
> > I suspect libc builds an additional ‘localedef’ for the build machine
> > but I’m not sure where it is, hmm…
> 
> I was told on #glibc that (1) there’s no ‘localedef’ for the build
> machine produced during cross-compilation, and (2) that more generally,
> there’s no way to cross-build locale data, that endianness and other
> things may matter.
> 
> I suspect #2 was about the locale archive and not locale data, because
> evidence suggests that locale data is system-independent:
> 
> --8<---------------cut here---------------start------------->8---
> $ for s in aarch64-linux powerpc64le-linux armhf-linux i686-linux ; do diff -r $(guix build glibc-locales@2.35) $(guix build glibc-locales@2.35 -s "$s") && echo "$s same as x86_64-linux" ; done
> aarch64-linux same as x86_64-linux
> powerpc64le-linux same as x86_64-linux
> armhf-linux same as x86_64-linux
> i686-linux same as x86_64-linux
> $ guix describe
>   guix 6e2dd51
>     repository URL: https://git.savannah.gnu.org/git/guix.git
>     branch: master
>     commit: 6e2dd51df5f3f51e9056dd4f2e1b036195ab3caa
> --8<---------------cut here---------------end--------------->8---
> 
> Efraim, could you check against powerpc-linux, which is the only
> big-endian target we +/- support?

I found a difference in almost every file. The tarball of the locales
was too big to attach so I've uploaded it here¹.  Looking at it in
diffoscope it looked like most of the data that looked human readable
was the same, but there was some endian switching with the other data
bits.  So without actually checking other big-endian systems, it looks
like we could set target #f for the locales, but only for targets that
share the build machine's endianness.

¹ https://flashner.co.il/~efraim/glibc-locales-2.35-powerpc-linux.tar.xz
Ludovic Courtès Dec. 9, 2023, 9:41 p.m. UTC | #4
Hello!

Efraim Flashner <efraim@flashner.co.il> skribis:

> On Thu, Dec 07, 2023 at 10:26:36PM +0100, Ludovic Courtès wrote:

[...]

>> I was told on #glibc that (1) there’s no ‘localedef’ for the build
>> machine produced during cross-compilation, and (2) that more generally,
>> there’s no way to cross-build locale data, that endianness and other
>> things may matter.
>> 
>> I suspect #2 was about the locale archive and not locale data, because
>> evidence suggests that locale data is system-independent:
>> 
>> --8<---------------cut here---------------start------------->8---
>> $ for s in aarch64-linux powerpc64le-linux armhf-linux i686-linux ; do diff -r $(guix build glibc-locales@2.35) $(guix build glibc-locales@2.35 -s "$s") && echo "$s same as x86_64-linux" ; done
>> aarch64-linux same as x86_64-linux
>> powerpc64le-linux same as x86_64-linux
>> armhf-linux same as x86_64-linux
>> i686-linux same as x86_64-linux
>> $ guix describe
>>   guix 6e2dd51
>>     repository URL: https://git.savannah.gnu.org/git/guix.git
>>     branch: master
>>     commit: 6e2dd51df5f3f51e9056dd4f2e1b036195ab3caa
>> --8<---------------cut here---------------end--------------->8---
>> 
>> Efraim, could you check against powerpc-linux, which is the only
>> big-endian target we +/- support?
>
> I found a difference in almost every file. The tarball of the locales
> was too big to attach so I've uploaded it here¹.  Looking at it in
> diffoscope it looked like most of the data that looked human readable
> was the same, but there was some endian switching with the other data
> bits.  So without actually checking other big-endian systems, it looks
> like we could set target #f for the locales, but only for targets that
> share the build machine's endianness.

OK, interesting, thanks for checking!

So we won’t be able to reliably provide C.UTF-8 in cross-compiled libcs.
Maybe not a big problem, but it does mean that cross-compiled code will
be “less capable” because of that.
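
A minimal sketch of what skipping the phase in cross builds could look
like, written so that it can be evaluated with plain Guile.  The
parameter below is only a stand-in for ‘%current-target-system’ from
(guix utils); ‘%install-utf8-c-locale-phase’, ‘utf8-c-locale-phases’ and
the ‘phase-body’ placeholder are hypothetical names used for
illustration, not part of the patch:

```scheme
;; Stand-in for the '%current-target-system' parameter of (guix utils), so
;; that this sketch can be evaluated outside of Guix.  It holds #f for a
;; native build and a triplet string when cross-compiling.
(define %current-target-system (make-parameter #f))

;; The phase from the patch, kept as quoted data the way it appears inside
;; the quasiquoted 'modify-phases' form.  'phase-body' is a placeholder
;; symbol standing for the 'lambda*' shown in the patch.
(define %install-utf8-c-locale-phase
  '(add-after 'install 'install-utf8-c-locale phase-body))

(define (utf8-c-locale-phases)
  "Return the list of phases to splice in: empty when cross-compiling."
  (if (%current-target-system)
      '()                          ;the target's 'localedef' cannot run here
      (list %install-utf8-c-locale-phase)))
```

In the package itself this would presumably be spliced in with
‘,@(if (%current-target-system) '() '(...))’, in the same way as the
existing ‘,@(if (target-hurd?) ...)’ form right below the new phase.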

Ludo’.
Efraim Flashner Dec. 10, 2023, 7:24 a.m. UTC | #5
On Sat, Dec 09, 2023 at 10:41:41PM +0100, Ludovic Courtès wrote:
> Hello!
> 
> Efraim Flashner <efraim@flashner.co.il> skribis:
> 
> > On Thu, Dec 07, 2023 at 10:26:36PM +0100, Ludovic Courtès wrote:
> 
> [...]
> 
> >> I was told on #glibc that (1) there’s no ‘localedef’ for the build
> >> machine produced during cross-compilation, and (2) that more generally,
> >> there’s no way to cross-build locale data, that endianness and other
> >> things may matter.
> >> 
> >> I suspect #2 was about the locale archive and not locale data, because
> >> evidence suggests that locale data is system-independent:
> >> 
> >> --8<---------------cut here---------------start------------->8---
> >> $ for s in aarch64-linux powerpc64le-linux armhf-linux i686-linux ; do diff -r $(guix build glibc-locales@2.35) $(guix build glibc-locales@2.35 -s "$s") && echo "$s same as x86_64-linux" ; done
> >> aarch64-linux same as x86_64-linux
> >> powerpc64le-linux same as x86_64-linux
> >> armhf-linux same as x86_64-linux
> >> i686-linux same as x86_64-linux
> >> $ guix describe
> >>   guix 6e2dd51
> >>     repository URL: https://git.savannah.gnu.org/git/guix.git
> >>     branch: master
> >>     commit: 6e2dd51df5f3f51e9056dd4f2e1b036195ab3caa
> >> --8<---------------cut here---------------end--------------->8---
> >> 
> >> Efraim, could you check against powerpc-linux, which is the only
> >> big-endian target we +/- support?
> >
> > I found a difference in almost every file. The tarball of the locales
> > was too big to attach so I've uploaded it here¹.  Looking at it in
> > diffoscope it looked like most of the data that looked human readable
> > was the same, but there was some endian switching with the other data
> > bits.  So without actually checking other big-endian systems, it looks
> > like we could set target #f for the locales, but only for targets that
> > share the build machine's endianness.
> 
> OK, interesting, thanks for checking!
> 
> So we won’t be able to reliably provide C.UTF-8 in cross-compiled libcs.
> Maybe not a big problem, but it does mean that cross-compiled code will
> be “less capable” because of that.

We should be able to create some monstrosity of a #:target field to say
that within an endianness group target is #f but otherwise is
(%current-target-system). Should work for all the locale generators
actually.

untested:

(if (and (target-little-endian? (%current-system))
         (target-little-endian? (%current-target-system)))
    #f
    (%current-target-system))

Although if we are going to rely on target-little-endian? we might want
to define that field in (guix platform) too, so we don't assign an
endianness to 8-bit controllers or embedded systems.
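
An equally untested sketch of a variant that compares the endianness of
both sides instead of requiring both to be little-endian, and that also
returns #f when no target is set.  It is plain Guile so it can be tried
outside of Guix; ‘big-endian-system?’ and ‘locale-build-target’ are
hypothetical names, not existing Guix procedures, and the prefix test
assumes that only ‘powerpc-’ systems are big-endian, so it would
misclassify any other big-endian target:

```scheme
;; Hypothetical helper: treat only "powerpc-..." systems and triplets as
;; big-endian.  "powerpc64le-linux" does not match the prefix and is
;; therefore classified as little-endian.
(define (big-endian-system? system)
  (string-prefix? "powerpc-" system))

;; SYSTEM is the build system (a string such as "x86_64-linux"); TARGET is
;; the cross-compilation triplet, or #f for a native build.
(define (locale-build-target system target)
  "Return #f when TARGET is #f or shares SYSTEM's endianness, else TARGET."
  (if (or (not target)
          (eq? (big-endian-system? system)
               (big-endian-system? target)))
      #f                                ;build the locales natively
      target))                          ;endianness differs: keep the target
```

With these definitions, (locale-build-target "x86_64-linux"
"aarch64-linux-gnu") gives #f, while (locale-build-target "x86_64-linux"
"powerpc-linux-gnu") gives back the triplet.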

Patch

diff --git a/gnu/packages/base.scm b/gnu/packages/base.scm
index c5eac8a2da..985cd627fe 100644
--- a/gnu/packages/base.scm
+++ b/gnu/packages/base.scm
@@ -1023,6 +1023,21 @@  (define-public glibc
                                          (map (cut string-append slib "/" <>)
                                               files))))))
 
+                 (add-after 'install 'install-utf8-c-locale
+                   (lambda* (#:key outputs #:allow-other-keys)
+                     ;; Install the C.UTF-8 locale so there's always a UTF-8
+                     ;; locale around.
+                     (let* ((out (assoc-ref outputs "out"))
+                            (bin (string-append out "/bin"))
+                            (locale (string-append out "/lib/locale/"
+                                                   ,(package-version
+                                                     this-package))))
+                       (mkdir-p locale)
+                       (invoke (string-append bin "/localedef")
+                               "--no-archive" "--prefix" locale
+                               "-i" "C" "-f" "UTF-8"
+                               (string-append locale "/C.UTF-8")))))
+
                  ,@(if (target-hurd?)
                        '((add-after 'install 'augment-libc.so
                            (lambda* (#:key outputs #:allow-other-keys)
@@ -1108,11 +1123,19 @@  (define-public glibc-2.35
                                        "glibc-hurd-clock_t_centiseconds.patch"
                                        "glibc-hurd-clock_gettime_monotonic.patch"
                                        "glibc-hurd-mach-print.patch"
-                                       "glibc-hurd-gettyent.patch"))))))
+                                       "glibc-hurd-gettyent.patch"))))
+    (arguments
+     (substitute-keyword-arguments (package-arguments glibc)
+       ((#:phases phases)
+        ;; The C.UTF-8 locale fails to build in glibc 2.35:
+        ;; <https://sourceware.org/bugzilla/show_bug.cgi?id=28861>.
+        ;; It is missing altogether in versions earlier than 2.35.
+        `(modify-phases ,phases
+           (delete 'install-utf8-c-locale)))))))
 
 (define-public glibc-2.33
   (package
-    (inherit glibc)
+    (inherit glibc-2.35)
     (name "glibc")
     (version "2.33")
     (source (origin
@@ -1139,7 +1162,7 @@  (define-public glibc-2.33
 
 (define-public glibc-2.32
   (package
-    (inherit glibc)
+    (inherit glibc-2.35)
     (version "2.32")
     (source (origin
               (inherit (package-source glibc))
@@ -1194,7 +1217,7 @@  (define-public glibc-2.32
 
 (define-public glibc-2.31
   (package
-    (inherit glibc)
+    (inherit glibc-2.35)
     (version "2.31")
     (source (origin
               (inherit (package-source glibc))