diff mbox series

[bug#44075] gnu: Add make-glibc-locales-collection.

Message ID 20201019140236.GF9117@E5400
State New
Headers show
Series [bug#44075] gnu: Add make-glibc-locales-collection. | expand

Checks

Context Check Description
cbaines/issue success View issue
cbaines/comparison success View comparision
cbaines/git branch success View Git branch
cbaines/applying patch success View Laminar job

Commit Message

Efraim Flashner Oct. 19, 2020, 2:02 p.m. UTC
On Mon, Oct 19, 2020 at 03:17:54PM +0200, Miguel Ángel Arruga Vivas wrote:
> Hi Efraim,
> 
> I've been taking a look into your patch.  One issue are the comments
> about utf8 and UTF-8, as the issue is already explained in
> make-glibc[-utf8]-locales.

Thanks for taking a look. For the utf8 vs UTF-8 there are a couple of
comments in the code:
The above phase does not install locales with names using
the "normalized codeset."  Thus, create symlinks like:
en_US.utf8 -> en_US.UTF-8

and also:
For backward compatibility with Guix
<= 0.8.3, add "xx_YY.UTF-8".

When I check on one of my Debian boxes:
$localectl list-locales
C.UTF-8
en_US.utf8

I've learned that 'C.UTF-8' isn't from upstream, Debian has a patch for
it. Having written this patch a week or so ago and returning to it I
can't tell you which is the correct one. I think it's best to offer both
so it's not confusing which is correct. That works for the logic for
accepting either in the list of locales, but I'm concerned that skipping
the symlink to the other one could cause problems.

> 
> Other point is:
> 
> Efraim Flashner <efraim@flashner.co.il> writes:
> > +(define* (make-glibc-locales-collection
> > +           glibc
> > +           #:optional (locales
> > +                        '(list "en_US.utf8" "en_US.ISO-8859-1")))
> > (... Removed for clarity ...)
> > +             ,locales)
> 
> I would have used list there like (list ,@locales) or '(,@locales), this
> looks a bit odd to my eyes at least.  I'd expect this kind of calling code:
> 
> (let ((locales '("de_CH.utf8" ... "de_DE.utf8"))
>       (my-glibc ...))
>   (make-glibc-locales-collection myglibc locales))
> 
> Enforcing an extra quotation for no real reason on the calling site, as
> strings are self-evaluating objects, and the use of the symbol list,
> whose meaning depends on other context of execution, doesn't seem
> necessary.  Even worse, my example would raise an error as "de_CH.utf8"
> is not a procedure.

My scheme-foo isn't terribly strong, sometimes I just hack at it until
the code does what I want, and this is one of those times. I've changed
it so that locales takes a list and not an item that is a list.

> What do you think about replacing make-glibc-utf8-locales with a call of
> the new function (using that code) ensuring that the generated
> derivation stays the same for that case (i.e. it's optimized for the
> UTF-8 case)?

This is what I originally wanted to do, but there's a glibc-locales
buried in the bootstrap path so it's not so easy to just swap it out. I
can make the change in core-updates. I'll play around with it and see if
I can come out with the same derivation using a different function, but
I'm not expecting it to turn out identical.

> 
> Happy hacking!
> Miguel
>

Comments

Miguel Arruga Vivas Oct. 19, 2020, 5:43 p.m. UTC | #1
Hello,

Efraim Flashner <efraim@flashner.co.il> writes:
> Thanks for taking a look.

No problem at all, I was thinking about something along these lines too,
but then I saw your patch. :-)

> For the utf8 vs UTF-8 there are a couple of comments in the code:
> The above phase does not install locales with names using
> the "normalized codeset."  Thus, create symlinks like:
> en_US.utf8 -> en_US.UTF-8
>
> and also:
> For backward compatibility with Guix
> <= 0.8.3, add "xx_YY.UTF-8".

Yes, what I mean is that the comments along the code may need to be
clarified, but adding a "nobody knows" doesn't add much information.
The actual source[1] says that the correct value is utf8, following
the rules for the 'normalized codeset' naming, that I copy here:

  1. Remove all characters besides numbers and letters.
  2. Fold letters to lowercase.
  3. If the same only contains digits prepend the string ‘"iso"’.

We should stick to that naming regarding libc locales (note to self),
even though we keep the links for compatibility.  That includes the
other locale at en_us-glibc-locales.

>> What do you think about replacing make-glibc-utf8-locales with a call of
>> the new function (using that code) ensuring that the generated
>> derivation stays the same for that case (i.e. it's optimized for the
>> UTF-8 case)?
>
> This is what I originally wanted to do, but there's a glibc-locales
> buried in the bootstrap path so it's not so easy to just swap it out.

I guess you mean glibc-utf8-locales, if not I'm missing where that
reference could come from.  That's why I insist on leaving exactly the
same derivation.

> I can make the change in core-updates. I'll play around with it and
> see if I can come out with the same derivation using a different
> function, but I'm not expecting it to turn out identical.

The colour of my lisp-rate belt isn't even close to some you can see
around here but I could bring you some help if you want, because I think
it's easier to do it without any change in current packages than it
might sound.  Not the definitions in scheme, of course, I mean the
derivations the daemon actually sees.

Said this, I've seen that [single-]locale-directory does mostly the
same, and there is a build-locale function in gnu/build/locale.scm... so
I'm starting to see as a better idea to clean up glibc-utf8-locales up
in core-updates using the latter, as it would lead to cleaner code for
the builder.

Could you check that and tell if you consider feasible what I propose?

Happy hacking!
Miguel

[1] info "(libc)Using gettextized software"
Efraim Flashner Oct. 21, 2020, 7:01 a.m. UTC | #2
On Mon, Oct 19, 2020 at 07:43:27PM +0200, Miguel Ángel Arruga Vivas wrote:
> Hello,
> 
> Efraim Flashner <efraim@flashner.co.il> writes:
> > Thanks for taking a look.
> 
> No problem at all, I was thinking about something along these lines too,
> but then I saw your patch. :-)

Your patch was what got me to actually send mine to guix-patches :)

> > For the utf8 vs UTF-8 there are a couple of comments in the code:
> > The above phase does not install locales with names using
> > the "normalized codeset."  Thus, create symlinks like:
> > en_US.utf8 -> en_US.UTF-8
> >
> > and also:
> > For backward compatibility with Guix
> > <= 0.8.3, add "xx_YY.UTF-8".
> 
> Yes, what I mean is that the comments along the code may need to be
> clarified, but adding a "nobody knows" doesn't add much information.

That is a good point. I'll change it to something actually useful,
mentioning that there's a convention but that people often get it wrong.

> The actual source[1] says that the correct value is utf8, following
> the rules for the 'normalized codeset' naming, that I copy here:
> 
>   1. Remove all characters besides numbers and letters.
>   2. Fold letters to lowercase.
>   3. If the same only contains digits prepend the string ‘"iso"’.
> 
> We should stick to that naming regarding libc locales (note to self),
> even though we keep the links for compatibility.  That includes the
> other locale at en_us-glibc-locales.

I agree it should be renamed to en-us. I have some bikeshedding to think
about for en-us-glibc-locales vs glibc-locales-en-us. And I'll also test
out one for glibc-locales-guile-tests.

> >> What do you think about replacing make-glibc-utf8-locales with a call of
> >> the new function (using that code) ensuring that the generated
> >> derivation stays the same for that case (i.e. it's optimized for the
> >> UTF-8 case)?
> >
> > This is what I originally wanted to do, but there's a glibc-locales
> > buried in the bootstrap path so it's not so easy to just swap it out.
> 
> I guess you mean glibc-utf8-locales, if not I'm missing where that
> reference could come from.  That's why I insist on leaving exactly the
> same derivation.

I meant in commencement.scm. There's a glibc-utf8-locales-final near the
definition for %boot5-inputs.

> > I can make the change in core-updates. I'll play around with it and
> > see if I can come out with the same derivation using a different
> > function, but I'm not expecting it to turn out identical.
> 
> The colour of my lisp-rate belt isn't even close to some you can see
> around here but I could bring you some help if you want, because I think
> it's easier to do it without any change in current packages than it
> might sound.  Not the definitions in scheme, of course, I mean the
> derivations the daemon actually sees.
> 
> Said this, I've seen that [single-]locale-directory does mostly the
> same, and there is a build-locale function in gnu/build/locale.scm... so
> I'm starting to see as a better idea to clean up glibc-utf8-locales up
> in core-updates using the latter, as it would lead to cleaner code for
> the builder.
> 
> Could you check that and tell if you consider feasible what I propose?

I hadn't seen gnu/build/locales.scm. I'll have to see what we can do
about using it. We normally don't allow cross-over from the build side
to the packages side.

> Happy hacking!
> Miguel
> 
> [1] info "(libc)Using gettextized software"
Miguel Arruga Vivas Oct. 21, 2020, 11:18 p.m. UTC | #3
Hello,

Efraim Flashner <efraim@flashner.co.il> writes:
> Your patch was what got me to actually send mine to guix-patches :)

Thanks. :-)

> That is a good point. I'll change it to something actually useful,
> mentioning that there's a convention but that people often get it wrong.

Great, the more knowledge, the easier to overcome that common
misunderstanding. :-)

> I agree it should be renamed to en-us. I have some bikeshedding to think
> about for en-us-glibc-locales vs glibc-locales-en-us. And I'll also test
> out one for glibc-locales-guile-tests.

I meant the ISO-8859-1, the locale creation should be en_EN.iso88591,
the locale name can be en_EN.88591 too...

Although in that case shouldn't be glibc-en-us-locales?  I'd like green
lights for the bike shed, please. ;-)

> I hadn't seen gnu/build/locales.scm. I'll have to see what we can do
> about using it. We normally don't allow cross-over from the build side
> to the packages side.

It is already being used in the build of glibc-locales and I guess it
would be harmless to move it to guix build if that is a problem, but I
think it can be used as it is right now, even I didn't check yet.  In
this for-each(gnu/packages/base.scm:1084), for example:
------------------------->8-------------begin----------------------
(for-each (lambda (directory)
            (let*-values (((base)
                           (basename directory))
                          ((name codeset)
                           (locale->name+codeset base))
                          ((normalized)
                           (normalize-codeset codeset)))
              (unless (string=? codeset normalized)
                (symlink base
                         (string-append (dirname directory)
                                        "/" name "."
                                        normalized)))))
          locales)
-------------------------8<--------------end-----------------------

I would write something like this in the builder code:
------------------------->8-------------begin----------------------
(define (locale-builder directory)
  (lambda (locale)
    (let-values (((name codeset)
                  (locale->name+codeset)))
      (if codeset
          (build-locale name
                        #:directory directory
                        #:codeset (normalize-codeset codeset))
          (build-locale name
                        #:directory directory
                        #:codeset "utf8")))))
(let ...
  (for-each (locale-builder localedir) '(,@locales))
  ...)
-------------------------8<--------------end-----------------------

I leave you here the function I was hacking apart (please, do not
confuse with working code ;-)) if you want to take any idea other from
it:
------------------------->8-------------begin----------------------
(define*-public (make-glibc-selected-locales
                 glibc
                 #:key
                 (suffix "-utf8-locales")
                 (locales %default-locales)
                 (alias %default-alias))
  (package
    (name (string-append (package-name glibc) suffix))
    (version (package-version glibc))
    (source #f)
    (build-system trivial-build-system)
    (arguments
     `(#:modules ((guix build utils) (gnu build locale))
       #:builder
       (begin
         (use-modules (guix build utils) (gnu build locale))
         ;; Print the common alias for locale
         (define (print-locale-alias locale)
           (let-values (((name codeset)
                         (locale->name+codeset)))
             (if codeset
                 (or (string-equal codeset (normalize-codeset codeset))
                     (format #t "~a.~a ~a.~a~%" name codeset
                             name (normalize-codeset codeset)))
                 (format #t "~a.UTF-8 ~a.utf8" name name))))

         ;; Generate a function that builds the locale into directory.
         (define (locale-builder directory)
           (lambda (locale)
             (let-values (((name codeset)
                           (locale->name+codeset)))
               (if codeset
                   (build-locale name
                                 #:directory directory
                                 #:codeset (normalize-codeset codeset))
                   (build-locale name
                                 #:directory directory
                                 #:codeset "utf8")))))

         ;; Do the job.
         (let* ((libc      (assoc-ref %build-inputs "glibc"))
                (gzip      (assoc-ref %build-inputs "gzip"))
                (out       (assoc-ref %outputs "out"))
                (localedir (string-append out "/lib/locale/"
                                          ,(version-major+minor version)))
                (alias     (string-append localedir "/locale.alias"))
                (locales   '(,@locales)))

           ;; 'localedef' needs 'gzip'.
           (setenv "PATH" (string-append libc "/bin:" gzip "/bin"))

           (mkdir-p localedir)
           ;; Generate each locale provided.
           (for-each (locale-builder localedir) locales)
           ;; Generate alias file.
           (with-output-to-file alias
             (lambda ()
               (format #t "~a~%" "# Aliases for common codeset names.")
               (for-each print-locale-alias locales)
               (format #t "~a~%" "# Other aliases.")
               (for-each (lambda (line)
                           (format #t "~a~%" line))
                         '(,@alias)))))))
    (native-inputs `(("glibc" ,glibc)
                     ("gzip" ,gzip)))
    (synopsis "Configured sample of locales")
    (description
     "TODO.")
    (home-page (package-home-page glibc))
    (license (package-license glibc))))
-------------------------8<--------------end-----------------------

Happy hacking,
Miguel
diff mbox series

Patch

diff --git a/gnu/packages/base.scm b/gnu/packages/base.scm
index c83775d8ee..4ea31c2ab6 100644
--- a/gnu/packages/base.scm
+++ b/gnu/packages/base.scm
@@ -62,7 +62,8 @@ 
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
   #:export (glibc
-            libiconv-if-needed))
+            libiconv-if-needed
+            make-custom-glibc-locales))
 
 ;;; Commentary:
 ;;;
@@ -1106,6 +1107,69 @@  to the @code{share/locale} sub-directory of this package.")
                                         ,(version-major+minor
                                           (package-version glibc)))))))))))
 
+(define* (make-glibc-locales-collection
+           glibc
+           #:optional (locales
+                        (list "en_US.utf8" "en_US.ISO-8859-1")))
+                        ;; This list for testing
+                        ;(list "el_GR.UTF-8" "en_US.utf8" "he_IL.ISO-8859-8" "ja_JP.EUC-JP" "zh_CN.GB18030" "zh_CN.GBK" "hy_AM.ARMSCII-8")))
+  (package
+    (name "glibc-locales-collection")
+    (version (package-version glibc))
+    (source #f)
+    (build-system trivial-build-system)
+    (arguments
+     `(#:modules ((guix build utils))
+       #:builder
+       (begin
+         (use-modules (guix build utils))
+
+         (let* ((libc      (assoc-ref %build-inputs "glibc"))
+                (gzip      (assoc-ref %build-inputs "gzip"))
+                (out       (assoc-ref %outputs "out"))
+                (localedir (string-append out "/lib/locale/"
+                                          ,(version-major+minor version))))
+           ;; 'localedef' needs 'gzip'.
+           (setenv "PATH" (string-append libc "/bin:" gzip "/bin"))
+
+           (mkdir-p localedir)
+           (for-each
+             (lambda (locale)
+               (let* ((contains-dot? (string-index locale #\.))
+                      (encoding-type (substring locale (1+ contains-dot?)))
+                      (raw-locale    (substring locale 0 contains-dot?))
+                      (utf8? (or (number? (string-contains locale ".utf8"))
+                                 (number? (string-contains locale ".UTF-8"))))
+                      (file (if utf8?
+                              (string-append localedir "/" raw-locale ".utf8")
+                              (if (string-contains locale ".ISO")
+                                (string-append localedir "/" raw-locale)
+                                (string-append localedir "/" locale)))))
+
+                 (invoke "localedef" "--no-archive"
+                         "--prefix" localedir
+                         "-i" raw-locale
+                         "-f" (if (equal? "utf8" encoding-type)
+                                "UTF-8"
+                                encoding-type)
+                         file)
+
+                 ;; Is it utf8 or UTF-8? NO ONE KNOWS!
+                 (when utf8?
+                   (symlink (string-append raw-locale ".utf8")
+                            (string-append localedir "/"
+                                           raw-locale ".UTF-8")))))
+             (list ,@locales))
+           #t))))
+    (native-inputs `(("glibc" ,glibc)
+                     ("gzip" ,gzip)))
+    (synopsis "Customizable collection of locales")
+    (description
+     "This package provides a custom collection of locales useful for
+providing exactly the locales requested when size matters.")
+    (home-page (package-home-page glibc))
+    (license (package-license glibc))))
+
 (define-public (make-glibc-utf8-locales glibc)
   (package
     (name "glibc-utf8-locales")
@@ -1161,6 +1225,13 @@  test environments.")
 (define-public glibc-utf8-locales
   (make-glibc-utf8-locales glibc))
 
+(define-public en_us-glibc-locales
+  (package
+    (inherit (make-glibc-locales-collection
+               glibc
+               (list "en_US.utf8" "en_US.ISO-8859-1")))
+    (name "en-us-glibc-locales")))
+
 ;; Packages provided to ease use of binaries linked against the previous libc.
 (define-public glibc-locales-2.29
   (package (inherit (make-glibc-locales glibc-2.29))