Message ID | 20201019140236.GF9117@E5400 |
---|---|
State | New |
Headers | show |
Series | [bug#44075] gnu: Add make-glibc-locales-collection. | expand |
Context | Check | Description |
---|---|---|
cbaines/issue | success | View issue |
cbaines/comparison | success | View comparision |
cbaines/git branch | success | View Git branch |
cbaines/applying patch | success | View Laminar job |
Hello, Efraim Flashner <efraim@flashner.co.il> writes: > Thanks for taking a look. No problem at all, I was thinking about something along these lines too, but then I saw your patch. :-) > For the utf8 vs UTF-8 there are a couple of comments in the code: > The above phase does not install locales with names using > the "normalized codeset." Thus, create symlinks like: > en_US.utf8 -> en_US.UTF-8 > > and also: > For backward compatibility with Guix > <= 0.8.3, add "xx_YY.UTF-8". Yes, what I mean is that the comments along the code may need to be clarified, but adding a "nobody knows" doesn't add much information. The actual source[1] says that the correct value is utf8, following the rules for the 'normalized codeset' naming, that I copy here: 1. Remove all characters besides numbers and letters. 2. Fold letters to lowercase. 3. If the same only contains digits prepend the string ‘"iso"’. We should stick to that naming regarding libc locales (note to self), even though we keep the links for compatibility. That includes the other locale at en_us-glibc-locales. >> What do you think about replacing make-glibc-utf8-locales with a call of >> the new function (using that code) ensuring that the generated >> derivation stays the same for that case (i.e. it's optimized for the >> UTF-8 case)? > > This is what I originally wanted to do, but there's a glibc-locales > buried in the bootstrap path so it's not so easy to just swap it out. I guess you mean glibc-utf8-locales, if not I'm missing where that reference could come from. That's why I insist on leaving exactly the same derivation. > I can make the change in core-updates. I'll play around with it and > see if I can come out with the same derivation using a different > function, but I'm not expecting it to turn out identical. The colour of my lisp-rate belt isn't even close to some you can see around here but I could bring you some help if you want, because I think it's easier to do it without any change in current packages than it might sound. Not the definitions in scheme, of course, I mean the derivations the daemon actually sees. Said this, I've seen that [single-]locale-directory does mostly the same, and there is a build-locale function in gnu/build/locale.scm... so I'm starting to see as a better idea to clean up glibc-utf8-locales up in core-updates using the latter, as it would lead to cleaner code for the builder. Could you check that and tell if you consider feasible what I propose? Happy hacking! Miguel [1] info "(libc)Using gettextized software"
On Mon, Oct 19, 2020 at 07:43:27PM +0200, Miguel Ángel Arruga Vivas wrote: > Hello, > > Efraim Flashner <efraim@flashner.co.il> writes: > > Thanks for taking a look. > > No problem at all, I was thinking about something along these lines too, > but then I saw your patch. :-) Your patch was what got me to actually send mine to guix-patches :) > > For the utf8 vs UTF-8 there are a couple of comments in the code: > > The above phase does not install locales with names using > > the "normalized codeset." Thus, create symlinks like: > > en_US.utf8 -> en_US.UTF-8 > > > > and also: > > For backward compatibility with Guix > > <= 0.8.3, add "xx_YY.UTF-8". > > Yes, what I mean is that the comments along the code may need to be > clarified, but adding a "nobody knows" doesn't add much information. That is a good point. I'll change it to something actually useful, mentioning that there's a convention but that people often get it wrong. > The actual source[1] says that the correct value is utf8, following > the rules for the 'normalized codeset' naming, that I copy here: > > 1. Remove all characters besides numbers and letters. > 2. Fold letters to lowercase. > 3. If the same only contains digits prepend the string ‘"iso"’. > > We should stick to that naming regarding libc locales (note to self), > even though we keep the links for compatibility. That includes the > other locale at en_us-glibc-locales. I agree it should be renamed to en-us. I have some bikeshedding to think about for en-us-glibc-locales vs glibc-locales-en-us. And I'll also test out one for glibc-locales-guile-tests. > >> What do you think about replacing make-glibc-utf8-locales with a call of > >> the new function (using that code) ensuring that the generated > >> derivation stays the same for that case (i.e. it's optimized for the > >> UTF-8 case)? > > > > This is what I originally wanted to do, but there's a glibc-locales > > buried in the bootstrap path so it's not so easy to just swap it out. > > I guess you mean glibc-utf8-locales, if not I'm missing where that > reference could come from. That's why I insist on leaving exactly the > same derivation. I meant in commencement.scm. There's a glibc-utf8-locales-final near the definition for %boot5-inputs. > > I can make the change in core-updates. I'll play around with it and > > see if I can come out with the same derivation using a different > > function, but I'm not expecting it to turn out identical. > > The colour of my lisp-rate belt isn't even close to some you can see > around here but I could bring you some help if you want, because I think > it's easier to do it without any change in current packages than it > might sound. Not the definitions in scheme, of course, I mean the > derivations the daemon actually sees. > > Said this, I've seen that [single-]locale-directory does mostly the > same, and there is a build-locale function in gnu/build/locale.scm... so > I'm starting to see as a better idea to clean up glibc-utf8-locales up > in core-updates using the latter, as it would lead to cleaner code for > the builder. > > Could you check that and tell if you consider feasible what I propose? I hadn't seen gnu/build/locales.scm. I'll have to see what we can do about using it. We normally don't allow cross-over from the build side to the packages side. > Happy hacking! > Miguel > > [1] info "(libc)Using gettextized software"
Hello, Efraim Flashner <efraim@flashner.co.il> writes: > Your patch was what got me to actually send mine to guix-patches :) Thanks. :-) > That is a good point. I'll change it to something actually useful, > mentioning that there's a convention but that people often get it wrong. Great, the more knowledge, the easier to overcome that common misunderstanding. :-) > I agree it should be renamed to en-us. I have some bikeshedding to think > about for en-us-glibc-locales vs glibc-locales-en-us. And I'll also test > out one for glibc-locales-guile-tests. I meant the ISO-8859-1, the locale creation should be en_EN.iso88591, the locale name can be en_EN.88591 too... Although in that case shouldn't be glibc-en-us-locales? I'd like green lights for the bike shed, please. ;-) > I hadn't seen gnu/build/locales.scm. I'll have to see what we can do > about using it. We normally don't allow cross-over from the build side > to the packages side. It is already being used in the build of glibc-locales and I guess it would be harmless to move it to guix build if that is a problem, but I think it can be used as it is right now, even I didn't check yet. In this for-each(gnu/packages/base.scm:1084), for example: ------------------------->8-------------begin---------------------- (for-each (lambda (directory) (let*-values (((base) (basename directory)) ((name codeset) (locale->name+codeset base)) ((normalized) (normalize-codeset codeset))) (unless (string=? codeset normalized) (symlink base (string-append (dirname directory) "/" name "." normalized))))) locales) -------------------------8<--------------end----------------------- I would write something like this in the builder code: ------------------------->8-------------begin---------------------- (define (locale-builder directory) (lambda (locale) (let-values (((name codeset) (locale->name+codeset))) (if codeset (build-locale name #:directory directory #:codeset (normalize-codeset codeset)) (build-locale name #:directory directory #:codeset "utf8"))))) (let ... (for-each (locale-builder localedir) '(,@locales)) ...) -------------------------8<--------------end----------------------- I leave you here the function I was hacking apart (please, do not confuse with working code ;-)) if you want to take any idea other from it: ------------------------->8-------------begin---------------------- (define*-public (make-glibc-selected-locales glibc #:key (suffix "-utf8-locales") (locales %default-locales) (alias %default-alias)) (package (name (string-append (package-name glibc) suffix)) (version (package-version glibc)) (source #f) (build-system trivial-build-system) (arguments `(#:modules ((guix build utils) (gnu build locale)) #:builder (begin (use-modules (guix build utils) (gnu build locale)) ;; Print the common alias for locale (define (print-locale-alias locale) (let-values (((name codeset) (locale->name+codeset))) (if codeset (or (string-equal codeset (normalize-codeset codeset)) (format #t "~a.~a ~a.~a~%" name codeset name (normalize-codeset codeset))) (format #t "~a.UTF-8 ~a.utf8" name name)))) ;; Generate a function that builds the locale into directory. (define (locale-builder directory) (lambda (locale) (let-values (((name codeset) (locale->name+codeset))) (if codeset (build-locale name #:directory directory #:codeset (normalize-codeset codeset)) (build-locale name #:directory directory #:codeset "utf8"))))) ;; Do the job. (let* ((libc (assoc-ref %build-inputs "glibc")) (gzip (assoc-ref %build-inputs "gzip")) (out (assoc-ref %outputs "out")) (localedir (string-append out "/lib/locale/" ,(version-major+minor version))) (alias (string-append localedir "/locale.alias")) (locales '(,@locales))) ;; 'localedef' needs 'gzip'. (setenv "PATH" (string-append libc "/bin:" gzip "/bin")) (mkdir-p localedir) ;; Generate each locale provided. (for-each (locale-builder localedir) locales) ;; Generate alias file. (with-output-to-file alias (lambda () (format #t "~a~%" "# Aliases for common codeset names.") (for-each print-locale-alias locales) (format #t "~a~%" "# Other aliases.") (for-each (lambda (line) (format #t "~a~%" line)) '(,@alias))))))) (native-inputs `(("glibc" ,glibc) ("gzip" ,gzip))) (synopsis "Configured sample of locales") (description "TODO.") (home-page (package-home-page glibc)) (license (package-license glibc)))) -------------------------8<--------------end----------------------- Happy hacking, Miguel
diff --git a/gnu/packages/base.scm b/gnu/packages/base.scm index c83775d8ee..4ea31c2ab6 100644 --- a/gnu/packages/base.scm +++ b/gnu/packages/base.scm @@ -62,7 +62,8 @@ #:use-module (srfi srfi-1) #:use-module (srfi srfi-26) #:export (glibc - libiconv-if-needed)) + libiconv-if-needed + make-custom-glibc-locales)) ;;; Commentary: ;;; @@ -1106,6 +1107,69 @@ to the @code{share/locale} sub-directory of this package.") ,(version-major+minor (package-version glibc))))))))))) +(define* (make-glibc-locales-collection + glibc + #:optional (locales + (list "en_US.utf8" "en_US.ISO-8859-1"))) + ;; This list for testing + ;(list "el_GR.UTF-8" "en_US.utf8" "he_IL.ISO-8859-8" "ja_JP.EUC-JP" "zh_CN.GB18030" "zh_CN.GBK" "hy_AM.ARMSCII-8"))) + (package + (name "glibc-locales-collection") + (version (package-version glibc)) + (source #f) + (build-system trivial-build-system) + (arguments + `(#:modules ((guix build utils)) + #:builder + (begin + (use-modules (guix build utils)) + + (let* ((libc (assoc-ref %build-inputs "glibc")) + (gzip (assoc-ref %build-inputs "gzip")) + (out (assoc-ref %outputs "out")) + (localedir (string-append out "/lib/locale/" + ,(version-major+minor version)))) + ;; 'localedef' needs 'gzip'. + (setenv "PATH" (string-append libc "/bin:" gzip "/bin")) + + (mkdir-p localedir) + (for-each + (lambda (locale) + (let* ((contains-dot? (string-index locale #\.)) + (encoding-type (substring locale (1+ contains-dot?))) + (raw-locale (substring locale 0 contains-dot?)) + (utf8? (or (number? (string-contains locale ".utf8")) + (number? (string-contains locale ".UTF-8")))) + (file (if utf8? + (string-append localedir "/" raw-locale ".utf8") + (if (string-contains locale ".ISO") + (string-append localedir "/" raw-locale) + (string-append localedir "/" locale))))) + + (invoke "localedef" "--no-archive" + "--prefix" localedir + "-i" raw-locale + "-f" (if (equal? "utf8" encoding-type) + "UTF-8" + encoding-type) + file) + + ;; Is it utf8 or UTF-8? NO ONE KNOWS! + (when utf8? + (symlink (string-append raw-locale ".utf8") + (string-append localedir "/" + raw-locale ".UTF-8"))))) + (list ,@locales)) + #t)))) + (native-inputs `(("glibc" ,glibc) + ("gzip" ,gzip))) + (synopsis "Customizable collection of locales") + (description + "This package provides a custom collection of locales useful for +providing exactly the locales requested when size matters.") + (home-page (package-home-page glibc)) + (license (package-license glibc)))) + (define-public (make-glibc-utf8-locales glibc) (package (name "glibc-utf8-locales") @@ -1161,6 +1225,13 @@ test environments.") (define-public glibc-utf8-locales (make-glibc-utf8-locales glibc)) +(define-public en_us-glibc-locales + (package + (inherit (make-glibc-locales-collection + glibc + (list "en_US.utf8" "en_US.ISO-8859-1"))) + (name "en-us-glibc-locales"))) + ;; Packages provided to ease use of binaries linked against the previous libc. (define-public glibc-locales-2.29 (package (inherit (make-glibc-locales glibc-2.29))