diff mbox series

[bug#65550] profiles: Don't propagate inputs for non-development packages.

Message ID 1f7f85189b17ebddb6d0e26afd7c5fad88c997e9.1693057951.git.iyzsong@member.fsf.org
State New
Headers show
Series [bug#65550] profiles: Don't propagate inputs for non-development packages. | expand

Commit Message

vasilii.smirnov--- via Guix-patches" via Aug. 26, 2023, 1:53 p.m. UTC
From: 宋文武 <iyzsong@member.fsf.org>

* guix/profiles.scm (package->manifest-entry): Only include propagated inputs
from a package for its "dev" output, or its "out" output if the package
doesn't have a "dev" one.
---
 guix/profiles.scm | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)


base-commit: 98a8666a0cbf33e24efff615243b98144a92c950

Comments

Liliana Marie Prikler Aug. 26, 2023, 2:21 p.m. UTC | #1
Am Samstag, dem 26.08.2023 um 21:53 +0800 schrieb iyzsong@envs.net:
> From: 宋文武 <iyzsong@member.fsf.org>
> 
> * guix/profiles.scm (package->manifest-entry): Only include
> propagated inputs from a package for its "dev" output, or its "out"
> output if the package doesn't have a "dev" one.
> ---
I think this patch misses the most obvious use case of the out:lib
split.  I also think that hardcoding this list is bound to fail.

Instead, we could for the time being solve this with yet another
package property.
  '((propagate-inputs-from "lib")) ; but not out
  '((propagate-inputs-from . ("lib"))) ; same meaning, different style
  '((propagate-inputs-from "out" "lib")) ; but not doc
If the property is missing, we still propagate from all outputs, as is
currently done.

WDYT?
宋文武 Aug. 27, 2023, 7:30 a.m. UTC | #2
Liliana Marie Prikler <liliana.prikler@gmail.com> writes:

> Am Samstag, dem 26.08.2023 um 21:53 +0800 schrieb iyzsong@envs.net:
>> From: 宋文武 <iyzsong@member.fsf.org>
>> 
>> * guix/profiles.scm (package->manifest-entry): Only include
>> propagated inputs from a package for its "dev" output, or its "out"
>> output if the package doesn't have a "dev" one.
>> ---
> I think this patch misses the most obvious use case of the out:lib
> split.  I also think that hardcoding this list is bound to fail.

Ah, that's true.  We currently put both headers and pkgconfig files in
the "lib" output, I think we should put those files into "dev" instead,
only leave shared libraries in "lib" (then we don't need propagated it
anymore).

And I did a test to find packages with "lib" and propagated-inputs:
--8<---------------cut here---------------start------------->8---
(use-modules (gnu packages)
             (guix packages)
             (srfi srfi-1)
             (ice-9 pretty-print))

(define x
  (delete-duplicates
   (fold-packages
    (lambda (package result)
      (if (and (member "lib" (package-outputs package))
               (not (null? (package-propagated-inputs package))))
          (cons (package-name package) result)
          result))
    '())))

(pretty-print x)
--8<---------------cut here---------------end--------------->8---
Only hwloc and apache-arrow, and 24 packages (uniqued by name) have
a "lib" output, seems doable.


> Instead, we could for the time being solve this with yet another
> package property.
>   '((propagate-inputs-from "lib")) ; but not out
>   '((propagate-inputs-from . ("lib"))) ; same meaning, different style
>   '((propagate-inputs-from "out" "lib")) ; but not doc
> If the property is missing, we still propagate from all outputs, as is
> currently done.
>
> WDYT?

Yes, a property is more flexible here, haven't think about it.  Well I
wishful think if we always follow a good convention (put development
files in "dev" or "out"), then maybe we can saving some bytes in code?
I'd like to split more packages with multiple outputs with "dev" to see
how the convention works out.  If not then we could take this 'property'
way.

Thanks!
Josselin Poiret Aug. 27, 2023, 7:34 a.m. UTC | #3
Hi everyone,

Liliana Marie Prikler <liliana.prikler@gmail.com> writes:

> I think this patch misses the most obvious use case of the out:lib
> split.  I also think that hardcoding this list is bound to fail.
>
> Instead, we could for the time being solve this with yet another
> package property.
>   '((propagate-inputs-from "lib")) ; but not out
>   '((propagate-inputs-from . ("lib"))) ; same meaning, different style
>   '((propagate-inputs-from "out" "lib")) ; but not doc
> If the property is missing, we still propagate from all outputs, as is
> currently done.
>
> WDYT?

I don't really like the hackiness of this (yet another weird thing for
independent contributors to learn), but I also understand the need.
This one LGTM, but maybe we should think a bit more about simpler
solutions before committing to this.

Best,
Liliana Marie Prikler Aug. 27, 2023, 9:31 a.m. UTC | #4
Am Sonntag, dem 27.08.2023 um 15:30 +0800 schrieb 宋文武:
> > Instead, we could for the time being solve this with yet another
> > package property.
> >   '((propagate-inputs-from "lib")) ; but not out
> >   '((propagate-inputs-from . ("lib"))) ; same meaning, different
> > style
> >   '((propagate-inputs-from "out" "lib")) ; but not doc
> > If the property is missing, we still propagate from all outputs, as
> > is currently done.
> > 
> > WDYT?
> 
> Yes, a property is more flexible here, haven't think about it.  Well
> I wishful think if we always follow a good convention (put
> development files in "dev" or "out"), then maybe we can saving some
> bytes in code?  I'd like to split more packages with multiple outputs
> with "dev" to see how the convention works out.  If not then we could
> take this 'property' way.
I'd really like to avoid yet another special output, when "bin", "lib",
etc. have already been given clear meanings, one of which strongly
overlaps with "stuff that wants propagated inputs for pkg-config
reasons".

The only package that currently has a "dev" output is mariadb, and it's
quite an odd one, as "dev" has been introduced to reduce closure size,
not to propagate even more.  I don't think that hardcoding output names
works given this precedent.  At the very least, "dev" is the wrong
choice.

Cheers
宋文武 Aug. 29, 2023, 10:24 a.m. UTC | #5
Liliana Marie Prikler <liliana.prikler@gmail.com> writes:

> I'd really like to avoid yet another special output, when "bin", "lib",
> etc. have already been given clear meanings, one of which strongly
> overlaps with "stuff that wants propagated inputs for pkg-config
> reasons".

The benefit to put headers files and libraries files into seperated
outputs is to reduce the runtime closure size of packages, for example
my home profile contains xfce, emacs, fonts, etc. has total 5GiB (by
guix size), and they have 300MiB under include.

calculated by:
--8<---------------cut here---------------start------------->8---
x=/gnu/store/0fyhci5kv03rfb9430jqs9wki2ifq5ja-profile
guix size $x
for i in `guix size $x`;
  do [ -e $i/include ] && du -sb $i/include;
done | awk '{ sum += $1 } END { print sum / 1024 / 1024 }'
--8<---------------cut here---------------end--------------->8---

If put headers and other development files into a "dev" output, then
those 300MiB can be saved (won't need to be substituted if substitutes
available).  Note that use a "include" output won't help here if you
leave pkg-config files in "lib", since pkg-config files need reference
its include and binaries need reference its libraries.

So it seems to me a "dev" output is unavoidable, also both Debian and
Alpine Linux use '-dev' packages for the same reason, it should be
familiar to learn..
Liliana Marie Prikler Aug. 29, 2023, 5:01 p.m. UTC | #6
Am Dienstag, dem 29.08.2023 um 18:24 +0800 schrieb 宋文武:
> Liliana Marie Prikler <liliana.prikler@gmail.com> writes:
> 
> > I'd really like to avoid yet another special output, when "bin",
> > "lib", etc. have already been given clear meanings, one of which
> > strongly overlaps with "stuff that wants propagated inputs for pkg-
> > config reasons".
> 
> The benefit to put headers files and libraries files into seperated
> outputs is to reduce the runtime closure size of packages, for
> example my home profile contains xfce, emacs, fonts, etc. has total
> 5GiB (by guix size), and they have 300MiB under include.
> 
> calculated by:
> --8<---------------cut here---------------start------------->8---
> x=/gnu/store/0fyhci5kv03rfb9430jqs9wki2ifq5ja-profile
> guix size $x
> for i in `guix size $x`;
>   do [ -e $i/include ] && du -sb $i/include;
> done | awk '{ sum += $1 } END { print sum / 1024 / 1024 }'
> --8<---------------cut here---------------end--------------->8---
> 
> If put headers and other development files into a "dev" output, then
> those 300MiB can be saved (won't need to be substituted if
> substitutes available).  Note that use a "include" output won't help
> here if you leave pkg-config files in "lib", since pkg-config files
> need reference its include and binaries need reference its libraries.
> 
> So it seems to me a "dev" output is unavoidable, also both Debian and
> Alpine Linux use '-dev' packages for the same reason, it should be
> familiar to learn..
There is little point in a dev package when you blow up its size by
propagating inputs…
宋文武 Aug. 30, 2023, 10:55 a.m. UTC | #7
Liliana Marie Prikler <liliana.prikler@gmail.com> writes:

> Am Dienstag, dem 29.08.2023 um 18:24 +0800 schrieb 宋文武:
>> [...]
>> If put headers and other development files into a "dev" output, then
>> those 300MiB can be saved (won't need to be substituted if
>> substitutes available).  Note that use a "include" output won't help
>> here if you leave pkg-config files in "lib", since pkg-config files
>> need reference its include and binaries need reference its libraries.
>> 
>> So it seems to me a "dev" output is unavoidable, also both Debian and
>> Alpine Linux use '-dev' packages for the same reason, it should be
>> familiar to learn..
> There is little point in a dev package when you blow up its size by
> propagating inputs…

Well propagated-inputs won't increase size of a package, they bring some
of its inputs when build other one to remove the need to list those some
inputs as inputs of the other one.

Consider gtk+ and mousepad:
  gtk+ propagated cairo, freetype, pango, etc.
  
  so I don't need add those inputs to mousepad, and if gtk+ doesn't
  propagated them then I must add them to mousepad because those are
  actually needed (because pkg-config check them and need link flags
  from those dependencies of gtk+).

But for runtime, when substitute of mousepad is available, if we split
thoes packages with "dev" outputs, guix size mousepad would still be
something like:

...
/gnu/store/xxxx-mesa-23.1.4
/gnu/store/xxxx-freetype-2.13.0
/gnu/store/xxxx-gtk+-3.24.37
...

It won't include any "dev" output of its dependencies packages, since
development files are not referenced by mousepad's binary, so the size
of headers and development files is saved (won't be substituted).

Only when substitute of mousepad is not available, you'll start by download
mesa-23.1.4-dev, gtk+3.24.37-dev, etc.  And after build, those "dev"
outputs can be GCed immediately.


Hope it's clear now!
Maxim Cournoyer Sept. 1, 2023, 2:16 a.m. UTC | #8
Hi,

宋文武 <iyzsong@envs.net> writes:

> Liliana Marie Prikler <liliana.prikler@gmail.com> writes:
>
>> I'd really like to avoid yet another special output, when "bin", "lib",
>> etc. have already been given clear meanings, one of which strongly
>> overlaps with "stuff that wants propagated inputs for pkg-config
>> reasons".
>
> The benefit to put headers files and libraries files into seperated
> outputs is to reduce the runtime closure size of packages, for example
> my home profile contains xfce, emacs, fonts, etc. has total 5GiB (by
> guix size), and they have 300MiB under include.
>
> calculated by:
>
> x=/gnu/store/0fyhci5kv03rfb9430jqs9wki2ifq5ja-profile
> guix size $x
> for i in `guix size $x`;
>   do [ -e $i/include ] && du -sb $i/include;
> done | awk '{ sum += $1 } END { print sum / 1024 / 1024 }'
>
> If put headers and other development files into a "dev" output, then
> those 300MiB can be saved (won't need to be substituted if substitutes
> available).  Note that use a "include" output won't help here if you
> leave pkg-config files in "lib", since pkg-config files need reference
> its include and binaries need reference its libraries.
>
> So it seems to me a "dev" output is unavoidable, also both Debian and
> Alpine Linux use '-dev' packages for the same reason, it should be
> familiar to learn..

I'm not convinced that saving 300 MiB on 5 GiB is worth making using
Guix packages more complicated, especially in this age of compressed
file systems (text files compress very well).  We'd also have to change
a couple things to make it convenient and not have to plaster the code
base with ugly `(,lib "dev") inputs ^^'.

I reckon it could all be automated, but I'm not sure it's worth the
effort at this stage (there bigger low hanging fruits to pick in my
opinion -- such as moving static libraries automatically in the gnu
build system, paying more attention to large documentation or unecessary
installed files, etc.).
Maxim Cournoyer Sept. 1, 2023, 2:34 a.m. UTC | #9
Hi Liliana and 宋文武,

Liliana Marie Prikler <liliana.prikler@gmail.com> writes:

> Am Samstag, dem 26.08.2023 um 21:53 +0800 schrieb iyzsong@envs.net:
>> From: 宋文武 <iyzsong@member.fsf.org>
>>
>> * guix/profiles.scm (package->manifest-entry): Only include
>> propagated inputs from a package for its "dev" output, or its "out"
>> output if the package doesn't have a "dev" one.
>> ---
> I think this patch misses the most obvious use case of the out:lib
> split.  I also think that hardcoding this list is bound to fail.
>
> Instead, we could for the time being solve this with yet another
> package property.
>   '((propagate-inputs-from "lib")) ; but not out
>   '((propagate-inputs-from . ("lib"))) ; same meaning, different style
>   '((propagate-inputs-from "out" "lib")) ; but not doc
> If the property is missing, we still propagate from all outputs, as is
> currently done.

I'm rather against the proposed changes as it stands.  I think it'd lead
to a lot of noise in the code base and favor more complex splits of
packages, which I'm doubtful is worth the cost (in terms of
hackiness/complexity).

Since I've known Guix, I've appreciated that a simple 'find $(guix build
some-package) typically shows me the package whole, except perhaps for
its doc and debug symbols.  Having to know which packages have a "lib"
outputs when using them in package definitions is also not fun in my
experience (you start by adding the package, then it fails, then you
verify its outputs, then you add `(,package "lib")), and I fear going to
a "dev" output would bring the same kind of annoyance but at larger
scale.

I think we'd need to evaluate what we'd gain in terms of size reduction
a bit more carefully before moving in this direction and how it'd impact
the user experience.  E.g., if we can reduce the minimal Guix image size
by maybe 30%, that'd be nice, but if we're talking of about 5% like in
your example profile, it doesn't seem worth the complexity to me.
宋文武 Sept. 1, 2023, 12:53 p.m. UTC | #10
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Liliana and 宋文武,
>
> Liliana Marie Prikler <liliana.prikler@gmail.com> writes:
>
>> Am Samstag, dem 26.08.2023 um 21:53 +0800 schrieb iyzsong@envs.net:
>>> From: 宋文武 <iyzsong@member.fsf.org>
>>>
>>> * guix/profiles.scm (package->manifest-entry): Only include
>>> propagated inputs from a package for its "dev" output, or its "out"
>>> output if the package doesn't have a "dev" one.
>>> ---
>> I think this patch misses the most obvious use case of the out:lib
>> split.  I also think that hardcoding this list is bound to fail.
>>
>> Instead, we could for the time being solve this with yet another
>> package property.
>>   '((propagate-inputs-from "lib")) ; but not out
>>   '((propagate-inputs-from . ("lib"))) ; same meaning, different style
>>   '((propagate-inputs-from "out" "lib")) ; but not doc
>> If the property is missing, we still propagate from all outputs, as is
>> currently done.
>
> [...]
>
> I think we'd need to evaluate what we'd gain in terms of size reduction
> a bit more carefully before moving in this direction and how it'd impact
> the user experience.  E.g., if we can reduce the minimal Guix image size
> by maybe 30%, that'd be nice, but if we're talking of about 5% like in
> your example profile, it doesn't seem worth the complexity to me.

I agree, thanks Maxim and Liliana for the inputs!
Maxim Cournoyer Sept. 1, 2023, 3:03 p.m. UTC | #11
Hi,

宋文武 <iyzsong@envs.net> writes:

[...]

>> I think we'd need to evaluate what we'd gain in terms of size reduction
>> a bit more carefully before moving in this direction and how it'd impact
>> the user experience.  E.g., if we can reduce the minimal Guix image size
>> by maybe 30%, that'd be nice, but if we're talking of about 5% like in
>> your example profile, it doesn't seem worth the complexity to me.
>
> I agree, thanks Maxim and Liliana for the inputs!

Should we close this issue for now?
宋文武 Sept. 2, 2023, 12:43 a.m. UTC | #12
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi,
>
> 宋文武 <iyzsong@envs.net> writes:
>
> [...]
>
>>> I think we'd need to evaluate what we'd gain in terms of size reduction
>>> a bit more carefully before moving in this direction and how it'd impact
>>> the user experience.  E.g., if we can reduce the minimal Guix image size
>>> by maybe 30%, that'd be nice, but if we're talking of about 5% like in
>>> your example profile, it doesn't seem worth the complexity to me.
>>
>> I agree, thanks Maxim and Liliana for the inputs!
>
> Should we close this issue for now?

Yes, closing.
Ludovic Courtès Sept. 9, 2023, 10:38 a.m. UTC | #13
Hello!

iyzsong@envs.net skribis:

> From: 宋文武 <iyzsong@member.fsf.org>
>
> * guix/profiles.scm (package->manifest-entry): Only include propagated inputs
> from a package for its "dev" output, or its "out" output if the package
> doesn't have a "dev" one.

What’s the rationale?

Right now, there’s only one package with an output called “dev”:

--8<---------------cut here---------------start------------->8---
$ guix package -A |cut -f3|grep dev|wc -l
1
--8<---------------cut here---------------end--------------->8---

So the focus on “dev” is debatable.

More importantly, this would break things like:

  guix shell python python-scipy -C -- python3 -c 'import numpy'

… where ‘python-numpy’ is propagated by ‘python-scipy’.

Definitely not something we want to change.

WDYT?

Ludo’.
diff mbox series

Patch

diff --git a/guix/profiles.scm b/guix/profiles.scm
index fea766879d..9ed81dec4d 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -390,7 +390,14 @@  (define* (package->manifest-entry package #:optional (output "out")
                           ((label package output)
                            (package->manifest-entry package output
                                                     #:parent (delay entry))))
-                        (package-propagated-inputs package)))
+                        ;; Only add propagated inputs for PACKAGE:dev, or
+                        ;; PACKAGE:out when PACKAGE doesn't have a "dev"
+                        ;; output.
+                        (if (if (member "dev" (package-outputs package))
+                                (equal? "dev" output)
+                                (equal? "out" output))
+                            (package-propagated-inputs package)
+                            '())))
             (entry (manifest-entry
                      (name (package-name package))
                      (version (package-version package))