[bug#53536,0/1] Add poppler-with-data.

Message ID 20220125235931.4451-1-higashi@taiju.info

Message

Taiju HIGASHI Jan. 25, 2022, 11:59 p.m. UTC
Hi,

I would like to view PDF files in Japanese with a viewer such as Evince, but
it seems that it cannot render Japanese at present, so I wrote a patch.

As far as I know, in order for Poppler to render CJK text, one of the
following conditions must be met:

1. Install poppler with poppler-data preinstalled.
2. Install poppler, then poppler-data in the path expected by
poppler. (`POPPLER_INSTALL_PREFIX/share/poppler`)

ref:
https://github.com/freedesktop/poppler/blob/277f5de9684b3392f0d585bd36ad1a5e9e9e9ed7/CMakeLists.txt#L348-L362

Guix provides poppler and poppler-data as standalone packages, but installing
both will not satisfy either of the above prerequisites.
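
To illustrate why installing both packages side by side is not enough, here
is a rough Python sketch (not Poppler's actual code). It assumes, per
condition 2, that the data is looked up in `share/poppler` under Poppler's own
install prefix. The function names and the store-like directory names are
made up for the example.

```python
import tempfile
from pathlib import Path


def data_dir_under_prefix(prefix):
    """Return <prefix>/share/poppler if it exists as a directory, else None."""
    candidate = Path(prefix) / "share" / "poppler"
    return candidate if candidate.is_dir() else None


def make_separate_prefixes(root):
    """Create two store-like prefixes: poppler without data, poppler-data with it."""
    poppler = Path(root) / "aaaa-poppler"
    poppler_data = Path(root) / "bbbb-poppler-data"
    (poppler / "lib").mkdir(parents=True)
    (poppler_data / "share" / "poppler" / "cMap").mkdir(parents=True)
    return poppler, poppler_data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        poppler, poppler_data = make_separate_prefixes(root)
        # The data exists on disk, but not under poppler's own prefix.
        print(data_dir_under_prefix(poppler))       # None
        print(data_dir_under_prefix(poppler_data))
```

Each package lives under its own prefix, so a lookup relative to poppler's
prefix finds nothing even though poppler-data is installed.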

So I defined poppler with poppler-data as a package with the name
poppler-with-data.

This package is intended to be used in the installation of packages that also
have poppler as a dependency, as shown below.

   guix package -i evince --with-input=poppler=poppler-with-data

However, as a user, this is still a bit of a hassle, so if you have a better
idea, I'd like to hear it.

Incidentally, it seems that Nix defined its own environment variable
(POPPLER_DATADIR) to deal with this problem.

The usage of this variable seems to have been changed in the next commit, but
I personally thought it was a generic and useful way to deal with the problem.

ref:
https://github.com/NixOS/nixpkgs/pull/17819/commits/1bde33074efa11fa2edcf71032d2e634f852f349

If it is allowed to integrate poppler and poppler-data, that would be the
simplest solution.

Thank you.

Taiju HIGASHI (1):
  gnu: Add poppler-with-data.

 gnu/packages/pdf.scm | 11 +++++++++++
 1 file changed, 11 insertions(+)

--
2.34.0

Comments

Liliana Marie Prikler Jan. 26, 2022, 7:37 a.m. UTC | #1
Hi Taiju,

On Wednesday, 26.01.2022 at 08:59 +0900, Taiju HIGASHI wrote:
> Hi,
> 
> I would like to view PDF files in Japanese with a viewer such as
> Evince, but it seems that it cannot render Japanese at present, so I
> wrote a patch.
> 
> As far as I know, in order for Poppler to render CJK text, one of the
> following conditions must be met
> 
> 1. Install poppler with poppler-data preinstalled.
> 2. Install poppler, then poppler-data in the path expected by
> poppler. (`POPPLER_INSTALL_PREFIX/share/poppler`)
> 
> ref:
> https://github.com/freedesktop/poppler/blob/277f5de9684b3392f0d585bd36ad1a5e9e9e9ed7/CMakeLists.txt#L348-L362
> 
> Guix provides poppler and poppler-data as standalone packages, but
> installing both will not satisfy either of the above prerequisites.
> 
> So I defined poppler with poppler-data as a package with the name
> poppler-with-data.
> 
> This package is intended to be used in the installation of packages
> that also have poppler as a dependency, as shown below.
> 
>    guix package -i evince --with-input=poppler=poppler-with-data
> 
> However, as a user, this is still a bit of a hassle, so if you have a
> better idea, I'd like to hear it.
That seems to be one solution, but note that the Qt5 variants of
poppler would still be affected by that bug.  Note also that
poppler-data itself does not depend on poppler, so we could simply add
it as an input to the poppler package.  However, this cannot be done on
master, because it'd cause 7k+ rebuilds.  Instead, I suggest we make
poppler-with-data a replacement for poppler, which, through
package/inherit, should then also apply to the other variants.

I've CC'd Marius, Tobias and Leo to aid me in my judgement here, but I
think grafts would be necessary if we don't want to do input rewriting
with several variants.

> Incidentally, it seems that Nix had defined its own environment
> variable (POPPLER_DATADIR) to deal with this problem.
Are there any other packages you might want to install into
POPPLER_INSTALL_PREFIX?  If so, a colon-separated POPPLER_DATA_PATH
should be preferred.  Note that if we add that feature, we'd still have
to graft it on master currently.
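
A colon-separated variable would behave like other search paths: each entry
is tried in order and the first match wins. As a rough Python sketch of that
lookup only (POPPLER_DATA_PATH is the hypothetical variable proposed above,
not something Poppler reads today, and the function name is invented):

```python
import os
import tempfile
from pathlib import Path


def find_poppler_data(subdir, env=None):
    """Return the first <entry>/<subdir> directory found on POPPLER_DATA_PATH.

    POPPLER_DATA_PATH is hypothetical; Poppler does not read it today.
    """
    env = os.environ if env is None else env
    for entry in env.get("POPPLER_DATA_PATH", "").split(":"):
        if not entry:
            continue  # skip empty entries such as a trailing colon
        candidate = Path(entry) / subdir
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        first = Path(root) / "first"    # has no cMap directory
        second = Path(root) / "second"  # provides cMap
        first.mkdir()
        (second / "cMap").mkdir(parents=True)
        env = {"POPPLER_DATA_PATH": f"{first}:{second}"}
        print(find_poppler_data("cMap", env))  # the copy under "second"
```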

Cheers
Taiju HIGASHI Jan. 26, 2022, 1:38 p.m. UTC | #2
Hi Liliana,

> That seems to be one solution, but note that the Qt5 variants of
> poppler would still be affected by that bug.

You're right, poppler-qt5 also needs a variant with poppler-data added.


> Instead, I suggest we make poppler-with-data a replacement for
> poppler, which should by package/inherit then also apply to the other
> variants.
>
> I've CC'd Marius, Tobias and Leo to aid me in my judgement here, but I
> think grafts would be necessary if we don't want to do input rewriting
> with several variants.

I apologize if I didn't understand exactly what you meant.
Am I correct in understanding that the direction of this patch is
correct?

The idea is to create a variant called poppler-qt5-with-data and replace
the input of packages that depend on poppler-qt5 with it.

I'm starting to think that this patch proposal is a realistic and
reasonable solution.

In fact, I'm using poppler-with-data in my own manifest, and it's much
better than going without it.

At least I don't have to redefine evince etc. on my own anymore.

ref:
https://git.sr.ht/~taiju/taix/tree/8a3ab4407eefe720193e401cf8f11d96550733e9/item/guix-config/package-config.scm


If I am interpreting your reply incorrectly, I would appreciate it if
you could be more specific.


> Are there any other packages you might want to install into
> POPPLER_INSTALL_PREFIX?  If so, a colon-separated POPPLER_DATA_PATH
> should be preferred.  Note that if we add that feature, we'd still have
> to graft it on master currently.

At the moment, there is nothing other than poppler-data that we want to
install in POPPLER_INSTALL_PREFIX.

However, this idea, while generic, may be confusing to users, as the
reason it was deprecated in Nix shows.
At least, if a package named poppler-with-data exists, users can guess
that it might solve the problem.

QUOTE:
    Previously we relied on an environment variable POPPLER_DATADIR
    which practically noone used and everyone was expected to set. This
    is a good candidate for a feature option because noone really
    _noticed_ that this data is not available. Disabled by default
    because of this and size of the data (22M).

ref:
https://github.com/NixOS/nixpkgs/pull/17819/commits/1bde33074efa11fa2edcf71032d2e634f852f349


Thanks
Taiju HIGASHI Jan. 26, 2022, 1:42 p.m. UTC | #3
Sorry, the reference URL was wrong.

ref:
https://git.sr.ht/~taiju/taix/tree/fcafe2ccb92975c9273c9fb769cb577e9d64de59/item/guix-config/package-config.scm
Liliana Marie Prikler Jan. 26, 2022, 2:16 p.m. UTC | #4
Hi,

On Wednesday, 26.01.2022 at 22:38 +0900, Taiju HIGASHI wrote:
> > Instead, I suggest we make poppler-with-data a replacement for
> > poppler, which should by package/inherit then also apply to the
> > other variants.
> > 
> > I've CC'd Marius, Tobias and Leo to aid me in my judgement here,
> > but I think grafts would be necessary if we don't want to do input
> > rewriting with several variants.
> 
> I apologize if I didn't understand exactly what you meant.
> Am I correct in understanding that the direction of this patch is
> correct?
> 
> The idea is to create a variant called poppler-qt5-with-data and
> replace the input of packages that depend on poppler-qt5 with it.
> 
> I'm starting to think that this patch proposal is a realistic and
> reasonable solution.
I think we can agree that the patch moves in the right direction, but
I'm not sure whether we can claim to be "there yet" with just that
patch alone.  If we want to make it so that evince handles CJK out of
the box (without the user needing to rewrite inputs), we would have to
replace poppler with poppler-with-data. On master first by grafts, and
on core-updates by adding it as input directly.

> I'm using poppler-with-data to write the manifest, but without
> poppler-with-data, I'm not sure what I'd do.
> 
> In fact, I'm using poppler-with-data to write manifests, and it's
> much better than not using poppler-with-data.
> 
> At least I don't have to redefine evince etc. on my own anymore.
> 
> ref:
> https://git.sr.ht/~taiju/taix/tree/8a3ab4407eefe720193e401cf8f11d96550733e9/item/guix-config/package-config.scm
> 
> 
> If I am interpreting your reply incorrectly, I would appreciate it if
> you could be more specific.
What works for you in a manifest is not necessarily something that
should be pushed to upstream as-is.  As you can easily see, it's a
transformation you're able to do locally regardless of the state of the
Guix repo.  For the main channel, "quick hacks to alleviate problems"
are typically discouraged in favour of varying degrees of "proper
solutions".

> > Are there any other packages you might want to install into
> > POPPLER_INSTALL_PREFIX?  If so, a colon-separated POPPLER_DATA_PATH
> > should be preferred.  Note that if we add that feature, we'd still
> > have to graft it on master currently.
> 
> At the moment, there is nothing other than poppler-data that we want
> to install in POPPLER_INSTALL_PREFIX.
> 
> However, this idea, while generic, may be confusing to users, as
> shown in the reason why it was deprecated in Nix.
> At least, if the package poppler-with-data exists, we can speculate
> that it might be able to solve the problem.
> 
> QUOTE:
>     Previously we relied on an environment variable POPPLER_DATADIR
>     which practically noone used and everyone was expected to set.
>     This is a good candidate for a feature option because noone
>     really _noticed_ that this data is not available.  Disabled by
>     default because of this and size of the data (22M).

Unlike Nix, we don't have feature options (not until Ludo pushes the
code to add them, at least, but that too appears to be at least one
core-updates cycle away), so it's either the all-or-nothing approach of
having or not having it as input to poppler, or the environment
variable.  Looking at the output of `guix size', poppler takes up 138.1
MiB on a disk with 7.3 MiB being poppler itself, whereas poppler-data
takes up 12.4 MiB.  I personally think that tradeoff to be worth it --
as in "let's include poppler-data in poppler so as to not discriminate
against our Non-Latin script users".  It's an increase of less than 10%
to support clearly more than 10% of the world's population (though how
many of them use PDFs is an uncertain number).
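
For reference, a quick Python check of that percentage, using only the
`guix size' figures quoted above and assuming poppler-data pulls in nothing
else that poppler's closure lacks:

```python
poppler_closure_mib = 138.1  # poppler plus its dependencies, per `guix size'
poppler_data_mib = 12.4      # poppler-data on its own

# Relative growth of the closure if poppler-data becomes an input.
increase = poppler_data_mib / poppler_closure_mib * 100
print(f"closure grows by about {increase:.1f}%")  # closure grows by about 9.0%
```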

Cheers
Taiju HIGASHI Jan. 27, 2022, 12:55 a.m. UTC | #5
Hi,

You're absolutely right.
This patch can only alleviate the pain, it cannot eliminate it.

The separate package was a compromise meant to avoid burdening users who
do not need CJK, but if you would consider making poppler-with-data a
replacement for poppler, that would be better.

> Unlike Nix, we don't have feature options (not until Ludo pushes the
> code to add them, at least, but that too appears to be at least one
> core-updates cycle away), so it's either the all-or-nothing approach
> of having or not having it as input to poppler, or the environment
> variable.  Looking at the output of `guix size', poppler takes up
> 138.1 MiB on a disk with 7.3 MiB being poppler itself, whereas
> poppler-data takes up 12.4 MiB.  I personally think that tradeoff to
> be worth it as in "let's include poppler-data in poppler so as to not
> discriminate against our Non-Latin script users".  It's an increase of
> less than 10% to support clearly more than 10% of the world's
> population (though how many of them use PDFs is an uncertain number).

I'm on the minority side, so I can't speak strongly, but I'd be very
happy if you would consider bundling poppler-data with poppler by
default.

I had a strong desire to use Guix, so I solved small problems on my own.
However, people who are just trying out Guix may decide that Guix is
useless just because it cannot display Japanese PDFs.

Displaying and outputting PDFs in Japanese is a very basic task, so
struggling with it leaves a poor impression.

It's much better now, but in the past, we often encountered problems
specific to the Japanese environment on GNU/Linux.

In light of this, it is usually best to use a distribution with many
Japanese users.

If they are using Guix and run into such problems, they will think, "I
knew it".

I want more Japanese users to use this wonderful Guix, but I think it
would be a shame if these experiences affect them and they leave.

The above is my opinion as a Japanese user, but other non-Latin script
users may have the same opinion.

Thanks
Taiju HIGASHI Feb. 2, 2022, 3:58 a.m. UTC | #6
Hi,

The patch has been reworked to include poppler-data in poppler.

gimp and glimpse have poppler-data as input from the start, but if you
remove it, you can't build, so I left it in.

Thanks
Taiju HIGASHI Feb. 2, 2022, 6:22 a.m. UTC | #7
Sorry, I wrote it earlier in a hurry during a break, so when I reread
it, the text was wrong in many ways.

Removing poppler-data from gimp and glimpse caused the build to fail, so
I'm keeping it as an input.