From patchwork Fri Aug 12 05:07:52 2022
X-Patchwork-Submitter: Maxim Cournoyer
X-Patchwork-Id: 41569
From: Maxim Cournoyer
To: 57151@debbugs.gnu.org
Cc: Maxim Cournoyer
Subject: [bug#57151] [PATCH 2/2] gnu: tesseract-ocr: Make the default install minimally useful.
Date: Fri, 12 Aug 2022 01:07:52 -0400
Message-Id: <20220812050752.3980-2-maxim.cournoyer@gmail.com>
In-Reply-To: <20220812050752.3980-1-maxim.cournoyer@gmail.com>
References: <20220812050752.3980-1-maxim.cournoyer@gmail.com>

* gnu/packages/ocr.scm (tesseract-ocr)
[phases]{adjust-TESSDATA_PREFIX-macro}: New phase.
{install-minimal-tessdata}: New phase.
[native-inputs]: Add tesseract-ocr-tessdata-fast.
[search-paths]: New field.
[description]: Mention how to add support for more languages.
---
 gnu/packages/ocr.scm | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/gnu/packages/ocr.scm b/gnu/packages/ocr.scm
index e2c9f561cc..21d257ef24 100644
--- a/gnu/packages/ocr.scm
+++ b/gnu/packages/ocr.scm
@@ -132,6 +132,15 @@ (define-public tesseract-ocr
               (substitute* "configure.ac"
                 (("AC_SUBST\\(\\[XML_CATALOG_FILES])")
                  ""))))
+          (add-after 'unpack 'adjust-TESSDATA_PREFIX-macro
+            (lambda _
+              ;; Use a deeper TESSDATA_PREFIX hierarchy so that a more
+              ;; specific search-path than '/share' can be specified. The
+              ;; build system uses CPPFLAGS for itself, so we can't simply set
+              ;; a make flag.
+              (substitute* "Makefile.am"
+                (("-DTESSDATA_PREFIX='\"@datadir@\"'")
+                 "-DTESSDATA_PREFIX='\"@datadir@/tesseract-ocr\"'"))))
           (add-after 'build 'build-training
             (lambda* (#:key parallel-build? #:allow-other-keys)
               (define n (if parallel-build? (number->string
@@ -140,7 +149,18 @@ (define n (if parallel-build? (number->string
               (invoke "make" "-j" n "training")))
           (add-after 'install 'install-training
             (lambda _
-              (invoke "make" "training-install"))))))
+              (invoke "make" "training-install")))
+          (add-after 'install 'install-minimal-tessdata
+            ;; tesseract-ocr cannot be used without its trained models data;
+            ;; install the English language as a minimal base which can be
+            ;; extended via TESSDATA_PREFIX.
+            (lambda* (#:key native-inputs inputs #:allow-other-keys)
+              (define eng.traineddata
+                "/share/tesseract-ocr/tessdata/eng.traineddata")
+              (install-file (search-input-file (or native-inputs inputs)
+                                               eng.traineddata)
+                            (dirname (string-append #$output
+                                                    eng.traineddata))))))))
     (native-inputs
      (list asciidoc
            autoconf
@@ -152,13 +172,18 @@ (define n (if parallel-build? (number->string
            libtool
            libxml2                      ;for XML_CATALOG_FILES
            libxslt
-           pkg-config))
+           pkg-config
+           tesseract-ocr-tessdata-fast))
     (inputs
      (list cairo
            icu4c
            leptonica
            pango
            python-wrapper))
+    (native-search-paths (list (search-path-specification
+                                (variable "TESSDATA_PREFIX")
+                                (files (list "share/tesseract-ocr/tessdata"))
+                                (separator #f)))) ;single value
     (home-page "https://github.com/tesseract-ocr/tesseract")
     (synopsis "Optical character recognition engine")
     (description
@@ -166,7 +191,9 @@ (define n (if parallel-build? (number->string
 high accuracy. It supports many languages, output text formatting, hOCR
 positional information and page layout analysis. Several image formats are
 supported through the Leptonica library. It can also detect whether text is
-monospaced or proportional.")
+monospaced or proportional. Support for the English language is included by
+default. To add support for more languages, the
+@code{tesseract-ocr-tessdata-fast} package should be installed.")
     (license license:asl2.0)))
 
 (define-public gimagereader