[bug#33436] gnu: Add python-warcio.

Message ID 4f8816d0-8b47-7299-f31b-a2fa0f592d2d@riseup.net
State Accepted
Series [bug#33436] gnu: Add python-warcio.

Checks

Context                  Check    Description
cbaines/applying patch   success  Successfully applied

Commit Message

swedebugia Nov. 19, 2018, 8:41 p.m. UTC

Comments

Ludovic Courtès Nov. 21, 2018, 10:21 a.m. UTC | #1
Hello!

swedebugia <swedebugia@riseup.net> writes:

> From 537b2b111a464956bdec640ea5f84c4598ea66f9 Mon Sep 17 00:00:00 2001
> From: swedebugia <swedebugia@riseup.net>
> Date: Mon, 19 Nov 2018 21:37:46 +0100
> Subject: [PATCH] gnu: Add python-warcio.
>
> * gnu/packages/python.scm: New variable.
                           ^
Nitpick: You forgot the variable name here.  :-)

> +   (arguments
> +    ;; FIXME: Some tests require network access. 150 out of 1354 fail
> +    '(#:tests? #f))

Could you investigate a bit further?  What do the test logs show?

It would be good to see if these tests can be easily fixed, if they
should definitely be skipped (for instance because they rely on some
external service), or if it’s something else.

> +   (home-page "https://github.com/webrecorder/warcio")
> +   (synopsis "Streaming web archival archive (WARC) library")
> +   (description
> +    "warcio is designed for fast, low-level access to web archival
                 ^^
What about: “warcio is a Python library to read and write the WARC format
commonly used in Web archives.  It is designed for…”?

Thank you,
Ludo’.
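
One way to act on the question about the tests, instead of disabling the
whole suite with `#:tests? #f`, is to replace the `check` phase and
deselect only the tests that need the network.  The following is a
minimal sketch, not the code that was eventually committed: it assumes
the suite runs under pytest, and the two test names are hypothetical
placeholders that would have to be taken from the actual build log.

```scheme
;; Sketch only: the test names below are hypothetical placeholders.
(arguments
 '(#:phases
   (modify-phases %standard-phases
     (replace 'check
       (lambda* (#:key tests? #:allow-other-keys)
         (when tests?
           ;; Run the suite with pytest and deselect only the tests
           ;; that require network access.
           (invoke "pytest" "-vv" "-k"
                   (string-append "not test_needs_network_a"
                                  " and not test_needs_network_b"))))))))
```

With this shape, the rest of the suite still runs in the build
container, so regressions unrelated to networking remain visible.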
Ludovic Courtès Jan. 11, 2019, 8:29 a.m. UTC | #2
Ping!

swedebugia Feb. 24, 2019, 2:01 a.m. UTC | #3
Here comes the log from the build.

I see a lot of "OSError: [Errno 9] Bad file descriptor" and a few
"FileNotFoundError"

Should I report upstream?
Ludovic Courtès March 4, 2019, 1:43 p.m. UTC | #4
Hello swedebugia,

swedebugia <swedebugia@riseup.net> writes:

> On 2019-01-11 09:29, Ludovic Courtès wrote:

[...]

>>>> +   (arguments
>>>> +    ;; FIXME: Some tests require network access. 150 out of 1354 fail
>>>> +    '(#:tests? #f))
>>>
>>> Could you investigate a bit further?  What do the test logs show?

[...]

> Here comes the log from the build.
>
> I see a lot of "OSError: [Errno 9] Bad file descriptor" and a few
> "FileNotFoundError"
>
> Should I report upstream?

Yes, please.  Perhaps you need to investigate a little bit beforehand
(see <https://gnu.org/s/guix/manual/en/html_node/Debugging-Build-Failures.html>)
so you can provide them with just the information they need to reproduce
and understand the problem.

Thanks!

Ludo’.
Maxim Cournoyer July 13, 2021, 3:23 p.m. UTC | #5
Hello,

I've added the 'python-wsgiprox' and 'python-certauth' missing
dependencies as of warcio 1.7.4, managed to get the test suite to run
with only 2 disabled tests (due to networking requirements), and pushed
as 89bd7565e8.

Thanks to both of you!

Closing.

Maxim
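
For readers comparing this with the patch as submitted, the dependency
change described above could look roughly like the fragment below.  This
is a hedged sketch in the label style of the original patch, not a copy
of commit 89bd7565e8: which field each package belongs in (test-only
`native-inputs` versus run-time `propagated-inputs`) is an assumption
here.

```scheme
;; Sketch only: the split between the two fields is an assumption.
(native-inputs
 ;; Assumed to be needed only by the test suite.
 `(("python-certauth" ,python-certauth)
   ("python-httpbin" ,python-httpbin)
   ("python-pytest-cov" ,python-pytest-cov)
   ("python-requests" ,python-requests)
   ("python-wsgiprox" ,python-wsgiprox)))
(propagated-inputs
 ;; Assumed to be imported by warcio at run time.
 `(("python-six" ,python-six)))
```

Separating the two fields keeps test-only packages out of the profile of
anyone who merely installs python-warcio.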

Patch

From 537b2b111a464956bdec640ea5f84c4598ea66f9 Mon Sep 17 00:00:00 2001
From: swedebugia <swedebugia@riseup.net>
Date: Mon, 19 Nov 2018 21:37:46 +0100
Subject: [PATCH] gnu: Add python-warcio.

* gnu/packages/python.scm: New variable.
---
 gnu/packages/python.scm | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
index 2b7482a3e..24e8c409f 100644
--- a/gnu/packages/python.scm
+++ b/gnu/packages/python.scm
@@ -14630,3 +14630,29 @@  on regular expressions.")
      "This module implements the PRECIS Framework as described in RFC 8264,
 RFC 8265 and RFC 8266.")
     (license license:expat)))
+
+(define-public python-warcio
+  (package
+   (name "python-warcio")
+   (version "1.6.3")
+   (source
+    (origin
+      (method url-fetch)
+      (uri (pypi-uri "warcio" version))
+      (sha256
+       (base32
+        "1nyhghbag1chh5fml848x799mwgkgmz3l3ipv7lr6p0lj1jq8i1r"))))
+   (build-system python-build-system)
+   (inputs `(("python-six" ,python-six)
+             ("python-requests" ,python-requests)
+             ("python-httpbin" ,python-httpbin)
+             ("python-pytest-cov" ,python-pytest-cov)))
+   (arguments
+    ;; FIXME: Some tests require network access. 150 out of 1354 fail
+    '(#:tests? #f))
+   (home-page "https://github.com/webrecorder/warcio")
+   (synopsis "Streaming web archival archive (WARC) library")
+   (description
+    "warcio is designed for fast, low-level access to web archival
+content, oriented around a stream of WARC records rather than files.")
+   (license license:asl2.0)))
-- 
2.18.0