From patchwork Thu Jan 11 21:17:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Troy Figiel
X-Patchwork-Id: 58808
Return-Path:
X-Original-To: patchwork@mira.cbaines.net
Delivered-To: patchwork@mira.cbaines.net
Received: by mira.cbaines.net (Postfix, from userid 113) id 8865027BBEA; Thu, 11 Jan 2024 21:43:34 +0000 (GMT)
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on mira.cbaines.net
X-Spam-Level:
X-Spam-Status: No, score=-3.7 required=5.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL, SPF_HELO_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mira.cbaines.net (Postfix) with ESMTPS id 2424227BBE2 for ; Thu, 11 Jan 2024 21:43:33 +0000 (GMT)
Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rO2pb-0002yU-Rg; Thu, 11 Jan 2024 16:43:21 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rO2pW-0002xb-4H for guix-patches@gnu.org; Thu, 11 Jan 2024 16:43:15 -0500
Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rO2pN-0002Lz-0s for guix-patches@gnu.org; Thu, 11 Jan 2024 16:43:10 -0500
Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1rO2pK-0006qV-JH for guix-patches@gnu.org; Thu, 11 Jan 2024 16:43:02 -0500
X-Loop: help-debbugs@gnu.org
Subject: [bug#68391] [PATCH 3/3] gnu: Add python-pandera.
References: <87edentut3.fsf@troyfigiel.com>
In-Reply-To: <87edentut3.fsf@troyfigiel.com>
Resent-From: Troy Figiel
Original-Sender: "Debbugs-submit"
Resent-CC: guix-patches@gnu.org
Resent-Date: Thu, 11 Jan 2024 21:43:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 68391
X-GNU-PR-Package: guix-patches
X-GNU-PR-Keywords: patch
To: 68391@debbugs.gnu.org
Received: via spool by 68391-submit@debbugs.gnu.org id=B68391.170500937726298 (code B ref 68391); Thu, 11 Jan 2024 21:43:02 +0000
Received: (at 68391) by debbugs.gnu.org; 11 Jan 2024 21:42:57 +0000
Received: from localhost ([127.0.0.1]:34193 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rO2pF-0006q5-0H for submit@debbugs.gnu.org; Thu, 11 Jan 2024 16:42:57 -0500
Received: from mout-p-102.mailbox.org ([80.241.56.152]:52710) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rO2pD-0006ps-9U for 68391@debbugs.gnu.org; Thu, 11 Jan 2024 16:42:56 -0500
Received: from smtp1.mailbox.org (smtp1.mailbox.org [IPv6:2001:67c:2050:b231:465::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-102.mailbox.org (Postfix) with ESMTPS id 4T9ynj6nPLz9sWl for <68391@debbugs.gnu.org>; Thu, 11 Jan 2024 22:42:49 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=troyfigiel.com; s=MBO0001; t=1705009369; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type; bh=/2RVXWPrlnV/U3j7AQGtIZ2bWAXJ7gIhgR67iN48w3w=; b=WGOxniCodoFnYxVXsK/BLnwxYcmY2r7olLz3z/skoAnodOHAyEurAUTEfb9qRhkzGQTyPH k/kodR8o0Pe7aboy++qCushPIihGyZN0Yd0qqxMWwmnUwRii24gzP+atwllxdx10CFrU2U bwNUNkaRhqLOwVIWYJBUj2pZtrWQlCUxOagZEOqWt60SIdapt47nGkrlWs5bVYY5hon5jK IG0b8MvqO/1QJjfI2sZybBjx/tGShxRaw4HMEhunDn4t6Vq3+DnUAfIe0p8WbLvCwniMJ6
 js2BXUlT4u1mNeq/gA0zHmbTsemLbrC/0RYXHZxARimbIUt9LuER/5p3bYDoRQ==
From: Troy Figiel
Date: Thu, 11 Jan 2024 22:17:50 +0100
Message-ID: <87a5pbtuli.fsf@troyfigiel.com>
MIME-Version: 1.0
X-Rspamd-Queue-Id: 4T9ynj6nPLz9sWl
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
X-BeenThere: guix-patches@gnu.org
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: guix-patches-bounces+patchwork=mira.cbaines.net@gnu.org
Sender: guix-patches-bounces+patchwork=mira.cbaines.net@gnu.org
X-getmail-retrieved-from-mailbox: Patches

* gnu/packages/python-science.scm (python-pandera): New variable.
---
 gnu/packages/python-science.scm | 75 +++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/gnu/packages/python-science.scm b/gnu/packages/python-science.scm
index d8e0b343fb..f82cbdb79c 100644
--- a/gnu/packages/python-science.scm
+++ b/gnu/packages/python-science.scm
@@ -619,6 +619,81 @@ (define-public python-pandas-stubs
 of suggesting best recommended practices for using @code{python-pandas}.")
     (license license:bsd-3)))
 
+(define-public python-pandera
+  (package
+    (name "python-pandera")
+    (version "0.17.2")
+    (source
+     (origin
+       ;; No tests in the PyPI tarball.
+       (method git-fetch)
+       (uri (git-reference
+             (url "https://github.com/unionai-oss/pandera")
+             (commit (string-append "v" version))))
+       (file-name (git-file-name name version))
+       (sha256
+        (base32 "1mnqk583z90k1n0z3lfa4rd0ng40v7hqfk7phz5gjmxlzfjbxa1x"))
+       (modules '((guix build utils)))
+       ;; These tests require PySpark. We need to remove the entire directory,
+       ;; since the conftest.py in this directory contains a PySpark import.
+       ;; (See: https://github.com/pytest-dev/pytest/issues/7452)
+       (snippet '(delete-file-recursively "tests/pyspark"))))
+    (build-system pyproject-build-system)
+    (arguments
+     (list
+      #:test-flags '(list "-k"
+                          (string-append
+                           ;; Needs python-pandas >= 1.5
+                           "not test_python_std_list_dict_generics"
+                           " and not test_python_std_list_dict_empty_and_none"
+                           " and not test_pandas_modules_importable"))))
+    ;; Pandera comes with a lot of extras. We test as many as possible, but do
+    ;; not include all of them in the propagated-inputs. Currently, we have to
+    ;; skip the pyspark and io tests due to missing packages python-pyspark
+    ;; and python-frictionless.
+    (propagated-inputs (list python-hypothesis ;strategies extra
+                             python-multimethod
+                             python-numpy
+                             python-packaging
+                             python-pandas
+                             python-pandas-stubs ;mypy extra
+                             python-pydantic
+                             python-scipy ;hypotheses extra
+                             python-typeguard-4
+                             python-typing-inspect
+                             python-wrapt))
+    (native-inputs (list python-dask ;dask extra
+                         python-fastapi ;fastapi extra
+                         python-geopandas ;geopandas extra
+                         python-modin ;modin extra
+                         python-pyarrow ;needed to run fastapi tests
+                         python-pytest
+                         python-pytest-asyncio
+                         python-sphinx
+                         python-uvicorn)) ;needed to run fastapi tests
+    (home-page "https://github.com/unionai-oss/pandera")
+    (synopsis "Perform data validation on dataframe-like objects")
+    (description
+     "@code{python-pandera} provides a flexible and expressive API for
+performing data validation on dataframe-like objects to make data processing
+pipelines more readable and robust. Dataframes contain information that
+@code{python-pandera} explicitly validates at runtime. This is useful in
+production-critical data pipelines or reproducible research settings. With
+@code{python-pandera}, you can:
+
+@itemize
+@item Define a schema once and use it to validate different dataframe types.
+@item Check the types and properties of columns.
+@item Perform more complex statistical validation like hypothesis testing.
+@item Seamlessly integrate with existing data pipelines via function decorators.
+@item Define dataframe models with the class-based API with pydantic-style syntax.
+@item Synthesize data from schema objects for property-based testing.
+@item Lazily validate dataframes so that all validation rules are executed.
+@item Integrate with a rich ecosystem of tools like @code{python-pydantic},
+@code{python-fastapi} and @code{python-mypy}.
+@end itemize")
+    (license license:expat)))
+
 (define-public python-pythran
   (package
     (name "python-pythran")