diff mbox series

[bug#49946,v7,01/32] gnu: tree-sitter: Move to its own module.

Message ID 20221125012142.22579-2-pierre.langlois@gmx.com
State New
Headers show
Series gnu: Add tree-sitter for emacs. | expand

Commit Message

Pierre Langlois Nov. 25, 2022, 1:21 a.m. UTC
* gnu/packages/text-editors.scm (tree-sitter): Move to ...
* gnu/packages/tree-sitter.scm: ... here, a new module.
* gnu/packages/vim.scm: Use (gnu packages tree-sitter).
* gnu/local.mk (GNU_SYSTEM_MODULES): Register tree-sitter.scm new module.
---
 gnu/local.mk                  |  1 +
 gnu/packages/text-editors.scm | 52 -----------------------
 gnu/packages/tree-sitter.scm  | 78 +++++++++++++++++++++++++++++++++++
 gnu/packages/vim.scm          |  1 +
 4 files changed, 80 insertions(+), 52 deletions(-)
 create mode 100644 gnu/packages/tree-sitter.scm

Comments

\( Nov. 25, 2022, 6:37 a.m. UTC | #1
Heya,

On Fri Nov 25, 2022 at 1:21 AM GMT, Pierre Langlois wrote:
> * gnu/packages/text-editors.scm (tree-sitter): Move to ...
> * gnu/packages/tree-sitter.scm: ... here, a new module.
> * gnu/packages/vim.scm: Use (gnu packages tree-sitter).
> * gnu/local.mk (GNU_SYSTEM_MODULES): Register tree-sitter.scm new module.

> --- /dev/null
> +++ b/gnu/packages/tree-sitter.scm

> @@ -0,0 +1,78 @@

> +(define-module (gnu packages tree-sitter)
> +  #:use-module (guix gexp)
> +  #:use-module ((guix licenses) #:prefix license:)
> +  #:use-module (guix build-system gnu)
> +  #:use-module (guix git-download)
> +  #:use-module (guix packages)
> +  #:use-module (guix utils)
> +  #:use-module (gnu packages icu4c))

Please sort these alphabetically.

> +(define-public tree-sitter
> +  (package
> +    (name "tree-sitter")
> +    (version "0.20.6")
> +    (source (origin
> +              (method git-fetch)
> +              (uri (git-reference
> +                    (url "https://github.com/tree-sitter/tree-sitter")
> +                    (commit (string-append "v" version))))
> +              (file-name (git-file-name name version))
> +              (sha256
> +               (base32
> +                "1z20518snyg0zp75qgs5bxmzjqws4dd19vnp6sya494za3qp5b6d"))
> +              (modules '((guix build utils)))
> +              (snippet '(begin

Please change it to use a gexp: ``(snippet #~(begin ...))''.

> +                          ;; Remove bundled ICU parts
> +                          (delete-file-recursively "lib/src/unicode")
> +                          #t))))

This #T is now unnecessary.

> +    (arguments
> +     (list #:phases
> +           '(modify-phases %standard-phases
> +              (delete 'configure))

Please use a gexp here, too.

> +           #:make-flags
> +           #~(list (string-append "PREFIX="
> +                                  #$output)
> +                   (string-append "CC="
> +                                  #$(cc-for-target)))))

Maybe modify the indentation here a little.

  (string-append "PREFIX=" #$output)
  (string-append "CC=" #$(cc-for-target))

> +
> +This package includes the @code{libtree-sitter} runtime library.
> +")

Please remove the newline at the end of the description.

    -- (
Pierre Langlois Nov. 25, 2022, 10:18 a.m. UTC | #2
Hi!

"(" <paren@disroot.org> writes:

> Heya,

Thanks for taking a look and spending time reviewing this, this is
great. I'll work on your suggestions over the weekend.

Thanks,
Pierre
Andrew Tropin Feb. 9, 2023, 10:11 a.m. UTC | #3
On 2022-11-25 10:18, Pierre Langlois wrote:

> Hi!
>
> "(" <paren@disroot.org> writes:
>
>> Heya,
>
> Thanks for taking a look and spending time reviewing this, this is
> great. I'll work on your suggestions over the weekend.
>
> Thanks,
> Pierre

Hi Pierre!

Thank you very much for all the work.

I applied tree-sitter and tree-sitter-cli patches, adjusted them,
addressed comments of unmatching-paren and succesfully built respective
packages.  I'll push them later today or tomorrow.  Just letting you
know to prevent duplication of effort.

Also, later I'll make a separate message on tree-sitter grammars and the
way I see they can be added to Guix and Emacs.
Simon Tournier Feb. 9, 2023, 12:39 p.m. UTC | #4
Hi,

On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:

> I applied tree-sitter and tree-sitter-cli patches,

Just to be sure to understand, you have only applied 02/32 and 05/32,
right?


[bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
id:20221125012142.22579-3-pierre.langlois@gmx.com
http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com

[bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
id:20221125012142.22579-6-pierre.langlois@gmx.com
http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com

Leaving out all the others, right?

Cheers,
simon
Andrew Tropin Feb. 9, 2023, 2:04 p.m. UTC | #5
On 2023-02-09 13:39, zimoun wrote:

> Hi,
>
> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:
>
>> I applied tree-sitter and tree-sitter-cli patches,
>
> Just to be sure to understand, you have only applied 02/32 and 05/32,
> right?
>
>
> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
> id:20221125012142.22579-3-pierre.langlois@gmx.com
> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com
>
> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
> id:20221125012142.22579-6-pierre.langlois@gmx.com
> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com
>
> Leaving out all the others, right?

Merged first 5 patches from 01 to 05, also added one more commit, which
addresses some things from reviews and one commit, which adds html
grammar.

The html grammar is added for the testing purposes.  It relies on
generated parser.c and scanner.c and we will need to repackage it using
grammar.js instead.  I'm not sure if a separate build system is needed
for this, I guess we can just rewrite tree-sitter-grammar function,
which generates packages as in example with tree-sitter-grammar-html:
https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158

Anyway, I only skimmed tree-sitter-build-system source code, and plan to
read it carefully, evaluate and either introduce new build system or
just move all needed parts to tree-sitter-grammar function.  WDYT?
After we done with it we can package all other grammars.

According to emacs integration: I already have a working prototype on my
local setup, which uses builtin treesit package, it's similiar to what
Luis mentioned with search-paths.  We just need to figure out a few
minor details and I'll add it.
Andrew Tropin Feb. 10, 2023, 12:52 p.m. UTC | #6
On 2023-02-09 18:04, Andrew Tropin wrote:

> On 2023-02-09 13:39, zimoun wrote:
>
>> Hi,
>>
>> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:
>>
>>> I applied tree-sitter and tree-sitter-cli patches,
>>
>> Just to be sure to understand, you have only applied 02/32 and 05/32,
>> right?
>>
>>
>> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
>> id:20221125012142.22579-3-pierre.langlois@gmx.com
>> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com
>>
>> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
>> id:20221125012142.22579-6-pierre.langlois@gmx.com
>> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com
>>
>> Leaving out all the others, right?
>
> Merged first 5 patches from 01 to 05, also added one more commit, which
> addresses some things from reviews and one commit, which adds html
> grammar.
>
> The html grammar is added for the testing purposes.  It relies on
> generated parser.c and scanner.c and we will need to repackage it using
> grammar.js instead.  I'm not sure if a separate build system is needed
> for this, I guess we can just rewrite tree-sitter-grammar function,
> which generates packages as in example with tree-sitter-grammar-html:
> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158
>
> Anyway, I only skimmed tree-sitter-build-system source code, and plan to
> read it carefully, evaluate and either introduce new build system or
> just move all needed parts to tree-sitter-grammar function.  WDYT?
> After we done with it we can package all other grammars.

Ok, I realized that the proper build process for tree-sitter grammars is
a little harder than I expected, tree-sitter-build system make sense.  I
reviewed it, made a small change:

--8<---------------cut here---------------start------------->8---
@@ -29,7 +29,7 @@ (define-module (guix build tree-sitter-build-system)
 ;; Commentary:
 ;;
 ;; Build procedures for tree-sitter grammar packages.  This is the
-;; builder-side code, which builds on top fo the node build-system.
+;; builder-side code, which builds on top of the node build-system.
 ;;
 ;; Tree-sitter grammars are written in JavaScript and compiled to a native
 ;; shared object.  The `tree-sitter generate' command invokes `node' in order
@@ -114,7 +114,7 @@ (define (compile-language dir)
                    "-fno-exceptions"
                    "-O2"
                    "-g"
-                   "-o" ,(string-append lib "/" lang ".so")
+                   "-o" ,(string-append lib "/libtree-sitter-" lang ".so")
                    ;; An additional `scanner.{c,cc}' file is sometimes
                    ;; provided.
                    ,@(cond
--8<---------------cut here---------------end--------------->8---

rewrote html grammar to use this build system and made it work with
built-in treesit package.  Also, tried examples of c and cpp grammars
from patches in this thread.

If you ok with it, I'll push the build system to master and update the
html grammar accordingly.

The final result will look like this:

--8<---------------cut here---------------start------------->8---
(define tree-sitter-delete-generated-files
  #~(begin
      (delete-file "binding.gyp")
      (delete-file-recursively "bindings")
      (delete-file "src/grammar.json")
      (delete-file "src/node-types.json")
      (delete-file "src/parser.c")
      (delete-file-recursively "src/tree_sitter")))

(define* (tree-sitter-grammar
          language language-for-synopsis version commit hash
          #:key
          (repository-url
           (format #f "https://github.com/tree-sitter/tree-sitter-~a" language))
          (inputs '()))
  (let ((synopsis (string-append language-for-synopsis
                                 " grammar for tree-sitter"))
        (name (string-append "tree-sitter-grammar-" language)))
    (package
      (name name)
      (version version)
      (home-page repository-url)
      (source (origin
                (method git-fetch)
                (uri (git-reference
                      (url repository-url)
                      (commit commit)))
                (file-name (git-file-name name version))
                (sha256 (base32 hash))
                (modules '((guix build utils)))
                (snippet tree-sitter-delete-generated-files)))
      (build-system tree-sitter-build-system)
      (inputs inputs)
      (synopsis synopsis)
      (description (string-append synopsis "."))
      (license license:expat))))

(define-public tree-sitter-grammar-html
  (tree-sitter-grammar
   "html" "HTML"
   "0.19.0" "v0.19.0"
   "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
--8<---------------cut here---------------end--------------->8---

After that we can bring the rest of the grammars.

>
> According to emacs integration: I already have a working prototype on
> my local setup, which uses builtin treesit package, it's similiar to
> what Luis mentioned with search-paths.  We just need to figure out a
> few minor details and I'll add it.

I've sent a patch for emacs+tree-sitter integration in a different reply
in this thread.

Thank you very much for all the work, it's really valuable!
Pierre Langlois Feb. 10, 2023, 3:48 p.m. UTC | #7
Hi Andrew, thanks for pushing this along! It's great to see things
getting merged.

Andrew Tropin <andrew@trop.in> writes:

> [[PGP Signed Part:Undecided]]
> On 2023-02-09 18:04, Andrew Tropin wrote:
>
>> On 2023-02-09 13:39, zimoun wrote:
>>
>>> Hi,
>>>
>>> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:
>>>
>>>> I applied tree-sitter and tree-sitter-cli patches,
>>>
>>> Just to be sure to understand, you have only applied 02/32 and 05/32,
>>> right?
>>>
>>>
>>> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
>>> id:20221125012142.22579-3-pierre.langlois@gmx.com
>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com
>>>
>>> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
>>> id:20221125012142.22579-6-pierre.langlois@gmx.com
>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com
>>>
>>> Leaving out all the others, right?
>>
>> Merged first 5 patches from 01 to 05, also added one more commit, which
>> addresses some things from reviews and one commit, which adds html
>> grammar.
>>
>> The html grammar is added for the testing purposes.  It relies on
>> generated parser.c and scanner.c and we will need to repackage it using
>> grammar.js instead.  I'm not sure if a separate build system is needed
>> for this, I guess we can just rewrite tree-sitter-grammar function,
>> which generates packages as in example with tree-sitter-grammar-html:
>> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158
>>
>> Anyway, I only skimmed tree-sitter-build-system source code, and plan to
>> read it carefully, evaluate and either introduce new build system or
>> just move all needed parts to tree-sitter-grammar function.  WDYT?
>> After we done with it we can package all other grammars.
>
> Ok, I realized that the proper build process for tree-sitter grammars is
> a little harder than I expected, tree-sitter-build system make sense.  I
> reviewed it, made a small change:

Ah great, I was going to comment to try and push for us to keep the
build system. I originally went with a template package and inheritance,
but Maxime suggested moving to a build-system which ended up making the
package definitions a *lot* nicer IMO (see previous discussion here
https://issues.guix.gnu.org/49946#144). It also allows us to deal with
grammars that depend on each other more nicely I think.

>
> @@ -29,7 +29,7 @@ (define-module (guix build tree-sitter-build-system)
>  ;; Commentary:
>  ;;
>  ;; Build procedures for tree-sitter grammar packages.  This is the
> -;; builder-side code, which builds on top fo the node build-system.
> +;; builder-side code, which builds on top of the node build-system.
>  ;;
>  ;; Tree-sitter grammars are written in JavaScript and compiled to a native
>  ;; shared object.  The `tree-sitter generate' command invokes `node' in order
> @@ -114,7 +114,7 @@ (define (compile-language dir)
>                     "-fno-exceptions"
>                     "-O2"
>                     "-g"
> -                   "-o" ,(string-append lib "/" lang ".so")
> +                   "-o" ,(string-append lib "/libtree-sitter-" lang ".so")
>                     ;; An additional `scanner.{c,cc}' file is sometimes
>                     ;; provided.
>                     ,@(cond
>
>
> rewrote html grammar to use this build system and made it work with
> built-in treesit package.  Also, tried examples of c and cpp grammars
> from patches in this thread.
>
> If you ok with it, I'll push the build system to master and update the
> html grammar accordingly.
>
> The final result will look like this:
>
> (define tree-sitter-delete-generated-files
>   #~(begin
>       (delete-file "binding.gyp")
>       (delete-file-recursively "bindings")
>       (delete-file "src/grammar.json")
>       (delete-file "src/node-types.json")
>       (delete-file "src/parser.c")
>       (delete-file-recursively "src/tree_sitter")))
>
> (define* (tree-sitter-grammar
>           language language-for-synopsis version commit hash
>           #:key
>           (repository-url
>            (format #f "https://github.com/tree-sitter/tree-sitter-~a" language))
>           (inputs '()))
>   (let ((synopsis (string-append language-for-synopsis
>                                  " grammar for tree-sitter"))
>         (name (string-append "tree-sitter-grammar-" language)))
>     (package
>       (name name)
>       (version version)
>       (home-page repository-url)
>       (source (origin
>                 (method git-fetch)
>                 (uri (git-reference
>                       (url repository-url)
>                       (commit commit)))
>                 (file-name (git-file-name name version))
>                 (sha256 (base32 hash))
>                 (modules '((guix build utils)))
>                 (snippet tree-sitter-delete-generated-files)))
>       (build-system tree-sitter-build-system)
>       (inputs inputs)
>       (synopsis synopsis)
>       (description (string-append synopsis "."))
>       (license license:expat))))
>
> (define-public tree-sitter-grammar-html
>   (tree-sitter-grammar
>    "html" "HTML"
>    "0.19.0" "v0.19.0"
>    "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
>
> After that we can bring the rest of the grammars.

I would suggest to rmeove the `tree-sitter-grammar' function, and keep
grammars as "regular" package records, even though it's a little bit
more verbose:

--8<---------------cut here---------------start------------->8---
(define-public tree-sitter-html
  (package
    (name "tree-sitter-html")
    (version "0.19.0")
    (source (origin
              (method git-fetch)
              (uri (git-reference
                    (url "https://github.com/tree-sitter/tree-sitter-html")
                    (commit (string-append "v" version))))
              (file-name (git-file-name name version))
              (sha256
               (base32
                "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
              (modules '((guix build utils)))
              (snippet tree-sitter-delete-generated-files)))
    (build-system tree-sitter-build-system)
    (home-page "https://github.com/tree-sitter/tree-sitter-html")
    (synopsis "Tree-sitter HTML grammar")
    (description
     "This package provides a HTML grammar for the Tree-sitter library.")
    (license license:expat)))
--8<---------------cut here---------------end--------------->8---

This way, they look like any other package in Guix, which makes it
easier for us to apply automatic changes in the future if needed (for
example like how the input format could be automically updated for all
"simple" package definitions, but had to be manual whenever custom code
refactoring was done). Does this make sense?

>
>>
>> According to emacs integration: I already have a working prototype on
>> my local setup, which uses builtin treesit package, it's similiar to
>> what Luis mentioned with search-paths.  We just need to figure out a
>> few minor details and I'll add it.
>
> I've sent a patch for emacs+tree-sitter integration in a different reply
> in this thread.
>
> Thank you very much for all the work, it's really valuable!

:-)

Thanks,
Pierre
Pierre Langlois Feb. 10, 2023, 5:02 p.m. UTC | #8
Pierre Langlois <pierre.langlois@gmx.com> writes:

> [[PGP Signed Part:Undecided]]
> Hi Andrew, thanks for pushing this along! It's great to see things
> getting merged.
>
> Andrew Tropin <andrew@trop.in> writes:
>
>> [[PGP Signed Part:Undecided]]
>> On 2023-02-09 18:04, Andrew Tropin wrote:
>>
>>> On 2023-02-09 13:39, zimoun wrote:
>>>
>>>> Hi,
>>>>
>>>> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:
>>>>
>>>>> I applied tree-sitter and tree-sitter-cli patches,
>>>>
>>>> Just to be sure to understand, you have only applied 02/32 and 05/32,
>>>> right?
>>>>
>>>>
>>>> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
>>>> id:20221125012142.22579-3-pierre.langlois@gmx.com
>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com
>>>>
>>>> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
>>>> id:20221125012142.22579-6-pierre.langlois@gmx.com
>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com
>>>>
>>>> Leaving out all the others, right?
>>>
>>> Merged first 5 patches from 01 to 05, also added one more commit, which
>>> addresses some things from reviews and one commit, which adds html
>>> grammar.
>>>
>>> The html grammar is added for the testing purposes.  It relies on
>>> generated parser.c and scanner.c and we will need to repackage it using
>>> grammar.js instead.  I'm not sure if a separate build system is needed
>>> for this, I guess we can just rewrite tree-sitter-grammar function,
>>> which generates packages as in example with tree-sitter-grammar-html:
>>> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158
>>>
>>> Anyway, I only skimmed tree-sitter-build-system source code, and plan to
>>> read it carefully, evaluate and either introduce new build system or
>>> just move all needed parts to tree-sitter-grammar function.  WDYT?
>>> After we done with it we can package all other grammars.
>>
>> Ok, I realized that the proper build process for tree-sitter grammars is
>> a little harder than I expected, tree-sitter-build system make sense.  I
>> reviewed it, made a small change:
>
> Ah great, I was going to comment to try and push for us to keep the
> build system. I originally went with a template package and inheritance,
> but Maxime suggested moving to a build-system which ended up making the
> package definitions a *lot* nicer IMO (see previous discussion here
> https://issues.guix.gnu.org/49946#144). It also allows us to deal with
> grammars that depend on each other more nicely I think.
>
>>
>> @@ -29,7 +29,7 @@ (define-module (guix build tree-sitter-build-system)
>>  ;; Commentary:
>>  ;;
>>  ;; Build procedures for tree-sitter grammar packages.  This is the
>> -;; builder-side code, which builds on top fo the node build-system.
>> +;; builder-side code, which builds on top of the node build-system.
>>  ;;
>>  ;; Tree-sitter grammars are written in JavaScript and compiled to a native
>>  ;; shared object.  The `tree-sitter generate' command invokes `node' in order
>> @@ -114,7 +114,7 @@ (define (compile-language dir)
>>                     "-fno-exceptions"
>>                     "-O2"
>>                     "-g"
>> -                   "-o" ,(string-append lib "/" lang ".so")
>> +                   "-o" ,(string-append lib "/libtree-sitter-" lang ".so")
>>                     ;; An additional `scanner.{c,cc}' file is sometimes
>>                     ;; provided.
>>                     ,@(cond
>>
>>
>> rewrote html grammar to use this build system and made it work with
>> built-in treesit package.  Also, tried examples of c and cpp grammars
>> from patches in this thread.
>>
>> If you ok with it, I'll push the build system to master and update the
>> html grammar accordingly.

Oh, I forgot to say, this change to the build system LGTM! I'm really
happy to see it merged soon :-). The path change will probably break the
emacs-28-based tree-sitter support, but that's OK, it's better for the
build-system to be made to target emacs 29's builtin support. I'm sure I
can work around for emacs 28.

Thanks,
Pierre
Andrew Tropin Feb. 12, 2023, 5:55 a.m. UTC | #9
On 2023-02-10 15:48, Pierre Langlois wrote:

> Hi Andrew, thanks for pushing this along! It's great to see things
> getting merged.
>
> Andrew Tropin <andrew@trop.in> writes:
>
>> [[PGP Signed Part:Undecided]]
>> On 2023-02-09 18:04, Andrew Tropin wrote:
>>
>>> On 2023-02-09 13:39, zimoun wrote:
>>>
>>>> Hi,
>>>>
>>>> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:
>>>>
>>>>> I applied tree-sitter and tree-sitter-cli patches,
>>>>
>>>> Just to be sure to understand, you have only applied 02/32 and 05/32,
>>>> right?
>>>>
>>>>
>>>> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
>>>> id:20221125012142.22579-3-pierre.langlois@gmx.com
>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com
>>>>
>>>> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
>>>> id:20221125012142.22579-6-pierre.langlois@gmx.com
>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com
>>>>
>>>> Leaving out all the others, right?
>>>
>>> Merged first 5 patches from 01 to 05, also added one more commit, which
>>> addresses some things from reviews and one commit, which adds html
>>> grammar.
>>>
>>> The html grammar is added for the testing purposes.  It relies on
>>> generated parser.c and scanner.c and we will need to repackage it using
>>> grammar.js instead.  I'm not sure if a separate build system is needed
>>> for this, I guess we can just rewrite tree-sitter-grammar function,
>>> which generates packages as in example with tree-sitter-grammar-html:
>>> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158
>>>
>>> Anyway, I only skimmed tree-sitter-build-system source code, and plan to
>>> read it carefully, evaluate and either introduce new build system or
>>> just move all needed parts to tree-sitter-grammar function.  WDYT?
>>> After we done with it we can package all other grammars.
>>
>> Ok, I realized that the proper build process for tree-sitter grammars is
>> a little harder than I expected, tree-sitter-build system make sense.  I
>> reviewed it, made a small change:
>
> Ah great, I was going to comment to try and push for us to keep the
> build system. I originally went with a template package and inheritance,
> but Maxime suggested moving to a build-system which ended up making the
> package definitions a *lot* nicer IMO (see previous discussion here
> https://issues.guix.gnu.org/49946#144). It also allows us to deal with
> grammars that depend on each other more nicely I think.
>
>>
>> @@ -29,7 +29,7 @@ (define-module (guix build tree-sitter-build-system)
>>  ;; Commentary:
>>  ;;
>>  ;; Build procedures for tree-sitter grammar packages.  This is the
>> -;; builder-side code, which builds on top fo the node build-system.
>> +;; builder-side code, which builds on top of the node build-system.
>>  ;;
>>  ;; Tree-sitter grammars are written in JavaScript and compiled to a native
>>  ;; shared object.  The `tree-sitter generate' command invokes `node' in order
>> @@ -114,7 +114,7 @@ (define (compile-language dir)
>>                     "-fno-exceptions"
>>                     "-O2"
>>                     "-g"
>> -                   "-o" ,(string-append lib "/" lang ".so")
>> +                   "-o" ,(string-append lib "/libtree-sitter-" lang ".so")
>>                     ;; An additional `scanner.{c,cc}' file is sometimes
>>                     ;; provided.
>>                     ,@(cond
>>
>>
>> rewrote html grammar to use this build system and made it work with
>> built-in treesit package.  Also, tried examples of c and cpp grammars
>> from patches in this thread.
>>
>> If you ok with it, I'll push the build system to master and update the
>> html grammar accordingly.
>>
>> The final result will look like this:
>>
>> (define tree-sitter-delete-generated-files
>>   #~(begin
>>       (delete-file "binding.gyp")
>>       (delete-file-recursively "bindings")
>>       (delete-file "src/grammar.json")
>>       (delete-file "src/node-types.json")
>>       (delete-file "src/parser.c")
>>       (delete-file-recursively "src/tree_sitter")))
>>
>> (define* (tree-sitter-grammar
>>           language language-for-synopsis version commit hash
>>           #:key
>>           (repository-url
>>            (format #f "https://github.com/tree-sitter/tree-sitter-~a" language))
>>           (inputs '()))
>>   (let ((synopsis (string-append language-for-synopsis
>>                                  " grammar for tree-sitter"))
>>         (name (string-append "tree-sitter-grammar-" language)))
>>     (package
>>       (name name)
>>       (version version)
>>       (home-page repository-url)
>>       (source (origin
>>                 (method git-fetch)
>>                 (uri (git-reference
>>                       (url repository-url)
>>                       (commit commit)))
>>                 (file-name (git-file-name name version))
>>                 (sha256 (base32 hash))
>>                 (modules '((guix build utils)))
>>                 (snippet tree-sitter-delete-generated-files)))
>>       (build-system tree-sitter-build-system)
>>       (inputs inputs)
>>       (synopsis synopsis)
>>       (description (string-append synopsis "."))
>>       (license license:expat))))
>>
>> (define-public tree-sitter-grammar-html
>>   (tree-sitter-grammar
>>    "html" "HTML"
>>    "0.19.0" "v0.19.0"
>>    "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
>>
>> After that we can bring the rest of the grammars.
>
> I would suggest to rmeove the `tree-sitter-grammar' function, and keep
> grammars as "regular" package records, even though it's a little bit
> more verbose:
>
> --8<---------------cut here---------------start------------->8---
> (define-public tree-sitter-html
>   (package
>     (name "tree-sitter-html")

It seems tree-sitter-html mimics upstream package name and probably make
more sense than tree-sitter-grammar-html used by me.

>     (version "0.19.0")
>     (source (origin
>               (method git-fetch)
>               (uri (git-reference
>                     (url "https://github.com/tree-sitter/tree-sitter-html")
>                     (commit (string-append "v" version))))
>               (file-name (git-file-name name version))
>               (sha256
>                (base32
>                 "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
>               (modules '((guix build utils)))
>               (snippet tree-sitter-delete-generated-files)))
>     (build-system tree-sitter-build-system)
>     (home-page "https://github.com/tree-sitter/tree-sitter-html")
>     (synopsis "Tree-sitter HTML grammar")
>     (description
>      "This package provides a HTML grammar for the Tree-sitter library.")
>     (license license:expat)))
> --8<---------------cut here---------------end--------------->8---
>
> This way, they look like any other package in Guix, which makes it
> easier for us to apply automatic changes in the future if needed (for
> example like how the input format could be automically updated for all
> "simple" package definitions, but had to be manual whenever custom code
> refactoring was done). Does this make sense?

Make sense, but on the other hand we already have hunspell, aspell
dictionaries and probably a few more others, which are very similiar in
spirit and we already have to keep in mind their existence on such
automatic code updates.

It looks that the packages differ only in url for the source code, lang
name and sometimes in inputs.  Having template package function can make
management of shared parts more centralized, reduce possibility of
copy-paste mistakes, when the description wasn't updated and so on and
can reduce the amount of the code overall (which also reduces the change
of introducing an error).

I don't have a strong opinion on this topic, but leaning towards the
template function slightly more, however I'm completely ok with the
standalone package definitions as well.  WDYT?

>
>>
>>>
>>> According to emacs integration: I already have a working prototype on
>>> my local setup, which uses builtin treesit package, it's similiar to
>>> what Luis mentioned with search-paths.  We just need to figure out a
>>> few minor details and I'll add it.
>>
>> I've sent a patch for emacs+tree-sitter integration in a different reply
>> in this thread.
>>
>> Thank you very much for all the work, it's really valuable!
>
> :-)
>
> Thanks,
> Pierre
>
Andrew Tropin Feb. 12, 2023, 6:05 a.m. UTC | #10
On 2023-02-10 17:02, Pierre Langlois wrote:

> Pierre Langlois <pierre.langlois@gmx.com> writes:
>
>> [[PGP Signed Part:Undecided]]
>> Hi Andrew, thanks for pushing this along! It's great to see things
>> getting merged.
>>
>> Andrew Tropin <andrew@trop.in> writes:
>>
>>> [[PGP Signed Part:Undecided]]
>>> On 2023-02-09 18:04, Andrew Tropin wrote:
>>>
>>>> On 2023-02-09 13:39, zimoun wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:
>>>>>
>>>>>> I applied tree-sitter and tree-sitter-cli patches,
>>>>>
>>>>> Just to be sure to understand, you have only applied 02/32 and 05/32,
>>>>> right?
>>>>>
>>>>>
>>>>> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
>>>>> id:20221125012142.22579-3-pierre.langlois@gmx.com
>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com
>>>>>
>>>>> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
>>>>> id:20221125012142.22579-6-pierre.langlois@gmx.com
>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com
>>>>>
>>>>> Leaving out all the others, right?
>>>>
>>>> Merged first 5 patches from 01 to 05, also added one more commit, which
>>>> addresses some things from reviews and one commit, which adds html
>>>> grammar.
>>>>
>>>> The html grammar is added for the testing purposes.  It relies on
>>>> generated parser.c and scanner.c and we will need to repackage it using
>>>> grammar.js instead.  I'm not sure if a separate build system is needed
>>>> for this, I guess we can just rewrite tree-sitter-grammar function,
>>>> which generates packages as in example with tree-sitter-grammar-html:
>>>> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158
>>>>
>>>> Anyway, I only skimmed tree-sitter-build-system source code, and plan to
>>>> read it carefully, evaluate and either introduce new build system or
>>>> just move all needed parts to tree-sitter-grammar function.  WDYT?
>>>> After we done with it we can package all other grammars.
>>>
>>> Ok, I realized that the proper build process for tree-sitter grammars is
>>> a little harder than I expected, tree-sitter-build system make sense.  I
>>> reviewed it, made a small change:
>>
>> Ah great, I was going to comment to try and push for us to keep the
>> build system. I originally went with a template package and inheritance,
>> but Maxime suggested moving to a build-system which ended up making the
>> package definitions a *lot* nicer IMO (see previous discussion here
>> https://issues.guix.gnu.org/49946#144). It also allows us to deal with
>> grammars that depend on each other more nicely I think.
>>
>>>
>>> @@ -29,7 +29,7 @@ (define-module (guix build tree-sitter-build-system)
>>>  ;; Commentary:
>>>  ;;
>>>  ;; Build procedures for tree-sitter grammar packages.  This is the
>>> -;; builder-side code, which builds on top fo the node build-system.
>>> +;; builder-side code, which builds on top of the node build-system.
>>>  ;;
>>>  ;; Tree-sitter grammars are written in JavaScript and compiled to a native
>>>  ;; shared object.  The `tree-sitter generate' command invokes `node' in order
>>> @@ -114,7 +114,7 @@ (define (compile-language dir)
>>>                     "-fno-exceptions"
>>>                     "-O2"
>>>                     "-g"
>>> -                   "-o" ,(string-append lib "/" lang ".so")
>>> +                   "-o" ,(string-append lib "/libtree-sitter-" lang ".so")
>>>                     ;; An additional `scanner.{c,cc}' file is sometimes
>>>                     ;; provided.
>>>                     ,@(cond
>>>
>>>
>>> rewrote html grammar to use this build system and made it work with
>>> built-in treesit package.  Also, tried examples of c and cpp grammars
>>> from patches in this thread.
>>>
>>> If you ok with it, I'll push the build system to master and update the
>>> html grammar accordingly.
>
> Oh, I forgot to say, this change to the build system LGTM! I'm really
> happy to see it merged soon :-). The path change will probably break the
> emacs-28-based tree-sitter support, but that's OK, it's better for the
> build-system to be made to target emacs 29's builtin support. I'm sure I
> can work around for emacs 28.

Actually, I think we can build grammars with both names, just providing
two .so files instead of one.  If you won't find a better workaround we
can go this way.
Pierre Langlois Feb. 12, 2023, 12:07 p.m. UTC | #11
Hi,

Andrew Tropin <andrew@trop.in> writes:

> [[PGP Signed Part:Undecided]]
> On 2023-02-10 15:48, Pierre Langlois wrote:
>
>> Hi Andrew, thanks for pushing this along! It's great to see things
>> getting merged.
>>
>> Andrew Tropin <andrew@trop.in> writes:
>>
>>> [[PGP Signed Part:Undecided]]
>>> On 2023-02-09 18:04, Andrew Tropin wrote:
>>>
>>>> On 2023-02-09 13:39, zimoun wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:
>>>>>
>>>>>> I applied tree-sitter and tree-sitter-cli patches,
>>>>>
>>>>> Just to be sure to understand, you have only applied 02/32 and 05/32,
>>>>> right?
>>>>>
>>>>>
>>>>> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
>>>>> id:20221125012142.22579-3-pierre.langlois@gmx.com
>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com
>>>>>
>>>>> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
>>>>> id:20221125012142.22579-6-pierre.langlois@gmx.com
>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com
>>>>>
>>>>> Leaving out all the others, right?
>>>>
>>>> Merged first 5 patches from 01 to 05, also added one more commit, which
>>>> addresses some things from reviews and one commit, which adds html
>>>> grammar.
>>>>
>>>> The html grammar is added for the testing purposes.  It relies on
>>>> generated parser.c and scanner.c and we will need to repackage it using
>>>> grammar.js instead.  I'm not sure if a separate build system is needed
>>>> for this, I guess we can just rewrite tree-sitter-grammar function,
>>>> which generates packages as in example with tree-sitter-grammar-html:
>>>> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158
>>>>
>>>> Anyway, I only skimmed tree-sitter-build-system source code, and plan to
>>>> read it carefully, evaluate and either introduce new build system or
>>>> just move all needed parts to tree-sitter-grammar function.  WDYT?
>>>> After we done with it we can package all other grammars.
>>>
>>> Ok, I realized that the proper build process for tree-sitter grammars is
>>> a little harder than I expected, tree-sitter-build system make sense.  I
>>> reviewed it, made a small change:
>>
>> Ah great, I was going to comment to try and push for us to keep the
>> build system. I originally went with a template package and inheritance,
>> but Maxime suggested moving to a build-system which ended up making the
>> package definitions a *lot* nicer IMO (see previous discussion here
>> https://issues.guix.gnu.org/49946#144). It also allows us to deal with
>> grammars that depend on each other more nicely I think.
>>
>>>
>>> @@ -29,7 +29,7 @@ (define-module (guix build tree-sitter-build-system)
>>>  ;; Commentary:
>>>  ;;
>>>  ;; Build procedures for tree-sitter grammar packages.  This is the
>>> -;; builder-side code, which builds on top fo the node build-system.
>>> +;; builder-side code, which builds on top of the node build-system.
>>>  ;;
>>>  ;; Tree-sitter grammars are written in JavaScript and compiled to a native
>>>  ;; shared object.  The `tree-sitter generate' command invokes `node' in order
>>> @@ -114,7 +114,7 @@ (define (compile-language dir)
>>>                     "-fno-exceptions"
>>>                     "-O2"
>>>                     "-g"
>>> -                   "-o" ,(string-append lib "/" lang ".so")
>>> +                   "-o" ,(string-append lib "/libtree-sitter-" lang ".so")
>>>                     ;; An additional `scanner.{c,cc}' file is sometimes
>>>                     ;; provided.
>>>                     ,@(cond
>>>
>>>
>>> rewrote html grammar to use this build system and made it work with
>>> built-in treesit package.  Also, tried examples of c and cpp grammars
>>> from patches in this thread.
>>>
>>> If you ok with it, I'll push the build system to master and update the
>>> html grammar accordingly.
>>>
>>> The final result will look like this:
>>>
>>> (define tree-sitter-delete-generated-files
>>>   #~(begin
>>>       (delete-file "binding.gyp")
>>>       (delete-file-recursively "bindings")
>>>       (delete-file "src/grammar.json")
>>>       (delete-file "src/node-types.json")
>>>       (delete-file "src/parser.c")
>>>       (delete-file-recursively "src/tree_sitter")))
>>>
>>> (define* (tree-sitter-grammar
>>>           language language-for-synopsis version commit hash
>>>           #:key
>>>           (repository-url
>>>            (format #f "https://github.com/tree-sitter/tree-sitter-~a" language))
>>>           (inputs '()))
>>>   (let ((synopsis (string-append language-for-synopsis
>>>                                  " grammar for tree-sitter"))
>>>         (name (string-append "tree-sitter-grammar-" language)))
>>>     (package
>>>       (name name)
>>>       (version version)
>>>       (home-page repository-url)
>>>       (source (origin
>>>                 (method git-fetch)
>>>                 (uri (git-reference
>>>                       (url repository-url)
>>>                       (commit commit)))
>>>                 (file-name (git-file-name name version))
>>>                 (sha256 (base32 hash))
>>>                 (modules '((guix build utils)))
>>>                 (snippet tree-sitter-delete-generated-files)))
>>>       (build-system tree-sitter-build-system)
>>>       (inputs inputs)
>>>       (synopsis synopsis)
>>>       (description (string-append synopsis "."))
>>>       (license license:expat))))
>>>
>>> (define-public tree-sitter-grammar-html
>>>   (tree-sitter-grammar
>>>    "html" "HTML"
>>>    "0.19.0" "v0.19.0"
>>>    "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
>>>
>>> After that we can bring the rest of the grammars.
>>
>> I would suggest to rmeove the `tree-sitter-grammar' function, and keep
>> grammars as "regular" package records, even though it's a little bit
>> more verbose:
>>
>> --8<---------------cut here---------------start------------->8---
>> (define-public tree-sitter-html
>>   (package
>>     (name "tree-sitter-html")
>
> It seems tree-sitter-html mimics upstream package name and probably make
> more sense than tree-sitter-grammar-html used by me.

Yeah, at some point I think I had named the packages with "grammar" as
well, but thought it was a bit of a mouthful.  I'm also thinking one day
we may build language bindings as part of the build system (Rust and
NodeJS I think ATM), so those packages could do more than ship the
grammar in the future (although we don't know if we'll ever really need
that).

>
>>     (version "0.19.0")
>>     (source (origin
>>               (method git-fetch)
>>               (uri (git-reference
>>                     (url "https://github.com/tree-sitter/tree-sitter-html")
>>                     (commit (string-append "v" version))))
>>               (file-name (git-file-name name version))
>>               (sha256
>>                (base32
>>                 "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
>>               (modules '((guix build utils)))
>>               (snippet tree-sitter-delete-generated-files)))
>>     (build-system tree-sitter-build-system)
>>     (home-page "https://github.com/tree-sitter/tree-sitter-html")
>>     (synopsis "Tree-sitter HTML grammar")
>>     (description
>>      "This package provides a HTML grammar for the Tree-sitter library.")
>>     (license license:expat)))
>> --8<---------------cut here---------------end--------------->8---
>>
>> This way, they look like any other package in Guix, which makes it
>> easier for us to apply automatic changes in the future if needed (for
>> example like how the input format could be automically updated for all
>> "simple" package definitions, but had to be manual whenever custom code
>> refactoring was done). Does this make sense?
>
> Make sense, but on the other hand we already have hunspell, aspell
> dictionaries and probably a few more others, which are very similiar in
> spirit and we already have to keep in mind their existence on such
> automatic code updates.
>
> It looks that the packages differ only in url for the source code, lang
> name and sometimes in inputs.  Having template package function can make
> management of shared parts more centralized, reduce possibility of
> copy-paste mistakes, when the description wasn't updated and so on and
> can reduce the amount of the code overall (which also reduces the change
> of introducing an error).
>
> I don't have a strong opinion on this topic, but leaning towards the
> template function slightly more, however I'm completely ok with the
> standalone package definitions as well.  WDYT?

I can think of both cost/benefits to the template so I don't have a
strong opinion either :-).

I do like the template to make sure people don't forget to delete
generated files, that's quite important as it seems upstream packages
often check-in the generated C code.  Although, we could probably assert
that with in the build-system phase?  I'll think about that.

On the other hand, I wonder how the template works for packages that
provide multiple grammars (see ocaml and typescript for example).  I
guess we could use the template for trivial packages, and standalone
definitions for more complex ones?  In general, if we keep the template
interface really simple, then I'm happy with it.

Thanks,
Pierre
Pierre Langlois Feb. 12, 2023, 12:24 p.m. UTC | #12
Andrew Tropin <andrew@trop.in> writes:

> [[PGP Signed Part:Undecided]]
> On 2023-02-10 17:02, Pierre Langlois wrote:
>
>> Pierre Langlois <pierre.langlois@gmx.com> writes:
>>
>>> [[PGP Signed Part:Undecided]]
>>> Hi Andrew, thanks for pushing this along! It's great to see things
>>> getting merged.
>>>
>>> Andrew Tropin <andrew@trop.in> writes:
>>>
>>>> [[PGP Signed Part:Undecided]]
>>>> On 2023-02-09 18:04, Andrew Tropin wrote:
>>>>
>>>>> On 2023-02-09 13:39, zimoun wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:
>>>>>>
>>>>>>> I applied tree-sitter and tree-sitter-cli patches,
>>>>>>
>>>>>> Just to be sure to understand, you have only applied 02/32 and 05/32,
>>>>>> right?
>>>>>>
>>>>>>
>>>>>> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
>>>>>> id:20221125012142.22579-3-pierre.langlois@gmx.com
>>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com
>>>>>>
>>>>>> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
>>>>>> id:20221125012142.22579-6-pierre.langlois@gmx.com
>>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com
>>>>>>
>>>>>> Leaving out all the others, right?
>>>>>
>>>>> Merged first 5 patches from 01 to 05, also added one more commit, which
>>>>> addresses some things from reviews and one commit, which adds html
>>>>> grammar.
>>>>>
>>>>> The html grammar is added for the testing purposes.  It relies on
>>>>> generated parser.c and scanner.c and we will need to repackage it using
>>>>> grammar.js instead.  I'm not sure if a separate build system is needed
>>>>> for this, I guess we can just rewrite tree-sitter-grammar function,
>>>>> which generates packages as in example with tree-sitter-grammar-html:
>>>>> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158
>>>>>
>>>>> Anyway, I only skimmed tree-sitter-build-system source code, and plan to
>>>>> read it carefully, evaluate and either introduce new build system or
>>>>> just move all needed parts to tree-sitter-grammar function.  WDYT?
>>>>> After we done with it we can package all other grammars.
>>>>
>>>> Ok, I realized that the proper build process for tree-sitter grammars is
>>>> a little harder than I expected, tree-sitter-build system make sense.  I
>>>> reviewed it, made a small change:
>>>
>>> Ah great, I was going to comment to try and push for us to keep the
>>> build system. I originally went with a template package and inheritance,
>>> but Maxime suggested moving to a build-system which ended up making the
>>> package definitions a *lot* nicer IMO (see previous discussion here
>>> https://issues.guix.gnu.org/49946#144). It also allows us to deal with
>>> grammars that depend on each other more nicely I think.
>>>
>>>>
>>>> @@ -29,7 +29,7 @@ (define-module (guix build tree-sitter-build-system)
>>>>  ;; Commentary:
>>>>  ;;
>>>>  ;; Build procedures for tree-sitter grammar packages.  This is the
>>>> -;; builder-side code, which builds on top fo the node build-system.
>>>> +;; builder-side code, which builds on top of the node build-system.
>>>>  ;;
>>>>  ;; Tree-sitter grammars are written in JavaScript and compiled to a native
>>>>  ;; shared object.  The `tree-sitter generate' command invokes `node' in order
>>>> @@ -114,7 +114,7 @@ (define (compile-language dir)
>>>>                     "-fno-exceptions"
>>>>                     "-O2"
>>>>                     "-g"
>>>> -                   "-o" ,(string-append lib "/" lang ".so")
>>>> +                   "-o" ,(string-append lib "/libtree-sitter-" lang ".so")
>>>>                     ;; An additional `scanner.{c,cc}' file is sometimes
>>>>                     ;; provided.
>>>>                     ,@(cond
>>>>
>>>>
>>>> rewrote html grammar to use this build system and made it work with
>>>> built-in treesit package.  Also, tried examples of c and cpp grammars
>>>> from patches in this thread.
>>>>
>>>> If you ok with it, I'll push the build system to master and update the
>>>> html grammar accordingly.
>>
>> Oh, I forgot to say, this change to the build system LGTM! I'm really
>> happy to see it merged soon :-). The path change will probably break the
>> emacs-28-based tree-sitter support, but that's OK, it's better for the
>> build-system to be made to target emacs 29's builtin support. I'm sure I
>> can work around for emacs 28.
>
> Actually, I think we can build grammars with both names, just providing
> two .so files instead of one.  If you won't find a better workaround we
> can go this way.

That's no problem, I was able to work-around it quite easily.  The
emacs-tree-sitter-langs package expects all the grammars supported to be
in a single directory, so we have to bundle them.  So we can quite
easily create symlinks from the bundle with the expected names,
something like:

--8<---------------cut here---------------start------------->8---
(define-public emacs-tree-sitter-langs-grammar-bundle
  (package
    (name "emacs-tree-sitter-langs-grammar-bundle")
    (source #f)
    (version (package-version tree-sitter))
    (build-system trivial-build-system)
    (inputs
     ;; FIXME: Support for some languages is still left to package.
     (list tree-sitter-bash
           tree-sitter-c
           tree-sitter-c-sharp
           tree-sitter-cpp
           tree-sitter-css
           tree-sitter-elixir
           tree-sitter-elm
           tree-sitter-go
           tree-sitter-haskell
           tree-sitter-html
           tree-sitter-java
           tree-sitter-javascript
           tree-sitter-json
           tree-sitter-julia
           tree-sitter-lua
           tree-sitter-ocaml
           tree-sitter-php
           tree-sitter-python
           tree-sitter-r
           tree-sitter-rust
           tree-sitter-ruby
           tree-sitter-typescript))
    (arguments
     (list #:builder
           (with-imported-modules '((guix build union)
                                    (guix build utils))
             #~(begin
                 (use-modules (ice-9 match)
                              (srfi srfi-1)
                              (guix build union)
                              (guix build utils))
                 (mkdir-p #$output)
                 (for-each
                  (lambda (lib)
                    (symlink lib
                             (string-append #$output "/"
                                            (substring (basename lib)
                                                       (string-length "libtree-sitter-")))))
                  (append-map (match-lambda
                                ((name directory)
                                 (find-files directory "libtree-sitter-.*\\.so$")))
                              '#$(package-inputs this-package)))))))
    (synopsis #f)
    (description #f)
    (home-page #f)
    (license #f)))
--8<---------------cut here---------------end--------------->8---

Giving us:

--8<---------------cut here---------------start------------->8---
/gnu/store/fr9h59wgn55ilfifvm5df7xzxfwh38pc-emacs-tree-sitter-langs-grammar-bundle-0.20.7
├── bash.so -> /gnu/store/9qzvcdlpryjl44klfq0i2liqf6lsp5jq-tree-sitter-bash-0.19.0/lib/tree-sitter/libtree-sitter-bash.so
├── cpp.so -> /gnu/store/my02kq2dr6h48nmrl2dnfnm5rggx837w-tree-sitter-cpp-0.20.0-1.5ead1e2/lib/tree-sitter/libtree-sitter-cpp.so
├── c_sharp.so -> /gnu/store/mp8jvbhx5xlgj5bxa52zhmk5x8i7md5h-tree-sitter-c-sharp-0.19.1/lib/tree-sitter/libtree-sitter-c_sharp.so
├── c.so -> /gnu/store/y9ln76yx68q495vx8bnqdy87x7k8ihy5-tree-sitter-c-0.20.2/lib/tree-sitter/libtree-sitter-c.so
├── css.so -> /gnu/store/fwji59sdwvqpqyb94m55qw1ak92cmys9-tree-sitter-css-0.19.0/lib/tree-sitter/libtree-sitter-css.so
├── elixir.so -> /gnu/store/f6ismnxf7hlh1sq87zzarw56avvrzsl9-tree-sitter-elixir-0.19.0-1.de20391/lib/tree-sitter/libtree-sitter-elixir.so
├── elm.so -> /gnu/store/56zjl2ljvba3z1j6zfmpfa8mq19a3g3a-tree-sitter-elm-5.6.3/lib/tree-sitter/libtree-sitter-elm.so
├── go.so -> /gnu/store/qfy5sw6za96wkszbi21adxbxzggvjz3g-tree-sitter-go-0.19.1-1.05900fa/lib/tree-sitter/libtree-sitter-go.so
├── haskell.so -> /gnu/store/cq3chmqcb8g1nf5mzb6yhzdd6x9gvg1v-tree-sitter-haskell-0.14.0-1.e30bdfd/lib/tree-sitter/libtree-sitter-haskell.so
├── html.so -> /gnu/store/5pqfyjrg3yyvxaxidk690ffls3yb7wbi-tree-sitter-html-0.19.0/lib/tree-sitter/libtree-sitter-html.so
├── javascript.so -> /gnu/store/28s09v3dfb1c9bdkq2791z9abxnih66p-tree-sitter-javascript-0.20.0/lib/tree-sitter/libtree-sitter-javascript.so
├── java.so -> /gnu/store/i7jlqf5hbv0rhlkp4qlyc2d5ndw69dx4-tree-sitter-java-0.20.1/lib/tree-sitter/libtree-sitter-java.so
├── json.so -> /gnu/store/5dv4r74p6gd9w2ncs6pjrhz5hbw47262-tree-sitter-json-0.19.0/lib/tree-sitter/libtree-sitter-json.so
├── julia.so -> /gnu/store/5k37g1sdsllgh64p0w6ggabsni6jqlkr-tree-sitter-julia-0.19.0/lib/tree-sitter/libtree-sitter-julia.so
├── lua.so -> /gnu/store/nqzn4a6kgb2rx4y44pxdm4sqf3pzcpz1-tree-sitter-lua-0.0.14/lib/tree-sitter/libtree-sitter-lua.so
├── ocaml_interface.so -> /gnu/store/3h7krcj3xxclirb8afxh65ipabw1821l-tree-sitter-ocaml-0.19.0-1.0348562/lib/tree-sitter/libtree-sitter-ocaml_interface.so
├── ocaml.so -> /gnu/store/3h7krcj3xxclirb8afxh65ipabw1821l-tree-sitter-ocaml-0.19.0-1.0348562/lib/tree-sitter/libtree-sitter-ocaml.so
├── php.so -> /gnu/store/ymxf5m8jhihbrag7v2pghgydj3byp7wh-tree-sitter-php-0.19.0-1.435fa00/lib/tree-sitter/libtree-sitter-php.so
├── python.so -> /gnu/store/ggmzicwfxb7gz1rr9lfkx8cak62bfw7v-tree-sitter-python-0.19.1-1.ed0fe62/lib/tree-sitter/libtree-sitter-python.so
├── r.so -> /gnu/store/y9dxbnbb5dyf0rq3kpar7ip4w3lq6sb9-tree-sitter-r-0.0.1-1.80efda5/lib/tree-sitter/libtree-sitter-r.so
├── ruby.so -> /gnu/store/ky8n30dw16ck6byaqnhbf9ib7xp7j0yw-tree-sitter-ruby-0.20.0/lib/tree-sitter/libtree-sitter-ruby.so
├── rust.so -> /gnu/store/25zdpwrgq1xibhv7xpg64i4g71xah6g1-tree-sitter-rust-0.20.1/lib/tree-sitter/libtree-sitter-rust.so
├── tsx.so -> /gnu/store/wqpcphz855yjrginwqrymd3xzzxb8k8l-tree-sitter-typescript-0.20.1/lib/tree-sitter/libtree-sitter-tsx.so
└── typescript.so -> /gnu/store/wqpcphz855yjrginwqrymd3xzzxb8k8l-tree-sitter-typescript-0.20.1/lib/tree-sitter/libtree-sitter-typescript.so
--8<---------------cut here---------------end--------------->8---
Andrew Tropin Feb. 14, 2023, 1:24 p.m. UTC | #13
On 2023-02-12 12:07, Pierre Langlois wrote:

> Hi,
>
> Andrew Tropin <andrew@trop.in> writes:
>
>> [[PGP Signed Part:Undecided]]
>> On 2023-02-10 15:48, Pierre Langlois wrote:
>>
>>> Hi Andrew, thanks for pushing this along! It's great to see things
>>> getting merged.
>>>
>>> Andrew Tropin <andrew@trop.in> writes:
>>>
>>>> [[PGP Signed Part:Undecided]]
>>>> On 2023-02-09 18:04, Andrew Tropin wrote:
>>>>
>>>>> On 2023-02-09 13:39, zimoun wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:
>>>>>>
>>>>>>> I applied tree-sitter and tree-sitter-cli patches,
>>>>>>
>>>>>> Just to be sure to understand, you have only applied 02/32 and 05/32,
>>>>>> right?
>>>>>>
>>>>>>
>>>>>> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
>>>>>> id:20221125012142.22579-3-pierre.langlois@gmx.com
>>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com
>>>>>>
>>>>>> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
>>>>>> id:20221125012142.22579-6-pierre.langlois@gmx.com
>>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com
>>>>>>
>>>>>> Leaving out all the others, right?
>>>>>
>>>>> Merged first 5 patches from 01 to 05, also added one more commit, which
>>>>> addresses some things from reviews and one commit, which adds html
>>>>> grammar.
>>>>>
>>>>> The html grammar is added for the testing purposes.  It relies on
>>>>> generated parser.c and scanner.c and we will need to repackage it using
>>>>> grammar.js instead.  I'm not sure if a separate build system is needed
>>>>> for this, I guess we can just rewrite tree-sitter-grammar function,
>>>>> which generates packages as in example with tree-sitter-grammar-html:
>>>>> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158
>>>>>
>>>>> Anyway, I only skimmed tree-sitter-build-system source code, and plan to
>>>>> read it carefully, evaluate and either introduce new build system or
>>>>> just move all needed parts to tree-sitter-grammar function.  WDYT?
>>>>> After we done with it we can package all other grammars.
>>>>
>>>> Ok, I realized that the proper build process for tree-sitter grammars is
>>>> a little harder than I expected, tree-sitter-build system make sense.  I
>>>> reviewed it, made a small change:
>>>
>>> Ah great, I was going to comment to try and push for us to keep the
>>> build system. I originally went with a template package and inheritance,
>>> but Maxime suggested moving to a build-system which ended up making the
>>> package definitions a *lot* nicer IMO (see previous discussion here
>>> https://issues.guix.gnu.org/49946#144). It also allows us to deal with
>>> grammars that depend on each other more nicely I think.
>>>
>>>>
>>>> @@ -29,7 +29,7 @@ (define-module (guix build tree-sitter-build-system)
>>>>  ;; Commentary:
>>>>  ;;
>>>>  ;; Build procedures for tree-sitter grammar packages.  This is the
>>>> -;; builder-side code, which builds on top fo the node build-system.
>>>> +;; builder-side code, which builds on top of the node build-system.
>>>>  ;;
>>>>  ;; Tree-sitter grammars are written in JavaScript and compiled to a native
>>>>  ;; shared object.  The `tree-sitter generate' command invokes `node' in order
>>>> @@ -114,7 +114,7 @@ (define (compile-language dir)
>>>>                     "-fno-exceptions"
>>>>                     "-O2"
>>>>                     "-g"
>>>> -                   "-o" ,(string-append lib "/" lang ".so")
>>>> +                   "-o" ,(string-append lib "/libtree-sitter-" lang ".so")
>>>>                     ;; An additional `scanner.{c,cc}' file is sometimes
>>>>                     ;; provided.
>>>>                     ,@(cond
>>>>
>>>>
>>>> rewrote html grammar to use this build system and made it work with
>>>> built-in treesit package.  Also, tried examples of c and cpp grammars
>>>> from patches in this thread.
>>>>
>>>> If you ok with it, I'll push the build system to master and update the
>>>> html grammar accordingly.
>>>>
>>>> The final result will look like this:
>>>>
>>>> (define tree-sitter-delete-generated-files
>>>>   #~(begin
>>>>       (delete-file "binding.gyp")
>>>>       (delete-file-recursively "bindings")
>>>>       (delete-file "src/grammar.json")
>>>>       (delete-file "src/node-types.json")
>>>>       (delete-file "src/parser.c")
>>>>       (delete-file-recursively "src/tree_sitter")))
>>>>
>>>> (define* (tree-sitter-grammar
>>>>           language language-for-synopsis version commit hash
>>>>           #:key
>>>>           (repository-url
>>>>            (format #f "https://github.com/tree-sitter/tree-sitter-~a" language))
>>>>           (inputs '()))
>>>>   (let ((synopsis (string-append language-for-synopsis
>>>>                                  " grammar for tree-sitter"))
>>>>         (name (string-append "tree-sitter-grammar-" language)))
>>>>     (package
>>>>       (name name)
>>>>       (version version)
>>>>       (home-page repository-url)
>>>>       (source (origin
>>>>                 (method git-fetch)
>>>>                 (uri (git-reference
>>>>                       (url repository-url)
>>>>                       (commit commit)))
>>>>                 (file-name (git-file-name name version))
>>>>                 (sha256 (base32 hash))
>>>>                 (modules '((guix build utils)))
>>>>                 (snippet tree-sitter-delete-generated-files)))
>>>>       (build-system tree-sitter-build-system)
>>>>       (inputs inputs)
>>>>       (synopsis synopsis)
>>>>       (description (string-append synopsis "."))
>>>>       (license license:expat))))
>>>>
>>>> (define-public tree-sitter-grammar-html
>>>>   (tree-sitter-grammar
>>>>    "html" "HTML"
>>>>    "0.19.0" "v0.19.0"
>>>>    "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
>>>>
>>>> After that we can bring the rest of the grammars.
>>>
>>> I would suggest to rmeove the `tree-sitter-grammar' function, and keep
>>> grammars as "regular" package records, even though it's a little bit
>>> more verbose:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> (define-public tree-sitter-html
>>>   (package
>>>     (name "tree-sitter-html")
>>
>> It seems tree-sitter-html mimics upstream package name and probably make
>> more sense than tree-sitter-grammar-html used by me.
>
> Yeah, at some point I think I had named the packages with "grammar" as
> well, but thought it was a bit of a mouthful.  I'm also thinking one day
> we may build language bindings as part of the build system (Rust and
> NodeJS I think ATM), so those packages could do more than ship the
> grammar in the future (although we don't know if we'll ever really need
> that).
>
>>
>>>     (version "0.19.0")
>>>     (source (origin
>>>               (method git-fetch)
>>>               (uri (git-reference
>>>                     (url "https://github.com/tree-sitter/tree-sitter-html")
>>>                     (commit (string-append "v" version))))
>>>               (file-name (git-file-name name version))
>>>               (sha256
>>>                (base32
>>>                 "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
>>>               (modules '((guix build utils)))
>>>               (snippet tree-sitter-delete-generated-files)))
>>>     (build-system tree-sitter-build-system)
>>>     (home-page "https://github.com/tree-sitter/tree-sitter-html")
>>>     (synopsis "Tree-sitter HTML grammar")
>>>     (description
>>>      "This package provides a HTML grammar for the Tree-sitter library.")
>>>     (license license:expat)))
>>> --8<---------------cut here---------------end--------------->8---
>>>
>>> This way, they look like any other package in Guix, which makes it
>>> easier for us to apply automatic changes in the future if needed (for
>>> example like how the input format could be automically updated for all
>>> "simple" package definitions, but had to be manual whenever custom code
>>> refactoring was done). Does this make sense?
>>
>> Make sense, but on the other hand we already have hunspell, aspell
>> dictionaries and probably a few more others, which are very similiar in
>> spirit and we already have to keep in mind their existence on such
>> automatic code updates.
>>
>> It looks that the packages differ only in url for the source code, lang
>> name and sometimes in inputs.  Having template package function can make
>> management of shared parts more centralized, reduce possibility of
>> copy-paste mistakes, when the description wasn't updated and so on and
>> can reduce the amount of the code overall (which also reduces the change
>> of introducing an error).
>>
>> I don't have a strong opinion on this topic, but leaning towards the
>> template function slightly more, however I'm completely ok with the
>> standalone package definitions as well.  WDYT?
>
> I can think of both cost/benefits to the template so I don't have a
> strong opinion either :-).
>
> I do like the template to make sure people don't forget to delete
> generated files, that's quite important as it seems upstream packages
> often check-in the generated C code.  Although, we could probably assert
> that with in the build-system phase?  I'll think about that.
>
> On the other hand, I wonder how the template works for packages that
> provide multiple grammars (see ocaml and typescript for example).  I
> guess we could use the template for trivial packages, and standalone
> definitions for more complex ones?  In general, if we keep the template
> interface really simple, then I'm happy with it.

Hi Pierre!

I spend two days trying grammars with and without helper function and
found hepler quite helpful to reduce boilerplate and errors from
copypaste, so I went the way with helper.  The logic inside is quite
trivial, the only downside I found so far is that in cases when
repository url constructed automatically I can't easily open the repo
url in the browser.

I packaged all the grammars from this thread and a few more on top of
it.  Updated them to usually latest versions, added some comments, when
needed.

If I forgot to reply on something or you have any comments/ideas, let me
know! :)

Kudos to Pierre and everyone, who helped with all the tree-sitter stuff.
Pierre Langlois Feb. 17, 2023, 12:38 p.m. UTC | #14
Hi,

Andrew Tropin <andrew@trop.in> writes:

> [[PGP Signed Part:Undecided]]
> On 2023-02-12 12:07, Pierre Langlois wrote:
>
>> Hi,
>>
>> Andrew Tropin <andrew@trop.in> writes:
>>
>>> [[PGP Signed Part:Undecided]]
>>> On 2023-02-10 15:48, Pierre Langlois wrote:
>>>
>>>> Hi Andrew, thanks for pushing this along! It's great to see things
>>>> getting merged.
>>>>
>>>> Andrew Tropin <andrew@trop.in> writes:
>>>>
>>>>> [[PGP Signed Part:Undecided]]
>>>>> On 2023-02-09 18:04, Andrew Tropin wrote:
>>>>>
>>>>>> On 2023-02-09 13:39, zimoun wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin <andrew@trop.in> wrote:
>>>>>>>
>>>>>>>> I applied tree-sitter and tree-sitter-cli patches,
>>>>>>>
>>>>>>> Just to be sure to understand, you have only applied 02/32 and 05/32,
>>>>>>> right?
>>>>>>>
>>>>>>>
>>>>>>> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7.
>>>>>>> id:20221125012142.22579-3-pierre.langlois@gmx.com
>>>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com
>>>>>>>
>>>>>>> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli.
>>>>>>> id:20221125012142.22579-6-pierre.langlois@gmx.com
>>>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com
>>>>>>>
>>>>>>> Leaving out all the others, right?
>>>>>>
>>>>>> Merged first 5 patches from 01 to 05, also added one more commit, which
>>>>>> addresses some things from reviews and one commit, which adds html
>>>>>> grammar.
>>>>>>
>>>>>> The html grammar is added for the testing purposes.  It relies on
>>>>>> generated parser.c and scanner.c and we will need to repackage it using
>>>>>> grammar.js instead.  I'm not sure if a separate build system is needed
>>>>>> for this, I guess we can just rewrite tree-sitter-grammar function,
>>>>>> which generates packages as in example with tree-sitter-grammar-html:
>>>>>> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158
>>>>>>
>>>>>> Anyway, I only skimmed tree-sitter-build-system source code, and plan to
>>>>>> read it carefully, evaluate and either introduce new build system or
>>>>>> just move all needed parts to tree-sitter-grammar function.  WDYT?
>>>>>> After we done with it we can package all other grammars.
>>>>>
>>>>> Ok, I realized that the proper build process for tree-sitter grammars is
>>>>> a little harder than I expected, tree-sitter-build system make sense.  I
>>>>> reviewed it, made a small change:
>>>>
>>>> Ah great, I was going to comment to try and push for us to keep the
>>>> build system. I originally went with a template package and inheritance,
>>>> but Maxime suggested moving to a build-system which ended up making the
>>>> package definitions a *lot* nicer IMO (see previous discussion here
>>>> https://issues.guix.gnu.org/49946#144). It also allows us to deal with
>>>> grammars that depend on each other more nicely I think.
>>>>
>>>>>
>>>>> @@ -29,7 +29,7 @@ (define-module (guix build tree-sitter-build-system)
>>>>>  ;; Commentary:
>>>>>  ;;
>>>>>  ;; Build procedures for tree-sitter grammar packages.  This is the
>>>>> -;; builder-side code, which builds on top fo the node build-system.
>>>>> +;; builder-side code, which builds on top of the node build-system.
>>>>>  ;;
>>>>>  ;; Tree-sitter grammars are written in JavaScript and compiled to a native
>>>>>  ;; shared object.  The `tree-sitter generate' command invokes `node' in order
>>>>> @@ -114,7 +114,7 @@ (define (compile-language dir)
>>>>>                     "-fno-exceptions"
>>>>>                     "-O2"
>>>>>                     "-g"
>>>>> -                   "-o" ,(string-append lib "/" lang ".so")
>>>>> +                   "-o" ,(string-append lib "/libtree-sitter-" lang ".so")
>>>>>                     ;; An additional `scanner.{c,cc}' file is sometimes
>>>>>                     ;; provided.
>>>>>                     ,@(cond
>>>>>
>>>>>
>>>>> rewrote html grammar to use this build system and made it work with
>>>>> built-in treesit package.  Also, tried examples of c and cpp grammars
>>>>> from patches in this thread.
>>>>>
>>>>> If you ok with it, I'll push the build system to master and update the
>>>>> html grammar accordingly.
>>>>>
>>>>> The final result will look like this:
>>>>>
>>>>> (define tree-sitter-delete-generated-files
>>>>>   #~(begin
>>>>>       (delete-file "binding.gyp")
>>>>>       (delete-file-recursively "bindings")
>>>>>       (delete-file "src/grammar.json")
>>>>>       (delete-file "src/node-types.json")
>>>>>       (delete-file "src/parser.c")
>>>>>       (delete-file-recursively "src/tree_sitter")))
>>>>>
>>>>> (define* (tree-sitter-grammar
>>>>>           language language-for-synopsis version commit hash
>>>>>           #:key
>>>>>           (repository-url
>>>>>            (format #f "https://github.com/tree-sitter/tree-sitter-~a" language))
>>>>>           (inputs '()))
>>>>>   (let ((synopsis (string-append language-for-synopsis
>>>>>                                  " grammar for tree-sitter"))
>>>>>         (name (string-append "tree-sitter-grammar-" language)))
>>>>>     (package
>>>>>       (name name)
>>>>>       (version version)
>>>>>       (home-page repository-url)
>>>>>       (source (origin
>>>>>                 (method git-fetch)
>>>>>                 (uri (git-reference
>>>>>                       (url repository-url)
>>>>>                       (commit commit)))
>>>>>                 (file-name (git-file-name name version))
>>>>>                 (sha256 (base32 hash))
>>>>>                 (modules '((guix build utils)))
>>>>>                 (snippet tree-sitter-delete-generated-files)))
>>>>>       (build-system tree-sitter-build-system)
>>>>>       (inputs inputs)
>>>>>       (synopsis synopsis)
>>>>>       (description (string-append synopsis "."))
>>>>>       (license license:expat))))
>>>>>
>>>>> (define-public tree-sitter-grammar-html
>>>>>   (tree-sitter-grammar
>>>>>    "html" "HTML"
>>>>>    "0.19.0" "v0.19.0"
>>>>>    "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
>>>>>
>>>>> After that we can bring the rest of the grammars.
>>>>
>>>> I would suggest to rmeove the `tree-sitter-grammar' function, and keep
>>>> grammars as "regular" package records, even though it's a little bit
>>>> more verbose:
>>>>
>>>> --8<---------------cut here---------------start------------->8---
>>>> (define-public tree-sitter-html
>>>>   (package
>>>>     (name "tree-sitter-html")
>>>
>>> It seems tree-sitter-html mimics upstream package name and probably make
>>> more sense than tree-sitter-grammar-html used by me.
>>
>> Yeah, at some point I think I had named the packages with "grammar" as
>> well, but thought it was a bit of a mouthful.  I'm also thinking one day
>> we may build language bindings as part of the build system (Rust and
>> NodeJS I think ATM), so those packages could do more than ship the
>> grammar in the future (although we don't know if we'll ever really need
>> that).
>>
>>>
>>>>     (version "0.19.0")
>>>>     (source (origin
>>>>               (method git-fetch)
>>>>               (uri (git-reference
>>>>                     (url "https://github.com/tree-sitter/tree-sitter-html")
>>>>                     (commit (string-append "v" version))))
>>>>               (file-name (git-file-name name version))
>>>>               (sha256
>>>>                (base32
>>>>                 "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy"))
>>>>               (modules '((guix build utils)))
>>>>               (snippet tree-sitter-delete-generated-files)))
>>>>     (build-system tree-sitter-build-system)
>>>>     (home-page "https://github.com/tree-sitter/tree-sitter-html")
>>>>     (synopsis "Tree-sitter HTML grammar")
>>>>     (description
>>>>      "This package provides a HTML grammar for the Tree-sitter library.")
>>>>     (license license:expat)))
>>>> --8<---------------cut here---------------end--------------->8---
>>>>
>>>> This way, they look like any other package in Guix, which makes it
>>>> easier for us to apply automatic changes in the future if needed (for
>>>> example like how the input format could be automically updated for all
>>>> "simple" package definitions, but had to be manual whenever custom code
>>>> refactoring was done). Does this make sense?
>>>
>>> Make sense, but on the other hand we already have hunspell, aspell
>>> dictionaries and probably a few more others, which are very similiar in
>>> spirit and we already have to keep in mind their existence on such
>>> automatic code updates.
>>>
>>> It looks that the packages differ only in url for the source code, lang
>>> name and sometimes in inputs.  Having template package function can make
>>> management of shared parts more centralized, reduce possibility of
>>> copy-paste mistakes, when the description wasn't updated and so on and
>>> can reduce the amount of the code overall (which also reduces the change
>>> of introducing an error).
>>>
>>> I don't have a strong opinion on this topic, but leaning towards the
>>> template function slightly more, however I'm completely ok with the
>>> standalone package definitions as well.  WDYT?
>>
>> I can think of both cost/benefits to the template so I don't have a
>> strong opinion either :-).
>>
>> I do like the template to make sure people don't forget to delete
>> generated files, that's quite important as it seems upstream packages
>> often check-in the generated C code.  Although, we could probably assert
>> that with in the build-system phase?  I'll think about that.
>>
>> On the other hand, I wonder how the template works for packages that
>> provide multiple grammars (see ocaml and typescript for example).  I
>> guess we could use the template for trivial packages, and standalone
>> definitions for more complex ones?  In general, if we keep the template
>> interface really simple, then I'm happy with it.
>
> Hi Pierre!
>
> I spend two days trying grammars with and without helper function and
> found hepler quite helpful to reduce boilerplate and errors from
> copypaste, so I went the way with helper.  The logic inside is quite
> trivial, the only downside I found so far is that in cases when
> repository url constructed automatically I can't easily open the repo
> url in the browser.
>
> I packaged all the grammars from this thread and a few more on top of
> it.  Updated them to usually latest versions, added some comments, when
> needed.
>
> If I forgot to reply on something or you have any comments/ideas, let me
> know! :)
>
> Kudos to Pierre and everyone, who helped with all the tree-sitter stuff.

Thank you for landing all this work!
diff mbox series

Patch

diff --git a/gnu/local.mk b/gnu/local.mk
index 7278c50e4f..7d11f4bb27 100644
--- a/gnu/local.mk
+++ b/gnu/local.mk
@@ -603,6 +603,7 @@  GNU_SYSTEM_MODULES =				\
   %D%/packages/tmux.scm				\
   %D%/packages/toolkits.scm			\
   %D%/packages/tor.scm				\
+  %D%/packages/tree-sitter.scm			\
   %D%/packages/tv.scm				\
   %D%/packages/uglifyjs.scm			\
   %D%/packages/uml.scm				\
diff --git a/gnu/packages/text-editors.scm b/gnu/packages/text-editors.scm
index ed77113726..57178e7b87 100644
--- a/gnu/packages/text-editors.scm
+++ b/gnu/packages/text-editors.scm
@@ -70,7 +70,6 @@  (define-module (gnu packages text-editors)
   #:use-module (gnu packages gtk)
   #:use-module (gnu packages guile)
   #:use-module (gnu packages haskell-xyz)
-  #:use-module (gnu packages icu4c)
   #:use-module (gnu packages image)
   #:use-module (gnu packages libbsd)
   #:use-module (gnu packages libreoffice)
@@ -1219,57 +1218,6 @@  (define-public edlin
 FreeDOS as a functional clone of the old MS-DOS program edlin.")
     (license license:gpl2+)))
 
-(define-public tree-sitter
-  (package
-    (name "tree-sitter")
-    (version "0.20.6")
-    (source (origin
-              (method git-fetch)
-              (uri (git-reference
-                    (url "https://github.com/tree-sitter/tree-sitter")
-                    (commit (string-append "v" version))))
-              (file-name (git-file-name name version))
-              (sha256
-               (base32
-                "1z20518snyg0zp75qgs5bxmzjqws4dd19vnp6sya494za3qp5b6d"))
-              (modules '((guix build utils)))
-              (snippet '(begin
-                          ;; Remove bundled ICU parts
-                          (delete-file-recursively "lib/src/unicode")
-                          #t))))
-    (build-system gnu-build-system)
-    (inputs (list icu4c))
-    (arguments
-     (list #:phases
-           '(modify-phases %standard-phases
-              (delete 'configure))
-           #:tests? #f ; there are no tests for the runtime library
-           #:make-flags
-           #~(list (string-append "PREFIX="
-                                  #$output)
-                   (string-append "CC="
-                                  #$(cc-for-target)))))
-    (home-page "https://tree-sitter.github.io/tree-sitter/")
-    (synopsis "Incremental parsing system for programming tools")
-    (description
-     "Tree-sitter is a parser generator tool and an incremental parsing
-library.  It can build a concrete syntax tree for a source file and efficiently
-update the syntax tree as the source file is edited.
-
-Tree-sitter aims to be:
-
-@itemize
-@item General enough to parse any programming language
-@item Fast enough to parse on every keystroke in a text editor
-@item Robust enough to provide useful results even in the presence of syntax errors
-@item Dependency-free so that the runtime library (which is written in pure C)
-can be embedded in any application
-@end itemize
-
-This package includes the @code{libtree-sitter} runtime library.
-")
-    (license license:expat)))
-
 (define-public mle
   (package
     (name "mle")
diff --git a/gnu/packages/tree-sitter.scm b/gnu/packages/tree-sitter.scm
new file mode 100644
index 0000000000..7116a10ed7
--- /dev/null
+++ b/gnu/packages/tree-sitter.scm
@@ -0,0 +1,78 @@ 
+;;; GNU Guix --- Functional package management for GNU
+;;; Copyright © 2022 Luis Henrique Gomes Higino <luishenriquegh2701@gmail.com>
+;;; Copyright © 2022 Pierre Langlois <pierre.langlois@gmx.com>
+;;;
+;;; This file is part of GNU Guix.
+;;;
+;;; GNU Guix is free software; you can redistribute it and/or modify it
+;;; under the terms of the GNU General Public License as published by
+;;; the Free Software Foundation; either version 3 of the License, or (at
+;;; your option) any later version.
+;;;
+;;; GNU Guix is distributed in the hope that it will be useful, but
+;;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;;; GNU General Public License for more details.
+;;;
+;;; You should have received a copy of the GNU General Public License
+;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.
+
+(define-module (gnu packages tree-sitter)
+  #:use-module (guix gexp)
+  #:use-module ((guix licenses) #:prefix license:)
+  #:use-module (guix build-system gnu)
+  #:use-module (guix git-download)
+  #:use-module (guix packages)
+  #:use-module (guix utils)
+  #:use-module (gnu packages icu4c))
+
+(define-public tree-sitter
+  (package
+    (name "tree-sitter")
+    (version "0.20.6")
+    (source (origin
+              (method git-fetch)
+              (uri (git-reference
+                    (url "https://github.com/tree-sitter/tree-sitter")
+                    (commit (string-append "v" version))))
+              (file-name (git-file-name name version))
+              (sha256
+               (base32
+                "1z20518snyg0zp75qgs5bxmzjqws4dd19vnp6sya494za3qp5b6d"))
+              (modules '((guix build utils)))
+              (snippet '(begin
+                          ;; Remove bundled ICU parts
+                          (delete-file-recursively "lib/src/unicode")
+                          #t))))
+    (build-system gnu-build-system)
+    (inputs (list icu4c))
+    (arguments
+     (list #:phases
+           '(modify-phases %standard-phases
+              (delete 'configure))
+           #:tests? #f ; there are no tests for the runtime library
+           #:make-flags
+           #~(list (string-append "PREFIX="
+                                  #$output)
+                   (string-append "CC="
+                                  #$(cc-for-target)))))
+    (home-page "https://tree-sitter.github.io/tree-sitter/")
+    (synopsis "Incremental parsing system for programming tools")
+    (description
+     "Tree-sitter is a parser generator tool and an incremental parsing
+library.  It can build a concrete syntax tree for a source file and efficiently
+update the syntax tree as the source file is edited.
+
+Tree-sitter aims to be:
+
+@itemize
+@item General enough to parse any programming language
+@item Fast enough to parse on every keystroke in a text editor
+@item Robust enough to provide useful results even in the presence of syntax errors
+@item Dependency-free so that the runtime library (which is written in pure C)
+can be embedded in any application
+@end itemize
+
+This package includes the @code{libtree-sitter} runtime library.
+")
+    (license license:expat)))
diff --git a/gnu/packages/vim.scm b/gnu/packages/vim.scm
index 957f00a92e..0c44c3c114 100644
--- a/gnu/packages/vim.scm
+++ b/gnu/packages/vim.scm
@@ -72,6 +72,7 @@  (define-module (gnu packages vim)
   #:use-module (gnu packages tcl)
   #:use-module (gnu packages text-editors)
   #:use-module (gnu packages terminals)
+  #:use-module (gnu packages tree-sitter)
   #:use-module (gnu packages xdisorg)
   #:use-module (gnu packages xorg))