nimforum mirror - Ebnf Lexer and Parser generator in nim

victor [2018-06-03T04:30:46+02:00]

Hello all, I'm just wondering whether there is a lexer and parser generator for Nim. I want to create a mini language that compiles to binary; I've searched for a parser generator and found none. I just want to know if any tool like ANTLR exists for Nim :)

Araq [2018-06-03T22:18:25+02:00]

Unfortunately I'm unaware of anything in this domain. It's sad.

Lando [2018-06-04T08:52:31+02:00]

Does it absolutely have to be eBNF? If not, there's PEGs.

andrea [2018-06-04T13:52:04+02:00]

Unfortunately, it is not clear to me how to build a parser using the pegs module. I can define a grammar, and even check whether some string matches that grammar, but one usually wants to associate actions with matches, such as building some kind of syntax tree, and it is not clear to me whether the pegs module can be used for that purpose.

victor [2018-06-07T11:45:23+02:00]

At least there's no documentation on how to achieve this. Could you help me with that, please?

Udiknedormin [2018-06-11T15:10:45+02:00]

As far as I know, you can get subexpressions for a pegs entity, so it seems possible to build a tree or perform other arbitrary actions, does it not?

andrea [2018-06-13T09:15:10+02:00]

@Udiknedormin I don't see how to get subexpressions from the documentation. Even if it were possible, it is still not clear to me how this would help. I think one could recurse over the subexpressions, but even then wouldn't it be easier to write a recursive parser in the first place?
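To make the "recursive parser" alternative concrete, here is a minimal hand-written recursive-descent sketch. It is written in Python rather than Nim and does not use the pegs module; the arithmetic grammar, the tuple-based tree and all names are assumptions chosen purely for illustration. Each grammar rule becomes one method, and the "action" attached to a match is simply the value the method returns.

```python
import re

# One token is a run of digits or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    # expr <- term (('+' / '-') term)*
    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # action: build a tree node
        return node

    # term <- factor (('*' / '/') factor)*
    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    # factor <- NUMBER / '(' expr ')'
    def factor(self):
        tok = self.take()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input starting at {parser.peek()!r}")
    return tree
```

For example, `parse("1 + 2 * (3 - 4)")` yields the nested tuple `("+", 1, ("*", 2, ("-", 3, 4)))`. The trade-off is that the grammar is now buried in code, which is exactly what a generator is meant to avoid.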

Lando [2018-06-15T21:55:47+02:00]

Theoretically, PEG captures could be used to get matches for sub-expressions, but captures are limited to 20 in the current implementation, and they are no replacement for a proper parser anyway. Luckily, the main matching routine of the pegs module can easily be converted into a simple interpreting event parser by adding some callbacks. Also, the object generated by the peg proc is already a complete AST of the PEG; the nodes' fields are just not accessible and only need some exported getters.

So I just copied pegs.nim, made these changes, and now there's something to work with here. Just clone it and type nimble develop in the top directory to use it instead of the original pegs module. I will lobby the powers that be to include the changes (at least the PEG AST accessors) in the official pegs module to make this thing obsolete.

Both the event parser and the PEG AST could be used for a parser generator. The parser would need a PEG of the PEG grammar itself, but the one from the doc of pegs doesn't really work. So the PEG AST is the best bet as it is. Since PEG is unambiguous, it should always be possible to generate parser code from a specific PEG AST. The pegs module (and hence xpegs) doesn't work in Nim's VM, so we can't use macros to generate parser code, but have to output actual source code. At least at first, because as soon as someone generates parser code for the PEG grammar itself that does run in the VM, we could then use that parser from there on. But again, a fully correct PEG of the PEG grammar itself would be needed for that.
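The idea of an interpreting event parser can be sketched as follows. This is Python, not the pegs or xpegs code: the tuple-based PEG AST, the `match` function and the `on_enter`/`on_leave` callbacks are all hypothetical names invented for this illustration. The matcher walks the grammar AST directly and fires a callback whenever it enters or leaves a named rule.

```python
import string

LETTERS = string.ascii_letters

# Hypothetical PEG AST: pair <- ident '=' ident ; ident <- [a-zA-Z]+
RULES = {
    "pair": ("seq", ("ref", "ident"), ("lit", "="), ("ref", "ident")),
    "ident": ("seq", ("set", LETTERS), ("star", ("set", LETTERS))),
}

def match(node, text, pos, on_enter, on_leave):
    """Return the end offset of the match starting at pos, or -1 on failure."""
    kind = node[0]
    if kind == "lit":
        return pos + len(node[1]) if text.startswith(node[1], pos) else -1
    if kind == "set":
        return pos + 1 if pos < len(text) and text[pos] in node[1] else -1
    if kind == "seq":
        for child in node[1:]:
            pos = match(child, text, pos, on_enter, on_leave)
            if pos < 0:
                return -1
        return pos
    if kind == "choice":  # ordered choice: first success wins
        for child in node[1:]:
            end = match(child, text, pos, on_enter, on_leave)
            if end >= 0:
                return end
        return -1
    if kind == "star":
        while True:
            end = match(node[1], text, pos, on_enter, on_leave)
            if end <= pos:  # failure or zero-length match: stop repeating
                return pos
            pos = end
    if kind == "ref":  # named rule: this is where the events fire
        name = node[1]
        on_enter(name, pos)
        end = match(RULES[name], text, pos, on_enter, on_leave)
        on_leave(name, pos, end)
        return end
    raise ValueError(f"unknown node kind {kind!r}")

def collect(text, start="pair"):
    """Run the matcher and record (rule, matched text) for successful rules."""
    events = []

    def on_enter(name, pos):
        pass  # a real client might push a new tree node here

    def on_leave(name, start_pos, end):
        if end >= 0:
            events.append((name, text[start_pos:end]))

    end = match(("ref", start), text, 0, on_enter, on_leave)
    return end, events
```

With this sketch, `collect("key=value")` returns `(9, [("ident", "key"), ("ident", "value"), ("pair", "key=value")])`. Note that events recorded inside a branch that later fails are not rolled back here; a usable implementation would have to discard them when the parser backtracks.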

loloiccl [2019-04-04T18:11:03+02:00]

I made a BNF lexer/parser generator library (https://github.com/loloiccl/nimly). Unfortunately, EBNF is not supported yet.

Araq [2019-04-04T20:09:45+02:00]

Wow, nimly looks nice. Don't have the time to try it. When you say "EBNF not supported" what exactly is missing?

zevv [2019-04-04T21:52:22+02:00]

I'm taking the liberty to shamelessly mention my recent project here, as this seems the appropriate thread to do so: NPeg is a PEG-style parser generator which allows free mixing of grammar and Nim code, which should be suitable for the task of lexing and parsing.

It can collect simple string captures, complex captures as a JSON tree, or run arbitrary Nim code at match time.

NPeg is available in nimble, the manual and project page are at https://github.com/zevv/npeg

loloiccl [2019-04-05T07:36:06+02:00]

nimly supports only BNF for now. For example, these are missing:

Option (the [else] in): IF cond THEN exp [else]

0 or more (the {param} in): PROC NAME RPAR {param} LPAR

But you can write BNF that is equivalent to these.
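As a rough sketch of what "equivalent BNF" means here: an option `[x]` can be replaced by a helper rule with an empty alternative, and a repetition `{x}` by a recursive helper rule. The Python below is only an illustration of that rewriting; it is not nimly's syntax or implementation, and the rule representation and the generated names (`opt_1`, `rep_2`) are assumptions.

```python
def desugar(rules):
    """Rewrite EBNF options and repetitions into plain BNF rules.

    rules maps a rule name to a list of alternatives. Each alternative is a
    list of items; an item is a symbol (str), ("opt", [items]) for [ ... ],
    or ("rep", [items]) for { ... }.
    """
    out = {}
    counter = [0]

    def fresh(prefix):
        counter[0] += 1
        return f"{prefix}_{counter[0]}"

    def lower(alternative):
        result = []
        for item in alternative:
            if isinstance(item, str):
                result.append(item)
                continue
            kind, body = item
            body = lower(body)  # handle nested options/repetitions first
            if kind == "opt":
                name = fresh("opt")
                out[name] = [[], body]           # name ::= ε | body
            elif kind == "rep":
                name = fresh("rep")
                out[name] = [[], body + [name]]  # name ::= ε | body name
            else:
                raise ValueError(f"unknown item kind {kind!r}")
            result.append(name)
        return result

    for name, alternatives in rules.items():
        out[name] = [lower(alt) for alt in alternatives]
    return out

def format_bnf(rules):
    lines = []
    for name, alternatives in rules.items():
        rhs = " | ".join(" ".join(alt) if alt else "ε" for alt in alternatives)
        lines.append(f"{name} ::= {rhs}")
    return "\n".join(lines)

EXAMPLE = {
    "if_stmt": [["IF", "cond", "THEN", "exp", ("opt", ["else"])]],
    "proc": [["PROC", "NAME", "RPAR", ("rep", ["param"]), "LPAR"]],
}
```

Applied to the two examples, `format_bnf(desugar(EXAMPLE))` gives `opt_1 ::= ε | else`, `if_stmt ::= IF cond THEN exp opt_1`, `rep_2 ::= ε | param rep_2` and `proc ::= PROC NAME RPAR rep_2 LPAR`. The repetition is written right-recursively here; a left-recursive helper (`rep_2 ::= ε | rep_2 param`) is the other common choice, and which one suits a given parser generator depends on its parsing algorithm.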

I will support EBNF later (https://github.com/loloiccl/nimly/issues/21)

Araq [2019-04-05T09:53:12+02:00]

Ah please support "0 or more", it sucks to write simple loops as recursion.

loloiccl [2019-04-10T23:34:16+02:00]

Added option and repeat in https://github.com/loloiccl/nimly/pull/22

spip [2019-07-27T03:02:46+02:00]

npeg has very nice features:

A complete documentation with clear code.

Peg generation at compile time.

AST captures to build an Abstract Syntax Tree while matching the grammar.

Nim code embedding in the rules.

Debugging and tracing functions, with the nice grammar tree view.

Unfortunately, it has two missing features that prevent me from using it instead of spending my time trying to debug my Nim pegs grammar:

It has its own grammar syntax for rules that does not follow (E)BNF like Nim's pegs. People are somewhat more used to EBNF syntax, with postfix occurrence patterns, assumed sequences, etc. Existing rules would have to be rewritten and debugged to work with npeg.

It does not support Unicode, meaning being able to parse UTF-8 strings using Runes and Unicode-aware built-in macros, like pegs does, for instance with \white (= any Unicode whitespace character).

If you were to make it API compatible with Nim's pegs, it would be a great replacement for the pegs module.

zevv [2019-08-16T22:32:40+02:00]

Hi @spip,

Sorry, I only noticed your post just now. For future communication, feel free to post in the NPeg issues on GitHub so I get properly notified.

"It has its own grammar syntax for rules that does not follow (E)BNF like Nim's pegs"

This is a design choice: having a grammar parseable by the Nim compiler has a number of advantages:

Reduced code size, because there is no need to create a parser for the grammar itself. Of course this could be made in NPeg itself, but that's a bit of a chicken-and-egg problem :)

Implementation in Nim macros allows for smooth mixing of grammars and Nim code, so there is no need for nasty tricks to get Nim code back from string grammars etc.

Not very important but still nice: syntax highlighting generally keeps working within macro DSLs.

That said, it would probably not be too hard to create an (E)BNF-compatible parser with everything that is now in place. I do see some problems with this, though: (E)BNF and PEG grammars may look similar, but they are not trivially compatible (for example, because of ordered choice in PEGs). You simply cannot parse any arbitrary (E)BNF grammar with a PEG; there are always things that need some reordering or rewriting to make them PEG compatible, or at least more efficient by limiting backtracking. (On the other hand, the current syntax is not too far from (E)BNF. For example, take a look at src/npeg/lib/uri.nim for a PEG translation of RFC 3986.)
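A small illustration of why ordered choice breaks naive (E)BNF-to-PEG translation, written as a Python sketch with made-up helper names rather than NPeg code: the BNF rule `S ::= "a" | "ab"` accepts the input `ab`, but the literal PEG translation `S <- ("a" / "ab") !.` does not, because the first alternative succeeds on the prefix `a` and the second is never tried.

```python
def peg_choice(alternatives, text):
    """PEG ordered choice: commit to the first alternative matching a prefix.

    Returns the number of characters consumed, or -1 if nothing matches.
    """
    for alt in alternatives:
        if text.startswith(alt):
            return len(alt)
    return -1

def peg_full_match(alternatives, text):
    # S <- (alt1 / alt2 / ...) !.   (the choice must be followed by end of input)
    return peg_choice(alternatives, text) == len(text)

def bnf_full_match(alternatives, text):
    # S ::= alt1 | alt2 | ...   (unordered: any alternative may derive the input)
    return any(text == alt for alt in alternatives)
```

Here `bnf_full_match(["a", "ab"], "ab")` is true while `peg_full_match(["a", "ab"], "ab")` is false. Reordering the alternatives so that the longer one comes first, `["ab", "a"]`, makes the PEG version accept both `a` and `ab`, which is the kind of rewriting meant above.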

Also, the (E)BNF syntax would need a number of extensions in order to specify captures or other actions to perform at parse time, which kind of defies the purpose of having a compatible grammar to start with. Last but not least: I see no clean way to mix grammar and Nim code. I'm very much open to any ideas and experiments, so let me hear if you have any practical suggestions!

"It does not support Unicode, meaning being able to parse UTF-8 strings using Runes"

Like I said in the manual: there is rudimentary UTF-8 support available, and I'm not sure what exactly would be needed to make NPeg really "UTF-8 compatible". Over the last few days I added proper library support for NPeg, and started a bare minimum utf8 lib. The same applies here: I'd be glad to hear any ideas you might have and I'm happy to see if we can make NPeg suit your needs!

dponyatov [2019-08-21T21:55:28+02:00]

From what I have read, Nim integrates seamlessly with C libraries and code, so for the lexer you can use Ragel; it produces readable and compact code with the -G2 option (I use it on low-end microcontrollers for command parsing).

The more interesting question is whether Nim is able to do backtracking to implement DCG parsing for really complex, context-sensitive and arbitrary syntaxes.

jan0sc [2021-01-31T22:51:04+01:00]

For anyone specifically needing ANTLR integration, I have made a package for using the ANTLR4 runtime via the JavaScript bindings: https://github.com/jan0sc/antlr4nim

jasonfi [2021-02-01T04:12:33+01:00]

SWIG support for Nim would be great: https://github.com/swig/swig/issues/1852

Mirror of forum.nim-lang.org

3881 :: Ebnf Lexer and Parser generator in nim