nimforum mirror - pegs module: how to use captures in enter/leave procs?

Amenhotep [2019-04-16T07:25:13+02:00]

Hi there!

I need to write a parser. I tried to do it using the pegs module. I wrote my grammar and it works: it recognizes strings. But now I need to write some actions to actually build a syntactic tree while parsing, so I need to access the captured portions while parsing. Something like:

import strutils, pegs

let grammar = peg """
  terms <- ^ term+ $
  term <- \letter* {\d+} \n+
"""

let example = """a1
b2
c3
"""

let parseExample = grammar.eventParser:
    pkNonTerminal:
      enter: # p, s, start
        if p.nt.name=="term":
          echo "  It seems like a new term, let's check it..."
      leave: # p, s, start, length (length = -1 in case of failure)
        if p.nt.name=="term" and length >= 1:
          echo "Yes indeed, it's a term and I captured this: '", $1, "'"

let pLen = parseExample(example)
echo pLen

The result is:

  It seems like a new term, let's check it...
Yes indeed, it's a term and I captured this: '1'
  It seems like a new term, let's check it...
Yes indeed, it's a term and I captured this: '1'
  It seems like a new term, let's check it...
Yes indeed, it's a term and I captured this: '1'
  It seems like a new term, let's check it...
9

Obviously, $1 is not understood as "the first captured string", but as the string representation of a literal 1.

How can I access the captured strings for use in the enter/leave procs?

PS: Maybe I'm doing it wrong. I intend to use the whole grammar (about 20-30 rules) to parse a file in one shot. Is this approach OK?

Araq [2019-04-16T11:42:37+02:00]

I don't know. :-) I wouldn't use PEGs for writing a parser; see this thread https://forum.nim-lang.org/t/3881 for alternatives.

gemath [2019-04-16T12:11:49+02:00]

The docs of eventParser show how the matched portion of the input string s, starting at index start with length length, can be accessed inside the handlers. Note that PEGs are matched eagerly, so one can only be sure that a match is legitimate if length >= 0 in the leave handler. To use this for actual parsing, it takes some external data structure like a stack, or probably your syntax tree.

Alternatively, you could take a look at zevv's NPeg on github.

Amenhotep [2019-04-16T20:26:40+02:00]

Thank you for your answers. I chose pegs because it says it works with Unicode (and I need this).

I have already researched other options:

NPeg seemed great, I wrote my grammar in it and tested it, it worked (for matching), but I found it lacks enter/leave procs so I don't know how to build my syntactic tree with it. My language includes nested parentheses.

Nimly... frankly, I couldn't understand the docs/examples. :)

Well, it seems I will have to do it by hand (actually, I already have a working parser written by hand in Python, I was just hoping for an automated tool in Nim...). I will use regular expressions. Is this a good plan? Does anyone have some advice on this?

PS: I read the pegs documentation, I saw the example, my question was about using captured substrings (those enclosed in {}) in the enter/leave procs. Not the whole s string with start and length -- that matches the entire rule, not the {} portions.

mashingan [2019-04-17T01:44:43+02:00]

I haven't tried the event parser, but when I only need to capture text I usually use the find proc:

var buffer = newSeq[string](10) # crashes if the seq is too small for the captures
if example =~ grammar:
  discard example.find(grammar, buffer)
  echo buffer

# will print
# ["1", "2", "3", "", "", "", "", "", "", ""]

Amenhotep [2019-04-17T10:59:48+02:00]

Aha. So you use it basically like a capturing regex. Interesting.

I'm wondering, could I use it like this to parse nested parentheses? They're no more than 4 levels deep. But they can be quite long (thousands of chars).

zevv [2019-04-17T19:01:18+02:00]

Hi Amenhotep,

Please drop an issue on the NPeg github page with a description and/or some examples of what you're trying to do, I'd be glad to help you out and see if we can get that to work.

https://github.com/zevv/npeg/issues

gemath [2019-04-19T11:43:59+02:00]

Quoting Amenhotep: "PS: I read the pegs documentation, I saw the example, my question was about using captured substrings (those enclosed in {}) in the enter/leave procs. Not the whole s string with start and length -- that matches the entire rule, not the {} portions."

Ah, ok. Captures are not available in eventParser; a parser grammar is generally defined down to the elements of interest. Maybe @mashingan is right and what you are looking for is more of a matcher than a parser.
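
As a minimal sketch of what "defined down to the elements of interest" could look like for the grammar from the first post: the {\d+} capture is replaced by a rule of its own, and the leave handler slices the matched text out of s using start and length, following the pattern shown in the eventParser docs. The rule name number and the identifiers numbers and parseTerms are illustrative assumptions, not something from the thread.

```nim
import std/pegs

# Same grammar as in the first post, except that the captured part
# {\d+} is now its own rule (hypothetical name: number).
let grammar = peg """
  terms  <- ^ term+ $
  term   <- \letter* number \n+
  number <- \d+
"""

# External storage filled by the handler; a real parser would
# probably keep a stack or the syntax tree here instead.
var numbers: seq[string] = @[]

let parseTerms = grammar.eventParser:
  pkNonTerminal:
    leave: # p, s, start, length (length = -1 in case of failure)
      if p.nt.name == "number" and length > 0:
        # The text matched by this rule is s[start ..< start+length].
        numbers.add s.substr(start, start + length - 1)

let matched = parseTerms("a1\nb2\nc3\n")
echo matched
echo numbers
```

Since a rule can match successfully inside a parent rule that later fails, a larger grammar would probably push in enter and pop or commit in leave, rather than appending to a flat seq as done here.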

Mirror of forum.nim-lang.org

4791 :: pegs module: how to use captures in enter/leave procs?