Hi there!
I need to write a parser. I tried to do it using the pegs module. I wrote my grammar and it works: it recognizes strings. But now I need to write some actions to actually build a syntax tree while parsing, so I need to access the captured portions while parsing. Something like:
import strutils, pegs
let grammar = peg """
terms <- ^ term+ $
term <- \letter* {\d+} \n+
"""
let example = """a1
b2
c3
"""
let parseExample = grammar.eventParser:
  pkNonTerminal:
    enter: # p, s, start
      if p.nt.name == "term":
        echo " It seems like a new term, let's check it..."
    leave: # p, s, start, length (length = -1 in case of failure)
      if p.nt.name == "term" and length >= 1:
        echo "Yes indeed, it's a term and I captured this: '", $1, "'"
let pLen = parseExample(example)
echo pLen
The result is:
It seems like a new term, let's check it...
Yes indeed, it's a term and I captured this: '1'
It seems like a new term, let's check it...
Yes indeed, it's a term and I captured this: '1'
It seems like a new term, let's check it...
Yes indeed, it's a term and I captured this: '1'
It seems like a new term, let's check it...
9
Obviously, $1 is not understood as "the first captured string", but as the string representation of a literal 1.
How can I access the captured strings for use in the enter/leave procs?
PS: Maybe I'm doing it wrong. I intend to use the whole grammar (about 20-30 rules) to parse a file in one shot. Is this approach OK?
The docs for eventParser show how the matched portion of the input string s, starting at index start and of length length, can be accessed inside the handlers. Note that PEGs are matched eagerly: you can only be sure that a match is legitimate if length >= 0 in the leave handler. To use this for actual parsing, you need some external data structure, such as a stack or probably your syntax tree.
Alternatively, you could take a look at zevv's NPeg on github.
Thank you for your answers. I chose pegs because it says it works with Unicode (and I need this).
I have already researched other options:
Well, it seems I will have to do it by hand (actually, I already have a working parser written by hand in Python, I was just hoping for an automated tool in Nim...). I will use regular expressions. Is this a good plan? Does anyone have some advice on this?
PS: I read the pegs documentation, I saw the example, my question was about using captured substrings (those enclosed in {}) in the enter/leave procs. Not the whole s string with start and length -- that matches the entire rule, not the {} portions.
I haven't tried the event parser, but if I only need to capture some text I usually use the find proc:
var buffer = newSeq[string](10) # crashes if the seq is too small for the captures
if example =~ grammar:
  discard example.find(grammar, buffer)
  echo buffer
  # will print
  # ["1", "2", "3", "", "", "", "", "", "", ""]
Aha. So you use it basically like a capturing regex. Interesting.
I'm wondering, could I use it like this to parse nested parentheses? They're no more than 4 levels deep. But they can be quite long (thousands of chars).
Hi Amenhotep,
Please drop an issue on the NPeg github page with a description and/or some examples of what you're trying to do, I'd be glad to help you out and see if we can get that to work.
PS: I read the pegs documentation, I saw the example, my question was about using captured substrings (those enclosed in {}) in the enter/leave procs. Not the whole s string with start and length -- that matches the entire rule, not the {} portions.
Ah, ok. Captures are not available in eventParser, a parser grammar is generally defined down to the elements of interest. Maybe @mashingan is right and what you are looking for is more of a matcher than a parser.
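As a rough sketch of what "defined down to the elements of interest" could look like for the grammar in the question: the `{\d+}` capture is replaced by a rule of its own, so the digits show up as a separate enter/leave event. The rule name `digits` and the variables `pending` and `numbers` are made up for this illustration and are not part of pegs. The `digits` rule is deliberately declared after `term` uses it, on the assumption that pegs may inline small rules that are already declared when they are referenced, in which case no event would fire for them. I have not tested this against every Nim version, so treat it as a starting point only.

```nim
import std/pegs

# Hypothetical variant of the grammar from the question: the former
# capture {\d+} is now the rule `digits`, so it raises its own events.
let grammar = peg """
terms <- ^ term+ $
term <- \letter* digits \n+
digits <- \d+
"""

var
  pending = ""                # digits seen inside the term currently being tried
  numbers: seq[string] = @[]  # digits of the terms that actually matched

let parseTerms = grammar.eventParser:
  pkNonTerminal:
    enter:
      if p.nt.name == "term":
        pending = ""          # forget leftovers from an earlier attempt
    leave:
      if length > 0:          # length is -1 when the rule failed
        case p.nt.name
        of "digits":
          # remember the matched text, but do not commit it yet
          pending = s.substr(start, start + length - 1)
        of "term":
          # the whole term matched, so the remembered digits are valid
          numbers.add pending
        else:
          discard

let matched = parseTerms("a1\nb2\nc3\n")
echo matched, " ", numbers
```

The digits are only committed when the enclosing `term` leaves successfully, which follows the earlier advice about eager matching. For a real syntax tree, `numbers` would presumably become a stack of nodes that each `leave` handler pops from and pushes onto.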