Hello,
I have been doint last winter's Advent of Code puzzles in order to practice Nim, and I am stuck at the day 5 puzzle. I use a PE grammar to parse the input; I could have used regular expressions but since the puzzles are going to become progressively harder I figured I might as well learn PEG while the input is still simple enough.
Here is the sample input:
0,9 -> 5,9
8,0 -> 0,8
9,4 -> 3,4
2,2 -> 2,1
7,0 -> 7,4
6,4 -> 2,0
0,9 -> 2,9
3,4 -> 1,4
0,0 -> 8,8
5,5 -> 8,2
Each line of text describes a line: first come the x- and y coordinate of the first point, then an arrow, and finally the coordinates of the last point. My grammar is:
let
grammar = peg"""
lines <- line (\n line)* \n?
line <- first \s+ '->' \s+ last
first <- point
last <- point
point <- x ',' y
x <- coordinate
y <- coordinate
coordinate <- number
number <- \d+
"""
The problem is that if the text file contains a trailing line feed Nim will try to parse the line feed as a number as well. I don't know why, because the last newline of the grammar is optional. The grammar does match, but the event parser throws an error because it cannot parse the last "number", which is just an empty string. Here is my event parser:
type
Coordinate* = distinct Natural
Point* = object
x*, y*: Coordinate
Line* = object
first*, last*: Point
converter to_coordinate*(n: int): Coordinate =
Coordinate(n)
proc parse_lines*(input: string): seq[Line] =
var
lines: seq[Line]
let
parser = block:
var
line: Line
point: Point
x, y, coordinate: Coordinate
grammar.eventParser:
pkNonTerminal:
leave:
let matchstr = s.substr(start, start + length - 1)
case p.nt.name
of "line":
lines.add(line)
of "first":
line.first = point
of "last":
line.last = point
of "point":
point = Point(x: x, y: y)
of "x":
x = coordinate
of "y":
y = coordinate
of "number":
debugEcho "[", matchstr, "]"
coordinate = matchstr.parseInt()
if parser(input) != len(input):
raise newException(ValueError, "Error parsing input string")
lines
discard parse_lines(input) # where input is a text string read from a file
The debugEcho is there for me to verify that all the numbers have been parsed, and it does indeed print all 40 numbers, plus an extra empty pair of brackets. I cannot see anything wrong here, the grammar is pretty straight forward. I solved the previous day's challenge without issue in a similar way. Any idea what is going on here?
Your grammar is wrong, PEGs are deterministic and this rule
lines <- line (\n line)* \n?
can only cause problems.
let
grammar* = peg"""
spec <- draws (separator board)* \n?
separator <- \n \n
draws <- draw (',' draw)*
draw <- number
board <- row \n row \n row \n row \n row
row <- \s* entry \s+ entry \s+ entry \s+ entry \s+ entry
entry <- number
number <- \d+
"""
Do you have a resource to learn from? I don't want to waste your time with non-Nim topics.
PEGs are a strict representation of the simple imperative code that you would write if you were writing a parser by hand.
The documentation of eventParser says that leave handlers are called with length set to -1 for an unsuccessful match. Your code tries to parse an unsuccessful line match in the (\n line)* expression, which cannot work. If the body of the leave block is wrapped in a if length > -1:, it works.
This is not strictly backtracking, since the parser only tries to peek one character ahead after the \n in the expression to find out that the match is unsuccessful, there is no buffer to be rewound and re-read. So the grammar works, but it is pretty regex-y. An IMHO more idiomatic rule would be
lines <- (line (\n / $))+
Thank you, that solved the problem. So if I understand correctly, the parser is trying to match pattern \n line, finds the \n, then tries matching line, which means it tries matching first, point, x, coordinated and finally number, which fails and causes the parser to leave that node.
But then, why does the previous day's code work? Is it because there are no false matches there since the separator is \n \n?
But then, why does the previous day's code work? Is it because there are no false matches there since the separator is n n?
I think so.
it tries matching first, point, x, coordinated and finally number, which fails and causes the parser to leave that node.
Yes, but to be precise: every entered node is left (i.e. its leave handler is executed) eventually, regardless of whether the match succeeded or not. The handler just blows up when parseInt is called on an empty string because of the unsuccessful match.