I need to write a parser for a custom document format (similar to rdoc/markdown/yaml, with custom blocks), and it's a good opportunity to learn parsers better.
Is there a short and clean example of a parser in Nim? There are many parsers written in Nim, but they have a different goal: performance and low overhead, so the code looks too complicated and low-level. I would prefer the opposite trade-off: clean and short code, even if it is slower and has more overhead.
In case you weren't aware, Crafting Interpreters is a great book and is freely available online:
http://craftinginterpreters.com/
It gently walks you through writing a parser (and later an interpreter) for a Java-like language using recursive descent (with code in Java) and Pratt parsing (in C).
In my experience the code translates to Nim pretty easily, and you can find plenty of Nim implementations if you search GitHub for "Nim lox" or "Nim Crafting Interpreters".
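Since Pratt parsing is the less familiar of the two techniques, here is a small illustrative sketch of the idea in Nim (not a port of the book's C code): every infix operator gets a binding power, and `parseExpr(minBp)` keeps folding operators into the left operand for as long as they bind tighter than `minBp`. To stay short it evaluates integer arithmetic (`+ - * /`, unary minus, parentheses) directly instead of building an AST; the names `Parser`, `peek`, `infixBp`, `parseExpr` and `evaluate` are hypothetical, chosen for the example.

```nim
# Illustrative Pratt-parsing sketch; all names here are hypothetical.
import std/strutils

type Parser = object
  src: string
  pos: int

proc peek(p: var Parser): char =
  ## Skip spaces and return the next character, or '\0' at end of input.
  while p.pos < p.src.len and p.src[p.pos] == ' ':
    inc p.pos
  result = if p.pos < p.src.len: p.src[p.pos] else: '\0'

proc infixBp(op: char): int =
  ## Binding power of an infix operator; 0 means "not an infix operator".
  case op
  of '+', '-': result = 10
  of '*', '/': result = 20
  else: result = 0

proc parseExpr(p: var Parser, minBp: int): int  # forward declaration

proc parsePrefix(p: var Parser): int =
  ## Prefix position: a number, a unary minus or a parenthesised expression.
  case p.peek()
  of '0'..'9':
    let start = p.pos
    while p.pos < p.src.len and p.src[p.pos] in Digits:
      inc p.pos
    result = parseInt(p.src[start ..< p.pos])
  of '-':
    inc p.pos
    result = -(p.parseExpr(30))  # unary minus binds tighter than * and /
  of '(':
    inc p.pos
    result = p.parseExpr(0)
    if p.peek() != ')':
      raise newException(ValueError, "expected ')' at " & $p.pos)
    inc p.pos
  else:
    raise newException(ValueError, "unexpected input at " & $p.pos)

proc parseExpr(p: var Parser, minBp: int): int =
  result = p.parsePrefix()
  while true:
    let op = p.peek()
    let bp = infixBp(op)
    if bp <= minBp:
      break  # not an operator, or too weak: leave it to an outer call
    inc p.pos
    # Parsing the right operand at the operator's own power makes it left-associative.
    let rhs = p.parseExpr(bp)
    case op
    of '+': result += rhs
    of '-': result -= rhs
    of '*': result *= rhs
    else: result = result div rhs

proc evaluate(s: string): int =
  var p = Parser(src: s)
  result = p.parseExpr(0)
  if p.peek() != '\0':
    raise newException(ValueError, "trailing input at " & $p.pos)

echo evaluate("1 + 2 * 3")    # 7
echo evaluate("(1 + 2) * 3")  # 9
echo evaluate("10 - 4 - 3")   # 3
echo evaluate("-2 * 3")       # -6
```

Passing `bp - 1` instead of `bp` for the right operand would make an operator right-associative, which is the usual trick for something like exponentiation.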
This is the pattern I usually go for with dead-simple parsers. Here is an example for an extremely limited version of S-expressions with only lists, integers, and atoms:
type
  SexpKind = enum
    List, Integer, Atom
  Sexp = object
    case kind: SexpKind
    of List: elements: seq[Sexp]
    of Integer: integer: int
    of Atom: atom: string

proc parseInteger(s: string, i: var int): Sexp =
  result = Sexp(kind: Integer, integer: 0)
  while i < s.len:
    let c = s[i]
    case c
    of '0'..'9':
      result.integer = result.integer * 10 + (c.int - '0'.int)
    else:
      dec i
      return
    inc i

proc parseAtom(s: string, i: var int): Sexp =
  result = Sexp(kind: Atom, atom: "")
  while i < s.len:
    let c = s[i]
    case c
    of 'a'..'z', 'A'..'Z', '_', '-':
      result.atom.add(c)
    else:
      dec i
      return
    inc i

proc parseList(s: string, i: var int): Sexp =
  result = Sexp(kind: List, elements: @[])
  # consume (
  if s[i] != '(': assert false
  inc i
  while i < s.len:
    let c = s[i]
    case c
    of '0'..'9':
      result.elements.add(parseInteger(s, i))
    of 'a'..'z', 'A'..'Z', '_', '-':
      result.elements.add(parseAtom(s, i))
    of '(':
      result.elements.add(parseList(s, i))
    of ')':
      return
    else: discard
    inc i

proc parseSexp(s: string): seq[Sexp] =
  result = @[]
  var i = 0
  while i < s.len:
    let c = s[i]
    case c
    of '0'..'9':
      result.add(parseInteger(s, i))
    of 'a'..'z', 'A'..'Z', '_', '-':
      result.add(parseAtom(s, i))
    of '(':
      result.add(parseList(s, i))
    else: discard
    inc i

echo parseSexp("(foo 1 (bar 2 (3)) 4 ())")
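A possible variation on the same pattern, as a sketch (the names `Reader`, `parseElement`, `AtomChars` and `SpaceChars` are hypothetical, not from the post above): the input and the cursor live in one object, a single `parseElement` proc does the character dispatch that `parseList` and `parseSexp` duplicate, and each branch leaves the cursor just past what it consumed, so there is no `dec i` bookkeeping. Unlike the original, it raises `ValueError` on an unterminated list or an unexpected character instead of silently skipping or returning.

```nim
# Illustrative variation; Reader, parseElement, AtomChars, SpaceChars are hypothetical names.
type
  SexpKind = enum
    List, Integer, Atom
  Sexp = object
    case kind: SexpKind
    of List: elements: seq[Sexp]
    of Integer: integer: int
    of Atom: atom: string

  Reader = object
    s: string  # the whole input
    i: int     # cursor: index of the next unread character

const
  AtomChars = {'a'..'z', 'A'..'Z', '_', '-'}
  SpaceChars = {' ', '\t', '\n', '\r'}

proc skipSpaces(r: var Reader) =
  while r.i < r.s.len and r.s[r.i] in SpaceChars:
    inc r.i

proc parseElement(r: var Reader): Sexp =
  ## Parse exactly one integer, atom or list starting at the cursor.
  r.skipSpaces()
  if r.i >= r.s.len:
    raise newException(ValueError, "unexpected end of input")
  case r.s[r.i]
  of '0'..'9':
    result = Sexp(kind: Integer)
    while r.i < r.s.len and r.s[r.i] in {'0'..'9'}:
      result.integer = result.integer * 10 + (ord(r.s[r.i]) - ord('0'))
      inc r.i
  of AtomChars:
    result = Sexp(kind: Atom)
    while r.i < r.s.len and r.s[r.i] in AtomChars:
      result.atom.add(r.s[r.i])
      inc r.i
  of '(':
    inc r.i  # consume '('
    result = Sexp(kind: List)
    r.skipSpaces()
    while r.i < r.s.len and r.s[r.i] != ')':
      result.elements.add(r.parseElement())
      r.skipSpaces()
    if r.i >= r.s.len:
      raise newException(ValueError, "unterminated list")
    inc r.i  # consume ')'
  else:
    raise newException(ValueError, "unexpected character '" & r.s[r.i] & "'")

proc parseSexp(s: string): seq[Sexp] =
  ## Parse a sequence of top-level expressions.
  var r = Reader(s: s)
  r.skipSpaces()
  while r.i < r.s.len:
    result.add(r.parseElement())
    r.skipSpaces()

echo parseSexp("(foo 1 (bar 2 (3)) 4 ())")
```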
> if s[i] != '(': assert false

Why not assert s[i] == '('?
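The two spellings are essentially equivalent, and `assert s[i] == '('` also gives a more useful failure message, because the failed expression is included in it. Worth keeping in mind: `assert` can be compiled out with `--assertions:off`, while `doAssert` always runs, and for rejecting bad input in a parser a catchable exception is often a better fit than an assertion. A minimal sketch of such a check, assuming a hypothetical `expect` helper and `ParseError` type (neither is a stdlib name):

```nim
# Sketch only: ParseError and expect are hypothetical names, not stdlib APIs.
type ParseError = object of CatchableError

proc expect(s: string, i: var int, c: char) =
  ## Consume the character `c` at `s[i]`, or raise a ParseError that says what was found.
  if i >= s.len or s[i] != c:
    let found = if i < s.len: "'" & s[i] & "'" else: "end of input"
    raise newException(ParseError,
      "expected '" & c & "' at " & $i & ", found " & found)
  inc i

var i = 0
expect("(foo)", i, '(')
echo i  # 1
```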
Ragel is a rather interesting parser generator that emits C/C++: http://www.colm.net/open-source/ragel/
Ragel state machines can not only recognize byte sequences as regular expression machines do, but can also execute code at arbitrary points in the recognition of a regular language. Code embedding is done using inline operators that do not disrupt the regular language syntax.
It'd be cool to make a type-safe Nim wrapper on top of it.
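Without pulling in Ragel itself, here is a plain-Nim illustration of the idea in the quote above (a hand-written sketch, not Ragel syntax and not Ragel output): a two-state machine that recognises `key=value;key=value` input and runs an action on each transition, finishing a key on `=` and emitting a pair on `;`. The input format and the names `ScanState` and `scanPairs` are invented for the example.

```nim
# Hand-written state machine with transition actions; ScanState and scanPairs are hypothetical.
type ScanState = enum
  InKey, InValue

proc scanPairs(input: string): seq[(string, string)] =
  ## Recognise `key=value;key=value` and run an action on each transition.
  var
    state = InKey
    mark = 0      # index where the current key or value started
    key = ""
  for pos, c in input:
    case state
    of InKey:
      if c == '=':
        key = input[mark ..< pos]  # action on InKey -> InValue: the key is complete
        mark = pos + 1
        state = InValue
      elif c notin {'a'..'z', 'A'..'Z', '0'..'9', '_'}:
        raise newException(ValueError, "invalid key character at " & $pos)
    of InValue:
      if c == ';':
        result.add((key, input[mark ..< pos]))  # action on InValue -> InKey: emit the pair
        mark = pos + 1
        state = InKey
  # End of input acts as one more transition: a pending value is emitted,
  # and a key without '=' is an error.
  if state == InValue:
    result.add((key, input.substr(mark)))
  elif mark < input.len:
    raise newException(ValueError, "key without value at end of input")

echo scanPairs("name=nim;kind=lang")
```

What Ragel adds on top of this is deriving such machines, and the points where actions fire, from regular-expression-like definitions instead of writing the states by hand.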
Thanks, I'm glad I asked: I didn't know about many of these projects, and there are a couple that do what I want.
P.S.
The book Crafting Interpreters, by Robert Nystrom, seems very interesting; unfortunately it's 600 pages long, so that's for some distant future when I have more time.
If anyone is interested in such things, there's a similar but much smaller project called Make a Lisp.
This project is also very interesting because it's implemented in many languages, including Nim, so you can compare them to get a feeling for how short, expressive, or complicated each language is. To my taste, the Ruby implementation is the shortest and cleanest, and it's nice to see that the Nim version is almost as short and clean. The difference is huge when you look at the code for languages like Java or Rust.
FWIW, if you're only interested in writing a simple recursive-descent parser that outputs an AST, you can stop after chapter 6 (out of 30).
The rest of the book is about writing an interpreter and bytecode compiler.
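For a sense of what those first six chapters produce: the book's expression grammar has one rule per precedence level (equality, comparison, term, factor, unary, primary), and each rule becomes one procedure that calls the next-tighter level. Below is a cut-down Nim sketch of that shape, limited to integers, `+ - * /`, unary minus and parentheses, that builds an AST. It is not a port of the book's Java code; the character-level scanning (no separate tokenizer) and all names (`Node`, `Parser`, `term`, `factor`, `unary`, `primary`, `toLisp`, `parse`) are simplifications made up for the example.

```nim
# Recursive-descent sketch, one proc per precedence level; all names are hypothetical.
import std/strutils

type
  NodeKind = enum
    Num, Neg, Bin
  Node = ref object
    case kind: NodeKind
    of Num: value: int
    of Neg: operand: Node
    of Bin:
      op: char
      left, right: Node

  Parser = object
    src: string
    pos: int

proc peek(p: var Parser): char =
  ## Skip spaces and return the next character, or '\0' at end of input.
  while p.pos < p.src.len and p.src[p.pos] == ' ':
    inc p.pos
  result = if p.pos < p.src.len: p.src[p.pos] else: '\0'

proc term(p: var Parser): Node  # forward declaration: primary recurses into term

# primary -> NUMBER | "(" term ")"
proc primary(p: var Parser): Node =
  let c = p.peek()
  if c in Digits:
    let start = p.pos
    while p.pos < p.src.len and p.src[p.pos] in Digits:
      inc p.pos
    result = Node(kind: Num, value: parseInt(p.src[start ..< p.pos]))
  elif c == '(':
    inc p.pos
    result = p.term()
    if p.peek() != ')':
      raise newException(ValueError, "expected ')' at " & $p.pos)
    inc p.pos
  else:
    raise newException(ValueError, "unexpected input at " & $p.pos)

# unary -> "-" unary | primary
proc unary(p: var Parser): Node =
  if p.peek() == '-':
    inc p.pos
    result = Node(kind: Neg, operand: p.unary())
  else:
    result = p.primary()

# factor -> unary (("*" | "/") unary)*
proc factor(p: var Parser): Node =
  result = p.unary()
  while p.peek() in {'*', '/'}:
    let op = p.src[p.pos]
    inc p.pos
    let lhs = result
    result = Node(kind: Bin, op: op, left: lhs, right: p.unary())

# term -> factor (("+" | "-") factor)*
proc term(p: var Parser): Node =
  result = p.factor()
  while p.peek() in {'+', '-'}:
    let op = p.src[p.pos]
    inc p.pos
    let lhs = result
    result = Node(kind: Bin, op: op, left: lhs, right: p.factor())

proc toLisp(n: Node): string =
  ## Print the tree in prefix form so its shape is easy to check.
  case n.kind
  of Num: result = $(n.value)
  of Neg: result = "(- " & toLisp(n.operand) & ")"
  of Bin: result = "(" & n.op & " " & toLisp(n.left) & " " & toLisp(n.right) & ")"

proc parse(s: string): Node =
  var p = Parser(src: s)
  result = p.term()
  if p.peek() != '\0':
    raise newException(ValueError, "trailing input at " & $p.pos)

echo toLisp(parse("1 + 2 * (3 - 4)"))  # (+ 1 (* 2 (- 3 4)))
```

The `while` loops in `factor` and `term` are what make the binary operators left-associative: each iteration wraps the tree built so far as the new left operand.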