Is it possible to convert a string containing valid nimrod code to a nimrod AST? I've been looking in module macros, and found parseStmt(...), but it gives me internal errors.
Need this for some source code analysis, it seems overkill to write a custom lexer/parser.
Internal errors are bugs (but often of minor importance for now) ... However, parseStmt() et al. only work at compile time.
For source code analysis you'll probably want to use the lexer/parser/semantic checker of the compiler itself which is an 'import' away since the compiler is written in Nimrod itself... There are docs for it too (but a bit sparse): http://nimrod-code.org/intern.html#the-compiler-s-architecture
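To illustrate the point about parseStmt() working only at compile time, here is a minimal sketch that runs it inside a macro and hands the tree dump back as a string constant. The helper name `astText` is made up for this example and is not part of std/macros.

```nim
import std/macros

# `astText` is a hypothetical helper name, not something std/macros provides.
macro astText(code: static[string]): untyped =
  ## Parses `code` while the program is being compiled and returns
  ## the textual tree dump as a string literal.
  result = newLit(parseStmt(code).treeRepr)

# The parsing happens during compilation; only the resulting text
# survives into the running program.
const dump = astText("echo \"Hello world!\"")
echo dump
```

Because the argument is `static[string]`, the code to parse has to be known at compile time, which is exactly the limitation described above.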
Can I revive this question? How do I import the compiler's parser module and its procs in Nim 2.0? E.g. parseString() seems to be the right thing to parse a string into an AST. Is it possible to do something like:
import parser
let
  code_string = "echo \"Hello world!\""
let
  code_ast = parseString(code_string)
echo code_ast
The compiler cannot find the parser module:
$ nim c -r parse_nimcode.nim
/.../parse_nimcode.nim(1, 8) Error: cannot open file: parser
Is it under some intern/compilertools/parser ?
Aha, I just had to run nimble install compiler. Without it, the package compiler/parser was not found:
$ cat parse_nimcode.nim
import compiler/parser
...
$ nim c -r parse_nimcode.nim
Hint: used config file '/.../.local/nim/config/nim.cfg' [Conf]
Hint: used config file '/.../.local/nim/config/config.nims' [Conf]
......................................................................
/.../parse_nimcode.nim(1, 16) Error: cannot open file: compiler/parser
But now it complains about a mismatch:
import compiler/parser
let
  code_string: string = "echo \"Hello world!\""
let
  code_ast = parseString(code_string)
echo code_ast
$ nim c -r parse_nimcode.nim
...
/.../parse_nimcode.nim(7, 25) Error: type mismatch
Expression: parseString(code_string)
[1] code_string: string
Expected one of (first mismatch at [position]):
[2] proc parseString(s: string; cache: IdentCache; config: ConfigRef;
filename: string = ""; line: int = 0;
errorHandler: ErrorHandler = nil): PNode
It probably needs those cache: IdentCache and config: ConfigRef in the arguments.
OK, this works:
import compiler/parser
import compiler/idents
import compiler/options
let
  code_string: string = "echo \"Hello world!\""
let
  code_ident = IdentCache()
  code_conf = ConfigRef()
  code_ast = parseString(code_string, code_ident, code_conf)
  #code_ast = parseString(code_string)
echo code_ast.repr
# code_ast.treeRepr - unfortunately:
# Error: undeclared field: 'treeRepr' for type ast.PNode
The output:
PNode(typ: nil, info: TLineInfo(line: 1, col: 0, fileIndex: 0), flags: {}, kind: nkStmtList,
sons: @[PNode(typ: nil, info: TLineInfo(line: 1, col: 0, fileIndex: 0), flags: {}, kind: nkCommand,
sons: @[PNode(typ: nil, info: TLineInfo(line: 1, col: 0, fileIndex: 0), flags: {}, kind: nkIdent, ident: PIdent(id: -1, s: "echo", next: nil, h: -3990281224686746450)),
PNode(typ: nil, info: TLineInfo(line: 1, col: 5, fileIndex: 0), flags: {}, kind: nkStrLit, strVal: "Hello world!")])])
That link is from 2012, and the domain no longer exists. Here's an updated link:
https://nim-lang.org/docs/intern.html#the-compiler-s-architecture
Unfortunately, the new URL offers little information on the compiler architecture for a user like me. My idea was to use Nim's first-class metaprogramming features to plot simple control flow graphs.
From parseString docs, PNode is actually the proper AST, not NimNode. Also, PNode is really a ref to `TNode`, which is defined in `compiler/ast`.
But I have a problem running parseString on code where let x = 5 spans 2 separate lines:
import compiler/parser
import compiler/idents
import compiler/options
import compiler/ast
let
  code_string: string = """
echo "Hello world!"
#let x = 5 # <---------- works
# the following breaks:
let
x = 5
proc foo() =
echo "bar"
foo()
"""
echo code_string
let
  code_ident = IdentCache()
  code_conf = ConfigRef()
  code_ast = parseString(code_string, code_ident, code_conf)
let
  offset_by: uint = 2
import strutils
proc recurse_repr(ast_node: PNode, offset: uint = 0) =
  echo repeat(" ", offset_by*offset), ast_node.typ.repr, ast_node.kind.repr
  #for s in ast_node.sons:
  #  recurse_repr(s, offset+1)
  # how to test if the node contains `sons`?
  # it is hardcoded in the `kind`s?
  # https://nim-lang.org/docs/compiler/ast.html#TNode
  try:
    for s in ast_node.sons:
      recurse_repr(s, offset+1)
  except:
    discard
recurse_repr(code_ast)
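On the question in the comments above about how to tell whether a node has `sons`: as far as I can tell, compiler/ast exports a `safeLen` proc that returns 0 for leaf kinds (identifiers, literals and so on), so the try/except should not be needed. A minimal sketch, assuming the nimble compiler package matches the installed Nim version; `dumpKinds` is a name made up for this example:

```nim
import std/strutils
import compiler/[ast, idents, options, parser]

# `dumpKinds` is a hypothetical helper, not part of the compiler API.
proc dumpKinds(n: PNode, depth = 0) =
  ## Prints one node kind per line, indented by nesting depth.
  echo repeat("  ", depth), n.kind
  # safeLen is 0 for leaf nodes, so the loop never touches the
  # `sons` field on a node kind that does not have it.
  for i in 0 ..< n.safeLen:
    dumpKinds(n[i], depth + 1)

let tree = parseString("echo \"Hello world!\"", newIdentCache(), newConfigRef())
dumpKinds(tree)
```

This also gives a rough stand-in for treeRepr on a PNode, since it walks the tree by kind rather than relying on `repr`.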
When let x = 5 spans 2 lines, I get a segfault:
Traceback (most recent call last)
//parse_nim_code.nim(40) parse_nim_code
//nim-2.0.4-f81677aa0cf688e364f894141dee30ca8c39065c/compiler/parser.nim(2579) parseString
//nim-2.0.4-f81677aa0cf688e364f894141dee30ca8c39065c/compiler/ast.nim(2528) parseAll
//nim-2.0.4-f81677aa0cf688e364f894141dee30ca8c39065c/compiler/parser.nim(173) parMessage
//nim-2.0.4-f81677aa0cf688e364f894141dee30ca8c39065c/compiler/lexer.nim(236) lexMessageTok
//nim-2.0.4-f81677aa0cf688e364f894141dee30ca8c39065c/compiler/lexer.nim(227) dispMessage
//nim-2.0.4-f81677aa0cf688e364f894141dee30ca8c39065c/compiler/msgs.nim(586) liMessage
//nim-2.0.4-f81677aa0cf688e364f894141dee30ca8c39065c/compiler/msgs.nim(445) handleError
//nim-2.0.4-f81677aa0cf688e364f894141dee30ca8c39065c/compiler/msgs.nim(417) quit
//nim-2.0.4-f81677aa0cf688e364f894141dee30ca8c39065c/compiler/options.nim(627) isDefined
//nim/2.0.4/nim/lib/pure/strtabs.nim(211) hasKey
//nim/2.0.4/nim/lib/pure/strtabs.nim(137) rawGet
//nim/2.0.4/nim/lib/pure/strtabs.nim(118) myhash
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Error: execution of an external program failed: '//parse_nim_code'
Can I change something in ConfigRef() that goes into parseString() to make it work? I.e. I'd like to make it work on any regular Nim code.
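One thing that may be worth trying here: the traceback ends in a nil read inside isDefined, which suggests the bare `ConfigRef()` constructor leaves its symbol table unset. The compiler modules provide `newConfigRef()` (in options) and `newIdentCache()` (in idents) for building properly initialized objects. A minimal sketch, assuming the nimble compiler package matches the installed Nim version, and assuming the multi-line `let` body is indented inside the string:

```nim
import compiler/[ast, idents, options, parser]

let source = """
let
  x = 5
echo x
"""

# newIdentCache / newConfigRef allocate the internal tables that the
# bare IdentCache() / ConfigRef() object constructors leave unset.
let cache = newIdentCache()
let conf = newConfigRef()
let tree = parseString(source, cache, conf)

echo tree.kind, " with ", tree.len, " top-level statements"
```

If the parser still rejects the input, an initialized ConfigRef should at least let it report the actual syntax error instead of crashing while trying to print it.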
Yes, the goal was the control flow of the program.
Yeah, looking more at what the PNode tree contains, I see that it is really just syntax. (The if statement PNode contains things like nkIdent ":".) I need the semantic tree of the code. Basically, like Lisp's braces. If I understand Nim right, NimNodes must be it. And, I bet, the compiler module does have some representation of the semantics somewhere inside.
I.e. parseStmt fits the goal exactly. But, as Araq's answer to the original question says, it works only at compile time.
I'll check out npeg. At first glance, it seems like another one of those ultra-formal syntax parsers that CS people make all the time in universities. I am looking for something way simpler, namely a runtime parseStmt. I am not trying to write a thesis on parsing regular expressions here. It is just a toy. I want to convert the code into some standard "GraphML"-like file that can be understood by any popular software for graphs.
It would also be nice if this "runtime parseStmt" could optionally process the macros in the code and output the runtime semantic tree.
You seem to be very confused about something, or not communicating what you're trying to do very well (at least I'm pretty lost..)
I didn't propose npeg as a solution to your problem (it's a great parsing library, but not what you need), but rather as an example of a library that creates graphs out of code.
Why do you want to create this graph at runtime? This sounds a lot like static analysis, not something typically done while the program is running. The parseStmt procedure parses Nim code into NimNodes, and NimNodes only exist in the macros module. The compiler uses PNodes instead, which are analogous to NimNodes. I don't think the compiler has a separate semantic representation of the code; it's all stored in the PNodes, if I'm not mistaken.