I finally found out what was wrong with my code. The thing is, it runs far faster with Boehm (compared to the default GC, I mean), but when I try the command with --gc:boehm, it complains that the command does not exist.
Any ideas?
So... any idea?
Your posts are a bit hard to understand, at least for me, and I have no idea what project you are working on or what you intend. What do you mean by
"when trying the command with --gc:boehm it complains"?
You can compile your program with something like nim c --gc:boehm myprog.nim to use boehm GC, but I think you know that and have successfully done it, because you wrote that it works faster with boehm?
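A minimal sketch for double-checking which memory manager a build actually uses. It assumes the compiler defines the boehmgc symbol when --gc:boehm is passed; the proc name gcName is made up for illustration.

```nim
# Reports, at compile time, whether this binary was built with --gc:boehm.
# Assumption: the compiler defines the symbol `boehmgc` for that switch.
proc gcName(): string =
  when defined(boehmgc):
    "boehm"
  else:
    "not boehm (default or another --gc setting)"

echo "GC in use: ", gcName()
```

Building the same file once with and once without --gc:boehm should print different lines, which is a quick way to confirm the switch was picked up.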
setupForeignThreadGc() seems to use the GC your program was compiled with; I don't think we can use multiple GCs in a single program. See
It's an interpreter: https://github.com/arturo-lang/arturo
Basically, so far I had no GC whatsoever, and I'm trying to find out how to make the whole project work WITH a GC (and then how to tune it).
This code is one reason why every GC hates your code:
ValueRef {.union.} = ref object
  s: string
  a: Array
  d: Context
  f: Function
  when not defined(mini):
    bi: Int
.union is for C interop only; put a Nim string inside and kaboom. And that's probably just the tip of the iceberg: you also use .computedGoto where it is currently simply ignored, and that's even a good thing, as you misapply computedGoto...
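For comparison, here is a minimal sketch of the same idea written as a Nim object variant, where a kind field records which branch is live so the runtime knows whether a string is stored. The names ValueKind, Value, and describe, and the choice of two branches, are made up for illustration; the real ValueRef has more fields.

```nim
type
  ValueKind = enum
    stringValue, integerValue

  # Object variant: `kind` selects which field exists at runtime.
  Value = ref object
    case kind: ValueKind
    of stringValue: s: string
    of integerValue: i: int

proc describe(v: Value): string =
  # Dispatch on the discriminator instead of reinterpreting raw memory.
  case v.kind
  of stringValue: "string: " & v.s
  of integerValue: "integer: " & $v.i

echo describe(Value(kind: stringValue, s: "hello"))
echo describe(Value(kind: integerValue, i: 42))
```

The cost is one extra field per value compared to a raw union, in exchange for fields that are checked against the active kind.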
Well, I guess it is only the tip of the iceberg. However, I have even set up super-minimal test environments and I'm still struggling with the GC. Here's an example:
import algorithm, system/ansi_c, base64, bitops, hashes, httpClient, json, macros, math, md5, oids, os
import osproc, parsecsv, parseutils, random, re, segfaults, sequtils, sets, std/editdistance
import std/sha1, streams, strformat, strutils, sugar, unicode, tables, terminal
import times, uri
type
  #[----------------------------------------
    C interface
   ----------------------------------------]#
  yy_buffer_state {.importc.} = ref object
    yy_input_file : File
    yy_ch_buf : cstring
    yy_buf_pos : cstring
    yy_buf_size : clong
    yy_n_chars : cint
    yy_is_our_buffer : cint
    yy_is_interactive : cint
    yy_at_bol : cint
    yy_fill_buffer : cint
    yy_buffer_status : cint
# Parser C Interface
proc yyparse(): cint {.importc.}
proc yy_scan_string(str: cstring): yy_buffer_state {.importc.}
proc yy_switch_to_buffer(buff: yy_buffer_state) {.importc.}
proc yy_delete_buffer(buff: yy_buffer_state) {.importc.}
var yyfilename {.importc.}: cstring
var yyin {.importc.}: File
var yylineno {.importc.}: cint
type
  ParamKind = enum
    RegParam, NumParam
  Param = object
    case kind: ParamKind:
    of RegParam: reg: int
    of NumParam: num: int
  Statement = ref object
    cmd: int
    params: seq[Param]
  StatementList = ref object
    list: seq[Statement]
var
  MainProgram {.exportc.} : StatementList
  A,B,C,D,E,F,G,H: int
  Regs : seq[int] = newSeq[int](8)
  Stack : seq[int]
template benchmark*(benchmarkName: string, code: untyped) =
  block:
    let t0 = epochTime()
    code
    let elapsed = epochTime() - t0
    let elapsedStr = elapsed.formatFloat(format = ffDecimal, precision = 3)
    echo "CPU Time [", benchmarkName, "] ", elapsedStr, "s"
proc newStm(cmd: int): Statement {.exportc.} =
  Statement(cmd: cmd, params: @[])
proc newStmReg(cmd: int, r0: int): Statement {.exportc.} =
  Statement(cmd: cmd, params: @[Param(kind:RegParam, reg: r0)])
proc newStmNum(cmd: int, n0: int): Statement {.exportc.} =
  Statement(cmd: cmd, params: @[Param(kind:NumParam, num: n0)])
proc newStmRegReg(cmd: int, r0: int, r1: int): Statement {.exportc.} =
  Statement(cmd: cmd, params: @[Param(kind:RegParam, reg: r0),Param(kind:RegParam, reg: r1)])
proc newStmRegNum(cmd: int, r0: int, n1: int): Statement {.exportc.} =
  Statement(cmd: cmd, params: @[Param(kind:RegParam, reg: r0),Param(kind:NumParam, num: n1)])
proc newStmRegRegNum(cmd: int, r0: int, r1: int, n2: int): Statement {.exportc.} =
  Statement(cmd: cmd, params: @[Param(kind:RegParam, reg: r0),Param(kind:RegParam, reg: r1),Param(kind:NumParam, num: n2)])
proc newStmRegNumNum(cmd: int, r0: int, n1: int, n2: int): Statement {.exportc.} =
  Statement(cmd: cmd, params: @[Param(kind:RegParam, reg: r0),Param(kind:NumParam, num: n1),Param(kind:NumParam, num: n2)])
proc newStmListWithStm(stm: Statement): StatementList {.exportc.} =
  StatementList(list: @[stm])
proc addStmToList(lst: StatementList, stm: Statement) {.exportc.} =
  GC_ref(lst)
  lst.list.add(stm)
proc setStatements(stms: StatementList) {.exportc.} =
  MainProgram = stms
template R0():untyped {.dirty.} =
  Regs[stm.params[0].reg]
template R1():untyped {.dirty.} =
  Regs[stm.params[1].reg]
template R2():untyped {.dirty.} =
  Regs[stm.params[2].reg]
template N0():untyped {.dirty.} =
  stm.params[0].num
template N1():untyped {.dirty.} =
  stm.params[1].num
template N2():untyped {.dirty.} =
  stm.params[2].num
template IS_REG(x: int):bool {.dirty.} =
  stm.params[x].kind == RegParam
template IS_NUM(x: int):bool {.dirty.} =
  stm.params[x].kind == NumParam
when isMainModule:
  let scriptPath = commandLineParams()[0]
  yylineno = 0
  yyfilename = scriptPath
  #Reg = initTable[string,int]()
  discard open(yyin, scriptPath)
  MainProgram = StatementList(list: @[])
  #setupForeignThreadGc()
  benchmark "parse:":
    discard yyparse()
  #tearDownForeignThreadGc()
  benchmark "execute:":
    var line: int = 0
    while line<MainProgram.list.len:
      var stm = MainProgram.list[line]
      {.computedGoTo.}
      case stm.cmd
      of 0:
        if IS_REG(1): R0 = R1
        else: R0 = N1
      of 1: inc(R0)
      of 2:
        if IS_REG(1) and IS_REG(2): R0 = R1 + R2
        else:
          if IS_REG(1): R0 = R1 + N2
          else:
            if IS_REG(2): R0 = N1 + R2
            else: R0 = N1 + N2
      of 3:
        if IS_REG(1) and IS_REG(2): R0 = R1 - R2
        else:
          if IS_REG(1): R0 = R1 - N2
          else:
            if IS_REG(2): R0 = N1 - R2
            else: R0 = N1 - N2
      of 4:
        if IS_REG(1) and IS_REG(2): R0 = R1 * R2
        else:
          if IS_REG(1): R0 = R1 * N2
          else:
            if IS_REG(2): R0 = N1 * R2
            else: R0 = N1 * N2
      of 5:
        if IS_REG(1) and IS_REG(2): R0 = R1 div R2
        else:
          if IS_REG(1): R0 = R1 div N2
          else:
            if IS_REG(2): R0 = N1 div R2
            else: R0 = N1 div N2
      of 6:
        if IS_REG(1) and IS_REG(2): R0 = R1 mod R2
        else:
          if IS_REG(1): R0 = R1 mod N2
          else:
            if IS_REG(2): R0 = N1 mod R2
            else: R0 = N1 mod N2
      of 7:
        if IS_REG(0):
          echo R0
        else:
          echo N0
      of 8:
        if IS_REG(1):
          if R0==R1: line = N2-1; continue
        else:
          if R0==N1: line = N2-1; continue
      of 9:
        if IS_REG(1):
          if R0!=R1: line = N2-1; continue
        else:
          if R0!=N1: line = N2-1; continue
      of 10:
        if IS_REG(1):
          if R0>R1: line = N2-1; continue
        else:
          if R0>N1: line = N2-1; continue
      of 11:
        if IS_REG(1):
          if R0<R1: line = N2-1; continue
        else:
          if R0<N1: line = N2-1; continue
      of 12:
        if IS_REG(1):
          if R0>=R1: line = N2-1; continue
        else:
          if R0>=N1: line = N2-1; continue
      of 13:
        if IS_REG(1):
          if R0<=R1: line = N2-1; continue
        else:
          if R0<=N1: line = N2-1; continue
      of 14:
        line = N0-1
        continue
      of 15:
        Stack.add(R0)
      of 16:
        R0 = Stack.pop()
      else: discard
      inc(line)
  echo "StatementCount: ", MainProgram.list.len
The only tricky part about the above is that it interacts with Flex/Bison. When the source code was in D, all I had to do was the equivalent of GC_ref for the 2 objects I'm GC_ref-ing here.
Now, in Nim, the example works fine with the default GC (although I fail to see how it frees any memory, tbh), but when switching to Boehm, with a large sample input (900k+ lines), 90% of it seems to have disappeared, while at times it might work.
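The GC_ref idea mentioned above can be sketched as a matched GC_ref/GC_unref pair around the time an object is visible only to C code. This is a minimal illustration for Nim's reference-counting memory management, not a fix for the Boehm behaviour described here; Node, heldByC, handToC, and takeBack are hypothetical names, and the global pointer merely stands in for storage owned by the C parser.

```nim
type
  Node = ref object
    value: int

# Stand-in for a slot owned by C code (for example a parser's value stack).
# It is an untraced pointer, so it does not keep the object alive by itself.
var heldByC: pointer

proc handToC(n: Node) =
  GC_ref(n)                       # extra reference while only C holds it
  heldByC = cast[pointer](n)

proc takeBack(): Node =
  result = cast[Node](heldByC)
  GC_unref(result)                # balance the earlier GC_ref exactly once
  heldByC = nil

handToC(Node(value: 7))
let n = takeBack()
echo n.value
```

Calling GC_ref without a matching GC_unref keeps the object alive forever, so a GC_ref placed in a proc that runs once per added statement is worth a second look.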
Now we are getting there:
type
  #[----------------------------------------
    C interface
   ----------------------------------------]#
  yy_buffer_state {.importc.} = ref object
    yy_input_file : File
    yy_ch_buf : cstring
    yy_buf_pos : cstring
    yy_buf_size : clong
    yy_n_chars : cint
    yy_is_our_buffer : cint
    yy_is_interactive : cint
    yy_at_bol : cint
    yy_fill_buffer : cint
    yy_buffer_status : cint
# Parser C Interface
proc yyparse(): cint {.importc.}
proc yy_scan_string(str: cstring): yy_buffer_state {.importc.}
is wrong too as the yy_buffer_state is not a ref, it's a ptr!
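A minimal sketch of the ref vs. ptr distinction, using a made-up struct so it runs without Flex. BufferState, BufferStatePtr, newBuffer, and deleteBuffer are hypothetical stand-ins; in the real binding the struct would come from the generated C code and be freed by yy_delete_buffer.

```nim
type
  BufferState = object               # hypothetical stand-in for the C struct
    bufSize: clong
    nChars: cint
  BufferStatePtr = ptr BufferState   # untraced: Nim's GC never owns or frees it

proc newBuffer(size: int): BufferStatePtr =
  # Simulates the C side allocating the struct outside the GC heap.
  result = cast[BufferStatePtr](alloc0(sizeof(BufferState)))
  result.bufSize = size.clong

proc deleteBuffer(b: BufferStatePtr) =
  # Simulates the C side releasing it; nothing is collected automatically.
  dealloc(b)

let b = newBuffer(1024)
echo b.bufSize
deleteBuffer(b)
```

Declaring such a handle as ref tells Nim the memory lives on its own GC heap, which is not true for memory the C library allocated.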
Well, speaking of... I guess it's my lucky day. Also, the motto that says that "only when you ask a question, do you fully understand the issue" seems rather confirmed.
So... I'm very close to having it all working as before (= before attempting to turn the GC on). There were missing bits here and there (quite difficult to list them all here, since they're too project-specific), but I think I will manage to fine-tune it.
For now, I'm not extremely happy with the timings (I run a set of benchmarks to see how it compares with interpreters for other languages, and incorporating the GC slowed it down a bit in some cases), but regarding memory management, I'm more than happy!
Thanks ;-)
For the sake of comparison, here are the results (the tests are micro-benchmarks for isolated parts I'm trying to check, but very intensive ones for that matter).
--gc:none
https://gist.github.com/drkameleon/099c1d373367877fc3bc8c48067ae09f
--gc:refc
https://gist.github.com/drkameleon/caa1a3d5087dcabfff8a8158d18e9011
I don't know where your bottlenecks are, but the if/else statements inside the opcode dispatch look incredibly suspicious for performance. Then again, I haven't written a register VM yet, so take it with a grain of salt.
Maybe you should have a specialized switch:
And a dispatcher that sends to those.
I'm pretty sure your interpreter is struggling with branch mispredictions here.
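One way to read the suggestion above, as a minimal sketch: decide at parse time whether each operand is a register or a literal, and emit a distinct opcode for each combination, so the execution loop has a single case and no per-operand if/else. The opcode names, the Instr layout, and the run proc are made up for illustration and are not taken from the project.

```nim
type
  Op = enum
    opMovNum, opAddRegReg, opAddRegNum, opHalt
  Instr = object
    op: Op
    a, b, c: int        # meaning depends on `op` (register index or literal)

proc run(code: seq[Instr]): array[8, int] =
  # `result` doubles as the register file.
  var pc = 0
  while true:
    let ins = code[pc]
    case ins.op
    of opMovNum:    result[ins.a] = ins.b
    of opAddRegReg: result[ins.a] = result[ins.b] + result[ins.c]
    of opAddRegNum: result[ins.a] = result[ins.b] + ins.c
    of opHalt:      break
    inc pc

let regs = run(@[
  Instr(op: opMovNum, a: 0, b: 5),
  Instr(op: opMovNum, a: 1, b: 7),
  Instr(op: opAddRegReg, a: 2, b: 0, c: 1),
  Instr(op: opAddRegNum, a: 3, b: 2, c: 10),
  Instr(op: opHalt)])
echo regs[2], " ", regs[3]
```

Whether this actually helps would need measuring; it trades a larger opcode set for fewer data-dependent branches per instruction.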
The VM code above is just an experiment.
For now, the interpreter is a plain tree-walking interpreter.
It sure has its issues and I'm trying to find my way around, but things are getting better and better and the results so far are more than satisfying :)