Hi,
I am porting an application from C++ to Nim. While some code is faster in the Nim version after some tweaking (using template instead of func at some points, for example), I am having an issue with a lambda function (or a closure, as I believe it's called in Nim):
let addrOfState = addr someState
for m in p.iterator(
  ...,
  aFunction = (a, b) => addrOfState.importantTable.get(a, b),
  ...
):
  doStuff()
This introduces something called colonenv_ in the generated C code, along with calls to interiorAllocatedPtr, markStackAndRegisters and rawNewObj, which together seem to cause a 10% slowdown of the application (even though this does not happen on the "hottest" path).
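As far as I understand, a Nim closure is a pair of a proc pointer and a hidden, heap-allocated environment holding the captured variables, and that environment is where the extra allocation and GC scanning come from. Conceptually (made-up names, not the actual generated code) it is roughly:

# conceptual sketch only, not the actual generated code
type
  State = object        # stand-in for the real state type
  Env = ref object      # hidden, heap-allocated environment -> rawNewObj + GC scanning
    addrOfState: ptr State
  NimClosure = tuple
    prc: proc(a, b: int, env: Env): int {.nimcall.}
    env: Env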
I found a workaround that removes the 10% slowdown, like this:
let addrOfState = addr someState
for m in p.iterator(
  ...,
  addr someState.importantTable,
  ...
):
  doStuff(m)
Here I am basically calling importantTable.get(a, b) from inside p.iterator. As I said, this solves the performance issue, but I would rather not do it like this, because p.iterator shouldn't "know" that it is using importantTable. Is there a way to do this without sacrificing performance?

Could you provide an MWE (minimal working example) of your issue?
On a related note, I like the C++ approach to lambdas, which is based on a struct with an overloaded function call method, and makes explicit whether or not variables are captured, and whether they're mutated. IIRC Nim can do something similar as an experimental feature, that is, make objects callable. I wonder if anyone uses this feature?
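For reference, a minimal sketch of what that experimental feature looks like (the type and names below are made up for illustration):

{.experimental: "callOperator".}

type Adder = object
  amount: int

# overloading `()` makes an Adder value callable like a proc
proc `()`(a: Adder, x: int): int = x + a.amount

let addFive = Adder(amount: 5)
echo addFive(3)   # prints 8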
Slow solution (2.87 seconds):
type State = int

func hereIsAFunction(state: State, m: int): int =
  result = m mod state

func blabla(d: int, aFunction: proc(m: int): int {.closure.}): int =
  return aFunction(d)

proc world(state: var State, o, depth: int): int =
  if depth == 0:
    return 200
  let addrOfState = addr state
  result = blabla(d = 0, aFunction = proc(m: int): int = addrOfState[].hereIsAFunction(m))
  result += state.world(o + depth, depth-1) + state.world(o, depth-1)

var state = 4
echo state.world(0, 26)
Fast solution (202.83 milliseconds):
type State = int

func hereIsAFunction(state: State, m: int): int =
  result = m mod state

func blabla(d: int, state: State): int =
  return state.hereIsAFunction(d)

proc world(state: var State, o, depth: int): int =
  if depth == 0:
    return 200
  result = blabla(d = 0, state = state)
  result += state.world(o + depth, depth-1) + state.world(o, depth-1)

var state = 4
echo state.world(0, 26)
the overhead of closures is nothing particular to anonymous functions, it applies to any proc that accesses external state.
if you write your lambda such that it only accesses its parameters, it will be {.nimcall.} and have no more overhead than any other function.
of course, this implies passing your state into the iterator as well
it might look something like this:
import sugar, tables

type
  State = object
    importantTable: Table[int, string]
  StateProc = proc(st: State, key: int, dflt: string): string {.nimcall.}

iterator piter(state: State, aFunction: StateProc): string =
  for i in 0..4:
    yield aFunction(state, i, "not found")

proc main() =
  var someState = State(importantTable: {0: "zero", 1: "one", 2: "two", 4: "four"}.toTable)
  for i in piter(
    state = someState,
    aFunction = (st, a, b) => st.importantTable.getOrDefault(a, b)
  ):
    echo i

main()
or with the experimental and undocumented callOperator, which i didn't know about until just now (cheers @bpr). i don't know how far it can be trusted, so let's wrap State in a distinct type:
{.experimental: "callOperator".}  # needed to overload `()`

type StateFunctor = distinct State

template `()`(st: StateFunctor, a: int, b: string): string =
  State(st).importantTable.getOrDefault(a, b)

iterator qiter(aFunction: StateFunctor): string =
  for i in 0..4:
    yield aFunction(i, "not found")

proc main() =
  var someState = State(importantTable: {0: "zero", 1: "one", 2: "two", 4: "four"}.toTable)
  for i in qiter(aFunction = StateFunctor(someState)):
    echo i

main()
wouldn't it be brilliant to be able to pass a template to a function!? callOperator gets super close to that.
but it breaks newSeq, so caveat emptor.
the overhead of closures is nothing particular to anonymous functions, it applies to any proc that accesses external state.
I am not sure I understood you correctly, but this version, for example, also takes only about 300 milliseconds:
type State = int

func hereIsAFunction(state: State, m: int): int =
  result = m mod state

var state = 4

proc blabla(d: int): int =
  return state.hereIsAFunction(d)

proc world(state: var State, o, depth: int): int =
  if depth == 0:
    return 200
  result = blabla(d = 0)
  result += state.world(o + depth, depth-1) + state.world(o, depth-1)

echo state.world(0, 26)
Even though this time blabla accesses an external variable. By the way, I am not sure if the recursion is necessary in the examples; probably another way of calling the function world very often would show this performance issue as well.

sorry, i didn't see your reply with your MWE. here's the nimcall version:
i'm almost certain you don't need to use 'State.addr', what's the rationale behind that?
type State = int

func hereIsAFunction(state: State, m: int): int =
  result = m mod state

func blabla(d: int, state: State,
            aFunction: proc(st: State, m: int): int {.nimcall.}): int =
  return aFunction(state, d)

proc world(state: State, o, depth: int): int =
  if depth == 0:
    return 200
  #let addrOfState = addr state
  result = blabla(d = 0, state,
                  aFunction = proc(st: State, m: int): int = st.hereIsAFunction(m))
  result += state.world(o + depth, depth-1) + state.world(o, depth-1)

var state = 4
echo state.world(0, 26)
i get 2.4 seconds for your first example, and 0.04 for both your second and this example here.
I used addr state because in the real-world application I need to pass state as var to world, and if I remember correctly, I got a compiler error that said something like "can't capture state, would violate memory safety".
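A minimal sketch of what I mean (illustrative code, not the real application):

type State = int

proc world(state: var State): int =
  # let f = proc(m: int): int = m mod state   # rejected: 'state' cannot be
  #                                           # captured (memory safety)
  let addrOfState = addr state                # workaround: capture a ptr instead
  let f = proc(m: int): int = m mod addrOfState[]
  result = f(3)

var s: State = 4
echo world(s)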
In your example the function blabla has to know that State exists. However, I might in the future want to pass a function to blabla that doesn't depend on State, so aFunction can't have State anywhere as an argument.
right, well, either you pass in State as a parameter, or in a hidden environment in a closure, or as a global.
so if you don't want a parameter, and the closure is too slow, then a global it must be. don't pass State as a parameter to any function.
also, I'd listen to the compiler in that situation rather than overruling it.
It seems like --gc:arc or --gc:orc are a bit slower than without any --gc argument:

no extra --gc: 2.88 seconds
--gc:markAndSweep: 1.77 seconds
--gc:arc: 5.46 seconds
--gc:orc: 6.01 seconds

(The second/faster example without the closure doesn't change much with any of these options.)
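For anyone reproducing this: the collector is chosen at compile time, e.g. (assuming the file is called main.nim):

nim c -d:release --gc:orc -r main.nim

On newer Nim versions the flag is spelled --mm:orc instead of --gc:orc.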
Perhaps a new closure design inspired by C++ and even Rust is worth considering? IMO a good first step would be making Nim's overloadable call operator non-experimental. After all, closures are a poor man's object, right? More seriously, I think this design popped up in C++ because it already had the overloaded call operator and the horribly named Functor idiom; a bit of syntax sugar and compiler magic gave us C++ lambdas.
@tsojtsoj, if the experimental call operator is OK, then you can use something like this
{.experimental: "callOperator".}

type State = int

type Closure = object
  addrOfState: ptr State

func hereIsAFunction(state: State, m: int): int =
  result = m mod state

proc `()`(c: Closure, m: int): int = return hereIsAFunction(c.addrOfState[], m)

func blabla(d: int, aFunction: Closure): int =
  return aFunction(d)

proc world(state: var State, o, depth: int): int =
  if depth == 0:
    return 200
  let c = Closure(addrOfState: addr state)
  result = blabla(d = 0, aFunction = c)
  result += state.world(o + depth, depth-1) + state.world(o, depth-1)

var state = 4
echo state.world(0, 26)
which I found about 10X faster on my machine than the original slow version you wrote. YMMV of course.
I don't like having to use pragmas like {.nimcall.} and {.closure.}.
Perhaps a new closure design inspired by C++ and Rust is worth considering?
Well, we're getting std/tasks, which is related.
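A minimal sketch of what std/tasks looks like (Nim 1.6+; names are illustrative, and it needs --gc:arc or --gc:orc, respectively --mm:arc / --mm:orc on newer compilers):

import std/tasks

proc report(state, m: int) =
  echo m mod state

# toTask packages the call and its arguments into a self-contained Task value,
# instead of capturing surrounding variables the way a closure would
let t = toTask report(4, 26)
t.invoke()   # runs report(4, 26)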