Hi,
I am porting an application from C++ to Nim. While some code is faster in the Nim version after some tweaking (using template instead of func at some points, for example), I am having an issue with a lambda function (or a closure, as I believe it's called in Nim):
let addrOfState = addr someState
for m in p.iterator(
  ...,
  aFunction = (a, b) => addrOfState.importantTable.get(a, b),
  ...
):
  doStuff()
This introduces something called colonenv_ in the generated C code, along with calls to interiorAllocatedPtr, markStackAndRegisters and rawNewObj, which together seem to cause a 10% slowdown of the application (even though this does not happen on the "hottest" path).
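As far as I understand, a Nim closure is a pair of a proc pointer and a hidden, heap-allocated environment holding the captured variables, and that environment is where the extra allocation and GC scanning come from. Conceptually (made-up names, not the actual generated code) it is roughly:

# conceptual sketch only, not the actual generated code
type
  State = object        # stand-in for the real state type
  Env = ref object      # hidden, heap-allocated environment -> rawNewObj + GC scanning
    addrOfState: ptr State
  NimClosure = tuple
    prc: proc(a, b: int, env: Env): int {.nimcall.}
    env: Env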
I found a workaround that removes the 10% slowdown, like this:
let addrOfState = addr someState
for m in p.iterator(
  ...,
  addr someState.importantTable,
  ...
):
  doStuff(m)
Here I am basically calling importantTable.get(a, b) from inside p.iterator. As I said, this solves the performance issue, but I would rather not do it like this, because p.iterator shouldn't "know" that it is using importantTable. Is there a way to do this without sacrificing performance?

Could you provide an MWE (minimal working example) of your issue?
On a related note, I like the C++ approach to lambdas, which is based on a struct with an overloaded function call method, and makes explicit whether or not variables are captured, and whether they're mutated. IIRC Nim can do something similar as an experimental feature, that is, make objects callable. I wonder if anyone uses this feature?
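For reference, a minimal sketch of what that experimental feature looks like (the type and names below are made up for illustration):

{.experimental: "callOperator".}

type Adder = object
  amount: int

# overloading `()` makes an Adder value callable like a proc
proc `()`(a: Adder, x: int): int = x + a.amount

let addFive = Adder(amount: 5)
echo addFive(3)   # prints 8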
Slow solution (2.87 seconds):
type State = int

func hereIsAFunction(state: State, m: int): int =
  result = m mod state

func blabla(d: int, aFunction: proc(m: int): int {.closure.}): int =
  return aFunction(d)

proc world(state: var State, o, depth: int): int =
  if depth == 0:
    return 200
  let addrOfState = addr state
  result = blabla(d = 0, aFunction = proc(m: int): int = addrOfState[].hereIsAFunction(m))
  result += state.world(o + depth, depth-1) + state.world(o, depth-1)

var state = 4
echo state.world(0, 26)
Fast solution (202.83 milliseconds):
type State = int

func hereIsAFunction(state: State, m: int): int =
  result = m mod state

func blabla(d: int, state: State): int =
  return state.hereIsAFunction(d)

proc world(state: var State, o, depth: int): int =
  if depth == 0:
    return 200
  result = blabla(d = 0, state = state)
  result += state.world(o + depth, depth-1) + state.world(o, depth-1)

var state = 4
echo state.world(0, 26)
the overhead of closures is nothing particular to anonymous functions, it applies to any proc that accesses external state.
if you write your lambda such that it only accesses its parameters, it will be {.nimcall.} and have no more overhead than any other function.
of course, this implies passing your state into the iterator as well
it might look something like this:
import sugar, tables

type
  State = object
    importantTable: Table[int, string]
  StateProc = proc(st: State, key: int, dflt: string): string {.nimcall.}

iterator piter(state: State, aFunction: StateProc): string =
  for i in 0..4:
    yield aFunction(state, i, "not found")

proc main() =
  var someState = State(importantTable: {0: "zero", 1: "one", 2: "two", 4: "four"}.toTable)
  for i in piter(
    state = someState,
    aFunction = (st, a, b) => st.importantTable.getOrDefault(a, b)
  ):
    echo i

main()
or with the experimental and undocumented callOperator, which i didn't know about until just now (cheers @bpr). i don't know how far it can be trusted, so let's wrap State in a distinct type:
{.experimental: "callOperator".}  # needed to overload `()`

type StateFunctor = distinct State

template `()`(st: StateFunctor, a: int, b: string): string =
  State(st).importantTable.getOrDefault(a, b)

iterator qiter(aFunction: StateFunctor): string =
  for i in 0..4:
    yield aFunction(i, "not found")

proc main() =
  var someState = State(importantTable: {0: "zero", 1: "one", 2: "two", 4: "four"}.toTable)
  for i in qiter(aFunction = StateFunctor(someState)):
    echo i

main()
wouldn't it be brilliant to be able to pass a template to a function!? callOperator gets super close to that.
but it breaks newSeq, so caveat emptor.
the overhead of closures is nothing particular to anonymous functions, it applies to any proc that accesses external state.
I am not sure I understood you correctly, but this version, for example, also takes only about 300 milliseconds:
type State = int

func hereIsAFunction(state: State, m: int): int =
  result = m mod state

var state = 4

proc blabla(d: int): int =
  return state.hereIsAFunction(d)

proc world(state: var State, o, depth: int): int =
  if depth == 0:
    return 200
  result = blabla(d = 0)
  result += state.world(o + depth, depth-1) + state.world(o, depth-1)

echo state.world(0, 26)
Even though this time blabla accesses an external variable. By the way, I am not sure if the recursion is necessary in the examples; probably another way of calling the function world very often would show this performance issue as well.

sorry, i didn't see your reply with your MWE. here's the nimcall version:
i'm almost certain you don't need to use 'State.addr', what's the rationale behind that?
type State = int

func hereIsAFunction(state: State, m: int): int =
  result = m mod state

func blabla(d: int, state: State,
            aFunction: proc(st: State, m: int): int {.nimcall.}): int =
  return aFunction(state, d)

proc world(state: State, o, depth: int): int =
  if depth == 0:
    return 200
  #let addrOfState = addr state
  result = blabla(d = 0, state,
                  aFunction = proc(st: State, m: int): int = st.hereIsAFunction(m))
  result += state.world(o + depth, depth-1) + state.world(o, depth-1)

var state = 4
echo state.world(0, 26)
i get 2.4 seconds for your first example, and 0.04 for both your second and this example here.
I used addr state because in the real-world application I need to pass state as var to world, and if I remember correctly, I got a compiler error that said something like "can't capture state, would violate memory safety".
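A minimal sketch of what I mean (illustrative code, not the real application):

type State = int

proc world(state: var State): int =
  # let f = proc(m: int): int = m mod state   # rejected: 'state' cannot be
  #                                           # captured (memory safety)
  let addrOfState = addr state                # workaround: capture a ptr instead
  let f = proc(m: int): int = m mod addrOfState[]
  result = f(3)

var s: State = 4
echo world(s)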
In your example the function blabla has to know that State exists. However, I might in the future want to pass a function to blabla that doesn't depend on State, so aFunction can't have State anywhere as an argument.
right, well, either you pass in State as a parameter, or in a hidden environment in a closure, or as a global.
so if you don't want a parameter, and the closure is too slow, then a global it must be. don't pass State as a parameter to any function.
also, I'd listen to the compiler in that situation rather than overruling it.
It seems like --gc:arc or --gc:orc are a bit slower than without any --gc argument:

no extra --gc: 2.88 seconds
--gc:markAndSweep: 1.77 seconds
--gc:arc: 5.46 seconds
--gc:orc: 6.01 seconds

(The second/faster example without the closure doesn't change much with any of these options.)
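For anyone reproducing this: the collector is chosen at compile time, e.g. (assuming the file is called main.nim):

nim c -d:release --gc:orc -r main.nim

On newer Nim versions the flag is spelled --mm:orc instead of --gc:orc.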
Perhaps a new closure design inspired by C++ and even Rust is worth considering? IMO a good first step would be making Nim's overloadable call operator non-experimental. After all, closures are a poor man's object, right? More seriously, I think this design popped up in C++ because it already had the overloaded call operator and the horribly named Functor idiom; a bit of syntax sugar and compiler magic gave us C++ lambdas.
@tsojtsoj, if the experimental call operator is OK, then you can use something like this
{.experimental: "callOperator".}

type State = int

type Closure = object
  addrOfState: ptr State

func hereIsAFunction(state: State, m: int): int =
  result = m mod state

proc `()`(c: Closure, m: int): int = return hereIsAFunction(c.addrOfState[], m)

func blabla(d: int, aFunction: Closure): int =
  return aFunction(d)

proc world(state: var State, o, depth: int): int =
  if depth == 0:
    return 200
  let c = Closure(addrOfState: addr state)
  result = blabla(d = 0, aFunction = c)
  result += state.world(o + depth, depth-1) + state.world(o, depth-1)

var state = 4
echo state.world(0, 26)
which I found about 10X faster on my machine than the original slow version you wrote. YMMV of course.
I don't like having to use pragmas like {.nimcall.} and {.closure.}.
Perhaps a new closure design inspired by C++ and Rust is worth considering?
Well, we're getting std/tasks, which is related.
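A minimal sketch of what std/tasks looks like (Nim 1.6+; names are illustrative, and it needs --gc:arc or --gc:orc, respectively --mm:arc / --mm:orc on newer compilers):

import std/tasks

proc report(state, m: int) =
  echo m mod state

# toTask packages the call and its arguments into a self-contained Task value,
# instead of capturing surrounding variables the way a closure would
let t = toTask report(4, 26)
t.invoke()   # runs report(4, 26)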