I've started trying to understand what move semantics are and how and when my code can benefit from them. I've watched Araq's talk on the topic and read a few threads on the forum along with the Destructors page in the docs, but I don't feel like I've really nailed down the difference between a copy and a move. So I have a few cases regarding objects and seqs that I wanted to ask about. Correct me if any of my assumptions here are wrong :-)
Case 1: Say we have a purely stack-allocated object:
    type
      MyObj = object
        # only ints, floats or other stack-allocated objects here

    proc consume(x: sink MyObj) =
      # Do something with it
      discard

    proc ex1() =
      var x = MyObj()
      var y = x # a move, right? Because x isn't used anymore
      # If I had echoed `x` here, then the above line would have been a copy instead?

    proc ex2() =
      var x = MyObj()
      consume(x) # x is moved into consume right away?
      # an echo here would make the above line a copy of x instead?
What exactly is happening in memory on a move compared to copy in this case? Or are they basically the same in this case (except the move zeros the source)?
Case 2: Now let's introduce an object with a seq:
    type
      MyObj = object
        children: seq[MyObj]

    proc consume(x: sink MyObj) =
      # Do something with it
      discard

    proc ex3() =
      var x = createLargeTree() # just creates a deep tree of nested MyObjs, the main thing is that it's big in memory
      var y = x

    proc ex4() =
      var x = createLargeTree() # just creates a deep tree of nested MyObjs, the main thing is that it's big in memory
      var y = x # now this is a copy
      consume(x) # x gets moved into consume
My question here is what happens to the seq field when x is moved. What throws me off is that seqs have value semantics and are deeply copied on "copy", even though a seq is in fact just a pointer on the stack to memory on the heap. Depending on how seqs in objects are implemented, I can see two ways a move could work:
Option 1: the fields are assigned one by one,
    y.field = x.field
which would mean that the value semantics of seqs come into play and the entire tree is copied along with the object.
Option 2: only the seq's pointer is copied over to y, so nothing on the heap gets copied.
If it's option 2 I think I'm starting to see why move semantics can be more efficient in some cases.
A last question: ARC does sink parameter inference, right? Do I ever need to annotate my procs with "sink", or can I trust it to always move when I consume a variable at the end of a proc?
Hopefully my questions are fairly understandable and not too obvious. Thank you in advance! :-D
var y = x would be elided, and all further references to y should use x instead in the generated code.
For stack objects this shouldn't really matter, because GCC/LLVM should optimize away that kind of variable renaming anyway. This is because, during code generation, they transform the code into SSA (Static Single Assignment) form, in which no variable is mutable and every change of value, for example a += 1, becomes an assignment to a new variable: a1 = a0 + 1.
In consume, x is passed by hidden reference and can be mutated. This is useful if you need destructive updates that would be costly if you made a copy first, for example sorting in place.
It's option 2
ARC is smart and knows how different fields should be treated. It knows that it should just copy the seq's pointer and not the seq itself, to avoid unnecessary copies.
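One way to observe this is to compare the address of the seq's heap buffer before and after a copy and a move. The following is only a sketch: it assumes compilation with --mm:arc or --mm:orc, and the names Node, bufferAddr and moveVsCopy are made up for illustration.

```nim
# Sketch only; assumes --mm:arc or --mm:orc.
type
  Node = object
    children: seq[Node]

proc bufferAddr(n: Node): pointer =
  ## Address of the heap buffer behind `children` (nil for an empty seq).
  if n.children.len > 0:
    result = cast[pointer](unsafeAddr n.children[0])

proc moveVsCopy(): tuple[movedSameBuffer, copyNewBuffer, sourceEmptied: bool] =
  var x = Node(children: newSeq[Node](1000))
  let original = bufferAddr(x)
  let copied = x        # `x` is still used below, so the seq is deep-copied
  let moved = move(x)   # explicit move: only length and pointer change hands
  result.movedSameBuffer = bufferAddr(moved) == original
  result.copyNewBuffer = bufferAddr(copied) != original
  result.sourceEmptied = x.children.len == 0

when isMainModule:
  echo moveVsCopy()
```

If the assumptions hold, the moved object points at the original buffer, the copy owns a fresh one, and the moved-from x is left with an empty seq.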
ARC does sink inference, but there are still 2 cases where you might want to use an explicit sink.
Case 1. To only allow move semantics. Some types can only be moved, in particular all types that represent a resource (memory like GPU memory, open file, socket connection, database handle, lock, message queue). You want to ensure that there is a single owner to those resources throughout the resource lifetime and so you disallow copy but allow moves.
Case 2. Overloading for optimization: to avoid copies and reuse buffers we can have 2 versions of a procedure, one with sink and one without, for example:
    proc add1(a: seq[int]): seq[int] =
      result.newSeq(a.len) # Extra allocation
      for i in 0 ..< a.len:
        result[i] = a[i] + 1

    proc add1(a: sink seq[int]): seq[int] =
      # No allocation, we reuse the buffer
      for i in 0 ..< a.len:
        a[i] += 1
      `=sink`(result, a)
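For Case 1, a move-only type can be sketched by marking the copy hook as an error, so that only moves compile. This is a minimal illustration, assuming Nim 1.4 or later with --mm:arc or --mm:orc; Handle, release and useOnce are hypothetical names standing in for a real resource wrapper.

```nim
# Sketch only; assumes Nim 1.4+ with --mm:arc or --mm:orc.
type
  Handle = object   # hypothetical stand-in for a file, socket or GPU buffer
    id: int

# Any code that would need to copy a Handle is rejected at compile time.
proc `=copy`(dest: var Handle; src: Handle) {.error.}

proc release(h: sink Handle): int =
  ## Takes ownership of the handle; returns its id so the effect is visible.
  h.id

proc useOnce(): int =
  let h = Handle(id: 42)
  result = release(h)   # last use of `h`, so it is moved into `release`
  # discard release(h)  # a second use would force the first call to copy `h`,
  #                     # which the `=copy` hook above forbids

when isMainModule:
  echo useOnce()
```

The explicit sink on release documents that the proc takes ownership, and the error hook makes sure the compiler never silently falls back to a copy.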
Thank you very much @mratsim! :-D This has cleared up a lot of my questions. It's always fascinating how smart compilers can be about reading code and optimizing it. A huge amount of work has obviously been put into them.
One more question about sink inference: is it inferred once per proc, or is it done once for every call to the proc? I.e., does it create separate signatures for the proc depending on whether the argument is sink-able or not? For example:
    proc customAdd(s: var seq[SomeType], newElement: SomeType) =
      s.add newElement
Here newElement could be sink-able sometimes, but sometimes not. Does the compiler see that it could be sink-able and add the sink annotation? Or does it create a new overloaded proc with the sink parameter, as you showed in your example?
"Case 2. Overloading for optimization, to avoid copy/reuse buffers we can have 2 versions of a procedure, one with sink and one without, for example"
That's not supported by Nim, and so far I haven't seen convincing examples. If you want to take over ownership, you always want to, leaving you with the sink T version only.
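With only the sink T version, it is the call site that decides between a move and a copy. A rough sketch of how that plays out for add1, assuming --mm:arc or --mm:orc (the helper procs firstElemAddr, callerMoves and callerCopies are invented for this illustration):

```nim
# Sketch only; assumes --mm:arc or --mm:orc.
proc add1(a: sink seq[int]): seq[int] =
  result = a                # last use of `a`: the buffer moves into `result`
  for x in result.mitems:
    x += 1

proc firstElemAddr(s: seq[int]): pointer =
  ## Address of the first element, used to tell buffers apart.
  cast[pointer](unsafeAddr s[0])

proc callerMoves(): bool =
  let s = @[1, 2, 3]
  let buf = firstElemAddr(s)
  let t = add1(s)           # `s` is never used again, so it is moved
  result = t == @[2, 3, 4] and firstElemAddr(t) == buf

proc callerCopies(): bool =
  let s = @[1, 2, 3]
  let t = add1(s)           # `s` is read below, so a copy is passed instead
  result = s == @[1, 2, 3] and t == @[2, 3, 4]

when isMainModule:
  echo callerMoves(), " ", callerCopies()
```

In the first caller the buffer is reused without any allocation; in the second the original seq stays intact because the compiler inserts the copy on the caller's side.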