nimforum mirror - Question about move semantics for objects and seqs

hugogranstrom (original) [2020-07-12T10:42:29+02:00]

I've started to try to understand more what move semantics is and how and when I can make my code benefit from it. I've watched Araq's talk on it and read a few threads on the forum along with the Destructors page in the docs. But I don't feel like I've really managed to nail down what's the difference between a copy and a move. So I have a few cases that I wanted to ask about regarding objects and seqs. Correct me if I'm saying something wrong in my assumptions here :-)

Case 1: Say we have a purely stack-allocated object:

type
  MyObj = object
    # only int, floats or other stack-allocated objects here

proc consume(x: sink MyObj) =
  # Do something with it
  discard

proc ex1() =
  var x = MyObj()
  var y = x # a move, right? Because x isn't used anymore
  # If I had echoed `x` here, then the above line would have been a copy instead?

proc ex2() =
  var x = MyObj()
  consume(x) # x is moved into consume right away?
  # an echo here would make the above line a copy of x instead?

What exactly is happening in memory on a move compared to copy in this case? Or are they basically the same in this case (except the move zeros the source)?

Case 2: Now let's introduce an object with a seq:

type
  MyObj = object
    children: seq[MyObj]

proc consume(x: sink MyObj) =
  # Do something with it
  discard

proc ex3() =
  var x = createLargeTree() # just creates a deep tree of nested MyObjs, the main thing is that it's big in memory
  var y = x

proc ex4() =
  var x = createLargeTree() # just creates a deep tree of nested MyObjs, the main thing is that it's big in memory
  var y = x # now this is a copy
  consume(x) # x gets moved into consume

My question here is what's happening to the seq field in this case when x is moved. The thing that throws me off is that seqs have value semantics and are deeply copied on "copy", but a seq is in fact just a pointer on the stack to memory on the heap. Depending on how seqs in objects are implemented I can see two ways a move could work:

1. All of MyObj's fields are just copied directly, like
```
y.field = x.field
```
which would mean that the value semantics of seqs come into play and the entire tree is copied along with the object.

2. Arc is smart and knows how different fields should be treated. It knows that it should just copy the pointer of the seq and not the seq itself to avoid unnecessary copies.

If it's option 2 I think I'm starting to see why move semantics can be more efficient in some cases. A last question: Arc does sink parameter inference, right? Do I ever need to annotate my procs with "sink" or can I trust it to always move when I consume a variable at the end of a proc?

Hopefully my questions are fairly understandable and not too obvious. Thank you in advance! :-D

mratsim (original) [2020-07-13T11:41:20+02:00]

Case 1

var y = x would be elided, and all further references to y would use x instead in the generated code.

For stack objects this shouldn't really matter, because GCC/LLVM should optimize away such variable renaming anyway. During code generation they transform the code into SSA (Static Single Assignment) form, meaning no variable is mutable and every change of value, for example a += 1, is turned into an assignment to a new variable: a1 = a0 + 1.
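As a rough illustration of the SSA idea, here is a hand-written sketch (not actual GCC/LLVM output; the proc names are made up for this example) showing the mutable form next to the renamed form:

```nim
# Hand-written illustration of SSA-style renaming; not real compiler output.
proc mutableForm(a0: int): int =
  var a = a0
  a += 1            # first mutation of `a`
  a *= 2            # second mutation of `a`
  result = a

proc ssaForm(a0: int): int =
  let a1 = a0 + 1   # `a += 1` becomes a fresh, immutable name
  let a2 = a1 * 2   # `a *= 2` becomes another fresh name
  result = a2
```

Both procs compute the same value; the second one simply never reassigns a name, which is what makes a plain `var y = x` rename free for the backend to remove.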

In consume, x is passed by hidden reference and can be mutated. This is useful if you need destructive updates that would be costly if you made a copy first, for example sorting in place.
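A minimal sketch of such a destructive update, assuming Nim 2.x with ARC/ORC; `sortedOwned` and `demoSort` are hypothetical names used only for illustration:

```nim
import std/algorithm

# Sketch only: assumes ARC/ORC, where a sink parameter is owned by the callee.
proc sortedOwned(a: sink seq[int]): seq[int] =
  a.sort()      # sorts the buffer this proc now owns; no defensive copy in here
  result = a    # last use of `a`, so the buffer is moved into the result

proc demoSort(): seq[int] =
  var data = @[3, 1, 2]
  result = sortedOwned(data)   # last use of `data`: it is moved into the call
```

If `data` were read again after the call, the compiler would have to pass a copy instead, so the caller's value would never be changed behind its back.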

Case 2

It's option 2

> Arc is smart and knows how different fields should be treated. It knows that it should just copy the pointer of the seq and not the seq itself to avoid unnecessary copies.
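One way to observe this, as a sketch assuming Nim 2.x with the default ORC (or `--mm:arc`); `Node` is a stand-in for MyObj and `moveVsCopy` is a hypothetical helper:

```nim
# Sketch only: compares an explicit move with a copy of an object holding a seq.
type
  Node = object
    children: seq[Node]

proc moveVsCopy(): (bool, int, int, int) =
  var x = Node(children: @[Node(), Node(), Node()])
  let bufX = cast[uint](addr x.children[0])   # address of x's heap buffer

  var y = move(x)     # explicit move: only the seq's pointer and length travel
  let sameBuffer = cast[uint](addr y.children[0]) == bufX

  var z = y           # `y` is read again below, so this has to be a deep copy
  z.children.add Node()   # changing the copy must not affect `y`

  # `x` was reset by the move, `y` kept its 3 children, `z` now has 4.
  result = (sameBuffer, x.children.len, y.children.len, z.children.len)
```

The move leaves `y` pointing at the very same heap buffer that `x` had, while the copy gets its own buffer and can be changed independently.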

Extra

Arc does sink inference but there are still 2 cases where you might want to use explicit sink.

Case 1. To only allow move semantics. Some types can only be moved, in particular all types that represent a resource (memory like GPU memory, open file, socket connection, database handle, lock, message queue). You want to ensure that there is a single owner to those resources throughout the resource lifetime and so you disallow copy but allow moves.
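A minimal sketch of a move-only type, assuming Nim 2.x with ARC/ORC; `Handle`, `closeAndReport` and `useHandle` are hypothetical names, and a real resource type would also define a `=destroy` hook:

```nim
# Sketch only: a type that can be moved but never copied.
type
  Handle = object
    fd: int

# Marking `=copy` with {.error.} turns every would-be copy into a compile-time error.
proc `=copy`(dest: var Handle; src: Handle) {.error.}

proc closeAndReport(h: sink Handle): int =
  # Takes ownership of `h`; a real version would release the resource here.
  result = h.fd

proc useHandle(): int =
  let h = Handle(fd: 3)
  result = closeAndReport(h)   # last use of `h`, so it is moved and no copy is needed
  # Reading `h` after the call (for example `echo h.fd`) would require a copy
  # at the call above, and compilation would fail.
```

The locals live inside a proc on purpose: the last-use analysis that turns the call into a move is applied to local variables.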

Case 2. Overloading for optimization. To avoid copies and reuse buffers, we can have 2 versions of a procedure, one with sink and one without, for example

proc add1(a: seq[int]): seq[int] =
  result.newSeq(a.len) # Extra allocation
  for i in 0 ..< a.len:
    result[i] = a[i] + 1

proc add1(a: sink seq[int]): seq[int] =
  # No allocation, we reuse the buffer
  for i in 0 ..< a.len:
    a[i] += 1
  `=sink`(result, a)

hugogranstrom (original) [2020-07-13T14:47:43+02:00]

Thank you very much @mratsim! :-D This has cleared up a lot of my questions. It's always fascinating how smart compilers can be about reading code and optimizing it. A huge amount of work has obviously been put into them.

One more question about sink inference: Is it inferred once per proc, or is it done once for every call to the proc? I.e. does it create separate signatures for the proc depending on whether the argument is sink-able or not? For example:

proc customAdd(s: var seq[SomeType], newElement: SomeType) =
  s.add newElement

Here newElement could be sink-able sometimes, but sometimes not. Does the compiler see that it could be sink-able and add the sink annotation to the parameter? Or does it create a new overloaded proc with the sink parameter as you showed in your example?

Araq (original) [2020-07-13T15:01:09+02:00]

> Case 2. Overloading for optimization, to avoid copy/reuse buffers we can have 2 versions of a procedure, one with sink and one without, for example

That's not supported by Nim, and so far I haven't seen convincing examples. If you want to take over ownership, you always want to, leaving you with the sink T version only.
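A sketch of what the single sink T version looks like from the caller's side, assuming Nim 2.x with ARC/ORC; `add1` follows the earlier example and `callers` is a hypothetical helper:

```nim
# Sketch only: one `sink` version serves both copying and moving callers.
proc add1(a: sink seq[int]): seq[int] =
  for i in 0 ..< a.len:
    a[i] += 1        # mutate the owned buffer in place
  result = a         # last use of `a`: moved into the result

proc callers(): (seq[int], bool, seq[int], int) =
  var xs = @[1, 2, 3]
  let ys = add1(xs)                 # `xs` is read again below, so a copy is passed
  let unchanged = xs == @[1, 2, 3]  # the caller's seq was not touched
  let zs = add1(move(xs))           # explicit move: the buffer is handed over
  result = (ys, unchanged, zs, xs.len)   # `xs` is empty after the move
```

The decision between copying and moving is made at each call site, so no second overload is needed.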

Mirror of forum.nim-lang.org

6538 :: Question about move semantics for objects and seqs
