nimforum mirror - Wrong copy of sequences?

lscrd [2018-03-17T20:03:16+01:00]

I would like to discuss a problem I have encountered. I have submitted a report for it on the bug tracker, using a different version of the example based on newSeqOfCap.

Here is a simple program:

proc p() =
  var a = @[0, 1, 2]
  let b = a
  a.add(3)
  echo a  # @[0, 1, 2, 3]
  echo b  # @[0, 1, 2]

p()

The result is logical: a and b are different sequences and modifying a doesn't change b.

Now, a somewhat different program.

proc p() =
  var a = @[0, 1, 2, 3]
  discard a.pop
  let b = a
  a.add(5)
  echo a  # @[0, 1, 2, 5]
  echo b  # @[0, 1, 2, 5]

p()

It seems that a and b are now sharing some memory. Looking at the generated C code, it appears that in the first case there is a reallocation when the element is added. This is not the case in the second program, as there is enough room to receive the new value, but it is not clear to me why the length of b is modified.

But, what if we don't change the length at all?

proc p() =
  var a = @[0, 1, 2, 3]
  let b = a
  a[^1] = 4
  echo a  # @[0, 1, 2, 4]
  echo b  # @[0, 1, 2, 4]

p()

This seems clearly wrong to me. Now, let us replace the sequence with an array.

proc p() =
  var a = [0, 1, 2, 3]
  let b = a
  a[^1] = 4
  echo a  # [0, 1, 2, 4]
  echo b  # [0, 1, 2, 3]

p()

And looking at the C code, there is clearly a copy, which was expected.

We can also get some odd behavior with parameters.

var a = @[0, 1, 2, 3]

proc p(s: seq[int]) =
  echo s  # @[0, 1, 2, 3]
  a[^1] = 4
  echo s  # @[0, 1, 2, 4]

p(a)

I think this problem is not likely to happen frequently, but it may cause some trouble. What do you think of it? And how could this be solved?

mashingan [2018-03-17T20:28:44+01:00]

I ran your code using 0.17.2. It gave a wrong seq in example 1; other than that, I get the same results as you.

Is that because the compiler infers that the variables a and b are not used anywhere else, so it is safe for them to be shared? Just a guess, though.

When I change b to a var and modify it, it does copy the seq:

proc p =
  var a = @[0, 1, 2, 3]
  discard a.pop
  #let b = a
  var b = a
  a.add 5
  b.add 10
  echo a
  echo b

p()

lscrd [2018-03-17T20:55:17+01:00]

I use 0.18.0, so the results may differ, of course. When running in the browser, I get the same results as with version 0.18.0.

Maybe the compiler does some optimization, but it cannot assume that a and b are not used anywhere else: they are used by echo.

Assigning with var works, of course, so it is clearly an optimization applied when assigning to a read-only object. I suppose this has been done for performance reasons.

I have tried with version 0.17.2. Indeed, I get a strange result in the first case, i.e.


@[0, 1, 2, 3]
@[54014246935360, 1]

So it seems that some bug has been fixed in version 0.18.0. For the other tests, the results are indeed the same.

StasB [2018-03-17T21:08:24+01:00]

Interestingly, this seems to work correctly:

proc p() =
  var a = @[0, 1, 2, 3]
  discard a.pop
  var b = a # note the change from `let` to `var`
  a.add(5)
  echo a  # @[0, 1, 2, 5]
  echo b  # @[0, 1, 2]

p()

Looks like an optimization gone wrong. Perhaps the intent was for the sharing to happen only between two lets.

[EDIT]

Looks like lscrd beat me to it.

lscrd [2018-03-17T21:15:58+01:00]

Yes, I know it works with var. I once wrote a statement let b = a in a program, a being a sequence, with a comment # No copy. Reading this some months later, I was not sure that there was actually no copy (and I thought that, in fact, a copy was needed). So I did some checks and found these strange behaviors.

cdome [2018-03-17T23:13:28+01:00]

Just thinking out loud:

I can imagine how difficult it would be to fix this one:

var a = @[0, 1, 2, 3]

proc p(s: seq[int]) =
  echo s  # @[0, 1, 2, 3]
  a[^1] = 4
  echo s  # @[0, 1, 2, 4]

p(a)

Either global alias analysis would be required, or tons of unneeded copies everywhere, which would kill performance completely. And alias analysis is not perfect anyway: it answers "maybe" too often, resulting in unnecessary copies. It might be better to change the semantics of the language so that it always shares and a copy happens only on an explicit call to copy(), so people know what to expect.
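To illustrate what "copy only on an explicit call" could look like with the example above, here is a minimal sketch. The helper copySeq is hypothetical (it is not a standard library proc and stands in for the copy() mentioned above); it builds a fresh sequence element by element, so the result cannot share storage with its argument.

```nim
proc copySeq[T](s: seq[T]): seq[T] =
  ## Hypothetical helper (not part of the standard library):
  ## builds a new sequence element by element.
  result = newSeqOfCap[T](s.len)
  for x in s:
    result.add x

var a = @[0, 1, 2, 3]

proc p(s: seq[int]) =
  let snapshot = copySeq(s)  # explicit copy, independent of the global a
  a[^1] = 4                  # mutate the global while s is still in scope
  echo snapshot  # @[0, 1, 2, 3]

p(a)
```

The cost of the copy is then visible at the call site, which is the trade-off discussed here: nothing is copied unless the caller asks for it.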

mashingan [2018-03-17T23:26:33+01:00]

For the global variable example, I tried to modify s inside the proc and it does not compile.

Since s is considered immutable, it is shared. However, the global it refers to is mutable, so the change shows through s.

I think the only way to handle this is to keep a separation between mutable and immutable variables. That way we know for sure that immutable values can always be shared while mutable ones are always copied (by default).

lscrd [2018-03-18T09:58:41+01:00]

Yes, I think the last example is the most annoying one because, to solve it, we have to make a copy, which is exactly what we don't want to do for obvious performance reasons. I have tried changing the parameter type from sequence to openarray, with the same result. And with an array instead of a sequence, we get the same behavior too. So changing the semantics for sequences would not be enough: we would have to change the semantics for arrays too, and kill the whole copy semantics of the language. Not the right way to go, I think.

Maybe a clear distinction between mutable and immutable will indeed solve the issue. The difficulty is to find a balance between the increased complexity of the language and the performance.

Araq [2018-03-18T17:41:05+01:00]

let was designed to replicate parameter passing semantics so that a template can more easily do "evaluate once" for its parameters. Parameter passing also does not involve (deep) copies. Fixing/changing it will cause performance problems so it's not clear what to do.
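As a rough illustration of the "evaluate once" use case, here is a minimal sketch. The names twice, next and calls are made up for this example. The template binds its argument to a let so that the argument expression runs a single time, even though its value is used twice.

```nim
template twice(x: untyped): untyped =
  let tmp = x   # the argument expression is evaluated here, once
  tmp + tmp

var calls = 0

proc next(): int =
  ## Returns an increasing counter and records how often it was called.
  inc calls
  result = calls

echo twice(next())  # 2
echo calls          # 1
```

Without the let, writing x + x in the template body would expand to next() + next() and call the proc twice.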

lscrd [2018-03-20T19:54:55+01:00]

Yes, I understand. Indeed, it seemed to me to be an issue that is quite difficult to solve. But it is very unlikely to happen in real programs and, so far, nobody has encountered these problems. I built these examples when I suspected that something might be wrong in some cases; this is not actual code from a program I have written.

mratsim [2018-03-21T10:18:24+01:00]

You are not alone.

Though in my case I wanted to remove copies when it is safe to do so, and it seemed like a seq wrapped in an object always triggered copies.

Anyway, relevant issues:

Return by let/const values #6793

Distinguish let/var in assignment overload #6348

Blog posts:

Writetracking

Destructors

Destructors followup

mratsim [2018-03-21T12:17:55+01:00]

Edit: I forgot about the Borrowing RFC #7373

Mirror of forum.nim-lang.org

3663 :: Wrong copy of sequences?