I would like to discuss a problem I have encountered. I have also reported it on the bug tracker, with a different version of the program that uses newSeqOfCap.
Here is a simple program:
proc p() =
  var a = @[0, 1, 2]
  let b = a
  a.add(3)
  echo a # @[0, 1, 2, 3]
  echo b # @[0, 1, 2]
p()
The result is logical: a and b are different sequences and modifying a doesn't change b.
Now, a somewhat different program.
proc p() =
  var a = @[0, 1, 2, 3]
  discard a.pop
  let b = a
  a.add(5)
  echo a # @[0, 1, 2, 5]
  echo b # @[0, 1, 2, 5]
p()
It seems that a and b are now sharing some memory. Looking at the generated C code, it appears that in the first case, adding an element triggers a reallocation. This is not the case in the second program, as there is enough room to receive the new value, but it is not clear to me why the length of b is modified.
But, what if we don't change the length at all?
proc p() =
  var a = @[0, 1, 2, 3]
  let b = a
  a[^1] = 4
  echo a # @[0, 1, 2, 4]
  echo b # @[0, 1, 2, 4]
p()
This seems clearly wrong to me. Now, let us replace the sequence with an array.
proc p() =
  var a = [0, 1, 2, 3]
  let b = a
  a[^1] = 4
  echo a # [0, 1, 2, 4]
  echo b # [0, 1, 2, 3]
p()
And looking at the C code, there is clearly a copy, which was expected.
We can also get some odd behavior with parameters.
var a = @[0, 1, 2, 3]
proc p(s: seq[int]) =
  echo s # @[0, 1, 2, 3]
  a[^1] = 4
  echo s # @[0, 1, 2, 4]
p(a)
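One possible way to sidestep this, as a minimal sketch: take an explicit local copy at the top of the proc, assuming that assigning a sequence to a var forces a real copy. The names pCopy, local and snapshot are only illustrative, and the copy obviously costs time and memory.

```nim
var a = @[0, 1, 2, 3]

proc pCopy(s: seq[int]): seq[int] =
  # Assumption: assigning to a `var` copies the sequence, so `local`
  # no longer depends on whatever happens to the global `a`.
  var local = s
  a[a.high] = 4    # mutate the global while we still hold the parameter
  result = local   # the snapshot taken on entry

let snapshot = pCopy(a)
echo snapshot # @[0, 1, 2, 3]
echo a        # @[0, 1, 2, 4]
```

The proc now sees a stable value, at the price of one copy per call.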
I think this problem is not likely to happen frequently, but it may cause some trouble. What do you think of it? And how could this be solved?
I ran your code using 0.17.2. It gave a wrong seq in example 1; other than that, the results are the same as yours.
Is that because the compiler infers that the variables a and b are not used anywhere else, so it is safe to share them? Just a guess, though.
When I change b into a var and modify it, it does copy the seq:
proc p =
  var a = @[0, 1, 2, 3]
  discard a.pop
  #let b = a
  var b = a
  a.add 5
  b.add 10
  echo a
  echo b
p()
I use 0.18.0, so the results may differ, of course. When running in the browser, I get the same results as with version 0.18.0.
Maybe the compiler does some optimization but it cannot consider that a and b are not used in another place: they are used by echo.
Assigning with var works, of course, so it is clearly an optimization applied when assigning to a read-only object. I suppose this has been done for performance reasons.
I have tried with version 0.17.2. Indeed, I get a strange result in the first case, i.e.
@[0, 1, 2, 3]
@[54014246935360, 1]
So it seems that some bug has been fixed in version 0.18.0. For the other tests, the results are indeed the same.
Interestingly, this seems to work correctly:
proc p() =
  var a = @[0, 1, 2, 3]
  discard a.pop
  var b = a # note the change from `let` to `var`
  a.add(5)
  echo a # @[0, 1, 2, 5]
  echo b # @[0, 1, 2]
p()
Looks like an optimization gone wrong. Perhaps the intent was for the sharing to happen between two lets.
[EDIT]
Looks like Iscrd beat me to it.
Just thinking out loud:
I imagine how difficult it would be to fix this one:
var a = @[0, 1, 2, 3]
proc p(s: seq[int]) =
  echo s # @[0, 1, 2, 3]
  a[^1] = 4
  echo s # @[0, 1, 2, 4]
p(a)
Global alias analysis would be required, or tons of unneeded copies everywhere that would kill performance completely. And alias analysis is not perfect anyway: it answers "maybe" too often, resulting in unnecessary copies. It might be better to change the semantics of the language so that it always shares and a copy happens only on an explicit call to copy(), so people know what to expect.
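To make that last idea a bit more concrete, here is a rough sketch of what "share by default, copy only on request" could look like with a ref wrapper. SharedSeq, newShared and copy are hypothetical names invented for this illustration, not existing library API.

```nim
type
  SharedSeq[T] = ref object   # hypothetical wrapper: handles share one object
    data: seq[T]

proc newShared[T](items: seq[T]): SharedSeq[T] =
  SharedSeq[T](data: items)

proc copy[T](s: SharedSeq[T]): SharedSeq[T] =
  # The only place where the underlying storage is duplicated.
  SharedSeq[T](data: s.data)

let a = newShared(@[0, 1, 2, 3])
let b = a          # shares: both names refer to the same object
let c = a.copy()   # explicit, visible copy
a.data[a.data.high] = 4
echo b.data # @[0, 1, 2, 4]
echo c.data # @[0, 1, 2, 3]
```

With this convention the sharing between a and b is expected rather than surprising, and every copy is visible at the call site.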
For the global variable example, I tried to modify s and it does not compile.
Since s is considered immutable, it is shared. However, its contents can still be mutated through a, hence this behavior.
I think the only way to do it is to keep a separation between mutable and immutable variables. That way, we know for sure that immutable ones can always be shared, while mutable ones are always copied (by default).
Yes, I think the last example is the most annoying one because, to solve it, we have to make a copy, which is just what we don't want to do for obvious performance reasons. I have tried changing the parameter type from sequence to openarray, with the same result. And with an array instead of a sequence, we get the same behavior too. So changing the semantics for sequences would not be enough; we would have to change the semantics for arrays too, and kill the whole copy semantics of the language. Not the right way to go, I think.
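For reference, the openarray variant can be reproduced with something like the following sketch. The proc name lastBeforeAndAfter is made up for the example; it returns the last element seen through the parameter before and after the global is modified.

```nim
var a = @[0, 1, 2, 3]

proc lastBeforeAndAfter(s: openArray[int]): (int, int) =
  # An openArray parameter is a view onto the caller's storage, not a copy.
  let before = s[s.high]
  a[a.high] = 4                  # write through the global, not through `s`
  result = (before, s[s.high])   # second read goes through the same storage

let (before, after) = lastBeforeAndAfter(a)
echo before, " ", after # 3 4
```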
Maybe a clear distinction between mutable and immutable will indeed solve the issue. The difficulty is to find a balance between the increased complexity of the language and the performance.
You are not alone.
Though in my case I wanted to remove copies when safe and it seemed like seq wrapped in objects always triggered copies.
Anyway, relevant issues:
Blog posts: