Hello,
I have a conceptual/style question here.
In C++ (and presumably some other languages), there is a concept called "iterator invalidation". The basic idea is that while iterating a structure, changes to the structure could break iteration, leading to a loss of memory safety. It is also undefined behavior to use an "old" iterator, or one that was created before the structure was updated.
A while ago I switched my former C++ projects to Rust and found that the language's borrowing rules protect against this error:
// In stdlib, oversimplified, generics removed for simplicity:
mod vec {
struct Vec {
}
struct Iter<'a> {
vector: &'a Vec,
//...
}
impl Iterator for Iter {
//...
}
impl Vec {
pub fn iter<'a>(&'a self) -> Iter<'a> {
Iter {
vector: self,
}
}
}
}
// Test
let mut v = vec![1, 2, 3];
for i in &v {
if *i == 2 {
v.clear(); // Yay, compile time error, no unsafety.
}
}
So, in Nim, it is possible to modify a structure you are iterating. For some simple structures, though, this works fine. For example, the following does not crash.
var s = @[1, 2, 3]
for i in s:
echo s
if i == 2:
s.setLen(0) # iteration stops, no unsafety.
However, this seems fairly delicate - the iterator must be implemented in a way that expects the seq to be modified. Perhaps incidentally, this is true.
So my question is:
Should I/can I ever count on an iterator in Nim to handle the structure being changed during iteration?
Not unless the implementation promises to handle it, I suppose. There's nothing implicit in an iterator that stops it from being affected by changes in the structure. e.g.
iterator foo[T](s: seq[T]): T =
var i = 0
while i < s.len:
yield s[i]
inc i
var s = @[1, 2, 3, 4, 5]
for t in foo(s):
echo t
s.add(123)
Is there any way to leverage Nim's type system or metaprogramming capabilities to create an iterator that forbids modifying the structure?
Maybe you could use the effect system somehow. I'm by no means a Nim expert, though, so I don't know it's possible.
Here's a simple proof of concept:
import macros
type Mut = object
proc messUpMySeq(s: var seq[int]) {.tags: [Mut].} =
s.add(123)
iterator foo[T](s: seq[T]): T =
var i = 0
while i < s.len:
yield s[i]
inc i
macro safeFor(binding, it, body): untyped =
quote do:
(proc() {.tags: [].} =
for `binding` in `it`:
`body`)()
var s = @[1, 2, 3, 4, 5]
safeFor t, foo(s):
messUpMySeq(s)
echo t
Unfortunately, this requires a custom for.
[EDIT]
Actually, that's probably nonsense. It will prevent you from calling messUpMySeq on any seq, not just the one being iterated over. Since it uses types for tags, maybe in a dependent type system this could work.
[EDIT2]
Thinking about it some more, my guess is that it would be impossible to catch such mistakes statically in the presence of unrestricted aliasing, because you could create additional references to a data structure based on arbitrary runtime conditions. I guess this is why Nim copies seqs on assignment. You could set some flag in the data structure dynamically from the iterator and check for it in mutating procs, though.
Cool. Thanks for putting some thought into this :)
I suppose that unrestricted referencing is one of Nim's design decisions in the language as a whole as well. It is perfectly valid to hold multiple mutable references to one object. Data can change under your feet at any time with this setup, so this is more like Java, whereas I was thinking more with the Rust mindset...
So probably I shouldn't really try to do anything unless I have a structure that could actually cost memory safety when abused, in which case the runtime-checked instance state "lock" probably makes the most sense.
@mratsim: Could you elaborate on the third option, please? The following doesn't compile:
iterator foo[T](s: seq{call,`let`}[T]): T =
var i = 0
while i < s.len:
yield s[i]
inc i
Whatever the correct syntax is, wouldn't this restrict potential arguments to be immutable even beyond the scope of the for? It also wouldn't work for custom data structures unless they are deep-copied on assignment.Yes, the arguments must be immutable even after the for in that case.
Seems like the syntax is this:
iterator foo[T](s: seq[T]{call,`let`}): T =
var i = 0
while i < s.len:
yield s[i]
inc i