In the absence of support for recursive iterators, how would you write an iterator returning all the nodes of a general tree?
GordonBGood gave an example of such an iterator for a binary tree and mratsim wrote a similar one for binary trees stored in arrays in Weave, but here I want to walk over all the nodes of the following tree type:
type
  Node = ref object
    value: string
    children: seq[Node]
Whenever an iterator uses data structures with side effects, or when it doesn't use tail-call recursion, first flattening all the results into a queue/stack and then dequeuing/unstacking them sequentially from the iterator is not possible. Also, this flattening phase only works for iterators over bounded values; you can't do it with an infinite data loop, or when the next value of the iterator depends on the state of previous values... So what are the ways to write recursive iterators without using recursion?
In general: All recursion can be converted to iteration. That is a foundation of the Church-Turing thesis. Also, iterators are just abstractions around loops. Simply put the yield where the body of the loop would go.
@GordonBGood's algorithm is just embedded recursion using closure functions. It still builds a stack (he mentions this in his post). I believe you can modify that code from a binary tree to the general case fairly easily.
Tree Traversal abstractly requires a way to jump back to the parent node. @mratsim does this by bit shifting, a trick he can use because of the array representation.
For the representation you are using, one way to achieve this is to keep the reference to the parent. Either by building a stack, or by modifying the node object to keep an extra "parent" ref field. This obviously has some space cost. You have essentially embedded the stack into the tree itself.
See here for an example in C: https://stackoverflow.com/a/8976310
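To make the explicit-stack option concrete, here is a sketch in Python (the thread's target language is Nim, but the shape is the same; the Node and walk names are made up to mirror the Nim type above):

```python
class Node:
    """Mirrors the Nim type: a value plus any number of children."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def walk(root):
    # Pre-order traversal with an explicit stack instead of recursion.
    stack = [root]
    while stack:
        node = stack.pop()
        yield node.value
        # Push children in reverse so they come off the stack left-to-right.
        stack.extend(reversed(node.children))

tree = Node("a", [Node("b", [Node("d")]), Node("c")])
print(list(walk(tree)))  # → ['a', 'b', 'd', 'c']
```

The stack here plays exactly the role of the call stack the recursive version would use; keeping a parent reference in each node is the same idea with the stack spread over the tree itself.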
Another option is something called "Morris Traversal": https://www.geeksforgeeks.org/inorder-tree-traversal-without-recursion-and-without-stack/
Note that this solution requires being able to modify the tree.
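For reference, a minimal sketch of Morris in-order traversal for binary trees (Python for brevity; BNode is a made-up name). It temporarily rewires right links as "threads" back to an ancestor, which is why the tree must be modifiable, and restores them before finishing:

```python
class BNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def morris_inorder(root):
    node = root
    while node is not None:
        if node.left is None:
            yield node.value
            node = node.right
        else:
            # Find the in-order predecessor in the left subtree.
            pred = node.left
            while pred.right is not None and pred.right is not node:
                pred = pred.right
            if pred.right is None:
                pred.right = node      # plant a thread back to us
                node = node.left
            else:
                pred.right = None      # remove the thread; tree restored
                yield node.value
                node = node.right

t = BNode(2, BNode(1), BNode(3))
print(list(morris_inorder(t)))  # → [1, 2, 3]
```

No stack, no recursion, O(1) extra space, at the price of mutating the tree during the walk.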
I've written an iterator over an XMLNode:
https://hg.sr.ht/~sschwarzer/vppdiff/browse/default/src/vppdiff/cleanxmi.nim#L46
Internally the iterator uses a stack.
I suspect the implementation is more complicated than it has to be, but this code is what I've been able to come up with. ;-)
In the interests of a more self-contained thread, and to perhaps more easily see what it's doing, here is a briefer version of @sschwarzer's algorithm:
import xmltree

type Handle* = tuple[kidIx: int; par, kid: XmlNode]  #Ix of kid in parent

iterator handlesHiToLo*(node: XmlNode): Handle =
  ## Recursively yield ``Handle`` objects for direct&indirect kids of ``node``
  ## from high to low index order (on same level). { This order simplifies
  ## removing kids in-place without affecting later yielded kid indices. }
  ## The root node is included in yielded ``Handle``s with a ``nil`` parent.
  template recordYield(par: XmlNode, kidIx: int, kid: XmlNode = nil) =
    handle = (kidIx, par, (if kid.isNil: par[kidIx] else: kid))
    yield handle
  proc high(node: XmlNode): int = node.len - 1
  var done = false                  #flag to stop outer loop from inner
  var stack = newSeq[Handle]()
  var handle: Handle                #Last node yielded
  recordYield nil.XmlNode, 0, node  #Always yield the root first
  if node.len == 0: done = true     #Nothing left to do
  else: recordYield node, node.high #Start from high node of root & proceed
  while not done:
    let kidHasKids =
      if handle.par.len - 1 < handle.kidIx: false #kid@kidIx was deld post yield
      else: handle.kid.len > 0
    if kidHasKids:                  #Descend tree, build stack, yield last grandkid
      stack.add handle
      recordYield handle.kid, handle.kid.high
      continue
    while true:                     #Yield next (lower index) kid | maybe pop, find new
      if handle.kidIx >= 1:
        recordYield handle.par, handle.kidIx - 1
        break
      if stack.len > 0:
        handle = stack.pop()
      else:                         #No more nodes to try on the stack. Done.
        done = true
        break
If you want some other visitation order such as low-to-high then you have to adjust things a bit. For in-order/work in the middle you might need to adjust it a lot.
I actually kind of doubt it can be simplified if you want to preserve the deleteability of nodes pointed to by yielded handles.
In these sorts of situations, recursive code is often drastically more clear. A good programming language should facilitate clear code. Therefore Nim iterators should grow recursion capability someday. But I'm not signing up to implement it, and I wouldn't assert it's a high priority.
It is always possible to move a recursive algorithm to a non-recursive one, even avoiding pseudo-stacks or tail-recursion games.
It is, ironically, something I enjoy doing, because it is quite the intellectual puzzle sometimes. The toughest one I did was Negamax (Minimax with restrictions) in a Python library. (Sorry, I've not moved it to Nim yet.)
For context: writing a normal negamax algorithm takes perhaps a few hours in any language. But writing the non-recursive one involved multiple convoluted arrays and more complex edge cases than I've ever encountered before. It took over a week of intense work.
So, if you are convinced you want to go through that ordeal: I recommend starting by solving the problem with a large tree by hand on paper. Make notes of your own thinking. The human mind does not have enough "temporary stack space" for recursive solutions. Any notes you need to scribble down are likely array variable contents.
Anything your head can do can be described in an algorithm with enough scrutiny.
(BTW, the non-recursive version of negamax is about 3 to 8 times faster. But it uses lots more memory.)
Does this test
if handle.par.len - 1 < handle.kidIx: false #kid@kidIx was deld post yield
mean that deletion is only supported for the last kid (or all kids)? E.g., if I don't delete kid 5, but then delete kid 4, isn't len still 4, so the test fails and we process kid 4's kids?

That sounds like a bug/limitation in @sschwarzer's algorithm to me, but maybe he can verify. Good catch, @e.
Another pretty different approach that I use in cligen/tern.nim (that does not support in-iteration deletion) is something like this:
const NUL* = '\0'

type
  NodeOb*[T] {.acyclic.} = object
    ch*: char
    cnt*: int
    when T isnot void: val*: T
    kid*: array[3, ref NodeOb[T]]   #0,1,2 ~ <,=,>
  Node*[T] = ref NodeOb[T]
  Tern*[T] = object   ## A Tern can be used as either a mapping from strings
    root*: Node[T]    ## to type ``T`` or as a set(strings) if ``T`` is void.
    count: int        ## Number of elements
    depth: int        ## Depth of Tree

iterator leaves[T](r: Node[T], depth: int, pfx = ""): tuple[k: string, n: Node[T]] =
  type                                              #Nim iterators should grow
    Which = enum st0, st1, st2, stR                 #..recursion capability so
    State = tuple[st: Which, k: string, n: Node[T]] #..this headache can be as
  if r != nil:                                      #..easy as `print`.
    var stack = newSeqOfCap[State](depth)
    stack.add (st: st0, k: pfx, n: r)
    while stack.len > 0:
      let state = stack[^1].addr
      if state.n == nil: break
      case state.st
      of st0:
        state.st = st1
        if state.n.kid[0] != nil:
          stack.add (st: st0, k: state.k, n: state.n.kid[0])
      of st1:
        state.st = st2
        if state.n.ch == NUL:
          yield (k: state.k, n: state.n)
        elif state.n.kid[1] != nil:
          stack.add (st: st0, k: state.k & $state.n.ch, n: state.n.kid[1])
      of st2:
        state.st = stR
        if state.n.kid[2] != nil:
          stack.add (st: st0, k: state.k, n: state.n.kid[2])
      of stR: discard stack.pop
The above is probably pretty close to what compilers generate for recursive function calls. Of course, as mentioned I think things like this are more elegant/easy to understand:
proc print*[T](p: Node[T], depth = 0) = #2,1,0 gives std emoji-orientation
  if p == nil: return                   #..i.e. ":-)" head-tilt not "(-:".
  print(p.kid[2], depth + 1)
  echo strutils.repeat(" ", depth), cast[int](p), " ch: ", p.ch.repr, " cnt: ", p.cnt
  print(p.kid[1], depth + 1)
  print(p.kid[0], depth + 1)
For cligen/trie.nim since I was already using a super wasteful uncompressed trie, I just punted on all that and spent memory proportional to the answer, i.e.:
proc collect*[T](n: Node[T], key: var string, d = 0, pfx = "", i = 0):
    seq[tuple[k: string, n: Node[T]]] =
  if n == nil:
    return
  if n.term:
    result.add (k: (if i > 0: pfx[0..<i] & key else: key), n: n)
  for h, ch in n.kidc:
    key.setLen d + 1
    key[d] = ch
    result.add collect(n.kidp[h], key, d + 1, pfx, i)

proc leaves[T](r: Node[T], depth = 99, pfx = "", i = 0, d = 0):
    seq[tuple[k: string, n: Node[T]]] =
  var key = ""                        #Would be nicer to do iterator w/state bounded
  result = collect(r, key, d, pfx, i) #..by depth, but seemed tricky & unimportant.
It sounds like @JohnAD might find the trie variant of leaves an interesting challenge to do The Right Way. PRs welcome. ;-)

@e Yes, it seems that's a bug that wasn't revealed by my existing unit tests. Thank you!
I'll look what I can do. :-)
I've been staring at the problem of converting a recursive Python iterator into an iterative Nim iterator for days without any conclusive result. For those interested, I'm trying to port that code.
Let's start with a naive question. An iterator can be written by replacing the side effect of an iterative proc with a yield statement. For instance, the countup procedure is transformed into the countup iterator:
proc countup(a, b: int) =
  var res = a
  while res <= b:
    echo res   # <== side effect: display result
    inc(res)
becomes the iterator
iterator countup(a, b: int): int =
  var res = a
  while res <= b:
    yield res  # <== changed to yield result
    inc(res)
If I write the recursive procedure dual to the Python recursive iterator, then convert this recursive procedure into an iterative one, and finally transform that into the desired Nim iterative iterator, will it be as simple as replacing the side effects with yield calls, as was done with countup?
The step process:
Python recursive iterator
|
v
Python recursive proc
|
v
Python iterative proc
|
v
Nim iterative proc
|
v
Nim iterative iterator
In other words, instead of trying to make one big jump from a Python recursive iterator to a Nim iterative iterator, I want to take small steps through a dual space and then convert the result back into the original space. Will the trip be as simple as I describe it?
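To make those small steps concrete, here is a toy round trip over the Node type from the top of the thread, sketched in Python (all names here are made up for illustration):

```python
class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

# Step 1: the recursive iterator we start from.
def values_rec(node):
    yield node.value
    for child in node.children:
        yield from values_rec(child)

# Steps 2-3: the dual recursive proc (side effect: append) made iterative
# with an explicit stack.
def values_proc(node):
    out = []
    stack = [node]
    while stack:
        n = stack.pop()
        out.append(n.value)              # <== the side effect ...
        stack.extend(reversed(n.children))
    return out

# Steps 4-5: turning values_proc into an iterator is then mechanical:
# replace `out.append(x)` with `yield x` and drop `out`, exactly as
# with countup above.

tree = Node("a", [Node("b", [Node("d")]), Node("c")])
assert values_proc(tree) == list(values_rec(tree))  # same order both ways
```

So at least for this traversal the answer seems to be yes: once the iterative proc exists, the proc-to-iterator step really is just "side effect becomes yield".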
@araq's proposal with inversion of control is interesting because it splits the tree-walking part, which can be written simply as a recursive proc, from the rest of the process (capturing the results).
proc walkTree(node: Node; doSomething: proc (n: Node)) =
  doSomething(node)
  for child in node.children:
    # changeData
    walkTree(child, doSomething)
    # restoreData

walkTree(root, proc (n: Node) = echo n.value)
The only problem is that you can't interrupt the tree walk or do only a partial enumeration. But from an algorithmic point of view, I think that's the simplest to implement.
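One common way to get interruption and partial enumeration back in the callback style is to let the callback return a continue/stop flag and propagate it up the recursion; a sketch in Python (walk_tree and the bool protocol are my invention, not part of the quoted proposal):

```python
class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def walk_tree(node, visit):
    # Stops the whole walk as soon as `visit` returns False.
    if not visit(node):
        return False
    for child in node.children:
        if not walk_tree(child, visit):
            return False
    return True

seen = []
def visit(n):
    seen.append(n.value)
    return n.value != "b"    # stop once we have found "b"

tree = Node("a", [Node("b", [Node("d")]), Node("c")])
walk_tree(tree, visit)
print(seen)  # → ['a', 'b']  ("d" and "c" are never visited)
```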
I can't split this pattern into two separate phases:
1. walk the tree and put all results in a queue,
2. consume the queue with an iterator,
because phase #1 is not deterministic (in execution time) and I need to use results as soon as they are available.
If Nim has setjmp/longjmp features (I've just found them in the segfaults library!), I could change that pattern to write two coroutines, a consumer and a producer. The producer would hand control to the consumer as soon as a result is available, and the consumer would have the responsibility to resume the producer when it needs another result.
This setjmp/longjmp scenario can also be mimicked with two processes/threads and a pipe.
But these scenarios are more low level or system hacks, with potential memory problems, and I would rather find an algorithmic solution.
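The two-threads-and-a-pipe variant can be sketched in Python with a queue standing in for the pipe (produce/consume are made-up names; the producer stands in for the non-deterministic tree walk):

```python
import queue
import threading

def produce(results):
    # Stand-in for the tree walk: hand over results as soon as available.
    for x in (1, 2, 3):
        results.put(x)
    results.put(None)                  # sentinel: no more results

def consume():
    results = queue.Queue()
    threading.Thread(target=produce, args=(results,), daemon=True).start()
    out = []
    while True:
        item = results.get()           # blocks until the next result
        if item is None:
            break
        out.append(item)               # consumer uses it immediately
    return out

print(consume())  # → [1, 2, 3]
```

As noted, this buys responsiveness at the cost of threads and synchronization, which is exactly the "system hack" flavor the post would rather avoid.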
Is there a standard way to resume an iterator in Nim? For instance, if I save a closure iterator in a stack, after having started looping on it, is there a way to resume the iteration?
Well, this really does not address @spip's question at all, but it might be something @sschwarzer could use as an example for a multi-way/k-ary tree iterator. The iteration itself (not via an iterator, but via a Path) also should support deletion/insertion/other tree surgery. Maybe this time @e won't find any bugs. Famous last words. ;-)
type                                 #k-ary search tree (like a B-tree)
  NodeOb {.acyclic.} = object
    data*: seq[int]                  #.data.len > 0
    kids*: seq[ref NodeOb]           #.kids.len == .data.len + 1
  Node* = ref NodeOb                 #.. Also, .kids.len==0 => LEAF
  Path* = seq[tuple[p: Node, j: int]]

proc newNode*(data: openArray[int]): Node =
  result = Node.new                  #Convenience Node builder
  result.data.setLen(data.len)
  for i, x in data: result.data[i] = x

proc seek_most*(path: var Path, t: Node, lower = true) =
  if t == nil: return                #Extend path from t to least|..
  var t = t                          #..|greatest-most kid per lower
  while t.kids.len > 0:
    path.add (t, if lower: 0 else: t.kids.len - 1)
    t = t.kids[path[^1].j]
  path.add (t, if lower: 0 else: t.data.len - 1)

proc seek_most*(t: Node, lower = true): Path =
  result.seek_most(t, lower)         #Convenience Path builder

proc seek_succ*(path: var Path) =    # path -> successor
  if path.len == 0: return           #Just for safety
  if path[^1].p.kids.len > 0:
    path[^1].j.inc
    path.seek_most(path[^1].p.kids[path[^1].j], true)
    return
  while path[^1].j == path[^1].p.data.len - 1:
    path.setLen(path.len - 1)        #discard path.pop
    if path.len == 0:                #path @beg; No more succs
      return
    path[^1].j.dec
  path[^1].j.inc

proc seek_pred*(path: var Path) =    # path -> predecessor
  if path.len == 0: return
  if path[^1].p.kids.len > 0:
    path.seek_most(path[^1].p.kids[path[^1].j], false)
    return
  while path[^1].j == 0:
    path.setLen(path.len - 1)        #discard path.pop
    if path.len == 0:                #path @beg; No more preds
      return
  path[^1].j.dec

iterator forward*(root: Node): tuple[node: Node; j, depth, val: int] =
  var path = seek_most(root, true)
  while path.len > 0:
    yield (path[^1].p, path[^1].j, path.len, path[^1].p.data[path[^1].j])
    path.seek_succ

iterator reverse*(root: Node): tuple[node: Node; j, depth, val: int] =
  var path = seek_most(root, false)
  while path.len > 0:
    yield (path[^1].p, path[^1].j, path.len, path[^1].p.data[path[^1].j])
    path.seek_pred

import strutils
proc `$`(n: Node): string = cast[uint](n).toHex
let spaces = repeat(' ', 80)

proc print*(n: Node, depth = 0) =
  if n == nil: return
  for i in 0 ..< n.data.len:
    if i < n.kids.len:
      n.kids[i].print depth + 1
    echo spaces[0..<2*depth], $n, "(", i, "): ", n.data[i]
  if n.kids.len > 0:
    n.kids[^1].print depth + 1

var root = newNode([100, 200, 300])    # These values
root.kids.add newNode([20, 40, 60])    #..make this a
root.kids.add newNode([120, 140, 160]) #..full B-tree
root.kids.add newNode([220, 240, 260])
root.kids.add newNode([320, 340, 360])
root.print
echo ""
for (p, j, depth, val) in root.reverse:
  echo "R: ", spaces[0..<2*depth], $p, "(", j, "): ", val
echo ""
for (p, j, depth, val) in root.forward:
  echo "F: ", spaces[0..<2*depth], $p, "(", j, "): ", val
If Nim has setjmp/longjmp features (I've just found them in the segfaults library!), I could change that pattern to write two coroutines, a consumer and a producer.
There's coro module in case you didn't realize it.
Is there a standard way to resume an iterator in Nim? For instance, if I save a closure iterator in a stack, after having started looping on it, is there a way to resume the iteration?
Perhaps I misunderstood, but do you mean iterator invocation? You can check whether the iterator has finished with the finished proc, and simply calling it like myiterator() will advance the iterator. You don't have to put an iterator in a for-loop just to iterate it.
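For what it's worth, the same save-and-resume pattern written with a Python generator, since that's where the original code comes from (Nim closure iterators plus finished behave analogously, as described above):

```python
def countup(a, b):
    while a <= b:
        yield a
        a += 1

it = countup(1, 3)
first = next(it)          # start the iteration manually: 1
saved = [it]              # e.g. park the half-done iterator on a stack ...
rest = list(saved.pop())  # ... and later resume right where it left off
print(first, rest)        # → 1 [2, 3]
```

The key point is that the iterator object itself carries its state, so storing it in a stack and resuming later "just works".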
@cblake Thanks for your encouragement, it did motivate me. :-) Unfortunately, I only found the time this week.
So finally, here's a new take on the recursive tree iterator: https://hg.sr.ht/~sschwarzer/lazytree (clone with
hg clone https://hg.sr.ht/~sschwarzer/lazytree
if you want to play with the code.)
The iterator code is in https://hg.sr.ht/~sschwarzer/lazytree/browse/src/lazytree.nim and tests in https://hg.sr.ht/~sschwarzer/lazytree/browse/tests/test_lazytree.nim .
Using the iterator:
let rootNode = parseXml("<root>...</root>")
for nodeInfo in lazyTree(rootNode):
  # Deleting modifies the tree under `rootNode` and also has the effect
  # that the deleted node's child nodes won't be yielded. Of course
  # deleting nodes is optional. In most cases you want to keep the node.
  if nodeInfo.node.tag == "toBeDeleted":
    nodeInfo.parentNode.delete(nodeInfo.childIndex)
  # Do something else with `nodeInfo.node`, which is the actual node.
  # (`parentNode` and `childIndex` are mostly needed to be able to delete
  # a node from the tree.)
  ...
Some remarks/thoughts:
Do you see any edge cases where the algorithm would fail? If you think that a code comment doesn't fit the code, this could be an indication. ;-)
Other feedback is welcome, too!
I don't see anything super obviously wrong, but I only just read through it. Your test suite looks like it could maybe use a mixture of inserts & deletes in either order while iterating, but @e / @spip / @GordonBGood / @rayman22201 should probably take a look.
It may not be so appropriate to your "tree structure surgery while iterating" use case, but I happened to try Yet Another Idea just the other day over at https://forum.nim-lang.org/t/5506 , which bypasses the Nim iterator syntax framework in favor of a Nim template that accepts multiple body clauses. So, for path in itr(params): perElement becomes:
forPath(params): cannotRecurseWork
do: perElementWork
do: preRecurseWork
do: postRecurseWork
Internally, that template forPath is just a regular recursive function that exposes these "hooks" for the clauses. The end of that other thread has a more complete example of using that forPath defined in https://github.com/c-blake/cligen/ in the cligen/dents.nim submodule. Something like XML would not need to worry about file system permission issues and that first clause.
It is not very easy to use, nor very safe, but it is yet another way to package up walking trees without recursive iterators that "looks similar" to Nim iterators, which a reader of this thread may care to be aware of.
Many thanks for your feedback!
Your test suite looks like it could maybe use a mixture of inserts & deletes in either order while iterating
The inserts are "missing" because they were not relevant for my use case and I hadn't thought a lot about them. As far as I did think about insertions, I wasn't sure how well they would work or behave as you'd intuitively expect. Since the algorithm "detects" deletions via a change of the child nodes in the parent node, it won't notice when you delete a node and insert another instead. So I didn't spend much more time on that because I thought it would be too brittle anyway.
Now that I think more about it, with the current implementation, I'd expect this behavior:
My problem right now is that, although I think the behavior is reasonable, I don't know if a user would also find it reasonable. Also, obviously not everything is possible. For example, if the iterator is already yielding child nodes of a node and you delete the parent node (because you held on to the yielded value of an earlier iteration), this won't stop the iteration over the child nodes. I guess documenting the behavior of the iterator might be more difficult than the implementation.
in either order while iterating
At the moment only iterating forward is supported. Or did you mean something else, like inserting a node before or after the just-yielded node?
It may not be so appropriate to your "tree structure surgery while iterating" use case, but I happened to do Yet Another Idea just the other day over at https://forum.nim-lang.org/t/5506 which bypasses the Nim iterator syntax framework for a Nim template that accepts multiple body clauses.
At first I wasn't sure whether I understood it, but after a more thorough look at the example in the other thread, I think I understand what you're doing there. :-) On the one hand it's indeed interesting, on the other hand it feels kind of "heavy". I don't know how to describe it better. Maybe it would help if you could replace the second and third do with pre_recurse and post_recurse.
Apart from iterators, I also like your callback idea from a message above. This should be both flexible and relatively easy to implement with the tools we already have (like iterating over sibling nodes with node[childIndex]). That said, iterators feel more lightweight and intuitive (as long as they're applicable to the problem).
Actually, I experimented with the callback idea yesterday to fix my specific problem in the vppdiff tool, although I didn't use a callback but defined a recursive proc with the XML clean-up code. While doing that, I noticed that node.items() wasn't a good fit for this because it takes the len of the parent node before it even starts iterating, and when you delete child nodes during the iteration, that leads to IndexErrors. So while items is an iterator and you'd intuitively think it's lazy, it's only "partially lazy" in that it doesn't consider changes in the number of children.
Now, you usually should follow the advice not to modify a structure while iterating over it, but I still wonder if XmlNode.items should be changed to use a while loop instead of fixing the upper boundary for the index at the start of the iteration.
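The "partially lazy" trap is easy to reproduce outside of xmltree; a Python sketch of the two styles (items_snapshot mimics an iterator that fixes its upper bound up front, items_lazy the proposed while-loop version; both names are made up):

```python
def items_snapshot(children):
    # Upper bound fixed once, before the iteration starts.
    for i in range(len(children)):
        yield children[i]          # blows up if `children` shrank meanwhile

def items_lazy(children):
    # Re-check the length on every step instead.
    i = 0
    while i < len(children):
        yield children[i]
        i += 1

kids = ["a", "b", "c"]
try:
    for _ in items_snapshot(kids):
        kids.pop()                 # delete a child during the iteration
except IndexError:
    print("snapshot version raised IndexError")

kids = ["a", "b", "c"]
for _ in items_lazy(kids):
    kids.pop()                     # same surgery, no error this time
```

The lazy version simply stops early when the structure shrinks, which seems like the behavior you'd intuitively expect from an iterator.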
Yeah. By "either order" I just meant inserting after deleting and vice versa, as you figured out.
I like your idea of "do" being spelled differently for the various sections! (If that can be done easily.)
No doubt modifying structures while iterating can be hairy, and it's tricky to provide good interfaces for it to client code.
Now that I think more about it, with the current implementation, I'd expect this behavior:
- If you delete a node and insert another node instead, the iterator should go over the child nodes of the new node. I think that's actually reasonable.
- If you insert a node after the one just yielded, the iterator should still yield the child nodes of the last yielded node and then yield the newly inserted node and its children. Also not too bad.
So yes, I'll experiment with insertion and add tests if the insertions work as expected. :-)
This turned out to be more complicated than I thought. I should have looked at the code before writing this. ;-) I'm going to open a new thread on the further development of the iterator.