My persistent bounded balance tree library is working well. Now I want to optimize.
The tree type is
type
BBTree*[K,V] = ref object of RootObj # BBTree is a generic type with keys and values of types K, V
## `BBTree` is an opaque immutable type.
key: K # the search key; must support the generic ``cmp`` proc
val: V # the data value associated with the key, and stored in a node
left: BBTree[K,V] # left subtree; may be nil
right: BBTree[K,V] # right subtree; may be nil
size: int # the size of the (sub-)tree rooted in this node
Since roughly half of all tree nodes are leaves with no need of left, right, or size fields, the first step is to introduce specialized leaf and internal node objects. I considered object variants to do this, but after digging into it determined that they are essentially C unions, taking the maximum size of any variant. (Document maintainers please note that clarifying this in the manual would be nice.) So, next try is object inheritance:
type
BBTree*[K,V] = ref object of RootObj # BBTree is a generic type with keys and values of types K, V
## `BBTree` is an opaque immutable type.
BBLeaf[K,V] = ref object of BBTree
key: K # the search key; must support the generic ``cmp`` proc
val: V # the data value associated with the key, and stored in a node
BBNode2[K,V] = ref object of BBLeaf
left: BBTree[K,V] # left subtree; may be nil
right: BBTree[K,V] # right subtree; may be nil
size: int # the size of the (sub-)tree rooted in this node
func size[K,V](n: BBLeaf[K,V]): int = return 1
func left[K,V](n: BBLeaf[K,V]): BBTree[K,V] = return nil
func right[K,V](n: BBLeaf[K,V]): BBTree[K,V] = return nil
Unfortunately, this leads to... Error: cannot instantiate: 'BBTree'
Using this test code:
iterator inorder*[K,V](root: BBTree[K,V]): (K,V) =
## Inorder traversal of the tree at `root`, i.e., from smallest to largest key
# Since recursive iterators are not yet implemented, this uses an explicit stack
var stack: seq[BBTree[K,V]] = @[]
var curr = root
while (not curr.isNil) or stack.len > 0:
while (not curr.isNil):
add(stack, curr) # push node before going left
curr = curr.left
# at the leftmost node; curr is nil
curr = stack.pop()
yield (curr.key, curr.val)
curr = curr.right # now go right
# test
func isOrdered[K,V](root: BBTree[K,V], min: K): bool =
var last = min
for k,v in inorder(root):
discard v
if last == min:
last = k
elif cmp(last, k) < 0:
discard # ok
else:
return false
return true
var root : BBTree[int,int] = nil
assert(isOrdered(root, low(int)))
I wondered if there is a problem with using common field names and proc names (e.g., left), so I tried this without success:
type
BBTree*[K,V] = ref object of RootObj # BBTree is a generic type with keys and values of types K, V
## `BBTree` is an opaque immutable type.
BBLeaf[K,V] = ref object of BBTree
key: K # the search key; must support the generic ``cmp`` proc
val: V # the data value associated with the key, and stored in a node
BBNode[K,V] = ref object of BBLeaf
vleft: BBTree[K,V] # left subtree; may be nil
vright: BBTree[K,V] # right subtree; may be nil
vsize: int # the size of the (sub-)tree rooted in this node
func size[K,V](n: BBLeaf[K,V]): int = return 1
func left[K,V](n: BBLeaf[K,V]): BBTree[K,V] = return nil
func right[K,V](n: BBLeaf[K,V]): BBTree[K,V] = return nil
func size[K,V](n: BBNode[K,V]): int = return n.vsize
func left[K,V](n: BBNode[K,V]): BBTree[K,V] = return n.vleft
func right[K,V](n: BBNode[K,V]): BBTree[K,V] = return n.vright
What am I doing wrong?
Let me help, because I'm also interested in the answer to this:
This should be a more minimal reproduction: https://gist.github.com/rayman22201/92f28f0c1d9ccbc5a36fe04c2132c49e
You would check which type it is using the of operator
if curr of BBNode2:
#cast to a BBNode2
else if curr of BBLeaf:
#cast to a BBLeaf
Depending on what you want to optimize (size or speed), you might want to stay with your initial version.
The cache misses from dereferencing the extra node payload will be very costly.
@rayman22201, thanks. Yikes, what a hassle. I would have expected Nim to do that discrimination in its method dispatch.
Where is this kind of thing documented?
proc and func uses static dispatch. To enable dynamic dispatch, method must be used instead: https://nim-lang.org/docs/manual.html#multiminusmethods
This appears to be working:
type
BBTree*[K,V] = BBLeaf[K, V] # BBTree is a generic type with keys and values of types K, V
## `BBTree` is an opaque immutable type.
BBLeaf[K,V] = ref object of RootObj
key: K # the search key; must support the generic ``cmp`` proc
val: V # the data value associated with the key, and stored in a node
BBNode[K,V] = ref object of BBLeaf
vleft: BBTree[K,V] # left subtree; may be nil
vright: BBTree[K,V] # right subtree; may be nil
vsize: int # the size of the (sub-)tree rooted in this node
method size[K,V](n: BBLeaf[K,V]): int = return 1
method left[K,V](n: BBLeaf[K,V]): BBTree[K,V] = discard
method right[K,V](n: BBLeaf[K,V]): BBTree[K,V] = discard
method size[K,V](n: BBNode[K,V]): int = return n.vsize
method left[K,V](n: BBNode[K,V]): BBTree[K,V] = return n.vleft
method right[K,V](n: BBNode[K,V]): BBTree[K,V] = return n.vright
Error: execution of an external compiler program 'clang -c -w -I/Users/e/Dev/Nim/Nim/lib -I/Users/e/Dev/Nim/eNim/nim-bbtree/tests -o /Users/e/.cache/nim/testbbtree_d/bbtree_testbbtree.c.o /Users/e/.cache/nim/testbbtree_d/bbtree_testbbtree.c' failed with exit code: 1
/Users/e/.cache/nim/testbbtree_d/bbtree_testbbtree.c:6709:86: error: redefinition of 'left_2u7Q4oZNBna8dJLQnuQm6A'
N_LIB_PRIVATE N_NIMCALL(tyObject_BBTreecolonObjectType__x9c9b9aUvIj9ao6qWVt1Yk55nw*, left_2u7Q4oZNBna8dJLQnuQm6A)(tyObject_BBTreecolonObjectType__x9c9b9aUvIj9ao6qWVt1Yk55nw* n) {
^
/Users/e/.cache/nim/testbbtree_d/bbtree_testbbtree.c:5933:86: note: previous definition is here
N_LIB_PRIVATE N_NIMCALL(tyObject_BBTreecolonObjectType__x9c9b9aUvIj9ao6qWVt1Yk55nw*, left_2u7Q4oZNBna8dJLQnuQm6A)(tyObject_BBTreecolonObjectType__x9c9b9aUvIj9ao6qWVt1Yk55nw* n) {
^
/Users/e/.cache/nim/testbbtree_d/bbtree_testbbtree.c:6871:86: error: redefinition of 'right_2u7Q4oZNBna8dJLQnuQm6A_2'
N_LIB_PRIVATE N_NIMCALL(tyObject_BBTreecolonObjectType__x9c9b9aUvIj9ao6qWVt1Yk55nw*, right_2u7Q4oZNBna8dJLQnuQm6A_2)(tyObject_BBTreecolonObjectType__x9c9b9aUvIj9ao6qWVt1Yk55nw* n) {
^
/Users/e/.cache/nim/testbbtree_d/bbtree_testbbtree.c:6270:86: note: previous definition is here
N_LIB_PRIVATE N_NIMCALL(tyObject_BBTreecolonObjectType__x9c9b9aUvIj9ao6qWVt1Yk55nw*, right_2u7Q4oZNBna8dJLQnuQm6A_2)(tyObject_BBTreecolonObjectType__x9c9b9aUvIj9ao6qWVt1Yk55nw* n) {
^
Wow. That very much looks like a compiler bug.
An error in Nim is one thing, but Nim producing invalid C code is definitely a bug. This might be difficult, but if you can try to create a minimal reproduction that causes this, it would be super helpful.
Unfortunately, The interaction of generics and inheritance is complex and not tested as well as it could be. I'm sorry it's causing you so many problems, but I appreciate that you are bringing out compiler bugs. Your pain will hopefully help make Nim better if it's any consolation. :-)
Error: unhandled exception: cannot dispatch; dispatcher is nil [NilAccessError]
See gist
I was noodling with your gist. This seems to be a bug with multimethods and generics.
possibly related to this bug: https://github.com/nim-lang/Nim/issues/88
or this bug: https://github.com/nim-lang/Nim/issues/6732
or this bug: https://github.com/nim-lang/Nim/issues/10038
and may be related to this code in the compiler: https://github.com/nim-lang/Nim/blob/9658a35706e1a7d7fce85a02bae2c9a1348d6bc6/compiler/semtypinst.nim#L468
Your example is still way too complicated for a bug report, but if I can extract the relevant parts I will make a proper bug report for it.
Essentially what it looks like is happening:
The the generic params are being cached too aggressively in the vtable dispatcher.
So the first instantiation with
BBTree[string,int]
works fine. You can just keep making objects of type BBTree[string,int] all day long, and everything works fine.
then you attempt to create a tree with different types,
BBTree[int,int]
Then the dispatcher either gets confused and picks the wrong method, or just fails with a dispatch error (depending on the discard or not).
TLDR; I think you walked into a minefield. :-(
@Araq has said this in the past: "Multi methods and generics are broken beyond repair IMO."
If you want to continue with this approach, I think the only real path forward here is to implement your own runtime dispatch using macros and/or type based if statements as I showed earlier.
Or wait for the compiler devs to fix a multitude of bugs around this feature. :-/
I'm sorry I don't have a better answer for you.
Yeah. I didn't expect it to be this broken either.
I think this part of the compiler hasn't gotten as much love because inheritance and run time dispatch is not really favored by the community.
The nim "blessed" way has always been variant types instead.
I do hope some of these issues get fixed, (and eventually I think they will be), but as it stands, Nim is a bit rough here.
struct tyObject_BBTreecolonObjectType__J7jcHxXTiq5WhuKk2uaAGQ {
tyEnum_NodeKind_9bCibRYMAQfBZ6GB8aWzvUA kind;
union {
struct { NI lkey; NI lval; };
struct { NI nkey; NI nval;
tyObject_BBTreecolonObjectType__J7jcHxXTiq5WhuKk2uaAGQ* vleft;
tyObject_BBTreecolonObjectType__J7jcHxXTiq5WhuKk2uaAGQ* vright;
NI vsize;
};
};
};
I assume Nim will allocate the union for instances of my nodes, so there is no space savings. With inheritance and methods I was hoping that Nim would allocate smaller structures for leaf nodes. Is there a better way to do that? (I'm ok with "no," I just want to know if I've overlooked something.)