nimforum mirror - Do we have a RTree or R*Tree for Nim?

Stefan_Salewski (orginal) [2017-11-27T11:21:28+01:00] view original

https://en.wikipedia.org/wiki/R-tree

I assume not yet.

For Ruby I used my custom bindings to boost library for this. But as the RTree algorithm is not too complex, a native Nim implementation may be preferable over a Nim Boost-Rtree wrapper.

I guess we would have to use Concepts for a native Nim RTree?

Next question would be if one would start with the plain paper from 1984/1990 or with an already existing implementation. There are C, C++, Java, Go and many more implementations. I think the Nim API may look similar as the Go one -- Go is using Interfaces. https://github.com/dhconnelly/rtreego

mratsim (orginal) [2017-11-27T12:56:19+01:00] view original

As far as I know there is none.

I'm interested in a pure Nim implementation of R-tree and closely related Kd-tree for geospatial algorithms (clustering especially).

I've added your suggestion to the "Are we scientists yet" RFC about needed libraries.

Stefan_Salewski (orginal) [2017-12-08T22:37:56+01:00] view original

In the last days I wrote a first draft following the Guttman paper.

https://github.com/StefanSalewski/RTree

While the algorithm is very simple and well explained in the paper, there are two points which makes the implementation not that easy: 1. We want a fully generic RTree (dimension, data type for location, data type for contained data.) And 2. the tree has two different types of nodes, the leaf with data, and the inner nodes.

So I used some type casts and proc arguments with or types (|). The code compiles and I can at least add one entry, delete it or search for it. Currently I do not really care if the code really works -- basically it should but it will have many bugs for sure.

I was mostly only interested in how the Nim code would look.

I still wonder how we can write it in a simpler and more compact way. Would using concepts help? Or maybe templates instead of procs with OR types? Or object variants or methods?

Of course we could simplify the code, if we use only one node type -- each node could contain data and the pointer to next node, but that would waste some memory. Or we could restrict the contained data to reference types, then we could do dirty casts between data and nodepointers. I guess C implementations would do that.

Of course later we would convert the RTree into the STAR variant R*Tree with optimized split algorithm and better performance, but that variant is a bit more complicated.

One general problem of Rtrees and R*Trees may be their age -- the papers are from 1984 and 1990 and the original authors tuned the algorithm for paged memory. But I am not aware of better algorithms for spatial search of extended objects, i.e. rectangles in 2D.

Udiknedormin (orginal) [2017-12-09T01:01:16+01:00] view original

Use object variants, they're exactly for cases like that --- the number of variants is fixed.

Stefan_Salewski (orginal) [2017-12-09T15:57:55+01:00] view original

Yes, maybe I will try object variants. But they also have some disadvantages: Size is always determinated by largest variant, and additional space is consumed by discriminator. To get clean, simple code I would have to make each array element an object variant, with a common member bounding box. Then I could access box without any tests, and have to check discriminator field only for the rare cases where I have to access pointer to next node or content field. But that increases memory consumption, and fewer items fit into cache.

I could also divide the array with the node into two, one for the boxes, and one for pointers/content. Would allow easy box access, but would make moving elements more work.

Stefan_Salewski (orginal) [2017-12-09T21:33:18+01:00] view original

Currently I favour to let the shape of the code as it is.

One question arise: Would it make sense to rewrite code like this

https://github.com/StefanSalewski/RTree/blob/master/rtree.nim#L305

for j in 1 ..< n.numEntries:
  if n of Leaf[M, D, RT, LT]:
    b = union(b, Leaf[M, D, RT, LT](n).a[j].b)
  elif n of Node[M, D, RT, LT]:
    b = union(b, Node[M, D, RT, LT](n).a[j].b)
  else:
    assert false

For max speed into something like this:

if n of Leaf[M, D, RT, LT]:
  let h = Leaf[M, D, RT, LT](n)
  for j in 1 ..< n.numEntries:
    b = union(b, h.a[j].b)
elif n of Node[M, D, RT, LT]:
  let h = Node[M, D, RT, LT](n)
  for j in 1 ..< n.numEntries:
    b = union(b, h.a[j].b)
else:
  assert false

Some languages like Oberon had type guards and the with clause for that, I think it was code like WITH n:Node[M, D, RT, LT] DO or similar, which saves a temporary h pointer. But maybe all that rewriting is unnecessary as the compilers optimize it.

Stefan_Salewski (orginal) [2017-12-21T21:03:53+01:00] view original

I have just shipped the first working release:

https://github.com/StefanSalewski/RTree

The R*Tree variant, bulk-loading and nearest-neighbour-search may follow later.

Stefan_Salewski (orginal) [2018-01-06T22:48:06+01:00] view original

The RTree module now contains the R*Tree variant also.

https://github.com/StefanSalewski/RTree

I guess the number of potential users of this module is very small -- I will add a nimble install script in the next days and maybe even add it to the nimble package list.

Of course there is much room for testing still...

Mirror of forum.nim-lang.org

3373 :: Do we have a RTree or R*Tree for Nim?