treeform (orginal) [2021-02-20T08:31:40+01:00] view original

Spatial data structures for Nim.

nimble install spacy

Spatial algorithms are used to find the "closest" things faster than simple brute force iteration would. They make your code run faster using smarter data structures. This library has different "Spaces" that you can use to speed up games and graphical applications.

BruteSpace

BruteSpace is basically a brute force algorithm that takes every inserted element and compares to every other inserted element.

The BruteSpace is faster when there are few elements in the space or you don't do many look ups. It’s a good baseline space that enables you to see how slow things actually can be. Don't discount it! Linear scans are pretty fast when you are just zipping through memory. Brute force might be all you need!

SortSpace

SortSpace is probably the simplest spatial algorithm you can have. All it does is sorts all entries on one axis. Here it does the X axis. Then all it does is looks to the right and the left for matches. It’s very simple to code and produces good results when the radius is really small.

You can see we are checking way less elements compared to BruteSpace. Instead of checking vs all elements we are only checking in the vertical slice.

SortSpace draws its power from the underlying sorting algorithm n×log(n) nature. It’s really good for very small distances when you don’t expect many elements to appear in the vertical slice. SortSpace is really good at cache locality because you are searching things next to each other and are walking linearly in memory.

HashSpace

HashSpace is a little more complex than SortSpace but it’s still pretty simple. Instead of drawing its power from a sorting algorithm it draws its power from hash tables. HashSpace has a resolution and every entry going in is put into a grid-bucket. To check for surrounding entries you simply look up closest grid buckets and then loop through their entries.

HashSpaces are really good for when your entries are uniformly distributed with even density and things can’t really bunch up too much. They work even better when entries are really far apart. They are also really good when you are always searching the same distance in that you can make the grid size match your search radius. You can tune this space for your usecase.

QuadSpace

QuadSpace is basically the same as "quad tree" (I just like the space theme). Quad trees are a little harder to make but usually winners in all kinds of spatial applications. They work by starting out with a single quad and as more elements are inserted into the quad they hit maximum elements in a quad and split into 4. The elements are redistributed. As those inner quads begin to fill up they are split as well. When looking up stuff you just have to walk into the closets quads.

QuadSpaces are really good at almost everything. But they might miss out in some niche cases where SortSpaces (really small distances) or HashSpaces (uniform density) might win out. They are also bad at cache locality as many pointers or references might make you jump all over the place in memory.

KdSpace

Just like QuadSpace is about Quad Trees, KdSpace is about kd-tree. Kd-Trees differ from quad trees in that they are binary and they sort their results as they divide. Potentially getting less nodes and less bounds to check. Quad trees build their nodes as new elements are inserted while kd-trees build all the nodes in one big final step.

KdSpace trees take a long time to build. In theory KdSpace would be good when the entries are static, the tree is built once and used often. While QuadSpace might be better when the tree is rebuilt all the time.

Always be profiling.

You can’t really say one Space is faster than the others, you always need to check. The hardware or your particular problem might drastically change the speed characteristics. This is why all spaces have a similar API and you can just swap them out when another space seems better for your use case.

Any feedback appricated!

cblake (orginal) [2021-02-20T09:13:26+01:00] view original

You would probably be quite interested in Nievergelt's Grid File. It's like your Hash Space but has per-axis indices which let you subdivide the grid where it is densely populated. So, it works better for highly non-uniform distributions.

Scalability-wise, the typical description/impl and probably good enough for intermediate scale uses just a linear array on each axis with binary search and memory shifting insert/delete. However, one could certainly instead use a binary tree or B-tree for those axes.

Stefan_Salewski (orginal) [2021-02-20T09:55:32+01:00] view original

Does it work only for 2D points, or also for multidimensional points and rectangles as RTree does?

enthus1ast (orginal) [2021-02-20T12:08:01+01:00] view original

Nice, build all the collision detection!!1

treeform (orginal) [2021-02-20T18:26:24+01:00] view original

@cblake The grid file system seems really interesting. Its like it combines the best parts of all data structures. Feels complex to build. Maybe I should try that next.

@Stefan_Salewski, sorry its 2d only at the moment. It might not be hard to expand it to be a 3d version.

@enthus1ast yes, with this and https://github.com/treeform/bumpy you can do a lot!

Clonk (orginal) [2021-02-21T16:24:36+01:00] view original

Do you plan on supporting Delaunau triangulation ?

Stefan_Salewski (orginal) [2021-02-22T08:26:55+01:00] view original

supporting Delaunau triangulation ?

Have you carefully tried this one:

https://github.com/StefanSalewski/cdt

Clonk (orginal) [2021-02-22T10:12:39+01:00] view original

No I haven't. I'l ltake a look at it.

For now, I'm using scipy.Delaunay through Nimpy and was looking into creating binding for the C++ underlying library Qhull used by Scipy, but a pure Nim solution is always nicer to work with.

Stefan_Salewski (orginal) [2021-02-22T10:29:10+01:00] view original

Yes, would be nice. My CDT is a very advanced lib, it provides a fully dynamic incremental delaunay triangulation based on the quad edge data structure. I used CGAL with Ruby for that ten years ago for that, as I was not able to code it myself, and of course Ruby is too slow for it. Then two years ago I finally managed it, after I got all the needed papers together. Was a 12 week fulltime project for me. More testing would be nice, and maybe I should improve docs some day. And maybe most important, ARC may not work out of the box , I was not going to bother araq with it as long as no one is really using it. But I assume araq will know how to enable ARC. But ARC had some progress in the last two years, so maybe ARC works already. At least I think Arc works for RTree.

enthus1ast (orginal) [2021-02-22T11:01:51+01:00] view original

So much cool stuff, if I just had more time (or less other stuff todo).

treeform (orginal) [2021-02-23T03:26:44+01:00] view original

@Stefan_Salewski your Delaunay Triangulation is really cool!

Stefan_Salewski (orginal) [2021-04-03T22:18:44+02:00] view original

Can we subclass the stored objects? From a short look at your API docs I doubt it:

Entry = object
 id*: uint32
 pos*: Vec2

Well subclassing value objects works more or less, (see https://forum.nim-lang.org/t/7733) but I have the feeling that you store the Entry instances in a seq, and I assume at least then subclassing will fail?

And without subclassing the lib is a bit restricted, as we generally want to add attributes to the stored objects. Like names, colors and all that. Well we can somehow map each entry instance to other data by use of its id field, maybe mapping to array or table values. But that is a poor solution, I did something similar many years ago mapping Ruby API to CGAL and BOOST libs. Nim's generics should offer better solutions.

Unfortunately I have the same problem with my CDT. I just discovered that it compiles fine with ORC, and after adding cursor notation it compiles and works with ARC too. (I do not really understand cursor annotation, but valgrind does not complain, so maybe it is OK.) But as the CDT lib was created strictly following papers, it is not yet generic. Similar to your spacy libs it stores plain points, but for real live I have to attach more data to the Vertices and edges. That was the reason I just visited your lib, to get some fine ideas. But it seems that I have to do some own thinking. The API of a CDT lib is a complicated beast, only its use can show how a useful API have to look.

Unfortunately

Araq (orginal) [2021-04-04T10:58:33+02:00] view original

Well we can somehow map each entry instance to other data by use of its id field, maybe mapping to array or table values. But that is a poor solution...

Hardly, it's "Data oriented design" and becoming more popular as it's a good way to squeeze out more performance. But I agree it's usually more inconvenient to use.

xigoi (orginal) [2021-04-04T17:38:25+02:00] view original

Another solution would be to make Entry a concept, so you could put in whatever object you want without the runtime overhead of inheritance.

Stefan_Salewski (orginal) [2021-04-05T11:55:03+02:00] view original

Hardly, it's "Data oriented design" and becoming more popular as it's a good way to squeeze out more performance. But I agree it's usually more inconvenient to use.

I heard about "Data oriented design" and Entity–component–system (ECS) a few times in the last decade. But my feeling was that it is a hyped design pattern even more than OOP was in the 1990 years. Some useful real world examples would be nice.

But I have some doubts: Assume we have spatial data structures like a 2D triangulation with vertices and edges. Each edge has a starting and an ending vertex, and each vertex has a set of edges which connect it to its neightbor vertices. And of course we have a large set of functions and iterators, to iterate over all edges or vertices, to locate a vertex nearest to a given query point and much more. The data structure may be fully dynamic, we can add and remove points on the fly. If we can not attach arbitrary information to each vertex, then things get really complicated and slow. Imagine we want to remove a point nearest to a given query point: We have to locate it, get its ID, and then have to fix all the related data structures which contains the additional information.

In practice unfortunatelly things become soon much more complicated as in the famous examples in books and papers. Assume we want to use some libraries for our spatial data, a delaunay triangulation for neighrbor relations, a RTree for fast location queries, and maybe a lib which can find the convex hull of a dataset. I would assume that inheritance with references could solve it, maybe some generics are needed additional.

For my concrete problem with my CDT my current idea is that I modify the API in a way that the CDT does not create the vertices on its own, but that I pass in the vertices, which can be subclassed ref objects. My vertices are discs, and I need a convex hull lib to find convex paths. Unfortunately the CDT and the ConvesHullOfDisc lib are designed from different papers and so do not use a common data structure. I guess I can make the ConvexHullOfDisks lib generic, so it can work with the vertices of the CDT. The RTree should be not really needed for this application currently.

@xigoi

Conceps may be a nice solution, but they were considered experimental 6 years ago when I started with Nim, and I think they still are? And I think its main inventor is not really active in Nim any more, so using concepts for larger, non toy apps is some risk.

Mirror of forum.nim-lang.org