Hi, I am trying to design a linear algebra library to wrap standard implementations of BLAS (and later LAPACK).
I have some design goals in mind, but there is one point I am not able to resolve. First, let me explain some things I would like to achieve:
- having the compiler keep track of dimensions. This can be done by using types such as Matrix64[M, N: static[int]] and declaring the effects of operations on dimensions. Of course it would be nice to have operations for matrices whose dimensions are unknown at compile time, but I do not need them right now and I may add them later
- simplify the BLAS interface by use of term-rewriting macros. Many BLAS functions compute rather complex operations such as C = alpha * A * B + beta * C. This is to reduce the number of times one has to loop and keep things in caches. I would like to expose standard operations, such as matrix product, and make use of term-rewriting macros whenever the opportunity arises to conflate more operations into a single BLAS call.
Now, my main issue is that BLAS operations expect to be called with pointers to the appropriate area of memory where the matrix is stored. If I understand correctly (please, correct me if I am getting this wrong!) in Nim I can get the raw address of a variable only if it is declared with var. There are at least three ways I can think I could manage this:
- allocate everything on the stack (since dimensions are know at compile time anyway). To get the address, I need operands to be of var type even for functions that do not mutate them
- use standard heap memory tracked by the Nim garbage-collector - again, this will require operands of var type everywhere
- use malloc or one of the more specialized allocators such as mkl_malloc and hide returned pointer behind a Nim ref, using destructors to make sure that when the ref is collected, the appropriate free function is called.
The third one seems the most sensible solution - in particular it is the only one that does not require to make everything into a var, and allows the use of custom allocators, that make sure that matrices are aligned nicely.
The problems I see with this approach are:
- it may become complex to decide when to free memory. For instance, when there is more than one reference to a matrix (or a row thereof). I would end up implementing my own refcount, which does not seem to be a good idea.
- sometimes it may be nice to be able to make use of the static information about the dimension to avoid heap allocations altogether.
What is the most sensible/idiomatic way to go?