There is still a lot of work to be done, but before going on, I would like to know if people have comments on either the API or the implementation.
The library is here. It is ready to be put on Nimble, but I have not added it yet.
If anyone has comments of any kind, they will be very helpful!
Hi, it looks like a nice start! :)
I see you already support ATLAS and MKL. I think support for OpenBLAS would be interesting too, because it is more or less the fastest open-source BLAS. A benchmark against Julia would be nice for the basic operations.
One big advantage of linear algebra for Nim over Julia is that Nim supports shared-memory parallelism (OpenMP). I guess it is no problem to use your library with such parallelism over the vectors/matrices. It would be very interesting to see some examples and how they fare against OpenMP in C.
The idea to switch the underlying implementation is awesome!
I can't stop thinking about the transpose operation. You wrote that it is done in constant time because no memory is copied. So it is fast but dangerous. Is there a solution which is fast and safe? For example, having two matrix types, mutable and immutable (two immutable matrices could point to the same memory)?
Or can we use the garbage collector to track this? If you want to modify an element by calling `[]=`, could we check whether there is more than one reference to that memory location? If there is, we must make a copy before modifying the values. Is it cheap and safe to check this?
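To make the copy-on-write idea concrete, here is a minimal sketch of it in Nim. This is not the library's actual implementation: instead of asking the GC for a reference count, it keeps an explicit `owners` counter on a shared buffer, and `[]=` copies the data before writing whenever the buffer is shared (e.g. by a matrix and its transpose). All names here are hypothetical, and the counter bookkeeping is simplified (not thread-safe).

```nim
type
  Buffer = ref object
    data: seq[float64]
    owners: int              # how many matrices point at this buffer
  Matrix = object
    rows, cols: int
    transposed: bool
    buf: Buffer

proc newMatrix(rows, cols: int): Matrix =
  Matrix(rows: rows, cols: cols, transposed: false,
         buf: Buffer(data: newSeq[float64](rows * cols), owners: 1))

proc t(m: Matrix): Matrix =
  # constant-time transpose: share the buffer, just swap the dimensions
  inc m.buf.owners
  Matrix(rows: m.cols, cols: m.rows, transposed: not m.transposed, buf: m.buf)

proc `[]=`(m: var Matrix; i, j: int; v: float64) =
  if m.buf.owners > 1:
    # someone else sees this memory: copy before writing
    dec m.buf.owners
    m.buf = Buffer(data: m.buf.data, owners: 1)  # seq assignment deep-copies
  let idx = if m.transposed: j * m.rows + i else: i * m.cols + j
  m.buf.data[idx] = v

var a = newMatrix(2, 3)
a[0, 1] = 1.0
var b = a.t          # constant time, shares memory with `a`
b[1, 0] = 2.0        # triggers the copy-on-write; `a` is left untouched
```

Whether checking is "cheap" then reduces to one integer comparison per write, at the cost of an extra indirection on every access.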
Thanks, Peter
@jaak I have added OpenBLAS. I had some preliminary (very simple) benchmarks for matrix multiplication, against Numpy (Python) and Breeze (Scala), both using ATLAS, and MKL in C. There is essentially no significant difference, the main factor is the underlying Fortran library used. I will add more benchmarks when the thing is in more advanced shape, and I will also consider Julia (as well as Torch + Lua). If Julia also is based on standard BLAS implementations, anyway, I do not expect any significant difference.
@mora Of course Nim has var and let to track mutability :-) Operations that modify the underlying matrix are only allowed for var, so this should be enough in most cases. For the case where one wants two transposed matrices that are both mutable (which I do not expect to be very common) I could provide a clone operation or something to that effect.
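A minimal, self-contained illustration of this point; the `Matrix` type, `makeMatrix` and `t` here are stand-ins, not the library's real definitions (indexing for the transposed case is omitted for brevity).

```nim
type Matrix = object
  rows, cols: int
  data: ref seq[float64]    # shared storage

proc makeMatrix(rows, cols: int): Matrix =
  result = Matrix(rows: rows, cols: cols)
  new result.data
  result.data[] = newSeq[float64](rows * cols)

proc t(m: Matrix): Matrix =
  # constant-time transpose: same storage, swapped dimensions
  Matrix(rows: m.cols, cols: m.rows, data: m.data)

proc `[]=`(m: var Matrix; i, j: int; v: float64) =
  # mutation is only available on a `var` matrix
  m.data[][i * m.cols + j] = v

var a = makeMatrix(2, 2)
a[0, 0] = 1.0        # fine: `a` is var
let b = a.t          # shares memory with `a`
# b[0, 0] = 2.0      # rejected at compile time: `b` is not var
```

A `clone` that allocates fresh storage would then cover the remaining case of two independently mutable transposes.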
Thanks for the answer @andrea. Very quick update with OpenBLAS :)
It would be interesting to have some linear system solvers (QR etc.) and eigenvalue solvers available. If you need any help, I think I can find time to lend a hand.
Yeah, that is filed in the TODO under "more operations". I first want to make sure that the representations are okay for, say, using rewrite rules, and mixing static and dynamic matrices, run on CUDA and so on. I plan to add operations after that, at the moment I am fighting with CUDA :-)
But if you want to help in this area, I would be very grateful. Feel free to ask here or on Github anything that is not clear, both in terms of design and implementation. It is still rather small, though, so it should not be very difficult to follow, especially starting from the tests.
@andrea: My idea is really not that important, feel free to ignore this comment. I understand the purpose of let and var, and I also agree on a clone function. My idea is just to have copy-on-write for transposed matrices: they could point to the same memory address, and the memory would be copied when it is modified. It is not an easy task, which is why I tried to involve the GC as well. However, I'm not sure that it would work.
Again, awesome project!
Peter
A related, more common problem is when one needs access to a row or a column of a matrix. At the moment I am performing a copy, but this seems wasteful for the rather common case where one wants to iterate over rows without mutating them. I still have not figured out how to do this efficiently and safely with a GC.
Anyway, what I am doing for now is adding the clone operation, which is useful in any case. Let's discuss in more detail what makes sense for the transposed matrix in order to overcome the above issues.
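One possible direction for copy-free row access, sketched here with hypothetical names (this is not the library's API): instead of yielding fresh seqs, an iterator can yield index ranges into the flat row-major storage, so the caller reads elements in place and no copy ever happens.

```nim
type Matrix = object
  rows, cols: int
  data: seq[float64]    # row-major storage

iterator rowSlices(m: Matrix): Slice[int] =
  ## Each yielded slice delimits one row inside `m.data`; no copy is made.
  for i in 0 ..< m.rows:
    yield i * m.cols ..< (i + 1) * m.cols

let m = Matrix(rows: 2, cols: 3, data: @[1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
for s in m.rowSlices:
  var total = 0.0
  for k in s:
    total += m.data[k]
  echo total            # per-row sums
```

The drawback is that the caller sees bare indices rather than a row object, so it trades safety/ergonomics for the guarantee that nothing is copied.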
I am contemplating the choice of programming language for rewriting my quantum three-body code, currently implemented in Java, and Nim - I started looking at it just yesterday - looks tempting. I badly need a decent complex number implementation (that's why I am abandoning Java), including linear algebra. So, when talking BLAS/LAPACK, do we mean all of BLAS, including the c* and z* subroutines?
Anyway, linear algebra is a must, so I hope to become one of your users/testers in the coming months.
Yes, of course I also want to support c* and z*. The issue right now is that there is a fair amount of duplication to handle float64/float32 (that is, d* and s*). This in turn arises from the fact that I am not able (yet) to make things generic while at the same time keeping the matrix size information in the type system.
I think this arises from some bug in the implementation of static[T], but I am not sure yet. I will post the exact issue in a separate thread and see if there is a nice way to solve it.
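For concreteness, this is roughly the shape being aimed at: matrices generic over the element type, with dimensions carried as static[int] parameters so that size mismatches fail at compile time. Whether this compiles cleanly is exactly the static[T] question above, so treat it as a design sketch rather than working library code.

```nim
type StaticMatrix[M, N: static[int]; T] = object
  data: array[M * N, T]    # row-major

proc `*`[M, N, K: static[int]; T](
    a: StaticMatrix[M, N, T];
    b: StaticMatrix[N, K, T]): StaticMatrix[M, K, T] =
  # naive triple loop for illustration; the real work would be
  # delegated to the appropriate BLAS routine (sgemm/dgemm/...)
  for i in 0 ..< M:
    for j in 0 ..< K:
      for k in 0 ..< N:
        result.data[i * K + j] += a.data[i * N + k] * b.data[k * K + j]
```

With this shape, multiplying a 2×3 matrix by a 4×2 one simply fails to type-check, and the same generic code serves float32 and float64 (and eventually complex types) without duplication.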
Otherwise, I will of course still support complex operations, but it may take a little longer.
Concerning the API, I am not sure it is a good idea to use the same symbol `*` for both commutative and non-commutative multiplications. Suppose we have code for optimizing algebraic expressions. If we make a distinction between operations over commutative and non-commutative rings, such code could still apply to expressions that involve matrices. We could also introduce commutative element-wise operations.
As I understand it, working with diagonal and banded matrices falls into the category "More specialized BLAS operations"?
In general, there is some distinction between creating a wrapper for a standard library and developing a complete linear algebra infrastructure. It seems that you are actually pursuing the latter, more ambitious goal!
Yeah, my goal would be to make it possible to use Nim to run linear algebra operations, writing them in a style that would not be too distant from, say, Matlab, with compiler checks that the matrix sizes match, and with the possibility to run transparently on the GPU.
Said otherwise, a DSL that delegates the hard work to BLAS or cuBLAS.
It is pretty trivial to wrap BLAS as it stands, and in fact this is part of what I am doing, but I have no particular interest in doing it per se. As an end user, I would never want to write code using raw BLAS.
Actually, I would like to do something to work with neural networks, in the spirit of Torch, even if smaller in scope. The issue I have is that the two main competitors in the neural network arena are Torch (Lua) and Theano (Python). I have tried using both and settled on Torch for now. What I do not like is that Lua is dynamically typed, and this makes debugging a neural network much harder than it needs to be, at least for me. I would like to have something that works reasonably well in a typed language, and I am doing some experiments with Nim. This is also because many Torch libraries are actually wrappers over C libraries that would be usable from Nim as well.
Concerning the API, I plan to put some optimizations as rewrite rules into the library itself. I understand that one may want to distinguish commutative rings from non-commutative ones, but alas I do not think this would be worth the loss in expressivity caused by abandoning standard mathematical notation.