Hi, version 0.2.1 of my linear algebra library is published!
I have added support for matrices and vectors whose size is not tracked at compile time, and they integrate nicely with those of static size. I also found some ways to better leverage Nim's type inference in order to improve the public API. Finally, I added universal functions, that is, functions that are applied element-wise to vectors and matrices.
When I am back at work - where I have my Nvidia card - I plan to bring support for GPU types on par with those on the CPU.
;-)
Not yet. If you look here there are a LOT of things that need to be done.
One of those is to add a fallback Nim implementation that is valid over other rings, since BLAS only works for floating-point computations.
If you want to help, though, I can give you some directions ;-)
I would love to, but not immediately. It looks like there is already a zip library, so it won't be too bad.
Here's the reference implementation: https://github.com/numpy/numpy/blob/v1.10.1/numpy/lib/npyio.py#L257
Value types have a real advantage, but the problem is that when you make them too big, they can overflow the stack (IIRC 2 MB on Windows). Still, you should use value-type arguments, because a ref type can always be converted to a value type with the [] operator:
proc foo1(arg: Matrix): Matrix
var matref1, matref2: ref Matrix
matref2[] = foo1(matref1[])
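To make the idea above concrete, here is a minimal, self-contained sketch. The `Matrix` type and the body of `foo1` are stand-ins invented for illustration, not the library's actual definitions; the point is only the `[]` dereference at the call site.

```nim
# Hypothetical toy Matrix type; the real library's type differs.
type Matrix = object
  rows, cols: int
  data: seq[float]

proc foo1(arg: Matrix): Matrix =
  # toy body: just return a copy of the argument
  result = arg

var matref1, matref2: ref Matrix
new matref1
matref1[] = Matrix(rows: 1, cols: 2, data: @[1.0, 2.0])
new matref2
matref2[] = foo1(matref1[])   # deref with [], call the value proc, store back
doAssert matref2.data == @[1.0, 2.0]
```

So a value-based API loses nothing for callers that keep their matrices behind refs.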
The other direction is not as easy: the result will be a new allocation, no matter whether you use ref types or not. So I suggest also providing an API that takes a result argument:
proc foo1(dst: var Matrix, arg: Matrix): void
proc foo1(arg: Matrix): Matrix =
  result.foo1(arg)
This allows to reuse temporary matrices more efficiently.
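A runnable sketch of this destination-passing pattern, using a toy `Matrix` type and a made-up `scaled` operation (both hypothetical, chosen only to show the two-overload shape):

```nim
type Matrix = object
  data: seq[float]

# the var-result version does the actual work
proc scaled(dst: var Matrix, arg: Matrix, k: float) =
  dst.data.setLen(arg.data.len)
  for i in 0 ..< arg.data.len:
    dst.data[i] = arg.data[i] * k

# the value-returning overload just forwards to it
proc scaled(arg: Matrix, k: float): Matrix =
  result.scaled(arg, k)

# reuse one temporary across iterations instead of allocating each time
var tmp: Matrix
for step in 1 .. 3:
  tmp.scaled(Matrix(data: @[1.0, 2.0]), float(step))
doAssert tmp.data == @[3.0, 6.0]
```

The value-returning overload costs one extra line per operation, so exposing both is cheap for the library author.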
The fact is, matrices and vectors already contain a seq, which is allocated on the heap (at least in the common case), so there is not much more cost in adding an additional ref.
The only case where avoiding a ref would actually save time is when many matrices or vectors refer to the same underlying seq, but point at different places. A common case is iterating over the rows or columns of a matrix.
I have decided to optimize just this case for now, by reusing the same ref and therefore avoiding allocations (well, there is just a single allocation for the whole iteration).
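The single-allocation iteration could look roughly like the following sketch. The `Matrix` and `RowView` types here are invented stand-ins, not the library's actual ones; the point is that the iterator allocates one ref up front and only repoints it on each step.

```nim
type
  Matrix = object
    rows, cols: int
    data: seq[float]
  RowView = ref object       # hypothetical view type
    m: ptr Matrix
    row: int

iterator rowViews(m: var Matrix): RowView =
  var view = RowView(m: addr m)   # single allocation for the whole loop
  for r in 0 ..< m.rows:
    view.row = r                  # reuse the same ref, just move it
    yield view

var m = Matrix(rows: 2, cols: 2, data: @[1.0, 2.0, 3.0, 4.0])
var sums: seq[float]
for v in m.rowViews:
  var s = 0.0
  for c in 0 ..< v.m.cols:
    s += v.m.data[v.row * v.m.cols + c]
  sums.add s
doAssert sums == @[3.0, 7.0]
```

The trade-off is that the yielded view is only valid inside the loop body, since it is recycled on the next iteration.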
I've been scratching my head a lot for my tensor library between using the default seq value semantics or ref/shallowCopy.
For now I'm using the default seq value semantics at least until benchmarks or usage shows that it poses a memory/performance issue.
Here are the relevant Nim quotes:
From Araq: "Parameter passing doesn't copy; var x = foo() doesn't copy but moves; let x = y doesn't copy but moves; var x = y does copy, but I can use shallowCopy instead of = for that."
From Jehan: "First, it's important to understand that most of the time, you won't need shallowCopy at all. Copying is shallow by default if (1) the left-hand side of an assignment is a let variable or (2) the right-hand side is a function call."
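The default semantics the two quotes describe can be checked with a tiny example (no library types needed, just a plain seq):

```nim
proc makeSeq(): seq[int] = @[1, 2, 3]

var a = @[1, 2, 3]
var b = a            # var x = y: a deep copy, b is independent of a
b[0] = 99
doAssert a[0] == 1   # a is unchanged

var c = makeSeq()    # RHS is a function call: the result is moved, not copied
doAssert c == @[1, 2, 3]
```

So plain value semantics only pay for a copy on `var x = y`-style assignments, which is why they are a reasonable default until profiling says otherwise.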
One thing that I would argue for with any new linear algebra library is some sort of 'out' parameter, like numpy has for its operations that produce a vector. In an inner loop of a project using linalg, I had an operation like:
# simplified
const word2vec_size = 300
var oldVector: Vector64[word2vec_size]
for i in 0 ..< NUM_STEPS:
  oldVector += someOtherVector
when I changed this to:
const word2vec_size = 300
var oldVector: Vector64[word2vec_size]
for i in 0 ..< NUM_STEPS:
  oldVector += learning_rate * someOtherVector
I had an immediate slowdown, which (duh, my fault) was because, in order to scale someOtherVector and add it to the original, a temporary vector had to be created to hold the result of scaling it by the learning rate. The easiest workaround was to make some calls into nimblas that gave a pre-allocated vector to store the result of the vector-scalar multiplication. Having two multiply procs (with/without 'out'), or one with an optional 'out' parameter like:
import nimblas

proc multiply*(scale_val: float, input: VectorType,
               output: VectorType = nil): VectorType =
  # VectorType has to be a reference type so that this doesn't result in a copy
  result = if output.isNil: newVectorTypeLike(input) else: output
  copy(len(input), input.fp, 1, result.fp, 1)
  scal(len(input), scale_val, result.fp, 1)
would really help users avoid allocations.
@perturbation2
This is actually something that would help a lot in some places. Users should not need to call BLAS manually, although that is always an option. It helps that in Neo I make all fields public - this is not very safe, but at least if a user needs to access a raw pointer in situations such as the one above, it is there.
That said, I could not find the time to expose two versions of each operation yet. I may eventually get to it, but if you want to contribute, a PR is always welcome!
Even better, there should be rewrite rules that match your slow call and transform it into the BLAS version without intermediate results. I will add some feature that lets me test whether rewrite rules are being applied, and then start adding some of the most common ones.
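As a sketch of what such a rewrite rule could look like, here is a term-rewriting template over plain seqs. Everything here is a toy invented for illustration: the `axpy` proc stands in for a real BLAS axpy call, and the `+`/`*` procs stand in for the library's operators. The assertion holds whether or not the compiler applies the pattern, since the rewrite only fuses the computation.

```nim
# toy "fused" operation standing in for a BLAS axpy call
proc axpy(a: seq[float], k: float, b: seq[float]): seq[float] =
  result = newSeq[float](a.len)
  for i in 0 ..< a.len:
    result[i] = a[i] + k * b[i]

# toy elementwise operators standing in for the library's
proc `*`(k: float, b: seq[float]): seq[float] =
  result = newSeq[float](b.len)
  for i in 0 ..< b.len: result[i] = k * b[i]

proc `+`(a, b: seq[float]): seq[float] =
  result = newSeq[float](a.len)
  for i in 0 ..< a.len: result[i] = a[i] + b[i]

# term-rewriting template: a + k * b is fused into one pass, no temporary
template fuse{a + k * b}(a, b: seq[float], k: float): seq[float] =
  axpy(a, k, b)

let x = @[1.0, 2.0]
let y = @[3.0, 4.0]
doAssert x + 2.0 * y == @[7.0, 10.0]
```

Because rewrite rules are semantically invisible, a dedicated test feature (e.g. counting how often a pattern fires) is indeed needed to guard against regressions.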
Well, rewrite rules need to be able to target these var-result operations (as I call them), so they need to exist anyway.
This is, btw, the reason result is not a keyword: it allows easy refactorings from
proc foo(): T =
  result = T()
to
proc foo(result: var T) =
  result = T()
;-)
In fact, this problem is so severe (every proc returning a string is affected) that I thought about more language support for it:
Every routine foo(result: var T; args) can be used as an expression of type T via the transformation into (var res = default(T); foo(res, args); res).
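The proposed transformation, spelled out by hand for a concrete proc (the names `greet`/`greetExpr` are made up for the example; today the expression form has to be written manually, e.g. as a template):

```nim
# var-result routine: writes its output into the first parameter
proc greet(result: var string, name: string) =
  result = "hello, " & name

# the transformation from the quote, done by hand:
# (var res = default(T); foo(res, args); res)
template greetExpr(name: string): string =
  var res = default(string)
  greet(res, name)
  res

doAssert greetExpr("world") == "hello, world"
```

With language support, the template boilerplate would disappear and `greet` itself could be used directly in expression position.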
Sure, rewrite rules have to be written, and this is an ongoing work. I just added a way to test that they are in fact applied, since otherwise it is easy to get regressions.
Your suggestion for out parameters is neat, although I would not be able to use it directly, since many operations in these libraries are operators, and that form would not work for the versions with more parameters.
Operators would work like anything else, but it might be ugly:
proc `+`(result: var T; x, y: T) = ...
`+`(res, a, b) # optimized call; store into 'res'
echo x + b # uses temporary when used as expression
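The operator form above can be made concrete with a toy vector type (the `Vec` type is a stand-in invented for the example): the three-argument var-result `+` is the optimized entry point, and the usual two-argument `+` builds on it.

```nim
type Vec = object
  data: seq[float]

# optimized form: writes the sum into res, no allocation beyond setLen
proc `+`(res: var Vec, x, y: Vec) =
  res.data.setLen(x.data.len)
  for i in 0 ..< x.data.len:
    res.data[i] = x.data[i] + y.data[i]

# expression form: allocates a temporary, forwards to the var-result version
proc `+`(x, y: Vec): Vec =
  `+`(result, x, y)

var res: Vec
let a = Vec(data: @[1.0, 2.0])
let b = Vec(data: @[3.0, 4.0])
`+`(res, a, b)                       # optimized call; stores into res
doAssert res.data == @[4.0, 6.0]
doAssert (a + b).data == @[4.0, 6.0] # expression form, uses a temporary
```

Overload resolution disambiguates the two by arity, so the ugliness is confined to the explicit `` `+`(res, a, b) `` call sites where the allocation actually matters.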