Hello, below is a zero-copy transpose for matrices. What I was trying to do is have a concept that is satisfied by Matrix or Transposed but also checks the element type, so that later I can have a Matrix[T] type. Any help would be appreciated.
type
  AnyMatrix = concept m, var mvar, type M
    M.rows is int
    M.cols is int
    m[int, int] is float
    mvar[int, int] = float
    type T = t(M)

  Matrix = object
    rows, cols: int
    data: seq[float]

  Transposed = distinct Matrix
proc `[]`(m: Matrix, i, j: int): float {.inline.} =
  m.data[i * m.cols + j]

proc `[]=`(m: var Matrix, i, j: int, v: float) {.inline.} =
  m.data[i * m.cols + j] = v

template t(m: Matrix): Transposed =
  Transposed(m)

proc `[]`(m: Transposed, i, j: int): float {.inline.} =
  Matrix(m)[j, i]

proc `[]=`(m: var Transposed, i, j: int, v: float) {.inline.} =
  Matrix(m)[j, i] = v

template t(m: Transposed): Matrix =
  Matrix(m)

template rows(m: Transposed): int =
  Matrix(m).cols

template cols(m: Transposed): int =
  Matrix(m).rows

proc matrix(data: seq[float], m: int): Matrix =
  result.rows = m
  result.cols = if m != 0: data.len div m else: 0
  result.data = data
proc `*`(a, b: AnyMatrix): Matrix =
  # standard product: an (m x n) times an (n x p) matrix gives an (m x p) result
  result.rows = a.rows
  result.cols = b.cols
  newSeq(result.data, result.rows * result.cols)
  for i in 0 ..< a.rows:
    for j in 0 ..< b.cols:
      var s = 0.0
      for k in 0 ..< a.cols:
        s += a[i, k] * b[k, j]
      result[i, j] = s

var a = matrix(@[1.0, 2, 3, 4, 5, 6, 7, 8], 2)                  # 2 x 4
var b = matrix(@[1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 2)   # 2 x 6
var c = a.t * b                                                  # (4 x 2) * (2 x 6)
Here you go:
type
  Matrix = object
    rows, cols: int
    data: seq[float]

  Transposed = distinct Matrix

  AnyMatrix = Matrix or Transposed
Note that instead of Transposed you probably want to use column-major or row-major (or C-order/Fortran-order), because "transposed" is ambiguous about the data representation.
proc op(a: AnyMatrix, b: AnyMatrix): Matrix
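As a sketch of what the column-major/row-major idea could look like (the Order enum and the at proc below are assumptions for illustration, not existing code from this thread):

type Order = enum rowMajor, colMajor

proc at(m: Matrix, order: Order, i, j: int): float {.inline.} =
  # element (i, j) under either storage order; the data itself never moves
  case order
  of rowMajor: m.data[i * m.cols + j]
  of colMajor: m.data[j * m.rows + i]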
Could concepts be useful in this case, so I can restrict to e.g. a: AnyMatrix[float] and b: AnyMatrix[float]? The manual mentions:
By default, during overload resolution each named type class will bind to exactly one concrete type. We call such type classes bind once types.
So what I am trying to do is not possible? Can someone confirm?
Generics are enough:
type
  Matrix[T: SomeReal] = object
    rows, cols: int
    data: seq[T]

  Transposed[T: SomeReal] = distinct Matrix[T]

  AnyMatrix[T: SomeReal] = Matrix[T] or Transposed[T]

proc op[T: SomeReal](a, b: AnyMatrix[T]): AnyMatrix[T] = ...
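A quick compile-time check (a sketch added here, not part of the original reply) that both concrete types satisfy the alias while mixed element types do not:

static:
  doAssert Matrix[float] is AnyMatrix[float]
  doAssert Transposed[float] is AnyMatrix[float]
  doAssert not (Matrix[float32] is AnyMatrix[float64])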
Make sure you read the code of either Neo (and the older linalg) or Arraymancer. Both Andrea and I like to constrain things at compile time as much as possible, and we have probably already met most of the struggles you will hit while writing a matrix library.
mratsim: the problem I'm having is:
Error: type mismatch: got <Transposed, Matrix>
when using the same signature as your last snippet. On the other hand, splitting the types in the proc parameters would allow Matrix[Complex] * Matrix[float], which we don't want.
The libraries you posted are very nice. I need LU factorization and eigenvalues, so I am porting Jama, which is old, but I don't know a better lib.
Note that instead of Transposed you probably want to use column-major or row-major (or C-order/Fortran-order), because "transposed" is ambiguous about the data representation.
In Neo, the transpose function copies the matrix. I was trying to imitate how gonum/matrix implements it.
This should work. If you write a, b: AnyMatrix[T], Nim expects both parameters to resolve to the same AnyMatrix and the same T; compare a, b: SomeInteger versus a: SomeInteger, b: SomeInteger.
proc op[T: SomeReal](a: AnyMatrix[T], b: AnyMatrix[T]): AnyMatrix[T] = ...
Alternatively you can use something like:
proc op[T, U: AnyMatrix](a: T, b: U): T or U = ...
but since Nim doesn't support higher-kinded types at the moment, you cannot access the subtype of AnyMatrix naturally, like:
proc op[T: SomeReal, U: AnyMatrix[T]](a: T, b: U): T or U = ...
It can be done with a macro though. But try the first solution first, I think.
Edit:
Regarding the transpose, if your data is stored in a ref object/array or a shallow seq, Nim will just copy the reference and not the whole data.
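As a sketch of that (the RefMatrix type below is an assumption for illustration, not code from the thread):

type
  RefMatrix = object
    rows, cols: int
    data: ref seq[float]  # the payload lives behind a ref

proc newRefMatrix(rows, cols: int): RefMatrix =
  result.rows = rows
  result.cols = cols
  new result.data
  result.data[] = newSeq[float](rows * cols)

var x = newRefMatrix(2, 3)
var y = x                    # copies two ints and a pointer, not the six floats
y.data[][0] = 1.0
doAssert x.data[][0] == 1.0  # both objects share the same buffer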
mratsim: try the code at the beginning and you will see the same error I'm getting. Jama actually uses seq[seq[float]] for storage. But the flat representation will be better, right?
Well, here are some benchmarks I made to test different data storage layouts and matrix multiplication algorithms.
type
  MatrixB = object
    data: seq[seq[float]]
    m, n: int

  Matrix = object
    data: seq[float]
    m, n: int

template newDataB() {.dirty.} =
  # allocate the m x n seq-of-seqs for `result`
  result.data = newSeq[seq[float]](result.m)
  for r in 0 ..< result.m:
    result.data[r] = newSeq[float](result.n)

proc stdMatrixProduct(a, b: MatrixB): MatrixB =
  result.m = a.m
  result.n = b.n
  newDataB()
  for i in 0 ..< a.m:
    for j in 0 ..< b.n:
      var s = 0.0
      for k in 0 ..< a.n:
        s += a.data[i][k] * b.data[k][j]
      result.data[i][j] = s

proc optMatrixProduct(a, b: MatrixB): MatrixB =
  result.m = a.m
  result.n = b.n
  newDataB()
  var b_colj = newSeq[float](a.n)
  for j in 0 ..< b.n:
    for k in 0 ..< a.n:
      b_colj[k] = b.data[k][j]        # cache column j of b once
    for i in 0 ..< a.m:
      let a_rowi = unsafeAddr a.data[i]
      var s = 0.0
      for k in 0 ..< a.n:
        s += a_rowi[][k] * b_colj[k]
      result.data[i][j] = s
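A minimal harness along these lines can take this kind of measurement (the newMatrixB constructor here is an assumed helper, not part of the original benchmark):

import times

proc newMatrixB(m, n: int): MatrixB =
  result.m = m
  result.n = n
  result.data = newSeq[seq[float]](m)
  for i in 0 ..< m:
    result.data[i] = newSeq[float](n)

let a = newMatrixB(10, 10)
let b = newMatrixB(10, 10)
let start = cpuTime()
discard stdMatrixProduct(a, b)
echo (cpuTime() - start) * 1e6, " us"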
results:
8.649 us  naive data storage - standard ijk algorithm
1.329 us  naive - algorithm
2.456 us  packed - standard
2.675 us  packed - optimized
2.441 us  packed - strided iteration
seq[seq[float]] is the fastest!
You allocated a wholly new seq in optMatrixProduct; no wonder it's slower than the unoptimized version. By the way: have you tried changing the representation of the matrix before multiplication, so that in for k in 0 ..< a.n both a.data[i][k] and b.data[k][j] are linear in memory? As far as I know it should give a nice boost.
seq[seq[float]] plays nicely with changing the matrix in place, e.g. reordering the rows (or columns, depending on interpretation). It can be pretty useful, I agree.
You shouldn't benchmark matrix multiplication with triple for-loops; that's the slowest way to do matrix multiplication.
In short, yes, the best memory representation for 2D-only matrices is a flat seq[float], like what Neo and you are doing.
If it's just to solve linear equations, Neo has a solve function, and I added a least_squares equation solver to Arraymancer.
If it's for learning, use seq[float] instead of the seq-of-seq double indirection. You can follow the JAMA code then.
If it's for learning how to write fast matrix libraries, use seq[float] too and have a look at BLAS and LAPACK and their Nim wrappers nimblas and nimlapack. You will end up writing a lot of glue code though.
Edit note: for a good boost while keeping a simple triple for-loop, you can transpose a matrix first, like in this Nim benchmark; see the sketch below.
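Something along these lines, using the packed Matrix type from the benchmark above (a sketch of the technique, not the linked benchmark's code):

proc transposeFirstProduct(a, b: Matrix): Matrix =
  result.m = a.m
  result.n = b.n
  result.data = newSeq[float](a.m * b.n)
  # bt holds b transposed, so the innermost loop walks both operands linearly
  var bt = newSeq[float](b.m * b.n)
  for k in 0 ..< b.m:
    for j in 0 ..< b.n:
      bt[j * b.m + k] = b.data[k * b.n + j]
  for i in 0 ..< a.m:
    for j in 0 ..< b.n:
      var s = 0.0
      for k in 0 ..< a.n:
        s += a.data[i * a.n + k] * bt[j * b.m + k]
      result.data[i * result.n + j] = s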
Long answer
Matrix multiplication is a very memory-intensive algorithm: n^3 operations on n^2 data, and what counts is keeping those n^2 elements in L1 and L2 cache as long as possible.
For speed, you should use a proper BLAS library to implement matrix multiplication; both Neo and Arraymancer use nimblas for that, a wrapper for OpenBLAS, Intel MKL or Apple Accelerate. If you're interested in performance for matrices, here is a good start on memory layout.
If you really want to implement fast matrix multiplication by hand, you can follow this tutorial in C and assembly. The main takeaway is that keeping your data hot in cache trumps everything; a minimal illustration follows.
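Loop tiling (blocking) is the textbook form of that idea. This sketch uses the packed Matrix type from the benchmark above and is not the tutorial's code:

const blk = 64  # tile size; pick it so three blk*blk float tiles fit in cache

proc blockedProduct(a, b: Matrix): Matrix =
  result.m = a.m
  result.n = b.n
  result.data = newSeq[float](a.m * b.n)  # zero-initialised accumulator
  for ii in countup(0, a.m - 1, blk):
    for kk in countup(0, a.n - 1, blk):
      for jj in countup(0, b.n - 1, blk):
        # accumulate one result tile from one tile of a and one tile of b
        for i in ii ..< min(ii + blk, a.m):
          for k in kk ..< min(kk + blk, a.n):
            let aik = a.data[i * a.n + k]
            for j in jj ..< min(jj + blk, b.n):
              result.data[i * b.n + j] += aik * b.data[k * b.n + j]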
I followed it to implement fast matrix multiplication on integers (BLAS libs are float-only) and achieved 5x the speed of Numpy (10x on dual core) and 2.2x the speed of Julia's triple for-loop (4.4x on dual core) on 1500x1500 matrices. This grows at a rate of n^3, so on 3000x3000 it would be 125x faster. Code is here.
Also, I had a look at the Jama code: it is good for teaching or as a reference implementation/specification, but it is not built for speed.
Udiknedormin: Actually this code is the fastest (when used with the right data structure, of course). You think the Jama creators didn't think about allocations? :P That's why it creates a temporary sequence only once instead of a whole new matrix like the kostya/matmul.nim mratsim posted.
mratsim: thanks for the links, I have already done a (bad, incomplete) port. I want to ask: do you know other modern libraries that deal with numerical computations and could be ported easily to Nim? I don't care about speed, I want the functionality.
@planhths
Yes, I actually overlooked where it happens. Now I see that what it does is essentially a partial transposition (I mentioned before that transposition can give nice results; it also plays nicely with parallelization).
But did you really call THAT one "naive - algorithm"? :-o It doesn't really look naive to me. Did you try the same method with the seq[float] representation? A sketch of that variant is below.
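For reference, the same column-caching method on the packed representation would look something like this (a sketch, reusing the flat Matrix type from the benchmark above):

proc optPackedProduct(a, b: Matrix): Matrix =
  result.m = a.m
  result.n = b.n
  result.data = newSeq[float](a.m * b.n)
  var b_colj = newSeq[float](a.n)   # reused buffer for column j of b
  for j in 0 ..< b.n:
    for k in 0 ..< a.n:
      b_colj[k] = b.data[k * b.n + j]
    for i in 0 ..< a.m:
      var s = 0.0
      for k in 0 ..< a.n:
        s += a.data[i * a.n + k] * b_colj[k]
      result.data[i * result.n + j] = s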