nimforum mirror - Arraymancer - v0.4.0 (May 2018)

mratsim [2017-07-05T22:15:22+02:00]

As a data scientist, I feel that Nim has tremendous potential for data science, machine learning and deep learning.

In particular, it's currently non-trivial to bridge the gap between deep learning research (mostly Python and sometimes Lua) and production (C for embedded devices, JavaScript for web services, ...).

For the past 3 months I've been working on Arraymancer, a fast and ergonomic tensor library that currently provides a subset of NumPy functionality. It features:

Creating tensors from nested sequences and arrays (even 10 levels of nesting)

Pretty printing of up to 4D tensors (would need help to generalize)

Slicing with Nim syntax

Slices can be mutated

Reshaping, broadcasting, concatenating tensors. Also permuting their dimensions.

Universal functions

Accelerated matrix and vector operations using BLAS

Iterators (on values, coordinates, axis)

Aggregates and statistics (sum, mean, and a generic higher-order aggregate function)

Next steps (in no particular order) include:

Adding CUDA support using Andrea's nimcuda package

Adding Neural Network / Deep Learning functions

Improving the documentation and adding the library to Nimble

The library: https://github.com/mratsim/Arraymancer

I welcome your feedback and expected use cases. I would especially love to know the pain points people have with deep learning and with putting deep learning models in production.

cmacmackin [2017-07-08T10:03:26+02:00]

I've been following this for a while on GitHub and I think it is a very impressive project. Nim would be a great language for scientific computing, but it needs to have the numerical libraries and this is an excellent first step in creating them.

A couple of questions. First, are you planning to add neural network functionality directly to Arraymancer? Surely that would be something better suited for a separate, specialised library? A second, more general, question I have is whether you'd consider making the get_data_ptr proc public. It would be nice to be able to integrate your tensors with wrappers for existing numerical software written in C and we'd need access to the raw data for that.

mratsim [2017-07-08T11:04:38+02:00]

get_data_ptr is now public ;).

For now, I will add the neural network functionality directly in Arraymancer.

The directory structure will probably be:

src/arraymancer ==> core Tensor stuff

src/autograd ==> automatic gradient computation (i.e. Nim-rmad ported to tensors)

src/neuralnet ==> neural net layers

This mirrors PyTorch's tree

I made this choice for the following reasons:

It's easier for me to keep track of one repo, refactor code, document and test.

I'm focusing on deep learning

It's much easier to communicate about one single package (and attracts new people to Nim ;) ).

Data scientists are used to having deep learning in a single package (tensor + neural net interface): Tensorflow, Torch/PyTorch, Nervana Neon, MxNet ...

Nim's DeadCodeElim will ensure that unused code will not be compiled.

If the tensor part (without the NN) gets even 0.1% of Numpy's popularity and people start using it in several packages, that means:

It's a rich man's problem!

We get new devs and input for scientific/numerical Nim.

We can reconsider splitting as we will know actual expectations.

We can even build a "scinim" community which drives all key scientific nim packages.

In the meantime, I think it's best if I do what is easier for me and worry about how to scale later.

bluenote [2017-07-15T10:18:48+02:00]

A late reply, because I was hoping to dive into this a bit deeper before replying. But due to lack of time, high-level feedback must suffice: This looks awesome!

I completely agree with your observation that there is a gap between developing prototypes e.g. in Python and bringing them into production -- not only in deep learning, but data science in general. And I also think that Nim's feature set would be perfect to fill this gap.

A quick question on using statically-typed tensors: I assume that this implies that the topology of a network cannot be dynamic at all? I'm wondering if there are good work-arounds for situations where dynamic network topologies are required, for instance when a model wants to choose its number of hidden layer nodes iteratively, picking the best model variant. Are dynamically typed tensors an option or would that defeat the design / performance?

mratsim [2017-07-16T16:08:25+02:00]

The only static parts of the Tensor types are the Backend (Cpu, CUDA, ...) and the internal type (int32, float32, object ...).
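A minimal sketch of what this split could look like, assuming a simplified Tensor type that is not Arraymancer's actual definition: the backend and element type are compile-time parameters, while the shape is an ordinary runtime value. The seq-backed storage and the withHiddenUnits helper are hypothetical and only serve the illustration.

```nim
type
  Backend = enum
    Cpu, Cuda

  # Backend and element type are part of the static type;
  # the shape is plain runtime data.
  Tensor[B: static[Backend]; T] = object
    shape: seq[int]
    data: seq[T]   # illustrative storage only

proc newTensor(shape: openArray[int], T: typedesc, B: static[Backend]): Tensor[B, T] =
  ## Allocates a zero-filled tensor whose shape is chosen at runtime.
  result.shape = @shape
  var size = 1
  for dim in shape:
    size *= dim
  result.data = newSeq[T](size)

proc withHiddenUnits(n: int): Tensor[Cpu, float32] =
  ## Hypothetical helper: a weight matrix with `n` hidden units and 10 inputs.
  newTensor([n, 10], float32, Cpu)

# Same static type, different runtime shapes.
let small = withHiddenUnits(16)
let large = withHiddenUnits(128)
echo small.shape, " ", large.shape
```

Both values have the type Tensor[Cpu, float32], so a model can try different layer sizes at runtime without changing any type.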

The network topology will be dynamic, using dynamic graphs more akin to PyTorch/Chainer/DyNet than to Theano/Tensorflow/Keras.

My next step is to build an autograd so people only need to implement the forward pass; backpropagation will be automatic. For this part I'm waiting for VTable.
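To illustrate the idea of writing only the forward pass, here is a minimal tape-based reverse-mode sketch on scalars. It is not Arraymancer's or Nim-rmad's design; the Tape and Node types and the variable and backward procs are hypothetical names chosen for this example.

```nim
type
  Tape = ref object
    nodes: seq[Node]          # every value, in creation order

  Node = ref object
    value: float
    grad: float
    parents: seq[tuple[node: Node, weight: float]]  # (input, local derivative)
    tape: Tape

proc variable(tape: Tape, value: float): Node =
  ## Registers an input value on the tape.
  result = Node(value: value, tape: tape)
  tape.nodes.add result

proc `+`(a, b: Node): Node =
  result = Node(value: a.value + b.value, tape: a.tape,
                parents: @[(node: a, weight: 1.0), (node: b, weight: 1.0)])
  a.tape.nodes.add result

proc `*`(a, b: Node): Node =
  result = Node(value: a.value * b.value, tape: a.tape,
                parents: @[(node: a, weight: b.value), (node: b, weight: a.value)])
  a.tape.nodes.add result

proc backward(output: Node) =
  ## Walks the tape in reverse and accumulates gradients with the chain rule.
  output.grad = 1.0
  let nodes = output.tape.nodes
  for i in countdown(nodes.high, 0):
    let n = nodes[i]
    for p in n.parents:
      p.node.grad += p.weight * n.grad

# The user writes only the forward pass: z = x*y + x
let tape = Tape()
let x = tape.variable(3.0)
let y = tape.variable(4.0)
let z = x * y + x
z.backward()
echo "z = ", z.value, ", dz/dx = ", x.grad, ", dz/dy = ", y.grad
```

Here dz/dx = y + 1 and dz/dy = x, which the reverse walk recovers without any hand-written derivative code.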

PS: I think NimData is great too, Pandas seems like a much harder beast!

mratsim [2017-09-24T19:49:00+02:00]

I am very excited to announce the second release of Arraymancer which includes numerous improvements blablabla ...

Without further ado:

Community
- There is a Gitter room!

Breaking
- shallowCopy is now unsafeView and accepts let arguments
- Element-wise multiplication is now .* instead of |*|
- vector dot product is now dot instead of .*

Deprecated
- All tensor initialization procs have their Backend parameter deprecated.
- fmap is now map
- agg and agg_in_place are now fold and nothing (too bad!)

Initial support for Cuda !!!
- All linear algebra operations are supported
- Slicing (read-only) is supported
- Transforming a slice to a new contiguous Tensor is supported

Tensors
- Introduction of unsafe operations that work without copying: unsafeTranspose, unsafeReshape, unsafeBroadcast, unsafeBroadcast2, unsafeContiguous
- Implicit broadcasting via .+, .*, ./, .- and their in-place equivalents .+=, .-=, .*=, ./=
- Several shapeshifting operations: squeeze, at and their unsafe versions.
- New property: size
- Exporting: export_tensor and toRawSeq
- reduce and reduce on axis

Ecosystem:
- I express my deep thanks to @edubart for testing Arraymancer, contributing new functions, and improving its overall performance. He built arraymancer-demos and arraymancer-vision. Check those out: you can load images into a Tensor and do logistic regression on them!

Also thanks to the Nim community on IRC/Gitter, they are a tremendous help (yes Varriount, Yardanico, Zachary, Krux).
I probably would have struggled a lot more without the guidance of Andrea's Cuda code in his neo and nimcuda libraries.

And obviously Araq and Dom for Nim which is an amazing language for performance, productivity, safety and metaprogramming.

dataPulverizer [2017-09-24T20:37:17+02:00]

This looks like a very useful library for me. I shall certainly be checking it out. Nice one!

mratsim [2017-12-14T00:50:20+01:00]

Arraymancer v0.3.0 Dec. 14 2017

Finally, after much struggle, here is the new version of Arraymancer. It is available now on Nimble and comes with shiny new docs (thanks @flyx and NimYAML doc): https://mratsim.github.io/Arraymancer

Changes:

Very Breaking

Tensors now use reference semantics: let a = b will share data by default and copies must be made explicitly.

There is no need to use unsafe procs to avoid copies, especially for slices.

Unsafe procs are deprecated and will be removed, leading to a smaller and simpler codebase and API/documentation.

Tensors and CudaTensors now work the same way.

Use clone to do copies.

Arraymancer now works like Numpy and Julia, making it easier to port code.

Unfortunately it makes it harder to debug unexpected data sharing.
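The difference between sharing and cloning can be sketched in plain Nim. This is a minimal illustration under the assumption of a simplified Tensor that holds its buffer through a ref; the Storage and Tensor types and the newTensor proc below are hypothetical and not Arraymancer's definitions. Only the clone name comes from the changelog.

```nim
type
  Storage = ref object
    data: seq[float]

  Tensor = object
    shape: seq[int]
    storage: Storage   # a ref: copying the Tensor copies the reference, not the buffer

proc newTensor(values: seq[float]): Tensor =
  Tensor(shape: @[values.len], storage: Storage(data: values))

proc clone(t: Tensor): Tensor =
  ## Explicit deep copy: a fresh Storage with its own copy of the data.
  Tensor(shape: t.shape, storage: Storage(data: t.storage.data))

let a = newTensor(@[1.0, 2.0, 3.0])
var shared = a               # shares the buffer with `a`
var independent = a.clone()  # owns a separate buffer

shared.storage.data[0] = 42.0
independent.storage.data[1] = -1.0

echo a.storage.data            # the write through `shared` is visible here
echo independent.storage.data  # the clone did not see that write
```

The write through shared shows up in a, which is exactly the kind of unexpected data sharing that the last point above warns about.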

Breaking (?)

The max number of dimensions supported has been reduced from 8 to 7 to reduce cache misses. Note, in deep learning the max number of dimensions needed is 6 for 3D videos: [batch, time, color/feature channels, Depth, Height, Width]

Documentation

Documentation has been completely revamped and is available here: https://mratsim.github.io/Arraymancer/

Huge performance improvements

Use non-initialized seq

shape and strides are now stored on the stack

optimization via inlining all higher-order functions
apply_inline, map_inline, fold_inline and reduce_inline templates are available.

all higher order functions are parallelized through OpenMP

integer matrix multiplication uses SIMD, loop unrolling, restrict and 64-bit alignment

prevent false sharing/cache contention in OpenMP reduction

remove temporary copies in several procs

runtime checks/exceptions are now behind unlikely

A*B + C and C+=A*B are automatically fused in one operation

do not initialize result tensors

Neural network:

Added linear, sigmoid_cross_entropy, softmax_cross_entropy layers

Added Convolution layer

Shapeshifting:

Added unsqueeze and stack

Math:

Added min, max, abs, reciprocal, negate and in-place mnegate and mreciprocal

Statistics:

Added variance and standard deviation

Broadcasting

Added .^ (broadcasted exponentiation)

Cuda:

Support for convolution primitives: forward and backward

Broadcasting ported to Cuda

Examples

Added perceptron learning xor function example

Precision

Arraymancer uses ln1p (ln(1 + x)) and exp1m (exp(x) - 1) procs where appropriate to avoid catastrophic cancellation
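The effect of such procs can be seen with the C library's log1p and expm1, which compute ln(1 + x) and exp(x) - 1 without forming the intermediate 1 + x or exp(x). This is a small sketch, assuming Nim's C backend; the cLog1p and cExpm1 wrapper names are made up for this example and are not Arraymancer's procs.

```nim
import std/math

# Thin wrappers over the C math library (hypothetical names).
proc cLog1p(x: float64): float64 {.importc: "log1p", header: "<math.h>".}
proc cExpm1(x: float64): float64 {.importc: "expm1", header: "<math.h>".}

let x = 1e-10

# Naive forms: 1.0 + x and exp(x) are rounded to the nearest float64 near 1,
# which discards most of the significant digits of x.
let naiveLog = ln(1.0 + x)
let naiveExp = exp(x) - 1.0

# Dedicated forms keep full precision for small x.
let stableLog = cLog1p(x)
let stableExp = cExpm1(x)

echo "ln(1 + x):  naive error = ", abs(naiveLog - x), ", log1p error = ", abs(stableLog - x)
echo "exp(x) - 1: naive error = ", abs(naiveExp - x), ", expm1 error = ", abs(stableExp - x)
```

For x this small both exact results differ from x by only about x^2 / 2, so the distance to x is a fair proxy for the rounding error of each variant.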

Deprecated

Version 0.3.1, with ALL the deprecated procs removed, will be released in a week. Due to issue https://github.com/nim-lang/Nim/issues/6436, you will get a deprecation warning even when using non-deprecated procs like zeros, ones and newTensor.

newTensor, zeros, ones arguments have been changed from zeros([5, 5], int) to zeros[int]([5, 5])

The behavior of the unsafe procs is now the default, and the unsafe procs themselves are deprecated.

mratsim [2018-05-05T19:21:38+02:00]

The new version of Arraymancer, v0.4.0 "The Name of the Wind", is live today. Here is the changelog:

Core:
- OpenCL tensors are now available! However Arraymancer will naively select the first backend available. It can be CPU, it can be GPU. They support basic and broadcasted operations (Addition, matrix multiplication, elementwise multiplication, ...)
- Addition of argmax and argmax_max procs.

Datasets:
- Loading the MNIST dataset from http://yann.lecun.com/exdb/mnist/
- Reading and writing from CSV

Linear algebra:
- Least squares solver
- Eigenvalues and eigenvectors decomposition for symmetric matrices

Machine Learning
- Principal Component Analysis (PCA)

Statistics
- Computation of covariance matrices

Neural network
- Introduction of a short intuitive syntax to build neural networks! (A blend of Keras and PyTorch).
- Maxpool2D layer
- Mean Squared Error loss
- Tanh and softmax activation functions

Examples and tutorials
- Digit recognition using Convolutional Neural Net
- Teaching Fizzbuzz to a neural network

Tooling
- Plotting tensors through Python

Several updates linked to Nim's rapid development and several bugfixes.

Thanks:

Bluenote10 for the CSV writing proc and the tensor plotting tool

Miran for benchmarking

Manguluka for tanh

Vindaar for bugfixing

Every participant in the RFCs

And you, the users of the library.

qqtop [2018-05-06T11:18:28+02:00]

Congratulations !

I especially like the neural network examples and hope more will be forthcoming.

Araq [2019-01-14T11:12:00+01:00]

@mratsim: If everything is a (pointer, len) pair, how do you deal with the ownership problems?

mratsim [2019-01-16T15:25:34+01:00]

There is a memowner bool in the type. If it's false, the storage will not deallocate the memory when it is no longer referenced.

const LASER_MAXRANK*{.intdefine.} = 6

type DynamicStackArray*[T] = object
    data*: array[LASER_MAXRANK, T]
    len*: int

type
  RawImmutableView*[T] = distinct ptr UncheckedArray[T]
  RawMutableView*[T] = distinct ptr UncheckedArray[T]
  
  Metadata* = DynamicStackArray[int]
  
  Tensor*[T] = object                    # Total stack: 128 bytes = 2 cache-lines
    shape*: Metadata                     # 56 bytes
    strides*: Metadata                   # 56 bytes
    offset*: int                         # 8 bytes
    storage*: CpuStorage[T]              # 8 bytes
  
  CpuStorage*{.shallow.}[T] = ref object # Total heap: 25 bytes = 1 cache-line
    when supportsCopyMem(T):
      raw_data*: ptr UncheckedArray[T]   # 8 bytes
      memalloc*: pointer                 # 8 bytes
      memowner*: bool                    # 1 byte
    else: # Tensors of strings, other ref types or non-trivial destructors
      raw_data*: seq[T]                  # 8 bytes (16 for seq v2 backed by destructors?)
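How such a flag can gate deallocation is sketched below with manual memory management. This is an illustration only, assuming a stripped-down Storage type; the newOwned, viewOf and release procs are hypothetical and do not come from Laser or Arraymancer, where release would happen automatically rather than through an explicit call.

```nim
type
  Storage = object
    data: ptr UncheckedArray[float64]
    len: int
    memowner: bool   # true: this storage allocated `data` and must free it

proc newOwned(len: int): Storage =
  ## Allocates a zeroed buffer that the storage owns.
  Storage(data: cast[ptr UncheckedArray[float64]](alloc0(len * sizeof(float64))),
          len: len, memowner: true)

proc viewOf(buf: var openArray[float64]): Storage =
  ## Wraps memory owned by someone else as a (pointer, len) pair.
  Storage(data: cast[ptr UncheckedArray[float64]](addr buf[0]),
          len: buf.len, memowner: false)

proc release(s: var Storage) =
  ## Frees the buffer only when this storage owns it.
  if s.memowner and not s.data.isNil:
    dealloc(s.data)
  s.data = nil
  s.len = 0

var owned = newOwned(4)
owned.data[0] = 7.0
let firstOwned = owned.data[0]
release(owned)            # frees the allocation

var external = [1.0, 2.0, 3.0]
var view = viewOf(external)
view.data[1] = 20.0       # writes straight into `external`
release(view)             # does not free: `external` still owns its memory
echo external
```

The view must not outlive the buffer it points into; the flag only prevents a double free, it does not extend the lifetime of the external memory.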

juancarlospaco [2019-01-16T15:31:24+01:00]

Intro to Tensors https://github.com/juancarlospaco/nim-presentation-slides/tree/master/ejemplos/avanzado/tensorflow#intro-to-tensors

mratsim [2019-07-19T02:22:05+02:00]

I've released a new stable version of Arraymancer, v0.5.1.

The changes are small in number but great in quality:

https://github.com/mratsim/Arraymancer/releases/tag/v0.5.1

Changes affecting backward compatibility:

None

Changes:

Nim 0.20.x compatibility (commit 0921190)

Complex support

Einsum / Einstein summation: this allows using C[i,j] = A[i,k] * B[k,j] to express the following loops:

for i in 0 ..< A.shape[0]:
  for k in 0 ..< A.shape[1]:
    for j in 0 ..< B.shape[1]:
      C[i,j] += A[i,k] * B[k,j]
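The loop nest above can be tried without Arraymancer by using nested sequences in place of tensors. This is a minimal sketch of the contraction itself, not of the einsum macro; the contract proc is a hypothetical name, and it assumes A has as many columns as B has rows.

```nim
proc contract(A, B: seq[seq[float]]): seq[seq[float]] =
  ## Computes C[i][j] = sum over k of A[i][k] * B[k][j].
  result = newSeq[seq[float]](A.len)
  for i in 0 ..< A.len:
    result[i] = newSeq[float](B[0].len)   # zero-initialized accumulator row
    for k in 0 ..< A[0].len:
      for j in 0 ..< B[0].len:
        result[i][j] += A[i][k] * B[k][j]

let A = @[@[1.0, 2.0], @[3.0, 4.0]]
let B = @[@[5.0, 6.0], @[7.0, 8.0]]
let C = contract(A, B)
echo C
```

The index k appears on the right-hand side only, so it is the one being summed over, which is the convention the einsum notation relies on.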

Naive whitespace tokenizer for Natural Language Processing

Preview of the Laser backend for matrix multiplication, 5x faster than before (without SIMD autodetection)

Fix:

Fix height/width order when reading an image in tensor

Thanks to @chimez for the complex support and updating for 0.20, @metasyn for the tokenizer, @xcokazaki for the image dimension fix and @Vindaar for the einsum implementation

mratsim [2020-01-09T01:36:22+01:00]

Couldn't render post #35868.

Libman [2020-01-10T20:29:06+01:00]

Very impressive matmul results in Kostya's benchmarks.

Would be even better if s/Apache/MIT/ license...

mratsim [2020-01-10T21:11:37+01:00]

I didn't even know it was in Kostya's benchmarks, but technically, just like Julia, Numpy, and Lubeck (D), the current version of Arraymancer is just using BLAS behind the scenes (which are 95% C).

That said, I do get similar results with a pure Nim BLAS, with either OpenMP-based or Weave-based threading:

code: https://github.com/mratsim/weave/tree/3bfb6416/benchmarks/matmul_gemm_blas. OpenMP is "laser_omp_gemm" and Weave (Pthread/Windows Fibers) is weave_gemm

Regarding license, point taken. Most of my new libs are Status-Style, dual licensed Apache+MIT:

Synthesis

Weave

cumulonimbus [2020-01-29T12:11:43+01:00]

I randomly wandered onto https://github.com/flame/blis#key-features today and haven't had a chance to play with it, but it sure looks interesting; are you familiar with it? Obviously less relevant if you're going to do a pure Nim BLAS ...

mratsim [2020-01-29T13:15:21+01:00]

Yes, I'm very familiar with BLIS, and my own BLAS is implemented following the BLIS paper: http://www.cs.utexas.edu/users/flame/pubs/blis3_ipdps14.pdf

Actually some BLIS developers are also aware of my own efforts:

https://github.com/pytorch/pytorch/issues/26534#issuecomment-537606608

including Weave and Arraymancer: https://github.com/flame/blis/issues/352#issuecomment-570579138
