The following code is a very preliminary attempt at developing an autodiff library in Nim (for now, mainly to learn about Nim's powerful template and possibly macro constructs).
type AddOp = object
type SubOp = object
type MulOp = object
type DivOp = object
type
  DualExpr = concept x
    isDual(x)
  Operator = concept x
    isOp(x)
  Dual[T] = object
    val: T
    grad: T
  DualUnaryExpr[Op: Operator, X: DualExpr] = object
    x: X
  DualBinaryExpr[Op: Operator, L, R: DualExpr] = object
    l: L
    r: R
type dual = Dual[float]
template isDual[T](x: T): bool = false
template isDual[T](x: Dual[T]): bool = true
template isDual[Op, X](x: DualUnaryExpr[Op, X]): bool = true
template isDual[Op, L, R](x: DualBinaryExpr[Op, L, R]): bool = true
template isOp[T](x: T): bool = false
template isOp(x: AddOp): bool = true
template isOp(x: SubOp): bool = true
template isOp(x: MulOp): bool = true
template isOp(x: DivOp): bool = true
template `+`[L, R](l: L, r: R) =
  DualBinaryExpr[AddOp, L, R](l: l, r: r)
var x = dual(val: 1.0, grad: 1.0)
var y = dual(val: 2.0, grad: 0.0)
var z = x + y
echo isDual(1.0)
echo isDual(x)
echo isDual(y)
echo isDual(z)
I get the following output (https://play.nim-lang.org/#ix=1TVX):
Hint: used config file '/nim/config/nim.cfg' [Conf]
Hint: system [Processing]
Hint: widestrs [Processing]
Hint: io [Processing]
Hint: in [Processing]
/usercode/in.nim(45, 11) template/generic instantiation of `+` from here
/usercode/in.nim(40, 32) Error: undeclared field: 'x'
What am I missing here?
Eventually, I want to be able to perform autodiff computations such that an expression tree is created at compile time and the assignment operator consumes this expression tree in an optimal way (e.g., avoiding temporaries, optimizing the mathematical operations, etc.).
The template substitutes both sides of the field initialization because you used the same parameter names as the field names. This is the expanded expression:
DualBinaryExpr[AddOp, L, R](x: x, y: y)
You just have to rename the params (and give it a return type, but that's a separate issue).
template `+`[L, R](lhs: L, rhs: R) =
  DualBinaryExpr[AddOp, L, R](l: lhs, r: rhs)
This allows for code like:
type
  Foo = object
    a, b: int
template constructWith(field: untyped, val: untyped): Foo =
  Foo(field: val)
echo constructWith(a, 1)
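Here the template substitutes the actual arguments for both field and val, so the call expands to Foo(a: 1) and prints (a: 1, b: 0). It is the same substitution that clobbered the l and r fields in your + template.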
Compile-time autodifferentiation is something that I'm also very interested in and will add to my deep learning compiler project Laser/Lux.
I suggest you have a look at my experiments on compile-time computation graphs at https://github.com/mratsim/compute-graph-optim. I go through progressive, step-by-step experiments to build a compile-time graph; they start simple (no macros, pure generics) and grow very complicated.
I've also looked into a couple of representations, object algebras and typed tagless-final encoding, before settling on a combination: a deep embedding using ADTs plus a shallow embedding using functions built on the deep embedding, as suggested in this paper. My full research on computation-graph representations is available here.
Ultimately, I've implemented a proof-of-concept compiler in Nim macros using the following tree representation, which is very similar to the one Nim macros use and which I've rewritten 3 times already.
As I want both compile-time and JIT code generation in the future (including autodiff), and I want to reuse the data structure, my design is a tiny bit more complex, because I can't use generics/concepts at runtime. Instead, I have my own AST and compile-time symbolic function evaluation that mirrors runtime function evaluation + codegen from the AST tree.
A couple of comments:
You have taken the object algebra approach, which means that all operations will be encoded in the generic type system. This has the following limitations:
Many computation-graph libraries in other languages use the Visitor Pattern; you can't do that at compile time in Nim, as inheritance + methods don't work at compile time. The preferred way (and the way the Nim compiler and macros are implemented) is to use ADTs/object variants.
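For illustration, here is a minimal sketch of the object-variant encoding (the names are mine, not from the repo): each operation is a node kind, and a pass such as symbolic differentiation is plain case dispatch rather than a visitor.
type
  ExprKind = enum ekConst, ekVar, ekAdd, ekMul
  Expr = ref object
    case kind: ExprKind
    of ekConst: val: float
    of ekVar: name: string
    of ekAdd, ekMul: lhs, rhs: Expr

proc diff(e: Expr, wrt: string): Expr =
  ## Symbolic derivative: the kind of pass a compile-time graph can run.
  case e.kind
  of ekConst: Expr(kind: ekConst, val: 0.0)
  of ekVar: Expr(kind: ekConst, val: (if e.name == wrt: 1.0 else: 0.0))
  of ekAdd: Expr(kind: ekAdd, lhs: e.lhs.diff(wrt), rhs: e.rhs.diff(wrt))
  of ekMul:
    # product rule: (l*r)' = l'*r + l*r'
    Expr(kind: ekAdd,
         lhs: Expr(kind: ekMul, lhs: e.lhs.diff(wrt), rhs: e.rhs),
         rhs: Expr(kind: ekMul, lhs: e.lhs, rhs: e.rhs.diff(wrt)))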
@jasper Thanks a lot. The following worked (I had to add the auto return type as you mentioned):
template `+`[L, R](lhs: L, rhs: R): auto =
  DualBinaryExpr[AddOp, L, R](l: lhs, r: rhs)
@mratsim Your tensor lib Lux seems quite impressive. My idea is to have an autodiff library in Nim similar to the one I developed in C++: autodiff.github.io. It relies heavily on template metaprogramming, if constexpr, SFINAE, etc., to optimize the expression tree (whenever possible) and avoid temporaries.
There are forward- and reverse-mode automatic differentiation algorithms in there. My research needs, however, rely mainly (if not only) on forward mode, so forward mode is where I have spent most of my available development time in the autodiff lib.
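For reference, forward mode is essentially dual-number arithmetic; here is a minimal eager sketch of just the val/grad propagation rules (no expression tree yet):
type Dual = object
  val, grad: float

proc `+`(a, b: Dual): Dual =
  Dual(val: a.val + b.val, grad: a.grad + b.grad)

proc `*`(a, b: Dual): Dual =
  # product rule
  Dual(val: a.val * b.val, grad: a.grad * b.val + a.val * b.grad)

let x = Dual(val: 3.0, grad: 1.0)  # seed: dx/dx = 1
let y = Dual(val: 2.0, grad: 0.0)
echo x * y + x  # d(x*y + x)/dx = y + 1, so this prints (val: 9.0, grad: 3.0)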
I'm quite new to Nim; I just started exploring it yesterday and found it quite neat. Right now I just want to experiment with it and check whether it could be used for advanced scientific computing (for problems that involve numerical optimization, solving ODEs, PDEs, etc.).
Nim provides an easy way to optimize expressions at compile time via term-rewriting templates (macros), which can be used to rewrite a*x + b into a fused multiply-add, exp(x) - 1 into expm1, or ln(x + 1) into ln1p (log1p in <math.h>). They only work if the expression is done in a single line, though, so the techniques you used in C++ are probably more general.
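For example, here is a minimal sketch of such a rewrite rule (c_fma is my own binding to C's fma; check whether your Nim version's std/math already exposes one):
proc c_fma(a, b, c: float64): float64 {.importc: "fma", header: "<math.h>".}

# The pattern in {} is matched against the typed AST: any single expression
# of the shape a*x + b on float64s is rewritten into a fused multiply-add.
template fuseMulAdd{a * x + b}(a, x, b: float64): float64 =
  c_fma(a, x, b)

let t = 2.0
echo 3.0 * t + 1.0  # compiled as c_fma(3.0, t, 1.0), prints 7.0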
In terms of needs, I only use reverse-mode autodiff, so there is no overlap here. Actually, reverse-mode autodifferentiation was how I started with Nim (https://github.com/mratsim/nim-rmad), after doing some Project Euler with it.
For collaboration you might want to talk to Hugo Granstrom, who wrote some ODE solvers in Nim (https://github.com/HugoGranstrom/numericalnim); I'm mostly working on machine learning, so I only need reverse-mode autodiff. Given the growing number of scientists in Nim (a couple are in bio as well), maybe an IRC+Gitter+Discord+Matrix (whatever is popular) channel for Nim/science might be interesting.
Regarding what I'm planning:
Laser started as research into improving the Arraymancer backend. I have a couple of issues with the Arraymancer backend:
The reasons are multiple:
So I started Laser with small goals:
but then I decided to write a DSL + compiler, inspired by the Halide DSL for image processing and its gradient Halide extension for auto-differentiation:
In the Lux repo there are several documents that go into the details of:
This is all very instructive; thanks a lot. I'm wondering if you have benchmarked your Nim code against, say, C/C++ alternatives (or even a simplified C++ code you wrote yourself). Dealing with a GC is something new to me; I'm concerned about how it would impact numerical simulations, for example, if dynamically allocated vectors/matrices are everywhere in the code.
I had a look at the term-rewriting templates; I don't think this is what I want to use (although it could complement the approach I'm trying, based on objects that represent expression nodes). You said, "They only work if the expression is done in a single line, though, so the techniques you used in C++ are probably more general." I'm wondering if there is a way to achieve a similar effect in Nim, then? Wouldn't generic templates permit this (rather than the term-rewriting templates)?
Nim implementation: https://github.com/numforge/laser/blob/2f619fdbb2496aa7a5e5538035a8d42d88db8c10/laser/primitives/matrix_multiplication/gemm.nim
Note that the temporary buffers are managed by GC (ref object): https://github.com/numforge/laser/blob/2f619fdbb2496aa7a5e5538035a8d42d88db8c10/laser/primitives/matrix_multiplication/gemm_tiling.nim#L251
In short, if you don't use ref/seq/string (GC-ed types) in a critical path, the GC won't trouble you.
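A toy sketch of that principle (my example, not from Laser): allocate the GC-managed buffers once, outside the hot loop, and reuse them, so the loop body never allocates.
proc simulate(steps, n: int): float =
  var
    state = newSeq[float](n)    # GC-managed, allocated once up front
    scratch = newSeq[float](n)  # reusable temporary buffer
  for _ in 0 ..< steps:
    for i in 0 ..< n:
      scratch[i] = 0.5 * state[i] + 0.1  # no allocation in the hot loop
    swap(state, scratch)  # O(1) pointer swap, no copy, no GC work
  result = state[0]

echo simulate(100, 1_000_000)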
Weave is as fast as the original C code and faster than both GCC/Clang OpenMP and Intel Threading Building Blocks.
Last example, raytracing: see https://forum.nim-lang.org/t/5124#32243 to get Nim as fast as the original C code.
I.e. everything you learned about C++ performance can apply to Nim (just replace std::vector by seq). If you write Nim code like C code, you get C speed.
Really impressive. You wrote your own optimized GEMM in Nim, and it is on par with the big-name BLAS implementations around.
Do you happen to have an implementation of (or Nim bindings to) dense linear solvers (e.g., partial-pivoting LU)? What do you recommend I use for this in Nim? I have seen a library called Neo; I wonder if it is actively developed (the last commit is from 6 months ago).
Neo's author @andreaferretti is still around but with much less time than before.
He maintains nimlapack, a binding to LAPACK, which is the standard Fortran/C interface for linear algebra routines.
It's a super low-level interface where you need to pass all the temporary buffers yourself, so it's a bit annoying to use; here are examples of how I dealt with it:
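(To give a flavor of how low-level it is, here is a minimal sketch of my own, not from those examples: a hand-written binding to LAPACK's dgesv, the partial-pivoting LU solver, assuming a system liblapack. Every argument goes by pointer and the caller supplies all buffers, which is roughly the shape nimlapack's procs take too.)
proc dgesv(n, nrhs: ptr cint, a: ptr cdouble, lda: ptr cint,
           ipiv: ptr cint, b: ptr cdouble, ldb: ptr cint,
           info: ptr cint) {.importc: "dgesv_", dynlib: "liblapack.so(|.3)".}

proc solve(a: var seq[float], b: var seq[float]) =
  ## Solves A x = b in place: `a` is n*n column-major, `b` has length n.
  var
    n = cint(b.len)
    nrhs = cint(1)
    info: cint
    ipiv = newSeq[cint](b.len)  # pivot buffer, supplied by the caller
  dgesv(addr n, addr nrhs, addr a[0], addr n,
        addr ipiv[0], addr b[0], addr n, addr info)
  doAssert info == 0, "dgesv failed: info = " & $info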