The following code is a very preliminary attempt at developing an autodiff library in Nim (for now, mainly to learn about Nim's powerful template and possibly macro constructs).
type AddOp = object
type SubOp = object
type MulOp = object
type DivOp = object
type
  DualExpr = concept x
    isDual(x)
  Operator = concept x
    isOp(x)
  Dual[T] = object
    val: T
    grad: T
  DualUnaryExpr[Op: Operator, X: DualExpr] = object
    x: X
  DualBinaryExpr[Op: Operator, L, R: DualExpr] = object
    l: L
    r: R
type dual = Dual[float]
template isDual[T](x: T): bool = false
template isDual[T](x: Dual[T]): bool = true
template isDual[Op, X](x: DualUnaryExpr[Op, X]): bool = true
template isDual[Op, L, R](x: DualBinaryExpr[Op, L, R]): bool = true
template isOp[T](x: T): bool = false
template isOp(x: AddOp): bool = true
template isOp(x: SubOp): bool = true
template isOp(x: MulOp): bool = true
template isOp(x: DivOp): bool = true
template `+`[L, R](l: L, r: R) =
  DualBinaryExpr[AddOp, L, R](l: l, r: r)
var x = dual(val: 1.0, grad: 1.0)
var y = dual(val: 2.0, grad: 0.0)
var z = x + y
echo isDual(1.0)
echo isDual(x)
echo isDual(y)
echo isDual(z)
I get the following output (https://play.nim-lang.org/#ix=1TVX):
Hint: used config file '/nim/config/nim.cfg' [Conf]
Hint: system [Processing]
Hint: widestrs [Processing]
Hint: io [Processing]
Hint: in [Processing]
/usercode/in.nim(45, 11) template/generic instantiation of `+` from here
/usercode/in.nim(40, 32) Error: undeclared field: 'x'
What am I missing here?
Eventually, I want to be able to perform autodiff computations such that an expression tree is created at compile time and the assignment operator consumes this expression tree in an optimal way (e.g., avoiding temporaries, optimizing the mathematical operations, etc.).
The template substitutes both sides of the field initialization because you used the same parameter names as the field names. This is the expanded expression:
DualBinaryExpr[AddOp, L, R](x: x, y: y)
You just have to rename the params (and give it a return type, but that's a separate issue).
template `+`[L, R](lhs: L, rhs: R) =
  DualBinaryExpr[AddOp, L, R](l: lhs, r: rhs)
This allows for code like:
type
  Foo = object
    a, b: int
template constructWith(field: untyped, val: untyped): Foo =
  Foo(field: val)
echo constructWith(a, 1)
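Here the template substitutes the actual arguments for both field and val, so the call expands to Foo(a: 1) and prints (a: 1, b: 0). It is the same substitution that clobbered the l and r fields in your + template.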
Compile-time autodifferentiation is something that I'm also very interested in and will add to my deep learning compiler project Laser/Lux.
I suggest you have a look at my experiments on compile-time computation graphs at https://github.com/mratsim/compute-graph-optim. I go through progressive, step-by-step experiments to build a compile-time graph; they start simple (no macros, pure generics) and grow very complicated.
I've also looked into a couple of representations, object algebras and typed tagless-final encoding, before settling on a combination: a deep embedding using ADTs plus a shallow embedding using functions built on the deep embedding, as suggested in this paper. My full research on computation-graph representations is available here.
Ultimately, I've implemented a proof-of-concept compiler in Nim macros using the following tree representation, which is very similar to the one Nim macros use and which I've rewritten 3 times already.
As I want both compile-time and JIT code generation in the future (including autodiff), and I want to reuse the data structure, my design is a tiny bit more complex, because I can't use generics/concepts at runtime. Instead, I have my own AST and compile-time symbolic function evaluation that mirrors runtime function evaluation + codegen from the AST tree.
A couple of comments:
You have taken the object algebra approach, which means that all operations will be encoded in the generic type system. This has the following limitations:
Many computation-graph libraries in other languages use the Visitor Pattern; you can't do that at compile time in Nim, as inheritance + methods don't work at compile time. The preferred way (and the way the Nim compiler and macros are implemented) is to use ADTs/object variants.
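For illustration, here is a minimal sketch of the object-variant encoding (the names are mine, not from the repo): each operation is a node kind, and a pass such as symbolic differentiation is plain case dispatch rather than a visitor.
type
  ExprKind = enum ekConst, ekVar, ekAdd, ekMul
  Expr = ref object
    case kind: ExprKind
    of ekConst: val: float
    of ekVar: name: string
    of ekAdd, ekMul: lhs, rhs: Expr

proc diff(e: Expr, wrt: string): Expr =
  ## Symbolic derivative: the kind of pass a compile-time graph can run.
  case e.kind
  of ekConst: Expr(kind: ekConst, val: 0.0)
  of ekVar: Expr(kind: ekConst, val: (if e.name == wrt: 1.0 else: 0.0))
  of ekAdd: Expr(kind: ekAdd, lhs: e.lhs.diff(wrt), rhs: e.rhs.diff(wrt))
  of ekMul:
    # product rule: (l*r)' = l'*r + l*r'
    Expr(kind: ekAdd,
         lhs: Expr(kind: ekMul, lhs: e.lhs.diff(wrt), rhs: e.rhs),
         rhs: Expr(kind: ekMul, lhs: e.lhs, rhs: e.rhs.diff(wrt)))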
@jasper Thanks a lot. The following worked (I had to add the auto return type as you mentioned):
template `+`[L, R](lhs: L, rhs: R): auto =
  DualBinaryExpr[AddOp, L, R](l: lhs, r: rhs)
@mratsim Your tensor lib Lux seems quite impressive. My idea is to have an autodiff library in Nim similar to the one I developed in C++: autodiff.github.io. It relies heavily on template metaprogramming, if constexpr, SFINAE, etc., to optimize the expression tree (whenever possible) and avoid temporaries.
There are forward- and reverse-mode automatic differentiation algorithms in there. My research needs, however, rely mainly (if not only) on forward mode, so forward mode is where I have spent most of my available development time in the autodiff lib.
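For reference, forward mode is essentially dual-number arithmetic; here is a minimal eager sketch of just the val/grad propagation rules (no expression tree yet):
type Dual = object
  val, grad: float

proc `+`(a, b: Dual): Dual =
  Dual(val: a.val + b.val, grad: a.grad + b.grad)

proc `*`(a, b: Dual): Dual =
  # product rule
  Dual(val: a.val * b.val, grad: a.grad * b.val + a.val * b.grad)

let x = Dual(val: 3.0, grad: 1.0)  # seed: dx/dx = 1
let y = Dual(val: 2.0, grad: 0.0)
echo x * y + x  # d(x*y + x)/dx = y + 1, so this prints (val: 9.0, grad: 3.0)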
I'm quite new to Nim; I just started exploring it yesterday and found it quite neat. Right now I just want to experiment with it and check whether it could be used for advanced scientific computing (for problems that involve numerical optimization, solving ODEs, PDEs, etc.).
Nim provides an easy way to optimize expressions at compile time via term-rewriting templates (macros), which can be used to rewrite a*x + b into a fused multiply-add, exp(x) - 1 into expm1, or ln(x + 1) into ln1p (log1p in <math.h>). They only work if the expression is done in a single line, though, so the techniques you used in C++ are probably more general.
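For example, here is a minimal sketch of such a rewrite rule (c_fma is my own binding to C's fma; check whether your Nim version's std/math already exposes one):
proc c_fma(a, b, c: float64): float64 {.importc: "fma", header: "<math.h>".}

# The pattern in {} is matched against the typed AST: any single expression
# of the shape a*x + b on float64s is rewritten into a fused multiply-add.
template fuseMulAdd{a * x + b}(a, x, b: float64): float64 =
  c_fma(a, x, b)

let t = 2.0
echo 3.0 * t + 1.0  # compiled as c_fma(3.0, t, 1.0), prints 7.0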
In terms of needs, I only use reverse-mode autodiff, so there is no overlap here. Actually, reverse-mode autodifferentiation was how I started with Nim (https://github.com/mratsim/nim-rmad), after doing some Project Euler with it.
For collaboration you might want to talk to Hugo Granstrom, who wrote some ODE solvers in Nim (https://github.com/HugoGranstrom/numericalnim); I'm mostly working on machine learning, so I only need reverse-mode autodiff. Given the growing number of scientists in Nim (a couple are in bio as well), maybe an IRC+Gitter+Discord+Matrix (whatever is popular) channel for Nim/science might be interesting.
Regarding what I'm planning:
Laser started as research into improving the Arraymancer backend. I have a couple of issues with the Arraymancer backend:
The reasons are multiple:
So I started Laser with small goals:
but then I decided to write a DSL + compiler, inspired by the Halide DSL for image processing and its gradient Halide extension for auto-differentiation:
In the Lux repo there are several documents that go into the details of:
This is all very instructive; thanks a lot. I'm wondering if you have benchmarked your Nim code against, say, C/C++ alternatives (or even a simplified C++ code you wrote yourself). Dealing with a GC is something new to me; I'm concerned about how it would impact numerical simulations, for example, if dynamically allocated vectors/matrices are everywhere in the code.
I had a look at the term-rewriting templates; I don't think this is what I want to use (although it could complement the approach I'm trying, based on objects that represent expression nodes). You said, "They only work if the expression is done in a single line, though, so the techniques you used in C++ are probably more general." I'm wondering if there is a way to achieve a similar effect in Nim, then? Wouldn't generic templates permit this (rather than the term-rewriting templates)?
Nim implementation: https://github.com/numforge/laser/blob/2f619fdbb2496aa7a5e5538035a8d42d88db8c10/laser/primitives/matrix_multiplication/gemm.nim
Note that the temporary buffers are managed by GC (ref object): https://github.com/numforge/laser/blob/2f619fdbb2496aa7a5e5538035a8d42d88db8c10/laser/primitives/matrix_multiplication/gemm_tiling.nim#L251
In short, if you don't use ref/seq/string (GC-ed types) in a critical path, the GC won't trouble you.
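A toy sketch of that principle (my example, not from Laser): allocate the GC-managed buffers once, outside the hot loop, and reuse them, so the loop body never allocates.
proc simulate(steps, n: int): float =
  var
    state = newSeq[float](n)    # GC-managed, allocated once up front
    scratch = newSeq[float](n)  # reusable temporary buffer
  for _ in 0 ..< steps:
    for i in 0 ..< n:
      scratch[i] = 0.5 * state[i] + 0.1  # no allocation in the hot loop
    swap(state, scratch)  # O(1) pointer swap, no copy, no GC work
  result = state[0]

echo simulate(100, 1_000_000)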
Weave is as fast as the original C code and faster than both GCC/Clang OpenMP and Intel Threading Building Blocks.
Last example, raytracing: see https://forum.nim-lang.org/t/5124#32243 to get Nim as fast as the original C code.
I.e. everything you learned about C++ performance can apply to Nim (just replace std::vector by seq). If you write Nim code like C code, you get C speed.
Really impressive. You wrote your own optimized GEMM in Nim, and it is on par with the big-name BLAS implementations around.
Do you happen to have an implementation of (or Nim bindings to) dense linear solvers (e.g., partial-pivoting LU)? What do you recommend I use for this in Nim? I have seen a library called Neo; I wonder if it is actively developed (the last commit is from 6 months ago).
Neo's author @andreaferretti is still around but with much less time than before.
He maintains nimlapack, a binding to LAPACK, which is the standard Fortran/C interface for linear algebra routines.
It's a super low-level interface where you need to pass all the temporary buffers yourself, so it's a bit annoying to use; here are examples of how I dealt with it:
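(To give a flavor of how low-level it is, here is a minimal sketch of my own, not from those examples: a hand-written binding to LAPACK's dgesv, the partial-pivoting LU solver, assuming a system liblapack. Every argument goes by pointer and the caller supplies all buffers, which is roughly the shape nimlapack's procs take too.)
proc dgesv(n, nrhs: ptr cint, a: ptr cdouble, lda: ptr cint,
           ipiv: ptr cint, b: ptr cdouble, ldb: ptr cint,
           info: ptr cint) {.importc: "dgesv_", dynlib: "liblapack.so(|.3)".}

proc solve(a: var seq[float], b: var seq[float]) =
  ## Solves A x = b in place: `a` is n*n column-major, `b` has length n.
  var
    n = cint(b.len)
    nrhs = cint(1)
    info: cint
    ipiv = newSeq[cint](b.len)  # pivot buffer, supplied by the caller
  dgesv(addr n, addr nrhs, addr a[0], addr n,
        addr ipiv[0], addr b[0], addr n, addr info)
  doAssert info == 0, "dgesv failed: info = " & $info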