I'm working with an external library that requires passing a context variable to all library calls. As I'm writing a DSL to ease using this library, I'd rather hide this context variable from the user of the DSL. Is there a way to achieve something like closure variable capture for public procs?
For instance, the following DSL code

math:
  a = (b + c) * d
would be translated to something like:
template math*(body: untyped) =
  block:
    let ctx {.inject.} = initLib()
    `body`

proc `+`*(a, b: Mat): Mat =
  result = libAdd(ctx, a, b)

proc `*`*(a, b: Mat): Mat =
  result = libMult(ctx, a, b)
where the ctx variable is captured in procs + and *, resulting in
block:
  let ctx {.inject.} = initLib()
  a = libMult(ctx, libAdd(ctx, b, c), d)
if the two procs were templates.
I've been using templates to achieve this result, but:
1. templates are buggy when dealing with generics syntax;
2. templates can give unexpected results when arguments with side effects are evaluated multiple times;
3. although templates are expanded at compile time, syntax errors in the body stay hidden until the template is actually used;
4. the type of a template is only known when it is instantiated;
5. debugging templates is more difficult than debugging standard code; etc.
See the wiki for the risks of using templates.
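As a quick illustration of point 2, here is a minimal sketch (not from my actual code) of the double-evaluation pitfall:

template double(x: int): int = x + x

var calls = 0
proc next(): int =
  inc calls
  result = calls

echo double(next())  # expands to next() + next(): prints 3, not 2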
Ideally, because the DSL code can be used in multiple procs or even in recursive code, I need to capture the latest ctx in the math scope. If I could declare inner closure procs as public, that would do the job... but it can't be done.
Is it possible to do what I want to do?
Nim's stdlib has a with macro that seems to be what you need:
https://nim-lang.org/docs/with.html#with.m%2Ctyped%2Cvarargs%5Buntyped%5D
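with injects the given variable as the first argument of every call in its body. A minimal sketch:

import std/with

var s = "start"
with s:
  add("foo")  # rewritten to add(s, "foo")
  add("bar")
echo s        # startfoobar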
It's possible to avoid templates for the operators if you define them in the math body:
template math*(body: untyped) =
  block:
    proc `+`*(a, b: Mat): Mat =
      result = libAdd(ctx, a, b)
    proc `*`*(a, b: Mat): Mat =
      result = libMult(ctx, a, b)
    let ctx {.inject.} = initLib()
    `body`
In Arraymancer I chose to just pass ctx as a template/macro parameter to network:
https://github.com/mratsim/Arraymancer#handwritten-digit-recognition-with-convolutions
network ctx, DemoNet:
  layers:
    x:          Input([1, 28, 28])
    cv1:        Conv2D(x.out_shape, 20, 5, 5)
    mp1:        MaxPool2D(cv1.out_shape, (2,2), (0,0), (2,2))
    cv2:        Conv2D(mp1.out_shape, 50, 5, 5)
    mp2:        MaxPool2D(cv2.out_shape, (2,2), (0,0), (2,2))
    fl:         Flatten(mp2.out_shape)
    hidden:     Linear(fl.out_shape, 500)
    classifier: Linear(500, 10)
  forward x:
    x.cv1.relu.mp1.cv2.relu.mp2.fl.hidden.relu.classifier
The main difference is that the input x also carries the context ctx in its data structure, so that I don't have to write ctx.add(a, b) every time. https://github.com/mratsim/Arraymancer/blob/4ae9b811/src/arraymancer/autograd/autograd_common.nim#L44-L70
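A minimal sketch of that pattern (Ctx, MatData and libAdd are stand-ins for the real library types and calls):

type
  Ctx = ref object   # stand-in for the library context
  MatData = object   # stand-in for the library's matrix payload
  Mat = object
    ctx: Ctx         # every value remembers the context it was created with
    data: MatData

proc libAdd(ctx: Ctx; a, b: MatData): MatData = discard  # fake binding

proc `+`(a, b: Mat): Mat =
  # the operator retrieves the context from its operands
  Mat(ctx: a.ctx, data: libAdd(a.ctx, a.data, b.data))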
That said, I'm not that happy with the setup, so I plan to write a full-blown math DSL using a compiler approach. A simple expression for matrix multiplication would look like:
proc matmul(A, B: Function): Function =
  ## Generator of the A * B matrix multiplication function
  var i, j, k: Domain

  # The "what"
  # Definition of the result function
  C[i, j] = A[i, k] * B[k, j]

  # The "how"
  # Optional tips for high-performance computing depending on GPU or CPU
  when defined(cuda):
    # Split into chunks of 256 iterations and launch a CUDA thread for each
    C.unroll(i, 256)
     .parallel()
    ...
  else:
    # Iterate on blocks of 96 j, vectorize them using assembly and parallelize over i
    C.tile(j, 96)
     .vectorize
     .parallel(i)
    ...

  # Return
  return C # Matrix multiplication

# `generate` concretizes this definition (the what) and schedule (the how)
generate foobar:
  proc foobar(a: Tensor[float32], b, c: Tensor[float32]): Tensor[float32]
Something with just the "what" is already implemented in Einsum:
https://github.com/mratsim/Arraymancer/blob/4ae9b81/src/arraymancer/tensor/einsum.nim#L512-L525
# implicit Einstein summation
let c = einsum(a, b):
  a[i,j] * b[j,k]

# explicit Einstein summation. Note that the identifier `d` in the statement
# is arbitrary and need not match what will be assigned to.
let d = einsum(a, b):
  d[i,k] = a[i,j] * b[j,k]
Alternatively you might want to change the way your DSL is structured; I have a couple of experiments for computation graph DSLs here: https://github.com/mratsim/compute-graph-optim
For example, using a tagless final approach, you define operations and interpreters for those operations, and the "eval" interpreter can carry your context: https://github.com/mratsim/compute-graph-optim/blob/master/e05_typed_tagless_final.nim
type
  Expr[Repr] = concept x, type T
    lit(T) is Repr[T]
    `+`(Repr[T], Repr[T]) is Repr[T]

  Id[T] = object
    val: T

  Print[T] = object
    str: string

func lit[T](n: T, Repr: type[Id]): Id[T] =
  Id[T](val: n)

func `+`[T](a, b: Id[T]): Id[T] =
  Id[T](val: a.val + b.val)

func lit[T](n: T, Repr: type[Print]): Print[T] =
  Print[T](str: $n)

func `+`[T](a, b: Print[T]): Print[T] =
  Print[T](str: "(" & a.str & " + " & b.str & ")")

func foo(Repr: type): Repr =
  result = lit(1, Repr) + lit(2, Repr) + lit(3, Repr)

echo foo(Id).val    # <----- Use a context here if needed
echo foo(Print).str
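To sketch how an interpreter could carry your context (this one is not in the linked file; Ctx and libAdd are hypothetical stand-ins), evaluation can be deferred until a context is supplied:

type
  Ctx = ref object           # hypothetical library context
  LibEval[T] = object
    run: proc (ctx: Ctx): T  # evaluation is deferred until a ctx is supplied

proc libAdd(ctx: Ctx; a, b: int): int = a + b  # fake library binding

func lit(n: int, Repr: type[LibEval]): LibEval[int] =
  LibEval[int](run: proc (ctx: Ctx): int = n)

func `+`(a, b: LibEval[int]): LibEval[int] =
  LibEval[int](run: proc (ctx: Ctx): int =
    libAdd(ctx, a.run(ctx), b.run(ctx)))

let e = lit(1, LibEval) + lit(2, LibEval)
echo e.run(Ctx())  # the context is supplied once, at evaluation time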
Other techniques I explored include object algebras, attribute grammars, the visitor pattern, catamorphisms, functional lenses, transducers, and the compiler approach I took (and you are leaning towards): shallow embeddings (user-defined functions) composed from deep embeddings (core math functions backed by optimized implementations; your library, in your case).
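A minimal sketch of that last idea (the Ast type and ops are illustrative, not a real API): core ops are deep-embedded as AST nodes, user functions are shallow-embedded as plain Nim procs, and the evaluator is the single place where a library context would be threaded.

type
  AstKind = enum akLit, akAdd, akMul
  Ast = ref object                 # deep embedding: core ops as an AST
    case kind: AstKind
    of akLit: val: float
    of akAdd, akMul: lhs, rhs: Ast

# deep embedding: core math ops build AST nodes
proc lit(v: float): Ast = Ast(kind: akLit, val: v)
proc `+`(a, b: Ast): Ast = Ast(kind: akAdd, lhs: a, rhs: b)
proc `*`(a, b: Ast): Ast = Ast(kind: akMul, lhs: a, rhs: b)

# shallow embedding: user-defined functions are ordinary procs composing the AST
proc axpy(a, x, y: Ast): Ast = a * x + y

# the evaluator would be the single place to pass a library context
proc eval(e: Ast): float =
  case e.kind
  of akLit: e.val
  of akAdd: eval(e.lhs) + eval(e.rhs)
  of akMul: eval(e.lhs) * eval(e.rhs)

echo eval(axpy(lit(2), lit(3), lit(4)))  # 10.0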
From what I understand, I will have to go the macro route and manage the full DSL syntax. I have to think about it before jumping in, as my DSL can be mixed with Nim code and only the calls to the library need to get the context variable (I used a math syntax in my example, but the language is more general).
As H. L. Mencken said, "For every complex problem there is an answer that is clear, simple, and wrong." And using templates to define a large DSL is such an answer...
Thanks for the hints and experience feedback.
For the record, another simple but wrong solution:
It's possible to avoid templates for the operators if you define them in the math body:
template math*(body: untyped) =
  block:
    proc `+`*(a, b: Mat): Mat =
      result = libAdd(ctx, a, b)
    proc `*`*(a, b: Mat): Mat =
      result = libMult(ctx, a, b)
    let ctx {.inject.} = initLib()
    `body`
results in Error: 'export' is only allowed at top level
Inspired by the with macro, I'm thinking of writing a macro to inject the ctx into all libXXX calls while walking the AST. But in order for the procs to compile, I would have to write them like:
# Fake libXXX API to allow compilation
proc lib2Mult(a, b: Mat): Mat = discard
proc lib2Add(a, b: Mat): Mat = discard

# Real libXXX API
{.pragma: libXXX, importc, dynlib: libName, cdecl.}
proc libMult(ctx: Ctx; a, b: Mat): Mat {.libXXX.}
proc libAdd(ctx: Ctx; a, b: Mat): Mat {.libXXX.}

proc `*`*(a, b: Mat): Mat =
  result = lib2Mult(a, b)

proc `+`*(a, b: Mat): Mat =
  result = lib2Add(a, b)
The macro would replace the fake lib2XXX calls with libXXX calls and inject the ctx variable.
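A minimal sketch of such a rewriting pass (assuming the lib2XXX naming convention above and a ctx visible where the rewritten calls end up):

import std/[macros, strutils]

# Recursively walk the AST: for every call whose callee starts with "lib2",
# rename it to "lib..." and prepend `ctx` as the first argument.
proc rewrite(node: NimNode): NimNode =
  if node.kind in {nnkCall, nnkCommand} and node[0].kind == nnkIdent:
    let name = $node[0]
    if name.startsWith("lib2"):
      node[0] = ident("lib" & name[4 .. ^1])
      node.insert(1, ident"ctx")
  for i in 0 ..< node.len:
    node[i] = rewrite(node[i])
  result = node

macro withCtx(body: untyped): untyped =
  result = rewrite(body)

Applied to a proc body or a math block, this would turn lib2Mult(a, b) into libMult(ctx, a, b).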
Some thoughts before starting to code: document well that the convention is to name the parameter ctx, and then have:
{.pragma: libXXX, importc, dynlib: libName, cdecl.}
proc libMult(ctx: Ctx; a, b: Mat): Mat {.libXXX.}
proc libAdd(ctx: Ctx; a, b: Mat): Mat {.libXXX.}

template `*`*(a, b: Mat): Mat = libMult(ctx, a, b)
template `+`*(a, b: Mat): Mat = libAdd(ctx, a, b)
template initMat*(): Mat = libCreateMat(ctx)
User code:
proc code(ctx: Ctx) =
  var x = initMat()
  var y = initMat()
  echo x + y
There are many other ways though.
@spip For the record, another simple but wrong solution
Error: 'export' is only allowed at top level
This one is easy to make right. The + and * operators are defined and used only inside the block, so you do not need to export them. Simply remove the export marker from them and you are done.
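Something like this (a sketch; ctx is also moved before the procs so they can refer to it):

template math*(body: untyped) =
  block:
    let ctx {.inject.} = initLib()
    proc `+`(a, b: Mat): Mat =  # no export marker: local to the block
      result = libAdd(ctx, a, b)
    proc `*`(a, b: Mat): Mat =
      result = libMult(ctx, a, b)
    `body`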