Hi, here are the slides of a talk on HPC I just gave.
https://github.com/jcosborn/cudanim/blob/master/demo3/doc/PP-Nim-metaprogramming-DOE-COE-PP-2017.pdf
I am posting them here with one wish:
I hope the next release of Nim does not break the code.
Wow, awesome library, brilliant slides and amazing work!
Out of curiosity, where did you give this talk?
I work in HPC myself (mostly using Fortran). I liked your slides very much. However, I have a few questions. For now, I'll focus on the one that puzzles me the most.
Am I misunderstanding the purpose of your ArrayObj type? It seems to me your implementation of macro indexArray*(x: ArrayObj{call}, y: ArrayIndex): untyped is a little inflexible. After all, not every function returning an ArrayObj will be just doing element-wise calculations (or should it?)... I guess this is a good case for procedure-modifying macros: each annotated proc would be added to a compile-time collection, and indexArray would then choose the right transformation based on the data in that collection (also at compile time). Some special cases could also be handled more easily this way. For example, let's consider the difference between element-wise addition and multiplication on the one hand and circular shifting on the other:
proc `+`*(x: ArrayObj, y: ArrayObj): ArrayObj {.elemental.}
proc `*`*(x: ArrayObj, y: ArrayObj): ArrayObj {.elemental.}
# (x + y * z)[i] --> x[i] + y[i] * z[i]
replace:
  proc cshift*(x: ArrayObj, shift: int): ArrayObj = ...
  # runtime variant: wrap around when the shifted index runs past the end
  proc opt(x: ArrayObj, shift: int, i: SomeInteger): auto =
    if shift + i >= x.len:
      x[shift + i - x.len]
    else:
      x[shift + i]
  # compile-time variant: the same branch, resolved with `when`
  proc opt(x: ArrayObj, shift: static[int], i: static[SomeInteger]): auto =
    when shift + i >= x.len:
      x[shift + i - x.len]
    else:
      x[shift + i]
...
# if x.len == 10:
#   x.cshift(2)[5]  -->  x[7]
#   x.cshift(2)[9]  -->  x[1]
Actually, the elemental macro is quite simple, unless we want it to use vectorization or other additional optimizations, of course.
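To make the element-wise rewrite concrete, here is a minimal sketch of the expression transformation only, not of the elemental pragma itself. The macro name elementAt is hypothetical, plain seq[float] stands in for ArrayObj, and the sketch assumes every identifier in the expression names an array and every infix operator is element-wise.

```nim
import macros

macro elementAt(e: untyped, i: untyped): untyped =
  ## Hypothetical helper: rewrite (x + y * z) into x[i] + y[i] * z[i].
  ## Assumption: every identifier in `e` is an array, every operator is element-wise.
  proc rewrite(n, i: NimNode): NimNode =
    case n.kind
    of nnkIdent:
      # a bare array name: x --> x[i]
      result = newTree(nnkBracketExpr, n, copyNimTree(i))
    of nnkInfix:
      # keep the operator (n[0]) and rewrite both operands
      result = newTree(nnkInfix, n[0], rewrite(n[1], i), rewrite(n[2], i))
    of nnkPar:
      # parenthesized sub-expression: rewrite its content
      result = newTree(nnkPar, rewrite(n[0], i))
    else:
      # literals and anything unrecognized are left untouched
      result = n
  result = rewrite(e, i)

let
  x = @[1.0, 2.0, 3.0]
  y = @[4.0, 5.0, 6.0]
  z = @[2.0, 2.0, 2.0]
var r = newSeq[float](x.len)
for i in 0 ..< x.len:
  # expands to x[i] + y[i] * z[i], so no temporary arrays are created
  r[i] = elementAt(x + y * z, i)
```

A real elemental pragma would additionally have to record which procs are element-wise, so that calls to anything else (such as a shift) are not rewritten this way.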
Am I misunderstanding the purpose of your ArrayObj type? It seems to me your implementation of macro indexArray*(x: ArrayObj{call}, y: ArrayIndex): untyped is a little inflexible. After all, not every function returning an ArrayObj will be just doing element-wise calculations (or should it?)...
You are absolutely correct. The current definition of indexArray in the repository is incomplete. You can find an improved one at
https://github.com/jcosborn/qex/blob/devel/src/new/fieldProxy.nim#L100
Your suggestion of using a macro to annotate procs and maintain a global compile-time list of element-wise calculations is a good option, too. It might be a nice way for users to define their own element-wise operations. I will consider this. (J's rank system is in the back of my head, but I will not go down that path anytime soon.)
The complexity of shift grows quickly once MPI and vectorization are involved. In QEX, we have something similar, which is probably the most complicated piece in the code base.
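For comparison, the single-process case without MPI or vectorization reduces to a plain index map. The following is a minimal sketch under that assumption; cshiftIndex is a hypothetical helper name, seq[T] stands in for ArrayObj, and none of this reflects how QEX implements its shifts.

```nim
proc cshiftIndex(n, shift, i: int): int =
  ## Source index for element `i` of an array of length `n`
  ## circularly shifted by `shift`.
  ## Assumption: 0 <= shift < n and 0 <= i < n.
  let j = shift + i
  if j >= n: j - n else: j

proc cshift[T](x: seq[T], shift: int): seq[T] =
  ## Materializes the shifted array. A lazy variant would only
  ## call cshiftIndex when an element is accessed.
  result = newSeq[T](x.len)
  for i in 0 ..< x.len:
    result[i] = x[cshiftIndex(x.len, shift, i)]

let a = @[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
let b = a.cshift(2)   # b[5] reads a[7], b[9] wraps around to a[1]
```

With MPI, the wrap-around branch becomes a communication with a neighboring rank, and with vectorization the index map has to respect the lane layout, which is where the extra complexity comes from.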