Hello,
First off, I'm not well versed in Nimrod, so please have mercy. A coworker and I were discussing how to generate good hardware-specific C code from Nimrod, and we couldn't decide how to make it work for one example. Is the following code translation feasible in Nimrod?
Example base code in Nimrod:
for i in 0..len-1:
  p[i] = a*p[i] + b
On an ARM Cortex-M, the equivalent C code is slow. It turns out there's a bug in armcc where it thinks MACs are fast, but really they should be avoided like the plague. It's also beneficial to place all loads together and all stores together, as N of them back to back take N+1 cycles.
Ideal output code in C, after unrolling by 2 for simplicity:
for(int i=0;i<len/2;i++) {
    p0 = p[2*i+0];
    p1 = p[2*i+1];
    p0 *= a;
    p1 *= a;
    p0 += b;
    p1 += b;
    p[2*i+0] = p0;
    p[2*i+1] = p1;
}
//cleanup loop code omitted
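For completeness, here is a minimal sketch of the unrolled loop above as a full C function, including the cleanup step the snippet omits. The function name scale_offset_unrolled2, the int element type, and the size_t length are assumptions made for illustration; they are not from the original post. Note that source order is only a hint: a C compiler is free to reschedule these statements or fuse the multiply and add again, so whether the emitted instructions keep this shape depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name and types are assumptions):
 * computes p[i] = a*p[i] + b for every i in [0, len), unrolled by 2. */
void scale_offset_unrolled2(int *p, size_t len, int a, int b)
{
    assert(p != NULL || len == 0);

    size_t pairs = len / 2;
    for (size_t i = 0; i < pairs; i++) {
        /* Loads grouped together. */
        int p0 = p[2 * i + 0];
        int p1 = p[2 * i + 1];

        /* Multiply and add kept as separate steps, as in the post. */
        p0 *= a;
        p1 *= a;
        p0 += b;
        p1 += b;

        /* Stores grouped together. */
        p[2 * i + 0] = p0;
        p[2 * i + 1] = p1;
    }

    /* Cleanup: with an unroll factor of 2, at most one element is left
     * over, and only when len is odd. */
    if (len % 2 != 0) {
        p[len - 1] = a * p[len - 1] + b;
    }
}
```

With a general unroll factor of k, the cleanup step would become a short loop over the remaining len % k elements rather than a single if.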
It would be really neat if I could write simple code like the base code, choose my target hardware, and have some set of rules spit out relatively ideal C code for that target.
Loop unrolling is tricky (there's a pragma for it, but it isn't implemented), but you may be able to do something like the following to generate specific code for certain types of expression (obviously, you'd put something like a when defined(ARMCortexM) before the template):
template muladd{`+`(`*`(a, b), c)}(a, b, c: int): expr =
  block:
    var t: int = a * b
    t += c
    t
var p: array[0..15,int]
for i in 0..len(p)-1:
  p[i] = p[i] * 10 + 5
See term-rewriting macros for more details. Caveat: I don't know how stable their implementation is yet.
Templates are hygienic by default, so there is no need for the block statement here. I mention this because I still see this idiom quite a lot and it makes me sad. ;-)
Term rewriting macros work quite well afaict but the aliasing constraints do not work at all yet.