nimforum mirror - Generating efficient hardware-specific code

dwhite85 [2014-07-25T21:51:55+02:00]

Hello,

First off, I'm not well versed in Nimrod, so please have mercy. A coworker and I were discussing how to generate good hardware-specific C code from Nimrod, and we couldn't decide how to make it work for one example. Is the following code translation feasible in Nimrod?

Example base code in Nimrod


for i in 0..len-1:
  p[i] = a*p[i] + b

For an ARM Cortex-M, the straightforward C equivalent is slow. It turns out there's a bug in armcc: it thinks MACs (multiply-accumulate instructions) are fast, when really they should be avoided like the plague. It's also beneficial to group all loads and stores together, as N of them back to back take N+1 cycles.
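For reference, the "C equivalent" mentioned above might look like the following minimal sketch. The function name `scale_offset_naive`, the `int` element type, and the `size_t` length are assumptions made for illustration; the post does not specify them.

```c
#include <stddef.h>

/* Straightforward C equivalent of the Nimrod loop p[i] = a*p[i] + b.
   Assumptions: int elements, hypothetical function name. */
void scale_offset_naive(int *p, size_t len, int a, int b)
{
    for (size_t i = 0; i < len; i++) {
        /* Written as one expression, so the compiler is free to
           emit a single multiply-accumulate for it. */
        p[i] = a * p[i] + b;
    }
}
```

Each iteration here does one load, one multiply-add, and one store, which is the pattern the post describes as slow on this target.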

Ideal output code in C after unrolling by 2 for simplicity

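int p0, p1;  /* element type assumed to be int */
for (int i = 0; i < len/2; i++) {
    /* loads grouped together */
    p0 = p[2*i+0];
    p1 = p[2*i+1];
    /* multiply and add kept as separate steps (no MAC) */
    p0 *= a;
    p1 *= a;
    p0 += b;
    p1 += b;
    /* stores grouped together */
    p[2*i+0] = p0;
    p[2*i+1] = p1;
}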

//cleanup loop code omitted
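For completeness, here is one way the unrolled loop and its omitted cleanup could fit together. This is only a sketch: the function name `scale_offset_unrolled`, the `int` element type, and the form of the cleanup step are assumptions, not the poster's actual code.

```c
#include <stddef.h>

/* Unrolled-by-2 version of p[i] = a*p[i] + b with the cleanup step included.
   Assumptions: int elements, hypothetical function name. */
void scale_offset_unrolled(int *p, size_t len, int a, int b)
{
    size_t pairs = len / 2;

    for (size_t i = 0; i < pairs; i++) {
        /* Loads grouped together. */
        int p0 = p[2*i + 0];
        int p1 = p[2*i + 1];
        /* Multiply and add written as separate steps. */
        p0 *= a;
        p1 *= a;
        p0 += b;
        p1 += b;
        /* Stores grouped together. */
        p[2*i + 0] = p0;
        p[2*i + 1] = p1;
    }

    /* Cleanup: an odd len leaves exactly one trailing element. */
    if (len % 2 != 0) {
        int last = p[len - 1];
        last *= a;
        last += b;
        p[len - 1] = last;
    }
}
```

Note that statement order in C source does not bind the compiler: an optimizer may still reorder the accesses or fuse the multiply and add, so the generated assembly would need to be checked on the actual toolchain.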

It would be really neat if I could write simple code like the base code, choose my target hardware, and have relatively ideal C code generated based on some rules.

Jehan [2014-07-25T23:29:31+02:00]

Loop unrolling is tricky (there's a pragma for it, but it isn't implemented), but you may be able to do something like the following to generate specific code for certain types of expressions (obviously, you'd put something like a when defined(ARMCortexM) before the template):


template muladd{`+`(`*`(a, b), c)}(a, b, c: int): expr =
  # Rewrites a*b + c into a separate multiply followed by an add.
  block:
    var t: int = a * b
    t += c
    t

var p: array[0..15,int]
for i in 0..len(p)-1:
  p[i] = p[i] * 10 + 5

See term-rewriting macros for more details. Caveat: I don't know how stable their implementation is yet.

Araq [2014-07-26T10:33:35+02:00]

Templates are hygienic by default, so there is no need for the block statement here. I mention this because I still see this idiom quite a lot and it makes me sad. ;-)

Term rewriting macros work quite well afaict but the aliasing constraints do not work at all yet.

Mirror of forum.nim-lang.org
