Certainly, as a scientific programmer, it's convenient to have this notation, and ^ has the right (high) precedence:
import math

proc `^`(x: float, y: float): float = pow(x, y)

proc `^`(x: float, n: int): float =
  result = 1.0
  for i in 1..n: result *= x

echo 10 + 2 ^ 3
echo 2 ^ pi
Of course one can do better for the integer case, see
With generics specialized for particular integers, one could do even better for the common cases like ^2 and ^3
steve d
Cool - yes, that's one Pascal notation I never liked much.
I shall find a suitably optimized version for (float,int) and do a few microbenchmarks.
Before I make the commitment, here is a simple test rig and some numbers:
import times, math

const N = 90000000

proc `^`(x, y: float): float = pow(x, y)

proc `^`(x: float, n: int): float =
  let neg = n < 0
  var m = n
  if neg:
    m = -n
  if m == 2:
    result = x*x
  elif m == 3:
    result = x*x*x
  elif m == 4:
    let sqr = x*x
    result = sqr*sqr
  else:
    return pow(x, float(n))
  if neg:
    if n == 3: result = -result
    result = 1.0/result

var
  y = 2.3
  x: float

template timeThis(msg: string, code: stmt): stmt {.immediate.} =
  block:
    let t0 = cpuTime()
    for i in 1..N:
      code
    echo msg, ": ", int(1000*(cpuTime() - t0)), "ms"

x = 0.0
timeThis("testing ^2.0"):
  x += y ^ 2.0

x = 0.0
timeThis("testing ^2"):
  x += y ^ 2

x = 0.0
timeThis("testing sqr"):
  x += y * y

x = 0.0
timeThis("testing ^3"):
  x += y ^ 3

x = 0.0
timeThis("testing ^4"):
  x += y ^ 4

x = 0.0
timeThis("testing ^3.0"):
  x += y ^ 3.0

x = 0.0
timeThis("testing ^-3"):
  x += y ^ -3
And the results are typically:
testing ^2.0: 140ms
testing ^2: 109ms
testing sqr: 99ms
testing ^3: 110ms
testing ^4: 99ms
testing ^3.0: 3689ms
testing ^-3: 99ms
We don't get much advantage over straight pow for 2.0, but for 3.0 & 4.0 there is an enormous improvement. So at least squares and cubes should be optimized for; fourth-power is perhaps not so common?
Any feelings about this approach of specializing for integer powers?
^(int,int) could also be optimized, particularly for powers-of-two, but then there's always the shift operations.
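For the int case, a minimal sketch (my own, not from the thread) of exponentiation by squaring, which needs only O(log n) multiplications and covers power-of-two bases as a special case of the general loop:

```nim
# Sketch of `^` for non-negative int exponents using exponentiation
# by squaring: O(log n) multiplications instead of n.
proc `^`(x: int, n: int): int =
  assert n >= 0
  var base = x
  var e = n
  result = 1
  while e > 0:
    if (e and 1) == 1:   # odd exponent: fold in the current base
      result *= base
    base *= base         # square for the next binary digit
    e = e shr 1

echo 3 ^ 10   # 59049
```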
The following non-overloaded version does just as well (there was a bogus line switching sign for m == 3 in the last version anyhow):
proc `^`(x: float, y: float): float =
  let neg = y < 0
  var m = y
  if neg:
    m = -y
  if m == 2.0:
    result = x*x
  elif m == 3.0:
    result = x*x*x
  elif m == 4.0:
    let sqr = x*x
    result = sqr*sqr
  else:
    return pow(x, y)
  if neg:
    result = 1.0/result
Scary to directly compare floating-point numbers like that, but 3 does convert to 3.0 exactly. The result of an arbitrary calculation would not match so reliably, of course.
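To make that concrete, a small check (assuming standard IEEE 754 doubles):

```nim
# Small integral values are exactly representable in a float,
# so comparing a literal exponent with == is safe here:
doAssert float(3) == 3.0
doAssert 1.0 + 2.0 == 3.0     # exact: all operands are small integers
# An arbitrary computed value need not compare equal:
doAssert 0.1 + 0.2 != 0.3     # classic binary rounding artifact
echo "ok"
```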
I think the proper way to do this in nimrod is to detect if the power parameter is known at compile-time and optimize only for this case, without any run-time dispatching. This can be done with the undocumented static params (expr[T]).
Alternatively, something similar can be achieved with a pattern matching rule.
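For illustration, a hypothetical sketch of that compile-time specialization, written here with the `static[int]` spelling rather than the undocumented `expr[T]` form the post refers to (the exact syntax of that feature may differ):

```nim
import math

# With a static parameter the exponent is a compile-time value, so the
# `when` branches are resolved during compilation -- no run-time dispatch.
proc `^`(x: float, n: static[int]): float =
  when n == 2:
    result = x * x
  elif n == 3:
    result = x * x * x
  else:
    result = pow(x, float(n))

echo 2.0 ^ 3   # 8.0
```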
If run-time dispatching is used, an early exit can be helpful:
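The early-exit code itself isn't shown in the post; a plausible sketch (my own) that tests the most frequent exponents first and returns immediately:

```nim
import math

# Early-exit dispatch: the most common exponents are checked first and
# return at once, so the hot path pays for only one or two comparisons.
proc `^`(x: float, n: int): float =
  if n == 2: return x * x        # most common case first
  if n == 3: return x * x * x
  if n == 4:
    let sqr = x * x
    return sqr * sqr
  pow(x, float(n))               # general fallback, handles negatives too

echo 2.0 ^ 3   # 8.0
```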
Most interesting - where does the isConst pseudo-function come from? And if a const, how can the actual value be extracted?
It remains to be seen whether there's actually any advantage to that specialization, but that's an experiment that must be done (i.e. profile then optimize)
This cries for TR macros:
template optPow2*{x ^ 2.0}(x: float): expr =
  let a = x
  a * a

template optPow3*{x ^ 3.0}(x: float): expr =
  let a = x
  a * a * a
But beware we still have very few tests for TR macros/templates so it might not work yet.
Most cool, pattern matching in expressions!
However, if my tests are to be believed, I'm not seeing any significant runtime difference between these cases:
timeThis("summing with ^", 1000000):
  sum = 0.0
  for i in 1..256:
    sum += float(i)^2
  assert(sum == 5.6252160000000000e+06)

timeThis("summing with sqr", 1000000):
  sum = 0.0
  for i in 1..256:
    let ix = float(i)
    sum += ix*ix
  assert(sum == 5.6252160000000000e+06)
So any compile-time specialization would be an academic exercise, no?
(This is using the single float version, no separate int overload.)