type
  Euro = distinct int
  Color = 0..1

var
  i: int
  f: float
  e: Euro
  c: Color

# generally nonsense
f = cast[float](i)
i = cast[int](f)

# expensive conversions
f = i.float
i = f.int

# cheap conversions, we should get them for free
e = i.Euro
i = e.int

# and these? May be cheap if the sizes are equal
c = i.Color
i = c.int

# so we may like to have
i = safeCast[int](c)
# That would complain if a safe cast does not work, so we would have to use a conversion.
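Something like this generic template might express the idea (just a sketch, taking "same size" as the only criterion for safety; safeCast itself is of course hypothetical):

type Color = 0..1

# Hypothetical safeCast: allow the bit reinterpretation only when
# source and target occupy the same number of bytes; otherwise fail
# at compile time, forcing an explicit (possibly expensive) conversion.
template safeCast[T](x: typed): T =
  when sizeof(x) != sizeof(T):
    {.error: "safeCast: sizes differ, use a conversion".}
  cast[T](x)

var c: Color = 1
# Color needs 1 byte, int needs 8, so this would not compile:
# let i = safeCast[int](c)
let i = int(c) # explicit conversion instead
echo i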
people confuse 'cast' and type conversions all the time already.
The reason is that the difference is rarely explained well. I cannot remember a really good explanation in the Nim docs -- maybe I forgot it, or it was not there 18 months ago when I read them. And I have some books about C and other languages that do not explain the difference well either. But we know the difference.
Nim does not allow mathematical operations between float and int without a conversion -- it was my impression that performance was one reason?
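For example:

var
  i = 3
  f = 0.5

# echo i + f     # does not compile: type mismatch (int vs. float)
echo i.float + f # the conversion must be written out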
I generally avoid casts and conversions, but, for example, when we have a distinct int and do arithmetic, we may have to write a.int + b.int. I think that is free of additional cost.
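A small sketch: the {.borrow.} pragma lifts the base type's operator onto the distinct type, so both variants below should compile to a plain integer addition.

type Euro = distinct int

# borrow `+` from int; this costs nothing at runtime
proc `+`(a, b: Euro): Euro {.borrow.}

let x = 2.Euro
let y = 3.Euro
let s1 = (x.int + y.int).Euro # explicit conversions
let s2 = x + y                # borrowed operator
echo s1.int, " ", s2.int      # 5 5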
But I think my idea is not too far off. For this piece of code
proc main =
  var
    x: float
    j: int
  x = 0
  j = 0
  for i in 0..10000000:
    j += (i.float + 3.1).int
    # j += i + 3
  echo j

main()
it is three times faster when I replace the float expression in the loop with the commented-out int expression. Of course that is not a good comparison; the float add may be slower as well.
From http://stackoverflow.com/questions/12920700/floating-point-conversions-and-performance we get that such conversions are not very costly.
Stefan_Salewski: Of course that is not a good comparison, the float add may be slower as well.
What did you benchmark this with? With -d:release, they are both instantaneous for me (because the compiler optimizes most of the code completely away).
The following two pieces of code take identical time for me with -d:release:
proc main =
  for i in 1..1000000000:
    var x {.volatile.} = i

main()
proc main =
  for i in 1..1000000000:
    var x {.volatile.} = i.float

main()
Granted, this probably also has to do with the FP unit not having anything else to do on a superscalar processor. And in the following example, the version with conversions from float is actually faster (again, probably an artifact of superscalar execution being able to do more in parallel):
proc main =
  var x {.volatile.}: int
  for i in 1..1000000000:
    x += i
  echo x

main()
proc main =
  var x {.volatile.}: float
  for i in 1..1000000000:
    x += i.float
  echo x

main()
(And, yes, the float version gives a different result: the exact sum, n(n+1)/2 ≈ 5.0e17, lies far beyond 2^53 ≈ 9.0e15, the largest range in which a 64-bit float can represent every integer exactly, so the accumulation loses precision.)
Nim 0.13 with -d:release and gcc 5.3.
But my box is an older AMD64; maybe float addition is much slower there, I do not know. I tested multiple times.
Yes, it is difficult to test; I tried a loop that is not fully removed by gcc.
stefan@AMD64X2 ~/nimtoychess $ time ./x
50000035000003

real    0m0.003s
user    0m0.000s
sys     0m0.002s

stefan@AMD64X2 ~/nimtoychess $ time ./x
100000070000006

real    0m0.071s
user    0m0.069s
sys     0m0.002s
the version with conversions from float is actually faster
That is really interesting. I know that float is comparably fast to int on modern hardware.
I remember you told someone to avoid using float in code where int can be used. I generally do that.
[EDIT]
http://forum.nim-lang.org/t/533/2
You wrote: "A second is that using floating point operations unnecessarily and extensively hurts hyperthreading opportunities."
[EDIT2]
Indeed, you are right: for int the loop is completely removed by gcc, so the number of loop iterations does not matter :-(
Besides the hyperthreading resource issues Jehan mentioned in your link, there is also a possible context switch boost if your process/thread literally never uses FP. The OS can avoid saving/restoring most of the register state in that case, and this can speed up each and every context switch (back in the day an improvement on the order of 3X, obviously very CPU-dependent). In these days of SSE/AVX-optimized string operations, it has become more rare for even purely non-numerical programs to never touch those registers, though. And, of course, only context switch-heavy workloads benefit. This is also a kind of "subtle background cost" that only a careful benchmark might reveal.
It's just yet another possible origin for the advice of "use ints not floats if you can". Another much older reason goes back to "not all CPUs even have FPUs". :-)
Advice and guidelines are just that, though - almost never a substitute for benchmarks/timing, and mileage can vary for so very many reasons - compilers, optimization flags, CPUs, etc.
# That would complain if a safe cast does not work, so we would have to use a conversion.
I don't see the point of using safeCast rather than a conversion. If the conversion is cheap then great. If it's not, you had to pay the cost anyway. Why write it with safeCast, get a compile-time error (which might be conditional on how the types are defined), and then rewrite it with a conversion, rather than just write it with a conversion in the first place?
If the goal is to find out about expensive code that might be rewritten to be cheaper, a language feature like safeCast isn't the way to do that. Rather, develop static analysis tools that warn of expensive operations or, better, do profiling. Rewriting something just for the sake of performance should only be done when an actual significant performance issue has been identified. As they say, "premature optimization is the root of all evil."
If the goal is to find out about expensive code that might be rewritten to be cheaper,
Yes, that was the core of my idea. Sometimes I do not know the cost of a conversion. OK, the cost may be very small for all of the conversions Nim allows, as Araq said. In the microcontroller area costs matter more: for example, one should avoid mixed arithmetic operations on signed and unsigned data types, or on operands of different byte sizes. When the compiler does conversions silently, or you use conversions explicitly assuming zero cost, you may lose performance.
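A small illustration (the values are made up; on an 8-bit microcontroller the widening below costs real instructions, while on a desktop CPU it is essentially free):

var
  small: uint8 = 200
  wide: int16 = -100

# echo small + wide      # rejected: mixed signedness and size
echo int16(small) + wide # the widening is explicit, so its
                         # (platform-dependent) cost stays visible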
Stefan_Salewski: I can remember you told someone to avoid usage of float for code where int can be used.
This was about (1) unnecessary use of float operations (2) in a library.
An application can make more assumptions than a library can. If an application is using integer-only code with the goal of exploiting hyperthreading opportunities, then a library that unnecessarily uses floating point operations can hurt that; if an application author knows that their program is only single-threaded, then they want to use their CPU's resources as efficiently as possible. Applications can have that knowledge; libraries generally don't.
This, again, can change in libraries that perform huge chunks of computations by themselves (such as libraries for scientific computing). As always, when it comes to optimization, things are rarely black and white.
Note also that even if you're intentionally offloading integer computations to an FPU, you're creating technical debt. The code may become more involved than necessary, it may not be possible to use vectorization techniques if you mix and match int and float computations, your code's behavior with respect to overflow changes, and so forth. This is generally why you do such optimizations only if you know that you need them and if you are aware of the trade-offs. (Knuth's "premature optimization is the root of all evil" quote is really about the minimization of technical debt once you get down to it.)
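For instance, the overflow point is easy to see in Nim (assuming the default runtime checks; -d:release turned them off in this Nim version):

var a = high(int)
# a += 1        # raises an overflow error when checks are enabled

var f = 1.0e308
f *= 10.0
echo f          # float arithmetic does not raise here; it silently yields inf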
Knuth's "premature optimization is the root of all evil" quote is really about the minimization of technical debt once you get down to it.
That's a good observation as far as the "evil" part goes, but it's also about misplaced effort -- YAGNI.