Yesterday evening I saw the post "Slow performance compared to C++, ideas?" (http://forum.dlang.org/post/[email protected]) on the D forum and wondered how Nimrod performs on this code.
My translation to Nimrod is here: https://gist.github.com/AdrianV/5774141
On my laptop it performs quite well against the C++ code:
Nimrod: 450 ms, C++: 690 ms
I no longer have a working D compiler installed, but I think the result for Nimrod isn't too bad ;-)
Wow, this is very cool.
I'm actually surprised that Nimrod outperforms C++; I wonder why that is.
I have a question about the output, this is what it looks like on my computer (amazing that this is only ~250 lines):
Why are the reflections of the red and dark blue balls shown on the silver ball? Is it because the silver ball is transparent?
I was surprised too that Nimrod was faster. Maybe it is because I implemented Vec3 as a simple array rather than a class and statically unrolled all the operations on it. I also hoisted an invariant operation out of the loop (and there is another one that I overlooked).
I would like to port the code to use SIMD. Does anyone have an example of how to do SIMD with Nimrod?
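The two ideas in that post (a Vec3 stored as a plain array with per-component operations written out, and a loop-invariant value computed once before the loop) can be sketched roughly as follows. This is C++ rather than Nimrod and is not taken from the gist; `Sphere`, `countInsideNaive`, and `countInsideHoisted` are hypothetical names made up for illustration.

```cpp
#include <array>
#include <cassert>

// Vec3 as a plain fixed-size array instead of a class with methods.
using Vec3 = std::array<float, 3>;

// Operations written out per component, with no loop over the 3 lanes.
inline Vec3 sub(const Vec3& a, const Vec3& b) {
    return Vec3{{a[0] - b[0], a[1] - b[1], a[2] - b[2]}};
}

inline float dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Hypothetical sphere type, only for this illustration.
struct Sphere {
    Vec3 center;
    float radius;
};

// Naive version: radius * radius is recomputed on every iteration.
inline int countInsideNaive(const Sphere& s, const Vec3* pts, int n) {
    int hits = 0;
    for (int i = 0; i < n; ++i) {
        const Vec3 d = sub(pts[i], s.center);
        if (dot(d, d) < s.radius * s.radius) ++hits;
    }
    return hits;
}

// Hoisted version: the invariant radius^2 is computed once, before the loop.
inline int countInsideHoisted(const Sphere& s, const Vec3* pts, int n) {
    const float r2 = s.radius * s.radius;
    int hits = 0;
    for (int i = 0; i < n; ++i) {
        const Vec3 d = sub(pts[i], s.center);
        if (dot(d, d) < r2) ++hits;
    }
    return hits;
}

// Small self-check: both versions must agree on the same input.
inline void selfCheck() {
    const Sphere s{Vec3{{0.0f, 0.0f, 0.0f}}, 1.0f};
    const Vec3 pts[2] = {Vec3{{0.5f, 0.0f, 0.0f}}, Vec3{{2.0f, 0.0f, 0.0f}}};
    assert(countInsideNaive(s, pts, 2) == countInsideHoisted(s, pts, 2));
}
```

An optimizing compiler will often do this kind of hoisting by itself, so any gain from doing it by hand has to be measured rather than assumed.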
And yes the silver sphere is 80% transparent.
I know this is a few months old, but if you're trying to use SDL on a Mac, try adding the following at the top of your code before any SDL calls:
when defined(macosx):
  const
    LibCocoa = "/System/Library/Frameworks/Cocoa.framework/Cocoa"
  proc NSAppLoad*(): bool {.cdecl, importc: "NSApplicationLoad", dynlib: LibCocoa.}
  discard NSAppLoad()
It sets up the NSApplication object that SDL requires. Without this call the object isn't available, hence the crashes.
HTH.
AIR.
Nice. Will be using this as a benchmark to test SIMD code eventually :)
To say a little more about SIMD (which I am no expert on, but know a little about): SIMD is only really useful in well-controlled hot loops. Otherwise you'll likely thrash your registers and lose any performance SIMD may have brought (or even slow things down), at least on non-x86_64 hardware. Apparently all floating-point ops go through the SIMD units on x86_64 processors, so bottlenecks on ARM or PowerPC aren't always bottlenecks on x86_64.
SIMD's benefit isn't just about calculating multiple values at once; it's also about "compressing" the number of vector operations an algorithm needs. For instance, a 'MADD' SIMD op performs both the multiply and the addition of a vector in a single op. So proper SIMD code can yield more than a 4x performance gain in some areas.
I don't want to make up numbers (I don't have my old code around), but I remember spending some time writing SIMD-based matrix/vector structs in C to compare performance, and the SIMD version was significantly faster in many conditions (by orders of magnitude, depending on the CPU). The biggest CPU hot loops in games are animation and physics processing (both prime candidates for SIMD optimization), so for game-engine engineers SIMD is very important.
For me it would be very interesting to learn how to improve (and understand) the performance of this code.
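To make the multiply-add point concrete, here is a rough C++ sketch that contrasts a separate multiply and add with a fused multiply-add over four lanes. It uses the standard `std::fma` per lane as a portable stand-in, not real SIMD intrinsics. Whether the compiler turns either loop into a single vector instruction depends on the target CPU and compiler flags, so treat that part as an assumption. `Vec4`, `mulThenAdd`, and `fusedMulAdd` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Four float lanes, the width of a typical 128-bit SIMD register.
using Vec4 = std::array<float, 4>;

// Written as two operations per lane: one multiply, then one add.
inline Vec4 mulThenAdd(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r[i] = a[i] * b[i] + c[i];
    }
    return r;
}

// Written as one fused operation per lane: std::fma computes a*b + c
// with a single rounding step. On hardware with an FMA instruction the
// compiler may map this to it; that is not guaranteed.
inline Vec4 fusedMulAdd(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r[i] = std::fma(a[i], b[i], c[i]);
    }
    return r;
}

// Small self-check on values where both forms are exact.
inline void selfCheckFma() {
    const Vec4 a{{1.0f, 2.0f, 3.0f, 4.0f}};
    const Vec4 b{{2.0f, 2.0f, 2.0f, 2.0f}};
    const Vec4 c{{1.0f, 1.0f, 1.0f, 1.0f}};
    assert(mulThenAdd(a, b, c) == fusedMulAdd(a, b, c));
}
```

Because the fused form rounds once instead of twice, its results can differ from the two-step form in the last bit for inputs that are not exactly representable.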
My brother and I played with the test a bit. On my machine I've improved it by ~40 ms (Nimrod 0.9.2). We also ported the code to C# for comparison (and recorded results). The repo is here:
https://github.com/zezba9000/RayTraceBenchmark
For future reference, if anyone wants to extend the results page or add a language, just send me a message or make a pull request.
It would be nice to see your results for comparison, adrianv.