nimforum mirror - a simple raytracer

adrianv [2013-06-13T16:43:05+02:00]

Yesterday evening I saw the post "Slow performance compared to C++, ideas?" (http://forum.dlang.org/post/[email protected]) on the D forum and wondered how Nimrod would perform on this code.

My translation to Nimrod is here: https://gist.github.com/AdrianV/5774141

On my laptop it performs quite well against the C++ code:

Nimrod: 450 ms, C++: 690 ms

I no longer have a working D compiler installed, but I think the result for Nimrod isn't too bad ;-)

dom96 [2013-06-13T20:36:17+02:00]

Wow, this is very cool.

I'm actually surprised that Nimrod outperforms C++, I wonder why that is.

I have a question about the output. This is what it looks like on my computer (amazing that this is only ~250 lines): [screenshot not preserved in this mirror]

Why are the reflections of the red and dark blue balls shown on the silver ball? Is it because the silver ball is transparent?

adrianv [2013-06-14T07:58:56+02:00]

I was surprised too that Nimrod was faster. Maybe it's because I implemented Vec3 as a simple array rather than a class and statically unrolled all the operations on it. I also hoisted an invariant operation out of the loop (and there is another one that I overlooked).
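A minimal sketch of the array-based approach described above, assuming current Nim syntax; the names `Vec3`, `dot` and `pixelRow` are illustrative and not taken from the gist. `Vec3` is a plain value array rather than a `ref` object, each operator is written out element by element instead of looping, and the loop-invariant `1.0 / float(width)` is computed once outside the pixel loop.

```nim
# Hypothetical sketch: Vec3 as a value type (no heap allocation, copies like a C struct).
type Vec3 = array[3, float]

# Operators unrolled by hand: three independent scalar ops, no loop.
proc `+`(a, b: Vec3): Vec3 {.inline.} = [a[0] + b[0], a[1] + b[1], a[2] + b[2]]
proc `-`(a, b: Vec3): Vec3 {.inline.} = [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
proc `*`(a: Vec3, s: float): Vec3 {.inline.} = [a[0] * s, a[1] * s, a[2] * s]
proc dot(a, b: Vec3): float {.inline.} = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

proc pixelRow(width: int, y: float): seq[Vec3] =
  ## Builds one row of (unnormalized) ray directions.
  result = newSeq[Vec3](width)
  let invW = 1.0 / float(width)  # loop-invariant: hoisted out of the x loop
  for x in 0 ..< width:
    result[x] = [float(x) * invW - 0.5, y, 1.0]
```

Because `Vec3` is a value type, a C compiler sees three plain doubles per vector and can keep them in registers, which is one plausible reason this style does well against a class-based C++ version.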

I would like to port the code to use SIMD. Does anyone have an example of how to do SIMD with Nimrod?

And yes, the silver sphere is 80% transparent.

Araq [2013-06-14T09:56:46+02:00]

I've often seen GCC's optimizer generate SIMD instructions from code emitted by Nimrod. I've also seen plain Nimrod code outperform C code that explicitly used SIMD intrinsics. So my advice is to check the generated assembly; if it doesn't use SIMD already, you can try Visual C++ or Intel C++, or adjust your coding style so that the compiler emits SIMD. AFAIK we have no wrapper for the SIMD intrinsics yet.
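As a sketch of that advice (assumptions: a current Nim compiler with GCC as the backend; `axpy` is an illustrative name), this is the kind of loop GCC's auto-vectorizer usually handles well: contiguous `float32` data, no branches and no calls in the body. Compiling with something like `nim c -d:danger --passC:-fopt-info-vec axpy.nim` turns off Nim's runtime checks (bounds checks can block vectorization) and asks GCC to report which loops it vectorized; whether SIMD is really emitted still has to be confirmed in the generated assembly.

```nim
# y[i] = a * x[i] + y[i]: a branch-free loop over contiguous float32 data,
# which GCC can often (not always) auto-vectorize into SIMD instructions.
proc axpy(a: float32, x: openArray[float32], y: var openArray[float32]) =
  # Assumes x has at least y.len elements.
  for i in 0 ..< y.len:
    y[i] = a * x[i] + y[i]
```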

asterite [2013-06-29T19:20:20+02:00]

Couldn't render post #846.

Araq [2013-06-29T23:51:28+02:00]

Oh, it doesn't work on Mac? I can't say I'm surprised. It's the OS which costs us the most maintenance time.

AIR [2013-10-31T19:47:17+01:00]

I know this is a few months old, but if you're trying to use SDL on a Mac, try adding the following at the top of your code before any SDL calls:


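when defined(macosx):
  const
    LibCocoa = "/System/Library/Frameworks/Cocoa.framework/Cocoa"

  # NSApplicationLoad() loads and initializes Cocoa so it can be used from a
  # non-Cocoa program (here, an SDL program); it returns YES (true) on success.
  proc NSAppLoad*(): bool {.cdecl, importc: "NSApplicationLoad", dynlib: LibCocoa.}

  discard NSAppLoad()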

It sets up the NSApplication object that SDL requires but that otherwise doesn't exist, hence the crashes.

HTH.

AIR.

filwit [2013-11-01T19:57:02+01:00]

Nice. I'll eventually be using this as a benchmark to test SIMD code :)

Speaking a little more about SIMD (which I'm no expert on, but know a little about): SIMD is only really useful in well-controlled hot loops; otherwise you'll likely thrash your registers and lose any performance SIMD may have brought (or even slow things down). That applies at least to non-x86_64 hardware: on x86_64, ordinary scalar floating-point math already runs on the SSE (SIMD) units, so bottlenecks on ARM or PowerPC aren't always bottlenecks on x86_64.

SIMD's benefit isn't just about calculating multiple values at once; it's also about "compressing" the number of vector operations an algorithm needs. For instance, a multiply-add ('MADD') SIMD op performs both the multiplication and the addition on a whole vector in a single instruction. So with four lanes per register, well-written SIMD code can in some areas gain more than a factor of 4 in performance.
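To make the multiply-add idea concrete, here is a hypothetical Nim sketch that wraps a few SSE intrinsics from `xmmintrin.h` via `importc`; `M128`, the `mm_*` procs and `madd4` are illustrative names, not a real Nim module. Plain SSE has no fused multiply-add, so `madd4` issues one 4-lane multiply and one 4-lane add: two arithmetic instructions for eight floating-point operations. FMA3's `_mm_fmadd_ps` would fuse them into one, but it needs FMA-capable hardware and `-mfma`. On non-x86_64 targets the sketch falls back to a scalar loop.

```nim
# Hypothetical hand-written wrapper; only a few SSE intrinsics are bound.
when defined(amd64):
  type M128 {.importc: "__m128", header: "xmmintrin.h", bycopy.} = object

  proc mm_loadu_ps(p: ptr float32): M128 {.importc: "_mm_loadu_ps", header: "xmmintrin.h".}
  proc mm_storeu_ps(p: ptr float32, v: M128) {.importc: "_mm_storeu_ps", header: "xmmintrin.h".}
  proc mm_mul_ps(a, b: M128): M128 {.importc: "_mm_mul_ps", header: "xmmintrin.h".}
  proc mm_add_ps(a, b: M128): M128 {.importc: "_mm_add_ps", header: "xmmintrin.h".}

proc madd4(a, b, c: array[4, float32]): array[4, float32] =
  ## Returns a * b + c for 4 float32 lanes.
  when defined(amd64):
    var a2 = a  # local copies so their addresses can be taken
    var b2 = b
    var c2 = c
    # One instruction multiplies all 4 lanes...
    let prod = mm_mul_ps(mm_loadu_ps(addr a2[0]), mm_loadu_ps(addr b2[0]))
    # ...and one instruction adds all 4 lanes.
    mm_storeu_ps(addr result[0], mm_add_ps(prod, mm_loadu_ps(addr c2[0])))
  else:
    # Scalar fallback for other CPUs: same result, one lane at a time.
    for i in 0 .. 3:
      result[i] = a[i] * b[i] + c[i]
```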

I don't want to make up numbers (I don't have my old code around), but I remember spending some time writing SIMD-based matrix/vector structs in C to compare performance, and the SIMD version was significantly faster in many conditions (by orders of magnitude, depending on the CPU). The biggest CPU hot loops in games are animation and physics processing (both prime candidates for SIMD optimization), so SIMD is very important to game-engine engineers.

adrianv [2013-11-02T20:50:42+01:00]

Yes, this code is nice for studying different processor- and compiler-dependent aspects. For example, I tried clang vs. GCC, float64 vs. float32, and OpenMP vs. single-threaded. Some results are really astonishing:

On an old AMD machine, OpenMP scales as expected with every core used. On a new Intel Core i5/i7, OpenMP (GCC) was slower than single-threaded, or even slower than the old AMD machine.
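For anyone who wants to repeat the OpenMP experiment, a sketch assuming a current Nim compiler with the C backend: Nim's `||` iterator behaves like `..` but emits `#pragma omp parallel for` on the generated C loop. It only runs in parallel when compiled with something like `nim c -d:release --passC:-fopenmp --passL:-fopenmp render.nim`; without those flags the pragma is ignored and the loop runs sequentially. `shade`, `renderParallel`, `W` and `H` are hypothetical names, with `shade` standing in for tracing one primary ray.

```nim
const
  W = 64  # image width (illustrative)
  H = 48  # image height (illustrative)

proc shade(x, y: int): float32 =
  # Hypothetical stand-in for tracing one primary ray.
  float32(x * y mod 7) / 7.0'f32

proc renderParallel(): seq[float32] =
  result = newSeq[float32](W * H)
  # `||` emits "#pragma omp parallel for"; rows are split across threads.
  # Each row writes a disjoint slice of `result`, so no two threads
  # ever write the same element.
  for y in `||`(0, H - 1):
    for x in 0 ..< W:
      result[y * W + x] = shade(x, y)
```

Nim's documentation notes that the compiler is not aware of the parallelism this introduces, so keeping per-iteration writes disjoint, as above, is up to the programmer.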

With float64, GCC and clang perform almost on par on my i7. With float32, clang is about 30% faster than GCC.

I would be very interested in learning how to improve (and understand) the performance of this code.

filwit [2013-11-04T00:07:30+01:00]

My brother and I played with the test a bit. On my machine I've improved it by ~40 ms (Nimrod 0.9.2). We also ported the code to C# for comparison (and recorded the results). The repo is here:

https://github.com/zezba9000/RayTraceBenchmark

For future reference, if anyone wants to extend the results page or add a language, just send me a message or make a pull request.

It would be nice to see your results for comparison, adrianv.

Mirror of forum.nim-lang.org
