Nimrod now has its own profiler:
http://build.nimrod-code.org/docs/estp.html
It's meant for macro-level optimizations (finding out why 30% of the runtime is spent in the GC), not micro-level optimizations like cache miss avoidance (finding out how to improve the GC's performance). Existing profilers work fine for micro-optimizations, but don't work well for macro-optimizations IMO.
Enjoy.