I'm almost finished porting the original Quake map-compiling tools from 1996, written in C, to Nim.
More precisely, I'm porting the tool that calculates the lighting for the map. It's nothing special, just a straight 1:1 port. After the initial benchmarks I ran on various maps, it turned out that the Nim version is almost twice as slow as the original C version.
I found the hotspot: it's in the function called TestRay. Basically, it checks whether a straight line can be drawn between a light and a sample point on a face of the map. To check whether a blocking face lies between these two 3D points, the BSP tree is walked recursively until it reaches the leaves or hits some other terminating condition. A single test involves 10,000 to 20,000 checks, flips and flops; there are 1296 sample points on a face, and tens of thousands of faces on a map on average.
As in the C version, I'm using ptrmath. The original code is here, and the Nim version is here.
Did you convert it with c2nim initially?
I have no idea about your goal of course -- low level code is still low level even when coded in low level Nim...
First a remark about your proc
proc `toref`[T](x: var T): ref T =
  cast[ref typeof(x)](x.addr)
That cannot really work: you cannot cast an arbitrary variable of type T to a ref. In Nim, a ref is a GC-traced reference to an object; there is a refcount and type information involved, which is much more than a plain pointer. So delete this proc now!
Have you verified the size of
type
  tnode_t* = object
    nodetype*: int32
    normal*: array[3, float32]
    dist*: float32
    children*: array[2, int32]
    pad*: int32
I am not sure, but I would guess that if you do not mark this object with the pure pragma, additional type information may be included in each object instance, making it larger and reducing the number of instances that fit in cache.
I can't see other issues at first glance; you may have to compare line by line. Maybe inspect Nim's intermediate C code. Tiny differences can have a big impact on performance. For example, the original C code may be written in a way that lets the C compiler generate efficient SIMD instructions, which may not happen with Nim's C output.
But of course it is great that your Nim version works at all!
@Stefan_Salewski thanks for the feedback. I compared the sizes of the structs, and they are the same as in the C version:
("Tnode_t size: ", 32)
("Tracestack_t size: ", 20)
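For anyone who wants to repeat the size check on the C side, here is a minimal sketch. It assumes a C mirror of the Nim tnode_t object quoted earlier; the names tnode_check_t, tnode_check_size and tnode_check_children_offset are hypothetical and not taken from the original sources. It also assumes a platform with 4-byte float, which is the common case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the Nim tnode_t object quoted in the thread.
   Field order and types follow the Nim declaration. */
typedef struct {
    int32_t nodetype;
    float   normal[3];
    float   dist;
    int32_t children[2];
    int32_t pad;
} tnode_check_t;

/* C11 compile-time check: 4 + 12 + 4 + 8 + 4 bytes, all 4-byte aligned,
   so no padding is expected on platforms with a 4-byte float. */
static_assert(sizeof(tnode_check_t) == 32, "tnode_check_t should be 32 bytes");

/* Runtime helpers, handy for printing next to Nim's sizeof() output. */
size_t tnode_check_size(void) {
    return sizeof(tnode_check_t);
}

size_t tnode_check_children_offset(void) {
    return offsetof(tnode_check_t, children);
}
```

Comparing field offsets as well as the total size catches the case where two layouts happen to have the same size but a different field arrangement.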
One more idea: which optimization option do you pass to gcc? Nim passes -O3 to gcc by default, while C programs most of the time use only -O2. -O3 can significantly grow the executable size. Generally -O3 should not be slower than -O2, but in rare cases it may be.
You may also try the option -flto for link-time optimization, or use clang instead of gcc.
Can you test with --gc:none?
You need to compile with both -d:release (removes stacktraces and uses -O3) and -d:danger.
Using -d:danger only is like compiling C with no optimization.
I'll have a look later.
In my experience Nim is as fast as C, especially for low-level stuff.
I'm using MinGW 4.9.2. I also tried TDM-GCC 5.1, but it's even slower with it. 4.9.2 compiles faster and produces faster code and a smaller binary.
Tried with:
-O2: no difference
-flto: sped up the parser, which uses .split(), but did nothing for TestLine
--gc:none: did nothing except raise the memory usage from 5 to 330 MB
Nim uses half of the memory
That is strange.
Do you run both of your tests on a 32-bit OS, or both on a 64-bit OS?
One more remark: You used int32 and float32 data types in your Nim version -- that seems to answer my initial question, you have not used c2nim for code transfer.
I think there is a good reason why Araq invented cint and cfloat, so why not use them?
And generally, I would recommend using c2nim; it mostly works even for C++ sources. After using c2nim I generally still do a manual line-by-line comparison, but it saves me from typing errors and from a lot of tedious typing.
One more short look...
tstack_p--;
tstack_p -= 1
You know that C pointer arithmetic is very different from plain integer arithmetic?
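To illustrate the difference the question is pointing at, here is a small C sketch. The entry_t type is a hypothetical 20-byte stand-in (the thread only reports the size of Tracestack_t, not its fields), and the function name is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 20-byte element type; the real Tracestack_t fields
   are not shown in the thread. */
typedef struct {
    int32_t fields[5];
} entry_t;

/* Returns how many bytes the address changes when a pointer to
   entry_t is decremented once. */
ptrdiff_t bytes_moved_by_decrement(void) {
    entry_t stack[4];
    entry_t *p = &stack[2];
    entry_t *before = p;

    p--; /* steps back one whole element, not one byte */

    return (char *)before - (char *)p;
}
```

The decrement moves the address by sizeof(entry_t) bytes. A port that treated the pointer as a plain integer and subtracted 1 would move it by a single byte instead, which is why a pointer-arithmetic helper has to scale by the element size.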
Yes, I have hidden converters :) It's in ptrmath.nim from @Jenah
About c2nim: yes, I converted the code initially, but I changed types like cint, cfloat and cuint. If I remember correctly, the object types had some issues with their sizes. They also had the {.bycopy.} pragma.
So after a first look through the generated C and assembly code, there is a bug in the unary minus ("-") implementation for float32 literals.
Test case:
proc main() =
  let z = 10.0'f32
  if z > -0.1'f32:
    echo "more"
  if z < 0.1'f32:
    echo "less"
main()
Generated C code
N_LIB_PRIVATE N_NIMCALL(void, main__WT5bdlPWc6VHEZkxs56sUA)(void) {
    NF32 z;
    z = 1.0000000000000000e+01f;
    {
        if (!(-1.0000000000000001e-01 < z)) goto LA3_;
        echoBinSafe(TM__tJFdVcCXt79a8K7xYCw4R7g_2, 1);
    }
    LA3_: ;
    {
        if (!(z < 1.0000000000000001e-01f)) goto LA7_;
        echoBinSafe(TM__tJFdVcCXt79a8K7xYCw4R7g_4, 1);
    }
    LA7_: ;
}
Notice how the negated literal is emitted as a double, "-1.0000000000000001e-01", whereas the positive literal is emitted as a single-precision float, "1.0000000000000001e-01f".
Benchmarking shows that a lot of time is spent on single to double conversion via cvtss2sd instruction.
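The conversion comes from C's usual arithmetic conversions: when a float is compared with a double literal, the float operand is converted to double first. A minimal sketch in plain C, independent of Nim's generated code, with function names invented for this example; the results noted in the comments assume a typical build where float arithmetic is done in single precision (for example x86-64 with SSE):

```c
#include <stdbool.h>
#include <stddef.h>

/* An unsuffixed literal is a double; the f suffix makes it a float. */
size_t size_of_unsuffixed_literal(void) { return sizeof(-0.1); }
size_t size_of_suffixed_literal(void)   { return sizeof(-0.1f); }

/* z is converted to double before the comparison (extra conversion work). */
bool equals_double_literal(float z) { return z == -0.1; }

/* Both operands are float, so no conversion is needed. */
bool equals_float_literal(float z)  { return z == -0.1f; }
```

Besides the extra conversion per comparison, the mixed form can also change results: -0.1f converted to double is not the same value as the double -0.1, so equals_double_literal(-0.1f) is false while equals_float_literal(-0.1f) is true on such a build.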
I also expect that several parts of your code should use 0.0f or 0.0'f32 instead of 0.0 to ensure that float32 is used.
Unfortunately I can't compile the original code on Linux easily due to the MSVC build system. Note that your filename/imports have casing issues on Linux as well.
Feel free to raise the unary minus issue on the tracker; otherwise I'll do it later when I have more time.
I can confirm this: without 'f32:
Time [LightWorld ] 3.094s
CPU Time [WriteEntities] 0.256s