For parts of the code I want to have full control over memory, but for other parts I still need a GC. Is this possible?
Basically I want one part to be more like C, where I have to manage my own memory, and I do not mind giving up any features that require a GC.
I finally want the application to have the following layers:
high level: non-compiled game code for modding using Nim script (https://github.com/komerdoor/nim-embedded-nimscript)
mid level: compiled game code using garbage-collector
low level: data-oriented game engine not using a garbage-collector
I've been coming back to Nim multiple times waiting for the moment that I can finally switch. I also use several C libraries that I have to create bindings for, hopefully that is now easier as well.
Is using Nim as a scripting language still a good idea, or would you recommend using Lua there instead?
So should I enable Nim's soft realtime GC for everything and then use ptr where I want to manage memory myself? Is that the default, and if not, what should I do to use it?
Can I compile using C99 to be able to use the restrict keyword with emit?
As an example, this is how I implemented one batch processor in C99. It can be picked up by any worker thread that is available for work:
void waves_integrate(XRESTRICT(waves_t*) waves) {
    size_t wave_index = 0;
    XRESTRICT(float*) velocities = waves->velocity;
    XRESTRICT(float*) last_heights = waves->last_height;
    XRESTRICT(float*) target_heights = waves->target_height;
    for(wave_index = 0; wave_index < WAVE_COUNT; wave_index++) {
        // Load
        float velocity = velocities[wave_index];
        float last_height = last_heights[wave_index];
        float target_height = target_heights[wave_index];
        // Transform
        float force = (TENSION * (target_height - last_height) - velocity * DAMPENING);
        // Store
        last_heights[wave_index] = last_height + velocity + force;
        velocities[wave_index] = velocity + force;
    }
}
Here XRESTRICT is replaced by the restrict keyword that the C compiler supports. The waves_t* argument is a pointer to one of the batches.
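For anyone who wants to compile the kernel above on its own, here is a minimal sketch. The XRESTRICT definition, the struct layout and the constant values are my assumptions for illustration; the post does not show them.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: map XRESTRICT onto the spelling the compiler accepts. */
#if defined(_MSC_VER)
#  define XRESTRICT(T) T __restrict
#elif defined(__GNUC__) || defined(__clang__)
#  define XRESTRICT(T) T __restrict__
#else
#  define XRESTRICT(T) T restrict /* plain C99 */
#endif

/* Assumed values, chosen only so the arithmetic is easy to follow. */
#define WAVE_COUNT 256
#define TENSION    0.5f
#define DAMPENING  0.25f

/* Assumed layout: one batch stored as a structure of arrays. */
typedef struct waves_t {
    float velocity[WAVE_COUNT];
    float last_height[WAVE_COUNT];
    float target_height[WAVE_COUNT];
} waves_t;

void waves_integrate(XRESTRICT(waves_t*) waves) {
    size_t wave_index = 0;
    assert(waves != NULL);
    XRESTRICT(float*) velocities = waves->velocity;
    XRESTRICT(float*) last_heights = waves->last_height;
    XRESTRICT(float*) target_heights = waves->target_height;
    for (wave_index = 0; wave_index < WAVE_COUNT; wave_index++) {
        /* Load */
        float velocity = velocities[wave_index];
        float last_height = last_heights[wave_index];
        float target_height = target_heights[wave_index];
        /* Transform */
        float force = TENSION * (target_height - last_height) - velocity * DAMPENING;
        /* Store */
        last_heights[wave_index] = last_height + velocity + force;
        velocities[wave_index] = velocity + force;
    }
}
```

The three arrays never overlap inside one batch, which is what makes the restrict qualifiers safe here.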
Sorry for the C code, but I just want to know if I can do this in Nim.
@Araq I see you already answered this question once when I used another username I forgot about. Sorry :D
Also Nim seems to prefer that you copy list collection data-structures like seq etc. instead of sharing them between threads, but this time I really have to share them between multiple worker threads. I want to write the scheduler code that guarantees that no worker threads are working on the same batch at the same time myself.
Check this
For the run-once functions like proc LoadEntities() or proc WriteEntitiesToString() I don't care so much about performance, and I'm using strings, seqs, joins, splits like there is no tomorrow.
For recursive non-critical functions, like proc MakeTnode(), I'm passing by value,
but for the most CPU-intensive function - **proc TestLine()** - I'm using pointers.
Totally agree - start simple and optimize where needed. Nim works very well along that whole path, from PoC to production to scale.
This blog is a great example of this workflow:
https://www.chameth.com/2018/12/09/over-the-top-optimisations-in-nim/
You don't need to emit restrict, use codegenDecl instead:
when not defined(vcc):
  {.pragma: restrict, codegenDecl: "$# __restrict__ $#".}
else:
  {.pragma: restrict, codegenDecl: "$# __restrict $#".}
Put that in a template and invoke the template before you need the pragma. Usage:
let data{.restrict.} = cast[ptr UncheckedArray[float32]](data)
For low-level optimizations, and for mixing GC at the high level with no GC at the low level, Laser has plenty of code examples. For study, I suggest you start with the simple primitives like sum reductions:
You can use a seq and share its address with other threads when multithreading. Arraymancer works like that, but you need to be sure that the thread owning the seq will not delete it until processing has finished. That is true for my case (data-parallel array processing) but not true in general for async task parallelism.
For low-level multi-core scheduling, here is one as well. You can reuse the parts that make sense (like pthread_barrier on OSX ...): link. The code doesn't use the GC but can be mixed with GC-ed code. You can only pass stack objects or raw pointers to other threads though. Note that this is a C translation and in the process of being Nim-ified for Project Picasso.
I am okay with managing my own memory. I mostly statically allocate, allocate on the stack and/or have large memory pools.
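To make "large memory pools" concrete, here is a minimal bump-allocator sketch in C. The names arena_t, arena_init, arena_alloc and arena_reset are hypothetical and made up for this example. It is only meant to show the general idea, not how my engine actually does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool type: hands out slices of one caller-provided buffer. */
typedef struct arena_t {
    unsigned char *base;
    size_t capacity;
    size_t used;
} arena_t;

void arena_init(arena_t *a, void *buffer, size_t capacity) {
    assert(a != NULL && buffer != NULL);
    a->base = (unsigned char *)buffer;
    a->capacity = capacity;
    a->used = 0;
}

/* Returns size bytes aligned to align (a power of two), or NULL if full. */
void *arena_alloc(arena_t *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t start = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t padding = (size_t)(aligned - start);
    size_t remaining = a->capacity - a->used;
    if (padding > remaining || size > remaining - padding) {
        return NULL;
    }
    a->used += padding + size;
    return (void *)aligned;
}

/* Releases everything at once; individual frees are not supported. */
void arena_reset(arena_t *a) {
    a->used = 0;
}
```

The buffer can be a static array or one big allocation made at startup, so nothing in the hot path touches the GC or malloc.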
Thank you for all those examples. They answer more questions than what I was asking for, like alignment and prefetching :D.
My current C code divides work in batches and distributes them over multiple worker threads where they are run through kernels as decided by a scheduler managing batch dependencies etc. (a -> transform -> b -> transform -> c).
I also have to do several atomic operations for everything to be completely lock / wait free, but I already found out how.
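As a rough illustration of the kind of atomic operation I mean, here is a C11 sketch where workers claim batch indices from a shared counter without a lock. batch_queue_t and its functions are hypothetical names for this example, and it ignores the batch dependencies that my real scheduler tracks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define BATCH_NONE ((size_t)-1)

/* Hypothetical shared state: one counter handing out batch indices. */
typedef struct batch_queue_t {
    atomic_size_t next; /* index of the next unclaimed batch */
    size_t count;       /* total number of batches */
} batch_queue_t;

void batch_queue_init(batch_queue_t *q, size_t count) {
    assert(q != NULL);
    atomic_init(&q->next, 0);
    q->count = count;
}

/* Each call returns an index no other caller receives, or BATCH_NONE
   once all batches have been handed out. */
size_t batch_queue_claim(batch_queue_t *q) {
    size_t index = atomic_fetch_add(&q->next, 1);
    return index < q->count ? index : BATCH_NONE;
}
```

Because the fetch-and-add is a single atomic step, two workers can never get the same index, so no two threads work on the same batch at the same time.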