TL;DR: I want to create an “actor system” in Nim, and I need some feedback, in particular, on how to move data across threads.
After reading the docs, multiple posts on the forum, and trying to code a bit myself, I believe I am ready to present my long-term goal, and I’m asking for some brain-storming from the community. I actually wanted to wait with this until I have some basic prototype to show, but atm I don’t see a clear path forward, so I thought it was preferable to ask for ideas first, before I start investing my time into coding something that might be a dead-end, or duplicates existing code.
At the highest level, I want to program a networked game, with a client/server-cluster setup. AFAIK, if you start with a simple client/single-server setup, and wait until everything is implemented and working fine to try and add cluster support, the refactoring is probably so large that the idea is usually abandoned (i.e. it’s cheaper to just buy a bigger server). By cluster support, I mean one single seamless world, dynamically split between multiple servers, with no single-point-of-failure, not the usual one-“shard”-per-server design, which is a no-brainer. That is why I want to have scalable cluster support from the get-go. And that is also why I think actors are the best design for this.
I’ve done most of my professional coding on the JVM, and I’m of the opinion that it is less-than-ideal for a soft-real-time game with high memory demands, mostly because of the GC, but also because you cannot do low-level memory manipulation, and because all objects are allocated on the heap. While there are work-arounds for some of this, and improvements are coming in the future, I believe it was time for me to try something new. I also wanted to be more than just a “Java programmer”; as a student, I could code anything from Prolog to RISC Assembler, and now I feel “degenerated”, only programming in one (well, except for some Python) language.
I literally spent years looking around, trying to decide which language/platform to use, instead of the JVM. I settled for Nim because of the following features:
I’ve spent several years cooperating with someone else on the web over a Java actor system. In the end, I moved away due to personal reasons, and he moved to Clojure, but that experience gave me “strong opinions” on how a good (IMO) actor system should be designed. The ideal actor system for me would have the following requirements:
I think some of this is similar to the C++ Actor Framework.
I would have called this actor system “reactor(s)” (REquest/REply ACTORs), but that name is already taken in Nimble, so I’m still searching for a good name for this.
I think most of this can be done in a reasonable amount of time, using Nim, threads, and channels (I’m not sure if channels are the best way to model inter-thread communication in this case, in particular since I’m going to have an unspecified number of different messages to exchange, and channels are fixed-type, but it’s really just an “optimization” problem, that can be postponed to V2).
But, there are a few things that are less than obvious (to me), atm, and this is why I’m posting this. Here is the list of the main problems I see atm:
I’m currently assuming most messages are going to be “simple” structurally. Therefore I expect problem #1 can be solved with some kind of shared-heap “generic list”, where by generic I mean it can store any type “inline”. This might even be possible with zero-copy, using “type IDs”, and pre-allocating space for the whole message before writing to it.
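To make the “type IDs plus pre-allocated space” idea concrete, here is a minimal sketch. It is written in Python purely for illustration (it is not Nim, and it is not an existing library); the type IDs, the byte layout and the function names are all assumptions made up for this example. Each field is stored inline as a one-byte type ID followed by its payload, and the total size is computed first so the buffer is allocated exactly once.

```python
import struct

# Hypothetical type IDs (assumptions for this sketch, not an existing format).
TYPE_INT64 = 1    # 8-byte little-endian signed integer
TYPE_FLOAT64 = 2  # 8-byte little-endian double
TYPE_BYTES = 3    # 4-byte length prefix followed by the raw bytes

_FIXED_FORMATS = {TYPE_INT64: "<q", TYPE_FLOAT64: "<d"}


def encoded_size(fields):
    """Return the number of bytes needed to store all (type_id, value) fields."""
    size = 0
    for type_id, value in fields:
        size += 1  # the type ID itself
        if type_id == TYPE_BYTES:
            size += 4 + len(value)
        else:
            size += struct.calcsize(_FIXED_FORMATS[type_id])
    return size


def write_message(fields):
    """Pre-allocate the whole message, then write every field inline."""
    buf = bytearray(encoded_size(fields))
    offset = 0
    for type_id, value in fields:
        buf[offset] = type_id
        offset += 1
        if type_id == TYPE_BYTES:
            struct.pack_into("<I", buf, offset, len(value))
            offset += 4
            buf[offset:offset + len(value)] = value
            offset += len(value)
        else:
            fmt = _FIXED_FORMATS[type_id]
            struct.pack_into(fmt, buf, offset, value)
            offset += struct.calcsize(fmt)
    return buf


def read_message(buf):
    """Walk the buffer and decode each field according to its type ID."""
    view = memoryview(buf)
    offset = 0
    fields = []
    while offset < len(view):
        type_id = view[offset]
        offset += 1
        if type_id == TYPE_BYTES:
            (length,) = struct.unpack_from("<I", view, offset)
            offset += 4
            fields.append((type_id, bytes(view[offset:offset + length])))
            offset += length
        else:
            fmt = _FIXED_FORMATS[type_id]
            (value,) = struct.unpack_from(fmt, view, offset)
            fields.append((type_id, value))
            offset += struct.calcsize(fmt)
    return fields
```

In this sketch the reader still copies values out of the buffer; whether a real zero-copy version is possible depends on how the shared heap and the field types behave in Nim, which the sketch does not try to answer.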
Problem #2 is probably a nice-to-have for V2.
Problem #3 is definitely going to be tricky, and I don’t think there is anything out there I can readily use to solve this. It’s like turning every actor state into a small in-memory embedded DB. I have spent time thinking about this, and I have some ideas, but idk if it’s going to be fast enough. If there IS anything that already solves that problem, I would like to know.
For problem #4 and #6, I could just use message-pack but this might get tricky if I have to write the mapping manually. Presumably, a clever macro could generate the mapping automatically.
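As a rough analogy for what such a macro could generate, the following sketch derives the mapping from the type declaration itself instead of writing it by hand. It is Python, not Nim, and uses runtime reflection where Nim would use a compile-time macro; `to_wire`, `from_wire`, `Position` and `Move` are hypothetical names. The output is a plain nested list, which is the kind of structure a MessagePack encoder accepts; the actual MessagePack encoding step is left out.

```python
from dataclasses import dataclass, fields, is_dataclass


def to_wire(obj):
    """Turn a dataclass instance into nested lists, fields in declaration order."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return [to_wire(getattr(obj, f.name)) for f in fields(obj)]
    if isinstance(obj, (list, tuple)):
        return [to_wire(item) for item in obj]
    return obj


def from_wire(cls, data):
    """Rebuild an instance of cls from the nested lists produced by to_wire.

    Limitation of this sketch: only direct dataclass-typed fields are rebuilt;
    containers of dataclasses are returned as plain lists.
    """
    if is_dataclass(cls):
        values = (from_wire(f.type, v) for f, v in zip(fields(cls), data))
        return cls(*values)
    return data


# Hypothetical message types, only here to exercise the mapping.
@dataclass
class Position:
    x: float
    y: float


@dataclass
class Move:
    actor_id: int
    target: Position


wire = to_wire(Move(7, Position(1.0, 2.0)))   # nested list, ready for an encoder
restored = from_wire(Move, wire)              # equal to the original Move
```

The point of the sketch is only that the field order in the declaration is enough to drive both directions of the mapping, so nothing has to be written manually per message type.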
Problem #5 is mostly related to batching in the cross-thread scenario. Within the same thread, I can allocate messages using the local heap, and have the GC take care of their lifetime. But across threads, I want to batch them into “buffers”, to reduce the number of copies and synchronization. If all messages are answered “immediately” (synchronously), I can free the buffer after delivering the replies. But if any actor chooses to reply asynchronously, I would have to either keep the whole buffer until the last reply is sent and received, or require that the message be copied. If the message needs to be copied, both the receiver AND the sender need to use the copy (since the message is passed back to the sender together with the reply). If, OTOH, I decide to not use buffers, and allocate each message individually, the messaging becomes much simpler, at the cost of a much greater overhead per message (the individual shared heap allocation). Maybe I should save buffering for V2?
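The “keep the whole buffer until the last reply” option amounts to a count of outstanding replies per buffer. A minimal sketch, in Python for illustration only; `BatchBuffer` and `deliver_reply` are hypothetical names, and “freeing” is simulated by dropping the message list:

```python
import threading


class BatchBuffer:
    """A batch of messages that stays alive until every reply was delivered."""

    def __init__(self, messages):
        self._messages = list(messages)
        self._answered = set()
        self._pending = len(self._messages)
        self._lock = threading.Lock()
        # An empty batch has nothing to wait for.
        self.freed = self._pending == 0

    def deliver_reply(self, index, reply):
        """Record one reply and return (message, reply) for the sender.

        The buffer is released when the last outstanding reply arrives,
        whether that reply was synchronous or asynchronous.
        """
        with self._lock:
            if self.freed:
                raise RuntimeError("buffer already freed")
            if index in self._answered:
                raise ValueError("message already answered")
            message = self._messages[index]
            self._answered.add(index)
            self._pending -= 1
            if self._pending == 0:
                self._messages = None  # stand-in for freeing the shared buffer
                self.freed = True
            return message, reply
```

The cost this makes visible is the one described above: a single slow asynchronous reply keeps the whole batch allocated.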
And, the biggest problem of all, AFAIK, is #7. I can’t just transfer state of a local heap actor to another thread without copying, and some of those actors (or actor teams) might grow into the multi-megabyte range, and I don’t even know how to do the copying with fixed-type channels, since each actor (team) might have a different type.
What I would like here, to solve #7, is to have multiple local heaps per thread, each independent of the others, and always accessed through an individual lock. Like that, each “actor team” would have its own private heap, and the whole thing could be passed to another thread without copying; just tell the other thread to use the “team lock”. I have found something similar in an old forum post: Lightweight threading (Goroutines) but my use-case is somewhat different. Firstly, I’m not sure how goroutines work, because I never used (nor ever intend to use) Go. Secondly, this post seems to be about running many short-lived “threads”, while what I will have is few long-lived ones (aiming for 10-20 per real thread). The old forum post did not seem to have a “solution”, but maybe things changed?
I believe problem #7 can be worked around if I manage memory “manually”. It will be a pain, but if destructor support got to the point where one could do RAII like in C++, it would be doable. That is why I was particularly excited about the work being done in that direction. In the end, the thread-local heap is just the same physical memory as the shared heap, and the thread-local GC is there mainly to make its usage more convenient, but not faster (or so I assume; GCs add some overhead so how could they make memory faster?). What does make the thread-local heap faster is the thread-local allocator (I haven’t seen any specific reference to the allocator yet, but this is the usual way of “making memory management faster”, so I assume that is what Nim does). If the thread-local allocator cannot be subverted to support multiple independent local heaps, I could try to use one of the other OSS allocators out there. But then I feel like I’ve given up on one of the most important parts of Nim; the built-in thread-local heap, the very feature which convinced me to try Nim in the first place.
Slightly off topic, but related to problem #3, is finally the problem of running a local simulation on the client (Problem #8). More specifically, the server (cluster) will know about the entire state of the game world, but the client will only know about parts of it (since the entire world will not fit inside a single computer). Therefore, even with the best efforts, the client will sometimes produce different simulation results than the server. So the server has to regularly update the client with the state changes on the server. But the client is running the simulation locally, and therefore changing the state locally too. If the client wants to apply changes from the server, it must not only be able to replace the values that changed on the client with server changes, but also undo changes on the client for values that did not change on the server (and you don't want the server to send the entire state, including the unchanged parts). It's kind of like running two transactions on top of each other; the per-event transaction, and the all-the-client-changes transaction, the latter being rolled back before applying the server transaction log.
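A minimal sketch of that “roll back all client changes, then apply the server log” step, in Python for illustration only. `ClientState`, `set_local` and `apply_server_update` are hypothetical names, the state is simplified to a flat key/value map, and server-side deletions are not modelled. The client remembers the original value of every key it touches locally, so the server only has to send what actually changed on its side.

```python
_MISSING = object()  # marks keys that did not exist before a local change


class ClientState:
    """Client-side copy of (part of) the world state, with an undo log."""

    def __init__(self, initial):
        self.values = dict(initial)
        # key -> value before the first local change since the last server update
        self._undo = {}

    def set_local(self, key, value):
        """Apply a change produced by the local simulation."""
        if key not in self._undo:
            self._undo[key] = self.values.get(key, _MISSING)
        self.values[key] = value

    def apply_server_update(self, changes):
        """Undo every local change, then apply the server's changed values."""
        for key, old in self._undo.items():
            if old is _MISSING:
                self.values.pop(key, None)
            else:
                self.values[key] = old
        self._undo.clear()
        self.values.update(changes)


client = ClientState({"hp": 10, "x": 0})
client.set_local("hp", 8)   # local guess that the server did not confirm
client.set_local("x", 5)    # local guess that the server corrects
client.set_local("y", 1)    # key that only ever existed locally
client.apply_server_update({"x": 4})
# client.values is now {"hp": 10, "x": 4}
```

Here `hp` is restored even though the server never mentioned it, which is exactly the “undo changes for values that did not change on the server” case.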
This is not completely related to nim, but have you tried a google search first? I found some interesting threads on the subject:
https://gamedev.stackexchange.com/questions/117602/updating-a-multithreaded-entity-component-system
But I'd say that, unless you have some experience making games and know what the purpose of this engine will be (why not tailor one for your own needs instead of trying to fit your game into a generic engine?), thinking beforehand will not give you much experience.
First try to make some game with only one thread and take notes on how you would parallelize that.
@Arrrrrrrrr I know what a game entity system is. :) This is not what I'm trying to do here. I'm talking about actors in the "erlang" sense of the word. While I want to use the actor system for games, I want it to be generic enough to be used in other context. The only really game-specific thing about this is "requirement #12", which is a very minor feature, that will be "optional".
"why not tailor one for your own needs instead of trying to fit your game into a generic engine"
Well, because I don't actually have much interest in how "pixels are painted"; that's the job of the engine to do that. Nor do I have any artistic talents; I'm buying as much pre-made assets as I can (when it's on offer), and will get some freelancer to fit them together once I have enough. My job is to implement what the "generic game engine" doesn't give me, which is "massive scalability" in the form of cluster support. And of course the "game logic/gameplay"; I have so many ideas on that front, I'd need to create 10 games to try them all.
I will admit I never wrote a game anyone played, apart from some prototypes my wife and son tried. But I've read at least 3 books on programming "virtual worlds" (my main interest; I talk about games so people understand, but I think of it as a simulation... something full of AI citizens), another 3 about computer graphics, 2 on AI, 1 on Multiagent systems, "Massively Multiplayer game development 2", "Game Engine Architecture", ... So I kind of have some clue about what I'm trying to do. I'm just not sure how to best do it in Nim, where best is "with the least amount of work".
I can't help you on that but go for it, awesome project!
Don't forget to check how Pony does its actor system (the whole language is based on Actors). It's probably state of the art.
In particular here are the papers (bibliography might be interesting as well):
The 2 garbage collection papers might be overkill since we can piggyback on Nim's GC.
@monster, I only read half of your post, do you not consider Channel as the mailbox like Erlang?
Also, how about coroutine?
@mratsim Pony looks interesting, I'll have a look. And I haven't seen any of those links before. Lots of inspiration, thanks.
@mashingan I will use Channels. It's just that I remember looking at many Java queues to exchange messages, and I remember that the 1:1 queues usually were faster. But I currently have no intention to code my own queue (too low level for me). I have seen coroutine; I have no clue how it is implemented, but it probably has some 'cost/overhead' (any info on that?), so I'll try to not use it, if I can.
@andrea I only looked at the standard library and Nimble for similar APIs; so I missed Nimoy. I'll certainly have a look. Thanks.
@mratsim I've looked at Pony. While the feature-set is interesting, it fails my need in one critical point: I want C/C++ codegen, not just a C/C++ FFI. I might dig into the doc, for inspiration.
@andrea Nimoy reminds me too much of Akka, which is IMHO the wrong way to do an actor system. Still, it's nice to see some alternatives in Nim. At the very least, that gives me something to compare my code to (if I ever turn this idea into something real). Is "andis" on this forum too? We might want to exchange ideas...
Finally, in a "shower moment", I came to the conclusion that I might be able to use the local heap after all: