(I intended to post the message below as a reply to the recent "Is ORC considered production ready" thread, but it grew much longer than I expected and I think it might deserve a separate thread. Note that I often consider myself not smart enough to write proper multi-threaded code, so the text below might contain wrong or biased information.)
Some time ago I spent several weeks pushing multi-threaded Nim 2.0 with shared data to its limits, mostly looking at ARC only.
My adventure started with this post where I asked a similar question: What is the state of threading in Nim 2.0, and how to make the best use of it: https://forum.nim-lang.org/t/9617
My plan was to create a minimal actor-based system in which I can write multi-threaded programs without ever having to think about the threading. Sharing data between threads typically requires the user to take care of the synchronization primitives, which I usually find too cumbersome and error-prone for daily work. My end goal was something like Erlang's "Process": a very lightweight, fiber-like flow of control, allowing millions of those to run on a handful of threads, with all communication done by message passing through the processes' mailboxes. The processes are built on top of disruptek's fine CPS library, which offers processes at the cost of tens of bytes of memory each, with very low scheduling overhead.
This is where shared memory + ARC comes in: for effective message passing of "large" data, you typically want to avoid deep copies and instead move the data from thread to thread. This requires a few steps to happen in the proper order:
The Actors project is more or less complete, and is in a works-for-me state, but I must admit that I have not actually used it for very much after I got it to work; For those interested, take a peek at https://github.com/zevv/actors
A short summary of my conclusions, not complete and in random order:
My single most important takeaway of this little adventure is: this problem is still hard, and Nim will not hold your hand. It will happily shoot you in the back of your head when you are not looking. Getting a SIGSEGV right away is usually the best result you can hope for, because these are obvious and traceable. The problem is of course that a lot of bugs of this class can be very, very subtle and can show up in a million different ways, not causing crashes but all kinds of other undefined behavior. Not something I want in my production code.
If you decide to go play with shared ARC-managed memory, do yourself a huge favour: use and trust memory sanitizers like asan/tsan and Valgrind/Helgrind/DRD, and take their output very seriously. I have talked to some people telling me that they knew what they were doing and that Valgrind was just generating false positives. I beg to differ: Valgrind has been right 99% of the time. If Valgrind ever generates a false positive, something in your code is usually doing "funny stuff" and IMHO deserves proper annotation to make it shut up, and to inform readers of the code that funny stuff is happening here.
My final conclusion would be that ARC simply does not play well with threading in the current state unless you really, really know what you are doing. Having atomic RC types in the language would take most of these headaches away.
Very interesting writeup! The actor system you describe is exactly the kind of threading story which would really help Nim. A programming language which in 2023 doesn't have anything easier than manual locking and such isn't exactly a great look. Unfortunately it sounds like Nim currently fights your attempts at getting this done fairly hard.
But your approach is still very interesting. If Nim could just be a bit friendlier about handing off one tree of ref objects to another thread, then that would be a very nice way of dealing with threading.
Moving a tree around has never been easier:
import std / [json, isolation]
import threading / channels

var chan = newChan[JsonNode]()
var thr: Thread[void]

proc worker() {.thread.} =
  var x: JsonNode
  chan.recv(x) # somebody should fix this API...
  echo "received ", x

createThread thr, worker
#chan.send unsafeIsolate(%* {"key": 2, "keyB": "value"})
chan.send isolate(%* {"key": 2, "keyB": "value"}) # JSON nodes do form a tree
joinThread thr
No need for "manual locking in 2023", but locking is still awesome anyway IMHO. ;-) Much easier to reason about than any message passing system or "actor model" that I've seen so far. But that's off-topic.
Just played around a bit with your example and unless I create the object entirely within the isolate call it doesn't work. This really limits what you're able to do with this since I can't create stuff outside the isolate call and then isolate them after the fact, and I'm not able to isolate something first and then edit it afterwards. E.g. something like this is not possible:
import std / [json, isolation]
import threading / channels

type
  Test = ref object
    data: string
  Tree = ref object
    left, right: Test

var chan = newChan[Tree]()
var thr: Thread[void]

proc worker() {.thread.} =
  var x: Tree
  chan.recv(x) # somebody should fix this API...
  echo "received ", x.left.data, " ", x.right.data

createThread thr, worker
let hello = Test(data: "Hello")
chan.send isolate(Tree(left: hello, right: Test(data: "world")))
joinThread thr
because it can't isolate that let hello. I fail to see how this would be useful for anything more complicated than an example like this, if you have anything which does actual work with this I'm super curious to see how it's supposed to work.
On another note running it through Valgrind/Helgrind I get 3 errors, one of which is the aforementioned issue and two others about possible data races. I'm running this command to test it: nim c --passC:-g --passL:-g -d:useMAlloc araqtree.nim && valgrind --tool=helgrind ./araqtree so it seems like it still doesn't work quite as well as we'd like it to..
@araq: Your example is demonstrating moving a constant tree around has never been easier.
It is showing the happy path because isolate() is able to take your const JSON tree and isolate it; when trying to pass anything else, isolate() is no longer able to do the job and will tell you expression cannot be isolated.
Unfortunately, my data is usually not constant.
You already mentioned unsafeIsolate() in your snippet, which just casts the value to Isolated[T] without actually checking that it is isolated. But now you're on your own: your code might work today but fail in interesting ways ten months from now. The programmer now has the responsibility to make sure the tree is isolated, and if it is not you run into undefined behavior, or an early crash if you are lucky.
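To make the difference concrete, here is a minimal sketch that uses only std/isolation and the Test/Tree types from the earlier post, with no threads or channels involved. It assumes, as reported above, that isolate() accepts a tree built entirely inside the call; the variable names are made up for illustration.

```nim
import std/isolation

type
  Test = ref object
    data: string
  Tree = ref object
    left, right: Test

# Checked: the whole tree is constructed inside the call, so the
# compiler can see that no other reference into it exists.
var checked = isolate(Tree(left: Test(data: "Hello"), right: Test(data: "world")))

# `hello` is a local that keeps pointing into the tree after construction.
let hello = Test(data: "Hello")

# Rejected at compile time ("expression cannot be isolated"):
# var rejected = isolate(Tree(left: hello, right: Test(data: "world")))

# Compiles, because unsafeIsolate() performs no check at all.
var unchecked = unsafeIsolate(Tree(left: hello, right: Test(data: "world")))

# The alias is still there: `hello` and the "isolated" tree share a node.
let tree = extract unchecked
echo tree.left == hello
```

If `unchecked` were sent to another thread at this point, both threads would hold a reference to the same Test node and update its non-atomic refcount independently, which is exactly the situation the isolation check exists to rule out.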
Thanks @zevv for the nice summary! I'm sure you will excuse my more positive, totally biased take on your work:
The Actors project is more or less complete, and is in a works-for-me state, but I must admit that I have not actually used it for very much after I got it to work; For those interested, take a peek at https://github.com/zevv/actors
So, from what I understand, that is a runtime that combines "micro" processes with an async event loop while being able to use all of your CPU cores and client code is easy to write and "not blocking". Sounds amazing! Plenty of people have been waiting for this thing!
Was it hard to write? I bet. Are you burned out now that it finally begins to work? I can imagine. So let others join the party. ;-)
Would it have been easier with an "atomic ARC" mode? Sure. Yet you managed to do without.
The real question is how hard it is for client code to avoid triggering (non atomic) ARC problems when using your actors runtime.
You already mentioned unsafeIsolate() in your snippet, which is just casting the value to isolated[T], without actually checking if this is the case. But now you're on your own - your code might work today but fail in interesting ways ten months from now. The programmer now has the responsibility to make sure the tree is isolated, but if it is not you run into undefined behavior - or an early crash if you are lucky.
Actually, it's not undefined behavior, it's simply always wrong, it's just that the tooling cannot detect it. I claim that it's not hard to ensure isolation for a programmer, but it's hard for Nim's type system. We need real usability data on these things and if you think that your experiments with "actors" are a valid data point I have to say that I don't agree:
Maybe what you say is true, maybe not. In your example
proc createTree(hello: Test): Tree =
  Tree(left: Branch(data: "Hello"), right: Branch(data: "world"))
The parameter hello is unused. This means it's not a realistic example. I keep asking for realistic examples. Alias analysis depending on the involved types has proven to be hard to reason about and bites with generic algorithms which is why "strict funcs" evolved to use a mechanism based on the involved expressions only.
I can imagine the same will happen for isolate -- type-based alias analysis is too fragile and a rule like "cannot use local variables" is easier to understand. Or maybe a rule like "every local variable involved in isolate must not be used afterwards".
The result of the query borrows from the n parameter, and in theory the lifetime dependency can avoid refcounting activity since it's assumed that the lending parameter is reachable in the first place and the lent value lifetime won't extend its lender's. Does this already prevent refcount updates?
You misuse lent in your example which makes it harder to understand.
Or is this something that's planned?
You seem to describe an optimization that is "well known":
"There is an important special case in which it is possible to avoid incrementing and decrementing reference counts. Suppose that the program has a declaration
type C = counted collection of ...
We say that a scope S is C-conservative if it contains no assignments to variables of type ^C that are not local to S, it contains no uses of v.refCount for variables in C, and all procedures that it calls are also C-conservative. Within S it is not necessary to update reference counts for variables in C, since no variable in C can be freed in S and every such variable will have the same reference count on exit from S that it had on entry to S."
Nim doesn't do this optimization, instead Nim does "cursor" inference. Given cursor inference, it is not clear if the optimization is worth it. But it is interesting and reasonably easy to understand and implement.
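For readers who have not met cursors: a cursor is a non-owning reference, so assigning to it does not touch the refcount. The sketch below spells one out with the explicit {.cursor.} pragma on a hypothetical linked list; cursor inference is the compiler applying the same idea on its own where it can show that this is safe. It only illustrates what a cursor is, not when inference actually fires.

```nim
type
  Node = ref object
    value: int
    next: Node

proc sum(head: Node): int =
  # `it` only borrows the nodes that `head` keeps alive, so walking
  # the list needs no refcount increments or decrements.
  var it {.cursor.} = head
  while it != nil:
    result += it.value
    it = it.next

let list = Node(value: 1, next: Node(value: 2, next: Node(value: 3)))
echo sum(list)
```

The traversal is only valid because the caller's `list` outlives the call; a cursor that outlives its owner would be a dangling pointer.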
Assuming no concurrent mutation (as for persistent data structures), or a coarse-grained read/write lock over the root Node, the query function above would be race-free, refcount-update-free, and thus thread-safe. It would be a big enabler for multithreaded ARC/ORC.
Maybe, maybe not. How can you assume a lock on the "root" Node? The compiler has no idea about a "root" node, the nodes are all of the same type and the hard part of analysing multi-threaded programs is that you don't know what the other threads may do, it's fundamentally a nonlocal analysis.
Last, I'm wondering if we could have lent T from X syntax similar to what's planned for var T from container syntax? It would allow for more flexibility on parameter position and also potentially borrowing from multiple parameters.
More syntax doesn't help when we still try to figure out the important idioms we need to support.
It's true, that example isn't a real-world example. This was simply me trying different things to figure out where it broke, and then being surprised when it broke even if I didn't use the argument. The point is that right now isolate is so strict it's not useful at all. Since you can't pass in data to work with, I can't think of a scenario where what you put inside isolate couldn't just have been moved to the receiving thread. Of course we can use anything that doesn't have refs, but since refs are considered safe anywhere else in the language they are pretty much everywhere.
I'm not saying that alias analysis is easy, I'm just saying that without it isolate isn't really all that useful. Indeed having a rule like "every local variable involved in isolate must not be used afterwards" would vastly improve the system. But you still have to know that the local variable can't be an alias and that it can't alias anything which is used afterwards. And then it seems like we've come full circle. The rule would really be "could this entire tree be garbage collected right now, if it weren't for the single reference we're trying to isolate". If that is the case then it should be safe to pass that single reference on to another thread, because without it the tree would be collected.
I could whip up a realistic example, but without a working isolate system it's hard to make sure the entire thing is correct. So I wrote you up two scenarios that I've thought about using such a system for, but apparently those aren't good enough? Would it help if I wrote them out in code so it would be more explicit what I would try to do? I'd have to invent some kind of work to be done though since I don't have anything specific I'm working on right now.
I could whip up a realistic example, but without a working isolate system it's hard to make sure the entire thing is correct.
Yes, please do that. And it's not hard to do: Instead of isolate use unsafeIsolate to make the compiler shut up. And use valgrind/some sanitizer to make sure it's correct. You might need to use tricks like zeroMem(addr local, sizeof(local)) or wasMoved(local) or move(local) etc so that the thread-unsafe destructor is not run on the local variable (which has been moved anyway).
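A minimal sketch of that hand-over step, under some assumptions: it uses only std/isolation, reuses the Test/Tree types from earlier in the thread, and introduces a hypothetical buildTree helper. The channel and the second thread are left out so that it stays standalone; whether the tree really is unaliased remains the programmer's responsibility.

```nim
import std/isolation

type
  Test = ref object
    data: string
  Tree = ref object
    left, right: Test

# Hypothetical builder: it uses a local, so isolate() cannot verify the result.
proc buildTree(greeting: string): Tree =
  let hello = Test(data: greeting)
  Tree(left: hello, right: Test(data: "world"))

var local = buildTree("Hello")

# move() transfers the reference and resets `local` to nil, so the
# destructor of `local` has nothing left to decrement on this thread.
var iso = unsafeIsolate(move(local))
echo local.isNil

# With threading/channels, `iso` is what would be handed to chan.send.
# Here it is unpacked again in the same thread to keep the sketch standalone.
let received = extract iso
echo received.left.data, " ", received.right.data
```

Running the real, threaded version under Valgrind or a sanitizer is still what tells you whether the hand-over was actually clean.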
let data = buildTree(..) send data to some thread
Just a thought. Ignore it if it's indeed a stupid idea.
Just a thought. Ignore it if it's indeed a stupid idea.
You're describing Isolated[T] and isolate and we're trying to figure out how exactly it can work.
My English is not good. Still, I will try to explain what I had in my mind a bit more before I give up.
I was answering these:
Araq: How can you assume a lock on the "root" Node? The compiler has no idea about a "root" node
PMunch: Indeed having a rule like "every local variable involved in isolate must not be used afterwards" would vastly improve the system
In short, my idea was to confine graph creation to a single proc. I guess it's still not an easy job at all, if it is even possible, but I thought it might make the analysis easier somehow while providing a way to do what OP and PMunch have asked.
import std / [json, isolation]
import threading / channels

var chan = newChan[JsonNode]()
var thr: Thread[void]

proc worker() {.thread.} =
  let x = chan.recv()
  echo "Welcome ", x["user"], x["msg"], " from ", x["uri"]

# this should be magic in compiler
template isolate(b: untyped): untyped =
  b

# this is not an ordinary procedure, its a graph generator
proc buildTree(uri, msg, user : string) : JsonNode {.isolate.} =
  # I'm telling the compiler, this proc and only this proc is where I'll create my graph.
  # Consider the return value as my graph root. You may ignore refcount of children
  # but root should have atomic refcount. IDK if extra rules about proc arguments is needed.
  # also not sure how but children must be destroyed too when root is destroyed
  var root = newJObject()
  root["uri"] = %uri
  root["msg"] = %msg
  root["user"] = %user
  # real return value should be isolated JsonNode, dont let me break the rules outside of this proc
  return root

echo "Name?"
let user = stdin.readLine()
createThread thr, worker

# this would be an error:
# var x = buildTree("example.com","Hello",user)
# someProcess(x)

# but is it possible to make this work ?
let t = buildTree("example.com","Hello",user)
# chan.send t
# instead of this:
chan.send unsafeIsolate(t)
joinThread thr
Just an idea about proving that a variable's graph is isolated. Maybe it could be done the same way as the out-of-bounds check is done, at runtime. In dev mode each ref variable has a field with a thread id, and if that id is different from the current thread id when you touch that variable, an exception is thrown. In a production build this field is removed, so there is no slowdown or overhead. Also, if in the future someone came up with a clever idea for how to do that check at compile time, this field could be removed without anybody noticing.
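A rough sketch of what such a debug-only check could look like when written by hand in user code, assuming threads are enabled (the default in Nim 2.0). Nothing here is an existing compiler feature: Node, ThreadOwnershipDefect, newNode and checkedData are all hypothetical names, and a real implementation would have to live in the runtime and cover every ref access rather than one accessor.

```nim
type
  ThreadOwnershipDefect = object of Defect  # hypothetical error type

  Node = ref object
    data: string
    when not defined(release):
      ownerThread: int  # debug-only field, absent from release builds

proc newNode(data: string): Node =
  result = Node(data: data)
  when not defined(release):
    result.ownerThread = getThreadId()  # remember the creating thread

proc checkedData(n: Node): string =
  # The ownership check compiles to nothing in release mode.
  when not defined(release):
    if n.ownerThread != getThreadId():
      raise newException(ThreadOwnershipDefect,
                         "node touched from a foreign thread")
  n.data

var caught = false  # plain bool, so the worker thread may write to it

proc worker(n: Node) {.thread.} =
  try:
    discard n.checkedData()
  except ThreadOwnershipDefect:
    caught = true

let node = newNode("hello")
echo node.checkedData()  # same thread: passes the check

var thr: Thread[Node]
createThread(thr, worker, node)  # deliberately shares the ref without a hand-over
joinThread(thr)
echo caught
```

In a debug build the worker's access should trip the check and set `caught`; with -d:release the field and the check disappear. A legitimate hand-over would additionally need a step where the receiving thread takes over ownership of every node in the graph, which this sketch leaves out.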
P.S. I don't know much about compilers so feel free to ignore it if it sounds stupid.