Starting a new thread here because the others are too crowded. isUniqueRef seems to be more useful than isolate:
https://github.com/nim-lang/Nim/pull/21812
ARC without atomics is actually quite good at multi-threading. You need to know what you're doing though, and coming up with
proc send(x: sink string; replyTo: sink DatabaseConnection) =
  assert isUniqueRef(replyTo)
  withMyLock headL:
    head = MyList(data: x, next: head, other: replyTo)
    assert isUniqueRef(head.other) # some protection against a potential destructor from the main thread
was a bit challenging. But it's way too early to encode the idioms in the type system as we explore them.
I've been experimenting with the same idea. Essentially if we assume that we can know that a reference is unique it would be possible to atomically swap that only reference in and out of some shared location. Imagine the following:
var x: Atomic[ref Complex] # Could of course just be a normal ref object, just shown explicitly here for clarity

proc myThread() {.thread.} =
  var y: ref Complex
  while y.isNil:
    y = x.exchange(nil) # If there was a reference in X we now hold the only reference to it, we've essentially locked access to this tree by holding the only way to access it
  y[].doWork # We are free to do whatever we want on Y since we hold the only reference. The only thing we can't do is copy a reference to Y or place memory referenced by us into Y.
  y = x.exchange(y) # We now move Y back into X and null out our reference at the same time. The reference in X is now the only reference and access is essentially unlocked
This very simple pattern is essentially a spin lock around shared object access. As long as we can't copy the shared reference locally or store anything with external references in the Complex data type, we can do anything we want to it. The important part is just that the reference counts can't be touched after the reference is swapped back into the global position. As soon as we do that swap back, nothing should touch that object.
I think part of the confusion people have had while discussing this topic is the unclear definition of isolate. Since it draws inspiration from Pony I looked at their definitions, which I feel succinctly explain the concepts the way I envisaged them:
Isolated data is safe
If a block of data has only one reference to it then we call it isolated. Since there is only one reference to it, isolated data cannot be shared by multiple threads, so there are no concurrency problems. Isolated data can be passed between multiple threads. As long as only one of them has a reference to it at a time then the data is still safe from concurrency problems.
Isolated data may be complex
An isolated piece of data may be a single byte. But it can also be a large data structure with multiple references between the various objects in that structure. What matters for the data to be isolated is that there is only a single reference to that structure as a whole. We talk about the isolation boundary of a data structure. For the structure to be isolated:
1. There must only be a single reference outside the boundary that points to an object inside.
2. There can be any number of references inside the boundary, but none of them must point to an object outside.
These are the same problems we are discussing here, and isUniqueRef seems like it would be able to solve 1 pretty easily, and 2 if done recursively. I feel like the previous version of isolate, which only did compile-time verification, is still useful, but mostly as an optimisation to avoid calls to isUniqueRef at runtime. If it is impossible for isUniqueRef to return true there is no need to call it.
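For instance, a hand-rolled deep check for a concrete list type could look roughly like this (a sketch assuming the isUniqueRef from the PR above, under ARC/ORC; a general version would also need visited-node handling for cycles and a check for condition 2):

type
  Node = ref object
    data: string
    next: Node

proc isDeeplyIsolated(head: Node): bool =
  # Walk with a non-counting cursor so the traversal itself doesn't add
  # references; every node must be referenced exactly once.
  var cur {.cursor.} = head
  while cur != nil:
    if not isUniqueRef(cur):
      return false
    cur = cur.next
  true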
Thanks Araq, nice to see this; I think this is part of the infrastructure we need to make these kind of operations safe.
I see the implementation verifies the RC of the referenced object itself is zero; do you have any opinions on doing deep/recursive inspection of a graph?
https://nim-lang.org/docs/destructors.html#sink-parameters
To move a variable into a collection usually sink parameters are involved. A location that is passed to a sink parameter should not be used afterward. This is ensured by a static analysis over a control flow graph. If it cannot be proven to be the last usage of the location, a copy is done instead and this copy is then passed to the sink parameter.
When sink was introduced, I thought I would be able to guarantee full ownership of objects and no further use of them afterwards. This would be extremely useful for channels and sending objects safely across threads.
I don't really see a use-case where a parameter would be tagged sink and the object is still used afterwards, besides an unintended programmer mistake.
I think Nim 2.0 would be a good point for sink to warn against that, and later make it an error.
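For reference, a minimal sketch of the behaviour described in the quoted paragraph (names are purely illustrative):

proc store(dest: var seq[string]; msg: sink string) =
  dest.add msg            # msg is moved into the collection

var outbox: seq[string]
var greeting = "hello"
store(outbox, greeting)   # last use of greeting: the string is moved, not copied
# If greeting were read after this call, the analysis could not prove a last use,
# so a copy would be passed to the sink parameter instead.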
This would be extremely useful for channels and sending objects safely across threads.
This was at some point x: owned ref T, not sink. Since owned also enforces moving, detects "use after move" reliably and prevents the creation of cycles, maybe it's time to take a fresh look at it.
do you have any opinions on doing deep/recursive inspection of a graph?
I'm not against it but it seems to be overly complex and pessimistic: If the graph is isolated, multi-threading is safe. If it is not, multi-threading might still be safe. That's what my example program shows. The linked list is not "isolated", it's safe because of the lock for the head of the list.
(Taken from Pony but adapted to Nim.)
Just to throw in some examples of the kind of thing I was talking about in the other topic. This uses a global atomic to share a reference; this reference is exchanged for nil in each thread, meaning that only one thread at a time has access to the data in it. The threads are then free to do whatever they like with the reference and exchange it back (essentially locking and unlocking a global resource in this example). Of course the broader idea would be to pass such a reference through a channel for example. What I'd want is for Nim to use its reference counting to ensure that the tree I'm trying to exchange in is guaranteed to be isolated (as per the Pony definition), and static code analysis to figure out if there is no way it could have an outside reference and simply skip the check. This is similar to how ARC only actually uses a refcount when it isn't sure the data is contained and can be freed at the end of the scope. Similarly, I'd want Nim to use a runtime check of isolatedness if it isn't possible to statically reason that it is isolated:
{.push, header: "<stdatomic.h>".}
type
  MemoryOrder* {.importc: "memory_order".} = enum
    moRelaxed
    moConsume
    moAcquire
    moRelease
    moAcquireRelease
    moSequentiallyConsistent
  AtomicInt64 {.importc: "_Atomic NI64".} = int64

proc atomic_exchange_explicit[T, A](location: ptr A; desired: T; order: MemoryOrder = moSequentiallyConsistent): T {.importc.}
{.pop.}

type Shared[T] = distinct AtomicInt64

proc newShared[T](x: typedesc[T]): Shared[T] =
  discard atomicExchangeExplicit(result.addr, cast[ref T](nil))

proc exchange[T](s: var Shared[T], x: ref T): ref T =
  GC_ref(x) # Tell Nim that the shared variable counts as a reference
  atomicExchangeExplicit(s.addr, x)
type
  TreeTest = ref TreeTestImpl
  TreeTestImpl {.acyclic.} = object
    data: string
    next: TreeTest

var sharedTree: Shared[TreeTestImpl]

proc treeThreadFunc() {.thread.} =
  for i in 0..<100:
    var x: TreeTest
    while x.isNil:
      x = sharedTree.exchange(nil)
    x = TreeTest(data: "Hello from " & $getThreadId() & ": " & $i, next: x)
    x = sharedTree.exchange(x)

proc treeTest() =
  discard sharedTree.exchange(TreeTest(data: "Bottom", next: nil))
  var thr: array[50, Thread[void]]
  for i in 0..high(thr):
    createThread(thr[i], treeThreadFunc)
  joinThreads(thr)
  var
    test = sharedTree.exchange(nil)
    count = 0
  while not test.next.isNil:
    echo count, ": ", test.data
    inc count
    test = test.next
  echo test.data
  echo count

treeTest()
type
  ClosureTest = ref ClosureTestImpl
  ClosureTestImpl = object
    cb: proc (x: int): int {.closure.}

var s: Shared[ClosureTestImpl]

proc closureThreadFunc() {.thread.} =
  var x = 1
  for i in 0..<30:
    var y: ClosureTest
    while y.isNil:
      y = s.exchange(nil)
    {.cast(gcsafe).}:
      x = y.cb(x)
    y = s.exchange(y)

proc closureTest() =
  var so: seq[int]
  var myProc = proc (x: int): int =
    so.add getThreadId()
    x * 2
  discard s.exchange(ClosureTest(cb: myProc))
  var thr: array[50, Thread[void]]
  for i in 0..high(thr):
    createThread(thr[i], closureThreadFunc)
  joinThreads(thr)
  echo so.len
  echo so

closureTest()
I found a talk on the above paper: https://youtu.be/yUtZwdGhxhw
The talk helped me make sense of their notation, but the iso parts seem pretty similar to isolated. They also have an if disconnected(x, y) construct which seems like an interesting/useful thing to have.
When sink was introduced, I thought I would be able to guarantee full ownership of objects and no further use of them afterwards. This would be extremely useful for channels and sending objects safely across threads.
The system in the paper describes one possible way to do this with implicit static regions and a "focus" mechanism. They describe semantics for a way to annotate functions to work with the regions as well, which might give ideas on how to model the semantics of sink and owned for threading.
Sorta makes me wonder how much could be implemented as a library to play with?
Then again, I'm not sure how useful type-level tracking of thread-safe constructs would be outside of just iso. It's already possible to vet a few core multi-threaded constructs (channels, queues, etc) and rely on isUniqueRef or isolated for regular data.
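For regular data that would look something like this small sketch with std/isolation:

import std/isolation

type Payload = ref object
  data: string

# isolate only accepts expressions the compiler can prove form an isolated
# graph, e.g. a freshly constructed ref:
var boxed = isolate(Payload(data: "hello"))
let p = boxed.extract   # take the unique reference back out
echo p.data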
I found a talk on the above paper: https://youtu.be/yUtZwdGhxhw
Thanks for the reference, I have now watched it.
I don't think we need to copy their model though, I copied Pony's solution which gets away without modelling regions and control flow in the compiler. So it's objectively simpler and hopefully good enough.
std / tasks does use isolate, so if your spawn implementation uses toTask (which both malebolgia and thready do at least, haven't checked Weave yet) you're already getting the isolation checking. It's never been easier to tinker with the resulting system, please give it a try.
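For instance, a minimal example:

import std/tasks

proc greet(name: string) =
  echo "hello ", name

# toTask captures the call and its arguments; the arguments go through the
# isolation machinery, so data that cannot be isolated is rejected at compile time.
var t = toTask greet("world")
t.invoke()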
I don't think we need to copy their model though, I copied Pony's solution which gets away without modelling regions and control flow in the compiler. So it's objectively simpler and hopefully good enough.
Great, I never dove too deep into Pony's system. The system in the paper also led to possibly exponential solve times for type checking in some cases.
But yah, it wouldn't seem to add too much to the current solution. Does Pony use sink and owned as well? That's something I was hoping to get a deeper understanding of.
std / tasks does use isolate, so if your spawn implementation uses toTask (which both malebolgia and thready do at least, haven't checked Weave yet) you're already getting the isolation checking. It's never been easier to tinker with the resulting system, please give it a try.
Yah I used isolate a while back with channels. It's a solid design IMHO. I haven't tried the new tasks though. Getting the tryRecv and trySend was a bit finicky. I think malebolgia style returns would be much easier for day-to-day programmers.
Though I don't have a current project that needs multi-threading really. So I can't give detailed feedback on api.
One thing that would be really interesting to see in malebolgia (or elsewhere) would be the ability to isolate errors. I'd mentioned it before, but Erlang's process spawn lets you choose whether a panic would also crash your parent process or not.
The Erlang model extended to threads could provide a powerful technique for embedded servers with Nim, IMHO. I've spent way too much time figuring out how to handle errors on tasks / threads. The Master and tasks setup in malebolgia seems like it'd make it possible to add that sort of system.
The ability to have panics for "defects", but isolate them to a single thread and let the Master thread know about it (or die itself), would be amazing. Then regular exceptions would let you handle standard things, but you could still capture and, when possible, recover from defects.
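As a rough sketch of the kind of thing I mean (hypothetical names, using the built-in Channel and assuming the default mode where defects are catchable):

type Failure = object
  worker: int
  msg: string

var failures: Channel[Failure]   # reports from worker threads to the supervisor

proc worker(id: int) {.thread.} =
  try:
    if id mod 2 == 0:
      raise newException(AssertionDefect, "simulated defect")
    echo "worker ", id, " finished fine"
  except CatchableError, Defect:
    # Trap the failure in this thread and report it to the supervisor
    # instead of taking the whole process down.
    failures.send Failure(worker: id, msg: getCurrentExceptionMsg())

proc supervise() =
  failures.open()
  var thr: array[4, Thread[int]]
  for i in 0 .. high(thr):
    createThread(thr[i], worker, i)
  joinThreads(thr)
  var draining = true
  while draining:
    let (ok, f) = failures.tryRecv()
    if ok:
      echo "worker ", f.worker, " failed: ", f.msg, " -> restart or escalate"
    else:
      draining = false
  failures.close()

supervise()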
For a good example, I was working on a robotic gantry system using ROS2. There's a race condition / timeout in the core ROS2 system which occurs with UDP packets on congested networks. The defect was part of the core system but wasn't critical and could be resolved by just retrying. Instead, it crashed the whole process.
Not really fun when your multi-ton robot just stops and the error logs say "timeout". It was C++ code, not Nim, but I believe a discrete task with an Erlang style supervisor would've encouraged the ROS2 developers to "on error, restart this piece".
Hmmm, now I wonder if Pony also built on that idea too?
std / tasks does use isolate, so if your spawn implementation uses toTask (which both malebolgia and thready do at least, haven't checked Weave yet) you're already getting the isolation checking. It's never been easier to tinker with the resulting system, please give it a try.
Weave predates std/tasks, so I do not use it there. To reduce stress on the memory allocator from short-lived tasks, I implemented a memory pool with another level of caching (i.e. memory pool + cache). https://github.com/mratsim/weave/blob/v0.4.0/weave/memory/README.md#in-depth-description-for-the-memory-pool-and-lookaside-buffer
nim-taskpools does https://github.com/status-im/nim-taskpools/blob/v0.0.4/taskpools/taskpools.nim#L446-L453
In Constantine, I chose yet another approach.
In nim-taskpools, tasks have a huge overhead which is problematic for short-lived tasks:
1. The closure environment is allocated.
2. The task itself is allocated.
3. The future (Flowvar) is allocated.
On the fibonacci benchmark, a contention/overhead benchmark since it spawns 2^40 almost empty tasks, this makes nim-taskpools perform really badly: around 10s or so, while Weave takes 500ms, or even 200ms with stack-allocated futures (only possible if you guarantee structured parallelism, i.e. futures don't escape their scope).
For Weave, besides the memory pool to avoid allocations, steps 1 and 2 are fused, as the closure environment is intrusive to the task ptr object (see the sketch below). The issue is that I have to limit the environment in size: 144 bytes for the env, 112 bytes for task metadata, 256 bytes total; iirc that's the size of an Erlang actor. And step 3 can be made costless if the future does not escape its scope, though currently it's a flag (-d:WV_LazyFlowvar) that applies to the whole program. I would need compiler help for "Heap allocation elision" (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html / https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2477r0.html / https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2006r1.pdf / https://www.youtube.com/watch?v=FWD1msG8YU4 / https://github.com/ziglang/zig/issues/1260).
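A simplified sketch of that intrusive layout (not Weave's actual definitions):

const EnvSize = 144   # figure quoted above; the rest is scheduling metadata

type
  TaskObj = object
    fn: proc (env: pointer) {.nimcall.}   # the work to execute
    parent, next: ptr TaskObj             # stand-ins for the ~112 bytes of metadata
    env: array[EnvSize, byte]             # closure environment stored inline, so
                                          # spawning a task needs a single allocation
  Task = ptr TaskObj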
Note that heap allocation elision would also be hugely beneficial for async / closure iterators and CPS.
For Constantine I chose yet another approach: std/tasks was not performant enough and, while Weave was efficient, it's unsuitable for cryptography because implementing your own (threadsafe!) memory management opens a whole new can of worms (and also Weave is a work-requesting runtime since it's message-passing based; truly stealing without requiring cooperation of the victim requires 2x less code, reduces state and reduces the code to audit / bug surface).
I found a design to completely fuse steps 1-2-3 into a single alloc, with no limitations on the size of the environment either (very helpful when you might want to send cryptographic objects that are a minimum of 32 bytes in size), by having the future use the task buffer as well to send back the result, and then coordinating the 3 owners of that buffer (the task creator, the task processor and the task/future awaiter) so the lifecycle is free of use-after-free.
In the end, that design is as efficient as (or more efficient than) Weave's base design (though not as efficient as Weave with heap alloc elision) for significantly less code, less bug/attack surface (no custom memory management) and better compatibility with AddressSanitizer and Valgrind.
Note that heap allocation elision would also be hugely beneficial for async / closure iterators and CPS.
Please describe yet again what the optimization looks like. Whenever I look at it I come to the same conclusion "cannot optimize the realworld case because in the realworld the frame/object/whatever does escape".
Please describe yet again what the optimization looks like. Whenever I look at it I come to the same conclusion "cannot optimize the realworld case because in the realworld the frame/object/whatever does escape".
If the compiler can prove that a future/flowvar/coroutine/closure iterator does not escape, i.e. it's awaited in the same scope that spawned it, allocate it on the stack instead of on the heap.
This is guaranteed if you use structured concurrency/parallelism, for example, and happens quite often for coroutines/closure iterators.
LLVM is able to optimize these coroutines away thanks to heap-alloc elision, https://godbolt.org/g/26viuZ:
int main() {
  auto s = seq<int>();
  auto t = take_until(s, 10);
  auto m = multiply(t, 2);
  auto a = add(m, 110);
  return std::accumulate(a.begin(), a.end(), 0);
}