This pull request improves the thread safety of ARC with surprisingly little overhead. For some CPUs. Please help us by joining our benchmarking efforts.
Please report:
Stats for koch boot -d:release --threads:on --mm:arc
Stats for koch boot -d:release --threads:on --mm:atomicArc
Your CPU.
Hint: mm: arc; threads: on; opt: speed; options: -d:release 176527 lines; 6.574s; 609.012MiB peakmem;
Hint: mm: atomicArc; threads: on; opt: speed; options: -d:release 176527 lines; 7.462s; 609.34MiB peakmem;
i7-6700K ddr4 2666
arc: 5.519s; 491.391MiB peakmem
atomicArc: 5.750s; 491.387MiB peakmem
Apple M1
AMD Ryzen 5 5600x + Windows 10 for this benchmark
mm: arc; threads: on; opt: speed; options: -d:release 165616 lines; 8.583s; 489.793MiB peakmem
mm: atomicArc; threads: on; opt: speed; options: -d:release 165616 lines; 8.646s; 489.805MiB peakmem
Thanks for the thoughtful reply. I keep replying on this issue because I want to advocate for a more pro-thread and maybe a little fast-and-loose perspective; I don't think there is any shortage of opinions to the contrary.
I think your "Now can a ConcurrentHashMap take in ref as elements ... that's a different question." paragraph captures my point. I think the answer changes with vs without atomic refs, though the data-race issue is of course still there without unique ownership. I am happy to have been understood, so thanks for putting up with my dominating this thread!
I think the answer changes with vs without atomic refs
Yes, you're right, it does. But that added expressivity comes at a huge cost, and for my taste the cost is too high: the extra expressivity might not buy us anything for real-world code. I need to do more experiments to validate that opinion.
@PMunch
The problem is that suddenly you want to pass an object from a library between threads, and it uses ref. So now you can't pass that between threads simply because the original library owner didn't use aref?
If the library wasn't designed with thread safety in mind, just changing the ref to an atomic ref will not automagically make the code thread-safe. Case in point:
type Node[T] = ref object
  next: Node[T]
This works fine in a single-threaded scenario. Now what if someone wants to share the list between threads?
Now you have races when appending to or deleting from the list, and the fix is more than just swapping in an atomic ref.
The really simple solution is to use locks, SharedTable style, or to design the objects with threads in mind from the ground up; a sketch follows below. We could optimize the lock into a reader-writer lock if the object is mostly read and almost never updated.
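To make that concrete, here is a minimal sketch of the lock-based design, assuming --threads:on and either atomic refcounting (this PR) or careful ownership discipline; the SharedList/prepend names are made up for illustration and are not part of this PR:

import std/locks

type
  Node[T] = ref object
    value: T
    next: Node[T]

  SharedList[T] = object
    lock: Lock        # guards head and every node reachable from it
    head: Node[T]

proc init[T](list: var SharedList[T]) =
  initLock(list.lock)

proc prepend[T](list: var SharedList[T], value: T) =
  # All mutation happens while holding the lock.
  withLock list.lock:
    list.head = Node[T](value: value, next: list.head)

proc len[T](list: var SharedList[T]): int =
  # Traversal also takes the lock, otherwise it races with prepend.
  withLock list.lock:
    var cur = list.head
    while cur != nil:
      inc result
      cur = cur.next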
@treeform
I love the Atomic ARC. Initially, I entertained the idea of having both a ref and a thread-safe atomic ref. However, upon further reflection, I don't reckon it's a good idea. Numerous refs come from libraries that you can't edit. Sure, refs can influence one another, such as making every ref reachable from an atomic ref atomic as well - the "ref coloring problem". But what if you assign a local ref to an atomic one? Does that make it atomic too? It's overly convoluted and not worthwhile.
The solution is to use plain ARC for single-threaded code, and when --threads:on is enabled, to use atomic refs. The speed loss is negligible. Look, I'm all about speed, but when you're optimizing for speed you shift to non-ref objects, UnsafeArrays and pointers. Refs should be about convenience.
The distinction between ref and atomicRef makes it easy to spot "business-as-usual" and "reader beware, this object will be involved in one of the most complex areas of computer science."
@guzba
These examples always distill down to creating a clean new ref in a no-side-effect proc with all the data passed in as value types. This seems like a pretty limiting set of preconditions to me. What if send needed to take a ref type instead of a string? And that ref type came from who knows where? (Yes, the head / next case is interesting; it requires care here.)
That's what isolate tries to do: basically, a ref represents ownership and you send that ownership to the remote thread; the compiler then considers all access from the local thread invalid. This is what the Send trait does in Rust as well.
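For reference, std/isolation already exposes this idea in the standard library; a minimal sketch (the Payload type is just for illustration):

import std/isolation

type Payload = ref object
  data: string

# isolate only compiles when the compiler can prove the graph has no
# other aliases, so the Isolated[Payload] can be handed to another thread.
var msg = isolate(Payload(data: "hello"))

# The receiving side takes sole ownership back out of the envelope.
let owned = extract msg
echo owned.data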
Could you point me to a language that's doing a great job at this from your perspective? Seems like a good idea to learn about better solutions to thread stuff with some real examples.
Go uses channels to pass ownership between goroutines.
Rust uses the Send trait for passing ownership and has both Rc and Arc (atomic refcounting). Their Rc/Arc is less efficient than Nim's, since Nim doesn't touch the refcount when the compiler sees a move or when the object doesn't escape the scope it was created in (see the sketch after this list).
Pony has 6 levels of ownership/shared-ownership https://tutorial.ponylang.io/reference-capabilities/reference-capabilities.html?h=reference+capabilities#what-about-global-variables
Erlang: sharing is done via sending messages.
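As a rough illustration of the refcount elision mentioned for Nim above (a generic ARC example, not code from this PR):

type Data = ref object
  x: int

proc consume(d: sink Data) =
  echo d.x

proc main =
  let d = Data(x: 1)   # the only reference to this object
  consume(d)           # last use of d: it is moved, so no incRef/decRef pair is emitted

main()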
Could you point me to a language that's doing a great job at this from your perspective? Seems like a good idea to learn about better solutions to thread stuff with some real examples.
Personally, I think no language is great at thread stuff; they all have strengths and weaknesses:
Go: awesome for IO, webservers, etc.; bad for scientific computing (how do you access an array in parallel?)
Rust: excellent ecosystem for IO, webservers, etc.; very annoying for shared-memory parallelism in scientific computing due to the borrow checker. I've also heard a lot of complaints about having to choose between tokio and async-std for async threadpools, and about the coloring problem you get when a dependency you want doesn't use the async framework of your choosing.
Pony: the reference capabilities might push the whole problem a bit into overengineering.
Erlang: shared-memory parallelism is an issue.
Now on the Nim side, shared-memory parallelism is solved, at least for me, with Weave and my newest threadpool (https://github.com/mratsim/constantine/tree/1c5341f/constantine/platforms/threadpool ), but for IO parallelism we don't want to make the same mistake as Rust.
Note that the actual challenges in all those languages are independent of ref / atomicRef.
Go uses channels to pass ownership between goroutines.
I do not think this is true. Go does not care about ownership at all. See here: https://go.dev/play/p/o2oph_JEaaW
To me, this runnable little example shows Go does not care about ownership at all. Channels seem to be about signaling and synchronization rather than ownership transfer. Go uses shared thread-safe types all over the place.
The constantine threadpool seems very cool. Do you plan to (or have you already) package it as a stand-alone library?
If I read the "trade-offs" section correctly, it has basically no trade-offs compared to nim-taskpools (i.e. it is strictly an improvement over nim-taskpools)?
Btw, it's very nice to see renewed interest in multithreading, but I am still getting big fat warnings from ThreadSanitizer about the new threaded allocator patch with every example I try:
==================
WARNING: ThreadSanitizer: data race (pid=7446)
Read of size 8 at 0x7fee05980d70 by main thread:
#0 rawAlloc__system_u6623 /home/antonisg/build/Nim/lib/system/alloc.nim:868:38 (tisuniqueref+0xe9b7f) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#1 alloc__system_u6838 /home/antonisg/build/Nim/lib/system/alloc.nim:1052:11 (tisuniqueref+0xef6e2) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#2 allocImpl__system_u1772 /home/antonisg/build/Nim/lib/system/alloc.nim:1128:11 (tisuniqueref+0xef6e2)
#3 allocSharedImpl /home/antonisg/build/Nim/lib/system/alloc.nim:1184:11 (tisuniqueref+0xef6e2)
#4 alignedAlloc__system_u1912 /home/antonisg/build/Nim/lib/system/memalloc.nim:331:12 (tisuniqueref+0xef6e2)
#5 nimNewObjUninit /home/antonisg/build/Nim/lib/system/arc.nim:99:343 (tisuniqueref+0xf0227) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#6 NimMainModule /home/antonisg/code/tisuniqueref.nim:67:129 (tisuniqueref+0xf11dd) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#7 NimMainInner /home/antonisg/build/Nim/lib/std/typedthreads.nim:167:2 (tisuniqueref+0xf13a2) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#8 NimMain /home/antonisg/build/Nim/lib/std/typedthreads.nim:178:2 (tisuniqueref+0xf13a2)
#9 main /home/antonisg/build/Nim/lib/std/typedthreads.nim:186:3 (tisuniqueref+0xf13a2)
Previous atomic write of size 8 at 0x7fee05980d70 by thread T1 (mutexes: write M0):
#0 addToSharedFreeList__system_u6494 /home/antonisg/build/Nim/lib/system/alloc.nim:767:127 (tisuniqueref+0xeb16c) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#1 rawDealloc__system_u6775 /home/antonisg/build/Nim/lib/system/alloc.nim:958:4 (tisuniqueref+0xeb16c)
#2 alignedDealloc /home/antonisg/build/Nim/lib/system/memalloc.nim (tisuniqueref+0xed526) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#3 nimRawDispose /home/antonisg/build/Nim/lib/system/arc.nim:171:2 (tisuniqueref+0xed526)
#4 eqdestroy___tisuniqueref_u67 /home/antonisg/code/tisuniqueref.nim:39:3 (tisuniqueref+0xf05aa) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#5 eqcopy___tisuniqueref_u55 /home/antonisg/code/tisuniqueref.nim:36:3 (tisuniqueref+0xf09a8) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#6 worker__tisuniqueref_u153 /home/antonisg/code/tisuniqueref.nim:31:5 (tisuniqueref+0xf09a8)
#7 threadProcWrapDispatch__stdZtypedthreads_u105 /home/antonisg/build/Nim/lib/system/threadimpl.nim:71:2 (tisuniqueref+0xef211) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#8 threadProcWrapStackFrame__stdZtypedthreads_u95 /home/antonisg/build/Nim/lib/system/threadimpl.nim:100:2 (tisuniqueref+0xef211)
#9 threadProcWrapper__stdZtypedthreads_u81 /home/antonisg/build/Nim/lib/system/threadimpl.nim:106:2 (tisuniqueref+0xe7983) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
Mutex M0 (0x55c765feb458) created at:
#0 pthread_mutex_init <null> (tisuniqueref+0x8c158) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#1 initLock__coreZlocks_u7 /home/antonisg/build/Nim/lib/core/locks.nim:38:2 (tisuniqueref+0xf1102) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#2 NimMainModule /home/antonisg/code/tisuniqueref.nim:25:2 (tisuniqueref+0xf1102)
#3 NimMainInner /home/antonisg/build/Nim/lib/std/typedthreads.nim:167:2 (tisuniqueref+0xf13a2) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#4 NimMain /home/antonisg/build/Nim/lib/std/typedthreads.nim:178:2 (tisuniqueref+0xf13a2)
#5 main /home/antonisg/build/Nim/lib/std/typedthreads.nim:186:3 (tisuniqueref+0xf13a2)
Thread T1 (tid=7448, running) created by main thread at:
#0 pthread_create <null> (tisuniqueref+0x67206) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#1 createThread__stdZtypedthreads_u60 /home/antonisg/build/Nim/lib/std/typedthreads.nim:246:106 (tisuniqueref+0xe7a65) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#2 createThread__stdZtypedthreads_u51 /home/antonisg/build/Nim/lib/std/typedthreads.nim:262:2 (tisuniqueref+0xe7b4e) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#3 NimMainModule /home/antonisg/code/tisuniqueref.nim:63:2 (tisuniqueref+0xf1115) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#4 NimMainInner /home/antonisg/build/Nim/lib/std/typedthreads.nim:167:2 (tisuniqueref+0xf13a2) (BuildId: 34e9c29b33f156d7e9cf99655ce10a7638b1b40a)
#5 NimMain /home/antonisg/build/Nim/lib/std/typedthreads.nim:178:2 (tisuniqueref+0xf13a2)
#6 main /home/antonisg/build/Nim/lib/std/typedthreads.nim:186:3 (tisuniqueref+0xf13a2)
SUMMARY: ThreadSanitizer: data race /home/antonisg/build/Nim/lib/system/alloc.nim:868:38 in rawAlloc__system_u6623
==================
Is there any chance these can be fixed?
Is there any chance these can be fixed?
Sure. PR please...? ;-)
If I read the "trade-offs" section correctly, it has basically no trade-offs compared to nim-taskpools (i.e. it is strictly an improvement over nim-taskpools)?
Yes, it's a strict improvement. The only tradeoff is that nim-taskpools has been used for a year in production by thousands of nodes running 24/7 with millions of dollars at stake, while Constantine's threadpool has only been running in CI.
Do you plan to (or have you already) package it as a stand-alone library?
Ideally once mono-repos are well supported. It's important for crypto libraries to have as few dependencies as possible, ideally zero, and also to ensure that everything works together. This becomes more and more critical as supply-chain attacks become more frequent.