I've written my first small module that uses locks. My simple test seems to work, but I have no experience with using locks in C/C++, so I thought I'd better ask anyway. Also, I've learned that bad code can still work "if you're lucky" (I used alloc instead of allocShared, and my first test worked anyway...).
I'm skipping the test code and the 'mylist' module to keep this short (and because my test worked). newList() from "mylist" is essentially a fixed-size array on the shared heap.
import locks
import mylist
import typetraits

var mappingLock {.global.}: Lock
initLock(mappingLock)

# All registered types until now
var types {.global, guard: mappingLock.} = newList[cstring](0)

# Creates a copy of a (temporary) cstring on the shared heap
proc copyCString(s: cstring, len: int): cstring =
  result = cast[cstring](allocShared(len+1))
  copyMem(cast[pointer](result), cast[pointer](s), len+1)

# Returns the ID of a type
proc typeID*(T: typedesc): uint16 =
  let typName: cstring = T.name
  withLock(mappingLock):
    let size = len(types)
    for i in 0..<size:
      # == compares cstrings like strcmp, not like pointers
      if typName == types[i]:
        return uint16(i)
    assert(size < high(uint16).int32)
    let newTypes = newList[cstring](size+1)
    for i in 0..<size:
      newTypes[i] = types[i]
    newTypes[size] = copyCString(typName, len(typName))
    # EDIT: need deallocShared(types) here ...
    types = newTypes
    return uint16(size)

# Returns the name of a type, from its ID
proc typeName*(id: uint16): cstring =
  withLock(mappingLock):
    let size = len(types)
    assert(id.int32 < size)
    return types[id.int32]
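For reference, my test boils down to something like the following (a trimmed sketch, not the actual test code; the names come from typetraits' T.name):

# trimmed sketch of the kind of test I ran (not the actual test code)
type Foo = object

let intId = typeID(int)
let fooId = typeID(Foo)
assert intId != fooId
assert typeID(int) == intId        # the same type always maps to the same ID
assert typeName(intId) == "int"    # name as returned by typetraits' T.name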
Finally, is there more documentation somewhere about "atomic" (or "volatile", as in Java) usage? The Nim manual shows a short example of "atomicRead(x)", but it uses a "memoryReadBarrier()" procedure which doesn't seem to exist (Google returned nothing on it; I assume this is "left as an exercise", but I don't know how it should be implemented, as I have not used concurrency in C/C++ yet).
I think "memoryReadBarrier" was just an example.
One way you can get more info is to do a git clone of the source code, and then grep for where memoryReadBarrier occurs, for instance:
$ grep memoryReadBarrier * -R
doc/manual/locking.txt: memoryReadBarrier()
Grepping for "atomicRead" shows a few more examples, over in the tests:
$ grep atomicRead * -R
doc/manual/locking.txt: template atomicRead(x): untyped =
doc/manual/locking.txt: echo atomicRead(atomicCounter)
tests/parallel/tguard1.nim:template atomicRead(L, x): untyped =
tests/parallel/tguard1.nim: echo(atomicRead(c.L, c.i))
tests/parallel/tguard2.nim:template atomicRead(L, x): untyped =
tests/parallel/tguard2.nim: echo(atomicRead(c.L, c.i))
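My guess is that you are expected to supply such a barrier yourself. Here is an untested sketch of how that could look, assuming a GCC/Clang backend (where __sync_synchronize() is a builtin full memory barrier); the names are only loosely based on the manual's example:

# untested sketch: supplying the missing barrier yourself
# (assumes a GCC/Clang backend; __sync_synchronize() is a compiler builtin)
import locks

proc memoryReadBarrier() {.importc: "__sync_synchronize", nodecl.}

var glock: Lock
var atomicCounter {.guard: glock.}: int
initLock(glock)

template atomicRead(x): untyped =
  {.locks: [glock].}:
    memoryReadBarrier()
    x

echo atomicRead(atomicCounter)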
Nim also seems to have some concept of volatile, if I grep for that:
david@david-pc:~/dev/learning/nim/github/Nim$ grep volatile * -R | grep -v 'c'
lib/pure/volatile.nim:template volatileStore*[T](dest: ptr T, val: T) =
lib/pure/volatile.nim: {.emit: ["*((", type(dest[]), " volatile*)(", dest, ")) = ", val, ";"].}
lib/system/threads.nim: var x {.volatile.}: pointer
lib/js/jsffi.nim: "volatile", "null", "true", "false"]
In particular, this builtin lib may be interesting to you:
lib/pure/volatile.nim
And here's one main lib I can find at the moment that deals with atomics:
lib/system/atomics.nim
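A minimal sketch of using the volatile module, based on the two procs it exports (volatileLoad and volatileStore):

# minimal sketch using lib/pure/volatile.nim
import volatile

var flag: int
volatileStore(addr flag, 1)     # a store the C compiler is not allowed to elide
echo volatileLoad(addr flag)    # forced re-read from memory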
I will try to answer your questions and/or provide some links on these topics (I can't vouch for the quality of these links, I know almost nothing about the x86 architecture, and I am still a Nim noob). ARM and various DSPs are my "home". From my point of view, it's better to understand the basics before starting to use the high-level concurrency stuff. In Java, atomic and volatile are two different things ( https://brooker.co.za/blog/2013/01/06/volatile.html ).
Atomics in general are used to access memory atomically, so the access cannot be interrupted. If you have, for instance, a 16-bit CPU with a 16-bit memory bus, a write or read of a 32-bit word will never be atomic. So with atomics you can do lockless stuff, meaning without synchronisation. This is explained here https://bartoszmilewski.com/2008/12/01/c-atomics-and-memory-ordering or here https://herbsutter.com/2013/02/11/atomic-weapons-the-c-memory-model-and-modern-hardware
With atomics you can do some kind of "black magic", a lockless hashtable for instance: http://preshing.com/20130605/the-worlds-simplest-lock-free-hash-table/
The other part is about synchronized access to memory regions (mutexes and semaphores). Here Nim offers the low-level stuff: Locks and Conditions with wait() and signal() (like the C++ API). I posted a code snippet at https://forum.nim-lang.org/t/3218 that you can experiment with. You may also take a look at some Nim modules (threadpool, for instance).
Nim also offers high-level concurrency stuff (threadpool with spawn and ^, guards, and channels) for safer access, but beware of the thread-local GC: just pass pointers between threads so the GC isn't "playing" with them (see the small channel example below).
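For instance, a minimal channel sketch (my own example, compile with --threads:on); only a plain int crosses the channel, so the thread-local GC is not involved:

# minimal sketch: a worker thread sending a result over a Channel
# (my own example; compile with --threads:on)
var chan: Channel[int]
var worker: Thread[void]

proc produce() {.thread.} =
  chan.send(42)              # the value is copied into the channel

open(chan)
createThread(worker, produce)
echo chan.recv()             # blocks until the worker has sent something
joinThread(worker)
close(chan)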
I hope I could help you a little bit.
@monster,
Assuming your "myList" variable is set up correctly, and those are the only two functions that access the data, your code will not have data races. :-)
Note that that is a big assumption about "myList", because the thread-local GC can cause all kinds of problems here, as @mikra pointed out. You seem to be aware of this, because you have it set up as a global variable and you stated that you are allocating on a "shared heap", but I just wanted to point out the caveats to be as clear as possible.
Now for the unsolicited code critique section.... Feel free to ignore this random guy on the internet :-P
Unless the "myList" array is very small, Your typeID proc is badly inefficient.
I would never put a lock around a loop like that if I can help it, let alone two loops! You generally want to keep the code inside your locks as small and as simple as possible. Loops and memory allocation are two of the worst things you can do while holding a lock.
You are also allocating and copying that whole array inside the lock, which is very inefficient. It may be worth trying to figure out how to modify the array in place, or possibly to use a more complex data structure that requires less looping, or better yet no locking. (See the rough sketch below for the general shape I mean.)
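Here is a hypothetical sketch (my names, not your data structure) of that general shape: do the expensive work outside the lock, and hold the lock only for the small shared-state update.

# hypothetical sketch: expensive loop outside the lock,
# lock held only for the single shared write
import locks

var sumLock: Lock
var total {.guard: sumLock.}: int
initLock(sumLock)

proc addResults(data: seq[int]) =
  var localSum = 0
  for x in data:            # loop runs with no lock held
    localSum += x
  withLock(sumLock):        # critical section is one addition
    total += localSum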
As @mikra hinted at, the low-level concurrency stuff in Nim is mostly just a wrapper around the C APIs. So understanding how C works in this regard is 95% of understanding Nim.
The atomic operations in Nim are documented in the system module at https://nim-lang.org/docs/system.html#atomicInc,int,int
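For example (my own snippet; with --threads:on these map to real atomic read-modify-write operations):

# quick example of the system atomics documented above
var counter = 0
atomicInc(counter)          # atomic increment, returns the new value
echo atomicDec(counter)     # atomic decrement; prints 0 here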
There is also the volatile.nim library as pointed out by @wizzardx. It's not in the docs, but it is really just a very thin wrapper around the C "volatile": https://github.com/nim-lang/Nim/blob/master/lib/pure/volatile.nim
Here is a good article about the C version of Volatile: https://barrgroup.com/Embedded-Systems/How-To/C-Volatile-Keyword
Volatile in C is very similar to Java, but not exactly the same. Be aware of that. They both prevent compiler optimizations, but the Java version has much stricter rules around volatile.
P.S. Hi @mikra! I'm jealous of all the ARM programming you are doing! It sounds like very fun stuff!
First of all, thanks for the code review! I'm just starting in Nim, and I have no one else to ask. Secondly, I started coding 33 years ago (EDIT: I wrote my first program at 12), so I'm not generally a programming newbie. Thirdly, I'm the Networking/Server/DB/Load-Balancing/Profiling/Optimizing/Multi-Threading "specialist" in my team (EDIT: I don't claim to be great at all of that; just better than my teammates at it); basically the "backend" guy (partly because I'm not a native German speaker, and the literature on the "domain knowledge" in German is beyond me, so the "other devs" mostly take care of the "business logic"). I do use volatile/atomics regularly. I try to avoid locks like the plague: firstly, because you can't get a deadlock without locks, and secondly, because volatile/atomics are normally faster. What I was trying to say is that I'm new to native concurrency. I assume that atomics and locks will work about the same as in Java, but volatile might not.
Now for the specific comments:
@wizzardx Thanks. After countless years of relying on an IDE (Eclipse, the slowest thing made since 3 1/2 inch floppies), I feel somewhat helpless without one. But my grep-fu is strong, so I just have to get into the habit of using it that way.
@mikra Thanks for the overview. I had spotted the post about locks and condition variables, but I wasn't sure I got it right. For example, is it OK to directly initialize a global variable guarded by a lock, as I did, or do I have to leave it uninitialized (nil) and take the lock just to initialize it? I wouldn't want to implement a lock-less hashtable myself, but I got used to using them, so it's good to see how it's done.
@rayman22201 I don't use any "refs" in myList(). In fact, it doesn't even accept refs as a type parameter, but I had to ask the forum to find out how that's done. About "Your typeID proc is badly inefficient": I know! But, firstly, I really didn't want to implement my own hashtable on the shared heap at this stage (I assume the Nim library hashtables only work on the local heap?); secondly, I would have used atomics instead of a lock anyway, if I had known how! Thirdly, my assumption is that this code will only be called when initializing the modules; afterward the app will just use the IDs, so simplicity trumps efficiency in this case (or: learn to walk before you try to run).
so I'm not generally a programming newbie.
I don't think anyone meant to imply that you were a newbie! I apologize if I came off condescending.
I specialized in concurrency and multi-threading at university, and you would be surprised at how many people would show me code in their hot path that had lots of loops and allocations, and then wonder why their code was still slow even though they "parallelized" it. These were fairly competent programmers otherwise.
For example, is it OK to directly initialize a global variable guarded with a lock, as I did, or do I have to leave it uninitialized(nil), and get a lock just to initialize it?
That is an interesting question. This is what the nim manual says about it: https://nim-lang.org/docs/manual.html#guards-and-locks
"Top level accesses to gdata are always allowed so that it can be initialized conveniently. It is assumed (but not enforced) that every top level statement is executed before any concurrent action happens."
I think that means that the answer is no, you don't have to take a lock to initialize the global variable. YMMV.
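In other words, the manual's pattern is basically what you already did; a minimal version of it (my own sketch) looks like:

# top-level initialization of a guarded global: allowed without taking
# the lock, per the manual paragraph quoted above (my own minimal sketch)
import locks

var gLock: Lock
var gdata {.guard: gLock.}: int
initLock(gLock)
gdata = 123        # fine: top-level access needs no lock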
I don't use any "refs" in myList().
Perfect. Based on your post I was pretty sure you did the right thing; I just wanted to be clear and cover all the bases.
my assumption is that this code will only be called when initializing the modules
If it's not in the hot path, then it's good enough :-P
I assume the Nim library hashtables only work on local heap
Yeah, that's a safe assumption. :-( Nim doesn't have any good lock-free or concurrent data structures built in atm. Thankfully Nim is great at interfacing with C lol.
my grep-fu is strong
This should hopefully become less of an issue as Nim matures. All aboard the better and moar documentation train!
Ah, also a reminder - try nimgrep in addition to regular grep:
https://nim-lang.org/docs/nimgrep.html
Nim's style insensitivity means that you may miss a few useful things if you just use the regular grep.
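For example, something like this (check nimgrep --help for the exact options; I believe --recursive and --ignoreStyle are the relevant ones):
$ nimgrep --recursive --ignoreStyle atomicread lib/ tests/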
@rayman22201 @wizzardx I had a shared-memory issue with OpenMP at one point in Arraymancer (solved since then).
I just checked volatile and, according to this Intel post, there is no guarantee of a memory fence; it just so happens that for x86 and Itanium volatile implies memory fences.
Consequently, Intel removed all volatile variables from their TBB (Intel Threading Building Blocks, their task parallelism library).
So volatile in C is good for interrupts/hardware communication, where memory can be written suddenly by the hardware, but it is not suitable for multithreading.
@Monster sorry for my late response; I see rayman22201 already answered your question. I think it's a good habit to declare the globals immediately. My code was just to experiment with Locks and Conditions; I personally avoid globals.
@rayman22201 It was fun stuff! Now I have to work with databases, but also with locking stuff and dealing with deadlocks :-)
@mratsim thank you for your interesting additions
@mratsim Does that mean I shouldn't use https://github.com/nim-lang/Nim/blob/master/lib/pure/volatile.nim ?
@mikra How would you "share" something between threads without globals? Globals are generally considered "evil", but at the moment I don't know of a better way to do this in Nim... Create your data in the main thread before creating the "worker threads", and pass a pointer to your data at thread creation, maybe?
@mikra I forgot to ask; you said "If you have, for instance, a 16-bit CPU with a 16-bit memory bus, a write or read of a 32-bit word will never be atomic." I assume that also applies to volatile(?). I'm not going to run my code on a 16-bit CPU, but I might run it on both 64-bit CPUs and 32-bit CPUs (mobile, assuming I'm done before all mobile CPUs move to 64-bit...). So I could use int/ptr, integers of 32 bits or less, and float32, but using 64-bit integers/float64 as atomic/volatile would be unsafe in Nim? I have to ask, because in Java a volatile/atomic long/double is always accessed atomically, even on 32 bits!
@mratsim, That Intel article is really interesting. I also don't use volatile regularly, so I haven't been affected by this, but it is a really interesting thing to know... Another case of the devil is in the details, and how both the compiler and the platform affect things.
@monster, I don't have a lot of experience with mobile, but I think using an atomic long/double is the only way to truly guarantee that you get atomic writes on a 32-bit platform. It doesn't look like Nim has strong support for this kind of thing beyond ints, so you probably have to drop down to interfacing with C and use something like the GCC atomic intrinsics:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
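That said, the lib/system/atomics.nim module that @wizzardx mentioned already wraps those same builtins (atomicLoadN, atomicStoreN and friends), so something like this might work out of the box; an untested sketch, assuming a GCC/Clang backend and --threads:on:

# untested sketch: 64-bit atomic load/store via the __atomic wrappers in
# lib/system/atomics.nim (assumes a GCC/Clang backend and --threads:on)
var x: int64
atomicStoreN(addr x, 42'i64, ATOMIC_SEQ_CST)
echo atomicLoadN(addr x, ATOMIC_SEQ_CST)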
@Monster to avoid globals, simply go for a root structure/object and pass pointers to it around (a rough sketch is below).
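A rough sketch of that idea (my own hypothetical names; compile with --threads:on):

# rough sketch: a root object created in the main thread, and a raw
# pointer to it handed to the worker at thread creation
import locks

type Root = object
  lock: Lock
  counter: int

proc worker(r: ptr Root) {.thread.} =
  withLock(r.lock):
    inc(r.counter)

var root: Root
initLock(root.lock)
var th: Thread[ptr Root]
createThread(th, worker, addr root)
joinThread(th)
echo root.counter      # 1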
To your second question:
How atomics are implemented depends completely on your architecture. That's the reason why you have the C standards; each program should behave correctly regardless of whether it's compiled for PowerPC, an ADSP-XXXXX or MIPS, for instance. If you have an MCU (for instance the 6502; I recommend starting with that if you want to do some basic research) that writes to memory one statement at a time, 8 bits can be written per statement, so 16 bits need at least 2 asm statements. You are single-core there, but those 2 statements could be interrupted. And if you have a DMA controller on the same bus, things can be even worse, because a single write could also be delayed. That's the reason why some vendors have special atomic instructions, TAS (test-and-set) or CAS (compare-and-set), which behave atomically on the data bus.
Due to the physical fact that external memory is clocked very slowly, today there are memory caches. With multicore and caches, things get complicated. Each core has its own cache but needs to read/write external memory. The accesses need to be synchronized/serialized, but I don't know how that's done on the x86 architecture. Most of the technology is not open source (hidden, undocumented instructions are possible). Also, on modern SoCs (GPU or baseband chips) there is a proprietary RTOS running behind the scenes. For at least two physical chips the VHDL is open: the P8X32A from Parallax (crazy, non-mainstream) and the RISC-V core ( https://www.sifive.com/products/hifive1 ). The TI Sitara family is not open, but all datasheets are public (BeagleBone, for instance). They are Linux-ready, and Nim should also run on them (not tested yet).
It's always the same: a tradeoff between IO/bandwidth and CPU, and dealing with contention, regardless of whether your system is single-core, multicore, a database, or something distributed.
And even though you have multicore, if your architectural design is a little lacking, it's quite possible that you go parallel and your code ends up much slower (or no faster) than the single-threaded version.
So if you have a 32-bit arch and you want to do atomics on 64 bits, you have to look into your C API. If it's there, you are lucky; if not, you have to implement it on your own (with locks).
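To make the TAS/CAS point above concrete: if I remember correctly, Nim's system module exposes a cas proc, so a classic retry loop looks roughly like this (my own sketch, compile with --threads:on):

# sketch of a CAS retry loop using system's `cas`
# (my own example; compile with --threads:on)
var counter: int

proc lockFreeInc(p: ptr int) =
  while true:
    let old = p[]                 # read the current value
    if cas(p, old, old + 1):      # succeeds only if no other thread raced us
      break

lockFreeInc(addr counter)
echo counter                      # 1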
@mratsim, @rayman, OK, so for code that is portable to both 32 and 64 bits, it's better not to use volatile values bigger than 32 bits.
@mikra I'll have to Google some of the things you talked about when I get home, but I think I get the idea.
@Araq I was hoping there was already something like that. That is going to save me a lot of work. Thanks!