I can't find a compare-and-swap intrinsic in Nimrod, only atomicInc and atomicDec, but I need the equivalent of:
msvc++: PVOID __cdecl InterlockedCompareExchangePointer(
_Inout_ PVOID volatile *Destination,
_In_ PVOID Exchange,
_In_ PVOID Comparand
);
GCC: type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)
Does one already exist, or can it be implemented via importc?
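For reference, here is a minimal C sketch of what a wrapper over the two intrinsics above could look like. The names cas_ptr and cas_ptr_demo are hypothetical and not part of Nimrod or either compiler. Note that the two intrinsics take their arguments in a different order, which is easy to get wrong when importing them.

```c
#include <assert.h>
#include <stddef.h>

#if defined(_MSC_VER)
#  include <windows.h>
/* MSVC: argument order is (destination, exchange, comparand). */
static void *cas_ptr(void *volatile *dest, void *expected, void *desired) {
    return InterlockedCompareExchangePointer(dest, desired, expected);
}
#else
/* GCC-compatible compilers: argument order is (ptr, oldval, newval). */
static void *cas_ptr(void *volatile *dest, void *expected, void *desired) {
    return __sync_val_compare_and_swap(dest, expected, desired);
}
#endif

/* Hypothetical self-check: both variants return the value seen in *dest. */
static int cas_ptr_demo(void) {
    int a = 1, b = 2;
    void *volatile slot = &a;

    /* Succeeds: slot held &a, so it is replaced by &b. */
    void *seen = cas_ptr(&slot, &a, &b);
    assert(seen == (void *)&a && slot == (void *)&b);

    /* Fails: slot holds &b, not &a, so it is left unchanged. */
    seen = cas_ptr(&slot, &a, NULL);
    assert(seen == (void *)&b && slot == (void *)&b);
    return 1;
}
```

Calling cas_ptr_demo() from a test program exercises one successful and one failed swap.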
OK, I'll make a pull request soon, but should I write a separate "unit test" for compilability? The test requires certain options to be passed to the compiler (e.g. --threads, -t:-march=i686 for mingw32...) for atomics to work.
Implemented, tested manually for compilability, and analyzed the disassembly (there's no inline asm support for x64 Visual C++, so I used intrinsics only). Seems to work on gcc (ubuntu64), mingw32 (win7), and MS Visual C++ 2010 x64.
If someone is interested, here's the source: https://github.com/exhu/nimrod-misc/blob/master/atomicas.nim
BTW, a CAS operation is safe enough to operate on generic references, so is there a way to get rid of the ugly cast here?
proc compareAndSwap(mem: ptr pointer, expected: pointer, newValue: pointer): pointer {.nodecl,
importc: "__sync_val_compare_and_swap".}
var r: ref int
discard compareAndSwap(cast[ptr pointer](addr(r)), nil, nil)
proc compareAndSwap[T: ptr|ref|pointer](mem: var T, expected, newValue: T): T {.nodecl, importc: "__sync_val_compare_and_swap".}
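At the C level the GCC built-in is overloaded on the pointee type, so a typed pointer can be passed without any cast, which is what a generic signature like the one above relies on. A small sketch in C, using a hypothetical Node type and push function, of the usual CAS retry loop on a typed pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for a singly linked stack. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Push n onto the stack: retry until the head we read is still the head
 * at the moment of the swap. No cast is needed for Node*. */
static void push(Node *volatile *head, Node *n) {
    Node *old;
    do {
        old = *head;
        n->next = old;
    } while (__sync_val_compare_and_swap(head, old, n) != old);
}

/* Hypothetical self-check: two pushes leave the second node on top. */
static int push_demo(void) {
    Node *volatile head = NULL;
    Node a = { 1, NULL }, b = { 2, NULL };
    push(&head, &a);
    push(&head, &b);
    assert(head == &b && head->next == &a && a.next == NULL);
    return head->value;
}
```

Only push is shown; a matching pop needs extra care (the ABA problem) and is left out of this sketch.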
MFlamer said: I have expanded the atomics module so it now includes all the gcc atomics built-ins.
I would caution against making the GCC atomic built-ins a general module. There are a number of problems with them. They work fine if you are aware of these issues and avoid them for a specific project, or if you don't need the problematic parts.
First, the GCC atomic built-ins are modeled after the Itanium architecture, mirroring Intel's implementation and the Itanium memory model. This may not be a good fit for all applications.
Second, __sync_synchronize() is broken on older versions of GCC for Intel processors; it generates no code and simply becomes a compiler barrier, so it won't work as advertised. Unless you can rule out ever encountering such a compiler, you'll need to write the barrier code for this case by hand (this includes systems where gcc is still stuck at 4.2.1 because of the GPL2->GPL3 change, such as OS X and FreeBSD, unless you can guarantee that you will be compiling with clang).
Third, the API only knows a full memory barrier (the aforementioned __sync_synchronize()). Since on many common processor architectures (in particular x86, x86_64, and SPARCs in TSO mode) many memory barriers are no-ops, this can incur tremendous overhead. Even other architectures often have cheaper memory barriers if you don't need a full barrier. For the common case of x86/x86_64 architectures, it may be a good idea to provide optimized implementations, at least in the form of a compiler fence for LoadLoad and StoreStore barriers.
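A rough sketch of that last suggestion in C, assuming GCC-style inline asm and ordinary (non-streaming) memory accesses. All macro and function names here are hypothetical. On x86/x86_64 the LoadLoad and StoreStore barriers reduce to a compiler fence, while other targets fall back to the full barrier:

```c
#include <assert.h>

/* Compiler-only fence: emits no instruction, but stops the compiler
 * from moving memory accesses across it. */
#define COMPILER_FENCE() __asm__ __volatile__("" ::: "memory")

#if defined(__i386__) || defined(__x86_64__)
/* Assumption: normal loads/stores on x86 are not reordered load-load
 * or store-store by the hardware, so a compiler fence is enough. */
#  define LOADLOAD_BARRIER()   COMPILER_FENCE()
#  define STORESTORE_BARRIER() COMPILER_FENCE()
#else
/* Conservative fallback: full barrier. */
#  define LOADLOAD_BARRIER()   __sync_synchronize()
#  define STORESTORE_BARRIER() __sync_synchronize()
#endif

static int fenced_payload;
static volatile int fenced_ready;

/* Writer: the payload must become visible before the flag. */
static void publish_fenced(int v) {
    fenced_payload = v;
    STORESTORE_BARRIER();
    fenced_ready = 1;
}

/* Reader: returns -1 until the flag is set, then the payload. */
static int consume_fenced(void) {
    if (!fenced_ready)
        return -1;
    LOADLOAD_BARRIER();
    return fenced_payload;
}
```

A StoreLoad barrier is deliberately not covered: on x86 that one does need a real fence instruction.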
A problem with atomic operations is that there really isn't a great portable solution (at least until the C++11 and C1X standards become widely implemented). libatomic_ops is probably the most flexible existing implementation (it's used in the Boehm GC internally), but requires building and potentially linking another package and only provides portable atomic operations on full machine words (double words, where the architecture allows for it).
MFlamer wrote: Do these suffer from the same problems?
No, but they simply don't exist for older versions of gcc. They were introduced in gcc 4.7.0 as the underlying primitives upon which C++11 support for atomics was built.
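Assuming "these" refers to the __atomic_* built-ins that GCC 4.7 introduced, the following C sketch shows how they take an explicit memory order per operation instead of implying a full barrier everywhere. The function and variable names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* CAS with explicit ordering: acquire-release on success,
 * acquire on failure. Returns true if the swap happened. */
static bool cas_int(int *p, int expected, int desired) {
    return __atomic_compare_exchange_n(p, &expected, desired,
                                       false, /* strong variant */
                                       __ATOMIC_ACQ_REL,
                                       __ATOMIC_ACQUIRE);
}

static int rel_payload;
static int rel_ready;

/* Writer: a release store orders the payload write before the flag. */
static void publish_rel(int v) {
    rel_payload = v;
    __atomic_store_n(&rel_ready, 1, __ATOMIC_RELEASE);
}

/* Reader: an acquire load pairs with the release store above. */
static int consume_acq(void) {
    if (!__atomic_load_n(&rel_ready, __ATOMIC_ACQUIRE))
        return -1;
    return rel_payload;
}
```

This needs gcc 4.7 or later (or a compatible compiler), so code that must build with older toolchains still needs a __sync-based or hand-written fallback.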