Hi.
As I mentioned in a previous post, I would have liked to have a "relaxed" fetch_and_add that also works on Windows. Such a function only exists starting with Windows 8 (interlockedExchangeAddNoFence64), which I thought was too limiting; there are still people out there with Windows 7 that I might want to "target" (anything below that is uninteresting). It seems that to be able to use interlockedExchangeAddNoFence64() "conditionally" on Windows 8+, I would either have to build a "Windows 8" version and a "Windows 7" version, or put the code in a dynamically loaded DLL. This felt like overkill for a single function. Also, it would have meant using interlockedExchangeAdd64() (with fence) instead on Windows 7, which was also stupid, since it's a CPU feature, not an OS one.
So I searched and found out that it's basically trivial to implement on Intel, as described on Wikipedia. All you need is a "lock; xaddl %0, %1". So I set out to try to do the same in Nim, but it failed to compile. So I tried the Nim example for ASM usage, and it also failed to compile:
{.push stackTrace:off.}
proc addInt(a, b: int): int =
  # a in eax, and b in edx
  asm """
      mov eax, `a`
      add eax, `b`
      jno theEnd
      call `raiseOverflow`
    theEnd:
  """
{.pop.}
Which produces:
cl.exe /c /nologo /Z7 /IC:\nim-0.17.2\lib /Fo...atomiks.obj ...atomiks.c
geth_atomiks.c
...atomiks.c(44): error C4235: nonstandard extension used: '__asm' keyword not supported on this architecture
...atomiks.c(45): error C2065: 'mov': undeclared identifier
...atomiks.c(45): error C2146: syntax error: missing ';' before identifier 'eax'
Searching for "error C4235: nonstandard extension used: '__asm' keyword not supported on this architecture", I read in several places things like "Inline asm on 64bit development is not a supported scenario ...".
So, basically, since x86 apps are dying out, and (almost) nobody owns a 32-bit Windows anymore, ASM on Windows is dead? Or am I missing something?
By now, I think that giving up on Windows 7 is probably the rational thing to do; anything else is overkill. :(
Visual Studio support for assembly has always been tricky (if I recall correctly, it took a while for inline assembly to be supported).
Have you tried using Gcc/Mingw? That's what most people use.
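For reference, here is a minimal sketch of what the "lock; xadd" approach looks like with GCC-style inline assembly (accepted by GCC, MinGW and clang, but not by MSVC). The function names `xadd64` and `xadd64_demo` are made up for illustration, and the 64-bit `xaddq` form is used instead of the 32-bit `xaddl` quoted above. On non-x86-64 targets the sketch falls back to the compiler's `__atomic_fetch_add` builtin. Note that on x86 a lock-prefixed instruction is always a full barrier at the CPU level, so "relaxed" here only means that the compiler is not told to treat the call as a barrier for other memory accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: atomically adds `val` to `*p` and returns the
   value that was stored in `*p` before the addition. */
static inline int64_t xadd64(volatile int64_t *p, int64_t val) {
#if defined(__x86_64__)
    /* lock xaddq swaps the register with the memory operand and stores
       the sum in memory, so `val` ends up holding the old contents.
       There is no "memory" clobber: only `*p` is declared as modified. */
    __asm__ __volatile__("lock; xaddq %0, %1"
                         : "+r"(val), "+m"(*p)
                         :
                         : "cc");
    return val;
#else
    /* Assumption: on other targets, use the GCC/clang builtin instead. */
    return __atomic_fetch_add(p, val, __ATOMIC_RELAXED);
#endif
}

/* Usage: increment a counter and check the returned previous value. */
static inline void xadd64_demo(void) {
    volatile int64_t counter = 10;
    int64_t previous = xadd64(&counter, 5);
    assert(previous == 10);
    assert(counter == 15);
}
```

The same source builds for 32-bit x86 only through the fallback branch, since `xaddq` needs a 64-bit register.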
@cheatfate So, I could still code something like interlockedExchangeAddNoFence64() in ASM, and use it both in 32-bit and 64-bit with MSVC, but I have to put it in a separate ASM file? And maybe explicitly "compile" (not sure if this is the right term) it with some ASM tool? I currently have no clue how to do that, but at least that sounds like a workable solution. Thanks.
@Varriount AFAIK, UE4 does not support gcc at all. On Mac and Linux, you should use clang. On Windows, only MSVC is "officially" supported. I think some people have tried to use clang on Windows, and it either does not work, or is so much trouble as to not be worth the hassle.
@monster Have you tried running the compiled exe on a Windows 7 machine with the interlockedExchangeAddNoFence64 function? As far as I understand, this function is a compiler intrinsic, meaning that the compiler internally knows how to spit out assembly instructions for every place that function is called. So while interlockedExchangeAddNoFence64 indeed first started appearing in the Windows 8 SDK it does not mean it won't work on earlier Windows Systems.
Instead interlockedExchangeAddNoFence64 will not work if you compile with an older compiler which did not know this intrinsic function.
ref.: InterlockedExchangeAddNoFence64 function
This function is implemented using a compiler intrinsic where possible. For more information, see the WinBase.h header file and _InterlockedExchangeAdd64_nf.
And from Compiler Intrinsics:
Most functions are contained in libraries, but some functions are built in (that is, intrinsic) to the compiler. These are referred to as intrinsic functions or intrinsics. If a function is an intrinsic, the code for that function is usually inserted inline, avoiding the overhead of a function call and allowing highly efficient machine instructions to be emitted for that function.
One additional hint that InterlockedExchangeAddNoFence64 is not depending on a library (making it OS-agnostic), is that the MSDN page does not list the library file in which this function is implemented, in contrast to the FormatMessage function which lives in Kernel32.dll according to Microsoft (listed in the Requirements table on the bottom of the MSDN page)
BTW, our beloved gcc does the same thing; the intrinsics just have different names. Atomic operations usually start with __atomic in gcc.
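As a small illustration of the gcc side, a relaxed fetch-and-add can be written with the `__atomic_fetch_add` builtin and the `__ATOMIC_RELAXED` memory order. This is only a sketch: the wrapper names `fetch_add_relaxed64` and `fetch_add_relaxed64_demo` are hypothetical, and it assumes a GCC or clang compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper: atomically adds `val` to `*p` and returns the
   previous value. __ATOMIC_RELAXED requests atomicity only, with no
   ordering guarantees for the surrounding memory accesses. */
static inline int64_t fetch_add_relaxed64(volatile int64_t *p, int64_t val) {
    return __atomic_fetch_add(p, val, __ATOMIC_RELAXED);
}

/* Usage: bump a counter and check the returned previous value. */
static inline void fetch_add_relaxed64_demo(void) {
    volatile int64_t counter = 40;
    int64_t previous = fetch_add_relaxed64(&counter, 2);
    assert(previous == 40);
    assert(counter == 42);
}
```

Because the builtin is expanded by the compiler, no OS library is involved, which is the same point made above about the MSVC intrinsics.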
@couven92 At first, I thought the easiest was to "give up" on Windows 7, but when I actually ran the test, the program compiled, but failed to link, with this message:
atomiks.obj : error LNK2019: unresolved external symbol _InterlockedExchangeAddNoFence referenced in function atomicIncRelaxed_gujWN15RsV5Ef6kAMSLe8w
I'm assuming I'm declaring it correctly, otherwise it wouldn't even compile. But I have no clue how to deal with this error (Google was no help), so I gave up entirely on "NoFence". :(
FYI, this is how I declare it, before using it:
const
  hasThreadSupport = compileOption("threads") and not defined(nimscript)
when declared(atomicLoadN):
  # USE PTHREADS!
  discard
elif defined(vcc) and hasThreadSupport:
  when defined(cpp):
    # ...
    proc interlockedExchangeAddNoFence64(p: pointer; val: int64): int64
      {.importcpp: "_InterlockedExchangeAddNoFence64(static_cast<__int64 volatile *>(#), #)", header: "<windows.h>".}
    proc interlockedExchangeAddNoFence32(p: pointer; val: int32): int32
      {.importcpp: "_InterlockedExchangeAddNoFence(static_cast<long volatile *>(#), #)", header: "<windows.h>".}
  else:
    # ...
    proc interlockedExchangeAddNoFence64(p: pointer, val: int64): int64
      {.importc: "_InterlockedExchangeAddNoFence64", header: "<windows.h>".}
    proc interlockedExchangeAddNoFence32(p: pointer, val: int32): int32
      {.importc: "_InterlockedExchangeAddNoFence", header: "<windows.h>".}
I interpret this in the MS documentation:
Header Winnt.h (include Windows.h)
As saying it's defined in winnt.h, but you should include Windows.h instead, which is what I did.
If it's a "compiler intrinsic", then why does the name still even exist at link time? Should it not have been replaced with the ASM code at compile time?
@monster Hmmm... I assume the generated intrinsic ASM is bundled in one of the Windows SDK libs then... But I have no idea which one :P
That means you'd have to add some .lib file to the linking options... That still would make your application OS-agnostic.
Have you tried using the apparently truly intrinsic InterlockedExchangeAdd64_nf declared in <intrin.h>? Although the MSDN page states that the NoFence variants only apply to ARM architectures and that you should use the regular InterlockedExchangeAdd64 instead.
@couven92 Well, since it said it was only for ARM, I hadn't tried those. So now I did, and the same result comes out:
atomiks.obj : error LNK2019: unresolved external symbol InterlockedExchangeAdd64_nf
(with or without a "_" as prefix).
While I'd like to know what is going on, it is basically an "optimisation" problem, because the code will still work "with" a fence. So I'm (for now) leaving this as an "unsolved mystery", and moving on to add cluster support to my code, which is more important.