I'm making a proof-of-work (think hashcash) library but have run into two bugs when running it under arc/orc. The first shows up when using a normal hashing algorithm. I'm using a libsodium wrapper, but the bug is also triggered if you replace crypto_generichash() with SHA1 from the standard library or this SHA2 library. It runs fine with the default refc GC, but when I enable arc or orc only ~50% of the CPU goes to my program instead of 100% with refc. Something weird must be happening under the hood: htop does report 100% CPU usage, but only 50% of the load bars are green while the other half is red. This means 50% of CPU time is spent in user code and 50% in the kernel. The refc version takes 2.1 seconds to complete while arc/orc takes 19.1 seconds on my Haswell CPU.
The second bug happens when using libsodium's crypto_pwhash(). As noted in the code comment, add two Fs to target to speed it up, uncomment the crypto_pwhash() line and comment out the crypto_generichash() line above it. With both refc and arc it works fine (takes 8.7 seconds), but with orc it runs to completion and then I get
SIGSEGV: Illegal storage access. (Attempt to read from nil?) Error: execution of an external program failed: './programName '
at the end most of the time. Only occasionally does a run with orc finish with no error. Debugging advice is appreciated. I'm using the latest git devel (git hash: c38487aa225674f67659c28ce171fd93e2231ad1). -d:release vs -d:danger makes no difference. Here is my code:
import threadpool, strutils, cpuinfo
import libsodium/sodium

let
  payload = "just testing?0!7801"
  initialHashHex = crypto_generichash(payload, 16)
  target = 0xFFFFFFFFFF'u64 # Add two Fs when using crypto_pwhash()

func toString(binary: openArray[byte]): string =
  result = newStringOfCap(binary.len)
  for bin in binary:
    result.add(bin.char)

func pow(target: uint64, initialHashHex: string, nonce: uint64): (uint64, string) {.thread.} =
  var
    trialValue = high(uint64)
    nonceThread = nonce
    initialHashBin: array[16, byte]
    resultHash = newStringOfCap(64)
  for index, value in initialHashHex[0..15]:
    initialHashBin[index] = value.byte
  while trialValue > target:
    resultHash = crypto_generichash(crypto_generichash($nonceThread, 32, initialHashHex).bin2hex, 32, initialHashHex).bin2hex
    # resultHash = (crypto_pwhash($nonceThread, initialHashBin, 32, phaArgon2id13, 2, 1048576 * 4)).toString.bin2hex
    trialValue = fromHex[uint64](resultHash[0..15])
    inc(nonceThread)
  return (nonceThread-1, resultHash)

proc multithread(totalThreads: range[1..256], thread: range[1..256], initialHashHex: string, target: uint64): uint64 {.thread.} =
  let nonce = high(uint64) div uint64(totalThreads) * uint64(thread-1)
  echo "thread: ", thread, " initialNonce: ", nonce.toHex # Debugging
  let (finalNonce, finalHash) = pow(target, initialHashHex, nonce)
  echo "thread: ", thread, " finalNonce: ", finalNonce.toHex, " finalHash: ", finalHash # Debugging
  result = finalNonce

proc callMulti(threads: range[0..256], initialHashHex: string, target: uint64) =
  var totalThreads = threads
  if totalThreads == 0:
    if countProcessors() == 0:
      totalThreads = 1
    elif countProcessors() > 256:
      totalThreads = 256
    else:
      totalThreads = countProcessors()
  var responses = newSeq[FlowVarBase](totalThreads)
  for thread in 1..totalThreads:
    responses[thread-1] = spawn multithread(totalThreads, thread, initialHashHex, target)
  var index = blockUntilAny(responses)

let threads = 4
callMulti(threads, initialHashHex, target)
Also, I could use some help returning the result from whichever pow() thread completes first and terminating the rest. Right now I'm just printing the results. Would channels be useful here?
Regarding Weave vs threadpool, Weave is indeed higher performance (https://github.com/mratsim/weave), but at a high level it's a load balancer/scheduler plus a threadpool.
Proof-of-work doesn't really need a load balancer: you just use all threads until you stop, and no synchronization is required aside from start/stop, so it's already the ideal scenario for a plain threadpool. So don't expect a big improvement from Weave over the plain threadpool here.
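To make the "start/stop is the only synchronization" point concrete, here is a minimal sketch of one way to return the first result and stop the other workers. It is not the code from this thread: it uses raw threads and std/atomics instead of threadpool or channels, toyHash is a made-up stand-in for the real hash, and numThreads, target, found and winner are names assumed for the illustration. Compile with --threads:on on Nim versions where that is not the default.

```nim
import std/atomics

const
  numThreads = 4
  target = 0x0000FFFFFFFFFFFF'u64  # toy difficulty, roughly 1 hash in 65536 passes

var
  found: Atomic[bool]     # stop flag shared by every worker
  winner: Atomic[uint64]  # nonce published by the first worker to succeed

func toyHash(x: uint64): uint64 =
  ## Stand-in mixer (splitmix64 finalizer); NOT a cryptographic hash.
  var z = x + 0x9E3779B97F4A7C15'u64
  z = (z xor (z shr 30)) * 0xBF58476D1CE4E5B9'u64
  z = (z xor (z shr 27)) * 0x94D049BB133111EB'u64
  result = z xor (z shr 31)

proc worker(startNonce: uint64) {.thread.} =
  var nonce = startNonce
  # Keep hashing until some worker (possibly this one) raises the flag.
  while not found.load(moRelaxed):
    if toyHash(nonce) <= target:
      var expected = false
      # Only the first finisher wins the exchange and publishes its nonce.
      if found.compareExchange(expected, true):
        winner.store(nonce)
      return
    inc nonce

proc mine(): uint64 =
  found.store(false)
  var threads: array[numThreads, Thread[uint64]]
  for i in 0 ..< numThreads:
    # Same partitioning idea as the question: one slice of the nonce space each.
    createThread(threads[i], worker, high(uint64) div uint64(numThreads) * uint64(i))
  joinThreads(threads)
  result = winner.load()

when isMainModule:
  let nonce = mine()
  echo "nonce: ", nonce, " hash: ", toyHash(nonce)
```

Which nonce wins depends on thread scheduling, so the only guarantee is that the returned nonce hashes at or below the target. A channel would work too, but for a single result an atomic flag plus one published value is about the smallest mechanism that does the job.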
Now regarding implementation.
For SHA2, I suggest you use nimcrypto's implementation, which underwent a security audit: https://github.com/cheatfate/nimcrypto/blob/master/nimcrypto/sha2.nim and doesn't involve the GC at all (i.e. no strings/seqs, only openArray). Alternatively, you might be able to isolate the pure assembly high-performance implementation from BLST:
FYI this is how we wrap the full library: https://github.com/status-im/nim-blscurve/blob/86d151d7/blscurve/blst/blst_lowlevel.nim#L20-L21
Lastly, here is my multithreaded implementation of Ethash: https://github.com/status-im/nim-ethash/blob/0b1a9969/src/proof_of_work.nim. IIRC, it's only partly parallelized with OpenMP (the || symbol) because at the time (2018) there was no toOpenArray for slicing without allocation.
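For illustration, here is a small sketch of what toOpenArray buys you compared with a slice like resultHash[0..15] in the question's hot loop. parseHexPrefix is a hypothetical helper written for this example, not something from strutils or from the linked code.

```nim
func parseHexPrefix(digits: openArray[char]): uint64 =
  ## Hypothetical helper: folds hex digits into a uint64, most significant first.
  for c in digits:
    let v =
      case c
      of '0'..'9': uint64(ord(c) - ord('0'))
      of 'a'..'f': uint64(ord(c) - ord('a') + 10)
      of 'A'..'F': uint64(ord(c) - ord('A') + 10)
      else: raise newException(ValueError, "not a hex digit: " & c)
    result = (result shl 4) or v

when isMainModule:
  let hashHex = "00000000000000ff" & "deadbeef"
  # hashHex[0..15] would copy the first 16 chars into a new string;
  # toOpenArray passes a view of the same bytes without allocating.
  echo parseHexPrefix(hashHex.toOpenArray(0, 15))
```

Inside a loop that runs once per nonce, avoiding that per-iteration allocation is the kind of thing that keeps the allocator out of the hot path.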
I opened a Github issue so hopefully we can find a solution.
Thanks, I'll try using Weave. It looks much more polished than the standard library threads.
I appreciate the high performance SHA2 libraries you pointed out but I really only mentioned it because it also triggered the bug. I tested my code on a different hash implementation to make sure it wasn't just the hash code I used that caused the bug. Ethash is pretty interesting, even after all this time there's still no high performance ASIC. I wonder if ProgPoW will end up getting accepted into Ethereum. I kinda want my PoW function to not be used by any big/medium cryptocurrency for fear of having an ASIC be developed. I decided on Argon2 for now as it's a standard but I really like the asymmetric resource demand of generating and verifying hashes in Ethash and RandomX.
Note that I ended up wrapping this assembly SHA256 implementation anyway: https://github.com/status-im/nim-blscurve/blob/eed30c06/blscurve/blst/sha256_abi.nim
Benchmarks show that it's 15x faster than nimcrypto.sha2 on my machine.
@mratsim, in the code you linked, what does this do?
proc vec_zero(ret: pointer, num: csize_t)
  {.importc, exportc, header: srcPath/"vect.h", nodecl.}
If you remove it, the linker will complain about an undefined reference to vec_zero, because the function that calls it is defined in the header as static void:
static void sha256_init(SHA256_CTX *ctx)
{
    sha256_init_h(ctx->h);
    ctx->N = 0;
    vec_zero(ctx->buf, sizeof(ctx->buf));
    ctx->off = 0;
}
And vec_zero is defined in a header as static void as well, so you need to make sure that both implementations are instantiated in the same file.
That is also why I don't export sha256_init directly but wrap it in another proc.
For SHA2, I suggest you use nimcrypto's implementation which underwent security audit:
Could you please kindly point me to that audit. Thanks.
The audit of SHA2, HMAC and HKDF was part of reviewing this: https://github.com/status-im/nim-blscurve/issues/73
In general our crypto primitives audit issues are on the respective repos:
They cover our implementations of SHA256, HMAC, HKDF, ECDSA over secp256k1, Ed25519 over Curve25519, BLS signatures over the BLS12-381 curve and related serialization (hex, ASN1, ...), and the CSPRNG as used in: