nimforum mirror - ARC/ORC cpu intensive loop bug?

VP8M8 [2020-09-14T22:30:03+02:00]

I'm making a proof-of-work (think hashcash) library but have encountered two bugs when running it under ARC/ORC. The first shows up when using a normal hashing algorithm. I'm using a libsodium wrapper, but the bug is also triggered if you replace crypto_generichash() with SHA1 from the standard library or with this SHA2 library. It runs fine with the default refc GC, but when I enable arc or orc only ~50% of the CPU time goes to my program, versus 100% with refc. Something weird must be happening under the hood: htop does report 100% CPU usage, but only 50% of the load bars are green while the other half is red. This means 50% of CPU time is spent in user processes and 50% in kernel processes. The refc version takes 2.1 seconds to complete while arc/orc takes 19.1 seconds on my Haswell CPU.

The second bug happens when using libsodium's crypto_pwhash(). As noted in the code comment, add two Fs to target to speed it up, uncomment the crypto_pwhash() line, and comment out the crypto_generichash() line above it. With refc and arc it works fine (takes 8.7 seconds), but with orc it runs to completion and then I get

SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Error: execution of an external program failed: './programName '

at the end, most of the time. Only occasionally does it run under orc with no error. Debugging advice is appreciated. I'm using the latest git devel (git hash: c38487aa225674f67659c28ce171fd93e2231ad1). -d:release vs -d:danger makes no difference. Here is my code:

import threadpool, strutils, cpuinfo
import libsodium/sodium

let
  payload = "just testing?0!7801"
  initialHashHex = crypto_generichash(payload, 16)
  target = 0xFFFFFFFFFF'u64  # Add two Fs when using crypto_pwhash()

func toString(binary: openArray[byte]): string =
  result = newStringOfCap(binary.len)
  for bin in binary:
    result.add(bin.char)

func pow(target: uint64, initialHashHex: string, nonce: uint64): (uint64, string) {.thread.} =
  var
    trialValue = high(uint64)
    nonceThread = nonce
    initialHashBin: array[16, byte]
    resultHash = newStringOfCap(64)
  for index, value in initialHashHex[0..15]:
    initialHashBin[index] = value.byte
  while trialValue > target:
    resultHash = crypto_generichash(crypto_generichash($nonceThread, 32, initialHashHex).bin2hex, 32, initialHashHex).bin2hex
#    resultHash = (crypto_pwhash($nonceThread, initialHashBin, 32, phaArgon2id13, 2, 1048576 * 4)).toString.bin2hex
    trialValue = fromHex[uint64](resultHash[0..15])
    inc(nonceThread)
  return (nonceThread-1, resultHash)

proc multithread(totalThreads: range[1..256], thread: range[1..256], initialHashHex: string, target: uint64): uint64 {.thread.} =
  let nonce = high(uint64) div uint64(totalThreads) * uint64(thread-1)
  echo "thread: ", thread, " initialNonce: ", nonce.toHex  # Debugging
  let (finalNonce, finalHash) = pow(target, initialHashHex, nonce)
  echo "thread: ", thread, " finalNonce: ", finalNonce.toHex, " finalHash: ", finalHash  # Debugging
  result = finalNonce

proc callMulti(threads: range[0..256], initialHashHex: string, target: uint64) =
  var totalThreads = threads
  if totalThreads == 0:
    if countProcessors() == 0:
      totalThreads = 1
    elif countProcessors() > 256:
      totalThreads = 256
    else:
      totalThreads = countProcessors()
  var responses = newSeq[FlowVarBase](totalThreads)
  for thread in 1..totalThreads:
    responses[thread-1] = spawn multithread(totalThreads, thread, initialHashHex, target)
  var index = blockUntilAny(responses)

let threads = 4
callMulti(threads, initialHashHex, target)

Also, I could use some help returning the result from whichever pow() thread completes first and terminating the rest. Right now I'm just printing the results. Would channels be useful here?

Araq [2020-09-15T08:43:23+02:00]

Please report bugs on GitHub. Also, for CPU performance consider using Weave; much more effort went into Weave than into Nim's spawn implementation.

mratsim [2020-09-15T09:57:27+02:00]

At first glance, I find this line suspect:

crypto_generichash($nonceThread, 32, initialHashHex): this converts nonceThread to a Nim string (and so allocates), which is then implicitly converted to a C string because libsodium is a C library. It's possible that ARC/ORC is then reclaiming this string somehow.
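If that temporary string does turn out to be the problem, one way to keep the GC out of the hot loop is to encode the nonce into a fixed-size buffer on the stack instead of calling $. A minimal sketch, not tested against libsodium: nonceBytes is a hypothetical helper, and it assumes the hash wrapper can be given an openArray[byte]. Note that hashing these 8 raw bytes produces different digests than hashing the decimal string.

```nim
# Hypothetical helper: little-endian encoding of a nonce into a stack array.
# No string or seq is created, so nothing here is managed by the GC.
func nonceBytes(nonce: uint64): array[8, byte] =
  for i in 0 ..< 8:
    # Shift the i-th byte down to the low 8 bits and keep only those bits.
    result[i] = byte((nonce shr (8 * i)) and 0xFF'u64)

let buf = nonceBytes(0x0102'u64)
echo buf
```

The resulting array can be passed wherever an openArray[byte] is expected without copying.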

Regarding Weave vs threadpool, Weave is indeed higher performance (https://github.com/mratsim/weave) but at a high level it's a load balancer/scheduler plus a threadpool.

Proof-of-work doesn't really need a load balancer: you just use all threads until you stop, and there is no synchronization required aside from start/stop, so it's already the ideal scenario for a threadpool. So don't expect a big improvement from Weave over the plain threadpool here.
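To illustrate how little synchronization that is, here is a rough sketch using plain threads, one atomic stop flag and one channel for the first result. toyHash is a hypothetical stand-in for the real hash, and NumThreads, WorkerArgs, Found, worker and search are made-up names for this example, not part of threadpool or Weave.

```nim
# Needs --threads:on on Nim 1.x (threads are enabled by default in Nim 2.x).
import std/atomics

const NumThreads = 4

type
  WorkerArgs = tuple[id: int, start, target: uint64]
  Found = tuple[id: int, nonce: uint64]

var
  stop: Atomic[bool]       # set once by the first thread that finds a solution
  results: Channel[Found]  # carries the winning (thread id, nonce) to the caller

# Hypothetical stand-in for the real hash (a splitmix64-style mixer).
func toyHash(x: uint64): uint64 =
  var z = x + 0x9E3779B97F4A7C15'u64
  z = (z xor (z shr 30)) * 0xBF58476D1CE4E5B9'u64
  z = (z xor (z shr 27)) * 0x94D049BB133111EB'u64
  z xor (z shr 31)

proc worker(args: WorkerArgs) {.thread.} =
  var nonce = args.start
  while not stop.load(moRelaxed):
    if toyHash(nonce) <= args.target:
      # exchange returns the previous value, so only the first finder sends.
      if not stop.exchange(true):
        results.send((id: args.id, nonce: nonce))
      return
    inc nonce

proc search(target: uint64): Found =
  var threads: array[NumThreads, Thread[WorkerArgs]]
  stop.store(false)
  results.open()
  for i in 0 ..< NumThreads:
    # Give each thread its own slice of the nonce space.
    let start = high(uint64) div uint64(NumThreads) * uint64(i)
    createThread(threads[i], worker, (id: i, start: start, target: target))
  result = results.recv()  # blocks until the first solution arrives
  joinThreads(threads)     # the other threads see the flag and exit on their own
  results.close()

let winner = search(high(uint64) shr 16)
echo "thread ", winner.id, " found nonce ", winner.nonce
```

The channel answers the question of how to get the first result back, and the flag is what terminates the remaining threads; nothing else is shared between them.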

Now regarding implementation.

For SHA2, I suggest you use nimcrypto's implementation, which underwent a security audit: https://github.com/cheatfate/nimcrypto/blob/master/nimcrypto/sha2.nim. It doesn't involve the GC at all (i.e. no strings/seqs, only openArray). Alternatively, you might be able to isolate the pure assembly high-performance implementation from BLST:

header: https://github.com/supranational/blst/blob/master/src/sha256.h

assembly portable: https://github.com/supranational/blst/blob/a8398ed2/build/elf/sha256-portable-x86_64.s

assembly simd: https://github.com/supranational/blst/blob/master/build/elf/sha256-x86_64.s

FYI this is how we wrap the full library: https://github.com/status-im/nim-blscurve/blob/86d151d7/blscurve/blst/blst_lowlevel.nim#L20-L21

Lastly, here is my multithreaded implementation of Ethash: https://github.com/status-im/nim-ethash/blob/0b1a9969/src/proof_of_work.nim. IIRC, it's only partly parallelized with OpenMP (the || symbol) because at the time (2018) there was no toOpenArray for slicing without allocation.

VP8M8 [2020-09-23T06:07:04+02:00]

I opened a Github issue so hopefully we can find a solution.

Thanks, I'll try using Weave. It looks much more polished than the standard library threads.

I appreciate the high performance SHA2 libraries you pointed out but I really only mentioned it because it also triggered the bug. I tested my code on a different hash implementation to make sure it wasn't just the hash code I used that caused the bug. Ethash is pretty interesting, even after all this time there's still no high performance ASIC. I wonder if ProgPoW will end up getting accepted into Ethereum. I kinda want my PoW function to not be used by any big/medium cryptocurrency for fear of having an ASIC be developed. I decided on Argon2 for now as it's a standard but I really like the asymmetric resource demand of generating and verifying hashes in Ethash and RandomX.

mratsim [2020-09-23T09:13:41+02:00]

Note that I ended up wrapping this assembly SHA256 implementation anyway: https://github.com/status-im/nim-blscurve/blob/eed30c06/blscurve/blst/sha256_abi.nim

Benchmarks show that it's 15x faster than nimcrypto.sha2 on my machine.

cdunn2001 [2020-09-24T04:48:59+02:00]

@mratsim, in the code you linked, what does this do?

proc vec_zero(ret: pointer, num: csize_t)
    {.importc, exportc, header: srcPath/"vect.h", nodecl.}

mratsim [2020-09-24T09:27:07+02:00]

If you remove it, you will get a complaint about an undefined reference to vec_zero, because sha256_init, which calls it, is defined in the header as static void:

static void sha256_init(SHA256_CTX *ctx)
{
    sha256_init_h(ctx->h);
    ctx->N = 0;
    vec_zero(ctx->buf, sizeof(ctx->buf));
    ctx->off = 0;
}

And vec_zero is defined in the header as static void as well, so you need to make sure that both implementations are instantiated in the same file.

That is also why I don't export sha256_init directly but wrap it in another proc.

moerm [2020-09-25T00:29:17+02:00]

> For SHA2, I suggest you use nimcrypto's implementation which underwent security audit:

Could you please kindly point me to that audit. Thanks.

mratsim [2020-09-25T10:18:29+02:00]

> Could you please kindly point me to that audit. Thanks.

The audit of SHA2, HMAC and HKDF was part of reviewing this: https://github.com/status-im/nim-blscurve/issues/73

In general, the audit issues for our crypto primitives are on the respective repos:

https://github.com/status-im/nim-blscurve/issues?q=is%3Aissue+is%3Aopen+SEC

https://github.com/cheatfate/nimcrypto/issues?q=is%3Aissue+is%3Aopen+SEC

https://github.com/status-im/nim-libp2p/issues?q=is%3Aissue+is%3Aopen+SEC+

They cover our implementations of SHA256, HMAC, HKDF, ECDSA over secp256k1, ED25519 over Curve25519, BLS signatures over the BLS12-381 curve and related serialization (hex, ASN1, ...), and the CSPRNG, as used in:

https://github.com/status-im/nim-libp2p/tree/master/libp2p/crypto

https://github.com/status-im/nim-blscurve
