I am working on a project that spawns threads that operate on a global memory array (seq[seq[uint64]]), and I am trying to wrap my head around the locks module and passing variables to a threaded function. If I pass a single variable it works, but it fails with N>1 variables.
Slightly modifying the example from the docs:
(this doesn't work)
import std/locks

var
  L: Lock
var x = 0

proc threadFunc(a, b: int) {.thread.} =
  acquire(L) # lock stdout
  x.inc(a)
  echo b
  release(L)

proc doIt() =
  initLock(L)
  var thr: array[0..4, Thread[int]]
  for i in 0..high(thr):
    createThread(thr[i], threadFunc, (i, i*i))
  joinThreads(thr)
  deinitLock(L)

doIt()
echo x
I have tried variations of Thread[int] as well
Thread[x, y: int]
Thread[int, int]
So I am at an impasse as the function in my actual code takes a handful of varied arguments. Any pointers?
Beyond that, while I can access the global var x in a modified example from the docs, my own code, which also uses a global var, gives this error when I try to create a thread:
Error: 'processSegment' is not GC-safe as it accesses 'globalMem' which is a global using GC'ed memory
I tried changing it from a var to a ref object type but no luck.
To pass multiple arguments to a thread you need to wrap them in a tuple:
import std/locks

var
  L: Lock
var x = 0

proc threadFunc(a, b: int) {.thread.} =
  acquire(L) # lock stdout
  x.inc(a)
  echo b
  release(L)

proc doIt() =
  initLock(L)
  var thr: array[0..4, Thread[int]]
  for i in 0..high(thr):
    createThread(thr[i], threadFunc, (i, i*i))
  joinThreads(thr)
  deinitLock(L)

doIt()
echo x
That being said, working on a global seq[seq[T]] sounds like it could be trouble. At least it used to be trouble pre-ARC; I am not sure if it still is.
That should be:
import std/locks

var
  L: Lock
var x = 0

proc threadFunc(a: (int, int)) {.thread.} =
  acquire(L) # lock stdout
  x.inc(a[0])
  echo a[1]
  release(L)

proc doIt() =
  initLock(L)
  var thr: array[0..4, Thread[(int, int)]]
  for i in 0..high(thr):
    createThread(thr[i], threadFunc, (i, i*i))
  joinThreads(thr)
  deinitLock(L)

doIt()
echo x
But you should just use Weave or Malebolgia for threading; both also include examples of how to process arrays in parallel.
I see I managed to copy-paste the original snippet and not my edited version. This is what I meant to send:
import std/locks

var
  L: Lock
var x = 0

proc threadFunc(y: tuple[a, b: int]) {.thread.} =
  acquire(L) # lock stdout
  x.inc(y.a)
  echo y.b
  release(L)

proc doIt() =
  initLock(L)
  var thr: array[0..4, Thread[tuple[a, b: int]]]
  for i in 0..high(thr):
    createThread(thr[i], threadFunc, (i, i*i))
  joinThreads(thr)
  deinitLock(L)

doIt()
echo x
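For a handful of arguments of different types, the same idea should carry over from a tuple to an object: a thread proc takes exactly one parameter, so everything is bundled into one value. This is only a minimal sketch. SegmentArgs, its fields, argLock and runWorkers are made-up names for illustration, and on Nim versions before 2.0 it assumes compiling with --threads:on.

```nim
import std/locks

type
  # Hypothetical bundle of per-thread arguments; the field names are
  # illustrative and not taken from the original code.
  SegmentArgs = object
    lane: int
    passes: uint32
    scale: float

var
  argLock: Lock
  total = 0

proc worker(args: SegmentArgs) {.thread.} =
  # The single parameter carries every value the thread needs.
  acquire(argLock)            # protect the shared counter
  total.inc(args.lane * int(args.passes))
  release(argLock)

proc runWorkers(): int =
  total = 0
  initLock(argLock)
  var thr: array[0..3, Thread[SegmentArgs]]
  for i in 0..high(thr):
    createThread(thr[i], worker,
                 SegmentArgs(lane: i, passes: 2'u32, scale: 0.5))
  joinThreads(thr)
  deinitLock(argLock)
  result = total
```

The fields here are plain value types on purpose; passing strings, seqs or refs to another thread raises the same GC-safety questions as the global in the original post.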
Thank you for the responses.
@mratsim Argon2, 90% done implementing Argon2 just trying to get the threading working. And you are right, the threads access the "memory" in parallel but they are all assigned different regions which simplifies things. For better or worse I seem to be getting away with using a pointer to the memory so I guess I will stick with that.
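A rough sketch of that pointer approach, assuming each thread only ever writes to its own region. Region, fillRegion, fillAll and the sizes are hypothetical names and numbers chosen for illustration, not the actual Argon2 code, and on Nim versions before 2.0 it assumes --threads:on.

```nim
const
  lanes = 4            # hypothetical number of threads
  wordsPerLane = 8     # hypothetical region size, in uint64 words

type
  # Everything a thread needs: a raw pointer to the shared buffer plus the
  # bounds of the region it is allowed to touch.
  Region = object
    mem: ptr UncheckedArray[uint64]
    first, count: int
    fill: uint64

proc fillRegion(r: Region) {.thread.} =
  # No global is accessed, so the GC-safety check has nothing to reject.
  # Regions do not overlap, so no lock is taken in this sketch.
  for k in r.first ..< (r.first + r.count):
    r.mem[k] = r.fill

proc fillAll(): seq[uint64] =
  result = newSeq[uint64](lanes * wordsPerLane)
  # The seq must not be resized while the threads hold this pointer.
  let base = cast[ptr UncheckedArray[uint64]](addr result[0])
  var thr: array[lanes, Thread[Region]]
  for i in 0 ..< lanes:
    createThread(thr[i], fillRegion,
                 Region(mem: base, first: i * wordsPerLane,
                        count: wordsPerLane, fill: uint64(i + 1)))
  joinThreads(thr)
```

The compiler cannot check any of this, so keeping the regions disjoint and keeping the buffer alive until joinThreads returns is entirely up to the caller.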
It also says:
{.deprecated: "use the nimble packages `malebolgia`, `taskpools` or `weave` instead".}
Looking at the algorithm on Wikipedia: https://en.wikipedia.org/wiki/Argon2
Function Argon2
  Inputs:
    password (P):       Bytes (0..2^32-1)    Password (or message) to be hashed
    salt (S):           Bytes (8..2^32-1)    Salt (16 bytes recommended for password hashing)
    parallelism (p):    Number (1..2^24-1)   Degree of parallelism (i.e. number of threads)
    tagLength (T):      Number (4..2^32-1)   Desired number of returned bytes
    memorySizeKB (m):   Number (8p..2^32-1)  Amount of memory (in kibibytes) to use
    iterations (t):     Number (1..2^32-1)   Number of iterations to perform
    version (v):        Number (0x13)        The current version is 0x13 (19 decimal)
    key (K):            Bytes (0..2^32-1)    Optional key (Errata: PDF says 0..32 bytes, RFC says 0..2^32 bytes)
    associatedData (X): Bytes (0..2^32-1)    Optional arbitrary extra data
    hashType (y):       Number (0=Argon2d, 1=Argon2i, 2=Argon2id)
  Output:
    tag:                Bytes (tagLength)    The resulting generated bytes, tagLength bytes long
  Generate initial 64-byte block H0.
    All the input parameters are concatenated and input as a source of additional entropy.
    Errata: RFC says H0 is 64-bits; PDF says H0 is 64-bytes.
    Errata: RFC says the Hash is H^, the PDF says it's ℋ (but doesn't document what ℋ is). It's actually Blake2b.
    Variable length items are prepended with their length as 32-bit little-endian integers.
  buffer ← parallelism ∥ tagLength ∥ memorySizeKB ∥ iterations ∥ version ∥ hashType
        ∥ Length(password) ∥ Password
        ∥ Length(salt) ∥ salt
        ∥ Length(key) ∥ key
        ∥ Length(associatedData) ∥ associatedData
  H0 ← Blake2b(buffer, 64) //default hash size of Blake2b is 64-bytes

  Calculate number of 1 KB blocks by rounding down memorySizeKB to the nearest multiple of 4*parallelism kibibytes
  blockCount ← Floor(memorySizeKB, 4*parallelism)

  Allocate two-dimensional array of 1 KiB blocks (parallelism rows x columnCount columns)
  columnCount ← blockCount / parallelism; //In the RFC, columnCount is referred to as q

  Compute the first and second block (i.e. column zero and one) of each lane (i.e. row)
  for i ← 0 to parallelism-1 do //for each row
    B_i[0] ← Hash(H0 ∥ 0 ∥ i, 1024) //Generate a 1024-byte digest
    B_i[1] ← Hash(H0 ∥ 1 ∥ i, 1024) //Generate a 1024-byte digest

  Compute remaining columns of each lane
  for i ← 0 to parallelism-1 do //for each row
    for j ← 2 to columnCount-1 do //for each subsequent column
      //i' and j' indexes depend if it's Argon2i, Argon2d, or Argon2id (See section 3.4)
      i′, j′ ← GetBlockIndexes(i, j) //the GetBlockIndexes function is not defined
      B_i[j] = G(B_i[j-1], B_i′[j′]) //the G hash function is not defined

  Further passes when iterations > 1
  for nIteration ← 2 to iterations do
    for i ← 0 to parallelism-1 do //for each row
      for j ← 0 to columnCount-1 do //for each subsequent column
        //i' and j' indexes depend if it's Argon2i, Argon2d, or Argon2id (See section 3.4)
        i′, j′ ← GetBlockIndexes(i, j)
        if j == 0 then
          B_i[0] = B_i[0] xor G(B_i[columnCount-1], B_i′[j′])
        else
          B_i[j] = B_i[j] xor G(B_i[j-1], B_i′[j′])

  Compute final block C as the XOR of the last column of each row
  C ← B_0[columnCount-1]
  for i ← 1 to parallelism-1 do
    C ← C xor B_i[columnCount-1]

  Compute output tag
  return Hash(C, tagLength)
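Two of the steps in the pseudocode are plain arithmetic and can be sketched in Nim without touching the undefined parts (GetBlockIndexes, G, Hash): the block and column counts, and the final XOR over the last column. This is a sketch under assumptions, not a reference implementation. MemBlock, laneLayout and finalBlock are hypothetical names, and Floor(x, n) is read here as rounding x down to a multiple of n.

```nim
const wordsPerBlock = 128     # 1 KiB block = 128 x 8-byte words

type
  MemBlock = array[wordsPerBlock, uint64]

proc laneLayout(memorySizeKB, parallelism: int):
    tuple[blockCount, columnCount: int] =
  # Round memorySizeKB down to the nearest multiple of 4*parallelism,
  # then split the blocks evenly over the lanes (rows).
  let unit = 4 * parallelism
  result.blockCount = (memorySizeKB div unit) * unit
  result.columnCount = result.blockCount div parallelism

proc finalBlock(lanes: seq[seq[MemBlock]]): MemBlock =
  # C starts as the last column of lane 0 and is XORed, word by word,
  # with the last column of every other lane.
  result = lanes[0][^1]
  for i in 1 ..< lanes.len:
    for w in 0 ..< wordsPerBlock:
      result[w] = result[w] xor lanes[i][^1][w]
```

For example, memorySizeKB = 65 with parallelism = 4 rounds down to 64 blocks, giving 16 columns per lane.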
You need a threadpool that supports data parallelism / parallel for.
You can get away with OpenMP by using the || operator https://nim-lang.org/docs/system.html#%7C%7C.i%2CS%2CT%2Cstaticstring
Using https://github.com/mratsim/laser/blob/master/laser/openmp.nim for extra syntax sugar, you can use
for i in 0 || (omp_get_num_threads() - 1):
  myArray[i] = <...>
For OpenMP to work, you normally need to pass --passC:-fopenmp and --passL:-fopenmp, either on the command line or via the passC and passL pragmas. With the openmp.nim utility you just need to pass -d:openmp on the command line.
(On a Mac, the default Clang does not support OpenMP; you have to install GCC or Clang from Homebrew.)
@mratsim Thanks for that in-depth response. It will take me a bit to parse through it.
@Araq Yes, I saw that, but I am one of those people who hates having their code depend on someone else's code. As I am not writing commercial/production code, I use only the standard library of a language whenever possible. That is one (of many) reason(s) I like Nim: it has a strong and useful standard library.