I would like some guidance on how to best parallelize this loop. I've played around with the parallel: macro, and using spawn, but haven't gotten a working solution yet. It isn't entirely clear what the best approach should be:
for i,face in faces:  #face is a path to an image file
    var width,height,channels:int
    #loads and decodes jpgs/pngs from disk returning a seq[byte]
    let data = stbi.load(face,width,height,channels,stbi.Default)
    if data != nil and data.len != 0:
        let target = (GL_TEXTURE_CUBE_MAP_POSITIVE_X.int+i).TexImageTarget
        #sets up opengl texture
        glTexImage2D(target.GLenum,
                     level.GLint,
                     internalFormat.GLint,
                     width.GLsizei,
                     height.GLsizei,
                     0,
                     format.GLenum,
                     pixelType.GLenum,
                     data[0].unsafeAddr)  #error on this when using parallel:
    else:
        echo "Failure to Load Cubemap Image"
When I try to just use the parallel: macro I get: Error: cannot prove: 0 <= len(data) + -1 (bounds check)
I also experimented with spawn but wasn't clear on how to use that well.
From this post: convert something like the following to suit your specific usage.
  let LoopSz = faces.len
  # a seq rather than an array, because the length is only known at run time
  var res = newSeq[int](LoopSz)
  parallel:
    for i in 0..LoopSz-1:
      spawn whateverNeedsDoingInParallel(..., res[i], ....)
  sync()
  for i in 0..LoopSz-1:
    aggResult = doSomeAggregationOfResults(res[i])
Note: for now, write the range as 0..LoopSz-1 rather than 0..<LoopSz (unless that has been fixed).
sync() simply waits until all your spawned threads are finished. If you call spawn n times on the same proc, it's almost the same as parallel, except that parallel does some additional checks for you. With parallel you also don't need to collect the FlowVars and check each one manually with ^ (or call sync()) to wait until all threads are finished. To refer to the documentation link from jlp765: once you exit the parallel block, everything is done.
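To make the manual route concrete, here is a minimal sketch of spawn without parallel, assuming the threadpool module and a build with --threads:on. The proc names square and squareAll are hypothetical stand-ins for the real per-item work; each spawn returns a FlowVar, and ^ blocks until that task's value is ready.

```nim
# Compile with: nim c --threads:on example.nim
import std/threadpool

# Hypothetical stand-in for the real per-item work.
proc square(x: int): int =
  x * x

proc squareAll(inputs: seq[int]): seq[int] =
  # Spawn one task per input and keep the FlowVar handles.
  var pending = newSeq[FlowVar[int]](inputs.len)
  for i in 0 .. inputs.high:
    pending[i] = spawn square(inputs[i])
  # `^` blocks until the corresponding task has produced its value,
  # so no separate sync() call is needed here.
  result = newSeq[int](inputs.len)
  for i in 0 .. pending.high:
    result[i] = ^pending[i]

when isMainModule:
  echo squareAll(@[1, 2, 3, 4])
```

Reading every FlowVar with ^ is what replaces sync() in this variant; the results come back in input order because each handle is stored at its own index.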
Inside the parallel block you need to use spawn and tune your degree of parallelism yourself (split your data into n chunks and process each chunk with a worker thread). So parallel is a higher-level abstraction for working with threads.
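The chunking idea can be sketched roughly as follows, modelled on the shape of the example in the Nim documentation for parallel. It assumes the threadpool module, --threads:on, and the experimental "parallel" switch; chunkSum and sumOfSquares are hypothetical names, and the work (summing squares of 1..n) is only a placeholder for real per-chunk processing.

```nim
# Compile with: nim c --threads:on example.nim
import std/threadpool
{.experimental: "parallel".}

# Hypothetical worker: sums i*i over the c-th of `chunks` slices of 1..n.
proc chunkSum(c, chunks, n: int): int =
  let lo = c * n div chunks + 1
  let hi = (c + 1) * n div chunks
  for i in lo .. hi:
    result += i * i

proc sumOfSquares(n, chunks: int): int =
  # One result slot per chunk; each spawn writes to its own index.
  var partial = newSeq[int](chunks)
  parallel:
    for c in 0 .. partial.high:
      partial[c] = spawn chunkSum(c, chunks, n)
  # Leaving the parallel block means all chunks are done.
  for p in partial:
    result += p

when isMainModule:
  echo sumOfSquares(10, 3)
```

The number of chunks is the tuning knob: one chunk per item gives many tiny tasks, while a handful of chunks close to the number of cores keeps the per-task overhead low.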
I don't know what happens if you first spawn some threads and then enter a parallel block (and spawn more threads there). I have never tried that, but it should work.