In an attempt to speed up rendering a function to an image using Malebolgia and Pixie, I ran into a problem. It is my first attempt at anything multithreaded.
The original proc that works fine:
proc renderXYFunc1(
    image: Image, # from pixie
    function: proc,
    funcVA: varargs[float],
    viewPort: (Vec2, Vec2),
    scale: float
  ): void =
  for j in 0..image.height:
    let y = lerp(j.float, 0.0, image.height.float, viewPort[0].y, viewPort[1].y) * scale
    for i in 0..image.width:
      let x = lerp(i.float, 0.0, image.width.float, viewPort[0].x, viewPort[1].x) * scale
      image[i,j] = function(x, y, funcVA)
After adding Malebolgia, compilation fails with Error: expression has no address at -> image[i,j]:
proc renderXYFunc2(
    image: Image,
    function: proc,
    funcVA: varargs[float],
    viewPort: (Vec2, Vec2),
    scale: float
  ): void =
  var m = createMaster()
  m.awaitAll:
    for j in 0..image.height:
      let y = lerp(j.float, 0.0, image.height.float, viewPort[0].y, viewPort[1].y) * scale
      for i in 0..image.width:
        let x = lerp(i.float, 0.0, image.width.float, viewPort[0].x, viewPort[1].x) * scale
        m.spawn function(x, y, funcVA) -> image[i,j]
So I tried the same thing with a type I made for 'three dimensional arrays'. The example code here is incomplete, but it works fine. It looks similar to image[i,j] but probably isn't. I had a peek at Pixie's code but could not completely understand it.
type
  Seq3*[T] = object
    data*: seq[T]
    dimX*: int
    dimY*: int
    dimZ*: int
.
.
.
proc `[]`*[T](seq3: Seq3[T], x, y, z: int): T =
  seq3.data[x + seq3.dimX*y + seq3.dimX * seq3.dimY * z]

import random
import malebolgia

var m = createMaster()
var s3 = initSeq3[int](8,8,8)
m.awaitAll:
  for y in 0..<8:
    for x in 0..<8:
      for z in 0..<8:
        m.spawn rand(100) -> s3[x,y,z]
So, what is the thing I'm missing?
@planitis, thanks for mentioning. Never noticed it.
@PMunch var Image results in the same error
Firstly, you probably want for j in 0..<image.height: (and the same for width) instead of for j in 0..image.height:, because .. is an end-inclusive range and ..< is an end-exclusive range.
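A minimal sketch of the difference between the two range operators, using a made-up height of 4 rather than anything from the thread:

```nim
import std/sequtils

# Hypothetical stand-in for image.height.
const height = 4

let inclusive = toSeq(0 .. height)   # end-inclusive: includes `height` itself
let exclusive = toSeq(0 ..< height)  # end-exclusive: stops at height - 1

echo inclusive  # @[0, 1, 2, 3, 4]
echo exclusive  # @[0, 1, 2, 3]
```

With the inclusive form the loop runs one step past the last valid row or column index.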
Secondly, the reason spawn probably doesn't work here is that the destination of -> needs to be addressable. This works for seq[T] elements because [] returns a var T in the stdlib, whereas pixie defines a []= (a setter procedure) instead of returning an immediately addressable element.
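To illustrate the addressability point, here is a small sketch with a hypothetical Grid type. It is not Pixie's Image, and the names Grid, newGrid and at are made up for this example. A setter lets you assign a value, but only an accessor that returns var T hands back a location whose address can be taken:

```nim
type
  Grid = object          # hypothetical stand-in, not Pixie's Image
    width, height: int
    data: seq[int]

proc newGrid(width, height: int): Grid =
  Grid(width: width, height: height, data: newSeq[int](width * height))

# Setter only: `g[x, y] = v` compiles, but the call itself yields no location.
proc `[]=`(g: var Grid, x, y: int, value: int) =
  g.data[y * g.width + x] = value

# Accessor returning `var int`: the result refers to the element in place,
# so callers can write through it or take its address.
proc at(g: var Grid, x, y: int): var int =
  g.data[y * g.width + x]

var g = newGrid(4, 2)
g[1, 1] = 7                   # goes through the setter
g.at(2, 1) = 9                # writes through the returned location
let p = addr(g.at(3, 1))      # taking the address works for `var int`
p[] = 11
echo g.data                   # @[0, 0, 0, 0, 0, 7, 9, 11]
```

A destination that can only be written through a setter offers nothing to take the address of, which matches the "expression has no address" error.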
Instead, try spawning without collecting a result: create an inner proc that sets the desired pixel.
proc assign(...) =
  image[i, j] = function(x, y, funcVA)

for j in 0 ..< image.height:
  let y = ...
  for i in 0 ..< image.width:
    let x = ...
    m.spawn assign(...)
Based on your post I tried various versions. All failed, so I'm still not getting it. Here's the full scene:
# nim c -d:ThreadPoolSize=8 -d:FixedChanSize=16 renderfunc.nim
import std/[math, monotimes]
import pixie
import malebolgia

const
  Width = 800
  Height = 400

proc lerp(t, minin, maxin, minout, maxout: float): float {.inline.} =
  result = ((t - minin) / (maxin - minin)) * (maxout - minout) + minout

proc shuheiKawachi(x: float, y: float, funcVA: varargs[float]): Color = # or ColorRGBX
  let val = (((cos(x) * cos(y) + cos((sqrt(funcVA[0]) * x - y) / funcVA[1]) *
    cos((x + sqrt(funcVA[0]) * y) / funcVA[1]) + cos((sqrt(funcVA[0]) * x + y) / funcVA[1]) *
    cos((x - sqrt(funcVA[0]) * y) / funcVA[1])) / 3) + 1) # division by 3 to bring it in the [-1,1] range
  return color(val, val, val) # color(val,val,val).asRgbx()

proc renderXYFunc2(
    image: var Image,
    function: proc,
    funcVA: varargs[float],
    viewPort: (Vec2, Vec2),
    scale: float
  ): void =
  var m = createMaster()
  proc assign(i: int, j: int, x: float, y: float, function: proc, funcVA: varargs[float]): Image =
  #proc assign(i:int, j:int, x:float, y:float):Image =
    image[i,j] = function(x, y, funcVA)
  m.awaitAll:
    for j in 0..<image.height:
      let y = lerp(j.float, 0.0, image.height.float, viewPort[0].y, viewPort[1].y) * scale
      for i in 0..<image.width:
        let x = lerp(i.float, 0.0, image.width.float, viewPort[0].x, viewPort[1].x) * scale
        #m.spawn function(x, y, funcVA) -> image[i,j]
        m.spawn assign(i, j, x, y, function, funcVA) #Error: 'toTask' takes a GC safe call expression
        #m.spawn assign(i, j, x, y) #Error: closure call is not allowed
        #change the output of the function used to ColorRGBX type
        #m.spawn function(x, y, funcVA) -> image.unsafe[i,j] #renderfunc.nim(50, 17) Error: illegal capture 'function' because 'function' has the calling convention: <nimcall>

var image = newImage(Width, Height)
let t0 = getMonoTime()
renderXYFunc2(
  image,
  shuheiKawachi,
  [TAU, 1.5],
  (vec2(0.0, 0.0), vec2(TAU, PI)),
  6.0
)
let t1 = getMonoTime() - t0
echo t1
writeFile(image, "kawachi.png")
I tried your suggestions; they fail with: Error: 'toTask' takes a GC safe call expression
@guzba so the thing to do would be to store the data in a 2D array (as I did with the 3D one above) and then copy it into an image in a loop?
Here's one way you could do it. As Araq alludes to, instead of spawning a task for each pixel, we split the image into chunks and work on those. I make the assumption that copying the Image ref for each thread is safe as long as we make sure we do not access the same parts of the image from different threads. Also, because of the proc closure I had a reason to learn about the {.sendable.} pragma. I think this should be fine, but I only learned about it 20 min ago. :)
Oh, and better check I didn't screw up the serial index per thread -> i,j indices conversion, heh.
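One way to check that conversion without any image library is to count how often each pixel is visited. The sketch below assumes a row-major layout and uses made-up small sizes; chunkBounds is a hypothetical helper, not part of Malebolgia or Pixie. The row index comes from integer division. A rounded floating-point division would instead push the second half of every row into the next row.

```nim
# Hypothetical small sizes so the result can be checked by hand.
const
  Width = 5
  Height = 3
  NumChunks = 4

proc chunkBounds(chunkId: int): (int, int) =
  ## Half-open range [frm, to) of serial pixel indices for one chunk.
  ## The last chunk also takes the remainder of the division.
  let total = Width * Height
  let per = total div NumChunks
  let frm = per * chunkId
  let to = if chunkId == NumChunks - 1: total else: per * (chunkId + 1)
  (frm, to)

# Count how often each pixel is visited across all chunks.
var visits: array[Height, array[Width, int]]
for chunkId in 0 ..< NumChunks:
  let (frm, to) = chunkBounds(chunkId)
  for idx in frm ..< to:
    let i = idx mod Width   # column
    let j = idx div Width   # row: integer division, not rounding
    inc visits[j][i]

for row in visits:
  for count in row:
    doAssert count == 1     # every pixel exactly once, none skipped

echo "all ", Width * Height, " pixels covered exactly once"
```

Note that the last chunk is identified by NumChunks - 1, and its upper bound is the total pixel count because the range is end-exclusive.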
I get about a 13x speedup using 32 threads over a single thread on an 8000x4000 image. Not amazing, but better than nothing. Haven't at all profiled the code.
@Araq: Any reason the ThreadPoolSize is not exported in Malebolgia? Would be useful for something like this. Should I make a PR?
import std/[math, monotimes, times]
from std / times import inMilliSeconds
import pixie
import malebolgia

const
  Width = 8000
  Height = 4000
  NumThreads = 32

type
  FnType = proc(x: float, y: float, funcVA: varargs[float]): Color {.noSideEffect, gcsafe.}
  ## Context type that is marked `Sendable` so that it can be isolated
  Context {.sendable.} = object
    img: Image
    fn: FnType

proc lerp(t, minin, maxin, minout, maxout: float): float {.inline.} =
  result = ((t - minin) / (maxin - minin)) * (maxout - minout) + minout

func shuheiKawachi(x: float, y: float, funcVA: varargs[float]): Color = # or ColorRGBX
  let val = (((cos(x) * cos(y) + cos((sqrt(funcVA[0]) * x - y) / funcVA[1]) *
    cos((x + sqrt(funcVA[0]) * y) / funcVA[1]) + cos((sqrt(funcVA[0]) * x + y) / funcVA[1]) *
    cos((x - sqrt(funcVA[0]) * y) / funcVA[1])) / 3) + 1) # division by 3 to bring it in the [-1,1] range
  result = color(val, val, val) # color(val,val,val).asRgbx()

proc renderXYFunc2(
    image: var Image,
    function: FnType,
    funcVA: varargs[float],
    viewPort: (Vec2, Vec2),
    scale: float
  ) =
  var m = createMaster()
  proc assign(ctx: Context, threadId: int, viewPort: (Vec2, Vec2), scale: float, funcVA: varargs[float]) =
    let img = ctx.img
    let numPer = (img.width * img.height) div NumThreads
    let frm = numPer * threadId
    # The loop range is end-exclusive; the last chunk also takes any remainder.
    let to = if threadId == NumThreads - 1: img.width * img.height else: numPer * (threadId + 1)
    let fn = ctx.fn
    for idx in frm ..< to:
      # Row-major conversion from the serial index to column i and row j.
      let i = idx mod img.width
      let j = idx div img.width
      let y = lerp(j.float, 0.0, img.height.float, viewPort[0].y, viewPort[1].y) * scale
      let x = lerp(i.float, 0.0, img.width.float, viewPort[0].x, viewPort[1].x) * scale
      img[i,j] = fn(x, y, funcVA)
  m.awaitAll:
    for i in 0 ..< NumThreads:
      # Context copies the image ref, leaves underlying data seq the same. This is only safe
      # as long as we guarantee there is no overlap between the different thread portions of the image!
      let ctx = Context(img: image, fn: function)
      m.spawn assign(ctx, i, viewPort, scale, funcVA)

var image = newImage(Width, Height)
let t0 = getMonoTime()
renderXYFunc2(
  image,
  shuheiKawachi,
  [TAU, 1.5],
  (vec2(0.0, 0.0), vec2(TAU, PI)),
  6.0
)
let t1 = getMonoTime() - t0
echo "Took ", t1.inMilliSeconds(), " ms"
writeFile(image, "kawachi.png")
To work with images easiest would be to use a threadpool that supports parallelFor, see Weave:
https://github.com/mratsim/weave/blob/7682784/demos/raytracing/smallpt.nim#L271-L274
https://github.com/mratsim/trace-of-radiance/blob/e928285c/trace_of_radiance/render.nim#L49-L68
Here's one way you could do it.
Thank you @Vindaar
Sadly it errors: funcfuncfunc.nim(54, 21) Error: 'toTask'ed function cannot have a parameter of nnkTupleConstr kind. This is with Nim 2.0.0 and Malebolgia 1.3.0 (I think).
What you are doing looks like what I called block rendering. I'll have to look into {.sendable.}.
@mratsim Weave is on the list now. Thanks.
Sadly it errors: funcfuncfunc.nim(54, 21) Error: 'toTask'ed function cannot have a parameter of nnkTupleConstr kind. This is with Nim 2.0.0 and Malebolgia 1.3.0 (I think), on Win11.
Huh, I guess that's something that was fixed / changed on Nim devel then. :/
Any reason the ThreadPoolSize is not exported in Malebolgia? Would be useful for something like this. Should I make a PR?
Sure, go ahead.
Huh, I guess that's something that was fixed / changed on Nim devel then. :/
Updating to devel fixed it. It renders in 2897 ms. Thanks.
To work with images easiest would be to use a threadpool that supports parallelFor, see Weave
Malebolgia supports parMap fwiw.