Title should be "Weave: ". Reposting for readability:
So I have this piece of code that errors with:
: illegal capture 'idsBuf' because 'weaveParallelForSection' has the calling convention: <inline>
The same code runs fine outside a proc. I don't know much about Weave; I read only enough to put this together. I cannot figure out what it wants from me. Please help me fix it.
Your code needs to explicitly capture all variables you want available in the parallel for loop:
# Original
proc parallelCoverURL(db, dbc: DbConn) =
  init(Weave)
  var ids: seq[uint32] = db.getAllRows(sql"SELECT tconst FROM title_basics WHERE tconst NOT IN (SELECT id FROM covers.lolz)").mapIt(parseuint(it[0]).uint32)
  dbc.exec(sql"CREATE TABLE IF NOT EXISTS lolz (id INT PRIMARY KEY, url VARCHAR)")
  let idsBuf = cast[ptr UncheckedArray[uint32]](ids[0].unsafeAddr)
  parallelFor i in ids.low .. ids.high:
    echo $idsBuf[i] & " " & myGetIMDBCoverURL(dbc,idsBuf,i,imdbURLs)
  exit(Weave)
Instead you should use:
# Fixed
proc parallelCoverURL(db, dbc: DbConn) =
  init(Weave)
  var ids: seq[uint32] = db.getAllRows(sql"SELECT tconst FROM title_basics WHERE tconst NOT IN (SELECT id FROM covers.lolz)").mapIt(parseuint(it[0]).uint32)
  dbc.exec(sql"CREATE TABLE IF NOT EXISTS lolz (id INT PRIMARY KEY, url VARCHAR)")
  let idsBuf = cast[ptr UncheckedArray[uint32]](ids[0].unsafeAddr)
  parallelFor i in ids.low .. ids.high:
    captures: {dbc, idsBuf, imdbURLs}
    echo $idsBuf[i] & " " & myGetIMDBCoverURL(dbc,idsBuf,i,imdbURLs)
  exit(Weave)
That said, Weave is meant to optimize compute-bound work. Getting data from a database is IO-bound, so you'll only hammer your database and stress it with context switches.
Also, judging from your proc names, if you only want to download data, be sure to read this: https://ep2019.europython.eu/media/conference/slides/KNhQYeQ-downloading-a-billion-files-in-python.pdf
Lastly, if you are dealing with text, parallelFor is likely the wrong architecture/construct. And make sure to compile with --gc:arc or --gc:orc.
Thank you for the comprehensive response! I had wrongly assumed that the explicit capture was not needed, because the code worked without it when placed outside a proc. But since it didn't work inside the proc, I should have realized that something needed to be done differently.
The workflow here is simple: retrieve an HTML page with a certain key passed in the URL, parse the HTML to find a certain string, and put the string in an SQLite database. The limit here is the connection/HTTP latency. The other parts are super fast.
I don't think async can be used here, unless I simulated some kind of thread pool to manage the number of parallel jobs, which defeats the purpose I think. I also do not know how to work with sync/spawn. This library was recommended when I asked in the community chat. I now understand it is overkill for my needs, as in, I do not use/need any of its fancy features, but then again, the parallelFor was simple enough to use, vs learning the others.
> The limit here is the connection/http latency. The other parts are super fast.
I cover this in my blog post https://nim-lang.org/blog/2021/02/26/multithreading-flavors.html
What matters is how your program makes progress.
If your program makes progress by working, then yes, Weave is a suitable tool. If your program makes progress by waiting, async is the ideal solution, and you can await thousands of files, network connections or user inputs before needing multiple cores.
In the first case, you can throw more cores at the problem. In the second case, say downloading a file, throwing more cores at a download won't speed it up.
> I don't think async can be used here, unless I simulated some kind of thread pool to manage the number of parallel jobs, which defeats the purpose I think.
That's where you're wrong: async is the ideal solution there. You start many downloads, wait for the first one to finish, process it, then switch to the next. Downloads are already managed in the background by the OS.
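For illustration, here is a minimal sketch of that pattern using only std/asyncdispatch and std/httpclient. It is not code from this thread: fetchPage, worker, downloadAll and maxInFlight are hypothetical names, and returning an empty string for a failed download is an assumption made to keep the sketch short. A fixed number of async workers pull URLs from a shared index, so the number of downloads in flight is capped without any thread pool.

```nim
import std/[asyncdispatch, httpclient]

# Hypothetical helper: download one page, returning "" on any failure.
proc fetchPage(url: string): Future[string] {.async.} =
  let client = newAsyncHttpClient()
  try:
    result = await client.getContent(url)
  except CatchableError:
    result = ""  # assumption: a failed download is recorded as an empty page
  client.close()

# Each worker repeatedly takes the next unclaimed URL until none are left.
# Everything runs on one thread, so the shared index needs no lock.
proc worker(urls: seq[string], next: ref int,
            pages: ref seq[string]) {.async.} =
  while next[] < urls.len:
    let i = next[]
    inc next[]
    let body = await fetchPage(urls[i])
    pages[][i] = body

# Start at most maxInFlight workers and wait for all of them to finish.
proc downloadAll(urls: seq[string],
                 maxInFlight: int): Future[seq[string]] {.async.} =
  let next = new(int)
  let pages = new(seq[string])
  pages[] = newSeq[string](urls.len)
  var workers: seq[Future[void]]
  for _ in 0 ..< min(maxInFlight, urls.len):
    workers.add worker(urls, next, pages)
  await all(workers)
  result = pages[]
```

Usage would look like `let pages = waitFor downloadAll(urls, 20)`, after which the parsing and database insert can run sequentially on `pages`. HTTPS URLs require compiling with -d:ssl.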