I've compiled with --gc:boehm because I want to be able to share data structures between threads.
Despite this, I'm still getting this error from my {.thread.} proc: "procedure is not GC-safe". As I understand it, this is the compiler trying to enforce separate heaps. This surprises me: I assumed that with the Boehm GC, this limitation would be lifted. Am I misunderstanding how this works?
We don't really want to have a Nim dialect per --gc option, so the frontend checking is not aware of --gc:boehm. You can cast GC safety into existence though and be as dirty as Boehm allows for.
Sketch:
type EnforcedGcSafe = proc() {.gcsafe.}
proc myproc =
  discard "... access shared heap here..."
spawn cast[EnforcedGcSafe](myproc)
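A related way to override the check, sketched here under assumptions: newer Nim versions accept a `{.cast(gcsafe).}:` pragma block inside the thread proc itself. The names `names`, `namesLock`, `worker` and `runWorkers` are made up for illustration. The override only silences the compiler; it is sound only when the heap really is shared (for example with Boehm, as in this thread), and the lock is still needed to serialize access.

```nim
import std/locks

var
  namesLock: Lock
  names: seq[string]   # global GC-managed data: touching it makes a proc "not GC-safe"

proc worker(id: int) {.thread.} =
  # Without this block the compiler rejects the proc as not GC-safe.
  # The cast is a promise from the programmer, not a check.
  {.cast(gcsafe).}:
    withLock namesLock:
      names.add("worker " & $id)

proc runWorkers(): int =
  ## Starts four threads that each append one entry to the shared seq.
  initLock(namesLock)
  var threads: array[4, Thread[int]]
  for i in 0 ..< threads.len:
    createThread(threads[i], worker, i)
  joinThreads(threads)
  deinitLock(namesLock)
  result = names.len
```

Calling `runWorkers()` should leave one entry per thread in `names`.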
Thanks for the casting idea.
I really don't think you should think of boehm as "dirty." I mean, compared to what?
Real multi-threaded programs need to store structured data in shared memory. Just to give an example, let's say that this structured data is json. With boehm, I can load the shared json using the existing json module, and the json will be garbage collected when it's no longer referenced. Without boehm, I have to clone the code for the json parser, then alter it to use createShared. My data structures won't be garbage collected. Is that really cleaner?
Or maybe, you just feel it's dirty to use shared structured data at all. If that's the case, then I don't know what to say, other than: I've spoken to many pure functional programmers who felt it was dirty to use mutation, because you can really shoot yourself in the foot with mutation. In a way, they're not wrong. But I'm not giving up on imperative programming.
Boehm is not a precise GC and thus "dirty".
Without boehm, I have to clone the code for the json parser, then alter it to use createShared. My data structures won't be garbage collected. Is that really cleaner?
No, you only have to protect and dispose the Json. You can wrap the dispose in another ref with a finalizer and have 100% automatic memory management with thread local GCs. That still doesn't make it the most beautiful memory management design out there, but it's not too bad.
I don't know what "protect" the json means. I looked in the manual, it doesn't mention a protect statement. I'm also not entirely sure about "dispose", the word "dispose" doesn't appear in the manual, but I know there's a dispose statement that ignores a return value. But I don't see where that comes in. I also don't know what it means to "wrap a dispose in a ref." Long story short: you lost me.
But there are two more things I don't understand:
Edit: I found a protect and a dispose in the system module. They're clearly intended for something having to do with referencing data across heaps, but I just haven't been able to intuit the details. Apparently, dispose doesn't do what I thought it did.
How do I avoid accidentally trashing the refcounts?
protect and dispose only give you a pointer that you can cast to ptr JsonObj and so RCs are not affected.
But as you mentioned before, I can force it with a cast. Is that part of the solution you're suggesting?
Well yes. But you really need to be careful, and even then it doesn't support multiple threads creating data and adding it to an existing data structure without copies, so Boehm may indeed be what you want.
When I think about how this plays out in a large multithreaded program, it worries me.
For example, I used to work on a program that served queries pertaining to products for a shopping website. This program used to load tons of data at program initialization time: it would load up the database of product categories, it loaded tables of statistics about how often various products were purchased, it loaded a script that helped guide serving, and so forth. There was a lot of data, in hundreds of classes, and it took up gigabytes, so keeping multiple copies in RAM wouldn't have been an option.
The libraries that handled this data weren't necessarily written for this multithreaded server. For example, the product category database was used in dozens of programs, most of which were single threaded. The scripting language was just a scripting language - it was used in many programs, again, most of which were single-threaded.
Now, I think: what if I had tried to implement this in Nim? The team that implemented the product category database would probably have written it in the "normal" Nim style, with refs and garbage collection. Likewise for the scripting language. Then, the team that wrote the product server would have looked at these libraries and realized that they couldn't use them in a multithreaded server.
Of course, I can do true sharing using Boehm (ie, using established libraries in a multithreaded program), but the compiler is trying very hard to stop me from doing this. I can currently circumvent it with clever use of typecasts, but will future versions of the compiler be even more aggressive in trying to stop me? Will it always be feasible to use real shared memory in Nim?
I would urge the Nim designers to think of this as the real challenge for multithreading: if I already have a library, can I then take the data from that library and put it in shared memory? Or do I have to rewrite the library? If the latter, then that's seriously limiting to how realistic it is to craft large multithreaded programs.
I used the word "database" in the generic sense.
Basically, go to any good shopping website (eg, wayfair, or amazon, or google shopping, for example), and see what that website "knows" about coffee tables. If you take a little time and study the website, I think you'll realize that it's a surprisingly large amount of knowledge. There's a whole expert system in there about coffee tables - and another expert system about bedspreads, and so forth.
At the very base of it all, is a taxonomy:
Home Goods >
Tools >
Drills and Rotary Tools
Some product categories fit into the taxonomy in multiple places:
Apparel >
Shirts >
Football Jerseys
Sports >
Sports Memorabilia >
Football Jerseys
Then, for each category of products, you have data about which products in the category are most popular, data to drive a machine-learning classifier that classifies queries to determine if they belong in that category, etcetera, etcetera. Just take a little time studying one of these websites, and I think you'll come to appreciate how much is there.
So when I say a "database" here, I don't mean that it's SQL. I'm just saying it's complicated data. It's plain old C++ classes, loaded into RAM - but it's legitimately lots of C++ classes, and it takes up lots of RAM.
So imagine, now, that you've got a dozen teams: one is responsible for implementing a machine-learning classifier that differentiates queries about drills from queries about rotary hammers. Another is building a software system whose job is to estimate the popularity of products according to a dozen different metrics. Another is building a recommendation system that uses statistics to recommend "related products." And so forth.
Now, you know that at some point, you're going to need to integrate all this software into a web-server, which will probably be multithreaded.
So what do you do: do you tell these teams not to use refs? Or do you tell them to go ahead and use refs and garbage collection, and deal with the consequences?
So what do you do: do you tell these teams not to use refs? Or do you tell them to go ahead and use refs and garbage collection, and deal with the consequences?
To be honest I would fire teams who model business taxonomies with C++ classes (OMG), keeping all the data in RAM (OMG) and not using a (noSQL / SQL) database (OMG).
But in the context of Nim, I would likely live with refs, thread local GCs and when somebody queries my data, I would return a copy. Note that copying a query result doesn't mean to "keep full copies of the data around in RAM" which you sort of implied.
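A minimal sketch of the "return a copy" approach, under assumptions: one owner thread keeps the data on its own heap and answers queries over two channels, and each reply is copied on its way back. The channel names, the tiny taxonomy table and the empty-string shutdown signal are all hypothetical, and the sketch handles a single client only.

```nim
import std/tables

var
  requests: Channel[string]        # category names sent to the owner thread
  replies: Channel[seq[string]]    # copied query results sent back

proc owner() {.thread.} =
  # Built on this thread's heap; no other thread ever holds a ref into it.
  let taxonomy = {
    "Tools": @["Drills and Rotary Tools"],
    "Shirts": @["Football Jerseys"]
  }.toTable
  while true:
    let category = requests.recv()
    if category.len == 0: break    # empty string: hypothetical shutdown signal
    replies.send(taxonomy.getOrDefault(category))

proc query(category: string): seq[string] =
  ## Sends one request and blocks until the copied result arrives.
  requests.send(category)
  result = replies.recv()

proc runQueries(): seq[string] =
  requests.open()
  replies.open()
  var t: Thread[void]
  createThread(t, owner)
  result = query("Tools") & query("Shirts")
  requests.send("")                # ask the owner thread to stop
  joinThread(t)
  requests.close()
  replies.close()
```

Only the rows that answer a query are copied, not the whole table.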
Anything new on this? I'm trying to implement a simple genetic algorithm where each thread shall have access to a shared seq[seq[int]]. It's easy to partition the data into chunks so the threads know which items to work on, but Nim seems to IMPLICITLY copy the data, whether I try channels, parallel statements, spawn statements or even "bare" threads. Am I doing something wrong?
I thought that because the object is a ref object, only the ref (the address) would be copied?
Code here: https://github.com/peheje/nim_genetic/blob/master/mutable_state.nim
I appreciate that language features like parallel: with spawn do some magic in the background implicitly, but I think being too creative might take many programmers by "unpleasant" surprise.
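One way to avoid the copy with "bare" threads, as a sketch under assumptions: pass each thread the address of the seq plus the range of rows it owns. The `Task` type and the `mutateChunk` and `runDemo` procs are hypothetical and not taken from the linked repository. This stays safe only because the ranges are disjoint, no seq is resized while the threads run, and the owning variable outlives them.

```nim
type
  Task = object
    pop: ptr seq[seq[int]]   # address of the shared population, not a copy
    first, last: int         # inclusive range of rows this thread owns

proc mutateChunk(t: Task) {.thread.} =
  for i in t.first .. t.last:
    for j in 0 ..< t.pop[][i].len:
      t.pop[][i][j] += 1     # in-place update; nothing is resized or reallocated

proc runDemo(): seq[seq[int]] =
  ## Builds an 8x4 population of zeros and lets two threads update half each.
  var population = newSeq[seq[int]](8)
  for row in population.mitems:
    row = newSeq[int](4)
  var threads: array[2, Thread[Task]]
  let chunk = population.len div threads.len
  for k in 0 ..< threads.len:
    createThread(threads[k], mutateChunk,
                 Task(pop: addr population, first: k * chunk,
                      last: (k + 1) * chunk - 1))
  joinThreads(threads)       # population must stay alive until here
  result = population
```

Because only a ptr and two ints travel to each thread, the compiler sees no access to a GC-managed global and the GC-safety check does not fire.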
In case it helps, my shared-memory, garbage-collected (and 64-byte aligned) data structure is this:
const FORCE_ALIGN = 64
type
  BlasBufferArray[T] = object
    dataRef: ref[ptr T]
    data*: ptr UncheckedArray[T]
    len*: int
proc deallocBlasBufferArray[T](dataRef: ref[ptr T]) =
  if not dataRef[].isNil:
    deallocShared(dataRef[])
    dataRef[] = nil
proc newBlasBuffer[T](size: int): BlasBufferArray[T] =
  ## Create a heap array aligned with FORCE_ALIGN
  new(result.dataRef, deallocBlasBufferArray)
  # Allocate memory, we will move the pointer, if it does not fall at a modulo FORCE_ALIGN boundary
  let address = cast[ByteAddress](allocShared0(sizeof(T) * size + FORCE_ALIGN - 1))
  result.dataRef[] = cast[ptr T](address)
  result.len = size
  if (address and (FORCE_ALIGN - 1)) == 0:
    result.data = cast[ptr UncheckedArray[T]](address)
  else:
    let offset = FORCE_ALIGN - (address and (FORCE_ALIGN - 1))
    let data_start = cast[ptr UncheckedArray[T]](address +% offset)
    result.data = data_start
I use it for shared-memory parallelism via OpenMP here.
To use it with the convenient array indexing syntax, data is a ptr UncheckedArray[T] instead of just ptr T.
dataRef is just there to make the object GC-managed, I don't use it otherwise.
The core is the same as a seq, except that I allocate the memory manually and it is deallocated automatically via the GC.
An array of data that I allocate manually hence data: ptr UncheckedArray[T]
When I want bounds checking or indexing from the end I need a length, hence len: int
I don't want to manage deallocation manually so I need a ref somewhere and a finalizer so dataRef = ref[ptr[T]] or dataRef = ref[ptr UncheckedArray[T]]
So you have this structure:
type
  BlasBufferArray[T] = object
    dataRef: ref[ptr T]
    data*: ptr UncheckedArray[T]
    len*: int
proc deallocBlasBufferArray[T](dataRef: ref[ptr T]) =
  # finalizer: called by Nim GC
  if not dataRef[].isNil:
    deallocShared(dataRef[])
    dataRef[] = nil
proc newBlasBuffer[T](size: int): BlasBufferArray[T] =
  new(result.dataRef, deallocBlasBufferArray) # new with finalizer
  # Allocate shared memory with allocShared, and get the pointer address
  let address = cast[ByteAddress](allocShared0(sizeof(T) * size))
  # Store the reference, length and pointer to data
  result.dataRef[] = cast[ptr T](address)
  result.len = size
  result.data = cast[ptr UncheckedArray[T]](address)
The rest of the newBlasBuffer proc was to enforce 64-byte alignment. So this is how to control allocation manually but leave deallocation to the GC.
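To illustrate why the data and len fields are kept side by side, here is a hedged sketch of bounds-checked indexing and indexing from the end. `View` is a hypothetical stand-in that carries only those two fields; it leaves out the dataRef finalizer, so the demo frees its memory by hand. `IndexDefect` assumes a reasonably recent Nim.

```nim
type
  View[T] = object                 # hypothetical stand-in for the data/len pair
    data: ptr UncheckedArray[T]
    len: int

proc checkIndex(i, n: int) =
  # UncheckedArray does no checking of its own, so do it here.
  if i < 0 or i >= n:
    raise newException(IndexDefect, "index " & $i & " not in 0 ..< " & $n)

proc `[]`[T](v: View[T], i: int): T =
  checkIndex(i, v.len)
  v.data[i]

proc `[]=`[T](v: View[T], i: int, val: T) =
  checkIndex(i, v.len)
  v.data[i] = val

proc `[]`[T](v: View[T], i: BackwardsIndex): T =
  # v[^1] is the last element, as with seq
  v[v.len - int(i)]

proc demo(): (int, int) =
  let raw = allocShared0(sizeof(int) * 4)
  let v = View[int](data: cast[ptr UncheckedArray[int]](raw), len: 4)
  v[0] = 10
  v[3] = 40
  result = (v[0], v[^1])
  deallocShared(raw)               # manual here; the finalizer does this in the original
```

The same accessors could be written against BlasBufferArray directly, since it exposes data and len under the same names.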