nimforum mirror - Send data structures between threads?

jyelon [2016-08-11T06:18:59+02:00]

I've compiled with --gc:boehm because I want to be able to share data structures between threads.

Despite this, I'm still getting this error from my {.thread.} proc: "procedure is not GC-safe". As I understand it, this is the compiler trying to enforce separate heaps. This surprises me: I assumed that with the boehm GC, this limitation would be lifted. Am I misunderstanding how this works?

Josh

Araq [2016-08-11T14:44:54+02:00]

We don't really want to have a Nim dialect per --gc option, so the frontend checking is not aware of --gc:boehm. You can cast GC safety into existence though and be as dirty as Boehm allows for.

Sketch:

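import threadpool

# nimcall keeps this a plain function pointer; proc types default to closure.
type EnforcedGcSafe = proc() {.nimcall, gcsafe.}

proc myproc =
  discard "... access shared heap here..."

# spawn expects a call expression, so the cast result is called.
spawn cast[EnforcedGcSafe](myproc)()
sync()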
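
For readers on newer Nim versions: the same override can, as far as I know, also be written as a {.cast(gcsafe).} pragma block inside the thread proc, which avoids casting the proc itself. The following is only a minimal sketch under that assumption. The names lookup, total, sumLookup and runSum are made up for illustration, and the code is only safe here because the main thread does nothing but wait while the worker reads the global.

```nim
# Sketch: compile with --threads:on on Nim versions where it is not the default.
var lookup = @[1, 2, 3, 4]   # hypothetical global living on the GC'd heap
var total: int               # plain int, no GC'd memory involved

proc sumLookup() {.thread.} =
  # Without the cast block the compiler rejects this proc as not GC-safe,
  # because it reads the global seq `lookup`.
  {.cast(gcsafe).}:
    for x in lookup:
      total += x

proc runSum(): int =
  total = 0
  var worker: Thread[void]
  createThread(worker, sumLookup)
  joinThread(worker)   # main thread only waits, so `lookup` is not mutated meanwhile
  result = total
```

The cast only silences the check. Whether the access is actually safe still depends on the GC in use and on the threads not mutating the data concurrently.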

jyelon [2016-08-11T17:27:30+02:00]

Thanks for the casting idea.

I really don't think you should think of boehm as "dirty." I mean, compared to what?

Real multi-threaded programs need to store structured data in shared memory. Just to give an example, let's say that this structured data is json. With boehm, I can load the shared json using the existing json module, and the json will be garbage collected when it's no longer referenced. Without boehm, I have to clone the code for the json parser, then alter it to use createShared. My data structures won't be garbage collected. Is that really cleaner?

Or maybe, you just feel it's dirty to use shared structured data at all. If that's the case, then I don't know what to say, other than: I've spoken to many pure functional programmers who felt it was dirty to use mutation, because you can really shoot yourself in the foot with mutation. In a way, they're not wrong. But I'm not giving up on imperative programming.

Araq [2016-08-11T23:51:53+02:00]

Boehm is not a precise GC and thus "dirty".

Without boehm, I have to clone the code for the json parser, then alter it to use createShared. My data structures won't be garbage collected. Is that really cleaner?

No, you only have to protect and dispose the Json. You can wrap the dispose in another ref with a finalizer and have 100% automatic memory management with thread local GCs. That still doesn't make it the most beautiful memory management design out there, but it's not too bad.

jyelon [2016-08-12T00:04:56+02:00]

I don't know what "protect" the json means. I looked in the manual, it doesn't mention a protect statement. I'm also not entirely sure about "dispose", the word "dispose" doesn't appear in the manual, but I know there's a dispose statement that ignores a return value. But I don't see where that comes in. I also don't know what it means to "wrap a dispose in a ref." Long story short: you lost me.

But there are two more things I don't understand:

If the json is stored in a thread-local heap, then I let other threads reference the json, those other threads are going to tend to touch the refcounts. But the refcounts, presumably, aren't atomic ints. How do I avoid accidentally trashing the refcounts?

The compiler is trying very hard to prevent me from passing a ref to the json from the thread that created it to any other thread. (That's what gc-safe is all about). But as you mentioned before, I can force it with a cast. Is that part of the solution you're suggesting?

Edit: I found a protect and a dispose in the system module. They're clearly intended for something having to do with referencing data across heaps, but I just haven't been able to intuit the details. Apparently, dispose doesn't do what I thought it did.

jyelon [2016-08-25T16:43:26+02:00]

Say, I never did get an answer to either of the questions above. I'm still curious:

If the json is stored in a thread-local heap, then I let other threads reference the json, those other threads are going to tend to touch the refcounts. But the refcounts, presumably, aren't atomic ints. How do I avoid accidentally trashing the refcounts?

The compiler is trying very hard to prevent me from passing a ref to the json from the thread that created it to any other thread. (That's what gc-safe is all about). But as you mentioned before, I can force it with a cast. Is that part of the solution you're suggesting?

Araq [2016-08-25T22:40:52+02:00]

How do I avoid accidentally trashing the refcounts?

protect and dispose only give you a pointer that you can cast to ptr JsonObj and so RCs are not affected.

But as you mentioned before, I can force it with a cast. Is that part of the solution you're suggesting?

Well yes. But you really need to be careful, and even then it doesn't support multiple threads creating data and adding it to an existing data structure without copies, so Boehm may indeed be what you want.

jyelon [2016-08-26T15:19:55+02:00]

So, this only sounds half-usable: yes, I can pass the json to some other thread, but then when it arrives at the other thread, I can't call library functions like "getFields" or even 'x == y' to examine the json (these functions take refs). So now I have a ptr to a json object that I can't pass to the json library. That's not really so helpful.

Varriount [2016-08-26T23:57:14+02:00]

In those cases, do you really need to access the exact same memory, or will a copy do?

jyelon [2016-08-27T00:42:55+02:00]

When I think about how this plays out in a large multithreaded program, it worries me.

For example, I used to work on a program that served queries pertaining to products for a shopping website. This program used to load tons of data at program initialization time: it would load up the database of product categories, it loaded tables of statistics about how often various products were purchased, it loaded a script that helped guide serving, and so forth. There was a lot of data, in hundreds of classes, and it took up gigabytes, so keeping multiple copies in RAM wouldn't have been an option.

The libraries that handled this data weren't necessarily written for this multithreaded server. For example, the product category database was used in dozens of programs, most of which were single threaded. The scripting language was just a scripting language - it was used in many programs, again, most of which were single-threaded.

Now, I think: what if I had tried to implement this in Nim? The team that implemented the product category database would probably have written it in the "normal" Nim style, with refs and garbage collection. Likewise for the scripting language. Then, the team that wrote the product server would have looked at these libraries and realized that they couldn't use them in a multithreaded server.

Of course, I can do true sharing using Boehm (ie, using established libraries in a multithreaded program), but the compiler is trying very hard to stop me from doing this. I can currently circumvent it with clever use of typecasts, but will future versions of the compiler be even more aggressive in trying to stop me? Will it always be feasible to use real shared memory in Nim?

I would urge the Nim designers to think of this as the real challenge for multithreading: if I already have a library, can I then take the data from that library and put it in shared memory? Or do I have to rewrite the library? If the latter, then that's seriously limiting to how realistic it is to craft large multithreaded programs.

jyelon [2016-08-27T23:30:50+02:00]

I used the word "database" in the generic sense.

Basically, go to any good shopping website (eg, wayfair, or amazon, or google shopping, for example), and see what that website "knows" about coffee tables. If you take a little time and study the website, I think you'll realize that it's a surprisingly large amount of knowledge. There's a whole expert system in there about coffee tables - and another expert system about bedspreads, and so forth.

At the very base of it all, is a taxonomy:


Home Goods >
  Tools >
    Drills and Rotary Tools

Some product categories fit into the taxonomy in multiple places:


Apparel >
  Shirts >
    Football Jerseys

Sports >
  Sports Memorabilia >
    Football Jerseys

Then, for each category of products, you have data about which products in the category are most popular, data to drive a machine-learning classifier that classifies queries to determine if they belong in that category, etcetera, etcetera. Just take a little time studying one of these websites, and I think you'll come to appreciate how much is there.

So when I say a "database" here, I don't mean that it's SQL. I'm just saying it's complicated data. It's plain old C++ classes, loaded into RAM - but it's legitimately lots of C++ classes, and it takes up lots of RAM.

So imagine, now, that you've got a dozen teams: one is responsible for implementing a machine-learning classifier that differentiates queries about drills from queries about rotary hammers. Another is building a software system whose job is to estimate the popularity of products according to a dozen different metrics. Another is building a recommendation system that uses statistics to recommend "related products." And so forth.

Now, you know that at some point, you're going to need to integrate all this software into a web-server, which will probably be multithreaded.

So what do you do: do you tell these teams not to use refs? Or do you tell them to go ahead and use refs and garbage collection, and deal with the consequences?

Araq [2016-08-28T01:49:44+02:00]

So what do you do: do you tell these teams not to use refs? Or do you tell them to go ahead and use refs and garbage collection, and deal with the consequences?

To be honest I would fire teams who model business taxonomies with C++ classes (OMG), keeping all the data in RAM (OMG) and not using a (noSQL / SQL) database (OMG).

But in the context of Nim, I would likely live with refs, thread local GCs and when somebody queries my data, I would return a copy. Note that copying a query result doesn't mean to "keep full copies of the data around in RAM" which you sort of implied.

Varriount [2016-08-28T02:03:12+02:00]

@Araq This sounds like object mapping, in which data from a database is mapped into a set of objects. That's quite common, and makes working with data much easier.

peheje [2017-10-26T17:59:53+02:00]

Anything new on this? I'm trying to implement a simple genetic algorithm where each thread shall have access to a shared seq[seq[int]]. It's easy to partition the data into chunks so the threads know which items to work on, but Nim seems to IMPLICITLY copy the data, whether I try channels, parallel statements, spawn statements or even "bare" threads. Am I doing something wrong?

I thought that because the object is a ref object only the ref: the address, would be copied?

Code here: https://github.com/peheje/nim_genetic/blob/master/mutable_state.nim

I appreciate that language features like parallel: with spawn do some magic in the background implicitly, but I think being too creative might take many programmers by "unpleasant" surprise.

monster [2017-10-26T20:53:26+02:00]

Hi @peheje. I'm really not the expert on this, as I'm a newbie at Nim myself, but I'm pretty sure I read that everything you send over a Channel is copied. The refs are local to the thread, so if you send data with ref, both the data, and what is pointed by refs, gets cloned "magically". This is why I'm looking into making my own "shared heap" replacements for seq/array, and eventually sets and tables (although, of course, I would rather use an existing "shared heap" implementation of those structures, if one is available in some nimble package).

mratsim [2017-10-26T21:19:16+02:00]

In case it helps, my shared memory, Garbage Collected, data structure (and 64 byte aligned) is this:

const FORCE_ALIGN = 64

type
  BlasBufferArray[T]  = object
    dataRef: ref[ptr T]
    data*: ptr UncheckedArray[T]
    len*: int

proc deallocBlasBufferArray[T](dataRef: ref[ptr T]) =
  if not dataRef[].isNil:
    deallocShared(dataRef[])
    dataRef[] = nil

proc newBlasBuffer[T](size: int): BlasBufferArray[T] =
  ## Create a heap array aligned with FORCE_ALIGN
  new(result.dataRef, deallocBlasBufferArray)
  
  # Allocate memory, we will move the pointer, if it does not fall at a modulo FORCE_ALIGN boundary
  let address = cast[ByteAddress](allocShared0(sizeof(T) * size + FORCE_ALIGN - 1))
  
  result.dataRef[] = cast[ptr T](address)
  result.len = size
  
  if (address and (FORCE_ALIGN - 1)) == 0:
    result.data = cast[ptr UncheckedArray[T]](address)
  else:
    let offset = FORCE_ALIGN - (address and (FORCE_ALIGN - 1))
    let data_start = cast[ptr UncheckedArray[T]](address +% offset)
    result.data = data_start

I use it for shared-memory parallelism via OpenMP here.

To use it with the convenient array indexing syntax, data is a ptr UncheckedArray[T] instead of just ptr T.

dataRef is just there to make the object GC-managed, I don't use it otherwise.

wizzardx [2017-10-29T09:05:27+01:00]

I don't understand it, but nice :-) Have added an issue on the Nim Cookbook for this to be added there.

mratsim [2017-10-29T11:15:55+01:00]

The core is the same as a seq, except that I allocate the memory manually while it is deallocated automatically via the GC.

An array of data that I allocate manually hence data: ptr UncheckedArray[T]

When I want bounds checking or indexing from the end I need a length, hence len: int

I don't want to manage deallocation manually so I need a ref somewhere and a finalizer so dataRef = ref[ptr[T]] or dataRef = ref[ptr UncheckedArray[T]]

So you have this structure:

type
  BlasBufferArray[T]  = object
    dataRef: ref[ptr T]
    data*: ptr UncheckedArray[T]
    len*: int

proc deallocBlasBufferArray[T](dataRef: ref[ptr T]) =
  # finalizer: called by Nim GC
  if not dataRef[].isNil:
    deallocShared(dataRef[])
    dataRef[] = nil

proc newBlasBuffer[T](size: int): BlasBufferArray[T] =
  new(result.dataRef, deallocBlasBufferArray) # new with finalizer
  
  # Allocate shared memory with allocShared, and get the pointer address
  let address = cast[ByteAddress](allocShared0(sizeof(T) * size))
  
  # Store the reference, length and pointer to data
  result.dataRef[] = cast[ptr T](address)
  result.len = size
  result.data = cast[ptr UncheckedArray[T]](address)

The rest of the newBlasBuffer proc was to enforce 64 byte alignment. So this is how to control allocation manually but leave deallocation to the GC.
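
To connect this back to the chunked use case asked about earlier in the thread, here is a rough sketch of several threads writing into one shared buffer without any copying. It is not taken from the code above: Chunk, fillChunk, squaresShared and NumWorkers are hypothetical names, the buffer is freed by hand instead of through a finalizer to keep the example short, and each thread only touches its own index range so no locking is needed.

```nim
# Sketch: compile with --threads:on on Nim versions where it is not the default.
const NumWorkers = 4   # hypothetical fixed worker count

type
  Chunk = object
    data: ptr UncheckedArray[int]  # shared buffer, not traced by the GC
    first, last: int               # inclusive index range owned by one thread

proc fillChunk(c: Chunk) {.thread.} =
  # Only the ptr is passed to the thread; the buffer itself is never copied.
  for i in c.first .. c.last:
    c.data[i] = i * i

proc squaresShared(n: int): seq[int] =
  # Allocate on the shared heap so every thread may touch the memory.
  let buf = cast[ptr UncheckedArray[int]](allocShared0(sizeof(int) * n))
  var workers: array[NumWorkers, Thread[Chunk]]
  let per = (n + NumWorkers - 1) div NumWorkers
  for t in 0 ..< NumWorkers:
    let first = t * per
    let last = min(n, first + per) - 1   # an empty range when first > last
    createThread(workers[t], fillChunk, Chunk(data: buf, first: first, last: last))
  joinThreads(workers)
  # Copy out once for convenience, then release the shared memory manually.
  result = newSeq[int](n)
  for i in 0 ..< n:
    result[i] = buf[i]
  deallocShared(buf)
```

Calling squaresShared(10) should return the squares of 0 through 9. Wrapping buf in a ref with a finalizer, as in the structure above, would replace the manual deallocShared call.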

peheje [2017-11-01T23:50:40+01:00]

Thanks mratsim!

Mirror of forum.nim-lang.org

2457 :: Send data structures between threads?