nimforum mirror - How to use global immutable variables in Threads?

alexeypetrushin [2021-03-26T12:19:52+01:00]

The example below fails to compile, complaining that it's not GC-safe, and the let can't be changed into a const.

How should this be handled? Is there a pragma that tells the Nim compiler to put a let into globally shared memory?

import mimetypes, threadpool

let mime = new_mimetypes()
proc parse_format(): string =
  mime.get_ext("text/html", "unknown")

var cresp = spawn parse_format()
echo ^cresp

Symb0lica [2021-03-26T13:05:46+01:00]

I suspect you will have to pass mime as a parameter (unless someone knows of a better way).

r3c [2021-03-26T13:12:22+01:00]

import mimetypes, threadpool

proc parse_format(mime: MimeDB): string =
  mime.get_ext("text/html", "unknown")

var cresp = spawn parse_format(new_mimetypes())
echo ^cresp

alexeypetrushin [2021-03-26T14:27:22+01:00]

Passing it as a parameter won't work; this is a simplified example. The actual usage is in a multi-threaded web server, where some helper methods detect the MIME type.

Using a gcsafe block forces it to compile, and it works. But I'm not sure how safe that is; maybe it would crash randomly or stop working with the next Nim version.

I also tried using locks, but that doesn't work either.

import mimetypes, threadpool, locks

var mime_lock: Lock
init_lock mime_lock
let mime {.guard: mime_lock.} = new_mimetypes()

proc parse_format(): string {.gcsafe.} =
  with_lock mime_lock:
    mime.get_ext("text/html", "unknown")

var cresp = spawn parse_format()
echo ^cresp


Error: 'parse_format' is not GC-safe as it accesses 'mime' which is a global using GC'ed memory

Hmm, I'm not sure how to use locks, then.

alexeypetrushin [2021-03-26T15:01:46+01:00]

A second thought about forcing it with a `gcsafe` block. Actually, it should be safe to use in this case.

There are going to be no race conditions, as the data is immutable. And this memory can't be freed, as the variable will never be garbage collected, and it lives in the main thread, which is never going to be terminated.

So, it should be safe?

r3c [2021-03-26T15:19:35+01:00]

I've never seen a crash in a gcsafe-marked proc, but if someone knows a better solution, please post it.

alexeypetrushin [2021-03-26T15:25:45+01:00]

The gcsafe block is different from a gcsafe proc. A gcsafe proc is the proper way, verified by the compiler, while the gcsafe block forces the compiler to accept unsafe code anyway.
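To make the distinction concrete, here is a minimal sketch of the block form, assuming a Nim version that supports the `{.cast(gcsafe).}:` spelling (older compilers wrote the block as `{.gcsafe.}:`) and compilation with `--threads:on`. The `mimetype` parameter is added only for illustration. The `{.gcsafe.}` annotation on the proc is what the compiler checks; the block inside it switches that check off for the access to the global.

```nim
import mimetypes, threadpool

let mime = new_mimetypes()

proc parse_format(mimetype: string): string {.gcsafe.} =
  # Without this block the compiler rejects the proc, because `mime` is a
  # global that uses GC'ed memory. The block only silences the check; it
  # adds no synchronisation and no safety guarantee.
  {.cast(gcsafe).}:
    result = mime.get_ext(mimetype, "unknown")

let cresp = spawn parse_format("text/html")
echo ^cresp
```

Whether this is actually safe depends on the concerns raised later in the thread; the sketch only shows what "forcing" the compiler looks like.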

boia01 [2021-03-26T15:28:23+01:00]

The general rule is to avoid sharing refs across threads. Nim's main GC algorithms (ARC, ORC) do not provide the guarantee that reference-counting is thread-safe.

And more specifically, even protecting access to the root of an object graph through a lock is generally insufficient, because even if your object graph is immutable, the GC can mutate the refcount under the hood in a non-thread-safe way. All access to all objects in the graph would need to be protected through a lock or some other consistency mechanism.

In your case, if new_mimetypes() is an expensive operation, you could store the result in thread-local storage to cache the result locally and avoid sharing across threads.

Something like this:

import mimetypes, threadpool, options

var localMimeTypes {.threadvar.}: Option[MimeDB]

proc getMimeTypes(): MimeDB =
  if localMimeTypes.isNone:
    localMimeTypes = some(new_mimetypes())
  result = localMimeTypes.get

proc parse_format(): string =
  getMimeTypes().get_ext("text/html", "unknown")

var cresp = spawn parse_format()
echo ^cresp

boia01 [2021-03-26T16:06:31+01:00]

@alexeypetrushin Sorry, I missed your postscript:

P.S. There's also a way to use {.threadvar.} and keep a separate copy for each thread. But it defeats the whole point of having a multi-threaded server that could optimise memory by sharing some common data between threads.

Yes, you're right that it's sub-optimal, and I don't believe there's a way to do this safely with refs today in the general sense, without having to deal with locking/consistency mechanisms and taking specific access patterns into consideration.

If I understand correctly, this may become possible through the use of view types. https://nim-lang.org/docs/manual_experimental.html#view-types

In the meantime, there are lots of ways to share things across threads (sharedtables, the smartptrs module from fusion, etc.). There's just no great solution for refs, AFAIK.

alexeypetrushin [2021-03-26T16:54:23+01:00]

there are lots of ways to share things across threads

I was wondering, would it currently be possible in Nim to do something like a database query? Make one thread responsible for storing a large data object. And while not sharing the data directly, allow other threads to query it (submit a query function and get back a small chunk of non-ref data)?

Something like the pseudocode below:

import mimetypes, threadpool, strformat

# Data thread, storing huge data object -----------------------
proc run_data_thread() {.thread.} =
  # Large data object, available only for this thread
  let mime = new_mimetypes()
  # Listening for query requests from other threads, and responding with only
  # small portion of data, the result of the query.
  on_query((query_fn, arg) => reply(query_fn(mime, arg)))

var data_thread: Thread[void]
createThread[void](data_thread, run_data_thread)


# Some other thread -------------------------------------------
proc run_some_thread() {.thread.} =
  # Querying the "Data thread" for some information
  let format = data_thread.query((mime, arg) => mime.get_ext(arg), "text/html")
  echo format

var some_thread: Thread[void]
createThread[void](some_thread, run_some_thread)
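One way the pseudocode above could be approximated with the built-in `Channel` type is sketched below, assuming compilation with `--threads:on`. The names `requests`, `replies`, `query`, and `demo` are hypothetical, and instead of submitting a query function the caller sends a plain string and gets a plain string back. The empty string is used as an arbitrary stop signal, and the single reply channel means this is only correct with one querying thread at a time.

```nim
import mimetypes

var
  requests: Channel[string]  # mimetype to look up; "" tells the data thread to stop
  replies: Channel[string]   # small copied result, never a ref into the MimeDB

proc run_data_thread() {.thread.} =
  # The large data object lives only in this thread.
  let mime = new_mimetypes()
  while true:
    let mimetype = requests.recv()
    if mimetype.len == 0: break
    replies.send(mime.get_ext(mimetype, "unknown"))

proc query(mimetype: string): string =
  # Blocking round trip to the data thread.
  requests.send(mimetype)
  result = replies.recv()

proc demo(mimetypes: openArray[string]): seq[string] =
  requests.open()
  replies.open()
  var data_thread: Thread[void]
  createThread(data_thread, run_data_thread)
  for m in mimetypes:
    result.add query(m)
  requests.send("")          # ask the data thread to stop
  joinThread(data_thread)
  requests.close()
  replies.close()

echo demo(["text/html", "no/such-type"])
```

Supporting several querying threads would need a reply channel per caller (or some request id), which is left out here to keep the sketch short.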

alexeypetrushin [2021-03-26T17:19:08+01:00]

I just realised there's also the ptr. Could it be used?

import mimetypes, threadpool, strformat

var mime: ptr MimeDB
mime = create(MimeDB)
mime[] = new_mimetypes()

proc parse_format(): string {.gcsafe.} =
  mime[].get_ext("text/html", "unknown")

var cresp = spawn parse_format()
echo ^cresp

shirleyquirk [2021-03-26T20:06:56+01:00]

Yes, this method is mentioned in the docs: https://nim-lang.org/docs/channels.html#example-passing-channels-safely

boia01 [2021-03-28T20:24:50+02:00]

@alexeypetrushin Unfortunately, accessing any graph containing ref objects from multiple threads is generally unsafe. This is true even if the access starts through a pointer; the pointer doesn't make it any safer, even if the graph is immutable. It would work if it were just a plain object structure (no refs), but it isn't foolproof if it involves refs.
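For contrast, a minimal sketch of the "plain object structure" case, assuming `--threads:on`. The `Limits` type and its fields are hypothetical; the point is only that the object holds no refs, strings, or seqs, so nothing reference-counted sits behind the pointer, and the data is written once before any thread is spawned.

```nim
import threadpool

type
  Limits = object        # hypothetical plain data: no refs, strings, or seqs
    max_conns: int
    timeout_ms: int

# Allocated on the shared heap and filled in before any thread reads it.
var limits: ptr Limits = createShared(Limits)
limits.max_conns = 100
limits.timeout_ms = 250

proc budget(): int {.gcsafe.} =
  # Read-only access through a ptr to plain data; the compiler accepts this
  # as gcsafe because no GC'ed memory is involved.
  limits.max_conns * limits.timeout_ms

let r = spawn budget()
echo ^r
```

A `MimeDB` does not fit this pattern, since it holds GC-managed data internally.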

AFAIK, the use of refs across threads can work [*] only under the following conditions:

- proper memory barriers are in place to guarantee consistency of the graph during access
- the graph isn't being mutated
- the reachability of the graph is guaranteed (the graph won't be GC'ed concurrently)
- no external references into the graph are added/removed

[*] I'm using "can work" in a narrow sense here because it's still generally unsafe because maintaining all the conditions above is challenging (understatement). The compiler currently doesn't help provide these guarantees.

The last condition is particularly difficult to enforce because it's easy to create external references into the graph without realizing it. ("But Maaaaa, I'm not mutating anything!")

# this "works"
let foo = someGraph.somePath.someRef

# this will get you into trouble
var foo = someGraph.somePath.someRef

# this will also get you in a world of trouble
myLocalObject.fooRef = someGraph.somePath.someRef

Notice how the code above never mutates the graph explicitly! But the graph is still mutated under the hood by the GC to maintain reference counts.

We're "lucky" that some of Nim's popular types (e.g., strings, seqs) have value semantics instead of being refs ... otherwise, we'd see memory corruption errors pop up a lot more frequently.

To give an example closer to home, you might just be accessing a JsonNode (which is a ref, btw) and putting the node into another data structure; e.g. jsonResponse.body = sharedJsonNode looks totally innocuous. Now your code contains a potential race under multi-threading, which may someday lead to memory corruption.

PS: I don't want to spread FUD, so if anybody knows about this better than I do, please correct me. I'm not an authority on this by any means; this is just my understanding based on reading about ARC/ORC, destructors, comments on GitHub issues, etc.

Mirror of forum.nim-lang.org

7703 :: How to use global immutable variables in Threads?