Here is what I have so far:
http://nimrod-lang.org/blog/concurrency.html
http://nimrod-lang.org/blog/concurrency2.html
Please comment but don't reddit yet!
"You can simulate shared memory via message passing (which is what modern hardware does, to some extent) and you can use message passing to simulate shared memory."
Might want to look at that sentence again.
How do you assign to it over and over?
let it = b
while it != nil:
  if it.word == word:
    inc it.counter
    return
  it = it.next
Orion: Might want to look at that sentence again.
I don't see the problem with the sentence?
Orion: How do you assign to it over and over?
That should probably be:
var it = b
...
Some more feedback:
The example actually illustrates what I think are the major issues:
Furthermore, there's the following:
(Note that most of these are issues with the implementation and/or style; I think the basic approach is mostly sound, though I'd still like to give it a closer look once I have more time.)
[1] As an aside, I'm not a big fan of naming types based on computer science jargon. Types such as Unit and Future make perfect sense in the context of the work that introduced them, but are likely to be confusing to a newcomer who isn't immersed in the background.
*(1) No support for shared ref (yet). You'll notice that you'll never be able to free the contents of the hash table, as it contains data from 256 different worker threads, all mixed up.*
No. Every thread uses allocShared. It's all in the same heap.
I note also that allocShared() and allocShared0() currently have an int argument rather than a type argument, so it may be that you have changes planned that I don't know yet.
Indeed. allocShared gets a safer interface. It now returns a typed shared ptr.
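For what it's worth, here is a sketch of what such a typed interface could look like. The generic signature and the name allocSharedT are my guesses, not the actual planned API; allocShared0 is the existing untyped primitive:

```nim
# Hypothetical typed wrapper over the raw shared allocator.
proc allocSharedT[T](): ptr T =
  # allocShared0 returns zeroed memory, so the object starts in a
  # well-defined state.
  cast[ptr T](allocShared0(sizeof(T)))

# Usage sketch:
type Counter = object
  word: array[0..31, char]
  counter: int

let c = allocSharedT[Counter]()  # typed pointer into the shared heap
c.counter = 1
deallocShared(c)                 # still freed manually
```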
(4) As I've mentioned before, there seems to be currently no support for read/write locks. Having multiple concurrent readers is critical for a number of performance-sensitive applications. A simple example is a parallel matrix multiplication operation, where you want to be able to obtain read locks rather than write locks on at least one of the input matrices.
Yeah, well, I don't have enough experience with read/write locks to integrate them properly. Of course, wrapping the POSIX stuff works, and then you need the lockfree environment that I mention to keep the type system happy. But feel free to improve the system!
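For reference, wrapping the POSIX primitives is straightforward; a minimal sketch (the pthread signatures follow pthread(3), but the template names and the overall shape are made up, not a proposed stdlib API):

```nim
type PthreadRwlock {.importc: "pthread_rwlock_t",
                     header: "<pthread.h>".} = object

proc pthread_rwlock_rdlock(lock: ptr PthreadRwlock): cint
  {.importc, header: "<pthread.h>".}
proc pthread_rwlock_wrlock(lock: ptr PthreadRwlock): cint
  {.importc, header: "<pthread.h>".}
proc pthread_rwlock_unlock(lock: ptr PthreadRwlock): cint
  {.importc, header: "<pthread.h>".}

template withReadLock(lock: var PthreadRwlock, body: stmt) =
  # Multiple readers may hold the lock concurrently.
  discard pthread_rwlock_rdlock(addr lock)
  try:
    body
  finally:
    discard pthread_rwlock_unlock(addr lock)

template withWriteLock(lock: var PthreadRwlock, body: stmt) =
  # A writer waits until all readers have released the lock.
  discard pthread_rwlock_wrlock(addr lock)
  try:
    body
  finally:
    discard pthread_rwlock_unlock(addr lock)
```

The hard part, as discussed below, is not the wrapping but teaching the type system that read-locked data must not be mutated.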
(5) It looks as though spawn() and sync() are intended to work at the OS thread level (unless I'm misunderstanding you, but this is what the reference to createThread() seems to imply). Creating an OS thread is potentially a pretty heavyweight operation, and Cilk and other languages that use this construct have their own scheduler with lightweight threads for that reason (I think the Cilk reference manual mentions that creating one of their threads is only moderately more expensive than a procedure call).
No; it indeed follows Cilk's approach: spawn hands the task to an underlying thread pool rather than creating an OS thread.
(6) It is unclear how spawn() and sync() deal with thread-local variables. There does not seem to be a place to provide hooks to have them initialized/cleaned up, and Nimrod does not allow them to be initialized in their declaration. (Actually, that's a bit of an issue with the current way Nimrod handles them, too.) While you can add such code in the main function, thread-local variables are often a hidden implementation detail of a module and the caller may not know how to initialize them.
That's the price of "light-weight" tasks, and why there are both spawn and createThread. If your design maps thread-local storage to task-local storage, you need more control and thus createThread.
(7) I'm not clear what Future[T] [1] being read destructively means when somebody actually tries to read it twice. Will it throw an exception, return nil or something else, etc.? I'm also not sure why it's necessary to introduce an operator for this rather than a function.
An operator is cooler. ;-) Whether it returns default(T) or raises an exception depends on what the implementation suggests. IMO it needs to be as fast as possible. However, read could also transmit any exception that the spawned proc raised to the thread that performs the read.
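For illustration, a minimal sketch of the default(T) variant of destructive reads (an entirely made-up implementation; the real Future[T] and its read operator may well behave differently, e.g. raise instead):

```nim
type Future[T] = object
  ready: bool
  value: T

proc `^`[T](f: var Future[T]): T =
  ## Destructive read: the first read yields the value; any later
  ## read yields default(T) (alternatively, it could raise).
  if f.ready:
    f.ready = false
    result = f.value
    f.value = default(T)   # drop our copy so it can be collected
  # else: result is already default(T)
```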
Araq: No. Every thread uses allocShared. It's all in the same heap.
Not the strings that you store (in the line x.word = word). When a string gets assigned, the deep copy implicitly does a shared allocation (at least looking at the code that's currently being generated).
word is declared as an array of char for this reason, but the code indeed uses == and =. Will correct it. And the type system catches the bug you mention: string can't be embedded in a shared ptr.
Araq: word is declared as an array of char for this reason, but the code indeed uses == and =.
Ah, I had overlooked it in the object definition (the inc procedure takes a string argument).
Araq: And the type system catches the bug you mention: string can't be embedded in a shared ptr.
This will severely limit its usefulness, then (I'm assuming the same holds for seq). In your example, you're essentially forced to either limit the maximum word length that you can store or to duplicate functionality that you already have elsewhere.
The underlying problem is that there's no shared ref (yet), which I would still argue is the more important use case (you're simply losing too much core functionality of Nimrod when you're forced to manually manage memory).
Araq: Yeah well, I have not enough experience with read/write locks to integrate them properly.
In principle, they do not work much differently from mutexes, except that you have to specify whether you want a read lock or a write lock (your deadlock avoidance approach should carry over, more or less). At the language level, you must also be able to guarantee that any data for which you only acquire a read lock does not actually get modified for the duration of the read lock.
The last example in the first part is confusing, since entry 9 is not protected by lock 0 and there is no indication where this mapping comes from. Before the example, a sentence like this could be added: "This system requires the programmer to know which lock corresponds to which variables. In the following case we will distribute the array of ints evenly so that eight consecutive values are handled by the same lock."
In the second article spawn is a teaser with "passes a thread proc to some underlying thread pool implementation and causes it to be run concurrently". spawn, group and sync imply that there is some form of serialisation. That seems to map well to GCD; do you plan on implementing Nimrod's thread pools on top of such systems (be it GCD or other)?
gradha: > That seems to map well to GCD; do you plan on implementing Nimrod's thread pools on top of such systems (be it GCD or other)?
I don't plan any GCD integration, but it seems like it's possible in theory so perhaps later we can add that.
"You can simulate shared memory via message passing (which is what modern hardware does, to some extent) and you can use message passing to simulate shared memory."
It says you can simulate shared memory with message passing, AND that you can use message passing to simulate shared memory: the same thing twice.
Araq: That essentially means we have not lock levels but read lock levels and write lock levels, right? That also means we need guarded read ptr and guarded write ptr afaict.
Neither seems to be necessary. A total or partial ordering that makes mutex acquisition deadlock-free will also make read/write lock acquisition deadlock-free (been there, done that). Read/write locks have extra potential for deadlock only because two threads that simultaneously want to upgrade a read lock to a write lock will deadlock, and any irreflexive order prevents that. There is no need for separate guarded ptr types, either: all you need to ensure is that a value locked as readonly in a lock statement is not modified within that lock statement (i.e., does not have any of its fields assigned to and does not get passed as a var argument to a procedure). Example (with made-up syntax):
type L = ...
var a, b: L
proc reset(x: var L) = ...
...

lock a.rwlock as readonly, b.rwlock as readwrite:
  b.data = a.data  # correct
  a.data = b.data  # compile-time error
  reset(b)         # correct
  reset(a)         # compile-time error
Edit: Nevermind, I think I misinterpreted the {.guard.} pragma in part 2 to replace the protection mechanism in part 1 rather than to complement it.
Looking at the {.guard.} pragma, I'd recommend an improvement to it.
Right now, if you want to protect an entire object by it (which is a rather common case), you have to tag each and every field. That's just not very extensible. It would be nice to be able to do something like:
type Lockable = guarded ptr object {.inheritable, guard: accessLock.}
  accessLock: Lock[0]

template exclusive(obj: Lockable, body: stmt): stmt =
  lock obj.accessLock:
    body
to guard an entire object in case you don't want/need to micromanage access to individual fields.
Jehan: It would be nice to be able to do something like: object {.inheritable, guard: accessLock.}
Yeah ok. And your proposal that includes reader/writer locks looks good too.
Jehan: The underlying problem is that there's no shared ref (yet), which I would still argue is the more important use case (you're simply losing too much core functionality of Nimrod when you're forced to manually manage memory).
Ok, here is an idea to deal with this problem: We introduce shared ref, shared string and shared seq but implement them as a pair (pointer to allocator, ptr) and require manual memory management for them: An assignment of the form sharedRef = someRef then translates into: GC_ref(someRef); sharedRef = (<thisAllocator>, someRef). Since this runs in the thread that owns the allocator this requires no synchronization.
However, the memory needs to be freed explicitly with GC_unref. Since this might be invoked by some other thread, it would require synchronization. Instead we put these unref requests into a (protected) lockfree queue that belongs to the allocator. When the GC that belongs to this allocator runs again, the queued entries are processed (the GC decrefs them).
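To make the idea concrete, a rough sketch of the (allocator, pointer) pair scheme described above. Allocator, LockfreeQueue, thisAllocator and toShared are made-up names for illustration only:

```nim
type
  Allocator = object
    pendingUnrefs: LockfreeQueue[pointer]  # drained by the owning GC

  SharedRef[T] = object
    allocator: ptr Allocator  # the heap/thread that owns the data
    data: ref T

proc toShared[T](someRef: ref T): SharedRef[T] =
  # Runs in the thread that owns the allocator, so no
  # synchronization is needed for the GC_ref.
  GC_ref(someRef)
  SharedRef[T](allocator: thisAllocator(), data: someRef)

proc release[T](x: SharedRef[T]) =
  # May be called from any thread: instead of unref'ing directly,
  # enqueue the pointer; the owning GC decrefs it on its next run.
  x.allocator.pendingUnrefs.enqueue(cast[pointer](x.data))
```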