I am trying to test memory fragmentation after thread destruction, but I don't know how to destroy threads properly. What is wrong with this program?
import os

var threads: seq[Thread[int]]

proc wait(s: int) =
  for i in 0..10:
    var x = newStringOfCap(100000*(11-i))
    os.sleep(s*1000)

proc cycle() =
  for i in 0..threads.high:
    assert(not running(threads[i]))
    createThread(threads[i], wait, 1)
  echo "Joining", threads.len
  joinThreads(threads)
  echo "Joined", threads.len

proc main() =
  newSeq(threads, 4)
  for i in 0..30:
    cycle()

main()
Joining4
Joined4
Joining4
.... (hangs here)
Interesting. I got it working by moving newSeq(threads, 4) into cycle(). I guess once a thread has finished running, simply releasing the Thread[] object to the GC is the way to destroy it. They are apparently not re-usable.
I haven't been able to cause fragmentation after thread destruction, which is good. But I don't yet know the performance penalty of continually creating new threads, and whether it's justified by the savings in reduced fragmentation...
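For reference, here is a minimal sketch of the working pattern described above (fresh Thread objects allocated inside cycle() each time, never re-used):

```nim
import os

proc wait(s: int) {.thread.} =
  os.sleep(s * 1000)

proc cycle() =
  # Allocate fresh Thread objects each cycle; the previous batch is
  # simply dropped for the GC to reclaim once joined.
  var threads: seq[Thread[int]]
  newSeq(threads, 4)
  for i in 0 .. threads.high:
    createThread(threads[i], wait, 1)
  joinThreads(threads)

for i in 0 .. 30:
  cycle()
```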
I haven't been able to cause fragmentation
Are you sure that
var x = newStringOfCap(100000*(11-i))
is really contained in the executable? Well, maybe with -O0, but perhaps the Nim compiler already removes it?
@cheatfate, Interesting, and good timing.
What about this bit of test/threads/treusetvar.nim for your new code:
+for i in 0..(ThreadsCount - 1):
+  var thread: Thread[Marker]
+  createThread(thread, worker, p)
+  joinThread(thread)
+echo p.counter
Shouldn't var thread: be outside the for-loop?
I tried it with my month-old Nim, and it does indeed hang either way. But why? Isn't var a brand new thing on the stack each time through the for-loop, fully zeroed?
My guess: The ThreadId was simply the address, and in this case the address does not change. And that's why it was so important to make Thread objects re-usable. Yes?
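One way to check the address part of this guess (a sketch; stack layout is implementation-dependent, but a loop-local var typically lands at the same offset every iteration):

```nim
proc demo() =
  for i in 0 .. 2:
    var x: int
    # Usually prints the same address on each pass through the loop,
    # since the loop body re-uses the same stack slot.
    echo cast[int](addr x)

demo()
```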
@cheatfate, Wonderful! Not just a new feature, but also a fix for a dangerous bug.
Independent of that, I have found that create/destroy of threads is very fast (on OSX). So Araq's idea of relying on thread destruction for fast memory clean-up works beautifully.
(Large memory allocation is still expensive, but that's a separate issue, possibly just the cost of bzero.)
Another question: When a thread ends (for later re-spawning) in the threadpool, will it now have its heap quickly-cleaned, as for thread destruction?
Alternatively, are threadvars persistent across spawns of the same thread (by chance) in a threadpool?
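A quick probe of the threadvar question, assuming the stock threadpool module: if a spawned proc ever observes a per-thread counter above 1, the pool re-used a thread without clearing its threadvars.

```nim
import threadpool

var callCount {.threadvar.}: int   # per-thread counter

proc probe(): int =
  inc callCount    # persists across spawns landing on the same pool thread
  result = callCount

var results: seq[int] = @[]
for i in 0 ..< 8:
  results.add(^spawn probe())
sync()
# Any value > 1 means a pool thread was re-used with its threadvars intact.
echo results
```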
Maybe there is still a bug in Thread? I now use threads in a very simple way:
for q in get_seq_data(config, min_n_read, min_len_aln):
  var (seqs, seed_id) = q
  log("len(seqs)=", $len(seqs), ", seed_id=", seed_id)
  var cargs: ConsensusArgs = (inseqs: seqs, seed_id: seed_id, config: config)
  if n_core == 0:
    process_consensus(cargs)
  else:
    var rthread: ref Thread[ConsensusArgs]
    new(rthread)
    createThread(rthread[], process_consensus, cargs)
    joinThread(rthread[])
... (threadpool first creates 48 threads, even though I do not use threadpool.)
[New Thread 0x7ffff015a700 (LWP 202052)]
[New Thread 0x7fffefedb700 (LWP 202053)]
[New Thread 0x7fffefbdc700 (LWP 202054)]
main(n_core=1)
len(seqs)=25, seed_id=2
[New Thread 0x7fffef52b700 (LWP 202055)]
[Thread 0x7fffef52b700 (LWP 202055) exited]
len(seqs)=98, seed_id=14
[New Thread 0x7fffef52b700 (LWP 202056)]
[Thread 0x7fffef52b700 (LWP 202056) exited]
len(seqs)=58, seed_id=15
[New Thread 0x7fffef52b700 (LWP 202057)]
[Thread 0x7fffef52b700 (LWP 202057) exited]
len(seqs)=43, seed_id=22
[New Thread 0x7fffef52b700 (LWP 202058)]
[Thread 0x7fffef52b700 (LWP 202058) exited]
len(seqs)=55, seed_id=25
[New Thread 0x7fffef52b700 (LWP 202059)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffef52b700 (LWP 202059)]
deallocOsPages_e5IRqVbks39a9bBzvLjGxw2g (a=0x7ffff7f3d0c8) at /home/UNIXHOME/cdunn/repo/gh/Nim/lib/system/alloc.nim:740
740 osDeallocPages(it, it.origSize and not 1)
(gdb) bt
#0 deallocOsPages_e5IRqVbks39a9bBzvLjGxw2g (a=0x7ffff7f3d0c8) at /home/UNIXHOME/cdunn/repo/gh/Nim/lib/system/alloc.nim:740
#1 0x00000000004143f3 in deallocOsPages_njssp69aa7hvxte9bJ8uuDcg_3 () at /home/UNIXHOME/cdunn/repo/gh/Nim/lib/system/gc.nim:107
#2 threadProcWrapStackFrame_dXJaXMz804k05DGz7X4RkA (thrd=0x7ffff7f79328) at /home/UNIXHOME/cdunn/repo/gh/Nim/lib/system/threads.nim:427
#3 threadProcWrapper_2AvjU29bJvs3FXJIcnmn4Kg_2 (closure=0x7ffff7f79328) at /home/UNIXHOME/cdunn/repo/gh/Nim/lib/system/threads.nim:437
#4 0x00007ffff76ba182 in start_thread (arg=0x7fffef52b700) at pthread_create.c:312
#5 0x00007ffff73e700d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) l
735 when defined(debugHeapLinks):
736 cprintf("owner %p; dealloc A: %p size: %ld; next: %p\n", addr(a),
737 it, it.origSize and not 1, next)
738 sysAssert it.origSize >= PageSize, "origSize too small"
739 # note:
740 osDeallocPages(it, it.origSize and not 1)
741 it = next
742 when false:
743 for p in elements(a.chunkStarts):
744 var page = cast[PChunk](p shl PageShift)
(gdb) p it
$1 = (BigChunk_Rv9c70Uhp2TytkX7eH78qEg *) 0x101010101010101
That is with Nim origin/devel up-to-date, at
commit 172a9c8e97694846c3348983a9b2b7c2931c939d
Author: Dominik Picheta <[email protected]>
Date: Mon Mar 27 12:14:06 2017
My program works fine without threads (n_core=0). It worked fine when I used threadpool.
Another problem with this approach is that it goes 3x slower (despite using GC_disable within the thread) than my single-threaded version, which was 3x faster than C+Python/multiprocessing. Very disappointing. The single-threaded version also suffers an explosion in memory fragmentation, though not as bad as before I started re-using strings and seqs within each task.
So at this point, I've lost my runtime advantage; I have to jump through hoops to avoid memory fragmentation (compared with Python multiprocessing); and now I have this seg-fault.
If anyone wants to debug this, let me know. I can put together a full test-case (via my corporate cloud server). I have 3 test-cases: 75k, 1.4M, and 800M. This crash happens only on the largest, but at least it happens pretty quickly.
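For context, the GC_disable-within-the-thread pattern mentioned above looks roughly like this (a sketch with a hypothetical worker body, not my actual process_consensus):

```nim
proc worker(n: int) {.thread.} =
  GC_disable()           # suppress GC cycles during the hot loop
  var total = 0
  for i in 0 ..< n:
    var s = newStringOfCap(1024)
    s.add('x')
    total += s.len
  GC_enable()            # re-enable before the thread exits and its heap is torn down
  echo "processed ", total

var t: Thread[int]
createThread(t, worker, 1000)
joinThread(t)
```

The idea is that per-task garbage is reclaimed wholesale when the thread's heap is destroyed, so running the collector mid-task buys nothing.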
Sorry, I couldn't test this program
Could you tell me the problem? @bpr got it working (and failing). Try a fresh clone.
I'm guessing that @Araq meant he didn't have time to download and try out the program at all, not that he downloaded and couldn't get it to work. After the initial problem was fixed it was quite simple to get the results you described (thanks!) so I can't imagine it was a problem there.
I also had little time to experiment so I have no new info. Also, I'm hoping that @Araq or someone else gets there first and solves it :-)
With that fix on devel, it no longer hangs on OSX, but it still seg-faults on Ubuntu.
Also, on OSX it runs fine, seems fast, but sucks up lots of virtual memory -- about 10GB/min (yes, ten). I have only 8GB of real RAM on my Mac, so I was surprised to see a process taking 40GB. New threads are clearly not re-using the released memory of discarded threads.
Excellent!
Yes, it is working now, on both Ubuntu (and Centos6.6, built on Ubuntu) and OSX. With one worker thread, memory consumption is very low and stable, around 300MB. Beautiful!
With the earlier fix, there was actually a disturbing diff between expected and new output, indicating a really subtle memory bug, but that is fixed on origin/devel now too.
I will concentrate on runtime next, and experiment with multiple threads.
The profiler seems to hang when using more than 1 worker thread. Is that expected? Unsupported?
@bpr, could you verify? I have pushed an update that supports N threads, stored in a seq. Until we are sure, let's discuss this via email, or in:
Now the problem is a sudden jump in freemem on the main thread, and huge use of virtual memory. E.g.
+ log("tot=$1 occ=$2, free=$3 b4" % [$getTotalMem(), $getOccupiedMem(), $getFreeMem()])
+ GC_fullCollect()
+ log("tot=$1 occ=$2, free=$3 now" % [$getTotalMem(), $getOccupiedMem(), $getFreeMem()])
$ time N=4 SIZE=huge make
../main.exe --output_multi --min_idt 0.70 --min_cov 4 --max_n_read 500 --n_core 4 > out.nim.fasta < data/la4.huge/huge.la4falcon
main(n_core=4)
len(seqs)=25, seed_id=2
tot=4206592 occ=3895296, free=311296 b4
tot=4206592 occ=1511424, free=2695168 now
len(seqs)=98, seed_id=14
tot=12738560 occ=5505024, free=7233536 b4
tot=12738560 occ=5517312, free=7221248 now
len(seqs)=58, seed_id=15
tot=8822784 occ=8105984, free=716800 b4
tot=8822784 occ=5414912, free=3407872 now
...
len(seqs)=37, seed_id=42
tot=9052160 occ=6340608, free=2711552 b4
tot=9052160 occ=5013504, free=4038656 now
len(seqs)=49, seed_id=43
tot=9052160 occ=6770688, free=2281472 b4
tot=9052160 occ=6270976, free=2781184 now
len(seqs)=55, seed_id=53
tot=2156535808 occ=7073792, free=2149462016 b4 !!!!!!!!!!!
tot=2156535808 occ=6787072, free=2149748736 now !!!!!!!!!!!
len(seqs)=50, seed_id=57
tot=9445376 occ=7110656, free=2334720 b4 ???
tot=9445376 occ=6815744, free=2629632 now
len(seqs)=29, seed_id=58
tot=9445376 occ=7041024, free=2404352 b4
tot=9445376 occ=6066176, free=3379200 now
...
See the sudden jump? Something weird is going on. (That is with GC_fullCollect() between "b4" and "now" on the main thread.)
Virtual memory jumps around, as low as 1TB and as high as 60TB. So I think there is still a problem, though not as bad as before: no crash, no increase in system RAM, decent runtime.
@bpr never duplicated that problem, even on Ubuntu14. Things are fine on my MacAir, and on a Centos7 machine at work. But the Ubuntu14 machines at work have this strange behavior -- except they seem fine when my Nim program is run in serial mode.
Part of the problem was me. I had work-arounds in the threadpool library which had a strange effect with Araq's new code. Without those work-arounds, virtual memory is stable, but runtime is still horrible on the Ubuntu14 virtual machines. So I can't explain it, but I guess it's something with the set-up at work.