Dear all. After hours of research and googling for a possible solution I cannot find the answer, so here goes my first post on this forum. I need a memory-wise optimized executable, but with the use of dynamic MM. Below is a shortened version of a similar program structure. The difference is that the real program does not do silly things like copying and useless data processing:
import
  tables,
  os

# Definition of input params for the Thread
type
  param = tuple[tbl: ptr OrderedTable[string, seq[string]]]

# Thread procedure. Just for test it stores in the Dictionary the key and a sequence of strings with the same value.
proc theProc(attr: param) {.thread.} =
  var
    labels = @["Black Demon", "Turtoise", "Bill Torvalds", "BUG", "Black Demon", "Bill Torvalds", "Linus Gates", "Linus Gates", "Linus Gates"]
    strings: ref seq[string] = new(seq[string])
  `=copy`(strings[], labels)
  for i in strings[]:
    # debugEcho i
    if not attr.tbl[].hasKey(i):
      attr.tbl[][i] = @[]
    var str: string = newStringOfCap(i.len)
    `=copy`(str, i)
    attr.tbl[][i].add str
  # Emptying the sequences. I expect the Collector will free them
  labels = @[]
  strings[] = @[]

proc main(): void =
  # Shared ram Table, thread input params (attributes), and thread var.
  var
    theTable = createShared(OrderedTable[string, seq[string]], 1)
    attr = (tbl: theTable)
    thread = Thread[param]()
  # Creating the thread
  createThread(thread, theProc, attr)
  # Awaiting the thread to finish execution
  thread.joinThreads()
  # Calculating the Table contents size, and printing out the contents of theTable
  var size: int = 0
  debugEcho "\n Printing result from main():"
  for key, sequence in theTable[].pairs:
    size.inc(key.len)
    for i in sequence:
      size.inc(i.len)
    debugEcho "- ", key, sequence
  if not running(thread):
    freeShared(theTable)
    GC_fullCollect()
    debugEcho "--------------------------------------------"
    debugEcho "Size of theTable content is: ", size, ". But resources were not freed by the collector. See Resident (RES) in linux 'top' for example."
    sleep(120000)
  else:
    debugEcho "Thread still running, impossible scenario."

main()
My results with the use of different GCs:
arc: 786 bytes
orc: 892 bytes
boehm: 2410 bytes
regions: 904 bytes
Meanwhile the size of the resulting Table content is 138 bytes. I know the resident data in RAM contains the stack too, but in the case of huge data processing the usage grows to gigabytes of not-freed RAM until the process quits. How to deal with it?
But resources were not freed by the collector. See Resident (RES) in linux 'top' for example.
The allocator is not obliged to return memory back to the OS; instead it reuses the memory for itself. Also, your threading code is broken and your code style is alien.
Oh, thanks Araq for pointing me to the problem in the code. I did not expect that the language inventor would review it, otherwise I would have prepared much better! So regarding the GC, does it mean that even with --d:malloc the allocator will hold that memory until the whole process quits? Along with that mistake in the code, I also forgot to mention that I specified this parameter in nim.cfg. If so, I really don't understand what to do with it. Switch to manual memory management? What options do we have when a program needs to actively use RAM, and alloc/dealloc objects with an immediate real free()? I'm not very experienced in different GCs and their working principles, but I thought the GC would use the RAM optimally. But with the current example, such an executable can hold a lot of RAM on the server while another heavy executable may need it. For example, until the exe saves the processed data to disk, all those gigabytes will be held.
Here is the updated code, with the corrected Thread variable declaration. I also removed unnecessary stuff, so as not to confuse others with my unique coding style :)
import
  tables,
  os

# Shared ram Table and thread var.
var
  theTable = createShared(OrderedTable[string, seq[string]], 1)
  thread: array[1, Thread[ptr OrderedTable[string, seq[string]]]]

# Thread procedure. Just for test it stores in the Dictionary the key and a sequence of strings with the same value.
proc theProc(tbl: ptr OrderedTable[string, seq[string]]): void {.thread.} =
  var
    labels = @["Black Demon", "Turtoise", "Bill Torvalds", "BUG", "Black Demon", "Bill Torvalds", "Linus Gates", "Linus Gates", "Linus Gates"]
  for i in labels.items:
    if not tbl[].hasKey(i):
      tbl[][i] = @[]
    var str: string = i
    tbl[][i].add str
  # Emptying the sequences. I expect the Collector will free them
  labels = @[]

# Creating the thread
createThread[ptr OrderedTable[string, seq[string]]](thread[0], theProc, theTable)
# Awaiting the thread to finish execution
joinThreads(thread)

# Calculating the Table contents size, and printing out the contents of theTable
var size: int = 0
debugEcho "\n Printing result from main():"
for key, sequence in theTable[].pairs:
  size.inc(key.len)
  for i in sequence:
    size.inc(i.len)
  debugEcho "- ", key, sequence

if not running(thread[0]):
  freeShared(theTable)
  GC_fullCollect()
  debugEcho "--------------------------------------------"
  debugEcho "Size of theTable content is: ", size, ". But the ram was not freed by the collector. See Resident (RES) in linux 'top' for example."
  sleep(120000)
else:
  debugEcho "Thread still running, impossible scenario."
So regarding the GC, does it mean that even with --d:malloc the allocator will hold that memory until the whole process quits?
That depends on a lot of factors. Generally allocators will hold on to a certain amount of memory for performance reasons. Here's an interesting read on the topic: https://www.algolia.com/blog/engineering/when-allocators-are-hoarding-your-precious-memory/
Along with that mistake in the code, I also forgot to mention that I specified this parameter in nim.cfg. If so, I really don't understand what to do with it. Switch to manual memory management? What options do we have when a program needs to actively use RAM, and alloc/dealloc objects with an immediate real free()?
It'd help to clean up your code a bit first and make it more idiomatic. Even in a GC'ed language it's possible to leak. The usage of raw pointers makes me think that's a possible culprit in your case.
Try to avoid ptr and use ref instead. Using ptr is natural in C, but it is rarely needed in Nim. You can simplify your code by switching from ptr OrderedTable to OrderedTableRef; it'd also remove the need for manual [] derefs. Additionally you can run your Nim program compiled with --useMalloc using valgrind to check for memory leaks / threading issues. Lastly, Nim defaults to copying strings on assignment so you shouldn't need to use `=copy` manually.
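For illustration, here is a rough sketch of what that simplification might look like (names mirror the original example; it assumes --mm:orc or --mm:arc plus --threads:on, where a ref can be passed to createThread):

import tables

type
  Param = tuple[tbl: OrderedTableRef[string, seq[string]]]

proc theProc(attr: Param) {.thread.} =
  let labels = @["Black Demon", "Turtoise", "Bill Torvalds", "BUG"]
  for i in labels:
    if not attr.tbl.hasKey(i):   # no manual [] derefs needed
      attr.tbl[i] = @[]
    attr.tbl[i].add i            # plain assignment already copies the string

proc main() =
  var
    theTable = newOrderedTableRef[string, seq[string]]()
    thread: Thread[Param]
  createThread(thread, theProc, (tbl: theTable))
  joinThread(thread)
  for key, vals in theTable.pairs:
    echo key, " -> ", vals

main()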
Once you get that figured out, I'd suggest playing with turning --useMalloc on and off, and maybe finding some environment variables for GLIBC (or the macOS / Windows equivalents). Note that figuring out the actual space used can be tricky, RSS vs VSS, etc.
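If it helps, here is a rough, Linux-only sketch for watching RSS vs VSZ from inside the process (the memStatus proc is made up here; it reads the same numbers 'top' reports):

import std/[strutils, syncio]

proc memStatus(): tuple[rssKb, vszKb: int] =
  # Parse the kernel's view of this process: VmRSS = resident, VmSize = virtual.
  for line in lines("/proc/self/status"):
    if line.startsWith("VmRSS:"):
      result.rssKb = parseInt(line.splitWhitespace()[1])
    elif line.startsWith("VmSize:"):
      result.vszKb = parseInt(line.splitWhitespace()[1])

echo memStatus()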
Also, @araq's new malebolgia might be easier to use for your multi-threading. It's experimental but seems nice for these sorta tasks.
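I haven't used it much myself, so treat this as a rough sketch based on its README (createMaster / awaitAll / spawn; the API may have changed since):

import malebolgia

proc double(x: int): int = x * 2

proc main =
  var res: array[8, int]
  var m = createMaster()           # coordinates a batch of spawned tasks
  m.awaitAll:                      # blocks until every spawned task has finished
    for i in 0 ..< res.len:
      m.spawn double(i) -> res[i]  # run double(i) on a worker, write the result
  echo res

main()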
I'm not very experienced in different GCs and their working principles, but I thought the GC would use the RAM optimally. But with the current example, such an executable can hold a lot of RAM on the server while another heavy executable may need it. For example, until the exe saves the processed data to disk, all those gigabytes will be held.
Yes and no. GCs can be very complicated, but in my experience they can be tuned for different properties. For example, Java's GC tends to be tuned for throughput at the expense of using more RAM. Go's GC tends to be tuned for lower memory usage.
Generally though reference counting like Nim's ARC/ORC will use less RAM as it'll generally be freed sooner. Note that some folks don't even consider pure reference counting a GC!
@ML >So regarding the GC, does it mean that even with --d:malloc the allocator will hold that memory until the whole process quits?
If I remember correctly, -d:useMalloc only works for ARC/ORC, so if you're not using those, it has no effect. I've dealt with the same issue, thinking the program was leaking memory everywhere when really it was just keeping it for itself. If you're ever debugging with Valgrind or similar, using -d:useMalloc can help, but in general it also lets you return memory to the OS. I tend to compile with it because I don't like Nim sucking up memory and freeing it at weird times.
If you want a deterministic memory management strategy and can trust yourself not to create any cycles, use ARC. You'll have a much easier time reasoning about memory management that way.
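For example, a minimal sketch (assuming --mm:arc) of the kind of scope-based reasoning meant here: memory owned by a block is destroyed when the block ends, not at some later collection cycle.

proc demo() =
  block:
    var data = newSeq[string](100_000)
    for i in 0 ..< data.len:
      data[i] = "item " & $i
    # ... use `data` here ...
  # `data` and all of its strings are destroyed right here, at the end of the block
  echo "block finished, memory already returned to the allocator"

demo()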
Hello elcritch! Thanks for reply.
: Generally allocators will hold on to a certain amount of memory for performance reasons.
I believe a wrapper around malloc does that? I used malloc() with my own manual memory management in C, and only free() returns memory back to the OS. In our case we don't see free() being executed after the thread exits (which is quite a bit strange). Of course I understand what the performance reasons can be: the GC does not know whether the thread will be executed again or not, and holds all allocations so it doesn't have to allocate again.
: Try to avoid ptr and use ref instead.
Already did that previously, with the same results. Changing to OrderedTableRef will not help. My first example with `=copy` and other weird stuff was written especially to point to the problem I'm complaining about.
: Additionally you can run your Nim program compiled with --useMalloc using valgrind to check for memory leaks / threading issues.
I thought --d:useMalloc used in my example should do the same? Meanwhile the compiler prints: Error: invalid command line option: '--useMalloc'
I use Google Sanitizers, while valgrind is indeed a good tool! I suggest you try Dr.Memory (also a nice one).
: Generally though reference counting like Nim's ARC/ORC will use less RAM as it'll generally be freed sooner.
Unfortunately it will not work for me either. The executable does its work fast, but while saving the processed data to disk it can hold a lot of gigs right before it exits, while at that point I expect only the output and program data to be stored in RAM. That does not happen, and there is no reason to wait for ARC/ORC to do their job. I think some kind of "force the Collector to run" function should be included for such cases.
I can always do manual MM, but having a GC that is optimal for my task, together with the dynamic behaviour, was the reason I chose not to.
Hello termer!
: If I remember correctly, -d:useMalloc only works for ARC/ORC, so if you're not using those, it has no effect.
The reason I compare the other GCs is to show that the problem persists with them too. The ARC/ORC RAM consumption results were included in my first message.
: I've dealt with the same issue, thinking the program was leaking memory everywhere when really it was just keeping it for itself. If you're ever debugging with Valgrind or similar, using -d:useMalloc can help, but in general it also lets you return memory to the OS.
Sure, there are indeed no memory leaks at all; at least LeakSanitizer does not report anything. I like valgrind, but sometimes I'm too lazy to use it :)
: I tend to compile with it because I don't like Nim sucking up memory and freeing it at weird times.
I expected GC_fullCollect() would force the collector to do its tasks. But no.. What is even stranger for me is that I call it right before the program exits, and the shared RAM was "freed" right before it.
Greetings @ML!
I thought --d:useMalloc used in my example should do the same? Meanwhile the compiler prints: Error: invalid command line option: '--useMalloc'
Yah, my bad it's -d:useMalloc.
I use Google Sanitizers, while valgrind is indeed a good tool! I suggest you try Dr.Memory (also a nice one).
I'll check it out!
Already did that previously, with the same results. Changing to OrderedTableRef will not help.
My main point there was to check and make sure that there wasn't an accidental memory leak, and avoiding ptr's helps with that by making the code easier to read/follow. The same applies to calling createShared or freeShared.
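To make that point concrete, a tiny sketch (the Payload type is made up) of the manual pairing that createShared/freeShared require, since that memory is never tracked by the GC:

type Payload = object
  total: int

var p = createShared(Payload, 1)  # raw allocation on the shared heap, untracked by the GC
p[].total = 42
echo p[].total
freeShared(p)                     # must be freed by hand; forgetting this is a genuine leak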
I expected GC_fullCollect() would force the collector to do its tasks. But no.. What is even stranger for me is that I call it right before the program exits, and the shared RAM was "freed" right before it.
It's strange because you're reasoning about it incorrectly due to missing context. ;) You can call GC_fullCollect(), but if the GC still thinks you have a reference to the memory it won't release it.
For a bit of context, GC's rely on the current stack or scope to know when a piece of memory is being referenced. The natural "scopes" that Nim uses are procs, funcs, blocks (including if/else/for loop blocks), and closures which capture memory.
I read through your example a bit more and see some reasons why the GC won't free the memory in the table. Note the example below isn't meant to compile, but just to give examples. I probably typo'ed a few details. :)
Note that the freeShared call doesn't properly tell the GC that the memory is free; it skips the GC and manually frees just that piece of memory. You can use GC_unref to force the GC to release a reference, but that's error prone. The more natural way to work with GCs is to think in "scopes" like procs or blocks and let the compiler do that for you.
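As a separate rough sketch (not the annotated example referenced above; the Node type is made up), this is roughly what manual GC_ref/GC_unref pinning looks like in isolation:

type Node = ref object
  payload: seq[string]

proc demo() =
  var n = Node(payload: @["a", "b"])
  GC_ref(n)    # manually add a reference so the GC treats `n` as still in use
  # ... hand `n` to something that outlives this proc ...
  GC_unref(n)  # release it again; forgetting this keeps the object alive for the whole process

demo()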
Here's some more detailed notes:
type
  param = tuple[tbl: OrderedTableRef[string, seq[string]]]

proc theProc(attr: param) {.thread.} =
  var
    labels = @["Black Demon", "Turtoise", "Bill Torvalds", "BUG", "Black Demon", "Bill Torvalds", "Linus Gates", "Linus Gates", "Linus Gates"]
  for i in labels.items:
    if not attr.tbl.hasKey(i):
      attr.tbl[i] = @[]
    var str: string = i
    attr.tbl[i].add str
  # labels = @[] ## not needed, will be freed when `theProc` exits
  ## also note the labels are copied into `tbl` and still exist there

proc main(): void =
  block: ## adding this block with ARC/ORC will free the
         ## memory in `tbl` at the end of the block;
         ## the original code kept it in scope until the end,
         ## meaning the GC can't determine it's not used anymore
    var
      theTable = newOrderedTableRef[string, seq[string]]()
      attr = (tbl: theTable)
      thread = Thread[param]()
    createThread(thread, theProc, attr)
    thread.joinThreads()
    var size: int = 0
    debugEcho "\n Printing result from main():"
    for key, sequence in theTable.pairs:
      ...
  ## the block ends here, which will call `GC_unref` on all the variables
  ## in this scope. This means `theTable` should be properly freed

  if not running(thread):
    ## freeShared(theTable) ## if you must manually free, you should use `GC_unref`,
    ## otherwise the GC won't know to free the sub-items (at least with ARC/ORC).
    ## Though using a `block` or a separate proc will
    ## end the scope for you, and call `GC_unref` for you.
    GC_fullCollect() ## usually not needed; with ARC/ORC heuristics decide when to collect
    debugEcho "--------------------------------------------"
    debugEcho "Size of theTable content is: ", size, ". But resources were not freed by the collector. See Resident (RES) in linux 'top' for example."
    sleep(120000)
  else:
    debugEcho "Thread still running, impossible scenario."

main()
Hopefully that helps some!
Oh, and I would second @termer and say that using ARC is the best for reasoning about memory and freeing it immediately, when you think in terms of "scopes". It's what I use on embedded devices where I only have kBs of RAM.
@elcritch, @Araq, @termer
I was wrong about the GC. This issue is related to the kernel. I found a way to release the RAM freed by the GC back to the OS.
proc malloc_trim*(size: csize_t): cint {.importc, header: "malloc.h", discardable.}
Usage example:
discard malloc_trim(0.csize_t)
More info:
https://man7.org/linux/man-pages/man3/malloc_trim.3.html
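For anyone landing here later, a minimal end-to-end sketch of how this can be wired together (assumes Linux/glibc, since malloc_trim is glibc-specific; the processing part is made up for illustration):

proc malloc_trim(pad: csize_t): cint {.importc, header: "malloc.h", discardable.}

proc process() =
  var big = newSeq[string](1_000_000)
  for i in 0 ..< big.len:
    big[i] = "row " & $i
  # ... work with `big`, write the results to disk ...
  big = @[]               # drop the data so the runtime can free it
  GC_fullCollect()        # let the collector release what it can
  malloc_trim(0.csize_t)  # ask glibc to return freed heap pages to the OS

process()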
P.S. It dropped the Resident RAM from 11.2GB to 3.2GB (the processed data with a little overhead).