nimforum mirror - Memory problem with HashSet?

fapik (orginal) [2015-06-09T11:19:02+02:00] view original

Hi.

I have a problem with understanding how Nim GC exactly works. I wrote simple program and after few minutes it reached to 15 MB of ram, and after another few minutes a program finally crashed. What i'm doing wrong? How can i fix it? I do not understand what i'm doing wrong. In other GC languages (java, python etc.) it could work perfectly fine, but in Nim it seems allocating memory infinitely without releasing it.


nim --version
Nim Compiler Version 0.11.2 (2015-05-04) [Windows: amd64]
Copyright (c) 2006-2015 by Andreas Rumpf

git hash: 3bef848a2cf60008f23e72571d7c20c0eb9fd728
active boot switches: -d:release


import sets
import os

type MyStruct = ref object
  field1: HashSet[string]
  field2: HashSet[string]

proc bigSet(): MyStruct =
  new(result)
  var s1 = initSet[string]()
  for i in 0..1000:
    s1.incl($i & "After becoming the .NET Compact Framework (.NETCF) dynamic memory management module owner I am continually learning a lot about the subject. Based on hallway discussion I figured out that a lot of developers are not very clear about the subject and would like to learn more. Most online material I encountered gets into too much of details and implementation for most casual readers. ")
  var s2 = initSet[string]()
  for i in 1000..2000:
    s2.incl($i & "123 After becoming the .NET Compact Framework (.NETCF) dynamic memory 123 management module owner I am continually learning a lot about the subject. Based on hallway discussion I figured out that a lot of developers are not very clear about the subject and would like to learn more. Most online material I encountered gets into too much of details and implementation for most casual readers. ")
  result.field1 = s1
  result.field2 = s2

for i in 0..1000000:
  let s = bigSet()
  if i div 100 == 0:
    echo s.field1
    echo s.field2
    GC_fullCollect()
    sleep(300)

quit 0

Jehan (orginal) [2015-06-09T11:48:44+02:00] view original

First of all, 15MB sounds reasonable in debug mode. I'm getting 16-20MB on OS X with 64-bit myself. You're generating two sets that require over 1MB of memory each, in order to print them, strings of about the same size need to be generated; and in debug mode, two copies of each may be kept alive, and the GC needs some breathing room on top of that.

That said, it should stabilize at this point (and should definitely not crash). For me, it does work fine, so something else must be going wrong. I am not clear what exactly causes the crash on your machine, though; does the program actually start using an indefinite amount of memory (i.e. can you observe it?), or does it crash for some other reason?

fapik (orginal) [2015-06-09T12:16:44+02:00] view original

I will investigate crash problem in the future.

But for me the greatest problem is big memory consumption. I want use (possibly) big sets in machine with very low memory and for me 5MB-10MB is greatest value that is acceptable for my program. Consumption above 10MB is definitely not acceptable in my situation. It would be perfectly ok if memory consumption will be ~5MB and around around this value. Are you suggesting launching this in release mode would decrease memory consumption? Can you suggest other things I can do to decrease memory consumption? Maybe Nim is not the right tool to do this kinds of work, but personally I do not like to write directly in C.

Araq (orginal) [2015-06-09T12:23:03+02:00] view original

Maybe Nim is not the right tool to do this kinds of work, but personally I do not like to write directly in C.

You can always write C-like code in Nim. I don't know why people think they can't. I really don't. There is a trivial 1 to 1 mapping to C if you're after that except that you get better type checking from Nim, tons of more checks in debug mode and don't have to mess with header files. The Nim code base has no hand written C except nimbase.h and even that we could easily do without.

I want use (possibly) big sets in machine with very low memory and for me 5MB-10MB is greatest value that is acceptable for my program.

I don't see any problems here. If others can do it, so can Nim. That said, maybe you should use - e.g. - SQLite? Databases are quite good at abstracting over the RAM vs disk distinctions. But let me guess .. you don't have a hard disk on this machine either? ;-)

Jehan (orginal) [2015-06-09T13:28:06+02:00] view original

fapik: Are you suggesting launching this in release mode would decrease memory consumption?

Not quite (because it can't be guaranteed). What I'm pointing out is that you create huge data structures (and their string representations) and then discard them by ignoring them. Having large data structures with unpredictable lifetimes is where your primary problem lies.

The standard sets implementation is also not very suitable for tight memory requirements, since it uses dynamic length arrays; if you use a GC under these circumstances, you want individual blocks to be relatively small. A tree-based implementation of sets would be better (a quick experiment using red-black trees with a dummy implementation of the $ operator resulted in memory-usage at around 3MB for me.

That said -- if you're working that close to your memory limit, a garbage collector is a risky proposition to begin with. You probably want to use manual memory management to be able to obtain hard(er) constraints on your memory usage, as suggested by Araq.

kirbyfan64sos (orginal) [2015-06-09T16:06:00+02:00] view original

What processor type does the system have?

fapik (orginal) [2015-06-09T16:48:24+02:00] view original

@kirbyfan64sos I'm developing on windows, but I will do cross-compilation to target PowerPc Linux 256MB Ram.

@Araq I was thinking about Sqlite but I finally abandon that idea because I want doing heavy disk IO operation, and I want do possible simplest solution without unnecessary dependency.

You can always write C-like code in Nim.

Can you recommend some valuable tutorials focused how to write C-like code in Nim? That will be very helpful.

@Jehan Red-black trees looks promising. I will try implement set using Red-black trees. Thank you for hint.

Jehan (orginal) [2015-06-09T22:09:25+02:00] view original

fapik: I will try implement set using Red-black trees.

I wrote a very basic implementation of red-black trees here, which may serve you as a starting point. (Note: there's no init function, a nil reference to a a red-black tree represents the empty set).

fapik (orginal) [2015-06-09T22:31:51+02:00] view original

Jehan: Thank you. Your code will be very helpful. If I find more free time I will definitely use it as starting point.

didlybom (orginal) [2015-06-09T22:54:08+02:00] view original

@araq said:

"You can always write C-like code in Nim. I don't know why people think they can't. I really don't."

I think that the fact that nim has a GC trips and scares people off. In one of the slides of your recent O'Reilly talk you said "string, ref, seq and closure are GC'ed, nothing else". My guess is that if people where more aware of that fact they would be less inclined to think this way. Maybe this could be stressed even more (and earlier, as soon as the GC is introduced) on the tutorials and in the manual?

Of course you would then need to avoid using those types at all, and that may mean not using a big chunk of the standard library...

That is why I really wish that there were a way to tell nim that you want to allocate those types (or at least strings and seqs, and perhaps also refs) on the stack and/or at least deallocate them as soon as you get out of their scope. My hope is that this would let you avoid the GC pauses altogether, which for certain applications (e.g. real-time, embedded software) makes nim unusable. Basically I'd like to be able to make nim behave more like C++ or rust. I know that you can use alloc and dealloc, but it would be much nicer and less error prone to not have to do the explicit deallocation. What I´m thinking is being able to do something like:

proc myproc()
  local let myseq = @[1, 2, 3]
  echo myseq

myproc()

And then be sure that when calling myproc(), seq would be automatically and immediately deallocated as soon as we exit myproc().

Maybe this does not make sense at all though, in which case this reply would just be another proof that people get confused as soon as a GC is involved :-P

Araq (orginal) [2015-06-09T23:40:19+02:00] view original

Maybe this does not make sense at all though, in which case this reply would just be another proof that people get confused as soon as a GC is involved

Well it certainly it another proof that people won't read what I write here anyway.

My hope is that this would let you avoid the GC pauses altogether, which for certain applications (e.g. real-time, embedded software) makes nim unusable.

Nim is not unusable for these domains at all and I'd like to read a single good reason for these claims. Ignorance is not a reason. And again, use glib's tree library, create a big tree and then freeing it will produce a "pause". No GC required for pauses.

Of course you would then need to avoid using those types at all, and that may mean not using a big chunk of the standard library...

Oh yeah, that's of course a good reason to use C instead which has no real stdlib to begin with...

Nim's GC is per thread: You can easily have threads which are hard realtime and don't use the GC and happily use the GC in the other threads... And the type and effect system ensures you don't mix these different per thread heaps. This is really a terrific design and I am beginning to think Nim is still the most misunderstood language out there.

jibal (orginal) [2015-06-10T00:09:06+02:00] view original

Araq: This is really a terrific design and I am beginning to think Nim is still the most misunderstood language out there.

As a friend of mine often says, no good deed goes unpunished.

Perhaps your appearance at OSCON will help garner interest and clarity.

didlybom (orginal) [2015-06-10T00:37:36+02:00] view original

Maybe this does not make sense at all though, in which case this reply would just be another proof that people get confused as soon as a GC is involved
Well it certainly it another proof that people won't read what I write here anyway.

Would you mind being a bit more specific about what you are referring to? Is it something that you have written to me specifically which I didn't read or didn't understand? I assure you that I'd like to understand whatever I missed.

My hope is that this would let you avoid the GC pauses altogether, which for certain applications (e.g. real-time, embedded software) makes nim unusable.
Nim is not unusable for these domains at all and I'd like to read a single good reason for these claims. Ignorance is not a reason. And again, use glib's tree library, create a big tree and then freeing it will produce a "pause". No GC required for pauses.

A while ago I discussed GC pauses with you on this forum. My understanding from that thread is that getting guaranteed GC pauses in the sub-millisecond (e.g. 100 us or less) range is not possible. When the system "tick" is 1 ms as is our case (by which I mean that every 1 ms we must do some deterministic, non delayable amount of work), having GC pauses in the 1 ms order of magnitude is not an option.

As you say, regular memory allocation, deallocation operations can introduce pauses as well. However the moment where those pauses happens is deterministic (when the variable gets out of scope or when you manually free it). This determinism makes the whole difference in the embedded, real-time use case.

That being said we do avoid that sort of thing by doing most of our allocations (including setting the capacity of std:vector and friends) at startup time.

Of course you would then need to avoid using those types at all, and that may mean not using a big chunk of the standard library...
Oh yeah, that's of course a good reason to use C instead which has no real stdlib to begin with...

I never said that is the case. We use C++ for most of our code. The C++ STL is not perfect nor complete but for the kind of things we do it is (barely) sufficient. In any case I don't particularly like C++! Particularly, the Cx03 version of the C++ language that we are forced to use because our board vendor only supports GCC 4.3 is terrible. I am not looking for reasons to use C++. On the contrary I would love to use nim instead! I just feel I can't for the reasons I stated above (long, non determinstic GC pauses).

Nim's GC is per thread: You can easily have threads which are hard realtime and don't use the GC and happily use the GC in the other threads... And the type and effect system ensures you don't mix these different per thread heaps. This is really a terrific design and I am beginning to think Nim is still the most misunderstood language out there.

Now I'm the one that feels misunderstood. I really, really like nim. I admire its design. I find that the language is terrific. I've gone through its manual, I've listened to some of your talks, I've read many threads on this forum, I've made a few small little programs with it. I believe I can replace a lot of my python work with nim.

Yet, I feel that in order to use it for what I'd like to use it the most (embedded, sub 1 ms real-time development) I would need to avoid using an important part of what makes nim practical (some of its nice stdlib, the useful and expressive seq type, strings...). Maybe I am wrong, but so far I don't see how.

Could we use nim as it is right now? Yes. Would a GC-less nim be better than what we use now? Probably, but the case would be much stronger if we could use the whole language.

The way I see it, currently nim gives you 2 options:

You can use the GC to get access to the full language

You can avoid the GC'ed parts of the language (seqs, strings, refs, closures and anything in the stdlib that uses any of those)

In both cases you can also do manual (alloc / dealloc) unsafe memory handling (lets call this option #3).

What I would like is to have a 4th option, which is to tell nim to deallocate certain variables when they are out of scope (a la rust or C++). I don't think this is currently possible but maybe I am wrong. I think if nim could do this then it would be usable exactly everywhere C and C++ can be used today.

Araq (orginal) [2015-06-10T01:20:16+02:00] view original

A while ago I discussed GC pauses with you on this forum.

Ah sorry, wasn't aware that it was you. ;-)

Could we use nim as it is right now? Yes. Would a GC-less nim be better than what we use now? Probably

Well the pause time depends on:

How fast the CPU is (obviously)

How deep the stack is when you call the GC_collect(). In a typical "render loop" setting it's tiny, but I have no idea how the structure of your programs looks like.

How much memory you actually allocate, cause it's of no help if the GC doesn't miss its deadlines but cannot keep up with the mutator either.

So ... in my opinion that's the real problem: If I tell you "got it to 0.1msec pause time!" that's rather meaningless cause I use different CPUs and workloads. You have to try it in practice and then report back "doesn't work because of ...". I am aware that it might be unfeasible to do this experiment, but if you don't I cannot see how you can claim "cannot use it for hard realtime" either.

And finally, the local feature for strings and sequences can be done with an escape analysis, it doesn't require yet another pragma. (Though a pragma could be useful as an annotation so that the compiler can report "in my opinion it does escape".)

didlybom (orginal) [2015-06-11T10:07:34+02:00] view original

Couldn't render post #8235.

jibal (orginal) [2015-06-11T10:17:37+02:00] view original

http://en.wikipedia.org/wiki/Escape_analysis

fapik (orginal) [2015-06-12T11:56:56+02:00] view original

I finally found way make my sample program stable. I will share with you my thoughts. Maybe someday my experience will help someone. GC works as expected, and HashSet works perfectly fine after some 'tunings'. I spent big amount of time testing and observing what is root of problem. My solution is not perfect but it is definitely good enough for me. I achieve reducing form 15 MB to 6 MB memory usage which is OK for me :) What I did was:

Copy 'HashSet' implementation from GitHub and paste it to my file. (I hope license permits that)

Remove all stuff (methods) I don't needed.

I added reset method which clear internal content but do not make any allocation or deallocation. This change was needed to avoid creation and destruction HashSet over and over again - it was made hard work to GC and leaded to allocate much more memory than needed.

Added buildString method which build string representation of set. It do not allocate anything, but rather use buffer provided by user. This change was needed because default string building was extremely inefficient in terms of memory (allocation over and over again leads to more allocation than needed - like in my previous point).

After this changes my test program is rock solid in terms of memory usage, and I do not see any weird memory behaviour.

Below is improved code. I will try prepare in the future pull request to Nim main line, to support reset method, and generating string using external buffer.


import
  os, hashes, math

{.pragma: myShallow.}
when not defined(nimhygiene):
  {.pragma: dirty.}

type
  KeyValuePair[A] = tuple[hcode: THash, key: A]
  KeyValuePairSeq[A] = seq[KeyValuePair[A]]
  HashSet* {.myShallow.}[A] = object ## \
    data: KeyValuePairSeq[A]
    counter: int

{.deprecated: [TSet: HashSet].}

# THIS IS NEW METHOD IN HashSet
proc reset*[A](s: var HashSet[A]) =
  s.counter = 0
  for i in 0..high(s.data):
    s.data[i].hcode = 0
    s.data[i].key = nil

proc isEmpty(hcode: THash): bool {.inline.} =
  result = hcode == 0

proc isFilled(hcode: THash): bool {.inline.} =
  result = hcode != 0

proc isValid*[A](s: HashSet[A]): bool =
  result = not s.data.isNil

proc len*[A](s: HashSet[A]): int =
  result = s.counter

iterator items*[A](s: HashSet[A]): A =
  assert s.isValid, "The set needs to be initialized."
  for h in 0..high(s.data):
    if isFilled(s.data[h].hcode): yield s.data[h].key

const
  growthFactor = 2

proc mustRehash(length, counter: int): bool {.inline.} =
  assert(length > counter)
  result = (length * 2 < counter * 3) or (length - counter < 4)

proc nextTry(h, maxHash: THash): THash {.inline.} =
  result = (h + 1) and maxHash

template rawGetKnownHCImpl() {.dirty.} =
  var h: THash = hc and high(s.data)  # start with real hash value
  while isFilled(s.data[h].hcode):
    if s.data[h].hcode == hc and s.data[h].key == key:  # compare hc THEN key
      return h
    h = nextTry(h, high(s.data))
  result = -1 - h                   # < 0 => MISSING; insert idx = -1 - result

template rawGetImpl() {.dirty.} =
  hc = hash(key)
  if hc == 0:       # This almost never taken branch should be very predictable.
    hc = 314159265  # Value doesn't matter; Any non-zero favorite is fine.
  rawGetKnownHCImpl()

template rawInsertImpl() {.dirty.} =
  data[h].key = key
  data[h].hcode = hc

proc rawGetKnownHC[A](s: HashSet[A], key: A, hc: THash): int {.inline.} =
  rawGetKnownHCImpl()

proc rawGet[A](s: HashSet[A], key: A, hc: var THash): int {.inline.} =
  rawGetImpl()

proc contains*[A](s: HashSet[A], key: A): bool =
  assert s.isValid, "The set needs to be initialized."
  var hc: THash
  var index = rawGet(s, key, hc)
  result = index >= 0

proc rawInsert[A](s: var HashSet[A], data: var KeyValuePairSeq[A], key: A,
                  hc: THash, h: THash) =
  rawInsertImpl()

var enlargeCount = 0

proc enlarge[A](s: var HashSet[A]) =
  inc(enlargeCount)
  var n: KeyValuePairSeq[A]
  newSeq(n, len(s.data) * growthFactor)
  swap(s.data, n)                   # n is now old seq
  for i in countup(0, high(n)):
    if isFilled(n[i].hcode):
      var j = -1 - rawGetKnownHC(s, n[i].key, n[i].hcode)
      rawInsert(s, s.data, n[i].key, n[i].hcode, j)

template inclImpl() {.dirty.} =
  var hc: THash
  var index = rawGet(s, key, hc)
  if index < 0:
    if mustRehash(len(s.data), s.counter):
      enlarge(s)
      index = rawGetKnownHC(s, key, hc)
    rawInsert(s, s.data, key, hc, -1 - index)
    inc(s.counter)

proc incl*[A](s: var HashSet[A], key: A) =
  assert s.isValid, "The set needs to be initialized."
  inclImpl()

proc incl*[A](s: var HashSet[A], other: HashSet[A]) =
  assert s.isValid, "The set `s` needs to be initialized."
  assert other.isValid, "The set `other` needs to be initialized."
  for item in other: incl(s, item)

template doWhile(a: expr, b: stmt): stmt =
  while true:
    b
    if not a: break

proc excl*[A](s: var HashSet[A], key: A) =
  assert s.isValid, "The set needs to be initialized."
  var hc: THash
  var i = rawGet(s, key, hc)
  var msk = high(s.data)
  if i >= 0:
    s.data[i].hcode = 0
    dec(s.counter)
    while true:         # KnuthV3 Algo6.4R adapted for i=i+1 instead of i=i-1
      var j = i         # The correctness of this depends on (h+1) in nextTry,
      var r = j         # though may be adaptable to other simple sequences.
      s.data[i].hcode = 0              # mark current EMPTY
      doWhile ((i >= r and r > j) or (r > j and j > i) or (j > i and i >= r)):
        i = (i + 1) and msk            # increment mod table size
        if isEmpty(s.data[i].hcode):   # end of collision cluster; So all done
          return
        r = s.data[i].hcode and msk    # "home" location of key@i
      shallowCopy(s.data[j], s.data[i]) # data[j] will be marked EMPTY next loop

proc excl*[A](s: var HashSet[A], other: HashSet[A]) =
  assert s.isValid, "The set `s` needs to be initialized."
  assert other.isValid, "The set `other` needs to be initialized."
  for item in other: excl(s, item)

proc init*[A](s: var HashSet[A], initialSize=64) =
  inc(enlargeCount)
  assert isPowerOfTwo(initialSize)
  s.counter = 0
  newSeq(s.data, initialSize)

proc initSet*[A](initialSize=64): HashSet[A] =
  result.init(initialSize)

template dollarImpl(): stmt {.dirty.} =
  result = "{"
  for key in items(s):
    if result.len > 1: result.add(", ")
    result.add($key)
  result.add("}")

proc `$`*[A](s: HashSet[A]): string =
  assert s.isValid, "The set needs to be initialized."
  dollarImpl()

# THIS IS NEW METHOD IN HashSet
proc appendToBuff[A](buff: var string, buffFillLen: var int, toAppend: A) =
  var toAppendStr = $toAppend
  
  # make room in buffer
  var newLen = buffFillLen + toAppendStr.len
  if(buff.len < newLen):
    # "align" 'newLen' to be multiply of 2
    var alignedNewLen = nextPowerOfTwo(newLen)
    buff.setLen(alignedNewLen)
  
  for i in 0..high(toAppendStr):
    buff[buffFillLen + i] = toAppendStr[i]
  buffFillLen = newLen

# THIS IS NEW METHOD IN HashSet
proc buildString*[A](aSet: HashSet[A], buff: var string) =
  var buffFillLen = 0
  appendToBuff(buff, buffFillLen, "{")
  for toInsert in items(aSet):
    if buffFillLen > 1: appendToBuff(buff, buffFillLen, ", ")
    appendToBuff(buff, buffFillLen, toInsert)
  appendToBuff(buff, buffFillLen, "}\0")

import os

proc bigSet(s1: var HashSet[string], s2: var HashSet[string]) =
  for i in 0..1000:
    s1.incl($i & "After becoming the .NET Compact Framework (.NETCF) dynamic memory management module owner I am continually learning a lot about the subject. Based on hallway discussion I figured out that a lot of developers are not very clear about the subject and would like to learn more. Most online material I encountered gets into too much of details and implementation for most casual readers. ")
  for i in 1000..2000:
    s2.incl($i & "123 After becoming the .NET Compact Framework (.NETCF) dynamic memory 123 management module owner I am continually learning a lot about the subject. Based on hallway discussion I figured out that a lot of developers are not very clear about the subject and would like to learn more. Most online material I encountered gets into too much of details and implementation for most casual readers. ")

var s1: HashSet[string] = initSet[string](2048)
var s2: HashSet[string] = initSet[string](2048)
var strBuff = newString(2048 * 2)
for i in 0..1000000:
  bigSet(s1, s2)
  s1.buildString(strBuff)
  echo strBuff
  s2.buildString(strBuff)
  echo strBuff
  
  s1.reset()
  s2.reset()

And final notes. Will be useful ability of manual freeing objects. But I have no idea how to do it right :( I think after adding this kind of ability we loose safety of the language. Also I'm not sure what semantic will be suitable. Manual freeing objects will be useful in "worker" pattern where some method get some data from queue and then work on them and produce result - data it is not needed anymore and can be safely freed. Sample pseudocode about manual memory freeing. I included in comments some thoughts about possible difficulty.


# PSEUDOCODE
var someObject = produceSomeTracedObject()
var someOtherReference = someObject
forceFreeObject(someObject)  # This call could get 'someObject' reference to null? It also return memory to OS regardless some other objects has references to it or not.
echo someOtherReference # ?? segfault? I think raising nim exception with stack trace would be hard to implement in this case.


# PSEUDOCODE
var someObject = produceSomeTracedObject()
var someOtherReference = someObject
forceFreeObject(someObject)  # ?? What will happen if GC runs just before this line and already freed this object?

forceFreeObject would be useful when GC is disabled, but then Std library should be improved to support working without GC - this would be huge work.

Other but more safe option.


# PSEUDOCODE
var someObject = produceSomeTracedObject()
var someOtherReference = someObject
GC_tryFreeObject(someObject)  # This call could set 'gameObject' reference to null? It my also return memory to OS if GC detect that there is no other live references to this variable. I don't know if it is feasible implement this with good performance (checking if there are some live references would be expensive).
echo someOtherReference # Segfault is not possible in this case, because this is live reference to memory location.

Manual freeing memory is not 'must have' for Nim but could be nice feature which would make Nim easier controlling memory consumption. If this feature will be implemented then would be nice and probably will help people with 'hard real time' background. I also understand that adding new feature is huge work, and requires careful design.

Edit1: I slightly improve code - now memory consumption dropped to ~4.5 MB

Arrrrrrrrr (orginal) [2015-07-02T11:58:05+02:00] view original

I will try prepare in the future pull request to Nim main line, to support reset method

It was about time!

Mirror of forum.nim-lang.org

1307 :: Memory problem with HashSet?