I have done a little test to find out if I could run some microservices with async code in memory constrained devices. Here is the test code:
https://gist.github.com/aguspiza/80e34b5cf65aa3bbfd19c7339ee9b695
That test uses 5x more memory with 0.18.0 than with 0.17.2 (or any previous version, at least with the Boehm GC).
Here you have some numbers:
Linux x64 0.17.2 boehm 24MB used (release 14MB)
Linux x64 0.18.0 boehm 94MB used (release 84MB)
Other GCs show similar increases (+40MB):
0.17.2 default gc 70MB (release 30-36MB)
0.18.0 default gc 146MB (release 110MB)
similar result with gc:v2 146MB (release 110MB)
I am getting those numbers from the RSS column of "ps":
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 13946 38.9 0.0 18024 15136 pts/0 Sl+ 11:55 0:22 ./httpasync
Is this a known issue?
Is this expected from the latest GC optimization (the TLSF algorithm) in 0.18.0?
Why is there so much difference between the GCs?
Should I use Boehm, mark-and-sweep, v2, or the default GC?
Boehm is giving the best results, but Araq told me (on IRC) that it is not using Nim's memory allocator?!?!
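By the way, the GC's own view can also be printed from inside the process; this only covers the Nim heap (or the Boehm heap with --gc:boehm, where the figures may be approximate), so it will not match the RSS numbers above exactly:

echo "occupied: ", getOccupiedMem() div 1024, " KB"   # bytes holding live data
echo "free:     ", getFreeMem() div 1024, " KB"       # owned by the process but currently free
echo "total:    ", getTotalMem() div 1024, " KB"      # total heap obtained from the OS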
Thanks Araq. Right now I am using "clients" to avoid deallocating memory, so that I can measure peak memory use. In the future it will be used to dispatch messages to specific connected clients, and the service should also remove clients from that sequence when they disconnect. Sorry, I will try to clean up the code so that the test makes more "sense"; that may help me find out which part is allocating those extra 70MB. (Sorry for the long post):
sudo pmap 26153
26153: httpasync/src/httpasync018
0000556fff75a000 124K r-x-- httpasync018
0000556fff978000 4K r---- httpasync018
0000556fff979000 4K rw--- httpasync018
0000556fff97a000 32K rw--- [ anon ]
0000557000ed2000 124K rw--- [ anon ]
00007f2806a93000 84164K rw--- [ anon ]
00007f280bcc4000 2172K r-x-- libgc.so.1.0.3
00007f280bee3000 4K r---- libgc.so.1.0.3
00007f280bee4000 4K rw--- libgc.so.1.0.3
00007f280bee5000 264K rw--- [ anon ]
00007f280bf2c000 548K r-x-- ld-musl-x86_64.so.1
00007f280bfbc000 1472K rw--- [ anon ]
00007f280c12c000 4K r--s- localtime
00007f280c12f000 8K ----- [ anon ]
00007f280c131000 84K rw--- [ anon ]
00007f280c146000 8K ----- [ anon ]
00007f280c148000 84K rw--- [ anon ]
00007f280c15d000 8K ----- [ anon ]
00007f280c15f000 340K rw--- [ anon ]
00007f280c1b4000 4K r---- ld-musl-x86_64.so.1
00007f280c1b5000 4K rw--- ld-musl-x86_64.so.1
00007f280c1b6000 12K rw--- [ anon ]
00007ffdf4ad5000 132K rw--- [ pila ]
00007ffdf4b9c000 8K r---- [ anon ]
00007ffdf4b9e000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 89624K
versus 0.17.2
sudo pmap 26094
26094: httpasync/src/httpasync
000055b617f37000 104K r-x-- httpasync
000055b618150000 4K r---- httpasync
000055b618151000 4K rw--- httpasync
000055b618152000 28K rw--- [ anon ]
000055b6194e0000 120K rw--- [ anon ]
00007fe5853e4000 11648K rw--- [ anon ]
00007fe585f44000 2172K r-x-- libgc.so.1.0.3
00007fe586163000 4K r---- libgc.so.1.0.3
00007fe586164000 4K rw--- libgc.so.1.0.3
00007fe586165000 264K rw--- [ anon ]
00007fe5861ac000 548K r-x-- ld-musl-x86_64.so.1
00007fe586235000 1512K rw--- [ anon ]
00007fe5863af000 8K ----- [ anon ]
00007fe5863b1000 84K rw--- [ anon ]
00007fe5863c6000 8K ----- [ anon ]
00007fe5863c8000 84K rw--- [ anon ]
00007fe5863dd000 8K ----- [ anon ]
00007fe5863df000 340K rw--- [ anon ]
00007fe586434000 4K r---- ld-musl-x86_64.so.1
00007fe586435000 4K rw--- ld-musl-x86_64.so.1
00007fe586436000 12K rw--- [ anon ]
00007ffebaff6000 132K rw--- [ pila ]
00007ffebb124000 8K r---- [ anon ]
00007ffebb126000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 17116K
Anyway, I understand that it should work (leak) in a similar way regardless of the GC or Nim version used. There is a fixed number of clients connecting, so that sequence will grow to 1000 and stop there. I am measuring memory use after all clients (1000) have sent and received 1000 messages, so it should not depend on the speed at which they are added.
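For anyone who does not want to open the gist: the client handling is roughly shaped like this (a simplified sketch, not the exact test code; the port and proc names are just illustrative, and it already includes the disconnect cleanup mentioned above):

import asyncnet, asyncdispatch

var clients: seq[AsyncSocket] = @[]        # grows up to the 1000 test clients

proc processClient(client: AsyncSocket) {.async.} =
  while true:
    let line = await client.recvLine()
    if line.len == 0: break                # client disconnected
    await client.send(line & "\c\L")
  clients.delete(clients.find(client))     # forget the client on disconnect
  client.close()

proc serve() {.async.} =
  var server = newAsyncSocket()
  server.setSockOpt(OptReuseAddr, true)
  server.bindAddr(Port(8080))
  server.listen()
  while true:
    let client = await server.accept()
    clients.add(client)
    asyncCheck processClient(client)

asyncCheck serve()
runForever()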
jcosborn, I have been doing all tests with the "official" Docker images (nimlang/nim:0.xx.y-alpine), and nimlang/nim:devel reports version 0.15.3. I have been looking for the original Dockerfile of those images to build an updated devel branch, but it seems it is actually generated with Nim code in a very smart way :) I'll put some time into this and report back.
Indeed, no changes:
sudo pmap 28234
28234: ./httpasync0181
0000561967f7a000 124K r-x-- httpasync0181
0000561968198000 4K r---- httpasync0181
0000561968199000 4K rw--- httpasync0181
000056196819a000 32K rw--- [ anon ]
0000561968b41000 124K rw--- [ anon ]
00007fde9f0e3000 84164K rw--- [ anon ]
00007fdea4314000 2172K r-x-- libgc.so.1.0.3
00007fdea4533000 4K r---- libgc.so.1.0.3
00007fdea4534000 4K rw--- libgc.so.1.0.3
00007fdea4535000 264K rw--- [ anon ]
00007fdea457c000 548K r-x-- ld-musl-x86_64.so.1
00007fdea460c000 1472K rw--- [ anon ]
00007fdea477c000 4K r--s- localtime
00007fdea477f000 8K ----- [ anon ]
00007fdea4781000 84K rw--- [ anon ]
00007fdea4796000 8K ----- [ anon ]
00007fdea4798000 84K rw--- [ anon ]
00007fdea47ad000 8K ----- [ anon ]
00007fdea47af000 340K rw--- [ anon ]
00007fdea4804000 4K r---- ld-musl-x86_64.so.1
00007fdea4805000 4K rw--- ld-musl-x86_64.so.1
00007fdea4806000 12K rw--- [ anon ]
00007fffec4e9000 132K rw--- [ pila ]
00007fffec5ec000 8K r---- [ anon ]
00007fffec5ee000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 89624K
For anyone interested, I found that this:
import asyncdispatch, asyncnet, os
var server = newAsyncSocket() #or newAsyncNativeSocket()
let x = readFile("/proc/self/status")
echo x
uses 44MB with 0.18.x, but only 4MB with 0.17.2.
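To print just the resident-set figure instead of the whole status file, the snippet can be narrowed down a bit (a minimal variation using strutils):

import asyncdispatch, asyncnet, strutils

var server = newAsyncSocket()   # the allocation trigger, same as above
for line in readFile("/proc/self/status").splitLines():
  if line.startsWith("VmRSS"):  # resident set size, what ps reports as RSS
    echo line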
I am thinking about bisecting it now that I have a simple test. I also now have a Dockerfile to bootstrap any commit:
FROM alpine
RUN apk add --no-cache g++ curl tar git
RUN git clone https://github.com/nim-lang/Nim.git;
RUN cd Nim; \
git clone --depth 1 https://github.com/nim-lang/csources.git;
ARG GITCOMMIT
RUN cd /Nim; \
git checkout -b testMemory; \
echo $GITCOMMIT; \
git reset --hard $GITCOMMIT
RUN cd /Nim/csources; \
sh build.sh; \
cd ../; \
bin/nim c koch; \
./koch boot -d:release; \
./koch tools;
Found it. It is a big commit, but I will check it:
d3394be5559c324da1c6b576d0ae9bfa966698d9 is the first bad commit
commit d3394be5559c324da1c6b576d0ae9bfa966698d9
Author: Dominik Picheta <[email protected]>
Date: Wed Nov 22 14:43:10 2017 +0000
Async upcoming (#6585)
* Merge upcoming async with current.
* Various improvements to selectors (mostly docs).
Two changes to highlight:
* Renamed setEvent to trigger
* Reused setBlocking from nativesockets.
* Various changes/fixes to asyncdispatch after upcoming merge.
* Make some attempts to be compatible with older selectors.
* Reuse epoll module in ioselectors_epoll.
:040000 040000 16b975f4a522742edfd7a5c97ad650dd4811b994 251521440149c2bba6c176f3e3d2ccba931b1efa M lib
:040000 040000 46c44c82e288c889cecdb39d2fc565124e8c7590 34039d8cb5aee9718888cbf87ea77ddff885afb9 M tests
/Nim # git bisect log
git bisect start
# bad: [e695f9d94e752fbf18012f9ee164af1ba0649fa1] make Nim compile with older nim versions
git bisect bad e695f9d94e752fbf18012f9ee164af1ba0649fa1
# good: [811fbdafd958443ddac98ad58c77245860b38620] Revert usage of non-standard uname -o command in build.sh (#6342)
git bisect good 811fbdafd958443ddac98ad58c77245860b38620
# good: [157d48bc3208720af98ac5ed60a08494ddc5e328] make 'nim doc2 system.nim' work again
git bisect good 157d48bc3208720af98ac5ed60a08494ddc5e328
# bad: [2bb2e6975e397bef1b320cd5dbafb6b3338fdaf0] Fix infinite recursion when using json.to on ref with cycle.
git bisect bad 2bb2e6975e397bef1b320cd5dbafb6b3338fdaf0
# good: [aac94e0ab58e9bdc896a59422d4ab9dcb87bb9e3] Don't run "kde-open" and "gnome-open" under Unix (#6426)
git bisect good aac94e0ab58e9bdc896a59422d4ab9dcb87bb9e3
# bad: [a372363190488d7267e545ce9ec6c81ccb79ed8c] Don't catch-all in asynchttpserver. It hides bugs.
git bisect bad a372363190488d7267e545ce9ec6c81ccb79ed8c
# bad: [a6226d9452194cc3b8d35a0d38d9f8c89ddf04f5] fixes #6682
git bisect bad a6226d9452194cc3b8d35a0d38d9f8c89ddf04f5
# good: [844e123061631d4cc5bc4300301d1865eda5a337] updated changelog to mention the removal of 'generic' and 'atomic'
git bisect good 844e123061631d4cc5bc4300301d1865eda5a337
# good: [85b00aff56c98f4da8f70b120dc889c18791fab9] fixes #6445 (concepts for the JS target)
git bisect good 85b00aff56c98f4da8f70b120dc889c18791fab9
# bad: [89fa4388574872c3e56e9e5030cdfe4d758d106c] use more make conventions (#6789)
git bisect bad 89fa4388574872c3e56e9e5030cdfe4d758d106c
# bad: [d3394be5559c324da1c6b576d0ae9bfa966698d9] Async upcoming (#6585)
git bisect bad d3394be5559c324da1c6b576d0ae9bfa966698d9
# good: [2c584cdb3d71f0cd811e3774c49db34735040032] use make default linker naming convention (#6787)
git bisect good 2c584cdb3d71f0cd811e3774c49db34735040032
# first bad commit: [d3394be5559c324da1c6b576d0ae9bfa966698d9] Async upcoming (#6585)
OK, I have found out why. In that commit there is a new generic implementation, going from newSelector() (0.17.2) to newSelector[T]() (0.17.3+); on Linux it specifically uses epoll:
proc newSelector*[T](): Selector[T] =
  # Retrieve the maximum fd count (for current OS) via getrlimit()
  var a = RLimit()
  if getrlimit(RLIMIT_NOFILE, a) != 0:
    raiseOsError(osLastError())
  var maxFD = int(a.rlim_max)
  doAssert(maxFD > 0)

  var epollFD = epoll_create(MAX_EPOLL_EVENTS)
  if epollFD < 0:
    raiseOsError(osLastError())

  when hasThreadSupport:
    result = cast[Selector[T]](allocShared0(sizeof(SelectorImpl[T])))
    result.epollFD = epollFD
    result.maxFD = maxFD
    result.fds = allocSharedArray[SelectorKey[T]](maxFD)
  else:
    result = Selector[T]()
    result.epollFD = epollFD
    result.maxFD = maxFD
    result.fds = newSeq[SelectorKey[T]](maxFD)
As you see, it is allocating the fds table with newSeq[SelectorKey[T]](maxFD).
In my system (Docker Alpine) maxFD is set to 1M, and SelectorKey is defined as:
SelectorKey[T] = object
  ident: int
  events: set[Event]
  param: int
  data: T
sizeof(SelectorKey[AsyncData]) is 40 bytes, so doing the math: 40 * 1M = 40MB.
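To sanity-check that estimate on a given machine, the getrlimit part can be run standalone (a small sketch, assuming the 40-byte key size measured above):

import posix

# Estimate the selector's fds table size from the process hard limit,
# assuming 40 bytes per SelectorKey[AsyncData] as measured above.
var a = RLimit()
if getrlimit(RLIMIT_NOFILE, a) == 0:
  let maxFD = int(a.rlim_max)
  echo "rlim_max: ", maxFD
  echo "estimated fds table: ", maxFD * 40 div (1024 * 1024), " MB"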
A simple workaround that uses 10x less memory:
proc newSelector*[T](): Selector[T] =
  # Retrieve the maximum fd count (for current OS) via getrlimit()
  var a = RLimit()
  if getrlimit(RLIMIT_NOFILE, a) != 0:
    raiseOsError(osLastError())
  var maxFD = int(a.rlim_max)
  doAssert(maxFD > 0)

  var epollFD = epoll_create(MAX_EPOLL_EVENTS)
  if epollFD < 0:
    raiseOsError(osLastError())

  when defined(maxFileDescriptors):
    const maxFileDescriptors {.intdefine.}: int = 132072
    if maxFD > maxFileDescriptors:
      maxFD = maxFileDescriptors

  when hasThreadSupport:
    result = cast[Selector[T]](allocShared0(sizeof(SelectorImpl[T])))
    result.epollFD = epollFD
    result.maxFD = maxFD
    result.fds = allocSharedArray[SelectorKey[T]](maxFD)
  else:
    result = Selector[T]()
    result.epollFD = epollFD
    result.maxFD = maxFD
    result.fds = newSeq[SelectorKey[T]](maxFD)
For a proper solution, we should dynamically grow fds when more than initialFileDescriptors file descriptors are requested.
Nice find :)
I think dynamically increasing FDs might be less efficient than allocating them all up front. Perhaps we could offer this as an option?
If performance is the reason to avoid allocations, my suggestion is to allocate some initial amount that will be good for 99% of use cases; if that initial amount is reached and maxFD is bigger, reallocate one more time to the full maxFD and copy from the initial seq. This allocation will only affect whoever really needs maxFD descriptors. Of course, that initial amount should be definable, like:
-d:initialFileDescriptors=4096
Another reason to avoid extra allocations or reallocations is thread contention, but my suggestion is quite simple, and again it would only apply to whoever needs all the FDs.
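Something like this rough sketch is what I have in mind (illustrative only; FakeSelectorKey and ensureFd are made-up names here, not the actual selectors code):

const initialFileDescriptors {.intdefine.} = 4096

type
  FakeSelectorKey[T] = object    # stand-in for the private SelectorKey[T]
    ident: int
    param: int
    data: T

proc ensureFd[T](fds: var seq[FakeSelectorKey[T]], fd, maxFD: int) =
  ## Grow the key table only when `fd` does not fit yet: a single
  ## reallocation up to the hard limit, which most programs never hit.
  if fd >= fds.len and fd < maxFD:
    fds.setLen(maxFD)

var fds = newSeq[FakeSelectorKey[int]](initialFileDescriptors)
ensureFd(fds, 200_000, 1_048_576)   # only now does the table grow to maxFD
echo fds.len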