I have done a little test to find out if I could run some microservices with async code in memory constrained devices. Here is the test code:
https://gist.github.com/aguspiza/80e34b5cf65aa3bbfd19c7339ee9b695
That test uses 5x more memory with 0.18.0 than with 0.17.2 (or any previous version, at least with the Boehm GC).
Here you have some numbers:
Linux x64 0.17.2 boehm 24MB used (release 14MB)
Linux x64 0.18.0 boehm 94MB used (release 84MB)
Other GCs show similar increases (+40MB):
0.17.2 default gc 70MB (release 30-36MB)
0.18.0 default gc 146MB (release 110MB)
similar result with gc:v2 146MB (release 110MB)
I am getting those numbers from the RSS column of "ps":
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 13946 38.9 0.0 18024 15136 pts/0 Sl+ 11:55 0:22 ./httpasync
Is this a known issue?
Is this expected from the latest GC optimization (the TLSF algorithm) in 0.18.0?
Why is there so much difference between the GCs?
Should I use Boehm, mark-and-sweep, v2, or the default GC?
Boehm is giving the best results, but Araq told me (on IRC) that it is not using Nim's memory allocator?!?!
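By the way, the GC's own view can also be printed from inside the process; this only covers the Nim heap (or the Boehm heap with --gc:boehm, where the figures may be approximate), so it will not match the RSS numbers above exactly:

echo "occupied: ", getOccupiedMem() div 1024, " KB"   # bytes holding live data
echo "free:     ", getFreeMem() div 1024, " KB"       # owned by the process but currently free
echo "total:    ", getTotalMem() div 1024, " KB"      # total heap obtained from the OS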
Thanks Araq. Right now I am using "clients" to avoid deallocating memory, so that I can measure peak memory use. In the future it will be used to dispatch messages to specific connected clients, and the service should also remove clients from that sequence when they disconnect. Sorry, I will try to clean up the code so that the test makes more "sense"; that may help me find out which part is allocating those extra 70MB. (Sorry for the long post):
sudo pmap 26153
26153: httpasync/src/httpasync018
0000556fff75a000 124K r-x-- httpasync018
0000556fff978000 4K r---- httpasync018
0000556fff979000 4K rw--- httpasync018
0000556fff97a000 32K rw--- [ anon ]
0000557000ed2000 124K rw--- [ anon ]
00007f2806a93000 84164K rw--- [ anon ]
00007f280bcc4000 2172K r-x-- libgc.so.1.0.3
00007f280bee3000 4K r---- libgc.so.1.0.3
00007f280bee4000 4K rw--- libgc.so.1.0.3
00007f280bee5000 264K rw--- [ anon ]
00007f280bf2c000 548K r-x-- ld-musl-x86_64.so.1
00007f280bfbc000 1472K rw--- [ anon ]
00007f280c12c000 4K r--s- localtime
00007f280c12f000 8K ----- [ anon ]
00007f280c131000 84K rw--- [ anon ]
00007f280c146000 8K ----- [ anon ]
00007f280c148000 84K rw--- [ anon ]
00007f280c15d000 8K ----- [ anon ]
00007f280c15f000 340K rw--- [ anon ]
00007f280c1b4000 4K r---- ld-musl-x86_64.so.1
00007f280c1b5000 4K rw--- ld-musl-x86_64.so.1
00007f280c1b6000 12K rw--- [ anon ]
00007ffdf4ad5000 132K rw--- [ pila ]
00007ffdf4b9c000 8K r---- [ anon ]
00007ffdf4b9e000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 89624K
versus 0.17.2
sudo pmap 26094
26094: httpasync/src/httpasync
000055b617f37000 104K r-x-- httpasync
000055b618150000 4K r---- httpasync
000055b618151000 4K rw--- httpasync
000055b618152000 28K rw--- [ anon ]
000055b6194e0000 120K rw--- [ anon ]
00007fe5853e4000 11648K rw--- [ anon ]
00007fe585f44000 2172K r-x-- libgc.so.1.0.3
00007fe586163000 4K r---- libgc.so.1.0.3
00007fe586164000 4K rw--- libgc.so.1.0.3
00007fe586165000 264K rw--- [ anon ]
00007fe5861ac000 548K r-x-- ld-musl-x86_64.so.1
00007fe586235000 1512K rw--- [ anon ]
00007fe5863af000 8K ----- [ anon ]
00007fe5863b1000 84K rw--- [ anon ]
00007fe5863c6000 8K ----- [ anon ]
00007fe5863c8000 84K rw--- [ anon ]
00007fe5863dd000 8K ----- [ anon ]
00007fe5863df000 340K rw--- [ anon ]
00007fe586434000 4K r---- ld-musl-x86_64.so.1
00007fe586435000 4K rw--- ld-musl-x86_64.so.1
00007fe586436000 12K rw--- [ anon ]
00007ffebaff6000 132K rw--- [ pila ]
00007ffebb124000 8K r---- [ anon ]
00007ffebb126000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 17116K
Anyway, I understand that it should work (leak) in a similar way regardless of the GC or Nim version used. There is a fixed number of clients connecting, so that sequence will grow to 1000 and stop there. I am measuring memory use after all clients (1000) have sent and received 1000 messages, so it should not depend on the speed at which they are added.
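For anyone who does not want to open the gist: the client handling is roughly shaped like this (a simplified sketch, not the exact test code; the port and proc names are just illustrative, and it already includes the disconnect cleanup mentioned above):

import asyncnet, asyncdispatch

var clients: seq[AsyncSocket] = @[]        # grows up to the 1000 test clients

proc processClient(client: AsyncSocket) {.async.} =
  while true:
    let line = await client.recvLine()
    if line.len == 0: break                # client disconnected
    await client.send(line & "\c\L")
  clients.delete(clients.find(client))     # forget the client on disconnect
  client.close()

proc serve() {.async.} =
  var server = newAsyncSocket()
  server.setSockOpt(OptReuseAddr, true)
  server.bindAddr(Port(8080))
  server.listen()
  while true:
    let client = await server.accept()
    clients.add(client)
    asyncCheck processClient(client)

asyncCheck serve()
runForever()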
jcosborn, I have been doing all tests with the "official" Docker images (nimlang/nim:0.xx.y-alpine), and nimlang/nim:devel reports version 0.15.3. I have been looking for the original Dockerfile of those images to build an updated devel branch, but it seems it is actually generated with Nim code in a very smart way :) I'll put some time into this and report back.
Indeed, no changes:
sudo pmap 28234
28234: ./httpasync0181
0000561967f7a000 124K r-x-- httpasync0181
0000561968198000 4K r---- httpasync0181
0000561968199000 4K rw--- httpasync0181
000056196819a000 32K rw--- [ anon ]
0000561968b41000 124K rw--- [ anon ]
00007fde9f0e3000 84164K rw--- [ anon ]
00007fdea4314000 2172K r-x-- libgc.so.1.0.3
00007fdea4533000 4K r---- libgc.so.1.0.3
00007fdea4534000 4K rw--- libgc.so.1.0.3
00007fdea4535000 264K rw--- [ anon ]
00007fdea457c000 548K r-x-- ld-musl-x86_64.so.1
00007fdea460c000 1472K rw--- [ anon ]
00007fdea477c000 4K r--s- localtime
00007fdea477f000 8K ----- [ anon ]
00007fdea4781000 84K rw--- [ anon ]
00007fdea4796000 8K ----- [ anon ]
00007fdea4798000 84K rw--- [ anon ]
00007fdea47ad000 8K ----- [ anon ]
00007fdea47af000 340K rw--- [ anon ]
00007fdea4804000 4K r---- ld-musl-x86_64.so.1
00007fdea4805000 4K rw--- ld-musl-x86_64.so.1
00007fdea4806000 12K rw--- [ anon ]
00007fffec4e9000 132K rw--- [ pila ]
00007fffec5ec000 8K r---- [ anon ]
00007fffec5ee000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 89624K
For anyone interested, I found that this:
import asyncdispatch, asyncnet, os
var server = newAsyncSocket() #or newAsyncNativeSocket()
let x = readFile("/proc/self/status")
echo x
uses 44MB with 0.18.x, but only 4MB with 0.17.2.
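To print just the resident-set figure instead of the whole status file, the snippet can be narrowed down a bit (a minimal variation using strutils):

import asyncdispatch, asyncnet, strutils

var server = newAsyncSocket()   # the allocation trigger, same as above
for line in readFile("/proc/self/status").splitLines():
  if line.startsWith("VmRSS"):  # resident set size, what ps reports as RSS
    echo line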
I am thinking about bisecting it now that I have a simple test. I also now have a Dockerfile to bootstrap any commit:
FROM alpine
RUN apk add --no-cache g++ curl tar git
RUN git clone https://github.com/nim-lang/Nim.git;
RUN cd Nim; \
git clone --depth 1 https://github.com/nim-lang/csources.git;
ARG GITCOMMIT
RUN cd /Nim; \
git checkout -b testMemory; \
echo $GITCOMMIT; \
git reset --hard $GITCOMMIT
RUN cd /Nim/csources; \
sh build.sh; \
cd ../; \
bin/nim c koch; \
./koch boot -d:release; \
./koch tools;
Found it. It is a big commit, but I will check it:
d3394be5559c324da1c6b576d0ae9bfa966698d9 is the first bad commit
commit d3394be5559c324da1c6b576d0ae9bfa966698d9
Author: Dominik Picheta <[email protected]>
Date: Wed Nov 22 14:43:10 2017 +0000
Async upcoming (#6585)
* Merge upcoming async with current.
* Various improvements to selectors (mostly docs).
Two changes to highlight:
* Renamed setEvent to trigger
* Reused setBlocking from nativesockets.
* Various changes/fixes to asyncdispatch after upcoming merge.
* Make some attempts to be compatible with older selectors.
* Reuse epoll module in ioselectors_epoll.
:040000 040000 16b975f4a522742edfd7a5c97ad650dd4811b994 251521440149c2bba6c176f3e3d2ccba931b1efa M lib
:040000 040000 46c44c82e288c889cecdb39d2fc565124e8c7590 34039d8cb5aee9718888cbf87ea77ddff885afb9 M tests
/Nim # git bisect log
git bisect start
# bad: [e695f9d94e752fbf18012f9ee164af1ba0649fa1] make Nim compile with older nim versions
git bisect bad e695f9d94e752fbf18012f9ee164af1ba0649fa1
# good: [811fbdafd958443ddac98ad58c77245860b38620] Revert usage of non-standard uname -o command in build.sh (#6342)
git bisect good 811fbdafd958443ddac98ad58c77245860b38620
# good: [157d48bc3208720af98ac5ed60a08494ddc5e328] make 'nim doc2 system.nim' work again
git bisect good 157d48bc3208720af98ac5ed60a08494ddc5e328
# bad: [2bb2e6975e397bef1b320cd5dbafb6b3338fdaf0] Fix infinite recursion when using json.to on ref with cycle.
git bisect bad 2bb2e6975e397bef1b320cd5dbafb6b3338fdaf0
# good: [aac94e0ab58e9bdc896a59422d4ab9dcb87bb9e3] Don't run "kde-open" and "gnome-open" under Unix (#6426)
git bisect good aac94e0ab58e9bdc896a59422d4ab9dcb87bb9e3
# bad: [a372363190488d7267e545ce9ec6c81ccb79ed8c] Don't catch-all in asynchttpserver. It hides bugs.
git bisect bad a372363190488d7267e545ce9ec6c81ccb79ed8c
# bad: [a6226d9452194cc3b8d35a0d38d9f8c89ddf04f5] fixes #6682
git bisect bad a6226d9452194cc3b8d35a0d38d9f8c89ddf04f5
# good: [844e123061631d4cc5bc4300301d1865eda5a337] updated changelog to mention the removal of 'generic' and 'atomic'
git bisect good 844e123061631d4cc5bc4300301d1865eda5a337
# good: [85b00aff56c98f4da8f70b120dc889c18791fab9] fixes #6445 (concepts for the JS target)
git bisect good 85b00aff56c98f4da8f70b120dc889c18791fab9
# bad: [89fa4388574872c3e56e9e5030cdfe4d758d106c] use more make conventions (#6789)
git bisect bad 89fa4388574872c3e56e9e5030cdfe4d758d106c
# bad: [d3394be5559c324da1c6b576d0ae9bfa966698d9] Async upcoming (#6585)
git bisect bad d3394be5559c324da1c6b576d0ae9bfa966698d9
# good: [2c584cdb3d71f0cd811e3774c49db34735040032] use make default linker naming convention (#6787)
git bisect good 2c584cdb3d71f0cd811e3774c49db34735040032
# first bad commit: [d3394be5559c324da1c6b576d0ae9bfa966698d9] Async upcoming (#6585)
OK, I have found out why. In that commit there is a new generic implementation, going from newSelector() (0.17.2) to newSelector[T]() (0.17.3+); on Linux it specifically uses epoll:
proc newSelector*[T](): Selector[T] =
  # Retrieve the maximum fd count (for current OS) via getrlimit()
  var a = RLimit()
  if getrlimit(RLIMIT_NOFILE, a) != 0:
    raiseOsError(osLastError())
  var maxFD = int(a.rlim_max)
  doAssert(maxFD > 0)

  var epollFD = epoll_create(MAX_EPOLL_EVENTS)
  if epollFD < 0:
    raiseOsError(osLastError())

  when hasThreadSupport:
    result = cast[Selector[T]](allocShared0(sizeof(SelectorImpl[T])))
    result.epollFD = epollFD
    result.maxFD = maxFD
    result.fds = allocSharedArray[SelectorKey[T]](maxFD)
  else:
    result = Selector[T]()
    result.epollFD = epollFD
    result.maxFD = maxFD
    result.fds = newSeq[SelectorKey[T]](maxFD)
As you see, it is allocating the fds table with newSeq[SelectorKey[T]](maxFD).
In my system (Docker Alpine) maxFD is set to 1M, and SelectorKey is defined as:
SelectorKey[T] = object
  ident: int
  events: set[Event]
  param: int
  data: T
sizeof(SelectorKey[AsyncData]) is 40 bytes, so doing the math: 40 * 1M = 40MB.
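To sanity-check that estimate on a given machine, the getrlimit part can be run standalone (a small sketch, assuming the 40-byte key size measured above):

import posix

# Estimate the selector's fds table size from the process hard limit,
# assuming 40 bytes per SelectorKey[AsyncData] as measured above.
var a = RLimit()
if getrlimit(RLIMIT_NOFILE, a) == 0:
  let maxFD = int(a.rlim_max)
  echo "rlim_max: ", maxFD
  echo "estimated fds table: ", maxFD * 40 div (1024 * 1024), " MB"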
A simple workaround that uses 10x less memory:
proc newSelector*[T](): Selector[T] =
  # Retrieve the maximum fd count (for current OS) via getrlimit()
  var a = RLimit()
  if getrlimit(RLIMIT_NOFILE, a) != 0:
    raiseOsError(osLastError())
  var maxFD = int(a.rlim_max)
  doAssert(maxFD > 0)

  var epollFD = epoll_create(MAX_EPOLL_EVENTS)
  if epollFD < 0:
    raiseOsError(osLastError())

  when defined(maxFileDescriptors):
    const maxFileDescriptors {.intdefine.}: int = 132072
    if maxFD > maxFileDescriptors:
      maxFD = maxFileDescriptors

  when hasThreadSupport:
    result = cast[Selector[T]](allocShared0(sizeof(SelectorImpl[T])))
    result.epollFD = epollFD
    result.maxFD = maxFD
    result.fds = allocSharedArray[SelectorKey[T]](maxFD)
  else:
    result = Selector[T]()
    result.epollFD = epollFD
    result.maxFD = maxFD
    result.fds = newSeq[SelectorKey[T]](maxFD)
For a proper solution, we should dynamically grow fds when more than initialFileDescriptors file descriptors are requested.
Nice find :)
I think dynamically increasing FDs might be less efficient than allocating them all up front. Perhaps we could offer this as an option?
If performance is the reason to avoid allocations, my suggestion is to allocate some initial amount that will be good for 99% of use cases; if that initial amount is reached and maxFD is bigger, reallocate one more time to the full maxFD and copy from the initial seq. This allocation will only affect whoever really needs maxFD descriptors. Of course, that initial amount should be definable, like:
-d:initialFileDescriptors=4096
Another reason to avoid extra allocations or reallocations is thread contention, but my suggestion is quite simple, and again it would only apply to whoever needs all the FDs.
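Something like this rough sketch is what I have in mind (illustrative only; FakeSelectorKey and ensureFd are made-up names here, not the actual selectors code):

const initialFileDescriptors {.intdefine.} = 4096

type
  FakeSelectorKey[T] = object    # stand-in for the private SelectorKey[T]
    ident: int
    param: int
    data: T

proc ensureFd[T](fds: var seq[FakeSelectorKey[T]], fd, maxFD: int) =
  ## Grow the key table only when `fd` does not fit yet: a single
  ## reallocation up to the hard limit, which most programs never hit.
  if fd >= fds.len and fd < maxFD:
    fds.setLen(maxFD)

var fds = newSeq[FakeSelectorKey[int]](initialFileDescriptors)
ensureFd(fds, 200_000, 1_048_576)   # only now does the table grow to maxFD
echo fds.len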