So I'm compiling some new code with 0.18.0.
This code compiles and runs correctly as one process with no problems.
    proc segsieve(Kn: int) =                 # for Kn resgroups|bytes in segment
      ...
      for indx in 0..5:                      # for nextp row indexes,
        var r = indx + 1                     # indx|r: 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 4
        if indx > 3: r += 1                  # indx|r: 4 -> 6, 5 -> 7 restracks in seg
        let biti = uint8(1 shl r)            # set its residue track bit mask
        let row  = indx * pcnt               # set address to its restrack in 'nextp'
        for j, prime in primes:              # for each prime r1..sqrt(N)
          if nextp[row + j] < Kn.uint:       # if 1st mult resgroup is within 'seg'
            var k = int(nextp[row + j])      # starting from this resgroup in 'seg'
            while k < Kn:                    # for each primenth byte to end of 'seg'
              seg[k] = seg[k] or biti        # mark restrack bit in byte as nonprime
              k += prime                     # compute next prime multiple resgroup
            nextp[row + j] = uint(k-Kn)      # save 1st resgroup in next eligible seg
          else: nextp[row+j] -= Kn.uint      # do if 1st mult resgroup not within seg
      ...
Then I separated it into 2 processes to make it parallel-ready, but first I ran it single-threaded.
    proc residue_sieve(Kn, r, indx: int) =
      let biti = uint8(1 shl r)              # set its residue track bit mask
      let row  = indx * pcnt                 # set address to its restrack in 'nextp'
      for j, prime in primes:                # for each prime r1..sqrt(N)
        if nextp[row + j] < Kn.uint:         # if 1st mult resgroup is within 'seg'
          var k = int(nextp[row + j])        # starting from this resgroup in 'seg'
          while k < Kn:                      # for each primenth byte to end of 'seg'
            seg[k] = seg[k] or biti          # mark restrack bit in byte as nonprime
            k += prime                       # compute next prime multiple resgroup
          nextp[row + j] = uint(k-Kn)        # save 1st resgroup in next eligible seg
        else: nextp[row+j] -= Kn.uint        # do if 1st mult resgroup not within seg

    proc segsieve(Kn: int) =                 # for Kn resgroups|bytes in segment
      ...
      for indx in 0..5:                      # for nextp row indexes,
        var r = indx + 1                     # indx|r: 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 4
        if indx > 3: r += 1                  # indx|r: 4 -> 6, 5 -> 7 restracks in seg
        residue_sieve(Kn, r, indx)           # sieve on just the necessary restracks
      ...
It compiles with no problems and runs correctly until the input value gets past a certain size; then it crashes at runtime with a segfault:
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
This seems like a clear compiler error. Doing the same work in one process compiles and runs correctly, while splitting it into 2 processes compiles but only works correctly for input values below some threshold.
It seems the error occurs when reading from the memory of nextp[] or seg[]. I have no idea why it's doing that, or how to fix it.
Also, the compiled binary for the single-process version (111,128 bytes) is about 18K bytes smaller than the one for the separated version (129,991 bytes).
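For comparison, here is a minimal self-contained sketch of the split version. In the post, `seg`, `nextp`, `primes` and `pcnt` appear to be module-level variables that are not shown; the sketch instead passes them in explicitly and assumes `pcnt` equals `primes.len` (the program prints `nextp[6x...]`, i.e. 6 rows of one entry per prime). The name `residueSieve` and the toy values are hypothetical; this only makes the proc's dependencies explicit and testable in isolation, it is not presented as a fix for the crash.

```nim
# Hypothetical self-contained variant of residue_sieve: seg, nextp and
# primes are parameters instead of module-level variables, and pcnt is
# assumed to equal primes.len (nextp holds 6 rows of pcnt entries).
proc residueSieve(seg: var openArray[uint8], nextp: var openArray[uint],
                  primes: openArray[int], Kn, r, indx: int) =
  let biti = uint8(1 shl r)           # bit mask for this residue track
  let row = indx * primes.len         # start of this track's row in nextp
  for j, prime in primes:             # for each sieving prime
    if nextp[row + j] < Kn.uint:      # 1st multiple lies in this segment
      var k = int(nextp[row + j])
      while k < Kn:                   # mark every prime-th byte to the end
        seg[k] = seg[k] or biti
        k += prime
      nextp[row + j] = uint(k - Kn)   # offset of 1st multiple in next segment
    else:
      nextp[row + j] -= Kn.uint       # 1st multiple lies in a later segment

when isMainModule:
  var seg = newSeq[uint8](10)         # toy 10-byte segment
  var nextp = @[2'u, 15'u]            # one row (indx = 0) for two primes
  residueSieve(seg, nextp, [3, 7], 10, 1, 0)
  echo seg    # → @[0, 0, 2, 0, 0, 2, 0, 0, 2, 0]
  echo nextp  # → @[1, 5]
```

With prime 3 starting at byte 2, bytes 2, 5 and 8 get bit 1 set and the carried offset becomes 11 - 10 = 1; prime 7's first multiple (15) is beyond this segment, so its offset just drops by 10.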
Turn off your OS's memory overcommitment.
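On Linux, memory overcommitment is controlled by the `vm.overcommit_memory` kernel setting, readable at /proc/sys/vm/overcommit_memory and changeable as root with `sysctl`: 0 is the heuristic default, 1 always overcommits, and 2 selects strict accounting, which is what "turning it off" means. With strict accounting, an oversized allocation should fail at the point of the request rather than appear to succeed and cause trouble only later. A small Linux-only sketch that reports the current mode (the proc name `describeOvercommit` is hypothetical):

```nim
# Linux-only sketch: report the kernel's memory overcommit policy.
# Changing it needs root, e.g.: sysctl vm.overcommit_memory=2
import strutils

proc describeOvercommit(mode: int): string =
  ## Maps a vm.overcommit_memory value to a short description.
  case mode
  of 0: "0: heuristic overcommit (the default)"
  of 1: "1: always overcommit"
  of 2: "2: strict accounting (overcommit turned off)"
  else: "unknown mode " & $mode

when isMainModule:
  try:
    let raw = readFile("/proc/sys/vm/overcommit_memory")
    echo describeOvercommit(parseInt(raw.strip()))
  except IOError:
    echo "overcommit setting not available (not Linux?)"
```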
> This seems like a clear compiler error.
On the contrary, it looks like a clear programming bug or an out-of-memory (OOM) condition.
> I have no idea why it's doing that, or how to fix it.
A full example we can compile and run would be helpful.
> Turn off your OS's memory overcommitment.
I have no idea what this means. Giving an example would be helpful next time.
Unfortunately, 0.18.0 has some big-time regressions. I compiled the same code with 0.17.2, and not only does it compile and run correctly for all input values, but the compiled binary also shrank from ~112K bytes to ~85K bytes. So I guess I'm sticking with 0.17.2 for a while.
Below are the source files:
twinprimes_ssozp5a2.nim - sieve as single process
https://gist.github.com/jzakiya/8f7f4f0dc8c9efd70870a1b3449c60cc
twinprimes_ssozp5a2a.nim - sieve separated into 2 processes
https://gist.github.com/jzakiya/308b20892013f41d808c926b30e9b94d
Compile as follows (directions are in the comments at the beginning of each file):
$ nim c --cc:gcc --d:release --gc:none twinprimes_ssozp5a2.nim    (or twinprimes_ssozp5a2a.nim)
then run it and enter a big number, e.g. 500 billion (500_000_000_000):
$ ./twinprimes_ssozp5a2    (or ./twinprimes_ssozp5a2a)
Enter integer number: 500000000000
This takes ~246 secs (about 4 minutes, single-threaded) on my i7 3.5 GHz laptop running a 64-bit Linux distro.
Large values like this crash twinprimes_ssozp5a2a with 0.18.0, but both programs run perfectly with 0.17.2.
@jzakiya, what exactly do you mean by the word "process"? I do not see any use of the osproc module in your code...
Also, what is an input value for which you get that error?
Hello,
I tested your code on a similar configuration (64-bit Linux, i7-2675QM 2.20GHz). With an input of 500 billion, the system had to use swap and then crashed.
Moreover, for 50 billion, with --gc:none:
$ /usr/bin/time ./twinprimes_ssozp5a2a
Enter integer number: 50000000000
segment has 262144 bytes and residues groups
prime candidates = 13333333333; resgroups = 1666666667
create nextp[6x19907] array
perform Twin Prime Segmented SoZ
last segment = 217259 resgroups; segment slices = 6358
total twins = 118903682; last twin = 49999999590+/-1
total time = 30.972 secs
elapsed = 0:33.04 s
user = 30.37 s
system = 0.66 s
CPU = 93%
Mem = 0 kB
Mmax= 1659060 kB
inputs = 0
outputs = 0
swaps = 0
With the default GC:
Enter integer number: 50000000000
segment has 262144 bytes and residues groups
prime candidates = 13333333333; resgroups = 1666666667
create nextp[6x19907] array
perform Twin Prime Segmented SoZ
last segment = 217259 resgroups; segment slices = 6358
total twins = 118903682; last twin = 49999999590+/-1
total time = 29.736 secs
elapsed = 0:32.34 s
user = 29.73 s
system = 0.00 s
CPU = 91%
Mem = 0 kB
Mmax= 6980 kB
inputs = 0
outputs = 0
swaps = 0
That is 1,659,060 kB of maximum memory without the GC versus 6,980 kB with the GC.
500 billion, with the default GC:
Enter integer number: 500000000000
segment has 262144 bytes and residues groups
prime candidates = 133333333333; resgroups = 16666666667
create nextp[6x57081] array
perform Twin Prime Segmented SoZ
last segment = 75435 resgroups; segment slices = 63579
total twins = 986222314; last twin = 499999999062+/-1
total time = 361.130 secs
elapsed = 6:10.44 s
user = 361.12 s
system = 0.01 s
CPU = 97%
Mem = 0 kB
Mmax= 6376 kB
inputs = 0
outputs = 0
swaps = 0
I speculate that the new memory allocator introduced in the 0.18.0 release explains the difference in behaviour between 0.17.2 and 0.18.0. In any case, it seems you can't afford to run this program without the GC unless you manage memory manually.
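To make "manage memory manually" concrete, here is a minimal sketch (not the author's sieve) that allocates one segment buffer with `alloc0`, clears and reuses it for every segment with `zeroMem`, and releases it once with `dealloc`, so memory use stays flat whether or not a GC is running. The proc name `sieveWithManualBuffer` and the mark-every-third-byte loop are hypothetical stand-ins for the real sieving work.

```nim
# Sketch: manage the segment buffer by hand so nothing depends on a GC.
# One buffer is allocated up front, cleared and reused for every segment,
# and released exactly once at the end.
proc sieveWithManualBuffer(segBytes, nSegments: int): int =
  let seg = cast[ptr UncheckedArray[uint8]](alloc0(segBytes))
  try:
    for segment in 0 ..< nSegments:
      zeroMem(seg, segBytes)          # reuse the same memory for each segment
      for k in countup(0, segBytes - 1, 3):
        seg[k] = 1'u8                 # placeholder "sieving": mark every 3rd byte
      for k in 0 ..< segBytes:
        if seg[k] == 0'u8: inc result # count bytes left unmarked
  finally:
    dealloc(seg)                      # freed once, even if an exception occurs

when isMainModule:
  echo sieveWithManualBuffer(10, 4)   # → 24 (6 unmarked bytes per segment)
```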
jzakiya, you may wonder why your program eats that much memory.
I just looked at your code and tried this:
    #for byt in seg[0..Kn-1]:               # count the twin primes in the segment
    #  primecnt += uint(pbits[byt])         # count the '0' bit pairs as twin primes
    for jj in 0 .. Kn-1:                    # count the twin primes in the segment
      primecnt += uint(pbits[seg[jj]])      # index seg directly: no slice copy
$ nim c --cc:gcc --d:release --gc:none h.nim
$ ./h
Enter integer number: 500_000_000_000
segment has 262144 bytes and residues groups
prime candidates = 133333333333; resgroups = 16666666667
create nextp[6x57081] array
perform Twin Prime Segmented SoZ
last segment = 75435 resgroups; segment slices = 63579
total twins = 986222314; last twin = 499999999062+/-1
total time = 205.109 secs
It seems to work now, so that part of your code apparently creates a copy of the seq (the slice seg[0..Kn-1]) on every call.
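A sketch of why the change matters, assuming the slice copy is the only per-segment leak: `seg[0..Kn-1]` on a seq builds a new seq, so each segment slice allocates another Kn bytes that --gc:none never frees, while indexing directly or iterating over a `toOpenArray` view (available in current Nim) does not allocate. The helper names `countTwins` and `leakedBytes` are hypothetical. The arithmetic lines up with the measurements above: 6358 slices of 262144 bytes is 1,627,648 kB, close to the 1,659,060 kB Mmax reported for 50 billion, and 63579 slices for 500 billion would be roughly 16.7 GB, which explains the swapping and the crash.

```nim
# Sketch: count via a zero-copy view instead of a slice copy, and
# estimate how much a one-copy-per-segment leak would cost.
proc countTwins(segView: openArray[uint8], pbits: openArray[uint8]): uint =
  for byt in segView:                 # iterates in place, no new seq
    result += uint(pbits[int(byt)])

proc leakedBytes(slices, segBytes: int): int =
  ## Rough leak size if every segment slice leaks one segBytes copy.
  slices * segBytes

when isMainModule:
  var pbits: array[256, uint8]        # toy lookup table: every byte counts 1
  for i in 0 .. 255: pbits[i] = 1
  let seg = newSeq[uint8](8)
  echo countTwins(seg.toOpenArray(0, 4), pbits)   # → 5
  echo leakedBytes(6358, 262144) div 1024         # → 1627648 (kB)
```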
Stefan_Salewski, yes, thanks for reminding me about that again.
The twinprimes code is old (almost a year old, written with 0.17.x), from when I was just getting serious about converting C++ code into Nim. The original construct for that loop worked because I always compiled with the GC on. Then I started compiling with --gc:none to see the difference. If you open a terminal window with htop while the program runs, you can watch memory being eaten up in real time (fascinating). I eventually found that problem too, and all current versions of my code have that implementation corrected. But when 0.18.0 came out, I realized I had never updated THAT original code, and I used it to test the new release (it works fine with a default compile). As soon as you pointed it out, I checked my current codebase, and it all has the correction, so it can run without the GC and not eat up memory.
But as for the original issue, I think I've traced it to differences between gcc versions.
The base system on my laptop is old, and I haven't gotten around to updating it yet. It uses gcc 4.9.2, so what I observed occurred on that configuration. Today I installed 0.18.0 in a VirtualBox VM running the updated release of my base distro, which uses gcc 7.3.0, and voila! The programs run whether compiled with the GC on or off. So the gcc version seems to be the cause of the different runtime behavior (the programs compile on both systems). (Actually, this is one reason, or excuse, I keep my old configuration around: to see what breaks on it.)
One lesson learned: don't change the compiler option settings I put in my code; they are there for a reason.
I'm going to run tests in other VirtualBox distro instances to make sure the code runs on those systems too. But I feel better knowing that it's not 0.18.0 per se, but which C compiler is used with it. If I feel really motivated to experiment, I may check the behavior on both systems when compiling with clang.
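Since the same Nim release behaved differently under gcc 4.9.2 and gcc 7.3.0, it may help to have a program report the toolchain it was built with when comparing systems. A small sketch using Nim's `NimVersion`, `hostOS` and `hostCPU` constants; it assumes Nim defines a `gcc` or `clang` symbol matching the `--cc:` backend, and the exact gcc version still has to come from `gcc --version`. The proc name `buildInfo` is hypothetical.

```nim
# Sketch: describe the toolchain a binary was built with, for bug reports.
proc buildInfo(): string =
  var cc = "other C compiler"         # fallback if no known symbol is set
  when defined(clang):
    cc = "clang"
  elif defined(gcc):
    cc = "gcc"
  var mode = "debug"
  when defined(release):
    mode = "release"
  result = "Nim " & NimVersion & " on " & hostOS & "/" & hostCPU &
           ", backend: " & cc & ", " & mode & " build"

when isMainModule:
  echo buildInfo()
```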