I have identified a very subtle memory management error that took me weeks to characterize.
I'm directly translating a working C++ program to Nim. The problem seems to be that the Nim code erroneously addresses the seg array beyond its bounds, because the k values used to update the nextp array are thrown off, which causes incorrect values in seg to accumulate as the segment iterations increase.
In Listing 1 I show the code where the problem occurs. In it I output values to see the difference that starts to occur between the C++ and Nim code; here I output the values of k and seg[ki] before they are updated in the loop. Both versions output the same values for the first complete segment iteration. Starting within the second iteration those values begin to differ, as nextp is not getting the correct values in the Nim version.
Listing 1.
C++
for (uint j=0; j < pcnt; ++j){     // for each prime r1..sqrt(N)
  if (nextp[row+j] < Kn) {         // if 1st mult resgroup is within 'seg'
    uint k = nextp[row+j];         // starting from this resgroup in 'seg'
    uint ki = k*bprg + byti;       // convert it to a byte address in seg[]
    uint prime = primes[j];        // get prime, convert it to number of
    uint prmstep = prime * bprg;   // bytes to next prime resgroup byte
    for (; k < Kn; k += prime){    // for each primenth byte to end of 'seg'
      cout << "(" << r << "," << prime << "," << k << "," << (seg[ki] & 255) << "," << biti << ") ";
      seg[ki] |= biti;             // mark restrack bit in byte as nonprime
      ki += prmstep;               // byte in next prime multiple resgroup
    }
    nextp[row+j] = k - Kn;         // save 1st resgroup in next eligible seg
  }
  else nextp[row+j] -= Kn;         // when 1st mult resgroup > seg resgroup range
}
-------------------------------------------------------------------------------
Nim
for j, prime in primes:            # for each prime r1..sqrt(N)
  if nextp[row + j] < uint(Kn):    # if 1st mult resgroup is within 'seg'
    var k = int(nextp[row + j])    # starting from this resgroup in 'seg'
    var ki = k*bprg + byti         # convert it to a byte address in 'seg'
    let prmstep = prime * bprg     # set number of bytes to next prime mult
    while k < Kn:                  # for each primenth byte to end of 'seg'
      write(stdout, "(", r, ",", prime, ",", k, ",", seg[ki], ",", biti, ") ")
      seg[ki] = seg[ki] or biti    # mark restrack bit in byte as nonprime
      ki += prmstep                # byte in next prime multiple resgroup
      k += prime                   # set resgroup of next prime multiple
    nextp[row + j] = uint(k-Kn)    # save 1st resgroup in next eligible seg
  else: nextp[row+j] -= uint(Kn)   # when 1st mult resgroup > seg resgroup range
In Listing 2 below, instead of outputting the values before they are updated, I do it afterwards. This should trigger a segfault, and it did in the C++ code when it tried to output seg[ki] at an out-of-bounds address. The Nim code, however, kept outputting out-of-bounds values for seg[ki] until the loop finished.
After hitting the docs I saw that compiling with -d:release turns off array bounds checking. So I compiled the Listing 2 code as $ nim c -r file.nim, and now the Nim build also gave a segfault, but it didn't occur at the same spot (r and prime values) as the C++-compiled code.
So then I compiled the Listing 1 Nim code the same way; it gave better results, but it still produced errors.
I'm convinced now that I can't rewrite the Nim code to produce the correct results, because something is happening at the compiler level that I don't understand and can't control.
Is there some compile flag to set, or something else that can be done, to make the Nim code work correctly, and/or is this a bug in the compilation process?
This was run with Nim 0.17.0, but I also ran it under 0.12.0 (in a VirtualBox Linux VM) with the same results.
Listing 2.
C++
for (uint j=0; j < pcnt; ++j){     // for each prime r1..sqrt(N)
  if (nextp[row+j] < Kn) {         // if 1st mult resgroup is within 'seg'
    uint k = nextp[row+j];         // starting from this resgroup in 'seg'
    uint ki = k*bprg + byti;       // convert it to a byte address in seg[]
    uint prime = primes[j];        // get prime, convert it to number of
    uint prmstep = prime * bprg;   // bytes to next prime resgroup byte
    for (; k < Kn; k += prime){    // for each primenth byte to end of 'seg'
      seg[ki] |= biti;             // mark restrack bit in byte as nonprime
      ki += prmstep;               // byte in next prime multiple resgroup
      cout << "(" << r << "," << prime << "," << k << "," << (seg[ki] & 255) << "," << biti << ") ";
    }
    nextp[row+j] = k - Kn;         // save 1st resgroup in next eligible seg
  }
  else nextp[row+j] -= Kn;         // when 1st mult resgroup > seg resgroup range
}
-------------------------------------------------------------------------------
Nim
for j, prime in primes:            # for each prime r1..sqrt(N)
  if nextp[row + j] < uint(Kn):    # if 1st mult resgroup is within 'seg'
    var k = int(nextp[row + j])    # starting from this resgroup in 'seg'
    var ki = k*bprg + byti         # convert it to a byte address in 'seg'
    let prmstep = prime * bprg     # set number of bytes to next prime mult
    while k < Kn:                  # for each primenth byte to end of 'seg'
      seg[ki] = seg[ki] or biti    # mark restrack bit in byte as nonprime
      ki += prmstep                # byte in next prime multiple resgroup
      k += prime                   # set resgroup of next prime multiple
      write(stdout, "(", r, ",", prime, ",", k, ",", seg[ki], ",", biti, ") ")
    nextp[row + j] = uint(k-Kn)    # save 1st resgroup in next eligible seg
  else: nextp[row+j] -= uint(Kn)   # when 1st mult resgroup > seg resgroup range
Yes, it's better to avoid all the conversions between int and uint. The Nim compiler does not really like uint, but you can use it.
Note that your Nim arithmetic is not identical to the C arithmetic. For example, in the Nim code you have
uint(k-Kn)
So you use signed values for the subtraction and convert the result to unsigned, while the C code does an unsigned subtraction. You may try
uint(k) - uint(Kn)
The issue isn't uint: uint(k-Kn) and uint(k) - uint(Kn) give the same results, since k >= Kn when leaving the while loop.
As I said, when you compile under different directives you get different behavior, and the C++ code works correctly using uints too.
Here are the full working versions of the C++ and Nim code the snippets came from, so you can compile them, observe the stated behavior, and play with them yourselves.
ssozp7test.nim -- Nim test code for SSoZ using P7 prime generator
https://gist.github.com/jzakiya/9fd7c777643a95f5fadd16a09d740c5d
ssozp7testcpp.cpp -- C++ test code for SSoZ using P7 prime generator
https://gist.github.com/jzakiya/39930b6573fd55383bf9abbf0b1c528f
I see that you are using the pairs iterator followed by an if condition in the Nim code. Can you check without it, so that it's more in line with the C++ code?
for j, prime in primes: # for each prime r1..sqrt(N)
if nextp[row + j] < uint(Kn): # if 1st mult resgroup is within 'seg'
You can change the code yourself, but it doesn't change the behavior. I've tried every equivalent implementation I can think of, and the problem still exists.
for j in 0.. <pcnt:                # for each prime r1..sqrt(N)
  if nextp[row + j] < uint(Kn):    # if 1st mult resgroup is within 'seg'
    var k = int(nextp[row + j])    # starting from this resgroup in 'seg'
    var ki = k*bprg + byti         # convert it to a byte address in 'seg'
    let prime = primes[j]          # for this prime
    let prmstep = prime * bprg     # set number of bytes to next prime mult
    while k < Kn:                  # for each primenth byte to end of 'seg'
The inexorable conclusion is that there is a fundamental bug in the structure of the language which causes this error to occur. The good thing is that now you know it exists and can find and fix it (remember when the GIMPS folks found a hardware glitch in Intel chips). You're only at 0.17, so it's good this has been identified now rather than later. This is a benefit of people using/testing the language against reference code from other languages.
The inexorable conclusion
Does your code really only work when uint is used? Most of us try to avoid uint, so uint is not that well tested.
Have you tried without optimization, or with a different C compiler backend?
Have you tried compiling to C++ instead of C (nim cpp test.nim)?
I think you know C very well, so maybe you can inspect the intermediate C code in the nimcache directory for bugs?
Actually I'm not a C/C++ programmer; I'm (mostly now) a Ruby programmer who is functional in C++, and I haven't really done anything of note in C.
I've been looking for how to view the C/C++ that Nim generates, but I don't know how to do that. Can you show me how?
Remember, I'm a Nim newbie, though I've been programming for over 40 years. I know how to program what I want to do, I just have to figure out the linguistics of the language to make it do what I want.
Oh, I forgot to add. The nature of the problem is this:
If an input value only requires one segment slice you get the correct answers, because the nextp table uses its initial values to process the seg array. However, nextp is being updated with wrong k values, so when you process more than one segment slice, more prime candidates (pcs) get marked as non-prime than should be. This means the updated k values being computed and stored in nextp are too small, because they are knocking out more pcs than they should.
If you look at the total primes count, the Nim version gives a smaller and smaller primecnt as more slices are processed, compared to the correct answers from the C++ code.
I think something is happening in the loop/conditional structure that is making these k values too small, which causes more pcs to be marked in seg. But for the life of me I have no idea why that is happening or what's causing it. It has to be happening under the hood, in the internals of the language, because the structure of the source code should work in Nim, like it does in C++.
The inexorable conclusion is there is a fundamental bug in the structure of the language which causes this error to occur.
Are you sure about this? I found at least one difference between the C++ and the Nim implementation: the first loop in next_pinit starts with 1 in C++ and with 0 in Nim. While this doesn't seem to change the behavior, it means that you can't rule out other subtle differences. (I only browsed the code briefly; I didn't check it for equivalence.)
It is of course entirely possible that there's a compiler bug (there've been others before, after all), but I'd like somewhat stronger evidence that this isn't just a subtle semantic difference between the two implementations before going down that rabbit hole.
Can you show me how to do that?
When you use the Nim compiler from the command line, a subdirectory named nimcache is created in the current working directory. For example, "nim c test.nim" creates "nimcache/test.c"; you can inspect that intermediate C code. If you have clang installed, you may try "nim c --cc:clang myprog.nim" to use the clang compiler. Or try compiling with and without the option "-d:release" to see if it makes a difference. Other C compiler options can be specified in the global or local nim.cfg file. For example, in my working directory I generally use something like
path:"$projectdir"
nimcache:"/tmp/$projectdir"
gcc.options.speed = "-march=native -O3 -flto -fstrict-aliasing"
-fstrict-aliasing may not work properly, as Mr. R. Behrends recently told us, so better use -fno-strict-aliasing, which is the default. -flto is link-time optimization; it generally gives smaller executables. -O3 is the highest gcc optimization level; you may also try -O0, -O1, and -O2, and also try leaving out -march=native.
Do you have a recent, but not too new gcc version?
The only operator with this gotcha is <, right? Maybe it would be best to just deprecate it. Is there a unary >?
By the way, the Nim manual has a section showing the discouraged syntax here:
proc transposed*(m: AnyMatrix): m.TransposedType =
  for r in 0 .. <m.R:
    for c in 0 .. <m.C:
      result[r, c] = m[c, r]
1) In nextp_init, in the line below, i can start from either '0' or '1'.
for i in 0|1..<rescnt: pos[residues[i]] = i-1 # start from either 0 or 1
2) The program needs to use unsigned math, so yes, it needs to use uints. I had a hard enough time getting all the type casting compatible, but if you rewrite it to not use uints you will reduce the size of the inputs you can handle. Besides, uints are as old as programming, so shouldn't you fix them to work correctly rather than discourage their use?
3) I've compiled using clang and gcc with --cc:[clang|gcc]. A Manjaro VirtualBox VM has clang 4.0.1 and gcc 7.1.1, and I've used another VM that has gcc 7.2.1.
As stated in previous posts, I've compiled the Nim source under multiple directives, which just emphasizes to me that the issue isn't the source code but its interpretation by the compiler; otherwise it would work correctly no matter how it is compiled, though its performance may differ.
But again, the source is available for anyone to play with.
Besides, uints are as old as programming, so shouldn't you fix making it work correctly rather than discourage its use?
They silently overflow. What does it even mean to "make it work correctly"? They don't, hence we discourage them.
Yes, that was the problem!!
for b in 0 .. <Kn*bprg: seg[b] = 0
vs
for b in 0 .. <(Kn*bprg): seg[b] = 0
I was half right: it wasn't memory management (directly), but it damn sure was subtle. :-)
This makes perfectly good sense now. The seg array wasn't being completely cleared for each segment slice, so bits set in previous slices were still set.
Actually, I'm glad it was this rather than a real compiler error, because as the comments have stated, maybe this style of operator can be changed, and/or the parsing semantics can either follow C++ or throw an error.
It now makes me wonder what the parser/compiler did with the multiplication by bprg, since it didn't apply the multiplied value as the iteration max. So there also seems to be the question of why this didn't throw an error for the part of the expression that seemingly wasn't applied.
But a very Big Thanks to Jehan for finding this. I was at my wits end.
I also notice that at the bottom of the segsieve function I have this code:
for byt in seg[0..<Kn*bprg]:
but this works. So are you saying there is a difference between 0..<Kn*bprg and 0 .. <Kn*bprg?
If so, man, that's really brutal, and totally non-intuitive.
Please clean this up so that people with hair won't end up losing it over stuff like this. :-)
So are you saying there is a difference between 0..<Kn*bprg and 0 .. <Kn*bprg?
The difference is between ..< as a binary operator and < as a unary operator.
Briefly, <x is shorthand for pred(x), while x ..< y is shorthand for x..pred(y). What makes the difference in your code is that < as a unary operator has (unintuitively) higher precedence than *, so <Kn*bprg is parsed as (<Kn)*bprg, which translates to (Kn-1) * bprg. This problem does not occur with ..< as a binary operator, which has a lower precedence than multiplication (and if it didn't, you'd get a type error, because you'd be multiplying a range with an integer).
Wow, that IS quite surprising! It seemed to me ..< was originally meant to be more or less equivalent to .. < (or maybe even an anti-typo trick?). I'm glad I never use unary < to begin with. :)
@Araq I like the idea. One of my friends who's coding a hobby project in Nim avoids < so much that he doesn't even use ..<. Maybe deprecating < would encourage such people? I never had an issue with it, but I imagine unary < could also lead to nasty bugs from typos in unusual boolean expressions; consider:
let x = -7
let arr = [-1, 0, 1]
echo arr[(int)(-10<x) + (int)(x<-5)] # 1 (arr[1])
echo arr[(int)(-10<x) + (int)(x-<5)] # error (arr[-10])
# even worse if the result is in bounds
It may look silly, but one of my friends, a C coder, once typoed -= as =- (which would never happen in Nim, hurray!) in a really simple function handling some byte buffers.
Deprecating unary < is growing on me; it's always used with .., so ..< can be used instead.
+1