Hi
I have programmed in a few languages extensively in my life, though now I primarly use Ruby.
I just found out about Nim literlly 3 weeks ago now (mid July 2017) after reading this article: Nim for the discerning Rubyist http://www.rubyflow.com/p/1j08ef-nim-for-the-discerning-rubyist
I primarly use Ruby to apply to math/science problems/projects because it's so easy to programm in, and it allows me to think about how to solve problems without worrying about how to code it. I've also played around with Crystal (aka Ruby on steroids) but it's still young, and doesn't let me (currently) do what I want, or without a lot of hassles.
However Nim looks promising, especially it's parallel programing features (if real and not brain destroying).
So here's a request, and maybe a challenge, to the Nimers (Nimrods, Nimists, Nimsters,..) gurus.
I developed a prime sieve called the Sieve of Zakiya (SoZ), and it's companion the Segmented Sieve of Zakiya (SSoZ).
I wrote a paper, The Segmented Sieve of Zakiya (SSoZ) which describes its mathematical foundations and algorithm, and provides a working cpp implementation at the end of the paper (compiled, run, and verified). It's a relatively short program.
Here's a link to read and download the paper:
https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ
My request/challenge is for someone(s) who are good at Nim to translate the code into idiomatic Nim, to demonstrate the best way to do this algorithm in Nim. Extra points if someone can do a parallel version of the algorithm, which I attempted to do using OpemMP, but what I did didnt' seem to make the code faster than the serial version.
I know this may be a lot to ask, but I learn best from code doing problems I understand, like the Rosetta Code benchmarks, but those are too short to learn enough for this algorithm.
Ultimately, I'd like to publish to results of benchmarks in different languages doing the SSoZ in an updated paper.
If anyone has any question, I'd be pleased to answer them as best I can.
Thanks in advance.
Jabari
Honestly, this sounds like an interesting little project to bang out. However, I must admit that I am a humble Web Developer and, by extension, am not well versed in mathematics.
The best I could do is translate your C code in the linked document into Nim and maybe make a few small changes. However, without a deeper understanding of your algorithm, I doubt I would be able to optimize it in any meaningful way. It seems like your paper does a good job explaining the concepts behind the algorithm, but I do not have enough free time at the moment to work on this as I am approaching a deadline to launch a web application.
My suggestion is for you to possibly try and implement the algorithm in Nim yourself. You will find it to be surprisingly simple, I'm sure. If you then link to the repository here, we can help you by reviewing the code and making pull requests.
Thanks. I'm in the process of learning enough Nim to take on this effort. However, I would do what you were thinking, and first try a comparable direct C++ to Nim translation, just to get something to work. But it's going to be some time (with my time/learning curve constraints) to create something workable.
I'm hoping someone who's really fluent in Nim, and who thinks in Nim, can perform the algorithm in a Nimby way that best showcases its capabilites. Once you get your head around the algorithm (and/or math) it shouldn't be too hard to understand what the C++ code is doing. Then someone versed in Nim can structure the alogrithm, particularly a true parallel processing version, in an efficient manner.
The link I provided in my first post allows you to read/download it free from my Scribd site. Here it is again.
https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ
I've tested it many times, and there is no filter/paywall to get around. Let me know if this doesn't work and I'll provide another method to get to it.
Sounds interesting, I might try doing a direct port from cpp to nim when I get some free time.
Btw, download is indeed behind a paywall (looks like scribd offers a month free but after that it's paid). Reading is free though.
Copy pasting from the pdf is impossible, as latex or whatever mangles the symbols, but I found a gist here
Also found this link in the references https://www.4shared.com/dir/TcMrUvTB/sharing.html but it won't let me download anything.
I'd suggest setting up a github pages website with an html version (pandoc or similar can convert most formats to html) or at least some kind of git repo with the code, as it's free and most people here are used to cloning things to look at them on their pc (at least I am).
OK, thanks for the feedback on your problems downloading the paper.
Can you tell me what platform you are using, and what country you're accessing from?
I see from my Scribd stats that the paper is being downloaded so it appears to be working for others.
I'll look at putting the paper up on github, though.
Below is the Nim direct translation of the C++ code in my paper. I have verified it on Nim 0.17 on Linux. In comparison on this same machine the Nim code runs a bit faster for some (larger >= 10^9) test values (not comprehensibly benchmarked yet).
I'll post it to a gist later to make it easier to download. I'm now re-architecting it to make it parallel friendlier (I hope). Feedback welcome on style, and performance tips.
ssozp5.nim
#[
This Nim source file will compile to an executable program to
perform the Segmented Sieve of Zakiya (SSoZ) to find primes <= N.
It is based on the P5 Strictly Prime (SP) Prime Generator.
Prime Genrators have the form: mod*k + ri; ri -> {1,r1..mod-1}
The residues ri are integers coprime to mod, i.e. gcd(ri,mod) = 1
For P5, mod = 2*3*5 = 30 and the number of residues are
rescnt = (2-1)(3-1)(5-1) = 8, which are {1,7,11,13,17,19,23,29}.
Adjust segment byte length parameter B (usually L1|l2 cache size)
for optimum operational performance for cpu being used.
Verified on Nim 0.17, using clang (smaller exec) or gcc
$ nim c --cc:[clang|gcc] --d:release ssozp5.nim
Then run executable: $ ./ssozp5 <cr>, and enter value for N.
As coded, input values cover the range: 7 -- 2^64-1
Related code, papers, and tutorials, are downloadable here:
http://www.4shared.com/folder/TcMrUvTB/_online.html
Use of this code is free subject to acknowledgment of copyright.
Copyright (c) 2017 Jabari Zakiya -- jzakiya at gmail dot com
Version Date: 2017/08/21
This code is provided under the terms of the
GNU General Public License Version 3, GPLv3, or greater.
License copy/terms are here: http://www.gnu.org/licenses/
]#
import math
import strutils, typetraits
let pbits = [
8,7,7,6,7,6,6,5,7,6,6,5,6,5,5,4,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1,4,3,3,2,3,2,2,1,3,2,2,1,2,1,1,0
]
let residues = [1, 7, 11, 13, 17, 19, 23, 29, 31]
# Global parameters
const
modp5 = 30 # prime generator mod value
rescnt = 8 # number of residues for prime generator
var
pcnt = 0 # number of primes from r1..sqrt(N)
primecnt = 0'u64 # number of primes <= N
next: seq[uint64] # array of primes first multiples
primes: seq[int] # array of primes <= sqrt(N)
seg: seq[uint8] # seg[B] segment byte array
proc sozp7(val: int | int64) =
let md = 210
let rscnt = 48
let res = [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67
, 71, 73, 79, 83, 89, 97,101,103,107,109,113,121,127,131,137,139
,143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209,211]
var posn = newSeq[int](210)
for i in 0 .. <rscnt: posn[res[i]] = i-1
let num = (val-1) or 1 # if value even number then subtract 1
var k = num div md # kth residue group
var modk = md * k # base num value
var r = 1 # starting with first residue
while num >= modk+res[r]: r += 1 # find last pc position <= num
let maxprms = k*rscnt + r - 1 # maximum number of prime candidates
var prms = newSeq[bool](maxprms) # array of prime candidates set False
let sqrtN = int(sqrt float64(num))
modk=0; r=0; k=0
# sieve to eliminate nonprimes from small primes prms array
for prm in prms:
r += 1; if r > rscnt: (r = 1; modk += md; k += 1)
if prm: continue
let res_r = res[r]
let prime = modk + res_r
if prime > sqrtN: break
let prmstep = prime * rscnt
for ri in res[1..rscnt]:
let prod = res_r * ri # residues cross-product
var nonprm = (k*(prime + ri) + prod div md)*rscnt + posn[prod mod md]
while nonprm < maxprms: prms[nonprm] = true; nonprm += prmstep
# the prms array now has all the positions for primes r1..N
# extract prime numbers and count from prms into primes array
primes = @[7] # r1 prime for P5
modk = 0; r = 0
for prm in prms:
r += 1; if r > rscnt: (r = 1; modk += md)
if not prm: primes.add(modk + res[r])
pcnt = len(primes)
proc next_init() =
var pos = newSeq[int](modp5)
for i in 0 .. <rescnt: pos[residues[i]] = i-1
pos[1] = rescnt-1
# for each prime store resgroup on each restrack for prime*(modk+ri)
for j, prime in primes:
let k = uint((prime-2)) div uint(modp5) # find the resgroup it's in
let r = uint((prime-2)) mod uint(modp5) + 2 # its base residue value
for ri in residues[1 .. rescnt]: # for each residue value
let prod: int = int(r) * ri # create residues cross-product r*ri
let row: int = pos[prod mod modp5] * pcnt # create residue track address
next[row + j] = k*(uint(prime) + uint(ri)) + uint(prod-2) div uint(modp5)
proc segsieve(Kn: int) =
# for Kn resgroups in segment
for b in 0 .. <Kn: # for every byte in the segment
seg[b] = 0 # set every byte bit to prime (0)
for r in 0 .. <rescnt: # for each ith (of 8) residues for P5
let biti = uint8(1 shl r) # set the ith residue track bit mask
let row = r*pcnt # set address to ith row in next[]
for j in 0 .. <pcnt: # for each prime <= sqrt(N) for restrack
if next[row+j] < uint(Kn): # if 1st mult resgroup index <= seg size
var k: int = int(next[row+j]) # get its resgroup value
let prime = primes[j] # get the prime
while k < Kn: # for each primenth byte < segment size
seg[k] = seg[k] or biti # set ith residue in byte as nonprime
k += int(prime)
next[row+j] = uint(k - Kn) # 1st resgroup in next eligible segment
else: next[row+j] -= uint(Kn) # if 1st mult resgroup index > seg size
# count the primes in the segment
for byt in seg[0..<Kn]: # for the Kn resgroup bytes
primecnt += uint(pbits[byt]) # count the '0' bits as primes
proc printprms(Kn: int, Ki: uint64) =
# Extract and print the primes in each segment:
# recommend piping output to less: ./ssozpxxx | less
# can then use Home, End, Pgup, Pgdn keys to see primes
for k in 0 .. <Kn: # for Kn residues groups|bytes
for r in 0 .. <8: # for each residue|bit in a byte
if (int(seg[k]) and (1 shl r)) == 0: # if it's '0' it's a prime
write(stdout, " ", uint64(modp5)*(Ki+uint64(k)) + uint64(residues[r+1]))
#echo "\n"
proc ssozp5() =
stdout.write "Enter integer number: "
let val = stdin.readline.parseBiggestUInt # find primes <= val (7..2^64-1)
let B = 262144 # L2D_CACHE_SIZE 256*1024 bytes
let KB = B # number of segment resgroups
seg = newSeq[uint8](B) # create segment array of B bytes size
writeLine(stdout, "segment has ", B, " bytes and ", KB, " residues groups")
let num = uint64((val-1) or 1) # if val even subtract 1
var k: uint64 = num div modp5 # resgroup of num
var modk = uint64(modp5) * k # base value of num
var r = 1 # starting with first residue
while num >= modk+uint64(residues[r]): r += 1 # find last pc position <= num
let maxpcs = k * uint64(rescnt) + uint64(r-1) # maximum number of prime candidates
let Kmax = uint64(num-2) div uint64(modp5) + 1 # maximum number of resgroups for val
writeLine(stdout, "prime candidates = ", maxpcs, "; resgroups = ", Kmax)
let sqrtN = int(sqrt float64(num))
sozP7(sqrtN) # get pcnt and primes <= sqrt(nun)
writeLine(stdout, "create next[", rescnt, "x", pcnt, "] array")
next = newSeq[uint64](rescnt*pcnt) # create the next[] array
next_init() # load with first nonprimes resgroups
primecnt = 3 # 2,3,5 the P5 excluded primes count
var Kn: int = KB # set sieve resgroups size to segment size
var Ki = 0'u64 # starting resgroup index for each segment
echo "perform segmented SoZ\n";
while Ki < Kmax: # for KB size resgroup slices up to Kmax
if Kmax-Ki < uint64(KB): Kn = int(Kmax-Ki) # set sieve resgroups size for last segment
segsieve(Kn) # sieve primes for current segment
#printprms(Kn,Ki) # print primes for the segment (optional)
Ki += uint64(KB) # next segment slice
var lprime = 0'u64 # get last prime and primecnt <= val
modk = uint64(modp5)*(Kmax-1) # mod for last resgroup in last segment
var b = Kn-1 # num bytes to last resgroup in segment
r = rescnt-1 # from last restrack in resgroup
while true: # repeat until last prime determined
if (int(seg[b]) and (1 shl r)) == 0:
lprime = modk+uint64(residues[r+1]) # determine the prime value
if lprime <= num: break # if <= num exit from loop with lprime
primecnt -= 1 # else reduce total primecnt
# reduce restrack, setup next resgroup
r -= 1; if r < 0: (r = rescnt-1; modk -= modp5; b -= 1) # if necessary
writeLine(stdout, "last segment = ", Kn, " resgroups; segments iterations = ", Ki div uint64(KB))
writeLine(stdout, "last prime (of ", primecnt, ") is ", lprime)
ssozp5()
I cleaned up the code and added more commments. It should be much easier now to uderstand what/why the code is doing after reading my paper.
Here is the gist location of the source code for this version.
https://gist.github.com/jzakiya/94670e6144735eb0041919f633d6011c
I've created a rearchitected version to run in parallel, and in fact I got it to compile, and run with correct results, but is 2x slower than the serial version. If anyone knows how to really create parallel running programs please let me know.
I will create a github project for all the different versions I|others create, for different languages, prime generators, parallel versions, etc. So far I have various C++ versions, and now this Nim version, and I plan to do a Crystal version too.
If people want to, you can contact me directly via email, which is included in my pape and the code intro.
I would urge people who run the code to play with the B value for their system cache. I've only run this code on Intel I5|7 cpu laptops. I'm interested how the code runs on other systems (AMD, ARM, PowerPC, etc) and operarting systems (I've only used Linux).
Here's the code in the gist file.
#[
This Nim source file will compile to an executable program to
perform the Segmented Sieve of Zakiya (SSoZ) to find primes <= N.
It is based on the P5 Strictly Prime (SP) Prime Generator.
Prime Genrators have the form: mod*k + ri; ri -> {1,r1..mod-1}
The residues ri are integers coprime to mod, i.e. gcd(ri,mod) = 1
For P5, mod = 2*3*5 = 30 and the number of residues are
rescnt = (2-1)(3-1)(5-1) = 8, which are {1,7,11,13,17,19,23,29}.
Adjust segment byte length parameter B (usually L1|l2 cache size)
for optimum operational performance for cpu|system being used.
Verified on Nim 0.17, using clang (smaller exec) or gcc
$ nim c --cc:[clang|gcc] --d:release ssozp5.nim
Then run executable: $ ./ssozp5 <cr>, and enter value for N.
As coded, input values cover the range: 7 -- 2^64-1
Related code, papers, and tutorials, are downloadable here:
http://www.4shared.com/folder/TcMrUvTB/_online.html
Use of this code is free subject to acknowledgment of copyright.
Copyright (c) 2017 Jabari Zakiya -- jzakiya at gmail dot com
Version Date: 2017/08/23
This code is provided under the terms of the
GNU General Public License Version 3, GPLv3, or greater.
License copy/terms are here: http://www.gnu.org/licenses/
]#
import math # for sqrt function
import strutils, typetraits # for number input
# Global var used to count number of primes in 'seg' variable
# Each value is number of '0' bits (primes) for values 0..255
let pbits = [
8,7,7,6,7,6,6,5,7,6,6,5,6,5,5,4,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1,4,3,3,2,3,2,2,1,3,2,2,1,2,1,1,0
]
# The global residue values for the P5 prime generator.
let residues = [1, 7, 11, 13, 17, 19, 23, 29, 31]
# Global parameters
const
modp5 = 30 # mod value for P5 prime generator
rescnt = 8 # number of residues for P5 prime generator
var
pcnt = 0 # number of primes from r1..sqrt(N)
primecnt = 0'u64 # number of primes <= N
next: seq[uint64] # table of regroups vals for primes multiples
primes: seq[int] # list of primes r1..sqrt(N)
seg: seq[uint8] # segment byte array to perform ssoz
# This routine is used to compute the list of primes r1..sqrt(input_num),
# stored in global var 'primes', and its count stored in global var 'pcnt'.
# Any algorithm (fast|small) can be used. Here the SoZ using P7 is used.
proc sozp7(val: int | int64) = # Use P7 prime gen to finds upto val
let md = 210 # P7 mod value
let rscnt = 48 # P7 residue count and residues list
let res = [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67
, 71, 73, 79, 83, 89, 97,101,103,107,109,113,121,127,131,137,139
,143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209,211]
var posn = newSeq[int](210) # small array (hash) to convert
for i in 0 .. <rscnt: posn[res[i]] = i-1 # (x mod md) to values -1,0,1,2..6
let num = (val-1) or 1 # if val even then subtract 1
var k = num div md # compute its residue group val
var modk = md * k # compute its base num
var r = 1 # starting with first residue
while num >= modk+res[r]: r += 1 # find last pc position <= num
let maxprms = k*rscnt + r - 1 # maximum number of prime candidates
var prms = newSeq[bool](maxprms) # array of prime candidates set False
let sqrtN = int(sqrt float64(num))
modk = 0; r = 0; k = 0
# sieve to eliminate prime multiples from list of pcs r1..sqrtN
for prm in prms: # from list of pc positions in prms
r += 1; if r > rscnt: (r = 1; modk += md; k += 1)
if prm: continue # if pc not prime go to next location
let res_r = res[r] # if prime save residue of prime value
let prime = modk + res_r # numerate the prime value
if prime > sqrtN: break # exit if > sqrtN
let prmstep = prime * rscnt # prime's stepsize to eliminate its mults
for ri in res[1..rscnt]: # eliminate prime's multiples from prms
let prod = res_r * ri # residues cross-product for this prime
# compute resgroup val of 1st prime multiple for each restrack
# starting there, mark all prime multiples on restrack upto end of prms
var prm_mult = (k*(prime + ri) + prod div md)*rscnt + posn[prod mod md]
while prm_mult < maxprms: prms[prm_mult] = true; prm_mult += prmstep
# prms now contains the nonprime positions for the prime candidates r1..N
# extract primes into global var 'primes' and count into global var 'pcnt'
primes = @[7] # r1 prime for P5
modk = 0; r=0
for prm in prms: # numerate primes from processed pcs list
r += 1; if r > rscnt: (r = 1; modk += md)
if not prm: primes.add(modk + res[r]) # put prime in global 'primes' list
pcnt = len(primes) # set global count of primes
# This routine initializes the [rescnt x pcnt] global var 'next' table with the
# resgroup values of the 1st prime multiples for each prime r1..sqrtN along the
# restracks for the 8 P5 residues (7, 11..29, 31). 'next' is used to eliminate
# prime multiples in 'seg' for each SSoZ segment, and is updated with new
# resgroup values for each prime for use with subsequent segment iterations.
proc next_init() =
var pos = newSeq[int](modp5) # create small array (hash)
for i in 0 .. <rescnt: pos[residues[i]] = i-1 # to convert an (x mod modp5)
pos[1] = rescnt-1 # val into restrack val 0..7
# load 'next' with 1st prime multiples regroups vals along each residue track
for j, prime in primes: # for each prime r1..sqrt(N)
let k = uint((prime-2)) div uint(modp5) # find the resgroup it's in
let r = uint((prime-2)) mod uint(modp5) + 2 # and its residue value
for ri in residues[1 .. rescnt]: # for each prime|residue pair
let prod: int = int(r) * ri # compute res cross-product r*ri
let row: int = pos[prod mod modp5] * pcnt # compute residue track address
# compute|store resgroup val of 1st prime multiple for prime|ri residue pair
next[row + j] = k*(uint(prime) + uint(ri)) + uint(prod-2) div uint(modp5)
# This routine performs the segment sieve for a segment of Kn resgroups|bytes.
# Each 'seg' bit represents a residue track of Kn resgroups, which are processed
# sequentially. 'next' resgroup vals are used to mark prime multiples in 'seg' along
# each restrack, and is udpated for each prime for the next segment. Upon completion
# the number of primes ('0' bits) per 'seg' byte are added to global var 'primecnt'.
proc segsieve(Kn: int) = # for Kn resgroups in segment
for b in 0 .. <Kn: seg[b] = 0 # initialize seg bits to all prime '0'
for r in 0 .. <rescnt: # for each ith (of 8) residues for P5
let biti = uint8(1 shl r) # set the ith residue track bit mask
let row = r * pcnt # set address to ith row in next[]
for j, prime in primes: # for each prime r1..sqrt(N)
if next[row + j] < uint(Kn): # if 1st mult resgroup index <= seg size
var k = int(next[row + j]) # get its resgroup value
while k < Kn: # for each primenth byte < segment size
seg[k] = seg[k] or biti # set resgroup's restrack bit to '1' (nonprime)
k += int(prime) # compute next prime multiple resgroup
next[row + j] = uint(k - Kn) # 1st resgroup in next eligible segment
else: next[row + j] -= uint(Kn) # do if 1st mult resgroup index > seg size
# count the primes in the segment
for byt in seg[0..<Kn]: # for the Kn resgroup bytes
primecnt += uint(pbits[byt]) # count the '0' bits as primes
# Print segment primes. Can pipe output as: $ ./ssozp5 | less (or output file)
# to make viewing primes easier
proc printprms(Kn: int, Ki: uint64) = # print out prime values in segment
for k in 0 .. <Kn: # for Kn residues groups|bytes
for r in 0 .. <8: # for each restrack bit in byte
if (int(seg[k]) and (1 shl r)) == 0: # if it's '0' it's a prime
write(stdout, " ", uint64(modp5)*(Ki+uint64(k)) + uint64(residues[r+1]))
#echo "\n"
proc ssozp5() = # Perform SSoZ using P5 prime generator
stdout.write "Enter integer number: "
let val = stdin.readline.parseBiggestUInt # input integer val (7..2^64-1)
const B = 256 * 1024 # adjust size to optimize for system's cache
let KB = B # number of segment resgroups
seg = newSeq[uint8](B) # create segment array of B bytes size
echo("segment has ", B, " bytes and ", KB, " residues groups")
let num = uint64((val-1) or 1) # if val even subtract 1
var k: uint64 = num div modp5 # compute its resgroup val
var modk = uint64(modp5) * k # compute its base num
var r = 1 # starting with first residue
while num >= modk+uint64(residues[r]): r += 1 # find last pc position <= num
let maxpcs = k * uint64(rescnt) + uint64(r-1) # maximum number of prime candidates
let Kmax = uint64(num-2) div uint64(modp5) + 1 # maximum number of resgroups for num
echo("prime candidates = ", maxpcs, "; resgroups = ", Kmax)
let sqrtN = int(sqrt float64(num))
sozP7(sqrtN) # get pcnt and primes <= sqrt(nun)
echo("create next[", rescnt, "x", pcnt, "] array")
next = newSeq[uint64](rescnt*pcnt) # create size of the global next[] table
next_init() # load with first prime multiples resgroups
primecnt = 3 # 2,3,5 the P5 excluded primes count
var Kn: int = KB # set sieve resgroups size to segment size
var Ki = 0'u64 # starting resgroup index for first segment
echo("perform segmented SoZ")
while Ki < Kmax: # for KB size resgroup slices up to Kmax
if Kmax-Ki < uint64(KB): Kn = int(Kmax-Ki) # to set last segment size
segsieve(Kn) # sieve primes for current segment
#printprms(Kn,Ki) # print primes for the segment (optional)
Ki += uint64(KB) # first resgroup index for next seg slice
var lprime = 0'u64 # get last prime and primecnt <= val
modk = uint64(modp5) * (Kmax-1) # mod for last resgroup in last segment
var b = Kn-1 # number of bytes|resgroup in last segment
r = rescnt-1 # set to last restrack in resgroup
while true: # repeat until last prime determined
if (int(seg[b]) and (1 shl r)) == 0:
lprime = modk + uint64(residues[r+1]) # numerate the prime value
if lprime <= num: break # if <= num exit from loop with lprime
primecnt -= 1 # else reduce primecnt if prime > num
# reduce restrack, setup next resgroup
r -= 1; if r < 0: (r = rescnt-1; modk -= modp5; b -= 1) # if necessary
echo("last segment = ", Kn, " resgroups; segments iterations = ", Ki div uint64(KB))
echo("total primes = ", primecnt, "; last prime = ", lprime)
ssozp5()
A user may not know, what is terminal (seriously). And this won't work on Windows (quite widely used thing). Probably a couple of links to some articles (somewhere on web, at some OS-related forums), one for POSIX and one for Windows, would give more for less effort.
Anyway everything cannot be covered - some assumptions on the user's knowledge always remain; and too much intermixing of essential information with additional elucidations may distract the attention.
I also agree with @jzakiya. I personally have only experience with the windows env (I have to use it on daily base (job)) - I used linux since 1994 for about 10 years but finally I throwed it away cause of the awful desktop. Today I cant speak about it because of lacking experience; I never tried it again.
The nim installer on windows works very well and out of the box (no extra settings needed). If you like to use your own gcc I recommend the tdm-gcc (works also out of the box but you have to set your cli-path). Also It works without tweaks with the cl.exe (VS2017 community edition for instance) I never got nim running with the gcc of the mingw-installation-manager (the gcc compiler check from nim always failed). I think the docs are very good but sometimes outdated (for instance the examples) but for me its normal. I searched also a while for the nimscript feature of the compiler (the compiler help says nothing about that) and within the nimscript-docpage its a litte bit hidden).
The overall feeling for me is that Nim is a very very nice designed and architected piece of software (design by committee is awful) and not burdened with any tweaks. the nimcode itself is fun to read (especially from others and thats not normal for any computer language). And also the new book is very good and worth reading. The idea of the integrated build-system is a big plus (I hate ant, maven and so on especially when the buildscript is more complex than the software itself which needs to be build)
It's these type of little things that cause unnecessary frustrations and shy people away from using new things.
Please think about the things that could potentially go wrong, then create instructions and documentation that will prevent them.
We do try, but we don't have the time to ensure everything is 100%. I think I need to create a page/article about this because I'm finding myself saying this a lot.
Please understand that Nim is a volunteer-run open source project. We need you to help us improve it. This isn't to say that I don't appreciate your message, you are giving us feedback on how things can be improved which is already much better than just abandoning Nim without saying a word. But can we step things up a notch and all act on our own feedback? These sorts of improvements are a perfect first contribution. All you need to do is edit the website and create a PR: https://github.com/nim-lang/website.
By the way, I've created choosenim precisely to make installation as easy as possible. I'd appreciate if you could try it out: https://github.com/dom96/choosenim#installation
I agree - Nim is a community driven project, like many open source projects, and it gets better only if you scratch your own itch. Small improvements to the documentation is, like dom96 says, a perfect way to start contributing. :)
[Edit] I agree that not everyone is command-line ninjas, and I think a link or two to good docs on setting the PATH on various systems elsewhere on the internet would be a good addition to the build instructions.
I don't think you should expect anyone else to add that to the docs for you. I wouldn't bother, or even consider that information missing, simply because I know how to do this already. [/Edit]
Hey, welcome to Nim,
If you like learning with mathematical challenges, I started Nim and many other languages with Project Euler, which gives you small mathematical challenges for your programming languages craving.
Here are my solutions to the first 10 problems including a fast packed bitarray-based Sieve of Eratosthenes with OpenMP parallel loop.
I compared it to Rosettacode sieves and primesieve/primegen (C/C++) and various Haskell solutions and even Sieve of Atkins, on my machine it is 1.1 to 2x faster. Probably due to better cache locality.
I tried the run your code but the compiler doesn't see the euler libraries.
eulerprimes.nim(24, 6) Error: cannot open 'lib_euler_math'
How do I get all the libraries to get it to run?
I'd be interested in benchmark times. Can you post your hardware specs and the times for counting the primes <= N, for for N = 10^9, 10^10, and 10^11?
I don't know if you've seen my sequential SSoZ code, but I'd be interested in how they compare. I'm still learning Nim, and don't know the correct way to implement a parallel version yet, which if done correctly will be much faster. Studying your code I've already learned some Nim stuff I didn't know.
Here again are the gists to some reference versions of the sequential SSoZ using the P5 prime generator.
ssozp5.nim
Find the primes <= N, using sequential Segmented Sieve of Zakiya (SSoZ) with P5 prime generator, written in Nim
https://gist.github.com/jzakiya/94670e6144735eb0041919f633d6011c
twinprimes_ssozp5.nim
Find Twin Primes <= N, using sequential Segmented Sieve of Zakiya (SSoZ) with P5 prime generator, written in Nim
https://gist.github.com/jzakiya/776b7aae3126168b4cad90de9adc4961
nthprime_ssozp5.nim
Find Nth Primes, using sequential Segmented Sieve of Zakiya (SSoZ) with P5 prime generator, written in Nim
https://gist.github.com/jzakiya/aff4f2e38f9b0f833955b4cc391de3d4
You probably need to specify a path while importing. For example, running from ./bin "prime_bench.nim" file:
from ../src/lib/lib_euler_primes import primeSieve
from os import commandLineParams
from strutils import parseUInt
let arguments = commandLineParams()
echo arguments[0].parseUInt.primeSieve.len
$ time ./bin/prime_bench 1_000_000_000
50847534
real 0m6.478s
user 0m6.069s
sys 0m0.392s
$ time ./bin/prime_bench 10_000_000_000
455052511
real 1m33.301s
user 1m11.694s
sys 0m14.762s
Compilation was done with -d:release. I tried with march=native but it didn't seem to help.
This is on a MacBook Pro 13 2015, so i5-5257U. You can compare it to others on Notebookcheck.
Hey thanks.
I'll try to play around with the settings to get your code to compile when I get some time.
On my System76 Gazelle laptop, with I7-6700HQ, 2.6-3.5GHz, 16GB ram, running in a VB VM for Manjaro KDE, using Nim 0.17.0 compiled with: $ nim c --cc:clang --d:release ssozp5.nim
I get these times, running on a noisy system (multiple browsers, et al, open) for the sequential version of SSoZ using P5 prime generator.
[jzakiya@jabari-pc nim]$ time ./ssozp
Enter integer number: 1_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 266666665; resgroups = 33333334
create next[8x3398] array
perform segmented SoZ
last segment = 41046 resgroups; segment slices = 128
total primes = 50847534; last prime = 999999937
real 0m6.915s
user 0m0.550s
sys 0m0.007s
[jzakiya@jabari-pc nim]$ time ./ssozp5
Enter integer number: 10_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 2666666665; resgroups = 333333334
create next[8x9589] array
perform segmented SoZ
last segment = 148310 resgroups; segment slices = 1272
total primes = 455052511; last prime = 9999999967
real 0m11.243s
user 0m5.260s
sys 0m0.010s
[jzakiya@jabari-pc nim]$ time ./ssozp5
Enter integer number: 100_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 26666666665; resgroups = 3333333334
create next[8x27290] array
perform segmented SoZ
last segment = 172374 resgroups; segment slices = 12716
total primes = 4118054813; last prime = 99999999977
real 1m7.795s
user 0m59.667s
sys 0m0.027s
The times I previously posted were from a VirtualBox VM, limited to 4GB of ram, and in a "noisy" environment. The times here are derived from my host OS (PCLinuxOS) with full access to the system's 16GB of ram, in a "quiet" environment, i.e. I closed everything and rebooted, then just opened a terminal and ran the timing tests.
[jzakiya@localhost nim]$ time ./ssozp5
Enter integer number: 1_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 266666665; resgroups = 33333334
create nextp[8x3398] array
perform segmented SoZ
last segment = 41046 resgroups; segment slices = 128
total primes = 50847534; last prime = 999999937
real 0m6.928s
user 0m0.393s
sys 0m0.000s
-----------------------------------------------------------
[jzakiya@localhost nim]$ time ./ssozp5
Enter integer number: 10_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 2666666665; resgroups = 333333334
create nextp[8x9589] array
perform segmented SoZ
last segment = 148310 resgroups; segment slices = 1272
total primes = 455052511; last prime = 9999999967
real 0m5.355s
user 0m4.247s
sys 0m0.000s
-----------------------------------------------------------
[jzakiya@localhost nim]$ time ./ssozp5
Enter integer number: 100_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 26666666665; resgroups = 3333333334
create nextp[8x27290] array
perform segmented SoZ
last segment = 172374 resgroups; segment slices = 12716
total primes = 4118054813; last prime = 99999999977
real 0m48.434s
user 0m46.852s
sys 0m0.008s
-----------------------------------------------------------
[jzakiya@localhost nim]$ time ./ssozp5
Enter integer number: 1_000_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 266666666665; resgroups = 33333333334
create nextp[8x78495] array
perform segmented SoZ
last segment = 150870 resgroups; segment slices = 127157
total primes = 37607912018; last prime = 999999999989
real 11m0.884s
user 10m57.985s
sys 0m0.010s
I also wanted to show that the math that forms the foundation of the Sieve of Zakiya (SoZ) reduces the number of prime candidates to sieve primes from as you use more "efficient" Strictly Prime (SP) generators. This makes the SoZ algorithmically possibly the fastest prime sieve.
To illustrate this, below are comparisons of the sequential SoZ between the P5 and P7 prime generators. Because the P7 generator sieves through 8/30 (26.67%) of all integers, compared to 2/6 (33.33%) for P5, it greatly reduces the number of prime candidates it needs to sieve through than P5.
These versions just count the primes <= N, and don't numerate them into a list, to demonstrate the true performance of the prime sieve itself.
Again, these timings were derived on my host OS in a "quiet" environment, directly after rebooting.
[jzakiya@localhost nim]$ time ./sozp5d
Enter integer number: 1_000_000_000
50847534
real 0m12.074s
user 0m3.738s
sys 0m0.054s
[jzakiya@localhost nim]$ time ./sozp7d
Enter integer number: 1_000_000_000
50847534
real 0m6.938s
user 0m3.055s
sys 0m0.048s
--------------------------------------------
[jzakiya@localhost nim]$ time ./sozp5d
Enter integer number: 10_000_000_000
455052511
real 0m43.221s
user 0m40.365s
sys 0m0.554s
[jzakiya@localhost nim]$ time ./sozp7d
Enter integer number: 10_000_000_000
455052511
real 0m38.520s
user 0m35.655s
sys 0m0.496s
Because the "classical Sieve of Eratosthenes" searches through the entire number space up to some N, it is always inherently slower than the SoZ method. The ways to increase its performance is to reduce the number space it sieves, by eliminating the even numbers (a 50% reduction of the number space) and applying various wheel and other reduction techniques to eliminate more of the sieved number space.
The SoZ inherently works on a reduced number space that can theoretically be made as small as desired by the choice of the prime generator used to perform the sieve.
The SoZ is also an inherently parallelizable algorithm, which can take advantage of multicore|threaded systems and CUDA|OpenMP implementations, and others with GPUs (graphic processing units).