nimforum mirror - Nim newbie request/challenge

jzakiya (orginal) [2017-08-07T06:45:13+02:00] view original

I have programmed in a few languages extensively in my life, though now I primarly use Ruby.

I just found out about Nim literlly 3 weeks ago now (mid July 2017) after reading this article: Nim for the discerning Rubyist http://www.rubyflow.com/p/1j08ef-nim-for-the-discerning-rubyist

I primarly use Ruby to apply to math/science problems/projects because it's so easy to programm in, and it allows me to think about how to solve problems without worrying about how to code it. I've also played around with Crystal (aka Ruby on steroids) but it's still young, and doesn't let me (currently) do what I want, or without a lot of hassles.

However Nim looks promising, especially it's parallel programing features (if real and not brain destroying).

So here's a request, and maybe a challenge, to the Nimers (Nimrods, Nimists, Nimsters,..) gurus.

I developed a prime sieve called the Sieve of Zakiya (SoZ), and it's companion the Segmented Sieve of Zakiya (SSoZ).

I wrote a paper, The Segmented Sieve of Zakiya (SSoZ) which describes its mathematical foundations and algorithm, and provides a working cpp implementation at the end of the paper (compiled, run, and verified). It's a relatively short program.

Here's a link to read and download the paper:

https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

My request/challenge is for someone(s) who are good at Nim to translate the code into idiomatic Nim, to demonstrate the best way to do this algorithm in Nim. Extra points if someone can do a parallel version of the algorithm, which I attempted to do using OpemMP, but what I did didnt' seem to make the code faster than the serial version.

I know this may be a lot to ask, but I learn best from code doing problems I understand, like the Rosetta Code benchmarks, but those are too short to learn enough for this algorithm.

Ultimately, I'd like to publish to results of benchmarks in different languages doing the SSoZ in an updated paper.

If anyone has any question, I'd be pleased to answer them as best I can.

Thanks in advance.

Jabari

scriptkiddy (orginal) [2017-08-08T17:37:29+02:00] view original

Honestly, this sounds like an interesting little project to bang out. However, I must admit that I am a humble Web Developer and, by extension, am not well versed in mathematics.

The best I could do is translate your C code in the linked document into Nim and maybe make a few small changes. However, without a deeper understanding of your algorithm, I doubt I would be able to optimize it in any meaningful way. It seems like your paper does a good job explaining the concepts behind the algorithm, but I do not have enough free time at the moment to work on this as I am approaching a deadline to launch a web application.

My suggestion is for you to possibly try and implement the algorithm in Nim yourself. You will find it to be surprisingly simple, I'm sure. If you then link to the repository here, we can help you by reviewing the code and making pull requests.

jzakiya (orginal) [2017-08-08T21:25:44+02:00] view original

Thanks. I'm in the process of learning enough Nim to take on this effort. However, I would do what you were thinking, and first try a comparable direct C++ to Nim translation, just to get something to work. But it's going to be some time (with my time/learning curve constraints) to create something workable.

I'm hoping someone who's really fluent in Nim, and who thinks in Nim, can perform the algorithm in a Nimby way that best showcases its capabilites. Once you get your head around the algorithm (and/or math) it shouldn't be too hard to understand what the C++ code is doing. Then someone versed in Nim can structure the alogrithm, particularly a true parallel processing version, in an efficient manner.

Krux02 (orginal) [2017-08-08T21:36:29+02:00] view original

Can you put the paper on a link that is not behind a paywall? I am normally pretty trained to port from different languages to different languages, but from the paper I can't do copy paste at all.

jzakiya (orginal) [2017-08-08T22:55:48+02:00] view original

The link I provided in my first post allows you to read/download it free from my Scribd site. Here it is again.

https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

I've tested it many times, and there is no filter/paywall to get around. Let me know if this doesn't work and I'll provide another method to get to it.

stisa (orginal) [2017-08-09T09:05:35+02:00] view original

Sounds interesting, I might try doing a direct port from cpp to nim when I get some free time.

Btw, download is indeed behind a paywall (looks like scribd offers a month free but after that it's paid). Reading is free though.

Copy pasting from the pdf is impossible, as latex or whatever mangles the symbols, but I found a gist here

Also found this link in the references https://www.4shared.com/dir/TcMrUvTB/sharing.html but it won't let me download anything.

I'd suggest setting up a github pages website with an html version (pandoc or similar can convert most formats to html) or at least some kind of git repo with the code, as it's free and most people here are used to cloning things to look at them on their pc (at least I am).

jzakiya (orginal) [2017-08-09T17:39:41+02:00] view original

OK, thanks for the feedback on your problems downloading the paper.

Can you tell me what platform you are using, and what country you're accessing from?

I see from my Scribd stats that the paper is being downloaded so it appears to be working for others.

I'll look at putting the paper up on github, though.

jzakiya (orginal) [2017-08-21T06:34:40+02:00] view original

Below is the Nim direct translation of the C++ code in my paper. I have verified it on Nim 0.17 on Linux. In comparison on this same machine the Nim code runs a bit faster for some (larger >= 10^9) test values (not comprehensibly benchmarked yet).

I'll post it to a gist later to make it easier to download. I'm now re-architecting it to make it parallel friendlier (I hope). Feedback welcome on style, and performance tips.

ssozp5.nim


#[
 This Nim source file will compile to an executable program to
 perform the Segmented Sieve of Zakiya (SSoZ) to find primes <= N.
 It is based on the P5 Strictly Prime (SP) Prime Generator.
 
 Prime Genrators have the form:  mod*k + ri; ri -> {1,r1..mod-1}
 The residues ri are integers coprime to mod, i.e. gcd(ri,mod) = 1
 For P5, mod = 2*3*5 = 30 and the number of residues are
 rescnt = (2-1)(3-1)(5-1) = 8, which are {1,7,11,13,17,19,23,29}.
 
 Adjust segment byte length parameter B (usually L1|l2 cache size)
 for optimum operational performance for cpu being used.
 
 Verified on Nim 0.17, using clang (smaller exec) or gcc
 
 $ nim c --cc:[clang|gcc] --d:release  ssozp5.nim
 
 Then run executable: $ ./ssozp5 <cr>, and enter value for N.
 As coded, input values cover the range: 7 -- 2^64-1
 
 Related code, papers, and tutorials, are downloadable here:
 
 http://www.4shared.com/folder/TcMrUvTB/_online.html
 
 Use of this code is free subject to acknowledgment of copyright.
 Copyright (c) 2017 Jabari Zakiya -- jzakiya at gmail dot com
 Version Date: 2017/08/21
 
 This code is provided under the terms of the
 GNU General Public License Version 3, GPLv3, or greater.
 License copy/terms are here:  http://www.gnu.org/licenses/
]#

import math
import strutils, typetraits

let pbits = [
 8,7,7,6,7,6,6,5,7,6,6,5,6,5,5,4,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1,4,3,3,2,3,2,2,1,3,2,2,1,2,1,1,0
]

let residues = [1, 7, 11, 13, 17, 19, 23, 29, 31]

# Global parameters
const
  modp5 = 30           # prime generator mod value
  rescnt = 8           # number of residues for prime generator
var
  pcnt = 0             # number of primes from r1..sqrt(N)
  primecnt = 0'u64     # number of primes <= N
  next: seq[uint64]    # array of primes first multiples
  primes: seq[int]     # array of primes <= sqrt(N)
  seg: seq[uint8]      # seg[B] segment byte array

proc sozp7(val: int | int64) =
  let md = 210
  let rscnt = 48
  let res = [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67
          , 71, 73, 79, 83, 89, 97,101,103,107,109,113,121,127,131,137,139
          ,143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209,211]
  
  var posn = newSeq[int](210)
  for i in 0 .. <rscnt: posn[res[i]] = i-1
  
  let num = (val-1) or 1           # if value even number then subtract 1
  var k = num div md               # kth residue group
  var modk = md * k                # base num value
  var r = 1                        # starting with first residue
  while num >= modk+res[r]: r += 1 # find last pc position <= num
  let maxprms = k*rscnt + r - 1    # maximum number of prime candidates
  var prms = newSeq[bool](maxprms) # array of prime candidates set False
  
  let sqrtN = int(sqrt float64(num))
  modk=0; r=0; k=0
  
  # sieve to eliminate nonprimes from small primes prms array
  for prm in prms:
    r += 1; if r > rscnt: (r = 1; modk += md; k += 1)
    if prm: continue
    let res_r = res[r]
    let prime = modk + res_r
    if prime > sqrtN: break
    let prmstep = prime * rscnt
    for ri in res[1..rscnt]:
      let prod = res_r * ri         # residues cross-product
      var nonprm = (k*(prime + ri) + prod div md)*rscnt + posn[prod mod md]
      while nonprm < maxprms: prms[nonprm] = true; nonprm += prmstep
  
  # the prms array now has all the positions for primes r1..N
  # extract prime numbers and count from prms into primes array
  primes = @[7]                     # r1 prime for P5
  modk = 0; r = 0
  for prm in prms:
    r += 1; if r > rscnt: (r = 1; modk += md)
    if not prm: primes.add(modk + res[r])
  pcnt = len(primes)

proc next_init() =
  var pos = newSeq[int](modp5)
  for i in 0 .. <rescnt: pos[residues[i]] = i-1
  pos[1] = rescnt-1
  
  # for each prime store resgroup on each restrack for prime*(modk+ri)
  for j, prime in primes:
    let k = uint((prime-2)) div uint(modp5)       # find the resgroup it's in
    let r = uint((prime-2)) mod uint(modp5) + 2   # its base residue value
    for ri in residues[1 .. rescnt]:  # for each residue value
      let prod: int = int(r) * ri                 # create residues cross-product r*ri
      let row:  int = pos[prod mod modp5] * pcnt  # create residue track address
      next[row + j] = k*(uint(prime) + uint(ri)) + uint(prod-2) div uint(modp5)

proc segsieve(Kn: int) =
                                      # for Kn resgroups in segment
  for b in 0 .. <Kn:                  # for every byte in the segment
    seg[b] = 0                        # set every byte bit to prime (0)
  for r in 0 .. <rescnt:              # for each ith (of 8) residues for P5
    let biti = uint8(1 shl r)         # set the ith residue track bit mask
    let row  = r*pcnt                 # set address to ith row in next[]
    for j in 0 .. <pcnt:              # for each prime <= sqrt(N) for restrack
      if next[row+j] < uint(Kn):      # if 1st mult resgroup index <= seg size
        var k: int = int(next[row+j]) # get its resgroup value
        let prime = primes[j]         # get the prime
        while k < Kn:                 # for each primenth byte < segment size
          seg[k] = seg[k] or biti     # set ith residue in byte as nonprime
          k += int(prime)
        next[row+j] = uint(k - Kn)    # 1st resgroup in next eligible segment
      else: next[row+j] -= uint(Kn)   # if 1st mult resgroup index > seg size
                                      # count the primes in the segment
  for byt in seg[0..<Kn]:             # for the Kn resgroup bytes
    primecnt += uint(pbits[byt])      # count the '0' bits as primes

proc printprms(Kn: int, Ki: uint64) =
  
  # Extract and print the primes in each segment:
  # recommend piping output to less: ./ssozpxxx | less
  # can then use Home, End, Pgup, Pgdn keys to see primes
  for k in 0 .. <Kn:                 # for Kn residues groups|bytes
    for r in 0 .. <8:                # for each residue|bit in a byte
      if (int(seg[k]) and (1 shl r)) == 0:   # if it's '0' it's a prime
        write(stdout, " ", uint64(modp5)*(Ki+uint64(k)) + uint64(residues[r+1]))
  #echo "\n"

proc ssozp5() =
  
  stdout.write "Enter integer number: "
  let val = stdin.readline.parseBiggestUInt # find primes <= val (7..2^64-1)
  
  let B = 262144                    # L2D_CACHE_SIZE 256*1024 bytes
  let KB = B                        # number of segment resgroups
  seg = newSeq[uint8](B)            # create segment array of B bytes size
  
  writeLine(stdout, "segment has ", B, " bytes and ", KB, " residues groups")
  
  let num = uint64((val-1) or 1)    # if val even subtract 1
  var k: uint64 = num div modp5     # resgroup of num
  var modk = uint64(modp5) * k      # base value of num
  var r = 1                         # starting with first residue
  while num >= modk+uint64(residues[r]): r += 1  # find last pc position <= num
  let maxpcs = k * uint64(rescnt) + uint64(r-1)  # maximum number of prime candidates
  let Kmax = uint64(num-2) div uint64(modp5) + 1 # maximum number of resgroups for val
  
  writeLine(stdout, "prime candidates = ", maxpcs, "; resgroups = ", Kmax)
  
  let sqrtN = int(sqrt float64(num))
  
  sozP7(sqrtN)                     # get pcnt and primes <= sqrt(nun)
  
  writeLine(stdout, "create next[", rescnt, "x", pcnt, "] array")
  
  next = newSeq[uint64](rescnt*pcnt)  # create the next[] array
  next_init()                      # load with first nonprimes resgroups
  
  primecnt = 3                     # 2,3,5 the P5 excluded primes count
  var Kn: int = KB                 # set sieve resgroups size to segment size
  var Ki = 0'u64                   # starting resgroup index for each segment
  
  echo "perform segmented SoZ\n";
  
  while Ki < Kmax:                 # for KB size resgroup slices up to Kmax
    if Kmax-Ki < uint64(KB): Kn = int(Kmax-Ki)  # set sieve resgroups size for last segment
    segsieve(Kn)                   # sieve primes for current segment
    #printprms(Kn,Ki)              # print primes for the segment (optional)
    Ki += uint64(KB)               # next segment slice
  
  var lprime = 0'u64               # get last prime and primecnt <= val
  modk = uint64(modp5)*(Kmax-1)    # mod for last resgroup in last segment
  var b = Kn-1                     # num bytes to last resgroup in segment
  r = rescnt-1                     # from last restrack in resgroup
  while true:                      # repeat until last prime determined
    if (int(seg[b]) and (1 shl r)) == 0:
      lprime = modk+uint64(residues[r+1])  # determine the prime value
      if lprime <= num: break      # if <= num exit from loop with lprime
      primecnt -= 1                # else reduce total primecnt
                                   # reduce restrack, setup next resgroup
    r -= 1; if r < 0: (r = rescnt-1; modk -= modp5; b -= 1) # if necessary
  
  writeLine(stdout, "last segment = ", Kn, " resgroups; segments iterations = ", Ki div uint64(KB))
  writeLine(stdout, "last prime (of ", primecnt, ") is ", lprime)

ssozp5()

scriptkiddy (orginal) [2017-08-21T17:55:57+02:00] view original

I don't have time right at this very moment, but I will take a look at this within the next few days and see if there are any changes I can make to get a faster runtime. Love all your comments by the way.

jzakiya (orginal) [2017-08-23T19:25:12+02:00] view original

I cleaned up the code and added more commments. It should be much easier now to uderstand what/why the code is doing after reading my paper.

Here is the gist location of the source code for this version.

https://gist.github.com/jzakiya/94670e6144735eb0041919f633d6011c

I've created a rearchitected version to run in parallel, and in fact I got it to compile, and run with correct results, but is 2x slower than the serial version. If anyone knows how to really create parallel running programs please let me know.

I will create a github project for all the different versions I|others create, for different languages, prime generators, parallel versions, etc. So far I have various C++ versions, and now this Nim version, and I plan to do a Crystal version too.

If people want to, you can contact me directly via email, which is included in my pape and the code intro.

I would urge people who run the code to play with the B value for their system cache. I've only run this code on Intel I5|7 cpu laptops. I'm interested how the code runs on other systems (AMD, ARM, PowerPC, etc) and operarting systems (I've only used Linux).

Here's the code in the gist file.


#[
 This Nim source file will compile to an executable program to
 perform the Segmented Sieve of Zakiya (SSoZ) to find primes <= N.
 It is based on the P5 Strictly Prime (SP) Prime Generator.
 
 Prime Genrators have the form:  mod*k + ri; ri -> {1,r1..mod-1}
 The residues ri are integers coprime to mod, i.e. gcd(ri,mod) = 1
 For P5, mod = 2*3*5 = 30 and the number of residues are
 rescnt = (2-1)(3-1)(5-1) = 8, which are {1,7,11,13,17,19,23,29}.
 
 Adjust segment byte length parameter B (usually L1|l2 cache size)
 for optimum operational performance for cpu|system being used.
 
 Verified on Nim 0.17, using clang (smaller exec) or gcc
 
 $ nim c --cc:[clang|gcc] --d:release  ssozp5.nim
 
 Then run executable: $ ./ssozp5 <cr>, and enter value for N.
 As coded, input values cover the range: 7 -- 2^64-1
 
 Related code, papers, and tutorials, are downloadable here:
 
 http://www.4shared.com/folder/TcMrUvTB/_online.html
 
 Use of this code is free subject to acknowledgment of copyright.
 Copyright (c) 2017 Jabari Zakiya -- jzakiya at gmail dot com
 Version Date: 2017/08/23
 
 This code is provided under the terms of the
 GNU General Public License Version 3, GPLv3, or greater.
 License copy/terms are here:  http://www.gnu.org/licenses/
]#

import math                   # for sqrt function
import strutils, typetraits   # for number input

# Global var used to count number of primes in 'seg' variable
# Each value is number of '0' bits (primes) for values 0..255
let pbits = [
 8,7,7,6,7,6,6,5,7,6,6,5,6,5,5,4,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1
,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1,4,3,3,2,3,2,2,1,3,2,2,1,2,1,1,0
]

# The global residue values for the P5 prime generator.
let residues = [1, 7, 11, 13, 17, 19, 23, 29, 31]

# Global parameters
const
  modp5 = 30           # mod value for P5 prime generator
  rescnt = 8           # number of residues for P5 prime generator
var
  pcnt = 0             # number of primes from r1..sqrt(N)
  primecnt = 0'u64     # number of primes <= N
  next: seq[uint64]    # table of regroups vals for primes multiples
  primes: seq[int]     # list of primes r1..sqrt(N)
  seg: seq[uint8]      # segment byte array to perform ssoz

# This routine is used to compute the list of primes r1..sqrt(input_num),
# stored in global var 'primes', and its count stored in global var 'pcnt'.
# Any algorithm (fast|small) can be used. Here the SoZ using P7 is used.
proc sozp7(val: int | int64) =      # Use P7 prime gen to finds upto val
  let md = 210                      # P7 mod value
  let rscnt = 48                    # P7 residue count and residues list
  let res = [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67
          , 71, 73, 79, 83, 89, 97,101,103,107,109,113,121,127,131,137,139
          ,143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209,211]
  
  var posn = newSeq[int](210)              # small array (hash) to convert
  for i in 0 .. <rscnt: posn[res[i]] = i-1 # (x mod md) to values -1,0,1,2..6
  
  let num = (val-1) or 1            # if val even then subtract 1
  var k = num div md                # compute its residue group val
  var modk = md * k                 # compute its base num
  var r = 1                         # starting with first residue
  while num >= modk+res[r]: r += 1  # find last pc position <= num
  let maxprms = k*rscnt + r - 1     # maximum number of prime candidates
  var prms = newSeq[bool](maxprms)  # array of prime candidates set False
  
  let sqrtN = int(sqrt float64(num))
  modk = 0; r = 0; k = 0
  
  # sieve to eliminate prime multiples from list of pcs r1..sqrtN
  for prm in prms:                  # from list of pc positions in prms
    r += 1; if r > rscnt: (r = 1; modk += md; k += 1)
    if prm: continue                # if pc not prime go to next location
    let res_r = res[r]              # if prime save residue of prime value
    let prime = modk + res_r        # numerate the prime value
    if  prime > sqrtN: break        # exit if > sqrtN
    let prmstep = prime * rscnt     # prime's stepsize to eliminate its mults
    for ri in res[1..rscnt]:        # eliminate prime's multiples from prms
      let prod = res_r * ri         # residues cross-product for this prime
      # compute resgroup val of 1st prime multiple for each restrack
      # starting there, mark all prime multiples on restrack upto end of prms
      var prm_mult = (k*(prime + ri) + prod div md)*rscnt + posn[prod mod md]
      while prm_mult < maxprms: prms[prm_mult] = true; prm_mult += prmstep
  
  # prms now contains the nonprime positions for the prime candidates r1..N
  # extract primes into global var 'primes' and count into global var 'pcnt'
  primes = @[7]                     # r1 prime for P5
  modk = 0; r=0
  for prm in prms:                  # numerate primes from processed pcs list
    r += 1; if r > rscnt: (r = 1; modk += md)
    if not prm: primes.add(modk + res[r])  # put prime in global 'primes' list
  pcnt = len(primes)                       # set global count of primes

# This routine initializes the [rescnt x pcnt] global var 'next' table with the
# resgroup values of the 1st prime multiples for each prime r1..sqrtN along the
# restracks for the 8 P5 residues (7, 11..29, 31). 'next' is used to eliminate
# prime multiples in 'seg' for each SSoZ segment, and is updated with new
# resgroup values for each prime for use with subsequent segment iterations.
proc next_init() =
  var pos = newSeq[int](modp5)                    # create small array (hash)
  for i in 0 .. <rescnt: pos[residues[i]] = i-1   # to convert an (x mod modp5)
  pos[1] = rescnt-1                               # val into restrack val 0..7
  
  # load 'next' with 1st prime multiples regroups vals along each residue track
  for j, prime in primes:                         # for each prime r1..sqrt(N)
    let k = uint((prime-2)) div uint(modp5)       # find the resgroup it's in
    let r = uint((prime-2)) mod uint(modp5) + 2   # and its residue value
    for ri in residues[1 .. rescnt]:              # for each prime|residue pair
      let prod: int = int(r) * ri                 # compute res cross-product r*ri
      let row:  int = pos[prod mod modp5] * pcnt  # compute residue track address
      # compute|store resgroup val of 1st prime multiple for prime|ri residue pair
      next[row + j] = k*(uint(prime) + uint(ri)) + uint(prod-2) div uint(modp5)

# This routine performs the segment sieve for a segment of Kn resgroups|bytes.
# Each 'seg' bit represents a residue track of Kn resgroups, which are processed
# sequentially. 'next' resgroup vals are used to mark prime multiples in 'seg' along
# each restrack, and is udpated for each prime for the next segment. Upon completion
# the number of primes ('0' bits) per 'seg' byte are added to global var 'primecnt'.
proc segsieve(Kn: int) =              # for Kn resgroups in segment
  for b in 0 .. <Kn: seg[b] = 0       # initialize seg bits to all prime '0'
  for r in 0 .. <rescnt:              # for each ith (of 8) residues for P5
    let biti = uint8(1 shl r)         # set the ith residue track bit mask
    let row  = r * pcnt               # set address to ith row in next[]
    for j, prime in primes:           # for each prime r1..sqrt(N)
      if next[row + j] < uint(Kn):    # if 1st mult resgroup index <= seg size
        var k = int(next[row + j])    # get its resgroup value
        while k < Kn:                 # for each primenth byte < segment size
          seg[k] = seg[k] or biti     # set resgroup's restrack bit to '1' (nonprime)
          k += int(prime)             # compute next prime multiple resgroup
        next[row + j] = uint(k - Kn)  # 1st resgroup in next eligible segment
      else: next[row + j] -= uint(Kn) # do if 1st mult resgroup index > seg size
                                      # count the primes in the segment
  for byt in seg[0..<Kn]:             # for the Kn resgroup bytes
    primecnt += uint(pbits[byt])      # count the '0' bits as primes

# Print segment primes. Can pipe output as: $ ./ssozp5 | less (or output file)
# to make viewing primes easier
proc printprms(Kn: int, Ki: uint64) = # print out prime values in segment
  for k in 0 .. <Kn:                  # for Kn residues groups|bytes
    for r in 0 .. <8:                 # for each restrack bit in byte
      if (int(seg[k]) and (1 shl r)) == 0:  # if it's '0' it's a prime
        write(stdout, " ", uint64(modp5)*(Ki+uint64(k)) + uint64(residues[r+1]))
  #echo "\n"

proc ssozp5() =                    # Perform SSoZ using P5 prime generator
  stdout.write "Enter integer number: "
  let val = stdin.readline.parseBiggestUInt # input integer val (7..2^64-1)
  
  const B = 256 * 1024             # adjust size to optimize for system's cache
  let KB = B                       # number of segment resgroups
  seg = newSeq[uint8](B)           # create segment array of B bytes size
  
  echo("segment has ", B, " bytes and ", KB, " residues groups")
  
  let num = uint64((val-1) or 1)   # if val even subtract 1
  var k: uint64 = num div modp5    # compute its resgroup val
  var modk = uint64(modp5) * k     # compute its base num
  var r = 1                        # starting with first residue
  while num >= modk+uint64(residues[r]): r += 1  # find last pc position <= num
  let maxpcs = k * uint64(rescnt) + uint64(r-1)  # maximum number of prime candidates
  let Kmax = uint64(num-2) div uint64(modp5) + 1 # maximum number of resgroups for num
  
  echo("prime candidates = ", maxpcs, "; resgroups = ", Kmax)
  
  let sqrtN = int(sqrt float64(num))
  
  sozP7(sqrtN)                     # get pcnt and primes <= sqrt(nun)
  
  echo("create next[", rescnt, "x", pcnt, "] array")
  
  next = newSeq[uint64](rescnt*pcnt)  # create size of the global next[] table
  next_init()                      # load with first prime multiples resgroups
  
  primecnt = 3                     # 2,3,5 the P5 excluded primes count
  var Kn: int = KB                 # set sieve resgroups size to segment size
  var Ki = 0'u64                   # starting resgroup index for first segment
  
  echo("perform segmented SoZ")
  
  while Ki < Kmax:                 # for KB size resgroup slices up to Kmax
    if Kmax-Ki < uint64(KB): Kn = int(Kmax-Ki) # to set last segment size
    segsieve(Kn)                   # sieve primes for current segment
    #printprms(Kn,Ki)              # print primes for the segment (optional)
    Ki += uint64(KB)               # first resgroup index for next seg slice
  
  var lprime = 0'u64               # get last prime and primecnt <= val
  modk = uint64(modp5) * (Kmax-1)  # mod for last resgroup in last segment
  var b = Kn-1                     # number of bytes|resgroup in last segment
  r = rescnt-1                     # set to last restrack in resgroup
  while true:                      # repeat until last prime determined
    if (int(seg[b]) and (1 shl r)) == 0:
      lprime = modk + uint64(residues[r+1])  # numerate the prime value
      if lprime <= num: break      # if <= num exit from loop with lprime
      primecnt -= 1                # else reduce primecnt if prime > num
                                   # reduce restrack, setup next resgroup
    r -= 1; if r < 0: (r = rescnt-1; modk -= modp5; b -= 1) # if necessary
  
  echo("last segment = ", Kn, " resgroups; segments iterations = ", Ki div uint64(KB))
  echo("total primes = ", primecnt, "; last prime = ", lprime)

ssozp5()

LeuGim (orginal) [2017-09-02T19:35:47+02:00] view original

A user may not know, what is terminal (seriously). And this won't work on Windows (quite widely used thing). Probably a couple of links to some articles (somewhere on web, at some OS-related forums), one for POSIX and one for Windows, would give more for less effort.

Anyway everything cannot be covered - some assumptions on the user's knowledge always remain; and too much intermixing of essential information with additional elucidations may distract the attention.

mikra (orginal) [2017-09-02T22:09:17+02:00] view original

I also agree with @jzakiya. I personally have only experience with the windows env (I have to use it on daily base (job)) - I used linux since 1994 for about 10 years but finally I throwed it away cause of the awful desktop. Today I cant speak about it because of lacking experience; I never tried it again.

The nim installer on windows works very well and out of the box (no extra settings needed). If you like to use your own gcc I recommend the tdm-gcc (works also out of the box but you have to set your cli-path). Also It works without tweaks with the cl.exe (VS2017 community edition for instance) I never got nim running with the gcc of the mingw-installation-manager (the gcc compiler check from nim always failed). I think the docs are very good but sometimes outdated (for instance the examples) but for me its normal. I searched also a while for the nimscript feature of the compiler (the compiler help says nothing about that) and within the nimscript-docpage its a litte bit hidden).

The overall feeling for me is that Nim is a very very nice designed and architected piece of software (design by committee is awful) and not burdened with any tweaks. the nimcode itself is fun to read (especially from others and thats not normal for any computer language). And also the new book is very good and worth reading. The idea of the integrated build-system is a big plus (I hate ant, maven and so on especially when the buildscript is more complex than the software itself which needs to be build)

dom96 (orginal) [2017-09-03T00:00:43+02:00] view original

It's these type of little things that cause unnecessary frustrations and shy people away from using new things.
Please think about the things that could potentially go wrong, then create instructions and documentation that will prevent them.

We do try, but we don't have the time to ensure everything is 100%. I think I need to create a page/article about this because I'm finding myself saying this a lot.

Please understand that Nim is a volunteer-run open source project. We need you to help us improve it. This isn't to say that I don't appreciate your message, you are giving us feedback on how things can be improved which is already much better than just abandoning Nim without saying a word. But can we step things up a notch and all act on our own feedback? These sorts of improvements are a perfect first contribution. All you need to do is edit the website and create a PR: https://github.com/nim-lang/website.

By the way, I've created choosenim precisely to make installation as easy as possible. I'd appreciate if you could try it out: https://github.com/dom96/choosenim#installation

jacmoe (orginal) [2017-09-03T00:52:24+02:00] view original

I agree - Nim is a community driven project, like many open source projects, and it gets better only if you scratch your own itch. Small improvements to the documentation is, like dom96 says, a perfect way to start contributing. :)

[Edit] I agree that not everyone is command-line ninjas, and I think a link or two to good docs on setting the PATH on various systems elsewhere on the internet would be a good addition to the build instructions.

I don't think you should expect anyone else to add that to the docs for you. I wouldn't bother, or even consider that information missing, simply because I know how to do this already. [/Edit]

mratsim (orginal) [2017-09-04T09:34:33+02:00] view original

Hey, welcome to Nim,

If you like learning with mathematical challenges, I started Nim and many other languages with Project Euler, which gives you small mathematical challenges for your programming languages craving.

https://projecteuler.net/

Here are my solutions to the first 10 problems including a fast packed bitarray-based Sieve of Eratosthenes with OpenMP parallel loop.

I compared it to Rosettacode sieves and primesieve/primegen (C/C++) and various Haskell solutions and even Sieve of Atkins, on my machine it is 1.1 to 2x faster. Probably due to better cache locality.

jzakiya (orginal) [2017-09-04T18:59:27+02:00] view original

I tried the run your code but the compiler doesn't see the euler libraries.


eulerprimes.nim(24, 6) Error: cannot open 'lib_euler_math'

How do I get all the libraries to get it to run?

I'd be interested in benchmark times. Can you post your hardware specs and the times for counting the primes <= N, for for N = 10^9, 10^10, and 10^11?

I don't know if you've seen my sequential SSoZ code, but I'd be interested in how they compare. I'm still learning Nim, and don't know the correct way to implement a parallel version yet, which if done correctly will be much faster. Studying your code I've already learned some Nim stuff I didn't know.

Here again are the gists to some reference versions of the sequential SSoZ using the P5 prime generator.

ssozp5.nim

Find the primes <= N, using sequential Segmented Sieve of Zakiya (SSoZ) with P5 prime generator, written in Nim

https://gist.github.com/jzakiya/94670e6144735eb0041919f633d6011c

twinprimes_ssozp5.nim

Find Twin Primes <= N, using sequential Segmented Sieve of Zakiya (SSoZ) with P5 prime generator, written in Nim

https://gist.github.com/jzakiya/776b7aae3126168b4cad90de9adc4961

nthprime_ssozp5.nim

Find Nth Primes, using sequential Segmented Sieve of Zakiya (SSoZ) with P5 prime generator, written in Nim

https://gist.github.com/jzakiya/aff4f2e38f9b0f833955b4cc391de3d4

mratsim (orginal) [2017-09-04T20:00:43+02:00] view original

You probably need to specify a path while importing. For example, running from ./bin "prime_bench.nim" file:

from ../src/lib/lib_euler_primes import primeSieve
from os import commandLineParams
from strutils import parseUInt

let arguments = commandLineParams()

echo arguments[0].parseUInt.primeSieve.len


$ time ./bin/prime_bench 1_000_000_000
50847534

real    0m6.478s
user    0m6.069s
sys     0m0.392s


$ time ./bin/prime_bench 10_000_000_000
455052511

real    1m33.301s
user    1m11.694s
sys     0m14.762s

Compilation was done with -d:release. I tried with march=native but it didn't seem to help.

This is on a MacBook Pro 13 2015, so i5-5257U. You can compare it to others on Notebookcheck.

jzakiya (orginal) [2017-09-04T22:30:37+02:00] view original

Hey thanks.

I'll try to play around with the settings to get your code to compile when I get some time.

On my System76 Gazelle laptop, with I7-6700HQ, 2.6-3.5GHz, 16GB ram, running in a VB VM for Manjaro KDE, using Nim 0.17.0 compiled with: $ nim c --cc:clang --d:release ssozp5.nim

I get these times, running on a noisy system (multiple browsers, et al, open) for the sequential version of SSoZ using P5 prime generator.


[jzakiya@jabari-pc nim]$ time ./ssozp
Enter integer number: 1_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 266666665; resgroups = 33333334
create next[8x3398] array
perform segmented SoZ
last segment = 41046 resgroups; segment slices = 128
total primes = 50847534; last prime = 999999937

real    0m6.915s
user    0m0.550s
sys     0m0.007s
[jzakiya@jabari-pc nim]$ time ./ssozp5
Enter integer number: 10_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 2666666665; resgroups = 333333334
create next[8x9589] array
perform segmented SoZ
last segment = 148310 resgroups; segment slices = 1272
total primes = 455052511; last prime = 9999999967

real    0m11.243s
user    0m5.260s
sys     0m0.010s
[jzakiya@jabari-pc nim]$ time ./ssozp5
Enter integer number: 100_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 26666666665; resgroups = 3333333334
create next[8x27290] array
perform segmented SoZ
last segment = 172374 resgroups; segment slices = 12716
total primes = 4118054813; last prime = 99999999977

real    1m7.795s
user    0m59.667s
sys     0m0.027s

jzakiya (orginal) [2017-09-05T08:32:13+02:00] view original

The times I previously posted were from a VirtualBox VM, limited to 4GB of ram, and in a "noisy" environment. The times here are derived from my host OS (PCLinuxOS) with full access to the system's 16GB of ram, in a "quiet" environment, i.e. I closed everything and rebooted, then just opened a terminal and ran the timing tests.


[jzakiya@localhost nim]$ time ./ssozp5
Enter integer number: 1_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 266666665; resgroups = 33333334
create nextp[8x3398] array
perform segmented SoZ
last segment = 41046 resgroups; segment slices = 128
total primes = 50847534; last prime = 999999937

real    0m6.928s
user    0m0.393s
sys     0m0.000s

-----------------------------------------------------------
[jzakiya@localhost nim]$ time ./ssozp5
Enter integer number: 10_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 2666666665; resgroups = 333333334
create nextp[8x9589] array
perform segmented SoZ
last segment = 148310 resgroups; segment slices = 1272
total primes = 455052511; last prime = 9999999967

real    0m5.355s
user    0m4.247s
sys     0m0.000s

-----------------------------------------------------------
[jzakiya@localhost nim]$ time ./ssozp5
Enter integer number: 100_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 26666666665; resgroups = 3333333334
create nextp[8x27290] array
perform segmented SoZ
last segment = 172374 resgroups; segment slices = 12716
total primes = 4118054813; last prime = 99999999977

real    0m48.434s
user    0m46.852s
sys     0m0.008s

-----------------------------------------------------------
[jzakiya@localhost nim]$ time ./ssozp5
Enter integer number: 1_000_000_000_000
segment has 262144 bytes and 262144 residues groups
prime candidates = 266666666665; resgroups = 33333333334
create nextp[8x78495] array
perform segmented SoZ
last segment = 150870 resgroups; segment slices = 127157
total primes = 37607912018; last prime = 999999999989

real    11m0.884s
user    10m57.985s
sys     0m0.010s

I also wanted to show that the math that forms the foundation of the Sieve of Zakiya (SoZ) reduces the number of prime candidates to sieve primes from as you use more "efficient" Strictly Prime (SP) generators. This makes the SoZ algorithmically possibly the fastest prime sieve.

To illustrate this, below are comparisons of the sequential SoZ between the P5 and P7 prime generators. Because the P7 generator sieves through 8/30 (26.67%) of all integers, compared to 2/6 (33.33%) for P5, it greatly reduces the number of prime candidates it needs to sieve through than P5.

These versions just count the primes <= N, and don't numerate them into a list, to demonstrate the true performance of the prime sieve itself.

Again, these timings were derived on my host OS in a "quiet" environment, directly after rebooting.


[jzakiya@localhost nim]$ time ./sozp5d
Enter integer number: 1_000_000_000
50847534

real    0m12.074s
user    0m3.738s
sys     0m0.054s

[jzakiya@localhost nim]$ time ./sozp7d
Enter integer number: 1_000_000_000
50847534

real    0m6.938s
user    0m3.055s
sys     0m0.048s

--------------------------------------------
[jzakiya@localhost nim]$ time ./sozp5d
Enter integer number: 10_000_000_000
455052511

real    0m43.221s
user    0m40.365s
sys     0m0.554s

[jzakiya@localhost nim]$ time ./sozp7d
Enter integer number: 10_000_000_000
455052511

real    0m38.520s
user    0m35.655s
sys     0m0.496s

Because the "classical Sieve of Eratosthenes" searches through the entire number space up to some N, it is always inherently slower than the SoZ method. The ways to increase its performance is to reduce the number space it sieves, by eliminating the even numbers (a 50% reduction of the number space) and applying various wheel and other reduction techniques to eliminate more of the sieved number space.

The SoZ inherently works on a reduced number space that can theoretically be made as small as desired by the choice of the prime generator used to perform the sieve.

The SoZ is also an inherently parallelizable algorithm, which can take advantage of multicore|threaded systems and CUDA|OpenMP implementations, and others with GPUs (graphic processing units).

Mirror of forum.nim-lang.org

3095 :: Nim newbie request/challenge