nimforum mirror - Twinprimes generator that showcases Nim

jzakiya (orginal) [2018-03-27T21:23:22+02:00] view original

It's been a long and interesting journey, but I've finally gotten this to a point where I feel I can release it.

This is a twinprimes generator, that I did an initial version of back in 2014|5 in C++. Over time, I did a C++ parallel version using OpenMP. Then a few months ago, at the beginning of 2018, I started looking at this again (I can't remember the specifics of why) and said, man, I can do better than that now. I had been reimplementing the straight SSoZ in Nim, and trying to figure out the best architecture. It turns out in this exercise, I figured out how to use better math to allow me to create a much simpler, and MANY TIMES faster architecture for parallel implementation.

What you see here maybe the fastest twinnprimes generator if this other program that claimed to be the fastest really is.

There is a program called primesieve (https://primesieve.org/), which is open source, written in C++, whose author says its the fastest programs that generates primes. It is a very nice, and well written, program, and I've used it to learn how to do good C++ programming. In the past, I've written him about my primesieve method(s) (SoZ, SSoZ). It's easy to download and run. I just get the console version of the compiled version.

So I use primesieve (ver 6.2 here) as the reference to check out my methods|techniques. After some head scratching, and a few Aha moments, I've finally figured out how to bang on Nim to make my program faster than primesieve for generating twinprimes.

Below are time comparisons between primesieve and mine - twinprimes_ssoz. You see as the numbers get bigger then twinprimes_ssoz becomes increasingly faster. It's optimum math now (finally) married to an optimum implementation.

To get these times, I ran both programs on a quiet system, where I shutdown and rebooted, with no other apps loaded other than a terminal, to run both in, and an editor to record their times. If you have other programs that operate in the background that use threads (e.g. browsers) or eat up memory, the (relative) times will be different|slower for both.

To run primesieve to generate twinprimes from a console do e.g.: $ ./primesieve 7e9 -c2

To run twinprimes_ssoz just do: $ ./twinprimes_ssoz<cr> and then enter number as: 7000000000

It was compiled as follows:


$ nim c --cc:gcc --d:release --threads:on --gc:none twinprimes_ssoz.nim

Below are the times I got (in seconds) on my system: System76 laptop, with an Intel I7 6700HQ cpu, 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory, run on a 64-bit PCLinuxOS Linux distro, compiled using gcc 4.9.2, with Nim 0.17.2. Here both programs just count the number of twinprimes (twinprimes_ssoz also prints the last twinprime value).


Input Number   |  twinprimes_ssoz  |  primesieve
--------------------------------------------------
 < = 1e6       |        0.000      |      0.000
5e6            |        0.001      |      0.001
1e7            |        0.001      |      0.001
5e7            |        0.003      |      0.004
1e8            |        0.005      |      0.005
5e8            |        0.025      |      0.024
1e9            |        0.054      |      0.050
5e9            |        0.222      |      0.262
1e10           |        0.483      |      0.572
5e10           |        2.693      |      3.112
1e11           |        5.581      |      6.475
5e11           |       29.371      |     35.675
1e12           |       63.274      |     75.638
5e12           |      408.348      |    448.066
1e13           |      914.001      |    953.423

Value to Nim

This code can be used to benefit|showcase Nim in various ways.

show that Nim can be a player in the numerical analysis arena, particularly for parallel algorithms

can be used as a standard benchmark to evaluate new versions of Nim for improvements|regressions

can be used to talk stuff to to C++, et al, programmers :-)

Future Development

After taking some rest, I plan to implement this new architecture inn C++|OpenMP so I can compare performance. I'll report the results afterwards. I also plan to update my paper The Segmented Sieve of Zakiya (SSoZ) (released in 2104) to include the new math, architecture, and coding improvements.

Below are references for learning about the SoZ and SSoZ.

The Segmented Sieve of Zakiya (SSoZ)

https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

Primes Utils Handbook

https://www.scribd.com/document/266461408/Primes-Utils-Handbook

Twinprimes information and list

http://mathworld.wolfram.com/TwinPrimes.html

But I encourage, implore, welcome, people to beat on the code to improve it and make it faster. What idioms are faster than the ones I used, etc. Also, I thought I saw mentioned herein that OpenMP can be used with Nim. If so, it would be interesting if using it in Nim would make the code faster. Whoever knows how to do that, go for it!

Below is the code and its gist (it's 317 loc, with ~60 separate loc of comments, compare that to primesieve's code size):

https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e


#[
 This Nim source file is a multiple threaded implementation to perform an
 extremely fast Segmented Sieve of Zakiya (SSoZ) to find Twin Primes <= N.
 
 Based on the size of input value N, it dynamically selects the best PG to
 use from PGs P5, P7, P11, and P13, and then sets the optimum segment size.
 
 This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. I suspect parameter
 tuning may have to be done on other hadware systems (ARM, PowerPC, etc) to
 achieve optimum performance on them. It was tested on various Linux 64 bit
 distros, native and in Virtual Box, using 8 or 4 threads, or 16|4GB of mem.
 
 At time of writing, verified using Nim: 0.17.2, 0.18.0
 
 The code was compiled using these compiler directives|flags. Not usng GC
 produces smaller executable, and maybe little faster. Try w/wo to see
 difference on your system. For optimum performance use gcc over clang.
 
 $ nim c --cc:gcc --d:release --threads:on --gc:none twinprimes_ssoz.nim
 
 Then run executable: $ ./twinprimes_ssoz <cr>, and enter value for N.
 As coded, input values cover the range: 13 -- 2^64-1
 
 This source file, and updates, will be available here:
 https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e
 
 Related code, papers, and tutorials, are available here:
 https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ
 https://www.scribd.com/document/266461408/Primes-Utils-Handbook
 
 Use of this code is free subject to acknowledgment of copyright.
 Copyright (c) 2017 Jabari Zakiya -- jzakiya at gmail dot com
 Version Date: 2018/03/18
 Version Date: 2018/03/28
 
 This code is provided under the terms of the
 GNU General Public License Version 3, GPLv3, or greater.
 License copy/terms are here:  http://www.gnu.org/licenses/
]#

import math                   # for sqrt function
import strutils, typetraits   # for number input
import times, os              # for timing code execution
import osproc                 # for getting threads count
import threadpool             # for parallel processing
{.experimental.}              # required to use 'parallel' (<= 0.18.x)

# Compute modular inverse a^(-1) of a to base b, e.g. a*(a^-1) mod b = 1.
proc modinv(a0, b0: int): int =
  var (a, b, x0) = (a0, b0, 0)
  result = 1
  if b == 1: return
  while a > 1:
    let q = a div b
    a = a mod b
    swap a, b
    result -= q * x0
    swap x0, result
  if result < 0: result += b0

# Create constant parameters for given PG at compile time.
proc genPGparameters(prime: int): (int, int, int, seq[int], seq[int], seq[int]) =
  echo("generating parameters for P", prime)
  let primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
  var modpg = 1
  for prm in primes: (modpg *= prm; if prm == prime: break)  # PG's mdoulus
  
  var residues: seq[int] = @[]    # generate PG's residue values here
  var (pc, inc) = (1, 4)
  while pc < modpg: (pc += inc; inc = inc xor 0b110; if gcd(modpg, pc) == 1: residues.add(pc))
  let rescnt = residues.len       # PG's residues count
  
  var restwins: seq[int] = @[]    # extract twinpair residues here
  var j = 0
  while j < rescnt-1:
    if residues[j]+2 == residues[j+1]:
      restwins.add residues[j]; restwins.add residues[j+1]; j += 1
    j += 1
  let rescntp = restwins.len     # twinpair residues count
  
  var inverses: seq[int] = @[]   # create PG's residues inverses here
  for res in residues: inverses.add(modinv(res,modpg))
  
  result = (modpg, rescnt, rescntp, residues, restwins, inverses)

# Generate at compile time the parameters for PGs P5-P13.
const parametersp5  = genPGparameters(5)
const parametersp7  = genPGparameters(7)
const parametersp11 = genPGparameters(11)
const parametersp13 = genPGparameters(13)

# Global parameters
var
  pcnt = 0             # number of primes from r1..sqrt(N)
  num = 0'u64          # adjusted (odd) input value
  primecnt = 0'u64     # number of twinprimes <= N
  nextp: seq[uint64]   # table of regroups vals for primes multiples
  primes: seq[int]     # list of primes r1..sqrt(N)
  seg: seq[uint8]      # segment byte array to perform ssoz
  KB = 0               # segment size for each seg restrack
  cnts: seq[uint]      # hold twinprime counts for seg bytes
  pos: seq[int]        # convert residue val to its residues index val
                       # faster than `residues.find(residue)`
  modpg:    int        # PG's modulus value
  rescnt:   int        # PG's residues count
  rescntp:  int        # PG's twinpairs residues count
  residues: seq[int]   # PG's list of residues
  restwins: seq[int]   # PG's list of twinpair residues
  resinvrs: seq[int]   # PG's list of residues inverses
  Bn:       int        # segment size factor for PG and input number

# Select at runtime best PG and segment size factor to use for input value.
# These are good estimates derived from PG data profiling. Can be improved.
proc selectPG(num: uint) =
  if num < 10_000_000:
    (modpg, rescnt, rescntp, residues, restwins, resinvrs) = parametersp5
    Bn = 16
  elif num < 1_100_000_000'u:
    (modpg, rescnt, rescntp, residues, restwins, resinvrs) = parametersp7
    Bn = 32
  elif num < 35_500_000_000'u:
    (modpg, rescnt, rescntp, residues, restwins, resinvrs) = parametersp11
    Bn = 64
  else:
    (modpg, rescnt, rescntp, residues, restwins, resinvrs) = parametersp13
    if   num > 7_000_000_000_000'u: Bn = 384
    elif num > 2_500_000_000_000'u: Bn = 320
    elif num >   250_000_000_000'u: Bn = 196
    else: Bn = 96
  cnts = newSeq[uint](rescntp div 2)           # twinprime sums for seg bytes
  pos  = newSeq[int](modpg)                    # create modpg size array to
  for i in 0..rescnt-1: pos[residues[i]-2] = i # convert residue val -> indx

# Compute the list of primes r1..sqrt(input_num), and store in global
# 'primes' array, and store their count in global var 'pcnt'.
# Any algorithm (fast|small) is usable. Here the SoZ for the PG is used.
proc sozpg(val: int | int64) =      # Use PG to finds primes upto val
  let md = modpg                    # PG's modulus value
  let rscnt = rescnt                # PG's residue count
  let res = residues                # PG's residues list
  
  let num = (val-1) or 1            # if val even then subtract 1
  var k = num div md                # compute its residue group value
  var modk = md * k                 # compute the resgroup base value
  var r = 0                         # from 1st residue in num's resgroup
  while num >= modk+res[r]: r += 1  # find last pc val|position <= num
  let maxpcs = k * rscnt + r        # max number of prime candidates <= num
  var prms = newSeq[bool](maxpcs)   # array of prime candidates set False
  
  let sqrtN =int(sqrt float64(num)) # compute integer sqrt of input num
  modk = 0; r = -1; k = 0           # initialize sieve parameters
  
  # mark the multiples of the primes r1..sqrtN in 'prms'
  for prm in prms:                  # from list of pc positions in prms
    r += 1; if r == rscnt: (r = 0; modk += md; k += 1)
    if prm: continue                # if pc not prime go to next location
    let prm_r = res[r]              # if prime save its residue value
    let prime = modk + prm_r        # numerate the prime value
    if  prime > sqrtN: break        # we're finished if it's > sqrtN
    let prmstep = prime * rscnt     # prime's stepsize to mark its mults
    for ri in res:                  # mark prime's multiples in prms
      let prod = prm_r * ri - 2     # compute cross-product for r|ri pair
      # compute resgroup val of 1st prime multiple for each restrack
      # starting there, mark all prime multiples on restrack upto end of prms
      var prm_mult = (k*(prime + ri) + prod div md)*rscnt + pos[prod mod md]
      while prm_mult < maxpcs: prms[prm_mult] = true; prm_mult += prmstep
  
  # prms now contains the nonprime positions for the prime candidates r1..N
  # extract primes into global var 'primes' and count into global var 'pcnt'
  primes = @[]                      # create empty dynamic array for primes
  modk = 0; r = -1                  # initialize loop parameters
  for prm in prms:                  # numerate|store primes from pcs list
    r += 1; if r == rscnt: (r = 0; modk += md)
    if not prm: primes.add(modk + res[r]) # put prime in global 'primes' list
  pcnt = primes.len                       # set global count of primes

# For 'nextp' 'row_index' restrack for given residue, for primes r1..sqrt(N),
# init each col w/1st prime multiple resgroup for each prime along restrack.
proc resinit(row_indx, res: int) {.gcsafe.} =
 {.gcsafe.}:                              # for given residue 'res'
    let row = row_indx * pcnt             # along its restrack row in 'nextp'
    for j, prime in primes:               # for each primes r1..sqrt(N)
      let k = (prime-2) div modpg         # find the resgroup it's in
      let r = (prime-2) mod modpg + 2     # and its residue value
      let ri = (res*resinvrs[pos[r-2]]-2) mod modpg + 2 # compute the ri for r
      let prod = r * ri - 2               # compute residues cross-product
      # compute|store 1st prime mult resgroup at col j for prime, for 'res'
      nextp[row + j] = uint(k*(prime + ri) + prod div modpg)

# Print twinprimes for given twinpair for given segment slice.
# Primes will not be displayed in sorted order, collect|sort later for that.
proc printprms(Kn: int, Ki: uint64, indx: int)= # starting at seg resgroup Ki
  var modk = Ki * modpg.uint         # compute its base num
  let s_row = indx * KB              # set seg row index for given twinpair
  let res = restwins[indx * 2 + 1]   # get upper twinpair residue value
  for k in 0..Kn-1:                  # then for each of Kn seg resgroups
    if int(seg[s_row + k]) == 0:     # if both lsb bits in byte are prime
      if modk + res.uint <= num:     # and upper twinprime <= num
        echo(modk+res.uint-1)        # print one twinprime mid val per line
    modk += modpg.uint

# First init twinpair's 1st prime mults for their residues rows in 'nextp'.
# Perform prime sieve on selected twinpair restracks in seg of Kn resgroups.
# Set lsbs in bytes to '1' along restrack for nonprime pc resgroups where each
# of the Kn resgroup bytes along the restrack represent its prime candidates.
# Update 'nextp' array for that restrack for next segment slices accordingly.
# Then compute twinprimes count for segment, store in 'cnts' array for row.
proc twins_sieve(Kmax: uint, indx, Ks: int) {.gcsafe.} =
 {.gcsafe.}:
  let i = indx * 2                     # lower lsb seg row index for twinpair
  resinit(i,   restwins[i])            # init 1st prime mults for lower lsb
  resinit(i+1, restwins[i+1])          # init 1st prime mults for upper lsb
  var sum = 0'u                        # init primes cnt for this seg byte row
  var Ki  = 0'u                        # 1st resgroup seg val for each slice
  let s_row = indx * KB                # set seg row address for this twinpair
  while Ki < Kmax:                     # for Ks resgroup size slices upto Kmax
    let Kn = min(Ks, int(Kmax-Ki))     # set segment slice resgroup length
    for b in s_row..s_row+KB-1: seg[b]=0 # set all seg restrack bits to prime
    for r in 0..1:                     # for 2 lsbs for twinpair bits in byte
      let row = (i + r) * pcnt         # set address to its restrack in 'nextp'
      let biti = uint8(1 shl r)        # set residue track bit mask
      for j in 0..pcnt-1:              # for each prime index r1..sqrt(N)
        if nextp[row + j] < Kn.uint:   # if 1st mult resgroup is within 'seg'
          var k = int(nextp[row + j])  # starting from this resgroup in 'seg'
          let prime = primes[j]        # for this prime
          while k < Kn:                # for each primenth byte to end of 'seg'
            seg[s_row+k] = seg[s_row+k] or biti # mark restrack bit nonprime
            k += prime                 # compute next prime multiple resgroup
          nextp[row + j] = uint(k-Kn)  # save 1st resgroup in next eligible seg
        else: nextp[row+j] -= Kn.uint  # do if 1st mult resgroup not within seg
    for k in 0..Kn-1: (if seg[s_row+k] == 0: sum += 1) # sum bytes with twins
    #printprms(Kn, Ki, indx)            # display twinprimes for this twinpair
    Ki += Ks.uint                      # set 1st resgroup val of next seg slice
  cnts[indx] = sum                     # save prime count for this seg twinpair

# Perform sieve on twinpair residues for segment of Kn resgroups in parallel.
# Then add seg twinprimes count for each seg tp restrack total to 'primecnt'.
proc segsieve(Kmax: uint) =          # perform sieve on all segments
  let Ks = KB                        # make default seg size immutable
  let twinpairs = rescntp div 2      # for number of seg twinpair byte rows
  parallel:                          # perform in parallel
    for indx in 0..twinpairs-1:      # for nextp|seg twinpair row indexes,
      spawn twins_sieve(Kmax,indx,Ks)# sieve the selected twinpair restracks
  sync()                             # then count the twinprimes in the seg
  for sum in cnts: primecnt += sum   # update 'primecnt' w/seg twinprime cnts

proc twinprimes_ssoz() =             # Use selected Pn prime generator for SSoZ
  stdout.write "Enter integer number: "
  let val = stdin.readline.parseBiggestUInt # find primes <= val (13..2^64-1)
  
  echo("threads = ", countProcessors())
  let ts = epochTime()             # start timing sieve setup execution
  
  num = uint64((val-1) or 1)       # if val even subtract 1
  selectPG(num.uint)               # select PG and seg factor for input number
  let modpg = modpg.uint           # to reduce casting hell noise
  var k = num div modpg            # compute its resgroup
  var modk = modpg * k.uint        # compute its base num
  let Kmax = (num-2) div modpg + 1 # maximum number of resgroups for num
  
  let B = Bn * 1024                # set seg size to optimize for selected PG
  KB = if Kmax.int < B: (Kmax.int + 1) else: B  # min num of segment resgroups
  seg = newSeq[uint8]((rescntp div 2)*KB) # create [twinpairs x KB] bytes array
  
  echo("segment is [", (rescntp div 2), " x ", KB, "] bytes array")
  
  # This is not necessary for running the program but provides information
  # to determine the 'efficiency' of the used PG: (num of primes)/(num of pcs)
  # The closer the ratio is to '1' the higher the PG's 'efficiency'.
  var r = 0                        # starting with first residue
  while num.uint >= modk+restwins[r].uint: r += 1 # find last tp index <= num
  let maxpcs = k*rescntp.uint + r.uint  # maximum number of twinprime pcs
  let maxpairs = maxpcs div 2      # maximum number of twinpair candidates
  
  echo("twinprime candidates = ", maxpairs, "; resgroups = ", Kmax)
  
  let sqrtN=int(sqrt float64(num)) # compute integer sqrt of input num
  sozpg(sqrtN)                     # compute pcnt and primes <= sqrt(num)
  
  nextp = newSeq[uint64](rescntp*pcnt) # create size of global 'nextp' array
  
  echo("created|initialized nextp[", rescntp, " x ", pcnt, "] array")
  let te = (epochTime()-ts)        # sieve setup time
  echo("setup time = ", te.formatFloat(ffDecimal, 3), " secs")
  
  primecnt = if modpg > 210'u: 3 else: 2   # for 1st tps (3,5)|(5,7)|(11,13)
  
  echo("perform twinprimes ssoz sieve")    # start doing ssoz now
  
  let t1 = epochTime()             # start timing ssoz sieve execution
  segsieve(Kmax.uint)              # perform total twinprime ssoz sieve
  
  let Kn = min(KB, (Kmax.int mod KB)) # set number of resgroups in last slice
  var lprime = 0'u64               # to store last twinprime value <= num
  modk = (Kmax-1).uint * modpg     # set mod for last resgroup in last segment
  k = uint(Kn-1)                   # set val for last resgroup in last segment
  let lasti = rescntp div 2 - 1    # set val for last seg twinpair row index
  r = lasti                        # starting from last resgroup twinpair byte
  while true:                      # step backwards from end of last resgroup
    var row_i = r * KB             # set 'seg' byte row for twinpair restrack
    if int(seg[row_i + k.int]) == 0: # if both twinpair bits in byte are prime
      lprime = modk + restwins[r*2 + 1].uint # numerate the upper twinprime val
      if lprime <= num: break      # if its <= num its the last prime, so exit
      primecnt -= 1                # else reduce primecnt, keep backtracking
                                   # reduce restrack, next resgroup if needed
    r -= 1; if r < 0: (r = lasti; modk -= modpg;  k -= 1)
  
  let t2 = (epochTime()-t1)        # sieve execution time
  
  echo("sieve time = ", t2.formatFloat(ffDecimal, 3), " secs")
  echo("last segment = ", Kn, " resgroups; segment slices = ", Kmax div KB.uint+1)
  echo("total twins = ", primecnt, "; last twin = ", lprime-1, "+/-1")
  echo("total time = ", (t2+te).formatFloat(ffDecimal, 3), " secs\n")

twinprimes_ssoz()

mratsim (orginal) [2018-03-28T01:17:15+02:00] view original

Congrats, that's really an achievement !

OpenMP is easy, you can use it like this, the following defines a new -d:openmp compilation flag:

when defined(openmp):
  {.passC: "-fopenmp".}
  {.passL: "-fopenmp".}

# Note: compile with stacktraces off or put
# when defined(openmp): {.push stacktrace: off.}
# and
# when defined(openmp): {.pop.}
# around OpenMP proc.
# Heap allocation within a loop will crash OpenMP
# Nim stacktraces allocate a string :/
# Also avoid yourseq.add within an OpenMP loop as that could trigger the GC.
# Preallocate and use yourseq[i] = foo

var foo = newSeq[int](1000)

{.push stacktrace: off.}
for i in 0 || foo.len - 1:
  foo[i] = i
{.pop.}

# Alternatively

{.push stacktrace: off.}
for i in `||`(0, foo.len - 1, "simd if(ompsize > 500)"):
  foo[i] = i
{.pop.}

# simd will tell the compiler to use the SSE/AVX instructions
# if(ompsize > 500) will only trigger OpenMP if the loop is bigger than 500
# There are lots of other possibilities

Note that OpenMP is sometimes a bit of a hard beast to tame if you work on structures less than 64 bytes (the size of a cache line:

The work is split on several threads and threads should not work on the same variable (or use OpenMP atomics) and you might also encounter false sharing/cache invalidation: if you have an array of 8 integers, it fits in a cache line (64B), a thread trying to update it will invalidate the cache line for all other threads and ad repetitum. Result will be slower than single threaded.

jzakiya (orginal) [2018-04-03T23:30:41+02:00] view original

New and Improved (current) version, with significantly reduced memory footprint as numbers get larger.

Now I create|initialize nextp arrays of 1st prime multiples in each thread, which gets gc'd (garbage collected) at end of thread, thus using a much lower constant runtime memory. (I could also generate|use the segment memory on a per thread basis too, but at the end I need full last segment memory to find last twinprime and correct count.) For this version I need to compile using gc, so compile as below:


$ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim

Hey @mratsim, I started converting code to C++, and hit a roadblock in passing multiple outputs from a function to multiple inputs, needed to do const paramterspxx = genPGparameters(xx) and in selectPG. And not sure how to compile to deallocate memory from a thread, as C++ doesn't use GC. Also, the information you provided on using OpenMP is way over my head. Do you feel like trying to do an OpenMP implementation in Nim?

Anyway, here's the updated gist file, and source code:

https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e


#[
 This Nim source file is a multple threaded implementation to perform an
 extremely fast Segmented Sieve of Zakiya (SSoZ) to find Twin Primes <= N.
 
 Based on the size of input value N, it dynamically selects the best PG to
 use from PGs P5, P7, P11, and P13, and then sets the optimum segment size.
 
 This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. I suspect parameter
 tuning may have to be done on other hadware systems (ARM, PowerPC, etc) to
 achieve optimum performance on them. It was tested on various Linux 64 bit
 distros, native and in Virtual Box, using 8 or 4 threads, or 16|4GB of mem.
 
 At time of writing, verified using Nim: 0.17.2, 0.18.0
 
 The code was compiled using these compiler directives|flags. Must use GC.
 For optimum performance use gcc over clang.
 
 $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim
 
 Then run executable: $ ./twinprimes_ssoz <cr>, and enter value for N.
 As coded, input values cover the range: 13 -- 2^64-1
 
 This source file, and updates, will be available here:
 https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e
 
 Related code, papers, and tutorials, are available here:
 https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ
 https://www.scribd.com/document/266461408/Primes-Utils-Handbook
 
 Use of this code is free subject to acknowledgment of copyright.
 Copyright (c) 2017/18 Jabari Zakiya -- jzakiya at gmail dot com
 Version Date: 2018/03/18
 Version Date: 2018/04/07
 
 This code is provided under the terms of the
 GNU General Public License Version 3, GPLv3, or greater.
 License copy/terms are here:  http://www.gnu.org/licenses/
]#

import math                   # for sqrt, gcd, mod functions
import strutils, typetraits   # for number input
import times, os              # for timing code execution
import osproc                 # for getting threads count
import threadpool             # for parallel processing
{.experimental.}              # required to use 'parallel' (<= 0.18.x)

# Compute modular inverse a^(-1) of a to base b, e.g. a*(a^-1) mod b = 1.
proc modinv(a0, b0: int): int =
  var (a, b, x0) = (a0, b0, 0)
  result = 1
  if b == 1: return
  while a > 1:
    let q = a div b
    a = a mod b
    swap a, b
    result -= q * x0
    swap x0, result
  if result < 0: result += b0

# Create constant parameters for given PG at compile time.
proc genPGparameters(prime: int): (int, int, int, seq[int], seq[int], seq[int]) =
  echo("generating parameters for P", prime)
  let primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
  var modpg = 1
  for prm in primes: (modpg *= prm; if prm == prime: break)  # PG's mdoulus
  
  var residues: seq[int] = @[]    # generate PG's residue values here
  var (pc, inc) = (1, 4)
  while pc < modpg: (pc += inc; inc = inc xor 0b110; if gcd(modpg, pc) == 1: residues.add(pc))
  let rescnt = residues.len       # PG's residues count
  
  var restwins: seq[int] = @[]    # extract twinpair residues here
  var j = 0
  while j < rescnt-1:
    if residues[j]+2 == residues[j+1]:
      restwins.add residues[j]; restwins.add residues[j+1]; j += 1
    j += 1
  let rescntp = restwins.len     # twinpair residues count
  
  var inverses: seq[int] = @[]   # create PG's residues inverses here
  for res in residues: inverses.add(modinv(res,modpg))
  
  result = (modpg, rescnt, rescntp, residues, restwins, inverses)

# Generate at compile time the parameters for PGs P5-P13.
const parametersp5  = genPGparameters(5)
const parametersp7  = genPGparameters(7)
const parametersp11 = genPGparameters(11)
const parametersp13 = genPGparameters(13)

# Global parameters
var
  pcnt = 0             # number of primes from r1..sqrt(N)
  num = 0'u64          # adjusted (odd) input value
  twinscnt = 0'u64     # number of twinprimes <= N
  primes: seq[int]     # list of primes r1..sqrt(N)
  seg: seq[uint8]      # segment byte array to perform ssoz
  KB = 0               # segment size for each seg restrack
  cnts: seq[uint]      # hold twinprime counts for seg bytes
  pos: seq[int]        # convert residue val to its residues index val
                       # faster than `residues.find(residue)`
  modpg:    int        # PG's modulus value
  rescnt:   int        # PG's residues count
  rescntp:  int        # PG's twinpairs residues count
  residues: seq[int]   # PG's list of residues
  restwins: seq[int]   # PG's list of twinpair residues
  resinvrs: seq[int]   # PG's list of residues inverses
  Bn:       int        # segment size factor for PG and input number

# Select at runtime best PG and segment size factor to use for input value.
# These are good estimates derived from PG data profiling. Can be improved.
proc selectPG(num: uint) =
  if num < 10_000_000:
    (modpg, rescnt, rescntp, residues, restwins, resinvrs) = parametersp5
    Bn = 16
  elif num < 1_100_000_000'u:
    (modpg, rescnt, rescntp, residues, restwins, resinvrs) = parametersp7
    Bn = 32
  elif num < 35_500_000_000'u:
    (modpg, rescnt, rescntp, residues, restwins, resinvrs) = parametersp11
    Bn = 64
  else:
    (modpg, rescnt, rescntp, residues, restwins, resinvrs) = parametersp13
    if   num > 7_000_000_000_000'u: Bn = 384
    elif num > 2_500_000_000_000'u: Bn = 320
    elif num >   250_000_000_000'u: Bn = 196
    else: Bn = 96
  cnts = newSeq[uint](rescntp div 2)           # twinprime sums for seg bytes
  pos  = newSeq[int](modpg)                    # create modpg size array to
  for i in 0..rescnt-1: pos[residues[i]-2] = i # convert residue val -> indx

# Compute the list of primes r1..sqrt(input_num), and store in global
# 'primes' array, and store their count in global var 'pcnt'.
# Any algorithm (fast|small) is usable. Here the SoZ for the PG is used.
proc sozpg(val: int | int64) =      # Use PG to finds primes upto val
  let md = modpg                    # PG's modulus value
  let rscnt = rescnt                # PG's residue count
  let res = residues                # PG's residues list
  
  let num = (val-1) or 1            # if val even then subtract 1
  var k = num div md                # compute its residue group value
  var modk = md * k                 # compute the resgroup base value
  var r = 0                         # from 1st residue in num's resgroup
  while num >= modk+res[r]: r += 1  # find last pc val|position <= num
  let maxpcs = k * rscnt + r        # max number of prime candidates <= num
  var prms = newSeq[bool](maxpcs)   # array of prime candidates set False
  
  let sqrtN =int(sqrt float64(num)) # compute integer sqrt of input num
  modk = 0; r = -1; k = 0           # initialize sieve parameters
  
  # mark the multiples of the primes r1..sqrtN in 'prms'
  for prm in prms:                  # from list of pc positions in prms
    r += 1; if r == rscnt: (r = 0; modk += md; k += 1)
    if prm: continue                # if pc not prime go to next location
    let prm_r = res[r]              # if prime save its residue value
    let prime = modk + prm_r        # numerate the prime value
    if  prime > sqrtN: break        # we're finished if it's > sqrtN
    let prmstep = prime * rscnt     # prime's stepsize to mark its mults
    for ri in res:                  # mark prime's multiples in prms
      let prod = prm_r * ri - 2     # compute cross-product for r|ri pair
      # compute resgroup val of 1st prime multiple for each restrack
      # starting there, mark all prime multiples on restrack upto end of prms
      var prm_mult = (k*(prime + ri) + prod div md)*rscnt + pos[prod mod md]
      while prm_mult < maxpcs: prms[prm_mult] = true; prm_mult += prmstep
  
  # prms now contains the nonprime positions for the prime candidates r1..N
  # extract primes into global var 'primes' and count into global var 'pcnt'
  primes = @[]                      # create empty dynamic array for primes
  modk = 0; r = -1                  # initialize loop parameters
  for prm in prms:                  # numerate|store primes from pcs list
    r += 1; if r == rscnt: (r = 0; modk += md)
    if not prm: primes.add(modk + res[r]) # put prime in global 'primes' list
  pcnt = primes.len                       # set global count of primes

# Print twinprimes for given twinpair for given segment slice.
# Primes will not be displayed in sorted order, collect|sort later for that.
proc printprms(Kn: int, Ki: uint64, indx: int)= # starting at seg resgroup Ki
  var modk = Ki * modpg.uint         # compute its base num
  let s_row = indx * KB              # set seg row index for given twinpair
  let res = restwins[indx * 2 + 1]   # get upper twinpair residue value
  for k in 0..Kn-1:                  # then for each of Kn seg resgroups
    if int(seg[s_row + k]) == 0:     # if both lsb bits in byte are prime
      if modk + res.uint <= num:     # and upper twinprime <= num
        echo(modk+res.uint-1)        # print one twinprime mid val per line
    modk += modpg.uint

# For 'nextp' array for given twinpair thread, for twinpair restracks 'i|i+1',
# init each col w/1st prime multiple resgroup, for the primes r1..sqrt(N).
proc resinit(i: int, nextp: seq[uint64]): seq[uint64] =
  var nextp = nextp                       # 1st mults array for this twinpair
  for indx in 0..1:                       # for both twinpair residues
    let row = indx * pcnt                 # along each restrack row in 'nextp'
    let res = restwins[i + indx]          # for this twinpair residue
    for j, prime in primes:               # for each primes r1..sqrt(N)
      let k = (prime-2) div modpg         # find the resgroup it's in
      let r = (prime-2) mod modpg + 2     # and its residue value
      let ri = (res*resinvrs[pos[r-2]]-2) mod modpg + 2 # compute the ri for r
      let prod = r * ri - 2               # compute residues cross-product
      # compute|store 1st prime mult resgroup at col j for prime, for 'res'
      nextp[row + j] = uint(k*(prime + ri) + prod div modpg)
  result = nextp

# Perform in a thread, the ssoz for a given twinpair, along its seg byte row,
# for Kmax resgroups, and max segsize of Ks regroups, for twinpairs at 'indx'.
# First create|init 'nextp' array of 1st prime mults for given twinpair, which
# at end of thread will be gc'd. For sieve, set 2 lsbs in seg byte to '1' for
# primes mults resgroups and update 'nextp' restrack seg slices acccordingly.
# Then compute twinprimes count for segment, store in 'cnts' array for row.
# Can optionally compile to print mid twinprime values generated by twinpair.
proc twins_sieve(Kmax: uint, indx, Ks: int) {.gcsafe.} =
 {.gcsafe.}:
  var (sum, Ki) = (0'u, 0'u)           # init twins cnt|1st resgroup for slice
  let (i, s_row) = (indx*2, indx*KB)   # set twinpair row addrs for nextp|seg
  var nextp = newSeq[uint64](pcnt * 2) # create 1st mults array for twinpair
  nextp = resinit(i, nextp)            # init w/1st prime mults for twinpair
  while Ki < Kmax:                     # for Ks resgroup size slices upto Kmax
    let Kn = min(Ks, int(Kmax-Ki))     # set segment slice resgroup length
    for b in 0..Kn-1: seg[s_row+b] = 0 # set all seg restrack bits to prime
    for biti in 1..2:                  # for 2 lsbs for twinpair bits in byte
      let row  = (biti - 1) * pcnt     # set address to bit's 'nextp' restrack
      for j in 0..pcnt-1:              # for each prime index r1..sqrt(N)
        if nextp[row + j] < Kn.uint:   # if 1st mult resgroup is within 'seg'
          var k = nextp[row + j].int   # starting from this resgroup in 'seg'
          let prime = primes[j]        # for this prime
          while k < Kn:                # for each primenth byte to end of 'seg'
            seg[s_row+k] = seg[s_row+k] or biti.uint8 # mark kth byte nonprime
            k += prime                 # update to next prime multiple resgroup
          nextp[row + j] = uint(k-Kn)  # save 1st resgroup in next eligible seg
        else: nextp[row+j] -= Kn.uint  # do if 1st mult resgroup val > seg size
    for b in 0..Kn-1: (if seg[s_row+b] == 0: sum.inc) # sum bytes with twins
    #printprms(Kn, Ki, indx)            # display twinprimes for this twinpair
    Ki += Ks.uint                      # set 1st resgroup val of next seg slice
  cnts[indx] = sum                     # save twins count for this seg twinpair

# Perform sieve on twinpair residues for segment of Kn resgroups in parallel.
# Then add seg twinprimes count for each seg tp restrack total to 'twinscnt'.
proc segsieve(Kmax: uint) =          # perform sieve on all segments
  let Ks = KB                        # make default seg size immutable
  let twinpairs = rescntp div 2      # for number of seg twinpair byte rows
  parallel:                          # perform in parallel
    for indx in 0..twinpairs-1:      # for nextp|seg twinpair row indexes
      spawn twins_sieve(Kmax,indx,Ks)# sieve the selected twinpair restracks
  sync()                             # then count the twinprimes in the seg
  for sum in cnts: twinscnt += sum   # update 'twinscnt' w/seg twinprime cnts

proc twinprimes_ssoz() =             # Use selected Pn prime generator for SSoZ
  stdout.write "Enter integer number: "
  let val = stdin.readline.parseBiggestUInt # find primes <= val (13..2^64-1)
  
  echo("threads = ", countProcessors())
  let ts = epochTime()             # start timing sieve setup execution
  
  num = uint64((val-1) or 1)       # if val even subtract 1
  selectPG(num.uint)               # select PG and seg factor for input number
  let modpg = modpg.uint           # to reduce casting hell noise
  var k = num div modpg            # compute its resgroup
  var modk = modpg * k.uint        # compute its base num
  let Kmax = (num-2) div modpg + 1 # maximum number of resgroups for num
  
  let B = Bn * 1024                # set seg size to optimize for selected PG
  KB = if Kmax.int < B: (Kmax.int + 1) else: B  # min num of segment resgroups
  seg = newSeq[uint8]((rescntp div 2)*KB) # create [twinpairs x KB] bytes array
  
  echo("segment is [", (rescntp div 2), " x ", KB, "] bytes array")
  
  # This is not necessary for running the program but provides information
  # to determine the 'efficiency' of the used PG: (num of primes)/(num of pcs)
  # The closer the ratio is to '1' the higher the PG's 'efficiency'.
  var r = 0                        # starting with first residue
  while num.uint >= modk+restwins[r].uint: r += 1 # find last tp index <= num
  let maxpcs = k*rescntp.uint + r.uint  # maximum number of twinprime pcs
  let maxpairs = maxpcs div 2      # maximum number of twinprime candidates
  
  echo("twinprime candidates = ", maxpairs, "; resgroups = ", Kmax)
  
  sozpg(int(sqrt float64(num)))    # compute pcnt and primes <= sqrt(num)
  
  echo("each thread creates|inits nextp[", 2, " x ", pcnt, "] array")
  let te = (epochTime()-ts)        # sieve setup time
  echo("setup time = ", te.formatFloat(ffDecimal, 3), " secs")
  
  twinscnt = if modpg > 210'u: 3 else: 2   # for 1st tps (3,5)|(5,7)|(11,13)
  
  echo("perform twinprimes ssoz sieve")    # start doing ssoz now
  
  let t1 = epochTime()             # start timing ssoz sieve execution
  segsieve(Kmax.uint)              # perform total twinprime ssoz sieve
  
  var Kn = Kmax.int mod KB         # set number of resgroups in last slice
  if Kn == 0: Kn = KB              # if multiple of seg size set to seg size
  var ltwin = 0'u64                # to store last twinprime value <= num
  modk = (Kmax-1).uint * modpg     # set mod for last resgroup in last segment
  k = uint(Kn-1)                   # set val for last resgroup in last segment
  let lasti = rescntp div 2 - 1    # set val for last seg twinpair row index
  r = lasti                        # starting from last resgroup twinpair byte
  while true:                      # step backwards from end of last resgroup
    var row_i = r * KB             # set 'seg' byte row for twinpair restrack
    if int(seg[row_i + k.int]) == 0: # if both twinpair bits in byte are prime
      ltwin = modk + restwins[r*2 + 1].uint # numerate the upper twinprime val
      if ltwin <= num: break       # if its <= num its the last prime, so exit
      twinscnt -= 1                # else reduce twinscnt, keep backtracking
                                   # reduce restrack, next resgroup if needed
    r -= 1; if r < 0: (r = lasti; modk -= modpg;  k -= 1)
  
  let t2 = (epochTime()-t1)        # sieve execution time
  
  echo("sieve time = ", t2.formatFloat(ffDecimal, 3), " secs")
  echo("last segment = ", Kn, " resgroups; segment slices = ", Kmax div KB.uint+1)
  echo("total twins = ", twinscnt, "; last twin = ", ltwin-1, "+/-1")
  echo("total time = ", (t2+te).formatFloat(ffDecimal, 3), " secs\n")

twinprimes_ssoz()

miran (orginal) [2018-04-04T15:18:40+02:00] view original

show that Nim can be a player in the numerical analysis arena, particularly for parallel algorithms

IMO, it would be nice if you would convert this to a blog post, which could be shared on Reddit/HN/etc.

Maybe something similar to this: https://nim-lang.org/blog/2018/01/22/yes-command-in-nim.html ?

miran (orginal) [2018-04-04T18:58:09+02:00] view original

But I encourage, implore, welcome, people to beat on the code to improve it and make it faster. What idioms are faster than the ones I used, etc.

Here is my gist - I took your improved version, went quickly through it, and have made mostly cosmetic changes - theoretically this might be a bit quicker (e.g. variables declared outside of the loops), but I don't expect to see any difference in practice.

Some of the stuff I have changed:

comments above function declarations are now docstrings for those functions (comments starting with ##)

a new type to make return type of genPGparameters more readable

variables are declared outside of the loops

var and let blocks

use a ..< b instead of a .. b-1

added some breathing room :)

I haven't changed any logic (or I think so), but please test if it works as it should.

jzakiya (orginal) [2018-04-04T19:23:41+02:00] view original

Hey @miran, thanks. I'll look at your code when I get some time today.

It would be really helpful if people could run the code on different (Intel cpus, AMD, ARM, etc) platforms with different number of threads, cache size, etc, and if possible a|b it against primesieve on their systems, and post results. I'm going to try and see if I can get (expert) programmers in other language communities (C++, Crystal, D, Rust, etc) to do it in them and compare the results. I initially just want to see what their coded versions look like (idiomatically), or even if possible (can they do parallel programming, gc, etc).

Below are updated times for the current version, produced on my System76 laptop, Intel I7 6700HQ cpu, 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. I'm really interested to see how it performs under different threading systems.


Input Number |   twinprimes   | twinprimes_ssoz |  primesieve
----------------------------------------------------------------
1e10         |     27,412,679 |       0.458     |      0.572
5e10         |    118,903,682 |       2.501     |      3.112
1e11         |    224,376,048 |       5.110     |      6.475
5e11         |    986,222,314 |      28.583     |     35.675
1e12         |  1,870,585,220 |      61.236     |     75.638
2e12         |  3,552,770,943 |     135.701     |    162.975
3e12         |  5,173,760,785 |     223.803     |    252.175
4e12         |  6,756,832,076 |     316.653     |    354.066
5e12         |  8,312,493,003 |     400.375     |    448.066
6e12         |  9,846,842,484 |     497.222     |    547.792
7e12         | 11,363,874,338 |     595.554     |    642.447
8e12         | 12,866,256,870 |     698.129     |    749.884
9e12         | 14,356,002,120 |     809.974     |    843.042
1e13         | 15,834,664,872 |     903.161     |    953.423

jzakiya (orginal) [2018-04-06T20:54:14+02:00] view original

UPDATE, mostly more tweaking of proc twins_sieve. Tried different loop structures|idioms to see what would be faster (for vs while loops mostly, etc). Made some cosmetic, line order changes. Code version 2108/04/05 more compact, a little more cleaner, with little more clearer comments than version 2018/04/03.

https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e

Biggest change was compiling source in VB (Virtual Box) image of base OS distros which has gcc 7.3.0. Compiled source in it to create binary, then ran binary on base hardware (I7, 8 threads, 16GB mem) to a|b compare with previous version compiled using gcc 4.9.2. With gcc 4.9.2 the binary is 264,158 bytes, with gcc 7.3.0 it's 255,720 bytes. Not only was binary smaller, performance with gcc 7.3.0 was discernibly faster. Times are given below for tests done on quiet system.


      
    
      
        
          

            
              jzakiya
             

            
              (orginal)
             

            [2018-04-07T18:39:45+02:00] 

            
              view original
            
          

          UPDATE, caught, and corrected, a subtle coding error that affected the total twinprimes count being possibly off by 1 (and wrong last twinprime value) when the Kmax residues groups for an input is (very rarely) an exact multiple of the segment size. If anybody is running code please use updated version 2018/04/07.

https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e

I was hoping a few people would be curious enough to run the code and post (or send me via email) their results on their systems. If you are willing to do so, please list hardware and OS|gcc specs (cpu system, clocks, threads, mem|OS, gcc version, distro version).

The next biggest design milestone would be to implement the algorithm using GPUs (graphics processor units). I guess the easiest way to do that is to use something like CUDA, which I don't know if Nim supports.

        
      
      
    
      
        
          

            
              miran
             

            
              (orginal)
             

            [2018-04-07T20:35:41+02:00] 

            
              view original
            
          

          
 I was hoping a few people would be curious enough to run the code and post (or send me via email) their results on their systems
Is there a way to easily automate this?

I have tried to call it with ./twinprimes_ssoz 1000000, but I still have to enter manually the wanted number in the next step. Maybe in some next version you might consider parsing the argument, and only if there is none to ask to give it a number like it is now? 

        
      
      
    
      
        
          

            
              SolitudeSF
             

            
              (orginal)
             

            [2018-04-08T22:01:41+02:00] 

            
              view original
            
          

          miran, you can just do echo 1000000 | ./twinprimes_ssoz

        
      
      
    
      
        
          

            
              mratsim
             

            
              (orginal)
             

            [2018-04-11T18:29:55+02:00] 

            
              view original
            
          

          There is no problem using fortran from Nim and many fortran libraries have a C header for C compat that will also deal with converting from C 0-indexing to Fortran 1-indexing.

Nim can use Cuda and OpenCL, there are some hurdles but in the end it's really nice to use once the low-level stuff are abstracted away.

Also in my (data science) experience, no one wants to touch Julia for numerical computing, R people stay with R and C++, Python people would rather use Python, C, Numba, Cython, or whatever JIT or multiprocessing lib they can try than use Julia. The Julia ecosystem for data science is lacking (better than Nim theoretically but an uphill battle in usage), the syntax is more noisy than Python (begin, end), the speed can be workaround by using wrapped C/Fortran libraries and indexing start as 1.

For me Julia is not after Python nor a Python killer for numerical computing but more trying to seduce the Matlab people.

        
      
      
    
      
        
          

            
              miran
             

            
              (orginal)
             

            [2018-04-11T18:54:23+02:00] 

            
              view original
            
          

          
 no one wants to touch Julia for numerical computing
[citation needed]

        
      
      
    
      
        
          

            
              mratsim
             

            
              (orginal)
             

            [2018-04-11T23:29:37+02:00] 

            
              view original
            
          

          Julia has been broken on Kaggle, the largest data science community, for 1.5 years and very few ask for it: https://www.kaggle.com/product-feedback/25044. Also I mention data science, not say fluid dynamics. Don't take it out of context ;).

        
      
      
    
      
        
          

            
              Udiknedormin
             

            
              (orginal)
             

            [2018-04-12T16:14:05+02:00] 

            
              view original
            
          

          @Araq

As far as I know, there is no easy way to do that other than "pretty much the same like in C", which means linking complications and representation problems, for instance: Fortran's logical is NOT the same as bool in C ( _Bool, but bool ;) ). And it's not nearly as easy as with C++ --- I don't think you can import a Fortran class easily, can you? Can you easily use Fortran >=90 array functions without writing a Fortran-to-C wrapper in Fortran first? I don't think so too.

Of course maybe you could write a lib for that, I'd say it sounds entirely possible. But I haven't seen a lib like that yet.

@mratsim

People don't use Julia for similar reasons they don't use Nim --- it's too unstable, too buggy and there aren't many good tutorials, docs etc. Or libs, actually, as Julia lacks many libs too (less than Nim though, as far as I know). Still, as you've already mentioned, it depends on the field. Two of my lecturers consider trying Nim, for instance. :D Today one more agreed for me to write my mid-semester project in Nim, as an experiment.

        
      
      
    
      
        
          

            
              metaden
             

            
              (orginal)
             

            [2018-05-05T04:59:35+02:00] 

            
              view original
            
          

          Couldn't render post #23698.

        
      
      
    
      
        
          

            
              jzakiya
             

            
              (orginal)
             

            [2018-05-24T20:40:47+02:00] 

            
              view original
            
          

          In honor of the new Nim forum update :-)  I finally decided to post an update to my code.  I had been coding different versions using different methods, to minimize memory use, max speed, etc.  This version is good compromise, lower memory usage than previous version with no discernible speed loss.

An additional benefit of this version's architecture is that all the memory allocation necessary to process a twinpair now is created|recovered in each thread in the method twins_sieve.  I can even abstract out needing to create a seg array for each thread by using sets to minimize mem usage, at the expense of speed.  Also because all the sieving mem fits in a thread, it can now possibly fit inside a GPU thread, making it possibly able to fly in a CUDA implementation.

Lastly, if Nim had a really fast bit vector (?) implementation replacing the segment byte array with it would probably speed things up (and reduce mem too ?).

https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e

        
      
      
    
      
        
          

            
              dataman
             

            
              (orginal)
             

            [2018-05-24T22:53:31+02:00] 

            
              view original
            
          

          > if Nim had a really fast bit vector (?) implementation

Will be implemented. Soon™.

        
      
      
    
      
        
          

            
              jzakiya
             

            
              (orginal)
             

            [2018-05-25T00:27:17+02:00] 

            
              view original
            
          

          
 Will be implemented. Soon™.
That would be most excellent!

        
      
      
    
      
        
          

            
              dataman
             

            
              (orginal)
             

            [2018-05-25T03:56:56+02:00] 

            
              view original
            
          

          @jzakiya3h

While you can use the bitsets.nim module from the compiler's folder.

Mirror of forum.nim-lang.org

3696 :: Twinprimes generator that showcases Nim