nimforum mirror - R-style logical vector operations in Nim?

Nimrookie [2018-10-31T16:53:52+01:00]

Hi, does anyone know if Nim provides a handy function like 'ifelse' in R, which lets you apply conditional calculations to vectors? I didn't find one in the documentation. I would like to do something like

a <- c(1,2,3,4)
b <- c(7,2,7,4)
c <- ifelse(a==b, a*b, a+b)

Thanks in advance!

moigagoo [2018-10-31T17:18:03+01:00]

You could use a list comprehension to create a sequence from another sequence based on a condition. To merge sequences on conditions look at procs and iterators in sequtils.

juancarlospaco [2018-10-31T18:52:43+01:00]

Templates?

mratsim [2018-10-31T19:56:56+01:00]

Loop-fusion or zero-functional

miran [2018-11-01T07:03:27+01:00]

handy function like 'ifelse' in R

Yes, you can do let c = if a == b: a*b else: a+b

If you want to operate on vectors/arrays like that, you might need to define your + and * functions.
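One way to get the element-wise behaviour without defining vector operators is a plain loop that applies the scalar `if` expression per element. The following is only a minimal sketch: the helper name `mulOrAdd` is made up for illustration and is not part of Nim's standard library, and it assumes both inputs have the same length.

```nim
# Hypothetical helper (the name is an assumption): element-wise version of
# R's ifelse(a == b, a * b, a + b) for two equally long inputs.
proc mulOrAdd(a, b: openArray[int]): seq[int] =
  doAssert a.len == b.len, "inputs must have the same length"
  result = newSeq[int](a.len)   # allocate the output once
  for i in 0 ..< a.len:
    # The scalar `if` expression, applied per element
    result[i] = if a[i] == b[i]: a[i] * b[i] else: a[i] + b[i]

let a = @[1, 2, 3, 4]
let b = @[7, 2, 7, 4]
let c = mulOrAdd(a, b)
echo c  # @[8, 4, 10, 16]
```

Because the comparison, multiplication and addition all happen inside one loop, no intermediate vectors are created.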

Nimrookie [2018-11-01T08:51:16+01:00]

Thanks to all for your hints. Performance is crucial for my task, and with R the biggest progress was achieved by vectorizing functions. The difference between sequential and parallel calculation was stunning. For a function with 50 conditional assignments it took about 0.1s per call and scaled linearly. Putting the input of 100 calls into vectors and then doing all calculations with one call took 0.3s, and expanding the length of the vectors to 1000 took 0.4s. Maybe this is not the case with Nim.

mratsim [2018-11-04T09:13:00+01:00]

Vectorisation in R (or Python) means doing the operation in compiled C or C++ code instead of interpreted R code.

Nim doesn't have this optimisation issue as it's compiled and will be optimised like C or C++.

Nimrookie [2018-11-05T10:32:13+01:00]

Please also see this post from Stack Overflow, which suggests using tuples for the operands.

mratsim [2018-11-05T16:51:30+01:00]

The Nim equivalent of R (and C++) vectors is the sequence.

Tuples are fine if you work with a small, fixed number of elements, but otherwise use sequences or a dedicated data structure (either Neo's vectors or Arraymancer's tensors).
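As a rough illustration of the distinction, here is a small sketch using only built-in types (the variable names are arbitrary, and the Neo and Arraymancer types are deliberately left out):

```nim
# Tuple: a small, fixed set of fields known at compile time
let pair: tuple[x, y: int] = (x: 1, y: 7)

# Array: fixed length known at compile time
let fixed: array[4, int] = [1, 2, 3, 4]

# Seq: length chosen at run time, closest to an R or C++ vector
var dynamic = newSeq[int](fixed.len)
for i in 0 ..< dynamic.len:
  dynamic[i] = fixed[i] * 2

echo pair.x + pair.y   # 8
echo dynamic           # @[2, 4, 6, 8]
```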

Nimrookie [2018-11-05T23:19:28+01:00]

Thanks for the clarification. Regarding the ifelse task with 1000 elements: does anyone know how big the relative performance gain of array vs. sequence would be? Otherwise, I will benchmark this, but unfortunately this will take some time...

Stefan_Salewski [2018-11-06T09:46:14+01:00]

Otherwise, I will benchmark this, but unfortunately this will take some time

Yes, you really should do that. It may help you understand that your questions do not make much sense, and asking here and on Stack Overflow at the same time does not really help.

Mratsim gave you a very good answer already, and in case you do not know, he is one of the brightest Nim devs.

You may use the criterion package for your benchmark, see

https://forum.nim-lang.org/t/4142

Don't forget to compile your test in release mode with option -d:release

After testing yourself, you may post your concrete Nim code here; maybe some experts can give you more tips to improve performance. With some luck the compiler can apply SIMD instructions, which may boost performance. Or operations can be done in parallel.

mratsim [2018-11-06T12:15:33+01:00]

There is actually no need to benchmark.

You shouldn't put 1000 elements in an array: 1000 elements of int32 or float32 will take 4000 bytes, and 1000 elements of int64 or float64 will take 8000 bytes (int32 is 32 bits = 4 bytes).

Stack space is very limited, 1MB to 8MB in general. So with 2 arrays of 1000 elements you are taking roughly 0.1% to 1.5% of the stack space, which is also used to store local variables, return addresses, saved registers, ...
- Note that, unlike most languages, Nim will not copy stack variables by value when passing them to other functions if the copy would be too costly.
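The sizes above can be checked with `sizeof`. In this sketch the two stack sizes are assumptions chosen only to match the 1MB to 8MB range mentioned; the real limit depends on the platform and its settings.

```nim
# Sizes quoted above, checked with sizeof
doAssert sizeof(array[1000, int32]) == 4000
doAssert sizeof(array[1000, float64]) == 8000

# Assumed stack sizes for illustration only: 1 MiB and 8 MiB
const
  smallStack = 1024 * 1024
  largeStack = 8 * 1024 * 1024
  twoArrays = 2 * sizeof(array[1000, int64])   # 16000 bytes

# `/` on two ints yields a float in Nim
echo twoArrays / smallStack * 100   # percent of a 1 MiB stack
echo twoArrays / largeStack * 100   # percent of an 8 MiB stack
```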

The costly operations of a sequence compared to a stack-based array are:
- memory allocation, but you allocate only once
- appending with the add proc, because the sequence has to check the currently reserved memory and possibly reallocate.
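The two costs in the list above can be seen side by side in a short sketch. The size `n` and the variable names are arbitrary; `newSeq` and `newSeqOfCap` are the standard procs for allocating a seq with a given length or a given reserved capacity.

```nim
const n = 1000

# Growing with add: each call checks the capacity and may reallocate
var grown: seq[int] = @[]
for i in 0 ..< n:
  grown.add i * i

# Allocating once, then indexing: the length never changes inside the loop
var indexed = newSeq[int](n)
for i in 0 ..< n:
  indexed[i] = i * i

# Middle ground: reserve capacity up front, so add does not need to reallocate
var reserved = newSeqOfCap[int](n)
for i in 0 ..< n:
  reserved.add i * i

# All three end up with the same contents
doAssert grown == indexed and indexed == reserved
```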

If you only use indexing a[i] in sequences, as with an array, the costs are the same: optional bound-checking plus a pointer dereference. Bound-checking is not done on release builds, and the pointer dereference is automatically cached by the CPU if done multiple times in a tight loop.

If you are really, really worried about the total performance, allocate your elements in a sequence and then access them via a ptr UncheckedArray[T], which will work like a C pointer.
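A minimal sketch of that last suggestion follows. The variable names are arbitrary, and the important assumption is that the seq stays alive and is not resized while the pointer is in use, since a resize may move the buffer.

```nim
var data = newSeq[int](1000)

# View the seq's buffer as a raw, unchecked array (no bound checks at all).
# The pointer is only valid while `data` is alive and is not resized.
let p = cast[ptr UncheckedArray[int]](addr data[0])

for i in 0 ..< data.len:
  p[i] = i * 2        # writes straight into data's memory

echo data[10]  # 20
```

Since nothing checks the index here, an out-of-range `p[i]` corrupts memory silently, exactly as with a C pointer.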

In summary, if you want the best performance for your algorithm:

- use seqs
- avoid temporary sequence memory allocation if you can
- use the indexing scheme a[i] = value
- do not use the appending proc a.add value
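Put together, the advice in this summary could look roughly like the sketch below. The proc name `mulOrAddInto` and the buffer name `dst` are made up for illustration; the point is only that the output seq is allocated once and then reused across calls.

```nim
# Hypothetical helper: writes into a caller-provided buffer instead of
# returning a fresh seq, so repeated calls allocate nothing.
proc mulOrAddInto(dst: var openArray[int]; a, b: openArray[int]) =
  doAssert a.len == b.len and a.len == dst.len
  for i in 0 ..< a.len:
    dst[i] = if a[i] == b[i]: a[i] * b[i] else: a[i] + b[i]

let a = @[1, 2, 3, 4]
let b = @[7, 2, 7, 4]
var dst = newSeq[int](a.len)   # allocated once, outside the hot loop

for _ in 0 ..< 3:              # stands in for many repeated calls
  mulOrAddInto(dst, a, b)

echo dst  # @[8, 4, 10, 16]
```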

Now here is the benchmark. Note that you might run out of stack space for the arrays:

# ##########################################
# Benchmarking tools
import random, times, stats, strformat, math, sequtils

proc warmup() =
  # Warmup - make sure cpu is on max perf
  let start = cpuTime()
  var foo = 123
  for i in 0 ..< 300_000_000:
    foo += i*i mod 456
    foo = foo mod 789
  
  # Compiler shouldn't optimize away the results, as cpuTime relies on side effects
  let stop = cpuTime()
  echo &"Warmup: {stop - start:>4.4f} s, result {foo} (displayed to avoid compiler optimizing warmup away)"

template printStats(name: string, output: openarray) {.dirty.} =
  echo "\n" & name
  echo &"Collected {stats.n} samples in {global_stop - global_start:>4.3f} seconds"
  echo &"Average time: {stats.mean * 1000 :>4.3f} ms"
  echo &"Stddev  time: {stats.standardDeviationS * 1000 :>4.3f} ms"
  echo &"Min     time: {stats.min * 1000 :>4.3f} ms"
  echo &"Max     time: {stats.max * 1000 :>4.3f} ms"
  echo &"Theoretical perf: {a.len.float / (float(10^6) * stats.mean):>4.3f} MFLOP/s"
  echo "\nDisplay output[0] to make sure it's not optimized away"
  echo output[0] # Prevents compiler from optimizing stuff away

template bench(name: string, output: openarray, body: untyped) {.dirty.}=
  block: # Actual bench
    var stats: RunningStat
    let global_start = cpuTime()
    for _ in 0 ..< nb_samples:
      let start = cpuTime()
      body
      let stop = cpuTime()
      stats.push stop - start
    let global_stop = cpuTime()
    printStats(name, output)

# #############################################

proc benchArray(a, b: array[1000, int], nb_samples: int) =
  var output: array[1000, int]
  bench("Array ifelse", output):
    for i in 0 ..< a.len:
      if a[i] == b[i]:
        output[i] = a[i] * b[i]
      else:
        output[i] = a[i] + b[i]

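proc benchSeq(a, b: array[1000, int], nb_samples: int) =
  # Copy the inputs into seqs so that inputs and output all live on the heap
  let a_seq = @a
  let b_seq = @b
  var output = newSeq[int](1000)
  bench("Seq ifelse", output):
    for i in 0 ..< a_seq.len:
      # Index the seq copies, not the array parameters
      if a_seq[i] == b_seq[i]:
        output[i] = a_seq[i] * b_seq[i]
      else:
        output[i] = a_seq[i] + b_seq[i]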

# ###########################################

when defined(fast_math):
  {.passC:"-ffast-math".}

when defined(march_native):
  {.passC:"-march=native".}

when isMainModule:
  randomize(42) # For reproducibility
  warmup()
  block:
    var a, b: array[1000, int]
    for i in 0 ..< 1000:
      a[i] = rand(0..100)
      b[i] = rand(0..100)
    
    benchArray(a, b, nb_samples = 1_000_000)
    benchSeq(a, b, nb_samples = 1_000_000)

And the result on my machine:


Warmup: 1.3207 s, result 224 (displayed to avoid compiler optimizing warmup away)

Array ifelse
Collected 1000000 samples in 2.685 seconds
Average time: 0.002 ms
Stddev  time: 0.001 ms
Min     time: 0.001 ms
Max     time: 0.087 ms
Theoretical perf: 565.750 MFLOP/s

Display output[0] to make sure it's not optimized away
18

Seq ifelse
Collected 1000000 samples in 2.611 seconds
Average time: 0.002 ms
Stddev  time: 0.001 ms
Min     time: 0.001 ms
Max     time: 0.219 ms
Theoretical perf: 589.900 MFLOP/s

Display output[0] to make sure it's not optimized away
18

Note that MFLOP/s isn't technically accurate: first, we are using integers, and second, it counts the following as a single operation, while there is actually a comparison plus a mul or add.

if a[i] == b[i]:
  output[i] = a[i] * b[i]
else:
  output[i] = a[i] + b[i]
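To make the caveat concrete, the reported figure can be traced back through the formula used in printStats (elements per call divided by 10^6 times the mean time). This is only back-of-the-envelope arithmetic on the array result printed above; the factor of two operations per element is the rough count from the note, not a measured value.

```nim
let elements = 1000.0          # a.len in the benchmark
let reported = 565.750         # "Theoretical perf" printed for the array run

# Invert printStats' formula: reported = elements / (1e6 * mean)
let meanSeconds = elements / (1e6 * reported)
echo meanSeconds * 1e6, " microseconds per call"

# Counting comparison + mul/add as two operations per element
let opsPerElement = 2.0
echo reported * opsPerElement, " million integer ops/s (rough estimate)"
```

The recovered mean of just under 2 microseconds is consistent with the "Average time: 0.002 ms" line in the output.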

Mirror of forum.nim-lang.org

4343 :: R-style logical vector operations in Nim?