nimforum mirror - How to ensure large arrays not duplicated?

god [2015-12-25T18:56:07+01:00]

(apologies if duplicate post - original vanished and needed small edit anyway..)

I am not sure how many times my large arrays are being duplicated or copied by value, but looking at the generated C code, there's a lot of genericSeqAssign happening. I want to read some large price files from disk and process them without unnecessary duplication. Here's a contrived example to show what I'm trying to do:

# my price arrays are thousands of float elements, simulated here with 10 floats
var pricesA:seq[float] = @[]    # google prices, say
var pricesB:seq[float] = @[]    # apple
var pricesC:seq[float] = @[]    # yahoo

for i in 1..10:  # simulate reading prices from a file
  pricesA.add(float(i))
  pricesB.add(float(i))
  pricesC.add(float(i))

# I want to work with the last 1000 prices - simulated by tailing 5 of the floats
let a = pricesA[^5..^1]     # (see 1 below)
let b = pricesB[^5..^1]
let c = pricesC[^5..^1]

let prm = @[a,b,c]   # group the prices I want to work with into one parameter (see 2 below)

proc doit(t: seq[seq[float]]) =
  # process two of the arrays (could be any - use a convenience variable)
  let curr1 = t[0]   # (see 3 below)
  let curr2 = t[2]
  
  # code to process curr1 vs curr2........
  # etc..

doit(prm)

There seems to be array copying/duplication at:

1. slicing/tailing the arrays

2. creation of the parameter as one seq

3. using the convenience variables (curr1, curr2) in the proc doit

(I am not 100% sure about this duplication, but when I used 'var' instead of 'let' and displayed the address of element [0] of each array, the addresses were all different; maybe this 'let' version reduces the duplication.) My basic question is: is there a tried and trusted way to ensure large arrays are not copied? I can probably live with the slice/tail duplicates because they only happen once, but duplication when grouping params or creating a second variable for an array will be inefficient for what I'm doing.

Note. I'm aware of 'ref' and 'ptr' and deref '[]' access types, but trying to do that seemed to be getting as ugly as C code. I come from Python and have used D; those languages make you duplicate arrays explicitly, e.g. arr2 = arr1.dup (in D) or arr2 = numpy.copy(arr1) in Python, otherwise everything is a reference to the original array. Am I looking at the wrong language in Nim, or are there some simple rules to follow? Thanks for any help.

perturbation2 [2015-12-26T06:35:01+01:00]

Sequences (and strings) have copy semantics for assignment, unless you use shallowCopy.
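
A minimal sketch of what copy semantics means for a plain assignment between two var seqs. The variable names are made up for illustration, and the sketch only checks the observable behaviour, not how many bytes the compiler moves:

```nim
var original = @[1.0, 2.0, 3.0]
var duplicate = original   # plain assignment: duplicate behaves as an independent seq

duplicate[0] = 99.0        # mutate only the duplicate

echo original[0]           # the original still holds its own first element
echo duplicate[0]
```

If the two names referred to the same storage, as they would in Python or D, the write through duplicate would also show up in original.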

Slices are not yet optimized to not copy:

Manual (parallel block section): "Slices are optimized so that no copy is performed. This optimization is not yet performed for ordinary slices outside of a parallel section."
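
To illustrate the manual quote above, here is a small sketch showing that an ordinary slice yields an independent seq. The data mirrors the ten simulated prices from the first post, and the name tail is an assumption for the example:

```nim
var prices: seq[float] = @[]
for i in 1..10:
  prices.add(float(i))

let tail = prices[^5..^1]   # ordinary slice: builds a new 5-element seq

prices[^1] = -1.0           # change the source after slicing

echo tail                   # tail keeps the values it was created with
```

Since the slice is taken once per price series, this is the "happens only once" cost mentioned in the question.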

If you pass the arrays to doit as an openarray of seq[float]s, or just as normal seq[float] parameters, it shouldn't need to do a copy. (And that eliminates the problem of convenience variables, too :))

# i.e., this
proc doit(a,b,c: seq[float])
# or this, referring to each sequence as t[i]
proc doit(t: varargs[seq[float]])
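
A related sketch, not taken from the thread: later Nim versions provide toOpenArray in the system module, which passes a range of an existing seq to an openArray parameter as a view instead of building a new seq. The proc name tailMean is hypothetical, and this assumes a compiler recent enough to have toOpenArray:

```nim
proc tailMean(xs: openArray[float]): float =
  ## Average of whatever range the caller passes in.
  for x in xs:
    result += x
  if xs.len > 0:
    result = result / float(xs.len)

var prices: seq[float] = @[]
for i in 1..10:              # simulate reading prices from a file
  prices.add(float(i))

# View of the last 5 elements; toOpenArray is only valid as a call argument.
let m = tailMean(prices.toOpenArray(prices.len - 5, prices.high))
echo m
```

The trade-off is that such a view cannot be stored in a variable or grouped into another seq, so it fits procs that take each price series as its own parameter.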

Side note: let and const (IIRC) are optimized to not copy, since a variable bound with let or const cannot be modified once assigned.

Araq [2015-12-26T11:25:42+01:00]

"Note. I'm aware of 'ref' and 'ptr' and deref '[]' access types, but trying to do that seemed to be getting as ugly as C code."

Then you're doing it wrong. ;-) You can also mark your seqs as shallow at creation and then only pointers are copied.
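
One possible reading of the ref approach, as a hedged sketch rather than the method Araq had in mind: wrap each price series in a ref seq[float] so that grouping and convenience variables copy only references. The alias Prices and the procs loadPrices and firstDiff are hypothetical names introduced for this example:

```nim
type Prices = ref seq[float]    # hypothetical alias, not from the thread

proc loadPrices(n: int): Prices =
  ## Stand-in for reading n prices from a file.
  new(result)
  result[] = newSeq[float](n)
  for i in 0 ..< n:
    result[][i] = float(i + 1)

proc firstDiff(t: seq[Prices]): float =
  # curr1 and curr2 copy references only; the float data is shared.
  let curr1 = t[0]
  let curr2 = t[1]
  result = curr1[][0] - curr2[][0]

let pricesA = loadPrices(10)
let pricesC = loadPrices(10)
let prm = @[pricesA, pricesC]   # a seq of references, not of price data

pricesA[][0] = 42.0             # the change is visible through prm as well
echo firstDiff(prm)
```

The only extra syntax compared with the original example is the [] dereference before indexing.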

Angluca [2015-12-26T11:44:22+01:00]

How do you use shallow and the byref pragma?

god [2015-12-27T14:17:22+01:00]

Thanks guys for the feedback and tips. I think I'm getting the hang of this now.

Mirror of forum.nim-lang.org

1893 :: How to ensure large arrays not duplicated?