I am not sure how many times my large arrays are being duplicated or copied by value, but looking at the generated C code, there's a lot of `genericSeqAssign` happening. I want to read some large price files from disk and process them without unnecessary duplication. Here's a contrived example to show what I'm trying to do:
```nim
# my price arrays are thousands of float elements, simulated here with 10 floats
var pricesA: seq[float] = @[] # google prices, say
var pricesB: seq[float] = @[] # apple
var pricesC: seq[float] = @[] # yahoo
for i in 1..10: # simulate reading prices from a file
  pricesA.add(float(i))
  pricesB.add(float(i))
  pricesC.add(float(i))
# I want to work with the last 1000 prices - simulated by tailing 5 of the floats
let a = pricesA[^5..^1] # (see 1 below)
let b = pricesB[^5..^1]
let c = pricesC[^5..^1]
let prm = @[a, b, c] # group the prices I want to work with into one parameter (see 2 below)

proc doit(t: seq[seq[float]]) =
  # process two of the arrays (could be any - use a convenience variable)
  let curr1 = t[0] # (see 3 below)
  let curr2 = t[2]
  # code to process curr1 vs curr2........
  # etc..

doit(prm)
```
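There seems to be array copying/duplication at:

1. the tail slices `a`, `b` and `c`;
2. grouping the slices into `prm`;
3. the convenience variables `curr1` and `curr2`.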
(I'm not 100% sure about this duplication, but when I used `var`s instead of `let`s and displayed the addresses of `array[0]`, they were all different; maybe this `let` version reduces the duplication.) My basic question is: is there a tried and trusted way to ensure large arrays are not copied? I can probably live with the slice/tail duplicates because they only happen once, but duplication when grouping parameters or when creating a second variable for an array will be inefficient for what I'm doing.
Note: I'm aware of `ref` and `ptr` and the `[]` dereference operator, but trying to use them seemed to be getting as ugly as C code. I come from Python and have used D; those languages MAKE you duplicate arrays explicitly, e.g. `arr2 = arr1.dup` in D or `arr2 = numpy.copy(arr1)` in Python, otherwise everything is a reference to the original array. Am I looking at the wrong language in Nim, or are there some simple rules to follow? Thanks for any help.
Sequences (and strings) have copy semantics for assignment, unless you use shallowCopy.
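A minimal sketch of what copy semantics means in practice (the variable names are illustrative only): after the assignment each sequence owns its own buffer, so changing one leaves the other untouched. This is the opposite of the Python/D behaviour described in the question.

```nim
var a = @[1.0, 2.0, 3.0]
var b = a          # assignment copies the whole sequence
b[0] = 99.0        # modifies only `b`
echo a[0]          # `a` still holds its original first element
echo b[0]
```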
Slices are not yet optimized to not copy:
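A small sketch of the slice copy, plus one possible way around it. The `^5..^1` slice from the question allocates a new `seq`, so later changes to the source are not visible through it. `toOpenArray` from `system` (available in reasonably recent Nim versions; check yours) instead hands a view of part of a sequence to a proc that takes an `openArray` parameter, without copying. The proc `mean` is a hypothetical stand-in for real processing.

```nim
proc mean(xs: openArray[float]): float =
  ## Hypothetical processing step: average of the values.
  ## `openArray` accepts seqs, arrays and `toOpenArray` views alike.
  for x in xs:
    result += x
  result /= float(xs.len)

var prices: seq[float] = @[]
for i in 1..10:
  prices.add(float(i))

let tail = prices[^5..^1]   # slice: copies 6.0 .. 10.0 into a new seq
prices[^1] = 100.0          # change the source afterwards
echo tail[^1]               # the copy still holds the old last value

# A view of the last five elements instead of a copy; it sees the change.
echo mean(toOpenArray(prices, prices.len - 5, prices.high))
```

As far as I know, `toOpenArray` can only appear directly as an argument to a call, so it suits "pass a tail to a proc" rather than "store a tail in a variable".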
If you pass the arrays to `doit` as an openarray of `seq[float]`s instead of building `prm`, or just pass them as ordinary `seq[float]` parameters, it shouldn't need to do a copy. (And that eliminates the problem of the convenience variables, too :))
```nim
# i.e., this
proc doit(a, b, c: seq[float]) =
  discard # process a, b and c directly

# or this, referring to each sequence as t[i]
proc doit(t: varargs[seq[float]]) =
  discard # process t[0], t[2], ...
```
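A runnable sketch of the separate-parameter style applied to the question's data. As an assumption of this sketch, the parameters are `openArray[float]` rather than `seq[float]`: a `seq` argument is passed to an `openArray` parameter without copying, and the same proc then also accepts `toOpenArray` views, so the tails need not be sliced at all. The name `processPair` and what it computes are hypothetical.

```nim
proc processPair(curr1, curr2: openArray[float]): float =
  ## Hypothetical processing: sum of element-wise differences curr1 - curr2.
  ## The parameters are read-only views of the caller's data.
  assert curr1.len == curr2.len
  for i in 0 ..< curr1.len:
    result += curr1[i] - curr2[i]

var pricesA, pricesC: seq[float]
for i in 1..10:            # simulate reading prices from a file
  pricesA.add(float(i))
  pricesC.add(float(i) / 2)

# Whole series: the sequences are named directly, no convenience variables
echo processPair(pricesA, pricesC)
# Only the last five prices, via views instead of slices
echo processPair(toOpenArray(pricesA, pricesA.len - 5, pricesA.high),
                 toOpenArray(pricesC, pricesC.len - 5, pricesC.high))
```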
Side note: `let` and `const` (IIRC) are optimized not to copy, since a variable bound by `let` or `const` cannot be modified once it is assigned.
> Note. I'm aware of 'ref' and 'ptr' and deref '[]' access types, but trying to do that seemed to be getting as ugly as C code.
Then you're doing it wrong. ;-) You can also mark your seqs as shallow at creation and then only pointers are copied.
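As a sketch of the `ref` route without C-style dereferencing everywhere: wrap the `seq` in a `ref object` and give it a few small overloads. Grouping such values or assigning them to convenience variables then copies only the reference, never the price data. The `Prices` type, `newPrices` and the overloads are hypothetical helpers, not part of any library.

```nim
type
  Prices = ref object
    data: seq[float]        # the only copy of the numbers

proc newPrices(n: int): Prices =
  ## Hypothetical loader: simulates reading `n` prices from a file.
  result = Prices(data: newSeqOfCap[float](n))
  for i in 1..n:
    result.data.add(float(i))

# Small overloads so a Prices value can be indexed like a seq
proc `[]`(p: Prices, i: int): float = p.data[i]
proc `[]=`(p: Prices, i: int, v: float) = p.data[i] = v
proc len(p: Prices): int = p.data.len

let pricesA = newPrices(10)
let pricesC = newPrices(10)

let prm = @[pricesA, pricesC]  # a seq of references; the data is not duplicated
let curr1 = prm[0]             # convenience variable: just another reference
curr1[0] = 42.0                # writes through to pricesA's data
echo pricesA[0]
echo(curr1 == pricesA)         # `==` on refs compares identity
```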