Hi,
I started a small library which tries to implement D-style array slices (without copying the data - it just keeps a reference and the length). Initial code is here: https://github.com/AdrianV/nimrod-tools - comments and criticism are welcome.
Here is the basic idea:
var test = @[1,2,3,4,5,6,7,8,9].all()
var stest = test.slice(2..4) # [3,4,5] slice of test - nothing copied
var t2 = test[1..6] # [2,3,4,5,6,7] - nothing copied
var t3 = t2[0..1000] # [2,3,4,5,6,7,8,9] slice of t2 - accesses the underlying test, out-of-range bounds are clamped - nothing copied
test[3] = 12 # [1,2,3,12,5,6,7,8,9]
# stest = [3,12,5]
stest[1] = 5 # [3,5,5]
# t2 is still [2,3,12,5,6,7]
I only skimmed your code and don't know if it's safe. ;-)
I'd use some ptr array[0..1_000_000, T] instead of the raw pointer for convenience. Well, I'd use the array if system.shallowCopy for sequences didn't exist...
Your way is still too complicated: ;-)
# since it has reference semantics, I choose 'P'
type
  PSeqSlice[T] {.final, pure, shallow.} = object
    s: seq[T]
    a, b: int

proc slice*[T](fseq: seq[T], x: TSlice[int]): PSeqSlice[T] =
  # create a slice from a seq[T]; we need to copy here as it could be
  # a constant sequence not allocated on the heap:
  result.s = fseq
  result.a = max(0, x.a)
  result.b = min(fseq.high, x.b)

proc `[]`*[T](s: PSeqSlice[T], i: int): var T =
  assert s.a + i <=% s.b  # unsigned compare: also catches a negative i
  result = s.s[s.a + i]
Well, you get the idea... ;-)
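For illustration, here is a self-contained version of the sketch above with a tiny usage example. This is a hypothetical adaptation, not the original code: it uses today's spelling (Slice[int] instead of TSlice[int], no type prefix letters), and note that the {.shallow.} pragma only matters on the refc GC - under ARC/ORC the backing seq is actually copied per slice.

```nim
type
  SeqSlice[T] = object
    s: seq[T]      # backing sequence (shallow-copied on the old refc GC)
    a, b: int      # inclusive bounds into s

proc slice[T](fseq: seq[T], x: Slice[int]): SeqSlice[T] =
  # clamp the requested range to the sequence's bounds
  result.s = fseq
  result.a = max(0, x.a)
  result.b = min(fseq.high, x.b)

proc `[]`[T](s: SeqSlice[T], i: int): T =
  assert s.a + i <= s.b
  s.s[s.a + i]

proc len[T](s: SeqSlice[T]): int = s.b - s.a + 1

let data = @[1, 2, 3, 4, 5, 6, 7, 8, 9]
let sl = data.slice(2..4)
assert sl.len == 3
assert sl[0] == 3 and sl[2] == 5
```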
Yes, that's simpler and close to the solution I first had in mind ;-), but then I wanted access performance equal to normal arrays, without extra costs, therefore I cache the base pointer. But I didn't create a benchmark yet to see if this extra bookkeeping pays off.
edit: did some basic benchmarks against the standard slices. My implementation shows some benefits when the slice length gets longer (which is no surprise).
What I don't fully understand yet is the benefit of the shallow pragma in your example.
edit: found out - the slices themselves are shallow-copied
Anyway, Nimrod's flexibility is fascinating. If you want D-like slices, you can create them quite easily, but you are not forced to use them (and they can be really confusing if you are not used to them).
Well, you get the idea...
Yes ;-)
Now I did some benchmarks. There is no measurable difference between your solution and mine, so I will go with the cleaner one (your suggestion).
I did the following test: fill an array of bytes with 10_000_000 random values, go over the array and create ~1_000_000 slices of length ~10 (simulating a parse), then go over the created slices and sum up all the values inside:
average slice length 10:
  create 1_000_000 slices:
    standard slices from system:       270 ms
    seq slices (both implementations): 140 ms, dmd: 110 ms
Non-copying slicing pays off by a factor of 2. Now the interesting part:
  calc sum over all slices:
    standard slices from system:       370 ms
    seq slices (both implementations): 130 ms, dmd: 60 ms
Slices also seem faster to process, because they are more cache efficient. When you increase the slice length and reduce the number of slices, the effect is even more significant.
average slice length 100:
  create ~100_000 slices:
    standard slices from system:       122 ms
    seq slices (both implementations): 91 ms, dmd: 85 ms
  calc sum over all slices:
    standard slices from system:       250 ms
    seq slices (both implementations): 15 ms, dmd: 50 ms
So the benefit is not in creation (as I had expected), but in processing - more than a factor of 15 in this case. Something I would never have expected.
edit: values for dmd added
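The "calc sum over all slices" step can be sketched roughly like this (an assumed shape, not the actual benchmark code): each slice is just a pair of bounds into the one big buffer, so summing walks that buffer in place, mostly sequentially - which is what makes it cache friendly compared to materialized copies.

```nim
# Rough sketch of the summation step (hypothetical; sumSlices and the
# bounds representation are illustrative, not the author's code).
proc sumSlices(data: seq[byte], bounds: seq[tuple[a, b: int]]): int =
  for (a, b) in bounds:
    for i in a .. b:       # walk the underlying buffer in place
      result += int(data[i])

let buf = @[byte 1, 2, 3, 4, 5, 6]
assert sumSlices(buf, @[(0, 2), (3, 5)]) == 21  # (1+2+3) + (4+5+6)
```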
I haven't looked at the produced C code nor did any measurements, but in general the compiler is still missing an important type specialization optimization for assignments. This is not far away from being implemented and may lead to improvements of a factor of 2-3 for assignment heavy code for types like PSeqSlice. Perhaps we beat DMD then on more of the tests...
Btw, I am not particularly interested in efficient slices:
I found out about Nimrod today (it sure looks great) and this is one of the first things I checked for.
Just wanted to say that for certain types of programs slices really help. Many linear algebra, machine learning, statistical and other number-crunching algorithms can be expressed more succinctly using slice syntax. Since the algorithms themselves are often written in terms of slices or other sub-arrays of the original, the translation from an algorithmic description to code is less error-prone.
This is one of the reasons for NumPy's popularity. Its semantics make a lot of copying unavoidable, so one has to drop down to C, C++ or Cython.
This is just in case it kindles some interest in efficient slices.
I will dig more into the language as time permits.
P.S. Is there a way to receive new posts/follow-ups in one's inbox?
I think Nimrod's term rewriting macros will help these algorithms more than slices will but we'll see what reality has to say about that. ;-)
PS: Sorry, not yet.