nimforum mirror - [Performance improvement] seq remove() and insert()

takaomag [2015-07-24T22:10:18+02:00]

Since I usually use scripting languages like Python and don't know statically typed languages well, this suggestion may be a strange, premature, or completely wrong optimization. Please kindly let me know if it is wrong.

As the docstrings mention, some seq procs (system.delete(), system.insert(), sequtils.delete() and sequtils.insert()) are O(n) operations. A bigger seq requires more shallowCopy() calls, especially when the operation targets an index near the start of the seq.

I sometimes use large sequences, directly or indirectly. In my opinion, seq is quite a basic type that is used in many situations and in some algorithms (and some benchmarks?). So, to improve its performance, I think memmove from libc would help. Here is a rough alternative delete() that seems to work and gives a significant performance improvement, especially on a large seq.

when defined(gogc):
  const GenericSeqSize = (3 * sizeof(int))
else:
  const GenericSeqSize = (2 * sizeof(int))


proc delete*[T] (s: var seq[T], i: int) =
  # Current system.delete(s: var seq[T], i: Natural) accepts only
  # Natural. But I thought it would be better to accept negative int
  # also, for convenience.
  
  delete(s, i, i)


proc delete*[T](s: var seq[T], first, last: int) =
  # This is similar to `sequtils.delete`.
  
  let
    sHigh = s.high
  var
    ix, iy: int
  
  if first > sHigh:
    raise newException(IndexError, "index out of bounds")
  elif first < 0:
    ix = sHigh + first + 1
    if ix < 0:
      raise newException(IndexError, "index out of bounds")
  else:
    ix = first
  
  if last > sHigh:
    raise newException(IndexError, "index out of bounds")
  elif last < 0:
    iy = sHigh + last + 1
    if iy < 0:
      raise newException(IndexError, "index out of bounds")
  else:
    iy = last
  
  if ix > iy:
    return
  
  # I'm not sure this is correct way to check if the elements are
  # required to be GC unrefed.
  when T is ref or T is seq or T is string:
    # Do GC_unref() against only elements to be removed.
    for i in ix .. iy:
      GC_unref(s[i])
  
  let
    c = iy - ix + 1
    newLen = sHigh + 1 - c
    elemSize = sizeof(T)
    eAddr = cast[ByteAddress](s) + GenericSeqSize
  
  # If only the last element(s) is asked to be deleted, no memmove is required.
  if iy != sHigh:
    moveMem(
      cast[pointer](eAddr + (elemSize * ix)),
      cast[pointer](eAddr + (elemSize * (iy + 1))),
      elemSize * (sHigh - iy),
    )
  
  # To avoid decRef, don't use `setLen()`.
  # So, zeroMem() and change length value.
  zeroMem(
    cast[pointer](eAddr + (elemSize * newLen)),
    elemSize * c,
  )
  cast[ptr int](eAddr - GenericSeqSize)[] = newLen

Questions:

Is this correct and reasonable?

If so, I feel it should be a magic proc like mSetLengthSeq, because each backend may have its own, more efficient way (for example, JavaScript has the splice() method). At first I tried to implement it as a magic in order to create a PR, but I could not (ccgexprs.nim, ast.nim, jsgen.nim, semexprs.nim, and csources? The compiler is difficult for me). If this suggestion is reasonable, could someone please implement it?

If not, thanks for reading to the end! Please let me know what is wrong.

Araq [2015-07-25T19:25:20+02:00]

I dunno, looks good.

The real reason we don't do it already is that we have no isGCed type trait. Your check works, I guess, but doesn't cover every case. Once we have this trait, the stdlib can use memmove where efficient.

You're welcome.

takaomag [2015-07-25T21:08:01+02:00]

An isGCed type trait sounds good! Especially for library/system developers who want efficient programs.

Once we have this trait, the stdlib can use memmove where efficient.

Exactly. In the stdlib, there are many = (assignments) and shallowCopy calls in for-loops (for example, in the &, [], and []= operators for seq/array/string).

I guess "call memmove once and GC_ref/unref the elements (if the type is isGCed)" is more efficient than "assignment or shallowCopy for each element in a loop".

Actually, the above code is 10x or more faster than system.delete on a large seq. And in some cases a single memcpy call may be usable, which would give an even larger performance gain.

Jehan [2015-07-26T03:15:00+02:00]

I don't think this code works for sequences of tuple or object types that contain references, e.g.: seq[tuple[a: string, b: ref int]].

I also don't think the code works properly with GCs other than the ref-counting one. The GC_unrefed reference may be one that was GC_refed by another piece of code (e.g. to be kept alive during an FFI call) and then gets deallocated prematurely.

My recommendation would be to use reset(s[i]) instead, which should properly kill any references, whether top-level or embedded inside an object or tuple and doesn't depend on GC internals.
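A minimal sketch of that recommendation, using the element type from the counterexample above. The names Entry, r, e and s are only illustrative, and the assertions assume a Nim version in which a reset string is empty rather than nil:

```nim
type Entry = tuple[a: string, b: ref int]

var r: ref int
new(r)
r[] = 42

let e: Entry = (a: "hello", b: r)
var s: seq[Entry]
s.add(e)

# reset() restores the element to its default value. Both the string field
# and the embedded ref are cleared, without calling GC_unref() by hand.
reset(s[0])

doAssert s[0].a.len == 0
doAssert s[0].b == nil
```

Because reset() works on the whole element, the same call covers plain refs, strings, seqs, and objects or tuples that embed them.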

takaomag [2015-07-27T04:42:59+02:00]

Thanks Jehan.

when T is ref or T is seq or T is string:
    for i in ix .. iy:
      GC_unref(s[i])

is wrong. To be safe, it must be:

for i in ix .. iy: reset(s[i])

The new code cannot check in advance whether its elements need to be reset. Since every element must be reset, the advantage of memmove disappears...

Jehan [2015-07-27T05:38:53+02:00]

The advantage of memmove() should not disappear unless you delete a large number of elements relative to the number of elements being moved (in which case GC_unref() is also going to offset the speed advantage). You can also still elide reset() for types where you know that they don't contain references. These are at least the basic integral and float types.
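One way to elide reset() at compile time can be sketched as follows. This assumes a Nim version whose std/typetraits module provides supportsCopyMem, a trait added to the standard library well after this thread; it plays roughly the role of the isGCed trait discussed above. The helper name clearRange is hypothetical.

```nim
import std/typetraits

type Entry = tuple[a: string, b: ref int]

proc clearRange[T](s: var seq[T], first, last: int) =
  ## Hypothetical helper: resets s[first..last] only when T may contain
  ## managed data. For plain data the loop is compiled out entirely.
  when not supportsCopyMem(T):
    for i in first .. last:
      reset(s[i])

# Plain numeric types are safe to move as raw bytes ...
doAssert supportsCopyMem(int)
doAssert supportsCopyMem(float)
# ... while strings and tuples embedding references are not.
doAssert not supportsCopyMem(string)
doAssert not supportsCopyMem(Entry)
```

With a check like this, a seq[int] pays nothing for the reset step, while a seq[Entry] still has its references cleared before the memory is overwritten.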

Varriount [2015-07-27T07:58:22+02:00]

Hm. Would this improvement be applicable if modified for strings? Strings are just character arrays, so there wouldn't be any GC'd references to manage.

takaomag [2015-07-27T08:12:51+02:00]

@Jehan: you are right. To test it roughly, I compiled the following loop with -d:release and ran it with the time command.

var
  i = 10000
  s: seq[int]

newSeq(s, i)
while i > 0:
  delete(s, i div 2)
  # system.delete(s, i div 2)
  dec(i)

With system.delete:


$ time test

real	0m0.021s
user	0m0.019s
sys	0m0.001s

With the custom delete (with reset):


$ time test

real	0m0.008s
user	0m0.007s
sys	0m0.001s

@Varriount: Sorry, I want to clarify. Do you mean seq[string]?

Varriount [2015-07-27T08:21:40+02:00]

No, just the string type.

takaomag [2015-07-27T10:49:18+02:00]

Like this?

# I'm not sure this is OK or not.
const ArrayHeaderSize = 2 * sizeof(int)

proc `[]=`*(s: var string, x: Slice[int], b: string) =
  # Original system.`[]=`
  #var a = x.a
  #var L = x.b - a + 1
  #if L == b.len:
  #  for i in 0 .. <L: s[i+a] = b[i]
  #else:
  #  spliceImpl(s, a, L, b)
  
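  let
    ix = x.a
    iy = x.b
    sLen = s.len
    bLen = b.len
    diffLen = bLen - (iy - ix + 1)
  var
    sAddr = cast[ByteAddress](s) + ArrayHeaderSize
  
  if diffLen > 0:
    setLen(s, sLen + diffLen)
    # setLen() may reallocate the buffer, so recompute the data address.
    sAddr = cast[ByteAddress](s) + ArrayHeaderSize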
    moveMem(
      cast[pointer](sAddr + iy + 1 + diffLen),
      cast[pointer](sAddr + iy + 1),
      sLen - iy - 1
    )
  
  copyMem(
    cast[pointer](sAddr + ix),
    cast[pointer](cast[ByteAddress](b) + ArrayHeaderSize),
    bLen,
  )
  if diffLen < 0:
    moveMem(
      cast[pointer](sAddr + ix + bLen),
      cast[pointer](sAddr + iy + 1),
      sLen - iy - 1,
    )
    setLen(s, sLen + diffLen)

takaomag [2015-07-27T12:12:13+02:00]

To test the []= proc roughly, I compiled the following loop with -d:release and ran it with the time shell command.

var
  s:string = ""
  sLen = 10000
s.setLen(sLen)
for i in 0 .. (sLen - 2):
  s[0 .. 1] = "0"
  # system.`[]=`(s, 0 .. 1, "0")

With system.[]=:


$ time test

real	0m0.060s
user	0m0.059s
sys	0m0.001s

With the custom []=:


$ time test

real	0m0.003s
user	0m0.002s
sys	0m0.001s

Varriount [2015-07-27T15:42:33+02:00]

Yes. You know that for strings and sequences, you can use addr myString[0] to get the address of the buffer, right? No need to fiddle around with header offsets.
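As a rough illustration of the addr approach, here is a range delete that never touches header offsets. The proc name deleteRange is hypothetical, and the sketch assumes 0 <= first <= last < s.len and an element type without managed fields (int, float, char and similar):

```nim
proc deleteRange[T](s: var seq[T], first, last: int) =
  ## Hypothetical helper: removes s[first..last] with a single moveMem.
  ## Only meant for element types that hold no references.
  let tail = s.len - (last + 1)   # number of elements after the removed range
  if tail > 0:
    # addr s[i] points straight at element i of the buffer.
    moveMem(addr s[first], addr s[last + 1], tail * sizeof(T))
  s.setLen(s.len - (last - first + 1))

var s = @[0, 1, 2, 3, 4, 5]
deleteRange(s, 1, 2)
doAssert s == @[0, 3, 4, 5]
```

Note that addr s[0] is only valid on a non-empty container, so code using it has to handle the empty case separately.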

takaomag [2015-07-27T18:00:46+02:00]

Thanks Varriount. I wasn’t aware of addr myString[0]!

To summarize:

My suggestion is that memmove and memcpy would speed up modification of a container (seq, string and array) when it has many elements and the current stdlib implementation uses O(n) = assignments or shallowCopy calls in a for-loop. I guess there would be no penalty even with only a few elements (but I haven't covered all cases).

But it may be premature optimization, since Nim is still under development (the core developers have so many issues!) and there is no benefit with only a few elements.

Araq seems to have an idea about GC (isGCed type trait).

I'm not sure whether this code should be magic procs (as I said, each backend may have its own efficient implementation). I cannot create a PR since I don't know the Nim compiler well, sorry.

So, I will post this suggestion as an issue on the GitHub repo.

Mirror of forum.nim-lang.org

1471 :: [Performance improvement] seq remove() and insert()