Since I usually use scripting languages like Python and don't know static typed languages well, this suggestion may be a strange, premature or completely wrong optimization. Please kindly let me know if it is wrong.
As mentioned in docstring, some seq procs (system.delete(), system.insert(), sequtils.delete() and sequtils.insert()) is O(n) operation. Bigger seq requires more shallowCopy() especially on leftward index operation.
I sometimes use large sequence directly or indirectly. In my opinion, seq is quite a basic type and used for many situations and some algorithms (and some benchmarks?). So, to improve its performance, I think memmove in libc helps it. Here is rough alternative delete() code which seems to work and get significant performance improvement especially on large seq.
when defined(gogc):
const GenericSeqSize = (3 * sizeof(int))
else:
const GenericSeqSize = (2 * sizeof(int))
proc delete*[T] (s: var seq[T], i: int) =
# Current system.delete(s: var seq[T], i: Natural) accepts only
# Natural. But I thought it would be better to accept negative int
# also, for convenience.
delete(s, i, i)
proc delete*[T](s: var seq[T], first, last: int) =
# This is similar to `sequtils.delete`.
let
sHigh = s.high
var
ix, iy: int
if first > sHigh:
raise newException(IndexError, "index out of bounds")
elif first < 0:
ix = sHigh + first + 1
if ix < 0:
raise newException(IndexError, "index out of bounds")
else:
ix = first
if last > sHigh:
raise newException(IndexError, "index out of bounds")
elif last < 0:
iy = sHigh + last + 1
if iy < 0:
raise newException(IndexError, "index out of bounds")
else:
iy = last
if ix > iy:
return
# I'm not sure this is correct way to check if the elements are
# required to be GC unrefed.
when T is ref or T is seq or T is string:
# Do GC_unref() against only elements to be removed.
for i in ix .. iy:
GC_unref(s[i])
let
c = iy - ix + 1
newLen = sHigh + 1 - c
elemSize = sizeof(T)
eAddr = cast[ByteAddress](s) + GenericSeqSize
# If only the last element(s) is asked to be deleted, no memmove is required.
if iy != sHigh:
moveMem(
cast[pointer](eAddr + (elemSize * ix)),
cast[pointer](eAddr + (elemSize * (iy + 1))),
elemSize * (sHigh - iy),
)
# To avoid decRef, don't use `setLen()`.
# So, zeroMem() and change length value.
zeroMem(
cast[pointer](eAddr + (elemSize * newLen)),
elemSize * c,
)
cast[ptr int](eAddr - GenericSeqSize)[] = newLen
Questions:
isGCed type trait sounds good! Especially for library/system developers who want efficient program.
Once we have this trait, the stdlib can use memmove where efficient.
Exactly. In stdlib, there are many = (assignments) and shallowCopy in for-loops (for example, &, [], []= operators for seq/array/string).
I guess, "Call memmove once and GC_ref/unref against the elements (if the type is isGCed)" is more efficient than "assignment or shallowCopy for each element in loop".
Actually, the above code is 10x or more faster than system.delete on large seq. And in some cases, "Call memcpy once" may be used, which would result in more performance gain.
Thanks Jehan.
when T is ref or T is seq or T is string:
for i in ix .. iy:
GC_unref(s[i])
is wrong. To be safe, it must be:
for i in ix .. iy: reset(s[i])
The new code can not check if its elements are required to be reset or not in advance. Since every elements must be reset, the advantage of memmove disappears...
@Jehan: you are right. To test it roughly, I compiled the following loop by -d:release and run it with time command.
var
i = 10000
s: seq[int]
newSeq(s, i)
while i > 0:
delete(s, i div 2)
# system.delete(s, i div 2)
dec(i)
in case of system.delete:
$ time test
real 0m0.021s
user 0m0.019s
sys 0m0.001s
in case of the custom delete (with reset)
$ time test
real 0m0.008s
user 0m0.007s
sys 0m0.001s
@Varriount: Sorry I want to clarify it. You mean seq[string] ?
Like this?
# I'm not sure this is is OK or not.
const ArrayHeaderSize = 2 * sizeof(int)
proc `[]=`*(s: var string, x: Slice[int], b: string) =
# Original system.`[]=`
#var a = x.a
#var L = x.b - a + 1
#if L == b.len:
# for i in 0 .. <L: s[i+a] = b[i]
#else:
# spliceImpl(s, a, L, b)
let
ix = x.a
iy = x.b
sLen = s.len
bLen = b.len
diffLen = bLen - (iy - ix + 1)
sAddr = cast[ByteAddress](s) + ArrayHeaderSize
if diffLen > 0:
setLen(s, sLen + diffLen)
moveMem(
cast[pointer](sAddr + iy + 1 + diffLen),
cast[pointer](sAddr + iy + 1),
sLen - iy - 1
)
copyMem(
cast[pointer](sAddr + ix),
cast[pointer](cast[ByteAddress](b) + ArrayHeaderSize),
bLen,
)
if diffLen < 0:
moveMem(
cast[pointer](sAddr + ix + bLen),
cast[pointer](sAddr + iy + 1),
sLen - iy - 1,
)
setLen(s, sLen + diffLen)
To test the []= proc roughly, I compiled the following loop by -d:release and run it with time shell command.
var
s:string = ""
sLen = 10000
s.setLen(sLen)
for i in 0 .. (sLen - 2):
s[0 .. 1] = "0"
# system.`[]=`(s, 0 .. 1, "0")
in case of system.[]=:
$ time test
real 0m0.060s
user 0m0.059s
sys 0m0.001s
in case of the custom []=
$ time test
real 0m0.003s
user 0m0.002s
sys 0m0.001s
Thanks Varriount. I wasn’t aware addr myString[0]!
To summarize:
So, I will post this suggestion as an issue on the github repo.