So... what I'm trying to do is pretty much this: we have a seq[byte] and given two consecutive byte values, we have to replace them with one.
Like:
A B C D E F G A B F (replace A B with W)
W C D E F G W F
I've been experimenting with different algorithms.
This is my original implementation:
proc substitute2to1r*(a: ByteArray, subA: OpCode, subB: OpCode, replacement: OpCode): ByteArray =
var i = 0
let aLen = a.len
while i < aLen-2+1:
let diffA = a[i] - (Byte)(subA)
if diffA >= 0 and diffA <= 29 and diffA == a[i+1] - (Byte)(subB):
result.add((Byte)(replacement) + diffA)
i.inc(2)
else:
result.add(a[i])
i.inc(1)
result.add(a[i..^1])
This is a better version of the above (the most performant of the 3 here):
proc substitute2to1r2*(a: ByteArray, subA: OpCode, subB: OpCode, replacement: OpCode): ByteArray =
var i = 0
var cntr = 0
let aLen = a.len
newSeq(result, aLen)
while i < aLen-2+1:
let diffA = a[i] - (Byte)(subA)
if diffA >= 0 and diffA <= 29 and diffA == a[i+1] - (Byte)(subB):
result[cntr] = (Byte)(replacement) + diffA
cntr.inc()
i.inc(2)
else:
result[cntr] = a[i]
cntr.inc()
i.inc(1)
for j,it in a[i..^1]:
result[cntr+j] = it
result.setLen(cntr + a[i..^1].len)
And this is an attempt with in-place modification:
proc sub2to1r2*(a: var ByteArray, subA: OpCode, subB: OpCode, replacement: OpCode) =
var i = 0
var cntr = 0
let aLen = a.len
while i < aLen-2+1:
let diffA = a[i] - (Byte)(subA)
if diffA >= 0 and diffA <= 29 and diffA == a[i+1] - (Byte)(subB):
a[i] = (Byte)(replacement) + diffA
a[i+1] = (Byte)255
cntr.inc()
i.inc()
i.inc()
a.keepIf((item) => item != (Byte)255)
I thought by doing it in-place, there would be something to gain, but I guess I'm still not doing it in the best possible way.
Given that what I'm aiming for is speed, any suggestions?
This reminds of an AOC problem for last year: https://adventofcode.com/2021/day/14
IIRC the trick for that one might have been memoization/dynamic programming. You could try that? You can find a bunch of hints and discussion on reddit: https://www.reddit.com/r/adventofcode/search/?q=2021%20day%2014&restrict_sr=1&sr_nsfw=
please tell me if my assumptions are off, but your functions aren't quite working the way you say they should.
type
ByteArray = seq[byte]
OpChar = byte
Byte = byte
let
input = "ABCDEFGBF".mapIt(it.byte)
output = input.substitute2to1r('A'.byte,'B'.byte,'W'.byte)
echo output.mapIt(it.char)
#@['W', 'Y', '[', 'G', 'W', 'F']
As he said:
Exactly!
I just didn't want to bother you with too many details while still maintaining the code roughly as it looks.
What I'm experimenting with (and it's still an experiment - at least its implementation - that's why it looks so draft-like) is a bytecode optimizer for Arturo.
Basically, Arturo's evaluator produced a bytecode object, which is nothing other than a list of constants (=data) and a list of opcodes (=code).
https://github.com/arturo-lang/arturo/blob/master/src/vm/opcodes.nim
What I'm experimenting with here is a way of going through these opcodes and performing some simple transformations.
For example, the above code could be used for something like Store+Load->StoreLoad transformations.
Take this as an example (in Arturo):
x: 1
print 1 + x
This produces the following bytecode:
consti1 -> push 1 to the stack
store0 -> pop stack and store it as 'x'
load0 -> load variable 'x' onto the stack
consti1 -> push 1 to the stack
call2 -> call function ('add')
call1 -> call function ('print')
end
Given that... a very simple peephole optimization, but still: store0 + load0 could be reduced to 1 single opcode (storl0) which stores the value to 'x' but does not pop it from the stack.
So in that case, we would have to go through the bytecode and replace all sequences of StoreX-LoadX with StorlX (where X=0..29).
That's exactly what the code above does. ;-)
I don't know if you have to deal with one large ByteArray or many small ones, but this part of your code (the fastest one) is quite expensive:
for j,it in a[i..^1]:
result[cntr+j] = it
result.setLen(cntr + a[i..^1].len)
My naive implementation is faster mainly because of this.
import std/times
type
ByteArray = seq[byte]
OpCode = byte
Byte = byte
proc substitute2to1naive*(a: ByteArray, subA: OpCode, subB: OpCode,
replacement: OpCode): ByteArray =
newSeq(result, a.len)
var resultIdx = 0
var lastByte = a[0]
var currentByte: Byte
var skipNext: bool
for i in 1..<a.len:
currentByte = a[i]
if currentByte == subB and lastByte == subA:
result[resultIdx] = replacement
inc resultIdx
lastByte = byte('\x00')
skipNext = true
elif skipNext:
skipNext = false
lastByte = currentByte
else:
result[resultIdx] = lastByte
inc resultIdx
lastByte = currentByte
result[resultIdx] = lastByte
result.setLen(resultIdx + 1)
proc substitute2to1r2*(a: ByteArray, subA: OpCode, subB: OpCode,
replacement: OpCode): ByteArray =
var i = 0
var cntr = 0
let aLen = a.len
newSeq(result, aLen)
while i < aLen-2+1:
if a[i] == subA and a[i+1] == subB:
result[cntr] = replacement
cntr.inc()
i.inc(2)
else:
result[cntr] = a[i]
cntr.inc()
i.inc(1)
for j, it in a[i..^1]:
result[cntr+j] = it
result.setLen(cntr + a[i..^1].len)
func toString(bytes: openArray[byte]): string =
result = newString(bytes.len)
copyMem(result[0].addr, bytes[0].unsafeAddr, bytes.len)
let input = cast[seq[byte]]("AABBCDEFGABHIJKLMABNOPQRSTABUVWXYZ")
var output1, output2: seq[byte]
var t1 = epochTime()
for i in 1..1_000_000:
output1 = input.substitute2to1naive('A'.byte, 'B'.byte, '.'.byte)
echo output1.toString
# A.BCDEFG.HIJKLM.NOPQRST.UVWXYZ
echo (epochTime() - t1)
var t2 = epochTime()
for i in 1..1_000_000:
output2 = input.substitute2to1r2('A'.byte, 'B'.byte, '.'.byte)
echo output2.toString
# A.BCDEFG.HIJKLM.NOPQRST.UVWXYZ
echo (epochTime() - t2)
If only I knew how to use SIMD...
To be perfectly precise, I'm not even substituting A+B with C; what is actually going on is substituting a range around A + B, with a range around C.
Can you demonstrate what this means with an example, otherwise it seems you could you use a string search algorithm.
The first one was when taking input from user, the second was when I had input specifically set to "Hello World" without bothering with asking for input. In both cases, there are extra characters, in the first case where it asks for input, it really messes up. Why?
Secondly, is this the right way to convert a char literal into an array of bytes?
I'll give you some ideas for the algorithm in pseudo-code, I have no implementation yet. I don't know if these ideas are perfect for your setting, this ranges might change what we optimize.
Instead of moving either from one/two byte whether you have done a replacement or not, you could always read your input data two by two.
Doing so, forces you to special case the last read and to distinguish at least the 4 following cases:
Now, I believe you want to do it recursively. The problem is if you have a string with AABB. You will not detect it in the first pass, if you do not check for this. You might have problematic strings with lets say: "XXXXXAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBBBXXXXXX" You will do a lot of pass, and read all the X many times for each pass.
To circumvent this, the best approach (without benchmark, so I can't tell for sure) would be to split your string in blocks.
If you have done a pass on two strings, what can we tell about the replacement we have to do in the concatenation of both strings ? Well, we have to check for the end of the first string and the beginning of the second string. We can have an approach where we do something like:
for s in substrings:
substitute2to1(s)
for s1, s2 in [(1,2), (2,3), (3,4), ...]:
checkForSubstitutionInConcatenation(s1, s2)
I do not think that you will benefit from fast pattern matching algorithms like https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm
The limitation in the inplace algorithm is probably more the write operation than the read operation. I think that if you want to change your bytes in place, you are better off beginning from the end of your seq. I think that every time you change two bytes by one, the sequence will have to move all the end of your seq to keep data consecutive.
I do not think Nim forces data to be consecutive in memory, but it is best to try to do the modifications from the end to the start of the sequence.