Thanks for the suggestions, def!
Compile with -d:release
I always do.
use seq[bool] instead of seq[string]
Just tried this and it makes no difference speed-wise.
avoid copying, avoid resizing, avoid allocating more seqs.
I'm too tired currently, but tomorrow I'll see what can I do about those. Will report back if I manage to do something useful.
It's quite easy to speed it up, actually. Let's take a look at your transform iterator:
proc flip(s: seq[string]): seq[string] =
result = s # copy
result[0] = s[^1]
result[^1] = s[0]
proc transpose(s: seq[string]): seq[string] =
result = s # copy
for i in 0 .. s.high:
for j in 0 .. s.high:
result[j][i] = s[i][j]
iterator transform(s: seq[string]): seq[string] =
for a in [s, s.flip]: # copy x 3
let
b = a.transpose # copy
c = b.flip # copy
d = c.transpose # copy
# x2
yield a; yield b; yield c; yield d # possibly copy x 4
# total of: copy x 9, possibly copy x 17
Well, you take a sequence BY VALUE but still you don't operate in-place but allocate new sequences all the time. And all strings, it'd add. The bare minimum would be to get rid of copies you don't need:
proc flip(s: var seq[string]) =
swap s[0], s[^1] # in-place
proc transpose(s: var seq[string]) =
for i in 0 .. s.high:
for j in i+1 .. s.high:
swap s[j][i], s[i][j] # in-place
iterator transform(s: var seq[string]): var seq[string] =
var tmp = s # copy
tmp.flip # in-place
template yieldAll(c) =
yield c; c.transpose # in-place
yield c; c.flip # in-place
yield c; c.transpose # in-place
yield c # in-place
yieldAll(s) # in-place
s = tmp # copy
yieldAll(s) # in-place
Now a new seq is allocated (and filled; i.e. deepCopied) only two times: at var tmp = s and at s = tmp. You could further reduce that to one by using shallowCopy in the latter case. Then, if you want to truly use only seq all the time, you can check whenever the following would be faster:
proc flip2(s: var seq[string]) =
for i in 0 .. s.high:
swap s[i][0], s[i][^1] # in-place
iterator transform(s: var seq[string]): var seq[string] =
template yieldAll(c) =
yield c; c.transpose # in-place
yield c; c.flip # in-place
yield c; c.transpose # in-place
yield c # in-place
yieldAll(s) # in-place
s.flip2 # in-place
s.flip # in-place
yieldAll(s) # in-place
It may be tempting to use a 2D array / seq mocking one, as transpose and reading would benefit from memory locality (but not THAT much, I guess). On the other hand, flip (but not flip2!) benefit from memory scattering so it's not that obvious, but it would still faster than flip2, thanks to locality (e.g. you can use copyMem instead of a swap in a loop). Still, I think it might be nice to give 2D array a try.
@Araq Are these seq copied in yield? It would be nice if not but the value semantics (which seem to break on let instead of var, so maybe here too?) make me doubt...
I just grabbed the first part of code that is very easy to understand. I didn't know that changing the order of yielded values doesn't make any difference for you (I did think about how greatly it would simplify transform but forgot when posting). :)
I only had a glance at the rest of the functions. Separately, they don't look that trivial to optimize heavily but that's because I didn't grasp the whole idea of the program (I was quite sleepy when I posted my answer). But I guess you get the general idea now. ;)