I've run into strange performance issues with the PCRE engine that Nim uses. It looks like Nim uses the PCRE library without JIT.
I wrote a simple regex test based on the common regex_dna benchmark, with a 50 MB input file (in.txt):
grep -cE 'agggtaaa|tttaccct' in.txt # 0.11 s
grep -cP 'agggtaaa|tttaccct' in.txt # 0.38 s
# 1.6 s. One would think this is slow...
import pegs
let exp = peg"agggtaaa/tttaccct"
var c = 0
for line in "in.txt".lines():
  if line.contains(exp):
    c += 1
echo c
# 4.4 s! More than 10 times slower than grep with the same PCRE engine
import re
let exp = re"agggtaaa|tttaccct"
var c = 0
for line in "in.txt".lines():
  if line.contains(exp):
    c += 1
echo c
# 2.5 s: better than re, but still not full PCRE speed
import nre
let exp = re"agggtaaa|tttaccct"
var c = 0
for line in "in.txt".lines():
  if line.contains(exp):
    c += 1
echo c
I looked through the grep source and noticed that there are:
I inserted those options into the re module and... voila, 0.2 s, even faster than grep. But I cannot do the same with nre, because it has noticeable heap allocation overhead (it allocates something on every pcre_exec call).
P.S. This is probably a PCRE (8.37) issue, because it disables JIT unconditionally if flags == 0x0. I just tried passing 1 or 2 there, and JIT started to work.
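For reference, here is a minimal sketch of what requesting JIT at study time can look like when calling Nim's low-level pcre wrapper directly. It assumes the flags in question are PCRE's study options (PCRE_STUDY_JIT_COMPILE is 1 and PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE is 2), which would be consistent with the P.S. It is not the actual patch to the re module. jitStudyOptions and hasMatch are made-up helper names for illustration only.

```nim
import pcre  # Nim's low-level PCRE wrapper (loads libpcre at runtime)

# Choose study options: ask for JIT only if the loaded PCRE was built with it.
proc jitStudyOptions(): cint =
  var hasJit: cint = 0
  if pcre.config(pcre.CONFIG_JIT, addr hasJit) == 0 and hasJit == 1:
    result = pcre.STUDY_JIT_COMPILE  # value 1; passing 0 means "no JIT"
  else:
    result = 0

var
  errMsg: cstring = nil
  errOffset: cint = 0

# Compile once, then study with the JIT flag instead of 0.
let code = pcre.compile("agggtaaa|tttaccct", 0, addr errMsg, addr errOffset, nil)
doAssert code != nil, "pattern failed to compile"

# study() may legitimately return nil; exec() accepts a nil extra block.
let extra = pcre.study(code, jitStudyOptions(), addr errMsg)
doAssert errMsg == nil, "study failed"

# Hypothetical helper: true if the line contains a match.
# A fixed 3-slot ovector on the stack avoids any per-call heap allocation.
# (Cleanup of code/extra is omitted in this sketch.)
proc hasMatch(line: string): bool =
  var ovector: array[3, cint]
  pcre.exec(code, extra, line.cstring, line.len.cint, 0, 0,
            addr ovector[0], ovector.len.cint) >= 0
```

In the loops above, hasMatch(line) would take the place of line.contains(exp). If PCRE was built without JIT, the sketch falls back to a plain study with flags 0.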
There was an earlier thread about regex performance:
https://forum.nim-lang.org/t/2312
"I inserted those options into re module and... voila, 0.2 s"
I'd be interested to know how or what you did there.