playing around with text search (strutils.find) I realized, that the nim find implementation uses a Boyer-Moore algorithm. This is nice in many cases, but there are scenarios where a naive brute force search is faster (small search patterns and/or a simple one shot search like find("nim is a great programming language", "great")). The situation gets worse, when a stupid user like me uses the find repeatedly without precalculating the SkipTable. - I know stupid programmers write stupid code.
Now I tried to find a better find algorithm. After some googling my first stop was a Knuth-Morris-Pratt, but in all my benchmarks a simple brute force find was better (Maybe someone can show me an artificial search string and pattern where KMP shines? ). Than I had a genius idea and "invented" a rolling hash search. After implementing it, I googled again and found out that I re-"invented" a naive form of the Rabin-Karp algorithm :) But to my astonishment my naive implementation works very well. In all my tests, it is always better than the naive brute force search. In many cases it beats the BM and when it looses it is not so bad. But I want to make some more tests to be sure. Can someone show me some real world test where BM shines - my current observation is the longer the pattern, the better is BM.
When it turns out like I guess, I would like to suggest the following for the strutils find:
what do you think ?
@dataman thanks that was a really helpful link.
@araq I tested my simple hash algorithm against brute forth and BM with that smart tool. It always wins against brute forth. Against BM it wins always for pattern lengths under 16. Given that the BM has always an overhead for the first search I would use a heuristic with pattern under 16 or search text under 100 char.