I'm porting a Python NLP module to Nim, and it runs about 4 times slower than the Python version.
It's for a Chinese word segmentation task.
You can see the main test file (takes about 5 seconds):
https://github.com/bung87/finalseg/blob/master/tests/test2.nim
and the Python version's test (takes about 1 second):
https://github.com/bung87/finalseg/blob/master/tests/speed.py
The original Python module file:
https://github.com/fxsjy/jieba/blob/master/jieba/finalseg/__init__.py
nim: Nim Compiler Version 0.18.0 [MacOSX: amd64]
python: Python 3.6.5 (default, Jun 17 2018, 12:13:06)
cpu: 2.7 GHz Intel Core i5
ram: 8 GB 1867 MHz DDR3
First off, compiling with the command-line option -d:release (i.e. nim c -d:release tests/test2.nim) always speeds up Nim code. Even without release mode, though, Nim would be expected to beat Python here, so I suspect the nre module. Beyond that, here are some things I noticed in your code.
Let's go through the cut iterator that your code uses.
iterator cut*(sentence: string): string =
  let blocks: seq[string] = filter(nre.split(sentence, re_han), proc(x: string): bool = x.len > 0)
  var
    tmp = newSeq[string]()
    wordStr: string
  for blk in blocks:
    if isSome(blk.match(re_han)) == true:
      for word in internal_cut(blk):
        wordStr = $word
        if (wordStr in Force_Split_Words == false):
          yield wordStr
        else:
          for c in wordStr:
            yield $c
    else:
      tmp = filter(split(blk, re_skip), proc(x: string): bool = x.len > 0 or x.runeLen() > 0)
      for x in tmp:
        yield x
In two places you call filter and then immediately iterate over the result. Converting an iterator to a seq is fairly expensive, so it's best to do everything in a single pass.
iterator cut*(sentence: string): string =
  for blk in sentence.split(re_han):
    if blk.len == 0: continue
    if blk.match(re_han).isSome:
      for word in internal_cut(blk):
        let wordStr = $word
        if wordStr notin Force_Split_Words:
          yield wordStr
        else:
          for c in wordStr:
            yield $c
    else:
      for x in blk.split(re_skip):
        if x.len > 0 or x.runeLen > 0:
          yield x
This doesn't really improve performance, but I thought I'd include it anyway:
proc lcut*(sentence: string): seq[string] =
  result = lc[y | (y <- cut(sentence)), string]
There is already a template for this purpose in system.nim (the module imported by default) named accumulateResult. It's used like so:
proc lcut*(sentence: string): seq[string] =
  accumulateResult(cut(sentence))
But accumulateResult is deprecated in the devel branch; luckily, you can use sequtils.toSeq at your specific call site instead:
for line in lines:
  discard lcut(line).join("/")

becomes:

# top of file
from sequtils import toSeq

for line in lines:
  discard toSeq(cut(line)).join("/")
This probably doesn't have much to do with the slowness, but you can optimize Table objects with char keys. Tables are currently implemented as a seq of tuple[hash, key, value], and since a char key already serves as its own hash, each entry wastes 8 bytes storing a redundant hash. This might be optimized in a future version of Nim, but for now this works:
proc getFromCharTable[V](charTable: openarray[(char, V)], key: char): V =
  for it in charTable:
    if it[0] == key:
      return it[1]

let foo = {'A': 1, 'B': 2} # the type is an array of (char, int)
echo foo.getFromCharTable('B') # 2
@Hlaaftana, I might be completely off here about what you are suggesting; if so, let me know.
It seems to me, though, that you are doing a linear search through the table for the correct key. With only 256 possibilities this isn't necessarily terrible, but there is another part to a typical hash table: key placement. If the keys are placed in a predictable location and collisions are handled, the table may have to only scan a few keys to find the correct value.
On the other hand, this adds significant complexity and can require allocations (for rehashing), not to mention one would have to benchmark whether, for keys as cheap to compare as chars, the linear scan is an appreciable slowdown at all.
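For char keys specifically, the cheapest "predictable placement" degenerates into direct indexing: with only 256 possible keys, the key's ordinal can simply be its slot, so there are no collisions and no rehashing to worry about. A minimal sketch of that idea (the CharTable name and API are mine, not from any library):

```nim
# Direct-indexed table for char keys: the key's ordinal *is* its slot,
# so lookup is a single array access -- no hashing, no collisions.
type CharTable[V] = object
  values: array[256, V]
  present: array[256, bool]

proc `[]=`[V](t: var CharTable[V], key: char, val: V) =
  t.values[ord(key)] = val
  t.present[ord(key)] = true

proc getOrDefault[V](t: CharTable[V], key: char, default: V): V =
  if t.present[ord(key)]: t.values[ord(key)] else: default

var probs: CharTable[float]
probs['B'] = -0.26
echo probs.getOrDefault('B', -3.14e100)  # the stored value, -0.26
echo probs.getOrDefault('M', -3.14e100)  # missing key, falls back to the default
```

The cost is a fixed 256-slot array per table, which is only worthwhile when the value type is small; whether it beats the stdlib Table in practice would still need benchmarking.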
On Linux, perf is also very nice -- I still wonder why Dom didn't even mention it in his guide.
https://fedoramagazine.org/performance-profiling-perf/
Basic usage is
perf record yourNimExecutable && perf report
On macOS, you can use Instruments (part of XCode, but it can also be used as a standalone app) to do time profiling.
On macOS Sierra or later, you can run it from the commandline using:
instruments -l 10000 -D output.trace -t "Time Profiler" /path/to/executable args...
(On older versions of macOS, the iprofiler command does the same thing, though with different options; for those, see iprofiler --help or man iprofiler. You can also run programs from the Instruments UI, but that is more cumbersome to set up.)
The -l option is for the maximum number of seconds the program may run (after that, it will be killed), the -D option specifies the output directory for the trace, and the -t option allows you to choose what kind of analysis to run.
Then open output.trace to inspect the profile in the Instruments UI. You can see a sample screenshot of the UI with the output from the original code here. Don't forget to invert the call tree in the UI (using the button at the bottom of the window); it gives you a much more useful breakdown of where time is spent.
@bung You can replace all code like
p1 = if probRef.hasKey(vChar): probRef.getOrDefault(vChar) else: MIN_FLOAT
by
p1 = probRef.getOrDefault(vChar, MIN_FLOAT)
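This does the lookup once instead of twice. A self-contained illustration (the table's contents and the MIN_FLOAT value below are made-up stand-ins for the real model tables, mirroring jieba's "log of zero" constant):

```nim
import tables

const MIN_FLOAT = -3.14e100  # stand-in for the module's log-probability floor

var probRef = {'B': -0.26, 'S': -1.46}.toTable

# Two lookups: hasKey, then getOrDefault.
let p1Slow = if probRef.hasKey('M'): probRef.getOrDefault('M') else: MIN_FLOAT
# One lookup, same result: getOrDefault with an explicit default.
let p1Fast = probRef.getOrDefault('M', MIN_FLOAT)

doAssert p1Slow == p1Fast
```

Since these lookups sit inside the inner Viterbi loop, halving them can add up.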