Hi All, I installed Nim yesterday and have been writing very small pieces of code to get to know the language and libraries better. I have used Python before, and the fact that Nim uses similar syntax is a big draw for me. Added to that is the fact that it compiles to C, which helps with performance. Once I get better at the language, I would like to contribute to its development in whatever way I can.
One very small (micro) piece of code I wrote reads a CSV file line by line and computes the average line length. The program is pasted below. It takes around 60 seconds to complete on my machine (Windows 7, 64-bit). The file has around 6.8 million lines; I use it at work as part of a larger application. When I read the same file in Scala and timed it, it took only around 5 seconds. I am not trying to compare languages, but I would like to know if there is any way to speed up the code. I compiled with the -d:release flag.
import times
echo("Avg Line Length = ",totallen div count) echo(cpuTime() - t0)
Welcome!
Thanks for your replies. I used the parsecsv module and it helped speed up the execution a lot.
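Roughly what the parsecsv version looks like, in case it helps someone else; the file name and the way I approximate the original line length from the parsed fields are just for illustration:

import parsecsv, streams, times

let t0 = cpuTime()
var s = newFileStream("data.csv", fmRead)   # placeholder path
if s == nil:
  quit("cannot open the file")
var p: CsvParser
open(p, s, "data.csv")
var totallen, count = 0
while readRow(p):
  # approximate the raw line length: sum of field lengths plus one separator between fields
  var linelen = 0
  for field in items(p.row):
    linelen += field.len
  totallen += linelen + max(p.row.len - 1, 0)
  inc count
close(p)
echo("Avg Line Length = ", totallen div count)
echo(cpuTime() - t0)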
Putting the code in a 'main' proc should really help.
By this, I assume that the code should be placed inside a proc. On my system, this did not help in speeding up the program.
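For reference, the change was just wrapping the existing code in a proc, roughly like this (the file name is again a placeholder):

import times

proc main() =
  let t0 = cpuTime()
  var totallen, count = 0
  for line in lines("data.csv"):   # placeholder path
    totallen += line.len
    inc count
  echo("Avg Line Length = ", totallen div count)
  echo(cpuTime() - t0)

main()

As I understand it, this tip usually helps because locals inside a proc can be optimized better than module-level globals; since this program is dominated by file I/O, that would explain why it made no measurable difference here.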
Varriount: Any chance you could help find any bottlenecks?
The bottlenecks are probably still the same as in the original thread (the one that def linked), i.e. the iterator reading the file character by character, which can incur significant per-character overhead. The solution would be to use a buffer to read multiple characters at a time.
Note that file streams by themselves do not solve this problem either (though one could write a BufferedFileStream that does, which is basically what you get in Java/Scala). If anything, the readLine implementation for file streams is even slower, since it also reads the file character by character and uses a less efficient way of doing so. The reason parsecsv is faster is that the underlying lexbase-based scanner does not read line by line, but in 8 KB chunks.
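To make the buffering idea concrete, a rough sketch that reads the file in 8 KB chunks (the same order of magnitude as lexbase) and derives the average line length from the raw bytes could look like the following; the file name and buffer size are arbitrary:

import times

proc main() =
  let t0 = cpuTime()
  var f = open("data.csv")              # placeholder path; raises IOError if it cannot be opened
  var buf = newString(8192)             # read in 8 KB chunks instead of character by character
  var totallen, count = 0
  while true:
    let n = f.readBuffer(addr buf[0], buf.len)
    if n <= 0:
      break
    for i in 0 .. n-1:
      if buf[i] == '\n':
        inc count                       # one line per newline terminator
      elif buf[i] != '\r':
        inc totallen                    # payload characters, ignoring carriage returns
  close(f)
  if count > 0:
    echo("Avg Line Length = ", totallen div count)
  echo(cpuTime() - t0)

main()

Counting bytes directly also avoids building a string per line, which is another per-line cost of readLine-based approaches.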