Hi
Just rolling up my sleeves to learn Nim, but have run into an immediate roadblock.
I thought I'd stress-test it by reading in a large .csv file, as this will be a common requirement.
My code is crashing after reading around 51,000,000 lines. It was only a test so all I'm doing is counting the rows.
I am running Win 10 Pro on a well resourced workstation.
I can't see that it's anything in my code, but I'd be happy to be corrected. It's basically a copy and paste from the manual.
I'd appreciate some help getting this sorted, or my flirtation with Nim is sadly going to die at birth...
import parsecsv
let path = "e:\\TickData\\AUDUSD.csv"
var p: CsvParser
p.open(path)
p.readHeaderRow()
var count: uint64 = 0
while p.readRow():
  count = count + 1
  if count mod 1000000 == 0:
    stdout.write "+"
p.close()
echo "Num rows: " & $(count)
Output:
c:\Users\user\AppData\Local\Programs\nim-1.4.8\lib\pure\parsecsv.nim(276) readRow
c:\Users\user\AppData\Local\Programs\nim-1.4.8\lib\pure\lexbase.nim(111) handleCR
c:\Users\user\AppData\Local\Programs\nim-1.4.8\lib\pure\lexbase.nim(99) fillBaseLexer
c:\Users\user\AppData\Local\Programs\nim-1.4.8\lib\system\fatal.nim(49) sysFatal
Error: unhandled exception: over- or underflow [OverflowDefect]
I think the problem must be triggered by something in your data. What I would recommend is changing your main loop to:
var count: uint64 = 1
while p.readRow():
  inc count
  stderr.write count, '\n'
Then compile & run your program again, and this time you will know where in your data things fail.
Despite having some "advisory RFC", CSV is not that well defined a format and its many variations make robust parsing difficult.
I was sceptical that the data would be the issue - I harvested it myself and know the quality.
But I did another run and caught the precise line where it occurred. There is nothing wrong with the line - it's well formed and only 40 chars long. I've visually inspected the file and there's nothing amiss.
This does seem to be a Nim issue.
I'm VERY keen to use Nim for this project and become an active member of the community. But obviously I can't commit unless someone can help me get to the bottom of this.
Is it the line content or the line number that produces the crash?
Can you swap this line with another and see if the crash happens on the same line number?
I very much appreciate your efforts to help. But it really can't be the specific row in the file that is causing the crash.
I've inspected it in Vim, and it's well formed as I said. There are no strange characters, and the length is only 40 chars, so I can't see how it could have anything to do with an overflow. It's only text, after all.
And I've used a big int for the counter, so that's clearly not the culprit.
I've just run the import code on a different file from a different price stream, with entirely different content, though in a very similar format. The parser crashed at just about the same spot. The records aren't fixed length - the prices are floats and length can vary by rounding. But essentially it processed the same number of lines before crashing.
For context, it crashed at line 51,387,744, around 40% of the way through the file.
It's not any kind of major memory leak - memory on the workstation stayed stable throughout the run.
As a complete newbie to Nim I don't have the background to dig into the guts of this and get it solved. I really do need some help from the community, please, if this is going to be my new home...
L.offsetBase += pos
It fails on this line in lexbase.nim. My guess is that the cause is L.offsetBase being an int: int32.high div 51_387_744 == 41, i.e. about 41 bytes per line, which is close to the roughly 40-character lines you describe. That suggests the parser is failing because the offset into the file has become too large to be stored in an int32 (assuming you are on a 32-bit system). Try it on a 64-bit system and it should work.
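To make the arithmetic above concrete, here is a minimal sketch that repeats the division and reports how wide int is in the current build. The row count is the one reported earlier in the thread; the names rowsAtCrash and bytesPerRow are just illustrative, and the reading that offsetBase overflows is the guess stated above, not something this snippet proves.

```nim
# Sketch: how many bytes per row would push a 32-bit offset past its limit
# at the reported crash point? (Names here are illustrative only.)
const
  rowsAtCrash = 51_387_744'i64            # row count reported in the thread
  bytesPerRow = high(int32).int64 div rowsAtCrash

echo "bytes per row at the int32 limit: ", bytesPerRow

# `int` is pointer-sized in Nim, so this shows whether the build is 32- or 64-bit.
echo "int is ", sizeof(int) * 8, " bits wide in this build"
when sizeof(int) == 4:
  echo "32-bit build: offsets above ", high(int32), " cannot be stored in an int"
```

If the second line prints 32, the compiler in use is a 32-bit build, whatever the operating system is.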
Also, at the top of the documentation, https://nim-lang.org/docs/parsecsv.html, there is an example using a FileStream. I would suggest using that for very long files. Also, take a look at https://github.com/status-im/nim-faststreams
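For reference, a minimal sketch of the FileStream approach, loosely following the example in the parsecsv documentation. The sample file and the countRows helper are made up here so that the snippet is self-contained. Note the assumption that this goes through the same lexbase code as before, so on its own it would not be expected to avoid the overflow on a 32-bit build.

```nim
import std/[os, parsecsv, streams]

# Hypothetical sample file so the sketch can run on its own.
let path = getTempDir() / "sample_ticks.csv"
writeFile(path, "time,bid,ask\n1,0.7301,0.7302\n2,0.7303,0.7304\n")

proc countRows(path: string): int =
  ## Counts data rows (header excluded), reading through a FileStream.
  let s = newFileStream(path, fmRead)
  if s == nil:
    quit("cannot open the file " & path)
  var p: CsvParser
  p.open(s, path)        # the filename is only used for error messages
  p.readHeaderRow()
  while p.readRow():
    inc result
  p.close()              # also closes the underlying stream

echo "Num rows: ", countRows(path)
```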
@ynfle
Great catch! I should have realised there was something suspicious about that figure.
The problem isn't my hardware, it's my wetware. I somehow managed to install the 32 bit version of Nim. I was having a wrestle with a false positive from Defender as I tried to download, and it must have distracted me...
With the 64 bit version installed the import is now running to completion.
Despite my embarrassment, I'm hugely relieved that Nim is functioning as advertised and that there is an informed community who are kind enough to help me out even when I'm being an idiot.
I've decided to use this project as a motivation to learn a new language and had set my heart on Nim. Now I know that the libs aren't broken, I can get going...