I was poking around parsecsv and thought the semantics were a little odd so I started digging and ended up in lexbase.
https://github.com/nim-lang/Nim/blob/master/lib/pure/lexbase.nim#L143
In particular, open does not open the stream (the caller passes in a stream that is already open), but close does close it. This seems like a bad idea on several levels, so I'm curious if there's a real, practical reason for this, or if it was a convenience for the author when it was written that's never been revisited.
From my perspective this is not ideal for a few reasons.
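To illustrate the asymmetry as I understand it, here's a sketch using parsecsv, which sits on top of lexbase:

```nim
import std/[streams, parsecsv]

var s = newFileStream("data.csv", fmRead)  # the caller opens the stream...
var p: CsvParser
p.open(s, "data.csv")   # open() does not open anything; it just wraps s
while p.readRow():
  echo p.row
p.close()               # ...but close() closes the underlying stream
```

So ownership of the stream is split: the caller creates and opens it, but the parser tears it down.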
edit:
Taking a closer look at the code in lexbase, the skipUtf8Bom proc looks dangerous to me:
proc skipUtf8Bom(L: var BaseLexer) =
  if (L.buf[0] == '\xEF') and (L.buf[1] == '\xBB') and (L.buf[2] == '\xBF'):
    inc(L.bufpos, 3)
    inc(L.lineStart, 3)
This will increment regardless of the actual position in the buffer. While I understand that a caller should only call this proc at the start of a file, if they make a mistake I feel this proc should either error or no-op (and my vote honestly goes to erroring; they can track the mistake down quicker). This is the stuff of hard-to-track-down bugs, in my opinion.
The alternative is to have the BOM check happen at the current buffer position, but I would still vote for throwing an error, since BOMs belong at the start of the file and a call from anywhere else is most likely a mistake on the caller's part (and erroring immediately will make it easier for them to identify it).
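Roughly, what I have in mind is something like this (a sketch only, untested; the field names are the ones lexbase already uses):

```nim
proc skipUtf8Bom(L: var BaseLexer) =
  # Refuse to run anywhere but the very start of the buffer, so a
  # misplaced call fails loudly instead of silently corrupting positions.
  if L.bufpos != 0:
    raise newException(ValueError, "skipUtf8Bom called past the start of the file")
  if L.buf[0] == '\xEF' and L.buf[1] == '\xBB' and L.buf[2] == '\xBF':
    inc(L.bufpos, 3)
    inc(L.lineStart, 3)
```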
or if it was a convenience for the author when it was written that's never been revisited.
Yep. I don't see the problem though; I think it's natural and it has worked well for very many use cases. Also, lexbase is a semi-public API: it is shared by most parsers in the stdlib but was not designed for general consumption.
Hey Araq, I'm not trying to attack the design. I just thought I could contribute by striking up a conversation about a piece of the API that may not have been given much thought, given the workload involved in putting together a language and its stdlib.
Also, I don't know what "natural" means in this context, but I do feel it's probably best to define it so we're on the same page as to what a good Nim API feels like.
In the interim, I'll be writing my own parser, since I don't want those assumptions forced into my APIs and a CSV parser is simple enough for my use case.
Are you willing to revisit that BOM proc? That's seriously the stuff of nightmares; if someone hasn't been bitten by that behavior yet, they will be eventually.
In the interim, I'll be writing my own parser, since I don't want those assumptions forced into my APIs and a CSV parser is simple enough for my use case.
It's far less work to fix lexbase... I didn't mean to discourage you, but note that lexbase is used quite a bit, so you either make your changes backwards compatible or you need to update all the parsers that use it. ;-)
Are you willing to revisit that BOM proc? That's seriously the stuff of nightmares; if someone hasn't been bitten by that behavior yet, they will be eventually.
Surely this can be an optional step.
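For example, something along these lines (a hypothetical signature, not what open currently looks like):

```nim
proc open(L: var BaseLexer, input: Stream; bufLen = 8192;
          skipBom = true) =
  # ... existing buffer setup ...
  # Only consume a leading BOM when the caller asks for it:
  if skipBom:
    skipUtf8Bom(L)
```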
If you're suggesting I do it, I have no issue with that, with the caveat that I'm new to Nim and don't know where to start, what your process is, etc.
As for the amount of work, for me it's a matter of not wanting to maintain a separate version of the stdlib over the long haul when a CSV parser for my specific use case is simple enough to write.
But by all means, direct me where I need to go to get started and I'm more than willing to take a crack at doing it myself.