I was poking around parsecsv and thought the semantics were a little odd so I started digging and ended up in lexbase.
https://github.com/nim-lang/Nim/blob/master/lib/pure/lexbase.nim#L143
In particular, open does not open the stream (the caller passes in a stream that is already open), but close does close it. This seems like a bad idea on several levels, so I'm curious if there's a real, practical reason for this, or if it was a convenience for the author when it was written that's never been revisited.
From my perspective this is not ideal for a few reasons.
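To illustrate the asymmetry as I understand it, here's a sketch using parsecsv, which sits on top of lexbase:

```nim
import std/[streams, parsecsv]

var s = newFileStream("data.csv", fmRead)  # the caller opens the stream...
var p: CsvParser
p.open(s, "data.csv")   # open() does not open anything; it just wraps s
while p.readRow():
  echo p.row
p.close()               # ...but close() closes the underlying stream
```

So ownership of the stream is split: the caller creates and opens it, but the parser tears it down.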
edit:
Taking a closer look at the code in lexbase, the skipUtf8Bom proc looks dangerous to me:
proc skipUtf8Bom(L: var BaseLexer) =
  if (L.buf[0] == '\xEF') and (L.buf[1] == '\xBB') and (L.buf[2] == '\xBF'):
    inc(L.bufpos, 3)
    inc(L.lineStart, 3)
This will increment regardless of the actual position in the buffer. While I understand that a caller should only call this proc at the start of a file, if they make a mistake I feel this proc should either error or no-op (and my vote honestly goes to erroring; they can track the mistake down quicker). This is the stuff of hard-to-track-down bugs, in my opinion.
The alternative is to have the BOM check happen at the current buffer position, but I would still vote for throwing an error, since BOMs belong at the start of the file and a call from anywhere else is most likely a mistake on the caller's part (and erroring immediately will make it easier for them to identify it).
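Roughly, what I have in mind is something like this (a sketch only, untested; the field names are the ones lexbase already uses):

```nim
proc skipUtf8Bom(L: var BaseLexer) =
  # Refuse to run anywhere but the very start of the buffer, so a
  # misplaced call fails loudly instead of silently corrupting positions.
  if L.bufpos != 0:
    raise newException(ValueError, "skipUtf8Bom called past the start of the file")
  if L.buf[0] == '\xEF' and L.buf[1] == '\xBB' and L.buf[2] == '\xBF':
    inc(L.bufpos, 3)
    inc(L.lineStart, 3)
```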
or if it was a convenience for the author when it was written that's never been revisited.
Yep. I don't see the problem though; I think it's natural and it has worked well for very many use cases. Also, lexbase is a semi-public API: it is shared by most parsers in the stdlib but was not designed for general consumption.
Hey Araq, I'm not trying to attack the design. I just thought I could contribute by striking up a conversation about a piece of the API that may not have been given much thought, given the workload involved in putting together a language and its stdlib.
Also, I don't know what "natural" means in this context, but I do feel it's probably best to define it so we're on the same page as to what a good Nim API feels like.
In the interim, I'll be writing my own parser, since I don't want those assumptions forced into my APIs and a CSV parser is simple enough for my use case.
Are you willing to revisit that BOM proc? That's seriously the stuff of nightmares; if someone hasn't been bitten by that behavior yet, they will be eventually.
In the interim, I'll be writing my own parser, since I don't want those assumptions forced into my APIs and a CSV parser is simple enough for my use case.
It's far less work to fix lexbase... I didn't mean to discourage you, but note that lexbase is used quite a bit, so you either make your changes backwards compatible or you need to update all the parsers that use it. ;-)
Are you willing to revisit that BOM proc? That's seriously the stuff of nightmares; if someone hasn't been bitten by that behavior yet, they will be eventually.
Surely this can be an optional step.
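For example, something along these lines (a hypothetical signature, not what open currently looks like):

```nim
proc open(L: var BaseLexer, input: Stream; bufLen = 8192;
          skipBom = true) =
  # ... existing buffer setup ...
  # Only consume a leading BOM when the caller asks for it:
  if skipBom:
    skipUtf8Bom(L)
```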
If you're suggesting I do it, I have no issue with that, with the caveat that I'm new to Nim and don't know where to start, what your process is, etc.
As for the amount of work, for me it's a matter of not wanting to maintain a separate version of the stdlib over the long haul when a CSV parser for my specific use case is simple enough to write.
But by all means, direct me where I need to go to get started and I'm more than willing to take a crack at doing it myself.