Hi everybody,
I have put on GitHub a Nim library to parse TOML files (https://github.com/toml-lang/toml). It is MIT-licensed and available at https://github.com/ziotom78/parsetoml. (I submitted a PR to have it added to Nimble; if there are no problems, it should appear soon.)
Currently it is able to correctly parse the test files provided in the latest version of the TOML repository (including the "hard" test, https://github.com/toml-lang/toml/blob/master/tests/hard_example.toml). It has fairly good test coverage, although not every case is covered yet. Things I would like to do in the next releases, in order of importance:
Comments and suggestions are welcome!
Edit: links fixed after fadg44a3w4fe's comment
Really nice! The logic is very easy to follow, and the design looks well thought out. I just want to point out, in case you weren't aware, that Nim does have a style guide, with naming suggestions for enums, spacing, etc.
What concerns me is the naming of TomlValueKind's members. Since it's a public enum, the members can be used without full qualification, and using a somewhat vague naming scheme such as 'kind<X>' could cause confusion. I would personally go for something along the lines of 'tvk<X>', 'tvKind<X>', or similar.
Stylistic concerns aside, this looks to be a very useful module, and definitely something I would consider using.
Lots of your links seem broken, I'm not sure why.
I've been working on a TOML parser myself, but I got caught up in some yak shaving and haven't finished yet. If you'd like to look at it, see https://gist.github.com/7382d036b5cfc612cfb0
https://github.com/ziotom78/parsetoml/blob/master/parsetoml.nim#L235-L270 isn't really necessary, the unicode module does the same thing: http://nim-lang.org/unicode.html
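For reference, a minimal sketch of what relying on the stdlib looks like (the snowman code point is just an arbitrary example):

```nim
import unicode

# unicode.toUTF8 converts a Rune (a Unicode code point) into its UTF-8
# byte sequence, so no hand-written encoder is needed.
let snowman = Rune(0x2603)   # U+2603 SNOWMAN
doAssert snowman.toUTF8 == "\xE2\x98\x83"
echo snowman.toUTF8
```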
https://github.com/ziotom78/parsetoml/blob/master/parsetoml.nim#L279 is actually a bug: `\n` in Nim is platform-dependent. `\l` (a linefeed) would be correct.
For datetime, I use option for optional segments so that I can keep perfect back-and-forth. See the gist.
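To illustrate the idea, here is a minimal sketch (the type and field names are hypothetical, not the actual ones from the gist) of a datetime object that keeps optional segments in Option fields, so that unparsing can reproduce the input exactly:

```nim
import options

# Hypothetical sketch: optional segments of a TOML datetime are stored
# as Option values, so "absent" and "present" survive a round trip.
type
  TomlDateTime = object
    year, month, day: int
    hour, minute, second: int
    fraction: Option[int]        # fractional seconds, if present
    offsetMinutes: Option[int]   # timezone offset, if present

let stamp = TomlDateTime(year: 2015, month: 1, day: 18,
                         hour: 3, minute: 0, second: 3,
                         fraction: none(int),
                         offsetMinutes: some(0))  # a "Z" offset
doAssert stamp.fraction.isNone
doAssert stamp.offsetMinutes.isSome
```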
I'd also like to point out that TOML's test suite is woefully inadequate, it doesn't test that [ foo ] => "foo". I'd use the proposed ABNF at https://github.com/toml-lang/toml/pull/236 instead.
Thanks for your nice comments, Varriount, fadg44a3w4fe, def, and Nikki!
Varriount: You're right. I decided to write this library as a way to better understand Nim, and I discovered the awesomeness of pure enums while I was in the middle of coding it. In fact, you can tell whether an enum in the library was designed early or late by its purity. Since at the moment nobody else is using the library, I worked up the courage to change the API and make TomlValueKind a pure enum. This change is currently available in the devel branch.
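For readers unfamiliar with pure enums, a minimal sketch (with hypothetical member names, not the library's actual ones) of why purity removes the need for a `tvk`-style prefix:

```nim
type
  TomlValueKind {.pure.} = enum   # {.pure.} forces qualified access
    None, Int, Float, String

# Members must be written as TomlValueKind.Int, so short, readable
# names cannot clash with identifiers in the importing module.
let kind = TomlValueKind.Int
doAssert $kind == "Int"
```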
fadg44a3w4fe: Sorry for the links, I have fixed them. Many thanks for sharing your code; I read through it and found your implementation of the Datetime object very interesting, and I think I'll copy it (I added an issue here: https://github.com/ziotom78/parsetoml/issues/4). I have also fixed my implementation of parseUnicode by relying on the unicode.toUTF8 proc (how could I have missed that? I had looked at the procedures in that module, but didn't notice it…). I see that your code reads the whole file into memory and then uses string utilities (and the re module). I initially considered this approach, but discarded the idea because of two things:
def: I hadn't thought about implementing procedures for writing TOML files. However, this would fit perfectly with my old idea of translating some JSON configuration files I have (for an old legacy C++ program I use for my job) into TOML. It's true that a similar tool probably already exists, but it would be an interesting exercise for a novice like me to write it in Nim.
I was thinking that streaming parsing is unlikely to be necessary, so I thought it better to optimize for simplicity rather than functionality. The parser doesn't return until it's done anyway, and config files tend to be small enough, and PCRE fast enough, that I don't believe (no concrete numbers) it really matters whether everything happens all at once or incrementally.
PCRE is installed on pretty much every Linux distro, and it's in Homebrew for macOS. Things are harder on Windows; I'm not sure how to get PCRE to work there.
fadg44a3w4fe: My idea was to use TOML for providing a summary of the calculations of the numerical code I write in my job (I am an astrophysicist). These are quite huge MPI codes that run on hundreds of processes and take hours or days to complete. Usually, such programs write a large number of log files (typically one per job, and because of the difficulty in debugging MPI programs you usually put a lot of messages in each of them). When one of these jobs runs, I am always digging into the partially written log files to check that everything is OK so far, and to figure out how much has already been done and how much is left to compute. I've always dreamed of patching such programs to make them write a summary of the computations they have completed so far on stderr. I would then pipe stderr to another program that shows the progress and other useful information for each process. I think that TOML would be the perfect format for the information being piped between the two programs. (It's true that so far the functions I wrote don't return until the parsing is complete, but it's easy to add a callback argument to parseStream that is called whenever it adds a new node to the tree.)
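A minimal sketch of what such a callback hook might look like. This is a hypothetical API, not what parsetoml currently offers, and the toy parser below handles only bare `[table]` headers and `key = value` lines:

```nim
import strutils

type
  TomlEvent = object
    table, key, rawValue: string

# Hypothetical streaming interface: `onValue` fires as soon as each
# key/value pair is read, instead of waiting for the full document.
proc parseStreamDemo(lines: openArray[string],
                     onValue: proc (ev: TomlEvent)) =
  var currentTable = ""
  for rawLine in lines:
    let line = rawLine.strip()
    if line.len == 0 or line[0] == '#':
      continue                              # skip blanks and comments
    if line[0] == '[':
      currentTable = line[1 .. ^2].strip()  # a [table] header
    else:
      let eq = line.find('=')
      onValue(TomlEvent(table: currentTable,
                        key: line[0 ..< eq].strip(),
                        rawValue: line[eq + 1 .. ^1].strip()))

var seen: seq[string] = @[]
parseStreamDemo(["[model-fitting]", "norm-chi-sq = 0.86"],
                proc (ev: TomlEvent) = seen.add(ev.table & "." & ev.key))
doAssert seen == @["model-fitting.norm-chi-sq"]
```

A monitoring tool could react inside the callback before the whole stream has been parsed, which is all the progress-reporting use case needs.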
Regarding PCRE, is this the only available option in Nim, apart from PEGs, or is there some other module providing a small, standalone regexp engine? A few days ago I found this link on HN and discovered that, if you don't aim for advanced features, it's not very difficult to implement one. Perhaps at some point I might try to implement it in Nim; it would be useful for people wanting to port their code to Nim but not willing to convert their regexps to PEGs.
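As an illustration of how small the basic idea is, here is a sketch in Nim of the classic tiny matcher in the Pike/Kernighan style (supporting only `.`, `*`, `^` and `$`); this is a toy, not a proposed library:

```nim
proc matchHere(re, text: string): bool   # forward declaration

# Match zero or more occurrences of `c`, then the rest of `re`.
proc matchStar(c: char; re, text: string): bool =
  var t = text
  while true:
    if matchHere(re, t): return true
    if t.len == 0 or (c != '.' and t[0] != c): return false
    t = t[1 .. ^1]

# Match `re` at the beginning of `text`.
proc matchHere(re, text: string): bool =
  if re.len == 0: return true
  if re.len >= 2 and re[1] == '*':
    return matchStar(re[0], re[2 .. ^1], text)
  if re == "$": return text.len == 0
  if text.len > 0 and (re[0] == '.' or re[0] == text[0]):
    return matchHere(re[1 .. ^1], text[1 .. ^1])

# Match `re` anywhere in `text` (anchored if it starts with '^').
proc matches(re, text: string): bool =
  if re.len > 0 and re[0] == '^':
    return matchHere(re[1 .. ^1], text)
  var t = text
  while true:
    if matchHere(re, t): return true
    if t.len == 0: return false
    t = t[1 .. ^1]

doAssert matches("a*b", "aab")
doAssert matches("^ab", "abc")
doAssert not matches("^ab", "cab")
doAssert matches("c$", "abc")
```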
gradha: Thanks for the link. I have read some of your documentation in the past and wondered how you managed to keep the GitHub page in sync with the nimdoc documentation. After several projects documented with Doxygen and similar tools, I must confess I am no longer convinced that having documentation intertwined with code is a good idea: it makes the code longer and harder to scan. Now I prefer to write the documentation from scratch, as if I were writing a novel, so that I can present the functions and data structures in the order best suited for a pedagogical presentation. It takes more effort to write, but it's easier to make the text flow naturally for the reader. Moreover, a few important procs in my TOML library (getString, getInt, getFloat …) are defined by means of a template (https://github.com/ziotom78/parsetoml/blob/master/parsetoml.nim#L1040-L1075): how should I use docstrings in this case?
Also, from the existing documentation I have the impression that nim doc produces one HTML page per module. Is that really so? I usually use Sphinx, which lets me split the documentation into as many pages as I like: I think this makes the document easier to read and navigate. (An example of what I mean is the documentation of a C library I wrote a few years ago: http://hpixlib.readthedocs.org/en/latest/. I find the subdivision into sections particularly useful in this case.)
I think you are conflating the reference pages nim generates with plain documents explaining how to use them. People don't go to Nim's system module and read it from beginning to end; they go to the tutorials or other documents, which link to the reference instead.
The only difference with regard to Sphinx seems to me that you can embed the docstrings directly in the manually crafted rst. This could be done through Nim's jsondoc command, which dumps the individual docstrings; then a special include directive could read the generated JSON files and embed them.
With regard to the templates, IIRC the doc2 command processes them, so they could in theory contain their own docstring. Of course in this case you could only have a generic "This is a generic proc doing foo with bar", since I guess the user figures out the rest by looking at the parameters in the signature.
Sphinx is in any case much better.
@zio_tom78 TOML doesn't really seem like a good choice for this, but it might be easier to have a document separator instead. ex, something like the following YAML-inspired example:
[some_table]
val1 = "123"
[other_table]
val2 = 123
---
[some_table]
val1 = "321"
[other_table]
val2 = 321
---
That way you can still do most the streaming stuff while also keeping the API and implementation simple.
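A sketch of the pre-processing step this implies (a hypothetical helper, not an existing API: it splits the stream on bare `---` lines before handing each chunk to an ordinary TOML parser):

```nim
import strutils

proc splitDocuments(stream: string): seq[string] =
  ## Splits a multi-document stream on lines containing only "---".
  result = @[]
  var current: seq[string] = @[]
  for line in stream.splitLines():
    if line.strip() == "---":
      result.add(current.join("\n"))
      current = @[]
    else:
      current.add(line)
  if current.len > 0:
    result.add(current.join("\n"))

let docs = splitDocuments("a = 1\n---\na = 2\n")
doAssert docs.len == 2
```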
re. PCRE, yes, I believe it's the only regex library in Nim. It isn't hard to build PCRE, though; I see no reason it can't be seamlessly used with {.compile.}.
start-time = 2015-01-18T03:00:03Z

[input-parameters]
user = "foo"
data-directory = "/datastorage/foo/planck"
output-directory = "/datastorage/foo/my_analysis"
num-of-mpi-processes = 126
num-of-data-files = 1463

-----

[model-fitting]
start-time = 2015-01-18T03:00:05Z
end-time = 2015-01-18T05:00:03Z
norm-chi-sq = 0.86
failed-convergence = ["datafile.0005.fits", "datafile.0008.fits", "datafile.0016.fits"]

-----

[CG-inversion]
max-step-bound = 1000
steps-required = 170
final-rz/rzinit = 2.4e-13
estimated-error = 1.3e-9
start-time = 2015-01-18T05:00:03Z
end-time = 2015-01-18T06:00:03Z

-----

# The computation is still running, so more stuff is going to be appended here
@zio_tom78 What I mean is that TOML is not explicitly designed for this sort of usage; it's one file, one document. JSON and YAML have the idea of multiple documents built in.
The data format you posted is not TOML but an extension of TOML; you can't pass that data directly to a parser without pre-processing it. There's nothing wrong with that, but it can't strictly be called TOML.