Hi everyone,
I've been working on nimhuml, a Nim implementation of the HUML parser and serializer.
What is HUML?
HUML (huml.io) is a serialization language for documents, datasets, and config files, created by Kailash Nadh. It looks like YAML but is intentionally stricter: one canonical way to write everything, no indentation ambiguity, no silent footguns.
What does nimhuml do?
Links:
This is still early (v0.2.0) so feedback, bug reports, and contributions are very welcome. Would love to hear from anyone who's been looking for a cleaner alternative to YAML in their Nim projects.
It's unclear why it parses into a JsonNode. I think it would be much better to create something like a HumlNode instead.
I'd guess because it's the same underlying data structure, and it makes it fit in with existing Nim codebases.
Which data languages is HUML isomorphic to (I mean translatable without losing information)?
HUML is basically a projection of JSON that looks vaguely like YAML, with a very unambiguous parser. You have to use :: for "vectors" rather than it being inferred from context.
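For illustration, here is a small fragment showing that distinction as I understand the spec from huml.io (the exact syntax details, such as comment placement, are my reading and may be slightly off):

```huml
# scalar values use a single colon
title: "my app"
# inline vectors use a double colon
ports:: 8080, 8081
# multi-line vectors also use a double colon
database::
  host: "localhost"
  port: 5432
```

The point is that a reader (or parser) never has to guess from context whether a key holds a scalar or a collection; the colon count states it explicitly.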
It's a language I've looked at in the past and concluded I wasn't very interested in. NestedText is very close to the YAML feel that people tend to actually use. The caveat is that its only data types are maps, lists, and strings, though every data type realistically passes through strings in production anyway.
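For comparison, a NestedText fragment (based on my reading of the NestedText docs; treat the details as approximate). Everything is a string until the consuming application interprets it:

```nestedtext
name: nimhuml
tags:
  - nim
  - parser
port: 8080
notes:
  > Even "8080" above is just a string;
  > the application decides whether it is a number.
```

That strings-only model is what keeps the spec thin: there is no type-inference surface for a parser to get wrong.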
The code looks like a direct translation of the Python version. It also looks possibly vibe-coded; maybe you used Claude? That's not a problem for me, as long as you provide a disclaimer and some kind of benchmarks.
It would be better for the Nim version to be split into files instead of a single large one; that makes it easier to maintain. I personally don't like clumping everything into a single file, as Nim is an expressive language.
I don't mean 30 files, and I don't know where you got that. And 30 lines per file is very small; I never suggested that.
I mean splitting files based on functionality, for maintainability and good practice. I didn't set any upper limit; my advice is to split based on function, not on lines of code.
I just mean splitting like this: nimhuml.nim (public API), parser.nim, writer.nim, and if needed errors.nim. That's it.
And 30 LOC per file? That's the kind of code some children write. I don't know why you concluded that from my opinion <3
And it's just a personal opinion; there's no compulsion.
Recursive descent parsers taking up hundreds of lines of code is to be expected. My current PEG repo parses Ford's PEG notation in ~900 SLOC, and that's genuinely irreducible. I'd prefer the files split by concern (reader, writer, document model), but my Nim coding style is particularly nonstandard.
HUML and NestedText are pretty thin specs, and if you're reusing the json module's document format, this isn't particularly offensive.
raise p.error("trailing spaces are not allowed")
This part is perhaps a bit anachronistic. Araq doesn't really want us using exceptions going forward. I actually prefer them (but then I've always been more Ada-aligned), yet this kind of bailout is discouraged these days.
Araq doesn't really want us using exception throwing going forward.
Not always, but in your case, sure: parsing errors are easy to make "keep going". Store the first error in the parser object and count further errors, then offer a real API for it. Something like that.
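A minimal sketch of that pattern, assuming a hypothetical parser object (none of these names are nimhuml's actual API): keep the first error, count the rest, and let the caller query the result instead of catching an exception.

```nim
# Hypothetical error-accumulating parser state; not nimhuml's real API.
type
  ParseError = object
    line: int       # line where the error was detected
    msg: string     # human-readable description
  Parser = object
    firstError: ParseError
    errorCount: int

proc recordError(p: var Parser, line: int, msg: string) =
  ## Keep only the first error in full detail; just count any that follow.
  if p.errorCount == 0:
    p.firstError = ParseError(line: line, msg: msg)
  inc p.errorCount

proc hasErrors(p: Parser): bool =
  p.errorCount > 0

when isMainModule:
  var p = Parser()
  p.recordError(3, "trailing spaces are not allowed")
  p.recordError(7, "tab used for indentation")
  doAssert p.hasErrors
  doAssert p.errorCount == 2
  doAssert p.firstError.line == 3
```

The call sites that previously did `raise p.error(...)` would instead call `recordError` and decide locally whether recovery is possible; the public API then exposes `hasErrors` and the stored first error.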
I'm still not sure why continuing to parse a failed document is seen as desirable. It can require contextual fix-ups that result in misunderstanding the document and generating error noise.
E.g. GCC encounters a typo, fails to understand it, assumes it's an integer, and keeps going, generating 30 more errors about all the things an integer can't do. Those errors are worthless because none of the code actually tried to do those things to integers.
It seems largely like parser writers just flexing.