While 0.3 focused on correctness, the overall theme for 0.4 is to polish the output, making it more regular and less wasteful of indentation and newlines.
The formatting changes are enabled mainly by reusing the "prefer-things-on-one-line" and "don't-waste-a-line" concepts that were already applied when rendering some nodes. In particular:
# If the whole function call doesn't fit with `=` but fits on its own on a single line, do that:
let reallylongvariablename =
  function(arg0, arg1, arg2)
# ...otherwise, "begin" the function call on the same line as `=`
let variable = somelongfunction(
  arg0 = value,
  arg1 = value2,
  arg2 = value3
)
Similarly, when formatting infix expressions, preference is given to formattings that put the operator last while keeping each operand on a line of its own, thus avoiding non-structural line breaks in common situations:
# We could have fit "parts" of the expression after `and` here, but
# that would not represent the structure well
if L.buf[L.bufpos + 1] in {'0' .. '9'} and
    (L.bufpos - 1 == 0 or L.buf[L.bufpos - 1] in UnaryMinusWhitelist):
  discard
We also learned to stack dot calls, as commonly seen in "builder"-style APIs:
let x =
  somevariable
  .function0(32)
  .function1(42)
When formatting, nph tries to keep "simple" things like literals and identifiers apart from complex expressions. When something is "simple", it uses a more compact formatting so that things like lists of numbers don't end up taking too much vertical space. This release added `.` to the list of "simple" things, such that introducing a module or object name (`mymodule.symbol`) doesn't cause an explosion.
Finally, NeoVim editor integration was contributed by foxoman 🎉
Unless significant bugs appear, I'm hoping to keep the 0.4 formatting stable for some time for the format to sink in and for significant issues to be discovered, while at the same time avoiding disruption.
The plan is to increase the time between each release for each release that passes until we get to something that feels really good and deserves a "stable" moniker, which hopefully isn't too far off.
In terms of areas of future improvement and possible contribution, doc comments probably stand out. This work, however, should probably be undertaken in concert with doc generation tooling so that both nph and `nim doc` end up following similar conventions. Comment handling in the parser remains one of the most fragile aspects of automated formatting and documentation tooling alike, and would greatly benefit from a holistic approach.
Well, the way docs are implemented right now is that the parser shoehorns doc comments into somewhat random locations in the AST and gives this to nim doc, but there isn't a comprehensive set of rules for what comment belongs to which item:
proc f(
  arg: int, ## Is this a comment that belongs to arg?
  ## What about this one? Or does it belong to arg2
  arg2: int,
  ## What about this one? and why does it need a comma above?
)
type X = object
  ## Is this a comment about X or the field that follows?
  field: int
The parser is full of little quirks like that which would benefit from a simple-to-remember rule, such as "require indent for doc comments, attach them to the less indented thing".
proc f(
  arg1: int
    ## Indented and belongs to "previous thing"
  arg2: int
)
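To make the proposed rule a bit more concrete, here is a minimal sketch of "attach a doc comment to the nearest preceding, less indented thing". It works on plain lines of text rather than on a real AST, and `Item`, `indentOf` and `attachDocs` are hypothetical names made up for the illustration. Nothing here reflects how the Nim parser or nph is actually implemented.

```nim
import std/strutils

type Item = object
  text: string       # the declaration line, stripped of whitespace
  indent: int        # number of leading spaces on that line
  docs: seq[string]  # doc comments attached to this item

proc indentOf(line: string): int =
  # Count leading spaces (tabs are not handled in this sketch).
  for c in line:
    if c == ' ':
      inc result
    else:
      break

proc attachDocs(src: string): seq[Item] =
  for line in src.splitLines():
    let stripped = line.strip()
    if stripped.len == 0:
      continue
    let ind = indentOf(line)
    if stripped.startsWith("##"):
      # Walk back to the nearest item that is less indented than the comment.
      var i = result.high
      while i >= 0 and result[i].indent >= ind:
        dec i
      if i >= 0:
        result[i].docs.add stripped.substr(2).strip()
      # A comment with no less indented predecessor is dropped in this sketch.
    else:
      result.add Item(text: stripped, indent: ind)

const example = """
proc f(
  arg1: int
    ## Indented and belongs to "previous thing"
  arg2: int
)
"""

let items = attachDocs(example)
for item in items:
  echo item.text, " <- ", item.docs
```

With this rule the comment lands on `arg1` purely because of its indentation, without needing a comma or any lookahead to the next parameter.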
Unfortunately, it looks like the parser / grammar for doc comments was built .. incrementally .. so there are many conventions in the current parser, many of them arbitrary / unstructured.
One option is indeed to go the javadoc way of repeating everything in the doc comment, but I'd prefer an approach where the grammar and parsers are reviewed to make it more regular where doc comments may appear and how they "attach" to a specific element. That way we avoid needless repetition while maintaining syntactic simplicity (such as reusing the "ideas" of indent representing structural relationships).
> core of the problem .. no dedicated slot
Yes and no. It's relatively easy to add a "comment slot" to every node (the upstream AST has a magic comment field that is used sometimes, and nph has 3: pre-, mid- and post-comment). What's hard is knowing which node a particular comment belongs to while parsing. For this, there is no consistent rule that can be applied, partially because the grammar itself is somewhat random for different nodes, both when parsing comments and when parsing the nodes themselves: `optInd` vs `flexComment` vs `indAndComment` vs "mandatory vs optional comma" vs "mandatory vs optional indent" and so on. Many composed nodes (like identdefs but also procty etc) differ in where indent and comments are allowed "inside" the node (before and after the `:`, before and after the `=` etc).
When we also take into account concepts, generics, post and do expressions, they all show significant departures from each other in terms of rules for whitespace and comments, so the proper "attaching" of comments to nodes becomes problematic in and of itself, even if a proper space for them existed in the AST.
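As a rough illustration of why the slot itself is the easy part, here is a toy node type with three comment slots. The type and the field names `prefix`, `mid` and `postfix` are assumptions made for this sketch and are not meant to mirror the actual nph or compiler AST.

```nim
type
  Node = ref object
    kind: string          # e.g. "identDefs"; a real AST would use an enum
    prefix: seq[string]   # comments that come before the node
    mid: seq[string]      # comments inside the node, e.g. after `:` or `=`
    postfix: seq[string]  # comments that come after the node
    sons: seq[Node]

proc newNode(kind: string, sons: varargs[Node]): Node =
  Node(kind: kind, sons: @sons)

let arg = newNode("identDefs")
let arg2 = newNode("identDefs")
let params = newNode("formalParams", arg, arg2)

# Storing the comment is trivial once a node has been picked...
arg.postfix.add "## What about this one? Or does it belong to arg2"
# ...but nothing in the grammar says it should not have been this instead:
# arg2.prefix.add "## What about this one? Or does it belong to arg2"
```

Both placements are representable, so the slots alone don't settle anything. The decision has to come from a rule in the grammar.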
> it's relatively easy to add a "comment slot" to every node
Yeah, but that's what got us into the mess. If the doc comment can be attached to every node you end up with guessing and heuristics and a sloppy way of doing things.
A doc comment cannot refer to "any" node; it's documentation that belongs to a declarative construct.
Fair - nph has to handle non-doc-comments as well however, hence the "every node" thing in nph specifically.
By holistic, I mean more or less this: the problem of comments, doc comments and formatting would benefit from a review that takes all 3 into account, and it's worth dedicating an iteration of quality thought to.
Just thank you again for this piece of technology.
On top of formatting, I've found it very useful when nph does NOT format on save: it helps to distinguish an erroneous AST from a generic compile-time error.
"don't-waste-a-line", but...
let x =
  somevariable
  .function0(32)
  .function1(42)
and not
let x = somevariable
  .function0(32)
  .function1(42)