nimforum mirror - Horizontal parsing

Araq [2013-08-29T00:18:03+02:00]

Nimrod might get "Horizontal parsing" in one of its next versions. Horizontal parsing means that instead of hard-coding the precedence rules in the grammar, the whitespace is used to determine the precedence in a particular context. This means i+1 * 3 is parsed as (i+1) * 3. Now that may be highly controversial for + and *, but it looks better when you want to introduce Unicode operators: i ø 3→8 means i ø (3→8).
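As a rough illustration of the idea (not Nimrod's actual parser), the rule can be sketched in Python. The sketch assumes that only binary operators occur, that operands are plain ASCII identifiers or numbers, that every other non-space character is an operator character, and that the whitespace in front of an operator decides how loosely it binds. The names OPERATOR and group are made up for this example.

```python
import re

# A run of operator characters plus the whitespace on either side of it.
# Assumption: anything that is not an ASCII word character, whitespace or a
# parenthesis counts as an operator character.
OPERATOR = re.compile(r"(\s*)([^A-Za-z0-9_\s()]+)(\s*)")


def group(expr: str) -> str:
    """Fully parenthesise expr; operators with more leading whitespace bind looser."""
    expr = expr.strip()
    ops = [(len(m.group(1)), m) for m in OPERATOR.finditer(expr)]
    if not ops:
        return expr  # a bare operand
    loosest = max(width for width, _ in ops)
    # Split at the right-most loosest operator so equal spacing is left-associative.
    m = [m for width, m in ops if width == loosest][-1]
    left = group(expr[:m.start()])
    right = group(expr[m.end():])
    return f"({left} {m.group(2)} {right})"


print(group("i+1 * 3"))   # ((i + 1) * 3)
print(group("i ø 3→8"))   # (i ø (3 → 8))
```

With equal spacing everywhere the sketch simply groups from left to right; it has no precedence table at all.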

For Unicode operators a table like in http://nimrod-code.org/manual.html#precedence would be much too large to remember. In fact the table already is too large and even I do not remember all of its details. ;-)

Another advantage of horizontal parsing is that it fixes some minor issues with Nimrod's current syntax: echo $1 is currently parsed as (echo) $ (1), which is a very common trap to fall into. Likewise, echo (1, 2) should write the tuple (1, 2) rather than being treated as the function application echo(1, 2).
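One way to picture how whitespace could resolve these two cases is the following Python sketch. It is only a guess at the kind of rule involved, not the Nimrod implementation; operator_role and paren_starts_call are hypothetical helper names, and the "postfix" result is included only for symmetry.

```python
def operator_role(source: str, index: int) -> str:
    """Classify the one-character operator at source[index] by the whitespace next to it."""
    space_before = index == 0 or source[index - 1].isspace()
    space_after = index + 1 >= len(source) or source[index + 1].isspace()
    if space_before and not space_after:
        return "prefix"   # `echo $1`: the `$` sticks to the `1`
    if space_after and not space_before:
        return "postfix"
    return "infix"        # `echo $ 1` or `echo$1`: the spacing is symmetric


def paren_starts_call(source: str, index: int) -> bool:
    """True if no whitespace precedes the `(` at source[index], as in `echo(1, 2)`."""
    return index > 0 and not source[index - 1].isspace()


print(operator_role("echo $1", 5))          # prefix
print(operator_role("echo $ 1", 5))         # infix
print(paren_starts_call("echo(1, 2)", 4))   # True
print(paren_starts_call("echo (1, 2)", 5))  # False
```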

Any opinions?

MFlamer [2013-08-29T05:19:15+02:00]

I'm new, so this may be common knowledge: can we assign operator precedence ourselves? That seems easier than a table of complicated rules.

adrianv [2013-08-29T09:13:55+02:00]

For Unicode operators I understand the motivation, but I see some problems with existing code. Maybe it's a good idea to make an exception for + - * (if you copy/paste code from another language you might get unexpected results). For logical ops I don't see a problem, because the rules differ from language to language; I never trust them and use parentheses.

Btw, how will this be parsed then? (What counts for the precedence, the leading or the trailing whitespace? Maybe they must be the same.)



var x = i+1 *
  3

enurlyx [2013-08-29T12:33:50+02:00]

I like the idea. As you said, the table would be too large; does that mean there will be no precedence defined? Would the following throw a compile-time error, or will it be evaluated from left to right?

x = a+b*c

exhu [2013-08-29T12:35:57+02:00]

This looks error-prone to me. I'm used to not having to account for whitespace, and I can see this causing lots of hidden bugs...

Araq [2013-08-29T13:00:00+02:00]

There will still be special rules for when there is no surrounding whitespace, so a+b*c produces the expected a+(b*c). Also, things like x == 3 and y == 4 will continue to work as before.
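One possible reading of such a fallback, sketched in Python: whitespace decides first, and a conventional precedence table is consulted only when two operators have the same amount of leading whitespace. The table TABLE, the function name and the tie-breaking order are assumptions of this sketch, not the rules Nimrod uses; word operators such as "and" and parentheses are not handled.

```python
import re

OPERATOR = re.compile(r"(\s*)([^A-Za-z0-9_\s()]+)(\s*)")

# A tiny, hypothetical fallback table: higher numbers bind tighter.
TABLE = {"+": 1, "-": 1, "*": 2, "/": 2}


def group_with_fallback(expr: str) -> str:
    """Parenthesise expr using whitespace first and the table as a tie-breaker."""
    expr = expr.strip()
    matches = list(OPERATOR.finditer(expr))
    if not matches:
        return expr  # a bare operand

    def looseness(m):
        # More leading whitespace is looser; at equal spacing a lower table
        # precedence is looser; at equal precedence the right-most operator
        # is split first, which keeps the result left-associative.
        # Operators missing from the table get precedence 0 (loosest).
        return (len(m.group(1)), -TABLE.get(m.group(2), 0), m.start())

    m = max(matches, key=looseness)
    left = group_with_fallback(expr[:m.start()])
    right = group_with_fallback(expr[m.end():])
    return f"({left} {m.group(2)} {right})"


print(group_with_fallback("a+b*c"))    # (a + (b * c))
print(group_with_fallback("i+1 * 3"))  # ((i + 1) * 3)
```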

Btw, how will this be parsed then? (What counts for the precedence, the leading or the trailing whitespace? Maybe they must be the same.)


var x = i+1 *
  3

Leading whitespace is what counts for this very reason. Or maybe the newline counts as many spaces.
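To make the difference concrete, here is a small Python sketch that measures the whitespace on both sides of the operator in the wrapped example. The helper name whitespace_around is hypothetical, and counting the newline as one whitespace character is just a choice made for this illustration.

```python
def whitespace_around(source: str, op: str):
    """Return (leading, trailing) whitespace widths around the first occurrence of op."""
    i = source.index(op)
    before = source[:i]
    after = source[i + len(op):]
    leading = len(before) - len(before.rstrip())
    trailing = len(after) - len(after.lstrip())
    return leading, trailing


wrapped = "var x = i+1 *\n  3"
print(whitespace_around(wrapped, "*"))  # (1, 3): one space before, newline plus indent after
```

The leading width stays the same no matter where the line is wrapped after the operator, while the trailing width depends on the indentation of the continuation line.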

dom96 [2013-08-29T23:13:49+02:00]

Well, at first I feared that looking at the following:


var x = i + 5  *  34

I would have to count the number of spaces manually to figure out the precedence.

But with monospace fonts it's clear enough; I just hope there will be no surprises in practice.

wkornewald [2013-08-30T09:58:19+02:00]

I think that having Unicode operators is already not a good idea. The code becomes more difficult to write and especially novice users won't know how to even type those characters. Good design strives to be self-explanatory and that's best done with the characters that have labels on our keyboards.

Horizontal parsing is also very unusual and error-prone when modifying code. You might just want to add some whitespace for readability or tune the formula a little bit and introduce some "innocent" whitespace and suddenly the meaning of your formula has changed. Overall, I don't think it's worth pursuing this feature because it doesn't add significant value and at the same time it introduces another possible bug source.

For me personally, getting rid of forward declarations would be a far greater improvement than horizontal parsing.

In general, having great solutions for large target audiences would be much more beneficial for Nimrod. For example, there still is no good solution for sharing the same code base between iOS and Android and desktop apps. Also, maybe Nimrod could become interesting for the scientific computing crowd or for game developers (everything performance-critical where people still use low-productivity languages) or maybe even web apps if it had nice libraries for those.

EDIT: And regarding your echo example, I'd rather make Nimrod's syntax less ambiguous and have just one obvious way to call functions than introduce yet more complexity just to solve a problem which in itself was born from unnecessary complexity. If you keep things simple, obvious, and non-ambiguous then such problems won't come up. Complexity breeds more complexity.

Araq [2013-08-30T10:33:53+02:00]

I think that having Unicode operators is already not a good idea. The code becomes more difficult to write and especially novice users won't know how to even type those characters. Good design strives to be self-explanatory and that's best done with the characters that have labels on our keyboards.

It's 2013 now. When do you think we might get rid of ASCII as a lowest common denominator? ;-)

Horizontal parsing is also very unusual and error-prone when modifying code. You might just want to add some whitespace for readability or tune the formula a little bit and introduce some "innocent" whitespace and suddenly the meaning of your formula has changed.

I cannot see how whitespace that contradicts precedence can ever improve your formula's readability.

EDIT: And regarding your echo example, I'd rather make Nimrod's syntax less ambiguous and have just one obvious way to call functions than introduce yet more complexity just to solve a problem which in itself was born from unnecessary complexity.

The command syntax has been designed with the macro system in mind so that


m x:
  a

resembles


if x:
  a

There is no unnecessary complexity here. Ymmv of course.

wkornewald [2013-08-30T11:22:18+02:00]

It's 2013 now. When do you think we might get rid of ASCII as a lowest common denominator?

I'm not saying that Unicode looks bad. It looks great. The problem is, nobody can comfortably enter Unicode characters on our ASCII keyboards. It doesn't matter that it's 2013 because our input devices haven't significantly changed since 1984. The Fortress scientific programming language team also thought Unicode was a fantastic idea because math formulas often contain special characters, but in their wrapping-up report they weren't as hyped about Unicode anymore (see "Mathematical syntax"): https://blogs.oracle.com/projectfortress/entry/fortress_wrapping_up

Note that their target audience generally writes lots of math formulas and they found entering Unicode daunting. So, I'm not making this up. Others have already tried this with actual users and it didn't work out in practice.

I cannot see how whitespace that contradicts precedence can ever improve your formula's readability.

I agree with you, but that's not the point. People are responsible for making their code readable and no sane developer would ever write something like 1+2 * 3 and mean 1 + (2 * 3). In practice, that problem doesn't exist, so it doesn't need to be solved. ;)

However, what does happen in practice is that you have to wrap a long formula over multiple lines (e.g. when rearranging it for readability) and in that case whitespace significance will get in your way. Also, people new to the language will have additional problems reading the code because it's unnecessarily unusual. Why break established rules just to get a minor improvement in edge cases?

The command syntax has been designed with the macro system in mind so that [...]

Ah, I see. Could we somehow solve this ambiguity for echo? E.g., always require parentheses (like Python did with print() in the transition from version 2 to 3)?

By the way, is there an easy way to quote a post in this forum somehow?

tjpalmer [2014-07-31T18:34:42+02:00]

What is the future of strongSpaces? Is it likely to stay supported? Might it become default? I think a compiler option might at least be nice, to avoid needing to specify it on each file, if a project wants to use it throughout.

I ask because I've seen and thought about this feature before, and I think I like it, but I haven't used it in any language in practice. Either way, my inclination is that it's also nice to be able to look at someone else's Nimrod code and know how to parse it without knowing what options they've set.

(Anyway, I'm new to this forum. Been watching Nimrod some for a while. I just hadn't asked any questions yet. And I've never used Nimrod in anger and only toyed with it a little, and it's not all my style, but I still find it very interesting.)

fuzzthink [2014-07-31T19:30:17+02:00]

The project I'm working on in Nimrod actually involves heavy use of operators. Being new to Nimrod, I had to look up the precedence table all the time, which is not only a pain but, more importantly, interrupts my train of thought. Horizontal parsing would be a blessing, since I'm already using whitespace for readability.

tjpalmer [2014-07-31T20:48:50+02:00]

Also, I might simplify the rules to 0 spaces (both sides), 1 or more spaces (both sides), or line break, for three levels of precedence. If space/line-break only on one side, that implies prefix/postfix.
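The simplified rule above can be written down as a small classifier. This is a Python sketch of the suggestion as stated, with a hypothetical function name and hypothetical labels; it takes the whitespace found immediately before and after an operator.

```python
def classify(left_ws: str, right_ws: str) -> str:
    """Classify an operator from the whitespace on its left and right."""
    if bool(left_ws) != bool(right_ws):
        # Whitespace on one side only: the operator sticks to the other side.
        return "prefix" if left_ws else "postfix"
    if "\n" in left_ws or "\n" in right_ws:
        return "level 2 (line break, loosest)"
    if left_ws:
        return "level 1 (one or more spaces)"
    return "level 0 (no spaces, tightest)"


print(classify("", ""))        # level 0 (no spaces, tightest)
print(classify("  ", " "))     # level 1 (one or more spaces)
print(classify(" ", "\n  "))   # level 2 (line break, loosest)
print(classify(" ", ""))       # prefix
```

Under this scheme the exact number of spaces never matters, so alignment with extra spaces cannot change the meaning of an expression.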

If anything more than that is needed, I'd recommend knowing precedence or using parens. I think the 0, 1, 2, 4, or 8 thing is awkward and odd looking. And although I don't usually, some people might use spacing for alignment, ...

If I use Nimrod's current strongSpace rules, I'd likely only ever use 0 or 1 spaces myself, and I'd likely use 2 before a line break just to approximate my line-break third-level preference. That is, I'd probably pretend to use my suggested rules instead of the ones Nimrod currently provides, if I do end up using Nimrod at some point.

(And I'm also likely against Unicode operators, as a complete aside.)

Anyway, I've already said more than I can justify by my newness and noncommitment here, so I'll stop.

Araq [2014-08-01T01:24:41+02:00]

Anyway, I've already said more than I can justify by my newness and noncommitment here, so I'll stop.

No worries, I like your suggestions. I have an idea of how to add postfix operators to the language, but people are already scared enough so this will wait for version 2. ;-)

Unfortunately, I didn't get much feedback about the implemented #! strongSpaces syntax option from people who actually tried it, so I don't plan on making it the default before version 1 is out. But it won't disappear either.

tjpalmer [2014-08-01T16:54:04+02:00]

Thanks much for the feedback!

On the issue that I'd perhaps like to be able to make it standard across a project (without repeating it in each file), I at least tried out the 'include' feature, and it didn't even propagate across the file boundary. Maybe if there's some extension (other than '.nim') to specify a file as a fragment (not to be used alone), it could be okay to let settings like this bleed across includes?

hweller [2015-08-03T17:57:05+02:00]

I am very much in favor of Unicode operators and added them to XL2 so that I could implement field algebra with appropriate operators and precedence.

One option in Nim would be to implement all the generally useful operator symbols with fixed precedence as was done in Fortress, see chapter 16 in

http://www.ccs.neu.edu/home/samth/fortress-spec.pdf

Interestingly, the "juxtaposition" operator is also supported, which allows for very readable mathematical expressions. I simulated this in XL2 by supporting a Unicode narrow space as equivalent to a multiplication, which would also work well with horizontal parsing, but this may be considered as going too far by many. Anyway, Fortress offered everything one could possibly want to implement mathematical formulas in a clean and readable form.

Another option would be to allow for the specification of additional operator symbols and their precedence in the compiler configuration file and build the compiler with them.

A third option would be to support user-defined operator symbols and their precedence in code.
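To show what user-declared precedence amounts to on the parser side, here is a minimal precedence-climbing sketch in Python. It is not a proposal for Nim syntax; PRECEDENCE, declare_operator, tokenize and parse are hypothetical names, all operators are treated as binary and left-associative, and parentheses are not supported.

```python
import re

# Operator symbol -> precedence (higher binds tighter). Starts with two built-ins.
PRECEDENCE = {"+": 1, "*": 2}


def declare_operator(symbol: str, precedence: int) -> None:
    """Register a user-defined binary operator."""
    PRECEDENCE[symbol] = precedence


def tokenize(source: str):
    # Operands are ASCII words or numbers; any other non-space run is an operator.
    return re.findall(r"[A-Za-z0-9_]+|[^A-Za-z0-9_\s]+", source)


def parse(tokens, min_prec=0):
    """Precedence climbing; consumes tokens and returns a nested (op, left, right) tuple."""
    left = tokens.pop(0)
    while tokens:
        op = tokens[0]
        if op not in PRECEDENCE:
            raise SyntaxError(f"operator {op!r} has no declared precedence")
        if PRECEDENCE[op] < min_prec:
            break
        tokens.pop(0)
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left


declare_operator("ø", 1)
declare_operator("→", 3)
print(parse(tokenize("i ø 3→8")))  # ('ø', 'i', ('→', '3', '8'))
print(parse(tokenize("a+b*c")))    # ('+', 'a', ('*', 'b', 'c'))
```

Whitespace plays no role here: the grouping comes entirely from the declared numbers, which is the trade-off compared with horizontal parsing.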

If supporting user-defined operator symbols would be easier with horizontal parsing then I would be happy with that as using spaces in expressions to clarify the precedence is good practice anyway.

If adding Unicode operator support to Nim is something the core team would like to see happen I would be interested in getting involved in the implementation.

jibal [2015-08-03T20:10:46+02:00]

Nim's lexer has very naive support for Unicode. If a token starts with an ASCII letter or any byte with the high bit set, then all subsequent ASCII letters, ASCII digits, and bytes with the high bit set are part of the token, which is considered an identifier. If a token starts with any of the ASCII operator characters, then any subsequent such character or any byte with the high bit set is part of the token, which is considered an operator.
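The byte-level rule just described can be modelled in a few lines of Python. This is a sketch of the description above, not a port of the Nim lexer: the exact set in OP_CHARS is an assumption, the "num" token kind is added only so that digits are not dropped, and all other bytes are skipped.

```python
import string

LETTERS = set(string.ascii_letters.encode())
DIGITS = set(string.digits.encode())
# ASCII operator characters; this particular set is an assumption of the sketch.
OP_CHARS = set(b"=+-*/<>@$~&%|!?^.:\\")


def high_bit(byte: int) -> bool:
    """True for any byte of a multi-byte UTF-8 sequence."""
    return byte >= 0x80


def ident_byte(byte: int) -> bool:
    return byte in LETTERS or byte in DIGITS or high_bit(byte)


def op_byte(byte: int) -> bool:
    return byte in OP_CHARS or high_bit(byte)


def lex(source: str):
    """Tokenise the UTF-8 bytes of source with the naive rule."""
    data = source.encode("utf-8")
    tokens, i = [], 0
    while i < len(data):
        first = data[i]
        if first in LETTERS or high_bit(first):
            kind, continues = "ident", ident_byte
        elif first in OP_CHARS:
            kind, continues = "op", op_byte
        elif first in DIGITS:
            kind, continues = "num", DIGITS.__contains__
        else:
            i += 1  # whitespace and anything else is skipped
            continue
        j = i + 1
        while j < len(data) and continues(data[j]):
            j += 1
        tokens.append((kind, data[i:j].decode("utf-8")))
        i = j
    return tokens


print(lex("x→y"))     # [('ident', 'x→y')]
print(lex("a +→ b"))  # [('ident', 'a'), ('op', '+→'), ('ident', 'b')]
print(lex("a → b"))   # [('ident', 'a'), ('ident', '→'), ('ident', 'b')]
```

The first call shows a Unicode character being absorbed into the preceding identifier, and the last one shows a lone Unicode symbol being lexed as an identifier rather than an operator.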

This is clearly inadequate because it makes no distinction between Unicode character classes, it treats any Unicode character as an identifier character if preceded by an identifier character, and as an operator character if preceded by an operator character (so any Unicode operator must be separated from a preceding identifier by space, else it is parsed as part of the identifier), and operators that contain Unicode characters still have to start with ASCII characters. (And if that is changed we're left with the issue of what precedence they have.)

For proper Unicode support the lexer needs to be aware of the types of Unicode characters and must treat the input as a sequence of Unicode characters rather than a sequence of bytes ... but hopefully without much impact on its performance. At one point I started familiarizing myself with the compiler code and the first thing I looked at was the lexer. I started writing a version of the lexer that was Unicode-aware without a performance hit, using a rewritten version of unicode.nim (the current one is far from compliant, and even less so now that Unicode 8 was recently released; mine generates Nim code from the Unicode data files, though the code generator isn't written in Nim). But the project got too big and I haven't finished it. Also I got carried away with my changes and it would probably be too much to be accepted as a PR.
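For comparison, a character-level classification based on Unicode general categories might look like the Python sketch below. It uses Python's standard unicodedata module and has nothing to do with the rewritten unicode.nim mentioned above; the mapping from categories to token kinds is an assumption, and identifiers containing digits are not handled.

```python
import unicodedata
from itertools import groupby


def char_class(ch: str) -> str:
    """Map one character to a coarse lexical class via its Unicode general category."""
    category = unicodedata.category(ch)
    if category.startswith("L") or ch == "_":
        return "ident"      # letters of any script
    if category == "Nd":
        return "digit"
    if category == "Sm":
        return "operator"   # math symbols such as + < = > and also → ×
    return "other"          # whitespace, punctuation, everything else


def lex(source: str):
    """Group consecutive characters of the same class into tokens, dropping 'other'."""
    return [(kind, "".join(chars))
            for kind, chars in groupby(source, key=char_class)
            if kind != "other"]


print(lex("x→y"))      # [('ident', 'x'), ('operator', '→'), ('ident', 'y')]
print(lex("i ø 3→8"))  # [('ident', 'i'), ('ident', 'ø'), ('digit', '3'), ('operator', '→'), ('digit', '8')]
```

Note that ø is a letter in Unicode terms, so a category-based lexer would treat it as an identifier character unless it is special-cased.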

mindplay [2015-08-08T18:06:24+02:00]

FWIW, I think the idea of using whitespace to control precedence is going to be a huge WTF to practically anyone coming from basically any mainstream language I can think of. It's a fairly exotic idea, it's no doubt going to throw a lot of people off the horse, and it could potentially keep this language from reaching anywhere near its potential.

I honestly would prefer having some operators that don't have defined precedence, if that simplifies things. Being forced to indicate precedence with parens is, sure, more verbose, but it's going to be a lot more intuitive to anyone experienced with mainstream languages. I think there's a limit to the number of surprises you can pack into a language, and I think you're already close to the sweet spot, a balance between the familiar and the new, introducing new only where it contributes substantial value to the overall language experience. Whitespace-based precedence sounds like the sort of thing that could slow down the uptake for newcomers substantially: there's already a wealth of concepts that are going to be totally new to a lot of developers, and this, in my opinion, is the sort of feature that is going to require a lot of reasoning to comprehend why the language would depart so radically from what most people are used to.
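A checker for the "no defined precedence, use parens" alternative could look roughly like this Python sketch. For brevity it applies the rule to every operator rather than only to some, treats any non-word, non-space character as an operator character, and uses hypothetical names throughout.

```python
import re

INNERMOST = re.compile(r"\(([^()]*)\)")      # a parenthesised group without nested parens
OPERATOR = re.compile(r"[^A-Za-z0-9_\s()]+")


def check_flat(expr: str) -> None:
    """Reject a paren-free expression that mixes different operators."""
    ops = set(OPERATOR.findall(expr))
    if len(ops) > 1:
        raise SyntaxError(f"mixing {sorted(ops)} needs parentheses: {expr.strip()!r}")


def require_parens(expr: str) -> bool:
    """Return True if every mix of different operators is disambiguated by parentheses."""
    while True:
        m = INNERMOST.search(expr)
        if m is None:
            break
        check_flat(m.group(1))
        # Replace the checked group with a placeholder operand.
        expr = expr[:m.start()] + "_" + expr[m.end():]
    check_flat(expr)
    return True


print(require_parens("a + (b * c)"))  # True
print(require_parens("a + b + c"))    # True
try:
    require_parens("a + b * c")
except SyntaxError as err:
    print("rejected:", err)
```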

Just my two cents :-)

jibal [2015-08-08T20:39:21+02:00]

@mindplay

It's only applied in files that have

#! strongSpaces

Mirror of forum.nim-lang.org
