nimforum mirror - 12+ naming rules

Memophenon [2016-07-10T18:16:01+02:00]

As a fresh Adam in the Nim paradise, I didn't want to start too ambitiously. Just do some thinking about and experimenting with naming the other creatures. Oof, quite a tough job! I've found 12+ laws of nature so far, that apply to both keywords and identifiers:

Most characters above U+001F are allowed (e.g. déjà_vu), but some require the name to be stropped, e.g. {`}2nd_thought{`}, or {`}echo{`} as a variable name [{`} means a backquote]. Forbidden characters are the hash (#), comma (,), semicolon (;), and U+007F (DEL), unless quoted as a character or string.

Single and double quotes must constitute valid character and string values, e.g. {`}A'b'cde{`} and {`}A"bc"de{`}. Or they must be quoted as character or string themselves.

Stropping delimiters cannot themselves be stropped, except when quoted as a character or string.

Spaces and matching pairs of single and double quotes are removed, so {`} A 'b'c de {`} ⇒ Abcde.

The first character after removal of the spaces and quotes is case-sensitive.

The next characters are case-insensitive, as far as they are within the ASCII range U+0020 – U+007F. In fact, the uppercase letters within this range are converted to lowercase. So ABÇDÉ ⇒ AbÇdÉ (not Abçdé).

A single underscore is used for discarding values, as in let (first, _) = (x: 100, y: 50).

Underscores are not allowed (a) at the beginning, (b) at the end, (c) directly after another underscore, or (d) after an en-dash.

Other underscores are removed, so A_bc_de ⇒ Abcde.

En-dashes are not allowed (a) at the end or (b) directly before an underscore.

En-dashes are not removed. Nevertheless, they are all ignored when comparing names, except for an en-dash at the first position.

When you use characters in the ranges U+0020 – U+002F, U+003A – U+003F, U+005B – U+005E, and U+007B – U+007E, a plethora of more lenient but also more complex rules fall to you. For instance, {`}_ A __(B C)_.DE!_{`} is a valid variable name, which can be referred to as {`}_ a __(bc)_.de!_{`} (you still need the spaces around the a to make it a valid expression, but the space between b and c can be omitted). So you can start and end an identifier with an underscore, if you really want to.

I may have missed something. Actually I'm looking for some bible that describes the rules in more detail and in a more consistent way than the empirical ones above. But maybe it's too early for that.

Anyway, if I have to distill a commandment from all this, it would be "Stick to the alphabet and numbers, or thou wilt be driven out of paradise."

Krux02 [2016-07-10T19:30:51+02:00]

Here are my two cents on this topic. I think these rules are too complicated. I would prefer to focus on my problem rather than on whether two identifiers are the same when they don't appear to be the same.

But yes, I really do think this process should be documented. I would also add information on the normalized identifier representation, which can be used to check whether two identifiers are equal by just comparing two strings.

Araq [2016-07-10T20:12:17+02:00]

Please enlighten us as to how you came up with these rules, most of which are wrong and are to be found neither in the compiler's source code nor in Nim's spec. :-)

Memophenon [2016-07-10T20:50:31+02:00]

@Araq: Simply by trying things out and attempting to make out some patterns in the observed behaviour. And having a peek now and then at the resulting C source.

As you can see, I gave up at rule twelve to be more specific. It became too cluttered when I experimented with exclamation marks and dollars, especially in combination with underscores and en-dashes. So it would be better indeed to build up the rules from the source, in my opinion. On the other hand: tests have the last word.

Araq [2016-07-10T21:17:16+02:00]

"And having a peek now and then at the resulting C source."

Which is wrong. The generated C code is irrelevant for identifier equality. The compiler could emit NIM_$ID for everything instead and yet it wouldn't affect Nim.

Araq [2016-07-10T21:26:26+02:00]

The rules are more like the following:

Underscores and em-dash are ignored except that '_' is a "don't care" identifier.

First char is CS, others in the range A-Za-z are CI.

Underscores and em-dash are separators.

Backticks can be used to construct other identifiers where everything in the backticks has to be a valid token. Whitespace between the tokens is ignored.

That's 4 rules and the backtick rules are mostly irrelevant in practice. For example, in Java you can either write π or \u03C0. Does that mean I need to worry all the time about my hypothetical Java code becoming unreadable anytime soon? Hardly.
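The first three rules amount to a normalized form that can be compared as a plain string. What follows is only a rough sketch of that idea in Python, not the compiler's implementation: the names normalize_ident and same_ident are made up for illustration, the dash is assumed to be U+2013 (the en-dash used in the tests earlier in this thread), and no validity checking is attempted.

```python
# Sketch of identifier comparison under the rules above.
# Assumptions: the dash separator is U+2013 (en-dash); inputs are already
# valid identifiers, so no error checking is done here.

EN_DASH = "\u2013"


def normalize_ident(name: str) -> str:
    """Return a comparison key for an identifier (hypothetical helper)."""
    # A lone underscore is the "don't care" identifier and is kept as-is.
    if not name or name == "_":
        return name
    first, rest = name[0], name[1:]
    kept = []
    for ch in rest:
        # Underscores and dashes are separators and are ignored.
        if ch == "_" or ch == EN_DASH:
            continue
        # Only ASCII A-Z is folded to lowercase; other letters stay as typed.
        if "A" <= ch <= "Z":
            ch = chr(ord(ch) + 32)
        kept.append(ch)
    # The first character is case-sensitive, so it is never folded.
    return first + "".join(kept)


def same_ident(a: str, b: str) -> bool:
    """Two identifiers are equal when their normalized forms match."""
    return normalize_ident(a) == normalize_ident(b)
```

With this sketch, same_ident("fooBar", "foo_bar") holds, while same_ident("Foo", "foo") does not, because the first character is compared as written.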

Memophenon [2016-07-10T22:04:35+02:00]

@Araq 19:17:16: That makes the distinction between rule 9 ('underscores are removed') and a part of rule 11 ('en-dashes are not removed but ignored') irrelevant indeed. Most of my rules are based on the behaviour of the executables or the compilability of the source, however.

@Araq 19:26:26: Those four rules don't explain everything. I'm not quite sure what you mean by your last paragraph. In general, different representations of essentially the same character don't make life easier when you want to search through the source for occurrences of them, but that topic has been discussed elsewhere. I was just overwhelmed by the complexity of the whole thing, that's all. Why are {`}!_:!{`} and {`}!:_!{`} okay, but {`}!_:_!{`} not? It makes me curious about the underlying mechanisms, in case I want to play with it. In most cases, the alphabet and numbers would suffice for me, and I would certainly avoid pathological ones like this example.

Araq [2016-07-10T22:20:58+02:00]

This is the grammar rule:

symbol = '`' (KEYW|IDENT|literal|(operator|'('|')'|'['|']'|'{'|'}'|'=')+)+ '`'

':' is not an operator. From the manual: "=, :, :: are not available as general operators; they are used for other notational purposes." (http://nim-lang.org/docs/manual.html#lexical-analysis-operators)

Hence ! :! is valid (operator followed by operator), ! : ! is not (operator followed by colon followed by operator). Simple. :P
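The operator part of that grammar rule can be sketched as a small check. This is a simplification in Python, not the real lexer: it assumes the tokens inside the backticks are separated by whitespace, it covers only the operator and bracket alternatives (not KEYW, IDENT or literal), and the function names are made up. The operator character set is the one listed in the operators section of the manual.

```python
# Sketch of the "(operator|'('|')'|'['|']'|'{'|'}'|'=')+" part of the rule.
# Assumptions: tokens are pre-split on whitespace; identifiers, keywords and
# literals are out of scope for this check.

# Characters that may form an operator, per the manual's operators section.
OP_CHARS = set("=+-*/<>@$~&%|!?^.:\\")

# Spellings built from operator characters that are not general operators.
NOT_GENERAL_OPERATORS = {"=", ":", "::"}

# Single tokens the grammar rule allows explicitly besides operators.
ALLOWED_PUNCT = {"(", ")", "[", "]", "{", "}", "="}


def is_general_operator(tok: str) -> bool:
    """True if tok is made of operator characters and is not =, : or ::."""
    return (
        bool(tok)
        and all(c in OP_CHARS for c in tok)
        and tok not in NOT_GENERAL_OPERATORS
    )


def accepts_operator_tokens(body: str) -> bool:
    """Check a backtick body consisting only of operator-like tokens."""
    tokens = body.split()
    return bool(tokens) and all(
        is_general_operator(t) or t in ALLOWED_PUNCT for t in tokens
    )
```

Under these assumptions, accepts_operator_tokens("! :!") is true because ":!" counts as one operator, while accepts_operator_tokens("! : !") is false because the lone ":" is neither an operator nor one of the listed brackets.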

Memophenon [2016-07-10T22:57:29+02:00]

@Araq: I guess that explains most of the apparently whimsical behaviour pointed at in rule 12, if not all. Have to rethink {`}echo___echo{`} (wrong) vs {`}echo_echo{`} (right, collapses to echoecho) vs {`}!___!{`} (right, collapses to {`}!!{`}). Still don't see that one.

Only 11 rules to go. :-)

Araq [2016-07-10T23:36:38+02:00]

"Still don't see that one."

That's 3x the valid token _.

"Only 11 rules to go."

It's really only 4 rules if you don't describe them in the most convoluted manner possible while at the same time mixing them with codegen choices to make a point.

moigagoo [2016-07-11T07:01:36+02:00]

My first and only thought about these "12 rules" is "WTF." Why would you do something like that? Reverse-engineer the naming rules that are already defined in the docs? Write them down so obscurely they're impossible to actually understand? Mix naming conventions with naming restrictions?

Memophenon [2016-07-11T22:50:45+02:00]

Please don't get me wrong about my intentions. If there's any point I want to make, it's about my ignorance.

I simply started with the link mentioned by moigagoo and the explanation in the first chapter of Nim in Action, did some experiments, found some behaviour that didn't seem to be described by those texts, and tried to find more patterns in what I had observed. This is not a straightforward process. Hence the somewhat unorganized abundance of 12+ rules. I don't want to fight for them; quite the contrary, I want them to be reduced to a clear and compact system from which nothing can be removed without losing completeness. Any help with that is appreciated.

In spite of the grammar rule that Araq has mentioned, which indeed reveals a lot more about the underlying mechanisms, I don't think we are at that stage of completeness yet, speaking about documentation. I still fail to understand why {`}!___!{`} or {`}___{`} is 3 times a valid _ token (collapsing to {`}!!{`} or {`}_{`} respectively), and {`}echo___echo{`} is not, for instance. I've no idea why x_–y, x––y and –xy are good, and x–_y, x__y and _xy are bad, without adding more rules than I have read so far in the documentation (the dash-like symbols here are en-dashes). Yes, these are edge cases, and probably of no practical value in daily affairs. But what's wrong with that in the context of understanding a language's design?

@moigagoo: Conventions and restrictions are two different things. I was just aiming at the restrictions (or to put it otherwise: degrees of freedom) for now. Some Nim conventions can be found here. I've read them and I do know conventions are a subset of restrictions. If I have used some confusing terms in this thread, I apologize for that.

@Araq: Are naming rules a good starting point for learning a new language? In this case, I'm inclined to think not. I've noticed the structure of Nim is reflected in the naming to some extent. I can see the beauty of that, but it's also a complication in mastering (and documenting) Nim. Maybe I should study Nim for a year or so before trying to really understand the production rules for names.

Araq [2016-07-12T01:12:55+02:00]

Your newly found edge cases have all been fixed, thanks. And the 4 rules that I gave still apply. ;-)

PS: I really dislike this em-dash special casing and might remove it from the language again. I never understood why the fonts cannot be patched instead so that the underscore looks more like a dash...

Mirror of forum.nim-lang.org

2367 :: 12+ naming rules