The API naming design document mentions that include and exclude should be abbreviated as incl and excl. These are used exclusively in the critbits, intsets and sets modules. However, they are essentially a different term for add and del/delete which are used for other types like sequences or strings.
What is the benefit of using these terms instead of the more familiar add and delete? I don't even understand why there is a card alias for len. I've never heard of card before and don't even know what it means as it is not mentioned in the design document.
> gradha: I've never heard of card before and don't even know what it means, as it is not mentioned in the design document.
It's for the cardinality of a set.
> never heard of card before
http://en.wikipedia.org/wiki/Cardinality
And incl vs add -- incl can be a NOP when the set already contains the element. Indeed, guessing the meaning of incl may not always work; in Ruby we have methods incl?() or include? with the question mark at the end to query whether the container includes an object, with a boolean result. I cannot say that I do not miss the ? a bit in Nimrod, and the same goes for the ! at the end of a method name for dangerous or self-modifying methods -- sort!() sorts in place, while sort() returns a sorted copy.
A related remark: Is there already a final solution for conflicts between variable names and keywords? Recently I had to use event.event_type for the Ruby GTK3 bindings -- it seems the authors regard type as a reserved word in Ruby. For Nimrod it certainly is. For Nimrod's GTK2 I have seen type with backticks, but someone else told me to just write typ without the terminating e. Neither is very nice.

To my understanding, Nimrod identifiers should contain only ASCII letters and digits, with underscores in between. Two strange non-printable characters are allowed as well -- I cannot remember their ASCII codes right now and do not understand why they are available. (One use may be automatic text substitution when a wrapper file is read -- I tried to replace glib's leading underscores for private names with these characters. It seems to work, and the advantage is that the wrapper text file still contains the original identifiers, which increases readability. Initially I replaced each leading _ with the text 'underscore', which is not too nice.)

If we had at least one printable ASCII character that is also allowed in identifiers, we could use it for conflicts between identifier names from libraries and Nimrod keywords. For example, we might write event.type@ to make it distinct from the keyword. If that appended character were well defined, everyone would know that identifiers identical to keywords must carry such a sign at the end.
> And incl vs add -- incl can be a NOP when the set already contains the element. Indeed, guessing the meaning of incl may not always work; in Ruby we have methods incl?() or include? ...
But it does work, we have incl vs containsOrIncl...
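To make the distinction concrete, here is a minimal sketch contrasting the two, using the intsets module mentioned in the thread (element values are arbitrary illustrations):

```nim
import intsets

var s = initIntSet()
s.incl(5)    # adds 5 to the set
s.incl(5)    # element already present: a NOP, not an error

# containsOrIncl does both jobs in one step: it reports whether
# the element was already in the set, and includes it if not.
if s.containsOrIncl(7):
  echo "7 was already in the set"
else:
  echo "7 has just been added"
```

So where Ruby would spell the query as a separate include? call, Nimrod folds the query and the mutation into containsOrIncl.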
> with the question mark at the end to query whether the container includes an object, with a boolean result. I cannot say that I do not miss the ? a bit in Nimrod, and the same goes for the ! at the end of a method name for dangerous or self-modifying methods -- sort!() sorts in place, while sort() returns a sorted copy.
Well, instead we got a type and effect system... Also, Ruby's naming convention makes very little sense: if "returns bool" gets a ? and "returns void" gets a !, what do you write for, e.g., "returns int" or "returns float"? Ah, you see, there are no cute symbols left for these, so they are not annotated at all. To me this suggests it was never about readability in the first place, but about looking sexy.
> If we had at least one printable ASCII character that is also allowed in identifiers, we could use it for conflicts between identifier names from libraries and Nimrod keywords. For example, we might write event.type@ to make it distinct from the keyword.
Well, but we already have a means for this via the backticks. I prefer 'typ' because it's easier to type, and I can't see how 'type@' would be better in this regard.
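For reference, the backtick form works anywhere an identifier is expected; a small sketch (the Event type and its field are just an illustration):

```nim
type
  Event = object
    `type`: string   # backticks let a keyword serve as an identifier

var e = Event(`type`: "button-press")
echo e.`type`        # accessed the same way, with backticks
```

The backticks are not part of the name; they only tell the parser not to treat the enclosed word as a keyword.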
> These are used exclusively in the critbits, intsets and sets modules. However, they are essentially a different term for add and del/delete which are used for other types like sequences or strings.
Well for most data structures add is different from incl when it comes to how duplicates are handled. I don't mind add as an alias except for the fact that I don't really like aliases to begin with. We'll also get &= as an alias for add, write or send so that data sinks can be handled in a generic way more easily.
> Well but we already have a means for this via the backticks. I prefer 'typ' because it's easier to type
The backtick usage -- I had only read about it for defining operators like +, and found out only by accident that it is also used when identifiers conflict with Nimrod keywords. Indeed, I also prefer typ over type with backticks; I used that for the GTK3 wrapper. But I am not sure it was a good decision: if the backtick usage is well defined, most people will assume that they have to use type with backticks, not typ with a missing e.
For a test I used this at a beginning of a wrapper file:
#! replace(sub = "_", by = "\x80") | replace(sub = "\t", by = " ")
It makes Nimrod happy with leading underscores in wrapper files. And the second replace allows me to continue using tabs as I did before my Nimrod time. Is at least the first substitution OK?
And still the question about the two nonprintable allowed characters in identifiers:
letter ::= 'A'..'Z' | 'a'..'z' | '\x80'..'\xff'
Why do we allow '\x80' and '\xff'?
Stefan_Salewski:
'\x80'..'\xff' -- that is a range of 128 characters, not 2, and they are printable (depending on the character set): letters like ö and ä, non-Latin alphabets, and all such.
> Stefan Salewski: Why do we allow '\x80' and '\xff'?
In UTF-8 encoding the 0x80..0xff range includes all code points from U+0080 upwards. I'm not sure myself if that's a good idea, but it's not as though they're being used for anything else. It does allow you to do stuff like:
import math
let π = pi
let r = 2.0
echo π*r*r
or let non-English programmers write identifiers in their native alphabets.
Sorry, I misread the documentation; I had not noticed that indeed the whole range '\x80'..'\xff' is allowed.

But I have to admit that I do not really understand it. My impression was that in UTF-8 the ASCII letters are represented by a single byte, while others like Pi or German umlauts are multi-byte. So my current understanding is that when the first byte of an identifier is in the range '\x80'..'\xff', at least one more byte must follow? (We have no one-byte umlaut encoding as in Latin-1 available?) I cannot see which UTF-8 characters are really allowed. It is not a problem for me, since I generally use only ASCII characters, but others may be confused too?
And the surrounding backticks -- are they a special case, or are the backticks really part of the identifier, encoded in UTF-8, so that we may use an arbitrary number of backticks in each identifier?
> So my current understanding is that when the first byte of an identifier is in the range '\x80'..'\xff', at least one more byte must follow?
Yes, characters from '\x80' upwards are multi-byte, and the bytes they consist of are themselves '\x80' or greater, so they are allowed in identifiers.
> And the surrounding backticks -- are they a special case, or are the backticks really part of the identifier, encoded in UTF-8, so that we may use an arbitrary number of backticks in each identifier?
No, they are delimiters -- quotes for identifiers. `type` means an identifier named "type".
> #! replace(sub = "_", by = "\x80") | replace(sub = "\t", by = " ")
> It makes Nimrod happy with leading underscores for wrapper files. And the second replace allows me to continue using tabs as I did before Nimrod time. Is at least the first substitution OK?
Ugh, no, the first substitution is not OK. The tab thingie might become a proper feature though, if people continue to insist on using Notepad or NEdit for programming...
> Stefan Salewski: So my current understanding is that when the first byte of an identifier is in the range '\x80'..'\xff', at least one more byte must follow?
All multi-byte UTF-8 sequences are made up exclusively of bytes in the 0x80..0xff range; all single-byte characters are in the 0x00..0x7f range (and are ASCII characters). See the Wikipedia article for details.

This is specifically so that strings can be processed semi-blindly, without awareness of the UTF-8 encoding (e.g., as long as a UTF-8 string is valid, you can scan for the end of a string the same way you would for ASCII or Latin-1, without having to decode any multi-byte characters).
Unfortunately, not all encodings have the same property (for example, SJIS doesn't).
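The byte-range property described above can be observed directly; here is a small sketch (the sample string is arbitrary) that walks a UTF-8 string byte by byte and classifies each byte without decoding anything:

```nim
import strutils

# 'π' (U+03C0) is encoded in UTF-8 as the two bytes CF 80.
# Every byte of a multi-byte character is >= 0x80, so a plain
# byte scan never confuses it with an ASCII character.
let s = "aπb"
for c in s:               # iterating a Nim string yields raw bytes
  let b = ord(c)
  if b < 0x80:
    echo "ASCII byte:           ", toHex(b, 2)
  else:
    echo "multi-byte character: ", toHex(b, 2)
```

This is exactly why the lexer can allow the whole '\x80'..'\xff' range in identifiers: it never needs to know where one Unicode character ends and the next begins.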