nimforum mirror - casE SenSitivity

mason_mcgill [2014-08-12T22:53:51+02:00]

Hello,

I'm new to Nimrod, so before I ask my question I'd like to congratulate the community on a powerful, approachable, forward-thinking language!

My question: I know about the "--cs:partial" command-line option, but I'm wondering if there are plans to extend it to address any of the following use-cases:

Math: I often write formulas in which λ_n and λ_N are intended to appear similar, yet be semantically distinct.

Python Interoperability: It would be nice to be able to overload . to call things like __new__ or __len__.

JSON Interoperability: It would be nice to be able to overload .= to create JSON strings that conform to style-sensitive protocols (e.g. {"firstName": "Fred", "lastName": "Rick", "lastName_encoded": "Ichray"}).

Araq [2014-08-12T23:52:01+02:00]

Python Interoperability: It would be nice to be able to overload . to call things like __new__ or __len__.

You can always use something like py["__new__"] = myNew for these (rare) cases. That said, identifiers are case preserving. If it happens to be a valid Nimrod identifier you should be in luck. So your JSON examples should all work.

That said, I'm not that happy with --cs:partial. I'd like to further distinguish FULLCAPS from fullcaps for better C interop. (And I really want --cs:none when hacking debugging code into the compiler or working in a REPL. ;-) )

mason_mcgill [2014-08-13T00:15:14+02:00]

OK. Is this likely to remain a compiler option, or will there eventually be a per-module case-sensitivity pragma, so I can write a library that uses case-sensitivity internally [1] that you can use from your case-insensitive REPL?

[1] e.g. some algorithm written with "λ_n ∈ {λ_0..λ_N}" notation.

Araq [2014-08-13T01:35:29+02:00]

A per-module basis is quite useful, but it's also hard to implement and it might lead to a disaster where people submit modules with --cs:full that you can't even import with --cs:none. So then the compiler has to check your module interface plays nice with all case sensitivity options ... This way lies madness.
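
The interface check described above can be pictured with a small Python model (not compiler code). It assumes that --cs:none treats two identifiers as equal when they match after dropping underscores and lowercasing, and it ignores overloading and scoping entirely. The names `cs_none_key` and `interface_collisions` are hypothetical and exist only for this sketch.

```python
from collections import defaultdict


def cs_none_key(ident):
    # Assumed --cs:none comparison key: underscores dropped, case ignored.
    return ident.replace("_", "").lower()


def interface_collisions(exported):
    """Group exported names that would become one symbol under --cs:none."""
    groups = defaultdict(list)
    for name in exported:
        groups[cs_none_key(name)].append(name)
    # Only groups holding more than one distinct spelling are a problem.
    return [names for names in groups.values() if len(set(names)) > 1]


if __name__ == "__main__":
    # A module written with case-sensitive names in mind.
    print(interface_collisions(["Shader", "shader", "Material"]))
```

A module whose exported names produce a non-empty result here is the kind that a case-insensitive importer could not use unambiguously.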

filwit [2014-08-13T06:25:15+02:00]

I want to add something to this. I've always enjoyed Nimrod's style-agnostic "code your way" approach to case sensitivity and underscores. However, I had to move my project to --cs:partial a while ago (and I'm very glad that's now an option) due to name conflicts between types and getters. For example:


type
  Shader* = ref object
    ...
  Material* = ref object
    shader: Shader # private member
    ...

proc shader*(m:Material): Shader = m.shader # public getter

Without --cs:partial the getter and Shader type collide. However, while partial case-sensitivity works well to avoid these conflicts, I would actually prefer a better system of avoiding this and not need to rely on it (though I'll always keep my code very case consistent for potential use with it).
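
A rough Python model of why the collision shows up under --cs:none but not under --cs:partial. It assumes --cs:none ignores case and underscores everywhere, while --cs:partial keeps only the first character case-sensitive. `normalize` and `same_identifier` are illustrative names, not compiler functions.

```python
def normalize(ident, mode):
    """Return the comparison key for an identifier under an assumed --cs mode."""
    stripped = ident.replace("_", "")
    if mode == "none":
        # Everything is case-insensitive.
        return stripped.lower()
    if mode == "partial":
        # First character keeps its case; the rest is case-insensitive.
        return stripped[:1] + stripped[1:].lower()
    raise ValueError("unsupported mode: " + mode)


def same_identifier(a, b, mode):
    return normalize(a, mode) == normalize(b, mode)


if __name__ == "__main__":
    for mode in ("none", "partial"):
        print(mode, same_identifier("Shader", "shader", mode))
```

In this model the type `Shader` and the getter `shader` are the same symbol under "none" and distinct under "partial", while `get_shader` and `getShader` remain interchangeable in both.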

A simple solution to basic getters is something Araq has mentioned before as a potential future feature: + for "readonly" exposure. eg:


type
  Material* = ref object
    id*: GLint # public member
    shader+: Shader # readonly member
    ...

However, that doesn't work well if you want anything more complex behind a getter (although I think the feature would be very convenient, and should be eventually added for that sake alone). What seems like the best solution here is to allow types and procs to share the same name, and choose "logical defaults" for what is chosen when, with more explicit distinction for the potential conflict areas such as object constructors. eg:


type Foo = ...
proc foo: Foo = ...

var a: Foo # uses type
var b: foo() # calls proc (if it returns a typedesc)
var c = foo() # calls proc
var d = Foo(:) # calls type constructor

Although I don't know if that syntax is good, and might be a bit confusing if there were a lot of symbols with the same names floating around. IDK exactly, these are just thoughts I'm throwing out there. Perhaps instead of allowing types/procs to share names, there could just be a better "getter" syntax which avoids these conflicts?

[EDIT] err... well I feel kinda silly since it looks like using backticks proc `.shader`*(m:Material)... syntax seems to avoid the name conflicts. However, whenever I try to actually use the getter, I get Error: conversion from Material to Shader is invalid, so there's still a conflict somewhere, but perhaps this is a symbol resolution bug?

gradha [2014-08-13T08:56:37+02:00]

To avoid name case conflicts the tutorial suggests prefixing variables with a letter like F. You can write a quick and dirty macro to help with the declaration. I was meaning to write a more nimrodic macro for this but got distracted.

filwit [2014-08-13T09:28:07+02:00]

That's not really the problem. It's not a name conflict between a member and a proc, but between a proc and a type within the same module. I could use Hungarian notation on either the proc or type (getShader or TShader) but at that point I would much rather just use --cs:partial.

Also the error from my [EDIT] was a false report. It doesn't avoid the type/proc name conflict even with backticks.

mason_mcgill [2014-08-13T19:52:00+02:00]

Couldn't render post #2810.

Araq [2014-08-13T20:06:55+02:00]

The T and P prefixes are obsolete: https://github.com/Araq/Nimrod/wiki/NEP-1-:-Style-Guide-for-Nimrod-Code

mason_mcgill [2014-08-13T20:32:19+02:00]

This is exactly what I was looking for; thanks!

Araq [2014-08-14T10:53:58+02:00]

I'm playing with the idea to change the semantics of --cs:partial: Instead of only distinguishing the first character, do it completely differently and yet capture the basic motivation behind it. --cs:partial should perform an identifier normalization:

In ALL_CAPS identifiers the underscores are ignored, but no further normalization is done.

*_c where c is any lower-case letter is normalized to *C, so foo_bar becomes fooBar, Foo_type becomes FooType etc.

*_C where C is any upper-case letter is normalized to *C, so foo_Bar becomes fooBar.

For multiple upper-cased letters only the first stays upper-cased: parseURL becomes parseUrl etc. Edge case: C_codeGenerator becomes CCodeGenerator.

Opinions?
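
To make the four rules concrete, here is a Python sketch that reproduces the examples given in the post. It is one possible reading, not the compiler's algorithm, and it rests on these assumptions: only ASCII letters are considered, an identifier counts as ALL_CAPS when it has upper-case letters and no lower-case ones, and runs of capitals are collapsed inside each underscore-separated segment before the segments are joined (that ordering is what yields CCodeGenerator for the edge case). The function names are made up for this sketch.

```python
import re


def collapse_upper_runs(segment):
    # "URL" -> "Url": keep the first capital of each run, lower the rest.
    return re.sub(r"[A-Z]+",
                  lambda m: m.group(0)[0] + m.group(0)[1:].lower(),
                  segment)


def normalize_proposed(ident):
    """One reading of the proposed --cs:partial normalization (ASCII only)."""
    has_upper = re.search(r"[A-Z]", ident) is not None
    has_lower = re.search(r"[a-z]", ident) is not None
    if has_upper and not has_lower:
        # ALL_CAPS rule: drop underscores, no further normalization.
        return ident.replace("_", "")
    segments = [collapse_upper_runs(s) for s in ident.split("_") if s]
    if not segments:
        return ident
    # An underscore acts like a capital letter on the next segment.
    return segments[0] + "".join(s[0].upper() + s[1:] for s in segments[1:])


if __name__ == "__main__":
    for name in ["foo_bar", "Foo_type", "foo_Bar", "parseURL",
                 "C_codeGenerator", "FOO_BAR"]:
        print(name, "->", normalize_proposed(name))
```

How a run of capitals followed by another word (for example parseURLFoo) should be split is not specified in the post; this sketch treats the whole run as one unit.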

filwit [2014-08-14T11:58:20+02:00]

What problem does this fix? Personally I'm much more attracted to the simplistic "first-letter matters only" rule of the existing --cs:partial. In fact I think it's pretty much perfect as-is and would even argue for it eventually becoming default with --cs:none being the opt-in. It avoids symbol conflicts (as my previous post illustrates) while still allowing programmers to opt in to snake_case, without consequence, if that's what they prefer.

Araq [2014-08-14T12:09:29+02:00]

Well it allows for lambdaN vs lambdan for math people and allows FOOBAR for fewer name conflicts in C wrappers (though this happens rarely). But yeah, the current rule is much simpler.

In fact I think it's pretty much perfect as-is and would even argue for it eventually becoming default with --cs:none being the opt-in.

--cs:partial will be the default soon. Somebody needs to nimrod pretty all the Babel packages... ;-)

filwit [2014-08-14T12:21:07+02:00]

it allows for lambdaN vs lambdan for math people

Well on second thought these rules aren't too bad. Underscores would just be considered an alternative to capitalization, which is easy to explain..

In ALL_CAPS identifiers the underscores are ignored ... For multiple upper-cased letters only the first stays...

This is the confusing part I think. Since ALL_CAPS are really only useful for C-wrappers (and, as you say, rare) perhaps if this wasn't part of the change I wouldn't personally mind the change (considering it allows for fooBar/foobar distinction).

--cs:partial will be the default soon

Awesome :)

filwit [2014-08-14T12:37:42+02:00]

For multiple upper-cased letters only the first stays

er... i guess without this change as well it would be almost identical to --cs:full huh. So maybe leave this in but not the ALLCAPS rule (as it's a bit confusing when caps are handled differently)?


foobar -> foobar
fooBar -> fooBar
foo_bar -> fooBar
fooBAR -> fooBar
FOO_BAR -> FooBar

Though that doesn't seem much easier to learn than your original proposal. IDK if I'm just being over-sensitive to rule complexity here. Maybe others will like your original idea just fine. But I would get a lot of feedback on this idea before changing anything, especially if this is designed to become default. The last thing you want is new users getting hung up on complex symbol rules whose only benefit (at a glance) might appear to be allowing for optional underscores.
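
The five-row table above can be reproduced by a short Python sketch that keeps the "underscore means capital" and "only the first of several capitals stays" rules but drops the ALL_CAPS exception. This is an interpretation of the table, assuming ASCII identifiers; `normalize_variant` is a hypothetical name.

```python
import re


def normalize_variant(ident):
    """Variant without the ALL_CAPS rule (ASCII only, illustrative)."""
    words = []
    for segment in ident.split("_"):
        if segment:
            # Collapse each run of capitals: "BAR" -> "Bar", "F" -> "F".
            words.append(re.sub(r"[A-Z]+",
                                lambda m: m.group(0).capitalize(),
                                segment))
    if not words:
        return ident
    # Each underscore capitalizes the segment that follows it.
    return words[0] + "".join(w[0].upper() + w[1:] for w in words[1:])


if __name__ == "__main__":
    for name in ["foobar", "fooBar", "foo_bar", "fooBAR", "FOO_BAR"]:
        print(name, "->", normalize_variant(name))
```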

mason_mcgill [2014-08-14T22:44:31+02:00]

I realize case-insensitivity might just take some getting used to, but, for the sake of "user-feedback collection", I'll try to convey my first impression of your proposal as a "math person" :) (I'm a researcher at a university, mostly used to C++/D/CUDA and Matlab/Python/Julia).

Underscores are used heavily in the mathematical code I've seen, and for more than just separating words. This is probably because
- Authors are translating from LaTeX.
- Authors are concerned with readability issues for non-native speakers [http://bit.ly/1rwXxsJ].

In mathematical code, capitalization is used for more than just distinguishing between types/non-types (for better or worse). A is probably a matrix and a is probably not.

From the description above, it's unclear whether the normalization considers unicode, or just ASCII.

The "all-caps" rule seems strange, since AB -> AB and ABc -> Abc (I think).

I prefer snake_case, but I'd gladly give that up for the simpler mental model that comes with writing in a case-sensitive language.

For prototyping, I'd rather have a REPL with simple autocompletion than a language that forgives me (via a complex symbol normalization procedure) when I forget to capitalize something.

filwit [2014-08-15T02:17:19+02:00]

Araq: Thinking about this more, I'm wondering, given your plan for --cs:partial as default, how that's even going to work. If a module uses both Foo and foo, that completely breaks --cs:none, and any "style guideline" which prevents this would negate the benefits of --cs:partial as default and just cause you to have to constantly explain to people why their (compiling) pull requests don't match the guidelines.

In all honesty, I'm not sure having two options is really a realistic solution, and I think --cs:none is showing its flaws (seems lots of people, myself included, want to distinguish symbols by capitalization for various reasons). That said, I really like Nim's ability to interchange camelCase and snake_case, and I think there might be a solution which addresses both these concerns with one rule: make underscores an alternative to capitals, e.g.:


fooBar -> fooBar
foo_bar -> fooBar

FooBar -> FooBar
_foo_bar -> FooBar

fooBAR -> fooBAR
foo_b_a_r -> fooBAR

# the following could work, but might be better to just make 2 or more _'s invalid
foo__bar -> foo_bar
foo__Bar -> foo_Bar
foo___bar -> foo_Bar

Just a thought. I'm a little concerned about this, as I really need --cs:partial now, and I'm afraid of you removing that feature :| so if everyone wants --cs:none then I would vote to leave that as default and keep --cs:partial how it is now (or at least keep it simple to understand, like my suggestion with the underscores).
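
The single rule in the table above ("an underscore is another way to write a capital") is small enough to sketch in a few lines of Python. The sketch assumes ASCII letters and treats a double underscore as an escaped literal underscore, purely because the table lists those cases. `underscore_as_capital` is an illustrative name.

```python
import re


def underscore_as_capital(ident):
    """Rewrite `_x` as `X`; `__` becomes a literal `_` (ASCII only)."""
    def repl(match):
        if match.group(0) == "__":
            return "_"
        return match.group(1).upper()

    # Scanning is left to right, and "__" is tried before "_<letter>".
    return re.sub(r"__|_([A-Za-z])", repl, ident)


if __name__ == "__main__":
    for name in ["fooBar", "foo_bar", "_foo_bar", "fooBAR",
                 "foo_b_a_r", "foo__bar", "foo___bar"]:
        print(name, "->", underscore_as_capital(name))
```

Because no case folding is involved, fooBar and foobar stay distinct, which is the property the math use case in this thread asks for.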

Araq [2014-08-15T08:54:05+02:00]

In all honesty, I'm not sure having two options is really a realistic solution

Well we can always dream up more complex rules for disambiguation ("in this context it can only be a type anyway") but I'm not a fan of this either. However I never thought it is a realistic solution and it has always been designed to enable a transition period. The default value of --cs defines the language!

which addresses both these concerns with one rule: make underscores an alternative to capitals

Nice idea. Your rule is certainly simpler than mine.

but might be better to just make 2 or more _'s invalid

They already are.

Mirror of forum.nim-lang.org
