Hi,
I think that there are more disadvantages (performance-wise) in the current behaviour that int = int64 on 64-bit platforms. Ints are often used for loop variables and to index arrays, but who needs to access more than 2^31 elements in a seq or array by default? With the current definition the pressure on the CPU cache increases with every int and every struct that contains an int, which leads to slower code; this could be avoided by defaulting int = int32 on 64-bit platforms. I am quite sure that in 90% of the use cases an int32 would be sufficient. And if you need to do pointer arithmetic, it is better to use cast[TAddress] instead of cast[int].
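To make the cache argument concrete, here is a minimal sketch (BigNode/SmallNode are made-up names; the exact sizes depend on the platform and alignment):

type
  BigNode = object
    index, count: int     # 8 bytes each when int = int64
  SmallNode = object
    index, count: int32   # 4 bytes each on every platform

echo sizeof(BigNode)      # typically 16 on a 64-bit platform
echo sizeof(SmallNode)    # typically 8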
I know a change would cause incompatibilities with existing code, but maybe it is worth the pain.
FWIW, D has taken the decision to always have int as 32 bit: much easier to use from a portability POV and more efficient for the cache.
On a different but related note, I'm a bit "shocked" that a char is 8 bit in a language which uses Unicode for strings by default. char as 8 bit made sense when ASCII was the norm; now it doesn't. An "ubyte" type would be better (to emphasize that a character/code point may not fit in 8 bits), together with either no char type but a 32-bit code point type (not sure what the name should be: code_point is too long, cdpt?) or a 32-bit char type (probably confusing for C developers though).
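To make the 'char' complaint concrete, a minimal sketch (assuming the standard unicode module and a UTF-8 encoded source file):

import unicode

let s = "héllo"       # UTF-8 encoded: 'é' occupies two bytes
echo s.len            # byte length: 6
echo s.runeLen        # number of code points: 5
echo ord(s[1])        # s[1] is a char, i.e. one byte of 'é', not a character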
It surely works in 90% of the use cases, but it's not correct. ;-)
Your concerns about CPU cache efficiency are valid (and quite frankly I think 64 bit is a stupid architecture to begin with), but then the question arises: why 32 bit? Why not 16 or 8 bit? In the compiler itself I also use int16 to save memory...
FWIW, D has taken the decision to always have int as 32 bit: much easier to use from a portability POV and more efficient for the cache.
Yeah and then you use 'size_t' everywhere to get sane integer behaviour back...
On a different but related note, I'm a bit "shocked" that a char is 8 bit in a language which uses Unicode for strings by default...
Shrug using uint16 for char doesn't work either (Unicode is broken in Java/.NET) and uint32 is expensive for the caches... The world is not UTF-8 anyway, it's "UTF-8 with occasional CP-1252 characters in the same string" ... ;-)
why 32 bit? Why not 16 or 8 bit?
because even I want to use more than 256 elements in an array ;-) - sometimes
Yeah and then you use 'size_t' everywhere to get sane integer behaviour back...
OK, I agree that there is a problem. The choice is between correctness and performance. Currently I would vote for performance, because it is an advantage that you can measure. Correctness is an advantage too, but it's more difficult to measure ;-)
But maybe I can't see all the implications of this.
Yeah and then you use 'size_t' everywhere to get sane integer behaviour back...
To check your assertion, I tried to measure the usage of int and of size_t: I did an fgrep for 'size_t' and for 'int ' (int followed by a space) over the whole of Phobos; the result is: 12954 lines match 'int ' (1) and 4109 match 'size_t'.
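For reference, roughly what that fgrep did, expressed as a small Nimrod sketch (illustrative only; the "phobos" directory path and the .d filter are assumptions, and a plain substring match overcounts slightly compared to a word-boundary grep):

import os, strutils

var intLines, sizetLines = 0
for path in walkDirRec("phobos"):                 # assumed local checkout of Phobos
  if path.endsWith(".d"):
    for line in lines(path):
      if line.contains("int "): inc(intLines)     # 'int' followed by a space
      if line.contains("size_t"): inc(sizetLines)
echo "lines with 'int ':   ", intLines
echo "lines with 'size_t': ", sizetLines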
The "hard" part is how to interprete these numbers.. My interpretation is:
Shrug using uint16 for char doesn't work either (Unicode is broken in Java/.NET) and uint32 is expensive for the caches... The world is not UTF-8 anyway, it's "UTF-8 with occasional CP-1252 characters in the same string" ...
I didn't say that by default the processing of strings should be done with uint32 instead of uint8: both are useful. What I don't like is the name 'char': a string is a list of characters, yet a character doesn't always fit in a char, which makes 'char' an annoying trap that every new programmer has to learn to avoid, sigh..
(1): this number is certainly an underestimate: auto x = 1; declares x as an integer even though 'int' appears nowhere to grep.
the result is: 12954 lines match 'int ' (1) and 4109 match 'size_t'.
Kudos for measuring things.
My interpretation differs though: 'int' is preferred in D code because it's shorter and prettier than 'size_t', so the sample is heavily biased. It doesn't mean that every usage of 'int' is intentional.
Now back to Nimrod: I see no need for a symbol 'int' that is always 'int32' everywhere. If you want 'int32' on every platform, just use 'int32'. The real problem is that Nimrod is opinionated and integer literals have type 'int'. However, I strive to make Nimrod easier to use in this respect: integer literals will become a separate type that is implicitly convertible to any integer type if the literal is in the proper range.
This has the nice side effect that Nimrod may even become convenient for programming 16-bit CPUs.
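For instance, under the proposed rule something like the following should just work (a hypothetical sketch; the exact conversion rules are of course up to the implementation):

var a: int16 = 1000   # fine: 1000 fits in int16
var b: int8 = 100     # fine: 100 fits in int8
var c: int32 = 0
c += 1                # the literal 1 adapts to int32
# var d: int8 = 300   # still an error: 300 does not fit in int8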
You're right that the name 'char' could be better, but I can't see any better alternative ('octet' perhaps?). And while we're nitpicking, 'string' should have been named 'text' ...
And while we're nitpicking, 'string' should have been named 'text' ...
Shorter name and even more expressive, I like it!
However, I strive to make Nimrod easier to use in this respect: integer literals will become a separate type that is implicitly convertible to any integer type if the literal is in the proper range.
Oh yes, please. It's annoying and makes no sense to get errors for expressions like this:
var x: int32
x += 1    # rejected: the literal 1 has type int, not int32