Hi,
I think that there are more disadvantages (performance-wise) in the current behaviour that int = int64 on 64-bit platforms. Ints are often used for loop variables and to index arrays, but who needs to access more than 2^31 elements in a seq or array by default? With the current definition the pressure on the CPU cache increases with every int and every struct that contains an int, which leads to slower code; this could be avoided by defaulting int = int32 on 64-bit platforms. I am quite sure that in 90% of the use cases an int32 would be sufficient. And if you need to do pointer arithmetic, it is better to use cast[TAddress] instead of cast[int].
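To make the cache argument concrete, here is a minimal sketch (BigNode/SmallNode are made-up names; the exact sizes depend on the platform and alignment):

type
  BigNode = object
    index, count: int     # 8 bytes each when int = int64
  SmallNode = object
    index, count: int32   # 4 bytes each on every platform

echo sizeof(BigNode)      # typically 16 on a 64-bit platform
echo sizeof(SmallNode)    # typically 8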
I know a change would cause incompatibilities with existing code, but maybe it is worth the pain.
FWIW, D has taken the decision to always have int as 32 bit: much easier to use from a portability POV and more efficient for the cache.
On a different but related note, I'm a bit "shocked" that a char is 8 bit in a language which uses Unicode for strings by default. char as 8 bit made sense when ASCII was the norm; now it doesn't. An "ubyte" type would be better (to emphasize that a character/code point may not fit in 8 bits), together with either no char type but a 32-bit code point type (not sure what the name should be: code_point is too long, cdpt?) or a 32-bit char type (probably confusing for C developers though).
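To make the 'char' complaint concrete, a minimal sketch (assuming the standard unicode module and a UTF-8 encoded source file):

import unicode

let s = "héllo"       # UTF-8 encoded: 'é' occupies two bytes
echo s.len            # byte length: 6
echo s.runeLen        # number of code points: 5
echo ord(s[1])        # s[1] is a char, i.e. one byte of 'é', not a character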
It surely works in 90% of the use cases, but it's not correct. ;-)
Your concerns about CPU cache efficiency are valid (and quite frankly I think 64 bit is a stupid architecture to begin with), but then the question arises: why 32 bit? Why not 16 or 8 bit? In the compiler itself I also use int16 to save memory...
FWIW, D has taken the decision to always have int as 32 bit: much easier to use from a portability POV and more efficient for the cache.
Yeah and then you use 'size_t' everywhere to get sane integer behaviour back...
On a different but related note, I'm a bit "shocked" that a char is 8 bit in a language which uses Unicode for strings by default...
Shrug using uint16 for char doesn't work either (Unicode is broken in Java/.NET) and uint32 is expensive for the caches... The world is not UTF-8 anyway, it's "UTF-8 with occasional CP-1252 characters in the same string" ... ;-)
why 32 bit? Why not 16 or 8 bit?
because even I want to use more than 256 elements in an array ;-) - sometimes
Yeah and then you use 'size_t' everywhere to get sane integer behaviour back...
OK, I agree that there is a problem. The choice is between correctness and performance. Currently I would vote for performance, because it is an advantage that you can measure. Correctness is an advantage too, but it's more difficult to measure ;-)
But maybe I can't see all the implications of this.
Yeah and then you use 'size_t' everywhere to get sane integer behaviour back...
To check your assertion, I tried to measure the usage of int and of size_t: I did an fgrep for 'size_t' and for 'int ' (int followed by a space) over the whole of Phobos; the result is: 12954 lines match 'int ' (1) and 4109 match 'size_t'.
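For reference, roughly what that fgrep did, expressed as a small Nimrod sketch (illustrative only; the "phobos" directory path and the .d filter are assumptions, and a plain substring match overcounts slightly compared to a word-boundary grep):

import os, strutils

var intLines, sizetLines = 0
for path in walkDirRec("phobos"):                 # assumed local checkout of Phobos
  if path.endsWith(".d"):
    for line in lines(path):
      if line.contains("int "): inc(intLines)     # 'int' followed by a space
      if line.contains("size_t"): inc(sizetLines)
echo "lines with 'int ':   ", intLines
echo "lines with 'size_t': ", sizetLines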
The "hard" part is how to interprete these numbers.. My interpretation is:
Shrug using uint16 for char doesn't work either (Unicode is broken in Java/.NET) and uint32 is expensive for the caches... The world is not UTF-8 anyway, it's "UTF-8 with occasional CP-1252 characters in the same string" ...
I didn't say that by default the processing of strings should be done with uint32 instead of uint8: both are useful. What I don't like is the name 'char': a string is a list of characters, yet a character doesn't always fit in a char, which makes 'char' an annoying trap that every new programmer has to learn to avoid, sigh..
(1): this number is certainly an underestimate: auto x = 1; declares x as an integer even though 'int' appears nowhere to grep.
the result is: 12954 lines match 'int ' (1) and 4109 match 'size_t'.
Kudos for measuring things.
My interpretation differs though: 'int' is preferred in D code because it's shorter and prettier than 'size_t', so the sample is heavily biased. It doesn't mean that every usage of 'int' is intentional.
Now back to Nimrod: I see no need for a symbol 'int' that is always 'int32' everywhere. If you want 'int32' on every platform, just use 'int32'. The real problem is that Nimrod is opinionated and integer literals have type 'int'. However, I strive to make Nimrod easier to use in this respect: integer literals will become a separate type that is implicitly convertible to any integer type if the literal is in the proper range.
This has the nice side effect that Nimrod may even become convenient for programming 16-bit CPUs.
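For instance, under the proposed rule something like the following should just work (a hypothetical sketch; the exact conversion rules are of course up to the implementation):

var a: int16 = 1000   # fine: 1000 fits in int16
var b: int8 = 100     # fine: 100 fits in int8
var c: int32 = 0
c += 1                # the literal 1 adapts to int32
# var d: int8 = 300   # still an error: 300 does not fit in int8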
You're right that the name 'char' could be better, but I can't see any better alternative ('octet' perhaps?). And while we're nitpicking, 'string' should have been named 'text' ...
And while we're nitpicking, 'string' should have been named 'text' ...
Shorter name and even more expressive, I like it!
However, I strive to make Nimrod easier to use in this respect: integer literals will become a separate type that is implicitly convertible to any integer type if the literal is in the proper range.
Oh yes, please. It's annoying and makes no sense to get errors for expressions like this:
var x: int32
x += 1    # rejected: the literal 1 has type int, not int32