nimforum mirror - Big integer litterals

lscrd (orginal) [2018-03-03T18:22:59+01:00] view original

Hi. This is my first post on this forum, but I have used Nim for some time now (in particular, the Gtk3 bindings).

Compiling one of my old programs with the new 0.18 version, I have encountered an error which can be sum up to a single statement:

var x: int = 10_000_000_000

The error is type mismatch: got <int64> but expected 'int'

As my computer is a 64 bits machine and int is represented by 8 bytes, I expected 10_000_000_000 to be of type int not int64 (as described in the user manual).

Is there an explanation for this or is it a bug (which is also present is 0.17.2)?

StasB (orginal) [2018-03-03T20:19:27+01:00] view original

I think the compiler is just telling you that int is a generic integer which could end up being too small (e.g. 32 bits on some old 32 bit machine) for your value on another machine, and that you need to explicitly specify that you want a 64 bit int.

Stefan_Salewski (orginal) [2018-03-03T20:48:25+01:00] view original

That is an interesting topic. When you try

var x: int = 10 * 1_000_000_000

it seems to compile fine on a 64 bit Linux box. So problem is the constant itself.

But what I really ask myself: How does

var x = 10 * 1_000_000_000

behave on 32 and 64 bit OS. Is x int or int64.

StasB (orginal) [2018-03-03T20:58:40+01:00] view original

It compiles fine when you do that because type-checking probably happens on the expression level, and 1,000,000,000 fits in a signed 32-bit int, which is probably the minimum size for int, and the type of the multiplication operator is (int, int): int, so the entire thing type-checks.

LeuGim (orginal) [2018-03-03T22:00:23+01:00] view original

Stefan_Salewski: Your example though compiles, gives wrong result - just of 4 lower bytes.

var x: int = 10 * 1_000_000_000
var y: int = 1410065408
echo x       # => 1410065408
echo x == y  # => true
echo x == (10_000_000_000 and 0xFFFF_FFFF)  # => true

This should be because of Nim's/C's compiler's 32-bitness.

lightness1024 (orginal) [2018-03-06T15:54:54+01:00] view original

I just wish to say that I'm not sure of how nim behaves, but as of the C standard, nothing is said about the size of types, and definitely, the fact that int should be the natural machine's word size is an urban legend. What the C standard says about type, is the following: sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(stuff*) <= sizeof(long) nothing more. And in fact, all implementations I've encountered make int 32 bits, even on x64.

Stefan_Salewski (orginal) [2018-03-06T16:31:20+01:00] view original

@lscrd

Nim 0.18 is correct indeed -- see language manual:


Pre-defined integer types

These integer types are pre-defined:

int
the generic signed integer type; its size is platform dependent and has the same size as a pointer. This type should be used in general.
An integer literal that has no type suffix is of this type if it is in the range low(int32)..high(int32) otherwise the literal's type is int64.

lscrd (orginal) [2018-03-06T18:38:30+01:00] view original

Thank you all for your answers. Is seems that my first replies are still blocked :-(.

@Stefan_Salewski

I didn't remember this paragraph, but indeed the compiler does what is written in the manual. However, in "Numerical constants" is is said that "Literals without a type suffix are of the type int, unless the literal contains a dot or E|e in which case it is of type float." So, there is a contradiction or, at least, an imprecision here. Not a big deal but this has confused me.

Now, to define an int literal greater of 2^32-1 or less than -2^32, we have to apply a conversion, whereas it is possible to define directly a literal with the right type for other integer types. So, it should be logical to add a prefix 'i for this purpose. Not that I think it should be something with high priority :-).

@Stefan Salewski again.

To answer your first question, I have use gintro when converting a program from gtk2 to gtk3 (running on my Linux Manjaro 64 bits). I have been one of your first users and I have issued many reports about bugs or wishes (in fact, I have issued the majority of reports at this time :-) ). Incidentally, it is a great work you have done.

Stefan_Salewski (orginal) [2018-03-06T21:32:22+01:00] view original

I have issued the majority of reports at this time

Yes, indeed I remembered your name a short period after sending my post -- so I removed the question about GTK a few days ago already. :-)

For the contradiction in the compiler manual, you are right, I have just posted an issue about that to github issue tracker.

About your lost messages: I think whenever the first message of a new forum user appears on the forum, then all later messages generally are not blocked any more. So your first reply may get lost by a forum bug.

StasB (orginal) [2018-03-07T05:37:00+01:00] view original

@Iscrd: it doesn't really make sense to be able to do so without an explicit conversion, because then you have code that builds fine on one platform but fails type-checking on another. I think there's a good reason for that funny definition in the manual which separates the physical size of an int from the range of its literals.

lscrd (orginal) [2018-03-07T10:49:59+01:00] view original

@StasB

Yes I understand. If you write var x = 10_000_000_000, x has type int64 whatever the platform. You have to deal now with an int64 and avoid to convert it to int which is not portable. But if you want to do that if is better to write var x = 10_000_000_000'i64 which clearly states that you want an int64 and doesn't rely on a special rule – explicit is better than implicit :-). If the portability to 32 bits platform is important, you have to be careful anyway.

The problem with that special rule for the big literals is that it breaks the general rule. So const c = 2_147_483_647 gives the type int to c whereas const c = 2_147_483_648 gives it the type int64 but const c = 2^32 gives an int. So, I'm not sure we have gained anything with this special rule.

The only way to make sure we can write portable programs using int would have be to forced int to 32 bits even on 64 bits platform. So, to work with 64 bits integers, we would have been forced to use int64. I'm pretty sure that it would have cause a lot of other problems though.

I don't like this special rule and I'm still not convinced of its usefulness. But, actually, I don't care a lot as I can use a conversion to get what I want (and I don't want to use int64 for the kind of programs I have written and which use some big literals).

Araq (orginal) [2018-03-07T11:51:30+01:00] view original

So, I'm not sure we have gained anything with this special rule.

Fair enough but what type should 10_000_000_000 have otherwise? I guess the compiler could just flag an error instead but that would be inconsistent too since 2 ^32 gives an int.

lscrd (orginal) [2018-03-07T13:18:10+01:00] view original

I don't see why 10_000_000_000 could not be an int on a 64 bits platform and produce an error on a 32 bits platform. As there is always the possibility to write 10_000_000_000'i64, this is not a restriction.

Furthermore, it would make things consistent as const c = 10 * 1_000_000_000 would give an int on a 64 bits machine and an error on a 32 bits machine (or, is it that it would give an int64? I don't have a 32 bits machine to check this but it seems unlikely).

But I will not fight about this point which is minor. As long as it is clearly stated that big literals are int64, I think I can live with that and use a conversion to get an int on 64 bits platforms :-). The real issue is this small inconsistency in the manual which deceived me.

StasB (orginal) [2018-03-07T14:53:37+01:00] view original

@Iscrd:

if you want to do that if is better to write var x = 10_000_000_000'i64 which clearly states that you want an int64 and doesn't rely on a special rule – explicit is better than implicit

The situation is already bad enough with having to convert variables/annotate literals in ways that aren't necessary in any other systems programming language.

The problem with that special rule for the big literals is that it breaks the general rule

Not sure what you mean. Can you state what you think the general rule is?

The only way to make sure we can write portable programs using int would have be to forced int to 32 bits even on 64 bits platform

int is platform-dependent type by definition.

I don't see why 10_000_000_000 could not be an int on a 64 bits platform and produce an error on a 32 bits platform

Because the idea that your code can either pass or fail type checking depending on where it's being compiled is absolutely bonkers.

mashingan (orginal) [2018-03-07T16:07:37+01:00] view original

@lscrd, there's int32. Also big number on 32 bit arch should give warning about its size I guess, because sometimes people forget to add type annotation when write big integer literals.

Especially when people used to work on 64 bit suddenly have to code on 32 bit (this should be rare but possible though).

lscrd (orginal) [2018-03-07T22:29:42+01:00] view original

@StasB

Not sure what you mean. Can you state what you think the general rule is?

The general rule is written in the manual: "Literals without a type suffix are of the type int, unless the literal contains a dot or E|e in which case it is of type float." This is the reason why I asked a question. Later in the manual, another rule I missed states that for int "An integer literal that has no type suffix is of this type if it is in the range low(int32)..high(int32) otherwise the literal's type is int64." which contradicts the previous rule. If I had known this second rule, I would not have asked the question but probably issued a report about the inconsistency.

Because the idea that your code can either pass or fail type checking depending on where it's being compiled is absolutely bonkers.

But this is already what is done when you use when conditions: you compile a code depending on these conditions. You cannot expect to execute exactly the same code on all platforms, especially if it depends heavily on the int size.

Now, if you write var x = 10_000_000_000, the type of x is not explicitly defined. It is logical to consider that it is an int. Adding a special rule to specify that, as it cannot fit in a 32 bits signed integer, it has type int64 has the disadvantage to change its type according to its value. So, you have to make sure that changing the value to a smaller integer (such as 1_000_000_000) doesn't break the code. And the situation becomes really complicated on 32 bits platforms as with 10_000_000_000 you get a 64 bits value whereas with 1_000_000_000, you get a 32 bits value. It will be very difficult to manage this.

But the right way to write portable code here is var x = 10_000_000_000'i64, var x: int64 = 10_000_000_000 or var x: int64 = 10_000_000_000'i64. Even if you change the value to 1_000_000_000, the code will continue to compile and execute on both platforms. No need then for a special rule for, as we have seen, it may be dangerous on 32 bits machines. And, in the future, if we have to manage 128 bits integers, we will not have to add another special rule to give type int128 to literals which doesn't fit in a 64 bits signed integer :-).

@mashingan

According to the second rule, a big literal on a 32 bits machine will be of type int64, so it will be impossible to assign it to an int whatever its size. So there is no risk and this is an advantage of the rule.

Without this rule (and only the first one which gives type int to literals without suffix), if people used to work on 64 bits machines forget to specify the suffix, they will see an error at the first compilation.

The only problem is when porting a program written without care on a 64 bits machine to a 32 bits machine. But I think that, in this case, other problems, not related to this one, will occur. It's unlikely that a program depending on integer size will compile and execute without error on another platform, if not designed carefully (and tested on this platform). For this reason, this problem with big literals may not be so important.

StasB (orginal) [2018-03-08T06:35:17+01:00] view original

@Iscrd:

But this is already what is done when you use when conditions

The entire point of when is to make sure that your code doesn't arbitrarily break on other platforms, but again, we're not talking about that. We're talking about the language itself being consistent across platforms, so you either need weakly typed literals in general, or to somehow ensure statically that the literal fits into the type regardless of the platform. You're not going to convince me that it's reasonable for the rules of the language itself (which literals are compatible with what types) to vary across platforms (not that you need to; I don't decide anything).

lscrd (orginal) [2018-03-08T10:35:06+01:00] view original

You have not convinced me either. I have shown that on 32 bits platform, when a variable is initialized without explicit type, the current rule will cause problems as the variable size will vary according to the initialization value. I think it's really a bigger problem that which seems a problem for you, i.e. that int being a different type on 32 bits and 64 bits platforms, some assignments to int will inevitably failed on a 32 bits machine.

This is the case with var x = 10 * 1_000_000_000 as, here, the compiler doesn't magically evaluate the expression as an int64 (as it will do for a literal) and so you will encounter the situation you do not want (different behavior on different platforms). What is really ugly is that the way you write the initialization value changes the semantic of the program at the point that, on a 32 bits machine, it compiles with the literal and fails to compile with the expression. From a pure logic point of view, when no size is specified, 10_000_000_000 should be equal to 10 * 1_000_000_000 (same value, same type). The rule for big literals is too ad hoc and will produce this kind of inconsistency.

But the language is defined this way and I don't think it will change – not before we have to manage int128 I suppose :-). I only asked a question, I got answers which explains why this is done this way and I still think it would have been better to manage literals without prefix in a more uniform way (as stated in the first rule). But I have not made a request for a change (which would have few chances to be adopted anyway). It's a too minor problem and I think there are more important things to do :-).

lightness1024 (orginal) [2018-03-08T15:12:52+01:00] view original

It was even worse before https://github.com/nim-lang/Nim/issues/936

Mirror of forum.nim-lang.org

3604 :: Big integer litterals