nimforum mirror - Why does 'string' and 'len' use 'int' rather than 'uint' for length?

randbox (orginal) [2021-12-24T16:07:58+01:00] view original

Is there such a thing as a string with negative length?

If we want to compare a string cursor or byte counter guaranteed to be >= 0 (unsigned) with the length of string why is it necessary to convert it to a signed integer?

DeletedUser (orginal) [2021-12-24T19:09:42+01:00] view original

Mostly because Nim did not originally have unsigned integers. Beyond that, constant type conversions are annoying and it's not like there's a problem of int not allowing high enough values. Even if it did, imagine if it overflowed, would you prefer it has the valid length 0 or an invalid negative length?

demotomohiro (orginal) [2021-12-24T20:06:27+01:00] view original

Nim discourages unsigned int. This is why: https://github.com/nim-lang/Nim/wiki/Unofficial-FAQ#why-are-unsigned-types-discouraged

kobi (orginal) [2021-12-24T21:18:08+01:00] view original

confusing bugs can happen with underflowing. when uint is less than 0, it becomes the highest uint

Araq (orginal) [2021-12-24T22:11:50+01:00] view original

Bugs like for (auto i = 0; i < s.length() - 3; ++i) are everywhere in C++ and nowhere in C#, that's why. Using uint for "cannot be negative" is just wrong -- unsigned "integers" wrap around and are more like bitvectors than they are numbers.

randbox (orginal) [2021-12-25T00:02:48+01:00] view original

It appears compiler allows`` uint(str.len) `` but not `` uint(-1) `` so can infer string length integer never negative ? If a function disallows negative number for a particular parameter, it seems reasonable to declare the type of that parameter as unsigned to avoid the cost of an extra precondition check.

According to the manual Nim supports subrange types, so is type Offset = range[0..high(int)] more idiomatic for declaring non-negative arguments for functions dealing with strings? Is there any runtime overhead?

xigoi (orginal) [2021-12-25T00:13:34+01:00] view original

uint(-1) doesn't work only because it can be proven bad at compile time. The following compiles and overflows:

var i = -1
echo uint(i)

Zoom (orginal) [2021-12-25T00:27:59+01:00] view original

Using uint for "cannot be negative" is just wrong -- unsigned "integers" wrap around and are more like bitvectors than they are numbers.

So the real issue is just that wrapping is the default while it would better be explicit. Anyway, it's too late, we're stuck with this like Lua with their 1-based indexing.

According to the manual Nim supports subrange types, so is type Offset = range[0..high(int)] more idiomatic for declaring non-negative arguments for functions dealing with strings?

There's already Natural and Positive in system. Use them as much as possible. Don't know why they couldn't be uint with saturating/checked maths, though. Probably because you need to drop checks for release mode and somehow wrapping is considered a more severe error than UB.

demotomohiro (orginal) [2021-12-25T01:47:17+01:00] view original

CPUs do wrap around operation to unsigned int. So I think saturated unsigned int operations always requires checking value everytime +, -, *, div or mod operaters are used and not as efficient as signed int.

There are algorithms (in crypt, hash, pseudo random number generator) that requires unsigned int with wrap around operations. For example, xoroshiro128+ used in Nim's stdlib uses wrap around addition. These algorithms don't use unsigned int to count something but just hold bit patterns. They use unsigned int addition or multiplication to randomize bit pattern.

There is saturate.nim in Nim compiler: https://github.com/nim-lang/Nim/blob/devel/compiler/saturate.nim

Araq (orginal) [2021-12-25T06:30:12+01:00] view original

So the real issue is just that wrapping is the default while it would better be explicit.

No, that's not the "real issue", if you make x.len - 3 produce an underflow instead then common loops like for i in 0..x.len - 4 crash instead of iterating 0 times...

Anyway, it's too late, we're stuck with this like Lua with their 1-based indexing.

Well much like C# and Java are "stuck" on their designs that actually work...

Jehan (orginal) [2021-12-25T20:39:01+01:00] view original

So the real issue is just that wrapping is the default while it would better be explicit. Anyway, it's too late, we're stuck with this like Lua with their 1-based indexing.

It's more that using unsigned values for lengths is a constant booby trap waiting to go off, regardless of how you deal with underflow; both underflows that result in silent errors and exceptions are generally unexpected and bad. I'm keenly aware of the issues involved and it still occasionally bites me in languages that use them this way (primarily C++). A C++ scripting library I wrote a while ago specifically uses signed integers for lengths in order to avoid this.

If you don't believe us, you can listen to what Bjarne Stroustrup and Chandler Carruth are saying.. If you don't know who Chandler Carruth is, he leads the C++, Clang, and LLVM teams at Google.

Zoom (orginal) [2022-01-02T23:40:27+01:00] view original

No, that's not the "real issue", if you make x.len - 3 produce an underflow instead then common loops like for i in 0..x.len - 4 crash instead of iterating 0 times...

Relying on 0..s.len-X iterating 0 times when X>len is not a good practice in my opinion, as it's not clear from the code if the author actually assumed that len will always be >=X or not. However, I understand it's short, convenient and efficient when the situation perfectly aligns with the code's behaviour.

Well much like C# and Java are "stuck" on their designs that actually work...

I actually meant my comment to be rather neutral (that's why I chose a pretty inconsequential comparison).

No, because if the underlying type allows for high(uint) then you cannot convert that safely to int for signed arithmetics anymore...

Well, it kind of explains how the standard library is not very consistent with using Natural type for indexing as if a bit unsure it's a good idea. ;) I was wrong as my comment only considered the type's role in indexing.

Here's a very solid article in favour of using unsigneds: https://graphitemaster.github.io/aau/ (excuse the absence of namedropping). I don't have a strong opinion here, just presenting some alternative arguments here.

Araq (orginal) [2022-01-03T07:28:45+01:00] view original

Well, it kind of explains how the standard library is not very consistent with using Natural type for indexing

Well indexes should be Natural, lengths should be int. Though Natural would probably be better as a second order type requirement like .requires: x >= 0.

Here's an interesting article in favour of using unsigneds: https://graphitemaster.github.io/aau/

for (size_t i = size - 1; i < size; i--) typing i < size instead of i >= 0 and casting the wrap-around behavior into stone doesn't seem like a good idea... The article is not very convincing, "just learn these N subtle patterns and apply them consistently everywhere" has been proven again and again not to work.

Here is another article, https://www.nayuki.io/page/unsigned-int-considered-harmful-for-java -- equally well written (IMHO), equally non-convincing.

Zoom (orginal) [2022-01-03T13:36:58+01:00] view original

for (size_t i = size - 1; i < size; i--) typing i < size instead of i >= 0 and casting the wrap-around behavior into stone doesn't seem like a good idea... The article is not very convincing, "just learn these N subtle patterns and apply them consistently everywhere" has been proven again and again not to work.

It sure doesn't if you're supposed to do it by hand. If Nim indexing used unsigneds and you could write for container.items().rev() which would use the appropriate checks or "insert conditions around these instructions" then it would work just fine. ;)

Araq (orginal) [2022-01-03T14:21:53+01:00] view original

No, it wouldn't "work just fine", rev is already an indicator that it doesn't work but here is maybe a stronger argument: You program with int, you get overflows, you change to BigInt in the appropriate places, things work. You program with uint, you don't get overflows, it's hard to debug and then you have no idea if a BigUint will save you as the semantics of wraparound are cast into stone...

Having said that, using BigInt as the default in a language makes plenty of sense. Too bad we cannot have that because of "performance".

demotomohiro (orginal) [2022-01-04T07:29:57+01:00] view original

With a few code change like for (size_t i = size - 1; i < size; i--) to for (size_t i = size - 1; i < size; i-=2) it become infinite loop when size == size_t.high. Because when i wraparound from 0 to 0xfffffffe, this loop still continues because 0xfffffffe < 0xffffffff is true.

Here is example C++ code:

#include <iostream>

int main() {
  uint32_t size = 0xffffffff;
  
  for(uint32_t i = size - 1; i < size; i -= 2);
  
  std::cout << "finished\n";
}

mratsim (orginal) [2022-01-10T18:18:25+01:00] view original

We've had this discussion often when implementing Ethereum 2, a multi-implementation blockchain(Go, Java, Nim, Rust, Typescript), securing over $27B of assets at the moment.

https://github.com/ethereum/consensus-specs/issues/626 (The integer debate)

https://github.com/ethereum/consensus-specs/issues/1029 (Spec needed to be modified to catch the several underflows we ran into)

https://github.com/ethereum/consensus-specs/pull/279/files (Example underflows to fix: a <= b - c needs to be changed to a + c <= b)

Mirror of forum.nim-lang.org

8737 :: Why does 'string' and 'len' use 'int' rather than 'uint' for length?