nimforum mirror - Avoiding RangeError getting address of empty seq

snej [2020-07-22T21:03:07+02:00]

There are times I need to pass a seq as a pointer and byte-count, usually to C code but sometimes to Nim code like AsyncSocket.send.

If I do it like this:

socket.send(addr data[0], data.len)

it unfortunately raises a RangeError when data is empty. So I have to complicate my code with an if test.

Is there a way to get the address of a seq's items, that doesn't have this problem? (addr data doesn't work because it gives the address of the seq itself, which starts with a length field not the data.)

treeform [2020-07-22T21:16:37+02:00]

I don't think so. When data.len == 0 there is no allocation storage. The pointer pointing to the array is nil. A seq is just 3 values (len, ptr and cap). When len = 0 it's just (0, 0, 0) everywhere.

You will be just sending it:


socket.send(nil, 0)

Stefan_Salewski [2020-07-22T21:20:13+02:00]

The problem is not that the seq is empty, but it is in an uninitialized state still.

var s1: seq[int]
var s2 = newSeq[int]()

s2 should give no problem. The core part of a seq is a pointer to the actual data. I think for s1 that pointer is still just nil. Unfortunately len is zero for both s1 and s2, so we cannot tell which case we have. What is the test you do? Unfortunately I cannot really answer your question.

Stefan_Salewski [2020-07-22T21:34:19+02:00]

A seq is just 3 values (len, ptr and cap).

I think that is wrong for modern Nim. sizeof(seq) is 8. I think one of the two ints, cap or len, is not stored in the seq value object but in the data buffer.

snej [2020-07-23T00:42:54+02:00]

Um ... I think both of you misunderstood my question, since neither of your replies make any sense to me. Let me be specific.

Given a seq, how do I get a pointer to its items, in a way that doesn't fail when the seq is empty?

I'm looking for something equivalent to C++'s std::vector::data() and std::string::data().

In most cases I could just skip the call entirely. But (a) that involves adding a check, which is easy to forget; and (b) there are cases where the pointer/length you're passing is only part of the call, and you don't want to skip it if it's empty.

@treeform:

When data.len == 0 there is no allocation storage. The pointer pointing to the array is nil.

That's not a problem, since the size is 0.

@Stefan_Salewski:

The problem is not that the seq is empty, but it is in an uninitialized state still.

No, it's empty. There's no such thing as an uninitialized seq.

All this stuff about the fields in a seq is irrelevant. All I was pointing out is that addr data doesn't work because it points to the seq itself not to the items.

Stefan_Salewski [2020-07-23T06:27:52+02:00]

how do I get a pointer to its items

Well you did it the right and common way in your first post:

socket.send(addr data[0], data.len)

When we have var s1: seq[T] then addr(s1[0]) is the address of the first element. But as I told you, the dereference s1[0] may fail when the seq is uninitialized, as it is a nil deref.

proc main =
  var s1: seq[int]
  var s2 = newSeq[int]()
  
  echo cast[int](s1)
  echo cast[int](s2)
  
  var a: ptr int = addr(s1[0])

main()


$ ./t
0
140675632484432
/tmp/t.nim(10)           t
/tmp/t.nim(8)            main
/home/salewski/Nim/lib/system/fatal.nim(49) sysFatal
Error: unhandled exception: index out of bounds, the container is empty [IndexDefect]

And indeed we get the same error for s2, so you have to test for len == 0 before using subscript operator [].

jibal [2020-07-23T08:53:45+02:00]

C and C++ define the address of the element one beyond the end of an array to be referable but not dereferenceable ... that is, you can take its address but you can't access its content. Nim considers merely taking the address of the element one beyond the end of an array or sequence to be a range error, which I think is a mistake.

lscrd [2020-07-23T17:05:54+02:00]

Empty is not uninitialized--they are two completely different concepts.

Yes, indeed, except that the compiler processes them the same way, i.e. both are considered to have length 0.

In fact, there are no uninitialized objects in Nim. If you write var s: seq[int], the memory area representing s is filled with a nil value, which is semantically equivalent to a sequence of length 0. So, one can say that all sequences are by default initialized to a sequence of length 0. This was not the case in previous versions, where a nil sequence was considered to be different from a sequence of length 0.

Said another way, in the current version of Nim var s: seq[int] and var s = newSeq[int]() are semantically equivalent, but represented differently.
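
A small sketch of that equivalence, assuming a recent Nim compiler (the variable names are only for illustration):

```nim
var s1: seq[int]          # no explicit initialization
var s2 = newSeq[int]()    # explicitly created, length 0

# Both behave as empty sequences and compare equal.
doAssert s1.len == 0
doAssert s2.len == 0
doAssert s1 == s2

# Appending works on either one without a nil check by the caller.
s1.add 1
s2.add 1
doAssert s1 == s2
```

Only the internal representation may differ; nothing visible through len, == or add distinguishes the two.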

I think that, since we frequently need to get the address of the raw data when interfacing with C, it would be helpful if a predefined proc existed for this purpose. But, as there is no semantic difference between a sequence with no memory allocated and a sequence of length 0, I’m afraid that we cannot avoid one test, or even two. This procedure would be equivalent to:


proc bufferAddr[T](s: seq[T]): pointer =
  if s.len == 0: nil else: unsafeAddr(s[0])

There are two tests: one to compute the length and one to test the length.
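
One way such a helper might be used is sketched below. Both dataPtr and countNonZero are hypothetical names: dataPtr is a variant of the proc above that takes a var seq, and countNonZero merely stands in for a C-style function taking a pointer and a byte count.

```nim
# Hypothetical helper: nil for an empty seq, address of the first item otherwise.
proc dataPtr[T](s: var seq[T]): ptr T =
  if s.len > 0:
    result = addr s[0]   # result stays nil when the seq is empty

# Stand-in for a C-style API taking (pointer, byte count).
# It never touches p when n == 0.
proc countNonZero(p: pointer, n: int): int =
  let bytes = cast[ptr UncheckedArray[byte]](p)
  for i in 0 ..< n:
    if bytes[i] != 0: inc result

var empty: seq[byte]
var full = @[1'u8, 0, 3]

# No if test is needed at the call site, even for the empty seq.
doAssert countNonZero(dataPtr(empty), empty.len) == 0
doAssert countNonZero(dataPtr(full), full.len) == 2
```

The length test still happens, but it lives in one place instead of at every call site.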

snej [2020-07-23T19:46:30+02:00]

A predefined proc for this could avoid the unnecessary range check and simply return the internal seq field that points to the items — that would boil down to one CPU instruction.

If the seq is empty the result would presumably be nil, but as explained several times above (thanks, @jibal!) it's irrelevant what the pointer is when the length is 0.

lscrd [2020-07-23T20:28:33+02:00]

No, you can’t avoid the test as the area containing the length, the capacity and the items may not have been allocated yet, as in var s: seq[int]. You can’t return the address of something which doesn’t exist.

Now, I agree that if this area has been allocated, it would be possible to simply return something like s[]+16 (considering s as a pointer). But, then there is an inconsistency as some sequences of length 0 (those non allocated) will return nil, while others (those empty) will return a non nil value. It would be better to return nil in both cases.

And the problem was exactly the same when non allocated sequences and empty sequences were considered different. Before accessing the area containing the length, the capacity and the items, you had to check that the sequence was not nil.

To get the address of the area containing the items without checking that s is not nil, the compiler should systematically create the area containing the length, the capacity and the (empty) item field. I suppose it would have simplified some things (no need for the runtime to check for nil when doing operations on sequence) at the price of a possibly useless allocation.

jibal [2020-07-23T21:39:20+02:00]

Yes, indeed, except that the compiler processes them the same way, i.e. both are considered to have length 0.

No, you're misunderstanding, and making the same mistake. An uninitialized seq, which you can get with {.noinit.}, has not been zero-filled ... it contains junk, and using it is likely to produce an access violation, e.g.,

proc foo*(): seq[int] =
  var x {.noinit.}: seq[int]
  x

echo foo().len

crashes on my machine with "SIGSEGV: Illegal storage access. (Attempt to read from nil?)"

If you don't put {.noinit.} on it, then it is initialized to be empty ... it is not uninitialized, which is a different concept.

In fact, there are no uninitialized objects in Nim.

Didn't I just address this in my comment "you can only get an uninitialized seq by using {.noinit.} (so strictly speaking, snej is incorrect that there's no such thing" ? It would really help if people would read what they're responding to before responding.

No, you can’t avoid the test as the area containing the length, the capacity and the items may not have been allocated yet, as in var s: seq[int]. You can’t return the address of something which doesn’t exist.

This is complete nonsense. The address is obtained by pointer arithmetic between the pointer to the data and the index--for an empty seq, the pointer is nil (0), and the index is going to be 0, so the address you get is nil (0). The compiler could avoid the test by ... not doing the test. But as I already explained, the right way to do this is for the compiler to view the address of the element one beyond the end to be in range when the operation is just taking the address--but currently the compiler makes no distinction between taking the address and actually accessing the element--it considers the legal range to be 0..len-1 in either case.
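
The arithmetic described here can be illustrated in plain Nim without touching the seq internals. This is only a sketch: elemAddr is a hypothetical helper, not something the compiler or standard library provides.

```nim
# Hypothetical helper: base + i * sizeof(T), with no range check and no memory access.
proc elemAddr[T](base: ptr T, i: int): ptr T =
  cast[ptr T](cast[uint](base) + uint(i) * uint(sizeof(T)))

var buf = [10, 20, 30]
let first = addr buf[0]

# A valid index yields a dereferenceable address.
doAssert elemAddr(first, 2)[] == 30

# One past the end: the address can be computed, but must not be dereferenced.
doAssert cast[uint](elemAddr(first, 3)) == cast[uint](first) + uint(3 * sizeof(int))

# A nil base with index 0 simply yields nil.
let none: ptr int = nil
doAssert elemAddr(none, 0).isNil
```

Computing the address never reads memory, so no trap can occur until something actually dereferences the result.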

I've programmed since 1965, for decades in assembler and C, and was a member of the C language standards committee, so I have a lot of experience in this area and I always think in terms of the underlying machine-level operations, but I think that people who haven't been so close to the metal tend to get confused about such differences as between uninitialized vs. default initialized to 0, and between getting an error exception produced by an explicit error check by the runtime and getting an error exception due to a trap produced by the memory protection hardware upon access to location 0 of memory.

snej [2020-07-23T21:51:08+02:00]

But, then there is an inconsistency as some sequences of length 0 (those non allocated) will return nil, while others (those empty) will return a non nil value. It would be better to return nil in both cases.

It doesn't matter, actually. Any two zero-length byte ranges (slices) are equivalent. There's no reason to compare the pointers, nor is there a reason to dereference them.

I've been creating & maintaining a pretty large C++ codebase for five years that heavily uses a struct I call slice that's just a (pointer, length) tuple, as discussed here. I am very, very familiar with zero-length slices. So believe me when I say things like "it doesn't matter" above :)
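
As a rough illustration of why the pointer is irrelevant for a zero-length slice, here is a minimal Nim sketch. ByteSlice and its fields are hypothetical names, loosely modeled on the (pointer, length) struct described above, not taken from any real codebase.

```nim
# Hypothetical (pointer, length) pair.
type ByteSlice = object
  p: pointer
  len: int

# Content equality: the pointers are never inspected when the length is 0.
proc `==`(a, b: ByteSlice): bool =
  if a.len != b.len: return false
  if a.len == 0: return true
  equalMem(a.p, b.p, a.len)

var data = @[1'u8, 2, 3]
let emptyNil = ByteSlice(p: nil, len: 0)
let emptyInside = ByteSlice(p: cast[pointer](addr data[0]), len: 0)

# Two empty slices are equal whether or not their pointers are nil.
doAssert emptyNil == emptyInside
```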

jibal [2020-07-23T22:01:27+02:00]

It doesn't matter, actually.

Note my comment above: "there is a subtlety: the quoted C standard says that "pointer arguments on such a call shall still have valid values", which is not true of NULL (nil). But few implementations check the pointer when the len is 0, so people get away with it." That is, according to the C standard, merely loading a pointer that contains NULL is undefined. But it's not an issue in practice.

lscrd [2020-07-24T00:18:31+02:00]

I think we were in a dialogue of the deaf.

You were focused on the difference between uninitialized (with {.noinit.}) and initialized sequences (other sequences) whereas I was focused on the difference between sequences with no explicit initialization and sequences with explicit initialization.

What I didn’t see is that it is indeed possible to take the address contained in s and add the offset to the start of the item list (actually 16) even if s contains a nil address. Indeed, in this case, the length is considered to be 0, and so any address should be acceptable.

I find this ugly, but it works, provided nothing is accessed at this address, which should be the case. Personally, I would prefer to return a null address when the length is 0, even at the cost of two tests. But as you said, null pointers are no more valid in this case, so it isn’t a real improvement.

Mirror of forum.nim-lang.org

6575 :: Avoiding RangeError getting address of empty seq