# nimble build -d:release=safe
# nim 2.2.4
import std/strformat
proc `!`*[T](a: openArray[T]): seq[T] =
  ## Turns an *openArray* into a sequence.
  ##
  ## This is not as efficient as turning a fixed-length array into a sequence,
  ## as it always copies every element of `a`.
  newSeq(result, a.len)
  for i in 0 ..< a.len: result[i] = a[i]
block:
  var a: array[1_000_000, int]
  var b: seq[int] = newSeq[int]()
  for i in low(a) .. high(a):
    b.add(a[i])
  echo &"{b.len = }"
  let c = !a # this is ok
  echo &"{c.len = }"
  let d = @a # this is ok
  echo &"{d.len = }"
block:
  var a: array[10_000_000, int]
  var b: seq[int] = newSeq[int]()
  for i in low(a) .. high(a):
    b.add(a[i])
  echo &"{b.len = }"
  let c = !a # this is ok
  echo &"{c.len = }"
  let d = @a # Segmentation fault (core dumped)
  echo &"{d.len = }"
Creating a new sequence from a large array (10,000,000 integers): building it with newSeq and add works, and toSeq works, but `@` fails with a segfault.
Creating a new seq using `@` from an existing seq of length 10,000,000 works.
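For reference, the working `toSeq` path mentioned above can be sketched like this (a minimal sketch: `toSeq` from std/sequtils walks the array through the `items` iterator, so it copies one element at a time and never needs an array-sized temporary on the stack):

```nim
import std/sequtils

var a: array[10_000_000, int]
# toSeq iterates element-by-element via items(), so no array-sized
# stack temporary is created the way it is with `@`.
let c = toSeq(a)
echo c.len
```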
This comes up a lot - and not just in Nim. Here's another thread (with links to others). Briefly, even the default limits depend upon the sysadmin of a given deployed system. I have been cranking mine way up for decades. No idea how common/rare this is on Windows.
A program usually has no idea what limits are in force at start-up. Often enough, the scale of a problem just comes from external data like files / command parameters / the network, etc. AFAICT, there's just little culture of checking limits - much like C's culture of unchecked array bounds.
Perhaps not as part of the stdlib / compiler runtime for conversion, which can stay as is, but maybe just as an additional stdlib call: might it be helpful to add some kind of portable interface using GetCurrentThreadStackLimits on Windows, getrlimit on Unix, and NotSure on "other"? This would make it easy for a Nim program to guesstimate its needs and, if they are not met by a 1.5..2X margin or more, error out with a helpful message. Then, perhaps, we could make a more helpful response to queries like @biran2038's?
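As a sketch of what the Unix half of such a portable call might look like (an assumption-laden sketch: all names are pulled straight from `<sys/resource.h>` via importc rather than std/posix, a 64-bit `rlim_t` is assumed, and the Windows GetCurrentThreadStackLimits branch is omitted):

```nim
# Hedged sketch: query the current soft stack limit on a Unix system.
type RLimit {.importc: "struct rlimit", header: "<sys/resource.h>".} = object
  rlim_cur, rlim_max: culong

var RLIMIT_STACK {.importc, header: "<sys/resource.h>".}: cint

proc getrlimit(resource: cint, rlp: var RLimit): cint {.importc, header: "<sys/resource.h>".}

proc softStackLimit*(): int64 =
  ## Soft stack limit in bytes, or -1 on failure or RLIM_INFINITY.
  var r: RLimit
  if getrlimit(RLIMIT_STACK, r) != 0 or uint64(r.rlim_cur) > uint64(high(int64)):
    return -1
  int64(r.rlim_cur)

echo softStackLimit()
```

A program could then compare this against, say, 1.5..2X its largest anticipated stack need at start-up and error out early with a helpful message, as suggested above.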
Maybe I'm ignorant and something like this already exists, but I did not find GetCurrentThreadStackLimits anywhere except as an importc in winim/winim/inc/winbase.nim. Maybe there's some problem with it working on Windows, or the API goes by another name?
Another option (which might already be in wide use?) is using the backend C compiler's sanitizer/etc. modes to isolate failures before they blow these limits out more mysteriously, but a helpful message for an end user who needs to ask an admin to do something is probably not on offer there.
But `a` is not on the stack: global variables live in the data segment (or in this case, I guess, the bss). Had `a` lived on the stack, it would have crashed immediately, like this:
proc main =
  var a: array[10_000_000, int]
  echo a.len # goodbye

main()
That raises the question: what is overflowing the stack? Compiling the following code with -d:danger:
var myArray: array[10_000_000, int]
let myArrayCopy = @myArray
stdout.writeLine myArrayCopy.len
yields
typedef NI tyArray__XQpq8SssyrXu6SqZAuwqFg[10000000];
[...]
N_LIB_PRIVATE tyArray__XQpq8SssyrXu6SqZAuwqFg myArray__x_u1;
[...]
N_LIB_PRIVATE N_NIMCALL(void, NimMainModule)(void) {
    {
        tyArray__XQpq8SssyrXu6SqZAuwqFg colontmpD_;
        NimStringV2 colontmpD__2;
        NI T2_;
        tyArray__nHXaesL0DJZHyVS07ARPRA T3_;
        NI T4_;
        NIM_BOOL* nimErr_;
        nimErr_ = nimErrorFlag();
        nimZeroMem(((void*) colontmpD_), sizeof(tyArray__XQpq8SssyrXu6SqZAuwqFg));
        colontmpD__2.len = 0;
        colontmpD__2.p = NIM_NIL;
        nimCopyMem(((void*) colontmpD_), ((NIM_CONST void*) myArray__x_u1), sizeof(tyArray__XQpq8SssyrXu6SqZAuwqFg));
        myArrayCopy__x_u6.len = 10000000;
        myArrayCopy__x_u6.p = ((tySequence__qwqHTkRvwhrRyENtudHQ7g_Content*) newSeqPayload(10000000, sizeof(NI), NIM_ALIGNOF(NI)));
        T2_ = ((NI) 0);
        for (T2_ = 0; T2_ < 10000000; T2_++) {
            (myArrayCopy__x_u6).p->data[T2_] = colontmpD_[T2_];
        }
        T4_ = myArrayCopy__x_u6.len;
        colontmpD__2 = dollar___systemZdollars_u14(T4_);
        if (NIM_UNLIKELY((*nimErr_))) {
            goto LA1_;
        }
        T3_[0] = colontmpD__2;
        writeLine__x_u11(stdout, T3_, 1);
[...]
So it seems `@` first copies myArray to the stack (the `colontmpD_` local above), and then copies that copy to the heap. I don't know why.
Maybe the backend C compiler optimizes the large array out of the stack because the array in your code is zero-filled, so it can generate assembly that just adds zeros to the seq. But when the `@` operator is used, such an optimization isn't applied and the large array is generated on the stack. In that case, you probably need to read the assembly code generated by the backend C compiler. (If you use GCC, my article might help with generating assembly code: https://internet-of-tomohiro.pages.dev/nim/nimruntimecheckoptimize.en)
Or try assigning runtime values (read from stdin, files, etc.) to the big array so that no optimization can remove the stack variable.
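A minimal way to apply that suggestion might look like this (a sketch: `paramCount()` stands in for any value the optimizer can't predict at compile time, and the array is kept at 1,000,000 elements, the size that still worked above, so the program actually runs):

```nim
import std/os

var myArray: array[1_000_000, int]
# Seed one element with a runtime value so the backend C compiler
# cannot fold the zero-filled array away.
myArray[0] = paramCount() + 1
let myArrayCopy = @myArray
echo myArrayCopy.len
```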
Nim has a tendency to introduce lots and lots of temporary copies of everything - these slow things down and use up stack space. Likely, `@` falls into the category of code where Nim simply adds a few copies here and there due to bugs / incomplete analysis.
Indeed, the recommendation in such cases is to look at the C code and report an issue with a small example where you understand that the copy is redundant but the Nim compiler does not.
but I don't know about the complexities involved in that.
proc `@`* [IDX, T](a: sink array[IDX, T]): seq[T] {.magic: "ArrToSeq", noSideEffect.}
This sink annotation here is just a typo, I think.