nimforum mirror - Memory layout of a seq[int]

lux [2019-05-11T13:01:20+02:00]

Hi,

Can somebody tell me what the memory layout of a seq is? I've compiled a Nim function as a dynamic library which returns a seq[int], but I cannot really figure out the structure.

mratsim [2019-05-11T13:42:11+02:00]

I suggest you have your function return a ptr to the first element and a length instead, and make sure that a variable holds on to the seq for as long as the pointer is in use, so it's not garbage collected.

At a low level, a seq is:

2 ints (C int64_t on 64-bit arch) for length and reserved space

a pointer to a contiguous heap-allocated array of elements

See: https://github.com/nim-lang/Nim/blob/721bf7188bfff3a3ae1db44bece57cca3dfe8461/lib/system.nim#L502-L508

when not defined(JS) and not defined(gcDestructors):
  type
    TGenericSeq {.compilerproc, pure, inheritable.} = object
      len, reserved: int
      when defined(gogc):
        elemSize: int
    PGenericSeq {.exportc.} = ptr TGenericSeq

The layout changes a bit when using destructors, the JavaScript backend, or the Go GC.

The way to use a seq in FFI is:

## raw example

let a = @[1, 2, 3, 4, 5]

let a_ptr = a[0].unsafeAddr # cast to ptr UncheckedArray if you want array indexing
# if a is a var, a[0].addr is enough


## C Interface function

proc foo(a: seq[int]): tuple[p: ptr int, len: int] {.exportc.} =
  result = (a[0].unsafeAddr, a.len)
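
To illustrate the "cast to ptr UncheckedArray" comment above, here is a minimal sketch of the receiving side. It assumes the caller got a pointer to the first element plus a length, exactly as foo returns them. The name sumRaw is hypothetical and not part of any library.

```nim
# Hypothetical helper: walks a raw (pointer, count) pair like the one foo returns.
proc sumRaw(p: ptr int, count: int): int =
  ## Sums `count` ints starting at `p`.
  if p.isNil or count <= 0:
    return 0
  # Reinterpret the element pointer as an unchecked array to get indexing.
  let view = cast[ptr UncheckedArray[int]](p)
  for i in 0 ..< count:
    result += view[i]
```

No bounds checking happens on an UncheckedArray, so the count has to come from the same seq the pointer was taken from.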

lux [2019-05-11T19:49:01+02:00]

Thanks a lot mratsim, works like a charm!

FYI, see my end result below. It calculates the primes up to a certain number with the 'sieve of Eratosthenes' and is quite fast (already 70 times faster than the scripting language I used first).

func sieve*(n: int): tuple[p: ptr int, len: int] {.cdecl, exportc: "sievenim", dynlib.} =
  var a: seq[int]
  a.add(2)
  # let a_ptr = a[0].unsafeAddr
  var arr = newSeq[bool](n + 1)
  for x in countup(3, n, 2):
    if not arr[x]:
      a.add(x)
      for y in countup(x * x, n, x):
        arr[y] = true
  result = (a[0].addr, a.len)

mratsim [2019-05-11T20:19:10+02:00]

In your code, a should be passed as a parameter, because otherwise the result is "escaping".

At the end of the function, the GC is allowed to collect and deallocate the sequence, because no variable holds on to it anymore (it was local).

That will make your result pointer point to invalid data.
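
One way to avoid the dangling pointer, sketched under the assumption that a single module-level variable is acceptable: keep the seq in a global, so something still holds on to it after the function returns. The names primesStore and sievePrimes are hypothetical. The function is not reentrant, and each call overwrites the previous result.

```nim
# Hypothetical names throughout; this is a sketch, not the code from the thread.
var primesStore: seq[int]  # module-level, so the seq outlives each call

proc sievePrimes(n: int, outLen: ptr int): ptr int {.cdecl, exportc: "sieve_primes".} =
  ## Fills primesStore with the primes <= n, stores the count in outLen[]
  ## and returns a pointer to the first prime (nil if there are none).
  primesStore.setLen(0)
  if n >= 2:
    primesStore.add(2)
    var composite = newSeq[bool](n + 1)
    for x in countup(3, n, 2):
      if not composite[x]:
        primesStore.add(x)
        for y in countup(x * x, n, 2 * x):
          composite[y] = true
  outLen[] = primesStore.len
  if primesStore.len > 0:
    result = primesStore[0].addr
```

The returned pointer stays valid only until the next call, since that call resizes the same seq.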

Regarding Sieve of Eratosthenes I have an implementation here that uses a bitvector and should be significantly faster.

lux [2019-05-11T21:20:47+02:00]

Ah, thank you, that is what I was afraid of and what I was experiencing. Should I do it like below?

func sieve*(n: int, a: var seq[int]): int {.cdecl, exportc: "sievenim", dynlib.} =
  a.add(2)
  # let a_ptr = a[0].unsafeAddr
  var arr = newSeq[bool](n + 1)
  for x in countup(3, n, 2):
    if not arr[x]:
      a.add(x)
      for y in countup(x * x, n, x):
        arr[y] = true
  result = a.len

mratsim [2019-05-12T00:30:26+02:00]

Yes, it's better. Note that a.len is already available to the caller, so you can use the result for an error code or have no result value at all.
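
A caller that is not written in Nim may not be able to construct a seq to pass in. A common alternative, sketched here as an assumption rather than something proposed in this thread, is to let the caller own a plain buffer and use the return value as a count or error code. The name sieveInto and the error values -1 and -2 are hypothetical.

```nim
# Hypothetical sketch: the caller owns `buf`, so no Nim-managed memory escapes.
proc sieveInto(n: int, buf: ptr UncheckedArray[int], capacity: int): int {.cdecl, exportc: "sieve_into".} =
  ## Writes the primes <= n into buf.
  ## Returns the number of primes written, -1 for invalid arguments,
  ## or -2 if buf is too small to hold all primes.
  if buf.isNil or capacity < 0:
    return -1
  if n < 2:
    return 0
  if capacity < 1:
    return -2
  var composite = newSeq[bool](n + 1)
  buf[0] = 2
  var count = 1
  for x in countup(3, n, 2):
    if not composite[x]:
      if count >= capacity:
        return -2
      buf[count] = x
      inc count
      for y in countup(x * x, n, 2 * x):
        composite[y] = true
  result = count
```

With this shape the only memory shared across the library boundary belongs to the caller, so the GC cannot free it.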

lux [2019-05-14T21:08:46+02:00]

Well, it was not easy getting this to work. After several segfaults and illegal-index errors I (finally! RTFM!) looked into some more documentation, but really couldn't find much about working with dynamic libraries created in Nim. What I did see is that more people on this forum had the same problem, and there I found a 'magic' switch: compile with --gc:regions (mentioned by Araq). This worked out great! I can even use the seq memory layout 'as is': the first int is the length, the second the capacity, and then the ints of my seq[int] follow. I can easily use the FFI interface of the scripting language (newLISP in my case).
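
As a rough sketch of what reading that layout looks like, the code below walks a block shaped as "length, capacity, elements". It assumes exactly the layout described in this post. Other compiler versions and memory management modes lay out a seq differently, so this is not a general-purpose reader. SeqHeader and readIntSeq are hypothetical names.

```nim
type
  SeqHeader = object  # hypothetical; mirrors the two leading ints described above
    length: int
    reserved: int

proc readIntSeq(p: pointer): seq[int] =
  ## Copies the elements out of a block laid out as: length, reserved, elements...
  if p.isNil:
    return @[]
  let header = cast[ptr SeqHeader](p)
  # The elements start right after the two header ints.
  let data = cast[ptr UncheckedArray[int]](cast[int](p) + sizeof(SeqHeader))
  result = newSeq[int](header.length)
  for i in 0 ..< header.length:
    result[i] = data[i]
```

Copying the elements out, instead of keeping the pointer, means the result no longer depends on the lifetime of the original block.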

What was also nice is that my slightly enhanced function (thanks to a tip in mratsim's number_theory library) is now almost as fast as the super duper bit vector version from mratsim (compiled with -d:release).

My version generates all primes up to 1_000_000_000 in about 6.8 seconds (cpuTime()), the super duper bit vector version in about 6.4 seconds, on my MacBook Air.

For reference here is my final function, compiled with: nim c -d:release --gc:regions --app:lib sieve.nim

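import math  # provides ln

func sieve2*(n: int): seq[int] {.exportc.} =
  # Pre-size using the n / ln(n) prime-count estimate plus 20% headroom.
  result = newSeqOfCap[int](int(n.float / ln(n.float) * 1.2))
  result.add(2)
  var arr = newSeq[bool](n + 1)  # arr[i] == true marks i as composite
  for x in countup(3, n, 2):
    if not arr[x]:
      result.add(x)
      # Start at x*x and step by 2*x to skip even multiples.
      for y in countup(x * x, n, 2 * x):
        arr[y] = true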
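
The capacity expression in sieve2 can be looked at on its own. The sketch below isolates it in a hypothetical helper, estimatePrimeCapacity, with a guard for n < 2 that the original does not have. The value is only a sizing hint: for n = 113 it gives 28 while there are 30 primes up to 113, which is harmless because the seq grows when add exceeds the reserved capacity.

```nim
import math  # provides ln

# Hypothetical helper isolating the capacity estimate used in sieve2.
func estimatePrimeCapacity(n: int): int =
  ## Approximates the number of primes <= n as 1.2 * n / ln(n).
  if n < 2:
    return 0  # ln(1) == 0 would otherwise cause a division by zero
  int(n.float / ln(n.float) * 1.2)
```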

Mirror of forum.nim-lang.org

4837 :: Memory layout of a seq[int]