nimforum mirror - Variable length array

brianrogoff [2014-07-17T19:00:52+02:00]

Inspired by a thread over at the D forum, I'm wondering if there is a way to get stack allocated variable length arrays in Nimrod, or the rationale against them if there is not.

AFAIK, Nimrod's arrays have sizes given at compile time, while Nimrod's sequences can have a length determined at run time and are resizable. I'm asking about an intermediate thing, like C99's VLAs or Ada's stack allocated arrays, whose size is determined at run time but which is not resizable and so can potentially avoid the GC.

Araq [2014-07-17T19:40:18+02:00]

Nimrod is not D. We have a working GC implementation and a language design that is not hostile to an efficient GC implementation. "Avoiding the GC" already sounds like a broken record.

The reason for not providing them is simplicity. That said, people surely want them. Maybe after 1.0 is out.

But in my opinion we should leave them out and do an escape analysis instead. This is very simple to do with the current semantics and would help strings too.

Jehan [2014-07-17T19:48:00+02:00]

The main practical problem with stack-allocated arrays is that they can result in stack overflow unless their size is bounded. But if their size is bounded, you can generally just use a fixed-size array of the maximum length you can tolerate instead.

If you truly do need variable-length arrays, then you have the following options:

Allocate small arrays on the stack (using a fixed size) and large arrays on the heap.

Maintain an external stack of type array[largenumber,T] and pass slices of that stack as an openarray parameter.
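
The first option above can be sketched roughly as follows. This is only an illustration in current Nim syntax (the language was still called Nimrod when this thread was written); MaxStackLen, sumOf and sumFirstN are hypothetical names, and the threshold of 64 is an arbitrary assumption.

```nim
const MaxStackLen = 64  # hypothetical cut-off between stack and heap storage

proc sumOf(a: openArray[int]): int =
  # Works on any contiguous run of ints, whether stack or heap backed.
  for x in a:
    result += x

proc sumFirstN(n: int): int =
  # Sums 1..n, using a fixed-size stack buffer when n is small enough.
  if n <= 0:
    return 0
  if n <= MaxStackLen:
    var buf: array[MaxStackLen, int]   # lives on the stack, size known at compile time
    for i in 0 ..< n:
      buf[i] = i + 1
    result = sumOf(buf.toOpenArray(0, n - 1))  # pass only the used prefix
  else:
    var buf = newSeq[int](n)           # falls back to the GC-managed heap
    for i in 0 ..< n:
      buf[i] = i + 1
    result = sumOf(buf)
```

Because both branches hand the data on as an openArray, the consumer does not need to know where the storage came from.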

brianrogoff [2014-07-17T22:02:02+02:00]

Almost everything I work on these days benefits from a GC, but I don't believe that even the most efficient GC is always a win against stack allocation. I also don't believe that escape analysis will always allow you to elide heap access better than explicitly using stack allocated collections. I agree that if it can, it would be best to leave them out. Having VLAs would be better than another !@#$ing pragma for stack allocation :-).

Nimrod is not D, or C++, or Ada, but it should be competing with all of them. I'm glad to read that at least there's openness to VLAs at some point in the future.

Jehan [2014-07-17T23:27:59+02:00]

Here's a basic implementation of VLAs (caution, there are no checks to make sure that you don't use them past their lifetime):


include system/ansi_c

proc alloca(n: int): pointer {.importc, header: "<alloca.h>".}

type
  UncheckedArray{.unchecked.}[T] =
    array[1, T]
  VarLengthArray*[T] =
    ptr object
      size: int
      data: UncheckedArray[T]

proc `[]`*[T](a: VarLengthArray[T], i: int): T =
  assert i >= 0 and i < a.size
  result = a.data[i]

proc `[]=`*[T](a: VarLengthArray[T], i: int, x: T) =
  assert i >= 0 and i < a.size
  a.data[i] = x

proc len*[T](a: VarLengthArray[T]): int =
  a.size

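# A template rather than a proc: alloca must run in the caller's stack
# frame, so that the memory stays valid until the calling proc returns.
template newVLA*(T: typedesc, n: int): expr =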
  let bytes = sizeof(int) + sizeof(T)*n
  var vla = cast[VarLengthArray[T]](alloca(bytes))
  c_memset(vla, 0, bytes)
  vla.size = n
  vla

proc toSeq*[T](a: VarLengthArray[T]): seq[T] =
  result = newSeq[T](len(a))
  for i in 0..len(a)-1:
    result[i] = a[i]

With a bit more effort, one can probably also mirror the seq layout so that a VLA can be passed as an openarray parameter.

That said:

brianrogoff: I also don't believe that escape analysis will always allow you to elide heap access better than explicitly using stack allocated collections.

That case is actually fairly simple to check for. When a seq object is allocated locally, never assigned to another variable, never returned, never resized, and only passed as an openarray to other procedures, that's completely equivalent to having a local declaration of a variable length array.
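
As an illustration of the pattern described above, here is a small sketch in current Nim syntax. The proc names are hypothetical, and nothing here claims that the compiler actually performs this optimization; it only shows a local seq that meets all of the listed conditions.

```nim
proc total(xs: openArray[int]): int =
  for x in xs:
    result += x

proc sumOfSquares(n: int): int =
  # `squares` is allocated locally, never assigned to another variable,
  # never returned, never resized, and only passed on as an openArray.
  var squares = newSeq[int](n)
  for i in 0 ..< n:
    squares[i] = i * i
  result = total(squares)
```

In principle, a seq used like this could be replaced by stack storage without changing the program's behaviour, which is the equivalence being argued.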

I'll also still say that this is a case of premature optimization and even if you need to optimize it, it may not be the best way to optimize your code (especially if you're risking a stack overflow).

brianrogoff [2014-07-22T18:08:37+02:00]

Thanks for the code Jehan, nice example of Nimrod extensibility!

How would I take your suggested next step, namely openarray compatibility for VarLengthArray? If you could point me in the direction of code to look at I'll give it a try; I didn't see anything in a quick pass through the docs.

Jehan [2014-07-22T19:17:23+02:00]

You can find the layout of the seq header in lib/system.nim as TGenericSeq and PGenericSeq (various stuff that uses these types is in lib/system/sysstr.nim). PGenericSeq corresponds to the VarLengthArray[T] type that I defined above; it just has a different header and does not explicitly define a data attribute, though seq data is laid out similarly.

The len attribute in TGenericSeq is the same as size above; the reserved attribute contains the allocated space for data in its low-order bits (one per slot, not one per byte) and a flag in the most significant bit that determines whether sequence assignment should be shallow. You can probably just set reserved to the same size as len (same as newSeq in lib/system/gc.nim does, except that you'll be using alloca to get memory).

This should correctly mimic the layout of standard sequences. You cannot fully use the resulting type as a general sequence, since resizing or copying is likely to mess up the stack, the heap, or both. However, you can use it as an openarray, best via a template + cast, e.g.


# Untested code
template asArray[T](a: VarLengthArray[T]): openarray[T] =
  cast[seq[type(a[0])]](a)

brianrogoff [2014-07-24T16:16:00+02:00]

Thanks again Jehan, with that hint it was easy to get a first pass of Nimrod VLAs based on alloca working, and compatible with open array. In my quick timing test against seq[char] (which I built up using 'add', is there a better way?) the VLAs were more than 50% faster.

gradha [2014-07-24T17:02:05+02:00]

You can preallocate space by using the version of newSeq that takes a len parameter, instead of growing the seq with add. But if you are using seq[char], isn't that essentially a string preallocated with newStringOfCap?
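
For comparison, the three ways of building the buffer mentioned here could look roughly like this in current Nim syntax. The proc names and the fill pattern are made up for illustration; no timings are implied.

```nim
proc fillByAdd(n: int): seq[char] =
  # Grows the seq one element at a time, which may reallocate along the way.
  result = @[]
  for i in 0 ..< n:
    result.add(chr(ord('a') + i mod 26))

proc fillPreallocated(n: int): seq[char] =
  # Allocates all n slots up front, then assigns by index.
  result = newSeq[char](n)
  for i in 0 ..< n:
    result[i] = chr(ord('a') + i mod 26)

proc fillString(n: int): string =
  # Reserves capacity for n chars; the length still starts at 0.
  result = newStringOfCap(n)
  for i in 0 ..< n:
    result.add(chr(ord('a') + i mod 26))
```

Note that newSeq sets the length immediately, while newStringOfCap only reserves capacity, so the string version still uses add.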

Mirror of forum.nim-lang.org

499 :: Variable length array