nimforum mirror - pointer/length -> seq[int8]?

cy [2016-06-14T23:33:08+02:00]

If I'm calling a C routine that provides a pointer to a byte blob, and an integer length, how do I return that as a seq[int8]? Or is there some sort of array[...,int8] that I should return instead?

def [2016-06-15T00:39:14+02:00]

Based on system.nim code:

const arrayDummySize = when defined(cpu16): 10_000 else: 100_000_000
type UncheckedArray {.unchecked.} [T] = array[0..arrayDummySize, T]
var x: ptr UncheckedArray[int8]  # cast the C pointer to this type and index x[i]; no bounds checks, so keep the length yourself
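With a type like this, the pointer from C is cast to `ptr UncheckedArray[int8]` and indexed directly. The length never enters the type, so the caller keeps it and uses it to size a `seq`. A sketch, assuming Nim 1.0 or later, where `UncheckedArray` is built into `system` and does not have to be declared by hand; `fakeCBuffer` and `toSeqInt8` are hypothetical names, and Nim's `alloc` stands in for the C routine.

```nim
# Sketch: read a C (pointer, length) pair through ptr UncheckedArray, then
# copy it into a seq. Assumes Nim >= 1.0 (UncheckedArray is in system).

proc fakeCBuffer(n: int): pointer =
  ## Hypothetical stand-in for the C routine: n bytes holding 0, 1, 2, ...
  result = alloc(n)
  let bytes = cast[ptr UncheckedArray[int8]](result)
  for i in 0 ..< n:
    bytes[i] = int8(i)

proc toSeqInt8(p: pointer, n: int): seq[int8] =
  ## Copies n bytes starting at p into a new, bounds-checked seq.
  let view = cast[ptr UncheckedArray[int8]](p)  # view[i] is NOT bounds-checked
  result = newSeq[int8](n)
  for i in 0 ..< n:
    result[i] = view[i]

let p = fakeCBuffer(5)
let s = toSeqInt8(p, 5)
dealloc(p)      # s holds its own copy, so the buffer can be released
echo s          # @[0, 1, 2, 3, 4]
```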

mbaulch [2016-06-15T02:43:54+02:00]

If you want the safety (and convenience) of a pure Nim array[..., ...] or seq[...] and can accept a performance hit (small in many cases), you could use an approach like this:

# Raw C indexing: read/write element `index` of a C array (no Nim bounds check).
proc arrayGet[T](arr: ptr T, index: cint): T {.importc: "#[#]", nodecl.}
proc arraySet[T](arr: ptr T, index: cint, val: T) {.importc: "#[#] = #", nodecl.}

# Allocate `num` elements of U and fill them with conv(a[i]).
proc copyArrayC[T, U](a: openarray[T], num: int, conv: proc (x: T): U): ptr U =
  assert(a.len >= num)
  result = createU(U, num)
  for i in 0..<num:
    arraySet(result, i.cint, a[i].conv())

# Copy the C array back into `a`, converting each element, then free it.
proc cleanupArrayC[T, U](a: var openarray[T], aC: ptr U, num: int, conv: proc (y: U): T) =
  for i in 0..<num:
    a[i] = arrayGet(aC, i.cint).conv()
  discard resize(aC, 0)

proc intToCint(i: int): cint = i.cint
proc cintToInt(ci: cint): int = ci.int

# int <-> cint instances of the generic helpers.
proc intsToCints(a: openarray[int], num: int): ptr cint = copyArrayC(a, num, intToCint)
proc cintsToInts(a: var openarray[int], aC: ptr cint, num: int) = cleanupArrayC(a, aC, num, cintToInt)

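As I'm sure you can see, this easily generalises to float/cfloat or indeed any other Nim/C array type conversions. Also, beware that cleanupArrayC frees your C array via resize(), which uses Nim's allocator, so the array must have been allocated by Nim (as intsToCints does) rather than by C's malloc. Remove the resize() call if that's not what you want.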
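The same copy-with-conversion idea, sketched for float/cfloat. This is a rewrite under assumptions, not the poster's code: it assumes Nim 1.0 or later and indexes through the built-in `ptr UncheckedArray[U]` instead of the `#[#]` importc patterns. It allocates with `createU` and frees with `dealloc`, so it must only be used on buffers that came from Nim's allocator.

```nim
# Sketch: copy a Nim openArray into a C-compatible buffer with a per-element
# conversion, and back again. Assumes Nim >= 1.0 (UncheckedArray in system).

proc copyArrayC[T, U](a: openArray[T], num: int,
                      conv: proc (x: T): U): ptr UncheckedArray[U] =
  ## Allocates `num` elements of U with Nim's allocator and fills them from `a`.
  assert a.len >= num
  result = cast[ptr UncheckedArray[U]](createU(U, num))
  for i in 0 ..< num:
    result[i] = conv(a[i])

proc cleanupArrayC[T, U](a: var openArray[T], aC: ptr UncheckedArray[U],
                         num: int, conv: proc (y: U): T) =
  ## Copies back into `a`, then frees `aC` (Nim-allocated buffers only).
  assert a.len >= num
  for i in 0 ..< num:
    a[i] = conv(aC[i])
  dealloc(aC)

proc floatToCfloat(x: float): cfloat = cfloat(x)
proc cfloatToFloat(y: cfloat): float = float(y)

var xs = [1.5, 2.25, -3.0]
let buf = copyArrayC(xs, xs.len, floatToCfloat)
buf[0] = 10.0            # pretend the C side modified the buffer
cleanupArrayC(xs, buf, xs.len, cfloatToFloat)
echo xs                  # [10.0, 2.25, -3.0]
```

The values chosen are exactly representable as 32-bit floats, so the round trip through `cfloat` loses nothing here; in general it can lose precision.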

The approach suggested by @def works too, although I personally dislike having unchecked arrays scattered everywhere. Otherwise, why not just program in C?

In the rare case that cintsToInts (say) causes a performance bottleneck, we can still:

Refactor so that the array is generated in Nim (hopefully this only means rewriting a little C).

Use an {.unchecked.} array if we feel the safety/convenience vs. performance/quick implementation trade-off is worthwhile.

N.B. You could (possibly) make copyArrayC/cleanupArrayC even faster by rewriting them as templates taking conv: untyped.
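A sketch of that template variant, assuming Nim 1.0 or later. `conv` is an untyped expression written in terms of an injected `it` (a convention borrowed from `sequtils`' `mapIt`), so the conversion is pasted into the loop rather than called through a proc pointer. `copyArrayCIt` is a hypothetical name, and whether this is measurably faster depends on what the C compiler already inlines.

```nim
# Sketch: template variant of copyArrayC. The conversion is an untyped
# expression pasted into the loop, with `it` bound to each source element.
# Assumes Nim >= 1.0; `copyArrayCIt` is a hypothetical name.

template copyArrayCIt(U: typedesc, a: untyped, num: int,
                      conv: untyped): untyped =
  block:
    let n = num
    assert a.len >= n            # note: `a` is evaluated more than once
    let dst = cast[ptr UncheckedArray[U]](createU(U, n))
    for i in 0 ..< n:
      let it {.inject.} = a[i]
      dst[i] = conv
    dst

let ints = @[1, 2, 3]
let cints = copyArrayCIt(cint, ints, ints.len, cint(it * 10))
let back = @[cints[0], cints[1], cints[2]]   # read before freeing
dealloc(cints)
echo back       # @[10, 20, 30]
```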

mbaulch [2016-06-15T04:44:42+02:00]

Thinking about this a bit more, {.unchecked.} arrays aren't needed at all. We can't safely pass them around to procs that will call len(), in any case.

If you must have (effectively) zero performance overhead, why not implement a wrapper type for C arrays? For instance,

type CArray[T] = object
  data: ptr T
  size: int

# Declarations only (bodies omitted): index i, checked against size.
proc `[]`[T](ca: CArray[T], i: int): T
proc `[]=`[T](ca: var CArray[T], i: int, val: T)

You could define len, items, pairs, and replacements for everything from sequtils if you felt inclined. There may be other syntactic sugar that could be applied. Unfortunately, Nim doesn't (AFAIK) provide a way to re-use all existing openarray (i.e. array and seq) code for such a type. Perhaps concepts will one day provide an answer.
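A minimal sketch of such a wrapper, assuming Nim 1.0 or later. It stores `ptr UncheckedArray[T]` rather than `ptr T` so indexing needs no pointer arithmetic, checks bounds in `[]`/`[]=` with `doAssert`, and defines the `len`, `items`, and `pairs` mentioned above. `initCArray` is a hypothetical constructor name. The wrapper does not own the memory, so freeing it stays the caller's job.

```nim
# Sketch: a non-owning, bounds-checked view of a C array (Nim >= 1.0).
# The wrapper never frees `data`; whoever allocated it must do that.

type CArray[T] = object
  data: ptr UncheckedArray[T]
  size: int

proc initCArray[T](p: ptr T, size: int): CArray[T] =
  ## Wraps a C pointer/length pair; `p` must stay valid while in use.
  CArray[T](data: cast[ptr UncheckedArray[T]](p), size: size)

proc len[T](ca: CArray[T]): int = ca.size

proc `[]`[T](ca: CArray[T], i: int): T =
  doAssert i >= 0 and i < ca.size, "CArray index out of bounds"
  ca.data[i]

proc `[]=`[T](ca: CArray[T], i: int, val: T) =
  doAssert i >= 0 and i < ca.size, "CArray index out of bounds"
  ca.data[i] = val   # writes through to the C memory

iterator items[T](ca: CArray[T]): T =
  for i in 0 ..< ca.size: yield ca.data[i]

iterator pairs[T](ca: CArray[T]): (int, T) =
  for i in 0 ..< ca.size: yield (i, ca.data[i])

# Demo: a zeroed buffer from Nim's allocator stands in for C memory.
let raw = cast[ptr int8](alloc0(4))
var view = initCArray(raw, 4)
view[1] = 7
var total = 0
for i, b in view:
  total += int(b) * i
echo view.len, " ", total   # 4 7
dealloc(raw)
```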

cy [2016-06-15T05:36:31+02:00]

I have absolutely no idea what def's suggestion would even do. You define an arbitrary array size, then you create an array of that size. And that's it? Where does that convert an unbounded array of bytes (and its length) into a bounded list?

As for mbaulch's first suggestion, that seems sensible. You'd have to be pretty stupid to use large blobs in a database, and their data only lasts as long as the statement hasn't been reset, so converting them is definitely what I had in mind.

I'm guessing arrayGet and arraySet are some sort of magic macro-ish things, that get translated into "result = carray[i]" (except carray is an opaque pointer, far as Nim knows)? I would wonder if that couldn't be made more efficient. Shouldn't there be some sort of use of memcpy, that overwrites a block of data in managed memory?

mbaulch's second suggestion is... well, interesting at least. Not really useful for my purposes. I'd wonder how you would tell it to free the underlying C array when you're done with the wrapper object.

Anyway, thanks for answering.

mbaulch [2016-06-15T06:30:06+02:00]

Shouldn't there be some sort of use of memcpy, that overwrites a block of data in managed memory?

You could probably do this (AFAIK) for int and float arrays. It would rely on the representation of Nim arrays at the backend. There are a few caveats:

You'd have to check that each version of the compiler kept the same underlying representation. (This isn't guaranteed AFAIK)

Between platforms, you'd have to check that int and cint were always represented equivalently.

I'm not motivated to find out, because anything I learn could become quickly out of date. My understanding of the compiler, and of the guarantees Nim makes about its backend representations are both quite limited. For these reasons, I'd avoid this technique. You may feel differently.
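One way around the representation question is to do the raw copy only between identical types: copy into `seq[cint]`, whose element type matches C's `int`, and widen to Nim's `int` afterwards. Nim's `int` is pointer-sized (64-bit on 64-bit targets) while `cint` maps to C's `int`, so copying `cint` memory straight into a `seq[int]` would be wrong on such platforms. A sketch, assuming Nim 1.0 or later; `copyCints` and `fakeC` are hypothetical names.

```nim
# Sketch: block-copy C ints into seq[cint], then widen each element to int.
# copyMem is only valid here because both sides use the same type (cint).
# Assumes Nim >= 1.0; `copyCints` is a hypothetical name.

proc copyCints(src: ptr cint, n: int): seq[cint] =
  result = newSeq[cint](n)
  if n > 0:                       # result[0] does not exist when n == 0
    copyMem(addr result[0], src, n * sizeof(cint))

var fakeC = [cint(3), -1, 42]     # stand-in for memory owned by C
let asC = copyCints(addr fakeC[0], fakeC.len)
var asInt = newSeq[int](asC.len)
for i, v in asC:
  asInt[i] = int(v)               # Nim's int is pointer-sized, cint usually 32-bit
echo asInt                        # @[3, -1, 42]
```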

I'd wonder how you would tell it to free the underlying C array when you're done with the wrapper object.

You're right. The underlying C array isn't freed. You'd have to do that manually.

cy [2016-06-15T08:28:32+02:00]

You could probably do this (AFAIK) for int and float arrays.

What? Oh, no, no. I just wanted to only convert it to a byte array. I can scan that for more complex structure afterwards. I didn't mean a C library that passed me an array of integers where endianness matters, just raw bytes.

That's why I said int8 specifically.

You're right. The underlying C array isn't freed. You'd have to do that manually.

You could possibly do something with move semantics...

proc `=destroy`(ca: var CArray) =
  if ca.data != nil:
    discard resize(ca.data, 0)

proc `=`(ca: var CArray, src: var CArray) =
  ca.data = src.data
  src.data = nil

...or something. But again, in my case the C array is freed, and I wanted to copy it anyway, so I wanted to make it at least an array of bytes that Nim could understand.
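In current Nim this move idea maps onto lifetime hooks. A sketch, assuming Nim 2.x with ARC/ORC (the default), where `=destroy` takes its parameter by value. Copying is disabled with `{.error.}` so there is exactly one owner, and moves need no custom hook because the compiler resets the moved-from value. `OwnedCArray`, `adopt`, `cMalloc`, and `cFree` are hypothetical names; the last two just import C's `malloc` and `free`, so memory allocated on the C side is also released on the C side (unlike `resize`, which uses Nim's allocator).

```nim
# Sketch: an owning wrapper that frees malloc'd C memory exactly once.
# Assumes Nim 2.x with ARC/ORC (by-value `=destroy`). Names are hypothetical.

proc cMalloc(size: csize_t): pointer {.importc: "malloc", header: "<stdlib.h>".}
proc cFree(p: pointer) {.importc: "free", header: "<stdlib.h>".}

type OwnedCArray[T] = object
  data: ptr UncheckedArray[T]
  size: int

var freedCount = 0   # demo bookkeeping only

proc `=destroy`[T](ca: OwnedCArray[T]) =
  if ca.data != nil:
    cFree(ca.data)   # C memory goes back to C's allocator, not Nim's
    inc freedCount

# Forbid copies, so two wrappers can never free the same buffer.
proc `=copy`[T](dest: var OwnedCArray[T], src: OwnedCArray[T]) {.error.}

proc adopt[T](p: ptr T, size: int): OwnedCArray[T] =
  ## Takes ownership of a malloc'd buffer of `size` elements.
  OwnedCArray[T](data: cast[ptr UncheckedArray[T]](p), size: size)

proc demo(): int8 =
  let raw = cast[ptr int8](cMalloc(csize_t(3)))
  var a = adopt(raw, 3)
  a.data[2] = 9
  let b = move(a)      # ownership moves to b; a is reset to nil
  result = b.data[2]
  # at scope exit: b frees the buffer once, a (nil) frees nothing

let v = demo()
echo v, " ", freedCount   # 9 1
```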

mbaulch [2016-06-15T08:42:56+02:00]

Aah. You did say "byte blob". I focused on the int8, which is why I thought endianness mattered. If raw bytes are all you need and you are happy to handle endianness yourself, copying into managed memory should be okay. I forgot about `=destroy`. Neat idea.

Good luck!

Krux02 [2016-06-17T15:10:23+02:00]

If you want to return a seq[int8], then I agree that copying is the best way to go, since seq has value semantics. If you want the value in order to modify the underlying data, then you should use the unchecked array approach.

I remember that in Go (the programming language) I once wrote a wrapper that took the pointer and size and made a Go slice out of them without copying. That has the advantage that it really behaves like the C version: modifications to the data take effect in the C library. It also has the advantage that you work with a bounds-checked slice type. But that only worked because passing a slice in Go neither copies nor owns the underlying data.

I don't think this would be possible in Nim, because a seq owns its data: when the seq is gone, Nim wants to free the contents.
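For what it is worth, newer Nim versions get close to the Go slice behaviour without a custom type: `toOpenArray` turns a `ptr UncheckedArray[T]` plus bounds into an `openArray[T]` argument. Nothing is copied, indexing inside the callee is bounds-checked against the given range, and writes land in the original memory. The catch is that the result can only be passed to a proc; it cannot be stored. A sketch, assuming Nim 1.0 or later; `sumBytes` and `zeroFirst` are hypothetical helpers and `alloc` stands in for C memory.

```nim
# Sketch: hand borrowed C memory to ordinary openArray procs without copying.
# Assumes Nim >= 1.0 (toOpenArray accepts ptr UncheckedArray). Names are hypothetical.

proc sumBytes(xs: openArray[int8]): int =
  for x in xs: result += int(x)

proc zeroFirst(xs: var openArray[int8]) =
  if xs.len > 0: xs[0] = 0        # ordinary, bounds-checked openArray access

var buf = cast[ptr UncheckedArray[int8]](alloc(3))  # stand-in for C memory
buf[0] = 5; buf[1] = 6; buf[2] = 7

let total = sumBytes(toOpenArray(buf, 0, 2))  # a view of 3 elements, no copy
zeroFirst(toOpenArray(buf, 0, 2))             # the write lands in buf itself
let first = buf[0]
dealloc(buf)
echo total, " ", first                        # 18 0
```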

cy [2016-06-19T00:26:17+02:00]

Well, here's my latest attempt. It works... assuming the (C backend) header for a seq[] doesn't stop being TGenericSeq. It sets the length of the sequence first, then copies the buffer into the seq's raw data area. Obviously this only works in general for sequences with 8-bit items, like seq[int8] or seq[char].

Just ignore the "makebuf" function. I just did that to get a C generated buffer to play with.

{.emit: """
#include <assert.h>
#include <string.h>
void memcpySeq(void** dest, void* src, int len) {
  TGenericSeq* seq = ((TGenericSeq*)*dest);
  assert(len <= seq->len);
  memcpy(((char*)*dest)+
    sizeof(TGenericSeq), // header
    src, len);
  seq->len = len;
}
""".}

# proc memcpy[T](dest: array[0..T,int8], src: pointer, len: int) {.importc: "memcpy",header: "<string.h>".}
proc memcpy[T](dest: var seq[T], src: pointer, len: int) {.importc: "memcpySeq",header: "<string.h>".}

from macros import getType,kind,typeKind,toStrLit,`$`
import macros

# can't do a template, since we need to check the type of kind...
macro onebyte(kind: expr): string =
  case kind.kind
  of nnkSym:
    echo("we got a symbol: ",kind)
  else:
    assert(false)
  
  # assert(sizeof(kind.getType) == 1) sigh...
  let skind = $kind
  assert(skind == "char" or
         skind == "int8" or
         skind == "uint8",
         "blobs can only have 1 byte items");
  ""

template toBlob(kind, src, size: typed): expr =
  discard onebyte(kind)
  var dest = newSeq[kind](size);
  memcpy(dest,src,size);
  dest

template toBlob(src, size: typed): expr =
  toBlob(char, src, size)

# just a little (terrible) C for example

{.emit: """
#include <stdlib.h>
#include <string.h>
int makebuf(void** dest) {
  *dest = malloc(0x10);
  memset(*dest,'Q',0x10);
  return 0x10;
}""".}

proc makebuf(dest: var pointer): int {.importc: "makebuf",nodecl.}

var a: pointer;
let c = makebuf(a);

echo("length of data is ",c);
var b = toBlob(a,c)

import typetraits
assert(b.type.name == "seq[char]")
assert(b[3] == 'Q',"the elements are not the same!")
assert(b.len == c,"The sequences are not the same length!")

echo(b)
# not sure why this is considered unsafe...
echo(cast[seq[int8]](b))
# but eh
echo(toBlob(int8,a,c))
# echo(toBlob(int,a,c))

cy [2016-06-19T01:56:14+02:00]

Keep in mind that Nim won't generate the TGenericSeq structure unless there's a sequence in that very module...

Well, it's a hack, but at least it works.
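A sketch of the same `toBlob` idea that avoids the C backend's seq header entirely, assuming Nim 1.0 or later. `newSeq` sets the length, `copyMem` copies into the first element's address, and the one-byte restriction that `onebyte` checked by type name becomes a compile-time `sizeof` check in a generic proc (the check the commented-out `sizeof` assert was after). This is a rewrite under those assumptions, not cy's code; the two-argument char shorthand is left out, and a Nim array filled with 'Q' stands in for `makebuf`.

```nim
# Sketch: copy a (pointer, length) byte blob into a seq, no seq internals.
# Assumes Nim >= 1.0. Works for any 1-byte element type (char, int8, uint8).

proc toBlob[T](src: pointer, size: int): seq[T] =
  when sizeof(T) != 1:
    {.error: "blobs can only have 1 byte items".}
  result = newSeq[T](size)
  if size > 0:                       # result[0] does not exist when size == 0
    copyMem(addr result[0], src, size)

# Stand-in for the C-generated buffer (makebuf in the post): 16 'Q' bytes.
var cbuf: array[16, char]
for c in cbuf.mitems: c = 'Q'

let b = toBlob[char](addr cbuf[0], cbuf.len)
let i8 = toBlob[int8](addr cbuf[0], cbuf.len)
echo b.len, " ", b[3], " ", i8[0]    # 16 Q 81
# toBlob[int](addr cbuf[0], 2)       # would not compile: items are not 1 byte
```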

{.emit: """
#include <assert.h>
void memcpySeq(void** dest, void* src, int len) {
  memcpy(((char*)*dest)+
    sizeof(TGenericSeq), // header
    src, len);
}
""".}

# proc memcpy[T](dest: array[0..T,int8], src: pointer, len: int) {.importc: "memcpy",header: "<string.h>".}
proc memcpy[T](dest: var seq[T], src: pointer, len: int) {.importc: "memcpySeq",header: "<string.h>".}

from macros import getType,kind,typeKind,toStrLit,`$`
import macros

# can't do a template, since we need to check the type of kind...
macro onebyte(kind: expr): string =
  case kind.kind
  of nnkSym:
    echo("we got a symbol: ",kind)
  else:
    assert(false)
  
  # assert(sizeof(kind.getType) == 1) sigh...
  let skind = $kind
  assert(skind == "char" or
         skind == "int8" or
         skind == "uint8",
         "blobs can only have 1 byte items");
  ""

template toBlob*(kind, src, size: typed): expr =
  discard onebyte(kind)
  var dest = newSeq[kind](size);
  memcpy(dest,src,size);
  dest

template toBlob*(src, size: typed): expr =
  toBlob(char, src, size)

# just a little (terrible) C for example
when defined(test):
  {.emit: """
#include <stdlib.h>
#include <string.h>
int makebuf(void** dest) {
  *dest = malloc(0x10);
  memset(*dest,'Q',0x10);
  return 0x10;
}""".}
  import typetraits
  proc example() =
    proc makebuf(dest: var pointer): int {.importc: "makebuf",nodecl.}
    
    var a: pointer;
    let c = makebuf(a);
    
    echo("length of data is ",c);
    var b = toBlob(a,c)
    
    assert(b.type.name == "seq[char]")
    assert(b[3] == 'Q',"the elements are not the same!")
    assert(b.len == c,"The sequences are not the same length!")
    
    echo(b)
    # not sure why this is considered unsafe...
    echo(cast[seq[int8]](b))
    # but eh
    echo(toBlob(int8,a,c))
    # echo(toBlob(int,a,c))
  
  example()
else:
  var q: seq[int8]; # ensure we can access TGenericSeq

Mirror of forum.nim-lang.org

2317 :: pointer/length -> seq[int8]?