nimforum mirror - Use cstring for C binding

mildred [2021-07-01T14:10:02+02:00]

Hello, I'm trying to bind to C code, and I have various questions about C strings (cstring).

As I understand it, cstring is suitable for both const strings in C (const char*) and mutable strings (char *, as long as the length is respected). Now I need to pass a string buffer to a C function so it can be filled (as snprintf would do), and I don't know how I should allocate the cstring from Nim.

let s: string = newString(1024)
let cs: cstring = cstring(s)
snprintf(s, 1024, "Foo bar %s", foobar)
echo s

Would that actually work? It does not feel good to allocate a string just to convert it to a cstring. What's the best way to do this?

I also have another question about writing C structs in Nim. Sometimes a string is embedded in a struct (without a pointer indirection) as an array of chars. Something like this:

#define MAX_LENGTH 1024
struct foo {
    char name[MAX_LENGTH];
};

I translated it in Nim as:

const MAX_LENGTH = 1024
type foo = tuple
  name: array[MAX_LENGTH, char]

Is that correct? Is the memory layout the same? And then, how do I convert the array to a cstring to use in my code?

Thank you.

PMunch [2021-07-01T14:33:41+02:00]

To answer these questions it helps to understand what a string in C actually is. char * simply means a pointer to a character, and in C that is all a string is. The rest of the string is assumed to follow the first character byte by byte until a NUL byte (\0) is encountered. So the string "Hello world" in C would actually be stored in memory as "Hello world\0", and the char * variable would simply be a pointer to the "H" character.

In Nim a string is a bit more complex: it is essentially a small object that contains the length of the string, the capacity of the allocated buffer, and the pointer to that buffer. So the string "Hello world" would be stored as the buffer "Hello world\0\0\0\0" and the object (len: 11, cap: 14). Note that len and cap do not count the last \0; this ensures that the buffer data will always be compatible with a char * in C.

So to answer your first question on how to pass a pre-allocated string to C, you are almost spot on:
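As a small illustration of that layout, here is a minimal C sketch. The helper name count_until_nul and the variable greeting are made up for this example; the helper just walks the bytes until it reaches the terminator, which is the same thing strlen does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the bytes before the first NUL byte,
   mirroring what strlen does. */
size_t count_until_nul(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* The array holds 12 bytes: 11 visible characters plus the NUL
   terminator that the compiler appends to the literal. */
const char greeting[] = "Hello world";
```

Here sizeof greeting is 12 while count_until_nul(greeting) is 11: the terminator takes up storage but is not counted as part of the string.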

# This is how you can use snprintf directly in Nim
proc snprintf(buf: cstring, cap: cint, frmt: cstring): cint {.header: "<stdio.h>", importc: "snprintf", varargs.}

var s = newString(1024) # This allocates a buffer of NUL bytes (\0) that is 1024 characters long
echo s.len # As you can see the string is here 1024 characters long, but printing it in Nim will not do anything because the first byte is \0 and the string stops there.
echo cast[int](s.cstring) # cstring simply returns the pointer to the underlying buffer, which is compatible with a C string
echo cast[int](s[0].addr) # As you can see the pointer to the first character of the string is exactly the same as the cstring
s.setLen snprintf(s.cstring, 1024, "Foo bar %s", "Hello") # snprintf returns the number of bytes written, not counting the final \0 (or the number it would have written if the output had been truncated); we use this to update the length of our string
echo s # Our string is now "Foo bar Hello"
echo s.len  # And our length is now 13 as we would expect

If we hadn't used setLen, s.len would still return 1024. This might not be an issue if you're only outputting the string to a terminal, but if you need the string length, make sure to do this. By the way, the above sample can be run in the Nim playground (play.nim-lang.org) to show the results.

Again on the second question you are spot on: that object will have the same memory layout as the C struct. To convert to a cstring, the only thing we need is a pointer to the first character of our string, which is the same as a pointer to the array. You can see how that works here:
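One caveat about using the return value as a length: per the C standard, snprintf returns the number of characters the full output would have needed, so on truncation it can be larger than what was stored. A minimal C sketch of clamping it follows; format_clamped is a hypothetical helper name, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats "Foo bar <arg>" into buf, whose capacity is
   cap bytes including the terminating NUL, and returns the number of
   characters actually stored (safe to use as a string length). */
size_t format_clamped(char *buf, size_t cap, const char *arg) {
    assert(buf != NULL && arg != NULL && cap > 0);
    int wanted = snprintf(buf, cap, "Foo bar %s", arg);
    if (wanted < 0) {            /* output error: leave an empty string */
        buf[0] = '\0';
        return 0;
    }
    if ((size_t)wanted >= cap) { /* truncated: only cap - 1 chars stored */
        return cap - 1;
    }
    return (size_t)wanted;
}
```

With a 1024-byte buffer and "Hello" this returns 13, the same as the raw snprintf result. With an 8-byte buffer the raw result is still 13, but only 7 characters fit, so the clamped value is 7.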

proc snprintf(buf: cstring, cap: cint, frmt: cstring): cint {.header: "<stdio.h>",
              importc: "snprintf",
              varargs.}

type TestObject = object
  myStrData: array[1024, char]

var data: TestObject
var len = snprintf(data.myStrData.addr, 1024, "Foo bar %s", "Hello")
echo data.myStrData.addr.cstring
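For comparison, the C side of that layout can be sketched as below. This assumes the struct foo from the question; fill_name is a hypothetical helper added only for illustration. The address of the struct, the address of the embedded array and the address of its first character are all the same, which is why the array can be handed to a function expecting a char *.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_LENGTH 1024

/* The struct from the question: the characters live inside the struct. */
struct foo {
    char name[MAX_LENGTH];
};

/* Hypothetical helper: writes into the embedded buffer and returns the
   snprintf result. sizeof f->name is the size of the whole array here,
   because name is an array member and not a pointer. */
int fill_name(struct foo *f, const char *who) {
    assert(f != NULL && who != NULL);
    return snprintf(f->name, sizeof f->name, "Foo bar %s", who);
}
```

On common compilers sizeof(struct foo) is exactly MAX_LENGTH, since a lone char array needs no padding.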

Clonk [2021-07-01T14:48:35+02:00]

Would that actually work? It does not feel good to allocate a string just to convert it to a cstring. What's the best way to do this?

Allocating a string and calling cstring does work for passing a string.

What I usually do is make a lightweight wrapper around procs like these so I can pass a string directly:

Here is a typical example; just ask if something is not clear:

// example.c
#include <stddef.h>
#include <string.h>
#include <stdio.h>

#define MAX_LEN 255

void  dummy(char bufMsg[MAX_LEN]) {
  strcpy(bufMsg, "azerty"); // This is just a dummy example
  printf("%s \n", bufMsg);
}

// Sometimes you also have
void  dummySize(char * bufMsg, size_t bufSize) {
  if(bufSize < 11) { // 10 characters plus the terminating \0
    printf("C program says: Error buffer too small\n");
    return;
  }

  strcpy(bufMsg, "0123456789"); // In reality, check that you don't write over bufSize
  printf("%s \n", bufMsg);
}

# example.nim
import strutils

{.compile: "example.c".}
const MAX_LEN = 255

proc c_dummy(bufMsg: cstring) {.importc: "dummy", cdecl.}
proc c_dummySize(bufMsg: cstring, bufSize: csize_t) {.importc: "dummySize", cdecl.}

proc dummy(): string =
  result = newString(MAX_LEN);
  c_dummy(result.cstring)
  result = $(result.cstring)

proc dummySize(bufSize: int): string =
  result = newString(bufSize);
  c_dummySize(result.cstring, result.len.csize_t)
  # Why do I do that?
  # Because newString creates a buffer of size bufSize full of zeros.
  # The C function will (usually) not write to all of that memory.
  # result, viewed as an array, now contains: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', char(0), char(0), char(0), ..., char(0)]
  # That is typically not what you want, so converting to cstring and back strips the trailing char(0) from result (you can also use strip from std/strutils with chars = {'\0'})
  result = $(result.cstring)

doAssert dummy() == "azerty"
doAssert dummySize(12) == "0123456789"
let res = dummySize(5)  # Here you see the printf: your buffer is too small
doAssert res.isEmptyOrWhitespace() # res is empty because the C code didn't write any data into it: it's full of char(0)

I personally wouldn't use array[LEN, char] to represent char * because it would be harder to use from the Nim side: dealing manually with null termination, or even simply writing a string into your array[char], just wouldn't be practical.

So unless your char* is actually a buffer that does not need to be null terminated (in which case you really should use uint8_t*), I don't recommend using array.

mildred [2021-07-01T15:06:10+02:00]

Thank you very much for the detailed information. I just learnt that strings could have a capacity different from their length (might be useful for mutable strings). Is there a way to access this capacity from the code?

Clonk [2021-07-01T15:24:06+02:00]

The capacity is the size of the memory allocated; the length is the number of chars that are not 0x00.

Basically:

a.len() # Maximum number of char you can write in the string memory
a.cstring.len() # Number of char before the first char(0) character

That is because C considers the string to end at the first null character, while Nim doesn't have this limitation.

Since you allocated a string that is bigger than what C writes, you end up with a Nim string with a lot of trailing char(0) that you need to remove.

Which is why we do:

mystring = $(mystring.cstring) # My solution
setLen(mystring, mystring.cstring.len) # PMunch's solution, which is (arguably) cleaner

mildred [2021-07-01T15:43:25+02:00]

str.len is the actual length of the string. It's equal to the capacity when the string is allocated with newString(cap), but once the string has been used, and more specifically after str.setLen() has been called, it no longer reflects the capacity.

It's just that I'd prefer to rewrite:

const MAX_LEN = 1024
var s = newString(MAX_LEN)
s.setLen snprintf(s.cstring, MAX_LEN, "Foo bar %s", "Hello")
s.setLen snprintf(s.cstring, MAX_LEN, "Foo bar %s", "HelloWorld")

into something like:

var s = newString(1024)
s.setLen snprintf(s.cstring, s.cap, "Foo bar %s", "Hello")
s.setLen snprintf(s.cstring, s.cap, "Foo bar %s", "HelloWorld")

PMunch [2021-07-01T21:33:13+02:00]

There is a way, but it's very hacky (basically cast the object to something with the same memory layout, yuck!). Someone opened an RFC to add this functionality: https://github.com/nim-lang/RFCs/issues/97, but I haven't heard any more about that.

arnetheduck [2021-07-01T21:48:06+02:00]

Careful though, it's easy to create dangling references with cstring - it doesn't keep memory alive:


proc f(i: int): cstring =
  cstring($i)

let x= f(42)

echo x

GC_fullcollect()

let y = newSeq[char](10)

echo x

https://play.nim-lang.org/#ix=3rGI

cdunn2001 [2021-07-08T08:17:00+02:00]

var s = newString(1024)

That creates a buffer of 1025 characters, but only 1024 are legally accessible. I.e.

s[1024] = 'Z'

is an error:


Error: unhandled exception: index 1024 not in 0 .. 1023 [IndexDefect]

It's an important distinction when interacting with C code, because many C functions want to read/write that final 0 (using "char*", i.e. "cstring"), while Nim will write it implicitly at the end of a "string".

That means you (usually) can and should add 1 to "len" when telling C the buffer size, e.g.

snprintf(s.cstring, 1025, "Foo bar %s", "Hello");

cdunn2001 [2021-07-08T08:19:08+02:00]

I think you want

s.setLen snprintf(s.cstring, s.cap + 1, "Foo bar %s", "Hello")
# or
s.setLen snprintf(s.cstring, MAX_LEN + 1, "Foo bar %s", "Hello")

Otherwise, the longest string snprintf could write is only 1023 (plus a null), while the Nim string could be up to 1024 non-null characters.
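The off-by-one can be checked on the C side with a minimal sketch like the one below. stored_chars is a hypothetical helper; it tells snprintf the buffer holds size bytes and reports how many characters ended up before the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints text into a local buffer while telling
   snprintf that only `size` bytes are available, then returns how many
   characters were stored before the terminating NUL. */
size_t stored_chars(const char *text, size_t size) {
    char buf[64];
    assert(text != NULL && size > 0 && size <= sizeof buf);
    snprintf(buf, size, "%s", text);
    return strlen(buf);
}
```

Passing a size of 10 for the 10-character text "0123456789" stores only 9 characters, because one byte is reserved for the terminator; passing 11 stores all 10. That is the same reasoning as passing len + 1 for a Nim string whose buffer has room for the extra \0.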
