nimforum mirror - Ptr byte to cstring?

Nlits [2023-12-23T03:51:00+01:00]

I am working with the freeimage nim library/wrapper, and have this code:

proc compressJpeg(imgData: string): string =
  ## Use freeimage to compress the jpeg files
  
  # Open image
  var mem = FreeImage_OpenMemory(cast[ptr byte](imgData.cstring), imgData.len.uint32)
  var image = FreeImage_LoadFromMemory(FIF_JPEG, mem, JPEG_ACCURATE)
  FreeImage_CloseMemory(mem)
  
  # Save image
  let FI_DEFAULT = 0.uint8
  var outmem = FreeImage_OpenMemory(addr FI_DEFAULT, FI_DEFAULT.uint32)
  
  doAssert FreeImage_SaveToMemory(FIF_JPEG, image, outmem, JPEG_OPTIMIZE).bool
  
  
  # to nim
  var buffer: ptr byte
  var length: uint32
  doAssert FreeImage_AcquireMemory(outmem, addr buffer, addr length).bool
  
  let cs = cast[cstring](buffer)
  
  
  doAssert cs.len.uint32 == length # Ensure conversion was successful
  result = $cs
  doAssert result.len.uint32 == length # Ensure conversion was successful
  
  # Cleanup
  FreeImage_Unload(image)
  FreeImage_CloseMemory(outmem)

But doAssert cs.len.uint32 == length always fires, with cs being 4 bytes long? I assume I am converting buffer to a cstring in the wrong way, but there could be another issue. Is there a better way than cast to convert between ptr byte and cstring?

jrfondren [2023-12-23T05:55:25+01:00]

C strings are zero-terminated, and FreeImage_AcquireMemory isn't filling the buffer with non-zero bytes for len() to count. Even if it did, the assertion would fail as a C string can't have a length equal to the memory backing it without len() reading out of bounds.

Consider:

var buffer = [byte('a'), byte('b'), 0, 0, 0]
let cstr = cast[cstring](buffer[0].addr)
doAssert cstr == "ab"
doAssert cstr.len == 2
buffer[2] = byte('c')
doAssert cstr.len == 3

JiyaHana [2023-12-23T07:16:05+01:00]

This is another way you can try.

var buffer: ptr byte
var length: uint32
doAssert FreeImage_AcquireMemory(outmem, addr buffer, addr length).bool
let cs = buffer[0 .. length - 1].cstring

nasl [2023-12-23T12:39:17+01:00]

@Araq How about submitting a reply explaining why the post is wrong instead of editing it and appending a quite illogical "Do not give wrong advice" comment?

It discourages participation when one's comment is censored, even if it's wrong.

Araq [2023-12-23T14:02:08+01:00]

There was no "censorship" here at all, but surely I could have handled it better.

Nlits [2023-12-23T17:41:59+01:00]

I get the point that the 0's are not counted for the cstring length but are counted for the buffer length. What I don't understand is this:

var buffer = [byte('a'), byte('b'), 0, 0, 0]
let cstr = cast[cstring](buffer[0].addr)
doAssert cstr == "ab" # Why does buffer[0] give both `a` and `b`, not just a? Is buffer[0] not == to ‘a’?

I also don't get how ptr byte relates to any of this. Is buffer[0].addr the same as the buffer var in my code? I was wondering how you could get a multi-byte string from a single ptr byte anyway. And @JiyaHana, I don't know how your code is supposed to work, but it doesn't.

Araq [2023-12-23T18:41:36+01:00]

Consider this solved.

I cannot; you should learn about and use ptr UncheckedArray[byte].

PMunch [2023-12-24T16:14:14+01:00]

To understand what's going on we need to have a look at how all these types look in memory, and how they work in C. Your initial buffer type is a ptr byte, which is simply a pointer to a single 8-bit entity in memory. When you call AcquireMemory you pass a pointer to such a pointer. This is done so that the procedure can allocate some memory and then put the pointer to that memory in your variable.
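As a rough sketch of that pointer-to-pointer pattern, with no FreeImage involved: acquire and storage below are made-up stand-ins for illustration, not part of any real API.

```nim
# Hypothetical stand-in for memory owned by a C library.
var storage = [1'u8, 2, 3]

# Hypothetical stand-in for a C function with two "out" parameters:
# it writes a pointer and a size into the variables the caller points at.
proc acquire(data: ptr ptr byte, size: ptr uint32): bool =
  data[] = addr storage[0]
  size[] = storage.len.uint32
  true

var buffer: ptr byte
var length: uint32
doAssert acquire(addr buffer, addr length)
doAssert buffer[] == 1    # buffer now points at the first byte
doAssert length == 3      # and the size was filled in for us
```

Note that buffer on its own says nothing about how many bytes follow it, which is why the size is handed back separately.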

Now let's have a look at cstring. These are defined in C as a pointer to a string of characters, terminated by a null byte. Remember that char and byte are both 8 bits in size, and the only difference is semantic. To get the length of a cstring we have to start at the first character and move through the string of characters until we find a null character.
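To make that length scan concrete, here is a minimal sketch. scanLen is a hypothetical helper written only for illustration; the real len for cstring is provided by the system module.

```nim
# Hypothetical helper: counts bytes up to (not including) the first zero
# byte, which is the same answer len() gives for a cstring.
proc scanLen(p: ptr byte): int =
  let bytes = cast[ptr UncheckedArray[byte]](p)
  while bytes[result] != 0:
    inc result

var data = [byte('h'), byte('i'), 0, byte('x'), 0]
doAssert scanLen(addr data[0]) == 2
doAssert cast[cstring](addr data[0]).len == 2   # the 'x' is never reached
```

If no zero byte exists in the memory being scanned, both versions would simply keep reading past the end.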

Knowing all this we can begin to understand what's going on in your code. After you've written the image to the memory stream and gotten a pointer, you erroneously treat it as a cstring. Remember that these are null-terminated, but you have binary data, which can contain null bytes anywhere. The JPEG file specification has some kind of four-byte header followed by a null byte, and that is what you're picking up when you try to get the length of the cstring. In fact, the whole point of also passing in a pointer to a number, which is set to the number of bytes written, is so that you know how long the output is, since you can't easily count it yourself.
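A small sketch of that effect, assuming the output starts like a typical JFIF-style JPEG (the byte values below are the usual opening bytes of such a file, typed in by hand rather than taken from FreeImage):

```nim
# Start-of-image marker (FF D8), APP0 marker (FF E0), a two-byte segment
# length whose high byte is zero, then the "JFIF" identifier and its
# terminating zero.
var header = [0xFF'u8, 0xD8, 0xFF, 0xE0, 0x00, 0x10,
              0x4A, 0x46, 0x49, 0x46, 0x00]

let asCString = cast[cstring](addr header[0])
doAssert asCString.len == 4    # the scan stops at the first zero byte
doAssert header.len == 11      # the real size has to come from elsewhere
```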

The definition of a cstring as a pointer to a string of data also explains why ptr byte and buffer[0].addr work as cstrings. If buffer is a series of bytes, then buffer[0] is the first byte, and buffer[0].addr is the pointer to this first byte in a series of bytes. Your ptr byte is similarly set to point to a series of bytes somewhere in memory by the AcquireMemory function. Note of course that these aren't guaranteed to be properly set up as strings, with a terminating null character and no null characters within the series.

Now, how do we actually deal with this in Nim? First off, storing binary output in a string is a bad idea. This output isn't text, so why should we lie and pretend that it is? A seq[byte] would be a better type to return, although maybe your wrapper is using strings and you want to keep using them.

What we have is a pointer to some data and the length of the data stored behind that pointer. You shouldn't try to second-guess or verify this size; the whole point is that it's hard to tell how long the output is, so the library is kind enough to tell you. With this information you can use newSeq[byte](length.int) or newString(length.int) to create a buffer large enough to hold your data. Then you can copy the data over from the FreeImage buffer with copyMem. Of course this causes an extra memory copy, but looking at the FreeImage API there doesn't seem to be a simple way to populate a pre-existing buffer. You can however use the SaveToHandle function by supplying your own reader/writer implementation in order to write to a string, sequence, or stream.
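A minimal sketch of the copy approach, using a plain array as a stand-in for the memory FreeImage would hand back. bytesToSeq and bytesToString are hypothetical helper names; only newSeq, newString and copyMem come from Nim itself.

```nim
# Hypothetical stand-in for the (buffer, length) pair a C library returns.
var cData = [0xFF'u8, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x4A, 0x46]
let buffer: ptr byte = addr cData[0]
let length: uint32 = cData.len.uint32

# Copy `length` bytes into a managed seq[byte]; zero bytes are just data.
proc bytesToSeq(buffer: ptr byte, length: uint32): seq[byte] =
  result = newSeq[byte](length.int)
  if length > 0:
    copyMem(addr result[0], buffer, length.int)

# Same idea for wrappers that insist on returning a string.
proc bytesToString(buffer: ptr byte, length: uint32): string =
  result = newString(length.int)
  if length > 0:
    copyMem(addr result[0], buffer, length.int)

let asSeq = bytesToSeq(buffer, length)
let asStr = bytesToString(buffer, length)
doAssert asSeq.len == 8 and asSeq[4] == 0   # nothing was cut off at the zero
doAssert asStr.len == 8
```

With the real library the copy has to happen before the memory stream is closed, since the pointer refers to memory the library still owns.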

ThomasTJdev [2023-12-24T20:42:29+01:00]

Great answer @PMunch. Thanks!

PMunch [2023-12-27T14:19:18+01:00]

Oh, almost forgot, but let's have a look at ptr UncheckedArray[byte] as Araq suggested. In C there really isn't any difference between a pointer to a single element and a pointer to a series of elements. If you use square brackets on a pointer, it is simply a shortcut for doing pointer arithmetic; no checking goes on under the hood to verify that you are actually doing this on something which is intended to be a series of elements.

In Nim, however, the type system is much stricter than in C, and ptr byte and ptr UncheckedArray[byte] are how we make this distinction. ptr byte in Nim is always a pointer to a single byte, and as you discovered, the only way to get the "next element" if your ptr byte actually points to the first element in an array of bytes is to cast it back and forth between a pointer type and a number while doing arithmetic on the numerical value. This is very inconvenient.

So the way you're supposed to handle arrays coming from C is with ptr UncheckedArray[byte]. This type is simply a pointer to a series of bytes of unknown length, so it isn't checked for out-of-bounds access. You will see that this type actually supports the square bracket notation to fetch elements, but keep in mind that we need to check the bounds ourselves.

In my previous answer I mentioned how we could populate a sequence or string in Nim, either through allowing the C API to write directly into a pre-allocated string, or by copying over the bytes after the fact. This would create a managed structure with bounds checks and everything. Using UncheckedArray we accept the burden of working with a C data type, and have to free the memory and deal with bounds checks ourselves.
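A short sketch of what that looks like in practice, again with a plain array standing in for memory owned by a C library (cData, buffer and length are illustrative names):

```nim
# Hypothetical stand-in for a pointer and length received from C.
var cData = [10'u8, 20, 0, 30, 40]
let buffer: ptr byte = addr cData[0]
let length = cData.len

# Reinterpret the pointer; no data is copied and no bounds are checked.
let view = cast[ptr UncheckedArray[byte]](buffer)

var total = 0
for i in 0 ..< length:     # we supply the upper bound ourselves
  total += view[i].int

doAssert total == 100
doAssert view[3] == 30     # indexing past a zero byte is no problem
```

Reading view[length] or beyond would compile just fine, which is exactly the burden mentioned above.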

Personally I'd really like Nim to add a CheckedArray type which had a generic type and a generic length field for an even nicer API to C types (as many of them come with some form of length). But if you're able to, it's always better to create proper Nim types instead.

Mirror of forum.nim-lang.org

10799 :: Ptr byte to cstring?