There's no documentation I can find about how to convert an array of bytes to a string (where the bytes are UTF-8, of course.) I've looked through the manual, tutorial and several library modules.
So far I've been using cast[string](...). This appears to work fine for seq[byte]; at least I haven't found any problems resulting from it.
But yesterday I ran into a crash doing the same with an openarray[byte] — this does not work, it produces a string whose contents and length are garbage.
proc toString(bytes: openarray[byte]): string =
  let str = cast[string](bytes) # <--- the call in question; how to do this properly?
  echo "length = ", str.len
  echo "str[0] = ", str[0].byte
  assert str.len == 3 # FAILS: actual value is 0x232221
  return str

let bytes = @[33'u8, 34, 35] # Before you suggest it, appending a 0 does not help :)
echo toString(bytes) # Without the asserts above, this will spew garbage or crash
So I'm guessing what I'm doing is not kosher, even though it seems to work on a seq. After all, cast[] is documented as being low-level and dangerous.
What's the right way? (Ideally it would be efficient, i.e. not create an intermediate seq[byte], as I'm doing this in some low-level network code.)
One way would be
proc toString(bytes: openarray[byte]): string =
  result = newString(bytes.len)
  if bytes.len > 0: # indexing [0] on an empty array would raise an index error
    copyMem(result[0].addr, bytes[0].unsafeAddr, bytes.len)
@oswjk's solution is correct.
An openarray[byte] is not NUL-terminated, unlike a string, and that would cause issues if you are interfacing with C code that expects a NUL-terminated cstring. I.e., you were probably a victim of the same bug as https://github.com/status-im/nim-http-utils/issues/8 and coincidentally the fix is almost exactly the same code as @oswjk's: https://github.com/status-im/nim-http-utils/pull/9/files#diff-0d2a0d4a1727b8f0022cbacbd946e780R546
where the bytes are UTF-8, of course.
This isn't relevant -- Nim is agnostic about the content of strings and doesn't require them to contain UTF-8 or any other encoding.
@oswjk's solution is correct.
It would be nice to have this in the standard library, so that one doesn't have to unleash unsafeAddr just to do a simple conversion. 😬
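For what it's worth, a version that avoids unsafeAddr entirely could look something like the sketch below. It copies byte by byte, so it may be slower than copyMem unless the compiler optimizes the loop, which I haven't measured. The name toString is just the one used earlier in this thread, not a standard library proc.

```nim
# A minimal sketch: convert bytes to a string without cast or unsafeAddr.
# Nim strings carry no encoding, so each byte is copied as-is into a char.
proc toString(bytes: openArray[byte]): string =
  result = newString(bytes.len)   # allocates bytes.len chars, zero-filled
  for i, b in bytes:
    result[i] = char(b)           # plain value conversion, no reinterpretation

let sample = @[33'u8, 34, 35]
echo toString(sample)
```

An empty input simply yields an empty string, since the loop body never runs.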
An openarray[byte] is not NUL-terminated, unlike a string, and that would cause issues if you are interfacing with C code that expects a NUL-terminated cstring.
I'm not. The problem is in pure Nim -- the cast returns a garbage string object, as shown in the above example.
It appears to be misinterpreting the raw bytes in the openarray as if they were a string object, so e.g. the string's length is the first bytes of the array interpreted as a little-endian int.
Again, I don't know the exact semantics of Nim's cast[], so this might just be misuse of it. But it's dangerous that it works with one type (seq) but fails with a conceptually similar type.
cast is equivalent to reinterpret_cast in C++: it just reinterprets the raw bit pattern (except for numerical types, where it does zero-extension or truncation like C casts).
An openarray is "ptr + len". I misread your parameters and thought it was seq[byte] when I wrote that part of my answer.
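To illustrate the difference between cast and an ordinary type conversion, here is a small sketch using same-sized scalar types only (the variable names are made up for the example). It says nothing about how seq or openarray are laid out internally, which is exactly why casting those is risky.

```nim
# cast reinterprets the bits; a type conversion converts the value.
var f = 1.0'f32

let bits = cast[uint32](f)      # IEEE-754 bit pattern of 1.0
doAssert bits == 0x3F800000'u32

let converted = int32(f)        # value conversion: 1.0 becomes 1
doAssert converted == 1

var u = 0xFF'u8
let asSigned = cast[int8](u)    # same 8 bits, read as a signed integer
doAssert asSigned == -1
```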
Not sure if this is helpful, but casting to cstring apparently works as expected:
proc toString(bytes: openarray[byte]): cstring =
  let str = cast[cstring](bytes)
  echo "length = ", str.len
  echo "str[0] = ", str[0].byte
  assert str.len == 3
  return str

let bytes = @[33'u8, 34, 35]
echo toString(bytes)
# Output
# length = 3
# str[0] = 33
# !"#
Interesting … I'm wondering where the 0 byte at the end of the cstring came from, since cast doesn't copy anything. You might just have gotten lucky in that particular example, and the memory after bytes happened to start with a zero.
Here's a modification that fails, because the byte past the end of the openarray isn't zero:
proc toString(bytes: openarray[byte]): cstring =
  assert bytes.len == 3
  let str = cast[cstring](bytes)
  echo "length = ", str.len
  echo "str[0] = ", str[0].byte
  assert str.len == 3
  return str

var bytes = @[33'u8, 34, 35, 36, 37]
echo toString(bytes.toOpenArray(0, 2))
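For comparison, here is a sketch of the same slice going through the copy-based conversion suggested earlier in the thread. Because the bytes are copied into a real Nim string, the length comes from the string object rather than from whatever byte happens to follow the slice, and a cstring taken from that string is terminated correctly. The empty-input guard is my addition.

```nim
# Copy the bytes into a freshly allocated string instead of casting.
proc toString(bytes: openArray[byte]): string =
  result = newString(bytes.len)
  if bytes.len > 0:  # indexing [0] on an empty array would raise
    copyMem(addr result[0], unsafeAddr bytes[0], bytes.len)

var bytes = @[33'u8, 34, 35, 36, 37]
let s = toString(bytes.toOpenArray(0, 2))  # only the first three bytes
doAssert s.len == 3
doAssert s == "!\"#"

# Nim strings keep a trailing NUL internally, so this view is safe
# for as long as s stays alive.
let cs = cstring(s)
doAssert cs.len == 3
```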