For debugging low-level details, it's always useful to check the generated C code, e.g.:
echo 'echo "hello, world"' > x.nim
mkdir tmp
nim c --nimcache:tmp x.nim
Now you can find the generated C source in tmp/@mx.nim.c.
Let's search for our string:
static const struct {
NI cap; NIM_CHAR data[12+1];
} TM__6x2C9bN0rvuU6HnSvx4zF9aQ_3 = { 12 | NIM_STRLIT_FLAG, "hello, world" };
So it's put in a struct, with the first member being "cap". Suspicious already :)
Obviously, 12 is the string length, so cap must be the length prefix. What is NIM_STRLIT_FLAG? It's defined in nimbase.h (on my computer, /usr/lib/nim/lib/nimbase.h).
#define NIM_STRLIT_FLAG ((NU)(1) << ((NIM_INTBITS) - 2)) /* This has to be the same as system.strlitFlag! */
NIM_INTBITS appears to be the word size of the target architecture. On my computer it's 64.
We can then evaluate the expression:
12 | ((NU)1 << 62) /* = 0x400000000000000C */
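You can double-check the arithmetic from Nim itself; a quick sketch (strlitFlag here is my own constant mirroring the C macro, assuming NIM_INTBITS == 64):

import std/strutils

const strlitFlag = 1'u64 shl 62     # same value as NIM_STRLIT_FLAG on a 64-bit target
echo toHex(12'u64 or strlitFlag)    # prints 400000000000000C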
Finally, Q: what is @ in ASCII? A: 0x40. Most modern computers are little-endian, so the most significant (leftmost) byte is stored last, directly before the string data. That's why you're seeing the @ sign.
(In fact, if you open the compiled binary in a pager, you will see something like ^L^@^@^@^@^@^@@hello, world. ^L is form feed, ASCII 0x0C. ^@ is null: the caret (Ctrl) notation flips the 0x40 bit, turning '@' (0x40) into 0x00.)
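If you want to see that byte order without reaching for a pager, here's a minimal sketch (prefix and bytes are my own names; assumes a little-endian machine such as x86-64):

import std/strutils

let prefix = 12'u64 or (1'u64 shl 62)      # the flagged length prefix
let bytes = cast[array[8, uint8]](prefix)  # reinterpret the integer as raw bytes
for b in bytes:
  stdout.write toHex(b.int, 2), " "
echo ""
# prints: 0C 00 00 00 00 00 00 40  -- note the trailing 0x40, i.e. '@'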
So, to answer your question: it is an internal flag for string literals, encoded into the length prefix. You probably don't want to get rid of it.
...rare exception: if you store lots of small constant strings in an array at compile time, you can try storing them as cstrings to save some space, e.g.:
const many_strings = [cstring"str1", cstring"str2", ... cstring"str99999"]
# output is something like {"str1", "str2", ... "str99999"}
This way, the length prefix need not be saved in the binary. OTOH this also means that determining a string's length becomes O(N) (it's a C strlen under the hood). So use with caution.
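To make the trade-off concrete, a small sketch (nimStrs/cStrs are made-up names):

const nimStrs = ["str1", "str2"]              # each literal carries the length prefix
const cStrs = [cstring"str1", cstring"str2"]  # plain C literals, no prefix

echo nimStrs[0].len   # O(1): the length is stored right there
echo cStrs[0].len     # O(N): effectively a C strlen call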
Thanks for the explanation!
Really helpful, thanks :)