For debugging low-level details, it's always useful to check the generated C code, e.g.:
echo 'echo "hello, world"' > x.nim
mkdir tmp
nim c --nimcache:tmp x.nim
Now you can find the generated C source in tmp/@mx.nim.c.
Let's search for our string:
static const struct {
NI cap; NIM_CHAR data[12+1];
} TM__6x2C9bN0rvuU6HnSvx4zF9aQ_3 = { 12 | NIM_STRLIT_FLAG, "hello, world" };
So it's put in a struct, with the first member being "cap". Suspicious already :)
Obviously, 12 is the string length, so cap must be the length prefix. What is NIM_STRLIT_FLAG? It's defined in nimbase.h (on my computer, /usr/lib/nim/lib/nimbase.h).
#define NIM_STRLIT_FLAG ((NU)(1) << ((NIM_INTBITS) - 2)) /* This has to be the same as system.strlitFlag! */
NIM_INTBITS appears to be the word size of the target architecture. On my computer it's 64.
We can then evaluate the expression:
12 | ((NU)1 << 62) /* = 0x400000000000000C */
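You can double-check the arithmetic from Nim itself; a quick sketch (strlitFlag here is my own constant mirroring the C macro, assuming NIM_INTBITS == 64):

import std/strutils

const strlitFlag = 1'u64 shl 62     # same value as NIM_STRLIT_FLAG on a 64-bit target
echo toHex(12'u64 or strlitFlag)    # prints 400000000000000C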
Finally, Q: what is @ in ASCII? A: 0x40. Most modern computers are little-endian, so the most significant (leftmost) byte is stored last, directly before the string data. That's why you're seeing the @ sign.
(In fact, if you open the compiled binary in a pager, you will see something like ^L^@^@^@^@^@^@@hello, world. ^L is form feed, ASCII 0x0C. ^@ is null: the caret (Ctrl) notation flips the 0x40 bit, turning '@' (0x40) into 0x00.)
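If you want to see that byte order without reaching for a pager, here's a minimal sketch (prefix and bytes are my own names; assumes a little-endian machine such as x86-64):

import std/strutils

let prefix = 12'u64 or (1'u64 shl 62)      # the flagged length prefix
let bytes = cast[array[8, uint8]](prefix)  # reinterpret the integer as raw bytes
for b in bytes:
  stdout.write toHex(b.int, 2), " "
echo ""
# prints: 0C 00 00 00 00 00 00 40  -- note the trailing 0x40, i.e. '@'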
So, to answer your question: it is an internal flag for string literals, encoded into the length prefix. You probably don't want to get rid of it.
...rare exception: if you store lots of small constant strings in an array at compile time, you can try storing them as cstrings to save some space, e.g.:
const many_strings = [cstring"str1", cstring"str2", ... cstring"str99999"]
# output is something like {"str1", "str2", ... "str99999"}
This way, the length prefix need not be saved in the binary. OTOH this also means that determining a string's length becomes O(N) (it's a C strlen under the hood). So use with caution.
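To make the trade-off concrete, a small sketch (nimStrs/cStrs are made-up names):

const nimStrs = ["str1", "str2"]              # each literal carries the length prefix
const cStrs = [cstring"str1", cstring"str2"]  # plain C literals, no prefix

echo nimStrs[0].len   # O(1): the length is stored right there
echo cStrs[0].len     # O(N): effectively a C strlen call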
Thanks for the explanation!
Really helpful, thanks :)