How long does a pointer returned by str.cstring remain valid, according to the language standard (as opposed to concrete implementations)?
If we pass str.cstring directly into a C function, I believe the data remains valid until the C function returns.
But are there ways to prolong the lifetime? In particular, does having a variable holding str or str.cstring prolong the C string's lifetime? I mean code like this:
proc f(str: string | cstring) =
  c_setstr(str.cstring) # this C function saves the pointer
  ...
  echo str # prolong str lifetime
  c_usestr() # and this C function uses this pointer to extract the data
Or can str be moved around at any moment while Nim code is executing, so that the pointer is invalid after the c_setstr call regardless of further use of str?
Or can str be GCed immediately after its last use, i.e. here after the "echo" call?
C functions for clarity:
#include <stdio.h>
char *g_s;
void c_setstr(char *s) {g_s = s;}
void c_usestr() {printf("%s", g_s);}
This is from https://ssalewski.de/nimprogramming.html#_basic_string_operations:
Technically, for a Nim string s addr s[0] is the C string pointer, called * char in C language. Whenever we pass strings to C libraries, we have to care for the fact that Nim’s garbage collector may deallocate the string automatically. Most C libs create copies of passed strings when the library uses the string for a longer time span. GTK, for example, does this with text for its widgets. But when the C library does not copy the string but uses it directly for a longer time, then it can occur that the Nim code frees the string, as the only Nim variable referring to the string goes out of scope, but the C library still uses the string. For that rare case, we may call GC_ref() on the string to prevent garbage collection, but that may generate memory leaks then. In the case that C libraries create strings, they provide generally also a function to deallocate the string. When we use such a C function, it is typically the best solution, that we copy the string from the C library to a Nim string and immediately deallocate the C string by a call of the provided free()/dealloc() function. For most C libraries, there exist good high-level bindings, which do not have these issues, so we mostly can use the C libs like pure Nim libs.
Not sure how correct it is, but GC_ref() should work at least for Nim v1.6; we will see what surprises Nim 2.0 may bring.
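The copying approach mentioned in the quoted passage ("Most C libs create copies of passed strings") can be sketched on the C side roughly as follows. This is only an illustration, not code from any real library: the names c_setstr_copy, c_peekstr, c_usestr_copy and c_clearstr are hypothetical. The idea is that the C code owns its own copy, so it no longer matters when Nim frees the original string.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Private copy owned by the C side; NULL when nothing is stored.
   All names in this sketch are hypothetical. */
static char *g_copy = NULL;

/* Store a copy of s, so the caller's buffer may be freed afterwards.
   Returns 0 on success, -1 if s is NULL or the allocation fails. */
int c_setstr_copy(const char *s) {
    if (s == NULL) return -1;
    size_t n = strlen(s) + 1;      /* include the terminating NUL */
    char *copy = malloc(n);
    if (copy == NULL) return -1;
    memcpy(copy, s, n);
    free(g_copy);                  /* drop any previously stored copy */
    g_copy = copy;
    return 0;
}

/* Return the stored copy (or NULL); the pointer stays valid until the
   next successful c_setstr_copy call or until c_clearstr is called. */
const char *c_peekstr(void) { return g_copy; }

/* Print the stored copy; "%s" keeps the data from being used as a format. */
void c_usestr_copy(void) {
    if (g_copy != NULL) printf("%s\n", g_copy);
}

/* Release the stored copy. */
void c_clearstr(void) {
    free(g_copy);
    g_copy = NULL;
}
```

With a library written this way, the Nim side only needs str.cstring to be valid for the duration of the c_setstr_copy call itself; the cost is one extra allocation and the need to call c_clearstr at some point.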
Thanks, everyone, for the insights; now I understand how it should be done.
I was also looking for the guarantees that Nim 1.x can provide, and for a way to document them for other programmers. The GC_ref docs don't say that the object will be pinned, although https://nim-lang.org/1.6.0/gc.html does propose GC_ref as the way to pin an object for C code, but this document isn't linked from the language manual.
Looking around, I've finally found that it's described in https://nim-lang.org/docs/backends.html#memory-management although the wording could be a little better.
So, I think the only thing missing from the documentation is a link from GC_ref to this section, describing its "super-powers" for the C backend. Plus, gc.html should be linked from the Nim docs.
Araq, does it mean that this code is fine?
proc f(str: string) =
  c_setstr(str.cstring) # this C function saves the pointer
  ...
  c_usestr() # and this C function uses this pointer to extract the data
i.e. the pointer returned by str.cstring and saved in a C global remains valid here until the end of f()?
Can a Nim ref object be moved even while it's alive?
Also, is all of that mentioned in the documentation and guaranteed for every Nim 1.x with the C/C++ backend?
Does GC_ref() guarantee that the object will not be moved until GC_unref()? It's mentioned in the C backend doc, but not in the GC_ref doc.
I ask because I want to make the code future-proof and based on documented Nim features. I volunteer to bring it to the docs to pin down the guarantees :)
i.e. the pointer returned by str.cstring and saved in a C global remains valid here until the end of f()?
Correct.
Can a Nim ref object be moved even while it's alive?
Not in this decade, no. In theory it could be moved unless GC_ref was called on it, but a moving GC is neither planned nor that desirable for Nim.
Also, is all of that mentioned in the documentation and guaranteed for every Nim 1.x with the C/C++ backend?
To the best of my knowledge, the rules that I outlined here are valid for every Nim version (including nlvm) that produces native code. Note that the old GCs are even more lenient when it comes to collecting memory, so use --mm:arc/orc to be on the safe side.