In my codebase I'm trying to ensure that nobody tries to free something that's not on the heap.
There are two symbols defined by the linker script which can be imported and used like so in C:
extern char __HEAP_START__[];
extern char __HEAP_END__[];
// ...
uintptr_t p = (uintptr_t) __HEAP_START__; // get a pointer to the start of the heap.
However, I don't know how to define these same arrays in Nim.
At first I thought cstring might do the trick, but apparently not: it causes them to be seen as pointers to data at a memory location rather than simply "data that starts at a memory location". You might say "hey, isn't the C code treating them as pointers?"... Yes, it seems like it, but C semantics are just fucked like that, and in reality [] and * have different meanings.
I was eventually able to get it to work using the following Nim code:
let
  heapStart {.importc:"__HEAP_START__".}: char
  heapEnd {.importc:"__HEAP_END__".}: char

proc onHeap*(p: pointer): bool {.inline.} =
  ## Return true if the given pointer is on the heap.
  cast[uint32](p) in cast[uint32](addr heapStart) ..< cast[uint32](addr heapEnd)
This is technically fine: it just says "there's a byte called __HEAP_START__ somewhere", which works.
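For comparison, here is a rough C sketch of the same half-open range check. It is only an illustration: fake_heap and not_heap are hypothetical stand-ins so the snippet builds on a desktop machine, whereas on the real target the two bounds would come from the linker symbols __HEAP_START__ and __HEAP_END__.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: on the real target the bounds would come from
   `extern char __HEAP_START__[];` and `extern char __HEAP_END__[];`. */
static char fake_heap[256];
static char not_heap[16];

static uintptr_t heap_start(void) { return (uintptr_t)fake_heap; }
static uintptr_t heap_end(void)   { return (uintptr_t)(fake_heap + sizeof fake_heap); }

/* True if p lies in the half-open range [heap_start, heap_end). */
static bool on_heap(const void *p) {
    uintptr_t a = (uintptr_t)p;
    return a >= heap_start() && a < heap_end();
}

/* Small usage example covering both edges of the range. */
static void demo(void) {
    assert(on_heap(fake_heap));                         /* first byte */
    assert(on_heap(fake_heap + sizeof fake_heap - 1));  /* last byte */
    assert(!on_heap(fake_heap + sizeof fake_heap));     /* one past the end */
    assert(!on_heap(not_heap));                         /* unrelated object */
}
```

Using uintptr_t instead of a fixed 32-bit type keeps the comparison correct on targets where pointers are wider than 32 bits.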
However, I'm still curious if the original definition ("there's an array of undetermined size called __HEAP_START__ somewhere") is possible. I tried this:
let
  heapStart {.importc:"__HEAP_START__".}: UncheckedArray[char]
  heapEnd {.importc:"__HEAP_END__".}: UncheckedArray[char]
which gives Error: invalid type: 'UncheckedArray[char]' for let
Any ideas?
Yes, ptr UncheckedArray is usually correct, but not in this case.
It leads to the following codegen:
extern NIM_CHAR* __HEAP_START__;
extern NIM_CHAR* __HEAP_END__;
which does not exhibit the same behaviour as the desired codegen:
extern NIM_CHAR __HEAP_START__[];
extern NIM_CHAR __HEAP_END__[];
My understanding is as follows:
With the former, you tell the C compiler "there's a value in memory somewhere which is a pointer to a character"
With the latter, you say: "there's a value somewhere in memory which is a character, and there may be more characters after it"
And if you try to use the latter in a place where a pointer is expected, the C compiler implicitly takes the address for you. With the former it simply reads the value to get the address.
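To make that distinction concrete, here is a small single-file C sketch. The names as_array and as_pointer are made up for illustration; the point is only that an array name and the array object share one address, while a pointer variable's own location and the address stored in it are two separate things.

```c
#include <assert.h>
#include <stdint.h>

static char as_array[] = "Hello world!";   /* the object IS the characters */
static char *as_pointer = as_array;        /* the object HOLDS an address  */

/* For an array, the name decays to the address of its first element,
   which is also the address of the array object itself. */
static int array_name_is_its_own_address(void) {
    return (uintptr_t)as_array == (uintptr_t)&as_array;
}

/* For a pointer, the value stored in it and the location of the pointer
   variable are two different addresses. */
static int pointer_value_differs_from_its_address(void) {
    return (uintptr_t)as_pointer != (uintptr_t)&as_pointer;
}

static void demo(void) {
    assert(array_name_is_its_own_address());
    assert(pointer_value_differs_from_its_address());
    assert(sizeof as_array == 13);                /* 12 characters plus '\0' */
    assert(sizeof as_pointer == sizeof(char *));  /* just one address */
}
```

So a declaration of the form extern char x[]; asks for the symbol's address, while extern char *x; asks for whatever is stored at that address.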
let
  heapStart {.importc:"((void*) __HEAP_START__)", nodecl.}: pointer
  heapEnd {.importc:"((void*) __HEAP_END__)", nodecl.}: pointer
> With the former, you tell the C compiler "there's a value in memory somewhere which is a pointer to a character"
> With the latter, you say: "there's a value somewhere in memory which is a character, and there may be more characters after it"
That's both true and not true. In theory arrays in C are a collection of items, but in practice they're almost always implicitly treated as pointers and dereferenced.
> And if you try to use the latter in a place where a pointer is expected, the C compiler implicitly takes the address for you. With the former it simply reads the value to get the address.
Generally it's better to think of the array in C as a pointer with some extra syntax. That's why the Nim output code for ptr UncheckedArray would generally function correctly.
Here's a good reference for exact behaviors: https://www.geeksforgeeks.org/difference-pointer-array-c/
> Generally it's better to think of the array in C as a pointer with some extra syntax.
But that's how I got into this mess!
Look:
// main.c
#include <stdio.h>
char my_data[] = "Hello world!";
extern char *foo();
extern char *bar();
int main() {
    printf("%s\n", my_data);
    printf("%s\n", foo());
    printf("%s\n", bar());
}
// foo.c
extern char my_data[];
char *foo() {
    return my_data;
}
// bar.c
extern char *my_data;
char *bar() {
    return my_data;
}
$ gcc main.c foo.c bar.c
$ ./a.out
Hello world!
Hello world!
Segmentation fault (core dumped)
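As far as I can tell, bar() crashes because the first few bytes of the string get loaded as if they were an address. The sketch below imitates that load with memcpy in a single file; misread_as_pointer and read_as_array are hypothetical helper names, and it assumes uintptr_t has the same size as char *, which holds on common platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static char my_data[] = "Hello world!";

/* Roughly what `extern char *my_data;` does: load pointer-sized bytes from
   the symbol's address and treat them as a pointer value.
   Assumes sizeof(uintptr_t) == sizeof(char *). */
static uintptr_t misread_as_pointer(void) {
    uintptr_t bogus = 0;
    memcpy(&bogus, my_data, sizeof bogus);
    return bogus;
}

/* What `extern char my_data[];` does: use the symbol's address directly. */
static uintptr_t read_as_array(void) {
    return (uintptr_t)my_data;
}

static void demo(void) {
    unsigned char first = (unsigned char)my_data[0];
    uintptr_t bogus = misread_as_pointer();
    /* The bogus "address" is made of text bytes; 'H' is one of them. */
    assert(memchr(&bogus, first, sizeof bogus) != NULL);
    /* It is not the address where the string actually lives. */
    assert(bogus != read_as_array());
}
```

Passing that bogus value to printf("%s") means dereferencing an address assembled from the characters of "Hello world!", which would explain the segmentation fault.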
About that, there is this:
var heapBegin {.importc:"__HEAP_BEGIN__".} : array[0, char]
With nlvm, this produces:
@__HEAP_BEGIN__ = external global [0 x i8]
(the same as clang would produce for the C declaration extern char __HEAP_BEGIN__[];)
With nim c,
typedef NIM_CHAR tyArray__LoFFW0ONIEnen61TrpExEQ[1];
//...
extern tyArray__LoFFW0ONIEnen61TrpExEQ __HEAP_BEGIN__;
Not completely satisfactory, but would that work?
Then, this is what I think comes closest:
let
  heapStart {.importc:"__HEAP_START__".}: cstring
  heapEnd {.importc:"__HEAP_END__".}: cstring
var s = cast[cstring](addr heapStart)
Anyway I don't think the C compiler cares about the length of the data. What's important is that there's a symbol named __HEAP_START__ and it has an address.