nimforum mirror - Stock address instead of plain object in a table : C to Nim

zack [2024-06-07T19:32:59+02:00]

New to the Nim language, I'm trying to store the address of an object in a table to interface it with C code. I've realized that storing the entire object requires a lot of memory, especially if I'm storing a lot of objects. So instead of storing the object, I want to store its address.

type
  myObj = ref object
    age: int

# Init global table
var testTable  = initTable[string, ptr myObj]()

proc initTableObj (): cint =
  # Constructor
  let obj = newObj()
  obj.age = 20
  
  # Store address in 'testTable'
  testTable["key1"] = obj.addr
  return 0

proc callTableObj (): cint =
  let p = testTable["key1"]
  
  if p == nil:
    echo "yes"
    return 1
  # Dereference
  let newObj = p[]
  
  if newObj == nil:
    echo "yes"
    return 1
  else:
    # Error here : SIGSEGV: Illegal storage access. (Attempt to read from nil?)
    echo "Age: ", newObj.age
  
  return 0

I get this error: SIGSEGV: Illegal storage access. I'd like to know what I'm doing wrong. If I call the procedure initTableObj inside my callTableObj procedure, everything works.

PMunch [2024-06-07T21:07:18+02:00]

There are a couple of things wrong with this code, but let's start with the wrong assumption(s) that led you to try this in the first place.

I've realized that storing the entire object requires a lot of memory

Your object is the size of an integer, so it is exactly the same size as a pointer. Of course, if you add more fields this will add up, but you need to store the data in your object somewhere, and often just having it in a table is the right choice. But let's say your object actually is massive, and you want to avoid copies. The object you've got there is also a ref object, which means that when you create it, it is allocated on the heap, and whenever you pass the object around you're actually just passing around a pointer. It will be automatically freed once the last reference to it goes out of scope, so you can basically just relax. So a Table[string, myObj] would actually just store pointers already, no more fuss! If you need to pass a ref object to a C function, you can get the raw address of the heap object with cast[pointer](obj) (note that obj.addr instead gives the address of the variable holding the reference). Just keep in mind that the reference needs to stay alive as long as the C code wants to access it.
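A minimal sketch of the points above, assuming Nim 2.x with ARC/ORC (the variable names are illustrative, not from the thread). It shows that a ref value is one pointer wide, that going through the table reaches the same heap object, and that cast[pointer](obj) and obj.addr are two different addresses.

```nim
import std/tables

type
  MyObj = ref object
    age: int

var testTable = initTable[string, MyObj]()
var original = MyObj(age: 20)
testTable["key1"] = original   # stores a copy of the reference, not of the object

# A ref is a single machine pointer, however many fields the object has
echo sizeof(MyObj) == sizeof(pointer)      # true

# Going through the table reaches the same heap object
testTable["key1"].age += 1
echo original.age                          # 21

# Heap address of the object: what a C function would want
let heapAddr = cast[pointer](original)
# Address of the variable that holds the reference: a different location
let varAddr = cast[pointer](original.addr)
echo heapAddr == cast[pointer](testTable["key1"])   # true
echo heapAddr != varAddr                            # true
```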

With that said, let's have a look at your code. The problem here is pretty much the exact thing I warned against: you create a myObj as a local variable in your function, then you manually store a pointer to it in the table, circumventing Nim's reference counting, and then you return. As you return from this function, Nim sees that obj goes out of scope, and since you circumvented the reference counting, that is the only reference Nim knows about. So Nim diligently frees your object, since you're no longer using it. (On top of that, obj.addr is the address of the local variable itself, which is gone once the function returns.) When you try to use it later on, it isn't actually a nil dereference issue, but rather a use-after-free issue. Putting the contents of initTableObj into callTableObj just means that the local variable doesn't have time to go out of scope before you try to use it, so Nim hasn't gotten around to freeing it yet.
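One way to restructure the two procedures from the question along these lines, as a sketch: the table holds the ref itself, so the object survives initTableObj returning. The getOrDefault lookup (which yields nil for a missing key) and the replacement of newObj() with myObj(age: 20) are choices made here, and the C export pragmas are left out for brevity.

```nim
import std/tables

type
  myObj = ref object
    age: int

# The table stores the references themselves, so it keeps the objects alive
var testTable = initTable[string, myObj]()

proc initTableObj(): cint =
  let obj = myObj(age: 20)   # allocated on the heap
  testTable["key1"] = obj    # the table now holds a counted reference
  return 0                   # obj going out of scope no longer frees the object

proc callTableObj(): cint =
  let obj = testTable.getOrDefault("key1")   # nil if the key is missing
  if obj == nil:
    echo "key1 not found"
    return 1
  echo "Age: ", obj.age
  return 0

discard initTableObj()
discard callTableObj()
```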

Hopefully that helps, and welcome to the Nim community!

zack [2024-06-08T08:54:54+02:00]

@PMunch, thank you for your kindness, I think I understand. In my first approach, I declared the whole object as the table value and stored it like this:

var testTable  = initTable[string, myObj]() # First approach declared the whole object
proc initTableObj (): cint =
  # ...
  testTable["key1"] = obj
  return 0

Now, if I define obj as a global variable, it works:

var testTable  = initTable[string, ptr myObj]() # Second approach declared the object address
var obj: myObj # define global variable
proc initTableObj (): cint =
  # ...
  obj = newObj() # Set object
  testTable["key1"] = obj.addr
  return 0

The problem now is that I can't create several different objects, since obj is declared as a global:

testTable["key1"] = obj.addr # initTableObj ()
testTable["key2"] = obj.addr # initTableObj ()

If I retrieve key1, I end up with the values of key2.

Do you have a suggestion?
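Why every key ends up with the values of key2 can be reproduced with a small, self-contained sketch. The key and age parameters on initTableObj are hypothetical additions for illustration, and myObj(age: ...) stands in for the newObj() constructor from the post. Every entry stores obj.addr, the address of the one global variable, so all keys point at the same slot and see whichever object was assigned last.

```nim
import std/tables

type
  myObj = ref object
    age: int

var testTable = initTable[string, ptr myObj]()
var obj: myObj   # one global slot shared by every call

proc initTableObj(key: string, age: int) =   # hypothetical parameters
  obj = myObj(age: age)        # overwrites the slot; the previous object loses its last reference
  testTable[key] = obj.addr    # always the same address: that of the global obj

initTableObj("key1", 20)
initTableObj("key2", 30)

echo testTable["key1"] == testTable["key2"]   # true: identical pointers
echo testTable["key1"][].age                  # 30, not 20
```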

PMunch [2024-06-08T09:59:47+02:00]

As I mentioned, the whole premise of what you're doing is incorrect, so solving the problems you are facing won't actually solve your problem. Instead try this:

import tables

type MyObj = ref object
  age: int

var testTable: Table[string, MyObj]

proc populateTable() =
  testTable["key1"] = MyObj(age: 42)
  testTable["key2"] = MyObj(age: 32)

proc checkTable() =
  # These lookups return a reference (pointer) to the object stored somewhere on the heap
  echo testTable["key1"].repr
  echo cast[int](testTable["key1"]) # Address of the heap object, cast to int so it can be printed
  echo testTable["key2"].repr
  echo cast[int](testTable["key2"]) # Address of the heap object, cast to int so it can be printed

  for key in ["key1", "key2"]:
    var obj = testTable[key] # obj now holds a reference to the same object
    obj.age += 1 # Manipulate the object through the reference

  # These still return the same reference, but we can see the data has been manipulated
  echo testTable["key1"].repr
  echo cast[int](testTable["key1"]) # Same address as before
  echo testTable["key2"].repr
  echo cast[int](testTable["key2"]) # Same address as before

populateTable()
checkTable()

zack [2024-06-08T10:23:11+02:00]

@PMunch, sorry, I don't understand. When you do this:

var testTable: Table[string, MyObj]

proc populateTable() =
  testTable["key1"] = MyObj(age: 42)
  testTable["key2"] = MyObj(age: 32)

You are populating your table with objects, but imagine that I have 10,000 objects ("key1", "key2" ... "key10000"). My memory will increase.

Maybe I didn't understand your first answer (probably), but my goal is not to store my objects, only a reference, an address... something that lets me find each object through my table so I can use it again later.

PMunch [2024-06-08T10:29:00+02:00]

Well, they need to exist somewhere. The code I shared above only stores a reference to the object in the table; the object itself lives on the heap. This is because the object is declared as a ref object. If you change that to just object, you will see the program behave quite differently.
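A small sketch of that difference, assuming the same one-field object as in the thread; RefObj and ValObj are hypothetical names for the ref object and plain object variants. Reading an entry out of the table copies the reference in the first case but the whole object in the second, so only the first increment is visible through the table.

```nim
import std/tables

type
  RefObj = ref object     # reference semantics, like MyObj in the thread
    age: int
  ValObj = object         # hypothetical value-type counterpart
    age: int

var refTable: Table[string, RefObj]
var valTable: Table[string, ValObj]
refTable["key1"] = RefObj(age: 42)
valTable["key1"] = ValObj(age: 42)

var r = refTable["key1"]   # copies the reference: r and the table share one object
r.age += 1
var v = valTable["key1"]   # copies the whole object
v.age += 1

echo refTable["key1"].age  # 43: the change is visible through the table
echo valTable["key1"].age  # 42: only the local copy changed
```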

You might also find this helpful: https://peterme.net/nim-types-originally-a-reddit-reply.html

zack [2024-06-08T10:44:41+02:00]

The reason for this is because the object is declared as a ref object.

I have no choice, my object is declared like this.

The problem may be deeper than that: did I make the right choice in using a table to store my objects? Thanks for the link.

Araq [2024-06-08T10:57:56+02:00]

If your object is already a ref, feel free to alias it as you need:

var testTable: Table[string, MyObj]

proc populateTable() =
  let sharedObj = MyObj(age: 42)
  testTable["key1"] = sharedObj
  testTable["key2"] = MyObj(age: 32)
  testTable["key3"] = sharedObj
  testTable["key4"] = sharedObj
  testTable["key5"] = sharedObj
  testTable["key6"] = sharedObj

Now key1 and key3 and key4 etc all point to the same object and the memory requirements do not increase beyond what is needed to resize the testTable itself.
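A quick way to check the aliasing, as a sketch built on the snippet above (the loop and the value 50 are illustrative): a change made through one key is visible through every other key that holds the same reference, while key2 stays independent.

```nim
import std/tables

type
  MyObj = ref object
    age: int

var testTable: Table[string, MyObj]
let sharedObj = MyObj(age: 42)
for key in ["key1", "key3", "key4", "key5", "key6"]:
  testTable[key] = sharedObj         # five entries, one heap object
testTable["key2"] = MyObj(age: 32)   # a second, independent object

testTable["key1"].age = 50           # modify through one key...
echo testTable["key6"].age           # ...and every alias sees it: 50
echo testTable["key2"].age           # 32
echo testTable["key1"] == testTable["key4"]  # true: same reference
```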

zack [2024-06-08T11:23:04+02:00]

@Araq, my objects will all be different, so my table's memory usage will increase. Am I right? In my case, I did the same thing as you, but my code is a bit different; on the Nim side it is called like this (I don't know if that makes any difference):

var testTable: Table[string, MyObj]

proc populateTable(age: int) =
  let sharedObj = MyObj(age: age)
  testTable["key" & $age] = sharedObj

for i in 1..10000:
  populateTable(i)

Araq [2024-06-08T13:20:33+02:00]

Well if every object is different how exactly do you think you can save memory?

Alogani [2024-06-08T13:22:20+02:00]

Hello,

The rule of thumb is to never confuse or mix:

untraced pointers (ptr T)

traced pointers (ref T)

stack objects

If you want to store the address of a stack object, you are in trouble as soon as the function where your object is defined returns. So you have to copy it.

If you want to store the address (I mean casting the ref to a pointer) of a traced object (ref), it is exactly the same problem: Nim's GC (ARC) will detect when your object is no longer referenced and deallocate it (for example, when your function returns!). It is possible to tell the GC not to do that with GC_ref and GC_unref, but this is not really good practice.

Your best bet is either:

to create and store your objects as ref (no conversion to pointers)

to use pointers directly and manage memory yourself (with all the risks that implies). We generally use this kind of idiom: cast[ptr MyObject]
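The second option can be sketched as follows, assuming a plain object with no managed fields (an object holding strings or seqs would additionally need reset or =destroy on p[] before dealloc). The registry table and the addEntry/removeEntry helpers are hypothetical names; alloc0, dealloc and Table.pop are standard Nim.

```nim
import std/tables

type
  MyObject = object   # plain object with no managed fields (no string/seq)
    age: int

var registry: Table[string, ptr MyObject]

proc addEntry(key: string, age: int) =
  # Untraced, zero-initialized memory: Nim will never free it on its own
  let p = cast[ptr MyObject](alloc0(sizeof(MyObject)))
  p.age = age
  registry[key] = p

proc removeEntry(key: string) =
  var p: ptr MyObject
  if registry.pop(key, p):
    dealloc(p)   # we allocated it, so we must free it exactly once

addEntry("key1", 20)
addEntry("key2", 30)
echo registry["key1"].age   # 20
removeEntry("key1")
removeEntry("key2")
```

Because nothing frees these blocks automatically, the address stays valid for C code until removeEntry is called; the price is that forgetting removeEntry leaks, and calling dealloc twice is undefined behavior.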

And if you want to share your data with C functions, it depends on what C will do: if C stores the pointer, it is unsafe to pass a ref object or the address of a stack object.

PMunch [2024-06-08T13:25:40+02:00]

It seems like we're misunderstanding each other. Your objects being a ref object is exactly what you want, and the code I shared should do exactly the thing you're after. If I understand you correctly.

But let's walk through this. You have 10000 objects which are not trivially small (say somewhere above 30 bytes). You want to be able to look these objects up with the O(1) access that a table grants you. You don't want to store them more than once in memory, and instead always pass around their address.

A few things to note right off the bat:

Your table size will increase with how many elements you put in it; no matter how big or small the elements are, the size will increase. How much it increases per element, however, depends on the size of what is stored in each slot, and the size of your keys also matters for how much space the table consumes. For a Table[string, MyObj] where MyObj = ref object, the table really only stores a reference to the string, the hash of the string, and the reference to the underlying object of MyObj for each element. Tables, by design, also need to allocate more room than strictly required to hold N elements.

Your objects are already ref objects, which means that they are only created once and passed around by reference. Reference = pointer = address, for the most part.

You still need to store the actual object somewhere. With ref object they will be allocated on the heap when you create them, with normal object they will typically be allocated on the stack or inside a data structure.
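Some rough per-entry arithmetic, as a sketch assuming the slot layout described above (a hash, the key string and the reference). Payload and its 256-byte name field are hypothetical stand-ins for a "not trivially small" object; exact numbers depend on the Nim version and memory management mode, and the table also keeps spare slots.

```nim
import std/hashes

type
  Payload = object
    age: int
    name: array[256, char]   # hypothetical bulky field, not from the thread
  MyObj = ref Payload

# Roughly what each table slot holds for Table[string, MyObj]
echo "hash:   ", sizeof(Hash), " bytes"
echo "key:    ", sizeof(string), " bytes (string handle; the characters live in a separate heap block)"
echo "value:  ", sizeof(MyObj), " bytes (only the reference)"
# What exists once per object on the heap, outside the table
echo "object: ", sizeof(Payload), " bytes"
```

The value column stays pointer-sized however large Payload grows; the object itself is paid for once on the heap, whether or not it is in a table.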

So to summarize, your table will require some space, otherwise it wouldn't hold any information. Your objects, when declared as ref object, will only have one copy in memory, and when passing the object around Nim actually passes around the reference instead of the object. As long as your object is a ref object, you don't need to worry about this: put them in a Table like I showed you, and it won't be super huge.

zack [2024-06-09T09:28:35+02:00]

As a test, I exported a procedure via {.exportc,dynlib.} without saving my objects in a Table, letting my FFI library handle them instead, and the memory usage was the same.

In conclusion, I think I'm on the right track (I think!). Another solution would be to remove elements from my table as soon as I'm done using them, to reduce memory usage.
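Removing entries once they are no longer needed could look like this sketch, assuming the Table[string, MyObj] setup from the thread and Nim 2.x with ARC/ORC. del drops the table's reference; if nothing else references the object, it is freed right away (under the older refc GC, at a later collection). clear does the same for every entry.

```nim
import std/tables

type
  MyObj = ref object
    age: int

var testTable: Table[string, MyObj]
for i in 1..10_000:
  testTable["key" & $i] = MyObj(age: i)

# Done with one entry: drop the table's reference to it
testTable.del("key1")
echo testTable.len   # 9999

# Done with all of them
testTable.clear()
echo testTable.len   # 0
```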

Many thanks to all.

Mirror of forum.nim-lang.org

11730 :: Stock address instead of plain object in a table : C to Nim