Hard core newbie question here: when to use 'ref' when making objects.
Sorry this is long, if I understood it better I could write a shorter question :)
Anyway, when would you want this...
type
  Fruit* = ref object of RootObj
    origin*: string
    price*: Dollar
...instead of this:
type
  Fruit* = object of RootObj
    origin*: string
    price*: Dollar
As an experiment I took the examples in http://goran.krampe.se/2014/12/03/nim-seq/ and removed 'ref' from 'ref object' everywhere to see what would happen. It all seemed to work fine w/out 'ref'. (code for that page is zipped up, so I put it unzipped here and here.)
I RTFM'd and in the manual it says 'ref' (and 'ptr') are for many-to-one relationships. It's been a long time since I programmed in C but I get how in C you'd want the indirection for, eg. mutating a data structure (sorting, inserting, etc). But in nim, collection things like 'sort', 'map', 'filter', etc work fine w/out the indirection of 'ref', so got confused as to when you would need it.
I looked also in Nim's source, but it was not obvious to me why 'ref object' was chosen over plain 'object' there. Eg. why does AsyncFile need to be a ref?
A ref object (or tuple or another type) creates an object on the heap which is ref-counted. That means the pointer increments a reference counter connected with the object/type on the heap. Every other reference to the object increments the ref-counter further. When a reference goes out of scope, the reference becomes invalid and the ref-counter will be decremented. If a reference counter reaches zero, the object will be garbage-collected.
A ptr object is merely a pointer to the object and does not affect the ref-counter. E.g. in a binary tree, the pointers to the child nodes are usually references. The pointer back to the parent may be a simple pointer(*), because the parent is "kept" by another reference, e.g. the root of the binary tree. When a parent becomes invalid (no reference points to the parent any longer), the ref-counters of its child nodes will be decremented. If their ref-counters become zero too, they will be garbage-collected and the allocated memory will be available for reallocation.
(*) and this avoids a cyclic reference. The nim GC detects that, however (which is a challenging task...)
type
  rob = ref ob
  ob = object
    key: int
var myobject = rob(key: 42) # object allocated and referenced on the heap
echo myobject.key # should reply 42
myobject = nil # now the object is no longer referenced and will be garbage-collected eventually
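The binary tree layout described above could be sketched roughly like this. The names Node, NodeObj and attachLeft are made up for illustration, and the ptr back-pointer is only safe as long as the parent is kept alive by some other reference:

```nim
type
  Node = ref NodeObj
  NodeObj = object
    key: int
    left, right: Node     # owning references, tracked by the GC
    parent: ptr NodeObj   # plain back-pointer, not tracked by the GC

# Hypothetical helper: hang `child` under `parent` and set the back-pointer.
proc attachLeft(parent, child: Node) =
  parent.left = child
  child.parent = addr(parent[])   # address of the heap object behind the ref

let root = Node(key: 1)
root.attachLeft(Node(key: 2))
echo root.left.parent.key   # the child reaches its parent without owning it
```

Because the child never holds a ref to its parent, the two nodes do not form a reference cycle.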
The primary difference between a plain object and a reference is that assigning a plain object makes a copy of the original object. This makes having shared state harder (aside from objects that are declared in a local or global variable and are passed around exclusively using var parameters). It also creates significant overhead for assigning large objects (since the entire state has to be copied).
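A small sketch of the var-parameter case mentioned here, using a made-up Counter type: the plain object lives in one variable and is mutated in place through var parameters, while a plain assignment still produces an independent copy.

```nim
type Counter = object
  hits: int

# `var` parameter: the proc mutates the caller's object, no copy is made.
proc bump(c: var Counter) =
  inc c.hits

var total = Counter()
bump(total)
bump(total)
var snapshot = total   # plain assignment copies the whole object
bump(total)            # does not affect the copy
echo total.hits
echo snapshot.hits
```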
Another important difference is that plain objects and polymorphism do not mix well. Any time you assign an object of a subtype to a variable of a supertype, any extra state that only exists in the subtype has to be stripped.
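For comparison, here is a minimal sketch of the ref side of this point. Banana and describe are hypothetical names, and the Fruit type is simplified from the one in the question. With ref objects, a seq[Fruit] can hold subtype instances without losing their extra fields, and methods dispatch on the runtime type:

```nim
type
  Fruit = ref object of RootObj
    origin: string
  Banana = ref object of Fruit
    size: int

method describe(f: Fruit): string {.base.} =
  "fruit from " & f.origin

method describe(b: Banana): string =
  "banana of size " & $b.size

var basket: seq[Fruit]
basket.add Fruit(origin: "Spain")
basket.add Banana(origin: "Ecuador", size: 3)   # stored as a Fruit reference

for f in basket:
  echo f.describe()   # picks the implementation for the runtime type
```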
Sixte: Every other reference to the object increments the ref-counter further. When a reference goes out of scope, the reference becomes invalid and the ref-counter will be decremented. If a reference counter reaches zero, the object will be garbage-collected.
This is not strictly true. Leaving aside the fact that Nim has non-reference-counting garbage collectors, too, the reference counting collector only changes the reference count of an object when the reference is assigned to a location on the heap or in global memory. Assignments to local variables and passing a reference to or returning it from a procedure do not update the reference count. This is enormously cheaper than normal reference counting, because the bulk of reference count updates need not occur. The downside is that actual deallocation is deferred; when a reference count reaches zero, the object may still be referenced from a local variable, so it cannot be freed immediately. Rather, the system remembers the objects with a zero reference count in a so-called zero count table and processes that table at intervals, freeing only objects that have a reference count of zero and are not still referenced by a local variable.
What Jehan said can be illustrated in an example like so:
type
  rob = ref ob
  ob = object
    key: int
var ob1 = ob(key: 42)
var ob2 = ob1 # makes a copy of ob1
ob2.key = 2
echo ob1.key # prints 42
echo ob2.key # prints 2
var rob1 = rob(key: 42)
var rob2 = rob1 # rob2 now refers to the same object as rob1
rob2.key = 2
echo rob1.key # prints 2
echo rob2.key # prints 2
I was struggling to correlate the use of Nim's ref with C pointers. My understanding now is that in C a pointer variable can be created by using * with any type or struct (or class in C++), whereas in Nim a pointer/ref type needs to be declared first before a pointer object can be created.
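As a side note to the comparison with C, and only as a sketch: as far as I know a named ref type is a convenience rather than a requirement, since ref T can also be written inline for any type and allocated with new. The variable names below are made up:

```nim
var p: ref int   # roughly like `int *p` in C, but managed by the GC; starts as nil
new(p)           # allocate the int on the heap
p[] = 5          # `[]` dereferences, like `*p` in C

var q = p        # copies the reference, not the int
q[] = 7
echo p[]         # both names refer to the same heap int
```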
ref is also needed for async.
This does not work:
import asyncdispatch
type Foo = object
  num: int
proc dostuff(foo: var Foo): Future[void] {.async.} =
  foo.num = 0
  return
var foo = Foo()
waitFor foo.dostuff()
#Error: 'foo' is of type <var Foo> which cannot be captured as it would violate memory safety, declared here: c:\Users\david\projects\nimPlayground\t1207.nim(7, 14)
while this works:
import asyncdispatch
type Foo = ref object
  num: int
proc dostuff(foo: Foo): Future[void] {.async.} =
  foo.num = 0
  return
var foo = Foo()
waitFor foo.dostuff()
Well given your follow up statement you could very likely use a procedure with lent T to get the exact semantics you're using safely without exposing a mutable reference or copy. For instance:
var a = "SomeString"
proc getSomeString: lent string = a
assert getSomeString()[0].unsafeaddr == a[0].unsafeaddr
var b = a
assert b[0].addr != a[0].addr
Generally always when we have "many to one" relations.
No. You only need ref if you have a mutable "many to one" relation. If it's read-only, you don't care whether it's a copy or a ref, because it behaves the same. The only problem is poor performance caused by copying large objects, so you have to use ref even if you don't need it in terms of behavior.
ref object would be almost never needed if Nim supported copy on write.
Nim does support "copy on write" via custom =hooks.
The only problem is poor performance caused by copying large objects (which copy-on-write may help solve), so you have to use ref even if you don't need it in terms of behavior.
No, that is not the "only problem", try creating an immutable cyclic data structure without mutations. The problem is that it's immutable after a construction phase and modeling phase transitions via a type system is tricky and can quickly become more annoying than leaving the types in the mutable state.
An inspiring image about this approach, I want someone else to do the job (compiler or VM), not me
Yeah I'm not surprised. But if you don't want to program, why not switch jobs... After 50 years of intensive research programming is still hard work, esp if you need to be so good at it that somebody pays you to do it.
Nim does support "copy on write" via custom =hooks.
You know what I meant, that it should be done automatically.
But if you don't want to program, why not switch jobs...
I don't want to do low-level programming. VM and compilers can do it. Humans have more interesting work to do.
I meant - supported automatically by compiler or VM, not human written hooks.
It's not a perfect solution but you could use a CoW type to wrap your objects, you don't have to write hooks for every object. You just have to declare your objects as CoW[MyObject].
It's not compiler support but it doesn't involve much effort/complexity either.
You are saying that you can do a better performance optimisation and micromanagement than compiler could do.
I think you are right, for now. But 1) I don't think the situation will be the same in the near future (see this for example). And 2) I don't think the majority of other people can do it or are willing to spend the effort doing it.
I think the practical outcome will be that some non-Nim person tries to convert a program from Python/Java/JS to Nim, and it ends up slower than the original, because using non-ref for even a couple of large objects could easily kill even C-level performance. And it's a very frequent use case.
And the quote from some bio-informatics person