nimforum mirror - when to use 'ref object' vs plain 'object'

swerling [2015-05-08T22:02:01+02:00]

Hard core newbie question here: when to use 'ref' when making objects.

Sorry this is long, if I understood it better I could write a shorter question :)

Anyway, when would you want this...

type
  Fruit* = ref object of RootObj
    origin*: string
    price*: Dollar

...instead of this:

type
  Fruit* = object of RootObj
    origin*: string
    price*: Dollar

As an experiment I took the examples in http://goran.krampe.se/2014/12/03/nim-seq/ and removed 'ref' from 'ref object' everywhere to see what would happen. It all seemed to work fine w/out 'ref'. (code for that page is zipped up, so I put it unzipped here and here.)

I RTFM'd and in the manual it says 'ref' (and 'ptr') are for many-to-one relationships. It's been a long time since I programmed in C but I get how in C you'd want the indirection for, eg. mutating a data structure (sorting, inserting, etc). But in nim, collection things like 'sort', 'map', 'filter', etc work fine w/out the indirection of 'ref', so got confused as to when you would need it.
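
For what it's worth, the observation about collections can be reproduced with a small sketch like the one below. It assumes a cut-down Fruit with an integer price (the Dollar type from the question is not shown anywhere, so the int field is only a stand-in) and uses sort, mapIt and filterIt from the standard library on a seq of plain objects:

```nim
import std/[algorithm, sequtils]

type
  Fruit = object          # plain value type, no `ref`
    origin: string
    price: int            # assumption: price in cents, standing in for Dollar

var fruits = @[
  Fruit(origin: "Spain", price: 300),
  Fruit(origin: "Chile", price: 100),
  Fruit(origin: "Italy", price: 200)
]

# sort receives the seq as a `var` parameter and reorders it in place
fruits.sort(proc (a, b: Fruit): int = cmp(a.price, b.price))

# mapIt and filterIt build new seqs; the elements are copied by value
let origins = fruits.mapIt(it.origin)
let cheap = fruits.filterIt(it.price < 300)

echo origins
echo cheap.len
```

The in-place mutation works because the seq itself is passed as a var parameter, so no extra indirection is needed on the element type.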

I looked also in Nim's source, but it was not obvious to me why 'ref object' was chosen over plain 'object' there. Eg. why does AsyncFile need to be a ref?

Stefan_Salewski [2015-05-08T22:34:34+02:00]

Well, one important fact when using pointers or references is, that it allows you to dynamically create arbitrary numbers of objects, using new() or similar for object creation. Without references, you may as well define a large number of objects, for example by using large arrays. But then there is always an upper bound. Using references and allocator functions like new or C alloc() you can generate objects in a loop and you may add all these objects to sequences, lists, trees or other large data structures -- and you are only limited by the RAM of your computer. Another benefit is of course the relation structure: You can have a few references referring to the same object (memory area) which may be useful, or can generate problems as well. Of course we have here many smart people who can answer your question much better. You may look into the Wikipedia article about references too: http://en.wikipedia.org/wiki/Reference_(computer_science)
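
A minimal sketch of objects created in a loop and chained together, assuming a hypothetical Node type that is not from this thread. A seq of plain objects can also grow at run time, so the part that really requires ref here is the self-referential next field:

```nim
type
  Node = ref object
    value: int
    next: Node            # a type can only refer to itself through ref (or ptr)

# Build a singly linked list by allocating a new node on each iteration.
var head: Node = nil
for i in 1 .. 5:
  head = Node(value: i, next: head)

# Walk the list and sum the values.
var total = 0
var cur = head
while cur != nil:
  total += cur.value
  cur = cur.next

echo total
```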

Sixte [2015-05-08T22:44:09+02:00]

A ref object (or tuple or another type) creates an object on the heap which is ref-counted. That means the pointer increments a reference counter connected with the object/type on the heap. Every other reference to the object increments the ref-counter further. When a reference goes out of scope, the reference becomes invalid and the ref-counter will be decremented. If a reference counter reaches zero, the object will be garbage-collected.

A ptr object is merely a pointer to the object and does not affect the ref-counter. E.g. in a binary tree, the pointers to the child nodes are usually references. The pointer back to the parent may be a simple pointer(*), because the parent is "kept" by another reference, e.g. the root of the binary tree. When a parent becomes invalid (no reference points to the parent any longer) the ref-counters of its nodes will be decremented. If their reference counters become zero too, they will be garbage-collected and the allocated memory will be available for reallocation.

(*) and this avoids a cyclic reference. The nim GC detects that, however (which is a challenging task...)

type
  rob = ref ob
  ob  = object
    key : int

var myobject = rob(key: 42) # object allocated and referenced on the heap
echo myobject.key # should reply 42
myobject = nil # now the object is no longer referenced and will be garbage-collected eventually

Jehan [2015-05-09T08:46:01+02:00]

The primary difference between a plain object and a reference is that assignment makes a copy of the original object. This makes having shared state harder (aside from objects that are declared in a local or global variable and are passed around exclusively using var parameters). It also creates significant overhead for assigning large objects (since the entire state has to be copied).

Another important difference is that plain objects and polymorphism do not mix well. Any time you assign an object of a subtype to a variable of a supertype, any extraneous state that only exists in the subtype has to be stripped.
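
The polymorphism point can be sketched with ref objects as follows. Fruit is modelled loosely on the type from the question, while Banana, its size field and the describe method are hypothetical names made up for this illustration:

```nim
type
  Fruit = ref object of RootObj
    origin: string
  Banana = ref object of Fruit
    size: int             # state that only exists in the subtype

method describe(f: Fruit): string {.base.} =
  "fruit from " & f.origin

method describe(b: Banana): string =
  "banana of size " & $b.size

let apple: Fruit = Fruit(origin: "Spain")
# Stored through the supertype: only the reference is converted,
# the Banana on the heap keeps its size field.
let banana: Fruit = Banana(origin: "Ecuador", size: 7)
let basket = @[apple, banana]

for f in basket:
  echo f.describe()       # dispatches on the runtime type
```

Because the seq holds references, each element keeps its full subtype state and the method call resolves at run time.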

Jehan [2015-05-09T08:54:50+02:00]

Sixte: Every other reference to the object increments the ref-counter further. When a reference goes out of scope, the reference becomes invalid and the ref-counter will be decremented. If a reference counter reaches zero, the object will be garbage-collected.

This is not strictly true. Leaving aside the fact that Nim has non-reference-counting garbage collectors, too, the reference counting collector only changes the reference count of an object when the reference is assigned to a location on the heap or in global memory. Assignments to local variables and passing a reference to or returning it from a procedure do not update the reference count. This is enormously cheaper than normal reference counting, because the bulk of reference count updates need not occur. The downside is that actual deallocation is deferred; when a reference count reaches zero, the object may still be referenced from a local variable, so it cannot be freed immediately. Rather, the system remembers the objects with a zero reference count in a so-called zero count table and processes that table at intervals, freeing only objects that have a reference count of zero and are not still referenced by a local variable.

jyapayne [2015-05-09T17:32:28+02:00]

What Jehan said can be illustrated in an example like so:


type
    rob = ref ob
    ob = object
        key: int

var ob1 = ob(key:42)
var ob2 = ob1 # makes a copy of ob1
ob2.key = 2

echo ob1.key # prints 42
echo ob2.key # prints 2


var rob1 = rob(key:42)
var rob2 = rob1 # rob2 now refers to the same object as rob1

rob2.key = 2

echo rob1.key # prints 2
echo rob2.key # prints 2
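
The same distinction shows up when passing things to procedures. A small sketch, reusing the ob/rob types from the example; the three bump procs are hypothetical helpers written only for this illustration:

```nim
type
  rob = ref ob
  ob = object
    key: int

proc bumpVar(o: var ob) =
  o.key += 1        # mutates the caller's object in place, no copy

proc bumpCopy(o: ob): ob =
  result = o        # plain parameters are immutable, so work on a copy
  result.key += 1

proc bumpRef(r: rob) =
  r.key += 1        # mutates the heap object that r refers to

var plain = ob(key: 1)
bumpVar(plain)                  # plain.key is now 2
let bumped = bumpCopy(plain)    # plain is untouched, bumped is a new value

let shared = rob(key: 1)
bumpRef(shared)                 # visible through every reference to the object

echo plain.key, " ", bumped.key, " ", shared.key
```

So a var parameter gives in-place mutation of a plain object for the duration of a call, while a ref keeps the sharing alive beyond it.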

suvendu [2018-08-24T12:43:51+02:00]

Good to know that both of the following type definitions of rob are equivalent:

type
  rob = ref ob
  ob = object
    key: int

type
  rob = ref object
    key: int

I was struggling to relate Nim's ref to a C pointer. My understanding now is that in C a pointer variable can be created by applying * to any type or struct (or class in C++), whereas in Nim a (pointer/ref) type needs to be created first in order to create a pointer object.
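
One small addition, offered as a sketch rather than the last word: a named ref type is a convenience, and `ref T` can also be written inline, which is a bit closer to the C habit of putting * on any type. Point is a hypothetical type used only for this example:

```nim
type
  Point = object
    x, y: int

# `ref int` is used directly, without declaring a named type first.
var p: ref int
new(p)            # allocate the int on the heap
p[] = 5           # [] dereferences, like *p in C
let q = p         # q and p refer to the same int
q[] += 1

# The same works for object construction with an inline ref type.
let r = (ref Point)(x: 1, y: 2)

echo p[], " ", r.x, " ", r.y
```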

PMunch [2018-08-24T14:51:23+02:00]

Something which might help a bit, be sure to read the comments as well: https://www.reddit.com/r/nim/comments/7dm3le/tutorial_for_types_having_a_hard_time/

kcvinu [2019-11-20T17:01:49+01:00]

@PMunch, thanks for this great reply.

enthus1ast [2019-11-20T17:56:19+01:00]

ref is also needed for async.

This does not work:

import asyncdispatch
type Foo = object
    num: int

proc dostuff(foo: var Foo): Future[void] {.async.} =
    foo.num = 0
    return

var foo = Foo()
waitFor foo.dostuff()

#Error: 'foo' is of type <var Foo> which cannot be captured as it would violate memory safety, declared here: c:\Users\david\projects\nimPlayground\t1207.nim(7, 14)

while this works:

import asyncdispatch
type Foo = ref object
    num: int

proc dostuff(foo: Foo): Future[void] {.async.} =
    foo.num = 0
    return

var foo = Foo()
waitFor foo.dostuff()

ElegantBeef [2021-08-25T23:03:24+02:00]

Well given your follow up statement you could very likely use a procedure with lent T to get the exact semantics you're using safely without exposing a mutable reference or copy. For instance:

var a = "SomeString"
proc getSomeString: lent string = a
assert getSomeString()[0].unsafeaddr == a[0].unsafeaddr
var b = a
assert b[0].addr != a[0].addr

alexeypetrushin [2021-08-25T23:05:59+02:00]

Generally always when we have "many to one" relations.

No. You only need ref if you have mutable "many to one" relation. If it's read only, you don't care if it's copy or ref, because it behaves the same. The only problem is poor performance caused by copying large objects, so you had to use ref, even if you don't need it in terms of behavior.

Araq [2021-08-26T07:56:06+02:00]

ref object would be almost never needed if Nim supported copy on write.

Nim does support "copy on write" via custom =hooks.

The only problem is poor performance caused by copying large objects (which copy-on-write may help solve), so you had to use ref, even if you don't need it in terms of behavior.

No, that is not the "only problem", try creating an immutable cyclic data structure without mutations. The problem is that it's immutable after a construction phase and modeling phase transitions via a type system is tricky and can quickly become more annoying than leaving the types in the mutable state.

An inspiring image about this approach, I want someone else to do the job (compiler or VM), not me

Yeah I'm not surprised. But if you don't want to program, why not switch jobs... After 50 years of intensive research programming is still hard work, esp if you need to be so good at it that somebody pays you to do it.

alexeypetrushin [2021-08-26T16:56:34+02:00]

Nim does support "copy on write" via custom =hooks.

You know what I meant, that it should be done automatically.

But if you don't want to program, why not switch jobs...

I don't want to do low-level programming. VM and compilers can do it. Humans have more interesting work to do.

boia01 [2021-08-27T02:38:22+02:00]

I meant - supported automatically by compiler or VM, not human written hooks.

It's not a perfect solution but you could use a CoW type to wrap your objects, you don't have to write hooks for every object. You just have to declare your objects as CoW[MyObject].

It's not compiler support but it doesn't involve much effort/complexity either.

mratsim [2021-08-31T12:35:52+02:00]

Couldn't render post #54057.

alexeypetrushin [2021-08-31T18:53:21+02:00]

You are saying that you can do better performance optimisation and micromanagement than the compiler could.

I think you are right, for now. But 1) I don't think the situation will be the same in the near future (see this for example). And 2) I don't think the majority of other people can do it or are willing to spend the effort doing it.

I think the practical outcome will be - some non-Nim person will try to convert some program to Nim. From Python/Java/JS and it's going to be slower than original. Because using non-ref for even a couple large objects could easily kill even C-level performance. And it's a very frequent use case.

And the quote from some bio-informatics person

alexeypetrushin [2021-08-31T19:22:29+02:00]

A possible compromise could be for the Nim compiler to measure the time spent copying objects and, if it sees that copying takes like >50% of CPU, print an optimisation hint like "maybe turn object A into ref A".

Araq [2021-09-01T12:20:11+02:00]

As a compromise, use object (and maybe more system.move in your codebase) and leave the language/compiler as it is.
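
A rough sketch of what the system.move suggestion could look like in practice, assuming a recent Nim version where move is available and a hypothetical Big object made up for this illustration:

```nim
type
  Big = object
    data: seq[int]

var a = Big(data: newSeq[int](1_000))

var b = a          # plain assignment: the seq inside is copied
var c = move(a)    # move: c takes over a's contents, a is reset to its default state

echo a.data.len    # a is still usable, but empty
echo b.data.len
echo c.data.len
```

The idea is that a plain object can be handed over without paying for a copy when the source is no longer needed, which covers some of the cases where ref would otherwise be chosen purely for performance.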

Mirror of forum.nim-lang.org

1207 :: when to use 'ref object' vs plain 'object'