nimforum mirror - Why do custom types need to be reference counted objects for dynamic dispatch to work.

def_pri_pub [2017-01-04T03:21:25+01:00]

Let's start with a code example. When I use ref object of ..., this code:

type
  Animal = ref object of RootObj
    name: string

method makeNoise(this: Animal) {.base.} =
  echo "..."


type
  Human = ref object of Animal
  Dog = ref object of Animal

method makeNoise(this: Human) =
  echo "Hi, I'm ", this.name

method makeNoise(this: Dog) =
  echo "*Bark!* [said ", this.name, "]"


let
  h = Human(name: "Kevin Bacon")
  d = Dog(name: "Fuzzy")

h.makeNoise()
d.makeNoise()

let a: Animal = Dog(name: "Fluffy")
a.makeNoise()

Produces this result:


Hi, I'm Kevin Bacon
*Bark!* [said Fuzzy]
*Bark!* [said Fluffy]

But when we take away that ref keyword from the type lines, the output becomes this:


Hi, I'm Kevin Bacon
*Bark!* [said Fuzzy]
...

This leads me to believe that my objects need to be reference counted to take advantage of dynamic dispatch. Is this true?

Varriount [2017-01-04T06:08:16+01:00]

Yes, objects need to be reference counted for methods to work. This is because only reference types can point to variable-length memory regions.

Take the below code:

type
  Animal = ref object of RootObj
    name: string
  
  Dog = ref object of Animal
    breed: string

method makeNoise(this: Animal) {.base.} =
  echo "Hi, I'm ", this.name

method makeNoise(this: Dog) =
  echo "*Bark!* [said ", this.name, "]"

These type definitions translate roughly to the following structures:

# TypeInfo is an object containing type information
# makeTypeInfo creates a TypeInfo object holding a type's information

type
  AnimalObjBase = object of RootObj
    typeInfo: ptr TypeInfo

  AnimalBase = ptr AnimalObjBase

  AnimalObj = object of RootObj
    typeInfo: ptr TypeInfo
    name: pointer

  Animal = ptr AnimalObj

  DogObj = object of RootObj
    typeInfo: ptr TypeInfo
    name: pointer
    breed: pointer

  Dog = ptr DogObj


const
  animalTypeInfo: TypeInfo = makeTypeInfo(AnimalObj)
  dogTypeInfo: TypeInfo = makeTypeInfo(DogObj)


proc makeNoise_Animal(this: Animal) =
  echo "Hi, I'm ", this.name

proc makeNoise_Dog(this: Dog) =
  echo "*Bark!* [said ", this.name, "]"

proc makeNoise(this: AnimalBase) =
  # Compare the stored type pointer against each known type's info
  if this.typeInfo == addr animalTypeInfo:
    makeNoise_Animal(cast[Animal](this))
  elif this.typeInfo == addr dogTypeInfo:
    makeNoise_Dog(cast[Dog](this))

(Note that this isn't exactly valid code, nor is it precisely how methods are implemented)

Note that 'AnimalObjBase', 'AnimalObj', and 'DogObj' all share common fields, 'typeInfo' for all three, and 'name' for the latter two. This means that, given a region of memory holding data from one of these three types, we will always be able to access the 'typeInfo' field, and given a region of memory holding data from AnimalObj or DogObj, we can access the 'name' field (this field-sharing is the basis for subtyping).


+---------------+   +---------------+   +---------------+
| AnimalObjBase |   | AnimalObj     |   | DogObj        |
+---------------+   +---------------+   +---------------+
| typeInfo      |   | typeInfo      |   | typeInfo      |
+---------------+   +---------------+   +---------------+
                    | name          |   | name          |
                    +---------------+   +---------------+
                                        | breed         |
                                        +---------------+

The typeInfo field is used to mark these regions of memory. As long as every AnimalObj's 'typeInfo' member points to 'animalTypeInfo' and every DogObj's 'typeInfo' member points to 'dogTypeInfo', we can reinterpret (cast) these regions of memory to their appropriate types, and pass them into their corresponding procedures/methods.
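The tag-then-cast idea described above can be tried out in a small runnable sketch. This is only an illustration, not how Nim implements methods: TypeTag, tagAnimal, tagDog, noiseAnimal and noiseDog are hypothetical names, an enum tag stands in for the 'typeInfo' pointer, and the casts assume that all three objects share the same leading field layout.

```nim
# Illustration only: hand-rolled dispatch on a type tag.
# All names here are hypothetical; the casts rely on the three
# object types starting with the same fields in the same order.
type
  TypeTag = enum
    tagAnimal, tagDog

  AnimalObjBase = object
    tag: TypeTag

  AnimalObj = object
    tag: TypeTag
    name: string

  DogObj = object
    tag: TypeTag
    name: string
    breed: string

proc noiseAnimal(this: ptr AnimalObj): string =
  "Hi, I'm " & this.name

proc noiseDog(this: ptr DogObj): string =
  "*Bark!* [said " & this.name & "]"

proc makeNoise(this: ptr AnimalObjBase): string =
  # Read the tag first, then reinterpret the memory as the matching type.
  case this.tag
  of tagAnimal: noiseAnimal(cast[ptr AnimalObj](this))
  of tagDog: noiseDog(cast[ptr DogObj](this))

var animal = AnimalObj(tag: tagAnimal, name: "Unknown")
var dog = DogObj(tag: tagDog, name: "Spot", breed: "Poodle")

echo makeNoise(cast[ptr AnimalObjBase](addr animal))
echo makeNoise(cast[ptr AnimalObjBase](addr dog))
```

Because makeNoise only ever receives a pointer, the full AnimalObj or DogObj data stays where it was created, and the tag says which cast is safe.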

Now let's look at how objects are stored in memory. In contrast to references, which are pointers that always point to heap-allocated memory, object data may be located either in the heap or the stack. It's this latter case that reveals why methods won't work on object types.

Say we create Animal and Dog variables in a main procedure, then pass those variables into a procedure which calls the 'makeNoise' method:

method makeNoise(this: AnimalBase)

proc makeLotsOfNoise(someAnimal: Animal) =
  makeNoise(someAnimal)
  makeNoise(someAnimal)
  makeNoise(someAnimal)

proc main =
  var animal = Animal(name: "Unknown")
  var dog = Dog(name: "Spot", breed: "Poodle")
  
  makeLotsOfNoise(animal)
  makeLotsOfNoise(dog)

main()

When 'main' is called, after the variables are created, the stack holds two references that point to regions of heap memory:


main():
  animal: 8 byte pointer -> 16 byte heap memory region
  dog:    8 byte pointer -> 24 byte heap memory region

And when makeLotsOfNoise is called, the stack layout looks something like this:


main():
  animal: 8 byte pointer -> 16 byte heap memory region
  dog:    8 byte pointer -> 24 byte heap memory region
  makeLotsOfNoise(someAnimal = animal):
    someAnimal: 8 byte pointer -> 16 byte heap memory region
    makeNoise(this = someAnimal):
      this: 8 byte pointer -> 16 byte heap memory region
      ...
  makeLotsOfNoise(someAnimal = dog):
    someAnimal: 8 byte pointer -> 24 byte heap memory region
    makeNoise(this = someAnimal):
      this: 8 byte pointer -> 24 byte heap memory region
      ...

Make note of the size of the parameter passed into 'makeLotsOfNoise' - it's always an 8 byte pointer. This is a constraint of how procedure calls work, as the size of the parameters usually needs to be known ahead of time. Furthermore, the semantics of procedure calls must allow for the possibility (even if optimization decides otherwise) that parameter data is copied from the previous procedure frame to the current procedure frame.
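The size argument can be checked directly with sizeof. The sketch below is an illustration under assumed type names (AnimalObj, DogObj and their ref aliases are declared here only for the demonstration); the exact byte counts depend on the platform and memory management mode, so none are claimed.

```nim
# Illustration only: value objects grow with each subtype,
# while references to them all have the size of one pointer.
type
  AnimalObj = object of RootObj
    name: string
  DogObj = object of AnimalObj
    breed: string
  Animal = ref AnimalObj
  Dog = ref DogObj

echo "AnimalObj: ", sizeof(AnimalObj), " bytes"
echo "DogObj:    ", sizeof(DogObj), " bytes"
echo "Animal:    ", sizeof(Animal), " bytes"
echo "Dog:       ", sizeof(Dog), " bytes"
```

The two ref types report the same size, which is what lets a procedure declared for Animal accept a Dog without knowing how large the underlying object is.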

Now observe what happens if we were allowed to use objects instead. Our code becomes:

method makeNoise(this: AnimalObjBase)

proc makeLotsOfNoise(someAnimal: AnimalObj) =
  makeNoise(someAnimal)
  makeNoise(someAnimal)
  makeNoise(someAnimal)

proc main =
  var animal = AnimalObj(name: "Unknown")
  var dog = DogObj(name: "Spot", breed: "Poodle")
  
  makeLotsOfNoise(animal)
  makeLotsOfNoise(dog)

main()

And our stack looks like this:


main():
  animal: 16 byte stack memory region
  dog:    24 byte stack memory region
  makeLotsOfNoise(someAnimal = animal):
    someAnimal: 16 byte memory region
    makeNoise(this = someAnimal):
      this: 8 byte memory region
      ...
  makeLotsOfNoise(someAnimal = dog):
    someAnimal: 16 byte memory region
    makeNoise(this = someAnimal):
      this: 8 byte memory region
      ...

Notice that, because parameter data is copied from frame to frame, the region containing the 'Dog' data was truncated from 24 bytes to 16 in 'makeLotsOfNoise', and then to 8 in 'makeNoise'! This would obviously lead to problems - what happens when makeNoise dispatches to the Animal and Dog methods, and the name/breed fields are accessed? We would get garbage, as the program tries to read from the wrong areas of the stack.

While there are workarounds for this (the one that comes to my mind is passing a pointer to the stack data*, instead of copying it around), they all come with additional costs/caveats, or make parameter passing semantics even more complex than they already are.

Disclaimers:

*This is actually already done, except if certain pragmas are used (which the semantics still have to accommodate)

Yes, I know about alignments and how the stack would actually be laid out. The above stack diagrams are meant to illustrate the point, not the reality.

All the above implementation details are subject to change. For all I know type information could be passed as a hidden parameter in the future (or maybe it already is).

def_pri_pub [2017-01-04T06:37:22+01:00]

Wow. Thanks for that post. I think this belongs in the docs somewhere or on a wiki.

I came across this issue when I changed the prototype of a base method but forgot to change it in one of the child objects. So for that specific child object it was using the base method. Would using the base pragma have the Nim compiler fail if the child prototypes didn't match the parent?

Jehan [2017-01-05T16:46:54+01:00]

Note that ref stands for "reference", not "reference counting". The behavior will not differ between the reference counting and the mark and sweep GC.

You also do not strictly require ref for polymorphism to work, though this is the most common use case; any kind of pointer (ref, ptr, var, or pass-by-reference for value arguments) will work.

Example:

type
  animal = object of RootObj
  dog = object of animal
  cat = object of animal

method say(self: animal) {.base.} = discard
method say(self: dog) = echo "woof!"
method say(self: cat) = echo "meow?"

proc make_noise(a: var animal) =
  a.say; a.say; a.say

proc main =
  var d: dog
  var c: cat
  d.make_noise
  c.make_noise

main()

The reason why it doesn't work without pointers is that variables that aren't references (or somesuch) cannot themselves handle polymorphic types and will be coerced to the supertype upon assignment by hacking off any extraneous fields at the end of the subtype and changing the type field. Otherwise, it may be possible that method calls would try to access fields that do not exist in memory.
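The remark that any kind of pointer will do can be sketched with ptr alongside var. This is an illustration under assumptions, not part of the original post: viaVar and viaPtr are hypothetical helper names, and the methods return strings instead of echoing so the result is easy to inspect.

```nim
# Illustration only: dynamic dispatch on plain (non-ref) objects,
# reached once through a var parameter and once through a ptr.
type
  Animal = object of RootObj
  Dog = object of Animal

method say(self: Animal): string {.base.} = "..."
method say(self: Dog): string = "woof!"

proc viaVar(a: var Animal): string =
  a.say()

proc viaPtr(a: ptr Animal): string =
  a[].say()

var d = Dog()
echo viaVar(d)       # the Dog is passed by reference, not copied
echo viaPtr(addr d)  # ptr Dog is accepted where ptr Animal is expected
```

In both calls the Dog stays in place in memory and only an address travels, so the type field still says Dog and both lines should dispatch to the Dog method.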

Krux02 [2017-01-06T15:59:15+01:00]

In function parameters, you don't need the var keyword. Parameters are passed by immutable reference by default; you only need the var declaration when you want to change the argument in the function.

type
  animal = object of RootObj
  dog = object of animal
  cat = object of animal

method say(self: animal) {.base.} = discard
method say(self: dog) = echo "woof!"
method say(self: cat) = echo "meow?"

proc make_noise(a: animal) =
  a.say; a.say; a.say

proc main =
  let d = dog()
  let c = cat()
  d.make_noise
  c.make_noise

main()

Jehan [2017-01-06T16:38:39+01:00]

Krux02: In function parameters, you don't need the var keyword. Parameters are passed by immutable reference by default; you only need the var declaration when you want to change the argument in the function.

This is currently not specified, unless you use {.bycopy.} or {.byref.}. Value parameters can either be passed by value or by reference. If you specify {.bycopy.} for each of the object types above, you will actually run into a bug:

{.pragma: byX, bycopy.}

type
  animal {.byX.} = object of RootObj
  dog {.byX.} = object of animal
    name: string
  cat {.byX.} = object of animal
    name: string

method say(self: animal) {.base.} = discard
method say(self: dog) = echo self.name, ": woof!"
method say(self: cat) = echo self.name, ": meow?"

proc make_noise(a: animal) =
  a.say; a.say; a.say

proc main =
  var d: dog = dog(name: "Snoopy")
  var c: cat = cat(name: "Garfield")
  d.make_noise
  c.make_noise

main()

Krux02 [2017-01-06T17:54:25+01:00]

I know that parameters can also be passed by value, when the compiler decides to do so. I just didn't mention it, because semantically it is the same whether you have a copy of an object that you can't modify, or a reference to the original object that you can't modify either. And since it did work without problems, I left out the detail that sometimes parameters are passed by value. I didn't look it up, I just assumed that all types with inheritance are always passed by reference, because those types are meant to be used in a polymorphic context, and pass by value would not allow the function to be used in a polymorphic way.

So I think technically you are right with "This is currently not specified", but I highly doubt that this behaviour might change in the future, because it just works too well. I think this pass by value and pass by reference should be documented more.

I don't think your example is a bug, I think it is a very well written example of how not to use the bycopy pragma, because it destroys the polymorphic attributes of polymorphic types.

EDIT: I just realized my message reads a bit offensive. Sorry for that, I like your last post, I just don't agree with your message.

Jehan [2017-01-06T19:22:07+01:00]

Krux02: I don't think your example is a bug, I think it is a very well written example of how not to use the bycopy pragma, because it destroys the polymorphic attributes of polymorphic types.

It is a bug, because it breaks memory safety. If you use ints instead of strings, you'll see that random values are essentially pulled from stack frames.

Krux02 [2017-01-06T21:17:40+01:00]

I don't think arguing about what counts as a bug is leading anywhere here. I think the compiler could at least throw a warning here, if not an error. But we can agree that the default parameter passing works very well.

Mirror of forum.nim-lang.org

2698 :: Why do custom types need to be reference counted objects for dynamic dispatch to work.