Let's have a code example. When I'm using ref object of ... This code:
type
  Animal = ref object of RootObj
    name: string

method makeNoise(this: Animal) {.base.} =
  echo "..."

type
  Human = ref object of Animal
  Dog = ref object of Animal

method makeNoise(this: Human) =
  echo "Hi, I'm ", this.name

method makeNoise(this: Dog) =
  echo "*Bark!* [said ", this.name, "]"

let
  h = Human(name: "Kevin Bacon")
  d = Dog(name: "Fuzzy")

h.makeNoise()
d.makeNoise()

let a: Animal = Dog(name: "Fluffy")
a.makeNoise()
Produces this result:
Hi, I'm Kevin Bacon
*Bark!* [said Fuzzy]
*Bark!* [said Fluffy]
But when we take away that ref keyword from the type lines, the output becomes this:
Hi, I'm Kevin Bacon
*Bark!* [said Fuzzy]
...
This leads me to believe that my objects need to be reference counted to take advantage of dynamic dispatch. Is this true?
Yes, objects need to be reference counted for methods to work. This is because only reference types can point to variable-length memory regions.
Take the below code:
type
  Animal = ref object of RootObj
    name: string
  Dog = ref object of Animal
    breed: string

method makeNoise(this: Animal) =
  echo "Hi, I'm ", this.name

method makeNoise(this: Dog) =
  echo "*Bark!* [said ", this.name, "]"
These type definitions translate roughly to the equivalent structures:
# TypeInfo is an object containing type information
# makeTypeInfo creates a TypeInfo object holding a type's information
type
  AnimalObjBase = object of RootObj
    typeInfo: ptr TypeInfo
  AnimalBase = ptr AnimalObjBase

  AnimalObj = object of RootObj
    typeInfo: ptr TypeInfo
    name: pointer
  Animal = ptr AnimalObj

  DogObj = object of RootObj
    typeInfo: ptr TypeInfo
    name: pointer
    breed: pointer
  Dog = ptr DogObj

const
  animalTypeInfo: TypeInfo = makeTypeInfo(AnimalObj)
  dogTypeInfo: TypeInfo = makeTypeInfo(DogObj)

proc makeNoise_Animal(this: Animal) =
  echo "Hi, I'm ", this.name

proc makeNoise_Dog(this: Dog) =
  echo "*Bark!* [said ", this.name, "]"

# The dispatcher inspects the type marker and casts to the matching type
proc makeNoise(this: AnimalBase) =
  if this.typeInfo == animalTypeInfo:
    makeNoise_Animal(cast[Animal](this))
  elif this.typeInfo == dogTypeInfo:
    makeNoise_Dog(cast[Dog](this))
(Note that this isn't exactly valid code, nor is it precisely how methods are implemented)
Note that 'AnimalObjBase', 'AnimalObj', and 'DogObj' all share common fields, 'typeInfo' for all three, and 'name' for the latter two. This means that, given a region of memory holding data from one of these three types, we will always be able to access the 'typeInfo' field, and given a region of memory holding data from AnimalObj or DogObj, we can access the 'name' field (this field-sharing is the basis for subtyping).
+---------------+    +---------------+    +---------------+
| AnimalObjBase |    |   AnimalObj   |    |    DogObj     |
+---------------+    +---------------+    +---------------+
|   typeInfo    |    |   typeInfo    |    |   typeInfo    |
+---------------+    +---------------+    +---------------+
                     |     name      |    |     name      |
                     +---------------+    +---------------+
                                          |     breed     |
                                          +---------------+
The typeInfo field is used to mark these regions of memory. As long as every AnimalObj's 'typeInfo' member points to 'animalTypeInfo' and every DogObj's 'typeInfo' member points to 'dogTypeInfo', we can reinterpret (cast) these regions of memory to their appropriate types, and pass them into their corresponding procedures/methods.
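The marker-and-cast idea above can be tried out in a small runnable form. This is only a sketch of the principle, not how the compiler implements methods: the names TypeTag, tagAnimal, tagDog, noiseAnimal and noiseDog are made up for illustration, an enum stands in for the typeInfo pointer, and it assumes both objects lay out their shared leading fields identically (which holds for plain objects on the C backend).

```nim
type
  TypeTag = enum            # hypothetical stand-in for the typeInfo pointer
    tagAnimal, tagDog
  AnimalObj = object
    tag: TypeTag
    name: string
  DogObj = object           # same leading fields as AnimalObj, plus breed
    tag: TypeTag
    name: string
    breed: string

proc noiseAnimal(this: ptr AnimalObj): string =
  "Hi, I'm " & this.name

proc noiseDog(this: ptr DogObj): string =
  "*Bark!* [said " & this.name & " the " & this.breed & "]"

# Hand-written dispatcher: read the marker, then reinterpret the memory
proc makeNoise(this: ptr AnimalObj): string =
  case this.tag
  of tagAnimal: result = noiseAnimal(this)
  of tagDog: result = noiseDog(cast[ptr DogObj](this))

var animal = AnimalObj(tag: tagAnimal, name: "Unknown")
var dog = DogObj(tag: tagDog, name: "Spot", breed: "Poodle")

echo makeNoise(addr animal)
echo makeNoise(cast[ptr AnimalObj](addr dog))
```

The dispatcher only ever receives a pointer, so it never needs to know how large the pointed-to region is until it has read the marker.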
Now let's look at how objects are stored in memory. In contrast to references, which are pointers that always point to heap-allocated memory, object data may be located either in the heap or on the stack. It's this latter case that reveals why methods won't work on object types.
Say we create Animal and Dog variables in a main procedure, then pass those variables into a procedure which calls the 'makeNoise' method:
method makeNoise(this: AnimalBase)

proc makeLotsOfNoise(someAnimal: Animal) =
  makeNoise(someAnimal)
  makeNoise(someAnimal)
  makeNoise(someAnimal)

proc main =
  var animal = Animal(name: "Unknown")
  var dog = Dog(name: "Spot", breed: "Poodle")
  makeLotsOfNoise(animal)
  makeLotsOfNoise(dog)

main()
When 'main' is called, after the variables are created, the stack holds two references that point to regions of heap memory:
main():
  animal: 8 byte pointer -> 16 byte heap memory region
  dog: 8 byte pointer -> 24 byte heap memory region
And when makeLotsOfNoise is called, the stack layout looks something like this:
main():
  animal: 8 byte pointer -> 16 byte heap memory region
  dog: 8 byte pointer -> 24 byte heap memory region
  makeLotsOfNoise(someAnimal = animal):
    someAnimal: 8 byte pointer -> 16 byte heap memory region
    makeNoise(this = someAnimal):
      this: 8 byte pointer -> 16 byte heap memory region
      ...
  makeLotsOfNoise(someAnimal = dog):
    someAnimal: 8 byte pointer -> 24 byte heap memory region
    makeNoise(this = someAnimal):
      this: 8 byte pointer -> 24 byte heap memory region
      ...
Make note of the size of the parameter passed into 'makeLotsOfNoise' - it's always an 8 byte pointer. This is a constraint of how procedure calls work, as the size of the parameters usually needs to be known ahead of time. Furthermore, the semantics of procedure calls must allow for the possibility (even if optimization decides otherwise) for parameter data to be copied from the previous procedure frame to the current procedure frame.
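The size difference is easy to check with sizeof. The sketch below declares the object and ref types separately (the names AnimalObj and DogObj mirror the ones used in this post) and prints their sizes; the exact object sizes depend on the platform and on how the compiler represents strings, so no particular numbers are assumed here.

```nim
type
  AnimalObj = object of RootObj
    name: string
  DogObj = object of AnimalObj
    breed: string
  Animal = ref AnimalObj
  Dog = ref DogObj

# References are always pointer-sized, whatever they point to
echo "Animal (ref): ", sizeof(Animal), " bytes"
echo "Dog (ref):    ", sizeof(Dog), " bytes"

# The objects themselves grow with every field a subtype adds
echo "AnimalObj:    ", sizeof(AnimalObj), " bytes"
echo "DogObj:       ", sizeof(DogObj), " bytes"
```

A parameter declared as Animal therefore has one fixed size that fits every subtype, while a parameter declared as AnimalObj only has room for the AnimalObj fields.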
Now observe what happens if we were allowed to use objects instead. Our code becomes:
method makeNoise(this: AnimalObjBase)

proc makeLotsOfNoise(someAnimal: AnimalObj) =
  makeNoise(someAnimal)
  makeNoise(someAnimal)
  makeNoise(someAnimal)

proc main =
  var animal = AnimalObj(name: "Unknown")
  var dog = DogObj(name: "Spot", breed: "Poodle")
  makeLotsOfNoise(animal)
  makeLotsOfNoise(dog)

main()
And our stack looks like this:
main():
  animal: 16 byte stack memory region
  dog: 24 byte stack memory region
  makeLotsOfNoise(someAnimal = animal):
    someAnimal: 16 byte memory region
    makeNoise(this = someAnimal):
      this: 8 byte memory region
      ...
  makeLotsOfNoise(someAnimal = dog):
    someAnimal: 16 byte memory region
    makeNoise(this = someAnimal):
      this: 8 byte memory region
      ...
Notice that, because parameter data is copied from frame to frame, the region containing the 'Dog' data was truncated from 24 to 8 bytes! This would obviously lead to problems - what happens when makeNoise dispatches to the Animal and Dog methods, and the name/breed fields are accessed? We would get garbage, as the program tries to read from wrong areas of the stack.
While there are workarounds for this (the one that comes to my mind is passing a pointer to the stack data*, instead of copying it around), they all come with additional costs/caveats, or make parameter passing semantics even more complex than they already are.
Wow. Thanks for that post. I think this belongs in the docs somewhere or on a wiki.
I came across this issue when I changed the prototype of a base method but forgot to change one of the child objects. So for that specific child object it was using the base method. Would using the base pragma have the Nim compiler fail if the child prototypes didn't match the parent?
Note that ref stands for "reference", not "reference counting". The behavior will not differ between the reference counting and the mark and sweep GC.
You also do not strictly require ref for polymorphism to work, though this is the most common use case; any kind of pointer (ref, ptr, var, or pass-by-reference for value arguments) will work.
Example:
type
  animal = object of RootObj
  dog = object of animal
  cat = object of animal

method say(self: animal) = discard
method say(self: dog) = echo "woof!"
method say(self: cat) = echo "meow?"

proc make_noise(a: var animal) =
  a.say; a.say; a.say

proc main =
  var d: dog
  var c: cat
  d.make_noise
  c.make_noise

main()
The reason why it doesn't work without pointers is that variables that aren't references (or somesuch) cannot themselves handle polymorphic types and will be coerced to the supertype upon assignment by hacking off any extraneous fields at the end of the subtype and changing the type field. Otherwise, it may be possible that method calls would try to access fields that do not exist in memory.
in function parameters, you don't need var keyword. Parameters are passed by immutable reference by default, only when you want to change the argument in the function you need the var declaration.
type
  animal = object of RootObj
  dog = object of animal
  cat = object of animal

method say(self: animal) = discard
method say(self: dog) = echo "woof!"
method say(self: cat) = echo "meow?"

proc make_noise(a: animal) =
  a.say; a.say; a.say

proc main =
  let d = dog()
  let c = cat()
  d.make_noise
  c.make_noise

main()
Krux02: in function parameters, you don't need var keyword. Parameters are passed by immutable reference by default, only when you want to change the argument in the function you need the var declaration.
This is currently not specified, unless you use {.bycopy.} or {.byref.}. Value parameters can either be passed by value or by reference. If you specify {.bycopy.} for each of the object types above, you will actually run into a bug:
{.pragma: byX, bycopy.}

type
  animal {.byX.} = object of RootObj
  dog {.byX.} = object of animal
    name: string
  cat {.byX.} = object of animal
    name: string

method say(self: animal) = discard
method say(self: dog) = echo self.name, ": woof!"
method say(self: cat) = echo self.name, ": meow?"

proc make_noise(a: animal) =
  a.say; a.say; a.say

proc main =
  var d: dog = dog(name: "Snoopy")
  var c: cat = cat(name: "Garfield")
  d.make_noise
  c.make_noise

main()
I know that parameters can also be passed by value, when the compiler decides to do so. I just didn't mention it, because semantically it is the same whether you have a copy of an object that you can't modify, or a reference to the original object that you can't modify either. And since it did work without problems I left out the detail that sometimes parameters are passed by value. I didn't look it up, I just assumed that all types with inheritance are always passed by reference, because those types are meant to be used in a polymorphic context, and pass by value would not allow the function to be used in a polymorphic way.
So I think technically you are right with "This is currently not specified", but I highly doubt that this behaviour might change in the future, because it just works too well. I think this pass by value and pass by reference should be documented more.
I don't think your example is a bug, I think it is a very well written example of how not to use the bycopy pragma, because it destroys the polymorphic attributes of polymorphic types.
EDIT: I just realized my message reads a bit offensive. Sorry for that, I like your last post, I just don't agree with your message.
Krux02: I don't think your example is a bug, I think it is a very well written example of how to not use the bycopy pragma, because it destroys the polymorphic attributes of polymorphic types.
It is a bug, because it breaks memory safety. If you use ints instead of strings, you'll see that random values are essentially pulled from stack frames.