nimforum mirror - Good resources on references

NewGuy (original) [2013-11-03T04:25:41+01:00]

I'll try my best, tell me how I do:

References are essentially garbage-collected pointers.

You see, in Python everything is stored on the heap, dynamically allocated and handled for you. There's no manual memory management. In lower-level languages, though, you have both the heap and the stack, and you have to manage your pointers yourself. So you're going to need to learn a bit about how memory works in these kinds of languages:

The Stack is where value types and pointers to heap data are stored. Because of locality (stack data sits close together and gets reused constantly, so it tends to stay in the CPU's cache), it's extremely fast, and therefore most programs are written to make use of the stack as much as possible.

However, stacks are not unlimited, and you can get overflows (prevented in some languages through the use of segmented stacks) when you put too much data on them. As well, the stack doesn't like to be dynamic.

These limitations mean that large data structures should be put on the heap, to prevent using up all the stack space. As well, dynamically sized objects (arrays for instance) are usually put on the heap, since the stack likes to know exactly the size everything should be at compile time.

This last limitation isn't "hard" though. C added run-time sized arrays (variable-length arrays) in C99, and languages like Ada offer them as well. I'm not sure if Nimrod does or not to be honest, but I don't think it does.

And as a convenience, the stack is managed for you, scope by scope (this is what C++ builds its RAII idiom on). When something goes out of scope, it's destroyed. It's super simple to reason about.
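Here's a rough C++ sketch of that scope-based cleanup, since the post doesn't show any code. The `Tracked` type and its `alive` counter are made up purely for illustration; they just let you watch an object appear and disappear.

```cpp
#include <cassert>

// Hypothetical type for illustration: counts how many instances are alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Returns the live count observed while the stack object is in scope.
int countInsideScope() {
    int seen = 0;
    {
        Tracked t;              // lives on the stack
        seen = Tracked::alive;  // t is alive here
    }                           // t is destroyed here, automatically
    assert(Tracked::alive == 0);
    return seen;
}
```

Nobody calls a destroy function anywhere; leaving the inner block is enough.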

EDIT: One last thing - the stack is copy-happy. When you assign one stack value to another, you get a copy, and the two are now independent of each other. If one changes, the other one couldn't care less!
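A minimal C++ sketch of that copy behaviour, assuming a small made-up `Point` value type:

```cpp
#include <cassert>

// Hypothetical value type for illustration.
struct Point { int x; int y; };

// Copies a stack value, changes the copy, and returns the original's x.
int originalAfterCopyChange() {
    Point a{1, 2};
    Point b = a;        // b is a full, independent copy of a
    b.x = 99;           // only b changes
    assert(b.x == 99);
    return a.x;         // a is untouched
}
```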

The Heap is where dynamically allocated memory and very large data are stored. It's also where you put persistent data structures that need to be shared and have no locality. This is the location with gigabytes of RAM ready for your program to eat up.

The problem with the heap is twofold. First, it's actually pretty hard to reason about the performance of the heap if you aren't programming for a specific platform. Modern computers are insanely complex, and the levels of caching, paging systems, etc. just make it a nightmare. Sometimes it's faster to use the heap than the stack due to locality, but sometimes it's an order of magnitude slower! Thus a lot of domains (game programming, for one) tend to be very, very finicky about using the heap in tight loops.

Second, it's sometimes hard to manage. In languages without GC (like C and C++), you have to use malloc, free, new, delete, etc. to allocate and free memory manually. Pointers should start out null (in C and C++ an uninitialized local pointer holds garbage until you set it). They're given a value when you point them to data, or create data (and point them to that data). Then when you don't want that data anymore, you can't just set the pointer to null or let it go out of scope, otherwise BOOM, you get a memory leak. You have to destroy it yourself, and if you destroy it too early and then try to get at that data, BOOM, segmentation fault.
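To make the manual dance concrete, here's a small C++ sketch of the create, use, destroy sequence. The function name is made up for illustration; `new` and `delete` are the standard C++ operators.

```cpp
#include <cassert>

// Allocates an int on the heap, reads it, frees it, and returns the value read.
int manualLifetime() {
    int* p = nullptr;    // start from a known "points at nothing" state
    p = new int(42);     // create data on the heap and point p at it
    int value = *p;      // read it while it is still alive
    delete p;            // destroy it yourself; forgetting this is a leak
    p = nullptr;         // a later mistake is now a null access, not a dangling read
    assert(p == nullptr);
    return value;
}
```

Move the `delete` above the read and you get the "destroyed it too early" bug; drop it entirely and you get the leak.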

This second problem isn't really existent here in Nimrod thanks to its garbage collection, but it's nice to know anyway - in case you need to use unsafe memory or wrap around C libraries.

Of course, it sounds like the heap is nothing but trouble, but manual memory management isn't that hard once you get used to it (except when projects get big and things like cycles occur) and the GC makes it super simple. And the speed isn't as good as the stack most of the time, but it's definitely fast enough. The ability to store dynamically-sized objects and large data structures with little to no fear of problems is what it does and it does it great.

The last thing, "persistent data structures that need to be shared and have no locality", sounds scary, but it's simple. When you leave the scope of a block, the objects created on the stack are destroyed. But the objects on the heap are not; they last until they're deleted (in Nimrod's case, when the GC deletes them for you). So you'll have them as long as you need them and no longer!
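A small C++ sketch of that difference in lifetime, using a made-up `Session` type with a live-object counter. In C++ the heap object has to be deleted by hand; in Nimrod the GC would do that part.

```cpp
#include <cassert>

// Hypothetical type for illustration: counts how many instances are alive.
struct Session {
    static int alive;
    Session() { ++alive; }
    ~Session() { --alive; }
};
int Session::alive = 0;

// Creates one stack object and one heap object; only the heap one survives.
Session* makeOnHeap() {
    Session local;               // stack object: gone when this function returns
    Session* kept = new Session; // heap object: survives the return
    assert(Session::alive == 2);
    return kept;                 // caller is now responsible for deleting it
}
```

After `makeOnHeap()` returns, `Session::alive` is 1: the stack object is gone and the heap object is still there until someone calls `delete` on it.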

A pointer is a memory address: the location where your data is stored. These trip a lot of new guys (hehe) up.

The pointer points to a location in memory, and your program uses the pointer to find the data in the heap and read it, or make the changes you want to it. You can get the data, manipulate the data, or destroy the data. As well, that location isn't tied to one variable. Anything you do to that data will be seen by every other pointer pointing to the same place. This is very useful when you need to share data amongst different locations.

For instance, when writing a game you may need each character to "see" the map. With pointers you can have them all see the same single map. You don't need to copy it for each character, you don't need to reconcile the changes each character makes to its own copy, and you don't waste memory!

EDIT: If you were to create a pointer to data, then make another variable equal to that pointer - it wouldn't contain the data it points to, just the address! They're dependent on each other, and if one variable changes the data at that address, both of them will see those changes!

This is one of the reasons pointers are so great - they use very little stack space to represent very large objects, and can prevent the need to have many copies of that object!
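Here's the game-map idea as a short C++ sketch. `Map` and `Character` are made-up types for illustration, and where the map itself lives doesn't matter for the sharing; the point is that each character stores only an address.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types for illustration.
struct Map { std::vector<int> tiles; };
struct Character { Map* map; };  // holds only the address, not a copy

// Two characters share one map; a change through one is seen by the other.
int sharedMapDemo() {
    Map world{std::vector<int>(3, 0)};  // one map with three empty tiles
    Character alice{&world};
    Character bob{&world};
    alice.map->tiles[1] = 7;            // alice changes the one shared map
    assert(alice.map == bob.map);       // both hold the same address
    return bob.map->tiles[1];           // bob sees alice's change
}
```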

As an analogy - Pointers are like... erm... the address to a house. You can use the address to find the house, and you can look inside and play with the house - even do some interior decorating. You can even break the thing down if you want (although that would leave other pointers to it invalid). But anything you do to that house is going to be seen by anyone else going to that address! And that address can be seen by as many other people as you give it to. And they can do the same, too!

Null pointers are pointers that point to nothing - An address that leads to the Sahara. There's no ******* house here, it's just sa- Segmentation Fault.

Invalid pointers are pointers to memory that either isn't yours, or at least isn't the right data you needed - An address to some random house. If it's yours - it's the wrong house, and you'll just cause some real problems for yourself as you turn your business home into a fluffy pink kitten factory before your boss comes to visit. If it's someone else's house - the operating system... I mean... police, are going to have a prob- Segmentation Fault.
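For the null case, the usual defence is a check before you follow the address. Here's a minimal C++ sketch with a made-up helper, `valueOr`. Note that there's no equivalent check for an invalid pointer: it looks like any other address, which is exactly why it's the nastier of the two.

```cpp
#include <cassert>

// Returns the pointed-to value, or a fallback when the pointer is null.
int valueOr(const int* p, int fallback) {
    if (p == nullptr) {
        return fallback;  // no house at this address, so don't go looking
    }
    return *p;            // safe: p points at real data
}

int nullCheckDemo() {
    int house = 5;
    assert(valueOr(&house, -1) == 5);  // valid address: we get the data
    return valueOr(nullptr, -1);       // null address: we get the fallback
}
```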

Like I said in the negatives of the heap, you have to make sure to properly manage them and keep track of everything. You need to keep pointers valid and delete the data once you no longer need it. But since Nimrod is garbage collected - you don't have to worry about anything but accessing null/invalid pointers.

References are simple. Don't let all this scare you too much. And if you need help understanding, you now know all the terms a lot better, so you at least have a bit of understanding to go Google better with (hopefully).

In Nimrod, references are just garbage collected pointers to objects on the heap. So when you have dynamic, large, or persistent data structures, you'd use new to create memory on the heap and assign that to a reference type! Nimrod will automagically take care of deleting the memory, and all you need to do is make sure you manage your nulls and invalid references.
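As a loose analogy only, here's what "a pointer that cleans up after itself" looks like in C++ using `std::shared_ptr`. This is plain reference counting, not Nimrod's GC, and the `Resource` type is made up for illustration, but the experience is similar: you never call delete.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type for illustration: counts how many instances are alive.
struct Resource {
    static int alive;
    Resource() { ++alive; }
    ~Resource() { --alive; }
};
int Resource::alive = 0;

// Returns how many Resources are alive after the last reference is dropped.
int refCountDemo() {
    std::shared_ptr<Resource> a = std::make_shared<Resource>();
    {
        std::shared_ptr<Resource> b = a;  // second reference, same object
        assert(a.use_count() == 2);
    }                                     // b is gone, the object is not
    assert(Resource::alive == 1);
    a.reset();                            // last reference dropped, object freed
    return Resource::alive;
}
```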

Some Gotchas about memory -

The stack is technically just memory like the heap, so you can have pointers to the data on it by taking its address. This can be dangerous though - as the stack naturally contains short-lived objects. If you go out of a scope and then try to deref the pointer - you can either get a segmentation fault or memory corruption. It's a valuable tool, though. For instance, it can also be used to prevent copying data when passing pretty large stack-based objects. I'm just saying to be careful.
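Here's the safe version of that trick as a C++ sketch: passing the address of a big stack object down into a function instead of copying it. The names `Big`, `sumThrough` and `stackPointerDemo` are made up for illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical "pretty large" stack-based object: 1024 ints.
using Big = std::array<int, 1024>;

// Takes the address instead of a copy; only valid while the caller's object is alive.
long sumThrough(const Big* data) {
    long total = 0;
    for (int v : *data) total += v;
    return total;
}

long stackPointerDemo() {
    Big values{};                      // lives on this function's stack, all zeros
    values[0] = 3;
    values[1] = 4;
    long total = sumThrough(&values);  // safe: values outlives the call
    assert(total == 7);
    return total;
    // Returning &values from here instead would be the dangerous case:
    // the pointer would outlive the object it points to.
}
```

The rule of thumb is that a pointer to stack data can go down the call chain safely, but shouldn't go up or get stored somewhere longer-lived.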

Cycles happen when one object holds a pointer to a second object while the second holds a pointer back to the first. You have a cyclic dependency. This means the reference counts never drop to zero, so smart pointers or Nimrod's deferred reference counting GC can never destroy the data on their own! Luckily, we have a cycle detector. It's slow and its speed depends on the size of the heap, but it does what it needs to do. If you turn it off for performance - beware.
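You can watch a cycle keep things alive with C++ smart pointers, which do reference counting but have no cycle detector. This is a sketch with a made-up `Node` type; the first function builds a cycle of owning links, the second uses a non-owning `std::weak_ptr` for the back link, which is the usual C++ workaround.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type for illustration: counts how many instances are alive.
struct Node {
    static int alive;
    std::shared_ptr<Node> strongOther;  // owning link
    std::weak_ptr<Node> weakOther;      // non-owning link
    Node() { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

// Builds a two-node cycle of owning links and returns how many nodes
// are still alive after the local handles have gone out of scope.
int strongCycleSurvivors() {
    std::weak_ptr<Node> watchA;  // lets us find a again without owning it
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strongOther = b;
        b->strongOther = a;      // cycle: a owns b, b owns a
        watchA = a;
        assert(Node::alive == 2);
    }                            // handles gone, but each count is stuck at 1
    int survivors = Node::alive;
    if (auto a = watchA.lock()) {
        a->strongOther.reset();  // break the cycle by hand so the demo doesn't leak
    }
    return survivors;
}

// Same shape, but the back link does not own, so nothing is kept alive.
int weakCycleSurvivors() {
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strongOther = b;
        b->weakOther = a;        // back link is weak: no ownership cycle
    }
    return Node::alive;
}
```

With the owning cycle, both nodes outlive every handle to them until the cycle is broken by hand. That stuck state is what a cycle detector exists to find.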


If you have any questions, feel free to ask! I tried my best, but it's late and writing long narratives in this tiny text box is a bit cumbersome.

Mirror of forum.nim-lang.org

294 :: Good resources on references