primarily focuses on thread-local (and garbage-collected) heaps and message passing between threads.
Is there any message-passing mechanism other than channels?
The source says that channels are currently slow. What is the best way to send objects (or refs to objects) between threads?
Slow is relative. It means that the way data is encoded when it is sent across a channel is slower than memcpy(), not that it is slow in the way that sending data over a network connection is (note that other languages like Dart, Erlang, or OCaml use the same basic model and work fine). Note that even in "shared memory" models on modern processors, you are still constantly shoveling data between registers or the L1 cache and DRAM (or at least the L3 cache) if you want multiple threads to use it. This is how, for example, so-called false sharing (i.e. when multiple cores contend for the same cache line) can make a parallel shared-memory application an order of magnitude slower than the sequential version. So sending over a channel is not something that you want to do in a tight inner loop, but then you don't want to use any concurrency primitive in a tight inner loop (a contended blocking mutex, for example, can easily devour thousands of clock cycles due to the context switch it induces).
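For reference, here is a minimal sketch of what sending an object over a channel looks like. The names chan, worker and sumFromWorker are made up for illustration, and it assumes a compiler with thread support enabled (--threads:on on older versions). The seq is copied on send, so the receiver ends up with its own copy on its own heap.

```nim
# Compile with: nim c --threads:on channel_demo.nim
# Channels and threads live in system, so no import is needed here.

var chan: Channel[seq[int]]   # hypothetical name; declared at module scope so both threads can see it

proc worker() {.thread.} =
  # The message is copied into the channel; the sender keeps its own data.
  chan.send(@[1, 2, 3])

proc sumFromWorker(): int =
  chan.open()
  var t: Thread[void]
  createThread(t, worker)
  let data = chan.recv()      # blocks until the worker has sent its message
  joinThread(t)
  chan.close()
  for x in data:
    result += x

let total = sumFromWorker()
echo total
```

The cost discussed above is paid once per send() call, so sending a few large messages is usually a better fit than sending many tiny ones.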
The spawn primitive also transfers its arguments to a separate thread (and its results back from it), but uses a deep-copy implementation rather than the channel encoding, which should give somewhat higher performance.
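A minimal sketch of spawn, assuming the threadpool module from the standard library is available in your Nim version (newer releases mark it as deprecated in favour of external packages). The proc name sumAll is hypothetical. The seq argument is handed over to the worker thread and the result comes back through a FlowVar.

```nim
# Compile with: nim c --threads:on spawn_demo.nim
import std/threadpool

proc sumAll(xs: seq[int]): int =
  # Runs on a thread-pool thread and works on its own copy of xs.
  for x in xs:
    result += x

let fv: FlowVar[int] = spawn sumAll(@[1, 2, 3])
let answer = ^fv    # ^ blocks until the result is available
echo answer
```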
If you deal with large amounts of shared data, the easiest way to use a global heap (and avoid copying altogether) is to switch to the Boehm GC (--gc:boehm), which, other than not being generational, is actually pretty good these days [1]. Under the Boehm GC, you can share references pretty much arbitrarily, but you will still have to sort out data races on the shared objects yourself (which obviously cannot occur with thread-local heaps).
Nim can in principle also work with the Go GC, though I am not sure how stable that interface is at the moment (given that both Nim and Go are undergoing some fairly rapid evolution).
You can also freely share ref-free data between threads, which is relevant for (say) operating on large matrices, and you can use shared memory as long as you are willing to put up with manual memory allocation (see allocShared() and friends).
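As a rough sketch of the manual shared-memory route: two threads fill disjoint halves of a buffer obtained from allocShared0(), so nothing is copied and no locking is needed. The Job type and the names fill and run are hypothetical and only serve the illustration; the buffer holds plain floats (ref-free data) and has to be freed by hand.

```nim
# Compile with: nim c --threads:on shared_demo.nim
const N = 8

type
  Job = object                        # hypothetical helper type for this sketch
    buf: ptr UncheckedArray[float]    # raw pointer into the shared heap
    lo, hi: int                       # half-open index range this thread owns

proc fill(j: Job) {.thread.} =
  # Each thread writes only to its own range, so there is no data race.
  for i in j.lo ..< j.hi:
    j.buf[i] = float(i) * 2.0

proc run(): float =
  # Zero-initialised memory on the shared heap, not managed by the GC.
  let buf = cast[ptr UncheckedArray[float]](allocShared0(N * sizeof(float)))
  var threads: array[2, Thread[Job]]
  createThread(threads[0], fill, Job(buf: buf, lo: 0, hi: N div 2))
  createThread(threads[1], fill, Job(buf: buf, lo: N div 2, hi: N))
  joinThreads(threads)
  for i in 0 ..< N:
    result += buf[i]
  deallocShared(buf)                  # manual deallocation is the price of sharing

let sharedTotal = run()
echo sharedTotal
```

If the threads had to write to overlapping ranges, a lock (or atomics) would be needed on top of this.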
More generally, if you want to write a multi-threaded program, you first need to figure out which programming model you wish to use and then how to structure your application to live on top of it. Nim's biggest shortcoming in this area at the moment is arguably a lack of high-level support for established message-passing models, such as actors; the problem is not that message passing is deficient, but that dealing with raw channels is cumbersome.
[1] The Boehm GC actually has support for generational collection, but enabling it requires some non-trivial effort. Its main downside is that it is a stop-the-world collector, which makes it unsuitable for applications that require low pause times but matters little for work that involves, say, number crunching.