I am a newcomer to Nim coming from Python (and some C), and am having trouble getting my head around when to use new vs. init, and when to return refs vs. objects. I am trying to convert some Python code to Nim to improve performance, and was hoping someone could help me understand where I'm going wrong. My problem arises in two functions. The first creates and returns tables of seqs, where the keys are substrings and the values are all strings containing that substring:
    proc get_buffer_level(buff: Buff, j,level: int): TableRef =
      var inv: string
      result = initTable[string,seq[string]]()
      for kmer in buff.levels[level].keys:
        inv = kmer[j+level .. k-level-1]
        if hasKey(result, inv):
          result[inv].add(kmer)
        else:
          result[inv] = @[kmer]
In the second, I want to use the tables from the first as follows:
    proc get_candidate_paths(filename: string, bf: object; rc=false): auto =
      var
        f_hand = open(filename)
        cands = newStringTable()
        line_no = 0
        read: string
        kmers: array[0..read_len-k+1, string]
        buff = get_empty_buff(j)
        backs = newTable[string,seq[string]]()
        fronts = newTable[string,seq[string]]()
      for line in f_hand.lines:
        if (line_no + 1) mod 10_000==0:
          echo($(line_no+1) & " " & $len(cands))
        read = strip(line)
        if rc:
          read = get_rc(read)
        get_kmers(read, k, kmers)
        init_read_buff(kmers[0], buff, bf)
        # print_buff_info(buff)
        for ind, value in @kmers:
          backs = get_buffer_level(buff,j,0)
          fronts = get_buffer_level(buff,j,j)
          ...
When I try to compile, I get:
    <my_file.nim>(246, 39) Info: template/generic instantiation from here
    <my_file.nim>(182, 37) Error: cannot instantiate: 'TableRef'
where the first line number refers to the call to the second function, and the second is the line "backs = get_buffer_level(buff,j,0)". I would appreciate any guidance on where I'm going wrong. Many thanks.
I managed to resolve some of the issues by looking at more examples - I wasn't as successful with the docs :(. If others are interested, here's the skeleton of the current compiling version:
    proc get_buffer_level(buff: Buff, j,level: int): TableRef[string,seq[string]] =
      ...
      result = newTable[string,seq[string]]()
      ...
    proc get_candidate_paths(filename: string, bf: object; rc=false): auto =
      var
        backs : ref Table[string,seq[string]]
        ...
      backs = get_buffer_level(buff,j,0)
      ...
I would still love to hear if I'm doing something clearly stupid...
The difference between Table[A,B] and TableRef[A,B] (and the init vs. new call to instantiate them) is that the former is a value type, while the latter is a reference type. Using Table[A,B] can save you a level of indirection if you embed the table in another data structure. It also means that assigning one Table[A,B] instance to another will actually copy the table, while assigning a TableRef[A,B] will not. This difference exists because, simply put, Nim is a systems programming language, where controlling the memory layout is often necessary.
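To make the copy-versus-alias distinction concrete, here is a minimal sketch. The variable names are made up for illustration, and it only assumes the standard tables module:

```nim
import tables

# Value type: assignment copies the whole table.
var a = initTable[string, int]()
a["x"] = 1
var b = a            # b is an independent copy of a
b["x"] = 2
echo a["x"]          # 1 (a is unaffected by the change to b)
echo b["x"]          # 2

# Reference type: assignment copies only the reference.
let r = newTable[string, int]()
r["x"] = 1
let r2 = r           # r and r2 refer to the same table
r2["x"] = 2
echo r["x"]          # 2 (the change is visible through both names)
```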
You do need to specify the type parameters when declaring an instance of either, or when calling initTable[A,B]() or newTable[A,B]() (which you seem to have figured out), since the compiler can't guess them. However, when calling a procedure on an existing table instance, the compiler can usually infer them, and you can omit them. I agree that the error message could be improved. :)
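As a rough illustration of where the type parameters are needed and where they are inferred, here is a small sketch. groupByPrefix is a hypothetical helper invented for this example, not the original get_buffer_level; it groups strings by their first n characters:

```nim
import tables

# Hypothetical helper: group words by their first `n` characters.
# The return type must spell out both type parameters.
proc groupByPrefix(words: openArray[string], n: int): TableRef[string, seq[string]] =
  # The constructor call also needs explicit type parameters.
  result = newTable[string, seq[string]]()
  for w in words:
    let key = w[0 ..< min(n, w.len)]
    # Calls on an existing table infer the types from `result`.
    result.mgetOrPut(key, newSeq[string]()).add(w)

let groups = groupByPrefix(["ACGT", "ACCA", "TTGA"], 2)
doAssert groups.len == 2
doAssert groups["AC"] == @["ACGT", "ACCA"]
doAssert groups["TT"] == @["TTGA"]
```

Using mgetOrPut here replaces the hasKey/else branch with a single lookup, which is a design choice rather than a requirement.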
Thanks, Jehan. Q: do copies only occur on assignments? When is this a performance hit in practice?
One other question, off this topic: Is there a set-like data structure that can be used to hold strings? I have been using a StringTable with the values set to nil, but it seems rather hack-ish. For one thing, it doesn't seem to be possible to remove keys I've previously inserted...
In general, copying occurs on assignment, not when you pass the value to a procedure (otherwise, even a simple membership test would be very expensive).
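A small sketch of the parameter-passing side, with made-up procedure names (hasKmer and bump are illustrative, not from the original code). A plain parameter is read-only inside the procedure, while a var parameter lets the procedure modify the caller's table in place:

```nim
import tables

# Plain parameter: read-only inside the proc, so a lookup does not
# need its own copy of the caller's table.
proc hasKmer(t: Table[string, int], kmer: string): bool =
  kmer in t

# `var` parameter: the proc modifies the caller's table in place.
proc bump(t: var Table[string, int], kmer: string) =
  inc t.mgetOrPut(kmer, 0)

var counts = initTable[string, int]()
counts.bump("ACGT")
counts.bump("ACGT")
doAssert counts.hasKmer("ACGT")
doAssert not counts.hasKmer("TTGA")
doAssert counts["ACGT"] == 2
```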
For sets, use the sets module. It defines a type HashSet[T] that you can instantiate with initSet[string]() for a set of strings. It has, among other things, the usual incl and excl operations for sets.
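For instance, a string set with insertion, membership tests and removal might look like the sketch below. One assumption to flag: it uses initHashSet, the constructor name in more recent Nim releases, where initSet is deprecated; on an older compiler, substitute initSet.

```nim
import sets

# Assumes a recent Nim; older versions spell this initSet[string]().
var seen = initHashSet[string]()
seen.incl("ACGT")
seen.incl("TTGA")
seen.incl("ACGT")          # already present, so the set is unchanged
doAssert seen.len == 2
doAssert "ACGT" in seen

seen.excl("ACGT")          # removal, which StringTable did not offer
doAssert "ACGT" notin seen
doAssert seen.len == 1
```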
The problem with the tables module is that there's a whole lot of code duplication going on (because essentially each TableRef procedure delegates to the matching Table procedure). The plan is to automate this away by having the compiler automatically dereference the argument, so that you can use either a Table or a TableRef instance (this is actually implemented, but requires the experimental switch to activate). Once this is fully done, getting rid of the duplicated code without creating some weird regressions may become problematic.
For this reason, sets don't have a ref variant (yet). Unfortunately, this means that you may need to work around that issue. Example:
    import sets
    {.experimental.}
    proc box*[T](x: T): ref T =
      new result
      result[] = x
    var s = box initSet[string]()
    var s2 = box initSet[string]()
    incl s, "foo"
    echo "foo" in s
    excl s, "foo"
    echo "foo" in s
    incl s2, "bar"
    let t = box(s + s2[])
    for x in t:
      echo x
The box procedure is a simple way to put an arbitrary value inside a ref. For the most part, with {.experimental.}, auto-dereferencing works normally. Only when multiple arguments need to be dereferenced (such as for the union operation) do you have to do it manually (and turn the result back into a ref).
Note also that often you may not want a reference. For example:
    type
      Filter* = ref object
        positives, negatives: HashSet[string]
Here, we already have two sets inside a ref object. Adding another level of indirection would not get us anything; it would actually incur overhead. You get a benefit out of references when you actually copy sets. Most of the time, sets are created once and then modified or queried in place. Remember that when you pass a set as an argument to a procedure, only a pointer will be passed internally (this is also why you can't change non-var arguments – namely so that you don't get aliasing problems).
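To sketch how such a type can be used without any extra ref around the sets: newFilter and accepts below are hypothetical helpers added for illustration, and initHashSet assumes a recent Nim (older versions use initSet).

```nim
import sets

type
  Filter = ref object
    positives, negatives: HashSet[string]

# Hypothetical constructor: both sets live directly inside the ref object.
proc newFilter(): Filter =
  Filter(positives: initHashSet[string](),
         negatives: initHashSet[string]())

# Hypothetical query: accepted if listed as positive and not as negative.
proc accepts(f: Filter, s: string): bool =
  s in f.positives and s notin f.negatives

let f = newFilter()
f.positives.incl("ACGT")
doAssert f.accepts("ACGT")

# Assigning the Filter copies only the reference, so the sets are
# shared and modified in place; no set is ever copied here.
let alias = f
alias.negatives.incl("ACGT")
doAssert not f.accepts("ACGT")
```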