nimforum mirror - Noob question - table of seqs, new vs. init...

rozovr [2015-05-17T10:24:36+02:00]

I am a newcomer to Nim coming from Python (& some C), and am having trouble getting my head around when to use new vs. init, and when to return refs vs. objects. I am trying to convert some Python code to Nim to improve performance, and was hoping someone could help me understand where I'm going wrong. My problem arises in two functions. The first creates & returns tables of seqs, where the keys are substrings and the values are all the strings containing that substring:


proc get_buffer_level(buff: Buff, j,level: int): TableRef =
    var inv: string
    result = initTable[string,seq[string]]()
    for kmer in buff.levels[level].keys:
        inv = kmer[j+level .. k-level-1]
        
        if hasKey(result, inv):
            result[inv].add(kmer)
        else:
            result[inv] = @[kmer]

In the second, I want to use the tables from the first as follows:


proc get_candidate_paths(filename: string, bf: object; rc=false): auto =
    var
        f_hand = open(filename)
        cands = newStringTable()
        line_no = 0
        read: string
        kmers: array[0..read_len-k+1, string]
        buff = get_empty_buff(j)
        backs = newTable[string,seq[string]]()
        fronts = newTable[string,seq[string]]()
    
    for line in f_hand.lines:
        if (line_no + 1) mod 10_000==0:
            echo($(line_no+1) & " " & $len(cands))
        read = strip(line)
        if rc:
            read = get_rc(read)
        get_kmers(read, k, kmers)
        init_read_buff(kmers[0], buff, bf)
        # print_buff_info(buff)
        for ind, value in @kmers:
            backs = get_buffer_level(buff,j,0)
            fronts = get_buffer_level(buff,j,j)
...

When I try to compile, I get:

<my_file.nim>(246, 39) Info: template/generic instantiation from here
<my_file.nim>(182, 37) Error: cannot instantiate: 'TableRef'

where the first line number refers to the call to the second function, and the second refers to the line "backs = get_buffer_level(buff,j,0)". I would appreciate any guidance on where I'm going wrong. Many thanks

rozovr [2015-05-17T15:46:22+02:00]

I managed to resolve some of the issues by looking at more examples - I wasn't as successful with the docs :(. If others are interested, here's the skeleton of the current compiling version:


proc get_buffer_level(buff: Buff, j,level: int): TableRef[string,seq[string]] =
    ...
    result = newTable[string,seq[string]]()
    ...

proc get_candidate_paths(filename: string, bf: object; rc=false): auto =
    var
        backs : ref Table[string,seq[string]]
    ...
    backs = get_buffer_level(buff,j,0)
    ...

I would still love to hear if I'm doing something clearly stupid...
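For reference, here is a minimal self-contained sketch of the same grouping idea with the fixed return type. The `Buff` type and the global `k` from the thread are not shown, so `groupBySlice`, its parameters, and the sample strings are hypothetical stand-ins. Instead of the `hasKey`/`add`/`@[kmer]` branch it uses `mgetOrPut`, which inserts an empty seq for a missing key and returns the stored seq so it can be appended to in one step.

```nim
import tables

# Hypothetical stand-in for get_buffer_level: group each string under
# its slice w[lo .. hi]. Type parameters are spelled out both in the
# return type and in the newTable call.
proc groupBySlice(words: openArray[string], lo, hi: int): TableRef[string, seq[string]] =
  result = newTable[string, seq[string]]()
  for w in words:
    let key = w[lo .. hi]
    # mgetOrPut inserts an empty seq if `key` is missing and returns
    # the stored seq as a `var`, so `add` modifies it in place.
    result.mgetOrPut(key, newSeq[string]()).add(w)

let groups = groupBySlice(["ACGTA", "TCGTT", "GGGTA"], 1, 3)
doAssert groups["CGT"] == @["ACGTA", "TCGTT"]
```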

Jehan [2015-05-17T17:04:08+02:00]

The difference between Table[A,B] and TableRef[A,B] (and the init vs. new call to instantiate them) is that the former is a value type, while the latter is a reference type. Using Table[A,B] can save you a level of indirection if you embed the table in another data structure. It also means that assigning one Table[A,B] instance to another will actually copy the table, while assigning a TableRef[A,B] will not. This difference exists because, simply put, Nim is a system programming language, where controlling the memory layout is often necessary.
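A small sketch of the copy-vs-share difference described above: assigning a `Table` produces an independent copy, while assigning a `TableRef` makes a second name for the same table.

```nim
import tables

# Value type: assignment copies the whole table.
var a = initTable[string, int]()
a["x"] = 1
var b = a          # b is an independent copy of a
b["x"] = 2
doAssert a["x"] == 1

# Reference type: assignment copies only the reference.
let r = newTable[string, int]()
r["x"] = 1
let r2 = r         # r2 and r point to the same table
r2["x"] = 2
doAssert r["x"] == 2
```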

You do need to specify the type parameters when declaring an instance of either or when calling initTable[A,B]() or newTable[A,B]() (which you seem to have figured out), since the compiler can't guess them. However, when calling a procedure on an existing table instance, the compiler can usually infer them, and you can omit them. I agree that the error message could be improved. :)
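A short sketch of where the type parameters are needed and where they are inferred: the constructor call spells them out, while the later generic calls (`[]=`, `hasKey`, `getOrDefault`) pick them up from the table they are called on.

```nim
import tables

# Construction: nothing exists yet from which the compiler could infer
# the key and value types, so they are written out explicitly.
var counts = initTable[string, int]()

# Calls on an existing instance: A = string and B = int are inferred
# from `counts`, so no brackets are needed.
counts["foo"] = 1
counts["bar"] = 2
doAssert counts.hasKey("foo")
doAssert counts.getOrDefault("baz") == 0
```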

rozovr [2015-05-17T20:25:44+02:00]

Thanks, Jehan. Q: do copies only occur on assignments? When is this a performance hit in practice?

One other question, off this topic: Is there a set-like data structure that can be used to hold strings? I have been using a StringTable with the values set to nil, but it seems rather hack-ish. For one thing, it doesn't seem to be possible to remove keys I've previously inserted...

def [2015-05-17T21:19:06+02:00]

http://nim-lang.org/docs/sets.html

Jehan [2015-05-17T21:23:10+02:00]

In general, copying occurs on assignment, not when you pass the value to a procedure (otherwise, even a simple membership test would be very expensive).
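A sketch of the two parameter styles this implies. Following Jehan's description, a plain `Table` parameter is not copied for the call and cannot be modified inside the procedure; a `var` parameter is how a procedure modifies the caller's table in place. The procedure names `total` and `bump` are hypothetical.

```nim
import tables

# Read-only parameter: per the thread, the table is not copied for the
# call, and the compiler rejects any attempt to modify `t` here.
proc total(t: Table[string, int]): int =
  for v in t.values:
    result += v

# `var` parameter: the caller's table is updated in place.
proc bump(t: var Table[string, int], key: string) =
  t[key] = t.getOrDefault(key) + 1

var counts = initTable[string, int]()
bump(counts, "a")
bump(counts, "a")
bump(counts, "b")
doAssert total(counts) == 3
```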

For sets, use the sets module. It defines a type HashSet[T] that you can instantiate with initSet[string] for a set of strings. It has, among other things, the usual incl and excl operations for sets.
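A minimal sketch of a string set with `incl` and `excl`. It assumes a current Nim, where the constructor is named `initHashSet`; `initSet`, as used in this thread, is the older name that was later deprecated.

```nim
import sets

var seen = initHashSet[string]()
seen.incl("ACGT")
seen.incl("TTGA")
seen.incl("ACGT")          # already present, so the set is unchanged
doAssert seen.len == 2
doAssert "ACGT" in seen

seen.excl("ACGT")          # elements can be removed again
doAssert "ACGT" notin seen
```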

rozovr [2015-05-17T21:38:05+02:00]

Thanks to you both. Re: sets: in the tutorial I saw that they can only handle ordinals, which I took to mean they don't work with strings. Embarrassed now :(

def [2015-05-17T21:39:22+02:00]

The confusion is understandable. There are two kinds of sets: The ones in system, which are bitsets for small ordinals, and the ones in the sets module, which work for any hashable type.
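The two kinds side by side, as a small sketch: the built-in `set[T]` from system only accepts small ordinal element types such as `char`, while `HashSet[T]` from the sets module accepts strings. `toHashSet` is assumed to be available, as it is in current Nim.

```nim
import sets

# Built-in bitset from system: small ordinal element types only.
var vowels: set[char] = {'a', 'e', 'i', 'o', 'u'}
doAssert 'e' in vowels
vowels.excl('u')
doAssert card(vowels) == 4

# HashSet from the sets module: any type with `hash` and `==`.
var words = toHashSet(["foo", "bar"])
words.incl("baz")
doAssert "baz" in words
```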

rozovr [2015-05-17T23:11:43+02:00]

Okay, I started moving to sets. I am sure it will make for much cleaner code. Maybe it would make sense to rename the sets in system to avoid this confusion. Last question for the night: I noticed there is no use of new or refs in the sets module -- why the different design from Table? If I want to use refs in this case, is it just by the ref keyword? How would I replace the StringTables in this declaration: "newSeqWith(j+1, newStringTable())"?

Jehan [2015-05-17T23:54:46+02:00]

The problem with the tables module is that there's a whole lot of code duplication going on (because essentially each TableRef procedure delegates to the matching Table procedure). The plan is to have this automated away by making the compiler automatically dereference the argument so that you can use either a Table or TableRef instance (this is actually implemented, but requires the experimental switch to activate). Once this is fully done, getting rid of the duplicated code without creating some weird regressions may become problematic.
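To illustrate the kind of duplication described here, below is a sketch of the delegation pattern with a hypothetical `addWord` procedure (not part of the tables module): the real work is written once against the value type, and the `TableRef` overload only dereferences with `[]` and forwards.

```nim
import tables

# Hypothetical helper, implemented once for the value type.
proc addWord(t: var Table[string, seq[string]], key, word: string) =
  t.mgetOrPut(key, newSeq[string]()).add(word)

# The ref overload just dereferences and delegates.
proc addWord(t: TableRef[string, seq[string]], key, word: string) =
  t[].addWord(key, word)

var v = initTable[string, seq[string]]()
v.addWord("k", "a")

let r = newTable[string, seq[string]]()
r.addWord("k", "b")
```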

For this reason, sets don't have a ref variant (yet). Unfortunately, this means that you may need to work around that issue. Example:

import sets

{.experimental.}

proc box*[T](x: T): ref T =
  new result
  result[] = x

var s = box initSet[string]()
var s2 = box initSet[string]()
incl s, "foo"
echo "foo" in s
excl s, "foo"
echo "foo" in s
incl s2, "bar"
let t = box(s + s2[])
for x in t:
  echo x

The box procedure is a simple way to put an arbitrary value inside a ref. For the most part, with {.experimental.}, auto-dereferencing works normally. Only when multiple arguments need to be dereferenced (such as for the union operation) do you have to do it manually (and turn the result back into a ref).
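A variant of the same idea that does not depend on the experimental switch, sketched under the assumption of a current Nim (hence `initHashSet` rather than `initSet`): every dereference is written explicitly with `[]`.

```nim
import sets

# Same `box` helper as above: wrap any value in a freshly allocated ref.
proc box[T](x: T): ref T =
  new result
  result[] = x

let s = box(initHashSet[string]())
let s2 = box(initHashSet[string]())
s[].incl("foo")            # explicit dereference instead of auto-deref
s2[].incl("bar")
let t = box(s[] + s2[])    # union of the two underlying sets, re-boxed
doAssert t[].len == 2
```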

Note also that often you may not want a reference. For example:

type
  Filter* = ref object
    positives, negatives: HashSet[string]

Here, we already have two sets inside a ref object. Adding another level of indirection would not get us anything; it would actually incur overhead. You get a benefit out of references when you actually copy sets. Most of the time, sets are created once and then modified or queried in place. Remember that when you pass a set as an argument to a procedure, only a pointer will be passed internally (this is also why you can't change non-var arguments – namely so that you don't get aliasing problems).
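A sketch of that in-place style, reusing the `Filter` type above; `newFilter` is a hypothetical constructor, since only the type definition appears in the thread. The second half shows one way to answer the earlier `newSeqWith(j+1, newStringTable())` question with value-type sets: `newSeqWith` from sequtils evaluates its second argument once per element, so each slot gets its own set. The value of `j` is hypothetical.

```nim
import sets, sequtils

type
  Filter* = ref object
    positives, negatives: HashSet[string]

# Hypothetical constructor; the thread only shows the type.
proc newFilter(): Filter =
  Filter(positives: initHashSet[string](), negatives: initHashSet[string]())

let f = newFilter()
f.positives.incl("ACGT")   # sets modified in place through the one ref
f.negatives.incl("TTTT")
doAssert "ACGT" in f.positives

# One independent set per level, no extra ref needed.
let j = 3                  # hypothetical value for illustration
var levels = newSeqWith(j + 1, initHashSet[string]())
levels[0].incl("AC")
```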

Mirror of forum.nim-lang.org

1233 :: Noob question - table of seqs, new vs. init...