Hello, I want to use a Table of seq[int] to store values associated with some keys across different .txt files. I only learn how many files I need to scan at runtime, so I was thinking of sizing the seq of each table entry with:
newSeqUninitialized[int](n)
However, it seems I need an additional seq initialization step to make it work:
import tables

let
  files = @["first", "second", "third"]
  n = files.len
  emptyseq = newSeqUninitialized[int](n)

var
  table = initTable[string, seq[int]]()
  i = 0

for file in files:
  # keys fetching stuff... now I need to store the content of key1
  if "key1" notin table:
    table["key1"] = emptyseq # without this extra init it fails with [KeyError]
  table["key1"][i] = 2*i
  i.inc

echo table # {"key1": @[0, 2, 4]}
Am I missing something, or do I always need to assign the whole (empty) seq the first time I insert a new key before I can assign its elements, despite the var table declaration? Thank you!

Yes, you need to put the value in the table before operating on it. table[key][i] = ... is not the same as table[key] = @[...]: in the first case there is no sequence to index into, since there is no entry for the key yet. The following is a slightly more elegant way of doing this:
discard table.hasKeyOrPut("key1", emptyseq) # inserts emptyseq only when the key is absent
table["key1"][i] = 2*i
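To make the semantics concrete: hasKeyOrPut returns true when the key was already present and only inserts the default when it was not. A minimal sketch (the table and key names are made up for illustration):

```nim
import std/tables

var t = initTable[string, seq[int]]()

# First call: "k" is absent, so the default is inserted and false is returned.
doAssert not t.hasKeyOrPut("k", @[0, 0, 0])

# Second call: "k" is now present, so nothing is inserted and true is returned.
doAssert t.hasKeyOrPut("k", @[9, 9, 9])

# The value inserted by the first call survives.
doAssert t["k"] == @[0, 0, 0]
```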
@ElegantBeef
Is there a way to overload [] to provide a default if the key doesn't exist? Something like:
proc `[]`(table: var Table[string, seq[int]], key: string): var seq[int] =
  if table.hasKey(key):
    result = table[key]
  else:
    result = newSeq[int]()
This fails with expression has no address.
Yes, you can; you need to add the key to the table first so the result has an address:

import std/tables

proc `[]`(table: var Table[string, seq[int]], key: string): var seq[int] =
  discard table.hasKeyOrPut(key, @[])
  tables.`[]`(table, key)

var a = initTable[string, seq[int]]()
echo a["hello"]
a["hello"].add 20
echo a["hello"]
Is there a way to overload [] to provide a default if the key doesn't exist?
You really should not do that. It's a design mistake in other languages too.
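If the goal is only to read with a fallback, std/tables already covers that without a custom overload: getOrDefault returns the type's default value for a missing key and, unlike the overload above, does not mutate the table. A minimal sketch (the table and keys here are illustrative):

```nim
import std/tables

var t = initTable[string, seq[int]]()
t["a"] = @[1, 2]

# Present key: the stored value is returned.
doAssert t.getOrDefault("a") == @[1, 2]

# Missing key: the default for seq[int] (an empty seq) is returned,
# and the key is NOT inserted as a side effect.
doAssert t.getOrDefault("missing").len == 0
doAssert "missing" notin t
```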
@tcheran - all you need is to change your main loop to this (NOTE: the i = 0 is also unneeded):
import tables

let
  files = @["first", "second", "third"]
  n = files.len
  initSeq = newSeqUninitialized[int](n)

var table = initTable[string, seq[int]]()

for i, file in files:
  table.mgetOrPut("key1", initSeq)[i] = 2*i

echo table # {"key1": @[0, 2, 4]}
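The same mgetOrPut pattern extends naturally to counting occurrences of each key across several files, with one column per file. A self-contained sketch, assuming invented file names and keys, and using newSeq[int](n) for the zeroed default:

```nim
import std/tables

let files = @["first", "second", "third"]
let n = files.len

var counts = initTable[string, seq[int]]()

for i, file in files:
  # Pretend each file yields some keys; hard-coded here for the sketch.
  let keysInFile = if i == 0: @["key1"] else: @["key1", "key2"]
  for key in keysInFile:
    # One zeroed column per file; bump this file's column for the key.
    counts.mgetOrPut(key, newSeq[int](n))[i].inc

doAssert counts["key1"] == @[1, 1, 1] # seen in every file
doAssert counts["key2"] == @[0, 1, 1] # absent from the first file
```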
@cblake I didn't know the pairs iterator trick... that's really nice. If I understand your point correctly, you suggest not using newSeqUninitialized[int](n) because it's a bit exotic (maybe it will disappear in future Nim versions?) and not that relevant performance-wise, while newSeq[int](n) is a solid, reliable, and well-defined replacement. OK, suggestion accepted! I actually tested the mgetOrPut solution with my real case, which is more like this:
table.mgetOrPut(key, emptySeq)[i].inc(1)
# increment the key counter at the file-related column position i
and it worked beautifully. Thank you! Apparently newSeqUninitialized[int](n) is also filled with n zeroed entries (and it's much longer to write!):
let
  n = 3
  a = newSeq[int](n)
  b = newSeqUninitialized[int](n) # and in addition it does not work with non-int types
  c = newSeqOfCap[int](n)

echo a # len == n, all entries set to 0
echo b # len == n, all entries set to 0, too
echo c # len == 0

assert a == b
Memory that a process gets from the OS is almost always zeroed (and is sometimes actually a copy-on-write all-zero virtual memory page, replicated however many times is needed, depending upon the OS).
However, what newSeqUninitialized in Nim gives you depends upon the history of the memory in your process. Using uninitialized memory that happens to be 0 while you are testing, but winds up being otherwise later, is one of the (many) classic "gotchas" of C/C++ programming.
You should probably not use newSeqUninitialized in Nim unless you really know what you are doing and it is an important optimization as revealed by profiling your code in the context of your problem.
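If you do reach for newSeqUninitialized, the one safe pattern is to overwrite every element immediately after allocation, before anything reads from the seq. A minimal sketch of that discipline, alongside the everyday zero-initializing alternative:

```nim
let n = 4

# Unsafe allocation made safe: every slot is written before any read.
var buf = newSeqUninitialized[int](n)
for i in 0 ..< n:
  buf[i] = i * i
doAssert buf == @[0, 1, 4, 9]

# The everyday alternative: newSeq zero-initializes and works for any type.
var safe = newSeq[int](n)
doAssert safe == @[0, 0, 0, 0]
```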