Hello world !
What is the most direct way to assemble all the keys of a table in a seq ? I mean, is there a proc in the std lib that does it ?
Concerning the proc sample in the random module, it doesn't apply to a table. Is there a proc elsewhere that does ?
There's no stdlib proc that does this "directly", but you can of course always compose different things together :)
One simple way to do what you want is:
import std/[tables, sequtils, random]
# initialize the random seed so we get different results each time
randomize()
let myTable = {"a": "hello", "b": "world", "c": "forum"}.toTable()
# toSeq takes an iterator and saves its results into a seq
let keys = toSeq(myTable.keys)
# get random value
echo myTable[sample(keys)]
what about collect (you need to import sugar) and iteration?
import std/tables, sugar
var a = {
'y': 1,
'm': 2,
'c': 3,
'a': 4,
}.toOrderedTable
let keyseq = collect(newSeq):
for key in a.keys: key
assert keyseq == @['y','m','c','a']
actually I like
let keys = toSeq(myTable.keys)
even more, I was writing my reply before seeing your solution You can also sample many values with Yardanico's 2nd idea with no unneeded string allocations if you build a sets.HashSet[int] or intsets.IntSet or even set if Table.len < 65536. Call it nums. All allow a fast-ish i in nums test.
That is for sampling without replacement. For sampling with replacement you probably want to make nums into a tables.CountTable and repeat myTable[key] however many duplicates the count of random indices implies.
Depending upon how you are using the samples, you might also structure things as an iterator just yield -ing sample values instead of creating a seq.
You can do it like this, nice and clean.
Nim
let keys = mytable.keys
let sample = mytable.values.shuffle.take(10)
With some helpers.
Nim
proc keys*[K, V](table: Table[K, V] | ref Table[K, V]): seq[K] =
for k in table.keys: result.add k
proc values*[K, V](table: Table[K, V] | ref Table[K, V]): seq[V] =
for v in table.values: result.add v
proc take*[T](s: openarray[T], n: int): seq[T] =
if n < s.len: s[0..(n - 1)] else: s.to_seq
func shuffle*[T](list: openarray[T], seed = 1): seq[T] =
var rand = random.init_rand(seed)
result = list.to_seq
random.shuffle(rand, result)
func shuffle*[T](list: openarray[T], rand: random.Rand): seq[T] =
result = list.to_seq
random.shuffle(rand, result)
Of course there's overhead doing it like that, but I don't care about performance (except in 1% codebase, the bottleneck code).
I think this is neither clearer nor nicer than let keys = toSeq(myTable.keys); let smpl = myTable[sample(keys)]. At the very least it wastes memory and allocations (seq->seq->seq) in the process. Instead of bunch of helper procs which may never be reused I'd focus on solving the issue as it's stated.
import std/[random, tables, intsets, math]
let myTable = {"a": "hello", "b": "world", "c": "forum", "d": "bye", "e": "cu"}.toTable()
proc sampleValues[K, V](t: Table[K, V]; size: Natural = 1): seq[V] =
## Returns a random seq with a set of values of table `t`
var i = size
assert i <= t.len() # replace i with `min(len, size)` to force never failing
let high = t.len()-1
let lenSqrt = floor(sqrt(float(t.len()))).Natural()
var sampleIds = initIntSet()
if size <= lenSqrt:
# Dumb rejection sampling. Perhaps don't use this for crypto
while i > 0:
let sample = rand(high)
if not sampleIds.containsOrIncl(sample):
i.dec(1)
else:
var sampleIdSeq = newSeqOfCap[int](i)
for id in 0..high:
sampleIdSeq.add(id)
shuffle(sampleIdSeq)
sampleIdSeq.setLen(i)
sampleIds = toIntSet(sampleIdSeq)
i = 0
for v in values(t): # i == 0
if i in sampleIds:
result.add(v)
i.inc()
#debugEcho sampleIds
randomize()
echo myTable.sampleValues(1)
echo myTable.sampleValues(5)
I think this is neither clearer nor nicer than let keys = toSeq(myTable.keys)
That's an old story about red/blue functions and it's already being discussed here.
At the very least it wastes memory and allocations
Of course, and I said so. I did that consciously, to sacrifice performance in non-critical paths and gain in reducing cost of development and ease maintenance.
Instead of bunch of helper procs which may never be reused
True, but not in my case. As I use my own nim std, which is designed in that way (basically a clone of Ruby STD library). It was surprisingly easy to implement and has very little code, I spent maybe a two or tree days to write it. So, in my case, I constantly do reuse these (and others) helpers.
I make no claim to this being the ultimate best solution, but the direction of this thread seems more concrete than my earlier vague comments. A simple seq will actually do with some wasted space (probably a favorable space-time trade-off.) So, presented purely for your consideration/to make those more concrete/expand The Menu:
import std/[random, tables, sugar]
iterator samples[K, V](t: Table[K, V]; size=1): (K, V) =
## Yields a random sample (with replacement) of pairs
## in table `t`. (This assumes keys are unique, not
## using `.add` multi-map functionality.)
var reps = newSeq[int](t.len) # Fill w/size samples
for i in 1..size: #..of range 0..<t.len.
reps[rand(t.len - 1)].inc
var i = 0 # Yield samples w/correct
for (k, v) in pairs(t): #..repeated value count.
for j in 1..reps[i]:
yield (k, v) # Repeats do always appear together
inc i #..BUT order is also Table-order --
#..only as random as hash is good.
randomize()
let myTable = {"a": "hello", "b": "world", "c": "forum",
"d": "bye", "e": "cu"}.toTable()
for (k, v) in myTable.samples(4):
echo v
let someKeys = collect(newSeq):
for (k, v) in myTable.samples(4): k
echo "someKeys: ", someKeys
let someVals = collect(newSeq):
for (k, v) in myTable.samples(4): v
echo "someVals: ", someVals
If one were doing this a lot, then one could have the iterator take a var seq and zero it each time to realize zero allocations.
In terms of what space is allocated, that can be reduced by using a newSeq[uint8]. The width of that counter can be based on the requested sample size since you cannot have a value > size show up in a reps slot. Even with newSeq[int], it should already be a small fraction of the space used to store a Table[string,string]. ( @archnim never even said what types he had in his Table though or if sampling was with/without replacement. )
As a terminology point that might help someone someday, I try to use the terms "random sample" for a "with replacement style" and "random subset" for the "without replacement" style. The lack of care in the wider world will probably always leave a need for clarification, though.