I have a function that returns a table of companies. It is called many times to get information about different companies:
echo companies()["MSFT"].name
echo companies()["INTC"].name
...
How should the result of the companies function be defined, as a value or as a reference?
proc companies(): Table[string, Company] = ...
or as
proc companies(): ref Table[string, Company] = ...
I guess it should be defined as ref Table[string, Company], because otherwise the whole huge table containing all the companies would be copied every time the function is called, right?
Full code
import tables

proc to_ref*[T](o: T): ref T =
  result.new
  result[] = o

type
  Company* = object
    symbol*: string # MSFT
    name*: string # Microsoft
    # ... many more fields

var cached_companies: ref Table[string, Company]
proc companies*(): Table[string, Company] =
  if cached_companies == nil:
    cached_companies = {
      "MSFT": Company(name: "Microsoft", symbol: "MSFT")
      # Couple hundreds more companies ....
    }.to_table.to_ref
  cached_companies[]

# Many thousands calls to get details about
# different companies
for i in 1..3:
  echo companies()["MSFT"]
From https://nim-lang.org/docs/tables.html:
For consistency with every other data type in Nim these have value semantics, this means that = performs a copy of the hash table.
For ref semantics use their Ref variants:
So I would guess that your assumption is correct.
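To make the quoted difference concrete, here is a minimal sketch (the table contents are just illustrative) showing that assigning a Table produces an independent copy, while assigning a TableRef only shares the same underlying table:

```nim
import tables

# Table has value semantics: `=` copies the whole hash table.
var a = {"MSFT": "Microsoft"}.toTable
var b = a
b["INTC"] = "Intel"        # only the copy is modified

# TableRef has reference semantics: both names point to the same table.
let r = {"MSFT": "Microsoft"}.newTable
let s = r
s["INTC"] = "Intel"        # visible through r as well

echo a.len, " ", b.len, " ", r.len
```

This only demonstrates the semantics of assignment; whether a copy actually happens when a proc returns a Table depends on what the compiler can optimize.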
But I wonder: if you have calls like
echo companies()["MSFT"]
would it not be good enough to pass the query string, such as "MSFT", to companies() and have it return just that entry? Procs that return whole tables, where the table itself is a global var, are not used that often.
A very simple method is to call the proc once and store its result in a var.
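A rough sketch of that idea, assuming a hypothetical loadCompanies proc that builds the table and a hypothetical buildCount counter that is only there to show how often the table gets built:

```nim
import tables

type
  Company = object
    symbol, name: string

var buildCount = 0  # hypothetical counter, not part of the original code

proc loadCompanies(): Table[string, Company] =
  # Builds the table from scratch each time it is called.
  inc buildCount
  result = {
    "MSFT": Company(symbol: "MSFT", name: "Microsoft"),
    "INTC": Company(symbol: "INTC", name: "Intel")
  }.toTable

# Call the proc once, then do all lookups on the stored result.
let allCompanies = loadCompanies()
for symbol in ["MSFT", "INTC", "MSFT"]:
  echo allCompanies[symbol].name
```

The table is built a single time no matter how many lookups follow, at the price of building it up front.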
Yes, but then it would be loaded eagerly. I am trying to load it lazily, as I have many different data tables and not all of them are always used.
would it not be good enough to pass the query string, such as "MSFT", to companies() and have it return just that entry? Procs that return whole tables, where the table itself is a global var, are not used that often.
Yes, I was thinking about it, but I also sometimes need to iterate over all companies: for symbol in companies().keys: ...
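Both needs, single lookups and iterating over all symbols, could possibly be covered without ever returning the table itself. The following is only a sketch: ensureLoaded, company and companySymbols are made-up names, the table is still loaded lazily, and it is not thread-safe.

```nim
import tables

type
  Company = object
    symbol, name: string

var
  loaded = false
  cachedCompanies: Table[string, Company]

proc ensureLoaded() =
  # Lazily fill the cache on first use (not thread-safe).
  if not loaded:
    cachedCompanies = {
      "MSFT": Company(symbol: "MSFT", name: "Microsoft"),
      "INTC": Company(symbol: "INTC", name: "Intel")
    }.toTable
    loaded = true

proc company(symbol: string): Company =
  # Returns a copy of a single entry; raises KeyError for unknown symbols.
  ensureLoaded()
  cachedCompanies[symbol]

iterator companySymbols(): string =
  # Iterates over the keys of the cached table in place.
  ensureLoaded()
  for symbol in cachedCompanies.keys:
    yield symbol

echo company("MSFT").name
for symbol in companySymbols():
  echo symbol
```

Here only one Company object is copied per lookup, and the iterator walks the global table directly.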
Of course, the compiler is free to optimize it, so maybe no copy of large data blocks is involved?
That would be a very useful optimisation :)
If cached_companies never needs to be written to, this is the perfect situation for lent, and {.global.} is how to keep global variables safe within the scope of the proc that deals with them:
proc companies*(): lent Table[string, Company] =
  var loaded {.global.} = false
  var cached_companies {.global.}: Table[string, Company]
  if not loaded:
    cached_companies = {
      "MSFT": Company(name: "Microsoft", symbol: "MSFT")
    }.to_table
    loaded = true
  cached_companies
Here companies() returns an immutable view into cached_companies, without copying.
FYI, that code isn't thread-safe: two threads can both enter the body of the if statement, which would cause concurrent modifications to cached_companies.
To remedy that you'd need something equivalent to C++'s call_once, which lets only the first caller enter the lambda while all others block until it completes. I don't know if this exists in Nim.
There's nothing as far as I'm aware of in the stdlib like call_once, but something like the following seems to work:
import atomics
when compileOption("threads"):
  import locks

when compileOption("threads"):
  type OnceBase = object of RootObj
    done: Atomic[uint32]
    lock: Lock
else:
  type OnceBase = object of RootObj
    done: Atomic[uint32]

type Once*[TCallback] = ref object of OnceBase
  cb: TCallback

proc newOnce*[TCallback](cb: TCallback): Once[TCallback] =
  new(result)
  store(result.done, uint32(0))
  result.cb = cb
  when compileOption("threads"):
    initLock(result.lock)

proc doRun(self: Once) =
  when compileOption("threads"):
    self.lock.acquire()
  try:
    if load(self.done) == 0'u32:
      try:
        self.cb()
      finally:
        store(self.done, uint32(1))
  finally:
    when compileOption("threads"):
      self.lock.release()

proc run*(self: Once) =
  if load(self.done) == 0'u32:
    self.doRun()
Usage:
proc test() =
  echo "test called from thread: ", getThreadId()

var
  onceInstance = newOnce(test)
  thr: array[0..4, Thread[void]]

proc threadProc() {.thread.} =
  onceInstance.run()

for i in 0..high(thr):
  createThread(thr[i], threadProc)
joinThreads(thr)