nimforum mirror - Should a function that's called many times return Table or ref Table?

alexeypetrushin (original) [2020-09-10T14:55:01+02:00]

I have a function returning a table of companies that's called many times to get information about different companies:


echo companies()["MSFT"].name
echo companies()["INTC"].name
...

How should the result of the companies function be defined: as a value or as a reference?


proc companies(): Table[string, Company] = ...

or as


proc companies(): ref Table[string, Company] = ...

I guess it should be defined as ref Table[string, Company], because otherwise the whole huge table containing all the companies will be copied every time the function is called, right?

Full code


import tables

proc to_ref*[T](o: T): ref T =
  result.new
  result[] = o

type
  Company* = object
    symbol*:      string # MSFT
    name*:        string # Microsoft
    # ... many more fields

var cached_companies: ref Table[string, Company]

proc companies*(): Table[string, Company] =
  if cached_companies == nil:
    cached_companies = {
      "MSFT": Company(name: "Microsoft", symbol: "MSFT")
      # Couple hundreds more companies ....
    }.to_table.to_ref
  cached_companies[]

# Many thousands calls to get details about
# different companies
for i in 1..3:
  echo companies()["MSFT"]

Stefan_Salewski (original) [2020-09-10T15:21:46+02:00]

From

https://nim-lang.org/docs/tables.html

For consistency with every other data type in Nim these have value semantics, this means that = performs a copy of the hash table.
For ref semantics use their Ref variants:

I would assume that your assumption is true.
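A minimal sketch of the difference that documentation excerpt describes, using a small string table as a stand-in (the variable names and entries are illustrative only): assigning a Table copies it, while assigning a TableRef shares one underlying table.

```nim
import tables

# Value semantics: `=` copies the whole table.
var a = {"MSFT": "Microsoft"}.toTable
var b = a
b["INTC"] = "Intel"        # only the copy changes
echo a.len, " ", b.len

# Ref semantics: both names refer to the same table.
let r = {"MSFT": "Microsoft"}.newTable
let s = r
s["INTC"] = "Intel"        # visible through r as well
echo r.len, " ", s.len
```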

But I wonder if you have calls like


echo companies()["MSFT"]

Wouldn't it be good enough to pass the query string, like "MSFT", to companies() and have it return just that entry? Procs that return whole tables, where the table itself is a global var, are not used that often.

archnim (original) [2020-09-10T15:23:26+02:00]

A very simple method is to call the proc once and store its result in a var. Then you'll be able to write: companies["anId"].name

alexeypetrushin (original) [2020-09-10T15:47:21+02:00]

A very simple method is to call the proc once, and store its result in a var

Yes, but then it will be loaded eagerly. I'm trying to load it lazily, as I have many different data tables and not all of them are always used.

alexeypetrushin (original) [2020-09-10T15:50:26+02:00]

Wouldn't it be good enough to pass the query string, like "MSFT", to companies() and have it return just that entry? Procs that return whole tables, where the table itself is a global var, are not used that often.

Yes, I was thinking about it, but I also sometimes need to iterate over all companies: for symbol in companies().keys: ...
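One way both needs could be met without ever returning the whole table is sketched below: a lookup proc for single entries plus an iterator for the keys. The names ensureLoaded, company and companySymbols are hypothetical, and the two entries are placeholder data; like the original code, this is not thread-safe.

```nim
import tables

type Company = object
  symbol, name: string

var cache: Table[string, Company]   # module-private, filled lazily
var loaded = false

proc ensureLoaded() =
  # Populate the table on first use only.
  if not loaded:
    cache = {
      "MSFT": Company(symbol: "MSFT", name: "Microsoft"),
      "INTC": Company(symbol: "INTC", name: "Intel")
    }.toTable
    loaded = true

proc company(symbol: string): Company =
  # Copies one Company, not the whole table.
  ensureLoaded()
  cache[symbol]

iterator companySymbols(): string =
  # Iterates over the keys in place, without copying the table.
  ensureLoaded()
  for symbol in cache.keys:
    yield symbol
```

Callers would then write company("MSFT").name for lookups and for symbol in companySymbols(): ... for iteration.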

alexeypetrushin (original) [2020-09-10T15:58:28+02:00]

of course the compiler is free to optimize it, so maybe there is no copy of large data blocks involved?

That would be a very useful optimisation :)

Araq (original) [2020-09-10T17:04:26+02:00]

Use TableRef for this case but your design itself is not good. For example, it's immediately not thread-safe.
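A minimal sketch of what the TableRef variant might look like, keeping the lazy loading from the original post (the name cachedCompanies and the single entry are placeholders). It shares the thread-safety problem mentioned above; it only avoids the copy.

```nim
import tables

type Company = object
  symbol, name: string

var cachedCompanies: TableRef[string, Company]   # nil until first use

proc companies(): TableRef[string, Company] =
  # Returning a TableRef copies only the reference, never the table.
  if cachedCompanies.isNil:
    cachedCompanies = newTable[string, Company]()
    cachedCompanies["MSFT"] = Company(symbol: "MSFT", name: "Microsoft")
  cachedCompanies

echo companies()["MSFT"].name
```

Because every caller receives the same reference, any caller can also modify the shared table.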

shirleyquirk (original) [2020-09-10T17:11:09+02:00]

If cached_companies never needs to be written to, this is the perfect situation for lent.

{.global.} is how to keep global variables safe within the scope of the proc that deals with them:

proc companies*(): lent Table[string, Company] =
  # {.global.} variables persist across calls but are only visible inside this proc.
  var loaded {.global.} = false
  var cached_companies {.global.}: Table[string, Company]
  if not loaded:
    cached_companies = {
      "MSFT": Company(name: "Microsoft", symbol: "MSFT")
    }.to_table
    loaded = true
  cached_companies

companies() here returns an immutable view into cached_companies, without copying

snej (original) [2020-09-10T18:48:14+02:00]

FYI, that code isn't thread-safe — it's possible that two threads enter the body of the if statement, which would cause concurrent modifications to cached_companies.

To remedy that you'd need something equivalent to C++'s call_once, which lets only the first caller enter the lambda, all others block until it completes. I don't know if this exists in Nim.

shirleyquirk (original) [2020-09-10T20:07:22+02:00]

oof, good catch

euant (original) [2020-09-10T21:33:25+02:00]

There's nothing as far as I'm aware of in the stdlib like call_once, but something like the following seems to work:


import atomics

when compileOption("threads"):
  import locks

# The lock field is only included when threads are enabled.
when compileOption("threads"):
  type OnceBase = object of RootObj
    done: Atomic[uint32]
    lock: Lock
else:
  type OnceBase = object of RootObj
    done: Atomic[uint32]

type Once*[TCallback] = ref object of OnceBase
  cb: TCallback

proc newOnce*[TCallback](cb: TCallback): Once[TCallback] =
  new(result)
  store(result.done, uint32(0))
  result.cb = cb
  when compileOption("threads"):
    initLock(result.lock)

proc doRun(self: Once) =
  # Slow path: take the lock and re-check the flag before running the callback.
  when compileOption("threads"):
    self.lock.acquire()

  try:
    if load(self.done) == 0'u32:
      try:
        self.cb()
      finally:
        # Mark as done even if the callback raised.
        store(self.done, uint32(1))
  finally:
    when compileOption("threads"):
      self.lock.release()

proc run*(self: Once) =
  # Fast path: skip the lock once the callback has already run.
  if load(self.done) == 0'u32:
    self.doRun()

Usage:


proc test() =
  echo "test called from thread: ", getThreadId()

var
  onceInstance = newOnce(test)
  thr: array[0..4, Thread[void]]

proc threadProc() {.thread.} =
  onceInstance.run()

for i in 0..high(thr):
  createThread(thr[i], threadProc)

joinThreads(thr)

shirleyquirk (original) [2020-09-10T22:40:31+02:00]

I came back to suggest {.threadvar.} instead of {.global.}, but that's way better. +1
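For comparison, a rough sketch of the {.threadvar.} idea (the accessor name company and the single entry are placeholders, not code from this thread): each thread gets its own cache, so there is no shared state to race on, at the cost of loading the table once per thread.

```nim
import tables

type Company = object
  symbol, name: string

# Thread-local storage: every thread sees its own copy of these variables.
# A threadvar cannot have an initializer, so the flag starts out as false.
var cachedCompanies {.threadvar.}: Table[string, Company]
var loaded {.threadvar.}: bool

proc company(symbol: string): Company =
  # Lazily fill this thread's private table, then look up one entry.
  if not loaded:
    cachedCompanies = {
      "MSFT": Company(symbol: "MSFT", name: "Microsoft")
    }.toTable
    loaded = true
  cachedCompanies[symbol]

echo company("MSFT").name
```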

Mirror of forum.nim-lang.org

6796 :: Should a function that's called many times return Table or ref Table?