As far as I know, Nim can in some cases use smart copy-on-write, so you can write simple code without references and it will still be efficient.
I wrote a data-processing example with large chunks of immutable data (stock prices) passed around. The only mutable part is the cache for fast, lazy, on-demand loading. So, ideally, there shouldn't be much copying, as the data is immutable.
The code shouldn't be affected by the size of the data chunks unless there's a lot of copying. Yet when N is increased, it becomes much slower. Why? Should refs be used to make it faster?
import tables, strformat, sequtils, sugar

# The code takes much longer to run if N is changed to 100_000, although
# theoretically N shouldn't affect the execution time (unless there is lots of copying).
const N = 1000

type
  Time = tuple[year: int, month: int, day: int, hour: int, minute: int, second: int]

  StockPrices* = object
    symbol*: string
    prices*: seq[(Time, float)]

proc load_stock_prices_slow*(symbol: string): StockPrices =
  StockPrices(
    symbol: symbol,
    prices: (1..N).to_seq.map((i) => ((2000, 1, 1, 1, 1, i), 1.0))
  )

# 1 should `ref` be used as the table value?
var prices_cache = init_table[string, StockPrices]()

# 2 should `ref` be used for the `proc` return type?
proc get_prices*(symbol: string): StockPrices =
  if symbol notin prices_cache:
    prices_cache[symbol] = load_stock_prices_slow(symbol)
  prices_cache[symbol]

# 3 should `ref` be used for the `prices` argument?
proc find_price_anomalies(prices: seq[(Time, float)]): bool = false

# Usage - the `get_prices` function is called many times in different places.
for _ in 1..1000:
  for i in 1..100:
    let symbol = fmt"S{i}"
    # 4 should `ref` be used here?
    let prices = get_prices(symbol)
    discard prices.prices[2]
    discard find_price_anomalies(prices.prices)

echo "done"
Changing to ref improves the performance for large N = 100_000.
...
StockPrices* = ref object
...
StockPrices* {.shallow.} = object
This avoids the double indirection involved in ref + string or ref + seq.
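For completeness, here are the two suggestions spelled out (StockPricesRef and StockPricesShallow are just illustrative names; as I understand it, {.shallow.} targets the refc GC and makes plain assignments shallow, which is unsafe if one of the copies is mutated later):

type
  StockPricesRef* = ref object              # assignment copies a single pointer
    symbol*: string
    prices*: seq[(Time, float)]

  StockPricesShallow* {.shallow.} = object  # assignment shallow-copies the fields
    symbol*: string
    prices*: seq[(Time, float)]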
I don’t think Nim does COW automatically. Types like seq and string do it, but that’s part of the implementation of the type: internally they contain a reference to a shared heap-based array.
You may be thinking of the way that values passed as function parameters aren’t copied; this is safe because a function can’t mutate its parameters.
But this doesn’t work for return values. If get_prices returned a pointer to a StockPrices object, then the caller might mutate it and affect the object stored in the cache.
You can declare the return value of get_prices as lent; under the hood this will cause it to return a pointer. But that makes the returned value immutable, and in any case when you assign it to another object, that’s always a copy.
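To make the distinction concrete, here is a small sketch (process and get_cached are hypothetical names; I believe lent can borrow from a module-level variable like this, but treat it as untested):

proc process(data: seq[float]): float =
  data[0]       # parameters aren't copied; `process` can't mutate `data`

var cache = @[1.0, 2.0, 3.0]

proc get_cached: lent seq[float] =
  cache         # returned as a borrowed pointer: readable, not mutable

discard process(get_cached())  # no copies on the way in or out
let snapshot = get_cached()    # assignment, however, still makes a full copy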
Types like seq and string do it...
Actually, they don't. IMHO string especially should do COW, but there is not much agreement on the corresponding RFC.
More importantly: Nim does allow for COW to be implemented for custom types via the =hook mechanisms. (Which is yet another reason why these new features are excellent.)
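For illustration, a minimal sketch of what hand-rolled COW via hooks can look like (CowBuf and everything around it are hypothetical; assumes ARC/ORC with the pre-2.0 `=destroy(x: var T)` signature, single-threaded only):

type
  Payload = object
    refcount: int     # how many CowBufs share this payload
    data: seq[float]

  CowBuf = object
    p: ptr Payload

proc `=destroy`(b: var CowBuf) =
  if b.p != nil:
    dec b.p.refcount
    if b.p.refcount == 0:
      `=destroy`(b.p.data)
      dealloc(b.p)

proc `=copy`(dst: var CowBuf, src: CowBuf) =
  if dst.p == src.p: return    # self-assignment
  `=destroy`(dst)
  wasMoved(dst)
  dst.p = src.p
  if dst.p != nil:
    inc dst.p.refcount         # "copying" just shares the payload

proc init_cow_buf(data: sink seq[float]): CowBuf =
  result.p = cast[ptr Payload](alloc0(sizeof(Payload)))
  result.p.refcount = 1
  result.p.data = data

proc `[]`(b: CowBuf, i: int): float = b.p.data[i]

proc `[]=`(b: var CowBuf, i: int, v: float) =
  # the "write" half: clone the payload only if it's shared
  if b.p.refcount > 1:
    let old = b.p
    b.p = cast[ptr Payload](alloc0(sizeof(Payload)))
    b.p.refcount = 1
    b.p.data = old.data        # the one and only deep copy
    dec old.refcount
  b.p.data[i] = v

var a = init_cow_buf(@[1.0, 2.0, 3.0])
var b = a       # cheap: bumps the refcount, no data copy
b[0] = 42.0     # clones on first write; `a` is unaffected
assert a[0] == 1.0 and b[0] == 42.0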
Types like seq and string do it...
Actually they don't....
Yeah, it seems so. I made a similar benchmark for seq (for Table the results are similar). It's a couple of orders of magnitude faster with ref, i.e. with proc get_prices*(): ref seq[float] =.
const N = 100_000

proc load_prices_slow*(): seq[float] =
  for i in 1..N: result.add 1.0

var cached_stock_prices: ref seq[float] = nil

proc get_prices*(): seq[float] =
  if cached_stock_prices.is_nil:
    cached_stock_prices.new
    cached_stock_prices[] = load_prices_slow()
    echo "loaded"
  cached_stock_prices[]

# Usage - the `get_prices` function is called many times in different places.
echo "starting calculations"
for _ in 1..1000:
  for i in 1..100:
    discard get_prices()[i]
echo "done"
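For reference, the fast variant from that measurement returns the ref itself, so each call copies one pointer instead of N floats (get_prices_ref is just an illustrative name):

proc get_prices_ref*(): ref seq[float] =
  if cached_stock_prices.is_nil:
    cached_stock_prices.new
    cached_stock_prices[] = load_prices_slow()
  cached_stock_prices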
If your buffer is immutable during the time of computation, you can use a (copying) seq, but you should pass to the compute proc an openarray or ptr UncheckedArray so there is no GC or copy involved.
This is an example of a decently fast data structure for finance: https://github.com/mratsim/weave/blob/5034793/benchmarks/black_scholes/weave_black_scholes.nim#L36-L65
type
  OptionKind = enum
    Put
    Call

  OptionData[T: SomeFloat] = object
    spot: T       # Spot price
    strike: T     # Strike price
    riskfree: T   # Risk-free rate
    divrate: T    # Dividend rate
    vol: T        # Volatility
    expiry: T     # Expiry to maturity or option expiration in years
                  # (1 year = 1.0, 6 months = 0.5)
    kind: OptionKind
    divvals: T    # Dividend values (not used in this test)
    dgrefval: T   # DerivaGem reference value

  Context[T: SomeFloat] = object
    data: ptr UncheckedArray[OptionData[T]]
    prices: ptr UncheckedArray[T]
    numOptions: int
    numRuns: int
    otype: ptr UncheckedArray[OptionKind]
    spot: ptr UncheckedArray[T]
    strike: ptr UncheckedArray[T]
    riskFreeRate: ptr UncheckedArray[T]
    volatility: ptr UncheckedArray[T]
    expiry: ptr UncheckedArray[T]
    numErrors: int
Nowadays I would use {.experimental: "views".} and store openarrays but this was written a long time ago.
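As a small illustration of the openarray advice (sum_prices is a hypothetical compute proc):

proc sum_prices(prices: openArray[float]): float =
  for p in prices:
    result += p

var data = newSeq[float](100_000)             # owning, GC-managed storage
discard sum_prices(data)                      # an openArray is (pointer, length): no copy
discard sum_prices(data.toOpenArray(0, 999))  # a slice of it, still no copy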
If your buffer is immutable during the time of computation, you can use a (copying) seq, but you should pass to the compute proc an openarray or ptr UncheckedArray so there is no GC or copy involved.
Thanks, I'll keep that in mind and use it for some heavy simulations. But for ordinary day-to-day code, ref and {.shallow.} feel easier. :)
I watched the Move semantics for Nim talk.
It seems that part of the problem in the previous code was the mutable cache table, so the compiler was unable to optimise it with move/sink/lent.
I simplified the example and changed the cache to be immutable, so the compiler should be able to track the usage of the prices variable and optimise it. But the code is still slow. Manually marking get_prices with lent improves it.
import sequtils

let prices = (1..100_000).to_seq.map(proc (i: int): float = i.float)

proc get_prices: seq[float] = prices

for _ in 1..100_000:
  discard get_prices()[2]
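For reference, the lent change mentioned above is a one-line edit:

proc get_prices: lent seq[float] = prices  # borrows instead of copying on every call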
If you copy seqs around everywhere, how exactly is it a "1 to 1" conversion? Different languages have different idioms and code like (1..100_000).to_seq.map is simply bad, sorry.
I mean, the code on the surface looks almost identical to TypeScript (and if rewritten in, say, Kotlin or Ruby it would also be identical). How it actually works - whether values are copied or passed by reference - can be different.
... code like (1..100_000).to_seq.map is simply bad, sorry.
Hmmm, I have a different view on performance.
That code uses roughly 2x the memory and is maybe 1-4x slower than a plain for loop / iterators / macros etc. But it doesn't change the complexity class: it's O(n) now, and the fastest possible for-loop version would also be O(n).
Is it bad? For some low-level library functions or hot-spot code, yes, it would be.
Is it bad in our example? No - change it and the overall speed improves by maybe 1% or even less.
Is it bad for, say, 95% of the codebase of 95% of applications? I would say no. Make it faster and maybe in some really badly written software we'd notice some improvement, but most probably nobody would notice the difference.
Now let's look at the really bad code. The lack of copy-on-write turned code that on the surface appears to be O(n) into a totally different complexity class, O(n^2) - and it's hot-spot code.
proc get_prices: seq[float] = prices

for _ in 1..100_000:
  discard get_prices()[2]
What's the point of optimizing the 1) "non hot-spot, O(n)" code when we have 2) "hot-spot, O(n^2)" code?
If we consider performance, we should start with the second part - that's where the real problem is, part 2.
... In this situation, the significant amount of time spent is in the space complexity part ...
I rewrote the same code in Ruby; according to your statement it should be even slower, as Ruby has no memory-locality optimization at all. But the Ruby version is blazing fast, which suggests that it's not memory locality that causes the problem here.
$prices = (1..100_000).map { |i| i.to_f }

def get_prices() $prices; end

(1..100_000).each do
  get_prices()[2]
end