As far as I know, Nim can in some cases use smart copy-on-write, so you can write simple code without references and it will still be efficient.
I wrote a data-processing example with large chunks of immutable data (stock prices) passed around. The only mutable part is the cache for fast, lazy, on-demand loading. So, ideally, there shouldn't be much copying, as the data is immutable.
The code shouldn't be affected by the size of the data chunks unless there's a lot of copying. Yet when N is increased, it becomes much slower. Why? Should refs be used to make it faster?
import tables, strformat, sequtils, sugar

# The code takes much longer to run if N is changed to 100_000, although
# theoretically N shouldn't affect the execution time (unless there is lots of copying).
const N = 1000

type
  Time = tuple[year: int, month: int, day: int, hour: int, minute: int, second: int]

  StockPrices* = object
    symbol*: string
    prices*: seq[(Time, float)]

proc load_stock_prices_slow*(symbol: string): StockPrices =
  StockPrices(
    symbol: symbol,
    prices: (1..N).to_seq.map((i) => ((2000, 1, 1, 1, 1, i), 1.0))
  )

# 1 should `ref` be used as the table value?
var prices_cache = init_table[string, StockPrices]()

# 2 should `ref` be used for the `proc` return type?
proc get_prices*(symbol: string): StockPrices =
  if symbol notin prices_cache:
    prices_cache[symbol] = load_stock_prices_slow(symbol)
  prices_cache[symbol]

# 3 should `ref` be used for the `prices` argument?
proc find_price_anomalies(prices: seq[(Time, float)]): bool = false

# Usage - the `get_prices` function is called many times in different places.
for _ in 1..1000:
  for i in 1..100:
    let symbol = fmt"S{i}"
    # 4 should `ref` be used here?
    let prices = get_prices(symbol)
    discard prices.prices[2]
    discard find_price_anomalies(prices.prices)

echo "done"
Changing to ref improves the performance for large N = 100_000.
...
StockPrices* = ref object
...
StockPrices* {.shallow.} = object
This avoids the double indirection involved in ref + string or ref + seq.
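For completeness, here are the two suggestions spelled out (StockPricesRef and StockPricesShallow are just illustrative names; as I understand it, {.shallow.} targets the refc GC and makes plain assignments shallow, which is unsafe if one of the copies is mutated later):

type
  StockPricesRef* = ref object              # assignment copies a single pointer
    symbol*: string
    prices*: seq[(Time, float)]

  StockPricesShallow* {.shallow.} = object  # assignment shallow-copies the fields
    symbol*: string
    prices*: seq[(Time, float)]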
I don’t think Nim does COW automatically. Types like seq and string do it, but that’s part of the implementation of the type: internally they contain a reference to a shared heap-based array.
You may be thinking of the way that values passed as function parameters aren’t copied; this is safe because a function can’t mutate its parameters.
But this doesn’t work for return values. If get_prices returned a pointer to a StockPrices object, then the caller might mutate it and affect the object stored in the cache.
You can declare the return value of get_prices as lent; under the hood this will cause it to return a pointer. But that makes the returned value immutable, and in any case when you assign it to another object, that’s always a copy.
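To make the distinction concrete, here is a small sketch (process and get_cached are hypothetical names; I believe lent can borrow from a module-level variable like this, but treat it as untested):

proc process(data: seq[float]): float =
  data[0]       # parameters aren't copied; `process` can't mutate `data`

var cache = @[1.0, 2.0, 3.0]

proc get_cached: lent seq[float] =
  cache         # returned as a borrowed pointer: readable, not mutable

discard process(get_cached())  # no copies on the way in or out
let snapshot = get_cached()    # assignment, however, still makes a full copy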
Types like seq and string do it...
Actually, they don't. IMHO string especially should do COW, but there is not much agreement on the corresponding RFC.
More importantly: Nim does allow for COW to be implemented for custom types via the =hook mechanisms. (Which is yet another reason why these new features are excellent.)
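For illustration, a minimal sketch of what hand-rolled COW via hooks can look like (CowBuf and everything around it are hypothetical; assumes ARC/ORC with the pre-2.0 `=destroy(x: var T)` signature, single-threaded only):

type
  Payload = object
    refcount: int     # how many CowBufs share this payload
    data: seq[float]

  CowBuf = object
    p: ptr Payload

proc `=destroy`(b: var CowBuf) =
  if b.p != nil:
    dec b.p.refcount
    if b.p.refcount == 0:
      `=destroy`(b.p.data)
      dealloc(b.p)

proc `=copy`(dst: var CowBuf, src: CowBuf) =
  if dst.p == src.p: return    # self-assignment
  `=destroy`(dst)
  wasMoved(dst)
  dst.p = src.p
  if dst.p != nil:
    inc dst.p.refcount         # "copying" just shares the payload

proc init_cow_buf(data: sink seq[float]): CowBuf =
  result.p = cast[ptr Payload](alloc0(sizeof(Payload)))
  result.p.refcount = 1
  result.p.data = data

proc `[]`(b: CowBuf, i: int): float = b.p.data[i]

proc `[]=`(b: var CowBuf, i: int, v: float) =
  # the "write" half: clone the payload only if it's shared
  if b.p.refcount > 1:
    let old = b.p
    b.p = cast[ptr Payload](alloc0(sizeof(Payload)))
    b.p.refcount = 1
    b.p.data = old.data        # the one and only deep copy
    dec old.refcount
  b.p.data[i] = v

var a = init_cow_buf(@[1.0, 2.0, 3.0])
var b = a       # cheap: bumps the refcount, no data copy
b[0] = 42.0     # clones on first write; `a` is unaffected
assert a[0] == 1.0 and b[0] == 42.0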
Types like seq and string do it...
Actually they don't....
Yeah, it seems so. I made a similar benchmark for seq (for Table the results are similar). It's a couple of orders of magnitude faster with ref, i.e. with proc get_prices*(): ref seq[float] =.
const N = 100_000

proc load_prices_slow*(): seq[float] =
  for i in 1..N: result.add 1.0

var cached_stock_prices: ref seq[float] = nil

proc get_prices*(): seq[float] =
  if cached_stock_prices.is_nil:
    cached_stock_prices.new
    cached_stock_prices[] = load_prices_slow()
    echo "loaded"
  cached_stock_prices[]

# Usage - the `get_prices` function is called many times in different places.
echo "starting calculations"
for _ in 1..1000:
  for i in 1..100:
    discard get_prices()[i]
echo "done"
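For reference, the fast variant from that measurement returns the ref itself, so each call copies one pointer instead of N floats (get_prices_ref is just an illustrative name):

proc get_prices_ref*(): ref seq[float] =
  if cached_stock_prices.is_nil:
    cached_stock_prices.new
    cached_stock_prices[] = load_prices_slow()
  cached_stock_prices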
If your buffer is immutable during the time of computation, you can use a (copying) seq, but you should pass to the compute proc an openarray or ptr UncheckedArray so there is no GC or copy involved.
This is an example of a decently fast data structure for finance: https://github.com/mratsim/weave/blob/5034793/benchmarks/black_scholes/weave_black_scholes.nim#L36-L65
type
  OptionKind = enum
    Put
    Call

  OptionData[T: SomeFloat] = object
    spot: T       # Spot price
    strike: T     # Strike price
    riskfree: T   # Risk-free rate
    divrate: T    # Dividend rate
    vol: T        # Volatility
    expiry: T     # Expiry to maturity or option expiration in years
                  # (1 year = 1.0, 6 months = 0.5)
    kind: OptionKind
    divvals: T    # Dividend values (not used in this test)
    dgrefval: T   # DerivaGem reference value

  Context[T: SomeFloat] = object
    data: ptr UncheckedArray[OptionData[T]]
    prices: ptr UncheckedArray[T]
    numOptions: int
    numRuns: int
    otype: ptr UncheckedArray[OptionKind]
    spot: ptr UncheckedArray[T]
    strike: ptr UncheckedArray[T]
    riskFreeRate: ptr UncheckedArray[T]
    volatility: ptr UncheckedArray[T]
    expiry: ptr UncheckedArray[T]
    numErrors: int
Nowadays I would use {.experimental: "views".} and store openarrays but this was written a long time ago.
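As a small illustration of the openarray advice (sum_prices is a hypothetical compute proc):

proc sum_prices(prices: openArray[float]): float =
  for p in prices:
    result += p

var data = newSeq[float](100_000)             # owning, GC-managed storage
discard sum_prices(data)                      # an openArray is (pointer, length): no copy
discard sum_prices(data.toOpenArray(0, 999))  # a slice of it, still no copy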
If your buffer is immutable during the time of computation, you can use a (copying) seq, but you should pass to the compute proc an openarray or ptr UncheckedArray so there is no GC or copy involved.
Thanks, I'll keep that in mind and use it for some heavy simulations. But for ordinary day-to-day code, ref and {.shallow.} feel easier. :)
I watched the Move semantics for Nim talk.
It seems that part of the problem in the previous code was the mutable cache table, so the compiler was unable to optimise it with move/sink/lent.
I simplified the example and changed the cache to be immutable, so the compiler should be able to track the usage of the prices variable and optimise it. But the code is still slow. Manually marking get_prices with lent improves it.
import sequtils

let prices = (1..100_000).to_seq.map(proc (i: int): float = i.float)

proc get_prices: seq[float] = prices

for _ in 1..100_000:
  discard get_prices()[2]
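For reference, the lent change mentioned above is a one-line edit:

proc get_prices: lent seq[float] = prices  # borrows instead of copying on every call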
If you copy seqs around everywhere, how exactly is it a "1 to 1" conversion? Different languages have different idioms and code like (1..100_000).to_seq.map is simply bad, sorry.
I mean, the code on the surface looks almost identical to TypeScript (and if rewritten in, say, Kotlin or Ruby it would also be identical). How it actually works - whether values are copied or passed by reference - can be different.
... code like (1..100_000).to_seq.map is simply bad, sorry.
Hmmm, I have a different view on performance.
That code uses roughly 2x the memory and is maybe 1-4x slower than a plain for loop / iterators / macros etc. But it doesn't change the complexity class: it's O(n) now, and the fastest possible for-loop version would also be O(n).
Is it bad? For some low-level library functions or hot-spot code, yes, it would be.
Is it bad in our example? No - change it and the overall speed improves by maybe 1% or even less.
Is it bad for, say, 95% of the codebase of 95% of applications? I would say no. Make it faster and maybe in some really badly written software we'd notice some improvement, but most probably nobody would notice the difference.
Now let's look at the really bad code. The lack of copy-on-write turned code that on the surface appears to be O(n) into a totally different complexity class, O(n^2) - and it's hot-spot code.
proc get_prices: seq[float] = prices

for _ in 1..100_000:
  discard get_prices()[2]
What's the point of optimizing the 1) "non hot-spot, O(n)" code when we have 2) "hot-spot, O(n^2)" code?
If we consider performance, we should start with the second part - that's where the real problem is, part 2.
... In this situation, the significant amount of time spent is in the space complexity part ...
I rewrote the same code in Ruby; according to your statement it should be even slower, as Ruby has no memory-locality optimization at all. But the Ruby version is blazing fast, which suggests that it's not memory locality that causes the problem here.
$prices = (1..100_000).map { |i| i.to_f }

def get_prices() $prices; end

(1..100_000).each do
  get_prices()[2]
end