Two versions of an LRU cache: the first one cobbled together from bits and pieces from the web a few years ago, the second one from this morning, when I thought: why not use OrderedTable for this? Now I don't know whether I should be disappointed, whether I missed something essential, or whether it was just a bad choice. The OrderedTable one is rather slow.
Any insight is appreciated.
Benchmark: rendering 512x512 cells of Worley noise, caching nearby neighbour feature points:
cache size    lrucache1 time (s)    lrucache2 time (s)
         1           0.992988300           1.065394100
         2           0.948481100           1.096497500
         3           0.974308700           1.098379700
         4           0.956377000           1.172149600
         5           0.973190400           1.224398200
        10           0.461624800           1.188621100
        20           0.468986500           1.604642700
        50           0.468143900           3.018123300
       100           0.473003100           5.737862600
      1000           0.481897400           7.992100700
      5000           0.470832200           7.604643100
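(For context, here is a sketch of the kind of timing loop that could produce numbers in this shape; the Worley noise renderer is replaced by a synthetic access pattern, and the module name, key scheme and use of getMonoTime are assumptions, not the actual benchmark code.)
---%<------%<------%<---
import times, monotimes
import lrucache1          # assumed module name for the first version

# Synthetic stand-in for the Worley noise pass: neighbouring cells map to
# the same key, so nearby lookups hit the cache.
proc bench(capacity: int) =
  let cache = newLRUCache[int, float](capacity)
  let t0 = getMonoTime()
  for y in 0 ..< 512:
    for x in 0 ..< 512:
      let key = (x div 8) * 64 + (y div 8)
      let (_, found) = cache.get(key)
      if not found:
        cache.put(key, float(key) * 0.5)   # placeholder "feature point"
  echo capacity, ": ", getMonoTime() - t0

for size in [1, 2, 3, 4, 5, 10, 20, 50, 100, 1000, 5000]:
  bench(size)
---%<------%<------%<---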
lrucache1.nim ---%<------%<------%<---
import tables, lists

type
  LRUCache*[K, V] = ref object
    capacity: int
    cache: Table[K, V]
    use: DoublyLinkedList[K]                # Track use order
    nodeMap: Table[K, DoublyLinkedNode[K]]  # Maps keys to list nodes

proc newLRUCache*[K, V](capacity: int): LRUCache[K, V] =
  result.new()
  result.capacity = capacity
  result.cache = initTable[K, V]()
  result.use = initDoublyLinkedList[K]()
  result.nodeMap = initTable[K, DoublyLinkedNode[K]]()

proc get*[K, V](cache: LRUCache[K, V], key: K): tuple[value: V, found: bool] =
  if cache.cache.hasKey(key):
    let node = cache.nodeMap[key]
    # Update use, move to front of list
    cache.use.remove(node)
    cache.use.prepend(node)
    return (cache.cache[key], true)
  else:
    var defaultVal: V
    return (defaultVal, false)

proc put*[K, V](cache: LRUCache[K, V], key: K, value: V) =
  # If key already exists, update it and move to front
  if cache.cache.hasKey(key):
    cache.cache[key] = value
    let node = cache.nodeMap[key]
    cache.use.remove(node)
    cache.use.prepend(node)
    return
  # Prune
  if cache.cache.len >= cache.capacity:
    let lruKey = cache.use.tail.value
    cache.cache.del(lruKey)
    cache.nodeMap.del(lruKey)
    cache.use.remove(cache.use.tail)
  cache.cache[key] = value
  let node = newDoublyLinkedNode[K](key)
  cache.use.prepend(node)
  cache.nodeMap[key] = node
---%<------%<------%<---
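Both versions expose the same API, so usage looks like this (a minimal sketch outside the benchmark; the module name and the key/value types are arbitrary):
---%<------%<------%<---
import lrucache1   # assumed module name for the first version

let cache = newLRUCache[string, int](capacity = 2)
cache.put("a", 1)
cache.put("b", 2)
discard cache.get("a")   # "a" is now the most recently used
cache.put("c", 3)        # evicts "b", the least recently used key
echo cache.get("b")      # (value: 0, found: false)
echo cache.get("a")      # (value: 1, found: true)
---%<------%<------%<---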
lrucache2.nim ---%<------%<------%<---
include tables

type
  LRUCache*[K, V] = ref object
    capacity: int
    cache: OrderedTable[K, V]

proc newLRUCache*[K, V](capacity: int): LRUCache[K, V] =
  new(result)
  result.capacity = capacity
  result.cache = initOrderedTable[K, V]()

proc get*[K, V](cache: LRUCache[K, V], key: K): tuple[value: V, found: bool] =
  if cache.cache.hasKey(key):
    let value = cache.cache[key]
    # Move to most recently used position
    cache.cache.del(key)
    cache.cache[key] = value
    return (value, true)
  else:
    var defaultVal: V
    return (defaultVal, false)

proc put*[K, V](cache: LRUCache[K, V], key: K, value: V) =
  # If key exists, remove it first so it gets added at the end
  if cache.cache.hasKey(key):
    cache.cache.del(key)
  # If at capacity, remove the oldest item, first in ordered table
  if cache.cache.len >= cache.capacity: # and cache.cache.len > 0:
    let oldestKey = cache.cache.data[cache.cache.first].key
    cache.cache.del(oldestKey)
  # (re)add item
  cache.cache[key] = value
---%<------%<------%<---

FWIW both versions look terrible to me. ;-)
Use sink and lent annotations, and use default(T) for clarity. Use object construction instead of new.
But yeah, you probably need a real algorithm, not some hack on top of OrderedTable, which wasn't designed for this: OrderedTable.del is O(n), and lrucache2 pays that cost on every cache hit and on every insert at capacity, which is why it gets slower as the cache grows.
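Roughly, applied to lrucache1 (untested, just an illustration of those suggestions; lent is left out because get returns a copy here anyway):
---%<------%<------%<---
import tables, lists

type
  LRUCache*[K, V] = ref object
    capacity: int
    cache: Table[K, V]
    use: DoublyLinkedList[K]
    nodeMap: Table[K, DoublyLinkedNode[K]]

proc newLRUCache*[K, V](capacity: int): LRUCache[K, V] =
  # Object construction instead of new() plus field-by-field assignment.
  LRUCache[K, V](capacity: capacity,
                 cache: initTable[K, V](),
                 use: initDoublyLinkedList[K](),
                 nodeMap: initTable[K, DoublyLinkedNode[K]]())

proc get*[K, V](cache: LRUCache[K, V], key: K): tuple[value: V, found: bool] =
  if key in cache.cache:
    let node = cache.nodeMap[key]
    cache.use.remove(node)
    cache.use.prepend(node)
    return (cache.cache[key], true)
  else:
    # default(V) instead of an uninitialized var
    return (default(V), false)

proc put*[K, V](cache: LRUCache[K, V], key: K, value: sink V) =
  # sink lets the caller move the value in instead of copying it.
  if key in cache.cache:
    cache.cache[key] = value
    let node = cache.nodeMap[key]
    cache.use.remove(node)
    cache.use.prepend(node)
    return
  if cache.cache.len >= cache.capacity:
    let lruKey = cache.use.tail.value
    cache.cache.del(lruKey)
    cache.nodeMap.del(lruKey)
    cache.use.remove(cache.use.tail)
  cache.cache[key] = value
  let node = newDoublyLinkedNode[K](key)
  cache.use.prepend(node)
  cache.nodeMap[key] = node
---%<------%<------%<---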
Hi ingo,
your algo is the by-the-book version of an LRU cache. It's hard to make concurrent because of the doubly linked list. A more modern approach was presented by Hiroshi Inoue under the name Multi-step LRU. It uses SIMD and should perform much better. I did a Nim version for 64-bit keys and values, but it can be adapted to whatever you want to put into the cache.
greets Bosinski
Use sink [...]
I added sink to the K's and V's and to capacity: int. On the simplest noise, which is influenced more heavily by the cache, it is ~6% slower with sink. Is that to be expected?
@Bosinski I'll have a look at that. Thanks.