nimforum mirror - How to keep a ordered table sorted with the most recent changed elements at the end?

mrhdias (orginal) [2020-10-15T19:34:39+02:00] view original

Each time an element of ordered table is changed, it is added to the bottom of the ordered table, so that the oldest element is at the beginning and the most recent at the end. New elements are always added to the end of the list.

The following example illustrates what I want:

import tables
import times
import random
import os

randomize()

let numbers = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
var pool = newOrderedTable[string, DateTime]()

# Pool initialization
for k in numbers:
  pool[k] = now()

proc add2bottom(id: string, newtime: DateTime) =
  var tmp: DateTime
  if pool.pop(id, tmp): # deletes the key from the ordered table
    tmp = newtime # set the new value to tmp variable
    pool[id] = tmp # add the new key, value to the end of ordered table

while true:
  let idx = rand(9)
  echo "Key to add to bottom of ordered table: ", numbers[idx]
  add2bottom(numbers[idx], now())
  echo pool
  sleep(2000)

Is there a way to put the most recent elements at the bottom of the ordered table without deleting and add them again?

shirleyquirk (orginal) [2020-10-15T20:07:59+02:00] view original

short answer no, and this is O(n), you probably want an LRU cache like this

cblake (orginal) [2020-10-15T20:34:31+02:00] view original

I agree with @shirleyquirk, but would point out that the linked-to implementation seems to store 2 copies of every Key,Value pair - one in the Node[K,T] of the list and one in the Table. This may be a problem if this Key,Value data becomes large or memory economy matters. An "intrusive" doubly linked list (one where the list procs only manipulate prev-next pointers) could yield more efficiency. I believe @Araq made rumblings that such intrusive singly/doubly linked implementations might be welcome in the stdlib if someone is looking for a project.

mrhdias (orginal) [2020-10-15T22:41:31+02:00] view original

The ordered tables works fine, but I don't know if as impact in performance if I delete and add again large structures. The idea is add new keys and the changed key values to the tail. After some time the oldest keys are deleted by order, begin at the head of table.

cblake (orginal) [2020-10-15T23:15:28+02:00] view original

As @shirleyquirk mentioned std/tables.del(OrderedTable) is O(n). That means O(theTable.len) per delete operation. So, if you are putting many objects in the OrderedTable then just the deletion will become slow. adix/lptabz.nim has a more efficient delete while retaining insert-order iteration. { In fact, if you want an Oldest-Inserted-is-the-victim instead of Least-Recently-Used-is-the-victim cache replacement policy then you could probably get that efficiently with that lptabz-style of ordered table. I'm not advocating that replacement policy, though. }

If you don't use that linked to LRU or something similar then as you get a table with many elements it will become expensive to find the key of the victim to delete from the table even if you use either the regular std/tables.OrderedTable or my adix/lptabz.

shirleyquirk (orginal) [2020-10-16T02:49:27+02:00] view original

Building on what @cblake said about the costs of popping (or deleting) a value from an ordered table, with regards to the cost of adding a large object to a table:

If your big objects are the keys, the hash function could get expensive.

It's easy to move values into a table without a copy:

myTable[key]=move(bigValue)

but if you move your key into the table, it won't exist anymore, so you won't be able to get your value back out again. (Without recreating the big object)

Alternatively, use ref objects, which are cheap to copy about, as you're just passing pointers around, though calculating their hash could still be expensive.

mrhdias (orginal) [2020-10-16T12:18:12+02:00] view original

Thanks @shirleyquirk. The big objects are not the keys.

I use an OrderedTableRef.

pool.pop(key, tmp): # deletes the key from the ordered table
tmp = newobj        # set the new value to tmp variable
pool[key] = tmp     # add the same key, value to the end of ordered table

If I copy to a temporary var then deleting and adding again have an impact on performance if I have a table with many keys?

cblake (orginal) [2020-10-16T12:48:09+02:00] view original

Copying always has some impact on performance, but that probably isn't your big problem if you are worried about many entries with small keys. As I & @shirleyquirk both mentioned, if you have many keys/entries in your OrderedTable then that keyed pop or a del is going to be slow - if you look at lib/pure/collections/tables.nim you will see that it basically rebuilds the whole table. A while back, there was some discussion to move to a doubly-linked list inside OrderedTable.

It sounds like you may be just learning about how to implement this idea efficiently. The core issue is that there are "two orders" in play: the hash order to make finding by key fast and the doubly-linked list order to make finding by least-recent-ness and re-ordering by recentness fast. I don't know your full problem context, but from what you have said so far, the "insert order" that OrderedTable maintains is likely not helping you. People also do not (usually) query real time with your now() for these things, but simply have some operations update the "victimhood order". (A "victim" here is what you delete when something new gets inserted after the data structure has filled up.)

You really should just use lrucache as @shirleyquirk mentioned. It's under 200 lines of code. If you don't want to depend upon it with nimble then you can use it by copying the module & removing what you don't need. Fully understanding it seems like it is key to you moving forward.

Also, if the value type (the 2nd type slot in *Table[A,B]) is big, as seems to maybe be a concern, then you want to use a ref type for that type (not just the OrderedTableRef you mention).

cblake (orginal) [2020-10-16T13:09:18+02:00] view original

Here is a slightly longer article with Python code instead of Java. You could also just port that to Nim yourself. Since it inlines the doubly linked list logic it does not duplicate keys.

shirleyquirk (orginal) [2020-10-16T14:06:31+02:00] view original

Please listen to @cblake, they preach the truth. Big O rules our lives, heed her mighty warnings

cblake (orginal) [2020-10-16T15:33:49+02:00] view original

Lol. Big O sure can be nasty. OTOH, over-reliance can neglect important constant factors or same-level overheads. Big-O is a good starting point, but only a starting point if you want to fully understand performance. Knuth's TAOCP's frequent use of exact time formulas always struck me as the best way to go as in cycles or nanosec = A + B*N + C*N^2 + .., etc. Maybe some log N's in there, etc. Time is additive (with certain assumptions) in a way that rates (like items/second) are not.

Unlike in the 1960s when TAOCP was mostly written, modern multi-tasking (and multi-everything else) create a lot of noise. So, you may need a regression to recover such forms (a good one would give error estimates on the A, B, .. above). You may even need 3..5 regressions over separate memory hierarchy regimes. A chart/graph starts to seem a better summary, but people often crave single numbers.

Anyway, @mrhdias - if you do port that Python example then you could have a more attractive package than lrucache and you could publish it to the Nim packages repo!

jackhftang (orginal) [2020-10-20T15:02:38+02:00] view original

The python and java version are basically the same as current implementation of lrucache. Except that the python one restrict the key and value to int. The python version also has two copy of key: one copy of key is hidden inside LRUCache.node_lookup and another copy of key is at Node.node_key.

If a single copy of key is a must, the doubly-linked list node need to replace the key field with a reference to internal node of table and that node keeps the key. For keys that have the same size as pointer, the space usage is the same. The performance gain depends on the actual implementation of table, but it should only save one table lookup per deletion of doubly-linked list node.

Mirror of forum.nim-lang.org

6931 :: How to keep a ordered table sorted with the most recent changed elements at the end?