This is the program:
# I’m trying to sort by date a collection of tuples which can
# include records with the same date.
# Each date has a sequence of records - all with the same date and
# in insertion order (which is irrelevant for this application)
const strs = [
  "2014-04-01,456.23",
  "2014-04-01,123.23",
  "2013-05-02,234.15",
  "2011-06-02,534.15",
]
import tables
import strutils
import math
type
  Dat = tuple[date: string, amt: float]
  DatSeq = seq[Dat]
  DatTab = OrderedTable[string,DatSeq]
var datTab: DatTab = initOrderedTable[string,DatSeq](2048)
proc enter_data(date: string, amt: float) =
  let rec: Dat = (date, amt)
  if datTab.hasKeyOrPut(date,@[rec]):
    datTab[date].add(rec)
for line in strs:
  let fields: seq[string] = line.split(",")
  let date: string = fields[0]
  let amt: float = parseFloat(fields[1])
  enter_data(date,amt)
# datTab.sort(proc (x,y: DatSeq): int =
#   result = cmp(x[0].date, y[0].date) )
for key in datTab.keys:
  echo(key & " " & datTab[key].repr)
for key in datTab.keys:
  let st: seq[Dat] = datTab[key]
  echo(key & " " & $st.len)
2014-04-01 0x1034a80d0[[Field0 = 0x1034a7200"2014-04-01",
Field1 = 456.23], [Field0 = 0x1034a71d0"2014-04-01",
Field1 = 123.23]]
2013-05-02 0x1034a7500[[Field0 = 0x1034a7530"2013-05-02",
Field1 = 234.15]]
2011-06-02 0x1034a76e0[[Field0 = 0x1034a7710"2011-06-02",
Field1 = 534.15]]
2014-04-01 2
2013-05-02 1
2011-06-02 1
A couple of lines are commented out. They are my attempt at the correct sort syntax, but with the comments removed the program does not compile.
It would be good if the ordering behind the scenes in the OrderedTable would be in date order. The date in each record was used as a key when the record was inserted.
I don't want to change the structure of the program, or simplify it, because I left out a lot of lines that do sub processing on the data at various points.
Have fun.
As you seem to partly understand already, but may want confirmed: OrderedTable does not order the collection by the key type's sense of less than or greater than. It keeps entries in insertion order, and it preserves that order (rather than scrambling it non-deterministically as an ordinary table might). Indeed, OrderedTable never compares keys in a less-than/greater-than sense.
What you want is a key-ordered collection type, but the Nim standard library doesn't presently provide one. B-trees are often best for this (both in-memory and disk-backed), but there are probably at least a few binary search tree/red-black tree/AVL-tree/other balanced binary search tree or skip list implementations in Nim floating around. Maybe these keywords will help your search.
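One possible exception, if the keys are strings: the standard library's critbits module has CritBitTree, which iterates its keys in sorted (lexicographic) order. Below is a minimal sketch, assuming ISO-style date strings as keys as in the question; `byDate` and `enterData` are names made up for illustration, not anything from the original program.

```nim
import critbits

type
  Dat = tuple[date: string, amt: float]

# CritBitTree is keyed by string; iteration visits keys in sorted order.
var byDate: CritBitTree[seq[Dat]]

proc enterData(date: string, amt: float) =
  # Hypothetical helper: create an empty record list the first time
  # a date is seen, then append the record.
  if not byDate.hasKey(date):
    byDate[date] = newSeq[Dat]()
  byDate[date].add((date: date, amt: amt))

enterData("2014-04-01", 456.23)
enterData("2014-04-01", 123.23)
enterData("2013-05-02", 234.15)
enterData("2011-06-02", 534.15)

# Dates come out in ascending order, whatever the insertion order was.
for date, recs in byDate.pairs:
  echo date, " ", recs.len
```

This only helps because yyyy-mm-dd strings sort lexicographically in date order; other key types would still need one of the tree structures mentioned above.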
My understanding is that OrderedTable is what I want. I can see the ordering of the keys by doing:
for key in datTab.keys:
  echo key
Initially, insertion order is the order. This order can be changed by doing a sort as in:
datTab.sort(cmp)
which sorts the string type keys to alphabetical order.
However, this simple command only works for very simple VALUE types.
When I use a slightly more complex VALUE type, like a tuple, this simple command no longer compiles. This is odd, because the only things being sorted are the keys, which are string dates of the form 2016-06-12: easily sorted, no compound keys. The type of the value should not matter because it is the keys that are sorted.
Here is a kludge way of getting what I want:
# I’m trying to sort by date a collection of tuples which can
# include records with the same date.
# Each date has a sequence of records - all with the same date and
# in insertion order (which is irrelevant for this application)
const strs = [
  "2014-04-01,456.23",
  "2014-04-01,123.23",
  "2013-05-02,234.15",
  "2011-06-02,534.15",
]
import tables
import strutils
import math
type
  Dat = tuple[date: string, amt: float]
  DatSeq = seq[Dat]
  DatTab = OrderedTable[string,DatSeq]
var datTab: DatTab = initOrderedTable[string,DatSeq](2048)
proc enter_data(date: string, amt: float) =
  let rec: Dat = (date, amt)
  if datTab.hasKeyOrPut(date,@[rec]):
    datTab[date].add(rec)
for line in strs:
  let fields: seq[string] = line.split(",")
  let date: string = fields[0]
  let amt: float = parseFloat(fields[1])
  enter_data(date,amt)
echo "\noriginal insert order"
var tmp_keySeq: seq[string] = @[]
for key in datTab.keys:
  echo key
  tmp_keySeq.add(key)
# From Rosetta:
proc bubbleSort[T](a: var openarray[T]) =
  var t = true
  for n in countdown(a.len-2, 0):
    if not t: break
    t = false
    for j in 0..n:
      if a[j] <= a[j+1]: continue
      swap a[j], a[j+1]
      t = true
bubbleSort tmp_keySeq
# sort(tmp_keySeq, system.cmp)
echo "\nsorted order"
for ky in tmp_keySeq:
  echo ky
var tmp_datTab: DatTab = initOrderedTable[string,DatSeq](2048)
for ky in tmp_keySeq:
  tmp_datTab[ky] = datTab[ky]
# Whack old datTab with new better sorted one
datTab = tmp_datTab
echo "\n Now, check ordering"
for key in datTab.keys:
  echo(key & " " & datTab[key].repr)
for key in datTab.keys:
  let st: seq[Dat] = datTab[key]
  echo(key & " " & $st.len)
I shouldn't have to do this. My kludge requires double the Table space and an extra vector of keys and the time needed to copy.
The results are:
original insert order
2014-04-01
2013-05-02
2011-06-02
sorted order
2011-06-02
2013-05-02
2014-04-01
Now, check ordering
2011-06-02 0x106e52b60[[Field0 = 0x106e52b90"2011-06-02",
Field1 = 534.15]]
2013-05-02 0x106e52c80[[Field0 = 0x106e52cb0"2013-05-02",
Field1 = 234.15]]
2014-04-01 0x106e53210[[Field0 = 0x106e52bf0"2014-04-01",
Field1 = 456.23], [Field0 = 0x106e52c20"2014-04-01",
Field1 = 123.23]]
2011-06-02 1
2013-05-02 1
2014-04-01 2
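The second table in the kludge above can probably be avoided if the goal is only to visit records in date order. Here is a rough sketch that sorts a seq of keys with the standard library's `sort` from the algorithm module (the commented-out `sort(tmp_keySeq, system.cmp)` line most likely just needs `import algorithm`) and then looks each key up in the existing table. The small hard-coded table and the name `sortedKeys` are only for illustration.

```nim
import tables, algorithm

type
  Dat = tuple[date: string, amt: float]
  DatSeq = seq[Dat]

var datTab = initOrderedTable[string, DatSeq]()
datTab["2014-04-01"] = @[(date: "2014-04-01", amt: 456.23),
                         (date: "2014-04-01", amt: 123.23)]
datTab["2013-05-02"] = @[(date: "2013-05-02", amt: 234.15)]
datTab["2011-06-02"] = @[(date: "2011-06-02", amt: 534.15)]

# Collect the keys once, then sort them with the stdlib sort
# instead of a hand-written bubble sort.
var sortedKeys: seq[string] = @[]
for key in datTab.keys:
  sortedKeys.add(key)
sort(sortedKeys, system.cmp)

# Visit the table in key order without building a second table.
for key in sortedKeys:
  echo key, " ", datTab[key].len
```

This still costs one extra seq of keys, but not a copy of every record list; the table itself keeps its insertion order.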
Sorry, I can only understand parts of your intent...
let rec: Dat = (date, amt)
if datTab.hasKeyOrPut(date,@[rec]):
  datTab[date].add(rec)
You insert a seq containing the element rec, and then add the same element to the sequence again? And your hash key is identical to the first element of the tuple?
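For reference, as far as I can tell from the tables documentation, hasKeyOrPut returns true only when the key was already present (and then stores nothing), so the record should not end up in the sequence twice. A small sketch to check that, using a simplified table of floats rather than the Dat tuples; the last lines show mgetOrPut as one alternative spelling, which is my suggestion and not something from the original program.

```nim
import tables

var t = initOrderedTable[string, seq[float]]()

# First call: the key is absent, so the value is stored and false is
# returned; the body of the `if` is skipped.
if t.hasKeyOrPut("2014-04-01", @[456.23]):
  t["2014-04-01"].add(456.23)

# Second call: the key exists, so nothing is stored and true is
# returned; the record is appended in the `if` body instead.
if t.hasKeyOrPut("2014-04-01", @[123.23]):
  t["2014-04-01"].add(123.23)

echo t["2014-04-01"]   # two entries, no duplicate

# Alternative: fetch the existing seq (or insert an empty one) and append.
t.mgetOrPut("2013-05-02", newSeq[float]()).add(234.15)
echo t["2013-05-02"]
```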
Well, this one would compile:
const strs = [
  "2014-04-01,456.23",
  "2014-04-01,123.23",
  "2013-05-02,234.15",
  "2011-06-02,534.15",
]
import tables
import strutils
import math
type
  Dat = tuple[date: string, amt: float]
  DatSeq = seq[Dat]
  DatTab = OrderedTable[string,DatSeq]
var datTab: DatTab = initOrderedTable[string,DatSeq](2048)
proc enter_data(date: string, amt: float) =
  let rec: Dat = (date, amt)
  if datTab.hasKeyOrPut(date,@[rec]):
    datTab[date].add(rec)
for line in strs:
  let fields: seq[string] = line.split(",")
  let date: string = fields[0]
  let amt: float = parseFloat(fields[1])
  enter_data(date,amt)
sort(datTab, proc (x,y: (string, DatSeq)): int =
  result = cmp(x[0], y[0]) )
for key in datTab.keys:
  echo(key & " " & datTab[key].repr)
for key in datTab.keys:
  let st: seq[Dat] = datTab[key]
  echo(key & " " & $st.len)
2011-06-02 1
2013-05-02 1
2014-04-01 2
Or this one too:
sort(datTab, proc (x,y: (string, DatSeq)): int =
  result = cmp(x[1][0].date, y[1][0].date) )
Super! I used the second of your choices. The first choice seemed to be doing a (redundant) sort on every line of input.
Your solution still involves dipping into the values of each data record. It should be possible to sort only the keys.
Ok, how about this:
sort(datTab, proc (x,y: (string, DatSeq)): int =
  result = cmp(x[0], y[0]) )
No fooling around with contents of the values.
This also works:
datTab.sort(proc (x,y: (string, DatSeq)): int =
  result = cmp(x[0], y[0]) )
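A note on why plain `sort(cmp)` works for simple value types but not here: the comparator passed to an OrderedTable's sort receives (key, value) pairs, not bare keys. Presumably the generic `cmp` then needs `<` and `==` for the whole pair, which a (string, int) pair has but a pair holding a seq of tuples does not. The sketch below illustrates both cases; the tables `counts` and `groups` are made-up examples.

```nim
import tables

# Simple value type: the (string, int) pair can be compared as a whole,
# so the generic system.cmp can be passed directly.
var counts = {"b": 2, "a": 1, "c": 3}.toOrderedTable
counts.sort(system.cmp)

# seq value type: compare only the key, which is field 0 of each pair.
var groups = {"b": @[2.0], "a": @[1.0, 1.5]}.toOrderedTable
groups.sort(proc (x, y: (string, seq[float])): int =
  result = cmp(x[0], y[0]) )

for key in counts.keys: echo key
for key in groups.keys: echo key
```

Comparing `x[0]` and `y[0]` keeps the sort independent of the value type, which is what the key-only versions above do.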
The syntax of the sort command is sensitive to whitespace (or the lack of it).
For example, the command:
sesTab.sort(proc (x,y: (string,SesSeq)): int =
  result = cmp(x[0], y[0]) )
does not compile. However, putting a space after the comma in the type of x,y allows it to compile:
sesTab.sort(proc (x,y: (string, SesSeq)): int =
  result = cmp(x[0], y[0]) )
I accidentally left out the space and thought I should report it.
I'm surprised that problem is still there after 3 years. Somebody else could have stumbled over it.