Who's gonna be the first implement this new funnel hash in Nim and benchmark it?! ;)
I definitely want to, but I'm in the middle of several projects. This is the sort of algorithm I like seeing in Nim because it becomes so elegant and easy to read. The related Tiny Pointer paper also seems interesting, especially for "packed data".
Theres a nice youtube video from the discoverer Andrew Krapivin on YouTube.
asked deep seek to implement in python and then to rewrite to Tim
import std/[math, random, hashes, monotimes, sets]
type
FunnelHash* = object
data: seq[int] # Single array to store all elements
n: int # Total size of the array
delta: float # Parameter δ
alpha: int # α = O(log δ⁻¹)
beta: int # β = O(log δ⁻¹)
aPrimeSize: int
aAlphaPlusOneSize: int
aOffsets: seq[int] # Store offsets for each subarray
proc newFunnelHash*(n: int, delta: float): FunnelHash =
result.n = n
result.delta = delta
result.alpha = ceil(4 * ln(1 / delta) + 10).int
result.beta = ceil(2 * ln(1 / delta)).int
# Calculate sizes
result.aPrimeSize = n - floor(delta * n.float).int
result.aAlphaPlusOneSize = floor(3 * delta * n.float / 4).int
# Initialize the single data array
let totalSize = result.aPrimeSize + result.aAlphaPlusOneSize
result.data = newSeq[int](totalSize)
for i in 0..<totalSize:
result.data[i] = 0 # 0 represents None/empty slot
# Calculate and store offsets for each subarray
result.aOffsets = newSeq[int]()
var currentOffset = 0
var remainingSize = result.aPrimeSize
for i in 0..<result.alpha:
result.aOffsets.add(currentOffset)
let size = min(result.beta * (3 ^ i), remainingSize)
currentOffset += size
remainingSize -= size
proc hashKey(self: FunnelHash, key: int, arraySize: int): int =
# Simple hash function
return abs(hash(key) mod arraySize)
proc getSubarrayBounds(self: FunnelHash, arrayIndex: int): tuple[start: int, size: int] =
if arrayIndex >= self.alpha:
return (self.aPrimeSize, self.aAlphaPlusOneSize)
let start = self.aOffsets[arrayIndex]
let nextStart = if arrayIndex + 1 < self.aOffsets.len: self.aOffsets[arrayIndex + 1]
else: self.aPrimeSize
return (start, nextStart - start)
proc insert*(self: var FunnelHash, key: int): bool =
# Try inserting into A1, A2, ..., Aα
for i in 0..<self.alpha:
let bounds = self.getSubarrayBounds(i)
let subarraySize = bounds.size div self.beta
if subarraySize == 0: continue
let j = self.hashKey(key, subarraySize)
let start = bounds.start + j * self.beta
let endPos = min(start + self.beta, bounds.start + bounds.size)
# Check for an empty slot in the subarray
for slot in start..<endPos:
if self.data[slot] == 0:
self.data[slot] = key
return true
# If all A_i fail, insert into A_alpha_plus_1
let bounds = self.getSubarrayBounds(self.alpha)
for slot in bounds.start..<(bounds.start + bounds.size):
if self.data[slot] == 0:
self.data[slot] = key
return true
return false
proc search*(self: FunnelHash, key: int): bool =
# Search in A1, A2, ..., Aα
for i in 0..<self.alpha:
let bounds = self.getSubarrayBounds(i)
let subarraySize = bounds.size div self.beta
if subarraySize == 0: continue
let j = self.hashKey(key, subarraySize)
let start = bounds.start + j * self.beta
let endPos = min(start + self.beta, bounds.start + bounds.size)
# Check if the key is in the subarray
for slot in start..<endPos:
if self.data[slot] == key:
return true
# Search in A_alpha_plus_1
let bounds = self.getSubarrayBounds(self.alpha)
for slot in bounds.start..<(bounds.start + bounds.size):
if self.data[slot] == key:
return true
return false
var size = 32
for i in 1..<10:
var funnelHash = newFunnelHash(size, 0.1)
var start = getMonoTime()
for i in 1..<(size*9 div 10):
discard funnelHash.insert(rand(1..10000))
var time = getMonoTime() - start
echo "FunnelHash ", size, " ", time
size *= 2
size = 32
for i in 1..<10:
var t = initHashSet[int]()
var start = getMonoTime()
for i in 1..<(size*9 div 10):
t.incl(rand(1..10000))
var time = getMonoTime() - start
echo "HashSet ", size, " ", time
size *= 2
FunnelHash 32 (seconds: 0, nanosecond: 18916)
FunnelHash 64 (seconds: 0, nanosecond: 44125)
FunnelHash 128 (seconds: 0, nanosecond: 101875)
FunnelHash 256 (seconds: 0, nanosecond: 232459)
FunnelHash 512 (seconds: 0, nanosecond: 528959)
FunnelHash 1024 (seconds: 0, nanosecond: 1189250)
FunnelHash 2048 (seconds: 0, nanosecond: 3031250)
FunnelHash 4096 (seconds: 0, nanosecond: 6688500)
FunnelHash 8192 (seconds: 0, nanosecond: 12776959)
HashSet 32 (seconds: 0, nanosecond: 8458)
HashSet 64 (seconds: 0, nanosecond: 16292)
HashSet 128 (seconds: 0, nanosecond: 46375)
HashSet 256 (seconds: 0, nanosecond: 102750)
HashSet 512 (seconds: 0, nanosecond: 218917)
HashSet 1024 (seconds: 0, nanosecond: 434167)
HashSet 2048 (seconds: 0, nanosecond: 870209)
HashSet 4096 (seconds: 0, nanosecond: 1743417)
HashSet 8192 (seconds: 0, nanosecond: 2992375)
strange algorithm, it is hard to select alpha, beta and delta values
and exponential division of arrays can't have exactly alpha members... but I don't know if I select different A` division won't it disprove that we safely can insert N-/delta*N items
disclaimer: I haven't read the paper, so maybe it's entirely wrong and hallucinating. Here is a claude 3.5 sonnet attempt:
# funnel_paper.nim
import std/[math, random, hashes, options]
import countit
type
Slot[K,V] = object
key: K
val: V
occupied: bool
Subarray[K,V] = seq[Slot[K,V]]
Array[K,V] = seq[Subarray[K,V]]
FunnelTable*[K,V] = object
arrays: seq[Array[K,V]] # A1 through Aα
specialB: seq[Slot[K,V]] # Part B of Aα+1
specialC: seq[seq[Slot[K,V]]] # Part C of Aα+1 (two-choice with buckets)
count: int
alpha: int # ⌈4 log(1/δ) + 10⌉
beta: int # ⌈2 log(1/δ)⌉
delta: float # Load factor parameter
size: int # Total size
proc nextPowerOfTwoFunnel(n: int): int =
result = 1
while result <= n: result = result shl 1
if result > n: result = result shr 1
if result < 8: result = 8
proc calcArraySizes(totalSize: int, beta: int): seq[int] =
var remaining = totalSize
result = @[]
var currentSize = remaining div 2
while currentSize >= beta:
result.add(currentSize)
currentSize = (currentSize * 3) div 4
remaining -= currentSize
proc hash1[K](key: K, size: int): int =
if size <= 0: 0
else: abs(hash(key)) mod size
proc hash2[K](key: K, size: int): int =
if size <= 0: 0
else: abs(hash(hash(key))) mod size
proc initFunnelTable*[K,V](initialSize: int = 64): FunnelTable[K,V] =
let size = nextPowerOfTwoFunnel(max(initialSize, 8))
let delta = 0.1 # As per paper's typical value
# Calculate parameters as per paper
let alpha = ceil(4 * ln(1/delta) / ln(2.0) + 10).int
let beta = ceil(2 * ln(1/delta) / ln(2.0)).int
result = FunnelTable[K,V](
size: size,
delta: delta,
count: 0,
alpha: alpha,
beta: beta
)
# Initialize main arrays
let mainArrays = calcArraySizes(size - int(delta * float(size)), beta)
result.arrays = newSeq[Array[K,V]](alpha)
for i in 0..<min(alpha, mainArrays.len):
let numSubarrays = max(1, mainArrays[i] div beta)
var array = newSeq[Subarray[K,V]](numSubarrays)
for j in 0..<numSubarrays:
array[j] = newSeq[Slot[K,V]](beta)
result.arrays[i] = array
# Initialize special array parts
let specialSize = int(delta * float(size))
let loglogn = max(2, ceil(ln(ln(float(size)))).int)
# Part B: uniform probing array
result.specialB = newSeq[Slot[K,V]](specialSize div 2)
# Part C: two-choice array with buckets
let bucketSize = 2 * loglogn
let numBuckets = max(2, (specialSize div 2) div bucketSize)
result.specialC = newSeq[seq[Slot[K,V]]](numBuckets)
for i in 0..<numBuckets:
result.specialC[i] = newSeq[Slot[K,V]](bucketSize)
proc tryInsertArray[K,V](array: var Array[K,V], key: K, val: V): bool =
if array.len == 0: return false
# Step 1: Hash key to get subarray index
let subarrayIdx = hash1(key, array.len)
if subarrayIdx >= array.len: return false
# Step 2 & 3: Check slots in subarray
for slot in array[subarrayIdx].mitems:
if not slot.occupied:
slot = Slot[K,V](key: key, val: val, occupied: true)
return true
elif slot.key == key:
slot.val = val
return true
return false
proc trySpecialB[K,V](t: var FunnelTable[K,V], key: K, val: V): bool =
if t.specialB.len == 0: return false
# Try log log n uniform probes
let loglogn = max(2, ceil(ln(ln(float(t.size)))).int)
for i in 0..<loglogn:
let pos = hash1(key, t.specialB.len)
if not t.specialB[pos].occupied:
t.specialB[pos] = Slot[K,V](key: key, val: val, occupied: true)
return true
elif t.specialB[pos].key == key:
t.specialB[pos].val = val
return true
return false
proc trySpecialC[K,V](t: var FunnelTable[K,V], key: K, val: V): bool =
if t.specialC.len == 0: return false
# Two-choice hashing
let h1 = hash1(key, t.specialC.len)
let h2 = hash2(key, t.specialC.len)
if h1 >= t.specialC.len or h2 >= t.specialC.len:
return false
# Choose less full bucket
let load1 = t.specialC[h1].countIt(it.occupied)
let load2 = t.specialC[h2].countIt(it.occupied)
let bucket = if load1 <= load2: h1 else: h2
# Try to insert
for slot in t.specialC[bucket].mitems:
if not slot.occupied:
slot = Slot[K,V](key: key, val: val, occupied: true)
return true
elif slot.key == key:
slot.val = val
return true
return false
proc resize[K,V](t: var FunnelTable[K,V]) =
var entries: seq[(K,V)] = @[]
# Collect all entries
for array in t.arrays:
for subarray in array:
for slot in subarray:
if slot.occupied:
entries.add((slot.key, slot.val))
for slot in t.specialB:
if slot.occupied:
entries.add((slot.key, slot.val))
for bucket in t.specialC:
for slot in bucket:
if slot.occupied:
entries.add((slot.key, slot.val))
# Create new table
t = initFunnelTable[K,V](t.size * 2)
# Reinsert all entries in original order
for (k, v) in entries:
# Use direct insertion to avoid recursive resizing
block insertion:
for array in t.arrays.mitems:
if tryInsertArray(array, k, v):
inc t.count
break insertion
if trySpecialB(t, k, v):
inc t.count
break insertion
if trySpecialC(t, k, v):
inc t.count
break insertion
proc `[]=`*[K,V](t: var FunnelTable[K,V], key: K, val: V) =
# First try to update existing key
block findExisting:
# Check main arrays
for array in t.arrays.mitems:
let subarrayIdx = hash1(key, array.len)
if subarrayIdx >= array.len: continue
for slot in array[subarrayIdx].mitems:
if slot.occupied and slot.key == key:
slot.val = val # Update existing value
return
# Check special array B
for slot in t.specialB.mitems:
if slot.occupied and slot.key == key:
slot.val = val
return
# Check special array C
let h1 = hash1(key, t.specialC.len)
let h2 = hash2(key, t.specialC.len)
for h in [h1, h2]:
if h >= t.specialC.len: continue
for slot in t.specialC[h].mitems:
if slot.occupied and slot.key == key:
slot.val = val
return
# Key doesn't exist, proceed with insertion
# Check load factor
if t.count >= int(0.9 * float(t.size)):
t.resize()
# Try each array in sequence as per paper
for array in t.arrays.mitems:
if tryInsertArray(array, key, val):
inc t.count
return
# Try special arrays
if trySpecialB(t, key, val):
inc t.count
return
if trySpecialC(t, key, val):
inc t.count
return
# If everything fails, resize and retry
t.resize()
t[key] = val
proc searchArrays[K,V](t: FunnelTable[K,V], key: K): Option[V] =
for array in t.arrays:
let subarrayIdx = hash1(key, array.len)
if subarrayIdx >= array.len: continue
for slot in array[subarrayIdx]:
if slot.occupied and slot.key == key:
return some(slot.val)
none(V)
proc searchSpecialB[K,V](t: FunnelTable[K,V], key: K): Option[V] =
let loglogn = max(2, ceil(ln(ln(float(t.size)))).int)
for i in 0..<loglogn:
let pos = hash1(key, t.specialB.len)
if pos < t.specialB.len and t.specialB[pos].occupied and t.specialB[pos].key == key:
return some(t.specialB[pos].val)
none(V)
proc searchSpecialC[K,V](t: FunnelTable[K,V], key: K): Option[V] =
if t.specialC.len == 0: return none(V)
let h1 = hash1(key, t.specialC.len)
let h2 = hash2(key, t.specialC.len)
if h1 >= t.specialC.len and h2 >= t.specialC.len:
return none(V)
# Check both buckets
for h in [h1, h2]:
if h >= t.specialC.len: continue
for slot in t.specialC[h]:
if slot.occupied and slot.key == key:
return some(slot.val)
none(V)
proc `[]`*[K,V](t: FunnelTable[K,V], key: K): V =
# Search main arrays
let mainResult = searchArrays(t, key)
if mainResult.isSome:
return mainResult.get
# Search special array B
let specialBResult = searchSpecialB(t, key)
if specialBResult.isSome:
return specialBResult.get
# Search special array C
let specialCResult = searchSpecialC(t, key)
if specialCResult.isSome:
return specialCResult.get
raise newException(KeyError, "Key not found: " & $key)
proc contains*[K,V](t: FunnelTable[K,V], key: K): bool =
try:
discard t[key]
true
except KeyError:
false
proc len*[K,V](t: FunnelTable[K,V]): int = t.count
proc clear*[K,V](t: var FunnelTable[K,V]) =
t = initFunnelTable[K,V](64)
iterator pairs*[K,V](t: FunnelTable[K,V]): (K,V) =
for array in t.arrays:
for subarray in array:
for slot in subarray:
if slot.occupied:
yield (slot.key, slot.val)
for slot in t.specialB:
if slot.occupied:
yield (slot.key, slot.val)
for bucket in t.specialC:
for slot in bucket:
if slot.occupied:
yield (slot.key, slot.val)
when isMainModule:
# Basic tests
var t = initFunnelTable[string, int]()
t["one"] = 1
t["two"] = 2
assert t["one"] == 1
assert t["two"] == 2
assert t.len == 2
assert "one" in t
assert "three" notin t
echo "Basic tests passed!"
(A side though) Seems like the Declarative Programming is finally here. I think there's 90% that in next 1-5 years all programs will look like code below. And the idea to write implementation will be as absurd as writing in assembler today.
You define the code, and the implementation is just a derivative artefact, same like build step or fetching dependencies. There probably will be also step when AI generate bunch of tests for each function, and you review some of it quickly to confirm its correctness.
type FunnelHash* = object
len: int
proc new*(_type: FunnelHash, n: int): FunnelHash =
Create a new FunnelHash object with n elements
proc set[K, V]*(self: var FunnelHash[K, V], k: K, v: V) =
Insert a key into the FunnelHash object
proc get[K, V]*(self: FunnelHash[K, V], k: K): V =
Get the value of a key in the FunnelHash object
P.S. If there will be any programming at all...
Nitpick on the syntax :P
Actually you can basically do similar with one of the LLM IDEs nowadays. You could probably make the below work today sorta. Though probably manually generating and modifying tests. A friend of mine has been running DeepSeek locally using ollama on a macbook at a few tokens/sec. Just add an agent to generate code, compile, and test it iteratively.
I think a lot of programming will move to be more like writing "contracts" or "theorems" in the form of tests, requirements, contexts. It could actually be kinda fun in that you choose which code you write by hand. Sorta like we do with assembly currently.
type FunnelHash* = object
len: int
deepClaudeChatContext("Details of data structure and algorithm described in ./docs/funnel-hash.pdf")
proc new*(_type: FunnelHash, n: int): FunnelHash =
deepClaudeChatImpl("Create a new FunnelHash object with n elements")
...