nimforum mirror - Funnel Hash?

elcritch (orginal) [2025-02-11T08:43:31+01:00] view original

Who's gonna be the first implement this new funnel hash in Nim and benchmark it?! ;)

I definitely want to, but I'm in the middle of several projects. This is the sort of algorithm I like seeing in Nim because it becomes so elegant and easy to read. The related Tiny Pointer paper also seems interesting, especially for "packed data".

Theres a nice youtube video from the discoverer Andrew Krapivin on YouTube.

PMunch (orginal) [2025-02-11T11:02:47+01:00] view original

I was looking at this over breakfast today, seems interesting. But I don't think I'll find the time to do it myself unless I get properly nerd-sniped.

elcritch (orginal) [2025-02-12T04:52:46+01:00] view original

Same here, but it looks so simple to implement.. Maybe o3-mini could do it. Must resist. ;)

blackmius (orginal) [2025-02-12T05:05:44+01:00] view original

asked deep seek to implement in python and then to rewrite to Tim


import std/[math, random, hashes, monotimes, sets]

type
  FunnelHash* = object
    data: seq[int]  # Single array to store all elements
    n: int  # Total size of the array
    delta: float  # Parameter δ
    alpha: int  # α = O(log δ⁻¹)
    beta: int   # β = O(log δ⁻¹)
    aPrimeSize: int
    aAlphaPlusOneSize: int
    aOffsets: seq[int]  # Store offsets for each subarray

proc newFunnelHash*(n: int, delta: float): FunnelHash =
  result.n = n
  result.delta = delta
  result.alpha = ceil(4 * ln(1 / delta) + 10).int
  result.beta = ceil(2 * ln(1 / delta)).int
  
  # Calculate sizes
  result.aPrimeSize = n - floor(delta * n.float).int
  result.aAlphaPlusOneSize = floor(3 * delta * n.float / 4).int
  
  # Initialize the single data array
  let totalSize = result.aPrimeSize + result.aAlphaPlusOneSize
  result.data = newSeq[int](totalSize)
  for i in 0..<totalSize:
    result.data[i] = 0  # 0 represents None/empty slot
  
  # Calculate and store offsets for each subarray
  result.aOffsets = newSeq[int]()
  var currentOffset = 0
  var remainingSize = result.aPrimeSize
  
  for i in 0..<result.alpha:
    result.aOffsets.add(currentOffset)
    let size = min(result.beta * (3 ^ i), remainingSize)
    currentOffset += size
    remainingSize -= size

proc hashKey(self: FunnelHash, key: int, arraySize: int): int =
  # Simple hash function
  return abs(hash(key) mod arraySize)

proc getSubarrayBounds(self: FunnelHash, arrayIndex: int): tuple[start: int, size: int] =
  if arrayIndex >= self.alpha:
    return (self.aPrimeSize, self.aAlphaPlusOneSize)
  
  let start = self.aOffsets[arrayIndex]
  let nextStart = if arrayIndex + 1 < self.aOffsets.len: self.aOffsets[arrayIndex + 1]
                 else: self.aPrimeSize
  return (start, nextStart - start)

proc insert*(self: var FunnelHash, key: int): bool =
  # Try inserting into A1, A2, ..., Aα
  for i in 0..<self.alpha:
    let bounds = self.getSubarrayBounds(i)
    let subarraySize = bounds.size div self.beta
    if subarraySize == 0: continue
    
    let j = self.hashKey(key, subarraySize)
    let start = bounds.start + j * self.beta
    let endPos = min(start + self.beta, bounds.start + bounds.size)
    
    # Check for an empty slot in the subarray
    for slot in start..<endPos:
      if self.data[slot] == 0:
        self.data[slot] = key
        return true
  
  # If all A_i fail, insert into A_alpha_plus_1
  let bounds = self.getSubarrayBounds(self.alpha)
  for slot in bounds.start..<(bounds.start + bounds.size):
    if self.data[slot] == 0:
      self.data[slot] = key
      return true
  
  return false

proc search*(self: FunnelHash, key: int): bool =
  # Search in A1, A2, ..., Aα
  for i in 0..<self.alpha:
    let bounds = self.getSubarrayBounds(i)
    let subarraySize = bounds.size div self.beta
    if subarraySize == 0: continue
    
    let j = self.hashKey(key, subarraySize)
    let start = bounds.start + j * self.beta
    let endPos = min(start + self.beta, bounds.start + bounds.size)
    
    # Check if the key is in the subarray
    for slot in start..<endPos:
      if self.data[slot] == key:
        return true
  
  # Search in A_alpha_plus_1
  let bounds = self.getSubarrayBounds(self.alpha)
  for slot in bounds.start..<(bounds.start + bounds.size):
    if self.data[slot] == key:
      return true
  
  return false


var size = 32
for i in 1..<10:
  var funnelHash = newFunnelHash(size, 0.1)
  var start = getMonoTime()
  for i in 1..<(size*9 div 10):
    discard funnelHash.insert(rand(1..10000))
  var time = getMonoTime() - start
  echo "FunnelHash ", size, " ", time
  size *= 2

size = 32
for i in 1..<10:
  var t = initHashSet[int]()
  var start = getMonoTime()
  for i in 1..<(size*9 div 10):
    t.incl(rand(1..10000))
  var time = getMonoTime() - start
  echo "HashSet ", size, " ", time
  size *= 2


FunnelHash 32 (seconds: 0, nanosecond: 18916)
FunnelHash 64 (seconds: 0, nanosecond: 44125)
FunnelHash 128 (seconds: 0, nanosecond: 101875)
FunnelHash 256 (seconds: 0, nanosecond: 232459)
FunnelHash 512 (seconds: 0, nanosecond: 528959)
FunnelHash 1024 (seconds: 0, nanosecond: 1189250)
FunnelHash 2048 (seconds: 0, nanosecond: 3031250)
FunnelHash 4096 (seconds: 0, nanosecond: 6688500)
FunnelHash 8192 (seconds: 0, nanosecond: 12776959)
HashSet 32 (seconds: 0, nanosecond: 8458)
HashSet 64 (seconds: 0, nanosecond: 16292)
HashSet 128 (seconds: 0, nanosecond: 46375)
HashSet 256 (seconds: 0, nanosecond: 102750)
HashSet 512 (seconds: 0, nanosecond: 218917)
HashSet 1024 (seconds: 0, nanosecond: 434167)
HashSet 2048 (seconds: 0, nanosecond: 870209)
HashSet 4096 (seconds: 0, nanosecond: 1743417)
HashSet 8192 (seconds: 0, nanosecond: 2992375)

blackmius (orginal) [2025-02-12T08:05:59+01:00] view original

strange algorithm, it is hard to select alpha, beta and delta values

and exponential division of arrays can't have exactly alpha members... but I don't know if I select different A` division won't it disprove that we safely can insert N-/delta*N items

kobi (orginal) [2025-02-13T15:19:15+01:00] view original

disclaimer: I haven't read the paper, so maybe it's entirely wrong and hallucinating. Here is a claude 3.5 sonnet attempt:


# funnel_paper.nim
import std/[math, random, hashes, options]
import countit

type
  Slot[K,V] = object
    key: K
    val: V
    occupied: bool
  
  Subarray[K,V] = seq[Slot[K,V]]
  Array[K,V] = seq[Subarray[K,V]]
  
  FunnelTable*[K,V] = object
    arrays: seq[Array[K,V]]      # A1 through Aα
    specialB: seq[Slot[K,V]]     # Part B of Aα+1
    specialC: seq[seq[Slot[K,V]]] # Part C of Aα+1 (two-choice with buckets)
    count: int
    alpha: int                   # ⌈4 log(1/δ) + 10⌉
    beta: int                    # ⌈2 log(1/δ)⌉
    delta: float                 # Load factor parameter
    size: int                    # Total size

proc nextPowerOfTwoFunnel(n: int): int =
  result = 1
  while result <= n: result = result shl 1
  if result > n: result = result shr 1
  if result < 8: result = 8

proc calcArraySizes(totalSize: int, beta: int): seq[int] =
  var remaining = totalSize
  result = @[]
  var currentSize = remaining div 2
  while currentSize >= beta:
    result.add(currentSize)
    currentSize = (currentSize * 3) div 4
    remaining -= currentSize

proc hash1[K](key: K, size: int): int =
  if size <= 0: 0
  else: abs(hash(key)) mod size

proc hash2[K](key: K, size: int): int =
  if size <= 0: 0
  else: abs(hash(hash(key))) mod size

proc initFunnelTable*[K,V](initialSize: int = 64): FunnelTable[K,V] =
  let size = nextPowerOfTwoFunnel(max(initialSize, 8))
  let delta = 0.1  # As per paper's typical value
  
  # Calculate parameters as per paper
  let alpha = ceil(4 * ln(1/delta) / ln(2.0) + 10).int
  let beta = ceil(2 * ln(1/delta) / ln(2.0)).int
  
  result = FunnelTable[K,V](
    size: size,
    delta: delta,
    count: 0,
    alpha: alpha,
    beta: beta
  )
  
  # Initialize main arrays
  let mainArrays = calcArraySizes(size - int(delta * float(size)), beta)
  result.arrays = newSeq[Array[K,V]](alpha)
  
  for i in 0..<min(alpha, mainArrays.len):
    let numSubarrays = max(1, mainArrays[i] div beta)
    var array = newSeq[Subarray[K,V]](numSubarrays)
    for j in 0..<numSubarrays:
      array[j] = newSeq[Slot[K,V]](beta)
    result.arrays[i] = array
  
  # Initialize special array parts
  let specialSize = int(delta * float(size))
  let loglogn = max(2, ceil(ln(ln(float(size)))).int)
  
  # Part B: uniform probing array
  result.specialB = newSeq[Slot[K,V]](specialSize div 2)
  
  # Part C: two-choice array with buckets
  let bucketSize = 2 * loglogn
  let numBuckets = max(2, (specialSize div 2) div bucketSize)
  result.specialC = newSeq[seq[Slot[K,V]]](numBuckets)
  for i in 0..<numBuckets:
    result.specialC[i] = newSeq[Slot[K,V]](bucketSize)

proc tryInsertArray[K,V](array: var Array[K,V], key: K, val: V): bool =
  if array.len == 0: return false
  
  # Step 1: Hash key to get subarray index
  let subarrayIdx = hash1(key, array.len)
  if subarrayIdx >= array.len: return false
  
  # Step 2 & 3: Check slots in subarray
  for slot in array[subarrayIdx].mitems:
    if not slot.occupied:
      slot = Slot[K,V](key: key, val: val, occupied: true)
      return true
    elif slot.key == key:
      slot.val = val
      return true
  
  return false

proc trySpecialB[K,V](t: var FunnelTable[K,V], key: K, val: V): bool =
  if t.specialB.len == 0: return false
  
  # Try log log n uniform probes
  let loglogn = max(2, ceil(ln(ln(float(t.size)))).int)
  for i in 0..<loglogn:
    let pos = hash1(key, t.specialB.len)
    if not t.specialB[pos].occupied:
      t.specialB[pos] = Slot[K,V](key: key, val: val, occupied: true)
      return true
    elif t.specialB[pos].key == key:
      t.specialB[pos].val = val
      return true
  
  return false

proc trySpecialC[K,V](t: var FunnelTable[K,V], key: K, val: V): bool =
  if t.specialC.len == 0: return false
  
  # Two-choice hashing
  let h1 = hash1(key, t.specialC.len)
  let h2 = hash2(key, t.specialC.len)
  
  if h1 >= t.specialC.len or h2 >= t.specialC.len:
    return false
  
  # Choose less full bucket
  let load1 = t.specialC[h1].countIt(it.occupied)
  let load2 = t.specialC[h2].countIt(it.occupied)
  let bucket = if load1 <= load2: h1 else: h2
  
  # Try to insert
  for slot in t.specialC[bucket].mitems:
    if not slot.occupied:
      slot = Slot[K,V](key: key, val: val, occupied: true)
      return true
    elif slot.key == key:
      slot.val = val
      return true
  
  return false

proc resize[K,V](t: var FunnelTable[K,V]) =
  var entries: seq[(K,V)] = @[]
  
  # Collect all entries
  for array in t.arrays:
    for subarray in array:
      for slot in subarray:
        if slot.occupied:
          entries.add((slot.key, slot.val))
  
  for slot in t.specialB:
    if slot.occupied:
      entries.add((slot.key, slot.val))
  
  for bucket in t.specialC:
    for slot in bucket:
      if slot.occupied:
        entries.add((slot.key, slot.val))
  
  # Create new table
  t = initFunnelTable[K,V](t.size * 2)
  
  # Reinsert all entries in original order
  for (k, v) in entries:
    # Use direct insertion to avoid recursive resizing
    block insertion:
      for array in t.arrays.mitems:
        if tryInsertArray(array, k, v):
          inc t.count
          break insertion
      
      if trySpecialB(t, k, v):
        inc t.count
        break insertion
      
      if trySpecialC(t, k, v):
        inc t.count
        break insertion


proc `[]=`*[K,V](t: var FunnelTable[K,V], key: K, val: V) =
  # First try to update existing key
  block findExisting:
    # Check main arrays
    for array in t.arrays.mitems:
      let subarrayIdx = hash1(key, array.len)
      if subarrayIdx >= array.len: continue
      
      for slot in array[subarrayIdx].mitems:
        if slot.occupied and slot.key == key:
          slot.val = val  # Update existing value
          return
    
    # Check special array B
    for slot in t.specialB.mitems:
      if slot.occupied and slot.key == key:
        slot.val = val
        return
    
    # Check special array C
    let h1 = hash1(key, t.specialC.len)
    let h2 = hash2(key, t.specialC.len)
    for h in [h1, h2]:
      if h >= t.specialC.len: continue
      for slot in t.specialC[h].mitems:
        if slot.occupied and slot.key == key:
          slot.val = val
          return
  
  # Key doesn't exist, proceed with insertion
  
  # Check load factor
  if t.count >= int(0.9 * float(t.size)):
    t.resize()
  
  # Try each array in sequence as per paper
  for array in t.arrays.mitems:
    if tryInsertArray(array, key, val):
      inc t.count
      return
  
  # Try special arrays
  if trySpecialB(t, key, val):
    inc t.count
    return
  
  if trySpecialC(t, key, val):
    inc t.count
    return
  
  # If everything fails, resize and retry
  t.resize()
  t[key] = val

proc searchArrays[K,V](t: FunnelTable[K,V], key: K): Option[V] =
  for array in t.arrays:
    let subarrayIdx = hash1(key, array.len)
    if subarrayIdx >= array.len: continue
    
    for slot in array[subarrayIdx]:
      if slot.occupied and slot.key == key:
        return some(slot.val)
  none(V)

proc searchSpecialB[K,V](t: FunnelTable[K,V], key: K): Option[V] =
  let loglogn = max(2, ceil(ln(ln(float(t.size)))).int)
  for i in 0..<loglogn:
    let pos = hash1(key, t.specialB.len)
    if pos < t.specialB.len and t.specialB[pos].occupied and t.specialB[pos].key == key:
      return some(t.specialB[pos].val)
  none(V)

proc searchSpecialC[K,V](t: FunnelTable[K,V], key: K): Option[V] =
  if t.specialC.len == 0: return none(V)
  
  let h1 = hash1(key, t.specialC.len)
  let h2 = hash2(key, t.specialC.len)
  
  if h1 >= t.specialC.len and h2 >= t.specialC.len:
    return none(V)
  
  # Check both buckets
  for h in [h1, h2]:
    if h >= t.specialC.len: continue
    for slot in t.specialC[h]:
      if slot.occupied and slot.key == key:
        return some(slot.val)
  none(V)

proc `[]`*[K,V](t: FunnelTable[K,V], key: K): V =
  # Search main arrays
  let mainResult = searchArrays(t, key)
  if mainResult.isSome:
    return mainResult.get
  
  # Search special array B
  let specialBResult = searchSpecialB(t, key)
  if specialBResult.isSome:
    return specialBResult.get
  
  # Search special array C
  let specialCResult = searchSpecialC(t, key)
  if specialCResult.isSome:
    return specialCResult.get
  
  raise newException(KeyError, "Key not found: " & $key)

proc contains*[K,V](t: FunnelTable[K,V], key: K): bool =
  try:
    discard t[key]
    true
  except KeyError:
    false

proc len*[K,V](t: FunnelTable[K,V]): int = t.count

proc clear*[K,V](t: var FunnelTable[K,V]) =
  t = initFunnelTable[K,V](64)

iterator pairs*[K,V](t: FunnelTable[K,V]): (K,V) =
  for array in t.arrays:
    for subarray in array:
      for slot in subarray:
        if slot.occupied:
          yield (slot.key, slot.val)
  
  for slot in t.specialB:
    if slot.occupied:
      yield (slot.key, slot.val)
  
  for bucket in t.specialC:
    for slot in bucket:
      if slot.occupied:
        yield (slot.key, slot.val)

when isMainModule:
  # Basic tests
  var t = initFunnelTable[string, int]()
  t["one"] = 1
  t["two"] = 2
  assert t["one"] == 1
  assert t["two"] == 2
  assert t.len == 2
  assert "one" in t
  assert "three" notin t
  echo "Basic tests passed!"

kobi (orginal) [2025-02-13T15:20:49+01:00] view original

But from initial tests it's slower than std's hashtable

Araq (orginal) [2025-02-13T15:52:22+01:00] view original

Why wouldn't it be slower? There is no reason to assume this thing to be fast for realistic N. ;-)

alexeypetrushin (orginal) [2025-02-15T09:47:10+01:00] view original

(A side though) Seems like the Declarative Programming is finally here. I think there's 90% that in next 1-5 years all programs will look like code below. And the idea to write implementation will be as absurd as writing in assembler today.

You define the code, and the implementation is just a derivative artefact, same like build step or fetching dependencies. There probably will be also step when AI generate bunch of tests for each function, and you review some of it quickly to confirm its correctness.


type FunnelHash* = object
  len: int

proc new*(_type: FunnelHash, n: int): FunnelHash =
  Create a new FunnelHash object with n elements

proc set[K, V]*(self: var FunnelHash[K, V], k: K, v: V) =
  Insert a key into the FunnelHash object

proc get[K, V]*(self: FunnelHash[K, V], k: K): V =
  Get the value of a key in the FunnelHash object

P.S. If there will be any programming at all...

elcritch (orginal) [2025-02-15T17:02:34+01:00] view original

Nitpick on the syntax :P

Actually you can basically do similar with one of the LLM IDEs nowadays. You could probably make the below work today sorta. Though probably manually generating and modifying tests. A friend of mine has been running DeepSeek locally using ollama on a macbook at a few tokens/sec. Just add an agent to generate code, compile, and test it iteratively.

I think a lot of programming will move to be more like writing "contracts" or "theorems" in the form of tests, requirements, contexts. It could actually be kinda fun in that you choose which code you write by hand. Sorta like we do with assembly currently.


type FunnelHash* = object
  len: int
  
  deepClaudeChatContext("Details of data structure and algorithm described in ./docs/funnel-hash.pdf")

proc new*(_type: FunnelHash, n: int): FunnelHash =
  deepClaudeChatImpl("Create a new FunnelHash object with n elements")

...

EverettChanning (orginal) [2025-03-22T13:30:29+01:00] view original

Thank you so much for answering.

Mirror of forum.nim-lang.org

12679 :: Funnel Hash?