🐦 Cuckoo Filter
Probabilistic membership with deletion support
Overview
A Cuckoo Filter is a space-efficient probabilistic data structure for approximate set membership testing — like a Bloom filter, but with a killer feature: it supports deletion. Based on cuckoo hashing, it stores compact fingerprints in a bucket array and uses the "kick-and-relocate" strategy when buckets are full.
Reference: "Cuckoo Filter: Practically Better Than Bloom" — Fan, Andersen, Kaminsky, Mitzenmacher (CoNEXT 2014).
Concepts Demonstrated
- Cuckoo hashing — two candidate buckets per item, eviction chains
- Fingerprints — compact hash summaries for space efficiency
- Deletion in probabilistic structures — the key advantage over Bloom filters
- False positive analysis — configurable via fingerprint size
- Imperative OCaml — mutable arrays, refs, while loops
Cuckoo Filter vs Bloom Filter
| Feature | Bloom Filter | Cuckoo Filter |
|---|---|---|
| Membership test | ✅ | ✅ |
| Deletion | ❌ | ✅ |
| Space efficiency | Good | Better (for FPR < 3%) |
| Lookup speed | k hash lookups | 2 bucket lookups |
| Insert worst case | O(k) | O(max_kicks) |
API
val create : int -> t (* create filter with capacity *)
val insert : t -> string -> bool (* insert item, returns success *)
val lookup : t -> string -> bool (* check membership *)
val delete : t -> string -> bool (* remove item, returns success *)
val size : t -> int (* current item count *)
val load_factor : t -> float (* utilization ratio *)
val stats : t -> string (* formatted statistics *)
How It Works
- Compute the item's fingerprint (compact hash)
- Compute two candidate bucket indices:
i1 = hash(item),i2 = i1 ⊕ hash(fingerprint) - If either bucket has space, store the fingerprint there
- If both are full, evict a random entry and relocate it to its alternate bucket
- Repeat relocation up to
max_kickstimes
Run It
ocamlfind ocamlopt -package str -linkpkg cuckoo_filter.ml -o cuckoo_filter
./cuckoo_filter
Or simply: ocamlfind ocamlopt cuckoo_filter.ml -o cuckoo_filter && ./cuckoo_filter