🌸 Bloom Filter

Probabilistic set membership with no false negatives

Overview

A space-efficient probabilistic data structure that tests whether an element is a member of a set. May return false positives ("maybe in set") but never false negatives ("definitely not in set"). Used in databases, caches, and network protocols to avoid expensive lookups.

Concepts Demonstrated

How It Works

(* 1. Create: bit array of size m, k hash functions *)
(* 2. Add: set bits at h₁(x), h₂(x), ..., hₖ(x) *)
(* 3. Query: check if ALL k bits are set *)
(*    - All set → "probably in set" (false positive possible) *)
(*    - Any unset → "definitely not in set" (guaranteed) *)

let bf = create ~expected_items:1000 ~false_positive_rate:0.01
let bf = add "hello" bf
let _ = mem "hello" bf    (* → true  (definitely added) *)
let _ = mem "world" bf    (* → false (definitely not, or rare FP) *)

Key Functions

FunctionDescription
createCreate filter with target capacity and FP rate
addInsert an element
memTest membership (may return false positive)
false_positive_rateCurrent estimated FP probability
size_bytesMemory usage of the filter
unionMerge two filters (bitwise OR)
intersectionIntersect two filters (bitwise AND)

When to Use