πŸ”€

Trie β€” Prefix Tree

File: trie.ml

Map.Make functor recursive records persistence prefix search auto-complete tree pruning string manipulation

Overview

A trie (pronounced "try") is a tree where each edge represents a character. Words sharing a common prefix share the same path from the root β€” making prefix search and auto-complete natural O(m) operations (where m is the query length, independent of the number of stored words).

This implementation is purely functional β€” every operation returns a new trie while the original is preserved. The CharMap (via Map.Make(Char)) gives sorted character keys for free, so all_words returns results in lexicographic order without explicit sorting.

Data Type

Each node has a boolean flag (is_word) marking whether a complete word ends here, and a map of children indexed by character:

module CharMap = Map.Make(Char)

type trie = {
  is_word: bool;              (* does a word end at this node? *)
  children: trie CharMap.t;   (* child nodes keyed by character *)
}

For example, inserting "an", "ant", "and" creates:

(root)
└── a
    └── n *        (* "an" ends here *)
        β”œβ”€β”€ d *    (* "and" ends here *)
        └── t *    (* "ant" ends here *)

Core Operations

Insert β€” Walk Down, Create Nodes

Recursively traverse the trie character by character, creating new nodes as needed. At the last character, set is_word = true:

let insert word trie =
  let chars = chars_of_string word in
  let rec aux chars node =
    match chars with
    | [] -> { node with is_word = true }
    | c :: rest ->
      let child = match CharMap.find_opt c node.children with
        | Some t -> t
        | None -> empty
      in
      let updated_child = aux rest child in
      { node with children = CharMap.add c updated_child node.children }
  in
  aux chars trie

Member β€” Check if a Word Exists

Follow the path character by character. The word exists only if we reach the end and is_word is true:

let member word trie =
  let rec aux chars node =
    match chars with
    | [] -> node.is_word
    | c :: rest ->
      match CharMap.find_opt c node.children with
      | None -> false
      | Some child -> aux rest child
  in
  aux chars trie

Delete with Pruning

Unmark the word, then prune dead-end nodes (nodes with no words and no children) on the way back up:

let delete word trie =
  let rec aux chars node =
    match chars with
    | [] -> { node with is_word = false }
    | c :: rest ->
      (* ... update child, then prune if empty *)
      if not updated.is_word && CharMap.is_empty updated.children
      then { node with children = CharMap.remove c node.children }
      else { node with children = CharMap.add c updated node.children }
  in
  aux chars trie

Prefix Operations

Auto-Complete β€” Find All Words with a Prefix

Navigate to the prefix node, then collect all words in the subtree. Because CharMap maintains sorted order, results are lexicographic:

let words_with_prefix prefix trie =
  match find_subtrie prefix trie with
  | None -> []
  | Some subtrie ->
    (* DFS through subtrie, collecting words *)
    collect prefix_chars subtrie

Longest Common Prefix

Walk down single-child paths until a branch point or word-end is reached:

let longest_common_prefix trie =
  let rec aux node =
    match CharMap.bindings node.children with
    | [(c, child)] when not node.is_word ->
      String.make 1 c ^ aux child
    | _ -> ""
  in
  aux trie

All Operations

  • insert: Add a word to the trie β€” O(m)
  • member: Check if a word exists β€” O(m)
  • delete: Remove a word with node pruning β€” O(m)
  • has_prefix: Check if any word starts with a prefix β€” O(m)
  • words_with_prefix: Auto-complete β€” find all words with given prefix β€” O(m + k) where k = results
  • all_words: List all words in sorted order β€” O(n)
  • word_count: Count stored words β€” O(nodes)
  • node_count: Count trie nodes β€” O(nodes)
  • longest_common_prefix: Find the shared prefix of all words β€” O(m)
  • of_list: Build a trie from a word list
  • to_string: ASCII tree visualization

Output

=== Trie (Prefix Tree) ===

Inserted 14 words

--- Membership ---
member "apple":  true
member "app":    true
member "ap":     false
member "bat":    true
member "batman": false

--- Prefix Search ---
has_prefix "app": true
has_prefix "xyz": false

--- Auto-complete ---
  "app" -> [app; apple; application; apply]
  "car" -> [car; card; care; careful; cart]
  "bat" -> [bat; batch; bath]
  "ba"  -> [bad; bat; batch; bath]
  "z"   -> []

--- Statistics ---
Word count: 14
All words: [app; apple; application; apply; apt; bad; bat; batch; bath; car; card; care; careful; cart]

LCP of [flower; flow; flight]: "fl"
LCP of [test; testing; tested; tester]: "test"

--- Deletion ---
After deleting "apple":
  member "apple": false
  member "app":   true
  word count: 13

--- Persistence ---
Original still has "apple": true
Original word count: 14

--- Trie Structure ---
(root)
└── a
    └── n *
        β”œβ”€β”€ d *
        └── t *

Concepts Introduced

  • Map.Make functor: Map.Make(Char) creates a balanced BST map keyed by characters. The functor pattern lets you create type-safe containers for any ordered key type β€” reuse OCaml's standard library instead of rolling your own.
  • Recursive records: The trie type contains trie CharMap.t as a field β€” a record that references itself. Unlike algebraic data types (BST, heap), this uses named fields for clarity.
  • Persistence: Like the heap, every operation returns a new trie. Inserting or deleting creates a new path while sharing unchanged subtrees β€” efficient copy-on-write.
  • Tree pruning: Deletion doesn't just unmark words β€” it removes now-unnecessary nodes on the way back up the recursion. This keeps the trie compact.
  • Prefix-based retrieval: The trie's structure naturally groups words by prefix, making auto-complete a simple subtree traversal rather than a linear scan.

Complexity

OperationTimeNotes
insertO(m)m = word length
memberO(m)independent of # words
deleteO(m)includes pruning
has_prefixO(m)m = prefix length
words_with_prefixO(m + k)k = number of matches
all_wordsO(n)n = total characters stored
word_countO(nodes)full traversal
longest_common_prefixO(m)m = LCP length

Space: O(Ξ£ Γ— n) worst case, where Ξ£ is alphabet size (256 for ASCII) and n is total characters. In practice, shared prefixes reduce this significantly.