Parser Combinators
File: parser.ml
Overview
A parser combinator library lets you build complex parsers by snapping together small, reusable parser functions β like LEGO bricks. Each parser is a first-class function that can be passed around, combined, and composed.
This is a quintessential functional programming pattern. The library includes primitives (single characters, strings, digits), combinators (sequencing, choice, repetition), and three complete example parsers: arithmetic expressions, integer lists, and key-value pairs.
Core Design
A parser is simply a function that takes an input string and position, and returns either a parsed value with the new position, or an error:
type 'a result =
| Ok of 'a * int (* value + new position *)
| Error of string * int (* message + error position *)
type 'a parser = string -> int -> 'a result
Primitive Parsers
(* Match a character by predicate *)
let satisfy pred desc = fun input pos ->
if pos >= String.length input then Error "end of input"
else if pred input.[pos] then Ok (input.[pos], pos + 1)
else Error "no match"
(* Built on satisfy: *)
let digit = satisfy (fun c -> c >= '0' && c <= '9') "digit"
let letter = satisfy (fun c -> (*...*)) "letter"
let char_ = fun expected -> satisfy (fun c -> c = expected)
Key Combinators
Bind (>>=) β Monadic Sequencing
Run parser A, then feed its result to a function that produces parser B. This is the fundamental way to sequence parsers:
let bind p f = fun input pos ->
match p input pos with
| Error _ as e -> e
| Ok (a, pos') -> (f a) input pos'
(* Example: parse two letters into a string *)
let two_letters =
letter >>= fun a ->
letter >>= fun b ->
return_ (String.init 2 (fun i -> if i = 0 then a else b))
Choice (<|>) β Try Alternatives
let bool_parser =
(string_ "true" *> return_ true)
<|>
(string_ "false" *> return_ false)
Repetition β many, sep_by, between
(* Parse a comma-separated list of integers in brackets *)
let int_list =
between (char_ '[') (char_ ']')
(sep_by integer (char_ ','))
(* "[1, 2, 3]" β [1; 2; 3] *)
chainl1 / chainr1 β Operator Precedence
Parse left-associative or right-associative infix operators. This is the key to correct arithmetic precedence:
(* Left-assoc: 10 - 3 - 2 = (10-3)-2 = 5 *)
let expr = chainl1 term add_op
(* Right-assoc: 2^3^2 = 2^(3^2) = 512 *)
let power = chainr1 atom pow_op
Arithmetic Expression Parser
A complete calculator built entirely from combinators. Handles +, -, *, /, ^, and parentheses with correct precedence:
type expr =
| Num of int
| Add of expr * expr
| Sub of expr * expr
| Mul of expr * expr | ...
(* Grammar with precedence layers: *)
let expr = chainl1 term add_op (* lowest: + - *)
let term = chainl1 power mul_op (* middle: * / *)
let power = chainr1 atom pow_op (* highest: ^ *)
let atom = integer <|> parens (* base: num or (expr) *)
Output
=== Parser Combinators ===
Building parsers from small, composable pieces
--- Primitive parsers ---
digit on "42": OK: 4
letter on "abc": OK: a
char 'x' on "xyz": OK: x
string "hi" on "hi there": OK: "hi"
--- Number parsing ---
integer "42": OK: 42
integer "-7": OK: -7
integer "0": OK: 0
integer "12345": OK: 12345
integer "abc": Error: expected digit at position 0
--- Combinators: many, sep_by, between ---
int list "[1, 2, 3]": OK: [1; 2; 3]
int list "[]": OK: []
int list "[42]": OK: [42]
--- Key-value parser ---
kv: OK: name = "Alice"
kv list: OK: {name = "Alice", age = "30", city = "NYC"}
--- Arithmetic expression parser ---
Supports: + - * / ^ () with correct precedence
42 AST: 42 = 42
2 + 3 AST: (2 + 3) = 5
2 + 3 * 4 AST: (2 + (3 * 4)) = 14
(2 + 3) * 4 AST: ((2 + 3) * 4) = 20
2 ^ 3 ^ 2 AST: (2 ^ (3 ^ 2)) = 512
10 - 3 - 2 AST: ((10 - 3) - 2) = 5
100 / 5 / 4 AST: ((100 / 5) / 4) = 5
(1 + 2) * (3 + 4) AST: ((1 + 2) * (3 + 4)) = 21
2 ^ 10 AST: (2 ^ 10) = 1024
((3 + 5) * 2) - (10 / 2) AST: (((3 + 5) * 2) - (10 / 2)) = 11
--- Quick calculator ---
calc "1+2+3" = 6
calc "2*3+4" = 10
calc "2*(3+4)" = 14
calc "2^8" = 256
calc "-5 + 10" = 5
Combinator Reference
| Combinator | Operator | Description |
|---|---|---|
bind | >>= | Sequence two parsers (monadic bind) |
map | <$> | Transform a parser's result |
| β | <*> | Applicative apply |
| β | <|> | Choice: try first, then second |
| β | *> | Sequence, keep right result |
| β | <* | Sequence, keep left result |
many | β | Zero or more repetitions |
many1 | β | One or more repetitions |
sep_by | β | Delimited lists |
between | β | Parse between delimiters |
chainl1 | β | Left-associative operators |
chainr1 | β | Right-associative operators |
optional | β | Maybe parse (returns option) |
try_ | β | Backtracking on failure |
Concepts Introduced
- Higher-order functions: Parsers are functions that take and return functions. Combinators are functions that take parser functions and produce new parser functions. Functions all the way down.
- Monadic composition: The
bind(>>=) operator sequences parsers while threading state (position). This is the same pattern as Haskell'sIOmonad,Resultmonad, etc. - Closures: Each parser captures its configuration (expected character, predicate) in a closure β a function bundled with its environment.
- Recursive descent: The expression parser demonstrates mutual recursion β
exprcallsterm, which callsatom, which can call back toexprvia parentheses. - Operator precedence: Layered grammar rules (expr β term β power β atom) naturally encode precedence without any special algorithm.
- Polymorphism: The same combinators work with any value type β
'a parsercan produce chars, ints, strings, AST nodes, or anything else.