šŸ“– Earley Parser

Chart parser for any context-free grammar — handles ambiguity and left-recursion

Overview

The Earley algorithm (Jay Earley, 1970) can parse any context-free grammar — including ambiguous and left-recursive ones — with O(n³) worst-case, O(n²) for unambiguous grammars, and O(n) for many practical grammars. Unlike recursive descent or combinator parsers, it never needs grammar rewriting to eliminate left recursion.

This implementation builds a parse chart of Earley items (dotted rules with origin positions) and supports shared packed parse forests (SPPF) for representing all parse trees of ambiguous input.

Concepts Demonstrated

The Algorithm

(* Define a grammar for arithmetic expressions *)
let g = grammar "Expr" [
  rule "Expr"   [nt "Expr"; t '+'; nt "Term"];    (* left-recursive! *)
  rule "Expr"   [nt "Term"];
  rule "Term"   [nt "Term"; t '*'; nt "Factor"];
  rule "Term"   [nt "Factor"];
  rule "Factor" [t '('; nt "Expr"; t ')'];
  rule "Factor" [t_pred is_digit];
]

(* Parse "3+4*5" — works despite left recursion *)
let chart = parse g "3+4*5"

(* Three operations per chart position:
   PREDICT: if dot before nonterminal, add rules for that nonterminal
   SCAN:    if dot before terminal matching input, advance dot
   COMPLETE: if dot at end, advance all items that predicted this rule *)

Earley vs Other Parsers

ParserGrammar ClassLeft RecursionAmbiguity
EarleyAll CFGsāœ… Handledāœ… All trees via SPPF
CombinatorsLL(āˆž) / PEG-likeāŒ Infinite loopāŒ First match only
Recursive descentLL(k)āŒ Must rewriteāŒ Not supported
LALR (yacc)LALR(1)āœ… HandledāŒ Shift-reduce conflicts

When to Use Earley

Related Modules