🧮 Automatic Differentiation
Exact derivatives via dual numbers and computation graphs
Overview
A complete automatic differentiation library supporting both forward mode (dual numbers) and reverse mode (computation graphs). Unlike symbolic differentiation (which manipulates expressions) or numerical differentiation (which approximates with finite differences), AD computes exact derivatives at machine precision by applying the chain rule to elementary operations.
This is the core technique behind modern deep learning frameworks like PyTorch and TensorFlow, implemented from scratch in ~1000 lines of OCaml.
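As a quick illustration of the difference, compare a finite-difference approximation with the exact result from `Forward.diff` (shown below); this is a hedged sketch, and the step size 1e-6 is arbitrary:
(* d/dx x³ at x = 2 is exactly 12 *)
let f x = x *. x *. x
let fd = (f (2.0 +. 1e-6) -. f 2.0) /. 1e-6 (* ≈ 12.000006: truncation and rounding error *)
let f_ad x = Forward.(x * x * x)
let exact = Forward.diff f_ad 2.0 (* 12.0 to machine precision *)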
Concepts Demonstrated
- Dual numbers – forward-mode AD via (value, derivative) pairs
- Computation graphs – reverse-mode AD with backpropagation
- Operator overloading – seamless derivative tracking through arithmetic
- Chain rule – composing derivatives of elementary operations
- Gradient computation – partial derivatives of multivariate functions
- Jacobian & Hessian matrices – higher-order derivative structures
- Gradient descent – optimization using computed gradients
- Neural network building blocks – layers and activations via AD
Forward Mode (Dual Numbers)
Each value carries its derivative alongside it. One forward pass computes the derivative with respect to one input variable. Best for functions with few inputs, many outputs (cost: O(n) passes for n inputs).
(* Dual number: value + tangent *)
type t = { v : float; d : float }
(* Derivative of x² + sin(x) at x = π/4 *)
let f x = Forward.(x * x + sin x)
let deriv = Forward.diff f (Float.pi /. 4.0)
(* deriv ≈ 2·(π/4) + cos(π/4) ≈ 2.278 *)
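Each arithmetic operation combines the values and applies the chain rule to the tangents. A minimal sketch of what such rules can look like (illustrative only; the library's actual definitions may differ):
let ( + ) a b = { v = a.v +. b.v; d = a.d +. b.d }
let ( * ) a b = { v = a.v *. b.v; d = (a.d *. b.v) +. (a.v *. b.d) } (* product rule *)
let sin a = { v = Float.sin a.v; d = a.d *. Float.cos a.v } (* chain rule *)
(* diff seeds the input with tangent 1 and reads the result's tangent *)
let diff f x = (f { v = x; d = 1.0 }).d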
Reverse Mode (Backpropagation)
Builds a computation graph during the forward pass, then propagates gradients backward. One reverse pass computes derivatives with respect to all inputs. Best for functions with many inputs, few outputs – exactly the case for loss functions in ML.
(* Build computation graph *)
let x = Reverse.var 2.0
let y = Reverse.var 3.0
let z = Reverse.(x * y + sin x)
(* Backpropagate from z *)
let () = Reverse.backward z
let dz_dx = Reverse.grad x (* ∂z/∂x = y + cos(x) *)
let dz_dy = Reverse.grad y (* ∂z/∂y = x *)
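The nodes behind this API can be pictured as records storing a value, an accumulated gradient, and a closure that pushes that gradient to the node's parents. The following is an illustrative sketch (names and representation are assumptions, and a full backward pass additionally needs a topological ordering of the graph):
type node = {
  value : float;
  mutable grad : float; (* accumulated ∂output/∂node *)
  mutable push : unit -> unit; (* sends grad to this node's parents *)
}
let var v = { value = v; grad = 0.0; push = (fun () -> ()) }
let ( * ) a b =
  let out = var (a.value *. b.value) in
  out.push <- (fun () ->
    a.grad <- a.grad +. (out.grad *. b.value); (* ∂(a·b)/∂a = b *)
    b.grad <- b.grad +. (out.grad *. a.value)); (* ∂(a·b)/∂b = a *)
  out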
Key Functions
Forward Module
| Function | Description |
|---|---|
| `var x` | Create independent variable (derivative = 1) |
| `const x` | Create constant (derivative = 0) |
| `diff f x` | Compute f'(x) via one forward pass |
| `nth_diff n f x` | n-th derivative (AD + finite differences) |
| `gradient f x` | Gradient ∇f at point x (array) |
| `jacobian f x` | Jacobian matrix of f : ℝⁿ → ℝᵐ |
| `hessian f x` | Hessian matrix of f : ℝⁿ → ℝ |
| `directional_deriv f x v` | Directional derivative along v |
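A usage sketch for the multivariate helpers (assuming they take a function over an array of Forward values and a float array evaluation point; exact signatures may differ):
(* f(x, y) = x²·y + y³ *)
let f v = Forward.(v.(0) * v.(0) * v.(1) + v.(1) * v.(1) * v.(1))
let g = Forward.gradient f [| 1.0; 2.0 |]
(* ∇f = [| 2xy; x² + 3y² |] = [| 4.0; 13.0 |] *)
let h = Forward.hessian f [| 1.0; 2.0 |]
(* [| [| 2y; 2x |]; [| 2x; 6y |] |] = [| [| 4.; 2. |]; [| 2.; 12. |] |] *)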
Reverse Module
| Function | Description |
|---|---|
| `var x` | Create input node in computation graph |
| `backward z` | Backpropagate gradients from output z |
| `grad x` | Read accumulated gradient ∂z/∂x |
| `gradient f x` | Gradient via a single reverse pass |
| `grad_descent f x0 ~lr ~steps` | Minimize f by gradient descent |
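A usage sketch (same assumptions about array-based signatures as above):
(* f(x, y) = x² + y², minimized at (0, 0) *)
let f v = Reverse.(v.(0) * v.(0) + v.(1) * v.(1))
let g = Reverse.gradient f [| 3.0; 4.0 |] (* one reverse pass yields both partials: [| 6.0; 8.0 |] *)
let x_min = Reverse.grad_descent f [| 3.0; 4.0 |] ~lr:0.1 ~steps:100
(* should approach the minimizer [| 0.0; 0.0 |] *)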
Supported Operations
| Category | Operations |
|---|---|
| Arithmetic | + - * / neg abs pow |
| Trigonometric | sin cos tan asin acos atan atan2 |
| Hyperbolic | sinh cosh tanh |
| Exponential | exp log sqrt |
| Activations | sigmoid relu softplus |
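The activation functions differentiate like any other primitive. A small sketch with Forward.diff (the expected values follow from the standard closed forms):
let d_sigmoid = Forward.diff Forward.sigmoid 0.0 (* σ'(0) = σ(0)·(1 − σ(0)) = 0.25 *)
let d_softplus = Forward.diff Forward.softplus 0.0 (* softplus'(0) = σ(0) = 0.5 *)
let d_tanh = Forward.diff Forward.tanh 1.0 (* 1 − tanh²(1) ≈ 0.42 *)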
Forward vs Reverse Mode
| | Forward Mode | Reverse Mode |
|---|---|---|
| Mechanism | Dual numbers (value + tangent) | Computation graph + backprop |
| Cost per pass | One ∂f/∂xᵢ | All ∂f/∂xᵢ |
| Best for | f : ℝ → ℝᵐ (few inputs) | f : ℝⁿ → ℝ (few outputs) |
| Memory | O(1) extra | O(ops) – stores graph |
| Use case | Jacobian columns, sensitivities | Loss function gradients (ML) |
When to Use
- Machine learning – training neural networks via backpropagation
- Scientific computing – sensitivities in simulations and ODEs
- Optimization – gradient-based minimization (Newton, L-BFGS)
- Probabilistic programming – gradient estimation in inference
- Physics engines – differentiable simulation for control