🧮 Automatic Differentiation
Exact derivatives via dual numbers and computation graphs
Overview
A complete automatic differentiation library supporting both forward mode (dual numbers) and reverse mode (computation graphs). Unlike symbolic differentiation (which manipulates expressions) or numerical differentiation (which approximates with finite differences), AD computes exact derivatives at machine precision by applying the chain rule to elementary operations.
This is the core technique behind modern deep learning frameworks like PyTorch and TensorFlow, implemented from scratch in ~1000 lines of OCaml.
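As a quick illustration of the difference, compare a finite-difference approximation with the exact result from `Forward.diff` (shown below); this is a hedged sketch, and the step size 1e-6 is arbitrary:
(* d/dx x³ at x = 2 is exactly 12 *)
let f x = x *. x *. x
let fd = (f (2.0 +. 1e-6) -. f 2.0) /. 1e-6 (* ≈ 12.000006: truncation and rounding error *)
let f_ad x = Forward.(x * x * x)
let exact = Forward.diff f_ad 2.0 (* 12.0 to machine precision *)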
Concepts Demonstrated
- Dual numbers – forward-mode AD via (value, derivative) pairs
- Computation graphs – reverse-mode AD with backpropagation
- Operator overloading – seamless derivative tracking through arithmetic
- Chain rule – composing derivatives of elementary operations
- Gradient computation – partial derivatives of multivariate functions
- Jacobian & Hessian matrices – higher-order derivative structures
- Gradient descent – optimization using computed gradients
- Neural network building blocks – layers and activations via AD
Forward Mode (Dual Numbers)
Each value carries its derivative alongside it. One forward pass computes the derivative with respect to one input variable. Best for functions with few inputs, many outputs (cost: O(n) passes for n inputs).
(* Dual number: value + tangent *)
type t = { v : float; d : float }
(* Derivative of x² + sin(x) at x = π/4 *)
let f x = Forward.(x * x + sin x)
let deriv = Forward.diff f (Float.pi /. 4.0)
(* deriv ≈ 2·(π/4) + cos(π/4) ≈ 2.278 *)
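Each arithmetic operation combines the values and applies the chain rule to the tangents. A minimal sketch of what such rules can look like (illustrative only; the library's actual definitions may differ):
let ( + ) a b = { v = a.v +. b.v; d = a.d +. b.d }
let ( * ) a b = { v = a.v *. b.v; d = (a.d *. b.v) +. (a.v *. b.d) } (* product rule *)
let sin a = { v = Float.sin a.v; d = a.d *. Float.cos a.v } (* chain rule *)
(* diff seeds the input with tangent 1 and reads the result's tangent *)
let diff f x = (f { v = x; d = 1.0 }).d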
Reverse Mode (Backpropagation)
Builds a computation graph during the forward pass, then propagates gradients backward. One reverse pass computes derivatives with respect to all inputs. Best for functions with many inputs, few outputs – exactly the case for loss functions in ML.
(* Build computation graph *)
let x = Reverse.var 2.0
let y = Reverse.var 3.0
let z = Reverse.(x * y + sin x)
(* Backpropagate from z *)
let () = Reverse.backward z
let dz_dx = Reverse.grad x (* ∂z/∂x = y + cos(x) *)
let dz_dy = Reverse.grad y (* ∂z/∂y = x *)
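The nodes behind this API can be pictured as records storing a value, an accumulated gradient, and a closure that pushes that gradient to the node's parents. The following is an illustrative sketch (names and representation are assumptions, and a full backward pass additionally needs a topological ordering of the graph):
type node = {
  value : float;
  mutable grad : float; (* accumulated ∂output/∂node *)
  mutable push : unit -> unit; (* sends grad to this node's parents *)
}
let var v = { value = v; grad = 0.0; push = (fun () -> ()) }
let ( * ) a b =
  let out = var (a.value *. b.value) in
  out.push <- (fun () ->
    a.grad <- a.grad +. (out.grad *. b.value); (* ∂(a·b)/∂a = b *)
    b.grad <- b.grad +. (out.grad *. a.value)); (* ∂(a·b)/∂b = a *)
  out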
Key Functions
Forward Module
| Function | Description |
|---|---|
| `var x` | Create independent variable (derivative = 1) |
| `const x` | Create constant (derivative = 0) |
| `diff f x` | Compute f'(x) via one forward pass |
| `nth_diff n f x` | n-th derivative (AD + finite differences) |
| `gradient f x` | Gradient ∇f at point x (array) |
| `jacobian f x` | Jacobian matrix of f : ℝⁿ → ℝᵐ |
| `hessian f x` | Hessian matrix of f : ℝⁿ → ℝ |
| `directional_deriv f x v` | Directional derivative along v |
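A usage sketch for the multivariate helpers (assuming they take a function over an array of Forward values and a float array evaluation point; exact signatures may differ):
(* f(x, y) = x²·y + y³ *)
let f v = Forward.(v.(0) * v.(0) * v.(1) + v.(1) * v.(1) * v.(1))
let g = Forward.gradient f [| 1.0; 2.0 |]
(* ∇f = [| 2xy; x² + 3y² |] = [| 4.0; 13.0 |] *)
let h = Forward.hessian f [| 1.0; 2.0 |]
(* [| [| 2y; 2x |]; [| 2x; 6y |] |] = [| [| 4.; 2. |]; [| 2.; 12. |] |] *)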
Reverse Module
| Function | Description |
|---|---|
| `var x` | Create input node in computation graph |
| `backward z` | Backpropagate gradients from output z |
| `grad x` | Read accumulated gradient ∂z/∂x |
| `gradient f x` | Gradient via a single reverse pass |
| `grad_descent f x0 ~lr ~steps` | Minimize f by gradient descent |
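A usage sketch (same assumptions about array-based signatures as above):
(* f(x, y) = x² + y², minimized at (0, 0) *)
let f v = Reverse.(v.(0) * v.(0) + v.(1) * v.(1))
let g = Reverse.gradient f [| 3.0; 4.0 |] (* one reverse pass yields both partials: [| 6.0; 8.0 |] *)
let x_min = Reverse.grad_descent f [| 3.0; 4.0 |] ~lr:0.1 ~steps:100
(* should approach the minimizer [| 0.0; 0.0 |] *)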
Supported Operations
| Category | Operations |
|---|---|
| Arithmetic | + - * / neg abs pow |
| Trigonometric | sin cos tan asin acos atan atan2 |
| Hyperbolic | sinh cosh tanh |
| Exponential | exp log sqrt |
| Activations | sigmoid relu softplus |
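The activation functions differentiate like any other primitive. A small sketch with Forward.diff (the expected values follow from the standard closed forms):
let d_sigmoid = Forward.diff Forward.sigmoid 0.0 (* σ'(0) = σ(0)·(1 − σ(0)) = 0.25 *)
let d_softplus = Forward.diff Forward.softplus 0.0 (* softplus'(0) = σ(0) = 0.5 *)
let d_tanh = Forward.diff Forward.tanh 1.0 (* 1 − tanh²(1) ≈ 0.42 *)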
Forward vs Reverse Mode
| | Forward Mode | Reverse Mode |
|---|---|---|
| Mechanism | Dual numbers (value + tangent) | Computation graph + backprop |
| Cost per pass | One ∂f/∂xᵢ | All ∂f/∂xᵢ |
| Best for | f : ℝ → ℝᵐ (few inputs) | f : ℝⁿ → ℝ (few outputs) |
| Memory | O(1) extra | O(ops) – stores graph |
| Use case | Jacobian columns, sensitivities | Loss function gradients (ML) |
When to Use
- Machine learning – training neural networks via backpropagation
- Scientific computing – sensitivities in simulations and ODEs
- Optimization – gradient-based minimization (Newton, L-BFGS)
- Probabilistic programming – gradient estimation in inference
- Physics engines – differentiable simulation for control