🧠 Neural Network
Feedforward multilayer perceptron with backpropagation – from scratch
Overview
A complete implementation of a multilayer perceptron (MLP) using only OCaml's standard library. No frameworks, no dependencies – just matrix math and calculus implemented by hand. Demonstrates how neural networks learn through gradient descent and backpropagation.
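To make "matrix math implemented by hand" concrete, here is a minimal sketch of stdlib-only matrix helpers, assuming the common `float array array` representation; the repo's actual types and names may differ:

```ocaml
(* Minimal hand-rolled matrix helpers using only the standard library. *)
(* Representation assumed here: a matrix is a float array array of rows. *)
let matmul a b =
  let n = Array.length a
  and k = Array.length b
  and m = Array.length b.(0) in
  Array.init n (fun i ->
    Array.init m (fun j ->
      let s = ref 0.0 in
      for t = 0 to k - 1 do
        s := !s +. a.(i).(t) *. b.(t).(j)
      done;
      !s))

(* Matrix-vector product: W · x *)
let matvec w x =
  Array.map
    (fun row -> Array.fold_left (+.) 0.0 (Array.map2 ( *. ) row x))
    w
```

Every layer of the network reduces to compositions of operations like these.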
Features
- Configurable architecture – arbitrary depth and layer sizes
- Activation functions – Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax, Linear
- Weight initialization – Xavier and He schemes for stable training
- Training modes – batch and stochastic gradient descent
- Loss functions – Mean Squared Error, Cross-entropy
- Learning rate scheduling – constant, decay, step (sketched after this list)
- Momentum & gradient clipping – stabilize training (also in the sketch below)
- Model serialization – save/load trained weights
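As a rough illustration of how the scheduling, clipping, and momentum features fit together, here is a hedged sketch; every name and signature below is an assumption for illustration, not the project's actual API:

```ocaml
(* Illustrative sketch only: these names and signatures are assumptions, *)
(* not the repo's actual API.                                            *)

(* Learning-rate schedules: constant, inverse-time decay, step decay. *)
let constant lr _epoch = lr
let decay lr rate epoch = lr /. (1.0 +. rate *. float_of_int epoch)
let step lr drop every epoch = lr *. (drop ** float_of_int (epoch / every))

(* Clip a gradient value to [-c, c] to keep updates bounded. *)
let clip c g = max (-. c) (min c g)

(* Momentum update: the velocity accumulates a decaying sum of past     *)
(* gradients, and the weight moves along the velocity.                  *)
let momentum_update ~mu ~lr ~velocity ~weight ~grad =
  (* clip bound of 5.0 chosen arbitrarily for illustration *)
  let v = (mu *. velocity) -. (lr *. clip 5.0 grad) in
  (weight +. v, v)  (* returns (new weight, new velocity) *)
```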
Concepts Demonstrated
- Backpropagation – chain rule applied layer-by-layer to compute gradients
- Numerical computing in OCaml – matrix operations without NumPy
- Functional design – pure functions for the forward pass, imperative updates for weights
- Box-Muller transform – generating normally distributed random numbers (sketched after this list)
- Type-safe ML – OCaml's type system catches dimension mismatches at compile time
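Since the Box-Muller transform is the most self-contained of these concepts, a minimal stdlib-only sketch (the repo's version may differ in detail):

```ocaml
(* Box-Muller transform: map two independent uniform samples to one     *)
(* draw from the standard normal distribution N(0, 1).                  *)
let gaussian () =
  let u1 = 1.0 -. Random.float 1.0   (* in (0, 1], so log u1 is finite *)
  and u2 = Random.float 1.0 in
  sqrt (-2.0 *. log u1) *. cos (2.0 *. Float.pi *. u2)

(* A scaled draw of the kind used for weight init: mean 0, std-dev scale. *)
let gaussian_scaled scale = scale *. gaussian ()
```

Box-Muller actually yields two independent normals per pair of uniforms (the sine and cosine branches); this sketch keeps only the cosine branch for brevity.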
How It Works
```ocaml
(* 1. Initialize network with random weights *)
(*    Xavier: scale = sqrt(2 / (fan_in + fan_out)) *)
(*    He:     scale = sqrt(2 / fan_in) *)
(* 2. Forward pass: propagate input through layers *)
(*    output_i = activate(W_i · input_i + bias_i) *)
(* 3. Compute loss (MSE or cross-entropy) *)
(* 4. Backward pass: compute gradients via chain rule *)
(*    δ_output = (predicted - target) * activate' *)
(*    δ_hidden = (W_next^T · δ_next) * activate' *)
(* 5. Update weights: W -= learning_rate * δ · input^T *)
(* 6. Repeat until convergence *)
```
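Putting steps 1-6 together, here is a self-contained sketch of one stochastic-gradient-descent step for a one-hidden-layer network with sigmoid activations and MSE loss. It follows the update rules above, but the code itself is illustrative, not the repo's:

```ocaml
(* Sketch of one SGD step for a two-layer MLP with sigmoid activations  *)
(* and MSE loss. Illustrative only, not the repo's actual code.         *)

let sigmoid z = 1.0 /. (1.0 +. exp (-. z))
let sigmoid' a = a *. (1.0 -. a)   (* derivative in terms of the output *)

(* Xavier init: scale = sqrt(2 / (fan_in + fan_out)). A uniform draw is *)
(* used here as a simplification of the Gaussian draw shown earlier.    *)
let xavier fan_in fan_out =
  let scale = sqrt (2.0 /. float_of_int (fan_in + fan_out)) in
  Array.init fan_out (fun _ ->
    Array.init fan_in (fun _ -> scale *. (Random.float 2.0 -. 1.0)))

(* Forward pass for one layer: a = sigmoid (W · x + b) *)
let forward w b x =
  Array.mapi
    (fun i row ->
      let z = ref b.(i) in
      Array.iteri (fun j xj -> z := !z +. row.(j) *. xj) x;
      sigmoid !z)
    w

(* One SGD step on a single (x, target) pair, mutating w1/b1/w2/b2. *)
let sgd_step ~lr w1 b1 w2 b2 x target =
  (* 2. forward pass through both layers *)
  let h = forward w1 b1 x in
  let o = forward w2 b2 h in
  (* 3./4. output delta for MSE + sigmoid: (o - t) * o * (1 - o) *)
  let d_out = Array.mapi (fun i oi -> (oi -. target.(i)) *. sigmoid' oi) o in
  (* 4. hidden delta: (W2^T · d_out) * h * (1 - h) *)
  let d_hid =
    Array.mapi
      (fun j hj ->
        let s = ref 0.0 in
        Array.iteri (fun i di -> s := !s +. w2.(i).(j) *. di) d_out;
        !s *. sigmoid' hj)
      h
  in
  (* 5. weight updates: W -= lr * delta · input^T *)
  Array.iteri
    (fun i di ->
      b2.(i) <- b2.(i) -. lr *. di;
      Array.iteri (fun j hj -> w2.(i).(j) <- w2.(i).(j) -. lr *. di *. hj) h)
    d_out;
  Array.iteri
    (fun j dj ->
      b1.(j) <- b1.(j) -. lr *. dj;
      Array.iteri (fun k xk -> w1.(j).(k) <- w1.(j).(k) -. lr *. dj *. xk) x)
    d_hid
```

Batch gradient descent differs only in averaging these per-sample gradients across the batch before applying the update.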
Demo Problems
- XOR – classic non-linearly-separable problem proving hidden layers work (data encoding sketched after this list)
- AND / OR – linearly separable baselines
- Circle classification – 2D points inside/outside a circle
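For a sense of scale, these datasets are tiny. A sketch of how they might be encoded as (input, target) pairs; the names `xor_data`, `and_data`, and `circle_data` are illustrative, not the repo's identifiers:

```ocaml
(* Demo datasets encoded as (input, target) pairs. Names illustrative. *)
let xor_data =
  [| ([|0.; 0.|], [|0.|]); ([|0.; 1.|], [|1.|]);
     ([|1.; 0.|], [|1.|]); ([|1.; 1.|], [|0.|]) |]

let and_data =
  [| ([|0.; 0.|], [|0.|]); ([|0.; 1.|], [|0.|]);
     ([|1.; 0.|], [|0.|]); ([|1.; 1.|], [|1.|]) |]

(* Circle task: label 1 if the point falls inside a radius-r circle     *)
(* centred at the origin, sampling points uniformly from [-1, 1]^2.     *)
let circle_data n r =
  Array.init n (fun _ ->
    let x = Random.float 2.0 -. 1.0 and y = Random.float 2.0 -. 1.0 in
    let label = if x *. x +. y *. y <= r *. r then 1.0 else 0.0 in
    ([| x; y |], [| label |]))
```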
Key Takeaways
- Neural networks are fundamentally function approximators composed of matrix multiplications and non-linearities
- Backpropagation is just the chain rule applied systematically
- Weight initialization matters enormously – poor init leads to vanishing/exploding gradients
- OCaml's strict evaluation and explicit mutation make the dataflow crystal clear compared to framework magic