🧠 Neural Network
Feedforward multilayer perceptron with backpropagation – from scratch
Overview
A complete implementation of a multilayer perceptron (MLP) using only OCaml's standard library. No frameworks, no dependencies – just matrix math and calculus implemented by hand. Demonstrates how neural networks learn through gradient descent and backpropagation.
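To make "matrix math implemented by hand" concrete, here is a minimal sketch of stdlib-only matrix helpers, assuming the common `float array array` representation; the repo's actual types and names may differ:

```ocaml
(* Minimal hand-rolled matrix helpers using only the standard library. *)
(* Representation assumed here: a matrix is a float array array of rows. *)
let matmul a b =
  let n = Array.length a
  and k = Array.length b
  and m = Array.length b.(0) in
  Array.init n (fun i ->
    Array.init m (fun j ->
      let s = ref 0.0 in
      for t = 0 to k - 1 do
        s := !s +. a.(i).(t) *. b.(t).(j)
      done;
      !s))

(* Matrix-vector product: W · x *)
let matvec w x =
  Array.map
    (fun row -> Array.fold_left (+.) 0.0 (Array.map2 ( *. ) row x))
    w
```

Every layer of the network reduces to compositions of operations like these.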
Features
- Configurable architecture – arbitrary depth and layer sizes
- Activation functions – Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax, Linear
- Weight initialization – Xavier and He schemes for stable training
- Training modes – batch and stochastic gradient descent
- Loss functions – Mean Squared Error, Cross-entropy
- Learning rate scheduling – constant, decay, step (sketched after this list)
- Momentum & gradient clipping – stabilize training (also in the sketch below)
- Model serialization – save/load trained weights
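As a rough illustration of how the scheduling, clipping, and momentum features fit together, here is a hedged sketch; every name and signature below is an assumption for illustration, not the project's actual API:

```ocaml
(* Illustrative sketch only: these names and signatures are assumptions, *)
(* not the repo's actual API.                                            *)

(* Learning-rate schedules: constant, inverse-time decay, step decay. *)
let constant lr _epoch = lr
let decay lr rate epoch = lr /. (1.0 +. rate *. float_of_int epoch)
let step lr drop every epoch = lr *. (drop ** float_of_int (epoch / every))

(* Clip a gradient value to [-c, c] to keep updates bounded. *)
let clip c g = max (-. c) (min c g)

(* Momentum update: the velocity accumulates a decaying sum of past     *)
(* gradients, and the weight moves along the velocity.                  *)
let momentum_update ~mu ~lr ~velocity ~weight ~grad =
  (* clip bound of 5.0 chosen arbitrarily for illustration *)
  let v = (mu *. velocity) -. (lr *. clip 5.0 grad) in
  (weight +. v, v)  (* returns (new weight, new velocity) *)
```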
Concepts Demonstrated
- Backpropagation – chain rule applied layer-by-layer to compute gradients
- Numerical computing in OCaml – matrix operations without NumPy
- Functional design – pure functions for the forward pass, imperative updates for weights
- Box-Muller transform – generating normally distributed random numbers (sketched after this list)
- Type-safe ML – OCaml's type system catches dimension mismatches at compile time
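Since the Box-Muller transform is the most self-contained of these concepts, a minimal stdlib-only sketch (the repo's version may differ in detail):

```ocaml
(* Box-Muller transform: map two independent uniform samples to one     *)
(* draw from the standard normal distribution N(0, 1).                  *)
let gaussian () =
  let u1 = 1.0 -. Random.float 1.0   (* in (0, 1], so log u1 is finite *)
  and u2 = Random.float 1.0 in
  sqrt (-2.0 *. log u1) *. cos (2.0 *. Float.pi *. u2)

(* A scaled draw of the kind used for weight init: mean 0, std-dev scale. *)
let gaussian_scaled scale = scale *. gaussian ()
```

Box-Muller actually yields two independent normals per pair of uniforms (the sine and cosine branches); this sketch keeps only the cosine branch for brevity.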
How It Works
```ocaml
(* 1. Initialize network with random weights *)
(*    Xavier: scale = sqrt(2 / (fan_in + fan_out)) *)
(*    He:     scale = sqrt(2 / fan_in) *)
(* 2. Forward pass: propagate input through layers *)
(*    output_i = activate(W_i · input_i + bias_i) *)
(* 3. Compute loss (MSE or cross-entropy) *)
(* 4. Backward pass: compute gradients via chain rule *)
(*    δ_output = (predicted - target) * activate' *)
(*    δ_hidden = (W_next^T · δ_next) * activate' *)
(* 5. Update weights: W -= learning_rate * δ · input^T *)
(* 6. Repeat until convergence *)
```
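Putting steps 1-6 together, here is a self-contained sketch of one stochastic-gradient-descent step for a one-hidden-layer network with sigmoid activations and MSE loss. It follows the update rules above, but the code itself is illustrative, not the repo's:

```ocaml
(* Sketch of one SGD step for a two-layer MLP with sigmoid activations  *)
(* and MSE loss. Illustrative only, not the repo's actual code.         *)

let sigmoid z = 1.0 /. (1.0 +. exp (-. z))
let sigmoid' a = a *. (1.0 -. a)   (* derivative in terms of the output *)

(* Xavier init: scale = sqrt(2 / (fan_in + fan_out)). A uniform draw is *)
(* used here as a simplification of the Gaussian draw shown earlier.    *)
let xavier fan_in fan_out =
  let scale = sqrt (2.0 /. float_of_int (fan_in + fan_out)) in
  Array.init fan_out (fun _ ->
    Array.init fan_in (fun _ -> scale *. (Random.float 2.0 -. 1.0)))

(* Forward pass for one layer: a = sigmoid (W · x + b) *)
let forward w b x =
  Array.mapi
    (fun i row ->
      let z = ref b.(i) in
      Array.iteri (fun j xj -> z := !z +. row.(j) *. xj) x;
      sigmoid !z)
    w

(* One SGD step on a single (x, target) pair, mutating w1/b1/w2/b2. *)
let sgd_step ~lr w1 b1 w2 b2 x target =
  (* 2. forward pass through both layers *)
  let h = forward w1 b1 x in
  let o = forward w2 b2 h in
  (* 3./4. output delta for MSE + sigmoid: (o - t) * o * (1 - o) *)
  let d_out = Array.mapi (fun i oi -> (oi -. target.(i)) *. sigmoid' oi) o in
  (* 4. hidden delta: (W2^T · d_out) * h * (1 - h) *)
  let d_hid =
    Array.mapi
      (fun j hj ->
        let s = ref 0.0 in
        Array.iteri (fun i di -> s := !s +. w2.(i).(j) *. di) d_out;
        !s *. sigmoid' hj)
      h
  in
  (* 5. weight updates: W -= lr * delta · input^T *)
  Array.iteri
    (fun i di ->
      b2.(i) <- b2.(i) -. lr *. di;
      Array.iteri (fun j hj -> w2.(i).(j) <- w2.(i).(j) -. lr *. di *. hj) h)
    d_out;
  Array.iteri
    (fun j dj ->
      b1.(j) <- b1.(j) -. lr *. dj;
      Array.iteri (fun k xk -> w1.(j).(k) <- w1.(j).(k) -. lr *. dj *. xk) x)
    d_hid
```

Batch gradient descent differs only in averaging these per-sample gradients across the batch before applying the update.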
Demo Problems
- XOR – classic non-linearly-separable problem proving hidden layers work (data encoding sketched after this list)
- AND / OR – linearly separable baselines
- Circle classification – 2D points inside/outside a circle
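For a sense of scale, these datasets are tiny. A sketch of how they might be encoded as (input, target) pairs; the names `xor_data`, `and_data`, and `circle_data` are illustrative, not the repo's identifiers:

```ocaml
(* Demo datasets encoded as (input, target) pairs. Names illustrative. *)
let xor_data =
  [| ([|0.; 0.|], [|0.|]); ([|0.; 1.|], [|1.|]);
     ([|1.; 0.|], [|1.|]); ([|1.; 1.|], [|0.|]) |]

let and_data =
  [| ([|0.; 0.|], [|0.|]); ([|0.; 1.|], [|0.|]);
     ([|1.; 0.|], [|0.|]); ([|1.; 1.|], [|1.|]) |]

(* Circle task: label 1 if the point falls inside a radius-r circle     *)
(* centred at the origin, sampling points uniformly from [-1, 1]^2.     *)
let circle_data n r =
  Array.init n (fun _ ->
    let x = Random.float 2.0 -. 1.0 and y = Random.float 2.0 -. 1.0 in
    let label = if x *. x +. y *. y <= r *. r then 1.0 else 0.0 in
    ([| x; y |], [| label |]))
```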
Key Takeaways
- Neural networks are fundamentally function approximators composed of matrix multiplications and non-linearities
- Backpropagation is just the chain rule applied systematically
- Weight initialization matters enormously – poor init leads to vanishing/exploding gradients
- OCaml's strict evaluation and explicit mutation make the dataflow crystal clear compared to framework magic