Contents
Tap any chapter to start reading.
Chapter 1. Distributions, Tails, and Anomalies
Empirical and theoretical distributions, KDE, bootstrap CIs, hypothesis testing, Pearson / Spearman / distance correlation, GEV & GPD for tail patterns, Mahalanobis and Isolation Forest, multiple-testing correction.
Chapter 2. Statistical Predictive Models
Multiple linear regression, diagnostics and leverage, nonlinear transformations, best-subset / forward / backward / stepwise selection, k-fold and TimeSeriesSplit cross-validation, Ridge / LASSO / Elastic Net, PCA, tree ensembles.
Chapter 3. Rethinking Statistics with Bayesian Methods
Bayes’ theorem, conjugate Beta-Binomial and Normal-Normal, hand-coded Metropolis-Hastings MCMC, robust regression with Student-t errors, Bayesian linear regression, change-point detection, hierarchical shrinkage.
Chapter 4. Time Series Models
Pandas time-series methods, ADF + KPSS stationarity tests, ACF / PACF, ARIMA for the conditional mean, GARCH for volatility clustering, cointegration & pairs trading, Markov-switching regimes, a full mean-reversion backtest.
Chapter 5. Clustering for Unsupervised Pattern Discovery
K-means, hierarchical agglomerative clustering with dendrograms, DBSCAN, Gaussian mixture models, spectral clustering, cluster validation, and Hierarchical Risk Parity (HRP) — clustering at the heart of modern portfolio construction.
Chapter 6. Pattern Recognition: A Hedge-Fund Perspective
The full pipeline: framing, feature engineering, classifier zoo (KNN, logistic, SVM, gradient boosting), t-SNE visualisation, Hidden Markov Models, template matching, a Renaissance-style signal hunt, and the six ways patterns lie.
How to read this book
Every Python code block in this book runs live in your browser. Click into any cell, edit it, press the ▶ Run button, and see the output. The Python engine (Pyodide) downloads once on the first chapter — after that, everything is instant.
This book assumes you are already comfortable with pandas DataFrames, NumPy arrays, and basic plotting in Python. If those words make you nervous, work through an introductory Python-for-data-analysis book first and then return.
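As a quick self-test, here is a minimal sketch of the kind of code this book takes for granted (the random-walk price series, seed, and column names are invented for this check, not drawn from any chapter). If every line reads naturally, you have the assumed background:

```python
import numpy as np
import pandas as pd

# Simulate a year of business-day prices as a geometric random walk.
rng = np.random.default_rng(42)
prices = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, 250))),
    index=pd.date_range("2024-01-01", periods=250, freq="B"),
    name="price",
)

# Daily simple returns, dropping the undefined first observation.
returns = prices.pct_change().dropna()

print(returns.describe().round(4))       # count, mean, std, quartiles
print(returns.rolling(20).std().iloc[-1])  # trailing 20-day volatility
```

If constructs like `pd.Series`, `pct_change`, and `rolling` feel unfamiliar, that is the signal to revisit an introductory text first.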
- Chapter 1 teaches you to look at a single variable and a pair of variables honestly — distributions, tails, tests, and associations. Every signal you ever trade started life as one of these.
- Chapter 2 is the workhorse: turn many candidate features into a parsimonious predictive model, with explicit machinery for choosing which features to keep.
- Chapter 3 is the rethink. When data is scarce, returns are fat-tailed, or you want to fuse beliefs with evidence, Bayesian inference is no longer optional.
- Chapter 4 is where time enters the picture — autocorrelation, mean reversion, volatility clustering. Without it you cannot trade.
- Chapter 5 flips to unsupervised learning — clustering, dendrograms, Hierarchical Risk Parity. Most data arriving at a research desk has no labels.
- Chapter 6 is the capstone discipline: the workflow, the classifier zoo, sequence patterns with HMMs, and the six ways patterns lie.
This is not a course on neural networks, reinforcement learning, or alternative-data NLP. Those are downstream of the foundations covered here. Get the foundations right; the rest is implementation.