M/01 — Random Matrix Theory
Eigenspectrum of the strategy correlation matrix
Marchenko–Pastur theory and parallel analysis as a noise-floor for the principal components of a strategy population.
The mathematics
Suppose you observe an N × T matrix of standardized returns X (N strategies recorded over T bars, each row scaled to zero mean and unit variance). The sample correlation matrix is
C = (1/T) X Xᵀ.
If the entries of X are independent N(0, 1), i.e. there is no real structure, then in the joint limit N, T → ∞ with fixed ratio q = N/T, the empirical eigenvalue distribution of C converges to a deterministic density on a finite support [λ₋, λ₊]:
ρ_MP(λ) = √((λ₊ − λ)(λ − λ₋)) / (2π q σ² λ) for λ₋ ≤ λ ≤ λ₊, with λ± = σ²(1 ± √q)².
For standardized data σ² = 1. The interval [λ₋, λ₊] is the bulk. Any eigenvalue outside the bulk cannot be explained by sample-size noise alone — it is a candidate for real structure. At q = 0.1, λ₊ ≈ 1.73; at q = 0.2, λ₊ ≈ 2.09; at q = 0.5, λ₊ ≈ 2.91. The bulk widens as q grows because with fewer observations per dimension, sample correlations become noisier.
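The bulk edges quoted above can be checked numerically. The following is a minimal sketch, not the reference implementation: the helper names mp_edges and mp_density are hypothetical, and the density formula is the standard Marchenko–Pastur form for 0 < q ≤ 1.

```python
import numpy as np

def mp_edges(q, sigma2=1.0):
    """Return (lambda_minus, lambda_plus) of the MP bulk for ratio q = N/T."""
    root_q = np.sqrt(q)
    return sigma2 * (1.0 - root_q) ** 2, sigma2 * (1.0 + root_q) ** 2

def mp_density(lam, q, sigma2=1.0):
    """MP density evaluated at lam (assumes 0 < q <= 1); zero outside the bulk."""
    lo, hi = mp_edges(q, sigma2)
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    dens = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    dens[inside] = np.sqrt((hi - lam[inside]) * (lam[inside] - lo)) / (
        2.0 * np.pi * q * sigma2 * lam[inside]
    )
    return dens

# Print the bulk for the three ratios discussed in the text.
for q in (0.1, 0.2, 0.5):
    lo, hi = mp_edges(q)
    print(f"q={q}: bulk = [{lo:.3f}, {hi:.3f}]")
```

Integrating mp_density over [λ₋, λ₊] on a fine grid should give a total mass close to 1, which is a quick sanity check on the formula.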
Parallel analysis as a non-parametric null
The MP result as stated above assumes independent, identically distributed Gaussian entries. Real strategy returns violate both assumptions. Parallel analysis (Horn 1965) substitutes a fully data-driven null: take X, independently permute each row in time, and recompute the eigenvalues. Repeat B = 1000 times and record the maximum eigenvalue at each replication. The 99th percentile of that distribution is a non-parametric upper bound on the bulk:
λ_PA = 99th percentile of { λ_max(C⁽ᵇ⁾) : b = 1, …, B }, where C⁽ᵇ⁾ is the correlation matrix of the b-th permuted copy of X.
Permuting in time preserves each row’s marginal distribution (so heavy-tailed returns stay heavy-tailed) but destroys cross-row dependence. An eigenvalue exceeding both λ₊ from MP and the PA 99th-percentile is robust signal.
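The permutation null described above can be sketched in a few lines of NumPy. This is an illustrative version under stated assumptions, not the strategy-rmt implementation: the function name parallel_analysis_threshold and its parameters are hypothetical, and the toy matrix at the bottom is pure noise.

```python
import numpy as np

def parallel_analysis_threshold(X, n_perm=1000, pct=99.0, seed=0):
    """Percentile of the largest eigenvalue under row-wise time permutation."""
    rng = np.random.default_rng(seed)
    max_eigs = np.empty(n_perm)
    for b in range(n_perm):
        # Generator.permuted shuffles each row independently along the time
        # axis, keeping every row's marginal distribution intact.
        X_perm = rng.permuted(X, axis=1)
        C_perm = np.corrcoef(X_perm)
        # eigvalsh returns eigenvalues in ascending order; take the largest.
        max_eigs[b] = np.linalg.eigvalsh(C_perm)[-1]
    return np.percentile(max_eigs, pct)

# Toy usage: 20 unstructured rows over 200 bars (q = 0.1).
X_demo = np.random.default_rng(1).standard_normal((20, 200))
threshold = parallel_analysis_threshold(X_demo, n_perm=200)
print(f"PA 99th-percentile threshold: {threshold:.3f}")
```

For Gaussian noise the threshold should land close to the MP edge for the same q; for heavy-tailed rows the two can differ, which is the reason to compute both.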
Worked example
Take N = 60 strategies, T = 300 bars, q = 0.2. Plant a single correlated cluster: 10 of the 60 rows are loaded on a common factor with intra-cluster correlation ρ = 0.4. The other 50 are independent N(0,1).
- MP bulk: λ₊ = (1 + √0.2)² ≈ 2.094, λ₋ = (1 − √0.2)² ≈ 0.306.
- Theoretical leading eigenvalue from the planted cluster: k · ρ + (1 − ρ) = 10 · 0.4 + 0.6 = 4.6. This is the population value; the sample eigenvalue will differ somewhat at finite T.
- The remaining 59 eigenvalues should fall inside [λ₋, λ₊].
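The worked example can be reproduced with a short simulation. This sketch assumes a one-factor construction for the cluster (each cluster row is √ρ times a shared factor plus √(1−ρ) times idiosyncratic noise), which is one common way to obtain intra-cluster correlation ρ; the seed and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, k, rho = 60, 300, 10, 0.4

# Start from pure noise, then load the first k rows on a common factor:
# x_i = sqrt(rho) * f + sqrt(1 - rho) * e_i gives corr(x_i, x_j) = rho.
X = rng.standard_normal((N, T))
factor = rng.standard_normal(T)
X[:k] = np.sqrt(rho) * factor + np.sqrt(1.0 - rho) * X[:k]

C = np.corrcoef(X)
eigs = np.linalg.eigvalsh(C)  # ascending order

q = N / T
lam_minus = (1.0 - np.sqrt(q)) ** 2
lam_plus = (1.0 + np.sqrt(q)) ** 2

n_above = int(np.sum(eigs > lam_plus))
print(f"MP bulk = [{lam_minus:.3f}, {lam_plus:.3f}]")
print(f"largest eigenvalue = {eigs[-1]:.2f}")
print(f"eigenvalues above the bulk edge = {n_above}")
```

The largest eigenvalue should sit well above λ₊ and in the neighborhood of the population value 4.6, while the rest stay near or inside the bulk; exact numbers vary with the seed.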
The interactive demo below recomputes this every time you change a slider — the bulk-edge marker λ₊ moves with q, and any eigenvalue exceeding it is highlighted in amber. Drop the cluster size to zero and the bulk is all you see.
Demo — Marchenko–Pastur eigenspectrum
Generate an N×T strategy returns matrix with a planted correlated cluster. Eigenvalues outside the MP bulk are signal.
Solid curve: MP density ρ_MP(λ) for q=0.200. Bars: empirical histogram of 60 sample-correlation eigenvalues. Bars in amber are above the bulk edge λ₊=2.094 — these are the signal eigenvalues.
Why this matters for systematic strategies
A strategy population at q = N/T = 0.2 (say N = 6,000 strategies on T = 30,000 bars of common history) will exhibit several apparent factors in its empirical correlation matrix purely from sample-size noise. A portfolio built to diversify along these directions does not diversify real risk; it only spreads weight across noise. The MP bound is the cheapest guard against this failure mode, with parallel analysis as its non-parametric complement. In the firm’s production pipeline this check runs before any clustering or factor decomposition.
A complementary diagnostic: the leading eigenvector of an unstructured correlation matrix has a participation ratio that is a known function of q. Our M/01 implementation reports both the bulk upper bound and the participation ratio of the leading eigenvector against its noise distribution.
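To make the participation ratio concrete, here is a small sketch. It assumes the common definition PR = 1 / Σ uᵢ⁴ for a unit-norm eigenvector u (the document does not spell out which definition M/01 uses), and the helper names are hypothetical. A vector spread evenly over m components has PR = m; a vector concentrated on one component has PR = 1.

```python
import numpy as np

def participation_ratio(v):
    """Effective number of components carrying weight: 1 / sum(u_i^4)."""
    u = np.asarray(v, dtype=float)
    u = u / np.linalg.norm(u)  # enforce unit norm
    return 1.0 / np.sum(u ** 4)

def leading_eigvec(X):
    """Eigenvector of the sample correlation matrix with the largest eigenvalue."""
    _, vecs = np.linalg.eigh(np.corrcoef(X))  # eigenvalues ascending
    return vecs[:, -1]

rng = np.random.default_rng(7)
N, T, k, rho = 60, 300, 10, 0.4

noise = rng.standard_normal((N, T))
planted = noise.copy()
# Same one-factor cluster construction as in the worked example (an assumption).
planted[:k] = np.sqrt(rho) * rng.standard_normal(T) + np.sqrt(1.0 - rho) * noise[:k]

pr_noise = participation_ratio(leading_eigvec(noise))
pr_planted = participation_ratio(leading_eigvec(planted))
print(f"PR, unstructured: {pr_noise:.1f}")
print(f"PR, planted cluster of {k}: {pr_planted:.1f}")
```

With a planted cluster the leading eigenvector should concentrate on roughly the k cluster members, so its participation ratio lands near k. For a single unstructured draw the value is noisy, which is why comparing against a null distribution rather than a single number is the safer reading.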
Reproducibility
DaruFinance / strategy-rmt
Python — open source reference implementation
Minimal invocation
import numpy as np
from strategy_rmt import mp_bounds, parallel_analysis
# X: N x T returns matrix (rows = strategies, cols = bars)
N, T = X.shape
C = np.corrcoef(X)
eigs = np.linalg.eigvalsh(C)
lo, hi = mp_bounds(N, T, sigma2=1.0) # (1 - sqrt(q))^2, (1 + sqrt(q))^2
signal = eigs[eigs > hi]
# Optional: parallel analysis null
pa_threshold = parallel_analysis(X, n_perm=1000, q=0.99)
robust_signal = eigs[eigs > pa_threshold]
References
- [1] Marchenko, V. A. & Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mat. Sb. (N.S.) 72(114):4, 507–536.
- [2] Laloux, L., Cizeau, P., Bouchaud, J.-P., & Potters, M. (1999). Noise dressing of financial correlation matrices. Physical Review Letters 83(7), 1467–1470.
- [3] Plerou, V., Gopikrishnan, P., Rosenow, B., et al. (2002). Random matrix approach to cross correlations in financial data. Physical Review E 65, 066126.
- [4] Bouchaud, J.-P. & Potters, M. (2009). Financial Applications of Random Matrix Theory: a short review. In The Oxford Handbook of Random Matrix Theory.