Lab — population statistics

Sharpe density and (μ, σ, t) population scatter

Distribution-fitting and visualisation library for working with strategy populations of 38k+ variants per asset.

The mathematics

For each strategy i in a pool of N variants, observed over T bars with realized returns r_i,t, compute

$$\hat{\mu}_{i} = \frac{1}{T} \sum_{t} r_{i,t}, \qquad \hat{\sigma}_{i}^{2} = \frac{1}{T - 1} \sum_{t} \left(r_{i,t} - \hat{\mu}_{i}\right)^{2}, \qquad SR_{i} = \frac{\hat{\mu}_{i}}{\hat{\sigma}_{i}}.$$

Under the null of independent N(0, σ²) returns, the studentized Sharpe statistic $t_{i} = \sqrt{T}\, SR_{i}$ follows a Student-t distribution with T − 1 degrees of freedom. For moderate or large T, this is well approximated by a standard normal.
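The estimators above can be sketched in a few lines for a single strategy. This is a minimal standard-library illustration, not the library's implementation; the helper name `sharpe_t_stat` and the toy return series are assumptions made for the example.

```python
import math
import statistics


def sharpe_t_stat(returns):
    """Return (mu_hat, sigma_hat, sharpe, t) for one strategy's per-bar returns.

    Hypothetical helper for illustration; not part of strategy_stats.
    """
    n_bars = len(returns)
    mu_hat = statistics.fmean(returns)       # (1/T) * sum_t r_t
    sigma_hat = statistics.stdev(returns)    # sample std, 1/(T-1) normalisation
    sharpe = mu_hat / sigma_hat              # per-bar Sharpe ratio
    t_stat = math.sqrt(n_bars) * sharpe      # studentized Sharpe, sqrt(T) * SR
    return mu_hat, sigma_hat, sharpe, t_stat


# Toy series of five per-bar returns (made-up numbers).
mu_hat, sigma_hat, sharpe, t_stat = sharpe_t_stat([0.01, -0.02, 0.03, 0.00, 0.02])
print(mu_hat, sigma_hat, sharpe, t_stat)
```

Applied row by row to an N × T returns matrix, this yields the N t-statistics that the population view aggregates.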

The population view aggregates N such t-statistics into a histogram. Under the global null (no real edge anywhere) and independence across strategies, this histogram is the empirical distribution of N i.i.d. draws that are approximately N(0, 1). A deviation from that shape, especially excess mass in the right tail, is the population-level signature of a non-empty subset of true non-nulls.

Multiple-comparisons inflation

At the “5% significance” threshold |t| > 1.96, the expected count of false positives in a pool of N = 38,000 strategies drawn from the global null is

$$E\left[\#\{i : |t_{i}| > 1.96\} \mid H_{0}^{\text{global}}\right] = 0.05 \times 38{,}000 = 1{,}900.$$

Any individual strategy passing |t| > 1.96 in this pool is therefore virtually uninformative on its own. The Bonferroni-corrected threshold for population-wide control at 5% is $|t| > Φ^{-1}(1 - 0.025/N) \approx 4.84$, which is extremely conservative and likely to kill all signal. The better option is FDR control via knockoffs (M/02) or Benjamini–Hochberg, which the library reports alongside the raw distribution.
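The two corrections mentioned above can be sketched with the standard library alone. This is a minimal illustration under the N(0, 1) approximation for the t-statistics; the function names and the toy p-values are assumptions for the example and are not taken from the library.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def two_sided_p(t):
    """Two-sided p-value of a t-statistic under the N(0, 1) approximation."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(t)))


def bonferroni_threshold(n_tests, alpha=0.05):
    """|t| cutoff for family-wise error rate alpha over n_tests two-sided tests."""
    return _STD_NORMAL.inv_cdf(1.0 - alpha / (2.0 * n_tests))


def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / n:
            k_max = rank          # largest rank whose p-value clears its line
    return sorted(order[:k_max])


expected_false_positives = 0.05 * 38_000        # uncorrected 5% test, global null
cutoff = bonferroni_threshold(38_000)           # population-wide control at 5%
rejected = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])
print(expected_false_positives, cutoff, rejected)
```

Benjamini–Hochberg rejects every hypothesis up to the largest rank k with p_(k) ≤ q·k/N, so its effective cutoff adapts to how much right-tail mass the population actually shows, whereas the Bonferroni cutoff depends only on N.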

Sharpe-ratio sampling distribution

For the deflated Sharpe ratio of Bailey & López de Prado (2014), the standard error of an observed Sharpe depends on its skew and kurtosis:

$$SE(SR) = \sqrt{\frac{1}{T - 1}\left(1 - γ_{3}\, SR + \frac{γ_{4} - 1}{4}\, SR^{2}\right)}$$

where γ₃ and γ₄ are the skewness and (non-excess) kurtosis of the per-bar return distribution, so γ₃ = 0 and γ₄ = 3 recover the Gaussian baseline. For heavy-tailed crypto returns this can inflate the SE by 30–50% relative to the Gaussian baseline; the library reports both the naive and the deflated bound.
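As a rough sketch of the standard-error formula from Bailey & López de Prado (2014), with the square root taken over the whole bracket and γ₄ read as raw kurtosis (3 for Gaussian returns): the function name `sharpe_standard_error` and the moment values below are illustrative assumptions, not the library's API or measured data.

```python
import math


def sharpe_standard_error(sr, n_bars, skew=0.0, kurt=3.0):
    """Standard error of a per-bar Sharpe ratio with skew/kurtosis adjustment.

    kurt is raw (non-excess) kurtosis, so Gaussian returns have kurt = 3.
    Hypothetical helper for illustration only.
    """
    variance = (1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2) / (n_bars - 1)
    return math.sqrt(variance)


# Gaussian baseline versus an illustrative negatively skewed, fat-tailed case.
naive = sharpe_standard_error(0.06, 200)
adjusted = sharpe_standard_error(0.06, 200, skew=-1.0, kurt=10.0)
print(naive, adjusted)
```

The adjustment grows with the magnitude of the Sharpe ratio, since the skewness term is linear and the kurtosis term quadratic in SR.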

Worked example

Generate N = 2,000 strategies with T = 200 bars each. 5% (k = 100) carry a true edge with $μ_{i} = 0.06 σ_{i}$ (so per-bar Sharpe of 0.06, annualised ≈ 0.95 at daily frequency). The remaining 1,900 are pure noise.

Expected number of |t| > 2 from the noise pool: 1900 · 2 · (1 − Φ(2)) ≈ 87 false discoveries.
Expected number of |t| > 2 from the alpha pool: t_i is approximately N(√200 · 0.06, 1) ≈ N(0.85, 1), so 100 · P(Z > 2 − 0.85) = 100 · P(Z > 1.15) ≈ 12.5, or roughly 12 true discoveries (the left tail adds only a fraction of a discovery).
The expected FDP at the |t| > 2 cutoff therefore lands near 87 / (87 + 12) ≈ 88%. That is the baseline any selection rule has to beat.
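The arithmetic of the worked example can be reproduced directly. This is a minimal sketch under the normal approximation, using only the standard library; variable names are illustrative, and the ratio of expected counts is used as a rough proxy for the expected FDP.

```python
import math
from statistics import NormalDist

Z = NormalDist()

n_total, n_alpha, n_bars = 2000, 100, 200
edge = 0.06        # per-bar Sharpe of the alpha stratum
cutoff = 2.0

n_noise = n_total - n_alpha
# Noise strategies: two-sided tail of N(0, 1).
exp_false = n_noise * 2.0 * (1.0 - Z.cdf(cutoff))
# Alpha strategies: t is approximately N(sqrt(T) * edge, 1).
# The (small) left tail is included for completeness.
shift = math.sqrt(n_bars) * edge
power = (1.0 - Z.cdf(cutoff - shift)) + Z.cdf(-cutoff - shift)
exp_true = n_alpha * power
exp_fdp = exp_false / (exp_false + exp_true)
print(f"false ~ {exp_false:.1f}, true ~ {exp_true:.1f}, FDP ~ {exp_fdp:.0%}")
```

Changing `edge`, `n_bars`, or `cutoff` shows how quickly the trade-off moves: the power term depends on √T · edge, while the false-discovery count depends only on the cutoff and the size of the noise pool.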

The demo below regenerates this scenario live. Sweep the alpha fraction up and watch the right tail of the Sharpe density grow; the (μ, σ) scatter colours every point flagged by |t| > 2 orange.
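For readers without access to the interactive demo, the same scenario can be approximated offline. This is a minimal simulation sketch using only the standard library; the function name, defaults, and random number generator are assumptions, so the counts will not match the demo's readouts even with the same seed value.

```python
import math
import random


def simulate_population(n_strategies=2000, n_bars=200, alpha_fraction=0.05,
                        edge=0.06, seed=3):
    """Simulate t-statistics for a noise pool plus a planted alpha stratum.

    Returns (t_stats, is_alpha). Names and defaults are illustrative.
    """
    rng = random.Random(seed)
    n_alpha = round(n_strategies * alpha_fraction)
    t_stats, is_alpha = [], []
    for i in range(n_strategies):
        planted = i < n_alpha
        mu = edge if planted else 0.0        # per-bar mean, with sigma = 1
        r = [rng.gauss(mu, 1.0) for _ in range(n_bars)]
        mean = sum(r) / n_bars
        var = sum((x - mean) ** 2 for x in r) / (n_bars - 1)
        t_stats.append(math.sqrt(n_bars) * mean / math.sqrt(var))
        is_alpha.append(planted)
    return t_stats, is_alpha


t_stats, is_alpha = simulate_population()
selected = [abs(t) > 2.0 for t in t_stats]
true_disc = sum(s and a for s, a in zip(selected, is_alpha))
false_disc = sum(s and not a for s, a in zip(selected, is_alpha))
fdp = false_disc / max(true_disc + false_disc, 1)
print(f"selected={true_disc + false_disc} true={true_disc} "
      f"false={false_disc} FDP={fdp:.0%}")
```

Re-running with a larger `alpha_fraction` mimics sweeping the slider: the count of true discoveries rises while the noise-pool contribution shrinks only in proportion to the remaining noise strategies.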

Demo — Sharpe density & (μ, σ, t) scatter

2000 strategies, T=200 bars each. Most are pure noise; 100 carry a small true edge.

N strategies: 2000

T bars: 200

α fraction (true edge): 0.05

seed = 3

planted α: 119

|t|>2 selected: 104

true discoveries

false discoveries

Sharpe density (orange = α-stratum)

(μ, σ) scatter — orange = |t| > 2

Figures

Fig. 1. Population OOS Sharpe density across the 30-asset corpus. Population kernel density of OOS Sharpe ratios across 130k+ strategy-window observations spanning the full 30-asset corpus (unperturbed baseline, n_trades ≥ 10). The empirical density is visibly wider than N(0,1): it is heavy-tailed, with non-trivial right-tail mass that survives any honest multiplicity correction.

Fig. 2. (mean, std, t) scatter coloured by win-rate quartile. (μ, σ) scatter of every OOS strategy in the corpus, coloured by win-rate quartile. Dashed amber curves mark the |t| = 2 boundary at the median trade count. Top-quartile win-rate strategies (green) are concentrated above the |t| = 2 line, which is the population-level signature that the ranking metric is finding real cross-strategy structure rather than redrawing the noise.

Why this matters for systematic strategies

Strategy populations of the size we work with (38k+ variants per asset) cannot realistically be inspected by hand; they need a library to summarise them. The library gives the same input to every downstream model: a per-strategy summary tensor with μ, σ, Sharpe, t, p-value, and per-bar return moments. M/01 (RMT) consumes the correlation structure of the returns; M/02 (HC + knockoffs) consumes the t-statistics; M/03 (TDA) consumes the same correlations as M/01 but extracts topological invariants; M/04 (EVT) consumes per-bar returns directly. Visualisation routines are the bridge between the population and human pattern-matching: the (μ, σ) scatter is where you first see whether your filtered subset is on a different planet from the bulk, or just a slightly luckier corner of the same one.

Reproducibility

DaruFinance / strategy-stats

Python — open source reference implementation

Minimal invocation

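import numpy as np
from strategy_stats import population_summary, sharpe_kde, mu_sigma_scatter

# returns: N x T matrix of per-bar strategy returns (one row per strategy).
# Placeholder noise data so the snippet runs end to end; swap in real returns.
rng = np.random.default_rng(3)
returns = rng.normal(0.0, 0.01, size=(2000, 200))

# Per-strategy summary; periods_per_year=252 corresponds to daily bars.
summary = population_summary(returns, periods_per_year=252)
print(summary.head())   # mu, sigma, sharpe, t, p_value

ax_kde = sharpe_kde(summary['sharpe'])
ax_scatter = mu_sigma_scatter(summary, color_by='t')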

References

[1] Bailey, D. H. & López de Prado, M. (2014). The deflated Sharpe ratio: correcting for selection bias, backtest overfitting, and non-normality. Journal of Portfolio Management 40(5), 94–107.
[2] Lo, A. W. (2002). The statistics of Sharpe ratios. Financial Analysts Journal 58(4), 36–52.
[3] Harvey, C. R., Liu, Y. & Zhu, H. (2016). … and the cross-section of expected returns. Review of Financial Studies 29(1), 5–68.
