M/02 · Higher Criticism + Model-X Knockoffs

Sparse signal detection with FDR control

Higher Criticism for global sparse-signal screening, paired with Model-X knockoffs for FDR-controlled variable selection on populations of strategies.

The mathematics

Higher Criticism

Suppose p hypothesis tests produce two-sided p-values, sorted as $p_{(1)} \leq p_{(2)} \leq \dots \leq p_{(p)}$. Under the global null, these are distributed as the order statistics of p i.i.d. Uniform(0,1) variables, so the j-th has expected value j/(p+1). The Higher Criticism statistic compares the empirical CDF of the p-values with its uniform expectation, standardised by the binomial standard deviation under the null:

$$HC^{*} = \max_{1 \leq j \leq \alpha p} \frac{\sqrt{p}\,\left(j/p - p_{(j)}\right)}{\sqrt{(j/p)(1 - j/p)}}$$

with α typically taken as 1/2. Donoho & Jin (2004) show that under the rare-and-weak alternative, where a fraction $\varepsilon_p = p^{-\beta}$ ($1/2 < \beta < 1$) of the tests are non-null with effect size $\mu_p = \sqrt{2 r \log p}$ ($0 < r < 1$), HC is asymptotically optimal: it detects the mixture whenever $(\beta, r)$ lies above the detection boundary $r = \rho^{*}(\beta)$, and below that boundary no test can. As a side benefit, the argmax j* gives a data-driven cutoff: reject the j* smallest p-values.
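The statistic and its argmax cutoff can be computed directly from sorted p-values. The following is a minimal base-R sketch of the formula above, assuming the p-values are already available; the function name `hc_star` is hypothetical and is not meant to mirror the `hc.knockoffs` API.

```r
# Higher Criticism HC* with the (j/p)(1 - j/p) denominator used above.
# hc_star() is a hypothetical helper, not the hc.knockoffs API.
hc_star <- function(pvals, alpha = 0.5) {
  stopifnot(alpha > 0, alpha < 1, floor(alpha * length(pvals)) >= 1)
  p <- length(pvals)
  p_sorted <- sort(pvals)
  j <- seq_len(floor(alpha * p))   # ranks 1..floor(alpha * p)
  frac <- j / p                    # always < 1 because alpha < 1
  hc <- sqrt(p) * (frac - p_sorted[j]) / sqrt(frac * (1 - frac))
  j_star <- which.max(hc)
  list(
    hc_star  = hc[j_star],
    j_star   = j_star,
    cutoff   = p_sorted[j_star],                  # HC threshold p_(j*)
    rejected = which(pvals <= p_sorted[j_star])   # the j* smallest (plus ties)
  )
}

# Example: 1000 null p-values with 30 planted strong signals.
set.seed(1)
pv <- runif(1000)
pv[1:30] <- 1e-8
res <- hc_star(pv)
res$hc_star   # large: the planted signals dominate the first 30 ranks
res$j_star
```

Because the planted p-values occupy the first 30 ranks, HC_j increases up to j = 30, so j* is at least 30 and every planted index is rejected.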

Model-X knockoffs

For p features X₁, …, X_p with known joint distribution, a Model-X knockoff matrix X̃ satisfies two conditions:

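Pairwise exchangeability: for any subset S ⊆ {1,…,p}, $(X, \tilde{X})_{\mathrm{swap}(S)} \overset{d}{=} (X, \tilde{X})$, where swap(S) exchanges each $X_j$ with $\tilde{X}_j$ for $j \in S$.
Conditional independence from the response: $\tilde{X} \perp\!\!\!\perp Y \mid X$.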
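For Gaussian features, the exchangeability condition can be met by sampling X̃ from an explicit conditional distribution. Below is a minimal base-R sketch of the equicorrelated Gaussian construction described by Candès et al. (2018), assuming each row of X is N(0, Σ) with Σ known and scaled to a correlation matrix. `create_gaussian_knockoffs` is a hypothetical name; the equicorrelated choice s = min(1, 2λ_min(Σ)) is one of several (the SDP construction is another), and this is not necessarily how `hc.knockoffs` builds its knockoffs.

```r
# Equicorrelated Gaussian Model-X knockoffs (sketch).
# Assumes rows of X ~ N(0, Sigma) with Sigma a known correlation matrix.
# create_gaussian_knockoffs() is a hypothetical helper name.
create_gaussian_knockoffs <- function(X, Sigma) {
  p <- ncol(X)
  lambda_min <- min(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values)
  s <- min(1, 2 * lambda_min)            # equicorrelated choice D = s * I
  D <- diag(s, p)
  Sigma_inv_D <- solve(Sigma, D)         # Sigma^{-1} D
  # X_tilde | X ~ N(X - X Sigma^{-1} D, 2D - D Sigma^{-1} D), row by row
  mu <- X %*% (diag(p) - Sigma_inv_D)
  V <- 2 * D - D %*% Sigma_inv_D
  V <- (V + t(V)) / 2                    # remove round-off asymmetry
  # Symmetric square root via eigendecomposition; V can be singular
  # when s = 2 * lambda_min, so clip tiny negative eigenvalues to zero.
  ev <- eigen(V, symmetric = TRUE)
  V_half <- ev$vectors %*% diag(sqrt(pmax(ev$values, 0)), p) %*% t(ev$vectors)
  noise <- matrix(rnorm(nrow(X) * p), nrow(X), p)
  mu + noise %*% V_half
}

set.seed(7)
p <- 3
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))   # AR(1) correlation, rho = 0.5
X <- matrix(rnorm(20000 * p), 20000, p) %*% chol(Sigma)
X_tilde <- create_gaussian_knockoffs(X, Sigma)
round(cov(X, X_tilde), 2)
```

The empirical cross-covariance should sit close to Σ − diag(s): off-diagonal entries match Σ while the diagonal shrinks by s. That is the joint structure under which swapping Xⱼ and X̃ⱼ leaves the distribution unchanged. Since X̃ is generated without looking at Y, the conditional-independence condition holds by construction.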

Construct any feature-importance statistic Z_j for the original variable and Z̃_j for its knockoff (for example lasso coefficients or regression z-scores), computed so that swapping X_j with X̃_j swaps Z_j and Z̃_j. The knockoff statistic is

$$W_{j} = |Z_{j}| - |\tilde{Z}_{j}|.$$
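To see the antisymmetry in action, here is a minimal base-R sketch with independent standard-normal features (Σ = I), where an independent N(0, I) copy of X is already a valid knockoff. It uses the marginal covariance Z_j = X_jᵀy / n as the importance statistic. This is a simpler stand-in for the lasso or regression z-scores mentioned above, and it satisfies the swap requirement because Z_j depends only on column j.

```r
# Knockoff statistics W_j = |Z_j| - |Z~_j| from marginal covariances.
# Assumes Sigma = I, so an independent N(0, I) copy is a valid knockoff.
set.seed(3)
n <- 2000; p <- 20; k <- 3
X <- matrix(rnorm(n * p), n, p)
X_tilde <- matrix(rnorm(n * p), n, p)       # generated without looking at y
beta <- c(rep(1, k), rep(0, p - k))         # first k features are non-null
y <- drop(X %*% beta + rnorm(n))

Z <- drop(crossprod(X, y)) / n              # importance of each original feature
Z_tilde <- drop(crossprod(X_tilde, y)) / n  # importance of each knockoff
W <- abs(Z) - abs(Z_tilde)

round(W[1:k], 2)      # non-nulls: clearly positive
round(W[-(1:k)], 3)   # nulls: small, with random signs
```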

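For the null features, the signs of the W_j are i.i.d. fair coin flips, independent of their magnitudes, so a large negative W_j is as likely as a large positive one. The knockoff filter picks the smallest threshold t > 0 such that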

$$\frac{1 + \#\{j : W_{j} \leq -t\}}{\max\left(\#\{j : W_{j} \geq t\},\, 1\right)} \leq q,$$

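and selects S = {j : W_j ≥ t}. This is the knockoff+ threshold, and Theorem 3.4 of Candès et al. (2018) gives it finite-sample control of the false discovery rate, where H₀ is the set of null features:

$$\mathrm{FDR}(S) = \mathbb{E}\left[\frac{|S \cap H_{0}|}{\max(|S|, 1)}\right] \leq q.$$

Dropping the 1 from the numerator gives the plain knockoff filter, for which the same theorem guarantees only the weaker modified-FDR bound $\mathbb{E}\left[|S \cap H_{0}| / (|S| + 1/q)\right] \leq q$.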
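The threshold search can be written as a scan over the observed magnitudes |W_j|, since the ratio only changes at those values. A minimal base-R sketch follows. `knockoff_threshold` and `knockoff_select` are hypothetical names, and the `offset` argument switches between knockoff+ (offset = 1, the rule with 1 + in the numerator) and the plain knockoff filter (offset = 0).

```r
# Knockoff(+) threshold: smallest t > 0 among the |W_j| with
# (offset + #{W_j <= -t}) / max(#{W_j >= t}, 1) <= q.
# offset = 1 gives knockoff+ (FDR); offset = 0 the plain filter (mFDR).
knockoff_threshold <- function(W, q = 0.10, offset = 1) {
  candidates <- sort(unique(abs(W[W != 0])))
  for (t in candidates) {
    fdp_hat <- (offset + sum(W <= -t)) / max(sum(W >= t), 1)
    if (fdp_hat <= q) return(t)
  }
  Inf  # no feasible threshold: nothing is selected
}

knockoff_select <- function(W, q = 0.10, offset = 1) {
  which(W >= knockoff_threshold(W, q, offset))
}

W <- c(5, 4, 3.5, 3, 2.5, 2, 1.8, 1.6, 1.4, 1.2, 1.1, -0.5, 0.3, -0.2, 0.1)
knockoff_threshold(W, q = 0.10)              # 1.1
knockoff_select(W, q = 0.10)                 # indices 1 to 11
knockoff_threshold(W, q = 0.10, offset = 0)  # 0.3
```

With offset = 1 the ratio is at least 1 / #{W_j ≥ t}, so knockoff+ at q = 0.10 either selects at least 10 features or selects none.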

The result on real strategy pools

Higher Criticism was run on the out-of-sample strategy pools for nine instruments spanning crypto, FX and commodities, each with more than thirty thousand parameter combinations. For each pool we compute the realised HC* over the ordered upper-tail p-values and compare it with the Monte-Carlo null quantile of HC* at the same dimension. A pool carries detectable sparse alpha only if its HC* exceeds that null quantile.
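The Monte-Carlo null quantile can be obtained by recomputing HC* on uniform p-values of the same length. A minimal base-R sketch: `hc_star_stat` and `hc_null_quantiles` are hypothetical names, B = 400 matches the replicate count quoted in the Fig. 2 caption, and the 0.95 and 0.99 probabilities match its reference lines. The uniform stand-in pool is illustrative only.

```r
# HC* with the (j/p)(1 - j/p) denominator, maximised over j <= alpha * p.
hc_star_stat <- function(pvals, alpha = 0.5) {
  p <- length(pvals)
  j <- seq_len(floor(alpha * p))
  frac <- j / p
  max(sqrt(p) * (frac - sort(pvals)[j]) / sqrt(frac * (1 - frac)))
}

# Null quantiles of HC* from B uniform p-value vectors of length p.
hc_null_quantiles <- function(p, B = 400, probs = c(0.95, 0.99), alpha = 0.5) {
  null_hc <- replicate(B, hc_star_stat(runif(p), alpha))
  quantile(null_hc, probs = probs, names = FALSE)
}

set.seed(42)
p <- 5000
q_null <- hc_null_quantiles(p)
pool_pvals <- runif(p)                 # stand-in for a real pool's upper-tail p-values
hc_star_stat(pool_pvals) > q_null[1]   # TRUE would flag detectable sparse signal
```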

The demo below plots those committed figures directly. Every value is read from the result table; nothing is regenerated or synthesised. The reading is stark: no instrument's HC* crosses its null boundary, so under this test there is no detectable sparse signal in any pool. This negative result reflects the very property that makes Higher Criticism trustworthy: it refuses to manufacture discoveries when none survive the null.

Demo: Higher Criticism vs Model-X Knockoffs

p features, k true non-nulls with effect size A. Knockoffs control FDR at level q; HC chooses a data-driven threshold from the empirical p-value process.

Parameters: p (features) = 200, k (true non-nulls) = 10, A (signal strength) = 3.0, q (target FDR) = 0.10, seed = 11.

|z| > 1.96 (naive): selected 19, true positives 9, false positives 10, FDP 52.6%, power 90.0%.

Higher Criticism: HC* = 3.50, argmax j = 43, selected 43, FDP 76.7%, power 100.0%.

Knockoffs @ q = 0.10: threshold t = ∅, selected 0, true positives 0, false positives 0, FDP 0.0%, power 0.0%.

Histogram of W statistics. Amber overlay = true non-nulls. The vertical green line is the knockoff threshold +t; everything to its right is selected. Under H0, W is symmetric about 0, and that symmetry is what makes the FDR control work.
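A rough base-R sketch of the naive and Higher Criticism panels follows. It assumes each feature contributes one z-score drawn as N(A, 1) for the k non-nulls and N(0, 1) otherwise, with two-sided p-values. The demo's own generator is not documented here, so this will not reproduce its exact figures. `simulate_demo` is a hypothetical name. FDP is false positives over selections and power is true positives over k, as in the panels above.

```r
simulate_demo <- function(p = 200, k = 10, A = 3.0, seed = 11) {
  set.seed(seed)
  z <- rnorm(p, mean = c(rep(A, k), rep(0, p - k)))  # first k are non-null (assumed model)
  pvals <- 2 * pnorm(-abs(z))                        # two-sided p-values
  truth <- seq_len(k)

  score <- function(sel) {
    tp <- sum(sel %in% truth)
    fp <- length(sel) - tp
    c(selected = length(sel), tp = tp, fp = fp,
      fdp = if (length(sel) > 0) fp / length(sel) else 0,
      power = tp / k)
  }

  naive <- which(abs(z) > 1.96)

  # HC cutoff: reject the j* smallest p-values, j* = argmax over j <= p/2
  j <- seq_len(floor(p / 2))
  frac <- j / p
  hc <- sqrt(p) * (frac - sort(pvals)[j]) / sqrt(frac * (1 - frac))
  hc_sel <- order(pvals)[seq_len(which.max(hc))]

  rbind(naive = score(naive), hc = score(hc_sel))
}

simulate_demo()
```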

Figures

Fig. 1: Knockoff W-statistic Wⱼ = |Zⱼ| − |Z̃ⱼ| on the BTC OOS strategy pool. Z is built from the pool's real (not simulated) Sharpe ratios, scaled by √n_trades and MAD-normalised; Z̃ is a sign-symmetric Gaussian knockoff. The amber dashed line is the data-dependent threshold τ for the FDR target q = 0.10; the selected strategies are everything to the right of τ.

Fig. 2: Higher Criticism HCⱼ = √p · (j/p − p₍ⱼ₎)/√((j/p)(1−j/p)), swept over the rank j of the sorted p-values from the same BTC pool. HC* is the maximum over j ∈ [1, p/2]. The dashed and dotted lines are the 95% and 99% quantiles of HC* under a uniform-pᵢ null, built from 400 simulated replicates of the same dimension.
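The preprocessing described in Fig. 1 can be sketched as follows. The sketch assumes per-trade Sharpe ratios, so that Sharpe × √n_trades approximates a t-statistic. It also reads "MAD-normalised" as centring on the median and dividing by R's `mad()`, which includes the 1.4826 consistency factor by default. Both readings are assumptions, and `sharpe_to_pvalues` is a hypothetical name. The upper-tail p-values are presumably the kind of input the results section feeds to Higher Criticism.

```r
# Hypothetical preprocessing: Sharpe ratios -> robust z-scores -> p-values.
sharpe_to_pvalues <- function(sharpe, n_trades) {
  z_raw <- sharpe * sqrt(n_trades)            # approx. t-statistic (assumes per-trade Sharpe)
  z <- (z_raw - median(z_raw)) / mad(z_raw)   # mad() is scaled by 1.4826 by default
  list(z = z, p_upper = pnorm(z, lower.tail = FALSE))
}

set.seed(5)
sharpe <- rnorm(1000, mean = 0, sd = 0.05)         # synthetic pool, illustration only
n_trades <- sample(50:500, 1000, replace = TRUE)
out <- sharpe_to_pvalues(sharpe, n_trades)
summary(out$p_upper)
```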

Why this matters for systematic strategies

A strategy pool with p = 38,000 parameter combinations on a single asset will produce about 1,900 nominally significant t-statistics at an unadjusted 5% level even if no strategy has any edge (38,000 × 0.05). The standard remedies in finance have been Bonferroni (too conservative, kills real signals) or BH-FDR (valid under independence or positive dependence, but without a guarantee under arbitrary dependence). Knockoffs are designed for the dependence structure that pools of trading strategies actually exhibit: they share market data, indicator families and parameter neighbourhoods.
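The arithmetic behind the first sentence, together with the Bonferroni cutoff and base R's `p.adjust` applied to a pool with no signal at all, is sketched below. The uniform p-values are an assumption standing in for a fully null pool. They are independent, so this sketch does not exercise the dependence problem described above.

```r
p <- 38000
alpha <- 0.05

expected_false <- p * alpha              # expected nominal rejections if every strategy is null
z_bonf <- qnorm(1 - alpha / (2 * p))     # two-sided Bonferroni |z| cutoff

set.seed(9)
pvals_null <- runif(p)                   # fully null, independent pool (assumption)
c(
  expected_false = expected_false,
  z_bonferroni   = round(z_bonf, 2),
  raw            = sum(pvals_null < alpha),
  bh             = sum(p.adjust(pvals_null, method = "BH") < alpha),
  bonferroni     = sum(p.adjust(pvals_null, method = "bonferroni") < alpha)
)
```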

Combined use: run Higher Criticism on the pool first. If HC* is below its null quantile at the pool's dimension, stop: there is no detectable signal at this sparsity. If HC* is above it, run knockoffs to extract the responsible features at controlled FDR.
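The two-stage rule can be written as a gate that only calls the selection step when HC* clears its Monte-Carlo null quantile. A minimal base-R sketch follows. `hc_gate` is a hypothetical name, and the 0.95 level is an assumption (Fig. 2 shows both 95% and 99% lines). `select_fn` is a placeholder for whatever knockoff call you use, for example the `knockoff_filter()` invocation in the reproducibility section.

```r
# Stage 1: HC* against its Monte-Carlo null quantile.
# Stage 2: run the selection step only if stage 1 passes.
hc_gate <- function(pvals, select_fn, level = 0.95, B = 400, alpha = 0.5) {
  hc_stat <- function(pv) {
    p <- length(pv)
    j <- seq_len(floor(alpha * p))
    frac <- j / p
    max(sqrt(p) * (frac - sort(pv)[j]) / sqrt(frac * (1 - frac)))
  }
  hc_obs <- hc_stat(pvals)
  null_hc <- replicate(B, hc_stat(runif(length(pvals))))
  threshold <- quantile(null_hc, probs = level, names = FALSE)
  passed <- hc_obs > threshold
  list(
    hc_star       = hc_obs,
    null_quantile = threshold,
    passed        = passed,
    selected      = if (passed) select_fn() else integer(0)  # stop early otherwise
  )
}

set.seed(21)
pv <- runif(2000)
pv[1:100] <- 1e-10                       # strong planted signal
gate <- hc_gate(pv, select_fn = function() which(pv < 1e-6))
gate$passed
```

Passing the second stage as a function keeps the gate lazy: on a pool that fails the HC test, the knockoff step is never run.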

Reproducibility

DaruFinance / hc-knockoffs

R · open source reference implementation

Minimal invocation

library(hc.knockoffs)

# Z: vector of per-strategy z-scores, length p
hc <- higher_criticism(Z)
hc$HC_star      # global statistic
hc$selected     # indices below the HC threshold

# Model-X knockoffs at FDR q = 0.10
sel <- knockoff_filter(Z, q = 0.10, method = "gaussian")
sel$tau         # data-dependent threshold
sel$selected    # selected feature indices

References

[1] Donoho, D. & Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Annals of Statistics 32(3), 962–994.
[2] Barber, R. F. & Candès, E. J. (2015). Controlling the false discovery rate via knockoffs. Annals of Statistics 43(5), 2055–2085.
[3] Candès, E., Fan, Y., Janson, L. & Lv, J. (2018). Panning for gold: Model-X knockoffs for high-dimensional controlled variable selection. JRSS-B 80(3), 551–577.
[4] Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSS-B 57(1), 289–300.
