Research review · portfolio construction · study 13

Portfolio Construction: HRP, NCO & Covariance Denoising vs Markowitz

Across 44 assets and 39 strategy markets, the López de Prado toolkit reliably beats raw mean-variance out-of-sample, but the standout finding is that denoising's value is entirely a function of q = T/N.

López de Prado argues that mean-variance optimisation on a sample covariance is a trap (the “Markowitz curse”): the more correlated the assets, the more unstable and concentrated the optimal weights. As the fix he proposes covariance denoising/detoning, Hierarchical Risk Parity (HRP), Nested Clustered Optimization (NCO) and Theory-Implied Correlation (TIC). This study reproduces all four and benchmarks them walk-forward against three controls (raw-covariance Markowitz, inverse-variance and naive 1/N) on real multi-market data, asking the only question that matters: do the methods lower realised out-of-sample variance, and survive a deflated-Sharpe correction, better than the controls? The explorer below lets you reproduce the central mechanic, the q = T/N regime dependence of denoising, in the browser.

source

Advances in Financial ML, Ch. 16

López de Prado (2018) · HRP, NCO & denoising

Code & data

lopez-de-prado-work-review / projects / 13

the claim

What López de Prado says

Mean-variance optimisation on a sample covariance is a trap, the "Markowitz curse": the more correlated the assets, the more unstable and concentrated the optimal weights become, and the matrix inversion fails exactly when you need diversification most.
His fixes avoid or stabilise the inversion: covariance denoising/detoning via the Marčenko–Pastur law, Hierarchical Risk Parity (clustering + recursive bisection, no inversion), Nested Clustered Optimization, and Theory-Implied Correlation. He argues these beat raw Markowitz out-of-sample.
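To make "clustering + recursive bisection, no inversion" concrete, here is a minimal HRP sketch. It is an illustration under stated assumptions, not the study's Numba kernel: it clusters on the correlation distance directly with single linkage (López de Prado's original applies a further distance-of-distances step), and the function names `hrp_weights` and `cluster_variance` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

def cluster_variance(cov, idx):
    """Variance of the inverse-variance portfolio restricted to the assets in idx."""
    sub = cov[np.ix_(idx, idx)]
    ivp = 1.0 / np.diag(sub)
    ivp /= ivp.sum()
    return float(ivp @ sub @ ivp)

def hrp_weights(cov):
    """Hierarchical Risk Parity weights; the covariance is never inverted."""
    std = np.sqrt(np.diag(cov))
    corr = np.clip(cov / np.outer(std, std), -1.0, 1.0)
    dist = np.sqrt(0.5 * (1.0 - corr))          # correlation distance
    np.fill_diagonal(dist, 0.0)
    link = linkage(squareform(dist, checks=False), method="single")
    order = leaves_list(link)                   # quasi-diagonal asset order
    weights = np.ones(cov.shape[0])
    clusters = [order]
    while clusters:                             # recursive bisection, top down
        next_clusters = []
        for cluster in clusters:
            if len(cluster) < 2:
                continue
            half = len(cluster) // 2
            left, right = cluster[:half], cluster[half:]
            var_left = cluster_variance(cov, left)
            var_right = cluster_variance(cov, right)
            alpha = 1.0 - var_left / (var_left + var_right)
            weights[left] *= alpha              # lower-variance half gets more
            weights[right] *= 1.0 - alpha
            next_clusters += [left, right]
        clusters = next_clusters
    return weights

# Synthetic example (assumption: i.i.d. Gaussian returns with unequal vols).
rng = np.random.default_rng(0)
returns = rng.standard_normal((500, 10)) * np.linspace(0.5, 2.0, 10)
w_hrp = hrp_weights(np.cov(returns, rowvar=False))
print(w_hrp.round(3), w_hrp.sum())
```

On a diagonal covariance the bisection collapses to plain inverse-variance weights, which is a convenient sanity check for any HRP implementation.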

our result

What we found

On the variance objective the toolkit works and the curse is real: raw Markowitz is the worst allocator, last in 39/39 strategy markets and, in the 44-asset universe, at 0.418 OOS vol against 0.146 for the best LdP method (~65% lower), with degenerate concentration.
But on a comparable-risk basis the honest margin is modest: HRP beats raw Markowitz by ~15% once legs are equal-risk, and the best deflated Sharpe often belongs to the simplest robust baselines (1/N, inverse-variance), not the elaborate methods.
A clean, teachable nuance: denoising's value is entirely a function of q = T/N. It is a no-op on a singular (q<1) covariance and only helps once q > 1. None of the methods turns a losing menu into a winning one; their job is risk control, not alpha (DSR ≈ 0 in the net-losing strategy universe).

The result in three lines

Markowitz last · 39/39 markets

The toolkit wins on variance, reliably

Raw mean-variance / min-variance Markowitz is the worst allocator on realised out-of-sample variance in all 39 strategy markets, in every framing, and raw mean-variance is the worst in the 44-asset universe in raw return units. HRP, NCO and inverse-variance lower OOS variance and de-concentrate the book; the Markowitz curse is real and unconditional here.

Edge over raw min-variance ≈ 15%

But the honest margin is modest

Once legs enter at equal risk, HRP beats raw min-variance in 38/39 markets by a median of only ~15%; the un-normalised ~50% headline is largely an artifact of raw dollar magnitudes. On a comparable-risk basis the honest OOS winner is often plain 1/N / inverse-variance, with HRP a close second.

Denoising = f(q = T/N)

The standout result

Covariance denoising's value is entirely a function of q = T/N. At N=300 / T=252 (q=0.84) the sample covariance is singular, with a condition number of ~4e17, and Marčenko–Pastur denoising is a near-no-op. Halve the menu to N=150 (q=1.68) and it cuts the condition number ~4× and finds that only 7 of 150 eigenvalues are signal.

The standout finding: when does denoising actually help?

Covariance denoising's value is entirely a function of q = T/N

Fig. 1: Denoising helps only when q = T/N > 1. The Marčenko–Pastur fit breaks down for a singular (q<1) sample covariance. At N=300 with a 252-day in-sample window (q=0.84) denoising barely moves the condition number (1.2×); halve the menu to N=150 (q=1.68) and the same method cuts it 4.1×; in the assets universe (q=28) it cuts it 9.6×.

This is the cleanest, most teachable result in the study, and the one most HRP/NCO write-ups gloss over. The Marčenko–Pastur fit that powers covariance denoising breaks down for a singular sample covariance (for q < 1 the spectrum carries at least N − T exact zero eigenvalues), and you have a singular covariance whenever you have more series than in-sample days.
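A minimal sketch of why q < 1 forces singularity, using synthetic i.i.d. Gaussian returns (an assumption; the study uses real PnL series). A demeaned sample covariance built from T observations has rank at most T − 1, so N > T guarantees zero eigenvalues and an exploding condition number. The helper name `sample_cov_diagnostics` is hypothetical.

```python
import numpy as np

def sample_cov_diagnostics(n_days, n_series, seed=0):
    """Rank and condition number of a sample covariance from synthetic returns."""
    rng = np.random.default_rng(seed)
    returns = rng.standard_normal((n_days, n_series))  # synthetic data (assumption)
    cov = np.cov(returns, rowvar=False)                # demeans each column
    return np.linalg.matrix_rank(cov), np.linalg.cond(cov)

rank_300, cond_300 = sample_cov_diagnostics(n_days=252, n_series=300)  # q = 0.84
rank_150, cond_150 = sample_cov_diagnostics(n_days=252, n_series=150)  # q = 1.68

print(f"N=300: rank = {rank_300} of 300, cond = {cond_300:.1e}")
print(f"N=150: rank = {rank_150} of 150, cond = {cond_150:.1e}")
```

The exact condition number in the singular case is floating-point noise divided into the largest eigenvalue, so only its order of magnitude is meaningful.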

With N=300 strategies and a 252-day window, q = 0.84 < 1: the sample covariance is rank-deficient with a condition number of ~4e17, and MP denoising is a near-no-op that keeps all 300 eigenvalues as “signal.” This is precisely López de Prado's motivation for HRP: when you cannot invert the covariance at all, the inversion-free methods are the only ones that do not blow up.

Halve the menu to N=150 (same window → q = 1.68) and the covariance flips to well-posed: denoising now cuts the condition number 4.1× and finds only 7 of 150 eigenvalues are signal, flattening the other 143 as noise. The single most important practical takeaway: a practitioner with ~300 strategies and one year of daily data is in the regime where denoising cannot help and HRP / inverse-variance are mandatory.
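The well-posed case can be sketched as follows. This is a simplified illustration, not the study's engine: it fixes the noise variance at 1 instead of fitting it (López de Prado fits it to the empirical spectrum), uses the constant-residual-eigenvalue rule, and runs on synthetic returns with three planted factors. `mp_upper_edge` and `denoise_corr` are hypothetical names, and the sketch does not attempt to reproduce the q < 1 behaviour reported above.

```python
import numpy as np

def mp_upper_edge(n_days, n_series, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for a correlation matrix."""
    return sigma2 * (1.0 + np.sqrt(n_series / n_days)) ** 2

def denoise_corr(corr, n_days):
    """Flatten every eigenvalue at or below the MP edge to their common mean."""
    n_series = corr.shape[0]
    eigvals, eigvecs = np.linalg.eigh(corr)             # ascending eigenvalues
    is_signal = eigvals > mp_upper_edge(n_days, n_series)
    cleaned = eigvals.copy()
    if not is_signal.all():
        cleaned[~is_signal] = eigvals[~is_signal].mean()  # trace is preserved
    rebuilt = (eigvecs * cleaned) @ eigvecs.T
    scale = np.sqrt(np.diag(rebuilt))
    return rebuilt / np.outer(scale, scale), int(is_signal.sum())

# Synthetic panel: 3 common factors plus idiosyncratic noise (assumption).
rng = np.random.default_rng(0)
n_days, n_series, n_factors = 252, 150, 3              # q = 1.68
factors = rng.standard_normal((n_days, n_factors))
loadings = rng.standard_normal((n_factors, n_series))
returns = factors @ loadings + rng.standard_normal((n_days, n_series))

corr = np.corrcoef(returns, rowvar=False)
denoised, n_signal = denoise_corr(corr, n_days)
print(f"signal eigenvalues: {n_signal} / {n_series}")
print(f"cond raw = {np.linalg.cond(corr):.1f}, denoised = {np.linalg.cond(denoised):.1f}")
```

Flattening the noise eigenvalues to their mean lifts the smallest eigenvalue away from zero, which is where the condition-number reduction comes from.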

universe	N	q = T/N	cond (raw)	cond (denoised)	reduction	signal factors
assets	9	28.0	8.6e3	9.0e2	9.6×	1 / 9
strategies N=300	300	0.84	4.3e17	3.5e17	1.2×	300 / 300
strategies N=150	150	1.68	2.1e3	5.0e2	4.1×	7 / 150

Median per-fold condition number of the covariance fed to the allocator. q is computed on the active-instrument count per fold. Detoning can re-inflate conditioning (it removes the market eigenvector, leaving a near-singular residual), so it is a clustering aid, not a conditioning fix.
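The detoning remark can be checked directly. The sketch below removes the top (market) eigen-component from a correlation matrix and rescales to unit diagonal; because rescaling is a congruence transform, the rank deficit survives and the detoned matrix has a zero eigenvalue. The data are synthetic one-factor returns and `detone_corr` is a hypothetical name, so treat this as an illustration rather than the study's implementation.

```python
import numpy as np

def detone_corr(corr, n_market=1):
    """Remove the top n_market eigen-components, then rescale to unit diagonal."""
    eigvals, eigvecs = np.linalg.eigh(corr)            # ascending eigenvalues
    top_vals = eigvals[-n_market:]
    top_vecs = eigvecs[:, -n_market:]
    residual = corr - (top_vecs * top_vals) @ top_vecs.T
    scale = np.sqrt(np.diag(residual))
    return residual / np.outer(scale, scale)

# Synthetic one-factor market (assumption): every series loads on one driver.
rng = np.random.default_rng(0)
n_days, n_series = 500, 8
market = rng.standard_normal((n_days, 1))
returns = market @ np.ones((1, n_series)) + rng.standard_normal((n_days, n_series))

corr = np.corrcoef(returns, rowvar=False)
detoned = detone_corr(corr)
eig_raw = np.linalg.eigvalsh(corr)
eig_detoned = np.linalg.eigvalsh(detoned)
print(f"smallest eigenvalue raw     = {eig_raw.min():.3f}")
print(f"smallest eigenvalue detoned = {eig_detoned.min():.1e}")
```

The detoned matrix is therefore useful as a clustering input, where singularity is harmless, but not as something to invert.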

Demo: Markowitz-curse / denoising explorer

Slide across the study's three real walk-forward regimes and watch the covariance condition number explode as q = T/N falls toward 1, and Marčenko–Pastur denoising collapse to a near-no-op exactly where the sample covariance turns singular.

[Interactive explorer, data from condition_number_deepening.csv. Slider: regime by q = T/N, from q<1 (singular) to q≫1 (well-posed). State shown, q = 0.84: sample covariance singular (q<1); condition number raw 4.3e17, denoised 3.5e17; denoising reduction 1.2×; eigenvalues kept as signal 300 / 300.]

q = 252-day in-sample window ÷ N. Bars are the median per-fold condition number of the covariance fed to the allocator.

At q = 0.84 < 1 there are more strategies (300) than in-sample days (252): the sample covariance is rank-deficient with a condition number near 4.3e17. The Marčenko–Pastur fit breaks down here, so denoising is a near-no-op (just 1.2×, keeping all 300 eigenvalues as "signal"). You cannot invert this matrix; only the inversion-free methods (HRP, inverse-variance) survive.

And the payoff: realised OOS annualised vol per allocator on the 44-asset universe (equal-risk legs). Lower is better. NCO and HRP lead; raw mean-variance Markowitz (0.100) is the worst of the raw-covariance allocators, and only denoise-detone NCO (0.111) is higher.

portfolio_assets: realised OOS annualised vol, equal-risk legs

allocator	OOS vol
nco	0.068 (best)
hrp	0.071
min_var_raw	0.073
tic_nco	0.074
hrp_denoise	0.084
1/N	0.091
inv_var	0.091
mean_var_raw	0.100
nco_denoise_detone	0.111 (worst)

Legend groups: naive (1/N, inv-var), raw Markowitz, HRP, NCO / TIC.

The Markowitz curse, measured

Raw Markowitz is last in 39/39 strategy markets, in every framing

This is López de Prado's intended use-case, allocating a risk budget across many de-correlated strategies, and the cleanest demonstration of the curse. Raw mean-variance / min-variance Markowitz is the worst allocator on OOS variance in all 39 markets, in both q regimes and both unit conventions. The curse is real and unconditional here.
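A small synthetic experiment shows the mechanism behind the curse. The true covariance below is equicorrelated with equal variances, so the true minimum-variance portfolio is exactly 1/N; estimating the covariance from barely more days than assets and inverting it produces large offsetting long and short positions. All parameters (50 assets, 60 days, ρ = 0.8) and helper names are illustrative assumptions.

```python
import numpy as np

def min_variance_weights(cov):
    """Fully invested minimum-variance weights: inv(cov) @ 1, normalised to sum to 1."""
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()

def equicorrelated_cov(n_assets, rho):
    """Unit variances with the same correlation rho between every pair."""
    return (1.0 - rho) * np.eye(n_assets) + rho * np.ones((n_assets, n_assets))

n_assets, n_days, rho = 50, 60, 0.8
true_cov = equicorrelated_cov(n_assets, rho)

rng = np.random.default_rng(3)
returns = rng.multivariate_normal(np.zeros(n_assets), true_cov, size=n_days)
sample_cov = np.cov(returns, rowvar=False)

w_true = min_variance_weights(true_cov)      # equals 1/N for this covariance
w_hat = min_variance_weights(sample_cov)     # estimated from noisy data

gross_exposure = np.abs(w_hat).sum()         # 1.0 would mean long-only
var_true = w_true @ true_cov @ w_true        # best achievable variance
var_hat = w_hat @ true_cov @ w_hat           # what the estimated weights deliver
print(f"gross exposure of estimated weights: {gross_exposure:.2f}")
print(f"true variance, estimated / optimal:  {var_hat / var_true:.2f}")
```

Scoring the estimated weights against the true covariance plays the role of the out-of-sample evaluation: the optimiser's in-sample promise is not what the portfolio actually delivers.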

But the honest margin matters. In raw dollar-PnL units HRP beats raw Markowitz by a median ~50%, a figure inflated because the singular-covariance Markowitz weights blow up on a few high-dollar-vol legs. On the fair, vol-targeted basis (every leg at 10% annualised risk) HRP beats raw min-variance in 38/39 markets by a median of only ~15%. A real but unspectacular edge.

Two nuances the chart makes plain: (i) on a comparable-risk basis the honest winner is the simplest robust method, plain 1/N / inverse-variance, with HRP close behind; the elaborate methods (NCO, detoning) rank worse than 1/N in this net-noisy panel, a caution against over-engineering. (ii) Under vol-targeting inverse-variance reduces to 1/N exactly, the expected consistency check, which the run reproduces.
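The consistency check in (ii) follows from the definition: inverse-variance weights are proportional to 1/σᵢ², so once every leg is scaled to the same in-sample volatility they are all equal. A minimal sketch on synthetic legs (the leg volatilities and the name `inverse_variance_weights` are illustrative assumptions):

```python
import numpy as np

def inverse_variance_weights(cov):
    """Weights proportional to 1 / variance, normalised to sum to 1."""
    inv_var = 1.0 / np.diag(cov)
    return inv_var / inv_var.sum()

rng = np.random.default_rng(2)
raw = rng.standard_normal((252, 5)) * np.array([0.5, 1.0, 2.0, 4.0, 8.0])
scaled = raw / raw.std(axis=0, ddof=1)       # every leg at the same in-sample vol

w_raw = inverse_variance_weights(np.cov(raw, rowvar=False))
w_scaled = inverse_variance_weights(np.cov(scaled, rowvar=False))
print("raw legs:   ", w_raw.round(3))
print("scaled legs:", w_scaled.round(3))
```

The equality is exact only in-sample, where the scaler and the variance estimate use the same window; out of sample the legs drift away from equal risk.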

Fig. 2: Strategies universe, vol-targeted (equal-risk legs), mean OOS-vol rank across 39 markets (1 = best). Raw mean-variance and min-variance Markowitz are last in both the singular (q=0.84, N=300) and well-posed (q=1.68, N=150) regimes. The honest winner on a comparable-risk basis is plain 1/N / inverse-variance, with HRP a close second.

The 44-asset universe

Same verdict on real cross-asset returns

Fig. 3: Assets universe (44 instruments, vol-targeted 10%/leg), realised OOS annualised volatility, lower is better. HRP, NCO and TIC-NCO cluster at the low-variance end; raw mean-variance Markowitz is the highest of the raw-covariance allocators. The detone-NCO bar, higher still, is a reminder that detoning trades conditioning for clustering quality.

On daily close-to-close returns of 44 real instruments (crypto perps, US-equity ETFs and FX majors) the picture holds. In raw return units (table) TIC-NCO and HRP deliver the lowest OOS vol (0.146 and 0.159); raw mean-variance is the worst at 0.418, roughly 2.9× the best LdP method (equivalently, the best method runs ~65% lower), with a degenerate effective-N of 2.6 (it bets the book on a handful of names). That is the curse, measured.
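For readers who want to reproduce the concentration diagnostic, a common definition of effective-N is the inverse Herfindahl index of the weights. The study does not spell out its exact formula, so the definition below, including the choice to normalise by absolute weights when shorts are present, is an assumption.

```python
import numpy as np

def herfindahl(weights):
    """Herfindahl-Hirschman index of the absolute weights, normalised to sum to 1."""
    w = np.abs(np.asarray(weights, dtype=float))
    w = w / w.sum()
    return float(np.sum(w ** 2))

def effective_n(weights):
    """Inverse HHI: the number of equal positions with the same concentration."""
    return 1.0 / herfindahl(weights)

equal_weight = np.full(44, 1.0 / 44)
concentrated = np.array([0.6, 0.3, 0.1])
print(effective_n(equal_weight))    # equals the number of names for equal weights
print(effective_n(concentrated))    # 1 / (0.36 + 0.09 + 0.01)
```

Under this definition an effective-N of 2.6 means the book carries the same concentration as fewer than three equal positions, whatever the nominal number of names.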

On the risk-adjusted axis a simple robust baseline holds its own: the best Deflated Sharpe ratios belong to denoise-detone NCO (0.75) and inverse-variance (0.64), not to the inversion-based Markowitz allocators (0.44 and 0.17) and not to HRP (0.43). HRP's edge is risk reduction and de-concentration, not alpha.

allocator	OOS vol	OOS Sharpe	DSR	eff-N	cond. number
tic_nco	0.146	0.10	0.34	3.8	1.5e4
hrp	0.159	0.15	0.43	4.9	1.9e16
hrp_denoise	0.162	0.16	0.44	4.8	3.6e4
inv_var	0.167	0.27	0.64	6.8	1.9e16
nco_denoise_detone	0.180	0.34	0.75	6.3	1.7e16
nco	0.181	−0.04	0.15	2.5	1.9e16
1/N	0.282	0.09	0.32	18.2	1.9e16
min_var_raw	0.292	−0.03	0.17	2.1	1.9e16
mean_var_raw	0.418	0.16	0.44	2.6	1.9e16

Best → worst by OOS vol, raw return units. Condition number is the median per-fold value of the covariance fed to the allocator. Note that HRP is fed the singular raw covariance (1.9e16) but never inverts it, while TIC and denoised methods impose a well-conditioned structure (1.5e4, 3.6e4).

Does the HRP edge scale?

HRP's advantage over 1/N grows with universe breadth

Fig. 5: OOS annualised variance of HRP (and NCO) divided by equal-weight (1/N), by universe size N, on a survivorship-aware 690-name crypto-perp panel (24 walk-forward folds). Below the dotted parity line means HRP delivered lower out-of-sample variance than 1/N, and the gap widens with breadth, from ~13% variance reduction at N=25 to roughly half by N=100+.

The result above (39 strategy markets, 44 assets), with HRP barely beating 1/N on variance, turns out to be a small-universe artifact. Re-running the same walk-forward on a survivorship-aware 690-name crypto-perp panel and sweeping the universe size N shows HRP's hierarchical diversification needs breadth to express itself: the OOS variance ratio versus 1/N falls from ~0.87 at N=25 (≈13% variance reduction) to ~0.45–0.50 at N=100 and beyond (≈50% reduction).

The honest caveat: at large N the variance win is not a Sharpe win. HRP concentrates as the tree deepens, effective breadth collapses (eff-N ≈ 4–5 at N=690 despite hundreds of names) and turnover climbs from 0.14 to 0.49. So a fee-aware deployment nets only part of this variance edge against costs; the result is a risk-reduction story, not a free alpha story.

N	HRP var	1/N var	HRP/1N var	HRP Sharpe	1/N Sharpe
25	0.518	0.597	0.87	0.80	0.64
50	0.594	0.653	0.91	0.83	0.60
100	0.337	0.676	0.50	0.87	0.58
200	0.307	0.685	0.45	0.28	0.53
690	0.321	0.669	0.48	0.19	0.52

OOS annualised variance and Sharpe by universe size N (24 walk-forward folds, 690-name crypto-perp panel). HRP/1N var < 1 = HRP lower variance than equal-weight; the ratio drops from ~0.9 at N ≤ 50 to ~0.5 at N ≥ 100. Note the Sharpe columns cross over: 1/N overtakes HRP at large N as HRP concentrates.
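The ratio column and the implied variance reductions can be recomputed from the two variance columns of the table above; the snippet uses only those published figures.

```python
universe_sizes = [25, 50, 100, 200, 690]
hrp_var = [0.518, 0.594, 0.337, 0.307, 0.321]   # HRP OOS annualised variance
ew_var = [0.597, 0.653, 0.676, 0.685, 0.669]    # 1/N OOS annualised variance

ratios = [h / e for h, e in zip(hrp_var, ew_var)]
for n, ratio in zip(universe_sizes, ratios):
    print(f"N={n:>3}: HRP / 1/N variance = {ratio:.2f}, reduction = {1 - ratio:.0%}")
```

A ratio of 0.87 corresponds to a 13% variance reduction and 0.50 to a 50% reduction; the ratio is not monotone in N, with the big step between N=50 and N=100.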

The risk-adjusted axis

Lower variance does not mean more return. Deflating each allocator's realised OOS Sharpe against the nine-allocator menu separates the robust methods from the fragile ones.
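A sketch of the deflation step, following the Deflated Sharpe Ratio of Bailey and López de Prado (2014): the observed per-period Sharpe is tested against the maximum Sharpe expected from the menu of trials if every true Sharpe were zero, with a correction for skewness and kurtosis. The nine return streams here are synthetic and the function names are hypothetical; how the study estimates the cross-trial Sharpe variance is not stated, so the sample variance across the menu is an assumption.

```python
import numpy as np
from scipy.stats import kurtosis, norm, skew

EULER_GAMMA = 0.5772156649015329

def expected_max_sharpe(trial_sharpes):
    """Expected maximum Sharpe across the trials under a zero true Sharpe."""
    n_trials = len(trial_sharpes)
    spread = np.std(trial_sharpes, ddof=1)
    z_a = norm.ppf(1.0 - 1.0 / n_trials)
    z_b = norm.ppf(1.0 - 1.0 / (n_trials * np.e))
    return spread * ((1.0 - EULER_GAMMA) * z_a + EULER_GAMMA * z_b)

def deflated_sharpe(returns, trial_sharpes):
    """Probability that the true Sharpe exceeds the selection-bias benchmark."""
    returns = np.asarray(returns, dtype=float)
    n_obs = returns.size
    sharpe = returns.mean() / returns.std(ddof=1)       # per period, not annualised
    benchmark = expected_max_sharpe(trial_sharpes)
    g3 = skew(returns)
    g4 = kurtosis(returns, fisher=False)                # Pearson kurtosis (normal = 3)
    denom = np.sqrt(1.0 - g3 * sharpe + (g4 - 1.0) / 4.0 * sharpe ** 2)
    return float(norm.cdf((sharpe - benchmark) * np.sqrt(n_obs - 1) / denom))

# Nine synthetic daily return streams standing in for the nine allocators.
rng = np.random.default_rng(0)
streams = 0.01 * rng.standard_normal((756, 9)) + 0.0005
trial_sharpes = streams.mean(axis=0) / streams.std(axis=0, ddof=1)
dsr = [deflated_sharpe(streams[:, j], trial_sharpes) for j in range(9)]
print([round(value, 2) for value in dsr])
```

The wider the spread of Sharpe ratios across the menu, the higher the benchmark and the lower every allocator's DSR, which is why the size and dispersion of the menu belong in the report.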

Fig. 4: Risk-adjusted view: Deflated Sharpe Ratio vs realised OOS Sharpe, assets universe (DSR deflated against the nine-allocator menu). Denoise-detone NCO and inverse-variance lead; raw NCO and raw min-variance trail. The robust, parameter-light methods are what survive deflation; the inversion-heavy raw methods are what the data punishes.

López de Prado's strongest specific claim, that the full denoise → detone → NCO pipeline dominates, is only partially supported here. NCO needs denoised, well-conditioned input to behave: raw NCO posts a negative OOS Sharpe and the worst DSR among the NCO variants. Detoning trades conditioning for clustering quality. The robust, parameter-light methods (HRP, inverse-variance, denoised-NCO) are what survive out of sample.

The honest negative

The de-correlated strategy samples are net-losing (median OOS Sharpe ≈ −4.9 at N=300, vol-targeted). The greedy-decorrelation sampler deliberately pulls uncorrelated names from a corpus dominated by losers, and no allocator can manufacture return from a losing menu: the Deflated Sharpe is ≈ 0 everywhere in the strategy universe. The portfolio methods control risk, not sign. We report this rather than hide it behind a return-positive cherry-pick.

Verdict

HRP / NCO / denoising beat raw Markowitz at controlling out-of-sample variance, reproducibly (Markowitz last in 39/39 strategy markets and worst in the 44-asset universe in raw return units). But on a comparable-risk basis HRP's edge over raw min-variance is modest (~15% in the strategy universe), the simplest robust baselines (1/N, inverse-variance) are often the honest OOS winner, and none of the methods turns a losing menu into a winning one. This is a faithful, unembellished confirmation of the spirit of the work (matrix inversion on a noisy or singular sample covariance is the enemy; structure and shrinkage help) that does not oversell the magnitude of HRP's edge.

Method

Real data only, multi-market: daily close-to-close returns of 44 instruments (crypto perps, US-equity ETFs, FX majors), plus up to 300 greedily de-correlated per-strategy daily-PnL series across 39 markets, the López de Prado use-case.
Walk-forward, causal: weights are estimated on a 252-day in-sample window, held over the following 63-day out-of-sample window, then rolled by 63 days. Weights are never scored on the data that built them.
Nine allocators compared: 1/N, inverse-variance, raw min-variance and mean-variance Markowitz, HRP, HRP-on-denoised, NCO, denoise-detone NCO, and TIC-NCO.
Headline metric is realised OOS annualised volatility (lower is better), with the Deflated Sharpe Ratio of each allocator's realised stream deflated against the nine-allocator menu (False Strategy Theorem).
Diagnostics: weight concentration (effective-N, HHI) and the condition number of the covariance actually fed to the allocator, the direct measure of the curse. A causal, in-sample-only per-leg vol scaler puts every leg at a common 10% annualised risk so cross-method comparison is fair.
The Numba HRP kernel is verified bit-identical to the numpy reference (max |Δw| ≤ 5.6e-17) on N ∈ {8, 17, 40, 120} every run.
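The walk-forward protocol and the causal vol scaler described in the list above can be sketched as follows. The 252/63/63 windows and the 10% target come from the text; the function names, the 252-day annualisation and the synthetic returns are assumptions, and inverse-variance stands in for whichever allocator is under test.

```python
import numpy as np

TRADING_DAYS = 252  # annualisation convention (assumption)

def walk_forward_folds(n_obs, in_sample=252, out_sample=63, step=63):
    """Yield (train, test) index arrays; each test window starts where train ends."""
    start = 0
    while start + in_sample + out_sample <= n_obs:
        split = start + in_sample
        yield np.arange(start, split), np.arange(split, split + out_sample)
        start += step

def vol_target_scale(in_sample_returns, target_vol=0.10):
    """Per-leg multipliers estimated on in-sample data only (causal)."""
    daily_vol = in_sample_returns.std(axis=0, ddof=1)
    return target_vol / (daily_vol * np.sqrt(TRADING_DAYS))

def inverse_variance_weights(in_sample_returns):
    """Stand-in allocator: weights proportional to 1 / in-sample variance."""
    inv_var = 1.0 / in_sample_returns.var(axis=0, ddof=1)
    return inv_var / inv_var.sum()

def run_walk_forward(returns, allocator):
    """Concatenate the OOS portfolio returns from every fold."""
    oos = []
    for train, test in walk_forward_folds(len(returns)):
        scale = vol_target_scale(returns[train])          # fitted on train only
        weights = allocator(returns[train] * scale)       # fitted on train only
        oos.append((returns[test] * scale) @ weights)     # scored on unseen data
    return np.concatenate(oos)

# Synthetic legs with very different raw volatilities (assumption).
rng = np.random.default_rng(0)
returns = rng.standard_normal((1000, 4)) * np.array([0.005, 0.01, 0.02, 0.04])
oos_returns = run_walk_forward(returns, inverse_variance_weights)
oos_vol = oos_returns.std(ddof=1) * np.sqrt(TRADING_DAYS)
print(f"folds = {len(list(walk_forward_folds(len(returns))))}, OOS vol = {oos_vol:.3f}")
```

Because the step equals the out-of-sample length, the test windows tile the sample without overlap, so the concatenated stream is a single continuous OOS track record.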

Notes & limitations

Two unit conventions are reported because they measure different things: the OOS-vol ranking is scale-invariant and robust, but the magnitude of HRP's edge depends on how legs are scaled. The un-normalised ~50% figure is partly an artifact of raw dollar magnitudes, and the vol-targeted ~15% is the fair comparison. The strategy corpus is net-losing, so the Sharpe story is muted by design, and the methods themselves are López de Prado's, not new. The contribution is the empirical, multi-market, walk-forward audit, especially the q = T/N regime dependence of denoising.

Reproducibility

The engine (allocators, MP denoise/detone, HRP/NCO/TIC, walk-forward, DSR, the bit-identity-tested Numba HRP kernel), the figures and the tables are collected in project 13 of lopez-de-prado-work-review. The explorer on this page is self-contained: the condition-number regimes and the per-allocator OOS-vol numbers are encoded from the study's tables, so the mechanic it illustrates reproduces exactly on every load.

Cite

Cite as

Gatto, D. V. (2026). When Does Covariance Denoising Actually Help? A Walk-Forward, Multi-Market Audit of HRP, NCO and TIC vs Markowitz. Working paper.

@techreport{gatto2026portfolio,
  author      = {Gatto, Daniel V.},
  title       = {When Does Covariance Denoising Actually Help? A Walk-Forward,
                 Multi-Market Audit of HRP, NCO and TIC vs Markowitz},
  year        = {2026},
  type        = {Working paper},
  note        = {Review of Lopez de Prado's portfolio-construction apparatus}
}

References

The primary sources for the apparatus reviewed here:

López de Prado, M. (2016). Building Diversified Portfolios that Outperform Out of Sample. Journal of Portfolio Management, 42(4). (Hierarchical Risk Parity.)
López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. (Ch. 16, HRP.)
López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press. (Covariance denoising, detoning, and Nested Clustered Optimization.)
López de Prado, M., & Lewis, M. J. (2019). Detection of False Investment Strategies Using Unsupervised Learning Methods. Quantitative Finance.
López de Prado, M. (2019). Estimation of Theory-Implied Correlation Matrices. Working paper. (Theory-Implied Correlation.)
Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management, 40(5).