
Research review · portfolio construction · study 13

Portfolio Construction: HRP, NCO & Covariance Denoising vs Markowitz

Across 44 assets and 39 strategy markets, the López de Prado toolkit reliably beats raw mean-variance out-of-sample, but the standout finding is that denoising's value depends entirely on q = T/N, the ratio of in-sample days T to series count N.

López de Prado argues that mean-variance optimisation on a sample covariance is a trap: under the “Markowitz curse,” more correlated assets give more unstable, more concentrated optimal weights. As the fix he proposes covariance denoising/detoning, Hierarchical Risk Parity (HRP), Nested Clustered Optimization (NCO) and Theory-Implied Correlation (TIC). This study reproduces all four and benchmarks them walk-forward against raw-covariance Markowitz, inverse-variance and naive 1/N controls on real multi-market data, asking the only question that matters: do the methods lower realised out-of-sample variance, and survive a deflated-Sharpe correction, better than the controls? The explorer below lets you reproduce the central mechanic, the q = T/N regime dependence of denoising, in the browser.

source

Advances in Financial ML, Ch. 16

López de Prado (2018) · HRP, NCO & denoising

Code & data

lopez-de-prado-work-review / projects / 13

the claim

What López de Prado says

  • Mean-variance optimisation on a sample covariance is a trap, the "Markowitz curse": the more correlated the assets, the more unstable and concentrated the optimal weights become, and the matrix inversion fails exactly when you need diversification most.
  • His fixes avoid or stabilise the inversion: covariance denoising/detoning via the Marčenko–Pastur law, Hierarchical Risk Parity (clustering + recursive bisection, no inversion), Nested Clustered Optimization, and Theory-Implied Correlation. He argues these beat raw Markowitz out-of-sample.
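The link between correlation and unstable inversion can be made concrete with a toy case. The sketch below assumes an equicorrelation matrix (every pair of assets shares the same correlation rho), which is an illustrative simplification and not the study's data. Its eigenvalues are known in closed form, so the condition number, the quantity that governs how badly the inversion amplifies estimation noise, can be computed directly. The function name and the 50-asset example are hypothetical.

```python
def equicorr_condition_number(n_assets: int, rho: float) -> float:
    """Condition number of an n x n correlation matrix whose off-diagonals all equal rho."""
    if n_assets < 2 or not 0.0 <= rho < 1.0:
        raise ValueError("need n_assets >= 2 and 0 <= rho < 1")
    largest = 1.0 + (n_assets - 1) * rho  # single "market" eigenvalue
    smallest = 1.0 - rho                  # repeated n_assets - 1 times
    return largest / smallest


# Hypothetical 50-asset menu: the condition number grows without bound as rho -> 1.
for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho={rho:4.2f}  cond={equicorr_condition_number(50, rho):10.1f}")
```

The condition number rises with both the correlation and the number of assets, which is the sense in which the inversion degrades exactly when diversification is most needed.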

our result

What we found

  • On the variance objective the toolkit works and the curse is real: raw Markowitz is the worst allocator everywhere, last in 39/39 strategy markets and, in the 44-asset universe, at roughly 2.9× the OOS vol of the best LdP method (0.418 vs 0.146, so the best method is ~65% lower), with degenerate concentration.
  • But on a comparable-risk basis the honest margin is modest: HRP beats raw Markowitz by ~15% once legs are equal-risk, and the best deflated Sharpe often belongs to the simplest robust baselines (1/N, inverse-variance), not the elaborate methods.
  • A clean, teachable nuance: denoising's value is entirely a function of q = T/N. It is a near-no-op on a singular (q<1) covariance and only helps once q > 1. None of the methods turns a losing menu into a winning one; their job is risk control, not alpha (Deflated Sharpe Ratio, DSR ≈ 0 in the net-losing strategy universe).

The result in three lines

Markowitz last · 39/39 markets

The toolkit wins on variance, reliably

Raw mean-variance / min-variance Markowitz is the worst allocator on realised out-of-sample variance in all 39 strategy markets and in the 44-asset universe, in every framing. HRP, NCO and inverse-variance lower OOS variance and de-concentrate the book; the Markowitz curse is real and unconditional here.

Edge over raw Markowitz ≈ 15%

But the honest margin is modest

Once legs enter at equal risk, HRP beats raw min-variance in 38/39 markets by a median of only ~15%; the un-normalised ~50% headline is largely an artifact of raw dollar magnitudes. On a comparable-risk basis the honest OOS winner is often plain 1/N / inverse-variance, with HRP a close second.

Denoising = f(q = T/N)

The standout result

Covariance denoising's value is entirely a function of q = T/N. At N=300 / T=252 (q=0.84) the sample covariance is singular (condition number ~4e17) and Marčenko–Pastur denoising is a near-no-op. Halve the menu to N=150 (q=1.68) and it cuts the condition number ~4× and finds that only 7 of 150 eigenvalues are signal.

The standout finding: when does denoising actually help?

Covariance denoising's value is entirely a function of q = T/N

Fig. 1: Denoising helps only when q = T/N > 1. The Marčenko–Pastur law is undefined for a singular (q<1) sample covariance. At N=300 with a 252-day in-sample window (q=0.84) denoising barely moves the condition number (1.2×); halve the menu to N=150 (q=1.68) and the same method cuts it 4.1×; in the assets universe (q=28) it cuts it 9.6×.

This is the cleanest, most teachable result in the study, and the one most HRP/NCO write-ups gloss over. The Marčenko–Pastur law that powers covariance denoising is undefined for a singular sample covariance, and you have a singular covariance whenever you have more series than in-sample days.

With N=300 strategies and a 252-day window, q = 0.84 < 1: the sample covariance is rank-deficient with a condition number of ~4e17, and MP denoising is a near-no-op: it keeps all 300 eigenvalues as “signal.” This is precisely López de Prado's motivation for HRP: when you cannot invert the covariance at all, the inversion-free methods are the only ones that do not blow up.
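To show what "inversion-free" means in practice, here is a minimal sketch of HRP's recursive-bisection step. It assumes the asset order has already been produced by the hierarchical-clustering (quasi-diagonalisation) step, which is omitted here; the function names and the example matrix are hypothetical and this is not the study's Numba kernel. Note that only the diagonal and sub-blocks of the covariance are read, and no matrix is ever inverted.

```python
def cluster_variance(cov, items):
    """Variance of the inverse-variance portfolio restricted to `items`."""
    inv_diag = [1.0 / cov[i][i] for i in items]
    total = sum(inv_diag)
    w = [x / total for x in inv_diag]
    return sum(
        w[a] * w[b] * cov[i][j]
        for a, i in enumerate(items)
        for b, j in enumerate(items)
    )


def hrp_bisection(cov, order):
    """Split the ordered list in halves and share risk inversely to each half's variance."""
    weights = {i: 1.0 for i in order}
    clusters = [list(order)]
    while clusters:
        next_clusters = []
        for cluster in clusters:
            if len(cluster) < 2:
                continue  # a single asset cannot be split further
            mid = len(cluster) // 2
            left, right = cluster[:mid], cluster[mid:]
            var_left = cluster_variance(cov, left)
            var_right = cluster_variance(cov, right)
            alpha = 1.0 - var_left / (var_left + var_right)  # share given to the left half
            for i in left:
                weights[i] *= alpha
            for i in right:
                weights[i] *= 1.0 - alpha
            next_clusters += [left, right]
        clusters = next_clusters
    return weights


# Hypothetical 4-asset covariance, already in clustered order.
cov = [
    [0.04, 0.03, 0.00, 0.00],
    [0.03, 0.09, 0.00, 0.00],
    [0.00, 0.00, 0.01, 0.00],
    [0.00, 0.00, 0.00, 0.16],
]
weights = hrp_bisection(cov, order=[0, 1, 2, 3])
print({i: round(w, 3) for i, w in weights.items()})
```

On a diagonal covariance this procedure collapses to plain inverse-variance weights, which is a convenient sanity check for any implementation.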

Halve the menu to N=150 (same window → q = 1.68) and the covariance flips to well-posed: denoising now cuts the condition number 4.1× and finds only 7 of 150 eigenvalues are signal, flattening the other 143 as noise. The single most important practical takeaway: a practitioner with ~300 strategies and one year of daily data is in the regime where denoising cannot help and HRP / inverse-variance are mandatory.

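| universe | N | q = T/N | cond (raw) | cond (denoised) | reduction | signal factors |
| --- | --- | --- | --- | --- | --- | --- |
| assets | 9 | 28.0 | 8.6e3 | 9.0e2 | 9.6× | 1 / 9 |
| strategies N=300 | 300 | 0.84 | 4.3e17 | 3.5e17 | 1.2× | 300 / 300 |
| strategies N=150 | 150 | 1.68 | 2.1e3 | 5.0e2 | 4.1× | 7 / 150 |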

Median per-fold condition number of the covariance fed to the allocator. q is computed on the active-instrument count per fold. Detoning can re-inflate conditioning (it removes the market eigenvector, leaving a near-singular residual), so it is a clustering aid, not a conditioning fix.
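The denoising mechanic behind these numbers can be sketched as follows. This is a simplified illustration, not the study's engine: it assumes NumPy, fixes the Marčenko–Pastur variance parameter at 1 rather than fitting it to the observed spectrum, uses the constant-residual-eigenvalue variant (noise eigenvalues replaced by their average), and runs on simulated one-factor returns. The names `mp_upper_edge` and `denoise_corr`, the seed and the N=50 menu are all hypothetical.

```python
import numpy as np


def mp_upper_edge(q, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for q = T/N (sigma2 assumed, not fitted)."""
    return sigma2 * (1.0 + np.sqrt(1.0 / q)) ** 2


def denoise_corr(corr, q):
    """Replace eigenvalues at or below the MP edge by their average, then rescale to unit diagonal."""
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    is_signal = eigvals > mp_upper_edge(q)
    n_signal = int(is_signal.sum())
    cleaned = eigvals.copy()
    if n_signal < len(eigvals):
        cleaned[~is_signal] = eigvals[~is_signal].mean()
    rebuilt = (eigvecs * cleaned) @ eigvecs.T  # V diag(cleaned) V^T
    scale = np.sqrt(np.diag(rebuilt))
    return rebuilt / np.outer(scale, scale), n_signal


# Simulated data: one common factor plus idiosyncratic noise, q = 252 / 50 > 1.
rng = np.random.default_rng(0)
T, N = 252, 50
market = rng.standard_normal((T, 1))
returns = 0.5 * market + rng.standard_normal((T, N))
corr = np.corrcoef(returns, rowvar=False)

denoised, n_signal = denoise_corr(corr, q=T / N)
print(f"signal eigenvalues: {n_signal} of {N}")
print(f"cond raw: {np.linalg.cond(corr):.1f}  cond denoised: {np.linalg.cond(denoised):.1f}")
```

Flattening the noise eigenvalues raises the smallest eigenvalue toward the bulk average, which is why the condition number drops when the matrix is well-posed to begin with.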

Demo, Markowitz-curse / denoising explorer

Slide across the study's three real walk-forward regimes and watch the covariance condition number explode as q = T/N falls toward 1, and Marčenko–Pastur denoising collapse to a near-no-op exactly where the sample covariance turns singular.

Explorer snapshot (data file: condition_number_deepening.csv). The slider runs from q<1 (singular) to q≫1 (well-posed); the values below are for the q = 0.84 regime.

| quantity | q = 0.84 |
| --- | --- |
| sample covariance | singular (q<1) |
| cond. number, raw | 4.3e17 |
| cond. number, denoised | 3.5e17 |
| denoising reduction | 1.2× |
| eigenvalues kept as signal | 300 / 300 |

[Chart: covariance condition number (log scale, 1e3 to 1e18), 300 strategies. Bars: raw cov 4.3e17, MP-denoised 3.5e17; annotated singular / non-invertible.]

q = 252-day in-sample window ÷ N. Bars are the median per-fold condition number of the covariance fed to the allocator.

At q = 0.84 < 1 there are more strategies (300) than in-sample days (252): the sample covariance is rank-deficient with a condition number near 4.3e17. The Marčenko–Pastur law is undefined here, so denoising is a near-no-op (just 1.2×, keeping all 300 eigenvalues as "signal"). You cannot invert this matrix; only the inversion-free methods (HRP, inverse-variance) survive.
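The rank deficiency is a counting fact, not a property of the data, and can be checked on pure noise. The sketch below assumes NumPy and uses simulated Gaussian returns with the same shape as the singular regime (252 days, 300 series); the seed and the 1e-10 relative threshold for "near-zero" are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 252, 300                        # in-sample days, series count (q = T/N = 0.84)
returns = rng.standard_normal((T, N))  # simulated returns, not the study's data
cov = np.cov(returns, rowvar=False)    # N x N sample covariance

# Demeaning removes one degree of freedom, so the rank cannot exceed T - 1.
rank = np.linalg.matrix_rank(cov)
eigvals = np.linalg.eigvalsh(cov)
n_near_zero = int((eigvals < 1e-10 * eigvals.max()).sum())

print(f"q = {T / N:.2f}, rank = {rank} of {N}, near-zero eigenvalues = {n_near_zero}")
```

With at least N − T eigenvalues numerically zero, the reported condition number is set by floating-point round-off rather than by the data, which is why it lands near the reciprocal of machine precision.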

And the payoff: realised OOS annualised vol per allocator on the 44-asset universe (equal-risk legs). Lower is better. NCO and HRP lead; raw mean-variance Markowitz (0.100) is the highest of the non-detoned allocators, exceeded only by denoise-detone NCO (0.111).

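Chart data (portfolio_assets), best → worst:

| allocator | OOS annualised vol |
| --- | --- |
| nco | 0.068 (best) |
| hrp | 0.071 |
| min_var_raw | 0.073 |
| tic_nco | 0.074 |
| hrp_denoise | 0.084 |
| 1/N | 0.091 |
| inv_var | 0.091 |
| mean_var_raw | 0.100 |
| nco_denoise_detone | 0.111 (worst) |

Colour groups in the chart: naive (1/N, inv-var), raw Markowitz, HRP, NCO / TIC.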

The Markowitz curse, measured

Raw Markowitz is last in 39/39 strategy markets, in every framing

This is López de Prado's intended use-case, allocating a risk budget across many de-correlated strategies, and the cleanest demonstration of the curse. Raw mean-variance / min-variance Markowitz is the worst allocator on OOS variance in all 39 markets, in both q regimes and both unit conventions. The curse is real and unconditional here.

But the honest margin matters. In raw dollar-PnL units HRP beats raw Markowitz by a median of ~50%, a figure inflated because the singular-covariance Markowitz weights blow up on a few high-dollar-vol legs. On the fair, vol-targeted basis (every leg at 10% annualised risk) HRP beats raw min-variance in 38/39 markets by a median of only ~15%: a real but unspectacular edge.

Two nuances the chart makes plain: (i) on a comparable-risk basis the honest winner is the simplest robust method, plain 1/N / inverse-variance, with HRP close behind; the elaborate methods (NCO, detoning) rank worse than 1/N in this net-noisy panel, a caution against over-engineering. (ii) Under vol-targeting every leg has the same in-sample variance, so inverse-variance reduces to 1/N exactly; this is the expected consistency check, and the run reproduces it.
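The consistency check in (ii) is a one-line identity, sketched here with hypothetical numbers. It assumes the per-leg scaler and the inverse-variance weights are built from the same in-sample volatility estimate; the three raw volatilities are made up for illustration.

```python
def inverse_variance_weights(vols):
    """Weights proportional to 1 / vol^2, normalised to sum to one."""
    inv_var = [1.0 / (v * v) for v in vols]
    total = sum(inv_var)
    return [x / total for x in inv_var]


raw_vols = [0.05, 0.20, 0.80]  # hypothetical in-sample annualised vols per leg
target = 0.10                  # common 10% annualised risk target

scalers = [target / v for v in raw_vols]                  # in-sample-only leg scaler
scaled_vols = [s * v for s, v in zip(scalers, raw_vols)]  # every leg now sits at the target

print("raw legs:   ", [round(w, 3) for w in inverse_variance_weights(raw_vols)])
print("scaled legs:", [round(w, 3) for w in inverse_variance_weights(scaled_vols)])
```

Before scaling, the lowest-vol leg dominates the book; after scaling, all legs carry the same variance and the weights are equal.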

Fig. 2: Strategies universe, vol-targeted (equal-risk legs), mean OOS-vol rank across 39 markets (1 = best). Raw mean-variance and min-variance Markowitz are last in both the singular (q=0.84, N=300) and well-posed (q=1.68, N=150) regimes. The honest winner on a comparable-risk basis is plain 1/N / inverse-variance, with HRP a close second.

The 44-asset universe

Same verdict on real cross-asset returns

Fig. 3: Assets universe (44 instruments, vol-targeted 10%/leg), realised OOS annualised volatility, lower is better. HRP, NCO and TIC-NCO cluster at the low-variance end; raw mean-variance Markowitz is the highest of the non-detoned allocators. The detone-NCO bar, the highest overall, is a reminder that detoning trades conditioning for clustering quality.

On daily close-to-close returns of 44 real instruments (crypto perps, US-equity ETFs and FX majors) the picture holds. In raw return units (table) TIC-NCO and HRP deliver the lowest OOS vol (0.146, 0.159); raw mean-variance is the worst at 0.418, roughly 2.9× the OOS vol of the best LdP method (equivalently, the best method is ~65% lower), with a degenerate effective-N of 2.6 (it bets the book on a handful of names). That is the curse, measured.

On the risk-adjusted axis the simplest robust method holds its own: the two best Deflated Sharpe ratios belong to denoise-detone NCO (0.75) and inverse-variance (0.64), not to the inversion-based Markowitz, and not to HRP (0.43). HRP's edge is risk reduction and de-concentration, not alpha.

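| allocator | OOS vol | OOS Sharpe | DSR | eff-N | cond. number |
| --- | --- | --- | --- | --- | --- |
| tic_nco | 0.146 | 0.10 | 0.34 | 3.8 | 1.5e4 |
| hrp | 0.159 | 0.15 | 0.43 | 4.9 | 1.9e16 |
| hrp_denoise | 0.162 | 0.16 | 0.44 | 4.8 | 3.6e4 |
| inv_var | 0.167 | 0.27 | 0.64 | 6.8 | 1.9e16 |
| nco_denoise_detone | 0.180 | 0.34 | 0.75 | 6.3 | 1.7e16 |
| nco | 0.181 | −0.04 | 0.15 | 2.5 | 1.9e16 |
| 1/N | 0.282 | 0.09 | 0.32 | 18.2 | 1.9e16 |
| min_var_raw | 0.292 | −0.03 | 0.17 | 2.1 | 1.9e16 |
| mean_var_raw | 0.418 | 0.16 | 0.44 | 2.6 | 1.9e16 |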

Best → worst by OOS vol, raw return units. Condition number is the median per-fold value of the covariance fed to the allocator. Note that HRP is fed the singular raw covariance (1.9e16) but never inverts it, while TIC and denoised methods impose a well-conditioned structure (1.5e4, 3.6e4).
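The eff-N column measures how many names a book effectively holds. The study does not spell out its formula here, so the sketch below assumes the common inverse-Herfindahl definition, eff-N = 1 / Σ wᵢ², computed on absolute weights normalised to sum to one. The function names and example weight vectors are hypothetical.

```python
def hhi(weights):
    """Herfindahl-Hirschman index of the absolute weights, normalised to sum to one."""
    total = sum(abs(w) for w in weights)
    return sum((abs(w) / total) ** 2 for w in weights)


def effective_n(weights):
    """Inverse HHI: the number of equal-weight names with the same concentration."""
    return 1.0 / hhi(weights)


equal_book = [1.0 / 44] * 44                      # hypothetical equal-weight book
concentrated_book = [0.6, 0.3, 0.1] + [0.0] * 41  # hypothetical three-name bet

print(f"equal-weight eff-N: {effective_n(equal_book):.1f}")
print(f"concentrated eff-N: {effective_n(concentrated_book):.2f}")
```

Under this definition an effective-N in the 2 to 3 range means the portfolio behaves like a bet on two or three names regardless of how many instruments are nominally held.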

Does the HRP edge scale?

HRP's advantage over 1/N grows with universe breadth

Fig. 5: OOS annualised variance of HRP (and NCO) divided by equal-weight (1/N), by universe size N, on a survivorship-aware 690-name crypto-perp panel (24 walk-forward folds). Below the dotted parity line means HRP delivered lower out-of-sample variance than 1/N, and the gap widens with breadth, from ~13% variance reduction at N=25 to roughly half by N=100+.

The 39-market / 44-asset result above, HRP barely beating 1/N on variance, turns out to be a small-universe artifact. Re-running the same walk-forward on a survivorship-aware 690-name crypto-perp panel and sweeping the universe size N shows HRP's hierarchical diversification needs breadth to express itself: the OOS variance ratio versus 1/N falls from ~0.87 at N=25 (≈13% variance reduction) to ~0.45 to 0.50 at N=100 and beyond (≈50% reduction).

The honest caveat: at large N the variance win is not a Sharpe win. HRP concentrates as the tree deepens, effective breadth collapses (eff-N ≈ 4–5 at N=690 despite hundreds of names) and turnover climbs from 0.14 to 0.49. So a fee-aware deployment nets only part of this variance edge against costs; the result is a risk-reduction story, not a free alpha story.

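| N | HRP var | 1/N var | HRP/1N var | HRP Sharpe | 1/N Sharpe |
| --- | --- | --- | --- | --- | --- |
| 25 | 0.518 | 0.597 | 0.87 | 0.80 | 0.64 |
| 50 | 0.594 | 0.653 | 0.91 | 0.83 | 0.60 |
| 100 | 0.337 | 0.676 | 0.50 | 0.87 | 0.58 |
| 200 | 0.307 | 0.685 | 0.45 | 0.28 | 0.53 |
| 690 | 0.321 | 0.669 | 0.48 | 0.19 | 0.52 |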

OOS annualised variance and Sharpe by universe size N (24 walk-forward folds, 690-name crypto-perp panel). HRP/1N var < 1 = HRP lower variance than equal-weight; the ratio falls as N grows. Note the Sharpe columns cross over: 1/N overtakes HRP at large N as HRP concentrates.
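The ratio column and the implied variance reduction follow directly from the two variance columns. The short check below recomputes them from the table's own figures; only the helper name `variance_ratio` is introduced here.

```python
def variance_ratio(hrp_var, ew_var):
    """HRP out-of-sample variance divided by equal-weight variance."""
    return hrp_var / ew_var


# (N, HRP var, 1/N var) copied from the table above.
rows = [
    (25, 0.518, 0.597),
    (50, 0.594, 0.653),
    (100, 0.337, 0.676),
    (200, 0.307, 0.685),
    (690, 0.321, 0.669),
]

for n, hrp_var, ew_var in rows:
    ratio = variance_ratio(hrp_var, ew_var)
    print(f"N={n:3d}  ratio={ratio:.2f}  variance reduction={1 - ratio:.0%}")
# N= 25  ratio=0.87  variance reduction=13%
# N= 50  ratio=0.91  variance reduction=9%
# N=100  ratio=0.50  variance reduction=50%
# N=200  ratio=0.45  variance reduction=55%
# N=690  ratio=0.48  variance reduction=52%
```

The reduction is not monotone in N: it dips at N=50 before the step change at N=100.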

The risk-adjusted axis

Lower variance does not mean more return. Deflating each allocator's realised OOS Sharpe against the nine-allocator menu separates the robust methods from the fragile ones.

Fig. 4: Risk-adjusted view: Deflated Sharpe Ratio vs realised OOS Sharpe, assets universe (DSR deflated against the nine-allocator menu). Denoise-detone NCO and inverse-variance lead; raw NCO and raw min-variance trail. The robust, parameter-light methods are what survive deflation; the inversion-heavy raw methods are what the data punishes.

López de Prado's strongest specific claim, that the full denoise → detone → NCO pipeline dominates, is only partially supported here. NCO needs denoised, well-conditioned input to behave: raw NCO posts a negative OOS Sharpe and the worst DSR among the NCO variants. Detoning trades conditioning for clustering quality. The robust, parameter-light methods (HRP, inverse-variance, denoised-NCO) are what survive out of sample.
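For readers unfamiliar with the deflation step, the following is a minimal sketch of the Deflated Sharpe Ratio as given in Bailey & López de Prado (2014), using only the Python standard library. It takes a per-period (non-annualised) Sharpe ratio, the number of return observations, the number of trials (nine allocators in this study) and the variance of Sharpe ratios across trials. The study does not state its exact inputs, so every number in the example call is hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_norm = NormalDist()


def expected_max_sharpe(n_trials, sharpe_variance):
    """Expected best Sharpe among n_trials skill-less trials (the deflation benchmark)."""
    a = _norm.inv_cdf(1.0 - 1.0 / n_trials)
    b = _norm.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe(sharpe, n_obs, n_trials, sharpe_variance, skew=0.0, kurt=3.0):
    """Probability that the observed Sharpe exceeds the benchmark, adjusting for non-normality."""
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1.0 - skew * sharpe + (kurt - 1.0) / 4.0 * sharpe ** 2)
    return _norm.cdf((sharpe - sr0) * math.sqrt(n_obs - 1) / denom)


# Hypothetical inputs: daily Sharpe 0.05 over 1260 days, nine allocators tried.
dsr = deflated_sharpe(sharpe=0.05, n_obs=1260, n_trials=9, sharpe_variance=0.0004)
print(f"DSR = {dsr:.2f}")
```

The benchmark rises with the number of trials, so the same realised Sharpe earns a lower DSR the larger the menu it was selected from.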

The honest negative

The de-correlated strategy samples are net-losing (median OOS Sharpe ≈ −4.9 at N=300, vol-targeted). The greedy-decorrelation sampler deliberately pulls uncorrelated names from a corpus dominated by losers, and no allocator can manufacture return from a losing menu: the Deflated Sharpe is ≈ 0 everywhere in the strategy universe. The portfolio methods control risk, not sign. We report this rather than hide it behind a return-positive cherry-pick.

Verdict

HRP / NCO / denoising beat raw Markowitz at controlling out-of-sample variance, reproducibly (Markowitz last in 39/39 strategy markets and worst in the 44-asset universe). But on a comparable-risk basis the edge over raw min-variance is modest (~15% in the strategy universe), the simplest robust baselines (1/N, inverse-variance) are often the honest OOS winner, and none of the methods turns a losing menu into a winning one. This is a faithful, unembellished confirmation of the spirit of the work: matrix inversion on a noisy or singular sample covariance is the enemy, and structure and shrinkage help. It does not oversell the magnitude of HRP's edge.

Method

  • Real data only, multi-market: daily close-to-close returns of 44 instruments (crypto perps, US-equity ETFs, FX majors), plus up to 300 greedily de-correlated per-strategy daily-PnL series across 39 markets, the López de Prado use-case.
  • Walk-forward, causal: weights are estimated on a 252-day in-sample window, held over the following 63-day out-of-sample window, then rolled by 63 days. Weights are never scored on the data that built them.
  • Nine allocators compared: 1/N, inverse-variance, raw min-variance and mean-variance Markowitz, HRP, HRP-on-denoised, NCO, denoise-detone NCO, and TIC-NCO.
  • Headline metric is realised OOS annualised volatility (lower is better), with the Deflated Sharpe Ratio of each allocator's realised stream deflated against the nine-allocator menu (False Strategy Theorem).
  • Diagnostics: weight concentration (effective-N, HHI) and the condition number of the covariance actually fed to the allocator, the direct measure of the curse. A causal, in-sample-only per-leg vol scaler puts every leg at a common 10% annualised risk so cross-method comparison is fair.
  • The Numba HRP kernel is verified bit-identical to the numpy reference (max |Δw| ≤ 5.6e-17) on N ∈ {8, 17, 40, 120} every run.
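The walk-forward schedule described above (252-day in-sample window, 63-day out-of-sample window, rolled by 63 days) can be sketched as an index generator. This is an illustration of the schedule only, not the study's engine; the function name and the half-open `(start, stop)` index convention are assumptions.

```python
def walk_forward_folds(n_days, in_sample=252, out_of_sample=63, step=63):
    """Return [((is_start, is_stop), (oos_start, oos_stop)), ...] as half-open index ranges."""
    folds = []
    start = 0
    while start + in_sample + out_of_sample <= n_days:
        is_range = (start, start + in_sample)
        oos_range = (start + in_sample, start + in_sample + out_of_sample)
        folds.append((is_range, oos_range))
        start += step  # roll both windows forward
    return folds


# Hypothetical two-year history of 504 trading days.
folds = walk_forward_folds(504)
for is_range, oos_range in folds:
    print(f"fit on days {is_range}, score on days {oos_range}")
```

Because each out-of-sample window starts exactly where its in-sample window ends, and the step equals the out-of-sample length, the scored windows tile the history without overlap and never include data used to fit the weights.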

Notes & limitations

Two unit conventions are reported because they measure different things: the OOS-vol ranking is scale-invariant and robust, but the magnitude of HRP's edge depends on how legs are scaled. The un-normalised ~50% figure is partly an artifact of raw dollar magnitudes, and the vol-targeted ~15% is the fair comparison. The strategy corpus is net-losing, so the Sharpe story is muted by design, and the methods themselves are López de Prado's, not new. The contribution is the empirical, multi-market, walk-forward audit, especially the q = T/N regime dependence of denoising.

Reproducibility

The engine (allocators, MP denoise/detone, HRP/NCO/TIC, walk-forward, DSR, the bit-identity-tested Numba HRP kernel), the figures and the tables are collected in project 13 of lopez-de-prado-work-review. The explorer on this page is self-contained: the condition-number regimes and the per-allocator OOS-vol numbers are encoded from the study's tables, so the mechanic it illustrates reproduces exactly on every load.


References

The primary sources for the apparatus reviewed here:

  • López de Prado, M. (2016). Building Diversified Portfolios that Outperform Out of Sample. Journal of Portfolio Management, 42(4). (Hierarchical Risk Parity.)
  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. (Ch. 16, HRP.)
  • López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press. (Covariance denoising, detoning, and Nested Clustered Optimization.)
  • López de Prado, M., & Lewis, M. J. (2019). Detection of False Investment Strategies Using Unsupervised Learning Methods, and the Theory-Implied Correlation matrix. Quantitative Finance.
  • Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management, 40(5).

See also

The companion study on selection discipline and the Deflated Sharpe Ratio is at Backtest Overfitting & the Deflated Sharpe Ratio, and the broader body of work is at Research.

Portfolio Construction: HRP, NCO & Covariance Denoising vs Markowitz | Daru Finance