Research review · portfolio construction · study 13
Portfolio Construction: HRP, NCO & Covariance Denoising vs Markowitz
Across 44 assets and 39 strategy markets, the López de Prado toolkit reliably beats raw mean-variance out-of-sample, but the standout finding is that denoising's value is entirely a function of q = T/N
López de Prado argues that mean-variance optimisation on a sample covariance is a trap, the “Markowitz curse,” where more correlated assets give more unstable, more concentrated optimal weights, and proposes covariance denoising/detoning, Hierarchical Risk Parity (HRP), Nested Clustered Optimization (NCO) and Theory-Implied Correlation (TIC) as the fix. This study reproduces all four and benchmarks them walk-forward against the raw-covariance controls, inverse-variance and naive 1/N, on real multi-market data, asking the only question that matters: do the methods actually lower realised out-of-sample variance, and survive a deflated-Sharpe correction, more than the controls? The explorer below lets you reproduce the central mechanic, the q = T/N regime dependence of denoising, in the browser.
Advances in Financial ML, Ch. 16
López de Prado (2018) · HRP, NCO & denoising
Code & data
lopez-de-prado-work-review / projects / 13
the claim
What López de Prado says
- Mean-variance optimisation on a sample covariance is a trap, the "Markowitz curse": the more correlated the assets, the more unstable and concentrated the optimal weights become, and the matrix inversion fails exactly when you need diversification most.
- His fixes avoid or stabilise the inversion: covariance denoising/detoning via the Marčenko–Pastur law, Hierarchical Risk Parity (clustering + recursive bisection, no inversion), Nested Clustered Optimization, and Theory-Implied Correlation. He argues these beat raw Markowitz out-of-sample.
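The mechanism behind the first bullet can be illustrated on a toy model. The sketch below assumes an equicorrelation matrix (every pair of assets shares the same correlation ρ), which is an illustrative assumption and not the study's data; the helper names `equicorr` and `cond_closed_form` are hypothetical. For this matrix the eigenvalues are 1 + (N−1)ρ once and 1 − ρ repeated N−1 times, so the condition number, which governs how strongly inversion amplifies estimation error, grows without bound as ρ approaches 1.

```python
import numpy as np


def equicorr(n: int, rho: float) -> np.ndarray:
    """n x n correlation matrix with every off-diagonal entry equal to rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))


def cond_closed_form(n: int, rho: float) -> float:
    """Largest eigenvalue 1 + (n-1)*rho divided by smallest eigenvalue 1 - rho."""
    return (1.0 + (n - 1) * rho) / (1.0 - rho)


# Toy menu of 50 assets: conditioning worsens as correlation rises.
for rho in (0.0, 0.5, 0.9, 0.99):
    numeric = np.linalg.cond(equicorr(50, rho))
    print(f"rho={rho:<4}  cond(numeric)={numeric:10.1f}  "
          f"cond(closed form)={cond_closed_form(50, rho):10.1f}")
```

The closed form also shows the breadth effect: at fixed ρ the condition number grows linearly in N, so adding correlated names makes the inversion harder, not easier.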
our result
What we found
- On the variance objective the toolkit works and the curse is real: raw Markowitz is the worst allocator everywhere, last in 39/39 strategy markets and, in the 44-asset universe, running at roughly 2.9× the OOS vol of the best LdP method (0.418 vs 0.146, i.e. the best method is ~65% lower), with degenerate concentration.
- But on a comparable-risk basis the honest margin is modest: HRP beats raw Markowitz by ~15% once legs are equal-risk, and the best deflated Sharpe often belongs to the simplest robust baselines (1/N, inverse-variance), not the elaborate methods.
- A clean, teachable nuance: denoising's value is entirely a function of q = T/N. It is a no-op on a singular (q < 1) covariance and only helps once q > 1. None of the methods turns a losing menu into a winning one; their job is risk control, not alpha (DSR ≈ 0 in the net-losing strategy universe).
The result in three lines
Markowitz last · 39/39 markets
The toolkit wins on variance, reliably
Raw mean-variance / min-variance Markowitz is the worst allocator on realised out-of-sample variance in all 39 strategy markets and in the 44-asset universe, in every framing. HRP, NCO and inverse-variance lower OOS variance and de-concentrate the book; the Markowitz curse is real and unconditional here.
Edge over raw Markowitz ≈ 15%
But the honest margin is modest
Once legs enter at equal risk, HRP beats raw min-variance in 38/39 markets by a median of only ~15%; the un-normalised ~50% headline is largely an artifact of raw dollar magnitudes. On a comparable-risk basis the honest OOS winner is often plain 1/N / inverse-variance, with HRP a close second.
Denoising = f(q = T/N)
The standout result
Covariance denoising's value is entirely a function of q = T/N. At N=300 / T=252 (q=0.84) the sample covariance is singular, condition number ~1e17, and Marčenko–Pastur denoising is a near-no-op. Halve the menu to N=150 (q=1.68) and it cuts the condition number ~4× and finds only 7 of 150 eigenvalues are signal.
The standout finding: when does denoising actually help?
Covariance denoising's value is entirely a function of q = T/N
This is the cleanest, most teachable result in the study, and the one most HRP/NCO write-ups gloss over. The Marčenko–Pastur law that powers covariance denoising is undefined for a singular sample covariance, and you have a singular covariance whenever you have more series than in-sample days.
With N=300 strategies and a 252-day window, q = 0.84 < 1: the sample covariance is rank-deficient with a condition number of ~1e17, and MP denoising is a near-no-op that keeps all 300 eigenvalues as “signal.” This is precisely López de Prado's motivation for HRP: when you cannot invert the covariance at all, the inversion-free methods are the only ones that do not blow up.
Halve the menu to N=150 (same window → q = 1.68) and the covariance flips to well-posed: denoising now cuts the condition number 4.1× and finds only 7 of 150 eigenvalues are signal, flattening the other 143 as noise. The single most important practical takeaway: a practitioner with ~300 strategies and one year of daily data is in the regime where denoising cannot help and HRP / inverse-variance are mandatory.
| universe | N | q = T/N | cond (raw) | cond (denoised) | reduction | signal factors |
|---|---|---|---|---|---|---|
| assets | 9 | 28.0 | 8.6e3 | 9.0e2 | 9.6× | 1 / 9 |
| strategies N=300 | 300 | 0.84 | 4.3e17 | 3.5e17 | 1.2× | 300 / 300 |
| strategies N=150 | 150 | 1.68 | 2.1e3 | 5.0e2 | 4.1× | 7 / 150 |
Median per-fold condition number of the covariance fed to the allocator. q is computed on the active-instrument count per fold. Detoning can re-inflate conditioning (it removes the market eigenvector, leaving a near-singular residual), so it is a clustering aid, not a conditioning fix.
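The two regimes in the table can be illustrated on synthetic data. The sketch below is a simplified stand-in, not the study's engine: it assumes a one-factor Gaussian panel, fixes the Marčenko–Pastur noise variance at σ² = 1 instead of fitting it, and uses the constant-residual-eigenvalue rule for denoising. The names `mp_upper_edge`, `denoise_corr` and `one_factor_panel` are hypothetical. It shows (i) that a T = 252, N = 300 sample correlation is rank-deficient, and (ii) that at N = 150 flattening the eigenvalues under the MP upper edge lowers the condition number. It does not reproduce the study's fitted procedure or its behaviour at q < 1.

```python
import numpy as np


def mp_upper_edge(T: int, N: int, sigma2: float = 1.0) -> float:
    """Upper edge of the Marchenko-Pastur bulk for a T x N pure-noise panel."""
    return sigma2 * (1.0 + np.sqrt(N / T)) ** 2


def denoise_corr(corr: np.ndarray, T: int):
    """Replace eigenvalues under the MP edge by their mean; return (corr, n_signal)."""
    N = corr.shape[0]
    evals, evecs = np.linalg.eigh(corr)            # ascending eigenvalues
    is_signal = evals > mp_upper_edge(T, N)
    cleaned = evals.copy()
    if (~is_signal).any():
        cleaned[~is_signal] = evals[~is_signal].mean()   # preserves the trace
    out = (evecs * cleaned) @ evecs.T              # V diag(cleaned) V^T
    d = np.sqrt(np.diag(out))
    return out / np.outer(d, d), int(is_signal.sum())    # rescale to unit diagonal


def one_factor_panel(T: int, N: int, seed: int = 0) -> np.ndarray:
    """Synthetic returns: one common factor plus idiosyncratic noise (assumption)."""
    rng = np.random.default_rng(seed)
    factor = rng.standard_normal((T, 1))
    return 0.5 * factor + rng.standard_normal((T, N))


T = 252

# q = 252/300 < 1: more series than observations, so the matrix cannot be full rank.
corr_300 = np.corrcoef(one_factor_panel(T, 300), rowvar=False)
rank_300 = int(np.linalg.matrix_rank(corr_300))
cond_300 = float(np.linalg.cond(corr_300))
print(f"N=300 q={T / 300:.2f} rank={rank_300} cond={cond_300:.1e}")

# q = 252/150 > 1: full rank, and flattening the noise bulk improves conditioning.
corr_150 = np.corrcoef(one_factor_panel(T, 150), rowvar=False)
denoised_150, n_signal = denoise_corr(corr_150, T)
cond_raw_150 = float(np.linalg.cond(corr_150))
cond_dn_150 = float(np.linalg.cond(denoised_150))
print(f"N=150 q={T / 150:.2f} signal={n_signal} "
      f"cond raw={cond_raw_150:.1e} denoised={cond_dn_150:.1e}")
```

Because the panel is demeaned when the correlation is computed, the rank at N = 300 is at most T − 1 = 251, which is why the raw condition number is astronomically large regardless of the data.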
Demo: Markowitz-curse / denoising explorer
Slide across the study's three real walk-forward regimes and watch the covariance condition number explode once q = T/N falls below 1, and Marčenko–Pastur denoising collapse to a near-no-op exactly where the sample covariance turns singular.
q = 252-day in-sample window ÷ N. Bars are the median per-fold condition number of the covariance fed to the allocator.
At q = 0.84 < 1 there are more strategies (300) than in-sample days (252): the sample covariance is rank-deficient with a condition number near 4.3e17. The Marčenko–Pastur law is undefined here, so denoising is a near-no-op (just 1.2×, keeping all 300 eigenvalues as "signal"). You cannot invert this matrix; only the inversion-free methods (HRP, inverse-variance) survive.
And the payoff: realised OOS annualised vol per allocator on the 44-asset universe (equal-risk legs). Lower is better. Raw mean-variance Markowitz is the worst; HRP / NCO / inverse-variance lead.
The Markowitz curse, measured
Raw Markowitz is last in 39/39 strategy markets, in every framing
This is López de Prado's intended use-case, allocating a risk budget across many de-correlated strategies, and the cleanest demonstration of the curse. Raw mean-variance / min-variance Markowitz is the worst allocator on OOS variance in all 39 markets, in both q regimes and both unit conventions. The curse is real and unconditional here.
But the honest margin matters. In raw dollar-PnL units HRP beats raw Markowitz by a median of ~50%, a figure that is inflated because the singular-covariance Markowitz weights blow up on a few high-dollar-vol legs. On the fair, vol-targeted basis (every leg at 10% annualised risk) HRP beats raw min-variance in 38/39 markets by a median of only ~15%: a real but unspectacular edge.
Two nuances the chart makes plain: (i) on a comparable-risk basis the honest winner is the simplest robust method, plain 1/N / inverse-variance, with HRP close behind; the elaborate methods (NCO, detoning) rank worse than 1/N in this net-noisy panel, a caution against over-engineering. (ii) Under vol-targeting inverse-variance reduces to 1/N exactly, the expected consistency check, which the run reproduces.
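The consistency check in (ii) follows directly from the arithmetic: once every leg is scaled to the same in-sample volatility, all inverse variances are equal and the normalised weights collapse to 1/N. A minimal sketch on synthetic returns is shown below; the function names `vol_target_scale` and `inverse_variance_weights`, the 252-period annualisation and the simulated volatilities are illustrative assumptions, not the study's code.

```python
import numpy as np


def vol_target_scale(in_sample: np.ndarray, target_ann_vol: float = 0.10,
                     periods: int = 252) -> np.ndarray:
    """Per-leg multipliers that put each column at the target annualised vol.

    Estimated on the in-sample window only, so the scaler stays causal.
    """
    ann_vol = in_sample.std(axis=0, ddof=1) * np.sqrt(periods)
    return target_ann_vol / ann_vol


def inverse_variance_weights(returns: np.ndarray) -> np.ndarray:
    """Weights proportional to 1 / variance, normalised to sum to one."""
    inv_var = 1.0 / returns.var(axis=0, ddof=1)
    return inv_var / inv_var.sum()


rng = np.random.default_rng(1)
leg_vols = np.array([0.005, 0.01, 0.02, 0.04, 0.08])    # synthetic daily vols
raw = rng.standard_normal((252, 5)) * leg_vols

scaled = raw * vol_target_scale(raw)

w_raw = inverse_variance_weights(raw)        # dominated by the quietest leg
w_scaled = inverse_variance_weights(scaled)  # equal across legs

print("raw units   :", np.round(w_raw, 3))
print("vol-targeted:", np.round(w_scaled, 3))
```

The equality is exact only when the weights and the scaler are estimated on the same window, which is the in-sample setting described here.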
The 44-asset universe
Same verdict on real cross-asset returns
On daily close-to-close returns of 44 real instruments (crypto perps, US-equity ETFs and FX majors) the picture holds. In raw return units (table) TIC-NCO and HRP deliver the lowest OOS vol (0.146, 0.159); raw mean-variance is the worst at 0.418, roughly 2.9× the OOS vol of the best LdP method (equivalently, the best method runs ~65% lower), with a degenerate effective-N of 2.6 (it bets the book on a handful of names). That is the curse, measured.
On the risk-adjusted axis HRP is not the winner: the best Deflated Sharpe ratios belong to denoise-detone NCO (0.75) and the simple inverse-variance baseline (0.64), not to the inversion-based Markowitz allocators, and not to HRP (0.43). HRP's edge is risk reduction and de-concentration, not alpha.
| allocator | OOS vol | OOS Sharpe | DSR | eff-N | cond. number |
|---|---|---|---|---|---|
| tic_nco | 0.146 | 0.10 | 0.34 | 3.8 | 1.5e4 |
| hrp | 0.159 | 0.15 | 0.43 | 4.9 | 1.9e16 |
| hrp_denoise | 0.162 | 0.16 | 0.44 | 4.8 | 3.6e4 |
| inv_var | 0.167 | 0.27 | 0.64 | 6.8 | 1.9e16 |
| nco_denoise_detone | 0.180 | 0.34 | 0.75 | 6.3 | 1.7e16 |
| nco | 0.181 | −0.04 | 0.15 | 2.5 | 1.9e16 |
| 1/N | 0.282 | 0.09 | 0.32 | 18.2 | 1.9e16 |
| min_var_raw | 0.292 | −0.03 | 0.17 | 2.1 | 1.9e16 |
| mean_var_raw | 0.418 | 0.16 | 0.44 | 2.6 | 1.9e16 |
Best → worst by OOS vol, raw return units. Condition number is the median per-fold value of the covariance fed to the allocator. Note that HRP is fed the near-singular raw covariance (1.9e16) but never inverts it, while TIC-NCO and denoised HRP work from a well-conditioned structure (1.5e4, 3.6e4).
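The eff-N column can be read with a simple formula in mind. The sketch below assumes the common definition of effective-N as the inverse Herfindahl–Hirschman index of the weights, 1 / Σ wᵢ², for long-only weights; the text lists effective-N and HHI as diagnostics but does not spell out the formula, so this definition and the function name `effective_n` are assumptions.

```python
from typing import Sequence


def effective_n(weights: Sequence[float]) -> float:
    """Inverse Herfindahl index of long-only weights (assumed definition).

    Equals N for equal weights and approaches 1 as the book concentrates
    in a single name.
    """
    total = sum(weights)
    normalised = [w / total for w in weights]
    hhi = sum(w * w for w in normalised)
    return 1.0 / hhi


equal_book = [1.0] * 44                        # 1/N over 44 names
concentrated_book = [0.6, 0.3, 0.1] + [0.0] * 41

print(f"equal weights : eff-N = {effective_n(equal_book):.1f}")
print(f"three-name bet: eff-N = {effective_n(concentrated_book):.1f}")
```

Read this way, an eff-N between 2 and 3 for the raw Markowitz allocators means the portfolio behaves like a bet on two or three names, whatever the nominal size of the menu.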
Does the HRP edge scale?
HRP's advantage over 1/N grows with universe breadth
The 39-market / 44-asset result above, with HRP barely beating 1/N on variance, turns out to be a small-universe artifact. Re-running the same walk-forward on a survivorship-aware 690-name crypto-perp panel and sweeping the universe size N shows HRP's hierarchical diversification needs breadth to express itself: the OOS variance ratio versus 1/N falls from ~0.87 at N=25 (≈13% variance reduction) to ~0.45–0.50 at N=100 and beyond (≈50–55% reduction).
The honest caveat: at large N the variance win is not a Sharpe win. HRP concentrates as the tree deepens, effective breadth collapses (eff-N ≈ 4–5 at N=690 despite hundreds of names) and turnover climbs from 0.14 to 0.49. So a fee-aware deployment nets only part of this variance edge against costs; the result is a risk-reduction story, not a free alpha story.
| N | HRP var | 1/N var | HRP/1N var | HRP Sharpe | 1/N Sharpe |
|---|---|---|---|---|---|
| 25 | 0.518 | 0.597 | 0.87 | 0.80 | 0.64 |
| 50 | 0.594 | 0.653 | 0.91 | 0.83 | 0.60 |
| 100 | 0.337 | 0.676 | 0.50 | 0.87 | 0.58 |
| 200 | 0.307 | 0.685 | 0.45 | 0.28 | 0.53 |
| 690 | 0.321 | 0.669 | 0.48 | 0.19 | 0.52 |
OOS annualised variance and Sharpe by universe size N (24 walk-forward folds, 690-name crypto-perp panel). HRP/1N var < 1 = HRP lower variance than equal-weight; the ratio sits near 0.9 for N ≤ 50 and drops to roughly 0.5 from N=100 onward. Note the Sharpe columns cross over: 1/N overtakes HRP at large N as HRP concentrates.
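The ratio column and the implied variance reductions can be recomputed from the two variance columns of the table. The snippet below does only that arithmetic; the helper name `variance_ratio` is hypothetical and the inputs are copied from the table above.

```python
# (HRP variance, 1/N variance) by universe size N, copied from the table.
table = {
    25: (0.518, 0.597),
    50: (0.594, 0.653),
    100: (0.337, 0.676),
    200: (0.307, 0.685),
    690: (0.321, 0.669),
}


def variance_ratio(hrp_var: float, equal_weight_var: float) -> float:
    """HRP variance as a fraction of the 1/N variance (below 1 means HRP is lower)."""
    return hrp_var / equal_weight_var


for n, (hrp_var, ew_var) in table.items():
    ratio = variance_ratio(hrp_var, ew_var)
    print(f"N={n:>3}  ratio={ratio:.2f}  variance reduction={1.0 - ratio:.0%}")
```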
The risk-adjusted axis
Lower variance does not mean more return. Deflating each allocator's realised OOS Sharpe against the nine-allocator menu separates the robust methods from the fragile ones.
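For readers who want the deflation step spelled out, the sketch below follows the Deflated Sharpe Ratio formula of Bailey & López de Prado (2014): the observed Sharpe is compared with the expected maximum Sharpe across the trials, and the gap is scaled by sample length, skewness and kurtosis. It uses only the Python standard library. The function names and every number in the example are illustrative assumptions and are not taken from the study; Sharpe ratios are per-period (not annualised), and `sr_std` is the standard deviation of the Sharpe estimates across the trials.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(sr_std: float, n_trials: int) -> float:
    """Expected maximum Sharpe among n_trials strategies with zero true Sharpe."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    a = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return sr_std * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe(sr: float, sr_std: float, n_trials: int, n_obs: int,
                    skew: float = 0.0, kurt: float = 3.0) -> float:
    """Probability that the true Sharpe exceeds the expected-maximum benchmark.

    sr and sr_std are per-period values; kurt is raw kurtosis (3 for a normal).
    """
    sr0 = expected_max_sharpe(sr_std, n_trials)
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr * sr)
    z = (sr - sr0) * math.sqrt(n_obs - 1) / denom
    return _STD_NORMAL.cdf(z)


# Illustrative inputs only: a daily Sharpe of 0.05 picked from a menu of 9.
example = deflated_sharpe(sr=0.05, sr_std=0.02, n_trials=9, n_obs=1000)
print(f"DSR = {example:.2f}")
```

With a menu of nine allocators the benchmark is low, so the deflation is mild; the same observed Sharpe selected from hundreds of trials would be penalised far more heavily.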
López de Prado's strongest specific claim, that the full denoise → detone → NCO pipeline dominates, is only partially supported here. NCO needs denoised, well-conditioned input to behave: raw NCO posts a negative OOS Sharpe and the worst DSR among the NCO variants. Detoning trades conditioning for clustering quality. The robust, parameter-light methods (HRP, inverse-variance, denoised-NCO) are what survive out of sample.
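The conditioning-for-clustering trade made by detoning can be seen on a toy matrix. The sketch below assumes an equicorrelation matrix with ρ = 0.6 and a simple detoning rule (drop the largest eigen-component, then rescale to unit diagonal); both the toy input and the function names `detone` and `mean_offdiag` are illustrative assumptions, not the study's implementation.

```python
import numpy as np


def detone(corr: np.ndarray, n_market: int = 1) -> np.ndarray:
    """Remove the top n_market eigen-components and rescale to unit diagonal."""
    evals, evecs = np.linalg.eigh(corr)              # ascending eigenvalues
    kept_vals = evals[:-n_market]
    kept_vecs = evecs[:, :-n_market]
    residual = (kept_vecs * kept_vals) @ kept_vecs.T
    d = np.sqrt(np.diag(residual))
    return residual / np.outer(d, d)


def mean_offdiag(corr: np.ndarray) -> float:
    """Average of the off-diagonal entries."""
    n = corr.shape[0]
    return float((corr.sum() - np.trace(corr)) / (n * (n - 1)))


n, rho = 10, 0.6
corr = (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))   # toy "market-driven" matrix
detoned = detone(corr)

print(f"mean off-diagonal: raw={mean_offdiag(corr):.3f} "
      f"detoned={mean_offdiag(detoned):.3f}")
print(f"smallest eigenvalue after detoning: {np.linalg.eigvalsh(detoned).min():.1e}")
print(f"cond: raw={np.linalg.cond(corr):.1e} detoned={np.linalg.cond(detoned):.1e}")
```

Removing the common component strips out the shared correlation that would otherwise blur cluster boundaries, but the result has lost a rank and cannot be inverted, which is why detoned matrices suit clustering rather than optimisation.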
The honest negative
The de-correlated strategy samples are net-losing (median OOS Sharpe ≈ −4.9 at N=300, vol-targeted). The greedy-decorrelation sampler deliberately pulls uncorrelated names from a corpus dominated by losers, and no allocator can manufacture return from a losing menu: the Deflated Sharpe is ≈ 0 everywhere in the strategy universe. The portfolio methods control risk, not sign. We report this rather than hide it behind a return-positive cherry-pick.
Verdict
HRP / NCO / denoising beat raw Markowitz at controlling out-of-sample variance, reproducibly (Markowitz last in 39/39 strategy markets and worst in the 44-asset universe). But on a comparable-risk basis the margin is modest (HRP beats raw min-variance by a median of ~15% in the strategy universe), the simplest robust baselines (1/N, inverse-variance) are often the honest OOS winner, and none of the methods turns a losing menu into a winning one. This is a faithful, unembellished confirmation of the spirit of the work: matrix inversion on a noisy or singular sample covariance is the enemy, and structure and shrinkage help. It does not oversell the magnitude of HRP's edge.
Method
- Real data only, multi-market: daily close-to-close returns of 44 instruments (crypto perps, US-equity ETFs, FX majors), plus up to 300 greedily de-correlated per-strategy daily-PnL series across 39 markets, the López de Prado use-case.
- Walk-forward, causal: weights are estimated on a 252-day in-sample window, held over the following 63-day out-of-sample window, then rolled by 63 days. Weights are never scored on the data that built them.
- Nine allocators compared: 1/N, inverse-variance, raw min-variance and mean-variance Markowitz, HRP, HRP-on-denoised, NCO, denoise-detone NCO, and TIC-NCO.
- Headline metric is realised OOS annualised volatility (lower is better), with the Deflated Sharpe Ratio of each allocator's realised stream deflated against the nine-allocator menu (False Strategy Theorem).
- Diagnostics: weight concentration (effective-N, HHI) and the condition number of the covariance actually fed to the allocator, the direct measure of the curse. A causal, in-sample-only per-leg vol scaler puts every leg at a common 10% annualised risk so cross-method comparison is fair.
- The Numba HRP kernel is checked against the numpy reference on N ∈ {8, 17, 40, 120} every run and agrees to within floating-point rounding (max |Δw| ≤ 5.6e-17).
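The walk-forward scheme in the list above (252-day estimation window, 63-day hold, 63-day roll) can be sketched as an index generator. The function name `walk_forward_splits` and the choice to drop a final partial out-of-sample window are assumptions made for illustration; the study's own fold logic may treat the tail differently.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n_obs: int, train: int = 252, test: int = 63,
                        step: int = 63) -> Iterator[Tuple[range, range]]:
    """Yield (in-sample, out-of-sample) index ranges that never overlap.

    Each out-of-sample window starts immediately after its in-sample window,
    so weights are scored only on data that did not build them.
    """
    start = 0
    while start + train + test <= n_obs:
        in_sample = range(start, start + train)
        out_of_sample = range(start + train, start + train + test)
        yield in_sample, out_of_sample
        start += step


folds = list(walk_forward_splits(n_obs=1000))
for i, (ins, oos) in enumerate(folds[:3]):
    print(f"fold {i}: fit on [{ins.start}, {ins.stop})  "
          f"score on [{oos.start}, {oos.stop})")
print(f"{len(folds)} folds in total")
```

With the step equal to the test length, the out-of-sample windows tile the timeline without gaps or overlap, so the realised return stream can be concatenated across folds.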
Notes & limitations
Two unit conventions are reported because they measure different things: the OOS-vol ranking is scale-invariant and robust, but the magnitude of HRP's edge depends on how legs are scaled. The un-normalised ~50% figure is partly an artifact of raw dollar magnitudes; the vol-targeted ~15% is the fair comparison. The strategy corpus is net-losing, so the Sharpe story is muted by design, and the methods themselves are López de Prado's, not new. The contribution is the empirical, multi-market, walk-forward audit, especially the q = T/N regime dependence of denoising.
Reproducibility
The engine (allocators, MP denoise/detone, HRP/NCO/TIC, walk-forward, DSR, the Numba HRP kernel tested against its numpy reference), the figures and the tables are collected in project 13 of lopez-de-prado-work-review. The explorer on this page is self-contained: the condition-number regimes and the per-allocator OOS-vol numbers are encoded from the study's tables, so the mechanic it illustrates reproduces exactly on every load.
References
The primary sources for the apparatus reviewed here:
- López de Prado, M. (2016). Building Diversified Portfolios that Outperform Out of Sample. Journal of Portfolio Management, 42(4). (Hierarchical Risk Parity.)
- López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. (Ch. 16, HRP.)
- López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press. (Covariance denoising, detoning, and Nested Clustered Optimization.)
- López de Prado, M., & Lewis, M. J. (2019). Detection of False Investment Strategies Using Unsupervised Learning Methods, and the Theory-Implied Correlation matrix. Quantitative Finance.
- Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management, 40(5).
See also
The companion study on selection discipline and the Deflated Sharpe Ratio is at Backtest Overfitting & the Deflated Sharpe Ratio, and the broader body of work is at Research.

