Research review · López de Prado · chapters 10 & 13
Trading Rules & Bet Sizing
Sizing bets by a classifier's confidence cuts turnover 80–87% without creating edge; the backtest-free OU rule is a coin flip on a plain entry, but the Triple-Penance drawdown correction is the real, validated win
Two of Advances in Financial Machine Learning's most practical ideas live downstream of a signal: once you have a bet, how big should it be, and when should you take profit or stop out? This review puts both on real, fully-costed 1-minute data across 42 instruments in crypto, US equities and forex, and keeps the comparison honest with the Deflated Sharpe Ratio. The verdict separates cleanly: bet sizing genuinely cuts turnover but creates no edge; the backtest-free OU profit-take/stop rule is information-free on a plain entry; and the companion Triple-Penance drawdown correction is the one result that holds up empirically.
Advances in Financial ML, Ch. 10 & 13
López de Prado (2018) · bet sizing & trading rules
Bet sizing, code
projects / 11_bet_sizing
Trading rules, code
projects / 12_optimal_trading_rules
Bet sizing from probabilities
the claim
What López de Prado says
Once a meta-classifier outputs a calibrated probability that a bet is profitable, size the position by that probability rather than trading a fixed unit: map the probability to a signed size through a normal-CDF transform, average all concurrently-active bets into one net book position, and optionally discretize onto a coarse grid to stop the book churning on tiny changes.
our result
What we found
It is a precision/cost layer, not an alpha source. Probability sizing collapses book turnover by 80–87% across all 42 instruments (a large, universal, repeatable effect), but deflated performance does not move (ΔDSR ≈ 0, no significance, 0/42 above DSR 0.95). The one real ordering is that discretizing beats continuous sizing, and that too is cost-efficiency, not edge. Sizing expresses an edge more cheaply; it cannot manufacture one.
Optimal trading rules (OU) & Triple Penance
the claim
What López de Prado says
Derive a profit-take/stop-loss rule without backtesting: fit an Ornstein–Uhlenbeck process to a mean-reverting series, simulate Monte-Carlo paths from the fitted process, and read the optimal (PT, SL) pair off a Sharpe mesh. The companion Triple Penance rule gives the serial-correlation-aware maximum drawdown of an AR(1) return stream; the name comes from its finding that recovery takes, on average, three times as long as the loss period.
our result
What we found
On a vanilla z-score mean-reversion entry the OU apparatus adds nothing over an IS-tuned fixed-threshold control: it is a coin flip on OOS DSR (it wins 54.8% of the time), and the only DSR > 0.95 names are broad equity indices riding beta drift that the dumb control captures too. We also trace a structural degeneracy: the OU mesh pins the stop at the grid edge for nearly every instrument. Triple Penance, by contrast, is vindicated: the naive IID drawdown bound understates realised drawdown by ~3.2×, and the AR(1) correction k = (1+φ)/(1−φ) closes the gap.
The result in three lines
Bet sizing · 42 instruments
Turnover collapses 80–87%, edge does not appear
Sizing bets by the meta-model's probability, m = 2Φ(z)−1, averaged across active bets, optionally discretized, cuts book turnover by 80–87% in crypto, equities and forex alike (all 42 instruments below the fixed-size baseline). But the Deflated Sharpe Ratio does not move: 0 of 42 instruments clear DSR > 0.95 under any scheme. It is a precision/cost-control layer, not an alpha source.
OU optimal rule · coin flip
The backtest-free rule earns nothing on a plain entry
López de Prado's OU profit-take/stop-loss rule, derived by Monte-Carlo on a fitted process, beats a plainly IS-tuned fixed PT/SL control out-of-sample only 54.8% of the time on DSR (47.6% for the geometrically-corrected formulation). The mesh is structurally degenerate: it pins the stop to the grid edge in 41/42 instruments, so on a vanilla z-score mean-reversion entry it has nothing to optimise.
Triple Penance · clean positive
Ignoring serial correlation underprices drawdown ~3.2×
Realised strategy returns are strongly autocorrelated (median AR(1) φ from 0.24 in crypto to 0.40 in forex; median variance inflation k = 3.28×). The naive IID maximum-drawdown bound understates the realised drawdown by ≈ 3.2× (median realised/IID = 3.234); the AR(1)-corrected bound, scaled by k(φ) = (1+φ)/(1−φ), closes the gap (median realised/AR(1) = 0.722). A clean empirical vindication of the 2015 result, measured on the 23 positive-drift instruments of the 42-instrument panel.
The setup, shared across both studies
Both reviews run on the same real 1-minute panel: 27 crypto perps, 7 US-equity ETFs (regular hours only) and 8 forex majors, aggregated to information-driven bars and never traded costlessly.
No synthetic price series. Crypto carries a flat 7 bp per side; equity and forex frictions come from a causal, ex-ante time-of-day half-spread schedule plus commission, with weekend and overnight gaps kept as real risk. Every comparison fixes the structural strategy shape and varies only the thing under test, so the numeric knobs are treated as in-sample-tunable trials and the headline is always the Deflated Sharpe Ratio after honest trial-counting, never raw profit factor. Both projects' hot loops (the active-bet-averaging kernel; the OU Monte-Carlo mesh and the out-of-sample exit scan) are reimplemented in a compiled kernel and verified bit-identical against an independent pure-Python reference before any result is reported.
Part 1, bet sizing from predicted probabilities · chapter 10
Sizing by confidence is a cost-control layer, not an edge
Chapter 10 gives an explicit recipe: map a meta-classifier's calibrated probability of a profitable bet to a signed size through m = 2Φ(z) − 1 with z = (p − ½)/√(p(1−p)), average the sizes of all concurrently active bets into one net book position, and optionally snap that size to a coarse grid.
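The three steps of that recipe can be sketched in a few lines. This is a minimal illustration, not the project's compiled kernel: the function names, the `side` argument (the direction supplied by the primary strategy) and the clipping of p away from 0 and 1 are assumptions made here for the example.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bet_size(p: float, side: int = 1) -> float:
    """Signed size m = side * (2*Phi(z) - 1), with z = (p - 1/2) / sqrt(p(1-p)).

    `side` (+1 long, -1 short) is assumed to come from the primary strategy.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # assumption: clip so z stays finite
    z = (p - 0.5) / math.sqrt(p * (1.0 - p))
    return side * (2.0 * norm_cdf(z) - 1.0)

def net_position(active_sizes: list[float]) -> float:
    """Average the sizes of all concurrently active bets; flat if none."""
    return sum(active_sizes) / len(active_sizes) if active_sizes else 0.0

def discretize(m: float, step: float = 0.1) -> float:
    """Snap a size to a coarse grid and keep it inside [-1, 1]."""
    return max(-1.0, min(1.0, round(m / step) * step))

# Example: two overlapping long bets and one short bet.
sizes = [bet_size(0.70), bet_size(0.55), bet_size(0.60, side=-1)]
book = discretize(net_position(sizes))
```

A probability of exactly ½ maps to a zero size, and sizes grow towards ±1 as the classifier becomes more confident, which is why averaging and snapping to a grid suppress the many small, low-conviction changes.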
Holding the primary strategy, the triple-barrier labels, the out-of-fold meta-model, the costs and the active-bet-averaging engine identical across three schemes (fixed unit, continuous probability, and discretized probability), the only robust effect is on turnover: it falls 80–87% in every market. All-instrument median book turnover drops 254 → 48.6 (prob, −81%) → 46.2 (disc, −82%).
| market | n | turnover (fixed) | prob | disc | prob cut | disc cut |
|---|---|---|---|---|---|---|
| Crypto | 27 | 272 | 53.6 | 52.3 | −80% | −84% |
| Equities | 7 | 197 | 20.4 | 19.6 | −87% | −87% |
| Forex | 8 | 173 | 42.2 | 29.0 | −84% | −86% |
Median book turnover by scheme. Costs are charged on the change in book position (realised turnover), so the collapse is a direct, repeatable cost saving.
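To make the cost mechanics concrete, here is a small sketch of turnover measured as the summed absolute change in net book position, with a flat per-unit cost applied to it. The exact turnover definition and normalisation used in the study are not stated here, so the functions and the start-from-flat convention below are assumptions for illustration only.

```python
def book_turnover(positions: list[float]) -> float:
    """Sum of absolute changes in net book position, starting from flat."""
    prev, total = 0.0, 0.0
    for pos in positions:
        total += abs(pos - prev)
        prev = pos
    return total

def trading_cost(positions: list[float], cost_per_unit: float) -> float:
    """Cost charged on realised turnover, e.g. 0.0007 for 7 bp per side."""
    return cost_per_unit * book_turnover(positions)

# A fixed-unit book that flips sign every bar versus a sized book that
# only nudges its position: same number of bars, very different churn.
fixed = [1.0, -1.0, 1.0, -1.0]
sized = [0.3, 0.2, 0.3, 0.2]
turnover_fixed = book_turnover(fixed)  # 1 + 2 + 2 + 2
turnover_sized = book_turnover(sized)  # 0.3 + 0.1 + 0.1 + 0.1
```

Because the charge scales with the change in position rather than with the position itself, any scheme that smooths the book lowers cost even when the gross signal is untouched.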
The headline metric is the Deflated Sharpe Ratio, net of costs, because the whole point is to ask whether any apparent sizing benefit survives the multiple-testing deflation implied by the 32-point knob grid (effective-N ≈ 3, median PBO 0.49, a coin flip). It does not: ΔDSR ≈ 0, no market reaches sign-test significance, and no instrument, in any market, under any scheme, clears DSR > 0.95.
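For readers who have not met the statistic, the following is a minimal sketch of the Deflated Sharpe Ratio as defined by Bailey and López de Prado (2014), not the project's own implementation. It assumes `sr` is the per-observation (non-annualised) Sharpe ratio, `sr_std` is the standard deviation of Sharpe estimates across the trials, and `n_trials` is the effective number of independent trials; all argument names are chosen here for illustration.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()

def expected_max_sharpe(sr_std: float, n_trials: float) -> float:
    """Expected maximum Sharpe under the null across n_trials (> 1) trials."""
    a = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return sr_std * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)

def deflated_sharpe(sr: float, sr_std: float, n_trials: float, n_obs: int,
                    skew: float = 0.0, kurt: float = 3.0) -> float:
    """Probability that the true Sharpe exceeds the selection-bias benchmark.

    skew and kurt are the skewness and (non-excess) kurtosis of the returns.
    """
    sr0 = expected_max_sharpe(sr_std, n_trials)
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr * sr)
    return _STD_NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1.0) / denom)

# Same observed Sharpe, judged against 3 and then 32 trials.
dsr_few = deflated_sharpe(sr=0.05, sr_std=0.03, n_trials=3, n_obs=2000)
dsr_many = deflated_sharpe(sr=0.05, sr_std=0.03, n_trials=32, n_obs=2000)
```

The benchmark `sr0` rises with the number of trials, so the same observed Sharpe earns a lower DSR when more configurations were tried, which is the deflation the paragraph above relies on.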
The one consistent ordering is discretized ≥ continuous (33/42 instruments, and discretized PF beats fixed in 26/42 vs 15/42 for continuous). That too is cost-efficiency, not alpha: snapping to a 0.1 grid zeros the smallest bets that otherwise pay cost without conviction.
| market | DSR (fixed) | prob | disc |
|---|---|---|---|
| Crypto | 0.027 | 0.013 | 0.032 |
| Equities | 0.321 | 0.254 | 0.313 |
| Forex | 0.087 | 0.113 | 0.186 |
Median per-market DSR; the 0.95 bar is unmet everywhere. The EMA-crossover primary has no deflatable edge for sizing to amplify, so sizing's only visible footprint is the turnover line. That is precisely the prediction of treating Chapter 10 as an execution layer rather than a signal.
Part 2, optimal trading rules & triple penance · chapter 13
The OU rule is info-free; the drawdown correction is the real win
Chapter 13 derives a profit-take/stop-loss rule without backtesting: fit an Ornstein–Uhlenbeck process to a mean-reverting series, Monte-Carlo many paths off the fitted process, and read the optimal (profit-take, stop-loss) pair off a Sharpe mesh. We validate that rule out-of-sample, with costs and intrabar OHLC exits, against a control that simply grid-searches the same thresholds in-sample.
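The fit-simulate-read pipeline described above can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the study's kernel: the discrete AR(1) form of the OU process, thresholds expressed in multiples of the fitted noise σ, a long-only unit position, exit at the realised path value on first touch, and every function and parameter name (`fit_ou`, `sharpe_mesh`, `best_rule`, `start`, `max_hold`) are choices made here.

```python
import numpy as np

def fit_ou(x):
    """OLS fit of x[t+1] = a + phi * x[t] + noise; returns (mean, phi, sigma)."""
    x = np.asarray(x, dtype=float)
    phi, a = np.polyfit(x[:-1], x[1:], 1)  # slope first, then intercept
    resid = x[1:] - (a + phi * x[:-1])
    return a / (1.0 - phi), phi, resid.std(ddof=2)

def sharpe_mesh(mean, phi, sigma, pts, sls, start=None,
                n_paths=2000, max_hold=100, seed=0):
    """Sharpe of simulated P&L for each (profit-take, stop-loss) pair.

    start=None enters at the long-run mean; pass a level to enter elsewhere.
    """
    rng = np.random.default_rng(seed)
    x0 = mean if start is None else start
    shocks = sigma * rng.standard_normal((n_paths, max_hold))
    paths = np.empty((n_paths, max_hold))
    x = np.full(n_paths, float(x0))
    for t in range(max_hold):
        x = (1.0 - phi) * mean + phi * x + shocks[:, t]
        paths[:, t] = x
    pnl_paths = paths - x0  # P&L of one long unit opened at x0
    rows = np.arange(n_paths)
    mesh = np.full((len(pts), len(sls)), np.nan)
    for i, pt in enumerate(pts):
        for j, sl in enumerate(sls):
            hit = (pnl_paths >= pt * sigma) | (pnl_paths <= -sl * sigma)
            # First bar that touches a barrier, else the last bar (time exit).
            first = np.where(hit.any(axis=1), hit.argmax(axis=1), max_hold - 1)
            pnl = pnl_paths[rows, first]
            sd = pnl.std()
            mesh[i, j] = pnl.mean() / sd if sd > 0 else np.nan
    return mesh

def best_rule(mesh, pts, sls):
    """Read the (PT, SL) pair with the highest simulated Sharpe off the mesh."""
    i, j = np.unravel_index(np.nanargmax(mesh), mesh.shape)
    return pts[i], sls[j]
```

Given fitted parameters, `best_rule(sharpe_mesh(mean, phi, sigma, pts, sls), pts, sls)` returns the selected pair; if the maximiser lands on the largest value in `sls`, the grid rather than the process is choosing the stop, which is worth checking before trusting the output.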
On a vanilla z-score mean-reversion entry the OU apparatus adds nothing decision-relevant: it beats the control on OOS DSR 54.8% of the time (42.9% on Sharpe), a coin flip, and never produces a deflation-clearing winner the control does not also produce. Crypto is a loss after 7 bp/side (median OU PF 0.78); forex's raw PF edges ahead (median 1.157) but still fails the deflation (best, GBPUSD, OU DSR 0.571).
| market | n | OU PF | ctrl PF | OU DSR | ctrl DSR | DSR>0.95 (OU/ctrl) |
|---|---|---|---|---|---|---|
| Crypto | 27 | 0.780 | 0.773 | 0.0001 | 0.000 | 0 / 0 |
| Equities | 7 | 0.577 | 1.163 | 0.000 | 0.000 | 3 / 3 |
| Forex | 8 | 1.157 | 1.211 | 0.098 | 0.023 | 0 / 0 |
Per-market medians, OOS, net of costs. PF is profit factor; DSR uses the 27-point IS-tunable knob grid as the trial set. The only DSR > 0.95 names are the three broad equity indices, where every arm harvests the same long-beta drift.
Why does the rule earn nothing? The OU mesh is structurally degenerate. It pins the optimal stop to the grid maximum (sl* = 3.0) in every instrument. The obvious suspect is real: the textbook mesh starts each simulated path at the long-run mean, while the live entry fires at a z-extreme. A geometrically-correct enter-at-deviation formulation does move the needle (it beats enter-at-mean on OOS DSR in 73.8% of instruments, and rescues GBPUSD to OU-dev DSR 0.945, a near miss).
But the verdict is unchanged: enter-at-deviation still selects sl* = 3.0 in 41/42 instruments and beats the control only 47.6% of the time. The degeneracy is intrinsic to a first-touch rule on a mean-reverting process, and that is the load-bearing methodological finding here.
Putting the OU rule on its true regime
On a cointegrated spread, the coin-flip reverses
The coin-flip verdict above was always partly a wrong-regime artifact: a raw price series is not the mean-reverting process the Ornstein–Uhlenbeck machinery assumes. Its true habitat is a cointegrated residual spread. So we re-ran the exact same apparatus on five genuinely cointegrated crypto-perp pairs. The hedge ratio β and the OU parameters are fit on the training window only; trading is then out-of-sample, with full intrabar OHLC exits (spread extremes bounded by the leg OHLC), per-leg costs on both legs, and the Deflated Sharpe as the headline.
The result reverses. The OU optimal rule beats both a fixed ±σ band control and buy-and-hold on 4 of 5 pairs, and clears Deflated Sharpe > 0.95 on 3 of 5, with a median OU Sharpe of ≈ 9.2 against ≈ 3.1 for the band. Run on the regime it was designed for, the apparatus carries real information.
The honest caveats hold. The profit-take/stop grid-edge degeneracy persists: every pair selects the same (PT, SL) corner. The apparatus is therefore best read as a regime-and-shape selector, not a calibrated set-point. It tells you the OU regime is tradeable and roughly what shape the rule should take, not the exact thresholds.
And one pair (THETA–FIL) fails for both the OU rule and the band control, so cointegration is necessary but not sufficient: a passing cointegration test is the price of entry, not a guarantee of a tradeable spread.
| pair (β·b) | half-life | OU SR | band SR | OU PF | OU DSR |
|---|---|---|---|---|---|
| 1INCH–UNI | 96 | 9.21 | 3.11 | 2.11 | 1.00 |
| COMP–AAVE | 120 | 10.80 | 5.31 | 2.72 | 1.00 |
| SNX–AAVE | 89 | 12.66 | 6.75 | 2.52 | 1.00 |
| GALA–SAND | 174 | 4.58 | −2.18 | 2.09 | 0.93 |
| THETA–FIL | 117 | −4.60 | −3.55 | 0.50 | 0.00 |
Five cointegrated crypto-perp pairs, OOS, net of per-leg costs. SR is annualized; DSR uses the same IS-tunable trial grid as the main study. OU beats the band on 4 of 5 and clears DSR > 0.95 on 3 of 5 (median OU SR 9.21 vs band 3.11). THETA–FIL fails for both arms: cointegration is necessary, not sufficient.
The clean, citable result, triple penance
Ignoring serial correlation underprices drawdown by ~3.2×
This is where the chapter's machinery genuinely pays off. Under a Gaussian AR(1) return stream with lag-1 autocorrelation φ, the long-run variance inflates by k(φ) = (1+φ)/(1−φ) (effective σ scaled by √k), and the 95%-confidence drawdown and time-under-water bounds inflate with it.
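A short sketch makes the inflation explicit. It assumes the IID closed forms from Bailey and López de Prado, a maximum drawdown of (z·σ)²/(4μ) and a time under water of (z·σ/μ)² for positive drift μ, and handles the AR(1) case only through the long-run scaling of σ by √k described above, not through the paper's exact finite-horizon expressions. Function and argument names are illustrative.

```python
import math
from statistics import NormalDist

def variance_inflation(phi: float) -> float:
    """Long-run variance inflation k(phi) = (1 + phi) / (1 - phi), |phi| < 1."""
    return (1.0 + phi) / (1.0 - phi)

def drawdown_bounds(mu: float, sigma: float, phi: float = 0.0,
                    confidence: float = 0.95) -> tuple[float, float]:
    """Return (max drawdown, time under water) at the given confidence.

    mu and sigma are per-period mean and standard deviation of returns;
    phi = 0 reproduces the naive IID bound.
    """
    if mu <= 0.0:
        raise ValueError("the bounds are only finite for positive drift")
    z = NormalDist().inv_cdf(1.0 - confidence)  # lower-tail quantile
    sigma_eff = sigma * math.sqrt(variance_inflation(phi))
    max_dd = (z * sigma_eff) ** 2 / (4.0 * mu)
    time_under_water = (z * sigma_eff / mu) ** 2
    return max_dd, time_under_water

iid_dd, iid_tuw = drawdown_bounds(mu=0.001, sigma=0.01)
ar1_dd, ar1_tuw = drawdown_bounds(mu=0.001, sigma=0.01, phi=0.40)
```

Because σ enters both bounds squared, each inflates by exactly k(φ): at φ = 0.40 the factor is 1.4/0.6 ≈ 2.33. In these closed forms the time under water is four times the time taken to reach the trough, which is the three-to-one recovery ratio the rule is named for.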
On the 23 instruments with positive drift, the realised returns are strongly autocorrelated (median k = 3.28×). The naive IID bound understates the realised drawdown by ≈ 3.2×, which is the textbook failure the 2015 paper warns about, and the AR(1) correction closes the gap (median realised/AR(1)-bound = 0.722, with the bound sitting above realised for most names, as a 95% envelope should). A clean, honest empirical vindication on real strategy returns.
Reproduce the two mechanics in the browser
The serial-correlation drawdown tax, and the turnover/DSR split
The first calculator implements the Triple-Penance bound directly: drag the lag-1 autocorrelation φ and watch the maximum-drawdown and time-under-water bounds inflate by k(φ) = (1+φ)/(1−φ) over the naive IID bound. The second panel encodes the real per-market bet-sizing medians so you can see turnover collapse 80–87% while the Deflated Sharpe sits still. Both are self-contained: the formulae and the study's numbers are in the component, so they reproduce exactly on every load.
Demo A · Triple Penance: the serial-correlation drawdown tax
Strategy returns are not independent. Drag the lag-1 autocorrelation φ and watch the maximum-drawdown and time-under-water bounds inflate by the factor k(φ) = (1+φ)/(1−φ) over the naive IID bound that ignores it.
On the 23 instruments with positive drift, the project measured a median AR(1) φ from 0.24 (crypto) to 0.40 (forex), median variance inflation k = 3.28×. The naive IID bound understated the realised drawdown by ≈ 3.2× (median realised / IID = 3.234); the AR(1)-corrected bound closed the gap (median realised / AR(1) = 0.722). Set φ ≈ 0.52 above to reproduce the ~3.2× tax.
Demo B · Bet sizing: turnover collapses, deflated Sharpe does not
Hold the strategy, labels, meta-model and costs identical; change only how the bet is sized. Probability sizing cuts book turnover 80–87% in every market, but the Deflated Sharpe Ratio barely moves, and 0 of 42 instruments clear DSR > 0.95 under any scheme.
All-instrument median turnover falls 254 → 48.6 (prob, −81%) → 46.2 (disc, −82%); all 42 instruments sit below the fixed-size baseline. The paired ΔDSR vs fixed is −0.0022 (prob) and +0.0026 (disc), neither significant. The only consistent ranking is discretized ≥ continuous (33/42), a cost-efficiency effect, not alpha.
The honest read
Two practical ideas yield two negatives and one clean positive, and the negatives are as useful as the positive when you frame them as a methodology audit.
What each idea actually does
Bet sizing from probabilities is a precision and cost-control layer, not an alpha source. If the primary strategy has a real, deflatable edge, sizing lets you express it more cheaply and with far less churn (worth having in a cost-dominated regime), but it cannot manufacture significance where the primary has none. This does not contradict López de Prado: Chapter 10 is about converting a classifier's confidence into a position, not about generating it. The result simply makes the boundary empirical.
The OU optimal-rule apparatus, on a vanilla entry, does not earn its complexity: it is a coin flip against simply tuning the thresholds in-sample, and its mesh is degenerate in its stop selection. That is a clean negative result with a constructive correction (the enter-at-deviation formulation helps but does not remove the degeneracy). The right next step is not more data but a genuinely mean-reverting entry: a fitted residual spread from a cointegrated pair, where the OU process is the true data-generating process and the mesh has something non-degenerate to optimise.
The Triple-Penance drawdown correction is the strong, standalone positive: a three-asset-class demonstration, on the 23 positive-drift instruments of the 42-instrument panel, that the IID drawdown bound is ≈ 3× too optimistic and that the k(φ) inflation closes the gap. A serious write-up frames the whole as a methodology audit: the backtest-free OU rule is fragile on plain entries and degenerate in its stop selection, while its companion drawdown correction is empirically vindicated.
Notes & limitations
Neither half is a tradable alpha claim, and that separation is the point. The bet-sizing primary is a toy EMA crossover with no deflatable edge, so sizing has nothing to amplify; QQQ's eye-catching prob profit factor of 12.7 is an artifact of a near-empty book (≈4 events cleared its act-gate) and is excluded from any qualitative claim. The OU rule's only DSR > 0.95 names are broad equity indices riding a long bull leg, shared by every arm. The sanctioned synthetic step in the OU study is the labelled rule-derivation (Monte-Carlo on a fitted process); every reported headline is real out-of-sample with costs and intrabar OHLC exits.
Reproducibility
Both studies are collected in the companion repository, project 11 (bet sizing) and project 12 (optimal trading rules) of lopez-de-prado-work-review. The compiled kernels are verified bit-identical against independent pure-Python references; the calculators on this page encode the formulae and the study's numbers directly, so the mechanics they illustrate reproduce on every load.
References
- Bailey, D. H., & López de Prado, M. (2013). Drawdown-Based Stop-Outs and the “Triple Penance” Rule. Journal of Risk, 18(2).
- Bailey, D. H., & López de Prado, M. (2013). Stop-Outs Under Serial Correlation and the Triple Penance Rule.
- Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management, 40(5).
- López de Prado, M., & Vince, R. (2019). Optimal Risk Budgeting under a Finite Investment Horizon.
- López de Prado, M. (2018). Advances in Financial Machine Learning, Chapters 10 (Bet Sizing) and 13 (Backtesting on Synthetic Data). Wiley.
See also
The deflated-significance discipline used throughout is built and stress-tested in Backtest Overfitting & the Deflated Sharpe Ratio, and the broader body of work is at Research.

