Research review · López de Prado · chapters 10 & 13

Trading Rules & Bet Sizing

Sizing bets by a classifier's confidence cuts turnover 80–87% without creating edge; the backtest-free OU rule is a coin flip on a plain entry, but the Triple-Penance drawdown correction is the real, validated win

Two of Advances in Financial Machine Learning's most practical ideas live downstream of a signal: once you have a bet, how big should it be, and when should you take profit or stop out? This review puts both on real, fully-costed 1-minute data across 42 instruments in crypto, US equities and forex, and keeps the comparison honest with the Deflated Sharpe Ratio. The verdict separates cleanly: bet sizing genuinely cuts turnover but creates no edge; the backtest-free OU profit-take/stop rule is information-free on a plain entry; and the companion Triple-Penance drawdown correction is the one result that holds up empirically.

source

Advances in Financial ML, Ch. 10 & 13

López de Prado (2018) · bet sizing & trading rules

Bet sizing, code

projects / 11_bet_sizing

Trading rules, code

projects / 12_optimal_trading_rules

Bet sizing from probabilities

the claim

What López de Prado says

Once a meta-classifier outputs a calibrated probability that a bet is profitable, size the position by that probability rather than trading a fixed unit: map the probability to a signed size through a normal-CDF transform, average all concurrently-active bets into one net book position, and optionally discretize onto a coarse grid to stop the book churning on tiny changes.

our result

What we found

It is a precision/cost layer, not an alpha source. Probability sizing collapses book turnover by 80–87% across all 42 instruments, an effect that is large, universal and repeatable, but deflated performance does not move (ΔDSR ≈ 0, no significance, 0/42 above DSR 0.95). The one real ordering is that discretized sizing beats continuous sizing, and that too is cost-efficiency, not edge. Sizing expresses an edge more cheaply; it cannot manufacture one.

Optimal trading rules (OU) & Triple Penance

the claim

What López de Prado says

Derive a profit-take/stop-loss rule without backtesting: fit an Ornstein–Uhlenbeck process to a mean-reverting series, simulate Monte-Carlo paths from the fitted process, and read the optimal (PT, SL) pair off a Sharpe mesh. The companion Triple Penance rule gives the maximum drawdown of an AR(1) return stream, correcting for serial correlation, and shows that recovering from that drawdown takes about three times as long as the loss took to accumulate.

our result

What we found

On a vanilla z-score mean-reversion entry the OU apparatus adds nothing over an IS-tuned fixed-threshold control: it is a coin flip on OOS DSR (it wins 54.8% of the time), and the only DSR > 0.95 names are broad equity indices riding beta drift that the dumb control captures too. We also trace a structural degeneracy: the OU mesh pins the stop at the grid edge for nearly every instrument. Triple Penance, by contrast, is vindicated: the naive IID drawdown bound understates realised drawdown by ~3.2×, and the AR(1) correction k = (1+φ)/(1−φ) closes the gap.

The result in three lines

Bet sizing · 42 instruments

Turnover collapses 80–87%, edge does not appear

Sizing bets by the meta-model's probability (m = 2Φ(z)−1, averaged across active bets and optionally discretized) cuts book turnover by 80–87% in crypto, equities and forex alike, with all 42 instruments below the fixed-size baseline. But the Deflated Sharpe Ratio does not move: 0 of 42 instruments clear DSR > 0.95 under any scheme. It is a precision/cost-control layer, not an alpha source.

OU optimal rule · coin flip

The backtest-free rule earns nothing on a plain entry

López de Prado's OU profit-take/stop-loss rule, derived by Monte-Carlo on a fitted process, beats a plainly IS-tuned fixed PT/SL control out-of-sample only 54.8% of the time on DSR (47.6% for the geometrically corrected formulation). The mesh is structurally degenerate: it pins the stop to the grid edge in every instrument under the textbook formulation and in 41/42 under the corrected one, so on a vanilla z-score mean-reversion entry it has nothing to optimise.

Triple Penance · clean positive

Ignoring serial correlation underprices drawdown ~3.2×

Realised strategy returns are strongly autocorrelated (median AR(1) φ from 0.24 in crypto to 0.40 in forex; median k = 3.28×). The naive IID maximum-drawdown bound understates the realised drawdown by ≈ 3.2× (median realised/IID = 3.234); the AR(1)-corrected bound k(φ) = (1+φ)/(1−φ) closes the gap (median realised/AR(1) = 0.722). A clean empirical vindication of the 2015 result on the 23 positive-drift instruments of the 42-instrument panel.

The setup, shared across both studies

Both reviews run on the same real 1-minute panel: 27 crypto perps, 7 US-equity ETFs (regular hours only) and 8 forex majors, aggregated to information-driven bars and never traded costlessly.

No synthetic price series. Crypto carries a flat 7 bp per side; equity and forex frictions come from a causal, ex-ante time-of-day half-spread schedule plus commission, with weekend and overnight gaps kept as real risk. Every comparison fixes the structural strategy shape and varies only the thing under test. The numeric knobs are treated as in-sample-tunable trials, and the headline is always the Deflated Sharpe Ratio after honest trial-counting, never raw profit factor. Both projects' hot loops (the active-bet-averaging kernel; the OU Monte-Carlo mesh and the out-of-sample exit scan) are reimplemented in a compiled kernel and verified bit-identical against an independent pure-Python reference before any result is reported.

Part 1: bet sizing from predicted probabilities · chapter 10

Sizing by confidence is a cost-control layer, not an edge

Fig. 1: Turnover collapses everywhere. Each instrument's book turnover under probability sizing relative to its fixed-unit baseline; all 42 sit below 1, typically at 0.13–0.25× fixed. Averaging overlapping bets nets opposing positions and damps the constant ±1 flipping of a fixed-unit book; discretization further suppresses the long tail of tiny, near-zero-conviction trades.

Chapter 10 gives an explicit recipe: map a meta-classifier's calibrated probability of a profitable bet to a signed size through m = 2Φ(z) − 1 with z = (p − ½)/√(p(1−p)), average the sizes of all concurrently active bets into one net book position, and optionally snap that size to a coarse grid.
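The per-bet mapping and the 0.1-grid snap fit in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's code: it assumes a binary meta-label and that, as in the book's own examples, the primary model's side (+1 long, −1 short) supplies the sign. `bet_size` and `discretize` are hypothetical names.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi(z)


def bet_size(p: float, side: int) -> float:
    """Signed size m = side * (2*Phi(z) - 1), with z = (p - 1/2) / sqrt(p(1-p)).

    p is the meta-model's calibrated probability that the bet pays off;
    side (+1 or -1) is assumed to come from the primary model.
    """
    p = min(max(p, 1e-9), 1 - 1e-9)  # avoid dividing by zero at p = 0 or 1
    z = (p - 0.5) / math.sqrt(p * (1 - p))
    return side * (2 * _STD_NORMAL.cdf(z) - 1)


def discretize(m: float, step: float = 0.1) -> float:
    """Snap a size to a grid of width `step`, clipped to [-1, 1].

    Sizes with |m| < step/2 become 0, removing the smallest,
    lowest-conviction bets.
    """
    snapped = round(m / step) * step
    return round(max(-1.0, min(1.0, snapped)), 10)  # tidy floating-point noise


for p in (0.52, 0.60, 0.75):
    m = bet_size(p, side=+1)
    print(f"p={p:.2f}  continuous={m:.3f}  discretized={discretize(m):.1f}")
```

Under these assumptions p = 0.52 maps to m ≈ 0.03, which the 0.1 grid snaps to zero, while p = 0.60 maps to m ≈ 0.16 and snaps to 0.2: the grid removes exactly the low-conviction bets that pay cost without adding much exposure.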

Holding the primary strategy, the triple-barrier labels, the out-of-fold meta-model, the costs and the active-bet-averaging engine identical across three schemes (fixed unit, continuous probability and discretized probability), the only robust effect is on turnover: it falls 80–87% in every market. All-instrument median book turnover drops 254 → 48.6 (prob, −81%) → 46.2 (disc, −82%).

market	n	turnover (fixed)	prob	disc	prob cut	disc cut
Crypto	27	272	53.6	52.3	−80%	−84%
Equities	7	197	20.4	19.6	−87%	−87%
Forex	8	173	42.2	29.0	−84%	−86%

Median book turnover by scheme. Costs are charged on the change in book position (realised turnover), so the collapse is a direct, repeatable cost saving.
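Averaging concurrently active bets into one book position, and charging cost on the changes in that position, can be sketched as follows. This pure-Python version is illustrative only (the study's kernel is compiled); it assumes bets are half-open bar intervals and a flat cost per side on every unit of position change, as in the 7 bp crypto schedule. `Bet`, `book_position` and `turnover_and_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    start: int   # first bar on which the bet is active
    end: int     # bar on which the bet closes (exclusive)
    size: float  # signed size in [-1, 1], e.g. from the probability mapping


def book_position(bets: list[Bet], n_bars: int) -> list[float]:
    """Net book position per bar: the mean size of all bets active on that bar."""
    positions = []
    for t in range(n_bars):
        active = [b.size for b in bets if b.start <= t < b.end]
        positions.append(sum(active) / len(active) if active else 0.0)
    return positions


def turnover_and_cost(positions: list[float], cost_per_side_bp: float) -> tuple[float, float]:
    """Turnover = sum of |change in book position|, starting from a flat book.

    Cost is charged on that turnover at a flat rate per side, returned
    as a fraction of one unit of notional.
    """
    turnover, prev = 0.0, 0.0
    for pos in positions:
        turnover += abs(pos - prev)
        prev = pos
    return turnover, turnover * cost_per_side_bp * 1e-4


# Two overlapping, opposing unit bets: while both are active they net to zero.
bets = [Bet(0, 4, +1.0), Bet(2, 6, -1.0)]
pos = book_position(bets, n_bars=6)
print(pos)  # [1.0, 1.0, 0.0, 0.0, -1.0, -1.0]
print(turnover_and_cost(pos, cost_per_side_bp=7.0))
```

In the toy example the book goes flat while the two opposing bets overlap instead of holding both legs, and it trades 3 units of notional in total.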

The headline metric is the Deflated Sharpe Ratio, net of costs, because the whole point is to ask whether any apparent sizing benefit survives the multiple-testing deflation implied by the 32-point knob grid (effective N ≈ 3; median probability of backtest overfitting, PBO, of 0.49, a coin flip). It does not: ΔDSR ≈ 0, no market reaches sign-test significance, and no instrument, in any market, under any scheme, clears DSR > 0.95.
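The deflation follows the closed form of Bailey & López de Prado (2014): the Sharpe ratio of the selected trial is compared with SR0, the Sharpe a zero-skill strategy would be expected to reach as the best of N trials, after adjusting for skewness and kurtosis. The sketch below is illustrative; how the study estimates the effective trial count and the cross-trial Sharpe variance `var_sr` is not shown, and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()
EULER_MASCHERONI = 0.5772156649015329


def expected_max_sharpe(n_trials: int, var_sr: float) -> float:
    """SR0: expected maximum Sharpe among n_trials zero-skill trials whose
    Sharpe estimates have cross-trial variance var_sr."""
    if n_trials < 2:
        return 0.0  # a single trial needs no selection-bias correction
    g = EULER_MASCHERONI
    return math.sqrt(var_sr) * (
        (1 - g) * _N.inv_cdf(1 - 1 / n_trials)
        + g * _N.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr: float, n_obs: int, skew: float, kurtosis: float,
                    n_trials: int, var_sr: float) -> float:
    """DSR = Phi((SR - SR0) * sqrt(T - 1) / sqrt(1 - skew*SR + (kurt - 1)/4 * SR^2)).

    sr is the per-bar (non-annualized) Sharpe of the selected trial, n_obs
    the number of bars T, kurtosis the raw (non-excess) kurtosis.
    """
    sr0 = expected_max_sharpe(n_trials, var_sr)
    denom = math.sqrt(1 - skew * sr + (kurtosis - 1) / 4 * sr ** 2)
    return _N.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


# Hypothetical inputs: 3 effective trials, per-bar Sharpe 0.02 over 5,000 bars.
print(round(deflated_sharpe(0.02, 5_000, -0.1, 5.0, 3, 1e-4), 3))
```

Raising `n_trials` raises SR0 and lowers the DSR for the same observed Sharpe, which is why every in-sample-tunable knob setting has to be counted as a trial.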

The one consistent ordering is discretized ≥ continuous (33/42 instruments; discretized PF also beats fixed in 26/42, against 15/42 for continuous). That too is cost-efficiency, not alpha: snapping to a 0.1 grid zeros the smallest bets, which otherwise pay cost without conviction.

Fig. 2: Deflated performance does not move. Median Deflated Sharpe Ratio by market for fixed, probability and discretized sizing. The three schemes are statistically indistinguishable, and 0 of 42 instruments cross DSR > 0.95 under any scheme. The paired ΔDSR vs fixed is −0.0022 (prob) and +0.0026 (disc), neither significant by a two-sided sign test.

market	DSR (fixed)	prob	disc
Crypto	0.027	0.013	0.032
Equities	0.321	0.254	0.313
Forex	0.087	0.113	0.186

Median per-market DSR; the 0.95 bar is unmet everywhere. The EMA-crossover primary has no deflatable edge for sizing to amplify, so sizing's only visible footprint is the turnover line. That is precisely the prediction of treating Chapter 10 as an execution layer rather than a signal.

Fig. 3: Continuous vs discretized bet-size distributions. Continuous m keeps a cloud of tiny bets that each pay turnover cost without conviction; snapping to a 0.1 grid zeros the smallest of them. The result is a mild net positive and the cost-efficiency mechanism behind the discretized ≥ continuous ordering.

Fig. 4: Median ΔDSR versus the fixed-size baseline, by market. Continuous probability sizing is slightly negative; discretized is slightly positive; both are a fraction of a DSR point and neither is significant. Sizing redistributes how the book expresses its position; it does not create edge.

Part 2: optimal trading rules & triple penance · chapter 13

The OU rule is info-free; the drawdown correction is the real win

Fig. 5: OU rule vs an IS-tuned control, OOS Deflated Sharpe by market. The OU-derived profit-take/stop beats simply tuning the thresholds in-sample only 54.8% of the time on DSR, a coin flip. Only SPY/QQQ/IWM clear DSR > 0.95, and they clear for both arms (it is long-side index drift, not OU alpha); the control in fact out-Sharpes the OU rule on all three.

Chapter 13 derives a profit-take/stop-loss rule without backtesting: fit an Ornstein–Uhlenbeck process to a mean-reverting series, simulate many Monte-Carlo paths from the fitted process, and read the optimal (profit-take, stop-loss) pair off a Sharpe mesh. We validate that rule out-of-sample, with costs and intrabar OHLC exits, against a control that simply grid-searches the same thresholds in-sample.
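The backtest-free pipeline can be sketched end to end: fit a discrete OU (AR(1)) model by least squares, simulate bets from it, and keep the (PT, SL) pair with the best simulated Sharpe. This is a minimal sketch, not the book's snippet or the project's kernel: thresholds are in units of the innovation σ, the bet is long, the mesh is tiny, and the synthetic series exists only to exercise the functions. Setting `start = 0` mimics a mesh that starts each path at the long-run mean; a negative `start` (long below the mean) mimics enter-at-deviation. All names are hypothetical.

```python
import random
from statistics import fmean, pstdev


def fit_ou(prices: list[float], mean: float) -> tuple[float, float]:
    """Least-squares AR(1) fit around a given long-run mean:
    P_t - mean = phi * (P_{t-1} - mean) + sigma * eps_t."""
    x = [p - mean for p in prices[:-1]]
    y = [p - mean for p in prices[1:]]
    phi = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    sigma = pstdev([b - phi * a for a, b in zip(x, y)])
    return phi, sigma


def path_sharpe(phi: float, sigma: float, start: float, pt: float, sl: float,
                n_paths: int = 500, max_hp: int = 50, seed: int = 0) -> float:
    """Sharpe of simulated long-bet P&L entered at deviation `start` from the
    mean, closed at the first touch of +pt*sigma / -sl*sigma or after max_hp bars."""
    rng = random.Random(seed)  # same seed for every (pt, sl) cell
    pnl = []
    for _ in range(n_paths):
        x = start
        for _ in range(max_hp):
            x = phi * x + sigma * rng.gauss(0.0, 1.0)
            if x - start >= pt * sigma or x - start <= -sl * sigma:
                break  # first touch of either barrier
        pnl.append(x - start)
    sd = pstdev(pnl)
    return fmean(pnl) / sd if sd > 0 else 0.0


def ou_mesh(phi: float, sigma: float, start: float, grid: list[float]) -> tuple[float, float]:
    """Return the (pt, sl) pair with the highest simulated Sharpe."""
    return max(((pt, sl) for pt in grid for sl in grid),
               key=lambda pair: path_sharpe(phi, sigma, start, *pair))


# Illustration only: a synthetic AR(1) series with phi = 0.9, sigma = 1.
rng = random.Random(1)
series, x = [], 0.0
for _ in range(5_000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    series.append(x)

phi, sigma = fit_ou(series, mean=0.0)
grid = [0.5, 1.0, 2.0, 3.0]
print(f"phi={phi:.3f} sigma={sigma:.3f}")
print("enter at mean:", ou_mesh(phi, sigma, start=0.0, grid=grid))
print("enter at -2 sigma:", ou_mesh(phi, sigma, start=-2 * sigma, grid=grid))
```

Reusing one seed for every grid cell makes the cells share simulated shocks, at least until their exits diverge, which reduces noise in the cell-to-cell comparison.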

On a vanilla z-score mean-reversion entry the OU apparatus adds nothing decision-relevant: it beats the control on OOS DSR 54.8% of the time (42.9% on Sharpe), a coin flip, and never produces a deflation-clearing winner that the control does not also produce. Crypto is a loss after 7 bp/side (median OU PF 0.78); forex's median OU PF is above 1 (1.157) but still fails the deflation (the best name, GBPUSD, reaches an OU DSR of only 0.571).
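The "coin flip" label can be checked with an exact two-sided sign test on the paired per-instrument comparisons. A minimal sketch, assuming no ties; 54.8% of 42 instruments corresponds to 23 wins. `sign_test_p` is a hypothetical name.

```python
from math import comb


def sign_test_p(wins: int, n: int) -> float:
    """Exact two-sided sign test: probability, under a fair coin, of a win
    count at least as far from n/2 as the one observed."""
    lower = sum(comb(n, k) for k in range(0, wins + 1)) / 2 ** n
    upper = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
    return min(1.0, 2 * min(lower, upper))


# 23 wins out of 42 instruments is the 54.8% head-to-head rate.
print(round(23 / 42, 3), round(sign_test_p(23, 42), 2))  # 0.548 0.64
```

A two-sided p of about 0.64 is nowhere near any conventional significance level, which is what "coin flip" means here.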

market	n	OU PF	ctrl PF	OU DSR	ctrl DSR	DSR>0.95 (OU/ctrl)
Crypto	27	0.780	0.773	0.0001	0.000	0 / 0
Equities	7	0.577	1.163	0.000	0.000	3 / 3
Forex	8	1.157	1.211	0.098	0.023	0 / 0

Per-market medians, OOS, net of costs. PF is profit factor; DSR uses the 27-point IS-tunable knob grid as the trial set. The only DSR > 0.95 names are the three broad equity indices, where every arm harvests the same long-beta drift.

Why does the rule earn nothing? The OU mesh is structurally degenerate: it pins the optimal stop to the grid maximum (sl* = 3.0) in every instrument. The obvious suspect is real: the textbook mesh starts each simulated path at the long-run mean, while the live entry fires at a z-score extreme. A geometrically correct enter-at-deviation formulation does move the needle: it beats enter-at-mean on OOS DSR in 73.8% of instruments and lifts GBPUSD to an OU-dev DSR of 0.945, a near miss.

But the verdict is unchanged: enter-at-deviation still selects sl* = 3.0 in 41/42 instruments and beats the control only 47.6% of the time. The degeneracy is intrinsic to a first-touch rule on a mean-reverting process, and that is the load-bearing methodological finding here.

Fig. 6: The degeneracy is intrinsic. The OU first-touch mesh drives the optimal stop to the grid edge (sl* = 3.0) in 41/42 instruments, even under the geometrically correct enter-at-deviation formulation. At the measured half-lives (10–58 bars), a first-touch rule on a mean-reverting series almost always wants the widest admissible stop: the 'optimal rule' collapses to 'don't stop early', which a 12×12 grid can only express as 'go to the edge'.

Fig. 7: Three-arm OOS Deflated Sharpe: OU enter-at-mean, the corrected OU enter-at-deviation, and the IS-tuned control. The correction is right and moves the needle, but the OU machinery still does not earn its complexity: only the three broad equity indices clear DSR > 0.95, they do so in all three arms, and the dumb control wins there.

Putting the OU rule on its true regime

On a cointegrated spread, the coin-flip reverses

The coin-flip verdict above was always partly a wrong-regime artifact: a raw price series is not the mean-reverting process the Ornstein–Uhlenbeck machinery assumes. Its true habitat is a cointegrated residual spread. So we re-ran the exact same apparatus on five genuinely cointegrated crypto-perp pairs. The hedge ratio β and the OU parameters are fit on the training window only; out-of-sample trading then uses full intrabar OHLC exits (spread extremes bounded by the leg OHLC) and charges costs on both legs, with the Deflated Sharpe as the headline.

The result reverses. The OU optimal rule beats both a fixed ±σ band control and buy-and-hold on 4 of 5 pairs and clears Deflated Sharpe > 0.95 on 3 of 5 (median OU Sharpe ≈ 9.2 against ≈ 3.1 for the band). Run on the regime it was designed for, the apparatus carries real information.

The honest caveats hold. The profit-take/stop grid-edge degeneracy persists (every pair selects the same (PT, SL) corner), so the apparatus is best read as a regime-and-shape selector, not a calibrated set-point: it tells you that the OU regime is tradeable and roughly what shape the rule should take, not the exact thresholds.

And one pair (THETA–FIL) fails for both the OU rule and the band control, so cointegration is necessary but not sufficient: a passing cointegration test is the price of entry, not a guarantee of a tradeable spread.

Fig. 8: OU on its true regime: the cointegrated residual spread of SNX vs AAVE (β-hedged, OU half-life 89 bars, ADF p = 0.003). Left: the OOS spread tracking its rolling mean inside a ±σ entry band. Right: cumulative net P&L; the OU optimal rule (OU SR 12.66, DSR 1.00) outruns both the fixed-band control (band SR 6.75) and the buy-and-hold spread. Net of per-leg costs, with full intrabar OHLC exits.

pair (β·b)	half-life (bars)	OU SR	band SR	OU PF	OU DSR
1INCH–UNI	96	9.21	3.11	2.11	1.00
COMP–AAVE	120	10.80	5.31	2.72	1.00
SNX–AAVE	89	12.66	6.75	2.52	1.00
GALA–SAND	174	4.58	−2.18	2.09	0.93
THETA–FIL	117	−4.60	−3.55	0.50	0.00

Five cointegrated crypto-perp pairs, OOS, net of per-leg costs. SR is annualized; DSR uses the same IS-tunable trial grid as the main study. OU beats the band on 4 of 5 and clears DSR > 0.95 on 3 of 5 (median OU SR 9.21 vs band 3.11). THETA–FIL fails for both arms: cointegration is necessary, not sufficient.

The clean, citable result, triple penance

Ignoring serial correlation underprices drawdown by ~3.2×

Fig. 9: Realised drawdown vs the AR(1)-adjusted bound across the panel, with the k(φ) = (1+φ)/(1−φ) inflation curve. Strategy returns are strongly positively autocorrelated (median AR(1) φ 0.24 crypto → 0.40 forex; selected arms up to 0.74), median variance inflation k = 3.28×. The naive IID bound understates realised drawdown by ≈ 3.2× (median realised/IID = 3.234); the AR(1) correction closes the gap (median realised/AR(1) = 0.722).

This is where the chapter's machinery genuinely pays off. Under a Gaussian AR(1) return stream with lag-1 autocorrelation φ, the long-run variance inflates by k(φ) = (1+φ)/(1−φ) (effective σ scaled by √k), and the 95%-confidence drawdown and time-under-water bounds inflate with it.
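The closed-form bounds behind Triple Penance are short enough to write out. For Gaussian returns with per-bar drift μ > 0 and volatility σ, Bailey & López de Prado give the one-sided 95% bounds MaxDD = (zσ)²/(4μ) and time under water = (zσ/μ)², the latter four times the bar at which MaxDD is reached, so recovery takes three times as long as the fall. The sketch below, with hypothetical names, applies the AR(1) correction by inflating σ² by k(φ) and uses the Demo A inputs (μ = 0.040%, σ = 1.00% per bar, φ = 0.52).

```python
from statistics import NormalDist


def k_ar1(phi: float) -> float:
    """Long-run variance inflation of a Gaussian AR(1) stream: (1+phi)/(1-phi)."""
    return (1 + phi) / (1 - phi)


def drawdown_bounds(mu: float, sigma: float, phi: float = 0.0,
                    confidence: float = 0.95) -> dict[str, float]:
    """Closed-form MaxDD and time-under-water bounds for drift mu > 0 and
    per-bar vol sigma, with sigma^2 inflated by k(phi); phi = 0 is the IID case."""
    if mu <= 0:
        raise ValueError("the bounds need positive drift")
    z = NormalDist().inv_cdf(confidence)    # one-sided z, 1.645 at 95%
    var = sigma ** 2 * k_ar1(phi)
    t_star = z ** 2 * var / (4 * mu ** 2)   # bar at which the MaxDD bound is reached
    return {
        "max_dd": z ** 2 * var / (4 * mu),  # (z sigma)^2 / (4 mu)
        "t_star": t_star,
        "time_under_water": 4 * t_star,     # (z sigma / mu)^2
        "recovery": 3 * t_star,             # the "triple penance"
    }


iid = drawdown_bounds(0.04, 1.00)           # drift and vol in % per bar
ar1 = drawdown_bounds(0.04, 1.00, phi=0.52)
print(f"MaxDD {iid['max_dd']:.2f}% vs {ar1['max_dd']:.2f}%")
print(f"TuW {iid['time_under_water']:.0f} vs {ar1['time_under_water']:.0f} bars")
```

It prints MaxDD 16.91% vs 53.55% and TuW 1691 vs 5355 bars, matching the calculator; both bounds scale by exactly k(0.52) ≈ 3.17.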

On the 23 instruments with positive drift (the closed-form bounds require μ > 0), the realised returns are strongly autocorrelated (median k = 3.28×). The naive IID bound understates the realised drawdown by ≈ 3.2×, the textbook failure the 2015 paper warns about, and the AR(1) correction closes the gap (median realised/AR(1)-bound = 0.722, with the bound above realised for most names, as a 95% envelope should be). A clean, honest empirical vindication on real strategy returns.
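The two per-instrument inputs to this comparison, the lag-1 autocorrelation φ of realised returns and the realised maximum drawdown, can be sketched as below. Assumptions: P&L is summed rather than compounded (matching the additive bounds), and φ is the plain sample autocorrelation around the full-sample mean; the project's exact estimators may differ, and the function names are hypothetical.

```python
from statistics import fmean


def lag1_autocorr(returns: list[float]) -> float:
    """Sample lag-1 autocorrelation phi of a per-bar return stream."""
    m = fmean(returns)
    num = sum((a - m) * (b - m) for a, b in zip(returns[:-1], returns[1:]))
    den = sum((r - m) ** 2 for r in returns)
    return num / den


def realised_max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough fall of cumulative (summed) P&L, starting from 0."""
    cum = peak = max_dd = 0.0
    for r in returns:
        cum += r
        peak = max(peak, cum)
        max_dd = max(max_dd, peak - cum)
    return max_dd


returns = [1.0, -2.0, 1.0, -1.0, 3.0]  # toy per-bar P&L
phi = lag1_autocorr(returns)
k = (1 + phi) / (1 - phi)              # variance inflation k(phi)
print(realised_max_drawdown(returns), round(phi, 3), round(k, 3))  # max drawdown 2.0
```

Dividing the realised figure by the IID and AR(1) bounds, instrument by instrument, gives the realised/IID and realised/AR(1) ratios whose medians are quoted above.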

Fig. 10: The closed-form bounds side by side: naive IID (which ignores serial correlation) versus the AR(1)-adjusted MaxDD. The inflation factor k(φ) almost exactly accounts for the IID bound's shortfall against realised drawdown.

Reproduce the two mechanics in the browser

The serial-correlation drawdown tax, and the turnover/DSR split

The first calculator implements the Triple-Penance bound directly: drag the lag-1 autocorrelation φ and watch the maximum-drawdown and time-under-water bounds inflate by k(φ) = (1+φ)/(1−φ) over the naive IID bound. The second panel encodes the real per-market bet-sizing medians so you can see turnover collapse 80–87% while the Deflated Sharpe sits still. Both are self-contained: the formulae and the study's numbers are encoded in the component, so they reproduce exactly on every load.

Demo A · Triple Penance: the serial-correlation drawdown tax

Strategy returns are not independent. Drag the lag-1 autocorrelation φ and watch the maximum-drawdown and time-under-water bounds inflate by the factor k(φ) = (1+φ)/(1−φ) over the naive IID bound that ignores it.

k(φ) = (1+φ)/(1−φ)

Inputs shown: φ (lag-1 autocorrelation) = 0.52; μ (drift per bar) = 0.040%; σ (vol per bar) = 1.00%.

quantity	naive IID (assumes φ = 0)	AR(1)-corrected
variance inflation k(φ)	1×	3.17× (σ scaled by √k = 1.78)
MaxDD bound	16.91%	53.55% (3.17× deeper)
time under water	1691 bars	5355 bars (3.17× the IID figure)

On the 23 instruments with positive drift, the project measured a median AR(1) φ from 0.24 (crypto) to 0.40 (forex), median variance inflation k = 3.28×. The naive IID bound understated the realised drawdown by ≈ 3.2× (median realised / IID = 3.234); the AR(1)-corrected bound closed the gap (median realised / AR(1) = 0.722). Set φ ≈ 0.52 above to reproduce the ~3.2× tax.

Demo B · Bet sizing: turnover collapses, deflated Sharpe does not

Hold the strategy, labels, meta-model and costs identical; change only how the bet is sized. Probability sizing cuts book turnover 80–87% in every market, but the Deflated Sharpe Ratio barely moves, and 0 of 42 instruments clear DSR > 0.95 under any scheme.

m = 2Φ(z) − 1

metric	value
turnover cut, continuous prob (vs fixed unit)	−81%
turnover cut, discretized (vs fixed unit)	−82%
instruments clearing DSR > 0.95 (any scheme, any market)	0 / 42

All-instrument median turnover falls 254 → 48.6 (prob, −81%) → 46.2 (disc, −82%); all 42 instruments sit below the fixed-size baseline. The paired ΔDSR vs fixed is −0.0022 (prob) and +0.0026 (disc), neither significant. The only consistent ranking is discretized ≥ continuous (33/42), a cost-efficiency effect, not alpha.

The honest read

Two chapters, three results: two negatives and one clean positive. The negatives are as useful as the positive when framed as a methodology audit.

What each idea actually does

Bet sizing from probabilities is a precision and cost-control layer, not an alpha source. If the primary strategy has a real, deflatable edge, sizing lets you express it more cheaply and with far less churn, which is worth having in a cost-dominated regime, but it cannot manufacture significance where the primary has none. This does not contradict López de Prado: Chapter 10 is about converting a classifier's confidence into a position, not about generating the edge itself. The result simply makes that boundary empirical.

The OU optimal-rule apparatus, on a vanilla entry, does not earn its complexity: it is a coin flip against simply tuning the thresholds in-sample, and its mesh is degenerate in its stop selection. That is a clean negative result with a constructive correction (the enter-at-deviation formulation helps but does not remove the degeneracy). The fix is not more data but a genuinely mean-reverting entry: a fitted residual spread from a cointegrated pair, where the OU process is the true data-generating process. The cointegrated-pair test bears this out, with the rule beating the band control on 4 of 5 pairs, although the grid-edge degeneracy persists there too.

The Triple-Penance drawdown correction is the strong, standalone positive: a demonstration, on the 23 positive-drift instruments of a 42-instrument, three-asset-class panel, that the IID drawdown bound is too optimistic by a factor of ≈ 3.2 and that the k(φ) inflation closes the gap. A serious write-up frames the whole as a methodology audit: the backtest-free OU rule is fragile on plain entries and degenerate in its stop selection, while its companion drawdown correction is empirically vindicated.

Notes & limitations

Neither half is a tradable alpha claim, and that separation is the point. The bet-sizing primary is a toy EMA crossover with no deflatable edge, so sizing has nothing to amplify; QQQ's eye-catching prob profit factor of 12.7 is an artifact of a near-empty book (≈4 events cleared its act-gate) and is excluded from any qualitative claim. The OU rule's only DSR > 0.95 names are broad equity indices riding a long bull leg, shared by every arm. The only synthetic step in the OU study is the explicitly labelled rule derivation (Monte-Carlo on a fitted process); every reported headline is real out-of-sample data with costs and intrabar OHLC exits.

Reproducibility

Both studies are collected in the companion repository lopez-de-prado-work-review, as project 11 (bet sizing) and project 12 (optimal trading rules). The compiled kernels are verified bit-identical against independent pure-Python references; the calculators on this page encode the formulae and the study's numbers directly, so the mechanics they illustrate reproduce on every load.

Cite

Cite as

Gatto, D. V. (2026). Trading Rules and Bet Sizing: A Multi-Market Review of Probability Sizing, the Backtest-Free OU Rule, and the Triple-Penance Drawdown Correction. Working paper (preview).

@techreport{gatto2026tradingrules,
  author      = {Gatto, Daniel V.},
  title       = {Trading Rules and Bet Sizing: A Multi-Market Review of
                 Probability Sizing, the Backtest-Free OU Rule, and the
                 Triple-Penance Drawdown Correction},
  year        = {2026},
  type        = {Working paper (preview)},
  note        = {Review of Lopez de Prado's Chapters 10 and 13}
}

References

Bailey, D. H., & López de Prado, M. (2013). Drawdown-Based Stop-Outs and the “Triple Penance” Rule. Working paper.
Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management, 40(5).
Bailey, D. H., & López de Prado, M. (2015). Stop-Outs Under Serial Correlation and “the Triple Penance Rule”. Journal of Risk, 18(2).
López de Prado, M. (2018). Advances in Financial Machine Learning, Chapters 10 (Bet Sizing) and 13 (Backtesting on Synthetic Data). Wiley.
López de Prado, M., & Vince, R. (2019). Optimal Risk Budgeting under a Finite Investment Horizon.
