Research review · predictive features · studies 07 & 08
Predictive Features: Structural Breaks, Entropy & Microstructure
López de Prado's exotic features are real and computable, but under realistic costs and a deflated-significance gate none of them is tradeable here, and the expensive data tier buys no extra predictive power
Two of López de Prado's feature chapters promise predictive content that ordinary momentum features cannot see. The first offers structural-break and entropy estimators (Ch. 17–18) that flag regime shifts and quantify how predictable a return string is. The second offers microstructure estimators (Ch. 19) that read who is trading and at what cost. This review implements both families across crypto, US-equity ETFs and forex. It keeps every feature strictly causal, charges realistic costs (time-of-day schedules where the market has them), cross-validates with purging and embargo, and gates the headline on the Deflated Sharpe Ratio. The verdict is the same in both studies. The features compute cleanly and one descriptive regime effect is statistically strong, yet none survives as a tradeable, deflation-adjusted rule, and the expensive order-flow data tier buys no extra predictive power.
Advances in Financial ML, Ch. 17–19
López de Prado (2018) · structural breaks, entropy & microstructure
Structural breaks & entropy
the claim
What López de Prado says
Two feature families: structural-break detectors (the CUSUM event sampler and the SADF explosiveness/bubble statistic) and entropy estimators (Shannon, Lempel-Ziv, Kontoyiannis) that quantify the information content of the return string. He presents these as features rather than standalone strategies, that is, as summaries of regime and predictability that a classifier can use.
our result
What we found
As standalone, costed, purged-CV signals they carry essentially no overfitting-adjusted edge: across 108 trials the best feature deflates to DSR ≈ 0 (PBO 0.36). There is one clean descriptive effect. Explosive (SADF>0) regimes precede higher forward returns in crypto (+21 bp) and equities (+29 bp). That effect is mostly unconditional drift, however, and it does not survive as a tradeable, deflated rule (best regime-rule DSR 0.46).
Microstructural features
the claim
What López de Prado says
Ch. 19 groups microstructure estimators that can be built from bar data alone into three generations. These are price-sequence measures (Roll, Corwin-Schultz), strategic-trade impact measures (Kyle's λ, Amihud, Hasbrouck) and the sequential-trade toxicity metric VPIN. The premise is that these summarise who is trading and at what cost, which a return/momentum feature set cannot see. They should therefore add predictive content for short-horizon moves and volatility.
our result
What we found
Volatility yes, direction no. Next-bar volatility AUC is consistently above 0.50 (~0.54) in all three markets, but direction sits at the coin flip (~0.51). These features help size and time risk; they do not pick sides. The costed long/short is net-negative in 22 of 24 instruments, and the all-scope DSR is 0.00. There is also a bonus result not in the book. A free Bulk-Volume-Classification (BVC) proxy matches or beats true trade-side data for the vol task, so the expensive order-flow tier is not the binding constraint.
The result in three lines
27 instruments · 3 asset classes
The features are real and computable
CUSUM, SADF explosiveness, Shannon / Lempel-Ziv / Kontoyiannis entropy, and the Roll / Kyle / Amihud / Hasbrouck / VPIN microstructure estimators are all built strictly causally on information-driven bars across crypto, US-equity ETFs and forex. The reference and accelerated kernels agree bit for bit (max|Δ| = 0); SADF differs only by ~1e-13 OLS round-off.
best DSR ≈ 0.002 · ≈ 0.00
Nothing clears the deflated bar
As standalone signals, the structural-break/entropy features return a best Deflated Sharpe Ratio of ≈0.002 across 108 costed purged-CV trials. The microstructure direction signal deflates to 0.00 across 144 trials. Neither headline comes near the 0.95 significance bar. The closest any sub-result gets is the crypto-only microstructure scope at 0.89, which still falls short of it.
BVC ≈ true flow
The expensive data tier buys nothing
On identical crypto bars, the free Bulk-Volume-Classification (BVC) proxy matches or beats true buyer/seller volume for next-bar volatility (0.547 vs 0.522 AUC). Tick-only is within 0.004 of true flow. Price + volume is enough; order-flow data is not the binding constraint.
Explore the two findings interactively
Switch the data tier; reveal the cost-and-deflation verdict
The explorer encodes the two studies' real numbers. Toggle the microstructure data tier and watch the next-bar volatility AUC barely move while direction stays a coin flip (Project 08's ablation), then reveal how the strong explosive-regime forward-return tilt (Project 07) collapses under realistic costs and deflation.
Demo: does the expensive data tier buy predictive power?
On identical crypto bars, the four side-volume estimators (Kyle, Hasbrouck, VPIN, OFI) are recomputed under each of three data tiers and scored on the next bar. For next-bar volatility, the free BVC proxy (AUC 0.547) beats the expensive true side-volume tier (0.522), and tick-only (0.526) is within 0.004 of true side volume. For direction, all three tiers sit at the coin flip (≈0.507–0.512). The order-flow data tier is not the binding constraint; price + volume is enough.
Regime panel (descriptive and out-of-sample, with thresholds fit in-fold and no peeking): explosive (SADF>0) bars precede +29.2 bp forward returns in equities (p<1e-6) and +21.4 bp in crypto (p=7.5e-5). Forward returns after non-explosive crypto bars are actually negative. Forex shows nothing.
Section 07 · structural breaks & entropy
Explosive regimes really do precede higher returns; it just isn't alpha
Ch. 17–18 propose two feature families. The first is structural breaks: the CUSUM event sampler and the SADF explosiveness/bubble statistic. The second is entropy: the Shannon plug-in estimator, Lempel-Ziv (LZ76) complexity and the Kontoyiannis match-length rate. We compute all of them strictly causally over backward windows on ~6,000 matched information-driven bars per instrument (dollar bars for crypto and equities, tick bars for forex), across 27 instruments in three asset classes.
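A minimal sketch of the three entropy estimators follows. It assumes returns are first encoded as a binary string ('1' for an up bar, '0' otherwise). The encoding, word length and window size are illustrative choices, not the study's settings, and the function names are hypothetical. LZ76 counts phrases with the Kaspar-Schuster algorithm. The Kontoyiannis estimator averages log2(n+1)/L over match lengths L found in a window of n symbols. Causality comes from applying each estimator to a backward window of the return string that ends at bar t.

```python
import math
from collections import Counter


def encode_signs(returns):
    """Hypothetical binary encoding: '1' for a positive return, '0' otherwise."""
    return "".join("1" if r > 0 else "0" for r in returns)


def plugin_entropy_rate(s: str, word_len: int = 2) -> float:
    """Shannon plug-in entropy in bits per symbol, from overlapping words."""
    words = [s[i:i + word_len] for i in range(len(s) - word_len + 1)]
    total = len(words)
    h = -sum((c / total) * math.log2(c / total) for c in Counter(words).values())
    return h / word_len


def lz76_complexity(s: str) -> int:
    """Number of LZ76 phrases (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1
    c, k_max = 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:  # no earlier copy extends further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def match_length(s: str, i: int, n: int) -> int:
    """1 + length of the longest prefix of s[i:i+n] that also starts in s[i-n:i]."""
    best = 0
    for length in range(1, n + 1):
        target = s[i:i + length]
        if len(target) < length:
            break
        if any(s[j:j + length] == target for j in range(i - n, i)):
            best = length
        else:
            break  # if this prefix has no match, no longer prefix can match
    return best + 1


def kontoyiannis_entropy(s: str, window: int) -> float:
    """Mean of log2(window + 1) / L over positions with a full window on each side."""
    points = range(window, len(s) - window + 1)
    vals = [math.log2(window + 1) / match_length(s, i, window) for i in points]
    return sum(vals) / len(vals)
```

Lower values on all three estimators mean a more compressible, and in principle more predictable, return string. A constant string scores the minimum on each.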
Conditioning forward returns on causal regimes surfaces one robust descriptive effect: explosive (SADF>0) bars precede a large positive forward move in crypto and equities. But it is mostly unconditional drift rather than a directional edge, because SADF>0 episodes cluster inside trending up-legs. The trial statistics that follow make this precise.
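A minimal SADF sketch makes the "SADF>0" condition concrete, under two simplifying assumptions. First, the ADF regression has an intercept but no augmentation lags. Second, `min_len`, the shortest backward window, is a hypothetical parameter. At bar t the statistic is the supremum of ADF t-statistics over every start point, computed only from log prices up to t. A positive value therefore means at least one backward window estimates an explosive root (β > 0).

```python
import numpy as np


def adf_tstat(y: np.ndarray) -> float:
    """t-statistic of beta in dy_t = alpha + beta * y_{t-1} + e_t (no lags)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[1] / np.sqrt(cov[1, 1]))


def sadf(log_price, t: int, min_len: int = 20) -> float:
    """Sup-ADF at bar t over backward-expanding windows; uses log_price[:t+1] only."""
    y = np.asarray(log_price, dtype=float)[: t + 1]
    starts = range(0, len(y) - min_len + 1)
    return max(adf_tstat(y[s:]) for s in starts)
```

Fitting the windows with an augmented regression (with lags) and a volatility-aware `min_len` would bring the sketch closer to the book's version. It would also make it slower, which is why the study keeps an accelerated kernel alongside the reference one.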
| metric | value | note |
|---|---|---|
| trials (instrument × feature) | 108 | 27 instruments × 4 features |
| best OOS Sharpe | 0.063 | forex USDJPY · Shannon |
| E[max] null (SR₀) | 0.101 | skill-less 108-trial family |
| Deflated Sharpe | 0.0016 | bar is 0.95 |
| PBO (CSCV) | 0.357 | whole corpus |
| % trials positive | 45.4% | range [−0.171, +0.063] |
As standalone signals these features sit in the noise band. The best of 108 costed purged-CV trials (forex USDJPY Shannon, Sharpe 0.063) falls below the expected maximum of a skill-less 108-trial family (SR₀ 0.101). Its Deflated Sharpe therefore collapses to ≈0.002. Turning the explosive regime into a directional rule and deflating across the 81-rule family gives DSR 0.46, so the descriptive tilt does not monetise.
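The deflation arithmetic can be sketched with the standard-library normal distribution. The expected maximum Sharpe of N skill-less trials comes from the False Strategy Theorem (Bailey & López de Prado). The DSR is then the probabilistic Sharpe ratio of the selected trial measured against that SR₀, adjusted for the skew and kurtosis of its returns. The function names are hypothetical, and Sharpe ratios are per observation, not annualised. `var_sr`, the variance of Sharpe ratios across trials, must be supplied by the caller.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, var_sr: float) -> float:
    """False Strategy Theorem: expected maximum Sharpe of n_trials (>= 2) skill-less trials."""
    z1 = STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    z2 = STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * e))
    return sqrt(var_sr) * ((1.0 - EULER_GAMMA) * z1 + EULER_GAMMA * z2)


def deflated_sharpe(sr: float, sr0: float, n_obs: int,
                    skew: float = 0.0, kurt: float = 3.0) -> float:
    """Probability that the true Sharpe exceeds sr0, given n_obs returns.

    kurt is raw (non-excess) kurtosis, so 3.0 corresponds to normal returns.
    """
    denom = sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    return STD_NORMAL.cdf((sr - sr0) * sqrt(n_obs - 1) / denom)


# Hypothetical inputs: the reported best trial and SR0, ~6,000 near-normal observations
print(round(deflated_sharpe(0.063, 0.101, 6000), 4))
```

As a consistency check, assume roughly 6,000 bar-level observations and near-normal returns. Then (0.063 − 0.101)·√5999 ≈ −2.94, and Φ(−2.94) ≈ 0.0016, close to the reported DSR. The study's actual observation count and higher moments are not stated here, so treat this as an illustration only. Whenever the best Sharpe sits below SR₀, the DSR is under 0.5 regardless of sample size.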
which family carries the (faint) tilt, by market
| market | family | net SR | share of trials > 0 | sign-test p (two-sided) |
|---|---|---|---|---|
| crypto | breaks (SADF) | +0.012 | 0.75 | 0.146 |
| crypto | entropy | −0.008 | 0.44 | 0.62 |
| equities | breaks (SADF) | −0.046 | 0.00 | 0.016 |
| equities | entropy | −0.033 | 0.19 | 0.007 |
| forex | breaks (SADF) | +0.0004 | 0.50 | 1.00 |
| forex | entropy | +0.005 | 0.67 | 0.152 |
In crypto the faint positive tilt comes from SADF, not entropy. In forex it comes from entropy, not SADF. In equities both families are significantly negative net of costs (p=0.016 / 0.007). No family is significant in the right direction anywhere. Sampling on CUSUM events instead of on the bar clock changes OOS Sharpe by at most 0.003, a non-event.
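The event sampler can be sketched as a symmetric CUSUM filter. This minimal version assumes the input is a log-price series and that the threshold `h` is a fixed, hypothetical value; in practice `h` is often tied to recent volatility. An event fires when the cumulative up-move or down-move since the last reset exceeds `h`, and each decision uses only increments up to that bar.

```python
import numpy as np


def cusum_events(log_prices, h: float) -> list:
    """Indices t where the cumulative move since the last reset exceeds h."""
    x = np.asarray(log_prices, dtype=float)
    events, s_pos, s_neg = [], 0.0, 0.0
    for t in range(1, len(x)):
        diff = x[t] - x[t - 1]
        s_pos = max(0.0, s_pos + diff)  # running up-move, floored at zero
        s_neg = min(0.0, s_neg + diff)  # running down-move, capped at zero
        if s_neg < -h:
            s_neg = 0.0
            events.append(t)
        elif s_pos > h:
            s_pos = 0.0
            events.append(t)
    return events
```

Evaluating features only at these indices, instead of on every bar, is the CUSUM-versus-clock comparison reported above.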
Section 08 · microstructural features
Volatility, weakly yes; direction, no; and the data tier doesn't matter
Ch. 19's estimators read liquidity and toxicity from bar data alone: Roll and Corwin-Schultz spreads, Kyle, Amihud and Hasbrouck price impact, and the volume-clock VPIN, alongside order-flow imbalance and the tick rule. We compute them causally on ≈8 bars/day across 24 instruments. Availability differs by market, and we report it honestly. Crypto has true side volume, equities use a BVC proxy, and forex (no volume) gets price-only estimators.
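Three of these estimators reduce to a few lines each. This sketch computes them over a single window of bars; the study's rolling-window lengths are not given here.

- Roll's spread is 2√(−cov(Δp_t, Δp_{t−1})), set to zero when the covariance is positive.
- Amihud's λ is the mean of |r| / dollar volume.
- Kyle's λ is the OLS slope of price change on tick-rule-signed volume.

The tick rule's initial sign and all function names are assumptions, and Hasbrouck's λ is omitted.

```python
import numpy as np


def tick_rule(prices) -> np.ndarray:
    """Sign of each price change, carrying the last sign forward on zero changes."""
    dp = np.diff(np.asarray(prices, dtype=float))
    signs = np.empty(len(dp))
    last = 1.0  # assumption: an initial unchanged bar counts as a buy
    for i, d in enumerate(dp):
        if d != 0:
            last = float(np.sign(d))
        signs[i] = last
    return signs


def roll_spread(prices) -> float:
    """Roll (1984) effective spread from serial covariance of price changes."""
    dp = np.diff(np.asarray(prices, dtype=float))
    cov = np.cov(dp[1:], dp[:-1])[0, 1]
    return float(2.0 * np.sqrt(max(0.0, -cov)))


def amihud_lambda(returns, dollar_volume) -> float:
    """Mean absolute return per unit of dollar volume."""
    return float(np.mean(np.abs(returns) / np.asarray(dollar_volume, dtype=float)))


def kyle_lambda(prices, volume) -> float:
    """OLS slope (with intercept) of price change on tick-rule-signed volume."""
    prices = np.asarray(prices, dtype=float)
    dp = np.diff(prices)
    signed_volume = tick_rule(prices) * np.asarray(volume, dtype=float)[1:]
    slope, _intercept = np.polyfit(signed_volume, dp, 1)
    return float(slope)
```

With a pure bid-ask bounce of ±0.5 around a fixed mid, `roll_spread` recovers a spread close to 1.0.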
The cleanest result is the data-tier ablation. On identical crypto bars, the free BVC proxy matches or beats true order-flow data for next-bar volatility. The predictive content lives in the price/volume dynamics that BVC reconstructs. A research desk therefore does not need expensive trade-side data to extract the (modest) microstructure vol signal.
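BVC needs only prices and bar volume. The buy share of each bar's volume is Φ(Δp/σ_Δp), the standard normal CDF of the standardised price change. The sketch below estimates σ from the previous `window` increments only, which is a causal choice and an assumption. From the result it computes a VPIN-style toxicity ratio, Σ|V_buy − V_sell| / ΣV over the last `n_buckets` bars. Treating bars as volume buckets is a simplification, since VPIN proper is defined on equal-volume buckets. Names and parameters are hypothetical.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bvc_buy_volume(prices, volume, window: int = 50) -> np.ndarray:
    """Bulk-volume classification: estimated buyer-initiated volume per bar."""
    prices = np.asarray(prices, dtype=float)
    volume = np.asarray(volume, dtype=float)
    dp = np.diff(prices, prepend=np.nan)  # dp[t] = p[t] - p[t-1]; dp[0] undefined
    buy = np.full(len(prices), np.nan)
    for t in range(window + 1, len(prices)):
        sigma = np.std(dp[t - window:t], ddof=1)  # past increments only
        z = dp[t] / sigma if sigma > 0 else 0.0
        buy[t] = volume[t] * STD_NORMAL.cdf(z)
    return buy


def vpin(buy_volume, volume, n_buckets: int = 50) -> np.ndarray:
    """Sum of |V_buy - V_sell| over sum of V across the trailing n_buckets bars."""
    buy_volume = np.asarray(buy_volume, dtype=float)
    volume = np.asarray(volume, dtype=float)
    imbalance = np.abs(2.0 * buy_volume - volume)  # |V_buy - (V - V_buy)|
    out = np.full(len(volume), np.nan)
    for t in range(n_buckets - 1, len(volume)):
        span = slice(t - n_buckets + 1, t + 1)
        out[t] = imbalance[span].sum() / volume[span].sum()
    return out
```

Feeding `bvc_buy_volume` output into `vpin` yields NaN until both warm-up windows have passed. That is the price of keeping every value causal.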
predictive content by market, volatility weakly yes, direction no
| market | instruments | n | direction AUC | volatility AUC | net SR | net bp/bar |
|---|---|---|---|---|---|---|
| crypto | 10 | 8,709 | 0.510 | 0.538 | −0.018 | −2.72 |
| equity | 7 | 42,397 | 0.513 | 0.551 | −0.355 | −33.45 |
| forex | 7 | 8,694 | 0.507 | 0.539 | −0.016 | −0.26 |
The costed long/short on the direction signal is net-negative in 22 of 24 instruments. Equities are worst (−33.45 bp/bar) because a min-ticket commission dominates a 1-unit book; crypto and forex sit just below zero. Vol-AUC, not direction-AUC, is the only place any signal lives.
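Why a minimum-ticket commission swamps a one-unit book is plain arithmetic. The sketch below uses hypothetical inputs, not the study's cost schedule: a $400 share, a 0.5 bp half-spread, and $0.005 per share with a $1.00 minimum per order. At one share, the $1.00 minimum alone is 25 bp per side. At 1,000 shares, the same fee schedule costs 0.125 bp per side.

```python
def round_trip_cost_bp(price: float, qty: int, half_spread_bp: float,
                       fee_per_share: float, min_ticket: float) -> float:
    """Round-trip cost in basis points of notional: half-spread plus commission, each side."""
    notional = price * qty
    commission = max(fee_per_share * qty, min_ticket)  # per order, floored at the minimum
    commission_bp = commission / notional * 1e4
    return 2.0 * (half_spread_bp + commission_bp)


# Hypothetical fee schedule, not the study's
one_share = round_trip_cost_bp(400.0, 1, 0.5, 0.005, 1.00)
thousand = round_trip_cost_bp(400.0, 1000, 0.5, 0.005, 1.00)
print(f"{one_share:.2f} bp vs {thousand:.2f} bp")  # 51.00 bp vs 1.25 bp
```

The minimum ticket, not the spread, sets the cost floor at small notional, which is why the equity line is so much worse than crypto and forex.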
deflated-Sharpe gate
| scope | trials | best SR | SR₀ | DSR | PBO |
|---|---|---|---|---|---|
| all | 144 | 0.070 | 0.535 | 0.00 | 0.10 |
| crypto | 60 | 0.070 | 0.057 | 0.89 | 0.36 |
| equity | 42 | −0.032 | 0.474 | 0.00 | 0.02 |
| forex | 42 | 0.053 | 0.054 | 0.45 | 0.13 |
data-tier ablation, next-bar vol AUC
| data tier | next-bar vol AUC |
|---|---|
| true side volume (what crypto actually has) | 0.522 |
| BVC proxy (the equity tier: price + volume, no side) | 0.547 |
| tick-only (the forex tier: price only) | 0.526 |
The free BVC proxy (the equity tier) beats true side-volume for next-bar volatility. The order-flow data tier is not the binding constraint.
Section 08 · stationary vs raw order flow
Does presenting the flow as a stationary input help? Here, no.
A separate strand of the order-flow literature (Kolm, Turiel and Westray) argues that for flow-driven prediction the stationarity of the input matters more than the model: feed a non-stationary level into a learner and it should degrade out of sample, while the same information as a stationary transform should predict. We test that head-to-head on the same crypto bars, same purged cross-validation and same cost model. The raw input is the cumulative signed buyer-minus-seller flow (a trending, non-stationary level) plus its one-bar lag. The stationary input is the per-bar order-flow imbalance (OFI) plus a fractional differentiation of that level at the minimal d that keeps the series stationary while retaining memory.
The honest result is a null. Pooled across six instruments, the direction AUC is ≈0.50 for both inputs (raw 0.499, stationary 0.496). That is a gap of −0.002, so the stationary input is, if anything, a hair worse. A short recent slice looked better for the stationary inputs, but the edge did not survive extension to the full sample, which marks it as a small-sample artifact. The transform is doing its job: the fractionally differenced series is stationary at d≈0.5 with memory retained. There is simply no bar-level directional signal for it to expose. This is consistent with the rest of the study, where only volatility carries content and direction sits at the coin flip. A stationary input cannot manufacture an edge that is not in the data. It only helps when a real signal is masked by non-stationarity, which is not the regime here.
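The stationary input relies on fixed-width-window fractional differentiation. This minimal sketch uses the weight recursion w_k = −w_{k−1}(d − k + 1)/k, truncated once |w_k| falls below a hypothetical threshold. Each output uses only the current and past values of the series, so it stays causal. Choosing the minimal d would mean scanning a grid of d values and keeping the smallest whose output passes a unit-root test such as ADF. That search is omitted here, and the function names are illustrative.

```python
import numpy as np


def ffd_weights(d: float, threshold: float = 1e-4, max_width: int = 10_000) -> np.ndarray:
    """Weights w_0..w_K of (1 - B)^d, truncated when |w_k| < threshold."""
    w = [1.0]
    for k in range(1, max_width):
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        w.append(w_k)
    return np.array(w)


def frac_diff_ffd(x, d: float, threshold: float = 1e-4) -> np.ndarray:
    """Fixed-width fractional difference; NaN until a full window is available."""
    x = np.asarray(x, dtype=float)
    w = ffd_weights(d, threshold)
    width = len(w)
    out = np.full(len(x), np.nan)
    for t in range(width - 1, len(x)):
        # w[0] multiplies x[t], w[1] multiplies x[t-1], and so on
        out[t] = w @ x[t - width + 1:t + 1][::-1]
    return out
```

At d = 1 the weights are [1, −1] and the output is the ordinary first difference. At d = 0 it is the series itself. Values in between, such as the d≈0.5 used here, keep a long, slowly decaying memory of past levels.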
direction AUC: raw level vs stationary (OFI + fractional differentiation)
| instrument | d used | raw AUC | stationary AUC | gap (stationary − raw) |
|---|---|---|---|---|
| BTC | 0.40 | 0.495 | 0.492 | −0.003 |
| ETH | 0.55 | 0.494 | 0.503 | +0.009 |
| SOL | 0.45 | 0.502 | 0.492 | −0.010 |
| BNB | 0.55 | 0.490 | 0.503 | +0.013 |
| XRP | 0.40 | 0.505 | 0.500 | −0.005 |
| LTC | 0.55 | 0.504 | 0.487 | −0.017 |
| pooled | 0.50 | 0.499 | 0.496 | −0.002 |
Pooled, the Deflated Sharpe Ratio of the best costed fold is 0.017 for raw and 0.000 for stationary against 36 trials; neither clears the bar. The stationarity transform works as intended. The bar-level direction it is asked to predict simply has no signal.
How the test is kept honest
Multi-market, causal-only, realistically costed, purged-CV and gated on the Deflated Sharpe Ratio: the same discipline both studies share.
Method
- Causal only. Every feature value at bar t uses only bars ≤ t; every threshold and rule orientation is fit inside the training fold; label horizons are purged and embargoed. No figure or statistic uses forward information.
- Information-driven bars, as López de Prado recommends. Dollar bars for crypto and equities, tick bars for forex; ~6,000 bars per instrument for structural breaks and ≈8 bars per day for microstructure.
- Realistic costs, no clamping. Equities use a time-of-day half-spread schedule plus a min-ticket commission. Forex uses a per-pair half-spread in pips with a UTC time-of-day multiplier, and crypto uses a house default. These costs are exactly what flip the equity feature families significantly negative; a costless test would report a false positive.
- Purged k-fold CV with embargo so the multi-bar label window never straddles train and test.
- Deflated Sharpe + PBO headline. The False Strategy Theorem supplies the expected maximum of a skill-less trial family; a result is only real if it clears that bar after deflation for trial count, dispersion, skew and kurtosis.
- Verified kernels. Every hot estimator has a pure-reference and an accelerated implementation, checked bit-identical (max|Δ| = 0 for CUSUM, LZ76, Kontoyiannis and the microstructure rolling block; ~1e-13 OLS round-off for SADF).
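Purging and embargo can be sketched as index bookkeeping. This is a simplified version of the idea. It assumes contiguous, time-ordered folds and a hypothetical `label_end` array that gives, for each observation, the last bar its label looks at. Training observations whose label window reaches into the test span are purged. Observations that start before the test labels end, plus a buffer of a fixed fraction of the sample, are embargoed.

```python
import numpy as np


def purged_kfold(label_end, n_splits: int = 5, embargo_pct: float = 0.01):
    """Yield (train_idx, test_idx) with purging and a post-test embargo."""
    label_end = np.asarray(label_end)
    n = len(label_end)
    idx = np.arange(n)
    embargo = int(n * embargo_pct)
    for test in np.array_split(idx, n_splits):
        t0, t1 = test[0], test[-1]
        last_label = label_end[test].max()
        train = np.ones(n, dtype=bool)
        train[test] = False
        # purge: earlier observations whose label window overlaps the test span
        train &= ~((idx < t0) & (label_end >= t0))
        # purge + embargo: later observations starting before the test labels end (+ buffer)
        train &= ~((idx > t1) & (idx <= last_label + embargo))
        yield idx[train], test
```

With 100 bars, 5-bar labels and a 2% embargo, the second fold tests bars 20–39. It trains on bars 0–14 and 47–99, so no training label overlaps a test label.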
Notes & limitations
Both studies find nothing tradeable, and that is itself the finding. The explosive-regime forward-return tilt is real, but it is mostly unconditional drift, not alpha. Crypto and equities trended up over the sample, and SADF>0 episodes cluster inside the up-legs. A long-biased conditional mean therefore looks large until you demand a long/short rule and pay costs. The microstructure direction result is null on bar data. The volatility result is modest, and it is the only place future work should aim, as a volatility-targeting overlay rather than a standalone long/short. Costs are realistic, not adversarial. The equity min-ticket commission punishes a tiny notional, but the qualitative verdict (direction AUC ≈ 0.50, DSR ≈ 0) is robust to that, because there is no gross edge to defend even before costs. Labels are short-horizon. A longer triple-barrier span could change the direction picture; that is left for follow-on work.
Reproducibility
The feature kernels, references, costed purged-CV harness and the deepening analyses are in the companion repository, project 07 and project 08 of lopez-de-prado-work-review (private during the review). The explorer on this page is self-contained: every number it shows is encoded in the component, so the two findings it illustrates reproduce exactly on every load.
References
The primary sources for the features reviewed here:
- López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapters 17 (Structural Breaks), 18 (Entropy Features) and 19 (Microstructural Features).
- Easley, D., López de Prado, M., & O'Hara, M. (2012). Flow Toxicity and Liquidity in a High-Frequency World (VPIN). Review of Financial Studies, 25(5).
- Roll, R. (1984). A Simple Implicit Measure of the Effective Bid-Ask Spread in an Efficient Market. Journal of Finance, 39(4).
- Kyle, A. S. (1985). Continuous Auctions and Insider Trading. Econometrica, 53(6).
- Amihud, Y. (2002). Illiquidity and Stock Returns: Cross-Section and Time-Series Effects. Journal of Financial Markets, 5(1).
- Corwin, S. A., & Schultz, P. (2012). A Simple Way to Estimate Bid-Ask Spreads from Daily High and Low Prices. Journal of Finance, 67(2).
- Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio. Journal of Portfolio Management, 40(5).
See also
The deflated-significance backbone these two studies share is built and stress-tested in Backtest Overfitting & the Deflated Sharpe Ratio. The selection-discipline theme is developed narratively in The edge is in the process, and the broader body of work is at Research.

