Research review · predictive features · studies 07 & 08

Predictive Features: Structural Breaks, Entropy & Microstructure

López de Prado's exotic features are real and computable, but under realistic costs and a deflated-significance gate, none of them is tradeable here, and the expensive data tier buys no extra predictive power

Two of López de Prado's feature chapters promise predictive content that ordinary momentum features cannot see: structural-break and entropy estimators (Ch. 17–18) that flag regime shifts and quantify how predictable a return string is, and microstructure estimators (Ch. 19) that read who is trading and at what cost. This review implements both families across crypto, US-equity ETFs and forex, keeps every feature strictly causal, charges realistic time-of-day costs, cross-validates with purging and embargo, and gates the headline on the Deflated Sharpe Ratio. The verdict is honest and the same in both: the features compute cleanly and one descriptive regime effect is statistically strong, yet none survives as a tradeable, deflation-adjusted rule, and the expensive order-flow data tier buys no extra predictive power.

source

Advances in Financial ML, Ch. 17–19

López de Prado (2018) · structural breaks, entropy & microstructure

Structural breaks & entropy

the claim

What López de Prado says

Two feature families: structural-break detectors (the CUSUM event sampler and the SADF explosiveness/bubble statistic) and entropy estimators (Shannon, Lempel-Ziv, Kontoyiannis) that quantify the information content of the return string. He presents these as features, not standalone strategies: summaries of regime and predictability that a classifier can use.

our result

What we found

As standalone, costed, purged-CV signals they carry essentially no overfitting-adjusted edge: across 108 trials the best feature deflates to DSR ≈ 0 (PBO 0.36). There is one clean descriptive effect: explosive (SADF>0) regimes precede higher forward returns in crypto (+21 bp) and equities (+29 bp). But it is mostly unconditional drift and does not survive as a tradeable, deflated rule (best regime-rule DSR 0.46).

Microstructural features

the claim

What López de Prado says

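Ch. 19's microstructure estimators, all computable from bar data alone and spanning the book's three generations: price-sequence spread estimators (Roll, Corwin-Schultz), strategic-trade impact measures (Kyle's λ, Amihud, Hasbrouck) and the sequential-trade toxicity measure VPIN. The premise is that these summarise who is trading and at what cost, something a return/momentum feature set cannot see. They should therefore add predictive content for short-horizon moves and volatility.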

our result

What we found

Volatility yes, direction no: next-bar volatility AUC is consistently above 0.50 (~0.54) in all three markets, but direction sits at the coin flip (~0.51). These features help size and time risk, not pick sides. The costed long/short is net-negative in 22 of 24 instruments and the all-scope DSR is 0.00. A bonus result not in the book: a free Bulk-Volume-Classification proxy matches or beats true trade-side data for the vol task, so the expensive order-flow tier is not the binding constraint.

The result in three lines

27 instruments · 3 asset classes

The features are real and computable

CUSUM, SADF explosiveness, Shannon / Lempel-Ziv / Kontoyiannis entropy, and the Roll / Kyle / Amihud / Hasbrouck / VPIN microstructure estimators are all built strictly causally on information-driven bars across crypto, US-equity ETFs and forex. Reference and accelerated kernels agree bit-for-bit (max|Δ| = 0), except for ~1e-13 OLS round-off in SADF.

best DSR ≈ 0.002 · ≈ 0.00

Nothing clears the deflated bar

As standalone signals the structural-break/entropy features return a best Deflated Sharpe Ratio of ≈0.002 across 108 costed purged-CV trials. The microstructure direction signal deflates to 0.00 across 144 trials. Nothing clears the 0.95 significance bar in either study. The closest, a crypto-only 0.89, holds only against a narrow same-scope benchmark.

BVC ≈ true flow

The expensive data tier buys nothing

On identical crypto bars, the free Bulk-Volume-Classification proxy matches or beats true buyer/seller volume for next-bar volatility (0.547 vs 0.522 AUC). Tick-only (0.526) is within 0.004 of true volume. You need price + volume, not order-flow data; the data tier is not the binding constraint.

Explore: the two findings, interactive

Switch the data tier; reveal the cost-and-deflation verdict

The explorer encodes the two studies' real numbers. Toggle the microstructure data tier and watch the next-bar volatility AUC barely move while direction stays a coin flip (Project 08's ablation), then reveal how the strong explosive-regime forward-return tilt (Project 07) collapses under realistic costs and deflation.

Demo: does the expensive data tier buy predictive power?

On identical crypto bars, the four order-flow estimators are recomputed three ways and scored on the next bar. Switch the data tier and watch the volatility signal barely move, and the direction signal stay a coin flip. Then reveal the cost-and-deflation verdict on the regime tilt.

Data tier (selected: true side-volume)

next-bar VOL, OOS AUC	0.522
next-bar DIRECTION, OOS AUC	0.507
vol-AUC gap vs the free BVC proxy	−0.025

The four side-volume estimators (Kyle, Hasbrouck, VPIN, OFI) are computed on identical crypto bars. For next-bar volatility, the free BVC proxy (0.547) beats the expensive true side-volume tier (0.522), and tick-only (0.526) is within 0.004 of true side-volume. For direction, all three tiers sit at the coin flip (≈0.507–0.512). The order-flow data tier is not the binding constraint; price + volume is enough.

Explosive regimes precede higher forward returns, but are they tradeable?

Descriptive and out-of-sample, with thresholds fit in-fold (no peeking). Explosive (SADF>0) bars precede a +29.2 bp forward return in equities (p<1e-6) and +21.4 bp in crypto (p=7.5e-5), while non-explosive crypto bars are actually negative. Forex shows nothing.

Section 07 · structural breaks & entropy

Explosive regimes really do precede higher returns; it just isn't alpha

Fig. 1: Conditional forward return by causal regime (out-of-sample, 95% CI). Explosive (SADF>0) regimes precede materially higher forward returns in equities (+29.2 bp, p<1e-6) and crypto (+21.4 bp, p=7.5e-5). Non-explosive crypto bars are actually negative (−3.8 bp), and forex separates nothing. Thresholds are fit in-fold on training data only, so there is no lookahead.

Ch. 17–18 propose two feature families. The first is structural breaks: the CUSUM event sampler and the SADF explosiveness/bubble statistic. The second is entropy: the Shannon plug-in estimator, Lempel-Ziv (LZ76) complexity and the Kontoyiannis match-length rate. We compute all of them strictly causally over backward windows. Each instrument has ~6,000 matched information-driven bars (dollar bars for crypto and equities, tick bars for forex), across 27 instruments in three asset classes.
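The CUSUM event sampler can be illustrated with the symmetric CUSUM filter López de Prado describes. It accumulates upward and downward log-return excursions separately, emits an event when either sum crosses a threshold `h`, then resets that sum. This is a minimal sketch under stated assumptions. The function name `cusum_events` and the fixed `h` are illustrative; the study's threshold rule (for example, a volatility-scaled `h`) is not stated here.

```python
import math


def cusum_events(prices, h):
    """Symmetric CUSUM filter (illustrative sketch, fixed threshold h).

    Returns the indices i at which the cumulative log-return drift since
    the last reset exceeds +h or falls below -h. Step i reads only
    prices[:i + 1], so the sampler is causal.
    """
    events = []
    s_pos = 0.0  # running sum of upward excursions, floored at 0
    s_neg = 0.0  # running sum of downward excursions, capped at 0
    for i in range(1, len(prices)):
        r = math.log(prices[i] / prices[i - 1])
        s_pos = max(0.0, s_pos + r)
        s_neg = min(0.0, s_neg + r)
        if s_pos > h:
            events.append(i)
            s_pos = 0.0
        elif s_neg < -h:
            events.append(i)
            s_neg = 0.0
    return events
```

For example, `cusum_events([100, 101, 102, 101, 99, 98, 100], 0.015)` fires at bars 2, 4 and 6. Sampling on these indices instead of every bar is what the text calls CUSUM-event rather than clock sampling.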

Conditioning forward returns on causal regimes surfaces one robust descriptive effect: explosive (SADF>0) bars precede a large positive forward move in crypto and equities. But it is mostly unconditional drift rather than a directional edge, because SADF>0 episodes cluster inside trending up-legs. The trial statistics below make this precise.
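A causal SADF value at bar t can be sketched as the supremum of ADF t-statistics over backward-expanding windows that all end at t. The SADF>0 regime flag in the text is then a threshold on that value. This is a minimal sketch that assumes an ADF regression with a constant and `lags` lagged differences. The study's lag order, minimum window length and any trend term are not stated, so `min_len` and `lags` are placeholders. Evaluated at every bar, it runs one regression per start point, so a full panel is quadratic in the series length.

```python
import numpy as np


def adf_tstat(y, lags=1):
    """t-statistic on beta in
    dy_t = a + beta * y_{t-1} + sum_{l=1..lags} g_l * dy_{t-l} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[lags:]                          # dependent variable dy_t
    cols = [np.ones_like(target), y[lags:-1]]   # constant and lagged level
    for l in range(1, lags + 1):
        cols.append(dy[lags - l:-l])            # lagged differences dy_{t-l}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    s2 = resid @ resid / (len(target) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])


def sadf(log_prices, t, min_len=20, lags=1):
    """Sup-ADF at bar t over windows [t0, t], t0 = 0 .. t - min_len + 1.
    Reads only log_prices[:t + 1], so the value is causal."""
    y = np.asarray(log_prices[: t + 1], dtype=float)
    last_start = len(y) - min_len
    return max(adf_tstat(y[t0:], lags) for t0 in range(0, last_start + 1))
```

A regime label in the spirit of the text would be `sadf(log_p, t) > 0` for each bar t, with the threshold refit inside each training fold.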

metric	value	note
trials (instr × feature)	108	27 instr × 4 features
best OOS Sharpe	0.063	forex USDJPY · Shannon
E[max] null (SR₀)	0.101	skill-less 108-trial family
Deflated Sharpe	0.0016	bar is 0.95
PBO (CSCV)	0.357	corpus
% trials positive	45.4%	range [−0.171, +0.063]

As standalone signals these features sit in the noise band. The best of 108 costed purged-CV trials (forex USDJPY Shannon, Sharpe 0.063) is below the expected maximum of a skill-less 108-trial family (SR₀ 0.101), so its Deflated Sharpe collapses to ≈0.002. Turning the explosive regime into a directional rule and deflating across the 81-rule family gives DSR 0.46, well short of the 0.95 bar. The descriptive tilt does not monetise.
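The deflation step can be sketched from the False Strategy Theorem and the Deflated Sharpe Ratio of Bailey & López de Prado (2014). SR₀ is the expected maximum Sharpe of `n_trials` skill-less trials, given the cross-trial variance of their Sharpes. The DSR is the probability that the observed Sharpe exceeds SR₀, adjusted for sample length, skew and (non-excess) kurtosis. This is a minimal sketch using only the Python standard library. The observation count and return moments in the usage line are illustrative assumptions, not the study's inputs.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_N = NormalDist()


def expected_max_sharpe(n_trials, var_sr):
    """False Strategy Theorem: E[max SR] across n_trials skill-less trials
    whose Sharpe ratios have cross-sectional variance var_sr."""
    return math.sqrt(var_sr) * (
        (1 - EULER_GAMMA) * _N.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _N.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr, sr0, n_obs, skew=0.0, kurt=3.0):
    """P(true SR > sr0) for an observed per-period Sharpe sr over n_obs
    returns; kurt is raw kurtosis (3 for normal returns)."""
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return _N.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)
```

Assuming roughly 6,000 out-of-sample bars and normal returns (both assumptions here), `deflated_sharpe(0.063, 0.101, 6000)` is about 0.0016. That is the same order as the reported value: a best Sharpe below SR₀ cannot deflate to anything meaningful.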

which family carries the (faint) tilt, by market

market	family	SR	frac>0	sign-p
crypto	breaks (SADF)	+0.012	0.75	0.146
crypto	entropy	−0.008	0.44	0.62
equities	breaks (SADF)	−0.046	0.00	0.016
equities	entropy	−0.033	0.19	0.007
forex	breaks (SADF)	+0.0004	0.50	1.00
forex	entropy	+0.005	0.67	0.152

In crypto the faint positive tilt comes from SADF, not entropy; in forex it comes from entropy, not SADF. In equities both families are significantly negative net of costs (p=0.016 / 0.007). No family is significant in the right direction anywhere. Sampling on CUSUM events rather than on the clock changes OOS Sharpe by at most 0.003, a non-event.
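The sign-p column is not defined on the page. We assume here that it is a two-sided exact binomial sign test on the signs of the per-trial Sharpes, under H0: P(SR > 0) = 0.5. Under that assumption, 0 positive trials out of 7 gives p ≈ 0.016 and 9 of 12 gives p ≈ 0.146. Those match the equities and crypto SADF rows if those rows hold 7 and 12 trials. The trial counts per row are an inference, not stated in the source.

```python
from math import comb


def sign_test_p(n_pos, n):
    """Two-sided exact binomial sign test (assumed definition of sign-p):
    probability, under a fair coin, of a split at least as lopsided as
    n_pos positives out of n."""
    k = min(n_pos, n - n_pos)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With this definition a perfectly balanced split, such as 4 of 8, returns 1.0, consistent with the forex SADF row.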

Fig. 2: Backward-only feature panel (BTC dollar bars): price, with the SADF explosiveness statistic and rolling entropy beneath. Every value at bar t uses only bars ≤ t, so the features describe the regime they sit in, with no lookahead.
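A backward-only entropy panel like the one in Fig. 2 can be sketched by encoding returns as a binary string and scoring each trailing window with the three estimators named in the text. The estimators are:

- the Shannon plug-in rate over short words;
- the Kaspar-Schuster LZ76 phrase count, normalised by log2(n)/n;
- a fixed-window Kontoyiannis match-length estimate.

The following are all illustrative assumptions: the sign encoding, the 64-bar window, the word length of 2 and the `window // 4` match window. López de Prado also discusses quantile and sigma encodings.

```python
import math
from collections import Counter


def encode_signs(returns):
    """Binary encoding (assumption): '1' for an up bar, '0' otherwise."""
    return "".join("1" if r > 0 else "0" for r in returns)


def shannon_plugin(msg, word=1):
    """Plug-in entropy rate (bits/symbol) from empirical word frequencies."""
    counts = Counter(msg[i:i + word] for i in range(len(msg) - word + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values()) / word


def lz76_complexity(s):
    """Kaspar-Schuster count of phrases in the LZ76 parsing of s."""
    n = len(s)
    if n < 2:
        return n
    i, k, l, c, k_max = 0, 1, 1, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:  # no earlier copy found: close the current phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def match_length(msg, i, n):
    """1 + length of the longest prefix of msg[i:] that starts in msg[i-n:i]."""
    best = 0
    for length in range(1, n + 1):
        target = msg[i:i + length]
        if any(msg[j:j + length] == target for j in range(i - n, i)):
            best = length
        else:
            break  # a longer prefix cannot match if this one does not
    return best + 1


def kontoyiannis(msg, window):
    """Fixed-window Kontoyiannis entropy estimate (bits per symbol)."""
    points = range(window, len(msg) - window + 1)
    vals = [math.log2(window + 1) / match_length(msg, i, window) for i in points]
    return sum(vals) / len(vals)


def rolling_entropy(returns, window=64):
    """Causal features keyed by bar t; each uses returns[t-window+1 : t+1]."""
    bits = encode_signs(returns)
    out = {}
    for t in range(window - 1, len(bits)):
        w = bits[t - window + 1: t + 1]
        out[t] = (
            shannon_plugin(w, word=2),
            lz76_complexity(w) * math.log2(window) / window,  # normalised LZ76
            kontoyiannis(w, window // 4),
        )
    return out
```

A constant string scores low on all three estimators, and a coin-flip sequence scores high. The features describe how compressible the recent return string is; they do not say which way it will move.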

Section 08 · microstructural features

Volatility, weakly yes; direction, no; and the data tier doesn't matter

Fig. 3: Data-tier ablation on identical crypto bars (the only market with true buyer/seller volume), with the four side-volume estimators recomputed three ways. For next-bar volatility, the free Bulk-Volume-Classification proxy (0.547 AUC) matches or beats true side-volume (0.522) on 9 of 10 instruments. Tick-only (0.526) is within 0.004 of true side-volume. The expensive order-flow tier buys nothing.

Ch. 19's estimators read liquidity and toxicity from bar data alone, in the book's three generations:

- first generation: Roll and Corwin-Schultz spreads;
- second generation: Kyle, Amihud and Hasbrouck price-impact lambdas;
- third generation: the volume-clock VPIN.

We also compute order-flow imbalance and the tick rule. All are computed causally on ≈8 bars/day across 24 instruments, with honest per-market availability: crypto has true side volume, equities use a BVC proxy, and forex (no volume) gets price-only estimators.
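Three of these estimators can be sketched in a few lines from per-bar price and volume arrays:

- Roll's spread: twice the square root of minus the serial covariance of price changes;
- Amihud's measure: |return| per dollar traded;
- Kyle's λ: the slope of price change on tick-rule-signed volume.

This is a minimal sketch under stated assumptions. It uses a no-intercept regression for Kyle's λ and a seed sign of +1 for a leading zero price change. It computes each estimator once over the whole array, whereas the study would use a backward rolling window. Corwin-Schultz and Hasbrouck are omitted for brevity.

```python
import numpy as np


def tick_rule(prices, first_sign=1.0):
    """b_t = sign(dp_t), carrying the previous sign forward when dp_t == 0."""
    dp = np.diff(np.asarray(prices, dtype=float))
    signs = np.empty(len(dp))
    last = first_sign  # assumption: sign used before any nonzero change
    for i, d in enumerate(dp):
        if d != 0:
            last = np.sign(d)
        signs[i] = last
    return signs


def roll_spread(prices):
    """Roll (1984): 2 * sqrt(-cov(dp_t, dp_{t-1})), floored at zero."""
    dp = np.diff(np.asarray(prices, dtype=float))
    c = np.cov(dp[1:], dp[:-1])[0, 1]
    return 2.0 * np.sqrt(max(0.0, -c))


def amihud_illiquidity(prices, dollar_volume):
    """Mean |log return| per unit of dollar volume in the same bar."""
    r = np.abs(np.diff(np.log(np.asarray(prices, dtype=float))))
    return float(np.mean(r / np.asarray(dollar_volume, dtype=float)[1:]))


def kyle_lambda(prices, volume):
    """No-intercept OLS slope of dp_t on signed volume b_t * V_t."""
    dp = np.diff(np.asarray(prices, dtype=float))
    signed = tick_rule(prices) * np.asarray(volume, dtype=float)[1:]
    return float(signed @ dp / (signed @ signed))
```

A pure bid-ask bounce yields a positive Roll spread, and a steady trend yields zero. A price path built as 0.01 × signed volume returns λ = 0.01.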

The cleanest result is the data-tier ablation (Fig. 3). On identical crypto bars the free BVC proxy matches or beats true order-flow data for next-bar volatility. The predictive content lives in the price/volume dynamics that BVC reconstructs. A research desk therefore does not need expensive trade-side data to extract the (modest) microstructure vol signal.
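The BVC proxy and a bar-based VPIN can be sketched as follows. BVC assigns each bar's volume a buy share of Φ(Δp/σ), with σ taken from a trailing window of price changes. VPIN averages |V_buy − V_sell| / V over recent buckets. This is a minimal sketch, assuming the following illustrative choices:

- a normal CDF (Easley et al. also consider fatter-tailed distributions);
- a 50-bar σ window;
- each bar treated as one volume bucket, which only approximates VPIN's equal-volume buckets.

```python
import numpy as np
from statistics import NormalDist

_PHI = NormalDist().cdf


def bvc_buy_fraction(prices, sigma_window=50):
    """Bulk Volume Classification: share of each bar's volume classed as
    buys, Phi(dp_t / sigma_t), with sigma_t from the trailing changes."""
    dp = np.diff(np.asarray(prices, dtype=float))
    frac = np.full(len(dp), 0.5)  # no usable sigma yet: split 50/50
    for t in range(len(dp)):
        past = dp[max(0, t - sigma_window + 1): t + 1]  # bars <= t only
        sigma = past.std(ddof=1) if len(past) > 1 else 0.0
        if sigma > 0:
            frac[t] = _PHI(dp[t] / sigma)
    return frac


def vpin(volume, buy_frac, n_buckets=50):
    """Rolling |V_buy - V_sell| / V over the last n_buckets bars."""
    v = np.asarray(volume, dtype=float)[1:]  # aligned with price changes
    imbalance = np.abs(2.0 * v * buy_frac - v)  # |V_buy - V_sell|
    out = np.full(len(v), np.nan)
    for t in range(n_buckets - 1, len(v)):
        window = slice(t - n_buckets + 1, t + 1)
        out[t] = imbalance[window].sum() / v[window].sum()
    return out
```

Because BVC needs only price changes and total volume, it is available wherever bars carry volume. That is why it serves as the equity tier in the ablation.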

predictive content by market: volatility weakly yes, direction no

Fig. 4: Out-of-sample AUC by market (purged CV). Next-bar volatility AUC is consistently above 0.50 (≈0.538–0.551, tight error bars), so microstructure features carry real short-horizon volatility information. Next-bar direction sits at the coin flip (≈0.507–0.513) in every market.
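An AUC of 0.50 is the coin flip referred to throughout. For readers reproducing a score like 0.538 from out-of-sample predictions, the rank (Mann-Whitney) form below is one standard way to compute it. How the study binarises next-bar volatility is not stated (for example, above vs below a training-fold median). The labels here are therefore assumed to be 0/1 already.

```python
def auc(scores, labels):
    """ROC AUC as the Mann-Whitney U statistic; tied scores share their
    average rank. labels must be 0/1 with both classes present."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

The value is the probability that a randomly chosen positive bar is scored above a randomly chosen negative one. An AUC of 0.54 means that ordering is right 54% of the time.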

market	instr	n	dir AUC	vol AUC	net SR	bp/bar
crypto	10	8,709	0.510	0.538	−0.018	−2.72
equity	7	42,397	0.513	0.551	−0.355	−33.45
forex	7	8,694	0.507	0.539	−0.016	−0.26

The costed long/short on the direction signal is net-negative in 22 of 24 instruments. Equities are worst (−33.45 bp/bar) because a min-ticket commission dominates a 1-unit book; crypto and forex sit just below zero. Vol-AUC, not direction-AUC, is the only place any signal lives.
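To see why a minimum ticket dominates a 1-unit book, here is a hypothetical round-trip cost calculation. Each side pays a time-of-day-scaled half-spread plus a commission floored at a minimum ticket. The $500 notional, 1 bp half-spread, $0.005/share fee and $1 minimum ticket are illustrative numbers only, not the study's cost schedule.

```python
def round_trip_cost_bp(notional, half_spread_bp, shares, per_share_fee,
                       min_ticket, tod_multiplier=1.0):
    """Hypothetical cost of opening and closing one position, in bp of
    notional: per side, the scaled half-spread plus a commission that is
    never below min_ticket."""
    commission = max(min_ticket, per_share_fee * shares)
    per_side_bp = half_spread_bp * tod_multiplier + commission / notional * 1e4
    return 2.0 * per_side_bp
```

With these numbers, `round_trip_cost_bp(500.0, 1.0, 1, 0.005, 1.0)` is 42 bp, of which 40 bp is the minimum ticket. Scaling to a $1M position of 2,000 shares cuts it to 2.2 bp. The floor, not the spread, drives the small-book cost.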

Fig. 5: Per-instrument costed Sharpe and the Deflated Sharpe gate. Across all 144 trials the best deflates to 0.00: the best direction signal is consistent with luck. The crypto-scope DSR of 0.89 arises only because it is deflated against a small same-scope benchmark (60 near-zero trials, SR₀ 0.057). It still misses the 0.95 bar and carries PBO 0.36. After realistic costs the direction signal is not tradeable.

deflated-Sharpe gate

scope	n	best SR	SR₀	DSR	PBO
all	144	0.070	0.535	0.00	0.10
crypto	60	0.070	0.057	0.89	0.36
equity	42	−0.032	0.474	0.00	0.02
forex	42	0.053	0.054	0.45	0.13
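The PBO column can be sketched with the combinatorially symmetric cross-validation (CSCV) of Bailey, Borwein, López de Prado and Zhu. The procedure runs as follows:

1. Split the T×N matrix of trial returns into S row blocks.
2. For every half/half split of the blocks, pick the trial with the best in-sample Sharpe.
3. Record the logit of that trial's relative out-of-sample rank.
4. Report PBO as the share of splits where that logit is ≤ 0, meaning the in-sample winner lands at or below the out-of-sample median.

This is a minimal sketch. `n_blocks=8` is an illustrative choice, and the study's block count is not stated.

```python
import math
from itertools import combinations

import numpy as np


def sharpe(x):
    """Per-column Sharpe ratio of a (rows, trials) return matrix."""
    return x.mean(axis=0) / x.std(ddof=1, axis=0)


def pbo_cscv(returns, n_blocks=8):
    """Probability of backtest overfitting via CSCV.
    returns: (T, N) array, one column of per-bar returns per trial."""
    T, N = returns.shape
    blocks = np.array_split(np.arange(T), n_blocks)
    logits = []
    for is_ids in combinations(range(n_blocks), n_blocks // 2):
        oos_ids = [b for b in range(n_blocks) if b not in is_ids]
        is_rows = np.concatenate([blocks[b] for b in is_ids])
        oos_rows = np.concatenate([blocks[b] for b in oos_ids])
        best = np.argmax(sharpe(returns[is_rows]))   # in-sample winner
        oos_sr = sharpe(returns[oos_rows])
        rank = (oos_sr < oos_sr[best]).sum() + 1     # 1 = worst, N = best
        omega = rank / (N + 1)
        logits.append(math.log(omega / (1 - omega)))
    return float(np.mean(np.array(logits) <= 0))
```

A trial with a genuine, persistent edge wins both halves in every split, so its PBO is 0. A family of pure-noise trials drifts towards 0.5.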

data-tier ablation, next-bar vol AUC

tier	what it stands for	vol AUC
TRUE side-volume	what crypto actually has	0.522
BVC proxy	the equity tier (price + volume, no side)	0.547
tick-only	the forex tier (price only)	0.526

The free BVC proxy (the equity tier) beats true side-volume for next-bar volatility. The order-flow data tier is not the binding constraint.

Section 08 · stationary vs raw order flow

Does presenting the flow as a stationary input help? Here, no.

Fig. 6: Order-flow inputs, stationary transform versus raw level (crypto, purged CV). Left: out-of-sample direction AUC straddles the 0.50 coin flip for both the raw level and the stationary OFI-plus-fractional-differencing transform, with a pooled gap of −0.002. Right: the costed long/short Sharpe is net-negative for both. The transform achieves stationarity with memory retained; there is no bar-level directional signal for it to surface.

A separate strand of the order-flow literature (Kolm, Turiel and Westray) argues that for flow-driven prediction the stationarity of the input matters more than the model: feed a non-stationary level into a learner and it should degrade out of sample, while the same information as a stationary transform should predict. We test that head-to-head on the same crypto bars, same purged cross-validation and same cost model. The raw input is the cumulative signed buyer-minus-seller flow (a trending, non-stationary level) plus its one-bar lag. The stationary input is the per-bar order-flow imbalance (OFI) plus a fractional differentiation of that level at the minimal d that keeps the series stationary while retaining memory.
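As one plausible implementation, here is a sketch of the fixed-width-window fractional differentiation (FFD) from AFML Ch. 5. The study's exact variant and weight tolerance are not stated. The weights follow w₀ = 1, w_k = −w_{k−1}(d − k + 1)/k and are truncated once |w_k| drops below `tol`. Each output value is then a weighted sum of the current and past levels only. Searching for "the minimal d" would mean raising d until an ADF test on the output rejects a unit root. That loop is omitted here because it needs a unit-root test implementation.

```python
import numpy as np


def ffd_weights(d, tol=1e-4, max_len=10_000):
    """FFD weights w_0..w_K, stopping once |w_k| < tol."""
    w = [1.0]
    k = 1
    while k < max_len:
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < tol:
            break
        w.append(w_k)
        k += 1
    return np.array(w)


def frac_diff_ffd(x, d, tol=1e-4):
    """Fractionally differenced series: out[t] = sum_k w_k * x[t-k].
    Uses only x[:t + 1]; the first len(w) - 1 values are NaN."""
    x = np.asarray(x, dtype=float)
    w = ffd_weights(d, tol)
    width = len(w)
    out = np.full(len(x), np.nan)
    for t in range(width - 1, len(x)):
        out[t] = w @ x[t - width + 1: t + 1][::-1]  # x[t], x[t-1], ...
    return out
```

At d = 1 the weights reduce to [1, −1], an ordinary first difference, and at d = 0 they reduce to [1], the raw level. Intermediate values such as the d ≈ 0.5 reported above keep a slowly decaying memory of past levels.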

The honest result is a null. Pooled across six instruments the direction AUC is ≈0.50 for both inputs (raw 0.499, stationary 0.496), a gap of −0.002: the stationary input is, if anything, a hair worse. A short recent slice looked better for the stationary inputs, but the edge did not survive extension to the full sample, which marks it as a small-sample artifact. The transform is doing its job: the fractionally differenced series is stationary at d≈0.5 with memory retained. There is simply no bar-level directional signal for it to expose. This is consistent with the rest of the study, where only volatility carries content and direction sits at the coin flip. A stationary input cannot manufacture an edge that is not in the data. It only helps when a real signal is masked by non-stationarity, which is not the regime here.

direction AUC: raw level vs stationary (OFI + fractional differentiation)

instrument	d used	raw AUC	stat AUC	gap
BTC	0.40	0.495	0.492	−0.003
ETH	0.55	0.494	0.503	+0.009
SOL	0.45	0.502	0.492	−0.010
BNB	0.55	0.490	0.503	+0.013
XRP	0.40	0.505	0.500	−0.005
LTC	0.55	0.504	0.487	−0.017
pooled	0.50	0.499	0.496	−0.002

Pooled, the Deflated Sharpe Ratio of the best costed fold is 0.017 for raw and 0.000 for stationary against 36 trials; neither clears the bar. The stationarity transform works as intended, but the bar-level direction it is asked to predict simply has no signal.

How the test is kept honest

Multi-market, causal-only, realistically costed, purged-CV, and gated on the Deflated Sharpe Ratio: both studies share the same discipline.

Method

Causal only. Every feature value at bar t uses only bars ≤ t; every threshold and rule orientation is fit inside the training fold; label horizons are purged and embargoed. No figure or statistic uses forward information.
Information-driven bars. Dollar bars (López de Prado's preferred bar type) for crypto and equities, and tick bars for forex, which has no volume. Each instrument has ~6,000 bars in the structural-break study and ≈8 bars/day in the microstructure study.
Realistic costs, no clamping. Equities use a time-of-day half-spread schedule plus a min-ticket commission; forex uses a per-pair half-spread in pips with a UTC time-of-day multiplier; crypto uses a house default. These costs are exactly what flip the equity feature families significantly negative; a costless test would report a false positive.
Purged CV with embargo. K-fold cross-validation is purged and embargoed so that the multi-bar label window never straddles train and test.
Deflated Sharpe + PBO headline. The False Strategy Theorem supplies the expected maximum of a skill-less trial family; a result is only real if it clears that bar after deflation for trial count, dispersion, skew and kurtosis.
Verified kernels. Every hot estimator has a pure-reference and an accelerated implementation, checked bit-identical (max|Δ| = 0 for CUSUM, LZ76, Kontoyiannis and the microstructure rolling block; ~1e-13 OLS round-off for SADF).
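The purge-and-embargo rule in the method list can be sketched as a fold generator. Each observation i carries the index of the last bar its label depends on. Any training observation whose label span overlaps the test fold's span is purged. The next `embargo_bars` observations after the test labels end are also dropped. This is a minimal sketch assuming contiguous test folds and an embargo expressed in bars. The study's fold count and embargo length are not stated, and the function name is hypothetical.

```python
import numpy as np


def purged_kfold(label_end, n_splits=5, embargo_bars=0):
    """Yield (train_idx, test_idx) with purging and a post-test embargo.
    label_end[i] >= i is the last bar observation i's label depends on."""
    label_end = np.asarray(label_end)
    starts = np.arange(len(label_end))
    for test_idx in np.array_split(starts, n_splits):
        t0 = test_idx[0]
        test_end = label_end[test_idx].max()  # last bar the test labels touch
        # Purge: observations whose [start, label_end] overlaps [t0, test_end].
        overlap = (starts <= test_end) & (label_end >= t0)
        # Embargo: observations starting just after the test labels end.
        embargo = (starts > test_end) & (starts <= test_end + embargo_bars)
        yield np.flatnonzero(~overlap & ~embargo), test_idx
```

With 100 bars, a 4-bar label horizon and a 2-bar embargo, the fold testing bars 20–39 trains on bars 0–15 and 46–99. Bars 16–19 are purged because their labels reach into the test fold. Bars 40–43 are purged because they overlap the test labels, and bars 44–45 are embargoed.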

Notes & limitations

Neither study finds anything tradeable, and that is the finding. The explosive-regime forward-return tilt is real but is mostly unconditional drift, not alpha. Crypto and equities trended up over the sample, SADF>0 episodes cluster inside the up-legs, and a long-biased conditional mean looks large until you demand a long/short rule and pay costs. The microstructure direction result is null on bar data. The volatility result is modest and is the only place future work should aim, as a volatility-targeting overlay rather than a standalone long/short. Costs are realistic, not adversarial. The equity min-ticket commission punishes a tiny notional, but the qualitative verdict (direction AUC ≈ 0.50, DSR ≈ 0) is robust to that, because there is no gross edge to defend even before costs. Labels are short-horizon; a longer triple-barrier span could change the direction picture and is left for follow-on work.

Reproducibility

The feature kernels, references, costed purged-CV harness and the deepening analyses are in the companion repository, project 07 and project 08 of lopez-de-prado-work-review (private during the review). The explorer on this page is self-contained: every number it shows is encoded in the component, so the two findings it illustrates reproduce exactly on every load.

Cite

Cite as

Gatto, D. V. (2026). Predictive Features: A Deflated, Multi-Market Test of Structural-Break, Entropy and Microstructure Features. Working paper (preview).

@techreport{gatto2026features,
  author      = {Gatto, Daniel V.},
  title       = {Predictive Features: A Deflated, Multi-Market Test of
                 Structural-Break, Entropy and Microstructure Features},
  year        = {2026},
  type        = {Working paper (preview)},
  note        = {Review of Lopez de Prado, Advances in Financial
                 Machine Learning, Ch. 17--19}
}

References

The primary sources for the features reviewed here:

López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapters 17 (Structural Breaks), 18 (Entropy Features) and 19 (Microstructural Features).
Easley, D., López de Prado, M., & O'Hara, M. (2012). Flow Toxicity and Liquidity in a High-Frequency World (VPIN). Review of Financial Studies, 25(5).
Roll, R. (1984). A Simple Implicit Measure of the Effective Bid-Ask Spread in an Efficient Market. Journal of Finance, 39(4).
Kyle, A. S. (1985). Continuous Auctions and Insider Trading. Econometrica, 53(6).
Amihud, Y. (2002). Illiquidity and Stock Returns: Cross-Section and Time-Series Effects. Journal of Financial Markets, 5(1).
Corwin, S. A., & Schultz, P. (2012). A Simple Way to Estimate Bid-Ask Spreads from Daily High and Low Prices. Journal of Finance, 67(2).
Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio. Journal of Portfolio Management, 40(5).
