Walk-Forward vs. In-Sample


If you only have in-sample, you have nothing

The single most common mistake in retail backtesting is reporting in-sample performance: performance on the same data the strategy was tuned on. It tells you nothing reliable about future performance, because the tuning step selects whichever parameters happened to fit that particular sample's noise best. Walk-forward analysis fits parameters on a rolling window, evaluates them on the immediately following out-of-sample window, then rolls both windows forward and repeats. The aggregate of those out-of-sample results is what you report.
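To see why in-sample numbers mislead, here is a small simulation (an illustration assumed for this sketch, not a real strategy): generate 200 candidate "strategies" whose daily returns are pure noise with zero true edge, "tune" by picking the candidate with the best Sharpe ratio over the first year, then look at that same candidate over the second year. The names `sharpe`, `is_sharpe`, and `oos_sharpe`, the seed, and the return volatility are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_strategies = 504, 200  # two years of daily bars, 200 candidates

# Pure noise: no candidate has any real edge
returns = rng.normal(0.0, 0.01, size=(n_days, n_strategies))

def sharpe(r):
    """Annualized Sharpe ratio of daily returns, one value per column."""
    return np.sqrt(252) * r.mean(axis=0) / r.std(axis=0, ddof=1)

in_sample, out_of_sample = returns[:252], returns[252:]

# "Tuning": choose the candidate that looked best on the in-sample year
best = int(np.argmax(sharpe(in_sample)))
is_sharpe = sharpe(in_sample)[best]
oos_sharpe = sharpe(out_of_sample)[best]

print(f"in-sample Sharpe of the chosen candidate:     {is_sharpe:.2f}")
print(f"out-of-sample Sharpe of the chosen candidate: {oos_sharpe:.2f}")
```

With zero true edge, the in-sample winner typically shows a Sharpe ratio above 2 purely from selection, while its out-of-sample Sharpe scatters around zero. Picking the best of many tries on one sample is exactly what parameter tuning does.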

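walk_forward.py

```python
def walk_forward(df, fit_strategy, evaluate, aggregate,
                 fit_window=252, test_window=63):
    """Rolling walk-forward test; only out-of-sample results are aggregated.

    fit_strategy(fit_df) -> params, evaluate(test_df, params) -> result, and
    aggregate(results) are strategy-specific and supplied by the caller.
    """
    results = []
    # +1 so the last start that still leaves a full test window is included
    last_start = len(df) - fit_window - test_window
    for start in range(0, last_start + 1, test_window):
        fit = df.iloc[start : start + fit_window]  # tune on this window only
        test = df.iloc[start + fit_window : start + fit_window + test_window]
        params = fit_strategy(fit)
        results.append(evaluate(test, params))  # out-of-sample score
    return aggregate(results)
```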
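The snippet leaves `aggregate` unspecified. One common choice, assumed in the sketch below, is to stitch the per-fold out-of-sample daily returns into a single series and compute statistics on that, so a drawdown spanning two folds shows up in full instead of being split across per-fold summaries. The function name matches the placeholder, but its body, the 252-day annualization, the returned keys, and the synthetic folds are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def aggregate(fold_returns):
    """Stitch per-fold out-of-sample daily returns into one series and summarize it.

    Hypothetical helper: assumes each element of `fold_returns` is a pd.Series
    of daily strategy returns for one test window, indexed by date or bar number.
    """
    oos = pd.concat(fold_returns)
    # Walk-forward test windows should be contiguous and non-overlapping
    if not (oos.index.is_monotonic_increasing and oos.index.is_unique):
        raise ValueError("out-of-sample folds overlap or are out of order")
    ann_return = oos.mean() * 252
    ann_vol = oos.std(ddof=1) * np.sqrt(252)
    equity = (1 + oos).cumprod()
    return {
        "n_days": len(oos),
        "ann_return": ann_return,
        "ann_vol": ann_vol,
        "sharpe": ann_return / ann_vol,
        "max_drawdown": (equity / equity.cummax() - 1).min(),
    }

# Usage with three synthetic 63-day out-of-sample folds
rng = np.random.default_rng(42)
folds = [pd.Series(rng.normal(0.0005, 0.01, 63),
                   index=range(252 + 63 * k, 315 + 63 * k))
         for k in range(3)]
summary = aggregate(folds)
print(summary["n_days"])  # 189
```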

A 252-bar fit window and a 63-bar test window (roughly one trading year and one trading quarter) is a common starting point for daily-bar strategies.
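To make the schedule concrete, the sketch below uses a hypothetical helper, `walk_forward_splits`, that follows the same rolling scheme as `walk_forward` (including the last start that still leaves a full test window) and lists the index ranges for about five years of daily bars (1,260 bars). Starts advance by 63 bars while a full 315-bar fit-plus-test span still fits, so they run from 0 to 945 and produce 16 folds whose test windows tile bars 252 through 1,259: about four years of out-of-sample data.

```python
def walk_forward_splits(n_bars, fit_window=252, test_window=63):
    """Yield (fit, test) index ranges for a rolling walk-forward schedule."""
    start = 0
    # Keep going while a full fit window plus a full test window still fits
    while start + fit_window + test_window <= n_bars:
        fit = range(start, start + fit_window)
        test = range(start + fit_window, start + fit_window + test_window)
        yield fit, test
        start += test_window  # roll both windows forward by one test window

splits = list(walk_forward_splits(1260))  # about five years of daily bars
print(len(splits))                        # 16
print(splits[0])                          # (range(0, 252), range(252, 315))
print(splits[-1][1])                      # range(1197, 1260)
```

Because the step equals the test window, each out-of-sample window starts exactly where the previous one ended, so the stitched out-of-sample record has no gaps and no overlaps.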

Heads up

Lookahead bias is the silent killer

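Even in a correctly structured walk-forward test, lookahead bias can creep in through the data pipeline: a feature built from close-of-day data but assumed available at that day's open, or a Z-score normalized with the mean and standard deviation of the full sample, test windows included. Audit every feature by checking that each of its inputs is timestamped before the moment the strategy would act on it.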
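Both leaks can be sketched on a synthetic price series. One simple audit, assumed here rather than taken from any particular library, is to recompute a feature on data truncated at some cutoff and check that its earlier values do not change; a full-sample Z-score fails, a trailing one passes. The helper names, the 60-bar window, and the synthetic data are hypothetical. This check catches leakage inside the computation but not a misaligned timestamp at the source, which is why the trailing Z-score is also lagged one bar before being used as a next-open signal.

```python
import numpy as np
import pandas as pd

def zscore_full_sample(s):
    """LEAKY: the mean and std use every bar, including future ones."""
    return (s - s.mean()) / s.std()

def zscore_trailing(s, window=60):
    """Point-in-time: each value uses only the `window` bars ending at that bar."""
    r = s.rolling(window)
    return (s - r.mean()) / r.std()

def is_point_in_time(feature_fn, series, cutoff):
    """Audit: values before `cutoff` must not change when later bars are removed."""
    full = feature_fn(series).iloc[:cutoff]
    truncated = feature_fn(series.iloc[:cutoff])
    return bool(np.allclose(full.to_numpy(), truncated.to_numpy(), equal_nan=True))

rng = np.random.default_rng(1)
close = pd.Series(100 + rng.normal(0, 1, 500).cumsum())  # synthetic daily closes

print(is_point_in_time(zscore_full_sample, close, 250))  # False: future data leaks in
print(is_point_in_time(zscore_trailing, close, 250))     # True

# The trailing Z-score for day t uses day t's close. If trades happen at the
# next open, shift by one bar so row t+1 holds only what was known at t's close.
signal = zscore_trailing(close).shift(1)
```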
