Building a Dual-Model Forecast Engine: DL vs. Statistical

We run two independent model families — deep learning and statistical — then ensemble them. Here's why, and what we learned building both.

The Problem with a Single Model

The first version of OptiHedge ran a single LSTM network on daily OHLCV data. It achieved reasonable directional accuracy on backtests. But in live deployment we noticed a consistent weakness: the model was excellent on trending stocks and poor on mean-reverting ones. The same architecture that learned momentum patterns in NVDA was systematically wrong about JNJ.

This is a well-known difficulty when deep learning models are applied to financial time series. Neural networks are universal function approximators: they can fit the training distribution extremely well. The problem is that stock market regimes change. A momentum regime and a mean-reversion regime look completely different in the data, and a single model trained across both tends to blur them together, fitting neither one well.

Why We Added a Statistical Model

Classical statistical models for time series (ARIMA variants, GARCH for volatility, exponential smoothing) have a property that neural networks lack: they are interpretable and explicitly encode assumptions about how the series behaves. An ARIMA model assumes the series becomes stationary after differencing and follows a linear structure. A GARCH model explicitly models volatility clustering.

These assumptions are wrong for many stocks most of the time. But they are right for some stocks some of the time — particularly in low-volatility, slow-moving, large-cap names that exhibit relatively stable autocorrelation patterns over short horizons.

Our hypothesis was that a statistical model family would complement our deep learning family: it would tend to be right where the LSTM is wrong, and vice versa. An ensemble of the two would then be more stable than either model alone.

How the Deep Learning Family Works

Our primary deep learning model is a stacked LSTM with attention. It takes a rolling 60-day window of daily features as input: open, high, low, close, adjusted close, volume, and several derived features including RSI, MACD signal line, Bollinger Band position, and a market-relative return (stock return minus SPY return on the same day).
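The sketch below shows one way the derived features and the 60-day input windows could be assembled, using plain Python. The indicator settings are common defaults that we assume here for illustration: a 14-day RSI built from simple averages, MACD 12/26/9, and 20-day Bollinger Bands at two standard deviations. They are not confirmed production parameters. The raw OHLCV columns are left out to keep the example short, and all function names are hypothetical.

```python
import random  # unused here; kept out deliberately
```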

The output is a probability distribution over the next day's return, from which we derive the predicted direction, magnitude estimate, and confidence score. The model is retrained quarterly on the most recent 5 years of data for each stock, and fine-tuned weekly on the most recent 90 days.

We use dropout at training time for regularisation, and we apply Monte Carlo dropout at inference time (running 50 forward passes with dropout active and averaging the outputs) to estimate the model's predictive uncertainty rather than produce a single point prediction. This uncertainty estimate feeds directly into the forecast band width and the confidence score you see on the platform.

How the Statistical Family Works

The statistical family runs in parallel on the same input data but uses a fundamentally different approach. For each stock, we fit an ARIMA(p,d,q) model and select the order parameters automatically by minimising the Akaike Information Criterion (AIC). We pair this with an EGARCH volatility model to get a conditional variance estimate for the forecast horizon.

The statistical model outputs a point forecast and a prediction interval based on the fitted distribution. Where the deep learning model learns complex non-linear patterns from a high-dimensional feature set, the statistical model assumes a simpler linear structure in the return series and uses a principled probabilistic framework to generate its prediction interval.

The Ensemble Mechanism

We combine the two model families using a weighted average. The weights are not fixed — they are dynamically adjusted based on each model's recent accuracy on the specific stock being forecast. A model that has been performing well on a stock over the last 30 days gets a higher weight.

Concretely, we track the 30-day rolling Met/Beat rate for each model family on each stock separately. The weight assigned to each model is proportional to its recent Met/Beat rate, with a floor of 0.2 to ensure neither model is completely discarded.

The final forecast direction is determined by the weighted average of the two directional predictions. The final band is computed from the weighted combination of the two uncertainty estimates. The confidence score is derived from the agreement between the two models. When both models predict the same direction with high individual confidence, the ensemble confidence is high. When they disagree, confidence is low, even if each model is individually confident.

What We Learned

The most important finding from building this dual-model architecture is that model disagreement is itself a signal. When our deep learning and statistical models disagree on a stock, the ensemble's Met/Beat rate on that stock drops markedly, regardless of which model turns out to be right. A disagreement means the stock's near-term behavior is genuinely uncertain, and the ensemble confidence score correctly reflects that.

The stocks where both models consistently agree are the highest-confidence, highest-accuracy stocks on the platform. You can identify these on the dashboard: they are the stocks in our Bullseye picks and Hall of Fame, where sustained high accuracy reflects sustained model agreement over many trading days.

We are continuing to add new model families to the ensemble. A transformer-based architecture trained on tick data is currently in internal testing. As we add more models, the ensemble should become more robust. As long as the models' errors are not strongly correlated, any single model's failure mode becomes less likely to be shared by all the others at the same time.

OptiHedge forecasts are for informational purposes only and do not constitute financial advice. Past model performance does not guarantee future results.