AI Trading Platform Update: Building a Bulletproof Evaluation Framework
What Changed Since December 9th?
In our last post, we celebrated improving our win rate from 16.6% to 20.1%. But we kept asking ourselves: Are these results real, or are we fooling ourselves?
We spent the past two days answering that question. The answer: We were partly wrong.
The Problem We Discovered
Our previous backtesting approach had small but important problems:
- No out-of-sample testing — We tested on the same data we trained on (the AI had already seen this data)
- Missing costs — We ignored trading fees, spread, and slippage
- Small samples — We made decisions based on 10–20 trades
- No reproducibility — Results changed between runs, so we couldn’t repeat tests
What We Built (Dec 9–11)
1. Walk-Forward Evaluation System
We built a time-series evaluation system that makes sure training data and testing data never mix:
- 13 independent evaluation windows
- 168 true out-of-sample trades (data the AI never saw during training)
- Zero overlap between training and testing data
The system moves through time, training only on past data and testing on future data — just like real trading.
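Here is a minimal sketch of rolling walk-forward windows over a time-ordered series. The function and class names, the index-based `[start, end)` ranges, the optional `gap`, and the sizes in the usage note are all assumptions for illustration. The post doesn't say how its 13 windows were sized.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Window:
    """Half-open index ranges [start, end) into a time-ordered series."""
    train_start: int
    train_end: int
    test_start: int
    test_end: int


def walk_forward_windows(
    n_bars: int, train_size: int, test_size: int, gap: int = 0
) -> Iterator[Window]:
    """Yield rolling train/test windows that only ever test on later data.

    `gap` optionally skips bars between train and test (an assumption, useful
    when features or labels look several bars ahead).
    """
    if train_size <= 0 or test_size <= 0 or gap < 0:
        raise ValueError("train_size and test_size must be positive, gap >= 0")
    start = 0
    while True:
        train_end = start + train_size
        test_start = train_end + gap
        test_end = test_start + test_size
        if test_end > n_bars:
            break  # not enough future data left for a full test window
        yield Window(start, train_end, test_start, test_end)
        start += test_size  # roll forward so test windows never overlap
```

For example, `walk_forward_windows(1000, 500, 100)` yields five windows whose 100-bar test periods sit back to back from bar 500 to bar 1000. Each window is trained only on the 500 bars before it.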
2. Realistic Cost Model
Every trade now accounts for real trading costs:
| Asset | Fees | Spread | Slippage | Total |
|---|---|---|---|---|
| Stock | 0.10% | 0.05% | 0.05% | ~0.20% |
| Crypto | 0.20% | 0.10% | 0.15% | ~0.45% |
| Forex | 0.02% | 0.08% | 0.03% | ~0.13% |
The backtester now tracks both gross (before costs) and net (after costs) profit and loss separately.
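A sketch of how the table's per-asset totals could turn gross into net PnL. It assumes the listed percentages are charged once per round-trip trade and simply subtracted from the trade's percentage return. The post doesn't say whether they apply per side, so treat that as an assumption. `COSTS_PCT`, `TradeResult`, and `apply_costs` are hypothetical names.

```python
from dataclasses import dataclass

# Per-trade costs in percent, copied from the table above.
# ASSUMPTION: charged once per round-trip trade, not per side.
COSTS_PCT = {
    "stock": {"fees": 0.10, "spread": 0.05, "slippage": 0.05},
    "crypto": {"fees": 0.20, "spread": 0.10, "slippage": 0.15},
    "forex": {"fees": 0.02, "spread": 0.08, "slippage": 0.03},
}


@dataclass(frozen=True)
class TradeResult:
    gross_pct: float  # PnL before costs, in percent
    costs: dict       # cost breakdown, in percent
    net_pct: float    # PnL after costs, in percent


def apply_costs(asset_class: str, gross_pct: float) -> TradeResult:
    """Subtract the asset class's total cost from a trade's gross return."""
    costs = COSTS_PCT[asset_class]
    total = sum(costs.values())
    return TradeResult(gross_pct=gross_pct, costs=dict(costs), net_pct=gross_pct - total)
```

Under these assumptions, a crypto trade that gains 0.30% before costs ends at roughly -0.15% after costs. This shows how a small positive gross edge can become a net loss.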
3. Minimum Trade Guards
The system now refuses to report statistics without enough data:
```python
MIN_TRADES_FOR_STATS = 30        # Basic statistics
MIN_TRADES_FOR_CONFIDENCE = 50   # Statistical significance
```
No more making decisions based on 15 trades.
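One way such guards might be enforced, as a sketch. The two constants come from the snippet above, while `InsufficientTradesError` and `win_rate_report` are hypothetical. Raising an error, rather than returning a number with a warning, makes it hard to quietly act on a 15-trade sample.

```python
MIN_TRADES_FOR_STATS = 30        # Basic statistics
MIN_TRADES_FOR_CONFIDENCE = 50   # Statistical significance


class InsufficientTradesError(ValueError):
    """Raised when a sample is too small to report statistics."""


def win_rate_report(results_pct: list[float]) -> dict:
    """Return win-rate stats, or raise if there are too few trades to trust them."""
    n = len(results_pct)
    if n < MIN_TRADES_FOR_STATS:
        raise InsufficientTradesError(
            f"only {n} trades; need at least {MIN_TRADES_FOR_STATS} to report statistics"
        )
    wins = sum(1 for r in results_pct if r > 0)  # ASSUMPTION: break-even trades are non-wins
    return {
        "trades": n,
        "win_rate": wins / n,
        # True only when the larger sample threshold for confidence claims is met.
        "confident": n >= MIN_TRADES_FOR_CONFIDENCE,
    }
```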
4. Comprehensive Metrics Suite
New metrics module with:
- Wilson confidence intervals (a plausible range for the true win rate that stays accurate even with small samples)
- Expectancy calculation (average profit per trade)
- Profit Factor (gross profit divided by gross loss; below 1.0 means losing money)
- Sharpe and Sortino ratios (risk-adjusted returns)
- Maximum drawdown tracking (largest peak-to-trough decline)
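A compact sketch of four of these metrics. Function names are hypothetical. It assumes per-trade results in one consistent unit (percent or R), and it measures drawdown on a simple additive cumulative PnL curve. Sharpe and Sortino are left out because they depend on return frequency and annualization choices the post doesn't specify.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin


def expectancy(results: list[float]) -> float:
    """Average result per trade, in the same unit as the inputs."""
    return sum(results) / len(results)


def profit_factor(results: list[float]) -> float:
    """Gross profit / gross loss; below 1.0 means losses outweigh gains."""
    gross_profit = sum(r for r in results if r > 0)
    gross_loss = -sum(r for r in results if r < 0)
    return math.inf if gross_loss == 0 else gross_profit / gross_loss


def max_drawdown(results: list[float]) -> float:
    """Largest peak-to-trough decline of the cumulative (additive) PnL curve."""
    equity = peak = worst = 0.0
    for r in results:
        equity += r
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst
```

Assuming 35 wins out of 168 trades (about 20.8%), `wilson_interval(35, 168)` returns roughly (0.154, 0.276). At this sample size, the plausible win rate still spans about 15% to 28%.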
5. Reproducibility Framework
Every experiment now logs:
- Git commit hash and branch
- Random seeds (Python, NumPy, PyTorch)
- Full configuration
- Results with timestamps
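A minimal sketch of this kind of logging, assuming a JSON-lines log file. The function names are hypothetical. NumPy and PyTorch are treated as optional so the sketch runs without them, and git metadata falls back to `"unknown"` outside a repository.

```python
import json
import random
import subprocess
from datetime import datetime, timezone

try:
    import numpy as np
except ImportError:  # NumPy is optional in this sketch
    np = None
try:
    import torch
except ImportError:  # PyTorch is optional in this sketch
    torch = None


def set_seeds(seed: int) -> None:
    """Seed the Python, NumPy and PyTorch random number generators (if installed)."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)


def _git(*args: str) -> str:
    """Run a git command; return 'unknown' outside a repository or without git."""
    try:
        proc = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return proc.stdout.strip()


def experiment_record(seed: int, config: dict, results: dict) -> dict:
    """Build one JSON-serializable log entry for an experiment run."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "git_commit": _git("rev-parse", "HEAD"),
        "git_branch": _git("rev-parse", "--abbrev-ref", "HEAD"),
        "seed": seed,
        "config": config,
        "results": results,
    }


def append_record(path: str, record: dict) -> None:
    """Append the record as one JSON line, so earlier runs are never overwritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Appending rather than overwriting keeps a full history of runs. Re-running with the same seed, commit, and config should then reproduce a logged result.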
6. Parameter Optimization Tools
- Stop-Loss Sweep: systematically test ATR (Average True Range) multipliers for stop distance
- Regime Detection: classify market conditions
- Ablation Harness: compare model configurations fairly
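A simplified sketch of an ATR stop-loss sweep, under several assumptions. It uses long-only entries at a known price, a stop `k × ATR` below entry, a target at a 2:1 reward-to-risk ratio, and no costs. All function names are hypothetical. ATR here is a simple average of true ranges rather than Wilder's smoothed version.

```python
from statistics import mean


def true_ranges(highs, lows, closes):
    """True range per bar: max(high - low, |high - prev close|, |low - prev close|)."""
    trs = [highs[0] - lows[0]]  # the first bar has no previous close
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        trs.append(max(highs[i] - lows[i],
                       abs(highs[i] - prev_close),
                       abs(lows[i] - prev_close)))
    return trs


def atr(highs, lows, closes, period=14):
    """ATR as a simple mean of the last `period` true ranges (not Wilder's smoothing)."""
    return mean(true_ranges(highs, lows, closes)[-period:])


def simulate_long(entry, stop, target, bars, max_bars):
    """Walk through (high, low, close) bars; return (R, exit_reason, bars_held)."""
    window = bars[:max_bars]
    if not window:
        raise ValueError("need at least one bar after entry")
    risk = entry - stop
    for i, (high, low, _close) in enumerate(window, start=1):
        if low <= stop:  # stop checked first: pessimistic if both levels hit in one bar
            return -1.0, "stopped", i
        if high >= target:
            return (target - entry) / risk, "target", i
    return (window[-1][2] - entry) / risk, "timeout", len(window)


def sweep_atr_multipliers(trades, multipliers, reward_risk=2.0, max_bars=20):
    """Average R per trade for each stop multiplier k (stop = entry - k * ATR).

    `trades` holds (entry_price, atr_at_entry, future_bars) tuples.
    """
    results = {}
    for k in multipliers:
        rs = []
        for entry, atr_value, future_bars in trades:
            stop = entry - k * atr_value
            target = entry + reward_risk * k * atr_value
            r, _reason, _bars = simulate_long(entry, stop, target, future_bars, max_bars)
            rs.append(r)
        results[k] = mean(rs)
    return results
```

Results are in R (multiples of the initial risk), so different multipliers share one scale. Once costs are added, though, tighter stops make a fixed percentage cost a larger fraction of each R.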
7. Database Migration
Added new fields to track costs and trade outcomes:
- result_pct_gross: PnL before costs
- costs_json: cost breakdown
- exit_reason: stopped, target, or timeout
- bars_held: trade duration in bars
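The post doesn't name the database engine. The sketch below assumes SQLite, via Python's built-in `sqlite3`, and a table called `trades`; both are assumptions. It checks existing columns first so the migration can be re-run safely.

```python
import sqlite3

# New columns and their SQLite types. ASSUMPTION: the table is named "trades".
NEW_COLUMNS = {
    "result_pct_gross": "REAL",  # PnL before costs
    "costs_json": "TEXT",        # cost breakdown as a JSON string
    "exit_reason": "TEXT",       # 'stopped', 'target' or 'timeout'
    "bars_held": "INTEGER",      # trade duration in bars
}


def migrate(conn: sqlite3.Connection, table: str = "trades") -> list[str]:
    """Add any missing columns; safe to run more than once. Returns columns added.

    `table` is interpolated into SQL, so it must be a trusted, fixed name.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, col_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {col_type}")
            added.append(name)
    conn.commit()
    return added
```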
The Honest Results
With proper out-of-sample (OOS) evaluation:
| Metric | Previous Claim | Actual OOS |
|---|---|---|
| Win Rate | 20.1% | 20.8% |
| Profit Factor | ~1.2 (estimated) | 0.92 |
| Sharpe Ratio | Not measured | -0.46 |
| Total OOS Trades | — | 168 |
The truth: While our win rate held up, we’re actually losing money when costs are included.
New Modules Created
src/
├── reproducibility.py # Seed management, experiment logging
├── metrics.py # Trading metrics with confidence intervals
├── walk_forward.py # Time-series evaluation pipeline
├── ablation.py # Model comparison harness
├── regime.py # Market regime classification
└── stop_loss_sweep.py # ATR parameter optimization
tests/
└── test_core.py # 20 unit tests (all passing)
Tests: All Passing
$ pytest tests/test_core.py -v
======================== 20 passed in 1.60s ========================
Tests cover:
- Cost model calculations
- Wilson confidence intervals
- Walk-forward window generation
- OOS data isolation
- Trade result conversion
What This Means
Bad news: We’re not profitable yet (Profit Factor 0.92).
Good news: We now have the tools to find real edge:
- Every future improvement will be validated out-of-sample
- Costs are included from the start
- Results are reproducible
- Statistics are meaningful
Next Steps
- Improve signal generation (at a 2:1 reward-to-risk ratio, the current 20.8% win rate needs to exceed ~33% to be profitable)
- Test different indicator combinations
- Filter by market regime
- Explore longer time horizons
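The ~33% target follows from a short expectancy calculation. Measure each trade in R (multiples of the initial risk), and assume every win pays exactly the reward-to-risk ratio and every loss costs 1R. Then expectancy is zero when `p * reward_risk = (1 - p) + costs`. Here is a minimal sketch; the function names are hypothetical, and expressing costs in R is an assumption.

```python
def breakeven_win_rate(reward_risk: float, cost_r: float = 0.0) -> float:
    """Win rate p at which expectancy is zero.

    Expectancy per trade in R: p * reward_risk - (1 - p) - cost_r.
    Setting it to zero gives p = (1 + cost_r) / (1 + reward_risk).
    `cost_r` is the round-trip cost as a fraction of the initial risk.
    """
    if reward_risk <= 0:
        raise ValueError("reward_risk must be positive")
    return (1 + cost_r) / (1 + reward_risk)


def expectancy_r(win_rate: float, reward_risk: float, cost_r: float = 0.0) -> float:
    """Average result per trade in multiples of the initial risk (R)."""
    return win_rate * reward_risk - (1 - win_rate) - cost_r
```

With a 2:1 ratio and no costs, this gives exactly 1/3, the ~33% above. With costs equal to 0.1R per trade (an illustrative figure), the break-even rate rises to about 36.7%.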
Building a profitable trading system is hard. Building an honest evaluation framework is the first step to knowing if you’ve actually succeeded.
The AI Trading Platform is an internal research project operated exclusively by ABXK.AI. It is not publicly accessible and cannot be used by visitors.
Any results, insights, or examples shared on this website or on social media are provided for informational and educational purposes only and do not constitute financial advice.