AI Trading Platform Update: Building a Bulletproof Evaluation Framework
What Changed Since December 9th?
In our last post, we celebrated improving our win rate from 16.6% to 20.1%. But we had a nagging question: Are these results real, or are we fooling ourselves?
The past two days were spent answering that question. Spoiler: We were partially fooling ourselves.
The Problem We Discovered
Our previous backtesting approach had subtle issues:
- No out-of-sample testing - We tested on the same data we trained on
- Missing costs - We ignored trading fees, spread, and slippage
- Small samples - We made decisions based on 10-20 trades
- No reproducibility - Results varied between runs
What We Built (Dec 9-11)
1. Walk-Forward Evaluation System
We built a complete time-series evaluation framework that guarantees no data leakage:
```
13 independent evaluation windows
168 true out-of-sample trades
Zero overlap between training and testing data
```
The system rolls through time, training only on past data and testing on future data—just like real trading.
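For intuition, here is a minimal sketch of how such rolling windows can be generated. The window lengths and names are illustrative only, not the actual walk_forward.py API:

```python
from dataclasses import dataclass

@dataclass
class Window:
    train_start: int  # first training bar (inclusive)
    train_end: int    # exclusive; the test window starts here
    test_end: int     # exclusive

def walk_forward_windows(n_bars: int, train_len: int = 500, test_len: int = 50):
    """Yield rolling windows where every test bar lies strictly after its training data."""
    start = 0
    while start + train_len + test_len <= n_bars:
        yield Window(start, start + train_len, start + train_len + test_len)
        start += test_len  # step by the test length so no OOS bar is ever reused

# Training slice = past only, test slice = future only, for every window.
for w in walk_forward_windows(n_bars=1200):
    train = slice(w.train_start, w.train_end)
    test = slice(w.train_end, w.test_end)
```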
2. Realistic Cost Model
Every trade now accounts for real trading costs:
| Asset | Fees | Spread | Slippage | Total |
|---|---|---|---|---|
| Stock | 0.10% | 0.05% | 0.05% | ~0.20% |
| Crypto | 0.20% | 0.10% | 0.15% | ~0.45% |
| Forex | 0.02% | 0.08% | 0.03% | ~0.13% |
The backtester now tracks both gross and net PnL separately.
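To make the impact concrete, here's a toy version of the cost adjustment, treating the Total column above as the all-in cost per trade. The names here are illustrative, not the backtester's actual API:

```python
# All-in cost per trade, in percent, taken from the table above.
TOTAL_COST_PCT = {"stock": 0.20, "crypto": 0.45, "forex": 0.13}

def net_pnl_pct(gross_pnl_pct: float, asset: str) -> float:
    """Net return = gross return minus the estimated all-in trading cost."""
    return gross_pnl_pct - TOTAL_COST_PCT[asset]

# A crypto trade that gains 0.40% gross ends up slightly negative after ~0.45% of costs.
print(net_pnl_pct(0.40, "crypto"))  # ≈ -0.05
```

That last line is the whole story of this update in miniature: a trade that looks like a winner gross can be a loser net.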
3. Minimum Trade Guards
The system now refuses to report statistics without enough data:
```python
MIN_TRADES_FOR_STATS = 30       # Basic statistics
MIN_TRADES_FOR_CONFIDENCE = 50  # Statistical significance
```
No more optimizing based on 15 trades.
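In practice the guard is just an early exit before any statistics get computed. A minimal sketch (the function and result fields are illustrative, not the real module's interface):

```python
MIN_TRADES_FOR_STATS = 30       # repeated from the constants above
MIN_TRADES_FOR_CONFIDENCE = 50

def summarize(trade_returns: list[float]) -> dict:
    """Refuse to report statistics when the sample is too small to mean anything."""
    n = len(trade_returns)
    if n < MIN_TRADES_FOR_STATS:
        return {"status": "insufficient_data", "n_trades": n}
    wins = sum(1 for r in trade_returns if r > 0)
    return {
        "status": "ok",
        "n_trades": n,
        "win_rate": wins / n,
        "statistically_meaningful": n >= MIN_TRADES_FOR_CONFIDENCE,
    }
```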
4. Comprehensive Metrics Suite
New metrics module with:
- Wilson confidence intervals for win rate (sketched below)
- Expectancy calculation
- Profit Factor
- Sharpe and Sortino ratios
- Maximum drawdown tracking
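The Wilson interval is the one worth calling out: it behaves sensibly even at small sample sizes, which is exactly where naive win-rate estimates mislead. Here's a minimal sketch of the calculation (not necessarily line-for-line what metrics.py does; the 35-win example is just the implied win count at a 20.8% rate over 168 trades):

```python
import math

def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a win rate (z = 1.96)."""
    if n == 0:
        return (0.0, 1.0)
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)

# 35 wins out of 168 OOS trades (≈ 20.8%) -> roughly a 15%–28% interval.
print(wilson_interval(35, 168))
```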
5. Reproducibility Framework
Every experiment now logs:
- Git commit hash and branch
- Random seeds (Python, NumPy, PyTorch)
- Full configuration
- Results with timestamps
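Conceptually, that boils down to seeding every RNG we rely on and writing one self-describing record per run. A rough sketch of the idea (not the actual reproducibility.py interface; the experiments.jsonl file name is made up):

```python
import json, random, subprocess, time
import numpy as np
import torch

def set_seeds(seed: int = 42) -> None:
    """Seed every RNG in play so a run can be repeated exactly."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

def log_experiment(config: dict, results: dict, path: str = "experiments.jsonl") -> None:
    """Append one self-describing record per experiment run."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "git_commit": subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip(),
        "git_branch": subprocess.check_output(["git", "rev-parse", "--abbrev-ref", "HEAD"]).decode().strip(),
        "config": config,
        "results": results,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```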
6. Parameter Optimization Tools
- Stop-Loss Sweep - Test ATR multipliers systematically (see the sketch after this list)
- Regime Detection - Classify market conditions
- Ablation Harness - Compare model configurations fairly
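For the stop-loss sweep, the shape of the tool is roughly this. Here run_backtest is a stand-in for whatever actually produces net per-trade returns for a given ATR multiplier; it is not the real stop_loss_sweep.py API:

```python
def sweep_atr_multipliers(run_backtest, multipliers=(1.0, 1.5, 2.0, 2.5, 3.0)):
    """Re-run the same backtest for each ATR stop multiplier and rank by profit factor."""
    results = {}
    for m in multipliers:
        returns = run_backtest(atr_multiplier=m)   # net % return per trade
        wins = sum(r for r in returns if r > 0)
        losses = -sum(r for r in returns if r < 0)
        results[m] = wins / losses if losses else float("inf")
    # Best multiplier first; the winner still has to be validated out-of-sample.
    return dict(sorted(results.items(), key=lambda kv: kv[1], reverse=True))
```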
7. Database Migration
Added new fields to track cost data:
```
result_pct_gross  – PnL before costs
costs_json        – Cost breakdown
exit_reason       – stopped / target / timeout
bars_held         – Trade duration
```
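For illustration, a migration like this can be applied with plain ALTER TABLE statements. The sketch below assumes SQLite and a table named trades; both are guesses about the actual setup, so treat it as a template rather than the real migration script:

```python
import sqlite3

# Assumed table name ("trades") and SQLite backend; the real schema may differ.
MIGRATION_STATEMENTS = [
    "ALTER TABLE trades ADD COLUMN result_pct_gross REAL",  # PnL before costs
    "ALTER TABLE trades ADD COLUMN costs_json TEXT",        # cost breakdown as JSON
    "ALTER TABLE trades ADD COLUMN exit_reason TEXT",       # stopped / target / timeout
    "ALTER TABLE trades ADD COLUMN bars_held INTEGER",      # trade duration in bars
]

def migrate(db_path: str) -> None:
    with sqlite3.connect(db_path) as conn:
        for stmt in MIGRATION_STATEMENTS:
            conn.execute(stmt)
```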
The Honest Results
With proper out-of-sample evaluation:
| Metric | Previous Claim | Actual OOS |
|---|---|---|
| Win Rate | 20.1% | 20.8% |
| Profit Factor | ~1.2 (estimated) | 0.92 |
| Sharpe Ratio | Not measured | -0.46 |
| Total OOS Trades | - | 168 |
The truth: While our win rate held up, we’re actually losing money when costs are included.
New Modules Created
```
src/
├── reproducibility.py   # Seed management, experiment logging
├── metrics.py           # Trading metrics with confidence intervals
├── walk_forward.py      # Time-series evaluation pipeline
├── ablation.py          # Model comparison harness
├── regime.py            # Market regime classification
└── stop_loss_sweep.py   # ATR parameter optimization

tests/
└── test_core.py         # 20 unit tests (all passing)
```
Tests: All Passing
```bash
$ pytest tests/test_core.py -v
======================== 20 passed in 1.60s ========================
```
Tests cover:
- Cost model calculations
- Wilson confidence intervals
- Walk-forward window generation
- OOS data isolation
- Trade result conversion
What This Means
Bad news: We’re not profitable yet (Profit Factor 0.92).
Good news: We now have the tools to find real edge:
- Every future improvement will be validated out-of-sample
- Costs are included from the start
- Results are reproducible
- Statistics are meaningful
Next Steps
- Improve signal generation (current 20.8% win rate needs to reach ~33% for 2:1 R/R profitability; see the quick math after this list)
- Test different indicator combinations
- Filter by market regime
- Explore longer time horizons
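The ~33% figure in the first item is just the breakeven point of the expectancy formula: with a 2:1 reward-to-risk ratio a win pays 2 units and a loss costs 1, so expectancy hits zero when p × 2 = (1 − p) × 1:

```python
def breakeven_win_rate(reward_to_risk: float) -> float:
    """Win rate p where expectancy p*R - (1 - p) is exactly zero."""
    return 1 / (1 + reward_to_risk)

print(breakeven_win_rate(2.0))  # -> 0.333..., i.e. ~33% needed before costs
```

Trading costs push the real hurdle a little above 33%, which is exactly why the net numbers are the ones we track.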
Learn more about the platform: AI Trading Platform
Building a profitable trading system is hard. Building an honest evaluation framework is the first step to knowing if you’ve actually succeeded.