by ABXK.AI AI Trading

AI Trading Platform Update: Building a Bulletproof Evaluation Framework

machine-learning, trading, backtesting, python, evaluation

What Changed Since December 9th?

In our last post, we celebrated improving our win rate from 16.6% to 20.1%. But we kept asking ourselves: Are these results real, or are we fooling ourselves?

We spent the past two days answering that question. The answer: We were partly wrong.

Walk-forward evaluation pipeline with 13 testing windows (training and OOS testing phases, cost model breakdown, and OOS results), used to validate trading strategies on unseen data.

The Problem We Discovered

Our previous backtesting approach had small but important problems:

  1. No out-of-sample testing — We tested on the same data we trained on (the AI had already seen this data)
  2. Missing costs — We ignored trading fees, spread, and slippage
  3. Small samples — We made decisions based on 10–20 trades
  4. No reproducibility — Results changed between runs, so we couldn’t repeat tests

What We Built (Dec 9–11)

1. Walk-Forward Evaluation System

We built a time-series evaluation system that makes sure training data and testing data never mix:

  • 13 independent evaluation windows
  • 168 true out-of-sample trades (data the AI never saw during training)
  • Zero overlap between training and testing data

The system moves through time, training only on past data and testing on future data — just like real trading.
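Here is a minimal sketch of how such windows can be generated. It is not the platform's actual `walk_forward.py`. It assumes bars are indexed by position, and `walk_forward_windows`, `train_size`, `test_size`, and `step` are hypothetical names, since the post does not give the real window lengths. Each test slice begins exactly where its training slice ends, and with `step` equal to `test_size` the test slices never overlap.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Window:
    train_start: int  # first training bar (inclusive)
    train_end: int    # end of training (exclusive); equals test_start
    test_start: int   # first out-of-sample bar (inclusive)
    test_end: int     # end of out-of-sample slice (exclusive)


def walk_forward_windows(n_bars: int, train_size: int, test_size: int,
                         step: Optional[int] = None) -> Iterator[Window]:
    """Roll forward through time: train on past bars, test on the bars that follow.

    With the default step == test_size, consecutive test slices are adjacent
    and never overlap, so each bar is tested out-of-sample at most once.
    """
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_bars:
        train_end = start + train_size
        yield Window(start, train_end, train_end, train_end + test_size)
        start += step


# Illustrative sizes only; the real window lengths are not given in the post.
windows = list(walk_forward_windows(n_bars=1000, train_size=300, test_size=50))
```

With these illustrative numbers the generator yields 14 windows. Later windows train on bars that earlier windows tested, which is fine: what matters is that no window ever trains on its own test bars.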

2. Realistic Cost Model

Every trade now accounts for real trading costs:

| Asset | Fees | Spread | Slippage | Total |
|---|---|---|---|---|
| Stock | 0.10% | 0.05% | 0.05% | ~0.20% |
| Crypto | 0.20% | 0.10% | 0.15% | ~0.45% |
| Forex | 0.02% | 0.08% | 0.03% | ~0.13% |

The backtester now tracks both gross (before costs) and net (after costs) profit and loss separately.
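The gross/net split could be implemented roughly as follows. This is an illustrative sketch. It assumes each row's Total is charged once per round-trip trade and subtracted from the trade's gross percentage return. `COSTS`, `TradeResult`, and `apply_costs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict

# Cost assumptions in percent, copied from the table above.
# Assumption: each asset's total is charged once per round-trip trade.
COSTS: Dict[str, Dict[str, float]] = {
    "stock":  {"fees": 0.10, "spread": 0.05, "slippage": 0.05},
    "crypto": {"fees": 0.20, "spread": 0.10, "slippage": 0.15},
    "forex":  {"fees": 0.02, "spread": 0.08, "slippage": 0.03},
}


@dataclass
class TradeResult:
    result_pct_gross: float   # PnL before costs, in percent
    costs: Dict[str, float]   # cost breakdown plus "total", in percent
    result_pct_net: float     # PnL after costs, in percent


def apply_costs(asset_class: str, result_pct_gross: float) -> TradeResult:
    """Subtract the asset class's total cost from a trade's gross % return."""
    breakdown = dict(COSTS[asset_class])
    breakdown["total"] = sum(breakdown.values())
    return TradeResult(result_pct_gross, breakdown,
                       result_pct_gross - breakdown["total"])


# A 0.30% gross gain on crypto turns into a net loss of about 0.15%.
trade = apply_costs("crypto", 0.30)
```

The example shows a trade that counts as a win before costs and a loss after them.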

3. Minimum Trade Guards

The system now refuses to report statistics without enough data:

```python
MIN_TRADES_FOR_STATS = 30       # Basic statistics
MIN_TRADES_FOR_CONFIDENCE = 50  # Statistical significance
```

No more making decisions based on 15 trades.
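A guard like this could sit in front of every statistic. In the sketch below, `InsufficientTradesError`, `win_rate`, and `is_significant_sample` are hypothetical names. Raising an error instead of returning a number means a 15-trade sample cannot silently feed a decision.

```python
from typing import Sequence

MIN_TRADES_FOR_STATS = 30       # Basic statistics
MIN_TRADES_FOR_CONFIDENCE = 50  # Statistical significance


class InsufficientTradesError(ValueError):
    """Raised when a statistic is requested on too few trades (hypothetical)."""


def win_rate(results: Sequence[float]) -> float:
    """Fraction of trades with a positive net result; refuses small samples."""
    if len(results) < MIN_TRADES_FOR_STATS:
        raise InsufficientTradesError(
            f"need >= {MIN_TRADES_FOR_STATS} trades, got {len(results)}")
    return sum(r > 0 for r in results) / len(results)


def is_significant_sample(results: Sequence[float]) -> bool:
    """True once there are enough trades to talk about significance."""
    return len(results) >= MIN_TRADES_FOR_CONFIDENCE
```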

4. Comprehensive Metrics Suite

A new metrics module provides:

  • Wilson confidence intervals (a range for the true win rate that stays reliable even with small samples)
  • Expectancy calculation (average profit per trade)
  • Profit Factor (gross profit divided by gross loss)
  • Sharpe and Sortino ratios (risk-adjusted returns; Sortino only penalizes downside volatility)
  • Maximum drawdown tracking (largest peak-to-trough decline)
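These metrics can be sketched in plain Python. This is not the contents of the platform's `metrics.py`, and the function names are hypothetical. Sharpe and Sortino are computed per trade with a zero risk-free rate and no annualization. Drawdown is measured on an additive equity curve built from per-trade returns.

```python
import math
import statistics
from typing import Sequence, Tuple


def wilson_interval(wins: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate; z=1.96 gives ~95% confidence."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin


def expectancy(returns: Sequence[float]) -> float:
    """Average net result per trade."""
    return statistics.mean(returns)


def profit_factor(returns: Sequence[float]) -> float:
    """Gross profit / gross loss; below 1.0 the system loses money."""
    gross_profit = sum(r for r in returns if r > 0)
    gross_loss = -sum(r for r in returns if r < 0)
    return math.inf if gross_loss == 0 else gross_profit / gross_loss


def sharpe_ratio(returns: Sequence[float]) -> float:
    """Per-trade Sharpe: mean / sample std (risk-free rate 0, not annualized)."""
    std = statistics.stdev(returns)
    return statistics.mean(returns) / std if std > 0 else math.nan


def sortino_ratio(returns: Sequence[float]) -> float:
    """Like Sharpe, but only returns below zero count as risk (downside deviation)."""
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    return statistics.mean(returns) / downside if downside > 0 else math.inf


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough drop of the cumulative (summed) per-trade returns."""
    equity = peak = worst = 0.0
    for r in returns:
        equity += r
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst
```

For example, `wilson_interval(35, 168)` returns roughly (0.154, 0.276). Thirty-five wins is what a 20.8% win rate on 168 trades works out to. So even 168 out-of-sample trades leave a wide band around the true win rate.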

5. Reproducibility Framework

Every experiment now logs:

  • Git commit hash and branch
  • Random seeds (Python, NumPy, PyTorch)
  • Full configuration
  • Results with timestamps
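Below is a sketch of the seed handling and experiment log. It assumes a JSON-lines file as the storage format, which the post does not specify. `set_seeds`, `git_info`, and `log_experiment` are hypothetical names. PyTorch is seeded only if it is installed, and the Git fields fall back to `"unknown"` outside a repository.

```python
import json
import random
import subprocess
from datetime import datetime, timezone
from typing import Optional

import numpy as np


def set_seeds(seed: int) -> None:
    """Seed Python, NumPy and (if installed) PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)


def git_info() -> dict:
    """Current commit hash and branch, or "unknown" outside a Git checkout."""
    def run(*args: str) -> str:
        try:
            out = subprocess.run(["git", *args], capture_output=True,
                                 text=True, check=True)
            return out.stdout.strip()
        except (OSError, subprocess.CalledProcessError):
            return "unknown"
    return {"commit": run("rev-parse", "HEAD"),
            "branch": run("rev-parse", "--abbrev-ref", "HEAD")}


def log_experiment(config: dict, results: dict, seed: int,
                   path: Optional[str] = None) -> dict:
    """Build one experiment record; append it as a JSON line if a path is given."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "git": git_info(),
        "seed": seed,
        "config": config,
        "results": results,
    }
    if path is not None:
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
    return record
```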

6. Parameter Optimization Tools

  • Stop-Loss Sweep — Test ATR multipliers systematically
  • Regime Detection — Classify market conditions
  • Ablation Harness — Compare model configurations fairly
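A stop-loss sweep over ATR multipliers might look like the following sketch. Everything in it is an assumption for illustration:

- ATR is a simple 14-bar moving average of the true range (Wilder's smoothing is another common choice).
- Trades are long-only entries at the close.
- The target sits at `reward_ratio` times the stop distance.
- Trades time out after `max_bars`.

`atr`, `simulate_long`, and `sweep` are hypothetical names. For brevity, the sketch ignores costs and walk-forward splits.

```python
import numpy as np


def atr(high: np.ndarray, low: np.ndarray, close: np.ndarray, period: int = 14) -> np.ndarray:
    """ATR as a simple moving average of the true range; NaN until `period` bars exist."""
    prev_close = np.concatenate(([close[0]], close[:-1]))
    true_range = np.maximum.reduce([high - low,
                                    np.abs(high - prev_close),
                                    np.abs(low - prev_close)])
    csum = np.cumsum(true_range)
    out = np.full(len(true_range), np.nan)
    out[period - 1:] = (csum[period - 1:] - np.concatenate(([0.0], csum[:-period]))) / period
    return out


def simulate_long(high, low, close, entries, atr_values, mult,
                  reward_ratio=2.0, max_bars=20):
    """Per-trade gross % returns for long entries at the close with a mult x ATR stop."""
    results = []
    last = len(close) - 1
    for i in entries:
        if np.isnan(atr_values[i]):
            continue  # not enough history for ATR yet
        entry = close[i]
        risk = mult * atr_values[i]
        stop, target = entry - risk, entry + reward_ratio * risk
        exit_price = close[min(i + max_bars, last)]  # timeout exit unless stop/target hit
        for j in range(i + 1, min(i + max_bars, last) + 1):
            if low[j] <= stop:  # stop checked first when a bar touches both levels
                exit_price = stop
                break
            if high[j] >= target:
                exit_price = target
                break
        results.append(100.0 * (exit_price - entry) / entry)
    return results


def sweep(high, low, close, entries, multipliers=(1.0, 1.5, 2.0, 2.5, 3.0)):
    """Mean gross % return per trade for each ATR stop multiplier."""
    atr_values = atr(high, low, close)
    summary = {}
    for m in multipliers:
        r = simulate_long(high, low, close, entries, atr_values, m)
        summary[m] = float(np.mean(r)) if r else float("nan")
    return summary
```

When a bar touches both the stop and the target, the sketch assumes the stop was hit first. That is the conservative choice for a backtest that only sees bar highs and lows.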

7. Database Migration

We added new fields to track costs and trade outcomes:

```
result_pct_gross  -- PnL before costs
costs_json        -- Cost breakdown
exit_reason       -- stopped/target/timeout
bars_held         -- Trade duration
```
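A migration for these fields could be as small as the following. The table name `trades`, the SQL types, and the use of SQLite are assumptions for illustration; the post only lists the column names. Checking `PRAGMA table_info` first makes the migration safe to run more than once.

```python
import sqlite3
from typing import List

# Hypothetical SQL types; the post only names the columns.
NEW_COLUMNS = {
    "result_pct_gross": "REAL",  # PnL before costs
    "costs_json": "TEXT",        # cost breakdown serialized as JSON
    "exit_reason": "TEXT",       # stopped / target / timeout
    "bars_held": "INTEGER",      # trade duration in bars
}


def migrate(conn: sqlite3.Connection, table: str = "trades") -> List[str]:
    """Add any missing columns and return their names (table must be a trusted constant)."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY)")
print(migrate(conn))  # ['result_pct_gross', 'costs_json', 'exit_reason', 'bars_held']
print(migrate(conn))  # [] (second run is a no-op)
```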

The Honest Results

With proper out-of-sample evaluation:

| Metric | Previous Claim | Actual OOS |
|---|---|---|
| Win Rate | 20.1% | 20.8% |
| Profit Factor | ~1.2 (estimated) | 0.92 |
| Sharpe Ratio | Not measured | -0.46 |
| Total OOS Trades | n/a | 168 |

The truth: while our win rate held up, we're actually losing money once costs are included. A profit factor of 0.92 means that for every 1.00 lost, only 0.92 is won back.

New Modules Created

```
src/
├── reproducibility.py    # Seed management, experiment logging
├── metrics.py            # Trading metrics with confidence intervals
├── walk_forward.py       # Time-series evaluation pipeline
├── ablation.py           # Model comparison harness
├── regime.py             # Market regime classification
└── stop_loss_sweep.py    # ATR parameter optimization

tests/
└── test_core.py          # 20 unit tests (all passing)
```

Tests: All Passing

```
$ pytest tests/test_core.py -v
======================== 20 passed in 1.60s ========================
```

Tests cover:

  • Cost model calculations
  • Wilson confidence intervals
  • Walk-forward window generation
  • OOS data isolation
  • Trade result conversion
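To illustrate what an OOS-isolation test can look like, here is a pytest-style check against a hypothetical `split_by_time` helper. This is not the project's actual `test_core.py`. The test requires that every training timestamp is strictly earlier than every test timestamp, and that no bar is shared or dropped.

```python
from datetime import datetime, timedelta


def split_by_time(timestamps, cutoff):
    """Hypothetical helper: train = strictly before cutoff, test = at or after cutoff."""
    train = [t for t in timestamps if t < cutoff]
    test = [t for t in timestamps if t >= cutoff]
    return train, test


def test_oos_isolation():
    start = datetime(2024, 1, 1)
    stamps = [start + timedelta(hours=h) for h in range(100)]
    train, test = split_by_time(stamps, cutoff=start + timedelta(hours=70))
    assert train and test
    assert max(train) < min(test)                 # no look-ahead
    assert set(train).isdisjoint(test)            # no shared bars
    assert len(train) + len(test) == len(stamps)  # nothing dropped
```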

What This Means

Bad news: We’re not profitable yet (Profit Factor 0.92).

Good news: We now have the tools to find real edge:

  1. Every future improvement will be validated out-of-sample
  2. Costs are included from the start
  3. Results are reproducible
  4. Statistics are meaningful

Next Steps

  1. Improve signal generation (the current 20.8% win rate needs to exceed ~33% to be profitable at a 2:1 reward/risk ratio, and somewhat more once costs are included)
  2. Test different indicator combinations
  3. Filter by market regime
  4. Explore longer time horizons
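The ~33% figure follows from the expectancy formula. Assume every winner pays exactly 2R and every loser costs exactly 1R (an idealization; real trades vary). Then expectancy per trade is p * 2 - (1 - p), which is zero at p = 1/3. A per-trade cost expressed in R raises the bar further. The 0.1R cost below is a hypothetical value, not one measured by the platform, and the function names are illustrative.

```python
def expectancy_r(win_rate: float, reward_to_risk: float, cost_r: float = 0.0) -> float:
    """Expected result per trade in units of risk (R), assuming fixed-size wins and losses."""
    return win_rate * reward_to_risk - (1.0 - win_rate) - cost_r


def breakeven_win_rate(reward_to_risk: float, cost_r: float = 0.0) -> float:
    """Win rate where expectancy_r is zero: p = (1 + cost_r) / (1 + reward_to_risk)."""
    return (1.0 + cost_r) / (1.0 + reward_to_risk)


print(f"{breakeven_win_rate(2.0):.1%}")       # 33.3% before costs
print(f"{breakeven_win_rate(2.0, 0.1):.1%}")  # 36.7% with a hypothetical 0.1R cost per trade
```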



Building a profitable trading system is hard. Building an honest evaluation framework is the first step to knowing if you’ve actually succeeded.

⚠️ Important Notice

The AI Trading Platform is an internal research project operated exclusively by ABXK.AI. It is not publicly accessible and cannot be used by visitors.

Any results, insights, or examples shared on this website or on social media are provided for informational and educational purposes only and do not constitute financial advice.