A six-month side project · Feb → May 2026

AlphaLudo

A neural network that learned to play Ludo from scratch.

Three million parameters, six months of training, and dozens of experiments — now running entirely inside your browser. No server, no API calls, no waiting. Pick up the dice and see if you can beat it.

Inspired by AlphaGo · TD-Gammon · AlphaZero

6 months of iteration
31 experiments tried
3M parameters in the model
~80% wins vs scripted opponents
~52% wins vs the previous AlphaLudo (10,000-game test)

Interactive demo

Play AlphaLudo

You're Green (Player 0). The model is Yellow (Player 2). Click the dice to roll, then click a highlighted token to move.


Behind the model

Six months of iteration.

From a naive baseline that couldn't tell its own four tokens apart, to a 3-million-parameter network that beats every earlier version of itself. A short tour of how we got there — and what didn't work along the way.

From V1 to V13.2

The architecture timeline

  1. Feb 2026 · The first try · The naive baseline

    The model saw the board as eight stacked black-and-white maps — but it couldn't tell its own four tokens apart. They all collapsed into one blob, so the AI was guessing which piece to move. It lost a lot.

  2. Mar 2026 · Hand-holding · Engineered features

    We started feeding the network "tactical hints" we computed by hand — danger maps, capture opportunities, safe landing squares. It got better, but plateaued at 73–77% wins against scripted opponents and stopped improving.

  3. Apr 2026 · Attention · Breaking through the plateau

    Added a "token attention" layer — letting the network reason about its four tokens as separate entities, with awareness of how often each had been ignored. First model to consistently win >80% against scripted opponents.

  4. May 2026 · Current · Less is more

    We stripped most of the hand-engineered features back out and gave the network mostly raw board positions. It beats every earlier version of itself. Over 10,000 head-to-head games against the previous best AlphaLudo, this version wins about 52% of the time — a small but statistically real edge. This is the model you play against.
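
That "small but statistically real edge" is easy to sanity-check with a back-of-the-envelope normal approximation. A minimal sketch, assuming exactly 5,200 wins in 10,000 games (the ~52% above) and no draws:

```python
import math

games, wins = 10_000, 5_200   # the head-to-head result quoted above

# Under the null hypothesis (both versions equally strong, p = 0.5),
# the standard deviation of the win count is sqrt(n * p * (1 - p)).
null_sd = math.sqrt(games * 0.5 * 0.5)   # = 50 games

z = (wins - games * 0.5) / null_sd       # = 4.0 standard deviations
print(f"z = {z:.1f}")                    # ~4 sigma, p < 0.0001: a real edge
```

Four standard deviations above a coin flip is why 52% over 10,000 games counts as real, while 52% over 100 games would be indistinguishable from noise.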

End-to-end

How AlphaLudo learns

  1. 01 Bootstrap

    Generate millions of practice games between scripted bots — heuristic, aggressive, defensive, expert. The network learns by watching them play.

  2. 02 Imitate the best teacher

    The new student network is trained to copy the previous best AlphaLudo's decisions. By the end of this stage it already plays as well as the teacher.

  3. 03 Self-play reinforcement

    The student plays thousands of games against itself and various opponents, gradually adjusting its strategy to win more. Once it's consistently strong, we add the previous AlphaLudo versions back in as sparring partners.

  4. 04 Fix the bad habits

    Watching it play, we noticed specific failure modes — leaving a laggard token at base, walking into capture range. Reward penalties were added to discourage these, then tuned by trial and error (sketched in code after this list).

  5. 05 The honest test

    Win rate against scripted bots saturates around 80%, so it stops being useful as a measure. We compare versions directly — 10,000 games each, head to head. That's the only test that distinguishes the strongest models.
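
To make step 04 concrete, here is a minimal sketch of what shaped penalties like these can look like. The state fields, thresholds, and weights are illustrative assumptions, not AlphaLudo's actual values; the real ones were tuned by trial and error.

```python
from dataclasses import dataclass

@dataclass
class State:
    turn: int                      # moves played so far
    tokens_at_base: int            # own tokens still parked at base
    tokens_in_capture_range: int   # own tokens within one dice roll of an opponent

def shaped_penalty(s: State) -> float:
    """Negative reward added on top of the terminal win/loss signal.
    Every threshold and weight here is hypothetical."""
    penalty = 0.0
    if s.turn > 40 and s.tokens_at_base > 0:        # laggard token left at base
        penalty -= 0.01 * s.tokens_at_base
    penalty -= 0.02 * s.tokens_in_capture_range     # walked into capture range
    return penalty

s = State(turn=50, tokens_at_base=1, tokens_in_capture_range=2)
print(f"{shaped_penalty(s):.2f}")   # -0.05
```

Every term like this is also a potential systematic bias (see the first lesson below), which is why each one had to be re-tested after tuning.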

From the journal

Three lessons we won't unlearn

Failed

"Mathematically clean" rewards can be poison

An early reward-shaping scheme looked elegant on paper but quietly subtracted a tiny amount of reward every turn. Over a 150-move game it added up to about a fifth of a "loss" — the model became convinced every game was unwinnable. Took 155,000 games to figure out what was happening.

In long games, even tiny systematic biases compound. Always check what the reward looks like end-to-end.
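
The compounding is plain arithmetic. The per-turn leak below is an illustrative value chosen to match the "about a fifth of a loss" figure, with win = +1 and loss = -1:

```python
per_turn_leak = -0.00133    # small enough to look negligible in isolation
moves_per_game = 150

print(f"{per_turn_leak * moves_per_game:+.2f}")   # -0.20: a fifth of the way
                                                  # to "lost" before move one
```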

Worked

Loud rewards beat clean rewards

We tried scaling intermediate rewards 5× smaller, reasoning that the final win/loss signal should dominate. Win rate cratered from 67% to 33% over 125,000 games. In a dice game the random variance is so loud that quiet signals just get drowned out.

In stochastic games, intermediate rewards must be loud enough to cut through the dice noise — not just mathematically pretty.
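
One way to quantify "loud enough": the number of games needed to statistically resolve a reward signal grows with the square of the noise-to-signal ratio, so shrinking the signal 5× demands roughly 25× more games for the same confidence. The magnitudes below are illustrative, not measured:

```python
import math

def games_needed(signal: float, sigma: float = 1.0, z: float = 2.0) -> int:
    """Rule-of-thumb sample size to detect a mean reward difference
    `signal` against per-game return noise `sigma` at z sigmas."""
    return math.ceil((z * sigma / signal) ** 2)

print(games_needed(signal=0.05))       # 1600 games at full scale
print(games_needed(signal=0.05 / 5))   # 40000 games at 1/5 scale: 25x more
```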

Insight

The architecture isn't the bottleneck

We tried three completely different network designs — one with attention, one pure convolutional, one with no spatial structure at all. All three plateaued at the same 80–83% win rate. Whatever's holding us back, it isn't the shape of the model.

More parameters and fancier layers were never going to help. The ceiling lives somewhere else — probably in the training opponents we have access to.

Want to see the model in action?

▶ Play AlphaLudo

Inspirations & dead ends

The lineage.

AlphaLudo borrowed liberally — and abandoned an idea or two along the way. Here's the full reading list.

Inspiration · 2016 · DeepMind

AlphaGo

The whole project started here. AlphaLudo borrows the AlphaGo recipe almost wholesale — a network that predicts both the best move and how likely you are to win, trained first by imitating a strong teacher and then by playing millions of games against itself.

If you've never seen the documentary, watch it. It's still the best one-hour explanation of why this whole field exists.

1992 · IBM · Tesauro

TD-Gammon

The original "neural net plays a dice game at world-class level" result, written when most of the modern field didn't exist yet. Thirty years later, AlphaLudo rediscovered Tesauro's central lesson the hard way: in dice games, the small rewards along the way matter more than the final win/loss signal. Scale them down too far and learning collapses.

Read on Wikipedia →
2017 · Tried, then rejected

AlphaZero

The natural next thing to try after AlphaGo — but it didn't work for Ludo. AlphaZero's tree search assumes you can simulate the future of the game cleanly, but in Ludo every turn multiplies a six-way dice roll by up to four token choices, so any reasonable amount of search drowns in randomness.

Worth mentioning because the failure was instructive: pure tree-search doesn't survive in games with this much randomness.

Original paper →
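
The blow-up is easy to see on paper. Assuming standard Ludo rules (one six-sided die, up to four movable tokens), each ply of lookahead multiplies the tree by up to 24:

```python
# Back-of-the-envelope expectimax tree size for Ludo: each ply combines
# a six-way chance node (the die) with up to four legal token moves.
DICE_OUTCOMES, MAX_MOVES = 6, 4
branching = DICE_OUTCOMES * MAX_MOVES   # up to 24 branches per ply

for depth in (5, 10, 20):
    print(f"depth {depth:2d}: ~{branching ** depth:.1e} nodes")
# depth  5: ~8.0e+06 nodes
# depth 10: ~6.3e+13 nodes
# depth 20: ~4.0e+27 nodes
```

And unlike Go's deterministic branches, most of those are chance nodes, so the search budget goes into averaging over dice rolls rather than finding better moves.
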
2017 · Zaheer et al.

DeepSets

A 2017 idea that lets a neural network reason about a "set" of things — like the four tokens you control — without caring what order they're in. We used this for one experiment in AlphaLudo, building a much smaller network with no convolutional layers at all. It hit the same ceiling as the bigger models, which is what convinced us the model itself wasn't the limit.

arXiv:1703.06114 →
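
The core trick fits in a few lines: embed each token independently, pool with an order-free sum, then score each token against the pooled context. A minimal PyTorch sketch; the feature sizes are made up, not the experiment's actual dimensions:

```python
import torch
import torch.nn as nn

class TokenSetPolicy(nn.Module):
    """DeepSets-style head over the four tokens. Sizes are illustrative."""
    def __init__(self, token_features: int = 16, hidden: int = 64):
        super().__init__()
        # phi: applied to every token independently, with shared weights
        self.phi = nn.Sequential(
            nn.Linear(token_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        # rho: scores one token from its embedding plus the pooled summary
        self.rho = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, 4, token_features); token order must not matter
        h = self.phi(tokens)                    # per-token embeddings
        pooled = h.sum(dim=1, keepdim=True)     # sum pooling: order-free context
        scores = self.rho(torch.cat([h, pooled.expand_as(h)], dim=-1))
        return scores.squeeze(-1)               # one logit per token

logits = TokenSetPolicy()(torch.randn(8, 4, 16))   # shape (8, 4)
```

No convolutions anywhere, which is what made it a clean test of whether spatial structure was doing any work.
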
2017 · Schulman et al.

PPO

The reinforcement-learning algorithm doing the heavy lifting in every AlphaLudo run. PPO is the workhorse of modern RL — boring, reliable, well-understood. We didn't try to be clever with the optimiser; the interesting part of AlphaLudo is what we feed into it, not how we update the weights.

arXiv:1707.06347 →
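
For the curious, the heart of PPO is a single clipped objective; the rest is plumbing. A generic sketch of that loss (not AlphaLudo's training code):

```python
import torch

def ppo_policy_loss(new_logp: torch.Tensor,
                    old_logp: torch.Tensor,
                    advantage: torch.Tensor,
                    clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate objective from Schulman et al., 2017."""
    ratio = torch.exp(new_logp - old_logp)   # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    # Pessimistic minimum: large policy updates earn no extra credit,
    # which is what keeps the algorithm boring and reliable.
    return -torch.min(unclipped, clipped).mean()
```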

Project meta

About AlphaLudo.

A six-month side project on what it actually takes to learn Ludo from raw self-play. Built end-to-end — engine, training, mech-interp, and this site.

Runtime

How this page works

  • Game engine: hand-written C++ compiled to WebAssembly via Emscripten
  • Inference: ONNX Runtime Web (single-threaded WASM build)
  • Frontend: vanilla ES modules, no framework, no bundler in dev
  • Hosting: Cloudflare Pages (static), no server, no telemetry
  • Total payload: ~50 MB, dominated by the ONNX model + ORT runtime
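
For a feel of the inference step, here is a rough Python analog of what the page does in the browser, using the onnxruntime package instead of ONNX Runtime Web. The filename and input shape are placeholders, not the site's actual values:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("alphaludo.onnx",   # placeholder filename
                               providers=["CPUExecutionProvider"])

board = np.zeros((1, 8, 15, 15), dtype=np.float32)   # one encoded position;
                                                     # channel count is a guess
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: board})     # the model's output heads

print([o.shape for o in outputs])
```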

Model

What the AI is

  • ~3 million parameters, all running locally on your machine
  • Convolutional network — looks at the board as a 15×15 image with extra channels for token positions and dice value
  • Three outputs: which token to move, an estimate of who's winning, and how long the game has left
  • Trained by imitating an earlier AlphaLudo and then sharpening through self-play
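
Putting those bullets together, a minimal PyTorch sketch of a network with that interface. Only the 15×15 input and the three heads come from the description above; channel counts and layer sizes are guesses:

```python
import torch
import torch.nn as nn

class AlphaLudoNet(nn.Module):
    """Sketch of the described interface: board image in, three heads out."""
    def __init__(self, in_channels: int = 8, width: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        flat = width * 15 * 15
        self.policy = nn.Linear(flat, 4)   # which of the four tokens to move
        self.value = nn.Linear(flat, 1)    # estimate of who is winning
        self.length = nn.Linear(flat, 1)   # how long the game has left

    def forward(self, board: torch.Tensor):
        h = self.trunk(board)              # board: (batch, in_channels, 15, 15)
        return self.policy(h), self.value(h), self.length(h)

policy, value, length = AlphaLudoNet()(torch.randn(1, 8, 15, 15))
```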

By the numbers

Six months of training, summarised

~14M games of self-play across all versions
~10M teacher games used for imitation
31 labelled experiments
14 distinct architectures tried
8 generations of input encoder
3 major dead ends documented

Privacy

What we collect

Nothing. There is no backend. Your moves never leave your browser. The page loads Google Fonts and (on the Lineage page) one YouTube embed via youtube-nocookie.com; that's the only third-party traffic. No analytics, no cookies, no telemetry.

Ready?

Play the model

The network is already loaded. Pick up the dice.

▶ Play AlphaLudo