GitHub Pages Demo

Tic-Tac-Toe RL Arena

Play against a tiny policy network trained in this repo. The default mode uses the stronger network that was trained with MCTS-guided action selection.

Match Setup

You 0
Draws 0
AI 0

To stay responsive, the browser demo picks moves by direct network inference. The search used during training is not run here.
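
As a rough illustration of what direct inference can look like in the browser, here is a minimal TypeScript sketch. Everything in it is an assumption made for the example rather than this repo's actual code: the `PolicyWeights` layout (one ReLU hidden layer), the board encoding (1 for the AI, -1 for the opponent, 0 for empty), and the `chooseMove` helper are all hypothetical. It runs one forward pass and takes the highest-scoring legal cell, with no search.

```typescript
// Hypothetical weight layout: a single hidden layer with ReLU.
// The real exported weight format may differ.
interface PolicyWeights {
  w1: number[][]; // shape [hidden][9]
  b1: number[];   // shape [hidden]
  w2: number[][]; // shape [9][hidden]
  b2: number[];   // shape [9]
}

// Assumed board encoding: 1 = AI, -1 = opponent, 0 = empty.
type Cell = 1 | -1 | 0;

// Dense layer: returns w * x + b.
function matVec(w: number[][], x: number[], b: number[]): number[] {
  return w.map((row, i) =>
    row.reduce((acc, wij, j) => acc + wij * x[j], b[i])
  );
}

// One forward pass, then a greedy pick over the empty cells only.
// Returns the chosen cell index (0..8), or -1 if the board is full.
function chooseMove(board: Cell[], weights: PolicyWeights): number {
  const hidden = matVec(weights.w1, board, weights.b1).map((v) =>
    Math.max(0, v)
  );
  const logits = matVec(weights.w2, hidden, weights.b2);

  let best = -1;
  for (let i = 0; i < 9; i++) {
    if (board[i] !== 0) continue; // mask illegal (occupied) cells
    if (best === -1 || logits[i] > logits[best]) best = i;
  }
  return best;
}
```

Masking occupied cells before taking the maximum keeps the network from ever returning an illegal move, which matters more when no search is there to filter the policy output.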