For a long time I’ve wanted to get deeper into reinforcement learning (RL), and the project I finally settled on is teaching a neural network model how to play the classic game Connect 4 (pretty sneaky, sis!). Obviously, the name “Connect-Zero” is a cheeky nod to AlphaGo Zero and AlphaZero by DeepMind. I chose Connect 4 because it’s a simple game that everyone knows how to play, and one where we can hope to achieve good results without expensive hardware or high training costs.

Some ground rules:
- The neural network model gets nothing but the current board state as a 6x7 grid and outputs a probability distribution over the seven columns, indicating which move to play next (see the minimal sketch after this list).
- Like AlphaZero, it doesn’t train on any human games, but only on computer-generated ones.
- Unlike AlphaZero, it doesn’t do any tree search to find the best moves, for two reasons: simplicity, and because I was curious to see how far you could get with a simple move predictor. In fact, there’s precedent showing that even for Chess, a vastly more complicated game, this can work well: see the DeepMind paper Grandmaster-Level Chess Without Search, which trained a pure next-move predictor for Chess. However, they used Stockfish as a teacher for their model, rather than self-play. I also strongly suspect that Connect 4 is shallow enough that tree search might trivialize it, and that wouldn’t leave much room for any interesting RL.
- Everything is built from scratch in PyTorch, without relying on any existing RL framework. The point is to learn the techniques, not just to get to the end result.
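
To make the first ground rule concrete, here is a minimal sketch of what such a move predictor could look like in PyTorch: a tiny fully connected network that maps a 6x7 board to seven logits, one per column. The class name, layer sizes, and the +1/−1/0 board encoding are all placeholder assumptions for illustration, not the actual Connect-Zero model.

```python
import torch
import torch.nn as nn

class TinyConnect4Policy(nn.Module):
    """Toy move predictor: 6x7 board in, one logit per column out."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),              # (batch, 6, 7) -> (batch, 42)
            nn.Linear(6 * 7, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 7),      # seven logits, one per column
        )

    def forward(self, board: torch.Tensor) -> torch.Tensor:
        # board: (batch, 6, 7) float tensor, e.g. +1 for the current
        # player's pieces, -1 for the opponent's, 0 for empty cells.
        return self.net(board)

# Pick a move by sampling from the distribution over columns.
model = TinyConnect4Policy()
board = torch.zeros(1, 6, 7)           # empty board
logits = model(board)
move = torch.distributions.Categorical(logits=logits).sample()  # column 0-6
```

Sampling from the logits, rather than always taking the argmax, is one simple way to keep computer-generated games varied; how moves are actually selected here is something the later posts will spell out.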
You can actually play against a version of the current model right now! It will take a couple of blog posts until we catch up to that level, though.
In the next post, we’ll specify how our models interact with the board and set up a basic gameplay loop.