The Five-year Memory Hole
With the last update, my objective assessment of the model’s progress was that it could not beat a beginner checkers player (me) and that I had a lot of work to do. The model is comically simple: a five-layer fully-connected network (a multilayer perceptron) that outputs only a policy distribution. It is not a convolutional network, and it does not run in-game simulations from individual board positions or have a value output. So I had every reason to keep my expectations low. Subjectively, though, I was still disappointed with its performance.
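For the record, the architecture is roughly this shape. This is a simplified sketch, not the actual code; the framework (PyTorch), layer widths, input size, and move-space size are all placeholders:

```python
# Simplified sketch of a five-layer fully-connected policy network.
# The input size (a flattened board encoding), hidden widths, and the
# size of the move space are illustrative placeholders only.
import torch
import torch.nn as nn

class PolicyMLP(nn.Module):
    def __init__(self, input_size=128, hidden=256, num_moves=256):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_moves),  # fifth layer: move logits
        )

    def forward(self, x):
        # Returns a probability distribution over moves; no value head,
        # and no search on top of it.
        return torch.softmax(self.layers(x), dim=-1)
```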
It turns out, though, that my model was using the wrong inputs. It’s not that I just realized there are better inputs than what I built into the model. What seems to have happened five years ago, before I stopped working on it, is that I swapped the correct inputs for something else, something wrong for learning how to win games.
For some reason, I had taken out one of the most important inputs to the model: the full board state. Instead, all I was feeding the model was the “view” from each piece to the eight possible moves. So essentially, I was training the model with no visibility into exactly what pieces were on the board and where. Given that, I’m impressed it was able to learn as much as it did.
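To make the difference concrete, here is a rough sketch of the two kinds of inputs, assuming the standard 32 playable squares of a checkers board. The actual encodings are more involved, and the square-offset arithmetic below is deliberately simplified; the contrast between the two is the point:

```python
# Simplified sketch of the two input styles (not the actual encoding).
# Board: 32 playable squares; values: 0 empty, +1/-1 men, +2/-2 kings.

def full_board_input(board):
    # Full board state: the model sees every piece and where it sits.
    return list(board)  # length-32 vector

def per_piece_view_input(board, piece_square, move_offsets):
    # What the broken model got instead: only what one piece "sees" in
    # the direction of each of its candidate moves (occupied or not),
    # with no global picture of the board. Offset arithmetic here is a
    # simplification of real checkers square indexing.
    view = []
    for offset in move_offsets:
        target = piece_square + offset
        view.append(board[target] if 0 <= target < 32 else None)
    return view
```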
My motivation for doing this is lost to the memory hole.
But now I’ve reinstated the original input, which includes the full board state. Qualitatively, the new model is a more challenging opponent, but I still don’t have much trouble beating it. This time around, I also gathered more comprehensive evaluation statistics on how each bootstrap version of the model performs against every other bootstrap version. It’s hard to know for sure, but the data indicates that the incremental gains past bootstrap version 4 are negligible.
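The evaluation amounts to a round-robin tournament between the bootstrap versions. A sketch of the idea, where the game-playing function is a stand-in for the real harness rather than my actual code:

```python
# Sketch of round-robin evaluation between bootstrap versions.
# play_game(a, b) is a stand-in for the real self-play harness: it
# should return +1 if a wins, -1 if b wins, 0 for a draw.
from itertools import combinations

def round_robin(models, play_game, games_per_pair=100):
    results = {}
    for (i, a), (j, b) in combinations(enumerate(models), 2):
        score = 0
        for g in range(games_per_pair):
            # Alternate which model moves first to remove first-move bias.
            if g % 2 == 0:
                score += play_game(a, b)
            else:
                score -= play_game(b, a)
        results[(i, j)] = score / games_per_pair  # mean score of version i vs j
    return results
```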
It is now time to up the game with a convolutional model and, if I can figure it all out, a training and play algorithm that runs simulations over candidate moves, along with a value output in addition to the policy.