TomBolton.io

Tom Bolton’s AI and Machine Learning Lab Notebook.

Machine Learning

Endgame

I have been considering where this effort has brought me and where it might go. When I was selecting a project to work on, the first option I considered, just because it was absurdly simple, was Tic Tac Toe. After just a little bit of thought, that turned out to be too simple. There are significantly fewer than 20,000 board positions, and the best way to “learn” how to play would be to keep a complete list of all possible game states and record the optimal move for each position. Game over.
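The lookup-table idea is easy to make concrete. Here is a minimal sketch (not from the original project) that exhaustively solves Tic Tac Toe with plain minimax, memoizing every reachable state along with its best move. The state encoding and function names are my own invention for illustration; the point is just that the entire table fits comfortably in memory.

```python
# Sketch: tabulating all of Tic Tac Toe, as described above.
# A board is a tuple of 9 cells: 'X', 'O', or ' '. The memo maps each
# reachable state to (value for the player to move, best move index),
# where value is +1 for a forced win, 0 for a draw, -1 for a forced loss.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

memo = {}  # (board, player) -> (value, best_move)

def solve(board, player):
    key = (board, player)
    if key in memo:
        return memo[key][0]
    if winner(board) is not None:
        # The previous player just completed a line, so the side to move lost.
        memo[key] = (-1, None)
        return -1
    if ' ' not in board:
        memo[key] = (0, None)  # full board, no winner: draw
        return 0
    other = 'O' if player == 'X' else 'X'
    best_val, best_move = -2, None
    for i, cell in enumerate(board):
        if cell == ' ':
            child = board[:i] + (player,) + board[i + 1:]
            val = -solve(child, other)  # opponent's loss is our gain
            if val > best_val:
                best_val, best_move = val, i
    memo[key] = (best_val, best_move)
    return best_val

result = solve((' ',) * 9, 'X')
print(result, len(memo))
```

Running this confirms both claims: the game-theoretic value from the empty board is 0 (a draw with perfect play), and the full table of reachable states is only a few thousand entries, nowhere near the 20,000 ceiling.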

At the time, I chose checkers because it was clear that the number of possible board states was far too high to be practical to store in a table. It turns out that there are approximately \({10}^{20}\) possible game states — definitely high enough to make a table impractical, and so a game for which employment of a neural net seemed worthwhile. Additionally, it was a game that, while large and complex enough to warrant an AI solution, was not so large and complex that making a “successful” AI was beyond what I could accomplish, given both my level of experience and expertise at the time and my compute budget. Choosing either Chess or Go felt beyond what I could “succeed” at. But what counts as “success?”

Over the past couple of months, I have made significant strides in building an AI that has some skill at Checkers. Thanks to the UI that I recently built that enables me to play against my models, I know that some of the recent models I have built are non-trivial to defeat. I am often forced into a draw. Here’s the story the statistics are telling: my model can typically win 99–100% of matches against its prior version, and often against earlier versions as well. That sounds great until you consider that any of my models, no matter how well trained against improved versions of itself, still loses about 10% of matches against random moves.

One thing I have noticed is that the models, when playing against prior versions of themselves, seem to be tuning their play to the particulars of the prior model’s move distribution rather than learning more generalizable techniques and strategies to win. In many scenarios, a model playing against a prior version of itself ends up playing literally the same game over and over again. In looking at the win/loss feed as any batch of 500 games unfolds, it is not uncommon to see 50–60 games all ending with exactly the same move count and exactly the same number of pieces on the board for both sides. I have not tried to look at the actual endgame board positions for all those games, but outcomes like that are never seen when playing against random moves. It is apparent that the diversity of games for trained models playing other versions of trained models is quite low.
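That kind of collapse can be flagged automatically by counting duplicate game “signatures” in a batch. The sketch below is hypothetical (the signature fields and the fake batch are mine, not from the training loop): it groups games by (winner, move count, pieces remaining per side) and reports how large the biggest cluster is.

```python
# Hypothetical sketch: detecting low self-play diversity by counting
# duplicate outcome signatures within a batch of games.
from collections import Counter

def diversity_report(games):
    """games: iterable of (winner, move_count, my_pieces, opp_pieces) tuples."""
    sigs = Counter(games)
    total = sum(sigs.values())
    _, largest_cluster = sigs.most_common(1)[0]
    return {
        "total": total,
        "unique": len(sigs),              # distinct signatures seen
        "unique_ratio": len(sigs) / total,  # 1.0 would mean every game differed
        "largest_cluster": largest_cluster,  # size of the biggest repeat group
    }

# A degenerate batch, shaped like the observation above: 55 of 100 games
# are move-for-move repeats of a single line of play.
batch = [("model", 48, 5, 0)] * 55 + [("model", 30 + i, 4, 1) for i in range(45)]
report = diversity_report(batch)
print(report)
```

Identical signatures don't strictly prove identical games, but a cluster of 50+ games with the same move count and material is a strong enough signal to investigate, and cheap enough to compute on every batch.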

Considering all this, I decided to research the qualitative and quantitative nature of Checkers versus other games like Chess and Go. I was not surprised to find that Checkers is a much simpler game than either Chess or Go. As I mentioned, estimates for the total number of board positions in Checkers are typically around \({10}^{20}\). For comparison, estimates for Chess range from \({10}^{40}\) to \({10}^{50}\). That is vastly more complex: for each individual board position in Checkers, Chess has a number of board positions equal to the total number of board positions in all of Checkers. And the number of board positions in Go is a mind-blowing \({10}^{170}\). That is far more than all the atoms in the universe, which is typically estimated at \({10}^{80}\). So much more that if every atom in the universe, instead of being one atom, was itself an entire universe’s worth of atoms, that would still be less than the number of possible board states in Go.

The qualitative differences are more interesting and more relevant to me, though. The most crucial one is that Checkers is considered a solved game: it has been proven that perfect play on both sides leads to a draw. That fact matters most for my purposes. It means that getting to an ideal state – an AI that plays perfect checkers – will lead not to an AI that wins every game, but rather an AI that never loses a game.
