A Checkers UI to Develop Intuition

After seeing unexpected behavior from the model in a bootstrap setting, I decided it was important to evaluate the performance of the different versions of the model against one another. So I set up an evaluation framework in which I saved the parameters of each bootstrapped version, so that once I had a few (or many, in the case I illustrated in the last post), I could compare the performance of each not just against itself as it was trained, but against any and all prior models. This led me down a couple of paths, which I will discuss here.

As I was gearing up to build the evaluation framework, I had a realization that I captured in the update: I suspected that because I had not tweaked the learning rate at all as training progressed, what I was seeing was the result of aggressive overfitting. So, in the spirit of parallel processing and not wasting time, while I was setting up the evaluation framework I followed through on my speculation and adjusted the bootstrap framework to halve the learning rate with each successive bootstrap version, and also to capture the state of each new bootstrap version for later evaluation. The results were significantly different and much more like what I was expecting at the outset.
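To make that concrete, here is a rough sketch of the adjusted bootstrap loop, assuming a PyTorch-style model. CheckersNet and train_one_bootstrap are hypothetical stand-ins for my actual network and self-play training code, and the starting learning rate is just a placeholder.

```python
import torch

BASE_LR = 1e-3           # assumed starting learning rate (not stated above)
NUM_BOOTSTRAPS = 7       # bootstrap versions 1-7, plus the random v0 baseline

model = CheckersNet()    # hypothetical stand-in for my fully-connected net
torch.save(model.state_dict(), "bootstrap_v0.pt")    # v0: untrained, essentially random play

for version in range(1, NUM_BOOTSTRAPS + 1):
    lr = BASE_LR / (2 ** (version - 1))              # halve the learning rate each new version
    train_one_bootstrap(model, learning_rate=lr)     # hypothetical self-play training round
    # Capture this version's parameters so later versions can be evaluated against it.
    torch.save(model.state_dict(), f"bootstrap_v{version}.pt")
```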

There were a few bootstrap versions that, counter to my intuition and expectation, learned faster than the prior version. But the overall trend was for the model to learn more slowly with each version.

I used these seven versions for the evaluation framework. In its initial form (which I have not yet updated), the most recent – and presumably best – bootstrap version plays 1,000 games against each of the prior bootstrap versions. I did not graph this, but merely looked at the results. The performance was not what I expected. In general, the model performed better against lower bootstrap versions, but not nearly to the extent that I expected. For instance, after the one million games of cumulative training that bootstrap version 7 had, I would have expected it to win 99% of games – not counting draws – against bootstrap version 0, which plays essentially random moves. It did not perform that well, winning fewer than 90% of games. That compares to the roughly 76% of games it won against the prior version of itself, version 6.
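In code terms, the initial evaluation loop looks roughly like the sketch below. load_version and play_game are hypothetical stand-ins for loading a saved bootstrap version and running one headless game; play_game is assumed to return +1, -1, or 0 for a win, loss, or draw by the first player.

```python
NUM_GAMES = 1000

latest = load_version(7)                       # hypothetical loader for a saved version
for version in range(7):                       # every prior version, v0 through v6
    opponent = load_version(version)
    wins = losses = draws = 0
    for _ in range(NUM_GAMES):
        result = play_game(latest, opponent)   # headless game, no UI
        if result > 0:
            wins += 1
        elif result < 0:
            losses += 1
        else:
            draws += 1
    # Report the win rate excluding draws, matching the percentages quoted above.
    decisive = wins + losses
    rate = wins / decisive if decisive else 0.0
    print(f"v7 vs v{version}: {wins}W {losses}L {draws}D ({rate:.1%} of decisive games won)")
```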

On top of this, I also ran some evaluations with the temperature set at 0.1 rather than 1, which encourages far more greedy, deterministic play. The results there were downright bizarre. Because I did this late at night, well past my bedtime, I did not document or graph any of it. Eventually I may get to that.
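For anyone unfamiliar with the knob: temperature just rescales the policy outputs before a move is sampled. A minimal sketch, assuming the network produces one logit per legal move (the function name and shapes are illustrative, not my actual code):

```python
import numpy as np

def choose_move(move_logits: np.ndarray, temperature: float = 1.0) -> int:
    """Sample a move index from a temperature-scaled softmax over legal-move logits."""
    scaled = move_logits / temperature
    scaled = scaled - scaled.max()          # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    # At temperature 1 this follows the learned policy; at 0.1 it is close to greedy argmax.
    return int(np.random.choice(len(probs), p=probs))
```

At a temperature of 0.1 the distribution collapses almost entirely onto the highest-scoring move, which is why the play becomes nearly deterministic.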

What these evaluations did accomplish was to bring about a realization. I chose Checkers as the domain for my AI those many years ago primarily because it seemed like a challenging enough game – as opposed to tic-tac-toe – to be worthy of training an AI to learn. Conversely, Checkers seemed not so difficult a game – as opposed to Go or Chess – that my then-embryonic AI skills and consumer-grade compute infrastructure would limit my ability to create something competitive.

Despite this choice, the fact is that I don’t play checkers. I know the rules, and I built the game mechanics in Python to run headlessly. That is, games are played with no UI that even shows gameplay, let alone allows a person to play against the models I am training. I did build a crude ASCII output that I used early on to test the gameplay mechanics, but it has no practical use outside of that early testing. At 15 games per second, there would be no reason to ever view these games.

But the strange results I have been seeing have shone a light on how little intuition I really have about the game of checkers. I had so many questions. Has the model saturated so that it can’t learn any more? Is it overfitting to specific situations? How much room is there to improve with a conv-net rather than the current fully-connected net? Would building AlphaGo Zero-like logic for move simulation and policy and value training actually have ROI on a game as relatively simple as Checkers? And perhaps most importantly: how challenging is it to play against my current best model?

Because of all this, I decided it was time to build an interactive, visual front end to support playing against the model. Yesterday I did that. It’s bare-bones functional, built in Pygame, but it works. I’ve played a few games against the model to completion.
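For a sense of scale, the whole thing boils down to drawing an 8x8 grid plus some circles and handling clicks. The sketch below is not the actual code, just the shape of it: it assumes a board stored as an 8x8 grid of 1/-1/0 for the two players' pieces and empty squares, and it leaves out click handling and the call into the model.

```python
import pygame

SQUARE = 80
LIGHT, DARK = (232, 220, 202), (139, 87, 66)
RED, BLACK = (200, 30, 30), (30, 30, 30)

def draw_board(screen, board):
    """Draw the 8x8 checkers board and any pieces on it."""
    for row in range(8):
        for col in range(8):
            color = LIGHT if (row + col) % 2 == 0 else DARK
            rect = pygame.Rect(col * SQUARE, row * SQUARE, SQUARE, SQUARE)
            pygame.draw.rect(screen, color, rect)
            piece = board[row][col]
            if piece != 0:
                center = (col * SQUARE + SQUARE // 2, row * SQUARE + SQUARE // 2)
                pygame.draw.circle(screen, RED if piece > 0 else BLACK, center, SQUARE // 2 - 8)

if __name__ == "__main__":
    pygame.init()
    screen = pygame.display.set_mode((8 * SQUARE, 8 * SQUARE))
    board = [[0] * 8 for _ in range(8)]   # placeholder; the real UI uses the engine's board state
    running = True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
        draw_board(screen, board)
        pygame.display.flip()
    pygame.quit()
```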

The verdict: my model has a long way to go. So far, I’ve completed three or four games, and have yet to lose a game. I think it’s fair to say that I have beginner-level skill, which means I have yet to build and train a model that can beat a beginner.

I have much work to do.
