TomBolton.io

Tom Bolton’s AI and Machine Learning Lab Notebook.

Posts by: Tom Bolton

Machine Learning

It’s been a while, but since I last wrote, I managed to implement the AlphaGo Zero gameplay/learning algorithm. I updated my code to do most of the important things that AlphaGo Zero does. My model now outputs policy probabilities and…

Machine Learning

I have been considering where this effort has brought me and where it might go. When I was selecting a project to work on, the first option I considered, just because it was absurdly simple, was Tic Tac Toe. After…

Machine Learning

I did the final optimization of my setup using Policy Gradient Loss with Reward. Rather than rewarding all moves of a game equally, I implemented discounting, whereby the move that generated the win is given the full reward of 1, and the…

Machine Learning

I really can’t believe how much I’ve managed to do since the last post. I was still talking about dropout and projection layers and forgetting. My goodness, a lot has happened. The Old Convolutional Model: My previous model input the…

Machine Learning

Dropout helped the model a lot, but not by reducing overfitting. Instead, adding dropout layers to the model radically improved the speed at which it was able to increase its win rate. Here is pre-dropout performance for the first…

Machine Learning

I have updated my model to be a conv net. In addition to the piece vectors, I am now feeding the 4×8×4 board state into parallel 3×3 and 5×5 conv layers with an 8×4 output and 16 channels each for…

Machine Learning

With the last update, my objective assessment of the model’s progress was that it could not beat a beginner checkers player (me), and that I had a lot of work to do. The model is comically simple. It is a…

Machine Learning

After seeing unexpected behavior from the model in a bootstrap setting, I decided that it was important to evaluate the performance of the different versions of the model against one another. So I set up an…

Machine Learning

I’ve completed changing my code to support a model designed to win games rather than guess legal moves. The model is the same, but the reward function, some game details, and much of the administrative framework have changed. The actual…

Machine Learning

I started actually doing AI again. As I was watching all the Karpathy videos, I found myself wanting to stop watching videos and get back to building. I could have gone out and built Karpathy’s nanoGPT model, but he already…