Updates
My vectorization of checkers is moving along. Before I talk about that, though, one thing came up as part of that process that is worth mentioning. I’m building the whole thing in a Jupyter Notebook step by step so I can easily experiment with this foreign way of programming. While doing so, I discovered a bug in how I implemented the AlphaGo Zero algorithm in my non-vectorized version. I was using tanh as my output nonlinearity for the value network. This produces values between -1 and 1. -1 would be 100% confidence in a loss, and 1 would be 100% confidence in a win from that board position. However, for training purposes, I was assigning 0 to losing games and 1 to winning games.
Obviously, this is a mismatch and would not yield effective training. Sure enough, the training on my prior engine and model was not learning well. I should note that the source of this bug was implementing ChatGPT’s recommendation for the output layer of AlphaGo Zero. Because I implemented it without fully analyzing it first, it didn’t occur to me that it mismatched my training labels of 0 and 1. What I generally say about LLMs is that they’re great, but if what they say is not verifiable, don’t use it. The nice thing about code is that it’s always 100% verifiable, so I have no real excuse for not verifying everything ChatGPT gives me. However, sometimes, when I’m hot on the trail of making progress on something, I want to keep moving fast, so I just take what I’m given and use it. I need to take my own advice and stop doing that.
Lately, I’ve mostly been using ChatGPT for help with PyTorch. My proficiency is growing, but I still have a way to go, and ChatGPT is great for that. When I know what vector operation I want to perform, it can provide the operations, and even full functions, that accomplish what I need. Sometimes it’s an operation I was already aware of, and sometimes it’s one I’ve never seen before. Especially when it’s something new, or something I wouldn’t have thought to do myself, I’ve taken to stepping through each function one line at a time to see the mechanics of each operation. There have been instances where blocks of code ChatGPT provided simply did not work. And it’s strange because, in at least one instance, one of these “broken” blocks was supported by explanations and inline comments that didn’t really make sense. The confounding part is that no matter how nonsensical the supporting comments might be, they are always presented without reservation. There’s simply no way to tell whether something is solid from how ChatGPT presents it, because it sounds 100% confident of literally everything it ever says.
In any event, after discovering the misuse of tanh, it was immediately apparent that I should be using sigmoid instead. Both squash their input the same way, but sigmoid delivers values between 0 and 1 rather than -1 and 1, matching my training labels. Knowing how simple this change was, I decided that while I continued to work on my vectorization, I would just switch my output nonlinearity to sigmoid and re-run my old setup.
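As a sketch of the fix (the layer size and variable names here are placeholders, not my actual network), a sigmoid value head paired with binary cross-entropy keeps the predictions in the same [0, 1] range as the 0/1 win-loss labels:

```python
import torch
import torch.nn as nn

# Minimal sketch of the corrected value head. With {0, 1} training labels, the
# output nonlinearity must also land in [0, 1]: sigmoid + binary cross-entropy,
# instead of tanh, whose range is [-1, 1].
value_head = nn.Sequential(
    nn.Linear(128, 1),  # 128 hidden features is an arbitrary placeholder
    nn.Sigmoid(),       # output in [0, 1], matching the labels
)
loss_fn = nn.BCELoss()

features = torch.randn(4, 128)                   # batch of 4 fake board encodings
labels = torch.tensor([[1.], [0.], [1.], [0.]])  # 1 = win, 0 = loss

pred = value_head(features)
loss = loss_fn(pred, labels)
```

The other way to resolve the mismatch would have been to keep tanh and relabel games as -1/1, but switching the head to sigmoid leaves the training data untouched.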
The results are underwhelming.

Most notably, after steady progress toward a rolling average win rate of between 60% and 65%, learning seems to stall. Not great. This could be due to any number of factors. Here are just a few of them:
- My model architecture
- My model’s initialization parameters
- My cost function for my probability distribution
- A bug in my model
- A bug in the MCTS algorithm
So there’s a lot to think about there, and I have some ideas. But of course, there’s the elephant in the room on those graphs. In the 64.5 hours that the game-play run above took to complete, my original game engine without MCTS simulations would have completed 3,483,000 games. That graph shows about 5,500 games. Instead of 15 games per second, my engine completes one game every 40 seconds.
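The gap can be sanity-checked with a little arithmetic, using the numbers above:

```python
# Throughput gap between the old engine and the MCTS run.
hours = 64.5
seconds = hours * 3600                 # 232,200 s of wall-clock time

old_engine_rate = 15                   # games per second, no MCTS
old_engine_games = old_engine_rate * seconds   # 3,483,000 games

mcts_games = 5_500
seconds_per_game = seconds / mcts_games        # roughly one game every 40+ s
slowdown = old_engine_games / mcts_games       # hundreds of times slower
```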
So regardless of my ideas about what’s going wrong with my AlphaGo Zero implementation, at this rate it will take me a small eternity to figure out. I need to be able to log 5,500 games in minutes, not days.
Hence, vectorization.
My progress there is steady, and I’m getting close to having the basic gameplay mechanics in place. I’ve gotten through being able to successfully make moves after masking legal moves. What’s left of the basics is to incorporate double jumps and then flip the appropriate boards to allow the opponent to make moves. I also need to detect wins and remove those games from the boards array. Once I’ve done those things (plus some details I’m sure I haven’t thought of), my game engine should be ready to play full games. It will be interesting to see how many orders of magnitude I can add to the speed of gameplay. My optimistic hope is three. I’d be very pleased if I could play 13,000-15,000 games per second. I’ve taken some timings on parts of the gameplay loop, and it seems like I could get close.
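The legal-move masking step can be sketched like this (the tensor shapes and names are illustrative assumptions, not my actual engine). Each game in the batch gets a boolean mask over a flat move space; illegal moves are pushed to negative infinity before the softmax so they receive zero probability, and one move per game is then drawn in a single vectorized call:

```python
import torch

n_games, n_moves = 4, 32 * 4            # e.g. 32 squares x 4 move directions (assumed layout)
logits = torch.randn(n_games, n_moves)  # raw policy output, one row per game
legal = torch.zeros(n_games, n_moves, dtype=torch.bool)
legal[:, :10] = True                    # pretend the first 10 moves are legal in every game

# Mask illegal moves: -inf logits become exactly 0 after softmax.
masked = logits.masked_fill(~legal, float("-inf"))
probs = torch.softmax(masked, dim=1)

# Sample one move per game across the whole batch at once.
moves = torch.multinomial(probs, num_samples=1).squeeze(1)
```

The same pattern extends to thousands of games per batch, since every operation here is elementwise or row-wise over the batch dimension.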
One thing I’ll say about this is that doing some of these calculations made me consider investing in a new GPU for the first time in ages. Back when I started doing this, I bought a 1080ti, which was top-of-the-line for consumer GPUs. I think that would be considered the low end of midrange now. Unfortunately, it is effectively impossible to get a 5090 right now, thanks to severe supply shortages, enormous demand, and scalper-bots. My hunch is that with a beefier GPU, it would be much more possible to hit that 1,000x mark. However, it will likely be months at best before I have a prayer of getting one of those at retail prices. So in the meantime, I’ll make the best of my 1080ti.