Vector Checkers is Complete
A couple of days ago, I finished my implementation of a completely vectorized checkers gameplay engine. With the current implementation, the engine performs all game operations for both sides on an entire batch of games at once. The most computationally expensive part of the old engine was determining legal moves. That’s true of the new engine as well. But now it can do that operation on 5,000 (or more) games in parallel on the GPU rather than sequentially in a scripting language.
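As a toy illustration of what “vectorized” means here (this is not my engine’s actual code; the plane layout is an assumption and the diagonal-adjacency details are simplified), a single set of tensor operations can answer a question like “which pieces have an open square one row ahead?” for every game in the batch at once:

```python
import torch

# Toy illustration only. Boards for B games are stacked into one
# (B, 4, 8, 4) tensor: 4 piece planes, 8 rows, 4 playable columns per row.
B = 5000
device = "cuda" if torch.cuda.is_available() else "cpu"
boards = torch.zeros(B, 4, 8, 4, device=device)

occupied = boards.sum(dim=1) > 0   # (B, 8, 4): is any piece on the square?
movers = boards[:, 0] > 0          # (B, 8, 4): plane 0 taken here, by
                                   # assumption, as the side-to-move's men

# One slicing op answers "is the square one row ahead empty?" for all B
# games at once. (True diagonal adjacency in the packed 8x4 layout also
# shifts the column by 0 or 1 depending on row parity; elided here.)
ahead_empty = torch.zeros_like(movers)
ahead_empty[:, :-1, :] = ~occupied[:, 1:, :]

candidates = movers & ahead_empty  # (B, 8, 4): pieces with an open square ahead
```

No per-game Python loop appears anywhere; the batch dimension rides along for free, which is the whole point of the rewrite.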
The old engine could play approximately 13 games per second, depending on the length of the game. The new engine can play 210 games per second. On the one hand, at 16x the games per second, that is an impressive improvement. However, my goal was a 100x improvement. I was looking for an engine that could play games with 100 simulated moves per game move at the same pace as the old one could play games with no simulated moves. That would need to be at least 100x faster than what I was using. So, I was shooting for a two-orders-of-magnitude improvement, and I achieved slightly better than one.
If this improvement in basic gameplay translates efficiently to simulated rollouts, I will be looking at about two games per second: at 100 simulated moves per real move, each move costs roughly 100 times the work, so 210 games per second becomes roughly two. While not what I was hoping for, it means the engine should be able to play about 7,200 games per hour. That should be sufficient to move through training batches and evaluate model architectures and rollout strategies at a pace where I feel I’m making progress day to day, rather than over multiple weeks. And I suspect that if I can secure a modern top-of-the-line consumer-grade GPU (a 5090), I might be able to train even faster.
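The back-of-the-envelope math behind that projection:

```python
# If each real move costs roughly 100 rollout moves, throughput divides
# by about 100. These figures come straight from the measurements above.
base_rate = 210                        # measured games/sec, no simulations
sims_per_move = 100                    # planned simulated moves per move
projected = base_rate / sims_per_move  # ~2.1 games/sec; call it 2
print(int(projected) * 3600)           # -> 7200 games per hour
```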
At this point, I have two possible next steps. The one that seems most tempting is to build out the game-simulation rollout algorithm. The vectorized checkers gameplay engine is radically different from the old, single-game script-based engine. The fundamental algorithm of simulation-based rollouts will be the same as it has been. But I will need to rethink how to execute it with a gameplay engine that is entirely different from its predecessor.
However, there is another task that will not make as much tangible forward progress but may be more important: I need to make sure that the vectorized engine actually plays correct and legal checkers. In my original engine, I did that through crude, fixed-state board setups, where I would have a random-move model step through subsequent game states one by one via ASCII readouts of the board at each step. I could do that here as well. However, a short while ago, I built an actual UI for a human to play against the machine, in order to get a qualitative sense of how well a trained model would perform against a real player, who, in this case, was me. That UI will need to be refactored to work with the new vectorized version of checkers. I also made an essential improvement to how the vectorized version consumes game state and encodes game moves, and the UI will need to work with that as well.
The original game had two components to its representation of game state. It had a (4, 8, 4) matrix representing the pieces on the board. I discussed that here. It also represented each of the twelve pieces for each player as a matrix that, for each piece, included whether it was in play, what kind of piece it was (regular or king), and its location on the board. These two separate matrices were both provided to the model as features representing game state.
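For concreteness, here is a minimal sketch of that board tensor; the plane ordering shown is illustrative rather than the engine’s actual layout:

```python
import torch

# Sketch of the (4, 8, 4) board state: one plane per piece type and side.
# The plane assignment below is an assumption for illustration:
#   plane 0: player 1 men    plane 1: player 1 kings
#   plane 2: player 2 men    plane 3: player 2 kings
# Rows are the board's 8 rows; columns are the 4 playable squares per row.
board = torch.zeros(4, 8, 4)

# Opening position: each side's twelve men occupy three full rows.
board[0, 0:3, :] = 1.0   # player 1 men on rows 0-2
board[2, 5:8, :] = 1.0   # player 2 men on rows 5-7
```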
Furthermore, moves were not represented as transformations of the board state. Instead, they were represented as transformations of the state of the pieces. Each move consisted of the number of the target piece and a one-hot vector over the eight possible (but not necessarily legal) moves that the piece could make, with the target move being the hot element. These then had to be translated back into a transformation of the board state.
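In sketch form, the old scheme amounted to something like this (reconstructed for illustration; the eight move slots were presumably the four diagonal steps plus the four diagonal jumps):

```python
import torch

# Sketch of the old move encoding: "piece k makes move d", where d indexes
# one of the eight possible moves for a piece.
piece_index = 7                  # which of the side's twelve pieces to move
move_onehot = torch.zeros(8)
move_onehot[2] = 1.0             # the chosen move slot is the hot element

# The engine then had to translate (piece_index, move_onehot), via the
# pieces matrix, back into a transformation of the (4, 8, 4) board state.
```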
This was probably not the ideal approach to begin with, but it’s what I had. It made a lot of logical sense to me as I was programming it because it aligned with the mental model I was working with. I suspect it may have made it harder for the model to learn effectively, because the relationship between the board matrix and the pieces matrix was something the model had to learn as part of its training. It also made the moves the model had to predict more abstracted from the board position. I can’t say for sure, but I feel confident that it was a suboptimal way to build the game engine.
Suboptimal or not, I don’t think it would be possible to build a vectorized version of checkers using this setup. So I had to change it. The most straightforward change was to reduce the game-state features to just the (4, 8, 4) board state. If you think about checkers, the pieces matrix contained information that was either redundant with information already in the board state or additional information (such as the unique “identity” of each piece) that was unnecessary for capturing the game state or making specific moves. After all, it doesn’t matter whether the piece at a particular location is piece four or piece eight.
So, that change was easy: just scrap the pieces matrix. The more challenging part was coming up with a different way to encode target moves (and legal moves) purely in terms of the (4, 8, 4) board state. What I came up with was an (8, 4, 5, 3) move matrix that can encode any of three things:
- Legal moves from the current board state
- Probabilities from the model for all possible moves
- A target move chosen by the model
The first two dimensions are the board’s (8, 4) location dimensions (the board state minus dimension 0, which encodes piece type and side); they identify the “source” of the move, that is, the location on the board that contains the piece to be moved. The (5, 3) dimensions comprise the possible (but, again, not all legal) moves that the model can make from each of those locations. There’s quite a lot to how this is implemented, and I am not going to explain it all here. But it works; the sketch below gives the flavor.
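A shape-level sketch of how the one tensor serves all three roles; reading the (5, 3) dimensions as relative destination offsets (row offset −2…+2, column offset −1…+1, enough to cover steps and jumps in the packed 8×4 layout) is an illustrative simplification:

```python
import torch

# Shape-level sketch of the (8, 4, 5, 3) move tensor. Dims (8, 4) name the
# source square; dims (5, 3) index a relative destination offset (an
# assumed reading, for illustration).

# 1) Legal moves from the current board state: a boolean mask.
legal = torch.zeros(8, 4, 5, 3, dtype=torch.bool)
legal[2, 1, 3, 2] = True   # say, the piece at (2, 1) may step one way...
legal[2, 1, 3, 0] = True   # ...or the other (offset indices made up)

# 2) Model probabilities over all possible moves: mask, then normalize.
logits = torch.randn(8, 4, 5, 3)
logits[~legal] = float("-inf")
probs = torch.softmax(logits.flatten(), dim=0).view(8, 4, 5, 3)

# 3) A target move: one argmax names both source square and destination.
flat = probs.flatten().argmax().item()
src_row, rem = divmod(flat, 4 * 5 * 3)
src_col, rem = divmod(rem, 5 * 3)
row_off_idx, col_off_idx = divmod(rem, 3)
```

The appeal of this layout is that all three roles share one indexing scheme, so masking model output by legality is a single elementwise operation.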
I believe this is a better approach all around, both because it supports vectorization and because it connects the model’s essential input features (the board state) more directly to its ultimate prediction, which is now also expressed in terms of the board state. I suspect that will help the model learn faster.
However, these changes in representation mean that I need to refactor my UI to consume the new inputs and outputs and represent them appropriately. Being able to use the UI will let me exercise all the gameplay mechanics and verify that every move the model makes is proper and legal. My confidence here is already higher than it was at this point in the original version of the checkers engine: vectorizing the gameplay has forced me to be much more systematic in how I structure the engine. I don’t believe it is doing anything wrong. But it would be good to know for sure. So I think that will be my next step.