
It seems that the fundamental algorithm for inspecting possible moves from a given board position is minimax (or minmax). As usual, there are different ways to explain what’s going on with this. The Wikipedia page linked to above is replete with mathematical formulas along with certain logical statements intended to sum up the primary operating principles of minimax. The most concise explanation from that article is this:

The maximin value of a player is the highest value that the player can be sure to get without knowing the actions of the other players; equivalently, it is the lowest value the other players can force the player to receive when they know the player’s action. Its formal definition is:

\[ \underline{v_i} = \max_{a_i} \min_{a_{-i}} v_i(a_i, a_{-i}) \]

From this, the explanation that follows it, and more equations, I started to get a sense for it. But it was vague, and I wouldn’t have been able to claim that I really understood it. Ultimately, it was YouTube, of all places, that solidified my understanding of how it works. Here’s my explanation of the algorithm.

Goal

The essential goal of using minimax in a tree-search scenario is to figure out, from a given board state, what the optimal next move would be. This is accomplished by looking ahead at possible next moves and seeing which of the player’s moves, combined with the opponent’s possible responses, will land the player in the best possible position as far into the future as it’s computationally feasible to look.

The Algorithm

The most fundamental part of this is building a tree that represents all possible combinations of player and opponent moves for some number of moves into the future. For a game like checkers, such a diagram quickly becomes impractical, so, to keep things simple, let’s use a hypothetical game where each player always has only two possible moves. We might build a tree that looks like this:

We’ll be starting with the red player, since that’s the player whose next move we’re trying to figure out. That’s the top red node. Red has two possible next moves, and those are the two blue circles below. Each blue circle indicates the board state Blue will face depending upon which move Red chose. For each of those states, Blue has two possible moves (for a total of four red circles). There are now four possible Red states, each with two possible moves, giving the eight blue circles: ultimately, the eight possible Blue states available three moves in. The fourth move leaves 16 possible Red positions.
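Just to make that structure concrete, here’s a minimal sketch in Python of building such a tree for our hypothetical always-two-moves game. The Node class and the toy state (just a running list of the moves made so far) are stand-ins I’ve made up; a real game would supply its own board representation and move generator.

```python
# A minimal sketch, assuming the hypothetical game above where each
# player always has exactly two possible moves. The Node class and the
# toy state (a list of moves made so far) are placeholders; a real game
# would supply its own board representation and move generation.

class Node:
    def __init__(self, state, red_to_move):
        self.state = state              # the board position at this node
        self.red_to_move = red_to_move  # True for a Red node, False for Blue
        self.children = []              # one child per possible move

def build_tree(state, red_to_move, depth):
    """Expand every combination of moves down to the given depth."""
    node = Node(state, red_to_move)
    if depth == 0:
        return node                     # leaf: this is as far as we look
    for move in ("a", "b"):             # toy game: always two options
        node.children.append(
            build_tree(state + [move], not red_to_move, depth - 1))
    return node

# Looking four moves ahead from Red's turn gives 2, 4, 8, and then 16
# positions at successive levels, matching the tree described above.
root = build_tree([], red_to_move=True, depth=4)
```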

Ideally, we would build this tree as far as it could go, until each branch terminated with a win for either Red or Blue. Computationally, that’s infeasible for any non-trivial game. But if that’s what this tree represented, the next step would be to assign each of those leaf nodes a value with respect to the red player. There are different ways to assign values to a win, but a common one is to assign ∞ to a win and -∞ to a loss. An alternate system uses 1 and -1, respectively. Doing that for this game might look something like this:

Of course, that’s silly. In reality, for the vast majority of moves in a non-trivial game, leaf nodes terminating in a win or a loss would be quite rare. Which leaves us with one of the major challenges of implementing minimax in practice: if a possible move is not an explicit win or loss, how does one evaluate the strength of the board position after that move? This is typically done with heuristics. In the AlphaGo Zero implementation, the AI model determines this. Either way, though, instead of positive or negative infinity, there would be some positive or negative value indicating the relative strength of the position. Thus, we might have a tree that looked like this:

Here, we have two leaf nodes that end in wins, one for Red (∞) and one for Blue (-∞). The rest are values determined by some algorithm to represent the relative strength of the position. In general, positive would indicate a Red advantage and negative would indicate a Blue advantage.
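Here’s roughly the shape such an evaluation step might take, as a sketch. The win checks and the heuristic are passed in as functions because they’re entirely game-specific; the names are placeholders of mine, not anyone’s actual API.

```python
import math

def evaluate(state, is_red_win, is_blue_win, heuristic):
    """Score a board position from Red's point of view.

    is_red_win, is_blue_win, and heuristic are supplied by the game;
    they're parameters here only to keep the sketch generic.
    """
    if is_red_win(state):
        return math.inf       # an outright win for Red
    if is_blue_win(state):
        return -math.inf      # an outright win for Blue (a loss for Red)
    # Anything in between gets a heuristic score: positive means Red
    # looks stronger, negative means Blue does. This is where rule-based
    # heuristics (or, in AlphaGo Zero's case, the model) do their work.
    return heuristic(state)
```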

Here it becomes important to understand that going forward, we will assume that Blue is using the same algorithm as Red. If, in reality, Blue were actually pursuing a weaker strategy (say, one that doesn’t look as far out), statistically Red would only do better than what we ultimately predict with this lookahead. And if Blue actually had a better strategy (say, looking farther ahead), there’s simply nothing to be done about that.

The goal, though, is to ultimately decide the next move for Red. How do we do this? We have our relative strength of the board position at the leaf nodes, so would we gain anything by calculating the relative strength of the interim board positions? I don’t think so. We’ve calculated out to the leaf nodes, and there are no unknowns between our current position and there, and since we assume Blue has the same knowledge, the interim positions are irrelevant. There is probably some mathematical proof of this, but I don’t know it.

The Concept

Leaving aside the mathematical and logical particulars as I’ve read them, here’s how I’ve come to understand how the algorithm actually works. We can see that one of the branches off the left initial red move ends in a win for Red (∞). Shouldn’t Red choose the left move, then? Not necessarily. The reason is that if we look at that leaf node, we can see that its sibling is a -12 (Blue advantage). And since the choice between those two will actually be Blue’s choice (it’s Blue’s move), Blue would never choose the branch that wins for Red. What we’re going to want to do is figure out, for that last Blue row, which move Blue would choose. So for each Blue circle, we’re going to carry up the value of the child node with the lower value:

Given that we can assume these are the selections Blue would make, we can now decide, for the parent of each of those Blue nodes, which branch Red should choose. Since Red wants to pick the path that leads to the best possible outcome, Red would choose the child branch with the highest value.

And Blue would choose among those options, again taking the lower of the two values.

Which leaves Red with what we initially set out to determine. If Red chooses the left branch (assuming Blue is employing the same strategy), Red will end up with a relative strength of -12, whereas if Red were to choose the right branch, the relative strength would be -72. Red should choose the left branch, which is, in fact, the same branch that contains the win for Red. But that win will not be the outcome. Assuming the validity of the heuristics behind these numbers, Red will still be at a disadvantage (-12) were this tree to play out. But it is the best possible position Red can expect looking four moves into the future.
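Putting the whole backup procedure together, here’s a rough, depth-limited sketch of it in Python. The game-specific pieces (successors, moves, apply_move, and evaluate) are hypothetical placeholders; the part that is actually minimax is the alternation between max at Red’s turns and min at Blue’s.

```python
def minimax(state, depth, red_to_move, successors, evaluate):
    """Back up a value for `state`, looking `depth` moves ahead.

    successors(state, red_to_move) returns the positions reachable in
    one move; evaluate(state) scores a position from Red's point of
    view (positive = Red advantage). Both are supplied by the game.
    """
    next_states = [] if depth == 0 else list(successors(state, red_to_move))
    if not next_states:          # depth limit reached, or no legal moves
        return evaluate(state)
    values = [minimax(s, depth - 1, not red_to_move, successors, evaluate)
              for s in next_states]
    # Red carries up the highest child value; Blue, assumed to be using
    # the same lookahead, carries up the lowest.
    return max(values) if red_to_move else min(values)


def best_red_move(state, depth, moves, apply_move, evaluate):
    """The question we set out to answer: which move should Red play now?"""
    def successors(s, red_to_move):
        return [apply_move(s, m) for m in moves(s, red_to_move)]
    # Score each of Red's candidate moves by the value the rest of the
    # tree backs up to it, then take the best one.
    return max(moves(state, True),
               key=lambda m: minimax(apply_move(state, m), depth - 1, False,
                                     successors, evaluate))
```

With the numbers from the example above and a depth of four, the two values compared at the root would be -12 and -72, so best_red_move would hand back the left branch.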

In reality, it would be unlikely to play out this way. These values, whether produced by rule-based heuristics or an AI algorithm, will likely never be perfect. Interestingly, though, checkers is a “solved” game. Thanks to a database of moves which, when compressed, still weighs in at a whopping 237 gigabytes, there may exist what we could describe as perfect heuristics. Leaving that aside, in real game play the algorithms for producing those values will have strengths and weaknesses, leading to a large degree of unpredictability. But that’s what makes this whole endeavor both interesting and fun.

The next thing I need to tackle is alpha-beta pruning, which is essentially an algorithm for producing trees like the one above, but without having to necessarily traverse every single branch. This can significantly reduce computational load.
