TomBolton.io

Tom Bolton’s AI and Machine Learning Lab Notebook.

Machine Learning

Having spent some time thinking through the trivial game of tic tac toe, I’ve now become interested in tackling some “real” game mechanics. Checkers is the one I have in mind. So naturally, I’m interested in the current state of the art in game-playing AI: AlphaGo Zero. I read the abstract of the Nature article they published explaining how they designed it. It’s a bit beyond my depth right now (convolutional NNs are still an area I need to cover), so I don’t think it would be worth it for me to fork over $36 to buy it. However, one thing I was able to figure out from reading the abstract is that Monte Carlo Tree Search (MCTS) is… you know… a thing for game-playing AIs.

My experience with MCTS to date is in two areas. First, I got a foundational understanding of Monte Carlo simulations in general to support some work I’m doing for a financial services company. They use an MC simulation to predict the likelihood of a retirement plan “succeeding” across a variety of possible market conditions between now and retirement. I had also read about an MC simulation that estimates Pi by picking random points in a square circumscribing a circle. The ratio of points inside the circle to the total number of points approaches Pi/4, so multiplying that ratio by four predicts Pi with ever-increasing accuracy as the number of random points goes up.
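That estimator is easy to sketch in a few lines of Python. (The function name and sample count here are my own illustration, not from any particular source.)

```python
import random

def estimate_pi(n_points: int, seed: int = 0) -> float:
    """Estimate Pi by sampling random points in the unit square.

    A quarter circle of radius 1 fits inside the unit square; the
    fraction of points landing inside it approaches Pi/4.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / n_points
```

With 100,000 points this typically lands within a few hundredths of 3.14159; the error shrinks roughly with the square root of the sample count, which is why "ever-increasing accuracy" comes at an ever-increasing price in samples.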

Monte Carlo — randomly picking next moves — is very useful in AI game-play learning, and it seems this is at least one direction I would have headed in had I gone ahead and built a tic tac toe-playing machine. In the last post, I had written this:

Furthermore, you’d be able to get to that point by having the system make completely random legal moves game after game after game.

Needless to say, I hadn’t really thought much about the details. Nor did I consider this possibility in the same way it’s often used in game-play learning systems. Sophisticated AIs have the computer play out vast numbers of games from a given position during game play in order to evaluate candidate moves before making one. My speculation simply had the system accumulate crude probability numbers as part of actual in-game random moves. Of course, my speculation was all in service of training the system, not playing actual games. From a training perspective, there’s no fundamental difference between playing “actual” games and “prediction” games. It’s all in service of teaching the system.
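To make that "play out vast numbers of games from a given position" idea concrete, here is a minimal sketch for tic tac toe: for each legal move, run a batch of fully random playouts and pick the move with the best win rate. This is plain random rollout evaluation, not full MCTS (no tree, no selection policy); the board representation and function names are my own illustration, not anything from the AlphaGo Zero paper.

```python
import random

# The eight winning lines on a 3x3 board stored as a flat list of 9 cells.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def rollout(board, player, rng):
    """Play random legal moves to the end; return the winner or None for a draw."""
    board = board[:]
    while True:
        w = winner(board)
        if w:
            return w
        moves = [i for i, s in enumerate(board) if s == ' ']
        if not moves:
            return None
        board[rng.choice(moves)] = player
        player = 'O' if player == 'X' else 'X'

def best_move(board, player, n_rollouts=200, seed=0):
    """Pick the legal move with the highest random-playout win rate."""
    rng = random.Random(seed)
    opponent = 'O' if player == 'X' else 'X'
    scores = {}
    for m in (i for i, s in enumerate(board) if s == ' '):
        trial = board[:]
        trial[m] = player
        wins = sum(rollout(trial, opponent, rng) == player
                   for _ in range(n_rollouts))
        scores[m] = wins / n_rollouts
    return max(scores, key=scores.get)
```

For example, with the board `XX.OO....` and X to move, the playouts quickly converge on cell 2 (the immediate win). The same `rollout` routine could just as easily feed a training loop, which is the point above: random playouts serve prediction and training alike.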

Either way, I’m enjoying seeing some correlations between my own thinking and instincts on these issues and how they’re being tackled in the real world of AI. It’s great fuel for the significant work I have ahead, which includes, among other things:

  • Becoming more familiar with Python so I can use this stuff in real-world environments.
  • Learning how convolutional NNs work.
  • Applying the above two items, to the extent possible, to the task of playing checkers.