Sudden Change of Plans
That didn’t take long. A mere day after my last post, where I laid out my upcoming plans for my model, I’ve thrown them out the window. It’s time to stop futzing around with my interim goal of learning legal moves and drop straight into building a true reinforcement learning algorithm that learns to play checkers well.
Why the sudden change of heart? I’m glad you asked.
On a typical morning commute, I’d either be doing work for my job or working on this project. Unfortunately, I had forgotten to charge my laptop and wouldn’t have had enough battery to do anything meaningful on either front. So instead, I took out the old iPhone and decided now was the time to fork over the cash for “Mastering the game of Go without human knowledge”, the AlphaGo Zero paper. Many months ago, I had decided it wouldn’t be worth spending the money on it, mostly because I suspected I wouldn’t yet be able to make use of it, given where I was with my project, not to mention my understanding of reinforcement learning in general. This morning I thought there was a chance that might still be true.
Fortunately, that is far from how it turned out. I read and re-read it very slowly on the train, and I still have a lot to go. But I’m pleased that, at a high level, I have a general understanding of how it works. There are many, many details I don’t yet understand, but the basic principles seem to make sense. So now, my short-term plan is to spend the next couple of posts writing a “plain English” description of how I understand its workings. As the old saying goes: to the extent that I can explain it clearly, I understand it. And even if only for myself, I’d like to validate that understanding before I begin the formidable task of implementing something like it for my own model.
Most importantly, though, it’s time to focus my energies on understanding that algorithm and implementing it for my own problem.
Even where I am now, though, I do have one interesting takeaway. With the sense I now have of the self-play/training workflow, I can say with confidence that the particular implementation of my parallelized architecture will not adapt directly to what I’ll need to build. The work won’t go to waste, though. The basic parallelization concepts I developed should work handily here, since, as I suspect, the bulk of the self-play computation will be spent executing Monte Carlo Tree Search. Because every simulation in the tree search queries a fixed snapshot of the network, it should be reasonable to run the MCTS steps for many parallel games in lockstep and batch the network evaluations together. I’m looking forward to the challenge.
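To make that concrete, here’s a minimal sketch of the shape I have in mind, with some loud caveats: the game logic and network are stubbed out, every name (`Node`, `select_leaf`, `net_evaluate_batch`, and so on) is hypothetical, and the selection rule is a simple greedy-by-mean-value stand-in rather than the PUCT rule the paper actually uses. The point is only the structure of the loop: one simulation step for every game at once, with all the leaf evaluations collected into a single batched network call.

```python
# Sketch: MCTS for many self-play games run in lockstep, so each
# simulation step makes exactly one batched network call. All names
# and the game/network stubs below are hypothetical placeholders.

import random

class Node:
    """One MCTS node: visit count, summed value, expanded children."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

def legal_moves(state):
    """Stub: a real version would generate the legal checkers moves."""
    return [state * 3 + i for i in range(3)]

def select_leaf(root):
    """Descend the tree by highest mean value until an unexpanded node.
    (A real AlphaGo Zero implementation would use PUCT with priors.)"""
    node = root
    while node.children:
        node = max(node.children,
                   key=lambda c: c.value_sum / (c.visits + 1e-8))
    return node

def net_evaluate_batch(states):
    """Stub for the single batched call to the frozen self-play network."""
    return [random.uniform(-1.0, 1.0) for _ in states]

def expand(leaf):
    """Add one child per legal move from the leaf's position."""
    leaf.children = [Node(s, parent=leaf) for s in legal_moves(leaf.state)]

def backpropagate(leaf, value):
    """Push the evaluated value back up the path to the root."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.value_sum += value
        value = -value          # flip perspective for the opponent
        node = node.parent

def batched_simulation_step(roots):
    """One MCTS simulation for every parallel game at once: select a
    leaf in each tree, evaluate all leaves in one network call, then
    expand and backpropagate in each tree."""
    leaves = [select_leaf(root) for root in roots]
    values = net_evaluate_batch([leaf.state for leaf in leaves])
    for leaf, value in zip(leaves, values):
        expand(leaf)
        backpropagate(leaf, value)

# e.g. 128 games in lockstep, 200 simulations before each move
roots = [Node(state=i) for i in range(128)]
for _ in range(200):
    batched_simulation_step(roots)
```

The appeal of this structure is that the expensive part, the network evaluation, happens once per simulation step across all games rather than once per game, which is exactly where the parallelization work I’ve already done should carry over.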