TomBolton.io

Tom Bolton’s AI and Machine Learning Lab Notebook.

Machine Learning

There’s a gap in the courses I can take right now, so I have about two weeks to push forward with checkers, and I started on it again yesterday. As I began putting together some of the classes I’ll need, it occurred to me that it would make sense to write out the use cases I want to cover, so that I build it right (or close to right) the first time.

  1. I’ll want to be able to have two agents (or maybe I’ll just call them “players”), one to play red, one to play black.
  2. I should be able to assign each player a model that the player will use to play the game.
  3. As mentioned before, at the start, both players will be playing the same model, and the model will basically be learning to play against itself.
  4. I’ll want to be able to start off saving the state of each model (at the start, just the one) in its initialized state.
  5. When the playing starts, I’ll want to be able to monitor progress. At first, since both players will be playing the same model and will be as evenly matched as possible, I’ll need a way to measure whether the model is learning anything. I have some ideas on this.
    • Idea 1 – Keep track of how many moves it takes to eliminate some fixed number of pieces from the board. My assumption is that at the very least, as the model moves away from near-random play, it will start to get quicker at making jumps.
    • Idea 2 – Measure what percentage of the moves proposed by the model are illegal. At the very beginning (as I do this math in my head) the model will recognize 48 possible moves, only 7 of which are legal. Thus, I’m guessing that about 85% of moves (41 of 48) will be illegal at the start of play. It’s been my assumption that I would do an immediate stochastic optimization in those scenarios, but I’ve realized that that won’t be necessary. I can explain this, but it’s too much to get sucked into while I’m trying to outline user stories.
  6. I’ll be able to track progress in each of these areas after each round of training, and I’m hoping that at the very least I get some forward progress. If not, I’ll need to be able to stop, tune, and start over.
  7. So I’d need to be able to set up a training schedule that should have some of the following characteristics:
    • Whether this run is picking up after an interrupt.
    • If it is not a pickup:
      • Whether the models will be symmetrical (the same model for both players).
      • Which model to use for the red player and which for the black player.
      • Which checkpoint to use for each model.
    • Number of games before training takes place (I guess this could be considered the batch size, although it seems likely that I might use stochastic GD for this).
    • I’m not sure how quickly these games will run, but it seems like it might make sense to save interim game data before a batch is complete.
    • Interim saves of each model update. Basically, I’ll want to be able to interrupt the process after a completed batch and do whatever I need to: change course, tune, look at game data, etc.
    • An option to pick up where I left off. Presumably I’ll start by entering parameters, but after an interrupt, I’ll want to simply pick up and play from the last model update.
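
To make item 7 and Idea 2 a little more concrete, here is a minimal Python sketch of how the schedule options might be held in one place and saved, so a run could be picked up after an interrupt. Everything named here is an assumption for illustration: `TrainingSchedule`, its fields, the placeholder model name `"model_a"`, and `illegal_move_rate` are hypothetical and not from any existing code. The last lines just redo the 48-moves, 7-legal arithmetic from Idea 2.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TrainingSchedule:
    """Hypothetical settings for one training run (all names are assumptions)."""

    resume: bool = False                   # pick up after an interrupt?
    symmetrical: bool = True               # same model for both players
    red_model: str = "model_a"             # placeholder model names
    black_model: str = "model_a"
    red_checkpoint: Optional[str] = None   # None = freshly initialized state
    black_checkpoint: Optional[str] = None
    games_per_update: int = 1              # games played before each training step
    save_interim_games: bool = True        # save game data before a batch completes
    save_each_update: bool = True          # save the model after every update

    def __post_init__(self) -> None:
        if self.games_per_update < 1:
            raise ValueError("games_per_update must be at least 1")
        if self.symmetrical and (
            self.red_model != self.black_model
            or self.red_checkpoint != self.black_checkpoint
        ):
            raise ValueError("symmetrical play needs the same model and checkpoint")

    def to_json(self) -> str:
        """Serialize the settings so an interrupted run can be resumed later."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "TrainingSchedule":
        """Rebuild the settings from a string produced by to_json."""
        return cls(**json.loads(text))


def illegal_move_rate(proposed: int, illegal: int) -> float:
    """Fraction of proposed moves that were illegal (Idea 2)."""
    if proposed <= 0:
        raise ValueError("proposed must be positive")
    if not 0 <= illegal <= proposed:
        raise ValueError("illegal must be between 0 and proposed")
    return illegal / proposed


# Opening position from the text: 48 recognized moves, 7 of them legal.
opening_rate = illegal_move_rate(proposed=48, illegal=48 - 7)
print(f"{opening_rate:.1%}")  # prints 85.4%
```

Writing the output of `to_json()` to disk after each completed batch, and reading it back with `from_json()`, would be one simple way to get the "pick up where I left off" behavior. The actual model weights would still need their own checkpoint files.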

I should have just posted this when I started writing it a couple of weeks ago as I think some of it is stale, but I’m putting it up now anyway. Sigh.
