It’s the Gradients, Stupid
…and it is not lost on me now that I may have had a problem with my gradients which I did not check…
– Me, five days ago
After taking a brief hiatus from my checkers AI to do some overdue house projects, Monday morning I put a gradient checking algorithm in place for my net. And wouldn’t you know it: when doing backpropagation with the softmax function, my gradients are way off. The gradient checker itself seems fine, since everything checks out for sigmoid.
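For anyone wondering what I mean by a gradient checker: it’s just a brute-force numerical estimate of the gradient to compare against what backprop computes. A rough sketch of the idea in Python, assuming the loss takes a flat weight vector (the names here are made up for illustration, not my actual code):

```python
import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-5):
    """Central-difference estimate of d(loss)/d(params).

    loss_fn takes a flat weight vector and returns a scalar loss;
    params is that flat weight vector as a numpy array.
    """
    grad = np.zeros_like(params, dtype=float)
    for i in range(params.size):
        bump = np.zeros_like(params, dtype=float)
        bump[i] = eps
        # Nudge one weight up, then down, and watch how the loss moves.
        grad[i] = (loss_fn(params + bump) - loss_fn(params - bump)) / (2 * eps)
    return grad

def gradients_agree(analytic, numeric, tol=1e-6):
    # Relative error between backprop's gradient and the numerical estimate.
    diff = np.linalg.norm(analytic - numeric)
    scale = np.linalg.norm(analytic) + np.linalg.norm(numeric)
    return diff / max(scale, 1e-12) < tol
```

If the relative error is tiny, backprop and the numerical estimate agree; if it isn’t, something in the analytic gradient is suspect.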
What blows about this is that I can’t figure out why they’re wrong. By all indications, the derivative of softmax with respect to a given linear output should be the same as that for sigmoid. Yet clearly it isn’t.
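For the record, here are the textbook derivatives I’m comparing (my notation, writing $s_i$ for the softmax outputs and $\sigma$ for sigmoid):

$$
\frac{\partial \sigma(z)}{\partial z} = \sigma(z)\big(1 - \sigma(z)\big),
\qquad
\frac{\partial s_i}{\partial z_j} = s_i\big(\delta_{ij} - s_j\big),
\qquad
s_i = \frac{e^{z_i}}{\sum_k e^{z_k}}
$$

The diagonal term $s_i(1 - s_i)$ has exactly the same shape as the sigmoid derivative, which is what I mean by “should be the same”; softmax also has the $-s_i s_j$ cross terms, since every output depends on every linear input.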
On the one hand, I’m tempted to just leave this little problem in the dirt behind me and go with where I’ve gotten my best workable results. However, my original hypothesis was to use softmax, and it seems that if I could figure out how to implement it correctly, it might actually be the better way to go. I am, after all, hand coding all of this to become intimately familiar with this stuff. So I’m going to put some more time and effort into this.