On Entropy
Last time, we ran our first self-play training loop on a simple MLP model and observed catastrophic policy collapse. Let’s first work through some of the math behind what happened, and then look at how to combat it. What is entropy? Given a probability distribution \(p=(p_1,\ldots,p_C)\) over categories \(i=1,\ldots,C\), such as the distribution over columns that our Connect 4 model outputs for a given board state, entropy measures the “amount of randomness” and is defined as¹ ...
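To make the idea concrete, here is a minimal sketch in plain Python. It assumes the standard Shannon entropy with the natural logarithm, \(H(p) = -\sum_{i=1}^{C} p_i \log p_i\), and the usual convention that terms with \(p_i = 0\) contribute nothing. The function name `entropy` and the example distributions are illustrative assumptions, not taken from the training code.

```python
import math

def entropy(p):
    """Shannon entropy in nats (assumed definition): H(p) = -sum_i p_i * log(p_i).

    Terms with p_i == 0 are skipped, following the convention 0 * log(0) = 0.
    """
    return sum(-p_i * math.log(p_i) for p_i in p if p_i > 0.0)

C = 7  # a Connect 4 board has seven columns

# A maximally random policy: every column is equally likely.
uniform = [1.0 / C] * C

# A fully collapsed policy: all probability mass on a single column.
collapsed = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

# Something in between: a strong preference for the centre column.
peaked = [0.02, 0.03, 0.10, 0.70, 0.10, 0.03, 0.02]

for name, dist in [("uniform", uniform), ("peaked", peaked), ("collapsed", collapsed)]:
    print(f"{name:>9}: H = {entropy(dist):.4f} nats")
```

Under this definition the uniform distribution attains the maximum value \(\log C\), and a distribution with all its mass on one column has entropy zero, which is what a collapsed policy looks like numerically.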