Multi-Step Bootstrapping
Until now, we’ve done one-step lookahead for the TD bootstrapping in the A2C algorithm. We can significantly improve upon this by looking further ahead.

Bootstrapping with one step

Looking back at the states-values-rewards diagram in Implementing A2C, we had state \(s_i\) transitioning into state \(s_{i+1}\) with an immediate reward \(R_i\). How we actually implemented bootstrapping was subtly different and better described by this diagram:

[Figure: States and rewards diagram for own states \(s_i\) and opponent states \(s'_i\). Own states \(s_0, s_1, s_2\) with values \(v(s_0), v(s_1), v(s_2)\); opponent states \(s'_0, s'_1, s'_2\) with values \(v(s'_0), v(s'_1), v(s'_2)\); rewards \(R_0, R'_0, R_1, R'_1, R_2\).]

...
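As a reminder of what one-step lookahead computes, here is a minimal sketch of a one-step TD target and the resulting advantage. It is not the implementation from this series: the function names, the discount factor `gamma`, and the `done` flag are assumptions made for illustration, and the sketch ignores the own-state versus opponent-state distinction shown in the diagram.

```python
def one_step_td_target(reward, next_value, gamma=0.99, done=False):
    """One-step TD target: R_i + gamma * v(s_{i+1}).

    gamma and done are illustrative assumptions, not taken from the article.
    At the end of an episode there is no next state to bootstrap from,
    so the target is just the immediate reward.
    """
    if done:
        return reward
    return reward + gamma * next_value


def one_step_advantage(reward, value, next_value, gamma=0.99, done=False):
    """Advantage estimate: one-step TD target minus the current value v(s_i)."""
    return one_step_td_target(reward, next_value, gamma, done) - value
```

For example, with `reward=1.0`, `next_value=2.0` and `gamma=0.5`, the target is `1.0 + 0.5 * 2.0 = 2.0`. The target relies on a single observed reward and then trusts the value estimate for everything after it, which is the limitation that looking further ahead addresses.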