Policy Gradient Theorem Explained – Reinforcement Learning – AlphaStar Train AI to win at Starcraft and DOTA