These notes provide a unified treatment of policy optimization methods, connecting perspectives from reinforcement learning, probabilistic inference, and optimization theory.
Starting from the policy gradient theorem, we build up to proximal policy optimization (PPO) and its extensions, interpreting them through KL regularization, importance weighting, and proximal point methods.


Contents