clipped PPO

Mar

In my previous post, we discussed REINFORCE, the simplest policy gradient method. We saw how policy-based methods can be better than value-based methods, derived the gradient of the score (cost) function, and implemented a simple policy gradient to train Gym’s Acrobot-v0. We then saw how introducing a baseline reduces variance, which leads to the […]

Posted in: Reinforcement Learning

Tags: clipped PPO, Importance Sampling, PG, policy gradients, PPO, proximal policy optimization, reinforcement learning, RL, TRPO, trust region policy optimization