clipped PPO
12 Mar
Proximal Policy Optimization
In my previous post, we discussed REINFORCE, the simplest policy gradient method. We saw how policy-based methods can be preferable to value-based methods, derived the gradient of the score (cost) function, and implemented a simple policy gradient to train Gym’s Acrobot-v0. We then saw how introducing a baseline reduces variance, which leads to the […]