PPO


Apr

Reinforcement Learning (RL) refers to training agents with the help of incentive-driven environments. RL is typically framed around a <state, action, reward> tuple: the agent chooses among actions in various states, and each action entails a potential reward. This also means that each state has a “value” […]
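The idea that each state has a “value” can be illustrated with a minimal sketch: estimate a state's value as the average discounted return over episodes that start in that state (a Monte Carlo estimate). The reward sequences, the discount factor gamma = 0.9 and the function names are illustrative assumptions, not taken from the post.

```python
from typing import List


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Return G = r_0 + gamma*r_1 + gamma^2*r_2 + ... for one episode."""
    g = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def mc_state_value(episodes: List[List[float]], gamma: float) -> float:
    """Monte Carlo estimate of V(s): mean discounted return of episodes starting in s."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


# Hypothetical reward sequences observed after starting in the same state s
episodes = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
print(mc_state_value(episodes, gamma=0.9))
```

With more sampled episodes the average converges toward the expected discounted return, which is what “value” means here.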

Posted in: Reinforcement Learning, Robotics

Tags: Actor Critic, Actor Critic method, AI lab, cartpole, DDPG, deep Q learning, deep Q network, Deep Reinforcement Learning, deterministic policy, DQN, DRL course, gaming, markov decision process, markov process, Markov Reward Process, mdp, model-based RL, model-free RL, monte carlo, off-policy, on-policy, policy based methods, policy gradients, PPO, Q learning, Q table, Rainbow method, Reinforcement learning course, reward function, RL course, RL for gaming, RL training, SAC, stochastic policy, TD Learning, TD3, training in reinforcement learning, TRPO, Value-based methods

Mar

Face Recognition with MTCNN and FaceNet; RL with Proximal Policy Optimization

Minutes from the Saturday, 7 March 2020 AI Lab meetup in BLR: Last Saturday, we had excellent sessions at the AI Lab meetup. Face Recognition with MTCNN and FaceNet: First, Amit Kumar presented a detailed overview of Face Recognition with MTCNN and FaceNet. Face Recognition involves a pipeline of Face […]

Posted in: Computer Vision, Deep Learning, Reinforcement Learning, Security

Tags: Advantage Function, Face Recognition, facenet, Importance Sampling, MTCNN, MTCNN face detector, O-Net, PG algo, policy based methods, policy gradients, PPO, proximal policy optimization, R-Net, reinforcement learning, Triplet Loss, Value-based methods, Vanilla Policy Gradient, VPG

Mar

Proximal Policy Optimization

In my previous post, we discussed the simplest policy gradient method, REINFORCE. We saw the advantages of policy-based methods over value-based methods, derived the gradient of the score (cost) function, and implemented a simple policy gradient to train Gym’s Acrobot-v0. We then saw how introducing a baseline reduces variance, which leads to the […]
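The claim that a baseline reduces variance can be checked numerically. The sketch below assumes a hypothetical two-armed bandit with fixed rewards and a softmax policy, and compares REINFORCE gradient estimates for one logit with no baseline and with the policy's expected reward as the baseline. The bandit, its reward values and the function names are illustrative assumptions, not the post's Acrobot code.

```python
import math
import random
import statistics

# Hypothetical 2-armed bandit with a fixed reward per action (assumption)
REWARDS = [10.0, 11.0]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def grad_sample(theta, baseline, rng):
    """One REINFORCE sample of dE[R]/dtheta[0]: (R - b) * dlog pi(a)/dtheta[0]."""
    probs = softmax(theta)
    a = 0 if rng.random() < probs[0] else 1
    # For a softmax policy, dlog pi(a)/dtheta[0] = 1{a == 0} - pi(0)
    score = (1.0 if a == 0 else 0.0) - probs[0]
    return (REWARDS[a] - baseline) * score


def estimate_stats(baseline, n=20000, seed=0):
    """Sample mean and variance of the gradient estimator at theta = [0, 0]."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    samples = [grad_sample(theta, baseline, rng) for _ in range(n)]
    return statistics.mean(samples), statistics.pvariance(samples)


no_b = estimate_stats(baseline=0.0)
# Baseline = expected reward under the current (uniform) policy
with_b = estimate_stats(baseline=sum(REWARDS) / 2)
print("no baseline:   mean=%.3f var=%.3f" % no_b)
print("with baseline: mean=%.3f var=%.3f" % with_b)
```

Because the baseline does not depend on the sampled action, both estimators are unbiased for the true gradient of -0.25, so their means agree while the variance drops. In this toy case the baseline happens to make every sample exactly -0.25.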

Posted in: Reinforcement Learning

Tags: clipped PPO, Importance Sampling, PG, policy gradients, PPO, proximal policy optimization, reinforcement learning, RL, TRPO, trust region policy optimization