exploration vs exploitation


Apr

This post discusses temporal difference (TD) methods used in Reinforcement Learning and contrasts them with Monte Carlo (MC) methods and dynamic programming. You need a thorough understanding of Markov Decision Processes (MDPs) to follow this post. Prediction and Control: In general, RL methods have two components: 1) Prediction / Evaluation: where […]

Posted in: Reinforcement Learning, Robotics

Tags: epsilon greedy technique, exploration vs exploitation, On-policy vs off-policy, policy improvement, policy iteration, Q learning, reinforcement learning, SARSA, SARSAMax, TD control, TD Learning, Temporal Difference, value iteration

Apr

DDPG and TD3

This post assumes that you have a strong understanding of the basics of Reinforcement Learning, MDPs, DQN and Policy Gradient algorithms. You can go through the Policy Gradients post to understand the derivation for stochastic policies. In the previous post on Actor Critic, we saw the advantage of merging value-based and policy-based methods. The […]

Posted in: Reinforcement Learning, Robotics

Tags: Actor Critic, Actor Critic method, Bellman equation, DDPG, Deep Deterministic Policy Gradients, deterministic policy, Double DQN, DQN, Experience Replay, exploration vs exploitation, Fixed Q Targets, policy gradients, Q learning, TD3 RL, Twin Delayed Deep Deterministic Policy Gradients

Aug

Meeting Minutes from AI Lab Hands-On Workshop on Saturday 24th Aug in Bengaluru

#CellStratAILab #disrupt4.0 #WeCreateAISuperstars Last Saturday, our team lead for Reinforcement Learning (RL), Shubha Manikarnike, presented a fabulous hands-on workshop on RL and its key concepts and algorithms, such as the Markov Decision Process (MDP), Policy Gradients, the Bellman equation and Q-learning. The session started with an introduction to RL. There was a comparison of how this is different from […]

Posted in: Deep Learning, Reinforcement Learning

Tags: Bellman equation, epsilon greedy technique, exploration vs exploitation, Frozen Lake, Markov Decision Process, OpenAI Gym, Q learning, reinforcement learning, rewards based learning