exploration vs exploitation
This post discusses temporal difference (TD) methods used in Reinforcement Learning (RL). It contrasts TD methods with Monte Carlo (MC) methods and dynamic programming. You need a thorough understanding of Markov Decision Processes (MDPs) to follow this post. Prediction and Control: In general, RL methods have two components: 1) Prediction / Evaluation: where […]
This post assumes that you have a strong understanding of the basics of Reinforcement Learning, MDPs, DQN and Policy Gradient algorithms. You can go through the Policy Gradients post to understand the derivation for stochastic policies. In the previous post on Actor Critic, we saw the advantage of merging value-based and policy-based methods. The […]
#CellStratAILab #disrupt4.0 #WeCreateAISuperstars Last Saturday, our team lead for Reinforcement Learning (RL), Shubha Manikarnike, presented a fabulous hands-on workshop on RL and its various concepts and algorithms, such as Markov Decision Processes (MDPs), Policy Gradients, the Bellman equation and Q-learning. The session started with an introduction to RL. There was a comparison of how this is different from […]