policy improvement


Aug

In recent weeks, I presented a session on the “AlphaZero with Monte Carlo Tree Search” algorithm at the CellStrat AI Lab. AlphaZero builds on AlphaGo, the Google DeepMind system that mastered the game of Go and in 2016 beat Lee Sedol, an 18-time world champion. Go is an ancient Chinese abstract strategy […]

Posted in: Gaming, Reinforcement Learning, Robotics

Tags: Actor Critic, AlphaGO, AlphaZero, deepmind, Game Tree, MCTS, Monte Carlo Tree Search, policy evaluation, policy improvement, policy iteration, reinforcement learning, UCT, Upper Confidence Bound

Apr

Temporal Difference methods in RL

This post discusses temporal difference (TD) methods used in reinforcement learning (RL). It contrasts TD methods with Monte Carlo (MC) methods and dynamic programming. You need a thorough understanding of Markov Decision Processes (MDPs) to follow this post. Prediction and Control: In general, RL methods have two components. 1) Prediction / Evaluation: where […]

Posted in: Reinforcement Learning, Robotics

Tags: epsilon greedy technique, exploration vs exploitation, On-policy vs off-policy, policy improvement, policy iteration, Q learning, reinforcement learning, SARSA, SARSAMax, TD control, TD Learning, Temporal Difference, value iteration

Sep

Meeting Minutes from AI Lab session on Saturday 21st Sep in Bengaluru

We had fantastic presentations on advanced Deep Learning concepts at last Saturday's AI Lab. Reinforcement Learning (RL) with Dynamic Programming: Shubha M. opened with a superb session on RL with Dynamic Programming. Dynamic Programming is a technique for breaking a problem into subproblems, solving them and then combining the […]

Posted in: Computer Vision, Deep Learning, Generative Modeling, Reinforcement Learning

Tags: artificial intelligence, autoencoders, Computer Vision, denoising autoencoders, greedy policy, markov decision processes, object detection, policy improvement, single shot detector, sparse autoencoders, Unsupervised learning, VAE, value iteration