Meeting Minutes from AI Lab Hands-On Workshop on Saturday 24th Aug in Bengaluru
- August 29, 2019
- Posted by: vsinghal
- Category: Deep Learning, Reinforcement Learning
#CellStratAILab #disrupt4.0 #WeCreateAISuperstars
Last Saturday, our team lead for Reinforcement Learning (RL), Shubha Manikarnike, presented a fabulous hands-on workshop on RL and its core concepts and algorithms, such as the Markov Decision Process (MDP), Policy Gradients, the Bellman equation and Q-learning.
The session started with an introduction to RL and a comparison of how it differs from supervised and unsupervised learning. Shubha then explained policy-based and value-based approaches.
Shubha then reviewed the Markov Decision Process (MDP), where she explained the concepts of the Markov property, the Markov process and the Markov Reward Process (MRP). We worked through the math for calculating the state-value function and the state-action value function. Multiple examples illustrated the discount factor, which gives full weight to current rewards (immediate gratification) and progressively smaller weight to future rewards (i.e. delayed gratification has less significance in the present context).
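To make the discounting idea concrete, here is a minimal sketch in Python. It is not the workshop's own code; the function name `discounted_return` and the sample rewards are assumptions chosen for illustration. It computes the discounted sum of a reward sequence, where a reward received t steps in the future is weighted by gamma raised to the power t.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t].

    Hypothetical helper for illustration; it works back to front so that
    each earlier step adds its own reward to the discounted later total.
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three equal rewards with a discount factor of 0.5 (illustrative values):
# the later a reward arrives, the less it contributes.
rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With gamma close to 1 the future counts almost as much as the present, while with gamma close to 0 only the immediate reward matters.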
We then moved on to Policy Gradients, where we worked through the math of finding the optimal policy by maximizing rewards through gradient ascent on the policy parameters. We reviewed how to calculate the gradient dR/dw (the rate of change of R with respect to w), where R represents the rewards and w the current parameters. In gradient ascent, one adjusts the parameters until the rewards curve reaches the peak of its concave shape. In the RL process, one collects the gradient at each step (each state-action pair) in a list without applying it yet. Once the episodes are complete, one applies the collected gradients in a single update: gradients for actions with a positive action score are applied as they are, and gradients for actions with a negative action score are applied with their sign flipped. This makes the good actions more likely to be chosen and the less optimal actions less likely to be chosen. We contrasted this gradient ascent with the gradient descent used in a normal neural network, which minimizes a loss instead of maximizing a reward.
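The collect-then-apply idea can be sketched on a toy problem. The following is an illustrative sketch only, not the program used in the workshop: the two-armed bandit, its payout probabilities, the softmax policy and all names (`run_reinforce`, `payout_prob`, and so on) are assumptions. For simplicity it applies the collected gradients at the end of each episode and uses the raw reward (+1 or -1) as the action score.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_reinforce(episodes=2000, steps=10, lr=0.1, seed=0):
    rng = random.Random(seed)
    payout_prob = [0.2, 0.8]  # hypothetical bandit: arm 1 wins more often
    w = [0.0, 0.0]            # policy parameters, one preference per arm
    for _ in range(episodes):
        grads, scores = [], []
        for _ in range(steps):
            probs = softmax(w)
            action = 0 if rng.random() < probs[0] else 1
            reward = 1.0 if rng.random() < payout_prob[action] else -1.0
            # Gradient of log pi(action) with respect to w for a softmax
            # policy: 1 for the chosen arm minus the probability of each arm.
            grads.append([(1.0 if k == action else 0.0) - probs[k]
                          for k in range(2)])
            scores.append(reward)  # collected, not applied yet
        # Gradient ascent: a positive score pushes w along the gradient,
        # a negative score pushes it the opposite way.
        for grad, score in zip(grads, scores):
            for k in range(2):
                w[k] += lr * score * grad[k]
    return softmax(w)


print(run_reinforce())  # probabilities for arm 0 and arm 1 after training
```

Under these assumptions the probability of choosing the better-paying arm should rise well above that of the other arm, which is the "good actions become more likely" effect described above.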
Finally, Shubha discussed Q-learning, where she presented the Bellman equation and the epsilon-greedy technique. The Bellman equation helps calculate the Q-value, which indicates the reward potential of taking a given action in a given state. We update these Q-values over successive iterations of the game episodes until they settle at their ideal values, which indicate the recommended action in each state. The epsilon-greedy method handles the exploration vs. exploitation trade-off: one favours random actions in the early steps of learning (in order to explore all potential scenarios) and high-reward or “safe” actions in the later stages of learning.
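As a rough illustration of the Q-value update and the epsilon-greedy choice, here is a small self-contained sketch. It does not reproduce the workshop notebooks: the five-state corridor environment, the linear epsilon decay schedule and the names (`step`, `train`, `N_STATES`) are all assumptions made for this example.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for episode in range(episodes):
        # Assumed schedule: start fully random, decay to 5% exploration.
        epsilon = max(0.05, 1.0 - episode / (episodes / 2))
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                      # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # Bellman-style target: reward now plus discounted best future.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
for s in range(N_STATES - 1):
    best = max(ACTIONS, key=lambda a: q_table[s][a])
    print(s, "right" if best == 1 else "left", q_table[s])
```

In this toy setting the learned Q-values for moving right should end up higher than those for moving left in every non-goal state, so the greedy policy walks straight to the goal.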
The afternoon session was an intense hands-on workshop for the audience. The hands-on part included:
1) Setting up Google Colab to run RL Programs.
2) An introductory program to get familiar with OpenAI Gym’s environments, methods and properties.
3) A simple Neural network to solve an RL problem.
4) Q-learning to solve the Taxi-v2 problem in OpenAI Gym.
5) The members tried out Q-learning on the Frozen Lake environment to understand the algorithm better.
6) Executing the Policy Gradient program which uses the REINFORCE algorithm.
Do you wish to learn advanced AI/ML? Do you wish to be part of our world-class AI Lab? If yes, I invite you to check out our AI Lab this Saturday (31st Aug) in BLR (Bellandur or Hebbal locations) :-
Bellandur AI Lab :-
Register : https://www.meetup.com/Disrupt-4-0/events/262360216/
Topic : DeepSpeech (Baidu platform), Intro to RNNs, SVD
Date : Saturday 31st Aug 2019, 10:30AM – 5:00PM
Loc. : WeWork, Embassy Tech Village, ORR, BLR
Hebbal AI Lab :-
Register : https://www.meetup.com/Disrupt-4-0/events/263726390/
Topic : 3D CNN, Time-series modelling, Data Prep for ML
Date : Saturday 31st Aug 2019, 10:00AM – 5:00PM
Loc. : WeWork, RMZ Latitude, Bellary Road, Hebbal, BLR
See you this Saturday for the AI Lab meetup! Attend and join the global AI revolution!
Questions? Call me at +91-9742800566!
Best Regards,
Vivek Singhal
Co-Founder & Chief Data Scientist, CellStrat
+91-9742800566