Boosting Algorithms; Named Entity Recognition
- January 27, 2020
- Posted by: vsinghal
- Category: Machine Learning Natural Language Processing
#CellStratAILab #disrupt4.0 #WeCreateAISuperstars #AlwaysUpskilling #LifetimeLearning
Minutes from Saturday 18th Jan AI Lab Meetup at BLR :-
We saw some superb presentations by AI Lab Researchers at the 18th Jan AI Lab meetup in BLR.
![](http://www.cellstrat.com/wp-content/uploads/2020/01/Collage-3-1024x768.jpg)
Boosting Algorithms :-
First, AI Lab Researcher Natarajan L. presented an excellent overview of boosting algorithms, including AdaBoost, XGBoost and Gradient Boosting Machines. Boosting rests on the idea that a number of weak learners can be combined into a strong model. A weak learner is a model that performs only slightly better than a random guess. Examples include a shallow decision tree or, on some problems, a simple logistic regression. This almost equates to “Wisdom from a Council of Fools”.
Boosting trains weak learners one after another. Each learner may misclassify some samples, so the next iteration of training boosts those samples (i.e. gives them higher weights), which forces the next learner to focus on them. This reweighting is how AdaBoost works; gradient boosting methods instead fit each new learner to the errors left by the ensemble so far. The final model is the combined result of all the learners, so it picks up the major nuances of the data and predicts more accurately than any single weak learner.
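The reweighting loop described above can be sketched in a few lines of plain Python. This is a minimal AdaBoost-style illustration, not the code shown at the meetup: the one-dimensional toy data, the decision "stump" weak learner and the function names `train_stump`, `adaboost` and `predict` are all assumptions made for the example.

```python
import math

def stump_predict(x, threshold, polarity):
    # A decision stump: predict `polarity` at or above the threshold, the opposite below it.
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    # Pick the (threshold, polarity) pair with the lowest weighted error.
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # this learner's vote
        ensemble.append((alpha, threshold, polarity))
        # Boost misclassified samples (weight goes up), shrink correct ones.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    # Weighted vote of all stumps.
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single threshold separates the +1 and -1 labels.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    model = adaboost(xs, ys, rounds=rounds)
    correct = sum(predict(model, x) == y for x, y in zip(xs, ys))
    print(rounds, "round(s):", correct, "of", len(xs), "correct")
```

On this toy data a single stump gets 7 of the 10 samples right, while the weighted vote of three stumps classifies all 10 correctly. In practice one would use a library implementation rather than hand-rolled code like this.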
![](http://www.cellstrat.com/wp-content/uploads/2020/01/Boosting.png)
Then Natarajan presented individual boosting algorithms such as AdaBoost, XGBoost, Gradient Boosting Machines, LightGBM and CatBoost in detail. XGBoost is particularly popular in Kaggle competitions. Natarajan also covered SHAP (SHapley Additive exPlanations) values, which break a prediction down to show the contribution of each feature. These values are useful in applications such as determining the parameters that drive a loan application rejection or the health parameters that drive the risk of certain ailments.
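To make the idea of breaking down a prediction concrete, here is a small sketch that computes exact Shapley values by brute force: it averages each feature's marginal contribution over every order in which features can be switched from a baseline to their actual values. It does not use the SHAP library (which relies on much faster approximations), and the `loan_risk` scoring function, its coefficients and the applicant numbers are invented purely for illustration.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    # Exact Shapley values: average marginal contribution of each feature
    # over all feature orderings. Only feasible for a handful of features.
    n = len(x)
    contrib = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)     # start from the baseline input
        prev = model(current)
        for i in order:
            current[i] = x[i]        # switch feature i to its actual value
            new = model(current)
            contrib[i] += new - prev
            prev = new
    return [c / len(orderings) for c in contrib]

def loan_risk(features):
    # Hypothetical risk score, NOT a real credit model.
    income, debt, late_payments = features
    return (0.5 - 0.004 * income + 0.006 * debt
            + 0.1 * late_payments + 0.002 * debt * late_payments)

baseline = [60, 20, 0]    # assumed "typical" applicant
applicant = [50, 40, 2]   # applicant whose score we want to explain

phi = shapley_values(loan_risk, applicant, baseline)
for name, value in zip(["income", "debt", "late_payments"], phi):
    print(f"{name}: {value:+.2f}")
```

The contributions add up to the difference between the applicant's score and the baseline score, which is the property that makes these values handy for explaining an individual decision.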
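Because of their power, boosting algorithms can overfit, and training many learners one after another is computationally expensive. We often need to regularize them, for example by shrinking each learner's contribution with a learning rate, limiting tree depth, or stopping training early.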
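One common regularizer is shrinkage: each new learner's output is scaled by a small learning rate before it is added to the ensemble. The sketch below shows this for a bare-bones gradient boosting regressor with squared loss, where each stump is fitted to the current residuals. The data and the names `fit_stump`, `gradient_boost` and `gb_predict` are assumptions for the example, not the API of any library.

```python
def fit_stump(xs, residuals):
    # Least-squares regression stump: returns (threshold, left_mean, right_mean).
    best = None
    for t in sorted(set(xs))[1:]:    # skip the smallest x so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def gradient_boost(xs, ys, rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)         # initial prediction: the mean target
    preds = [base] * len(xs)
    stumps, history = [], []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        # Shrinkage: add only a fraction of the new stump's output.
        preds = [p + learning_rate * (lm if x < t else rm)
                 for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    model = {"base": base, "learning_rate": learning_rate, "stumps": stumps}
    return model, history

def gb_predict(model, x):
    total = model["base"]
    for t, lm, rm in model["stumps"]:
        total += model["learning_rate"] * (lm if x < t else rm)
    return total

# Hypothetical noisy, roughly increasing data.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.2, 0.8, 1.9, 3.1, 2.7, 4.2, 5.1, 4.8, 6.3, 7.0]

model, history = gradient_boost(xs, ys, rounds=20, learning_rate=0.3)
print("training MSE after 1 round:  ", round(history[0], 4))
print("training MSE after 20 rounds:", round(history[-1], 4))
```

A smaller learning rate makes each round fit the training data less aggressively, so it is usually paired with more rounds and with monitoring error on held-out data to decide when to stop.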
Named Entity Recognition (NER) :-
Next, AI Lab Researcher Bhanumathi K. presented a super session on Named Entity Recognition with NLP algorithms. NER is a subtask of information extraction that seeks to locate named entities in text and classify them into predefined categories such as the names of persons, organisations and locations, expressions of time, and monetary values.
NER extracts information and categorises the entities into predefined categories of a specific domain using statistical models. This finds use in a large number of applications such as information extraction, summarization, natural language understanding (NLU), question answering, ontology construction, search applications, customer support ticket categorization, document classification and recommender systems.
NER systems have been built with linguistic grammar-based techniques as well as with statistical models such as machine learning.
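As a rough illustration of the rule-based end of that spectrum, the sketch below tags a few entity types with hand-written regular expressions. The patterns, labels and sample sentence are assumptions made for the example; they are far cruder than a real grammar-based system and will miss or mislabel many real-world cases, which is part of why statistical models took over.

```python
import re

# Hand-written patterns, one per entity category (illustrative only).
PATTERNS = [
    ("PERSON", re.compile(r"\b(?:Mr|Ms|Dr)\.\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)?")),
    ("ORG", re.compile(r"\b(?:[A-Z][A-Za-z]+\s)+(?:Inc|Ltd|Corp|Labs)\b")),
    ("TIME", re.compile(r"\b\d{1,2}:\d{2}\s?(?:AM|PM)\b", re.IGNORECASE)),
    ("MONEY", re.compile(r"(?:\$|Rs\.?\s?)\d[\d,]*(?:\.\d+)?")),
]

def extract_entities(text):
    # Return (entity_text, label) pairs in the order they appear in the text.
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    found.sort()
    return [(entity, label) for _, entity, label in found]

sentence = "Dr. Asha Rao joined Acme Labs at 10:30 AM and was paid $1,200.50."
for entity, label in extract_entities(sentence):
    print(label, "->", entity)
```

Every new entity type or writing style needs another hand-crafted rule here, whereas a statistical model learns such cues from annotated examples.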
To reduce the need for a lot of manual annotation, semi-supervised methods have been tried for NER. Many best-of-breed libraries offer pre-trained models for NER, including NLTK, spaCy, Stanford NLP, AllenNLP, Flair, Polyglot and DeepPavlov. Some of these come from more recent research and might outperform the older ones.
Wish to engage with CellStrat AI Lab, which is already India’s No 1 AI Lab? Want to learn about our world-class AI/ML training and internships? If yes, visit our AI Lab in BLR or Gurugram this Saturday. RSVP below :-
BLR AI Lab meetup :-
Register : https://www.meetup.com/Disrupt-4-0/events/qqmxlrybcdbcb/
Topic : Graphical Neural Nets, Sarcasm Detection from news articles
Date : Saturday 1st Feb 2020, 10:30 AM – 5 PM
Presenters : Pushparaj M., Shreyas S K
Gurugram AI Lab meetup :-
Register : https://www.meetup.com/Disrupt-4-0/events/267878638/
Topic : ML in Google Maps, Naive-Bayes Classifier
Date : Saturday 1st Feb 2020, 10:30 AM – 5 PM
Presenters : Saakshi Bhargava, Ashish Bajpai
See you this Saturday for the AI Lab meetup ! Let’s disrupt the world with AI, together !
Questions ? Call me at +91-9742800566 !
Best Regards,
Vivek Singhal
Co-Founder & Chief Data Scientist, CellStrat
+91-9742800566