Posts

Showing posts from April, 2021

Sentiment Prediction using Naive Bayes Algorithm

Introduction

This post describes the sentiment prediction work I did with the Naive Bayes classifier. The experiments used this sentiment labelled dataset, which contains three types of review data:

IMDB Movie Reviews
Amazon Product Reviews
Yelp Reviews

I used the IMDB Movie Reviews dataset, whose textual variation can be found here. The Jupyter Notebook with the results of my experiments is committed in this GitHub Repository.

Dataset

The dataset consists of review and sentiment pairs, as follows:

Figure 1: The IMDB review dataset

The reviews are movie reviews with both positive and negative sentiments. Each review is labelled with its sentiment, with positive as 0 and negative as 1.

Goals

The main aim of this project was to understand how the Naive Bayes classifier works. The following are the important MVPs of the project:

Predicting the sentiment for a given review.
Dividing the dataset into train, dev and tes...
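To make the idea of predicting a sentiment for a given review concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not the code from the notebook: the function names `train_naive_bayes` and `predict`, the four toy reviews, and the string labels are all made up for illustration, and the real IMDB data and its label encoding are not used here.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(text, model):
    """Return the label with the highest log posterior for the given text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # Log prior: fraction of training reviews carrying this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing keeps unseen (word, label) pairs from zeroing the product.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy reviews, not taken from the IMDB dataset.
toy_reviews = [
    ("a great and wonderful movie", "positive"),
    ("loved the acting great plot", "positive"),
    ("terrible boring movie", "negative"),
    ("awful plot and bad acting", "negative"),
]

model = train_naive_bayes(toy_reviews)
print(predict("great acting", model))
print(predict("boring bad plot", model))
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer reviews.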

Overfitting in Machine Learning

Understanding Overfitting using Higher-Order Linear Regression

This project explores overfitting using linear regression and polynomial models in scikit-learn. We will also see how to avoid overfitting using regularization techniques.

Please refer to this Jupyter notebook for the whole code.

First, we will understand overfitting...

Overfitting

Overfitting is the phenomenon where the model fits the training data almost perfectly but has large errors on unseen data.

In short: small or zero training error but very high validation error, meaning the model has memorized the training data.

If this happens, the model will only work when the input lies in the training data, and it will not behave as expected on unseen validation data.

This usually happens when the model is complex or the training dataset is small. If a model is overfit supplying more data for tr...
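As a rough illustration of the definition above (small training error, large validation error), here is a minimal pure-Python sketch. It is not the notebook's code and does not use scikit-learn: as a stand-in, it compares a least-squares straight line with a degree-7 polynomial that interpolates all eight training points exactly. The data are synthetic and assumed for the example (a line `y = 2x + 1` plus alternating noise of ±0.5), and the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of a straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1 that passes through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def truth(x):
    # Assumed underlying relationship for the synthetic data.
    return 2 * x + 1


# Training set: 8 points with alternating +0.5 / -0.5 noise.
train_x = [float(i) for i in range(8)]
train_y = [truth(x) + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]

# Validation set: noise-free points halfway between the training inputs.
val_x = [i + 0.5 for i in range(7)]
val_y = [truth(x) for x in val_x]

line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)

print("line train/val MSE:", mse(line, train_x, train_y), mse(line, val_x, val_y))
print("poly train/val MSE:", mse(poly, train_x, train_y), mse(poly, val_x, val_y))
```

The interpolating polynomial reproduces the noisy training points, so its training error is zero, but it swings widely between them and does much worse than the straight line on the validation points.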