In the ever-evolving world of Natural Language Processing (NLP), sentiment analysis remains a crucial task. Today, we'll dive into a powerful approach to sentiment analysis using BERT (Bidirectional Encoder Representations from Transformers) on the IMDB movie reviews dataset. This blog will guide you through building a sentiment analysis model that classifies movie reviews as positive or negative.
The Dataset
We'll be using the IMDB dataset, which contains 50,000 movie reviews split evenly between positive and negative sentiments. This dataset is widely used in the NLP community and provides a great starting point for sentiment analysis tasks.
Setting Up the Environment
Before we begin, make sure you have the necessary libraries installed:
```bash
pip install pandas datasets scikit-learn transformers torch tensorflow matplotlib seaborn
pip install --upgrade tensorflow transformers
```
Loading and Preprocessing the Data
First, let's load the IMDB dataset using the Hugging Face `datasets` library:
```python
from datasets import load_dataset
import pandas as pd

# Load IMDB dataset
dataset = load_dataset('imdb')

# Convert to pandas DataFrames
train_dataframe = pd.DataFrame(dataset['train'])
test_dataframe = pd.DataFrame(dataset['test'])

# Display basic info
print(train_dataframe.info())
print(train_dataframe['label'].value_counts(normalize=True))
```
Let's also visualize the distribution of sentiment labels:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Plot the distribution of sentiment labels in the training set
plt.figure(figsize=(8, 6))
sns.countplot(x='label', data=train_dataframe)
plt.title('Distribution of Sentiment Labels')
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.show()
```
Preprocessing with BERT Tokenizer
Next, we'll preprocess the text data using BERT's tokenizer:
```python
from transformers import BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def preprocess_data(texts, labels, max_length=256):
    encoded = tokenizer.batch_encode_plus(
        texts,
        add_special_tokens=True,
        max_length=max_length,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )
    return {
        'input_ids': encoded['input_ids'],
        'attention_mask': encoded['attention_mask'],
        'labels': torch.tensor(labels)
    }

# Preprocess training and testing data
train_data = preprocess_data(train_dataframe['text'].tolist(), train_dataframe['label'].tolist())
test_data = preprocess_data(test_dataframe['text'].tolist(), test_dataframe['label'].tolist())
```
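If you want to sanity-check what the tokenizer produced, a quick optional inspection of the first encoded review shows the fixed sequence length and the special tokens BERT expects (this assumes `train_data` and `tokenizer` from the cell above):

```python
# Optional sanity check: look at the first encoded training example.
sample_ids = train_data['input_ids'][0]
print(sample_ids.shape)                    # torch.Size([256]) because max_length=256
print(tokenizer.decode(sample_ids[:20]))   # starts with [CLS]; [SEP]/[PAD] appear further along
```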
Setting Up the Model
We'll use the `BertForSequenceClassification` model from Hugging Face:
```python
from transformers import BertForSequenceClassification
from torch.optim import AdamW  # use PyTorch's AdamW; the transformers version is deprecated
from torch.utils.data import DataLoader, TensorDataset

# Initialize model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Optimizer
optimizer = AdamW(model.parameters(), lr=2e-5)

# Create DataLoader
train_dataset = TensorDataset(train_data['input_ids'], train_data['attention_mask'], train_data['labels'])
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
```
Training the Model
Now, let's train our model:
```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

num_epochs = 3

for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        input_ids, attention_mask, labels = [b.to(device) for b in batch]

        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()

    print(f"Epoch {epoch + 1}/{num_epochs} complete.")

# Save the fine-tuned weights
torch.save(model.state_dict(), 'bert_sentiment_v1_model.pth')
```
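If you come back to this later, for example in a fresh Colab session, you can restore the saved weights before evaluating or predicting. A minimal sketch, assuming the same `bert_sentiment_v1_model.pth` file and model definition as above:

```python
# Recreate the architecture, then load the fine-tuned weights saved above.
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model.load_state_dict(torch.load('bert_sentiment_v1_model.pth', map_location=device))
model.to(device)
model.eval()
```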
Training Environment Details
I trained the model on Google Colab with the runtime configured to use a GPU. Even with GPU acceleration, training took approximately 49 minutes to complete.
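Before kicking off a long fine-tuning run, it's worth confirming that Colab actually gave you a GPU. A quick check like this (run in any cell) helps avoid an accidental CPU-only run:

```python
import torch

# Confirm a CUDA device is visible before starting the fine-tuning loop.
print(torch.cuda.is_available())              # True if a GPU runtime is attached
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))      # prints the GPU model Colab assigned
```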
Evaluating the Model
After training, let's evaluate our model's performance:
```python
from sklearn.metrics import accuracy_score, classification_report

model.eval()

test_dataset = TensorDataset(test_data['input_ids'], test_data['attention_mask'], test_data['labels'])
test_loader = DataLoader(test_dataset, batch_size=32)

all_preds = []
all_labels = []

with torch.no_grad():
    for batch in test_loader:
        input_ids, attention_mask, labels = [b.to(device) for b in batch]
        outputs = model(input_ids, attention_mask=attention_mask)
        preds = torch.argmax(outputs.logits, dim=1)
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

accuracy = accuracy_score(all_labels, all_preds)
print(f"Accuracy: {accuracy}")
print(classification_report(all_labels, all_preds))
```
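If you want a finer-grained view than the aggregate report, a confusion matrix shows exactly how many reviews of each class were misclassified. This is an optional addition that reuses `all_labels` and `all_preds` from the cell above:

```python
from sklearn.metrics import confusion_matrix

# Rows are true classes (0 = negative, 1 = positive), columns are predicted classes.
cm = confusion_matrix(all_labels, all_preds)
print(cm)
```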
Interpreting the Model's Performance
Let's break down the results of our sentiment analysis model:
Overall Accuracy
The model achieved an overall accuracy of 92.128% on the 25,000-review test set, meaning it correctly classified roughly 23,032 of the reviews as either positive or negative.
Class-specific Metrics
Negative Reviews (Class 0)
Precision: 0.93
Recall: 0.91
F1-score: 0.92
Positive Reviews (Class 1)
Precision: 0.91
Recall: 0.93
F1-score: 0.92
Interpretation
Balanced Performance: The model shows consistent performance across both positive and negative reviews, with identical F1-scores of 0.92 for both classes. This indicates that the model is well-balanced and doesn't favor one sentiment over the other.
Precision:
For negative reviews, the precision of 0.93 means that when the model predicts a review is negative, it's correct 93% of the time.
For positive reviews, the precision of 0.91 indicates that when the model predicts a review is positive, it's correct 91% of the time.
Recall:
For negative reviews, the recall of 0.91 means the model correctly identifies 91% of all actual negative reviews.
For positive reviews, the recall of 0.93 shows the model correctly identifies 93% of all actual positive reviews.
F1-Score:
The F1-score of 0.92 for both classes represents a strong balance between precision and recall, indicating robust overall performance (see the quick check after this list).
Support:
The test set contains an equal number of positive and negative reviews (12,500 each), ensuring a balanced evaluation.
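As a quick sanity check on the F1 numbers, the harmonic-mean formula F1 = 2·P·R / (P + R) reproduces the reported scores from the per-class precision and recall above:

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.93, 0.91), 2))  # negative class -> 0.92
print(round(f1(0.91, 0.93), 2))  # positive class -> 0.92
```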
In conclusion, this model demonstrates excellent and balanced performance in sentiment analysis of movie reviews. Its high accuracy and consistent metrics across both positive and negative sentiments make it a reliable tool for classifying movie review sentiments.
Making Predictions
Finally, let's create a function to predict sentiment for new reviews:
```python
def predict_sentiment(text):
    encoded = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=256,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )
    input_ids = encoded['input_ids'].to(device)
    attention_mask = encoded['attention_mask'].to(device)

    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask)
        pred = torch.argmax(outputs.logits, dim=1)

    return "Positive" if pred.item() == 1 else "Negative"

# Test with new sentences
print(predict_sentiment("I love this movie!"))
print(predict_sentiment("This movie was terrible."))
```
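If you also want a confidence score alongside the label, you can turn the logits into probabilities with a softmax. Here is a small variation on the function above; the helper name `predict_with_confidence` is just for illustration:

```python
import torch.nn.functional as F

def predict_with_confidence(text):
    encoded = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=256,
        padding='max_length',
        truncation=True,
        return_tensors='pt'
    )
    with torch.no_grad():
        outputs = model(encoded['input_ids'].to(device),
                        attention_mask=encoded['attention_mask'].to(device))
        probs = F.softmax(outputs.logits, dim=1)[0]
    label = "Positive" if probs[1] > probs[0] else "Negative"
    return label, probs.max().item()

print(predict_with_confidence("I love this movie!"))  # e.g. ("Positive", 0.99)
```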
Here is my Google Colab notebook if you want to copy and run the code:
https://colab.research.google.com/drive/138HZtdJib-aOoldL4pvl3rLwx8_YsQ1D?usp=drive_link
Conclusion
In this blog, we've walked through the process of building a sentiment analysis model using BERT on the IMDB movie reviews dataset. This powerful approach leverages the pre-trained BERT model and fine-tunes it for our specific task, resulting in high accuracy in sentiment classification.
By following these steps, you can create your own sentiment analysis model and apply it to various text classification tasks. Remember that you can further improve the model by experimenting with different hyperparameters, using more advanced BERT variants, or incorporating additional features into your analysis.
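For example, swapping in a lighter variant such as DistilBERT is mostly a matter of changing the checkpoint name; a minimal sketch (the rest of the pipeline stays the same, and `distilbert-base-uncased` is just one possible choice):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# The Auto* classes pick the right architecture from the checkpoint name,
# so the same preprocessing and fine-tuning loop above can be reused with a different backbone.
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)
```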
Happy coding and sentiment analyzing!
Thanks
Sreeni Ramadorai