Decoding Emotions: Unveiling Sentiments in IMDb Movie Reviews with NLTK’s SentimentIntensityAnalyzer

Lukman Aliyu
3 min read · Jun 21, 2023

Introduction

Sentiment analysis is a technique for identifying the sentiment or emotional tone of text data. In this article, we perform sentiment analysis on the IMDb movie reviews corpus that ships with NLTK, using the library’s SentimentIntensityAnalyzer, which implements the lexicon- and rule-based VADER model. NLTK provides a broad set of tools for natural language processing tasks, including sentiment analysis. With the SentimentIntensityAnalyzer, we can score each review quickly, with no model training, and see whether it is classified as positive, negative, or neutral.

Performing Sentiment Analysis with NLTK

To perform sentiment analysis on the IMDb Movie Reviews dataset using NLTK’s SentimentIntensityAnalyzer, follow these steps:

Dataset Preparation:

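Import the necessary libraries: NLTK, Pandas, and scikit-learn’s classification_report, which is used later for evaluation. Then download the movie review corpus and the VADER lexicon.

import nltk
import pandas as pd
from nltk.corpus import movie_reviews
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.metrics import classification_report

# Download the movie review corpus and the VADER lexicon (only needed once)
nltk.download("movie_reviews")
nltk.download("vader_lexicon")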

Load the IMDb Movie Reviews dataset from the NLTK corpus.

# Load the IMDb Movie Reviews dataset as (review text, label) pairs
reviews = []
for category in movie_reviews.categories():  # 'neg' and 'pos'
    for fileid in movie_reviews.fileids(category):
        review = movie_reviews.raw(fileid)
        sentiment = 'positive' if category == 'pos' else 'negative'
        reviews.append((review, sentiment))

Sentiment Analysis:

Initialize the SentimentIntensityAnalyzer object from NLTK.

# Initialize the SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

Iterate over the reviews and use the SentimentIntensityAnalyzer to obtain sentiment scores for each one. Based on the compound score, which ranges from -1 (most negative) to +1 (most positive), categorize the sentiment as positive, negative, or neutral. Finally, store the original sentiment label and the predicted sentiment label for each review.

# Perform sentiment analysis on the movie reviews
sentiments = []
for review, sentiment in reviews:
    sentiment_scores = sid.polarity_scores(review)

    # Extract the compound score, which represents the overall sentiment
    compound_score = sentiment_scores['compound']

    if compound_score >= 0.05:
        predicted_sentiment = "positive"
    elif compound_score <= -0.05:
        predicted_sentiment = "negative"
    else:
        predicted_sentiment = "neutral"

    sentiments.append((review, sentiment, predicted_sentiment))
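
The thresholding rule in the loop above can be pulled out into a small helper, which makes the cutoff easy to adjust and to test in isolation. This is a minimal sketch: the function name `label_from_compound` and its `threshold` parameter are illustrative and not part of NLTK, and the default of 0.05 simply mirrors the cutoff used in this article.

```python
def label_from_compound(compound_score, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label.

    Hypothetical helper (not part of NLTK); the 0.05 default matches
    the cutoff used in the loop above.
    """
    if compound_score >= threshold:
        return "positive"
    if compound_score <= -threshold:
        return "negative"
    return "neutral"


# Example scores chosen by hand for illustration, not real VADER output
for score in (0.8, 0.05, 0.0, -0.049, -0.3):
    print(score, label_from_compound(score))
```

Inside the loop, the if/elif/else chain could then be replaced by a single call such as `predicted_sentiment = label_from_compound(compound_score)`.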

Analysis and Evaluation:

Convert the sentiment analysis results into a Pandas DataFrame for further analysis. Examine a sample of the DataFrame to understand the predicted sentiments compared to the actual sentiments.

# Convert sentiments to a DataFrame for easier analysis
df = pd.DataFrame(sentiments, columns=['Review', 'Actual Sentiment', 'Predicted Sentiment'])

# Print a sample of the results
print(df.head())
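
Beyond looking at the first few rows, a cross-tabulation of actual against predicted labels shows where the predictions land, including how many fall into the neutral bucket. The sketch below uses only the standard library; the `pairs` list is made-up stand-in data, and in practice it would be derived from the `sentiments` list built earlier.

```python
from collections import Counter

# Stand-in (actual, predicted) pairs, invented for illustration only.
# With the real data:
#   pairs = [(actual, predicted) for _, actual, predicted in sentiments]
pairs = [
    ("positive", "positive"),
    ("positive", "positive"),
    ("positive", "negative"),
    ("negative", "positive"),
    ("negative", "negative"),
    ("negative", "neutral"),
]

# Count each (actual, predicted) combination
crosstab = Counter(pairs)

# Print one row per actual label, one column per predicted label
for actual in ("positive", "negative"):
    row = {pred: crosstab[(actual, pred)]
           for pred in ("positive", "negative", "neutral")}
    print(actual, row)
```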

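Evaluate the accuracy of the sentiment analysis by comparing the predicted sentiments with the actual sentiments. Because the dataset contains only positive and negative labels, reviews predicted as neutral are excluded before the comparison.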

# Filter out the neutral class from the data
filtered_df = df[df['Predicted Sentiment'] != 'neutral']

# Convert the actual and predicted sentiments to lists
actual_sentiments = filtered_df['Actual Sentiment'].tolist()
predicted_sentiments = filtered_df['Predicted Sentiment'].tolist()

# Generate the classification report
report = classification_report(actual_sentiments, predicted_sentiments)

# Print the classification report
print(report)
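
To make the report easier to read, it may help to see how its per-class precision, recall, and F1 values are computed. The following is a simplified sketch in plain Python rather than scikit-learn's implementation; the function name `per_class_metrics` and the example labels are assumptions made for illustration.

```python
def per_class_metrics(actual, predicted, label):
    """Compute precision, recall, and F1 for one class label.

    Simplified illustration; not scikit-learn's implementation.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)
    fp = sum(1 for a, p in pairs if a != label and p == label)
    fn = sum(1 for a, p in pairs if a == label and p != label)

    # Guard against division by zero when a class never appears
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Made-up labels for illustration, not results from the dataset
actual = ["positive", "positive", "positive", "negative", "negative"]
predicted = ["positive", "positive", "negative", "positive", "negative"]

for label in ("positive", "negative"):
    print(label, per_class_metrics(actual, predicted, label))
```

Because neutral predictions are filtered out before the report is generated, the scores describe only the reviews for which the analyzer committed to a polarity.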

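The metrics are not especially strong. This is not entirely surprising: the SentimentIntensityAnalyzer implements VADER, a lexicon- and rule-based model designed for short social media text, whereas these reviews are long and often mix praise with criticism.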
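
One possible contributor to the weak metrics is the way the compound score is normalized. As far as we understand VADER, the compound score is the sum of the word-level valences squashed by x / sqrt(x² + α), with α = 15 by default, and the sum is not averaged over the length of the text. The sketch below reproduces only that formula; the value of `alpha` and the raw sums are assumptions to check against your installed version, not output from the analyzer.

```python
import math


def normalize(raw_sum, alpha=15):
    """Squash an unbounded valence sum into the range (-1, 1).

    Uses the formula x / sqrt(x^2 + alpha); alpha=15 is assumed here
    to match VADER's default and should be verified.
    """
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)


# Hypothetical raw sums: a short text accumulates a small total,
# while a long review can accumulate a much larger one.
for raw_sum in (0.2, 2.0, 20.0):
    print(raw_sum, round(normalize(raw_sum), 3))
```

Under this formula, a long review with even a modest surplus of positive or negative words ends up with a compound score close to ±1, which could be one reason a ±0.05 cutoff separates long reviews poorly.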

Conclusion

Sentiment analysis is a valuable technique for understanding the sentiment expressed in textual data. In this article, we demonstrated how to perform sentiment analysis on the IMDb Movie Reviews dataset using NLTK’s SentimentIntensityAnalyzer. The analyzer produced sentiment scores for each review, which we used to categorize it as positive, negative, or neutral, and we then compared the predicted sentiments with the actual sentiments to evaluate the results. The approach is simple and requires no training, but the metrics show that it is not very accurate on this dataset out of the box. They can likely be improved with a little more work, for example by tuning the compound-score thresholds or scoring reviews sentence by sentence; our focus in this article was only to demonstrate the possibilities. Applied with care, sentiment analysis can help researchers and businesses extract valuable information from large volumes of text data, leading to better decision-making and an improved understanding of public opinion.

Lukman Aliyu

Pharmacist enthusiastic about Data Science/AI/ML | Fellow, Arewa Data Science Academy