Movie Genre Prediction Using Machine Learning & NLP

Project Overview

This project explores how to predict movie genres using machine learning and NLP techniques. By analyzing movie synopses, we aim to determine which genres are most popular at different times of the year and whether we can accurately classify a movie’s genre based on its description.

We explore both traditional ML models (Naïve Bayes, SVM, Gradient Boosting) and deep learning approaches (Universal Sentence Encoder for semantic similarity) to improve genre predictions.

📖 Read the full blog post on: medium.com.

Feature Engineering (TF-IDF)

To convert text into numerical features, we use TF-IDF vectorization, which helps emphasize important words while down-weighting common terms.

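from sklearn.feature_extraction.text import TfidfVectorizer

# train_corpus is expected to be a list of raw synopsis strings
tf_vec = TfidfVectorizer()
# Learn the vocabulary and IDF weights, then return a sparse document-term matrix
X_train = tf_vec.fit_transform(train_corpus)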
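
To make the weighting concrete, here is a minimal standard-library sketch of the smoothed TF-IDF formula that scikit-learn documents as its default (idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row). The toy corpus and the whitespace tokenizer are assumptions for illustration; TfidfVectorizer tokenizes differently (for example, it drops single-character tokens by default), so the numbers will not match it exactly.

```python
import math
from collections import Counter


def tfidf(corpus):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in document i."""
    # Simplified tokenizer (assumption): lowercase and split on whitespace.
    docs = [doc.lower().split() for doc in corpus]
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        weights = [counts[t] * idf[t] for t in vocab]
        # L2-normalize so every document vector has unit length.
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return vocab, rows


# Hypothetical synopses, not taken from the project's dataset.
toy_corpus = [
    "the detective hunts a killer",
    "the couple falls in love",
    "the detective falls in love",
]
vocab, rows = tfidf(toy_corpus)
```

In this sketch "the" occurs in every synopsis and so receives the lowest possible IDF, while "killer" occurs in only one and ends up with a larger weight in that document.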

Model Training & Predictions

We train multiple models to classify movies into genres:

Naïve Bayes – Baseline model
SVM (Support Vector Machine) – Improved classification performance over the baseline
Gradient Boosting Classifier – Boosted performance by combining many weak learners (decision trees) fitted sequentially

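from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB

model = Pipeline([
    ("vectorizer", TfidfVectorizer()),
    ("classifier", MultinomialNB())
])

# The pipeline vectorizes internally, so fit it on the raw synopses
# (train_corpus), not on an already-transformed TF-IDF matrix.
model.fit(train_corpus, y_train)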
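
The other two models can be swapped into the same pipeline. The sketch below is one way to do that, not the project's actual training code: the toy synopses and labels are hypothetical, LinearSVC is an assumed choice of SVM variant, and the Gradient Boosting settings are picked only to keep the example fast.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-ins for the project's synopses and genre labels.
toy_corpus = [
    "a detective investigates a murder in the city",
    "police chase a killer through the night",
    "a murder trial shocks a small town",
    "two strangers fall in love in paris",
    "a wedding brings old lovers together",
    "a summer romance changes her life",
]
toy_labels = ["Crime", "Crime", "Crime", "Romance", "Romance", "Romance"]

candidates = {
    "naive_bayes": MultinomialNB(),
    "svm": LinearSVC(),  # assumed SVM variant
    "gradient_boosting": GradientBoostingClassifier(n_estimators=20, random_state=0),
}

fitted = {}
for name, clf in candidates.items():
    # Each model gets its own vectorizer so the pipelines stay independent.
    pipe = Pipeline([("vectorizer", TfidfVectorizer()), ("classifier", clf)])
    pipe.fit(toy_corpus, toy_labels)
    fitted[name] = pipe

new_synopsis = ["a detective hunts a killer"]
predictions = {name: pipe.predict(new_synopsis)[0] for name, pipe in fitted.items()}
```

Keeping the vectorizer inside each pipeline means the TF-IDF vocabulary is learned only from the data passed to fit, which avoids leaking test text into the features during evaluation.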

Semantic Textual Similarity

We use the Universal Sentence Encoder to embed each description as a 512-dimensional vector, which lets us compute similarity scores between a new movie and the descriptions in our training data, whose genres are known.

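import tensorflow_hub as hub

# Named use_model so it does not overwrite the scikit-learn pipeline called model
use_model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def embed(texts):
    # texts: a list of strings; returns one embedding vector per string
    return use_model(texts)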

The genre of a new movie is determined by finding the most similar show description in the dataset.
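
The lookup described above can be sketched with cosine similarity. This is an illustration under stated assumptions rather than the project's code: the 3-dimensional vectors stand in for the outputs of embed(...), and the genre names and the helper most_similar_genre are hypothetical.

```python
import numpy as np


def most_similar_genre(query_vec, train_vecs, train_genres):
    """Return (genre, score) for the training row most similar to query_vec."""
    # Normalize to unit length so the dot product equals cosine similarity.
    train_unit = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    query_unit = query_vec / np.linalg.norm(query_vec)
    scores = train_unit @ query_unit
    best = int(np.argmax(scores))
    return train_genres[best], float(scores[best])


# Stand-in embeddings (assumption); real ones would come from embed(descriptions).
train_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
])
train_genres = ["Crime", "Romance", "Sci-Fi"]
query_vec = np.array([0.8, 0.3, 0.1])

genre, score = most_similar_genre(query_vec, train_vecs, train_genres)
```

Taking only the single nearest description makes the prediction sensitive to one noisy neighbor; voting over the top few matches is a common variation, though the source does not say whether the project does this.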

Results & Insights

Final Accuracy: 80%
Key Improvements: Merging overlapping genres and using ensemble methods led to a 5% improvement.
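
Merging overlapping genres amounts to relabeling the targets before training. The sketch below shows the idea only: the source does not say which genres were merged, so the GENRE_MERGES mapping and the merge_genres helper are hypothetical.

```python
# Hypothetical mapping; the genres actually merged in the project are not listed.
GENRE_MERGES = {
    "Sci-Fi": "Sci-Fi & Fantasy",
    "Fantasy": "Sci-Fi & Fantasy",
    "Crime": "Crime & Thriller",
    "Thriller": "Crime & Thriller",
}


def merge_genres(labels, mapping=GENRE_MERGES):
    """Replace each label with its merged genre; unmapped labels are unchanged."""
    return [mapping.get(label, label) for label in labels]


raw_labels = ["Sci-Fi", "Fantasy", "Comedy", "Thriller"]
merged_labels = merge_genres(raw_labels)
```

Fewer, broader classes give the classifier more examples per class and remove distinctions that a synopsis alone may not support.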

Repository: Gabya06/nlp_genres
