Streamly

What is it?

A data analysis and modelling project for Streamly, a fictional streaming platform. Using its catalogue and engagement data, I analysed which content actually drives user retention and built regression models to predict it, giving the business a data-backed foundation for content decisions.

Why I built it

Streaming platforms live and die by retention. I wanted to see what the data actually says once you set intuition aside: does a bigger budget mean better ROI? Do certain genres retain users better than others? This project was about asking those questions rigorously and letting the analysis answer them.

How it works

Started with exploratory data analysis (EDA): handled nulls, encoded categorical variables, capped outliers, and plotted a correlogram to understand how the variables relate.
Calculated ROI and EAROI (engagement-adjusted ROI) across genres to identify which content types deliver the best returns.
Modelled user retention using Multiple Linear Regression and Random Forest Regressor, with R² as the evaluation metric.
Used correlation analysis to identify which features most influence whether a user comes back, and turned those findings into actionable insights.
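The EDA step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`genre`, `budget`, `retention_rate`), median imputation, the "Unknown" category for missing genres, and the 1.5 × IQR capping rule are all assumptions, and the small DataFrame is made-up example data.

```python
import numpy as np
import pandas as pd


def preprocess(df, numeric_cols, categorical_cols):
    """Impute nulls, cap outliers at the 1.5 * IQR fences, and one-hot encode categoricals."""
    out = df.copy()
    for col in numeric_cols:
        # Fill missing numeric values with the median, which is robust to skew.
        out[col] = out[col].fillna(out[col].median())
        # Cap values outside the Tukey fences instead of dropping the rows.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    for col in categorical_cols:
        # ASSUMPTION: treat a missing category as its own level.
        out[col] = out[col].fillna("Unknown")
    # drop_first=True avoids perfectly collinear dummy columns in a linear model.
    return pd.get_dummies(out, columns=list(categorical_cols), drop_first=True, dtype=int)


# Hypothetical example rows; a budget of 500.0 is a deliberate outlier.
raw = pd.DataFrame({
    "genre": ["Drama", "Comedy", None, "Drama", "Horror", "Comedy"],
    "budget": [10.0, 2.0, 5.0, np.nan, 3.0, 500.0],
    "retention_rate": [0.62, 0.55, 0.48, 0.70, 0.40, 0.58],
})
clean = preprocess(raw, numeric_cols=["budget", "retention_rate"], categorical_cols=["genre"])

# Pairwise Pearson correlations: the matrix behind a correlogram.
corr = clean.corr()
print(corr.round(2))
```

To draw the correlogram itself, the matrix can be passed to seaborn, for example `sns.heatmap(corr, annot=True, cmap="coolwarm")`. Capping rather than dropping outliers keeps every title in the dataset while limiting how far a single extreme value can pull the correlations.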
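The ROI and EAROI step above might take a shape like this. ROI here is the standard (revenue - budget) / budget. The write-up does not spell out how EAROI is defined, so the engagement weighting below (ROI multiplied by a title's watch hours relative to the catalogue mean) is a hypothetical stand-in, as are the column names `revenue`, `budget`, `watch_hours` and the example rows.

```python
import pandas as pd


def genre_returns(titles):
    """Average ROI and a hypothetical engagement-adjusted ROI (EAROI) per genre."""
    out = titles.copy()
    # Standard ROI: net return per unit of spend.
    out["roi"] = (out["revenue"] - out["budget"]) / out["budget"]
    # ASSUMPTION: EAROI = ROI x (watch hours / catalogue-average watch hours).
    # The project's exact EAROI formula is not given; this weighting is illustrative only.
    out["earoi"] = out["roi"] * (out["watch_hours"] / out["watch_hours"].mean())
    return (
        out.groupby("genre")[["roi", "earoi"]]
        .mean()
        .sort_values("earoi", ascending=False)
    )


# Hypothetical example titles (budget and revenue in the same currency units).
titles = pd.DataFrame({
    "genre": ["Drama", "Drama", "Comedy", "Comedy", "Horror"],
    "budget": [10.0, 20.0, 5.0, 5.0, 2.0],
    "revenue": [15.0, 30.0, 6.0, 4.0, 5.0],
    "watch_hours": [100.0, 300.0, 50.0, 50.0, 100.0],
})
print(genre_returns(titles))
```

Sorting by the engagement-adjusted figure rather than raw ROI favours genres that both pay off and are watched heavily, which is closer to what matters for retention.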
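The modelling step above could look something like this sketch, which fits both models on the same train/test split and reports R² on the held-out rows. The 80/20 split, `n_estimators=200`, the helper name `compare_models`, and the synthetic features and target are assumptions for illustration; they are not the project's data or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, seed=42):
    """Fit both regressors on one split and return their held-out R^2 scores."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "Multiple Linear Regression": LinearRegression(),
        "Random Forest Regressor": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # R^2 on unseen rows: 1.0 is a perfect fit, 0.0 matches predicting the mean.
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


rng = np.random.default_rng(0)
# Synthetic stand-ins for three user-level features and a retention score.
X = rng.normal(size=(500, 3))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.1, size=500)

scores = compare_models(X, y)
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

Scoring on a held-out split rather than the training data matters here: a Random Forest can fit its training set almost perfectly, so training R² would overstate how well it generalises.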
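One simple way to turn the correlation analysis above into a ranked list of retention drivers is to sort features by the absolute value of their Pearson correlation with the target. The function name `rank_drivers` and the columns (`watch_hours`, `support_tickets`, `ads_seen`, `retention`) are hypothetical, and the values are toy data.

```python
import pandas as pd


def rank_drivers(df, target):
    """Return each numeric feature's Pearson correlation with `target`, strongest first."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    # Order by magnitude so strong negative drivers rank alongside strong positive ones.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Hypothetical toy data: one row per user.
users = pd.DataFrame({
    "watch_hours": [1, 2, 3, 4, 5],
    "support_tickets": [4, 4, 2, 1, 0],
    "ads_seen": [2, 1, 2, 1, 2],
    "retention": [0.1, 0.2, 0.3, 0.4, 0.5],
})
print(rank_drivers(users, "retention"))
```

Correlation only measures linear association, so a ranking like this points at candidates worth investigating rather than proving that a feature causes users to stay.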

Tech stack

Language: Python
Models: Multiple Linear Regression, Random Forest Regressor
Libraries: pandas, numpy, scikit-learn, matplotlib, seaborn