Streamly
What is it?
A data analysis and modelling project for Streamly, a fictional streaming platform. Using its catalogue and engagement data, I analysed which content drives user retention and built regression models to predict it, giving the business a data-backed foundation for content decisions.
Why I built it
Streaming platforms live and die by retention. I wanted to know what the data says once intuition is set aside: does budget correlate with ROI? Do certain genres retain users better than others? This project was about asking those questions rigorously and letting the analysis answer them.
How it works
- Started with EDA: cleaned nulls, encoded categoricals, capped outliers, and plotted a correlogram to understand variable relationships.
- Calculated ROI and EAROI (engagement-adjusted ROI) across genres to identify which content types deliver the best returns.
- Modelled user retention using Multiple Linear Regression and Random Forest Regressor, with R² as the evaluation metric.
- Drew actionable insights from correlation analysis on which features are most strongly associated with whether a user comes back.
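
The steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the column names, the median imputation, the 1st/99th percentile caps, and in particular the EAROI formula (ROI multiplied by an engagement score) are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the catalogue data; every column name is hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "genre": rng.choice(["drama", "comedy", "documentary"], size=n),
    "budget": rng.lognormal(mean=2.0, sigma=0.5, size=n),
    "revenue": rng.lognormal(mean=2.3, sigma=0.6, size=n),
    "engagement": rng.uniform(0.0, 1.0, size=n),
})
noise = rng.normal(0.0, 0.05, size=n)
df["retention"] = (0.3 + 0.5 * df["engagement"] + noise).clip(0.0, 1.0)
# Inject some missing budgets so the cleaning step has work to do.
df.loc[df.sample(frac=0.05, random_state=0).index, "budget"] = np.nan


def prepare(raw: pd.DataFrame) -> pd.DataFrame:
    """Fill nulls, cap outliers, and add ROI columns (assumed definitions)."""
    out = raw.copy()
    num_cols = ["budget", "revenue", "engagement"]
    # Assumption: missing numeric values are filled with the column median.
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # Assumption: outliers are capped at the 1st and 99th percentiles.
    lower = out[num_cols].quantile(0.01)
    upper = out[num_cols].quantile(0.99)
    out[num_cols] = out[num_cols].clip(lower=lower, upper=upper, axis=1)
    # Standard ROI: net return relative to cost.
    out["roi"] = (out["revenue"] - out["budget"]) / out["budget"]
    # Hypothetical EAROI: ROI weighted by engagement; the real formula may differ.
    out["earoi"] = out["roi"] * out["engagement"]
    return out


clean = prepare(df)

# Pairwise correlations between numeric columns (the data behind a correlogram).
corr = clean.select_dtypes("number").corr()

# Average ROI and EAROI per genre.
genre_summary = clean.groupby("genre")[["roi", "earoi"]].mean()

# One-hot encode genre; drop_first avoids perfectly collinear dummy columns.
X = pd.get_dummies(
    clean[["genre", "budget", "revenue", "engagement"]],
    columns=["genre"],
    drop_first=True,
    dtype=float,
)
y = clean["retention"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit both models and score them with R² on the held-out split.
models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

print(genre_summary)
print(scores)
```

Scoring both models on the same held-out split keeps the R² values comparable. Passing `corr` to `seaborn.heatmap` would be one way to draw the correlogram.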
Tech Stack
- Language: Python
- Models: Multiple Linear Regression, Random Forest Regressor
- Libraries: pandas, numpy, scikit-learn, matplotlib, seaborn