Project completed in week 10 (30.11.-04.12.20) of the Data Science Bootcamp at Spiced Academy in Berlin.
This was a really exciting week, because we had a team project which combined the power of Machine Learning algorithms with the beauty of Web Development!
Making a recommender system
Spotify's Discover Weekly, Netflix's "Watch Next", Amazon's "You might like...": these are all examples of recommender systems. They use their users' history (what music they listened to, what series they watched, what they bought) to discover patterns in their preferences and recommend similar products, which in turn keeps users consuming.
There are two main approaches to recommender systems:
- memory-based (heuristic, non-parametric)
  - content filtering: TF-IDF
  - collaborative filtering: KNN
- model-based (algorithmic, parametric)
  - content filtering: Bayesian classification, neural networks
  - collaborative filtering: Bayesian networks, SVD, neural networks
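To make the memory-based idea concrete, here is a minimal sketch of user-based collaborative filtering with nearest neighbours. It is not the code from our project: the users, movies, ratings, and the function names `cosine` and `recommend` are all made up for illustration, and cosine similarity is just one common choice of distance.

```python
from math import sqrt

# Toy ratings: user -> {movie: rating}. All names and numbers are invented.
ratings = {
    "ana":   {"Alien": 5, "Heat": 4, "Up": 1},
    "ben":   {"Alien": 4, "Heat": 5, "Fargo": 5},
    "chloe": {"Up": 5, "Coco": 4, "Alien": 1},
}

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (missing = 0)."""
    dot = sum(a[m] * b[m] for m in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, k=1, n=3):
    """Suggest unseen movies rated by the k most similar users."""
    mine = ratings[user]
    # Rank all other users by similarity and keep the k nearest.
    neighbours = sorted(
        (u for u in ratings if u != user),
        key=lambda u: cosine(mine, ratings[u]),
        reverse=True,
    )[:k]
    # Score each unseen movie by the similarity-weighted neighbour ratings.
    scores = {}
    for u in neighbours:
        weight = cosine(mine, ratings[u])
        for movie, r in ratings[u].items():
            if movie not in mine:
                scores[movie] = scores.get(movie, 0.0) + weight * r
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(recommend("ana"))  # ['Fargo']
```

Nothing is trained here: the "model" is the rating table itself, which is why this family is called memory-based.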
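In our movie recommender project, we used two algorithms, NMF (non-negative matrix factorization) and SVD (singular value decomposition), both of which are matrix factorization techniques.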
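For a feel of what a matrix factorization recommender does, here is a small pure-Python sketch of NMF (non-negative matrix factorization) using the classic multiplicative update rules. This is an illustration under assumptions, not our project code: in practice you would reach for a library implementation, the toy matrix is invented, and the sketch assumes every missing rating has already been filled in (for example with a movie's mean rating).

```python
import random

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def error(v, w, h):
    """Squared reconstruction error between V and W @ H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank=2, steps=200, seed=0, eps=1e-9):
    """Factorize V (users x movies) into non-negative W (users x rank), H (rank x movies)."""
    rng = random.Random(seed)
    n_users, n_movies = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n_users)]
    h = [[rng.random() for _ in range(n_movies)] for _ in range(rank)]
    for _ in range(steps):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_movies)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_users)]
    return w, h

# Toy user x movie matrix with two obvious taste groups (invented numbers).
V = [[5, 4, 1, 1],
     [4, 5, 1, 2],
     [1, 1, 5, 4],
     [2, 1, 4, 5]]

W, H = nmf(V)
predicted = matmul(W, H)  # reconstructed ratings, usable as predictions
print([[round(x, 1) for x in row] for row in predicted])
```

The rows of `W` describe how strongly each user leans toward each hidden "taste", and the columns of `H` describe how much each movie belongs to it; their product gives a predicted rating for every user and movie pair.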
Making a web-page with Flask
I chose to focus on the web development part of the project, in order to refresh my HTML and CSS skills. I built the web page with Flask, a lightweight Python web framework that makes it really easy to create simple web apps. Flask renders pages with Jinja templates: files that contain static markup plus placeholders for dynamic data. In this case, I used Jinja to render the movies from our dataset together with the additional information (posters, trailers) that we fetched from the TMDB API.
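As a rough idea of how Flask and Jinja fit together, here is a minimal sketch, not our actual app. The `MOVIES` list, the poster URLs, the route names, and the 0 to 5 slider range are placeholders I made up; the template is inlined with `render_template_string` only to keep the example in one file, whereas a real project would normally keep it in a `templates/` folder and call `render_template`.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Hypothetical stand-in for the movie data (titles plus poster URLs).
MOVIES = [
    {"title": "Toy Story (1995)", "poster_url": "https://example.com/toy-story.jpg"},
    {"title": "Heat (1995)", "poster_url": "https://example.com/heat.jpg"},
]

# Jinja template: static HTML plus {{ placeholders }} and a {% for %} loop.
PAGE = """
<h1>Rate these movies</h1>
<form method="post" action="/recommend">
  {% for movie in movies %}
    <div>
      <img src="{{ movie.poster_url }}" alt="{{ movie.title }}">
      <label>{{ movie.title }}
        <input type="range" name="{{ movie.title }}" min="0" max="5" value="0">
      </label>
    </div>
  {% endfor %}
  <button type="submit">Get recommendations</button>
</form>
"""

@app.route("/")
def index():
    # Jinja fills the placeholders with the values passed in here.
    return render_template_string(PAGE, movies=MOVIES)

@app.route("/recommend", methods=["POST"])
def recommend():
    # Keep only the movies the user actually rated (slider moved away from 0).
    rated = {title: int(value)
             for title, value in request.form.items() if int(value) > 0}
    return rated  # Flask serialises a dict to JSON
```

In a real app, the `/recommend` route would hand the ratings to the recommender and render a second template with the results instead of echoing the input back.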
On the main page, we display 15 movies randomly selected from the 50 most-rated movies in the dataset, so that users are likely to know at least some of them. Users are asked to rate as many of them as possible, or to leave the slider at 0 if they haven't seen the movie. In the second step, we ask users what kind of movies they prefer: old, new, or any. The answers to these two questions (movie ratings and year preference) are our input data, which the NMF and SVD algorithms use to make predictions.
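The selection step can be sketched in a few lines. This is an illustration with invented data, and `pick_movies_to_rate` is a hypothetical helper name: it counts how often each movie was rated, keeps the 50 most-rated ones, and draws 15 of them at random.

```python
import random
from collections import Counter

def pick_movies_to_rate(ratings, pool_size=50, n_shown=15, seed=None):
    """ratings: iterable of (user_id, movie_id, rating) tuples."""
    # Count how many ratings each movie received.
    counts = Counter(movie_id for _, movie_id, _ in ratings)
    # The pool is the `pool_size` most frequently rated movies.
    pool = [movie_id for movie_id, _ in counts.most_common(pool_size)]
    # Draw without replacement so no movie is shown twice.
    rng = random.Random(seed)
    return rng.sample(pool, min(n_shown, len(pool)))

# Toy data: movie m gets (100 - m) ratings, so movies 0..49 are the most rated.
toy_ratings = [(user, movie, 4.0) for movie in range(80) for user in range(100 - movie)]

shown = pick_movies_to_rate(toy_ratings, seed=42)
print(shown)
```

Passing a `seed` makes the draw reproducible for testing; in the web app you would leave it unset so every visitor sees a different mix.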
After submitting their responses (and waiting a bit, because the algorithms take a while to run), users are taken to the recommendations page. Here we display 5 movies that are similar to the movies the user rated highest and that belong to the preferred year category, as calculated by the algorithm.
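The last step, turning predicted scores into five recommendations, could look roughly like this. It is a sketch under assumptions: the cutoff year separating "old" from "new" (2000 here) is a value I picked for illustration, skipping movies the user has already rated is my own assumption, and the function name and toy data are invented.

```python
def top_recommendations(predicted, years, rated, preference="any", n=5, cutoff=2000):
    """Return the n best-scored unrated movies that match the year preference.

    predicted: {title: predicted score}, years: {title: release year},
    rated: set of titles the user already rated.
    """
    def matches(year):
        if preference == "old":
            return year < cutoff
        if preference == "new":
            return year >= cutoff
        return True  # "any"

    candidates = [
        (score, title)
        for title, score in predicted.items()
        if title not in rated and matches(years[title])
    ]
    # Highest predicted score first.
    return [title for score, title in sorted(candidates, reverse=True)[:n]]

# Toy predictions and release years (invented).
predicted = {"A": 4.8, "B": 4.5, "C": 4.1, "D": 3.9, "E": 3.5, "F": 3.2, "G": 2.0}
years = {"A": 1985, "B": 2010, "C": 1999, "D": 2015, "E": 1975, "F": 2005, "G": 1990}

print(top_recommendations(predicted, years, rated={"A"}, preference="old"))
# ['C', 'E', 'G']
```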
Friday Lightning Talk
This week's lightning talk will actually take place next Wednesday, to give us more time to work on this complex project while also brainstorming and trying out ideas for the final project. But we are almost done, so here is a demo of our recommender system: