Project completed in week 1 (28.09.-02.10.20) of the Data Science Bootcamp at Spiced Academy in Berlin.
Our first bootcamp project was to create an animated scatterplot using the libraries matplotlib or seaborn, together with imageio. The scatterplot illustrates the relationship between life expectancy and fertility rate for the world's countries from 1960 to 2015, based on the Gapminder data set.
| | country | year | population | life_expectancy | fertility_rate | continent |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1800 | 3280000.0 | 28.21 | 7.0 | Asia |
The animated scatterplot is essentially a sequence of static plots, one per year, shown in quick succession. Creating the animation takes four steps:
1. Create static scatterplots for each year in the data set.
The scatterplots show life_expectancy on the x axis and fertility_rate on the y axis. To make the plots more informative, the size of each point represents the population and the color represents the continent.
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# gapminder_df holds the Gapminder data; year is the year being plotted
sns.scatterplot(x='life_expectancy',
                y='fertility_rate',
                hue='continent',
                size='population',
                sizes=(10, 1000),
                legend=False,
                data=gapminder_df.loc[gapminder_df['year']==year],
                alpha=0.7,
                palette='Set2')
plt.title(f'{year}', loc='center', fontsize=20, color='black', fontweight='bold')
plt.xlabel('Life expectancy')
plt.ylabel('Fertility rate')
2. Export the scatterplot images to a designated folder.
import imageio
import os
images = []
folder='/path/to/folder/images'
if not os.path.exists(folder):
    os.mkdir(folder)
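As a variation on the check above, here is a minimal sketch that uses os.makedirs with exist_ok=True, which also creates missing parent folders and does not fail if the folder already exists. The temporary directory and the subfolder names are placeholders for illustration only, not the paths used in the project.

```python
import os
import tempfile

# Hypothetical location: a throwaway temp directory stands in for the real path.
base = tempfile.mkdtemp()
folder = os.path.join(base, "bootcamp", "images")

# Creates the folder and any missing parents; no error if it already exists.
os.makedirs(folder, exist_ok=True)
os.makedirs(folder, exist_ok=True)  # calling it a second time is harmless

folder_exists = os.path.isdir(folder)
print(folder_exists)
```

Unlike os.mkdir, this does not require the parent folder to exist beforehand, which can be convenient when the script runs on a different machine.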
3. Join the individual images in chronological order.
filename = f'lifeexp_{year}.png'
plt.savefig(os.path.join(folder,filename))
images.append(imageio.imread(os.path.join(folder,filename)))
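To illustrate why the frames end up in chronological order, the sketch below builds the file paths for all years without plotting anything. The folder name and the helper frame_path are hypothetical and only serve this example; the file name pattern follows the one used above.

```python
import os

# Hypothetical folder name, used only to build example paths.
folder = "images"

def frame_path(folder, year):
    """Return the path of the PNG for one year (pattern: lifeexp_<year>.png)."""
    return os.path.join(folder, f"lifeexp_{year}.png")

# Looping over the years in ascending order yields the frames chronologically.
frame_paths = [frame_path(folder, year) for year in range(1960, 2016)]

# Every year has four digits, so alphabetical order equals chronological order;
# sorting a directory listing of these files would give the same sequence.
is_chronological = sorted(frame_paths) == frame_paths

print(len(frame_paths), is_chronological)
```

Appending each image inside the loop, as done above, relies on the same property: the order of the list is the order of the years.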
4. Export the sequence of scatterplots as a GIF.
The fps (frames per second) parameter sets the speed of the animation.
imageio.mimsave(os.path.join(folder,'scatterplot.gif'), images, fps=20)
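To get a feeling for what fps=20 means here, the following small calculation works out how long each frame is shown and how long one pass through the animation takes. It assumes one frame per year from 1960 to 2015, as in this project; the variable names are chosen for the example.

```python
# One frame per year, 1960 to 2015 inclusive.
n_frames = len(range(1960, 2016))
fps = 20

seconds_per_frame = 1 / fps      # how long each frame stays on screen
total_seconds = n_frames / fps   # length of one pass through the animation

print(n_frames, seconds_per_frame, round(total_seconds, 2))
```

With 56 frames at 20 frames per second, one pass lasts roughly 2.8 seconds. A lower fps value slows the animation down.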
Now, putting everything together, here's the full code and the animated scatterplot:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import imageio
import os

# gapminder_df is assumed to be loaded beforehand
images = []
folder='/home/lorena/Documents/bootcamp/W1/images'
if not os.path.exists(folder):
    os.mkdir(folder)

for year in range(1960, 2016):
    # fix the axis limits so that every frame uses the same scale
    plt.axis((0, 100, 0, 10))
    sns.scatterplot(x='life_expectancy',
                    y='fertility_rate',
                    hue='continent',
                    size='population',
                    sizes=(10, 1000),
                    legend=False,
                    data=gapminder_df.loc[gapminder_df['year']==year],
                    alpha=0.7,
                    palette='Set2')
    plt.title(f'{year}', loc='center', fontsize=20, color='black', fontweight='bold')
    #plt.title(f'inspired by Hans Rosling', loc='right', fontsize=10, color='grey', style='italic', pad=-20)
    #plt.legend(bbox_to_anchor=(0.74, 0.85), loc='center')
    plt.xlabel('Life expectancy')
    plt.ylabel('Fertility rate')
    #plt.annotate({country}, )
    # save the frame and add it to the list of images
    filename = f'lifeexp_{year}.png'
    plt.savefig(os.path.join(folder,filename))
    images.append(imageio.imread(os.path.join(folder,filename)))
    # start a new figure for the next year
    plt.figure()

# write all frames to a single GIF
imageio.mimsave(os.path.join(folder,'scatterplot.gif'), images, fps=20)
Friday Lightning Talk
Each week, we get a main data set and several tasks for applying the concepts learned during the week. On Fridays, each of us gives a five-minute talk on something from the weekly challenge project: a particular finding, a chart, a new library, (un)solved bugs, or anything else that is worth sharing and helpful for others. This Friday, I chose to talk about five pip and conda commands for checking the installed Python libraries, their versions and dependencies.
function | command |
---|---|
to list all installed libraries | pip list |
to list only outdated libraries | pip list -o (or --outdated) |
to list only up-to-date libraries | pip list -u (or --uptodate) |
to show all information about a library | pip show <package-name> |
to list all libraries installed in a specific environment | conda list -n <environment-name> |
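The same kind of overview can also be obtained from inside Python. The sketch below is an illustration rather than part of the talk: it uses importlib.metadata from the standard library (Python 3.8 or later) to print the installed libraries and their versions for the current interpreter, similar in spirit to pip list. The variable names are chosen for the example.

```python
from importlib.metadata import distributions

# Collect name -> version for every distribution visible to this interpreter.
installed = {}
for dist in distributions():
    name = dist.metadata["Name"]
    version = dist.version
    if name and version:  # skip entries with incomplete metadata
        installed[name] = version

# Print in alphabetical order, ignoring case.
for name in sorted(installed, key=str.lower):
    print(f"{name}=={installed[name]}")
```

This only covers the environment the script runs in; for another conda environment, the conda list command from the table remains the simpler option.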