The notebooks listed below are your homework assignments, which you will submit through a private GitHub repository. Recall that you first need to follow these steps:
Steps to follow
- Find a friend
- Have a GitHub account
- Each pair of students must share a single private GitHub repository
- Fill in the form with your friend at: https://docs.google.com/forms/d/1TVuQixIWhDI72QtwsRhq6_7hq4jCrzzQ8rwHUOUj28U
- Share your repository with user hassothea on GitHub; he will assess your work
- Use the Docker image of the course, following the tools page, and start working!
About your work
- Your work can be written in French or English
- The deliverable is a xxx.ipynb file (Jupyter notebook) or a xxx.py file (if you are using jupytext) built by completing the template. We won’t execute the code in your notebook: all your results, displays and plots must be visible without having to rerun everything.
Warning. If any of these steps is not followed: no evaluation!
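Since the code in your notebook will not be executed, it can help to check before submitting that the outputs are actually stored in the xxx.ipynb file. The sketch below is only a suggestion, not part of the course tooling: it relies on the fact that a notebook file is JSON with a list of cells, and the function names cells_missing_output and check_notebook are hypothetical. It lists the non-empty code cells that have no saved output. Some cells (imports, assignments) legitimately produce no output, so treat the result as a hint rather than an error.

```python
import json


def cells_missing_output(notebook: dict) -> list[int]:
    """Return the indices of non-empty code cells that have no stored output."""
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells never carry outputs
        source = cell.get("source", "")
        # The source is stored either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if text.strip() and not cell.get("outputs"):
            missing.append(index)
    return missing


def check_notebook(path: str) -> list[int]:
    """Load a .ipynb file from disk and run the check above on it."""
    with open(path, encoding="utf-8") as handle:
        return cells_missing_output(json.load(handle))
```

For example, check_notebook("xxx.ipynb") returns the positions of the cells worth looking at before you push your work.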
Homework 01. Data wrangling and visualization with pandas
Deadline: 2022-01-30 23:55
Description: You will use pandas and plotly (or any data visualization library) to perform data processing and visualization on a century of data on French first names.
Homework 02. Babynames with spark
Deadline: 2022-02-20 23:59
Description: You will use Spark to study trends in baby names.
Homework 03. New York taxis
Deadline: 2022-03-06 23:59
Description: You will study a large dataset containing all taxi trips in New York City.