Notebooks
This page contains links to all the notebooks and code used during this course. The icons at the bottom right work as follows:
- The open button is a shortcut to open the notebook in your locally running jupyter notebook. It works only if jupyter is available at localhost:8888 on your laptop and if your notebooks are in a notebooks folder (relative to the home directory of jupyter).
- The download button simply downloads the notebook.
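If the open button does not work, the two conditions above can be checked with a small Python sketch. It assumes the button targets the classic Jupyter Notebook URL scheme (http://localhost:8888/notebooks/<path>) and that jupyter's home is the directory it was started from; the helper names and the example filename are hypothetical, not part of the course material.

```python
import urllib.error
import urllib.request
from pathlib import Path
from urllib.parse import quote

# Assumption: the open button targets a classic Jupyter Notebook server.
JUPYTER_BASE_URL = "http://localhost:8888"
NOTEBOOK_FOLDER = "notebooks"  # folder under the directory jupyter was started from


def local_notebook_url(filename: str, base_url: str = JUPYTER_BASE_URL) -> str:
    """Hypothetical helper: URL under which a classic Jupyter server serves the notebook."""
    return f"{base_url}/notebooks/{NOTEBOOK_FOLDER}/{quote(filename)}"


def jupyter_is_running(base_url: str = JUPYTER_BASE_URL, timeout: float = 1.0) -> bool:
    """Hypothetical helper: True if an HTTP server answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except OSError:  # connection refused, DNS failure, timeout, ...
        return False


def notebook_is_in_place(filename: str, jupyter_home: str = ".") -> bool:
    """Hypothetical helper: True if the file sits in the notebooks folder under jupyter's home."""
    return (Path(jupyter_home) / NOTEBOOK_FOLDER / filename).is_file()


if __name__ == "__main__":
    name = "notebook01.ipynb"  # hypothetical filename
    print("server running:", jupyter_is_running())
    print("notebook in place:", notebook_is_in_place(name))
    print("open at:", local_notebook_url(name))
```

Run it from the directory where you start jupyter, so that "." matches jupyter's home; if either check prints False, the open button is expected to fail while the download button still works.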
Notebook 01. Introduction to Python
Week 2
Description: We describe the Python programming language.
Notebook 02. Lightspeed introduction to numpy
Week 3
Description: We give a very quick description of the numpy ndarray object and the main tools available in numpy.
Notebook 03. Introduction to pandas
Week 4
Description: We describe the pandas library and show some simple computations with it.
Notebook 04. A closer look at pandas
Week 4
Description: We show a more advanced usage of pandas for data preprocessing and visualization.
Notebook 05. Spark RDD and the low-level API
Week 5
Description: We describe the low-level API and the main transformations and actions on RDDs and PairRDDs.
Notebook 06. Spark DataFrames and the high-level API (spark.sql)
Week 6
Description: We describe the spark.sql API, dataframes and the main operations that we can perform on them.
Notebook 07. Using JSON data with Python and Spark
Week 8
Description: We describe the use of JSON-formatted data with Python and Spark.
Notebook 08. Building an ETL with Spark
Week 9
Description: We give an example of using Spark as an ETL (Extract, Transform, Load) tool to preprocess web-browsing data into a training dataset for classification.