Slides
This page links to all the slides used during this course. Click the title of a slide deck to open it directly in your browser.
- Slides 01. Course logistics and introduction to big data
  Week 1
  Description: We describe the logistics of the course and talk about the big data ecosystem.
- Slides 02. The Python Data Science Stack
  Week 2
  Description: We give a brief overview of the so-called Python stack for data science.
- Slides 03. Spark RDD and the low-level API
  Week 5
  Description: We describe the low-level API and the main transformations and actions for RDD and PairRDD.
- Slides 04. Spark SQL and the high-level API
  Week 6
  Description: We describe Spark SQL and operations on DataFrames.
- Slides 05. Using JSON data with Python and Spark
  Week 8
  Description: We describe the use of JSON-formatted data with Python and Spark.
- Slides 06. The main file formats for big data
  Week 9
  Description: We discuss the main file formats for big data and explain how to choose among them for a task.
- Slides 07. A deeper dive into Spark
  Week 10
  Description: We dive deeper into Spark's internal mechanics and opportunities to optimize computations.
- Slides 08. Some Spark tips
  Week 11
  Description: We give some tips about Spark, including how to interpret error messages.