Welcome
Welcome to the webpage of the Big Data Technologies course. Here you will find all the teaching material (mainly slides and Jupyter notebooks), as well as instructions for installing the tools required for the course.
Important links
Learning resources
Here is a list of learning resources that can be useful for this course, among many others:
- Spark documentation website:
- API docs
- Databricks learning notebooks:
- StackOverflow:
- More advanced:
- Book: “Spark The Definitive Guide”
Technologies
The course will focus mainly on Spark for big data processing and starts with a description of the “Python stack” for data science and how to use it.
The course will use Python as its main programming language, even if the Scala API is arguably better for Spark.
A tentative list of technologies used during the course is as follows:
Infrastructure
Python stack
Data Visualization
Big data processing
Data storage / formats / querying