Introduction to Machine Learning

Univ. de Paris, Masters MIDS et M2MO, 2021

Syllabus

Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to learn from data. A major focus of machine learning is to automatically learn complex patterns and to make intelligent decisions based on them. The set of possible data inputs that feed a learning task can be very large and diverse, which makes modeling and prior assumptions critical problems for the design of relevant algorithms.

This course focuses on the methodology underlying supervised and unsupervised learning, with a particular emphasis on the mathematical formulation of algorithms and on the way they can be implemented and used in practice. For instance, the course will describe some necessary tools from optimization theory and explain how to use them for machine learning. Numerical illustrations and applications to datasets will be given for the methods studied in the course. Practical sessions will start with a presentation of the Python language and of the main libraries for data science and scientific computing.

Slides introducing the course

slides.html

Format

Lectures are given with slides; all material is in English
Practical sessions use Python, Jupyter, scikit-learn and TensorFlow
We will start with a quick introduction to Python and the Jupyter notebook

Who?

The teachers for this course are:

Aurélie Fischer http://www.lpsm.paris/dw/doku.php?id=users:fischer:index
Stéphane Gaïffas https://stephanegaiffas.github.io

When and where?

Check this calendar and look for “Introduction au machine learning”

This means 5 hours per week!

Evaluation

40% for homework (submit your Jupyter notebook via Moodle; the notebook for Tuesday of week $w$ must be sent before Monday 23:59 of week $w+1$)
60% for the final exam

Material

All the material will be posted here throughout the course.

1. Introduction to supervised learning (courses 1, 2 and 3 by S. Gaïffas)

This first part spans three lectures and covers:

Binary classification
LDA / QDA for Gaussian models
Logistic regression
Standard metrics and recipes (overfitting, cross-validation)
Regularization (Ridge, Lasso)
Support Vector Machine, the Hinge loss
Kernel methods
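To make two items of this list concrete, here is a minimal NumPy sketch of logistic regression with a Ridge (squared $\ell_2$) penalty, fitted by plain gradient descent on synthetic Gaussian data. It is an illustration only and not taken from the course notebooks: the function names, the step size, the penalty strength and the toy dataset are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, maps real scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_ridge(X, y, lam=0.1, lr=0.1, n_iter=500):
    """Gradient descent on: mean logistic loss + (lam / 2) * ||w||^2.

    Labels y are expected in {0, 1}. The intercept b is not penalized.
    All hyperparameter defaults are illustrative assumptions.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)                 # predicted probabilities
        grad_w = X.T @ (p - y) / n + lam * w   # loss gradient + ridge term
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: two Gaussian classes with identity covariance.
rng = np.random.default_rng(0)
n = 200
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(n, 2)),  # class 0
    rng.normal(2.0, 1.0, size=(n, 2)),   # class 1
])
y = np.concatenate([np.zeros(n), np.ones(n)])

w, b = fit_logistic_ridge(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print("weights:", w, "intercept:", b, "training accuracy:", accuracy)
```

Increasing `lam` shrinks the weights toward zero, which is the regularization effect listed above. The accuracy printed here is measured on the training data, so it says nothing about overfitting; that is what cross-validation is for.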

Practical sessions will introduce the Python language and the basic libraries for scientific computing and data science with Python.

Slides

Exercises

Notebooks

And a Python script to be executed with the streamlit library (use pip install streamlit to install it):

kernel-svm.py

Then type in a terminal:

streamlit run kernel-svm.py

Your browser should open with an interactive widget

2. Trees and boosting methods (courses 4 and 5 by A. Fischer)

Two lectures and notebooks on machine learning methods based on trees:

k-NN, Decision trees, CART, Random Forests and Boosting methods
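As a small illustration of the first method in this list, the following is a minimal NumPy sketch of a k-nearest-neighbors classifier with majority voting. It is not the course's implementation (the practical sessions rely on scikit-learn); the function name `knn_predict` and the toy points are assumptions chosen for the example.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Predict integer labels by majority vote among the k nearest
    training points (squared Euclidean distance).

    Ties are broken in favor of the smallest label, because
    np.bincount(...).argmax() returns the first maximum.
    """
    # Pairwise squared distances, shape (n_test, n_train).
    diffs = X_test[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training points for each test point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]  # shape (n_test, k)
    return np.array([np.bincount(row).argmax() for row in votes])


# Hypothetical toy data: two small, well-separated clusters.
X_train = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # class 0
    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9],   # class 1
])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.05, 0.05], [1.0, 0.95]])

pred = knn_predict(X_train, y_train, X_test, k=3)
print(pred)  # [0 1]
```

This brute-force version computes every pairwise distance, which is fine for small datasets but scales poorly with the number of training points.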

We will use the scikit-learn library again, first through quick illustrations of some machine learning algorithms, and then through some more advanced uses of it.

Slides

slides04.pdf

Some exercises

exos4.pdf

Notebooks

3. Deep learning (course 6, by S. Gaïffas)

This course will be about deep learning:

Introduction to neural networks
The perceptron, examples of “shallow” neural nets
Multilayer neural networks, deep learning
Stochastic gradient descent and back-propagation
Regularization
Convolutional neural networks
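As a concrete starting point for the list above, here is a minimal NumPy sketch of the classical perceptron update rule on linearly separable data. It is an illustration under stated assumptions, not the course's code: the function name `train_perceptron`, the synthetic dataset and the margin threshold are all hypothetical.

```python
import numpy as np


def train_perceptron(X, y, max_epochs=1000):
    """Perceptron learning rule for labels y in {-1, +1}.

    Each misclassified point (x_i, y_i) triggers the update
    w <- w + y_i * x_i and b <- b + y_i. Training stops after the
    first full pass without any mistake.
    Returns (w, b, number_of_epochs_run).
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (x_i @ w + b) <= 0:  # wrong side of (or on) the boundary
                w += y_i * x_i
                b += y_i
                mistakes += 1
        if mistakes == 0:
            return w, b, epoch
    return w, b, max_epochs


# Hypothetical toy data: points labeled by the sign of x1 + x2, keeping
# only points at some distance from the boundary so that a margin exists.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
score = X[:, 0] + X[:, 1]
keep = np.abs(score) > 0.2
X, y = X[keep], np.sign(score[keep])

w, b, n_epochs = train_perceptron(X, y)
train_accuracy = np.mean(np.sign(X @ w + b) == y)
print("epochs:", n_epochs, "training accuracy:", train_accuracy)
```

Because the data are separable with a positive margin, the perceptron convergence theorem guarantees that the loop stops after finitely many updates. On non-separable data the rule never settles, which is one motivation for the smooth losses and gradient-based training covered in this part.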

We will use TensorFlow with the Keras API for the practical session.

Slides

slides05.pdf

Notebooks

notebook11_deep_learning.ipynb

Homework 2 (by S. Gaïffas)

Subject of the homework:

notebook12_dm_fashion_mnist.ipynb

Send a notebook that already contains your results (WE WON’T RUN YOUR NOTEBOOK), namely the graphs and results that you want to show, as a single Jupyter notebook file (.ipynb extension). We won’t open any supplementary file.

Send your work using the following Google form:

https://forms.gle/CyXcrQXqZ8nCzKoB7

(you will need to create a Google account if you don’t have one).

Important. Deadline for Homework 2: November 5, 2021 at 23:55

References

Machine Learning: A Probabilistic Perspective, K. P. Murphy, MIT Press
Foundations of Machine Learning, M. Mohri, A. Rostamizadeh and A. Talwalkar, MIT Press
Deep Learning, I. Goodfellow, Y. Bengio and A. Courville, MIT Press
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, W. McKinney, O’Reilly
Statistics for High-Dimensional Data: Methods, Theory and Applications, P. Bühlmann and S. van de Geer, Springer-Verlag
