Introduction to Machine Learning

Univ. de Paris, Masters MIDS and M2MO, 2021

Syllabus

Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to learn from data. A major focus of machine learning is to automatically learn complex patterns and to make intelligent decisions based on them. The set of possible data inputs that feed a learning task can be very large and diverse, which makes modeling and prior assumptions critical problems for the design of relevant algorithms.

This course focuses on the methodology underlying supervised and unsupervised learning, with a particular emphasis on the mathematical formulation of algorithms and on the way they can be implemented and used in practice. For instance, the course will describe some necessary tools from optimization theory and explain how to use them for machine learning. Numerical illustrations and applications to datasets will be given for the methods studied in the course. Practical sessions will start with a presentation of the Python language and of the main libraries for data science and scientific computing.

Slides introducing the course

Format

  • Lectures are given on slides; all material is in English
  • Practical sessions use Python, Jupyter, scikit-learn and TensorFlow
  • We will start with a quick introduction to Python and the Jupyter notebook

Who?

The teachers for this course are:

When and where?

Check this calendar and look for the entry “Introduction au machine learning”.

This means 5 hours per week!

Evaluation

  • 40% for homework: you submit your Jupyter notebook via Moodle; for the Tuesday session of week $w$, it must be sent before Monday 23:59 of week $w+1$

  • 60% for the final exam

Material

All course material will be posted here throughout the course.

1. Introduction to supervised learning (courses 1, 2 and 3 by S. Gaïffas)

This first part spans three lectures and covers:

  • Binary classification
  • LDA / QDA for Gaussian models
  • Logistic regression
  • Standard metrics and recipes (overfitting, cross-validation)
  • Regularization (Ridge, Lasso)
  • Support vector machines, the hinge loss
  • Kernel methods
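As a rough illustration of how several of these methods are used side by side, here is a minimal scikit-learn sketch that compares them with 5-fold cross-validation. It assumes a synthetic dataset generated by make_classification (not a course dataset) and arbitrary hyperparameters (C=1.0, default RBF bandwidth). The linear SVM is written as an SVC with a linear kernel, which minimizes the hinge loss plus a ridge-type penalty, and the lasso-type logistic regression uses the liblinear solver because not every solver supports an L1 penalty.

```python
# Minimal sketch (assumptions: synthetic data, arbitrary hyperparameters)
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data: 5 informative + 5 pure-noise features
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_redundant=0, random_state=42
)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    # L2 (ridge) penalty; C is the inverse of the regularization strength
    "Logistic (ridge)": LogisticRegression(penalty="l2", C=1.0),
    # L1 (lasso) penalty needs a solver that supports it, e.g. liblinear
    "Logistic (lasso)": LogisticRegression(penalty="l1", C=1.0, solver="liblinear"),
    # Linear SVM: hinge loss + L2 penalty
    "Linear SVM": SVC(kernel="linear", C=1.0),
    # Kernel SVM with a Gaussian (RBF) kernel
    "Kernel SVM": SVC(kernel="rbf", C=1.0, gamma="scale"),
}

scores = {}
for name, model in models.items():
    # The scaler is refitted on the training folds only, so no test-fold leakage
    pipe = make_pipeline(StandardScaler(), model)
    cv_scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
    scores[name] = cv_scores.mean()
    print(f"{name:18s} accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Varying C in the logistic regressions and SVMs is a quick way to observe the effect of regularization on the cross-validated accuracy.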

Practical sessions will introduce the Python language and the basic libraries for scientific computing and data science with Python.

Slides

Exercises

Notebooks

There is also a Python script to be executed with the streamlit library (install it with pip install streamlit):

Then type in a terminal:

streamlit run kernel-svm.py

Your browser should open an interactive widget.

2. Trees and boosting methods (courses 4 and 5 by A. Fischer)

Two lectures and notebooks on machine learning methods based on trees:

  • k-NN, Decision trees, CART, Random Forests and Boosting methods

We will again use the scikit-learn library, through quick illustrations of some machine learning algorithms, and will also present some of its more advanced uses.
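As an illustration only (the course notebooks may proceed differently), the following sketch fits k-NN, CART trees, a random forest and gradient boosting with scikit-learn, and compares training and test accuracy. It assumes a synthetic two-moons dataset (make_moons, chosen because its classes are not linearly separable) and arbitrary hyperparameters; the fully grown tree is included to make overfitting visible.

```python
# Minimal sketch (assumptions: synthetic make_moons data, arbitrary hyperparameters)
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Two interleaving half-circles with noise: not linearly separable
X, y = make_moons(n_samples=1000, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "k-NN (k=15)": KNeighborsClassifier(n_neighbors=15),
    # Unpruned CART tree: grows until leaves are pure, prone to overfitting
    "CART (fully grown)": DecisionTreeClassifier(random_state=0),
    # Limiting the depth is a simple form of regularization
    "CART (depth 4)": DecisionTreeClassifier(max_depth=4, random_state=0),
    # Average of trees grown on bootstrap samples with random feature subsets
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Shallow trees fitted sequentially, each one correcting the previous ones
    "Gradient boosting": GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
    ),
}

train_acc, test_acc = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc[name] = model.score(X_train, y_train)
    test_acc[name] = model.score(X_test, y_test)
    print(f"{name:20s} train: {train_acc[name]:.3f}  test: {test_acc[name]:.3f}")
```

The gap between training and test accuracy of the fully grown tree, compared with the depth-limited tree and the ensembles, is a simple way to see the variance of a single tree and the effect of averaging or boosting.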

Slides

Some exercises

Notebooks

3. Deep learning (course 6, by S. Gaïffas)

This course will be about deep learning:

  • Introduction to neural networks
  • The perceptron, examples of “shallow” neural nets
  • Multilayer neural networks, deep learning
  • Stochastic gradient descent and back-propagation
  • Regularization
  • Convolutional neural networks

We will use TensorFlow with Keras for the practical session.
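Keras computes gradients automatically, which hides back-propagation. To make it explicit, here is a minimal NumPy sketch (not the course's Keras code) of a one-hidden-layer network trained with mini-batch stochastic gradient descent and L2 weight decay. The data is a hypothetical XOR-like toy problem that a single perceptron cannot solve; the network size, learning rate, batch size and number of epochs are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: label 1 when x1 and x2 have the same sign (XOR-like),
# which no linear classifier (e.g. a single perceptron) can separate
n = 400
X = rng.uniform(-1.0, 1.0, size=(n, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

# One hidden layer (tanh) followed by a sigmoid output unit
n_hidden = 16
W1 = rng.normal(scale=1.0, size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=(n_hidden, 1))
b2 = np.zeros(1)


def forward(X):
    """Return hidden activations and predicted probabilities."""
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return h, p.ravel()


def bce(p, y):
    """Mean binary cross-entropy, with clipping to avoid log(0)."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


lr, batch_size, lam, n_epochs = 0.5, 32, 1e-4, 500
initial_loss = bce(forward(X)[1], y)

for epoch in range(n_epochs):
    perm = rng.permutation(n)  # reshuffle the samples at each epoch
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        h, p = forward(xb)
        # Backward pass: for sigmoid + cross-entropy, dLoss/dlogit = p - y
        d_out = (p - yb)[:, None] / len(idx)   # shape (batch, 1)
        grad_W2 = h.T @ d_out + lam * W2       # lam * W is the gradient of (lam/2)||W||^2
        grad_b2 = d_out.sum(axis=0)
        d_h = (d_out @ W2.T) * (1.0 - h ** 2)  # chain rule, tanh'(z) = 1 - tanh(z)^2
        grad_W1 = xb.T @ d_h + lam * W1
        grad_b1 = d_h.sum(axis=0)
        # SGD step, applied only after all gradients have been computed
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2

final_loss = bce(forward(X)[1], y)
accuracy = np.mean((forward(X)[1] > 0.5) == y)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}, training accuracy: {accuracy:.3f}")
```

In Keras, the same model would be a Sequential model with two Dense layers, compiled with an SGD optimizer and a binary cross-entropy loss; the hand-written backward pass above is what fit computes automatically.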

Slides

Notebooks

Homework 2 (by S. Gaïffas)

Subject of the homework:

Send your notebook with all results already computed (WE WON’T RUN YOUR NOTEBOOK), including the graphs and results you want to show, as a single Jupyter notebook file (.ipynb extension). We won’t open any supplementary files.

Send your work using the following Google form:

(you will need to create a Google account if you don’t have one).

Important. Deadline for Homework 2: November 5, 2021 at 23:55

References

  • Machine Learning: A Probabilistic Perspective, K.P. Murphy, MIT Press
  • Foundations of Machine Learning, M. Mohri, A. Rostamizadeh and A. Talwalkar, MIT Press
  • Deep Learning, I. Goodfellow, Y. Bengio and A. Courville, MIT Press
  • Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, W. McKinney, O’Reilly
  • Statistics for High-Dimensional Data: Methods, Theory and Applications, P. Bühlmann and S. van de Geer, Springer-Verlag