# Introduction to Python

We introduce here the `python` language, covering only the bare minimum needed to get started with the data-science stack (a set of libraries for data science).

To learn more about the language, consider going through the
excellent tutorial

- http://www.scipy-lectures.org/intro/index.html

Dedicated books are also available, such as

- http://diveintopython3.problemsolving.io

There are many other references, since `Python` is one of the most popular languages right now.

Python is a **programming language**, as are `C++`, `java`, `fortran`, `javascript`,
etc.

## Specific features of Python

- an **interpreted** (as opposed to *compiled*) language. Contrary to e.g.
`C++` or `fortran`, one does not compile Python code before executing it. 

- Used as a scripting language, by running `python script.py` in a terminal

- It can also be used **interactively**: Jupyter notebooks, IPython, etc.

- A free software released under an **open-source** license: Python can
be used and distributed free of charge, even for building commercial
software.

- **multi-platform**: Python is available for all major operating
systems, Windows, Linux/Unix, MacOS X, most likely your mobile phone
OS, etc.

- A very readable language with clear non-verbose syntax

- A language for which a **large variety of high-quality** packages are
available for various applications, including web-frameworks and scientific
computing

- It has been the **language of choice for data science** and **machine learning** for several years, thanks to its expressiveness and its deployment tools

- An object-oriented language

See https://www.python.org/about/ for more information about distinguishing features of Python.

## Important: Python 2 or Python 3

- Simple answer: **don't use Python 2, use Python 3**

- Python 2 reached its **end of life** on January 1, 2020: it is **no longer maintained**

- You'll end up hanged if you use Python 2

- If Python 2 is mandatory at your workplace, find another job

# Hello world

- In a `jupyter` notebook, you have an interactive interpreter.

- You type code in the cells and execute it with `Shift` + `Enter` (the shortcut is the same on Mac, Windows and Linux).

In [None]:
print("Salut tout le monde!")

# Basic types

## Integers

In [None]:
1 + 42

In [None]:
type(1+1)

We can assign values to variables with `=`

In [None]:
a = (3 + 5 ** 2) % 4
a

## Remark

We don't declare the type of a variable before assigning its value.
In C, conversely, one must write

```C
int a = 4;
```

## Something cool

- **Arbitrarily large** integer arithmetic

In [None]:
17 ** 542
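To check that the integer above is exact rather than rounded, a small sketch (the numbers are arbitrary illustrations): a Python `int` simply grows as needed, whereas a `float` is a fixed-size (64-bit) number and cannot hold such a value.

```python
big = 17 ** 542
n_digits = len(str(big))   # every digit is kept, nothing is rounded
print(n_digits)

# No overflow at 2 ** 64, unlike fixed-size integers in C
print(2 ** 64)             # 18446744073709551616

# A float tops out around 1.8e308, so the conversion fails
float_overflow = False
try:
    float(big)
except OverflowError:
    float_overflow = True
print(float_overflow)      # True
```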

## Floats

A floating-point type (`float`) is created when a number is written with a decimal point

In [None]:
c = 2.

In [None]:
type(c)

In [None]:
c = 2
type(c)

In [None]:
truc = 1 / 2
truc

In [None]:
type(truc)
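In Python 3, `/` is always *true division* and returns a `float`, as `truc` shows. A short sketch of the related integer operators (values chosen for illustration):

```python
print(7 / 2)         # 3.5: true division, always a float
print(6 / 3)         # 2.0: still a float, even for an exact result
print(7 // 2)        # 3: floor division
print(-7 // 2)       # -4: rounds towards minus infinity, not towards zero
print(7 % 2)         # 1: remainder of the division
print(divmod(7, 2))  # (3, 1): quotient and remainder at once
```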

## Boolean
Similarly, a boolean (`bool`) is created by a comparison

In [None]:
test = 3 > 4
test

In [None]:
type(test)

In [None]:
False == (not True)

In [None]:
1.41 < 2.71 and 2.71 < 3.14

In [None]:
# It's equivalent to
1.41 < 2.71 < 3.14

## Type conversion (casting)

In [None]:
a = 1
type(a)

In [None]:
b = float(a)
type(b)

In [None]:
str(b)

In [None]:
bool(b)
# All non-zero, non-empty objects are cast to True (more on this later)

In [None]:
bool(1-1)

#  Containers

Python provides many efficient types of containers, in which collections of objects can be stored. 

The main ones are `list`, `tuple`, `set` and `dict` (but there are many others...)

## Tuples

In [None]:
tt = ('truc', 3.14, "truc")
tt

In [None]:
tt[1]

You can't modify a tuple: we say that it is *immutable*

In [None]:
tt[0] = 1

Three ways of doing the same thing

In [None]:
# Method 1
tuple([1, 2, 3])

In [None]:
# Method 2
1, 2, 3

In [None]:
# Method 3
(1, 2, 3)

**Simpler is better in Python**, so usually you want to use Method 2.

In [None]:
toto = 1, 2, 3
toto

- This is serious !

## The Zen of Python easter egg

In [None]:
import this

## Lists

A list is an ordered collection of objects. These objects may have different types. For example:

In [None]:
colors = ['red', 'blue', 'green', 'black', 'white']

In [None]:
type(colors)

**Indexing:** accessing individual objects contained in the list

In [None]:
colors[2]

In [None]:
colors[2] = 3.14
colors

**Warning.** Indexing **starts at 0** (as in C), not at 1 (as in Fortran or Matlab), for every *sequence* in Python (`list`, `tuple`, `str`, ...).

Counting from the end with negative indices:

In [None]:
colors[-2]

Indices must stay within the range of the list, otherwise an `IndexError` is raised

In [None]:
colors[10]

In [None]:
colors

In [None]:
tt

In [None]:
colors.append(tt)
colors

In [None]:
len(colors)

## Slicing: obtaining sublists of regularly-spaced elements

This works with any sequence (`list`, `str`, `tuple`, etc.)

In [None]:
colors

In [None]:
list(reversed(colors))

In [None]:
colors[::-1]

**Slicing syntax**: ``colors[start:stop:stride]``

NB: All slicing parameters are optional

In [None]:
colors

In [None]:
colors[3:]

In [None]:
colors[:3]

In [None]:
colors[1::2]

In [None]:
colors[::-1]
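A few more slicing cases, as a sketch on a small hypothetical list `letters`: the `stop` index is excluded, slices that go past the end are silently clipped (unlike indexing, which raises `IndexError`), and a slice is a new list.

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[1:3])    # ['b', 'c']: the stop index 3 is excluded
print(letters[-2:])    # ['d', 'e']: the last two elements
print(letters[2:100])  # ['c', 'd', 'e']: clipped, no IndexError

# A slice is a new list, so modifying it leaves `letters` unchanged
first_two = letters[:2]
first_two[0] = 'z'
print(letters)         # ['a', 'b', 'c', 'd', 'e']

# When start and stop are in range, len(seq[start:stop]) == stop - start
print(len(letters[1:4]))  # 3
```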

## Strings

Different string syntaxes (single, double or triple quotes):

In [None]:
s = 'tintin'
type(s)

In [None]:
s

In [None]:
s = """         Bonjour,
Je m'appelle Stephane.
Je vous souhaite une bonne journée.
Salut.       
"""
s

In [None]:
s.strip()

In [None]:
print(s.strip())

In [None]:
len(s)

In [None]:
# Casting to a list
list(s.strip()[:15])

In [None]:
# Arithmetic on strings: repetition and concatenation
print('Bonjour' * 2)
print('Hello' + ' all')

In [None]:
sss = 'A'
sss += 'bc'
sss += 'dE'
sss.lower()

In [None]:
ss = s.strip()
print(ss[:10] + ss[24:28])

In [None]:
s.strip().split('\n')

In [None]:
s[::3]

In [None]:
s[3:10]

In [None]:
' '.join(['Il', 'fait', 'super', 'beau', "aujourd'hui"])

### Important

A string is **immutable** !!

In [None]:
s = 'I am an immutable guy'

In [None]:
s[2] = 's'

In [None]:
id(s)

In [None]:
print(s + ' for sure')
id(s), id(s + ' for sure')

### Extra stuff with strings

In [None]:
'square of 2 is ' + str(2 ** 2)

In [None]:
'square of 2 is %d' % 2 ** 2

In [None]:
'square of 2 is {}'.format(2 ** 2)

In [None]:
'square of 2 is {square}'.format(square=2 ** 2)

In [None]:
# And since Python 3.6 you can use an `f-string`
number = 2
square = number ** 2

f'square of {number} is {square}'
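f-strings also accept a *format specification* after a colon, which controls how the value is rendered. A few common ones, as an illustrative sketch (the variable names are arbitrary):

```python
ratio = 2 / 3
print(f'{ratio:.3f}')     # 0.667: 3 digits after the decimal point
print(f'{ratio:.1%}')     # 66.7%: as a percentage
print(f'{42:05d}')        # 00042: zero-padded to width 5
print(f'{1234567:,}')     # 1,234,567: with a thousands separator
name = 'tintin'
print(f'{name!r} has {len(name)} letters')  # 'tintin' has 6 letters
```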

### The `in` keyword

You can use the `in` keyword with any container, whenever it makes sense

In [None]:
print(s)
print('Salut' in s)

In [None]:
print(tt)
print('truc' in tt)

In [None]:
print(colors)
print('truc' in colors)

In [None]:
('truc', 3.14, 'truc') in colors

### Brain-f**king

Explain this weird behaviour:

In [None]:
5 in [1, 2, 3, 4] == False

In [None]:
5 not in [1, 2, 3, 4]

In [None]:
(5 in [1, 2, 3, 4]) == False

In [None]:
# ANSWER.
# This is a chained comparison. We have seen that 
1 < 2 < 3
# is equivalent to
(1 < 2) and (2 < 3)
# so that
5 in [1, 2, 3, 4] == False
# is equivalent to
(5 in [1, 2, 3, 4]) and ([1, 2, 3, 4] == False)

## Dictionaries

- A dictionary is basically an efficient table that **maps keys to values**.
- The **MOST** important container in Python. 
- Many things are actually a `dict` under the hood in `Python`

In [None]:
tel = {'emmanuelle': 5752, 'sebastian': 5578}
print(tel)
print(type(tel))

In [None]:
tel['emmanuelle'], tel['sebastian']

In [None]:
tel['francis'] = '5919'
tel

In [None]:
len(tel)

### Important remarks

- Keys can be of different types
- A key must be **hashable**, which in practice means an **immutable** type (`int`, `float`, `str`, `tuple` of immutable objects, ...)

In [None]:
tel[7162453] = [1, 3, 2]
tel[3.14] = 'bidule'
tel[('jaouad', 2)] = 1234
tel

In [None]:
# A list is mutable, hence not hashable: it cannot be a key (raises TypeError)
tel[['jaouad']] = '5678'

In [None]:
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}
print(tel.keys())
print(tel.values())
print(tel.items())

In [None]:
'emmanuelle' in tel

In [None]:
5919 in tel.values()

You can swap values like this

In [None]:
print(tel)
tel['emmanuelle'], tel['sebastian'] = tel['sebastian'], tel['emmanuelle']
print(tel)

In [None]:
# It works, since
a, b = 2.71, 3.14
a, b = b, a
a, b

### Exercise 1

Get the keys of `tel` sorted in decreasing order

In [None]:
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}

#### Answer

In [None]:
sorted(tel, reverse=True)

### Exercise 2

Get the keys of `tel` sorted by increasing value


In [None]:
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}

#### Answer

In [None]:
sorted(tel, key=tel.get)

### Exercise 3

Obtain a sorted-by-key version of `tel`

In [None]:
tel = {'emmanuelle': 5752, 'sebastian' : 5578, 'jaouad' : 1234}

#### Answer

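- A `dict` cannot be sorted in place: it is a mapping, not a sequence (since Python 3.7 it does remember **insertion order**, though)
- What we can build is a sorted representation of it, or a new dict whose insertion order is sorted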

In [None]:
# Simplest is through a list
sorted(tel.items())

`OrderedDict` also remembers insertion order (it was the only way to get this guarantee before Python 3.7) and adds order-aware methods such as `move_to_end`

In [None]:
from collections import OrderedDict

OrderedDict(sorted(tel.items()))
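Since Python 3.7, a plain `dict` guarantees insertion order, so `dict(sorted(...))` is often enough. A sketch, which also sorts by value with a `key` function (the same idea as `sorted(tel, key=tel.get)` in Exercise 2):

```python
tel = {'emmanuelle': 5752, 'sebastian': 5578, 'jaouad': 1234}

# A new dict built from the sorted (key, value) pairs, sorted by key
by_key = dict(sorted(tel.items()))
print(by_key)    # {'emmanuelle': 5752, 'jaouad': 1234, 'sebastian': 5578}

# Sorted by value instead: the key function picks the second element of each pair
by_value = dict(sorted(tel.items(), key=lambda item: item[1]))
print(by_value)  # {'jaouad': 1234, 'sebastian': 5578, 'emmanuelle': 5752}
```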

## Sets

A set is an unordered collection of unique elements

In [None]:
ss = {1, 2, 2, 2, 3, 3, 'tintin', 'tintin', 'toto'}
ss

In [None]:
s = 'truc truc bidule truc'
set(s)

In [None]:
{1, 5, 2, 1, 1}.union({1, 2, 3})

In [None]:
set([1, 5, 2, 1, 1]).intersection(set([1, 2, 3]))

In [None]:
ss.add('tintin')
ss
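`union` and `intersection` also have operator forms. A short sketch on two arbitrary sets:

```python
left = {1, 2, 3, 4}
right = {3, 4, 5}

print(left | right)    # union: {1, 2, 3, 4, 5}
print(left & right)    # intersection: {3, 4}
print(left - right)    # difference: {1, 2}
print(left ^ right)    # symmetric difference: {1, 2, 5}
print({1, 2} <= left)  # subset test: True

# Membership tests on a set use hashing, so they stay fast on large sets
print(3 in left)       # True
```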

You can combine all containers together

In [None]:
dd = {
    'truc': [1, 2, 3], 
    5: (1, 4, 2),
    (1, 3): {'hello', 'world'}
}
dd

# Assignment in `Python` is name binding

## Everything is either mutable or immutable

In [None]:
ss = {1, 2, 3}
sss = ss
sss, ss

In [None]:
sss.add("Truc")

**Question.** What is in `ss` ?

In [None]:
ss, sss

`ss` and `sss` are names for the same object

In [None]:
id(ss), id(sss)

In [None]:
ss is sss

## About assignments

- Python never copies an object
- unless you ask it to

When you code
```python
x = [1, 2, 3]
y = x
```
you just
- **bind** the variable name `x` to a list `[1, 2, 3]`
- give another name `y` to the same object

**Important remarks**

- **Everything** is an object in Python
- Either **immutable** or **mutable**

In [None]:
id(1), id(1+1), id(2)

**A `list` is mutable**

In [None]:
x = [1, 2, 3]
print(id(x), x)
x[0] += 42; x.append(3.14)
print(id(x), x)

**A `str` is immutable**

In order to "change" an **immutable** object, Python creates a new one

In [None]:
s = 'to'
print(id(s), s)
s += 'to'
print(id(s), s)

**Once again, a `list` is mutable**

In [None]:
super_list = [3.14, (1, 2, 3), 'tintin']
other_list = super_list
id(other_list), id(super_list)

- `other_list` and `super_list` are the same list
- If you change one, you change the other, since they are one and the same object
- `id` returns the identity of an object. If `id(x) == id(y)`, then `x` and `y` are the very same object (not only the same type or value, but the same instance); `x is y` tests exactly this

In [None]:
other_list[1] = 'youps'
other_list, super_list

## If you want a copy, you need to ask for one

In [None]:
other_list = super_list.copy()
id(other_list), id(super_list)

In [None]:
other_list[1] = 'copy'
other_list, super_list

Only `other_list` is modified. 

But... what if you have a `list` of `list` ? (or a mutable object containing mutable objects)

In [None]:
l1, l2 = [1, 2, 3], [4, 5, 6]
list_list = [l1, l2]
list_list

Let's make a copy of `list_list`

In [None]:
copy_list = list_list.copy()
copy_list.append('super')
list_list, copy_list

OK, only `copy_list` is modified, as expected

But now...

In [None]:
copy_list[0][1] = 'oups'
copy_list, list_list

**Question.** What happened ?!?

- The `list_list` object is copied
- But NOT the objects it contains!
- By default `copy` does a *shallow* copy, not a *deep* copy
- It does not build copies of what is contained
- If you want to copy an object and all that is contained in it, you need to use `deepcopy`.

In [None]:
from copy import deepcopy

copy_list = deepcopy(list_list)
copy_list[0][1] = 'incredible !'
list_list, copy_list

## Final remarks

In [None]:
tt = ([1, 2, 3], [4, 5, 6])
print(id(tt), tt)
print(list(map(id, tt)))

In [None]:
tt[0][1] = '42'
print(id(tt), tt)
print(list(map(id, tt)))
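The tuple itself did not change: it has the same `id` and still holds the same two list objects; only the content of one of these lists was mutated. One consequence, sketched below with arbitrary values: a tuple holding a mutable object is not hashable, so it cannot be used as a `dict` key.

```python
ok_key = (1, 'a')          # only immutable elements: hashable
bad_key = ([1, 2], 'a')    # holds a list: not hashable

d = {ok_key: 'fine'}
rejected = False
try:
    d[bad_key] = 'not allowed'
except TypeError:          # hashing the tuple requires hashing the list
    rejected = True
print(d, rejected)
```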

# Control flow and other stuff...

Namely tests, loops, booleans again, etc.

In [None]:
if 2 ** 2 == 5:
    print('Obvious')
else:
    print('YES')
print('toujours')

## Blocks are delimited by indentation!

In [None]:
a = 3
if a > 0:
    if a == 1:
        print(1)
    elif a == 2:
        print(2)
elif a == 2:
    print(2)
elif a == 3:
    print(3)
else:
    print(a)

## Anything can be understood as a boolean

For example, don't do this to test if a list is empty

In [None]:
l2 = ['hello', 'everybody']

if len(l2) > 0:
    print(l2[0])

but this

In [None]:
if l2:
    print(l2[0])

**Some poetry**

- An empty `dict` is `False`
- An empty `str` is `False`
- An empty `list` is `False`
- An empty `tuple` is `False`
- An empty `set` is `False`
- `0` is `False`
- `0.0` is `False`
- `None` is `False`
- etc...
- everything else is `True`
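A quick way to check this list, as a sketch: `bool()` applied to each value gives the truth value that `if` and `while` use.

```python
falsy = [{}, '', [], (), set(), 0, 0.0, None, False]
truthy = [{'a': 1}, ' ', [0], (None,), {0}, -1, 1e-300, True]

print([bool(x) for x in falsy])   # all False
print([bool(x) for x in truthy])  # all True

# [0] and (None,) are True because they are NOT empty,
# whatever they contain
```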

## While loops

In [None]:
a = 10
b = 1
while b < a:
    b = b + 1
    print(b)

Approximate $\pi$ using the Wallis product

$$
\pi = 2 \prod_{i=1}^{\infty} \frac{4i^2}{4i^2 - 1}
$$

In [None]:
pi = 2
eps = 1e-10
dif = 2 * eps
i = 1
while dif > eps:
    pi, i, old_pi = pi * 4 * i ** 2 / (4 * i ** 2 - 1), i + 1, pi
    dif = pi - old_pi

In [None]:
pi

In [None]:
from math import pi

pi
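The `while` loop above stops when two successive partial products differ by less than `eps`, which does not mean that the result is within `eps` of $\pi$: the Wallis product converges slowly, with an error roughly proportional to $1/n$ after $n$ factors. A sketch with a fixed number of factors, using `math.prod` (available since Python 3.8) and a hypothetical helper `wallis`:

```python
import math


def wallis(n_factors):
    """Partial Wallis product: 2 * prod_{i=1}^{n} 4 i^2 / (4 i^2 - 1)."""
    return 2 * math.prod(4 * i ** 2 / (4 * i ** 2 - 1)
                         for i in range(1, n_factors + 1))


for n in [10, 100, 1000, 10000]:
    approx = wallis(n)
    # All factors are > 1, so the partial products increase towards pi
    print(n, approx, math.pi - approx)
```

Multiplying `n` by 10 divides the error by roughly 10, far from the `1e-10` one might expect from `eps`.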

## `for` loop with `range`
- Iteration with an index, with a list, with many things!
- `range(start, stop, step)` takes the same parameters as slicing `start:stop:step`: only `stop` is mandatory (`start` defaults to `0` and `step` to `1`), and `stop` is excluded

In [None]:
for i in range(7, 1, -1):
    print(i)

In [None]:
for i in range(4):
    print(i + 1)
print('-')

for i in range(1, 5):
    print(i)
print('-')

for i in range(1, 10, 3):
    print(i)

**Something for nerds**. You can use `else` in a `for` loop: the `else` block runs only if the loop completes without hitting a `break`

In [None]:
names = ['stephane', 'mokhtar', 'jaouad', 'simon', 'yiyang']

for name in names:
    if name.startswith('u'):
        print(name)
        break
else:
    print('Not found.')

In [None]:
names = ['stephane', 'mokhtar', 'jaouad', 'ulysse', 'simon', 'yiyang']

for name in names:
    if name.startswith('u'):
        print(name)
        break
else:
    print('Not found.')

## For loops over iterable objects

You can iterate using `for` over any container: `list`, `tuple`, `dict`, `str`, `set` among others...

In [None]:
colors = ['red', 'blue', 'black', 'white']
peoples = ['stephane', 'jaouad', 'mokhtar', 'yiyang']

In [None]:
# This is stupid
for i in range(len(colors)):
    print(colors[i])
    
# This is better
for color in colors:
    print(color)

To iterate over several sequences at the same time, use `zip`

In [None]:
for color, people in zip(colors, peoples):
    print(color, people)

In [None]:
l = ["Bonjour", {'francis': 5214, 'stephane': 5123}, ('truc', 3)]
for e in l:
    print(e, len(e))

**Loop over a `str`**

In [None]:
s = 'Bonjour'
for c in s:
    print(c)

**Loop over a `dict`**

In [None]:
dd = {(1, 3): {'hello', 'world'}, 'truc': [1, 2, 3], 5: (1, 4, 2)}

# Default is to loop over keys
for key in dd:
    print(key)

In [None]:
# Loop over values
for e in dd.values():
    print(e)

In [None]:
# Loop over items (key, value) pairs
for key, val in dd.items():
    print(key, val)

## Comprehensions

You can construct a `list`, `dict`, `set` and others using the **comprehension** syntax

**`list` comprehension**

In [None]:
print(colors)
print(peoples)

# The list of people whose favorite color has at most 4 characters
[people for color, people in zip(colors, peoples) if len(color) <= 4]

**`dict` comprehension**

In [None]:
{people: color for color, people in zip(colors, peoples) if len(color) <= 4}

In [None]:
# Builds a dict from two lists (one for keys, one for values)
{key: value for (key, value) in zip(peoples, colors)}

In [None]:
# But it's simpler (so better) to use
dict(zip(peoples, colors))

Something very convenient is `enumerate`

In [None]:
for i, color in enumerate(colors):
    print(i, color)

In [None]:
list(enumerate(colors))

In [None]:
print(dict(enumerate(s)))

In [None]:
s = 'Hey everyone'
{c: i for i, c in enumerate(s)}

## About functional programming

We can use `lambda` to define **anonymous** functions, and use them in the `map` and `reduce` functions

In [None]:
square = lambda x: x ** 2
square(2)

In [None]:
sum2 = lambda a, b: a + b
print(sum2('Hello', ' world'))
print(sum2(1, 2))

Intended for short, one-line functions.

More complex functions use `def` (see below)

## Exercise

Print the squares of even numbers between 0 and 15

1. Using a list comprehension as before
2. Using `map`

In [None]:
# Answer to 1.
[i ** 2 for i in range(15) if i % 2 == 0]

In [None]:
# Answer to 2. 
list(map(lambda x: x ** 2, range(0, 15, 2)))

**Remark**. We will see later why we need to use `list` above

In [None]:
map(lambda x: x ** 2, range(0, 15, 2))

Now, to get the sum of these squares, we can use `sum`

In [None]:
sum(map(lambda x: x ** 2, range(0, 15, 2)))

We can also use `reduce` (not a good idea here, but it's good to know that it exists)

In [None]:
from functools import reduce

reduce(lambda a, b: a + b, map(lambda x: x ** 2, range(0, 15, 2)))

There is also something useful in `functools` called `partial`

It lets you **specialize** a function by freezing some of its arguments

In [None]:
from functools import partial

def mult(a, b):
    return a * b

double = partial(mult, b=2)
double(2) 

## Brain-f**king

What is the output of

```python
reduce(lambda a, b: a + b[0] * b[1], enumerate('abcde'), 'A')
```

In [None]:
reduce(lambda a, b: a + b[0] * b[1], enumerate('abcde'), 'A')

This does the following

In [None]:
((((('A' + 0 * 'a') + 1 * 'b') + 2 * 'c') + 3 * 'd') + 4 * 'e')

# Generators

In [None]:
import sys
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(6, 6))
plt.plot([sys.getsizeof(list(range(i))) for i in range(100)], lw=3)
plt.plot([sys.getsizeof(range(i)) for i in range(100)], lw=3)
plt.xlabel('Number of elements (value of i)', fontsize=14)
plt.ylabel('Size (in bytes)', fontsize=14)
_ = plt.legend(['list(range(i))', 'range(i)'], fontsize=16)

## Why generators ?

The memory used by `range(i)` does not grow with `i`: it stays constant

What is happening?

- `range(n)` does not allocate a list of `n` elements!
- It essentially stores only its `start`, `stop` and `step`, and **produces the integers on the fly** when they are needed
- We say that such an object is **lazy**: it behaves like a **generator** in `Python` (strictly speaking, a `range` is a lazy sequence, not a generator)
- Many things in the `Python` standard library behave like this


**Warning.** Getting the real memory footprint of a `Python` object is difficult.
Note that `sys.getsizeof` calls the `__sizeof__` method of the object, which in general does not give the total memory it uses (for instance, the size of a list does not include the objects it refers to). But that is good enough here.

The following computation has a negligible memory footprint, since no list of `10**8` integers is ever built:

In [None]:
sum(range(10**8))

In [None]:
map(lambda x: x**2, range(10**7))

`map` does not return a `list` either, for the same reason: it returns a lazy iterator

In [None]:
sum(map(lambda x: x**2, range(10**6)))

## Generator expression

Namely generators defined through comprehensions:
just replace `[]` by `()` in the comprehension.

A generator can be iterated over only **once**

In [None]:
from itertools import product

gene = (i + j for i, j in product(range(3), range(3)))
gene

In [None]:
print(list(gene))
print(list(gene))

## `yield`

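Something very powerful: a function containing `yield` is a *generator function*. Calling it does not run its body; it returns a generator, which executes the body lazily, pausing at each `yield` to hand out one value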

In [None]:
def startswith(words, letter):
    for word in words:
        if word.startswith(letter):
            yield word

In [None]:
words = ['Python', "is", 'awesome', 'in', 'particular', 'generators', 'are', 'really', 'cool']
list(startswith(words, letter='a'))

But also with a `for` loop

In [None]:
for word in startswith(words, letter='a'):
    print(word)
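To see the lazy behaviour of a generator function step by step, a sketch with a hypothetical `countdown` generator driven by the built-in `next`:

```python
def countdown(n):
    """Yield n, n - 1, ..., 1, one value at a time (hypothetical example)."""
    print('starting')        # runs at the first next(), not at the call
    while n > 0:
        yield n              # pause here and hand out n
        n -= 1


gen = countdown(3)          # nothing printed yet: the body has not started
first = next(gen)           # prints 'starting', returns 3
second = next(gen)          # resumes right after the yield, returns 2
rest = list(gen)            # [1]: only the remaining values
leftover = list(gen)        # []: a generator can be consumed only once
print(first, second, rest, leftover)
```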

# A glimpse at the `collections` module

(This is where the good stuff hides)

In [None]:
texte = """
Bonjour,
Python c'est super.
Python ca a l'air quand même un peu compliqué.
Mais bon, ca a l'air pratique.
Peut-être que je pourrais m'en servir pour faire des trucs super.
"""
texte

In [None]:
print(texte)

In [None]:
# Some basic text preprocessing 
new_text = texte.strip()\
    .replace('\n', ' ')\
    .replace(',', ' ')\
    .replace('.', ' ')

print(new_text)
print('-' * 8)

words = new_text.split()
print(words)

## Exercise

Count the number of occurrences of all the words in `words`.

Output must be a dictionary containing ``word: count``

In [None]:
print(words)

### Solution 1: hand-made

In [None]:
words_counts = {}
for word in words:
    if word in words_counts:
        words_counts[word] += 1
    else:
        words_counts[word] = 1

print(words_counts)

### Solution 2: using `defaultdict`

In [None]:
from collections import defaultdict

words_counts = defaultdict(int)
for word in words:
    words_counts[word] += 1

print(words_counts)

- `defaultdict` can be extremely useful
- It is a dict with a default value: here `int` is called (and `int()` returns `0`) whenever a missing key is accessed
- This avoids the explicit `if` test

### About `defaultdict`

- The argument must be a "callable" (something that can be called, here with no arguments)
- Beware: as soon as a missing key is accessed with `[]`, the default value is inserted in the `defaultdict` (the `in` test, however, does not insert anything)

In [None]:
addresses = defaultdict(lambda: 'unknown')
addresses['huyen']
addresses['stephane'] = '8 place Aurelie Nemours'
print(addresses)

In [None]:
# Somewhat nasty...
print('jean-francois' in addresses)
print(addresses['jean-francois'])
print('jean-francois' in addresses)
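Since the argument only has to be a callable, `list` works too, which is handy to group items. A sketch on a hypothetical list of words (named `some_words` so that `words` is left untouched):

```python
from collections import defaultdict

some_words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# `list` is the callable: a missing key starts with an empty list
by_first_letter = defaultdict(list)
for word in some_words:
    by_first_letter[word[0]].append(word)

print(dict(by_first_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```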

### Solution 3. Don't do it by hand! Use `Counter`

In [None]:
from collections import Counter

print(dict(Counter(words)))

`Counter` counts the number of occurrences of all objects in an iterable

**Question.** Which one do you prefer ?

- The `Counter` one right ?

### Moral

- When you need to do something, assume that there is a tool to do it directly 

- If you can't find it, ask `google` or `stackoverflow`

- Otherwise, try to do it as simply as possible

## Exercise

Compute the number of occurrences AND the length of each word in `words`.

Output must be a dictionary containing ``word: (count, length)``

### Solution

In [None]:
from collections import Counter

{word: (count, len(word)) for word, count in Counter(words).items()}

## The `namedtuple`

There is also the `namedtuple`. It's a `tuple` but with named attributes

In [None]:
from collections import namedtuple

Jedi = namedtuple('Jedi', ['firstname', 'lastname', 'age', 'color'])
yoda = Jedi('Minch', 'Yoda', 900, 'green')
yoda

In [None]:
yoda.firstname

In [None]:
yoda[1]
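Like any tuple, a `namedtuple` is immutable. It provides a few helper methods, whose names start with an underscore to avoid clashing with field names, such as `_replace` (returns a modified copy) and `_asdict`. A sketch reusing the `Jedi` example:

```python
from collections import namedtuple

Jedi = namedtuple('Jedi', ['firstname', 'lastname', 'age', 'color'])
yoda = Jedi('Minch', 'Yoda', 900, 'green')

older_yoda = yoda._replace(age=901)  # a NEW tuple, yoda is unchanged
print(yoda.age, older_yoda.age)      # 900 901
print(yoda._asdict())                # field name -> value

# It still unpacks like a regular tuple
firstname, lastname, age, color = yoda
```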

**Remark.** Since `Python 3.7`, dataclasses are often a better alternative. We will talk about them later

# I/O, reading and writing files

Next, we assume that you have a text file `miserables.txt` in the folder containing 
this notebook. 

If you don't, simply run the next cell that downloads it from my webpage and saves it in a file.

In [None]:
import requests

url = 'https://stephanegaiffas.github.io/files/miserables.txt'
r = requests.get(url)
r.raise_for_status()  # fail loudly instead of saving an error page

with open('miserables.txt', 'wb') as f:
    f.write(r.content)


In `jupyter` and `ipython` you can run terminal command lines using `!`

Let's count the number of lines and the number of words with the `wc` command-line tool (Linux or Mac only, don't ask me how on Windows)

In [None]:
# Lines count
!wc -l miserables.txt

In [None]:
# Word count
!wc -w miserables.txt
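If `wc` is not available (e.g. on Windows), a rough pure-Python equivalent is easy to write. A sketch, with a hypothetical helper `count_lines_and_words`, demonstrated on a temporary file so that it runs anywhere:

```python
import os
import tempfile


def count_lines_and_words(path, encoding='utf8'):
    """Return (number of lines, number of words) of a text file.

    Roughly mimics `wc -l` and `wc -w`: words are whitespace-separated
    tokens. If the last line has no trailing newline, the line count is
    one more than what `wc -l` reports (wc counts newline characters).
    """
    n_lines, n_words = 0, 0
    with open(path, encoding=encoding) as f:
        for line in f:  # the file is read lazily, line by line
            n_lines += 1
            n_words += len(line.split())
    return n_lines, n_words


# Demo on a small temporary file; in the notebook you would rather call
# count_lines_and_words('miserables.txt')
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf8') as tmp:
    tmp.write('un deux\ntrois quatre cinq\n')
demo_counts = count_lines_and_words(tmp.name)
os.remove(tmp.name)
print(demo_counts)  # (2, 5)
```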

## Exercise

Count the number of occurrences of each word in the text file `miserables.txt`.
We use an `open` *context* and the `Counter` from before.

### Solution

In [None]:
from collections import Counter

counter = Counter()

with open('miserables.txt', encoding='utf8') as f:
    for line in f:
        line = line.strip().replace('\n', ' ')\
            .replace(',', ' ')\
            .replace('.', ' ')\
            .replace('»', ' ')\
            .replace('-', ' ')\
            .replace('!', ' ')\
            .replace('(', ' ')\
            .replace(')', ' ')\
            .replace('?', ' ').split()
        
        counter.update(line)

In [None]:
counter

In [None]:
counter.most_common(500)
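The chain of `replace` calls above can be written more compactly with `str.maketrans` and `str.translate`, which replace many characters in a single pass. A sketch with a hypothetical helper `count_words`; the set of characters removed is an assumption and differs slightly from the cell above:

```python
import string
from collections import Counter

# Characters replaced by a space: ASCII punctuation plus French quotes.
# This choice is an assumption; it includes the apostrophe, so
# "c'est" becomes two words, "c" and "est".
PUNCTUATION = string.punctuation + '«»'
TO_SPACE = str.maketrans(PUNCTUATION, ' ' * len(PUNCTUATION))


def count_words(lines):
    """Count whitespace-separated words in an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.translate(TO_SPACE).split())
    return counts


# Demo on a list of strings; an open file works the same way, e.g.
# with open('miserables.txt', encoding='utf8') as f:
#     counter = count_words(f)
demo = count_words(["Bonjour, Python!", "Python c'est super."])
print(demo.most_common(2))
```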

## Contexts 

- A *context* in Python is something that we use with the `with` keyword.

- Here, it takes care of opening the file and guarantees that it is closed when leaving the `with` block, even if an exception is raised.

Note the for loop:
```python
for line in f:
    ...
```
You loop directly over the lines of the open file from **within** the `open` context
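To see that the context really closes the file, even when something goes wrong inside the block, a sketch on a temporary file (created with the standard `tempfile` module, so the file name is arbitrary):

```python
import os
import tempfile

# A temporary file for the demo
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(path, 'w', encoding='utf8') as fh:
    fh.write('first line\nsecond line\n')
print(fh.closed)  # True: the file was closed when leaving the block

caught = False
try:
    with open(path, encoding='utf8') as fh:
        first_line = fh.readline()
        raise ValueError('something went wrong while reading')
except ValueError:
    caught = True
print(fh.closed)  # True: closed even though an exception was raised

os.remove(path)
```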

## About `pickle`

You can save the result of a computation with `pickle`.

- `pickle` is a way of saving **almost anything** with Python.
- It serializes the object in a binary format, and is usually the simplest and fastest way to go.
- Only unpickle data you trust: loading a pickle can execute arbitrary code.

In [None]:
import pickle as pkl

# Let's save it
with open('miserable_word_counts.pkl', 'wb') as f:
    pkl.dump(counter, f)

# And read it again
with open('miserable_word_counts.pkl', 'rb') as f:
    counter = pkl.load(f)

In [None]:
counter.most_common(10)
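`pickle` can also serialize to and from `bytes` in memory, with `dumps` and `loads`. A sketch on arbitrary data, which also shows one of the things the standard `pickle` cannot save: a `lambda`.

```python
import pickle

data = {'counts': {'python': 3}, 'tags': ('a', 'b'), 'ids': {1, 2}}

blob = pickle.dumps(data)       # bytes, no file involved
restored = pickle.loads(blob)
print(restored == data, restored is data)  # True False: equal, but a new object

# "Almost anything": a lambda cannot be pickled by the standard pickle
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_pickled = False
print(lambda_pickled)           # False
```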

# Defining functions

You **must** use functions to organize and reuse code

## Function definition

Function blocks must be indented as other control-flow blocks.

In [None]:
def test():
    return 'in test function'

test()

## Return statement

Functions can *optionally* return values.
By default, functions return ``None``.

The syntax to define a function:

- the ``def`` keyword,
- followed by the function's **name**,
- then the arguments of the function between parentheses, followed by a colon,
- the (indented) function body,
- and ``return object`` for optionally returning values.

In [None]:
def f(x):
    return x + 10
f(20)

A function that returns several elements returns a `tuple`

In [None]:
def f(x):
    return x + 1, x + 4

f(5)

In [None]:
type(f(5))

## Parameters

Mandatory parameters (positional arguments)


In [None]:
def double_it(x):
    return x * 2

double_it(2)

In [None]:
double_it()

Optional parameters (with default values)

In [None]:
def double_it(x=2):
    return x * 2

double_it()

In [None]:
double_it(3)

In [None]:
def f(x, y=2, z=10):
    print(x, '+', y, '+', z, '=', x + y + z)

In [None]:
f(5)

In [None]:
f(5, -2)

In [None]:
f(5, -2, 8)

In [None]:
f(z=5, x=-2, y=8)

## Argument unpacking and keyword argument unpacking
You can do stuff like this, using unpacking `*` notation

In [None]:
a, *b, c = 1, 2, 3, 4, 5
a, b, c

Back to function `f`: you can unpack a `tuple` into positional arguments, and a `dict` into keyword arguments

In [None]:
tt = (1, 2, 3)
f(*tt)

In [None]:
dd = {'y': 10, 'z': -5}

In [None]:
f(3, **dd)

In [None]:
def g(x, z, y, t=1, u=2):
    print(x, '+', y, '+', z, '+', t, '+', 
          u, '=', x + y + z + t + u)

In [None]:
tt = (1, -4, 2)
dd = {'t': 10, 'u': -5}
g(*tt, **dd)

## The prototype of all functions in `Python`

In [None]:
def f(*args, **kwargs):
    print('args=', args)
    print('kwargs=', kwargs)

f(1, 2, 'truc', lastname='gaiffas', firstname='stephane')

- In a function definition, `*args` collects the extra positional arguments in a `tuple`, and `**kwargs` collects the extra keyword arguments in a `dict`
- The names `args` and `kwargs` are a convention, not mandatory
- (but you are fired if you name these arguments otherwise)

In [None]:
# How to get fired
def f(*aaa, **bbb):
    print('args=', aaa)
    print('kwargs=', bbb)
f(1, 2, 'truc', lastname='gaiffas', firstname='stephane')    
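The `*args, **kwargs` signature is what lets a function accept *any* call and forward it unchanged to another function. A sketch with a hypothetical `logged` helper that wraps a function (in real code one would also decorate `wrapper` with `functools.wraps` to keep the name and docstring of `func`):

```python
def logged(func):
    """Return a version of `func` that prints its arguments before calling it."""
    def wrapper(*args, **kwargs):
        # args is a tuple and kwargs a dict: forward them unchanged with * and **
        print('calling', func.__name__, 'with', args, kwargs)
        return func(*args, **kwargs)
    return wrapper


def power(base, exponent=2):
    return base ** exponent


logged_power = logged(power)
print(logged_power(3))               # prints the call, then 9
print(logged_power(2, exponent=10))  # prints the call, then 1024
```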

**Remark**. A function is a regular object... you can even add attributes to it!

In [None]:
f.truc = 4

In [None]:
f(1, 3)

In [None]:
f(3, -2, y='truc')

# Object-oriented programming (OOP)

Python supports object-oriented programming (OOP). The goals of OOP are:

- to organize the code, and
- to re-use code in similar contexts.

Here is a small example: we create a `Student` class, which is an object
gathering several custom functions (called *methods*) and variables 
(called *attributes*).

In [None]:
class Student(object):

    def __init__(self, name, birthyear, major='computer science'):
        self.name = name
        self.birthyear = birthyear
        self.major = major

    def __repr__(self):
        return "Student(name='{name}', birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, birthyear=self.birthyear, major=self.major)

anna = Student('anna', 1987)
anna

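`__repr__` is what we call a 'magic method' (or *dunder* method) in Python: it defines how an object is displayed as a string, for instance in the notebook output. There is a very large number of such magic methods.
They are used to implement **interfaces**: `len(x)` calls `x.__len__()`, `for` loops call `__iter__`, `==` calls `__eq__`, etc.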
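A sketch of a few magic methods on a hypothetical `Playlist` class: implementing them is enough for built-in functions and syntax (`len`, `in`, `for`) to work on our own objects.

```python
class Playlist:
    """A hypothetical container of song titles."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):             # used by len(playlist)
        return len(self.songs)

    def __iter__(self):            # used by for loops, list(playlist), ...
        return iter(self.songs)

    def __contains__(self, song):  # used by `song in playlist`
        return song in self.songs

    def __repr__(self):            # used when displaying the object
        return f'Playlist({self.songs!r})'


playlist = Playlist(['Yesterday', 'Imagine'])
print(len(playlist))               # 2
print('Imagine' in playlist)       # True
for song in playlist:
    print(song)
print(repr(playlist))              # Playlist(['Yesterday', 'Imagine'])
```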

## Exercise
Add an `age` method to the `Student` class that computes the age of the student.
- You can (and should) use the `datetime` module.
- Since we only know the birth year, let's assume that the student was born on January 1st.

### Solution

In [None]:
from datetime import datetime

class Student(object):

    def __init__(self, name, birthyear, major='computer science'):
        self.name = name
        self.birthyear = birthyear
        self.major = major

    def __repr__(self):
        return "Student(name='{name}', birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, birthyear=self.birthyear, major=self.major)

    def age(self):
        return datetime.now().year - self.birthyear
        
anna = Student('anna', 1987)
anna.age()

## Properties

We can make methods look like attributes using **properties**, as shown below

In [None]:
class Student(object):

    def __init__(self, name, birthyear, major='computer science'):
        self.name = name
        self.birthyear = birthyear
        self.major = major

    def __repr__(self):
        return "Student(name='{name}', birthyear={birthyear}, major='{major}')"\
                .format(name=self.name, birthyear=self.birthyear, major=self.major)

    @property
    def age(self):
        return datetime.now().year - self.birthyear
        
anna = Student('anna', 1987)
anna.age

## Inheritance 

A `MasterStudent` is a `Student` with a new extra mandatory `internship` attribute

In [None]:
class MasterStudent(Student):

    def __init__(self, name, birthyear, internship, major='computer science'):
        # The parent class sets name, birthyear and major
        super().__init__(name, birthyear, major)
        self.internship = internship

    def __repr__(self):
        return "MasterStudent(name='{name}', birthyear={birthyear}, " \
               "internship='{internship}', major='{major}')"\
                .format(name=self.name, birthyear=self.birthyear,
                        internship=self.internship, major=self.major)

MasterStudent('djalil', 1998, 'pwc')

## Monkey patching

- Classes in `Python` are objects too, and store their attributes in a `dict` under the hood (their `__dict__`)...
- Therefore classes can be changed on the fly

In [None]:
class C(object):
    
    def __init__(self, name):
        self.name = name

    def monkey(self):
        print("Old monkey %s" % self.name)

def monkey(self):
    print("New monkey %s" % self.name)

c = C("Baloo")
c.monkey()

C.monkey = monkey
c.monkey()

## Data classes

Since `Python 3.7` you can use a dataclass for this

It does a lot of work for you: it generates `__init__`, `__repr__` and `__eq__`, among other things

In [None]:
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Student(object):
    name: str
    birthyear: int
    major: str = 'computer science'

    @property
    def age(self):
        return datetime.now().year - self.birthyear
        
anna = Student('anna', 1987)
anna

In [None]:
print(anna.age)
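Dataclasses also guard against a classic trap: a mutable default value shared by all instances. A sketch with a hypothetical `Course` class, using `field(default_factory=list)` to get a fresh list for each instance:

```python
from dataclasses import dataclass, field


@dataclass
class Course:
    title: str
    # default_factory=list calls list() for EACH new instance
    students: list = field(default_factory=list)


math_course, physics_course = Course('math'), Course('physics')
math_course.students.append('anna')
print(math_course)     # Course(title='math', students=['anna'])
print(physics_course)  # Course(title='physics', students=[])
print(Course('math') == Course('math'))  # True: __eq__ is generated too

# A bare mutable default is rejected when the class is created
bad_rejected = False
try:
    @dataclass
    class BadCourse:
        students: list = []
except ValueError:
    bad_rejected = True
print(bad_rejected)    # True
```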

# Most common mistakes

- Let us wrap this up with the most common mistakes with `Python`

First, the best ways to learn and practice:

- Start with the official tutorial https://docs.python.org/fr/3/tutorial/index.html

- Look at https://python-3-for-scientists.readthedocs.io/en/latest/index.html

- Continue with the documentation at https://docs.python.org/fr/3/index.html and work!

## Using a mutable value as a default value

In [None]:
def foo(bar=[]):
    bar.append('oops')
    return bar

print(foo())
print(foo())
print(foo())

print('-' * 8)
print(foo(['Ah ah']))
print(foo([]))

- The default value of a function argument is evaluated only once, when the function is defined
- The `bar` argument is initialized to its default (i.e., an empty list) only when `foo` is defined, not at each call
- Successive calls to `foo()` (without a `bar` argument) therefore use the same list!

One should use instead

In [None]:
def foo(bar=None):
    if bar is None:
        bar = []
    bar.append('oops')
    return bar

print(foo())
print(foo())
print(foo())
print(foo(['OK']))

No problem with immutable types

In [None]:
def foo(bar=()):
    bar += ('oops',)
    return bar

print(foo())
print(foo())
print(foo())

## Class attributes VS object attributes

In [None]:
class A(object):
    x = 1

    def __init__(self):
        self.y = 2

class B(A):
    def __init__(self):
        super().__init__()

class C(A):
    def __init__(self):
        super().__init__()

a, b, c = A(), B(), C()

In [None]:
print(a.x, b.x, c.x)
print(a.y, b.y, c.y)

In [None]:
a.y = 3
print(a.y, b.y, c.y)

In [None]:
a.x = 3  # Adds a new instance attribute named x to object a
print(a.x, b.x, c.x)

In [None]:
A.x = 3 # Changes the class attribute x of class A
print(a.x, b.x, c.x)

- `x` is neither an instance attribute of `b` nor of `c`
- It is also not a **class attribute** of classes `B` and `C`
- So, it is looked up in the base class `A`, which contains a **class attribute** `x`

Instances and classes store their attributes in a hidden `dict` (their `__dict__`). An attribute is looked up first in the instance, then in its class, then in the base classes, following the method resolution order (MRO)

In [None]:
a.__dict__, b.__dict__, c.__dict__

In [None]:
A.__dict__, B.__dict__, C.__dict__

This can lead to **nasty** bugs when using class attributes, in particular *mutable* ones, which are shared by all instances of the class and of its subclasses
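A classic example of such a bug, as a sketch with hypothetical `Team` classes: a *mutable* class attribute is a single object shared by all instances, so mutating it through one instance affects all of them.

```python
class Team:
    members = []  # class attribute: ONE list shared by all instances

    def add(self, name):
        self.members.append(name)  # mutates the shared list


class FixedTeam:
    def __init__(self):
        self.members = []  # instance attribute: a new list for each object

    def add(self, name):
        self.members.append(name)


red, blue = Team(), Team()
red.add('anna')
print(blue.members)     # ['anna']: blue sees red's member!

red_ok, blue_ok = FixedTeam(), FixedTeam()
red_ok.add('anna')
print(blue_ok.members)  # []
```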

## Python scope rules

In [None]:
ints = [1]

def foo1():
    ints.append(2)
    return ints

def foo2():
    ints += [2]
    return ints

In [None]:
foo1()

In [None]:
foo2()

### What the hell ?

- An assignment to a variable in a scope assumes that the variable is local to that scope 
- and shadows any similarly named variable in any outer scope

```python
ints += [2]
```

is an augmented assignment: as far as scoping is concerned, it behaves like
```python
ints = ints + [2]
```

which is an assignment: `ints` is therefore treated as a *local* variable of `foo2`, and reading it before it has been bound raises an `UnboundLocalError`, while
```python
ints.append(2)
```

is not an assignment: it only mutates the list bound to the global name `ints`
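If a function really needs to rebind a name from an outer scope, the `global` (module level) and `nonlocal` (enclosing function) statements say so explicitly. A sketch with hypothetical functions; in practice, passing values as arguments and returning results is often cleaner:

```python
ints = [1]


def foo3():
    global ints        # `ints` now refers to the module-level name
    ints += [2]        # rebinding the global name is allowed
    return ints


def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the variable of the enclosing function
        count += 1
        return count

    return increment


print(foo3())           # [1, 2]
counter_fn = make_counter()
counter_fn()
print(counter_fn())     # 2
```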

## Modify a `list` while iterating over it

In [None]:
odd = lambda x: bool(x % 2)
numbers = list(range(10))

for i in range(len(numbers)):
    if odd(numbers[i]):
        del numbers[i]

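Deleting items shifts the remaining ones to the left: some elements are skipped, and the loop eventually runs past the end of the shortened list (`IndexError`). This is typically a case where one should build a new list with a comprehension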

In [None]:
numbers = list(range(10))  # start again from a fresh list
[number for number in numbers if not odd(number)]
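If other names refer to the same list (recall that assignment is name binding), rebinding `numbers` to a new list does not change what they see. Slice assignment `numbers[:] = ...` replaces the content of the existing list instead. A sketch with a hypothetical `alias` name:

```python
def is_odd(x):
    return x % 2 == 1


numbers = list(range(10))
alias = numbers          # another name for the same list

# Rebinding: `numbers` now names a NEW list, `alias` still sees the old one
numbers = [n for n in numbers if not is_odd(n)]
print(alias)             # unchanged: 0 to 9

numbers = list(range(10))
alias = numbers
# Slice assignment: replaces the CONTENT of the existing list, in place
numbers[:] = [n for n in numbers if not is_odd(n)]
print(alias)             # [0, 2, 4, 6, 8]
```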

## No docstrings

Take the time to write clean docstrings (my favourite is the `numpydoc` style)

In [None]:
def create_student(name, age, address, major='computer science'):
    """Add a student in the database
    
    Parameters
    ----------
    name: `str`
        Name of the student
    
    age: `int`
        Age of the student
    
    address: `str`
        Address of the student
    
    major: `str`, default='computer science'
        The major chosen by the student
    
    Returns
    -------
    output: `Student`
        A fresh student
    """
    pass

## Not using available methods and/or the simplest solution

In [None]:
dd = {'stephane': 1234, 'gael': 4567, 'gontran': 891011}

# Bad
for key in dd.keys():
    print(key, dd[key])

print('-' * 8)

# Good
for key, value in dd.items():
    print(key, value)

In [None]:
colors = ['black', 'yellow', 'brown', 'red', 'pink']

# Bad
for i in range(len(colors)):
    print(i, colors[i])

print('-' * 8)

# Good
for i, color in enumerate(colors):
    print(i, color)

## Not using the standard library 

even though it is almost **always** better than a hand-made solution

In [None]:
list1 = [1, 2]
list2 = [3, 4]
list3 = [5, 6, 7]

for a in list1:
    for b in list2:
        for c in list3:
            print(a, b, c)

In [None]:
from itertools import product

for a, b, c in product(list1, list2, list3):
    print(a, b, c)

# That's it for now !