Chapter 3: Introduction to Python for Data Science#

Data: The new fuel, The new electricity#

We are living in a world that’s drowning in data. As practitioners like to say, data is the new fuel, the new electricity.

But it does not make much sense to limit ourselves to merely having the data, because data has always existed: from Johannes Kepler in Prague in the early 17th century, who tried to understand the movement of the planets from records of their motion around the sun, all the way up to a medical doctor in Kenya who records information about a patient before applying any treatment.

Data has always been at the center of (applied) science. But what makes it so attractive nowadays are the tools we use to handle it.

As said previously, Python is one of them. The purpose of this section is to give the reader a glimpse of how to get the best out of Python to carry out a data science project efficiently.

Python: A tool among others#

Over the last couple of decades, Python has emerged as a first-class tool for scientific computing tasks, including the analysis and visualization of large datasets.

Though it was not specifically designed with data analysis or scientific computing in mind, it has grown into a language of choice for data science projects.

Data scientists stick to it mainly because of its wide community of users and its large and active ecosystem of third-party packages, such as:

  1. numpy for manipulation of homogeneous array-based data,

  2. pandas for manipulation of heterogeneous and labeled data,

  3. scipy for common scientific computing tasks,

  4. matplotlib for publication-quality visualizations,

  5. scikit-learn for classic machine learning algorithms,

  6. Jupyter Notebook for interactive execution and sharing of code.

Deep Learning Libraries: Core to Data Science#

With the recent advent of Machine Learning/Deep Learning and their spectacular success, large companies like Google and Facebook have released libraries and educational resources that allow practitioners to carry out such projects themselves.

These libraries follow a code-light philosophy: a few lines of code express the concept-heavy ideas that Machine Learning exemplifies. The best known are:

  1. TensorFlow (deep learning library with a Python interface, developed by Google)

  2. PyTorch (deep learning library with a Python interface, developed by Facebook, now Meta)

Most of the projects released by those companies make extensive use of these libraries, which are free and open source.

There are other alternatives, such as Theano, Caffe, and Microsoft Azure Machine Learning.

The basics of Python (refresher)#

Installing Python

Python can be downloaded from python.org. If you find that process cumbersome, we strongly recommend installing the Anaconda distribution, which already includes most of the libraries needed for data science.
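Once Python is installed, a quick way to see which of the libraries mentioned in this chapter are available is to ask Python whether it can find them. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical, and the list of names is just an illustration (note that scikit-learn is imported under the name sklearn).

```python
import importlib.util

# Import names of the libraries mentioned in this chapter
# (scikit-learn is imported as "sklearn").
libraries = ["numpy", "pandas", "scipy", "matplotlib", "sklearn"]


def is_installed(name):
    """Return True if a module with this import name can be found."""
    return importlib.util.find_spec(name) is not None


for name in libraries:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

With the Anaconda distribution, all of these would normally be reported as installed.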

Launching Python

There are three ways you can launch Python:

  1. From the terminal, by typing (not recommended for this course):

    $ python

  2. By launching Anaconda Navigator (anaconda-navigator from the terminal) and clicking on Jupyter Notebook (recommended for Windows users and, to some extent, Linux users)

  3. By opening a Jupyter Notebook directly, typing from the terminal (recommended for Linux users):

    $ jupyter notebook

Task: #

  1. Make sure you are comfortable with opening Anaconda

  2. Try to navigate to the Desktop folder

  3. Create two folders called Data_Science and Network_Science

  4. At the end of this session, make sure all the practicals are saved in the Data_Science Folder

Data Structures#

In the core Python language, some features are more important for data analysis than others. In this chapter, you’ll look at the most essential of them, such as lists, strings and string functions, list comprehensions, and counters.

Values and types#

A value is one of the basic things a program works with, like a letter or a number. These values belong to different types:

10  # an integer

"apple"  # a string

7.5  # a floating-point number
7.5

Variables#

One of the most powerful features of Python is the ability to manipulate variables. A variable is a name that refers to a value. An assignment statement creates new variables and gives them values.

a = 4

b  = " data science is cool"

pi = 3.1415926535897931

print(a)
print(b)
print(pi)
4
 data science is cool
3.141592653589793
type (a) , type(b) , type(pi)
(int, str, float)
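Values can also be converted from one type to another with the built-in functions int, float and str. The short sketch below uses made-up variable names (count, price, label) purely for illustration.

```python
count = int("10")      # string -> integer
price = float("7.5")   # string -> floating point
label = str(4)         # integer -> string

print(count + 1)            # 11
print(price * 2)            # 15.0
print(label + " apples")    # 4 apples

# isinstance checks whether a value belongs to a given type.
print(isinstance(count, int), isinstance(label, str))  # True True
```

This matters in practice because data read from a file often arrives as strings and has to be converted before any arithmetic.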

List

A list is a sequence of values. Unlike in a string, where the values are characters, the values in a list can be of any type. The values in a list are called elements or, sometimes, items.

d = [ 2 , 9,  14, 12.5]

f = ["orange" , "apple", "banana" , "tomato"] 

print (d , f)
[2, 9, 14, 12.5] ['orange', 'apple', 'banana', 'tomato']

To access elements in a list, we use their index (starting from 0):

#the type of the object
type(f)
list
#the length of the list
len(d)
4
print ( d[0] , f[2])
2 banana
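Besides a single index, lists also accept negative indices and slices. The sketch below redefines the two lists so that it can run on its own; the names last_fruit, first_two and middle are only illustrative.

```python
d = [2, 9, 14, 12.5]
f = ["orange", "apple", "banana", "tomato"]

last_fruit = f[-1]   # negative indices count from the end
first_two = d[:2]    # slice: indices 0 and 1 (the stop index is excluded)
middle = f[1:3]      # slice: indices 1 and 2

print(last_fruit)    # tomato
print(first_two)     # [2, 9]
print(middle)        # ['apple', 'banana']
print("apple" in f)  # True (membership test)
```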

Lists are mutable, which means you can change their elements. Tuples share most of the same properties but are immutable:

f[1] ="pineapple"
print(f)
['orange', 'pineapple', 'banana', 'tomato']
g = ("R" , "Java" , "C++")
print(g)
('R', 'Java', 'C++')
type(g)
tuple
g[1] = "perl"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-867cd48ef690> in <module>
----> 1 g[1] = "perl"

TypeError: 'tuple' object does not support item assignment
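Since a tuple cannot be modified in place, the usual workaround is to build a new tuple. The sketch below shows two possible ways of doing so; the names h and as_list are illustrative.

```python
g = ("R", "Java", "C++")

# Option 1: build a new tuple from slices of the old one.
h = g[:1] + ("perl",) + g[2:]
print(h)  # ('R', 'perl', 'C++')
print(g)  # ('R', 'Java', 'C++') (the original is unchanged)

# Option 2: convert to a list, modify it, and convert back.
as_list = list(g)
as_list[1] = "perl"
print(tuple(as_list))  # ('R', 'perl', 'C++')
```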

List comprehension

The most common way to traverse the elements of a list is with a for loop. The syntax is the same as for strings:

for i in f: 
    print(i)
orange
pineapple
banana
tomato
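A list comprehension is a compact way of writing such a loop when the goal is to build a new list. The following sketch compares the two forms; the list is redefined so that the example is self-contained, and the names upper_loop, upper_comp and long_names are illustrative.

```python
f = ["orange", "pineapple", "banana", "tomato"]

# Explicit loop: build a list of upper-case names.
upper_loop = []
for fruit in f:
    upper_loop.append(fruit.upper())

# The same result written as a list comprehension.
upper_comp = [fruit.upper() for fruit in f]

# A comprehension can also filter elements with an if clause.
long_names = [fruit for fruit in f if len(fruit) > 6]

print(upper_comp)  # ['ORANGE', 'PINEAPPLE', 'BANANA', 'TOMATO']
print(long_names)  # ['pineapple']
```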

Functions#

Functions are crucial to your data science workflow. I suggest you brush up on them#

A function is a named sequence of statements that performs a computation. When you define a function, you specify the name and the sequence of statements. Later, you can “call” the function by name.

def our_addition(x, y):
    """This function takes two parameters, adds them and returns the result"""
    z = x + y
    return z
help(our_addition)
Help on function our_addition in module __main__:

our_addition(x, y)
    This function takes two parameters, adds them and returns the result
our_addition( 5 , 6)
11
def add_two(s):
    """This function takes one parameter, preferably a string,
    and appends the string '_two' to it"""
    t = s + '_two'
    return t
add_two('forty')
'forty_two'
add_two('fifty')
'fifty_two'
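A parameter can be given a default value, which is used whenever the caller does not supply that argument. The function add_suffix below is a hypothetical generalization of add_two, written only to illustrate this.

```python
def add_suffix(s, suffix="_two"):
    """Return the string s followed by suffix (default: '_two')."""
    return s + suffix


print(add_suffix("forty"))                  # forty_two
print(add_suffix("forty", "_three"))        # forty_three
print(add_suffix("fifty", suffix="_five"))  # fifty_five
```

The last call passes the argument by name, which often makes a call easier to read.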

Modules and Packages#

A module is a file containing Python statements and definitions.

• Modules allow us to write code, and separate it out from other code into a different file.

• We can use the import statement to include the code of another file

To import a module in Python we type the following in the Python prompt.

\[\texttt{import module\_name}\]

Examples

import math  # the math module
import re  # the regular expression module
import random  # the random module
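Once a module is imported, its functions are reached with the dot notation module.function. The sketch below calls one function from each of the three modules; the sample sentence and the variable name draw are illustrative.

```python
import math
import random
import re

print(math.sqrt(16))  # 4.0

# Find every run of digits in a string.
print(re.findall(r"\d+", "3 apples and 12 oranges"))  # ['3', '12']

random.seed(0)               # fix the seed so the draw is repeatable
draw = random.randint(1, 6)  # an integer between 1 and 6, both included
print(draw)
```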

However, some module names are relatively long to type. To save on typing, you can import a module under a shorter name (an alias), as follows.

\[\texttt{import module as new\_name}\]
import numpy as np
import pandas as pd
import sklearn as sk
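The alias is then used in place of the full module name. It is also possible to import only selected names from a module with the from ... import form. The sketch below uses standard-library modules so that it runs without any extra installation; the alias st is an arbitrary choice.

```python
import statistics as st    # alias for a standard-library module
from math import pi, sqrt  # import selected names directly

values = [2, 9, 14, 12.5]

print(st.mean(values))  # 9.375
print(sqrt(16))         # 4.0
print(round(pi, 2))     # 3.14
```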

Some of the libraries#

Numpy

The numpy library is designed for working with arrays and matrix operations.

import numpy as np
# A vector of 10 evenly spaced integers, from 0 to 9 in steps of 1
n1 = np.arange(10)  
n1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# A vector of 100 evenly spaced numbers between 0 and 50
my_vector = np.linspace(0, 50 , 100)
my_vector
array([ 0.        ,  0.50505051,  1.01010101,  1.51515152,  2.02020202,
        2.52525253,  3.03030303,  3.53535354,  4.04040404,  4.54545455,
        5.05050505,  5.55555556,  6.06060606,  6.56565657,  7.07070707,
        7.57575758,  8.08080808,  8.58585859,  9.09090909,  9.5959596 ,
       10.1010101 , 10.60606061, 11.11111111, 11.61616162, 12.12121212,
       12.62626263, 13.13131313, 13.63636364, 14.14141414, 14.64646465,
       15.15151515, 15.65656566, 16.16161616, 16.66666667, 17.17171717,
       17.67676768, 18.18181818, 18.68686869, 19.19191919, 19.6969697 ,
       20.2020202 , 20.70707071, 21.21212121, 21.71717172, 22.22222222,
       22.72727273, 23.23232323, 23.73737374, 24.24242424, 24.74747475,
       25.25252525, 25.75757576, 26.26262626, 26.76767677, 27.27272727,
       27.77777778, 28.28282828, 28.78787879, 29.29292929, 29.7979798 ,
       30.3030303 , 30.80808081, 31.31313131, 31.81818182, 32.32323232,
       32.82828283, 33.33333333, 33.83838384, 34.34343434, 34.84848485,
       35.35353535, 35.85858586, 36.36363636, 36.86868687, 37.37373737,
       37.87878788, 38.38383838, 38.88888889, 39.39393939, 39.8989899 ,
       40.4040404 , 40.90909091, 41.41414141, 41.91919192, 42.42424242,
       42.92929293, 43.43434343, 43.93939394, 44.44444444, 44.94949495,
       45.45454545, 45.95959596, 46.46464646, 46.96969697, 47.47474747,
       47.97979798, 48.48484848, 48.98989899, 49.49494949, 50.        ])
# A vector of 10 random integers from 25 (included) to 45 (excluded)
another_vector = np.random.randint(25,45,10)
another_vector
array([41, 25, 33, 38, 30, 44, 29, 26, 42, 38])
# A vector of 6 ones
np.ones(6)
array([1., 1., 1., 1., 1., 1.])
# A vector of 4 zeros
np.zeros(4)
array([0., 0., 0., 0.])
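What makes arrays convenient is that arithmetic applies to every element at once, without writing a loop. The following sketch assumes numpy is installed; the names doubled, squares, total and average are illustrative.

```python
import numpy as np

n1 = np.arange(10)  # the integers 0 to 9

doubled = n1 * 2     # multiply every element by 2
squares = n1 ** 2    # square every element
total = n1.sum()     # sum of all elements
average = n1.mean()  # arithmetic mean

print(doubled.tolist())  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(squares.tolist())  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(total)             # 45
print(average)           # 4.5
```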

To access the elements of an array, we use the same indexing as for lists:

random_array= np.random.rand(5)
random_array
array([0.04298399, 0.55089829, 0.88139508, 0.83641585, 0.48454918])
#the first element
random_array[0]
0.04298399048746615
#the last element
random_array[-1]
0.48454917509949924
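Slices work on arrays as they do on lists, and an array can additionally be indexed with a condition (a boolean mask). The sketch below uses a small fixed array rather than random numbers so that the results are predictable; the variable names are illustrative.

```python
import numpy as np

values = np.array([48, 25, 23, 15, 10, 5])

first_three = values[:3]        # elements at indices 0, 1, 2
every_other = values[::2]       # every second element
above_20 = values[values > 20]  # boolean mask: keep elements greater than 20

print(first_three.tolist())  # [48, 25, 23]
print(every_other.tolist())  # [48, 23, 10]
print(above_20.tolist())     # [48, 25, 23]
```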

To reshape an array into a matrix:

vector_integers = np.random.randint(0,50,12)
vector_integers
array([48, 25, 23, 15, 10,  5, 21,  7, 30, 23, 12,  7])
vector_integers.reshape(4,3)
array([[48, 25, 23],
       [15, 10,  5],
       [21,  7, 30],
       [23, 12,  7]])
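Note that reshape returns a new array object and leaves the shape of the original unchanged, and that the new shape must hold exactly the same number of elements. A small sketch, assuming numpy is installed and using illustrative names v and m:

```python
import numpy as np

v = np.arange(12)    # 12 elements: 0 to 11
m = v.reshape(4, 3)  # 4 rows, 3 columns

print(v.shape)  # (12,)
print(m.shape)  # (4, 3)
print(m[1, 2])  # row 1, column 2 -> 5

# Passing -1 lets numpy infer that dimension from the number of elements.
print(v.reshape(2, -1).shape)  # (2, 6)
```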

Customizing functions#

Let’s make it more fun: Let’s create our own functions

Task : #

  1. Go back to the Jupyter Dashboard and click on New, then Text File

  2. Rename that file happiness.py

  3. Copy and paste the two functions we created above (our_addition and add_two).

  4. Save the file and close it.

  5. In the cell below, try to run import happiness as hp

  6. Try to use the our_addition function through hp. What does that teach you?

import happiness as hp
hp.our_addition(23,4)
27
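The steps of the task can also be reproduced entirely from code, which shows what an import really does: Python searches the folders listed in sys.path for a file named happiness.py and runs it. The sketch below is only an illustration; it writes the file to a temporary folder (a stand-in for your notebook folder) instead of using the Jupyter Dashboard.

```python
import importlib
import sys
import tempfile
from pathlib import Path

source = '''
def our_addition(x, y):
    """Return the sum of x and y."""
    return x + y


def add_two(s):
    """Return the string s followed by '_two'."""
    return s + "_two"
'''

# Write the module into a temporary folder (stand-in for the notebook folder).
folder = tempfile.mkdtemp()
Path(folder, "happiness.py").write_text(source)

# Python looks for modules in the folders listed in sys.path.
sys.path.insert(0, folder)
importlib.invalidate_caches()

import happiness as hp

print(hp.our_addition(23, 4))  # 27
print(hp.add_two("forty"))     # forty_two
```

In a notebook, the current folder is already searched for modules, so saving happiness.py next to the notebook is enough.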

If you need to revise your Python skills, I suggest you use this platform

Or

We have a set of videos, practicals and slides for you via this link

from IPython.display import Image
Image('img/courses_materials_python.png', width=500)