Chapter 3: Introduction to Python for Data Science#
Data: The new fuel, The new electricity#
We are living in a world that’s drowning in data. As practitioners like to say, data is the new fuel, the new electricity.
But it would not make much sense to limit ourselves to merely having the data, because data has always existed: from Johannes Kepler in Prague in the early 17th century, who tried to understand the movement of the planets from records of their motion around the sun, all the way to a medical doctor in Kenya who records information about a patient before applying any treatment.
Data has always been at the center of (Applied) Science. What makes it so attractive nowadays is the set of tools we use to handle it.
As said previously, Python is one of them. The purpose of this section is to give the reader a glimpse of how to get the best out of Python to carry out a data science project efficiently.
Python: A tool among others#
Over the last couple of decades, Python has emerged as a first-class tool for scientific computing tasks, including the analysis and visualization of large datasets.
Though it was not specifically designed with data analysis or scientific computing in mind, it has grown into the language of choice for many data science projects.
Data scientists stick to it mainly because of its wide community of users and its large, active ecosystem of third-party packages such as:
• numpy for manipulation of homogeneous array-based data,
• pandas for manipulation of heterogeneous and labeled data,
• scipy for common scientific computing tasks,
• matplotlib for publication-quality visualizations,
• scikit-learn for classic machine learning algorithms,
• Jupyter Notebook for interactive execution and sharing of code.
Deep Learning Libraries: Core to Data Science#
With the recent advent of Machine Learning/Deep Learning and their spectacular success, giant companies like Google and Facebook have put together some amazing educational resources that allow practitioners to carry out their own data science projects.
They have also released libraries that follow a code-light philosophy, so that the concept-heavy ideas of Machine Learning can be expressed in a few lines of code:
• TensorFlow (Deep Learning library with a Python interface, backed by Google)
• PyTorch (Deep Learning library with a Python interface, backed by Facebook)
Most of the projects released by those companies make extensive use of those libraries, and they are free and open source!
There are other alternatives such as Theano, Caffe, and Microsoft Azure Machine Learning.
The basics of Python (refresher)#
Installing Python
Python can be easily downloaded from python.org. If you find that process cumbersome, we strongly recommend installing the Anaconda distribution, which already includes most of the libraries needed for data science.
Launching Python
There are basically three ways you can launch Python:
• From the terminal by typing (not recommended for this course):
$ python
• By launching Anaconda-navigator from the terminal and clicking on Jupyter Notebook (recommended for Windows users and, to some extent, Linux users).
• Or simply by opening a Jupyter Notebook directly, typing from the terminal (recommended for Linux users):
$ jupyter notebook
Task: #
• Make sure you are comfortable with opening Anaconda.
• Try to navigate to the Desktop folder.
• Create two folders called Data_Science and Network_Science.
• At the end of this session, make sure all the practicals are saved in the Data_Science folder.
Data Structures#
In the core Python language, some features are more important for data analysis than others. In this chapter, you’ll look at the most essential of them, such as lists, strings, string functions, data structures, list comprehensions, and counters.
Values and types#
A value is one of the basic things a program works with, like a letter or a number. These values belong to different types:
10 #is an integer
"apple" #is a string
7.5 #is a floating point
7.5
Variables#
One of the most powerful features of Python is the ability to manipulate variables. A variable is a name that refers to a value. An assignment statement creates new variables and gives them values.
a = 4
b = " data science is cool"
pi = 3.1415926535897931
print(a)
print(b)
print(pi)
4
data science is cool
3.141592653589793
type (a) , type(b) , type(pi)
(int, str, float)
List
A list is a sequence of values. In a string, the values are characters; in a list, they can be of any type. The values in a list are called elements or, sometimes, items.
d = [ 2 , 9, 14, 12.5]
f = ["orange" , "apple", "banana" , "tomato"]
print (d , f)
[2, 9, 14, 12.5] ['orange', 'apple', 'banana', 'tomato']
To access elements in a list, we just use their index (starting from 0)
#the type of the object
type(f)
list
#the length of the list
len(d)
4
print ( d[0] , f[2])
2 banana
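Beyond a single index, Python lists also accept negative indices and slices. The short sketch below reuses the lists d and f from above; the variable names last_number, first_two and every_other are only illustrative.

```python
# Same lists as in the text above.
d = [2, 9, 14, 12.5]
f = ["orange", "apple", "banana", "tomato"]

last_number = d[-1]    # negative indices count from the end of the list
first_two = f[:2]      # slice: indices 0 and 1 (the stop index is excluded)
every_other = f[::2]   # slice with a step of 2

print(last_number)     # 12.5
print(first_two)       # ['orange', 'apple']
print(every_other)     # ['orange', 'banana']
```

A slice always returns a new list, so the original list is left untouched.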
Lists are mutable, which means you can change an element in place. Tuples share most of the same properties but are immutable.
f[1] ="pineapple"
print(f)
['orange', 'pineapple', 'banana', 'tomato']
g = ("R" , "Java" , "C++")
print(g)
('R', 'Java', 'C++')
type(g)
tuple
g[1] = "perl"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-867cd48ef690> in <module>
----> 1 g[1] = "perl"
TypeError: 'tuple' object does not support item assignment
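The error above stops the program. As a minimal sketch, the failed assignment can be caught with try/except, and a modified copy of the tuple can be built by going through a list. The names outcome, as_list and h are only illustrative.

```python
g = ("R", "Java", "C++")

# Catch the error instead of letting it stop the program.
try:
    g[1] = "perl"
    outcome = "assignment succeeded"
except TypeError as error:
    outcome = f"assignment failed: {error}"

# One possible workaround: build a new tuple from a modified list copy.
as_list = list(g)
as_list[1] = "perl"
h = tuple(as_list)

print(outcome)
print(g)  # the original tuple is unchanged
print(h)  # the new tuple holds the replacement
```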
List comprehension
The most common way to traverse the elements of a list is with a for loop. The syntax is the same as for strings:
for i in f:
    print(i)
orange
pineapple
banana
tomato
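The loop above can often be written more compactly as a list comprehension, which builds a new list in a single expression. The sketch below also illustrates the counters mentioned earlier, using Counter from the standard collections module. The variable names are only illustrative.

```python
from collections import Counter

f = ["orange", "pineapple", "banana", "tomato"]

# List comprehension: apply an expression to every element.
upper_fruits = [fruit.upper() for fruit in f]

# A condition can filter elements: keep names longer than 6 characters.
long_names = [fruit for fruit in f if len(fruit) > 6]

# Counter tallies how often each element occurs (here: the letters of a word).
letter_counts = Counter("banana")

print(upper_fruits)                  # ['ORANGE', 'PINEAPPLE', 'BANANA', 'TOMATO']
print(long_names)                    # ['pineapple']
print(letter_counts.most_common(1))  # [('a', 3)]
```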
Functions#
Functions are crucial to your data science workflow. I suggest you brush up on them#
A function is a named sequence of statements that performs a computation. When you define a function, you specify the name and the sequence of statements. Later, you can “call” the function by name.
def our_addition(x, y):
    """This function takes two parameters, adds them and returns the result"""
    z = x + y
    return z
help(our_addition)
Help on function our_addition in module __main__:
our_addition(x, y)
    This function takes two parameters, adds them and returns the result
our_addition( 5 , 6)
11
def add_two(s):
    """This function takes one parameter, preferably a string,
    and chains it with the string _two"""
    t = s + '_two'
    return t
add_two('forty')
'forty_two'
add_two('fifty')
'fifty_two'
Modules and Packages#
A module is a file containing Python statements and definitions.
• Modules allow us to write code and keep it separate from other code, in a different file.
• We can use the import statement to include the code of another file.
To import a module in Python, we type the following at the Python prompt.
Examples
import math #the math module
import re #the regular expression module
import random #the random module
However, some module names might be relatively long to type. To save on typing, we can import a module under a shorter name as follows.
import numpy as np
import pandas as pd
import sklearn as sk
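Once a module is imported, its functions are reached with dot notation. Here is a small sketch using the three standard-library modules imported above. The input string and the list of fruits are made up for illustration.

```python
import math
import random
import re

# math: mathematical constants and functions.
root = math.sqrt(16)

# re: find every run of digits in a string.
numbers = re.findall(r"\d+", "3 apples and 12 oranges")

# random: seeding makes the "random" draw repeatable between runs.
random.seed(0)
pick = random.choice(["orange", "apple", "banana"])

print(root)     # 4.0
print(numbers)  # ['3', '12']
print(pick)     # one of the three fruits
```

The same dot notation applies to a renamed module: after import numpy as np, the functions are called as np.something.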
Some of the libraries#
Numpy
The numpy library is mainly meant for operations on arrays and matrices.
import numpy as np
# A vector of 10 evenly spaced numbers with step 1
n1 = np.arange(10)
n1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# A vector of 100 evenly spaced numbers between 0 and 50 (both included)
my_vector = np.linspace(0, 50 , 100)
my_vector
array([ 0. , 0.50505051, 1.01010101, 1.51515152, 2.02020202,
2.52525253, 3.03030303, 3.53535354, 4.04040404, 4.54545455,
5.05050505, 5.55555556, 6.06060606, 6.56565657, 7.07070707,
7.57575758, 8.08080808, 8.58585859, 9.09090909, 9.5959596 ,
10.1010101 , 10.60606061, 11.11111111, 11.61616162, 12.12121212,
12.62626263, 13.13131313, 13.63636364, 14.14141414, 14.64646465,
15.15151515, 15.65656566, 16.16161616, 16.66666667, 17.17171717,
17.67676768, 18.18181818, 18.68686869, 19.19191919, 19.6969697 ,
20.2020202 , 20.70707071, 21.21212121, 21.71717172, 22.22222222,
22.72727273, 23.23232323, 23.73737374, 24.24242424, 24.74747475,
25.25252525, 25.75757576, 26.26262626, 26.76767677, 27.27272727,
27.77777778, 28.28282828, 28.78787879, 29.29292929, 29.7979798 ,
30.3030303 , 30.80808081, 31.31313131, 31.81818182, 32.32323232,
32.82828283, 33.33333333, 33.83838384, 34.34343434, 34.84848485,
35.35353535, 35.85858586, 36.36363636, 36.86868687, 37.37373737,
37.87878788, 38.38383838, 38.88888889, 39.39393939, 39.8989899 ,
40.4040404 , 40.90909091, 41.41414141, 41.91919192, 42.42424242,
42.92929293, 43.43434343, 43.93939394, 44.44444444, 44.94949495,
45.45454545, 45.95959596, 46.46464646, 46.96969697, 47.47474747,
47.97979798, 48.48484848, 48.98989899, 49.49494949, 50. ])
# A vector of 10 random integers from 25 (included) to 45 (excluded)
another_vector = np.random.randint(25,45,10)
another_vector
array([41, 25, 33, 38, 30, 44, 29, 26, 42, 38])
# A vector of 6 ones
np.ones(6)
array([1., 1., 1., 1., 1., 1.])
# A vector of 4 zeros
np.zeros(4)
array([0., 0., 0., 0.])
To access the elements of an array, we use the same procedure as for lists
random_array= np.random.rand(5)
random_array
array([0.04298399, 0.55089829, 0.88139508, 0.83641585, 0.48454918])
#the first element
random_array[0]
0.04298399048746615
#the last element
random_array[-1]
0.48454917509949924
To reshape an array into a matrix (the number of elements must match: here 12 = 4 × 3)
vector_integers = np.random.randint(0,50,12)
vector_integers
array([48, 25, 23, 15, 10, 5, 21, 7, 30, 23, 12, 7])
vector_integers.reshape(4,3)
array([[48, 25, 23],
[15, 10, 5],
[21, 7, 30],
[23, 12, 7]])
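The same reshaping can be tried on a fixed vector so that the result is identical on every run. This is only a sketch: the names v and m are illustrative. Passing -1 asks NumPy to infer the missing dimension.

```python
import numpy as np

# A fixed vector (0 to 11) instead of random integers.
v = np.arange(12)

# Reshape into 4 rows; -1 lets NumPy work out that 3 columns are needed.
m = v.reshape(4, -1)

print(m.shape)        # (4, 3)
print(m[1, 2])        # row 1, column 2 -> 5
print(m.sum(axis=0))  # column sums -> [18 22 26]
print(v.shape)        # (12,): reshape did not modify v itself
```

Note that reshape returns a new view of the data: v keeps its original shape unless the result is assigned back to it.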
Customizing functions#
Let’s make it more fun: let’s put our own functions in a module of our own
Task : #
• Go back to the Jupyter Dashboard and click on New, then Text File.
• Rename that file happiness.py.
• Copy and paste the two functions we created above.
• Save the file and close it.
• In the cell below, try to run import happiness as hp.
• Try to use one of the functions, for example our_addition. What does that teach you?
import happiness as hp
hp.our_addition(23,4)
27
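For readers who want to reproduce the whole task from a single cell, here is a hedged sketch. It writes a stand-in happiness.py into a temporary folder (an assumption made so that the example is self-contained; in the task the file lives next to the notebook), makes that folder importable, and then uses the module.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the happiness.py file created in the task.
module_source = '''
def our_addition(x, y):
    """Return the sum of x and y."""
    return x + y


def add_two(s):
    """Return the string s followed by '_two'."""
    return s + "_two"
'''

# Write the module into a temporary folder and make that folder importable.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "happiness.py").write_text(module_source)
sys.path.insert(0, str(module_dir))

import happiness as hp

result = hp.our_addition(23, 4)
label = hp.add_two("forty")
print(result, label)  # 27 forty_two
```

The point of the exercise is that any .py file Python can find on its search path can be imported like a library, with its functions available behind the chosen alias.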
If you need to revise your Python skills, I suggest you use this platform. Alternatively, we have a set of videos, practicals and slides for you via this link.
from IPython.display import Image
Image('img/courses_materials_python.png', width=500)