Chapter 3: Introduction to Python for Data Science


Data: The new fuel, The new electricity#

We live in a world that is drowning in data. As practitioners like to say, data is the new fuel, the new electricity.

But it does not make much sense to limit ourselves to merely having data, because data has always existed: from Johannes Kepler in Prague in the early 17th century, who tried to understand the movement of the planets from records of their motion around the sun, all the way to a medical doctor in Kenya who records information about a patient before applying any treatment.

Data has always been at the center of (applied) science. What makes it so attractive nowadays is the set of tools we now have to handle it.

As said previously, Python is one of them. The purpose of this section is to give the reader a glimpse of how to get the best out of Python to carry out a data science project efficiently.

Python: A tool among others#

Over the last couple of decades, Python has emerged as a first-class tool for scientific computing tasks, including the analysis and visualization of large datasets.

Though it was not specifically designed with data analysis or scientific computing in mind, it has grown into a tool of choice for data science projects. Data scientists stick to it mainly because of its wide community of users and its large, active ecosystem of third-party packages, such as:

numpy for manipulation of homogeneous array-based data,
pandas for manipulation of heterogeneous and labeled data,
scipy for common scientific computing tasks,
matplotlib for publication-quality visualizations,
scikit-learn for classic machine learning algorithms,
Jupyter Notebook for interactive execution and sharing of code.

Deep Learning Libraries: Core to Data Science#

With the recent advent of machine learning and deep learning and their spectacular success, large companies like Google and Facebook have put together educational resources and tools that allow practitioners to carry out data science projects.

Following a code-light philosophy for a concept-heavy field such as machine learning, the following libraries have been released:

TensorFlow (deep learning library with a Python interface, developed by Google)
PyTorch (deep learning library with a Python interface, developed by Facebook)

Most of the projects released by those companies make extensive use of these libraries, which are free and open source.

Other alternatives include Theano, Caffe, and Microsoft Azure Machine Learning (a cloud service rather than a library).

The basics of Python (refresher)#

Installing Python

Python can be downloaded from python.org. If you find the process cumbersome, we strongly recommend installing the Anaconda distribution, which already includes most of the libraries needed for data science.

Launching Python

There are basically three ways you can launch Python:

Either from the terminal by typing (not recommended for this course)

$ python

Or by launching Anaconda-navigator from the terminal and clicking on Jupyter Notebook (recommended for Windows Users and to some extent, Linux Users)

Or simply by opening it through a Jupyter Notebook by typing from the terminal (recommended for Linux Users):

$ jupyter notebook

Task: #

Make sure you are comfortable with opening Anaconda.

Try to navigate to the Desktop folder.

Create two folders called Data_Science and Network_Science.

At the end of this session, make sure all the practicals are saved in the Data_Science folder.

Data Structures#

In the core Python language, some features are more important for data analysis than others. In this chapter, you’ll look at the most essential of them, such as lists, strings, string functions, list comprehensions, and counters.

Values and types#

A value is one of the basic things a program works with, like a letter or a number. These values belong to different types:

10        # an integer
"apple"   # a string
7.5       # a floating-point number

7.5

Variables#

One of the most powerful features of Python is the ability to manipulate variables. A variable is a name that refers to a value. An assignment statement creates new variables and gives them values.

a = 4
b = " data science is cool"
pi = 3.1415926535897931
print(a)
print(b)
print(pi)

4
 data science is cool
3.141592653589793

type(a), type(b), type(pi)

(int, str, float)
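Because a variable is only a name that refers to a value, the same name can later refer to a value of another type, and built-in constructors can convert between types. The short sketch below illustrates this; the names `count`, `ratio` and `label` are ours and not part of the course material.

```python
# A name can be rebound to a value of a different type.
a = 4
type_before = type(a)      # int
a = "four"
type_after = type(a)       # str

# Built-in constructors convert between types.
count = int("10")          # the string "10" becomes the integer 10
ratio = float(count)       # the integer 10 becomes the float 10.0
label = str(7.5)           # the float 7.5 becomes the string "7.5"

print(type_before, type_after, count, ratio, label)
```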

List

A list is a sequence of values. Unlike a string, whose values are characters, the values in a list can be of any type. The values in a list are called elements or, sometimes, items.

d = [2, 9, 14, 12.5]
f = ["orange", "apple", "banana", "tomato"]
print(d, f)

[2, 9, 14, 12.5] ['orange', 'apple', 'banana', 'tomato']

To access elements in a list, we just use their index (starting from 0)

# the type of the object
type(f)

list

# the length of the list
len(d)

4

print(d[0], f[2])

2 banana
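Beyond a single index, lists also accept negative indices and slices. The sketch below reuses the two lists defined above; the names `last_fruit`, `first_two` and `middle` are only illustrative.

```python
d = [2, 9, 14, 12.5]
f = ["orange", "apple", "banana", "tomato"]

last_fruit = f[-1]   # negative indices count from the end
first_two = d[:2]    # from the start up to, but not including, index 2
middle = f[1:3]      # the elements at indices 1 and 2

print(last_fruit, first_two, middle)
```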

Lists are mutable, which means you can change their elements. Tuples are also sequences indexed in the same way, but they are immutable: once created, their elements cannot be changed.

f[1] = "pineapple"
print(f)

['orange', 'pineapple', 'banana', 'tomato']

g = ("R", "Java", "C++")
print(g)

('R', 'Java', 'C++')

type(g)

tuple

g[1] = "perl"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-867cd48ef690> in <module>
----> 1 g[1] = "perl"

TypeError: 'tuple' object does not support item assignment
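Since a tuple cannot be changed in place, one common workaround is to build a new tuple from a modified copy. The sketch below shows this, and also how the TypeError can be caught; the names `as_list`, `g_updated` and `assignment_failed` are ours.

```python
g = ("R", "Java", "C++")

# Copy the elements into a mutable list, change it, and build a new tuple.
as_list = list(g)
as_list[1] = "perl"
g_updated = tuple(as_list)

# Item assignment on the tuple itself raises TypeError, which can be caught.
try:
    g[1] = "perl"
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(g, g_updated, assignment_failed)
```

The original tuple `g` is left untouched; only the new tuple carries the change.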

List comprehension

The most common way to traverse the elements of a list is with a for loop. The syntax is the same as for strings:

for i in f:
    print(i)

orange
pineapple
banana
tomato
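A list comprehension expresses this kind of traversal in a single expression and returns a new list. Here is a minimal sketch using the same list of fruits; the variable names are only illustrative.

```python
f = ["orange", "pineapple", "banana", "tomato"]

# Build a new list by transforming each element.
upper_fruits = [fruit.upper() for fruit in f]

# An optional condition keeps only some of the elements.
long_fruits = [fruit for fruit in f if len(fruit) > 6]

# The equivalent explicit loop, for comparison.
lengths = []
for fruit in f:
    lengths.append(len(fruit))

print(upper_fruits, long_fruits, lengths)
```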

Functions#

Functions are crucial to your data science workflow. I suggest you brush up on them#

A function is a named sequence of statements that performs a computation. When you define a function, you specify the name and the sequence of statements. Later, you can “call” the function by name.

def our_addition(x, y):
    """Take two parameters, add them, and return the result."""
    z = x + y
    return z

help(our_addition)

Help on function our_addition in module __main__:

our_addition(x, y)
    Take two parameters, add them, and return the result.

our_addition(5, 6)

11

def add_two(s):
    """Take one parameter, preferably a string, and append the string '_two' to it."""
    t = s + '_two'
    return t

add_two('forty')

'forty_two'

add_two('fifty')

'fifty_two'
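Parameters can also be given default values, so that a caller may omit them or override them by name. The function `add_suffix` below is a hypothetical variation on `add_two`, written only to illustrate the idea.

```python
def add_suffix(s, suffix="_two"):
    """Return the string s with suffix appended (suffix defaults to '_two')."""
    return s + suffix

default_result = add_suffix("fifty")                   # uses the default suffix
custom_result = add_suffix("fifty", suffix="_three")   # overrides it by keyword

print(default_result, custom_result)
```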

Modules and Packages#

A module is a file containing Python statements and definitions.

• Modules allow us to write code, and separate it out from other code into a different file.

• We can use the import statement to include the code of another file

To import a module in Python we type the following in the Python prompt.

import module_name

Examples

import math    # the math module
import re      # the regular expression module
import random  # the random module

However, some module names are relatively long to type. To save on typing, we can import a module under a shorter name, as follows.

import module_name as new_name

import numpy as np
import pandas as pd
import sklearn as sk
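Once a module is imported, its contents are reached with the dot notation, and a single name can also be imported directly with `from ... import ...`. The sketch below uses only standard-library modules; the sample sentence and the list of package names are made up for the illustration.

```python
import math
import random
import re
from math import sqrt   # import a single name from a module

root = math.sqrt(16)    # call a function through the module name
same_root = sqrt(16)    # same function, imported directly

# Find every run of digits in a sentence.
digits = re.findall(r"\d+", "3 cats and 12 dogs")

random.seed(0)          # fix the seed so that runs are repeatable
pick = random.choice(["numpy", "pandas", "scipy"])

print(root, same_root, digits, pick)
```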

Some of the libraries#

Numpy

The numpy library is mainly used for operations on arrays and matrices.

import numpy as np

# A vector of 10 evenly spaced integers with a step of 1
n1 = np.arange(10)
n1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# A vector of 100 evenly spaced numbers between 0 and 50
my_vector = np.linspace(0, 50, 100)
my_vector

array([ 0. , 0.50505051, 1.01010101, 1.51515152, 2.02020202, 2.52525253, 3.03030303, 3.53535354, 4.04040404, 4.54545455, 5.05050505, 5.55555556, 6.06060606, 6.56565657, 7.07070707, 7.57575758, 8.08080808, 8.58585859, 9.09090909, 9.5959596 , 10.1010101 , 10.60606061, 11.11111111, 11.61616162, 12.12121212, 12.62626263, 13.13131313, 13.63636364, 14.14141414, 14.64646465, 15.15151515, 15.65656566, 16.16161616, 16.66666667, 17.17171717, 17.67676768, 18.18181818, 18.68686869, 19.19191919, 19.6969697 , 20.2020202 , 20.70707071, 21.21212121, 21.71717172, 22.22222222, 22.72727273, 23.23232323, 23.73737374, 24.24242424, 24.74747475, 25.25252525, 25.75757576, 26.26262626, 26.76767677, 27.27272727, 27.77777778, 28.28282828, 28.78787879, 29.29292929, 29.7979798 , 30.3030303 , 30.80808081, 31.31313131, 31.81818182, 32.32323232, 32.82828283, 33.33333333, 33.83838384, 34.34343434, 34.84848485, 35.35353535, 35.85858586, 36.36363636, 36.86868687, 37.37373737, 37.87878788, 38.38383838, 38.88888889, 39.39393939, 39.8989899 , 40.4040404 , 40.90909091, 41.41414141, 41.91919192, 42.42424242, 42.92929293, 43.43434343, 43.93939394, 44.44444444, 44.94949495, 45.45454545, 45.95959596, 46.46464646, 46.96969697, 47.47474747, 47.97979798, 48.48484848, 48.98989899, 49.49494949, 50. ])

# A vector of 10 random integers from 25 (inclusive) to 45 (exclusive)
another_vector = np.random.randint(25, 45, 10)
another_vector

array([41, 25, 33, 38, 30, 44, 29, 26, 42, 38])

# A vector of 6 ones
np.ones(6)

array([1., 1., 1., 1., 1., 1.])

# A vector of 4 zeros
np.zeros(4)

array([0., 0., 0., 0.])

To access an element of an array, we index it in the same way as a list.

random_array = np.random.rand(5)
random_array

array([0.04298399, 0.55089829, 0.88139508, 0.83641585, 0.48454918])

# the first element
random_array[0]

0.04298399048746615

# the last element
random_array[-1]

0.48454917509949924
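Arrays also support slicing, selection with a condition, and arithmetic applied to every element at once. The sketch below uses a fixed array rather than random numbers so that the results are reproducible; the variable names are ours.

```python
import numpy as np

values = np.arange(10)              # the integers 0 to 9

first_three = values[:3]            # slicing works as with lists
evens = values[values % 2 == 0]     # a boolean mask keeps the matching elements
doubled = values * 2                # arithmetic applies element by element
total = values.sum()                # sum of all the elements

print(first_three, evens, doubled, total)
```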

To reshape a one-dimensional array into a matrix:

vector_integers = np.random.randint(0, 50, 12)
vector_integers

array([48, 25, 23, 15, 10, 5, 21, 7, 30, 23, 12, 7])

vector_integers.reshape(4, 3)

array([[48, 25, 23],
       [15, 10,  5],
       [21,  7, 30],
       [23, 12,  7]])
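Reshaping only works when the new shape holds exactly the same number of elements, and one dimension can be left as -1 for NumPy to infer. Here is a small sketch on a fixed array of 12 elements; the names `matrix`, `auto` and `reshape_failed` are only illustrative.

```python
import numpy as np

vector = np.arange(12)            # 12 elements: 0 to 11
matrix = vector.reshape(4, 3)     # 4 rows, 3 columns
auto = vector.reshape(-1, 6)      # -1 lets NumPy infer the number of rows

print(matrix.shape, auto.shape)
print(matrix[1, 2])               # the element at row 1, column 2

# A shape that does not hold exactly 12 elements raises ValueError.
try:
    vector.reshape(5, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True
```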

Customizing functions#

Let’s make it more fun: Let’s create our own functions

Task : #

Go back to the Jupyter dashboard and click New, then Text File

Rename that file happiness.py

Copy and Paste the two functions we created above.

Save the file and close it.

In the cell below, try and run import happiness as hp

Try to use the our_addition function. What does that teach you?

import happiness as hp

hp.our_addition(23,4)

27
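The same steps can also be carried out from code, which makes explicit what an import needs: a .py file in a folder that Python searches. The sketch below writes happiness.py to a temporary folder (an assumption made so the example is self-contained; the task above saves it next to the notebook instead) and then imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The two functions from this chapter, as the text of a module.
source = '''
def our_addition(x, y):
    """Return the sum of x and y."""
    return x + y


def add_two(s):
    """Return the string s followed by '_two'."""
    return s + "_two"
'''

# Write the module to a temporary folder (assumption: any writable folder works).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "happiness.py").write_text(source)

# Make the folder importable, then import the module under a short alias.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import happiness as hp

print(hp.our_addition(23, 4))
print(hp.add_two("fifty"))
```

When the file sits in the same folder as the notebook, the `sys.path` step is not needed, because that folder is already searched.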

If you need to revise your Python skills, I suggest you use this platform

Or

We have a set of videos, practicals and slides for you via this link

from IPython.display import Image
Image('img/courses_materials_python.png', width=500)


By Rockefeller

© Copyright 2022.