This tutorial was heavily influenced by two sources:
[ back to top ]
A variable is a name used to reference a value for later use. Think of a variable as a name pointing to some object. Assigning a value to a variables in Python is done with the assignment operator (=). For example, the following line assigns the value 2
to the the variable x
.
x = 2
# Comments begin with the pound sign and continue to the end of the line
# The built-in print function will display a value on the screen
print(x)
2
Assigning variables to values is the primary way that data is stored and manipulated in Python code.
Variable names must start with a letter and can contain letters, numbers, and the underscore character ( _
). The usual convention for variable names is to use lowercase letters with underscores to seperate words (e.g. my_favorite_number = 2
). In addition, there are a number of keywords in Python that are used by the interpreter. So you'll want to avoid using these keywords as variable names. Namely,
False, None, True, and, as, assert, break, class, continue, def, del, elif, else,
except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or,
pass, raise, return, try, while, with, yield
are keywords to avoid. Now that we know variables are names that reference a value, let's look at some of the types of values that we have in Python.
[ back to top ]
Every value in Python has a type associated with it (NOTE: If you ever want to know the type of something, you can use the built-in type
function)
type(x)
int
Different types have different allowed operations and functionality. For example, as we'll discuss later, the addition operator is defined for integer types
a = 1
b = 2
print(type(a))
print(type(b))
# Use addition operator between two integers
print(a + b)
<class 'int'> <class 'int'> 3
and an uppercase method is defined for string types
c = 'Madison, WI'
print(type(c))
# Use the string uppercase method to make every character uppercase
print(c.upper())
<class 'str'> MADISON, WI
The most-commonly used built-in types in Python can be grouped into two categories: simple and compound.
[ back to top ]
The "simple" types consist of integers (int
), floating-point numbers (float
), complex numbers (complex
), boolean values (bool
), and the None type (NoneType
).
Integers represent whole numbers ( ...,-2, -1, 0, 1, 2,...). Numbers without a decimal point or exponential notation produce integers.
a = 2
print(type(a))
<class 'int'>
Floating-point numbers (often called "floats") represent real numbers. Number containing a decimal point or exponential notation are used to define floating-point values.
b = 1.5
print(type(b))
<class 'float'>
b.as_integer_ratio()
(3, 2)
Exponential notation (e
or E
) is shorthand for scientific notation. E.g. 7e3
= $7 \times 10^3$
c = 7e3
print(c)
print(type(c))
7000.0 <class 'float'>
A complex number can be created by including a 'j' or 'J' in a number. The corresponding real and imaginary parts of the complex number can be accessed using real
and imag
attributes.
z = 7 + 4.3j
print(type(z))
<class 'complex'>
z.real
7.0
z.imag
4.3
Note that the real and imaginary parts for the complex
type are floating-point numbers—regardless of whether or not there is a decimal point.
print(type(z.real))
<class 'float'>
Booleans can take on one of two possible values: True
or False
. Booleans will be utilized later when we discuss the conditional statements in Python.
n = True
print(type(n))
<class 'bool'>
p = False
print(type(p))
<class 'bool'>
Note that bool
values are case sensitive—the first letter needs to be capitalized.
The NoneType
type represents just a single value: None
. None
is commonly used to represent un-initialized values. In addition, functions that don't have an explicit return value will implicity return None
.
z = None
print(type(z))
<class 'NoneType'>
[ back to top ]
In addition to int
, float
, complex
, bool
, and NoneType
, Python also has several built-in data structures that are used as containers for other types. These "compound" types consist of lists (list
), tuples (tuple
), strings (str
), sets (set
), and dictionaries (dict
).
A list is a ordered, mutable collection of data (data elements are called "items"). We'll discuss mutable vs. immutable objects momentarily. Lists are constructed using square brackets with list items seperated by commas.
d = [2.3, 5, -43, 74.7, 5]
print(type(d))
<class 'list'>
print(d)
[2.3, 5, -43, 74.7, 5]
Lists have lots of built-in functionality. For example, You can use the built-in len
function to get the number of items in a list
len(d)
5
The list append
method can be used to add elements to a list
d.append(3.1415)
print(d)
[2.3, 5, -43, 74.7, 5, 3.1415]
The list sort
method will sort the items in a list into ascending order
d.sort()
print(d)
[-43, 2.3, 3.1415, 5, 5, 74.7]
The list reverse
method will reserve the order of a list
d.reverse()
print(d)
[74.7, 5, 5, 3.1415, 2.3, -43]
The list count
method counts the number of times an item occurs in a list
# Counts how many times the item 5 occurs in the list d
d.count(5)
2
Note that a list
can contain any type of object. The items in a list
need not be homogeneous. For example,
crazy_list = [1, False, 23.11, [1, 2, 3], None]
print(crazy_list)
[1, False, 23.11, [1, 2, 3], None]
The items in a list can be accessed using list indexing. Indexing a list consists of adding the item index in square brackets after a list. It's also important to note that in Python list indices begin with zero. So the first item in a list has the index 0
, the second item has the index 1
, and so on. For example
d = [2.3, 5, -43, 74.7, 5]
d[0]
2.3
d[1]
5
d[2]
-43
Python also supports negative indexing. This has the effect of staring from the end of the list. So the last item in a list has an idex of -1
, the second to last item has an index of -2
, and so on.
d[-1]
5
d[-2]
74.7
d[-3]
-43
In addition to indexing a list to get back list items, you can using slicing to get back a sub-list. The syntax for list slicing is given by
list_object[starting_index : stopping_index : index_step_size]
As an example we could use 0:3 which would give us all the elements starting with the zero index item and up to but not including the third index item.
print(d)
print(d[0:3]) # This will return the sub-list starting from the index 0 up to, but not including, the index 3
[2.3, 5, -43, 74.7, 5] [2.3, 5, -43]
By default, the starting index is 0
(at the beginning of the list), the stopping index corresponds to the last item, and the step size is 1
.
print(d)
print(d[:4])
print(d[1:])
print(d[::2])
[2.3, 5, -43, 74.7, 5] [2.3, 5, -43, 74.7] [5, -43, 74.7, 5] [2.3, -43, 5]
print(d[:-4])
[2.3]
Tuples are ordered, immutable collection of data. Tuples can be construced in a similar way as lists, but with parenthesis instead of square brackets.
f = (83.2, -4 ,5e7)
print(type(f))
<class 'tuple'>
One weird quirk is that a tuple with a single item needs to have a trailing comma, e.g.
f = (83.2,)
print(f)
print(type(f))
(83.2,) <class 'tuple'>
If this trailing comma is left out, then python will assume you don't actually want a tuple and will assign whatever the single item is in parenthesis to your variable. For example,
f = (83.2)
print(f)
print(type(f))
83.2 <class 'float'>
The number of items in a tuple can be found using the built-in len
function, and they support indexing and slicing similar to lists.
g = (1, 3.2, False, 222, None)
print(g)
print(len(g))
print(g[1:4])
(1, 3.2, False, 222, None) 5 (3.2, False, 222)
Up to this point, it may seem like lists and tuples aren't any different. They are both containers that can hold items, you can access the items with an index, etc. How are these things different? One of the main differences between the list
type and the tuple
type is that lists are mutable, while tuples are immutable. Once created, the value of a mutable object can be changed, while immutable objects cannot be changed once created. Let's look at an example.
Let's create a list
g = [1, 2, 3, 4]
Now let's modify the list in place. That is, let's try to change the items in the list without creating a whole new list.
g[0] = 99
print(g)
[99, 2, 3, 4]
As you can see, there wasn't a problem here. We assigned to the variable g
the list [1, 2, 3, 4]
, then modified the zeroth item in g
to be the number 99
. Let's try the same thing with a tuple
now.
g = (1, 2, 3, 4)
print(g)
(1, 2, 3, 4)
g[0] = 99
print(g)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-43-79deaff0a4bf> in <module>() ----> 1 g[0] = 99 2 print(g) TypeError: 'tuple' object does not support item assignment
We got this error because tuples are immutable—they can't be modified once they're created.
A set is an unordered collection with no duplicate elements. They are constructed with comma separated items in curly brackets, {}
.
s = {1, 2, 3, 4, 2, 2, 3, 1}
print(s)
{1, 2, 3, 4}
Set objects support mathematical operations like union, intersection, difference, and symmetric difference. Set unions are done with the |
operator or using the set union
method. For example,
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
print(s1 | s2)
print(s1.union(s2))
{1, 2, 3, 4, 5, 6} {1, 2, 3, 4, 5, 6}
Set intersections are done with the &
operator or using the set intersection
method. For example,
print(s1 & s2)
print(s1.intersection(s2))
{3, 4} {3, 4}
As always, the number of items in a set can be found using the len
function.
len(s1)
4
Strings are used to represent a sequence of characters. Strings can be created by enclosing characters in either single or double quotes.
g = 'pizza'
type(g)
str
h = "jamesbond007"
type(h)
str
Strings can also be index just like lists and tuples.
h[0]
'j'
h[-4]
'd'
h[3:6]
'esb'
You can also find out how many characters are in a string using the len()
function
len(h)
12
Dictionaries are unordered containers for key-value pairs. That is, dictionaries store a mapping for each key to an associated value. Dictionaries are created by placing comma-separated key-value pairs inside curly brackets {}
. For a key-value pair, the key and corresponding value are seperated by a colon, :
.
An example might help...
k = {'MJ': 23, 'longitude': -53.2, 'city': 'Tokyo'}
print(type(k))
<class 'dict'>
Here, the dictionary keys are 'key1', 'key2', and 'key3', with corresponding values of 23, -53.2, and 'Tokyo'. In a similar way to sequences, you can access the values in a dicionary by giving the corresponding key in square brackets.
k['MJ']
23
k['longitude']
-53.2
k['city']
'Tokyo'
The keys in a dictionary can be obtained by using the dictionary keys
method
k.keys()
dict_keys(['MJ', 'longitude', 'city'])
The values in a dictionary can be obtained by using the dictionary values
method
k.values()
dict_values([23, -53.2, 'Tokyo'])
The size of a dictionary (the number of key-value pairs it contains) can be found with the built-in len()
function.
len(k)
3
It is important to note that in the previous example all the keys were strings, but this doesn't have to be the case. The only restriction on keys is that they be hashable. This means that keys must be an immutable type. For example, the following is also an acceptable dictionary.
m = {-23: [1, 2, 3, 4], 'desk': 3.2, 7.12: (-3, 'bird')}
m[-23]
[1, 2, 3, 4]
m['desk']
3.2
m[7.12]
(-3, 'bird')
Let see what happens if I try to contruct a dictionary with a list (a mutable object) as a key
n = {[1, 2, 3]: 'WOO'}
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-65-484e2f71ab36> in <module>() ----> 1 n = {[1, 2, 3]: 'WOO'} TypeError: unhashable type: 'list'
Whoops lists are mutable! So remember to always use immutable objects for dictionary keys!
Other useful types can be found in the collections
module in the Python standard library.
Sometimes it can be useful to change the type of an object. In Python, this so-called "type-casting" can be accomplished using several built-in functions:
int()
—casts to integerfloat()
—casts to floatstr()
—casts to stringbool()
—casts to booleanLet's see it in action
Casting integers to floats is fairly straight-forward
a = float(2)
print(a)
print(type(a))
2.0 <class 'float'>
When casting a float to an integer, Python will round down to the nearest integer
b = int(78.81)
print(b)
print(type(b))
78 <class 'int'>
You can even cast a number to a string! (This effectively just returns the number in quotes)
c = str(-1324.1)
print(c)
print(type(c))
-1324.1 <class 'str'>
Things get a little less straight-forward when casting to bool
. Below are the bool
casting rules:
0
, 0.0
, 0+0j
, cast to False
. Everything else casts to True
.[]
, will cast to False
. Non-empty lists casts to True
.()
, will cast to False
. Non-empty tuples casts to True
.''
, will cast to False
. Non-empty strings casts to True
.{}
, casts to False
. Everything else casts to True
.False
.Here are some examples:
bool(0)
False
bool(-178.3)
True
bool([])
False
bool([1, 2, 3, 4, 5])
True
bool({})
False
bool({'key': 'value'})
True
The following operations are supported for several of the built-in types in Python:
+
-
*
/
//
**
Some examples with numerical types...
1+1
2
10-3
7
3*5.0
15.0
5/2
2.5
5//3
1
9.0**2
81.0
When performing arithmetic operations, the type of numbers does matter. According to the Python Software Foundation:
Python fully supports mixed arithmetic: when a binary arithmetic operator has operands of different numeric types, the operand with the 'narrower' type is widened to that of the other, where integer is narrower than floating point, which is narrower than complex.
So, when you have a arithmetic operation with mixed numeric types, say adding an int
and a float
, the result will have the 'widest' type of the two numbers, in this case float
. The convention is int
is the most narrow type, float
is wider than int
, and complex
is wider than float
.
Some of these arithmetic operations are even defined for compound types. For example, list addition
list1 = [1, 2, 3, 4]
list2 = [5, 6]
summed_list = list1 + list2
print(summed_list)
[1, 2, 3, 4, 5, 6]
tuple addition
tup1 = (1, 2, 3, 4)
tup2 = (5, 6)
summed_tuple = tup1 + tup2
print(summed_tuple)
(1, 2, 3, 4, 5, 6)
and string addition are all defined
'My name is ' + 'Alex'
'My name is Alex'
In addition to using arithmetic operations to combine objects, it's also using to compare the value of objects as well. The comparison operators defined in Python are:
>
(greater than)>=
(greater than or equal to)A boolean value of either True
or False
will be returned appropreiately from a comparison operator. For example,
2 == 2
True
1 > 0.5
True
1 < 0.5
False
Python also can handle comparing different types to one another. In particular, floats and integers are compared in a natural way
2 == 2.0
True
Multiple comparison can also be made at once.
a = 25
print(15 < a < 30) # Checks whether or not a is greater than 15 and less thatn 30
True
Boolean values can also be combined using the and
, or
, or not
keywords.
(a < 30) and (a > 15)
True
(a < 30) and (a == 25)
True
(a > 30) or (a == 25)
True
(a > 30) or (a < 15)
False
not (a == 25)
False
[ back to top ]
Up to this point, we've explored some of the built-in types in Python and how to store values (i.e. variables) for later use. Now we'll look at using these building blocks to make a dynamic program.
[ back to top ]
Conditional statements are used to execute a piece of code based on if some condition is met. Let's look at some examples.
If statements are used to execute a piece of code if a condition is met. The basic structure of an if statement is shown below.
if condition :
# indented code block here
If statements start with the keyword if
, followed by the condition to be evaluated, then the line is ended with a colon. The block of code to be evaluated if the condition is met should be indented below the if statement. The condition here should be some expression that is either a boolean value, or can be cast to a boolean value. Let's look at some examples.
condition = True
if condition:
print('Condition is True')
Condition is True
condition = False
if condition:
print('Condition is True')
a = 10
if a < 20:
print('Condition is True')
b = 10
print(b)
Condition is True 10
Multiple conditions can be combined into a more complex condition using the and
/ or
keywords.
if condition1 and condition2 :
#code evaluated if both conditions are True
if condition1 or condition2 :
#code evaluated if at least one of the conditions are True
For example,
b = 5
c = 15
if b < 10 and c < 20:
print('Both conditions are True')
Both conditions are True
The or
keywords requires that at least one of the conditions be True
. For example, below the first condition is True
, but the second is False
.
if b < 10 or c < 10:
print('At least one condition is True')
At least one condition is True
Sometimes more complicated situations can arise in which you would like to have, depending on if a condition is met, different pieces of code run. This leads us to the if-else statement. If-else statements consist of an if
statement followed by a piece of code that will be executed if the if-statement condition is not met.
if condition :
# code for True condition
else:
# code for False condition
b = 4
if b == 5:
print('b is 5')
else:
print('b is not 5')
b is not 5
Sometimes your might like to have many if statements you would like to check
value = 10
if value < 10:
print('Value was less than 10')
elif value > 10:
print('Value was greater than 10')
else:
print('Value neither less than or greater than 10')
Value neither less than or greater than 10
x = 0.
if x == 0:
print(x, "is zero")
elif x > 0:
print(x, "is positive")
elif x < 0:
print(x, "is negative")
else:
print(x, "is unlike anything I've ever seen...")
0.0 is zero
[ back to top ]
Looping in Python is done via for
-loops and while
-loops
The basic syntax of a for
loop is shown below.
for iterating_var in iterable:
# code using iterating_var here
We've run into several iterables already: lists, tuples, strings, and dictionaries.
for item in [0, 1, 2, 3, 4, 5]:
print(item)
0 1 2 3 4 5
for item in (False, 3.2, 'this is a string'):
print(item)
False 3.2 this is a string
for letter in 'Python':
print(letter)
P y t h o n
Built-in range
function.
for item in range(3, 20):
print(item)
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
tup = (False, 3.2, 'this is a string')
for index, value in enumerate(tup):
print(index, value)
0 False 1 3.2 2 this is a string
k = {'MJ': 23, 'longitude': -53.2, 'city': 'Tokyo'}
for key in k:
print(key, k[key])
MJ 23 longitude -53.2 city Tokyo
# If using Python 2, use k.iteritems() instead of k.items()
for key, value in k.items():
print(key, value)
MJ 23 longitude -53.2 city Tokyo
Loop over code until condition is evaluated to False
while condition:
# code using iterating_var here
n = 0
while n < 10:
print(n)
n = n + 1
0 1 2 3 4 5 6 7 8 9
One way to create a list
is to use the append
method inside a for
loop
squared_list = []
for i in range(5):
value = i**2
squared_list.append(value)
print(squared_list)
[0, 1, 4, 9, 16]
While this approach gets the job done, it's a little too verbose. Python has another, less verbose, syntax for creating lists—list comprehensions
squared_list = [i**2 for i in range(5)]
print(squared_list)
[0, 1, 4, 9, 16]
[ back to top ]
Functions allow for the consolidation of several pieces of code into a single reusable object. The basic syntax for defining a function is shown below.
def function_name(some_input):
# Code utilizing input goes here
return some_output
def add(value1, value2):
total = value1 + value2
return total
print(add(4, 5))
9
print(add(1e3, -200))
800.0
print(add('Python', 'rules'))
Pythonrules
def change_zeroth_item_to_3(parameter):
parameter[0] = 3
return parameter
change_zeroth_item_to_3([1, 2, 3, 4, 5])
[3, 2, 3, 4, 5]
[ back to top ]
The NumPy (Numeric Python) package provides efficient routines for manipulating large arrays and matrices of numeric data. It contains among other things:
numpy.ndarray
)By convention, NumPy is usually imported via
import numpy as np
The fundamental datastructure that NumPy gives us the the ndarray (usually just called "array"). According to the NumPy documentation
An array object represents a multidimensional, homogeneous array of fixed-size items. An associated data-type object describes the format of each element in the array (its byte-order, how many bytes it occupies in memory, whether it is an integer, a floating point number, or something else, etc.)
Generally, think of an array as an (efficient) Python list with additional functionality. BUT keep in mind that there are a few important differences to be aware of. For instance, array object are homogenous—the values in an array must all be of the same type. Because arrays are stored in an unbroken block of memory, they need to be fixed size. While NumPy does support appending to arrays, this can become problematic for very large arrays.
array = np.array([1, 2, 3, 4, 5, 6])
print(array)
print(type(array))
[1 2 3 4 5 6] <class 'numpy.ndarray'>
The datatype of an array can be found using the dtype
array attribute
print(array.dtype)
int64
If not specified, NumPy will try to determine what dtype
you wanted based on the context. However, you can also manually specify the dtype
yourself.
array = np.array([1, 2, 3, 4, 5, 6], dtype=float)
print(array)
print(array.dtype)
[ 1. 2. 3. 4. 5. 6.] float64
Array attributes reflect information that is intrinsic to the array itself. For example, it's shape, the number of items in the array, or (as we've already seen) the item data types
array = np.array([[1, 2, 3],[4, 5, 6]], dtype=float)
print(array)
[[ 1. 2. 3.] [ 4. 5. 6.]]
print(array.shape)
print(array.size)
print(array.dtype)
(2, 3) 6 float64
In addition to array attributes, ndarray
s also have many methods that can be used to operate on an array.
print(array.sum()) # Sum of the values in the array
print(array.min()) # Minimum value in the array
print(array.max()) # Maximum value in the array
print(array.mean()) # Mean of the values in the array
print(array.cumsum()) # Cumulative sum at each index in the array
print(array.std()) # Standard deviation of the values in array
21.0 1.0 6.0 3.5 [ 1. 3. 6. 10. 15. 21.] 1.70782512766
M = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(M)
[[1 2 3] [4 5 6] [7 8 9]]
M.T
array([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
M.diagonal()
array([1, 5, 9])
M.dot([1, 2, 3])
array([14, 32, 50])
M.trace()
15
To learn more about the motivation and need for something like Numpy, check out this great blog post Why Python is Slow: Looking Under the Hood.
array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])
print(array1 + array2)
[ 6 8 10 12]
2*array1
array([2, 4, 6, 8])
array1**2
array([ 1, 4, 9, 16])
Let's get an idea of how much NumPy speeds things up.
from IPython.display import Image
Image('./squares-list-creation.png')
Image('./sum-range.png')
[ back to top ]
import matplotlib.pyplot as plt
%matplotlib inline
Matplolib has several plotting capabilities. For example,
plot
— plotting x and y data pointserrorbar
— plotting x and y data points with errorbarshist
— plotting histogramshist2d
— plotting 2D histogramsmatshow
— display a matrixx = np.linspace(0, 4*np.pi, 100)
y = np.sin(x)
plt.plot(x, y);
x = np.linspace(0, 4*np.pi, 100)
y1 = np.sin(x)
y2 = np.cos(x)
fig, ax = plt.subplots()
ax.plot(x, y1)
ax.plot(x, y2)
plt.show()
fig, ax = plt.subplots()
ax.plot(x, y1, label='Sine')
ax.plot(x, y2, label='Cosine')
ax.legend()
plt.show()
fig, ax = plt.subplots()
ax.plot(x, y1, label='Sine')
ax.plot(x, y2, label='Cosine')
ax.set_xlabel('x')
ax.set_ylabel('f(x)')
ax.grid()
ax.legend(title='Functions')
plt.show()
fig, ax = plt.subplots()
ax.plot(x, y1, label='Sine')
ax.plot(x, y2, label='Cosine')
ax.fill_between(x, y2, y1, color='C2', alpha=0.25)
ax.set_xlabel('x')
ax.set_ylabel('f(x)')
ax.grid()
ax.legend(title='Functions')
plt.show()
[ back to top ]
I've discussed what you can do with Python, but not what you should do. Sometimes, two pieces of code can be written that accomplish the same goal but that take different amounts of time
These slides similar to Rob Morgan's Code complexity talk.
The simple one.
The cute one.
The smart one.
data = list(np.random.uniform(1, 10000, size=50000).astype(int))
%%timeit
sorted_data = merge_sort(data)
839 ms ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit -r 1
sorted_data = bubble_sort(data)
6min 11s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
%%timeit -r 1
sorted_data = insertion_sort(data)
4min 45s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
The three algorithms do the same thing to the same data, so what makes one faster than the other?
To answer this, let's look at the runtime as a function of the dataset size.
runtimes = {'MERGE': [], 'BUBBLE': [], 'INSERTION': [], 'SIZE': []}
for dataset_size in [10, 100, 1000, 10000]:
runtimes['SIZE'].append(dataset_size)
data = list(np.random.uniform(1, 10000, size=dataset_size).astype(int))
# Run and time insertion-sort
start = time.time()
sorted_data = insertion_sort(data.copy())
end = time.time()
runtimes['INSERTION'].append(end - start)
# Run and time bubble-sort
start = time.time()
sorted_data = bubble_sort(data.copy())
end = time.time()
runtimes['BUBBLE'].append(end - start)
# Run and time merge-sort
start = time.time()
sorted_data = merge_sort(data.copy())
end = time.time()
runtimes['MERGE'].append(end - start)
plot_runtimes(runtimes)
In computer science lingo, people use "Big O" notation to characterize the asymptotic behavior of an algorithm's efficiency.
As an example, let's revisit the bubble sort algorithm:
This is $\mathcal{O}(n^2)$ asymptotic behavior.
plot_runtimes(runtimes, annotate=True)
​ In your research, the number of times you will have to design an algorithm to sort a list will hopefully be zero. So why spend time talking about it?
The most common reason code runs slowly:
In the remainder of this talk, I'll show lots of little tricks for speeding up python
code, but overall the biggest factor in the efficiency of the code is the algorithmic design.
Let's take a look at how you can spot parts of your code that could use some TLC.
Let's say you have a function that for an input list returns the sum of the smallest n elements as the nth element in a new list.
def get_n_smallest(list_, n):
return sorted(list_)[0:n]
def get_sum(list_):
sum_ = 0
for element in list_:
sum_ += element
return sum_
def function(list_, return_output=False):
output_list = []
for n in range(len(list_)):
smallest_n = get_n_smallest(list_, n)
sum_smallest_n = get_sum(smallest_n)
output_list.append(sum_smallest_n)
if return_output:
return output_list
Let's generate some dummy data to run this function on
list_1 = list(np.random.uniform(1, 1000, size=500))
and try to spot the bottlenecks in this code when running on the list.
Profilers are your best friends.
%time
%timeit
%prun
%lprun
%%heat
%memit
%mprun
There are certainly more, but these are great.
%timeit function(list_1)
88.2 ms ± 2.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%time function(list_1)
CPU times: user 86.5 ms, sys: 1 ms, total: 87.5 ms Wall time: 85.9 ms
%prun function(list_1)
Image('./prun_output.png')
%memit function(list_1)
peak memory: 89.92 MiB, increment: 0.12 MiB
Next up is %mprun
%mprun
works on a file, not a cell, so we have to make a dummy script real quick.
%%file mprun_demo.py
import numpy as np
list_1 = list(np.random.uniform(1, 1000, size=500))
def get_n_smallest(list_, n):
return sorted(list_)[0:n]
def get_sum(list_):
sum = 0
for element in list_:
sum += element
return sum
def function(list_):
output_list = []
for n, element in enumerate(list_):
smallest_n = get_n_smallest(list_, n)
sum_smallest_n = get_sum(smallest_n)
output_list.append(sum_smallest_n)
Writing mprun_demo.py
from mprun_demo import function as demo_function
%mprun -f demo_function demo_function(list_1)
Image('./mprun_output.png')
%lprun -f function function(list_1)
Image('./lprun_output.png')
%%heat
import numpy as np
list_1 = list(np.random.uniform(1, 1000, size=500))
def get_n_smallest(list_, n):
return sorted(list_)[0:n]
def get_sum(list_):
sum = 0
for element in list_:
sum += element
return sum
def function(list_):
output_list = []
for n, element in enumerate(list_):
smallest_n = get_n_smallest(list_, n)
sum_smallest_n = get_sum(smallest_n)
output_list.append(sum_smallest_n)
function(list_1)
Image('./heatdemo.png')
If your code feels slow, profiling should be your first step.
Built-in profilers like %time
, %timeit
, %prun
, %lprun
, %%heat
, %memit
, and %mprun
can diagnose where your code is spending the most time and energy.
Re-designing the code to make these areas more algorithmically efficient is the best way to improve your code.