Introduction to Python¶

Alex Pizzuto, UW-Madison¶

IceCube Bootcamp, 2020¶

*Heavily influenced by James Bourbeau's previous Bootcamp talks

Overview¶

A few words on programming
Variables in Python
Survey of some built-in types
- Simple types
- Compound types
Type casting
Arithmetic operations
Comparison operations
Control flow
- Conditional statements
- Loops
Functions
NumPy
Matplotlib
Other things to check out
Time permitting: Complexity

This tutorial was heavily influenced by two sources:

Python documentation page
A Whirlwind Tour of Python by Jake VanderPlas (O’Reilly). Copyright 2016 O’Reilly Media, Inc., 978-1-491-96465-1.

Variables in Python¶

A variable is a name used to reference a value for later use. Think of a variable as a name pointing to some object. Assigning a value to a variable in Python is done with the assignment operator (=). For example, the following line assigns the value 2 to the variable x.

In [2]:

x = 2

In [3]:

# Comments begin with the pound sign and continue to the end of the line
# The built-in print function will display a value on the screen
print(x)

Assigning variables to values is the primary way that data is stored and manipulated in Python code.

Variable names must start with a letter or an underscore and can contain letters, numbers, and the underscore character ( _ ). The usual convention for variable names is to use lowercase letters with underscores to separate words (e.g. my_favorite_number = 2). In addition, Python reserves a number of keywords for the interpreter, and these cannot be used as variable names. Namely,

False, None, True, and, as, assert, break, class, continue, def, del, elif, else,
except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or,
pass, raise, return, try, while, with, yield

are keywords to avoid. Now that we know variables are names that reference a value, let's look at some of the types of values that we have in Python.
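As a quick check of the naming rules above, here is a minimal sketch using the standard-library keyword module. The candidate names are made up for illustration, and the exact keyword list depends on the Python version you are running.

```python
import keyword

# keyword.kwlist holds the reserved words of the running Python version
print(keyword.kwlist)

# Hypothetical candidate names, chosen only for illustration
candidates = ['my_favorite_number', 'lambda', '2fast', '_hidden']

# A usable variable name is a valid identifier that is not a keyword
usable = {name: name.isidentifier() and not keyword.iskeyword(name)
          for name in candidates}
print(usable)
```

Here 'lambda' is rejected because it is a keyword, and '2fast' because it starts with a digit.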

Survey of some built-in types¶

Every value in Python has a type associated with it. (Note: if you ever want to know the type of something, you can use the built-in type function.)

In [4]:

type(x)

Out[4]:

int

Different types have different allowed operations and functionality. For example, as we'll discuss later, the addition operator is defined for integer types

In [5]:

a = 1
b = 2
print(type(a))
print(type(b))
# Use addition operator between two integers
print(a + b)

<class 'int'>
<class 'int'>
3

and an uppercase method is defined for string types

In [6]:

c = 'Madison, WI'
print(type(c))
# Use the string uppercase method to make every character uppercase
print(c.upper())

<class 'str'>
MADISON, WI

The most commonly used built-in types in Python can be grouped into two categories: simple and compound.

Simple types¶

The "simple" types consist of integers (int), floating-point numbers (float), complex numbers (complex), boolean values (bool), and the None type (NoneType).

Integers¶

Integers represent whole numbers ( ...,-2, -1, 0, 1, 2,...). Numbers without a decimal point or exponential notation produce integers.

In [7]:

a = 2
print(type(a))

<class 'int'>

Floating-point numbers¶

Floating-point numbers (often called "floats") represent real numbers. Numbers containing a decimal point or exponential notation are used to define floating-point values.

In [8]:

b = 1.5
print(type(b))

<class 'float'>

In [9]:

b.as_integer_ratio()

Out[9]:

(3, 2)

Exponential notation (e or E) is shorthand for scientific notation. E.g. 7e3 = $7 \times 10^3$

In [10]:

c = 7e3
print(c)
print(type(c))

7000.0
<class 'float'>

Complex numbers¶

A complex number can be created by including a 'j' or 'J' in a number. The corresponding real and imaginary parts of the complex number can be accessed using real and imag attributes.

In [11]:

z = 7 + 4.3j
print(type(z))

<class 'complex'>

In [12]:

z.real

Out[12]:

7.0

In [13]:

z.imag

Out[13]:

4.3

Note that the real and imaginary parts for the complex type are floating-point numbers—regardless of whether or not there is a decimal point.

In [14]:

print(type(z.real))

<class 'float'>

Booleans¶

Booleans can take on one of two possible values: True or False. Booleans will be utilized later when we discuss the conditional statements in Python.

In [15]:

n = True
print(type(n))

<class 'bool'>

In [16]:

p = False
print(type(p))

<class 'bool'>

Note that bool values are case sensitive—the first letter needs to be capitalized.

None type¶

The NoneType type represents just a single value: None. None is commonly used to represent uninitialized values. In addition, functions that don't have an explicit return value will implicitly return None.

In [17]:

z = None
print(type(z))

<class 'NoneType'>

Compound types¶

In addition to int, float, complex, bool, and NoneType, Python also has several built-in data structures that are used as containers for other types. These "compound" types consist of lists (list), tuples (tuple), strings (str), sets (set), and dictionaries (dict).

Lists¶

A list is an ordered, mutable collection of data (data elements are called "items"). We'll discuss mutable vs. immutable objects momentarily. Lists are constructed using square brackets with list items separated by commas.

In [18]:

d = [2.3, 5, -43, 74.7, 5]
print(type(d))

<class 'list'>

In [19]:

print(d)

[2.3, 5, -43, 74.7, 5]

Lists have lots of built-in functionality. For example, you can use the built-in len function to get the number of items in a list

In [20]:

len(d)

Out[20]:

5


The list append method can be used to add elements to a list

In [21]:

d.append(3.1415)
print(d)

[2.3, 5, -43, 74.7, 5, 3.1415]

The list sort method will sort the items in a list into ascending order

In [22]:

d.sort()
print(d)

[-43, 2.3, 3.1415, 5, 5, 74.7]

The list reverse method will reverse the order of a list

In [23]:

d.reverse()
print(d)

[74.7, 5, 5, 3.1415, 2.3, -43]

The list count method counts the number of times an item occurs in a list

In [24]:

# Counts how many times the item 5 occurs in the list d
d.count(5)

Out[24]:

2


Note that a list can contain any type of object. The items in a list need not be homogeneous. For example,

In [25]:

crazy_list = [1, False, 23.11, [1, 2, 3], None]
print(crazy_list)

[1, False, 23.11, [1, 2, 3], None]

The items in a list can be accessed using list indexing. Indexing a list consists of adding the item index in square brackets after a list. It's also important to note that in Python list indices begin with zero. So the first item in a list has the index 0, the second item has the index 1, and so on. For example

In [26]:

d = [2.3, 5, -43, 74.7, 5]

In [27]:

d[0]

Out[27]:

2.3

In [28]:

d[1]

Out[28]:

5


In [29]:

d[2]

Out[29]:

-43

Python also supports negative indexing. This has the effect of starting from the end of the list. So the last item in a list has an index of -1, the second to last item has an index of -2, and so on.

In [30]:

d[-1]

Out[30]:

5


In [31]:

d[-2]

Out[31]:

74.7

In [32]:

d[-3]

Out[32]:

-43

In addition to indexing a list to get back list items, you can use slicing to get back a sub-list. The syntax for list slicing is given by

list_object[starting_index : stopping_index : index_step_size]

As an example, we could use 0:3, which gives us all the items starting at index 0 up to, but not including, the item at index 3.

In [33]:

print(d)
print(d[0:3]) # This will return the sub-list starting from the index 0 up to, but not including, the index 3

[2.3, 5, -43, 74.7, 5]
[2.3, 5, -43]

By default, the starting index is 0 (the beginning of the list), the stopping index is the length of the list (so the slice runs through the last item), and the step size is 1.

In [34]:

print(d)
print(d[:4]) 
print(d[1:])
print(d[::2])

[2.3, 5, -43, 74.7, 5]
[2.3, 5, -43, 74.7]
[5, -43, 74.7, 5]
[2.3, -43, 5]

In [35]:

print(d[:-4])

[2.3]
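Negative values can be used in any of the three slice positions. The following is a small sketch reusing the list d from above; the variable names are only illustrative.

```python
d = [2.3, 5, -43, 74.7, 5]

reversed_copy = d[::-1]  # a step of -1 walks the list backwards
last_two = d[-2:]        # a negative start counts from the end
middle = d[1:-1]         # everything except the first and last items

print(reversed_copy)
print(last_two)
print(middle)
```

Each slice returns a new list and leaves d itself unchanged.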

Tuples¶

Tuples are ordered, immutable collections of data. Tuples can be constructed in a similar way as lists, but with parentheses instead of square brackets.

In [36]:

f = (83.2, -4 ,5e7)
print(type(f))

<class 'tuple'>

One weird quirk is that a tuple with a single item needs to have a trailing comma, e.g.

In [37]:

f = (83.2,)
print(f)
print(type(f))

(83.2,)
<class 'tuple'>

If this trailing comma is left out, then Python will assume you don't actually want a tuple and will assign whatever the single item in parentheses is to your variable. For example,

In [38]:

f = (83.2)
print(f)
print(type(f))

83.2
<class 'float'>

The number of items in a tuple can be found using the built-in len function, and they support indexing and slicing similar to lists.

In [39]:

g = (1, 3.2, False, 222, None)
print(g)
print(len(g))
print(g[1:4])

(1, 3.2, False, 222, None)
5
(3.2, False, 222)

Mutable vs. immutable objects¶

Up to this point, it may seem like lists and tuples aren't any different. They are both containers that can hold items, you can access the items with an index, etc. How are these things different? One of the main differences between the list type and the tuple type is that lists are mutable, while tuples are immutable. Once created, the value of a mutable object can be changed, while immutable objects cannot be changed once created. Let's look at an example.

Let's create a list

In [40]:

g = [1, 2, 3, 4]

Now let's modify the list in place. That is, let's try to change the items in the list without creating a whole new list.

In [41]:

g[0] = 99
print(g)

[99, 2, 3, 4]

As you can see, there wasn't a problem here. We assigned to the variable g the list [1, 2, 3, 4], then modified the zeroth item in g to be the number 99. Let's try the same thing with a tuple now.

In [42]:

g = (1, 2, 3, 4)
print(g)

(1, 2, 3, 4)

In [43]:

g[0] = 99
print(g)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-79deaff0a4bf> in <module>()
----> 1 g[0] = 99
      2 print(g)

TypeError: 'tuple' object does not support item assignment

We got this error because tuples are immutable—they can't be modified once they're created.
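Mutability matters most when two names point to the same object. The sketch below (with illustrative variable names) shows that modifying a list through one name is visible through the other, while a copy is unaffected.

```python
original = [1, 2, 3, 4]
alias = original          # a second name for the same list object
copied = original.copy()  # a new, independent list with the same items

alias[0] = 99             # modifies the shared list in place

print(original)           # the change made through alias shows up here
print(copied)             # the copy still holds the old values
print(alias is original)  # both names refer to one object
```

Because tuples cannot be modified in place, this kind of surprise cannot happen with them.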

Sets¶

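A set is an unordered collection with no duplicate elements. Sets are constructed with comma-separated items in curly brackets, {}. Note that empty curly brackets create an empty dictionary, not an empty set; use set() to create an empty set.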

In [44]:

s = {1, 2, 3, 4, 2, 2, 3, 1}
print(s)

{1, 2, 3, 4}

Set objects support mathematical operations like union, intersection, difference, and symmetric difference. Set unions are done with the | operator or using the set union method. For example,

In [45]:

s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
print(s1 | s2)
print(s1.union(s2))

{1, 2, 3, 4, 5, 6}
{1, 2, 3, 4, 5, 6}

Set intersections are done with the & operator or using the set intersection method. For example,

In [46]:

print(s1 & s2)
print(s1.intersection(s2))

{3, 4}
{3, 4}
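The remaining two operations mentioned above, difference and symmetric difference, follow the same pattern: an operator and an equivalent method. A short sketch with the same two sets:

```python
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}

# Difference: items in s1 that are not in s2
diff = s1 - s2
print(diff, s1.difference(s2))

# Symmetric difference: items in exactly one of the two sets
sym_diff = s1 ^ s2
print(sym_diff, s1.symmetric_difference(s2))
```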

As always, the number of items in a set can be found using the len function.

In [47]:

len(s1)

Out[47]:

4


Strings¶

Strings are used to represent a sequence of characters. Strings can be created by enclosing characters in either single or double quotes.

In [48]:

g = 'pizza'
type(g)

Out[48]:

str

In [49]:

h = "jamesbond007"
type(h)

Out[49]:

str

Strings can also be indexed and sliced just like lists and tuples.

In [50]:

h[0]

Out[50]:

'j'

In [51]:

h[-4]

Out[51]:

'd'

In [52]:

h[3:6]

Out[52]:

'esb'

You can also find out how many characters are in a string using the len() function

In [53]:

len(h)

Out[53]:

12


Dictionaries¶

Dictionaries are containers for key-value pairs. That is, dictionaries store a mapping from each key to an associated value. (Since Python 3.7, dictionaries preserve the order in which keys were inserted; in older versions they were unordered.) Dictionaries are created by placing comma-separated key-value pairs inside curly brackets {}. For a key-value pair, the key and corresponding value are separated by a colon, :.

An example might help...

In [54]:

k = {'MJ': 23, 'longitude': -53.2, 'city': 'Tokyo'}
print(type(k))

<class 'dict'>

Here, the dictionary keys are 'MJ', 'longitude', and 'city', with corresponding values of 23, -53.2, and 'Tokyo'. In a similar way to sequences, you can access the values in a dictionary by giving the corresponding key in square brackets.

In [55]:

k['MJ']

Out[55]:

23


In [56]:

k['longitude']

Out[56]:

-53.2

In [57]:

k['city']

Out[57]:

'Tokyo'

The keys in a dictionary can be obtained by using the dictionary keys method

In [58]:

k.keys()

Out[58]:

dict_keys(['MJ', 'longitude', 'city'])

The values in a dictionary can be obtained by using the dictionary values method

In [59]:

k.values()

Out[59]:

dict_values([23, -53.2, 'Tokyo'])

The size of a dictionary (the number of key-value pairs it contains) can be found with the built-in len() function.

In [60]:

len(k)

Out[60]:

3


It is important to note that in the previous example all the keys were strings, but this doesn't have to be the case. The only restriction on keys is that they be hashable. In practice, this means that keys should be immutable objects such as numbers, strings, or tuples of immutable items. For example, the following is also an acceptable dictionary.

In [61]:

m = {-23: [1, 2, 3, 4], 'desk': 3.2, 7.12: (-3, 'bird')}

In [62]:

m[-23]

Out[62]:

[1, 2, 3, 4]

In [63]:

m['desk']

Out[63]:

3.2

In [64]:

m[7.12]

Out[64]:

(-3, 'bird')

Let's see what happens if I try to construct a dictionary with a list (a mutable object) as a key

In [65]:

n = {[1, 2, 3]: 'WOO'}

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-65-484e2f71ab36> in <module>()
----> 1 n = {[1, 2, 3]: 'WOO'}

TypeError: unhashable type: 'list'

Whoops, lists are mutable! So remember to always use immutable objects for dictionary keys!
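If the natural key is a sequence, a tuple usually does the job. The sketch below uses made-up dictionaries (grid, n) to show a tuple key, a list converted to a tuple, and the one catch: a tuple that contains a list is still unhashable.

```python
# Hypothetical lookup table keyed by (row, column) tuples
grid = {(0, 0): 'origin', (0, 1): 'right'}
print(grid[(0, 1)])

# A list can be converted to a tuple before being used as a key
coords = [1, 2, 3]
n = {tuple(coords): 'WOO'}
print(n[(1, 2, 3)])

# A tuple that contains a list is immutable itself, but still unhashable
try:
    bad = {(1, [2, 3]): 'nope'}
    raised = False
except TypeError as err:
    raised = True
    print('TypeError:', err)
```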

Other types¶

Other useful types can be found in the collections module in the Python standard library.

namedtuple
Counter
OrderedDict
defaultdict
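To give a flavor of what these types offer, here is a brief sketch of three of them. The example data (the string, the word list, and the City record) are invented for illustration.

```python
from collections import Counter, defaultdict, namedtuple

# Counter tallies how often each item occurs
letter_counts = Counter('jamesbond007')
print(letter_counts['0'])

# defaultdict creates a default value the first time a key is accessed
groups = defaultdict(list)
for word in ['pizza', 'python', 'numpy']:
    groups[word[0]].append(word)
print(dict(groups))

# namedtuple is a tuple whose items can also be accessed by name
City = namedtuple('City', ['name', 'longitude'])
tokyo = City(name='Tokyo', longitude=-53.2)
print(tokyo.name, tokyo[1])
```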

Type casting¶

Sometimes it can be useful to change the type of an object. In Python, this so-called "type-casting" can be accomplished using several built-in functions:

int()—casts to integer
float()—casts to float
str()—casts to string
bool()—casts to boolean

Let's see it in action

Casting integers to floats is fairly straightforward

In [66]:

a = float(2)
print(a)
print(type(a))

2.0
<class 'float'>

When casting a float to an integer, Python discards the fractional part (it truncates toward zero, so positive numbers are rounded down)

In [67]:

b = int(78.81)
print(b)
print(type(b))

78
<class 'int'>
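The difference between truncating and rounding down only shows up for negative numbers. A small sketch comparing int() with math.floor and the built-in round:

```python
import math

truncated_pos = int(78.81)        # fractional part dropped
truncated_neg = int(-78.81)       # truncates toward zero, not downward
floored_neg = math.floor(-78.81)  # rounds down, toward negative infinity
rounded = round(78.81)            # rounds to the nearest integer

print(truncated_pos, truncated_neg, floored_neg, rounded)
```

This prints 78 -78 -79 79.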

You can even cast a number to a string! (This effectively just returns the number in quotes)

In [68]:

c = str(-1324.1)
print(c)
print(type(c))

-1324.1
<class 'str'>

Things get a little less straightforward when casting to bool. Below are the bool casting rules:

Numbers: Zero of any numeric type, for example, 0, 0.0, 0+0j, casts to False. Everything else casts to True.
Lists: An empty list, [], will cast to False. Non-empty lists cast to True.
Tuples: An empty tuple, (), will cast to False. Non-empty tuples cast to True.
Strings: An empty string, '', will cast to False. Non-empty strings cast to True.
Dictionaries: The empty dictionary, {}, casts to False. Everything else casts to True.
NoneType: None casts to False.

Here are some examples:

In [69]:

bool(0)

Out[69]:

False

In [70]:

bool(-178.3)

Out[70]:

True

In [71]:

bool([])

Out[71]:

False

In [72]:

bool([1, 2, 3, 4, 5])

Out[72]:

True

In [73]:

bool({})

Out[73]:

False

In [74]:

bool({'key': 'value'})

Out[74]:

True
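The rules for strings, tuples, and None can be checked the same way. The values in this sketch were picked to highlight two easy-to-miss cases: the string 'False' and the tuple (0,) are both non-empty, so they cast to True.

```python
values = ['', 'False', (), (0,), None, 0 + 0j]

# Cast each value to bool and show it next to the original
truthiness = [bool(value) for value in values]
for value, as_bool in zip(values, truthiness):
    print(repr(value), as_bool)
```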

Arithmetic operations¶

The following operations are supported for several of the built-in types in Python:

Addition: +
Subtraction: -
Multiplication: *
Division: /
Floored division: //
Exponentiation: **

Some examples with numerical types...

In [75]:

1+1

Out[75]:

2


In [76]:

10-3

Out[76]:

7


In [77]:

3*5.0

Out[77]:

15.0

In [78]:

5/2

Out[78]:

2.5

In [79]:

5//3

Out[79]:

1


In [80]:

9.0**2

Out[80]:

81.0

When performing arithmetic operations, the type of numbers does matter. According to the Python Software Foundation:

Python fully supports mixed arithmetic: when a binary arithmetic operator has operands of different numeric types, the operand with the 'narrower' type is widened to that of the other, where integer is narrower than floating point, which is narrower than complex.

So, when you have an arithmetic operation with mixed numeric types, say adding an int and a float, the result will have the 'widest' type of the two numbers, in this case float. The convention is that int is the narrowest type, float is wider than int, and complex is wider than float.
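The widening rule can be seen directly by printing the type of each result. This is a small sketch with illustrative variable names; note also that true division (/) returns a float even when both operands are integers.

```python
int_plus_float = 2 + 1.5             # int is widened to float
float_plus_complex = 1.5 + (2 + 3j)  # float is widened to complex
true_division = 4 / 2                # / always returns a float
floor_division = 7 // 2              # // on two ints stays an int
floor_with_float = 7.0 // 2          # a float operand gives a float result

for result in [int_plus_float, float_plus_complex, true_division,
               floor_division, floor_with_float]:
    print(result, type(result).__name__)
```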

Some of these arithmetic operations are even defined for compound types. For example, list addition

In [81]:

list1 = [1, 2, 3, 4]
list2 = [5, 6]
summed_list = list1 + list2
print(summed_list)

[1, 2, 3, 4, 5, 6]

tuple addition

In [82]:

tup1 = (1, 2, 3, 4)
tup2 = (5, 6)
summed_tuple = tup1 + tup2
print(summed_tuple)

(1, 2, 3, 4, 5, 6)

and string addition are all defined

In [2]:

'My name is ' + 'Alex'

Out[2]:

'My name is Alex'

Comparison operations¶

In addition to using arithmetic operations to combine objects, it's also useful to compare the values of objects. The comparison operators defined in Python are:

== (equal)
!= (not equal)
< (less than)
> (greater than)
<= (less than or equal to)
>= (greater than or equal to)

A comparison operator returns a boolean value, either True or False. For example,

In [84]:

2 == 2

Out[84]:

True

In [85]:

1 > 0.5

Out[85]:

True

In [86]:

1 < 0.5

Out[86]:

False

Python also can handle comparing different types to one another. In particular, floats and integers are compared in a natural way

In [87]:

2 == 2.0

Out[87]:

True

Multiple comparisons can also be made at once.

In [88]:

a = 25
print(15 < a < 30) # Checks whether or not a is greater than 15 and less than 30

True

Boolean values can also be combined using the and, or, or not keywords.

In [89]:

(a < 30) and (a > 15)

Out[89]:

True

In [90]:

(a < 30) and (a == 25)

Out[90]:

True

In [91]:

(a > 30) or (a == 25)

Out[91]:

True

In [92]:

(a > 30) or (a < 15)

Out[92]:

False

In [93]:

not (a == 25)

Out[93]:

False

Control flow¶

Up to this point, we've explored some of the built-in types in Python and how to store values (i.e. variables) for later use. Now we'll look at using these building blocks to make a dynamic program.

Conditional statements¶

Conditional statements are used to execute a piece of code depending on whether some condition is met. Let's look at some examples.

If statements¶

If statements are used to execute a piece of code if a condition is met. The basic structure of an if statement is shown below.

if condition:
    # indented code block here

If statements start with the keyword if, followed by the condition to be evaluated, then the line is ended with a colon. The block of code to be evaluated if the condition is met should be indented below the if statement. The condition here should be some expression that is either a boolean value, or can be cast to a boolean value. Let's look at some examples.

In [94]:

condition = True
if condition:
    print('Condition is True')

Condition is True

In [95]:

condition = False
if condition:
    print('Condition is True')

In [96]:

a = 10
if a < 20:
    print('Condition is True')
    b = 10
    print(b)

Condition is True
10

Multiple conditions can be combined into a more complex condition using the and / or keywords.

if condition1 and condition2:
    # code evaluated if both conditions are True

if condition1 or condition2:
    # code evaluated if at least one of the conditions is True

For example,

In [97]:

b = 5
c = 15
if b < 10 and c < 20:
    print('Both conditions are True')

Both conditions are True

The or keyword requires that at least one of the conditions be True. For example, below the first condition is True, but the second is False.

In [98]:

if b < 10 or c < 10:
    print('At least one condition is True')

At least one condition is True

If-else statements¶

Sometimes you want different pieces of code to run depending on whether a condition is met. This leads us to the if-else statement. An if-else statement consists of an if statement followed by an else block, which is executed when the if-statement condition is not met.

if condition:
    # code for True condition
else:
    # code for False condition

In [99]:

b = 4
if b == 5:
    print('b is 5')
else:
    print('b is not 5')

b is not 5

Elif statements¶

Sometimes you would like to check several conditions in sequence. An elif (short for "else if") clause is only checked if all of the conditions before it were False.

In [100]:

value = 10
if value < 10:
    print('Value was less than 10')
elif value > 10:
    print('Value was greater than 10')
else:
    print('Value was neither less than nor greater than 10')

Value was neither less than nor greater than 10

In [101]:

x = 0.
if x == 0:
    print(x, "is zero")
elif x > 0:
    print(x, "is positive")
elif x < 0:
    print(x, "is negative")
else:
    print(x, "is unlike anything I've ever seen...")

0.0 is zero

Loops¶

Looping in Python is done via for loops and while loops.

For loops¶

The basic syntax of a for loop is shown below.

for iterating_var in iterable:
    # code using iterating_var here

We've run into several iterables already: lists, tuples, strings, and dictionaries.

In [102]:

for item in [0, 1, 2, 3, 4, 5]:
    print(item)

0
1
2
3
4
5

In [103]:

for item in (False, 3.2, 'this is a string'):
    print(item)

False
3.2
this is a string

In [104]:

for letter in 'Python':
    print(letter)

P
y
t
h
o
n

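The built-in range function generates a sequence of integers from a starting value up to, but not including, a stopping value.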

In [105]:

for item in range(3, 20):
    print(item)

In [106]:

tup = (False, 3.2, 'this is a string')
for index, value in enumerate(tup):
    print(index, value)

0 False
1 3.2
2 this is a string

In [107]:

k = {'MJ': 23, 'longitude': -53.2, 'city': 'Tokyo'}
for key in k:
    print(key, k[key])

MJ 23
longitude -53.2
city Tokyo

In [108]:

# If using Python 2, use k.iteritems() instead of k.items()
for key, value in k.items():
    print(key, value)

MJ 23
longitude -53.2
city Tokyo

While loops¶

A while loop repeats a block of code until its condition evaluates to False.

while condition:
    # code to repeat while condition is True

In [109]:

n = 0
while n < 10:
    print(n)
    n = n + 1

0
1
2
3
4
5
6
7
8
9

List comprehension¶

One way to create a list is to use the append method inside a for loop

In [110]:

squared_list = []
for i in range(5):
    value = i**2
    squared_list.append(value)
print(squared_list)

[0, 1, 4, 9, 16]

While this approach gets the job done, it's a little verbose. Python has another, more compact syntax for creating lists: list comprehensions.

In [111]:

squared_list = [i**2 for i in range(5)]
print(squared_list)

[0, 1, 4, 9, 16]
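Comprehensions can also filter items with a trailing if clause, and the same idea works for dictionaries. A short sketch (the variable names are only illustrative):

```python
# Keep only the squares of even numbers
even_squares = [i**2 for i in range(10) if i % 2 == 0]
print(even_squares)

# A dictionary comprehension builds key-value pairs the same way
square_lookup = {i: i**2 for i in range(4)}
print(square_lookup)
```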

Functions¶

Functions allow for the consolidation of several pieces of code into a single reusable object. The basic syntax for defining a function is shown below.

def function_name(some_input):
    # Code utilizing input goes here
    return some_output

In [112]:

def add(value1, value2):
    total = value1 + value2
    return total

In [113]:

print(add(4, 5))

9

In [114]:

print(add(1e3, -200))

800.0

In [115]:

print(add('Python', 'rules'))

Pythonrules

In [116]:

def change_zeroth_item_to_3(parameter):
    parameter[0] = 3
    return parameter

In [117]:

change_zeroth_item_to_3([1, 2, 3, 4, 5])

Out[117]:

[3, 2, 3, 4, 5]
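The last example has a side effect worth spelling out: because lists are mutable, the function modifies the caller's list in place. The sketch below repeats the function and adds a hypothetical alternative, with_zeroth_item_3, that works on a copy instead.

```python
def change_zeroth_item_to_3(parameter):
    parameter[0] = 3
    return parameter

my_list = [1, 2, 3, 4, 5]
result = change_zeroth_item_to_3(my_list)
print(my_list)            # the caller's list was changed as well
print(result is my_list)  # the returned list is the very same object

def with_zeroth_item_3(parameter):
    # Hypothetical variant: copy first, so the input is left untouched
    new_list = list(parameter)
    new_list[0] = 3
    return new_list

other_list = [1, 2, 3]
changed = with_zeroth_item_3(other_list)
print(changed, other_list)
```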

NumPy¶

The NumPy (Numeric Python) package provides efficient routines for manipulating large arrays and matrices of numeric data. It contains among other things:

A powerful N-dimensional array object (numpy.ndarray)
Broadcasting functions
Useful linear algebra, Fourier transform, and random number capabilities

By convention, NumPy is usually imported via

In [118]:

import numpy as np

The fundamental data structure that NumPy gives us is the ndarray (usually just called "array"). According to the NumPy documentation:

An array object represents a multidimensional, homogeneous array of fixed-size items. An associated data-type object describes the format of each element in the array (its byte-order, how many bytes it occupies in memory, whether it is an integer, a floating point number, or something else, etc.)

Generally, think of an array as an (efficient) Python list with additional functionality. BUT keep in mind that there are a few important differences to be aware of. For instance, array objects are homogeneous: the values in an array must all be of the same type. Because arrays are stored in an unbroken block of memory, they have a fixed size. While NumPy does support appending to arrays (np.append), each append copies the data into a new array, which becomes slow for very large arrays.

In [119]:

array = np.array([1, 2, 3, 4, 5, 6])
print(array)
print(type(array))

[1 2 3 4 5 6]
<class 'numpy.ndarray'>

The datatype of an array can be found using the dtype array attribute

In [120]:

print(array.dtype)

int64

If not specified, NumPy will try to determine what dtype you wanted based on the context. However, you can also manually specify the dtype yourself.

In [121]:

array = np.array([1, 2, 3, 4, 5, 6], dtype=float)
print(array)
print(array.dtype)

[ 1.  2.  3.  4.  5.  6.]
float64

Array attributes reflect information that is intrinsic to the array itself. For example, its shape, the number of items in the array, or (as we've already seen) the item data type.

In [122]:

array = np.array([[1, 2, 3],[4, 5, 6]], dtype=float)
print(array)

[[ 1.  2.  3.]
 [ 4.  5.  6.]]

In [123]:

print(array.shape)
print(array.size)
print(array.dtype)

(2, 3)
6
float64

In addition to array attributes, ndarrays also have many methods that can be used to operate on an array.

In [124]:

print(array.sum()) # Sum of the values in the array
print(array.min()) # Minimum value in the array
print(array.max()) # Maximum value in the array
print(array.mean()) # Mean of the values in the array
print(array.cumsum()) # Cumulative sum at each index in the array
print(array.std()) # Standard deviation of the values in array

21.0
1.0
6.0
3.5
[  1.   3.   6.  10.  15.  21.]
1.70782512766

In [125]:

M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(M)

[[1 2 3]
 [4 5 6]
 [7 8 9]]

In [126]:

M.T

Out[126]:

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [127]:

M.diagonal()

Out[127]:

array([1, 5, 9])

In [128]:

M.dot([1, 2, 3])

Out[128]:

array([14, 32, 50])

In [129]:

M.trace()

Out[129]:

15


To learn more about the motivation and need for something like NumPy, check out this great blog post: Why Python is Slow: Looking Under the Hood.

In [130]:

array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])
print(array1 + array2)

[ 6  8 10 12]

In [131]:

2*array1

Out[131]:

array([2, 4, 6, 8])

In [132]:

array1**2

Out[132]:

array([ 1,  4,  9, 16])

Let's get an idea of how much NumPy speeds things up.

In [36]:

from IPython.display import Image
Image('./squares-list-creation.png')

Out[36]:

In [37]:

Image('./sum-range.png')

Out[37]:

Matplotlib¶

In [135]:

import matplotlib.pyplot as plt
%matplotlib inline

Matplotlib has several plotting capabilities. For example,

plot — plotting x and y data points
errorbar — plotting x and y data points with errorbars
hist — plotting histograms
hist2d — plotting 2D histograms
matshow — display a matrix
etc...

In [136]:

x = np.linspace(0, 4*np.pi, 100)
y = np.sin(x)

In [137]:

plt.plot(x, y);

In [138]:

x = np.linspace(0, 4*np.pi, 100)
y1 = np.sin(x)
y2 = np.cos(x)

In [139]:

fig, ax = plt.subplots()
ax.plot(x, y1)
ax.plot(x, y2)
plt.show()

In [140]:

fig, ax = plt.subplots()
ax.plot(x, y1, label='Sine')
ax.plot(x, y2, label='Cosine')
ax.legend()
plt.show()

In [141]:

fig, ax = plt.subplots()
ax.plot(x, y1, label='Sine')
ax.plot(x, y2, label='Cosine')
ax.set_xlabel('x')
ax.set_ylabel('f(x)')
ax.grid()
ax.legend(title='Functions')
plt.show()

In [142]:

fig, ax = plt.subplots()
ax.plot(x, y1, label='Sine')
ax.plot(x, y2, label='Cosine')
ax.fill_between(x, y2, y1, color='C2', alpha=0.25)
ax.set_xlabel('x')
ax.set_ylabel('f(x)')
ax.grid()
ax.legend(title='Functions')
plt.show()

Other things to check out¶

Pandas: High-performance data analysis toolkit
Seaborn: Statistical data visualization using Matplotlib
Scikit-learn: Machine learning library
Jupyter: Documents that combine code, markdown documentation, images, etc. Ideal for documenting an analysis.

Code Complexity¶

I've discussed what you can do with Python, but not what you should do. Sometimes, two pieces of code can accomplish the same goal but take very different amounts of time.

These slides are similar to Rob Morgan's Code complexity talk.

Outline¶

Example: What do we mean by efficient?
Algorithm Design
Profiling Code

Example¶

Sorting a list of numbers¶

Believe it or not, there are dozens of different ways to sort a list of numbers. We'll look at three and compare their efficiencies.

Our Sorting Algorithms¶

Insertion Sort (the simple one)
Bubble Sort (the cute one)
Merge Sort (the smart one)

Insertion Sort¶

The simple one.

Start by iterating through an unsorted list
For each element,
1. traverse the list backwards until you find a smaller number
2. Put the element right after the found smaller number
Once you finish the iteration, the list will be sorted

[Figure: insertion sort illustration]
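The steps above can be sketched in a few lines. The talk's own insertion_sort is not shown, so this is only one possible implementation; it assumes the function returns a new sorted list rather than sorting in place.

```python
def insertion_sort(items):
    """Return a sorted copy of items (a sketch, not the talk's own code)."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Traverse backwards, shifting larger items one slot to the right
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        # Place the element right after the first item that is not larger
        result[j + 1] = current
    return result

example = insertion_sort([5, 2, 9, 1, 5, 6])
print(example)
```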

Bubble Sort¶

The cute one.

Iterate through the unsorted list backwards and track the smallest element
Place this as the first element
Repeat step 1
Place the result as the second element
...

[Figure: bubble sort illustration]
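One way to turn this description into code is sketched below. Again, the talk's own bubble_sort is not shown; this version assumes that "tracking the smallest element" is done by swapping adjacent items during a backwards pass, so the smallest remaining item bubbles to the front.

```python
def bubble_sort(items):
    """Return a sorted copy of items (a sketch, not the talk's own code)."""
    result = list(items)
    n = len(result)
    for start in range(n - 1):
        # Walk backwards from the end; swapping adjacent out-of-order items
        # carries the smallest remaining element down to position `start`
        for j in range(n - 1, start, -1):
            if result[j] < result[j - 1]:
                result[j], result[j - 1] = result[j - 1], result[j]
    return result

example = bubble_sort([5, 2, 9, 1])
print(example)
```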

Merge Sort¶

The smart one.

Split an unsorted list in half
For each half, split in half again and repeat this process
Then, to merge two adjacent halves, iterate through them simultaneously and place the elements in order

[Figure: merge sort illustration]
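A recursive sketch of these steps follows. As with the other two algorithms, the talk's own merge_sort is not shown, so the details here (returning a new list, splitting at the midpoint) are assumptions.

```python
def merge_sort(items):
    """Return a sorted copy of items (a sketch, not the talk's own code)."""
    if len(items) <= 1:
        return list(items)

    # Split in half and sort each half recursively
    middle = len(items) // 2
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])

    # Merge: walk through both sorted halves simultaneously
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is exhausted; the rest of the other is already sorted
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

example = merge_sort([5, 2, 9, 1, 5, 6])
print(example)
```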

Sorting Algorithms, Start Your Engines!¶

In [9]:

data = list(np.random.uniform(1, 10000, size=50000).astype(int))

In [10]:

%%timeit
sorted_data = merge_sort(data)

839 ms ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [12]:

%%timeit -r 1
sorted_data = bubble_sort(data)

6min 11s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [14]:

%%timeit -r 1
sorted_data = insertion_sort(data)

4min 45s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

Why so different?¶

The three algorithms do the same thing to the same data, so what makes one faster than the other?

To answer this, let's look at the runtime as a function of the dataset size.

In [15]:

import time

runtimes = {'MERGE': [], 'BUBBLE': [], 'INSERTION': [], 'SIZE': []}

for dataset_size in [10, 100, 1000, 10000]:
    runtimes['SIZE'].append(dataset_size)
    data = list(np.random.uniform(1, 10000, size=dataset_size).astype(int))
    
    # Run and time insertion-sort
    start = time.time()
    sorted_data = insertion_sort(data.copy())
    end = time.time()
    runtimes['INSERTION'].append(end - start)
    
    # Run and time bubble-sort
    start = time.time()
    sorted_data = bubble_sort(data.copy())
    end = time.time()
    runtimes['BUBBLE'].append(end - start)
    
    # Run and time merge-sort
    start = time.time()
    sorted_data = merge_sort(data.copy())
    end = time.time()
    runtimes['MERGE'].append(end - start)

In [17]:

plot_runtimes(runtimes)

Computer Science Time¶

In computer science lingo, people use "Big O" notation to characterize the asymptotic behavior of an algorithm's efficiency.

As an example, let's revisit the bubble sort algorithm:

Iterate through the unsorted list backwards and track the smallest element
Place this as the first element
Repeat step 1
Place the result as the second element
...

This is $\mathcal{O}(n^2)$ asymptotic behavior.

It has to consider each of the $n$ elements, and for each element it has to compare to at most $n-1$ other elements.
$n \times (n-1) = n^2 - n \sim n^2$ for large $n$.
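The quadratic growth can be checked by counting comparisons directly. The sketch below assumes a simple bubble sort without early exit (the helper name count_bubble_comparisons is made up). This variant makes exactly $n(n-1)/2$ comparisons, half of the upper bound above, but the growth is still $\sim n^2$: ten times more data means roughly a hundred times more work.

```python
def count_bubble_comparisons(n):
    """Count the comparisons a simple bubble sort makes on n items."""
    data = list(range(n, 0, -1))  # reversed input, chosen for illustration
    comparisons = 0
    for start in range(n - 1):
        for j in range(n - 1, start, -1):
            comparisons += 1
            if data[j] < data[j - 1]:
                data[j], data[j - 1] = data[j - 1], data[j]
    return comparisons

for n in [10, 100, 1000]:
    # Compare the measured count with the closed form n(n-1)/2
    print(n, count_bubble_comparisons(n), n * (n - 1) // 2)
```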

In [18]:

plot_runtimes(runtimes, annotate=True)

Summary¶

In your research, the number of times you will have to design an algorithm to sort a list will hopefully be zero. So why spend time talking about it?

The most common reason code runs slowly:

Algorithmically the code is inefficient

In the remainder of this talk, I'll show lots of little tricks for speeding up Python code, but overall the biggest factor in the efficiency of the code is the algorithmic design.

Profiling Code¶

Let's take a look at how you can spot parts of your code that could use some TLC.

Let's say you have a function that takes an input list and returns a new list whose nth element is the sum of the n smallest elements of the input.

In [20]:

def get_n_smallest(list_, n):
    return sorted(list_)[0:n]

In [21]:

def get_sum(list_):
    sum_ = 0
    for element in list_:
        sum_ += element
    return sum_

In [22]:

def function(list_, return_output=False):
    output_list = []
    
    for n in range(len(list_)):
        smallest_n = get_n_smallest(list_, n)
        sum_smallest_n = get_sum(smallest_n)
        output_list.append(sum_smallest_n)

    if return_output:
        return output_list

Profiling Practice¶

Let's generate some dummy data to run this function on

In [23]:

list_1 = list(np.random.uniform(1, 1000, size=500))

and try to spot the bottlenecks in this code when running on the list.

Tools for spotting algorithmic inefficiencies¶

Profilers are your best friends.

%time
%timeit
%prun
%lprun
%%heat
%memit
%mprun

There are certainly more, but these are great.

Runtime profilers¶

In [24]:

%timeit function(list_1)

88.2 ms ± 2.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [25]:

%time function(list_1)

CPU times: user 86.5 ms, sys: 1 ms, total: 87.5 ms
Wall time: 85.9 ms
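
Outside of a notebook, the standard library's `timeit` module offers similar measurements to `%timeit`. The following is a minimal sketch; the `work` function is a made-up stand-in workload, not something from this notebook.

```python
import timeit


def work():
    # Hypothetical stand-in workload: sort a reversed range, sum the 10 smallest
    return sum(sorted(range(1000, 0, -1))[:10])


# Run `work` 100 times per measurement, and take 5 measurements
number = 100
times = timeit.repeat(work, number=number, repeat=5)

# The minimum is usually the most reproducible figure
best_per_call = min(times) / number
print(f"best of 5: {best_per_call * 1e6:.1f} us per call")
```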

Functional profilers¶

In [33]:

%prun function(list_1)

In [38]:

Image('./prun_output.png')

Out[38]:
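
`%prun` is built on Python's `cProfile` module, which can also be used directly in a plain script. Below is a minimal sketch of that approach. The three functions restate the ones defined earlier so the block is self-contained, and the input `data` is a made-up list rather than the notebook's `list_1`.

```python
import cProfile
import io
import pstats


def get_n_smallest(list_, n):
    return sorted(list_)[0:n]


def get_sum(list_):
    sum_ = 0
    for element in list_:
        sum_ += element
    return sum_


def function(list_):
    output_list = []
    for n in range(len(list_)):
        output_list.append(get_sum(get_n_smallest(list_, n)))
    return output_list


# Made-up input for illustration
data = list(range(200, 0, -1))

# Profile a single call to function()
profiler = cProfile.Profile()
profiler.enable()
function(data)
profiler.disable()

# Write the report to a string, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

The report lists how many times each function was called and how much time was spent in it, which is the same kind of table `%prun` displays.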

Memory Profilers¶

In [28]:

%memit function(list_1)

peak memory: 89.92 MiB, increment: 0.12 MiB
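
For a rough memory check without installing anything, the standard library's `tracemalloc` module can be used. Note that this is a different tool from `%memit`: it tracks only memory allocated by Python itself rather than the total memory of the process, so the numbers are not directly comparable. The `build_squares` function is a hypothetical example workload.

```python
import tracemalloc


def build_squares(n):
    # Hypothetical workload: allocate a list of n integers
    return [i * i for i in range(n)]


tracemalloc.start()
squares = build_squares(100_000)
# get_traced_memory() returns (current, peak) in bytes since start()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1024**2:.2f} MiB, peak: {peak / 1024**2:.2f} MiB")
```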

Next up is %mprun

%mprun only works on functions defined in a file, not in a notebook cell, so we have to make a dummy script real quick.

In [29]:

%%file mprun_demo.py

import numpy as np

list_1 = list(np.random.uniform(1, 1000, size=500))

def get_n_smallest(list_, n):
    return sorted(list_)[0:n]

def get_sum(list_):
    # Use sum_ to avoid shadowing the built-in sum()
    sum_ = 0
    for element in list_:
        sum_ += element
    return sum_

def function(list_):
    output_list = []

    for n, element in enumerate(list_):
        smallest_n = get_n_smallest(list_, n)
        sum_smallest_n = get_sum(smallest_n)
        output_list.append(sum_smallest_n)

Writing mprun_demo.py

In [30]:

from mprun_demo import function as demo_function
%mprun -f demo_function demo_function(list_1)

In [39]:

Image('./mprun_output.png')

Out[39]:

Line Profilers¶

In [31]:

%lprun -f function function(list_1)

In [40]:

Image('./lprun_output.png')

Out[40]:

Fancy Line Profilers¶

%%heat 
import numpy as np

list_1 = list(np.random.uniform(1, 1000, size=500))

def get_n_smallest(list_, n):
    return sorted(list_)[0:n]

def get_sum(list_):
    # Use sum_ to avoid shadowing the built-in sum()
    sum_ = 0
    for element in list_:
        sum_ += element
    return sum_

def function(list_):
    output_list = []

    for n, element in enumerate(list_):
        smallest_n = get_n_smallest(list_, n)
        sum_smallest_n = get_sum(smallest_n)
        output_list.append(sum_smallest_n)

function(list_1)

In [41]:

Image('./heatdemo.png')

Out[41]:

Profiling Summary¶

If your code feels slow, profiling should be your first step.

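Profilers like %time, %timeit, %prun, %lprun, %%heat, %memit, and %mprun can diagnose where your code is spending the most time and memory. %time, %timeit, and %prun are built into IPython; %lprun (line_profiler), %memit and %mprun (memory_profiler), and %%heat (py-heat-magic) come from extensions that must be installed and loaded with %load_ext first.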

Re-designing the code to make these areas more algorithmically efficient is the best way to improve your code.
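
As one illustration of such a redesign, here is a sketch of how the example function could be restructured. The original re-sorts the whole list and re-sums a slice on every iteration; sorting once and keeping a running sum produces the same output with far less work. The names `function_slow` and `function_fast` are hypothetical: `function_slow` restates the notebook's function for comparison, and `function_fast` is only one possible rewrite.

```python
def function_slow(list_):
    """Restatement of the notebook's approach: sort and sum on every iteration."""
    output_list = []
    for n in range(len(list_)):
        smallest_n = sorted(list_)[0:n]
        sum_ = 0
        for element in smallest_n:
            sum_ += element
        output_list.append(sum_)
    return output_list


def function_fast(list_):
    """Sort once, then build each entry from a running sum."""
    output_list = []
    running_sum = 0
    for element in sorted(list_):
        # Entry n holds the sum of the n smallest elements (0 for n = 0)
        output_list.append(running_sum)
        running_sum += element
    return output_list


data = [5.5, 1.25, 3.0, 2.0]
print(function_slow(data))
print(function_fast(data))
```

The slow version performs one sort per element, roughly $\mathcal{O}(n^2 \log n)$ work overall, while the fast version is dominated by a single $\mathcal{O}(n \log n)$ sort.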

Thank you¶

All slides and materials are on GitHub
