# Advanced Python: Helpful Packages

Created for Bootcamp 2021, Kayla Leonard DeHolton

This document covers helpful packages that you may need at some point. It is not meant to be a comprehensive tutorial on every package, but rather to make you aware of different packages and their features so you can look up more as needed later.

## Table of Contents

* [Computation](#computation)
    * [NumPy Arrays](#numpyarrays)
    * [More NumPy](#morenumpy)
    * [SciPy](#scipy)
* [File System Interfaces](#files)
    * [OS](#os)
    * [GLOB](#glob)
* [Data Frame & Storage](#dataframe)
    * [HDF5](#hdf5)
    * [Pandas](#pandas)
* [Additional Packages](#additional)
    * [Scikit-learn](#scikit)
    * [Healpy](#healpy)
    * [Astropy](#astropy)
    * [Numba](#numba)

***

## Computation <a class="anchor" id="computation"></a>

### NumPy Arrays <a class="anchor" id="numpyarrays"></a>

NumPy ("Numerical Python") is a powerful tool for fast numerical calculations.

In [1]:
import numpy as np

NumPy arrays are similar to lists, but they are more convenient for numerical operations.

In [2]:
l = [1,2,3,4,5]
print('List:\n',l)
#print(f"List: {l}")

a = np.array(l)
#a = np.array(l, dtype=float)
print('Array:\n',a)

List:
 [1, 2, 3, 4, 5]
Array:
 [1 2 3 4 5]


Now let's square all the elements. Lists require a list comprehension. Arrays can be acted on directly ("element-wise").

In [3]:
[i**2 for i in l]

[1, 4, 9, 16, 25]

In [4]:
a**2

array([ 1,  4,  9, 16, 25])
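The same difference shows up with other operators. As a minimal sketch (the variable names here are only illustrative), `+` and `*` concatenate or repeat a list, while on an array they act on each element:

```python
import numpy as np

l = [1, 2, 3, 4, 5]
a = np.array(l)

# On lists, + concatenates and * repeats
list_sum = l + l
list_times_two = l * 2

# On arrays, the same operators act on each element
array_sum = a + a
array_times_two = a * 2

print(list_sum)   # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
print(array_sum)  # [ 2  4  6  8 10]
```

This is a common source of silent bugs when a list is passed where an array was expected.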

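NumPy arrays become faster than lists as the list/array gets longer. For very short inputs, the overhead of creating the array can make the list version faster, as the 10-item timing below shows.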

In [5]:
import timeit

print('For 10 items in list:')
print('Array: %.6f s'%timeit.timeit('np.array(l)**2', 'import numpy as np \nl=range(10)', number=1000))
print('List:  %.6f s'%timeit.timeit('[i**2 for i in l]', 'import numpy as np \nl=range(10)', number=1000))

print('\nFor 100 items in list:')
print('Array: %.6f s'%timeit.timeit('np.array(l)**2', 'import numpy as np \nl=range(100)', number=1000))
print('List:  %.6f s'%timeit.timeit('[i**2 for i in l]', 'import numpy as np \nl=range(100)', number=1000))

print('\nFor 1000 items in list:')
print('Array: %.6f s'%timeit.timeit('np.array(l)**2', 'import numpy as np \nl=range(1000)', number=1000))
print('List:  %.6f s'%timeit.timeit('[i**2 for i in l]', 'import numpy as np \nl=range(1000)', number=1000))

print('\nFor 10000 items in list:')
print('Array: %.6f s'%timeit.timeit('np.array(l)**2', 'import numpy as np \nl=range(10000)', number=1000))
print('List:  %.6f s'%timeit.timeit('[i**2 for i in l]', 'import numpy as np \nl=range(10000)', number=1000))

For 10 items in list:
Array: 0.006663 s
List:  0.003213 s

For 100 items in list:
Array: 0.015233 s
List:  0.028766 s

For 1000 items in list:
Array: 0.094877 s
List:  0.251584 s

For 10000 items in list:
Array: 0.908515 s
List:  2.192099 s
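Note that the array timings above include converting the input to an array on every call. The sketch below, which assumes the array is built once in the `timeit` setup string, times the squaring on its own. Timings depend on the machine, so no expected numbers are shown.

```python
import timeit

n = 10000
setup = f"import numpy as np\nl = list(range({n}))\na = np.array(l)"

# Conversion from list to array included in every call (as in the cell above)
t_convert = timeit.timeit("np.array(l)**2", setup, number=100)
# Array already built in the setup, so only the squaring is timed
t_array = timeit.timeit("a**2", setup, number=100)
# Pure-Python list comprehension for comparison
t_list = timeit.timeit("[i**2 for i in l]", setup, number=100)

print(f"Array (with conversion): {t_convert:.6f} s")
print(f"Array (already built):   {t_array:.6f} s")
print(f"List comprehension:      {t_list:.6f} s")
```

In practice this suggests keeping data in arrays throughout a calculation rather than converting back and forth.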


Arrays can also be multidimensional. This is another case where element-wise operations help, since they avoid nested for loops and the extra complexity that comes with them.

In [6]:
l = [[1,2,3],[4,5,6],[7,8,9]]
print(l)
a = np.array(l)
print(a)

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [7]:
[[i**2 for i in j] for j in l]

[[1, 4, 9], [16, 25, 36], [49, 64, 81]]

In [8]:
a**2

array([[ 1,  4,  9],
       [16, 25, 36],
       [49, 64, 81]])
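Beyond element-wise math, a 2D array knows its own shape and can be reduced along either axis. A short sketch using the same 3x3 array (the variable names are only for illustration):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

shape = a.shape            # (rows, columns)
total = a.sum()            # sum over every element
col_sums = a.sum(axis=0)   # collapse the rows: one sum per column
row_sums = a.sum(axis=1)   # collapse the columns: one sum per row
element = a[1, 2]          # row index 1, column index 2

print(shape)     # (3, 3)
print(total)     # 45
print(col_sums)  # [12 15 18]
print(row_sums)  # [ 6 15 24]
print(element)   # 6
```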

### More NumPy: Geometry, Algebra, Random Number Generators <a class="anchor" id="morenumpy"></a>

NumPy also contains helpful math functions and constants.

In [10]:
print(np.pi)
#import math 
#print(math.pi)

3.141592653589793


Even operations like sin and cos can be applied element-wise to arrays. Angles are given in radians.

In [18]:
angles = np.array([0,np.pi/6,np.pi/4,np.pi/2])
print(angles)
#print(np.rad2deg(angles))

[0.         0.52359878 0.78539816 1.57079633]


In [19]:
np.sin(angles)

array([0.        , 0.5       , 0.70710678, 1.        ])

Square root function

In [20]:
np.sqrt([1,4,9,16,1000])

array([ 1.       ,  2.       ,  3.       ,  4.       , 31.6227766])

Random samples from a uniform distribution between 0 and 1 (0 included, 1 excluded)

In [24]:
np.random.random_sample(size=10)
#np.random.random_sample(size=10)*10+40

array([0.33146286, 0.12201067, 0.41177215, 0.00382125, 0.36703448,
       0.00926243, 0.76561443, 0.20049812, 0.316484  , 0.63636602])

Standard normal distribution (Gaussian centered at 0 with a standard deviation of 1)

In [30]:
np.random.randn(10)
#np.random.normal(0,10,100)  # mean, sigma, nsamples 

array([-1.15625891, -0.06313289,  0.09931651, -0.86667169,  1.37849363,
       -0.31757699,  0.18746101,  0.79815169,  0.56538734,  1.17740464])
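For reproducible results, it can help to seed the random numbers. The sketch below uses NumPy's newer `Generator` interface via `np.random.default_rng`, which the NumPy documentation recommends for new code. The seed value and the ranges are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

uniform_01 = rng.random(10)                       # uniform on [0, 1)
uniform_40_50 = rng.uniform(40, 50, size=10)      # uniform on [40, 50)
gaussian = rng.normal(loc=0, scale=10, size=100)  # mean, sigma, nsamples

print(uniform_01)
print(uniform_40_50)
print(gaussian.mean(), gaussian.std())
```

Creating a second generator with the same seed reproduces the same sequence of numbers.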

linspace is a very useful way to create evenly spaced arrays. Both the start and stop values are included.

In [34]:
np.linspace(0,1,11)
#np.rad2deg(np.linspace(0, np.pi, 100))

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

Similar function for getting points evenly spaced in log10. The inputs are the exponents, so np.logspace(0,1,11) runs from 10^0 = 1 to 10^1 = 10.

In [37]:
np.logspace(0,1,11)
#np.logspace(9, 13, 10)

array([ 1.        ,  1.25892541,  1.58489319,  1.99526231,  2.51188643,
        3.16227766,  3.98107171,  5.01187234,  6.30957344,  7.94328235,
       10.        ])
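One way to check what `logspace` is doing is to compare it against raising 10 to a `linspace` of exponents. The sketch below also shows `np.geomspace`, which takes the endpoint values themselves rather than their exponents.

```python
import numpy as np

log_points = np.logspace(0, 1, 11)          # endpoints given as exponents
same_points = 10 ** np.linspace(0, 1, 11)   # the equivalent by hand
geom_points = np.geomspace(1, 10, 11)       # endpoints given as values

print(np.allclose(log_points, same_points))  # True
print(np.allclose(log_points, geom_points))  # True
```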

### SciPy <a class="anchor" id="scipy"></a>

In [38]:
import scipy

Let's define the function f(x) = x^2

In [39]:
def f(x):
    return x**2
    #return (x-1)**2

In [40]:
# don't worry about plotting for now, this is just to visualize our function

import matplotlib.pyplot as plt

xs = np.linspace(-5,5,100)
plt.plot(xs,f(xs))
plt.xlabel('x',fontsize=14)
plt.ylabel('f(x)',fontsize=14)
plt.show()

[Figure: plot of f(x) versus x for x from -5 to 5]

**Integration.** https://docs.scipy.org/doc/scipy/reference/integrate.html

The quad function handles general-purpose definite integrals. It returns the value of the integral along with an estimate of the absolute error.

In [41]:
from scipy import integrate

integrate.quad(f, 0, 4) 

(21.333333333333336, 2.368475785867001e-13)
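Since this integral has a closed form (the integral of x^2 from 0 to 4 is 4^3/3), the numerical result can be checked against it. A minimal sketch, unpacking the two returned values into illustrative names:

```python
from scipy import integrate

def f(x):
    return x**2

# quad returns (value, estimated absolute error)
value, abs_error = integrate.quad(f, 0, 4)
exact = 4**3 / 3

print(value, abs_error)
print(abs(value - exact))  # difference from the analytic answer
```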

**Optimization.** https://docs.scipy.org/doc/scipy/reference/optimize.html

Find the minimum of a function. The minimizer works numerically to a finite tolerance, so in general the returned value (x) is close to, but not necessarily exactly, the true minimum. For this simple function it happens to land on exactly 0.

In [42]:
from scipy import optimize

optimize.minimize_scalar(f)
#optimize.minimize(f,1)

     fun: 0.0
    nfev: 8
     nit: 4
 success: True
       x: 0.0
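The returned object exposes its fields as attributes. The sketch below uses the shifted function from the commented-out line above, wrapped in a hypothetical helper `g`, whose minimum is at x = 1:

```python
from scipy import optimize

def g(x):
    # shifted parabola, minimum at x = 1
    return (x - 1)**2

result = optimize.minimize_scalar(g)

# location of the minimum, function value there, and convergence flag
print(result.x, result.fun, result.success)
```

The printed `x` should be close to 1, within the tolerance of the solver.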

***

## File System Interfaces <a class="anchor" id="files"></a>

### OS <a class="anchor" id="os"></a>

Allows you to interact with the file system in a way that is somewhat analogous to the command line.

In [43]:
import os

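Make a directory. Note that os.mkdir raises FileExistsError if the directory already exists, which is what happened in the output below.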

In [44]:
os.mkdir('test_directory')

FileExistsError: [Errno 17] File exists: 'test_directory'
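One way to avoid this error is `os.makedirs` with `exist_ok=True`, which does nothing if the directory is already there. The sketch below works inside a temporary directory (an assumption made so the example cleans up after itself):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test_directory')

    os.makedirs(path, exist_ok=True)
    os.makedirs(path, exist_ok=True)  # second call is a no-op, no error
    created = os.path.isdir(path)

    # plain os.mkdir still raises if the directory exists
    try:
        os.mkdir(path)
        mkdir_raised = False
    except FileExistsError:
        mkdir_raised = True

print(created, mkdir_raised)  # True True
```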

Get a list containing all files in a directory.

In [45]:
os.listdir('example_directory')

['test_file_1.txt',
 'test_file_2.txt',
 'test_file_3.txt',
 'test_file_4.txt',
 'test_file_5.txt',
 'README.txt']

### GLOB <a class="anchor" id="glob"></a>

Allows you to search for files using the wildcard character (*). Helpful for performing an action on many files.

In [64]:
import glob

Let's say we want to perform an action on all of our test files, but not the README.

In [65]:
os.listdir('example_directory')

['test_file_1.txt',
 'test_file_2.txt',
 'test_file_3.txt',
 'test_file_4.txt',
 'test_file_5.txt',
 'test_file_11.txt',
 'README.txt']

We could manually check that each file name is not README.txt, but that can be clunky. And in a very large directory, what if there's another odd file we didn't know about?

In [66]:
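for filename in os.listdir('example_directory'):
    if filename != 'README.txt':
        print(filename)
        # open inside a with-block so the file is closed automatically
        with open(os.path.join('example_directory', filename)) as f:
            contents = f.read()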

test_file_1.txt
test_file_2.txt
test_file_3.txt
test_file_4.txt
test_file_5.txt
test_file_11.txt


Or we can loop over the file numbers we expect, but what if a file is missing, or one (like test_file_11.txt) falls outside the range?

In [67]:
for i in range(1,6):
    with open('example_directory/test_file_'+str(i)+'.txt') as f:
        contents = f.read()

Instead we can use glob which allows us to use a wildcard to search.

In [70]:
glob.glob('example_directory/test_file_*.txt')
#flist = glob.glob('example_directory/test_file_*.txt')
#flist.sort()
#print(flist)

['example_directory/test_file_1.txt',
 'example_directory/test_file_2.txt',
 'example_directory/test_file_3.txt',
 'example_directory/test_file_4.txt',
 'example_directory/test_file_5.txt',
 'example_directory/test_file_11.txt']

In [71]:
for filename in glob.glob('example_directory/test_file_*.txt'):
    with open(filename) as f:
        contents = f.read()

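Note: glob returns items in arbitrary order. Applying the sorted() function puts them in alphabetical order, which is not the same as numerical order: test_file_11.txt sorts before test_file_2.txt. For true numerical order, sort with a key that extracts the number from the file name.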
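Plain `sorted()` compares file names as strings. The sketch below contrasts that with sorting on the number in the name. It creates a few empty-ish test files in a temporary directory, and `file_number` is a hypothetical helper written for this example, not part of `glob`.

```python
import glob
import os
import re
import tempfile

def file_number(path):
    """Hypothetical helper: pull the integer out of 'test_file_<n>.txt'."""
    match = re.search(r'test_file_(\d+)\.txt$', path)
    return int(match.group(1))

with tempfile.TemporaryDirectory() as tmp:
    # create a few example files to search for
    for i in [1, 2, 11]:
        with open(os.path.join(tmp, f'test_file_{i}.txt'), 'w') as fh:
            fh.write('example\n')

    flist = glob.glob(os.path.join(tmp, 'test_file_*.txt'))
    alphabetical = [os.path.basename(p) for p in sorted(flist)]
    numerical = [os.path.basename(p) for p in sorted(flist, key=file_number)]

print(alphabetical)  # ['test_file_1.txt', 'test_file_11.txt', 'test_file_2.txt']
print(numerical)     # ['test_file_1.txt', 'test_file_2.txt', 'test_file_11.txt']
```

Zero-padding the numbers when the files are written (test_file_01.txt, test_file_02.txt, ...) is another way to make alphabetical and numerical order agree.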

***

## Data Frame & Storage <a class="anchor" id="dataframe"></a>

These are common ways to use and store data sets. If you come across these formats, the references below may help you get familiar with them.

### HDF5 <a class="anchor" id="hdf5"></a>

https://docs.h5py.org/en/stable/quick.html#quick

The Python interface for reading and writing HDF5 files is called h5py.

In [72]:
import h5py
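As a rough sketch of what reading and writing looks like, the example below writes one dataset with an attribute and reads it back. The file name, the dataset name `energies`, and the `units` attribute are all made up for illustration, and the file lives in a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.hdf5')  # hypothetical file name

    # Write: datasets behave much like NumPy arrays stored on disk
    with h5py.File(path, 'w') as f:
        dset = f.create_dataset('energies', data=np.arange(5.0))
        dset.attrs['units'] = 'GeV'  # attributes hold small metadata

    # Read: the file acts like a dictionary of datasets
    with h5py.File(path, 'r') as f:
        names = list(f.keys())
        energies = f['energies'][:]  # [:] loads the data into a NumPy array
        units = f['energies'].attrs['units']

print(names, energies, units)
```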

### Pandas <a class="anchor" id="pandas"></a>

https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html

In [73]:
import pandas as pd
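A DataFrame is a table with named columns. The sketch below, using made-up column names and values, shows a few common operations: building a table, taking a column statistic, selecting rows with a boolean mask, and adding a derived column.

```python
import pandas as pd

# Made-up example data
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'energy': [1.0, 4.0, 9.0, 16.0],
    'detected': [True, False, True, True],
})

mean_energy = df['energy'].mean()        # statistic on one column
selected = df[df['detected']]            # keep rows where detected is True
df['sqrt_energy'] = df['energy'] ** 0.5  # new column, computed element-wise

print(mean_energy)                # 7.5
print(selected['name'].tolist())  # ['a', 'c', 'd']
print(df)
```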

***

## Additional Packages <a class="anchor" id="additional"></a>

### Scikit-learn <a class="anchor" id="scikit"></a>

Machine learning framework. Built on NumPy, SciPy, and matplotlib.

https://scikit-learn.org/

### Healpy <a class="anchor" id="healpy"></a>

Useful for plotting sky maps.

https://healpy.readthedocs.io/

### Astropy <a class="anchor" id="astropy"></a>

Convenient package for working with astronomical coordinates, etc.

https://docs.astropy.org/

### Numba <a class="anchor" id="numba"></a>

A "just-in-time" Python compiler that can improve the speed at which your code runs.

https://numba.pydata.org/numba-doc/dev/user/5minguide.html