We need numpy, scipy (including scipy.stats, scipy.special, and scipy.optimize), pyplot, and json.
import numpy as np
import scipy as sp
import scipy.stats
import scipy.special
import scipy.optimize
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import json
json.load?
Signature: json.load(fp, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
Docstring: Deserialize fp (a .read()-supporting file-like object containing a JSON document) to a Python object.
json_contents = json.load(open("hese_toy_simulation.json", "r"))
simulation_mapping = json_contents["mapping"]
simulation_events = np.array(json_contents["events"])
del json_contents
sim_map = sorted(simulation_mapping.items(), key=lambda x: x[1])
sim_map
print(simulation_events[:2])
[[(k, e[i]) for k, i in sim_map] for e in simulation_events[:2]]
json_contents = json.load(open("hese_toy_data.json", "r"))
data_mapping = json_contents["mapping"]
data_events = np.array(json_contents["events"])
del json_contents
data_map = sorted(data_mapping.items(), key=lambda x: x[1])
data_map
print(data_events[:2])
[[(k, e[i]) for k, i in data_map] for e in data_events[:2]]
In this case the generation probability includes information other than just the spectrum. That information is accounted for in the stored weights.
So to weight our simulation, we only need to concern ourselves with the flux.
Recall that the weight is the new flux divided by the injected flux (times other weighting factors).
$w_i=w\frac{\Phi_{test}(E_i,\theta_i)}{\Phi_{generated}(E_i,\theta_i)}$
In this case, however, we have separate astrophysical and atmospheric weights, so we will have to reweight them separately.
$w_{i,{atmo}}=w_{atmo}\frac{\Phi_{test, {atmo}}(E_i,\theta_i)}{\Phi_{generated, {atmo}}(E_i,\theta_i)}$
$w_{i,{astro}}=w_{astro}\frac{\Phi_{test, {astro}}(E_i,\theta_i)}{\Phi_{generated, {astro}}(E_i,\theta_i)}$
$w_{i}=w_{i,{atmo}}+w_{i,{astro}}$
For the atmospheric component our injected flux is $1.0$ times the expectation, so the flux ratio reduces to a pure normalization.
So $w_{i,{atmo}}=N_{atmo} \cdot w_{atmo}$
For the astrophysical component we are using a power-law spectrum, and our injected flux is $1.0\cdot \left(\frac{E}{100\,\mathrm{TeV}}\right)^{-2.5}$
So $w_{i,{astro}}=N_{astro} \cdot \frac{\left(\frac{E}{100\,\mathrm{TeV}}\right)^{-\gamma}}{\left(\frac{E}{100\,\mathrm{TeV}}\right)^{-2.5}} \cdot w_{astro} = N_{astro} \cdot \left(\frac{E}{100\,\mathrm{TeV}}\right)^{2.5-\gamma} \cdot w_{astro}$
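As a quick worked example of that ratio: for an event with $E = 1\,\mathrm{PeV}$ we have $E/100\,\mathrm{TeV} = 10$, so testing a harder spectrum with $\gamma = 2.2$ rescales that event's astrophysical weight by $10^{2.5-2.2} = 10^{0.3} \approx 2$, while $\gamma = 2.5$ leaves it unchanged.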
Define a function called weight_event below that takes as arguments (events, atmo_norm, astro_norm, astro_gamma) and returns the weights of those events. Remember to use the energy of the primary neutrino to weight the spectrum.
def weight_event(event, atmo_norm, astro_norm, astro_gamma):
    astroWeight = event[..., 0]    # stored astrophysical weight (injected with gamma = 2.5)
    atmoWeight = event[..., 1]     # stored atmospheric weight (nominal expectation)
    primaryEnergy = event[..., 3]  # primary neutrino energy (GeV, so 100 TeV = 1e5)
    # Atmospheric part is a pure normalization; the astrophysical part also
    # tilts the injected E^-2.5 spectrum to E^-astro_gamma, pivoting at 100 TeV.
    weight = (atmoWeight * atmo_norm
              + astroWeight * astro_norm * pow(primaryEnergy / 1e5, -(astro_gamma - 2.5)))
    return weight
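As a quick sanity check (the parameter values here are arbitrary placeholders, not fit results), we can evaluate the weights for the whole simulation set and look at the total expected event count:

weights = weight_event(simulation_events, 1.0, 6.0, 2.7)
print(weights.shape, weights.sum())  # one weight per event; sum = total expectation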
We want to bin across all three observables we have (energy, zenith, topology).
In energy we generally use logarithmic bins, because they are appropriate for characterizing the sharply falling power-law spectrum. Our fit energy range runs from 60 TeV to 10 PeV because it covers the deposited energy of our sample well, and below 60 TeV we have a poor understanding of the background. Twenty bins are a compromise: finer binning resolves the spectrum better, while coarser binning is safer given limited simulation statistics.
energy_bins = np.logspace(np.log10(60.0e3), np.log10(1.0e7), 20+1)
In zenith we use bins that are evenly spaced in cos(theta), because the overburden, and therefore the observed flux, scales with cos(theta). By a similar compromise we have found 10 bins to be appropriate in zenith.
zenith_bins = np.arccos(np.linspace(-1, 1, 10+1))[::-1]
In topology there are two categories, so it is equivalent to just define two bins.
topology_bins = np.linspace(0, 2, 2+1)
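As a purely illustrative check, we can confirm the bin counts and that all three edge arrays are monotonically increasing, which the np.digitize bookkeeping below relies on:

print(len(energy_bins) - 1, len(zenith_bins) - 1, len(topology_bins) - 1)  # 20 10 2
assert np.all(np.diff(energy_bins) > 0)
assert np.all(np.diff(zenith_bins) > 0)
assert np.all(np.diff(topology_bins) > 0)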
For our likelihood problem we need to bin both the data and the simulation.
However, binning is expensive and we only want to do it once.
One easy way to manage this with numpy is to create masks for each bin.
So events_in_bin = all_events[bin_mask]
Define a function that produces a mask for each bin:

def make_bin_masks(energies, zeniths, topologies,
                   energy_bins=energy_bins, zenith_bins=zenith_bins, topology_bins=topology_bins):
    ...

This function should return an array of masks.
np.digitize?
Docstring: digitize(x, bins, right=False)
Return the indices of the bins to which each value in input array belongs. Each index i returned is such that bins[i-1] <= x < bins[i] if bins is monotonically increasing, or bins[i-1] > x >= bins[i] if bins is monotonically decreasing. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate. If right is True, then the right bin is closed so that the index i is such that bins[i-1] < x <= bins[i] or bins[i-1] >= x > bins[i] if bins is monotonically increasing or decreasing, respectively.
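To see that convention in action, here is a small standalone illustration:

print(np.digitize([0.5, 1.5, 3.5], bins=[0, 1, 2, 3]))  # [1 2 4]; the 4 flags an overflow past the last edge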
def make_bin_masks(energies, zeniths, topologies,
                   energy_bins=energy_bins, zenith_bins=zenith_bins, topology_bins=topology_bins):
    assert(len(energies) == len(zeniths))
    assert(len(energies) == len(topologies))
    n_energy_bins = len(energy_bins) - 1
    n_zenith_bins = len(zenith_bins) - 1
    n_topology_bins = len(topology_bins) - 1
    # digitize returns 1..len(bins)-1 for in-range values; shifting by one maps
    # in-range events to 0..n_bins-1, while out-of-range events become -1 or
    # n_bins and therefore never match any mask below.
    energy_mapping = np.digitize(energies, bins=energy_bins) - 1
    zenith_mapping = np.digitize(zeniths, bins=zenith_bins) - 1
    topology_mapping = np.digitize(topologies, bins=topology_bins) - 1
    bin_masks = []
    # One boolean mask per (topology, zenith, energy) bin, flattened with
    # energy varying fastest.
    for i in range(n_topology_bins):
        for j in range(n_zenith_bins):
            for k in range(n_energy_bins):
                mask = topology_mapping == i
                mask = np.logical_and(mask, zenith_mapping == j)
                mask = np.logical_and(mask, energy_mapping == k)
                bin_masks.append(mask)
    return bin_masks
Using the function we just wrote, we can call it to get our simulation bin masks and the data bin masks.
# simulation columns (see sim_map): 2 = energy, 6 = zenith, 5 = topology
bin_masks = make_bin_masks(simulation_events[:,2], simulation_events[:,6], simulation_events[:,5])
# data columns (see data_map): 0 = energy, 2 = zenith, 1 = topology
data_masks = make_bin_masks(data_events[:,0], data_events[:,2], data_events[:,1])
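A useful cross-check (a sketch; events outside the fit range simply fall into no mask) is that each event is counted in at most one bin:

per_event = np.array(bin_masks).sum(axis=0)  # number of masks each simulated event falls in
print(per_event.max())  # should be 1: the bins are mutually exclusive
print(per_event.sum(), "of", len(simulation_events), "simulated events land inside the fit range")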
We want a function that takes the data, the simulation, and our physics parameters, and returns the negative log-likelihood.
One important piece of this is the expectation from the simulation.
Let's first define the function that returns the expectation.
def get_expectation(events, masks, weighting):
    # Weight every simulated event once, then sum the weights inside each bin.
    all_weights = weighting(events)
    weights = []
    for mask in masks:
        weight = np.sum(all_weights[mask])
        weights.append(weight)
    return np.array(weights)
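For instance (again with arbitrary placeholder parameters), the summed expectation should be roughly comparable to the number of observed events:

expectation = get_expectation(simulation_events, bin_masks,
                              lambda e: weight_event(e, 1.0, 6.0, 2.7))
print(expectation.sum(), len(data_events))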
$\mathcal{L}(\vec{\theta}|\vec{d})=\prod_i\frac{(\lambda_i)^{k_i}e^{-\lambda_i}}{k_i!}$
$\mathcal{L}(\vec{\theta}|\vec{d})=\prod_i\frac{(\lambda_i)^{k_i}e^{-\lambda_i}}{\Gamma [k_i+1]}$
$\log\mathcal{L}(\vec{\theta}|\vec{d})=\sum_i \left[ k_i \log(\lambda_i) - \lambda_i - \log\Gamma [k_i+1] \right]$
def llh(data, simulation_events, atmo_norm, astro_norm, astro_gamma):
    # Per-bin expectation lambda_i for these physics parameters.
    expect = get_expectation(simulation_events,
                             bin_masks,
                             lambda e: weight_event(e, atmo_norm, astro_norm, astro_gamma))
    # Negative Poisson log-likelihood per bin (log-gamma in place of log k!).
    l = -(data * np.log(expect) - expect - sp.special.loggamma(data + 1))
    # Bins that are empty in both data and expectation contribute nothing.
    l[np.logical_and(data == 0, expect == 0)] = 0
    l = np.sum(l).real
    return l
# Test it
data = np.sum(np.array(data_masks), axis=1)  # observed counts per analysis bin
print(llh(data, simulation_events, 1.0, 6.0, 2.9))
print(data)
# Minimize the negative log-likelihood; the normalizations are bounded below by zero.
res = sp.optimize.minimize(lambda x: llh(data, simulation_events, *x),
                           [1.0, 8.0, 2.5],
                           method='L-BFGS-B',
                           bounds=[[0, None], [0, None], [None, None]])
print(res)
print('Atmospheric Normalization =', res.x[0])
print('Astrophysical Normalization =', res.x[1])
print('Astrophysical Gamma =', res.x[2])
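Since pyplot was imported above but not yet used, a minimal closing sketch is to overlay the observed counts and the best-fit expectation across the flattened analysis bins (bin ordering as produced by make_bin_masks):

best_fit = get_expectation(simulation_events, bin_masks,
                           lambda e: weight_event(e, *res.x))
plt.figure(figsize=(10, 4))
plt.step(np.arange(len(best_fit)), best_fit, where='mid', label='best-fit expectation')
plt.step(np.arange(len(data)), data, where='mid', label='observed data')
plt.xlabel('flattened analysis bin (topology, zenith, energy)')
plt.ylabel('events per bin')
plt.legend()
plt.show()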