Diffuse Fit Solutions

Imports

We need numpy, scipy (including scipy.stats, scipy.special, and scipy.optimize), matplotlib.pyplot, and json

In [1]:
import numpy as np
import scipy as sp
import scipy.stats
import scipy.special   # used for loggamma in the likelihood
import scipy.optimize  # used to minimize the negative log-likelihood
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import json

json.load

Signature: json.load(fp, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
Docstring:
Deserialize fp (a .read()-supporting file-like object containing a JSON document) to a Python object.

Load the simulation

In [2]:
json_contents = json.load(open("hese_toy_simulation.json", "r"))
simulation_mapping = json_contents["mapping"]
simulation_events = np.array(json_contents["events"])
del json_contents
In [3]:
sim_map = sorted(simulation_mapping.items(), key=lambda x: x[1])
sim_map
Out[3]:
[(u'astroWeight', 0),
 (u'atmoWeight', 1),
 (u'energy', 2),
 (u'primaryEnergy', 3),
 (u'primaryZenith', 4),
 (u'topology', 5),
 (u'zenith', 6)]
In [4]:
print simulation_events[:2]
[[(k, e[i]) for k, i in sim_map] for e in simulation_events[:2]]
[[1.455e-05 9.797e-06 1.007e+05 1.391e+05 1.581e+00 0.000e+00 1.500e+00]
 [7.289e-06 4.587e-07 3.191e+05 2.485e+05 7.607e-01 0.000e+00 9.098e-01]]
Out[4]:
[[(u'astroWeight', 1.455e-05),
  (u'atmoWeight', 9.797e-06),
  (u'energy', 100700.0),
  (u'primaryEnergy', 139100.0),
  (u'primaryZenith', 1.581),
  (u'topology', 0.0),
  (u'zenith', 1.5)],
 [(u'astroWeight', 7.289e-06),
  (u'atmoWeight', 4.587e-07),
  (u'energy', 319100.0),
  (u'primaryEnergy', 248500.0),
  (u'primaryZenith', 0.7607),
  (u'topology', 0.0),
  (u'zenith', 0.9098)]]

Load the data

In [5]:
json_contents = json.load(open("hese_toy_data.json", "r"))
data_mapping = json_contents["mapping"]
data_events = np.array(json_contents["events"])
del json_contents
In [6]:
data_map = sorted(data_mapping.items(), key=lambda x: x[1])
data_map
Out[6]:
[(u'energy', 0), (u'topology', 1), (u'zenith', 2)]
In [7]:
print data_events[:2]
[[(k, e[i]) for k, i in data_map] for e in data_events[:2]]
[[6.22225597e+04 0.00000000e+00 5.31069057e-01]
 [6.16774498e+04 0.00000000e+00 3.50297146e-01]]
Out[7]:
[[(u'energy', 62222.55968669473),
  (u'topology', 0.0),
  (u'zenith', 0.5310690569408663)],
 [(u'energy', 61677.44976245666),
  (u'topology', 0.0),
  (u'zenith', 0.350297145700488)]]

Define event weighting

In this case the generation probability includes information other than just the spectrum. That information is accounted for in the stored weights.
So to weight our simulation, we only need to concern ourselves with the flux.
Recall that the weight is the new flux divided by the injected flux (times other weighting factors).

$w_i=w\frac{\Phi_{test}(E_i,\theta_i)}{\Phi_{generated}(E_i,\theta_i)}$
In this case, however, we have separate astrophysical and atmospheric weights, so we will have to reweight them separately.
$w_{i,{atmo}}=w_{atmo}\frac{\Phi_{test,{atmo}}(E_i,\theta_i)}{\Phi_{generated,{atmo}}(E_i,\theta_i)}$
$w_{i,{astro}}=w_{astro}\frac{\Phi_{test,{astro}}(E_i,\theta_i)}{\Phi_{generated,{astro}}(E_i,\theta_i)}$
$w_{i}=w_{i,{atmo}}+w_{i,{astro}}$
For the atmospheric component our injected flux is $1.0$ times the nominal expectation.
So $w_{i,{atmo}}=N_{atmo} \cdot w_{atmo}$
For the astrophysical component we are using a power-law spectrum, and our injected flux is $1.0\cdot \left(\frac{E}{100\,\mathrm{TeV}}\right)^{-2.5}$.
So $w_{i,{astro}}=N_{astro} \cdot \frac{\left(\frac{E}{100\,\mathrm{TeV}}\right)^{-\gamma}}{\left(\frac{E}{100\,\mathrm{TeV}}\right)^{-2.5}} \cdot w_{astro} = N_{astro} \cdot \left(\frac{E}{100\,\mathrm{TeV}}\right)^{2.5-\gamma} \cdot w_{astro}$
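As a quick worked example of the reweighting factor: for an event with $E = 1\,\mathrm{PeV}$ and a test spectrum with $\gamma = 3.0$, the factor is $10^{2.5-3.0} = 10^{-0.5} \approx 0.32$, so a softer test spectrum suppresses high-energy events relative to the injected $E^{-2.5}$ spectrum.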
Define a function called weight_event below that takes as arguments (events, atmo_norm, astro_norm, astro_gamma) and returns the weights of those events.
Remember to use the energy of the primary neutrino to weight the spectrum.

In [8]:
def weight_event(event, atmo_norm, astro_norm, astro_gamma):
    # Columns follow the simulation mapping: 0 = astroWeight, 1 = atmoWeight, 3 = primaryEnergy
    astroWeight = event[...,0]
    atmoWeight = event[...,1]
    primaryEnergy = event[...,3]
    
    # Atmospheric part: simple normalization. Astrophysical part: normalization times the
    # power-law reweighting factor (E / 100 TeV)^(2.5 - gamma), with energies in GeV.
    weight = (atmoWeight * atmo_norm 
              + astroWeight * astro_norm * pow(primaryEnergy/1e5, -(astro_gamma-2.5)))
    
    return weight
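Since the function operates on whole arrays at once, a quick sanity check of the total expected event count might look like the following (the parameter values here are only illustrative, not fit results):

example_weights = weight_event(simulation_events, 1.0, 6.0, 2.5)  # illustrative parameters
print(example_weights.sum())  # total number of events expected for this hypothesis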

Define the binning

We want to bin across all three observables we have (energy, zenith, topology).
In energy we generally use logarithmic bins, because they are appropriate for characterizing the sharply falling power-law spectrum. Our fit energy range is 60 TeV to 10 PeV: it covers the deposited energies of our sample well, and below 60 TeV we have a poor understanding of the background. We use 20 bins as a compromise between fine binning for a better fit and coarse binning to cope with limited simulation statistics.
energy_bins = np.logspace(np.log10(60.0e3), np.log10(1.0e7), 20+1)
In zenith we use bins that are evenly spaced in cos(theta), because the overburden, and hence the observed flux, varies with cos(theta). As a similar compromise we have found 10 bins to be appropriate for zenith. Note the [::-1]: arccos of an increasing sequence is decreasing, so the reversal gives bin edges that increase monotonically from 0 to pi, consistent with the other bin arrays.
zenith_bins = np.arccos(np.linspace(-1, 1, 10+1))[::-1]
In topology there are only two event categories, so we simply define two bins.
topology_bins = np.linspace(0, 2, 2+1)

In [9]:
energy_bins = np.logspace(np.log10(60.0e3), np.log10(1.0e7), 20+1)
zenith_bins = np.arccos(np.linspace(-1, 1, 10+1))[::-1]
topology_bins = np.linspace(0, 2, 2+1)

Function for binning

For our likelihood problem we need to bin both the data and the simulation.
However, binning is expensive, and we only want to do it once.
One easy way to manage this with numpy is to create a boolean mask for each bin.
So events_in_bin = all_events[bin_mask]
Define a function that produces a mask for each bin:
def make_bin_masks(energies, zeniths, topologies, energy_bins=energy_bins, zenith_bins=zenith_bins, topology_bins=topology_bins):
This function should return a list of masks, one per bin.

np.digitize

Docstring: digitize(x, bins, right=False)

Return the indices of the bins to which each value in input array belongs.

Each index i returned is such that bins[i-1] <= x < bins[i] if bins is monotonically increasing, or bins[i-1] > x >= bins[i] if bins is monotonically decreasing. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate. If right is True, then the right bin is closed so that the index i is such that bins[i-1] < x <= bins[i] or bins[i-1] >= x > bins[i] if bins is monotonically increasing or decreasing, respectively.
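As a small illustration of this indexing convention (the values below are made up for the example), subtracting 1 from the digitize result gives 0-based bin indices, with -1 and n_bins marking under- and overflow:

example_bins = np.array([1.0, 2.0, 3.0])            # two bins: [1, 2) and [2, 3)
example_values = np.array([0.5, 1.5, 2.5, 3.5])
print(np.digitize(example_values, bins=example_bins))      # [0 1 2 3]
print(np.digitize(example_values, bins=example_bins) - 1)  # [-1 0 1 2]: -1 and 2 fall outside the two real bins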

In [10]:
def make_bin_masks(energies, zeniths, topologies,
                   energy_bins=energy_bins, zenith_bins=zenith_bins, topology_bins=topology_bins):
    
    assert(len(energies) == len(zeniths))
    assert(len(energies) == len(topologies))
    
    n_energy_bins = len(energy_bins) - 1
    n_zenith_bins = len(zenith_bins) - 1
    n_topology_bins = len(topology_bins) - 1
    
    # Subtract 1 so bin indices are 0-based; under/overflow events get -1 or n_bins
    # and are simply never matched by the loops below.
    energy_mapping = np.digitize(energies, bins=energy_bins) - 1
    zenith_mapping = np.digitize(zeniths, bins=zenith_bins) - 1
    topology_mapping = np.digitize(topologies, bins=topology_bins) - 1
    bin_masks = []
    # One boolean mask per (topology, zenith, energy) bin, with energy varying fastest
    for i in range(n_topology_bins):
        for j in range(n_zenith_bins):
            for k in range(n_energy_bins):
                mask = topology_mapping == i
                mask = np.logical_and(mask, zenith_mapping == j)
                mask = np.logical_and(mask, energy_mapping == k)
                bin_masks.append(mask)
    return bin_masks

Define bin masks

Using the function we just wrote, we can get both the simulation bin masks and the data bin masks. Note that the column order differs between the two event arrays.

In [11]:
bin_masks = make_bin_masks(simulation_events[:,2], simulation_events[:,6], simulation_events[:,5])
data_masks = make_bin_masks(data_events[:,0], data_events[:,2], data_events[:,1])

Define the likelihood

We want a function that takes the data, the simulation, and our physics parameters, and returns the negative log-likelihood.
One important piece of this is the expectation from the simulation.

Define expectation function

Let's first define the function that returns the expectation.

In [12]:
def get_expectation(events, masks, weighting):
    # weighting is a callable mapping the event array to per-event weights
    all_weights = weighting(events)
    weights = []
    for mask in masks:
        weight = np.sum(all_weights[mask])  # expected count in this bin
        weights.append(weight)
    return np.array(weights)
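For example, the per-bin expectation for one (illustrative, not fitted) choice of parameters can be obtained like this:

expectation = get_expectation(simulation_events,
                              bin_masks,
                              lambda e: weight_event(e, 1.0, 6.0, 2.5))
print(expectation.shape)  # one entry per analysis bin: 2 topologies x 10 zenith x 20 energy = (400,)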

Define likelihood function

Each bin is treated as an independent Poisson measurement, where $k_i$ is the observed count and $\lambda_i$ the expected count in bin $i$:
$\mathcal{L}(\vec{\theta}|\vec{d})=\prod_i\frac{(\lambda_i)^{k_i}e^{-\lambda_i}}{k_i!}$
$\mathcal{L}(\vec{\theta}|\vec{d})=\prod_i\frac{(\lambda_i)^{k_i}e^{-\lambda_i}}{\Gamma [k_i+1]}$
$\log\mathcal{L}(\vec{\theta}|\vec{d})=\sum_i k_i \cdot \log(\lambda_i) - \lambda_i - \log\Gamma [k_i+1]$

In [13]:
def llh(data, simulation_events, atmo_norm, astro_norm, astro_gamma):
    expect = get_expectation(simulation_events,
                             bin_masks,
                             lambda e: weight_event(e, atmo_norm, astro_norm, astro_gamma))
    # Negative log-likelihood per bin; empty bins trigger benign log(0) warnings
    l = -(data * np.log(expect) - expect - sp.special.loggamma(data+1))
    # Bins with no data and no expectation contribute nothing (otherwise 0*log(0) gives nan)
    l[np.logical_and(data == 0, expect == 0)] = 0
    l = np.sum(l).real
    return l
In [14]:
# Test it
data = np.sum(np.array(data_masks), axis=1)
print llh(data, simulation_events, 1.0, 6.0, 2.9)
175.72043054128267
/cvmfs/icecube.opensciencegrid.org/py2-v3/RHEL_6_x86_64/lib/python2.7/site-packages/ipykernel_launcher.py:5: RuntimeWarning: divide by zero encountered in log
  """
/cvmfs/icecube.opensciencegrid.org/py2-v3/RHEL_6_x86_64/lib/python2.7/site-packages/ipykernel_launcher.py:5: RuntimeWarning: invalid value encountered in multiply
  """

Get the binned data

In [15]:
data = np.sum(np.array(data_masks), axis=1)
print data
[2 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 3 0 0 0 0 1 0 0 0 0 0 0 0
 0 0 0 3 0 2 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 2 0 1 0 0 1 0 0 0 0 0
 0 0 0 0 0 0 0 0 2 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 2 1 4 1 2 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 1 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 3 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 2 3 2 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 6 5 2 3 1 0 0 0 1 1 0 0 1 0 0 0
 0 0 0 0 2 3 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]

Minimize the negative log likelihood

In [16]:
res = sp.optimize.minimize(lambda x: llh(data, simulation_events, *x),
                           [1.0, 8.0, 2.5],
                           method='L-BFGS-B',
                           bounds=[[0,None],[0,None],[None,None]])
/cvmfs/icecube.opensciencegrid.org/py2-v3/RHEL_6_x86_64/lib/python2.7/site-packages/ipykernel_launcher.py:5: RuntimeWarning: divide by zero encountered in log
  """
/cvmfs/icecube.opensciencegrid.org/py2-v3/RHEL_6_x86_64/lib/python2.7/site-packages/ipykernel_launcher.py:5: RuntimeWarning: invalid value encountered in multiply
  """
In [17]:
print res
      fun: 175.4846456750402
 hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 5.11590770e-05,  1.70530257e-05, -1.08002496e-04])
  message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 64
      nit: 14
   status: 0
  success: True
        x: array([0.99639299, 6.7141779 , 2.99387421])
In [18]:
print 'Atmospheric Normalization = ', res.x[0]
print 'Astrophysical Normalization = ', res.x[1]
print 'Astrophysical Gamma = ', res.x[2]
Atmospheric Normalization =  0.9963929866115293
Astrophysical Normalization =  6.71417790255065
Astrophysical Gamma =  2.9938742121917112
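
As a final cross-check, one might compare the binned data with the best-fit expectation. The snippet below is only a minimal sketch (not part of the original solution): it re-evaluates the expectation at the fitted parameters, sums over the topology and zenith axes, and plots the resulting energy distribution. Energies are assumed to be in GeV, as in the bin definitions above.

best_expect = get_expectation(simulation_events,
                              bin_masks,
                              lambda e: weight_event(e, *res.x))
n_energy = len(energy_bins) - 1
# The masks were built with energy varying fastest, so reshape to (topology*zenith, energy)
# and sum out the first axis to get events per energy bin.
expect_energy = best_expect.reshape(-1, n_energy).sum(axis=0)
data_energy = data.reshape(-1, n_energy).sum(axis=0)
bin_centers = np.sqrt(energy_bins[:-1] * energy_bins[1:])  # geometric centers of the log bins
plt.step(energy_bins[:-1], expect_energy, where='post', label='best fit')
plt.errorbar(bin_centers, data_energy, yerr=np.sqrt(data_energy), fmt='o', label='data')
plt.xscale('log')
plt.yscale('log')
plt.xlabel('Deposited energy [GeV]')
plt.ylabel('Events per bin')
plt.legend()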