HDF5 Booking and Reading Introduction

So far, we have worked with i3 files, which are very convenient for manipulating data; you will therefore use them during your event selection. However, once you have your final dataset and want to have a look at the results, i3 files have some drawbacks, since they can only be read sequentially, frame by frame.

The icetray framework provides file formats other than i3 files: hdf5 and root.
Here, we will cover hdf5 file booking and reading only.

In order to handle hdf5 files in ipython or the notebook (e.g. for plotting your data), we need the python-tables package installed.
Typing ipython in your shell starts the interactive ipython session:

In [1]:
import tables

If the package is not available, the above line will raise an ImportError. In this case you have to install the python-tables package.
If you are on the VM (or any other Debian-based linux machine), exit ipython and enter: sudo apt-get install python-tables
Then try to import tables again in ipython.
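In your own scripts you can also guard against a missing installation by wrapping the import; a minimal sketch (the flag name has_tables is just an illustration):

```python
# Check whether the python-tables package is importable
# without crashing the script.
try:
    import tables
    has_tables = True
except ImportError:
    has_tables = False

print("python-tables available: %s" % has_tables)
```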

Booking hdf5 files

The project that allows you to book flat tables like hdf5 tables is tableio.
Additionally, we will need the particular project that handles the hdf format: hdfwriter (if we wanted to write root files, we would need rootwriter correspondingly).
A minimal script that translates user-specified i3 file frame objects into flat hdf5 tables is this:

In []:
#!/usr/bin/env python

import sys

from icecube import icetray, dataio
from icecube import tableio, hdfwriter

from I3Tray import *

outfile = sys.argv[1]
infiles = sys.argv[2:]

tray = I3Tray()

tray.Add("I3Reader",FilenameList=infiles)

hdftable = hdfwriter.I3HDFTableService(outfile)

tray.Add(tableio.I3TableWriter,'hdf1',
         tableservice = hdftable,
         SubEventStreams = ['in_ice'],
         keys = ['I3EventHeader']
        )

tray.Execute(200)
tray.Finish()

First, we have to create the table service we want to use; it is passed to the I3TableWriter module provided by the tableio project.
We also have to tell the writer which SubEventStreams to book (we encountered in_ice and ice_top earlier).
Lastly, the frame objects we want to book have to be listed in keys.

Instead of booking particular frame objects via the keys parameter, we can also book all frame objects of a particular dataclasses type, using the types parameter (this requires from icecube import dataclasses in the script):

In []:
tray.Add(tableio.I3TableWriter,'hdf2',
         tableservice = hdftable,
         SubEventStreams = ['in_ice'],
         types = [dataclasses.I3Particle]
        )

This will book all frame objects of type I3Particle. You can have both I3TableWriters in the same script as long as you give the modules different names (here, the ability to name a module comes in handy).

The hdfwriter project also provides a python tray segment that performs the whole hdf5 booking in a single Add call ( http://code.icecube.wisc.edu/projects/icecube/browser/IceCube/projects/hdfwriter/releases/V15-03-00/python/__init__.py ):

In []:
tray.Add(hdfwriter.I3HDFWriter, 'hdf',
    Output=outfile,
    CompressionLevel=9,
    SubEventStreams=['in_ice'],
    Keys=['I3EventHeader']
    )

Troubleshooting

In order to book a frame object, the writer needs to know how to translate it. In particular, the project that contains the corresponding converter has to be imported in your script. If it is not, icetray will print errors to the shell, but the program will still run through and exit cleanly:

In []:
ERROR (icetray): frame caught exception "unregistered class" while loading class type "I3LogLikelihoodFitParams" at key "SPEFitSingleFitParams" (I3Frame.cxx:961 in I3FrameObjectConstPtr I3Frame::get_impl(__gnu_cxx::hash_map<I3Frame::hashed_str_t, boost::shared_ptr<I3Frame::value_t>, I3Frame::hashed_str_t_hash>::const_reference, bool) const)
ERROR (I3TableWriter): Frame object 'SPEFitSingleFitParams' could not be deserialized and will not be booked. (I3TableWriter.cxx:269 in void I3TableWriter::Convert(I3FramePtr))

Not all frame objects have converters to flat table objects implemented. These objects can not (yet) be booked in hdf5 files, and the writer aborts with an error:

In []:
FATAL (I3TableWriter): No converter found for 'I3MCTree' of type 'TreeBase::Tree<I3Particle, I3ParticleID, __gnu_cxx::hash<I3ParticleID> >' (I3TableWriter.cxx:96 in bool I3TableWriter::AddObject(std::string, std::string, I3ConverterPtr, I3FrameObjectConstPtr))

Make sure you have a comma after every item in the list; otherwise the affected objects will not be booked, and there is neither a warning nor an error.
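The reason for the silence is a Python language feature: adjacent string literals are concatenated, so a missing comma merges two keys into one name that matches no frame object. A quick plain-Python illustration:

```python
# A missing comma between two string literals silently merges them:
keys_typo = ['I3EventHeader'
             'TotalCharge']          # missing comma -> one bogus key
keys_ok = ['I3EventHeader',
           'TotalCharge']

print(keys_typo)  # ['I3EventHeaderTotalCharge']
print(keys_ok)    # ['I3EventHeader', 'TotalCharge']
```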

Reading hdf5 files

hdf5 files can be conveniently read in ipython, the ipython notebook, or python scripts.

We need to import the python tables package:

In [1]:
import tables

Open the file you want to work with (note that in recent versions of PyTables, openFile has been renamed to open_file):

In [2]:
my_file = tables.openFile('/home/icecube/i3/scripts/test.h5')

You access the objects in the file starting at its root node. Tab completion comes in really handy here, as it shows you which objects (e.g. I3EventHeader) are available.
A booked object usually has several fields, each filling its own column.
Let's look at the layout of these columns:

In [3]:
my_file.root.I3EventHeader.coldescrs
Out[3]:
{'Event': UInt32Col(shape=(), dflt=0, pos=1),
 'Run': UInt32Col(shape=(), dflt=0, pos=0),
 'SubEvent': UInt32Col(shape=(), dflt=0, pos=2),
 'SubEventStream': EnumCol(enum=Enum({'in_ice': 0}), dflt='in_ice', base=Int32Atom(shape=(), dflt=0), shape=(), pos=3),
 'exists': UInt8Col(shape=(), dflt=0, pos=4),
 'time_end_mjd_day': Int32Col(shape=(), dflt=0, pos=10),
 'time_end_mjd_ns': Float64Col(shape=(), dflt=0.0, pos=12),
 'time_end_mjd_sec': Int32Col(shape=(), dflt=0, pos=11),
 'time_end_utc_daq': Int64Col(shape=(), dflt=0, pos=9),
 'time_start_mjd_day': Int32Col(shape=(), dflt=0, pos=6),
 'time_start_mjd_ns': Float64Col(shape=(), dflt=0.0, pos=8),
 'time_start_mjd_sec': Int32Col(shape=(), dflt=0, pos=7),
 'time_start_utc_daq': Int64Col(shape=(), dflt=0, pos=5)}

The description shows us the column's data type (e.g. UInt32), its shape (a tuple), its default value (dflt), and its position (pos) in the table.

Let's print the event IDs stored in the I3EventHeader:

In [4]:
print "Event Nr", my_file.root.I3EventHeader.cols.Event[:10]
Event Nr [  8  17  50  88 134 139 171 174 176 200]

This returns an array of event IDs, which we can work with if we assign it to a variable: event_id = my_file.root.I3EventHeader.cols.Event[:]
We see that each event ID shows up only once, since the I3EventHeader summarizes a full event.
What happens if we print the content of OfflinePulses?

In [5]:
print "Charge", my_file.root.OfflinePulses.cols.charge[:100]
print "Event Nr", my_file.root.OfflinePulses.cols.Event[:100]
Charge [ 1.35137665  0.20345904  1.09731948  1.10716295  0.69570839  0.92778438
  0.82213461  1.04032433  1.82728612  0.84789002  1.05517828  1.7209003
  0.20389257  1.35772443  0.24323043  2.22634315  1.14647591  0.54103339
  1.17172372  0.84130961  1.76673746  0.87586147  1.13768351  1.11165977
  1.20190227  1.28818488  1.57756162  0.80558932  1.12772369  1.45038748
  1.03836918  0.76992059  0.36314341  0.49225807  0.91379881  0.59540397
  1.10084546  1.75127184  0.86143094  1.79990578  0.23149005  1.28169525
  0.33682501  0.90260136  1.08219874  1.38436592  0.86232346  1.21418834
  0.58640456  2.26622462  0.37661442  0.68322712  0.68813187  0.71307403
  0.22721849  0.49727988  0.78948653  1.37049973  0.45685816  1.91966438
  0.29205245  1.70578563  0.8885203   0.88749593  1.11761069  0.24877355
  0.52028656  0.33366835  0.27590105  1.32737339  0.32543108  1.03614211
  0.79824936  1.19477344  0.79732072  0.11996015  1.1589123   0.96281499
  1.17673051  0.260809    1.28910756  0.95523918  0.964607    1.14976239
  1.30370128  1.16178238  0.93663317  0.96692258  1.71367645  1.99491096
  1.22080123  1.25864661  0.88489348  0.25703129  1.89138329  1.3239826
  1.36886334  1.24800789  0.43490472  0.28742069]
Event Nr [ 8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8
  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8
  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8
  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8  8 17 17 17]

Each entry in OfflinePulses corresponds to a single DOM hit, i.e. there are many entries for the same event ID.
This is important to consider when combining different table entries.
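If only the per-pulse table is available, per-event totals can still be built by grouping on the event ID column. A small sketch with made-up numbers (in a real analysis the two lists would come from OfflinePulses.cols.charge[:] and OfflinePulses.cols.Event[:]):

```python
# Hypothetical per-pulse data: one charge entry per DOM hit,
# with the event ID repeated for every hit of that event.
charges = [1.5, 0.5, 2.0, 1.0, 0.25]
events = [8, 8, 8, 17, 17]

# Sum the pulse charges per event ID.
totals = {}
for ev, q in zip(events, charges):
    totals[ev] = totals.get(ev, 0.0) + q

print(totals)  # {8: 4.0, 17: 1.25}
```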
Most likely we are interested in the total charge of the event. It is in principle possible to sum the charges here, but it is much more convenient to do that already in the script that produced the hdf5 file:

In [6]:
print "Charge", my_file.root.TotalCharge.cols.value[:10]
print "Event Nr", my_file.root.TotalCharge.cols.Event[:10]
Charge [  96.39882559   49.00671897   42.70347497  167.44731747  102.08684008
  282.60557214   83.33017394   71.25939269   50.16582267   56.34283438]
Event Nr [  8  17  50  88 134 139 171 174 176 200]

In [7]:
%pylab inline
import pylab

pylab.figure()

binwidth = (numpy.log10(6000.)-numpy.log10(1500.))/4. # 4 bins between 1500 and 6000 pe
numbins = 15
edges = 10.**numpy.linspace( numpy.log10(10), numpy.log10(10)+binwidth*numbins, numbins+1 )

charge = my_file.root.TotalCharge.cols.value[:]
event_id = my_file.root.TotalCharge.cols.Event[:]
cut = (charge[:]>100) & (event_id[:]>200)

#my_hist = pylab.hist(charge, bins = edges, log = True, histtype='stepfilled', edgecolor = "b", color = 'b', alpha = 0.5, label = "atm E-2")
my_hist = pylab.hist(charge[cut], bins = edges, log = True, histtype='stepfilled', edgecolor = "b", color = 'b', alpha = 0.5, label = "atm E-2")

pylab.loglog()
pylab.xlim([10,2000])
pylab.ylim([1,300])

print "Total number of events shown:", sum(my_hist[0])

pylab.ylabel("counts")
pylab.xlabel("total charge (PE)")
Populating the interactive namespace from numpy and matplotlib
Total number of events shown: 127.0

Out[7]:
<matplotlib.text.Text at 0x7f3790d29e10>