# Workshop on Machine Learning for Cosmic-Ray Air Showers

US/Eastern
Embassy Suites by Hilton Newark Wilmington South

Talks and breaks will be in the hotel. Dinner (Wednesday evening): Gelato Restaurant, Main Street. Everything is in Newark, DE (Delaware); do not go to a Newark in any other state.
Description

Scope of the Workshop:

Machine learning for the study of air showers is a rising topic that has by now reached all detection techniques, whether the goal is a higher precision for the reconstruction of particle-detector arrays, a lower threshold of radio arrays, or improved gamma-hadron separation. This workshop aims to bring together those who actively work on machine learning in the context of air showers initiated by any type of primary particle (e.g., cosmic rays, gamma rays, or neutrinos). We deliberately plan for a small workshop, which allows for longer talks, plenty of question time, and extended discussions over coffee and lunch breaks. All machine-learning methods are welcome, be it boosted decision trees or any type of neural network. In addition to contributed talks, we plan for invited highlight talks on recent successful applications in our field as well as on machine-learning science and prevailing hardware and software techniques.

Zoom Link (passcode will be sent to participants)

https://udel.zoom.us/j/93931346766

Hybrid Mode: on site + online

Since a small workshop of the type we plan benefits from the informal discussions in the breaks and after the sessions, we prefer to have most participants attend on site. Nonetheless, we recognize that travelling may still be a challenge for some participants, which is why we offer the option of attending the talks online.

Talks:

We will have a number of invited and contributed talks. To allow plenty of time for discussion, talks should be planned with 10 minutes of questions included, so 35+10 or 20+10 minutes. In your presentation, please do not simply show the results, but also explain how you achieved them, where problems arose, and what questions remain open. We also have a few flash talks for students of 10+5 minutes.

Proceedings:

We plan to make abstracts and materials (e.g., a PDF of any slides shown) available via this Indico page during the conference and afterwards to upload them to Zenodo free of charge, which makes them available open access and provides a citable DOI. By submitting your abstract, you agree to the Zenodo terms. If you do not want to share all of the materials shown, upload a reduced version of the PDF containing only the publishable material.

There will be no written proceedings in a classical sense.

https://zenodo.org/communities/ml-airshowers-bartol2022/

Location (changed):

Embassy Suites by Hilton Newark Wilmington South

654 S College Ave, Newark, DE, US

(Due to recent Covid restrictions, the workshop has moved to the Embassy Suites hotel, which is close to the University of Delaware stadium.)

Travel:

The recommended airport is Philadelphia (PHL), but other airports are possible, in particular if you rent a car to come to Newark, DE (not Newark, NJ).

From PHL you can take an airport shuttle:

Delaware Express Shuttle https://delexpress.com/reservations ; use Group Discount code 114906

Further travel information is available here: http://www.udel.edu/visitus/

Hotel:

You are responsible for your room reservation / cancellations.

Here is a link to the new conference hotel. You may be able to get better deals through third-party websites.

https://www.hilton.com/en/hotels/newdees-embassy-suites-newark-wilmington-south/

The "Baymont by Wyndham Newark I-95 at University of Delaware" is a cheaper hotel option right next to the conference hotel.

If you stay somewhere else (e.g., at the Marriott): because the workshop is not held on campus, you are not allowed to use the UD shuttle which is free of charge and has a stop at the stadium close to the hotel.

Dining:

There are limited options available at and close by the conference hotel. More options are available downtown at Main Street (see bottom right).

Conference Fee:

We plan to charge a small fee of 100 USD for participation on site and of 50 USD for remote participation to cover local and administrative cost (sorry, no reimbursements possible).

Pay here: http://www.udel.edu/machinelearning

Cancellation Policy:

Cancellations before 31 October will receive a full refund. We cannot guarantee any refund for later cancellations. We are not responsible for any cancellation fee that may be charged by the hotel.

Covid Rules:

For the workshop you have to respect all Covid rules valid in Newark, Delaware and additional rules set by the University of Delaware. Currently these rules include among other measures:

• Vaccination mandate including a booster for those eligible (or a PCR test taken within the last 72 hours; a delayed test result is your risk and would exclude you from any in-person events).
• Fill out a daily health questionnaire and only participate in the workshop if you get cleared: http://covidcheck.udel.edu

Contact:

Feel free to reach out to the organizing committee if you have any questions.

Participants
• Abdul Rehman
• Achara Seripienlert
• Agnieszka Leszczyńska
• Alan Coleman
• Alexander Novikov
• Andreas Haungs
• Anjana Kaushik Talluri
• Aurélien Benoit-Lévy
• Beatriz de Errico
• Benedetta Bruno
• Benjamin Flaggs
• Brian Humensky
• Cyrin Neeraj
• Dana Kullgren
• Daniel Nieto
• Daniil Reutsky
• Danish Farooq Meer
• Dave Seckel
• David Williams
• Dennis Soldin
• Diana Leon
• Dmitriy Kostunin
• Dmitry Malyshev
• Donghwa Kang
• Eduardo Moreno
• Ek Narayan Paudel
• Eli Kasai
• Eliza Gazda
• Eric Mayotte
• Ernesto Belmont
• Ezequiel Rodriguez
• Federico Bontempo
• Felipe Orozco
• Felix Schlüter
• Frank McNally
• Frank Schroeder
• Gašper Kukec Mezek
• George Filippatos
• Gopal Bhatta
• Gregory Foote
• Hao Zhou
• Ibrahim Torres
• Igor Romanov
• Ilona Zubrytska
• Ioan Maris
• Isabella Brewer
• Ivan Kharuk
• Jakub Juryšek
• Jamie Zvirzdin
• Jeffrey Lazar
• Jigar Bhanderi
• Johannes Eser
• Jonas Glombitza
• Juan Carlos Díaz Vélez
• Juan Miguel Carceller
• Julian Saffer
• Kevin Almeida Cheminant
• Krishna Kumar
• Larissa Paul
• Leonel Morejon
• Logan Molchany
• Lucy Fortson
• Mahdi Bagheri
• Marcos Santander
• Margarita Tsobenko
• Marimuthu N
• Markus Roth
• Martin Liu
• Matthias Plum
• Mauricio Suarez-Duran
• Meghan Tanner
• Mikhail Kuznetsov
• Mikhail Zotov
• Mirco Huennefeld
• Najia Moureen Binte Amin
• Nataliia Borodai
• Nicolas Martin Gonzalez Pintos
• Nikita Petrov
• Noelia Santos
• Oleg Kalashev
• Oliver Isac Ruiz-Hernandez
• Olivier Martineau
• Orazio Zapparrata
• Paras Koundal
• Percy Cáceres
• Philip Ruehl
• Pierre-Simon Mangeard
• Pragati Mitra
• Pranav Sampathkumar
• Remy Prechelt
• Rocio Garcia Ginez
• Sandra LE COZ
• Serap Tilav
• Sergei Gleyzer
• Shehu AbdusSalam
• Sheridan Lloyd
• Sonja Mayotte
• Stef Verpoest
• Tanguy Pierog
• Tomás Capistrán
• Victor Manuel Luna Mendoza
• Xinhua Bai
• Yanee Tangjai
• Yuping Zheng
• Monday, 31 January
• 17:00 17:30
Registration
• 17:30 19:00
Welcome Reception
• Tuesday, 1 February
• 09:00 09:15
Introduction: Welcome
Convener: Frank Schroeder (University of Delaware / Karlsruhe Institute of Technology)
• 09:15 10:30
Tuesday
Convener: Frank Schroeder (University of Delaware / Karlsruhe Institute of Technology)
• 09:15
Deep learning in astroparticle physics 45m

In the past few years, deep-learning-based algorithms have been extraordinarily successful across many domains, including computer vision, machine translation, engineering, and science. Also, in physics, applications are accumulating due to the need for fast and precise algorithms that are able to exploit huge amounts of data. So, could it even become a new paradigm for data-driven knowledge discovery?

In this contribution, we introduce the fundamental concepts of deep learning, review the potential of this emerging technology, and illustrate the wide variety of possible applications in the context of particle and astroparticle physics.
Finally, we present novel approaches in the field and discuss future applications.

Speaker: Jonas Glombitza (RWTH AACHEN UNIVERSITY)
• 10:00
Machine Learning for High-Energy Physics Reconstruction and Analysis 30m Online

The Large Hadron Collider (LHC) is delivering the highest energy proton-proton collisions ever recorded in the laboratory, permitting a detailed exploration of elementary particle physics at the highest energy frontier. In this talk, I will discuss the application of machine learning to problems in high-energy physics, with a focus on the challenges associated with large, complex datasets from the Large Hadron Collider, and those expected from the High-Luminosity Large Hadron Collider. I will discuss the application of state-of-the-art machine learning methods to new physics searches at the LHC, detector reconstruction, event simulation and real-time event filtering at the LHC.

Speaker: Sergei Gleyzer
• 10:30 11:00
Coffee Break 30m
• 11:00 12:30
Tuesday
• 11:00
Exploitation of Symmetries and Domain Knowledge in Deep Learning Architectures 45m Online

The field of deep learning has become increasingly important for particle physics experiments, yielding a multitude of advances, predominantly in event classification and reconstruction tasks. Many of these applications have been adopted from other domains. However, data in the field of physics are unique in the context of machine learning, insofar as their generation process and the laws and symmetries they abide by are usually well understood. Most commonly used deep learning architectures fail at utilizing this available domain knowledge.
In this contribution, the importance of utilizing domain knowledge is highlighted and a hybrid reconstruction method is introduced that combines the benefits of maximum-likelihood estimation with those of deep learning. Domain knowledge, such as invariances and detector characteristics, can easily be incorporated in this approach. Although applicable to any simulation based experiment, the hybrid method is illustrated by the example of event reconstruction in IceCube.

Speaker: Mirco Huennefeld (Universität Dortmund)
• 11:45
Composition Analysis of cosmic-rays at IceCube Observatory, using Graph Neural Networks 45m Online

The IceCube Neutrino Observatory, located at the South Pole, is a multi-component detector that detects high-energy particles from astrophysical sources. Cosmic rays (CRs) are charged particles from these astrophysical accelerators. CRs and CR-induced air showers give us the possibility to discern the fundamental properties and behavior of such sources. When coupled to the IceTop surface array, IceCube affords unique three-dimensional detection and cosmic-ray analysis in the transition region from galactic to extragalactic sources. This work tries to improve the estimation of the CR primary mass on a per-event basis in this energy range. The work benefits from using the full in-ice shower footprint and additional composition-sensitive air-shower parameters, in addition to the global shower-footprint parameters already used in an earlier work. A Graph Neural Network (GNN) based implementation uses the full in-ice shower footprint. Described using nodes and edges, graphs allow us to efficiently represent relational data and learn hidden representations of input data to obtain better model accuracy. Mapping the in-ice IceCube detectors, the DOMs (Digital Optical Modules), as a graph emerges as a natural solution. Using GNNs for cosmic-ray analysis at IceCube also has the added benefit of allowing an easier re-implementation for the planned next-generation upgraded instrument, called IceCube-Gen2.

Speaker: Paras Koundal (Karlsruhe Institute of Technology)
• 12:30 14:00
Lunch 1h 30m
• 14:00 15:30
Tuesday
• 14:00
Measurement of the high-energy muon multiplicity in cosmic-ray air showers with IceTop and IceCube using neural networks 30m Online

The IceTop and IceCube detectors at the South Pole provide the opportunity to simultaneously measure the electromagnetic and low-energy muonic component of a cosmic-ray air shower at the surface, and the penetrating muons in the deep ice. Various properties of the bundle of muons above several 100 GeV measured in IceCube are sensitive to the mass of the primary cosmic ray and contain information about the hadronic physics of the first interactions in the atmosphere. By combining a maximum-likelihood reconstruction of the energy loss of the muon bundle with a simple Convolutional or Recurrent Neural Network, the multiplicity of muons above a certain energy threshold in the shower can be estimated with reasonable accuracy. Along with information on the electromagnetic shower component as measured by IceTop, this opens the possibility for a measurement of the evolution of the average high-energy muon content of air showers with primary energies from PeV to EeV.

Speaker: Stef Verpoest (University of Gent)
• 14:30
Air shower reconstruction using a Graph Neural Network for the IceAct telescopes 30m

The IceAct telescopes are prototype Imaging Air Cherenkov telescopes (IACTs) situated at the IceCube Neutrino Observatory at the geographic South Pole. Each telescope's camera consists of 61 silicon photomultipliers (SiPMs) with a hexagonal light guide glued to each SiPM. The IceAct telescopes measure the electromagnetic air shower component of cosmic rays in the atmosphere, which is complementary to the muonic component measured by the IceCube in-ice detector and the particle footprint measured at the surface by IceTop. The shape of the events and the number of SiPMs hit per event within the IceAct telescopes, together with the possibility of combining information from different detector components, make the IceAct data a perfect candidate for a reconstruction of particle type and energy using a graph neural network (GNN). In contrast to other neural networks, GNNs do not need a fixed structure between the nodes: the number of nodes can differ between events, and the connection between the nodes can be defined individually for each pair of nodes. A Monte Carlo study for a first GNN reconstruction of air shower events with the IceAct telescopes will be presented.
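As a rough illustration of the variable graph structure described in this abstract, the following Python sketch builds one graph per event from the SiPMs that were hit. The `(x, y, charge)` hit format, the function name `build_event_graph`, and the k-nearest-neighbour edge rule are assumptions made for this example; the abstract does not state how nodes and edges are defined in the actual study.

```python
import math

def build_event_graph(hits, k=2):
    """Build a graph from the hit pixels of one event.

    hits: list of (x, y, charge) tuples, one per hit SiPM (hypothetical format).
    Returns (nodes, edges): nodes is the input list, edges is a sorted list of
    undirected index pairs (i, j) with i < j.
    """
    edges = set()
    for i, (xi, yi, _) in enumerate(hits):
        # Distance from node i to every other node in this event.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj, _) in enumerate(hits)
            if j != i
        )
        # Connect node i to its k nearest neighbours (fewer if the event is small).
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return hits, sorted(edges)

# Two events with different numbers of hit pixels (made-up values).
small_event = [(0.0, 0.0, 12.0), (1.0, 0.0, 30.0), (0.0, 1.0, 8.0)]
large_event = [(0.0, 0.0, 5.0), (1.0, 0.0, 9.0), (2.0, 0.0, 7.0),
               (3.0, 0.0, 4.0), (10.0, 0.0, 6.0)]

_, small_edges = build_event_graph(small_event, k=1)
_, large_edges = build_event_graph(large_event, k=1)
print(len(small_event), "nodes ->", small_edges)
print(len(large_event), "nodes ->", large_edges)
```

The point of the sketch is only that each event yields its own node count and its own edge list, so no fixed input grid is required.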

Speaker: Larissa Paul (Marquette University)
• 15:00
Cosmic ray mass composition study using a Random Forest applied to data from the IceAct telescopes 15m

The IceAct telescopes are prototype Imaging Air Cherenkov telescopes (IACTs) situated at the IceCube Neutrino Observatory at the geographic South Pole. The IceAct telescopes measure the electromagnetic air shower component of cosmic rays in the atmosphere, which is complementary to the muonic component measured by the IceCube in-ice detector and the particle footprint measured at the surface by IceTop. For this Monte Carlo study, a random forest is used to analyze the mass composition of the cosmic-ray spectrum using the three independent measurements of the cosmic-ray air showers provided by the different detector components.

Speaker: Larissa Paul (Marquette University)
• 15:15
Pattern Recognition for Multiple Interactions in a Neutron Monitor 15m Online

The flux of Galactic cosmic rays at Earth is modulated by the long-term magnetic variations of the Sun (11-year sunspot cycle and 22-year magnetic solar cycle). This process, known as solar modulation, is most pronounced at 1 GeV and below. However, it also operates at much higher energy, still exhibiting solar magnetic polarity dependence. For the last decades, ground-based neutron monitors have provided valuable observations of the solar modulation up to a rigidity cutoff of about 17 GV. To extend the energy range of the neutron monitor observations, we recently upgraded the electronics of the Princess Sirindhorn Neutron Monitor in Thailand (PSNM, the operating neutron monitor at the highest geomagnetic rigidity cutoff) to record complex combinations of hits in multiple proportional counters. The variety of event topologies recorded at the PSNM indicates multiple sources: energetic atmospheric nucleons (GeV range), coincidences of secondary particles, and possibly small air-shower cores passing through the detector. We discuss these observations with a preliminary analysis of a detailed Monte Carlo simulation of energetic neutrons interacting in the detector.

Speaker: P.-S. Mangeard (University of Delaware)
• 15:30 16:00
Coffee Break 30m
• 16:00 17:30
Tuesday
• 16:00
Composition of 100 TeV - 100 PeV Cosmic Rays with IceCube and IceTop using Boosted Decision Trees 30m

IceTop is the surface component of the IceCube South Pole Neutrino Observatory and is dedicated to the indirect detection of cosmic rays (CRs). The recent implementation of a new trigger that only requires 2 of IceTop's 6 central infill stations to be hit by a CR-induced air shower made it possible to reduce the primary energy threshold for the detection of low-energy CRs from 1.6 PeV to 250 TeV. This led to a narrowing of the gap between direct and indirect CR measurements and to coverage of the entire knee region of the spectrum.

Apart from the reconstruction of primary energy, shower core position and zenith angle, this work aims to create a supervised machine-learning model that is capable of correctly predicting the mass composition of CR primaries. This requires the combination of signals from the surface and the corresponding tracks of high-energy muons within the deep in-ice detector below. For this purpose, tree-based methods, namely random forests and boosted decision trees, have been trained for regression and classification tasks on Monte Carlo shower data of four primary types. Additionally, plans for a potential implementation with neural networks are presented.

Speaker: Julian Saffer (Karlsruhe Institute of Technology)
• 16:30
Cosmic rays primary energy estimation using Machine Learning and combined reconstruction 30m

The IceCube Neutrino Observatory at the South Pole is capable of measuring two components of cosmic-ray air showers: the electromagnetic component, using the km$^2$ surface array IceTop, and the high-energy muonic component, using the km$^3$ in-ice array IceCube between 1.5 and 2.5 km below the surface. The combination of both arrays, in conjunction with a new flexible curvature and a new timing fluctuation function, provides an opportunity for possible improvements of cosmic-ray reconstruction. This work presents a preliminary investigation of possible improvements of cosmic-ray primary energy estimation (proton, iron, helium, and oxygen) by using machine-learning techniques and combined reconstruction.

Speaker: Diana Leon Silverio (South Dakota School of Mines and Technology)
• 17:00
Energy Reconstruction with Convolutional Neural Networks in IceTop 30m Online

IceTop, the surface component of the IceCube Neutrino Observatory, consists of 81 stations that detect air showers produced by cosmic ray interactions with the atmosphere. An accurate energy estimator for IceTop is essential for studying the nature of the cosmic ray spectrum around the knee (300 TeV - 1 EeV). Using over 400,000 simulated events, we trained an array of convolutional deep neural networks (CNNs) to reconstruct the energy of a cosmic ray primary based on the charges detected at the surface. Preliminary results show that charge-only CNN models can deliver an energy resolution better than 10%, with significant improvements when including reconstructed zenith. This result is consistent with independent energy reconstructions used by IceCube, and indicates the promise of a deep-learning approach.

Speaker: Frank McNally (Mercer University)
• Wednesday, 2 February
• 09:00 10:30
Wednesday
• 09:00
Machine learning based event reconstruction in Telescope Array surface detector 45m Online

The surface detector of the Telescope Array (TA) experiment is the largest one in the northern hemisphere. We give an overview of the machine-learning-based event reconstruction methods being developed by the TA collaboration. The key idea is to use a full detector Monte Carlo simulation to obtain the raw detector signal as a function of the primary particle properties and to train a deep convolutional neural network to model the inverse function. This technique can be used to enhance the energy and arrival direction reconstruction for individual events and to estimate the mass composition for an ensemble of events.

Speaker: Oleg Kalashev (INR RAS Moscow)
• 09:45
State-of-art deep learning technologies and their application to air-shower reconstruction 45m Online

Once again, the last several years have reshaped the state of the art in Computer Vision (CV). Non-convolutional approaches, such as Vision Transformers (ViT) and self-attention multi-layer perceptrons (SA-MLP), are quickly emerging, combined with novel optimization techniques and pre-training methods. ViTs and SA-MLPs are evidently better at incorporating global information about the input data; they are also not spatially invariant, which is more appropriate for cosmic-ray air-shower detectors. This contribution covers multiple approaches to unsupervised pre-training, a technique that allows the model to learn from unlabeled (i.e., experimental) data and thus increases model performance. However, each of the examined approaches is nontrivial to apply to air showers, which poses a challenge yet to be solved.

• 10:30 11:00
Coffee Break 30m
• 11:00 12:30
Wednesday
• 11:00
Deep Learning for Air Shower Reconstruction at the Pierre Auger Observatory 30m

The measurement of the mass composition of ultra-high energy cosmic rays constitutes one of the biggest challenges in astroparticle physics. Detailed information on the composition can be obtained from measurements of the depth of maximum of air showers, Xmax, with the use of fluorescence telescopes, which can be operated only during clear and moonless nights.

Using deep neural networks, it is now possible for the first time to perform an event-by-event reconstruction of Xmax with the Surface Detector (SD) of the Pierre Auger Observatory. Therefore, previously recorded data can be analyzed for information on Xmax, and thus the cosmic-ray composition. Since the SD operates with a duty cycle of almost 100% and its event selection is less strict than for the Fluorescence Detector (FD), the gain in statistics with respect to the FD is almost a factor of 15 for energies above $10^{19.5}$ eV.

In this contribution, we introduce the neural network particularly designed for the SD of the Pierre Auger Observatory. We evaluate its performance using three different hadronic interaction models and verify its functionality using Auger hybrid measurements.
Finally, we quantify the expected systematic uncertainties and show that the method makes it possible to determine the first two moments of the Xmax distributions up to the highest energies.

Speakers: Jonas Glombitza (RWTH AACHEN UNIVERSITY) , for the Pierre Auger Collaboration
• 11:30
Extraction of the Muon Signals Recorded with the Surface Detector of the Pierre Auger Observatory Using Recurrent Neural Networks 30m Online

We present a method based on the use of Recurrent Neural Networks to extract the muon component from the time traces registered with the water-Cherenkov detector (WCD) stations of the Surface Detector of the Pierre Auger Observatory. With the current design of the WCDs it is not straightforward to separate the contribution of muons to the time traces from those of photons, electrons and positrons in cosmic-ray showers dominated by electromagnetic particles. Separating the muon and electromagnetic components is crucial for determining the nature of the primary cosmic ray and the properties of hadronic interactions at ultra-high energies. We trained the neural network to extract the muon and the electromagnetic components from the WCD traces using a large set of simulated air showers, with energies between $10^{18.5}$ eV and $10^{20}$ eV and zenith angles below 60 degrees. The performance of this method is studied on experimental data of the Pierre Auger Observatory. It is shown that the predicted muon lateral distributions agree with the parameterizations obtained by the AGASA collaboration.

Speaker: Juan Miguel Carceller (University College London)
• 12:00
Neural Network Approaches for Event Classification Onboard EUSO-SPB2 30m

The Extreme Universe Space Observatory Super Pressure Balloon 2 (EUSO-SPB2) is under development, and will prototype instrumentation for future satellite-based missions, including the Probe of Extreme Multi-Messenger Astrophysics (POEMMA). EUSO-SPB2 will consist of two telescopes. The first is a Cherenkov telescope (CT) being developed to identify and estimate the background sources for future below-the-limb very high energy (E>10 PeV) astrophysical neutrino observations. The second is a fluorescence telescope (FT) being developed for detection of Ultra High Energy Cosmic Rays (UHECRs).

Super pressure balloons (SPB) are inherently risky due to the lack of flight controls compared to other orbital and suborbital craft. The recovery of data from the instrument is only possible if the mission is terminated over land; therefore, the only guaranteed data are those that can be downloaded during the flight. Because the limited satellite-based telemetry is shared between the two telescopes and housekeeping data, only roughly 1% of the events recorded with the FT are downloaded during the flight. This necessitates onboard classification schemes that assign priority to the data to be downloaded and that can be run using the limited computational resources of the SPB. We implement several architectures to achieve classification, including convolutional, recurrent and Long Short Term Memory (LSTM) neural networks. These networks were trained using a large library of simulated extensive air shower (EAS) signals and both simulated noise and data taken from previous EUSO experiments. Ultimately, the neural network approach shows great promise but will require additional pre-flight testing in order to be fully validated.

Speaker: George Filippatos (Colorado School of Mines)
• 12:30 14:00
Lunch 1h 30m
• 14:00 15:30
Wednesday
• 14:00
CORSIKA and CONEX for air shower simulations 45m

In order to properly train neural networks to analyze air shower data, it is necessary to have accurate simulations that provide the level of detail needed to extract the required information. The most popular tool is certainly the current version of CORSIKA and its fast option for 1D simulation, CONEX. We will present the basic principles of these tools and how to use them properly. The limitations, mostly coming from the hadronic interaction models, will be addressed to avoid any overinterpretation of what the simulations can really do.

Speaker: Tanguy Pierog (Karlsruhe Institute of Technology (KIT), IAP)
• 14:45
CORSIKA 8: A modern framework for high-energy cascade simulations 45m Online

The proliferation of innovative next-generation cosmic ray and neutrino observatories, with unique geometries (Earth-skimming, orbital, in-ice, etc.), and detection techniques (Cherenkov, radio, radar, etc.), requires the simulation of ultrahigh energy particle cascades which are challenging, if not impossible, to perform with current simulation tools like CORSIKA 7 and AIRES. These existing codes, which have been developed in FORTRAN for more than three decades, can be challenging to extend or extensively modify due to their fundamental software architecture, as well as due to their rigid assumptions about event geometries, the cascade environment, and the underlying physics models.

CORSIKA 8 is a completely new simulation framework, developed in modern C++ from the ground up, and is designed to simulate high- and ultrahigh-energy particle cascades in matter. CORSIKA 8 has been designed to be extremely flexible, extensible, and easy to use while also being extremely performant via the use of compile-time optimization and HPC techniques like SIMD, parallelization, and GPU acceleration. In particular, CORSIKA 8 provides standard "pluggable" components for creating cascade simulations with unique geometries and physics, not only in air, but also in any other media including water, ice, and the lunar regolith.

We present the current status of the CORSIKA 8 project including the currently supported hadronic and electromagnetic physics models, the included radio & Cherenkov emission modelling, and give an introduction to new simulations that will be or are already enabled by the CORSIKA 8 project.

Speaker: Remy Prechelt (University of Hawai'i)
• 15:30 16:00
Coffee Break 30m
• 16:00 17:30
Wednesday: Tutorial
• 16:00
Machine Learning and Artificial Intelligence in Physics: Overview and Applications 1h 30m Online

The use of computational algorithms, implemented on a computer, to extract information from data has a history that dates back to at least the middle of the 20th century. However, the confluence of three recent developments has led to rapid advancements in this methodology over the past 15-20 years: the advent of the era of large datasets, in which massive amounts of data can be collected, stored, and accessed efficiently; the development of computational algorithms that can perform classification and prediction to high degrees of accuracy across a variety of applied situations; and broad access to the computational power of modern computing systems that allow for the building of complex models of phenomenology in diverse domains. In this talk I will describe the fundamentals of Machine Learning (ML), how ML is used to extract information from data, the potential pitfalls to avoid when using ML in a variety of applications, the relationship between ML and what is currently commonly referred to as Artificial Intelligence (AI), and the transferability of ML from physics-based to non-physics-based problems. The second half of this presentation will consist of a live demo applying ML to a physics application in the Python programming language using publicly available tools.

Speaker: Gregory Dobler (University of Delaware)
• Thursday, 3 February
• 09:00 10:30
Thursday
• 09:00
Machine learning in Baikal-GVD 35m Online

Baikal-GVD is a large-scale underwater neutrino telescope currently under construction in Lake Baikal. Its principal component is a three-dimensional array of optical modules (OMs) registering Cherenkov light associated with the neutrino-induced particles. The OMs are organized in clusters, each containing 8 vertical strings with 36 OMs per string.

Located in a natural water reservoir, the OMs are exposed to the luminescence of the Baikal water. This necessitates the search for highly effective algorithms for noise rejection as the first step of data analysis. We developed a convolutional neural network reaching ~97% signal purity (precision) and ~99% survival efficiency (recall) for the signal hits on Monte-Carlo data. The architecture of the neural network exploits the causal connection between individual hits, rather than their spatial location.
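For readers less familiar with the two quoted figures of merit, the sketch below shows how signal purity (precision) and survival efficiency (recall) are computed from per-hit labels. The function name and the toy label lists are made up for illustration and are not Baikal-GVD data or code.

```python
def purity_and_efficiency(true_labels, predicted_labels):
    """Return (purity, efficiency) for binary hit labels (1 = signal, 0 = noise)."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Purity (precision): fraction of hits kept as signal that really are signal.
    purity = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Efficiency (recall): fraction of true signal hits that survive the selection.
    efficiency = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return purity, efficiency

# Toy example: 4 signal hits and 4 noise hits.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
purity, efficiency = purity_and_efficiency(truth, predicted)
print(purity, efficiency)
```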

The other problem we are solving with the help of neural networks is the reliable identification of neutrino events. The underlying issue is that the muon flux due to cosmic rays is many orders of magnitude higher than that of neutrinos. Hence, the discriminating algorithm must have an extremely small error rate. We discuss how this can be achieved by adjusting event weights and choosing a proper loss function for the neural network.
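One generic way to combine event weights with a loss function is a weighted binary cross-entropy, sketched below in plain Python. This is an assumption chosen for illustration: the abstract does not say which loss or which weights are used, and the weight of 10 for background muons is a hypothetical value.

```python
import math

def weighted_bce(labels, probs, weights, eps=1e-12):
    """Weighted binary cross-entropy, normalized by the sum of the weights.

    labels: 1 for neutrino (signal), 0 for atmospheric muon (background).
    probs: predicted probability of the signal class for each event.
    weights: per-event weights; larger values penalize errors on that event more.
    """
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)

labels = [1, 0]      # one neutrino event, one muon event
probs = [0.9, 0.2]   # classifier outputs (made-up values)

unweighted = weighted_bce(labels, probs, [1.0, 1.0])
# Hypothetical choice: mistakes on background muons count ten times more.
muon_upweighted = weighted_bce(labels, probs, [1.0, 10.0])
print(unweighted, muon_upweighted)
```

With the muon up-weighted, the same imperfect muon score contributes more to the loss, which pushes training towards a lower background misidentification rate at the cost of some signal efficiency.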

Speaker: Ivan Kharuk (Institute for Nuclear Research RAS)
• 09:35
Towards mass composition study with KASCADE using deep learning 20m Online

KASCADE was an air-shower detector located at the Karlsruhe Institute of Technology. It consisted of scintillation detectors arranged in a 16×16 grid, which recorded signals from the secondary particles of air showers. Data were acquired from 1996 to 2013 and have since been made available online. Our goal is to find out whether we can accurately reconstruct the primary particle from these ground-level data. At the current stage of our work we use CORSIKA simulations of this experiment, generating data for 5 mass groups of particles and training our classifiers on them. We have tested two models: decision trees and a convolutional neural network (CNN). After the training step we apply our models to the data from the real KASCADE experiment and check the credibility of the predicted particle distribution. In contrast to decision trees, CNNs are more sensitive to irregularities in the raw data, and thus the data have to be preprocessed by applying additional quality cuts. In this talk we present the performance of the developed classifiers and show our progress in preparing the raw data for the CNN.

Speaker: Daniil Reutsky (Moscow Institute of Physics and Technology)
• 09:55
IACT event reconstruction with deep learning: some progress, lessons learned, and outlook from CTLearn 35m Online

CTLearn is a project that aims at IACT event reconstruction through the use of deep-learning models. The associated software packages include modules for loading and manipulating IACT data, and for handling the training and testing of deep-learning architectures with TensorFlow, using pixel-wise camera data as input. In this contribution we will comment on the challenges we have faced so far, the lessons learned, our latest results, and our plans for the future.

Speaker: Daniel Nieto (Instituto de Física de Partículas y del Cosmos and Departamento de EMFTEL, Universidad Complutense de Madrid)
• 10:30 11:00
Coffee Break 30m
• 11:00 12:30
Thursday
• 11:00
Search for optimal deep neural network architecture for gamma detection at KASCADE 20m Online

We focus on a novel analysis of data from KASCADE, one of the most successful cosmic ray detectors in the >PeV range. The detector operated for about 15 years, and its data are publicly accessible. The data archive includes about half a billion recorded air showers. Extensive air showers generated by ultrahigh-energy gamma rays (not yet detected) are of particular research interest, since information about particles of this type allows us to learn about the properties of cosmic ray sources, as well as to study the nature of diffuse photons. The main problem is that this type of particle is difficult to distinguish from the background of cosmic protons, since the signatures left by protons and photons have similar characteristics. To solve this problem, we present a primary particle type classifier (gamma or proton) trained on simulation data of the KASCADE detector. For the classification, various deep learning approaches are applied.

Speaker: Margarita Tsobenko (Higher School of Economics University - St. Petersburg)
• 11:20
Photon flux calculation using Deep Learning 35m Online

Optical interferometry provides sub-milliarcsecond resolution of astronomical objects. Intensity interferometry is a branch of optical interferometry that deals with the correlation of intensities rather than of wave amplitudes. For successful measurements, one needs a large collecting area, such as an array of several telescopes separated by hundreds of meters, with good time resolution of the photon flux, e.g., imaging atmospheric Cherenkov telescopes such as H.E.S.S. and CTA. The measurements have high photon rates, so that the pulses in the PMTs from individual photons overlap. As a result, determining the rate by counting is not feasible. We use several neural networks (such as CNNs, LSTMs, and GRUs) to determine the rate of photons detected by the PMTs.
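
The pile-up problem described above can be illustrated with a small toy simulation. It is a sketch under simple assumptions, not the authors' setup: photons arrive as a Poisson process, and a photon is only counted if it arrives more than one pulse width after the previous one. The pulse width, rates, and time window are hypothetical values.

```python
import random


def simulate_arrival_times(rate_hz, duration_s, rng):
    """Draw Poisson-process photon arrival times within the given time window."""
    times = []
    t = rng.expovariate(rate_hz)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_hz)
    return times


def count_resolved_pulses(times, pulse_width_s):
    """Count photons that arrive more than one pulse width after the previous photon."""
    count = 0
    previous = None
    for t in times:
        if previous is None or t - previous > pulse_width_s:
            count += 1
        previous = t  # overlapping photons keep extending the merged pulse
    return count


rng = random.Random(0)   # fixed seed for reproducibility
pulse_width = 5e-9       # hypothetical single-photon pulse width: 5 ns
duration = 1e-4          # hypothetical time window: 100 microseconds

results = {}
for rate in (1e6, 1e8, 1e9):
    arrivals = simulate_arrival_times(rate, duration, rng)
    counted = count_resolved_pulses(arrivals, pulse_width)
    results[rate] = (len(arrivals), counted)
    print(f"rate {rate:.0e} Hz: {len(arrivals)} photons, {counted} resolved by counting")
```

At low rates nearly every photon is resolved, while at rates comparable to or above the inverse pulse width most pulses merge and simple counting strongly underestimates the flux.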

Speaker: Jigar Bhanderi
• 11:55
Improving the gamma-hadron separation for air showers at the IceCube Neutrino Observatory 35m

The IceCube Neutrino Observatory is a unique experiment located at the geographic South Pole. It is composed of two detectors: an optical array deep in the ice and an array of ice-Cherenkov tanks at the surface called IceTop. The combination of the two detectors can be exploited for the study of cosmic rays and the search for PeV photons. In particular, the in-ice detector measures the high-energy muonic component of air showers, while the surface detector measures all shower components and can be used for the general shower reconstruction. The aim of this work in progress is to discriminate between photon-initiated and cosmic-ray-initiated air showers. This discrimination is performed using a machine learning technique named Random Forest, a supervised technique that makes predictions on unknown data after being trained on labeled data. The physics quantities used for this study are the charges measured by the in-ice detector, the zenith angle, a parameter that describes the in-ice containment of the shower, the reconstructed energy, and a likelihood estimator that captures both the presence of individual muons and charge fluctuations in the surface array.
Furthermore, the planned enhancement of IceTop, comprised of surface radio antennas and scintillator panels, will contribute to the improvement of the gamma-hadron separation.

Speaker: Federico Bontempo (Karlsruhe Institute of Technology)
• 12:30 14:00
Lunch 1h 30m
• 14:00 15:30
Thursday
• 14:00
Open questions in deep learning techniques for the radio detection 45m Online

Deep learning techniques are now broadly applied to the processing of radio signals generated in air showers. The majority of the implementations are based on convolutional neural networks (CNNs) running on 1D arrays containing finite waveforms with radio impulses. This approach has proven feasible and can be implemented at both the trigger level and the high level of data collection and analysis. However, there is room for improvement, and some questions remain open. During my talk we reviewed the current progress in the field, pointed out the important issues and their possible solutions, and shared and discussed ideas on the optimal application of this technique.

Speaker: Dmitriy Kostunin (DESY)
• 14:45
Deep Learning for Classification and Denoising of Cosmic-Ray Radio Signals 30m Online

Radio emission, produced mainly as a result of the geomagnetic deflection of oppositely charged particles within cosmic-ray air showers, is contaminated by backgrounds such as the continuous Galactic background and thermal noise. This irreducible background poses a significant challenge for the radio detection of air showers. To mitigate the effect of this background, we employ machine learning (ML) techniques. Techniques such as convolutional neural networks (CNNs) have been widely used to analyze visual imagery. It is only recently that they have been adopted in many fields of science for the purpose of recognizing different patterns in the data. In this work, we use CNNs with the following two goals: to classify waveforms with signals against those that include only noise, and to extract the underlying radio signals from the contaminated traces. To produce the required dataset for training the models, we use CoREAS simulations, which calculate the radio signals from air showers. For the background we use the Cane model for the average Galactic noise, with an additional thermal component. Both signal and background traces are filtered to the 50-350 MHz frequency band before training. With these ML models, we aim to improve the detection threshold and also the reconstruction efficiency of the radio technique for cosmic-ray air showers.
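
The band-limiting step can be sketched as an idealized brick-wall filter in the frequency domain. This is a dependency-free illustration only: the abstract does not say which filter type or sampling rate is used, so the 1 GHz sampling rate, the trace length, and the test frequencies below are assumptions, and a real analysis would use an FFT routine rather than the slow direct transform written out here.

```python
import cmath
import math


def dft(samples):
    """Direct discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(samples)
    return [sum(samples[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def inverse_dft(spectrum):
    """Inverse transform; returns the real part, valid for a symmetric spectrum."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)).real / n
            for k in range(n)]


def bandpass(trace, sampling_rate_hz, f_low_hz, f_high_hz):
    """Zero all frequency bins outside [f_low_hz, f_high_hz] (brick-wall filter)."""
    n = len(trace)
    spectrum = dft(trace)
    for i in range(n):
        # Bins above n/2 hold negative frequencies; compare their absolute value.
        frequency = min(i, n - i) * sampling_rate_hz / n
        if not (f_low_hz <= frequency <= f_high_hz):
            spectrum[i] = 0.0
    return inverse_dft(spectrum)


sampling_rate = 1e9  # assumed sampling rate: 1 GHz
n_samples = 200      # gives 5 MHz frequency bins
time = [k / sampling_rate for k in range(n_samples)]

in_band = [math.sin(2 * math.pi * 100e6 * t) for t in time]         # 100 MHz, kept
out_of_band = [math.sin(2 * math.pi * 20e6 * t)
               + math.sin(2 * math.pi * 400e6 * t) for t in time]   # 20 and 400 MHz, removed
trace = [a + b for a, b in zip(in_band, out_of_band)]

filtered = bandpass(trace, sampling_rate, 50e6, 350e6)
residual = max(abs(f - s) for f, s in zip(filtered, in_band))
print(f"largest deviation from the in-band component: {residual:.2e}")
```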

Speaker: Abdul Rehman (University of Delaware)
• 15:15
Training Neural Networks to Classify and Denoise Cosmic-Ray Radio Signals Using Background Measured at the South Pole 15m

Cosmic-ray air showers produce radio signals which can be detected from Earth’s surface. However, the radio background that is detected along with these signals can make it difficult to distinguish an air shower signal from the local background. To solve this problem, this project aims to train two convolutional neural networks (CNNs): a “classifier” and a “denoiser”. The classifier distinguishes a trace containing an air shower signal from a trace containing only background. The denoiser takes a noisy signal and removes the noise (background) from it. The dataset used to train these networks includes simulated air shower signals produced in CoREAS as well as background traces recorded with a prototype station at the IceCube Neutrino Observatory at the geographic South Pole. The training and analysis are performed using the frequency band from 100 to 350 MHz. The goal of these CNNs is to improve the detection threshold of radio experiments, so that signals with lower energies can be detected, and to improve the removal of background noise from air shower radio signals. I will show how the CNNs perform in identifying cosmic ray signals and in extracting air shower pulses from the noisy waveforms.
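
The two training sets described above could be assembled roughly as follows. This is a minimal sketch that assumes simulated signals are added sample by sample to measured background traces; the abstract does not spell out the procedure, and the function name and toy traces are hypothetical.

```python
import random


def make_training_sets(signals, backgrounds, rng):
    """Build (trace, label) pairs for a classifier and (noisy, clean) pairs for a denoiser."""
    classifier_set = []
    denoiser_set = []
    for signal in signals:
        background = rng.choice(backgrounds)
        noisy = [s + b for s, b in zip(signal, background)]
        classifier_set.append((noisy, 1))                     # signal plus background
        classifier_set.append((rng.choice(backgrounds), 0))   # background only
        denoiser_set.append((noisy, signal))                  # target is the clean signal
    return classifier_set, denoiser_set


# Toy traces with four samples each (illustrative values, not real data).
rng = random.Random(1)
signals = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
backgrounds = [[0.1, -0.1, 0.2, 0.0], [-0.2, 0.1, 0.0, 0.1], [0.0, 0.0, -0.1, 0.1]]

classifier_set, denoiser_set = make_training_sets(signals, backgrounds, rng)
print(len(classifier_set), "classifier examples,", len(denoiser_set), "denoiser examples")
```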

Speaker: Dana Kullgren (University of Delaware)
• 15:30 16:00
Coffee Break 30m
• 16:00 17:30
Thursday
• 16:00
Crowdsourcing your training labels with Zooniverse 30m Online

In this presentation, I will describe the Zooniverse.org citizen science platform as a tool to gather labels from over 2.5 million dedicated volunteers worldwide who are motivated to participate in scientific research. Hundreds of research teams now turn to Zooniverse for crowdsourcing tasks such as image classification and annotation, which provide the large labeled data sets needed for optimal training of machine-learning algorithms. I will provide examples from across several relevant domains including particle physics, multi-messenger astrophysics, and IACT event categorization, with a focus on the Muon Hunter project used to gather millions of labels to train a CNN for an IACT calibration pipeline. I will demonstrate the ease with which a project can be developed with the Zooniverse Project Builder tools and describe the infrastructure available for integrating machine learning with Zooniverse, including sophisticated active learning techniques.

Speaker: Lucy Fortson (University of Minnesota)
• 16:30
What slow down cosmic ray analysis and what can we do about them? 25m

Cosmic ray analysis relies on multiple steps, including calibration, simulation, event reconstruction, and interpretation. Because of their broad energy coverage and sophisticated analysis and simulation techniques, large cosmic ray projects often suffer from their science analysis falling behind data collection. This challenge may be more severe in next-generation multi-messenger astroparticle physics projects, in which more hybrid detection techniques will be used. This presentation will list a few key "nodes" that often slow down the analysis and stimulate a roundtable discussion to see how we can mitigate the challenge by harnessing the Big Data revolution.

Speaker: Xinhua Bai (South Dakota School of Mines and Technology)
• 16:55
Workshop on Machine learning for Cosmic-Ray Air Showers - Summary & Outlook 25m

Summary of the 3-day workshop and outlook into the future of cosmic-ray analysis using state-of-the-art machine learning techniques.

Speaker: Matthias Plum (Marquette University)
• 17:20
Good Bye 10m
Speaker: Frank Schroeder (University of Delaware / Karlsruhe Institute of Technology)