# ICRR-MPP analysis pipeline for MAGIC and LST data
This repository contains the scripts needed to perform MAGIC+LST analysis with ctapipe.
Note: this package is under heavy development. Use it at your own risk, and consider contacting one of the recent committers if you plan to use this package.
A brief description:
- `config/CrabNebula.yaml`: an example of the configuration file, used by all the scripts.
- `config/magic-cta-pipe_config_stereo.yaml`: an example of the configuration file for stereo analysis.
- `hillas_preprocessing.py`: computes the Hillas parameters, looping over MCs and real data. This script uses the tailcuts cleaning.
- `hillas_preprocessing_stereo.py`: computes the Hillas and stereo parameters, looping over MCs and real data. This script uses the tailcuts cleaning.
- `hillas_preprocessing_MAGICCleaning.py`: computes the Hillas parameters, looping over MCs and real data. This script uses the MAGIC cleaning implemented in MARS.
- `hillas_preprocessing_MAGICCleaning_stereo.py`: computes the Hillas and stereo parameters, looping over MCs and real data. This script uses the MAGIC cleaning implemented in MARS.
- `train_energy_rf.py`: trains the energy RF.
- `train_direction_rf.py`: trains the direction "disp" RF.
- `train_classifier_rf.py`: trains the event classification RF.
- `apply_rfs.py`: applies the trained RFs to the "test" event sample.
- `add_orig_mc_tree.py`: adds the "original MC" tree info to the MC events tree processed earlier.
- `make_irf.py`: generates IRFs based on the event lists with reconstructed parameters.
- `make_event_lists.py`: produces the FITS event lists with application of the cuts.
Moreover, the `utils` directory contains two modules:

- `MAGIC_Badpixels.py`: finds the so-called bad/hot pixels, i.e. pixels affected by stars, turned off or dead.
- `MAGIC_Cleaning.py`: implements the MAGIC cleaning as defined in MARS.
There is also an IPython notebook, `magic_lst_event_coincidence.ipynb`, which shows how to perform the coincidence of events between MAGIC and LST1 data when data are taken by both systems.

Below you can find a more detailed description of the pipeline workflow.
## Configuration file `CrabNebula.yaml`
This is an example of the configuration file used by all the scripts of the pipeline. It follows the YAML standard, which is easy to parse and easily readable by humans. Through this file, the user can configure the details of the analysis: input and output files, cleaning settings, Random Forest generation and the analysis cuts to be applied to the events.
In more detail, the configuration file consists of a series of main keys, each having other nested (key, value) pairs. The main keys are:
- `data_files`
- `image_cleaning`
- `energy_rf`
- `direction_rf`
- `classifier_rf`
- `irf`
- `event_list`
`data_files` specifies the input and output files, both for simulated (Monte Carlo) and real data, denoted by the `mc` and `data` keys. Each set of data has `train_sample` and `test_sample` keys. For simulated data, the `train_sample` key refers to the sample used for the training of the Random Forests, whereas the `test_sample` is the sample used to compute the Instrument Response Functions (IRFs). For real data, the `train_sample` is what is usually called OFF data, used together with simulated data in the Random Forest algorithm, while the `test_sample` refers to the so-called ON data, that is the data the user wants to analyze.
Each `train_sample` and `test_sample` key has two sub-keys, called `magic1` and `magic2`. As their names imply, the input and output files are specified for each telescope independently, since the pipeline starts its processing from MAGIC calibrated data. If the analysis uses data from a third telescope, such as LST1, an additional key (called, for example, `lst1`) can be added to specify its input and output files. For the moment, though, the pipeline works with MAGIC data only.
Each telescope key is used to specify the input and output files at different stages of the pipeline:
- `input_mask`: specifies the input files to the pipeline; absolute and relative paths can be used; wildcards are allowed;
- `hillas_output`: specifies the name of the output file of the script `hillas_preprocessing.py`;
- `reco_output`: specifies the name of the output file after applying the Random Forests to the data. NB: this key must be set only for the `test_sample` data, either simulated or real.
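For illustration, a hypothetical excerpt of the `data_files` section could look like the following. All paths and file names here are placeholders, not values shipped with the repository:

```yaml
data_files:
  mc:
    train_sample:
      magic1:
        input_mask: "/data/mc/train/*_M1_*.root"   # wildcards are allowed
        hillas_output: "mc_train_m1_hillas.h5"
      magic2:
        input_mask: "/data/mc/train/*_M2_*.root"
        hillas_output: "mc_train_m2_hillas.h5"
    test_sample:
      magic1:
        input_mask: "/data/mc/test/*_M1_*.root"
        hillas_output: "mc_test_m1_hillas.h5"
        reco_output: "mc_test_m1_reco.h5"          # only for test_sample data
      # magic2 is analogous
  data:
    # train_sample (OFF data) and test_sample (ON data) follow the same layout
```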
The `image_cleaning` key specifies the cleaning parameters. In particular, since the cleaning settings are the same for both MAGIC telescopes, only one key, called `magic`, is used. As for `data_files`, when LST1 is added to the analysis in the future, an additional key should be added to specify the cleaning settings for that telescope.
The `energy_rf`, `direction_rf` and `classifier_rf` keys specify the settings used for each type of Random Forest used in the analysis. Each of these keys has the following sub-keys:
- `save_name`: the name of the output file for the specific Random Forest;
- `cuts`: a string with the cuts applied to the input data of the Random Forests;
- `settings`: a set of keys specifying the settings of each Random Forest, e.g. the number of estimators, the minimum number of events in each leaf and the number of jobs;
- `features`: a list of strings specifying the parameters to be used in the Random Forest training. NB: for the `direction_rf` key, `features` is actually a dictionary with two keys, `disp` and `pos_angle_shift`; for each of those keys, a list specifies the parameters used by the corresponding Random Forest.
The `irf` key has only one sub-key, called `output_name`, which is the name (plus path) of the file where the IRFs will be stored in FITS format.
Finally, the `event_list` key is used to specify cuts, both `quality` cuts and user `selection` cuts.
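To make the structure concrete, here is a hypothetical skeleton of the remaining sections (`classifier_rf` follows the same pattern as `energy_rf`). Every parameter name and value below, including the scikit-learn-style `settings` names and the exact nesting of `event_list`, is an illustrative guess rather than a repository default:

```yaml
image_cleaning:
  magic:
    picture_thresh: 6                 # placeholder cleaning thresholds
    boundary_thresh: 3

energy_rf:
  save_name: "energy_rf.joblib"
  cuts: "(intensity > 50)"            # placeholder cut string
  settings:
    n_estimators: 100                 # number of estimators
    min_samples_leaf: 10              # minimum number of events in each leaf
    n_jobs: 4                         # number of jobs
  features: ["intensity", "length", "width"]

direction_rf:
  save_name: "direction_rf.joblib"
  cuts: "(intensity > 50)"
  settings:
    n_estimators: 100
    min_samples_leaf: 10
    n_jobs: 4
  features:                           # a dictionary, not a list
    disp: ["intensity", "length", "width"]
    pos_angle_shift: ["intensity", "length", "width"]

irf:
  output_name: "irf/crab_irf.fits"

event_list:
  quality: "(intensity > 50)"         # placeholder quality cuts
  selection: "(theta2 < 0.03)"        # placeholder user selection cuts
```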
## Configuration file `magic-cta-pipe_config_stereo.yaml`
This configuration file is very similar to the previous one, but it should be used when a stereo analysis is to be performed. In particular, what changes with respect to `CrabNebula.yaml` is that there is only one telescope key, namely `magic`. This is because the input mask in this case specifies data from both M1 and M2, to allow for the stereo reconstruction.
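For example, the telescope level of the `data_files` section then collapses to a single `magic` key. A hypothetical excerpt (paths are placeholders):

```yaml
data_files:
  data:
    test_sample:
      magic:
        input_mask: "/data/crab/20210101_M*_*.root"   # matches both M1 and M2
        hillas_output: "data_test_hillas_stereo.h5"
```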
## `hillas_preprocessing.py`
The first script to run in the pipeline is `hillas_preprocessing.py`. It takes calibrated files (both simulated and real data) as input and processes them:

- it performs the image cleaning;
- it calculates the Hillas parameters (using the `ctapipe.image.hillas_parameters` and `ctapipe.image.leakage` functions);
- it computes the timing parameters (using the `ctapipe.image.timing_parameters.timing_parameters` function).
The settings of the cleaning, as well as the input and output files of the script, are specified in the configuration file. The format of the output files is HDF5.
MAGIC data are read through the `ctapipe_io_magic` module. It defines the `MAGICEventSource` class, which inherits from the `EventSource` class defined in `ctapipe`, used to set up classes that read different sources of data.
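Conceptually, the processing loop resembles the following sketch. It is a simplified illustration, assuming a recent `ctapipe`/`ctapipe_io_magic` API in which the event source fills the DL1 image containers; the file name and cleaning thresholds are placeholders:

```python
import numpy as np
from ctapipe.image import hillas_parameters, tailcuts_clean
from ctapipe_io_magic import MAGICEventSource

# Loop over calibrated MAGIC events and compute Hillas parameters
with MAGICEventSource(input_url="20210101_M1_05095172.001_Y_CrabNebula.root") as source:
    for event in source:
        for tel_id in event.dl1.tel.keys():
            geom = source.subarray.tel[tel_id].camera.geometry
            image = event.dl1.tel[tel_id].image

            # Tailcuts cleaning with placeholder thresholds
            clean = tailcuts_clean(geom, image, picture_thresh=6, boundary_thresh=3)
            if not np.any(clean):
                continue

            hillas = hillas_parameters(geom[clean], image[clean])
            print(tel_id, hillas.intensity, hillas.width, hillas.length)
```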
Running the script is straightforward:
```
$ python hillas_preprocessing.py --config=config.yaml
```

where `config.yaml` is the name of the configuration file.
Other available options are:
- `--usereal`: run the script only over real data;
- `--usemc`: run the script only over MC data;
- `--usetest`: run the script only over test sample data;
- `--usetrain`: run the script only over train sample data;
- `--usem1`: run the script only over M1 data;
- `--usem2`: run the script only over M2 data.
These options can be combined, e.g.:

```
$ python hillas_preprocessing.py --config=config.yaml --usereal --usetest --usem1
```

will run the script over real data from the test sample and from the M1 telescope only.
The next step in the pipeline is training the Random Forests for event classification, energy and direction reconstruction.
## `hillas_preprocessing_MAGICCleaning.py`
It is similar to `hillas_preprocessing.py`; the only difference is that it uses the MAGIC cleaning implemented in MARS. Its usage is the same as for `hillas_preprocessing.py`, see above.
## `hillas_preprocessing_stereo.py` and `hillas_preprocessing_MAGICCleaning_stereo.py`
These scripts are very similar to `hillas_preprocessing.py` and `hillas_preprocessing_MAGICCleaning.py`, but they also include the reconstruction of the stereo parameters.
Running the scripts is straightforward, e.g.:

```
$ python hillas_preprocessing_stereo.py --config=config_stereo.yaml
```

where `config_stereo.yaml` is the name of the configuration file appropriate for stereo analysis.
Other available options are:
- `--usereal`: run the script only over real data;
- `--usemc`: run the script only over MC data;
- `--usetest`: run the script only over test sample data;
- `--usetrain`: run the script only over train sample data.
## `train_energy_rf.py`, `train_direction_rf.py`, `train_classifier_rf.py`
These scripts take care of training different Random Forests with different purposes:
- `train_energy_rf.py` trains the Random Forest for the energy reconstruction;
- `train_direction_rf.py` trains the Random Forest for the event direction reconstruction;
- `train_classifier_rf.py` trains the Random Forest for the event classification.
`train_energy_rf.py` and `train_direction_rf.py` run on simulated data from both the train and test samples. `train_classifier_rf.py` instead runs on the test sample of simulated data and on OFF data.
Each script saves some performance summary plots as PNG images:
- `train_energy_rf.py` saves the energy migration matrix and the energy bias and RMS;
- `train_direction_rf.py` saves the theta2 histogram and the PSF as a function of energy and offset distance;
- `train_classifier_rf.py` saves the event classification histograms.
To run these scripts, taking `train_energy_rf.py` as an example, just do:

```
$ python train_energy_rf.py --config=config.yaml
```
If you want to run these three scripts over DL1 data containing the stereo information, i.e. generated by `hillas_preprocessing_stereo.py` or `hillas_preprocessing_MAGICCleaning_stereo.py`, you need to add the `--stereo` option when calling them from the command line.
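For orientation, the energy training conceptually boils down to something like the sketch below. This is not the pipeline's actual code: it assumes the `settings` keys map onto scikit-learn's `RandomForestRegressor` parameters, and the file, feature and column names are placeholders:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Placeholder DL1 file, feature and column names, for illustration only
dl1 = pd.read_hdf("mc_train_hillas.h5")
features = ["intensity", "length", "width"]

rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=10, n_jobs=4)
rf.fit(dl1[features], dl1["true_energy"])   # regress the true MC energy
```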
Once the Random Forests are trained, they can be applied to the data. Before that, one more step must be performed using the script `add_orig_mc_tree.py`, described in the next section.
## `add_orig_mc_tree.py`
The script `add_orig_mc_tree.py` opens the calibrated simulated files (for both train and test samples) to read the `OriginalMC` tree, which contains the information about the simulated true values for each event (e.g. energy and arrival direction). This information is then copied to the output files created by `hillas_preprocessing.py`.
Run this script with the command:

```
$ python add_orig_mc_tree.py --config=config.yaml
```
Other available options are:
- `--usetest`: run the script only over test sample data;
- `--usetrain`: run the script only over train sample data;
- `--usem1`: run the script only over M1 data;
- `--usem2`: run the script only over M2 data;
- `--stereo`: run over DL1 data containing stereo information (i.e. generated by `hillas_preprocessing_stereo.py` or `hillas_preprocessing_MAGICCleaning_stereo.py`).
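Conceptually, the copy performed by the script resembles the following `uproot` sketch. Only the `OriginalMC` tree name comes from the pipeline; the file names and the HDF5 key are placeholders:

```python
import uproot

# Read the true MC information from a calibrated simulated file
with uproot.open("GA_M1_za05to35_0_0001_Y_w0.root") as input_file:
    orig_mc = input_file["OriginalMC"].arrays(library="pd")

# Append it to the DL1 file produced by hillas_preprocessing.py
orig_mc.to_hdf("mc_train_m1_hillas.h5", key="original_mc", mode="a")
```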
After this step, the Random Forests can be applied to the ON data and simulated data (test sample).
## `apply_rfs.py`
The script `apply_rfs.py` applies the trained Random Forests (energy, event direction and classification) to the ON data and to the test sample of simulated data, reconstructing the properties of the events. The results of the reconstruction are saved in HDF5 output files, one for the ON data and one for the simulated data, as specified by the `reco_output` keys of the configuration file.
To run the script, just do:

```
$ python apply_rfs.py --config=config.yaml
```
If you want to run the script over DL1 data containing the stereo information, i.e. generated by `hillas_preprocessing_stereo.py` or `hillas_preprocessing_MAGICCleaning_stereo.py`, you need to add the `--stereo` option when calling it from the command line.
## `make_irf.py`
The script `make_irf.py` generates the instrument response functions (IRFs) starting from the test sample of simulated data, after the Random Forests have been applied to it. The result is a FITS file containing the following tables (the names are self-explanatory):
- `POINT SPREAD FUNCTION`
- `ENERGY DISPERSION`
- `EFFECTIVE AREA`
For the time being, the names of the reconstructed test sample simulated data file and of the output FITS file are hardcoded in the script; this will be changed in the future so that they can be set through the YAML configuration file. In any case, the script needs the configuration file to be passed as a command line argument:

```
$ python make_irf.py --config=config.yaml
```
If you ran `apply_rfs.py` with the `--stereo` option, then `make_irf.py` should also be called with the `--stereo` option.
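To quickly check the result, the generated FITS file can be inspected with `astropy` (the file name below is a placeholder for the currently hardcoded output name):

```python
from astropy.io import fits

with fits.open("irf.fits") as hdul:
    # Expect POINT SPREAD FUNCTION, ENERGY DISPERSION and EFFECTIVE AREA HDUs
    hdul.info()
```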
## `make_event_lists.py`
`make_event_lists.py` is the last script of the pipeline and is responsible for creating the event lists. First, a list of good time intervals (GTIs) is created, applying the cuts specified in the configuration file; then the event information (ID, time, sky coordinates and reconstructed energy) is extracted. The GTIs and the event information are used to create two tables in the resulting FITS files: one FITS file is generated for each MAGIC run.
To run this script:

```
$ python make_event_lists.py --config=config.yaml
```
If you used the `--stereo` option for the previous scripts, then `make_event_lists.py` should also be called with the `--stereo` option.
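Each per-run FITS file can then be read back with `astropy`. A sketch, assuming standard `EVENTS` and `GTI` extension names (the actual names in the produced files may differ):

```python
from astropy.table import Table

# File and extension names are assumptions, for illustration only
events = Table.read("crab_run_05095172.fits", hdu="EVENTS")
gti = Table.read("crab_run_05095172.fits", hdu="GTI")
print(f"{len(events)} events within {len(gti)} good time intervals")
```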