Udst maker

From HERMESwiki

The uDST production program

The uDST production program is a HANNA-frame program which, in each invocation, produces uDSTs for a fill's worth of data. This document aims to provide the necessary information for running this program correctly and to describe the technical details of the data handling performed by this program. This document is organized into three sections:

  1. how to execute the uDST production program, including a detailed description of each command line argument,
  2. the technical details about various aspects of the uDST production, and
  3. a few example commands.

Section 1: Executing the uDST production program and its command line arguments

Before executing the uDST production program, the necessary input files have to be arranged in a subdirectory. The easiest way to set up these files is to use the uDST daemon. If a fill has not yet been produced, the production manager can use the SETUP option to have the daemon execute the relevant setup scripts. Otherwise, the production manager should use the INSP option to have the daemon erase the existing uDSTs but keep the production structure intact.

Either way, once this setup has been done, the uDST production program can be executed by setting the current working directory to the production directory for this fill and executing the uDST_maker program by entering udst_maker with any of the command line arguments described in the next two subsections.


Subsection A: HANNA arguments

The first set of these arguments specifies the input files for the event and slow data and is passed directly to the HANNA main program. These arguments are:

--slow [filename] (default = no fillfile)

This argument specifies the fillfile from which the program will fetch the slow control information. If this specified fillfile is not the one containing the information for the runs specified to the program (via the --flist argument), the output is unpredictable.

NOTE: If this argument is not present (i.e. the default), uDSTs will be produced without any slow control information. This feature, however, is not supported and so may or may not work at the moment.
--sdriver [MRFIL] (no default for this parameter)

the driver to use for reading the fillfile; this argument is required by HANNA when a fillfile is specified via the --slow argument.

--flist [filename] (no default for this parameter)

the name of the file containing the list of hrc.devents files (with complete paths) from which the event data is fetched. If this parameter is not supplied, the list of hrc.devents files (with complete paths) must be included on the command line.

--driver [RFIL] (no default for this parameter)

the driver to use for reading the hrc.devents files; because the hrc.devents are generic DAD files since the 96b production, this option should be set to RFIL when running this program.

NOTE: Neither HANNA nor the uDST production program checks whether the specified slow and event data files contain information collected during the same fill. If they do not, the resulting uDSTs will very likely contain garbage.

Subsection B: uDST production program arguments

In addition to these HANNA arguments, the uDST production program also has many arguments which select features for the resulting uDSTs, for example, the choice of the type of uDSTs or the manner in which run changes are handled by the program. These features are controlled by command line arguments which are specific to the uDST production program. These arguments are:

--fillno [integer] (default = 0)

the fill number for this fill. This information is placed into the iFillNo field of the g1DAQ table.

--year [95|96|97|98] (default = 96)

this argument specifies the data year. This information is used to select various year-dependent features of the production. Using the following table, this information is used to determine the type of polarized target:

95 : MIT helium-3 target
96/97 : ABS hydrogen target
98 : ABS deuterium target
NOTE: At present, the 95 option has not been debugged and thus will produce garbage uDSTs.
--prodversion [4 character string] (default = "")

this argument sets the uDST production version. This information is placed into the cudstVersion field in the g1DAQ table.

--runswitch [hrcfile|next1103|next1303] (default = hrcfile)

this argument determines the method used by the program for determining when to switch runs.

--DSTdriver [FZ|DAD|CDAD] (default = FZ)

this argument determines the driver used to write the uDSTs; the three supported choices are:

FZ = ADAMO GAF
DAD = standard DAD file
CDAD = compressed DAD file
--writeDST (default = yes)

this argument, when present, instructs the program to write uDSTs. By default, the program will write the g1Data dataflow and thus produce inclusive (g1) uDSTs. To produce the other types of uDSTs, either the --writeSEMI or the --writePID option is required.

--writeSEMI (default = do NOT fill semi-inclusive uDST tables)

this argument, when present, instructs the program to fill and subsequently write the smData dataflow and thus produce semi-inclusive (sm) uDSTs. By default, the program does not include cluster information in these uDSTs. This information can be included by using the --fillCALO option.

NOTE: this argument is not compatible with the --writePID argument. If both arguments are specified on the command line, only the second argument of the two on the command line is kept.
--writePID (default = do NOT fill PID uDST tables)

this argument, when present, instructs the program to fill and subsequently write the pdData dataflow and thus produce PID (pd) uDSTs. By default, the program does not include cluster information in these uDSTs. This information can be included by using the --fillCALO argument.

NOTE: this argument is not compatible with the --writeSEMI argument. If both arguments are specified on the command line, only the second argument of the two on the command line is kept.
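
The "last one wins" rule for --writeSEMI and --writePID can be illustrated with a minimal Python sketch. The function name resolve_dst_type and the simple token loop are assumptions made for illustration; this is not the actual argument parser of the uDST production program.

```python
def resolve_dst_type(argv):
    """Return 'g1', 'sm', or 'pd' for a list of command line tokens.

    Hypothetical helper: it only models the documented rule that, when both
    --writeSEMI and --writePID are given, the later one is kept.
    """
    dst_type = "g1"  # default: inclusive (g1) uDSTs
    for token in argv:
        if token == "--writeSEMI":
            dst_type = "sm"  # overridden by a later --writePID
        elif token == "--writePID":
            dst_type = "pd"  # overridden by a later --writeSEMI
    return dst_type
```

For example, the token list ["--writeSEMI", "--writePID"] would select the PID (pd) uDSTs in this sketch, while the reverse order would select the semi-inclusive (sm) uDSTs.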
[KEEP|DROP] (default = KEEP)

this argument specifies whether or not events which passed the "g1" cuts (see below) are placed into the uDSTs.

--Q2 [real number greater than 0.0] (default = 0.3)

this argument specifies the minimum for the Q2 cut used for event selection of the g1 criteria.

[KEEP|DROP] (default = KEEP)

this argument specifies whether or not events which have two or more tracks or trackless clusters (see below) are placed into the uDSTs.

[KEEP|DROP] (default = KEEP)

this argument specifies whether or not events which passed the "cluster" cuts (see below) are placed into the uDSTs.

NONE] (default = NONE)

this argument specifies the triggers for which events are placed into the uDSTs without regard for the standard event cuts. If NONE is specified, then events from all triggers are kept only if they fulfill the standard uDSTs cuts.

NO] (default = NO)
--doclusters (default = do NOT fill cluster uDST tables) OBSOLETE

this argument specifies whether or not the program fills the cluster information table for the semi-inclusive (smCluster) or the PID (smClusterPID) uDSTs.

NO] (default = NO)

this argument specifies whether or not the LUMI cluster information is placed into the smLUMI table for each event.

NO] (default = NO)

this argument specifies whether or not the time-of-flight information from the H1 and H2 hodoscopes should be used to compute the mass of the particles.

NO] (default = NO)

this argument specifies whether or not the hadron identification information is placed into the smRICH table for each track.

NOTE: If the data year specified by the --year argument is before 1998, this argument will be rejected because the RICH does not exist.
NO] (default = NO)

this argument specifies whether or not the muon hodoscope information should be used to identify muons in the standard acceptance.

[HRC|UDST1|UDST2] (default = HRC)

this option selects the source for the PID likelihoods for the *hrc fields of the g1Track table. The three choices are:

HRC - use the likelihoods in the HRC rcPIDInfo2 table. These likelihoods were computed by HRC based upon parameterizations of each PID detector response.
UDST1 - use the likelihoods computed by the uDST production program using the data-based PID parent distributions from Kevin McIlhany, Mike McAndrews, and Felix Menden.
UDST2 - use the likelihoods computed by the uDST production program using the data-based PID parent distributions from Juergen Wieland.

For the 96b4, 96c0, and 97b3 uDSTs, the HRC PID was placed into the *hrc fields. For the 97b3 and 97c0, the UDST1 PID was placed into the *hrc fields.

[HRC|UDST1|UDST2] (default = UDST1)

this option selects the source for the PID likelihoods for the *new fields of the g1Track table. The three choices are:

HRC - use the likelihoods in the HRC rcPIDInfo2 table. These likelihoods were computed by HRC based upon parameterizations of each PID detector response.
UDST1 - use the likelihoods computed by the uDST production program using the data-based PID parent distributions from Kevin McIlhany, Mike McAndrews, and Felix Menden.
UDST2 - use the likelihoods computed by the uDST production program using the data-based PID parent distributions from Juergen Wieland.

For the 96b4, 96c0, and 97b3 uDSTs, the UDST1 PID was placed into the *new fields. For the 97b3 and 97c0, the UDST2 PID was placed into the *new fields.

--breakevent [event number] (default = no break event)

this argument, when present, instructs the program to print a message when the specified event is produced in user_event(). By placing a break point at this print statement, the program can be paused at the particular event and subsequently the processing of this event can be studied in detail.

--online (default = do NOT fill the online uDST tables)

this argument, when present, instructs the program to fill the online (g1Online) table and to perform online calibration corrections. This flag is intended for online (A) productions only.

[RAW_BRP|CALIB_BRP|CALIB_PCP|CALIB_TOM|CALIB_BEST95|EXTERNAL_ASCII] (default = RAW_BRP)
--tgtpol [RAW|CALIB|EXTERNAL_ASCII] (default = RAW) OBSOLETE

this argument controls the source of information for the rPol, rPolErr, and rPolSystErr fields of the g1Target table so that each person doing analysis does not have to make this selection. The choices are:

RAW_BRP - use the raw (i.e. online) BRP target data from the fillfile
CALIB_BRP - use the calibrated BRP target data from the fillfile
EXTERNAL_ASCII - use the calibrated BRP target data in an external ASCII file. The file is specified by the calibrated_target_FILE environment variable. If this variable is not defined, a default file is used. The format of this file is specified in the udst_handle_calibrated_target_info_from_ascii.c source code.
CALIB_PCP - use the calibrated PCP (pump-cell polarimeter) target data from the fillfile. This option is only valid for data collected in 1995.
CALIB_TOM - use the calibrated TOM (target optical monitor) target data from the fillfile. This option is only valid for data collected in 1995.
CALIB_BEST95 - use the calibrated PCP (pump-cell polarimeter) target data from the fillfile if it exists; otherwise, use the TOM (target optical monitor) data from the fillfile. This option is only valid for data collected in 1995.
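
The 1995-only restriction and the CALIB_BEST95 fallback described above can be sketched in Python as follows. The function names, the use of None for a missing measurement, and the decision to raise an error for other years are assumptions for illustration; the text only states that these options are valid for 1995 data.

```python
# Options which the text above marks as valid only for data collected in 1995.
ONLY_VALID_FOR_1995 = {"CALIB_PCP", "CALIB_TOM", "CALIB_BEST95"}


def check_target_source(choice, year):
    """Raise if a 1995-only source is requested for another data year.

    Hypothetical check: how the real program reacts is not documented here.
    """
    if choice in ONLY_VALID_FOR_1995 and year != 95:
        raise ValueError(f"{choice} is only valid for 1995 data, not year {year}")


def best95(pcp_value, tom_value):
    """CALIB_BEST95: use the PCP value if it exists, otherwise the TOM value."""
    return pcp_value if pcp_value is not None else tom_value
```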
--targetDQ [FILLFILE|EXTERNAL_ASCII] (default = FILLFILE)

this argument sets the source for the target data quality information placed into the iTargetDQ field of the g1Quality table. Normally, this information should be pulled from the fillfiles; however, if the fillfiles were made before this information was available, this feature allows the information to be pulled directly from the ASCII file supplied by the target group for the slow control production. The file is specified via the TARGETDQ_FILE environment variable. If this variable is not specified, a default file is used. The format of this file is described in the udst_handle_targetdq_from_ascii.c source code.

[e+|e-] (default = e+)

this argument specifies the type of particles in the beam and is used to fill the iBeamCharge field in the g1Beam table.

TPOL_CALIB_LPOL_RAW] (default = TPOL_CALIB)

this argument controls the source for the "best" beam polarization measurement, i.e. the data placed into the rPol, rPolErr, rPolSystErr, rPolFit, rPolNormFit, and rPolSystErr fields of the g1Beam table.

NOTE: If the data year specified by the --year argument is before 1997, the LPOL_RAW option will be rejected because the LPOL didn't work until after 1996.
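
The year-dependent rules stated so far (the target type per data year, the rejection of the RICH option before 1998, and the rejection of LPOL_RAW before 1997) can be collected in a small Python sketch. The function check_year_options and its parameter use_rich are hypothetical names introduced for illustration only; the real program's checks may be organized differently.

```python
# Polarized target type per two-digit data year, as listed for --year.
TARGET_BY_YEAR = {
    95: "MIT helium-3 target",
    96: "ABS hydrogen target",
    97: "ABS hydrogen target",
    98: "ABS deuterium target",
}


def check_year_options(year, beampol="TPOL_CALIB", use_rich=False):
    """Validate year-dependent options and return the target type.

    Hypothetical helper: 'use_rich' stands in for the RICH option, whose
    flag name is not given in this document.
    """
    if year not in TARGET_BY_YEAR:
        raise ValueError(f"unsupported data year: {year}")
    if use_rich and year < 98:
        raise ValueError("RICH information requested for a year before 1998")
    if beampol == "LPOL_RAW" and year < 97:
        raise ValueError("LPOL_RAW requested for a year before 1997")
    return TARGET_BY_YEAR[year]
```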
--TransPSmoothed [FILLFILE|EXTERNAL_ASCII] (default = FILLFILE)

this argument sets the source for the smoothed TPOL information placed into the rFitTPol, rFitTPolNorm, rFitTPolSyst, and iFitTPolGap fields of the g1Beam table. Normally, this information should be pulled from the fillfiles; however, if the fillfiles were made before this information was available, this feature allows the information to be pulled directly from the ASCII file supplied by the TPOL group for the slow control production. The format of this file is described in the udst_handle_smooth_TPOL_from_ascii.c source code. The file is specified via the TransPSmoothed_FILE environment variable. If this variable is not specified, a default file is used.

NOTE: When the TPOL is selected as the "best" beam polarimeter (via the --beampol option), this information will also be placed into the rPolFit, rPolNormFit, and rPolSystErr fields of the g1Beam table even if the information is drawn from an external ASCII file.
--dir [character string] (default = current working directory)

this argument sets the directory in which the uDSTs will be written.

RUNTYPE)

this argument instructs the program to produce ASCII history files recording information about specific occurrences in the production. Currently, the following histories can be produced:

  1. for record statistics, use BURST
  2. for information on each VME time jump, use VMEJUMPS
  3. for info on each synchronization problem, use BADSYNC
  4. for run statistics, use RUN
  5. for run type statistics, use RUNTYPE
  6. for record type statistics, use RECORD
  7. for target polarization, use TARGPPOL
  8. for g1 data quality, use G1DQ

The contents and format of the resulting history files are described in the udst_handle_history.c source code.

--g1DQ [FILLFILE|COMPUTED|EXTERNAL_ASCII] (default = FILLFILE)

this argument determines the source of the g1DQ information in the g1Quality table. Currently, the user can select one of the following sources for this information:

  1. the g1DQ table in the fillfiles by specifying the FILLFILE option -- this table contains the data quality determined by the g1 group from their analysis of the previous uDST production.
  2. the computation done by the uDST production program using the cuts specified in the g1_quality_computed.c source code by specifying the COMPUTED option.
  3. an external ASCII file by specifying the EXTERNAL_ASCII option. NOTE: this option has not yet been implemented.
--debug [see list below] (default = ALLOFF)

this argument instructs the program to produce debugging output. The possible types of debug output and the associated flags are:

SYNC : turn on output about the synchronization, equivalent to setting the ALLSLOW and EVENT debug output
ALLSLOW : produce a line of output for each slow table processed
EVENT : produce a line of output for each event processed
LUMI : produce a line of output each time the lumi information is updated with the new lumi information
TARGET : produce output about the new target polarization and alpha values whenever the information is updated or a new uDST record is started
BEAMPOL : produce output about the beam polarization whenever there is an update
SLOWTABS : produce a line of output for each slow table processed
UNPOL : produce output whenever the unpolarized gas target information is updated
ACE : produce output during the processing of ACE data
TRKEFF : produce output during the processing of tracking efficiency data
ALLON : turn on all debugging output, equivalent to setting all of the above flags
ALLOFF : turn off all debugging output, the default
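
The relationship between the composite flags (SYNC, ALLON, ALLOFF) and the individual flags can be sketched in Python. The function expand_debug_flags is a hypothetical helper, and processing the names in order (so that a later ALLOFF clears earlier flags) is an assumption; the document does not say how several flags are combined.

```python
# Individual debug flags from the list above (composite flags excluded).
BASIC_FLAGS = {
    "ALLSLOW", "EVENT", "LUMI", "TARGET", "BEAMPOL",
    "SLOWTABS", "UNPOL", "ACE", "TRKEFF",
}


def expand_debug_flags(names):
    """Expand a sequence of debug flag names into the set of active flags."""
    active = set()
    for name in names:
        if name == "ALLOFF":
            active.clear()                    # the default state
        elif name == "ALLON":
            active |= BASIC_FLAGS             # everything listed above
        elif name == "SYNC":
            active |= {"ALLSLOW", "EVENT"}    # documented equivalence
        elif name in BASIC_FLAGS:
            active.add(name)
        else:
            raise ValueError(f"unknown debug flag: {name}")
    return active
```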

When executed, the uDST production program will place the list of command line options in a file called "argument_flags_file" in the current working directory. The program will then produce uDSTs for each specified run. As it is processing the data, the program writes general messages about the data, any errors and, if the debugging features are active, debugging output to the standard output. This information can be collected by piping the output to a logfile.


Section 2 : technical details about the uDST production

Like any program, the uDST production program has many technical details which are important for understanding the data in the produced uDSTs. The following subsections provide this information.


Subsection A: event level cuts applied to the uDSTs

As part of the uDST production, crude cuts are applied to the event data to remove events which are not of interest. These cuts depend upon the type of uDSTs as follows:

g1 (inclusive) -

a track is included into these uDSTs if:

  • the event containing the track fired trigger 21,
  • the track was marked as "used" by HRC,
  • the track momentum was greater than 0.1 and less than 35 GeV, and
  • the track Q2 was greater than 0.3 GeV^2 (the default of the --Q2 argument).
sm (semi-inclusive) -

in addition to the tracks present in the g1 (inclusive) uDSTs, these uDSTs contain the tracks from events which fulfilled at least one of the following conditions:

  • the event had two or more charged particles (i.e. tracks),
  • the event had two or more untracked clusters,
  • the event had one cluster which passed the "single cluster" cut, i.e. the sum of the momentum of one of the tracks and one of the untracked clusters with an energy above 0.8 GeV in the event was greater than 23.5 GeV and the phi opening angle between these two objects was greater than 2 radians.
pd (PID) - same as semi-inclusive uDSTs, except that only trigger 21 events are considered.
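
The cuts listed in this subsection can be written down as a short Python sketch. All function names and the data layout (tracks as (momentum, phi) pairs, untracked clusters as (energy, phi) pairs) are assumptions for illustration. In particular, the sketch assumes that the "sum" in the single cluster cut is track momentum plus cluster energy and that the phi opening angle is the smallest azimuthal difference; the actual production code may define these differently.

```python
import math


def passes_g1_track_cut(fired_trigger_21, used_by_hrc, momentum, q2, q2_min=0.3):
    """Track-level cut for the inclusive (g1) uDSTs (momentum in GeV)."""
    return bool(
        fired_trigger_21
        and used_by_hrc
        and 0.1 < momentum < 35.0
        and q2 > q2_min
    )


def phi_opening_angle(phi_a, phi_b):
    """Smallest angle between two azimuthal angles, in [0, pi] (assumed definition)."""
    diff = abs(phi_a - phi_b) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)


def passes_single_cluster_cut(tracks, clusters):
    """tracks: (momentum, phi) pairs; clusters: untracked (energy, phi) pairs."""
    for momentum, track_phi in tracks:
        for energy, cluster_phi in clusters:
            if (energy > 0.8
                    and momentum + energy > 23.5
                    and phi_opening_angle(track_phi, cluster_phi) > 2.0):
                return True
    return False


def passes_sm_event_cut(tracks, clusters):
    """Event-level conditions added by the semi-inclusive (sm) uDSTs."""
    return (len(tracks) >= 2
            or len(clusters) >= 2
            or passes_single_cluster_cut(tracks, clusters))
```

For the PID (pd) uDSTs, the same event-level test would be applied only to events which fired trigger 21.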

Subsection B: record switching

Normally, the uDST production program uses the slow data (i.e. the scaler events) to partition the data into records. In this scheme, the program will initiate a new record whenever a scaler event is processed, or equivalently, a burst or a polarization change is detected in the slow data. The processing of these events involves the sequential processing of several slow control tables, e.g. the dcScalInfo, dscSCALERS, and dcBurstInfo to name the three major tables. Since the order in which these tables are processed is not guaranteed by HANNA, a record switch cannot be reliably initiated by the processing of one of these tables because it's possible that one of the others has yet to be processed. Instead, to avoid this issue, the uDST production program initiates the record switch just prior to handling the first detector event handled after processing the dcScalInfo table. Since HANNA does not deliver this event until all of the updated slow control tables for a scaler event have been processed, this arrangement assures that the slow data is up-to-date when the new record is initiated.
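
The deferred record switch described above amounts to a small state machine, sketched here in Python. The stream format (("slow", table name) and ("event", id) pairs) and the function name record_starts are assumptions for illustration; HANNA's real delivery interface is not shown in this document.

```python
def record_starts(stream):
    """Return the ids of the detector events before which a record is started.

    stream: iterable of ("slow", table_name) or ("event", event_id) items in
    the order in which they are delivered (hypothetical format).
    """
    starts = []
    switch_pending = False
    for kind, value in stream:
        if kind == "slow":
            if value == "dcScalInfo":
                # A scaler event is being processed; other slow tables may
                # still follow, so the switch is only remembered here.
                switch_pending = True
        elif kind == "event":
            if switch_pending:
                # All slow tables of the scaler event have been delivered.
                starts.append(value)
                switch_pending = False
    return starts
```

In this sketch the order of the slow control tables within one scaler event does not matter, which is the point of deferring the switch to the next detector event.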

However, when the slow data is not available, i.e. no fillfile is specified by the --slow command line argument, this scheme is not possible. In this case, the program will initiate a new record when the burst number or polarization state changes in the event data. Although the resulting uDSTs will still contain the tables for the slow data, this information is not valid and should be ignored in any subsequent analysis of these uDSTs.

Finally, because some information in a record is available prior to the record switch (e.g. all the scaler information), some information arrives during the processing of the event data (e.g. the ACE data), and still other information is not available until after the record is processed (e.g. any data quality flags which depend upon event data), the non-event tables in a uDST are filled at different times. To be specific,

at start of record : the target info, the beam polarization info, the g1DAQ table, the HV trip (g1HVTrip) table, the detector DQ info, and the hits/plane/burst (g1Detector) table
at end of record : the mean PID responses (g1BurstStat) table, the ACE (g1ACE/g1ACEcnts) tables, the track efficiency (g1TrkEffi) table, and the g1Quality bits which depend upon the event data.

If necessary, one can see exactly when each field in the uDSTs is filled by looking at the udst_DST.c source code.


Subsection C: synchronization of the slow data with the event data

Central to the making of the uDSTs is the need to assure that, in the filling of the uDSTs, synchronization problems between the event and slow data are detected and subsequently marked in the uDSTs. For this reason, the uDST production program checks that the events, which are processed in chronological order according to VME clock-1 time, have the same burst number and polarization states as the scaler event which defines the boundaries of a uDST record. If a discrepancy is detected, the uDST program marks the record by setting the appropriate bits in the iuDSTbad field of the g1Quality table.

The bit settings are:

0x00000001 = one or more events crossed a spin-flip boundary as defined by the 1203 scaler events. In this case, the affected record should be discarded from the analysis because the polarization state is potentially ambiguous.
0x00000040 = in the 1996 data, the VME clock-1 time for a scaler event was derived from the VME clock-1 time of the preceding detector event. Since the slow control production computes the start time of the scaler event by subtracting 1 microsecond from the VME clock-1 time of the preceding scaler event, the detector event will be processed after the scaler event. This effectively moves a detector event across a spin-flip boundary. However, in truth, there is no problem.
0x00000080 = in the 1996 data, the VME clock-1 time for a scaler event was derived from the VME clock-1 time of the preceding detector event. Since the slow control production computes the start time of the scaler event by subtracting 1 microsecond from the VME clock-1 time of the preceding scaler event, the detector event will be processed after the scaler event. This effectively moves a detector event across a spin-flip boundary. However, in truth, there is no problem.

Subsection D: burst number jumps

In general, the bursts should be numbered in ascending, consecutive order. So, when the uDST production program detects a jump in the burst numbering, it marks the records by setting the following bits in the iuDSTbad field of the g1Quality table:

0x00000800 = burst number jump detected in the scaler data at the start of the record. In this case, a scaler event has probably been lost.
0x00000400 = burst number jump detected in the scaler data at the end of the record. In this case, a scaler event has probably been lost.
0x00000200 = burst number jump detected in the event data at the start of the record. In this case, the DAQ might not have stored all of the events for this record and thus the record should be dropped from the analysis.
0x00000100 = burst number jump detected in the event data at the end of the record. In this case, the DAQ might not have stored all of the events for this record and thus the record should be dropped from the analysis.
0x00001000 = burst number jump detected in the event or slow data at the end of the run. In this case, the DAQ was probably halted or paused and thus it did not store all of the events for this record. For this reason, the record should be dropped from the analysis.
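
An analysis reading the uDSTs might decode the iuDSTbad field along the following lines. This Python sketch covers the spin-flip bit from the previous subsection and the burst number jump bits listed here; the names IUDSTBAD_BITS, DROP_MASK, describe, and should_drop are hypothetical. The drop mask contains only those bits for which the text explicitly recommends discarding the record.

```python
# Bit meanings taken from the lists above (0x40 and 0x80 are omitted because
# their descriptions state that there is no real problem).
IUDSTBAD_BITS = {
    0x00000001: "event crossed a spin-flip boundary",
    0x00000100: "burst number jump in event data at end of record",
    0x00000200: "burst number jump in event data at start of record",
    0x00000400: "burst number jump in scaler data at end of record",
    0x00000800: "burst number jump in scaler data at start of record",
    0x00001000: "burst number jump at end of run",
}

# Bits for which the text says the record should be dropped or discarded.
DROP_MASK = 0x00000001 | 0x00000100 | 0x00000200 | 0x00001000


def describe(iudstbad):
    """Return the descriptions of all known bits set in iuDSTbad."""
    return [text for bit, text in sorted(IUDSTBAD_BITS.items()) if iudstbad & bit]


def should_drop(iudstbad):
    """True if any bit with an explicit 'drop the record' recommendation is set."""
    return bool(iudstbad & DROP_MASK)
```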

Subsection E: run switching

Presently, the uDST production program supports two methods for defining the run boundaries:

  1. the run boundary is placed at the end of the hrc.devents files or
  2. the run boundary is placed at the first 1103/1303 "burst change" scaler event processed after a new hrc.devents file is opened by HANNA.

This choice arises from the procedure used by the DAQ to handle a run switch. Presently, the DAQ decides whether or not to make a run switch when the R5 machine (the UNIX machine on which all of the data taping is done) receives an 1103/1303 "burst change" scaler event. At this time, the DAQ checks whether the amount of collected data exceeds 450 MB, roughly the size of a tape on the GRAU robot. When it does, the DAQ initiates a run switch. This switch entails opening a new run file for storing the incoming data and then performing some initialization procedures on the readout and some detector electronics. At this time, however, the DAQ does not flush the data in its front-end memory for the recent detector events, possibly including even 1203 "polarization flip" scaler events, to the R5 machine. These data arrive later at the R5 machine and thus are stored at the beginning of the new run file which was opened when the run switch was performed. As a result, data from the previous run are carried over into the next run file, thereby blurring the run boundary.

When making production uDSTs, the last burst of the run, which has data split across two runs on tape, should be bridged together. For this purpose, the --runswitch option is set to NEXT1303. In this mode, the uDST production program switches runs after processing the first 1103/1303 "burst change" scaler event received after the last event in an hrc.devents file. Delaying the run switch in this way keeps the data stored at the beginning of each run file, which were sitting in the front-end memory at the time of the run switch, with the run in which they were collected.

However, because event data are moved from one hrc.devents file to another with this scheme, this feature complicates the task of debugging the uDSTs. Namely, it is no longer possible to expect the same number of tracks in the uDST as in the original hrc.devents file for a run. For this reason, when making test productions, the program should do a run switch after the last event of each hrc.devents file has been processed. This option is selected by setting the --runswitch command line argument to HRCFILE.
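
The difference between the two run switching modes can be made concrete with a Python sketch that assigns a run index to each event. The stream format, with ("eof", None) marking the end of an hrc.devents file, and the function name assign_runs are assumptions for illustration; the NEXT1303 branch follows the description above and treats both 1103 and 1303 scaler events as "burst change" events.

```python
def assign_runs(stream, mode):
    """Map event ids to run indices for mode 'HRCFILE' or 'NEXT1303'.

    stream items (hypothetical format): ("event", event_id),
    ("scaler", event_type), or ("eof", None) at the end of an hrc.devents file.
    """
    run = 0
    switch_pending = False
    assignment = {}
    for kind, value in stream:
        if kind == "eof":
            if mode == "HRCFILE":
                run += 1                 # switch at the file boundary
            else:
                switch_pending = True    # wait for the next burst change
        elif kind == "scaler":
            if switch_pending and value in (1103, 1303):
                run += 1
                switch_pending = False
        elif kind == "event":
            assignment[value] = run
    return assignment
```

With a stream in which events 2 and 3 sit at the beginning of the second file, ahead of the first 1303 scaler event, the HRCFILE mode places them in the new run, whereas the NEXT1303 mode keeps them with the previous run.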


Subsection F: handling ACE data

Because ACE only sees the event data, the ACE tables in the fillfile have the timestamps of the first reconstructed event in each burst or polarization state and not the timestamp of the scaler event associated with the change. Consequently, HANNA does not deliver these tables when the scaler event is processed but instead delivers them after the first event has been processed. To make sure that the correct ACE tables are placed into each record, this information is queued.


Subsection G: handling tracking efficiency data

Since the tracking efficiency information is computed from the raw ACE GAF files (via a code by Helmut Boettcher), the resulting tables will have the same timestamps as the ACE data and thus be out of sync for the same reason. So, like the ACE data, this information is buffered by the uDST production program and later merged into the uDSTs by burst number and polarization state.
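
The buffering and later merging of late-arriving tables by burst number and polarization state can be sketched in Python as a keyed buffer. The class LateTableBuffer, its method names, and the representation of a table as a dictionary are assumptions for illustration; the actual queueing code of the uDST production program is not shown in this document.

```python
class LateTableBuffer:
    """Hold tables which arrive after the record they belong to was started."""

    def __init__(self):
        self._tables = {}

    def store(self, burst, pol_state, table):
        """Buffer a table under its (burst number, polarization state) key."""
        self._tables[(burst, pol_state)] = table

    def take(self, burst, pol_state):
        """Remove and return the table for a record, or None if none arrived."""
        return self._tables.pop((burst, pol_state), None)


# Usage: the table is stored when it is delivered and fetched when the
# record with the matching burst number and polarization state is closed.
buffer = LateTableBuffer()
buffer.store(12, +1, {"efficiency": 0.97})
merged = buffer.take(12, +1)
```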


Subsection H: the details about the source of each field in the uDSTs

More than enough space for words ....


Section 3: a few example commands for executing the uDST production program

  1. To execute the uDST production as was done for the 96b4 production, the following command line would be used:
     udst_maker --slow fillfile.da.gz --sdriver MRFIL --flist hrcfiles --driver RFIL --fillno 100 --year 96 --prodversion 96b4 --newPID --runswitch next1303 --DSTdriver CDAD --writeSEMI --doclusters --beampol TPOL_CALIB --tgtpol CALIB --targetDQ EXTERNAL_ASCII
     This set of command line arguments does:
    1. fetches the slow control data from the fillfile.da.gz file using the MRFIL driver. In a typical production, this file is a symlink to the actual slow control file.
    2. fetches the list of hrc.devents files from the hrcfiles file. In a typical production, this file contains a list ...
    3. assigns 100 as the fill number
    4. specifies that the data year is 1996
    5. assigns "96b4" as the production version for the uDSTs
    6. has the program compute the new PID values
    7. has the program initiate run switches on the first 1303 scaler event processed following the startup of a new hrc.devents file
    8. writes the uDSTs in compressed DAD format
    9. fills the semi-inclusive tables
    10. fills the cluster tables
    11. uses the calibrated TPOL information as the source for the best beam polarization
    12. uses the calibrated target information in the fillfile as the source for the best target information
    13. fetches the target DQ information from an external ASCII file
    14. takes the g1DQ from the fillfile
  2. To execute the uDST production as was done for the 97b2 production, the following command line would be used:
     udst_maker --slow fillfile.da.gz --sdriver MRFIL --flist hrcfiles --driver RFIL --fillno 100 --year 97 --prodversion 97b2 --newPID --g1DQ COMPUTED --runswitch next1303 --DSTdriver CDAD --writeSEMI --doclusters --beampol TPOL_CALIB --TransPSmoothed EXTERNAL_ASCII --tgtpol EXTERNAL_ASCII
     This set of command line arguments does:
    1. fetches the slow control data from the fillfile.da.gz file using the MRFIL driver. In a typical production, this file is a symlink to the actual slow control file.
    2. fetches the list of hrc.devents files from the hrcfiles file. In a typical production, this file contains a list ...
    3. assigns 100 as the fill number
    4. specifies that the data year is 1997
    5. assigns "97b2" as the production version for the uDSTs
    6. has the program compute the new PID values
    7. has the program initiate run switches on the first 1303 scaler event processed following the startup of a new hrc.devents file
    8. writes the uDSTs in compressed DAD format
    9. fills the semi-inclusive tables
    10. fills the cluster tables
    11. uses the calibrated TPOL information as the source for the best beam polarization
    12. uses the calibrated target information in an external ASCII file as the source for the best target information
    13. takes the smoothed TPOL information from an external ASCII file
    14. computes the g1DQ on the fly using the data quality criteria for 1997
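
When many fills are produced with the same settings, it can be convenient to assemble such a command line from a list of options instead of typing it by hand. The following Python sketch rebuilds the 97b2 example above; the helper build_command and the OPTIONS_97B2 list are hypothetical and simply mirror the flags shown in that example.

```python
import shlex


def build_command(options, program="udst_maker"):
    """Turn (flag, value) pairs into an argument list; value None = bare flag."""
    argv = [program]
    for flag, value in options:
        argv.append(f"--{flag}")
        if value is not None:
            argv.append(str(value))
    return argv


# The options of the 97b2 example, in the same order as on the command line.
OPTIONS_97B2 = [
    ("slow", "fillfile.da.gz"), ("sdriver", "MRFIL"),
    ("flist", "hrcfiles"), ("driver", "RFIL"),
    ("fillno", 100), ("year", 97), ("prodversion", "97b2"),
    ("newPID", None), ("g1DQ", "COMPUTED"),
    ("runswitch", "next1303"), ("DSTdriver", "CDAD"),
    ("writeSEMI", None), ("doclusters", None),
    ("beampol", "TPOL_CALIB"),
    ("TransPSmoothed", "EXTERNAL_ASCII"), ("tgtpol", "EXTERNAL_ASCII"),
]

command_line = shlex.join(build_command(OPTIONS_97B2))
```

The resulting argument list could then be handed to subprocess.run with the standard output redirected to a logfile, in line with the logging advice at the end of Section 1.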