MISSION 2: HERMES RECONNAISSANCE
Page maintainer: Larry
| This page is considered done. It has been reviewed by Larry. There may be missing elements, but they are all flagged and the text has no errors. |
Welcome, Cadet, to the HERMES Software Suite. Your mission is to gain control of this unruly collection of files and programs. To do that, you need to know the lay of the land. Your reconnaissance training will help you to answer these questions:
- What programs are available and what do they do?
- How are the programs organized?
- Where do I find the source code?
- Where are the HERMES data files?
- What is the structure of these files?
- What basic concepts do I need to understand in order to perform a physics analysis?
Once you know this, you will be ready for your forthcoming missions, where you will learn how to use the programs and analyze the data files.
For your reconnaissance training, you must travel to the remote war-torn set of interconnected islands known as the HERMES PCFarm. At the time of writing, the PCFarm consists of 2 interactive login nodes (worf and geordi), one administrative server (kirk), 14 file servers with a total capacity of 40 terabytes and 49 dual-processor batch computing nodes. They all run LINUX. It is here, and only here, that you will find all the programs and data files of the HERMES Software Suite. The PCFarm is the official repository for all source codes and all data files. You must have a HERMES computer account at DESY to access it. This entire tutorial is based on an exploration of the PCFarm.
Please note, before you embark on this mission you must have some general knowledge of the experiment: you should understand at a basic level the principal physics we are trying to do, and you should be familiar with our spectrometer and its component detectors. For a fast introduction to the physics of DIS and the essentials of the experiment, check out these talks from summer 2008: HERMES Intro Part 1 and Part 2. I strongly recommend the habilitation thesis of Michael Dueren as an excellent introduction to the experiment. For more information about the physics we study, you might wish to review our publication list ... for further detail on the hardware, I recommend our spectrometer paper.
Overview of the HERMES Software Suite
The HERMES software suite contains a great many programs, including utilities, libraries, and analysis programs. This section aims to establish some structure to all this. I will briefly introduce the database structure around which all the software is built, and then we will follow along the path that the data takes in getting from tape all the way to publication. In particular, we will encounter the various production chains, which comprise the major steps in data processing.
ADAMO: the Database Concept
The HERMES software was written around a database concept: ADAMO. ADAMO is a CERNLIB programming library that provides such a database concept, and utility routines for its management. Since practically all of the HERMES data files are in ADAMO format, you must understand something about this package if you want to interact with any of the HERMES data files. Your next mission, in fact, will be just that: an ADAMO tutorial.
For now, we need only introduce a few concepts. ADAMO is nothing more than a database-management utility. It organizes information into dataflows, each consisting of many tables of information plus the relationships (or links) between them. All this information is stored in ADAMO files one record at a time. A record is like one card in a Rolodex file: it is a complete set of information (i.e. of filled tables) that is independent of all other records. This `complete set' of information is called a dataflow ... the importance of grouping tables and relationships into dataflows is simply that each ADAMO record consists of one and only one dataflow of information. Almost all ADAMO files use the same dataflow for every record.
These concepts are illustrated below with a simple example: a database of car owners, including the make and model of each car they own and the companies with which the cars are insured.
RECORD: 1   KEYTABLE: (CustomerNo = 1, Dataflow = CarOwners)
TABLE: Person         TABLE: Car                            TABLE: Insurer
+----+------+-----+   +----+--------+---------#---------+   +----+-----------+
| ID | Name | Age |   | ID | Make   | Model   # Insurer |   | ID | Company   |
+----+------+-----+   +----+--------+---------#---------+   +----+-----------+
|  1 | Joe  |  35 |   |  1 | Ford   | Taurus  #       1 |   |  1 | Safeco    |
+----+------+-----+   |  2 | Toyota | Corolla #       2 |   |  2 | Geico     |
                      |  3 | Dodge  | Stratus #    NULL |   +----+-----------+
                      |  4 | Honda  | Accord  #       2 |
                      +----+--------+---------#---------+

RECORD: 2   KEYTABLE: (CustomerNo = 2, Dataflow = CarOwners)
TABLE: Person         TABLE: Car                            TABLE: Insurer
+----+------+-----+   +----+--------+---------#---------+   +----+-----------+
| ID | Name | Age |   | ID | Make   | Model   # Insurer |   | ID | Company   |
+----+------+-----+   +----+--------+---------#---------+   +----+-----------+
|  1 | Ruth |  28 |   |  1 | Acura  | Legend  #       1 |   |  1 | AAA       |
+----+------+-----+   |  2 | Ford   | F150    #       1 |   |  2 | StateFarm |
                      |  3 | Geo    | Tracker #       2 |   +----+-----------+
                      +----+--------+---------#---------+
Each record in this database corresponds to a single person, and the information stored for that person is the content of the dataflow `CarOwners'. This dataflow contains three tables: `Person', `Car', and `Insurer'. As you see, each table has several columns (each denoting a different piece of information), and may have more than one row. The concept of a relationship is illustrated by the `Insurer' column of table `Car'. This relationship provides a way to link cars with insurance companies. In the example, Joe's two Japanese cars are insured with Geico, his Ford Taurus is insured with Safeco, and his Dodge Stratus is not insured at all (bad Joe!). Finally, note the `KEYTABLE' specification at the top of each record. The key table is a special table which is not actually part of the dataflow, but rather contains just enough information to uniquely identify each record (here, simply a `CustomerNo'). The last entry in the key table always contains the name of the dataflow stored in the record.
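If it helps to see the vocabulary in code, here is a minimal Python sketch of the first CarOwners record. It is not ADAMO and uses no ADAMO routines: the dictionary layout and the helper `insurer_of` are hypothetical names invented for this illustration. It only shows how a relationship column lets you hop from a row of one table to a row of another.

```python
# Illustrative model only: plain Python dicts, not the ADAMO library.
record_1 = {
    "keytable": {"CustomerNo": 1, "Dataflow": "CarOwners"},
    "tables": {
        "Person": [{"ID": 1, "Name": "Joe", "Age": 35}],
        "Car": [
            {"ID": 1, "Make": "Ford", "Model": "Taurus", "Insurer": 1},
            {"ID": 2, "Make": "Toyota", "Model": "Corolla", "Insurer": 2},
            {"ID": 3, "Make": "Dodge", "Model": "Stratus", "Insurer": None},
            {"ID": 4, "Make": "Honda", "Model": "Accord", "Insurer": 2},
        ],
        "Insurer": [
            {"ID": 1, "Company": "Safeco"},
            {"ID": 2, "Company": "Geico"},
        ],
    },
}


def insurer_of(record, car_id):
    """Follow the Car -> Insurer relationship; return None if uninsured."""
    tables = record["tables"]
    car = next(row for row in tables["Car"] if row["ID"] == car_id)
    if car["Insurer"] is None:
        return None
    insurer = next(row for row in tables["Insurer"] if row["ID"] == car["Insurer"])
    return insurer["Company"]


for car in record_1["tables"]["Car"]:
    print(car["Make"], car["Model"], "->", insurer_of(record_1, car["ID"]))
```

The NULL entry of the relationship column is modelled as `None`, which is why the Stratus comes back without an insurer.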
The structure of an ADAMO database is defined in a language called DDL, which stands for Data Definition Language. The structural definitions for any given database are contained in DDL files, which carry names like MCARLO.ddl or g1.ddl. In your software and analysis work, you will be interacting with various HERMES databases, and to do so, you must know their structure. Our reconnaissance tour will not only inform you of the various databases (i.e. data files) present at HERMES, it will also explain where to find the corresponding DDL files that will allow you to read and understand them. The details of DDL itself will be explained in the ADAMO tutorial. Also, you should be aware that in the case of most databases, the only documentation we have on what all the individual entries mean are the comments written directly in the DDL files. The DDL files for the uDST's are particularly complete in this regard, with many detailed comments.
DAD and PinK
As part of the birth of the HERMES software suite, an extension to ADAMO was written by famous HERMES programmer Wolfgang Wander. This extension package is called DAD, for Distributed ADAMO Database. DAD is a very powerful library which enables server and client programs to exchange information in ADAMO format using pipes and sockets. Such programs may even reside on different machines, but are still able to talk to each other using DAD and the internet! This client-server model appears frequently at HERMES, particularly at the online level. It can be pictured as follows: One big server program maintains a particular database (e.g. online monitoring information). A variety of low-level client programs feed it new information from time to time, with a time stamp. For example, a client might deliver periodic readings from a series of pressure gauges. The server updates its database with this information as it arrives. Meanwhile, another set of clients is running, performing monitoring tasks. These clients have asked the server that they be informed of updates to particular tables. When such updates occur, the server dutifully informs the clients, and the result may be, for example, a new point on one of the many monitoring displays you see at the East Hall.
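The update-and-notify pattern just described can be sketched in a few lines of Python. This is an in-process toy, not DAD: real DAD talks over pipes and sockets and has its own interface, none of which is reproduced here. The class `TableServer`, its methods, and the table names are all hypothetical.

```python
# In-process sketch of the update/notify pattern; real DAD uses pipes and
# sockets and has its own API, none of which is reproduced here.
from collections import defaultdict


class TableServer:
    """Holds the rows of each table and notifies interested clients."""

    def __init__(self):
        self.tables = defaultdict(list)        # table name -> list of rows
        self.subscribers = defaultdict(list)   # table name -> callbacks

    def subscribe(self, table, callback):
        """A monitoring client asks to be informed of updates to `table`."""
        self.subscribers[table].append(callback)

    def update(self, table, timestamp, row):
        """A feeder client delivers a new time-stamped row."""
        entry = {"time": timestamp, **row}
        self.tables[table].append(entry)
        for callback in self.subscribers[table]:
            callback(table, entry)


display_points = []   # stands in for a monitoring display

server = TableServer()
server.subscribe("PressureGauges", lambda table, entry: display_points.append(entry))

# A low-level client delivers two periodic readings (made-up values).
server.update("PressureGauges", 1000, {"gauge": 3, "mbar": 1.2e-7})
server.update("PressureGauges", 1060, {"gauge": 3, "mbar": 1.3e-7})
# An update to a table nobody subscribed to triggers no notification.
server.update("HighVoltage", 1060, {"channel": 12, "volts": 1750.0})

print(len(display_points), "points on the display")
```

The point of the design is that feeder clients and monitoring clients never talk to each other directly; they only know the server and the table names.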
More about those monitoring displays ... i.e. the ones that dominate your existence when you are on shift. :) Those displays are examples of another extension of ADAMO, called PinK. This package was written by another famous HERMES programmer, Marc-André Funk (better known as MAF). It is actually not an extension of ADAMO, but rather an extension of the tcl scripting language. This scripting language allowed from its very inception the addition of new sets of commands, by anyone willing to write the necessary extension library. PinK is exactly such a library: it adds all of the ADAMO commands to the tcl language. The point of this is simply: instead of always having to write a program in C or FORTRAN to interact with an ADAMO database, you can just write a script, in tcl + PinK.
Tcl is often denoted `Tcl/Tk' ... why is that? Because tk is yet another extension to tcl, but a very common one. It is an extension that allows tcl scripts to easily create graphical output! So here is the power of PinK: it combines the scripting language tcl, the graphical output commands of tk, and an interface to ADAMO. Et voila: if you are familiar with these three packages, you can rapidly write a PinK script to access information from any ADAMO database, and immediately present it in graphical format. A last note: MAF created a smaller version of PinK, called floyd. This is simply PinK without the tk extension -- i.e. without graphics.
By the way, PinK stands for Pink Is Not Kuip. You might consider asking one of your CERNLIB-enabled friends why that is funny. I trust that the significance of floyd is clear. :)
The PinK Browser
You will learn all about ADAMO files in the ADAMO tutorial. But for now, let me just introduce the pink browser which is the easiest way to examine their contents. This program provides a window-driven visual interface to ADAMO files, and is very easy to use: just type pb. The pink browser main window will appear. I'm sure you can figure out very quickly how to read in and display your GAF file. The basic button-pushing sequence is:
- Use the DAD or GAF pull-down menus to specify the file to read. Try for instance to open the following data file (a DAD file): /production/udst/00d2/smlinks/run10000.smdst.gz. Or the infamous geometry file (a GAF file): /hermes/pro/lib/hmcdg_00.ie.
- A controller window will appear ... hit some button like `Read; Next' or `Next; Fetch' to read in the first record.
- Press `Browse Record' to start trolling around through the tables and dataflows. Double clicking on dataflows will produce a list of the tables therein ... double clicking on table names will pop up their contents in separate windows.
- Hitting the 'Read; Next' button again in the controller will get the next record and update your table windows with the new data.
At the East Hall: DAQ and Slow Control
Our data cannot really be understood without at least some knowledge of this section. If you want only a "reconnaissance to the reconnaissance", then you can skip this section.
The very first programs that the data encounters as it leaves the electronic modules in the East Hall are the programs of the online software. This software is subdivided into two major, and very different, parts:
The DAQ (Data AcQuisition)
The monumental behemoth of coding that is the HERMES DAQ consists of thousands of lines of FORTRAN code with no comments, no indentations, no lowercase letters, no variables longer than 6 characters, and more hardwired magic-numbers than you can shake a stick at. :) The DAQ is centered around an event builder. This program responds every time the detector electronics generate a trigger, indicating that a potentially interesting event has been seen by the detector. When a trigger occurs, the DAQ collects the information recorded by every component of the detector, via their electronic readout modules (ADC's, TDC's, etc). The DAQ sorts this raw information and turns it into an event, with a standard structure. This event is then copied to two places for storage: (1) It is written immediately to DLT tape, right in the East Hall. (2) It is also copied to disk, for later transfer to the giant tape robot located at the DESY main site. The next pieces of software in the processing chain retrieve the data from the robot ... the DLT tapes serve only as a backup copy.
Each normal HERMES event (associated with a trigger condition) goes to the DAQ data stream as a single event record. However, the DAQ also stores other types of records. One important example is the scaler event. These events are recorded once every ten seconds, and consist of a readout of all scalers in the experiment. As you know, scaler modules are simply counters and are thus designed to be read out only `once in a while', after they have had the opportunity to accumulate some counts. They are also cleared every time they are read out, so that they can start counting again from zero. This ten second time scale defines a burst: one burst is one ten-second period of time between scaler events. Scaler events are also known as 1103's ... this is the code the DAQ uses to identify such a record in its data stream. Each 1103 marks the start of a new burst.
Perhaps the most important example of scalers and what they are good for comes from the LUMI monitor. This device records Bhabha or Moller scattering events, which occur when beam particles collide with the orbital electrons of the target atoms. The number of such collisions occurring within one burst is a measure of the total luminosity present in that burst. The accumulated luminosity is of course a vital quantity to know if you plan on measuring a cross-section. The luminosity information at HERMES is measured entirely through the use of scalers, which simply count the number of Bhabha or Moller events detected since they were last reset. A second quantity that is computed using scaler events is trigger deadtime. One set of scalers counts the number of triggers generated during each burst by the electronics. Another set of scalers counts the number of accepted triggers -- i.e. the number of events that the DAQ was actually able to put on disk. The two are not exactly the same, because in general the DAQ cannot quite keep up with the trigger rate and a small percentage of events are simply lost. The fraction of events lost is called deadtime, and is determined from the ratio of the accepted and generated scaler readings: it is one minus accepted over generated. If you are creating a cross-section out of the recorded events, you must correct for the deadtime.
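As a small worked example of the deadtime correction, taking the deadtime to be the lost fraction 1 - accepted/generated: the function names and the scaler readings below are made up for illustration, and this is only the arithmetic, not the code used in any HERMES production.

```python
def deadtime_fraction(generated, accepted):
    """Fraction of generated triggers that the DAQ failed to record."""
    if generated <= 0:
        raise ValueError("no generated triggers in this burst")
    return 1.0 - accepted / generated


def deadtime_corrected_yield(recorded_events, generated, accepted):
    """Scale recorded events up to what a DAQ with no losses would have seen."""
    if generated <= 0 or accepted <= 0:
        raise ValueError("scaler readings must be positive")
    live_fraction = accepted / generated
    return recorded_events / live_fraction


# Made-up scaler readings for one burst.
generated, accepted = 2000, 1900
print(deadtime_fraction(generated, accepted))
print(deadtime_corrected_yield(950, generated, accepted))
```

With these numbers the DAQ lost 5% of the triggers, so 950 recorded events of some type correspond to roughly 1000 that actually occurred.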
Bursts are thus data units of fundamental importance to the structure of the HERMES data stream. However, there is one complication that must be explained. The ten-second burst clock does provide a convenient time scale, but of more fundamental importance to our experiment is the time scale on which the target flips its polarization state. With the ABS target, this occurs about once every 90 seconds. Most of our data analysis involves the measurement of spin asymmetries, and the way that such analyses are performed is basically to measure a cross-section of interest in a spin-separated way: you measure the cross-section when the beam and target spins are parallel, and compare it with the cross-section for anti-parallel orientation. It is thus vitally important that the luminosity information provided by the scaler events is also recorded whenever the target flips, not just every 10 seconds ... otherwise you will never know how much lumi you recorded in one spin state vs the other. And so, we have another type of scaler event at HERMES: the 1203. This is just like a regular scaler event, except that it is generated whenever the target changes state. Specifically, a 1203 is generated when the target starts flipping, and when it stops flipping (having settled into its new spin state). The burst number is not incremented when a 1203 occurs. You will see evidence of this structure later, when we talk about the microDST's.
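Here is a toy illustration of why the 1203's matter. It walks through a short stream of scaler records and sums the LUMI counts separately for each target state, counting a new burst only on 1103's. The record layout (a code, the counts since the previous readout, and the target state during the interval that just ended) and all the numbers are assumptions made for this sketch; the real DAQ records look nothing like Python tuples.

```python
from collections import defaultdict

# Each tuple: (record code, LUMI counts since the previous scaler readout,
#              target state during the interval that just ended).
# The layout and the numbers are invented for illustration.
scaler_records = [
    (1103, 500, "parallel"),
    (1103, 510, "parallel"),
    (1203, 200, "parallel"),       # target starts flipping
    (1203,  30, "flipping"),       # target has settled in the new state
    (1103, 280, "antiparallel"),
    (1103, 505, "antiparallel"),
]


def accumulate(records):
    """Return (lumi counts per target state, number of bursts started)."""
    lumi = defaultdict(int)
    bursts = 0
    for code, counts, state in records:
        lumi[state] += counts
        if code == 1103:          # only 1103's start a new burst
            bursts += 1
    return dict(lumi), bursts


lumi_by_state, n_bursts = accumulate(scaler_records)
print(lumi_by_state, n_bursts)
```

Without the two 1203 readouts, the 200 + 30 + 280 counts around the flip would arrive as one lump in a single 1103, and there would be no way to split them between the two spin states.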
Other more specialized types of DAQ records exist, but they are only of interest to certain detectors. In general, these other record types are called user events. The LPOL (Longitudinal Polarimeter) for example has its own type of user event, where only LPOL information is recorded.
Finally, the DAQ organizes information into runs. A run is just another convenient block of information: each run corresponds to one file of data. The DAQ automatically ends a run whenever 560 Mbytes of information have been collected; it then closes off the current run file, and starts a new one. Runs may of course also be started and stopped by hand, by the software person on shift.
One final piece of DAQ jargon: equipment and subequipment numbers. From the DAQ point of view, the information from the HERMES detectors is organized by the location of each readout module in the electronics trailer. The DAQ imposes some structure on this by logically grouping together modules in different crates by equipment and subequipment number. The various modules of the RICH readout, for example, are physically located in different places, but are referred to by the DAQ as equipment number 21. You will encounter this numbering scheme in the decoder program (HDC), and also in the error messages produced by the DAQ itself: `Trouble with Eq. 9, Subeq. 2' or some such thing.
The DAQ is the only major piece of HERMES software that does not work with ADAMO. Rather, the output of the DAQ (the raw HERMES event files) is in EPIO format. There are perhaps two, maybe three people at HERMES who know anything about this format. :) As I mentioned, the DAQ produces one EPIO file per run.
Slow Control
Slow control refers to the reading and recording of hardware information that changes on a slow time scale. The DAQ has to build events at the rate the triggers come in, which can be as high as 500 Hz. However, there are many devices in the experiment which do not need to be read out on such a rapid time scale. Examples are: pressure gauges, high voltage settings, and measurements of the phototube gains by the Gain Monitoring System. Measurements of this type are unlikely to change on a short time scale, and so they are recorded only once every few minutes. The slow control consists of a suite of many programs, interconnected via the client-server model, that read, write, or display such information about the status of the hardware. The most important slow control program is the taping client. It is surrounded by a large number of little client programs that monitor the output of various devices, and send new data to the slow control database from time to time. This data is transmitted in the form of ADAMO tables. The taping client watches for new tables as they come in, and dutifully writes them to a file. The taping client starts a new file at the beginning of each HERA fill, and so these files are usually referred to as fill files. The fill files are then copied over to the PCFarm for later processing. The taping client also creates nobeam files, corresponding to the periods in between fills. The fill files and nobeam files are sometimes referred to collectively as the slowlogs.
Another famous HERMES programmer is Walter Brueckner, the author of the DAQ. As I mentioned, the DAQ is unique in that it does not deal with ADAMO format at all. The DAQ, and the processes and machines associated with it, are often referred to as "Walter World". Walter World is populated with VT100 terminals and ASCII displays of information. As soon as you hit slow control, you have left Walter World, and will forever after be in the World of ADAMO, DAD, and PinK. When you are on shift, you can immediately tell the difference.
Time: VME vs UNIX
The DAQ and the slow control thus represent two separate data streams, and at some point down the line, we have to synchronize all this information. Time is thus of the essence. :) Two `clocks' (or measures of time) are in use at HERMES. The DAQ marks each of its records with VME time. VME is basically an `onboard' computing system that is an intrinsic part of the DAQ hardware. The VME clock is located amongst the racks of the electronics trailer, and is read out along with the other detectors. Meanwhile, the slow control tasks running on the various UNIX machines in the East Hall stamp each of their records with UNIX time, obtained from the internal clock in the UNIX operating system. A significant headache at HERMES comes from having to synchronize DAQ and slow control records stamped with these different clocks ... because, you guessed it, one or the other of them sometimes drifts. But such is life.
If you encounter time stamps in your analysis, you will find that they are usually given as two numbers. The first is the number of seconds since Jan 1, 1995, 00:00:01 (the Birth of HERMES). The second number refines the time stamp, giving the number of microseconds since the previous second.
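A two-number time stamp of this kind is easy to turn into a calendar date. The sketch below uses the epoch exactly as quoted above; the time zone is not stated there, so the code uses a naive `datetime` and makes no claim about it. The names `HERMES_EPOCH` and `hermes_time` are my own and do not come from any HERMES library.

```python
from datetime import datetime, timedelta

# Epoch as quoted in the text. The time zone is not stated there, so this
# sketch uses a naive datetime and makes no time zone claim.
HERMES_EPOCH = datetime(1995, 1, 1, 0, 0, 1)


def hermes_time(seconds, microseconds):
    """Convert a (seconds, microseconds) HERMES time stamp to a datetime."""
    return HERMES_EPOCH + timedelta(seconds=seconds, microseconds=microseconds)


# One day and a quarter of a second after the epoch.
print(hermes_time(86400, 250000))
```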
The Main Production
We have now left the East Hall: the EPIO files have been successfully copied from Walter World to the PCFarm. The next processing step is the main production. This production chain deals only with the EPIO files, not the fill files ... they will be processed later on in the sequence. The main production consists mainly of two programs: HDC, which decodes the online information and applies detector calibrations, and HRC, which reconstructs all the wire chamber hits into actual particle tracks, and associates information from the PID detectors with each track. Finally, the data is run through ACE, which computes wire chamber efficiencies.
NEEDFIX: remove this paragraph ... At HERMES, the official main production chain runs on the PC farm ... not on the SGI. The farm is a cluster of Linux PC's named after Star Trek characters (e.g. picard.desy.de), and is one of our principal computing workhorses. The output of the main production is copied to both the SGI and the robot.
The main production takes us from EPIO files, to HDC files, to HRC files. Here is a little snapshot of what this accomplishes:
- EPIO files: There is one record per event (trigger). Each event consists of a raw dump of all the readings produced by all components of the detector. The items in the EPIO file look like this: `channel 23 of the ADC in slot 6 of subequipment 2 of equipment number 9 gave a reading of 356'.
- HDC files: There is one record per event. Each event consists of a series of ADAMO tables, containing the calibrated response of each module of each detector. The entries in the HDC file look like this: `Calorimeter block 416 recorded an ADC reading of 1726. After calibrations, this corresponds to an energy deposit of 2.6 GeV' ... or like this: `Wire number 75 in plane 3 of FC1 (the first Front Chamber) was hit; the hit occurred 9.6 nanoseconds after the most recent electron bunch crossed the target. I therefore calculate that, given the time it takes a particle to reach FC1, the particle producing this track crossed the chamber 0.21 microns away from the actual position of wire number 75.'
- HRC files: There is one record per event. Each event consists of a series of ADAMO tables, containing information arranged by particle track. Imagine that HRC has combined the wire chamber hits for a particular event into 3 reconstructed tracks. The entries in the HRC files look like this: `Track #2 has a momentum of 6.9 GeV/c, carries a negative charge, and originated from a position 3.1 cm downstream of the center of the target. Given the points at which this track passed through the PID detectors, I can tell you that it deposited 84 keV in the preshower, 40 keV in the TRD, and 6.8 GeV in the calorimeter.'
HDC
HDC (HERMES DeCoder) reads the raw data files and turns them into something comprehensible. It performs three major tasks:
- Mapping
- Each detector group supplies HDC with a mapping table. This table simply relates each hardware channel of their readout modules with a software channel. A hardware channel is identified by 4 numbers: channel-slot-subequipment-equipment. As described above, this designation refers to the physical location of the channel in the electronics trailer. Software channels are designated by a detector ID and a `wire' number within that detector. The detector ID comes in two forms: a 4-character name and an ID number (the connection between the two is provided by the geometry file, which is described later). An example of a detector name is 'H1LV', which refers to the lower hodoscope array H1, or 'B2U5', which denotes the 5th wire plane of the upper BC2 chamber (i.e. the second back chamber). The `wire number' within each detector is a generic term that may refer to an actual wire in a wire chamber, but also to e.g. a paddle number in the case of the hodoscopes.
- Calibration
- Each detector group also provides HDC with calibration information. This turns things like ADC readings into physics quantities, like deposited energy.
- Geometry
- So imagine HDC has determined that wire number 247 of detector F1L2 (one of the FC1 planes) has fired. From the time of the hit, and the calibration information, HDC also knows how far away from the wire the particle must have passed. But ... where IS this wire in space? HDC consults the geometry database to determine the wire's position, in the HERMES coordinate system.
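The three HDC tasks chain together naturally, and a toy version fits in a few lines of Python. Everything in the tables below is invented: the channel assignments, the pedestal-and-gain form of the calibration, and the positions. Only the shape of the lookup (hardware channel to detector and wire, then to a calibrated quantity, then to a position) follows the description above.

```python
# All table contents below are invented; the real tables are supplied by
# the detector groups and are far larger.

# Mapping: (channel, slot, subequipment, equipment) -> (detector, wire)
MAPPING = {
    (23, 6, 2, 9): ("H1LV", 17),
    (24, 6, 2, 9): ("H1LV", 18),
}

# Calibration: (detector, wire) -> (pedestal in ADC counts, MeV per count).
# A linear pedestal-and-gain form is assumed purely for illustration.
CALIBRATION = {
    ("H1LV", 17): (56.0, 0.01),
    ("H1LV", 18): (60.0, 0.01),
}

# Geometry: (detector, wire) -> (x, y, z) position in cm.
GEOMETRY = {
    ("H1LV", 17): (-12.5, -40.0, 650.0),
    ("H1LV", 18): (-3.2, -40.0, 650.0),
}


def decode(hardware_channel, adc_reading):
    """Turn one raw reading into a calibrated, positioned hit."""
    detector, wire = MAPPING[hardware_channel]               # 1. mapping
    pedestal, gain = CALIBRATION[(detector, wire)]           # 2. calibration
    energy = (adc_reading - pedestal) * gain
    position = GEOMETRY[(detector, wire)]                    # 3. geometry
    return {"detector": detector, "wire": wire,
            "energy_mev": energy, "position_cm": position}


hit = decode((23, 6, 2, 9), 356)
print(hit)
```

Note that the software channel `(detector, wire)` is the key for both later lookups, which is why the mapping step has to come first.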
The mapping, calibration, and geometry information that HDC needs to perform its three functions is handed to it by DAD servers. Jargon-wise, we refer to these servers as MServer (mapping), CServer (calibration), and GServer (geometry). In fact, these pieces of jargon do not really refer to a single server program ... the CServer for example refers to a collection of almost 20 separate little servers. The official servers run on the PC farm, on a machine called geordi.desy.de. You may connect to these servers yourself, if you choose to make a private run of HDC ... the information you need to connect to them is stored in a file called dadinit.cnf. But more on such technical details later. One other note: the GServer contains the information from the infamous geometry file, which you will learn about later.
GMS
NEEDTEXT
HRC
For each event, HRC (HERMES ReConstruction) basically collects all of the hits recorded by the wire chambers and tries to draw lines through them. These lines correspond to the tracks created by actual particles as they traversed the detector. The next paragraphs provide a very brief summary of how HRC operates.
The HERMES detector has a front part, and a back part, and between them is a big vertical magnetic field which causes charged particles to bend. HRC reconstructs partial tracks in the front and back regions independently: since there is no magnetic field in these regions, the partial tracks are indeed straight lines. The hits in the forward tracking chambers (VC, DVC, and FC) are used to determine the front partial tracks, while the hits in the four BC chambers form the back tracks. HRC then tries to connect the front and back partial tracks it found into full tracks: it projects pairs of partial tracks into the center of the magnet, and if they hit the same spot, it declares a match. By comparing the angle of the front track with that of the back track, and consulting a field map of the spectrometer magnet, HRC finally determines the momentum of the track. It also determines the charge of the particle: if the track bent to the right it was positively charged, and vice versa.
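The matching and momentum steps can be caricatured in Python as follows. This is a sketch under heavy assumptions and not HRC's algorithm: it works only in the horizontal plane, replaces the field map with a single hypothetical field integral, and uses the small-angle relation p [GeV/c] = 0.2998 x (field integral in T*m) / (bend angle in rad). The magnet position, field integral, matching window, and the sign convention for the charge are all placeholders.

```python
import math

Z_MAGNET = 275.0        # cm, hypothetical position of the magnet center
FIELD_INTEGRAL = 1.3    # T*m, hypothetical uniform-field stand-in for the field map
MATCH_TOLERANCE = 0.5   # cm, hypothetical matching window


def x_at(track, z):
    """Horizontal position of a straight partial track (x0, z0, slope) at z."""
    x0, z0, slope = track
    return x0 + slope * (z - z0)


def bridge(front, back):
    """Match a front and a back partial track; return (momentum, charge) or None."""
    if abs(x_at(front, Z_MAGNET) - x_at(back, Z_MAGNET)) > MATCH_TOLERANCE:
        return None                                  # they miss each other
    deflection = math.atan(back[2]) - math.atan(front[2])   # radians
    if deflection == 0.0:
        return None                                  # no bend: momentum undetermined
    momentum = 0.2998 * FIELD_INTEGRAL / abs(deflection)    # GeV/c
    charge = 1 if deflection > 0 else -1             # sign convention assumed
    return momentum, charge


front = (0.0, 0.0, 0.020)                        # through the origin, 20 mrad
back = (x_at(front, Z_MAGNET), Z_MAGNET, 0.060)  # same point at the magnet, 60 mrad
print(bridge(front, back))
```

A smaller bend angle gives a larger momentum, which is the whole idea: stiff tracks barely notice the magnet.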
Some more HRC jargon is in order. All of our wire chambers consist of sets of planes denoted x, u, and v. The x planes contain vertical wires, while the u and v planes contain stereo wires tilted to the left and right of vertical by 30 degrees. HRC actually begins by connecting the hits in the x, u, and v planes separately into tree-lines. It then combines these into partial tracks, and finally goes on to form full tracks.
HRC is able to operate in several different tracking modes. These modes depend principally on which of the tracking chambers are used in the reconstruction. Perhaps the major variable here is the VC's: these finely segmented detectors were fully operational only during the 1997 data taking period. Another variable is the DVC's: these chambers were only installed during 1997. HRC is thus forced to support several tracking options:
- Standard tracking refers basically to any tracking method involving the VC's. This method was indeed the standard for the 1997 data. However, since the VC's were more-or-less toast during all other years, this tracking method is not really our `standard' method ... rather it is the best (most accurate) tracking scheme we have had.
- Forced bridging. Unless you explicitly request otherwise, HRC will engage its forced bridging algorithm whenever the VC's are not in use. Remember that before 1997, there was no DVC chamber in the front region ... only the FC's and VC's. The concept behind forced bridging is this: if the VC's are out of the game, the front tracking could use some help! Where does it get this help? From the back chambers. By comparison to the volatile front region, the four BC chambers in the back have operated perfectly from the beginning of the experiment. So how does forced bridging work? As I described before, HRC constructs front and back partial tracks, then tries to match them together at the center of the magnet to form full tracks. If forced bridging is active, HRC performs an additional step on each of these full tracks. Using its calculation of the track momentum, it can use the precisely reconstructed back track to determine where the front track SHOULD have hit the center of the magnet (taking bending into account). This projection of the back track is then used to provide an additional space point for the front track. Adding this projected point from the back to the information from the front, the entire track is refit, and its momentum is redetermined, with enhanced precision.
- FC only. As you might guess, this tracking method means using only the FC's in the front, plus forced bridging. This is actually the tracking method that is most commonly used at HERMES.
- FC + DVC. And this is undoubtedly clear as well: it refers to the tracking method that uses FC and DVC information in the front, and also employs forced bridging. As I mentioned before, forced bridging kicks in by default whenever the VC's are absent.
NEEDTEXT: NoVC?
You may have noticed that the magnet chambers (MC's) have not yet been mentioned. In fact, they are not used in the regular tracking algorithm -- the reconstruction of front and back partial tracks, and the matching of the two is done without the assistance of the MC's. Such a complete track, with a front and back part, is referred to as a long track (or full track). However, if a particle is of low momentum, it may be bent so severely by the spectrometer magnet that it never reaches the back of the detector. This effect starts at about 4.5 GeV/c (i.e. just below that momentum, particles at the left and right edges of the front tracking detectors will be bent out of the acceptance). The acceptance drops to zero at around 1.5 GeV/c. However, such low momentum particles can still be measured by combining their front partial track with the hits they leave in two or more of the magnet chambers. This is called a short track (or magnet track). The momentum of such a track can be determined, because the hits in the MC's describe the bend radius of the particle in the magnetic field. Short tracks are very important in e.g. the reconstruction of Lambda particles, which decay to a proton and a negative pion of very low momentum (often called a `slow pion').
HRC takes almost all of its input from the HDC file. However, one important external file that it reads in is the alignment file. For high-resolution track reconstruction, HRC must know with great precision (to within a fraction of a millimeter) exactly where all of the wires in the experiment are located in real space. As you will read in a later section, we have a geometry file that stores the positions of all detectors ... but these positions are not quite precise enough for HRC. Specifically, the detector positions in the geometry file are considered `baseline' positions and are not changed from year to year. In reality the wire chambers may move slightly, particularly when they have been removed and reinstalled between years. To determine these small displacements, we take several alignment runs right after any changes to the detector have occurred (typically once per year). Alignment runs are taken with the spectrometer magnet turned off, so that all particle tracks will be simple straight lines. An alignment code operated by the drift chamber groups analyses these special runs ... it iteratively applies tiny shifts to each chamber plane until the straightest lines are achieved. This best-fit set of position adjustments is recorded in the alignment file for a particular year, and used by HRC during the main production.
HRC performs other functions besides track reconstruction: it also processes information from the PID (particle identification) detectors. These are the CALO, hodoscopes, TRD, and Cerenkov. The RICH is not dealt with until later on in the production chain. (A small note about detector terminology: The term `hodoscope' refers to a scintillator array. The primary hodoscopes at HERMES are H0, H1, and H2; their principal function is to provide fast signals for the formation of our triggers. However, H2 has a second purpose: there is a thin lead curtain in front of it which turns it also into a `preshower' detector, useful for PID.) Basically, HRC associates the hits recorded in the PID detectors with each of the tracks it has found. For example, it checks to see which preshower paddle each track passed through, and then associates the amount of energy deposited in that paddle with the track. HRC then goes on to perform probabilistic PID calculations, which are described in a later section. However, it is important to note that HRC's PID calculations have now been superseded by those in the uDST writer.
HRC has to perform some special calculations related to the calorimeter. A particle passing through the heavy lead-doped glass of the calorimeter will start a shower, consisting of a cascade of particles. Such showers are in general not contained within a single CALO block: the shower spreads radially in space as the particles move forward, and will deposit some energy in neighbouring blocks. Consequently, to reconstruct a single shower hit, HRC must use a clustering algorithm. It collects the energies recorded in 3x3 squares of adjacent blocks, sums them together to provide the total energy deposit, and performs a calculation to determine the center of the cluster. Note that some clusters are not associated with tracks ... photons for example do fire the calorimeter, but do not produce hits in the wire chambers. HRC also saves these untracked clusters.
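The clustering step can be sketched in a few lines. This is an illustration only, not HRC's code: the grid layout, the choice of the highest-energy block as seed, and the energy-weighted centroid used for the cluster center are all assumptions, and the block energies are made up.

```python
def find_seed(energies):
    """Return (row, col) of the block with the largest energy."""
    best = (0, 0)
    for r, row in enumerate(energies):
        for c, e in enumerate(row):
            if e > energies[best[0]][best[1]]:
                best = (r, c)
    return best


def cluster_3x3(energies, seed_row, seed_col):
    """Sum the 3x3 block around the seed.

    Returns (total_energy, centroid_row, centroid_col), where the centroid
    is weighted by the energy in each block (an assumed, simple choice).
    """
    n_rows = len(energies)
    n_cols = len(energies[0])
    total = row_sum = col_sum = 0.0
    for r in range(seed_row - 1, seed_row + 2):
        for c in range(seed_col - 1, seed_col + 2):
            if 0 <= r < n_rows and 0 <= c < n_cols:  # stay inside the wall
                e = energies[r][c]
                total += e
                row_sum += e * r
                col_sum += e * c
    if total == 0.0:
        return 0.0, float(seed_row), float(seed_col)
    return total, row_sum / total, col_sum / total


# Toy 4x4 patch of block energies in GeV (made-up numbers).
blocks = [
    [0.0, 0.1, 0.0, 0.0],
    [0.2, 3.0, 0.4, 0.0],
    [0.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
seed = find_seed(blocks)
energy, row_c, col_c = cluster_3x3(blocks, *seed)
```

Note how the centroid lands slightly off the seed block, pulled toward the neighbours that caught part of the shower. That sub-block position is what lets a cluster be matched to a track, or flagged as untracked.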
TMC
HRC does not know about the transverse magnetic field of the target magnet used for the polarized data of 2002-2005 (in contrast to earlier years, the target nucleons were polarized in the y (i.e. transverse) direction rather than in the z (i.e. longitudinal) direction). The main momentum component of reconstructed tracks is the z-component, so the force due to this magnetic field deflects the tracks in the x-direction, in some cases by a large amount. The transverse target magnet correction (TMC) was designed to take the field into account and apply a correction to the front-track parameters provided by HRC. To do this it uses the HRC track information, which should be valid in the region far behind the target magnet, and then reconstructs the vertex parameters along the beam line. Two distinct methods are available for redundancy. TMC runs on the HRC files and fills an additional table (rcVertexCorr) that can be used by the analyzers.
XTC
NEEDEXPERT: To write sections on DVC, FQS, HM and Wide Tracking in XTC
The output of HRC is processed with the so-called eXternal Tracking Code (XTC). It is responsible for reconstructing tracks outside the regular HERMES acceptance, namely in the Lambda Wheels (LW) and the Recoil Detector (RD). The two detectors are treated independently in XTC, and the reconstructed tracks are written to separate tables in the output file (in addition to the tracks reconstructed by HRC).
Lambda Wheel Reconstruction
A detailed description of the reconstruction of tracks in the LW can be found in M. Demey's thesis (p. 52 ff) and references therein.
Recoil Detector Reconstruction
Currently there are several reconstruction methods implemented for the Recoil Detector (see the Technical Design Report for details) which are numbered in the following way:
- Method 1: Simple track search with a momentum reconstruction using the actual Recoil Detector field map.
- Method 3: Reconstruction by bending in the magnetic field (assumed homogeneous) using only the scintillating fiber tracker (SFT) information.
- Method 7: Sophisticated track search algorithm with a momentum reconstruction assuming a homogeneous field. Energy deposits along the track are taken into account.
- Method 15: Uses just information from the Silicon Detector (SSD). Simple track search with a momentum reconstruction by just energy deposits in the SSD.
- Method 701: Combines the track search algorithm from Method 7 with the momentum reconstruction from Method 1.
The Recoil Detector sees mostly low momentum pions and protons (p<1 GeV/c) which lose energy along their way through the detectors and passive materials. A track is therefore not a perfect circle. This is taken into account in Method 7 of the XTC reconstruction. As the true particle identity is not known during the reconstruction, several track hypotheses are written to the output. Each track, which consists of a list of space points in the sub detectors (coordinate and energy deposit information from SSD and SFT), can have the following track parameter hypotheses:
- Pion Hypothesis
- Kaon Hypothesis
- Proton Hypothesis
- Deuteron Hypothesis
- Stopped Proton Hypothesis
- Stopped Deuteron Hypothesis
Each hypothesis contains the actual track parameters (vertex coordinates Vx, Vy, Vz; momentum p; polar angle theta; and azimuthal angle phi) which depend only on the particle type assumed in the reconstruction. Tracks that are reconstructed by their bending in the magnetic field (at least three hits in the sub detectors) will always have the pion hypothesis. For positively charged tracks the proton hypothesis is available in addition. With the PID information available at the analysis stage the analyzer can choose the correct hypothesis and hence get the correct track parameters. The "stopped proton" and "stopped deuteron" hypotheses are special cases which represent protons and deuterons that were stopped in the outer SSD layer. Method 15 which reconstructs the momentum by the energy deposits in the SSD provides the proton and stopped proton hypothesis. A detailed description of the Recoil Detector related ADAMO tables can be found on the Recoil ADAMO Tables page.
ACE
The name of the ACE program stands for Alignment, Calibration, and Efficiency. The original intention of this program was to fulfill all three functions ... but in fact ACE only does the third one: it computes the plane efficiencies of the wire chambers. (The remaining letters of `ACE' are fulfilled by a myriad of separate programs which are owned and operated by the detector groups).
For each wire chamber plane in the detector (call it the `test plane'), ACE considers a large number of reconstructed tracks, and asks the question: "How often did the test plane fire along these tracks?" If the plane only fired 95% of the time, it is 95% efficient. One efficiency number per run is produced for each plane. ACE does have to go to some lengths to find a good, bias-free, track sample to use for its calculations: the tracks it considers should be such that they did not require the test plane to have fired! Here is the problem: HRC requires a minimum number of planes to show hits before it will reconstruct a track. Suppose this number is 5. When ACE is testing one of the planes, it should not use tracks where only 4 of the other planes recorded hits: a hit in the test plane is not redundant in this case, and may have been necessary for the track to be reconstructed at all. This will bias the efficiency calculation to artificially large values. ACE instead considers only tracks where at least 5 of the other planes fired ... a hit in the test plane is thus redundant, and the frequency with which it fired is a proper measure of its efficiency. The situation is actually a bit more complicated than this, but I trust the principle is clear.
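Here is a minimal sketch of that bias-free counting. It is a toy, not ACE itself: each track is represented simply as the set of plane numbers that fired along it, and the minimum of 5 planes is the example value from the text above, not a real HRC setting.

```python
def plane_efficiency(tracks, test_plane, min_planes=5):
    """Efficiency of test_plane from tracks that did not need it.

    tracks: iterable of sets, each holding the IDs of the planes that fired.
    Only tracks where at least min_planes OTHER planes fired are used, so a
    hit in the test plane was never required for reconstruction.
    """
    n_used = 0
    n_fired = 0
    for fired in tracks:
        others = fired - {test_plane}
        if len(others) < min_planes:
            continue  # the test plane may have been needed: biased, skip
        n_used += 1
        if test_plane in fired:
            n_fired += 1
    return n_fired / n_used if n_used else None


# Toy sample: plane 6 is the test plane.
tracks = [
    {1, 2, 3, 4, 5, 6},  # 5 other planes fired, test plane fired
    {1, 2, 3, 4, 5},     # 5 other planes fired, test plane silent
    {1, 2, 3, 4, 6},     # only 4 others: track exists BECAUSE plane 6 fired
    {1, 2, 3, 4, 5, 6},
]
eff = plane_efficiency(tracks, test_plane=6)
```

In this toy sample the third track is rejected. Had it been counted, the efficiency would have come out as 3/4 rather than 2/3, which is exactly the upward bias described above.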
What analyzers finally need to normalize their data is the full efficiency of the tracking system. This efficiency is termed the Permuted Plane Efficiency (or PPE for short). It is computed in a combinatoric fashion from ACE's single-plane efficiencies. This calculation involves detailed knowledge of how HRC works, and is performed by a separate program (outside the production chain) which is owned and operated by the tracking group.
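To give a flavour of what "combinatoric" means here, the sketch below computes the probability that at least k out of n planes fire, given their single-plane efficiencies. This is emphatically a toy: it assumes the planes are independent and that the tracking requirement is a simple "at least k of n" rule, whereas the real PPE calculation encodes the actual logic of HRC. The function name and numbers are invented.

```python
from itertools import combinations


def prob_at_least(effs, k):
    """Probability that at least k planes fire, for independent planes.

    effs: list of single-plane efficiencies, each between 0 and 1.
    """
    n = len(effs)
    total = 0.0
    for m in range(k, n + 1):
        # Sum over every possible set of exactly m planes that fired.
        for fired in combinations(range(n), m):
            fired = set(fired)
            p = 1.0
            for i, e in enumerate(effs):
                p *= e if i in fired else (1.0 - e)
            total += p
    return total


# Three planes at 90% each, at least two required (made-up numbers).
toy_ppe = prob_at_least([0.9, 0.9, 0.9], k=2)
```

The useful lesson is that redundancy helps a great deal: three planes that are each only 90% efficient still give a combined efficiency well above 90% when only two of them are required.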
HTC
NEEDTEXT
Main Production Output and Naming Scheme
Each pass through the main production (HDC - HRC - ACE), for one year's worth of data, can take two months to complete (2007 data). Consequently, this is not something we do lightly. However, we do run all of our data sets through the main production at least twice, for reasons that are explained below. Each run of the main production is given a name like `97b'. Here, `97' refers to the year in which the data set was collected; the letter `b' means that this was the second production pass through the data.
There are very few reasons for running the main production multiple times on a particular data set:
- First, we always make at least 2 passes through the main production. The reason for this is calibrations. The `a' production of any given data year is the very first production that is run ... in fact, it runs in real time, as the data is being collected. Where do we get the calibration information for the `a' production? Answer: from the previous year. Thus, the calibrations in any `a' production are intrinsically out of date. We run the `a' productions as the data comes in for two reasons: (1) to make real-time checks of the incoming data, and ensure there are no major problems ... and (2) to provide a production that the detector groups can use to recalibrate their devices. These recalibrations are then applied to the `b' production. To summarize: no published HERMES result will ever come from an `a' production, because it is inherently uncalibrated.
- The principal reason for running more than two productions on a given data set is the tracking method. An excellent example is the 97c production. 97b was run using Standard Tracking (including the VC's). The 96 productions, by comparison, were all run in FC-only mode, because the VC's were simply not available during 1996. So now the story begins: during the analysis of the Hydrogen-target data, a vaguely-disturbing possible-discrepancy was observed between the results from 1996 and 1997. After many, MANY months of careful study, no problem was found, and the difference was attributed to a statistical fluctuation. However, as a last check, we decided to reproduce the entire 1997 data set using the same tracking method as 1996, FC-only -- just to be sure that the tracking method had nothing to do with anything. To end the story: no dependence of the hydrogen asymmetries on tracking method was found. :)
- Any major change to HDC or HRC requires a rerun of the main production. For example, the 1995 data was produced many times, because the HDC and HRC programs were still under development. In the near future we may see reruns of the main production, in order to incorporate information from some of the upgrade detectors that were added in 1998.
The output of the main production consists of a series of run directories -- i.e. one directory per run. The biggest file in each directory is the HRC file. The main HDC output file, containing all the decoded data tables, is never stored to disk: HDC output is piped directly to HRC during the main production, to avoid having to store the enormous HDC files on disk. However, the main production also produces a collection of smaller files along with the HRC file. These are mostly generated by HDC and ACE. HDC not only decodes event data, but also extracts various user events from the EPIO files: the all-important 1103 scaler events, polarimeter information, etc ... The information from these user events is dumped into a series of (relatively) small files.
The HERMES Coordinate System
We mentioned the HERMES coordinate system a couple of times in this section ... so let's define it:
- the origin (0,0,0) is at the center of the target
- the +z axis points downstream (in the direction of the positron beam)
- the +y axis points vertically upward
- the +x axis is chosen to form a right-handed coordinate system (thus, +x points to the left if you are looking downstream)
And here are our standard units:
- distances are in centimeters
- angles are in radians
- energies are in GeV
- momenta are in GeV/c
- masses are in (you guessed it!) GeV/c^2
The Slow and uDST Productions
As mentioned before, the main production can take two months to run through a single year's worth of data ... which is a long time. The output of the main production also uses up an enormous amount of disk space. The produced 2007 data, for example, takes up about 10 TB of space. These are two of the reasons that we run a second production chain on our data: the slow production and the uDST production. uDST stands for microDST ... the DST part stands for Data Summary Tape. This name is indicative of the small size of the uDST files, as compared with the main production output.
The slow production is very fast: it takes a maximum of 2 days to produce a year's worth of data. The uDST production is nearly as fast, taking a maximum of 5 days.
The Slow Production
Remember the fill files that were generated by the online slow control software? This is where they get processed, to become slow files (sometimes called `post-produced slow files' in order to avoid/generate confusion). The slow files have basically the same structure as the raw fill files ... again, there is one file per fill.
The slow production is responsible for collecting data from three different sources, and synchronizing it by time stamp. The three sources of input are as follows:
- Most of the data processed by the slow production comes from the raw fill files.
- The slow production adds new tables of information, which come from external expert files supplied by various detector groups. These tables contain offline calibrations that were not needed during the main production. Examples are the smoothed polarimeter measurements, and the gain corrections to the LUMI rates.
- The slow production also pulls in data from the main production run directories. The information from all those little non-HRC files in the run directories is read in at this point. An important example is the scaler information from the 1103 user events. (Actually ... not all of the little files are read, many are only of interest to specific detector groups for their calibrations.)
The uDST Production
The purpose of the uDST production is to produce the very final data files that are used by analyzers to make physics results. The program that does this is called the uDST writer, and the output of the uDST production is a set of uDST files. Each uDST file corresponds to one run. It is quite possible that in your analysis career at HERMES, the only data files you will ever have to deal with are the uDST files.
Here is a snapshot of what the uDST writer does.
- It fuses together the information from the slow files and the HRC files. These two data streams contain data collected on very different time scales. The uDST writer uses time stamps to associate events from the HRC files with the calibrations and other measurements recorded in the slow files.
- Just like the slow production, the uDST writer reads in several expert files, containing offline calibrations. Much of this information concerns data quality. Each detector group goes carefully through the collected data every year, and decides which runs or bursts should be thrown out of the analysis because their detector experienced a fault during a particular period. The uDST writer also reads in other types of expert files. Examples are the parent distributions for the PID calculations, and calibrated polarization information from the target group.
- The uDST writer also performs a lot of calculations. Basically, any calculations that are not intrinsically related to the decoding and reconstruction tasks of HDC and HRC are performed here. Why? Because the main production takes a long time! An excellent example is the PID. As mentioned before, the official PID calculations are now done by the uDST writer ... and the RICH PID algorithm is only incorporated at the uDST stage. In this way, new parent distributions, or changes to the RICH code, can be quickly incorporated by running a new uDST production ... rather than having to wait two months for a new main production to finish.
- The uDST writer tries hard to compress the information from its input data streams into a relatively small format. For one thing, it does not store all events. For example, an event that did not fire trigger 21 and contains only one track is thrown out -- the point being that no useful physics analysis could be performed on such an event (at least, none that we know of at the moment :)). Further, most of the raw information from the detectors is tossed if it is not associated with a reconstructed track (actually, the main production already does most of this filtering). There is an important point here, Cadet: if you happen to have selected an analysis topic that is very new at HERMES (something which happens rather often these days :)), you may find that the uDST writer is throwing out events or information that you would like to keep. When this happens, well, we ask for a feasibility study. If everything looks ok (e.g. the new information you need does not require terabytes of additional disk space), then we change the uDST writer, and rerun the uDST production.
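The time-stamp matching in the first step above can be pictured as follows. This is a sketch under assumptions, not the uDST writer's actual rule: it supposes that each slow measurement stays valid until the next one arrives, and the times and values are made up.

```python
from bisect import bisect_right


def latest_slow_record(slow_times, slow_values, event_time):
    """Return the slow value in effect at event_time, or None if none yet.

    slow_times must be sorted in increasing order; slow_values[i] is the
    measurement that became valid at slow_times[i].
    """
    i = bisect_right(slow_times, event_time) - 1
    return slow_values[i] if i >= 0 else None


# Slow measurements arrive rarely (hypothetical times in seconds) ...
slow_times = [100.0, 160.0, 220.0]
slow_values = ["pol=-0.90", "pol=+0.92", "pol=-0.89"]

# ... while events arrive all the time.
event_times = [50.0, 150.0, 160.0, 300.0]
matched = [latest_slow_record(slow_times, slow_values, t) for t in event_times]
```

A binary search is used because the slow stream is sorted in time and much sparser than the event stream, so each lookup stays cheap even for a long fill.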
The uDST files are organized at two different levels: burst level, and track level. The fundamental organization is at the burst level: in ADAMO language, each record in the uDST files corresponds to one burst. Most tables in the uDST's contain burst-level information, and therefore contain only one row per record. However, within each record (burst) there may be many events, each of which may contain many tracks. Any non-burst-level tables in the uDST files are organized at the track level: they contain one row per track. Each track is associated with an event number, and (by virtue of its location in the record structure) with a particular burst. You have to deal with this track-event-burst structure when you perform your analysis. Fortunately, we have an excellent utility library called hanna which navigates these complications for you. More on hanna later.
I said above that each uDST record corresponds to one burst ... and in doing so, I glossed over an important subtlety. Remember our discussion about the DAQ: the burst number changes every 10 seconds -- whenever an 1103 scaler event is recorded. However, since the target spin state flips on a different time scale, target-related 1203 scaler events may occur within these 10-second bursts. The uDST files actually contain one record per 1103 or 1203 event -- there may thus be more than one uDST record with the same burst number, resulting in a split burst. These different uDST records are distinguished by the target spin state. Here is an example of what the sequence of records in a uDST file might look like:
burst# time (sec - usec) targetbit targetpol
------ ----------------- --------- ---------
28252 919972012 - 380927 8 -0.899
28253 919972022 - 388801 8 -0.899
28254 919972032 - 395604 8 -0.899
28255 919972042 - 403290 8 -0.899
28255 919972047 - 440855 4 0.918
28256 919972052 - 411090 4 0.918
28257 919972062 - 419583 4 0.918
28258 919972072 - 426539 4 0.918
Notice how burst number 28255 appears twice ... it was split in two by a change in the target spin state. The `targetbit' is an important entry in the uDST files which you will soon learn about in detail. As you can see, the codes 4 and 8 mean that the target is in a state with positive or negative nuclear polarization, respectively. So to summarize what happens to the split burst 28255: it will appear in two separate records in the uDST file, each corresponding to a different target spin state. The start of each burst portion was marked in the data stream by an 1103 and a 1203 scaler event respectively (the uDST writer does not really care which type of scaler event it was). These two records will carry the same DAQ burst number (28255), but not to worry ... there is also a uDST record counter which distinguishes them.
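If you ever need to spot split bursts yourself, the bookkeeping is simple. The sketch below uses the eight records from the example listing, stored as plain tuples. This tuple layout is just for illustration and is not how the uDST tables are actually accessed.

```python
from collections import defaultdict

# (burst number, targetbit, target polarization) from the listing above.
records = [
    (28252, 8, -0.899),
    (28253, 8, -0.899),
    (28254, 8, -0.899),
    (28255, 8, -0.899),
    (28255, 4, 0.918),
    (28256, 4, 0.918),
    (28257, 4, 0.918),
    (28258, 4, 0.918),
]


def split_bursts(records):
    """Map each burst number that appears more than once to its targetbits."""
    bits_by_burst = defaultdict(list)
    for burst, targetbit, _pol in records:
        bits_by_burst[burst].append(targetbit)
    return {b: bits for b, bits in bits_by_burst.items() if len(bits) > 1}


splits = split_bursts(records)
```

For the listing above, the only entry returned is burst 28255, with target bits 8 and then 4. This is also why the burst number alone is not a safe key for your own lookup tables.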
Slow and uDST Production Naming Scheme
Usually, any new run of the uDST production also involves a new run of the slow production. Thus, these two programs are typically referred to as the second production chain of HERMES. Since the uDST writer is the last step, this chain is often called simply the uDST production. And in fact, we sometimes do rerun just the uDST production without an attendant rerun of the slow production.
Recall that the main productions are given names like `98b'. The slow + uDST productions are given names like `98b4'. This indicates that the uDST production was run for the fourth time using the 98b main production. The details of why 98b4 is different from 98b2 are provided on one of our web pages (more on that later).
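The naming scheme is regular enough to parse mechanically. Here is a small sketch; the function name and the returned dictionary keys are my own invention, and the pattern only covers the forms described in the text (two-digit year, one letter, optional uDST pass number).

```python
import re

# Two-digit year, main production letter, optional uDST pass number.
_PRODUCTION_NAME = re.compile(r"(\d{2})([a-z])(\d+)?")


def parse_production(name):
    """Split a production name like '97b' or '98b4' into its parts."""
    m = _PRODUCTION_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not a production name: {name!r}")
    year, main_pass, udst_pass = m.groups()
    return {
        "year": year,            # kept as the two-digit string, e.g. '98'
        "main_pass": main_pass,  # 'a', 'b', 'c', ...
        "udst_pass": int(udst_pass) if udst_pass is not None else None,
    }


main_only = parse_production("97b")
with_udst = parse_production("98b4")
```

A name without a trailing number refers to a main production, so `udst_pass` comes back as `None` in that case.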
Here are some typical characteristics of the various passes through the uDST production, in any given year:
- As explained before, the `a' version of any main production contains last year's calibrations. Similarly, the a0 version of any uDST production also contains old calibrations, or no calibrations at all. Its only real purpose is to provide an essentially online check of the incoming data.
- The b0 production in any year has obviously been run from the b version of the main production, and thus incorporates many important calibrations at the decoding and tracking level. However, the expert information supplied to the slow and uDST productions has not been tuned at this stage. Rather, the purpose of b0 is to provide several detector groups with a uDST production they can use to perform these calibrations.
- The first uDST production that has a chance of being good enough for physics analysis is b1. Several detector groups require the b0 production before they can perform their calibrations at all: e.g. redetermination of the parent distributions, and refitting of the parameters needed in the time-of-flight algorithm.
Data Quality
The very last stage of the offline data production is the checking stage. After each uDST production is run, it is carefully and thoroughly checked for errors by a vigilant group known as the Data Cops. This group prepares plots of all important detector quantities vs run number, and goes through them carefully looking for regions where there is a problem. Sometimes the problems are out-and-out errors, e.g. the TRD records nothing for several hours, or the beam polarimeter fits are completely missing for a block of runs. Such errors are usually caused by missing input files from the detector experts, or runs which failed the production for some reason. They are fixed by simply rerunning all or part of the production after the source of the trouble has been identified.
The second type of trouble that the Data Cops track down is called data quality (or DQ), and is more subtle. It happens that during some bursts or some entire runs, pieces of the experiment were really malfunctioning (e.g. the missing TRD information cited above might be due to an actual failure of the device). It is very important that such periods be marked as `bad': analyzers must throw out these data to avoid biasing their physics results. The Data Cops prepare a carefully considered list of up to 32 data quality conditions for each uDST production, and then run a program which tests each burst against each condition. The result is a burstlist, a huge file describing the overall quality of each record (i.e. burst) in the production. Each line of the file corresponds to one record, and contains such bookkeeping information as the (DAQ) burst number, the uDST record number, the target state, and the VME time at which the burst began. Most importantly, each line also contains two badbit words, one for the top half of the detector and one for the bottom. These words are 8-character hexadecimal words ... they thus contain 32 bits, one per data quality criterion. If a bit is set, it means the corresponding criterion was not met. Thus, if a badbit word is not exactly 00000000, it means that the burst in question failed at least one of the data quality checks.
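Decoding a badbit word takes only a few lines. In this sketch the bit numbering (bit 0 is the least significant bit) is an assumption on my part; check the release notes of your production for the real assignment of criteria to bits.

```python
def failed_criteria(badbit_hex):
    """Return the list of bit positions set in an 8-character hex word.

    Bit 0 is taken to be the least significant bit (an assumed convention).
    """
    word = int(badbit_hex, 16)
    return [bit for bit in range(32) if word & (1 << bit)]


clean = failed_criteria("00000000")   # nothing failed
two_bad = failed_criteria("00000005")  # binary ...0101
top_bit = failed_criteria("80000000")  # only the highest bit set
```

Each position in the returned list stands for one data quality criterion that the burst did not meet.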
Analyzers can use the burstlists in one of two ways. The simplest way is to extract a list of the good bursts that you want to include in your analysis, and then supply it via a command line switch to your hanna-based analysis code. (If you are not using hanna, you must write a little routine to check your burstlist.) The alternative is to code in the data quality criteria yourself, and check each burst you analyze. The Data Cops do provide source code for each of their 32 tests. However, there is one important feature of the burstlists that you must keep in mind: they are intended for the analysis of polarized data, up to and including the data of the year 2005. (In 2006/2007, the Recoil period, the target was always unpolarized and the bits in the burstlist were adjusted accordingly.) If you are doing an unpolarized physics analysis you can ignore many of the data quality criteria, such as the cuts on good beam polarization or performance of the target. If you use the first method (parsing the burstlist) to do your data quality, you must not simply select those bursts with badbit word = 0, but rather check each word against a mask which isolates only those bits of interest. This can be easily accomplished using perl, for example.
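The mask check is a one-line bitwise AND. The sketch below is in Python rather than perl, and both the mask value and the choice of which bits count as "polarization-related" are hypothetical: the real bit assignments are documented with each uDST production.

```python
ALL_BITS = 0xFFFFFFFF

# HYPOTHETICAL: pretend bits 1 and 2 are the beam/target polarization checks
# that an unpolarized analysis does not care about.
POLARIZATION_BITS = 0x00000006
UNPOLARIZED_MASK = ALL_BITS & ~POLARIZATION_BITS


def passes(badbit_hex, mask):
    """True if none of the criteria selected by mask failed."""
    return (int(badbit_hex, 16) & mask) == 0


# A burst that failed only the (hypothetical) target polarization check:
strict = passes("00000002", ALL_BITS)           # rejected by a "word = 0" cut
relaxed = passes("00000002", UNPOLARIZED_MASK)  # kept by the masked cut

# A burst that failed some other criterion is rejected either way:
other = passes("00000008", UNPOLARIZED_MASK)
```

Remember that each burstlist line carries two badbit words, one for the top and one for the bottom detector half, so you would apply the check to whichever half (or halves) your analysis uses.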
The Data Cops also maintain the extremely important DataCops pages on the web. The primary page for information on the uDST productions is accessed by the link `uDST Productions'. The burstlists, data quality plots, and release notes for each uDST production are all available on this page. The top-level page also provides links to Online and Offline Data Quality from the Main Productions. The Data Cops check more than just the uDST productions, and the resulting pages contain many useful plots summarizing our data taking (e.g. summaries of the total number of DIS events collected in each year are available on the Offline page). You should browse these pages just to see what's available! For example, do you want to find out what range of runs in 1997 correspond to data-taking from a Nitrogen target? You'll find this information in various places on the Data Quality pages, as described in the Technical Information section below.
Some additional information on features of certain productions is compiled on this Hermes-wiki page: Productions
A Note about PID
Electron - hadron separation
The PID code in the uDST writer (and in HRC) associates each track with the responses of each PID detector module that lies along the track. By "response" we mean the energy measured by a detector module. It then goes on to perform calculations with these responses. The end results of these calculations are parameters called PID2 through PID5, which are measures of the probability that a particular track was an electron rather than a hadron. (Note that charge doesn't matter in this context, so I use the words electron and positron interchangeably).
Let R denote the responses of all the PID detectors along a particular track. For example, R for one track might be a set of detector responses like this: 25 keV in the TRD, 3.5 GeV in the calorimeter, 9 MeV in the preshower, and 5 photoelectrons in the Cerenkov. Now, let Pe(p,R) represent the probability distribution that an electron of momentum p caused a response R. Similarly, let Ph(p,R) be the same probability distribution, but for a hadron. Then, for a track with a particular p and R, we calculate the quantity `PID' as follows:
$$PID = \log_{10} \frac{P^{e}(p,R)}{P^{h}(p,R)}$$
A large positive PID value thus means that the track was probably an electron, and a large negative value indicates a hadron. PID = 0 means that there is an equal probability that the track is one or the other -- we can't tell. A typical cut used by analyzers to identify positrons is PID > 1 ... this selects only those tracks which were at least 10 times more likely to be a positron than a hadron.
The particular quantities PID2 through PID5 differ only in the number of detectors used to perform the PID calculation. PID3, for example uses H2 (the preshower), the CALO, and the Cerenkov, and is computed as follows:
$$PID3 = \log_{10} \frac{P_{H2}^{e}(p,R)\, P_{CALO}^{e}(p,R)\, P_{Cer}^{e}(p,R)}{P_{H2}^{h}(p,R)\, P_{CALO}^{h}(p,R)\, P_{Cer}^{h}(p,R)}$$
Here, Pe_CALO means the probability distribution based only on the calorimeter signal. These Pe and Ph functions, by the way, are called parent distributions. The difference between PID2, PID3, PID4, and PID5 is documented in the DDL files for HRC and for the uDST writer. Basically: PID2 uses only H2 and CALO, PID3 also uses the Cerenkov, and PID4 adds in the TRD as well. PID5 is different: it uses only the TRD, but contains a more sophisticated calculation of the TRD probabilities than that used in PID4. Thus, the standard PID parameter used at HERMES is PID3 + PID5, better known as PID3+5. (Note how the log10 simply makes this an alternative version of PID4.)
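The remark about the log10 is easy to see numerically: the log of a product of ratios is the sum of the logs, so adding PID3 and PID5 is the same as multiplying in the TRD ratio. Here is a sketch with made-up probabilities. In reality these numbers come from the parent distributions, evaluated at the track's momentum and detector responses.

```python
import math


def pid_value(prob_e, prob_h):
    """Sum of log10(Pe/Ph) over the detectors present in both dicts."""
    return sum(math.log10(prob_e[det] / prob_h[det]) for det in prob_e)


# Made-up per-detector probabilities for one track.
pe = {"H2": 0.8, "CALO": 0.9, "Cer": 0.5}
ph = {"H2": 0.08, "CALO": 0.09, "Cer": 0.5}

pid3 = pid_value(pe, ph)

# Same thing written as the log of the product ratio.
product_ratio = (
    (pe["H2"] * pe["CALO"] * pe["Cer"]) / (ph["H2"] * ph["CALO"] * ph["Cer"])
)
pid3_from_product = math.log10(product_ratio)

# PID5 uses the TRD only (again, made-up numbers).
pid5 = pid_value({"TRD": 0.6}, {"TRD": 0.06})

is_positron = (pid3 + pid5) > 1.0  # an illustrative cut on PID3+5
```

In this toy case the Cerenkov cannot tell the two hypotheses apart (equal probabilities), so it contributes zero to the sum, while the preshower and calorimeter each contribute about one unit.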
HRC does perform these PID calculations using the parent distributions. However, you should realize that HRC's calculations are now outdated, because the parent distributions it uses are old ones. The uDST writer recalculates the PID parameters, using the best parent distributions available.
Pion identification with the Cerenkov
The PID2 through PID5 parameters are only used to distinguish electrons from hadrons. However, the HERMES threshold Cerenkov detector was designed to allow further PID: it can separate pions from heavier hadrons (kaons or protons), within a certain momentum range. The lower end of this range is simply the threshold above which a pion will produce Cerenkov radiation. The upper end is the momentum at which a kaon will also radiate. In between the two momenta, you know that if a particle fired the Cerenkov, it must have been a pion. It could also have been an electron, which is even less massive, but the other PID detectors allow you to make that distinction. The gas mixture in the Cerenkov was changed in 1996, so the thresholds are not the same in all years.
In order to select pions with the Cerenkov, you should place momentum cuts on your tracks and then place a minimum cut on the number of photoelectrons produced in the Cerenkov. Conversely, heavier hadrons may be selected by requiring no photoelectrons.
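The selection just described can be sketched as follows. The threshold formula p = m / sqrt(n^2 - 1) is the standard Cerenkov condition; the refractive index and the photoelectron cut below are placeholders chosen for illustration, not the actual HERMES values (which, as noted above, differ between years).

```python
import math

PION_MASS = 0.13957  # GeV/c^2
KAON_MASS = 0.49368  # GeV/c^2

def cerenkov_threshold(mass, n):
    """Momentum (GeV/c) above which a particle of this mass radiates in a medium of index n."""
    return mass / math.sqrt(n * n - 1.0)

def is_pion_candidate(momentum, n_photoelectrons, n, min_npe):
    """True if the track lies between the pion and kaon thresholds and fired the Cerenkov."""
    p_low = cerenkov_threshold(PION_MASS, n)   # below this, even pions do not radiate
    p_high = cerenkov_threshold(KAON_MASS, n)  # above this, kaons radiate as well
    return p_low < momentum < p_high and n_photoelectrons >= min_npe

N_GAS = 1.0013   # hypothetical refractive index, NOT the real HERMES gas mixture
MIN_NPE = 0.25   # hypothetical minimum number of photoelectrons

print(is_pion_candidate(8.0, 2.1, N_GAS, MIN_NPE))
```

Electrons also fire the Cerenkov in this window, so a cut like this only makes sense after the electron-hadron separation has been applied.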
Hadron identification with the RICH
Since 1998, the Cerenkov detector has been replaced by a RICH (Ring Imaging Cerenkov). This device is much more powerful than the Cerenkov, as it allows the separation of pions, kaons, and protons across basically the entire momentum range of HERMES. However, the device is not designed to perform electron-hadron separation. That function is left to the CALO, Preshower, and TRD. The RICH PID calculations are a fundamentally different beast from the electron-hadron algorithm. These calculations are performed entirely in the uDST writer ... HRC knows almost nothing about the RICH.
The RICH PID is performed in different ways: IRT, DRT, and RPS.
- IRT (Indirect Ray Tracing)
- IRT associates each hit in the photon detector matrix with each track, and using its knowledge of the mirror geometry, determines the angle of the photon with respect to the track. For each track, the algorithm then considers three particle hypotheses: was the particle a pion, a kaon, or a proton? Given each different mass hypothesis, plus the track's known momentum, one may calculate the expected opening angle of the Cerenkov cones produced in the gas and aerogel radiators. The algorithm analyses the photons that were observed near these expected ring radii, and calculates a probability number for each mass hypothesis. The uDST writer runs IRT for all events.
- DRT (Direct Ray Tracing)
- The DRT algorithm also considers three mass hypotheses for each reconstructed track. For each hypothesis, it simulates a large number of Cerenkov photons, producing an `expected' pattern of photon hits. The recorded pattern is then compared with this simulation, and a probability number is produced. DRT is superior to IRT in some momentum ranges (for example, it accounts for the fact that Cerenkov photons radiated in the gas volume can originate from anywhere along the particle track ... IRT assumes that the photons all came from the middle of the gas volume). However, DRT takes a lot longer to calculate. Because of this, the uDST writer previously did not run DRT for every track (productions run after 2008 have DRT for every track). However, there have been further developments...
For each reconstructed track, both IRT and DRT RICH PID methods yield one probability number for each mass hypothesis. These are combined into a decision: the identity of the particle is the hypothesis with the highest probability. How confident can we be of this decision? That is determined by the RQP (RICH Quality Parameter). This parameter is very similar in design to the PID parameter described earlier:
$$RQP = \log_{10} \frac{P_{mass1}}{P_{mass2}}$$
Here, $P_{mass1}$ is the probability for the mass hypothesis with the highest likelihood ... $P_{mass2}$ corresponds to the second most likely hypothesis.
- RPS (RICH PID Scheduler)
- The next development in the RICH PID was the creation of the scheduler: this is not another algorithm, but rather a neural-net code which decides whether to use the IRT or DRT method when calculating the probabilities for each track. A set of rules was developed to optimize performance by simultaneously maximizing efficiency and minimizing contamination. For example, IRT is more effective at higher momenta, while DRT does better at low momenta ... the scheduler knows this and selects the best method based on momentum, as well as other track parameters. In the uDST files the RPS decision is stored in the "BEST" link in the g1Track table.
- EVT (Event level DRT algorithm)
- The latest development for the RICH is an event level algorithm. The idea here is to consider all of the tracks in the detector at once. This is done by taking the existing expected patterns from the DRT algorithm and adding them together. The idea is to improve the identification of tracks that are close together and whose Cerenkov rings overlap. This is especially important for analyzers that are specifically looking at multiple track events or events where two tracks are close together. The complication is that there are three mass hypotheses for each track, so the number of combined hypotheses scales like $3^{\#tracks}$. So, if there are 4 tracks in the detector that's 81 hypotheses to check! Fortunately for us the number of tracks at HERMES is rarely more than 3. :) After all these hypotheses are evaluated the most likely one gives the particle type for all the tracks in the event.
For EVT the calculation of the RQP parameter is slightly more complicated. Instead of considering only the two most likely probabilities, we take the most likely probability, $P_{mass1}$, as before, and then the most likely of the remaining hypotheses in which the track you're calculating the RQP for has a different mass hypothesis. That way, if one track in the event is impossible to identify (so that all the hypotheses where only this track changes have the same probability), the other tracks in the event won't have an RQP of 0.
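To make the counting and the EVT-style quality parameter concrete, here is a toy sketch in Python. It assumes the RQP is the log10 of a probability ratio, in analogy with the PID parameter, and it builds the combined probabilities as a simple product of made-up per-track numbers. The real EVT algorithm compares summed hit patterns instead, so treat the names and numbers here as illustrative only.

```python
import itertools
import math

MASSES = ("pion", "kaon", "proton")

def combined_hypotheses(n_tracks):
    """All assignments of one mass hypothesis to each track: 3**n_tracks tuples."""
    return list(itertools.product(MASSES, repeat=n_tracks))

def evt_rqp(probabilities, track):
    """Quality parameter for one track, given {hypothesis tuple: probability}.

    The best combined hypothesis is compared with the most likely hypothesis
    in which this particular track is assigned a different mass.
    """
    best = max(probabilities, key=probabilities.get)
    rivals = [p for hyp, p in probabilities.items() if hyp[track] != best[track]]
    return math.log10(probabilities[best] / max(rivals))

# Toy event: track 0 is clearly a pion, track 1 cannot be identified at all.
track_probs = [
    {"pion": 0.9, "kaon": 0.09, "proton": 0.01},
    {"pion": 1 / 3, "kaon": 1 / 3, "proton": 1 / 3},
]
probs = {
    hyp: math.prod(track_probs[i][mass] for i, mass in enumerate(hyp))
    for hyp in combined_hypotheses(len(track_probs))
}

print(len(combined_hypotheses(4)))  # number of combined hypotheses for 4 tracks
print(evt_rqp(probs, track=0), evt_rqp(probs, track=1))
```

In this toy event the three best combined hypotheses are tied, because they differ only in the mass of the unidentifiable track. Comparing the two most likely combined hypotheses directly would therefore give a quality of 0 for both tracks, while the per-track definition still gives the well-identified track a large value.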
To perform hadron identification with the RICH, you simply take the IRT, DRT, BEST, or EVT `most-likely hypothesis' decision from the uDST files. You may also choose to place a minimum cut on the RQP ... e.g. a cut RQP > 0.5 would throw away events where the RICH could not decide on particle type to better than a factor of 3. This will increase the purity of your hadron sample, but decrease the identification efficiency. But you have to correct for contaminations and efficiencies no matter what PID cuts you use.
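A per-track version of this selection might look like the sketch below. In a real analysis the decision and the RQP are simply read from the uDST files; the function names, the dictionary layout, and the assumption that the RQP is the log10 of the ratio of the two largest probabilities are all illustrative.

```python
import math

def rich_decision(probabilities):
    """Return (most likely particle type, RQP) for one track.

    probabilities: dict mapping each mass hypothesis to its probability.
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    (best_type, p1), (_, p2) = ranked[0], ranked[1]
    rqp = math.inf if p2 == 0 else math.log10(p1 / p2)
    return best_type, rqp

def select_hadron(probabilities, min_rqp=0.5):
    """Return the particle type, or None if the RQP cut rejects the track."""
    best_type, rqp = rich_decision(probabilities)
    return best_type if rqp > min_rqp else None

# Hypothetical probabilities: a clear pion and an ambiguous track.
print(select_hadron({"pion": 0.90, "kaon": 0.09, "proton": 0.01}))
print(select_hadron({"pion": 0.50, "kaon": 0.40, "proton": 0.10}))
```

A cut at 0.5 corresponds to a probability ratio of 10^0.5, or roughly 3.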
The Monte Carlo Production
A Monte Carlo is a simulation program: it is a piece of software that simulates the physics processes you are studying in your experiment, based on known or hypothesized cross-sections. Using random numbers, it samples these cross-sections, and generates events like those you see in real life. Monte Carlo programs in particle physics also contain a model of the detector, and simulate the response of the detector to each generated event.
The Monte Carlo production chain consists of three programs: GMC simulates various physics processes, HMC runs these generated events through the detector, and WriteMcDST transforms the HMC output files into uDST format. Programs like HRC, HTC, TMC, and XTC can be plugged in between HMC and WriteMcDST. These serve various needs; the most important of them, HRC, tries to reconstruct tracks and clusters from the hits generated by HMC.
Monte Carlo principles
Technically, the term Monte Carlo refers to a technique for doing numerical integration. Here's an example: how do you compute the number Pi? Pi is the area of a circle of radius one, but you cannot obtain its value through analytic means. (If you do the area integral by hand, you'll get, well, Pi ... but this analytic integration does not yield any numerical value). So how to get the number? Well, a circle of unit radius sits nicely inside a 2x2 square. We know the area of the square: 4. Since Pi is the area of the embedded circle, what you can do is generate a large number of random coordinates (x,y) within the 2x2 square, and count how many of them fall within the circle. Suppose you generated 1 million such numbers, and 785,380 of them landed within the circle. Then the area of the circle = the area of the square times the fraction 0.785380 ... namely Pi. (Give it a try :) it takes only a few lines to write such a program.)
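Here is one way to write those few lines, as a sketch in Python. The function name and the seed are arbitrary choices; the result fluctuates from run to run unless a seed is fixed.

```python
import random

def estimate_pi(n_points, seed=None):
    """Estimate Pi by throwing n_points random points into the 2x2 square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:   # the point landed inside the unit circle
            inside += 1
    # area of circle = area of square (4) times the fraction of points inside
    return 4.0 * inside / n_points

print(estimate_pi(1_000_000, seed=12345))
```

The statistical error on the estimate shrinks only like one over the square root of the number of points, which is why Monte Carlo productions need so many events.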
Many steps in complexity later, the same principle appears in the HERMES Monte Carlo. Your goal is to measure cross-sections, which are probability distributions for certain things to happen when you scatter particle x from particle y. e.g. what is the cross-section for pion production as a function of pion momentum? To perform this measurement, you would like to have a perfect detector, that could see every single particle produced in every event and determine with perfect accuracy the properties of these particles. No detector is perfect, however. The HERMES spectrometer has a limited acceptance, and does not see all particles produced when the beam hits the target. It also has a limited resolution: when HRC says that a particle had a momentum of 4.5 GeV/c, it only does so with a precision of about 1%. One of the major purposes of the Monte Carlo simulation is to turn our measured cross-sections into true cross-sections, corrected for acceptance and resolution. In the simple Pi example, you used a Monte Carlo technique to turn a square into a circle. If the circle corresponded to your detector acceptance, you could use the comparison between the two areas to correct back your measurement (the area of a circle) to the truth (the area of a square). Similarly, the HERMES Monte Carlo turns true cross-sections into measured cross-sections. By comparing output with input, you may hope to correct your measurements for detector effects.
A second purpose of the Monte Carlo program is to perform background corrections. Simply: suppose you are measuring physics process A, but you know that your event sample contains a contamination from an unwanted physics process B. Somehow you have to subtract the events due to process B from your sample. There are several ways to do this sort of thing, one of which is Monte Carlo. If you happen to know the cross-section for process B (often because it was carefully measured by some other experiment), then you can generate a sample of events based on this process. You subtract these from your data, et voila, background correction accomplished. (Of course, nothing is ever quite that simple. :))
Monte Carlo is also very useful for sanity checking. Suppose you have written a beautiful analysis program to extract polarized quark distributions from the spin asymmetries measured at HERMES. How do you know that your program is really working? One excellent method of checking is to invert the Monte Carlo. You generate a sample of Monte Carlo events, using hypothesized polarized quark distributions as input. The Monte Carlo output will consist of a large number of simulated events based on this input, in a format very similar to that of the actual data. Thus: you can run your analysis program directly on the Monte Carlo events. If everything is working as it's supposed to, your analysis program should extract the same quark distributions that were used as input. In other words, the Monte Carlo can provide you with data from a fake world where you know all the answers ... the question is: can your analysis program recover those answers?
And finally, the Monte Carlo is used for projections and feasibility studies. A typical example is the addition of new (upgrade) detectors to the HERMES spectrometer. The design of all these detectors involved Monte Carlo work. Each detector was built in order to enhance our sensitivity to certain physics processes. Monte Carlo simulations of these processes allowed the design of each detector to be optimized: what angular range should it cover? what resolution is necessary? will it work at all?
GMC
GMC (Generator Monte Carlo) is a suite of physics generators that simulate different physics processes. Each generator is a separate program, e.g. gmc_dis simulates deep-inelastic scattering, while gmc_aroma simulates heavy quark production. Many of these programs are based on `standard' Monte Carlo packages, written by other people (usually theorists). The GMC code itself is basically a wrapper around these various generator packages. It provides:
- A standard user interface to each generator.
- A standard output format, which can be sent to HMC for processing through the detector simulation.
- The opportunity for the Monte Carlo group to supply default settings for each generator that have been found to be optimal for use at HERMES.
The GMC output files contain one event per record, with each event containing numerous particle tracks. Remember the HRC files, which also contained information by track? HRC only knows about tracks reconstructed by the spectrometer. GMC, on the other hand, knows everything about the generated physics event, but nothing about the spectrometer: it is purely a physics program. Consequently, Monte Carlo track tables contain many more entries than you would ever get out of HRC. Suppose that GMC produced an event which contained an ω meson. The ω decays very fast, often to a π+, a π-, and a π0. The π0 itself then decays to two photons. After the short amount of time (< 10E-17 sec) it takes for the decay to occur, you then have four particles (two charged pions and two photons). If all these particles actually made it into the detector, an HRC-type file would record two tracks (from the charged pions), and two untracked clusters (the photons). By comparison, GMC will record six separate tracks in its track table: ω, π+, π-, π0, γ, and the other γ. In other words: all unstable particles are also recorded as tracks in the GMC output. The ω itself lives for an extremely short amount of time, so it has no chance of being seen intact in the spectrometer. But to GMC, it is a track. GMC relates all of these tracks together using parent links. The ω is the parent of the π0, the π0 is the parent of the two photons, etc ...
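The ω example can be pictured as a small table with parent links. The sketch below is a plain Python illustration, not the actual ADAMO table layout: the `Track` class, its field names, and the use of list indices as links are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Track:
    name: str
    parent: Optional[int] = None  # index of the parent track, None for a primary particle

# The six tracks of the omega example, in generation order.
tracks = [
    Track("omega"),
    Track("pi+", parent=0),
    Track("pi-", parent=0),
    Track("pi0", parent=0),
    Track("gamma", parent=3),
    Track("gamma", parent=3),
]

def ancestry(table, index):
    """Follow the parent links from one track back to the primary particle."""
    chain = []
    while index is not None:
        chain.append(table[index].name)
        index = table[index].parent
    return chain

def final_state(table):
    """Names of the tracks that are nobody's parent, i.e. that did not decay."""
    parents = {t.parent for t in table if t.parent is not None}
    return [t.name for i, t in enumerate(table) if i not in parents]

print(ancestry(tracks, 4))
print(final_state(tracks))
```

Only the four final-state particles have any chance of reaching the spectrometer; the ω and the π0 exist in the table purely as bookkeeping.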
HMC
HMC (HERMES Monte Carlo) is the detector simulation package. It takes GMC files as input, and runs all the events through the spectrometer, simulating the response of each detector to each particle. The HMC program contains an accurate model of the detector, which describes the location and materials of all hardware components.
The output files of HMC contain many tables, which reflect the various steps that the program takes when processing each event:
- The GMC (generated) tables are preserved intact.
- Each particle generated by GMC is painstakingly transported through the spectrometer. This is accomplished by a package called GEANT3, which comes from CERN. GEANT3 has detailed knowledge of all the many things that can happen to a particle when it traverses materials: multiple scattering, ionization energy loss, bremsstrahlung, hadronic interactions with atomic nuclei, etc ... GEANT3 randomly simulates what happens to each particle based on the cross-sections for all these physics processes.
- Some of the objects in the spectrometer model are designated as sensitive detectors. These are the devices that can actually record information about the particle: e.g. a wire chamber plane is a sensitive detector, while the target cell is not. Whenever a particle crosses a sensitive detector, GEANT3 records a hit. It stores the position of the hit, and also the amount of energy the particle deposited in the sensitive detector. Some of the HMC tables contain this hit information.
- All of the hits are then digitized. Based on the amount of energy deposited in each module of each detector, calculations are performed to simulate the actual (calibrated) signal the detector would have produced. The goal here is to provide information equivalent to the output of HDC ... and indeed, the HDC tables are filled by HMC. An example: digitization transforms an (x,y) hit in a wire chamber plane into a wire number and a drift distance. This calculation involves several digitization parameters, such as the efficiency of each plane.
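The wire chamber example in the last step can be sketched like this. It is a deliberately simplified, one-dimensional illustration and not the HMC digitization code: the wire layout, the parameter names, and the way the plane efficiency is applied are all assumptions, and only the coordinate perpendicular to the wires is used.

```python
import random

def digitize_hit(x, first_wire_x, wire_pitch, n_wires, efficiency=1.0, rng=random):
    """Turn a hit position x into (wire number, drift distance), or None.

    All geometry arguments are hypothetical inputs in the same length unit;
    wires are numbered from 0 and sit at first_wire_x + k * wire_pitch.
    """
    # Plane inefficiency: drop the hit with probability (1 - efficiency).
    if rng.random() >= efficiency:
        return None
    wire = round((x - first_wire_x) / wire_pitch)  # nearest wire
    if not 0 <= wire < n_wires:
        return None  # the particle missed the instrumented area
    drift = abs(x - (first_wire_x + wire * wire_pitch))
    return wire, drift

print(digitize_hit(x=3.3, first_wire_x=0.0, wire_pitch=1.0, n_wires=10))
```

Note what is lost in this step: the digitized hit no longer says on which side of the wire the particle passed, which is exactly the ambiguity the reconstruction has to resolve in real data.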
The output of HMC is thus equivalent to HDC files, plus a lot of extra tables. The Monte Carlo production chain can therefore go ahead and reconstruct the simulated data, by simply passing it through HRC. HRC automatically preserves all the GMC tables in its output. It can preserve the other tables too, if you ask it nicely.
A small note: HMC itself does contain its own little event generator, called BACK. This is the background generator ... it throws particles randomly at the detector within a box in angle-momentum phase space. In other words, if you use the background generator, you don't need to run GMC to generate events. No physics is involved in this little generator, but it does provide a convenient way of testing the detector's response to particular particles.
WriteMcDST
The Monte Carlo production finally passes the HRC files to the Monte Carlo DST writer, called WriteMcDST. This simply transforms the information into the uDST format used by most analyzers. Monte Carlo DST files (mcDST's) contain the same basic tables as data DST's, but a lot of hardware-type information is missing. For example, such things as data quality, HV trip detection, and pressure in the target cell are unknown to the Monte Carlo. However, the mcDST's contain a few additional tables ... these contain of course all the interesting information from GMC. For example, the real identity of each particle is recorded, not just the PID values computed from the simulated detector responses.
The Geometry File and HDB
Backing up for a second, remember that HMC needs a precise model of the HERMES spectrometer. This is provided by the infamous geometry file. Its name (more or less) is hmcdg.ie. This big ADAMO file contains the dimension, material, and location of every physical object in the spectrometer. Well, every relevant object ... the stairs up to the target platform are not included, for example. :) Each such object is referred to as a volume (this nomenclature comes from GEANT3).
The sensitive detectors mentioned before get special treatment. There are two ADAMO tables at the very end of the geometry file, called dgDETS and dgDETINFO, which contain information only about the sensitive detectors. Why is this important? Because these are the only two tables in the geometry file which are used by programs other than HMC. Yes, several programs need the geometry file, and that is why it is described here in its own section: HDC, HRC, and the uDST writer all read it in.
One reason is that these two tables supply the fundamental mapping between detector ID and detector name. Remember HDC's mapping function? All detector responses coming out of HDC are identified by a detector ID and a wire/module number. Great. But now suppose you want to retrieve hits from a particular detector, say the first wire plane of the upper part of BC4. This detector has a nice 4-character name: B4U1. It is much better to retrieve information using this name, rather than using the detector's ID number, which is 99. For one thing: it is 99 at the moment. The detector numbering might change! You do not want to write a program that accesses detector 99, and then discover that in some productions it was actually number 97. The dgDETINFO table provides the relation between name and ID. Also, the dgDETS and dgDETINFO tables of the geometry file provide basic position information about the detectors. HDC and HRC use this to determine the actual position of the various wires, hodoscope paddles, and calorimeter blocks in the experiment. As described earlier, HRC also consults the alignment file, which supplies precise, year-dependent corrections to the baseline positions stored in the geometry file.
So only two tables from the geometry file are needed by any program other than the Monte Carlo. For this and other reasons, the geometry file is now available in several versions. One version, called simply hmcdg.ie, is used by all the data analysis programs. Actually these programs (HDC, HRC, etc ...) usually get their information from the geometry server, but the GServer simply contains a copy of the tables in hmcdg.ie. The Monte Carlo uses files with names like hmcdg_97.ie. This file contains a careful description of exactly what the detector looked like in the year 1997. Since changes have occurred to the experiment, different geometry files must be used to simulate the conditions during different data-taking periods.
The geometry file is managed by a program called HDB (HERMES DataBase). This program generates the actual geometry file needed by the other programs. To generate this file, it takes a different file as input, called generically hdbgeometry.ie. The point is that this input file is a lot easier to modify than the output file: HDB performs many calculations and administrative tasks, like making sure that the frames of the front chambers do not stick into the wire planes. For example, the H1 hodoscope array is described in the input file hdbgeometry.ie by a set of about 10 numeric parameters. The HDB program then performs the calculations necessary to create and position the various frame and paddle volumes of the device.
Two Words about Usercodes
We've come to the end of our first reconnaissance fly-by, Cadet. This mission is not over, but we have made it all the way through the HERMES Software Suite, from the East Hall to the uDST's. You know where the data comes from, and more or less what it looks like at the end. So now you want to do an actual physics analysis ... what's next?
Answer: you have to write your own program. The processed data is available to you in various formats (uDST's, HRC files, mcDST's, ...), but it is your job to read it in and turn it into a measurement. All of our data is in ADAMO format, so you will have to learn something about it ... that will be your next mission. The program you have to write is generally referred to as a usercode. Most likely, you will first want to create a PAW ntuple from the data, containing the information you are interested in. After you have studied your ntuple for a while, and know what you want to do, you may choose to write a code that calculates histograms directly ... such codes are faster than PAW at performing calculations and their output consumes less disk space. But it's up to you. If you want, you can write a usercode to turn the ADAMO data files into a format that Mathematica or Excel can read ... or just create a text file. Whatever you like. Your last mission in Software Bootcamp will be all about usercodes.
Hanna
Hanna (HERMES analysis) is a utility library written by famous HERMES programmer Marc-André Funk (a.k.a. MAF). It provides you with a convenient framework for your analysis code. It does significant bookkeeping, and can deal with files of all formats used at HERMES: HRC GAF's, uDST's, Monte Carlo files ... However, the most important feature of hanna (and the main reason it was written) is that it can synchronize slow and HRC files -- it can read in both types of files at the same time and correctly associate slow information with event information. Recall that the uDST writer performs this exact job ... in fact, the uDST writer itself is a hanna program.
Now that we have the uDST's available, you will probably never need hanna's synchronization capabilities. But it is still a convenient tool to use to read in any data files ... most of the examples in the `Analysis Usercodes' tutorial are written using the hanna framework.
The Release Structure: Where the Codes Are
So, you now have some idea of what the available codes are ... but where are they? Good question. The answer lies in the release structure. Understanding this structure is also important if you want to actually run the programs ...
Why do we need this structure?
The collection of HERMES software comprises many separate programs, but many of them rely on the same files. A notable example is the geometry file hmcdg.ie, which is read in by HMC, HRC, HDC, and many others. Other common files are needed by the programs at compile time: for example the ADAMO include file partap.inc, and the dad object library libdad.a.
In an attempt to make certain that every program is, in fact, using the same versions of these common files, we have developed a directory structure which groups all the software into releases. Each release contains a version of every program, as well as versions of all of the common files that they share. A source of trouble in the distant past was that the manager of one program would change one of these common files, but would keep this modified file only in a local area specific to his/her program. Bad idea! Other programs would also keep local copies of the file, with the consequence that modifications would be made independently to different files, and were easily lost.
To avoid these sorts of problems, it is important that anyone installing software on any computer system preserve this release structure. Even if you only want to install one program, please create an area on your computer with the full release structure, storing the common files needed by the program in the common directories. In this way, you will be certain to obtain all the necessary files, since they are generally not stored in the program-local directories.
The Release Structure
The master versions of the HERMES software releases are stored in the HERMES CVS repository, whose location is encoded in the CVSROOT environment variable. A (more or less) up-to-date compiled version is located on the PCFarm, under the directory /hermes. Release number 1 is kept in /hermes/r01, release number 2 is in /hermes/r02, and so on. The directory structure looks like this:
Directory:              Comment:
---------               --------
/hermes/r24             the release 24 version of the software
           /gmc         the package directory for the GMC and HMC programs (containing source code and tar files)
           /hrc          "     "        "      "   "  HRC  "
           /hdc          "     "        "      "   "  HDC  "
           ...
           /bin         copies of the executables from all the above programs
           /lib         common libraries and ALL input files needed by the programs
           /include     include files common to several programs
           /ddl         ADAMO DDL files common to several programs
       /r25             identical structure for release 25
       ...
       /pro -> /r24     the production release, simply a link to one of the existing releases
       /new -> /r25     the new release, also a link to one of the existing releases
The pro and new releases are not releases in their own right, but simply links to existing releases. The pro version is always a version which is frozen, i.e. it will never change and can be used reliably for long-term studies without fear of programs changing from day to day. The new release contains the current "beta-test" versions of all the programs, the versions that the software managers are updating constantly with bug fixes and changes. These versions are not stable, but they contain the latest updates, and do actually work most of the time. :)
In particular, whenever a software release is used for a run through the data production, it is immediately frozen. It becomes the `pro' version, and a new `new' version is started.
Most of the HERMES software packages are stored in the release structure, but not all: the slow production code and the uDST writer are stored elsewhere, along with the many expert files that are needed for them to run. Basically, it is not expected that users will want to run these production codes. Users should know where the source is located, however, and that is explained below. One other exception is DDL files. Most of these files are found in the ddl directory of the release structure, but sometimes files needed by only one program are only found in its package directory. A notable example is the set of DDL files for the HDC servers: they are found only in /hermes/r??/hdc/ddl.
The HERMES_ROOT Environment Variable
One of the consequences of the release structure is that you can run any program from any empty directory by setting only one environment variable: HERMES_ROOT. Suppose you want to use the software in release r25. If you set the environment variable HERMES_ROOT to the top directory of that release (i.e. to /hermes/r25), several wonderful things will happen:
- Any official HERMES program will know where to look for the input files it needs: in $HERMES_ROOT/lib. You do not have to copy current versions of input files to your work area before things will run properly!
- If you are trying to compile a private version of any HERMES software package, the package's Makefiles will know where to look for any include files they need: $HERMES_ROOT/include. The Makefiles can also find the libraries they need, in $HERMES_ROOT/lib.
- All the binaries (executables) from release r25 are conveniently stored in /hermes/r25/bin. The idea is this: you should include $HERMES_ROOT/bin in your UNIX PATH. Then you only need to type the name of any program, from any directory, and you will get the version corresponding to the release you selected.
To select r25 as the release you want to use, for example, use one of the following two command syntaxes.
When using a Bourne-type shell (sh, bash, ksh, zsh, etc...) use
    HERMES_ROOT=/hermes/r25
    export HERMES_ROOT
    PATH="$HERMES_ROOT/bin:$PATH"
    export PATH
When using a C-type shell (csh, tcsh, etc...) use
    setenv HERMES_ROOT /hermes/r25
    set path=($HERMES_ROOT/bin $path)
The Official Search Path for Input Files
Official HERMES programs enforce a standard search path for all input files. Suppose you are running HMC from some random directory (called the `local' directory). HMC needs to read in a file called hmc.targ, describing the target density profile. Here is the search path that HMC and the other official programs employ to find a needed file:
- HMC checks the local directory. If the file is there, it is used. This enables you to work with private, working versions of any input file ... local versions always supersede versions in the standard locations.
- If you have set the environment variable HERMES_LIB, HMC will next check that directory for the file. Most users will not need to think about this variable. It is useful, however, if you have installed a partial release in a private directory. You can point HERMES_LIB to your own release's lib/ directory, but point HERMES_ROOT to one of the standard releases. The standard release can thus supply any files that you did not install in your own partial release.
- If the needed input file is still not found, and if the environment variable HERMES_ROOT is set, HMC will look in the directory $HERMES_ROOT/lib.
- If that also fails, HMC will crash. :)
The idea is this: all the standard input files needed by the HERMES software are stored in $HERMES_ROOT/lib (e.g. the geometry file and the server list dadinit.cnf live there), but you can easily supersede these `official' files with private versions if you wish.
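The lookup order described above can be sketched in a few lines of Python. This is an illustration of the logic only, not a translation of the official C module; the function name `find_hermes_file` and its arguments are made up, and the real programs crash rather than raise an exception.

```python
import os
import tempfile

def find_hermes_file(filename, local_dir=".", environ=None):
    """Return the first match along the search path: local, HERMES_LIB, HERMES_ROOT/lib."""
    env = os.environ if environ is None else environ
    candidates = [local_dir]
    if env.get("HERMES_LIB"):
        candidates.append(env["HERMES_LIB"])
    if env.get("HERMES_ROOT"):
        candidates.append(os.path.join(env["HERMES_ROOT"], "lib"))
    for directory in candidates:
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(f"{filename} not found in {candidates}")

# Demonstration with throw-away directories standing in for a work area and a release.
with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as release:
    os.mkdir(os.path.join(release, "lib"))
    official = os.path.join(release, "lib", "hmc.targ")
    open(official, "w").close()
    env = {"HERMES_ROOT": release}
    found_official = find_hermes_file("hmc.targ", work, env)

    private = os.path.join(work, "hmc.targ")   # a private copy in the local directory
    open(private, "w").close()
    found_private = find_hermes_file("hmc.targ", work, env)

print(found_official == official, found_private == private)
```

Once the private copy exists in the local directory, it wins over the official file in the release, which is exactly the behaviour the search path is designed to give.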
The HERMES search path is implemented by a standard little C module, variously called hermfile.c or hermesfile.c. Copies of this routine appear all over the HERMES software suite. For your programming convenience, here is a copy of this standard module, along with a README file that describes how to use it.
- Download: hermesfile.c
- Download: README-hermesfile
Software installation and configure scripts
Configure scripts are an essential part of the installation procedure for HERMES software. The HERMES software packages are run by many users on many different computers all over the world. It is thus important that the software can be easily installed under several different versions of the UNIX operating system. In general, each brand of UNIX requires different compiler options and the use of different system libraries at the loading stage. The installation procedure may also have to deal with unusual software setups on particular machines (e.g. if a user has decided to install a personal version of ADAMO in some goofy location like ~joeblow/testing/adamo).
All of these variations are accommodated using configure scripts. Here's the idea: The software packages are all built using one or more Makefiles. Each of these files must know which compiler options and libraries to use to create the executable programs ... and those options are system dependent. Thus, HERMES software managers don't write Makefiles directly, but rather Makefile templates, called Makefile.in. These Makefile templates look just like Makefiles, except that whenever a system-dependent option appears, it is indicated by a tag. An example is the compilation rule for a Fortran routine. A regular Makefile would contain a rule like this:
f77 -c -O2 -Olimit 3200 -mips2 -trapuv -static -Nn30000 -Ne500 ...
A Makefile.in template recognizes that the string of compiler options may well be different on another system, and that even the compiler name itself (f77) may be different (e.g. some systems use the GNU version g77). Thus, the rule will appear in a Makefile.in template like this:
FC = @f77@
FFLAGS = @f77flags@
...
$(FC) -c $(FFLAGS) ...
The strings surrounded by @ signs are tags. The configure scripts parse the Makefile.in templates, and turn them into actual Makefiles by replacing the tags with appropriate system-dependent options.
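The tag replacement itself is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not the actual configure machinery (which, as described further down, uses a sed command). The function name fill_template and the sample values are assumptions.

```python
import re


def fill_template(template, values):
    """Replace every @tag@ in `template` with values[tag].

    Tags that have no entry in `values` are left untouched, so a
    forgotten option stays visible in the generated file.
    """
    def substitute(match):
        tag = match.group(1)
        return values.get(tag, match.group(0))

    return re.sub(r"@(\w+)@", substitute, template)


if __name__ == "__main__":
    makefile_in = "FC = @f77@\nFFLAGS = @f77flags@\n"
    # Hypothetical values for a system that uses the GNU compiler.
    print(fill_template(makefile_in, {"f77": "g77", "f77flags": "-O2"}))
```

Leaving unknown tags in place is a design choice for this sketch: a leftover @tag@ in a Makefile is easy to spot, whereas silently substituting an empty string is not.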
To give you an overview of how this works, following is the basic installation sequence for any HERMES software package:
- set the environment variable CVSROOT=:pserver:anoncvs@kirk.desy.de:/hermescvsroot
- check out a copy of the package you want; e.g. to get the "frozen" r24 version of HRC, type
cvs co -P -r r24 hrc
- To get the latest ("HEAD") version, just leave out the -r r24 option.
- Run the configure script. This creates actual Makefiles from the supplied Makefile.in templates, by replacing all the tags with values appropriate to your system.
- Type make. Some packages offer optional targets you can supply to make ... these are explained in the screen output of configure.
- If the package was successfully created, type make install to copy up the created binaries to the release's bin directory. Other files that are "owned" by this package, but are needed by other programs (e.g. DDL files) will also be copied up (e.g. to your release's ddl directory). This step will fail unless you have write access to the installation area.
Some more notes: configure basically does two things:
- It determines compiler options appropriate to your system.
- It searches in standard places for the libraries needed by the package. Some configure scripts also look for executables you need, like the ADAMO mad program.
The configuration options determined by the script are written to a small file called config.status, which contains a single sed command to perform all the tag replacements. At the very end, configure runs this sed command on the Makefile.in templates in the package directory, to produce properly configured Makefiles.
Please read the screen output of configure. It usually provides a summary of the selected configuration, and will always tell you if there was some problem (like a library that could not be found). You may need to adjust some of the options for your particular system. Rather than editing configure, you may do so by modifying your environment variables; these always override the default options coded into configure. Here are some environment variables that are commonly checked by configure.
- HERMES_ROOT is of course of key importance. As explained above, HERMES_LIB is also searched for libraries, if it is set in your environment.
- Your PATH is also important, if configure is looking for executables.
Starting with release r17, step-by-step instructions for installing the entire HERMES software suite on your computer are now available directly from the Software Page ... just click on the link to the release you want.
Setting up your HERMES Account
It is clearly important to set up your PATH and other HERMES-specific environment variables correctly. The system provides reasonable defaults, but to be sure you know what you are doing, you should place explicit settings in your shell's rc file (.bashrc for bash users, .cshrc for csh users, etc). Following is an example setup for the HERMES PCFarm:
HERMES_ROOT=/hermes/pro
export HERMES_ROOT
PATH=$HOME/bin.Linux_RHEL3:$HOME/bin:/hermes/bin:$HERMES_ROOT/bin
PATH=$PATH:/shared/usr/bin:/usr/sue/bin:/opt/products/scripts
PATH=$PATH:/opt/products/bin:/bin:/usr/bin:/usr/bin/X11:/usr/local/X11/bin
PATH=$PATH:/usr/local/bin:/usr/kerberos/bin:/cern/pro/bin:.
export PATH
MANPATH=/shared/usr/man:/usr/sue/man:/opt/products/man:/usr/share/man:/usr/afsws/man
MANPATH=$MANPATH:/usr/X11/man:/usr/local/X11/man:/usr/local/man:/usr/kerberos/man
export MANPATH
If you are using a csh derivative, you must of course replace all the VAR=value commands with setenv VAR value. Also, I broke up the PATH declarations on several lines just for readability; you can do them on a single long line if you wish.
Only the first and last lines of the PATH declaration require some comment (the rest are either normal UNIX or specific to the PCFarm).
- Your local directory (`.') is placed last in the PATH. This avoids enormous confusion, but it is NOT the standard setup on most UNIX systems. On some systems it is placed first, but there is a security risk involved in this case: a hacker could place an evil program called `ls' in your home directory ... you get the idea.
- /hermes/bin is placed high in the PATH list. This is a HERMES-specific thing: central computing at DESY controls our /usr/local/bin directory, and so any HERMES-specific versions of programs that we would like to supersede the official versions have to go in some other place. We use /hermes/bin for this purpose.
- In the PATH list you see the important binary directories of the HERMES software tree. If you change the setting HERMES_ROOT, you should do so in your shell's rc file, and then re-source it to update your PATH accordingly.
Try it out: run HMC
I assume that you have set HERMES_ROOT to some release, like /hermes/pro, and that you have included $HERMES_ROOT/bin in your PATH. You should now be able to run HMC without any problems. Go to any random directory, and type in the indicated commands. Here we go ...
echo $HERMES_ROOT/bin/hmc
which hmc
Did they match? If not, your PATH is not set up properly.
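If you would rather let the computer do the comparison, the same check can be sketched in Python. The function name program_matches_release is an assumption; the logic is just the two commands above (what the PATH finds versus what the release provides), with symbolic links resolved before comparing.

```python
import os
import shutil


def program_matches_release(name, env=None):
    """True if `name` found via PATH is the copy in $HERMES_ROOT/bin."""
    env = os.environ if env is None else env
    root = env.get("HERMES_ROOT")
    if not root:
        return False  # no release selected, so nothing to compare against
    expected = os.path.join(root, "bin", name)
    found = shutil.which(name, path=env.get("PATH", ""))
    if found is None:
        return False  # program is not on the PATH at all
    # Resolve symbolic links so that two routes to one file still match.
    return os.path.realpath(found) == os.path.realpath(expected)


if __name__ == "__main__":
    print(program_matches_release("hmc"))
```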
Now let's run the Monte Carlo.
hmc
<hit return when it asks for Workstation type>
set bstop false
set generator back
name geom hmcdg_06.ie
name digi hmc.digi_06-07
name resi deadstrip_06.dat
init
Watch the program initializing. It reports several times that it is reading input files. Where is it finding these files? In $HERMES_ROOT/lib, as advertised. You can see the directory references scroll past on the screen.
An aside: when you hit return at the "Workstation type" question, a second, graphics window should have popped up on your screen. If it did not, you may not be forwarding graphical X11 information correctly; see the X forwarding with ssh section of Mission 1.
The fact that HMC found the input files was really all we wanted to demonstrate. But since we're here, try running these commands:
set drawsim yes
view 5
ev 1
Cool, eh? OK, time to move on.
exit
The HERMES PCFarm
So far, we have discussed the HERMES software suite and the way it is set up on any analysis machine of the collaboration. This section provides some orientation specific to the HERMES PCFarm at DESY. Since you will undoubtedly have to interact with the PCFarm, you will need to know some of its quirks.
The information below is a quick summary ... for more detailed information, please consult the Computing Pages. A link to these pages is provided on the home page for HERMES members.
Some important accounts: opa, oma, ...
Most of the HERMES services and data production work are done under certain standard accounts. Since many members of the HERMES technical software group are involved in any data production, it would be foolish to have each person run various production codes under their own personal account. Instead we have common accounts for which several people have the password. You may find it helpful to hunt around in these accounts for certain files you are looking for.
- opa (Offline Production Account)
- All phases of the data production are run under the opa account. You will see below that the output of the productions is located in directories under the home area ~opa.
- oma (Online Monitoring Account)
- This account is used by the Data Cops for data quality monitoring of the main and uDST productions, and for storage of the uDST burstlists. When HERA was running, it was also used for automatic copying of files from the East Hall to the PCFarm disks.
- w3hermes
- This account manages the HERMES web server. A URL like "http://www-hermes.desy.de/directory/file.html" points to the file ~w3hermes/html/directory/file.html on the PCFarm.
- majord
- All details of the majordomo mailing lists are stored here, including the web archives of old mails.
- onl (Online Account)
- The onl account was used widely on the online machines at the East Hall during data taking and is mentioned here only for completeness. Had you taken shifts during data taking, you certainly would have encountered the onl account. All data acquisition and slow control programs ran under this account, and it was also used for all shift business (e.g. preparation of run and shift summaries).
World Wide Web services
HERMES runs its own Web server; all of the HERMES Web pages are located on the PCFarm, under the account w3hermes. DESY also runs a Web server, and offers those who hold a DESY computer account (i.e. all HERMES members) their own personal Web area which can be used (among other things) to post files. Here is the mapping between these two types of Web areas and the files they point to:
- Files on the PCFarm
- The URL http://www-hermes.desy.de/file points to ~w3hermes/html/file on the PCFarm.
- Your personal files on the AFS file system
- Create a directory called www under your home directory, and you will have your own web site! The URL http://www..desy.de/~user/file maps to ~user/www/file. NOTE: you must make sure that the DESY Web server has the proper access rights to this AFS directory; to set the proper access, enter the AFS command
% fs setacl -dir ~/www -acl system:anyuser rl
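The two mappings can be written down as a pair of small Python functions. This is only a sketch of the rules stated above: the function names are assumptions, and because the host name of the personal Web area is not given in full here, the second function works on the path part of the URL only.

```python
from urllib.parse import urlparse

HERMES_HOST = "www-hermes.desy.de"


def hermes_url_to_file(url):
    """Map a HERMES web server URL to the file it points to on the PCFarm."""
    parsed = urlparse(url)
    if parsed.netloc != HERMES_HOST:
        raise ValueError("not a HERMES web server URL: " + url)
    return "~w3hermes/html" + parsed.path


def personal_path_to_file(url_path):
    """Map the path part of a personal URL (/~user/file) to ~user/www/file."""
    if not url_path.startswith("/~"):
        raise ValueError("not a personal web path: " + url_path)
    user, _, rest = url_path[2:].partition("/")
    return "~" + user + "/www/" + rest


if __name__ == "__main__":
    print(hermes_url_to_file("http://www-hermes.desy.de/directory/file.html"))
    print(personal_path_to_file("/~joeblow/plots/result.ps"))
```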
I strongly encourage you to spend some time looking around our web site. There is a great deal of information available, and you should become familiar with where it is. Here are some pages of particular importance:
- Mailing Lists and Majordomo
- Since our collaboration is splattered across 3 continents, we must rely on e-mail to keep in touch on a daily basis. Numerous mailing lists have been established to accomplish this as efficiently as possible. These mailing lists are run using the majordomo package, which manages subscription requests and provides archiving of all mails sent to each list. The Majordomo page linked above contains a description of all lists, instructions for subscribing, and the archives. You should absolutely subscribe to the main mailing list, called hermes-list. If you do any sort of analysis at all, you should also subscribe to offline-list. And if your work is part of an established analysis group, you should also subscribe to that group's list.
- Documents Site
- This area provides access to an exhaustive database of all HERMES documents: publications, internal notes, theses, etc ... All released plots are there too, as well as slides from almost 100 HERMES talks. If you want more information on any subject, here is the place to look! The Document Search Page is a second front-end to this database, adding a powerful search facility.
- Detector Subgroup Pages
- Most detector subgroups (e.g. trigger, RICH, DAQ, ...) maintain their own web pages. These are all linked off the HERMES Homepage, and contain a great deal of useful information.
Disks, Robots, and Backups
Where do you go when you need disk space in which to perform your analysis? In order of increasing capacity, the AFS, user and group disks are available for this purpose. If you are lucky enough to be a member of a HERMES group that has such a disk, you should use this disk as your working space. Otherwise, the public user disks are available for you to use.
- AFS
- Your AFS ("home") directory on the PCFarm is located under /afs/desy.de/user/x/yyyy, where "x" is the first letter in your username, and "yyyy" is your username. Automatic backups to tape are performed regularly on all AFS user disks. However, the home disk partition is not very large, and so your home area carries a strict quota; to see what it is, enter
% fs listquota ~/
- The quota is given in units of 1 KB (1024-byte) blocks. Because of the limited space, it is recommended that you store only "important" files (e.g. source files and mail archives) here.
- user
- Your user directory on the PCFarm is located under /user/yyyy, where "yyyy" is your username (if this directory does not exist, please ask a HERMES sysadmin to create one for you). There are no real quotas in effect here, but if you exceed a certain threshold your directories in this area will be automatically blocked and a mail message sent to you. The threshold is typically set at a couple of GB (although more space is available upon special request).
- /group??
- Certain disks have been allocated for usage by HERMES "groups"; these are UNIX groups whose names typically end with the characters "grp". If you are a member of such a group, say "rcoilgrp", you will have read/write access to a subdirectory with the same name on one or more of these group disks. If you want to belong to a group, please notify one of the existing group members. To find all members of a group, you can issue the UNIX command
% ypcat group | grep xxxxx
- where "xxxxx" is the name of a group.
Backups of files on the user and group disks are automatically done daily, but only for certain files (see NEEDEXPERT: the PCFarm backup web page is out of date). A day-old "snapshot" of all user and group files backed up can be found under the directories /backup??. If you have information on a public /user or /group?? disk that is not being backed up, you are responsible for backing up your files yourself. Do not take this statement lightly! Disks do die from time to time, and if you have not recently backed up your files, you will be extremely unhappy when it happens. Fortunately, backups are easy to perform: you may copy your files to the big tape robots at DESY Central Computing. Access to the robot is provided by the directory tree /acs/user. All you have to do is:
- Create a directory in /acs/user with your name on it.
- Make a tar file of the directories you wish to back up.
- Copy the tar file to your acs directory using the command dccp. It works just like normal cp.
Be a little cautious when backing up, however. Please do not copy many small files to the robot! Every osmcp command will result in 20 seconds of physical movement of the robot to mount a tape. Even if the tape is already mounted, every file will have an end-of-file mark which has a size of several megabytes on tape so that it can be found during fast forward. Reasonable robot file sizes are 100-500 MB ... everything below 10 MB is a pure waste of space and must be avoided (unless it is a one-time-only deal). The upper limit on file size is 500 MB, since the /acs user space still resides on 1.2 GB STK cartridges.
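Before you copy a tar file to the robot, it is worth checking its size against the numbers just quoted. Here is a minimal Python sketch of such a check. The function name and the verdict strings are assumptions, and so is the choice of 1 MB = 1024 x 1024 bytes.

```python
import os

MB = 1024 * 1024  # assumption: binary megabytes


def robot_size_verdict(size_bytes):
    """Classify a file size against the robot guidelines quoted above."""
    if size_bytes > 500 * MB:
        return "too large: split it into pieces of at most 500 MB"
    if size_bytes < 10 * MB:
        return "too small: bundle more files into one tar file"
    if size_bytes < 100 * MB:
        return "acceptable, but 100-500 MB is the reasonable range"
    return "ok"


def check_file(path):
    """Convenience wrapper: look up the size of an existing file."""
    return robot_size_verdict(os.path.getsize(path))


if __name__ == "__main__":
    for size_mb in (2, 50, 300, 700):
        print(size_mb, "MB:", robot_size_verdict(size_mb * MB))
```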
You can easily find out which files are stored on the robot ... just go to the directory of interest under the /acs top directory, and do an ls. You will see a list of all the robot files stored under that directory, just as if the robot tapes constituted a normal filesystem. The only difference is that what you are looking at are actually file stubs: they have the same names and sizes as the actual files, but they are only pointers to those files. Thus, you cannot edit the files, or copy them somewhere else with cp ... you must use the osmcp command to retrieve a copy from tape.
The Computing Pages contain further information about disks and robot copying.
NEEDEXPERT: the PCFarm "disks" and "robot copying" web pages are out of date
Scromp: Scratch Disks and Robot Copying
The PCFarm has almost a hundred disks named /data?? (where ?? is a 2-digit number) which are called production disks (since they are reserved exclusively for offline production use). Although these disks are not backed up, they use RAID technology and there is a good chance that we can recover from single disk failures.
You would think that 100 data disks would give the production more room than it would ever need, but quite the opposite is true. We are often operating right at the edge of our disk space capacity, and so management of the available space is very important. Furthermore, numerous production programs are usually running at the same time, and they are all constantly copying files back and forth between the tape robot, the staging disks on the PC farm, and the scratch disks on the PCFarm. To coordinate all this I/O to and from the scratch disks, we have the scromp software package (SCRatch disk/Osmcp Manager Package). This package provides two basic services: it provides a convenient mechanism for the copying of raw HERMES data files from the robot to the scratch disks, and it allocates disk space for the various jobs that need it to prevent clashes. The scromp LRServer (linkrun server) is always up and running on the PCFarm, under the opa account; it accepts requests from programs for allocation of disk space, and comes back with the name of a disk with enough free space. Also part of scromp are a number of useful client programs that you can use to interact with the LRServer:
- linkrun retrieves the raw EPIO file for one run of data from the robot, and copies it to one of the scratch disks. The program is smart enough to check first that the requested run is not already on disk, or is currently being copied over by another user. Please note that these giant EPIO files cannot live forever on the scratch disks, and they are automatically deleted after a few hours. If you retrieve one, you'd better use it right away.
- linkacs retrieves an arbitrary file from the robot.
- scalloc issues a request for space allocation to the LRServer, and scfree instructs the server to free up allocated space.
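To see why a central allocator prevents clashes, here is a toy model in Python. It is emphatically NOT the scromp implementation: the class name, the method names, and the policy (hand out the disk with the most free space) are all assumptions chosen to keep the sketch short. The point is only that space is reserved at request time, so two jobs can never be promised the same bytes.

```python
class ScratchAllocator:
    """Toy bookkeeping for scratch-disk space (hypothetical, not scromp)."""

    def __init__(self, free_space):
        # free_space: dict mapping disk name -> free space, in any one unit
        self.free_space = dict(free_space)
        self.allocations = {}  # job name -> (disk, amount)

    def alloc(self, job, needed):
        """Reserve `needed` units for `job`; return the disk name or None."""
        if job in self.allocations:
            return None  # one allocation per job in this toy model
        fits = [(free, disk) for disk, free in self.free_space.items()
                if free >= needed]
        if not fits:
            return None
        _, disk = max(fits)  # assumed policy: the disk with the most room
        self.free_space[disk] -= needed
        self.allocations[job] = (disk, needed)
        return disk

    def free(self, job):
        """Release the space that was reserved for `job`."""
        disk, amount = self.allocations.pop(job)
        self.free_space[disk] += amount


if __name__ == "__main__":
    server = ScratchAllocator({"/data01": 100, "/data02": 300})
    print(server.alloc("job1", 250))
    print(server.alloc("job2", 80))
```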
Technical Information: Where's the data?
It's now time to get down to technical details. You have an overview of the various HERMES software packages and data files ... but how do you run the programs? where do you find the data? where's the source? where are the DDL files? Read on, Cadet, all will be revealed.
In the descriptions below, some actual UNIX commands will be given. For example,
linkrun [-v] run year
The notation means the following:
- Items in square brackets are optional.
- Items in italics are not meant to be typed as is ... they indicate variables. e.g. in the example above, you should replace run with the run number you are interested in.
- Any appearance of the symbol ? denotes `some numeric character'. For example, the notation /data??/udst indicates the uDST areas on any of the disks data00 through data99.
- The wildcard symbol * denotes `any sequence of characters'.
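Note that this ? is stricter than the shell's: here it stands for a digit only. If you ever need to test names against this notation in a program, the translation to a regular expression is short. The following Python sketch shows one way to do it. The function names are assumptions.

```python
import re


def notation_to_regex(pattern):
    """Translate this tutorial's notation into a compiled regex.

    ?  -> exactly one numeric character
    *  -> any sequence of characters (possibly empty)
    Everything else is matched literally.
    """
    parts = []
    for char in pattern:
        if char == "?":
            parts.append(r"\d")
        elif char == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


def matches(pattern, text):
    """True if the whole of `text` fits `pattern`."""
    return notation_to_regex(pattern).fullmatch(text) is not None


if __name__ == "__main__":
    print(matches("/data??/udst", "/data07/udst"))
    print(matches("/data??/udst", "/dataXY/udst"))
```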
This section contains considerable discussion about the structure of the various HERMES ADAMO files. During your first aerial fly-by, you obtained some brief information about ADAMO, and how to view ADAMO files using the pink browser. As you explore each section below, go ahead and look at the DDL files with your favourite editor, and the ADAMO files themselves with pb ... however, you may also want to return to this section after you complete your ADAMO training in the next mission.
Documentation
Links to `reference pages' are provided in each section below. Please note that these links are all accessible from elsewhere on the HERMES web site. For example, once you've read through this tutorial, you don't need to keep coming back here to find the DAQ web page ... it is linked directly off the HERMES home page. All of the links below can be reached from one of the following key reference pages:
One last note about documentation: we're really sorry, but it's far from perfect. Some programs have beautiful manuals, while some others have no written documentation at all. Such is life in experimental physics. Here are three little tips:
- Almost all programs will respond with a list of command options if you type command --help. This is the primary source of documentation for the myriad command-line options of HRC, for example.
- All those wonderful data files have hundreds of tables, each with many variables. The principal source of documentation for what all those variables mean are the comments within the DDL files. The DDL files for the uDST's are a stellar example of this: they contain very detailed comments.
- The ultimate source of documentation for any program is the source code itself. In the words of an old hacker adage: "Use the Source, Luke". If you have a question about exactly how a particular quantity is computed, get into the habit of checking the source code. Just grep the source for your variable of interest, and do a bit of detective work. It usually doesn't take long to figure out what is going on.
When it all gets frustrating, just remember: "A physicist is a person who can function effectively without a manual." :)
The DAQ
Location of the source code
Honestly, you REALLY don't want to see the source code for the DAQ. Trust me.
(wanna sample? dxhrb2.desy.de:~wbr/online/run/evt/ctl.f ... I warned you!)
- NEEDEXPERT: Need sample of DAQ source code ;-)
Location of the output files
The output of the DAQ is the EPIO files, one per run, and they are stored on the big tape robot at the DESY main site. You can retrieve them from the robot using the program linkrun.
linkrun [-v] run year
As mentioned earlier, this scromp client program retrieves the EPIO file corresponding to one run from one data-taking year (format: 1998). The file is copied to a local disk attached to the machine where you execute the linkrun command: /scratch/runstage. A link to this file is placed in your local directory. The link is called runrun by default, but its name can be altered with the option [-l linkname]. The -v option to linkrun asks it to run in verbose mode. Other options are available ... type linkrun --help for a list. Note: the EPIO files are huge. Any files you retrieve using linkrun will be automatically deleted from disk within some number of hours.
Structure of the output files
The EPIO files are the only HERMES data files not in ADAMO format. There is very, very little chance that you need to know anything about this format. I certainly don't know anything about it. You only need the EPIO files if you need to do a private test run of the main production yourself.
Running the program(s)
You REALLY don't want to know how to run the DAQ!
Reference pages
- The DAQ Page is the place to go for more information. NEEDEXPERT: Broken DAQ link
- The scromp package web page describes how to use the linkrun client for EPIO file retrieval.
Slow Control
Location of the source code
The slow control programs are owned and operated by various detector groups, and form part of the online software. Almost all of these programs are scripts, either in pink or floyd ... even the taping client itself is a floyd script. All the little scripts may be found on the online machines, under the top-level slowcontrol directory /usf1/SLOW/. The code is organized into machine subdirectories, such as /usf1/SLOW/hercwins for applications that run on the hercules machine, and /usf1/SLOW/r4lowlevel for programs that run on axher4. Since this tutorial is really designed to teach you about the offline software (so that you can perform a physics analysis), I'm going to stop here and skip further details about the slow control code.
Location of the output files
The raw slowlogs (i.e. the fill files and nobeam files) are automatically copied from the East Hall to the SGI by a cron job. They may be found at /data01/production/slowlogs??, where ?? refers to the 2-digit year number. Recall that each fill file corresponds to one fill, while the nobeam files correspond to the periods between fills. They have names like this:
/production/slow/slowlogs00/fill.31.07.00-17:13:15.fz.gz
/production/slow/slowlogs07/nobeam.01.01.07-00:00:52.fz.gz
The fill and nobeam periods are thus identified by a date and time stamp. Please note: you will see fill numbers floating around the software suite from time to time. These are very artificial numbers which are used internally in the (data) uDST production. Fills at HERMES are officially identified by date and time.
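Since the date and time stamp is the official identifier, you may want to extract it from a slowlog file name in your own scripts. Here is a minimal Python sketch. The function name is an assumption, and so is the day.month.year order of the date, which is inferred from the two example names above (a leading 31, and years that match the slowlogs00 and slowlogs07 directories).

```python
import os
import re
from datetime import datetime

# <kind>.<dd.mm.yy>-<hh:mm:ss>.fz.gz, with kind being fill or nobeam
SLOWLOG_NAME = re.compile(
    r"^(fill|nobeam)\.(\d{2}\.\d{2}\.\d{2}-\d{2}:\d{2}:\d{2})\.fz\.gz$"
)


def parse_slowlog_name(path):
    """Return (kind, timestamp) for a raw slowlog file name."""
    match = SLOWLOG_NAME.match(os.path.basename(path))
    if match is None:
        raise ValueError("not a slowlog file name: " + path)
    kind, stamp = match.groups()
    # Assumed field order: day.month.year-hour:minute:second
    return kind, datetime.strptime(stamp, "%d.%m.%y-%H:%M:%S")


if __name__ == "__main__":
    name = "/production/slow/slowlogs00/fill.31.07.00-17:13:15.fz.gz"
    print(parse_slowlog_name(name))
```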
Structure of the output files
See the section below on the Structure of the output files for the slow production.
Running the program(s)
The many client and server programs that make up the slow control software are all controlled by a master daemon that makes sure they are all running properly during data taking. The individual jobs can also be interactively started or stopped via a useful shell script called jobctrl. This is the official way of running the slow control scripts, but you can also run private versions `by hand', if you know what you are doing.
Reference pages
- The Online Monitoring pages contain more information about slow control, including the excellent Shift Crew's Guide to HERMES Online Monitoring.
- Knut Woller's excellent Slow Production pages contain a wealth of information about the offline slow production, and in the process supply much detail about the online slow files as well.
The Main Production: HDC, HRC, and ACE
NEEDEXPERT to check section "The Main Production: HDC, HRC, and ACE"
Location of the source code
The source for all of these packages can be found in the standard release structure:
- The HRC source code is in /hermes/r??/hrc/src/
- The HDC source code is in /hermes/r??/hdc/source/. The code for the CServer, GServer, and MServer programs is also part of HDC, and is found in /hermes/r??/hdc/servers/. The directory /hermes/r??/hdc/mapper/ contains the program to load the mapping table to the MServer.
- You guessed it: the ACE sources are in subdirectories of /hermes/r??/ace/.
Location of the output files
As described earlier, the main production output is in the form of run directories, each of which contains an HRC file plus a number of other smaller files. Note that the HDC files are stored nowhere: HDC output is piped directly to HRC during the main production, and is never saved to disk or to tape. Within the run directories, the HRC files take up the most disk space, which is a lot! For this reason, we no longer store the complete output of the main productions on disk. Instead, we keep only 1 out of every 10 HRC files on disk, for special studies. All the run directories are still present, with all their little userevent files intact. It's just that 9/10 of them don't contain any HRC files ... we don't have enough disk space to keep them all. Anyway, here's the location of the main production run directories. Suppose that the main production you want is 97b ... the area you need to go to is
~opa/97b/
... that simple. Within this directory, you will find some other directories:
- config/ contains a number of input files needed by HDC, HRC, and/or ACE (for example, the wire chamber alignment file alignment.txt). It also contains the important handlerun script, which actually runs the production programs.
- calib/ contains `expert' input files supplied by the detector groups. Examples are the pedestal and gain files for the LUMI, hodoscopes, and calorimeter, and the space-drift-time relations (SDTR's) for the wire chambers. The information in these input files is used by HDC for its calibration and decoding work, and is loaded up to the CServer before the production runs.
- root/ contains the actual data files, organized into run directories called run?????/ (where ????? is the 5-digit run number). If the HRC output file is still there, it will be called hrc?????.devents.gz. Also, there are usually directories called root.0/, root.1/, or something similar. This `dot-something' designation indicates a partitioning of the produced runs into groups:
- root.0 is for unpolarized runs
- root.1 is for polarized runs
- root.t is for test runs
Most often, both root.0 and root.1 are simply links to the main directory root.
One tiny, little note: the production areas under ~opa are fraught with symbolic links. For example, the run directories run????? are all symbolic links to directories on the scratch disks, and the production directories themselves (e.g. ~opa/97b) are links to the /data01 disk. Thus, if you move into one of these directories using cd, you may find yourself confused if you try to move out again using cd .. -- you will not be back in the ~opa directory where you started, but rather in some random scratch area ... :-)
If you need to retrieve one of the deleted HRC output files from the robot, you can do so with the scromp program linkacs. First you need the full name of the file you want. Suppose you want to retrieve run 19999 from the 97b production ... the main production output for this run (i.e. all files from the original run directory) is bundled into the following gzipped tar file on the robot:
/acs/prod97/version_b/run_19999_97b.1.tgz
The elements of this filename are no doubt clear, except for the `.1'. That refers to the original partitioning of the root.? directories described above. In this case, our run 19999 was a polarized run. To determine the exact filename for your run, just use ls ... as you learned before, this will give you a list of file stubs for whatever is stored on the robot. Finally, you retrieve the file using linkacs:
linkacs -v /acs/prod97/version_b/run_19999_97b.1.tgz
(The -v just puts the client program in verbose mode, where it prints out what it's doing in detail.) Your run will be copied to one of the scratch disks, and a link to it will be provided in your current directory.
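If you need to build such robot file names in a script, the pattern can be sketched in Python as follows. Be warned that the pattern is inferred from the single 97b example above, so the function name, the split of the production name into a 2-digit year plus a version letter, and the absence of any zero-padding of the run number are all assumptions. Always confirm the real name with ls before calling linkacs.

```python
def robot_archive_path(production, run, partition):
    """Guess the robot tar file name for one run of a main production.

    production: name such as "97b" (2-digit year followed by a version)
    run:        run number
    partition:  the `dot-something' label of the root.? directory
    """
    year, version = production[:2], production[2:]
    return "/acs/prod{0}/version_{1}/run_{2}_{3}.{4}.tgz".format(
        year, version, run, production, partition
    )


if __name__ == "__main__":
    print(robot_archive_path("97b", 19999, 1))
```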
Running the program(s)
There is a slim chance that you might want to run the main production chain yourself on one of the raw EPIO files. Here are some possible reasons: you are developing software for a new detector ... you are (gasp!) trying to change HRC ... or you are trying to perform an analysis that no one has thought of before, and need information that is not in the uDST files. Running HDC, HRC, and ACE is a notoriously complicated process because of the many command line switches and recommended settings that have to be supplied to these programs. But fortunately, you can profit from the experience of the production manager: just grab the commands he, she, or it used from the handlerun script of your favourite production. It is particularly important to retrieve the long lists of command line options, which are typically stored in the variables HDC_flags, HRC_flags, and ACE_flags within the handlerun script.
Here is an example script called chain, which shows how the whole thing works. To use the script, first edit the variables in the `user parameters' section. Then just execute it ... it will loop over the runs you have requested, retrieve the EPIO file from the robot, and run HDC and HRC (not ACE) on that file. Please note: the command-line flags handed to the programs in this example script are not a recommended set! (They came from a random special test). You must replace them with a set that suits your needs.
- Download: chain
Now let's be a little more careful here. If you really want to `do the job right', and duplicate the conditions under which one of the official main productions was run, you must consider these four points:
- Command line flags
- As described above, you should take these from the production's handlerun script.
- Input files
- As always, if the HERMES software suite is set up correctly on your system and if you have your environment variable HERMES_ROOT set to a valid release area, HDC and HRC will be able to find all the input files they need from the directory $HERMES_ROOT/lib (except for the actual data file of course!). However, if you really want to `do the job right', and duplicate the conditions under which one of the main productions was run, you must use the exact same input files as that production. In particular, there are some optional input files to HRC that are not provided in any /hermes/r??/lib directory, but are nonetheless important. Here are the two key examples:
- alignment.txt: The alignment file, containing precise, year-dependent corrections to the baseline chamber positions stored in the geometry file.
- hrcset.rz: An ADAMO file containing technical parameters relevant to the operation of the track reconstruction algorithm.
It would be useless to store these optional files in the release areas, since they change from one production to the next. Both of them may be found in the productions' config/ subdirectories, and it is important that you copy them to your local area. You should also copy over the other input files in config/ (HODOSCOPE.ie, for example). However, these other files are very stable ... they have not changed for years, and are available in /hermes/r??/lib.
- Input servers

- Source code release
- Each production used a particular release of the source code. In fact, our policy is that whenever a production is run, using some release version r?? of the HERMES software, we freeze that release and start a new one. (Well, that's the idea anyway ... ;-)) The HRC, HDC, and ACE programs are sufficiently stable that this point should be of minor importance -- using the current release to reconstruct older data should not cause any problems (and may actually incorporate useful bug fixes). But if you really need to do some archaeology and duplicate the exact conditions of the original production, you must use the original version of the software. To find out what version was used, consult the production pages. If this information is unhappily missing, there is a foolproof alternative: the handlerun script sets HERMES_ROOT at the very beginning.
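The flag lists and the release setting mentioned above can be pulled out of a handlerun script mechanically. Here is a minimal sketch, assuming the script uses shell-style assignments of the form NAME=value or NAME="value"; a real handlerun script may be written differently. The sample text, the release name r99, and the PLACEHOLDER option strings are all invented for illustration and are not real settings.

```python
import re

# Invented excerpt standing in for a handlerun script (assumed shell syntax).
SAMPLE_HANDLERUN = '''\
#!/bin/sh
HERMES_ROOT=/hermes/r99
export HERMES_ROOT
HDC_flags="PLACEHOLDER_HDC_OPTIONS 1"
HRC_flags='PLACEHOLDER_HRC_OPTIONS'
ACE_flags="PLACEHOLDER_ACE_OPTIONS"
'''

# NAME=value at the start of a line, optionally preceded by "export".
ASSIGNMENT = re.compile(r'^\s*(?:export\s+)?(\w+)=(.*)$')

WANTED = {"HERMES_ROOT", "HDC_flags", "HRC_flags", "ACE_flags"}


def read_settings(script_text, wanted=WANTED):
    """Return a dict of the wanted variables found in the script text."""
    settings = {}
    for line in script_text.splitlines():
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        name, value = match.group(1), match.group(2).strip()
        if name not in wanted:
            continue
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[name] = value
    return settings


settings = read_settings(SAMPLE_HANDLERUN)
for name in sorted(settings):
    print(name, "=", settings[name])
```

Always check the extracted values against the script by eye: a simple line-by-line parser like this one does not follow variable substitutions or multi-line assignments.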
A note about input: HSM, hrcset, and dadinit.cnf
Let me point out a few more things about the input files used by HDC and HRC.
First, about lookup tables. What is a lookup table? Well, suppose you have a complex function of several variables (call it f(x,y,z)) that you need to evaluate repeatedly in your program. A common example is when f(x,y,z) refers to a multi-variable integral with no analytic solution. Such integrals appear commonly in radiative corrections routines, for example, and must be evaluated numerically at every (x,y,z) point of interest. This numeric integration often takes a great deal of computer time, and so your program will become terribly slow if you have to perform the integration over and over at different points. To speed things up, programmers use a lookup table, which is simply a big array containing the value of the function f evaluated at many (x,y,z) points on a grid. You first generate the lookup table using your time-consuming integration program ... then you have your program read in this table, and interpolate between grid points to obtain the value of f at any point it needs. HRC needs a variety of such tables to do its work: a momentum lookup table, magnetic field maps, and tree-line search tables. These tables are all available in binary files that you can find in $HERMES_ROOT/lib:
- hfield.bmaf: Spectrometer field map
- tfield.bmap: Target field map
- hrc-*.mof and hrc-*.trf: Momentum lookup tables
- tree-*.tree: Tree search lookup tables
However, HRC can also generate these files, and will do so if it cannot find them. The momentum and tree-search lookup tables, for example, must be regenerated when the tracking method is changed. Also, these lookup tables are large, and take up a lot of RAM when a program loads them into memory ... and on a system like the HERMES SGI, many HRC processes may be running at the same time. To conserve memory, Wolfgang Wander wrote HSM (HERMES Shared Memory Manager). This is a little program which loads a requested lookup table to a shared memory area, accessible to other programs. (It's kind of like a very simple dad server.) Thus, when HRC needs to load a table, it actually goes through HSM: if the table is already in memory, it uses that ... otherwise it runs HSM to establish such an area. The Monte Carlo program HMC also needs some of these lookup tables, and also uses HSM.
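To make the lookup-table idea concrete, here is a minimal one-dimensional sketch in Python. It has nothing to do with HRC's actual table formats: the function slow_f (a numerical integral of exp(-t*t)) is just an assumed stand-in for an expensive calculation, and the grid size is arbitrary. The table is generated once, and afterwards values are obtained by linear interpolation between grid points.

```python
import math
from bisect import bisect_right


def slow_f(x, steps=2000):
    """Stand-in for an expensive function: midpoint-rule integral of exp(-t*t) from 0 to x."""
    h = x / steps
    return sum(math.exp(-((i + 0.5) * h) ** 2) for i in range(steps)) * h


def build_table(lo, hi, n):
    """Evaluate slow_f once on n equally spaced grid points between lo and hi."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return xs, [slow_f(x) for x in xs]


def lookup(xs, fs, x):
    """Linear interpolation between the two grid points that bracket x."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the table range")
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    t = (x - x0) / (x1 - x0)
    return fs[i - 1] + t * (fs[i] - fs[i - 1])


# Generate the table once (slow), then query it as often as needed (fast).
xs, fs = build_table(0.0, 2.0, 41)
approx = lookup(xs, fs, 0.73)
exact = math.sqrt(math.pi) / 2 * math.erf(0.73)
print(approx, exact)
```

The same pattern extends to several variables by interpolating along each axis of the grid in turn; the price is a table whose size grows with every added dimension, which is exactly why the real tables are large enough to make shared memory worthwhile.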
As described above, one of the optional but important input files that HRC uses is hrcset.rz. This file contains one ADAMO table, called rcSET, which is the storage place for numerous technical parameters used by the track reconstruction algorithm. The reason I have singled out this input file is because there is a unique way to modify it: the HRC package contains a pink script called hrcset which is a graphical editor for the hrcset.rz file. Just run hrcset on its own, or on an existing hrcset file ... you can use the various sliders and input boxes to adjust such esoterica as the signal speed in the FC wires or the `road width' within which track bridging is performed. Each time you hit the `Save Setup' button, you add another row to the rcSET table. When HRC reads in the file, it uses only the last row.
Finally, a note about servers. Unlike HRC, HDC takes most of its input not from files but from dad servers (remember the CServer, GServer, and MServer?). The file dadinit.cnf, located in the usual directory /hermes/r??/lib of the release structure, contains the host definitions of all servers accessible from your machine. You will learn exactly what this file means in your next mission, about ADAMO and DAD. But for now, you can see just from looking at the file on the SGI that it does indeed contain entries for the various component dataflows of the CServer, GServer, and MServer (also for the LRServer that manages scromp requests). If you wish, you can use a private version of dadinit.cnf when you run HDC ... as you learned earlier, HERMES programs will always use local input files preferentially if they are available. By altering dadinit.cnf, you can thus direct HDC to take its input from a file rather than a server, or point the program to a private server you have started yourself. (The binary files for the server programs are exactly where you'd expect to find them, in /hermes/r??/bin.)
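The `local file first' rule can be pictured with a few lines of Python. This is only an illustration of the search order described above, not the actual lookup code of any HERMES program; the function name find_input_file is made up, and two temporary directories stand in for your working directory and for /hermes/r??/lib.

```python
import os
import tempfile


def find_input_file(name, local_dir, release_lib):
    """Return the local copy if it exists, else the release copy, else None."""
    for directory in (local_dir, release_lib):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Two throwaway directories play the roles of the working directory
# and of the release lib directory.
with tempfile.TemporaryDirectory() as local_dir, \
        tempfile.TemporaryDirectory() as release_lib:
    release_copy = os.path.join(release_lib, "dadinit.cnf")
    open(release_copy, "w").close()
    found_before = find_input_file("dadinit.cnf", local_dir, release_lib)

    # Once a private copy exists, it takes precedence.
    private_copy = os.path.join(local_dir, "dadinit.cnf")
    open(private_copy, "w").close()
    found_after = find_input_file("dadinit.cnf", local_dir, release_lib)

    used_release_first = found_before == release_copy
    used_private_after = found_after == private_copy

print(used_release_first, used_private_after)
```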
Structure of the output files
The primary output of the main production is the HRC file, so let's discuss its ADAMO structure. Each record contains the dataflow rcEvents, corresponding to one event. Here is the structure of this dataflow:
rcEvents = {
  rEvents = {
    rcTrack, rcPartTrack, rcCluster, rcVertex, rcLumiCl,
    rcTreeLine, rcDigitList, rcBridge, rcSpacePoint, rcSpaceDig,
    rcPIDInfo2, rcTRDPuls, rcRaw, rcEvInfo, rcSET }
  dcEvInfo
}
In this description, bold face is used to denote a dataflow while normal type denotes a table ... you see, then, that all but one of the tables (dcEvInfo) are actually part of a sub-dataflow called rEvents. Now, what do the colours mean? They indicate the DDL file which contains the various table and dataflow definitions: black = HRC.ddl (the primary HRC DDL file), green = HRCSET.ddl (the DDL file for the hrcset tracking options), and red = DECODE.ddl (the primary DDL file of HDC). These DDL files are of course stored in the release directory /hermes/r??/ddl. Finally, if you have a quick look at HRC.ddl, you will see that the key table for HRC output is rcEvKey = (iEvent, iRun, cType).
To get a feeling for the content of an HRC record, let's examine two tables. The most important table of all is rcTrack. It contains one row for each track reconstructed in the event. The columns provide such useful information as P (track momentum), Theta (polar angle in radians), and ZVx (z position of track's closest approach to the beam line). Many relationships are also defined between rcTrack and other tables. For example, each track may be linked to rows in rcPartTrack (indicates which front and back partial tracks were used), rcCluster (describes any calorimeter cluster on the track's path), and rcVertex (indicates an intersection point between this track and another, or this track and the beamline). As a second example, consider the HDC table dcEvInfo. Unlike rcTrack, this table contains only one row per event. A wealth of event-level information is recorded here: e.g. run number, event number, VME time, bunch number, and a bit-word indicating which triggers fired.
Feel free to explore ... have a look at the DDL files, or pull up an HRC file in the pink browser and click through a few records.
As shown above, dcEvInfo is the only HDC table that is preserved by HRC in dataflow rcEvents. However, if you supply the command line option --wrdigit to HRC, it will obligingly preserve all of the HDC tables. In this case, the HRC output file contains a different dataflow: rmEvents.
rmEvents = {
  rcEvKey [KEYTABLE: iEvent, iRun, cType]
  rEvents = {
    rcTrack, rcPartTrack, rcCluster, rcVertex, rcLumiCl,
    rcTreeLine, rcDigitList, rcBridge, rcSpacePoint, rcSpaceDig,
    rcPIDInfo2, rcTRDPuls, rcRaw, rcEvInfo, rcSET }
  mcEvents = {
    mcSetPar, mcDigiPar, mcDetEff, mcChanEff,
    mcEvData = {
      mcEvent, mcBeam, mcTrack, mcVert, mcStrFu, mcHit,
      mcRadCor, mcUser, mcTrigger, mcDig2Hit,
      dataDfl = {
        dataProp, dataPuls, dataHodo, mapdataCalo, mapdataLumi,
        dataCalo, dataCaloSums, dataRICH, dcEvInfo, dcTrigger,
        dcTestBC, dataCHARM, dataWheel, dcGMSInfo }
    }
  }
}
Now the entire HDC dataflow dataDfl is present. A typical HDC table is dataPuls, containing one row of information for each drift chamber wire which fired. The columns of dataPuls are iWire (wire number), iPuls (calibrated TDC value), rWirePos (wire position after alignment correction), and rPuls (drift distance calculated from TDC reading). Also there is a link from each row of dataPuls to the dgDETS table of the geometry file; this indicates which sensitive detector (chamber plane, in this case) we are talking about. This is the type of decoded information that HDC produces, and that HRC turns into reconstructed tracks.
Note that a new colour has appeared in our description of rmEvents: teal denotes tables and dataflows defined in MCARLO.ddl, which is the principal DDL file of the Monte Carlo (GMC and HMC). The reason is simple: the most common use of the --wrdigit flag is when a user is running HRC on Monte Carlo output, rather than real data. The user might well want to preserve all the HDC-type digitizations that HMC generated ... so HRC just assumes that it is being handed simulated rather than real data, and dumps both the HDC and Monte Carlo dataflows to its output file. If real data is in fact being used, no problem, the Monte Carlo tables will just be empty.
Finally, what about the output of ACE? It is most unlikely that you will ever need to use this output ... it is used only by the drift chamber experts to compute the PPE's (Permuted Plane Efficiencies). These are then merged with the rest of the data in the uDST production. But just for the record, the ACE output files are found in the main production run directories, and are called
- ace.ana.gaf.fz: contains hardware efficiencies
- ace.arc.gaf.fz: contains software (or `HRC') efficiencies, which include the intrinsic inefficiency of the tracking algorithm
These files typically contain only the dataflows anaSpinflow and arcSpinflow respectively, both of which are defined in the file ACE.ddl (along with numerous other tables and dataflows for ACE's internal use).
Reference pages
- The production page has up-to-date information on the details of all past and present main productions. Also, a wealth of detailed plots summarizing the output of the various main productions is available from the Data Quality web page.
- HDC only has a very old web page, available here ... it is old, but it is still useful. As you see, the original intention of this page was also to document ACE, but that never happened.
- HRC has no web page. The most detailed description of the tracking algorithm comes from the thesis of Wolfgang Wander, the guy who wrote the code. You can get it in English or in German.
- Don't forget the --help option: HDC, HRC, and ACE will happily dump all their command line options to screen if you execute them with the --help flag.
- How do you find out exactly what all the variables in the HDC, HRC, and ACE output files mean? Unfortunately, the only documentation we have is the comments in the DDL files and the source code for the programs. Sorry ... but you will find that the DDL comments are quite detailed.
- The scromp package web page describes how to use the linkacs client for retrieving main production output.
The Slow Production
With any luck you'll never have to think about the slow production. It's a disaster of scripts that rarely works correctly. Here's what you need to know:
Location of files, scripts, and codes
The slow production lives at
/production/slow
Here you will find such nice directories as
- /rawcopy the scripts that were previously run to copy the raw slow production information from the online machines to the data disks
- /slowlogsXX for each data year. These contain links to the raw slow production files on the data disks for each fill of that data year.
- /extern the external files and processing scripts are here. Within each subdirectory there is an /incoming directory where the expert files live
- /lumiXX for each year. The lumi monitor fits.
- /polariXX for each year. The LPOL and TPOL fits, and LPOL/TPOL ratio. The beamspin information is also here.
- /trackeffiXX for each year. The tracking efficiencies.
- /ugfs The unpolarized gas feed system map
- /dataXX for each year. The processed expert files live here (the tracking efficiency files are under fillwise/fillXXXXXXXX/ )
- /daemon the slow prod daemon scripts live here under /src. The resulting log files from the jobs submitted are also here, under /logs
- /src the actual workhorses for the slow production processing live here. They are gmerge, gsplit and gconcat (all in /merge) and readslow, which produces the DQ plots.
- /binXX for each data year. This contains all the scripts needed to run the production for that data year.
- /slowXXXX for each production. This contains links to each of the produced slowproduction files for each fill in this production
- /docu the slow production documentation web pages are here
Running the slow production
Pity the man or woman that has this job. But, it is all nicely explained on the slow production cookbook page, although some of the information about processing the external files can be found in
/production/slow/extern/polariXX/README
/production/slow/extern/lumiXX/README
/production/slow/extern/trackeffiXX/README
Structure of the output files
The produced slow production files are, like everything else, DAD files. They contain many tables, some of which appear many times in the file. For example, the HVdata table contains some high voltage information, as well as a time stamp. There is a new table for each ~1 second interval containing the updated HV values. Other tables, such as heraPolar, exist on a time interval of ~1 minute, and still other tables, such as the dgDETS table, which contains detector information, appear only once per fill.
The best way to get a feel for the slow production files is to have a look at one!
cd /production/slow/slow07c0/fill.01.04.07-11\:50\:24/
pb -rfila fill.01.04.07-11\:50\:24.da.gz
Use the "Next Key; Read" and "Browse Record" buttons to move through and inspect the file. If you're looking for a particular record, in the selector field type
Name="HVdata"
And press "Next Key; Read" and then "Browse Record". Continue to press "Next Key; Read" and you will browse through the many HVdata tables.
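If it helps, the layout of a fill file can be pictured as a long stream of named records, where the selector simply skips everything whose name does not match. The Python sketch below is a toy model only: the record format (a table name plus a time stamp in seconds) and the three-minute fill are invented, and real DAD files are not read this way. The update intervals follow the rough numbers quoted above.

```python
from collections import Counter

# Toy stand-in for one (very short) fill: each record is
# (table name, time stamp in seconds since the start of the fill).
fill_records = (
    [("dgDETS", 0)]                                   # once per fill
    + [("HVdata", t) for t in range(0, 180)]          # about every second
    + [("heraPolar", t) for t in range(0, 180, 60)]   # about every minute
)


def records_named(records, name):
    """Yield only the records of one table, like typing Name="..." in the selector."""
    return (record for record in records if record[0] == name)


# How often does each table appear in this toy fill?
counts = Counter(name for name, _ in fill_records)
print(counts)

# Stepping through the HVdata records, one "Next Key; Read" at a time.
hv_records = records_named(fill_records, "HVdata")
first_hv = next(hv_records)
second_hv = next(hv_records)
print(first_hv, second_hv)
```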
Slow production status and DQ
What you may really want to know is the status of the slow production. There are several tools for this. You should begin with the slow production main page.
Here you find links to several useful pages, in particular the slow production status page. On this page there is a subsection for each production with the following pages:
- Details, news, and known problems - this contains some text about known anomalies in the slow production files, such as fills that are not produced and why, fills that are missing particular tables, etc. But, it's not always 100% up to date.
- Slow production fill list including links to fill file - This is a list of every fill in the production and which runs from each fill failed. The reason for the failed run is indicated by the color code explained at the top of the page. Each fill is a link to the fill's directory on the PCFarm where you can find the logfile, data quality plots, and of course the actual produced fill file.
- Slow production Data Quality - This page nicely displays all the data quality plots for the production. On the left are always the beam current and run number plots, allowing you to see where there was a pause in data taking. On the right you can choose between different expert information by clicking the buttons for the luminosity monitor ("lumi"), the TPOL and LPOL ("polari"), the tracking efficiencies ("effi"), or the prescale factors for triggers 21, 22, 28 and the calorimeter threshold ("prescale"). Skip to a fill with the links in the sidebar, or use the "First", "Prev", "Next", and "Last" buttons to move around. The "Prev" and "Next" buttons will show the same set of plots as you were previously viewing, so you can, for example, scroll through all the tracking efficiency plots by clicking on the first fill, clicking "effi", and then clicking "Next" through each fill in the production. No need to click "effi" over and over again.
The uDST Production
Location of the files and codes
Everything to do with the uDST production is stored under the opa account ... even the uDST writer source code is there, and is not kept within the HERMES release structure. There are two access points to the uDST production area:
- /production/opa/udstprod/ This is the real top-level directory, absolutely everything related to the uDST production is located underneath it.
- /production/udst/xxx/ where "xxx" is a uDST production version. This is an alternate access point for users. It contains links to only those directories from /production/opa/udstprod/udst_xxx/ which a normal user needs, plus the all-important pol_burstlist file. This giant file is the burstlist prepared by the Data Cops, containing data quality words for each burst in the production.
Everything under these top-level areas is organized within production directories ... even the source code: for archaeological reasons, a frozen copy of the complete code, including the DDL files, is stored with each production. Let's say we are interested in the 98b4 production: everything a user needs is then under
~opa/udstprod/udst_98b4/
Here are the directories, links, and files you will find in that area:
- udst_source_code/udst_maker/newddl_97/ This directory contains the all-important DDL file for this version of the uDST production, called g1_98B4.ddl. This is the only DDL file you need to interact with the uDST's. The DDL files are replete with highly informative comments, describing exactly what each DST variable means. We also have web documentation, described in the 'Reference Pages' section below. Please be aware: the DDL file does change from one production to the next ... be sure you are using the right one when you perform your analysis!
- udst_source_code/udst_maker/ contains the source code for the uDST writer.
- smlinks/ Here's where you find the uDST data files, called run?????.smdst.gz (where ????? denotes the 5-digit run number). Specifically, these are the semi-DST's, used by almost all analyzers. The details of exactly which events are kept in these DST's are provided in the comments at the top of the DDL file. Basically, the filtering criteria are designed to preserve all information needed by any inclusive or semi-inclusive analysis being performed at HERMES today. An example of an event which is thrown out is one with a single track that has no chance of being the scattered beam positron. We don't know of any physics we could do with such an event, so it's thrown out to save valuable disk space.
- nanolinks/ This directory contains the nano-DST's, which are smaller versions of the semi-DST's.
- g1links/ Yet another uDST version: the inclusive DST's. These contain only trigger 21 events, and only those tracks which are good candidates for the scattered DIS beam lepton. This format is almost never used (or produced) anymore.
- pdlinks/ The PID-DST's are a technical version including special information for the determination of the PID parent distributions.
As with the main production area, please realize that most of these directories are links to other places, and so moving up and down the directory tree may be a bit confusing.
Running the program(s)
It is 99.9% certain that you will never need to run the uDST production. It is essentially impossible to do so unless you are the production manager. The uDST writer brings together data from the main production, the slow production, and a raft of expert input files, and all of these sources of input must be in the right place before the uDST production can run. Nevertheless, documentation has been prepared by uDST guru Brendan Fox for the education of future production managers (see the Reference Pages section below for more information).
Structure of the output files
The structure of the uDST files is completely defined in the one DDL file g1_98B4.ddl (I'll just use the 98b4 production here as an example). The records of the semi-DST's all contain the dataflow smData:
smData = {
smTrack, smCluster, smLumi, smRICH,
g1Data = {
g1Track, g1DAQ, g1Quality, g1HVtrip, g1QualInfo, g1BurstStat,
g1Detector, g1Beam, g1Target, g1HE3, g1ABS, g1Unpol, g1TrkEffi,
g1ACE, g1ACEcnts, g1Online, g1uDSTstat, g1SpinGate, g1Trigger }
}
(As you probably guessed, the old inclusive DST's only contained the dataflow g1Data.)
It is very important to understand that the uDST files are organized at the burst level, and not at the event level like the HRC files. In other words, each uDST record corresponds to one burst, and may contain many events. This takes a bit of getting used to, since from a physics/analysis perspective it is more natural to think in terms of events. The majority of the uDST tables are also burst-oriented, and contain only one row per record. Examples are g1BurstStat (average response of the PID detectors during this burst), g1Beam (beam polarization at burst start), and g1Target (target information at burst start). These tables are all important, but are technical in nature.
The physics-oriented tables on the other hand are organized differently, and contain more than one row per record. The most important tables of the entire uDST are g1Track and smTrack. These contain one row per track, and there is a 1-to-1 relation between them. For example, in one record, g1Track and smTrack might each have 153 rows. Row #12 in both tables refers to the 12th track of the current burst ... g1Track gives the primary information for the track (such as momentum and scattering angle), while smTrack provides some additional data (such as calorimeter energy and a time-of-flight measurement from the hodoscopes). Table smRICH is also linked to g1Track, and supplies RICH probability calculations for each track. But how do we access the event-level structure of the data? Our 153 tracks come from perhaps 80 different events within the same burst. The answer is to use the iEvent variable of g1Track, which associates each track with a unique event number. Two other tables are organized in a similar way: smCluster and smLumi contain one row per untracked calorimeter and LUMI cluster respectively. These tables also contain iEvent variables to associate each cluster with a particular event.
To summarize, your analysis code should perform these steps as it goes through the data stream:
- Read in one record = burst of data, and store the burst-level information somewhere (e.g. target and beam polarizations).
- Now retrieve the first row of g1Track and check its event number. Then retrieve all entries of g1Track with the same event number, and pull the corresponding rows from smTrack and smRICH via the relationships between the tables. If you are interested in untracked photon clusters, consult smCluster and smLumi for any entries with the same event number.
- Now you have all information in hand for one event. Go ahead and perform your Nobel-prize winning calculations, and store the results.
- Time to get the next event. Pull the next unused row from g1Track, get its event number, and off you go again.
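The steps above can be sketched in a few lines of Python. This is a toy model, not real uDST access code: the burst is represented as a plain dictionary, the column names momentum, calo_energy, and polarization are invented (only iEvent comes from the text), and the sketch assumes that all rows belonging to one event sit next to each other in g1Track.

```python
from itertools import groupby

# Invented stand-in for one uDST record (= one burst).
burst = {
    "g1Beam": {"polarization": 0.55},
    "g1Track": [
        {"iEvent": 101, "momentum": 12.3},
        {"iEvent": 101, "momentum": 4.1},
        {"iEvent": 102, "momentum": 8.7},
    ],
    # Row i of smTrack belongs to row i of g1Track (1-to-1 relation).
    "smTrack": [
        {"calo_energy": 12.0},
        {"calo_energy": 1.2},
        {"calo_energy": 8.9},
    ],
    "smCluster": [
        {"iEvent": 102, "energy": 3.3},
    ],
}


def events_in_burst(burst):
    """Yield (event number, track rows, untracked clusters) for each event.

    Each track row is a pair (g1Track row, matching smTrack row).
    Assumes rows of the same event are contiguous in g1Track.
    """
    paired = list(zip(burst["g1Track"], burst["smTrack"]))
    for number, rows in groupby(paired, key=lambda pair: pair[0]["iEvent"]):
        clusters = [c for c in burst["smCluster"] if c["iEvent"] == number]
        yield number, list(rows), clusters


# Step 1: store the burst-level information.
beam_polarization = burst["g1Beam"]["polarization"]

# Steps 2 to 4: loop over the events of this burst.
summary = []
for number, tracks, clusters in events_in_burst(burst):
    summary.append((number, len(tracks), len(clusters)))
print(beam_polarization, summary)
```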
And now that you understand how to work with the uDST's, I'll tell you the easy way to do it: use hanna! This excellent utility library was described briefly in an earlier section ... now you see how useful it is! What it does is to provide a main program and framework for your analysis usercode. Hanna takes care of reading in the uDST files burst-by-burst, splitting the information into events, and even working with the Data Cops' burstlist. All you have to do is write a handful of user routines with names like user_burstinit and user_event. These are called by the main program at appropriate points in the processing of the data stream, and the code you write tells hanna what to do. For example, your user_burstinit routine is called at the start of every new burst, and user_event is called for every event. Hanna conveniently passes to user_event the first and last g1Track row numbers corresponding to the current event so that you can easily retrieve the information you need.
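To give a feeling for this division of labour, here is a toy driver in Python that mimics the pattern. It is emphatically not hanna and does not reproduce its interface: only the routine names user_burstinit and user_event are taken from the text, while the argument lists, the dictionary representation of a burst, and the function run_analysis are assumptions made for illustration.

```python
def run_analysis(bursts, user_burstinit, user_event):
    """Toy main program: call the user routines at the appropriate points."""
    for burst in bursts:
        user_burstinit(burst)
        tracks = burst["g1Track"]
        first = 0
        while first < len(tracks):
            # Extend [first, last] over all contiguous rows of the same event.
            last = first
            while (last + 1 < len(tracks)
                   and tracks[last + 1]["iEvent"] == tracks[first]["iEvent"]):
                last += 1
            user_event(burst, first, last)
            first = last + 1


# --- user code: this is the only part the analyzer would write ---
log = []


def my_burstinit(burst):
    log.append("new burst")


def my_event(burst, first, last):
    # first and last are the g1Track row numbers of the current event.
    log.append((burst["g1Track"][first]["iEvent"], first, last))


bursts = [
    {"g1Track": [{"iEvent": 1}, {"iEvent": 1}, {"iEvent": 2}]},
    {"g1Track": [{"iEvent": 7}]},
]
run_analysis(bursts, my_burstinit, my_event)
print(log)
```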
One last note. Remember what we learned about 1103's, 1203's, and split bursts? To recap briefly: there may be more than one uDST record per DAQ burst. This occurs when a DAQ burst is split by a spin-flip transition, as signalled by a 1203 scalar event. You don't really need to worry about this. However, I said earlier that each partial burst could be identified by a UDST record counter, and I'd like to show you now where that is. You can find both the uDST record number and the DAQ burst number in table g1DAQ: the variable names are simply iUDSTcounter and iBurst respectively. You can also find these counters in the key table g1Key which uniquely identifies each record. The columns of g1Key are: the run number iRun, the UDST record number iUDSTcounter, the DAQ burst number iBurst, and finally the dataflow name cName = smData. You will find these three numbers (as well as the fill number) in each row of the Data Cops' burstlist.
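As an illustration of how these counters tie a uDST record to the burstlist, here is a small Python sketch. The burstlist is modelled as a dictionary keyed by (iRun, iUDSTcounter, iBurst); the real file format is not described in this section, so the layout, the run number, and the quality words below are all invented. Note the two entries with the same DAQ burst number: that is what a split burst looks like.

```python
# Invented burstlist: (iRun, iUDSTcounter, iBurst) -> data quality word.
burstlist = {
    (12345, 1, 1): 0x0,
    (12345, 2, 2): 0x0,
    (12345, 3, 2): 0x4,   # same DAQ burst as the previous record: a split burst
}


def quality_word(burstlist, g1key):
    """Return the data quality word for a record, or None if it is not listed."""
    key = (g1key["iRun"], g1key["iUDSTcounter"], g1key["iBurst"])
    return burstlist.get(key)


def records_of_daq_burst(burstlist, run, daq_burst):
    """List the uDST record numbers that belong to one DAQ burst."""
    return sorted(counter for (r, counter, b) in burstlist
                  if r == run and b == daq_burst)


record_key = {"iRun": 12345, "iUDSTcounter": 3, "iBurst": 2, "cName": "smData"}
word = quality_word(burstlist, record_key)
split_records = records_of_daq_burst(burstlist, 12345, 2)
print(word, split_records)
```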
Reference pages
Fortunately, rather a lot of documentation is available for the uDST production and its output. All of it is available directly from the HERMES Data Cops web site. There you will find these important links:
- uDST productions. This link points to masses of information on the output of each uDST production. Along with a raft of plots, you will find
- the data quality burstlists and their documentation
- release notes for each uDST production
- convenient links to the DDL files for each production
- links to the shift reports, run summaries, and logrun.new files compiled by the shift crews during data taking
- uDST Documentation. Here's where to go for technical information, such as:
- a detailed description of all values in the uDST DDL file
- summaries of each year's data taking, including run ranges for polarized running, for different unpolarized targets, and for test periods
- a manual for the operation of the uDST production
The Monte Carlo
There are two very different aspects to working with the Monte Carlo: using the files of an official production, and running it yourself. If at all possible, we encourage you to do the former. Large sets of simulated events using a variety of physics generators are available on disk for you to use. If you need a special production for your work, just ask the MC production manager -- he, she, or it has efficient scripts, experience, and fast computers at their disposal, all ready to cater to your every simulated whim. Also, the output of the Monte Carlo production is presented in a format very similar to that of the uDST's. It thus takes very little work to modify your analysis code so that it can process Monte Carlo files as well as real data.
If you want to run the Monte Carlo yourself, it is not so difficult ... but it is certainly not as easy as using existing files. For your reconnaissance training, we will only discuss the use of the official production files. The Monte Carlo web page contains plenty of documentation about running the programs should you need to do so in the course of your analysis.
Location of the source code
The code for GMC and HMC can be found in the release area: /hermes/r??/gmc/src/ and directories therein. The code for WriteMcDST is also there: /hermes/r??/writeMcDST/src/. The current production version should be linked to the /hermes/pro/... directory.
There is a lot of documentation already written about the innards of the Monte Carlo code, so I won't go into detail here. (See the links in the `Reference Pages' section below if you are interested). Let me just remind you of the programs that comprise the Monte Carlo production chain:
- GMC
- This is the physics generator package, and the first step in the production chain. GMC is actually a suite of separate programs which provide a common interface to a variety of physics generators. More on that below ...
- HMC
- The output of GMC is sent to HMC, where each generated particle is laboriously tracked through a detailed simulation of the spectrometer. HMC's output files include the same tables as the HDC output files, containing the simulated, calibrated response of the HERMES detectors to each event generated by GMC. Of course the output files also contain a lot of MC-specific tables, all of which come from GMC.
- HRC
- Since HMC output files look like HDC output files, HRC can process them. The MC production chain next calls HRC to perform track reconstruction on the simulated data.
- WriteMcDST
- Finally, the HRC files are sent to the DST writer, which converts them into a format very similar to that of the data uDST's.
Introduction to GMC
Let me next introduce you to some of the standard physics packages that GMC uses (see also MC Generators):
- LEPTO: An industry-standard package for the simulation of deep-inelastic lepton scattering.
- PEPSI: An enhanced version of LEPTO which adds support for polarized beams and targets.
- AROMA: Another modification of LEPTO ... this one is designed to perform precise calculations of near-threshold charm production.
- JETSET: An industry-standard fragmentation package for the simulation of final state hadrons, based on the repeated breaking of `colour strings' (or flux tubes, if you prefer). This is one of the celebrated Lund programs, so-called because they were written at Lund University by famous Swede Torbjorn Sjostrand and his entourage of Viking theorists. The fragmentation algorithm used in JETSET is called the Lund string model. LEPTO, PEPSI, and AROMA all call JETSET to perform fragmentation on the generated struck quark and target remnant.
- PYTHIA: Named after an insane oracle of ancient Greece, this is the ultimate Phonebook of physics generators. The phrase `screams of the Pythia' appears in the manual's preface, providing the innocent user with a gentle welcome to several hundred pages of options and tunable parameters. PYTHIA can simulate the physics output of beam X scattering on target Y. Here X and Y can describe any experiment you have the misfortune to dream up: fixed target ep scattering ... photons incident on a magic neutron target ... or colliding beams of top quarks. PYTHIA is tightly linked with JETSET -- they are both Lund programs, they share the same manual, and PYTHIA calls JETSET by default to simulate final-state hadron formation. PYTHIA is principally used at HERMES to simulate quasi-real photoproduction: very-low-Q2 events which are out of the range of the DIS generators LEPTO and PEPSI. AROMA, by comparison, is able to run down to very low Q2 (since it has the mass of the charm quark as an alternative hard scale), and is considered superior to PYTHIA in the near-threshold region of charm production. Another note: despite its innumerable options, PYTHIA does not support polarization in any way. An enhanced version called SPHINX does exist, but only supports polarization in proton-proton scattering experiments ... so we don't use it.
- DIPSI: A fine diffractive generator with an unfortunate name. DIPSI specializes in diffractive vector meson production.
- EPJPSI: Another fine diffractive generator, this one is dedicated to J/psi production, and borrows heavily from LEPTO code.
- RADGEN: This is a fast version of the popular POLRAD program for performing radiative corrections in both unpolarized and polarized lepton scattering experiments. RADGEN is much faster than the lethargic POLRAD because it uses lookup tables rather than repeated numeric integration. The Cadet is advised not to look at the RADGEN code at this stage, as it is a classic example of `legacy software' and may cause premature strokes and heart failure.
And finally, here are the principal programs of the GMC suite:
- gmc_dis: This is our numero-uno workhorse generator. It simulates the inclusive part of the spin-dependent DIS cross-section on its own, using a broad selection of structure functions and parton distribution functions. It calls PEPSI to simulate the struck quark and target remnant, and PEPSI then calls JETSET to fragment these objects into final-state hadrons. gmc_dis also calls the RADGEN package to simulate QED radiative effects ... elastic and quasi-elastic events may be thrown by this package, and any radiated photon is entered into the output tables.
- gmc_aroma, gmc_dipsi, and gmc_epjpsi all provide direct access to the indicated generator packages. These programs are our best generators for charm production, diffractive light vector meson production, and diffractive J/psi production respectively.
- gmc_pythia and gmc_pythia6: These programs call PYTHIA (versions 5 and 6 respectively) to simulate photoproduction.
- gmc_trans is a spin-dependent generator of (semi-inclusive) single-hadron events that takes the dependence on intrinsic transverse quark momenta into account in order to simulate transverse single-spin asymmetries. The different versions of gmc_trans implement different models or parameterisations (aka global fits to our data).
The rest of this mission is still under development.
Location of the production output
The official MC files are located in the directories
/mcdataxx
where xx is some number (currently) between 01 and 09. There you usually find various productions, some of them produced for very special purposes and studies. A good place to start to look for a current MC production is the MC Productions page of the Wiki.
Structure of the mcDST output files
As you have no doubt learned by now, Cadet, the first step in understanding the structure of the HERMES data files is to find the corresponding DDL's. The mcDST's are defined by two DDL files: g1_MC.ddl and g1.ddl. They can both be found in the usual release area /hermes/r??/ddl. The dataflow written is called semiMC ... here's the structure:
semiMC = {
    g1MC = { g1MTrack, g1MEvent, g1MVert, g1Track }
    smTrack, smCluster
}
The dataflows and the tables g1MTrack, g1MEvent, and g1MVert come from g1_MC.ddl ... the tables g1Track, smTrack, and smCluster come from g1.ddl. No doubt you recognize those last three tables: they are the primary physics tables from the data DST's. Here in the Monte Carlo DST's they contain the reconstructed output of HRC, which is designed precisely to mimic the final output of our experiment. Here's the ticket about g1.ddl: just about any version from the data production will do since the Monte Carlo uses only 3 of its tables. All the others contain hardware oriented information like the status of the target magnet or logbook data quality bits ... things about which the Monte Carlo knows nothing.
Meanwhile, g1_MC.ddl supplies three new tables. These contain the `answers at the back of the book' -- the original information produced by GMC! For example, all generated particle tracks are preserved in g1MTrack, including those that were not reconstructed and those corresponding to rapidly decaying virtual particles that we could never see. There is a very useful link between the `true' and `reconstructed' information: each row of g1Track is linked to the corresponding row in g1MTrack.
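The link between reconstructed and generated tracks can be pictured with a small sketch. This is not ADAMO code: the two tables are modelled as plain Python lists of dictionaries, and the field names `id`, `mc_row`, and `p`, the helper `match_to_truth`, and all the numbers are hypothetical stand-ins for whatever the DDL files actually define. The Lund codes assume the PDG-style numbering and should be checked against the particle code table.

```python
# Toy stand-ins for two mcDST tables. Real tables are ADAMO objects;
# every field name and value here is an assumption for illustration only.
g1MTrack = [
    {"id": 1, "iLType": 11, "p": 12.4},    # generated scattered lepton
    {"id": 2, "iLType": 211, "p": 6.1},    # generated pi+
    {"id": 3, "iLType": 113, "p": 9.0},    # short-lived rho0, never reconstructed
]
g1Track = [
    {"id": 1, "p": 12.1, "mc_row": 1},     # reconstructed lepton, linked to generated row 1
    {"id": 2, "p": 6.3, "mc_row": 2},      # reconstructed pion, linked to generated row 2
]

def match_to_truth(reco_tracks, mc_tracks):
    """Pair each reconstructed track with the generated track it links to."""
    mc_by_id = {row["id"]: row for row in mc_tracks}
    return [(reco, mc_by_id[reco["mc_row"]]) for reco in reco_tracks]

pairs = match_to_truth(g1Track, g1MTrack)
for reco, true in pairs:
    # Reconstructed minus generated momentum: a simple resolution check.
    print(reco["id"], true["iLType"], round(reco["p"] - true["p"], 2))
```

Note that the generated rho0 appears in g1MTrack but has no partner in g1Track, which is exactly the `answers at the back of the book' situation described above.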
Important MC details: particle codes and normalization
In real life we can only determine the probable identity of a particle (pion? kaon? electron?) using the responses of our PID detectors. But in the world of Monte Carlo we know everything! The Monte Carlo packages we use at HERMES record the actual identity of particles using either the GEANT3 or Lund particle code schemes. The GEANT3 scheme contains about 50 codes, while the Lund scheme contains many more. The reason is simple. The GEANT3 package tracks particles through the materials of a detector ... and it therefore has no interest in a particle such as the rho meson which decays so fast (within about 1 fm) that no detector can observe its brief career as a rho. Detectors such as ours can of course deduce the existence of a rho meson by measuring the invariant mass of the pions to which it decays ... but the rho itself is not observed directly. The Lund programs, however, are not detector tracking packages but physics generators and they are very interested in the rho. The Lund particle-code scheme contains codes for all manner of hard-scattering exotica including individual quarks, exchange bosons, colour-strings, and every hadronic resonance you can imagine.
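To make the invariant-mass remark concrete, here is a minimal sketch of how the mass of a pion pair is computed from the two momentum vectors, using M^2 = (E1 + E2)^2 - |p1 + p2|^2. The function name `invariant_mass` and the example momenta are invented for illustration; they do not come from any HERMES library.

```python
import math

PION_MASS = 0.13957  # charged pion mass in GeV

def invariant_mass(p1, p2, m1=PION_MASS, m2=PION_MASS):
    """Invariant mass (GeV) of two particles from their momentum 3-vectors (GeV)."""
    e1 = math.sqrt(sum(c * c for c in p1) + m1 * m1)
    e2 = math.sqrt(sum(c * c for c in p2) + m2 * m2)
    px, py, pz = (a + b for a, b in zip(p1, p2))
    m_squared = (e1 + e2) ** 2 - (px * px + py * py + pz * pz)
    return math.sqrt(max(m_squared, 0.0))  # guard against tiny negative rounding

# Two back-to-back pions of 0.3615 GeV each (hypothetical numbers).
mass = invariant_mass((0.0, 0.0, 0.3615), (0.0, 0.0, -0.3615))
print(round(mass, 3))  # prints 0.775, close to the rho mass
```

A peak near this value in the two-pion mass spectrum is how a detector `sees' a rho that it never tracked.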
You will typically find variables called iGType and iLType in Monte Carlo data files, referring respectively to the GEANT3 and Lund codes of the particle in question. You will also see such things as iLParent, providing the Lund code for a particle's parent. If you are going to do any sort of Monte Carlo study, I strongly suggest that you print out the following particle code table and keep it with you at all times, coffee stains and all. It lists both GEANT3 and Lund codes. I created this table a long time ago for my own convenience, and my printed copy sports many coffee stains.
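As a taste of what the table contains, the sketch below maps a handful of codes to names. The GEANT3 entries follow the standard GEANT3 numbering; the Lund entries assume the PDG-style numbering used by recent JETSET versions. The helper `describe` and the use of 0 as a placeholder GEANT3 code are assumptions made for this example, so treat the printed table as the authority.

```python
# A few GEANT3 particle codes (standard GEANT3 numbering).
GEANT3_NAMES = {
    1: "gamma", 2: "e+", 3: "e-", 8: "pi+", 9: "pi-",
    11: "K+", 12: "K-", 14: "proton",
}
# Matching Lund codes, assuming the PDG-style numbering of JETSET 7.
LUND_NAMES = {
    22: "gamma", -11: "e+", 11: "e-", 211: "pi+", -211: "pi-",
    321: "K+", -321: "K-", 2212: "proton", 113: "rho0",
}

def describe(iGType, iLType):
    """Return readable names for a track's GEANT3 and Lund codes."""
    geant = GEANT3_NAMES.get(iGType, "no GEANT3 code")
    lund = LUND_NAMES.get(iLType, "unknown Lund code")
    return geant, lund

print(describe(8, 211))   # a pi+ known to both schemes
print(describe(0, 113))   # a rho0: Lund knows it, GEANT3 does not
```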
An important aspect of using Monte Carlo data is to do a proper normalization. Basically it boils down to the question: ``If I make a Monte Carlo file containing 20,000 events, and use it to create a histogram of pion yields vs z, how do I turn my result into a meaningful pion rate or yield?''
Event weighting
First of all, it is important to know that events generated with the event generators LEPTO, PEPSI, AROMA (so, most importantly, by the gmc_disNG program) come with an event weight associated with them. This weight can be found in the variable Weight in the g1MEvent table. The reason for the existence of this weight is this: The kinematics of an inclusive DIS event depend on two different independent variables, which can be chosen to be e.g.
- scattering angle and energy of the scattered lepton or
- x and y or
- $x$ and $Q^2$ or
- $x$ and $W^2$
The last three options are available in LEPTO/PEPSI; usually $(x,Q^2)$ is used. LEPTO/PEPSI now generates events with a flat distribution in the $(x,Q^2)$ plane. The actual cross section, however, is not flat in this kinematic plane. The event weighting factors take into account the $x$ and $Q^2$ dependence of the cross section. So only the combination of $\mathrm{event} \times \mathrm{event\ weight}$ leads to a physically meaningful kinematic distribution of events.
In contrast, the PYTHIA generator generates events according to the cross section of the different physics processes (DIS, PGF, VMD, ...). Thus the relative weight of the events is already correct and all events have a weight of 1.
So, when writing an analysis code for Monte Carlo data it is a good idea to always use the weight provided by g1MEvent.Weight. It is essential for gmc_disNG data, and it does no harm for PYTHIA data.
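In practice, `using the weight' means that each entry adds its event weight to a histogram bin instead of adding 1. The following is a minimal sketch in plain Python; the function `weighted_histogram` and the z and weight values are made up for illustration, and a real analysis would read the weight from g1MEvent.Weight.

```python
def weighted_histogram(values, weights, n_bins, lo, hi):
    """Sum event weights into equal-width bins over [lo, hi)."""
    sums = [0.0] * n_bins
    width = (hi - lo) / n_bins
    for value, weight in zip(values, weights):
        if lo <= value < hi:
            idx = min(int((value - lo) / width), n_bins - 1)
            sums[idx] += weight  # add the weight, not 1
    return sums

# Hypothetical pion z values and the Weight of the event each pion came from.
z = [0.25, 0.35, 0.30, 0.75]
weight = [0.5, 2.0, 1.5, 0.25]

print(weighted_histogram(z, weight, n_bins=2, lo=0.0, hi=1.0))  # [4.0, 0.25]
```

With all weights set to 1, as for PYTHIA events, the same function reduces to an ordinary count per bin, which is why always applying the weight is safe.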
Normalization to cross section
Taking the event weight into account leads to proper relative normalization of the events in a MC production. But to compare distributions of one MC production with data or another MC production, they have to be normalized to cross section. The procedure is explained on the general MC page.
Running the program(s)
Reference pages
Continue to MISSION 3: ADAMO AND DAD.
N.C.R. Makins (makins@uiuc.edu)