UDST production daemon
Section 1: the uDST production daemon
The production of uDSTs is directed by a daemon. The daemon directs the production by executing the processes which create and remove uDSTs according to the control instructions in a set of three files. These files are:
- the udst.conf file - this file contains the configuration settings for the production, e.g. the version of the uDST production, the source for the event data, the source for the slow data, the location and names of scripts executed to produce the uDST production, etc.
- the fill.conf file - this file contains the information about each fill handled by the production. This file is used to keep track of the status of the production and to direct the daemon as to how to handle the production of each fill.
- the disk.conf file - this file contains the list of production disks upon which the daemon is allowed to produce and thus store uDSTs.
If these three files are present and contain reasonable settings, the production manager can execute the uDST daemon. To do so, the manager should set his/her current working directory to the directory containing the executable code for the daemon (typically the /bin subdirectory of a production) and then enter one of the following two commands:
- The first command executes the daemon in "interactive" mode and allows the user to monitor the status and control the operation of the daemon directly.
- The second command executes the daemon in "daemon" mode. In this mode, the daemon runs as a background process which does not terminate when the user logs out. Since the daemon receives its directions from the fill.conf file or, as needed, the monitor program, the user receives the UNIX prompt back immediately after the daemon is successfully started.
Regardless of how it is started, the daemon's first task is to check that no other daemon is running on the production. Since the daemon creates the "udstlock" file at startup and removes it at termination, the presence of this file indicates that a daemon is already running. If the daemon detects this file at startup, it prints an error message and then terminates. The user should then look for another process which is currently running the daemon (via the "ps -fu" command). If no such process is found, the user should delete the "udstlock" file and then re-run the daemon.
However, when the "udstlock" file is not present at startup, the daemon starts its initialization by creating this file and writing its process identification number (pid) into it. Next, the daemon reads the three input files which, as mentioned above, configure and control its operation. After these files are read, the daemon performs its first production pass. As long as the limit on the number of production passes or available disks has not been exhausted, the daemon initiates an additional production pass whenever an existing production pass completes, its production holdoff counter expires (as set in the udst.conf file), or it receives an "initiate production" directive from the monitor program.
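The lock-file check described above can be sketched in Python as follows. This is a minimal illustration and not the daemon's actual code: the function names and the use of an exclusive-create flag are assumptions, and only the file name "udstlock" and its pid contents come from the text.

```python
import os

LOCK_NAME = "udstlock"  # file name from the text; its directory is an assumption


def acquire_lock(directory):
    """Create the lock file and write our pid into it.

    Returns the lock path on success, or None if the lock file already
    exists (which the text treats as "a daemon is already running").
    """
    path = os.path.join(directory, LOCK_NAME)
    try:
        # O_EXCL makes creation fail if the file is already present,
        # so the check and the creation happen in a single step.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{os.getpid()}\n")
    return path


def release_lock(path):
    """Remove the lock file at termination."""
    os.remove(path)
```

A stale lock left behind by a crashed process makes acquire_lock return None until the file is deleted by hand, which matches the recovery procedure given above.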
During a production pass, the daemon selects a fill for processing. This selection is accomplished by stepping through the fill.conf file until an entry is located that, via the contents of the status/control (column 8) field, instructs the daemon to perform some action. Presently, there are five actions which the daemon can initiate for a fill. These actions are:
- production - produce the uDSTs for this fill.
- setup - set up the production structure for a fill, including executing the preprocessing scripts, but do NOT actually produce the uDSTs. This option is intended to allow the production manager to debug a problem with a particular fill.
- redo - produce a new set of uDSTs for this fill.
- wipe - erase the production setup and any resulting uDSTs for this fill but do NOT make new ones at this time.
- inspect - erase the current set of uDSTs but keep the production structure intact so that the uDSTs can be hand-made by the production manager. This option is intended to assist with debugging the uDST production program on a fill which failed production.
The daemon's handling of each of these actions is detailed in the following five sections of this document.
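As an illustration of the selection step, the following Python sketch walks a list of already-parsed fill.conf entries and returns the first one whose column 8 requests an action. The on-disk layout of fill.conf is not reproduced here, so each entry is simply a list of column strings. The keywords WIPE, REDO and INSP and the meaning of an empty field (pending production) come from the text; the keyword "SETUP" and the function name are hypothetical.

```python
# Directive keywords: WIPE, REDO and INSP appear in the text, and an empty
# status field means "pending production". "SETUP" is a hypothetical keyword.
ACTIONS = {
    "": "production",
    "SETUP": "setup",
    "REDO": "redo",
    "WIPE": "wipe",
    "INSP": "inspect",
}

STATUS_COLUMN = 8  # 1-based column number, as in the text


def select_fill(entries):
    """Return (index, action) for the first entry that requests an action.

    `entries` is a list of rows, each a list of column strings. Rows whose
    status is not a directive (e.g. PROC, done, fail, hold) are skipped.
    A row too short to have a column 8 is treated as having an empty status.
    Returns None if no entry is actionable.
    """
    for index, columns in enumerate(entries):
        if len(columns) >= STATUS_COLUMN:
            status = columns[STATUS_COLUMN - 1].strip()
        else:
            status = ""
        action = ACTIONS.get(status)
        if action is not None:
            return index, action
    return None
```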
Production of uDSTs by the daemon
Of these possible actions, the most common is to produce the uDSTs for a fill. To do this, the daemon selects, from the list of possible disks in the disk.conf file, the production disk with the most free space that is not presently in use. It then creates a subdirectory on this disk in which the uDSTs for the chosen fill will be produced and stored. This subdirectory will be named according to the following convention:
| where: |
|
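The disk selection rule (most free space, not presently in use) can be sketched as below. This is only an illustration under stated assumptions: the function name is hypothetical, the candidate list stands in for the contents of disk.conf, and "in use" is modelled as a plain set of paths.

```python
import shutil


def pick_disk(candidates, in_use):
    """Return the candidate path with the most free bytes, skipping busy disks.

    `candidates` stands in for the disks listed in disk.conf and `in_use`
    for the disks currently hosting a production; both are assumptions
    about how the daemon tracks this. Returns None if every disk is busy.
    """
    available = [disk for disk in candidates if disk not in in_use]
    if not available:
        return None
    # shutil.disk_usage returns (total, used, free) in bytes for the
    # filesystem containing the given path.
    return max(available, key=lambda disk: shutil.disk_usage(disk).free)
```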
Next, the daemon spawns a subprocess to make the uDSTs for the fill. This subprocess has its working directory set to the directory just created by the daemon and, from the information in the udst.conf file and the fill's entry in the fill.conf file, receives a list of environment variables specifying the uDST production parameters. The daemon next redirects the output from this subprocess to the script.out file located in the production directory and finally releases the subprocess by having it execute the command, program, or script specified by the "command" parameter in the udst.conf file. Now that the subprocess is running, the daemon updates the entry for this fill in the fill.conf file; specifically, it sets the status/control (column 8) field to PROC and sets the disk (column 9) field to the production disk selected for the production. The daemon will then hibernate until either the subprocess terminates or, due to expiration of its holdoff timer, another parallel production pass is initiated.
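The launch sequence described above (set the working directory, pass the environment variables, redirect the output to script.out, then execute the command) might look like the following in Python. This is a sketch under assumptions, not the daemon's implementation: the function name is hypothetical, the environment variable names are not reproduced from the real production, and sending stderr to the same file as stdout is an assumption.

```python
import os
import subprocess


def launch_production(workdir, command, parameters):
    """Start `command` in `workdir` with its output sent to script.out.

    `parameters` is a dict of extra environment variables; the variable
    names used by the real production are not reproduced here.
    Returns the Popen handle so the caller can wait for completion.
    """
    env = dict(os.environ)
    env.update(parameters)
    log = open(os.path.join(workdir, "script.out"), "w")
    try:
        return subprocess.Popen(
            command,
            cwd=workdir,               # production directory as working directory
            env=env,                   # production parameters as environment
            stdout=log,
            stderr=subprocess.STDOUT,  # assumption: errors go to the same file
        )
    finally:
        # The child keeps its own copy of the descriptor once it has started.
        log.close()
```

The caller would then record PROC in the fill's entry and call wait() on the returned handle to learn the return code.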
In a standard uDST production, the subprocess executes the handle_uDSTproduction script. This script first executes the setup_production script to create the links and files needed by the uDST production program, then executes the execute_uDST_preprocessors.sh script to perform the preprocessor steps that collect and generate the necessary input files for the uDST production, and finally executes the do_production script to run the uDST production program. Each of these scripts has extensive checks and reports both progress and error messages in logfiles which are named by adding the suffixes ".out" and ".err" to the script's name. These files, along with the script.out file, are kept so that the production manager can locate a problem with the production process.
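The three-step sequence can be pictured with the Python sketch below. The script names come from the text; everything else is an assumption made for illustration, in particular that the chain stops at the first failing step, that the scripts live in the production directory, and that the ".out" and ".err" files are written by the caller rather than by the scripts themselves.

```python
import os
import subprocess

# Script names are taken from the text; where they live is an assumption.
STEPS = [
    "setup_production",
    "execute_uDST_preprocessors.sh",
    "do_production",
]


def run_steps(steps, workdir, runner=subprocess.call):
    """Run each step in order and stop at the first non-zero exit code.

    Each step's stdout and stderr are written to <name>.out and <name>.err
    inside `workdir`. Returns the exit code of the last step that ran,
    which is 0 only if every step succeeded.
    """
    for name in steps:
        out_path = os.path.join(workdir, name + ".out")
        err_path = os.path.join(workdir, name + ".err")
        with open(out_path, "w") as out, open(err_path, "w") as err:
            code = runner(
                [os.path.join(workdir, name)],
                cwd=workdir,
                stdout=out,
                stderr=err,
            )
        if code != 0:
            return code
    return 0
```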
When the do_production script finishes, the handle_uDSTproduction script exits and thereby terminates the subprocess created by the daemon. The completion of this process causes the operating system to wake up the daemon. The daemon then updates the fill.conf file with the status of the now completed production. If production was successful, as indicated by a return code of 0 from the subprocess, the daemon sets the status/control (column 8) field of the fill's entry in the fill.conf file to "done". Otherwise, it sets this field to "fail". The cause of a failure can be determined by looking at the various logfiles from the production. Regardless of the outcome of the task, the daemon will initiate another production pass.
Setup of the production structure by the daemon
Initially, a directive to set up the production structure for a fill is handled in the same manner as the production of the uDSTs for a fill. However, since the uDSTs themselves are unwanted, the process terminates after executing the execute_uDST_preprocessors.sh script; i.e., the final script, do_production, which runs the uDST production program, is not executed in this case. If the scripts complete successfully, the daemon updates the status/control field of the fill's entry in the fill.conf file to "check" to indicate that the fill is awaiting hand-checking. The production manager can then run the uDST production program by hand in order to diagnose a problem. On the other hand, if a problem was encountered while setting up the production structure for a fill, the daemon updates the status/control field of the fill's entry to "fail". The production manager should then investigate the log files in the production area for the fill to determine the cause of the problem.
Removal of uDSTs by the daemon
Sometimes it is necessary to remove the uDSTs for a fill. Since the daemon is responsible for managing the disk space used for uDSTs, it is not a good idea to remove uDSTs by hand. Instead, the production manager should instruct the daemon to remove the uDSTs by setting the status/control (column 8) field of the fill's entry in the fill.conf file to WIPE. When, during a production pass, the daemon selects this fill, it will remove the uDSTs. Just as for producing uDSTs, the daemon removes uDSTs by creating a subprocess, setting the production directory as the subprocess's current working directory, and then passing a list of environment variables to the subprocess. The daemon then has this subprocess execute the command, program, or script specified by the "wipe_production" parameter in the udst.conf file and, as with the production of uDSTs, goes to sleep until the subprocess terminates.
In a standard uDST production, the subprocess executes the wipe_production script. This script first performs integrity checks to avoid removing the wrong set of uDSTs and, if all is okay, removes the production directory and all the symlinks for this fill from disk.
When this subprocess completes, the daemon is awakened by the operating system and subsequently updates the fill.conf file by clearing the disk (column 9) field and setting the status/control (column 8) field to "hold". In this way, the fill's entry now instructs the daemon to ignore this fill, and thus the daemon will pass over this fill during subsequent production passes.
Remaking of uDSTs by the daemon
To reproduce the uDSTs for a fill, the old uDSTs first need to be removed from disk and then the fill needs to be requeued for production. To do this, the production manager should instruct the daemon to remake the uDSTs by setting the status/control (column 8) field of the fill's entry in the fill.conf file to REDO. The daemon, when it selects this fill during a production pass, erases the old uDSTs for the fill using the same procedure described in the Removal of uDSTs section above. However, in this case, the fill needs to be rescheduled for production after the old uDSTs are deleted. So, when the daemon updates the fill.conf file after being awakened by the termination of the "delete the uDSTs" subprocess, the daemon clears both the status/control (column 8) and the disk (column 9) fields of the fill's entry. This new entry for the fill now indicates that the fill is pending production, and thus the fill will be reproduced when it is selected by the daemon during a later production pass.
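The status updates described in the preceding sections can be summarized in a small Python sketch. The status values (done, fail, check, hold, and an empty field meaning pending production) come from the text; the action labels and the function name are illustrative, and cases the text does not describe (for example a failed wipe) are deliberately left undefined.

```python
def next_status(action, return_code):
    """Return the new column 8 value after an action's subprocess ends.

    Action names are illustrative labels; the status values come from the
    text. Returns None where the text does not describe the outcome.
    """
    ok = return_code == 0
    if action == "production":
        return "done" if ok else "fail"
    if action == "setup":
        return "check" if ok else "fail"
    if action == "wipe" and ok:
        return "hold"  # the daemon ignores this fill from now on
    if action == "redo" and ok:
        return ""      # cleared field: the fill is pending production again
    return None
```

For the wipe and redo actions the daemon also clears the disk (column 9) field, which this sketch does not model.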
Debugging the uDST production by the daemon
As a special feature for assisting with debugging the uDST production program on a fill, the uDSTs for a fill can be deleted without removing the production infrastructure. The production manager can then run the uDST production by hand for this fill and thus investigate any odd features of the production. To use this feature, the production manager should first have the daemon produce the uDSTs for the fill and, then, instruct the daemon to remove the uDSTs but keep the production infrastructure by setting the status/control (column 8) field for the fill's entry in the fill.conf file to INSP.