PCFarm Info
Page maintainer: Eduard
| This page is considered done. It has been reviewed by Alexander. There may be missing elements, but they are all flagged and the text has no errors. |
Disclaimer: Most of the relevant information (slightly outdated, since the setup has been upgraded
several times since the original launch in 2002) concerning the PC Farm can be found here.
Also look to Bootcamp for some Unix basics, and UNIX Tricks for some clever ideas.
AFS tips and tricks (home directories)
Quota management
Home directories at DESY are managed through a centralized AFS system (for the HERMES /user disk quota, check here). To check your AFS home quota, use the following command:
fs lq ~
You'll get something like:
Volume Name     Quota    Used   %Used   Partition
user.yourname   130080   97707  92%<<   37%   <<WARNING
The numbers are in kBytes. If you run out of quota (i.e. Used > Quota), you will have trouble logging in to KDE and using applications like pine, gv, firefox etc. If that's the case, try logging in in text mode (Ctrl-Alt-F1 from a Linux desktop) and cleaning up. A good candidate to blame for filling up your quota is the firefox cache:
rm ~/.mozilla/firefox/*/Cache/*
If that doesn't help, search for large files/directories for a further cleanup (may take a while):
cd ~
du -ks * . | sort -n
Ignore the .OldFiles directory, that's your daily backup.
If you REALLY need more space, send an email to one of the HERMES AFS administrators
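The quota check can also be turned into a small helper that reports how full the home volume is. The following is only a sketch: it assumes that fs lq prints a header line followed by a single data line whose second and third fields are Quota and Used, and the function name quota_percent is made up for this illustration.

```shell
# Print the percentage of AFS quota in use, given `fs lq` output on stdin.
# Assumption: line 1 is the header, line 2 holds the volume name, the quota
# and the used space (both in kBytes) as its first three fields.
quota_percent() {
    awk 'NR == 2 && $2 > 0 { printf "%d\n", ($3 * 100) / $2 }'
}

# Hypothetical usage on a machine with AFS:
#   pct=$(fs lq ~ | quota_percent)
#   [ "$pct" -ge 90 ] && echo "AFS home is ${pct}% full, time to clean up"
```

The percentage is computed from the Quota and Used columns rather than read from the %Used column, so it does not depend on the warning markers that fs appends there.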
File and Directory Permissions (aka Access Control Lists - ACLs)
AFS ignores the usual UNIX file permissions. Instead, a more sophisticated Access Control List (ACL) is used. Unlike the UNIX file permissions, ACLs apply to directories rather than individual files. Try the following command in your home directory:
fs la
Access list for . is
Normal rights:
  system:anyuser l
  yourname rlidwka
The letters rlidwka have the following meanings:
| Letter | Right | Meaning |
|---|---|---|
| l | lookup | List the directory (basically ls), without file sizes and dates |
| r | read | Read the contents of the file |
| i | insert | Create new files/directories |
| w | write | Write into files |
| d | delete | Delete files/directories |
| k | locK | Lock Files |
| a | administer | Modify the ACLs |
In the example given above (the default setting for a newly created home directory), the owner of the directory is authorized to do everything, and the system:anyuser (anybody else) can only list your files, without seeing their content.
Special Directories and Modifying ACLs
Every new home directory comes with a bunch of automatically generated directories. Some have predefined ACLs, for example:
fs la www
Access list for www is
Normal rights:
  usg rl
  system:administrators rlidwk
  system:anyuser rl
  yourname rlidwka
Here, the important lines are "usg rl" and "system:anyuser rl", which allow anyone in the world to access (read) the files in the www directory. This is actually a special directory, which can be accessed via http://www.desy.de/~yourname . If you don't have (and would like to have) such a webpage, you can create the directory and modify the permissions accordingly:
mkdir ~/www
fs sa -dir ~/www -acl system:anyuser rl
Similarly, the ~/hermes directory has the following permissions by default:
fs la ~/hermes
Access list for hermes is
Normal rights:
  usg:hermes rl
  system:administrators rl
  yourname rlidwka
And the important line here is "usg:hermes rl" which means everybody at HERMES can read the contents of files which you put in this directory (good for storing presentations, papers etc).
If you are drafting a paper and would like to allow write access to a restricted list of people in the given directory, here's what you wanna do:
mkdir -p ~/papers/g1draft
fs sa -dir ~/papers/g1draft -acl peter rlidw
fs sa -dir ~/papers/g1draft -acl john rlidw
fs sa -dir ~/papers/g1draft -acl james rlidw
where peter, john and james are the login names of the drafters you want to give full read/write/create/delete access to that directory.
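For a longer list of drafters, the repeated fs sa calls can be wrapped in a loop. This is a minimal sketch rather than an official tool: the function name grant_acl and the FS_CMD variable are made up for illustration, and FS_CMD only exists so that the loop can be dry-run with echo on a machine without AFS.

```shell
# Grant the same AFS rights on one directory to several users.
# FS_CMD is a hypothetical override: set FS_CMD=echo to print the commands
# instead of running them; by default the real `fs` command is called.
grant_acl() {
    dir=$1
    rights=$2
    shift 2
    for user in "$@"; do
        ${FS_CMD:-fs} sa -dir "$dir" -acl "$user" "$rights" || return 1
    done
}

# Dry run: print the fs arguments for three drafters without touching any ACL.
FS_CMD=echo grant_acl ~/papers/g1draft rlidw peter john james
```

Dropping the FS_CMD=echo prefix applies the ACLs for real, one fs sa call per user.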
Another example - if you have a directory with the right permissions, and you just want to give the same permissions to another directory, the easy solution is:
fs ca -fromdir ~/papers/olddraft/ -todir ~/papers/newdraft
For other AFS commands see here
Backup policy
Every AFS directory is backed up regularly. For home directories this is done every night, so if you accidentally delete a file, you can immediately restore the version from the last backup:
rm ~/.profile
cp ~/.OldFiles/.profile ~/
For restoring files which were rewritten several times AFS@DESY offers the beautiful Tivoli utilities to handle the incremental backup system. For example, to restore your ~/.root_hist from half a year ago, you do the following:
dsmc_afs restore -inactive -pick ~/.root_hist
After giving your AFS password (and some waiting), you will get a list of backup versions for that file:
TSM Scrollable PICK Window - Restore
# Backup Date/Time File Size A/I File
-----------------------------------------------------------------------
1. _ 2008.04.26 13:33:21 15,30 KB A /afs/desy.de/user/d/dich/.ro
2. _ 2008.04.25 23:55:03 17,19 KB I /afs/desy.de/user/d/dich/.ro
3. _ 2008.04.24 14:09:51 15,99 KB I /afs/desy.de/user/d/dich/.ro
4. _ 2008.04.21 21:33:03 14,49 KB I /afs/desy.de/user/d/dich/.ro
5. _ 2008.04.17 14:28:58 13,14 KB I /afs/desy.de/user/d/dich/.ro
6. _ 2008.04.15 21:29:13 13,87 KB I /afs/desy.de/user/d/dich/.ro
7. _ 2008.04.13 13:31:45 13,26 KB I /afs/desy.de/user/d/dich/.ro
8. _ 2008.04.12 13:32:43 13,25 KB I /afs/desy.de/user/d/dich/.ro
9. _ 2008.04.04 13:35:45 11,99 KB I /afs/desy.de/user/d/dich/.ro
10. _ 2008.04.03 14:07:35 11,78 KB I /afs/desy.de/user/d/dich/.ro
11. _ 2008.04.02 13:32:38 7,89 KB I /afs/desy.de/user/d/dich/.ro
12. _ 2008.03.26 13:46:11 7,81 KB I /afs/desy.de/user/d/dich/.ro
13. _ 2008.03.08 00:01:23 7,96 KB I /afs/desy.de/user/d/dich/.ro
14. _ 2008.03.05 15:19:22 7,57 KB I /afs/desy.de/user/d/dich/.ro
0---------10--------20--------30--------40--------50--------60--------7
<U>=Up <D>=Down <T>=Top <B>=Bottom <R#>=Right <L#>=Left
<G#>=Goto Line # <#>=Toggle Entry <+>=Select All <->=Deselect All
<#:#+>=Select A Range <#:#->=Deselect A Range <O>=Ok <C>=Cancel
where you can select the version(s) you think should be restored by entering the corresponding number (or a range) and pressing Enter. Once you're satisfied with the selection, press O<Enter>. The selected file will be restored to its former location (you will be given the choice of overwriting existing files).
ATTENTION: only such AFS files which are readable by system:administrators will be stored in the TSM backup node AFSFILE. AFS setacl example:
fs setacl -dir private -acl system:administrators rlidwka
Alternatively, use the GUI client dsm_afs. There, choose the "Restore" option, then select the View->Display active/inactive files option, then go to File Level->afs/desy.de/user/s by clicking on the + sign (instead of "s" take the first letter of your login name). Select your home directory by clicking on the little grey square on the right side of the + sign. The content of the directory will be displayed. Select the file(s) you'd like to restore (grey buttons to the left of the name) and press the "Restore" button. You'll have the choice of restoring the files to their original location or to a new location. In addition, a search function is available.
More information available here.
Interactive nodes
| This page is considered done. It has been reviewed by Alexander. There may be missing elements, but they are all flagged and the text has no errors. |
Two workgroup servers (worf and geordi) are available for interactive login via ssh.
The interactive nodes are equipped with ~300GB of /scratch storage (4 days lifetime for files, cleanup without further notice), and have the NFS-based disks as well as tape storage mounted under /user?? and /acs/, respectively.
/user?? disk quota
The NFS disks /user0[1..7] have volumes from ~350-800GB each. Every user who needs to store files gets a directory created (upon request to pcfarmnfs@hermes.desy.de), with a symbolic link in /user/yourname . The default quota is set to 3GB, and can be expanded upon need. Additional dedicated space is available for different analysis groups in /group0[1..3] and for MC development and productions in /mcdata0[1..9].
/user?? disk backup policy
Though the /user??, /group?? and /mcdata?? disks are officially NOT backed up, there is an effort to save at least some crucial stuff. The backup is performed on a nightly basis and includes only ASCII (source, header, script, macro, ...) files, not exceeding a size threshold of 1MB. Binary files like n-tuples are NOT backed up. The last night backup can be restored directly on the interactive nodes by copying the files back from respective directory in /backup0[1..2]. For more sophisticated backups you'll need to contact PC Farm administrators.
Interactive job policy
As the name suggests, the interactive nodes are installed to run interactive jobs (email, browsing, editing, compiling, testing, interactive analysis etc., just to name a few). What they are NOT designed for is running CPU/memory/disk-intensive tasks. To protect the users of the PCFarm from such unauthorized (sometimes unintentional, e.g. due to a bug in test code) abuse, a watchdog program runs on the interactive nodes and kills every job that takes more than 90% of the CPU for longer than 10 minutes. Every job that needs intensive computing resources should be submitted to the batch system instead.
Batch system (PBS)
Our batch system is based on OpenPBS V2.3, with the server running on kirk. The system is arranged such that every user gets a fair share of the resources, which are then optimally utilized. It is based on queues with different limits. For a job to be executed, a shell script has to be prepared and submitted to the batch system.
Submitting, Monitoring and Killing jobs
To submit a job to the batch system, use the qsub command (man qsub will give all options):
qsub -q M myscript.sh
To check the status of your job, run qstat:
qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
992643.kirk      myscript.sh      yourname         02:51:44 R M
The "Job id" is a unique number for this particular job, which can be used later to get detailed information or to stop the execution of the job. The output of qstat also displays the queue the job was submitted to, the status of the job (under S; it can be (Q)ueued, (R)unning or (E)nding), and the CPU time already used.
Before you submit, try to figure out how much time your job will need to complete (e.g., if you plan to analyze 30000 runs, try your code first with 100 and scale). Depending on the need, choose a corresponding queue. In the above example M queue is addressed, which will reserve 8 hours of CPU time for your job. If not finished by that time, your job will be automatically killed afterwards and a notification email sent to your DESY account.
If you realized the code does something wrong (e.g. you expected it to complete in 10 minutes but it still runs after 2 hours, or if you found a bug in the code AFTER having it submitted, or if the intermediate output looks suspicious), you can explicitly tell the batch system to terminate the execution by issuing:
qdel 997725
where 997725 is your job id.
Useful tips
Sometimes it's useful to learn on which batch node your job is being executed (or queued) currently. For that the -n option to qstat can be used, e.g.:
qstat -n 997725.kirk
kirk.desy.de:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
----------- -------- ----- -------------- ------ --- --- ------ ----- - -----
997725.kirk marukyan L batch_d_dvcs.m 2083 1 -- -- 24:00 R 00:59
fhebtn34/0
In this example, it's fhebtn34.
By default, the batch system stores the standard output and standard error of your job into separate files (myscript.sh.oXXXXXX and myscript.sh.eXXXXXX, with XXXXXX being the job id). If you want these files to be merged, add the -j oe option to qsub.
If your job gets killed, you get a notification email by default. This behavior can be controlled by the -m string option, where string is a combination of n,a,b,e letters, with:
n no email is sent
a mail is sent when the job is aborted by the batch system
b mail is sent when the job begins execution
e mail is sent when the job terminates
e.g., -m ae will notify you when your job finishes, by either normal or forced termination.
Queue Length
- S 10 minutes
- M 8 hours
- L 24 hours
- XL 72 hours
- XXL 240 hours
To obtain this information from the PBS job system for a given queue (here M), use the following command:
$ qmgr -c "list queue M"
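The advice to time a small test and scale it up can be combined with the queue limits listed above. The sketch below assumes that the CPU time grows linearly with the number of runs; the function names estimate_minutes and pick_queue are made up for this illustration and are not part of PBS.

```shell
# Scale a timing measured on a small test sample up to the full job.
# Arguments: minutes the test took, runs in the test, runs in the full job.
# Assumption: the CPU time grows linearly with the number of runs.
estimate_minutes() {
    echo $(( ($1 * $3 + $2 - 1) / $2 ))   # integer division, rounded up
}

# Map an estimated CPU time in minutes onto the shortest queue that fits,
# using the limits listed above (S 10 min, M 8 h, L 24 h, XL 72 h, XXL 240 h).
pick_queue() {
    if   [ "$1" -le 10 ];    then echo S
    elif [ "$1" -le 480 ];   then echo M
    elif [ "$1" -le 1440 ];  then echo L
    elif [ "$1" -le 4320 ];  then echo XL
    elif [ "$1" -le 14400 ]; then echo XXL
    else
        echo "no queue is long enough, split the job" >&2
        return 1
    fi
}

# Example: 100 runs took 2 minutes, the full job has 30000 runs.
minutes=$(estimate_minutes 2 100 30000)
echo "estimated ${minutes} min -> queue $(pick_queue "$minutes")"
```

The result can be passed straight to qsub, e.g. qsub -q "$(pick_queue "$minutes")" myscript.sh. Since a job that exceeds its queue limit is killed, it is probably wise to add some safety margin to the estimate before picking the queue.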
Example of Batch Script
$ cat my_script.com
#!/bin/sh
cd /scratch/$PBS_JOBID
echo `pwd`
echo `hostname`
echo $PBS_O_WORKDIR
echo $PBS_JOBNAME
echo $PBS_JOBID
echo $PBS_QUEUE
This script will just "cd" to the working directory on a batch node, print out host name where it runs, current directory, job name and identifier, queue name. You can use any other shell commands in a batch script, like "cp" to copy over input/output files from/to your private /user?? area.
Using GRID at DESY
| Important note: This page has not yet been reviewed by an expert! Use the information below with caution.
Other pages that have not been reviewed yet can be found in the category FORREVIEW. |
GRID is the new worldwide project for efficient utilization of computing resources. It uses certain abstraction mechanisms for managing user input and output files, job submission, control and monitoring. In this section you'll find a very much simplified (hence incomplete, yet hopefully usable) step-by-step guide describing how to Obtain a GRID certificate, Submit very simple jobs, Transfer data (files) to and from GRID.
NOTE: It is important to remember that the batch nodes of the GRID system are clean linux installations (currently Scientific Linux 5), namely without AFS or CERN software, let alone HERMES software. In practice this means that all the necessary software has to be compiled on some SL5 machine (pal cluster is a good choice) and then transferred to the Grid system, using either the sandbox or the storage element (recommended).
Obtaining a GRID certificate
To use the GRID a special certificate is required. A user certificate consists of a private key with a private password and a certified public key. The private key is exclusively possessed by the user and is not known at any stage to the registration authority (RA) or Certification Authority (CA). Lost private/public keys or the password can not be recovered by any means.
A certificate is valid for 1 year and must be renewed before its expiration. The CA notifies users by e-mail 3 weeks before this date. In Germany user certificates are issued by GridKa of FZ Karlsruhe (FZK). DESY acts as a Registration Authority (RA) for GridKa and provides a registration service.
Application process
First, download and print the application form, then fill it and have it signed by your supervisor. The signed form has to be sent to DESY IT together with a copy of your passport/ID card.
Next, point your browser to GridKa web portal and follow the instructions, namely:
- Import the root certificate into your browser
- Click on "Your first certificate" and fill in the form
- Press "Send"
From now on your certification process has started, you will receive several emails with simple and detailed instructions on what to do next. Remember that your password is not recoverable!
Adding a Virtual Organization (VO)
As soon as your certificate has arrived (and imported in your browser), you should add yourself to a VO, in our case hermes. For that, point your browser to HERMES VOMRS @ DESY and follow the instructions. You would need to confirm your application following email instructions. Also your VO administrator has to confirm that you are allowed to run GRID jobs for HERMES.
Conversion steps
To use your certificate on the PCfarm it needs to be converted to a PEM format (originally it's a CRT file). First, export your certificate to a file using the instructions of Exporting certificates from your browser into a file usercred.p12. Place that file in your AFS ~/.globus directory. Remember to change the safety permissions on that file:
chmod go-rwx ~/.globus/usercred.p12
Login to pal.desy.de via ssh, go to the directory ~/.globus and run
source /afs/desy.de/project/glite/UI/etc/profile.d/grid-env.sh
Note that the above command needs to be run every time you log in to pal in order to submit or monitor any jobs.
Therefore, it is best to create an init script in e.g. ~/grid/ with contents like:
source /afs/desy.de/project/glite/UI/etc/profile.d/grid-env.sh
export LFC_HOST=`lcg-infosites --vo hermes lfc`
voms-proxy-info
and call it every time after logging in, if you plan to work with grid.
Back to the exported certificate - use openssl to convert it to a PEM format:
openssl pkcs12 -nocerts -in usercred.p12 -out mykey.pem
When asked for, supply the password chosen when exporting the certificate and the PEM pass phrase (this is the same as the one for exporting the certificate). As a result, a file named mykey.pem will appear in your ~/.globus directory.
Testing your certificate
Continue, on pal again, run
voms-proxy-init --debug -verify -voms hermes
(it will ask for the password as usual). If the command output ends with something like:
.................................................................................++++++ Done
Your proxy is valid until Tue May 26 08:28:42 2009
then your certificate has been successfully used to create a Grid Proxy. This proxy (similar to an AFS token) is valid for 12 hours by default, and is used by the submitted jobs to access Grid resources. As soon as the proxy expires, your job will stop, so it is your responsibility to make sure that the jobs run for a limited time only. The proxy can be initialized with a longer lifetime than the default 12 hours (up to 96 hours) using the options "-valid 96:00 -vomslife 96:00", but in reality the DESY batch system is limited to 48 hours of CPU time and 72 hours of wall time, so such values don't make much sense.
Running
voms-proxy-info
will show the remaining lifetime for your proxy.
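Since a job stops when the proxy expires, it can be useful to compare the remaining proxy lifetime with the expected job duration before submitting. The following is a sketch under assumptions: the function name proxy_covers is made up, and the usage line assumes that voms-proxy-info --timeleft prints the remaining lifetime in seconds.

```shell
# Succeed if a proxy with SECONDS_LEFT of remaining lifetime covers a job
# that is expected to need HOURS_NEEDED hours.
proxy_covers() {
    seconds_left=$1
    hours_needed=$2
    [ "$seconds_left" -ge $(( hours_needed * 3600 )) ]
}

# Hypothetical usage on pal (assumes --timeleft reports seconds):
#   proxy_covers "$(voms-proxy-info --timeleft)" 8 || voms-proxy-init -voms hermes
```

The comparison deliberately ignores the time a job spends waiting in the queue, so the hours passed in should include a generous allowance for that.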
Submitting (simple) jobs to GRID
A simple batch job consists of a shell script and a Grid job definition file. Their contents and submission process are described below.
Composing the batch script
For a simple test, create an env.sh file in any directory of your choice, with the contents:
#! /bin/sh
#########################################################################
/bin/hostname -f
/bin/date
/usr/bin/id
/bin/pwd
/bin/ls -al
/bin/df .
/usr/bin/env
$GLOBUS_LOCATION/bin/grid-proxy-info -all
#########################################################################
Composing the JDL (Job Definition Language) file
To run the above script, create an env.jdl file (this is the globus "submitter" script) with the contents:
VirtualOrganisation = "hermes";
Executable = "env.sh";
Arguments = " ";
StdOutput = "out";
StdError = "err";
InputSandbox = {"env.sh"};
OutputSandbox = {"out","err"};
Submitting the job
To submit the above test job run
glite-wms-job-submit -a -o pid env.jdl
Here, the -a option will delegate the job automatically (recommended), and -o pid will put the process(job) ID info into the file named 'pid' in the same directory. The 'pid' file may contain more than one job information.
Monitoring the job
Check the job status with
glite-wms-job-status -i pid
The output will contain the keyword "Scheduled" for a while, then "Running", and when it returns "Done", the output of the job may be retrieved. Note that the status is being updated every several minutes, therefore our test job will still have the status "Running" for quite a while after being completed in reality.
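When checking many jobs, only the status keyword is usually of interest. The helper below is a sketch that assumes the status report contains a line of the form "Current Status: <state>" for each job; the function name job_states is made up for this illustration, and the pattern may need adjusting to the actual output of your gLite version.

```shell
# Extract the job state(s) from glite-wms-job-status output on stdin.
# Assumption: each job is reported on a line starting with
# "Current Status:" followed by whitespace and the state.
job_states() {
    sed -n 's/^Current Status:[[:space:]]*//p'
}

# Hypothetical usage on pal, for a 'pid' file holding a single job:
#   glite-wms-job-status -i pid | job_states
```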
Input/Output in GRID
The simple test job described above uses the "sandbox" mechanism for input/output. Although convenient, it may only be used for relatively small data volumes (up to few MB, usually the standard output and error logs). For larger data transfers Grid-specific storage elements (SE) have to be used.
Using the JDL sandbox
The 'env.jdl' we used for test contains the following lines:
InputSandbox = {"env.sh"};
OutputSandbox = {"out","err"};
The InputSandbox statement tells the system to copy "env.sh" from our local system to the remote batch node before execution (the Executable statement indicates exactly which file has to be run). out and err are the file names used here for redirecting the standard output and error, respectively. The OutputSandbox statement tells the system that these files are to be copied back (upon request, i.e. NOT automatically, as opposed to PBS). All output which is not in the sandbox and not saved by any other means (usually on the SE) is deleted upon job completion.
To retrieve the output sandbox contents, run
glite-wms-job-output --dir ./output -i pid
when the job completes. The './output' directory contents will be overwritten, so take care... If the 'pid' file contains more than one job, they will be listed and a choice offered. Only sandboxes of jobs with status "Done." can be retrieved.
Using the Storage Element (SE)
The Storage Element (SE) is a dedicated (virtual) space in the Grid infrastructure. It can be accessed from the pal host for initial data storage (executables, libraries, kumacs etc.) needed for the job. Similarly, the job script may use it for output storage, which can then be retrieved to pal.
First, define the following environment variable:
export LFC_HOST=`lcg-infosites --vo hermes lfc`
The dedicated space for hermes in SE is called /grid/hermes . To list its contents, run:
lfc-ls -l /grid/hermes
Similarly, to create a subdirectory in that space, use
lfc-mkdir /grid/hermes/nameit/
Copying files onto SE is a creative process, hence you use:
lcg-cr -d dcache-se-desy.desy.de -l lfn:/grid/hermes/nameit/testfile file:/somedir/testfile
(remember that the testfile has to be located on pal where our NFS disks (/user??, /group?? etc) are not accessible, therefore take care to scp it first to pal). For deleting, use:
lcg-del -d dcache-se-desy.desy.de -l lfn:/grid/hermes/nameit/testfile
To access the SE from your script don't forget to define the LFC_HOST variable, and use
lcg-cp -v lfn:/grid/hermes/nameit/testfile ./testfile
Using GRID for MC productions (NEW!!!)
| This page is not yet ready for use or review. It is assigned to Eduard. The page isn't being edited right this minute, so feel free to add any information you have about this subject. |
| Important note: This page has not yet been reviewed by an expert! Use the information below with caution.
Other pages that have not been reviewed yet can be found in the category FORREVIEW. |
Here is a step-by-step guide to GRID-based MC productions. It assumes that you already have a valid GRID certificate, registered at HERMES VO, and are roughly familiar with general flow of MC productions.
Storage
Local (final)
As GRID is normally used for large productions, the first thing to check is the available storage at the location where your files will be finally stored. This can either be
- /mcdata?? (HERMES PCfarm NFS-based fileservers, quick and easy access, but limited storage, run df -h /mcdata?? to find out which of the discs has sufficient free space)
- /acs/mc? (tapes, virtually unlimited, yet slow and less convenient access)
- /acs/scratch (dCache disc-only pools, total space ~22TB, as fast as NFS, access via dcap-preload library)
General recommendations apply, e.g. coordinate the location usage with people responsible for storage and the management. Then, create corresponding directory structures, indicating relevant information in the name, like: RESULTS_pythia_pol_htc_tmc_2004_p_posi .
Remote (GRID)
From any grid-enabled host (e.g. pal cluster), run
$ lcg-infosites --vo hermes se
The resulting numbers tell the corresponding space available in all organizations supporting HERMES VO (currently DESY HH and DESY Zeuthen)
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
29031557141      9768442859      n.a   globe-door.ifh.de
3459305720       3376341868      n.a   dcache-se-desy.desy.de
(Note that Zeuthen doesn't report the space reserved for HERMES but the total, which is somewhat useless. Please use DESY HH SE for now).
If necessary, clean up older productions to free space (like in the example above, where half of the available space is used by an old production). Use a script like below on pal:
#!/bin/bash
PRODIR=pythia_pol_05p
for fu in `lfc-ls /grid/hermes/dich/$PRODIR/` ; do
echo -n $fu " " ; date
lcg-del -a lfn:/grid/hermes/dich/$PRODIR/$fu
if [ $? -ne 0 ] ; then # something went wrong
echo Something went wrong with $fu
fi
done
Computing resources
From any grid-enabled host (e.g. pal cluster), run
$ lcg-infosites --vo hermes ce
The resulting numbers tell the corresponding CPU cores available at the moment:
#CPU  Free  Total Jobs  Running  Waiting  ComputingElement
----------------------------------------------------------
4076  0     661         566      95       grid-cr5.desy.de:8443/cream-pbs-desy
784   0     0           0        0        lcg-ce0.ifh.de:2119/jobmanager-lcgpbs-hermes
4076  822   661         566      95       grid-ce5.desy.de:2119/jobmanager-lcgpbs-desy
4076  823   661         560      101      grid-ce4.desy.de:2119/jobmanager-lcgpbs-desy
The relevant numbers are those listed with grid-ce5. Visually enriched information is also available via the GRID Monitor. The availability of free CPU cores should usually not affect your planning of MC job submission, except in cases where the GRID is completely overloaded with queued jobs. In that case there is a danger that your proxy may expire before the jobs finish running. Take care to request the maximum lifetime for your proxy right before submitting the jobs, and split them into smaller chunks (e.g. submit 200 jobs per day and renew your proxy every time before submission).
Necessary Software
GRID batch nodes are "vanilla" installations of Scientific Linux 5 (currently). This means there is NO additional software package installed on these nodes, like CERN, Adamo, HERMES, ROOT etc. It is your responsibility to make sure all necessary packages are present before your job starts running. Software packages that are needed often and by many users can be pre-packaged by the HERMES VO GRID software manager account. Currently there is only the HERMES software tree r25, located in $VO_HERMES_SW_DIR on DESY GRID nodes. Since theoretically your jobs may be running anywhere in the world, it is always a good idea to check if the software is available in $VO_HERMES_SW_DIR, and otherwise unpack a preinstalled tarball from lfn:/grid/hermes/hermes_r25.tgz. A similar tarball of the r24 software tree is available as well. A recommended way of preparing the software is as follows (assuming a sh/bash type script):
HERMESROOT=${VO_HERMES_SW_DIR:=/opt/vo/hermes}/r25
LFC_HOST=grid-lfc.desy.de
export LFC_HOST
echo First check if HERMES software is there:
if [ ! -e $HERMESROOT/bin/hrc ] ; then
echo HERMES software directory not available at $HERMESROOT
echo Changing HERMESROOT from $HERMESROOT to $HOME/hermes/r25
HERMESROOT=$HOME/hermes/r25
export HERMESROOT
mkdir -vp $HERMESROOT
echo Creating a custom local copy
echo Copying the HERMES r25 release to $HERMESROOT
lcg-cp lfn:/grid/hermes/puthermes.sh.tgz $HERMESROOT/hermes_r25.tgz
cd $HERMESROOT
tar zxf hermes_r25.tgz
echo Done.
else
echo HERMES software already available at $HERMESROOT
fi
Other input files
Normally, any HERMES MC production needs a bunch of input files (geometry, magnetic field maps, RICH background etc.) to properly generate data. Often this input is quite large, therefore using the JDL Sandbox is not recommended. It is best to collect all necessary files into a tarball and copy it to the SE, e.g. as /grid/hermes/dich/disng/disng.tgz. As with the HERMES software, you should take care to unpack this archive in your working directory before starting the production. You will probably also need to adjust your generator batch script before actually starting, since the file locations will be different from the ones that are normally used on the PCfarm (e.g. /user.., /group.., /mcdata.. are not accessible on GRID).
Controlling the output
This chapter is currently overcomplicated since the HERMES PCfarm is based on SLD3 software, which does not support GRID middleware. Other DESY nodes, like pal, where GRID software can be run, have neither the HERMES NFS discs /user.., /group.. and /mcdata.. nor the tape storage /acs mounted. Therefore some tricks are necessary to store the generated output on the GRID SE first, and then transfer it to either the tape or the disc storage of HERMES.
The MC output can be logically split into two parts: the large uDST output and multiple smaller logfiles. From a logistical point of view, it is convenient to join the logfiles into a single tar archive before shuffling them between nodes. Hence, every run produced will be stored as two files: the large smdst file and the tar file with all logs and kumacs used for generation. Here's an example of saving the output to SE:
echo "Now we write things to the output directory"
lfc-mkdir -p /grid/hermes/dich/${GRIDOUTDIR}/
lcg-cr -d dcache-se-desy.desy.de \
file:./$smdst_file \
-l lfn:/grid/hermes/dich/${GRIDOUTDIR}/${PBS_JOBID}_${smdst_file}
if [ $? -ne 0 ] ; then
echo Couldnt create /grid/hermes/dich/${GRIDOUTDIR}/${PBS_JOBID}_${smdst_file}
fi
The transfers to and from the SE can be decoupled. Currently there is no intelligent daemonized solution that detects when a run has been copied to the SE and can be transferred to the HERMES storage. In the future, when the PCfarm is upgraded to SLD5, these processes can be controlled in a more elegant and automated way.
In short, to transfer a file from GRID SE to HERMES PCfarm, the following actions are needed:
- Get the location of the file with run number num in SE:
smdst=`lfc-ls /grid/hermes/dich/$PRODIR/ | egrep "$num\.smdst\.gz"`
- Convert its SRM path to GSIDCAP:
fsrm=`lcg-lr lfn:/grid/hermes/dich/$PRODIR/$smdst | sed sZsrm://dcache-se-desy.desy.deZgsidcap://dcache-desy-gsidcap.desy.de:22128/Zg`
- Afterwards, dccp would be able to transfer the file directly from SE to either HERMES NFS fileservers, or to tapes, depending on your choice, for example:
dccp $fsrm dcap://dcachedoor2.desy.de:22129/pnfs/desy.de/hermes/mc3/PYTHIA6_NOV09/RESULTS_pythia_2006_p_posi/${smdst##*.de_}
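The two string manipulations used in these steps can be tried out in isolation, without any Grid tools. The sketch below wraps them in two helper functions whose names (srm_to_gsidcap, strip_jobid) are made up for illustration; the sample file names and the batch host name in the usage comment are hypothetical as well.

```shell
# Rewrite an SRM replica URL so that it points to the GSIDCAP door,
# mirroring the sed substitution used above (here with '#' as delimiter).
srm_to_gsidcap() {
    echo "$1" | sed 's#srm://dcache-se-desy.desy.de#gsidcap://dcache-desy-gsidcap.desy.de:22128/#'
}

# Drop the job-id prefix from a file name: ${var##*.de_} removes the longest
# leading match of "*.de_", i.e. everything up to and including the last ".de_".
strip_jobid() {
    echo "${1##*.de_}"
}

# Hypothetical example values:
#   strip_jobid "1234567.grid-batch.desy.de_run1000.smdst.gz"
#   srm_to_gsidcap "srm://dcache-se-desy.desy.de/pnfs/desy.de/hermes/run1000.smdst.gz"
```

Because the job id was prepended to the file name when the output was written to the SE, stripping it again restores the plain run file name on the HERMES side.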
Running the jobs
Since the driving kumacs (gmc/hmclogon) are tailored individually for every production, it makes sense to use the JDL Sandbox for these purposes. The JDL files should be composed according to the format described above, with InputSandbox containing comma-separated list of (small) input kumacs needed for the production. A real example for a single run JDL file is given below:
VirtualOrganisation = "hermes";
Executable = "batch_pythia_photo_05p";
Arguments = " ";
Environment = {"RUN=1000","RUNS=1000","GEN_EV=100000"};
NodeName = "runs1000-1000";
Rank = other.GlueCEStateFreeCPUs;
Requirements = ( other.GlueCEUniqueID == "grid-ce5.desy.de:2119/jobmanager-lcgpbs-desy" ) ;
StdOutput = "out";
StdError = "out";
InputSandbox = {"batch_pythia_photo_05p","pythia6_05_acc_photo_p_ELEC.kumac","backgroundMC_pythiaPHOTO.txt"};
OutputSandbox = {"out"};
Here, the environment variables are introduced for convenience, since the batch script will generate MC data accordingly. In general, any new MC production should first get tested with a very small number of events and a single submitted job to make sure all the necessary inputs are available in the GRID environment. Later on, if everything succeeds (never happens from the first attempt :) one may generate sets of JDLs, which would differ only by the run number variable, and submit the full set as a collection. Assuming such a JDL set located in a directory pythia_runs_1000-2000, the submission command would then be:
glite-wms-job-submit -a -o pythia_runs_1000-2000.pid --collection pythia_runs_1000-2000
Again, it is best to try this sequence out with a small number of files and a tiny number of events per run, simply to check that everything works as expected. Note that the submission process itself is quite time consuming: between the moment you issue glite-wms-job-submit and the moment the MC scripts actually start running on the GRID batch nodes there may be a gap of 5 to 30 minutes, depending on the load on the GRID batch system (even if free nodes are available, the batch system has to handle O(1000) jobs with their I/O, so it is pretty slow). Once submitted, the job status can be monitored via the glite-wms-job-status command.
From a practical point of view, for large productions it is best to have one job per run, with a reasonably large number of events per run (usually 50-200K, depending on the generator and settings). This is dictated by the limited proxy lifetime and by the fact that some HERMES MC jobs crash or hang for no known reason, blocking the job output.
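A set of JDL files with one job per run, as recommended here, could be generated with a small shell function. This is only a sketch based on the single-run example above: the function name make_jdl_set and the file naming scheme run_N.jdl are assumptions, and RUNS is simply set equal to RUN as in the example, so adjust it if your batch script interprets it differently.

```shell
# make_jdl_set FIRST LAST OUTDIR [GEN_EV]
# Write one JDL file per run number into OUTDIR (hypothetical helper).
# The JDL body is copied from the single-run example; only the run
# number and the number of events are filled in.
make_jdl_set() {
    first=$1
    last=$2
    outdir=$3
    gen_ev=${4:-100000}
    mkdir -p "$outdir" || return 1
    run=$first
    while [ "$run" -le "$last" ]; do
        cat > "$outdir/run_$run.jdl" <<EOF
VirtualOrganisation = "hermes";
Executable = "batch_pythia_photo_05p";
Arguments = " ";
Environment = {"RUN=$run","RUNS=$run","GEN_EV=$gen_ev"};
NodeName = "runs$run-$run";
Rank = other.GlueCEStateFreeCPUs;
Requirements = ( other.GlueCEUniqueID == "grid-ce5.desy.de:2119/jobmanager-lcgpbs-desy" ) ;
StdOutput = "out";
StdError = "out";
InputSandbox = {"batch_pythia_photo_05p","pythia6_05_acc_photo_p_ELEC.kumac","backgroundMC_pythiaPHOTO.txt"};
OutputSandbox = {"out"};
EOF
        run=$((run + 1))
    done
}
```

Calling make_jdl_set 1000 1002 pythia_runs_test 100 would create three small test JDLs, which is a reasonable size for a first trial before generating the full set.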
Backing up to tape
How can I archive/back up my files on the DESY tape robot?
Before you continue, keep in mind the following rule:
DO NOT archive small files (less than 10-20 MB), since this is extremely inefficient and wastes tape space; before archiving, group files together with tar as explained below.
On our HERMES machines, the DESY tape robot appears as the /acs directory. Each user can make their own directory under /acs/user with the command:
mkdir /acs/user/name
where name is generally your login name (for example otto).
Archiving and restoring files is done with the command dccp (Disc Cache copy), which is the analogue of the common cp command.
Warning: dccp does not have the options of cp; the dccp man page is available here.
The simplest syntax of dccp is:
dccp source destination
Suppose you want to archive on tape the file /user02/otto/pinco.pallino. You already created the directory /acs/user/otto. The command
dccp /user02/otto/pinco.pallino /acs/user/otto/
does the job: the pinco.pallino file is saved on tape under the same name, pinco.pallino (add the option -d 2 to get verbose output about what is going on during the archiving, or if something doesn't work as expected).
ls -l /acs/user/otto
shows the content of your archive (at present only pinco.pallino will be listed).
To retrieve/restore the pinco.pallino file from the robot, just do the opposite:
dccp /acs/user/otto/pinco.pallino pinco.pallino
This command retrieves pinco.pallino from tape into the file pinco.pallino in your current directory.
You can create subdirectories in /acs/user/otto as in a normal file system. You can even delete files with the rm command, but this only deletes the link ("file stub") to the real file on the robot. Note that you cannot overwrite a file stub; you have to remove it from your acs directory first.
If you have to archive a directory (and its content), it is better to tar and compress it first:
tar cvfz filename.tgz directory
This produces the file filename.tgz, which you can easily archive as outlined before.
When you retrieve the archived file, you can uncompress and untar it with:
tar xvfz filename.tgz
The original directory and its content are re-created in the current directory.
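Since a file stub on tape cannot simply be overwritten, it may be worth verifying the tar archive before copying it to /acs. The following is a minimal sketch: the function name pack_dir is made up, and the 10 MB warning threshold is taken from the rule stated at the top of this section.

```shell
# pack_dir DIR ARCHIVE
# Tar and gzip DIR into ARCHIVE, verify that the archive is readable,
# and warn if it is smaller than 10 MB (hypothetical helper).
pack_dir() {
    dir=$1
    archive=$2
    tar -czf "$archive" "$dir" || return 1
    # Listing the archive forces a full read, which catches truncated files.
    tar -tzf "$archive" > /dev/null || return 1
    size_kb=$(du -k "$archive" | cut -f1)
    if [ "$size_kb" -lt 10240 ]; then
        echo "warning: $archive is only $size_kb kB; consider grouping more files" >&2
    fi
    # Print the archive name so the caller can pass it on to dccp.
    echo "$archive"
}
```

A possible use would be pack_dir directory filename.tgz, followed by dccp filename.tgz /acs/user/otto/ once the check has passed.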
Copying multiple files directly (with wildcards like * or ?) is not possible via dccp, e.g.:
dccp *.hbook /acs/user/otto
will fail. To avoid copying large numbers of files by hand, either use tar archives or a simple shell loop such as (USE WITH CARE):
for file in *.hbook ; do dccp "$file" /acs/user/otto/ ; done
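A slightly more defensive variant of such a loop is sketched below. It skips files whose stub already exists in the destination (stubs cannot be overwritten) and stops at the first failed copy. The function name archive_files and the DCCP variable, which lets you substitute another copy command for a dry run, are assumptions made for this example.

```shell
# archive_files DEST FILE...
# Copy each FILE into the directory DEST, one dccp call per file
# (hypothetical helper). Set DCCP=cp to try it out on local directories.
archive_files() {
    dest=$1
    shift
    for file in "$@"; do
        target="$dest/$(basename "$file")"
        if [ -e "$target" ]; then
            # Existing stubs cannot be overwritten, so leave them alone.
            echo "skip: $target already exists" >&2
            continue
        fi
        "${DCCP:-dccp}" "$file" "$target" || {
            echo "failed: $file" >&2
            return 1
        }
    done
}
```

It would be called as archive_files /acs/user/otto *.hbook; the shell expands the wildcard before the function sees it, so dccp itself only ever receives single file names.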
Direct access to tapes
If your analysis code needs to read in data from tapes you have the following options:
- either use dccp to copy the files from tapes to your local directory and then analyze, or
- if the total amount of data is too large, you may try to read the data in directly using the dcap preload library.
The usage of the latter is rather trivial: set the environment variable with
LD_PRELOAD=/afs/desy.de/group/hermes/dcap/lib/libpdcap.so
export LD_PRELOAD
either on the command line or in your batch script, and all the file access calls ('open', 'read', 'seek', 'close' etc.) will be replaced with their dcache analogs, which automatically fall back to the default system calls for regular local or NFS files. A tape-located file will be copied to a dcache pool and then fed to your application, as if it were local or NFS-mounted.
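If you would rather not preload the library into every command of your session, the variable can also be set for a single command only. The wrapper below is a sketch of that idea: the function name with_dcap and the DCAP_LIB variable are made up for this example, and the default library path is the one quoted above.

```shell
# Library location as quoted in the text; override DCAP_LIB if it differs.
DCAP_LIB=${DCAP_LIB:-/afs/desy.de/group/hermes/dcap/lib/libpdcap.so}

# with_dcap COMMAND [ARGS...]
# Run COMMAND with the dcap preload library active for that process only,
# leaving the environment of the calling shell untouched
# (hypothetical helper).
with_dcap() {
    env LD_PRELOAD="$DCAP_LIB" "$@"
}
```

For example, with_dcap ./your_analysis_code /acs/user/otto/pinco.pallino would run only the analysis code with the preload library, while later commands in the same shell use the normal system calls.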
Access to dCache without mounting
Using the dcap preload library (as described above), one may gain (read-only) access to files stored in dCache (or on tape) without needing a system-wide mount point like /pnfs/desy.de/hermes or /acs, by using URL-style access to dCache instead:
# Set up the preload library
LD_PRELOAD=/afs/desy.de/group/hermes/dcap/lib/libpdcap.so
export LD_PRELOAD
# Get a directory listing
ls dcap://dcache-door-hera03.desy.de:22125//pnfs/desy.de/usr/hermes/
# Copy a file from dCache to the local directory
cp dcap://dcache-door-hera03.desy.de:22125//pnfs/desy.de/usr/hermes/udst/06f1/smlinks/run10000.smdst.gz .
# Run your analysis code directly on a uDST file in dCache
./your_analysis_code dcap://dcache-door-hera03.desy.de:22125//pnfs/desy.de/usr/hermes/udst/06f1/smlinks/run10000.smdst.gz
If your analysis code uses a runlist of the usual kind, e.g. one containing a list of uDST files from the /production directory, like:
/production/udst/06f1/smlinks/run10000.smdst.gz
/production/udst/06f1/smlinks/run10001.smdst.gz
/production/udst/06f1/smlinks/run10002.smdst.gz
...
you can easily convert it to an alternative version that uses either the dCache mount point or the dCache URL via sed:
sed -e sZ/productionZ/pnfs/desy.de/hermes/scratchZg runlist.dat > runlist_dcache.dat
sed -e sZ/productionZdcap://dcache-door-hera03.desy.de:22125//pnfs/desy.de/usr/hermes/scratchZg runlist.dat > runlist_dcache_url.dat
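The same conversion can be wrapped in a small filter that takes the new prefix as an argument. This is a sketch only: the function name runlist_convert is made up, and unlike the commands above it anchors the match at the start of each line, so that a /production string appearing later in a path is left alone.

```shell
# runlist_convert NEW_PREFIX < runlist.dat > converted.dat
# Replace a leading /production on every input line with NEW_PREFIX
# (hypothetical helper).
runlist_convert() {
    # '|' is used as the sed delimiter, so NEW_PREFIX must not
    # contain '|', '&' or backslashes.
    sed -e "s|^/production|$1|"
}
```

For example, runlist_convert /pnfs/desy.de/hermes/scratch < runlist.dat > runlist_dcache.dat should give the same result as the first sed command for runlists in which every line starts with /production.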