PCFarm Info

From HERMESwiki

Page maintainer: Eduard

This page is considered done. It has been reviewed by Alexander. There may be missing elements, but they are all flagged and the text has no errors.


Disclaimer: Most of the relevant information (slightly outdated, since the setup has been upgraded several times since the original launch in 2002) concerning the PC Farm can be found here. See also Bootcamp for some Unix basics, and UNIX Tricks for some clever ideas.


AFS tips and tricks (home directories)

Quota management

Home directories at DESY are managed through a centralized AFS system (for the HERMES /user disk quota, check here). To check your AFS home quota, use the following command:

 fs lq ~ 

You'll get something like:

Volume Name                   Quota      Used %Used   Partition
user.yourname                130080     97707   92%<<       37%    <<WARNING

The numbers are in kBytes. If you run out of quota (i.e. "Used>Quota"), you will have trouble logging in to KDE and using applications like pine, gv, firefox etc. If that's the case, try logging in in text mode (Ctrl-Alt-F1 from a Linux desktop) and cleaning up. A good candidate to blame for filling up your quota is the firefox cache:

rm ~/.mozilla/firefox/*/Cache/*

If that doesn't help, search for large files/directories for a further cleanup (may take a while):

cd ~
du -ks * . | sort -n

Ignore the .OldFiles directory; that's your daily backup.

If you REALLY need more space, send an email to one of the HERMES AFS administrators.
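
To keep an eye on the quota from a login script, the %Used column of the fs lq output could be extracted as sketched below. The function names are hypothetical, and the sketch assumes the two-line output layout shown in the example above (header line, then one volume line).

```shell
# Print the %Used column (digits only) from `fs lq` output read on stdin.
# Assumes the layout shown above: a header line followed by one volume line.
quota_used_percent() {
    awk 'NR == 2 { pct = $4; gsub(/[^0-9]/, "", pct); print pct }'
}

# Succeed if the usage is at or above the threshold in percent (default: 90).
quota_is_critical() {
    pct=$(quota_used_percent)
    [ -n "$pct" ] && [ "$pct" -ge "${1:-90}" ]
}
```

A possible use would be: fs lq ~ | quota_is_critical 90 && echo "AFS quota almost full, time to clean up"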

File and Directory Permissions (aka Access Control Lists - ACLs)

AFS ignores the usual UNIX file permissions. Instead, it uses a more sophisticated mechanism, Access Control Lists (ACLs). Unlike the UNIX file permissions, ACLs apply to directories rather than individual files. Try the following command in your home directory:

fs la
Access list for . is
Normal rights:
  system:anyuser l
  yourname rlidwka

The letters rlidwka have the following meanings:

  l  lookup      Basically can do ls in that directory, without having the file size and dates
  r  read        Read the contents of the file
  i  insert      Create new files/directories
  w  write       Write into files
  d  delete      Delete files/directories
  k  locK        Lock Files
  a  administer  Modify the ACLs

In the example given above (the default setting for a newly created home directory), the owner of the directory is authorized to do everything, and the system:anyuser (anybody else) can only list your files, without seeing their content.

Special Directories and Modifying ACLs

Every new home directory comes with a bunch of automatically generated directories. Some have predefined ACLs, for example:

fs la www
Access list for www is
Normal rights:
  usg rl
  system:administrators rlidwk
  system:anyuser rl
  yourname rlidwka

Here, the important lines are the "usg rl" and "system:anyuser rl", which allow anyone in the world to access (read) the files in the www directory. This is actually a special directory, which can be accessed via http://www.desy.de/~yourname . If you don't have such a webpage (and would like to have one), you can create the directory and modify the permissions accordingly:

mkdir ~/www
fs sa -dir ~/www -acl system:anyuser rl 

Similarly, the ~/hermes directory has the following permissions by default:

fs la ~/hermes
Access list for hermes is
Normal rights:
  usg:hermes rl
  system:administrators rl
  yourname rlidwka

And the important line here is "usg:hermes rl" which means everybody at HERMES can read the contents of files which you put in this directory (good for storing presentations, papers etc).

If you are drafting a paper and would like to allow write access to a restricted list of people in the given directory, here's what you wanna do:

mkdir -p ~/papers/g1draft
fs sa -dir ~/papers/g1draft -acl peter rlidw
fs sa -dir ~/papers/g1draft -acl john rlidw
fs sa -dir ~/papers/g1draft -acl james rlidw

Where peter, john, james are the apostles login names of the drafters you want to give full read/write/create/delete access to that directory.
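
For a longer list of drafters, the repeated fs sa calls can be wrapped in a loop. This is only a sketch: grant_rights and the FS_CMD variable are hypothetical names, and FS_CMD exists only so that the commands can be previewed (FS_CMD=echo) before they are really executed.

```shell
# Hypothetical helper: give the same rights to several users on one directory.
# Usage: grant_rights DIRECTORY RIGHTS USER...
# Set FS_CMD=echo to print the commands instead of running `fs`.
grant_rights() {
    dir=$1
    rights=$2
    shift 2
    for user in "$@"; do
        ${FS_CMD:-fs} sa -dir "$dir" -acl "$user" "$rights"
    done
}
```

With this, the example above becomes: grant_rights ~/papers/g1draft rlidw peter john james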

Another example - if you have a directory with the right permissions, and you just want to give the same permissions to another directory, the easy solution is:

fs ca -fromdir ~/papers/olddraft/ -todir ~/papers/newdraft

For other AFS commands see here

Backup policy

Every AFS directory is backed up regularly. For home directories this is done every night, so if you accidentally delete a file, you can immediately restore it:

rm ~/.profile
cp ~/.OldFiles/.profile ~/

For restoring files which were rewritten several times AFS@DESY offers the beautiful Tivoli utilities to handle the incremental backup system. For example, to restore your ~/.root_hist from half a year ago, you do the following:

dsmc_afs restore -inactive -pick ~/.root_hist

After giving your AFS password (and some waiting), you will get a list of backup versions for that file:

TSM Scrollable PICK Window - Restore

     #    Backup Date/Time        File Size A/I  File
        -----------------------------------------------------------------------
     1. _ 2008.04.26 13:33:21      15,30 KB  A   /afs/desy.de/user/d/dich/.ro
     2. _ 2008.04.25 23:55:03      17,19 KB  I   /afs/desy.de/user/d/dich/.ro
     3. _ 2008.04.24 14:09:51      15,99 KB  I   /afs/desy.de/user/d/dich/.ro
     4. _ 2008.04.21 21:33:03      14,49 KB  I   /afs/desy.de/user/d/dich/.ro
     5. _ 2008.04.17 14:28:58      13,14 KB  I   /afs/desy.de/user/d/dich/.ro
     6. _ 2008.04.15 21:29:13      13,87 KB  I   /afs/desy.de/user/d/dich/.ro
     7. _ 2008.04.13 13:31:45      13,26 KB  I   /afs/desy.de/user/d/dich/.ro
     8. _ 2008.04.12 13:32:43      13,25 KB  I   /afs/desy.de/user/d/dich/.ro
     9. _ 2008.04.04 13:35:45      11,99 KB  I   /afs/desy.de/user/d/dich/.ro
    10. _ 2008.04.03 14:07:35      11,78 KB  I   /afs/desy.de/user/d/dich/.ro
    11. _ 2008.04.02 13:32:38       7,89 KB  I   /afs/desy.de/user/d/dich/.ro
    12. _ 2008.03.26 13:46:11       7,81 KB  I   /afs/desy.de/user/d/dich/.ro
    13. _ 2008.03.08 00:01:23       7,96 KB  I   /afs/desy.de/user/d/dich/.ro
    14. _ 2008.03.05 15:19:22       7,57 KB  I   /afs/desy.de/user/d/dich/.ro
        0---------10--------20--------30--------40--------50--------60--------7
<U>=Up  <D>=Down  <T>=Top  <B>=Bottom  <R#>=Right  <L#>=Left
<G#>=Goto Line #  <#>=Toggle Entry  <+>=Select All  <->=Deselect All
<#:#+>=Select A Range <#:#->=Deselect A Range  <O>=Ok  <C>=Cancel

where you can select the version(s) you think should be restored by entering the # number (or a range) and pressing Enter. Once you're satisfied with the selection, press O<Enter>. The selected file will be restored to its former location (you will be given the choice to overwrite existing files).


ATTENTION: only AFS files that are readable by system:administrators are stored in the TSM backup node AFSFILE. AFS setacl example:

fs setacl -dir private -acl system:administrators rlidwka

Alternatively, use the GUI client dsm_afs. There, choose the "Restore" option, then select the View->Display active/inactive files option, then go to File Level->afs/desy.de/user/s by clicking on the + sign (instead of "s" take the first letter of your login name). Select your home directory by clicking on the little grey square to the right of the + sign. The contents of the directory will be displayed. Select the file(s) you'd like to restore (grey buttons to the left of the name) and press the "Restore" button. You'll have the choice of restoring the files to their original location OR choosing a new location. In addition, a search function is available.


More information available here.

Interactive nodes

This page is considered done. It has been reviewed by Alexander. There may be missing elements, but they are all flagged and the text has no errors.

Two workgroup servers (worf and geordi) are available for interactive login via ssh.

The interactive nodes are equipped with ~300GB of /scratch storage (4 days lifetime for files, cleanup without further notice), and have the NFS-based disks as well as tape storage mounted under /user?? and /acs/, respectively.

/user?? disk quota

The NFS disks /user0[1..7] have volumes from ~350-800GB each. Every user who needs to store files gets a directory created (upon request to pcfarmnfs@hermes.desy.de), with a symbolic link in /user/yourname . The default quota is set to 3GB, and can be expanded upon need. Additional dedicated space is available for different analysis groups in /group0[1..3] and for MC development and productions in /mcdata0[1..9].

/user?? disk backup policy

Though the /user??, /group?? and /mcdata?? disks are officially NOT backed up, there is an effort to save at least some crucial stuff. The backup is performed on a nightly basis and includes only ASCII (source, header, script, macro, ...) files, not exceeding a size threshold of 1MB. Binary files like n-tuples are NOT backed up. The last night backup can be restored directly on the interactive nodes by copying the files back from respective directory in /backup0[1..2]. For more sophisticated backups you'll need to contact PC Farm administrators.
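
To get an idea of which files fall outside this nightly backup because of their size, something like the following can be run in your /user area. It is a sketch only: the function name is hypothetical, 1MB is taken as 1024 kB, the k suffix of -size is assumed to be supported by the installed find, and the ASCII/binary distinction made by the real backup is not checked here.

```shell
# List regular files larger than 1024 kB below a directory (default: current one).
# Such files would exceed the 1MB size threshold of the nightly backup.
too_large_for_backup() {
    find "${1:-.}" -type f -size +1024k
}
```

For example: too_large_for_backup /user/yourname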

Interactive job policy

As the name suggests, the interactive nodes are installed to run interactive jobs (email, browsing, editing, compiling, testing, doing interactive analysis etc., just to name a few). What they are NOT designed for is running CPU/memory/disk-intensive tasks. To protect the users of the PCFarm from such unauthorized (sometimes unintentional, e.g. due to a bug in test code) abuse, a watchdog program runs on the interactive nodes and kills every job that takes more than 90% of the CPU for longer than 10 minutes. Every job that needs intensive computing resources should be submitted to the batch system instead.

Batch system (PBS)

Our batch system is based on OpenPBS V2.3, with the server running on kirk. The system is arranged such that every user gets a fair share of resources, which are then optimally utilized. It is based on queues with different limits. For a job to be executed, a shell script has to be prepared and submitted to the batch system.

Submitting, Monitoring and Killing jobs

To submit a job to the batch system, use the qsub command (man qsub will give all options):

qsub -q M myscript.sh

To check the status of your job, run qstat:

qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
992643.kirk      myscript.sh      yourname         02:51:44 R M

The "Job id" is a unique number for this particular job, which can be used later to get detailed information or to stop the execution of the job. The output of qstat also displays the queue the job was submitted to, the status of the job (under S; can be (Q)ueued, (R)unning or (E)nding), and the CPU time already used.

Before you submit, try to figure out how much time your job will need to complete (e.g., if you plan to analyze 30000 runs, try your code first with 100 and scale). Depending on the need, choose a corresponding queue. The above example uses the M queue, which reserves 8 hours of CPU time for your job. If the job has not finished by then, it will be killed automatically and a notification email sent to your DESY account.
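
The scaling and the queue choice can be sketched in a few lines of shell. The function names are hypothetical, and the limits are the ones listed under "Queue Length" on this page; check them with qmgr before relying on them, and leave yourself some safety margin.

```shell
# Scale the CPU time of a small test to the full job.
# Usage: estimate_seconds TEST_SECONDS TEST_RUNS TOTAL_RUNS
estimate_seconds() {
    echo $(( $1 * $3 / $2 ))
}

# Print the shortest queue whose limit (in seconds) covers the estimate.
pick_queue() {
    secs=$1
    if   [ "$secs" -le 600 ];    then echo S      # 10 minutes
    elif [ "$secs" -le 28800 ];  then echo M      # 8 hours
    elif [ "$secs" -le 86400 ];  then echo L      # 24 hours
    elif [ "$secs" -le 259200 ]; then echo XL     # 72 hours
    elif [ "$secs" -le 864000 ]; then echo XXL    # 240 hours
    else
        echo "no queue is long enough, split the job" >&2
        return 1
    fi
}
```

If, say, a test with 100 runs needed 120 seconds of CPU time, then estimate_seconds 120 100 30000 gives 36000 seconds (10 hours), and pick_queue 36000 suggests the L queue.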

If you realize the code does something wrong (e.g. you expected it to complete in 10 minutes but it is still running after 2 hours, or you found a bug in the code AFTER submitting it, or the intermediate output looks suspicious), you can explicitly tell the batch system to terminate the execution by issuing:

qdel 997725

where 997725 is your job id.

Useful tips

Sometimes it's useful to learn on which batch node your job is being executed (or queued) currently. For that the -n option to qstat can be used, e.g.:

qstat -n 997725.kirk

kirk.desy.de:
                                                            Req'd  Req'd   Elap
Job ID       Username  Queue  Jobname        SessID NDS TSK Memory Time  S Time
-----------  --------  -----  -------------- ------ --- --- ------ ----- - -----
997725.kirk  marukyan  L      batch_d_dvcs.m   2083   1  --    --  24:00 R 00:59
   fhebtn34/0

In this example, it's fhebtn34.


By default, the batch system stores the standard output and standard error of your job into separate files (myscript.sh.oXXXXXX and myscript.sh.eXXXXXX, with XXXXXX being the job id). If you want these files to be merged, add the -j oe option to qsub.


If your job gets killed, you get a notification email by default. This behavior can be controlled by the -m string option, where string is a combination of n,a,b,e letters, with:

               n  no email is sent
               a  mail is sent when the job is aborted by the batch system
               b  mail is sent when the job begins execution
               e  mail is sent when the job terminates

e.g., -m ae will notify you when your job finishes, by either normal or forced termination.
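
Instead of typing these options on every qsub call, PBS also reads them from "#PBS" comment lines at the top of the job script (before the first executable command). The sketch below writes such a script; the file name myscript.sh and the echo line are just examples, and the option values are the ones discussed above.

```shell
# Write a small PBS job script. The #PBS lines are read by qsub and are
# plain comments to the shell, so the script also runs outside the batch system.
cat > myscript.sh <<'EOF'
#!/bin/sh
#PBS -q M
#PBS -j oe
#PBS -m ae

# Outside the batch system the PBS_* variables are unset, so use placeholders.
echo "job ${PBS_JOBID:-none} in queue ${PBS_QUEUE:-none} on $(uname -n)"
EOF
```

The script can then be submitted with a plain "qsub myscript.sh"; options given on the command line should still take precedence over the ones in the file.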


Queue Length

  • S 10 minutes
  • M 8 hours
  • L 24 hours
  • XL 72 hours
  • XXL 240 hours

To obtain this information from the PBS job system, use the following command:

 $ qmgr -c "list queue M"

Example of Batch Script

 $ cat my_script.com 
 
 #!/bin/sh

 cd /scratch/$PBS_JOBID
 echo `pwd`

 echo `hostname`

 echo $PBS_O_WORKDIR
 echo $PBS_JOBNAME
 echo $PBS_JOBID
 echo $PBS_QUEUE
 

This script just changes ("cd") to the working directory on a batch node and prints the current directory, the name of the host where it runs, the submission directory, the job name and identifier, and the queue name. You can use any other shell commands in a batch script, like "cp" to copy input/output files from/to your private /user?? area.

Using GRID at DESY

Important note: This page has not yet been reviewed by an expert! Use the information below with caution.

Other pages that have not been reviewed yet can be found in the category FORREVIEW.

GRID is the new worldwide project for efficient utilization of computing resources. It uses certain abstraction mechanisms for managing user input and output files, job submission, control and monitoring. In this section you'll find a very much simplified (hence incomplete, yet hopefully usable) step-by-step guide describing how to obtain a GRID certificate, submit very simple jobs, and transfer data (files) to and from GRID.


NOTE: It is important to remember that the batch nodes of the GRID system are clean Linux installations (currently Scientific Linux 5), i.e. without AFS or CERN software, let alone HERMES software. In practice this means that all the necessary software has to be compiled on some SL5 machine (the pal cluster is a good choice) and then transferred to the Grid system, using either the sandbox or the storage element (recommended).

Obtaining a GRID certificate

To use the GRID a special certificate is required. A user certificate consists of a private key with a private password and a certified public key. The private key is exclusively possessed by the user and is not known at any stage to the Registration Authority (RA) or Certification Authority (CA). Lost private/public keys or the password cannot be recovered by any means.

A certificate is valid for 1 year and must be renewed before its expiration. Users get notified by e-mail 3 weeks before this date by the CA. In Germany user certificates are issued by GridKa of FZ Karlsruhe (FZK). DESY acts as a Registration Authority (RA) for GridKa and provides a registration service.


Application process

First, download and print the application form, then fill it in and have it signed by your supervisor. The signed form has to be sent to DESY IT together with a copy of your passport/ID card.

Next, point your browser to GridKa web portal and follow the instructions, namely:

  1. Import the root certificate into your browser
  2. Click on "Your first certificate" and fill in the form
  3. Press "Send"

From now on your certification process has started; you will receive several emails with simple and detailed instructions on what to do next. Remember that your password is not recoverable!

Adding a Virtual Organization (VO)

As soon as your certificate has arrived (and has been imported into your browser), you should add yourself to a VO, in our case hermes. For that, point your browser to HERMES VOMRS @ DESY and follow the instructions. You will need to confirm your application following the instructions sent by email. Your VO administrator also has to confirm that you are allowed to run GRID jobs for HERMES.

Conversion steps

To use your certificate on the PCfarm it needs to be converted to a PEM format (originally it's a CRT file). First, export your certificate to a file using the instructions of Exporting certificates from your browser into a file usercred.p12. Place that file in your AFS ~/.globus directory. Remember to change the safety permissions on that file:

chmod go-rwx ~/.globus/usercred.p12

Log in to pal.desy.de via ssh, go to the directory ~/.globus and run

source /afs/desy.de/project/glite/UI/etc/profile.d/grid-env.sh

Note that the above command needs to be run every time you log in to pal in order to submit or monitor any jobs.
Therefore, it is best to create an init script in e.g. ~/grid/ with contents like:

source /afs/desy.de/project/glite/UI/etc/profile.d/grid-env.sh
export LFC_HOST=`lcg-infosites --vo hermes lfc`
voms-proxy-info

and call it every time after logging in, if you plan to work with grid.

Back to the exported certificate - use openssl to convert it to a PEM format:

openssl pkcs12 -nocerts -in usercred.p12 -out mykey.pem

When asked for, supply the password chosen when exporting the certificate and the PEM pass phrase (this is the same as the one for exporting the certificate). As a result, a file named mykey.pem will appear in your ~/.globus directory.

Testing your certificate

Next, again on pal, run

voms-proxy-init --debug -verify -voms hermes

(it will ask for the password as usual). If the command output ends with something like:

.................................................................................++++++
 Done
Your proxy is valid until Tue May 26 08:28:42 2009

then your certificate has been successfully used to create a Grid Proxy. This proxy (similar to an AFS token) is valid for 12 hours by default, and is used by the submitted jobs to access Grid resources. As soon as the proxy expires, your job will stop, so it is your responsibility to make sure that the jobs run for a limited time only. The proxy can be initialized with a longer lifetime than the default 12 hours (up to 96 hours) using the options "-valid 96:00 -vomslife 96:00", but in reality the DESY batch system is limited to 48 hours of CPU time and 72 hours of wall time, so such values don't make much sense.

Running

voms-proxy-info

will show the remaining lifetime for your proxy.
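
Before submitting, it can be handy to check in a script whether the proxy will outlive the job. A minimal sketch, assuming that voms-proxy-info accepts the -timeleft option and prints the remaining lifetime in seconds; proxy_covers is a hypothetical helper name.

```shell
# Succeed if the remaining proxy lifetime covers the time the job needs.
# Usage: proxy_covers NEEDED_HOURS SECONDS_LEFT
proxy_covers() {
    needed_hours=$1
    seconds_left=$2
    [ "$seconds_left" -ge $(( needed_hours * 3600 )) ]
}
```

It could be used like this: proxy_covers 8 "$(voms-proxy-info -timeleft)" || voms-proxy-init -voms hermes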

Submitting (simple) jobs to GRID

A simple batch job consists of a shell script and a Grid job definition file. Their contents and submission process are described below.

Composing the batch script

For a simple test, create an env.sh file in any directory of your choice, with the contents:

#! /bin/sh

#########################################################################
/bin/hostname -f
/bin/date
/usr/bin/id
/bin/pwd
/bin/ls -al
/bin/df .
/usr/bin/env
$GLOBUS_LOCATION/bin/grid-proxy-info -all
#########################################################################

Composing the JDL (Job Definition Language) file

To run the above script, create an env.jdl file (this is the globus "submitter" script) with the contents:

VirtualOrganisation = "hermes";
Executable    = "env.sh";
Arguments     = " ";
StdOutput     = "out";
StdError      = "err";
InputSandbox  = {"env.sh"};
OutputSandbox = {"out","err"};


Submitting the job

To submit the above test job run

glite-wms-job-submit -a -o pid env.jdl

Here, the -a option will delegate the job automatically (recommended), and -o pid will put the process (job) ID info into the file named 'pid' in the same directory. The 'pid' file may contain information on more than one job.


Monitoring the job

Check the job status with

glite-wms-job-status -i pid

The output will contain the keyword "Scheduled" for a while, then "Running", and when it returns "Done", the output of the job may be retrieved. Note that the status is being updated every several minutes, therefore our test job will still have the status "Running" for quite a while after being completed in reality.
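
For scripted monitoring, the state keyword can be cut out of the status output. The sketch below assumes that the state is printed on a line starting with "Current Status:", which is how the gLite status output usually looks; verify this against the actual output on pal. The function name is hypothetical.

```shell
# Print the job state(s) found in glite-wms-job-status output read on stdin.
# Assumes lines of the form "Current Status:     Running".
job_states() {
    sed -n 's/^Current Status:[[:space:]]*//p'
}
```

Example: glite-wms-job-status -i pid | job_states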

Input/Output in GRID

The simple test job described above uses the "sandbox" mechanism for input/output. Although convenient, it may only be used for relatively small data volumes (up to a few MB, usually the standard output and error logs). For larger data transfers Grid-specific storage elements (SE) have to be used.

Using the JDL sandbox

The 'env.jdl' we used for test contains the following lines:

InputSandbox  = {"env.sh"};
OutputSandbox = {"out","err"};

The InputSandbox statement tells the system to copy "env.sh" from our local system to the remote batch node before execution (the Executable statement indicates exactly which file is to be run). out and err are standard names for redirecting the standard output and error, respectively. The OutputSandbox statement tells the system that these files are to be copied back (upon request, i.e. NOT automatically, as opposed to PBS). All output which is not in the sandbox and not saved by any other means (usually on the SE) is deleted upon job completion.

To retrieve the output sandbox contents, run

glite-wms-job-output --dir ./output -i pid

when the job completes. The './output' directory contents will be overwritten, so take care... If the 'pid' file contains more than one job, they will be listed and a choice offered. Only sandboxes of jobs with status "Done." can be retrieved.

Using the Storage Element (SE)

The Storage Element (SE) is a dedicated (virtual) space in the Grid infrastructure. It can be accessed from the pal host for initial data storage (executables, libraries, kumacs etc.) needed for the job. Similarly, the job script may use it for output storage, which can then be retrieved to pal.

First, define the following environment variable:

export LFC_HOST=`lcg-infosites --vo hermes lfc`

The dedicated space for hermes in SE is called /grid/hermes . To list its contents, run:

lfc-ls -l /grid/hermes

Similarly, to create a subdirectory in that space, use

lfc-mkdir /grid/hermes/nameit/

Copying files onto SE is a creative process, hence you use:

lcg-cr -d dcache-se-desy.desy.de -l lfn:/grid/hermes/nameit/testfile file:/somedir/testfile

(remember that the testfile has to be located on pal where our NFS disks (/user??, /group?? etc) are not accessible, therefore take care to scp it first to pal). For deleting, use:

lcg-del -d dcache-se-desy.desy.de -l lfn:/grid/hermes/nameit/testfile

To access the SE from your script don't forget to define the LFC_HOST variable, and use

lcg-cp -v lfn:/grid/hermes/dich/nameit/testfile ./testfile

Using GRID for MC productions (NEW!!!)

This page is not yet ready for use or review. It is assigned to Eduard. The page isn't being edited right this minute, so feel free to add any information you have about this subject.
Important note: This page has not yet been reviewed by an expert! Use the information below with caution.

Other pages that have not been reviewed yet can be found in the category FORREVIEW.

Here is a step-by-step guide to GRID-based MC productions. It assumes that you already have a valid GRID certificate, are registered with the HERMES VO, and are roughly familiar with the general flow of MC productions.

Storage

Local (final)

As GRID is normally used for large productions, the first thing to check is the available storage at the location where your files will be finally stored. This can either be

  1. /mcdata?? (HERMES PCfarm NFS-based fileservers, quick and easy access, but limited storage, run df -h /mcdata?? to find out which of the discs has sufficient free space)
  2. /acs/mc? (tapes, virtually unlimited, yet slow and less convenient access)
  3. /acs/scratch (dCache disc-only pools, total space ~22TB, as fast as NFS, access via dcap-preload library)

General recommendations apply, e.g. coordinate the location usage with people responsible for storage and the management. Then, create corresponding directory structures, indicating relevant information in the name, like: RESULTS_pythia_pol_htc_tmc_2004_p_posi .

Remote (GRID)

From any grid-enabled host (e.g. pal cluster), run

$ lcg-infosites --vo hermes se

The resulting numbers show the space available in all organizations supporting the HERMES VO (currently DESY HH and DESY Zeuthen):

Avail Space(Kb) Used Space(Kb)  Type    SEs
----------------------------------------------------------
29031557141     9768442859      n.a     globe-door.ifh.de
3459305720      3376341868      n.a     dcache-se-desy.desy.de

(Note that Zeuthen doesn't report the space reserved for HERMES but the total, which is somewhat useless. Please use DESY HH SE for now).

If necessary, clean up older productions to free space (like in the example above, where half of the available space is used by an old production). Use a script like below on pal:

#!/bin/bash

PRODIR=pythia_pol_05p

for fu in `lfc-ls /grid/hermes/dich/$PRODIR/` ; do
    echo -n $fu "       " ; date
    lcg-del -a lfn:/grid/hermes/dich/$PRODIR/$fu
    if [ $? -ne 0 ] ; then # something went wrong
        echo Something went wrong with $fu
    fi
done
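
To decide whether a cleanup is due, the fraction of used space can be computed from the lcg-infosites table shown above. This sketch assumes the column layout of that table (available kB, used kB, type, SE name) and takes used/(used+available) as the fill level; the function name is hypothetical.

```shell
# Print the used space of one SE in percent, from `lcg-infosites ... se` output on stdin.
# Usage: se_used_percent SE_NAME
se_used_percent() {
    awk -v se="$1" '$NF == se { printf "%d\n", 100 * $2 / ($1 + $2) }'
}
```

Example: lcg-infosites --vo hermes se | se_used_percent dcache-se-desy.desy.de
For the numbers in the table above this gives 49, i.e. about half of the space is used.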

Computing resources

From any grid-enabled host (e.g. pal cluster), run

$ lcg-infosites --vo hermes ce

The resulting numbers tell the corresponding CPU cores available at the moment:

#CPU    Free    Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------
4076       0     661            566       95    grid-cr5.desy.de:8443/cream-pbs-desy
 784       0       0              0        0    lcg-ce0.ifh.de:2119/jobmanager-lcgpbs-hermes
4076     822     661            566       95    grid-ce5.desy.de:2119/jobmanager-lcgpbs-desy
4076     823     661            560      101    grid-ce4.desy.de:2119/jobmanager-lcgpbs-desy

The relevant numbers are those listed for grid-ce5. Another way of obtaining visually enriched information is the GRID Monitor. The availability of free CPU cores should usually not affect your planning of MC job submission, except in cases where the GRID is completely overloaded with queued jobs. In this case there is a danger that your proxy may expire before the jobs finish running. Take care to specify the maximum lifetime for your proxy right before submitting the jobs, and split them into smaller chunks (e.g. submit 200 jobs per day and update your proxy every time before submission).
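
The number of free cores for a given computing element can be picked out of the table above with a one-line filter. It assumes the column layout shown (free cores in the second column, CE name in the last one); the function name is hypothetical.

```shell
# Print the "Free" column for the CE whose name starts with the given prefix.
# Reads `lcg-infosites ... ce` output on stdin.
free_cpus() {
    awk -v ce="$1" 'index($NF, ce) == 1 { print $2 }'
}
```

Example: lcg-infosites --vo hermes ce | free_cpus grid-ce5.desy.de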

Necessary Software

GRID batch nodes are "vanilla" installations of Scientific Linux 5 (currently). This means there is NO additional software package installed on these nodes, like CERN, Adamo, HERMES, ROOT etc. It is your responsibility to make sure all necessary packages are present before your job starts running. Software packages that are needed often and by many users can be pre-packaged by the HERMES VO GRID software manager account. Currently there is only the HERMES software tree r25, located in $VO_HERMES_SW_DIR on DESY GRID nodes. Since theoretically your jobs may be running anywhere in the world, it is always a good idea to check whether the software is available in $VO_HERMES_SW_DIR, and otherwise unpack a preinstalled tarball from lfn:/grid/hermes/hermes_r25.tgz. A similar tarball of the r24 software tree is available as well. A recommended way of preparing the software is as follows (assuming a sh/bash type script):

HERMESROOT=${VO_HERMES_SW_DIR:=/opt/vo/hermes}/r25
LFC_HOST=grid-lfc.desy.de
export LFC_HOST

echo First check if HERMES software is there:
if [ ! -e $HERMESROOT/bin/hrc ] ; then 
    echo HERMES software directory not available at $HERMESROOT
    echo Changing HERMESROOT from $HERMESROOT to $HOME/hermes/r25
    HERMESROOT=$HOME/hermes/r25
    export HERMESROOT
    mkdir -vp $HERMESROOT
    
    echo Creating a custom local copy
    echo Copying the HERMES r25 release to $HERMESROOT
    lcg-cp lfn:/grid/hermes/puthermes.sh.tgz $HERMESROOT/hermes_r25.tgz
    cd $HERMESROOT
    tar zxf hermes_r25.tgz
    echo Done.
else
    echo HERMES software already available at $HERMESROOT 
fi

Other input files

Normally, any HERMES MC production needs a bunch of input files (geometry, magnetic field maps, RICH background etc.) to properly generate data. Often this input is quite large, therefore using the JDL Sandbox is not recommended. It is best to collect all necessary files into a tarball and copy it to the SE, e.g. as /grid/hermes/dich/disng/disng.tgz. As with the HERMES software, you should take care to unpack this archive in your working directory before starting the production. You will probably also need to adjust your generator batch script beforehand, since the file locations will be different from the ones normally used on the PCfarm (e.g. /user.., /group.., /mcdata.. are not accessible on GRID).

Controlling the output

This step is currently overcomplicated since the HERMES PCfarm is based on SLD3, which does not support the GRID middleware. Other DESY nodes, like pal, where GRID software can be run, have neither the HERMES NFS discs (/user.., /group.. and /mcdata..) nor the tape storage /acs mounted. Therefore some tricks are necessary to store the generated output on the GRID SE first, and then transfer it to either the tapes or the disc storage of HERMES.

The MC output can be logically split into two parts: the large uDST output, and multiple smaller logfiles. From a logistical point of view it is convenient to join the logfiles into a single tar archive before shuffling them between nodes. Hence, every run produced will be stored as two files: the large smdst file and the tar file with all logs and kumacs used for generation. Here's an example of saving the output to the SE:

echo "Now we write things to the output directory"

lfc-mkdir -p /grid/hermes/dich/${GRIDOUTDIR}/
lcg-cr -d dcache-se-desy.desy.de \
       file:./$smdst_file \
       -l lfn:/grid/hermes/dich/${GRIDOUTDIR}/${PBS_JOBID}_${smdst_file}
if [ $? -ne 0 ] ; then
   echo Couldnt create /grid/hermes/dich/${GRIDOUTDIR}/${PBS_JOBID}_${smdst_file}
fi

The transfers to and from the SE can be decoupled; currently there is no intelligent daemonized solution to detect that a run has been copied to the SE and can be transferred to the HERMES storage. In the future, when the PCfarm is upgraded to SLD5, these processes can be controlled in a more elegant and automated way.

In short, to transfer a file from GRID SE to HERMES PCfarm, the following actions are needed:

  • Get the location of the file with run number num in SE:
smdst=`lfc-ls /grid/hermes/dich/$PRODIR/ | egrep *$num.smdst.gz`
  • Convert its SRM path to GSIDCAP:
fsrm=`lcg-lr lfn:/grid/hermes/dich/$PRODIR/$smdst | sed sZsrm://dcache-se-desy.desy.deZgsidcap://dcache-desy-gsidcap.desy.de:22128/Zg`
  • Afterwards, dccp would be able to transfer the file directly from SE to either HERMES NFS fileservers, or to tapes, depending on your choice, for example:
dccp $fsrm  dcap://dcachedoor2.desy.de:22129/pnfs/desy.de/hermes/mc3/PYTHIA6_NOV09/RESULTS_pythia_2006_p_posi/${smdst##*.de_}
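
The SRM-to-GSIDCAP conversion from the second step can be kept in a small function, which makes it easy to try out on a sample path before touching real files. The function name and the sample path in the usage line are hypothetical; the substitution itself is the one used above, with the dots in the host name escaped.

```shell
# Rewrite SRM replica URLs (read on stdin) into GSIDCAP URLs for dccp.
srm_to_gsidcap() {
    sed 's|srm://dcache-se-desy\.desy\.de|gsidcap://dcache-desy-gsidcap.desy.de:22128/|g'
}
```

For example, echo srm://dcache-se-desy.desy.de/pnfs/desy.de/hermes/some.smdst.gz | srm_to_gsidcap prints gsidcap://dcache-desy-gsidcap.desy.de:22128//pnfs/desy.de/hermes/some.smdst.gz (the double slash after the port comes from the original substitution and is kept as is).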

Running the jobs

Since the driving kumacs (gmc/hmclogon) are tailored individually for every production, it makes sense to use the JDL Sandbox for them. The JDL files should be composed according to the format described above, with InputSandbox containing a comma-separated list of the (small) input kumacs needed for the production. A real example of a single-run JDL file is given below:

VirtualOrganisation = "hermes";
Executable    = "batch_pythia_photo_05p";
Arguments     = " ";
Environment   = {"RUN=1000","RUNS=1000","GEN_EV=100000"};
NodeName      = "runs1000-1000";
Rank = other.GlueCEStateFreeCPUs; 
Requirements  = ( other.GlueCEUniqueID == "grid-ce5.desy.de:2119/jobmanager-lcgpbs-desy" ) ;
StdOutput     = "out";
StdError      = "out";
InputSandbox  = {"batch_pythia_photo_05p","pythia6_05_acc_photo_p_ELEC.kumac","backgroundMC_pythiaPHOTO.txt"};
OutputSandbox = {"out"};

Here, the environment variables are introduced for convenience, since the batch script will generate MC data accordingly. In general, any new MC production should first be tested with a very small number of events and a single submitted job, to make sure all the necessary inputs are available in the GRID environment. Later on, if everything succeeds (it never does on the first attempt :), one may generate sets of JDLs that differ only in the run number variable, and submit the full set as a collection. Assuming such a JDL set is located in a directory pythia_runs_1000-2000, the submission command would then be:

glite-wms-job-submit -a -o pythia_runs_1000-2000.pid --collection pythia_runs_1000-2000
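
A set of JDLs that differ only in the run number can be produced with a short loop. The sketch below is one possible way to do it, not the script used for the actual productions: make_jdl_set is a hypothetical helper, the JDL fields are copied from the single-run example above, and setting RUNS to the same value as RUN simply mirrors that example.

```shell
#!/bin/sh
# Sketch: write one JDL per run into a collection directory.
# make_jdl_set is a hypothetical helper; only the run number varies.
make_jdl_set() {
    first="$1"; last="$2"; outdir="$3"
    mkdir -p "$outdir" || return 1
    run="$first"
    while [ "$run" -le "$last" ]; do
        cat > "$outdir/run${run}.jdl" <<EOF
VirtualOrganisation = "hermes";
Executable    = "batch_pythia_photo_05p";
Arguments     = " ";
Environment   = {"RUN=${run}","RUNS=${run}","GEN_EV=100000"};
NodeName      = "runs${run}-${run}";
Rank = other.GlueCEStateFreeCPUs;
Requirements  = ( other.GlueCEUniqueID == "grid-ce5.desy.de:2119/jobmanager-lcgpbs-desy" ) ;
StdOutput     = "out";
StdError      = "out";
InputSandbox  = {"batch_pythia_photo_05p","pythia6_05_acc_photo_p_ELEC.kumac","backgroundMC_pythiaPHOTO.txt"};
OutputSandbox = {"out"};
EOF
        run=$((run + 1))
    done
}
```

For example, make_jdl_set 1000 2000 pythia_runs_1000-2000 fills the directory used in the submission command above. For a first test, generate only one or two runs and lower GEN_EV by hand.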

Again, it is best to try this sequence out with a small number of files and a tiny number of events per run, simply to check that everything works as expected. Note that the submission process itself is quite time consuming: from the moment you issue glite-wms-job-submit to the moment the MC scripts actually start running on the GRID batch nodes there may be a gap of 5 to 30 minutes, depending on the load on the GRID batch system (even if there are free nodes available, the batch system has to handle O(1000) jobs with their I/O, so it's pretty slow). Once submitted, the job status can be monitored via the glite-wms-job-status command.


From a practical point of view, for large productions it is best to have one job per run, with a reasonably large number of events per run (usually 50-200K, depending on the generator and settings). This is dictated by the limited proxy lifetime and by the fact that some HERMES MC jobs crash or hang for no known reason, blocking the job output.

Backing up to tape

How can I archive/back up my files on the DESY tape robot?

Before you continue, keep in mind the following rule:

DO NOT archive small files (less than 10-20 MB), since this is extremely inefficient and wastes tape space; before archiving, group files together with tar as explained below.

On our HERMES machines, the DESY tape robot appears as the /acs directory. Each user can make their own directory under /acs/user with the command:

   mkdir /acs/user/name

where name is generally your login name (for example otto).


Archiving and restoring files is done with the command dccp (Disc Cache copy), which is the analogue of the common cp command.

Warning: dccp does not have the options of cp; the dccp man page is available here.
The simplest syntax of dccp is:

    dccp source destination

Suppose you want to archive the file /user02/otto/pinco.pallino on tape and have already created the directory /acs/user/otto.
The command

   dccp /user02/otto/pinco.pallino /acs/user/otto/

does the job: the pinco.pallino file is saved on tape under the same name, pinco.pallino (add the option -d 2 to get verbose output of what is going on during the archiving, or if something doesn't work as expected).

   ls -l /acs/user/otto

shows the content of your archive (at present only pinco.pallino will be listed).
To retrieve/restore the pinco.pallino file from the robot, just do the opposite:

   dccp /acs/user/otto/pinco.pallino  pinco.pallino

This command retrieves pinco.pallino from tape into the file pinco.pallino in your current directory.
You can create subdirectories in /acs/user/otto as in a normal file system. You can even delete files with the rm command, but this only deletes the link ("file stub") to the real file on the robot. Note that you cannot overwrite a file stub; you have to remove it from your acs directory first.
If you have to archive a directory (and its content), it is better to tar and compress it first:

   tar cvfz filename.tgz directory

This will produce the file filename.tgz, which you can easily archive as outlined before.
When you retrieve the archived file, you can uncompress and untar it with:

   tar xvfz filename.tgz

The original directory and its content are re-created in the current directory.

Copying multiple files directly (with wildcards like * or ?) is not possible via dccp, e.g.:

dccp *.hbook /acs/user/otto

will fail. To avoid copying large numbers of files by hand, either use tar archives or a simple shell loop such as (USE WITH CARE):

for file in *.hbook ; do dccp "$file" /acs/user/otto ; done
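
A slightly more defensive variant of such a loop is sketched below. It skips files whose stub already exists in the destination (since a stub cannot be overwritten) and reports failed copies. The helper name archive_files and the DCCP override variable are assumptions made for illustration; setting DCCP=echo gives a dry run that only prints what would be copied.

```shell
#!/bin/sh
# Sketch: archive several files one by one, skipping those already archived.
# archive_files and the DCCP variable are hypothetical; by default the real
# dccp command is used.
DCCP="${DCCP:-dccp}"

archive_files() {
    dest="$1"; shift                  # first argument: destination directory
    failed=0
    for file in "$@"; do
        name=$(basename "$file")
        if [ -e "$dest/$name" ]; then
            # Existing stubs cannot be overwritten, so leave them alone.
            echo "skip: $dest/$name already exists" >&2
            continue
        fi
        if ! "$DCCP" "$file" "$dest/"; then
            echo "FAILED: $file" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]               # non-zero exit status if any copy failed
}
```

A typical call would be archive_files /acs/user/otto *.hbook; here the shell expands the wildcard before dccp is ever invoked, so each dccp call still receives a single file.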

Direct access to tapes

If your analysis code needs to read in data from tapes you have the following options:

  • either use dccp to copy the files from tapes to your local directory and then analyze, or
  • if the total amount of data is too large you may try to read the data in directly using the dcap preload library.

The usage of the latter is rather trivial: set the environment variable with

LD_PRELOAD=/afs/desy.de/group/hermes/dcap/lib/libpdcap.so
export LD_PRELOAD

either on the command line or in your batch script, and all the file access calls ('open', 'read', 'seek', 'close' etc.) will be replaced with their dCache analogs, which automatically fall back to the default system calls for regular local or NFS files. A tape-located file will be copied to a dCache pool and then fed to your application, as if it were local or NFS-mounted.
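
Exporting LD_PRELOAD affects every command started afterwards from the same shell. If you would rather enable the preload library for a single program only, a small wrapper such as the one sketched below may be convenient. The names with_dcap and DCAP_PRELOAD_LIB are hypothetical; the default library path is the one given above.

```shell
#!/bin/sh
# Sketch: run one command with the dcap preload library enabled, without
# exporting LD_PRELOAD into the calling shell.
# with_dcap and DCAP_PRELOAD_LIB are illustrative names.
with_dcap() {
    env LD_PRELOAD="${DCAP_PRELOAD_LIB:-/afs/desy.de/group/hermes/dcap/lib/libpdcap.so}" "$@"
}
```

For example, with_dcap ./your_analysis_code runlist.dat starts only the analysis code with the library preloaded, while the rest of the session keeps using the plain system calls.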

Access to dCache without mounting

Using the dcap preload library (as described above) one may gain (read-only) access to files stored in dCache (or on tape) without needing a system-wide mount point like /pnfs/desy.de/hermes or /acs, by using URL-style access to dCache instead:

# Setup the preload library
LD_PRELOAD=/afs/desy.de/group/hermes/dcap/lib/libpdcap.so
export LD_PRELOAD
# Get a directory listing 
ls dcap://dcache-door-hera03.desy.de:22125//pnfs/desy.de/usr/hermes/
# copy a file from dCache to local directory
cp dcap://dcache-door-hera03.desy.de:22125//pnfs/desy.de/usr/hermes/udst/06f1/smlinks/run10000.smdst.gz .
# run your analysis code directly on a uDST file in dCache
./your_analysis_code dcap://dcache-door-hera03.desy.de:22125//pnfs/desy.de/usr/hermes/udst/06f1/smlinks/run10000.smdst.gz

If your analysis code uses a runlist of the usual kind, i.e. one containing a list of uDST files from the /production directory, like:

/production/udst/06f1/smlinks/run10000.smdst.gz
/production/udst/06f1/smlinks/run10001.smdst.gz
/production/udst/06f1/smlinks/run10002.smdst.gz
...

you can easily convert it via sed to an alternative version that uses either the dCache mount point or the dCache URL:

sed -e sZ/productionZ/pnfs/desy.de/hermes/scratchZg runlist.dat > runlist_dcache.dat
sed -e sZ/productionZdcap://dcache-door-hera03.desy.de:22125//pnfs/desy.de/usr/hermes/scratchZg runlist.dat > runlist_dcache_url.dat
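
The same conversion can be wrapped in a small filter, sketched below. Unlike the commands above, it only replaces /production at the very start of a line, which avoids touching the string if it happens to occur later in a path. The helper name convert_runlist is hypothetical, and the prefix passed to it must not contain the letter Z (the sed delimiter) or an ampersand.

```shell
#!/bin/sh
# Sketch: rewrite a runlist read from stdin, replacing a leading
# "/production/" with another prefix. convert_runlist is an illustrative
# helper name; the prefix must not contain "Z" or "&".
convert_runlist() {
    prefix="$1"
    sed -e "sZ^/production/Z${prefix}/Z"
}
```

It would be used as convert_runlist /pnfs/desy.de/hermes/scratch < runlist.dat > runlist_dcache.dat, and likewise with the dcap:// prefix for the URL variant.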