MISSION 1: UNIX SURVIVAL TRAINING


Page maintainer: Larry

This page is considered done. It has been reviewed by an expert. There may be missing elements, but they are all flagged and the text has no errors.

Welcome to the computing world of particle physics, Cadet. Lesson 1 is this: everything we do, we do in UNIX. And so the first thing you will have to do is become familiar with that noble operating system. However, this tutorial is not an attempt to teach you the basics of UNIX. There are too many excellent books available, and it would be simply wasteful to write another one. If you are not familiar with UNIX yet, the very first section below provides a list of some recommended books that you might consult.

The purpose of this Mission is rather to attempt to `fill in the blanks' between those excellent UNIX books and actual UNIX Survival. This tutorial attempts to do the following:

  • Explain some practical UNIX tips and tricks that are learned only from experience.
  • Point out some important utilities that are not a part of the UNIX operating system itself, but which are of great importance.
  • Show you some particulars about how to set up and use your UNIX accounts on the HERMES machines at DESY.

A brief remark about notation. In the tutorial, some actual UNIX commands will be given. For example,

linkrun [-v] run year

The notation means the following:

  • Items in square brackets are optional.
  • Items in italics are not meant to be typed as is ... they indicate variables. e.g. in the example above, you should replace run with a run number.
  • Any appearance of the symbol ? denotes `some character'. For example, the notation /data??/runstage indicates the runstaging areas on any of the disks data00 through data99.
  • The wildcard symbol * denotes `any sequence of characters'.
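
The same two symbols are also what the shell itself uses for filename matching, so the notation can be tried out directly. Here is a minimal sketch in Bourne-type shell syntax; the directory names are made up for illustration and live in a throwaway temporary directory, they are not the real HERMES disks.

```shell
# Create a scratch area with a few hypothetical directory names.
scratch=$(mktemp -d)
mkdir "$scratch/data00" "$scratch/data07" "$scratch/data123" "$scratch/database"

# ? stands for exactly one character, so data?? picks out data00 and data07 only.
two_chars=$(cd "$scratch" && echo data??)

# * stands for any sequence of characters, so data* picks out all four names.
any_chars=$(cd "$scratch" && echo data*)

echo "data?? -> $two_chars"
echo "data*  -> $any_chars"

# Clean up the scratch area.
rm -r "$scratch"
```

Note that lines starting with # are comments in real shell scripts; the ! annotations used in the command listings of this tutorial are explanations for the reader and are not meant to be typed.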


UNIX Preparation and Concepts

In this section, we will go through the things you need to know before you embark on your Survival Training.

UNIX Essentials

If you have never encountered UNIX before, you need to get the basics under your belt. Ideally, you should read through a good book ... here are some recommendations from the speed-reading machine himself, Dana Robinson:

For those who learned DOS, there is `UNIX for DOS Users' (in DESY library). I recommend `UNIX for Dummies' to learn the basics of UNIX. I think a good command reference would also be helpful since man pages are kind of hard to use, but I haven't found a really good one yet. I used to use `UNIX in a Nutshell', which was OK but I could probably find a better book if I looked hard enough. It's for SVR4, so I'm not sure how compatible it is with both the IRIX and Linux systems we use. Be careful of the big 'telephone directory' books. They are usually poorly written and are hard to carry around.

and a couple more from me:

When you are contemplating your choice of UNIX books, check the Table of Contents. Look for a chapter on how to program under UNIX, including some description of the make command. One example is `UNIX for Programmers and Users: A Complete Guide' by Graham Glass. Also, if by some strange warp in the space-time fabric you learned VMS and have still not encountered UNIX, there is a great book called `UNIX for VMS Users' by Philip E. Bourne. It was published by Digital Press (i.e. DEC), so I'm not sure if you can still find it.

If you are a hardy (or impecunious) soul, you can try to bootstrap yourself using only free online tutorials. Here are some good links:

  • A Basic UNIX Tutorial from Idaho State University is really excellent. It's concise, but covers all the basics.
  • CERN Unix User Guide, in postscript or HTML (web-page) format.
  • Scientific Linux for DESY, a quick reference guide from the folks at DESY ZEUTHEN. By the way, DESY central computing (known as the `IT' group) has prepared other quick references as well; you may find them here.

It is also helpful to have access to online documentation -- online references that you can consult to quickly recall the syntax of a particular command (in case you leave your reference books at home, or find it physically inefficient to actually walk over to your bookshelf :)). Documentation for all UNIX commands is available through the notorious UNIX man pages. These are accessed simply by typing

man command

on any UNIX computer. If your output looks strange, try the GNU version:

gman command

The man pages are not pedagogical (hence the need for a good book), but they do provide a superb reference: what they lack in pedagogical merit, they make up for in completeness. Once the concepts of UNIX are clear to you, the man pages are perfectly readable, and are the ultimate source of mastery of all UNIX commands.

HERMES Computers and their UNIces

UNIX is the Operating System (OS) we all use. However, UNIX is a rather fluid concept ... instead of being an actual OS, it is really a class of operating systems. You see, different versions of UNIX exist: BSD, SYSV, Ultrix, OSF, IRIX, Linux, HP-UX, A/UX, Solaris, OSX ... the list goes on. In the old days, these different UNIces were distributed along philosophical lines; these days it is more likely product-lines which spawn different versions of our favourite OS. (e.g. Apple Macs use OSX). But Linux is fast becoming the most important of all UNIces, since it is (a) free and (b) runs on ordinary PC's with Intel processors. Since PC's are such a wonderfully cheap and powerful choice for computing hardware these days, Linux is rapidly becoming the most common version of UNIX.

All these UNIces are basically the same ... but slightly different. A novice user will rarely notice the differences, but a seasoned UNIX warrior will. The main variations I have observed are in four places:

  1. A handful of commands take different options depending on UNIX version. The prime example is the notorious ps command, which takes entirely different sets of options. This point is becoming less important, however, with the increasing prevalence of GNU software (described later).
  2. Another handful of commands are completely different depending on UNIX version. The most common example is the set of commands for dealing with printers: SysV-type UNIces use lp, lpstat, and cancel, while BSD-based systems use lpr, lpq, and lprm. However, both command sets are often available on one system.
  3. The identity and location of common system files is somewhat system-dependent.
  4. The optimal C and FORTRAN compiler settings vary considerably between systems.

Items 1, 2, and 3 are `your problem', I'm afraid ... but for item 4, HERMES has devised some relatively robust scripts which account for platform-dependent compiler variations. More on that later in Bootcamp.

Linux is the UNIX brand (or flavor) that is presently supported by HERMES ("supported" means that if we or you find a bug, we will fix it). Although "back in the day" we had many different types of computers running HERMES software, now virtually all analysis is done on the PC production farm at DESY:

PC production farm

name: worf.desy.de and other Star Trek characters (kirk, spock, mccoy, scotty, chekov, riker, uhura, troi, data, geordi, picard, crusher, guinan, sisko, dax, tuvok, neelix, torres, obrien)
OS: Scientific Linux SL release 3.0.8

You may find that the HERMES software works on other flavors of UNIX, but please be cautious and check that everything is really working!

Shells, Scripts and Variables

You need to know something about shells. The term shell refers to the command-line interface between you and UNIX. When you log in to a UNIX machine, the command line prompt you see is being generated by a program, called the shell, which obligingly accepts your typed commands and interprets them. Yes, it really is a program, and you can run it yourself: just type sh and voila! You have started a new shell within your current one and you can type commands into it as usual. To leave your new shell, type exit.

Shells are no more unique than UNIces. There are two basic `flavours' of UNIX shell: the Bourne shell and the C shell. Each has its own syntax rules and collection of built-in shell commands. They are really quite similar ... a novice user will hardly notice the difference. However the advanced user will find significant differences, particularly once he or she starts writing scripts ...

A script is just a file which contains a sequence of commands. If you find yourself having to use the same sequence of UNIX commands over and over in your work, just type them once into a file and execute that file. The UNIX shell will interpret each command as if you had typed it interactively. Scripting is a very general concept. Many interactive applications, such as sophisticated editors or data analysis programs, offer some mechanism for recording sequences of commands once and for all in a file (often called a macro instead of a script).

A variable is a named quantity associated with a value. The value can be retrieved, either on the command line or in scripts, by placing a "$" character in front of the variable's name. There are two kinds of variables:

  • environment variables
these are passed on, with their current values, to the programs and scripts you start; they are part of the program/script environment
  • shell variables
these belong to the current shell only; programs and scripts you start either do not see them at all, or see them reset to default values

OK, let's try to create and run a simple shell script:

	echo 'echo Hello.' > test	! You just created a one-line file called test
	cat test			! Yes, your file contains one shell command
	chmod a+x test   			! Give your file execute permission
	./test				! Execute your file

Whee! It worked. Just note that important chmod command: you have to give your script execute permission before it can be run. File access permissions and chmod should be familiar to you from your UNIX reading. Also, here's a classic little UNIX-newbie snafu: try running that last command without the path specification `./'. Hah! If you just type test, I'll wager you won't get what you expected! Why? Type which test. Aha! There is already a UNIX command called test, and it lives in a directory which comes earlier in your PATH environment variable than your local directory `.'! When this happens you need to explicitly tell the UNIX shell that you want to use the program test in your present directory, so you should type ./test instead.
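
The name clash can be reproduced in a few lines. This is only a sketch for a Bourne-type shell; the variable names workdir, plain_output and local_output are invented for the illustration, and everything happens in a temporary directory.

```shell
# Work in a throwaway directory so nothing in your own area is touched.
workdir=$(mktemp -d)
echo 'echo Hello.' > "$workdir/test"
chmod a+x "$workdir/test"

# Bare name: the shell runs the existing test command, which prints nothing.
# With no arguments that command reports failure, so the status is ignored here.
plain_output=$(cd "$workdir" && test) || true

# Explicit path: the script in the present directory is the one that runs.
local_output=$(cd "$workdir" && ./test)

echo "test   printed: '$plain_output'"
echo "./test printed: '$local_output'"

# Clean up.
rm -r "$workdir"
```

A simple way to avoid the problem altogether is to give your scripts names that are unlikely to collide with system commands.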

It looks simple ... but in fact, you have just written a little program. The UNIX shell offers you many powerful built-in commands to write quite complex programs in this way. For example, all shells offer branching and looping constructs (such as if-then-else), variables, and arithmetic and logical operators, just like a `real' programming language.
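
To make that concrete, here is a small sketch in Bourne shell syntax that combines a loop, a branch, a variable and some arithmetic. The run numbers are made up purely for the example.

```shell
#!/bin/sh
# Count how many of a list of (hypothetical) run numbers are even.
even=0
for run in 101 102 103 104; do
    # $(( ... )) performs integer arithmetic; % is the remainder operator.
    if [ $((run % 2)) -eq 0 ]; then
        even=$((even + 1))
        echo "run $run is even"
    else
        echo "run $run is odd"
    fi
done
echo "even runs: $even"
```

The C shell offers the same kinds of constructs, but with a different syntax (foreach, if ... then ... endif, and @ for arithmetic).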

As your UNIX expertise grows, you will find yourself writing shell scripts to do more and more of your work for you. Further, there are numerous shell scripts littered throughout the HERMES software suite. It is thus important that you gain some familiarity with scripting, and the shell's programming-type commands. Most UNIX books contain one or more chapters on scripting, while other books are entirely devoted to scripting with a particular shell. The shells also have their own man pages, which are excellent and utterly complete references to the entire command set. But first, you must select which shell to learn. Here are the options:

  • C shell: The `vanilla' version is csh. The more popular, enhanced version is called tcsh. Mostly tcsh has the same commands and scripting syntax as csh, but it adds many convenient features for interactive use.
  • Bourne shell: The vanilla version is sh. The Korn shell (ksh) is a slightly enhanced variation. Superior versions are the Z shell (zsh) and Bourne-Again shell (bash). Again the additional features offered by these latter variants are mostly for interactive use ... but not entirely.

Both types of shells are fine for interactive shells, but the Bourne shell is recommended for script programming because of these extra features.

Which shell are you using now? To find out, just type echo $SHELL ... this prints out the value of the variable SHELL, which contains the full name of your login shell. If it is not bash and you want bash, then you can just type bash to open a new shell. Unfortunately, just what is shown for the running shell is very system dependent, even between different Linux versions (it might show sh but be bash, for example). Ultimately you will want to change the default shell for your account to match your preference. Normally, on your own PC, this is accomplished using the chsh command. However, people using AFS accounts on the PC farm must contact their group administrator or someone at UCO (the user consultant office of the IT division) to have their shell changed. (More on that later.)
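
Because SHELL only records your login shell, it can help to ask the running shell to identify itself. The following is a rough sketch for Bourne-type shells only; it relies on the fact that bash sets a variable called BASH_VERSION and zsh sets ZSH_VERSION. The name running_shell is invented for the example.

```shell
# The login shell recorded for your account (may differ from what runs now).
echo "login shell:   ${SHELL:-unknown}"

# bash and zsh each announce themselves through a version variable.
if [ -n "${BASH_VERSION:-}" ]; then
    running_shell="bash $BASH_VERSION"
elif [ -n "${ZSH_VERSION:-}" ]; then
    running_shell="zsh $ZSH_VERSION"
else
    running_shell="some other Bourne-type shell"
fi
echo "running shell: $running_shell"
```

This check does not work in the C shells, whose syntax for if statements is different.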

Here is an example of a very useful shell enhancement that is provided by bash, zsh, and tcsh: tab completion. Create some file in your present directory called junk.txt (the command touch junk.txt will create an empty file, for example). Start typing this, but don't hit return: ls j ... now hit the TAB key. Hah! The shell just filled in the only available filename starting with j. Now create a second file called junk2.txt, and do it again: ls j, then TAB. This time the shell filled in junk -- as much as it could without more information. At this point, you can get a list of the available files matching your pattern: depending on your system, you either hit TAB again, or else type CTRL-D. Useful, eh? The tab completion mechanism is also context-sensitive. In the previous example, the shell knew you were looking for a filename ... but if you just type j on a new line, followed by either TAB or CTRL-D, you will get a list of all the commands that start with j. Wow! Tab completion is not available in the vanilla shells csh or sh.

Here's another example: an enhanced history mechanism. If you repeatedly type the up arrow on your keyboard, you will scroll backwards through the `history' list of the commands you just entered. This is so natural that you probably never thought about it ... but in fact this convenient feature is not available in the vanilla shells.

Finally, let's go back to scripting. Scripting is so important that I want to get all the basics out of the way right here, so let's talk about two more important things. You have probably encountered these points already in your reading, but they are so important that I don't want to take chances!

First is spawning (also called forking). When you executed your little test script earlier, it actually spawned another shell -- a child process -- and the script's instructions were executed in that shell, not your current one. Why is that important? Well, because of shell variables. Like I said, shells are programming languages and they certainly know how to deal with variables. Do this if you are in a Bourne shell:

	myvar="xxx"				! define shell variable myvar
	echo $myvar				! check that it is indeed set
	echo 'echo myvar is $myvar' > test	! create a new shell script
	chmod a+x test				! Give your file execute permission
	./test					! execute the script

(Please note: no spaces are allowed around the equals sign in the first line!) If you are in a C shell, do this instead:

	set myvar = "xxx"			! define shell variable myvar
	echo $myvar				! check that it is indeed set
	echo 'echo myvar is $myvar' > test	! create a new shell script
	chmod a+x test				! Give your file execute permission
	./test					! execute the script

(Here, the spaces in the assignment statement are optional, but the set command must be used.) Look at the output ... WOOPS! The script is unaware of your shell variable! Now try this:

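	. ./test      (Note the leading "dot", followed by a space)

The equivalent C shell command is

        source test

Aha! Now the shell variable is seen by your script. What happened? In the first case, the script was executed in a child shell, and that shell has no memory of your current shell's variables. In the second case, you used the ". ./test" command to force execution of the script in your current shell -- no spawning! (The path `./' matters here for the same reason as before: given a bare name, the dot command searches your PATH first and would find the system's test program rather than your script.)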

And now we come to environment variables. These are simply shell variables that are exported to any and all child processes. Let's transform myvar to an environment variable and try our example again. Do this if you are in a Bourne shell:

       myvar="xxx"
       export myvar

or this if you are in a C shell:

	setenv myvar "xxx"

And finally, run your test script once more:

	./test

As you see, the child process in which the script ran has full knowledge of your environment variables ... just not your ordinary shell variables, or your aliases.

You can use the env command (printenv if you are in a C shell) to obtain a complete list of the environment variables that are currently defined. To un-define such a variable, use unsetenv in the C shell, and unset in Bourne shell. Note that assigning an empty string to an environment variable does not undefine it ... blank is an acceptable value.
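
The whole story of shell variables, environment variables and unset can be condensed into one short experiment. This is a sketch in Bourne shell syntax; the variable names local_only and shared are invented for the illustration, and sh -c '...' is used simply as a convenient way to start a child process.

```shell
# A plain shell variable: visible here, invisible to child processes.
local_only="xxx"
child_sees_local=$(sh -c 'echo "${local_only:-<unset>}"')

# An environment variable: exported, so child processes inherit it.
shared="yyy"
export shared
child_sees_shared=$(sh -c 'echo "${shared:-<unset>}"')

# An empty value still counts as defined ...
shared=""
child_after_empty=$(sh -c 'if [ "${shared+set}" = set ]; then echo defined; else echo undefined; fi')

# ... only unset really removes the variable.
unset shared
child_after_unset=$(sh -c 'if [ "${shared+set}" = set ]; then echo defined; else echo undefined; fi')

echo "child sees local_only: $child_sees_local"
echo "child sees shared:     $child_sees_shared"
echo "after shared=\"\":       $child_after_empty"
echo "after unset shared:    $child_after_unset"
```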

The second basic-scripting concept we have to cover is the magic script line. I said repeatedly above that a child shell was spawned to run the command(s) in your script. But which shell? Believe it or not, it is typically sh -- the vanilla Bourne shell. However, this is one of those things which is very system dependent, so we are asking for trouble if we don't specify which shell to interpret our script. If you are about to write a major shell script which vacuums your floors and cleans your windows, you may not want the vanilla Bourne shell to execute it! You may be a C shell acolyte, or you may want to use the advanced features of bash. No problem! If you want your script to run under bash, just put this magic line at the very top of your script:

	#!/bin/bash

This bizarre syntax is found nowhere else in UNIX. It simply tells UNIX which program to run to interpret the commands in the rest of the file. You can even supply command line options just as if you were starting the shell by hand. Here are some useful ones for debugging that work with any shell:

	#!/bin/bash -x		! echo each command after variable substitution, just before execution
	#!/bin/bash -v		! echo each line as it is read, before variable substitution

These options are very valuable for finding out exactly what a script is doing, and where it goes wrong. You can also pile them on top of each other, of course: bash -xv. C shell scripters may also like to know about this option:

	#!/bin/csh -f		! startup fast: do NOT run .cshrc

The particulars of the .cshrc startup file are explained in the section below on setting up your account.
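
Here is a small sketch of the magic line and the -x option at work. It assumes that bash is installed as /bin/bash on your system; the script name greet and the variable names are invented for the example. The script is run once normally and once under bash -x, which overrides nothing in the file but adds the trace.

```shell
# Write a tiny script whose first line names bash as its interpreter.
demo_dir=$(mktemp -d)
cat > "$demo_dir/greet" <<'EOF'
#!/bin/bash
name="Cadet"
echo "Hello, $name"
EOF
chmod a+x "$demo_dir/greet"

# Normal run: the kernel reads the #! line and starts /bin/bash for us.
normal_output=$("$demo_dir/greet")

# Traced run: each command is echoed (on standard error) before it executes.
trace_output=$(bash -x "$demo_dir/greet" 2>&1)

echo "--- normal run ---"
echo "$normal_output"
echo "--- traced run ---"
echo "$trace_output"

# Clean up.
rm -r "$demo_dir"
```

Inside a long script you can also switch tracing on and off around the suspicious part only, using set -x and set +x.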

Editors and Pagers

You will need to edit files, so you need to know an editor. In fact, you will spend so much time editing files that it will be to your great, time-saving advantage to become quite expert with one of the standard editors. Here are your choices:

emacs
This is by far the most popular choice these days. Emacs is hugely powerful, but can also run in a mode where it can be used with no introduction. When you run emacs, it will most likely pop up a new graphics window for you. You can click anywhere in this window and type in or delete text, just like in Windows' Notepad program or Mac's SimpleText. The emacs graphics window provides nice pull-down menus to do things like save your file or search for strings. There is also an even nicer version, called xemacs. However, the true power of emacs comes from its giant set of keyboard commands. Emacs wizards never use the cute buttons ... often, they run emacs without the graphics window at all (emacs -nw runs the editor directly in your terminal window). The keyboard commands give you access to a wealth of advanced features, such as string finding and replacement, column editing, and macro recording.
vi
The vi editor is part of the core UNIX command set, and so it is recommended that anyone doing system work knows how to use it. But these days, emacs is distributed with all modern UNIces and so the importance of vi has decreased. Unlike the graphical version of emacs, vi cannot be used unless you do some reading first -- it has a rather non-intuitive command structure that takes some getting used to. Pure vi does not have the powerful features of emacs, but there are many enhanced versions that do. An excellent one is vim, which is pretty nearly as powerful as emacs. It is often included in commercial UNIX distributions, or you can get it yourself from the VIM Home Page. I use vim as my editor of choice, but it's just a personal preference. Emacs commands all require the use of one or more modifiers (escape-meta-alt-control-shift), which I find awkward and annoying, particularly on a small laptop keyboard ... but it's up to you. Certainly vi and vim have a steeper learning curve than emacs.

Pagers are programs that allow you to read text files page by page without using an editor. The standard UNIX pager programs are more and its superior cousin less. The man command calls one of these pagers to display the man pages, using whichever program is specified in your PAGER environment variable. I would just like to point out how powerful pagers are ... they include a large set of keyboard shortcuts for moving efficiently through a file. Here is a sample of the main shortcuts available in less:

	h		! display help = list of all commands
	SPACE		! go forward one window
	b		! go back one window
	down-arrow	! go forward one line (j also works, as in vi)
	up-arrow	! go back one line (k also works, as in vi)
	g		! go to top of file
	14g		! go to line 14 from top
	G		! go to bottom of file
	/pattern	! search forward for pattern
	?pattern	! search backwards for pattern
	n		! go to next match for current pattern
	N		! go to previous match for current pattern
	q		! quit

These commands (especially the search functions) can greatly enhance your efficiency at using the man pages.

Programming

To do significant software work, you will need to know a programming language. The HERMES software suite and the libraries on which it depends are written in a mixture of C and FORTRAN. Good knowledge of one of these languages is essential, and a basic familiarity with the other is helpful (i.e. to the point where you can read the source code, if not write it).

If you've never programmed in C or FORTRAN, here's some more recommendations from Dana Robinson:

C programming: I used 'Practical C Programming' (in DESY library) to learn C. It is excellent, as are most of the O'Reilly 'animal' books. My only complaint is that it does not include an overview of the standard C libraries. I have an old, out-of-print C book that does. I highly recommend any C programmer to hunt around until they find a cheap book that does explain all the standard libraries in a reference-like form. It keeps you from reinventing the wheel and speeds development.
Fortran: There are a bunch of Fortran books in the library that are decent. My advice to the new learner is to get one that covers Fortran 77 only and NOT Fortran 66, 8x, 90, etc. Some Fortran 90 books are OK to learn from, since they keep the F90 stuff separated from the F77 material (in a special chapter or at the end of each chapter). You just have to hunt around until you find something you like. New books are hard to find since nobody uses Fortran anymore (except us). Most new books are intended for use as college textbooks and so carry a textbook price (~$50-70).

What about online references for C and FORTRAN? In the case of C, it is easy: almost all UNIX systems include man pages for each and every C library function. So, if you have forgotten the particulars of the strcmp function, just type man strcmp ... et voila. UNIX, after all, is written in C.

For the older (but wiser :)) FORTRAN language, one must turn to the web for good online references. An excellent reference manual describing the F77 (FORTRAN-77) standard can be found here. This manual is located at a web site called www.fortran.com ... good place to look, eh. :) Their site features an abundance of links, including one to online tutorials. I have not looked at any of these tutorials ... maybe they're helpful, maybe they're not. I would advise you to avoid the FORTRAN-90 standard. It was a great idea but it never caught on ... the UNIX FORTRAN compiler itself is called f77.

There is one complication with regard to FORTRAN: the most popular FORTRAN 77 compiler, g77, is no longer supported. Not only is it not supported, but it is no longer available in the latest versions of the LINUX operating system! Since HERMES Code Warriors support only FORTRAN 77, those who wish to install and run HERMES software on their own LINUX boxes must ensure that g77 is installed.

NEEDTEXT - We should have some statement here about C++. As Larry points out, this is NOT backward compatible with HERMES software, so we should develop some way of controlling this so that useful C++ code is available, but people are aware that it has to be an "add on."

Setting up your HERMES AFS Account

If you have a brand new HERMES account, you need to set it up so that it works properly right from the start. This special setup is only required because you are a Cadet ... if all you want to do is read email or check the web, none of this is really necessary.

AFS

First we have to talk briefly about AFS. This is the Andrew File System, a very nifty way of sharing files and disks between computers all over the world. (The main alternative method is called NFS, for Network File System ... but more on that later). At DESY all the accounts use the AFS file system and are centrally managed by the IT group through a central registry. There are also accounts for the Microsoft operating system on PCs, for which HERMES software is definitely not supported but which you may find useful if you want to use programs like PowerPoint, Word, and Acrobat Distiller.

After you get an account, you can "see" the AFS structure when you log in; just type pwd.

AFS accounts bring with them a few little modifications of some of the standard UNIX account management commands and concepts. Here are the three main ones:

changing your password
You should know from your UNIX reading that the passwd command is used to change your password. When you have an AFS account, this password propagates to all IT managed accounts -- under AFS, you see, these are basically the same account as far as administrative things like passwords are concerned. To change your AFS password, simply type:
	passwd
DESY has also set up a registry web interface (https://registry.desy.de/registry) for changing passwords, but you must be inside the DESY firewall for it to work.
changing your default shell
Your choice of default shell is stored in the DESY registry database under AFS, along with some other information like your preferred email address and phone number. This registry information is displayed when someone uses the finger command to find out about you. DESY IT policy is that only someone from the UCO or your group administrator can change your choice of shell. One last remark about changing your default shell: your choices are limited to the shells listed in the system file /etc/shells.
file permissions
From your UNIX reading, you should have learned about the chmod command for altering access permission on your files. But files managed by AFS have a completely different permission system. For one thing, it is based on access control lists rather than the familiar user-group-other categories. For another thing, access rights are attached to directories: a subdirectory you create (with the mkdir command) is by default given the same access rights as the directory in which it is created, and the files inside a directory are governed by that directory's rights. This means that subdirectories created in your "home" directory are not readable by anyone but you (because your "home" directory is not readable). This is opposite to the usual UNIX behaviour, and can be quite irritating at first. When your AFS home area is set up, only one subdirectory called public is initialized in such a way that anything placed within it is readable by everyone else. You can change all this, but you're simply going to have to learn about AFS access permissions: just read IT's Introduction to AFS at DESY page.

Shell configuration

What we're going to do next is set you up with a couple of standard configuration files. These special files should be placed in your home directory. They are run whenever you start a new shell (e.g. when you log in), and they set important environment variables like your PATH. If you modify these configuration files, the settings contained within will only be activated if you start a new shell ... or if you "source" them (in "C shell Speak").

The files you need depend on your choice of shell: bash or tcsh.

If your default shell is bash ...

Download: .bashrc

If your default shell is tcsh ...

Download: .login

I'll just make one remark about the PATH setting provided by these files. Your local directory (pathname ".") is probably placed somewhere in your "search pathlist" (to see your search pathlist, type echo $PATH). It usually is either first or last in this list. Some sysadmins like it to be first, so that a user is sure that if she by chance has created a program with the same name as an already existing program, hers will be run first. Other sysadmins prefer to have it last on security grounds (a "cracker" could conceivably place an evil program called `ls' in your home directory ... you get the idea).

Here's a tip: if you find yourself on a new system and would like to set up appropriate startup files, you first need to determine some initial values for three important environment variables: PATH, MANPATH, and LD_LIBRARY_PATH. These variables contain colon-separated lists of directory names, specifying the search path for executable programs, man page files, and programming libraries respectively. The best thing to do is to see what the system has supplied as a default: do echo $PATH when you first log in. Put that specification in your .bashrc file, and then modify it as you learn more about the new system. For MANPATH, there is also a command called manpath which will return a recommended setting for you if the MANPATH variable is unset. Finally, I should point out that LD_LIBRARY_PATH is unlike the other two in that the ld (loader) program which uses it will still search its default directories, even if LD_LIBRARY_PATH is set. (By comparison, PATH and MANPATH settings override any defaults). So you only have to supply non-standard library directories (e.g. the location of the CERN programming libraries ... more on that later). The standard search path for ld may be found either at the end of the ld man page or in the file /etc/ld.so.conf (Linux systems only).
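
When you start modifying these colon-separated lists in a startup file, it is easy to end up with the same directory listed many times over. Below is a sketch of one way to avoid that, written for Bourne-type shells. The helper function append_dir and the directory /opt/hypothetical/bin are inventions for this example, not part of any HERMES setup.

```shell
# append_dir LIST DIR
# Print the colon-separated LIST with DIR added at the end,
# unless DIR is already one of its entries.
append_dir() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # DIR already present: leave LIST alone
        "::")     printf '%s\n' "$2" ;;      # LIST was empty: result is just DIR
        *)        printf '%s\n' "$1:$2" ;;   # otherwise append DIR after a colon
    esac
}

# Try it on a demonstration list rather than on the real PATH.
demo_path="/usr/bin:/bin"
demo_path=$(append_dir "$demo_path" "/opt/hypothetical/bin")
demo_path=$(append_dir "$demo_path" "/opt/hypothetical/bin")   # second call changes nothing
echo "$demo_path"
```

In a real .bashrc the same helper could be applied to the actual variables, for example PATH=$(append_dir "$PATH" "$HOME/bin") followed by export PATH.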

So that's all very nice ... or is it? Here comes a nasty little complication: there's more than one way to start a shell, and the startup configuration files that are run change accordingly.

  • login shells are the type of shell process that is run whenever you log in. In the GNU version of bash, you can also explicitly start such a shell using bash --login.
  • interactive shells are any shell processes into which you can type commands. Login shells are one example. If you just type bash, you will also create an interactive shell.
  • non-interactive shells are the type that are spawned when a shell script is executed.

The question is: which startup files are executed by each of the three types of shells? Basically it's a mess:

  • Non-GNU versions of bash run .bashrc for all interactive shells, and nothing for non-interactive shells.
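  • The GNU version of bash runs a file called .profile for login shells (but only if neither .bash_profile nor .bash_login exists in your home directory, since it looks for those two first), .bashrc for all other interactive shells, and nothing for non-interactive shells.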
  • tcsh executes .tcshrc for all shells. If it can't find that, (or if you're running vanilla csh) it executes .cshrc instead. You can suppress this for non-interactive shells by using the -f option on the magic shell-invocation line at the top of your script. But one more thing: for login shells, C shell also runs a file called .login if it exists. This file is run after .tcshrc or .cshrc.

... and even these rules don't always work. Look: what you really want to do is make sure that your startup file runs whenever a login shell is created, since all of your child processes are ultimately spawned from such a shell! So here's what I suggest:

  • for bash users: Put all your beautiful configurations in .bashrc. Then create a file called .profile which contains this one line:
 . $HOME/.bashrc
  • for tcsh users: Put all your beautiful configurations in .login. Just realize that any shell variables or aliases you defined there will not be accessible to your C shell scripts ... only your environment variables. This is a good thing if you want anyone else to use your scripts! If you find that annoying, create a file called .tcshrc which contains this one line:
  source $HOME/.login

If you are really curious what is going on with startup files on your particular system, just put an echo command at the top of each file, like echo ".profile is running now ...". You'll soon see which files are being run on which occasions.
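
The experiment can also be tried without touching your real startup files. The sketch below assumes that bash is installed and uses a throw-away HOME directory; the names demo_home, login_out and script_out are made up for the illustration. The system-wide startup files may add output of their own.

```shell
# Run the "echo" experiment in a temporary HOME so nothing real is modified.
demo_home=$(mktemp -d)

cat > "$demo_home/.bashrc" <<'EOF'
echo ".bashrc is running now ..."
EOF

# .profile announces itself and then pulls in .bashrc, as suggested above.
cat > "$demo_home/.profile" <<'EOF'
echo ".profile is running now ..."
. "$HOME/.bashrc"
EOF

echo "--- login shell ---"
login_out=$(HOME="$demo_home" bash --login -c 'true' 2>/dev/null)
echo "$login_out"

echo "--- non-interactive shell ---"
script_out=$(HOME="$demo_home" bash -c 'true' 2>/dev/null)
echo "$script_out"

rm -rf "$demo_home"
```

Comparing the two sections of output shows which of your files each kind of shell actually reads on your system.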

The User Interface: Window Managers and Terminals

This section gives you a little background about what is going on as you use your terminal windows and other graphics interfaces on whatever UNIX system you are using. You will almost never be working on a monitor which is directly connected to the HERMES PC farm; rather, you will be connecting to the computer remotely, most likely via a program like ssh (we'll discuss this more later). In fact, just to be reading this wiki you must have the basic ability to work on the local computer at which you are actually sitting!

When you first approach a computer with a graphics monitor, the first thing you will see is a login screen: this screen is created and managed by the display manager. When you log in, you start an X session. Most personal computers these days have a default interface which may (or may not) be true X, but it is still very similar to X in the way it works. X is the language in which UNIX computers express graphics information, and we will learn more about it later. For now, we will just introduce the concept of a session manager, a program which manages your display and all the windows you open. To be precise, amongst the session manager's duties is control of a window manager. It is this program which does the actual control of your windows, allowing you to move them, iconify them, resize them, etc ...

One of the most important windows you can open is a terminal window. On X-based graphics machines your terminal window is generated by the program xterm. There is probably some cute button or menu on your screen that will create such a window for you ... all it's doing is running the xterm program. In general, a terminal is simply a text-based device for interacting with a computer. In the days before graphics displays, there were devices called VT100 terminals consisting of just a 24x80 character display and a keyboard. We used to have such terminals in the OstHalle. xterm simulates the operation of such a VT100 terminal, but in a graphics environment controlled by your session manager. The great advantage of these virtual terminals is that you can have more than one open at the same time!

Together the window manager and the terminal window comprise the user interface between you and your computer. In this section I will show you some features of this user-interface that can make your work much more efficient.

Your GUI, the Window Manager

GUI stands for Graphical User Interface, which is precisely what the window manager provides. There are many different window managers available, such as AfterStep, mwm, kde, and Motif ... the most popular on HERMES machines was fvwm2 back when people directly logged on to the old mainframes or data acquisition machines. All of these have many similar features, but I will focus on fvwm2. If you're happy with the window manager you are using at present, you can skip this section.

Fvwm2 is a very nice window manager, but it is not necessarily the default on the HERMES or other Linux machines. To change your default window manager, you must edit a file called .Xsession in your home directory: any commands you place in this file will be executed when your session begins. Here is an example of such a file:

	xrdb -merge .Xresources
	ENVIRONMENT=LOGIN
	export ENVIRONMENT
	/usr/bin/X11/xterm -sb -sl 500 &
	xclock &
	fvwm2

Important: make sure your .Xsession file has execute permission! You are writing a little Bourne shell script here for the X initialization procedure to run. Be sure also to run any programs that do not quit immediately in the background (xterm and xclock in this example) using the `&' character. The last thing that your script does is start a window manager ... this is very important, otherwise you will not be able to move or resize your windows!

fvwm2 can be extensively customized by creating a file called .fvwm2rc in your home directory. The syntax of this file is explained in the fvwm2 man page. Here is an example that provides you with many nice features:

Download: .fvwm2rc

Amongst other things, this configuration file defines helpful pull-down menus for starting common programs. To access these menus, click on the root window (the Desktop in Mac/Windows lingo). Any window manager will usually have several such menus which can be accessed by clicking different mouse buttons, or holding down CTRL or SHIFT when you click. Just try all-of-the-above! Other graphics programs like xterm also provide you with such useful menus most of the time ... just click everywhere (try all mouse buttons, and try the CTRL and SHIFT modifiers). The xterm menus allow you to change the font size in your terminal window, add scroll bars, and send various control signals.

The configuration file also tells fvwm2 to start a number of plug-in modules at startup. In the .fvwm2rc file provided above, the FvwmPager module is invoked to give you that nice blue button bar. The xxx module produces the little red map of your virtual desks. The screen you are looking at is one of them ... if you click on one of the other primary squares in the little red map you will be transported to another desktop, where you can open even more windows. Your effective desktop area is thus greatly enlarged! You can move windows around by pressing SHIFT while dragging the little window images around in the desktop map. Each plug-in module has its own man page.

You can perform cut and paste operations with fvwm2 and other UNIX window managers. Here's how: Drag out the text area you want using the left mouse button. Then click the middle mouse button somewhere else, and voila! The text you selected will be pasted at your cursor location. You can also select single words of text by double clicking on them with the left mouse button.

UNIX window managers can be extensively customized. One example is key bindings. Sophisticated window managers (and some other programs) allow you to assign commands to certain keystrokes (commonly called keyboard shortcuts). You will find some people who have configured their window managers so that they hardly have to touch the mouse at all: you can program keyboard shortcuts to move the cursor, resize windows, change virtual desks, etc.

Tips and Tricks with UNIX

Files, Disks, and Important Directories

NEEDTEXT text coming up by Ed, Larry ...

UNIX Filesystems

The UNIX operating system is able to deal with many different filesystems, which are just protocols for the management and organization of files. As you are no doubt aware, your files are stored on hard disks. But where are those disks? One or more of the disks is probably connected to the machine you are working on. But it is quite likely that you are also using disks connected to some other machine, which you are not logged in to at all. The thing is: some filesystems are designed for use with your computer's local disks, and others provide a virtual interface to disks located on a remote computer.

Most but not all of the HERMES disks use journaling filesystems. Journaling is a technique whereby changes to the filesystem are recorded in a log area. The purpose of this is that it makes it much faster to recover from a computer or disk crash: the recorded journal information is used to complete operations that were in progress at the time of the crash. You may encounter this journaling concept elsewhere (e.g. the HERMES database package dad uses journal files for the protection of its servers).

For access to data on remote disks, UNIX computers use NFS (Network File System) and AFS (Andrew File System).

More concepts: to be accessible to a computer, filesystems must be mounted, which is the software equivalent of plugging in an external disk. The list of filesystems that are mounted when the system boots up can be found in the file /etc/fstab. Each filesystem typically corresponds to one complete disk, but that is not always the case. You will see that the remote filesystems (designated nfs or afs) in this file contain a hostname as part of their specification. For example, you will see that the directory /production is in fact an NFS-mounted filesystem, and resides on the host kirk which is one of the machines of the PC farm.

The df command is very useful for examining filesystems. If you just type it in with no arguments, it presents a list of all currently mounted systems along with the amount of space available on each one. If you supply a file or directory name as an argument, you will obtain information only for the disk containing that file. The amount of free and used disk space is reported in blocks, a term which usually refers to a unit of 1024 bytes (i.e. approximately 1 kilobyte). Some UNIces, however, choose to define a block as 512 bytes.

One last detail about disk space: some entries in the fstab file have a quota specification. This limits the amount of space that a user can fill on that filesystem. On the HERMES PC farm, for example, the /afs/desy.de/user/ filesystem has a quota. The reason is that all users' home directories live on this relatively small disk, and it can fill up quickly. Use the fs listquota command to view your quota status. If you exceed your quota, a variety of things may happen depending on your system ... often you will just receive a warning message every time you log in, until you clean up your area, but you may also not be able to receive email to your DESY account.
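
Here is a small sketch of the df usage described above. The -k option asks for 1024-byte blocks regardless of the system's default block size, and -P requests the portable one-line-per-filesystem layout, which makes the output easy to pick apart. The variable name avail_kb is made up for the example, and the fs command only exists on machines with AFS installed.

```shell
# Space on the disk that holds the current directory, in 1024-byte blocks.
df -k .

# Pull out just the "Available" column (4th field of the 2nd line).
avail_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')
echo "Free space on the disk holding this directory: ${avail_kb} kB"

# On AFS machines, also show the quota for the home directory.
if command -v fs >/dev/null 2>&1; then
    fs listquota "$HOME" || true
fi
```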

Text vs Binary Files

Any file can be broadly classified as a text file or a binary file. Here's the difference: each byte (= 8 bits) of data in a file contains a number from 0 to (2^8 - 1) = 255. In hexadecimal notation, this range is 00 to FF. Now the first half of this range (00 - 7F) is used by ASCII (American Standard Code for Information Interchange) to represent characters: letters, numbers, and punctuation marks, plus a few other things. For example: the capital letter `N' has ASCII code 4E, the space character has code 20, and code 07 represents the invisible `bel' character which causes your terminal to emit a beep. Here is an ASCII Table if you are interested. Programs which deal with text files (such as editors and compilers) know how to interpret these codes. However, binary files (such as executable programs in machine language or compressed graphics files) also use the remaining range of byte values (80 - FF). These cannot be interpreted as ASCII text and will come up as gibberish if you open such a file in your text editor. Let me now introduce a few useful little commands for dealing with files of various types.

    • file: Run the file command on a filename to learn its type. This cute little utility will report some rather useful information from the headers of binary files which you cannot easily read (e.g. it will tell you the format and image size of graphics files such as GIF's or JPEG's). It will also look at the first few hundred bytes of a text file and try to guess what it is -- program written in C? English text? If you run it on a file containing German sentences, it will report `ascii text' instead of `English text' ... magic! :-)
    • od and strings: These commands are useful if you should ever need to examine the contents of a binary file. The od command simply dumps the contents of the file in a readable format, expressing each character in hexadecimal, octal, etc... notation (see the man page). The strings command hunts through the file looking for sequences of adjacent ASCII characters, and reports those to you. When applied to an executable program e.g., this command will retrieve such things as the error messages the program can deliver, or the X resources it knows about -- basically any character-string parameters that the programmer used. This can be surprisingly useful.
    • nm and objdump: These are extremely useful commands for examining the content of an object file, namely a file containing compiled machine code. The names of all subroutines, functions and global variables used by the code may be retrieved, making the commands extremely useful for locating missing routines in programming libraries. We will come back to these useful commands in the programming part of this mission.
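
To make the byte values and the od and strings commands a little more concrete, here is a minimal sketch. It builds a tiny artificial "binary" file (the name sample and its contents are invented for the illustration) and examines it. The strings and file commands are only run if they are installed.

```shell
# The three characters from the text: N, space and BEL, dumped as hex bytes.
# This should show the codes 4e, 20 and 07.
printf 'N \a' | od -An -tx1

# An artificial binary file: two readable strings surrounded by bytes
# outside the ASCII range (written here as octal escapes).
sample=$(mktemp)
printf '\200\377\001Cannot open file\000\376\377HERMES\000' > "$sample"

# od -c shows every byte, printable or not.
od -c "$sample"

# strings picks out only the runs of printable ASCII characters.
if command -v strings >/dev/null 2>&1; then
    strings "$sample"
fi

# file guesses the type from the first bytes.
if command -v file >/dev/null 2>&1; then
    file "$sample"
fi
```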

Important Directories

As you know, UNIX filesystems are organized into directory hierarchies. It is rather useful to know some of the most important standard directories:

  • /home
    Most systems have a /home area in which users' home directories reside. However, on the PC farm, this is not used (though the directory exists) and the equivalent area is /afs/desy.de/user
  • /bin, /usr/bin, and /usr/local/bin
    These are the most common places to find the executables for all the standard UNIX commands and common utilities. All three of these directories should thus be listed in your PATH environment variable. Let me make a remark about the /usr/local area: traditionally, the programs that came with the original operating system are stored in /bin or /usr/bin. The /usr/local tree is a second version of the /usr area which provides a place for the system manager to store all the `non-standard' utilities that he or she would like to install. It is thus wise to place /usr/local/bin earlier in your path than the other directories.
  • /lib, /usr/lib, and /usr/local/lib
    These are the equivalent of the bin directories, but for the storage of programming libraries or important input files. On some newer systems you will also find /usr/share and /usr/local/share being used in a similar capacity.
  • /usr/include and /usr/local/include
    You guessed it ... here is where you find whatever include files you need to use those programming libraries in your code.
  • /cern
    This is the home of the many utility programs and programming libraries that we use in particle physics from the coding gurus at CERN. This software collection is loosely referred to as CERNLIB, although that term more properly refers to a particular sub-package. The CERN software is organized into releases, with names like 97a and 2000. Release directories with these names appear under the /cern top-level directory. There are also virtual releases called pro and new, which are symbolic links to whatever releases are presently considered stable (pro) and developing (new). The release areas look very much like the /usr/local tree: they contain bin, lib, and include subdirectories that contain exactly what you expect.
  • /var and /etc
    These are system directories containing such things as the system's boot procedure and log files. A few examples to whet your appetite:
    • /etc/printcap lists all printers that you can use (note that on the DESY site, you get to most printers via a "spooler" which distributes the files to the printers on the DESY network)
    • /etc/shells lists the valid shells that you can establish as your default shell
    • /etc/bashrc, login, cshrc, ... (without the usual dot in the filename) are the startup files that the system runs when you log in. It is only after these procedures are executed that your personal startup files are run.
    • /etc/passwd contains information about all user accounts (on most systems), including each user's name, default shell, and encrypted password (modern systems usually keep the encrypted password in the separate file /etc/shadow instead)

There are a couple of other important directories, but these will be described below, in the sections on GNU software and the X11 graphics system.
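
Since the order of the directories in PATH decides which copy of a program you actually run, it can be handy to list every copy in search order. The following is only a sketch; whichall is a made-up function name, not a standard command.

```shell
# List every executable called $1 on the PATH, in search order.
# The first line printed is the copy the shell will actually run.
whichall() {
    name=$1
    old_ifs=$IFS
    IFS=:                       # split PATH at the colons
    for dir in $PATH; do
        [ -z "$dir" ] && dir=.  # an empty entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

whichall ls
```

If a program exists in both /usr/local/bin and /usr/bin, the output shows at a glance which of the two wins with your current PATH.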

GNU Software

The UNIX programs you find on your computer come from many different sources. Some are distributed with the system, others have been retrieved from the net by your system administrator, and still others come from specialized coding houses such as CERN. A very important source of UNIX software these days comes from the GNU Project. GNU represents an amorphous group of volunteer programmers who write all kinds of fabulous free software. Their work led directly to Linux, the UNIX operating system that is taking the PC world by storm. To be precise, the Linux kernel (the engine which underlies the operating system) comes from Linus Torvalds, but everything around it on most popular Linux distributions comes from GNU. `Everything around it' means the commands and utility programs that you actually use.

If you are using a Linux system, you are no doubt already using the GNU versions of such basic commands as ls, man, and bash itself. If not, you are most likely running your OS's own versions, but can probably switch to GNU if you try.

Why switch? Two reasons: (1) GNU versions are better ... more powerful to be exact. For example, the GNU version of the make programming utility supports a much richer command language than other versions. And reason (2): because these versions are more powerful, it is assumed by several pieces of the HERMES software suite that they are in use. The make command is the perfect example: HERMES Makefiles (for building our software packages) all use GNU syntax and will not work if you are using a non-GNU version. The same is assumed for the tar utility for creating software archives.

How do you know which version you are using? Well, GNU programs have a signature: they all support dual versions of command line options, one version prefixed with a single `-' character (as in ls -a), and another, more descriptive version that starts with `--' (as in ls --all). This is such a nice idea that it is becoming a standard. Furthermore, all GNU programs support the options --help (gives you a convenient list of command line options) and --version (tells you the program version, which will always include `GNU' for GNU software).

So if you are not using GNU versions by default, how do you switch? You could download your own copy and put it in your personal bin area. But, on Linux systems, most of the programs in /usr/bin and /usr/local/bin are already GNU software, so check before trying this. Note that some GNU utilities do not appear in /usr/bin or /usr/local/bin under their usual names, but are available in these directories with another name: the usual program name preceded by the letter g. A good example is the GNU version of man ... on the HERMES PC farm, it is available as /usr/bin/gman. Another example is the GNU version of tar which is often called gtar. For cases like this, simply create an alias for `man' which points to gman.

Excellent manuals are available for GNU software packages. (The excellent manuals are yet another reason to favour GNU software!) You can find them at the GNU web site: http://www.gnu.org/manual/. Many formats are provided, including HTML for online reading and postscript for printing.
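
The --version test can be wrapped into a few lines of shell. This is only a sketch: is_gnu is a made-up helper name, and the list of tools is just an example. Standard input is redirected from /dev/null so that a program which does not understand --version cannot sit waiting for input.

```shell
# Succeeds if the program given as $1 mentions GNU in its --version output.
is_gnu() {
    "$1" --version </dev/null 2>/dev/null | grep -q 'GNU'
}

for tool in make tar ls gmake gtar; do
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not installed"
    elif is_gnu "$tool"; then
        echo "$tool: GNU version"
    else
        echo "$tool: does not identify itself as GNU"
    fi
done
```

If make or tar turn out to be non-GNU while gmake or gtar are present, aliases such as the one suggested for man are the simplest fix.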

Mastering Man Pages

I have extolled the virtues of the UNIX man pages a few times ... they're not very pedagogical, but they are complete. Once the basic concepts of UNIX are clear to you, reading the man pages for bash, X, or perl, for example, is an excellent way to master the details of those complex systems. Note that there is a man page named X, even though X is not a command (it is a software package). Here are some basic but important features of the man command. Unless otherwise specified, they work with any version of man.

  • searching
    man -k string searches for all man pages whose name or brief description contains the requested string. Your string may contain the familiar shell wildcards *, ?, and [...]. Just be sure to enclose your search string in quotes to prevent globbing: if the shell sees an unescaped wildcard in a command, it will try to replace it with all matching filenames in your local directory! You can also escape individual wildcards by placing a backslash in front of them. man -f string does a more restrictive search, basically only retrieving pages whose names contain the requested string.
  • printing man pages
    Since man pages are such useful references, it is often useful to print them out. The command
	man -c name > manpage.txt

produces an ASCII text file. If your printer is sufficiently smart, you can just print this text file directly. Otherwise you have to use one of the tricks described in the postscript section to first change your file to postscript format.

  • page locations
    man -w name tells you the actual location of the man page file with the given name. This can be useful if there is more than one version. On the PC farm man -w ls gives me this:
	/usr/share/man/man1/ls.1.gz

If there is more than one, then by default you will get the first one. To specifically select a particular man page, you can specify the man page path explicitly with

	man -M /usr/share/man/ ls
  • multiple pages
    As just described, there is sometimes more than one man page with the same name. The default behaviour of the non-GNU man command is to display them all, one after the other. Usually you can just keep hitting space and your pager will advance from the end of one page to the next one. If that fails, typing :q will cause the pager to quit the current man file and go to the next one.
  • sections
    man section name will restrict your man page selection to a particular section (chapter) of the virtual UNIX manual. Sometimes this is important ... here's an example. Suppose you are interested in learning more about the C function printf. If you type man printf on the PC farm, you will instead get a page describing the UNIX command printf. If you type man -k printf and look just for the entries named printf, you will see that there is another page of the same name:
printf (1)                 - format and print data
printf (3)                 - formatted output conversion

To retrieve the page you want, just specify section 3: man 3 printf. By the way, ever wonder what all those section numbers mean? Section 1 contains all executable programs or shell commands ... for the others, check the GNU man page for man itself. Here's another little tip: Each section of the online manual contains a special page called intro, with a brief description of that section.

Compression and Archiving

If you have ever downloaded software for your PC or Mac from the net, you have no doubt encountered data compression schemes. In order to save disk space programmers have come up with ingenious ways of compressing the information in files. An obvious example is text files: since ASCII characters use only the first 7 bits of each byte, they can be readily compressed into binary files that don't waste that last bit. Real compression techniques are much more sophisticated than that, and in fact use rather interesting mathematical algorithms. If you are interested, you might have a look at this introduction to data compression page.

If you come from the PC world, you have probably encountered the zip format, whose files carry the extension .zip and are handled by the zip and unzip utilities. The traditional UNIX utilities compress and uncompress produce files with the extension .Z, using the LZW (Lempel-Ziv-Welch) compression algorithm. But the standard UNIX utilities for file compression and decompression are the GNU programs gzip and gunzip. GNU-zipped (`gzipped') files carry the extension .gz and are compressed using the DEFLATE algorithm (Lempel-Ziv LZ77 coding combined with Huffman coding), which is also the usual method inside .zip files. gunzip is able to decompress not only these files but also those created by the compress and pack programs. Although it comes from UNIX land, gzip format has become so common that even Mac/Windows decompression utilities such as Stuffit Expander know how to deal with it.

Here are the simple UNIX commands for dealing with compressed files:

	gzip file		! Replaces file with compressed version file.gz
	gunzip file.gz		! Replaces file.gz with uncompressed version file
	gunzip -c file.gz	! Leaves file.gz alone, sending uncompressed
				!	data to STDOUT
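
If you would like to convince yourself of what each command does to the files, here is a small round trip in a scratch directory. The file names and contents are invented for the illustration.

```shell
# Work in a scratch directory (remove it with rm -rf when you are finished).
workdir=$(mktemp -d)
printf 'run 12345\nrun 12346\n' > "$workdir/runs.txt"
cp "$workdir/runs.txt" "$workdir/runs.orig"

gzip "$workdir/runs.txt"          # runs.txt is replaced by runs.txt.gz
ls "$workdir"

gunzip -c "$workdir/runs.txt.gz"  # prints the contents, runs.txt.gz stays put
ls "$workdir"

gunzip "$workdir/runs.txt.gz"     # runs.txt.gz is replaced by runs.txt
ls "$workdir"

# The restored file should be identical to the saved copy.
cmp "$workdir/runs.txt" "$workdir/runs.orig" && echo "round trip OK"
```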

Archiving refers to a mechanism for packing a collection of related files into one tar file (stands for Tape ARchive). The most common example is distributed software packages. These packages usually consist of many source files and it is thus natural for the programmer to distribute his or her files as a single archive. The UNIX command for creating such an archive is simply tar. (The origin of the `Tape Archive' nomenclature is that the tar program was designed as a utility for archiving files on a permanent medium like magnetic tape.) Naturally GNU has a version of this important program and it may be called gtar if you are not working on a Linux system. As mentioned before, it is most important that you set up your system so that the GNU version is the default -- HERMES software relies on it. Tar files traditionally carry the extension .tar ... but you will usually find them in a gzipped state, with extension .tar.gz. An alternative extension for gzipped tape archives is .tgz, which reflects GNU-tar's ability to compress on the fly.

Following is an overview of the most common ways of using the GNU version of tar.

First, we can view the contents of a tar file as follows:

	tar -tf file.tar		! List contents of file.tar
	tar -tvf file.tar		! List contents of file.tar in verbose mode
	tar -ztf file.tar.gz		! List contents of compressed archive
	tar -ztf file.tgz		! same

To explain: the -t option asks for a content list, and -f always comes right before the name of the tar file itself. The options -v and -z work with any combination of options. -v always requests verbose mode (`tell me exactly what you did'), and -z engages compression mode (`compress or decompress on the fly').

Here are the most common GNU-tar commands for extracting the contents of a tar file. Note: none of these commands affect the tar file itself. It remains intact even after a full file-extraction procedure ... if you want to delete it after a successful unpacking you must do so yourself.

	tar -xf file.tar		! Extract all contents of file.tar
	tar -zxvf file.tar.gz		! Extract contents of compressed archive
					! 	file.tar.gz in verbose mode
	tar -xvf file.tar f1		! Extract only file f1

Here are the most common commands for creating or modifying a tape archive using GNU-tar:

	tar -cf file.tar dir		! Create tar file of all files in directory
					!	dir AND its subdirectories
	tar -cf file.tar f1 f2  	! Create tar of files f1 and f2
	tar -rf file.tar f1		! Append file f1 to archive
	tar --delete -vf file.tar f1	! Delete file f1 in archive
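
The following sketch strings several of these commands together in a scratch directory, so you can watch an archive being created, extended, listed and unpacked. The directory and file names are invented for the example. It also uses the -C option, which is not discussed above: it tells tar to change into the given directory before reading or writing files, so that the names stored in the archive stay relative.

```shell
# Build a small directory tree to archive.
work=$(mktemp -d)
mkdir -p "$work/dir/sub"
echo "one"   > "$work/dir/f1"
echo "two"   > "$work/dir/sub/f2"
echo "extra" > "$work/f3"

# Create an archive of dir and everything below it.
tar -C "$work" -cf "$work/file.tar" dir

# Append one more file to the existing archive, then list the contents.
tar -C "$work" -rf "$work/file.tar" f3
tar -tf "$work/file.tar"

# Create a compressed archive in one step.
tar -C "$work" -zcf "$work/file.tar.gz" dir

# Unpack the compressed archive somewhere else and compare one file.
mkdir "$work/out"
tar -C "$work/out" -zxf "$work/file.tar.gz"
cmp "$work/dir/sub/f2" "$work/out/dir/sub/f2" && echo "extracted copy matches"
```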

Printing

Actually printing a document or file turns out to be very system dependent. If you are working at home, you probably have a dedicated printer hooked to your local system. If you are at a lab, you probably have access to hundreds of printers which are sitting on the local network; some may be access controlled, but in general you have the ability to print any old embarrassing thing out anywhere on your local network ;) So here, just a few words about printing at DESY itself, and then we'll direct you to documentation from the DESY IT division and UCO.

The printers on the DESY network are fed by two spooler machines; these map the printer name to its actual network address, and send your file off to that address in an orderly fashion. You will need to set up your local system printer control program (on Linux and Macs it is CUPS) to correctly communicate with the spoolers. You can find extensive information and examples at DESY IT for a number of different operating systems and even flavours of Linux. Follow the sidebar link to "Services" then the mainpage link to "Printing." I just note here that the "standard" HERMES BW postscript printer in Bldg. 1e is HMSPS1. If you're looking for a particular type of printer (e.g. duplexing color), the DESY printer list page is really useful.

Customizing

One of the great joys of UNIX is that you can customize your computing experience in just about any way you want. Following is an overview of the principal ways of doing this. One important customization technique (X resources) is left to a later section.

rc files

You have already encountered the .bashrc and .cshrc files that are executed whenever you start a new shell. In fact a large number of UNIX programs support such rc files (where rc stands for `run commands'). The editor vim for example looks for a file .vimrc in your home directory ... if it exists, the optional editor settings contained within are run during the program's startup phase. Here is a portion of my .vimrc file:

	set ic
	set nowrapscan
	set hi=100
	set undolevels=100

The syntax for these lines is obviously specific to vim. As is usually the case, these customization commands can be given from within the program ... it is just more convenient to bundle your favourites together into an rc file. The most common use of rc files is for shells, editors, and window managers. You will find some people with very elaborate configuration files for these frequently-used utilities.

The syntax of the rc file is always explained in the program's man page. The man page may also reveal that the rc file has an unexpected name ... the lynx browser for instance looks for lynx.cfg by default. It is also possible to alter the expected name of the rc file, either with a command line option or an environment variable.

Environment variables, command line options, and aliases

Again, read the man pages. Most programs support a number of special environment variables which allow you to customize their operation. We have already encountered some examples ... e.g. the variable MANPATH which is checked by the man program. Your favourite environment variable settings should go in your shell's startup files.

Most programs also support a number of command line options. If there are command-line options which you use frequently, a good idea is to create aliases that incorporate those options. Pedantically speaking (and that is our job here), one should note that "aliases" do not exist for the original Bourne shell. For most variants of the Bourne shell, however, they do exist and are so useful that we describe them here. For example, the command

	alias d='ls -sFC'	! (sh version)
	alias d 'ls -sFC'	! (csh version)

creates a new command called d which lists the contents of a directory in a particularly useful way. (By the way, I sneaked this nice alias into the example .bashrc and .login files I gave you earlier). Here's another example:

	alias gprint 'gunzip -c \!* | lpr -Phermesps1'	! (csh version)

creates a useful command that will unzip a compressed postscript file on the fly before sending it to the printer. Note the special C shell syntax \!*. This represents `whatever you typed next on the command line'. It enables you to use your new alias like this: gprint file.ps.gz. (Bourne-type shells such as bash do not understand \!* in aliases; there you would write a shell function instead.) Please note that you must enclose these characters in single or double quotes ... if not, the shell will try to glob the wildcard * into a list of all matching filenames.
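
In bash, an alias cannot splice its arguments into the middle of a pipeline, so the same idea is usually written as a shell function, where "$@" stands for whatever you typed after the command name. The sketch below is one possible version. GPRINT_CMD is a made-up variable (not a standard one) that lets you replace the printer command, which is handy for a dry run; the printer name is the one used in the alias above.

```shell
# Hypothetical bash counterpart of the gprint alias.
# "$@" expands to all arguments given to gprint, each kept intact.
# GPRINT_CMD is an illustrative variable; if it is unset, lpr is used.
gprint() {
    gunzip -c "$@" | ${GPRINT_CMD:-lpr -Phermesps1}
}
```

Used as gprint file.ps.gz it sends the uncompressed file to the printer. Setting GPRINT_CMD=cat beforehand shows what would be printed without actually printing it.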

The customization options available via rc files, environment variables, and command line options frequently overlap (there's more than one way to skin a cat!) The usual priority is ``command line options override environment variables which override rc files''.

Shell variables

Just like many other programs, the shell can be customized using environment variables. But it is also aware of a second set called shell variables. These are the ordinary variables that are used extensively in scripts but are not exported to child shells ... we saw them in action before. Now each shell offers a number of variables that have special meaning to it. The C shell special variables are in lowercase while the Bourne shell ones are usually in full caps. Some are set automatically ... e.g. cwd (C shell) and PWD (Bourne shell) always contain your present working directory. (If you don't believe me, have a look at their values.) Other special variables customize the shell's behaviour. You will find complete lists of all these variables in the bash and tcsh man pages, under the heading `Shell Variables'. A few examples are presented below.

You can list the values of all your defined shell variables by typing set; printenv does the same thing for your environment variables. You un-define shell variables using unset. But to actually set the variables, the syntax differs between shells, as we saw earlier:

	myvar="xxx"		! Bourne shell
	set myvar = "xxx"	! C shell

So variable assignment in bash does not need a set command ... but surprise! There is a bash set command. It is rather unusual -- it takes options: set -o option. Many of these strange options provide the same functionality that tcsh accomplishes with shell variables. To unset such an option, use set +o option. To list the current status of all options, type set -o on its own.

Here is another important difference between the shell flavours: in Bourne shell, environment variables are just a special case of shell variables ... but in C shell they are completely independent -- you can have an environment variable and a shell variable with the same name, but different values. This explains the use of unset in bash for both shell and environment variables, while C shell requires the special unsetenv command for environment variables.
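If you would like to see the Bourne shell side of this for yourself, the following sketch may help. It assumes a Bourne-type shell (sh or bash); the variable name myvar is just the one used in the example above.

```shell
unset myvar                   # start from a clean slate
myvar="xxx"                   # an ordinary shell variable, not exported

# A child shell does not inherit un-exported shell variables.
before=$(sh -c 'echo "${myvar:-not set}"')

export myvar                  # promote it to an environment variable

# Now the child shell sees it.
after=$(sh -c 'echo "${myvar:-not set}"')

unset myvar                   # one unset removes it, exported or not
gone=$(sh -c 'echo "${myvar:-not set}"')

echo "before export: $before"
echo "after export:  $after"
echo "after unset:   $gone"
```

The single quotes matter: they stop the parent shell from expanding $myvar, so it is really the child that looks the variable up.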

So with that introduction, here's a taste of the sort of shell variables and bash set options that you can use to customize your shell's behaviour:

  • prompt (tcsh) and PS1 (bash): Contains the `prompt string' that you see every time the shell is ready to accept another command. There are many special codes you can include in this string, to do such neat things as display your present working directory or the current time. The example bashrc and login files provided earlier contain a useful setting for the prompt variable.
  • history (tcsh) and HISTSIZE (bash): Contains the number of commands to remember in the history list, so that you can retrieve them using the up arrow. If this variable is set to 0, you will not be able to use the up arrow at all. In bash, you also need to enable the entire mechanism with set -o history, but this is the default.
  • cdpath (tcsh) and CDPATH (bash): Contains a list of directories in which the shell looks for destination directories when you use the cd command. The list is space-separated in tcsh, and colon-separated in bash.
  • noclobber (tcsh) and set -o noclobber (bash): If this variable / option is set, the shell is prevented from overwriting an existing file when you use output redirection (e.g. cat newfile.txt > oldfile.txt will fail).
  • noglob (tcsh) and set -o noglob (bash): Prevents globbing of wildcards such as * and ? on the command line. Can be quite useful in scripts!
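Here is a short sketch of the last two options in action, written for bash (the set -o form). It works in a scratch directory created with mktemp, so nothing of yours is touched; the file name oldfile.txt is simply the one from the bullet above.

```shell
workdir=$(mktemp -d)              # scratch directory for the demonstration
echo "old contents" > "$workdir/oldfile.txt"

set -o noclobber                  # refuse to overwrite existing files via >
if { echo "new contents" > "$workdir/oldfile.txt"; } 2>/dev/null; then
    clobbered=yes
else
    clobbered=no                  # the redirection was refused
fi
set +o noclobber                  # switch the option off again

set -o noglob                     # wildcards are no longer expanded
literal=$(echo "$workdir"/*.txt)
set +o noglob
expanded=$(echo "$workdir"/*.txt)

echo "overwritten:    $clobbered"
echo "with noglob:    $literal"
echo "without noglob: $expanded"
```

With noglob set, the * comes through untouched; without it, the shell replaces the pattern by the matching filename.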

There are many more ... read the man pages for bash and tcsh if you are interested.

Power Tools

UNIX has many powerful commands and utilities that can help you in your work. Let me just introduce you to some, so that you know what's available. If you need one of these power tools, consult your UNIX book or the man pages to learn more about it.

Here are some simple commands you must know about:

	diff file1.txt file2.txt	! compare two files and report differences
	diff -w file1.txt file2.txt	! ignore differences that are only whitespace
	head -200 file.txt		! extract the first 200 lines of a file
	tail -30 file.txt		! extract the last 30 lines of a file
	cut -f 2-4,6 file.txt		! extract tab-separated columns
					! 	2, 3, 4, and 6
	cut -d ' ' -f 2-4,6 file.txt	! use spaces (not tabs) as the
					! 	column separator
	sort -k 1,2 file.txt		! sort a file alphabetically using whitespace-
					! 	separated columns 1 thru 2
					! 	(old syntax: sort +0 -2)
	wc file.txt			! count the lines, words, and characters in a file
	wc -l file.txt			! count the number of lines in a file
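To see how these commands combine through pipes, here is a sketch that runs them on a tiny made-up table. The run numbers, years, and event counts are invented for illustration only. Note that head -n 1 is the current spelling of the older head -1 form used above.

```shell
runs=$(mktemp)
# Invented columns: run number, year, number of events (tab-separated).
printf '103\t1998\t6200\n101\t1997\t5000\n102\t1998\t7500\n' > "$runs"

first_line=$(head -n 1 "$runs")                          # first line of the file
last_run=$(sort -k 1,1 "$runs" | tail -n 1 | cut -f 1)   # highest run number
nyears=$(cut -f 2 "$runs" | sort -u | wc -l)             # number of distinct years
nlines=$(wc -l < "$runs")                                # number of lines

echo "first line:     $first_line"
echo "last run:       $last_run"
echo "distinct years: $nyears"
echo "lines:          $nlines"
```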

Here are some more complex utilities that you should look up if you need them:

  • cron: Allows you to schedule a process to run at some later time or at specified time intervals, even when you are not logged in.
  • sed (stream editor): Edits lines passed through it via pipes. This is extremely useful for doing string manipulation in shell scripts.
  • test: Evaluates all manner of logical expressions, and can perform extensive checks on files (e.g. checking if they exist or are executable). This is extremely useful in shell scripts.
  • awk: A powerful utility for extracting columns of data from text files and performing mathematical operations on them. I've seen people do actual data analysis using awk! However ...
  • perl: is awk's big brother. Perl is a complete scripting language of great power, but can also run in command-line mode to mimic awk. It is an even more powerful scripting language than bash and tcsh, particularly in the area of string manipulation. Once you learn perl, you will probably stop writing shell scripts altogether. Perl is the number-one language for writing Common Gateway Interface (CGI) scripts on the web -- our entire HERMES documents database software plus the CGI scripts that allow you to search it are written in perl.
  • grep: This command searches through files for strings or string patterns. These patterns are called regular expressions and can be very complex indeed. You must become familiar with regular expressions and grep ... you have no idea how much you will use them! Also, there is an extended version of grep called egrep, which offers some additional features but also loses some.
  • find: Searches for files in directory trees, and optionally executes commands on those files.
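As a small taste of two of these tools, the sketch below uses test to check a file and awk to do arithmetic on a column. The two-column table (run number, event count) is invented for the example.

```shell
table=$(mktemp)
# Invented data: run number and number of events, space-separated.
printf '101 5000\n102 7500\n103 6100\n' > "$table"

if test -s "$table"; then        # true if the file exists and is not empty
    # Sum the second column.
    total=$(awk '{ sum += $2 } END { print sum }' "$table")
    # Count the rows whose second column exceeds 6000.
    nbig=$(awk '$2 > 6000 { n++ } END { print n }' "$table")
    echo "total events: $total"
    echo "runs with more than 6000 events: $nbig"
fi
```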

Let me give you some important historical background. In the beginning, there was ed, the most venerable of UNIX text editors. Ed is a simple line editor, but contains a powerful syntax for doing text manipulation. This syntax includes regular expressions for search patterns. Later, UNIX programmers built ex and vi on top of ed, including such sophisticated features as the capability to display more than one line and edit more than one file at a time. In the meantime, sed was born, which allows string editing on the fly via pipes. grep was also created, to search files for regular expressions without editing them. The shell itself also uses some fragments of regular expression syntax to do filename globbing (e.g. the wildcards * and ?). And finally the string manipulation operations in awk and perl resemble those of ed. Here are some examples of ed's venerable syntax at work:

	grep '\<not\>' *.txt

finds all occurrences of the word 'not' in all .txt files, but does not match such things as 'note' or 'cannot'

	echo 'One word' | sed 's/[oO]/X/g'

replaces both O and o in the input string with X ... try it!

	ls [a-z]*

lists the names of all files starting with a lowercase letter in the current directory

  • From within vi:
	:%s/be \([a-z]\{1,\}\) careful/remain \1 calm/g

replaces `be very careful' with `remain very calm', or `be slightly careful' with `remain slightly calm', throughout the entire file.
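The vi substitution works unchanged in sed, which makes it easy to try these patterns out from the command line. The sketch below feeds a few invented test strings through grep and sed; it assumes a grep that understands the \< and \> word anchors (GNU grep does).

```shell
# grep: count the lines that contain the whole word "not".
nmatch=$(printf 'do not panic\ntake note\nyou cannot\n' | grep -c '\<not\>')

# sed: the same back-reference substitution as the vi command above.
calm=$(echo 'be very careful' | sed 's/be \([a-z]\{1,\}\) careful/remain \1 calm/g')

# sed: the character-class example.
swapped=$(echo 'One word' | sed 's/[oO]/X/g')

echo "$nmatch"
echo "$calm"
echo "$swapped"
```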

To summarize: the ultimate resource for all this wonderful syntax is the ed man page. Or a good book, of course. :-)

Finally, let me extol the virtues of the find command with a couple of examples. Suppose you are working with a data file produced by the HERMES reconstruction program HRC, and are bewildered by the variable rcTrack.AngleCorr ... what exactly does it mean? There is no manual for HRC, so you have to use the source, Luke:

     find /hermes/pro/hrc -name '*.[ch]' -exec grep rcTrack.AngleCorr {} \; -print

That will locate all C source code files or include files that use this variable. And here's another find sledgehammer, but let me warn you that the following is a system and probably shell dependent trick! Suppose you are compiling a program, and the loader complains that it cannot find some module it needs called xsltGetUTF8Char. When this happens, it usually means that your program needs some non-standard library that you have not specified on the compilation line. What library could possibly contain this bizarre routine? It is likely to be in /usr somewhere ... and the library must end with either .a or .so (static or shared-object) or it is a single object ending in .o . Unfortunately, the shared object files are not configured on the PC farm to be searched with the following, but I hope you get the idea behind the trick. Here we go (warning - this might take a long time!):

	find /usr/ \( -name 'lib*.a' -o -name 'lib*.o' \) -exec nm -go {} \; 2>&1 | \
		grep " T xsltGetUTF8Char"

The shared object file version of this command is:

	find /usr/ -name 'lib*.so' -exec nm -Do {} \; 2>&1 |  grep " T xsltGetUTF8Char"

There's your answer: xsltGetUTF8Char is a module from the library /usr/lib/libxslt.a AND in /usr/lib/libxslt.so... better add that to your compilation line! Later, we'll discuss just how you can tell which one your program linked to. Note that you'll see multiple entries for every time the symbol occurs, but the point is you found the library. Be aware that other libraries (especially those containing the basic C routines) are not under /usr but under /lib (at least at present on the PC farm). Note that these commands are meant for Bourne/bash/Zsh shells; for C/Tcsh shells you must replace "2>&1 |" by "|&".
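If the 2>&1 in front of the pipe looks mysterious, this little Bourne shell sketch shows what it does. The function noisy is a made-up stand-in for any command that writes to both standard output and standard error.

```shell
# A stand-in for a command that writes to both stdout and stderr.
noisy() {
    echo "regular output"
    echo "error message" >&2
}

# Plain pipe: only stdout reaches wc (stderr is thrown away here).
stdout_only=$(noisy 2>/dev/null | wc -l)

# With 2>&1 before the pipe, both streams travel through it.
both=$(noisy 2>&1 | wc -l)

echo "lines through a plain pipe: $stdout_only"
echo "lines with 2>&1:            $both"
```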

Suitably impressed? You have no idea how often one uses tricks just like this. Those two examples are particularly common ... I suggest you read enough about grep and find to figure out what they mean. You can also refer to the Wiki on UNIX Tricks!
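To experiment with the first find trick without waiting for a search of a real source tree, you can build a throwaway directory and run the same pattern on it. The file names and contents below are invented. This sketch uses grep -l, which prints the name of each matching file, in place of the trailing -print.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/src" "$tree/doc"
echo 'float AngleCorr;'       > "$tree/src/track.h"
echo 'int unrelated;'         > "$tree/src/other.c"
echo 'AngleCorr is explained' > "$tree/doc/notes.txt"

# Only files ending in .c or .h are handed to grep, so notes.txt is skipped.
hits=$(find "$tree" -name '*.[ch]' -exec grep -l AngleCorr {} \;)
echo "$hits"
```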

Networking and X11

Internet Basics

Why do you need to know anything about the internet? Simple: it's impossible to perform a physics analysis without it. Even if you install the HERMES software on the machine on your desk, you'll have to filter the data files on the HERMES PCFarm and pull them over to your own machine. Most likely you will run your entire analysis on a machine located in another building, or even on another continent. All this fabulous connectivity is made possible by the internet, an impossibly-huge amorphous network that connects millions of computers around the world. Time for some jargon.

TCP/IP (Transmission Control Protocol / Internet Protocol) is the way that computers on the internet talk to each other. Each machine that wants to talk TCP/IP to another machine must have an IP address so that the machines in between know where to send the information. An IP address is a 32-bit number that must be unique on the entire global network. This number is split into four 8-bit bytes; for example, 131.169.245.44 is the IP address of the HERMES interactive login node "worf".

IP addresses can also be referenced as names: 131.169.245.44 is equivalent to worf.desy.de, and 128.174.129.135 is equivalent to zero.physics.uiuc.edu. This association is managed by a Domain Name Server (DNS). Each DNS is in charge of assigning names to one domain of addresses. These domains are arranged in a hierarchical fashion: the entire *.gov, *.edu, *.org, etc blocks of addresses are top-level domains. Below that, *.uiuc.edu is the University of Illinois domain with addresses 128.174.*.*. The block of addresses *.physics.uiuc.edu = 128.174.135.* is not considered a domain, since it does not have its own DNS. The utility program nslookup can be used to query the DNS. If you type nslookup worf.desy.de from "worf", for example, you will get the numeric IP address. You will also see that the DNS which was consulted is 131.169.40.200 (milky.desy.de) -- desy.de is a domain, and has its own DNS.

Your IP address can also be logically split into a subnet and a host. Somebody, somewhere is the immediate ISP (Internet Service Provider) for your network services. This ISP has a pool of IP addresses to dispense, and splits them into one or more subnets. Each subnet is then given to a network administrator, who assigns host addresses to the various computers under his or her dominion. For example, the machine zero.physics.uiuc.edu lives on subnet 128.174.135, and we have a network administrator who assigns host names like `zero' and addresses like 128.174.135.20 to all of our machines.
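The `four bytes in one 32-bit number' idea is easy to check with shell arithmetic. The sketch below assumes a shell with 64-bit arithmetic (bash qualifies); the function name ip_to_int is made up for the example. The subnet/host split at the end assumes, as in the example above, that the first three bytes name the subnet.

```shell
# Convert a dotted IP address to the single 32-bit number it represents.
ip_to_int() {
    a=${1%%.*};  rest=${1#*.}      # first byte, then the remainder
    b=${rest%%.*}; rest=${rest#*.} # second byte
    c=${rest%%.*}                  # third byte
    d=${rest#*.}                   # fourth byte
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

ip=128.174.135.20
subnet=${ip%.*}                    # everything before the last dot
host=${ip##*.}                     # everything after the last dot

ip_to_int 131.169.245.44
echo "subnet $subnet, host $host"
```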

TCP/IP communication is accomplished using packets, which are just little bunches of data. These little packets are sent from machine A to machine B one at a time. If the two machines are located on different subnets, the packets are sent through one or more intermediate computers called routers (or gateways). The essence of the IP protocol, and one of the great innovations of the internet, is that it provides packet-switched network communication. This means that each packet can potentially take a different route from A to B. The alternative is a circuit-switched network, such as the telephone system, where a physical, dedicated connection is made between both ends. One downside to this is that no other phone conversation can share the same connection ... and so during periods where one or both people are silent, considerable bandwidth is being wasted. A packet-switched network is also `bomb-proof', in that it is highly robust against the failure of individual routers. If the routers go down somewhere between you and me, the packets will just be rerouted somewhere else.

There are three main TCP/IP packet types. TCP (Transmission Control Protocol) is the main type. It is a connection-based protocol: a connection is opened when two machines (actually two processes on different machines) start talking to each other, and closed when they are done. TCP guarantees packet delivery, and most connection protocols (SSH, HTTP, ...) use it. A simpler packet type is UDP (User Datagram Protocol). This protocol is connectionless: each packet (datagram) is sent on its own, without a connection being set up first. UDP is usually less resource-intensive than TCP, but does not guarantee packet delivery. It is used by utilities like traceroute. The traceroute program is fun to play with, actually ... you hand it a remote IP address, and it will trace the sequence of routers the packets go through. Unfortunately, for bandwidth and security reasons, a lot of machines these days have disabled UDP access. Finally, ICMP (Internet Control Message Protocol) packets are error-reporters that are sent when transmission problems occur; ping also uses ICMP to check whether a remote machine is accessible or not.

The internet is a strange beast. It is not owned or operated by any person or company. Rather it is a collection of open standards to which people around the world voluntarily adhere. Development of the internet is largely accomplished by means of RFCs (Requests for Comments). Much like our own HERMES internal notes, these are papers distributed to the internet community at large, suggesting new standards or technologies. These documents are reviewed, and if people like something because it's a really good idea, it gets adopted. Quite lovely, really. :-) If you are interested in more information about the fascinating world of the internet, a great place to go is Connected: An Internet Encyclopedia.

Making Connections

Internet Protocols

There are lots of ways to connect to a remote computer over the internet. You can log in, send graphics, check if the machine is alive, talk to people interactively, and retrieve files and web pages. Each connection actually consists of one process (i.e. a running program) communicating to another over the net, using a set of rules known as a protocol. Protocols are rather like languages: each is designed to facilitate a certain type of communication. The language of mathematics is excellent for explaining physics, English is wonderful for poetry and elaborate sarcasm, while Italian is ideal for operatic arias. :-) Here are some of the main protocols you will encounter on the internet:

  • HTTP (HyperText Transfer Protocol) is the most popular protocol for the delivery of web pages. You know how web addresses almost always start like this: http://hostname.domain/...? That initial part of the URL (Uniform Resource Locator = web address) specifies the protocol to use when transferring the document. You may have seen another URL prefix: ftp://.... That specifies the FTP File Transfer Protocol. Other possibilities are gopher://, telnet://, and file:// (the latter is for local files on the same machine as the browser).
  • telnet, rlogin, and SSH (Secure SHell) are protocols for logging in to a remote machine. SSH is rapidly supplanting the other protocols because it is more secure. When you log in to a machine via the telnet protocol, you give your username and password to establish the connection. Nothing in a telnet session is encrypted, and that includes the password itself! It is sent over the net as clear text, and is vulnerable to the password-sniffer programs used by hackers. The SSH protocol gets around this problem by encrypting the whole session, password included, using an ingenious method. (If you're interested, check the web or the SSH man page).
  • ftp (File Transfer Protocol) is designed for the efficient exchange of files between machines. Unfortunately it suffers from the same security problems as telnet, so most systems have made it impossible to use this program except under very limited circumstances. You can transfer files over the SSH protocol instead, using the command sftp.
  • The X11 protocol is the principal method of transmitting graphics information between UNIX machines.
  • SMTP (Simple Mail Transfer Protocol), POP (Post Office Protocol), and IMAP (Internet Message Access Protocol) are all ways of delivering email messages.
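Since the protocol is simply the first part of a URL, a shell script can pick a URL apart with nothing more than parameter expansion. Here is a sketch; the path index.html is invented for the example, and the variable names are arbitrary.

```shell
url='http://www-hermes.desy.de/index.html'   # index.html is a made-up path

protocol=${url%%://*}      # text before "://"
rest=${url#*://}           # text after "://"
server=${rest%%/*}         # up to the first slash
doc_path=/${rest#*/}       # the remainder, with its leading slash restored

echo "protocol: $protocol"
echo "server:   $server"
echo "path:     $doc_path"
```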

These protocols are all very nice, but are only useful because there are programs which use them. The software that accomplishes the actual communication between machines is subdivided into clients and servers. The client/server model can be found everywhere in modern computing, but it is easy to understand. Let's take a familiar example: displaying the HERMES home page in your web browser. The protocol in use is HTTP. Your web browser (Firefox or Internet Explorer) is the client here: it is requesting information from a remote site. For the page to be successfully transmitted, there must also be a web server running on the HERMES machine -- this program has to `serve up' the data requested by the client. The server program in this case is called httpd. That final `d' makes a frequent appearance in the names of server programs ... it stands for daemon, which is basically another word for server. For example, the server for telnet requests is telnetd, sshd handles SSH communication, and ftpd serves up files using the FTP protocol. If you look at the full process list on any machine you will see many of these daemons running under the root account, all waiting patiently for requests from client programs.

Logging In and Copying Files

But now down to practical matters: what you really need to know of course is how to use the client programs! Here we go:

ssh
Since security is such a big deal these days, ssh has supplanted telnet as the login client of choice. More and more machines have even disabled telnet access entirely -- and the HERMES PC farm machines have disabled it as well. You cannot get in via telnet unless you are connecting from within the desy.de domain. Here's how you log in with ssh to worf, one of the machines which you can connect to from outside DESY:
	ssh -l username worf.desy.de

You can do fancy things with ssh, like setting up your accounts so that you never have to type in your password ... but I'll leave that for you to ponder. :-) One other thing about ssh: you can also use it to execute commands on remote machines, without logging in. Let's use the date command as an example. If you want to ask what time it is in Germany, do this:

	ssh username@worf.desy.de 'date'

The date command (or anything else you supply) will run on worf, and you'll see its output ... all without logging in. The ls command is particularly useful, since executing it via ssh is a quick way to get a list of files on the remote machine.

sftp
On to file transfer. To retrieve files from a remote machine, you can use the interactive program sftp. In the example below, you're only supposed to type in the stuff in boldface ... the rest is produced by the program. Depending on the system, you may not see ****** (but you won't see your real password in any case).
	sftp makins@worf.desy.de
	Connecting to worf.desy.de...
	makins@worf.desy.de's password: ******
	sftp>

At this point, there are a number of commands that you can use:

	sftp> ls		! lists files in current directory on remote machine
	sftp> cd doc	! moves you into directory doc on remote machine
	sftp> !pwd	! tells you where you are on your local machine
	sftp> !command	! executes command in spawned shell on local machine
	sftp> lcd mydoc	! moves you into directory mydoc on local machine
	sftp> get f.txt	! copies remote file f.txt to current directory on
			!	local machine
	sftp> put t.txt	! copies local file t.txt to remote machine
	sftp> help	! produces a list of all sftp commands
	sftp> help hash	! gives help for the mysterious hash command
	sftp> quit	! disconnects
scp
Like telnet, ftp transmits passwords in clear text. It has thus been supplanted as the standard file transfer utility by scp, another program of the SSH suite. scp is marvelous: it works just like the UNIX cp command, but with a host designation prepended to the remote filenames:
	scp makins@worf.desy.de:junk.txt ./myjunk.txt
	scp myjunk.txt makins@worf.desy.de:

copies one file either from or to the hermes SGI. You can also copy multiple files:

	scp 'makins@worf.desy.de:doc/junk*' .

Note the single quotes around the remote filename description. That is important: if you leave them out, the shell will glob the wildcard, and try to expand doc/junk* to include any matching filenames on your local machine ... which is not what you want. Finally, you can copy entire directories recursively:

	scp -r makins@worf.desy.de:doc .

By the way, rlogin, rsh, and rcp are older clients that do the same things as ssh -l, ssh, and scp ... but without password encryption. So they've also gone the way of the dinosaur.
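The quoting advice for scp wildcards is worth seeing in action. The sketch below uses echo on a scratch directory, with invented file names, to show what the local shell does to a wildcard before the command ever sees it.

```shell
scratch=$(mktemp -d)
touch "$scratch/junk1.txt" "$scratch/junk2.txt"

# Unquoted: the local shell expands the wildcard before the command runs.
unquoted=$(cd "$scratch" && echo junk*)

# Quoted: the pattern is handed over untouched, which is what scp needs
# so that the matching can be done on the remote machine.
quoted=$(cd "$scratch" && echo 'junk*')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```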

Web Browsers

Firefox
Since you are reading this tutorial, you no doubt know something about web browsers. Web clients like Firefox are all dolled up with lovely buttons and pull-down menus, so there's not much to explain. However, here are a few less-than-obvious tips:
  • disk cache: Firefox does something rather smart to speed up its operations. Every time you download a web page, Firefox stores a copy of it in your disk cache, which is simply a directory on your account. You may find the cache under your home directory, in the subdirectory ~/.mozilla/firefox/xxxxxxxx.default/Cache, where "xxxxxxxx" is a string generated by Firefox. The reason for this cache is to speed up operations: if you revisit a page that is already in your cache, Firefox renders the local version -- it's much faster than downloading the whole page again over the internet. However ...
  • shift-reload: Have you ever edited a web page, clicked the reload button in Firefox, and found that nothing has changed? The solution: hold down the SHIFT key when you click reload. This forces the browser to reload the page. Otherwise, the browser simply compares the remote version of the page to that in cache. If it finds no differences, it renders the local version in cache. This technique sometimes fails. For example, new versions of included graphics or style sheets are not picked up as differences by the browser.
  • lock files: When Firefox starts up, it creates a little file called lock in your ~/.mozilla/firefox/xxxxxxxx.default directory. This file is then deleted when you exit your browser. The lock file prevents two Firefox processes from running at the same time ... confusion would arise, e.g., if each process tried to edit the same bookmark file. Sometimes it can happen that Firefox crashes, and does not delete the lock file. You will then get a warning when you start up a new Firefox process. To overcome this problem, simply delete the lock file.
lynx
Lynx is an old text-based web browser. That's right, no graphics at all, it runs directly in your terminal window. You might think Lynx is no longer useful in these days of high-speed connections and fancy web graphics. But not at all. It is still useful, for two reasons. (1) It's fast! If your net connection is slow, consider Lynx. (2) It's scriptable! What I mean by this is that since you can invoke it directly from the command line, like this:
	lynx http://www-hermes.desy.de/

you can call it from a shell script. That can be quite handy ... I'll leave you to dream up some applications. Here are some useful modifiers that you can hand to Lynx for scripting applications:

	lynx -dump -nolist -width=132 http://www-hermes.desy.de/

The -dump option means `retrieve the web page to STDOUT and quit' -- perfect for use in a script! Option -nolist means do not include a list of linked URLs at the end of the dump, and -width=132 means render the page using a text width of 132 columns (rather than the usual 80). Dumping this particular page is not so interesting, but I hope you get the idea. Another great use for lynx is when you don't have a VPN with a lab, but you want to access an online journal which gives you access only if you have the correct domain name. You can log on to a machine at the lab, then run lynx; you can use X to open a browser on the lab machine, but it will be glacially slow.

X11

X11 (or just `X') refers to the language (or protocol) that is most commonly used to transmit and display graphics information on UNIX computers. When you run a program like xterm to create a terminal window, the necessary graphics specifications like foreground colour, background colour, window size, font, etc are encoded using the X11 protocol. Just like the other protocols we learned about, the programs which use X are subdivided into clients and servers. The X server runs constantly on any machine with X support and does the actual drawing of things on your screen. Meanwhile all sorts of X client programs tell it what to draw.

At the heart of the X window system are its programming libraries. There are several of them, and they are collectively known as the X Toolkit. Here are the principal ones:

  • Xlib: X basic stuff
  • Xt: X Toolkit intrinsic functions
  • Xaw: X Athena Widget set
  • Xmu: X Miscellaneous Utilities
  • Xext: X extension library
  • oldX: routines from old version X10

All X11 software typically lives under the directory /usr/X11R6. The R6 designation means `release 6' which is the most current version of X at the moment. The X client and server programs are in /usr/X11R6/bin, so it's important to have this in your PATH. You may also find these programs under some other directory, like /usr/bin/X11, which is simply annoying. >:-| The libraries live in /usr/X11R6/lib, while other important X components like the font directories are pathologically located in /usr/X11R6/lib/X11. The man pages for X and its programs are in (you guessed it!) /usr/X11R6/man.

Let's talk about X clients for a second. Basically any program that produces graphics output is an X client, but there are a number of key clients that are distributed along with the X distribution. Perhaps the most important example is the xterm program. Try running it in the background: xterm &. Poof! You have a new terminal window. There is probably some cute button or menu on your screen that will create such a window for you ... all it's doing is running the xterm program. Other clients distributed with X are xclock (a clock), xcalc (a calculator), and the cutesy CPU-chewer xeyes (an annoyance). In addition there are many clients of a technical nature that can be very useful if you quest for X Wizardry. We will encounter some of them below.

One of the most powerful features of X is that it is intrinsically network aware. If you are running on a UNIX machine right now, it is for sure running an X server which is busy drawing all the stuff on your screen. Now open a new terminal window and log in to some remote computer ... here's the neat part: any X client on the remote machine can easily be told to send its graphics instructions to the X server on your desk ... et poof! The graphics output of that application will appear on your monitor. The key is the environment variable DISPLAY, which contains the X address of the server to which all graphics and other X directives should be sent. All you need is the address of your X server, which is easy: if your local machine has the address mycomputer.uabc.edu, do this on the remote machine (if you are running bash):

	export DISPLAY='mycomputer.uabc.edu:0.0'

That puts your remote clients permanently in touch with the X server on your desk. The :0.0 at the end specifies displaynumber.screennumber. These numbers are only different from 0 on elaborate systems where more than one monitor, keyboard, or mouse is being controlled by the same X server. You must specify the display number, even if it is zero. But the screen number does default to 0 so you can omit the .0 at the very end.
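To make the host:displaynumber.screennumber format concrete, here is a sketch of a shell function that splits a display address into its parts and applies the default screen number. The function name parse_display and its variable names are made up for the example.

```shell
# Split an X display address of the form host:display.screen.
parse_display() {
    address=$1
    display_host=${address%:*}        # text before the last colon
    numbers=${address##*:}            # "display" or "display.screen"
    display_no=${numbers%%.*}
    case $numbers in
        *.*) screen_no=${numbers#*.} ;;
        *)   screen_no=0 ;;           # the screen number defaults to 0
    esac
    echo "host=$display_host display=$display_no screen=$screen_no"
}

parse_display 'mycomputer.uabc.edu:0.0'
parse_display 'localhost:22'
```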

X forwarding with ssh

Here is a better way of rerouting X instructions. If you connected to the remote machine using ssh, you have nothing to do -- DISPLAY was automatically set when you logged in. If you connected to a machine in the HERMES PC farm, for example, and you do echo $DISPLAY, you will see that it is set to something like localhost:22.0. This seems odd, since graphics are now being directed to your local machine, not the remote machine. But this is how ssh does its thing: it has established a new display on the remote machine's X server which simply points back to your desktop monitor.

Most ssh installations don't perform this X forwarding automatically anymore, due to concerns about security. But you can enable "trusted" X forwarding yourself. One way is with the ssh command line argument -XY. The other is more convenient: create your own configuration file for ssh. When you first run ssh on any machine, it will create a directory called .ssh in your home area and place some technical files there. Create a file called config in this directory, containing the following lines:

        ForwardX11  yes
        ForwardX11Trusted yes

There you go, now full X forwarding will always be enabled. However, you may want to control who is really trusted; in this case, you could leave the "ForwardX11" set in the config, but use the -Y on the command line. Without the "trusted" X connection, some of the commands below such as xrdb, which could be issued on a remote machine, will not affect the X server you are using on your local machine.

Guess what: you can not only export X graphics to a remote machine, you can also retrieve them. The X window dump command xwd takes a screenshot of a remote display and dumps it to a file. For example,

	xwd -out screenshot.xwd -root -display host:0

will grab the entire screen (also called the root window) from machine host and store it in the output file you specified. This graphics file will be written in X's special `window dump' graphics format, and can be displayed by most graphics programs. If you omit the -display host:0 option, the address in the DISPLAY environment variable will be used. If -root is omitted, you are implying that you want to capture only one window not the entire screen ... the program will then wait for you to select a window by clicking on it. This can be quite useful for e.g. capturing part of a Firefox page to a file for printing: you can crop or rescale the image so that it fits on one page.

The xwd command is very useful for capturing screenshots of your display ... but thanks to X networking, you can also use it to grab someone else's screen! Clearly this is something you would like to prevent on your own display! That's what the xhost command is for:

	xhost -

prevents all other users from accessing your X server. This is a very good command to put in one of your startup files! Similarly, xhost + opens access to all users. Now realize that when you use xhost -, you are also preventing yourself from sending graphics from a remote host to your monitor -- the access restriction works in both directions. You can get around this by using the xhost command to allow only some users on some hosts to access your display. But there is a much better way: just connect everywhere using ssh. This protocol not only sets the DISPLAY variable automatically when you connect, it also gives you a clear authorized path back to your local X server.

To summarize the main message of this chapter so far: connect everywhere using ssh. It encrypts your password and manages your X connections automatically.

X Resources

X allows you to easily customize the graphics output of your X client programs. By altering the resources used by a client such as xterm, you can adjust such things as the background colour, the foreground (text) colour, the font, whether or not there's a scroll bar, how many lines are saved, etc ... To do all of this, you do not need to recompile any programs. Resources are simply variables that the client program uses, but special ones that you can supply externally to the program in a couple of ways. Some of these resources are intrinsic to the X Toolkit and are used by all X clients; others are client-specific. If you've ever played with a Macintosh, the concept of resources will be familiar.

Here is an example of an X resource specification, which sets the background colour of every xterm window to black:

	XTerm*background: black

There are two ways of setting resources. The first is to specify them as command-line options to your X client program. All of the X Toolkit resources are available in this way, so this gives us a good opportunity to list them. Comma-separated entries here denote equivalent ways of specifying the same thing.

	-display display		! display address of X server
	-geometry geometry		! initial size and location of window
	-bg color, -background color	! window background colour
	-fg color, -foreground color	! colour for foreground text or graphics
	-bd color, -bordercolor color	! window border colour
	-bw pix, -borderwidth pix	! width in pixels of window border
	-fn font, -font font		! font for text display
	-title string			! window title (usually appears in title bar)
	-iconic				! start window as an icon
	-rv, -reverse			! enable reverse video (swap fg and bg colours)
	+rv				! disable reverse video

Here are two more which will only make sense in the next paragraph:

	-xrm resourcestring 		! specify an X resource that doesn't have
					! an explicit command line argument
	-name instance			! give this process a special instance name

For the second (superior) method, we must learn a bit of X syntax. Here are 4 examples of X resource specifications. They all instruct the xterm program to use red as the foreground (text) colour, with various caveats.

	XTerm*foreground: red
	XTerm.foreground: red
	xterm*foreground: red
	xterm.foreground: red

The second field, foreground, is clearly the resource name and the third field is its value. The first field is either a class or instance name. A class consists of all the invocations of a particular program. By convention, the class name is just the program's name with the first letter capitalized ... but if the first letter is an X, the first two letters are capitalized. The first two example lines above are of this type, and will affect all xterm processes you run. An instance name, however, may refer to a particular process. The default instance name for any process is just the program name itself, without capitalization. Thus the last two example lines also affect any xterm. However, you can alter the instance name of a process by using the -name command line option mentioned in the previous paragraph. Thus you could define a special type of window with

	myxterm*foreground: blue

which you would summon by typing

	xterm -name myxterm

Finally, about the * and . separator characters: * indicates that the resource setting should be applied to all child windows of the indicated process (loose binding), while the . is a tight binding that limits the setting to the main window only.

Where do you put all your X resource settings? The -xrm command line option mentioned above is one possibility. But the superior way is to place all your resource settings in a file called .Xresources in the home directory of your local machine (its X server is the one running your local computer screen). When your X server first starts up (i.e. when you start a new session) it loads this file. Specifically, the server executes the command xrdb -load ~/.Xresources. If you change your .Xresources file in the middle of a session, you can run this command yourself to activate your new settings. To merge your new settings with your old ones (without throwing the old ones away) use xrdb -merge ~/.Xresources.
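
To try out xrdb without touching your real .Xresources file, you can write a couple of settings to a scratch file and merge that instead. The following is a minimal sketch: the scratch file and the myxterm instance name are just examples, and the xrdb calls are skipped when no X display is available.

```shell
#!/bin/sh
# Write a small resource file to a scratch location (not ~/.Xresources).
resfile=$(mktemp)
cat > "$resfile" <<'EOF'
! every xterm: black background
XTerm*background: black
! only xterms started with "xterm -name myxterm": blue text
myxterm*foreground: blue
EOF

if [ -n "$DISPLAY" ] && command -v xrdb >/dev/null 2>&1; then
    xrdb -merge "$resfile"               # add these settings to the current ones
    xrdb -query | grep -i 'foreground'   # show what the server now holds
else
    echo "no X display or no xrdb here; resource file left in $resfile"
fi
```

Note that lines starting with ! are comments in a resource file, and that merged settings stay in the server until the session ends or you run xrdb -load again.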

The values that each X resource expects come in all different types: integers, strings, true/false, on/off. However there are three common value types that have a special syntax:

geometry resources
	XTerm*geometry: 80x48+0-10

The geometry resource specifies the size and location of the program's window. The syntax is WIDTHxHEIGHT+XOFF+YOFF. The WIDTH and HEIGHT are the number of columns and rows for windows like xterm's which display text. For other windows, they are numbers of pixels. +XOFF and +YOFF specify the coordinates in pixels of the window's top-left corner. The origin of the X coordinate system is the top-left corner of your screen, with the positive X and Y axes pointing right and down respectively. If you specify -XOFF or -YOFF, the offset will instead be taken relative to the right or bottom edge of the screen. Both the WIDTHxHEIGHT and +XOFF+YOFF strings may be specified on their own, if you are not interested in setting the other part of the resource. To examine the geometry specifications of existing windows or your entire display you can use the programs xwininfo and xdpyinfo.
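
As a small illustration of the WIDTHxHEIGHT+XOFF+YOFF syntax, the following shell sketch splits a geometry string into its four parts. The function parse_geometry is made up for this example (it is not an X utility), and it only handles complete four-part specifications.

```shell
#!/bin/sh
# parse_geometry is a hypothetical helper, not part of X.
# It only understands complete strings of the form WIDTHxHEIGHT{+-}XOFF{+-}YOFF.
parse_geometry() {
    geom=$1
    size=${geom%%[+-]*}        # everything before the first + or -   e.g. 80x48
    offsets=${geom#"$size"}    # the remainder                        e.g. +0-10
    width=${size%%x*}
    height=${size#*x}
    # the X offset is the first sign plus the digits that follow it
    xoff=$(printf '%s\n' "$offsets" | sed 's/^\([+-][0-9]*\).*/\1/')
    yoff=${offsets#"$xoff"}
    echo "width=$width height=$height xoff=$xoff yoff=$yoff"
}

# The xterm example from above: 80 columns, 48 rows,
# left edge at x=0, bottom edge 10 pixels above the bottom of the screen.
parse_geometry 80x48+0-10
```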

colour resources
	XTerm*foreground: #FF00A3		! old syntax
	XTerm*foreground: rgb:FF/00/A3
	XTerm*foreground: rgbi:1/0/0.7
	XTerm*background: black

Colours can be specified in various ways. The first three examples show how to select a colour by mixing red, green, and blue values, in that order. The first two examples use hexadecimal numbers, while the third uses fractions of 1. All three examples thus produce purplish colours. The fourth example illustrates the use of named colours. All available colour names are defined in this file:

	/usr/X11R6/lib/X11/rgb.txt
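
To see how the hexadecimal and fractional notations relate, here is a sketch that takes an old-style #RRGGBB value and prints it in the other two forms. The function colour_forms is made up for this example, and the fractions are simply the 8-bit values divided by 255, ignoring any colour correction your X server may apply.

```shell
#!/bin/sh
# colour_forms is a hypothetical helper: it takes an old-style #RRGGBB value
# and prints the matching rgb: form and an approximate rgbi: form.
colour_forms() {
    hex=${1#\#}                              # drop the leading '#'
    r=$(printf '%s' "$hex" | cut -c1-2)
    g=$(printf '%s' "$hex" | cut -c3-4)
    b=$(printf '%s' "$hex" | cut -c5-6)
    echo "rgb:$r/$g/$b"
    # Each fraction is the 8-bit value divided by its maximum, 255.
    LC_ALL=C awk -v r="$((0x$r))" -v g="$((0x$g))" -v b="$((0x$b))" \
        'BEGIN { printf "rgbi:%.3f/%.3f/%.3f\n", r/255, g/255, b/255 }'
}

colour_forms '#FF00A3'
```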

font resources
	XTerm*font: -adobe-courier-medium-r-normal--10-100-75-75-m-60-iso8859-1
	XTerm*font: -*-courier-medium-r-normal--10-*-*-*-*-*-*-1
	XTerm*font: courR08
	XTerm*font: 6x10

The first line shows a complete font description, which includes many fields. To learn about what they mean, try playing with the font selector program xfontsel. The second illustrates that some of the fields may be left blank, using the '*' wildcard character. You only need to specify enough fields to uniquely identify a font. (If you specify fewer, some pseudo-random selection will be made.) Note in particular the `10' in this example, which is the font size in pixels. The next field (100 in the first example) is the font size in tenths of a point. (A point is defined to be 1/72 of an inch.) The available X fonts are all defined in PCF (Portable Compiled Format) files located in subdirectories of /usr/X11R6/lib/X11/fonts. The last two example lines above illustrate how fonts may also be selected using just the names of these font files (in this case courR08.pcf and 6x10.pcf).
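
If you want to pick a full font description apart without starting xfontsel, the dash-separated fields can be cut out with standard shell tools. This is just a sketch; the helper xlfd_field is made up for this example, and the field names in the comment follow the usual X font naming convention.

```shell
#!/bin/sh
# xlfd_field is a hypothetical helper that prints one field of a full font name.
# Field numbers count the dash-separated parts after the leading '-':
#   1 foundry, 2 family, 3 weight, 4 slant, 5 set width, 6 additional style,
#   7 pixel size, 8 point size (tenths of a point), 9 and 10 x/y resolution,
#   11 spacing, 12 average width, 13 charset registry, 14 charset encoding
xlfd_field() {
    # cut counts the empty string before the leading '-' as field 1, hence the +1
    printf '%s\n' "$1" | cut -d- -f"$(( $2 + 1 ))"
}

font='-adobe-courier-medium-r-normal--10-100-75-75-m-60-iso8859-1'
echo "family:     $(xlfd_field "$font" 2)"
echo "pixel size: $(xlfd_field "$font" 7)"
echo "point size: $(xlfd_field "$font" 8) (tenths of a point)"
```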

Finally, you'd probably like to know which resources are available for you to customize your favourite programs. No problem: typing appres XTerm will dump all resources of the class XTerm to your screen along with their current values. If it is not clear what the resources mean, check the program's man page. There is also a utility called editres which allows you to adjust the resources of a running program interactively.

Graphics and Document Creation

Macs and Windoze-boxes are replete with programs to create and display graphics, along with other multi-media formats such as audio and movies. There are also numerous `desktop publishing' utilities for creating formatted documents such as articles and presentations. Is any of this available on UNIX machines? The answer is yes ... but a qualified yes.

Graphic Formats

Many different formats are available for storing graphics information: GIF, JPEG, EPS, etc. These formats can be subdivided into two categories: vector-mapped and bit-mapped graphics. Here's the difference. Imagine a simple picture consisting of a red triangle with the word `Hello' written inside it in a bold, black Helvetica font. A vector-mapped file will encode this picture using the coordinates of the triangle's corners, the word Hello and its coordinates, the font name Helvetica-Bold, plus some codes for the colours red and black. Vector-mapped graphics formats are like programming languages, containing commands for drawing some number of graphics primitives such as circles, rectangles, text, curves, and arrows. A bit-mapped file does not contain such instructions, but instead consists of a big array describing the colour of each pixel in the image. The word pixel (or `dot') simply refers to the smallest-sized unit of information that can be controlled. When a bit-mapped image is displayed on some device, like your computer screen or a printer, its physical size is determined by two numbers: the x-y extent of the image in pixels and the dpi (dots-per-inch) specification of the device. The graphics information on your computer screen is a good example. Your screen consists of some fixed number of pixels in the horizontal and vertical directions, such as 800x600 for a laptop screen or 1600x1200 for a big workstation monitor. Most monitors are actually able to change this resolution. But what they cannot change is the physical size of the screen. If the red triangle image is 200x170 pixels large, it will thus be physically smaller on a 15" monitor at 1024x768 than on a 17" monitor at the same resolution. In case you're curious, the xdpyinfo command will tell you all about your monitor.
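
The monitor comparison above is simple arithmetic, and a few lines of shell and awk make it concrete. This is a sketch under two assumptions of mine: the pixels are square, and the quoted diagonal is entirely visible (which is not quite true for old tube monitors). The function name image_inches is made up for this example.

```shell
#!/bin/sh
# image_inches is a hypothetical helper.
# Arguments: image width in pixels, screen width and height in pixels,
#            screen diagonal in inches.
image_inches() {
    LC_ALL=C awk -v px="$1" -v w="$2" -v h="$3" -v diag="$4" 'BEGIN {
        dpi = sqrt(w*w + h*h) / diag     # pixels along the diagonal per inch
        printf "%.1f dpi, %.2f inches\n", dpi, px / dpi
    }'
}

image_inches 200 1024 768 15   # the 200-pixel-wide triangle on a 15" monitor
image_inches 200 1024 768 17   # the same image on a 17" monitor
```

The second line of output shows the larger width: the same 200 pixels are spread over more inches on the bigger screen.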

Vector-mapped graphics languages are clearly ideal for encoding formatted text and line drawings, while bit-mapped graphics are suitable for detailed images such as photographs which cannot be easily encoded as a finite number of instructions. Vector-mapped graphics offer an important advantage: they can be rescaled without any loss of quality. Taking the example of our red triangle again, if you use a graphics program to blow it up by a factor of 2, it will look just the same if it was encoded in a vector-mapped language. If the triangle is in a bit-mapped format, your size doubling will cause jaggies to appear: the sides of the triangle will no longer be perfect lines but stair-case approximations of a line. The bit-mapped file does not know that it contains a triangle, you see; the graphics program simply does its best to guess the appearance of the image at twice the size.

Graphics you encounter on the net will typically be in bit-mapped format. The most common formats of this type are GIF (Graphics Interchange Format), JPEG (Joint Photographic Experts Group), and TIFF (Tagged Image File Format). Files in these formats are all binary files. They also include some sort of compression algorithm to reduce file size (e.g. if 1000 adjacent pixels are all the same colour, this can be encoded more efficiently than repeating the colour `red' 1000 times).
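
The `1000 red pixels' idea can be illustrated with a toy run-length encoder. Please take this only as a sketch of the principle: the real formats use more sophisticated schemes (GIF uses LZW coding, and JPEG uses a lossy transform-based method), and the function name rle is made up for this example.

```shell
#!/bin/sh
# rle is a hypothetical helper: it reads one pixel colour per line and prints
# "count colour" for each run of identical neighbours.
rle() {
    awk 'NR > 1 && $0 != prev { print n, prev; n = 0 }
         { prev = $0; n++ }
         END { if (NR > 0) print n, prev }'
}

# Build a scanline of 1000 red pixels followed by 3 blue ones and encode it.
{
    i=0
    while [ "$i" -lt 1000 ]; do echo red; i=$((i + 1)); done
    echo blue; echo blue; echo blue
} | rle
```

The 1003 input lines shrink to two output lines, one per run.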

The graphics you encounter in physics, namely articles, diagrams, and data plots, will be vector-mapped. The most common format for encoding such information in particle physics is postscript, a language which is spoken by many high-quality printers. An alternative of increasing popularity is PDF (Portable Document Format) from Adobe. PDF is quite similar to postscript, but has certain advantages. First, it is platform independent, thanks to Adobe's vigorous efforts to distribute their free Acrobat Reader program to users working on all manner of machines. An important part of this is font inclusion: PDF files may optionally contain a complete description of every font used to create the document. A common source of platform-dependence for formatted-text documents is fonts: different machines have different lists of available fonts, and even those they have in common may be defined in a different way. The second advantage is that PDF documents are searchable. This is a very powerful capability. If you are reading a nicely-formatted PDF document with Acrobat Reader (acroread in UNIX), you have a search (`Find') function at your disposal. This is not possible with any postscript reader available today. Finally, postscript files are always ASCII text files; PDF files may be either text files or compressed binary files.

Displaying and Modifying Graphics

Here is a summary of the most common UNIX utilities for the display, conversion, and manipulation of non-postscript graphics:

ImageMagick
This is a suite of programs that can display, manipulate, and convert-between any graphics format known to man or mouse. man ImageMagick will provide you with a brief description of the 9 programs within this package. The principal ones are:
  • display: Displays graphics of any known format. When you first open a file in the program, left-click on the graphics window to obtain the command window. The menus in this command window allow you to perform many manipulations, such as rescaling, cropping, or saving to another graphics format. (If there is anyone out there who is thinking `GKON' at this point, give me a shout. :-)) This program can even display multi-page files in postscript or PDF format -- press the spacebar to advance through the pages. Further keyboard commands are explained in the Help-Overview menu. But there are better utilities for reading multi-page documents.
  • convert: This is a command-line interface to the format conversion capabilities of display. It is incredibly easy to use:
	convert file.gif file.tiff

will convert a GIF file to TIFF format. ImageMagick follows the GNU standard: convert --help will spit out a full list of supported formats and command line options. Finally, note that all geeks in the Universe are easily identified by their use of a final spurious `k' in the spelling of the simple word `magic'.

xv
This program is very similar to display, but it offers a few additional options for manipulating images (e.g. altering colors).
gv and ghostview
These are programs for viewing postscript files on your computer screen. They are both graphics interfaces to a package called ghostscript which interprets both the postscript and PDF languages. Of the two viewer programs, gv is more feature rich. One particularly useful feature is that it can display gzipped ps files directly ... no need to unzip first! In fact, on the HERMES PC farm the command ghostview actually runs gv :-)
acroread
This is the UNIX version of Adobe's mega-popular Acrobat Reader, the freely available utility for displaying documents in PDF format. acroread does include a search function.

So UNIX has many fine utilities for displaying and manipulating graphics. Working with multi-media files like movies is not so easy however. Since you don't need to view movies to perform a physics analysis :-) I will just mention that ImageMagick also handles video format. You will also find that some popular multi-media plugins for Firefox are not available (or do not really work very well) yet for the UNIX operating system. Examples are Macromedia Flash movies and streaming audio/video. But this way, you get more work done. :-)

Postscript

Let me add a few more words about postscript at this point. It is the most common language for the presentation of graphics in the world of particle physics, but we are really unique in this respect -- practically everyone else in the non-UNIX universe is quite unaware of postscript. If you are forced to work on a Windows box, you may obtain a viewer program from the main Ghostscript Site at the University of Wisconsin.

Postscript (`ps') format is designed for documents consisting of one or more complete pages. An important subclass of postscript format is EPS (encapsulated postscript) which is designed for individual pictures such as data plots. (You will also see formats called EPSF and EPSI lying around ... these are basically identical to EPS.) There are only two significant differences between PS and EPS. First, EPS files contain a bounding box, which specifies the coordinates of the image's outer edges. This tells a publishing program like Word or LaTeX how much space the figure will occupy when it is included in a document. The second difference is that PS files contain a final command called `showpage' which instructs the printer to print a page. EPS files do not contain this command since they are meant to be included within a page ... you don't want the printer to eject a page every time it encounters a figure. This means that you (usually) cannot print EPS files directly. You must first create a PS document using some package such as LaTeX.

If you do man -k postscript, you will find quite a number of utility programs for working with postscript. Here are some useful examples:

  • ps2epsi converts a postscript file to EPS format by figuring out the bounding box for you.
  • psnup can make a postscript document more compact by placing multiple pages on a single piece of paper. For example,
	psnup -2 file.ps > file2.ps

will place two pages from the original file side-by-side on a single page in the output file. Saves paper. :-)

  • a2ps and enscript are fast ways of transforming an ASCII text file to postscript. Note that some printers can only process ps files, making these utilities essential. But apart from that, these utilities can turn your plain ASCII files into beautiful documents with nice fonts and informative headers. Here are some examples:
	a2ps -1 -p -nP file.txt > file.ps	! non-GNU version
	a2ps -1 --out=file.ps file.txt		! GNU version
	enscript --header='__$n %W    Page $%' --media=A4 --font=Courier@12 \
		--output=file.ps file.txt

As I mentioned before, postscript files are plain ASCII text files, so you can open them in an editor or pager. For one thing, the first line of the file will tell you if the file is in PS or EPS format. Also, let me point you to one of the links on our HERMES Documents page: Tips for Manipulating Postscript. This page contains some scripts that edit ps files to do various useful things: e.g. converting from A4 to US-letter paper size, or extracting EPS figures from a document.
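
Since the first line gives the format away, you can let the shell do the looking. The sketch below assumes the usual header comments (%!PS-Adobe-... for PS, with an extra EPSF-... tag for EPS); files written by sloppy programs may lack them. The function name ps_kind and the two tiny demonstration files are made up for this example.

```shell
#!/bin/sh
# ps_kind is a hypothetical helper that reports whether a file looks like
# plain PostScript or EPS, judging only by its first line.
ps_kind() {
    first=$(head -n 1 "$1")
    case $first in
        %!PS-Adobe-*EPSF-*) echo EPS ;;
        %!*)                echo PS ;;
        *)                  echo unknown ;;
    esac
}

# Two tiny hand-made files for demonstration.
tmpdir=$(mktemp -d)
printf '%%!PS-Adobe-3.0\n%%%%Pages: 1\nshowpage\n' > "$tmpdir/doc.ps"
printf '%%!PS-Adobe-3.0 EPSF-3.0\n%%%%BoundingBox: 0 0 200 170\n' > "$tmpdir/fig.eps"

ps_kind "$tmpdir/doc.ps"
ps_kind "$tmpdir/fig.eps"
grep '^%%BoundingBox' "$tmpdir/fig.eps"    # the bounding box is plain text too
```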

LaTeX and dvips

At some point you will need to create a formal document to describe your work: a talk, internal note, publication, or thesis. The standard `publishing' package in particle physics is not Microsoft Word, but LaTeX. Every one of the internal notes, publications, and talks that you will find on our Documents web page was created with LaTeX. Amongst its many virtues, it is by far the most powerful package available anywhere for working with mathematical symbols. Another key virtue is that, unlike MSWord or MSPowerPoint, LaTeX has never corrupted a file.

LaTeX is a programming language for producing formatted text, and you must learn it. The standard book that everyone uses is LaTeX: A Document Preparation System, User's Guide and Reference Manual by Leslie Lamport. I'm not going to teach you LaTeX, but I will describe the way the system works. You create a file (conventionally called something.tex) containing all your text and formatting commands, then you run the latex command on it. This produces a file called something.dvi, which is in so-called DVI (DeVice Independent) format. Finally, you run a program called dvips on your DVI file to turn it into postscript. Note that there is a program called xdvi which can display the DVI file directly.

Download: latex-example.tar.gz

I've placed some examples in the tar file above, which contain a few neat LaTeX tricks. One file, called fb2000.tex, is a conference proceeding while another one, lambda-pol.tex, is a slide from a talk. Here's how you create the postscript files from these LaTeX files:

	latex fb2000.tex
	latex fb2000.tex			! run again to get cross-references right
	dvips -t A4 -o fb2000.ps fb2000.dvi

... and the same thing for lambda-pol. Also included in the tar file are the EPS files for the figures; these are pulled into the document by dvips. The fb2000 file illustrates the use of a style file, called epspcrc1.sty. This file was distributed by the conference to provide a uniform appearance for all proceedings. Some of the formatting commands in that file (e.g. \maketitle) are not standard LaTeX, but are defined by the style file. The two examples also illustrate the use of some standard add-on packages, such as wrapfig, graphicx, and fancybox as well as the "foils" class from foiltex. These are sufficiently common that the relevant style files are probably included in your system's LaTeX distribution. If you don't have them, you can visit the CTAN Online Catalogue which contains a huge number of LaTeX add-on packages.

Finally, note that dvips allows you to specify the paper size (here -t A4 ... the North American standard is different: -t letter). You can also specify -t landscape if you want your document to come out in landscape rather than portrait orientation. I have seen confusion arise when people use the landscape option in their LaTeX files, but then forget to inform dvips about it.

Creating Graphics

It is quite likely that you will want to create graphics at some point, such as schematic drawings. The Mac/Windows world is replete with giant, sophisticated programs for this type of thing: Photoshop, PowerPoint, Canvas, Illustrator, etc. Most of these packages are not available for UNIX machines. Here are your principal choices:

xfig
This is a sparse but very nice little drawing program, and is still the most popular choice of particle physicists. It is extremely easy to use and has a nice man page. People often use xfig for creating Feynman diagrams, but it is a bit annoying since you first have to construct things like photon and gluon symbols using very basic primitives. To get you started, you can have a look at this file: feyn-template.fig. It contains some pre-made photons, gluons, and other symbols. Also, there is a very nice program called pstoedit which can convert postscript files to xfig format, thereby allowing you to alter them.
OpenOffice
Microsoft Office, the ultimate in feature-rich bloatware, is certainly not available for UNIX machines. However there is a package called OpenOffice which works on Linux machines and mimics many of the features of MS Word and PowerPoint. For one thing, it is the only UNIX way you can edit files in native Word or PowerPoint format ... that can be quite useful. But further, it is a fine desktop-publishing program. As Linux becomes more and more prevalent, people are turning to OpenOffice for writing talks and creating figures. If OpenOffice is available on your Linux machine, you start it with the command soffice.

HERMES Programming

As I mentioned before, you must be familiar with either C or FORTRAN to work with the HERMES software suite. So I'll assume that you have learned one of these languages ... on to some practical information on how to use your programming skills! Some of you are already C++ programmers and there are tools in development for you, but since the basic HERMES suite is NOT in C++, we'll continue to work only with C and FORTRAN at this point.

Compiling and Linking

To create a program in C or F77, you write a source code file using your favourite editor and then compile it to create an executable program in machine language. Here's the standard `Hello World' example of how it works. First in C ... create a file called junk.c containing these lines:

	#include <stdio.h>
	#include <strings.h>
	#include <math.h>
	int main ( int argc, char **argv ) {
	  printf ("Hello World. :-) \n");
	  return 0;	/* report success to the shell */
	}

Now compile it using the program cc and run the resulting executable program:

	cc -o junk junk.c
	./junk

Now in FORTRAN ... create a file called junk.F containing these lines (be careful to start each line with 6 blank spaces!):

	program junk
	print *, 'Hello World. :-)'
	end

The FORTRAN compiler is called f77:

	f77 -o junk junk.F
	./junk

There are also GNU versions of these two compilers: gcc and g77. Again, these have become so widely used that you will find that cc and f77 are actually already the GNU versions (on the HERMES PC farm for example).

The Compilation Sequence

Now that was all very simple ... but what the compiler actually did was call a sequence of programs to perform these three functions: preprocessing, compilation (which includes assembly), and linking. The cc program is often called a compiler, but it is really a compiler driver -- an interface to various other programs.

Following is the basic sequence of steps performed by compiler drivers such as cc and f77. At the beginning of each section, I give the command to stop the process at that stage (in case you are fascinated by all this :-) and would like to see the output).

Preprocessing

	gcc -E junk.c -o junk.i 	! make preprocessed file junk.i

Notice those #include instructions in the C example above? Those are not part of the C language at all, but are interpreted by the cpp preprocessor. There are only about 10 preprocessor commands and you can learn about them on the cpp man page. (They all start with a # symbol.) What the preprocessor does is basically some on-the-fly editing of your source code before the compiler gets to it. The #include directive means: insert the text from the indicated file right here in the source code. Another popular cpp instruction is #define, as in #define EBEAM 27.5. The preprocessor cpp will now replace all occurrences of the string EBEAM in your source code with the string 27.5. The preprocessor is somewhat context sensitive: string replacement will only be done when the redefined variable appears as a lone word and outside a quoted string. For example, this works:

	#define XXX printf
	#define YYY 3
	int main() { XXX ("I am %d years old.\n", YYY); }

but this fails to substitute (the program compiles, but prints the literal text YYY):

	#define YYY 3
	int main() { printf ("I am YYY years old.\n"); }

and this will not compile:

	#define XXX rin
	int main() { pXXXtf ("I am 3 years old.\n"); }

The redefined strings XXX and YYY are not called variables, but preprocessor macros. This reflects the fact that they are often used to define entire functions, as in

	#define FCSTRCPY(out,in)  { \
	int __n; \
	__n = sizeof(out); \
	strncpy(out,__blanks,__n); \
	strncpy(out,in,MIN(__n,strlen(in))); \
	}

As a final note: preprocessors are not only used by compiler drivers. Remember the .Xresources file we talked about earlier? It is read by xrdb to customize your X11 resources. You will see from the xrdb man page that this program runs cpp as its first pass ... so your resource file can contain instructions like #include and #define.
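
If you would like to check the three substitution cases from this section yourself, you can feed them through the preprocessor alone and look at what comes out. This is a sketch: the file name macros.c and the function names a, b, c are arbitrary, I have renamed the macro in the third case to ZZZ so that all three fit in one file, and the code is only preprocessed, never compiled.

```shell
#!/bin/sh
# Run only the preprocessing stage (-E) on the three macro examples.
workdir=$(mktemp -d)
cat > "$workdir/macros.c" <<'EOF'
#define XXX printf
#define YYY 3
#define ZZZ rin
void a(void) { XXX ("I am %d years old.\n", YYY); }
void b(void) { printf ("I am YYY years old.\n"); }
void c(void) { pZZZtf ("I am 3 years old.\n"); }
EOF

if command -v cc >/dev/null 2>&1; then
    # -P suppresses the line markers so that only the code remains
    cc -E -P "$workdir/macros.c" > "$workdir/macros.i"
    cat "$workdir/macros.i"
else
    echo "no compiler driver called cc found on this machine"
fi
```

In the output, only the first function has been rewritten: the YYY inside the quoted string and the ZZZ buried inside pZZZtf are left alone.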

Compilation

	gcc -c junk.c -o junk.o 	! compiled object file junk.o is in (binary) machine code

This stage transforms the preprocessed C or F77 instructions into an object file containing machine code. The output of this stage is in binary format, consisting of instructions that your machine can process. It is at this stage that your file becomes really system-dependent. The most common format for UNIX object code is ELF (Executable and Linking Format), although you may still find the older a.out format around. Since most humans don't read binary easily, you can get a translation of the binary into an ASCII file in which each machine command or variable is given a name. This is the most primitive computer code, known as assembly language, and is very specific to the actual CPU command set. You can see this code for our example by running the command:

	gcc -S junk.c -o junk.s 	! file junk.s is human-readable assembly language, not yet machine code

One could use the assembler (usually called as) to compile codes directly written in assembly language into a binary object file in machine code, but there are very few cases anymore where this is necessary.

Linking (also called Loading)

	gcc junk.o -o junk	 	! linked file junk is executable

Let's have a look at the contents of your object file junk.o. Do this:

	nm junk.o

There is a similar GNU command called objdump. When invoked with the -t option it does the same thing as nm ... but on GNU systems, objdump's output is more intelligible. These commands can be spectacularly useful. They dump all the symbols present in your object file! Here are two (partial) example lines from the nm command you ran above, as produced by GNU's nm on the PC farm:

   00000000 T main
            U printf

and here's what objdump -t produces:

junk.o:     file format elf32-i386

SYMBOL TABLE:
00000000 l    df *ABS*  00000000 junk.c
00000000 l    d  .text  00000000 
00000000 l    d  .data  00000000 
00000000 l    d  .bss   00000000 
00000000 l    d  .rodata        00000000 
00000000 l    d  .note.GNU-stack        00000000 
00000000 l    d  .comment       00000000 
00000000 g     F .text  00000022 main
00000000         *UND*  00000000 printf

(Geez, things just have to look more complicated under Linux ...) Anyway, let's go back to the first output format, the one from nm. The letter T marks a symbol that is defined in the text (code) section of the object file, while U marks a symbol that is undefined. Your code contains a procedure called 'main', which is the main routine you wrote. But it also contains a reference to the procedure 'printf', which is undefined. Here's the point: your machine cannot run your object file junk.o because it doesn't have any machine code for the printf command. This machine code is available in one of the C standard object libraries. The loader program ld locates this library and `pulls in' the needed object code. This finally creates an executable which you can run.

More about Linking and Libraries

Let's talk about that last linking phase a bit more.

First, about libraries. An object library is nothing more than a collection of object files all collected together into one using the ar command. ar is very similar to tar:

	ar rc libMyCode.a code1.o code2.o	! Create object library
	ar r libMyCode.a code1.o		! Replace module in library

This example illustrates the naming convention: libSomething.a. Further, libraries created and named like this are called static libraries. The other alternative is shared object libraries, named libSomething.so.
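
Here is a throwaway sketch that builds a tiny static library and lists what is inside it. The file names and the two one-line functions are made up for this example, and everything happens in a scratch directory; the ar steps are skipped if cc or ar is missing.

```shell
#!/bin/sh
# Build a tiny static library from two object files and list its contents.
libdir=$(mktemp -d)
echo 'int one(void) { return 1; }' > "$libdir/code1.c"
echo 'int two(void) { return 2; }' > "$libdir/code2.c"

if command -v cc >/dev/null 2>&1 && command -v ar >/dev/null 2>&1; then
    cc -c "$libdir/code1.c" -o "$libdir/code1.o"
    cc -c "$libdir/code2.c" -o "$libdir/code2.o"
    # r = insert (or replace) the modules, c = create the archive without complaint
    ar rc "$libdir/libMyCode.a" "$libdir/code1.o" "$libdir/code2.o"
    # t = print the table of contents, much like tar t
    ar t "$libdir/libMyCode.a"
else
    echo "cc or ar not available here"
fi
```

Running nm on the resulting libMyCode.a shows the symbols of each module in turn, just as it did for a single object file.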

In our simple example earlier the loader needed to pull in code for the printf procedure from some library ... which one? On the PC farm (in tcsh!), this power command (introduced earlier) might give us the answer:

	find /usr -name 'lib*.a'  -exec nm -go {} \; |&  grep " T printf"

(On other systems the output of the nm command may be different and you may have to modify this a bit.) We find that the printf procedure is defined in several libraries on the PC farm: libc.a located in the directory /usr/lib, libc.a located in the directory /usr/lib/nptl, and libc_p.a located in the directory /usr/lib/. You did not have to specify the name of this library because it is one of the standard C libraries that the loader ld automatically uses when called specifically by gcc. But how can we tell which library the object code for printf was taken from? We tell the compiler to tell us! As painful as it is, in this case it is important to tell the loader ld to do this by passing it commands via gcc, so that we really see what happens when we link as part of the gcc command. So here's the beast:

     gcc -Wl,--cref -Wl,-Map -Wl,junk.map -o junk junk.c

This command creates a linkmap file junk.map that contains all the gruesome details about the linking of all the objects and the matching of symbols and other things you may not want to ever know about! But it is a gold mine for some things. In this particular case, if you search on printf, you will first find it attached to another symbol (GLIBC_2.0 in my case). If you search further in the cross reference table you will find that this same printf@@GLIBC_2.0 refers to /lib/libc.so.6, so in fact the printf object didn't come from any of the regular object libraries we found above (hence our warning above in Power Tools). Most of the time you don't need to create this monster map file, but every now and then when you are not sure just where the object code is coming from, it provides a definite answer.

Back to everyday concerns. If your code instead calls a routine from a non-standard library, you must specify the library explicitly on the command line. Here's an example: suppose you wrote a program called random.c which calls the random number generator ranecu from the CERN library libpacklib.a. You would compile your code as follows:

	cc -o random random.c -lpacklib

The argument of the -l option is passed to the loader as an additional library to search for needed routines. But there is one more wrinkle: the loader has to find the indicated library. Where is it? It may be in one of the standard library directories that the loader checks ... if so, you're done. The standard search path for ld may be found either at the end of the ld man page or in the file /etc/ld.so.conf (Linux systems only). It contains such directories as /usr/lib and /usr/local/lib. If your library is not in one of these directories, you can specify additional directories to search when you run cc. Typically libpacklib.a is taken from the directory /cern/pro/lib/libpacklib.a ... here's how you inform the loader where to look:

	cc -o random random.c -L/cern/pro/lib -lpacklib

More than one search directory can be supplied, but they must all appear before the relevant library specification in the command line. Here is a typical example for HERMES software: our programs usually need libtap.a from the ADAMO package and libpacklib.a from CERNLIB (amongst other things):

	cc -o random random.c -L/hermes/adamo/lib -ltap -L/cern/pro/lib -lpacklib

You must also realize something else: the order in which the libraries themselves are specified on the command line can be important, because the loader only searches forward in the list of object libraries to resolve any undefined symbols. In this case, ADAMO is a CERN-based package and so the ADAMO library libtap.a (or .so) calls routines from libpacklib.a. The -ltap instruction must thus appear before -lpacklib on the command line. Occasionally you may find a strange situation where library A calls routines from library B, and vice versa. In this case you must repeat one of the specifications, as in

	cc -o random random.c -lA -lB -lA

In the previous subsection our simple junk program involved only one source code file, junk.c. Usually, however, programs are split up into multiple files, each containing one or more related routines. The standard way of compiling such a program is to first turn all the source files into individual object files: we run the compiler on each source file with the -c option to make it stop before the loading phase. Finally we link all the object files together (along with any necessary library code) with one final command. Suppose we have three source files: main.c containing the main program, and sub1.c, sub2.c containing subroutines called by main. Further suppose that these routines need code from the non-standard library /disk1/lib/libfrog.a. Here is the typical compilation sequence for such a program:

	cc -c -o main.o main.c
	cc -c -o sub1.o sub1.c
	cc -c -o sub2.o sub2.c
	cc -o main main.o sub1.o sub2.o -L/disk1/lib -lfrog

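If you find yourself constantly using libraries from one or more directories, you may want to make them part of the standard search path. You can do this by placing the directories in the environment variable LD_LIBRARY_PATH. This was mentioned before: LD_LIBRARY_PATH is a colon-separated list of directories to search for libraries. It is analogous to the PATH list of directories that the shell searches for executables ... except that the library path is added to the standard list rather than replacing it. One caveat: on Linux this variable is consulted mainly by the run-time loader (ld.so) when a program starts and needs its shared libraries, while some other UNIX systems also honour it at link time. If a link fails to find a library that lives in one of these directories, fall back on an explicit -L option.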
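Here is a small sketch of how the variable might be extended in a Bourne-type shell. The directory /disk1/lib is simply borrowed from the example above; substitute your own. The test for an empty variable avoids leaving a stray colon at the front of the list.

```shell
# Directory to add (an assumption for illustration).
NEWDIR=/disk1/lib

# Append NEWDIR, taking care not to create a leading colon
# when the variable is empty or not set at all.
if [ -z "${LD_LIBRARY_PATH:-}" ]; then
    LD_LIBRARY_PATH=$NEWDIR
else
    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$NEWDIR
fi
export LD_LIBRARY_PATH

# tcsh users would write instead (when the variable is already set):
#   setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/disk1/lib

echo "$LD_LIBRARY_PATH"
```

Put lines like these in your shell startup file if you want the setting in every session.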

Common Compiler Options

We have learned that `compilers' such as gcc and f77 are actually compiler drivers which call other programs to perform the preprocessing, compiling, assembling, and linking functions. These drivers can take many command line options to modify their behaviour ... but it is important to realize that some of these are handed to the preprocessor, some to the compiler, etc.

Following is a very brief introduction to a few of the more common compiler options. The complete list is not only very long, it also changes drastically from one compiler or system to the next. The man pages, of course, tell the full story. But on we go ...

-l and -L
We already covered this ... the -l and -L options are used to specify non-standard libraries and their directories.
-I
Just as -L indicates a non-standard directory to search for needed libraries, the option -I is used to provide a non-standard directory for include files. Clearly it is the preprocessor which receives this search path, so that it knows how to resolve your #include directives. The preprocessor does have a standard list of directories to search, and these are searched after those you specify with -I. The standard search path always checks your local directory first, and also includes such locations as /usr/include and /usr/local/include.
-O and -g
Compilers are pretty smart: if you ask them nicely (using the -O option) they will try very hard to optimize your code. Further, you can request various levels of optimization. Usually -O1 (weakest) through -O3 (strongest) are available. Here is an example of a badly written FORTRAN function to multiply an integer by 3:
	integer function triple (i)
		integer j, k
		j = i
		k = j*3
		triple = k
		return
		end

Obviously the local variables j and k are utterly useless ... the entire procedure could have been coded like this:

	integer function triple (i)
		triple = 3*i
		return
		end

An optimizing compiler will realize this, and the redundant variables j and k will not appear at all in the assembler version of your routine. This is a good thing! However, it will make life a bit interesting if you try to run your code through a symbolic debugger.

The standard debuggers are gdb and dbx. They basically work like this: if your executable program is called junk, you can run it within a debugger by typing e.g. dbx junk. This will get you to a command-line prompt -- dbx and gdb are interactive programs. You can now enter a variety of commands that let you run the junk program in a controlled way and examine its operation. For example: you can place `break points' at particular spots in your code. Program execution will stop when it reaches those points, and you can then use other debugger commands to examine the contents of the program's variables -- to figure out what's happening. This lovely procedure is obviously confused when certain inefficient variables and/or instructions have been completely erased from your code by your compiler! So: the -O option does not coexist well with symbolic debuggers. I will not explain dbx or gdb any further. They are nifty utilities, to be sure, and I used to use them a lot. But I have since decided that the `print statement' method of debugging is just as efficient. ;-)

The -g option is the opposite of -O in that it is designed to facilitate symbolic debugging (at the expense of optimization). -g also takes level numbers, just like -O. The -g option more-or-less ensures that all symbols (variables, subroutines, functions, ...) are preserved in the executable so that the symbolic debugger knows how to look up the value of any object used in your program.

-Wl,-option (linker options) and -Wa,-option (assembler options)
I said earlier that some options given to the compiler driver may be handed to the preprocessor, some to the compiler, etc. Well, -Wl,-option allows you to explicitly send -option directly to the loader. The list of loader options can of course be found on the ld man page. -Wa does the same thing for the assembler. In the search for printf above, we already used -Wl to ask the linker for a cross-reference map file.
-static or -no_shared
These options (which one depends on the compiler and its version) prevent linking with shared-object libraries. Further, with some FORTRAN compilers -static forces all local symbols to be bound to a global address. This is traditionally the default in FORTRAN: local variables that are defined and used in routine X may be expected to contain the same value when you next call the routine X. But the UNIX operating system is not naturally FORTRAN-aware. Unless you tag a local variable with the save statement or place it in a common block, it will not necessarily contain the same value when you come back to it -- the local variable's address may have been randomly reassigned. Much CERN software is written by FORTRAN programmers of old, who assume that the static memory allocation is on by default. It is therefore prudent to compile any programs involving CERN source code with -static. Note that with the GNU compilers -static affects only the linking; the g77 option that treats every local variable as if it had been saved is -fno-automatic.
-Dmacro
This option defines the indicated preprocessor macro. Thus you can use the preprocessor along with this command line option to enable or disable entire sections of code. Here is an example taken from a real F77 code:
	#ifdef USE_RCOFFSET
		write(6,'(/''Initializing momentum lookup table ...''/)')
		call rcOffset_init(iok)
		if (iok.ne.0) then
		  print *, 'ERROR: could not load momentum lookup table.'
		  stop
		endif
	#endif

If the macro USE_RCOFFSET is defined on the compilation command line using the option -DUSE_RCOFFSET, this section of code (and other related sections) will be included from the source during compilation. Very convenient!

-check_bounds
Here is an example of very bad FORTRAN coding: you declare an array with REAL*8 ARRAY(10), and then later write to element 11 of this array. If you do something like this, very bad things will happen, because you overwrite whatever happens to be stored next to the array. Some F77 compilers support this option, which gives you an error whenever you access an array outside its dimensioned bounds.

An important note to FORTRAN programmers: always use the preprocessor's #include command, rather than FORTRAN's own include statement. Reason: there is a utility called makedepend which conveniently goes through a source code tree pulling out all include-file dependencies. This is great for creating Makefiles! But it doesn't recognize the F77 include statement, so stick with #include. Also, name all your source files *.F, not *.f. This is the only way to tell the GNU compiler g77 that it should first run the preprocessor on your files!

Makefiles

A few paragraphs ago we encountered a sequence like this for compiling a program with three source code files:

	cc -c -o main.o -I/disk1/include main.c
	cc -c -o sub1.o sub1.c
	cc -c -o sub2.o sub2.c
	cc -o main main.o sub1.o sub2.o -L/disk1/lib -lfrog

This time, we'll suppose that the code in main.c also requires an include file called inc.h and located in the directory /disk1/include. I've thus added a -I directive to the first command to let the compiler know where to find the include file. Now it is natural to write a script containing these 4 commands ... you can then recompile your program by typing in only one command. However, most programs have much more than 3 source code files, making the compilation script rather long ... and further, each file can take a significant amount of time to compile. So let's say you modify only sub1.c and wish to recompile the program. It is obviously inefficient to run your script and recompile everything: all you need to do to update the executable is remake the one object file sub1.o and relink.

Sadly, your simple shell script is not that intelligent. What it does not understand (without a great deal of coding) are the concepts of dependencies and modification times. Enter the make utility: it is a specialized scripting language that is designed to deal precisely with these two concepts. The command make is like sh or tcsh. When you run it, it looks for a script file called Makefile in your current directory and executes the instructions within. Here is an appropriate Makefile for our example:

	# Define variables

	OBJ = main.o sub1.o sub2.o
	LIBS = -L/disk1/lib -lfrog
	INCLUDES = -I. -I/disk1/include
	CC = gcc
	CFLAGS = -ansi -g3

	# Define rules

	all:	main
	main: $(OBJ)
		$(CC) -o $@ $(OBJ) $(LIBS)
	main.o: main.c inc.h
		$(CC) -c $(CFLAGS) $(INCLUDES) -o $@ $*.c
	sub1.o: sub1.c
		$(CC) -c $(CFLAGS) $(INCLUDES) -o $@ $*.c
	sub2.o: sub2.c
		$(CC) -c $(CFLAGS) $(INCLUDES) -o $@ $*.c

Since you know what this script is supposed to do, you can figure out the basic elements of the make scripting language from this example. Here's some explanation:

Internal Variables
OBJ, LIBS, etc are internal variables, and are dereferenced using the syntax $(OBJ). If a variable is not defined within the Makefile, its value is taken from your environment: if you have an environment variable set with the same name, its value will be used by make.
Rules, Targets, & Dependencies
The meat of the Makefile is the set of rules you define. Take this rule for example:
	main.o: main.c inc.h
		$(CC) -c $(CFLAGS) $(INCLUDES) -o $@ $*.c

This tells make how to create the object file main.o from the source file main.c. The thing you want to create (main.o) is called the target, and the files it depends on (main.c and inc.h) are the dependencies. The target all is special by convention: when you run make without any arguments it builds the first target in the Makefile, and all is customarily placed there. (Alternatively, you can run make sub1.o to update only that target.) As you see, there is a cascade of dependencies: the final target all needs the target main, which in turn needs all the object file targets. The magic of make is in these dependencies: the program will only recreate a given target file if it does not exist yet or if its modification date is older than that of one of the files on which it depends. Thus if you edit only inc.h and type make, the sub1 and sub2 modules will not be recompiled since make knows they won't change. This is an enormous time saver for large programs with many modules! However, you sometimes may want to force recompilation of a target despite the modification dates ... here the touch command is very useful: run it on one of the target's dependencies and it instantly changes that file's modification time to the present time, making the target look out of date.
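The timestamp test at the heart of make can be imitated in plain shell, which may help to demystify it. This is only a sketch of the idea, not how make is implemented: the function needs_rebuild is invented for illustration, the files are empty stand-ins, and the -nt (newer than) test is a common shell extension rather than part of every sh.

```shell
# Scratch directory so that nothing real is touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

touch sub1.c            # stand-in for a source file
sleep 1
touch sub1.o            # "object file" created later, so it is up to date

# Succeeds if the target ($1) is missing or older than any dependency.
needs_rebuild() {
    target=$1; shift
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        [ "$dep" -nt "$target" ] && return 0
    done
    return 1
}

needs_rebuild sub1.o sub1.c && first=rebuild || first=up-to-date

sleep 1
touch sub1.c            # simulate editing the source file

needs_rebuild sub1.o sub1.c && second=rebuild || second=up-to-date
echo "before edit: $first, after edit: $second"
```

make applies this kind of check to every target in the dependency cascade and runs the rule's commands only where the answer is "rebuild".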

Syntax
Here is the syntax for writing rules:
	target: dependency1 dependency2 ...
		shell-command1
		shell-command2
		...

The target must be followed by a colon, and the shell commands must all be preceded by tab characters. And yes, those are real shell commands that you are writing ... to be precise, they are vanilla Bourne shell commands. (You can change this with make's SHELL variable but I'll let you look that up if you're interested.) You will be happy to recall that vanilla sh supports line continuation, so this sort of thing can be used for long rules:

	dist:
		tar -zcf frame.tar.gz configure Makefile.in $(TARSRC) $(TARINC) \
		    README README.rcOffset ddl/makeddl $(DDL)
Automatic Variables
Some variables have special meaning to make, like $@ and $* in this example. Both refer to the rule's target: $@ stands for its full name (e.g. sub1.o) while $* stands for the base portion before the extension (e.g. sub1). These variables are merely a convenience in this example, but as you will see below they are crucial in the writing of more sophisticated rules. For example, we can rewrite the last three rules of our example using an implicit rule:
	.c.o:
		$(CC) -c $(CFLAGS) $(INCLUDES) -o $@ $*.c
	main.o: inc.h

The implicit rule tells make how to create any *.o file from a source file called *.c ... the last line specifies the additional dependency of main.o on the include file inc.h. One warning: only extensions that make already knows about are permitted in implicit rules of this form. Typically these relate source code files in familiar languages (*.c or *.F) to object files (*.o). An implicit rule starting with .Q.o, for example, will not work as written: make keeps its known extensions in a list called .SUFFIXES, and an unfamiliar one must first be added with a line such as .SUFFIXES: .Q

Common Targets
Here are some common targets that you will find in the majority of Makefiles, both inside and outside of HERMES:
  • install: After running make to compile everything, one can often run make install to copy the new executables and/or libraries to standard locations (e.g. /usr/local/bin and/or /usr/local/lib).
  • dist: Create a tar file for distribution of the software.
  • clean: Clean up the area by deleting all files that can be created from others (e.g. all object and executable files). This target is useful to give you a clean slate if you want to recompile everything. Sometimes it is run before a distribution tar file is made, but a well-constructed Makefile should make this unnecessary (why should you have to recompile everything every time you make a new tar file?)

As I said before, be sure that you are using the GNU version of make. As usual with GNU, there is an excellent manual available.

CFORTRAN

The HERMES software suite is written in both C and FORTRAN, so it is important to be able to have FORTRAN routines call C routines and vice versa. In addition, the routines may be using the same variables kept in a common memory block, and in order for both C and FORTRAN routines to be able to use the blocks consistently, one has to address them properly, especially character strings.

Interfacing C and FORTRAN is in general quite difficult and system-dependent. Fortunately, Burkhard Burow (from DESY) has written a comprehensive set of macros which make this easier for us; they are collected in a single C header file called cfortran.h, which is located in the directory /hermes/include. Note that these macros only define a framework - a further set of macros which describes the routine types plus argument types must be supplied by the user. For any given package such as ADAMO (which is written entirely in FORTRAN), this can mean writing several hundred macros! Fortunately, this has already been done once for every package used in HERMES. The ADAMO routine macros, for example, are in /hermes/include/adamo.h. To call any ADAMO routine from a C routine one needs only to add the following "include" statements:

#include "cfortran.h"
#include "adamo.h"

That's pretty much it for the coding, but a few tricks are still needed for compilation and linking. For compilation, one needs to specify a system type via a macro definition used in the "compile" command (for gcc this is -Df2cFortran). For linking, one needs to link using the C compiler command and specify additional FORTRAN object modules and FORTRAN system libraries. These additional FORTRAN objects, unfortunately, do not have standard names or locations; e.g. for the PCFarm one needs to add the string

-lfrtbegin -lg2c -lm -L/usr/lib/gcc-lib/i386-redhat-linux/3.2.3 \
  -L/usr/lib/gcc-lib/i386-redhat-linux/3.2.3/../../.. -lm -lgcc_s?g

You should be aware that all these nasty compilation and linking details are handled for you if you use the standard HERMES "configure" scripts together with a user-written "Makefile.in" file (see Porting user software to Linux for an explanation of this).

For a complete example of use of CFORTRAN with ADAMO (including "configure" and "Makefile.in" scripts), please see the MISSION 3 section of BOOTCAMP.

Continue to MISSION 2: HERMES RECONNAISSANCE.


N.C.R. Makins (makins@uiuc.edu)