"Those who don't understand Unix are condemned to reinvent it, poorly."
Henry Spencer, well-known Unix systems programmer
This document does not assume any experience with Unix, but it does assume experience with computer systems in general and a willingness to learn. The purpose of writing this document is twofold:
Most documentation can be classified as a "reference" or a "tutorial" -- this is more of a tutorial. It does not go in-depth into any subject (that's the purpose of reference documentation), but it provides a general overview of some of the issues you will encounter when you first start playing with Unix.
An Operating System is the most important software component in a computer. Operating systems are written to provide a standardized interface to a computer's hardware; programmers who write programs which run on top of operating systems can communicate with the hardware by way of the operating system, and the programs they write will be simpler and easier to move to different computer systems since only the operating system needs to know the peculiarities of the particular hardware. Modern operating systems also allow preemptive multitasking - although only one program can run at a time in a uniprocessor system, the operating system can interrupt any program to allow another program to run. Modern operating systems protect programs from one another so that if one program has some sort of bug, it will not affect other programs, nor will it bring down the system. Multiuser operating systems either allow more than one user to use a computer at once, or only allow one user at a time, but keep the environment of different users separate, depending on the definition of "multiuser". Unix is a modern multitasking multiuser operating system.
user@hostname:/dir$ _
(Where the underscore represents your cursor.) The most important
information you should gain from this is the current user. Although
this may appear in your prompt explicitly, the "$" sign also indicates
the current user. Unix systems have two basic types of accounts - user
accounts and superuser accounts, usually "root". The "root" account has
complete and total control over the system, and should only be used
when required. More information about this can be found in the section on
account management.
The "$" in the above prompt tells you two things - 1. this is a user
account, and 2. this is a Bourne-type shell. In C-type shells, the "$"
is usually replaced by a "%", but in both shells, the superuser prompt
should end with a "#".
You may be wondering what "Bourne" shells and "C" shells are. These are just two types of shells - since shells are full programming languages, these are like dialects of the same language, like Scheme is to Common Lisp, for example. In Linux systems, the default shell for both root and user is "bash", which is a Bourne shell. In modern BSD systems, the default shell is "tcsh", which is a C shell; most Linux users never use C shells. We'll get back to differences between Linux and the freely-available BSDs later.
Earlier it was mentioned that shells are full programming languages. Specifically, shells are interpreted languages, which means they are not like C, where programs must be compiled before they are executed, but they are like Lisp or ML, where statements are executed as the interpreter sees them. This means that it can sometimes be very convenient to write a shell program, save it in a file and then execute that shell program at some later time. Indeed, this is very common - when a Unix system boots up, the first program it will execute will immediately call some shell scripts to do the rest of the system setup. Most of the things you can change about a Unix system are changed through shell scripts, so it is important to understand them. ALL shell scripts that are executed at boot-time should be written for a Bourne shell. Although you may want to learn the C shell for your own edification, you must learn the Bourne shell in order to change anything non-trivial about a Unix system. For the most part, the same features are available when writing a shell script or using the shell interactively; first we will show some things common in interactive shell usage, and then some features more common in shell scripts.
In order to fully understand how a shell works, one must understand the Unix process model, so we will take a quick digression from shell concepts; the Unix process model is one of the most popular features of Unix and has also been copied in other systems.
A process in Unix is a particular invocation of a program; in other words, it is a program running on the computer. You may have more than one invocation of the same program, and both processes are completely independent. If a program is launched twice, you will have two processes; if one of those processes crashes, it will not affect the other one. It is important to make the distinction between a process and a thread: a thread is "part of" a process. A thread represents a particular part of the process that's running: threads are popular because on multiprocessor machines, a multithreaded program may run on more than one processor at the same time. Since a thread is "part of" a process, if a thread does something to make the program crash, the entire process will crash. You will generally never have to worry about threads, and they are only mentioned in order to note that they are not processes.
Processes are arranged hierarchically: each process has a "parent" process and may have a number of "children" processes, which themselves may have children processes. Processes are always created the same way: a process must "fork" to create a new process. When a process "forks", it creates another process which is an identical copy (except for a few things noted below). The new process (which is identical to the old process) is called the child and the old process (which does not exit upon forking) is called the parent. If all one had to work with processes was "fork", then it would only be possible to run one program. In order to run a different program, a process must replace itself with the new program to be run. This operation is called an "exec". After a process "execs", it is no longer running the same program, but rather has just started running a new program. Note, however, that the process is still the same although it is running a different program (i.e., it still has the same parent).
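The fact that an "exec" keeps the same process can be observed from the shell itself. The following is only a minimal sketch, assuming a POSIX-style "sh" is available on your PATH; the variable names (pids, before, after) are made up for this illustration.

```sh
#!/bin/sh
# Start a child shell (the fork). The child prints its PID, then uses the
# "exec" builtin to replace itself with a second shell, which prints its
# own PID. The single quotes keep *this* shell from expanding $$ itself.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

before=$(echo "$pids" | sed -n 1p)   # PID printed before the exec
after=$(echo "$pids" | sed -n 2p)    # PID printed by the replacement program

echo "this shell's PID:      $$"
echo "child PID before exec: $before"
echo "child PID after exec:  $after"
```

The last two numbers should be identical (the program changed, the process did not), and both should differ from the PID of the shell that launched them.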
Each process has several pieces of data associated with it:
You can use the "cd" shell builtin to change the current working directory of the shell process. If you truly understand the Unix process model, you will see why "cd" must be a shell builtin and cannot be a disk command. In a similar vein, it is not possible for a disk command to change the value of an environment variable in the shell. See the section on environment variables to see how to manipulate the shell's environment. See the section on process management for information on directly managing processes.
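To see why "cd" cannot be a disk command, it may help to watch a child process change its own working directory. This is a sketch under the assumption that a POSIX-style "sh" is on your PATH; the variable names are illustrative.

```sh
#!/bin/sh
# Remember where this shell is right now.
start=$(pwd)

# A child shell changes *its own* working directory, reports it, and exits.
child_dir=$(sh -c 'cd / && pwd')

# The parent shell's working directory has not moved.
after=$(pwd)

echo "the child ended up in:  $child_dir"
echo "the parent is still in: $after"
```

A disk command named "cd" would behave exactly like the child shell here: it would change directory and then exit, leaving the shell that launched it where it was.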
#rca-linux-presentation.html#  foobar2                      src                     try3    try5
GNUstep                        mbox                         try.c                   try3.c  try5.c
bin                            mymacros.h                   try2-no-optimization.s  try3.s  try6
foo.awk                        rca-linux-presentation.html  try2.c                  try4    try6.c
foobar                         sendmail.cf                  try2.s                  try4.c
The shell will expand the following patterns on the left to the files on the right:
| Pattern | Expands to |
|---|---|
| `*.c` | try.c try2.c try3.c try4.c try5.c try6.c |
| `*c` | src try.c try2.c try3.c try4.c try5.c try6.c |
| `*ste*` | GNUstep |
| `foo*` | foo.awk foobar foobar2 |
| `*foo*` | foo.awk foobar foobar2 |
| `try?.c` | try2.c try3.c try4.c try5.c try6.c |
| `foo????` | foo.awk foobar2 |
| `*nogood*` | (no match) |
Note the last two examples - "?" must match one and only one character, "*" may match zero or more characters, and patterns need not match anything at all. When a pattern does not match anything at all, the shell passes the actual pattern to the program as an argument. Patterns may match either files or directories (in the second example, "src" is actually a directory), since a directory is nothing more than a special type of file to Unix.
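You can try these rules safely in a scratch directory. The sketch below assumes the "mktemp" utility is installed (it is not mentioned in the text above) and recreates only a handful of the files from the listing; the variable names are illustrative.

```sh
#!/bin/sh
# Work in a scratch directory so that nothing real is touched.
orig=$(pwd)
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# A few empty files and one directory, named after the listing above.
touch try.c try2.c try3.c foo.awk foobar foobar2
mkdir src

star_c=$(echo *.c)         # "*" matches zero or more characters
ends_c=$(echo *c)          # also matches the directory "src"
one_char=$(echo try?.c)    # "?" matches exactly one character, so not try.c
four=$(echo foo????)       # exactly four characters after "foo"
nomatch=$(echo *nogood*)   # nothing matches: the pattern is passed as-is

printf '%s\n' "$star_c" "$ends_c" "$one_char" "$four" "$nomatch"

# Return to where we started and remove the scratch directory.
cd "$orig" && rm -r "$dir"
```

Note that "echo" never sees a star or a question mark unless the pattern fails to match; the shell has already replaced the pattern with filenames by the time "echo" runs.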
As you may have noticed, several of the example files had a ".c" extension, indicating they were C program listings. If you come from a DOS/Windows background, you should know that file extensions mean nothing to Unix - they are simply for the user's ease in identifying files. Also, MS-DOS converts should note that unix filename expansion is very different from DOS filename expansion. If you are at a DOS prompt and you type "pkzip *.c", DOS invokes the "pkzip" program and passes it one argument, "*.c". It is up to "pkzip" to figure out how "*.c" is supposed to be expanded, so each DOS program may do filename expansion differently (whereas in unix, no program does any filename expansion - filename expansion is handled by the unix shell). DOS is completely and absolutely different from Unix.
If you come from a MacOS background, note that all Unix files are "flat" - there is no such thing as a resource and data fork, and there is no way of identifying the type or creator of a file other than by looking at the file's contents.
Shell expansion also varies from shell to shell, and much more complicated steps may be involved; we'll get to another type of expansion later.
You may want to examine or change your path variable at some point. To change an environment variable in a C shell, type in
PROMPT% setenv VARIABLE VALUE
setenv is a C shell builtin command - it could not be a disk command, since if it were, the "setenv" program would be launched, inheriting the parent's environment (in this case, the parent process is the shell), and then the "setenv" program would modify its own environment, not the shell's environment. Note that the C shell has some additional things to deal with when working with environment variables. Variables which are important to the shell (such as PATH and the variable describing how to set up the shell prompt) are dealt with specially. For example, to add '/foo' to your path in the C shell, you would do something like this:
PROMPT% set path=($path /foo)
The Bourne shell does not make any distinction like this; in the Bourne shell, you can set a variable's value by typing in
PROMPT$ VARIABLE=VALUE
The Bourne shell works differently from the C shell in that not all environment variables are automatically inherited by child processes. To mark a variable for export to subsequent child processes, type in
PROMPT$ export VARIABLE
The semicolon ";" character works just like a newline (the character generated when you hit "enter"), so setting a variable in a Bourne shell is often seen as
PROMPT$ VARIABLE=VALUE ; export VARIABLE
or
PROMPT$ export VARIABLE=VALUE
This last syntax works in almost all Bourne shells, so you can feel free to use it interactively. However, if you are writing a shell script to be used by other people, you should always use the 'VAR=VAL ; export VAR' syntax.
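The effect of "export" is easy to check by asking a child process what it can see. This is a minimal sketch; GREETING is a hypothetical variable name chosen for the example, and the child is assumed to be a POSIX-style "sh".

```sh
#!/bin/sh
# GREETING is a made-up variable used only for this illustration.
unset GREETING
GREETING=hello                      # set in this shell, but not exported

# The child prints the variable, or "<unset>" if it did not inherit it.
not_exported=$(sh -c 'echo "${GREETING:-<unset>}"')

export GREETING                     # mark the variable for export
exported=$(sh -c 'echo "${GREETING:-<unset>}"')

echo "before export, the child saw: $not_exported"
echo "after export, the child saw:  $exported"
```

The first child reports "<unset>" and the second reports "hello", even though the variable had the same value in the parent shell both times.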
To examine the contents of an environment variable, you must use variable expansion. Variable expansion is an expansion step that is taken before the type of expansion named above (pathname expansion). Expansion includes several other steps (which are different in various shells), but pathname and variable expansion are the most important. To expand the name of a variable to its contents, simply prepend a "$" sign to the variable. For example, $PATH would expand to the value of your path, $HOME would expand to the name of your home directory, etc. The easiest way to see the value of a variable is to use variable expansion to pass the value of the variable to the "echo" program, which simply shows you the arguments it was passed ("echo" is actually a builtin in some shells, but that's beside the point). For example, echo $PATH would show you your PATH.
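Because variable expansion happens before pathname expansion, a pattern stored in a variable is still expanded when the variable is used. The sketch below illustrates this in a scratch directory; it assumes the "mktemp" utility is installed, and PATTERN is a hypothetical variable name.

```sh
#!/bin/sh
orig=$(pwd)
dir=$(mktemp -d) || exit 1          # scratch directory (mktemp is assumed)
cd "$dir" || exit 1

# The shell does not do pathname expansion on the value being assigned,
# so PATTERN holds the three characters *.c themselves.
PATTERN=*.c

before=$(echo $PATTERN)   # no .c files yet: the pattern is passed through
touch a.c b.c
after=$(echo $PATTERN)    # $PATTERN becomes *.c, which then matches files

echo "before creating files: $before"
echo "after creating files:  $after"

cd "$orig" && rm -r "$dir"
```

The same "echo $PATTERN" command produces different output the second time, because the pattern that resulted from variable expansion then went through pathname expansion.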
Another thing you need to know about the shell is redirection. Since many Unix programs read from the keyboard and write to the terminal and most data in Unix is stored in text files, it is often useful to redirect either the input or the output of a program to or from a file. This is what redirection does.
In order to understand redirection, you must first understand filehandles. Every modern operating system uses some form of filehandles, but these are only important to users (in addition to programmers) in Unix systems. In order to read or write to a file, a program must first "open" the file - the open operation returns a unique integer descriptor for the particular file, and from then on, the program uses that integer to identify the file it wants to read from or write to, instead of using the filename. The program may then "close" that file when it is done with it. The set of open filehandles and the actual files they point to are unique for every process. Therefore, PID 500 may have an open file with a file descriptor of 5 and PID 600 may also have an open file with file descriptor 5, but these may point to completely different files, or they may point to the same file, in which case bad things will happen if both programs wish to write to that file (file locking is implemented elsewhere). The set of open files for a process is inherited from its parent, although the exact semantics of how this works may differ among Unix variants. Later on, you'll see that everything in Unix should be represented as a file; right now, all you need to know is that three file descriptors are special: file descriptor 0 points to the "standard input" (stdin) of a process, file descriptor 1 points to the "standard output" (stdout), and file descriptor 2 points to the "standard error output" (stderr) of a program. By default, stdin is your keyboard, and stdout and stderr are both connected to your terminal window. This means that if a program "reads" from stdin, it will not read from a file on disk, but it will read from the keyboard, and if it "writes" to stderr or stdout, it will not write to a disk file, but it will write to your terminal window. Stderr is different from stdout in that a program should output exceptional output to stderr and normal output to stdout.
Redirection is a means by which you can change the default file descriptors of a process from the shell. Remember that open files are inherited across the creation of child processes, so the shell can temporarily close stdin, open another file (perhaps a disk file called "/tmp/foo") to file descriptor 0 and then launch the child process; the child process will then have its stdin redirected to the disk file. Whenever the child wants to read something from the keyboard, it will instead end up reading from the file "/tmp/foo". (Technically, this isn't how it really works, but this is functionally equivalent). Similarly, you can redirect stdout and stderr.
The syntax for shell redirection in the Bourne shell is:
PROMPT$ PROGRAM FD> OUTFILE
PROMPT$ PROGRAM FD>> OUTFILE
PROMPT$ PROGRAM FD< INFILE
where FD is a non-negative integer file descriptor. The ">" character means that OUTFILE will be opened so that PROGRAM can write to the file, and the "<" character means that INFILE will be opened so that PROGRAM can read from the file. When you use the ">" character, OUTFILE will be truncated when it is opened; that means that if OUTFILE contained anything, those contents will be immediately erased and PROGRAM will write to an empty file. If you use ">>" instead of ">", any output from PROGRAM will be appended to OUTFILE; the current contents of OUTFILE will not be lost. The most common redirections involve stdin (file descriptor 0) and stdout (file descriptor 1), so the Bourne shell provides shortcuts for redirecting these file descriptors. Using ">" without FD (i.e., "ls > /tmp/ls-output") will redirect file descriptor 1 to OUTFILE and using "<" without FD will redirect file descriptor 0. Redirection in the C shell works a bit differently: you cannot explicitly state a file descriptor number; you can only redirect the stdin and stdout of a program. Indeed, one of the greatest weaknesses of the C shell is that there is no way to redirect stderr by itself (if you ever need to do this, you can try "(PROGRAM > /dev/tty) >& ERRFILE").
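The difference between truncating and appending is easiest to see by writing to the same file several times. This is a sketch for a Bourne-type shell, assuming the "mktemp" utility is installed; the file and variable names are illustrative.

```sh
#!/bin/sh
out=$(mktemp) || exit 1       # scratch file (mktemp is assumed to exist)

echo first   > "$out"         # ">" truncates: the file holds one line
echo second  > "$out"         # truncated again: "first" is gone
echo third  >> "$out"         # ">>" appends: "second" is kept
echo fourth 1>> "$out"        # the same thing with the descriptor spelled out

contents=$(cat < "$out")      # "<" makes the file cat's standard input

echo "$contents"
rm -f "$out"
```

The file ends up holding "second", "third" and "fourth", one per line; "first" was erased by the second ">" redirection.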
In the Bourne shell, if you want to redirect a file descriptor not to another file, but an open filehandle, the following syntax may be used:
PROMPT$ PROGRAM FD1>&FD2
PROMPT$ PROGRAM FD1<&FD2
The first example redirects anything written to FD1 to be written to FD2; the second example redirects anything read from FD1 to be read from FD2. FD1 may be omitted, and the default value for FD1 is the same as mentioned above for the ">" redirection and the "<" redirection. The most common cases are redirecting a program's standard error to the standard output, or redirecting a program's standard output to standard error, which, respectively, are accomplished like this:
PROMPT$ PROGRAM 2>&1
PROMPT$ PROGRAM >&2
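A short experiment can show where each stream ends up. This is a sketch for a Bourne-type shell; "noisy" is a hypothetical shell function written for this illustration, and shell functions themselves are not covered in the text above. It also relies on the shell processing redirections from left to right.

```sh
#!/bin/sh
# "noisy" is a made-up helper that writes one line to each output stream.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2      # FD1 omitted, so stdout is sent to stderr
}

# $( ... ) captures whatever the command writes to file descriptor 1.
only_out=$(noisy 2> /dev/null)      # stderr is discarded
both=$(noisy 2>&1)                  # stderr joins stdout, both are captured
only_err=$(noisy 2>&1 > /dev/null)  # left to right: stderr goes where stdout
                                    # points now, then stdout is discarded

echo "only_out: $only_out"
echo "both:     $both"
echo "only_err: $only_err"
```

The last case is worth studying: "2>&1" copies the current destination of file descriptor 1, so redirecting file descriptor 1 afterwards does not drag stderr along with it.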
Very often, one wants to redirect the stdout of one program to the stdin of another - Unix shells provide a means to do this easily, the pipe. This is the syntax of the shell pipe:
PROMPT$ PROG1 | PROG2
This will redirect the stdout of PROG1 to the stdin of PROG2. You can chain many pipes together, so commands like "PROG1 | PROG2 | PROG3" are perfectly legal.
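Here is a small pipeline you can run yourself. It is only a sketch: "printf", "sort" and "head" are standard utilities that are not discussed in the text above, and the variable names are illustrative.

```sh
#!/bin/sh
# cat copies stdin to stdout, so a chain of cats passes data through unchanged.
through=$(echo hello | cat | cat | cat)

# printf writes three lines to its stdout; sort reads them on its stdin and
# writes them back in alphabetical order; head keeps only the first line.
first=$(printf '%s\n' pear apple orange | sort | head -n 1)

echo "$through"
echo "$first"
```

Each program in the chain only knows that it is reading from file descriptor 0 and writing to file descriptor 1; the shell did all of the plumbing before the programs started.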
Some examples are in order. In order to understand these examples, you must know that "cat" is a simple program that copies its stdin to its stdout; the "foo" program outputs "Standard Output" to stdout and "Error Output" to stderr.
PROMPT$ ./foo
Standard Output
Error Output
PROMPT$ # let's look at this program 'foo' to see how it works:
PROMPT$ cat foo
#!/bin/sh
echo Standard Output
echo Error Output 1>&2
PROMPT$ ./foo 2> errfile
Standard Output
PROMPT$ cat errfile
Error Output
PROMPT$ ./foo 2> errfile | cat > outfile
PROMPT$ cat outfile
Standard Output
PROMPT$ ./foo | cat | cat | cat | cat | cat
Error Output
Standard Output
PROMPT$ # in the above example, note that "Error Output" was printed first,
PROMPT$ # even though our program output it second. This shows the time
PROMPT$ # it took for "Standard Output" to go through all those invocations
PROMPT$ # of the 'cat' program.
The next section discusses escaping various characters; in order to have such a discussion some common terminology is needed. This makes no difference in written communication, but you must use these names in any verbal communication to avoid any confusion.
| Character | Name (preferred name first) |
|---|---|
| ! | Bang, Exclamation Point |
| # | Hash, Pound |
| & | Ampersand |
| * | Star (rarely "asterisk" or "times") |
| - | Dash (never "minus" or anything else) |
| _ | Underscore |
| \| | Pipe |
| \ | Backslash |
| / | Slash |
| ~ | Tilde |
| ` | Backtick, backquote |
| ' | Single quote |
| " | Double quote |
| < > | Less than, Greater than |
| [ ] | Left (right) Bracket (also "square bracket") |
| { } | Left (right) Curly Brace (also "curly bracket") |
As an example, suppose someone told you over the telephone to type in "hash bang user bin pearl dash double you." You would have to infer the pathname and the argument and understand the meaning of the characters to be exactly "#!/usr/bin/perl -w".
Consider the following example:
PROMPT$ echo This computer costs much more than $1
This computer costs much more than
PROMPT$
The problem with the 'echo' statement is that the shell did variable expansion on the "$1" characters. Since "1" is actually a shell variable (a positional parameter, which has an empty value here), the shell expanded it and printed neither the dollar sign nor the digit. This introduces a need for a mechanism called quoting. Quoting allows one to remove the special meaning of certain characters, such as the dollar sign or whitespace.
The simplest way to quote something is to precede the character to be quoted with a backslash ("\"). If you've read the previous section, you'll know the importance of differentiating between the slash (which has no special meaning to the shell, but denotes pathnames in the unix filesystem) and the backslash (which does backslash quoting in the unix shell and has no special meaning in the unix filesystem). Here are some examples of quoting using the backslash:
GREENSCREEN$ # The program "foo" prints out each of its arguments, one per line
GREENSCREEN$ ./foo one two three
one
two
three
GREENSCREEN$ # The '$' character does variable expansion in the shell:
GREENSCREEN$ ./foo one two $three
one
two
GREENSCREEN$ # it can be quoted with a backslash:
GREENSCREEN$ ./foo one two \$three
one
two
$three
GREENSCREEN$ # the space character separates arguments; it, too, can
GREENSCREEN$ # be quoted with a backslash:
GREENSCREEN$ ./foo one\ two three
one two
three
GREENSCREEN$ # the star character expands to all files in the current
GREENSCREEN$ # directory in the next example:
GREENSCREEN$ ./foo *
foo
GREENSCREEN$ ./foo \*
*
GREENSCREEN$ # The '>' symbol denotes redirection:
GREENSCREEN$ ./foo > /dev/null
GREENSCREEN$ ./foo \> /dev/null
>
/dev/null
GREENSCREEN$ # The backslash can quote itself:
GREENSCREEN$ ./foo \\
\
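Backslash quoting also repairs the 'echo' example from the start of this section. The following is a sketch for a Bourne-type shell; it uses "printf" (a standard utility not covered in the text) as a stand-in for the "foo" program, since printf '%s\n' prints each of its arguments on its own line.

```sh
#!/bin/sh
# Clear the positional parameters so that $1 is empty, as it was at the
# interactive prompt in the example.
set --

unquoted=$(echo costs much more than $1)    # "$1" expands to nothing
quoted=$(echo costs much more than \$1)     # the backslash makes "$" ordinary

# A backslash before the space joins "one two" into a single argument.
spaced=$(printf '%s\n' one\ two three)

echo "$unquoted"
echo "$quoted"
echo "$spaced"
```

The first line printed ends after "than", the second ends with the literal characters "$1", and the third variable holds two lines rather than three.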
Consider the following example:
PROMPT$ ls
This is a crazy filename: $x <> $y \| \' \&
PROMPT$
Obviously, this file was not created by an experienced unix user, so it would be safe to delete it. This can be accomplished using backslash quoting as follows:
PROMPT$ rm This\ is\ a\ crazy\ filename:\ \ \$x\ \<\>\ \$y\ \\\|\ \\\'\ \\\&
PROMPT$ ls
PROMPT$
This should immediately strike you as error-prone and tedious; this is why the shell provides double quoting. When the shell sees anything placed in between double quotes, it will remove the meaning of some of the special characters (but not all of the special characters). Different shells may do different things, so the best way to learn exactly what your shell will expand between double quotes is to experiment as follows:
PROMPT$ # this is a normal Bourne shell, let's see how it does double-quoting:
PROMPT$ # again, we have our program "foo" which simply prints its arguments,
PROMPT$ # one per line:
PROMPT$ ./foo one\ two three
one two
three
PROMPT$ ./foo "the first argument" "the second argument"
the first argument
the second argument
PROMPT$ # let's see if it escapes the dollar sign:
PROMPT$ FOO=bar
PROMPT$ ./foo "$FOO"
bar
PROMPT$ # apparently not.
PROMPT$ # however, you can do backslash quoting within double quotes:
PROMPT$ ./foo "\$FOO"
$FOO
PROMPT$ # can also escape the question mark and the star:
PROMPT$ ./foo ./fo*
./foo
PROMPT$ ./foo "./fo*"
./fo*
PROMPT$ ./foo ./fo?
./foo
PROMPT$ ./foo "./fo?"
./fo?
PROMPT$ ./foo ">" /dev/null
>
/dev/null
PROMPT$ ./foo "|" cat
|
cat
PROMPT$ ./foo # this is a comment
PROMPT$ ./foo "# this is not a comment"
# this is not a comment
PROMPT$ # the ampersand has special meaning to the shell (discussed later):
PROMPT$ ./foo "&"
&
PROMPT$ # we'll discuss backticks later; let's see what happens
PROMPT$ # to them within double quotes:
PROMPT$ cat ./foo
#!/usr/bin/perl
foreach (@ARGV) {
print;
print "\n";
}
PROMPT$ ./foo `cat ./foo`
#!/usr/bin/perl
foreach
(@ARGV)
{
print;
print
"\n";
}
PROMPT$ ./foo "`cat ./foo`"
#!/usr/bin/perl
foreach (@ARGV) {
print;
print "\n";
}
PROMPT$ # that was meant to show that double quoting does NOT
PROMPT$ # escape backticks, although it's a bit convoluted.
PROMPT$ # the parenthesis is also special to the shell (more on
PROMPT$ # that later):
PROMPT$ ./foo (something)
Syntax error: word unexpected (expecting ")")
PROMPT$ ./foo "(something)"
(something)
PROMPT$ # but double quotes do escape parentheses.
PROMPT$ # remember you can do backslash quoting within double
PROMPT$ # quotes, so this works:
PROMPT$ ./foo "The most trite phrase in computer science: \"HELLO WORLD\""
The most trite phrase in computer science: "HELLO WORLD"
PROMPT$ # now we can specify that annoying filename more easily:
PROMPT$ ls
This is a crazy filename: $x <> $y \| \' \&
foo
PROMPT$ rm "This is a crazy filename: \$x <> \$y \\| \\' \\&"
PROMPT$ ls
foo
PROMPT$ # Now let's see how the C shell handles some of the most
PROMPT$ # important examples above. The C shell does not allow
PROMPT$ # one to use comments interactively (yet another problem
PROMPT$ # with the C shell), so you won't have my comments
PROMPT$ # for the next few lines.
PROMPT$ /bin/csh
%setenv FOO bar
%./foo "$FOO"
bar
%./foo ">" /dev/null
>
/dev/null
%./foo "|" /dev/null
|
/dev/null
%./foo "&"
&
%./foo "./fo?"
./fo?
%./foo "./fo*"
./fo*
%./foo "`cat ./foo`"
#!/usr/bin/perl
foreach (@ARGV) {
print;
print "\n";
}
%./foo "(something)"
(something)
%./foo "\$"
Variable name must contain alphanumeric characters.
%./foo "\\"
\\
%./foo "\""
Unmatched ".
%./foo "How would you rate the \"C\" shell's stupidity?"
Unmatched '.
%exit
PROMPT$ # so the C shell does not allow backslash quoting within double
PROMPT$ # quotes; this is actually a huge problem for C shell programming,
PROMPT$ # and is yet another reason why one should never write anything
PROMPT$ # in C shell.
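The Bourne shell findings from the session above can be collected into one short script. This is a sketch only; the variable names are illustrative, and your shell may differ in details, so treat it as a starting point for your own experiments.

```sh
#!/bin/sh
FOO=bar

expanded="value: $FOO"        # the dollar sign keeps its meaning
escaped="value: \$FOO"        # a backslash inside the quotes switches it off
literal_star=$(echo "fo*")    # no pathname expansion inside double quotes
ran="result: `echo inner`"    # backticks are still executed
nested="say \"hello\""        # an embedded double quote needs a backslash

printf '%s\n' "$expanded" "$escaped" "$literal_star" "$ran" "$nested"
```

Running it prints "value: bar", "value: $FOO", "fo*", "result: inner" and "say "hello"", one per line.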
In addition to double-quoting, the shell provides single quoting. Since you now have an example of how to find out exactly how double quoting works, you can apply that to single quoting to find out how it works for yourself; I will only say that it's more restrictive than double quoting (it doesn't do variable interpretation, etc.). Needless to say, single quoting works differently in the C shell than the Bourne shell. The removal of the annoying file can be done with single quoting as follows (combining single quoting and double quoting since one can't quote the single quote (') within single quotes):
PROMPT$ ls
This is a crazy filename: $x <> $y \| \' \&
foo
PROMPT$ rm 'This is a crazy filename: $x <> $y \| \'"'"' \&'
PROMPT$ ls
foo
PROMPT$ # better example:
PROMPT$ ./foo 'This is all one argument: $foo <>?*& \\ `/bin/ls`'
This is all one argument: $foo <>?*& \\ `/bin/ls`

PROMPT$ ls
This is a crazy filename: $x <> $y \| \' \&
foo
PROMPT$ rm This*
PROMPT$ ls
foo
PROMPT$
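To compare the two quoting styles side by side, you can put the same text inside each kind of quote. This is a minimal sketch for a Bourne-type shell; the variable names are made up for the illustration.

```sh
#!/bin/sh
FOO=bar

single='$FOO `echo inner` \\ *'   # everything is literal, even the backslashes
double="$FOO `echo inner` \\ *"   # the same text between double quotes

# A single quote cannot appear inside single quotes, so: close the single
# quotes, add a double-quoted single quote, and open single quotes again.
apostrophe='it'"'"'s'

printf '%s\n' "$single" "$double" "$apostrophe"
```

The single-quoted string comes out exactly as typed, while the double-quoted one has the variable expanded, the backticks executed and the two backslashes reduced to one.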
As previously mentioned, Unix is a multitasking operating system. If you use a windowing system, this becomes apparent as you can have multiple processes all running at the same time in different windows. However, some people consider it tedious to open a new terminal window for every process one wants to use. You can control multiple processes from within one shell using job control. This is one of the most often overlooked aspects of the shell for the unix newbie.
Unix job control is actually somewhat complex, so the following explanation is a simplification. A process which is launched from a shell may be in one of four different states:
Whenever you see a shell prompt waiting for you to type in a command, the shell is the current foreground process. The shell accepts input from the terminal (the command you type in). Whenever you type in a command as all commands above have been typed, a process is launched and that process becomes the foreground process. For example:
GREENSCREEN$ # the shell is currently the foreground process
GREENSCREEN$ # now I launch 'cat' and it becomes the foreground
GREENSCREEN$ # process (recall that 'cat' is a program which
GREENSCREEN$ # copies its standard input to its standard output):
GREENSCREEN$ cat > /dev/null
Cat is reading from standard input and throwing away its input.
You can see that the shell is no longer the foreground process
as these lines that I'm typing are not being interpreted as
commands. 'Cat' will continue in this fashion until I tell it
to stop. I can stop 'cat' in three ways:
1. Hitting control-C - this will effectively kill 'cat' - the
   process will no longer be a foreground process because it
   will not exist.
2. Hitting control-D - control-D typed in from a terminal means
   "end of file." Recall that I could have just as easily
   redirected a file to be the standard input of 'cat' - when I
   do that, the file from which 'cat' is reading has a definite
   end, and 'cat' exits when it has finished reading the entire
   file. I can indicate that I have given 'cat' an entire "file"
   on standard input by hitting control-D which signifies
   "end-of-file."
3. Hitting control-Z - this will turn 'cat' from a foreground
   process into a suspended (or stopped) process, and will make
   the shell the foreground process again.
I hit control-D:
GREENSCREEN$ # back at my shell
As mentioned above, one can turn the foreground process into a suspended process by hitting control-Z. From the shell, one can examine which processes are in the background and which processes are stopped or suspended using the 'jobs' shell builtin:
GREENSCREEN$ # launching cat:
GREENSCREEN$ cat > /dev/null
Now I'll suspend 'cat' by hitting control-Z:
^Z
zsh: suspended cat > /dev/null
GREENSCREEN$ # Note that my shell points out that 'cat' has been suspended.
GREENSCREEN$ # I can see which jobs are backgrounded and stopped/suspended
GREENSCREEN$ # using the 'jobs' builtin:
GREENSCREEN$ jobs
[1] + suspended cat > /dev/null
GREENSCREEN$ # Of course, I can only have one foreground process,
GREENSCREEN$ # but I may have multiple background and suspended processes:
GREENSCREEN$ cat > /dev/null
^Z
zsh: suspended cat > /dev/null
GREENSCREEN$ jobs
[1] - suspended cat > /dev/null
[2] + suspended cat > /dev/null
GREENSCREEN$ # I can also gather some more information about my
GREENSCREEN$ # jobs by passing the '-l' parameter to 'jobs':
GREENSCREEN$ jobs -l
[1] - 21284 suspended cat > /dev/null
[2] + 21292 suspended cat > /dev/null
GREENSCREEN$ # now I know the PIDs of the suspended processes, and
GREENSCREEN$ # I can kill them off:
GREENSCREEN$ kill -9 21284 21292
[2] + killed cat > /dev/null
[1] + killed cat > /dev/null
GREENSCREEN$ jobs
GREENSCREEN$
Of course, making a process suspended would not be very useful if there were no way to make the process the foreground process again. This is accomplished using the 'fg' (foreground) shell builtin. The syntax of the 'fg' builtin is:
PROMPT$ fg %JOBNO
Where JOBNO is a job number. The job number is the number "1" or "2" in the previous example. You may ask why one can't simply use the PID of a suspended process with 'fg' - for an illustration of why job numbers are separate from PIDs, consider the following example:
GREENSCREEN$ cat | cat > /dev/null
Back in 'cat', but which one? I launched two processes above, and both
are running simultaneously. Now I suspend this *job* (not *process*):
^Z
zsh: suspended cat | cat > /dev/null
GREENSCREEN$ jobs -l
[1] + 21326 suspended cat |
21327 suspended cat > /dev/null
Carefully note the output of 'jobs' above - it's telling me that I have two processes which make up one job. The job number is "1" (indicated by "[1]"); the PID of the first process in the job is 21326, and that process' standard input was the terminal and its standard output was a pipe; the PID of the second process in the job is 21327, and that process' standard input was a pipe and its standard output was "/dev/null". Now we can get back to using the 'fg' builtin:
GREENSCREEN$ jobs -l
[1] + 21326 suspended cat |
21327 suspended cat > /dev/null
GREENSCREEN$ # the most common usage of the 'fg' builtin:
GREENSCREEN$ fg %1
[1] + continued cat | cat > /dev/null
Now I suspend this job again:
^Z
zsh: suspended cat | cat > /dev/null
GREENSCREEN$ # and create a new job:
GREENSCREEN$ cat > /dev/null
which I also suspend:
^Z
zsh: suspended cat > /dev/null
GREENSCREEN$ # and now I have two jobs suspended:
GREENSCREEN$ jobs
[1] - suspended cat | cat > /dev/null
[2] + suspended cat > /dev/null
GREENSCREEN$ # back to the first:
GREENSCREEN$ fg %1
[1] - continued cat | cat > /dev/null
and I kill it by hitting control-C:
^C
GREENSCREEN$ jobs
[2] + suspended cat > /dev/null
GREENSCREEN$ # and I get rid of the other one:
GREENSCREEN$ fg %2
[2] - continued cat > /dev/null
This time I'll use control-D:
GREENSCREEN$
Very often, you will suspend a primary job (like an editor) temporarily, run a quick command (like 'ls') and then return to the primary command. It would be tedious to look up the job number of the primary job and then give that as an argument to 'fg' again and again, so 'fg' supports a shortcut - if you don't give it any arguments, it will return to the job which was most recently in the foreground. This job is indicated with a "+" (plus) in the output of the 'jobs' builtin. You can experiment with this on your own to see how it suits you.
So far, you've seen examples of foreground and suspended (or stopped) processes, but not background processes. Recall that a suspended process is not running - it is completely halted. A background process, on the other hand, continues to run although it is not accepting input from the terminal. A background process can be created in two ways: by appending "&" to the command when you start it, or by suspending a running foreground job and resuming it with the 'bg' builtin:
GREENSCREEN$ # let's say I want to find out where in my filesystem
GREENSCREEN$ # the file called "stdio.h" resides. I can do this using
GREENSCREEN$ # the 'find' command, which will traverse the entire filesystem
GREENSCREEN$ # looking for "stdio.h" (don't worry about the syntax of
GREENSCREEN$ # the 'find' command for now, just realize that it may
GREENSCREEN$ # take a long time to do its work). Since this command
GREENSCREEN$ # traverses the entire filesystem, it will attempt to
GREENSCREEN$ # access directories for which I (as a regular non-root)
GREENSCREEN$ # user don't have permission to read, and print out
GREENSCREEN$ # an error message on standard error for each of these
GREENSCREEN$ # directories. I don't care about these errors, so I
GREENSCREEN$ # redirect standard error to /dev/null:
GREENSCREEN$ find / -name stdio.h -print 2> /dev/null
/usr/include/stdio.h
/usr/src/contrib/libio/dbz/stdio.h
/usr/src/contrib/libio/stdio/stdio.h
/usr/src/include/stdio.h
/usr/ports/lang/compaq-cc/files/include/stdio.h
GREENSCREEN$ # I timed that using my wristwatch - that
GREENSCREEN$ # took almost ten minutes to complete!
GREENSCREEN$ # If I weren't using multiple terminals,
GREENSCREEN$ # I would have been stuck here for ten minutes
GREENSCREEN$ # staring at the screen. There are two ways
GREENSCREEN$ # to deal with this:
GREENSCREEN$ # first way, background the job from the start:
GREENSCREEN$ find / -name stdio.h -print 2> /dev/null &
[1] 21370
GREENSCREEN$ /usr/include/stdio.h
# note that it found one file right away (most likely the most-accessed
GREENSCREEN$ # file). It's also mixing /usr/src/contrib/libio/dbz/stdio.h
/usr/src/contrib/libio/stdio/stdio.h
in its output with my /usr/src/include/stdio.h
GREENSCREEN$ # typing which is a bit confusing.
GREENSCREEN$ # let's kill off the job:
GREENSCREEN$ jobs
[1] + running find / -name stdio.h -print 2> /dev/null
GREENSCREEN$ jobs -l
[1] + 21370 running find / -name stdio.h -print 2> /dev/null
GREENSCREEN$ kill 21370
GREENSCREEN$
[1] + terminated find / -name stdio.h -print 2> /dev/null
GREENSCREEN$ jobs
GREENSCREEN$
GREENSCREEN$ # now suppose I started this command and I want the
GREENSCREEN$ # command to terminate normally (I don't want to kill
GREENSCREEN$ # it), but I also don't want it taking up my terminal
GREENSCREEN$ # window:
GREENSCREEN$ find / -name stdio.h -print 2> /dev/null
/usr/include/stdio.h
^Z
zsh: suspended find / -name stdio.h -print 2> /dev/null
GREENSCREEN$ # I suspended the job almost immediately (with control-Z).
GREENSCREEN$ # Note that I'm not getting any output - remember that
GREENSCREEN$ # suspended jobs are *not* running, whereas backgrounded
GREENSCREEN$ # jobs *are* running. Suppose I want to let the command
GREENSCREEN$ # continue:
GREENSCREEN$ jobs
[1] + suspended find / -name stdio.h -print 2> /dev/null
GREENSCREEN$ # this is how to use the 'bg' builtin:
GREENSCREEN$ bg %1
[1] + continued find / -name stdio.h -print 2> /dev/null
GREENSCREEN$ # note that the job is now running, and it's likely
GREENSCREEN$ # to mix in its output with my ty/usr/src/contrib/libio/dbz/stdio.h
/usr/src/contrib/libio/stdio/stdio.h
ping.
GREENSCREEN$ # enough of/usr/src/include/stdio.h
that...
GREENSCREEN$ jobs -l
[1] + 21375 running find / -name stdio.h -print 2> /dev/null
GREENSCREEN$ kill 21375
GREENSCREEN$
[1] + terminated find / -name stdio.h -print 2> /dev/null
GREENSCREEN$
The above examples may seem a bit convoluted, and they are: this is because these are fairly advanced topics, and backgrounding a suspended job is a fairly rare occurrence. Along these lines, you may ask what happens when you background a job which wants input from a terminal (recall that only the foreground job can receive terminal input). The best way to find out is to experiment:
GREENSCREEN$ # let's find out what happens when a backgrounded job
GREENSCREEN$ # tries to read from the terminal:
GREENSCREEN$ cat > /dev/null &
[1] 21398
GREENSCREEN$
[1] + suspended (tty input) cat > /dev/null
GREENSCREEN$ # it was backgrounded and then immediately suspended.
GREENSCREEN$ # Some shells support a builtin 'kill' command which
GREENSCREEN$ # can work with job numbers in addition to PIDs (this
GREENSCREEN$ # is not true for all shells, but is very useful if your
GREENSCREEN$ # shell does support it):
GREENSCREEN$ kill %1
[1] + terminated cat > /dev/null
GREENSCREEN$
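The same "&" backgrounding also works in a non-interactive script. What follows is only a minimal sketch, assuming a POSIX shell: the "$!" parameter (the PID of the most recent background job) and the 'wait' builtin are not covered in the text above, and slow_task is a hypothetical stand-in for a long-running command such as the 'find' shown earlier.

```shell
#!/bin/sh
# slow_task is a hypothetical stand-in for a long-running command.
slow_task() {
    sleep 1
    echo "slow task finished"
}

# Start the task in the background; the shell does not wait for it.
outfile=/tmp/slow_task.$$.out
slow_task > "$outfile" &
task_pid=$!          # $! holds the PID of the most recent background job

echo "started background job with PID $task_pid"

# ... other work could happen here while the task runs ...

# Block until the background job exits, then pick up its return value.
wait "$task_pid"
task_status=$?
task_output=`cat "$outfile"`
rm -f "$outfile"

echo "background job exited with status $task_status"
echo "its output was: $task_output"
```

Redirecting the job's output to a file, as done here, avoids the problem seen in the transcripts above where background output gets mixed in with whatever else is on the terminal.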
At this point, you should understand the most basic usage of job control and can use multiple processes from one terminal window. The following sections are a bit more advanced so you can skip them if you wish. A quick review of basic job control:
The above should be enough to allow you to use your shell interactively; you will also want to look into history expansion and job control if you're going to be doing any serious work in Unix. The only thing you need to know about job control at this point is that if you append a "&" character to the end of the command, that command will run in the "background" - the shell will launch the program and immediately return you to a shell prompt, not waiting for the program to terminate. Most daemons (more about daemons later) and long-running programs are run in the background, often with their output redirected to a file.
The rest of this section deals with non-interactive Bourne shell scripting. Writing a shell script in C shell is a mistake, so we will not be concerning ourselves with the C shell's corresponding scripting capabilities.
In Unix, each process has what is known as a return value; the return value is a number that a process leaves on termination, and the parent process may (and should) examine the return value of any of its children. In general, a return value of 0 indicates success and any other return value indicates failure (note that this is the opposite of C's convention for truth values, where 0 is false). In the Bourne shell, you can examine the return value of the previous program by looking at the special shell parameter "?", written "$?". For example, /bin/true is a program that always returns a return value of 0 and /bin/false is a program that always returns a non-zero value, and you can verify this:
PROMPT$ /bin/true
PROMPT$ echo $?
0
PROMPT$ /bin/false
PROMPT$ echo $?
1
PROMPT$ ls /bin/sh
/bin/sh
PROMPT$ echo $?
0
PROMPT$ ls /bin/nosuchfile
ls: /bin/nosuchfile: No such file or directory
PROMPT$ echo $?
1
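Shell functions leave a return value in the same way programs do, which makes it easy to experiment. This is only a sketch, assuming a POSIX shell; is_even is a hypothetical helper that is not part of any system, and the path passed to 'ls' is deliberately made up.

```shell
#!/bin/sh
# is_even is a hypothetical helper: it returns 0 (success) when its
# argument is even and 1 (failure) otherwise. A function's return value
# is the return value of the last command it ran, here the "[" test.
is_even() {
    [ $(( $1 % 2 )) -eq 0 ]
}

is_even 4
echo "is_even 4 -> $?"

is_even 7
echo "is_even 7 -> $?"

# $? is overwritten by every command, so save it if you need it later.
ls /no/such/path/for/this/demo 2> /dev/null
status=$?
echo "ls on a missing file -> $status"
```

The first two lines of output show 0 and 1. The value printed for 'ls' is non-zero, but the exact number differs between systems, so a script should test for "zero or not" rather than for a particular failure value.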
In and of itself, the return value is of little use, but combined with the "if" shell builtin, the return value is extremely important. The if builtin has the following syntax:
if COMMAND ; then STUFF-IF-TRUE ; else STUFF-IF-FALSE ; fi
Remember that the semicolon is treated like a newline, so the above is equivalent to
if COMMAND
then STUFF-IF-TRUE
else STUFF-IF-FALSE
fi
The "if" command is a conditional - if COMMAND has a return value of zero, STUFF-IF-TRUE will be executed, otherwise STUFF-IF-FALSE will be executed. For example,
PROMPT$ if /bin/true ; then echo YES ; else echo NO ; fi
YES
PROMPT$ if /bin/false ; then echo YES ; else echo NO ; fi
NO
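Any command can take the place of COMMAND, not just /bin/true and /bin/false. Here is a small sketch using grep, whose return value is 0 when it finds a match; has_user and the sample file are hypothetical, made up so that the example does not depend on the contents of your real /etc/passwd.

```shell
#!/bin/sh
# has_user is a hypothetical helper; it succeeds when the named account
# appears at the start of a line in a passwd-format file.
has_user() {
    # grep -q prints nothing; only its return value matters here.
    grep -q "^$1:" "$2"
}

# Build a small sample file for the demonstration.
sample=/tmp/passwd.sample.$$
printf 'root:x:0:0::/root:/bin/sh\nalice:x:1000:1000::/home/alice:/bin/sh\n' > "$sample"

if has_user alice "$sample" ; then
    echo "alice: found"
else
    echo "alice: missing"
fi

if has_user bob "$sample" ; then
    echo "bob: found"
else
    echo "bob: missing"
fi

rm -f "$sample"
```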
A very useful program to use with the if conditional is "test". The test program can determine if a file exists, what permissions it has, and many other things. See its manpage for details. The test program is used so often in shell scripts that it has a shortcut, "[". Some systems provide "[" as a symbolic link to the test program, and most shells also have builtin versions of both "test" and "[" which behave similarly to the program on disk. When you use "[", the last argument must be a closing "]". Here are some things you can do with the test program:
PROMPT$ # see if /bin/sh is executable by regular user
PROMPT$ if [ -x /bin/sh ] ; then echo yes ; else echo no ; fi
yes
PROMPT$ # see if /etc/passwd is writeable by regular user
PROMPT$ if [ -w /etc/passwd ] ; then echo yes ; else echo no ; fi
no
PROMPT$ # test for string equality
PROMPT$ FOOBAR=something
PROMPT$ if [ "$FOOBAR" = something ] ; then echo yes ; else echo no ; fi
yes
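Several tests can be combined in one script. The following is a rough sketch, assuming a POSIX shell; describe_path is a hypothetical helper, and "elif" (short for "else if") is a part of the "if" syntax that the text above does not cover.

```shell
#!/bin/sh
# describe_path is a hypothetical helper that reports what kind of thing
# a path names, using the test operators -d, -f, -x and -e.
describe_path() {
    if [ -d "$1" ] ; then
        echo "directory"
    elif [ -f "$1" ] && [ -x "$1" ] ; then
        echo "executable file"
    elif [ -f "$1" ] ; then
        echo "regular file"
    elif [ -e "$1" ] ; then
        echo "other"
    else
        echo "missing"
    fi
}

describe_path /tmp
describe_path /bin/sh
describe_path /no/such/path

# Quote variables inside [ ]: an empty or multi-word value would otherwise
# change the number of arguments that test sees.
FOOBAR="two words"
if [ "$FOOBAR" = "two words" ] ; then echo yes ; else echo no ; fi
```

On a typical system this should report /tmp as a directory, /bin/sh as an executable file and the made-up path as missing. Without the quotes around $FOOBAR, the last test would be handed too many arguments and fail with an error instead of printing "yes".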
A shortcut for the "if" conditional is "&&", which should be interpreted as "and"; the corresponding "or" is "||". Remember that these operators work on the shell's idea of what "true" and "false" are; that is, a return value of 0 is true and any other return value is false. The syntax for "&&" and "||" is:
PROMPT$ PROG1 && PROG2
PROMPT$ PROG1 || PROG2
The first statement will execute PROG2 if and only if PROG1 returns 0, and the second statement will execute PROG2 if and only if PROG1 returns a non-zero value.
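A short sketch of how these two operators tend to be used in practice, assuming a POSIX shell; ensure_dir and the directory name are hypothetical.

```shell
#!/bin/sh
# ensure_dir is a hypothetical helper: mkdir runs only if the directory
# test fails, so calling it twice is harmless.
ensure_dir() {
    [ -d "$1" ] || mkdir "$1"
}

workdir=/tmp/andor.demo.$$

ensure_dir "$workdir" && echo "first call: ok"
ensure_dir "$workdir" && echo "second call: ok (mkdir was skipped)"

# touch runs only if the directory test returned 0.
[ -d "$workdir" ] && touch "$workdir/marker"

# The message is printed only if the command before || fails.
mkdir "$workdir" 2> /dev/null || echo "plain mkdir fails: directory exists"

rm -r "$workdir"
```

Be careful when chaining both operators as "A && B || C": C runs if either A or B fails, so this is not quite the same thing as an if/else.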
Besides conditionals, a complete programming language must have a means of repeating some statements; recursion isn't widely used in shell scripting, so you'll have to rely on iteration to repeat statements. The two most commonly used iteration constructs in the shell are the "while" loop and the "for" loop. The while loop works much like its counterpart in the C programming language; the for loop steps through a list of values instead of counting. The syntax for each is:
PROMPT$ while PROG1 ; do STATEMENT-LIST ; done
PROMPT$ for VAR in VALUE-LIST ; do STATEMENT-LIST ; done
The while loop works very simply: each time through the loop (including the very first time), it will run PROG1; if the return value of PROG1 is 0, then it will execute STATEMENT-LIST, and then return to testing PROG1 and executing STATEMENT-LIST. If PROG1 ever returns a non-zero value, the while loop will terminate. The for loop works very differently; VAR is the name of a temporary variable; it is customary to capitalize variable names, but this is your choice. VALUE-LIST is a whitespace-delimited list of values. Each time through the loop, VAR will be set to the next value from VALUE-LIST and STATEMENT-LIST will be executed. These are most easily understood through some examples.
PROMPT$ while /bin/true ; do echo Hello ; done
This will continue printing out "Hello" until you hit the interrupt key, usually control-C. Note that /bin/true is executed each time through the loop, and you are counting on the loop's command to return a non-zero value at some point if you do not want an infinite loop. /bin/true always returns 0, so the above is an infinite loop.
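For contrast, here is a sketch of a while loop that does terminate, because its test eventually returns a non-zero value. It assumes a POSIX shell; count_to is a hypothetical helper, and the "$(( ))" arithmetic expansion is not covered in the text above.

```shell
#!/bin/sh
# count_to is a hypothetical helper that prints 1..N. The loop ends as
# soon as the "[" test returns non-zero, that is, when i exceeds N.
count_to() {
    i=1
    while [ "$i" -le "$1" ] ; do
        echo "i = $i"
        i=$((i + 1))      # arithmetic expansion: add one to the counter
    done
}

count_to 3
```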
PROMPT$ for VAR in 1 2 3 4 5 ; do echo VAR = $VAR ; done
The output of the above sequence is:
VAR = 1
VAR = 2
VAR = 3
VAR = 4
VAR = 5
Note how we combined variable substitution with the "for" loop; this is the most common way of using a for loop, with variable substitution. One program that you may find useful in combination with the for loop is "seq"; "seq" will print out sequences of numbers, so that the "1 2 3 4 5" above could have been replaced with the output of "seq 5". However, when you try to use this program to generate the numbers you need, you run into a problem: the "for" syntax requires that "in" is followed by a list of whitespace-delimited strings, not a program statement which generates a list of whitespace-delimited strings. For example, this will not work:
PROMPT$ for I in seq 5 ; do echo $I ; done
seq
5
As you can see, we need to have some way of grabbing the output of a program and using that in place of some text. The easiest way to do this is with backticks (the "`" character, on the same key as the tilde on most PC and Mac keyboards). Whenever the shell sees any text between backticks, it will execute that text and replace that text with the output of the execution. For instance, to use the "seq" above, we would do:
PROMPT$ for I in `seq 1 5` ; do echo $I ; done
1
2
3
4
5
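Here is a sketch of the same idea in a script, assuming a POSIX shell. The "$( )" form shown next to the backticks is an alternative spelling that the text above does not mention; list_words is a hypothetical stand-in for any program that prints a whitespace-delimited list, which is the role "seq" plays above.

```shell
#!/bin/sh
# list_words is a hypothetical stand-in for a program such as "seq".
list_words() {
    printf 'alpha\nbeta\ngamma\n'
}

# Old-style backticks, as described in the text.
for WORD in `list_words` ; do
    echo "backticks: $WORD"
done

# The $( ) form does the same thing and nests without extra escaping.
for WORD in $(list_words) ; do
    echo "dollar-paren: $WORD"
done

# Substitution also works in assignments.
count=$(list_words | wc -l)
echo "number of words:" $count
```

The last 'echo' leaves $count unquoted on purpose, since some versions of 'wc' pad their output with spaces.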
By now, you should know most things needed to be able to write effective shell scripts and to be able to read most other people's shell scripts. For more information, you should see the documentation for your particular shell. One of the first things most people who are learning shell scripting do is to rewrite all of the system startup scripts; this can often be a very useful exercise.
First, you may want to read the section on the Unix process model to get an idea of what a process is and what data is stored with a process.
For quick reference, you can read the summary on process management.
| Symbolic Name | Value | Meaning |
|---|---|---|
| TERM | 15 | Politely ask the process to terminate |
| KILL | 9 | Impolitely make the process terminate |
| HUP | 1 | Hang up, something has been disconnected |
| INT | 2 | Interrupt the process |
| SEGV | 11 | Segmentation fault, a memory violation |
| USR1 | varies | User-defined signal |
Some explanation is in order: remember that it is the kernel which "makes" signals happen, not a user process (a user process can only ask the kernel to generate a signal). Some signals are therefore meant to communicate information from the kernel to the process. An example of this is SIGSEGV: the kernel sends this signal to a process whenever a process attempts to access some memory address which it is not allowed to access (indicating a programmer error); the program can then do some cleaning up and quit (it's rather difficult to recover gracefully from a memory violation). This cleaning-up can be fairly important: the process may wish to flush some buffers, close some files, release some locks, etc. If these actions do not occur, they may adversely affect future invocations of the program, or other processes which were cooperating with the signalled process. This is the reason for two different signals to make a process terminate: SIGTERM can be intercepted by a program (a program may install a signal handler for it), and the program will clean up and (hopefully) quit when receiving SIGTERM. Sometimes the cleanup code may be broken, so it is useful to have a signal which will terminate a process unconditionally, without giving the process a chance to do anything: this is SIGKILL, signal 9. When a process receives signal 9, the kernel will destroy the process immediately. The other types of signals are often used in ways which vary from program to program: they can be used to ask a program to perform a specific action at a certain time. For example, most daemons will re-read their configuration files or restart when you send them SIGHUP. See the section 7 manual entry for 'signal' for more information. You can send any signal to a process using the 'kill' program; again, see the manual entry for more information, including the program's syntax.
The basic information you need from all of this: use "kill 4217" to ask the process with PID 4217 to terminate; do "kill -9 4217" to make it go away immediately, no questions asked; do "kill -HUP 4217" to ask it to reread its configuration file, if it supports this behaviour (see the program's manual page to see if it does - the program will terminate if it does not support this behaviour).
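Shell scripts can install signal handlers too, using the 'trap' builtin, which the text above does not cover. The following is only a sketch of the SIGTERM cleanup idea, assuming a POSIX shell; the worker function and the lock file path are hypothetical.

```shell
#!/bin/sh
# Hypothetical lock file that the worker should remove before it exits.
lockfile=/tmp/worker.lock.$$

worker() {
    # Install the handler first, then announce readiness via the lock file.
    trap 'rm -f "$lockfile"; exit 0' TERM
    : > "$lockfile"
    while : ; do
        sleep 1
    done
}

worker &
worker_pid=$!

# Wait until the lock file exists, which means the handler is installed.
while [ ! -f "$lockfile" ] ; do
    sleep 1
done

kill "$worker_pid"        # no signal named, so SIGTERM is sent
wait "$worker_pid"
echo "worker exit status: $?"

if [ -f "$lockfile" ] ; then
    echo "lock file left behind"
else
    echo "lock file was cleaned up"
fi
```

If you change the 'kill' line to use -9, the handler never gets a chance to run and the lock file stays behind, which is exactly the difference between the two signals described above.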
In order to use the "kill" program, you must know the PID of the process you want to kill. You can use the "ps" program to do this. The "ps" program works completely differently in BSD and SYSV Unix. You should read the manual page for "ps" on the system you're using to see what options it supports. Basically, you most often want a very verbose process listing which you sort out by hand (i.e., as opposed to sorting out the data programmatically). Under BSD Unix, do "ps auxww"; under SYSV Unix, do one of "ps -elf", "ps -Af" or whatever suits you. Note that if you are running Solaris and you've installed the Berkeley compat package, you can get a BSD-like "ps" in /usr/ucb/ps. Generally, you can pipe the output of "ps" to less(1) to view it, or through grep(1) to find a process by name.
If you are using Linux (or more generally, the GNU utilities), you may have a program called "killall": this will kill a process by name, saving you the "ps auxww | grep foo". Be careful with this program; on non-Linux systems, "killall" does something completely different, which can be quite painful if you type it in as root. If you wish to go deeper into this, some of the following sections describe shell scripts, and automating the "ps | grep ; kill" tedium might make a useful shell script project. Note that there is absolutely no standard API for getting process information; if you want to write a C program that gets a PID from a process name, the most portable way to do that is to do 'popen("/usr/bin/ps", "r")' or something similar; by extension, this means that your favorite language (perl, python, whatever) will not provide such an API, and you have to parse the output of "ps" after you figure out what kind of "ps" you have.
In short, this is what you need to know about process management:
To find the PID of a process by name:
On Linux or *BSD:
ps auxww | less
On SYSV Unix (such as Solaris):
ps -elf | more
To stop a process which has PID 1447:
kill 1447
Wait a couple of seconds; if the process is still running:
kill -9 1447
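As a starting point for automating the "ps | grep ; kill" routine, here is a rough sketch. It assumes that your "ps" accepts the POSIX "-e -o pid= -o comm=" options, which may not be true of every BSD or SYSV "ps"; pids_named is a hypothetical filter, and the 'sleep' process stands in for whatever you actually want to stop.

```shell
#!/bin/sh
# pids_named is a hypothetical filter: it reads "PID COMMAND" lines on
# standard input and prints the PIDs whose command name is exactly $1.
pids_named() {
    awk -v name="$1" '
        { cmd = $2; sub(/.*\//, "", cmd) }   # strip any leading directory
        cmd == name { print $1 }'
}

# Start a harmless process to look for.
sleep 30 &
target=$!
sleep 1                   # give it a moment to start

echo "PIDs of processes named sleep:"
ps -e -o pid= -o comm= | pids_named sleep

# Ask politely first, as in the summary above.
kill "$target"
```

Matching on the exact command name avoids a common annoyance of "ps | grep foo", namely that the grep process itself shows up in the results.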
I won't bore you with too much history at this point. The only
history applicable right now is that two different "flavors" of unix
are popular today, BSD and System V Unix. This section will show a few
of the differences between the two, in the context of what you are
likely to see at this university.
The original Unix was developed at AT&T Bell Labs by Ken Thompson and
Dennis Ritchie; over the years, AT&T kept improving and releasing
newer versions of Unix, and lately the actual trademark to Unix has
been sold by and to a number of companies. At some point,
System V, Release 4 (SVR4) Unix was released and this
became somewhat of a standard within some communities. Many modern
Unices, including Solaris, HP-UX and others are based upon SVR4.
BSD stands for Berkeley Software Distribution - BSD Unix was developed
by Berkeley grad students working off the source code for one of the
earlier versions of AT&T Unix. Eventually, BSD and SVR4 diverged
and now many differences exist, but both systems usually borrow
features from one another.
One of the major differences between SVR4 and BSD Unix is the
networking model. TCP/IP is the Unix networking protocol, and both
SVR4 and BSD implement TCP/IP, but in different ways. BSD uses
Berkeley sockets, whereas SVR4 has a confusing API called "streams".
Both sockets and streams are general IPC (interprocess communication)
APIs which can do more than just networking.
The other immediately visible difference (besides different syntax for
commands, etc.) between Berkeley and SVR4 Unix is the way in which
startup scripts are organized. Under a BSD system, after the kernel is
finished booting, it runs init, which then runs the program /etc/rc.
/etc/rc then calls various other shell scripts which usually are
/etc/rc.*, or /etc/rc.d/*. You modify the way the system boots up by
grepping for what you want to change in /etc/ or /etc/rc.d and then you
fire up your editor and bang away. Often BSD systems will have a
configuration file which defines the way the boot scripts behave;
FreeBSD has a file, /etc/rc.conf which even has an extensive manual
page. You should also do an "apropos rc" on your BSD system for any
special explanations and caveats about your boot script system.
Types of Unices
| Run level | Name | Description |
|---|---|---|
| S | single-user | This is usually only run when something really bad happens. |
| 0 | shutdown | usually run when the system is going down for reboot or poweroff |
| 1 | on | run when the system is being brought up |
| 2 | multiuser | run when you wish to enable people besides the root user to log on. On some systems, this also brings up some network daemons to allow remote logons. |
| 3 | multiuser with network | This run level does not exist on some systems, but is replaced by run level 2. If this does exist, it usually brings up the network daemons which allow users to log on remotely. |
| 4 | xdm | Sometimes xdm may get its own runlevel. It might be found here, or perhaps in runlevel 3. |
| 5 | powerdown | Take the machine down and turn off its power, if possible in software |
| 6 | reboot | Reboot the system. More specifically, takes the system down to the initdefault entry in /etc/inittab, which usually takes the system into run level S or 5. |
You can change the current runlevel by using the "init" or the "telinit" commands. See their manpages for more information. You can change the default run level by editing the file /etc/inittab, and you may boot into a runlevel other than the default run level by passing a parameter to the kernel which is passed to init, either by doing something with the firmware or passing a parameter to your bootmanager.
The meanings of these run levels are configurable, so they may be
different from system to system. Usually, you have at least one or
two free run levels which you can customize to do whatever you want.
The programs which are run at runlevel changes are set by symbolic
links to scripts. Generally, each service you want to run has a script
which starts, stops, or asks about the status of the service. Such a
script will usually take one argument which is either "start", "stop"
or "status". All the scripts for all the services your computer has
are usually all piled together in one directory. This directory may be
/etc/init.d, /etc/rc.d, or (in SuSE Linux) even such a strange place
as /sbin/init.d, but most commonly, /etc/rc.d/init.d. You can
at any time start or stop a service by simply calling one of those
scripts with a "start" or "stop" argument. When switching runlevels,
these scripts are automatically called with a "start" or "stop" argument.
You specify which scripts to run in which runlevel by making a symbolic
link to that script in a certain directory. For instance, all the scripts
to run in run level 2 are in /etc/rc2.d (or maybe /etc/rc.d/rc2.d).
To tell the system whether to pass an argument of "start" or "stop" to
the script, you prefix the name of the symlink with either the character
"S" (start) or "K" (kill). These are case-sensitive, as is everything
in Unix. Some systems may have to run boot scripts in a certain order
(for instance, you need to run the IP configuration script before
running the ssh daemon script). To ensure that scripts are run in the
correct order, you must append a number to the "S" or "K" in the name
of the symbolic link to the script. All scripts in one runlevel will
be run in order, according to these numbers. An example is in order.
Let's look at Harper's configuration.
The first thing to note is the set of "NOUSE" files. These are scripts
which were first installed by the Solaris installation program but were
later disabled by the Harper administrators. If a file in an
/etc/rc?.d directory does not begin with an "S" or a "K", it is
ignored. Now we can see the services which need to be killed in order
to get into run level 2 (multiuser with network). "dmi", "snmpdx",
"nfs.server" are all killed when going into run level 2. The first two
are Sun snmp services; these are not needed in Harper's regular
configuration, and neither is the NFS server daemon, so they are turned
off (Harper doesn't export any NFS shares, it only uses other machines'
shares). The other services are run in order according to the
number after the "S" in the name of the link. For example, we know
that the standard Solaris distribution does not come with ssh -
therefore ssh must have been installed by the Harper admins; let's see
how they did it.
As you can see, SVR4 initialization scripts are a big mess. If you
actually have a system which runs so many services where such nonsense
is useful, you may want to distribute your services over more than one
machine.
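To make the "S" and "K" symlink convention a little more concrete, here is a rough sketch of the kind of loop that runs at a run level change. This is not the actual script from Solaris or any other system; run_level_scripts is a hypothetical function, and the demonstration uses a temporary directory with made-up scripts instead of a real /etc/rc2.d.

```shell
#!/bin/sh
# run_level_scripts is a hypothetical sketch: for one run-level directory,
# call every K* entry with "stop", then every S* entry with "start".
# The shell expands K* and S* in sorted order, which is why the numbers
# in the link names control the sequence. Names that begin with anything
# else (such as NOUSE) never match and are therefore ignored.
run_level_scripts() {
    for script in "$1"/K* ; do
        if [ -f "$script" ] ; then sh "$script" stop ; fi
    done
    for script in "$1"/S* ; do
        if [ -f "$script" ] ; then sh "$script" start ; fi
    done
}

# Demonstration with made-up scripts that only report how they were called.
demo=/tmp/rc2.demo.$$
mkdir "$demo"
for name in S99sshd S69inet K07dmi NOUSE.S70uucp ; do
    printf 'echo "%s called with: $1"\n' "$name" > "$demo/$name"
done

run_level_scripts "$demo"
rm -r "$demo"
```

In the demonstration, K07dmi is called first with "stop", then S69inet and S99sshd are called with "start" in that order, and NOUSE.S70uucp is never called.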
Now that the major differences between SVR4 and BSD are clear, you
should know what types of systems most students will be running. The
most common incarnations of BSD systems are freely-distributable: these
include FreeBSD, NetBSD and OpenBSD. FreeBSD is considered by
some the most mature, and it offers many features; unfortunately, it
only runs on PCs. FreeBSD comes with extensive documentation,
including some books, and it has the most extensive collection of
manual pages I have ever seen. NetBSD is known as the "most portable
operating system in the world." If you have some kind of computer,
NetBSD will probably run on it. OpenBSD split off from NetBSD at some
point because of some disagreements between the developers; OpenBSD
concentrates on security and built-in encryption, and it will run on
PPC Macs and PCs. It is considered by most experts the most secure
operating system in the world, and it boasts such features as a default
installation of OpenSSH, an excellent and unrestricted ssh
implementation. Many times when a buffer overflow is found which
affects many operating systems, the corresponding OpenBSD program was
fixed months ago. All of the freely-distributable BSDs take features
from one another; for instance, FreeBSD pioneered the "ports" system, a
really powerful and convenient way of distributing and installing
software from source; nowadays, both NetBSD and OpenBSD use the ports
system.
Linux is another Unix operating system, which seems more popular than
the BSDs. Most people will say that Linux is a SVR4 Unix; these people
are wrong. Linux is a BSD Unix; it uses BSD-style system calls, all of
the networking software and APIs were borrowed from BSD, and most of the
user commands use BSD-style syntax; the entire networking subsystem in
the kernel was taken directly from BSD. Linux does not implement
streams, so there is no way that it can be considered a SVR4 system.
The only reason that some people confuse it with a SVR4 system is most
distributions use SVR4-style boot scripts.
Most of the software that
comes with a Linux distribution is actually GNU software. GNU (GNU's
Not Unix) is a project started at MIT which is an attempt to write a
complete freely-distributable Unix system. The GNU kernel (the HURD)
was not very complete at the time of the release of the Linux kernel,
and the Linux kernel was licensed under the
GNU General Public License (GPL), so
people using Linux started using GNU software and people who wanted to
use the GNU system started using the Linux kernel.
Some of the GNU people feel that GNU is such an important part of
modern Linux distributions that Linux distributions should be named
"GNU/Linux" instead of "Linux." Licensing is a very important issue
within the Unix community; the GPL is one type of license, whereas the
BSDs are licensed
differently.
What is important to note about these licenses is that both allow you
to use the software they license freely; you can download and run the
software, in addition to modifying the source code. The GPL contains a
clause, however, which explicitly prohibits any changes to software
licensed under it from ever being "closed", or no longer freely
redistributable in source code form. This prevents many for-profit
software companies from incorporating GPL'ed code into their projects,
while BSD code may be incorporated into closed-source software more
easily.
HARPER$ ls /etc/rc2.d/
K07dmi NOUSES90va_monitor S74xntpd
K07snmpdx README S75cron
K28nfs.server S00umask S75savecore
NOUSE.S47asppp S01MOUNTFSYS S76nscd
NOUSE.S70uucp S02QUOTA S80PRESERVE
NOUSE.S72autoinstall S05RMTMPFILES S80lp
NOUSE.S73cachefs.daemon S20sysetup S85identd
NOUSE.S74autofs S21perf S88utmpd
NOUSE.S75flashprom S22acct S90amon
NOUSE.S85power S30sysid.net S92volmgt
NOUSE.S89bdconfig S69inet S95SUNWmd.sync
NOUSE.S91afbinit S71rpc S95networker
NOUSE.S92volmgt S71sysid.sys S99apache
NOUSE.S93cacheos.finish S72inetsvc S99audit
NOUSES65ipfboot S72nfs.server S99sshd
NOUSES80spc S73nfs.client S99tsquantum
NOUSES90mon_cm S74autofs XS88sendmail
NOUSES90monlog S74syslog XS99drac
HARPER$ ls -l /etc/rc2.d/S99sshd
lrwxrwxrwx 1 root other 16 Jun 10 1999 /etc/rc2.d/S99sshd -> /etc/init.d/sshd
HARPER$ # So, the Harper admins put their startup script for sshd
HARPER$ # in the directory /etc/init.d, and made a symbolic link
HARPER$ # named /etc/rc2.d/S99sshd which points to this script.
HARPER$ # When the system goes into run level 2, /etc/init.d/sshd
HARPER$ # will be called with a single argument: "start". Let's see
HARPER$ # if the sshd script correctly deals with this argument.
HARPER$ cat /etc/init.d/sshd
#!/bin/sh
if [ $# -ne 1 ] ; then
    echo "Usage: $0 {start,stop,restart}"
    exit 1
fi
case $1 in
'start')
    echo "starting sshd"
    /opt/sbin/sshd&
    ;;
'stop')
    echo "Stopping sshd"
    kill `cat /etc/sshd.pid`
    ;;
'restart')
    echo "Restarting sshd"
    kill -HUP `cat /etc/sshd.pid`
    ;;
esac
HARPER$ # This is a very simple (but effective) script which
HARPER$ # does the right thing with its arguments. I may have
HARPER$ # not explained all of the things that the script uses;
HARPER$ # $# is the number of command-line arguments, $1 is the
HARPER$ # first argument, etc. The 'case' statement has a bit
HARPER$ # of a strange syntax in Bourne shell, but that's how it
HARPER$ # works.
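For comparison, here is a sketch of a slightly more defensive script in the same style. It is not the Harper script: demo_service, the PID file path and the 'sleep' stand-in daemon are all hypothetical. It adds the "status" argument mentioned earlier and a catch-all "*)" branch for unknown arguments.

```shell
#!/bin/sh
# Hypothetical PID file; a real service would use its own location.
PIDFILE=/tmp/demo-daemon.$$.pid

demo_service() {
    case "$1" in
    start)
        echo "starting demo daemon"
        sleep 300 &                 # stand-in for the real daemon
        echo $! > "$PIDFILE"
        ;;
    stop)
        if [ -f "$PIDFILE" ] ; then
            echo "stopping demo daemon"
            kill `cat "$PIDFILE"` && rm -f "$PIDFILE"
        else
            echo "not running (no $PIDFILE)"
        fi
        ;;
    status)
        # kill -0 sends no signal; it only checks that the PID exists.
        if [ -f "$PIDFILE" ] && kill -0 `cat "$PIDFILE"` 2> /dev/null ; then
            echo "running"
        else
            echo "stopped"
        fi
        ;;
    *)
        echo "Usage: demo_service {start|stop|status}" >&2
        return 1
        ;;
    esac
}

demo_service start
demo_service status
demo_service stop
demo_service status
```

The "*)" pattern matches anything the earlier branches did not, so a mistyped argument produces a usage message and a non-zero return value instead of silently doing nothing.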
| Distribution | Description |
|---|---|
| Redhat Linux | Most Linux users started here. Redhat is the most popular Linux distribution, and although many advanced users may prefer other distributions, everyone acknowledges the great work that Redhat has done for the Linux community. Redhat has extensive technical support, and also caters to suit-wearing business types who need some sort of guarantee of technical support because they don't trust their IT personnel to install Slackware. Redhat pioneered a package system, rpm, which many other distributions have adopted. Redhat makes PC, Alpha AXP, and Sun SPARC distributions. |
| SuSE Linux | These guys are German. In fact, a lot of the stuff in this distribution caters to Germans, like the default keyboard layout when you invoke a shell during the installation, or the comments in the boot scripts, but you should have no problems even if you speak only English. This distro also comes with the most software out of the Linux distributions - the latest SuSE distro comes on 6 CDs full of programs. SuSE uses a completely non-standard boot script setup, and often things are placed in very strange places (which can break compiles when a lib isn't where it's supposed to be). SuSE is an rpm-based distribution, and is available for the PC, PPC Mac, Alpha AXP, and (soon) the IBM S/390 (i.e., IBM mainframes). |
| Debian GNU/Linux | This Linux vendor is the most concerned about the GPL and GNU/Linux issues out of the major Linux vendors. They ensure that the software they provide is separated into clearly marked groups based upon license. Recently, Debian announced that they will make a distribution available which uses the GNU HURD, so it will not even be called a Linux system, but be a completely GNU system. Debian uses its own package format which its users seem to think is better than sliced bread. Debian is available for the PC, the Alpha AXP, the Sun SPARC, the PowerPC, the Motorola 68000, and the ARM processor. |
| Slackware | This was one of the first Linux distributions. It also attempts to be the most "Unix-like" Linux distribution, meaning that it is very similar to commercial Unix distributions such as SCO or Solaris. In reality, Slackware looks a lot more like BSD than any of the Linux distros; it is the only major distribution to use the simpler BSD boot script mechanism rather than the SVR4 boot mechanism. Slackware does not include any of the GUI tools that other distributions use to configure things, overwriting your custom configuration files and your custom boot scripts. Slackware is definitely not for beginners; it is the do-it-yourself distro, but anyone who has used any Linux distro for at least a year has a responsibility to try Slackware - many say that they've learned more about Unix after one night of Slackware than after months of Redhat. Slackware has the most advanced package system of all the distros: ordinary tarballs with text-based registration files. Slackware is available only for the PC, unfortunately. "slackware users don't matter. in my experience, slackware users are either clueless newbies who will have trouble even with tar, or they are rabid do-it-yourselfers who wouldn't install someone else's pre-compiled binary even if they were paid to do it." Some guy on a mailing list |
| LinuxPPC | This is basically Redhat for the PowerPC. A well-rounded distribution with an easy installation for Macs, RS/6000s and other PPC-based workstations. |
| Yellow Dog Linux | This is also Redhat for the PowerPC. The installation program is a bit better than LinuxPPC's and it comes with a fairly good installation manual. |
Besides these commercial distributions, you can also build your own Linux distribution. This involves downloading the source for every single library and program you need, compiling, installing in a chroot'ed environment, and more compiling. Linux distributions make things a lot easier as they package the right versions of all the software together so that a system works correctly, they apply the latest patches to software, and they generally set things up to run more smoothly. Compiling your own distro takes an extremely long time (you're lucky if you get it done in three or four days), and the learning benefit isn't all that great. However, the option is always there, so you cannot whine when you don't like the way your vendor set up your distribution.
Knowing how to navigate the filesystem in a Unix system makes many
things much easier. You can always continually "cd" into various
directories looking for something (the "locate" and "find" commands can
be useful here), but knowing where things are is quite a benefit. If
you come from a MacOS background, you should know that Unix systems are
very different from Macintosh systems in that you cannot simply move
most things around arbitrarily. Macintosh filesystems include two
forks, which allow programs to store certain data along with the
program in one user-visible file, so that when a program is moved, the
corresponding data for the program can also be found (at least this is
how it used to work before huge applications like Netscape and Office
came around). This has several disadvantages, in that many times
Macintosh filesystems cannot take advantage of the latest advances in
filesystem technology and you have to "sit" any file which you wish to
move over the Internet (which was designed with Unix systems with flat
files in mind). MacOSX client will use a very clever mechanism where
"Applications" are actually complete directories which contain static
and dynamic data within them, along with the actual program binaries;
it will be interesting to see how this system turns out to allow the
flexibility that MacOS users are used to on a modern Unix filesystem.
With current Unix systems, you can move around files if you wish, but
you will certainly run into troubles; if you move something, you must
ensure that all other programs that ever use that file know that the
location has changed. This is often more trouble than it's worth, and
it is usually much easier to simply recompile the program to use a
different directory if such a move is absolutely needed. Unix files
also have no inherent "type"; a file is a file, and only you can choose
what you want to do with it. Text files and binary files are treated
identically at all levels of the API, unlike DOS/Windows. Binary
configuration files are very rare in Unix systems; most Unix users fully expect that the
text-based tools that they love will be able to change any system
configuration files, and that they can programmatically automate such
changes. Indeed, this is one of the beauties of Unix - once you learn
a toolset, you can apply it to almost any situation, and you do not
need to "dig" into the system to make non-trivial changes, nor do you
need to write actual programs in C to interface with a proprietary API
which does not allow non-trivial changes from userland.
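As a rough sketch of that kind of automation, the following shell fragment changes one setting in a plain-text configuration file using nothing but standard tools. The file name "example.conf" and its "port" and "logfile" settings are hypothetical, invented purely for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
conf="$workdir/example.conf"      # hypothetical configuration file

# Create a small key=value configuration file.
cat > "$conf" <<'EOF'
# example daemon settings
port=8080
logfile=/var/log/example.log
EOF

# Rewrite the "port" line; every other line passes through unchanged.
# Writing to a temporary file and renaming it avoids relying on "sed -i",
# which behaves differently across Unix flavors.
sed 's/^port=.*/port=9090/' "$conf" > "$conf.new" && mv "$conf.new" "$conf"

# Show the line that was changed.
new_port=$(grep '^port=' "$conf")
echo "$new_port"
```

The same one-line "sed" command works unchanged inside a script that loops over many machines or many files, which is the point of keeping configuration in text.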
Another of the beauties of Unix is that everything is a file.
Writing to a network connection is absolutely the same as writing to a
disk file, playing a sound involves writing to a sound device file, and
even writing to a disk file involves writing to another device file.
All of the tools which you learn or write to process files are
applicable to virtually any device. Many of the devices that are
represented as files are represented as special device
files: a device file is a file which represents some sort of
device on your computer, ie, anything that requires a device driver.
There are two kinds of device files: character files and block files;
these refer to the way in which data can be read/written to the
device. For example, disks are block device files; with a modern hard
disk it is impossible to read or write a single byte at a time. Instead,
only blocks of bytes can be read or written, the size of the
block varying from device to device. When a single byte is written to
a block device, the device driver will take that byte and put it in a
buffer; when the buffer fills, the entire buffer is written as a block
(things can actually get much more complicated with real hardware and
hardware scheduling algorithms). A character device is a device to
which you can read/write one byte at a time. An example of a character
device is the terminal window; you can write one byte (one ascii
character) to the terminal at any time, and the character will display
almost instantaneously.
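You can ask the shell which kind of device file you are looking at. The sketch below wraps the standard "-c" and "-b" file tests in a small helper; the function name "device_kind" is made up for this example. The first character of "ls -l" output carries the same information ("c" or "b").

```shell
# device_kind PATH: report how the system classifies PATH.
device_kind() {
    if [ -c "$1" ]; then
        echo "character device"
    elif [ -b "$1" ]; then
        echo "block device"
    else
        echo "not a device file"
    fi
}

device_kind /dev/null    # /dev/null is a character device
device_kind /            # a directory is not a device file
```

The names of disk devices differ from one Unix flavor to the next, so no block device is hard-coded here; try the function on whatever disk entries your own /dev contains.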
Since disks and partitions are block files, it is necessary to add a
level of indirection so that one does not need to write blocks directly
to disk addresses. This is called mounting a
filesystem. The MacOS has a similar system; when one inserts a CD-ROM,
an icon appears on the desktop representing the contents of the CD.
The CD has actually been mounted, and the mountpoint is the desktop
(which is just another directory). Thus, the disk can be accessed as
Hard Disk:Desktop Folder:CD-ROM. In DOS/Windows, one always "mounts"
to a drive letter; the unix system is much more flexible. A filesystem
can be mounted anywhere at any time. Typically this is done with the
"mount" command, or via the file "/etc/fstab" (which works in
combination with the mount command). See the manpages for more
information.
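To see which mounted filesystem a given path lives on, you can ask "df". This is only a sketch: the helper name "mount_point_of" is invented here, and it assumes that neither the device name nor the mount point contains spaces.

```shell
# mount_point_of PATH: print the mount point of the filesystem holding PATH.
# "df -P" prints one header line and one data line in the portable POSIX
# format; the mount point is the sixth (last) column.
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

mount_point_of /     # the root filesystem
mount_point_of .     # whatever filesystem holds the current directory
```

Running "mount" with no arguments lists every mounted filesystem at once, which is handy for comparing against the helper's answer.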
Since Unix is a multiuser environment, it must provide protection of
files from users. This is accomplished through
permissions. If you are a MacOS or Win9x user, this
concept will be new to you; if you are an NT user, this should be
familiar to you. The Unix model uses "users" and "groups" to provide
filesystem security; each user is a member of one or more groups. Each
file has an owner, which is a user and a group. Each inode (more about
those later) has a data structure which describes the permissions of
the file. Each bit in this structure indicates one of the permissions.
Here are the relevant bits (counting from 0):
Filesystem Standards
| 0 | global execute |
| 1 | global write |
| 2 | global read |
| 3 | group execute |
| 4 | group write |
| 5 | group read |
| 6 | owner execute |
| 7 | owner write |
| 8 | owner read |
| 9 | sticky bit |
| 10 | set group ID |
| 11 | set user ID |
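The bit numbers in the table map directly onto the octal numbers that "chmod" takes. Here is a minimal sketch of that arithmetic; the function name "mode_from_bits" is invented for this example, and only the nine read/write/execute bits (0 through 8) are used.

```shell
# mode_from_bits BIT...: set each listed bit and print the result in octal,
# the notation that chmod expects.
mode_from_bits() {
    mode=0
    for bit in "$@"; do
        mode=$(( mode | (1 << bit) ))
    done
    printf '%o\n' "$mode"
}

# owner read (8) + owner write (7) + group read (5) + global read (2)
mode_from_bits 8 7 5 2        # prints 644
# owner read, write and execute (8, 7, 6) and nothing else
mode_from_bits 8 7 6          # prints 700
```

Each octal digit covers exactly three bits, which is why permissions are almost always written in octal: one digit each for owner, group, and everyone else.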
YEENOGHU$ # One has to be root in order to open a raw socket in Unix,
YEENOGHU$ # and the only way to send an ICMP message is via a raw
YEENOGHU$ # socket; thus "ping" must be setuid in order for all users
YEENOGHU$ # to use it:
YEENOGHU$ ls -l /usr/sbin/ping
-r-sr-xr-x 1 root bin 20404 Oct 6 1998 /usr/sbin/ping
YEENOGHU$ # note the "s" in the permissions -- that means setuid.
YEENOGHU$ mkdir tmp
YEENOGHU$ ls -ld tmp
drwx------ 2 ahiorean college 512 Sep 13 05:27 tmp
YEENOGHU$ cd tmp
YEENOGHU$ cd ..
YEENOGHU$ chmod 600 tmp
YEENOGHU$ ls -ld tmp
drw------- 2 ahiorean college 512 Sep 13 05:27 tmp
YEENOGHU$ cd tmp
bash: cd: tmp: Permission denied
YEENOGHU$ chmod 500 tmp
YEENOGHU$ ls -ld tmp
dr-x------ 2 ahiorean college 512 Sep 13 05:27 tmp
YEENOGHU$ cd tmp
YEENOGHU$ ls
YEENOGHU$ touch foobar
touch: foobar cannot create
YEENOGHU$ cd ..
YEENOGHU$ chmod 300 tmp
YEENOGHU$ ls -ld tmp
d-wx------ 2 ahiorean college 512 Sep 13 05:27 tmp
YEENOGHU$ cd tmp
YEENOGHU$ touch foobar
YEENOGHU$ ls
.: Permission denied
YEENOGHU$ cd ..
YEENOGHU$ chmod 700 tmp
YEENOGHU$ rm -rf tmp
YEENOGHU$ # now let's have some fun with our umask
YEENOGHU$ umask 077
YEENOGHU$ umask 000
YEENOGHU$ touch foobar
YEENOGHU$ ls -l foobar
-rw-rw-rw- 1 ahiorean college 0 Sep 13 05:35 foobar
YEENOGHU$ rm foobar
YEENOGHU$ mkdir foobar
YEENOGHU$ ls -ld foobar
drwxrwxrwx 2 ahiorean college 512 Sep 13 05:35 foobar
YEENOGHU$ rmdir foobar
YEENOGHU$ umask 011
YEENOGHU$ mkdir foobar
YEENOGHU$ ls -ld foobar
drwxrw-rw- 2 ahiorean college 512 Sep 13 05:37 foobar
YEENOGHU$ rmdir foobar
YEENOGHU$ umask 022
YEENOGHU$ touch foobar
YEENOGHU$ ls -l foobar
-rw-r--r-- 1 ahiorean college 0 Sep 13 05:37 foobar
YEENOGHU$ # umask 022 is relatively sane; use umask 077 to deny
YEENOGHU$ # other users all access to your files. You will have
YEENOGHU$ # all sorts of problems if the first digit of your umask
YEENOGHU$ # is not 0.
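The umask experiments in the session above all follow one rule: the bits set in the umask are cleared from whatever permissions the creating program asked for ("touch" asks for 666, "mkdir" asks for 777). A small sketch of that arithmetic follows; the function name "apply_umask" is invented for this example.

```shell
# apply_umask REQUESTED MASK: print the permissions a new file ends up with.
# Both arguments are octal digits without a leading zero; the "0" prefix
# added below makes the shell read them as octal numbers.
apply_umask() {
    printf '%03o\n' $(( 0$1 & ~0$2 ))
}

apply_umask 666 022    # new file under umask 022       -> 644
apply_umask 777 011    # new directory under umask 011  -> 766
apply_umask 666 077    # new file under umask 077       -> 600
```

Note that the umask can only take permissions away; it never adds a bit the program did not request, which is why a file made by "touch" is not executable no matter what the umask is.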
Another important feature of Unix filesystems is that they can contain "links". A link is a special type of file which points to another file. If you are from a MacOS background, this is no problem for you; symbolic links are exactly "aliases". If you are from a Windows background, you will have a harder time with this. You may think that "shortcuts" are like links, but "shortcuts" are nothing like links. A "shortcut" is just a file (extension ".lnk") which the windows shell uses to store some information in. Only the Windows shell knows how to dereference shortcuts - if you are programming in win32, you have to make special concessions with explorer.exe if you want to have your program use shortcuts. "Shortcuts" are implemented above the API level in windows, whereas links are implemented at the filesystem level in Unix.
Unix filesystems have two types of links: symbolic links and hard
links. Both can be created using the "ln" command - "ln" without any
options will create a hard link and "ln -s" will create a symbolic
link. A symbolic link is implemented as a file (everything in Unix is
a file - directories are even files) which simply contains the text of
the path to which the link points. Programs usually can't know if they
are dereferencing a symbolic link or a regular link; they have to go
out of their way to find out what kind of file they are handling, and most
programs don't (and shouldn't) do this. In MacOS, sometimes an alias
can be updated when the file it points to is moved, but in Unix, you
have to do this by hand. If the file a symbolic link points to is
moved, the link is called a "broken link", and it doesn't point to
anything. Symbolic links are extremely useful, and you'll be seeing a
lot of them. For instance, the SVR4 boot mechanism uses symbolic links
instead of just copies of boot scripts so that if you change one copy
of a boot script, the others are also updated.
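Here is a short sketch of how a script can inspect a symbolic link and notice when it has gone stale. The helper name "is_broken_link" and the file names are made up for this example; "readlink" is not part of older Unix standards, but practically every current system has it.

```shell
workdir=$(mktemp -d)
echo "hello" > "$workdir/target"
ln -s target "$workdir/link"    # the link stores the relative path "target"

# readlink prints the path stored inside the link, without following it.
stored_path=$(readlink "$workdir/link")
echo "link contains: $stored_path"

# is_broken_link NAME: true if NAME is a symlink whose target is missing.
# "-L" tests the link itself; "-e" follows the link to its target.
is_broken_link() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

if is_broken_link "$workdir/link"; then echo "broken"; else echo "resolves"; fi

# Moving the target does not update the link, so it now points at nothing.
mv "$workdir/target" "$workdir/elsewhere"
if is_broken_link "$workdir/link"; then echo "broken"; else echo "resolves"; fi
```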
A hard link is very different from a symbolic link. Whereas a symlink
is a "pointer" to a file, a hardlink is simply another name for the
file. Specifically, a hard link represents an "inode", which is a data
structure that contains filesystem information about a file such as the
location of the file's blocks on the disk, etc. A "superblock" is a
structure describing the filesystem as a whole (its size, layout, and where the inodes are kept), and if it is ever corrupted, the
filesystem is screwed, so generally a filesystem will keep extra copies
of superblocks. Every file you see when you do an "ls" is really a
hard link in that it is a pointer to an inode; when a file has more
than one hardlink to it, it has more than one pointer to its inode, so
when you delete a file to which you have made a hardlink, the file
isn't really deleted; you must delete all hardlinks to the file to
actually recycle the space the file uses on disk. An inode data
structure contains an integer which represents the number of hard links
to the inode, and only when this integer reaches 0 is the file
"deleted". If you move a file, any hard links to it will still be
valid. Hard links can't span across different filesystems (it should
be obvious why they can't do so). Hard links are rarely used; they
can be useful for breaking out of a chroot'ed environment, but other than
that, I haven't found much use for them.
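The link count kept in the inode is easy to watch from the shell. The following is a sketch only; the helper names "inode_of" and "link_count" are invented here, and they simply pick columns out of "ls" output.

```shell
workdir=$(mktemp -d)
echo "Some file contents" > "$workdir/file1"
ln "$workdir/file1" "$workdir/file2"      # a second name for the same inode

# inode_of FILE: print the inode number ("ls -i" prints it first).
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}

# link_count FILE: print the hard-link count (second column of "ls -l").
link_count() {
    ls -l "$1" | awk '{ print $2 }'
}

if [ "$(inode_of "$workdir/file1")" = "$(inode_of "$workdir/file2")" ]; then
    echo "file1 and file2 share one inode"
fi

links_before=$(link_count "$workdir/file1")
rm "$workdir/file1"                       # removes one name, not the data
links_after=$(link_count "$workdir/file2")

echo "links before rm: $links_before, after rm: $links_after"
cat "$workdir/file2"                      # the contents are still there
```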
Some examples of links are in order.
One thing you may have noted above is the permissions of all the links
I made; the permissions of the hard links were exactly the same as the
permissions of the file they were linked to (the permissions are kept
in the inode actually), but the permissions of the symbolic links were
always 777. If someone tries to dereference a symlink, he/she must
have permission to dereference both the symlink and the file it points
to. The way in which you can change the permissions on the actual
symbolic links differs among the various flavors of Unix.
The installation of the operating system should set up a functioning
/etc/fstab and mount all the needed filesystems at system startup.
After this you can navigate the filesystem as you wish, and you should
learn about where things are held, and what the purposes of the various
directories are. If you are on one of the BSD systems, this is very
simple: simply do a "man hier".
Most Linux systems, sadly, do not have a "hier" manpage. Also, there
is no single standard among Linux systems for where files are supposed
to be, etc (some attempts have been made, but distributors are not very
good about following them). Filesystems will always vary from
distribution to distribution, but here are some of the usual places in
a Linux filesystem:
Text files are immensely important in Unix; most of the configuration
for a program is done by editing a text file, and all types of scripts
(shell scripts, perl scripts, python scripts, etc.) are plain text
files. Textfile-based configuration has the disadvantage that it may
be a bit more difficult for the newbie, but it has many advantages. A
well-edited configuration file can list all the options of a
particular program, whereas you have to fish through menus and dialogs
in a GUI program. Configuration files can be moved and changed
depending on your setup and it is extremely common to generate the
configuration file for a program depending on the state of the system.
Programs in most languages (C, C++, Lisp, ML, TeX) are written as
simple text files, so if you do a lot of programming, you will
deal with text files a lot.
In order to deal with text files, you need to use a text editor; a text
editor usually does not deal with binary files well, but some text
editors allow you to edit binary files, either as text or as a hex
output (ie, like a hex editor). If you spend any time at all in Unix,
you need to find a good text editor and learn it inside and out. As a
programmer, I spend about 90% of my computer time in a text editor.
Modern Unix systems come with two popular text editors: vi and emacs.
Vi (which was developed at Berkeley and is pronounced "vee eye") was
one of the first full-screen text editors for Unix. Before the age of vi,
people generally used glass teletypes to communicate with computers,
but at one point, CRT terminals became popular. Vi was written to
take advantage of the fact that (a) you could now display more than one
line of text at a time and (b) you could actually see any changes on your
display in real-time. Vi is pretty much the standard text editor for
all unix systems. Every single Unix system has some kind of
vi on it, so if you learn vi, you won't ever have to learn any other
text editor. Unfortunately, vi has a very steep learning curve - it
uses "modes" for editing, where you can't always just type some
characters and expect them to be inserted. Vi is not very extensible,
but it is very easy to combine commands together to perform powerful
combinations. If you wish to learn vi, you may want to get a good book
on vi (I'm not kidding, people have written substantial books about
this text editor), but if you need a quick introduction, log on to
Harper and type "vilearn". You only need to learn a subset of the
commands for vi to use it effectively. Modern implementations of vi
include nvi (a bug-for-bug reproduction of the original Berkeley vi, which
is the standard editor on modern BSD systems), elvis (a portable
version of vi with certain extensions (like HTML rendering!)), and vim
(the most featureful and popular vi implementation). Vim is a great
editor for many tasks, and it includes an extensive help subsystem.
The other popular text editor is "Emacs", although it is not really a
text editor. Emacs is actually a Lisp interpreter which has certain
Lisp primitives which are useful for text editing. Most of Emacs is
actually not written in C, but is written in Emacs Lisp, so you can
change almost any feature about Emacs very quickly (learning and
changing Lisp is much easier than learning and mucking around
with C). Emacs is in fact so extensible that such things as web
browsers, mail and news readers, interactive AI programs, tetris, and
even fractal generators have been written for Emacs. Emacs includes
something approximate to a GUI with menus and dialog boxes, but these
can easily be disabled if you don't like them. It includes an
impressive help subsystem, where you can browse info documentation, each lisp function
has built-in documentation which you can obtain from the name of the
function, and you can "grep" this documentation with the "apropos"
subsystem. Emacs includes several terminal emulators, so you actually
never have to leave emacs if you don't want to (in fact, whenever you
see terminal output in this document, it was generated in a subshell in
an emacs buffer and then copied into this buffer). Emacs simply has
too many features to list and it can take years to learn all of them.
If you think that you "know" emacs because you know some keyboard
commands for text editing, you are horribly mistaken - one does not
really "know" emacs until one begins to program in Emacs Lisp.
The best way to learn Emacs is from 1. the emacs manual (an info
document), 2. the book
Programming in Emacs Lisp, and 3. the info node on the
Emacs Lisp language and the online help facilities for Lisp functions.
One thing which beginners find difficult about Emacs is that it has
thousands of commands, and lots of these commands are bound to keys
which have to be used very often (you can, of course, change every
keyboard binding if you like). Emacs makes extensive use of the control
key, so it is almost impossible to use without a unix-like keyboard
layout (ie, where the "caps lock" key appears on PC and Mac keyboards
there should be a control key); this makes it difficult to use if you
use a terminal emulator to log on to a unix system from MacOS (where it
is impossible to remap the caps lock key to a control key), or from an
NT box where you don't have Administrator access (you need Admin access
to remap the key). Sometimes heavy Emacs users get "Emacs pinky" from
using the left control key so much. Emacs has two different
implementations: GNU Emacs is the original Emacs written by the FSF (in
fact, Emacs is one of the two programs which made GNU so popular (the
other is gcc)); XEmacs is based on GNU Emacs, but it branched off and
developed separately years ago. Both provide similar features, and
both will run either in X or in a terminal window. The only time when
a Unix system will not have Emacs installed is when you just brought up
the system and you haven't downloaded and compiled emacs, so you can be
pretty sure that learning emacs will be useful when you have to move to
other Unix systems. Most Unix distributions will install Emacs by
default.
Pico means small. Pico is a small editor (but much larger than vi)
which was originally developed as the editor for the "pine" email
program. Pico was developed when the Internet began to become popular
and people other than Unix gurus were beginning to log on to Unix
systems to use email. Pico is a horrible editor which is completely
inextensible, makes it difficult to do some basic editing tasks, and is
generally annoying. The only reason I mention it is because some
beginners think that pico is easier to use than vi or emacs because it
lists all of the possible commands at the bottom of the screen and it
behaves similarly to MacOS or Windows text editors. Almost no Unix
distribution will install pico by default, so you'll need to download
and compile it yourself if you want it. If you use Unix even a little
bit, or if you do any programming (on any system, not just
Unix), you should really invest the time to learn how to use a real
text editor.
Documentation is extremely important in any system, but even more so in
a complex system such as Unix. There are different levels of
documentation from tutorials for the complete beginner to terse
references of options and arguments for the guru. Navigating the
various available types of documentation is probably the most important
skill when learning Unix.
YEENOGHU$ # first, we create a "regular" file:
YEENOGHU$ touch file1
YEENOGHU$ # now make a symbolic link called "file2" which points to "file1":
YEENOGHU$ ln -s file1 file2
YEENOGHU$ ls
file1 file2
YEENOGHU$ file file2
file2: empty file
YEENOGHU$ ls -l
total 2
-rw-r--r-- 1 ahiorean college 0 Sep 13 04:31 file1
lrwxrwxrwx 1 ahiorean college 5 Sep 13 04:32 file2 -> file1
YEENOGHU$ # note that "file1" has size 0, whereas "file2" has size
YEENOGHU$ # 5; NB the number of characters in the name of "file1",
YEENOGHU$ # and you can get a clue about how symbolic links are
YEENOGHU$ # implemented. Now we have some fun with symlinks.
YEENOGHU$ rm file1
YEENOGHU$ file file2
file2: symbolic link to file1
YEENOGHU$ # now file2 is a broken symbolic link.
YEENOGHU$ ln -s file2 file1
YEENOGHU$ # now we have recursive symbolic links. Let's see how the
YEENOGHU$ # operating system likes that, eh?
YEENOGHU$ ls -l
total 4
lrwxrwxrwx 1 ahiorean college 5 Sep 13 04:36 file1 -> file2
lrwxrwxrwx 1 ahiorean college 5 Sep 13 04:32 file2 -> file1
YEENOGHU$ cat file1
cat: cannot open file1
YEENOGHU$ # didn't work, but "cat" isn't using "perror" -- let's
YEENOGHU$ # find a program which prints out the actual error message
YEENOGHU$ # that I'm looking for
YEENOGHU$ head file1
file1: Number of symbolic links encountered during path name traversal exceeds MAXSYMLINKS
YEENOGHU$ # that's what I wanted. What happened is that the "open"
YEENOGHU$ # system call failed and returned that error.
YEENOGHU$ # now let's work with hard links.
YEENOGHU$ rm file1 file2
YEENOGHU$ echo "Some file contents" > file1
YEENOGHU$ ln file1 file2
YEENOGHU$ cat file2
Some file contents
YEENOGHU$ rm file1
YEENOGHU$ cat file2
Some file contents
YEENOGHU$ ln file2 file1
YEENOGHU$ mv file2 file3
YEENOGHU$ cat file1
Some file contents
YEENOGHU$ # my home directory is mounted via nfs; it's on a separate
YEENOGHU$ # filesystem from tmp:
YEENOGHU$ ls
file1 file3
YEENOGHU$ rm file3
YEENOGHU$ ln file1 /tmp/file1
ln: /tmp/file1 is on a different file system
YEENOGHU$ # so hard links can't span filesystems, but symlinks can:
YEENOGHU$ ln -s file1 /tmp/file1-symlink
YEENOGHU$ cat /tmp/file1-symlink
cat: cannot open /tmp/file1-symlink
YEENOGHU$ ls -l /tmp/file1-symlink
lrwxrwxrwx 1 ahiorean college 5 Sep 13 04:44 /tmp/file1-symlink -> file1
YEENOGHU$ # My mistake, you should always use a full path when making
YEENOGHU$ # a symlink unless you specifically need a relative path
YEENOGHU$ rm /tmp/file1-symlink
YEENOGHU$ pwd
/home/ahiorean/html/tmp
YEENOGHU$ ln -s /home/ahiorean/html/tmp/file1 /tmp/file1-symlink
YEENOGHU$ cat /tmp/file1-symlink
Some file contents
YEENOGHU$ ls -l /tmp/fil*
lrwxrwxrwx 1 ahiorean college 29 Sep 13 04:46 /tmp/file1-symlink -> /home/ahiorean/html/tmp/file1
YEENOGHU$ # again, note the size of the symlink and the length of the path
YEENOGHU$ # that it points to.
YEENOGHU$ chmod 000 ./file1
YEENOGHU$ cat /tmp/file1-symlink
cat: cannot open /tmp/file1-symlink
YEENOGHU$ ls -l /tmp/file1-symlink
lrwxrwxrwx 1 ahiorean college 29 Sep 13 04:54 /tmp/file1-symlink -> /home/ahiorean/html/tmp/file1
/ - the root directory. You do not keep files here and you should
not make any subdirectories here (except perhaps some mount points at
your prerogative). This should be read-only for users.
Unauthorized access to this machine is prohibited.
Use of this system is limited to authorized individuals only.
All activity is monitored.
Make sure that this file does not contain the word "welcome", as this
has been used by perpetrators as a legal defense (ie, "It said
'Welcome' - I thought it was OK to use that computer...").
HARPER$ ls -l /var/spool/mail/ahiorean
-rw------- 1 ahiorean 24327 2115105 Sep 13 01:32 /var/spool/mail/ahiorean
LINUXTEST# du -sh /*
5.5M /bin
4.5M /boot
2.6M /core
113k /dev
1.8M /etc
6.0k /home
13M /lib
12k /lost+found
3.0k /mnt
31M /netscape-jail
1.0k /opt
du: cannot change to directory /proc/6/fd: Permission denied
du: /proc/757/fd/4: No such file or directory
0 /proc
2.1M /root
3.8M /sbin
8.4M /tmp
677M /usr
9.7M /var
Remember that /proc isn't a regular filesystem, so you'll have problems
if you ever try to write in /proc, or if you try to recurse down its
subdirectories like "du" just tried; the files in /proc change so often
that a directory may disappear from under your nose. As
you can see, /usr is where most of the files in this system are kept.
/usr always contains several subdirectories which mimic the directory
structure for /.
Editors
Documentation
| 1 | User commands - these are the actual programs that you have on your computer. |
| 2 | System calls - manual pages also contain API documentation for programmers. Indeed, manual pages are invaluable when you are programming. The manual pages in section 2 describe the system calls provided by the Unix kernel and the libc wrappers associated with them. |
| 3 | Library functions - these describe the library functions provided by libc and other libraries. Only useful for programmers. |
| 4 | Device files - these describe device files, ie, the things that live in /dev. Often this is where a device driver in the kernel will describe its programming interface. |
| 5 | File formats - these describe the formats for various files including configuration files. If you forgot what the fourth field of /etc/passwd is, for example, you can look it up in the section five manpage of "passwd". These are very useful for the system administrator, and they often also describe C structures to interface with programs, so they are invaluable for the programmer as well. |
| 6 | Games - not very useful. If you are looking for a manual page for Quake, you probably need help. |
| 7 | Miscellaneous - miscellaneous stuff that doesn't fit elsewhere. |
| 8 | Administration commands - sometimes the programs in /sbin will have their manpages in section eight instead of section 1. |
| 9 | Kernel Interfaces - Describes how to hack your kernel. |
Additionally, some Unix variants (such as Solaris) have manual pages which are further subdivided into subsections. On a BSD system, you can get a specific section of a manual page by putting the section number between the "man" command and the name of the manual page - for instance, "printf" is both a user command and a libc function on many Linux systems. If you want the C function, you type in "man 3 printf" and if you want the user command, you can type in "man 1 printf". In Solaris, you can use the "-s" option to "man" to get a particular subsection. Each manual section should have an "intro" manpage, so you can do "man 2 intro" to get an introduction to your kernel's system calls.
Manual pages are stored on disk in the "man" subdirectories; this usually includes /usr/man or maybe /usr/share/man. Within these directories are subdirectories for each section of a manual page, and these directories generally contain the troff source for the manual page. Troff is an old typesetting language which is only used for manpages these days. Many times, manual pages are compressed to save on disk space.
Although manual pages are usually accessed via the "man" command, you can also format manual pages for displays other than a text terminal. You can try the "xman" or "tkman" commands to see what I mean. Emacs can display manpages either via the "man" function or via the "woman" package. If you want to print a manual page, the most portable way to do this is to first find the actual source for the manual page, uncompress it if necessary, convert it to postscript and then print it. For example, if I wanted to print the manual page for "ascii" so I can have an ascii manpage stapled to my wall, I would go through the following:
YEENOGHU$ # first, find the manpage; you can use "locate" if you have it, or
YEENOGHU$ # you can use "find" or just look where you expect it to be:
YEENOGHU$ find / -name 'ascii.5*' -print 2> /dev/null
/usr/man/sman5/ascii.5
YEENOGHU$ # It's not compressed, so you can format it directly:
YEENOGHU$ groff -S -Tps -mandoc /usr/man/sman5/ascii.5 > ascii.ps
YEENOGHU$ # Now you can print "ascii.ps" via "lpr".
YEENOGHU$ # If you just want to view without using the 'man'
YEENOGHU$ # command, you can do this:
YEENOGHU$ nroff -man /usr/man/sman5/ascii.5 | more
The locations of the manual pages are kept in the environment variable MANPATH. The directories listed in MANPATH are searched sequentially when "man" is invoked. You may need to change your MANPATH if some software doesn't install its manual pages in a pre-existing man directory.
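To make the idea of a sequential search concrete, here is a simplified sketch of what such a lookup amounts to. It is not how any real "man" program is implemented: the function name "find_manpage" and the page "foo" are invented, and the sketch assumes the common "manN/name.N" directory layout (some systems, such as the Solaris machine in the session above, name the directories differently).

```shell
# find_manpage NAME SECTION: print the first matching file along MANPATH.
find_manpage() {
    name=$1 section=$2
    old_ifs=$IFS
    IFS=:                        # split MANPATH on colons
    for dir in $MANPATH; do
        IFS=$old_ifs
        # Accept either a plain or a gzip-compressed page.
        for candidate in "$dir/man$section/$name.$section" \
                         "$dir/man$section/$name.$section.gz"; do
            if [ -f "$candidate" ]; then
                printf '%s\n' "$candidate"
                return 0
            fi
        done
    done
    IFS=$old_ifs
    return 1
}

# Demonstration with two throwaway man trees holding the same page.
workdir=$(mktemp -d)
mkdir -p "$workdir/local/man1" "$workdir/system/man1"
: > "$workdir/local/man1/foo.1"
: > "$workdir/system/man1/foo.1"

MANPATH="$workdir/local:$workdir/system"
found=$(find_manpage foo 1)
echo "$found"     # the copy under "local" wins because it is listed first
```

This is why the order of directories in MANPATH matters: put the directory whose pages you prefer first.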
If you need to look something up in a manual page but you don't remember the name of the manual page that it is under, you can use the "apropos" command. The "whatis" command will search for an exact match to your manpage name. Use "makewhatis" in a man directory to build the database for "whatis".
GNU programs often do not have manual pages; instead GNU prefers its own documentation system called "info". GNU info was created before the explosion of the web, but in many ways it is very similar to HTML - info documents are text files which contain hyperlinks to other parts of the file or other files. Info is the native documentation format for Emacs, so you can use Emacs's info mode to browse info documents. Some people do not want to install emacs but still want to browse info documents; for this, there is a standalone program called "info". Info has a manual page, but the best way to learn about GNU info is by typing "info info". Many Unix distributions come with entire books in info format; for example, you may have an info document for "gawk" which is actually an entire book devoted to the "awk" programming language and GNU awk in particular.
Many Unix users do not like to use word processors such as Word or WordPerfect, but instead like to actually typeset documents so that they look professional. One of the most popular typesetting systems is the TeX system (written by Donald Knuth, stands for Tau-epsilon-chi, pronounced like "tech"). TeX is actually the most popular system for writing mathematical documents; most mathematicians publish all of their work in TeX and advanced mathematics students will often have their first introduction to Unix because they need to learn TeX to publish a paper (an alternative is the "Scientific Word" program which is basically a Microsoft Windows frontend to TeX). TeX is actually a complex programming language - one writes a TeX program as a plain text file with embedded formatting commands and then uses the "tex" command to translate the document to its final printable form. This generates a binary file called a "dvi" (device independent) file. Texinfo is a set of TeX macros which are used to write a document which can be translated into dvi or into the GNU info format; most info documents are actually generated from texinfo documents. Use the "texi2dvi" command to translate a texinfo file to dvi. LaTeX is another macro package for TeX which simplifies some mundane formatting tasks - use the "latex" command to translate a latex file to dvi. If you need to have users of non-Unix operating systems read a TeX file, the safest way to give them the document is to translate it to PDF using "pdftex" or "pdflatex". Note that translating to PostScript and then to PDF is a bad idea, as the fonts may look bad when viewed in a PDF browser. See the next section for information on printing dvi files.
PostScript is a programming language; it is actually a very elegant and powerful programming language. PostScript was actually written as a graphics display language and it was meant to be implemented in hardware on printers. Because of its elegance, it quickly became a favorite language among technical people for displaying typeset documents, and it has been adopted as the standard document format in Unix systems. Many books and other documents are available in PostScript format. You can use the "gv" or "gs" commands to view a PostScript document ("gs" stands for GhostScript, which is a PostScript interpreter, and "gv" stands for GhostView, which is a frontend to "gs"). The standard Unix print format is PostScript, so if you want to print a dvi file, you may need to convert it to PostScript first (using "dvips"). See the section on printing for more information. If you actually want to learn the PostScript programming language (a very good idea, PostScript will change the way you think about programming), the definitive reference is the so-called "Red Book," the PostScript Language Reference by Adobe, the creators of PostScript; other resources include the "Green Book", PostScript Language Design, a gentler introduction by Adobe, and the book Thinking in PostScript by Glenn Reid (Addison Wesley), another introductory text.
You may also find important information from other sources. One of the best places to look is on the Web; simply use your favorite search service (such as Google). It is also important to note that many Unix gurus are very active on Usenet, so search the newsgroups if you have a particular technical question. A good place to start is with the local newsgroup uchi.comp.unix. Google Groups keeps archives of many newsgroups which you can search with its web-based interface. Many people have written documentation for Linux, and the Linux Documentation Project is a good place to find books and tutorials, in addition to HOWTOs, which are documents that quickly explain how to do a specific task. Good Unix distributions come with extensive documentation, and you should look in /usr/doc or /usr/share/doc to see what's available.
When X11 was released, a sample implementation of the X11 protocol was released along with it. This sample implementation was then adopted by commercial unix vendors, who adapted the distribution for their own versions of Unix and subsequently bundled X with their operating systems. In "X11R6", the "R6" refers to the release version of the sample X11 implementation on which the X software is based. There was an "R5" a very long time ago, and there are no plans for an "R7" release, so "X" generally means "X11R6". The differences between X11R6 and X11R5 are fairly small - X11R6 primarily fixed bugs in the R5 implementation and introduced a few new utility functions for programmers.
In addition to the commercial unix vendors who based their own X implementations on X11R6, an organization named "XFree86" used the X11R6 sample implementation to develop X software for x86 computers (i.e., IBM PCs). Nowadays, XFree86 runs on more than just PCs (it can run on MacOS X, for example). XFree86 is the standard X software for both Linux and *BSD.
X uses a client/server model. This can be a bit confusing, because the terms "client" and "server" are something of a misnomer. An X server is the program which runs on the local computer and sets up the display and keyboard/mouse for local use; the X server also handles X networking requests from non-local machines. The X server runs on the computer to which your keyboard, mouse and monitor are connected. When you wish to run a program locally (the normal case), that X program sends its X requests to a Unix domain socket on the local machine (usually /tmp/.X11-unix/X0). (A Unix domain socket is a special file that allows different processes running on the same machine to communicate.)
You can also run X programs remotely as follows: first, ensure that your X server is running; then log in to another machine from an xterm; then check the DISPLAY environment variable on the remote machine and ensure that it contains your local hostname; then simply launch the program on the remote machine. It will run on the remote machine but display on your screen and take input from your mouse and keyboard. When a program is run like this, the remote program is called a "client" and your local X server is the "server" - the client sends display information to the server using TCP, and the server sends keyboard and mouse events to the client. The X protocol can also be tunnelled using ssh - see the section on security for details. In fact, this is the most common way of running X programs over a network.
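A DISPLAY value normally has the form host:display.screen, for example myhost.example.edu:0.0; an empty host part (as in ":0") means the local machine. The following sketch pulls the pieces apart so you can check that the host is the one you expect. The function names and the example hostname are hypothetical, and the sketch assumes the host part contains no colons of its own.

```shell
# Print the host part of an X display name (empty for a local display).
display_host() {
    printf '%s\n' "${1%:*}"
}

# Print the display number, dropping the optional ".screen" suffix.
display_number() {
    rest=${1##*:}
    printf '%s\n' "${rest%%.*}"
}
```

On the remote machine you might run display_host "$DISPLAY" and compare the result with the name of the machine you are sitting at.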
Setting up X can often be very painful and is not covered in this document. See the manual pages for the XF86Config file and the xf86setup command if you are using a Linux or freely-distributable BSD system. Once you have a working X installation, you will need to actually start X. There are generally three ways to do this: 1. via "startx", 2. via "xinit" or 3. via "xdm". "startx" is a shell script which comes with many X installations - it generally does some extra setup before calling "xinit". "startx" may also be called "xstart" or various other things on some systems. It is used for starting X from the text console. "xinit" usually does most of the work of configuring the environment for X and then launches the X server. "xdm" is different from "xinit" and "startx" in that it actually replaces the textual login prompt with a graphical login prompt; to run xdm, simply type "xdm" as root. You will then be presented with a configurable login widget, and after you are done with your X session, X will be restarted with xdm appearing again. Often xdm will be started from the system initialization scripts, and it may even have its own run level.
When you first log into X, X will run a shell script that you write to set up your environment. The name of the shell script may vary, but generally, if you use xinit (or startx), X will run the file "~/.xinitrc"; if you use xdm, it will run the file "~/.xsession".
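As an illustration, a very small ~/.xinitrc might look like the following. The choice of xterm and twm is only an assumption made for the example; substitute whatever terminal and windowmanager you actually use.

```shell
#!/bin/sh
# Sample ~/.xinitrc (a sketch, not a recommended setup).

# Load X resources if the file exists.
[ -f "$HOME/.Xresources" ] && xrdb -merge "$HOME/.Xresources"

# Start a terminal in the background so the script keeps going.
xterm &

# Run the windowmanager last, in the foreground.
# The X session ends when this program exits.
exec twm
```

The last program in the file is the important one: everything before it should be started in the background, because X shuts down as soon as the script finishes.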
Unlike other graphical systems, X does not have a standard widget set. In fact, X itself does not define widgets, but widgets are instead defined via libraries which run on top of X. This means that X programs will have no standard "look-and-feel", unlike Windows or MacOS. Some say that this is the greatest weakness in X, but it also provides considerable flexibility.
Some of the more common widget libraries include the Athena library, Motif, GTK and QT. The Athena library was one of the first widget libraries for X; in fact, Athena applications are among the first graphical applications, and this shows. Athena is extremely ugly and can be difficult to use. Athena does not define a real "menu" widget - menus are usually implemented as buttons, if at all. You should also be aware that Athena scrollbars work completely differently from the usual scrollbars in other systems. It is unfortunate that many older X programs have not been ported to use a more modern widget library.
One of the most widespread widget libraries is "Motif." Motif defines things such as menus, a 3-D look and feel, and all the other features you would expect in a modern graphical interface. Unfortunately, Motif is not free software - programs that use Motif are often distributed as statically linked binaries, since you must pay for the Motif library. An example of a Motif application is Netscape Communicator 4.x. LessTif is a free implementation of most of Motif.
GTK is a modern graphics library and it provides many of the features of the Windows GDI, the MacOS interface and more. The appearance of GTK widgets is completely customizable. GTK programs are written in C and GTK is the graphics platform for the GNOME desktop system. You can read more about GTK at its homepage.
QT is a portable graphics library provided by TrollTech AS. QT programs can be written to be portable to X and Windows if written carefully. QT programs are written in C++ and QT is the graphics library for the KDE desktop environment.
Many programs which use the standard X libraries are configurable in two ways: one is the usual way, within the program using menus and dialog boxes, and the other is with X resources. See the manual page for xrdb and the section on resources in the X manpage for information about setting up a .Xresources file.
Some operating systems, such as MS Windows or MacOS, provide window management utilities as "part" of the OS. This is not so in Unix; window management is performed by a completely separate program (a windowmanager), which may be replaced, hacked, etc. The "Window Management" that I'm referring to is the way in which you interact with your graphical environment: this includes moving/resizing/iconifying windows, starting new programs, working with workspaces, etc. You change your windowmanager by editing your '.xinitrc' or '.xsession'. Try not to spend too much time configuring your windowmanager and trying out new ones, as this can be quite time-consuming and you won't get much work done; I'd recommend just finding something that works and going with it, rather than continually tweaking things. A list of popular window managers:
New windowmanagers seem to come out every week, so this list is not by any means exhaustive. Here and here are some longer lists. Some people like huge, bloated, gui-configurable windowmanagers so that they can show off transparent terminals, shaped windows, etc. to non-unix users. Some people like very minimalist windowmanagers so they can get their work done quickly, as most work in Unix happens in the terminal window or the text editor, not the windowmanager.
Most Unix users organize their work based upon specific tasks which are being accomplished; contrast this with MacOS, which organizes work based upon the application being used (i.e., a task is usually accomplished using only one application in MacOS, but in Unix, you usually use many different programs when working on something), or Microsoft Windows, which attempts to organize work not based upon the program running, but rather upon "documents". There are many other differences between the way one works with Unix as compared to Windows or MacOS; the basic point is that paradigms which work well in Windows or MacOS will not necessarily work in X, so don't try to apply them unnecessarily. For example, almost all applications in MacOS have a menu bar, in the same place all the time. This makes it easy to access the functions of the current program in a standard way. On the other hand, why does a terminal emulator need a menubar? Does a text editor need a toolbar? Most text editing operations happen using the keyboard, so using a toolbar will slow down the user since the hands must leave the keyboard. These are just some things to keep in mind when you are trying to decide how to choose a windowmanager or organize your work when using Unix.
Since Unix is such a networked operating system, it is often preferred
by attackers. With a Unix machine, it does not matter whether you
are sitting at your terminal or you are logged on across the world;
you can use everything in a Unix machine from a networked environment.
This makes Unix machines especially attractive to perpetrators because
they can build up a chain of compromised machines to launch further
attacks; this can make tracking them difficult. Very often when a Unix
machine is attacked, the attacker does not really care about what is on
that particular machine; he/she is only attacking the machine to gain
access to other machines or to launch a DoS attack on someone else.
In MacOS and Windows, most tasks must be performed at the actual system,
so these machines are less likely to be attacked by a serious cracker
(but do not let this tempt you into thinking that you are safe if you are
not running Unix - an attacker can just as easily upload a non-interactive
program to a compromised NT box and use that for nefarious purposes).
It is therefore imperative that if you run any sort of Unix, you
secure your system. Every year, many machines on this
campus are compromised, and most of these compromises could very
easily have been prevented.
Most Unix installations are, by default, incredibly insecure. Unix
vendors like to "show off" their systems, and as such, they enable just
about every service that the machine can run. Your first job is to
disable these services.
Many Unix network daemons are run from "inetd", the Internet super
daemon. Inetd listens on the ports of all the services it manages and
hands each incoming connection to the appropriate network daemon.
Inetd can also block out certain addresses, but it is a poor excuse for
a firewall. Inetd is usually started up from your boot scripts, and its
configuration file is /etc/inetd.conf. It also reads /etc/hosts.allow
and /etc/hosts.deny (see the "hosts.allow" or "hosts_access" manpages for
more info on these - technically, some versions of inetd don't read these
files but pass off that responsibility to a different program). The first
thing you should do when you first log on as root to a new system is
to disable all of the services in /etc/inetd.conf. Simply comment them
out with a "#" to disable the services; see the manpage for "inetd.conf"
for info about the format of the file. You probably don't want to run
any of these services, but if there is one that you really want, you'll
have to keep inetd running. Otherwise, it is a good idea to disable
inetd from starting up (see the sections on boot
scripts for how to do this). Common services which are started via
inetd are telnet, ftp, some network diagnostic services, rlogin, finger,
rpc.statd, lpd, and others. These are some of the most dangerous and most
commonly compromised services, so if you don't want to be rooted, disable
all of these. There is absolutely no excuse for running any
of the above services on a desktop box. Alternatives are explained below.
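Commenting out every service by hand is tedious, so here is a minimal sketch of how it could be done with awk. The function name "disable_inetd_services" is made up for this example. It only prints the modified file, so you can inspect the result before replacing the real /etc/inetd.conf.

```shell
# Print an inetd.conf-style file with every active service commented out.
# Blank lines and lines that are already comments are passed through as-is.
disable_inetd_services() {
    awk '/^[ \t]*(#|$)/ { print; next }
         { print "#" $0 }' "$1"
}
```

You might run "disable_inetd_services /etc/inetd.conf > /tmp/inetd.conf.new", compare the two files, and then copy the new one into place as root. Inetd normally rereads its configuration when it is sent a HUP signal; check your inetd manpage to be sure.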
After disabling inetd, you need to disable any other network daemons
you don't need that aren't started from inetd. The general rule is that
if you don't know what something is, you don't need it.
This is important; if you do not use something, get rid of it. Here is
a list of common services that are not started via inetd:
| httpd | Web servers usually aren't started from inetd for performance reasons. You should carefully evaluate whether you need to run a web server - this school provides various places where you can put up a webpage, and it may actually look more "professional" to have your webpage have a standard-looking URL rather than an IP address or an ugly dorm hostname. If you want to experiment with web servers, CGIs, or PHP, you may want to restrict access to the uchicago.edu domain; this is an especially good idea if you are writing CGIs. Note that grepping for "httpd" may not work if you are using Apache (the de-facto web server for Unix); in this case, grep for "apachectl." Some Unix distributions contain documentation that can only be accessed while running a web server (perhaps on a non-standard port); be aware that you may break such documentation systems when you disable httpd. |
| portmap | Portmap maps RPC program numbers to the TCP or UDP ports on which the corresponding services are listening. Basically the only reason that you would need to run this is if you want to use NFS. RPC is responsible for a huge, huge number of exploits, so don't enable this. |
| syslogd | DON'T disable this one. Syslogd manages syslog error messages (see the manpage for details). Disabling this actually makes the box much less secure because you might not get important log messages indicating an attack. This is usually stuck in with the network daemons because it can act as a network daemon, accepting logs from other machines, but by default this is turned off. See the syslogd manpage to ensure that the command-line switches which enable other machines to log to you are not included, but otherwise leave this alone. |
| named | This is a DNS name server daemon. This is usually BIND 8 on modern Unix systems. Disable this immediately if it is enabled. BIND is responsible for a large number of exploits, and you have absolutely no business running a name server from a dorm machine. |
| routed | Disable this immediately if it is enabled. You only need to run this if your machine is a router. |
| rwhod | Disable this immediately. This allows other hosts to perform a remote "who" command to see who is logged onto your system. Such information is invaluable to attackers. |
| lpd | This is the Berkeley printer daemon. You need to run this if you want to print (even if you only want to print to a local printer connected to your machine); otherwise disable it. Ensure that people can't use the printer service on your machine remotely by checking /etc/hosts.lpd (see the manpage) and making sure that your /etc/printcap does not anywhere contain the "rs" capability (see the manpage). See the section on printing for information about how exactly this works. |
| sendmail | This is an smtp server, among other things. In general, when it is invoked from a boot script, it is only to listen for incoming mail. Disable this immediately - you do not need it. You do not need to "run" sendmail like this if you only want to send mail messages from your system. It is recommended that you use POP or IMAP to read your mail and that your computer does not allow others to send it mail directly (besides, a dorm machine will change its IP address every year, so it would actually be inconvenient to receive messages directly). Sendmail has been known to be responsible for a number of exploits. Only experienced systems administrators should attempt to configure and run sendmail, as it will give a novice migraines (trust me). |
| xntpd | This is an NTP client - its purpose is to keep
your clock synced with an NTP server's clock, which can be important if
you're using time-sensitive protocols such as kerberos. The xntpd
configuration can be a bit confusing; here are some steps to configure
this to run correctly (this is assuming you are running the standard
xntp3 distribution): in /etc/ntp.conf, put the names of each of the
NTP servers you need to use as described in the documentation. You
should also specify a driftfile. The other options aren't really
needed. Here is a sample configuration file for use in the dorms:
server ntp-0
server ntp-1
server ntp-2
driftfile /etc/ntp/drift
Xntpd only corrects for small differences in the time; to make large changes in the time, you should run the ntpdate program. You may also want to set your hardware clock after syncing the time via ntpdate. |
| xfs | This is a font server. Generally, you do not need
to run this, but RedHat Linux enables it by default (for reasons that
escape me...). If this is enabled and you disable it, you may break your
X configuration. In order to ensure you can still use X after
disabling this, look for the xfs configuration file which is generally
specified via the command line to xfs. In the xfs configuration file,
look for all the font directories that are exported: these are usually
comma-separated directories after the "catalogue" directive in the xfs
config file. Add all of these font paths to the FontPath directive in
your XF86Config file. For example, if your xfs configuration file
contains this line:
catalogue: /usr/X11R6/lib/X11/fonts/100dpi,
/usr/X11R6/lib/X11/fonts/75dpi
add the following directives to the "Files" section of your XF86Config:
FontPath "/usr/X11R6/lib/X11/fonts/100dpi"
FontPath "/usr/X11R6/lib/X11/fonts/75dpi"
You should then test your X configuration with xfs turned off before you permanently kill xfs from your startup scripts. I know of one local exploit which uses xfs. |
| yp* | Anything that starts with the letters "yp" or "rpc.yp" is used for YP. NIS (network information service) is another name for YP (yellow pages). The most common use for YP is to distribute a user base; dorm machines shouldn't have very many users and you can't really fit a cluster of machines into a dorm room, so you have no reason to run any NIS services. |
| mountd nfsd rpc.lockd rpc.statd | These are used for NFS. NFS (Network File System) is the standard file sharing protocol for Unix (like AFP is for Macs and SMB is for Windows machines). NFS has been known for a large number of exploits, so disable it. You don't need these programs to mount remote drives (these are server-only programs), and you probably won't need to use NFS even on the client side since you can't get to your Harper or CS home directories by mounting an NFS share. NFS (like all RPC-based protocols) requires portmap. |
| natd | This is an IP translation utility, only needed if your box is acting as a gateway to some other machines. If this is enabled by default, your Unix vendor needs to have a serious talking-to. |
| rarpd | This daemon takes requests containing an ethernet address and responds with the corresponding IP address. This is only used in combination with other daemons when your machine acts as a server for diskless clients which boot off a network drive. You don't need this. |
| timed | This is another protocol for synchronizing time via a network. Don't use this; use xntpd instead. |
| sshd | See the section on ssh for details. This is included here because some of the newest Unix distributions might actually include OpenSSH, so you don't even have to download and compile ssh yourself. If you do download and compile ssh yourself, beware that you might have to disable the system script if you write your own. |
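To see which of the daemons from the table above are actually running, you can filter a process listing. The following is a rough sketch: the function name "flag_daemons" is invented for this example, and the pattern covers only the entries the table says to disable outright (it deliberately leaves out syslogd, lpd, xntpd, xfs and sshd).

```shell
# Read process names on standard input, one per line, and print the
# ones that match daemons the table above recommends disabling.
flag_daemons() {
    grep -E '^(httpd|portmap|named|routed|rwhod|sendmail|natd|rarpd|timed|mountd|nfsd|rpc\.lockd|rpc\.statd|(rpc\.)?yp[^ ]*)$' |
        sort -u
}
```

A typical invocation would be "ps -e -o comm= | flag_daemons". The exact form of the names that ps prints varies between systems, so treat an empty result with some suspicion and look through the full listing by hand as well.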
Although running the above programs may be a security risk, one of the best reasons for running Linux as a college student is to learn about networking, so you may want to run these services, not to actually use them, but to find out how they work. In this case, I recommend that you set up an internal network (i.e., use IP addresses from the private ranges defined in RFC 1918). If you don't have enough machines, you may want to look into VMware, which will allow you to run multiple operating systems on the same machine.
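RFC 1918 reserves three address blocks for private networks: 10.0.0.0/8, 172.16.0.0/12 (that is, 172.16.x.x through 172.31.x.x) and 192.168.0.0/16. As a quick sanity check when setting up such a network, a sketch like the following tells you whether an address falls inside one of them. The function name "is_rfc1918" is made up, and the check is a simple pattern match which assumes its argument is already a well-formed dotted-quad address.

```shell
# Succeed if the given IPv4 address lies in an RFC 1918 private range.
is_rfc1918() {
    case "$1" in
        10.*) return 0 ;;                                    # 10.0.0.0/8
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;   # 172.16.0.0/12
        192.168.*) return 0 ;;                               # 192.168.0.0/16
        *) return 1 ;;
    esac
}
```

For example, "is_rfc1918 192.168.1.10 && echo private" prints "private", while an address such as 172.32.0.1 is rejected because it lies just outside the second block.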