The Unix command line - beginner's and advanced level

Sven H.M. Buijssen

Institute of Applied Mathematics
Faculty of Mathematics
TU Dortmund University

Unix

History

A lot of standardisation has occurred, but be aware that there are slight differences in command line tools and their options between different Unix flavours.

File system

Directories

There are a number of different file system implementations for Unix, but they all share a hierarchical tree structure. On Linux, the layout of this tree largely follows the so-called Filesystem Hierarchy Standard (FHS); other Unix flavours use similar layouts.

File system

Directory commands

Commands to work with directories:

Useful options:

  • cd without any argument changes to home directory
  • cd - goes back to previous directory
  • Both mkdir [path] and rmdir [path] accept the option -p:
    • mkdir -p [path]: create [path] by creating all the non-existing parent directories first, if necessary.
    • rmdir -p [path]: remove [path] and its parent directories which become empty.
  • ls options will be discussed in detail later; in brief:
    • -l: use a long listing format
    • -a: include hidden file system objects (i.e. those whose names start with .)
    • -d: list directory entries themselves, not the contents of the respective directories
    • -1: show a single entry per line (in short format)

File system

Directory commands (examples)

Examples:

pwd
/path/to/this/tutorials/test/directory
mkdir -p 5/4/3/2/1 6
ls
5/ 6/
find
.
./5
./5/4
./5/4/3
./5/4/3/2
./5/4/3/2/1
./6
cd 5
ls
4/
cd ..
rmdir -p 5/4/3/2/1
find
.
./6

File system

Basic file operations

Commands to organise files:

Useful options:

  • cp, mv and rm accept the option -i:
    • cp -i [source] [destination]: ask interactively before overwriting any file in [destination].
    • mv -i [source] [destination]: ask interactively before overwriting any file in [destination].
    • rm -i [source1] [source2] [filemask]: ask interactively before removing any of the given files.
  • rm -f [source1] [source2]: do not ask when removing any of the given files.
  • rm -r [source1] [source2]: option to remove directories and their contents, too, recursively (even if directory not empty).
  • cp -p [source] [destination]: preserve time stamp (and permissions and ownership) of original file.
  • cp -a [source] [destination]: recursively copy, preserve time stamp (and permissions and ownership) and do not dereference symbolic links. (Basically like creating a tarball of [source] and unpacking it at [destination]: an exact replica.)
    Note: -a is a GNU extension, not part of POSIX! It is available in the cp implementation from GNU coreutils, i.e. by default with Linux distributions, but not on vanilla Solaris.

File system

Inspecting file content (1)

Commands to inspect files:

File system

Inspecting file content (2)

Commands to inspect files (cont.'d):

File system

Symbolic links

A symbolic link (often abbreviated to symlink) is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path. (Think: similar to a "shortcut" in Microsoft Windows)

Symbolic links operate transparently for most operations: programs that read or write to files named by a symbolic link will behave as if operating directly on the target file. (Some programs, like cp, rm, tar may handle symbolic links specially: after all, removing e.g. a symbolic link to a directory should remove the symbolic link, not the entire directory it links to.) Syntax:

ln -s [target] [link]

Prominent example for those with an account in the TU Dortmund Math network or on the LiDOng cluster: a symlink named nobackup in the home directory. It points to a completely different file space, but in operations like cd ~/nobackup or cp ~/nobackup/file /tmp the symlink behaves completely transparently.

Symlinks are typically used

File system

Symbolic links (2)

mkdir -p my-great-project/release-0.1
cd my-great-project
mkdir release-0.2
ln -s release-0.2 stable-release
ls -l
total 8
drwxr-xr-x 2 user group 4096 Jan 2 18:33 release-0.1/
drwxr-xr-x 2 user group 4096 Jan 2 18:33 release-0.2/
lrwxrwxrwx 1 user group   11 Jan 2 18:34 stable-release -> release-0.2/

The path my-great-project/stable-release could get "published". The benefit of publishing that instead of my-great-project/release-0.2 is that whenever you come up with a better release you can simply adjust the symbolic link - without the need to inform anyone or change any setting that refers to the then outdated my-great-project/release-0.2:

mkdir release-1.0
rm -f stable-release
ln -s release-1.0 stable-release
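
The effect of re-pointing a symlink can be tried out in a throw-away directory. The following is a minimal sketch, assuming mktemp is available; the file name VERSION is made up for illustration:

```shell
# Throw-away playground; mktemp -d is assumed to be available (common, but not POSIX).
dir=$(mktemp -d)
mkdir "$dir/release-0.2" "$dir/release-1.0"
echo "0.2" > "$dir/release-0.2/VERSION"   # VERSION is a hypothetical file name
echo "1.0" > "$dir/release-1.0/VERSION"

# Publish release-0.2 under the stable name and read through the link.
ln -s release-0.2 "$dir/stable-release"
before=$(cat "$dir/stable-release/VERSION")

# Re-point the link; the published path stays the same.
rm -f "$dir/stable-release"
ln -s release-1.0 "$dir/stable-release"
after=$(cat "$dir/stable-release/VERSION")

echo "before: $before, after: $after"
rm -rf "$dir"
```

Anyone reading via stable-release gets the new release without changing the path they use.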

File system

Permissions

With Unix-like systems, every file system object has individual permissions that control the ability of the users to view or make changes and whether or not they are able to access files in subdirectories.

For every file system object, there are permissions for the owner of an object, group permissions and permissions for anybody else with access to said file system; these permissions are packed in groups of three (triads). Example:

ls -lad /bin /bin/ls /usr/bin/java /dev/null /var/log/messages
drwxr-xr-x 2 root   root   4096 Oct  7 18:18 /bin/
-rwxr-xr-x 1 root   root 114032 Sep 21  2010 /bin/ls*
crw-rw-rw- 1 root   root   1, 3 Dec 19 12:48 /dev/null
lrwxrwxrwx 1 root   root     22 Feb 13  2012 /usr/bin/java -> /etc/alternatives/java*
-rw-r----- 1 syslog adm    1130 Jan  2 08:01 /var/log/messages

File system

Permissions (2)

ls -lad /bin /bin/ls /usr/bin/java /dev/null /var/log/messages
drwxr-xr-x 2 root   root   4096 Oct  7 18:18 /bin/
-rwxr-xr-x 1 root   root 114032 Sep 21  2010 /bin/ls*
crw-rw-rw- 1 root   root   1, 3 Dec 19 12:48 /dev/null
lrwxrwxrwx 1 root   root     22 Feb 13  2012 /usr/bin/java -> /etc/alternatives/java*
-rw-r----- 1 syslog adm    1130 Jan  2 08:01 /var/log/messages

Permissions explained

The leading character distinguishes directories (d) from files (-), symbolic links (l) and character special files (c).

The remaining 3x3 characters have different meaning depending on whether the object is a file or directory:

Files

File system

Permissions (3)

Directories

Source: http://www.greenend.org.uk/rjk/tech/perms.html

File system

Changing permissions

Use chmod to change permissions. chmod supports a numeric (octal number) and a symbolic input mode to encode permissions. For the occasional user, the latter is preferred as it is easier to learn and remember. Syntax:

chmod [references][operator][modes] file ...

The references distinguish the users to whom the permissions apply. Without references, chmod acts as if a (all) had been given, except that permission bits masked out by the umask are left untouched. The references are coded as follows:

  • u (user): refers to the owner of the file
  • g (group): includes all users who are members of the file's/directory's group
  • o (others): for all users who are not the owner nor members of the group the file/directory belongs to
  • a (all): user + group + others

The following operators are available to specify how the modes of a file should be adjusted:

  • +: adds the specified modes
  • -: removes the specified modes
  • =: prescribes exactly the modes specified

Useful command line option: -R to apply an operation recursively

File system

Changing permissions (2)

Examples:

touch foo.bar
ls -l foo.bar
-rw-r--r-- 1 user group 0 Jan 2 09:11 foo.bar
chmod u-rw,o+x foo.bar
ls -l foo.bar
----r--r-x 1 user group 0 Jan 2 09:11 foo.bar
chmod a=rw foo.bar
ls -l foo.bar
-rw-rw-rw- 1 user group 0 Jan 2 09:11 foo.bar

File system

Permissions (4)

Directory permissions lead to some odd effects:

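If you have r, but not x on a directory, then you can list the names of the files in the directory but not access them in any way. A plain ls on such a directory still prints the names; ls -l produces a "Permission denied" error for each file in the directory because it cannot look up the file details. (In the output below even the plain ls complains, presumably because ls is aliased here to mark file types, which requires the same lookup.)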

mkdir tmp
touch tmp/file1; echo "foo bar" > tmp/file2
chmod u=r tmp
ls -ld tmp
dr--r-xr-x 2 user group 4096 Jan 2 11:15 tmp/
ls tmp
ls: cannot access tmp/file1: Permission denied
ls: cannot access tmp/file2: Permission denied
file1 file2
ls -l tmp
ls: cannot access tmp/file1: Permission denied
ls: cannot access tmp/file2: Permission denied
total 0
-????????? ? ? ? ? ? file1
-????????? ? ? ? ? ? file2

File system

Permissions (5)

Conversely if you have x, but not r, then you will not be able to list the files in it, but if you happen to know their names then you will still be able to access them, and you will be able to cd into the directory:

chmod u=x tmp
ls -ld tmp
d--xr-xr-x 2 user group 4096 Jan 2 11:15 tmp/
ls tmp
ls: cannot open directory tmp: Permission denied
ls -l tmp/file2
-rw-r--r-- 1 user group 8 Jan 2 11:15 tmp/file2
cat tmp/file2
foo bar

File system

Permissions (6)

Finally, if you have neither r nor w for a directory, you are not allowed to remove it, even if you are the owner. Continuing from the previous example, we note:

rm -rf tmp
rm: cannot remove `tmp': Permission denied

Re-add read and write permission (r and w) for yourself; then you can remove the directory:

chmod u+rw tmp
rm -rf tmp
ls -d tmp
ls: cannot access tmp: No such file or directory

File system

Changing permissions (3)

In order to be able to change the default permissions for newly created files and directories (next slide), one should be familiar with the alternative way of specifying permissions: the octal notation of file system permissions. It consists of at least three digits: the left-most digit is for the user (owner) permissions, the middle one for users belonging to the designated group and the right-most one for all other user accounts.

Examples:
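
In octal notation, each digit is the sum of r = 4, w = 2 and x = 1. The following minimal sketch sets two modes and prints the resulting permission string; it assumes mktemp is available, and the file name demo is made up for illustration:

```shell
# Each octal digit is the sum of r=4, w=2 and x=1.
dir=$(mktemp -d)            # assumes mktemp is available
touch "$dir/demo"           # demo is a hypothetical file name

chmod 640 "$dir/demo"       # 6 = rw- (user), 4 = r-- (group), 0 = --- (others)
mode_640=$(ls -l "$dir/demo" | cut -c1-10)

chmod 755 "$dir/demo"       # 7 = rwx (user), 5 = r-x (group), 5 = r-x (others)
mode_755=$(ls -l "$dir/demo" | cut -c1-10)

echo "640 -> $mode_640"
echo "755 -> $mode_755"
rm -rf "$dir"
```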

File system

Changing the default permissions for new files and directories

When a process creates a new file or directory, it gets default permissions (by applying a so-called mask). The mask is inherited from the process that spawned the current one. If the process is a shell, the mask can be changed with the shell built-in command umask. Because umask is a shell built-in rather than a separate program, the supported syntax unfortunately varies between shells.

umask typically supports the octal notation, bash additionally supports input in symbolic notation (including the very same operators chmod supports, e.g. umask o-w,g-w).

The mask is programmatically applied by the operating system by first negating (complementing) the mask, and then performing a logical AND with the requested file mode. Loosely speaking, the given mask is subtracted from a default of 666 and the resulting octal number is applied for files (while for directories a default of 777 is used).
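
The rule "requested mode AND NOT mask" can be checked with shell arithmetic. This is a small sketch that only computes the numbers; it creates no files:

```shell
# Mode of a new object = requested mode AND (NOT umask), computed in octal.
file_026=$(printf '%03o' $(( 0666 & ~0026 )))   # regular file, umask 026
dir_026=$(printf '%03o' $(( 0777 & ~0026 )))    # directory, umask 026
echo "umask 026: file $file_026, directory $dir_026"

# "Subtracting" is only a rule of thumb: with umask 033, 666 - 033 would give 633,
# but the bitwise rule yields 644, because masking a bit that was never
# requested (x for a plain file) has no effect.
file_033=$(printf '%03o' $(( 0666 & ~0033 )))
echo "umask 033: file $file_033"
```

The results for umask 026 (640 for files, 751 for directories) correspond to the permission strings -rw-r----- and drwxr-x--x.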

C Shell Family

(only octal numbers as input supported)

umask 022
rm -rf file subdir; touch file; mkdir subdir
ls -ld file subdir
-rw-r--r-- 1 user group    0 Jan 1 16:48 file
drwxr-xr-x 2 user group 4096 Jan 1 16:48 subdir/
umask 026
rm -rf file subdir; touch file; mkdir subdir
ls -ld file subdir
-rw-r----- 1 user group    0 Jan 1 16:48 file
drwxr-x--x 2 user group 4096 Jan 1 16:48 subdir/

Bourne Shell Family

(octal numbers and symbolic mode as input supported)

umask 022
rm -rf file subdir; touch file; mkdir subdir
ls -ld file subdir
-rw-r--r-- 1 user group    0 Jan 1 16:48 file
drwxr-xr-x 2 user group 4096 Jan 1 16:48 subdir/
umask u=rwx,g=rx,o=x
rm -rf file subdir; touch file; mkdir subdir
ls -ld file subdir
-rw-r----- 1 user group    0 Jan 1 16:48 file
drwxr-x--x 2 user group 4096 Jan 1 16:48 subdir/

Standard Output and Error Redirection

Redirect standard output/error to a file

The shell and many UNIX commands take their input from standard input (stdin), write output to standard output (stdout), and write error output to standard error (stderr). By default, standard input is connected to the terminal keyboard and standard output and error to the terminal screen.

C Shell Family

It is possible to redirect stdout of a program to a file by means of the > character, followed by the desired destination:

echo "Hello world" > file.txt
cat file.txt
Hello world

Bourne Shell Family

It is possible to redirect stdout of a program to a file by means of the > character, followed by the desired destination.

echo "Hello world" > file.txt
cat file.txt
Hello world

Instead of >, one can also use 1>.
(In the Bourne Shell Family, 1 is the file descriptor number for stdout and 2 the one for stderr.)

Standard Output and Error Redirection

Redirect standard output/error to a file (2)

C Shell Family

To redirect both stdout and stderr of a program to a file, use >&, followed by the desired destination:

ls /path/to/nowhere > ls.stdout
ls: cannot access /path/to/nowhere: No such file or directory
cat ls.stdout
# Note: empty output because nothing would have
# been shown on stdout, only stderr
ls /path/to/nowhere >& ls.stdout-and-stderr
cat ls.stdout-and-stderr
ls: cannot access /path/to/nowhere: No such file or directory

Bourne Shell Family

To redirect only stderr of a program to a file, use 2>, followed by the desired destination:

ls /path/to/nowhere 2> ls.stderr
cat ls.stderr
ls: cannot access /path/to/nowhere: No such file or directory

To redirect both stdout and stderr of a program to a single file, use 1>[file] 2>&1:

ls /path/to/nowhere 1> ls.stdout-and-stderr 2>&1

To redirect stdout and stderr of a program to separate files, use 1>[file1] 2>[file2]:

ls /path/to/nowhere 1> ls.stdout 2> ls.stderr
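
For the Bourne Shell Family, the order of the redirections matters. The following sketch illustrates this with a made-up helper function emit that writes one line to each stream; it assumes mktemp is available:

```shell
# emit is a hypothetical helper that writes one line to stdout and one to stderr.
emit() { echo "to stdout"; echo "to stderr" >&2; }

dir=$(mktemp -d)                      # assumes mktemp is available

emit 1> "$dir/out" 2> "$dir/err"      # separate files
emit 1> "$dir/both" 2>&1              # both streams in one file

# Order matters: 2>&1 duplicates the current stdout (here the pipe of $(...))
# before 1> redirects stdout, so the error line ends up in $leaked, not in the file.
leaked=$(emit 2>&1 1> "$dir/only-out")

out=$(cat "$dir/out"); err=$(cat "$dir/err")
both=$(cat "$dir/both"); only_out=$(cat "$dir/only-out")
echo "out=[$out] err=[$err] leaked=[$leaked] only-out=[$only_out]"
rm -rf "$dir"
```

Redirections are processed from left to right, which is why 1>[file] has to come before 2>&1 to capture both streams in [file].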

Standard Output and Error Redirection

Redirect standard output/error to a file (append mode)

Sometimes, stdout (or stderr) of several, individually invoked processes needs to be gathered in a single file.

The redirection operator, >, overwrites the given log file at every invocation.

>> [file] redirects output to [file], but appends to said file instead of overwriting it.

Example:

echo "Hello" > file.txt
echo "world" > file.txt
cat file.txt
world
echo "Hello" > file.txt
echo "world" >> file.txt
cat file.txt
Hello
world

Standard Output and Error Redirection

Redirect standard output/error to a program

The pipe operator | feeds stdout of one program to stdin of another program for further processing:

ls -1 /
bin/
boot/
cdrom/
dev/
[more lines to follow]
ls -1 / | head -n 2
bin/
boot/

C Shell Family

Again, stdout and stderr can be merged by means of the |& operator.

ls -1 / |& head -n 3

Bourne Shell Family

To postprocess both stdout and stderr at the same time, first merge stderr into stdout, 2>&1, and then use the pipe operator |:

ls -1 / 2>&1 | head -n 3

Recent Bash versions (≥ 4) support the shorthand |& for 2>&1.
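
That a plain pipe carries stdout only can be verified by counting the lines that arrive at the receiving end. A minimal sketch for the Bourne Shell Family; the helper function emit is made up for illustration:

```shell
# emit is a hypothetical helper that writes one line to stdout and one to stderr.
emit() { echo "to stdout"; echo "to stderr" >&2; }

# Only stdout travels through the pipe; stderr bypasses it (silenced here).
stdout_only=$( { emit | wc -l; } 2> /dev/null )

# After merging stderr into stdout, both lines reach wc.
merged=$( emit 2>&1 | wc -l )

echo "lines seen by wc: $((stdout_only)) without 2>&1, $((merged)) with 2>&1"
```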

Standard Input Redirection

Redirect standard input to a program

Many commands accept input from standard input (stdin). By default, stdin reads information from your keyboard, but just like stdout, it can be redirected. One way, as we have just seen, is by means of a pipe. Another way is by means of the redirection operators < [file] and << [MARKER].

A few slides back, we learned about tac. Without a file argument, tac reads standard input and then prints it with the order of the lines reversed.

tac
some
random
text
# Press Ctrl-D at beginning of a new line to end input.
# You will then get the following output of tac:
text
random
some

Standard Input Redirection

Redirect standard input to a program (2)

Using tac as shown on the previous slide does not make much sense. But it does when redirecting its input and feeding tac the content of a file:

echo "world" > file.txt; echo "hello" >> file.txt; echo "oh," >> file.txt
tac < file.txt
oh,
hello
world

The result can, of course, be redirected again to a file.

tac < file.txt > filereverse.txt

Standard Input Redirection

Redirect standard input to a program (3)

The second syntax of redirecting standard input can be used (e.g. from a shell script) to feed a program custom (multiline) text without having to store it in a file first:

cat <<EOF
oh,
hello
world
EOF
oh,
hello
world

This example is pretty dull; peeking ahead into the section about useful little helper programs available on Unix-like systems, we will see a nicer example involving the mailx command on the next slide.
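
One detail of << [MARKER] worth knowing for the Bourne Shell Family: whether variables and `...` inside the text are expanded depends on whether the marker is quoted. A minimal sketch; the variable name is made up for illustration:

```shell
name="world"     # hypothetical variable for illustration

# Unquoted marker: variables and `...` are expanded inside the text.
expanded=$(cat <<EOF
hello $name
EOF
)

# Quoted marker: the text is passed on literally.
literal=$(cat <<'EOF'
hello $name
EOF
)

echo "$expanded"
echo "$literal"
```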

Little helpers

mailx

Use mailx to send e-mail directly from command prompt (e.g. to notify oneself when a process has finished):

Examples:

(echo "this is part of the message body"; ls; echo "end of message body") | mailx -s "subject" recipients-email-address@domain.org
./cc2d; cat <<EOF | mailx -s "cmd line notifier" recipients-email-address@domain.org
Dear myself,

This is to inform myself that the simulation code running on `hostname` has finished just now.
EOF

Replace recipients-email-address@domain.org with your e-mail address, run the examples and then check your e-mail.

Note that mailx will use the generic $USER@$HOSTNAME as sender address. Replying to a mail sent this way is typically not possible, unless you happen to send from a server that is set up as a mail receiving host, too. To manually set a (valid) reply address, some mailx implementations (e.g. on Solaris) allow the option -r fromaddress; on Linux you need something like -a "Reply-To: fromaddress".

Device files

A device file (or special file) is an interface to a device driver that appears in a file system as if it were an ordinary file. Device files provide simple interfaces to peripheral devices, such as hard disks (including hard disk partitions), USB resources, printers and serial ports. Device files are also useful for accessing system resources that have no connection with any actual device, such as data sinks and random number generators.

Most prominent examples:
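
A data sink and two data sources can be tried out directly from the shell. The following sketch assumes that the device files /dev/null, /dev/zero and /dev/urandom exist, as they do on Linux and most other Unix flavours:

```shell
# /dev/null is a data sink: whatever is written to it is discarded,
# and reading from it yields end-of-file immediately.
echo "discard me" > /dev/null
null_bytes=$(wc -c < /dev/null)

# /dev/zero delivers an endless stream of NUL bytes; dd takes 16 of them.
zero_bytes=$(dd if=/dev/zero bs=16 count=1 2> /dev/null | wc -c)

# /dev/urandom delivers random bytes; od prints 4 of them in hexadecimal.
random_hex=$(dd if=/dev/urandom bs=4 count=1 2> /dev/null | od -An -tx1)

echo "null: $((null_bytes)) bytes, zero: $((zero_bytes)) bytes, random:$random_hex"
```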

Subshells

Multiple commands: run unconditionally and with boolean operators

It is sometimes helpful to combine the output of multiple commands.
The ; character separates individual commands, all of which are run unconditionally, one after another.
The && operator realises a boolean AND (the right-hand command runs only if the left-hand one succeeded), the || operator a boolean OR (the right-hand command runs only if the left-hand one failed).
Nesting commands in parentheses, (...), groups them, runs them in a subshell and merges their output:

mkdir tmp; (cd tmp && pwd && ls) || echo "changing directory failed"
/path/to/current/directory/tmp
# Note: no cd .. is needed here; the cd above happened in the subshell only
(cd foobar && pwd && ls) || echo "changing directory failed"
bash: cd: foobar: No such file or directory
changing directory failed

With shells from the Bourne Shell Family any error message from the commands blocked together with (...) and run in a subshell can be suppressed:

(cd foobar && pwd && ls) 2> /dev/null || echo "changing directory failed"
changing directory failed
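
Both the short-circuit behaviour of && and || and the isolation of a subshell can be observed without touching any files. A minimal sketch for the Bourne Shell Family:

```shell
# false only sets a non-zero exit status, so the && branch is skipped
# and the || branch runs.
and_or=$( (false && echo "and-branch") || echo "or-branch" )

# The cd happens inside the subshell only; the parent shell stays where it was.
start=$(pwd)
(cd / && pwd) > /dev/null
end=$(pwd)

echo "$and_or"
[ "$start" = "$end" ] && echo "parent directory unchanged"
```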

Subshells

Running a command in a subshell and using its stdout as arguments for another command

The backtick operator, `...`, runs commands in a subshell and returns their output. In that respect it is similar to the pipe operator, but unlike the latter the backtick operator can occur multiple times in the parent command.

Example:

Simple (integer) math in the shell:

echo `expr 5 + 10`
15
echo `expr 3 \* 2`
6
echo `expr 2.5 \* 2`
expr: non-numeric argument
echo `expr 100 / 6` `expr 66 / 6`
16 11

Be aware that the output of the backtick operator, when used unquoted, is split into words by the shell, so any run of whitespace (spaces, tabulators, newlines) collapses to a single blank:

echo "  Hello    " > foo
(echo; echo "world "; echo) > bar
cat foo bar
␣␣Hello␣␣␣␣

world␣

echo "<" `cat foo bar` ">"
< Hello world >

Bash versions since at least 2.0b5 provide the $(...) operator as an alternative to `...`. Its advantage is that it allows subshells to be nested. With `...` inside `...`, finding the appropriate escape sequences (see slides on special shell characters) can be hard, as the opening backtick of the nested command needs to be "hidden" so that it is not wrongly interpreted as the closing backtick:

echo $( i=0; i=$( expr $i + 1 ); echo $i )
1
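
Whether whitespace in the output of a subshell survives depends on quoting. The following minimal sketch compares the unquoted and the quoted form of $(...); it assumes mktemp is available, and the file name foo is only an example:

```shell
dir=$(mktemp -d)                                   # assumes mktemp is available
printf '  Hello    \n\nworld \n' > "$dir/foo"

# Unquoted: the shell splits the output into words, whitespace collapses.
unquoted=$(echo "<" $(cat "$dir/foo") ">")

# Quoted: inner blanks and newlines are kept (only trailing newlines are stripped).
quoted=$(echo "<$(cat "$dir/foo")>")

echo "$unquoted"
echo "$quoted"
rm -rf "$dir"
```

The first echo prints a single line, the second one prints three lines with the original blanks intact.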

Shells

Special characters I: wildcards

Command line interpreters like Bash, Tcsh, ksh, zsh, dash etc. support filename wildcarding by means of wildcards and (simple) regular expressions:

Shells

Special characters I: wildcards (2)

The shell interprets the wildcards, expands the list and then passes this list to the programs invoked. It is not the programs that expand the list!

touch file1.txt file2.csv file3. foo.txt bar.html
ls
bar.html file1.txt file2.csv file3. foo.txt
ls *.txt
file1.txt foo.txt
ls *.txt f*
file1.txt file1.txt file2.csv file3. foo.txt foo.txt
# Note that some files are listed twice,
# that is because they matched multiple patterns.
ls [^f]*
bar.html
ls file?.{txt,csv,}
file1.txt file2.csv file3.
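
That the shell, not the program, expands wildcards can be made visible by looking at the argument list a command would receive. A minimal sketch, assuming mktemp is available; set -- stores its arguments as positional parameters:

```shell
dir=$(mktemp -d)                     # assumes mktemp is available
touch "$dir/file1.txt" "$dir/file2.csv" "$dir/foo.txt" "$dir/bar.html"

# The shell replaces the pattern by the matching names before the command runs.
set -- "$dir"/*.txt
count=$#
first=$(basename "$1")

# Quoting suppresses the expansion: the command receives the pattern itself.
set -- "*.txt"
quoted=$1

echo "$count matches, first: $first, quoted: $quoted"
rm -rf "$dir"
```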

Shells

Special characters II: Characters for comments, history repeat, escaping/quoting

Command line interpreters (aka Unix shells) recognise a number of characters as special:

Examples:

echo "Hello world" # and hello to class, too
Hello world
cd #data; pwd
# Bash users beware: you will find yourself back in your home directory.
# Escape # by prefixing it with a backslash or enclose the string in quotes.
# Somehow, tcsh catches this exception and really changes into a
# subdirectory named #data, if it exists.
cd \#data; pwd; cd ..
/path/to/directory/#data
cd "#data"; pwd; cd ..
/path/to/directory/#data
!e
echo "Hello world" # and hello to class, too
Hello world

Shells

Environment variables

In all Unix and Unix-like systems, each process has its own separate set of environment variables. By default, when a process is created, it inherits a duplicate environment of its parent process (except for explicit changes made by the parent when it creates the child). Environment variables serve configuration purposes and inter-process communication purposes.

Important environment variables

Shells

Environment variables (2)

Working with environment variables

Subshells

A little more info on subshells

A subshell is a child process of the currently run shell. As such it inherits the environment from the parent process: it is invoked in the same directory and has (a copy of the) same environment variables.

Changes made to the environment in a subshell, however, do not affect the parent process. In other words, an environment variable that gets set/unset/altered in a subshell gets its original value back as soon as the subshell exits and the parent shell takes over again!

Example: changing e.g. the environment variable PAGER in a subshell:

echo $PAGER
less -iRS
echo `setenv PAGER foo; echo $PAGER` # bash: echo `export PAGER=foo; echo $PAGER`
foo
echo $PAGER
less -iRS
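
The same behaviour can be reproduced without touching PAGER. A minimal sketch for the Bourne Shell Family; the variable name GREETING is made up for illustration:

```shell
GREETING="hello"          # hypothetical variable name
export GREETING

# A child process sees the exported variable ...
child_sees=$(sh -c 'echo "$GREETING"')

# ... but a change made in a subshell is lost when the subshell exits.
in_subshell=$(GREETING="changed"; echo "$GREETING")
after=$GREETING

echo "child: $child_sees, subshell: $in_subshell, parent afterwards: $after"
```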

Shells

Multitasking: Foreground Processes and Background Processes

As multitasking operating systems, Unix-like systems allow for running multiple processes in the background while one continues to work in the foreground.

All examples so far showed foreground processes: the user had to wait for one foreground process to complete before being able to run another one.

running them in foreground is (in general) the only reasonable way to run the program.

The shell does not have to wait for a background process to end before it can run more processes. Within the limit of the amount of memory available, you can enter many background commands one after another. To run a command as a background process, type the command and append & to it. The shell prompt returns immediately while the command runs in the background, and you can enter another command, either as a foreground or as a background process. Some shells run background jobs at a lower priority than foreground jobs. You will see a message on the screen when a background process has finished.

A foreground process can be made a background process by first stopping it by pressing Ctrl-Z and then typing bg at the shell prompt.

(A foreground process can be aborted by pressing Ctrl-C at the shell prompt.)

Shells

Multitasking: Foreground Processes and Background Processes (Examples)

Examples:

emacs &
[1] 9796
firefox &
[2] 9797
find /
# a long list of output
# press Ctrl-Z
^Z
[3]+ Stopped find /
jobs
[1]- Running emacs &
[2]- Running firefox &
[3]+ Stopped find /
fg 2
firefox
# press Ctrl-C to abort foreground process or Ctrl-Z again to interrupt it
bg 3

Frequently used commands

ls

ls lists directory contents.

Useful command line options

Frequently used commands

grep

grep searches the named input files (or stdin) for lines containing a match to the given PATTERN. By default, grep prints the matching lines. Syntax:

grep "PATTERN" [file OR stdin] [file2]

Examples:

cat <<EOF > file.txt
oh,
hello
world
EOF
grep hel file.txt
hello
grep -v hel file.txt
oh,
world
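
Besides -v, a few further options defined by POSIX are handy in everyday use. The following sketch shows -c, -i and -n on a small test file; it assumes mktemp is available, and the file content is made up for illustration:

```shell
dir=$(mktemp -d)                                  # assumes mktemp is available
printf 'oh,\nhello\nworld\nHello again\n' > "$dir/file.txt"

count=$(grep -c hel "$dir/file.txt")         # -c: count matching lines
count_i=$(grep -ci hel "$dir/file.txt")      # -i: ignore case
numbered=$(grep -n world "$dir/file.txt")    # -n: prefix matches with line numbers

echo "matches: $count, ignoring case: $count_i, numbered: $numbered"
rm -rf "$dir"
```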

Frequently used commands

grep (2)

Frequently used command line options

Helpful GNU extension

Frequently used commands

grep (3)

Syntax of PATTERN (aka syntax of basic regular expressions)

Frequently used commands

grep (4)

Examples:

wget http://www.mathematik.tu-dortmund.de/~buijssen/unix-command-line/samplefile.log
cat samplefile.log
iteration param   value1   value2   value3 string
------------------------------------------------------------
        1   0.9 2.79E-05 2.01E-05 1.09E-06 oman
        2   0.9 2.85E-07 2.44E-07 5.86E-08 ecuador
        3   0.9 5.62E-06 4.40E-06 4.18E-07 mongolia
        4   0.9 5.65E-06 4.48E-06 4.03E-07 argentina
[...]
grep 'land$' samplefile.log
# Make sure to use single ticks with tcsh.
# Otherwise tcsh interprets $" as reference
# to variable named " and complains about
# an "Illegal variable name."
# With single ticks, the shell does not try to
# expand any variables inside the quoted string.
       16   0.9 2.79E-05 2.06E-05 1.17E-06 switzerland
       18   0.9 5.65E-06 4.19E-06 5.17E-07 ireland
       30   0.9 2.79E-05 2.00E-05 1.06E-06 england
grep "2\.2.E-0[1-5]" samplefile.log
       27   0.9 2.81E-05 2.26E-05 1.48E-06 bahrain
       31   0.9 2.80E-05 2.20E-05 1.38E-06 ivory_coast
grep "\bba" samplefile.log
# Requires GNU grep to work because of use of '\b'!
       27   0.9 2.81E-05 2.26E-05 1.48E-06 bahrain
       37   0.9 1.21E-06 9.71E-07 1.97E-07 barbados
grep "\b[^b-ds-z][a-z]b.*a" samplefile.log
        7   0.9 2.79E-05 2.04E-05 1.14E-06 lebanon
       36   0.9 1.33E-06 1.12E-06 2.63E-07 libya
       45   0.9 1.15E-06 9.08E-07 9.93E-08 albania

Frequently used commands

egrep

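egrep is a version of grep that supports extended regular expressions, i.e. in addition to the previously listed basic regular expressions it supports, among others, + (one or more repetitions) and alternation, (a|b), both used in the examples below. With current grep implementations, egrep is equivalent to grep -E. Syntax: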

egrep "PATTERN" [file OR stdin] [file2]

Examples:

egrep "\b[^b-ds-z][[:alpha:]]+(ba|bya)" samplefile.log
        7   0.9 2.79E-05 2.04E-05 1.14E-06 lebanon
       36   0.9 1.33E-06 1.12E-06 2.63E-07 libya
       45   0.9 1.15E-06 9.08E-07 9.93E-08 albania
egrep "^[[:space:]]+[0-9] .*on" samplefile.log
        3   0.9 5.62E-06 4.40E-06 4.18E-07 mongolia
        6   0.9 2.79E-05 2.16E-05 1.31E-06 honduras
        7   0.9 2.79E-05 2.04E-05 1.14E-06 lebanon

Frequently used commands

find

find searches for files in a directory hierarchy. Syntax:

find directory [directory2] [options]

Frequently used command line options

Frequently used commands

find (2)

Examples:

find /usr/bin -name "*foo*bar*"
# search for files whose names contain the substring "foo" followed, somewhere later, by "bar"
find ~/ -size +100M -name "*.gmv" -exec gzip {} \; -ls
# compresses all GMV output files in your home directory and below that
# exceed a size of 100 MiB; list those files, too
find ~/ ! -type l \( -perm -g+w -o -perm -o+w \) -ls
# find and list in detail all non-symlinks in your home directory and below
# group members or others can write to
find . \( -name "sample*" -o -name "*.foo" \) -exec grep "[dz][eio]" {} \; -ls
# search for all files with prefix "sample" or suffix ".foo"
# in current directory and subdirectories that contain a substring
# where the letter 'd' or 'z' is followed by 'e', 'i' or 'o';
# matching lines are shown
        2   0.9 2.85E-07 2.44E-07 5.86E-08 ecuador
       13   0.9 2.79E-05 2.09E-05 1.21E-06 burundi
       16   0.9 2.79E-05 2.06E-05 1.17E-06 switzerland
       22   0.9 1.16E-06 9.03E-07 1.04E-07 india
       37   0.9 1.21E-06 9.71E-07 1.97E-07 barbados
       40   0.9 1.22E-06 9.79E-07 2.16E-07 indonesia
       48   0.9 2.83E-05 2.33E-05 1.68E-06 sweden
1937766 2 -rwxr-xr-x 1 user group 2879 Jan 4 13:01 ./samplefile.log
find /tmp -mtime -15 -ls
# search for all objects in /tmp that changed within the last 15 days
find /tmp -mmin -15 -ls
# search for all objects in /tmp that changed within the last 15 minutes
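
To experiment with find safely, it helps to build a small directory tree first. The following minimal sketch assumes mktemp is available; all file and directory names are made up for illustration:

```shell
dir=$(mktemp -d)                     # assumes mktemp is available
mkdir -p "$dir/a/b"
touch "$dir/sample1.log" "$dir/a/notes.foo" "$dir/a/b/sample2.log"

# -name matches the file name only (quote the pattern so the shell leaves it alone).
logs=$(find "$dir" -name "sample*" | wc -l)

# Tests are ANDed by default; -o (OR) needs escaped parentheses for grouping.
either=$(find "$dir" -type f \( -name "sample*" -o -name "*.foo" \) | wc -l)

# -type d restricts the search to directories (including the start directory).
dirs=$(find "$dir" -type d | wc -l)

echo "sample*: $((logs)), sample* or *.foo: $((either)), directories: $((dirs))"
rm -rf "$dir"
```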

Frequently used commands

gzip / gunzip / bzip2 / bunzip2

gzip and bzip2 reduce the size of the files given as argument using compression algorithms (as does e.g. zip in the Microsoft Windows world). Upon successful compression, new files are created which get the original file names with ".gz" and ".bz2", respectively, appended. The input files are removed!

gunzip and bunzip2 are the corresponding counterparts to uncompress files again. Likewise, the compressed input files are removed upon successful decompression.

Example:

cp /etc/mime.types foo.file
ls foo.file*
foo.file
gzip foo.file
ls foo.file*
foo.file.gz
gunzip foo.file.gz
ls foo.file*
foo.file

It is often not necessary to uncompress a compressed file explicitly (i.e. saving the uncompressed stream to disk) if all you want is just some kind of content inspection, see the commands zcat/bzcat, zmore/bzmore, zless/bzless, zgrep/bzgrep, zegrep/bzegrep.
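
If the input file is to be kept, gzip and gunzip can write to stdout with -c instead of replacing the file. A minimal sketch, assuming gzip and mktemp are installed; the file content is made up for illustration:

```shell
dir=$(mktemp -d)                     # assumes mktemp is available
printf 'line one\nline two\n' > "$dir/foo.file"

# -c writes the compressed stream to stdout, so the input file is kept.
gzip -c "$dir/foo.file" > "$dir/foo.file.gz"
kept=no
[ -f "$dir/foo.file" ] && kept=yes

# Inspect the compressed content without unpacking it to disk.
second=$(gunzip -c "$dir/foo.file.gz" | grep two)

echo "original kept: $kept, found in compressed file: $second"
rm -rf "$dir"
```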

Frequently used commands

tar

tar is the standard Unix tool to create archives (as is zip in the Microsoft Windows world). Syntax:

tar [operation mode] [options] file-or-directory ...

Frequently used command line options

Frequently used commands

tar (2)

Examples:

mkdir -p tmpdir1 tmpdir2/subdir1 tmpdir2/subdir2
tar -c -v -f archive.tar tmpdir*
# Recursively store the contents of directories tmpdir* in archive.tar
gzip archive.tar
# => yields archive.tar.gz
tar -cvzf archive.tar.gz tmpdir*
# => Same result: just storing and compression combined in a single step
rmdir -p tmpdir1 tmpdir2/subdir1 tmpdir2/subdir2
ls -d tmpdir1 tmpdir2/subdir1 tmpdir2/subdir2
ls: cannot access tmpdir1: No such file or directory
ls: cannot access tmpdir2/subdir1: No such file or directory
ls: cannot access tmpdir2/subdir2: No such file or directory
tar -xvpzf archive.tar.gz
# => Extract content from archive
ls -d tmpdir1 tmpdir2/subdir1 tmpdir2/subdir2
tmpdir1/ tmpdir2/subdir1/ tmpdir2/subdir2/
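
Before extracting an archive, it is often useful to list its content first. The following sketch uses -t for listing and -C to change the directory before archiving or extracting; -C is assumed to be supported by the installed tar (it is in GNU tar and bsdtar), and all names are made up for illustration:

```shell
dir=$(mktemp -d)                     # assumes mktemp is available
mkdir -p "$dir/tmpdir1/subdir1"
echo "data" > "$dir/tmpdir1/subdir1/file.txt"

# -C changes into the given directory first, so the archive holds relative paths.
tar -cf "$dir/archive.tar" -C "$dir" tmpdir1

# -t lists the archive content without extracting anything.
listing=$(tar -tf "$dir/archive.tar")

# Extract into a separate directory and read the restored file.
mkdir "$dir/restore"
tar -xf "$dir/archive.tar" -C "$dir/restore"
restored=$(cat "$dir/restore/tmpdir1/subdir1/file.txt")

echo "$listing"
echo "restored content: $restored"
rm -rf "$dir"
```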

Frequently used commands

file

file determines file type by analysing file content. Syntax:

file file-or-directory ...

Examples:

file /etc/a[lu]*s* /bin/bash /lib32/libc-*.so /usr/lib/firefox/icons/*.png /usr/*/*/kde/HTML/en/*/top.jpg ParaView-3.98.0-Windows-64bit.exe
/etc/aliases: ASCII text
/etc/aliases.db: Berkeley DB (Hash, version 9, native byte-order)
/etc/alternatives: directory
/etc/auto.master: ASCII English text
/etc/auto.misc: ASCII English text
/etc/auto.smb: Bourne-Again shell script, ASCII text executable
/etc/autofs_ldap_auth.conf: regular file, no read permission
/bin/bash: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, stripped
/lib32/libc-2.15.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, stripped
/usr/lib/firefox/icons/mozicon128.png: PNG image data, 128 x 128, 8-bit/color RGBA, non-interlaced
/usr/share/doc/kde/HTML/en/common/top.jpg: JPEG image data, JFIF standard 1.01
ParaView-3.98.0-Windows-64bit.exe: PE32 executable (GUI) Intel 80386, for MS Windows, Nullsoft Installer self-extracting archive

Frequently used commands

man

man command tries to locate a so-called man page for command (by querying all paths listed in the environment variable MANPATH).

On Unix-like operating systems, man pages have been the default form of online software documentation since 1971: a quick reference guide in concise format. (Other formats are info pages (using the editor emacs as viewer) and, since the late 90s, HTML pages.)

The viewer for a man page is given by the environment variable PAGER. If it is unset, the viewer is in general either more or less, depending on the platform. (For navigation and search shortcuts see slide 7.)

Example:

man find

Commands one should know

top

top is a task manager program. It produces an ordered list of running processes selected by user-specified criteria, and updates it periodically. By default results are sorted by decreasing CPU usage, but other sorting criteria can be specified interactively (hit ? to get a help screen with all available shortcuts, e.g. M to sort by decreasing memory usage and T to sort by decreasing CPU time).

top is by default available on Linux systems, but not on, e.g., vanilla Solaris.

Note: In the header section of top output, the amount of total, used and free memory is printed. Be aware that the values for used and free memory may paint a misleading picture: they include the memory occupied by the cache, which the system keeps, e.g., to start a program faster when it is invoked a second time. To correctly determine the amount of free memory on a Linux machine, invoke free -m (free is part of the procps package, which is not available on vanilla Solaris) and look at the cell in column "free" of the row "-/+ buffers/cache"! (Newer versions of free report this figure in the column "available" instead.)

Exercise: Compare the amount of free memory reported by top -n 1 | head -5 and free -m.

Commands one should know

ps

ps is, compared to top (see previous slide), a lower-level but more powerful tool to retrieve information about active processes.

Without options, ps prints information about processes that have the same effective user ID and the same controlling terminal (tty) as the invoker. This way, ps can be used in a similar way as jobs: to see all the processes that have been spawned from the current terminal.

Using command line options, the output of ps can be completely customised. This can prove useful when, e.g., monitoring the memory footprint of a job.

Which command line options are supported differs between Unix systems, but at least one of

ps aux
ps -efl

will work and retrieve a complete list of all running processes. Using pipes, the output is typically passed to a viewer (like less) or filtered (by e.g. grep, sed, awk, perl etc.). Example:

ps aux | sed "1p; /$LOGNAME.*bash/p; d;"
# ⇒  retrieves a complete list of all running processes and limits it to the
#    first line and all processes that belong to the invoker and stem from running the command
#    'bash':
USER       PID %CPU %MEM   VSZ  RSS TTY    STAT START TIME COMMAND
user      1479  0.0  0.0 11744 3628 pts/18 Ss   Jan10 0:00 /bin/bash
user      2530  0.0  0.0 11732  876 pts/4  Ss    2012 0:00 /bin/bash
user      2531  0.0  0.0 11712  876 pts/2  Ss+   2012 0:00 /bin/bash
user      2532  0.0  0.0 11712  876 pts/6  Ss+   2012 0:00 /bin/bash
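
To illustrate the customised output mentioned above, here is a minimal sketch for watching the memory footprint of a single job. It assumes a ps that supports the options -o and -p and the column name rss (true for Linux, the BSDs and Solaris, but not guaranteed everywhere); the sleep process is only a hypothetical stand-in for a real job.

```shell
# Start a stand-in for a long-running job (hypothetical; use your own PID instead).
sleep 30 &
pid=$!

# -p: select one process by its PID; -o: choose the columns to print.
# rss = resident memory in KiB, vsz = virtual memory size in KiB.
ps -o pid,rss,vsz,comm -p "$pid"

# A trailing "=" suppresses the column header, which is handy in scripts.
# tr removes the padding blanks ps adds for alignment.
rss_kb=$(ps -o rss= -p "$pid" | tr -d ' ')
echo "resident memory of $pid: ${rss_kb} KiB"

# Clean up the stand-in job.
kill "$pid"
```

Run inside a loop with a sleep in between, the same two ps lines give a simple memory log for a job.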

Commands one should know

kill

On slide 38 we learned that processes can be aborted by hitting Ctrl-C, suspended with Ctrl-Z and resumed with fg and bg - but all of this needs to be done in the controlling terminal.

kill provides a means to send (abort/interrupt and other) signals to any running process, identified by its process ID (in short: PID).

ps and/or top are typically used to determine the required PID. Which signal to send depends on the context at hand. Commonly used signals:

firefox &
ps -u $LOGNAME -U $LOGNAME | grep firefox
29334 ? 00:10:00 firefox
kill -STOP 29334 # interrupt firefox process, similar to Ctrl-Z from terminal
kill -CONT 29334 # have firefox continue, same as "fg" and "bg" from terminal
kill 29334 # tell firefox to exit
kill -KILL 29334 # "pull the plug" on firefox a.k.a forced exit
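
The same sequence can be tried safely on a throwaway process. The following is a minimal sketch, assuming a POSIX shell; sleep 300 is only a stand-in for a real program, and the variable names are chosen for illustration.

```shell
# Start a harmless background job to practise on.
sleep 300 &
pid=$!

kill -STOP "$pid"    # suspend the process
kill -CONT "$pid"    # let it continue

# Signal 0 sends nothing; it only tests whether the process exists.
if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is still running"
fi

kill "$pid"                  # default signal TERM: ask the process to exit
wait "$pid" 2>/dev/null      # collect the job's exit status
status=$?                    # 128 + signal number for a process ended by a signal
echo "exit status: $status"
```

Trying TERM first and falling back to KILL only if the process does not react gives the program a chance to clean up after itself.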

Commands one should know

du

du estimates disk usage of each given file, recursively for directories. Syntax:

du [option] file-or-directory ...

Frequently used command line options

Examples:

du -ksc /etc/* | head -5
20      /etc/ConsoleKit
4       /etc/LatexMk
8       /etc/Muttrc
24      /etc/Muttrc.d
4       /etc/ODBCDataSources
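
A common follow-up question is which entry of a directory takes up the most space. A minimal sketch, assuming mktemp -d is available; the scratch directory and its contents are made up for illustration:

```shell
# Build a small scratch directory so the example is self-contained.
demo=$(mktemp -d)
mkdir "$demo/small" "$demo/large"
dd if=/dev/urandom of="$demo/small/data" bs=1024 count=8    2>/dev/null
dd if=/dev/urandom of="$demo/large/data" bs=1024 count=2048 2>/dev/null

# -k: report sizes in KiB, -s: print one summary line per argument.
# sort -rn sorts numerically in reverse order, so the largest entry comes first.
# du separates size and path by a tab, which is cut's default delimiter.
largest=$(du -ks "$demo"/* | sort -rn | head -n 1 | cut -f2)
echo "largest entry: $largest"

# Remove the scratch directory again.
rm -r "$demo"
```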

Commands one should know

df

df displays the amount of disk space available. Without any arguments, it lists all currently mounted file systems. With arguments, the output is limited to the file systems that contain the given files or directories.

Frequently used command line options

Examples:

Show free space on the file system the current path belongs to

df -m .
Filesystem              1M-blocks    Used Available Use% Mounted on
doncamillo:/export/home   3955183 1112338   2842845  29% /home/doncamillo
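
In scripts, df output is often parsed to check for sufficient space before writing large files. A minimal sketch, assuming awk is available and the file system name contains no blanks; the threshold needed_kb is a hypothetical value:

```shell
# -P: POSIX output format (exactly one line per file system), -k: sizes in KiB.
# Line 2 holds the data; column 4 is the available space.
avail_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')
needed_kb=1024   # hypothetical requirement: 1 MiB

if [ "$avail_kb" -ge "$needed_kb" ]; then
    echo "enough space: ${avail_kb} KiB available"
else
    echo "not enough space: only ${avail_kb} KiB available" >&2
fi
```

The option -P matters here: without it, some df implementations wrap long file system names onto a line of their own, which breaks the column count.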

Commands one should know

lpr / lpq

The lp* commands are command line tools to interact with a configured printer (more precisely: with a spooling daemon maintaining a printer queue). This spooling daemon is typically set up to automatically recognise content (see file) and invoke an appropriate filter that converts ASCII text, PostScript and PDF documents to raw command sequences understood by the printer in question.

lpr -Pprintername [file]

prints [file] to the printer designated by the name [printername]. If the -Pprintername command line option is omitted, the environment variable PRINTER is used instead (math network: set to an appropriate value depending on a user's group membership).

lpq -Pprintername

prints the printer queue (list of active print requests along with ownership and status) for the printer designated by the name [printername]. If the -Pprintername command line option is omitted, the environment variable PRINTER is used instead.

Commands one should know

lpq / lprm

Example:

lpq
Printer: ps1042@jerusalem 'HP4300DTN'
 Queue: 2 printable jobs
 Server: pid 26933 active
 Rank   Owner/ID          Pr/Class Job Files                 Size Time
active  user@hostname+23  A         23 /tmp/acroread_3003_ 205085 19:08:33
2       user@hostname+36  A         36 /tmp/acroread_3003_ 349582 19:08:35
done    user@hostname+10  A         10 /tmp/acroread_3003_ 247956 19:08:32

To remove a waiting print job from the queue, determine its ID from the lpq output above and run

lprm -Pprintername "user@hostname+36"
Printer pscoldup1042@jerusalem:
  checking perms 'user@hostname+36'
  dequeued 'user@hostname@r-ray+36'

Little helpers

sort / uniq / wc

The following commands come in handy sometimes to postprocess screen output or file content:

Examples:

cat <<EOF | sort -k1,1 -k2,2r | uniq
3 4
1 2
3 2
1 2
EOF
1 2
3 4
3 2
file /usr/bin/* | grep "shell script" | wc -l
389
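
The three commands are frequently chained to build a small frequency table. A minimal sketch; the word list is made up for illustration:

```shell
# Count how often each word occurs, most frequent first.
# uniq only merges adjacent duplicates, hence the sort in front of it;
# -c prefixes each line with its number of occurrences.
counts=$(printf '%s\n' apple pear apple fig pear apple | sort | uniq -c | sort -rn)
echo "$counts"

# wc -l counts lines, here: the number of distinct words.
distinct=$(printf '%s\n' "$counts" | wc -l)
echo "distinct words: $distinct"
```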

Little helpers

hostname

To determine the name of the server you are currently logged into, use hostname.

Examples:

hostname
cassini
echo "Program XX on `hostname` has finished" | mailx -s "subject" recipients-email-address@domain.org

Little helpers

diff / md5sum / sha*sum

To check whether files are identical or differ, a number of command line tools are available:

Examples:

echo "foo" > file1.txt; echo "bar" > file2.txt; echo "foo" > file3.txt
diff -sq file[12].txt; diff -sq file[13].txt
Files file1.txt and file2.txt differ
Files file1.txt and file3.txt are identical
echo "foo" | diff -qs file1.txt -
Files file1.txt and - are identical
echo "fop" > file4.txt; md5sum file?.txt
d3b07384d113edec49eaa6238ad5ff00  file1.txt
c157a79031e1c40f85931829bc5fc552  file2.txt
d3b07384d113edec49eaa6238ad5ff00  file3.txt
89de73aaae8c956fb7c9379be7978e5b  file4.txt

When you want to know exactly how two files differ, plain diff output is typically hard to read. For these cases, GUI tools like tkdiff, kdiff3, meld, bcompare and emacs' ediff mode are better suited.
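
In scripts, only the verdict matters, not the readable output. A minimal sketch, assuming GNU coreutils (sha256sum with its option -c) and mktemp -d; the file names are made up for illustration:

```shell
tmp=$(mktemp -d)
echo "foo" > "$tmp/a.txt"
echo "foo" > "$tmp/b.txt"

# diff exits with status 0 if the files are identical and 1 if they differ.
if diff -q "$tmp/a.txt" "$tmp/b.txt" >/dev/null; then
    same=yes
else
    same=no
fi
echo "identical: $same"

# Record a checksum now, verify it later with -c.
(cd "$tmp" && sha256sum a.txt > SHA256SUMS)
echo "bar" > "$tmp/a.txt"    # simulate a later modification
if (cd "$tmp" && sha256sum -c SHA256SUMS >/dev/null 2>&1); then
    verified=yes
else
    verified=no
fi
echo "checksum still valid: $verified"

rm -r "$tmp"
```

Checksums are the tool of choice when the two files live on different machines: only the short checksum has to be transferred, not the file itself.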

Little helpers

cut

To postprocess screen output or file content by extracting certain sections or columns from each line of input (file/stdin), use cut. Useful command line options: -b to select by bytes, -d to specify a field delimiter other than the tab character, -f to select certain fields.

Examples:

ls -l /usr/bin | cut -b1-11,47- | head -n 5
total 42367
-rwsr-sr-x 012 X*
lrwxrwxrwx 012 X11 -> ./
-rwxr-xr-x :15 Xorg*
-rwxr-xr-x :50 a2p*
find /usr/bin | sort | cut -d/ -f1,3-5 | tail -n 3
/bin/zipsplit
/bin/zsoelim
/bin/zxpdf
cut -b20-45 samplefile.log | cut -d' ' -f 2 | head -n 5
# First extract a rectangular subsection of the input data, then from that column 2
value1
--------------------------
2.01E-05
2.44E-07
4.40E-06

Note that cut splits the input at every occurrence of the given delimiter, even if that means that the resulting field is empty!
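
The effect of empty fields is easiest to see with columns that are separated by several blanks. A minimal sketch; the sample line is made up, and squeezing the blanks with tr -s is one possible remedy, not the only one:

```shell
line='value1   2.01E-05'     # three blanks between the two columns

# cut sees an empty field between each pair of adjacent blanks,
# so field 2 is empty here.
naive=$(printf '%s\n' "$line" | cut -d' ' -f2)

# tr -s squeezes each run of blanks into a single blank first.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

echo "naive:    '$naive'"
echo "squeezed: '$squeezed'"
```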

Little helpers

Further commands you might find worth learning more about in self-study:

Little helpers for programmers

ldd / nm

The following commands come in handy when debugging why a certain program will not start or not link:

To be continued if there is sufficient interest.

(Possible topics: sed, awk, perl, shell scripting in sh, bash or tcsh, GNU make, CMake.)