CS304: Unix Skills

This course requires you to work on a Linux server (the CS department server, also known as tempest), using a command line, and your life will be much easier if you have some Unix skills. (Linux is one of many descendants of Unix, an operating system that predates DOS and Windows. Mac OS X is another descendant, so many of the commands and concepts below will work on a Mac as well.)

Motivation

You already have a lot of experience working with computers, but with a GUI. A GUI is a graphical user interface, meaning icons, menus, and even drag-and-drop — heaven help us. Although Mac and Windows are rather different, they both use GUIs that rely crucially on those techniques to get work done. GUIs are intuitive, user-friendly, and easy to work with. I'm asking you to give all that up in favor of a CLI (command line interface), so I'd better have a really good reason.

The main reason is that the easiest way to connect to the server is through a relatively narrow tunnel, through which it's quick and easy to send textual commands and get textual responses. (It's possible to set up an X11 tunnel and use a Linux GUI, but that would require you to become familiar with a Linux GUI, so I think it's better to learn some skills that are more universally applicable.)

Another reason, almost as important, is that these skills will make you more efficient both now and in the future, in ways that you can't yet foresee. That's because programming languages are (for the most part) textual, and there's only a very blurry line between a program to accomplish some task and a series of commands to accomplish something. In short, commands can be automated in a way that is hard or impossible for a GUI.

Consider a quick example. Using a GUI, changing the permissions on a file takes about 5 mouse clicks (not including all the navigating around to find the file). Changing the permissions on a file using a CLI takes about 20 keystrokes. If you have to change the permissions on 100 files, it'll take you 5×100=500 mouse clicks, but about 20 keystrokes will do the trick using a CLI, with judicious use of wildcards. (Plus, if you're a good typist, those 20 keystrokes can take less time than those 5 mouse clicks.)
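
To make that concrete, here is a minimal sketch, run in a throwaway directory created with mktemp so that nothing real is touched. The filenames (page1.html through page100.html) and the permission mode 644 are made up for illustration; only the chmod line is the point.

```shell
# Make a scratch directory holding 100 hypothetical files.
demo=$(mktemp -d)
cd "$demo"
i=1
while [ "$i" -le 100 ]; do
    touch "page$i.html"
    i=$((i + 1))
done

# One short command changes the permissions on all 100 files:
# the wildcard *.html expands to every filename ending in .html.
chmod 644 *.html

# Spot-check two of them.
ls -l page1.html page100.html
```

The loop is only there to manufacture test files. The real work is the single chmod line, which stays the same length whether there are 100 files or 10,000.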

A software developer named Lawrence D'Oliveiro talks about this in his Tumblr post entitled "CLI versus GUI Deathmatch!" You should also read a recent post about Linux philosophy. (I'll bet you didn't know an operating system could have a philosophy!)

Directories

Like most filesystems, the Unix filesystem is organized as a tree of directories. To refer to a directory or file, you have to understand the notation for pathnames. Here are some pieces of notation to know about:

/
The directory whose name is the slash character is the root of the tree. Every directory and file is a descendant of this directory. Assuming there are no permissions issues, you can uniquely specify a directory or file by starting at the root. For example, this file is:
/home/cs304/public_html/unix.html

The preceding is called an absolute pathname, since it works from anywhere.

You'll notice that the different directories in the tree of directories are separated from each other with slashes. So, a slash plays two roles: it separates parent from child and also stands for the root of the ancestry tree. A side-effect of this notation is that you cannot name a directory or file with a slash in the name. (On the classic Mac OS, before OS X, the directory separator was a colon; on Windows, it's a backslash.)

.
The directory whose name is a single period (pronounced dot) is like the pronoun me: it stands for the directory you are in. Each process on a Unix system has a "current working directory" (CWD) and all relative pathnames implicitly start at the CWD. For example, the following are two equivalent ways to say "the file named foo.text in the current working directory:"
./foo.text
 foo.text

Relative pathnames are very useful and important because they allow code to be relocatable, meaning that a directory subtree can be copied to another location, possibly even on another machine, and all relative pathnames that stay within the subtree will still work!

..
The directory whose name is a double period (pronounced dot dot) is like the word mom: it stands for the unique parent directory of a directory. For example, the following says "the file named bar.text in the parent of the current working directory:"
../bar.text

The dot-dot syntax can be very useful in relative pathnames, to address a file that is related via an ancestor (say an uncle, or a second cousin, once removed).

~/
The directory whose name is a tilde is another kind of pronoun: it means your login directory. For example, each of you has a public_html directory in your home directory. Here is how you would address a file in that directory:
~/public_html/index.html

Remember that for each of you, that will be a different file!

Also, the tilde is expanded by the shell, not by the filesystem, so it is typically not available inside programs. It is a nice feature of the shell, though, so you will often use it in commands.

~user/
Tilde has a second usage, where it is immediately followed by the name of a user: it means the login directory of that user, referring to a kind of "database" called the password database, which stores the home directory for each user. (This is the /etc/passwd file, which you're welcome to look at if you want to.) For example, the following path would be the address of a file in the CS 304 course account on the CS server:
~cs304/public_html/home.html

Most of the pathname concepts above may be familiar to you from URLs, since the syntax of the pathname in a URL derives from Unix pathnames.
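
The notations above can be tried out safely with a sketch like the following. It builds a tiny tree in a throwaway directory (made with mktemp); the names proj, src, and copy are invented for this example, and foo.text and bar.text just echo the filenames used above.

```shell
# Build a small scratch tree to experiment in (all names are made up).
work=$(mktemp -d)
mkdir -p "$work/proj/src"
echo "parent file" > "$work/proj/bar.text"
echo "current file" > "$work/proj/src/foo.text"

cd "$work/proj/src"          # make src the current working directory
cat ./foo.text               # "." is the CWD
cat foo.text                 # same file; the leading ./ is implicit
cat ../bar.text              # ".." is the parent of the CWD
cat "$work/proj/bar.text"    # an absolute pathname works from anywhere

# Relative pathnames survive relocation: copy the whole subtree
# somewhere else and the same relative path still works there.
cp -r "$work/proj" "$work/copy"
cd "$work/copy/src"
cat ../bar.text

# The shell expands an unquoted tilde to your home directory,
# but inside quotes it is just an ordinary character.
echo ~
echo "~"
```

The last two lines are a small illustration of why the tilde is a shell convenience rather than part of the filesystem: once it is quoted, nothing expands it.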

With these concepts in mind, hopefully the following sections will be clearer. I will be more terse in these sections, so if you find a command confusing, I encourage you to make use of one of the thousands of web tutorials on Unix and Linux. At the end of this document, I have links to the man pages for these commands.

Conventions and Prompts

In the following sections, I will give sample input and output from interactions with a Linux machine (actually, Tempest, the CS department server).

When you are logged into a Linux machine (directly via the console or across the network via ssh), you will be running a "shell" (a shell is just a program that allows you to run commands). When the shell is ready for your command, it will print a "prompt." That prompt is wildly customizable. On Tempest, the default prompt is like this:

[user@host cwd]

That is, the shell prints three pieces of information, enclosed in square brackets: the username you're logged in with, the name of the machine you're logged into (the host), and the name of your current working directory (cwd). In the examples below, I will usually be logged into the "Wendy Wellesley" test account, so the prompt will look like this:

[wwellesl@tempest ~] 

You will never type that part!

For brevity, I will often replace that with just a $ prompt.

Also note that all these examples have a typographic convention that the stuff you're supposed to type is in bold monospace and the responses and other output is in regular monospace. Any tutorials you read on Linux will probably have occurrences of a prompt (possibly very terse, such as a dollar sign or a percent sign), and may have conventions to help you distinguish what you type from the computer's response.

man

From the very beginning, Unix machines have had online "manuals" for use by everyone from novices to experts. Probably only one (unix) person in a thousand remembers more than a handful of the options for the "ls" command. So, when you're logged in, don't hesitate to use the "man" command to learn more about a command you're unfamiliar with:

[wwellesl@tempest ~] man ls

To exit from man, type "q".

Of course, these online man pages are on the web as well; I give some links at the end of this page.

As I mentioned, the shell always puts you "in" a directory, your "current working directory" (CWD, also called "." or dot). Commands to know:

ls
lists the files and directories in the given directory. With no arguments, lists the contents of the CWD.
cd
changes the CWD to the given directory. With no arguments, changes to your home directory.
pwd
prints the absolute pathname of the CWD, in case you forget where you are.
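
Here is a short sketch of those three commands working together. It assumes a scratch directory made with mktemp, and the public_html/images tree inside it is created only for the demonstration.

```shell
# Set up a scratch tree to wander around in (names are for illustration).
start=$(mktemp -d)
mkdir -p "$start/public_html/images"

cd "$start"        # change the CWD to the scratch directory
pwd                # print where we are
ls                 # list the CWD: shows public_html

cd public_html     # a relative pathname, starting from the CWD
pwd
ls                 # shows images

cd ..              # back up to the parent directory
pwd

cd                 # no argument: go to your home directory
pwd
```

Running pwd after each cd is a good habit while you are learning; the prompt usually shows only the last part of the CWD, not the whole absolute pathname.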

Moving and Copying

Now that you can move around, you'll want to be able to move and copy files. Commands to know:

cp
copies the first argument (a file) to the second argument (either a file or a directory). There are many other options; see the man page for more.
mv
moves the first argument (a file) to the second argument (either a file or a directory).
rm
removes (deletes) the file(s). Caution! This is not a reversible operation: there is no "un-rm" command.
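
The following sketch walks through each of these in a throwaway directory (made with mktemp). The filenames notes.txt, copy.txt, and final.txt and the directory backup are invented for the example.

```shell
# Work in a scratch directory so that rm cannot hurt anything real.
box=$(mktemp -d)
cd "$box"
echo "draft" > notes.txt
mkdir backup

cp notes.txt copy.txt     # file to file: now there are two files
cp notes.txt backup       # file to directory: creates backup/notes.txt
mv copy.txt final.txt     # file to file: this is how you rename
mv final.txt backup       # file to directory: moves it into backup
rm notes.txt              # gone for good; there is no undo

ls . backup               # the CWD now holds only the backup directory
```

Notice that mv with two filenames is how you rename a file; there is no separate rename step involved.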

Now that you know about cp, read this brief digression on the puzzle of tilde.

Tab completion

The Unix shell has many built-in conveniences for power users and poor typists. One you should know about is "tab completion." If you type part of a filename, enough to identify a unique file in the directory, and you hit the "tab" key (above caps lock on the left side of your keyboard), the shell will fill out the rest of the filename. If your prefix is not unique, the shell will fill out as much as it can, and allow you to make a choice of how to continue.

You don't have to do this, of course, but it beats typing the whole name, which is slow and error-prone.

Wildcards

If you want a command such as ls or chmod to apply to several or many files, you can list all of them on the command line, but that can be tedious if there are many files. Wildcards are special characters that match any character, allowing you to specify a pattern for the filenames. (Like a wildcard in a card game.)

*
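The asterisk character matches any sequence of characters, including none at all, and it matches as many as possible. For example, *.html matches every filename in the directory that ends in .html.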

Just be super careful using both rm and the asterisk; it's really easy to delete all your files!
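
One way to stay safe is to preview what a wildcard matches before handing it to rm. The sketch below does that in a throwaway directory (made with mktemp); the four filenames are made up.

```shell
# Scratch directory with a few hypothetical files.
yard=$(mktemp -d)
cd "$yard"
touch apple.py cherry.py notes.txt todo.txt

# The shell expands the pattern before the command ever runs,
# so echo is a harmless way to see what a pattern matches.
echo *.py      # prints: apple.py cherry.py
echo a*        # prints: apple.py
echo *         # prints: apple.py cherry.py notes.txt todo.txt

# Preview with ls first, and only then run rm with the same pattern.
ls *.txt
rm *.txt
ls
```

Because the shell does the expansion, rm never sees the asterisk; it just receives the list of matching filenames, which is why a stray "rm *" is so destructive.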

Making Files and Directories

To make a file, you would typically use a text editor, such as Emacs or vi. Emacs and vi are very different in usage, philosophy and user base. Emacs is slower to start up, but bloated with many features. vi is quicker to start up but is leaner. There are many other differences, but this is not the place to continue the decades-long cold war between the Emacs and vi factions.

You should know, however, that I'm firmly in the Emacs camp.

In CS 304, we'll be using Visual Studio Code, so you don't need to learn Emacs or vi. It's good to squirrel that knowledge in the back of your mind, though, because in a different environment, you might not have Visual Studio Code, but if you're on a Unix system, you will always have vi and almost always have Emacs. (I can think of only one time in my life when Emacs wasn't already installed, and it only took a few minutes to install it.)

To manage files and directories, use these commands:

touch file
creates an empty file with the given filename. You may never use this command, but it's very useful in demonstrations and experiments.
mkdir dir
creates the named directory.
rmdir dir
removes (deletes) the directory, but only if it's empty.
rm -r dir
recursively removes (deletes) the directory tree. Caution! This command is even more dangerous than "rm" itself. Not for the faint of heart.
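
Here is a small sketch showing the difference between rmdir and rm -r, again in a throwaway directory made with mktemp. The directory names empty and full are invented for the example.

```shell
# Scratch area: one empty directory and one with files in it.
lab=$(mktemp -d)
cd "$lab"
mkdir empty full
touch full/a.txt full/b.txt

rmdir empty      # works, because the directory is empty

# rmdir refuses to delete a non-empty directory and reports an error.
rmdir full || echo "rmdir refused: full is not empty"

rm -r full       # removes full and everything underneath it
ls               # nothing left
```

The refusal from rmdir is a safety feature. Reaching for rm -r only after rmdir has complained gives you one more chance to think about what is in the directory.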

tar and gzip

If someone wants to give you a bunch of files and directories, they could attach each of them to a mail message to you, or put them all on a web server where you could download them, but what if there were hundreds or thousands of files and directories? Handling them all one-at-a-time would be tedious at best.

One solution, used for decades by Unix people, is a "tarfile," which is a single file that contains a directory tree. If you download a significant collection of software from a website, you will almost certainly be offered the option of downloading a tarfile, among other options.

(Historical aside: the name "tar" comes from "tape archive," from back in the olden days when "tar" was actually used to write a directory tree to a magnetic tape. However, some bright spark added a command line argument allowing "tar" to write the archive to a file on disk, the "tarfile" or "tarball," instead of to a tape device.)

You're all familiar with compression, such as MP3 compression of a sound file. A compression program used for decades by Unix people is called "gzip." For text files, gzip does an excellent job, often reducing the size of a file by a factor of three or so. So, in fact, when downloading software, the option is typically not an uncompressed tarfile but a gzipped tarfile.

Commands to know:

tar cf tarfile directory
Create a tarfile from the given directory.
tar tf tarfile
List the contents of the given tarfile (the "t" is for "table of contents").
tar xf tarfile
Extract the contents of the given tarfile, creating a subdirectory of the current directory.
gzip file
Compress the given file, replacing it with a .gz version.
gunzip file
Un-compress the given file.

Adding a "z" to the first argument of "tar" makes it work with gzip compression.

Note, an alternative to the tar/gzip combination is zip and unzip.
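
As a sketch of how these fit together, the following creates a tarfile, compresses it in a separate step, and then does both at once with the "z" option. It runs in a throwaway directory made with mktemp; the directory mydir, its two files, and the unpack directory are made up for the example.

```shell
# Scratch directory containing a small tree to archive.
shed=$(mktemp -d)
cd "$shed"
mkdir mydir
echo 'print("apple")' > mydir/apple.py
echo 'print("cherry")' > mydir/cherry.py

# Two steps: make the tarfile, then compress it.
tar cf mydir.tar mydir       # c = create, f = the tarfile name follows
tar tf mydir.tar             # t = table of contents
gzip mydir.tar               # replaces mydir.tar with mydir.tar.gz
ls
gunzip mydir.tar.gz          # and back again to mydir.tar

# One step: adding z makes tar do the gzip compression itself.
tar czf mydir.tar.gz mydir
tar tzf mydir.tar.gz

# Extract the gzipped tarfile somewhere else.
mkdir unpack
cd unpack
tar xzf ../mydir.tar.gz      # x = extract; creates unpack/mydir
ls mydir
```

By convention a gzipped tarfile is named with .tar.gz (or sometimes .tgz), which tells whoever receives it that they need the "z" option.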

Tar Gotchas

There are several pitfalls with using tar. It's not that tar does anything wrong, but it acts differently from how you might naively expect. So, let's become less naive.

First, the tarfile contains a directory tree. Suppose it contains a very small directory like this:


  mydir/
     apple.py
     cherry.py

Let's tar this up in the usual way, namely by being in the folder above mydir and tarring it into a file called mydir.tar:

[wwellesl@tempest ~] tar cf mydir.tar mydir

We can even get a table of contents of a tarfile, like this:

[wwellesl@tempest ~] tar tf mydir.tar
mydir/
mydir/apple.py
mydir/cherry.py

So, the name of the directory, mydir, is part of the contents of the file. So, what happens if we rename the file, say like this: mv mydir.tar yourdir.tar? Does that change the contents? No!

[wwellesl@tempest ~] mv mydir.tar yourdir.tar
[wwellesl@tempest ~] tar tf yourdir.tar
mydir/
mydir/apple.py
mydir/cherry.py

This surprises many novices, but in retrospect it makes sense. Changing the name of a file doesn't change the contents.

The next pitfall follows from the first. If I have the April 3rd version of the team project in a folder called teamproj, and a teammate sends me the April 4th version as a tarfile containing teamproj, and I un-tar that file in the directory containing teamproj, the extracted files will be written into my existing folder, overwriting any files with the same names. That can be confusing and surprising: where are the new files? Right where the old ones were!

What I do if I'm nervous about un-tarring a file is I create a brand-new directory, mv the tarfile into the new directory, and untar it there. Anything that appears in that directory came from the tarfile.
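
That cautious approach can be sketched as follows. Everything here is simulated in a throwaway directory made with mktemp: the file report.txt, the tarfile name teammate.tar, and the directory name inspect are all invented for the example.

```shell
# Simulate the situation in a scratch directory.
base=$(mktemp -d)
cd "$base"
mkdir teamproj
echo "April 4 version" > teamproj/report.txt
tar cf teammate.tar teamproj          # pretend a teammate sent us this
echo "April 3 version" > teamproj/report.txt   # our own, older copy

# Un-tar in a brand-new directory instead of next to teamproj.
mkdir inspect
mv teammate.tar inspect
cd inspect
tar xf teammate.tar

cat teamproj/report.txt       # the teammate's version, from the tarfile
cat ../teamproj/report.txt    # our version, untouched
```

Once you have looked over what came out of the tarfile, you can decide which files to copy into your own folder.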

ssh and scp

Often, the computer we are physically touching, using its keyboard and mouse, and looking at its screen, is not the one we want to be working with. For example, you log in to your own laptop or to Mac #7 in L180, but you really want to be logging in to Tempest and modifying your files there. Alternatively, you want to copy some files from your laptop to Tempest or connect from your laptop shell to a shell on Tempest. The following commands enable this remote work across the network:

ssh user@host
Log in remotely to the given host computer as the given user account. ssh will prompt you for the password for the account and relay it to the host. If the password is accepted, ssh will start a remote shell for you.
scp local-file user@host:path/to/remote/file
This command copies a local file to a remote file. (Notice the user@host on the destination.) This command is a lot like cp except that you can precede the filenames with user@host: to have them copied across the network to the destination host. You can use this command to copy a file from your local machine, say your laptop, to Tempest, or from your C9 workspace to Tempest.
scp user@host:path/to/remote/file path/to/local/file
scp can also go the other way, copying a file from the remote host to your local machine.

As an example, I logged into a Mac (station #12 in L180 where I logged in as "sanderso") to do the following. Notice the different prompt on the Mac versus Tempest.

sci-L180-12:~ sanderso$ cd Desktop/
sci-L180-12:Desktop sanderso$ ls -l mypage.html 
-rw-r--r--  1 sanderso  WELLESLEY\Domain Users  0 Jan 26 12:29 mypage.html
sci-L180-12:Desktop sanderso$ scp mypage.html wwellesl@tempest:public_html/
The authenticity of host 'tempest (149.130.15.5)' can't be established.
RSA key fingerprint is ae:53:ce:76:03:10:a9:23:ee:89:14:5a:23:3f:fb:32.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'tempest,149.130.15.5' (RSA) to the list of known hosts.
wwellesl@tempest's password: 
mypage.html                                    100%    0     0.0KB/s   00:00    

Let's take a moment to look at that scary message from scp. The ssh and scp programs are secure: they protect against eavesdropping by encrypting all traffic to and fro, and they protect against machine "spoofing" by checking the identity of the remote host. If you've never previously connected to that remote host from this local host, scp can't check the identity, so it asks whether you are sure. On-campus, you can comfortably always say "yes," since LTS has good control of the hostnames. Across the wilds of the internet, spoofing can arise, so you have to be more thoughtful. We don't have time to get into that here, though, so let's continue with our example.

sci-L180-12:Desktop sanderso$ ssh wwellesl@tempest
wwellesl@tempest's password: 
Last login: Wed Feb 20 15:14:31 2013 from 149.130.206.217
[wwellesl@tempest ~] cd public_html/
[wwellesl@tempest public_html] ls -l mypage.html 
-rw-r--r--. 1 wwellesl wwellesl 0 Jan 26 12:33 mypage.html
[wwellesl@tempest public_html] logout
Connection to tempest closed.

Did you notice that we didn't get the scary message from ssh the second time? Did you also notice the different prompt, so that we know where we are? This is important; it's easy to get confused when you have different shells, all on the same screen, but logged into different machines. (It's not uncommon for me to be logged into 3 or 4 machines from my office machine or my laptop.)

Oh, and there's the logout command. I didn't teach you that; it's pretty easy to guess what it does. You should always log out of a machine when you're done, because connections do use up resources and a host can't support an infinite number of them.

drop

Now that you know about permission bits and such, you understand a bit more deeply what prevents you from copying one of your files into a directory that I own: the permission bits on that directory don't allow you ("others") to write to that directory.

But what if I wanted to allow you to write to one of my directories in a controlled way, say as a way of submitting an assignment? An analogy would be sliding your printout under my door: you can put something of yours into something I own, but it then becomes mine and you can't pull it back out again, though you might be able to look at it.

There is no standard, built-in, Unix command to do what I've described, but I have written one for us at Wellesley.

The drop command only works on Tempest, so you must first scp the file to Tempest, then ssh to Tempest and execute the drop command.

drop account file
Copy the given file to the "drop" subdirectory of the given account. Actually, copy it to a special sub-directory for all your submissions, named for your account.

Here's the "drop" command in action:

[wwellesl@tempest public_html] ls -l wendy.html
-rw-rw----. 1 wwellesl wwellesl 152 Jan 15  2010 wendy.html
[wwellesl@tempest public_html] drop cs304 wendy.html
Copying wendy.html (from wwellesl) to /home/cs304/drop/ (uid 1942)
/home/cs304/drop/wwellesl doesn't exist, making it.
Successful drop.
[wwellesl@tempest public_html] ls -l /home/cs304/drop/wwellesl/
total 4
-r--r-----. 1 cs304 wwellesl 152 Jan 25 18:50 wendy.html

Notice that the drop created the wwellesl subfolder of the /home/cs304/drop folder, just for us, since this is our first drop.

By the way, if we needed to drop a whole bunch of files, we could tar them up and drop the tarfile.

Command Summary

There are, of course, many other useful commands, but these should get you started. Here they all are, with links to man pages, thanks to tutorialspoint.com:

  1. man
  2. ls
  3. cd
  4. pwd
  5. cp
  6. mv
  7. rm
  8. mkdir
  9. rmdir
  10. tar
  11. gzip
  12. ssh
  13. scp