CS 204 Unix Skills

This course requires you to work on a Linux server (the CS department server, also known as tempest), using a command line, and your life will be much easier if you have some Unix skills. (Linux is one of many descendants of Unix, an operating system that predates DOS and Windows. Mac OS X is another descendant, so many of the commands and concepts below will work on a Mac as well.)

Motivation

You already have a lot of experience working with computers, but with a GUI. A GUI is a graphical user interface, meaning icons, menus, and even drag-and-drop — heaven help us. Although Mac and Windows are rather different, they both use GUIs that rely crucially on those techniques to get work done. GUIs are intuitive, user-friendly, and easy to work with. I'm asking you to give all that up in favor of a CLI (command line interface), so I'd better have a really good reason.

The main reason is that the easiest way to connect to the server is through a relatively narrow tunnel, through which it's quick and easy to send textual commands and get textual responses. (It's possible to set up an X11 tunnel and use a Linux GUI, but that would require you to become familiar with a Linux GUI, so I think it's better to learn some skills that are more universally applicable.)

Another reason, almost as important, is that these skills will make you more efficient both now and in the future, in ways that you can't yet foresee. That's because programming languages are (for the most part) textual, and there's only a very blurry line between a program to accomplish some task and a series of commands to accomplish something. In short, commands can be automated in a way that is hard or impossible for a GUI.

Consider a quick example. Using a GUI, changing the permissions on a file takes about 5 mouse clicks (not including all the navigating around to find the file). Changing the permissions on a file using a CLI takes about 20 keystrokes. If you have to change the permissions on 100 files, it'll take you 5×100=500 mouse clicks, but about 20 keystrokes will do the trick using a CLI, with judicious use of wildcards. (Plus, if you're a good typist, those 20 keystrokes can take less time than those 5 mouse clicks.)
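Here is a minimal sketch of that claim, which you can try safely because it works in a throwaway directory. It assumes the standard chmod command (not otherwise covered on this page) and made-up filenames page1.html through page100.html; the mode 644 means "owner can read and write, everyone else can only read."

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Create 100 empty files: page1.html ... page100.html (hypothetical names).
for i in $(seq 1 100); do
    touch "page$i.html"
done

# One short command changes the permissions on all 100 files at once.
chmod 644 *.html

# Spot-check one of them.
ls -l page42.html
```

The chmod line is 16 keystrokes, and it would be the same 16 keystrokes for 1,000 files.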

A software developer named Lawrence D'Oliveiro talks about this in his Tumblr post entitled "CLI versus GUI Deathmatch!" You could also read a post about the Linux philosophy. (I'll bet you didn't know an operating system could have a philosophy!) Neither is required, but you might find them interesting.

Directories

Like most filesystems, the Unix filesystem is organized as a tree of directories. To be able to refer to another directory or file, you have to understand the notation for pathnames. Here are some notations to know about:

/
The directory whose name is the slash character is the root of the tree. Every directory and file is a descendant of this directory. Assuming there are no permissions issues, you can uniquely specify a directory or file by starting at the root. For example, this file is:
/home/cs204/public_html/unix.html

The preceding is called an absolute pathname, since it works from anywhere.

You'll notice that the different directories in the tree of directories are separated from each other with slashes. So, a slash plays two roles: it separates parent from child and also stands for the root of the ancestry tree. A side-effect of this notation is that you cannot name a directory or file with a slash in the name. (On classic Mac OS, before OS X, the directory separator was a colon; on Windows, it's a backslash.)

.
The directory whose name is a single period (pronounced dot) is like the pronoun me: it stands for the directory you are in. Each process on a Unix system has a "current working directory" (CWD) and all relative pathnames implicitly start at the CWD. For example, the following are two equivalent ways to say "the file named foo.text in the current working directory":
./foo.text
foo.text

Relative pathnames are very useful and important because they allow code to be relocatable, meaning that a directory subtree can be copied to another location, possibly even on another machine, and all relative pathnames that stay within the subtree will still work!

..
The directory whose name is a double period (pronounced dot dot) is like the word mom: it stands for the unique parent directory of a directory. For example, the following says "the file named bar.text in the parent of the current working directory":
../bar.text

The dot-dot syntax can be very useful in relative pathnames, to refer to a file that is related via an ancestor (say an uncle, or a second cousin, once removed).
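To see dot and dot-dot in action, here is a small sketch that builds a made-up family tree of directories in a throwaway location and then refers to the same files in several ways. The directory names (family, mom, me, aunt) are just illustrations, and mkdir -p is the standard option that creates any missing parent directories along the way.

```shell
# Build a tiny tree in a throwaway directory:
#   family/mom/me/foo.text
#   family/mom/bar.text
#   family/aunt/cousin.text
scratch=$(mktemp -d)
mkdir -p "$scratch/family/mom/me" "$scratch/family/aunt"
touch "$scratch/family/mom/me/foo.text"
touch "$scratch/family/mom/bar.text"
touch "$scratch/family/aunt/cousin.text"

# Make "me" the current working directory.
cd "$scratch/family/mom/me"

ls ./foo.text                       # a file in the CWD
ls foo.text                         # the same file; the leading ./ is implied
ls ../bar.text                      # a file in the parent directory
ls ../../aunt/cousin.text           # up two levels, then down into "aunt"
ls "$scratch/family/mom/bar.text"   # the same bar.text, by absolute pathname
```

If you copied the whole family directory somewhere else, every relative pathname above would still work from the new "me" directory; only the absolute one would need to change.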

~/
The directory whose name is a tilde is another kind of pronoun: it means your login directory (also called your home directory), which the shell looks up in a kind of "database" called the password database. For example, each of you has a public_html directory in your home directory. Here is how you would address a file in that directory:
~/public_html/index.html

Remember that for each of you, that will be a different file!

Also, because it refers to the password database, the tilde is typically not available in programs, but it is a nice feature of the shell, so you will often use it in commands.

~user/
Tilde has a second usage, where it is immediately followed by the name of a user: it means the login directory of that user. For example, the following path would be the address of a file in the CS 204 course account on the CS server:
~cs204/public_html/home.html

Most of the pathname concepts above may be familiar to you from URLs, since the syntax of the pathname in a URL derives from Unix pathnames.

Many students find the tilde the most confusing pathname concept. One metaphor that works is to think of an account as someone's "house": the place where all their stuff is. My house is ~anderson. The CS 204 stuff is in ~cs204. Georgia Dome's stuff is in ~gdome. Your stuff (files and folders) is all in your account, say ~sk7 or whatever. So, when you say ~foo, it means the account or "house" of a user named foo.
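Because the tilde is a feature of the shell, a quick experiment with echo shows exactly what the shell turns it into. This is only a sketch: the public_html paths need not exist for echo to print them, and I use root as the example user simply because that account exists on essentially every Unix system.

```shell
# Unquoted, the shell replaces a leading tilde with your home directory.
echo ~
echo ~/public_html/index.html

# A tilde followed by a username means that user's home directory.
echo ~root

# Inside quotes there is no expansion: this prints a literal tilde.
echo "~/public_html/index.html"

# The HOME variable normally names the same directory, and it does
# work inside double quotes.
echo "$HOME/public_html/index.html"
```

The quoted case is the one that bites people: a program handed the literal string ~/public_html/index.html will usually not know what the tilde means.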

With these concepts in mind, hopefully the following sections will be clearer. I will be more terse in these sections, so if you find a command confusing, I encourage you to make use of one of the thousands of web tutorials on Unix and Linux. At the end of this document, I have links to the man pages for these commands.

Conventions and Prompts

In the following sections, I will give sample input and output from interactions with a Linux machine (actually, Tempest, the CS department server).

When you are logged into a Linux machine (directly via the console or across the network via ssh), you will be running a "shell" (a shell is just a program that allows you to run commands). When the shell is ready for your command, it will print a "prompt." That prompt is wildly customizable. On Tempest, the default prompt is like this:

[user@host cwd]

That is, the shell prints three pieces of information, enclosed in square brackets: the username you're logged in with, the name of the machine you're logged into (the host), and the name of your current working directory (cwd). In the examples below, I will usually be logged into the "Wendy Wellesley" test account, so the prompt will look like this:

[wwellesl@tempest ~] 

You will never type that part!

For brevity, I will often replace that with just a $ prompt.

Also note that all these examples have a typographic convention that the stuff you're supposed to type is in bold monospace and the responses and other output are in regular monospace. Any tutorials you read on Linux will probably have occurrences of a prompt (possibly very terse, such as a dollar sign or a percent sign), and may have conventions to help you distinguish what you type from the computer's response.

man

From the very beginning, Unix machines have had online "manuals" for use by everyone from novices to experts. Probably only one Unix person in a thousand remembers more than a handful of the options for the "ls" command. So, when you're logged in, don't hesitate to use the "man" command to learn more about a command you're unfamiliar with:

[wwellesl@tempest ~] man ls

To exit from man, type "q".

Of course, these online man pages are on the web as well; I give some links at the end of this page.

As I mentioned, the shell always puts you "in" a directory, your "current working directory" (CWD, also called "." or dot). Commands to know:

ls
lists the files and directories in the given directory. With no arguments, lists the contents of the CWD.
cd
changes the CWD to the given directory. With no arguments, changes to your home directory.
pwd
prints the absolute pathname of the CWD, in case you forget where you are.
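Here is a short sketch of these three commands working together. It sets up a throwaway directory with made-up contents (notes.text and a project1 folder) so that you can run it anywhere without touching your real files.

```shell
# Set up a throwaway directory with a little content in it.
scratch=$(mktemp -d)
mkdir "$scratch/project1"
touch "$scratch/notes.text" "$scratch/project1/index.html"

cd "$scratch"       # change the CWD to the scratch directory
pwd                 # print its absolute pathname
ls                  # list the CWD: notes.text and project1
ls project1         # list a given directory: index.html
cd project1         # move down into project1
pwd
cd ..               # move back up to the parent
cd                  # no argument: go to your home directory
pwd
```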

Moving and Copying

Now that you can move around, you'll want to be able to move and copy files. Commands to know:

cp
copies the first argument (a file) to the second argument (either a file or a directory). There are many other options; see the man page for more.
mv
moves the first argument (a file) to the second argument (either a file or a directory).
rm
removes (deletes) the file(s). Caution! This is not a reversible operation: there is no "un-rm" command.
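A small sketch of all three, again in a throwaway directory with made-up filenames. Notice that cp and mv behave differently depending on whether the second argument is an existing directory.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch draft.text
mkdir backup

cp draft.text copy.text       # copy to a new filename
cp draft.text backup          # copy into a directory, keeping the name
mv copy.text final.text       # mv to a new filename is how you rename
mv final.text backup          # mv into a directory relocates the file
rm draft.text                 # gone for good; there is no undo

ls                            # only "backup" remains here
ls backup                     # draft.text and final.text
```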

Tab completion

The Unix shell has many built-in conveniences for power users and poor typists. One you should know about is "tab completion." If you type part of a filename, enough to identify a unique file in the directory, and you hit the "tab" key (above caps lock on the left side of your keyboard), the shell will fill out the rest of the filename. If your prefix is not unique, the shell will fill out as much as it can, and allow you to make a choice of how to continue.

You don't have to do this, of course, but it beats typing the whole name, which is slow and error-prone.

Wildcards

If you want a command such as ls or rm to apply to several or many files, you can list all of them on the command line, but that can be tedious if there are many files. Wildcards are special characters that match any character, allowing you to specify a pattern for the filenames. (Like a wildcard in a card game.)

*
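The asterisk matches any sequence of characters, of any length, including none at all. For example, *.html matches every filename in the directory that ends in .html. (One exception: a leading asterisk does not match filenames that begin with a dot.)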

Just be super careful using both rm and the asterisk; it's really easy to delete all your files!
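One habit that helps, sketched below with made-up filenames: the shell expands the wildcard before the command ever runs, so you can preview exactly what rm would be given by trying the same pattern with ls or echo first.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch hw1.html hw2.html hw1.css notes.text

ls *.html       # matches hw1.html and hw2.html
ls hw1*         # matches hw1.css and hw1.html
ls hw*.html     # matches hw1.html and hw2.html

# Preview first: echo just prints the command line after expansion.
echo rm *.html  # prints: rm hw1.html hw2.html

# Happy with the preview? Now do it for real.
rm *.html
ls              # hw1.css and notes.text are left
```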

Making Files and Directories

To make a file, you would typically use a text editor, such as Emacs or vim. Emacs and vim are very different in usage, philosophy and user base. Emacs is slower to start up and bloated with many features; vim is quicker to start up and leaner. There are many other differences, but this is not the place to continue the decades-long cold war between the Emacs and vim factions.

You should know, however, that I'm firmly in the Emacs camp.

In CS 204, we'll be using Visual Studio Code, so you don't need to learn Emacs or vim. It's good to squirrel that knowledge away in the back of your mind, though, because in a different environment, you might not have Visual Studio Code, but if you're on a Unix system, you will always have vim and almost always have Emacs. (I can think of only one time in my life when Emacs wasn't already installed, and it only took a few minutes to install it.)

To manage files and directories, use these commands:

touch file
creates an empty file with the given filename. You may never use this command, but it's very useful in demonstrations and experiments.
mkdir dir
creates the named directory.
rmdir dir
removes (deletes) the directory, but only if it's empty.
rm -r dir
recursively removes (deletes) the directory tree. Caution! This command is even more dangerous than "rm" itself. Not for the faint of heart.
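Here is a sketch that exercises all four in a throwaway directory, including the case where rmdir refuses to remove a directory that still has something in it. The names project1, images and index.html are just for illustration.

```shell
scratch=$(mktemp -d)
cd "$scratch"

mkdir project1                 # make a directory
touch project1/index.html      # make an empty file inside it
mkdir project1/images          # directories can nest

rmdir project1/images          # works: images is empty
rmdir project1 || echo "rmdir refused: project1 is not empty"
rm -r project1                 # removes project1 and everything in it
ls                             # nothing left
```

The refusal from rmdir is a feature: it's the safe way to delete a directory you believe is empty.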

tar and gzip

If someone wants to give you a bunch of files and directories, they could attach each of them to a mail message to you, or put them all on a web server where you could download them, but what if there were hundreds or thousands of files and directories? Handling them all one-at-a-time would be tedious at best.

One solution, used for decades by Unix people, is a "tarfile," which is a single file that contains a directory tree. If you download a significant collection of software from a website, you will almost certainly be offered the option of downloading a tarfile, among other options.

As an analogy, when you travel, you typically put your clothes and other things in a suitcase, so you have one thing to carry instead of dozens. The tar command is like that for files: it puts a bunch of files into a single file, for easy carrying. (Technically, it puts a copy of each file into the suitcase. If the file outside the suitcase is changed, the copy inside the suitcase is not.) You can then attach that single suitcase (the tarfile) to an email message, upload it to a website, or whatever you need to do.

Even better, tar provides an easy way to copy a directory tree into the tarfile, retaining all the structural information, so that when the tarfile is unpacked, the receiver not only gets the files, but the folders and the files are put into the correct folders, so that a copy of the original directory tree is re-created at the destination.

(Historical aside: the name "tar" comes from "tape archive," from back in the olden days when tar was actually used to write a directory tree to a magnetic tape. Later, some bright spark added a command line argument allowing tar to write the archive to a file on disk, the "tarfile" or "tarball," instead of to a tape device.)

The resulting tarfile is often quite large, depending on what kinds of files go into it. So, we often want to compress the tarfile to make it smaller.

You're all familiar with compression, such as MP3 compression of a sound file. A compression program used for decades by Unix people is called "gzip." For text files, gzip does an excellent job, often reducing the size of a file by a factor of three or so. So, in fact, when downloading software, the option is typically not an uncompressed tarfile but a gzipped tarfile.

Commands to know:

tar cf tarfile directory
Create a tarfile from the given directory.
tar tf tarfile
List the contents of the given tarfile (the "t" is for "table of contents").
tar xf tarfile
Extract the contents of the given tarfile, re-creating the archived directory tree as a subdirectory of the current directory.
gzip file
Compress the given file, replacing it with a .gz version.
gunzip file
Un-compress the given .gz file, replacing it with the original.

Adding a "z" to the first argument of "tar" makes it work with gzip compression.

Note, an alternative to the tar/gzip combination is zip and unzip.
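The following sketch runs through the whole round trip in a throwaway directory: pack, list, compress, uncompress, and finally unpack somewhere else. The names project1, p1.tar and p1.tgz are made up; .tgz and .tar.gz are both common suffixes for a gzipped tarfile.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p project1/images
echo "<p>hello</p>" > project1/index.html
touch project1/images/logo.png

tar cf p1.tar project1      # pack the tree into p1.tar
tar tf p1.tar               # list what's inside
gzip p1.tar                 # compress: p1.tar is replaced by p1.tar.gz
gunzip p1.tar.gz            # uncompress: back to p1.tar

# With "z" added, tar does the gzip step itself.
tar czf p1.tgz project1     # create a gzipped tarfile in one step
tar tzf p1.tgz              # list it

# Unpack in a different directory to re-create the tree there.
mkdir unpacked
cd unpacked
tar xzf ../p1.tgz
ls project1                 # images and index.html
```

Listing with "t" before extracting with "x" is a good habit, since it shows you what directory the tarfile will create before anything lands on disk.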

ssh and scp

Often, the computer we are physically touching, using its keyboard and mouse, and looking at its screen, is not the one we want to be working with. For example, you log in to your own laptop or to Mac #7 in SCI L180, but you really want to be logging in to Tempest and modifying your files there. The following commands enable this remote work across the network:

ssh user@host
Remotely log in to the given host computer as the given user account. ssh will prompt you for the password for the account and relay it to the host. If the password is accepted, ssh will start a remote shell for you.
scp local-file user@host:path/to/remote/file
This command copies a local file to a remote file. (Notice the user@host on the destination.) This command is a lot like cp except that you can precede the filenames with user@host: to have them copied across the network to the destination host. You can use this command to copy a file from your local machine, say your laptop, to Tempest, or from your C9 workspace to Tempest.
scp user@host:path/to/remote/file path/to/local/file
scp can also go the other way, copying a file from the remote host to your local machine.

As an example, I logged into a Mac (station #12 in SCI 257 where I logged in as "sanderso") to do the following. Notice the different prompt on the Mac versus Tempest.

 
sci-257-12:~ sanderso$ cd Desktop/ 
sci-257-12:Desktop sanderso$ ls -l mypage.html  
-rw-r--r--  1 sanderso  WELLESLEY\Domain Users  0 Jan 26 12:29 mypage.html 
sci-257-12:Desktop sanderso$ scp mypage.html wwellesl@tempest:public_html/ 
The authenticity of host 'tempest (149.130.15.5)' can't be established. 
RSA key fingerprint is ae:53:ce:76:03:10:a9:23:ee:89:14:5a:23:3f:fb:32. 
Are you sure you want to continue connecting (yes/no)? yes 
Warning: Permanently added 'tempest,149.130.15.5' (RSA) to the list of known hosts. 
wwellesl@tempest's password:  
mypage.html                                    100%    0     0.0KB/s   00:00     

Let's take a moment to look at that scary message from scp. The ssh and scp programs are secure: they protect against eavesdropping by encrypting all traffic to and fro, and they protect against machine "spoofing" by checking the identity of the remote host. If you've never previously connected to that remote host from this local host, scp can't check the identity, so it asks whether you are sure. On-campus, you can comfortably always say "yes," since LTS has good control of the hostnames. Across the wilds of the internet, spoofing can arise, so you have to be more thoughtful. We don't have time to get into that here, though, so let's continue with our example.

 
sci-257-12:Desktop sanderso$ ssh wwellesl@tempest 
wwellesl@tempest's password:  
Last login: Wed Feb 20 15:14:31 2013 from 149.130.206.217 
[wwellesl@tempest ~] cd public_html/ 
[wwellesl@tempest public_html] ls -l mypage.html  
-rw-r--r--. 1 wwellesl wwellesl 0 Jan 26 12:33 mypage.html 
[wwellesl@tempest public_html] logout 
Connection to tempest closed. 

Did you notice that we didn't get the scary message from ssh the second time? Did you also notice the different prompt, so that we know where we are? This is important; it's easy to get confused when you have different shells, all on the same screen, but logged into different machines. (It's not uncommon for me to be logged into 3 or 4 machines from my office machine or my laptop.)

Oh, and there's the logout command. I didn't teach you that; it's pretty easy to guess what it does. You should always log out of a machine when you're done, because connections do use up resources and a host can't support an infinite number of them.

drop

In Unix, files and folders have ownership and permissions; for the most part, the permissions will prevent you from copying a file you own into a folder that I own. (And vice versa.)

But what if I wanted to allow you to write to one of my directories in a controlled way, say as a way of submitting an assignment? An analogy would be sliding your printout under my door: you can put something of yours into something I own, but it then becomes mine and you can't pull it back out again, though you might be able to look at it.

There is no standard, built-in, Unix command to do what I've described, but I have written one for us at Wellesley.

The drop command only works on the CS server, so you need to make sure the file is there, first.

drop account file
Copy the given file to the "drop" subdirectory of the given account. More precisely, copy it to a special subdirectory within that drop directory, named for your account, which holds all your submissions.

Here's the "drop" command in action:

[wwellesl@tempest public_html] ls -l wendy.html 
-rw-rw----. 1 wwellesl wwellesl 152 Jan 15  2010 wendy.html 
[wwellesl@tempest public_html] drop cs204 wendy.html 
Copying wendy.html (from wwellesl) to /home/cs204/drop/ (uid 7003) 
/home/cs204/drop/wwellesl doesn't exist, making it. 
Successful drop. 
[wwellesl@tempest public_html] ls -l /home/cs204/drop/wwellesl/ 
total 4 
-r--r-----. 1 cs204 wwellesl 152 Jan 25 18:50 wendy.html 

Notice that the drop created the wwellesl subfolder of the /home/cs204/drop folder, just for us, since this is our first drop.

By the way, if we needed to drop a whole bunch of files, we could tar them up and drop the tarfile.

Important Note: The drop command transfers ownership of the file to the account that you dropped it to. So, above, it transfers ownership of the file to the CS 204 course account. You can't delete the file anymore, because it no longer belongs to you. If you need to revise, drop another file, naming it something perspicuous like wendy-revised.html. Think of this as like emailing a paper to a professor: you can't delete a prior email once you have sent it; you can only send a new email with the revised paper.

Command Summary

There are, of course, many other useful commands, but these should get you started. Here they all are, with links to man pages, thanks to tutorialspoint.com:

  1. man Example: man ls to learn about the ls command
  2. ls Example: ls folder to list the contents of that folder
  3. cd Example: cd folder to change to that folder
  4. pwd Example: pwd to find out what folder you are in
  5. cp Example: cp file destination to copy the file to the destination (filename or folder)
  6. mv Example: mv file destination to move the file to the destination (filename or folder)
  7. rm Example: rm file to delete (remove) the file
  8. mkdir Example: mkdir project1 to make the project1 directory
  9. rmdir Example: rmdir project1 to delete the project1 directory as long as it is empty
  10. tar Example: tar cf p1.tar project1 to create a tarfile of the contents of the project1 folder
  11. gzip Example: gzip p1.tar to compress the p1.tar file, replacing it with p1.tar.gz
  12. ssh Example: ssh user@cs.wellesley.edu to log in to remote host cs.wellesley.edu using your user account
  13. scp Example: scp laptop-file.jpg user@cs:server-file.jpg to copy the file laptop-file.jpg from your laptop to your account on the CS server, naming the result server-file.jpg