CS 204 Unix Skills¶
This course requires you to work on a Linux server (the CS department server, also known as tempest), using a command line, and your life will be much easier if you have some Unix skills. (Linux is one of many descendants of Unix, an operating system that predates DOS and Windows. Mac OS X is another descendant, so many of the commands and concepts below will work on a Mac as well.)
Motivation¶
You already have a lot of experience working with computers, but with a GUI. A GUI is a graphical user interface, meaning icons, menus, and even drag-and-drop — heaven help us. Although Mac and Windows are rather different, they both use GUIs that rely crucially on those techniques to get work done. GUIs are intuitive, user-friendly, and easy to work with. I'm asking you to give all that up in favor of a CLI (command line interface), so I'd better have a really good reason.
The main reason is that the easiest way to connect to the server is through a relatively narrow tunnel, through which it's quick and easy to send textual commands and get textual responses. (It's possible to set up an X11 tunnel and use a Linux GUI, but that would require you to become familiar with a Linux GUI, so I think it's better to learn some skills that are more universally applicable.)
Another reason, almost as important, is that these skills will make you more efficient both now and in the future, in ways that you can't yet foresee. That's because programming languages are (for the most part) textual, and there's only a very blurry line between a program to accomplish some task and a series of commands to accomplish something. In short, commands can be automated in a way that is hard or impossible for a GUI.
Consider a quick example. Using a GUI, changing the permissions on a file takes about 5 mouse clicks (not including all the navigating around to find the file). Changing the permissions on a file using a CLI takes about 20 keystrokes. If you have to change the permissions on 100 files, it'll take you 5×100=500 mouse clicks, but about 20 keystrokes will do the trick using a CLI, with judicious use of wildcards. (Plus, if you're a good typist, those 20 keystrokes can take less time than those 5 mouse clicks.)
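Here is a small sketch of that 100-file scenario. It is only an illustration under some assumptions: the filenames (page1.html and so on) and the permission setting 644 are made up for the demo, and everything happens in a throwaway directory created by mktemp so no real files are touched.

```shell
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"

# Create 100 empty files: page1.html ... page100.html (hypothetical names).
i=1
while [ "$i" -le 100 ]; do
    touch "page$i.html"
    i=$((i + 1))
done

# One short command changes the permissions on all 100 files:
# the wildcard *.html expands to every matching filename.
chmod 644 *.html

# Count how many files now show the rw-r--r-- permission string.
changed=$(ls -l | grep -c '^-rw-r--r--')
echo "$changed files changed"
```

The chmod line is the whole job; the loop before it only exists to manufacture some files to practice on.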
A software developer named Lawrence D'Oliveiro talks about this in his Tumblr post entitled CLI versus GUI Deathmatch!. You could also read a post about the Linux philosophy. (I'll bet you didn't know an operating system could have a philosophy!) Neither is required reading, but you might find them interesting.
Directories¶
Like most filesystems, the Unix filesystem is organized as a tree of directories. To be able to refer to another directory or file, you have to understand the notation of the filesystem. Here are some notations to know about:
- /
- The directory whose name is the slash character is the root of
the tree. Every directory and file is a descendant of this
directory. Assuming there are no permissions issues, you can
uniquely specify a directory or file by starting at the root. For
example, this file is:
/home/cs204/public_html/unix.html
The preceding is called an absolute pathname, since it works from anywhere.
You'll notice that the different directories in the tree of directories are separated from each other with slashes. So, a slash plays two roles: it separates parent from child and also stands for the root of the ancestry tree. A side-effect of this notation is that you cannot name a directory or file with a slash in the name. (On classic Mac OS, before Mac OS X, the directory separator was a colon; on Windows, it's a backslash.)
- .
- The directory whose name is a single period (pronounced "dot") is like the pronoun "me": it stands for the directory you are in. Each process on a Unix system has a "current working directory" (CWD) and all relative pathnames implicitly start at the CWD. For example, the following are two equivalent ways to say "the file named foo.text in the current working directory":
./foo.text
foo.text
Relative pathnames are very useful and important because they allow code to be relocatable, meaning that a directory subtree can be copied to another location, possibly even on another machine, and all relative pathnames that stay within the subtree will still work!
- ..
- The directory whose name is a double period (pronounced "dot dot") is like the word "mom": it stands for the unique parent directory of a directory. For example, the following says "the file named bar.text in the parent of the current working directory":
../bar.text
The dot-dot syntax can be very useful in relative pathnames, to refer to a file that is related via an ancestor (say an uncle, or a second cousin, once removed).
- ~/
- The directory whose name is a tilde is another kind of pronoun: it means your login directory, referring to a kind of "database" called the password database. For example, each of you has a public_html directory in your home directory. Here is how you would address a file in that directory:
~/public_html/index.html
Remember that for each of you, that will be a different file!
Also, because it refers to the password database, the tilde is typically not available in programs, but it is a nice feature of the shell, so you will often use it in commands.
- ~user/
- Tilde has a second usage, where it is immediately followed by the name of a user: it means the login directory of that user. For example, the following path would be the address of a file in the CS 204 course account on the CS server:
~cs204/public_html/home.html
Most of the pathname concepts above may be familiar to you from URLs, since the syntax of the pathname in a URL derives from Unix pathnames.
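To see dot, dot-dot, and absolute pathnames side by side, here is a minimal sketch you could paste into a shell. The directory and file names (project, notes, foo.text, bar.text) are invented for the demo, and the tree lives in a temporary directory made by mktemp.

```shell
# Build a small hypothetical tree in a throwaway directory.
top=$(mktemp -d)
mkdir -p "$top/project/notes"
echo "hello" > "$top/project/foo.text"
echo "world" > "$top/bar.text"

cd "$top/project"

# Two equivalent relative pathnames for a file in the CWD.
cat ./foo.text
cat foo.text

# Dot-dot refers to the parent of the CWD.
cat ../bar.text

# An absolute pathname starts at the root and works from anywhere.
cd notes
cat "$top/project/foo.text"

# From notes, the "grandparent" is two dot-dots away.
cat ../../bar.text
```

Notice that after the second cd, the relative name foo.text would no longer find the file, but the absolute pathname still does.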
Many students find the tilde the most confusing pathname concept. One metaphor that works is to think of an account as someone's "house": the place where all their stuff is. My house is ~anderson. The CS 204 stuff is in ~cs204. Georgia Dome's stuff is in ~gdome. Your stuff (files and folders) is all in your account, say ~sk7 or whatever. So, when you say ~foo, it means the account or "house" of a user named foo.
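Because the tilde is a shell feature, it can help to watch the shell do the substitution. The following is just a sketch: it uses echo to show what a command actually receives, and the public_html/index.html path is the example from above (the file does not need to exist for this to work).

```shell
# Unquoted, the shell replaces the tilde before the command runs,
# so echo receives an ordinary absolute pathname.
expanded=$(echo ~/public_html/index.html)
echo "$expanded"

# Quoted, the tilde is passed through untouched: the program
# sees a literal "~" character, which is not a real directory name.
literal=$(echo '~/public_html/index.html')
echo "$literal"
```

The second line of output still starts with a tilde, which is what a program would see if the shell had not stepped in.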
With these concepts in mind, hopefully the following sections will be clearer. I will be more terse in these sections, so if you find a command confusing, I encourage you to make use of one of the thousands of web tutorials on Unix and Linux. At the end of this document, I have links to the man pages for these commands.
Conventions and Prompts¶
In the following sections, I will give sample input and output from interactions with a Linux machine (actually, Tempest, the CS department server).
When you are logged into a Linux machine (directly via the console or across the network via ssh), you will be running a "shell" (a shell is just a program that allows you to run commands). When the shell is ready for your command, it will print a "prompt." That prompt is wildly customizable. On Tempest, the default prompt is like this:
[user@host cwd]
That is, the shell prints three pieces of information, enclosed in square brackets: the username you're logged in with, the name of the machine you're logged into (the host), and the name of your current working directory (cwd). In the examples below, I will usually be logged into the "Wendy Wellesley" test account, so the prompt will look like this:
[wwellesl@tempest ~]
You will never type that part!
For brevity, I will often replace that with just a $ prompt.
Also note that all these examples have a typographic convention that the stuff you're supposed to type is in bold monospace and the responses and other output are in regular monospace. Any tutorials you read on Linux will probably have occurrences of a prompt (possibly very terse, such as a dollar sign or a percent sign), and may have conventions to help you distinguish what you type from the computer's response.
man¶
From the very beginning, Unix machines have had online "manuals" for use by everyone from novices to experts. Probably only one (unix) person in a thousand remembers more than a handful of the options for the "ls" command. So, when you're logged in, don't hesitate to use the "man" command to learn more about a command you're unfamiliar with:
[wwellesl@tempest ~] man ls
To exit from man, type "q".
Of course, these online man pages are on the web as well; I give some links at the end of this page.
Navigating¶
As I mentioned, the shell always puts you "in" a directory, your "current working directory" (CWD, also called "." or dot). Commands to know:
- ls
- lists the files and directories in the given directory. With no arguments, lists the contents of the CWD.
- cd
- changes the CWD to the given directory. With no arguments, changes to your home directory.
- pwd
- prints the absolute pathname of the CWD, in case you forget where you are.
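A short practice session with these three commands might look like the sketch below. The cs204 folder and the hw1.html and hw2.html files are made-up names, created inside a temporary directory just so there is something to look at.

```shell
# Set up a throwaway directory with a little content (names are made up).
top=$(mktemp -d)
mkdir "$top/cs204"
touch "$top/cs204/hw1.html" "$top/cs204/hw2.html"

cd "$top"        # change the CWD to the given directory
ls               # list the contents of the CWD
ls cs204         # list the contents of the given directory
cd cs204         # a relative pathname works too
pwd              # print the absolute pathname of the CWD
cd ..            # go back up to the parent
pwd
```

The two pwd commands print different pathnames, which is a handy way to confirm that cd did what you expected.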
Moving and Copying¶
Now that you can move around, you'll want to be able to move and copy files. Commands to know:
- cp
- copies the first argument (a file) to the second argument (either a file or a directory). There are many other options; see the man page for more.
- mv
- moves the first argument (a file) to the second argument (either a file or a directory).
- rm
- removes (deletes) the file(s). Caution! This is not a reversible operation: there is no "un-rm" command.
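Here is a minimal sketch of all three commands working together. The filenames (notes.txt, final.txt) and the backup folder are hypothetical, and the whole thing runs in a temporary directory so that the rm at the end can't hurt anything.

```shell
# Work in a throwaway directory with one small file and one folder.
work=$(mktemp -d)
cd "$work"
echo "draft" > notes.txt
mkdir backup

cp notes.txt notes-copy.txt   # copy to a new filename
cp notes.txt backup           # copy into a directory, keeping the name
mv notes-copy.txt final.txt   # mv to a filename renames the file
mv final.txt backup           # mv to a directory moves it there
rm notes.txt                  # gone for good: there is no un-rm
ls backup                     # the two copies are safe in backup
```

Whether the second argument is a filename or a directory is what decides between "rename" and "put it in there."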
Tab completion¶
The Unix shell has many built-in conveniences for power users and poor typists. One you should know about is "tab completion." If you type part of a filename, enough to identify a unique file in the directory, and you hit the "tab" key (above caps lock on the left side of your keyboard), the shell will fill out the rest of the filename. If your prefix is not unique, the shell will fill out as much as it can, and allow you to make a choice of how to continue.
You don't have to do this, of course, but it beats typing the whole name, which is slow and error-prone.
Wildcards¶
If you want a command such as ls or rm to apply to several or many files, you can list all of them on the command line, but that can be tedious if there are many files. Wildcards are special characters that match any character, allowing you to specify a pattern for the filenames. (Like a wildcard in a card game.)
- *
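- The asterisk character matches any sequence of characters, of any length (including none at all). For example, *.html matches every filename in the directory that ends in .html.
Just be super careful using both rm and the asterisk; it's really easy to delete all your files!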
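One cautious habit, sketched below, is to preview a pattern with echo (or ls) before handing it to rm. The filenames here are invented for the demo and live in a temporary directory.

```shell
# Work in a throwaway directory with a few made-up files.
work=$(mktemp -d)
cd "$work"
touch a.html b.html c.html notes.txt

# The shell expands the pattern before the command runs, so echo
# shows exactly which names a command would receive.
echo *.html
echo *

# Having previewed the match, delete only the HTML files.
rm *.html
ls
```

If the echo shows more names than you meant to delete, you have lost nothing but a few keystrokes.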
Making Files and Directories¶
To make a file, you would typically use a text editor, such as Emacs or vim. Emacs and vim are very different in usage, philosophy and user base. Emacs is slower to start up, but bloated with many features. vim is quicker to start up but is leaner. There are many other differences, but this is not the place to continue the decades-long cold war between the Emacs and vim factions.
You should know, however, that I'm firmly in the Emacs camp.
In CS 204, we'll be using Visual Studio Code, so you don't need to learn Emacs or vim. It's good to squirrel that knowledge in the back of your mind, though, because in a different environment, you might not have Visual Studio Code, but if you're on a Unix system, you will always have vim and almost always have Emacs. (I can think of only one time in my life when Emacs wasn't already installed, and it only took a few minutes to install it.)
To manage files and directories, use these commands:
- touch file
- creates an empty file with the given filename. You may never use this command, but it's very useful in demonstrations and experiments.
- mkdir dir
- creates the named directory.
- rmdir dir
- removes (deletes) the directory, but only if it's empty.
- rm -r dir
- recursively removes (deletes) the directory tree. Caution! This command is even more dangerous than "rm" itself. Not for the faint of heart.
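The commands above can be tried out safely with a sketch like this one. The project1 and project2 names are hypothetical, and the -p flag to mkdir (not covered above) just creates nested directories in one step.

```shell
# Work in a throwaway directory.
work=$(mktemp -d)
cd "$work"

mkdir project1                 # create the named directory
touch project1/index.html      # create an empty file inside it

# rmdir refuses because the directory is not empty.
rmdir project1 2>/dev/null || echo "rmdir refused: project1 is not empty"

rm project1/index.html         # empty it out...
rmdir project1                 # ...and now rmdir succeeds

# rm -r removes a whole tree in one step, so double-check the name first.
mkdir -p project2/css
touch project2/css/style.css
rm -r project2
ls
```

The final ls prints nothing, because both directory trees are gone.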
tar and gzip¶
If someone wants to give you a bunch of files and directories, they could attach each of them to a mail message to you, or put them all on a web server where you could download them, but what if there were hundreds or thousands of files and directories? Handling them all one-at-a-time would be tedious at best.
For example, when you travel, you typically put your clothes and other things in a suitcase, so you have one thing to carry instead of dozens. The tar command is like that for files: it puts a bunch of files into a single file, for easy carrying. (Technically, it puts a copy of each file into the suitcase. If the file outside the suitcase is changed, the copy inside the suitcase is not.) You can then attach that single suitcase (called a "tarfile") to an email message, upload it to a website, or whatever you need to do.
One solution, used for decades by Unix people, is a "tarfile," which is a single file that contains a directory tree. If you download a significant collection of software from a website, you will almost certainly be offered the option of downloading a tarfile, among other options.
Even better, tar provides an easy way to copy a directory tree into the tarfile, retaining all the structural information, so that when the tarfile is unpacked, the receiver gets not only the files but also the folders, and the files are put into the correct folders, so that a copy of the original directory tree is re-created at the destination.
(Historical aside: the name "tar" comes from "tape archive" from back in the olden days when "tar" was actually used to write a directory tree to a magnetic tape. However, some bright spark added a command line argument allowing "tar" to write the archive to a file on disk, the "tarfile" or "tarball," instead of to a tape device.)
The resulting tarfile is often quite large, depending on what kinds of files go into it. So, we often want to compress the tarfile to make it smaller.
You're all familiar with compression, such as MP3 compression of a sound file. A compression program used for decades by Unix people is called "gzip." For text files, gzip does an excellent job, often reducing the size of a file by a factor of three or so. So, in fact, when downloading software, the option is typically not an uncompressed tarfile but a gzipped tarfile.
Commands to know:
- tar cf tarfile directory
- Create a tarfile from the given directory
- tar tf tarfile
- List the contents of the given tarfile (the "t" is for "table of contents").
- tar xf tarfile
- Extract the contents of the given tarfile, creating a subdirectory of the current directory.
- gzip file
- Compress the given file, replacing it with a .gz version.
- gunzip file
- Un-compress the given file
Adding a "z" to the first argument of "tar" makes it work with gzip compression.
Note, an alternative to the tar/gzip combination is zip and unzip.
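Putting the tar and gzip commands together, a round trip might look like the following sketch. The project1 tree and its two files are made up for the demo, and the unpacked folder is just a hypothetical place to extract into so the copy can be compared with the original.

```shell
# Build a small made-up directory tree in a throwaway directory.
work=$(mktemp -d)
cd "$work"
mkdir -p project1/css
echo "<h1>hi</h1>" > project1/index.html
echo "h1 { color: red; }" > project1/css/style.css

tar cf p1.tar project1        # create a tarfile from the directory
tar tf p1.tar                 # list its table of contents
gzip p1.tar                   # replaces p1.tar with p1.tar.gz

# Unpack somewhere else; the "z" handles the gzip compression.
mkdir unpacked
cd unpacked
tar xzf ../p1.tar.gz
ls -R project1
```

The unpacked copy has the same folder structure as the original, css subfolder and all.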
ssh and scp¶
Often, the computer we are physically touching, using its keyboard and mouse, and looking at its screen, is not the one we want to be working with. For example, you log in to your own laptop or to Mac #7 in SCI L180, but you really want to be logging into Tempest and modifying your files there. The following commands enable this remote work across the network:
- ssh user@host
- Remotely log in to the given host computer as the given user account. ssh will prompt you for the password for the account and relay it to the host. If the password is accepted, ssh will start a remote shell for you.
- scp local-file user@host:path/to/remote/file
- This command copies a local file to a remote file. (Notice the user@host on the destination.) This command is a lot like cp except that you can precede the filenames with user@host: to have them copied across the network to the destination host. You can use this command to copy a file from your local machine, say your laptop, to Tempest, or from your C9 workspace to Tempest.
- scp user@host:path/to/remote/file path/to/local/file
- scp can also go the other way, copying a file from the remote host to your local machine.
As an example, I logged into a Mac (station #12 in SCI 257 where I logged in as "sanderso") to do the following. Notice the different prompt on the Mac versus Tempest.
sci-257-12:~ sanderso$ cd Desktop/
sci-257-12:Desktop sanderso$ ls -l mypage.html
-rw-r--r-- 1 sanderso WELLESLEY\Domain Users 0 Jan 26 12:29 mypage.html
sci-257-12:Desktop sanderso$ scp mypage.html wwellesl@tempest:public_html/
The authenticity of host 'tempest (149.130.15.5)' can't be established.
RSA key fingerprint is ae:53:ce:76:03:10:a9:23:ee:89:14:5a:23:3f:fb:32.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'tempest,149.130.15.5' (RSA) to the list of known hosts.
wwellesl@tempest's password:
mypage.html 100% 0 0.0KB/s 00:00
Let's take a moment to look at that scary message from scp. The ssh and scp programs are secure: they protect against eavesdropping by encrypting all traffic to and fro, and they protect against machine "spoofing" by checking the identity of the remote host. If you've never previously connected to that remote host from this local host, scp can't check the identity, so it asks whether you are sure. On campus, you can comfortably always say "yes," since LTS has good control of the hostnames. Across the wilds of the internet, spoofing can arise, so you have to be more thoughtful. We don't have time to get into that here, though, so let's continue with our example.
sci-257-12:Desktop sanderso$ ssh wwellesl@tempest
wwellesl@tempest's password:
Last login: Wed Feb 20 15:14:31 2013 from 149.130.206.217
[wwellesl@tempest ~] cd public_html/
[wwellesl@tempest public_html] ls -l mypage.html
-rw-r--r--. 1 wwellesl wwellesl 0 Jan 26 12:33 mypage.html
[wwellesl@tempest public_html] logout
Connection to tempest closed.
Did you notice that we didn't get the scary message from ssh the second time? Did you also notice the different prompt, so that we know where we are? This is important; it's easy to get confused when you have different shells, all on the same screen, but logged into different machines. (It's not uncommon for me to be logged into 3 or 4 machines from my office machine or my laptop.)
Oh, and there's the logout command. I didn't teach you that; it's pretty easy to guess what it does. You should always log out of a machine when you're done, because connections do use up resources and a host can't support an infinite number of them.
drop¶
In Unix, files and folders have ownership and permissions; for the most part, the permissions will prevent you from copying a file you own into a folder that I own. (And vice versa.)
But what if I wanted to allow you to write to one of my directories in a controlled way, say as a way of submitting an assignment? An analogy would be sliding your printout under my door: you can put something of yours into something I own, but it then becomes mine and you can't pull it back out again, though you might be able to look at it.
There is no standard, built-in, Unix command to do what I've described, but I have written one for us at Wellesley.
The drop command only works on the CS server, so you need to make sure the file is there, first.
- drop account file
- Copy the given file to the "drop" subdirectory of the given account. Actually, copy it to a special sub-directory for all your submissions, named for your account.
Here's the "drop" command in action:
[wwellesl@tempest public_html] ls -l wendy.html
-rw-rw----. 1 wwellesl wwellesl 152 Jan 15 2010 wendy.html
[wwellesl@tempest public_html] drop cs204 wendy.html
Copying wendy.html (from wwellesl) to /home/cs204/drop/ (uid 7003)
/home/cs204/drop/wwellesl doesn't exist, making it.
Successful drop.
[wwellesl@tempest public_html] ls -l /home/cs204/drop/wwellesl/
total 4
-r--r-----. 1 cs204 wwellesl 152 Jan 25 18:50 wendy.html
Notice that drop created the wwellesl subfolder of the /home/cs204/drop folder, just for us, since this is our first drop.
By the way, if we needed to drop a whole bunch of files, we could tar them up and drop the tarfile.
Important Note: The drop command transfers ownership of the file to the account that you dropped it to. So, above, it transfers ownership of the file to the CS 204 course account. You can't delete the file anymore, because it no longer belongs to you. If you need to revise, drop another file, naming it something perspicuous like wendy-revised.html. Think of this as like emailing a paper to a professor: you can't delete a prior email once you have sent it; you can only send a new email with the revised paper.
Command Summary¶
There are, of course, many other useful commands, but these should get you started. Here they all are, with links to man pages, thanks to tutorialspoint.com:
- man Example: "man ls" to learn about the "ls" command
- ls Example: "ls folder" to list the contents of that folder
- cd Example: "cd folder" to change to that folder
- pwd Example: "pwd" to find out what folder you are in
- cp Example: "cp file destination" to copy the file to the destination (filename or folder)
- mv Example: "mv file destination" to move the file to the destination (filename or folder)
- rm Example: "rm file" to delete (remove) the file
- mkdir Example: "mkdir project1" to make the project1 directory
- rmdir Example: "rmdir project1" to delete the project1 directory as long as it is empty
- tar Example: "tar cf p1.tar project1" to create a tarfile of the contents of the project1 folder
- gzip Example: "gzip p1.tar" to compress the p1.tar file, replacing it with p1.tar.gz
- ssh Example: "ssh user@cs.wellesley.edu" to log in to remote host cs.wellesley.edu using your user account
- scp Example: "scp laptop-file.jpg user@cs:server-file.jpg" to copy the file laptop-file.jpg from your laptop to your account on the CS server, naming the result server-file.jpg