Python Virtual Environments

When working on a Python project, you may have to install some Python packages. You can do that with a command called pip (the Package Installer for Python). See the official pip documentation for details.

Pip is a command that looks online at the Python Package Index (PyPI), finds the package you want to install, and then installs it on your local disk. But where? By default, it installs packages in a "system" directory that belongs to your Python installation.

On the CS server, and in general whenever you are working on a shared system, you might not have administrator privileges to install Python packages in system directories.

Furthermore, you might need a different set of packages for different projects. Even worse, the set of packages in one project might be incompatible with the set in another. For example, both projects might need the orange package (used for data mining), but one might need version 2.7 and the other version 2.6. What to do?

Virtualenv to the rescue!

Virtualenv is a really smart tool in the Python ecosystem that allows:

  • Non-root users to install Python packages/modules
  • Different projects to use different Python packages/modules
  • Each project to keep track of exactly which modules it uses, so that the same set is easy to recreate on another computer.

That is, each virtualenv is a place where we can put the Python modules and packages that belong to a particular project. Different projects can have different sets of modules, and each project is completely independent.

Virtualenv Concepts

To understand virtualenv, it helps to know a couple of concepts first:

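  • When Python imports a module, it searches for it in a list of directories called the module search path, which Python stores in sys.path.
  • That list is built when python starts, based partly on where the python program itself is installed. The PYTHONPATH environment variable, if set, adds extra directories near the front of the list, but virtualenv doesn't need it.
  • Environment variables such as PATH (the list of directories the shell searches for commands) are modifiable, often by source-ing some shell commands. Activating a virtualenv puts the virtualenv's bin directory at the front of PATH, so typing python or pip runs the copies inside the virtualenv.
  • The pip program downloads Python modules from various internet sites and installs them into the site-packages directory of the Python it belongs to, from which that Python will import them. A virtualenv's python and pip share the virtualenv's own site-packages directory. Which brings us full-circle.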
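To see these pieces for yourself, here is a minimal sketch that asks the running Python which program it is, which installation (or virtualenv) it belongs to, whether PYTHONPATH is set, and which directories it will search on import. The function name describe_search_path is made up for this example. Try running it once with the system python and once inside an activated virtualenv; the prefix and the site-packages entries should change.

```python
import os
import sys


def describe_search_path() -> dict:
    """Collect the settings that decide where `import` looks for modules."""
    return {
        # Which python program is running (inside a virtualenv: venv/bin/python).
        "executable": sys.executable,
        # The installation or virtualenv directory this interpreter belongs to.
        "prefix": sys.prefix,
        # Extra directories from the PYTHONPATH variable; None if it is unset.
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        # The full list of directories import searches, in order.
        "sys.path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in describe_search_path().items():
        print(f"{key}: {value}")
```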

For example:

$ mkdir fred 
$ cd fred 
$ virtualenv venv 
$ ls 
venv/ 
$ ls venv 
bin/ include/ lib/ local/ 
$ ls venv/bin 
activate easy_install pip pip3.6 python python3.6 
$ ls venv/lib/ 
python3.6/ 
$ ls venv/lib/python3.6 
... site-packages/ 
$ ls venv/lib/python3.6/site-packages/ 
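If you'd like to script this step, Python 3.3 and later ship a standard-library module called venv that does much the same job as the virtualenv command. The sketch below uses venv, not virtualenv, so the layout differs slightly: there is no local/ directory, and there is a pyvenv.cfg file at the top. It assumes a Unix-like system (on Windows the programs live in Scripts instead of bin), and make_venv and site_packages are names made up for this example.

```python
import os
import tempfile
import venv
from pathlib import Path


def make_venv(project_dir: Path) -> Path:
    """Create project_dir/venv and return its path (Unix layout assumed)."""
    env_dir = project_dir / "venv"
    # with_pip=False skips installing pip, so this works offline;
    # symlinks=True matches what `python -m venv` does by default on Unix.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_dir


def site_packages(env_dir: Path) -> Path:
    """Find venv/lib/pythonX.Y/site-packages without hard-coding X.Y."""
    return next((env_dir / "lib").glob("python*/site-packages"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fred = Path(tmp) / "fred"  # like `mkdir fred` above
        fred.mkdir()
        env = make_venv(fred)
        print(sorted(p.name for p in env.iterdir()))
        print(sorted(p.name for p in (env / "bin").iterdir()))
        print(site_packages(env).relative_to(fred))
```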

We can depict a subset of the tree like this:

 
fred/ 
    venv/ 
        bin/ 
            activate 
            pip 
            python 
        lib/ 
            python3.6/ 
                site-packages/ 

That last directory, site-packages, is where pip installs packages and where python looks for them, once you activate the virtual environment. (Note that activating the virtual environment modifies the shell's prompt to remind you that you are in one.)

$ source venv/bin/activate 
(venv) $ pip install pymysql 
(venv) $ ls venv/lib/python3.6/site-packages/
... pymysql ...

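Now, Python can import the PyMySQL package with import pymysql. (Usually, the name you give to pip is the same as the name you import, but this project calls itself PyMySQL, with different capitalization. Pip ignores capitalization in package names, so pip install pymysql works.)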
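To check from Python whether a module is importable, and which file it would come from, the standard library's importlib.util.find_spec does the search along sys.path without actually importing the module. It takes the import name (pymysql), not the name you gave to pip. The function name where_is is chosen just for this sketch.

```python
import importlib.util
from typing import Optional


def where_is(module_name: str) -> Optional[str]:
    """Return where a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None  # not found in any directory on sys.path
    # spec.origin is usually a file path; built-in modules such as sys report "built-in".
    return spec.origin


if __name__ == "__main__":
    print(where_is("json"))     # a standard-library module, always available
    print(where_is("pymysql"))  # a path under the venv's site-packages once installed there, else None
```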

Pros and Cons of Virtualenv

  • Pro: you don't need to be root (or be able to use sudo) to install Python packages. Instead, you can install them to directories that you own and control.
  • Pro: each virtualenv is independent, so different virtualenvs can have different, even conflicting, sets of Python packages.
  • Con: because each virtualenv is independent, you have to (re-)install packages in each one. (It's possible to copy a virtualenv to another location, but such operations are considered fragile and are therefore discouraged.) Fortunately, there are tools that make this easier.
  • Con: the pathnames embedded in venv/bin/activate are absolute pathnames, so the virtualenv is not (generally) portable and relocatable, even on the same machine. You can't just mv it to another place. (There are tricks, but we'll learn a different way to copy a project.)
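One such tool is pip freeze, which prints the installed packages as name==version lines. You can save that output to a file (conventionally requirements.txt) and later feed it to pip install -r inside a fresh virtualenv. The sketch below produces similar lines using importlib.metadata from the standard library (Python 3.8+). It is only an approximation, not pip's implementation; pip's real output differs in details (for example, it leaves out pip itself unless you pass --all). The name freeze_lines is made up for this example.

```python
from importlib.metadata import distributions
from typing import List


def freeze_lines() -> List[str]:
    """Return sorted 'name==version' lines for the running Python's installed packages."""
    lines = set()  # a set drops duplicates if a package is found twice on sys.path
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Similar in spirit to: pip freeze > requirements.txt
    print("\n".join(freeze_lines()))
```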

You can determine what virtualenv you are in by doing:

(venv) $ printenv VIRTUAL_ENV 

Notice that it's an absolute pathname.
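From inside Python you can ask a similar question. The sketch below (current_virtualenv is a made-up name) reports both the VIRTUAL_ENV variable that activate sets and whether the running interpreter itself belongs to a virtual environment. The two can disagree: running venv/bin/python directly uses the virtualenv without activating it, so VIRTUAL_ENV stays unset. The sys.real_prefix check covers virtualenv versions older than 20; newer virtualenv and the built-in venv module make sys.prefix differ from sys.base_prefix instead.

```python
import os
import sys


def current_virtualenv() -> dict:
    """Report which virtualenv (if any) the shell and this interpreter are using."""
    # Legacy virtualenv (< 20) sets sys.real_prefix; venv and newer virtualenv
    # make sys.prefix differ from sys.base_prefix.
    in_venv = hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix
    return {
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),  # set by `source .../activate`
        "interpreter_in_venv": in_venv,
        "prefix": sys.prefix,  # the virtualenv's absolute path when in_venv is True
    }


if __name__ == "__main__":
    print(current_virtualenv())
```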

The source Command

(Reading the rest of this page isn't strictly necessary, but you might find it useful and enlightening.)

You'll notice that we activate a virtual environment by using a source command. What's that and how is it related to the MySQL source command?

The Unix source command is the older one. Indeed, it is likely the ancestor of all source commands (the ur-source command). The source command means:

there are some Unix commands in the named file. Read that file and execute them in the current shell.

You can see that the MySQL version of source means almost exactly the same thing:

there are some SQL statements in the named file. Read that file and execute them.

If you encounter that command in other languages and situations, it's a decent bet that it means the same thing.
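One detail matters a lot for virtualenvs: source runs the file's commands in your current shell, not in a separate program. A program you run becomes a child process, and a child can't change its parent's environment variables, which is why activate has to be sourced rather than run. The sketch below demonstrates this from Python by running /bin/sh on a tiny hypothetical script that sets a variable called DEMO_VAR (a stand-in for activate). It assumes a Unix-like system with sh available. POSIX sh spells the source command as a single dot (.); bash accepts both . and source.

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import Tuple


def run_sh(commands: str) -> str:
    """Run commands in a new sh process (with DEMO_VAR unset) and return its output."""
    env = {k: v for k, v in os.environ.items() if k != "DEMO_VAR"}
    result = subprocess.run(["sh", "-c", commands], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_vs_source() -> Tuple[str, str]:
    """Compare running a script as a program with sourcing it."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "setvar.sh"  # hypothetical stand-in for activate
        script.write_text("DEMO_VAR=hello\nexport DEMO_VAR\n")
        # As a child program: the variable disappears when the child exits.
        as_program = run_sh(f'sh "{script}"; echo "$DEMO_VAR"')
        # Sourced with ".": the commands run in this shell, so the variable stays.
        as_sourced = run_sh(f'. "{script}"; echo "$DEMO_VAR"')
    return as_program, as_sourced


if __name__ == "__main__":
    print(run_vs_source())  # ('', 'hello')
```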

Source Pathnames

When we refer to a file in the source command, we can give either a relative pathname or an absolute pathname. We learned about both kinds of pathnames when we learned about Unix.

If you are in your ~/cs304/ folder, you can do:

source venv/bin/activate

That's a relative pathname and is the shortest pathname I can suggest. With tab completion at the end, it's not hard to type.

If you are in a folder that is a sibling of your venv folder, you can use a relative pathname like this:

source ../venv/bin/activate
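A relative pathname is interpreted starting from your current directory, which is why the two source commands above reach the same file from different folders. The sketch below mimics that lookup with Python's posixpath module; the folders /home/alice/cs304 and project1 are hypothetical. One caveat: normpath collapses .. purely by text, which can differ from what the shell does if a folder along the way is a symbolic link.

```python
import posixpath


def resolve(pathname: str, current_dir: str) -> str:
    """Return the absolute pathname that `pathname` refers to from current_dir."""
    # join() ignores current_dir when pathname is already absolute (starts with /);
    # normpath() then removes "." and ".." components.
    return posixpath.normpath(posixpath.join(current_dir, pathname))


cs304 = "/home/alice/cs304"  # hypothetical ~/cs304 folder
print(resolve("venv/bin/activate", cs304))
print(resolve("../venv/bin/activate", cs304 + "/project1"))
# Both lines print /home/alice/cs304/venv/bin/activate
```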

A little more to type, but beats having to move around using cd, like this:

cd ..
source venv/bin/activate
cd folder_you_were_in

Finally, you could instead use an absolute pathname by starting with your home directory in the pathname. The following command will work from anywhere:

source ~/cs304/venv/bin/activate

Again, it's a bit more to type than the minimal version, but only a few more characters, and you avoid having to cd to your cs304 folder and then cd back to where you want to work.
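The ~ at the front is expanded by the shell into your home directory before source ever sees the pathname, and the expanded result starts with /, so it no longer depends on where you are. The sketch below shows this with Python's posixpath.expanduser, which uses the same $HOME convention on Unix; activate_path is a name made up for this example.

```python
import posixpath


def activate_path(pathname: str, current_dir: str) -> str:
    """Expand a leading ~, then resolve the pathname relative to current_dir."""
    expanded = posixpath.expanduser(pathname)  # ~ becomes the value of $HOME
    # Once expanded, the pathname is absolute, so join() discards current_dir.
    return posixpath.normpath(posixpath.join(current_dir, expanded))


here = activate_path("~/cs304/venv/bin/activate", "/tmp")
there = activate_path("~/cs304/venv/bin/activate", "/usr/local/share")
print(here)
print(here == there)
```

Because the expanded pathname is absolute, both calls should give the same answer, assuming $HOME holds an absolute path, as it normally does.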

I will try to remember to always use this absolute pathname in directions and examples, and you should keep it in mind for your own work.