Python Virtual Environments

When working on a Python project, you may have to install some Python packages. You can do that with a command called pip (package installer for Python). Here's a link to the documentation on pip.

Pip is a command that looks online at the Python Package Index, finds the package you want to install, and then installs it on your local disk. But where? By default, it will install it in a "system" directory.
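If you're curious which directory that is for a particular Python, you can ask the interpreter itself. Here's a small sketch using the standard sysconfig module. The function name is just one I made up for illustration, and the directory it reports is where packages usually land, not a guarantee for every setup.

```python
import sysconfig


def default_install_dir() -> str:
    """Return the directory where this interpreter keeps third-party packages."""
    # "purelib" is sysconfig's name for the directory that holds pure-Python
    # packages; it is typically called site-packages.
    return sysconfig.get_path("purelib")


if __name__ == "__main__":
    print(default_install_dir())
```

Run it with the system Python and you will most likely see a directory that you don't have permission to write to.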

On the CS server, and in general whenever you are working on a system you don't administer, you might not have administrator privileges to install Python packages in system directories.

Furthermore, you might need a different set of packages for different projects. Even worse, the set of packages in one project might be incompatible with the set of packages in another. For example, they both might need the orange package (used for data mining), but one might need version 3.12 and the other version 3.9. What to do?

Virtual Environments to the rescue!

Virtual environments are a really smart feature of the Python infrastructure that allows:

  • Non-root users to install Python packages/modules
  • Different projects to use different Python packages/modules
  • A project to bundle itself with its various modules in a way that makes it easy to copy to another computer

That is, each virtualenv is a place where we can put the Python modules and packages that belong to a particular project. Different projects can have different sets of modules, and each project is completely independent.

I found this Virtual Environment Primer that looks quite good, if you'd like something more thorough.

Virtualenv Concepts

To understand virtualenv, it helps to know a couple of concepts first:

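  • When Python imports a module, it searches for it in a list of directories called the module search path, which is visible inside Python as sys.path.
  • That list is built when Python starts. It includes the site-packages directory of the Python being run, plus any directories named in an environment variable called PYTHONPATH.
  • Environment variables are modifiable, often by source-ing some shell commands. Activating a virtual environment modifies the PATH variable so that the python and pip commands in the environment's folder are found first.
  • The search path is used by programs like python and pip.
  • The pip program installs Python modules from various internet sites. It installs the module into the site-packages directory of the Python it runs under, from which Python will import it. Which brings us full-circle.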
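To make the search list concrete, here's a minimal sketch that prints the directories the running Python will search, along with whatever is in the PYTHONPATH environment variable. The function name is my own invention.

```python
import os
import sys


def search_path_report() -> dict:
    """Collect the directories Python will search when importing a module."""
    return {
        # Directories searched, in order, by the running interpreter.
        "sys_path": list(sys.path),
        # Extra directories requested through the environment, if any.
        "pythonpath": os.environ.get("PYTHONPATH", ""),
    }


if __name__ == "__main__":
    report = search_path_report()
    print("PYTHONPATH =", report["pythonpath"] or "(not set)")
    for directory in report["sys_path"]:
        print(directory)
```

Try it once with no virtual environment active and once with one active, and compare the two lists.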

Don't get confused: a virtual environment is just a folder (directory) that has some pre-installed stuff in it, including some pre-installed Python packages. Creating a virtual environment just means creating a folder and installing some stuff in it. Since you own the folder, you can install additional stuff (Python modules) into it.

Creating a Virtual Environment

Historically, there was (and still is) a command called virtualenv which creates a virtual environment. But nowadays, the recommended practice is to use the Python command, along with a venv module specified on the command line, to create the virtual environment. For example:

$ python3.12 -m venv foo_env

will create a folder called foo_env and install some stuff into it. The nice thing about this way of creating a virtual environment is that you can be really clear about what version of Python that environment uses. The command above creates a virtual environment that uses Python version 3.12, probably for a project called foo. Another project, called bar, that uses the older 3.9 version of Python, might be created like this:

$ python3.9 -m venv bar_env
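The venv module can also be called from inside a Python program, which shows that creating an environment really is just creating a folder. This is a minimal sketch, not the recommended way to set up a project. It assumes you don't need pip in the new environment (with_pip=False, unlike the command line, which installs pip by default), and the folder name demo_env is a made-up example.

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str = "demo_env") -> Path:
    """Create a virtual environment folder under `parent` and return its path."""
    env_dir = parent / name
    # with_pip=False skips installing pip into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        env = make_env(Path(scratch))
        # pyvenv.cfg is the marker file that identifies the folder as a venv.
        print(sorted(p.name for p in env.iterdir()))
```

The environment uses whichever Python version runs the program, just as python3.12 -m venv uses 3.12.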

Activating a Virtual Environment

Remember, you might have several virtual environments, each one for a different project, and so you activate a virtual environment when you're ready to use that set of packages. The commands to activate a virtual environment are stored in a file inside the environment folder, so, when you are ready to work on the foo project, you would do:

$ source foo_env/bin/activate

If the activation is successful, it modifies your prompt by putting the name of the venv folder in the prompt, so that you can be reminded that this shell has an active virtual environment and which one it is. Like this:

(foo_env) $ python

The command above will run Python 3.12 and will load Python packages from the foo_env folder.
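If you ever want to double-check from inside Python which interpreter you got, here's a small sketch. The function name is mine; the sys attributes are standard.

```python
import sys


def in_virtual_environment() -> bool:
    """Report whether the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was made from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("in a virtual environment:", in_virtual_environment())
```

With foo_env active, the interpreter path should be inside the foo_env folder.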

Installing Python Modules using PIP

As mentioned earlier, the pip command installs Python packages from the Internet, typically from the Python Package Index at pypi.org.

The command downloads the software and any dependencies (other packages that the package depends on) and installs them into the active virtual environment. So, use it after you do the activate command, above. The command might look like this:

(foo_env) $ python -m pip install some_package

There's also a shorthand, which I always use:

(foo_env) $ pip install some_package
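After installing, you can confirm from Python that the package landed in the environment you expected. Here's a hedged sketch using the standard importlib.metadata module. The function name is hypothetical, and some_package is the same placeholder used in the commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "some_package" is a placeholder; substitute the name you gave to pip.
    print(installed_version("some_package"))
```

If this prints None while the virtual environment is active, the package probably went into a different Python's directory.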

Deactivating

When you are done with a virtual environment, you can deactivate it:

(foo_env) $ deactivate
$ 

Example

Here's an example of creating a virtual environment and some of the subfolders it creates.

$ mkdir foo
$ cd foo
$ python3.12 -m venv foo_venv
$ ls
foo_venv/
$ ls foo_venv
bin/ include/ lib/ local/
$ ls foo_venv/bin
activate easy_install pip pip3.12 python python3.12
$ ls foo_venv/lib/
python3.12/
$ ls foo_venv/lib/python3.12
... site-packages/
$ ls foo_venv/lib/python3.12/site-packages/

We can depict a subset of the directory tree like this:

 
foo/ 
    foo_venv/ 
        bin/ 
            activate 
            pip 
            python 
        lib/ 
            python3.12/ 
                site-packages/ 

It's that last place (site-packages) where pip installs packages, and where python reads them, once you activate the virtual environment. (Remember that activating the virtual environment modifies the shell's prompt, to remind you that you are in one.)

$ source foo_venv/bin/activate 
(foo_venv) $ pip install pymysql 
(foo_venv) $ ls foo_venv/lib/python3.12/site-packages/
... pymysql ...

Now, Python can import the pymysql package. (Usually, the name you give to pip is the same as the name you import, but PyMySQL decided to be different in capitalization: the project is listed as PyMySQL on the index, while the module you import is pymysql.)
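The location of site-packages follows a pattern, so it can be computed from the environment folder. This sketch assumes the Unix layout shown in the listing above (lib/pythonX.Y/site-packages) and that the environment was created by the same Python version that runs the code. The name demo_env is made up.

```python
import sys
import tempfile
import venv
from pathlib import Path


def site_packages_dir(env_dir: Path) -> Path:
    """Return the site-packages folder of a venv (Unix layout assumed)."""
    # Assumes the venv was made by the Python version running this code.
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return env_dir / "lib" / version / "site-packages"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        env = Path(scratch) / "demo_env"
        venv.create(env, with_pip=False)  # no pip, so site-packages starts empty
        print(site_packages_dir(env).relative_to(scratch))
```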

Pros and Cons of Virtualenv

  • Pro: you don't need to be root (be able to use sudo) to install Python packages. Instead, you can install them to directories that you own and control.
  • Pro: each virtualenv is independent, so different virtualenvs can have different, even conflicting, sets of Python packages.
  • Con: because each virtualenv is independent, you have to (re-)install packages in each one. (It's possible to copy a virtualenv to another location, but such operations are considered fragile and therefore are discouraged.) Fortunately, there are tools that make this easier.
  • Con: the pathnames embedded in venv/bin/activate are absolute pathnames, so the virtualenv is not (generally) portable and relocatable, even on the same machine. You can't just mv it to another place. (There are tricks, but we'll learn a different way to copy a project.)

That concludes the basic idea and usage of virtual environments. The rest of this reading has some practical tips and related information.


Appendix

Most of the time, you can infer from the prompt exactly what virtual environment you are in. But if it ever gets confusing, the following command will give you the complete, absolute pathname:

(foo_env) $ printenv VIRTUAL_ENV 
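A Python program can read the same variable. A minimal sketch, with a function name of my own choosing:

```python
import os
from typing import Optional


def active_environment() -> Optional[str]:
    """Return the folder of the activated virtual environment, if any."""
    # The activate script exports VIRTUAL_ENV; deactivate removes it again.
    return os.environ.get("VIRTUAL_ENV")


if __name__ == "__main__":
    print(active_environment() or "no virtual environment is active")
```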

The source Command

You'll notice that we activate a virtual environment by using a source command. What's that and how is it related to the MySQL source command?

The Unix source command is the older one. Indeed it is likely the ancestor of all source commands (the ur-source command). The source command means:

there are some commands in the named file. Read that file and execute them.

You can see that the MySQL version of source means almost exactly the same thing:

there is some code in the named file. Read that file and execute it.

The only difference is the kind of code in the file. If you encounter that command in other languages and situations, it's a decent bet that it means the same thing.
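Some background that isn't spelled out above: one reason activation uses source, rather than running the file as an ordinary program, is that a program runs in a child process, and changes that a child makes to its environment variables never reach the shell that launched it. Here's a sketch of that behavior in Python, using a made-up variable name DEMO_VAR:

```python
import os
import subprocess
import sys


def child_change_is_visible(name: str = "DEMO_VAR") -> bool:
    """Set a variable in a child process, then look for it in this process."""
    os.environ.pop(name, None)  # start from a clean slate
    child_code = f"import os; os.environ[{name!r}] = 'set-by-child'"
    # The child modifies only its own copy of the environment.
    subprocess.run([sys.executable, "-c", child_code], check=True)
    return name in os.environ


if __name__ == "__main__":
    print(child_change_is_visible())
```

The parent never sees the child's change, which is why the commands in activate have to be read and executed by your own shell.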

Source Pathnames

When we refer to a file in the source command, we can either give a relative pathname or an absolute pathname. We learned about both kinds of pathnames when we learned about Unix.

If you are in your ~/cs304/ folder, you can do:

source venv/bin/activate

That's a relative pathname and is the shortest pathname I can suggest. With tab completion at the end, it's not hard to type.

If you are in a folder that is a sibling of your venv folder, you can use a relative pathname like this:

source ../venv/bin/activate

A little more to type, but beats having to move around using cd, like this:

cd ..
source venv/bin/activate
cd folder_you_were_in

Finally, you could instead use an absolute pathname by starting with your home directory in the pathname. The following command will work from anywhere:

source ~/cs304/venv/bin/activate

Again, a bit more to type than the minimal version, but only a couple of characters, and you avoid having to cd to your home directory and cd back to where you want to work.

I will try to remember to always use this absolute pathname in directions and examples, but you should keep it in mind for your own work. I suggest that you use the command with the tilde.
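If the distinction between the three forms is fuzzy, this small sketch classifies them the way the shell treats them, using the standard pathlib module. The function name is hypothetical; the pathnames are the ones from the examples above.

```python
from pathlib import Path


def describe(pathname: str) -> str:
    """Classify a pathname after expanding a leading tilde."""
    expanded = Path(pathname).expanduser()  # "~" becomes the home directory
    return "absolute" if expanded.is_absolute() else "relative"


if __name__ == "__main__":
    for name in ("venv/bin/activate",
                 "../venv/bin/activate",
                 "~/cs304/venv/bin/activate"):
        print(name, "->", describe(name))
```

Only the tilde version is absolute, which is why it works no matter what folder you are in.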