Python Review

This document isn't intended to teach Python from the beginning, especially for non-programmers. Instead, it's intended to refresh the memory for people who have programmed in Python before, but maybe not for a while, and have forgotten aspects of it.

The first part of this document is mostly review. Feel free to skip ahead if you aren't learning anything from a particular section.

Python as an Environment

One of the best ways to learn Python is to type expressions into the Read-Eval-Print Loop (REPL) to see what they do. (Other sources will call this the interpreter, but you can have an interpreter without having this ability to type in expressions and have them evaluated and the results printed.) Just give the command python to your Linux or Mac shell, and start typing expressions.

(Note, some people find it confusing that there is a program/command called python, but the program is the thing that understands the Python language and executes/runs your programs.)

In these notes, I'll show Python REPL interactions like the following. Don't type the >>>; that's the prompt printed by the Python REPL. Similarly, don't type the $ before the python command; that stands for the shell prompt, which is probably some detailed string like [youracct@tempest dir]. (A prompt just means that something is ready for your next input.)

$ python 
>>> a=3 
>>> b=4 
>>> a+b 
7 
>>> import math 
>>> c = math.sqrt(a*a+b*b) 
>>> c 
5.0 
>>> quit() 
$

To exit, invoke the quit() function or type a control-D. Notice how that returns to our Unix prompt.

Python as a Language

Python code looks like no other code that I'm familiar with. It's a complete departure from the C family of languages.

# Python uses # as an end-of-line comment character

import math    # packages are "loaded" by the import statement 

a = 3          # no need to declare types; very dynamic 
b = 4 
c = math.sqrt(a*a+b*b) 

So far, not too bad. Let's see some syntax:

if a == b: 
    print('a and b are the same')
else: 
    print('a and b differ')
print('go on')

Hmm. Where are the parens and braces? Gone! Python knows that the else: section is over because the indentation ends. That's right, the indentation has syntactic meaning in Python. So, the following two programs are different in Python.

i = 0 
while i < 10: 
    i += 1 
print(i)
i = 0 
while i < 10: 
    i += 1 
    print(i)

The first one prints only the final value, once, after the loop ends, while the second one prints every number, because the call to print is inside the loop.

Functions in Python

Functions in Python are simple: a name, a formal argument list (no datatypes), and a body. The end of the body is, as expected, signalled by the end of indentation. They are introduced by the special keyword def.

def mean(a,b): 
    return (a+b)/2 

Even better is to add a string as the first line of the body. Later, we'll see a tool that will use these for self-documenting files:

def mean(a,b): 
    "returns the arithmetic mean of the two numbers" 
    return (a+b)/2 
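
If you're curious where that string goes, here is a minimal sketch, reusing the mean function above. Python stores the string on the function object as its `__doc__` attribute, which is what documentation tools read.

```python
def mean(a, b):
    "returns the arithmetic mean of the two numbers"
    return (a + b) / 2

# The documentation string is stored on the function object itself.
print(mean.__doc__)

# In the REPL, help(mean) displays the same text in a pager.
```

This string is usually called a docstring.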

Here is a whole file of function definitions:

import math

def hypo(a,b):
    """Returns the length of the hypotenuse of a right triangle with the given legs"""
    # algorithm based on the Pythagorean theorem
    return math.sqrt(a*a+b*b)

def fibonacci(n):
    """Returns the nth Fibonacci number, for `n' a non-negative integer"""
    if type(n) != type(1) or n<0:
        raise Exception('bad argument to fibonacci')
    if n<2:
        return n
    else:
        # what a horrible algorithm!  Never do this!!
        return fibonacci(n-1)+fibonacci(n-2)

def gcd(a,b):
    """Returns the greatest common divisor of the two arguments.
Example: gcd(9,8)=1, since 9 and 8 are relatively prime, but
gcd(24,30)=6, since 6 divides both 24 and 30."""
    # This implementation is Dijkstra's method
    print("a is {a} and b is {b}".format(a=a,b=b))
    if a == b:
        return a
    elif a > b:
        return gcd(a-b,b)
    else:
        return gcd(a,b-a)

def triangular(max):
    """Generates a triangular list of lists up to the given max"""
    result = []
    # range() gives you a sequence of integers; "for" iterates over sequences
    for n in range(max):
        result.append(list(range(n)))
    for elt in result:
        print(elt)

if __name__ == '__main__':
    if hypo(3,4) != 5:
        print('error in hypo:  hypo(3,4) returns ',hypo(3,4))
    print('The first ten Fibonacci numbers are')
    for i in range(10):
        print(fibonacci(i),' ', end=' ')
    # this empty print statement just gives us a blank line
    print()
    print('testing gcd(20,45)')
    if gcd(20,45) != 5:
        print('error in gcd:  gcd(20,45) returns ',gcd(20,45))
    print("here's a list of 5 lists")
    triangular(5)
    

You can try these by downloading the mathfuns.py Python file, importing it into Python, and running the functions:

$ python
>>> import mathfuns
>>> mathfuns.hypo(5,12)
13.0 
>>> mathfuns.gcd(55,89)
1 
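
The comment in fibonacci above calls the recursive algorithm horrible, and it is: it recomputes the same values over and over. Here is one possible alternative, a sketch using a simple loop. The name fibonacci_iter is mine and is not part of mathfuns.py.

```python
def fibonacci_iter(n):
    """Returns the nth Fibonacci number, for n a non-negative integer"""
    if not isinstance(n, int) or n < 0:
        raise ValueError('bad argument to fibonacci_iter')
    a, b = 0, 1              # the first two Fibonacci numbers
    for _ in range(n):
        a, b = b, a + b      # slide the pair one step forward
    return a

for i in range(10):
    print(fibonacci_iter(i), end=' ')
print()
```

With this version, even fibonacci_iter(100) returns immediately.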

You can avoid typing the module name (which is also the filename, minus the .py) by importing particular members:

>>> from math import sqrt 
>>> sqrt(9) 
3.0

It's even possible to import every member of a module, but this is considered poor practice:

>>> from mathfuns import * 
>>> fibonacci(100)   # too long! 
>>> gcd(30,50) 
a is 30 and b is 50 
a is 30 and b is 20 
a is 10 and b is 20 
a is 10 and b is 10 
10 

It's generally considered better not to do this, because it can become less clear where a function (like sqrt, fibonacci and gcd above) is defined.
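
To see why unqualified names can be confusing, here is a small sketch using two standard modules, math and cmath, which both define a function called sqrt. The choice of modules is mine, just for illustration.

```python
import math
import cmath

# With qualified names, it is obvious which sqrt is which.
print(math.sqrt(9))      # 3.0
print(cmath.sqrt(-1))    # 1j

# Importing the bare name twice silently replaces the first sqrt.
from math import sqrt
from cmath import sqrt
print(sqrt(9))           # (3+0j), the complex version won
```

The same thing can happen with import *, except that you can't even see which names were replaced.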

Datatypes in Python

All my examples so far have been numeric, for no good reason but that numbers don't need much introduction. Let's look at some more interesting datatypes. To play with these test values, download this sampledata file.

Strings

Strings pretty much work as you expect. You can concatenate them with the + operator. You can take their length. You can print them.

>>> x = 'spam, '
>>> x 
'spam, '
>>> y = 'eggs, '
>>> y
'eggs, '
>>> x+x 
'spam, spam, ' 
>>> x+x+y+' and '+x 
'spam, spam, eggs,  and spam, ' 
>>> x+x+y+'and '+x 
'spam, spam, eggs, and spam, ' 
>>> len(x) 
6 
>>> len(x+y) 
12 
>>> print(x+y) 
spam, eggs, 

Lists

Lists are denoted with square brackets with commas between the elements. You can index them numerically, and extract sub-lists. You can append items onto the end (actually, either end). You can store into them.

>>> cheeses = [ 'swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie' ]
>>> cheeses 
['swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie'] 
>>> len(cheeses) 
6 
>>> cheeses[0] 
'swiss' 
>>> cheeses[1:3] 
['gruyere', 'cheddar'] 
>>> cheeses[1:4] 
['gruyere', 'cheddar', 'stilton'] 
>>> cheeses.append('gouda') 
>>> cheeses 
['swiss', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie', 'gouda'] 
>>> cheeses[0] = 'emmentaler' 
>>> cheeses 
['emmentaler', 'gruyere', 'cheddar', 'stilton', 'roquefort', 'brie', 'gouda'] 

The append shows how to invoke a method on a list, and that lists are mutable, unlike tuples. (See the next section.)
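
I said above that you can add items at either end. Here is a minimal sketch using the standard list methods insert and pop on a shortened cheese list:

```python
cheeses = ['swiss', 'gruyere', 'cheddar']

cheeses.append('gouda')       # add at the end
cheeses.insert(0, 'brie')     # add at the front (index 0)
print(cheeses)   # ['brie', 'swiss', 'gruyere', 'cheddar', 'gouda']

last = cheeses.pop()          # remove and return the last item
first = cheeses.pop(0)        # remove and return the first item
print(first, last)            # brie gouda
```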

Tuples

Tuples are sequences of items, just like lists, except that they use parentheses instead of square brackets and they are immutable, which means that once created, they can't be modified.

>>> troupe = ('Cleese', 'Palin', 'Idle', 'Chapman', 'Gilliam', 'Jones')
>>> troupe 
('Cleese', 'Palin', 'Idle', 'Chapman', 'Gilliam', 'Jones') 
>>> len(troupe) 
6 
>>> troupe[0] = 'Homer'   # won't work 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>>

Tuples are useful to know about in CS 304 because one common way to interact with a database is to get a row out of the database, and one standard representation of a row is as a tuple.

Tuples of One Element

One tricky fact about tuples is that a tuple of exactly one element requires a seemingly unnecessary comma:

t1 = (42,)     # a tuple of one element

If you omit the comma, Python will think you are just using ordinary parentheses, and won't create a tuple.

What's interesting is that you don't even need the parentheses. It's the commas that create a tuple. So the following is equivalent:

t2 = 42,      # a tuple of one element
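
You can check this for yourself. The sketch below compares the three spellings; the variable name not_a_tuple is just for illustration.

```python
not_a_tuple = (42)     # just the integer 42 in parentheses
t1 = (42,)             # a tuple of one element
t2 = 42,               # also a tuple of one element

print(type(not_a_tuple))   # <class 'int'>
print(type(t1))            # <class 'tuple'>
print(t1 == t2)            # True
print(len(t1))             # 1
```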

Destructuring Assignment

One very cool feature of tuples in Python is destructuring assignment, also called unpacking. This occurs when we want to pull all of the values out of a tuple and assign them to variables.

For example, suppose we've just retrieved a row from a database table about movies and we know that the columns are

(TT, Title, release year, Director ID, addedby ID)

Here's an example of such a tuple:

movie_row = (1517268, 'Barbie', 2023, 1950086, 8436)

Now suppose we want to pull all those values out into variables. We can do that with one line:

tt, title, year, dirid, addedby = movie_row

Isn't that better than the following?

tt = movie_row[0]
title = movie_row[1]
year = movie_row[2]
dirid = movie_row[3]
addedby = movie_row[4]

Note that we can use this same trick to initialize two variables at once:

low, high = 1, 20
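
Two more places where unpacking is handy are swapping two variables and looping over a list of tuples. Here is a small sketch; the list of pairs is made up for illustration and stands in for rows from a database.

```python
low, high = 1, 20
low, high = high, low          # swap without a temporary variable
print(low, high)               # 20 1

# Each element of the list is a tuple that is unpacked by the for loop.
pairs = [('France', 'Paris'), ('England', 'London'), ('China', 'Beijing')]
lines = []
for country, capital in pairs:
    lines.append(country + ': ' + capital)
print(lines)
```

Note that the number of variables on the left has to match the number of items in the tuple, or Python raises a ValueError.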

Comprehensions

Both lists and tuples are sequences, and as such can be easily iterated over, sometimes building new lists on the way. If you have a list of numbers, you can create a new list of the squares of those numbers like this:

nums = list(range(5))             # [0, 1, 2, 3, 4]
sqrs = [ x*x for x in nums ]      # [0, 1, 4, 9, 16]

This is called a list comprehension. It's an elegant and efficient way to create a new list from an existing list.
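
A comprehension can also filter the items with an if clause. As a sketch, here is a comprehension that squares only the even numbers, next to the equivalent for loop. The variable names are mine.

```python
nums = list(range(10))

# Keep only the even numbers, and square them.
even_sqrs = [x*x for x in nums if x % 2 == 0]
print(even_sqrs)     # [0, 4, 16, 36, 64]

# The same thing, written as an ordinary loop.
result = []
for x in nums:
    if x % 2 == 0:
        result.append(x*x)
print(result == even_sqrs)   # True
```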

Dictionaries (Hashes)

Like all civilized languages, Python has dictionaries built-in (sometimes called hashtables or hashmaps in other languages, such as Java). They act a little like arrays that have strings as indexes. You can store into hashes and iterate over them easily.

The notation of a dictionary is braces around key/value pairs. For example, we can create a dictionary that stores the capital of each of a few countries. The key in each pair is the name of the country. The value is the matching capital city.

capitals = {'France': 'Paris', 'England': 'London', 'China': 'Beijing'}
print(capitals['France'])         # prints "Paris"
capitals['India'] = 'New Delhi'   # store a new capital

You can iterate over a dictionary with a special version of the for loop. The key variable below will be bound to each key in the dictionary.

for key in capitals:
    print(key, capitals[key])

The code above prints:

France Paris
England London
China Beijing
India New Delhi
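
If you want both the key and the value in each iteration, one common alternative is the .items() method, which yields (key, value) tuples. Here is a sketch using the same capitals dictionary:

```python
capitals = {'France': 'Paris', 'England': 'London', 'China': 'Beijing'}
capitals['India'] = 'New Delhi'

# .items() yields (key, value) pairs, which the for loop unpacks.
for country, city in capitals.items():
    print(country, city)

# The "in" operator tests whether a key is present.
print('France' in capitals)    # True
print('Brazil' in capitals)    # False
```

This prints the same four lines as the loop over keys.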

Key Errors

What happens if we try to look up a key that's not in the dictionary? For example, suppose we try to look up the capital of a country that is not in the dictionary. We will get a KeyError:

>>> capitals['Brazil']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Brazil'

There are two ways to deal with this. One is to surround the access with a try/except and deal with it that way:

country = input('what country? ')
try:
    print('The capital of '+country+' is '+capitals[country])
except KeyError:
    print('The capital of '+country+' is not known')

That works and is very general, but can sometimes be cumbersome.

An alternative is to use the .get() method on a dictionary, which allows you to specify a default value if the key is not found. If you don't specify a default value, you get None (the default default). You won't get a KeyError.

For example:

country = input('what country? ')
cap = capitals.get(country)
if cap is not None:
    print('The capital of '+country+' is '+cap)
else:
    print('The capital of '+country+' is not known')

or better:

country = input('what country? ')
cap = capitals.get(country, 'not known')
print('The capital of '+country+' is '+cap)

The .get() method on dictionaries is very useful and we'll see it in CS 304.
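
Another common use of .get() is counting things, where the default of 0 handles the first time a key is seen. A minimal sketch, with a made-up list of words:

```python
words = ['spam', 'eggs', 'spam', 'spam', 'eggs']

counts = {}
for w in words:
    # If w isn't in the dictionary yet, .get() returns 0.
    counts[w] = counts.get(w, 0) + 1

print(counts)    # {'spam': 3, 'eggs': 2}
```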

Iteration in Python

We've seen the usual while loop above. We've also seen how the for loop iterates over the items in a list or over the keys in a dictionary. What if you want to iterate over a series of numbers, like a C-style for loop? You can do that with the range() function, though I don't think you'll often have to in CS 304. Outside of purely numeric code, most C-style for loops are really iterating over some data structure via numerical indices, and Python's for loop can iterate over the data structure directly. Nevertheless, here's an example from mathfuns.py:

def triangular(max):
    """Generates a triangular list of lists up to the given max"""
    result = []
    # range() gives you a sequence of integers; "for" iterates over sequences
    for n in range(max):
        result.append(list(range(n)))
    for elt in result:
        print(elt)

Note that we have to wrap range(n) with list() to convert it into a list of numbers. (In Python 3, range() returns a special range object, a lazy sequence that produces its numbers on demand, but we won't worry about that.)

>>> from mathfuns import triangular 
>>> triangular(10) 
[] 
[0] 
[0, 1] 
[0, 1, 2] 
[0, 1, 2, 3] 
[0, 1, 2, 3, 4] 
[0, 1, 2, 3, 4, 5] 
[0, 1, 2, 3, 4, 5, 6] 
[0, 1, 2, 3, 4, 5, 6, 7] 
[0, 1, 2, 3, 4, 5, 6, 7, 8] 
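
For reference, here is a short sketch of the other forms of range(), which takes an optional start and step. When you want the index along with each item of a list, the built-in enumerate() is usually nicer than indexing by hand. The cheese list is shortened from the earlier example.

```python
print(list(range(5)))           # [0, 1, 2, 3, 4]
print(list(range(2, 8)))        # [2, 3, 4, 5, 6, 7]
print(list(range(10, 0, -3)))   # [10, 7, 4, 1]

cheeses = ['swiss', 'gruyere', 'cheddar']

# enumerate() yields (index, item) pairs.
for i, cheese in enumerate(cheeses):
    print(i, cheese)
```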

Stuff Not Covered in CS 111

The following sections cover things that you might not have seen in CS 111, so they are worth reading.

Executable Python Scripts and Modules

You can put a bunch of Python code, including function definitions and such, into a file and run it. Look at now_v1.py:

from datetime import datetime

now = datetime.now()
print(now.strftime("%Y-%m-%d %H:%M:%S"))  # like the internet standard

You can run it from the shell as follows:

$ python now_v1.py 

That's a useful way to write our main programs.

Modules in Python

We've also seen that we can import functions and other useful stuff from files into the Python environment, as when we imported some functions from the math package.

It's smart to write your Python code in a modular way, grouping related functions in a file that can then be imported into other parts of the program and into the main program. (You did a lot with importing code in CS 230.)

Look at now_v2.py:

from datetime import datetime

def now():
    """Returns a string for the current day and time

the format YYYY-MM-DD HH:MM:SS is like the internet format"""
    now = datetime.now()
    return now.strftime("%Y-%m-%d %H:%M:%S")  # like the internet standard

That file demonstrates how we could write a module.

Here's how we could use that module:

$ python 
>>> import now_v2 
>>> now_v2.now() 
'2022-02-17 14:03:27'
>>> print(now_v2.now())
2022-02-17 14:03:35

But now the file doesn't work as a shell script. Can we do both? Amazingly, the answer is yes!

Here's the trick. You can put an if statement in your file that checks to see if the __name__ variable has the value '__main__' (a string). If it does, the file is being run as a script, rather than being loaded as a module.

Therefore,

  • you can put your module-like function definitions above that line, and
  • you put your script-like code to run below that line, indented within the conditional.

Look at now_v3.py:

from datetime import datetime

def now():
    """Returns a string for the current day and time

the format YYYY-MM-DD HH:MM:SS is like the internet format"""
    now = datetime.now()
    return now.strftime("%Y-%m-%d %H:%M:%S")  # like the internet standard

# the following code is only executed if this file is invoked from the
# command line as a script, rather than loaded as a module.

if __name__ == '__main__':
    print(now())
    

And as a script, we can run it like this:

$ python now_v3.py 
2022-02-17 14:05:12

As a module, it works just like now_v2.py:

$ python 
>>> import now_v3
>>> now_v3.now() 
'2022-02-17 14:06:41'
>>> print(now_v3.now())
2022-02-17 14:06:48

We will use this trick a lot in CS 304!
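
One common variation, sketched below, is to put the script-like code into its own function so that the conditional is just one line. The names format_time and main are my own choices here, not something Python requires.

```python
from datetime import datetime

def format_time(dt):
    """Returns the given datetime as a string like YYYY-MM-DD HH:MM:SS"""
    return dt.strftime("%Y-%m-%d %H:%M:%S")

def main():
    """Script entry point: prints the current day and time"""
    print(format_time(datetime.now()))

# Only runs when the file is invoked as a script, not when imported.
if __name__ == '__main__':
    main()
```

Because format_time takes the datetime as an argument, it's easy to test from the REPL with a fixed value, such as format_time(datetime(2022, 2, 17, 9, 5, 3)).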

PyDoc

To get the documentation on a Python module, including one you write yourself, you can use the pydoc shell command:

$ pydoc mathfuns 

prints the documentation for mathfuns right to your screen. (You can also set up pydoc as a web server, which is very cool.)

Of course, the documentation that Pydoc gives you comes from the author of the module, and when you write Python code, you shoulder the responsibility of documenting what you create.

Give every function a meaningful documentation string. Write the kind of documentation you'd like to read if you wanted to know how to use the function. The string goes (in triple-quotes, which allows for multiple lines) as the first element of the function definition.
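
As a sketch of what that might look like, here is a version of hypo with a multi-line documentation string. The exact wording and layout of the string are just one reasonable style, not a requirement. The standard inspect.getdoc function returns the string with its indentation cleaned up, much as pydoc would show it.

```python
import inspect
import math

def hypo(a, b):
    """Returns the length of the hypotenuse of a right triangle.

    a and b are the lengths of the two legs (non-negative numbers).
    Example: hypo(3, 4) returns 5.0
    """
    # algorithm based on the Pythagorean theorem
    return math.sqrt(a*a + b*b)

# Show the documentation string with the indentation removed.
print(inspect.getdoc(hypo))
```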

Programming Methodology

You've probably noticed that I haven't covered Object-Oriented Programming (OOP) for either PHP or Python. OOP is new, modern, better, and we all should use it, right? Both languages support OOP, and you're welcome to use it, but we won't (necessarily) be using it. Why?

  • OOP is for controlling complexity, and we don't have that kind of complexity.
  • Objects are for modeling entities with state and behavior, and that doesn't fit the kind of information processing we're doing: we're doing "filtering" or "transformation" processing.

However, we can and should use procedural modularity and abstraction.

That said, if you come across an aspect of your coding that you feel would be improved by using OOP, please ask! I'd be glad to help you with it.