
Welcome!
Everything is fine.
Python and PyMySQL¶
Today, we'll begin Python database scripts.
- Goals
- Plan
- Announcements
- Project Team Formation
- Recap of last Week
- Background Knowledge
- PyMySQL API
- CS304DBI API
- Quiz Questions
- Installing CS304dbi
- Demo
- Code for Today
- Script 0
- Comparison to PyMySQL
- Abstract Template
- People
- Scripts with Command Line Arguments
- BreakOut 1: Look up Pets
- Scripts that Modify the Database
- Prepared Queries
- Famous XKCD Cartoon
- Queries about Prepared Queries
- SQL Injection Attacks
- Summary
Goals¶
- Write a simple Python script that reads from a database and uses a command line arg
- Understand SQL injection attacks and the use of prepared queries
Plan¶
- Discussion/Announcements
- (20) Recap
- PyMySQL API
- CS304dbi API
- Database query scripts
- Database scripts with command line args
- Quiz questions
- (15) Exercise
- (10) Modifying the DB
- (5) Prepared Queries. We may not get to this
Announcements¶
- Grading status
- For pair assignments, one one Gradescope submission is necessary. Thanks.
- Please submit P0 by end of today.
- Please submit H4 ER diagrams soon
- We'll do a mid-semester evaluation next week
Project Team Formation¶
Please put some effort into this.
- Read the Google Doc of ideas
- email people whose ideas you like
- Update the Google spreadsheet with the nucleus of teams
I will demo using a spreadsheet of last year's teams
If necessary, I'll assign teams, but I'd rather it be student-led.
Next week,
- Tuesday, we'll hopefully complete the process
- Wednesday, you'll sit with your team
- We'll discuss norms for Group Work. There's a brief reading on group work
Recap of last Week¶
Concepts
- UID
- processes
- ports
- requests
- handlers
- decorators
- virtual environments
Procedure
- Activate venv
- Run Flask
- Choose Tunnel from VSCode offering
- Go to browser to access website
Let's run example1.py from last time!
source ~/cs304/venv/bin/activate
cd ~/cs304/flask1/
python example1.py
Let's try some of the other examples as well
Background Knowledge¶
How many know about
- the Python magic IF statement:
if __name__ == '__main__':
- Python command line arguments using
sys.argv()
- importing Python modules
- writing a Python module to be imported
If many don't, we'll digress to python skills
PyMySQL API¶
The PyMySQL API:
import pymysql
conn = pymysql.connect()
with keyword args for host, user, password, and db.conn = pymysql.connect()
with keyword arg for credentials fileconn.select_db()
to use a different databasecurs = conn.cursor()
gives you a cursor (a tuple cursor)curs.execute(sql)
executes the given query.curs.execute(sql,list)
executes a prepared query, with the list of tainted values filled in for the placeholders%s
in the query.row = curs.fetchone()
returns one row of the result orNone
when exhaustedrows = curs.fetchall()
returns a list of all the resultsconn.commit()
to make changes permanent
CS304DBI API¶
However, I recommend that we instead use a thin wrapper module that
I wrote, called cs304dbi
. The code is in your pymysql
folder, and
it's not that long, including a demo at the end. You're welcome to
read it, but you don't have to.
import cs304dbi as dbi
dbi.conf(db)
to read the~/.my.cnf
file and use a database. That argument is optional, defaulting to your own, but you should typically supply it, so I can run your codeconn = dbi.connect()
uses the cachecurs = dbi.cursor(conn)
gives you a cursor (a tuple cursor)curs = dbi.dict_cursor(conn)
gives you a dictionary cursor
You can also use:
dbi.cache_cnf()
to read the~/.my.cnf
filedbi.use()
to use a different database, but must be done before connecting
The dbi.conf
function does those steps more easily.
Connections and cursors are exactly the same.
All the rest is the same as PyMySQL, including the conn.select_db()
method, which you can use if you want to switch databases after
connecting.
Quiz Questions¶
As usual, there were many important questions about this
I decided to split the quiz questions in two, deferring all the questions about prepared queries.
Let's start with the other quiz questions.
Installing CS304dbi¶
Activate your Venv and install the CS304dbi.py module
cp ~cs304flask/pub/downloads/pymysql/cs304dbi.py ~/cs304/venv/lib/python3.12/site-packages/
Test that it worked. You'll have to copy or type these by hand, instead of copy/paste. It's good practice.
$ python
>>> import cs304dbi
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'cs304dbi'
>>>
$ source ~/cs304/venv/bin/activate
$ python
>>> import cs304dbi
Demo¶
I'll do a brief demo of how to use cs304dbi
from the Python REPL for
testing, debugging, etc. Feel free to try it with me!
$ source ~/cs304/venv/bin/activate
$ pydoc cs304dbi
$ python
>>> import cs304dbi as dbi
>>> dbi.conf('wmdb')
>>> conn = dbi.connect()
>>> curs = dbi.dict_cursor(conn)
>>> curs.execute('select * from person where nm < 10')
>>> curs.fetchall()
The conf
step combines use
and cache_cnf
. It's optional, but if
you do it, do it before you connect. If you omit it, you'll use the
default database, which is your personal database.
Code for Today¶
Activate your Venv and copy today's examples from the course directory:
source ~/cs304/venv/bin/activate
cd ~/cs304
cp -r ~cs304flask/pub/downloads/pymysql pymysql
cd pymysql
Script 0¶
A script using pymsql
could be as simple this (see
simplescript_raw.py
)
import pymysql
conn = pymysql.connect(host='localhost',
user='cs304guest',
password='secret')
conn.select_db('wmdb')
curs = conn.cursor(pymysql.cursors.DictCursor)
curs.execute('''
select nm,name,birthdate
from person
where nm = 123''')
row = curs.fetchone()
print(row)
or with cs304dbi
could be like this (see
simplescript.py
):
import cs304dbi as dbi
dsn = dbi.conf('wmdb')
conn = dbi.connect()
curs = dbi.cursor(conn)
curs.execute('''
select name,birthdate
from person
where nm=123''')
row = curs.fetchone()
print('{} born on {}'.format(row[0],row[1]))
Comparison to PyMySQL¶
- optionally read and cache the credentials
- optionally use a different database
conn = dbi.connect()
connects with cached credentialscurs = dbi.cursor(conn)
gets a tuple cursorcurs = dbi.dict_cursor(conn)
gets a dictionary cursor
Abstract Template¶
It might be helpful to see this (relatively) concise app outline. Note that this is not running code. There's a lot left out, so that we can see the high-level organization.
# This is not complete running code, but it gives you some examples of
# using a MySQL database from Flask
from flask import (Flask, render_template ...)
app = Flask(__name__)
@app.route('/')
def index():
return render_template('main.html',title='Hello')
@app.route('/get-data/')
def get_data():
# this example uses a dictionary cursor and a prepared query
# to get a list of results
conn = dbi.connect()
curs = dbi.dict_cursor(conn)
curs.execute('select a, b, c from table where x = %s and y = %s',
[13, 42])
val_list = curs.fetchall()
render_template('stuff.html', data = val_list)
@app.route('/get-stats/')
def get_stats():
# this example uses a tuple cursor and a prepared query
# to get one row of results
conn = dbi.connect()
curs = dbi.cursor(conn)
curs.execute('select min(a), max(a), avg(a) from table where x = %s',
[13])
(min_a, max_a, avg_a) = curs.fetchone()
render_template('stats.html', x = min_a, y = max_a, z = avg_a)
@app.route('/insert/', methods=['POST'])
def insert():
# this example uses a tuple cursor to insert some data
conn = dbi.connect()
curs = dbi.cursor(conn)
curs.execute('insert into table(a,b,c)',
[101, 102, 103])
conn.commit()
render_template('ok.html')
if __name__ == '__main__':
import sys, os
db_to_use = 'put_database_name_here_db'
print('will connect to {}'.format(db_to_use))
dbi.conf(db_to_use)
port = os.getuid()
app.debug = True
app.run('0.0.0.0',port)
Notice
- there's a special way to have a function called once just before the first request
- one call to
dbi.conf()
before the first request
People¶
Here's a complete script (people.py
) to list all the people in the
database and their birthdates.
import cs304dbi as dbi
# ==========================================================
# The functions that do most of the work.
def getPeople(conn):
'''Returns the name and birthdate of all the entries in
the person table, as a list of dictionaries.
'''
curs = dbi.dict_cursor(conn)
curs.execute('select name,birthdate from person')
return curs.fetchall()
# ==========================================================
# This starts the ball rolling, *if* the file is run as a
# script, rather than just being imported.
if __name__ == '__main__':
dbi.conf('wmdb')
conn = dbi.connect()
pl = getPeople(conn)
for person in pl:
print('{name} born on {date}'
.format(name=person['name'],
date=person['birthdate']))
Run it!
python people.py
What are your questions?
Try starting a Python REPL and importing the dbi module and trying some of the functions. Try each step here:
>>> import cs304dbi as dbi
>>> dbi.cache_cnf()
>>> dbi.use('wmdb')
>>> conn = dbi.connect()
>>> curs = dbi.dict_cursor(conn)
>>> curs.execute('select * from person where nm < 10')
>>> data = curs.fetchall()
>>> len(data)
>>> print(data)
>>> for row in data:
print(row)
When we've all understood that example, we'll go on to the next exercise.
Scripts with Command Line Arguments¶
Python can access the things on the command line by using sys.argv
which
is a list of strings.
The peopleBornInMonth.py
script lists people born in a given month:
import cs304dbi as dbi
# ==========================================================
# The functions that do most of the work.
def peopleBornInMonth(conn,month):
'''Returns the name and birthdate of people born in given month,
as a list of dictionaries
'''
curs = dbi.dict_cursor(conn)
# this uses a prepared query, since the 'month' comes
# from an untrustworthy source
curs.execute('''
select name,birthdate from person
where month(birthdate) = %s''',
[month])
return curs.fetchall()
# ==========================================================
# This starts the ball rolling, *if* the file is run as a
# script, rather than just being imported.
def usage():
print('''Usage: {script} month
Prints actors born in given month, a number from 1-12'''
.format(script=sys.argv[0]), file=sys.stderr)
if __name__ == '__main__':
import sys
if len(sys.argv) != 2:
usage()
elif not( sys.argv[1].isdigit() and
1 <= int(sys.argv[1]) <= 12):
usage()
else:
dbi.conf('wmdb')
conn = dbi.connect()
people = peopleBornInMonth(conn,int(sys.argv[1]))
for person in people:
print(('{} born on {}'
.format(person['name'], person['birthdate'])))
BreakOut 1: Look up Pets¶
Write a simple script to look up info about a pet given the pet's name. Say, look up the owner's address. Use LIKE to make it easier to find matches.
I suggest you copy personLookup.py
as a starting
point:
cp personLookup.py petLookup.py
If your personal database doesn't have the pet and owner tables,
you can (re)create pet9 and owner9 using
mysql5-refinteg/vet-setup.sql
and
mysql5-refinteg/vet-data.sql
My solution is petLookup_solution.py
. Try it!
Here's the code
'''Lists pet addresses given a (partial) name
Written Spring 2015
Re-written Spring 2018
Re-written Fall 2018 for C9
Re-written for Fall 2019 for pymysql/dbi and python 3
Re-written for Spring 2020 for updated dbi
Scott D. Anderson
'''
import sys
import cs304dbi as dbi
# ================================================================
# The functions that do most of the work.
def getInfoGivenPartialName(conn, partialName):
'''Returns a list of pets with that partial name'''
curs = dbi.dict_cursor(conn)
curs.execute('''select pet9.name,address
from owner9 inner join pet9 using(oid)
where pet9.name like %s''',
['%' + partialName + '%'])
return curs.fetchall()
# ================================================================
# This starts the ball rolling, *if* the script is run as a script,
# rather than just being imported.
def usage():
print('''Usage: {script} partialName
Prints pet's name and address'''
.format(script=sys.argv[0],
file=sys.stderr))
if __name__ == '__main__':
if len(sys.argv) != 2:
usage()
else:
conn = dbi.connect()
for pet in getInfoGivenPartialName(conn, sys.argv[1]):
print('{} who lives at {}'
.format(pet['name'], pet['address']))
Scripts that Modify the Database¶
There's an important difference between the default behavior of the
PyMySQL client and the mysql
client. The mysql
client automatically
commits the changes (makes them permanent). The PyMySQL
default
behavior requires you to commit the changes by using the commit()
on
the connection object.
In raw PyMySQL:
conn = pymysql.connect(...)
curs = conn.cursor()
curs.execute('delete from pet')
conn.commit()
The personInsert.py
script is an example. Note that the connection is
done with just one line, since we want to use the default credentials file
and not use
a different database. So all we need is
conn = dbi.connect()
Here's the complete code:
'''Inserts a person
Written Spring 2015
Re-written Spring 2018
Re-written Fall 2018 for C9
Re-written in Fall 2019 for pymysql, dbi, and python3
Re-written in Spring 2020 for updated dbi module
Scott D. Anderson
'''
import sys
import cs304dbi as dbi
STAFFID = 1 # my id is 1. Yours is different. See assignment 1
# ================================================================
# The functions that do most of the work.
def personInsert(conn, nm, name, birthdate=None):
'''Inserts the given person'''
curs = dbi.cursor(conn)
# nr is number of rows affected. Should be 1
nr = curs.execute('''
insert into person(nm,name,birthdate,addedby)
values (%s,%s,%s,%s)''',
[nm,name,birthdate,STAFFID])
conn.commit()
return nr
# ================================================================
def usage():
print('''Usage: {script} nm name [birthdate]
Inserts that person; birthdate is optional'''
.format(script=sys.argv[0]),
file=sys.stderr)
if __name__ == '__main__':
if len(sys.argv) < 3:
usage()
elif not sys.argv[1].isdigit():
usage()
else:
conn = dbi.connect()
if len(sys.argv) < 4:
print('omitting birthdate info')
nr = personInsert(conn,sys.argv[1],sys.argv[2],None)
else:
nr = personInsert(conn,sys.argv[1],sys.argv[2],sys.argv[3])
print('{} inserted'.format(nr))
The personDelete.py
script is similar.
Prepared Queries¶
Most students struggle with SQL Injection and prepared queries. That's normal.
I'm going to try a couple of metaphors. If they don't work, I'm sorry.
- in making a chicken caesar salad, you keep the raw chicken separate from the lettuce, tomatoes and such (because, ew, salmonella), then you cook the chicken (but not the veggies) and you combine them at the end.
- attacking a vending machine is easier if you don't just insert a quarter/token, you get to insert part of the blueprints. Instead, we build the machine and only later allow users to insert stuff.
- Code as arguments:
How many if
statements in the following code? What does it return?
def foo(stuff):
stuff
if x > 3:
return 1
else:
return 2
foo('''x = 7
if x > 4:
return 3
''')
Famous XKCD Cartoon¶
Queries about Prepared Queries¶
Let's discuss the prepared queries quiz questions.
SQL Injection Attacks¶
There is a video of this in the videos, so we'll skip the demo.
There are two personDelete
scripts in that directory. Compare them
with the diff
command:
$ diff personDelete.py personDeleteBad.py
They differ in just one line, shown below. Let's focus on those two ways to incorporate variable data into a SQL statement:
nr = curs.execute('''delete from person where nm = %s''' % (nm,))
nr = curs.execute('''delete from person where nm = %s''' , [nm])
See the difference? The first uses Python's string formatting syntax to create an execute a single string. The latter uses PyMySQL's prepared query syntax.
Does it matter? YES! The former is a SQL injection vulnerability.
Watch as I show you in my copy of the database; you can try yours as
well. Note that I'm using mysql
to look up the person, rather than the
personLookup.py
script because the latter looks in the WMDB and not my
personal database.
Let's do the controlled experiment. First, what should happen:
mysql < wmdb-copy.sql
mysql -e 'select * from person where nm = 666'
python personDelete.py 666
mysql -e 'select * from person where nm = 666'
python personInsert.py 666 Satan
mysql -e 'select * from person where nm = 666'
python personDelete.py '666 or 1 = 1'
mysql -e 'select * from person where nm = 666'
So it works fine, even when some nefarious user gives '666 or 1 = 1' as the NM.
Now, let's show what should not happen:
mysql < wmdb-copy.sql
mysql -e 'select * from person where nm = 666'
python personDeleteBad.py 666
mysql -e 'select * from person where nm = 666'
python personInsert.py 666 Satan
mysql -e 'select * from person where nm = 666'
python personDeleteBad.py '666 or 1 = 1'
mysql -e 'select count(*) from person;'
Oops. The point is that the bad version allows a nefarious user to execute the following SQL code:
delete from person
where nm = 666 or 1 = 1;
Now you can better understand the famous XKCD cartoon illustrating the point:
Summary¶
- See API, above
- Optional setup:
dbi.conf
- as necessary:
dbi.connect
- to query: create a cursor, execute and fetch results
- to update: create a cursor, execute, and commit using connection
- if there are dynamic values from untrustworthy sources, use prepared queries