File Upload¶
There will be times when we want to allow the user to upload files to the server. One example would be pictures for a user profile (like Facebook), for a picture-sharing site like Flickr, or of an item for sale. Or it might be a PDF of a letter of recommendation; I'm often asked to do that.
File Upload with Flask¶
Before class, please read this tutorial on File Upload with Flask.
That previous tutorial page is really too short. The following is better and more complete: Flask File Uploads. Please read it, too.
Issues¶
- Where to store the files (the filesystem or the database)
- Getting them back out of the filesystem or database
- MIME types for files
- The mechanisms for file upload in the browser (new FORM input)
- How to process file upload on the server side in Flask.
Images in Files¶
The most obvious place to put pictures is in the filesystem. For example, we can have an "uploads" subdirectory of our Flask app where we can store headshots of the people in the WMDB. I decided to use the IMDB headshots, and I named them NM.jpg where NM is the person's id in the NAME table.
I added a picfile table to the WMDB (actually to the copy in cs304_db) with the NM and a filename. The filename is almost entirely determined by the NM, but this approach allows us to have NM.gif and NM.png as well as NM.jpeg.
drop table if exists picfile;
create table picfile (
    nm int primary key,
    filename varchar(50),
    foreign key (nm) references person(nm)
        on delete cascade on update cascade
);
describe picfile;
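To experiment with this table without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite is an assumption made only to keep the example self-contained (the course database is MySQL), the person row is made-up sample data, and the helper names set_picfile and get_picfile are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL copy of the WMDB.
conn = sqlite3.connect(':memory:')
conn.execute('pragma foreign_keys = on')
conn.execute('create table person (nm integer primary key, name varchar(30))')
conn.execute('''create table picfile (
    nm integer primary key,
    filename varchar(50),
    foreign key (nm) references person(nm)
        on delete cascade on update cascade)''')

# Made-up sample person, purely for illustration.
conn.execute('insert into person(nm, name) values (?, ?)', (123, 'Sample Person'))

def set_picfile(conn, nm, filename):
    '''Record (or replace) the filename stored for person nm.'''
    conn.execute('insert or replace into picfile(nm, filename) values (?, ?)',
                 (nm, filename))
    conn.commit()

def get_picfile(conn, nm):
    '''Return the stored filename for nm, or None if there is no picture.'''
    row = conn.execute('select filename from picfile where nm = ?',
                       (nm,)).fetchone()
    return row[0] if row else None

set_picfile(conn, 123, '123.jpg')
set_picfile(conn, 123, '123.png')   # a later upload replaces the earlier row
```

Because nm is the primary key, each person has at most one row, so a second upload for the same person overwrites the stored filename rather than adding another one.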
An alternative to putting the picture into the filesystem is to put it into the database using a column datatype called "binary large object" or BLOB. For simplicity, we'll omit BLOBs, but let me know if you'd like to learn about them.
File Upload¶
Uploading an image is the same as uploading any file, so we'll cover that. Here are the general principles:
- You need a fancier kind of FORM element, specifically a different ENCTYPE that handles the binary data of a file by specially encoding it.
- You need a new kind of INPUT element, which gives you a widget for specifying the file.
- Some different processing in the script for getting the data from the file(s) and putting the data either into the filesystem or into a BLOB.
HTML's File Upload INPUT¶
- Use the POST method for your FORM. Recall that one limitation of the GET method is that URLs usually have some bound on length, and we don't want to have that with a file.
- ENCTYPE="multipart/form-data" must be an attribute of the FORM.
- INPUT TYPE=FILE NAME=name-of-file-input
Here's an example:
{% extends "base.html" %}
{% block main %}
<p>Form to upload a file:</p>
<form method="post" action="" enctype="multipart/form-data">
    <p><label>NM: <input type="text" name="nm"></label></p>
    <p><label>Pic: <input type="file" name="pic"></label></p>
    <p><input type="submit" value="upload"></p>
</form>
{% if nm != '' and src != '' %}
    <p>Last upload: {{nm}}</p>
    <p> <img src="{{src}}"></p>
{% endif %}
{% endblock %}
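To see what the multipart/form-data encoding does with a form like this, the following sketch builds such a request body by hand. It is only an illustration of the wire format: the browser does this for you, and the function name encode_multipart and the sample bytes are made up for the example.

```python
import uuid

def encode_multipart(fields, files):
    '''Build a multipart/form-data request body by hand.

    fields: dict mapping input name -> text value
    files:  dict mapping input name -> (filename, content_type, data_bytes)
    Returns (content_type_header, body_bytes).
    '''
    boundary = uuid.uuid4().hex                 # random separator between parts
    delimiter = ('--' + boundary).encode('ascii')
    parts = []
    for name, value in fields.items():
        disposition = 'Content-Disposition: form-data; name="{}"'.format(name)
        parts += [delimiter, disposition.encode('utf-8'), b'',
                  value.encode('utf-8')]
    for name, (filename, content_type, data) in files.items():
        disposition = ('Content-Disposition: form-data; name="{}"; filename="{}"'
                       .format(name, filename))
        parts += [delimiter, disposition.encode('utf-8'),
                  'Content-Type: {}'.format(content_type).encode('ascii'),
                  b'',
                  data]                         # the file's raw bytes, unencoded
    parts += [delimiter + b'--', b'']           # closing delimiter
    return 'multipart/form-data; boundary=' + boundary, b'\r\n'.join(parts)

# The same two inputs as the form above: a text field and a file.
header, body = encode_multipart(
    {'nm': '123'},
    {'pic': ('me.jpg', 'image/jpeg', b'\xff\xd8\xff fake image bytes')})
```

Each input becomes its own part with its own small set of headers, which is why a file part can carry a filename and a content type alongside its bytes.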
File Upload Security Concerns¶
File upload has a number of unique security concerns. We'll discuss:
- filename naming: Don't allow a file to be named something that could be executed via the web, such as name.php or name.cgi. (Did you notice that the linked reading mentioned XSS attacks?)
- file contents: be sure it really is an image (or whatever), rather than some unexpected content.
- file size: a malicious user could try to overwhelm your server or use up all the disk space.
- access to uploaded content: if you can, don't put the uploaded file in a location accessible from the web. This helps you avoid people using your system as, for example, a porn server. Flask allows us to implement a route that serves images, so we'll know what images we are serving.
- access to the upload script: if you can, require people to be authenticated before they upload anything. Then, hopefully, you can force them to be responsible for their actions. It's not a bad idea to keep track of who uploads each file.
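The filename concern can be sketched in a few lines: build the stored name from data you control (here the numeric NM) and accept only extensions on an allow-list. The allow-list contents and the function name stored_filename are assumptions for illustration, not part of the course code.

```python
import os

# Assumption: this site accepts only these image types.
ALLOWED_EXTENSIONS = {'jpg', 'jpeg', 'png', 'gif'}

def stored_filename(nm, user_filename):
    '''Return the name to store an upload under, such as 123.jpg.

    Only the extension is taken from the user's filename, and only if it
    is on the allow-list; everything else comes from the integer nm.
    Raises ValueError for a disallowed extension or a non-numeric nm.
    '''
    ext = os.path.splitext(user_filename)[1].lower().lstrip('.')
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError('file type not allowed: {!r}'.format(ext))
    return '{}.{}'.format(int(nm), ext)
```

With this approach a name like shell.php or ../../etc/passwd is rejected outright, and an accepted upload can never choose its own name on the server.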
MIME Types¶
When the back-end delivers content to the browser, it is usually labeled with the type of data. For example, a JPEG image is sent with the header Content-Type: image/jpeg. That's its MIME type.
- MIME stands for Multipurpose Internet Mail Extensions. It was originally for emailing attachments. The attachment was labeled as to its type, so that helper applications could be invoked appropriately, without necessarily depending on a filename extension.
- The HTTP headers from the server should always announce the MIME type of the data being returned to the client. When the data comes from a file, the server figures out the type in a variety of ways, such as the file extension and the "file" command.
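As a small illustration of the file-extension approach, Python's standard mimetypes module maps a filename to a MIME type. The wrapper name guess_mime and the fallback value are choices made for this sketch.

```python
import mimetypes

def guess_mime(filename, default='application/octet-stream'):
    '''Guess a MIME type from the filename extension alone.

    This never looks at the file's contents, so it is only as
    trustworthy as the filename.
    '''
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or default

jpeg_type = guess_mime('123.jpg')
pdf_type = guess_mime('letter.pdf')
unknown_type = guess_mime('mystery')    # no extension, so the default is used
```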
Working Flask Example¶
Copy upload to your account:
cd ~/cs304/
source venv/bin/activate
cp -r ~cs304/pub/downloads/upload upload
cd upload
I'll demo and explain the following:
- create-table-picfile.sql to create the table
- insert-picfiles.py to bulk-insert the existing pics
- show-picfiles.sql to list the people with pictures
- app.py the example app that does file upload to files
- delete-image.sh to delete an image
Running It¶
- create the table using create-table-picfile.sql
- bulk-insert the files using insert-picfiles.py
- list them, just to see, using picfiles.sql
- run app-upload-files.py
- display one picture; notice the URL
- display all pictures; notice the URL
- upload a picture
- (optional) delete the picture using delete-image.sh
- repeat
I recorded a video demonstration of file upload in videos
Flask Code¶
In a moment, we'll look at the app.py code. It's lengthy, but comprehensible.
Here's a road map:
- There's some configuration (upload directory name, etc.) at lines 18-19
- There's a route @app.route('/pic/<nm>') (line 25) which just sends a single picture to the browser. It does that by using the Flask send_from_directory function, getting the filename out of the database.
- There's a route @app.route('/pics/') (line 39) that sends a page with all the images on it. Each image is actually shown by the first route, but this shows how we can combine database queries with images.
- There's a route @app.route('/upload/', methods=["GET", "POST"]) (line 48) that does the upload. The meat of it is only a dozen lines of code or so.
Here's the complete app:
'''This example implements file upload to the filesystem'''

from flask import (Flask, render_template, make_response, request, redirect,
                   url_for, session, flash, send_from_directory, Response)
from werkzeug.utils import secure_filename
app = Flask(__name__)

import sys, os
import cs304dbi as dbi

import secrets
app.secret_key = secrets.token_hex()

# This gets us better error messages for certain common request errors
app.config['TRAP_BAD_REQUEST_ERRORS'] = True

# new for file upload
app.config['UPLOADS'] = 'uploads'
app.config['MAX_CONTENT_LENGTH'] = 1*1024*1024 # 1 MB

@app.route('/')
def index():
    return render_template('base.html')

@app.route('/pic/<nm>')
def pic(nm):
    conn = dbi.connect()
    curs = dbi.dict_cursor(conn)
    numrows = curs.execute(
        '''select filename from picfile where nm = %s''',
        [nm])
    if numrows == 0:
        flash('No picture for {}'.format(nm))
        return redirect(url_for('index'))
    row = curs.fetchone()
    return send_from_directory(app.config['UPLOADS'],row['filename'])

@app.route('/pics/')
def pics():
    conn = dbi.connect()
    curs = dbi.dict_cursor(conn)
    curs.execute('''select nm,name,filename
                    from picfile inner join person using (nm)''')
    pics = curs.fetchall()
    return render_template('all_pics.html',n=len(pics),pics=pics)

@app.route('/upload/', methods=["GET", "POST"])
def file_upload():
    if request.method == 'GET':
        return render_template('form.html',src='',nm='')
    else:
        try:
            nm = int(request.form['nm']) # may throw error
            f = request.files['pic']
            user_filename = f.filename
            ext = user_filename.split('.')[-1]
            filename = secure_filename('{}.{}'.format(nm,ext))
            pathname = os.path.join(app.config['UPLOADS'],filename)
            f.save(pathname)
            conn = dbi.connect()
            curs = dbi.dict_cursor(conn)
            curs.execute(
                '''insert into picfile(nm,filename) values (%s,%s)
                   on duplicate key update filename = %s''',
                [nm, filename, filename])
            conn.commit()
            flash('Upload successful')
            # decided to just re-render the form (rather than use
            # post-redirect-get) so that we can display the uploaded
            # picture.
            return render_template('form.html',
                                   src=url_for('pic',nm=nm),
                                   nm=nm)
        except Exception as err:
            flash('Upload failed {why}'.format(why=err))
            return render_template('form.html',src='',nm='')

if __name__ == '__main__':
    if len(sys.argv) > 1:
        # arg, if any, is the desired port number
        port = int(sys.argv[1])
        assert(port>1024)
    else:
        port = os.getuid()
    dbi.conf() # connect to personal database
    app.debug = True
    app.run('0.0.0.0',port)
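To try the upload logic without a browser or a database, here is a stripped-down sketch that exercises it with Flask's test client. It is not the course app: the database and templates are left out, the /pic/ route takes a filename instead of an NM, the upload directory is a temporary scratch directory, and the uploaded bytes are fake.

```python
import io
import os
import tempfile

from flask import Flask, request, send_from_directory
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.config['UPLOADS'] = tempfile.mkdtemp()       # assumption: scratch directory
app.config['MAX_CONTENT_LENGTH'] = 1*1024*1024   # 1 MB, as in the app above

@app.route('/upload/', methods=['POST'])
def file_upload():
    nm = int(request.form['nm'])
    f = request.files['pic']
    ext = f.filename.split('.')[-1]
    filename = secure_filename('{}.{}'.format(nm, ext))
    f.save(os.path.join(app.config['UPLOADS'], filename))
    return filename                              # no template, just the name

@app.route('/pic/<filename>')
def pic(filename):
    # Simplification: look the file up by name rather than via the database.
    return send_from_directory(app.config['UPLOADS'], filename)

client = app.test_client()

# A small upload: a (stream, filename) pair marks the value as a file.
ok = client.post('/upload/',
                 data={'nm': '123',
                       'pic': (io.BytesIO(b'not really a jpeg'), 'me.jpg')},
                 content_type='multipart/form-data')

# Fetch it back, the way an <img> tag would.
served = client.get('/pic/123.jpg')
served_bytes = served.data
served_type = served.mimetype
served.close()

# A 2 MB upload exceeds MAX_CONTENT_LENGTH and should be refused.
too_big = client.post('/upload/',
                      data={'nm': '124',
                            'pic': (io.BytesIO(b'x' * (2*1024*1024)), 'big.jpg')},
                      content_type='multipart/form-data')
```

Note that the served type comes from the .jpg extension, not from the contents, which is one reason the content check discussed under the security concerns matters.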
The HTML Code¶
There are two parts to the HTML code:
- the browser requesting a particular picture from the server so that it can be displayed (this would be a GET request) with the server sending the picture data to the browser, and
- the browser sending a picture to the server in order for it to be uploaded.
Let's look at them in order. The browser can request a picture in a number of ways, but the most common is as the SRC attribute of an <IMG> tag. In our example code, the template file looks like:
<p><img src="{{url_for('pic',nm=pic['nm'])}}" alt="{{pic['name']}}"></p>
In the code above, the pic variable is a dictionary read from the database. Among its keys are the nm and the name of the person. We need the nm for the URL, and therefore for the url_for function, which refers to our pic(nm) handler function.
Now let's look at uploading a picture. That requires a special form:
<form method="post" action="{{url_for('file_upload')}}" enctype="multipart/form-data">
<p><label>NM: <input type="text" name="nm"></label></p>
<p><label>Pic: <input type="file" name="pic"></label></p>
<p><input type="submit" value="upload"></p>
</form>
Notice that the data (the nm, which is text, and the pic, which is a file) are sent to our file_upload function, which we saw above.
Caching¶
As we know, browsers cache files, including CSS, JS and images. That can cause trouble when we upload a new image to replace an existing one. I've solved this and I have a decent writeup, but haven't yet had time to write it up for the web. Let me know if you run into this in your project.
Security Concerns¶
Very high. Bad stuff can happen when you allow untrustworthy stuff onto your server.
- Control the access: don't allow anonymous people to upload stuff for all to see. Require logins, sessions, etc.
- Control the filename. Don't let people name uploads because they might name it as a command or something executable via the web.
- Control the size; don't let people overwhelm your server with huge files.
- Control the location. You don't want people being able to use your server as an image server, serving porn or something.
- Control the content. In the shell, you can use the Linux file command to check what kind of data this is. In Python, you have to use an extra module, such as imghdr or filemagic.
Summary¶
- File upload is an important feature of many web databases, not just Facebook and Yelp. You should consider its usefulness in your project.
- File upload can be a security risk, so you should be careful.
Specific technology pieces:
- form enctype: <FORM ENCTYPE="MULTIPART/FORM-DATA" ...>
- form inputs: <INPUT TYPE=FILE NAME=name-of-file-input>
- folders: you'll probably create a folder for uploaded files, say, uploads, next to templates and static. You could, of course, have different folders for different kinds of files, etc.
- files: you'll need a naming scheme for your uploaded files.
  - You could use a counter: fileNNN.jpg
  - You could use an ID: fileNM.jpg
  - You could use a timestamp: file-2022-04-01-23-01-12.jpg (your phone's camera does that)
- You'll probably store the filename in an appropriate database table, unless the naming scheme allows the filename to be inferred from other data.
- Flask's request object has a files dictionary, containing all the uploaded files, as file-like objects (Werkzeug FileStorage objects).
- These file objects have a .save() method, allowing you to save the file to your desired pathname.
- You'll construct your pathname from the uploads folder and the chosen filename using os.path.join(folder, file) from the Python os module.
- Avoid user data in the filename/pathname, though that is possible.
- Use secure_filename if there's any user data in the filename: from werkzeug.utils import secure_filename
- To deliver files to the browser, use send_from_directory(uploaddir, filename)
- The browser requests files (e.g. images) in a separate request, so you have to have a route/endpoint to respond to those requests.
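The three naming schemes mentioned in the summary might be sketched like this. The function names, the three-digit padding for the counter, and the default jpg extension are assumptions for illustration.

```python
from datetime import datetime

def counter_name(n, ext='jpg'):
    '''fileNNN.jpg: n would come from a counter you maintain, e.g. in the database.'''
    return 'file{:03d}.{}'.format(n, ext)

def id_name(nm, ext='jpg'):
    '''fileNM.jpg: named after the row the picture belongs to.'''
    return 'file{}.{}'.format(int(nm), ext)

def timestamp_name(when, ext='jpg'):
    '''file-YYYY-MM-DD-HH-MM-SS.jpg: named after the upload time.'''
    return 'file-{}.{}'.format(when.strftime('%Y-%m-%d-%H-%M-%S'), ext)

example = timestamp_name(datetime(2022, 4, 1, 23, 1, 12))
```

Only the ID scheme lets you infer the filename from other data; with a counter or a timestamp, you would store the filename in the database. A timestamp with one-second resolution can also collide if two uploads arrive in the same second.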
Appendix¶
The following is optional
The Python imghdr Package¶
Being able to identify the type of a file is important. We can't trust the way that the file is named; we need to look at the actual data. One way is the Python imghdr package.
You can try it by running python and using the following code:
import imghdr
imghdr.what('uploads/123.jpg')
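Note that imghdr was deprecated in Python 3.11 and removed in Python 3.13, so the code above needs an older Python. A rough alternative is to compare the first few bytes of the data against the well-known signatures of the formats you accept. The sketch below does that for JPEG, PNG and GIF only; the function names are made up, and it checks nothing beyond the header, so a file could still be malformed or have other content appended.

```python
# Leading "magic" bytes of a few common image formats.
SIGNATURES = [
    (b'\xff\xd8\xff', 'jpeg'),
    (b'\x89PNG\r\n\x1a\n', 'png'),
    (b'GIF87a', 'gif'),
    (b'GIF89a', 'gif'),
]

def sniff_image_type(data):
    '''Return 'jpeg', 'png' or 'gif' based on the leading bytes, else None.'''
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return None

def sniff_image_file(pathname):
    '''Same check, reading only the first few bytes of a file on disk.'''
    with open(pathname, 'rb') as f:
        return sniff_image_type(f.read(16))
```

For example, sniff_image_file('uploads/123.jpg') would play the role of the imghdr.what call above, and an upload handler could reject anything for which the result is None.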