File Upload
There will be times when we want to allow the user to upload files to the server. One example would be pictures for a user profile (like Facebook), for a picture-sharing site like Flickr, or of an item for sale. Or, it might be a PDF of a letter of recommendation; I'm often asked to upload those.
- File Upload with Multer
- Issues
- Images in Files
- File Upload
- HTML's File Upload INPUT
- File Naming
- Example 1, Open
- Example 1, Configure Multer
- Three Examples
- Open Example 1, Endpoints and Handlers
- Example 1, Upload Form
- Example 1, Upload
- Analysis
- Limited Example 2: Open with Authentication and Authorization
- Example 2, Upload
- Analysis
- Example 3, Private
- Error Handling
- Video
- Conclusion
- Summary
- Optional Chaining
File Upload with Multer
Multer is a standard Node.js module for handling requests that are
encoded as multipart/form-data, which is the format for requests
that include files. We'll talk more about that in a minute. Meanwhile,
before class, please read this tutorial on
Issues
- Where to store the files (the filesystem or the database)
- Getting them back out of the filesystem or database
- MIME types for files
- The mechanisms for file upload in the browser (a new kind of FORM input)
- How to process file upload on the server side in Express.
Images in Files
The most obvious place to put pictures is in the
filesystem. (Terminology: the "filesystem" is the place where files
and folders live. It's usually a disk but could be a thumb drive or
something else.) For example, we can have an uploads subdirectory of
our Express app where we can store files.
Our MongoDB database collection can have documents like:
{
_id: ObjectId("642496122c0d194d852a091b"),
title: 'john lewis',
path: '/uploads/photo-154834.jpg'
},
{
_id: ObjectId("642496fd2c0d194d852a091c"),
title: 'george clooney',
path: '/uploads/photo-155229.png'
}
(An alternative to putting the picture into the filesystem is to put it into the database which has some advantages in many situations, but it's conceptually easier to put them in the filesystem, so that's what we'll do. Let me know if you're curious about saving them in the database.)
File Upload
Uploading an image is the same as uploading any file, so we'll cover that. Here are the general principles:
- You need a fancier kind of FORM element, specifically a different ENCTYPE that handles the binary data of a file by specially encoding it.
- You need a new kind of INPUT element, which gives you a widget for specifying the file.
- You need some different processing in server.js for getting the file from the request.
HTML's File Upload INPUT
- Use the POST method for your FORM. Recall that one limitation of the GET method is that URLs usually have some bound on length, and we don't want to have that with a file.
- ENCTYPE="multipart/form-data" must be an attribute of the FORM. Without it, the browser sends only the file's name, not its contents.
- Use INPUT TYPE=FILE NAME=name-of-file-input for the file itself.
Here's an example:
<form action="/upload" method="POST" enctype="multipart/form-data">
  <p>
    <label for="title">title
      <input type="text" id="title" name="title" placeholder="photo title">
    </label>
  </p>
  <p>
    <label>file
      <input type="file" accept="image/*" name="photo">
    </label>
  </p>
  <p>
    <input type="submit" value="Submit Uploads">
    <input type="reset" value="Reset">
  </p>
</form>
File Naming
The uploaded files will have names on the user's computer, but we should not necessarily use those names. In fact, we probably should not, for reasons of uniqueness and security.
In principle, files will be uploaded by lots of independent
users. Imagine an app where students can upload resumes. Almost
everyone will upload a file called resume.pdf or something pretty
similar.[1] So, we need a way to generate unique
filenames. Then we'll need to store the filenames in the database.
How to name the files? My cellphone names pictures based on the time
the picture was taken, like YYYYMMDDHHMMSS. (The first four digits are
the year, the next two are the month, and so forth down.) That naming
scheme assumes I don't take more than one picture per second. Pretty
reasonable for a cell phone, but maybe not for a web application that
allows uploads from many users at the same time. The Multer
documentation suggests a pattern like <type>-<time>-<random> with
the <type> being the name of the file input (such as avatar or
profile or some such), the <time> being the number of milliseconds
since 1/1/1970[2] (a huge number) and the <random> being a
9-digit random number, so that even two file uploads of the same type
at the same millisecond would be highly unlikely to match. That seems
excessive for our example, so we'll use a similar pattern but
<type>-<hhmmss>.<ext>, where the <ext> is the file extension like
jpeg or png. The <hhmmss> is actually inadequate for long-term
use (files exactly a day apart would collide), but it is fine for a
simple demo.
You'll have to think about how you want to generate filenames in your projects.
Here's a helper function we will use:
/* Input is an (optional) Date object. Returns a string like '123456'
   for 56 seconds past 12:34. If the argument is omitted, the current
   time is used.
*/
function timeString(dateObj) {
  if (!dateObj) {
    dateObj = new Date();
  }
  // convert val to a two-digit string (declared, not an implicit global)
  const d2 = (val) => val < 10 ? '0' + val : '' + val;
  const hh = d2(dateObj.getHours());
  const mm = d2(dateObj.getMinutes());
  const ss = d2(dateObj.getSeconds());
  return hh + mm + ss;
}
The local function d2 just takes a numeric input and returns a
two-digit string, so 5 becomes '05' and 34 becomes '34'.
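To see the naming scheme end to end, here is a small self-contained sketch. It repeats timeString and adds a hypothetical makeFilename helper (not part of the course code) that mirrors what our Multer filename callback will do. It uses fixed local dates so the output is predictable, and it shows the weakness mentioned above: two uploads at the same time of day, one day apart, get the same name.

```javascript
// Same logic as timeString above, repeated so this snippet runs on its own.
function timeString(dateObj) {
  if (!dateObj) {
    dateObj = new Date();
  }
  const d2 = (val) => (val < 10 ? '0' + val : '' + val);
  return d2(dateObj.getHours()) + d2(dateObj.getMinutes()) + d2(dateObj.getSeconds());
}

// Hypothetical helper: builds <fieldname>-<hhmmss>.<ext>, like our filename callback.
function makeFilename(fieldname, originalname, dateObj) {
  const parts = originalname.split('.');
  const ext = parts[parts.length - 1];
  return fieldname + '-' + timeString(dateObj) + '.' + ext;
}

// new Date(year, monthIndex, day, hours, minutes, seconds) uses local time,
// which is also what getHours() and friends report.
const first = new Date(2023, 2, 27, 15, 48, 34);
const nextDay = new Date(2023, 2, 28, 15, 48, 34);

console.log(timeString(first));                                     // 154834
console.log(makeFilename('photo', 'john-lewis-507376.jpg', first)); // photo-154834.jpg
console.log(makeFilename('photo', 'george.jpg', nextDay));          // photo-154834.jpg (collision!)
```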
Example 1, Open
Our first example focuses on the basic functionality:
- users can upload files along with a title
- all uploaded files are displayed on the main page
- files are served out like any other static file (e.g. CSS files)
Example 1, Configure Multer
First, we need to configure Multer. We'll upload the files to a folder
called uploads and we'll serve them out of that folder as static
files. We'll do that like this:
app.use('/uploads', express.static('uploads'));
Next, we configure Multer's storage property. Multer can store
files (temporarily) in memory or permanently on disk. We'll use the
latter. We have to define functions that determine the folder it will
store the files in (destination) and the file naming pattern
(filename):
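const multer = require('multer');

const storage = multer.diskStorage({
  // store uploads in the 'uploads' folder; because destination is a
  // function here, Multer expects that folder to exist already
  destination: function (req, file, cb) {
    cb(null, 'uploads');
  },
  // name each file <fieldname>-<hhmmss>.<ext>, e.g. photo-154834.jpg
  filename: function (req, file, cb) {
    const parts = file.originalname.split('.');
    const ext = parts[parts.length - 1];
    const hhmmss = timeString();
    cb(null, file.fieldname + '-' + hhmmss + '.' + ext);
  }
});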
Finally, we create a middleware function (called upload) using the
multer module. Its argument is an object of storage
specifications and, optionally, file-size limits. We should almost
always specify file-size limits to avoid attacks in which someone
ties up our server and exhausts our disk space by uploading enormous
files.
const upload = multer({
  storage: storage,
  // max fileSize in bytes; a bigger file makes Multer report a
  // LIMIT_FILE_SIZE error, which looks ugly unless we handle it
  limits: { fileSize: 1_000 }
});
The value of upload is a function that is suitable for use as
middleware. Recall that middleware are functions in Express that
take arguments like this:
function (req, res, next) { ... }
Middleware functions are executed with the request before our final
handler gets it, with each invoking next to execute the next
function in the middleware chain.
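To make the "each one invokes next" idea concrete, here is a toy dispatcher. It is a simplified illustration, not Express's real implementation, and runChain, fakeUpload, and handler are hypothetical names. The fake upload step plays the role Multer will play: it puts data on req and then calls next(), so the final handler can rely on it.

```javascript
// Toy dispatcher (NOT Express's real code): calls each (req, res, next)
// function in order; a function continues the chain by calling next().
function runChain(fns, req, res) {
  let i = 0;
  function next(err) {
    if (err) {
      res.error = err; // a real app would hand this to an error handler
      return;
    }
    const fn = fns[i++];
    if (fn) {
      fn(req, res, next);
    }
  }
  next();
}

// Stand-in for upload.single('photo'): pretend we parsed the multipart form.
function fakeUpload(req, res, next) {
  req.body = { title: 'john lewis' };
  req.file = { fieldname: 'photo', filename: 'photo-154834.jpg' };
  next();
}

// Our final handler only runs after the middleware, so req.file is ready.
function handler(req, res) {
  res.body = 'saved ' + req.body.title + ' at /uploads/' + req.file.filename;
}

const req = {};
const res = {};
runChain([fakeUpload, handler], req, res);
console.log(res.body); // saved john lewis at /uploads/photo-154834.jpg
```

If a middleware calls next(err) instead, the chain stops and the handler never runs, which is roughly how Multer's file-size error reaches an error handler later on.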
In Express, we usually let the middleware run for every
request. That's what app.use(middleware) does. For Multer, we don't
want it to run for every request. We only want it to run for our
upload endpoint. We'll get to that in a minute.
Three Examples
The rest of this document will describe three examples, all of which
are in the fileUploads folder in /cs304node/apps:
- open.js, which shows the basic upload functionality without any security restrictions. This demo would be appropriate for a corporate or academic intranet where all the users are trustworthy.
- limited.js, which has some improved error handling and some important security restrictions. Unfortunately, there's a subtle security hole, so this isn't quite right.
- private.js, which fixes the security hole from the limited example.
Open Example 1, Endpoints and Handlers
For the open example, we'll just have a few endpoints, namely the
main page and the route that does the upload. The example creates a
collection called files in your personal database, whose name we can
get from the USER environment variable. So we have these global
constants:
const DB = process.env.USER; // 'uploadTest';
const FILES = 'files';
The main page is generated by the following handler:
app.get('/', async (req, res) => {
const db = await Connection.open(mongoUri, DB);
let files = await db.collection(FILES).find({}).toArray();
return res.render('open.ejs', {uploads: files});
});
There's nothing really surprising here. The handler gets every file
from the collection and passes the array to the render function.
The EJS file can be broken down into two major portions. The first is the form to allow file upload, which we saw above, but we'll review in a moment. The second is the display of the currently uploaded files:
<div class="container">
<% uploads.forEach( file => { %>
<figure>
<img class="uploadedImg" src="<%= file.path %>" alt="<%= file.title %>">
<figcaption><%= file.title || 'Untitled' %></figcaption>
</figure>
<% }) %>
</div>
That uses the file.path property, which will be something like
/uploads/photo-123456.jpeg. The documents may also have a
file.title property that we can use for captions and alt text.
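If it helps to see what the template produces, here is a rough plain-JavaScript equivalent of the EJS loop. It is only a sketch: renderUploads and escapeHtml are hypothetical names, and escapeHtml approximates the HTML escaping that <%= %> performs (EJS's exact entity choices may differ slightly). Note how a document without a title gets an empty alt and the 'Untitled' caption.

```javascript
// Roughly what <%= %> does: escape characters that are special in HTML.
// EJS prints undefined/null as an empty string, which we mimic here.
function escapeHtml(value) {
  if (value === undefined || value === null) return '';
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical plain-JS version of the EJS loop: one <figure> per upload.
function renderUploads(uploads) {
  return uploads
    .map((file) =>
      '<figure>' +
      `<img class="uploadedImg" src="${escapeHtml(file.path)}" alt="${escapeHtml(file.title)}">` +
      `<figcaption>${escapeHtml(file.title || 'Untitled')}</figcaption>` +
      '</figure>')
    .join('\n');
}

console.log(renderUploads([
  { title: 'john lewis', path: '/uploads/photo-154834.jpg' },
  { path: '/uploads/photo-155229.png' }, // no title
]));
```

The escaping matters because titles come from users: a title containing HTML is shown as text rather than interpreted by the browser.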
Example 1, Upload Form
Let's review the file upload form:
<form action="/upload" method="POST" enctype="multipart/form-data">
  <p>
    <label for="title">title
      <input type="text" id="title" name="title" placeholder="photo title">
    </label>
  </p>
  <p>
    <label>file
      <input type="file" accept="image/*" name="photo">
    </label>
  </p>
  <p>
    <input type="submit" value="Submit Uploads">
    <input type="reset" value="Reset">
  </p>
</form>
- it uses method="POST"
- it uses enctype="multipart/form-data"; that's necessary for file upload
- it has a type=text input for the title of the photo
- it has a type=file input for the photo itself
- its action="/upload" is where the form data will go
Now, let's look at the /upload handler, which is where the form data
(including the file) goes.
Example 1, Upload
The /upload endpoint uses a syntax that tucks the upload
middleware in after the first argument, shifting our handler function to
the third argument. That means that the upload middleware will run
before ours. It will parse the multipart form data and put the data
in the request (req) object: the text fields in req.body and the file
in req.file. In this case, we know the form is just
uploading a single file and the input name is photo, so we use the
following as the middleware function:
upload.single('photo')
(There are other variations, such as upload.array() and upload.fields(), if you are allowing the user to upload several files at a time and so forth.)
Here's the full handler, including our code:
app.post('/upload', upload.single('photo'), async (req, res) => {
  console.log('uploaded data', req.body);
  console.log('file', req.file);
  // insert file data into mongodb (FILES is the collection name defined above)
  const db = await Connection.open(mongoUri, DB);
  const result = await db.collection(FILES).insertOne({
    title: req.body.title,
    path: '/uploads/' + req.file.filename
  });
  console.log('insertOne result', result);
  return res.redirect('/');
});
We can get an object representing the uploaded file as
req.file. That object has properties like:
file {
fieldname: 'photo',
originalname: 'john-lewis-507376.jpg',
encoding: '7bit',
mimetype: 'image/jpeg',
destination: 'uploads',
filename: 'photo-154834.jpg',
path: 'uploads/photo-154834.jpg',
size: 2081
}
- fieldname is the name of the form input; here it's photo.
- originalname is the name of the file on the user's computer.
- mimetype is a standard representation of the kind of data the file is.
- destination is where the file was stored.
- filename is our server-side generated filename. This file was uploaded at 3:48pm.
- path is the combination of destination and filename.
- size is the size of the file in bytes. This one is pretty small.
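The document we insert is built from two of these properties. It's easy to confuse file.path (where the file sits on the server's disk, relative to the folder the app runs in) with the URL path the browser will request, so here is that mapping as a tiny sketch. toFileDoc is a hypothetical name, not part of the course code.

```javascript
// The req.file object from the console.log output above.
const file = {
  fieldname: 'photo',
  originalname: 'john-lewis-507376.jpg',
  encoding: '7bit',
  mimetype: 'image/jpeg',
  destination: 'uploads',
  filename: 'photo-154834.jpg',
  path: 'uploads/photo-154834.jpg',
  size: 2081,
};
const body = { title: 'john lewis' };

// Hypothetical helper: the document our /upload handler inserts into MongoDB.
// We store a URL path (with a leading slash), not Multer's file.path.
function toFileDoc(body, file) {
  return { title: body.title, path: '/uploads/' + file.filename };
}

console.log(file.path);                  // uploads/photo-154834.jpg  (location on disk)
console.log(toFileDoc(body, file).path); // /uploads/photo-154834.jpg (URL for the <img> tag)
```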
The rest of the function is code you've seen before: it connects to the database, and inserts one document in the collection.
That's it! We'll play with this demo in class, but it covers all the basics.
Analysis
There are absolutely no restrictions on who can upload a file or what they can upload. Furthermore, anyone can see the uploaded files. As mentioned earlier, this might be appropriate for an environment (such as a corporate intranet) where everyone is authenticated and trustworthy.
In more general situations, we might want to require people to log in before uploading a file, and files could then be associated with an owner. Furthermore, we might want to limit who can view the uploaded files. For example, in many social media apps, you can limit viewing to just yourself or just people on your friends list, and so forth.
The limited app is more protected. The new app will have users with
logins and each user will have a collection of photos that they own.
- Users should be required to log in before they can upload. That way, if anyone uploads something that shouldn't be there, you'll know who did it.
- We'll check whether someone is authorized to view the photo collection for a given user.
For simplicity, we'll have only the user themself authorized to view
their photos, but we'll have an isAuthorizedToView function that
could, in principle, look up whether the logged-in user is authorized
to view the files in some other person's collection.
(FYI, these two related ideas are called authentication and authorization. Authentication establishes who someone is, while authorization determines what they are allowed to do.)
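The text doesn't show isAuthorizedToView itself, so here is one possible sketch. With an empty sharing table it reduces to the owner-only rule described above; the sharedWith Map, and its example entry, are hypothetical stand-ins for the "in principle" lookup, which a real app would keep in the database.

```javascript
// Hypothetical sharing table: owner -> usernames allowed to view that owner's photos.
// With no entries, this reduces to the "only the owner" rule.
const sharedWith = new Map([
  ['alice', ['bob']], // example entry for illustration only
]);

// Returns true if `viewer` (the logged-in username) may see photos owned by `owner`.
function isAuthorizedToView(viewer, owner) {
  if (!viewer) {
    return false; // not logged in
  }
  if (viewer === owner) {
    return true; // everyone may see their own photos
  }
  const allowed = sharedWith.get(owner) || [];
  return allowed.includes(viewer);
}

console.log(isAuthorizedToView('alice', 'alice')); // true
console.log(isAuthorizedToView('bob', 'alice'));   // true (shared)
console.log(isAuthorizedToView('carol', 'alice')); // false
```

A Map is used rather than a plain object so that a username like "constructor" can't accidentally match a built-in property.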
Limited Example 2: Open with Authentication and Authorization
The next example has forms to register and login, just like we saw in the reading on passwords. For the sake of brevity, I'll skip all that.
There will now be two collections:
- fileOwners, which is a list of documents, each of which is a registered user and their password. (In this example, we didn't hash the passwords.)
- filesOwned, which is similar to our previous files collection, but now each document records the owner of the file in a property of that name.
Therefore, we define these constants:
// collections in the user's personal database
const DB = process.env.USER;
const FILES = 'filesOwned';
const USERS = 'fileOwners';
Once someone is logged in, they are redirected to a route similar to this one:
// the photos of the logged-in user
app.get('/myphotos', async (req, res) => {
  const username = req.session.username;
  if (!username) {
    console.log("not logged in");
    req.flash('info', "You are not logged in");
    return res.redirect('/login');
  }
  const db = await Connection.open(mongoUri, DB);
  const uploads = await db.collection(FILES).find({owner: username}).toArray();
  const users = await db.collection(USERS).find({}).toArray();
  const userId = req.session.userId;
  return res.render('auth.ejs', {username, userId, users, uploads});
});
This code just takes the username from the session and uses it to
look up the files that have {owner: username}. It also looks up all
the users in the app. Finally, it renders a page with that
information. The page looks like this:
As you can see, there's a list of links to all users. Each of those is a URL that comes to this endpoint:
// The :username in the URL (endpoint) is the username of the person
// whose photos we want to view.
app.get('/photos/:username', async (req, res) => {
  const photoOwner = req.params.username; // username of owner of photos
  const username = req.session.username;  // username of the logged-in user
  if (!username) {
    console.log("not logged in");
    req.flash('info', "You are not logged in");
    return res.redirect('/login');
  }
  if (!isAuthorizedToView(username, photoOwner)) {
    console.log("not authorized");
    req.flash('info', "You are not allowed to view this person's photos");
    // send them to the main page
    return res.redirect('/');
  }
  // database lookup
  const db = await Connection.open(mongoUri, DB);
  const uploads = await db.collection(FILES).find({owner: photoOwner}).toArray();
  const users = await db.collection(USERS).find({}).toArray();
  const userId = req.session.userId;
  return res.render('auth.ejs', {username, userId, users, uploads});
});
Most of that code is the same as the code for the /myphotos handler,
but now it's possible that the logged-in user (req.session.username)
is not the same as the person whose photos these are
(photoOwner). Before we allow the person to view the pictures, we
check if they are authorized (isAuthorizedToView). If they are not
authorized, we flash an error message and send them on their
way. Otherwise, we proceed exactly as we did with /myphotos.
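Because those two checks decide everything about access, it can help to pull them into a plain function that is easy to test without a server or database. This is a sketch: photosRedirect is a hypothetical name, and isAuthorizedToView here is the simple owner-only rule.

```javascript
// Owner-only rule, as in the simplified app.
function isAuthorizedToView(viewer, owner) {
  return viewer === owner;
}

// Hypothetical helper mirroring the checks in /photos/:username.
// Returns the URL to redirect to, or null if the page may be rendered.
function photosRedirect(sessionUsername, photoOwner) {
  if (!sessionUsername) {
    return '/login'; // not logged in
  }
  if (!isAuthorizedToView(sessionUsername, photoOwner)) {
    return '/'; // logged in, but not allowed to see these photos
  }
  return null;
}

console.log(photosRedirect(undefined, 'alice')); // /login
console.log(photosRedirect('bob', 'alice'));     // /
console.log(photosRedirect('alice', 'alice'));   // null
```

The route handler would then just redirect when the result is non-null and otherwise do the database lookup and render.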
Example 2, Upload
The procedure for upload is almost the same, but we also have to record the owner of the file and check that the user is logged in.
app.post('/upload', upload.single('photo'), async (req, res) => {
const username = req.session.username;
if (!username) {
req.flash('info', "You are not logged in");
return res.redirect('/login');
}
console.log('uploaded data', req.body);
console.log('file', req.file);
// insert file data into mongodb
const db = await Connection.open(mongoUri, DB);
const result = await db.collection(FILES)
.insertOne({title: req.body.title,
owner: username,
path: '/uploads/'+req.file.filename});
console.log('insertOne result', result);
// always nice to confirm with the user
req.flash('info', 'file uploaded');
return res.redirect('/');
});
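One subtlety: upload.single('photo') runs before our handler, so by the time the handler discovers that nobody is logged in, Multer has already written the file into the uploads folder. One way to avoid that (a sketch, not the course code) is a hypothetical requireLogin middleware listed before Multer, so anonymous requests are turned away before any file is saved. It assumes req.flash is available, as in the handlers above.

```javascript
// Hypothetical middleware: reject anonymous requests before Multer runs.
function requireLogin(req, res, next) {
  if (!req.session || !req.session.username) {
    req.flash('info', 'You are not logged in');
    return res.redirect('/login');
  }
  next();
}

// In the app it would sit ahead of Multer in the chain (not run here):
// app.post('/upload', requireLogin, upload.single('photo'), async (req, res) => { ... });

// Quick check with fake request/response objects.
const flashed = [];
const fakeReq = { session: {}, flash: (kind, msg) => flashed.push(msg) };
const fakeRes = {
  redirectedTo: null,
  redirect(url) { this.redirectedTo = url; },
};
let nextCalled = false;
requireLogin(fakeReq, fakeRes, () => { nextCalled = true; });
console.log(fakeRes.redirectedTo, nextCalled); // /login false
```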
Analysis
This works well and does almost exactly what we want. However, there's
a flaw. The URLs for the various images look like
/uploads/photo-202341.png, and those URLs are handled by the
express.static middleware, which doesn't check for authorization. So the
URLs work whether or not the correct person is logged in!
Here's a scenario:
- Alice logs in.
- She uploads a photo, and its URL is /uploads/photo-202341.png
- She's pleased with it, and sends the URL to a friend.
- That friend is able to view the photo without being Alice.
- In fact, the friend doesn't even have to log in!
- The "friend" can share the URL with other people as well.
- Alice has completely lost control of her picture.
(This is very much like the "anyone with the link" sharing option in Google Docs. So, this might be an acceptable solution in some cases.)
If we want to restrict viewing only to logged-in, authorized users, the solution is to not serve the files in the same way that we serve static files. Instead, we'll intercept those requests and check that the person is authorized before serving them.
Example 3, Private
This last version of the app, private.js, is almost identical to the
previous one, but instead of
app.use('/uploads', express.static('uploads'));
which sets up the static and un-checked serving of files out of the
/uploads folder, we substitute the following endpoint:
// at the top of the file, with the other requires
const path = require('path');

// Added this to require authorization to view a file
app.get('/uploads/:file', async (req, res) => {
  const filename = req.params.file;
  console.log('getting', filename);
  const username = req.session.username;
  if (!username) {
    req.flash('info', "You are not logged in");
    return res.redirect('/login');
  }
  const db = await Connection.open(mongoUri, DB);
  const pathname = '/uploads/' + filename;
  const fileDoc = await db.collection(FILES).findOne({path: pathname});
  if (!fileDoc) {
    console.log("no such file");
    req.flash('error', "No such file");
    return res.redirect('/myphotos');
  }
  if (!isAuthorizedToView(username, fileDoc.owner)) {
    console.log("not authorized");
    req.flash('info', "You are not authorized to view this file");
    return res.redirect('/myphotos');
  }
  return res.sendFile(path.join(__dirname, pathname));
});
This endpoint handler checks that the person is logged in, that the file exists, and that the user is authorized to view the file. None of that code has anything new in it, though you should read over it to make sure it's clear.
The only really tricky part of this function is the last line:
return res.sendFile(path.join(__dirname, pathname));
The __dirname variable is set by Node.js (not Express) to the
directory containing the current source file. For example, it is
/home/cs304node/apps/fileUpload for this app. Joining that with the
pathname, like /uploads/photo-202341.png, yields a complete pathname
to the file. The res.sendFile function just sends a file to the
browser as its response; it needs an absolute path (unless you give it
a root option), which is why we build one.
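Here is a quick sketch of what that path.join call computes, using the example directory from the text in place of the real __dirname. It uses path.posix so the result is the same on any operating system. The second call shows that join also resolves '..' segments; a side benefit of the database lookup in the handler is that only paths recorded in the files collection are ever sent.

```javascript
const path = require('path');

// In the app this would be __dirname; here we hard-code the example directory.
const appDir = '/home/cs304node/apps/fileUpload';

// The leading slash in pathname is fine: join inserts one separator and normalizes.
console.log(path.posix.join(appDir, '/uploads/photo-202341.png'));
// /home/cs304node/apps/fileUpload/uploads/photo-202341.png

// '..' segments are resolved, so a crafted name could point outside uploads.
console.log(path.posix.join(appDir, '/uploads/../private.js'));
// /home/cs304node/apps/fileUpload/private.js
```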
Error Handling
One other feature that we added to the limited and private
versions, which the simple open version lacked, is nicer
error handling. The open version handles errors in the default
Express way, by printing a backtrace to the browser. That's great for
debugging, but now we've introduced the possibility of a user error
caused by uploading a file that is too big. If that happens, we want
to give the user a nice error message and redirect them someplace
else in our app.
We can set up an error handler in Express. (You can read more about error handling in Express.) The following does the trick:
app.use((err, req, res, next) => {
console.log('error', err);
if(err.code === 'LIMIT_FILE_SIZE') {
console.log('file too big')
req.flash('error', 'file too big')
res.redirect('/')
} else {
console.error(err.stack)
res.status(500).send('Something broke!')
}
})
We define this at the end of the file (probably should go in the
"postlude"), so that it goes last of all the middleware. It looks at
the error code, err.code, that Multer puts in the error object and,
if it's the LIMIT_FILE_SIZE error, it flashes an error message and
returns the user to the home page. Otherwise, it sends a terse error
message to the browser and prints the backtrace to the Node.js
console.
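To check the error handler's logic without a server, here is a lightly adapted version as a named function, exercised with fake request and response objects (fakeRes and the objects' fields are hypothetical). The res.headersSent guard is an addition following the advice in Express's error-handling guide to delegate to the default handler once a response has started; it is not in the course code.

```javascript
// Adapted from the error handler above; Express recognizes error
// handlers by their four parameters (err, req, res, next).
function uploadErrorHandler(err, req, res, next) {
  if (res.headersSent) {
    return next(err); // let Express's default handler deal with half-sent responses
  }
  if (err.code === 'LIMIT_FILE_SIZE') {
    req.flash('error', 'file too big');
    return res.redirect('/');
  }
  console.error(err.stack);
  return res.status(500).send('Something broke!');
}

// Minimal fake response object that records what the handler did.
function fakeRes() {
  return {
    headersSent: false,
    statusCode: 200,
    redirectedTo: null,
    sent: null,
    redirect(url) { this.redirectedTo = url; },
    status(code) { this.statusCode = code; return this; },
    send(body) { this.sent = body; },
  };
}

const flashes = [];
const req = { flash: (kind, msg) => flashes.push([kind, msg]) };

const res1 = fakeRes();
uploadErrorHandler({ code: 'LIMIT_FILE_SIZE' }, req, res1, () => {});
console.log(res1.redirectedTo); // /

const res2 = fakeRes();
uploadErrorHandler({ code: 'OTHER', stack: 'Error: something else' }, req, res2, () => {});
console.log(res2.statusCode, res2.sent); // 500 Something broke!
```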
Video
There's a video of these three versions in action in our videos collection. Look near the end of the page.
Conclusion
File upload is somewhat complicated, but well worth doing. There are, however, some security concerns that you should keep in mind, because bad stuff can happen when you allow untrustworthy stuff onto your server.
- Control the access: don't allow anonymous people to upload stuff for all to see. Require logins, sessions, etc.
- Control the filename. Don't let people name their uploads, because they might choose a name that is a command or something executable via the web.
- Control the size; don't let people overwhelm your server with huge files.
- Control the location. You don't want people being able to use your server as an image server, serving porn or something.
- Control the content. In the shell, you can use the Linux file command to check what kind of data a file contains.
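Along the same lines as the file command, a server can look at a file's first few bytes (its "magic number") instead of trusting the filename or req.file.mimetype, which, as far as we know, Multer takes from what the browser reported and so can be spoofed. Here is a minimal sketch covering only PNG, JPEG, and GIF; sniffImageType is a hypothetical name, and a real app might prefer a maintained library.

```javascript
// Leading bytes ("magic numbers") for a few common image formats.
const SIGNATURES = [
  { type: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { type: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
];

// Returns the detected MIME type, or null if the bytes match none of the formats.
function sniffImageType(buf) {
  for (const sig of SIGNATURES) {
    const matches =
      buf.length >= sig.bytes.length && sig.bytes.every((b, i) => buf[i] === b);
    if (matches) {
      return sig.type;
    }
  }
  return null;
}

// With disk storage, you could check the saved file, e.g.
//   const fs = require('fs');
//   sniffImageType(fs.readFileSync(req.file.path));
console.log(sniffImageType(Buffer.from([0xff, 0xd8, 0xff, 0xe0]))); // image/jpeg
console.log(sniffImageType(Buffer.from('#!/bin/sh\n')));           // null
```

If the result is null, the upload handler could delete the saved file and flash an error instead of inserting a document.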
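Summary
Optional Chaining
(Just in case a document doesn't have a title property, the template
uses file.title || 'Untitled'. Reading a missing property is not an
error in JavaScript: file.title simply evaluates to undefined, and the
|| supplies the fallback. The related file?.title syntax is called
optional chaining. It guards against the object itself being missing:
if file were null or undefined, file?.title would evaluate to
undefined rather than throwing an error.)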
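A quick way to see the difference between plain property access and optional chaining:

```javascript
// A document with no title property.
const file = { path: '/uploads/photo-155229.png' };
console.log(file.title);               // undefined: a missing property is not an error
console.log(file.title || 'Untitled'); // Untitled

// Optional chaining matters when the object itself might be missing.
const noFile = null;
console.log(noFile?.title);            // undefined, instead of throwing
// console.log(noFile.title);          // would throw a TypeError
```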
[1] I run into the same thing as a professor. Every student will turn in a file named hwk1.pdf, understandably, but that makes it hard to tell who each file belongs to!
[2] Unix time is traditionally measured in seconds since 1/1/1970 (UTC), which is called "the epoch"; JavaScript's Date.now() counts milliseconds since the epoch instead. Midnight US Eastern time on January 1, 2023 is 1672549200000 milliseconds since the epoch.