File Upload

There will be times when we want to allow the user to upload files to the server. One example would be pictures for a user profile (like Facebook), for a picture-sharing site like Flickr, or for an item for sale. Or it might be a PDF of a letter of recommendation; I'm often asked to upload those.

File Upload with Multer

Multer is a standard Node.js module for handling requests that are encoded as multipart/form-data, which is the format for requests that include files. We'll talk more about that in a minute. Meanwhile, before class, please read this tutorial on File Upload with Multer.

Issues

Where to store the files (the filesystem or the database)
Getting them back out of the filesystem or database
MIME types for files
The mechanisms for file upload in the browser (new FORM input)
How to process file upload on the server side in Express.

Images in Files

The most obvious place to put pictures is in the filesystem. (Terminology: the "filesystem" is the place where files and folders live. It's usually a disk but could be a thumb drive or something else.) For example, we can have an uploads subdirectory of our Express app where we can store files.

Our MongoDB database collection can have documents like:

{
    _id: ObjectId("642496122c0d194d852a091b"),
    title: 'john lewis',
    path: '/uploads/photo-154834.jpg'
},
{
    _id: ObjectId("642496fd2c0d194d852a091c"),
    title: 'george clooney',
    path: '/uploads/photo-155229.png'
}

(An alternative to putting the picture into the filesystem is to put it into the database, which has advantages in many situations, but it's conceptually easier to put pictures in the filesystem, so that's what we'll do. Let me know if you're curious about saving them in the database.)

File Upload

Uploading an image is the same as uploading any file, so we'll cover that. Here are the general principles:

You need a fancier kind of FORM element, specifically a different ENCTYPE that handles the binary data of a file by specially encoding it.
You need a new kind of INPUT element, which gives you a widget for specifying the file.
Some different processing in server.js for getting the file from the request.

HTML's File Upload INPUT

Use the POST method for your FORM. Recall that one limitation of the GET method is that URLs usually have some bound on length, and we don't want to have that with a file.
ENCTYPE="multipart/form-data" must be an attribute of the FORM.
INPUT TYPE=FILE NAME=name-of-file-input

Here's an example:

    <form action="/upload" method="POST" enctype="multipart/form-data">
        <p>
            <label for="title">title
                <input type="text" id="title" name="title" placeholder="photo title">
            </label>
        </p><p>
            <label>file
                <input type="file" accept="image/*" name="photo" >
            </label>
        </p><p>
            <input type="submit" value="Submit Uploads">
            <input type="reset" value="Reset">
        </p>
    </form>

File Naming

The uploaded files will have names on the user's computer, but we should not necessarily use those names. In fact, we probably should not, for reasons of uniqueness and security.

In principle, files will be uploaded by lots of independent users. Imagine an app where students can upload resumes. Almost everyone will upload a file called resume.pdf or something pretty similar.¹ So, we need a way to generate unique filenames. Then we'll need to store the filenames in the database.

How to name the files? My cellphone names pictures based on the time the picture was taken, like YYYYMMDDHHMMSS. (The first four digits are the year, the next two are the month, and so forth down.) That naming scheme assumes I don't take more than one picture per second. Pretty reasonable for a cell phone, but maybe not for a web application that allows uploads from many users at the same time. The Multer documentation suggests a pattern like <type>-<time>-<random> with the <type> being the name of the file input (such as avatar or profile or some such), the <time> being the number of milliseconds since 1/1/1970² (a huge number) and the <random> being a 9-digit random number, so that even two file uploads of the same type at the same millisecond would be highly unlikely to match. That seems excessive for our example, so we'll use a similar pattern but <type>-<hhmmss>.<ext>, where the <ext> is the file extension like jpeg or png. The <hhmmss> is actually inadequate for long-term use (files exactly a day apart would collide), but it is fine for a simple demo.

You'll have to think about how you want to generate filenames in your projects.
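For comparison, here is a minimal sketch of the longer <type>-<time>-<random> pattern described above. The helper name uniqueFilename is hypothetical (it is not part of Multer), and it draws the random part from Node's crypto.randomInt rather than the Math.random approach, which is simply one reasonable choice.

```javascript
const crypto = require('crypto');

// Hypothetical helper (not part of Multer): builds <type>-<time>-<random>.<ext>
function uniqueFilename(fieldname, ext) {
    const time = Date.now();                            // milliseconds since the epoch
    const random = crypto.randomInt(0, 1_000_000_000);  // integer from 0 up to 999999999
    const suffix = String(random).padStart(9, '0');     // always nine digits
    return `${fieldname}-${time}-${suffix}.${ext}`;
}

const exampleName = uniqueFilename('photo', 'jpg');
console.log(exampleName);   // something like photo-1672549200000-004815162.jpg
```

Names built this way sort by upload time and would collide only if two uploads of the same type arrived in the same millisecond and also drew the same random number.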

Here's a helper function we will use:

/* Input is an (optional) Date object. Returns a string like 123456
for 56 seconds past 12:34. If the argument is omitted, the current
time is used.
*/

function timeString(dateObj) {
    if (!dateObj) {
        dateObj = new Date();
    }
    // convert val to a two-digit string; const keeps d2 local to this function
    const d2 = (val) => val < 10 ? '0'+val : ''+val;
    let hh = d2(dateObj.getHours());
    let mm = d2(dateObj.getMinutes());
    let ss = d2(dateObj.getSeconds());
    return hh+mm+ss;
}

The local function d2 just takes a numeric input and returns a two-digit string, so 5 becomes '05' and 34 becomes '34'.
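As an aside, the same two-digit padding can be written with the built-in String.prototype.padStart. The sketch below is an alternative under a different, made-up name (timeStringPadded) so it doesn't clash with the helper above; it uses a default parameter, so it falls back to the current time only when the argument is omitted.

```javascript
// Alternative sketch of timeString using String.prototype.padStart
function timeStringPadded(dateObj = new Date()) {
    // pad each number on the left with '0' until it is two characters wide
    const d2 = (val) => String(val).padStart(2, '0');
    return d2(dateObj.getHours()) + d2(dateObj.getMinutes()) + d2(dateObj.getSeconds());
}

// 12:34:56 local time on January 1, 2023
const sample = timeStringPadded(new Date(2023, 0, 1, 12, 34, 56));
console.log(sample);   // prints 123456
```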

Example 1, Open

Our first example focuses on the basic functionality:

users can upload files along with a title
all uploaded files are displayed on the main page
files are served out like any other static file (e.g. CSS files)

Example 1, Configure Multer

First, we need to configure Multer. We'll upload the files to a folder called uploads and we'll serve them out of that folder as static files. We'll do that like this:

app.use('/uploads', express.static('uploads'));

Next, we configure the storage property of Multer. Multer can store files (temporarily) in memory or permanently on disk. We'll use the latter. We have to define functions that determine the folder it will store the files in (destination) and the file naming pattern (filename):

var storage = multer.diskStorage({
  destination: function (req, file, cb) {
    cb(null, 'uploads')
  },
  filename: function (req, file, cb) {
      let parts = file.originalname.split('.');
      let ext = parts[parts.length-1];
      let hhmmss = timeString();
      cb(null, file.fieldname + '-' + hhmmss + '.' + ext);
  }
})
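One thing to watch in the filename function: splitting on '.' returns the whole name when the original file has no extension, and it trusts whatever extension the user supplied. A possible refinement, sketched here with a hypothetical helper (safeExtension) and a made-up allow-list, is to use Node's path.extname and accept only the extensions you expect.

```javascript
const path = require('path');

// Hypothetical allow-list; adjust to the file types your app accepts
const ALLOWED_EXTS = new Set(['.jpg', '.jpeg', '.png', '.gif']);

// Returns a lowercase extension such as '.png', or null if it is not allowed
function safeExtension(originalname) {
    const ext = path.extname(originalname).toLowerCase();
    return ALLOWED_EXTS.has(ext) ? ext : null;
}

console.log(safeExtension('john-lewis-507376.JPG'));   // .jpg
console.log(safeExtension('resume'));                  // null (no extension at all)
console.log(safeExtension('evil.php'));                // null (not on the list)
```

Note that path.extname keeps the leading dot, so a filename built from it would not add another '.' in front.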

Finally, we create an object (called upload) using the multer module; its methods, such as upload.single, produce the middleware functions we'll use. Its argument is a dictionary of storage specifications and, optionally, file size limits. We should almost always specify file size limits to avoid attacks in which someone ties up our server and exhausts our disk space by uploading enormous files.

var upload = multer({ storage: storage,
                      // max fileSize in bytes, causes an ugly error
                      limits: {fileSize: 1_000 }});

The upload object has methods, such as upload.single, that return functions suitable for use as middleware. Recall that middleware are functions in Express that take arguments like this:

function (req, res, next) { ... }

Middleware functions are executed with the request before our final handler gets it, with each invoking next to execute the next function in the middleware chain.

In Express, we usually let the middleware run for every request. That's what app.use(middleware) does. For Multer, we don't want it to run for every request. We only want it to run for our upload endpoint. We'll get to that in a minute.
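To make the next-based chaining concrete, here is a toy dispatcher. It is only an illustration of the idea, not how Express is actually implemented, and the names runChain, parseUpload and handler are made up for this sketch.

```javascript
// Toy dispatcher (an illustration, not Express's implementation)
function runChain(middlewares, req, res) {
    let index = 0;
    function next() {
        const fn = middlewares[index++];
        if (fn) fn(req, res, next);   // each function decides whether to call next()
    }
    next();
}

const log = [];

// Stand-in for an upload middleware: adds data to req, then passes control on
const parseUpload = (req, res, next) => {
    req.file = { filename: 'photo-123456.jpg' };
    log.push('parse');
    next();
};

// Final handler: runs last and sees what the earlier middleware added
const handler = (req, res, next) => {
    log.push('handler saw ' + req.file.filename);
};

runChain([parseUpload, handler], {}, {});
console.log(log);   // [ 'parse', 'handler saw photo-123456.jpg' ]
```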

Three Examples

The rest of this document will describe three examples, all of which are in the fileUploads folder in /cs304node/apps:

open.js, which shows the basic upload functionality without any security restrictions. This demo would be appropriate for a corporate or academic intranet where all the users are trustworthy.
limited.js, which has some improved error handling and some important security restrictions. Unfortunately, there's a subtle security hole, so this isn't quite right.
private.js, which fixes the security hole from the limited example.

Open Example 1, Endpoints and Handlers

For the open example, we'll just have a few endpoints, namely the main page and the route that does the upload. The example creates a collection called files in your personal database, which we can get using the USER environment variable. So we have these global constants:

const DB = process.env.USER;    // 'uploadTest';
const FILES = 'files';

The main page is generated by the following handler:

app.get('/', async (req, res) => {
    const db = await Connection.open(mongoUri, DB);
    let files = await db.collection(FILES).find({}).toArray();
    return res.render('open.ejs', {uploads: files});
});

There's nothing really surprising here. The handler gets every file from the collection and passes the array to the render function.

The EJS file can be broken down into two major portions. The first is the form to allow file upload, which we saw above, but we'll review in a moment. The second is the display of the currently uploaded files:

    <div class="container">
      <% uploads.forEach( file => { %>
        <figure>
          <img class="uploadedImg" src="<%= file.path %>" alt="<%= file.title %>">
          <figcaption><%= file.title || 'Untitled' %></figcaption>
        </figure>
      <% }) %>
    </div>

That uses the file.path property, which will be something like /uploads/photo-123456.jpeg. The documents may also have a file.title property that we can use for captions and alt text.

Example 1, Upload Form

Let's review the file upload form:

    <form action="/upload" method="POST" enctype="multipart/form-data">
        <p>
            <label for="title">title
                <input type="text" id="title" name="title" placeholder="photo title">
            </label>
        </p><p>
            <label>file
                <input type="file" accept="image/*" name="photo" >
            </label>
        </p><p>
            <input type="submit" value="Submit Uploads">
            <input type="reset" value="Reset">
        </p>
    </form>

it uses method="POST"
it uses an enctype="multipart/form-data"; that's necessary for file upload.
it has a type=text input for the title of the photo
it has a type=file input for the photo itself
it uses action="/upload", which is where the form data will go

Now, let's look at the /upload handler, which is where the form data (including the file) goes.

Example 1, Upload

The /upload endpoint uses a syntax that tucks in the upload middleware after the first argument, shifting our handler function to the third argument. That means that the upload middleware will run before our handler. It will parse the multipart form data and put the data in the request (req) object. In this case, we know the form is just uploading a single file and the input name is photo, so we use the following as the middleware function:

upload.single('photo')

(There are other variations if you are allowing the user to upload several files at a time and so forth.)

Here's the full handler, including our code:

app.post('/upload', upload.single('photo'), async (req, res) => {
    console.log('uploaded data', req.body);
    console.log('file', req.file);
    // insert file data into mongodb
    const db = await Connection.open(mongoUri, DB);
    const files = db.collection(FILES);
    const result = await files.insertOne({title: req.body.title,
                                          path: '/uploads/'+req.file.filename});
    console.log('insertOne result', result);
    return res.redirect('/');
});

We can get an object representing the uploaded file as req.file. That object has properties like:

file {
  fieldname: 'photo',
  originalname: 'john-lewis-507376.jpg',
  encoding: '7bit',
  mimetype: 'image/jpeg',
  destination: 'uploads',
  filename: 'photo-154834.jpg',
  path: 'uploads/photo-154834.jpg',
  size: 2081
}

fieldname is the name of the form input; here it's photo.
originalname is the name of the file on the user's computer.
mimetype is a standard representation of the kind of data the file is.
destination is where the file was stored
filename is our server-side generated filename. This file was uploaded at 3:48pm.
path is the combination of destination and filename
size is the size of the file in bytes. This one is pretty small.

The rest of the function is code you've seen before: it connects to the database, and inserts one document in the collection.

That's it! We'll play with this demo in class, but it covers all the basics.
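Since the file object carries a mimetype, one option is to reject non-image uploads before they are stored, using Multer's fileFilter option. The sketch below shows only the filter function; the name imagesOnly and the allow-list are assumptions for illustration, and the function is called directly so that the example runs without Multer installed.

```javascript
// Hypothetical allow-list of MIME types
const IMAGE_TYPES = new Set(['image/jpeg', 'image/png', 'image/gif']);

// Same (req, file, cb) shape that Multer's fileFilter option expects:
// cb(null, true) accepts the file, cb(null, false) skips it.
function imagesOnly(req, file, cb) {
    cb(null, IMAGE_TYPES.has(file.mimetype));
}

// Called directly here, with a callback that just records each decision
const decisions = [];
const record = (err, accept) => decisions.push(accept);
imagesOnly({}, { mimetype: 'image/jpeg' }, record);
imagesOnly({}, { mimetype: 'application/pdf' }, record);
console.log(decisions);   // [ true, false ]
```

In an app, this would be passed alongside storage and limits, as in multer({ storage, limits, fileFilter: imagesOnly }). When a file is skipped, req.file is undefined, so the handler has to check for that. Keep in mind that the mimetype is reported by the browser, so it is a hint rather than proof of what the file contains.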

Analysis

There are absolutely no restrictions on who can upload a file or what they can upload. Furthermore, anyone can see the uploaded files. As mentioned earlier, this might be appropriate for an environment (such as a corporate intranet) where everyone is authenticated and trustworthy.

In more general situations, we might want to require people to log in before uploading a file, and files could then be associated with an owner. Furthermore, we might want to limit who can view the uploaded files. For example, in many social media apps, you can limit viewing to just yourself or just people on your friends list, and so forth.

The limited app is more protected. The new app will have users with logins and each user will have a collection of photos that they own.

Users should be required to log in before they can upload. That way, if anyone uploads something that shouldn't be there, you'll know who did it.
We'll check whether someone is authorized to view the photo collection for a given user.

For simplicity, we'll have only the user themself authorized to view their photos, but we'll have an isAuthorizedToView function that could, in principle, look up whether the logged-in user is authorized to view the files in some other person's collection.
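Here is one way such a function might look. This is a sketch, not the code from the example app: the owner-only rule comes from the description above, while the viewers table (who has shared with whom) is a made-up stand-in for whatever lookup a real app would do.

```javascript
// Hypothetical sharing table: owner -> set of usernames allowed to view
const viewers = new Map([
    ['alice', new Set(['bob'])],
]);

// Returns true if viewer may see owner's files
function isAuthorizedToView(viewer, owner) {
    if (!viewer) return false;            // not logged in
    if (viewer === owner) return true;    // owners can always see their own files
    const allowed = viewers.get(owner);   // undefined if owner has shared with no one
    return allowed !== undefined && allowed.has(viewer);
}

console.log(isAuthorizedToView('alice', 'alice'));   // true
console.log(isAuthorizedToView('bob', 'alice'));     // true
console.log(isAuthorizedToView('carol', 'alice'));   // false
```

Dropping the viewers table and keeping only the first two checks gives the owner-only behavior described above.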

(FYI, these two related ideas are called authentication and authorization. Authentication is who someone is, while authorization is what they are allowed to do.)

Limited Example 2: Open with Authentication and Authorization

The next example has forms to register and login, just like we saw in the reading on passwords. For the sake of brevity, I'll skip all that.

There will now be two collections:

fileOwners which is a list of documents each of which is a registered user and their password. (In this example, we didn't encrypt the passwords.)
filesOwned which is similar to our previous files collection, but now each document records the owner of the file in a property of that name.

Therefore, we define these constants:

// collections in the user's personal database

const DB = process.env.USER;
const FILES = 'filesOwned';
const USERS = 'fileOwners';

Once someone is logged in, they are redirected to a route similar to this one:

// the photos of the logged-in user

app.get('/myphotos', async (req, res) => {
    const db = await Connection.open(mongoUri, DB);
    const fileCol = db.collection(FILES);
    const username = req.session.username;
    if (!username) {
        console.log("not logged in");
        req.flash('info', "You are not logged in");
        return res.redirect('/login');
    }
    const uploads = await db.collection(FILES).find({owner: username}).toArray();
    const users = await db.collection(USERS).find({}).toArray();
    const userId = req.session.userId;
    return res.render('auth.ejs', {username, userId, users, uploads});
});

This code just takes the username from the session and uses it to look up the files that have {owner: username}. It also looks up all the users in the app. Finally, it renders a page with that information. The page looks like this:

[Screenshot: the file upload example showing the logged-in user (username "joe", userId "6424d2cb8fd3c36541c78f39"), a list of hyperlinks to the pages of all the app's users, and the same file upload form we had before.]

As you can see, there's a list of links to all users. Each of those is a URL that comes to this endpoint:

// The :username in the URL (endpoint) is the username of the person
// whose photos we want to view.

app.get('/photos/:username', async (req, res) => {
    const photoOwner = req.params.username; // username of owner of photos
    const username = req.session.username;   // username of the logged-in user
    if (!username) {
        console.log("not logged in");
        req.flash('info', "You are not logged in");
        return res.redirect('/login');
    }
    if (!isAuthorizedToView(username, photoOwner)) {
        console.log("not authorized");
        req.flash('info', "You are not allowed to view this person's photos")
        // send them to the main page
        return res.redirect('/')
    }
    // database lookup
    const db = await Connection.open(mongoUri, DB);
    const fileCol = db.collection(FILES);
    const uploads = await db.collection(FILES).find({owner: photoOwner}).toArray();
    const users = await db.collection(USERS).find({}).toArray();
    const userId = req.session.userId;
    return res.render('auth.ejs', {username, userId, users, uploads});
});

Most of that code is the same as the code for the /myphotos handler, but now it's possible that the logged-in user (req.session.username) might not be the same as the id of the person whose photos these are (photoOwner). Before we allow the person to view the pictures, we check if they are authorized (isAuthorizedToView). If they are not authorized, we flash an error message and send them on their way. Otherwise, we proceed exactly as we did with /myphotos.

Example 2, Upload

The procedure for upload is almost the same, but we have to record the owner of the file. We also have to do things like checking that the user is logged in.

app.post('/upload', upload.single('photo'), async (req, res) => {
    const username = req.session.username;
    if (!username) {
        req.flash('info', "You are not logged in");
        return res.redirect('/login');
    }
    console.log('uploaded data', req.body);
    console.log('file', req.file);
    // insert file data into mongodb
    const db = await Connection.open(mongoUri, DB);
    const result = await db.collection(FILES)
          .insertOne({title: req.body.title,
                      owner: username,
                      path: '/uploads/'+req.file.filename});
    console.log('insertOne result', result);
    // always nice to confirm with the user
    req.flash('info', 'file uploaded');
    return res.redirect('/');
});

Analysis

This works well and does almost exactly what we want. However, there's a flaw. The URLs for the various images look like /uploads/photo-202341.png and those URLs are handled by our staticServer handler, which doesn't check for authorization. So the URLs work whether or not the correct person is logged in!

Here's a scenario:

Alice logs in.
She uploads a photo and its URL is /uploads/photo-202341.png
She's pleased with it, and sends the URL to a friend
That friend is able to view the photo without being Alice.
In fact, the friend doesn't even have to log in!
The "friend" can share the URL with other people as well
Alice has completely lost control of her picture.

(This is very much like the "anyone with the link" sharing option in Google Docs. So, this might be an acceptable solution in some cases.)

If we want to restrict viewing only to logged-in, authorized users, the solution is to not serve the files in the same way that we serve static files. Instead, we'll intercept those requests and check that the person is authorized before serving them.

Example 3, Private

This last version of the app, private.js, is almost identical to the previous one, but instead of

app.use('/uploads', express.static('uploads'));

which sets up the static and un-checked serving of files out of the /uploads folder, we substitute the following endpoint:

// Added this to require authorization to view a file

app.get('/uploads/:file', async (req, res) => {
    const filename = req.params.file;
    console.log('getting', filename);
    const username = req.session.username;
    if (!username) {
        req.flash('info', "You are not logged in");
        return res.redirect('/login');
    }
    const db = await Connection.open(mongoUri, DB);
    const pathname = '/uploads/'+filename;
    const fileDoc = await db.collection(FILES).findOne({path: pathname});
    if(!fileDoc) {
        console.log("no such file");
        req.flash('error', "No such file");
        return res.redirect('/myphotos');
    }
    if(!isAuthorizedToView(username, fileDoc.owner)) {
        console.log("not authorized");
        req.flash('info', "You are not authorized to view this file")
        return res.redirect('/myphotos');
    }
    return res.sendFile(path.join(__dirname, pathname));
});

This endpoint handler checks that the person is logged in, that the file exists, and that the user is authorized to view the file. None of that code has anything new in it, though you should read over it to make sure it's clear.

The only really tricky part of this function is the last line:

    return res.sendFile(path.join(__dirname, pathname));

The __dirname variable is set by Node.js to the directory that contains the currently running script; for this app, that's /home/cs304node/apps/fileUpload. Joining that with the pathname, like /uploads/photo-202341.png, yields a complete pathname to the file. The res.sendFile function just sends a file to the browser as its response.
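The reason path.join works here, even though the pathname starts with a slash, is that join simply concatenates and normalizes its segments. The sketch below contrasts it with path.resolve, which treats an absolute segment as a fresh start. It uses path.posix and the example directory from the text so that the output is the same on any machine; a real handler would use __dirname and plain path.

```javascript
const path = require('path');

// Example app directory taken from the text; on a real server use __dirname
const appDir = '/home/cs304node/apps/fileUpload';
const pathname = '/uploads/photo-202341.png';

// join concatenates segments, so the leading slash does not restart at the root
const joined = path.posix.join(appDir, pathname);
console.log(joined);     // /home/cs304node/apps/fileUpload/uploads/photo-202341.png

// resolve treats an absolute segment as a new starting point, which is wrong here
const resolved = path.posix.resolve(appDir, pathname);
console.log(resolved);   // /uploads/photo-202341.png
```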

Error Handling

There is one other feature that we added to the limited and private versions that the simple open version lacked: nicer error handling. The open version handles errors in the default Express way by printing a backtrace to the browser. That's great for debugging, but now we've introduced the possibility of a user error caused by uploading a file that is too big. If that happens, we want to give the user a nice error message and redirect them someplace else in our app.

We can set up an error handler in Express. (You can read more about error handling in Express.) The following does the trick:

app.use((err, req, res, next) => {
    console.log('error', err);
    if(err.code === 'LIMIT_FILE_SIZE') {
        console.log('file too big')
        req.flash('error', 'file too big')
        res.redirect('/')
    } else {
        console.error(err.stack)
        res.status(500).send('Something broke!')
    }
})

We define this at the end of the file (probably should go in the "postlude"), so that it goes last of all the middleware. It looks at the error code, err.code, that Multer puts in the error object and, if it's the LIMIT_FILE_SIZE error, it flashes an error message and returns the user to the home page. Otherwise, it sends a terse error message to the browser and prints the backtrace to the Node.js console.
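If you end up handling more than one kind of upload error, the if/else can be replaced by a small lookup. This is only a sketch: the helper name uploadErrorMessage and the message wording are made up, and the second code, LIMIT_UNEXPECTED_FILE, is included on the assumption that Multer reports it when a file arrives in a field the route didn't ask for (check the Multer documentation for the codes your version uses).

```javascript
// Hypothetical lookup table keyed by the err.code values that Multer sets
const UPLOAD_MESSAGES = new Map([
    ['LIMIT_FILE_SIZE', 'file too big'],
    ['LIMIT_UNEXPECTED_FILE', 'unexpected file field'],
]);

// Returns a user-facing message, or null for errors that are not the user's fault
function uploadErrorMessage(err) {
    return UPLOAD_MESSAGES.get(err.code) ?? null;
}

console.log(uploadErrorMessage({ code: 'LIMIT_FILE_SIZE' }));   // file too big
console.log(uploadErrorMessage(new Error('database down')));    // null
```

The error handler could then flash the message and redirect when one is returned, and fall through to the 500 response when the result is null.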

Video

There's a video of these three versions in action in our videos collection. Look near the end of the page.

Conclusion

File upload is somewhat complicated, but well worth doing. There are, however, some security concerns that you should keep in mind, because bad stuff can happen when you allow untrustworthy stuff onto your server.

Control the access: don't allow anonymous people to upload stuff for all to see. Require logins, sessions, etc.
Control the filename. Don't let people name uploads because they might name it as a command or something executable via the web.
Control the size; don't let people overwhelm your server with huge files.
Control the location. You don't want people being able to use your server as an image server, serving porn or something.
Control the content. In the shell, you can use the Linux file command to check what kind of data a file contains.
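A content check can also be done inside the app by looking at the first few bytes of the file, which is roughly what the file command does. The following is a minimal sketch that recognizes only PNG and JPEG; the function name sniffImageType is made up, and reading the bytes from the uploaded file on disk is left out.

```javascript
// Leading "magic" bytes for two common image formats
const SIGNATURES = {
    'image/png':  [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a],
    'image/jpeg': [0xff, 0xd8, 0xff],
};

// Returns the detected MIME type, or null if the buffer matches neither signature
function sniffImageType(buffer) {
    for (const [type, sig] of Object.entries(SIGNATURES)) {
        if (buffer.length >= sig.length && sig.every((byte, i) => buffer[i] === byte)) {
            return type;
        }
    }
    return null;
}

const pngHeader = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0, 0]);
console.log(sniffImageType(pngHeader));                  // image/png
console.log(sniffImageType(Buffer.from('#!/bin/sh')));   // null
```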

Summary

Optional Chaining

(Just in case they don't have a title property, we use the file?.title syntax, which is called optional chaining. Optional chaining means that if the value before the ?. is null or undefined, the expression evaluates to undefined rather than throwing an error; a missing property on an existing object simply yields undefined.)
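A few lines show the behavior. The variable names are made up for the illustration, and the ?? operator on the last line is nullish coalescing, which supplies a default when the left side is null or undefined.

```javascript
const withTitle = { title: 'john lewis' };
const noTitle = {};
let missing;   // undefined, e.g. a lookup that found nothing

const a = withTitle?.title;              // 'john lewis'
const b = noTitle.title;                 // undefined: a missing property never throws
const c = missing?.title;                // undefined instead of a TypeError
const d = missing?.title ?? 'Untitled';  // 'Untitled'

console.log(a, b, c, d);   // john lewis undefined undefined Untitled
```

Without the ?., the expression missing.title would throw a TypeError because missing is undefined.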

¹ I run into the same thing as a professor. Every student will, understandably, turn in a file named hwk1.pdf, but that makes it hard to tell who each file belongs to!
² Unix time is traditionally measured in seconds since midnight UTC on 1/1/1970, which is called "the epoch"; JavaScript's Date.now() reports milliseconds. Midnight on January 1, 2023 in US Eastern time is 1672549200000 milliseconds since the epoch.