File Upload, Contents Checking

Note: this section written in part with help from ChatGPT. I haven't yet had time to check/test everything, but it seems right.

When users upload files, we ought to check that they have uploaded the right kind of file. If we are displaying pictures in the application, we want them to be, say, JPEG or WEBP, not PDF or DOCX. But if people are uploading their resumes, perhaps we want the latter. Finally, we usually want to prevent people from uploading executable files (maybe PHP, CGI, or EXE).

Here are some options, with some pros and cons.

Check MIME type via Multer (file.mimetype)

Multer exposes the MIME type sent by the client (usually a web browser, such as Chrome). The MIME type is, essentially, the kind of data in the file. Back in the olden days, it was invented so that people could attach files to (text) emails, and so the name "MIME" stands for "Multipurpose Internet Mail Extensions", but you don't need to remember that. You can think of it as Media Type, but with standardized names. The names are grouped, so they are in two parts, like:

  • 'image/jpeg'
  • 'image/png'
  • 'application/pdf'

So, the browser sends the MIME type along with the file, and Multer can look at that information. Here's some code:

const multer = require('multer');

const upload = multer({
  fileFilter: (req, file, cb) => {
    const allowed = ['application/pdf', 'image/jpeg'];

    if (allowed.includes(file.mimetype)) {
      cb(null, true);
    } else {
      cb(new Error('Invalid file type'), false);
    }
  }
});

The key point here is the fileFilter function that can check the file's mimetype against a list of allowed types.

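This is pretty easy to do, but the mimetype is only a claim made by the client: it arrives as a Content-Type header inside the request, and a client that is, say, a Python script can set it to anything. So, this check catches honest mistakes, but its security is not perfect.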

  • Pros
    • Easy
    • Built into Multer
  • Cons (important)
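    • Not trustworthy: the client can fake this header
    • Example: a .exe file renamed to .jpg can pass, because browsers typically guess the MIME type from the file extension rather than from the contents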
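
To see why the MIME type can't be trusted, here is a rough sketch of what a file upload looks like inside the request body. The function name buildMultipartBody and the boundary string are made up for this illustration, and a real client would also send a matching "Content-Type: multipart/form-data; boundary=..." request header. The point is just that the per-file Content-Type line is ordinary text written by whoever builds the request.

```javascript
// Illustration only: build the raw multipart/form-data body for one file.
// buildMultipartBody and 'demo-boundary' are hypothetical names for this sketch.
function buildMultipartBody(boundary, filename, claimedMime, contents) {
  return [
    `--${boundary}`,
    `Content-Disposition: form-data; name="file"; filename="${filename}"`,
    `Content-Type: ${claimedMime}`, // whatever the client chooses to claim
    '',
    contents,
    `--${boundary}--`,
    ''
  ].join('\r\n');
}

// "MZ" is how Windows executables start, yet the part claims to be a JPEG.
const body = buildMultipartBody(
  'demo-boundary',
  'photo.jpg',
  'image/jpeg',
  'MZ...definitely not JPEG data'
);

console.log(body);
```

A server that only looks at file.mimetype would see 'image/jpeg' for this upload and accept it.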

Check file extension

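The client also sends the file's original name, which Multer exposes as file.originalname, so you can check its extension:

const path = require('path');

// `file` is Multer's file object (for example, inside fileFilter)
const ext = path.extname(file.originalname).toLowerCase();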

if (ext === '.pdf' || ext === '.jpg' || ext === '.jpeg') {
  // accept
}

But, again, this requires trusting the client, so it's not perfect.

  • Pros
    • Simple sanity check
  • Cons
    • Still easily spoofed
    • Same weakness as MIME type
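
Here is a small sketch of the extension check as a reusable function, with a few filenames run through it. The names ALLOWED_EXTENSIONS and hasAllowedExtension are my own for this example, not part of Multer.

```javascript
const path = require('path');

// Hypothetical allow-list for this sketch
const ALLOWED_EXTENSIONS = new Set(['.pdf', '.jpg', '.jpeg']);

// True if the filename's last extension is on the allow-list (case-insensitive)
function hasAllowedExtension(originalname) {
  const ext = path.extname(originalname).toLowerCase();
  return ALLOWED_EXTENSIONS.has(ext);
}

console.log(hasAllowedExtension('resume.PDF'));    // true
console.log(hasAllowedExtension('virus.exe'));     // false
console.log(hasAllowedExtension('virus.exe.jpg')); // true: only the last extension is checked
```

The last case shows the weakness: the name says nothing about what is actually in the file.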

Check file contents (file-type)

Much better is to check the actual file data, using a JS library.

You first have to install the library into your node_modules folder:

npm install file-type

(I have installed this into our "omnibus" node_modules folder.)

You would use this in the code that handles the uploaded file:

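const express = require('express');
const multer = require('multer');

const app = express();

// Keep the upload in memory so its bytes are available as req.file.buffer
const upload = multer({ storage: multer.memoryStorage() });

app.post('/upload', upload.single('file'), async (req, res) => {
  if (!req.file) {
    return res.status(400).send('No file uploaded');
  }

  // file-type v17 and later is ESM-only, so a plain require() fails on many
  // Node versions; a dynamic import() works from CommonJS code like this.
  const { fileTypeFromBuffer } = await import('file-type');

  // Resolves to { ext, mime }, or undefined if the type is not recognized
  const type = await fileTypeFromBuffer(req.file.buffer);

  if (!type) {
    return res.status(400).send('Unknown file type');
  }

  if (
    type.mime === 'application/pdf' ||
    type.mime === 'image/jpeg'
  ) {
    res.send('File accepted');
  } else {
    res.status(400).send('Invalid file type');
  }
});

  • Pros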
    • Looks at actual bytes (e.g., %PDF- or JPEG header)
    • Much harder to fake
    • Industry-standard approach
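
To make "looks at actual bytes" concrete, here is a hand-rolled sketch that checks the first few bytes of a buffer against two well-known signatures: PDF files start with the characters %PDF-, and JPEG files start with the bytes FF D8 FF. The names SIGNATURES and sniffMime are made up for this example. This is a simplification for illustration; the file-type library knows many more formats, so use it rather than this in real code.

```javascript
// Hypothetical signature table for this sketch (only two formats)
const SIGNATURES = [
  { mime: 'application/pdf', bytes: Buffer.from('%PDF-', 'latin1') },
  { mime: 'image/jpeg', bytes: Buffer.from([0xff, 0xd8, 0xff]) }
];

// Return the MIME type whose signature matches the start of the buffer,
// or undefined if none matches.
function sniffMime(buffer) {
  for (const { mime, bytes } of SIGNATURES) {
    const head = buffer.subarray(0, bytes.length);
    if (head.equals(bytes)) {
      return mime;
    }
  }
  return undefined;
}

console.log(sniffMime(Buffer.from('%PDF-1.7 rest of file')));  // application/pdf
console.log(sniffMime(Buffer.from([0xff, 0xd8, 0xff, 0xe0]))); // image/jpeg
console.log(sniffMime(Buffer.from('MZ fake photo.jpg')));      // undefined
```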

However, this works only when the upload is held in memory (Multer's memoryStorage). We usually save uploaded files to disk instead. So, one more option:

Check a file on disk

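When Multer saves the upload to disk, the saved file's location is in req.file.path, and the library can read the file from there:

// Run this inside an async route handler, after Multer has saved the file.
// file-type v17 and later is ESM-only, so load it with a dynamic import().
const { fileTypeFromFile } = await import('file-type');

// Resolves to { ext, mime }, or undefined if the type is not recognized
const type = await fileTypeFromFile(req.file.path);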
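
One thing to keep in mind with disk storage: Multer has already written the file by the time your handler runs, so a rejected file should be deleted. Here is a minimal sketch of that idea using only Node's built-in fs module and a hand-rolled PDF check, so it runs without any packages. The names readHeader and keepIfPdf are made up for this example, and the synchronous fs calls are only for brevity; in a real route handler you would use fileTypeFromFile for the check and fs.promises.unlink for the cleanup.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Read at most `length` bytes from the start of the file
function readHeader(filePath, length) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(length);
    const bytesRead = fs.readSync(fd, header, 0, length, 0);
    return header.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// Keep the file only if it starts with the PDF signature; otherwise delete it
function keepIfPdf(filePath) {
  const isPdf = readHeader(filePath, 5).toString('latin1') === '%PDF-';
  if (!isPdf) {
    fs.unlinkSync(filePath);
  }
  return isPdf;
}

// Demo: two temporary files standing in for uploads saved by Multer
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'upload-demo-'));
const realPdf = path.join(dir, 'real.pdf');
const fakePdf = path.join(dir, 'fake.pdf');
fs.writeFileSync(realPdf, '%PDF-1.7\nrest of the document');
fs.writeFileSync(fakePdf, 'MZ not a PDF at all');

console.log(keepIfPdf(realPdf), fs.existsSync(realPdf)); // true true
console.log(keepIfPdf(fakePdf), fs.existsSync(fakePdf)); // false false
```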