MongoDB CRUD

This reading continues our discussion of MongoDB databases and how to use them from Node.js.

Last time, we learned about the CRUD operations on a database: create, read, update, and delete. We then learned a lot about the find method in MongoDB, which filled in a lot of information about reading data. In this reading, we'll learn how to insert, update and delete data. We'll also learn some about aggregation expressions.

Counts

Sometimes, we don't care what the documents in a collection are, we just want to count them. Surprisingly, there are several ways to count, because MongoDB is designed for large datasets where the number of documents might be constantly changing, and sometimes you just want a quick estimate.

count documents usage example

Here's an estimated count of all the movies in the WMDB. The return value is a number.

let results = await db.collection('movies').estimatedDocumentCount();
console.log('found', results);

For a more exact count, we use a different method:

let results = await db.collection('movies').countDocuments();
console.log('found', results);

(Seems like they could have made the method names more symmetrical.) Both of those queries returned 1463 as of this writing.

We can also supply a query and count the number of documents that satisfy it:

let results = await db.collection('movies').countDocuments({release: "2022"});
console.log('found', results);

That query returns 13 (for the current WMDB). Note that the release year is a string. If I had queried for {release: 2022} I would have gotten zero. (Indeed, I made that mistake.)

Distinct Items

When we are counting, sometimes we don't care about repeats. For example, we often have documents with values that are repeated in other documents and we might be interested in how many different or distinct values there are. In the WMDB, for example, the release year of a movie will obviously have duplicates, since there are a lot of movies released each year.

Sometimes, we want to know how many distinct values there are. MongoDB's node API provides a method just for that, called distinct.

distinct documents usage example

Here's an example of counting release

years = await db.collection("movies").distinct("release");
console.log(`There are ${years.length} distinct release years in the collection`);

The return value is an array of the values, so no need for the .toArray() method. As of this writing, that returned 90 different release years, from 1912 to 2023. Compare that to the 1463 movies in the database. Given that movies have only really existed for a bit over a century, the WMDB has covered most years.

We can also query the collection, providing a query document, just like the one for find, as the second argument to distinct. So here are the years in which Peter Jackson released a movie:

let query = {"director.name": "Peter Jackson"};
let results = await db.collection('movies').distinct('release', query);
console.log('Peter Jackson released movies in');
results.forEach((r) => console.log(r));

And the results are

2001
2002
2003

for the three Lord of The Rings movies. (He's directed other movies, but they aren't in the WMDB.)

Insert

Insert is fairly easy, because you don't have to worry about combining with existing data. You can just provide a document and insert it into the database. The usual method is the insertOne method on a collection.

See mongodb node insertOne

Here's how I was inserted into the staff collection in the wmdb database:

const wmdb = await Connection.open(mongoUri, 'wmdb');
const result = await wmdb.collection('staff')
                         .insertOne({uid: 1, name: "Scott D. Anderson"});

The return value is a document describing the results. For example, MongoDB will automatically generate an ID for the new document, making it the value of _id in the document in the database. The return document has the id as the value of insertedId. So, you could do:

const result = await db.collection('staff')
                       .insertOne({uid: 1, name: "Scott D. Anderson"});
console.log(`that was given the id: ${result.insertedId}`);

A similar method takes an array of documents and inserts all of them. (This is much more efficient than iterating over the array and inserting each one separately.) We could insert several people like this:

const newStaff = [{uid: 801, name: "Harry"},
                  {uid: 802, name: "Ron"},
                  {uid: 803, name: "Hermione"}];

const options = {ordered: true};
const wmdb = await Connection.open(mongoUri, 'wmdb');
const result = await wmdb.collection('staff').insertMany(newStaff, options);
console.log(`${result.insertedCount} documents were inserted`);

Update

The updateOne() method takes a "filter" document just like find, so it can locate the document you want to update. Once it's found it, there's an update document that says what modifications to make.

Here's how we could add an email address field for me in the staff table

const wmdb = await Connection.open(mongoUri, 'wmdb');
const filter = {uid: 1};  // document to update
const update = {$set: {email: "scott.anderson@wellesley.edu"}};   // changes to make
const options = {upsert: false};
const result = await wmdb.collection('staff').updateOne(filter, update, options);
console.log(`matched ${result.matchedCount} documents`,
            `modified ${result.modifiedCount}`);

The upsert token is a portmanteau of update and insert. When it's true, it means if you don't find the document, insert one. Here, we don't want to insert my email address if my document isn't in there, but if updates are fairly complete, this becomes a reasonable idea.

You can read more at this helpful tutorial:

upsert tutorial

Upsert is false by default, so it wasn't necessary to state it above. But that allowed us to introduce the topic.

The $set operator allows us to replace fields with values:

{$set: {name: "Lord Voldemort", back: true, nose: false}}

If the field doesn't exist, it will be created. If the field name has a dot, the embedded document will be updated:

{$set: {"addr.city": "Little Whinging", 
        "addr.county": "Surry"}}

We can also update elements of an array, again using dot notation but with numerical indexes:

{$set: {"hobbies.0": "tv",
        "hobbies.1": "doughnuts",
        "hobbies.2": "beer"}}

The tutorials above are quite good.

Delete

Deleting documents is fairly straightforward. The argument is a filter, as with find, but the documents are deleted rather than being retrieved. The following would delete me from the staff table:

const result = await db.collection('staff').deleteOne({uid: 1});
console.log(`deleted: ${result.deleteCount} documents`);

Aggregation Pipeline

So far, we've been working with documents as individual items, but there are a number of useful queries that can be done by grouping documents. This is called aggregation, and the query operators use a pipeline in their processing.

The following tutorial is quite good: Aggregation Pipeline Tutorial

There are several pages following that have some nice examples.

I do not expect you to master the aggregation pipeline in this course. But you should have heard of it and have some idea about its use.

Please read over the tutorial above. Here's another example. Suppose we have a contacts collection (like a phone's contacts app), or maybe it's a collection of students and the state/country they are from. The admissions office often wants to get good geographic coverage, but sometimes that means there's only 1 person from, say, Wisconsin. They might be called the "token cheesehead." (Wisconsin is known for its cheese, and so "cheesehead" is an affectionate nickname for a person from Wisconsin.) How could we find the tokens?

Here's a sample contacts list:

const contacts = [{name: "Mom", state: "VA"},
                  {name: "Connie", state: "WA"},
                  {name: "Dad", state: "NC"},
                  {name: "Lynn", state: "CA"},
                  {name: "Pattie", state: "MA"},
                  {name: "Allen", state: "MA"},
                  {name: "Jeff", state: "MA"},
                  {name: "Pat", state: "MA"},
                  {name: "Matt Damon", state: "MA"},
                  {name: "Matt Damon", state: "CA"},
                  {name: "Scout", state: "NH"},
                  {name: "Ron", state: "ME"},
                  {name: "Fred", state: "CT"},
                  {name: "George", state: "RI"},
                  {name: "Harry", state: "VT"},
                  {name: "Percy", state: "RI"}];

Here's some code that will use the aggregation pipeline to solve this. The pipeline just has two stages. The first stage groups the people by their state and counts the size of each group:

        { $group: { _id: "$state",
                    cnt: {$count: {}}}},

The _id says how (by what field) to group the documents. The cnt creates a new field that is the size of the group, using {$count: {}}.

So the second stage of the pipeline gets documents like:

{ _id: 'WA', cnt: 1 }
{ _id: 'VA', cnt: 1 }
{ _id: 'MA', cnt: 5 }
...

The second stage of the pipeline filters out just the documents where the count is 1, using the $match operator:

        { $match: { cnt: 1 } }

Finally, we get those results. That's our list of states where only one person hails from. We can then turn around and, using $in find the people in those states.

Here's the complete solution:

    const results = await col.aggregate( [
        { $group: { _id: "$state",
                    cnt: {$count: {}}}},
        { $match: { cnt: 1 } }
    ]).toArray();
    console.log(results.length);
    results.forEach((d) => console.log(d));
    // token states
    const states = results.map(d => d._id);
    // people in those states
    const tokens = await col.find({state: {$in: states}}).toArray();
    console.log(tokens.length);
    tokens.forEach((d) => console.log(d));

All Examples

You can see all the examples (part 2) and you can run them in the CS 304 course account:

cd ~cs304node/apps/exampleQueries/
node exampleQueries-part2.js

Conclusion

We learned about a few more query methods:

  • countDocuments(query) which counts documents matching the query, which is optional
  • estimatedDocumentCount(query) which estimates the number of documents matching the query, which is optional
  • distinct(prop, query) which counts the number of distinct value of some property, after running the query, which is optional

We completed our coverage of the CRUD operations:

  • insertOne(doc) inserts a document;
  • updateOne(filter, update, options) finds a document and updates it. For example:
  • updateOne({uid: 1}, {$set: {email: "scott.anderson@wellesley.edu"}})
  • "upsert" is an option that will insert a document if it doesn't find a match.
  • deleteOne(filter) deletes one document matching the filter
  • deleteMany(filter) deletes all documents matching the filter

We also learned a little about the aggregation pipeline.

  • It's a list (JSON array) of stages
  • Stages can group, filter, sort, project and similar operations on their input documents
  • Grouping is the most important part of the aggregation pipeline

References