Databases and MongoDB Query Language MQL

This reading begins our discussion of databases and how to use them from Node.js. It assumes no prior knowledge of databases, so if I make a mistake and refer to something you don't know, please help me out and let me know.

Databases

Databases are software that stores data and allows us to get it back out quickly and efficiently. We typically don't worry about exactly where or how the data is stored. (In this course, we'll be using an installation of MongoDB that is stored in the cloud, meaning on servers maintained by MongoDB; for all we know, they have outsourced the servers to AWS or Google cloud.)

Very generally, a database stores data, but that definition is so generally as to be useless. So, let's be more specific:

A database

  • stores data
  • the data is in machine-friendly form
  • the representation is easy and efficient to search. For example, hashtables (dictionaries) or b-trees.
  • the database may have additional indexes to speed up certain kinds of searches.
  • the representation is easy and efficient to insert, update and delete

For the most part, we trust that the DBMS — Database Management System does all these things correctly and efficiently. So, we will treat this as an "opaque box" or abstraction barrier, and we won't worry about how the DBMS is implemented. (That would be a different course.)

CRUD

What we do care about is how to create new items in the database, read them back out, update them, and if necessary, delete them. These four operations go by the evocative acronym CRUD:

  • CREATE means to insert a new item or entity in the database. For example, a new movie in a collection of movies or adding a new hire to a collection of employees.
  • READ means to look up an entity or set of entities. Say you look up Glass Onion on the Internet Movie DataBase — IMDB or all employees in the sales department.
  • UPDATE means to modify an entity. Say Wendy in Sales is getting a bonus and that needs to be recorded in the database. Or we need to update the average rating of Glass Onion.
  • DELETE means to remove the item from the database. Maybe Wendy in Sales has been hired away, and we need to delete her from the database.

SQL vs NoSQL

For many decades, the industry standard database technology has been SQL. There are a number of implementations by different companies, but they all organize data in similar ways (tables of rows and columns) and support a common language (Structured Query Language — SQL) for querying and managing the data.

In the past decade or so, an important new database technology has arisen, called NoSQL. It's strange to be defined by what it's not, so let's talk a little about what it is.

In a NoSQL database, the entities are not necessarily stored as a row in a table (as in SQL). In MongoDB, as in many other NoSQL databases, the entities are stored as JSON documents. (There are other kinds of NoSQL databases, but we will not be covering them.) Furthermore, NoSQL databases allow the items to be stored more flexibly, so, for example, not every document will have the same properties (fields) and they may have embedded documents or even arrays of documents. There are a few other important differences (joins, the ACID properties and horizonal scaling via sharding) that we will discuss later in course.

Note that this is not an either-or choice. It is common for companies to have both a SQL database for certain kinds of data (employee database, financial records) and a NoSQL database (website, mobile apps, point of sale registers).

Many tutorials and references for NoSQL databases explicitly compare them to SQL databases, but since most of you don't know SQL, we won't be doing that in CS 304. We will discuss SQL later in the course.

MongoDB

One of the important NoSQL databases is MongoDB. It has documentation, support and a free cloud-based installation (Mongo Atlas) at MongoDB.com. It's what we will be using in CS 304.

Sign up for the MongoDB Student Pack to receive $50 in Atlas credits and free certification!

Documents

The items or entities that we store in MongoDB are called documents. A document is JSON data, which MongoDB actually stores in a compact binary representation called BSON.

In a collection of people, each person might be represented with a document like this:

{name: "Scott",
 hair: "increasingly gray",
 height: 170, // centimeters
 office: "W118",
 email: "scott.anderson@wellesley.edu"
}

JSON objects like this give us key/value pairs, so given a document that we've looked up, we can easily find a person's name or height or whatever.

Documents can be more interesting than that, though. They can have embedded documents for more structured data. Let's add a multi-part addr field for the mailing address:

{name: "Scott",
 hair: "increasingly gray",
 height: 170, // centimeters
 office: "W118",
 email: "scott.anderson@wellesley.edu",
 addr: {office: "Computer Science Department",
        organization: "Wellesley College",
        street: "106 Central St",
        city: "Wellesley",
        state: "MA",
        zip: "02481"
        }
} 

So, if the variable person1 contains the document above, we can print the person's zip code like this:

console.log(person1.addr.zip);

(Note that chains of dots like that can fail if, say, someone in the database doesn't have an addr field, and with MongoDB, not every person needs to have all the same fields. So modern JavaScript introduced the ?. operator which will just give you undefined if the chain fails. See optional chaining.)

Documents can also have embedded lists. Let's add a list of hobbies:

{name: "Scott",
 hair: "increasingly gray",
 height: 170, // centimeters
 office: "W118",
 email: "scott.anderson@wellesley.edu",
 addr: {office: "Computer Science Department",
        organization: "Wellesley College",
        street: "106 Central St",
        city: "Wellesley",
        state: "MA",
        zip: "02481"
        }
  hobbies: ["tennis", "movies", "reading", "bridge", "dance"]
} 

MongoDB is organized around documents like these. Sets of documents are stored in a collection, which is usually a set of documents for a particular kind of entity. For example, an student database might have collections of students, courses, majors, dorms and the like.

In MongoDB, we can store a number of databases, where the word "database" means a set of related collections.

  • a document is data about one entity, represented as JSON
  • a collection is a set of documents about a particular kind of entity
  • a database is a set of related collections

To make this more concrete, let's talk in depth about a particular example database.

WMDB

For the purposes of this course, we have created a database of movies called the Wellesley Movie Database or WMDB. It has three collections:

  • movies: each document is about a particular movie
  • people: each document is about a person, maybe an actor, director or key grip
  • staff: are Wellesley students who have contributed to the WMDB over the years

Let's start with one of the staff members, me:

{uid: 1,
 name: "Scott D. Anderson"
}

The uid is a unique numeric identifier; we'll talk more about it in the next section.

Now let's turn to a person, George Clooney. His entry in the people collection might look like:

{nm: 123,
 name: "George Clooney",
 birthdate: "1961-05-06",
 addedby: {uid: 1, name: "Scott D. Anderson"}
}

Again, the nm is a numeric unique identifier.

Here's a movie he acted in. It's a document in the movies collection

{tt: 120780,
 title: "Out of Sight",
 release: "1998",
 director: null
 addedby: {uid: 1, name: "Scott D. Anderson"}
 cast: [ {nm: 123, name: "George Clooney"},
         {nm: 182, name: "Jennifer Lopez"} ]
} 

You can see that the addedby information is duplicated in the person document and the movie document. Some, but not all, of the person information is duplicated in the movie document.

Indeed, our earlier person document is incomplete. It actually has additional information in it, listing the movies that he's acted in (according to our small and incomplete WMDB; see the IMDB for the complete data):

{nm: 123,
 name: "George Clooney",
 birthdate: "1961-05-06",
 addedby: {uid: 1, name: "Scott D. Anderson"}
 movies: [{tt: 90865, title: "Combat High", release: "null"},
          {tt: 120780, title: "Out of Sight", release: "1998"},
          {tt: 1454468, title: "Gravity", release: "2014"}]
}

It's common in MongoDB to duplicate data. We can see in George Clooney's data that he acted in "Out of Sight". However, we don't know the cast for "Out of Sight"; for that, we'd have to look up its entry in the movies collection. How do we decide what data to put in a document and when to cut it off and require another lookup?

There's no single right answer for that. However, for web applications, a useful guideline is to have all the information in one document to serve one web request. For the lookup assignment, you'll write a web application that displays information about people and movies.

For this app, we wanted to be able to satisfy any request with just a single lookup. That's an arbitrary rule, but useful. In general, you don't want to have a website page request require an arbitrary number of database lookups, such as every person in a cast or every film in an actor's filmography.

(Note that not everyone in the WMDB people collection is an actor. Some are directors. Some have some other role.)

Keys

One of the things you noticed in the examples above were that each entity had a unique numeric identifier. This is common and standard in databases. This property is usually called an id (identifier) or a key.

One really important reason for this is that names are not unique. Every time I call the doctor, they ask me for my birthdate, because there's more than one "Scott Anderson" in their database. It's the same reason that Wellesley students are issued an B-number (now C-number). (Movie titles aren't always unique, either. IMDB lists 4 movies named "Hamlet": 1996, 1990, 2000, 1948.)

It's also convenient. The Social Security Administration and other US government entities can identify people with their 9-digit social security number rather than name + birthdate (and even name + birthdate isn't guaranteed to be unique). It's nice to have something that is compact and guaranteed to be unique.

The IMDB does this as well. If you look at their URLs, the page for a particular person looks like /name/nm1234 and for a movie looks like /title/tt4567. We used these same ID numbers in the WMDB. (The reason so many of my examples use George Clooney is that his IMDB id number is 123, which is easy to remember.)

IDs in MongoDB

IDs are important enough that MongoDB has special support for it. Unless you go out of your way to disable or replace it, each document inserted into a collection gets issued a unique identifying number as the value of the _id property. This number is computed using a variety of pieces of information so that with extremely high probability, it is globally unique, without requiring a global counter. (It's good to avoid global quantities when code is running concurrently, which MongoDB is designed for). So, the complete WMDB entry for George Clooney is actually:

{"_id": ObjectId("63c460782a77c1456baaffd9"),
 nm: 123,
 name: "George Clooney",
 birthdate: "1961-05-06",
 addedby: {uid: 1, name: "Scott D. Anderson"}
 movies: [{tt: 90865, title: "Combat High", release: "null"},
          {tt: 120780, title: "Out of Sight", release: "1998"},
          {tt: 1454468, title: "Gravity", release: "2014"}]
 }

The ObjectId is a 24-character hexadecimal number that is virtually guaranteed to be unique. Unfortunately, they lose a bit in convenience and compactness, being 24 characters long. We will discuss them more later in the course, but you can ignore them for now.

MongoDB Interfaces

The underlying software provides the ability to query collections, but there are a variety of interfaces. The one that MongoDB does the best job of documenting is their Mongo Shell, called Mongosh. If you do a web search for "mongo find", you'll probably be taken to this page:

mongodb find

While that page has a lot of great information on it, unfortunately, that's not the interface from Nodejs.

Better references are these:

The two interfaces are similar though, so it's not a bad idea to read the Mongosh documentation. Just keep in mind the warning at the top of it: that it's for Mongosh only.

In this course, we've decided to mostly ignore MongoSH (though we may change our minds) and focus on just one API. The disadvantage is that rather than typing commands into a shell (like MongoSH), you'll write them in a file and run the file using node. While this is more tedious for quick queries, it will be better for when you are writing your apps.

MongoDB provides two functions: findOne and find. They search identically, but if the query matches multiple documents, findOne only returns the first, while find returns all of them.

Find123

Here's our first complete example. You can run it like this:

cd ~cs304node/apps/exampleQueries/
node find123.js

We'll pick it apart, and then we'll switch to discussing how to use .find():

// File to find the person with nm 123
// Scott D. Anderson and Olivia Giandrea

const path = require('path');
require("dotenv").config({ path: path.join(process.env.HOME, '.env')});
const { Connection } = require('./connection');
const cs304 = require('./cs304');

const mongoUri = cs304.getMongoUri();
console.log('mongoUri', mongoUri);

async function main() {
    let db = await Connection.open(mongoUri, 'wmdb');
    // here's the one line that does the query
    let results = await db.collection('people').find({nm: 123}).toArray();
    console.log('found', results.length, 'results');
    results.forEach((r) => console.log(r));
    await Connection.close();
    console.log('done');
}

main();

The file begins with a comment saying what it does and who wrote it; always nice.

Next, there are a few lines that load the path, dotenv, connection and cs304 modules:

const path = require('path');
require("dotenv").config({ path: path.join(process.env.HOME, '.env')});
const { Connection } = require('./connection');
const cs304 = require('./cs304');

As we learned earlier in the course, the Node software usually requires modules to be loaded before it can do anything really interesting. In brief:

  • the path module allows us to join paths and filenames.
  • the dotenv module reads in some environment variable settings from a file named .env in our home directory. The home directory is found using yet another environment variable called "HOME" (or, from within Node, process.env.HOME).
  • The connection module is some code we wrote to help you with connecting to the Mongo database in a convenient way.
  • The cs304 module has some additional code we wrote to help you with your apps.

Both of the first two are loaded from the node_modules folder in the current directory. The second two are loaded from the local directory: notice the leading ./ in the pathnames.

The next line generates a URI to connect to Mongo Atlas, using a function that we supplied in the cs304 module:

const mongoUri = cs304.getMongoUri();

This preamble boilerplate code will be at the top of all your apps; it's in our all our demos and our starter app as well. You can copy it and mostly forget it, though it's good to have a little understanding of what it does.

Finally, we get to the function, main, that does all the work. It's an async function because it uses await to open a connection, to query the database, and to close the connection.

async function main() {
    let db = await Connection.open(mongoUri, 'wmdb');
    // here's the one line that does the query
    let results = await db.collection('people').find({nm: 123}).toArray();
    console.log('found', results.length, 'results');
    results.forEach((r) => console.log(r));
    await Connection.close();
    console.log('done');
}

The first line connects to Mongo Atlas and says we'll use the wmdb database.

    let db = await Connection.open(mongoUri, 'wmdb');

You'll do something like that before any use of the database; it's what gets us started.

At long last, we get to the code that queries the database. The asynchronous function is find. First, we have to choose what collection we want to query, using the .collection(colName) method of the db. The find function is actually a method on a collection, so we chain these together.

The findOne and find methods take the same arguments and search the same way. (Most of the rest of this reading will be about the argument to find.) The difference is in the return value. The findOne method returns the desired document. The find method returns something called a cursor. A cursor can be used to process the results one at a time, but in this course we will always retrieve all the results to an array, using the toArray() method:

    let results = await db.collection('people').find({nm: 123}).toArray();

The results variable just holds a list of JSON documents. The next few lines are normal, everyday JavaScript, which you're quite used to now. Here, they just print out the results.

    console.log('found', results.length, 'results');
    results.forEach((r) => console.log(r));

Last, we close the database connection:

    await Connection.close();
    console.log('done');

Our web apps typically will not close the connection; they will leave it open for subsequent queries.

The last line of the file just runs main:

main();

That's it. So, now we can start discussing how to do queries using find.

MongoDB Query Language MQL

Querying in MongoDB is broken down into two kinds: relatively simple queries that use the MongoDB Query Language (MQL) and more complex queries using the MongoDB Aggregation Framework. In this course, I expect you to feel comfortable with the MQL and have some ability with the aggregation framework, but not necessarily feeling comfortable with it. We'll discuss the aggregation framework later in the course.

The MQL is used in the arguments to find and other method that query the database. We'll see others.

The argument to find is a document that describes what we are looking for. In the query above, the argument was {nm: 123}.

Finding using Property Values

Probably the most common way to look something up is by property/value pairs that it has. We just saw how to look someone up using their NM:

{nm: 123}

Change the 123 to 182 and we find Jennifer Lopez.

We can also look people up by name in the same way:

{name: "George Clooney"}

Of course, we have to make sure it's spelled correctly and we might have to deal with multiples, because names are not unique. But you knew that.

The following would look up movies named "Hamlet". (Assume we chose the "movies" collection instead of the "people" collection, otherwise, we'd probablyl find nothing, since the property is name not title.)

{title: "Hamlet"}

If we just want the 1948 version with Sir Laurence Olivier, we can specify multiple field values:

{title: "Hamlet", release: 1948}

It's important to remember that the string "1948" is not the same as the number 1948 so make sure the datatype is right.

Comparison Operators

We can also search based on numeric inequalities. If we want to see the first few people inserted into the IMDB, we'd want people with a low NM. (The ID value for people is called NM, evocative of NAME, while the ID value for movies is called TT, evocative of TITLE.) The following does the trick, returning everyone with an NM < 12.

{nm: {$lt: 12}}

The comparison query operators available are

  • $eq matches values equal to the given value
  • $gt matches values greater than the given value
  • $gte matches values greater than or equal to the given value
  • $lt matches values less than the given value
  • $lte matches values less than or equal to the given value
  • $ne matches values that are not equal to the given value

These are pretty straightforward conceptually, though of course the syntax is weird and unfamiliar to us. Note also the spelling: it's $gte not $ge as you might think.

  • $in matches values that are in the given array
  • $nin matches values that are not in the given array

The $in operator is really useful. If you want to look up several people all at once, this would do the trick:

{nm: {$in: [123, 182]}}   // both cast members of Out of Sight

Security Implications

There is an important warning associated with user-supplied values. Malicious users can try to craft inputs that will cause the database to do things unintended by the programmer. So, while

{nm: 123}

will safely look up George Clooney, if the 123 came from the user, the result might not be safe:

{nm: userVal}

(For example, they could supply a userVal that works out to {$ne:-1} and match everything in the database instead of just one person. That might leak sensitive data. Or if they are able to delete chosen data, they could delete all the data.) Therefore, when using user data, it's better to do:

{nm: {$eq: userVal}}

See the Security Implications in that page.

Boolean Combinations

Of course, we need to be able to test several conditions, we need to combine them with and, or, and not. Here are some examples:

{$and: [ {nm: {$gt: 120}}, {nm: {$lt: 130}} ]} // 120 < nm < 130

Here's a more compact way, since we are using the same field in each comparison:

{nm: {$gt: 120, $lt: 130}}  // 120 < nm < 130

Notice that the $and operator is out front, and the corresponding value is an array of booleans. This kind of prefix notation is probably unfamiliar to you (though it's standard in languages like Lisp). It does have some advantages, if you have more than two conditions. This tests for three different NM values (though of course $in is easier):

{$or: [ {nm: 123}, {nm: 147}, {nm: 183} ]}

The following is ridiculous, but possible:

// 5 < nm < 10 or 120 < nm < 125
{$or: [ {$and: [{nm: {$gt: 5}}, {nm: {$lt: 10}}]},
        {$and: [{nm: {$gt: 120}}, {nm: {$lt: 125}}]}]}

Let's not go further down this road.

Strings and Regular Expressions

We've already searched using strings. But there's a power language for describing sets of strings called regular expressions. It's a topic with cool connections to automata theory and state machines, but we don't have time for all that right now. Indeed, we will only scratch the surface of regular expressions. But put learning more about regular expressions on your personal to-do list.

Regular expressions (regex to its friends) is way to search strings. Regular expressions are built-in to the JavaScript language and fixed regex patterns are denoted with surrounding slashes instead of quotation marks. The following is a regex that matches the word "Potter":

/Potter/

Thus, it would be useful for searching for the various Harry Potter movies. Here's how you would do that in MQL:

{title: {$regex: /Potter/}}

Note that by default, a regex matches any substring of the searched text. The following would find people named John:

{name: {$regex: /John/}}

But it would also match people whose last name was "John" or "Johnson". If you want to the pattern to match only if it matches the beginning of the string, use a caret character, which matches the beginning of the string:

{name: {$regex: /^John/}}

Similarly, if you want to specify that the regular expression matches at the end of the string, you would end with a dollar sign, which matches the end of the string. So the following matches people whose last name is "George".

{name: {$regex: /George$/}}

Metacharacters and Escaping

Regular expression languages are fascinating, powerful and complicated. The two examples we just saw — of ^ to match the beginning of the string and $ to match the end of the string — are examples of metacharacters: characters that are part of the regex language rather than being literally matched against the string. Another example is . which matches against any single character (it's a "wildcard" character, like wildcards that are sometimes allowed in games like Poker or Mahjong).

There are metacharacters to match any digit, any "word" character and many, many other ways of matching. See JavaScript Regular Expressions. The MQL regex language is nearly the same.

Metacharacters are powerful, but they complicate matching. Suppose you wanted to search for movies in the WMDB whose title contains a dot (period). As we saw above, the dot is a metacharacter that matches any single character. If you searched for:

{title: /./}    // match any character

That pattern would match every movie in the collection, which is not what we want. So, to match a literal dot, we need to escape it, which is done with a backslash, like this:

{title: /\./}    // match a literal dot

that search in the WMDB matches only 28 movies, with titles like:

  'E.T. the Extra-Terrestrial',
  'Mrs. Doubtfire',
  'The Adventures of Mary-Kate & Ashley: The Case of the U.S. S',
  'Dr. Horribles Sing Along Blog',
  'Stuck in Love.',
  'Guardians of the Galaxy Vol. 2',
  'Emma.',
  'Dr. Zhivago',
  'Yi Yi: A One and a Two...',
  'St. Trinians',
  'Mr. & Mrs. Smith',
  'Mr. Skeffington',
  'U.S. Marshals',
  'Sympathy for Mr. Vengeance',
  'Rumor Has It...',
  'Mrs. Palfrey at The Claremont',
  'Monsters vs. Aliens',
  'G.I. Joe: Retaliation',
  'The Young and Prodigious T.S. Spivet',
  'Godzilla vs. Kong',
  'Your Name.',
  'To All the Boys: P.S. I Still Love You',
  'A.I. Artificial Intelligence',
  'Mr. and Mrs. Smith',
  'Fantastic Mr. Fox',
  'Jaane Tu... Ya Jaane Na',
  'Pokémon the Movie: Kyurem vs. the Sword of Justice',
  'We Love You, Mr. Jin'

Some of the metacharacters that need escaping are:

^ $ . ( ) ? + * [ ] | @ \ 

So, for example, if you wanted to search for a movie with a dollar sign in the title, you would have to do:

{title: /\$9\.99/}    // match a literal dollar sign and dot

One common desire is case-insensitive matching, which you can get by appending an "i" to the regex:

{name: {$regex: /^george/i }}

The search above will match "George", "george", "GEORGE", and even weird strings like "GeOrGe", should they exist.

Dynamic Regular Expressions

The slash syntax is for regex patterns that are known at compile time. They can be turned into particularly efficient code. However, if the user has supplied the string to search for, you have to dynamically construct a regex, like this:

let pattern = new RegExp(userVal);
{name: {$regex: pattern}}

If you want case-insensitive matching, you can do:

let pattern = new RegExp(userVal, 'i');
{name: {$regex: pattern}}

Nested Documents

As we saw, it can be useful to have nested documents, such as the director of a movie or a structured address. Can we search by that criterion? Yes, we can, using dot notation:

{"director.name": "Peter Jackson"}   // movies directed by Peter Jackson
{"director.nm": 1392}   // same, but using an unambiguous ID
{"addr.zip": "02481"}   // people in the Wellesley zip code

If things are nested deeper, you can use more dots.

Nested Arrays

We also have nested arrays. For example, the cast field is an array of documents, each of which is a dictionary describing a cast member.

First, we can access the size or length of the array.

{"cast": { $size: 3 }}   // movies with a cast of size 3

We can also search the embedded array:

{"cast.nm": 123} // George Clooney in the cast

Movies with someone named George in the cast:

{"cast.name": /^george/i }    // someone named George in the cast

We can even do more elaborate tests:

{"cast": { $elemMatch: { nm: { $gt: 1, $lt: 10 }}}}  // person numbered 1-10 in the cast

Projection

There's no sense fetching data you don't need. It just wastes network bandwidth to download and ignore the data, and it wastes memory in your Node process. Choosing which data you want to retrieve is called projection (from the idea in mathematics of projecting from a higher-dimensional space to a lower-dimensional space).

To do projection, we will use the optional second argument to the find method. We provide a document that gives the fields we want, with an associated value of 1. Everything else is discarded, except the _id field. If we want to get rid of that, too, we list it, but with a 0.

The following finds just the title and tt of each movie:

.find({}, {projection: {_id: 0, title: 1, tt: 1}})

An alternative is to chain the project method onto the find. That can often be clearer. The missing argument to find is the same as supplying an empty document.

.find().project({_id: 0, title: 1, tt: 1});

Sorting

We can also use the second argument to do sorting of the results. To sort by a field in ascending order, list the field with a 1. For descending order, list it with a -1.

The following gets all movies, sorted by title (ascending).

find( {}, {sort: {title: 1}, projection: {_id: 0, title: 1, tt: 1}});

Here they are in descending order:

find( {}, {sort: {title: -1}, projection: {_id: 0, title: 1, tt: 1}});

Similarly, there's a sort method that we can chain onto the find:

find().sort({title: -1}).project({title: 1, tt: 1});

Limits

We might only need a few of the items. Again, it's wasteful to retrieve data we don't need. Or, maybe we've sorted them by some useful criterion and we only want the top 10. We can chain a limit method onto our find to limit the number of results.

Here's the first movie, alphabetically:

find( {}, {sort: {title: 1}, projection: {_id: 0, title: 1, tt: 1}}).limit(1);

or, equivalently:

find({}).sort({title: 1}).project({_id: 0, title: 1, tt: 1}).limit(1);

Chaining allows us to spread out the chain over several lines of code, which is then easier to read:

someCollection.find({...})
    .sort({fields...})
    .project({fields...})
    .limit(N)

But notice that, as with jQuery chaining, the different lines do not end in a semi-colon. A semi-colon would break the chain and also be a syntax error.

Running Examples

There are a great many working examples in the ~cs304node/apps/exampleQueries/ folder. We'll go over these in class. In lab, we'll learn how to copy and modify them.

Here's the first half of the example queries

Conclusion

We've covered a lot of ground in this reading. Let's take a minute to reflect.

  • Two major kinds of database technologies are SQL and NoSQL.
  • MongoDB is a NoSQL kind of database.
  • MongoDB stores documents in collections.
  • A database is a set of related collections. E.g. the wmdb has movies, people and staff.
  • A document is JSON data: fields with values that are strings, numbers, booleans and such.
  • Values can also be date objects, unlike plain JSON.
  • A document can have embedded documents and arrays. They can be nested arbitrarily deeply.

We also learned a lot about searching using the find method and the MongoDB Query Language (MQL). In particular, we saw how to use it from a Node.js program, which is similar but slightly different from how it's used in MongoSH, the Mongo Shell.

The following are some example arguments to the find() method:

  • {field: val} searches for documents with fields having that value
  • {field: {$eq: val}} is best for untrustworthy values
  • {field1: val1, field2: val2} matches on both field values
  • {field: {$lt: val}} compares fields to a fixed value
  • {field: {$in: [val1, val2]}} provides a list of possible values
  • {$and: [ cond1, cond2 ]} allows us several search conditions
  • {field: {$regex: /pattern/}} matches using regular expressions, very powerful for strings
  • {"a.b": val} searches nested fields. The nested field can also be an array.

Finally, we learned about sorting, projection and limits:

  • .project({a: 1, b: 1}) just retrieve fields a and b
  • .sort({a: 1, b: -1}) sort by a in ascending order and b in descending order
  • .limit(n) return only n results

References

to do emphasis use of findOne when only one is wanted/expected

to do mention null return value of findOne if no document matches