Databases and MongoDB Query Language MQL¶
This reading begins our discussion of databases and how to use them from Node.js. It assumes no prior knowledge of databases, so if I make a mistake and refer to something you don't know, please help me out and let me know.
- Databases
- CRUD
- SQL vs NoSQL
- MongoDB
- Documents
- WMDB
- Keys
- IDs in MongoDB
- MongoDB Interfaces
- Find123
- MongoDB Query Language MQL
- Finding using Property Values
- Comparison Operators
- Security Implications
- Boolean Combinations
- Strings and Regular Expressions
- Metacharacters and Escaping
- Case Insensitive Search
- Dynamic Regular Expressions
- Nested Documents
- Nested Arrays
- Projection
- Sorting
- Limits
- Running Examples
- Conclusion
- References
Databases¶
Databases are software that stores data and allows us to get it back out quickly and efficiently. We typically don't worry about exactly where or how the data is stored. (In this course, we'll be using an installation of MongoDB that is stored in the cloud, meaning on servers maintained by MongoDB; for all we know, they have outsourced the servers to AWS or Google cloud.)
Very generally, a database stores data, but that definition is so generally as to be useless. So, let's be more specific:
A database
- stores data
- the data is in machine-friendly form
- the representation is easy and efficient to search. For example, hashtables (dictionaries) or b-trees.
- the database may have additional indexes to speed up certain kinds of searches.
- the representation is easy and efficient to insert, update and delete
For the most part, we trust that the DBMS — Database Management System does all these things correctly and efficiently. So, we will treat this as an "opaque box" or abstraction barrier, and we won't worry about how the DBMS is implemented. (That would be a different course.)
CRUD¶
What we do care about is how to create new items in the database, read them back out, update them, and if necessary, delete them. These four operations go by the evocative acronym CRUD:
- CREATE means to insert a new item or entity in the database. For example, a new movie in a collection of movies or adding a new hire to a collection of employees.
- READ means to look up an entity or set of entities. Say you look up Glass Onion on the Internet Movie DataBase — IMDB or all employees in the sales department.
- UPDATE means to modify an entity. Say Wendy in Sales is getting a bonus and that needs to be recorded in the database. Or we need to update the average rating of Glass Onion.
- DELETE means to remove the item from the database. Maybe Wendy in Sales has been hired away, and we need to delete her from the database.
SQL vs NoSQL¶
For many decades, the industry standard database technology has been SQL. There are a number of implementations by different companies, but they all organize data in similar ways (tables of rows and columns) and support a common language (Structured Query Language — SQL) for querying and managing the data.
In the past decade or so, an important new database technology has arisen, called NoSQL. It's strange to be defined by what it's not, so let's talk a little about what it is.
In a NoSQL database, the entities are not necessarily stored as a row in a table (as in SQL). In MongoDB, as in many other NoSQL databases, the entities are stored as JSON documents. (There are other kinds of NoSQL databases, but we will not be covering them.) Furthermore, NoSQL databases allow the items to be stored more flexibly, so, for example, not every document will have the same properties (fields) and they may have embedded documents or even arrays of documents. There are a few other important differences (joins, the ACID properties and horizonal scaling via sharding) that we will discuss later in course.
Note that this is not an either-or choice. It is common for companies to have both a SQL database for certain kinds of data (employee database, financial records) and a NoSQL database (website, mobile apps, point of sale registers).
Many tutorials and references for NoSQL databases explicitly compare them to SQL databases, but since most of you don't know SQL, we won't be doing that in CS 304. We will discuss SQL later in the course.
MongoDB¶
One of the important NoSQL databases is MongoDB. It has documentation, support and a free cloud-based installation (Mongo Atlas) at MongoDB.com. It's what we will be using in CS 304.
Sign up for the MongoDB Student Pack to receive $50 in Atlas credits and free certification!
Documents¶
The items or entities that we store in MongoDB are called documents. A document is JSON data, which MongoDB actually stores in a compact binary representation called BSON.
In a collection of people, each person might be represented with a document like this:
{name: "Scott",
hair: "increasingly gray",
height: 170, // centimeters
office: "W118",
email: "scott.anderson@wellesley.edu"
}
JSON objects like this give us key/value pairs, so given a document that we've looked up, we can easily find a person's name or height or whatever.
Documents can be more interesting than that, though. They can have
embedded documents for more structured data. Let's add a multi-part
addr
field for the mailing address:
{name: "Scott",
hair: "increasingly gray",
height: 170, // centimeters
office: "W118",
email: "scott.anderson@wellesley.edu",
addr: {office: "Computer Science Department",
organization: "Wellesley College",
street: "106 Central St",
city: "Wellesley",
state: "MA",
zip: "02481"
}
}
So, if the variable person1
contains the document above, we can
print the person's zip code like this:
console.log(person1.addr.zip);
(Note that chains of dots like that can fail if, say, someone in the
database doesn't have an addr
field, and with MongoDB, not every
person needs to have all the same fields. So modern JavaScript
introduced the ?.
operator which will just give you undefined
if
the chain fails. See optional
chaining.)
Documents can also have embedded lists. Let's add a list of hobbies:
{name: "Scott",
hair: "increasingly gray",
height: 170, // centimeters
office: "W118",
email: "scott.anderson@wellesley.edu",
addr: {office: "Computer Science Department",
organization: "Wellesley College",
street: "106 Central St",
city: "Wellesley",
state: "MA",
zip: "02481"
}
hobbies: ["tennis", "movies", "reading", "bridge", "dance"]
}
MongoDB is organized around documents like these. Sets of documents are stored in a collection, which is usually a set of documents for a particular kind of entity. For example, an student database might have collections of students, courses, majors, dorms and the like.
In MongoDB, we can store a number of databases, where the word "database" means a set of related collections.
- a document is data about one entity, represented as JSON
- a collection is a set of documents about a particular kind of entity
- a database is a set of related collections
To make this more concrete, let's talk in depth about a particular example database.
WMDB¶
For the purposes of this course, we have created a database of movies called the Wellesley Movie Database or WMDB. It has three collections:
movies
: each document is about a particular moviepeople
: each document is about a person, maybe an actor, director or key gripstaff
: are Wellesley students who have contributed to the WMDB over the years
Let's start with one of the staff members, me:
{uid: 1,
name: "Scott D. Anderson"
}
The uid
is a unique numeric identifier; we'll talk more about it in the next section.
Now let's turn to a person, George Clooney. His entry in the people
collection might look like:
{nm: 123,
name: "George Clooney",
birthdate: "1961-05-06",
addedby: {uid: 1, name: "Scott D. Anderson"}
}
Again, the nm
is a numeric unique identifier.
Here's a movie he acted in. It's a document in the movies
collection
{tt: 120780,
title: "Out of Sight",
release: "1998",
director: null
addedby: {uid: 1, name: "Scott D. Anderson"}
cast: [ {nm: 123, name: "George Clooney"},
{nm: 182, name: "Jennifer Lopez"} ]
}
You can see that the addedby
information is duplicated in the person
document and the movie document. Some, but not all, of the person
information is duplicated in the movie document.
Indeed, our earlier person document is incomplete. It actually has additional information in it, listing the movies that he's acted in (according to our small and incomplete WMDB; see the IMDB for the complete data):
{nm: 123,
name: "George Clooney",
birthdate: "1961-05-06",
addedby: {uid: 1, name: "Scott D. Anderson"}
movies: [{tt: 90865, title: "Combat High", release: "null"},
{tt: 120780, title: "Out of Sight", release: "1998"},
{tt: 1454468, title: "Gravity", release: "2014"}]
}
It's common in MongoDB to duplicate data. We can see in George
Clooney's data that he acted in "Out of Sight". However, we don't know
the cast for "Out of Sight"; for that, we'd have to look up its entry
in the movies
collection. How do we decide what data to put in a
document and when to cut it off and require another lookup?
There's no single right answer for that. However, for web applications,
a useful guideline is to have all the information in one document to
serve one web request. For the lookup
assignment, you'll write a web
application that displays information about people and movies.
For this app, we wanted to be able to satisfy any request with just a single lookup. That's an arbitrary rule, but useful. In general, you don't want to have a website page request require an arbitrary number of database lookups, such as every person in a cast or every film in an actor's filmography.
(Note that not everyone in the WMDB people
collection is an
actor. Some are directors. Some have some other role.)
Keys¶
One of the things you noticed in the examples above were that each entity had a unique numeric identifier. This is common and standard in databases. This property is usually called an id (identifier) or a key.
One really important reason for this is that names are not unique. Every time I call the doctor, they ask me for my birthdate, because there's more than one "Scott Anderson" in their database. It's the same reason that Wellesley students are issued an B-number (now C-number). (Movie titles aren't always unique, either. IMDB lists 4 movies named "Hamlet": 1996, 1990, 2000, 1948.)
It's also convenient. The Social Security Administration and other US government entities can identify people with their 9-digit social security number rather than name + birthdate (and even name + birthdate isn't guaranteed to be unique). It's nice to have something that is compact and guaranteed to be unique.
The IMDB does this as well. If you look at their URLs, the page for a
particular person looks like /name/nm1234
and for a movie looks like
/title/tt4567
. We used these same ID numbers in the WMDB. (The
reason so many of my examples use George Clooney is that his IMDB id
number is 123, which is easy to remember.)
IDs in MongoDB¶
IDs are important enough that MongoDB has special support for
it. Unless you go out of your way to disable or replace it, each
document inserted into a collection gets issued a unique identifying
number as the value of the _id
property. This number is computed
using a variety of pieces of information so that with extremely high
probability, it is globally unique, without requiring a global
counter. (It's good to avoid global quantities when code is running
concurrently, which MongoDB is designed for). So, the complete WMDB
entry for George Clooney is actually:
{"_id": ObjectId("63c460782a77c1456baaffd9"),
nm: 123,
name: "George Clooney",
birthdate: "1961-05-06",
addedby: {uid: 1, name: "Scott D. Anderson"}
movies: [{tt: 90865, title: "Combat High", release: "null"},
{tt: 120780, title: "Out of Sight", release: "1998"},
{tt: 1454468, title: "Gravity", release: "2014"}]
}
The ObjectId
is a 24-character hexadecimal number that is virtually
guaranteed to be unique. Unfortunately, they lose a bit in convenience
and compactness, being 24 characters long. We will discuss them more
later in the course, but you can ignore them for now.
MongoDB Interfaces¶
The underlying software provides the ability to query collections, but there are a variety of interfaces. The one that MongoDB does the best job of documenting is their Mongo Shell, called Mongosh. If you do a web search for "mongo find", you'll probably be taken to this page:
While that page has a lot of great information on it, unfortunately, that's not the interface from Nodejs.
Better references are these:
The two interfaces are similar though, so it's not a bad idea to read the Mongosh documentation. Just keep in mind the warning at the top of it: that it's for Mongosh only.
In this course, we've decided to mostly ignore MongoSH (though we may
change our minds) and focus on just one API. The disadvantage is that
rather than typing commands into a shell (like MongoSH), you'll write
them in a file and run the file using node
. While this is more
tedious for quick queries, it will be better for when you are writing
your apps.
MongoDB provides two functions: findOne
and find
. They search
identically, but if the query matches multiple documents, findOne
only returns the first, while find
returns all of them.
Find123¶
Here's our first complete example. You can run it like this:
cd ~cs304node/apps/exampleQueries/
node find123.js
We'll pick it apart, and then we'll switch to discussing how to use
.find()
:
// File to find the person with nm 123
// Scott D. Anderson and Olivia Giandrea
const path = require('path');
require("dotenv").config({ path: path.join(process.env.HOME, '.env')});
const { Connection } = require('./connection');
const cs304 = require('./cs304');
const mongoUri = cs304.getMongoUri();
console.log('mongoUri', mongoUri);
async function main() {
let db = await Connection.open(mongoUri, 'wmdb');
// here's the one line that does the query
let results = await db.collection('people').find({nm: 123}).toArray();
console.log('found', results.length, 'results');
results.forEach((r) => console.log(r));
await Connection.close();
console.log('done');
}
main();
The file begins with a comment saying what it does and who wrote it; always nice.
Next, there are a few lines that load the path
, dotenv
,
connection
and cs304
modules:
const path = require('path');
require("dotenv").config({ path: path.join(process.env.HOME, '.env')});
const { Connection } = require('./connection');
const cs304 = require('./cs304');
As we learned earlier in the course, the Node software usually requires modules to be loaded before it can do anything really interesting. In brief:
- the
path
module allows us to join paths and filenames. - the
dotenv
module reads in some environment variable settings from a file named.env
in our home directory. The home directory is found using yet another environment variable called "HOME" (or, from within Node,process.env.HOME
). - The
connection
module is some code we wrote to help you with connecting to the Mongo database in a convenient way. - The
cs304
module has some additional code we wrote to help you with your apps.
Both of the first two are loaded from the node_modules
folder in the
current directory. The second two are loaded from the local directory:
notice the leading ./
in the pathnames.
The next line generates a URI to connect to Mongo Atlas, using a
function that we supplied in the cs304
module:
const mongoUri = cs304.getMongoUri();
This preamble boilerplate code will be at the top of all your apps;
it's in our all our demos and our starter
app as well. You can copy
it and mostly forget it, though it's good to have a little
understanding of what it does.
Finally, we get to the function, main
, that does all the work. It's
an async
function because it uses await
to open a connection,
to query the database, and to close the connection.
async function main() {
let db = await Connection.open(mongoUri, 'wmdb');
// here's the one line that does the query
let results = await db.collection('people').find({nm: 123}).toArray();
console.log('found', results.length, 'results');
results.forEach((r) => console.log(r));
await Connection.close();
console.log('done');
}
The first line connects to Mongo Atlas and says we'll use the wmdb
database.
let db = await Connection.open(mongoUri, 'wmdb');
You'll do something like that before any use of the database; it's what gets us started.
At long last, we get to the code that queries the database. The
asynchronous function is find
. First, we have to choose what
collection we want to query, using the .collection(colName)
method
of the db
. The find
function is actually a method on a collection,
so we chain these together.
The findOne
and find
methods take the same arguments and search
the same way. (Most of the rest of this reading will be about the
argument to find
.) The difference is in the return value. The
findOne
method returns the desired document. The find
method
returns something called a cursor. A cursor can be used to process
the results one at a time, but in this course we will always retrieve
all the results to an array, using the toArray()
method:
let results = await db.collection('people').find({nm: 123}).toArray();
The results
variable just holds a list of JSON documents. The next
few lines are normal, everyday JavaScript, which you're quite used to
now. Here, they just print out the results.
console.log('found', results.length, 'results');
results.forEach((r) => console.log(r));
Last, we close the database connection:
await Connection.close();
console.log('done');
Our web apps typically will not close the connection; they will leave it open for subsequent queries.
The last line of the file just runs main
:
main();
That's it. So, now we can start discussing how to do queries using find
.
MongoDB Query Language MQL¶
Querying in MongoDB is broken down into two kinds: relatively simple queries that use the MongoDB Query Language (MQL) and more complex queries using the MongoDB Aggregation Framework. In this course, I expect you to feel comfortable with the MQL and have some ability with the aggregation framework, but not necessarily feeling comfortable with it. We'll discuss the aggregation framework later in the course.
The MQL is used in the arguments to find
and other method that query
the database. We'll see others.
The argument to find
is a document that describes what we are
looking for. In the query above, the argument was {nm: 123}
.
Finding using Property Values¶
Probably the most common way to look something up is by property/value pairs that it has. We just saw how to look someone up using their NM:
{nm: 123}
Change the 123 to 182 and we find Jennifer Lopez.
We can also look people up by name in the same way:
{name: "George Clooney"}
Of course, we have to make sure it's spelled correctly and we might have to deal with multiples, because names are not unique. But you knew that.
The following would look up movies named "Hamlet". (Assume we chose
the "movies" collection instead of the "people" collection, otherwise,
we'd probablyl find nothing, since the property is name
not
title
.)
{title: "Hamlet"}
If we just want the 1948 version with Sir Laurence Olivier, we can specify multiple field values:
{title: "Hamlet", release: 1948}
It's important to remember that the string "1948"
is not the same as
the number 1948
so make sure the datatype is right.
Comparison Operators¶
We can also search based on numeric inequalities. If we want to see the first few people inserted into the IMDB, we'd want people with a low NM. (The ID value for people is called NM, evocative of NAME, while the ID value for movies is called TT, evocative of TITLE.) The following does the trick, returning everyone with an NM < 12.
{nm: {$lt: 12}}
The comparison query operators available are
$eq
matches values equal to the given value$gt
matches values greater than the given value$gte
matches values greater than or equal to the given value$lt
matches values less than the given value$lte
matches values less than or equal to the given value$ne
matches values that are not equal to the given value
These are pretty straightforward conceptually, though of course the
syntax is weird and unfamiliar to us. Note also the spelling: it's
$gte
not $ge
as you might think.
$in
matches values that are in the given array$nin
matches values that are not in the given array
The $in
operator is really useful. If you want to look up several
people all at once, this would do the trick:
{nm: {$in: [123, 182]}} // both cast members of Out of Sight
Security Implications¶
There is an important warning associated with user-supplied values. Malicious users can try to craft inputs that will cause the database to do things unintended by the programmer. So, while
{nm: 123}
will safely look up George Clooney, if the 123 came from the user, the result might not be safe:
{nm: userVal}
(For example, they could supply a userVal
that works out to
{$ne:-1}
and match everything in the database instead of just one
person. That might leak sensitive data. Or if they are able to delete
chosen data, they could delete all the data.) Therefore, when using
user data, it's better to do:
{nm: {$eq: userVal}}
See the Security Implications in that page.
Boolean Combinations¶
Of course, we need to be able to test several conditions, we need to combine them with and, or, and not. Here are some examples:
{$and: [ {nm: {$gt: 120}}, {nm: {$lt: 130}} ]} // 120 < nm < 130
Here's a more compact way, since we are using the same field in each comparison:
{nm: {$gt: 120, $lt: 130}} // 120 < nm < 130
Notice that the $and
operator is out front, and the corresponding
value is an array of booleans. This kind of prefix notation is
probably unfamiliar to you (though it's standard in languages like
Lisp). It does have some advantages, if you have more than two
conditions. This tests for three different NM values (though of course
$in
is easier):
{$or: [ {nm: 123}, {nm: 147}, {nm: 183} ]}
The following is ridiculous, but possible:
// 5 < nm < 10 or 120 < nm < 125
{$or: [ {$and: [{nm: {$gt: 5}}, {nm: {$lt: 10}}]},
{$and: [{nm: {$gt: 120}}, {nm: {$lt: 125}}]}]}
Let's not go further down this road.
Strings and Regular Expressions¶
We've already searched using strings. But there's a power language for describing sets of strings called regular expressions. It's a topic with cool connections to automata theory and state machines, but we don't have time for all that right now. Indeed, we will only scratch the surface of regular expressions. But put learning more about regular expressions on your personal to-do list.
Regular expressions (regex to its friends) is way to search strings. Regular expressions are built-in to the JavaScript language and fixed regex patterns are denoted with surrounding slashes instead of quotation marks. The following is a regex that matches the word "Potter":
/Potter/
Thus, it would be useful for searching for the various Harry Potter movies. Here's how you would do that in MQL:
{title: {$regex: /Potter/}}
Note that by default, a regex matches any substring of the searched text. The following would find people named John:
{name: {$regex: /John/}}
But it would also match people whose last name was "John" or "Johnson". If you want to the pattern to match only if it matches the beginning of the string, use a caret character, which matches the beginning of the string:
{name: {$regex: /^John/}}
Similarly, if you want to specify that the regular expression matches at the end of the string, you would end with a dollar sign, which matches the end of the string. So the following matches people whose last name is "George".
{name: {$regex: /George$/}}
Metacharacters and Escaping¶
Regular expression languages are fascinating, powerful and
complicated. The two examples we just saw — of ^
to match the
beginning of the string and $
to match the end of the string —
are examples of metacharacters: characters that are part of the
regex language rather than being literally matched against the
string. Another example is .
which matches against any single
character (it's a
"wildcard" character, like
wildcards that are sometimes allowed in games like Poker or Mahjong).
There are metacharacters to match any digit, any "word" character and many, many other ways of matching. See JavaScript Regular Expressions. The MQL regex language is nearly the same.
Metacharacters are powerful, but they complicate matching. Suppose you wanted to search for movies in the WMDB whose title contains a dot (period). As we saw above, the dot is a metacharacter that matches any single character. If you searched for:
{title: /./} // match any character
That pattern would match every movie in the collection, which is not what we want. So, to match a literal dot, we need to escape it, which is done with a backslash, like this:
{title: /\./} // match a literal dot
that search in the WMDB matches only 28 movies, with titles like:
'E.T. the Extra-Terrestrial',
'Mrs. Doubtfire',
'The Adventures of Mary-Kate & Ashley: The Case of the U.S. S',
'Dr. Horribles Sing Along Blog',
'Stuck in Love.',
'Guardians of the Galaxy Vol. 2',
'Emma.',
'Dr. Zhivago',
'Yi Yi: A One and a Two...',
'St. Trinians',
'Mr. & Mrs. Smith',
'Mr. Skeffington',
'U.S. Marshals',
'Sympathy for Mr. Vengeance',
'Rumor Has It...',
'Mrs. Palfrey at The Claremont',
'Monsters vs. Aliens',
'G.I. Joe: Retaliation',
'The Young and Prodigious T.S. Spivet',
'Godzilla vs. Kong',
'Your Name.',
'To All the Boys: P.S. I Still Love You',
'A.I. Artificial Intelligence',
'Mr. and Mrs. Smith',
'Fantastic Mr. Fox',
'Jaane Tu... Ya Jaane Na',
'Pokémon the Movie: Kyurem vs. the Sword of Justice',
'We Love You, Mr. Jin'
Some of the metacharacters that need escaping are:
^ $ . ( ) ? + * [ ] | @ \
So, for example, if you wanted to search for a movie with a dollar sign in the title, you would have to do:
{title: /\$9\.99/} // match a literal dollar sign and dot
Case Insensitive Search¶
One common desire is case-insensitive matching, which you can get by appending an "i" to the regex:
{name: {$regex: /^george/i }}
The search above will match "George", "george", "GEORGE", and even weird strings like "GeOrGe", should they exist.
Dynamic Regular Expressions¶
The slash syntax is for regex patterns that are known at compile time. They can be turned into particularly efficient code. However, if the user has supplied the string to search for, you have to dynamically construct a regex, like this:
let pattern = new RegExp(userVal);
{name: {$regex: pattern}}
If you want case-insensitive matching, you can do:
let pattern = new RegExp(userVal, 'i');
{name: {$regex: pattern}}
Nested Documents¶
As we saw, it can be useful to have nested documents, such as the director of a movie or a structured address. Can we search by that criterion? Yes, we can, using dot notation:
{"director.name": "Peter Jackson"} // movies directed by Peter Jackson
{"director.nm": 1392} // same, but using an unambiguous ID
{"addr.zip": "02481"} // people in the Wellesley zip code
If things are nested deeper, you can use more dots.
Nested Arrays¶
We also have nested arrays. For example, the cast
field is an array
of documents, each of which is a dictionary describing a cast member.
First, we can access the size or length of the array.
{"cast": { $size: 3 }} // movies with a cast of size 3
We can also search the embedded array:
{"cast.nm": 123} // George Clooney in the cast
Movies with someone named George in the cast:
{"cast.name": /^george/i } // someone named George in the cast
We can even do more elaborate tests:
{"cast": { $elemMatch: { nm: { $gt: 1, $lt: 10 }}}} // person numbered 1-10 in the cast
Projection¶
There's no sense fetching data you don't need. It just wastes network bandwidth to download and ignore the data, and it wastes memory in your Node process. Choosing which data you want to retrieve is called projection (from the idea in mathematics of projecting from a higher-dimensional space to a lower-dimensional space).
To do projection, we will use the optional second argument to the
find
method. We provide a document that gives the fields we want,
with an associated value of 1. Everything else is discarded, except
the _id
field. If we want to get rid of that, too, we list it, but
with a 0.
The following finds just the title and tt of each movie:
.find({}, {projection: {_id: 0, title: 1, tt: 1}})
An alternative is to chain the project
method onto the
find
. That can often be clearer. The missing argument to find
is
the same as supplying an empty document.
.find().project({_id: 0, title: 1, tt: 1});
Sorting¶
We can also use the second argument to do sorting of the results. To sort by a field in ascending order, list the field with a 1. For descending order, list it with a -1.
The following gets all movies, sorted by title (ascending).
find( {}, {sort: {title: 1}, projection: {_id: 0, title: 1, tt: 1}});
Here they are in descending order:
find( {}, {sort: {title: -1}, projection: {_id: 0, title: 1, tt: 1}});
Similarly, there's a sort
method that we can chain onto the find
:
find().sort({title: -1}).project({title: 1, tt: 1});
Limits¶
We might only need a few of the items. Again, it's wasteful to
retrieve data we don't need. Or, maybe we've sorted them by some
useful criterion and we only want the top 10. We can chain a limit
method onto our find
to limit the number of results.
Here's the first movie, alphabetically:
find( {}, {sort: {title: 1}, projection: {_id: 0, title: 1, tt: 1}}).limit(1);
or, equivalently:
find({}).sort({title: 1}).project({_id: 0, title: 1, tt: 1}).limit(1);
Chaining allows us to spread out the chain over several lines of code, which is then easier to read:
someCollection.find({...})
.sort({fields...})
.project({fields...})
.limit(N)
But notice that, as with jQuery chaining, the different lines do not end in a semi-colon. A semi-colon would break the chain and also be a syntax error.
Running Examples¶
There are a great many working examples in the
~cs304node/apps/exampleQueries/
folder. We'll go over these in class.
In lab, we'll learn how to copy and modify them.
Here's the first half of the example queries
Conclusion¶
We've covered a lot of ground in this reading. Let's take a minute to reflect.
- Two major kinds of database technologies are SQL and NoSQL.
- MongoDB is a NoSQL kind of database.
- MongoDB stores documents in collections.
- A database is a set of related collections. E.g. the
wmdb
hasmovies
,people
andstaff
. - A document is JSON data: fields with values that are strings, numbers, booleans and such.
- Values can also be date objects, unlike plain JSON.
- A document can have embedded documents and arrays. They can be nested arbitrarily deeply.
We also learned a lot about searching using the find
method and the
MongoDB Query Language (MQL). In particular, we saw how to use it from
a Node.js program, which is similar but slightly different from how
it's used in MongoSH, the Mongo Shell.
The following are some example arguments to the find()
method:
{field: val}
searches for documents with fields having that value{field: {$eq: val}}
is best for untrustworthy values{field1: val1, field2: val2}
matches on both field values{field: {$lt: val}}
compares fields to a fixed value{field: {$in: [val1, val2]}}
provides a list of possible values{$and: [ cond1, cond2 ]}
allows us several search conditions{field: {$regex: /pattern/}}
matches using regular expressions, very powerful for strings{"a.b": val}
searches nested fields. The nested field can also be an array.
Finally, we learned about sorting, projection and limits:
.project({a: 1, b: 1})
just retrieve fieldsa
andb
.sort({a: 1, b: -1})
sort bya
in ascending order andb
in descending order.limit(n)
return onlyn
results
References¶
to do emphasis use of
findOne
when only one is wanted/expectedto do mention null return value of
findOne
if no document matches