Contacts Assignment¶
New for fall 2024: this is a solo assignment. You will work by yourself in your own database.
The goals for this assignment are to get you comfortable with the following:
- Designing and setting up some MySQL tables using
CREATE TABLE
and such. You'll write the SQL statements up in a batch file so that you can usesource
to re-create your tables and so that I can read it. - Inserting data into MySQL tables, using
INSERT
. You might have a batch file to do this, or you might get your data from a text (CSV) file. - Creating queries to process and display that data. The queries will be contained in "batch" files, just like the ones we've used in class.
Create a contacts
folder in your
cs304/assignments
folder for this homework.
Your work will consist of a set of batch files. Details on submission below.
Time¶
Historically, this takes almost everyone less than 8 hours to complete. If it's taken you that long and you're not done, stop and get some help.
Data Entry¶
Data is necessary so that we can test the database design and queries, but data entry is tedious and error prone. I would like to minimize that, but I can only do so much. The best I can offer is a pair of text files in CSV format with some data and some instructions; you're welcome to augment them with additional data.
Here's a list of addresses
name,kind,city,state
Mom,home,Fairfax,VA
Connie,home,Lynden,WA
Dad,home,Charlotte,NC
Lynn,home,Los Angeles,CA
Pattie,home,Wellesley,MA
Allen,work,Needham,MA
Jeff,home,Wellesley,MA
Pat,home,Natick,MA
Matt Damon,home,Boston,MA
Matt Damon,home,Hollywood,CA
Scout,home,Dartmouth,NH
Ron,home,Portland,ME
Fred,home,New Haven,CT
George,home,Providence,RI
Harry,home,Montpelier,VT
Percy,work,Providence,RI
Here's a list of phone numbers
name,kind,phnum
Mom,home,7035555681
Mom,cell,7035551234
Connie,home,6095551243
Dad,home,7045553004
Dad,cell,7045551324
Dad,other,7045551343
Lynn,cell,6915551423
Pattie,home,6175551432
Allen,home,6175552134
Jeff,work,7815552314
Pat,work,7815552341
Matt Damon,work,6175552341
Scout,cell,8085553124
Ron,home,2075551111
Fred,cell,2035552222
George,work,4015553333
George,home,4015554444
Harry,home,8025558888
Percy,cell,4015559999
You should copy those files and use them as part of your data. The files are values that are separated with comma characters, so the format is called "comma-separated values" or "CSV." This is a common output format from programs like MS Excel, Numbers and Google Sheets.
In the data above, you may assume that all names are unique. So, for example, there are two phone numbers for "mom"; those belong to the same person. However, your database should not assume that names will always be unique.
You may have to change those files some before importing, especially since the structure of your tables won't exactly match those files. Also, the files above have column headers (just for clarity), and you'll need to delete those initial lines from your CSV files. Or, you can add the "IGNORE number LINES" clause to the MySQL LOAD DATA command.
It might be good to read ahead about CSV files
Exporting/Importing Data¶
First, set up your MySQL database tables. Think carefully about this part: your tables may or may not match the CSV files.
Copy/edit and restructure the csv
files to have the rows
and columns you want. Importing them into a spreadsheet (such as
Excel, Numbers or Google Sheets) is another nice way to edit them in a
structured way.
Finally, you're ready to load the data:
Export the file from your spreadsheet as CSV/TSV.
Next, SCP the file, say phone-list.csv
to
your folder in your server account.
Then, in MySQL, load the CSV file into the tables in your database. This is a little tricky and requires several parts (or you can write this down in a batch file).
- Start the MySQL client.
- The SQL statement is
LOAD DATA LOCAL INFILE
. It takes many arguments. - One argument is the local file to load; that's just a filename.
- The next argument is the table to load the data into. I've
used
foo
to stand for that. You will substitute your own table name. - You'll specify how fields end, namely with a comma (CSV) or tab (TSV).
- You'll specify how lines end, namely with a newline
character, denoted \n. (But files copied from a Mac may use a
different line ending. You can convert to Unix line endings
using the Linux
command
mac2unix
.)
Example (Note the use of continuation lines):
mysql> mysql> load data local infile 'addresses.csv' -> into table foo -> fields terminated by ',' -> lines terminated by '\n'; Query OK, 10 rows affected (0.00 sec) Records: 10 Deleted: 0 Skipped: 0 Warnings: 0
Alternatively, of course, you may write this code down in a batch file, which will allow you to edit until it works.
By using this test data, the CS 304 graders and I can look for particular parts of the results, to check whether your queries are working.
I think you should not use auto_increment
in this assignment.
The Assignment¶
- Create tables and queries for a
contacts
"database" (collection of tables). You'll create these tables in your personal MySQL database (e.g. `ww123_db`). Your tables will have the following properties:- Allow people to have more than one phone number (room, work, cell, etc.)
- Allow people to have more than one address (home, school, work, vacation, etc.)
- There should be a single file,
Contacts.sql
that sets up the tables. You can create a separate batch file to insert the data if you want (probably using the CSV files described above), or you can do it by some other means. I only want to see thecreate table
SQL code. You can drop tables before creating them, so that it always starts fresh. I won't run this file, so don't worry about making it idempotent. - I expect you to use the referential integrity constraints.
- I expect you to be thoughtful about datatypes and sizes. Remember to choose the right datatype, allow enough space, but not too much, and try to prevent incorrect data.
- Create a query called
Q1-All-Addresses.sql
that creates a simple list of each person's name and addresses. This'll be useful for debugging. The output columns should be name, kind of address, city and state. Sort it by name and then by city and state. - Create a query called
Q2-All-Phones.sql
that creates a simple list of each person's name and phone numbers. This'll also be useful for debugging. The output columns should be name, kind of phone, phone number. Sort it by name and then by kind. - Create a query called
Q3-All-Home-Contacts.sql
that creates a list of each person's name, home address and home phone, if the database has both. Note that the query results shouldn't include the wordhome
, since that would be boring. Output should be name, city, state, and phone number. Sort by name and city. - Create a query called
Q4-Phone-Count.sql
that tells how many phone numbers each person has. Columns are name and count. Sort by name. - Create a query called
Q5-New-Englanders.sql
that reports the names of anyone living (home address) in the states of MA, CT, VT, NH, ME and RI. Be sure to get the logic on this right; it's easy to get it wrong. Columns are name and state, sorted by name. - Create a query called
Q6-Hub-People.sql
that reports the name and phone number of anyone in the 617 area code (any kind of phone). Columns are name and phone number, sorted by name. - Create a query called
Q7-BiCoastal-People.sql
that reports the name and addresses of anyone who has both an address in MA and one in CA. This one is tricky! It will take some thought and insight. Note that Matt Damon in the test data is bi-coastal. Columns are name, MA city (city in Massachusetts), and CA city (city in California). Sort by name. - Create a query called
Q8-Tokens.sql
that reports the name and address of anyone who is the only person with a home address from their state in the database. (That is, they are the token New Yorker, say, or the token Iowan.) In the test data, there are several tokens: "Fred" is the token Connecticutter. Columns are name, state, and count (which will be 1). Sort by name. - Switching to the WMDB, write a query
Q-filmographies.sql
that prints all actors and the year and title of all movies they appear in. Columns (name, year, title) in that order. You must build your query out of just themovie
,person
andcredit
tables. The results should be sorted by actor name, and then by the movie's year (most recent movies first), and finally by the title of the movie.
Running your Queries¶
Note that we will test your queries by running the batch scripts from the shell, like this:
mysql < Q7-BiCoastal-People.sql
This means that your batch scripts will need the use
youracct_db;
or use wmdb;
statement at
the top, so that it will choose the proper database. Be sure to test
your scripts in this manner.
Testing¶
I have written a script that compares the output of all of your batch files with the stored output of all of my solutions, just as with the Queries assignment.
Assuming that you are in the directory that contains your contacts
directory, you can run it like this:
~cs304/pub/bin/compare-contacts.sh contacts
That only tests the query output. It doesn't check the structure of
your database (the Contacts.sql
file). I'll do that by hand and eye.
Submission¶
You'll submit a tarfile of the entire contacts
folder.
Tar up the contacts
folder and drop that, just as you did for the
queries
assignment.
Also, please submit a "dummy" submission to Gradescope. This just like in the unix assignment. You'll do this for every assignment. That's (1) how I you are done and (2) how I give feedback and scores.
Summary Checklist:¶
- dropped a tar file
- submitted to Gradescope
Thank you!
Rubric¶
Points will also be deducted for omitting the appropriate use ww123_db;
line at the top of each file.
Make sure your SQL is clear, nicely formatted, and appropriately commented.
Time and Work
The following link has been updated for Fall 2024.
Finally, when you have completed the assignment, make sure you fill out the Time and Work Fall 2024 That report is required.