Graphic by Keith Ohlfs

CS111, Wellesley College, Fall 1997

Problem Set #7:
Day-To-Day's Database

Due: 6pm, Monday, December 8
[CS111 Home Page] [Syllabus] [Students] [Lecture Notes] [Assignments] [Programs] [Documentation] [Software Installation] [FAQ] [CS Dept.] [CWIS]

 

The goal of this assignment is to give you some hands-on experience with data structures, data abstraction, and arrays.

A Crisis with Day-To-Day's Database

Now that you have mastered the basic elements of Java programming, you have decided to do some work as a programming consultant to earn a little extra spending money. You answer an ad from the Day-To-Day temporary employment agency concerning some troubles they are having with their employee database. The human resource department of Day-To-Day uses a simple database program (described below) that maintains the name and email address of each active temporary employee. Unfortunately, the creators of this database program only tested the program on small test databases and did not worry about how long certain operations would take on large databases. In particular, it turns out that the database program is rather inefficient at adding each new person, so that large databases can take a very long time to create. Your job is to rewrite a portion of the database code in a more efficient form so that large databases will be created more quickly. Because the company does not want to have to retrain its human resource staff to work with a new database program, they have asked you to modify the existing program so that it has the same behavior from the user's perspective, except that certain operations should be faster.

The Features of the Database Program

The database program used by Day-To-Day can be found in the folder "Database" in the CS111 download folder. Upon running the applet DatabaseWorld.html within this folder, you find that the program consists of a window with several buttons at the left as shown below:

If you press the Load DB File button, you will be prompted to select a database file using the usual Macintosh file dialog mechanism. All files ending in the extension .db are database files. The Database folder contains four such files: cs111.db, test4.db, test6.db, and test10.db. If you select test6.db, the database window will be updated to show the six personnel entries in this file, as shown below:

Each personnel entry has four fields: a last name, a first name, a middle name or initial (which may be empty), and an email name. An entry may be selected by clicking on it. At most one entry can be selected at any one time.

The other buttons manipulate the entries as described below.

You should play around with the applet to get a feel for these operations before attempting the rest of the problem set.

The Database Class Interface

The database program is rather large. Fortunately, due to the wonders of data abstraction, it turns out that you only have to understand the Database class defined in the source code file Database.java. Instances of the Database class maintain an ordered collection of Person objects, each of which represents one entry in the database. The Database class is in some sense the heart of the database program --- all of the other classes in the program provide support for interacting with Database objects or the Person objects that they hold.

The Database class has the following interface:

Constructor:

public Database()
Create an empty database (one that contains no persons). 
 
Instance Methods:
 
public int size()
Return the number of persons in the database.
 
public void clear()
Make this database empty. 
 
public void add(Person p)
Add person p to the end of this database. 
 
public void remove(String s)
Remove the person with description string s from this database.
If no person in the database has description string s, do nothing. 
(The description string of a Person object is the entry string that 
appears in the database window. This string is produced by the 
String toString() method of the Person class. The 
boolean hasString (String s) method on the Person 
class is used to test if a person has a given description string.)
 
public void sort (Comparator comp)
Sort the entries of the database according to the comp object. A
Comparator object specifies the ordering of two Person objects via a
boolean lessThan (Person p1, Person p2) method. 
 
public void print()
Display the entries of the database in the stdout window. 
	
public void print(PrintStream ps)
Write the entries of the database to the print stream ps. 
This method is used to save databases to a file. 
	
public String [ ] entryList()
Return an array of description strings for the entries of the database.
The description strings have the same order as the database entries. 

The Database Class Implementation

There are zillions of ways to implement the Database class interface presented above. The implementation in the file Database.java represents a database with n entries as an array (named people) of n Person objects. You should study the code in Database.java to see how each of the above constructors and methods is implemented in terms of this representation. In particular, pay attention to the implementation of the add and remove methods. The add method adds a person p to the end of a database as follows:

The remove method removes the person with description string s as follows:

These operations do a lot of copying work every time an entry is added to or removed from the database. It turns out that loading a database with n entries from a file calls the add method n times. Because of all the copying involved, it can take a long time to load in a large database. We will see in CS230 that the loading process takes time quadratic in the size of the database --- i.e., it is proportional to the square of the size of the database. This is bad; we would like it to take time linear in the size of the database --- i.e., it should take time proportional to the size of the database.

The purpose of this problem is to explore an alternative representation of databases that makes add and remove more efficient.

Problem 1: Object Diagrams

In this problem, you will exhibit your understanding of the representation in Database.java by drawing an object diagram of a simple database. Consider the database constructed by the following code:

Database db = new Database();
Person p1 = new Person("Georgia", "", "Dome", "gdome");
Person p2 = new Person("Foo", "Bar", "Baz", "fbaz");
db.add(p1);
db.add(p2);

Draw a single object diagram showing the state of the database after the execution of all of the above statements. Observe these conventions:

Your diagram should have 3 local variables and 12 objects (the database object, the database array, two person objects, and eight string objects).

Problem 2: An Alternative Database Representation

In this problem, you will implement an alternative database representation that improves the efficiency of the add and remove methods. One way to make the add method more efficient is to start with a database array that is a certain small size (for example 4 elements), instead of zero elements. You can then add entries without increasing the array size until you fill that array. When you reach the limit for that array, you can then create a new array that is double the size of the old array, and copy the old array into the new one (much like the current version of the program does). Thus, when you add the 5th person, you would:

You can then add people to your database without copying until you reach the ninth person to be added, at which time you can double the size of the array again. The strategy of doubling the size of the array rather than incrementing the size of the array every time it is full significantly reduces the number of copy operations that need to be performed in a sequence of add invocations. In fact, the doubling strategy changes the quadratic time load process into a linear time load process.

The remove method can be implemented as follows:

Note that a remove invocation never changes the size of the array in this strategy.

Note that the number of entries in this strategy is no longer equal to the length of the people array. To keep track of the database size, a database object must have an additional instance variable (call it count) that counts the number of entries in the database. The following invariant should be preserved by all the database operations:

To begin this problem, you should follow these steps:

  1. Download the Database folder to your local disk.
  2. Open Database.proj.
  3. Open Database.java by double clicking on the icon in the project window. Rename the class and constructor from Database to DatabaseIncrement. Save the file away as DatabaseIncrement.java.
  4. Open DatabaseDouble.java by double clicking on the icon in the project window. Rename the class and constructor from DatabaseDouble to Database. Save the file away as Database.java.

The file initially named DatabaseDouble is a skeleton for the alternative database representation. The above steps are necessary for replacing the original Database implementation by the alternative Database representation. Note that the original version is still available in DatabaseIncrement.java -- you may wish to refer to this file as you flesh out the skeleton. If you wish to install the original version again, you will have to undo steps 3 and 4.

For this problem, you will only need to accomplish the following tasks:

  1. Implement the Database() constructor. The initial people array should have length 4.
  2. Implement the size() method. You can test this method using the Print button, which prints out the size of the database along with the entries.
  3. Implement the clear() method. This should not change the size of the people array.
  4. Implement the add(Person p) method as outlined above. This should double the size of the people array only when the array is full of valid entries. You may wish to insert a call to System.out.println that indicates every time the array size is doubled. You can test this method via the Add button. You can also test add(Person p) and clear() together using the Load DB File button. This clears the database and adds the entries from the file one by one. You should try loading all the provided .db files.
  5. Implement the remove() method as sketched above. This method should not change the size of the people array. You can test this method via the Remove button.
  6. It is not necessary to modify any other methods other than the ones mentioned above. As part of your problem set, you should include a brief explanation why it is not necessary to modify any of the other methods.

Note: Even on the 60-or-so entry database cs111.db, you may not be able to observe a noticeable speedup with the new implementation. Later this week, we will try to provide a larger file that makes the speedup clear.

Turning in Problem Set #7

For this problem set, we request that you submit both a softcopy and a hardcopy version of your assignment:

Softcopy:

Rename the Database folder to username_Database. Your new code should be in the Database.java file (formerly named DatabaseDouble.java). Upload this folder into the ps7 drop folder by 6pm, Monday, Dec 8.

Hardcopy:

Slip a stapled packet consisting of the following parts under Lyn's office door (SCI E106) by 6pm, Monday, Dec 8. Don't forget to put your name on the packet!

  1. The object diagram from Problem 1.
  2. Problem 2, part 6: A brief explanation of why it is not necessary to modify any methods other than those changed in Problem 2, parts 1--6.
  3. A printout of your updated Database.java file.