CS230 Data Structures, Wellesley College

Lab 11

Linux/Emacs Tip(s) of the Day

  1. spell filename.txt [linux] lists misspelled words in filename.txt
  2. M-x ispell-buffer [emacs] allows you to step through each mispelled word in your file and offers substitute words.

 

Spell Checker

In this lab we will implement a program to do spell checking of text documents against a dictionary of words. Sume peepel reelie nead a spel czequer; sume peepel du knop.

The Hash Table is a very good data structure to hold a dictionary of words against which documents will be checked, because it provides for an efficient way to look elements up in it. (How efficient?)

Once the dictionary is loaded into a hash table, the program will be reading consecutive words from the document to be checked. Each word will be looked up in the dictionary. If a word is not found in the dictionary, the word is considered a spelling error.

The dictionary will be read from a text file (on disk) into a hash table (in memory). Each word in the file serves as a key as we insert entries into the hash table. In this application, the object usually associated with the key is of no importance to us, so in your implementation it can be whatever you choose. For example, it can be the same object as the key itself, or any other non-nul object.

We will structure this application in such a way that it is easy to replace the hash table holding the dictionary with some other suitable data structure in the future. To do so, separate the task of creating the dictionary from the task of spell checking the document. Implement the first task in a class named HashDictionary.java and the second task in a class named Speller.java. The Speller class should require minimal changes when a dictionary other than the HashDictionary is used.

The CheckSpelling directory in the cs230/download, on the cs server, provides a starting point for your work. This folder contains the following files:

  1. Dictionary.java,
  2. HashDictionary.java,
  3. EnglishWords.txt,
  4. Speller.java,
  5. GreenEggs.txt (and other text files)

Download all of them to your own working directory.

Task 1

Can you think of any other ADTs suitable for holding the dictionary of words in this application? What are some advantages and disadvantages of each one?

Task 2a

Fill in the implementation of the code file HashDictionary.java, a skeleton of which you have downloaded in your local directory.

In particular, the HashDictionary.java should implement the Dictionary Interface, specified in the Dictionary.java file. This program will read the file EnglishWords.txt into a Hashtable. Use Java's Hashtable implementation in this assignment.
Add a main method to your file to test your code, and make sure it works well, before you proceed to the next step.

Task 2b

Fill in the implementation of the Speller.java file, the skeleton of which you have downloaded in your local directory. In particular, provide a method, checkSpelling(), which takes two arguments: a Dictionary and the name of a text file to be spell checked. As spell-checking proceeds, any misspelled words should be added to some kind of container, and printed on the standard output at the end.

As before, write a main method, that creates a dictionary and a speller, and spell check a few text documents (there are a couple provided for you, even better, make up or use your own documents). Some of the documents available: GreenEggs.txt, sonnet116.txt, AllYouNeedIsLove.txt, DeclarationIndependence.txt, YouAreTheMusicInMe.txt. (Note that our spell checker has a hard time handling punctuation. Many of our sample documents highlight flaws in our spell checker, just food for thought for future versions of your spell checker...)

Sample output:

[cs230@puma] java Speller EnglishWords.txt GreenEggs.txt
Dictionary EnglishWords.txt loaded containing 45427 words
Document GreenEggs.txt contains 0 misspelled words:
[]
[cs230@puma] java Speller EnglishWords.txt sonnet116.txt
Dictionary EnglishWords.txt loaded containing 45427 words
Document sonnet116.txt contains 22 misspelled words:
[impediments., finds,, remove:, O, no!, ever-fixed, tempests, shaken;, bark,, worth's, unknown,, taken., Love's, Time's, fool,, sickle's, come:, weeks,, doom., proved,, writ,, loved.]

[cs230@puma] java Speller EnglishWords.txt AllYouNeedIsLove.txt
Dictionary EnglishWords.txt loaded containing 45427 words
Document AllYouNeedIsLove.txt contains 16 misspelled words:
[There's, can't, It's, It's, isn't, you're, It's, (Paul:, now!), , (Everybody!), (love, need), Yee-hai!, yeah!, yeah]