CS230 Data Structures, Wellesley College
spell filename.txt [linux] lists misspelled words in filename.txt
M-x ispell-buffer [emacs] allows you to step through each
mispelled word in your file and offers substitute words.
Spell Checker The Hash Table is a very good data structure to hold a dictionary of words against which documents will be checked, because it provides for an efficient way to look elements up in it. (How efficient?)
Once the dictionary is loaded into a hash table, the program will be reading consecutive words from the document to be checked. Each word will be looked up in the dictionary. If a word is not found in the dictionary, the word is considered a spelling error.
The dictionary will be read from a text file (on disk) into a hash table (in memory). Each word in the file serves as a key as we insert entries into the hash table. In this application, the object usually associated with the key is of no importance to us, so in your implementation it can be whatever you choose. For example, it can be the same object as the key itself, or any other non-nul object.
We will structure this application in such a way that it is easy to
replace the hash table holding the dictionary with some other suitable
data structure in the future. To do so, separate the task of creating
the dictionary from the task of spell checking the document. Implement
the first task in a class named HashDictionary.java and
the second task in a class named Speller.java. The
Speller class should require minimal changes when a dictionary other
than the HashDictionary is used.
The CheckSpelling directory in the cs230/download,
on the cs server, provides a starting point for your work. This folder contains the following files:
Dictionary.java,
HashDictionary.java,
EnglishWords.txt,
Speller.java,
GreenEggs.txt (and other text files)
Download all of them to your own working directory.
Dictionary.java contains the interface of a
Dictionary. Any class that defines a dictionary should
implement this interface. In particular, any such class should
provide a way to add a word to the dictionary, and search the
dictionary for a given word. Our HashDictionary class
will, indeed, be no exception. Using this interface, the client code
of any dictionary implementation, like the Speller in our
application, (see further down) becomes independent of the specifics
in each such implementation. EnglishWords.txt contains about 25,000 words that make up
the dictionary we will use today. Notice that it contains some, but
not all variations of a given word (e.g. plural, -ing or
-ed forms of the words). Anything not in our dictionary is
considered a misspelling. This is obviously not ideal, but this is a
preliminary spell-check program. HashDictionary.java will provide the definition of a
dictionary based on a Hash Table.
Speller.java code file will provide the code for the actual check spelling of a given text file.
GreenEggs.txt file contains English text to be spell checked. Of course, feel free to create your
own text files for testing purposes!
HashDictionary.java, a
skeleton of which you have downloaded in your local directory.
In particular, the HashDictionary.java should implement the Dictionary Interface,
specified in the Dictionary.java file. This program will read the file EnglishWords.txt
into a Hashtable. Use Java's Hashtable implementation in this assignment.
Add a main method to your file to test your code, and make sure it works well, before you proceed to the next
step.
Speller.java file, the skeleton of which you have downloaded in your
local directory. In particular, provide a method, checkSpelling(), which takes two arguments:
a Dictionary and the name of a text file to be spell checked. As spell-checking proceeds, any misspelled words
should be added to some kind of container, and printed on the standard output at the end.
As before, write a main method, that creates a dictionary
and a speller, and spell check a few text documents (there are a
couple provided for you, even better, make up or use your own
documents). Some of the documents available: GreenEggs.txt,
sonnet116.txt, AllYouNeedIsLove.txt, DeclarationIndependence.txt,
YouAreTheMusicInMe.txt. (Note that our spell checker has a hard
time handling punctuation. Many of our sample documents highlight flaws in our spell checker, just food for thought for future versions of your spell checker...)
Sample output:
[cs230@puma] java Speller EnglishWords.txt GreenEggs.txt
Dictionary EnglishWords.txt loaded containing 45427 words
Document GreenEggs.txt contains 0 misspelled words:
[]
[cs230@puma] java Speller EnglishWords.txt sonnet116.txt
Dictionary EnglishWords.txt loaded containing 45427 words
Document sonnet116.txt contains 22 misspelled words:
[impediments., finds,, remove:, O, no!, ever-fixed, tempests, shaken;, bark,, worth's, unknown,, taken., Love's, Time's, fool,, sickle's, come:, weeks,, doom., proved,, writ,, loved.]
[cs230@puma] java Speller EnglishWords.txt AllYouNeedIsLove.txt
Dictionary EnglishWords.txt loaded containing 45427 words
Document AllYouNeedIsLove.txt contains 16 misspelled words:
[There's, can't, It's, It's, isn't, you're, It's, (Paul:, now!), , (Everybody!), (love, need), Yee-hai!, yeah!, yeah]