CS230: Data Structures

Goals:

Practice with:

  • Manipulating arrays of Strings
  • Using looping constructs to process array elements
  • Testing and debugging programs that use arrays
  • Basic IO (file reading)
 

Exercise: A Histogram of data read from a file

In this exercise you will experiment with creating arrays and using them to process String values read from a file. The Strings are grouped into ranges (buckets) according to their length, and the frequency of each bucket, i.e., the number of Strings that fall in it, is computed. Study the sample printout below.

Aim for a final output that resembles the following example:

Printing the Histogram from file StringData.txt
Initial Data (13 Strings):
Happy Word Hello Visualization project songs Calendar Five One journalistically Total Representation Honor 
****** Results: ******* 
range 0-4 (3 Strings in this bucket)    |+++
range 5-9 (7 Strings in this bucket)    |+++++++
range 10-14 (2 Strings in this bucket)  |++
range 15-19 (1 Strings in this bucket)  |+
range 20-24 (0 Strings in this bucket)  |
  • Note that the number of plus signs (+) in each line indicates the number of Strings from the input data whose lengths belong in that range. E.g., the given file contained 3 Strings (Word, Five, One) with length in the range 0-4, so 3 + signs are printed in that line.
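As an illustration of how one line of the printout could be assembled, here is a minimal sketch. The class name RowSketch, the method row(), and both constants are hypothetical and not part of the required design; the label width of 40 is only inferred from the spacing in the sample output above.

```java
/** Hypothetical helper, shown only to illustrate the layout of one output line. */
class RowSketch {
    static final int BUCKET_SIZE = 5;  // assumed bucket width, as in the sample printout
    static final int LABEL_WIDTH = 40; // assumed: inferred from the spacing in the sample

    /** Builds the line for the bucket with the given index and frequency. */
    static String row(int bucket, int count) {
        int low = bucket * BUCKET_SIZE;
        int high = low + BUCKET_SIZE - 1;
        String label = "range " + low + "-" + high
                + " (" + count + " Strings in this bucket)";
        // Left-justify the label in a fixed-width column, then add the bar.
        StringBuilder line = new StringBuilder(
                String.format("%-" + LABEL_WIDTH + "s|", label));
        for (int i = 0; i < count; i++) {
            line.append('+');
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(row(0, 3));
        System.out.println(row(1, 7));
    }
}
```

A toString() method could call a helper like this once per bucket and join the lines with newline characters.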

Specifications

Write a class called HistogramIO.java with the following design:

instance variables

  1. MAX_INT: You can assume that each String in the input has length in the range 0 - MAX_INT.
  2. size: the number of Strings in the input.
  3. An array used to hold all the Strings in the input.
  4. numOfBuckets: an integer that indicates the number of ranges/buckets used in the application. The example above used 5 buckets, each covering 5 consecutive lengths. The value of this instance variable should be computed from other input information (namely MAX_INT). The range (or size) of each bucket should be fixed in your program.
  5. frequencies: an array of integers that holds the frequency of each range/bucket. In the above printout, for example, the bucket/range [5-9] has a frequency of 7, since 7 Strings in the input were found to belong there.
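One possible way to derive numOfBuckets from MAX_INT is sketched below. The class BucketMath, the constant BUCKET_SIZE, and the value 24 used for MAX_INT are assumptions made for illustration; the handout does not state the MAX_INT stored in StringData.txt.

```java
/** Hypothetical helper class, not part of the assignment's required design. */
class BucketMath {
    /** Assumed fixed bucket width, matching the width of 5 in the sample printout. */
    static final int BUCKET_SIZE = 5;

    /** Number of buckets needed so that every length in 0..maxInt has a bucket. */
    static int numOfBuckets(int maxInt) {
        // Integer division drops the remainder; the +1 accounts for the bucket
        // that contains maxInt itself.
        return maxInt / BUCKET_SIZE + 1;
    }

    /** Index of the bucket that a String of the given length falls into. */
    static int bucketIndex(int length) {
        return length / BUCKET_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(numOfBuckets(24));                 // 24 / 5 + 1 = 5
        System.out.println(bucketIndex("Calendar".length())); // 8 / 5 = 1, i.e. range 5-9
    }
}
```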

constructor and instance methods

Write the following methods:

  1. a constructor: It takes as input the name of the file where input data will be read from, reads the input, creates and initializes the instance variables. (See "Format of the input file" below for details on the format of the input file.)
  2. the toString() method: Use the output provided earlier in this page to guide you about implementing this method. Of course, you are encouraged to start with a more basic version of it, and improve it little by little.
  3. a main() method to contain your testing. Create 2 other text files that follow the input format described in the Notes below. (Do NOT use Microsoft Word for this, as it may insert invisible characters which can break your program. Atom or other plain text editors work fine.) Save them in the folder that contains your java file. Use the .txt files to debug and test your program. You will submit these files along with your code and your testing transcript.
  4. the computeFrequencies() method, which counts the number of Strings in the input that belong to each of the buckets/ranges.

While you develop and debug your program, you can have some helper printing methods available. For example, having a method to print the frequencies array was useful to us.
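The counting step and a debugging printout can be sketched as follows. This is only one possible shape: the class FrequencySketch, the static method signatures, and BUCKET_SIZE are assumptions, and in HistogramIO the same logic would work on the instance variables instead of parameters. The sketch does not use Java's Arrays class.

```java
/** Hypothetical stand-alone sketch of the counting step. */
class FrequencySketch {
    static final int BUCKET_SIZE = 5; // assumed fixed bucket width

    /** Counts how many of the first `size` Strings fall into each bucket. */
    static int[] computeFrequencies(String[] data, int size, int numOfBuckets) {
        int[] frequencies = new int[numOfBuckets]; // int arrays start out as all zeros
        for (int i = 0; i < size; i++) {
            // Integer division maps lengths 0-4 to bucket 0, 5-9 to bucket 1, ...
            frequencies[data[i].length() / BUCKET_SIZE]++;
        }
        return frequencies;
    }

    /** Debugging helper: prints the frequencies array on one line. */
    static void printFrequencies(int[] frequencies) {
        for (int i = 0; i < frequencies.length; i++) {
            System.out.print(frequencies[i] + " ");
        }
        System.out.println();
    }

    public static void main(String[] args) {
        String[] data = {"Happy", "Word", "Hello", "Visualization", "project",
                "songs", "Calendar", "Five", "One", "journalistically",
                "Total", "Representation", "Honor"};
        printFrequencies(computeFrequencies(data, data.length, 5));
    }
}
```

The data in main() is the sample input shown earlier on this page, so the printed counts can be checked against the sample printout.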

Notes

  • No use of Java's Arrays package is permitted in this exercise.
  • Format of the input file: The first line of the input file gives the size of the input, i.e., the number of Strings to be processed. The second line contains the value of MAX_INT. After that, size-many lines follow, each containing one String: that's your input. Once more, in this basic version, you can assume that the input consists of valid Strings, with lengths in the range 0 to MAX_INT. Make sure you state these, and any other assumptions, in your comments.
  • Here is the file that created the printout above: StringData.txt
  • The main() method should not be too long. It should direct your algorithm by calling private helper methods.
  • Use proper javadoc to document your work.
  • In your comments make a note to inform us whether the program produces the expected results. If it doesn't, let us know where you think things seem to have gone wrong.
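The input format described in the Notes could be read along these lines. This is a minimal sketch under stated assumptions: the class InputSketch, its fields, and the fromFile() helper are hypothetical names, and the input is assumed to be well-formed, as the Notes allow. Reading through a Scanner parameter makes the parsing easy to try on a short in-memory String before pointing it at a real file.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

/** Hypothetical sketch of reading the input format; assumes well-formed input. */
class InputSketch {
    int size;      // first line: number of Strings
    int maxInt;    // second line: MAX_INT
    String[] data; // the following `size` lines, one String per line

    InputSketch(Scanner in) {
        size = Integer.parseInt(in.nextLine().trim());
        maxInt = Integer.parseInt(in.nextLine().trim());
        data = new String[size];
        for (int i = 0; i < size; i++) {
            data[i] = in.nextLine();
        }
    }

    /** Opens the named file and reads it; the Scanner is closed automatically. */
    static InputSketch fromFile(String fileName) throws FileNotFoundException {
        try (Scanner in = new Scanner(new File(fileName))) {
            return new InputSketch(in);
        }
    }

    public static void main(String[] args) {
        // In-memory input in the same format: size, MAX_INT, then the Strings.
        InputSketch input = new InputSketch(new Scanner("3\n9\nHappy\nWord\nHello\n"));
        System.out.println(input.size + " Strings, MAX_INT = " + input.maxInt);
    }
}
```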

What to submit

Your Gradescope submission should contain the following:

  1. your HistogramIO.java file
  2. your HistogramIO.txt file that contains your testing results
  3. The two extra testing files you created above.

Good luck!!