CS 112

Assignment 2

Due: Friday, February 20

You can turn in your assignment up until 5:00pm on 2/20/15. You should hand in both a hardcopy and an electronic copy of your solutions. Your hardcopy submission should include printouts of four code files: redsox.m, colonTest.m, smartPhones.m and analyzeData.m. To save paper, you can cut and paste all of your code files into one script, but your electronic submission should contain the separate files. Your electronic submission is described in the section Uploading your completed work.

Reading

The following material from the fourth edition of the text is especially useful to review for this assignment: pages 42-43, 174-186. You should also review notes and examples from Lectures #4 and #5 and Lab #3.

Getting Started: Download assign2 folders from cs112d

Use Fetch or WinSCP to connect to the CS server using the cs112d account and download the assign2_exercises folder from the cs112d directory onto your Desktop. This folder contains two code files for the exercises in this assignment, redsox.m and colonTest.m. In MATLAB, set the Current Directory to the assign2_exercises folder on your Desktop.

When starting the problems on this assignment, download the assign2_problems folder from the cs112d directory onto your Desktop. This folder contains two code files, smartPhones.m and analyzeData.m, and a data file named depthData.mat.

Uploading your completed work

When you have completed all of the work for this assignment, your assign2_exercises folder should contain two code files named redsox.m and colonTest.m. Your assign2_problems folder should contain two code files named smartPhones.m and analyzeData.m. Use Fetch or WinSCP to connect to your personal account on the server and navigate to your cs112/drop/assign2 folder. Drag your assign2_exercises and assign2_problems folders to this drop folder. More details about this process can be found on the webpage on Managing Assignment Work.

Exercise 1: The Red Sox roster


In this exercise, you'll work with the following subset of data from the Boston Red Sox baseball team roster from the 2007 (World Series winning!) season:

Player Name     Player Number  Weight  At Bats  Home Runs  Batting Average  2007 Salary  Runs Batted In  Runs  Stolen Bases
Jason Varitek   33             230     435      17         .255             11,000,000   68              57    1
David Ortiz     34             230     549      35         .332             13,250,000   117             116   3
Manny Ramirez   24             200     483      20         .296             17,016,381   88              84    0
J.D. Drew       7              200     466      11         .270             14,400,000   64              84    4
Mike Lowell     25             210     589      21         .324             9,000,000    120             79    3
Julio Lugo      23             175     570      8          .237             8,250,000    73              71    33
Kevin Youkilis  20             220     528      16         .288             424,500      83              85    4
Coco Crisp      10             180     526      6          .268             3,833,333    60              85    28
Dustin Pedroia  15             180     520      8          .317             380,000      50              86    7

Generating some Red Sox statistics:

The file redsox.m in the assign2_exercises folder creates nine separate vectors, one for each numerical statistic. Look closely at the code in redsox.m and note that the names are stored in a cell array called names. Think of a cell array as a special kind of vector that allows us to store strings. Note that names is created using curly braces { } rather than the square brackets [ ] that we use for numerical vectors. Although a cell array is created using curly braces, you do not need curly braces to select entries from it: indexing with parentheses, as with a numerical vector, returns a smaller cell array containing the selected entries. For example, here is a clip of MATLAB code accessing the contents of names:

>> powerHitter = names(homeRuns > 20)
powerHitter =
     'Ortiz'   'Lowell'

Note that you can place a semi-colon at the end of the initial assignment statements in redsox.m to suppress the printout of the statistics.
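The difference between parentheses and curly braces can be seen in a small sketch. The data here is hypothetical toy data (fruits and counts are made-up names, not variables from redsox.m):

```matlab
% Hypothetical toy data, not the roster from redsox.m
fruits = {'apple', 'banana', 'cherry', 'date'};
counts = [12 3 25 7];

% Parentheses with a logical vector return a smaller cell array
popular = fruits(counts > 10);    % {'apple', 'cherry'}

% Curly braces return the contents of a single cell (here, a string)
firstPopular = popular{1};        % 'apple'
```

Parentheses are enough whenever the result should stay a list of names, which is the case for the exercises below.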

In the exercise below, you may use MATLAB's mean, sum, length, and any functions.

Write MATLAB code to do the following:

  1. Create a variable totalStolen with the total number of stolen bases in 2007.
  2. Create a variable avgWt with the average weight of a Red Sox player.
  3. Create a variable bestBatters with the name(s) of player(s) whose batting average is greater than or equal to 0.300.
  4. Create a variable expensiveHomer that is true if any player costs more than $500,000 per home run.
  5. Create a variable bigHitter that is true if any player hit more than 10 home runs with a batting average less than .290.
  6. Create a vector highRBI with the name(s) of player(s) whose Runs Batted In is greater than or equal to Runs.
  7. Create a variable bigBatter that contains the number(s) of the player(s) with more than 550 at bats or more than 20 stolen bases.
  8. Create a variable weightInGold that contains the number of players who were paid more than $25,000 per pound of body weight in the 2007 season.
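The tasks above all combine element-wise comparisons with mean, sum, length, and any. The following sketch illustrates those building blocks on hypothetical toy data (labels, points, and games are made-up names, not the vectors defined in redsox.m), so it shows the pattern rather than the answers:

```matlab
% Hypothetical toy data (not the values from redsox.m)
labels = {'A', 'B', 'C', 'D'};
points = [12 30 8 21];
games  = [4 10 4 7];

totalPoints = sum(points);                     % 71
avgGames    = mean(games);                     % 6.25

% A comparison yields a logical vector that can index another vector
highScorers = labels(points >= 20);            % {'B', 'D'}

% ./ divides element by element; any is true if at least one entry is true
anyEfficient = any(points ./ games > 2.5);     % true

% & (and) and | (or) combine two logical vectors element by element
busyOrHigh = labels(games > 5 | points > 25);  % {'B', 'D'}
lowButFew  = any(points > 10 & games < 5);     % true (entry 'A')

% Summing a logical vector counts its true entries
numHigh = sum(points >= 20);                   % 2
```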

Add comments to your code so that it is clear and easy to read. Also add comments at the top of each file with the names of you and your partner, and the date. Save your final version of redsox.m in your assign2_exercises folder to upload to the CS server.

Exercise 2: Indexing with colon notation

In lecture you learned how to use colon notation to specify a sequence of regularly spaced numbers. You also learned how to use indexing to read and store values in specific locations of a vector. This exercise combines these two concepts. Colon notation can be used to specify an evenly spaced sequence of vector indices or contents, as shown in the following examples:

>> nums = 1:8
nums = 
  1   2   3   4   5   6   7    8
>> nums(1:2:5) = 10
nums = 
  10   2   10    4   10   6   7   8
>> nums(2:3:8) = [13 9 16]
nums =
  10   13   10   4   9    6   7   16
>> nums2 = nums([1:3 6:8])
nums2 =
  10   13   10   6   7   16
>> nums([1:3 6:8]) = 12:-2:2
nums =
  12   10   8   4   9   6   4   2
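The same indexing patterns can be written without hard-coding the length of the vector by using MATLAB's end keyword, which stands for the last index when it appears inside an index expression. A minimal sketch on a hypothetical vector vals:

```matlab
vals = [5 10 15 20 25 30 35];    % hypothetical example vector

lastThree  = vals(end-2:end);    % 25 30 35
fromThird  = vals(3:2:end);      % 15 25 35
reversed   = vals(end:-1:1);     % 35 30 25 20 15 10 5

% end also works on the left side of an assignment
vals(end-1:end) = 0;             % vals is now 5 10 15 20 25 0 0
```

Because end is evaluated when the statement runs, these statements keep working if vals is later given a different length.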

The following program, colonTest.m, is contained in your assign2_exercises folder. Follow the instructions in the comments to rewrite the existing code statements and add five additional statements that use colon notation:

% colonTest.m
% program that provides practice with colon notation and indexing

% rewrite each of the next 4 statements using colon notation
nums1 = [10 9 8 7 6 5 4 3 2 1]
nums2 = nums1([2 4 6 8 10 7 4 1])
nums1([3 4 5 6]) = [9 6 3 0]
nums3 = [1 2 3 1 2 3 1 2 3]

% replace the next 3 statements with a single assignment statement
% that uses colon notation
nums2(6) = 10
nums2(7) = 20
nums2(8) = 30

% for each of the following examples, use "end" in the colon
% notation, for example: nums8 = nums1(3:end)

% write a statement that assigns nums4 to a vector that contains
% the odd-indexed elements of nums1

% write a statement that assigns nums5 to a vector of the
% elements contained in the top half (higher indices) of nums2

% write a statement that assigns nums6 to a vector that contains
% every 3rd element of nums1, starting with index 2

% write a statement that places the value 0 in all of the 
% evenly indexed locations of nums2

% write a statement that places the numbers 8 12 16 20 in the 
% successive odd-indexed elements of nums2

If you write each code statement with no semi-colon at the end, so that the value generated is printed out during execution of the code, then your program should generate the following printout:

>> colonTest
nums1 =
  10  9  8  7  6  5  4  3  2  1
nums2 =
   9  7  5  3  1  4  7  10
nums1 =
  10  9  9  6  3  0  4  3  2  1
nums3 =
   1  2  3  1  2  3  1  2  3
nums2 =
   9  7  5  3  1  10  20  30
nums4 =
  10  9  3  4  2
nums5 =
   1  10  20  30
nums6 =
   9  3  3
nums2 =
   9  0  5  0  1  0  20  0
nums2 =
   8  0  12  0  16  0  20  0

Place comments at the top of your file with the names of you and your partner, and the date. Your final submission should include a copy of your final colonTest.m code file.

Problem 1: Smartphones


With her curiosity piqued by a recent smartphone market forecast by International Data Corporation, and an article on the use of smartphones by healthcare professionals, Wendy Wellesley decided to collect some data on smartphone preferences and uses among Wellesley students. Wendy conducted a survey of smartphone owners with the following questions:

The file smartPhones.m in the assign2_problems folder contains Wendy's survey data. The file creates two vectors, currentPhones and newPhones that contain the integers 1-4 indicating the smartphone brand owned and desired by each of the 150 students who completed the survey. The file also creates six vectors that each contain the number of minutes per day spent on each smartphone activity, for each survey participant.
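Data coded this way is convenient to summarize with logical vectors. The sketch below is only an illustration on hypothetical data: brandOwned and minutesTexting are made-up names and values, not the vectors created by smartPhones.m, and the survey size is reduced to 8.

```matlab
% Hypothetical survey of 8 students (the real data has 150)
brandOwned     = [1 3 2 1 4 1 2 3];          % brand codes 1-4
minutesTexting = [30 45 10 60 20 15 25 50];  % hypothetical activity vector

ownsBrand1 = (brandOwned == 1);              % logical vector, true for brand 1

numBrand1      = sum(ownsBrand1);                 % 3
fractionBrand1 = numBrand1 / length(brandOwned);  % 0.375

% Average minutes for the students who own brand 1 only
avgTextingBrand1 = mean(minutesTexting(ownsBrand1));  % 35
```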

Add code to the smartPhones.m code file to perform the following tasks:

To complete these tasks, consider the following background, tips and guidelines:

Your final submission should include a copy of your final smartPhones.m code file. Be sure to add comments at the top of the file with the name(s) of the author(s), and any collaborators other than a partner.

Problem 2: Cleaning up the data

Unreliable measurement instruments or unpredictable environments can sometimes yield data that is clearly erroneous. To obtain a reliable assessment of simple properties like the mean value of the data, it may be desirable to remove data samples that are clearly outside the expected range. Such samples are sometimes referred to as outliers. An advantage to analyzing data in MATLAB, with its general programming language, is that we can easily write a program to preprocess the data in a customized way. In this problem, you will complete a program that removes outlying data samples, using the mean and standard deviation of the data.

Imagine that you collected sonar data on the depth of the ocean floor over a large region that is essentially flat. Due to instrument problems and the occasional large marine animal, some measurements are clearly invalid. For simplicity, assume that all of the erroneous measurements are underestimates of the true depth of the ocean floor. The file analyzeData.m in your assign2_problems folder uses the load command to load 1000 depth measurements from a file named depthData.mat into a vector named depthData, and creates a plot of this initial data:

Most of the data is at a depth of around 10,000 feet. The erroneous data samples appear as downward spikes in the data, at depths that are significantly less than 10,000 feet. One principled way to go about removing outlying data is to remove samples whose value is far from the mean value, using the standard deviation to determine the range of values to remove. The standard deviation captures how spread out the data values are, and is given by the following formula:

N is the number of samples in the data, v_i is the ith data sample, and v̄ is the mean value of the data. If the distribution of the data follows a bell-shaped curve (which is not really the case here), almost all of the data should lie within three standard deviations of the mean value.
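As a rough illustration, the standard deviation can be computed step by step from these quantities. The formula itself is not reproduced in this text, so the sketch assumes the N - 1 normalization that MATLAB's std function uses by default; the alternative that divides by N is shown for comparison. The sample values are hypothetical.

```matlab
v = [2 4 4 4 5 5 7 9];        % hypothetical data samples
N = length(v);                % number of samples
vbar = mean(v);               % mean value, 5 for this data

% Assumption: divide by N - 1, matching the default behavior of std(v)
sigma = sqrt(sum((v - vbar).^2) / (N - 1));

% Alternative normalization that divides by N, matching std(v, 1)
sigmaN = sqrt(sum((v - vbar).^2) / N);   % 2 for this data
```

For 1000 samples the two versions differ very little, so either choice leads to nearly the same outlier thresholds.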

For the ocean floor depth data, we could just remove all samples that are more than three standard deviations away from the mean depth value. A problem with this strategy is that an initial calculation of the mean and standard deviation of all of the data will be biased by the presence of the outlying data samples. Thus, we will instead use a more conservative approach that removes data in two stages, as described in the following steps, which also print, display and save the data:

  1. calculate and print the mean and standard deviation of the original data
  2. modify the data so that samples whose value is more than 4 standard deviations away from the mean value are removed
  3. calculate and print the mean and standard deviation of the modified data, and plot the modified data
  4. ask the user if she would like to remove more outliers from the data. If so,
    a. modify the data again so that samples whose value is more than 3 standard deviations away from the mean are removed, using the newly calculated mean and standard deviation from step 3
    b. calculate and print the mean and standard deviation of the newly modified data, and plot this new data
  5. save the final modified data and its mean and standard deviation in a file named newData.mat
  5. save the final modified data and its mean and standard deviation in a file named newData.mat
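The core of the two filtering stages can be sketched with logical indexing. This is a minimal sketch under stated assumptions, not the required program: depths is a small synthetic vector standing in for depthData, the variable names are hypothetical, and the user prompt, plots, and saving to newData.mat are left out.

```matlab
% Synthetic stand-in for depthData (hypothetical values, in feet)
depths = [10000 10010  9990 10005  9995 10002  9998 10007  9993 10001 ...
          10004  9996 10003  9997 10006  9994 10008  9992 10000 10009 ...
           2000  9900];

% Stage 1: keep samples within 4 standard deviations of the mean
m1 = mean(depths);
s1 = std(depths);
stage1 = depths(abs(depths - m1) <= 4 * s1);   % drops the 2000 sample

% Stage 2: recompute the statistics, then keep samples within 3 deviations
m2 = mean(stage1);
s2 = std(stage1);
stage2 = stage1(abs(stage1 - m2) <= 3 * s2);   % drops the 9900 sample

fprintf('mean = %.2f, std = %.2f\n', mean(stage2), std(stage2));
```

In this synthetic data the 9900 sample survives the first stage because the 2000 sample inflates the initial standard deviation, which is the bias that the two-stage approach is meant to reduce.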

Expand the analyzeData.m code file to perform the above steps. When completing this program, keep in mind the following background, tips and guidelines:

Your final submission should include a copy of your final analyzeData.m code file. Again, be sure to add comments at the top of the file with the name(s) of the author(s), and any collaborators other than a partner.