You can turn in your assignment up until 10:00am on 3/13/15. You should
hand in both a hardcopy and electronic copy of your solutions. Your hardcopy
submission should include printouts of four code files:
spin.m, lineFit.m, poleVault.m, and
To save paper, you can cut and paste all of your code files into one file, but your
electronic submission should contain the three separate files.
Your electronic submission is described in the
section Uploading your saved work.
assign4_exercisesfolder from the cs112d directory onto your Desktop. This folder contains one file named
rotate.mfor the exercise in this assignment.
When you have completed all of the work for this assignment, your
assign4_exercises folder should contain one additional code file for the exercise,
assign4_problems folder should contain three additional
lineFit.m, poleVault.m, and
virus.m. Use Fetch or WinSCP
to connect to your personal account on the CS file server and navigate to your
cs112/drop/assign4 folder. Drag your
assign4_problems folders to this drop folder. More details about this process
can be found on the webpage on Managing Assignment Work.
In this exercise, you'll write a function called
spin.m that will spin a set of coordinates around in a
circle. You are provided with a function called
rotate.m (in the
that has three inputs: 1) a vector of x coordinates, 2) a vector of y
coordinates, and 3) the angle in degrees to rotate those coordinates.
First, make sure you understand how
rotate works, because your
spin will rely upon
xsquare = [3 8 8 3 3]; ysquare = [3 3 8 8 3]; plot (xsquare, y square);
The following code (the code to generate the yellow axis is not shown here) produced the green, red and black rotated squares at left:
rotate always takes three inputs and returns
two output vectors.
Note that the examples above rotate a square, yet
rotate can rotate any set of x and y coordinates.
spinfunction will take in three inputs: 1) a vector of x coordinates, 2) a vector of y coordinates, and 3) the number of times to repeat the coordinates in the design.
spinwill create a design in MATLAB's figure window. The steps below will incrementally build your
spinfunction. You need only turn in the final version of
spin. In the examples below, the same x and y coordinates are used for the square as above. The flower petal coordinates are as follows:
xpetal = [0 2 8 6 0]; ypetal = [0 4 5 2 0];
spintakes only two inputs: the x and the y coordinate vectors. This version of
spinwill always produce 8 sets of rotated coordinates and plot them in the default blue color.
spinso that there is a third input, namely, the number of rotations to be plotted. Your edited
spinshould plot the user-specified number of rotations of the x and y coordinates, as in the figures below.
spinproduces a user-specified number of rotations of the given x and y coordinates in multiple colors. The examples below cycle through the available colors and show 50 rotations of each set of coordinates.
spinthat you submit: the function takes three inputs (x coordinates, y coordinates, and the number of rotations to be plotted) and produces one colorful figure.
One true test of any scientific theory is whether or not it can be used to make accurate predictions. Given some data that captures the relationship between two or more variables, we can try to formulate a mathematical model that summarizes this relationship. If the model is valid, it can be used to predict the relationship between the variables in cases not explicitly given in the original data.
In some cases, variables may have a simple linear relationship, such as in
the forearm and hand data that we used to test the existence of the Golden Ratio:
The line drawn through the points is the best fit line for the data, also referred to as the regression line. MATLAB provides functions for fitting lines and other curves to data, but you'll instead write your own function to compute the regression line for a set of data, and tailor the information that is returned. You will then apply this analysis to data on the achievements of olympic pole vaulters in the summer olympics, from 1896 to 2004. Finally, you will analyze the pole vaulting data using the Curve Fitting Toolbox that you explored in lab.
Computing a regression line
A nice introduction to the computation of regression lines is provided online at this Finite Mathematics & Applied Calculus resource developed by Stefan Waner and Steven Costenoble at Hofstra University.
Given the (x,y) coordinates for a set of n points (x1,y1), (x2,y2), ... (xn,yn), the best fit line associated with these points has the form
y = mx + b
slope m = (n(Σxy) - (Σx)(Σy)) / (n(Σx2) - (Σx)2)
intercept b = (Σy - m(Σx)) / n
The Σ means "sum of", so
Σx = sum of x coordinates = x1 +
x2 + ... + xn
Σy = sum of y coordinates = y1 + y2 + ... + yn
Σxy = sum of xy products = x1y1 + x2y2 + ... + xnyn
Σx2 = sum of squares of x coordinates = x12 + x22 + ... + yn2
When performing linear regression, it is valuable to know how well the line actually
fits the data. Two measures used to assess the quality of fit are the correlation
coefficient and the size of the residuals that capture the difference between
the actual data values and the values predicted by the regression line. The correlation
coefficient, also described in the Waner and Costenoble online chapter, is a number
r between -1 and 1 calculated as follows:
coefficient r = (n(Σxy) - (Σx)(Σy)) / [n(Σx2) - (Σx)2]0.5[n(Σy2) - (Σy)2]0.5
A better fit corresponds to a value of
r whose magnitude is closer to 1,
while a worse fit yields a value of
r closer to 0. The residuals are the
discrepancies between the actual data (actual
y values) and those predicted by the
best fit line (the values
b). A rough estimate of the average size of the
residuals is given by the RMS error between these two quantities:
average residual RMS = ((Σ(y - (mx + b))2) / n)0.5
Implementing linear regression
Write a function named
lineFit that has two inputs that are vectors containing
the x and y coordinates of a set of points. This function should return four values, all obtained using
the above calculations: the (1) slope
m and (2) intercept
b of the best fit line,
the (3) correlation coefficient and (4) average residual. Test your function with a small number
of points that you create. You can check your results for the best fit line against
those obtained with the MATLAB
polyfit function, which returns a vector
>> lineMB = polyfit(xcoords, ycoords, 1)
Note: you do not need to use any loops (
for statements) in your
lineFit function - all of the calculations can be done by performing
arithmetic operations on the entire vectors of x and y coordinates all at once. This
problem primarily provides practice with writing a function with multiple inputs and
outputs, and more experience with curve fitting.
function [sumV diffV prodV divV] = compute(vect1, vect2) % [sumV diffV prodV divV] = compute(vect1, vect2) % computes the element-by-element sum, difference, product and division % of the values in two input vectors and returns the four results sumV = vect1 + vect2; diffV = vect1 - vect2; prodV = vect1 .* vect2; divV = vect1 ./ vect2;
The future of olympic pole vaulting
From the time the summer olympics began in 1896, until 2004, pole vaulters have been
achieving heights that
have increased in a roughly linear fashion (heights are given in inches):
assign4_problems folder, there is a MAT-file
that contains two variables
heights that store the above data.
Write a script file named
poleVault.m that performs the following actions:
scatterfunction to create a scatter plot:
Check the MATLAB help pages for properties that can be used to change the appearance of the dots, and incorporate some of these properties into your scatter plot.
plotfunction (remember that you only need two points to draw a line!)
In a comment at the end of the
poleVault.m script, write the predicted
year that is printed by your script, and also comment on the reasonableness of the model.
Using the curve fitting tool
After running your
poleVault.m script, the two variables
heights will be stored in your Workspace. Open the curve fitting tool with the
cftool function and create a data set with
years as the X data and
heights as the Y data. A linear polynomial fit to this data should yield a line
similar to what you obtained with your
linefit function. A better model of
pole vaulting performance, though, would use a function that reaches a plateau as
the year increases. One example is a logarithmic function. To obtain a fit to a log function,
Custom Equation for the type of fit.
A default exponential equation will appear in the
equation box. Replace this expression with the following general logarithmic expression:
a * log(b*(x-1895))
The logarithmic function does not
fit the past data as tightly, but probably has better predictive capability for the future.
Use both the logarithmic curve fit and a linear curve fit to predict pole vaulting heights
for the year 3000. Record this information in your
poleVault.m script file
and comment on which fit appears to yield a more reasonable prediction.
Imagine that the Wellesley College community has been quarantined due to a sudden outbreak of an annoying stomach virus. How quickly can this virus spread through the population, and will it eventually die out? This depends on factors such as the ease with which the virus is passed from one individual to another, the time it takes to recover from the virus, and the time during which an individual remains immune to the virus after recovering. One of the simplest models of the spread of disease was developed by W. O. Kermack and A. G. McKendrick in 1927 and is known as the SIR model. Its name is derived from the three populations it considers: Susceptibles (S) have no immunity from the disease, Infecteds (I) have the disease and can spread it to others, and Recovereds (R) have recovered from the disease and are immune to it.
In this problem, you will model the spread of a virus over time, through a population that is represented on a two-dimensional grid. Suppose that each cell on a 100x100 grid is an individual in a group of 10,000 people. Each individual can be susceptible, infectious or immune. Assume that the infection lasts two days and immunity lasts 5 days before the individual becomes susceptible again. The state of this virus in the population can be stored in a 100x100 matrix that contains values from 0 to 7 representing the following conditions:
virus.m code file in the
assign5_problems folder contains
partial code for a function named
virus that simulates the spread of
the virus through the population, over a period of days.
virus function has two inputs: the number of days to run the simulation
and the probability that a susceptible individual will become infected if one of their
neighbors is infected. The 100x100 matrix
grid1 is filled with the initial
state of the simulation on day 1, in which all of the individuals are susceptible, except
for a single infected individual in the center of the grid. The function
displayGrid, also provided in the
displays the current grid as a color image. Susceptible individuals are shown in blue,
infected individuals are bright red on the first day and dark red on the second, and
recovered individuals are shown in shades of green from bright (value 3, first day of
recovery) to dark (value 7, fifth day after recovery). The code in the
virus function displays the initial grid on
Your task is to complete the
virus function to simulate the spread of the
virus, starting with day 2 and continuing over the input number of days. A second 100x100 matrix
grid2 is created in the initial code, to assist with the simulation. For
each new day, assume that the current state of the virus is stored in
Loop through all of the locations of
grid1 (individuals in the population),
compute the new value for each individual and store it in the same location of
For example, if the value in
grid1 at a particular location is 0 (susceptible
individual) and one of the neighbors (above, below, left or right) is infected (i.e. has the
value 1 or 2), then the input probability
probInfect should be used to
determine whether the value stored at this location in
grid2 should be a 0
(individual remains susceptible) or 1 (individual becomes infected). For each of the other
values stored in
grid1, from 1 to 7, think about what value should be stored in
grid2 reflecting the state of the individual on the next day. (Remember that
immunity from the virus lasts for only 5 days after recovery, before the individual
becomes susceptible again.) Display the new grid with the
After displaying the new grid, add a short pause to your code so that the new grid stays visible
for a short amount of time. In this way, the spreading virus will appear as an animation. The
built-in MATLAB function
pause() has a single input that is the number of seconds
to pause, which can be a fraction. For example, the statement
pause(0.1) will cause
MATLAB to pause for one tenth of a second before continuing the execution of your code.
After each day of the simulation is complete, copy the contents of
grid1 in preparation for the next day of the simulation.
To simplify the task of checking whether a neighbor is infected, assume that there is a border
of individuals around the grid (rows 1 and 100, and columns 1 and 100) that remain susceptible
(value 0) throughout the simulation. When looping through
grid1, you only need
to consider individuals in the rows and columns 2 through 99. Finally, the
function can be used to determine whether a susceptible individual with an infected
neighbor, becomes infected themselves. The function call
rand(1) returns a
single random number between 0.0 and 1.0. Suppose we want to simulate the flipping
of a coin that is biased towards heads so that on average, 60% of the flips come up heads.
The following loop simulates 100 flips of this coin using a probability of 0.6:
for flips = 1:100 if (rand(1) <= 0.6) disp('heads'); else disp('tails'); end end
An analogous strategy can be used to determine whether a susceptible individual becomes infected.
The following four pictures show the state of the virus for the first four days of a sample
simulation. Results are shown for a
5x5 section around the center of the
grid. On day 1, only the individual at the center is infected. The simulation used a
probability of infection of 0.5, and on day 2, two neighbors of the center individual
become infected. On day 3, three more individuals become infected, and on day 4, an
additional four individuals become infected. Note that once an individual is infected,
the value at that location increments as each day passes.
The following picture shows an example of the state of the virus for the entire grid, after 60 days:
>> virus(60, 0.5);
Run simulations for a few different values of the probablity (for example, 0.25, 0.5, 0.75, 1.0) and add comments to the
virus.m code file about what you observe.
You now have a colorful and dynamic visualization of a spreading virus. Suppose you then want
to quantify the number of individuals who are either susceptible, infected, or immune to the
virus on each day. Add code to your
virus() function to create three vectors to store
the number of susceptible individuals (value 0 in the grid), infected individuals (values 1 or 2
in the grid) or immune individuals (values 3-7 in the grid). As your simulation calculates the
state of the virus on each day, count the number of individuals in each category (you should not
need a loop for this calculation!) and store it in the appropriate vector. You can assume that
there is a total of 10,000 individuals in the population (one for each grid location). At the end
virus() function, plot the three populations on a single graph, with different
colors (an example is shown below).
With only one infected individual on the first day, these curves are not very interesting. The
final step in the construction of your program is to add the ability to specify an arbitrary
number of individuals who are infected on the first day of the simulation, and place them at
randomly selected locations of the grid. First, add a third input to the
function that is used to specify the number of individuals who are infected on the first day.
Make this an optional input. If the user does not specify a value for this input, your
function should place a single infected individual at the center of
grid1 at the
outset (the code currently does this). However, if the user specifies a value for this input,
place this number of 1's in
grid1 at the outset. The row and column number for each
1 (infected individual) should be a random integer in the range between 2 and 99 (omit the outer
rows and columns of the grid). The built-in function
randi(imax) returns a random
integer in the range from 1 to
imax, and can be helpful here. (Note that the
randi() function is a relatively new addition to MATLAB and is not available in
The following figure is a sample plot generated by the following function call:
>> virus(100, 0.5, 10);
Run your simulation for 100 days, using a couple different values of the
input, combined with a couple different values for the number of individuals who are initially
infected. Add comments to your code describing how the graphs of the susceptible, infected,
and immune individuals changed with different input parameters.