Trent D. Buskirk* and
Linda J. Young**
*Department of Mathematics and Statistics
University of Nebraska-Lincoln
Lincoln, NE 68588-0323
**Department of Biometry
University of Nebraska-Lincoln
Lincoln, NE 68583-0712
Statistics Teaching and
Resource Library, August 29, 2001
© 2001 by
Trent D. Buskirk
and Linda J. Young,
all rights reserved. This text may be freely shared among
individuals, but it may not be republished in any medium without express
written consent from the author and advance notification of the editor.
This activity is an advanced version of
the “Keep your eyes on the ball” activity by Bereska et al. (1999).
Students should gain experience with differentiating between independent
and dependent variables, using linear regression to describe the
relationship between these variables, and drawing inference about the
parameters of the population regression line. Each group of students
collects data on the rebound heights of a ball dropped multiple times from
each of several different heights. By plotting the data, students quickly
recognize the linear relationship. After obtaining the least squares
estimate of the population regression line, students can set confidence
intervals or test hypotheses on the parameters. Predictions of rebound
height can be made for new values of the drop height as well. Data from
different groups can be used to test for equality of the intercepts and
slopes. By focusing on a particular drop height and multiple types of
balls, one can also introduce the concept of analysis of variance.
Key words: Linear regression, independent variable, dependent variable,
analysis of variance
Materials
Each group of 3-5 students needs a
measuring device (preferably a tape measure), a ball that will rebound
when dropped, and graph paper. A pool of balls, such as super balls,
tennis balls, racquetballs, basketballs, and soccer balls, should be
available. It is better to have more balls available than groups, as
students always like to have a choice! Optional materials are chalk,
post-it notes, and a measuring stick.
Time
A class period of 50 or 75 minutes is
sufficient for collecting the data, plotting the data, and estimating the
regression line. Additional class periods could be used to complete
further analyses, such as setting confidence intervals on the parameters
or testing equality of the regression lines obtained by different groups.
Objective
The objective of this activity is to
estimate the population regression line relating the rebound height of a
ball to the height from which it is dropped and to draw inferences using
the fitted regression line. A variation of this activity allows students
to use the analysis of variance to determine whether there is a difference
in the mean rebound height of different balls dropped from a common
height.
Description of Activity
This advanced version of the “Keep your
eyes on the ball” activity by Bereska et al. (1999) offers students an
opportunity to explore the relationship between a ball’s rebound height
and the height from which it is initially dropped. By setting their own
drop heights and by collecting their own data, groups will gain experience
with independent and dependent variables. Students will also use linear
regression to draw inferences and to make predictions based on their
fitted lines. For this activity, rebound height is defined to be the
highest level of ascent that the ball makes after its impact with the
floor.
To collect the regression data, each group should drop its selected ball
five times from each of ten heights. These numbers can be varied according
to course time constraints. Students should determine the (ten) drop
heights for the ball that their group has selected (one ball should be
used per group). During a 50-minute class period, for instance, students
may drop a basketball five times at each of ten heights. Actual student
data are included after the prototype activity in the Example Student
Output section.
To better understand the nature of the relationship between the drop and
rebound heights, students should first plot their data. On this plot, the
students should be able to see that a line is the best descriptor of this
relationship. Students should also be able to identify outliers on this
plot. Once outliers are identified, the group should be able to investigate the nature
of any outlying observations. Sometimes these outliers end up being the
first observations recorded for a particular drop height and may simply be
a function of the inexperience of the rebound height recorder. Students
are then asked to use their data to fit a linear regression line and to
use it to make predictions about the rebound heights of a ball dropped
from a drop height for which no data were collected. Students are
encouraged to select their own heights and should avoid extrapolation.
Students are also asked to interpret the regression slope and intercept
within the context of this activity as well as to comment on the scope of
inference for their regression line.
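The fitting and prediction steps described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original activity: the function names `fit_line` and `predict_rebound` are hypothetical, and the range check is one possible way to enforce the advice to avoid extrapolation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict_rebound(drop_height, xs, ys):
    """Predict rebound height, refusing drop heights outside the observed range."""
    if not min(xs) <= drop_height <= max(xs):
        raise ValueError("drop height is outside the observed range (extrapolation)")
    intercept, slope = fit_line(xs, ys)
    return intercept + slope * drop_height
```

Here `xs` would hold a group's recorded drop heights and `ys` the matching rebound heights, one pair per drop. Raising an error for out-of-range drop heights is a deliberately strict choice; a class could instead issue a warning and discuss why the prediction is suspect.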
Assessment
Below is a sample exam question to test
an understanding of the basic concepts associated with linear regression:
POSSIBLE EXAM QUESTION: OFFICE-TEMPS Inc. wants to screen applicants for
basic typing skills using a timed test. Applicants are required to type as
many words (in the order in which they appear on a uniform list) as
possible in the prescribed time. The allowable times range from 10 to 90
seconds. Data collected from all applicants interviewing last week are
listed below:
| Time (sec) | 10 | 10 | 10 | 20 | 20 | 20 | 60 | 60 | 60 | 90 | 90 | 90 |
| # of words | 18.5 | 19 | 17.75 | 29 | 29.5 | 32 | 75 | 60.5 | 53.25 | 80.5 | 100 | 93.25 |
(a) Identify the independent and dependent variables in this study.
(b) Assuming that the assumptions of linear regression hold, fit a
    regression line to the data. Interpret the estimated slope and
    intercept in the context of this study.
(c) Is the regression intercept significantly different from zero?
    Justify your answer.
(d) Compute a 95% prediction interval for the number of words typed in
    40 seconds and interpret it in the context of this study.
(e) Compute a 90% confidence interval for the mean number of words that
    can be typed in 40 seconds and interpret it in the context of this
    study.
(f) Clearly explain why the intervals in (d) and (e) are NOT the same in
    the context of the problem.
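One way an instructor might check answers to the computational parts of this question is sketched below. It is an illustration under stated assumptions rather than an official solution: the variable names are invented for this sketch, only the Python standard library is used, and the critical values 2.228 and 1.812 are the usual t-table entries for 10 degrees of freedom (two-sided 95% and 90%).

```python
from math import sqrt
from statistics import mean

# Typing-test data from the exam question: time allowed (s) and words typed.
times = [10, 10, 10, 20, 20, 20, 60, 60, 60, 90, 90, 90]
words = [18.5, 19, 17.75, 29, 29.5, 32, 75, 60.5, 53.25, 80.5, 100, 93.25]

n = len(times)
x_bar, y_bar = mean(times), mean(words)
sxx = sum((x - x_bar) ** 2 for x in times)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(times, words))

slope = sxy / sxx                  # estimated words gained per extra second
intercept = y_bar - slope * x_bar  # estimated words at time zero (outside the data range)

# Residual standard error with n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(times, words))
s = sqrt(sse / (n - 2))

# t statistic for H0: intercept = 0 (third part of the question).
se_intercept = s * sqrt(1 / n + x_bar ** 2 / sxx)
t_intercept = intercept / se_intercept

# Assumed table values: two-sided critical points of Student's t, 10 df.
T_95, T_90 = 2.228, 1.812

x0 = 40
fit = intercept + slope * x0
se_mean = s * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)     # mean response at x0
se_new = s * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)  # one new applicant at x0

pred_interval_95 = (fit - T_95 * se_new, fit + T_95 * se_new)    # 95% prediction interval
conf_interval_90 = (fit - T_90 * se_mean, fit + T_90 * se_mean)  # 90% confidence interval

print(f"slope={slope:.4f}, intercept={intercept:.4f}, t={t_intercept:.2f}")
print("95% PI at 40 s:", pred_interval_95)
print("90% CI at 40 s:", conf_interval_90)
```

Comparing `t_intercept` with `T_95` gives a two-sided test at the 5% level. The two intervals differ both in confidence level and in the extra `1 +` term under the square root, which accounts for the variability of a single new observation.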
Teacher notes
Students often confuse dependent and
independent variables and have difficulty grasping the concept of a
population regression line that is being estimated by fitting a linear
regression line. In addition, it is often difficult to find data that
allow a careful consideration of the assumptions underlying regression.
This activity was designed to permit the students to look at the
underlying assumptions of regression and to estimate the population
regression line. Clearly, taking a little more data will lead to changes
in the estimated population regression line even though the population
line remains unchanged. In addition, the differences in a confidence
interval on the mean rebound height at a given drop height and a
prediction interval for a new observation at a given drop height become
more real to the students.
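For reference, the two intervals contrasted above take the following standard textbook forms under the simple linear regression model. The notation is an assumption of this sketch and does not come from the activity itself: $x_0$ is the drop height of interest, $\hat{y}_0$ the fitted rebound height there, $s$ the residual standard error, $n$ the number of drops, and $S_{xx} = \sum_i (x_i - \bar{x})^2$.

```latex
\begin{align*}
\text{Confidence interval for the mean rebound height at } x_0:&\quad
  \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s\,
  \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}} \\
\text{Prediction interval for a new rebound height at } x_0:&\quad
  \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s\,
  \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}
\end{align*}
```

The extra 1 under the second square root reflects the drop-to-drop variability of a single new observation, which is why the prediction interval is always the wider of the two at the same confidence level.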
This activity will work best if students are arranged into groups
consisting of 2 to 4 members. It will be difficult to complete the data
collection if students work alone. A group of size three is optimal in
that it allows one student to drop the ball, a second to observe the
rebound height, and a third to record the data. If the groups are larger
than three, additional observers on the rebound height can be helpful.
The most challenging part of the data collection is accurately recording
the rebound heights. The rebound-height observer(s) must be eye level with
the rebound height to record it accurately. Students should practice
dropping the ball and recording the rebound heights. Some students will
force the ball downward resulting in anomalous rebound heights. Other
students will learn that they are better rebound recorders than they are
droppers. Practice time should be allocated so that groups can assign
duties, determine the range of drop heights to be used, and practice
dropping the ball and recording its rebound height.
An additional concept that may be further discussed within the context of
this experiment and its subsequent analysis is the idea of outlying or
influential observations. When outliers are observed, students may
question the assumption of normality. Often students can identify a
reason for an outlier, for example, “It was the first drop.”
To evaluate the assumption of equality of variances for the rebound
heights at varying levels of the drop height, students can use the 5
rebound heights at each drop height to plot the sample standard deviation
versus the drop height. Although five observations provide limited
insight, this may help identify patterns in measurement error or groups
with potential outliers. In addition to checking the homoscedasticity
assumption, students can also use normal probability plots or residual
plots to check violations of the normality assumption or to identify
outliers.
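The equal-variance check described above could be tabulated with a short helper before plotting. This is a sketch only; the function name `sd_by_drop_height` is hypothetical, and it assumes the drop heights and rebound heights are stored as two parallel lists with one entry per drop.

```python
from collections import defaultdict
from statistics import stdev


def sd_by_drop_height(drop_heights, rebound_heights):
    """Return {drop height: sample SD of the rebound heights recorded there}."""
    grouped = defaultdict(list)
    for drop, rebound in zip(drop_heights, rebound_heights):
        grouped[drop].append(rebound)
    # stdev needs at least two observations, so skip heights with a single drop.
    return {drop: stdev(vals) for drop, vals in sorted(grouped.items()) if len(vals) > 1}
```

Plotting the returned standard deviations against their drop heights gives the diagnostic plot; a clear upward or downward trend would suggest that the spread of rebound heights is not constant across drop heights.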
The drop height should be a good predictor of rebound height, so a
discussion of high R² values may be appropriate, as well as a
discussion of the cloud-like pattern that one would expect to see in the
plot of residuals versus the independent variable. Groups can compare their
regression lines with other groups for particular balls of interest.
Students should generally conclude that inference can be drawn only for
the ball that was dropped, for the particular surface on which it was
dropped, and within the range of drop heights used to construct the line.
Because the true relationship between drop height and rebound height is
quadratic, the intercept is usually significantly different from zero.
Thus, the problems associated with extrapolation are clear when
interpreting the estimated intercept. This also serves as a clear example of
a model that is useful for the range of observed data, but is not the true
underlying model. It could be instructive to ask students to predict the
rebound height for a ball that is dropped from well above any observed
drop heights, say 200 inches from the ground, based on their fitted
regression line. Students should realize that inference ought to be
restricted to the person doing the dropping or observing the rebound
height unless this responsibility was rotated within the group.
Depending on the level of the course, subsequent class periods could be
used to test for equality of the regression lines from two balls of the
same type, or two balls of different types.
An extension of this activity is to use the data for a given drop height
to test for differences in the mean rebound heights of different kinds of
balls. If the data are kept from the regression activity and an effort is
made to have at least one common height for all groups, it should not be
necessary to collect more data.
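For the analysis of variance extension, the F statistic can be computed directly from the pooled class data. The sketch below is an illustration, not the authors' procedure: the function name `one_way_anova` is hypothetical, and it assumes the rebound heights at the common drop height are collected in a dictionary keyed by ball type.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of {label: list of values}."""
    samples = list(groups.values())
    all_values = [v for sample in samples for v in sample]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

The returned F value would be compared with a critical value from an F table with the two returned degrees of freedom. A large F suggests that mean rebound height differs among ball types at that drop height.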
Acknowledgements
This is Journal Paper No. 13310 of the
Nebraska Agricultural Research Division, University of Nebraska at
Lincoln. Research was supported in part by University of Nebraska
Agricultural Experiment Station Project NEB-23-001.
References
Bereska, C., Bolster, C. H., Bolster, L. C., and Scheaffer, R. (1999).
EQL Investigation 15: Keep your eyes on the ball. Exploring Statistics in
the Elementary Grades. White Plains, New York: Dale Seymour Publications.
Editor's note:
Before 11-6-01, the "student's version" of an activity was called the
"prototype".