Mary Richardson,
Phyllis Curtiss,
John Gabrosek, and
Diann Reischman
Department of Statistics
Grand Valley State University
1 Campus Drive
Allendale, MI 49401-9403
Statistics Teaching and
Resource Library, September 1, 2002
© 2002 by
Mary Richardson, Phyllis Curtiss, John Gabrosek,
and Diann Reischman, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the
editor.
This article describes an interactive
activity illustrating sampling distributions for means, properties of
confidence intervals, properties of hypothesis testing, confidence
intervals for means, and hypothesis tests for means. Students generate and
analyze data and through simulation explore these concepts. The activity
is completed in three parts. The three parts of the activity can be used
in sequence or they can be used individually as “stand alone” activities.
This allows the educator flexibility in utilizing the activity. Part I
illustrates the sampling distribution of the sample mean. Part II
illustrates confidence intervals for the population mean. Part III
illustrates hypothesis tests for the population mean. This activity is
appropriate for use in an introductory college or high school AP
statistics course.
Key words: sampling distribution of a sample mean, confidence interval for
a mean, hypothesis test on a mean, simulation
Objective
After completing the Rectangularity
activity, students will understand:
- How to construct and use the sampling distribution for the sample mean
- How to construct and interpret a confidence interval for a mean
- How to perform a hypothesis test on a mean
- How to interpret the level of significance of a hypothesis test (type I
error rate)
- How to interpret the p-value of a hypothesis test
- How to interpret the type II error rate of a hypothesis test
- How to interpret the power of a hypothesis test
- The relationship between type I and type II error rates and power
Materials and equipment
Each student needs a random number
table or a calculator that generates random numbers, four sticky notes
(for Parts II and III), and a copy of the activity (which includes
statistical guides containing relevant notation, formulas, and
definitions). Included in the student’s version of the activity is a sheet
with a population of 100 rectangles having different areas. Each square
counts as one unit towards a rectangle’s area.
Time involved
The activity is completed in three
parts. The estimated completion time for each part is one class period
(approximately one hour). The three parts of the activity can be used in
sequence or they can be used individually as “stand alone” activities.
Part I illustrates the sampling distribution of the sample mean and
involves calculations that should be completed using either a computer
software package or a graphing calculator. Part II illustrates confidence
intervals for the population mean. Part III illustrates hypothesis tests
for the population mean.
Activity description -
Part I: sampling distribution of the sample mean
To begin, the teacher draws a histogram
of the population distribution of areas on the whiteboard. The population
distribution of areas is skewed to the right (positively skewed).
Ten groups of two or three students are formed and the following tasks are
assigned to each group.
- Select two different random samples of n = 5 rectangles (with
replacement)
- Select two different random samples of n = 15 rectangles (with
replacement)
- Select two different random samples of n = 25 rectangles (with
replacement)
Students calculate the average area of
the rectangles for each sample drawn, reinforcing the idea that the sample
mean is a random variable.
To complete the data collection sheet, group results are combined to
obtain 20 sample means for sample sizes n = 5, 15, and 25.



After data collection, students answer a
series of questions based on the means and standard deviations of the
sample means for the different sample sizes. Students discover properties
of the distribution of a sample mean; namely, (i) the distribution of
sample mean values is centered at the population mean, (ii) the
distribution of sample mean values approaches a normal distribution as the
sample size increases, (iii) the distribution of sample mean values has
less variability than the original population, and (iv) the variability of
sample mean values decreases as n increases.
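The properties listed above can also be checked by computer. The following is a minimal Python sketch, not part of the original activity. It assumes a hypothetical population of 100 rectangle areas (`AREA_COUNTS` is invented for illustration; only its mean of 6.26 matches the population mean used in this activity) and draws 5,000 samples per sample size instead of the 20 collected in class.

```python
import random
import statistics

# Hypothetical stand-in for the sheet of 100 rectangles (area -> how many).
# The areas are invented; only the mean of 6.26 matches the activity.
AREA_COUNTS = {1: 14, 2: 10, 3: 10, 4: 12, 5: 8, 6: 8, 8: 10,
               9: 4, 10: 6, 12: 10, 15: 2, 16: 4, 18: 2}
POPULATION = [area for area, count in AREA_COUNTS.items() for _ in range(count)]


def sample_means(n, reps, rng):
    """Means of `reps` random samples of size n drawn with replacement."""
    return [statistics.fmean(rng.choices(POPULATION, k=n)) for _ in range(reps)]


rng = random.Random(2002)  # fixed seed so the run is repeatable
mu = statistics.fmean(POPULATION)
sigma = statistics.pstdev(POPULATION)
print(f"population: mean = {mu:.2f}, sd = {sigma:.2f}")

summary = {}
for n in (5, 15, 25):
    means = sample_means(n, 5000, rng)
    summary[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n = {n:2d}: mean of sample means = {summary[n][0]:.2f}, "
          f"sd of sample means = {summary[n][1]:.2f}, "
          f"sigma / sqrt(n) = {sigma / n ** 0.5:.2f}")
```

In such a run the mean of the sample means should stay near the population mean for every n, while their standard deviation should shrink roughly like sigma / sqrt(n). A histogram of each list of means (not drawn here) would show the shape becoming more symmetric as n grows.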
Activity description -
Part II: confidence interval for the population mean
Each student selects a simple random
sample of 25 rectangles (with replacement). Note that the population of
rectangle areas does not have a normal distribution, but the t confidence
interval procedure may be applied in this case since the sampling
distribution of the sample mean x̄ is approximately
normal for samples of size 25. First, each student uses her sample to
construct an 80% confidence interval for the population mean rectangle
area. Each student writes her result on a sticky note and gives it to the
instructor. Each student’s confidence interval is sketched horizontally on
an overhead transparency leaving one blank horizontal line between
intervals. The resulting overhead transparency displays all of the
confidence intervals constructed by the students in the class.

Students see the results of drawing
repeated samples from the same population and calculating 80% confidence
intervals. Some of the confidence intervals will contain the population
mean (6.26) and some will not. After graphing the class confidence
intervals, their meaning is discussed. We stress that if we claim that we
are 80% confident that a mean lies within the endpoints of a confidence
interval, we are saying that the endpoints of the confidence interval were
calculated by a method that gives correct results in 80% of all possible
random samples. We are not saying that there is an 80% chance that a
calculated interval contains the population mean. Students are asked to
write a statement explaining how an 80% level of confidence should be
interpreted.
Students are then asked to construct a 99% confidence interval for the
population mean rectangle area. As above, the class confidence intervals
are graphed and the results are discussed. We stress how to properly
interpret a 99% confidence level and ask students to write a statement
explaining how a 99% level of confidence should be interpreted. Students
are asked to write a statement explaining how increasing the confidence
level from 80% to 99% changed the width of their confidence intervals.
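The class display of intervals can be imitated with many more repetitions. The sketch below is a rough illustration under stated assumptions: `AREA_COUNTS` is a hypothetical population (only its mean of 6.26 comes from this article), the critical values in `T_CRIT` are standard t-table entries for 24 degrees of freedom, and 5,000 samples stand in for the 30 or so drawn by a class.

```python
import random
import statistics

# Hypothetical stand-in for the sheet of 100 rectangles (area -> how many).
# The areas are invented; only the mean of 6.26 matches the activity.
AREA_COUNTS = {1: 14, 2: 10, 3: 10, 4: 12, 5: 8, 6: 8, 8: 10,
               9: 4, 10: 6, 12: 10, 15: 2, 16: 4, 18: 2}
POPULATION = [area for area, count in AREA_COUNTS.items() for _ in range(count)]
MU = statistics.fmean(POPULATION)
N = 25  # sample size used in Part II

# Two-sided t critical values for N - 1 = 24 degrees of freedom (t table).
T_CRIT = {0.80: 1.318, 0.99: 2.797}


def t_interval(sample, level):
    """t confidence interval for the mean: xbar +/- t* * s / sqrt(n)."""
    xbar = statistics.fmean(sample)
    margin = T_CRIT[level] * statistics.stdev(sample) / len(sample) ** 0.5
    return xbar - margin, xbar + margin


def simulate(level, reps, rng):
    """Return (fraction of intervals containing MU, average interval width)."""
    hits, total_width = 0, 0.0
    for _ in range(reps):
        low, high = t_interval(rng.choices(POPULATION, k=N), level)
        hits += low <= MU <= high
        total_width += high - low
    return hits / reps, total_width / reps


rng = random.Random(2002)  # fixed seed so the run is repeatable
results = {level: simulate(level, 5000, rng) for level in T_CRIT}
for level, (coverage, width) in results.items():
    print(f"{level:.0%} intervals: {coverage:.1%} contain the mean, "
          f"average width = {width:.2f}")
```

The fraction of intervals containing the mean should land near the stated confidence level, and the 99% intervals should be noticeably wider than the 80% intervals. Because the population is skewed and n is only 25, the simulated coverage need not match the nominal level exactly.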
Activity description -
Part III: hypothesis test on the population mean
Each student selects a simple
random sample of 25 rectangles (with replacement) or uses the simple
random sample selected for Part II. Note that the population of rectangle
areas does not have a normal distribution, but the t test may be applied
in this case since the sampling distribution of the sample mean x̄ is
approximately normal for samples of size 25.
In question 1, students use their sample data to perform two hypothesis
tests of H0: μ = 9 versus Ha: μ < 9 with different levels of
significance. Each student’s data is a different simulated sample. Since
the true population mean rectangle area is μ = 6.26, the null hypothesis
H0: μ = 9 is false. Since H0 is false, performing these tests
provides an opportunity to use simulation to illustrate properties of
p-values, type II errors, and power.
The first test of H0: μ = 9 versus Ha: μ < 9 is performed using level of
significance α = .05.
The instructor draws stems for a stem-and-leaf plot on the whiteboard.
Each student writes her calculated p-value on a sticky note and places it
on the stem-and-leaf plot.
Assuming a class size of 30 students, the plot will contain 30 calculated
p-values. The p-values are calculated under the assumption that
H0: μ = 9 is true (when, in fact, μ = 6.26), so the p-values
will tend to be small. We discuss with students that small p-values
contradict H0.
Some students will not obtain small
p-values. On the stem-and-leaf plot, a cut-off value is marked at α = .05.
Each p-value falling at or below this cut-off represents a rejection of H0
(a correct decision). Each p-value falling above this cut-off represents a
failure to reject H0 (a type II error). Since 30 samples
are taken, and 30 tests are performed, students see that some samples
result in a correct decision and other samples result in an incorrect
decision (type II error). Students are asked to calculate the fraction of
incorrect decisions to obtain a simulated value for β, the probability of
a type II error, and a simulated value for the power = 1 - β.
An explanation is then given of how to interpret a type II error rate (and
power) in terms of repeatedly performing the procedure of selecting a
sample, then using the data to test a hypothesis about a population
parameter, when the null hypothesis is false.
The second test is performed using α = .20.
The p-value is the same as for the first test; however, the type I error
rate is increased to 20%. On the stem-and-leaf plot of p-values, a new
cut-off is marked at α = .20.
Each p-value falling at or below this cut-off represents a rejection of H0
(a correct decision). Each p-value falling above this cut-off represents a
non-rejection of H0 (a type II error). Students are asked
to calculate the fraction of non-rejections of H0 out of the 30 tests to
obtain a simulated value for β and a simulated value for the power. In
examining the class results, students note that an increase in the type I
error rate results in a decrease in the type II error rate and thus an
increase in the simulated power.
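The trade-off described for question 1 can be sketched in Python. This is an illustration under assumptions, not the authors' procedure: `AREA_COUNTS` is a hypothetical population with mean 6.26, and instead of computing p-values the code compares each t statistic with a t-table critical value for 24 degrees of freedom, which leads to the same reject or fail-to-reject decision as comparing the p-value with α.

```python
import random
import statistics

# Hypothetical stand-in for the sheet of 100 rectangles (area -> how many).
# The areas are invented; only the mean of 6.26 matches the activity.
AREA_COUNTS = {1: 14, 2: 10, 3: 10, 4: 12, 5: 8, 6: 8, 8: 10,
               9: 4, 10: 6, 12: 10, 15: 2, 16: 4, 18: 2}
POPULATION = [area for area, count in AREA_COUNTS.items() for _ in range(count)]
N = 25     # sample size used in Part III
MU0 = 9.0  # mean claimed by the (false) null hypothesis

# Lower-tailed t critical values for 24 degrees of freedom (t table),
# keyed by the level of significance alpha.
T_CRIT = {0.05: 1.711, 0.20: 0.857}


def rejects(sample, t_crit):
    """True if the test of H0: mu = MU0 versus Ha: mu < MU0 rejects H0.

    Rejecting when t <= -t_crit is the same decision as p-value <= alpha.
    """
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return statistics.fmean(sample) - MU0 <= -t_crit * se


def type_ii_rates(reps, rng):
    """Fraction of samples that fail to reject H0, for each alpha."""
    misses = dict.fromkeys(T_CRIT, 0)
    for _ in range(reps):
        sample = rng.choices(POPULATION, k=N)  # same sample for both tests
        for alpha, t_crit in T_CRIT.items():
            misses[alpha] += not rejects(sample, t_crit)
    return {alpha: count / reps for alpha, count in misses.items()}


rng = random.Random(2002)  # fixed seed so the run is repeatable
class_beta = type_ii_rates(30, rng)        # one class of 30 students
long_run_beta = type_ii_rates(5000, rng)   # many more repetitions
for label, rates in (("class of 30", class_beta), ("5000 samples", long_run_beta)):
    for alpha, beta in rates.items():
        print(f"{label}, alpha = {alpha:.2f}: "
              f"beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Because both tests use the same sample, every sample rejected at α = .05 is also rejected at α = .20, so the simulated β at α = .20 can never exceed the simulated β at α = .05. The class-sized run shows how much the estimate can wobble with only 30 tests.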
In question 2, students use their sample data to perform two hypothesis
tests of H0: μ = 6.26 versus Ha: μ ≠ 6.26
with different levels of significance. Since μ = 6.26 is the true
population mean, H0 is true, and
performing these tests provides an opportunity to illustrate properties of
p-values and type I error.
The first test of H0: μ = 6.26 versus Ha: μ ≠ 6.26 is performed using
α = .05.
The second test is performed using α = .20.
As before, a stem-and-leaf plot of the class p-values is constructed.
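The p-values are calculated under the assumption that H0: μ = 6.26
is true, and here it is true, so most p-values will be large relative to α
(when H0 is true, p-values are spread roughly evenly between 0 and 1). We
discuss with students that large p-values do not contradict H0.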
Some students will not obtain large p-values. On the stem-and-leaf plot, a
cut-off value is marked at α.
Each p-value falling at or below this
cut-off represents a rejection of H0
(a type I error). Each p-value
falling above this cut-off represents a failure to reject H0
(a correct decision). Since 30
samples are taken, and 30 tests are performed, students can see that some
samples result in a correct decision and other samples result in an
incorrect decision (type I error). For α = .05 and α = .20,
students are asked to calculate the
fraction of rejections of H0
out of the 30 tests to obtain a
simulated value for α.
An explanation is then given of how
to interpret a type I error rate in terms of repeatedly selecting a
sample, then using the data to test a hypothesis about a population
parameter, when the null hypothesis is true.
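The long-run interpretation of a type I error rate can be illustrated with a short simulation. The sketch below is hedged in the same way as any computer version of this activity: `AREA_COUNTS` is a hypothetical population whose mean is 6.26, the two-sided critical values in `T_CRIT` are t-table entries for 24 degrees of freedom, and rejecting when |t| is at least the critical value stands in for comparing the p-value with α.

```python
import random
import statistics

# Hypothetical stand-in for the sheet of 100 rectangles (area -> how many).
# The areas are invented; only the mean of 6.26 matches the activity.
AREA_COUNTS = {1: 14, 2: 10, 3: 10, 4: 12, 5: 8, 6: 8, 8: 10,
               9: 4, 10: 6, 12: 10, 15: 2, 16: 4, 18: 2}
POPULATION = [area for area, count in AREA_COUNTS.items() for _ in range(count)]
N = 25                              # sample size used in Part III
MU0 = statistics.fmean(POPULATION)  # 6.26, so the null hypothesis is true

# Two-sided t critical values for 24 degrees of freedom (t table),
# keyed by the level of significance alpha.
T_CRIT = {0.05: 2.064, 0.20: 1.318}


def rejects(sample, t_crit):
    """True if the two-sided test of H0: mu = MU0 rejects H0 (|t| >= t_crit)."""
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return abs(statistics.fmean(sample) - MU0) >= t_crit * se


def type_i_rates(reps, rng):
    """Fraction of samples that (wrongly) reject the true H0, for each alpha."""
    rejections = dict.fromkeys(T_CRIT, 0)
    for _ in range(reps):
        sample = rng.choices(POPULATION, k=N)  # same sample for both tests
        for alpha, t_crit in T_CRIT.items():
            rejections[alpha] += rejects(sample, t_crit)
    return {alpha: count / reps for alpha, count in rejections.items()}


rng = random.Random(2002)  # fixed seed so the run is repeatable
simulated_alpha = type_i_rates(5000, rng)
for alpha, rate in simulated_alpha.items():
    print(f"nominal alpha = {alpha:.2f}: simulated type I error rate = {rate:.3f}")
```

The simulated rates should fall near .05 and .20. Small departures are expected, partly from simulation noise and partly because the t procedure is only approximate for a skewed population with n = 25.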

Teacher notes
Students work with a population of 100
rectangles, drawing repeated simple random samples (with replacement).
Prior to completing Part I, students should be familiar with descriptive
statistics and probability distributions. Prior to completing Part II,
students should be familiar with the basic mechanics of how to construct
confidence intervals. Prior to completing Part III, students should be
familiar with the basic mechanics of how to perform hypothesis tests,
including the calculation of test statistics and p-values.
In this activity, we sample with
replacement to preserve the independence of the sample observations. When
sampling with replacement, it is possible for the same rectangle to be
sampled more than once. If sampled rectangles are not replaced in the
population, then each time a rectangle is withdrawn the probability of
selection for the remaining rectangles will increase. In practice, we
often either sample with replacement or we sample from a population that
is so large that the withdrawal of successive items changes selection
probabilities negligibly.
In this activity, we used the same data
set to perform two different hypothesis tests at two different levels of
significance. The instructor should emphasize that the level of
significance, null hypothesis, and alternative hypothesis should be
determined prior to data collection. We use the same data for multiple
hypothesis tests to save time. Technically, we should have collected four
separate data sets, one for each of the four tests conducted.
In addition, the instructor should stress to students that in reality one
would not know the true value of the population mean μ.
If the parameter value were known, then there would be no point in
utilizing sample data to draw an inference about the parameter. The
instructor should stress that we assume that we know the parameter so that
we can investigate the properties of hypothesis testing under different
situations.
Assessment
For Part I: Students should write about
the effect of sampling variability on the center, spread, and shape of the
sampling distribution of the sample mean. Students should write about the
effect of sample size on the shape and spread of the distribution of the
sample mean.
The following questions can be used to assess student understanding or as
challenge problems for students who complete the activity early.
1. What happens to the shape of the sampling distribution of the sample
means for this non-normal population as the sample size increases?
2. How do you think the shape, mean, and standard deviation of the
distribution of the sample means for samples of size 100 would compare to
the shape, mean, and standard deviation for the samples of size 25 that
the class took?
3. Widgets produced by a machine are known to have a mean diameter of 12
mm with a standard deviation of 0.31 mm. Suppose that we take a random
sample of 90 widgets and measure each widget’s diameter. We calculate the
mean diameter of the 90 widgets. We repeat this process every day for 365
days so that we have 365 daily sample means.
- What would we expect the mean of the 365 daily means to be?
- What would we expect the standard deviation of the 365 daily means to
be?
- What would we expect the shape of the histogram of the 365 daily means
to be? Why?
- Assuming that the machine continues to perform as it has in the past,
what is the probability that for the next day the mean diameter of the
90 sampled widgets will be between 11.95 mm and 12.05 mm?
- Why is simply looking at the mean diameter not enough to say that the
machine is producing widgets with diameters close to the desired 12 mm?
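For instructors who want to check the numerical parts of question 3, the arithmetic can be sketched as follows. The only assumption beyond the numbers given in the question is that the mean of 90 diameters is treated as approximately normal with standard deviation sigma / sqrt(n).

```python
from statistics import NormalDist

MU = 12.0     # mean widget diameter (mm), from the question
SIGMA = 0.31  # standard deviation of widget diameters (mm), from the question
N = 90        # widgets sampled each day

# Standard deviation of the daily sample mean: sigma / sqrt(n).
se = SIGMA / N ** 0.5

# Treat the daily sample mean as approximately normal (central limit theorem).
xbar_dist = NormalDist(mu=MU, sigma=se)
prob = xbar_dist.cdf(12.05) - xbar_dist.cdf(11.95)

print(f"sd of the daily means = {se:.4f} mm")
print(f"P(11.95 < xbar < 12.05) = {prob:.3f}")
```

The standard deviation of the daily means comes out to about 0.033 mm, and the probability is roughly 0.87.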
For Part II: Students should be able to explain how to interpret a
confidence interval. Additionally, students should be able to describe the
relationship between the confidence level and the width of a confidence
interval.
The following questions can be used to assess student understanding or as
challenge problems for students who complete the activity early.
For all of these questions, assume that the samples are large enough so
that the sampling distribution of the sample mean is approximately normal.
1. Suppose a simple random sample (SRS) of 20 rectangles has sample mean
x̄ = 7.3 and sample standard deviation s = 6.1. Based on the sample, we
wish to estimate the value of the population mean, μ.
- What is the point estimate for μ?
- What is the standard deviation of the point estimate?
- The mean of the sample will not be exactly equal to the mean of the
population, thus there is error associated with the point estimate. With
95% confidence, what is the maximum error associated with the point
estimate? (That is, what is the largest possible difference between
x̄ and μ?)
This value is often called the margin of error.
- The margin of error from the previous part consists of how many
estimated standard deviations of x̄?
2. Suppose the sample mean, x̄, from a SRS of 40 rectangles is used to
estimate μ.
- How would you expect the standard deviation of the sampling distribution
of the sample mean of 40 rectangles to compare to the standard deviation
of the sampling distribution of the sample mean of 20 rectangles?
Explain.
- How would you expect the 95% margin of error for the estimate of μ
for the 40 rectangles to compare to the 95% margin of error for the 20
rectangles in the previous problem? Explain.
- Do you think using a sample mean from a sample of size 40 will give a
more precise estimate of μ
than the sample mean from a sample of size 20? Explain.
3. In the activity, you selected a SRS of 25 rectangles and constructed an
80% confidence interval. Suppose you had selected a SRS of 40 rectangles
and constructed an 80% confidence interval. How would you expect the
confidence interval constructed from 40 rectangles to compare to the
confidence interval constructed from 25 rectangles? Explain.
4. For a large population, a 90% confidence interval for μ is found to be
23.5 to 28.9. Why is the following statement incorrect? “There is a 90%
chance that μ is between 23.5 and 28.9.”
5. Suppose you select a SRS of size 30 from a large population and find a
95% confidence interval for μ to be 17.30 to 23.47. Your friend selects a
separate SRS of size 30 from the population and finds a 95% confidence
interval for μ to be 18.64 to 24.81. Which confidence interval is better?
Explain.
For Part III: Students should be able to explain
type I error and type II
error in a specific problem. Additionally, students should be able to
describe the relationship between type I and type II error rates and power.
The following questions can be used to assess student understanding or as
challenge problems for students who complete the activity early.
1. A company is trying to decide whether to buy a new Widget machine that
costs $1 million. It is decided it will be worth buying the machine if
there is overwhelming evidence that the mean number of defective Widgets
will decrease from the current rate of 200 per day.
- State the null and alternative hypotheses needed to test if the machine
should be purchased.
- Describe a type I error in the context of this problem.
- Describe a type II error in the context of this problem.
- Argue that a type I error is a more serious error in this problem.
- For this situation, should the company run the test at the 1%, 5%, or
10% significance level? Explain.
2. Explain the fallacy in reasoning in the following statement. “I wanted
to reduce the chance of committing an error, so I reduced the type I error
rate to .001.”
3. A doctor claims that his patients wait an average of 10 minutes in his
waiting room. A disgruntled patient claims it is really higher. For a
random sample of patients, the sample mean is 10.8 with a standard
deviation of 2.1.
- If the sample consisted of 25 patients, perform the appropriate
hypothesis test using a 1% level of significance (α = .01).
- If the sample consisted of 50 patients, perform the appropriate
hypothesis test using a 1% level of significance (α = .01).
- Give an intuitive justification for why changing the sample size may
result in changing the conclusion about a null hypothesis.
- In general, what is the relationship between the sample size and the
absolute value of the test statistic?
- In general, what is the relationship between the sample size and the
p-value? (To answer this question, refer to the t-curve.)
- What do you think is the overall relationship between the sample size,
the type II error rate, and the power, when H0 is false?
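The first two parts of question 3 can be checked with a few lines of Python. This sketch assumes the hypotheses are H0: μ = 10 versus Ha: μ > 10 (the patient's claim that the wait is longer) and uses t-table critical values in place of p-values. The value for 49 degrees of freedom is an approximation read between neighboring table rows.

```python
# Summary statistics from the question (waiting times in minutes).
MU0, XBAR, S = 10.0, 10.8, 2.1

# Upper-tailed t critical values for alpha = .01, keyed by sample size n
# (degrees of freedom n - 1). 2.492 is the table value for 24 df;
# 2.405 is an approximate value for 49 df.
T_CRIT = {25: 2.492, 50: 2.405}


def t_statistic(n):
    """t = (xbar - mu0) / (s / sqrt(n)) for H0: mu = 10 versus Ha: mu > 10."""
    return (XBAR - MU0) / (S / n ** 0.5)


decisions = {}
for n, t_crit in T_CRIT.items():
    t = t_statistic(n)
    decisions[n] = t >= t_crit  # True means H0 is rejected
    verdict = "reject H0" if decisions[n] else "fail to reject H0"
    print(f"n = {n}: t = {t:.2f}, critical value = {t_crit}, {verdict}")
```

The same sample mean and standard deviation give a larger t statistic with the larger sample because the standard error s / sqrt(n) shrinks, which is the pattern the later parts of the question ask students to explain.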
References
Aliaga, M. and Gunderson, B. (1999). Interactive Statistics.
New Jersey: Prentice Hall.
Scheaffer, R., Gnanadesikan, M.,
Watkins, A., and Witmer, J. (1996). Activity-Based Statistics:
Instructor Resources. New York: Key Curriculum Press; Springer.