Erin E. Blankenship and
Linda J. Young
Department of Biometry
University of Nebraska–Lincoln
Lincoln, NE 68583-0712
Statistics Teaching and
Resource Library, July 25, 2001
© 2001 by Erin E. Blankenship
and Linda J. Young, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the
editor.
This group activity illustrates the concepts of size and power of a test through
simulation. Students simulate binomial data by repeatedly rolling a ten-sided
die, and they use their simulated data to estimate the size of a binomial test.
They carry out further simulations to estimate the power of the
test. After pooling their data with that of other groups, they construct a power
curve. A theoretical power curve is also constructed, and the students discuss why there
are differences between the expected and estimated curves.
Key words: Power, size, hypothesis testing, binomial distribution
Materials
Each group (2-4 students) will need one ten-sided
die, two tabulation sheets, and a binomial probability table.
Objective
Carry out a simulation study to estimate the size and power of a binomial test
using data simulated by rolling a ten-sided die.
Description of Activity
This activity is used during the laboratory section of a graduate-level course
on introductory statistical methods. It was developed because the students were having trouble with the concepts of size and
power. To solidify these ideas in the context of a hypothesis test for a binomial
parameter, the students carry out a simulation study based on Example 3.1,
page 60, in Dowdy
and Wearden (1991). The null hypothesis in the example is Ho:p=0.5 versus the alternative Ha:p≠0.5, and the students begin by performing a simulation study to estimate the size of the test (i.e.,
assuming p=0.5). Each simulated experiment has n=20 trials, and the experiment is repeated 25
times. The simulation study is repeated to estimate the power of the test of Ho:p =0.5
under various alternative values of p.
Again, there are n=20 trials in each of
25 simulated experiments. This time, however, the true value of p
is something
other than 0.5.
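The size of the test described above can also be computed exactly, which gives a benchmark for the simulated estimate. The following is a minimal Python sketch, not the procedure from Dowdy and Wearden: the rejection region is not stated in this description, so the sketch assumes an equal-tailed region at a nominal level of α=0.05, and the function names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def equal_tailed_region(n, p0, alpha):
    """Return (lower, upper); the test rejects when X <= lower or X >= upper.

    Assumption: each tail is made as large as possible without its
    probability under p0 exceeding alpha / 2.
    """
    lower, tail = -1, 0.0
    while tail + binom_pmf(lower + 1, n, p0) <= alpha / 2:
        lower += 1
        tail += binom_pmf(lower, n, p0)
    upper, tail = n + 1, 0.0
    while tail + binom_pmf(upper - 1, n, p0) <= alpha / 2:
        upper -= 1
        tail += binom_pmf(upper, n, p0)
    return lower, upper

def rejection_probability(n, p, lower, upper):
    """P(X <= lower or X >= upper) when the true success probability is p."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1)
               if k <= lower or k >= upper)

lower, upper = equal_tailed_region(20, 0.5, 0.05)
size = rejection_probability(20, 0.5, lower, upper)
print(f"Reject when X <= {lower} or X >= {upper}; exact size = {size:.4f}")
```

Under these assumptions the exact size falls below the nominal 0.05, because the binomial distribution is discrete and no rejection region attains the nominal level exactly.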
To simulate binomial data, the students repeatedly roll a 10-sided
die. At the beginning of the lab period, the students arrange themselves into groups
of 2–4 members, and each group receives a 10-sided die and a particular
alternative true value of p.
Depending on class size, it may be necessary to rearrange the groups so that there are as many groups as there are
alternative p
values. The true values that work well with the 10-sided die are
p={0.1,
0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9}, which imply a total of 8
groups. On the prototype activity, there is a blank left for the group’s
p
value. This prototype activity is identical to the version handed out during the lab
session, and the true p
value is filled in by the lab instructor as the directions are distributed to the
groups. All groups work with the hypothesized value,
p=0.5, in the study to estimate the size of the test of Ho:p=0.5.
The prototype activity includes several questions for the group members to
discuss during the activity. For example, the group members decide which rolls
should constitute a success under their specified value of p.
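One way to settle which rolls count as a success is to treat the faces 1 through 10p as successes, so that a group assigned p=0.2 counts a 1 or a 2. The Python sketch below mimics the die rolling under that convention. The function names, the face numbering 1 to 10 (many ten-sided dice are marked 0 to 9), and the default rejection region (reject when X ≤ 5 or X ≥ 15, an equal-tailed region at a nominal α of 0.05) are assumptions made for illustration; they are not taken from the activity sheet.

```python
import random

def roll_experiment(p, n=20, rng=random):
    """Roll a ten-sided die n times and return the number of successes.

    Assumption: faces are numbered 1-10 and faces 1..10p are successes.
    """
    success_faces = round(10 * p)
    return sum(rng.randint(1, 10) <= success_faces for _ in range(n))

def estimate_rejection_rate(p, experiments=25, n=20, lower=5, upper=15, rng=random):
    """Fraction of simulated experiments in which the test rejects Ho: p = 0.5."""
    rejections = 0
    for _ in range(experiments):
        x = roll_experiment(p, n, rng)
        if x <= lower or x >= upper:  # assumed rejection region
            rejections += 1
    return rejections / experiments

rng = random.Random(2001)  # fixed seed so the run can be repeated
print("estimated size:", estimate_rejection_rate(0.5, rng=rng))
print("estimated power at p = 0.2:", estimate_rejection_rate(0.2, rng=rng))
```

With 25 experiments, each estimate is a multiple of 0.04, which is worth keeping in mind when the groups compare their estimated size with the theoretical value.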
Other discussion questions included in the activity are more
abstract. For example, the groups are asked whether the estimated size of the test is satisfactorily close to the
theoretical α.
It is often instructive for the lab instructor to bring the groups
together after they have all completed the activity and discuss these types of
questions again. The groups often have different perspectives.
The activity also asks each group to pool its results with those of the other groups to construct a
theoretical power curve and an estimated power curve. It works well to have
the lab instructor, after reassembling the groups, construct the power curves on
the blackboard using the information supplied by the groups. For the point at p=0.5 (the hypothesized
value, used by every group), the lab instructor may pick one group's estimate of the size at random or may average all of the
groups' estimates. After the curves are constructed, questions about the differences
between the curves can be discussed by the class as a whole. This is also a good time to discuss any problems the
groups encountered during the activity.
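The theoretical curve that the instructor draws can be read from binomial tables or computed directly. The Python sketch below computes it at the p values used in the activity, assuming the test rejects when X ≤ 5 or X ≥ 15 (an equal-tailed region at a nominal α of 0.05; the region actually used with Example 3.1 is not given here). The function name is illustrative.

```python
from math import comb

def power(p, n=20, lower=5, upper=15):
    """Exact probability of rejecting Ho: p = 0.5 when the true value is p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k <= lower or k >= upper)

# The eight alternative values from the activity, plus the hypothesized 0.5.
p_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
curve = {p: power(p) for p in p_values}
for p, value in curve.items():
    print(f"p = {p:.1f}  theoretical power = {value:.3f}")
```

Because the assumed region is symmetric, the curve is symmetric about p=0.5, and its lowest point, at p=0.5, is the size of the test.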
Assessment
This activity was used in lab because students were having trouble
understanding the concepts of power and size; they saw them as abstract
ideas. This activity was designed to make those concepts more
concrete. The test and homework questions previously used to assess student understanding
therefore did not change. The anticipated change was in the quality of the
answers, and the responses did improve. Below is a sample exam question to test
an understanding of power (and binomial hypothesis testing in general):
The CDC reported that 6.7% of men aged 45-54 have coronary heart disease
(CHD). We want to know if this rate also holds for men who are heavy coffee
drinkers (>100 cups per month). To investigate this, 25 heavy coffee drinkers
aged 45-54 are randomly selected, and a physician determines the number out
of the 25 that suffer from CHD.
1. State the most reasonable null and alternative hypotheses to
test. Rather than the 6.7% reported by the CDC, use 10% as the rate of CHD so that
the binomial tables from the textbook may be used.
2. In the context of this example,
describe Type I and Type II errors. Which type of error do you consider more
serious? Explain.
3. If we want to test the hypotheses from
(1) at the α=0.05 level, what is the rejection region?
4. Suppose 4 out of the 25 men have CHD.
What is the conclusion?
5. Assume that the true proportion of heavy coffee drinking males aged 45-54
that have CHD is 15%. What is the power of the test under this
alternative? What does this value mean?
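For instructors who want to check answers to the rejection region and power parts without tables, the following Python sketch carries out the arithmetic. It assumes the one-sided hypotheses Ho:p=0.10 versus Ha:p>0.10, which is one plausible reading of the question rather than a stated answer key, and the function name is illustrative.

```python
from math import comb

def upper_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

n, p0, alpha = 25, 0.10, 0.05

# Smallest critical value whose upper-tail probability under Ho is at most alpha.
critical = next(c for c in range(n + 2) if upper_tail(c, n, p0) <= alpha)
size = upper_tail(critical, n, p0)    # actual size of the test
power = upper_tail(critical, n, 0.15)  # rejection probability when p = 0.15

print(f"Reject Ho when X >= {critical}")
print(f"size = {size:.4f}, power at p = 0.15: {power:.4f}")
```

Under the assumed one-sided alternative, an observed count of 4 lies outside the rejection region, so Ho would not be rejected.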
Teacher notes
The alternative values of
p that work well with this activity are
p={0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9} so that the ten-sided die can be
used, although other true values could be used with a different randomizing
device. Sample tabulation sheets for estimating size (p =0.5)
and estimating power (for the alternative p=0.2)
are included. Also included is an example of a theoretical and estimated
power curve plot. (This plot is based on real data, and it looks almost too good!)
Two dice can be provided to each group to speed up the data collection.
The number of rolls and number of simulated experiments can be
changed. The Dowdy and Wearden (1991) text only gives binomial tables for n=20
and n=25, but a more extensive binomial table could be used to accommodate any
number of rolls in an experiment. The number of simulated experiments can
be changed to fit the amount of time allotted for the activity. The larger the
number of simulated experiments, the more closely the estimated power curve
will follow the theoretical curve. We allot a two-hour laboratory session for
this activity, but it does not take the entire time. The activity could easily
be completed during a 75-minute class period, or during one and a half 50-minute
class periods. Of course, if discussion becomes more involved, it will take longer.
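How closely the estimated curve can be expected to follow the theoretical one depends on the number of simulated experiments. Each estimated power is a sample proportion, so its standard error is the square root of power × (1 − power) / m, where m is the number of simulated experiments. The Python sketch below tabulates this for a few values of m; the true power of 0.8 is an arbitrary illustrative choice, not a figure from the activity.

```python
from math import sqrt

def standard_error(true_power, experiments):
    """Standard error of an estimated power based on independent experiments."""
    return sqrt(true_power * (1 - true_power) / experiments)

for m in (25, 50, 100, 200):
    print(f"{m:4d} experiments: standard error = {standard_error(0.8, m):.3f}")
```

Quadrupling the number of experiments halves the standard error, which gives a rough guide when trading off class time against the smoothness of the estimated curve.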
The ten-sided dice are available at game
stores. They are also available on the web at http://www.chessex.com/Dice_Home.htm,
but we do not have experience ordering online from this company.
Acknowledgements
This manuscript has been assigned Journal Series
No. 01-6, College of Agricultural Sciences and Natural
Resources, University of Nebraska.
References
Dowdy, S. and Wearden, S. (1991). Statistics for Research, Second Edition.
New York: John Wiley & Sons.
Editor's note:
Before 11-6-01, the "student's version" of an activity was called the
"prototype".