Mary Richardson,
Phyllis Curtiss, and
John Gabrosek
Department of Statistics
Grand Valley State University
1 Campus Drive
Allendale, MI 49401-9403
Statistics Teaching and
Resource Library, June 6, 2002
© 2002 by
Mary Richardson, Phyllis Curtiss, and John Gabrosek, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the
editor.
This article describes an interactive
activity illustrating general properties of hypothesis testing and
hypothesis tests for proportions. Students generate, collect, and analyze
data. Through simulation, students explore hypothesis testing concepts.
Concepts illustrated are: interpretation of p-values, type I error rate,
type II error rate, power, and the relationship between type I and type II
error rates and power. This activity is appropriate for use in an
introductory college or high school statistics course.
Key words: hypothesis test on a proportion, type I and II errors, power,
p-values, simulation
Objective
After completing the "What is the
Significance of a Kiss?" activity, students will understand:
- How to perform a hypothesis test on a proportion
- How to interpret the level of significance of a hypothesis test (type I error rate)
- How to interpret the observed level of significance of a hypothesis test (p-value)
- How to interpret the power of a hypothesis test
- How to interpret the type II error rate of a hypothesis test
- The relationship between type I and type II error rates and power
Materials and equipment
Each student needs 10 plain HERSHEY’S®
KISSES® chocolates, a 16-ounce plastic cup, a flat table or
desktop on which to work, two sticky notes, and a copy of the student’s
version of the activity (which includes a statistical guide containing
relevant notation, formulas, and definitions).
Time involved
The estimated completion time is one
hour.
Activity description
Students enjoy collecting and analyzing
data, especially when chocolate is involved. In this activity, students
explore the proportion of base landings for tossed plain HERSHEY’S®
KISSES® chocolates. Prior to completing this activity, students
should be familiar with the basic mechanics of performing hypothesis
tests, including the calculation of test statistics and p-values.
To begin the activity, each student examines a KISSES®
chocolate. (Students are told they can eat the candies later.) The
possible outcomes if a KISSES® chocolate is tossed onto the
desktop are discussed. There are two possible outcomes - landing
completely on the base or not landing completely on the base. Each student
is then asked to determine if he or she believes that the proportion of
the time that a KISSES® chocolate will land completely on its
base is less than 50%. After students make their conjectures, they are
ready to conduct the following experiment.
Each student puts his/her 10 KISSES® chocolates into the
plastic cup and spills the candies onto the table five times, each time
counting the number of candies that land completely on the base. Results are
recorded on the student’s activity sheet.
Strictly speaking, each student has five independent trials of 10 tosses
each. To expedite data collection, we assume that the 10 tosses within a
trial are roughly independent and treat the 50 results for each student as
50 independent trials.
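The data collection can be mimicked on a computer. The following Python sketch is an illustration only and is not part of the original activity. It assumes that each candy lands on its base independently with probability .35 (the value students are given after data collection), and the function name `spill_cup` is hypothetical.

```python
import random

def spill_cup(num_candies, true_p, rng):
    """Count base landings in one spill, treating the candies as independent."""
    return sum(rng.random() < true_p for _ in range(num_candies))

rng = random.Random(1)  # arbitrary seed so the run is repeatable
# Five spills of 10 candies, assuming a base-landing probability of .35.
spills = [spill_cup(10, 0.35, rng) for _ in range(5)]
total_base_landings = sum(spills)

print("Base landings per spill:", spills)
print("Total out of 50 tosses:", total_base_landings)
```

If candies in the same spill bump into one another, the independence assumption built into this sketch holds only approximately, which is the same simplification the activity makes.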

After data collection is completed,
students are informed that in past experiments the percentage of the time
that a plain KISSES® chocolate landed completely on its base
when tossed was consistently near 35%. When answering the questions,
students are to assume that the true proportion of base landings is p =
.35.
In question 1, students use their KISSES® data to perform two
hypothesis tests of Ho:p=.50 versus Ha:p<.50 with
different levels of significance. Each student’s data is a different
simulated sample. Under the assumption that the true value of p = .35, the
null hypothesis, Ho:p=.50, is false. Since Ho is
false, performing these tests provides an opportunity to use simulation to
illustrate properties of p-values, type II errors, and power.
The first test of Ho:p=.50 versus Ha:p<.50 is
performed using all of the data from the tosses (n = 50) and level of
significance α = .05. The instructor draws stems for a stem-and-leaf plot on the whiteboard
(see the student’s version of the activity). Each student writes his/her
calculated p-value on a sticky note and places it on the stem-and-leaf
plot.
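For readers who want to check a student's calculation, here is a minimal Python sketch of the p-value computation. It assumes the usual large-sample z test for a proportion, which may differ in detail from the formulas in the student's statistical guide. The count of 17 base landings and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lower_tail_p_value(base_landings, n, p0):
    """z statistic and p-value for Ho: p = p0 versus Ha: p < p0."""
    p_hat = base_landings / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)  # computed under Ho
    z = (p_hat - p0) / standard_error
    return z, normal_cdf(z)  # lower-tail area, because Ha is p < p0

# Hypothetical student result: 17 base landings in 50 tosses.
z, p_value = lower_tail_p_value(17, 50, 0.50)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The standard error uses the null value .50 rather than the sample proportion, because the p-value is calculated under the assumption that Ho is true.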

Assuming a class size of 30 students,
the plot will contain 30 calculated p-values. The p-values are calculated
under the assumption that Ho:p=.50 is true (when, in fact,
p=.35), so the p-values will tend to be small. The point that small
p-values contradict Ho is discussed with students. Some
students will not obtain small p-values. On the stem-and-leaf plot, a
cut-off value is marked at α = .05. Each p-value falling at or below this
cut-off represents a rejection of Ho (a correct decision). Each
p-value falling above this cut-off represents a failure to reject Ho
(a type II error). Since 30 samples are taken, and 30 tests are performed,
students can see that some samples result in a correct decision and other
samples result in an incorrect decision (type II error). Students are
asked to calculate the fraction of incorrect decisions to obtain a
simulated value for β and a simulated value for the power = 1 − β.
An explanation is then given of how to interpret a type II error rate (and
power) in terms of repeatedly performing the procedure of selecting a
sample, then using the data to test a hypothesis about a population
parameter when the null hypothesis is false.
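The classroom procedure can also be sketched in software. The Python example below is a rough simulation and not part of the original activity. It assumes 30 students, 50 independent tosses each, a true base-landing proportion of .35, and the large-sample z test; all function and variable names are hypothetical.

```python
import math
import random

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_class_p_values(num_students, n, true_p, p0, rng):
    """One lower-tail p-value per simulated student."""
    p_values = []
    for _ in range(num_students):
        landings = sum(rng.random() < true_p for _ in range(n))
        z = (landings / n - p0) / math.sqrt(p0 * (1.0 - p0) / n)
        p_values.append(normal_cdf(z))
    return p_values

rng = random.Random(2002)  # arbitrary seed so the run is repeatable
p_values = simulate_class_p_values(30, 50, 0.35, 0.50, rng)

alpha = 0.05
# Ho is false here, so every failure to reject is a type II error.
type_ii_errors = sum(p > alpha for p in p_values)
simulated_beta = type_ii_errors / len(p_values)
simulated_power = 1.0 - simulated_beta

print(f"Simulated type II error rate: {simulated_beta:.2f}")
print(f"Simulated power: {simulated_power:.2f}")
```

With only 30 tests the simulated values vary noticeably from one class (or one seed) to the next, which is itself a useful point for discussion.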
The second test is performed using α = .20. The p-value is the same as for the first test; however, the type I
error rate is increased to 20%. On the stem-and-leaf plot of p-values, a
new cut-off is marked at α = .20. Each p-value falling at or below this
cut-off represents a rejection of Ho (a correct decision). Each
p-value falling above this cut-off represents a non-rejection of Ho
(a type II error). Students are asked to calculate the fraction of
non-rejections of Ho out of the 30 tests to obtain a simulated
value for β and a simulated value for the power. In
examining the class results, students will note that an increase in the
type I error rate results in a decrease in the type II error rate and thus
an increase in the simulated power.
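The trade-off seen in the class results can be compared with a calculated value. The sketch below is one possible check and not the authors' method. It assumes the number of base landings in 50 tosses is binomial with p = .35, and that Ho is rejected whenever the large-sample lower-tail p-value is at or below the level of significance. The function name `exact_power` is hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exact_power(n, p0, true_p, alpha):
    """Probability of rejecting Ho: p = p0 in favor of Ha: p < p0
    when the count of base landings is Binomial(n, true_p)."""
    power = 0.0
    for x in range(n + 1):
        z = (x / n - p0) / math.sqrt(p0 * (1.0 - p0) / n)
        if normal_cdf(z) <= alpha:  # this count leads to rejection
            power += math.comb(n, x) * true_p**x * (1.0 - true_p)**(n - x)
    return power

power_05 = exact_power(50, 0.50, 0.35, 0.05)
power_20 = exact_power(50, 0.50, 0.35, 0.20)
print(f"Level .05: power = {power_05:.3f}, type II error rate = {1 - power_05:.3f}")
print(f"Level .20: power = {power_20:.3f}, type II error rate = {1 - power_20:.3f}")
```

Raising the level of significance enlarges the set of counts that lead to rejection, so the calculated power can only stay the same or increase.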
In question 2, students use their KISSES® data to perform two
hypothesis tests of Ho:p=.35 versus Ha:p≠.35
with different levels of significance. Under the assumption that p = .35,
performing these tests provides an opportunity to illustrate properties of
p-values and type I error.
The first test of Ho:p=.35 versus Ha:p≠.35
is performed using all of the data from the tosses (n = 50) and α = .05.
The second test of Ho:p=.35 versus Ha:p≠.35
is performed using α = .20. As before, a stem-and-leaf plot of the
class p-values is constructed.
The p-values are calculated under the assumption that Ho:p=.35
is true, so the p-values will tend to be large. The point that large
p-values do not contradict Ho is discussed with students. Some
students will not obtain large p-values. On the stem-and-leaf plot, a
cut-off value is marked at α.
Each p-value falling at or below this cut-off represents a rejection of Ho
(a type I error). Each p-value falling above this cut-off represents
a failure to reject Ho (a correct decision). Since 30 samples
are taken, and 30 tests are performed, students can see that some samples
result in a correct decision and other samples result in an incorrect
decision (type I error). For each of the α values (α = .05 and α = .20),
students are asked to calculate the fraction of rejections of Ho
out of the 30 tests to obtain a simulated value for α. An
explanation is then given of how to interpret a type I error rate in terms
of repeatedly selecting a sample, then using the data to test a hypothesis
about a population parameter when the null hypothesis is true.
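The same idea can be sketched for the case in which the null hypothesis is true. The Python example below is an illustration, not part of the original activity. It assumes 30 students, 50 independent tosses each, a true proportion of .35, and a two-sided large-sample z test; the names are hypothetical.

```python
import math
import random

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_p_value(base_landings, n, p0):
    """p-value for Ho: p = p0 versus the two-sided alternative."""
    z = (base_landings / n - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    return 2.0 * (1.0 - normal_cdf(abs(z)))  # area in both tails

rng = random.Random(35)  # arbitrary seed so the run is repeatable
class_p_values = []
for _ in range(30):
    landings = sum(rng.random() < 0.35 for _ in range(50))
    class_p_values.append(two_sided_p_value(landings, 50, 0.35))

# Ho is true here, so every rejection is a type I error.
simulated_alpha = {}
for alpha in (0.05, 0.20):
    rejections = sum(p <= alpha for p in class_p_values)
    simulated_alpha[alpha] = rejections / len(class_p_values)
    print(f"Level {alpha:.2f}: simulated type I error rate = {simulated_alpha[alpha]:.2f}")
```

Because the count of base landings is discrete and only 30 tests are run, the simulated rates will usually be close to, but not exactly equal to, the stated levels of significance.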
Teacher notes
In this activity, we used the same data
set to perform two different hypothesis tests at two different levels of
significance. The instructor should emphasize that the level of
significance, null hypothesis, and alternative hypothesis should be
determined prior to data collection. We use the same data for multiple
hypothesis tests to save time. Technically, we should have collected four
separate data sets, one for each of the four tests conducted.
In addition, the instructor should stress to students that in reality one
would not know the true value of the population parameter p. If the
parameter value were known, then there would be no point in utilizing
sample data to draw an inference about the parameter. The instructor
should stress that we assume knowledge of the parameter in order to
investigate the properties of hypothesis testing under different
situations.
Assessment
Students should be able to explain type
I error and type II error in a specific problem. Additionally, students
should be able to describe the relationship between type I and type II
error rates.
The following questions can be used to assess student understanding or as
challenge problems for students who complete the activity early.
1. A parachutist has made thousands of
successful jumps. His assumption is that when he pulls the rip cord, the
parachute will open.
- Describe a type I error in the
context of this problem.
- Describe a type II error in the
context of this problem.
- Which is a more serious error
for this problem?
- Most parachutes have a back-up
in case the rip cord malfunctions. Does this guard against type I
or type II errors?
- Suppose that I pull the rip
cord and it does not function. I have time to pull it again or
pull the back-up but not both. If I were concerned about a type I
error what would I do? Why? If I were concerned about a type II
error what would I do? Why? What would you do?
2. Explain the fallacy in reasoning in each of the following statements.
- “I wanted to reduce the chance
of committing a type I error, so I increased the power of the
test.”
- “I don’t like making mistakes
so I’m going to set the type I error rate at .0001.”
3. Explain how you would use our class data to simulate the sampling
distribution of the proportion of base landings in 50 trials.
4. In order to answer parts (a) and (b) below, suppose that the sample
proportion is p̂ = [value missing] and you wish to
test Ho:p=.50
versus Ha:p<.35.
- Assume that n = 50 and perform
this hypothesis test using a 5% level of significance (α=.05).
- Assume that n = 100 and perform
this hypothesis test using a 5% level of significance (α=.05).
- Give an intuitive justification
for why changing the sample size may result in changing the
conclusion about a null hypothesis.
- In general, what is the
relationship between the sample size and the absolute value of the
test statistic? (Assume that the sample size is changed, but that
the value of p̂ does not change.)
- In general, what is the
relationship between the sample size and the p-value? (Assume that
the sample size is changed, but that the value of p̂ does not
change.) To answer this question, refer to the standard
normal curve.
- What do you think is the
overall relationship between the sample size, the type II error
rate, and the power when Ho is false?
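Instructors who want to check answers to the sample-size questions numerically might use a sketch like the one below. It is only an illustration: the sample proportion of .40 is a hypothetical value chosen for the example (it is not taken from the activity), the null value is taken to be .50 with a lower-tail alternative, and the large-sample z test is assumed.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_and_p_value(p_hat, n, p0):
    """z statistic and lower-tail p-value for Ho: p = p0."""
    z = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    return z, normal_cdf(z)

p_hat = 0.40  # hypothetical sample proportion, held fixed as n changes
z50, p50 = z_and_p_value(p_hat, 50, 0.50)
z100, p100 = z_and_p_value(p_hat, 100, 0.50)

print(f"n = 50:  z = {z50:.3f}, p-value = {p50:.4f}")
print(f"n = 100: z = {z100:.3f}, p-value = {p100:.4f}")
```

With the sample proportion held fixed, the standard error shrinks as the sample size grows, so the absolute value of the test statistic increases in proportion to the square root of n and the lower-tail area becomes smaller.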
The HERSHEY'S® and KISSES®
trademarks are used with permission of Hershey Foods Corporation.