John Gabrosek
Department of Statistics
Grand Valley State University
1 Campus Drive
Allendale, MI 49401-9403
Michael E. Schuckers
Department of Statistics
410 Hodges Hall
West Virginia University
Morgantown, WV 26506
Statistics Teaching and Resource Library, October 25, 2001
© 2001 by John Gabrosek and Michael E. Schuckers, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.
The statistical educator often finds it
difficult to convey the beauty and power of descriptive graphical data
summaries to her students. Breaking the Code actively engages students in
constructing and interpreting bar charts. The activity requires students
to describe data graphically, compare the frequency distributions depicted
in two bar charts, construct and test a hypothesis, and communicate
results.
The activity begins with an explanation of the Caesar Shift for message
encryption (Singh, 1999). The Caesar Shift is a cyclic shift of the
alphabet. Each letter is replaced by the letter a fixed number of places
later, wrapping around from z back to a. For example, a shift of five
letters codes a as f, b as g, …, z as e. We describe a five-step process
for decoding an encrypted message. First, groups of four students
construct a frequency table of the letters in two lines of a coded
message. Second, students construct a bar chart of the letter frequencies
in a reference message, which stands in for the frequency of letters in
the English language. Third, students create a bar chart of the coded
message. Fourth, students visually compare the bar chart of the reference
message (step 2) to the bar chart of the coded message (step 3), and based
on this comparison they hypothesize a shift. Fifth, students apply the
shift to the coded message.
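The encoding rule, and its reversal in step 5, fits in a few lines of Python. This is a minimal sketch, not part of the original handout. It assumes lowercase messages without spaces, as in the activity, and the function name `caesar_shift` is hypothetical. Applying the negative of a shift undoes it.

```python
import string

ALPHABET = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz'


def caesar_shift(message: str, shift: int) -> str:
    """Move every lowercase letter `shift` places along the alphabet,
    wrapping from z back to a. Other characters pass through unchanged."""
    result = []
    for ch in message:
        if ch in ALPHABET:
            # Position 0-25, shifted, then wrapped around with modulo 26.
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)


# A five-letter shift codes a as f, b as g, ..., z as e.
print(caesar_shift("abz", 5))                      # fge
# Shifting back by the same amount recovers the plain text.
print(caesar_shift(caesar_shift("hello", 5), -5))  # hello
```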
After decoding the message, students are asked a series of questions that
assess their ability to see patterns. The questions are geared toward
higher levels of cognitive reasoning.
Key words: bar charts, Caesar Shift, encryption, testing hypotheses
Objectives
The objectives of the Breaking the Code activity include:
- Constructing frequency tables
- Constructing bar charts
- Comparing distributions by looking for patterns
- Forming and testing a hypothesis
- Understanding sampling variability
- Explaining results of statistical procedures
- Working cooperatively in a group
- Using a statistical computer package (such as SPSS for Windows or Minitab)
Materials and Equipment
The following materials and equipment are needed for the Breaking the Code
activity:
- A classroom set of handouts, one for each student
- A computer with statistical software (optional, but recommended)
Time Involved
This activity has been used for four semesters in a general education
introductory statistics course. The activity has been assigned early in
the semester, when students are not yet familiar with a statistical
computer package. The time involved has been as follows:
- Step I of the activity: allow the last 10 minutes of a class period
- Steps II to V of the activity: allow an entire 50-minute class period held in a computer lab
Depending on course structure, consider
the following alternative approaches:
- Eliminate the computer portion
of the assignment and have students produce bar charts by hand.
The activity can be completed in a 50-minute class.
- Give a brief lecture on the
decoding approach with a short example, and assign the activity as
individual or group homework.
Regarding the Data and Graphs
Uncoded writing is used to produce a
reference distribution for the frequency of letters in the English
language. Any writing of at least 250 letters could be used. Below we give
a writing sample. Its letter frequencies will not exactly match those of
the English language as a whole, nor those of any other writing sample.
Students will compare bar charts of the coded message and the reference
distribution to hypothesize the shift used to encode the message. The
message that we used to generate the reference distribution is:
And, most importantly, I would like
to thank my family for their unconditional love and generous support.
Without the encouragement of my parents, Joseph and Ann, my brother, Joe,
my sister-in-law, Jenna, my nieces, Gabrielle and Madison, and my sister,
Anita, I could not have completed this work.
The frequency table of the reference message is:

| Letter | Count | Letter | Count | Letter | Count | Letter | Count |
|--------|-------|--------|-------|--------|-------|--------|-------|
| A | 19 | B | 2 | C | 5 | D | 10 |
| E | 23 | F | 3 | G | 3 | H | 8 |
| I | 17 | J | 3 | K | 3 | L | 11 |
| M | 12 | N | 23 | O | 21 | P | 6 |
| Q | 0 | R | 13 | S | 12 | T | 20 |
| U | 7 | V | 2 | W | 4 | X | 0 |
| Y | 8 | Z | 0 | | | | |
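A short script can reproduce the reference counts, which is handy for checking a hand tally. This is a minimal sketch, not part of the activity. It assumes that upper- and lower-case letters are combined and that spaces, punctuation, and hyphens are ignored. The helper name `letter_counts` is our own.

```python
from collections import Counter
import string

REFERENCE = (
    "And, most importantly, I would like to thank my family for their "
    "unconditional love and generous support. Without the encouragement of "
    "my parents, Joseph and Ann, my brother, Joe, my sister-in-law, Jenna, "
    "my nieces, Gabrielle and Madison, and my sister, Anita, I could not "
    "have completed this work."
)


def letter_counts(text: str) -> dict[str, int]:
    """Frequency table of the letters a-z (case-insensitive), zeros included."""
    tally = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    return {letter: tally[letter] for letter in string.ascii_lowercase}


counts = letter_counts(REFERENCE)
print(sum(counts.values()))                   # 235
print(counts["e"], counts["n"], counts["q"])  # 23 23 0
```

The printed values agree with the frequency table above.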
[Figure: Bar chart of the letter counts in the reference message]
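When no statistical package is at hand, a rough text bar chart can stand in for the printed one. This sketch is our own assumption, not part of the original activity. It draws one `#` per occurrence, and the function name `text_bar_chart` is hypothetical. Letters with a count of 0 still get a labeled row.

```python
import string


def text_bar_chart(counts: dict[str, int], symbol: str = "#") -> str:
    """Return one line per letter: the letter, its count, and a bar of symbols."""
    lines = []
    for letter in string.ascii_lowercase:
        n = counts.get(letter, 0)
        # Zero counts still get a labeled row, so no category is dropped.
        lines.append(f"{letter} {n:3d} {symbol * n}")
    return "\n".join(lines)


# Reference-message counts a-z, copied from the frequency table.
reference_counts = dict(zip(
    string.ascii_lowercase,
    [19, 2, 5, 10, 23, 3, 3, 8, 17, 3, 3, 11, 12,
     23, 21, 6, 0, 13, 12, 20, 7, 2, 4, 0, 8, 0],
))
print(text_bar_chart(reference_counts))
```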
The two lines of the coded message are:
Line 1: svukvujhsspunavaolmhyhdhfavduz
Line 2: uvddhypzkljshylkhukihaasljvtlkvdu
The frequency table of the coded message is:

| Letter | Count | Letter | Count | Letter | Count | Letter | Count |
|--------|-------|--------|-------|--------|-------|--------|-------|
| A | 5 | B | 0 | C | 0 | D | 5 |
| E | 0 | F | 1 | G | 0 | H | 8 |
| I | 1 | J | 3 | K | 5 | L | 5 |
| M | 1 | N | 1 | O | 1 | P | 2 |
| Q | 0 | R | 0 | S | 5 | T | 1 |
| U | 7 | V | 7 | W | 0 | X | 0 |
| Y | 3 | Z | 2 | | | | |
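The same tally, applied to the two coded lines from step 1, should reproduce the table above. As before, this is a sketch assuming only the letters a to z are counted.

```python
from collections import Counter
import string

LINE_1 = "svukvujhsspunavaolmhyhdhfavduz"
LINE_2 = "uvddhypzkljshylkhukihaasljvtlkvdu"

# Tally both lines together; Counter returns 0 for letters that never appear.
tally = Counter(LINE_1 + LINE_2)
coded_counts = {letter: tally[letter] for letter in string.ascii_lowercase}

print(sum(coded_counts.values()))  # 63
# The single tallest bar in the coded chart.
top = max(coded_counts.values())
print([k for k, v in coded_counts.items() if v == top])  # ['h']
```

Matching h to either tied reference peak (e or n) suggests a shift of 3 or 20. Neither one decodes the message, which is why students should compare whole patterns of peaks and valleys.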
[Figure: Bar chart of the letter counts in the coded message]
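The visual comparison in step 4 can be mimicked numerically. Slide the coded distribution against the reference distribution and score each of the 26 possible shifts. This is a swapped-in technique of our own, not the authors' visual procedure. It uses a simple cross-correlation score (the sum of products of matching counts), and the function names are hypothetical. Like the "peaks and valleys" hint, the score rewards agreement across the whole pattern rather than a single peak.

```python
import string

ALPHABET = string.ascii_lowercase

# Letter counts a-z from the two frequency tables.
REFERENCE = [19, 2, 5, 10, 23, 3, 3, 8, 17, 3, 3, 11, 12,
             23, 21, 6, 0, 13, 12, 20, 7, 2, 4, 0, 8, 0]
CODED = [5, 0, 0, 5, 0, 1, 0, 8, 1, 3, 5, 5, 1,
         1, 1, 2, 0, 0, 5, 1, 7, 7, 0, 0, 3, 2]


def shift_score(shift: int) -> int:
    """Cross-correlation score: how well the coded counts line up with the
    reference counts if plain letter i was coded as letter i + shift."""
    return sum(REFERENCE[i] * CODED[(i + shift) % 26] for i in range(26))


def decode(message: str, shift: int) -> str:
    """Undo a Caesar Shift by moving each letter back `shift` places."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - shift) % 26] for ch in message)


scores = {k: shift_score(k) for k in range(26)}
best = max(scores, key=scores.get)
print(best, scores[best])
print(decode("svukvujhsspunavaolmhyhdhfavduz", best))
```

For the two tables in this activity, the highest score goes to a shift of 7 (a coded as h), well ahead of every other shift. The score is a teaching aid for checking groups' work. It is not a substitute for the students' own comparison of the bar charts.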
Assessment
After completing the activity, students
should be able to interpret bar charts and, informally, state and test
hypotheses and explain the concept of sampling variability. Question 1 of
the activity requires students to look for patterns when interpreting bar
charts. Questions 1 and 2 informally assess understanding of the process
used to formulate and test hypotheses. Question 3 addresses knowledge of
sampling variability. On homework and exams, students should be required
to interpret bar charts, looking for peaks, valleys, and unusual
observations, and to write about sampling variability and hypothesis
testing. For example, students should be able to answer the following
question:
You are given a six-sided die with each of the numbers 1, 2, 3, 4, 5, 6
imprinted on one face.
(a) Discuss how you can determine whether the die is "fair." (By fair we
mean that all six faces of the die are equally likely.)
(b) Suppose that you and I independently follow the procedure you outlined
in part (a). Would you expect our results to be identical? Explain.
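One possible answer to part (a) is to roll the die many times, tally the faces, and compare the tally with equal counts. Part (b) turns on the fact that two honest runs of that procedure give different tallies. The simulation below sketches both points under stated assumptions:
- a fair die simulated with Python's `random` module;
- 600 rolls per run;
- the chi-square goodness-of-fit statistic, compared with 11.07 (the 95th percentile of a chi-square distribution with 5 degrees of freedom).

The chi-square test is our addition; the activity itself treats hypothesis testing informally.

```python
import random
from collections import Counter


def roll_counts(n_rolls: int, seed: int) -> list[int]:
    """Simulate n_rolls of a fair six-sided die; return counts for faces 1-6."""
    rng = random.Random(seed)
    tally = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return [tally[face] for face in range(1, 7)]


def chi_square(counts: list[int]) -> float:
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)


CRITICAL_5DF = 11.07  # chi-square critical value, 5 df, 5% significance level

for seed in (1, 2):  # two independent "students" following the same procedure
    counts = roll_counts(600, seed)
    stat = chi_square(counts)
    verdict = "doubt fairness" if stat > CRITICAL_5DF else "no evidence against fairness"
    print(counts, round(stat, 2), verdict)
```

The two runs will almost always print different tallies and statistics even though the die is the same. That is the sampling variability that part (b) asks students to explain.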
Teaching Notes
We have observed the following when
using the activity in an introductory statistics classroom:
- Students appreciate the hands-on nature of the activity. On a scale of 1 to 5 (1 = strongly disagree, 5 = strongly agree), the mean student response to the statement “The activity was more interesting than solely a lecture on bar charts” was 4.50.
- Giving an example of a shift applied to a short message helps to avoid student confusion.
- Keeping the required computer skills to a minimum allows students to focus on interpreting the bar charts.
- Students will try to match single peaks between the reference and the coded messages. We purposely chose a message whose most frequent letter, once decoded, differs from the most frequent letters in the reference message (e and n, which are tied). A hint to “look for general patterns of peaks and valleys” will usually get students on the right track.
- Closely monitoring each group’s progress is essential. Otherwise, students may proceed far down an incorrect path, especially if they have hypothesized an incorrect shift.
- We have used the same coded message for each group; however, there is no reason why the message, the shift, or both could not be varied from group to group.
- Some computer packages (for example, SPSS for Windows) will not print a label on the horizontal axis of a bar chart for a category that has frequency 0. We recommend that students substitute 0.01 for 0.
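If the shift is varied from group to group, a short script can prepare each group's coded text along with an answer key. This helper is hypothetical, not something the authors describe. It assumes one random shift from 1 to 25 per group, plain text made only of the letters a to z, and the Caesar Shift rule used in the activity.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def caesar_shift(message: str, shift: int) -> str:
    """Shift each letter `shift` places, wrapping from z to a.
    Assumes the message contains only the letters a-z, as in the handout."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + shift) % 26] for ch in message)


def make_group_messages(plain: str, n_groups: int, seed: int = 0) -> dict[int, tuple[int, str]]:
    """Give each group a random shift from 1 to 25 and the matching coded text.
    Returns {group number: (shift, coded text)} as an instructor's answer key."""
    rng = random.Random(seed)
    key = {}
    for group in range(1, n_groups + 1):
        shift = rng.randint(1, 25)  # a shift of 0 would leave the text uncoded
        key[group] = (shift, caesar_shift(plain, shift))
    return key


# Hypothetical plain text; any lowercase message without spaces works here.
for group, (shift, coded) in make_group_messages("meetmeatthelibrary", 3).items():
    print(group, shift, coded)
```

Fixing the seed makes the answer key reproducible if handouts need to be reprinted.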
References
Singh, S. (1999). The Code Book. New York: Doubleday.
Editor's note: Before 11-6-01, the "student's version" of an activity was called the "prototype".