The Multiple-Choice Exam on Creating Multiple-Choice Exams (2)
Here are the answers to the second part of this
guide. The correct answers are shown in highlighted
text. Additional commentary is provided where appropriate.
- In the book “The Hobbit”, the character Bilbo Baggins
functions as the:
- antihero
- antagonist
- focal character
- supporting character
Note that the items are ordered by increasing length, which avoids
giving undue prominence to any one of them.
We could also have used ‘hero’ as the correct answer, but
this (i) gives an item much shorter than the others and (ii) leaves the
word ‘character’ appearing only once in the list (a cue).
- The answer to question 1 is evidenced by the fact that:
- he sets the dwarves up to be captured repeatedly
- he is the least experienced character in the story
- he cheats, lies, and steals throughout the story
- he appears throughout the story in critical ways
It took about 10 attempts to come up with the final list of distractors,
since they needed to (i) relate to the items in question 1 and (ii)
be plausible without being too obvious. Using ‘he is the titular
character of the book’ would clearly be an inadequate choice!
- An ideal gas has a molar mass of 46.0 g/mol. If a 3.00 g sample
of the gas is heated in a rigid 2.00 L container to a temperature
of 400 K, the pressure of the gas in the container will be (R
= 0.08206 L·atm/(mol·K)):
- 0.004 atm
- 1.07 atm
- 16.4 atm
- 252 atm
The calculation requires substitution of n = m/M (mass divided by
molar mass) into PV = nRT with subsequent rearrangement. And yes, I am
quite aware that those aren’t strict SI units!
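For reference, the substitution can be carried out numerically; a minimal Python sketch using the values from the question:

```python
# Pressure from PV = nRT with n = m/M (values taken from the question)
m = 3.00     # sample mass, g
M = 46.0     # molar mass, g/mol
V = 2.00     # container volume, L
T = 400.0    # temperature, K
R = 0.08206  # gas constant, L·atm/(mol·K)

n = m / M          # amount of gas in moles
P = n * R * T / V  # rearranged ideal gas law, pressure in atm

print(round(P, 2))  # → 1.07
```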
- Four identical rigid containers, each containing the same mass
of a different gas, are held at the same temperature. The vessel
with the gas at the highest pressure will be the one containing:
- carbon dioxide (CO2)
- methane (CH4)
- nitrogen (N2)
- krypton (Kr)
- Cannot be determined
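Since mass, volume, and temperature are the same in each vessel, P = nRT/V implies that pressure scales with n = m/M: the lowest molar mass gives the highest pressure. A quick Python check (approximate molar masses in g/mol):

```python
# Same m, V, T in each vessel: P ∝ n = m/M, so the smallest molar mass
# gives the most moles and hence the highest pressure.
molar_mass = {"CO2": 44.0, "CH4": 16.0, "N2": 28.0, "Kr": 83.8}
highest_pressure = min(molar_mass, key=molar_mass.get)
print(highest_pressure)  # → CH4
```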
- The items in question 2 were crafted to correspond to
those in question 1, but were deliberately presented in a different
sequence because:
- the correct answers in MC tests should always be
evenly distributed between the item positions (a – d)
- the correct answer in MC tests should never be
in the same position as the preceding question
- the position of the answer chosen for the first question
should not act as a cue for the paired question
- the items in question 1 are in ascending order, so the next
question should be in descending order
It is not necessary to have an equal number of answers in each position on a
test; neither is it necessary to avoid repeating positions. It is, however, a
good idea to make sure that the same uneven distribution of answer positions
does not occur on every test, as this would give an advantage
to students guessing answers!
- Question 1 was intended to be an easy first question to help students
settle into the test. If the question was appropriate for this
purpose, you would expect its facility value to be:
- < 25%
- 25 – 50%
- 50 – 75%
- > 75%
Actually, I’d hope it was a lot higher than 75% if it
was intended to be an easy question!
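Facility value is just the percentage of the class answering the question correctly, so it can be sketched in a couple of lines (the response data here are hypothetical):

```python
# Facility value (FV): percentage of the whole class answering correctly.
# The response list below is hypothetical.
def facility_value(correct_flags):
    return 100.0 * sum(correct_flags) / len(correct_flags)

responses = [True] * 18 + [False] * 2  # 18 of 20 students correct
print(facility_value(responses))       # → 90.0
```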
- A question that completely discriminated between the
upper and lower quartiles would have a maximum value of:
- DP = 100%
- DP = 75%
- DP = 50%
- DP = 25%
The maximum would be all students in the upper and none in the lower quartile
getting the correct answer. Note that the item order is descending so that
the correct answer is in different positions for questions 6 – 8,
which are otherwise quite similar.
- A question that was answered correctly by all of the upper
and half of the lower quartiles would have a discriminating power of:
- 25%
- 50%
- 75%
- 100%
Note that, if the class size is small, DP could also be calculated from
the top and bottom 30% of the class using the same formula.
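Taking DP as the percentage of the upper quartile answering correctly minus the percentage of the lower quartile doing so, the two cases above work out as:

```python
# Discriminating power (DP): % correct in the upper quartile minus
# % correct in the lower quartile.
def discriminating_power(upper_pct, lower_pct):
    return upper_pct - lower_pct

print(discriminating_power(100, 0))   # → 100 (all upper, none lower: the maximum)
print(discriminating_power(100, 50))  # → 50  (all upper, half lower)
```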
- Question 4 was found to have a facility value of 15% and a discriminating
power of 36%. Given the nature and purpose of this question, you would
immediately:
- check which distractors were chosen most
frequently
- check how many of the lower quartile answered correctly
- conclude that the question was appropriate and/or adequate
- conclude that the question was inappropriate and/or inadequate
Such a low FV would normally be a red flag, especially if a large
fraction of the class selected the same incorrect answer. This might
be indicative of an error in the question, for example, or a mistake in the
instructor’s solution, and so should be subject to careful review. In
this case, we were trying to discriminate between students at a high conceptual
level. The first consideration, then, is whether students in the upper quartile
who answered incorrectly had mostly opted for option (e) (conceptual failure)
or option (a) (procedural error). In general, it is a good idea to compare facility
values and discriminating powers for each item; a distractor chosen by
more of the upper than lower quartile students should again be subject to review
by the instructor.
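The per-item review described above can be sketched as a simple per-option comparison between quartiles; all the counts and the answer key here are hypothetical:

```python
# Distractor analysis: flag any option (other than the key) chosen by more
# upper- than lower-quartile students, for instructor review.
# All counts and the answer key below are hypothetical.
upper = {"a": 6, "b": 1, "c": 0, "d": 1, "e": 2}  # upper-quartile choices
lower = {"a": 2, "b": 3, "c": 1, "d": 2, "e": 2}  # lower-quartile choices
key = "b"                                          # hypothetical correct answer

flagged = [opt for opt in upper
           if opt != key and upper[opt] > lower[opt]]
print(flagged)  # → ['a']
```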
- The best practice for multiple-choice summative
assessment questions is to:
- use common errors as distractors since students will not make these
mistakes if they have really understood the material
- avoid common errors as distractors since cumulative assessments are
stressful and even good students make mistakes
- use common errors as distractors in order to ensure the results
yield the desired grade distribution
- avoid common errors as distractors in order to avoid students
appealing their grades after the exam
This is another example of a poor multiple-choice question: phrasing such as
“the best answer” combined with items that are all true is functionally
equivalent to saying, “read the instructor’s mind.” Even when
the distractors genuinely represent increased degrees of ‘correctness’,
the validity of such questions will be highly dependent on the clarity
and effectiveness of the instruction delivered, particularly whether there
was clear instruction on the need to make what can be quite subtle distinctions.
Items (a) and (b) are opposite sides of the same argument, but context is important
here. If, for example, there is a clear proficiency requirement
(such as calculating the correct dose of a potentially dangerous drug based on patient
weight and drug formulation) then item (a) might very well be the best approach.
Absent such a requirement, I personally would opt for (b) in summative assessment.
Items (c) and (d) are, arguably, bad choices. Option (c) in particular should
not be a consideration if the overall assessment is well-constructed
and valid. Unless you are dealing with extremely large numbers of students, it
would be quite unwise to assume that successive classes always yield identical
spreads of learning outcomes!