The Multiple-Choice Exam on Creating Multiple-Choice Exams (2)

Here are the answers to the second part of this guide. The correct answers are shown in bold. Additional commentary is provided where appropriate.

  1. In the book “The Hobbit”, the character Bilbo Baggins functions as the:
    a. antihero
    b. antagonist
    c. **focal character**
    d. supporting character
    Note that the items are ordered by increasing length, which avoids giving undue prominence to any one of them. We could also have used ‘hero’ as the correct answer, but this (i) gives an item much shorter than the others and (ii) leaves the word ‘character’ appearing only once in the list (a cue).
  2. The answer to question 1 is evidenced by the fact that:
    a. he sets the dwarves up to be captured repeatedly
    b. he is the least experienced character in the story
    c. he cheats, lies, and steals throughout the story
    d. **he appears throughout the story in critical ways**
    It took about 10 attempts to come up with the final list of distractors, since they needed to (i) relate to the items in question 1 and (ii) be plausible without being too obvious. Using ‘he is the titular character of the book’ would clearly be an inadequate choice!
  3. An ideal gas has a molar mass of 46.0 g/mol. If a 3.00 g sample of the gas is heated in a rigid 2.00 L container to a temperature of 400 K, the pressure of the gas in the container will be (R = 0.08206 L·atm/(mol·K)):
    a. 0.004 atm
    b. **1.07 atm**
    c. 16.4 atm
    d. 252 atm
    The calculation requires substitution of n = m/M (mass divided by molar mass) into PV = nRT, followed by rearrangement for P. And yes, I am quite aware that those aren’t strict SI units!
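    For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation in Python, using only the values given in the question:

    ```python
    # Question 3: ideal gas law PV = nRT, with n = m/M.
    m = 3.00     # sample mass in g
    M = 46.0     # molar mass in g/mol
    V = 2.00     # container volume in L
    T = 400.0    # temperature in K
    R = 0.08206  # gas constant in L·atm/(mol·K)

    n = m / M          # amount of gas in mol (about 0.0652 mol)
    P = n * R * T / V  # rearranged for pressure, in atm

    print(f"P = {P:.2f} atm")  # P = 1.07 atm, i.e. answer (b)
    ```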
  4. Four identical rigid containers, each containing the same mass of a different gas, are held at the same temperature. The vessel with the gas at the highest pressure will be the one containing:
    a. carbon dioxide (CO2)
    b. **methane (CH4)**
    c. nitrogen (N2)
    d. krypton (Kr)
    e. Cannot be determined
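    At fixed mass, volume, and temperature, P = nRT/V is proportional to n = m/M, so the gas with the lowest molar mass exerts the highest pressure. Here is a minimal check in Python (the approximate molar masses are my addition, not part of the question):

    ```python
    # Question 4: equal masses at equal V and T, so P is proportional to
    # n = m/M; the smallest molar mass means the most moles and hence
    # the highest pressure. Approximate molar masses in g/mol:
    molar_mass = {"CO2": 44.0, "CH4": 16.0, "N2": 28.0, "Kr": 83.8}

    # Moles per gram of each gas; the largest value wins.
    moles_per_gram = {gas: 1.0 / M for gas, M in molar_mass.items()}
    print(max(moles_per_gram, key=moles_per_gram.get))  # CH4, i.e. answer (b)
    ```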
  5. The items in question 2 were crafted to correspond to those in question 1, but were deliberately presented in a different sequence because:
    a. the correct answers in MC tests should always be evenly distributed among the item positions (a – d)
    b. the correct answer in MC tests should never be in the same position as in the preceding question
    c. **the position of the answer chosen for the first question should not act as a cue for the paired question**
    d. the items in question 1 are in ascending order, so the next question should be in descending order
    It is not necessary to have an equal number of answers in each position on a test; neither is it necessary to avoid repeating positions. It is, however, a good idea to make sure that the same uneven distribution of answer positions does not occur on every test, as this would give an advantage to students guessing answers!
  6. Question 1 was intended to be an easy first question to help students settle into the test. If the question was appropriate for this purpose, you would expect its facility value to be:
    a. < 25%
    b. 25 – 50%
    c. 50 – 75%
    d. **> 75%**
    Actually, I’d hope it was a lot higher than 75% if it was intended to be an easy question!
  7. A question that completely discriminated between the upper and lower quartiles would have a maximum value of:
    a. **DP = 100%**
    b. DP = 75%
    c. DP = 50%
    d. DP = 25%
    The maximum would be all students in the upper and none in the lower quartile getting the correct answer. Note that the item order is descending so that the correct answer is in different positions for questions 6 – 8, which are otherwise quite similar.
  8. A question that was answered correctly by all of the upper and half of the lower quartiles would have a discriminating power of:
    a. 25%
    b. **50%**
    c. 75%
    d. 100%
    Note that, if the class size is small, DP can instead be calculated from the top and bottom 30% of the class, using the same formula.
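    Since questions 6 – 8 all turn on these two statistics, here is a minimal sketch of both calculations in Python, assuming the usual definitions (facility value: the fraction of the whole class answering correctly; discriminating power: the fraction correct in the upper group minus the fraction correct in the lower group); all class and group sizes here are illustrative:

    ```python
    def facility_value(num_correct: int, class_size: int) -> float:
        """Facility value (FV) as a percentage of the whole class."""
        return 100.0 * num_correct / class_size

    def discriminating_power(upper_correct: int, lower_correct: int,
                             group_size: int) -> float:
        """DP as a percentage, for equal-sized upper and lower groups."""
        return 100.0 * (upper_correct - lower_correct) / group_size

    # Question 7: all of the upper group and none of the lower group
    # answer correctly (complete discrimination).
    print(discriminating_power(10, 0, 10))  # 100.0 -> answer (a)

    # Question 8: all of the upper group and half of the lower group.
    print(discriminating_power(10, 5, 10))  # 50.0 -> answer (b)

    # Question 6: an 'easy' question should have a high FV,
    # e.g. 55 correct answers in a class of 60.
    print(facility_value(55, 60))  # about 91.7, i.e. > 75% -> answer (d)
    ```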
  9. Question 4 was found to have a facility value of 15% and a discriminating power of 36%. Given the nature and purpose of this question, you would immediately:
    a. **check which distractors were chosen most frequently**
    b. check how many of the lower quartile answered correctly
    c. conclude that the question was appropriate and/or adequate
    d. conclude that the question was inappropriate and/or inadequate
    Such a low FV would normally be a red flag, especially if a large fraction of the class selected the same incorrect answer. This might be indicative of an error in the question, for example, or a mistake in the instructor’s solution, and so should be subject to careful review. In this case, we were trying to discriminate between students at a high conceptual level. The first consideration then is whether students in the upper quartile answering incorrectly had mostly opted for answer (e) (conceptual failure) or option (a) (error in procedure). In general, it is a good idea to compare facility values and discriminating powers for each item; a distractor chosen by more of the upper than lower quartile students should again be subject to review by the instructor.
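    As a concrete illustration of that first check, here is a minimal sketch of a distractor tally in Python; the answer letters and counts are hypothetical placeholders, not real class data:

    ```python
    from collections import Counter

    # One entry per upper-quartile student on question 4; purely
    # illustrative data, not real results.
    upper_quartile_answers = ["b", "e", "e", "a", "e", "b", "a", "e", "e", "e"]

    print(Counter(upper_quartile_answers).most_common())
    # [('e', 6), ('b', 2), ('a', 2)]: a pile-up on (e) would point to a
    # conceptual failure, while a pile-up on (a) would suggest an error
    # in procedure, as discussed above.
    ```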
  10. The best practice for multiple-choice summative assessment questions is to:
    a. use common errors as distractors since students will not make these mistakes if they have really understood the material
    b. **avoid common errors as distractors since cumulative assessments are stressful and even good students make mistakes**
    c. use common errors as distractors in order to ensure the results yield the desired grade distribution
    d. avoid common errors as distractors in order to avoid students appealing their grades after the exam
    This is another example of a poor multiple-choice question: phrasing such as “the best answer” combined with items that are all true is functionally equivalent to saying, “read the instructor’s mind.” Even when the distractors genuinely represent increasing degrees of ‘correctness’, the validity of such questions will be highly dependent on the clarity and effectiveness of the instruction delivered, and particularly on whether there was clear instruction on the need to make what can be quite subtle distinctions.

    Items (a) and (b) are opposite sides of the same argument, but context is important here. If, for example, there is a clear proficiency requirement (such as calculating the correct dose of a potentially dangerous drug based on patient weight and drug formulation) then item (a) might very well be the best approach. Absent such a requirement, I personally would opt for (b) in summative assessment. Items (c) and (d) are, arguably, bad choices. Option (c) in particular should not be a consideration if the overall assessment is well-constructed and valid. Unless you are dealing with extremely large numbers of students, it would be quite unwise to assume that successive classes always yield identical spreads of learning outcomes!