Congratulations on completing part one! We will now dive a little deeper into the art of writing multiple-choice exams. As before, record your answers to the questions as you work through the material; you'll be able to check them later.
One common criticism of multiple-choice exams is that they can only test factual recall or simple calculation skills. This is not actually true, although it reflects how such exams are commonly used. There are two key problems to overcome when writing conceptual MC questions:
In the context of assessment, validity is the degree to which the assessment (such as an MC test) is both adequate and appropriate for its intended purpose. This includes obvious tests of validity, such as each MC question having only one correct answer and being free from ambiguity. Equally important, however, is that the assessment should actually measure what it sets out to. Thus, a conceptual MC question that effectively tested students’ language skills rather than their understanding of the subject would be considered inadequate for its purpose.
We have already seen some examples of questions that test a student’s ability to take tests: the use of negation, double negatives, and illogical sequencing of items would all be considered inappropriate, unless the exam was explicitly a test of logical reasoning skills (and even then...). Similarly, questions on content not yet covered, questions that require information the student could not reasonably be expected to know, and questions that use idiomatic language all undermine the validity of the test.
Another issue relating to validity is what the test (and its results) will be used for. For example, MC questions commonly form the basis of concept inventories used in diagnostic assessment, either as a research tool or for placement purposes; an inventory that failed to reflect learning gains or distinguish between students of different ability would again be considered inadequate for purpose.
To illustrate some of the potential problems in creating conceptual MC tests, and ways to address them, we will now look at two different approaches: the use of paired questions and case studies.
Suppose, for example, that you wanted to set a conceptual exam for a literature course by having the students analyse one of the assigned texts. The first question might be:
Some students, anticipating the types of questions you might ask or having prepared detailed summaries of each text studied, may simply have memorized lists of characters and their roles. As such, the question does not test conceptual understanding alone. One way around this is to pair questions so that concept and reasoning are probed together. In this case, the next question might be:
Another example might be designed to distinguish between students who perform calculations solely by memorized procedures, and those who understand the concepts underlying the same calculation:
In this example, question 4 cannot be answered by direct calculation, since insufficient information is provided. Instead, students have to reason from first principles that the gas with the lowest molar mass will yield the greatest number of moles and, therefore, the highest pressure. The reason for adding “Cannot be determined” is that, without it, students unable to make the conceptual connection would be forced to guess; including this option enables such students to be identified and provided with extra help.
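To make the contrast concrete, here is a short sketch of the underlying reasoning. The gases, masses, and conditions below are hypothetical, since the actual values from questions 3 and 4 are not reproduced on this page; the point is only that, for equal masses at fixed temperature and volume, the ideal gas law makes pressure vary inversely with molar mass:

```python
# Illustrative sketch of the reasoning behind question 4 (hypothetical values).
# For equal masses of gas in identical containers at the same temperature,
# p = nRT/V implies that the gas with the lowest molar mass yields the most
# moles and hence the highest pressure.
R = 8.314       # gas constant, J/(mol*K)
T = 298.15      # temperature, K (assumed)
V = 0.010       # container volume, m^3 (assumed)
mass = 1.0      # equal mass of each gas, kg (assumed)

molar_masses = {"H2": 0.002016, "He": 0.004003, "N2": 0.028014, "CO2": 0.044010}  # kg/mol

for gas, M in molar_masses.items():
    n = mass / M            # moles of gas; inverting this ratio is the common error
    p = n * R * T / V       # pressure, Pa
    print(f"{gas}: n = {n:7.1f} mol, p = {p / 1e6:6.2f} MPa")
```

Running this confirms the deduction: H2, with the lowest molar mass, produces the largest number of moles and the highest pressure.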
You could criticise this pairing, since the calculation for question 3 is effectively a cue for the deduction in question 4. This could be circumvented by providing the number of moles (instead of the mass and molar mass) in question 3. One way to decide between these options would be to determine the facility value, FV, for each question – this is simply the fraction of students choosing the correct answer. If the FV for question 4 was lower when paired with the simpler form of question 3 than with the original form shown above, then one might well conclude that students were being provided with a significant hint by the original calculation.
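Computing FV from raw response data is straightforward; the sketch below assumes a hypothetical layout in which each question maps to a list of per-student answers, and is not tied to any particular testing platform:

```python
# Minimal sketch of a facility value (FV) calculation.
# `answer_key` gives the correct option for each question (hypothetical data).
answer_key = {"Q3": "B", "Q4": "C"}
responses = {
    "Q3": ["B", "B", "A", "B", "D", "B", "C", "B"],
    "Q4": ["C", "A", "C", "E", "C", "A", "C", "B"],
}

def facility_value(answers, correct):
    """Fraction of students choosing the correct answer."""
    return sum(a == correct for a in answers) / len(answers)

for q, answers in responses.items():
    print(f"{q}: FV = {facility_value(answers, answer_key[q]):.2f}")
```

Comparing the FV of question 4 across cohorts given the two forms of question 3 would then reveal whether the original calculation acts as a hint.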
Another approach to writing conceptual MC questions is to use case studies. Typically, these might describe scenarios similar to ones used in class; alternatively, they might present entirely new situations that must be analysed using the skills taught in class rather than any specific prior knowledge. In fact, this whole page can serve as an example of the case-study approach to conceptual multiple-choice testing. Given this information, attempt to answer the following questions:
Another metric that can be employed to assess an MC question is its discriminating power, DP. Students are first ranked by their total score, and the answers from the upper and lower quartiles are separated out. For each question, DP is then calculated as the difference between the number of students answering correctly in the upper and lower quartiles, divided by half the total number of students in those two quartiles:
DP(%) = 200 × (NCUQ − NCLQ) / (NTUQ + NTLQ)

where NCUQ and NCLQ are the numbers of students answering correctly in the upper and lower quartiles, respectively, and NTUQ and NTLQ are the total numbers of students in those quartiles.
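A minimal sketch of this calculation, assuming the response data is available as (total score, correct-on-this-question) pairs for each student (a hypothetical layout), might look like this:

```python
# Sketch of a discriminating-power (DP) calculation for one question.
def discriminating_power(scores):
    """scores: list of (total_score, 1 if this question answered correctly else 0)."""
    ranked = sorted(scores, key=lambda s: s[0], reverse=True)
    q = max(1, len(ranked) // 4)              # quartile size
    upper, lower = ranked[:q], ranked[-q:]
    nc_uq = sum(correct for _, correct in upper)
    nc_lq = sum(correct for _, correct in lower)
    return 200 * (nc_uq - nc_lq) / (len(upper) + len(lower))

# Example: 8 students, (total score, correct on this question)
students = [(95, 1), (88, 1), (82, 1), (75, 0), (70, 1), (64, 0), (55, 0), (40, 0)]
print(f"DP = {discriminating_power(students):.0f}%")  # high DP: the question discriminates well
```

A strongly positive DP indicates that the question separates stronger from weaker students; a DP near zero (or negative) suggests the question should be revised or discarded.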
The distractors in question 3 were deliberately chosen as the results of common calculation errors, such as inverting the terms in the conversion of mass to moles. In this way, the question functions as a diagnostic assessment of the students’ calculation skills, just as question 4 is diagnostic of their conceptual understanding.
The same question can also be used for formative assessment, since a student can then be provided with highly specific feedback about their mistakes and allowed to try again; this works particularly well in an electronic test environment.
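In an electronic environment, this can be as simple as mapping each option to a targeted feedback message. The option letters and messages below are hypothetical, not taken from the actual question 3:

```python
# Sketch of distractor-specific feedback for formative use in an electronic test.
# Each distractor encodes one known error (hypothetical options and messages).
feedback = {
    "A": "Check your conversion: moles = mass / molar mass, not the inverse.",
    "B": "Correct!",
    "C": "You may have forgotten to convert the temperature to kelvin.",
    "D": "Check that the units of your volume match those of the gas constant.",
}

def give_feedback(choice):
    return feedback.get(choice, "Unrecognised option.")

print(give_feedback("A"))  # student sees targeted feedback and can try again
```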
Such questions can also be used for summative assessment, where cumulative knowledge and understanding are tested at the end of a unit or course. This can, however, lead to complaints that instructors are deliberately setting unfair “trick questions”.
Recorded your answers somewhere? Then proceed to the next page to check your score...