Based on the book Developing and Using Tests Effectively by Jacobs and Chase (1992).
Tip One - The extent to which students engage in deep or surface learning is tied
directly to their perceptions of what they will be tested on.
- The first test sets the tone and expectations for the student.
- The earlier in the semester the first test is given, the better.
Tip Two - Test length is crucial to the reliability of the test.
Factors to consider:
- Time available—fifty minutes or seventy-five minutes
- Type of questions being asked (approximate time per item)
True-false: 30 sec.
Multiple choice: 1 min.
Multiple choice requiring higher-level thinking: 90 sec.
Completion: 1 min.
Short answer: 2 min.
Matching: 30 sec.
Short essay: 10-15 min.
Extended essay: 30 min.
- Other Time Considerations
A fifty-minute test can hold about 40-45 multiple-choice items or 60-80
true-false items.
The fastest student will typically finish a test in about half the time the
slowest student needs.
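As a rough planning aid, the per-item times above can be turned into a time budget. The sketch below uses a hypothetical helper, `estimate_minutes` (not from the book), that sums those approximate times for a planned mix of items. It assumes the upper end of the short-essay range and treats the figures as averages, so some slack for slower students is still advisable.

```python
# Approximate minutes per item, taken from the list above.
# Short essays use the upper end of the 10-15 minute range (an assumption).
MINUTES_PER_ITEM = {
    "true_false": 0.5,
    "multiple_choice": 1.0,
    "multiple_choice_higher_level": 1.5,
    "completion": 1.0,
    "short_answer": 2.0,
    "matching": 0.5,
    "short_essay": 15.0,
    "extended_essay": 30.0,
}


def estimate_minutes(item_counts: dict[str, int]) -> float:
    """Estimate how long a typical student needs for a planned mix of items."""
    return sum(MINUTES_PER_ITEM[kind] * count for kind, count in item_counts.items())


def fits(item_counts: dict[str, int], period_minutes: float) -> bool:
    """Return True if the estimated time fits within the class period."""
    return estimate_minutes(item_counts) <= period_minutes


if __name__ == "__main__":
    plan = {"multiple_choice": 30, "true_false": 20, "short_answer": 3}
    print(estimate_minutes(plan))  # 30 + 10 + 6 = 46.0
    print(fits(plan, 50))          # True
```

The same arithmetic is consistent with the item counts above: 40 multiple-choice items at about one minute each, or 80 true-false items at about 30 seconds each, come to roughly 40 minutes.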
Tip Three - Develop Tests that Have a High Level of Reliability and Validity
Factors that Influence Reliability
- The length of the test
- The time limits for the test
- The nature of the student group (e.g., if the group is quite homogeneous,
reliability will be lower than if the group is fairly heterogeneous)
- The difficulty of the test items (50-70% of the students should be able to
answer each item correctly)
- A common set of instructions
- A common environment in which to attempt the test
- The scoring procedure of the test
How to Improve Test Reliability
- Tests should be long enough to sample the content well
- Time limit should allow most students to finish
- The score range should be wide, with items in the middle range of difficulty
- Items should be free of ambiguity and tricks
- Directions should be clear and concise
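The link between test length and reliability is commonly quantified with the Spearman-Brown prophecy formula, a standard psychometric result that the text itself does not cite. It predicts the reliability of a test lengthened by a factor k with comparable items: r_new = k·r / (1 + (k - 1)·r). The sketch below is a minimal illustration under that formula's assumption that added items are of the same quality as the existing ones; the function names are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened (or shortened) by length_factor.

    Assumes the added or removed items are parallel to the existing ones.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


def length_factor_needed(current: float, target: float) -> float:
    """Factor by which to lengthen a test to move its reliability from current to target."""
    if not (0 < current < 1 and 0 < target < 1):
        raise ValueError("reliabilities must be strictly between 0 and 1")
    return target * (1 - current) / (current * (1 - target))


if __name__ == "__main__":
    # Doubling a test whose reliability is 0.60 (e.g. 20 -> 40 items):
    print(round(spearman_brown(0.60, 2), 2))  # 0.75
    # How many times longer must it be to reach 0.80?
    print(round(length_factor_needed(0.60, 0.80), 2))  # 2.67
```

The diminishing returns are the practical point: doubling a modestly reliable test helps noticeably, but each further gain costs proportionally more items and time.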
What about Essay Tests?
- The biggest issue is the consistency of the person scoring the answers.
- Write questions that are not too wide in scope.
- Develop a prescribed scoring method, such as:
- A checklist of the points that need to be in the answer and their point values
- A written response to each question by the instructor, reviewed before
reading the student answers
- Reading all of the answers to one question across all tests, so you focus
on one answer at a time
Factors that Affect Validity
- Directions are not clear
- The test requires levels of skill that are not part of the course
objectives (teaching at one level and testing at another)
- Test items are poorly written
- Test length does not allow for adequate sampling of content
- Complex or subjective scoring ranks some students inaccurately:
- If the scoring process has many steps, there are many opportunities for
mistakes.
- If scoring is subjective, it is easily influenced by factors that are not
part of the teaching objectives.
Improving Validity
- Have someone else read your test for clarity of directions and questions
- Use a test matrix to check that the balance of questions matches what was
taught
- Develop test questions the day you teach the material
- Ask enough questions to adequately cover the material that was taught
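A test matrix can be kept as a simple table of topics against the number of items written for each. The sketch below shows one hypothetical way to check balance: it compares each topic's share of test items with its share of class time and flags topics that differ by more than a chosen tolerance. The function name, the use of class hours as the measure of "what was taught", and the 10-percentage-point tolerance are illustrative assumptions, not figures from the book.

```python
def check_balance(class_hours: dict[str, float], item_counts: dict[str, int],
                  tolerance: float = 0.10) -> dict[str, float]:
    """Flag topics whose share of test items differs from their share of class time.

    Returns {topic: share of items - share of class time}, rounded to two places,
    for topics outside the tolerance. Positive means over-tested; topics tested
    but never taught count as having zero class time.
    """
    total_hours = sum(class_hours.values())
    total_items = sum(item_counts.values())
    flagged = {}
    for topic in sorted(class_hours.keys() | item_counts.keys()):
        taught_share = class_hours.get(topic, 0) / total_hours
        tested_share = item_counts.get(topic, 0) / total_items
        gap = tested_share - taught_share
        if abs(gap) > tolerance:
            flagged[topic] = round(gap, 2)
    return flagged


if __name__ == "__main__":
    hours = {"reliability": 3, "validity": 3, "item writing": 6}  # hypothetical
    items = {"reliability": 20, "validity": 5, "item writing": 15}
    print(check_balance(hours, items))
```

A topic that got a quarter of the class time but half of the questions shows up with a positive gap, which is a cue to rebalance before the test is printed.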
Tip Four - Tips for Writing Multiple Choice Questions
- You need a quality stem: one that students can read and use to formulate a
tentative answer even before reading the answer options.
- A stem may be an incomplete sentence
- The stem should be in the simplest form consistent with precision and
clarity.
- Always present a verb in the stem
- Do not pad the stem with superfluous material; this only adds to students’
reading time.
- State whether you want students to find the correct answer or the best
answer. If it is to be the correct answer, it must be correct beyond any
question.
Writing Distractors
- Write distractors that are plausible enough to attract students who do not
know the material very well.
- If you can’t develop a sufficient number of plausible distractors, do not
use the question.
- Humor in the answer options is usually just a giveaway to the students;
it cues them to ignore that option.
Example
The founder of Ferris State University was
- Ferris Bueller
- Ferris Wheel
- Woodbridge Ferris
Make the distractors fairly homogeneous. This will increase the need for
the students to be discriminating in their choices.
Avoid giving irrelevant clues to the students. You want to measure their
content knowledge and cognitive skills, not their test-taking skills.
Examples of irrelevant clues
Length clues: the longest answer is often the correct answer.
Verbal association: a word in the stem also appears in the correct answer.
Grammatical clues
Example Grammar Clue
The coefficient of correlation…social studies test is called a
- Validity coefficient
- Index of reliability
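- Equivalence coefficient
The article "a" at the end of the stem points to the only option that begins
with a consonant, Validity coefficient.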
Specific determiners: modifying words or phrases that limit the meaning of a
sentence
Example
all, never, always (associated with incorrect answers)
usually, typically, maybe, sometimes (associated with correct answers)
Positives and Negatives
- Use positive statements if possible.
- Negative statements can be confusing for students to interpret.
- If you use negative wording, call attention to it by underlining it.
All of the Above/None of the Above
- Use options like "all of the above" or "none of the above" rarely.
- These distractors are generally too easy.
- If even one of the answer choices is recognized as incorrect, the student
also knows that "all of the above" is incorrect.
- "All of the above" may be appropriate if the instructor is trying to
determine whether students have learned all of the relevant characteristics or
attributes of a phenomenon.
- "None of the above" is especially difficult to use when you ask students to
find the correct answer, because it may be easy to argue that at least one of
the answers was correct in some way.
Additional Considerations when Writing Questions
- Item independence. Getting the correct answer to one item should not be
contingent upon getting the correct response to other items.
- Avoid letting one answer provide a clue to another answer.
- Arrange the options (answers) in a logical order (alphabetical order, or
ascending order for numbers).
- The correct response (a, b, c, or d) should be divided equally among the
answer positions.
- If the items are controversial, cite the authority whose opinion is being
used: "According to my lecture…" or "In Freud’s opinion…"
- Avoid lifting stems verbatim from the text; this encourages students to
memorize rather than fully understand the material.
- Arrange the answer options in vertical columns. This makes reading easier
and less confusing.
Summary Checklist for Writing Multiple Choice Items
- Make sure the item measures significant concepts and principles: do not
write items covering trivia.
- The stem should present a problem; thus, a verb is necessary.
- State the item clearly and concisely, using only relevant material.
- Include as much of the item material in the stem as possible; do not repeat
words or phrases in each distractor that could be put in the stem.
- Write one correct or clearly best answer and three or four plausible
distractors.
- Avoid giving clues to the right answer; some common clues are
grammatical, some involve length of the options, and some use specific
determiners.
Tip Five - Writing True-False Test Questions
Positives For Using True-False
- True-false can sample many more bits of information in a given time period
than any other type of test format.
- The greater number of questions increases the reliability of the test.
- True-false tests can be less reliable than multiple choice unless the
number of questions asked (90 questions in a fifty-minute period) is high.
- Research does indicate that true-false testing is sufficiently reliable and
valid for periodic in-class testing.
Negatives of True-False
- It can be difficult to write true-false questions that avoid ambiguous
statements without making the items obvious.
- Writing true or false statements that have no exceptions is problematic.
- Students can guess (a 50-50 chance on each item).
- Students can also make educated guesses that raise their odds beyond 50-50
without knowing the answer outright.
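The effect of blind guessing, and why more items help, can be quantified with the binomial distribution: on an n-item true-false test, the number of ways to get exactly k items right is C(n, k) out of 2^n equally likely answer patterns. The sketch below (the function name is hypothetical) computes the chance of reaching a given score by guessing alone. It assumes independent 50-50 guesses on every item and ignores the educated guessing described above, which would raise these odds.

```python
from math import comb


def chance_by_guessing(n_items: int, min_correct: int) -> float:
    """Probability of getting at least min_correct of n_items right by blind 50-50 guessing."""
    if not 0 <= min_correct <= n_items:
        raise ValueError("min_correct must be between 0 and n_items")
    favorable = sum(comb(n_items, k) for k in range(min_correct, n_items + 1))
    return favorable / 2**n_items


if __name__ == "__main__":
    # Chance of reaching 70% correct by guessing alone, for tests of different lengths.
    for n_items, min_correct in [(10, 7), (20, 14), (50, 35), (90, 63)]:
        print(n_items, f"{chance_by_guessing(n_items, min_correct):.4f}")
```

On a 10-item test, the chance of scoring 70% or better by guessing is 176/1024, about 17%; on 20 items it falls to about 6%, and it keeps shrinking as the test grows.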
Important Steps in Writing True-False Questions
- Make it clear where the answers are to be placed and what sign (T or F) or
word is to be used. Avoid using plus (+) and minus (-) signs, as a minus can
easily be made into a plus.
- Avoid the use of specific determiners (all, never, always). These signal
false answers.
- Avoid the use of qualifying terms (sometimes, usually, typically). These
signal true answers.
- Avoid the use of indefinite terms denoting degree or amount (a long time ago,
a very large part). These are ambiguous and turn the answer into a debate.
- Don't leave questions open to interpretation.
Example:
POORLY WRITTEN: In his study of AIDS, Dr. Wye found that many of those who
contracted HIV were exposed through the use of drug needles that had been used
by an infected person.
BETTER VERSION: In his study of AIDS, Dr. Wye found that many (over 20 percent)
…infected person.
The goal is to assess students’ knowledge, not their ability to interpret
complex sentences.
Use of Compound Sentences
Compound sentences can be used by stating a condition first, followed by an
explanation.
Example
Because the combustion of gasoline creates gases that pollute the air, cars
produce more pollutants at fifty miles per hour than at thirty miles per hour.
This form of question can test students at a higher level of thinking.
Using True-False to Ask Higher Level Thinking Questions
Use propositional logic: the "if-then" approach.
Example:
Under the current monetary policy of the Federal Reserve Bank, the prime rate is
.09 and the inflation rate is .04. The gross national product is down .03, and
the unemployment rate is 7 percent. A slowdown in the economy is taking place.
- True-False: If the Federal Reserve reduces the prime rate, the inflation
rate is expected to rise.
- True-False: If the gross national product goes up and the other indicators
stay the same, the Dow Jones average will probably respond by going up.
- This type of questioning allows for writing several T-F questions related
to the same situation or proposition.
Problem Solving Approach
Example
Last night John bought a used car. This morning it would not start. John
begins to search for the possible causes of the car’s failure to start. Decide
whether each statement is or is not a plausible reason for the car not starting.
- T-F The carburetor may be malfunctioning
- T-F The exhaust manifold may be loose
- T-F The battery may be discharged
- T-F The car may be out of gasoline
Use Multiple True-False Items
Example:
The Boston Tea Party (1773) was
- T-F Actually carried out by Indians
- T-F Planned as a revolt against taxes
- T-F Done because the tea market in America was overstocked and prices were
falling
Tip Six - Short Answer or Completion Items
- This form of testing demands recall rather than recognition of
information.
- This form does not lend itself to testing higher level thinking very well.
- It is best used to see how well students have collected basic information
pertinent to the course
Writing Items
Write completion items that can be answered in a single word (if at all
possible). This makes scoring faster and less subjective.
Example:
- The density of a fluid is measured with an instrument called the
_____________? (One word)
- The hydrometer is used to measure____________.
The second question will likely have a multiple word answer that may require
interpretation and will definitely take more time.
Word each statement so that it has only one right answer.
Example POORLY WRITTEN: The Battle of Lexington was fought in
__________?
Example CLEARLY WRITTEN: The Battle of Lexington was fought in the year
_______?
The first example could have many answers (a year, a state, a season).
- Delete only key words from the statement. No tricks.
- Do not lift statements directly from the textbook as this encourages
memorizing, not understanding.
- Make all the blanks the same length.
To make scoring easy, use the following model, with numbered blanks in the
statement and the answer spaces in a column:
The quality of a test that deals with consistency is called (1)      1. ________
and the quality that deals with the extent to which a test relates
to a criterion is called (2).                                        2. ________
Tip Seven - Tests Using Matching Items
To reduce random errors, place the stimulus column on the left, with each item
numbered, and the response column on the right, with each item lettered.
Provide spaces for students to write their responses to the left of the
stimuli.
Each matching exercise should contain only homogeneous material.
Heterogeneous material simply makes the test easier.
Example
_____ 1. George Washington     a. Revolutionary War hero
_____ 2. John Hancock          b. Signer of the Declaration of Independence
_____ 3. Virginia              c. One of the original colonies
Put the stimulus items in alphabetical order. This makes stimuli easier to
find and saves students time.
The best number of items is between 10 and 15.
The entire matching exercise should be on one page to keep students from
having to flip back and forth to look for stimuli.
The response column should contain 5 more items than the stimulus column to
produce better discrimination on the part of the students.
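One way to see why extra responses help: if a student assigns every stimulus a different response completely at random, each stimulus has a 1-in-m chance of landing on its correct response when there are m responses, so the expected number of lucky matches is n/m for n stimuli. That fully random student is a simplifying assumption (real students guess only on the items they don't know). The hypothetical helpers below compute the exact value and check it with a quick simulation.

```python
import random
from fractions import Fraction


def expected_lucky_matches(n_stimuli: int, n_responses: int) -> Fraction:
    """Expected correct matches when every stimulus gets a distinct random response."""
    if n_responses < n_stimuli:
        raise ValueError("need at least as many responses as stimuli")
    return Fraction(n_stimuli, n_responses)


def simulate_lucky_matches(n_stimuli: int, n_responses: int,
                           trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Stimulus i's correct response is response i; the rest are extras.
        picks = rng.sample(range(n_responses), n_stimuli)
        total += sum(1 for i, pick in enumerate(picks) if pick == i)
    return total / trials


if __name__ == "__main__":
    for responses in (10, 15):
        exact = expected_lucky_matches(10, responses)
        print(responses, float(exact), round(simulate_lucky_matches(10, responses), 2))
```

With 10 stimuli, equal columns give an expected 1 lucky match; adding 5 extra responses lowers that to 2/3, and students can no longer fill in the last item by elimination.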
Tip Eight - Writing Essay Tests
Advantages of Using Essay Tests
- They are most advantageous when assessing complex learning outcomes.
- They are relatively easy to construct.
- They emphasize communication skills, a fundamental performance in all
complex academic disciplines.
- They cannot be answered by simply recognizing the correct response.
- They do not permit guessing (although students will bluff).
- Essay tests enable instructors to see how students select, organize, and
evaluate ideas and apply them to answering the question.
- Essays are not, however, an efficient way to get at factual matter,
associative learning, and other lower-level cognitive objectives.
- A well-constructed test will sample a wide range of course objectives at
varying levels of the cognitive functions taught in class.
Limitations of Essay Tests
- They are difficult to score.
- Their scores are less reliable than those of well-written objective tests.
- They provide a very limited sample of the content in the typical unit of
study.
- The score is influenced by the reader’s overall impression of the student.
- They do not provide a good situation in which to develop good writing
skills.
Reliability Concerns
- They are somewhat less reliable than objective tests.
- Studies show that factors like time of day, number of papers being read, mood
of the reader, and where the paper falls in the stack can all change the grade
on the test.
- The paper read just before a student’s paper can greatly influence the
outcome of the grading process (both good and bad).
- A reader reading the same paper a second time is unlikely to give it the
same grade.
- The expectations an instructor has for a student’s performance influence
scoring.
- Physical elements of the paper (handwriting, erasures, crossing out
material, writing style) can affect a reader’s view of the paper.
- In a study at Arizona University comparing essays that were first
handwritten and then retyped, the typed essays scored almost one full grade
higher.
- The use of only a few selected topics increases the possibility that
students may get very high, or very low, scores by the luck of the topic draw.
- There are no data to support the idea that students do better on essay tests
than on objective tests.
Making Essay Tests Better
- Restrict essays to assessing outcomes that require complex, higher-level
cognitive functions.
Examples
- Compare and contrast X and Y with regard to given qualities.
- Present arguments for and against a given issue.
- Illustrate how a principle explains facts.
- Illustrate cause and effect.
- Describe an application of a rule or principle.
- Evaluate the adequacy, relevance, or implications of an arrangement,
materials, and so on.
- Form new inferences from data.
- Organize the parts of a situation, event, or mechanism and show how they
interrelate into a whole.
- Sort out the relevant parts as distinct entities from a total situation,
event, or mechanism.
- Limit the breadth of the essay question.
- It should be tied to a single objective.
- If the question is too broad it cannot be answered in a short time period
and grading it becomes very difficult.
Example POORLY WRITTEN: What were the conditions that led up to the Civil
War?
- All students should be asked to respond to the same set of test items.
- Giving students choices, although it appears fairer, actually creates
dozens of different tests, makes comparisons impossible, and does not allow
for a common grading scale.
- Grammar and spelling should only be taken into account in the grading if
they are being taught as an objective in the course.
- Directions need to be crystal clear and should state what type of writing is
being sought (outlines, complete prose, lists).
- The question should lead the student toward the answer that the
instructor wants.
Example POORLY WRITTEN: Why does an internal combustion engine work?
Example WELL WRITTEN: Explain the function of fuel, distributor, and
the operation of the cylinder’s components in making the internal combustion
engine run.
- List the number of points each question is worth so students can grasp the
relative importance of each question.
Scoring Essay Tests
- Conceal students’ names.
- Use a computer lab if available, and have all students use the same font
and double spacing.
- Before scoring, skim through a few papers to get an overall feel for them: a
sense of what a typical response might be, how extensive the responses are,
and which questions students may have had difficulty with.
- Read only one item across all papers before going on to the next item. This
helps instructors apply the same criteria across all papers. Also, the reader
has only one criterion (one answer) to keep in mind.
- Reshuffle the stack of papers after reading through each item. This ensures
that no paper suffers from always following a good paper or reaps the benefit
of always following a bad paper.
- Use a prescribed reading procedure: either the "key procedure" or the
"ranking procedure".
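Reading one item across all papers and reshuffling before each new item amount to a simple schedule. A minimal sketch, assuming papers are identified by anonymous codes (names concealed, as above); `reading_schedule` is a hypothetical helper, and the optional seed only makes the shuffles repeatable.

```python
import random
from typing import Optional


def reading_schedule(paper_ids: list[str], n_questions: int,
                     seed: Optional[int] = None) -> list[tuple[int, list[str]]]:
    """Return (question number, paper order) pairs.

    Every paper is read for question 1 before any paper is read for question 2,
    and the stack is reshuffled before each new question.
    """
    rng = random.Random(seed)
    stack = list(paper_ids)
    schedule = []
    for question in range(1, n_questions + 1):
        rng.shuffle(stack)  # new order for this question
        schedule.append((question, list(stack)))
    return schedule


if __name__ == "__main__":
    for question, order in reading_schedule(["P01", "P02", "P03", "P04"], 3, seed=7):
        print(f"Question {question}: {', '.join(order)}")
```

Random reshuffling does not guarantee that a paper never follows the same neighbor twice, but over several questions it spreads the effect of a strong or weak preceding paper across the class.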
Grading Procedures
- In the key procedure, the reader lays out the ideas that the student should
have developed in a complete answer, along with the number of points the
student will get for each component of the answer. Research has shown that
this is a more reliable scoring process than having no prescribed procedure.
- The reader writes their own correct and complete answer and reviews it
constantly while reading the papers, scoring each paper on how well it
reflects what the teacher wrote.
- In the ranking procedure, the reader goes through the pile for the first
question and sorts the papers into 5-7 piles depending on their quality.
Grades are assigned according to the order of the piles (best to worst).
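The key procedure can be represented as a list of expected ideas with point values. The reader marks which ideas each answer contains, and the score is their sum. A minimal sketch with a hypothetical key for the internal combustion engine question above; the components and point values are illustrative assumptions, not taken from the book.

```python
KEY = {  # hypothetical scoring key: idea -> points
    "role of fuel": 3,
    "role of distributor": 3,
    "intake and compression strokes": 2,
    "power and exhaust strokes": 2,
}


def score_answer(ideas_present: set[str], key: dict[str, int] = KEY) -> int:
    """Sum the points for the key ideas the reader found in one answer."""
    unknown = ideas_present - set(key)
    if unknown:
        # Catch typos so an idea is not silently scored as missing.
        raise ValueError(f"not in the key: {sorted(unknown)}")
    return sum(points for idea, points in key.items() if idea in ideas_present)


if __name__ == "__main__":
    found = {"role of fuel", "power and exhaust strokes"}
    print(score_answer(found), "of", sum(KEY.values()))  # 5 of 10
```

Writing the key down before reading any papers is what makes the scores comparable: every answer is judged against the same list and the same point values.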
Student Bluffing Characteristics
- Answering every question even though they do not know the answer.
- Restating the question as a declarative statement and elaborating on that
statement (the usual way of answering a question they do not know).
- Blatant agreement. If the issue is important to the instructor, this can
sometimes earn a few points.
- A broad generalization without elaboration.
- Dropping names with no details: "According to Senator…"
- Emphasizing the importance of the question without really answering it:
"This is a vital question in our overpopulated world today…"
- Writing on a related topic in hopes that there will be some crossover that
earns some points: "The situation between the…"
Tip Nine - Test on a Regular Basis
- Testing every two to three weeks makes the final grade a more reliable
reflection of what the students learned.
- Frequent testing has been shown to produce higher final exam grades in
several studies.
- Feedback is one of the most important aspects of student learning: the more
tests, the more feedback.
- Students are better able to handle smaller amounts of information over
shorter periods of time—the use of tests as a means of structuring and
motivating students is an effective way of enhancing student learning.
Tip Ten - The Feedback of Test Results is Crucial to Student Learning
- The value of the feedback a test can give students diminishes with every
passing hour; you cannot get the tests back too soon.
- Students need to see what they got wrong and understand why it was wrong in
order to fully benefit from the learning the test could provide.
- Faculty need to do a post-test analysis to determine which items were
effective in measuring what they were intended to measure and which were too
easy or too difficult for students; this is the best way to develop quality
tests.
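A common form of post-test analysis computes two numbers per item: a difficulty index (the proportion of students who answered correctly) and a discrimination index (the proportion correct in the top-scoring group minus that in the bottom-scoring group). These are standard item-analysis statistics rather than a procedure spelled out in the book. The sketch assumes 0/1 item scores, uses the top and bottom 27% of total scores (a common convention), and flags difficulty outside the 50-70% range suggested under Tip Three. The 0.20 discrimination cutoff and the sample data are illustrative assumptions.

```python
# Hypothetical 0/1 scores: one row per student, one column per item.
SAMPLE_SCORES = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
]


def item_analysis(responses: list[list[int]], group_fraction: float = 0.27,
                  target: tuple[float, float] = (0.50, 0.70),
                  min_discrimination: float = 0.20) -> list[dict]:
    """Analyze a matrix of 0/1 scores, responses[student][item].

    Returns one dict per item with its difficulty index (proportion correct),
    discrimination index (upper-group minus lower-group proportion correct),
    and any flags.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(group_fraction * n_students))
    upper, lower = ranked[:group_size], ranked[-group_size:]

    report = []
    for item in range(n_items):
        difficulty = sum(s[item] for s in responses) / n_students
        p_upper = sum(s[item] for s in upper) / group_size
        p_lower = sum(s[item] for s in lower) / group_size
        discrimination = p_upper - p_lower
        flags = []
        if difficulty < target[0]:
            flags.append("too difficult")
        elif difficulty > target[1]:
            flags.append("too easy")
        if discrimination < min_discrimination:
            flags.append("weak discrimination")
        report.append({"item": item + 1, "difficulty": difficulty,
                       "discrimination": discrimination, "flags": flags})
    return report


if __name__ == "__main__":
    for row in item_analysis(SAMPLE_SCORES):
        print(f"Item {row['item']}: p={row['difficulty']:.2f}, "
              f"D={row['discrimination']:.2f}, {', '.join(row['flags']) or 'ok'}")
```

In the sample, every student answers item 1 correctly, so it is flagged as too easy and cannot separate strong from weak students. Item 2 falls in the target range and is answered correctly by the whole top group and none of the bottom group.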