Ferris State University

Center for Teaching, Learning & Faculty Development
Writing Tests - Short Answer or Completion Items and Matching Tests
  From Developing and Using Tests Effectively by Jacobs and Chase, 1992.

Short Answer or Completion Tests

  • This form of testing demands recall rather than recognition of information
  • This form does not lend itself very well to testing higher-level thinking
  • It is best used to see how well students have collected basic information pertinent to the course

Writing Items

1. Write completion items that can be answered in a single word (if at all possible).

Single-word answers make scoring faster and less subjective.

Example:

The density of a fluid is measured with an instrument called the ____________. (One word)

The hydrometer is used to measure ____________.

The second item will likely draw a multiple-word answer that may require interpretation and will certainly take more time to score.

2. Word each statement so that it has only one correct answer.

Example POORLY WRITTEN: The Battle of Lexington was fought in __________.

Example CLEARLY WRITTEN: The Battle of Lexington was fought in the year __________.

The first example could have many correct answers (a year, a state, a season).

3. Delete only key words from the statement, and avoid tricks.

4. Do not lift statements directly from the textbook as this encourages memorizing, not understanding.

5. Make all the blanks the same length.

To make scoring easy, use the following model:

The quality of a test that deals with consistency is called (1) __________, and the quality that deals with the extent to which a test relates to a criterion is called (2) __________.

Answers:   1. __________   2. __________

Tests using Matching Items

To reduce random errors, place the stimulus column on the left with each item numbered, and the response column on the right with each item lettered.

Provide spaces for students to write their responses to the left of the stimuli.

Each matching exercise should contain only homogeneous material.

Mixing heterogeneous material simply makes the test easier, because students can rule out responses that obviously do not fit a stimulus.

Example

               Stimuli                      Answer choices

_____ 1. George Washington         a. Revolutionary War hero
_____ 2. John Hancock              b. Signer of the Declaration of Independence
_____ 3. Virginia                  c. One of the original colonies

Put the stimulus items in alphabetical order. This makes them easier to find and saves students time.

The best number of items is between 10 and 15.

The entire test should be on one page so students do not have to flip back and forth looking for stimuli.

Have at least five more choices than the number of stimuli.

Writing Tests - Reliability of Tests
  Based on the book Developing and Using Tests Effectively by Jacobs and Chase, 1992.

Reliability deals with the consistency of measurement. The reliability of an assessment is the consistency with which it produces the same results under different but comparable conditions (for example, similar groups of students earning similar scores).

For a test to be reliable, it must adequately reflect the objectives of the teaching unit.

The best way to estimate reliability is to obtain two measurements of a common trait (the course unit) for a common group of people (your students). In practice, this means giving two parallel, or equivalent, forms of the same test.

Another approach is to take a single test and split it into two halves: each student gets one score on the odd-numbered items and another score on the even-numbered items. This is not the best way to assess reliability, but it gives some indication if the test has enough items (fifty or more).

Another way is to look at how consistently students perform on each item, in effect treating each item as a mini test.
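Treating each item as a mini test is the idea behind internal-consistency indices. One common example is the Kuder-Richardson Formula 20 (KR-20), the special case of Cronbach's alpha for items scored 1 or 0. It is not necessarily the index Jacobs and Chase use. The formula is KR-20 = (k / (k - 1)) × (1 - Σpq / σ²), where k is the number of items, p is the proportion of students answering an item correctly, q = 1 - p, and σ² is the variance of the students' total scores. The minimal sketch below assumes 1/0 scoring and uses the population variance of the totals. The function name and data are hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """Kuder-Richardson Formula 20 for items scored 1 (right) or 0 (wrong).

    responses: one row per student, one column per item.
    Assumes at least two items and that total scores are not all identical.
    """
    k = len(responses[0])                  # number of items
    n = len(responses)                     # number of students
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n   # proportion correct on item j
        sum_pq += p * (1 - p)                      # variance of a single 1/0 item
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

# Hypothetical 1/0 scores: five students, eight items.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.7
```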

Using Split Halves to Test Reliability

The test must be focused on a common domain of knowledge.

A limitation is that you end up with two short tests, each a less reliable measure than the whole test.

However, you can use the Spearman-Brown prophecy formula to estimate the reliability of the test at its original length. What we are seeking is that a student's score (or rank) on the first half is close to their score (or rank) on the second half. Ideally, you want correlations at the .70 or .80 level. (Additional information can be found at http://www.jmu.edu/assessment/wm_library/Reliability_validity.pdf)
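The Spearman-Brown prophecy formula predicts the reliability of a test whose length is changed by a factor n: r_new = n·r / (1 + (n - 1)·r). For split halves, n = 2, which gives r_full = 2r / (1 + r). Below is a minimal Python sketch. The function name and the .70 half-test correlation are hypothetical.

```python
def spearman_brown(r, length_factor=2.0):
    """Predicted reliability of a test lengthened by `length_factor`.

    r: reliability (correlation) of the current test, e.g. the half-test correlation.
    With the default factor of 2 this gives r_full = 2r / (1 + r).
    """
    return (length_factor * r) / (1 + (length_factor - 1) * r)

r_half = 0.70                              # hypothetical odd/even correlation
print(round(spearman_brown(r_half), 2))    # 0.82
```

You can also set length_factor to a value other than 2 to estimate how much adding items might raise reliability, for example when doubling a 25-item quiz to 50 items. This assumes the new items are comparable to the existing ones.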

Factors that Influence Reliability

  • The length of the test

  • The time limits for the tests

  • The nature of the student group (e.g., if the group is quite homogeneous, reliability will be lower than if the group is fairly heterogeneous)

  • The difficulty of the test items (e.g., if the items are too difficult, the spread of scores will be small)

  • A common set of instructions

  • A common environment in which to attempt the test

  • The scoring procedure of the test

  • Students' awareness of how they will be assessed (the test's length, time limit, content or objectives, and value)

How to Improve Test Reliability

  • Tests should be long enough to sample the content well
  • Time limit should allow most students to finish
  • Items should be free of ambiguity and tricks
  • Directions should be clear and concise
  • There should be few items that everyone gets wrong or everyone gets right
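You can spot items that everyone gets right or wrong with the item difficulty index p, which is the proportion of students answering an item correctly. An item with p equal to 0 or 1 does nothing to spread scores out. Below is a minimal sketch that assumes 1/0 scoring. The function names and data are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of students answering each item correctly (the item's p value)."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n for j in range(len(responses[0]))]

def no_spread_items(p_values):
    """1-based numbers of items everyone missed (p = 0) or everyone answered correctly (p = 1)."""
    return [i + 1 for i, p in enumerate(p_values) if p in (0.0, 1.0)]

# Hypothetical 1/0 scores: four students, five items.
responses = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
]
p = item_difficulty(responses)
print(p)                    # [1.0, 0.5, 0.0, 0.75, 0.5]
print(no_spread_items(p))   # [1, 3]
```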

What about Essay Tests and Reliability?

The biggest issue is the consistency of the reader (teacher) of the test answers.

  • Are the questions too wide in scope?
  • Has the reader developed a prescribed scoring method?

A prescribed scoring method might include:

  • A checklist of the things that need to be in the answer, with their point values

  • A written response to each question by the instructor that is reviewed before reading the student answers

Some practices that promote reliability:

  • Reading all of the answers to one question across all tests, so you focus on one answer at a time
  • Using two readers for the tests, which yields higher reliability

Estimating Reliability

  1. Test-retest reliability: the correlation between scores from giving the same test twice to the same students.
  2. Parallel-forms reliability: the correlation between the same students' scores on two equivalent forms of the same test.
  3. Internal consistency reliability: correlations or consistency indices among the items on a single test; for example, a student should earn a similar score on the odd-numbered questions and on the even-numbered questions.

Other factors in Test Reliability

  1. Tests that are too easy
  2. Tests that are too hard: these encourage guessing, which introduces random error
  3. The more questions, the better the reliability and the less impact guessing will have on the score.
  4. A student's "true" or "exact" score on a test should be seen as falling within a range. An 85% is not significantly better or worse than an 88% or an 82%; there is always a degree of error in any test score.
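You can estimate the range around a score with the standard error of measurement (SEM) from classical test theory: SEM = SD × √(1 - r), where SD is the standard deviation of the test scores and r is the test's reliability coefficient. If errors are normally distributed, a band of ±1 SEM around an observed score contains the student's true score roughly 68% of the time. Below is a minimal sketch. The SD of 8 points and the reliability of .84 are hypothetical values.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def score_band(score, sem, width=1.0):
    """Range of +/- `width` SEMs around an observed score."""
    return score - width * sem, score + width * sem

sem = standard_error_of_measurement(sd=8.0, reliability=0.84)  # hypothetical test statistics
low, high = score_band(85, sem)
print(f"SEM = {sem:.1f}; band for 85: {low:.1f} to {high:.1f}")
# SEM = 3.2; band for 85: 81.8 to 88.2
```

With these assumed values, the band runs from about 82 to 88. This is the sense in which an 85 is not meaningfully different from an 82 or an 88.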

Test Validity

  • A valid assessment procedure is one that actually tests what it sets out to test, i.e., one that accurately measures the behavior described by the objective under scrutiny.
  • A test is valid if it provides data that increases the accuracy of decisions about a person or object.
  • Tests do not have general validity. They are valid in relation to specific variables, such as intelligence or achievement of course objectives.

Measuring Test Validity

The obvious way to collect evidence of a test's validity is to compare students' scores on the test with some external measure of the same trait that the test measures.

Example: comparing ACT scores with freshman grades in college.

The focus is on how well the test samples a domain of behaviors or knowledge about which we will make an inference.

Tests are not measures of an entire domain, but samples of the desired behavior from which we draw conclusions about a student’s knowledge of an entire domain.

Content Validation

  • The extent to which the test questions reflect the entire body of the content that the test is designed to measure
  • Tests are only a sample of what a student knows, so the validity of the sample depends on how representative it is
  • The domain of interest measured is not only a subject matter domain but also a behavioral domain (what kinds of mental operations should be tested)
  • Well written tests not only sample all the material taught but also do so in a representative way—the percentage of questions on a test should reflect the time and importance given to a topic.
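The proportional-coverage rule can be turned into item counts per topic. Below is a minimal sketch. It assumes class hours stand in for time and importance, and you could adjust the weights to reflect importance. It uses the largest-remainder method so that the counts add up to the intended test length. The topics and hours are hypothetical.

```python
import math

def allocate_items(weights, total_items):
    """Split `total_items` across topics in proportion to their weights (e.g. class hours).

    Uses the largest-remainder method so the counts always add up to total_items.
    """
    total_weight = sum(weights.values())
    exact = {topic: total_items * w / total_weight for topic, w in weights.items()}
    counts = {topic: math.floor(x) for topic, x in exact.items()}
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the topics with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda topic: exact[topic] - counts[topic], reverse=True)
    for topic in by_remainder[:leftover]:
        counts[topic] += 1
    return counts

hours = {"Reliability": 5, "Validity": 4, "Item writing": 3}  # hypothetical class hours
counts = allocate_items(hours, 40)
print(counts)  # {'Reliability': 17, 'Validity': 13, 'Item writing': 10}
```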

How to Ensure Test Validity

  1. The test should not test other "things" like vocabulary or writing skills unless these were part of the course objectives.
  2. If the instructor samples the course objectives in proportion to their importance in the course, the instructor’s test will have content validity.
  3. Use a Table of Specifications (see appendix)
  4. The more the test is free from irrelevant variables that threaten its validity, the more accurate the findings will be.
  5. Examples of irrelevant variables: vocabulary, ambiguity, minute details, grammatical errors, too little time, etc.

Other threats to validity include:

  1. Directions are not clear
  2. Test requires inappropriate levels of skills that are not part of the course objectives
  3. Test items are poorly written
  4. Test length does not allow for adequate sampling of content
  5. Complex or subjective scoring ranks some students inaccurately
  6. The scoring process has many steps, creating many opportunities for mistakes
  7. Scoring is subjective and easily influenced by factors that are not part of the teaching objectives

Other Factors that Affect Test Validity

  1. The test-taking skills of the students: guessing strategies, good allocation of time.
  2. Test-wiseness: the ability to use clues to obtain a higher score than is deserved, so that students appear to know more than they do.
  3. Response sets: the tendency to respond to a test in a certain consistent way, such as always marking "true" when unsure or always choosing the longest multiple-choice answer when uncertain.
  4. Anxiety and motivation: performance may be impaired by high anxiety or low motivation. Research is unclear whether the underlying problem is poor test-taking skills, which lead to both poor performance and anxiety. Several studies have shown no causal link between anxiety and test scores, but we do know the brain secretes neurochemicals under stress that can interfere with cognitive processes.

Administrative Factors

  1. The way a test is administered (for example, "this is going to be hard" comments)
  2. Proctored by someone else
  3. Students believe cheating is "ignored"
  4. Clarity of instructions
  5. Coaching and practice: drilling and practicing to the test can affect scores
  6. Test bias: a test constructed in a way that gives some people an unfair advantage over others (it can help or hurt scores). Test bias is present when individuals from different groups who are equally able do not have an equal probability of success (Anderson 1980). The question is whether differences on a test result from factors irrelevant to what the test was designed to measure, or whether they mirror true differences between the groups in what the test is intended to measure. The key sign is equally able groups that perform differently.

Random Errors and Systematic Errors

  1. Random errors are, by definition, random: the amount and direction of the error differ unsystematically from one measurement to the next and from one person to the next.
  2. Random errors reduce test reliability

  Examples:

  • Under-predicting or over-predicting the math ability of a test group
  • The student is tired
  • Students being upset
  • The test being scheduled just before a big event
  • Students being cold or hot or hungry

Systematic Errors Affect Test Validity

Unknown to the test developer or the test taker, the test measures something it was not intended to measure.

Example: a math test with word problems that require strong reading ability. If a student's score is low, the cause may be reading ability rather than math ability.


Faculty wanting further information about any of these topics are encouraged to contact Terry Doyle at doylet@ferris.edu
