Experts classified each of the 20 selected items according to these categories. It
was hypothesized that items classified as covering higher levels of proficiency should
exhibit higher indices of item difficulty in the pilot test. However, as the empirical
difficulty of a test item is shaped by a multitude of factors which can only partially be
controlled for (e.g. the amount of previous knowledge required, the clarity of the item
text, the mental workload involved, etc.), a certain amount of overlap between the
pre-defined sets of items is inevitable. Previous work on proficiency scaling (cf. Watermann
and Klieme, 2002) shows that sophisticated theories of item difficulty, operationalized
by expert ratings of item demands, can explain between 65 and 80 percent of the between-item
variance in item difficulty when applied to large-scale assessment data.
The following two criteria are therefore realistic and should yield a satisfactory
level of precision:
- Mean item difficulty should increase from level (1) to level (4).
- At least two thirds of the between-item variance in difficulty should be
explained by the experts' classification of items into the four proficiency levels (see the sketch after this list).
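The second criterion amounts to a one-way decomposition of empirical item difficulty by assigned level, i.e. the share of between-level variance in total between-item variance (eta-squared). The following is a minimal Python sketch of this check, assuming the pilot-test difficulty indices and the expert level assignments are available as parallel lists; the function name and interface are purely illustrative and not part of the original analysis.

```python
from statistics import mean, pvariance

def check_proficiency_scaling(difficulties, levels):
    """Hypothetical helper for the two scaling criteria.

    difficulties: empirical difficulty index per item (e.g. logit estimates).
    levels: expert-assigned proficiency level (1-4) per item.
    """
    grand_mean = mean(difficulties)
    total_var = pvariance(difficulties, mu=grand_mean)

    # Group items by assigned level and compute level means.
    groups = {}
    for d, lvl in zip(difficulties, levels):
        groups.setdefault(lvl, []).append(d)
    level_means = {lvl: mean(vals) for lvl, vals in sorted(groups.items())}

    # Criterion 1: mean difficulty should increase from level 1 to level 4.
    ordered = [level_means[lvl] for lvl in sorted(level_means)]
    monotonic = all(a < b for a, b in zip(ordered, ordered[1:]))

    # Criterion 2: at least two thirds of the between-item variance should be
    # explained by the classification (eta-squared from the one-way breakdown).
    between_var = sum(
        len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values()
    ) / len(difficulties)
    eta_squared = between_var / total_var if total_var > 0 else float("nan")

    return level_means, monotonic, eta_squared
```

With the criteria above, the classification would be judged satisfactory if the returned level means are strictly increasing and eta-squared is at least 0.67.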
5.5.2 Operationalizing the proficiency levels: The item classification
Level 1: 3 out of 20 items were classified as content-related tasks. These are rather
concrete tasks with a limited scope of reasoning. They require the respondent to make
simple connections, without having to check the constraints systematically. The
respondent has to draw direct consequences, based on the information given and on her
previous, content-related knowledge.
Thus, the mental operations that must be applied successfully to solve items at
level 1 can be characterized as schemata of content-related thinking.
Level 2: Another 3 items were classified as
corresponding to the second level. These items require the respondent to evaluate
certain alternatives with regard to well-defined,
transparent, explicitly stated criteria. The reasoning may be done step by step,
in a linear process, combining information from the question section and the
information
section.
Thus, the mental operations that must be applied successfully to solve items at
level 2 can be characterized as systematic (concrete logical) reasoning.
Level 3: 8 out of 20 items were classified
as belonging to this level. Some tasks require the respondent to order several
objects according to given criteria. Others require
her to determine a sequence of actions/events or to construct a solution by taking
non-transparent or multiple interdependent constraints into account. This means
that on
level 3 the respondent has to cope with multi-dimensional or ill-defined goals.
Thus, the mental operations that must be applied successfully to solve items at
level 3 can be characterized as formal operations. The reasoning process goes back and
forth in a non-linear manner, requiring a good deal of self-regulation.
Level 4: The remaining 6 of the 20 items correspond to this level. These items
require the respondent to judge the completeness, consistency and/or dependency among
multiple criteria. In many cases, she has to explain how the solution was reached and
why it is correct. The respondent has to reason from a "meta-perspective", grasping an
entire system of problem states and possible solutions.
Thus, the mental operations that must be applied successfully to solve items at
level 4 can be characterized as critical thinking and meta-cognition.