7. Production and evaluation of items
The creation of items for the Numeracy assessment progressed through three stages:
two stages involving the production of items and their testing on relatively small
samples in two countries, and a third stage involving a much larger pilot-testing process.
7.1 Stage 1 (1998-1999): Production and field-testing of a first item pool
Based on the above general principles, a pool of over 80 items was generated by team
members, drawing on their experience in research, assessment, and teaching with both
school-based and diverse adult and workplace learner populations in several countries.
Production grid. Items were created to fill cells within an item production grid
with four key dimensions that match the conceptual facets outlined in Table 1:
- Type of purpose / context: everyday, societal, work, further learning.
- Type of response: identifying or locating; acting upon (order/sort, count,
estimate, compute, measure, model); interpreting; communicating about.
- Type of mathematical or statistical information: quantity, dimension,
patterns/relations, data/chance, change. The content of the tasks was also
conceived, however, in terms of common school-based mathematics topics more
familiar to policy makers and educators, i.e., whole numbers and basic operations;
ratios, percents, decimals and fractions; measurement; geometry; algebra;
and statistics.
- Type of representation of mathematical or statistical information: numbers,
formulae, pictures, diagrams, graphs, tables, texts.
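The four dimensions above define a cross-classification, where each cell of the grid is one combination of categories. A minimal sketch of enumerating those cells (the category labels are taken from the lists above; the enumeration itself is an illustration, not the team's actual production tool):

```python
from itertools import product

# Category labels for the four grid dimensions, as listed in the text.
contexts = ["everyday", "societal", "work", "further learning"]
responses = ["identify/locate", "act upon", "interpret", "communicate about"]
information = ["quantity", "dimension", "patterns/relations", "data/chance", "change"]
representations = ["numbers", "formulae", "pictures", "diagrams",
                   "graphs", "tables", "texts"]

# Each cell of the production grid is one combination of the four dimensions;
# items are written so that the pool as a whole covers these cells.
grid_cells = list(product(contexts, responses, information, representations))
print(len(grid_cells))  # 4 * 4 * 5 * 7 = 560 cells
```

In practice an item pool of roughly 80 items cannot fill every cell, so the grid serves as a coverage map for balancing the pool across dimensions rather than as a quota to be met exhaustively.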
Scoring. Guidelines for scoring responses were designed to classify them into
three general groups: "correct", "any other response" (i.e., wrong answers), and
"not attempted" (i.e., no indication the respondent tried an item). However, for
many items, multiple codes were prepared to capture different types of "correct"
or "wrong" answers, thereby enabling an analysis of error patterns and shedding
light on the extent to which instructions are understood and items elicit the
expected types of responses. For some items that require estimation or
measurement, multiple codes were prepared to capture responses that differ in
accuracy yet still fall within a "correct" or "wrong" region, in order to
understand the level of accuracy that respondents adopt.
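A multi-code scheme of this kind can be sketched as a mapping from detailed codes to the three general groups. The code numbers and category descriptions below are hypothetical illustrations, not the actual codes used in the assessment:

```python
# Hypothetical multi-code scoring scheme: each detailed code records both the
# general group and the type of response it captures, so error patterns and
# accuracy levels can be analyzed without losing the three-way classification.
SCORE_CODES = {
    1: ("correct", "exact expected answer"),
    2: ("correct", "answer within an acceptable estimation range"),
    7: ("wrong", "common computational error"),
    8: ("wrong", "any other response"),
    9: ("not attempted", "no indication the respondent tried the item"),
}

def general_group(code: int) -> str:
    """Collapse a detailed scoring code into one of the three general groups."""
    return SCORE_CODES[code][0]

print(general_group(2))  # -> correct
print(general_group(7))  # -> wrong
```

The point of the two-level structure is that overall scoring uses only the general group, while the detailed codes remain available for analyses of error patterns and of the accuracy respondents adopt.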
Non-cognitive items. Research literature suggests that the way in which a person
responds to a numeracy task, including overt actions as well as internal thought processes
and the adoption of a critical stance, depends not only on knowledge and skills but also
on negative attitudes towards mathematics, beliefs about one's mathematical skills, habits
of mind, and prior experiences involving tasks with mathematical content (Cockcroft,
1982; Lave, 1988; Schliemann and Acioly, 1989; Saxe, 1991; McLeod, 1992; Gal,
2000). Hence, the Numeracy team also prepared several scales for the Background
Questionnaire, with questions designed to measure numeracy practices at home and at
work, attitudes and beliefs about mathematics, and information about the environment
in which the respondent learned mathematics while in school. Such scales may help in
explaining performance on numeracy tasks, as well as understanding respondents' status
on variables of interest to policy makers, such as participation in further learning or
employment status.