A small-scale feasibility study was conducted in the USA and the Netherlands,
in which 44 additional items were tested on samples of about 55 cases in each country.
Statistical analyses conducted at this stage were quite similar to those used during the
first feasibility study, although the smaller sample prevented the use of IRT analysis.
Out of all the items tested in the two feasibility studies conducted in Stage 1 and
Stage 2, a pool of 81 items was selected for further testing at the Pilot stage. These
items had adequate psychometric characteristics, covered diverse levels of difficulty,
encompassed key facets of the conceptual framework for Numeracy, and could be adapted
without difficulty to a language other than English and to different units of measurement.
These items showed no notable error patterns suggesting that stimuli, questions,
or task contexts had been misunderstood. In addition, two easy Numeracy items were
selected to become part of the "Core", the screener test that each respondent must
pass in order to receive one or more full-length test booklets with items assessing
performance in the ALL skill domains.
7.3 Stage 3 (2001-2003): Preparations for pilot testing in participating countries
Scoring materials. Once items for the Pilot stage were selected, the Numeracy team
prepared detailed scoring guidelines for each item, as well as a training manual for
scorers. In addition, the team prepared a detailed manual describing "critical elements"
of each item that should be kept constant during the translation and adaptation work
that came next.
Translation and adaptation for Pilot. The 81 items selected at the end of Stage 2
were translated and adapted by all participating countries, sometimes into multiple
languages within the same country (e.g., Canada created English and French versions,
and Switzerland created German, French, and Italian versions). The translation process
was supported by training workshops held in 2000 and 2001 and by materials
prepared by the Numeracy team, i.e., the manual describing critical elements of each
item that should be kept constant across language versions, and a "translation and
adaptation manual". Item adaptation aimed to maintain cognitive equivalence of
task demands across versions. Thus, for example, when units of measure or monetary values in
some computational items were adapted to various country situations, guidelines
emphasized the need to keep item demands comparable.
Further, each country not only prepared country-specific and language-specific
sets of items, stimuli, and response pages, but also adapted the scoring instructions and
the training manual for scorers. During this stage, all translators could also post questions
by e-mail to a hotline staffed by members of the Numeracy team.
The next section provides an overview of the results from the Pilot phase and
further explanations regarding the properties of the final 40 items selected for the
Main assessment.