As a condition of participation in the international study, countries were required to capture and process data files using procedures that ensured logical consistency and kept data capture error within acceptable limits. Because accurate capture of the task scores was essential to high data quality, 100 percent keystroke verification was required: every record was entered twice and the two entries were compared, in order to minimize error rates.
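Keystroke verification amounts to comparing two independent keyings of the same records field by field. The following is a minimal sketch of such a comparison, assuming the two passes are stored as CSV files with identical record layouts; the file names and structure are illustrative, not part of the actual processing system.

```python
import csv

def verify_double_entry(first_pass_path, second_pass_path):
    """Compare two independent keyings of the same records field by
    field. Returns (record_index, field_index, value_a, value_b) for
    every disagreement, so each discrepancy can be resolved against
    the paper booklet."""
    discrepancies = []
    with open(first_pass_path, newline="") as fa, \
         open(second_pass_path, newline="") as fb:
        for rec_no, (row_a, row_b) in enumerate(zip(csv.reader(fa),
                                                    csv.reader(fb))):
            for field_no, (a, b) in enumerate(zip(row_a, row_b)):
                if a != b:
                    discrepancies.append((rec_no, field_no, a, b))
    return discrepancies

if __name__ == "__main__":
    # Hypothetical file names for the two keying passes.
    for rec, field, a, b in verify_double_entry("entry1.csv", "entry2.csv"):
        print(f"record {rec}, field {field}: '{a}' vs '{b}'")
```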
Industry, occupation, and education variables were required to be coded using standard schemes: the International Standard Industrial Classification (ISIC), the International Standard Classification of Occupations (ISCO) and the International Standard Classification of Education (ISCED). Coding schemes and specific coding instructions were provided for all open-ended items.
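As a rough illustration of how verbatim responses map onto such standard codes, the sketch below uses a small lookup table in the spirit of ISCO. The titles and codes shown are placeholders for illustration, not an extract from the actual coding manuals, and real coding relied on the full classifications and trained coders.

```python
# Illustrative only: a tiny occupation-coding lookup. The titles and
# ISCO-style codes below are placeholders.
OCCUPATION_CODES = {
    "statistician": "2120",
    "cook": "5120",
    "primary school teacher": "2341",
}

def code_occupation(verbatim_response):
    """Return the classification code for a verbatim occupation
    response, or None if the response must be referred for manual
    coding."""
    return OCCUPATION_CODES.get(verbatim_response.strip().lower())

print(code_occupation("Statistician"))  # -> "2120"
print(code_occupation("astronaut"))     # -> None (refer to manual coding)
```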
Persons charged with scoring received intensive training in scoring responses to the open-ended items using the ALL scoring manual, and were also provided with a tool for capturing responses to closed-format questions. To help maintain scoring accuracy and comparability between countries, the ALL survey introduced an electronic bulletin board where countries could post their scoring questions and receive scoring decisions from the domain experts. These decisions were visible to all participating countries, which could then adjust their scoring accordingly. To further ensure quality, scoring was monitored in two ways.
First, at least 20 percent of the task booklets had to be rescored. The intra-country rescoring guidelines called for rescoring a larger portion of booklets at the beginning of the scoring process, in order to identify and rectify as many scoring problems as possible. In the second phase, a smaller portion of the next third of the booklets was selected. The final phase served as an ongoing quality monitoring measure, with a smaller portion of booklets rescored at regular intervals until the end of the rescoring activities. The two sets of scores had to match with at least 95 percent accuracy before the next step of processing could begin; in fact, most of the intra-country scoring reliabilities were above 95 percent. Where errors occurred, the country was required to return to the booklets and rescore all questions that exhibited problems, as well as all tasks scored by a problem scorer.
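The agreement check itself reduces to simple arithmetic: the proportion of item scores on which the original scorer and the rescorer agree, compared against the required threshold. Below is a minimal sketch of that computation; the function names and the flat list-of-scores layout are assumptions for illustration, not the survey's actual processing code.

```python
def scoring_agreement(original_scores, rescored_scores):
    """Percent agreement between the original scorer and the
    rescorer over the same set of item responses."""
    if len(original_scores) != len(rescored_scores):
        raise ValueError("both passes must cover the same items")
    matches = sum(a == b for a, b in zip(original_scores, rescored_scores))
    return matches / len(original_scores)

def check_rescore(original_scores, rescored_scores, threshold=0.95):
    """Return the agreement rate and whether it meets the required
    threshold (95 percent for intra-country rescoring)."""
    rate = scoring_agreement(original_scores, rescored_scores)
    return rate, rate >= threshold
```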
Second, an international rescore was performed: each country had 10 percent of its sample rescored by scorers in another country. For example, a sample of task booklets from the United States was rescored by the persons who had scored the Canadian English booklets, and vice versa. The main goal of the international rescore was to verify that no country scored consistently differently from any other. Inter-country score reliabilities were calculated by Statistics Canada and the results were evaluated by the Educational Testing Service in Princeton. Again, strict accuracy was demanded: 90 percent correspondence was required before the scores were deemed acceptable, and any problems detected had to be rescored. Table C3 shows the high level of inter-country score agreement that was achieved.
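The inter-country check follows the same arithmetic with a 90 percent threshold. Reusing the hypothetical check_rescore sketch above, with purely illustrative score vectors:

```python
# Illustrative data only: original U.S. item scores and the Canadian
# rescore of the same 10 percent subsample.
us_scores = [1, 0, 2, 1, 1, 0, 2, 2, 1, 0]
ca_rescore = [1, 0, 2, 1, 1, 0, 2, 1, 1, 0]

rate, acceptable = check_rescore(us_scores, ca_rescore, threshold=0.90)
print(f"inter-country agreement: {rate:.0%} -> "
      f"{'accepted' if acceptable else 'rescoring required'}")
```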