9. Methods

9.1 Respondents

In the ALL Pilot study, respondents aged 16 to 65+ were individually tested in their homes for approximately 90-120 minutes. In each participating country, around 1000-1500 individuals were tested overall, although no single respondent was tested in all of the domains assessed in ALL. Given the goals of the Pilot study, respondents did not comprise a probability sample in each country, although care was taken to recruit individuals from diverse locations and to stratify and balance the sample by gender, age group, and educational level.

9.2 Procedure and instruments

All respondents were first given a short "Core" screener test consisting of four simple Prose and Document Literacy tasks and two simple Numeracy tasks, which were read aloud by the examiner. Respondents were then given test booklets that included a subset of items from one or two of the key skill domains, as well as BQ questions, and wrote their answers directly in the test booklet. Those who received Numeracy items were free at all times to use a calculator and a ruler provided by the examiner.

Each respondent who received a Numeracy test answered only about half of the 81 items evaluated in the Pilot study. The items were divided into four blocks, which were paired in various permutations into four booklets, so that each respondent received two of the four blocks. This arrangement enabled testing of the complete Numeracy item pool and evaluation of the relationships among items, even though each respondent answered only half of them. The number of respondents attempting each Numeracy item averaged about 950 across the countries and languages listed above.
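For readers who want to see the block rotation concretely, the sketch below illustrates one way such a design can be laid out in Python. The block sizes and the adjacent-block pairing rule are illustrative assumptions; they do not reproduce the actual ALL booklet map.

```python
# Minimal sketch of a block-rotation booklet design like the one described
# above. Item counts per block and the "adjacent block" pairing rule are
# assumptions made for illustration, not the actual ALL Pilot design.

NUM_ITEMS = 81
NUM_BLOCKS = 4

# Split the item pool into four blocks of roughly equal size.
items = list(range(1, NUM_ITEMS + 1))
block_size = -(-NUM_ITEMS // NUM_BLOCKS)  # ceiling division
blocks = [items[i * block_size:(i + 1) * block_size] for i in range(NUM_BLOCKS)]

# Pair blocks into four booklets so that every block appears in exactly two
# booklets (here: block i with block (i + 1) mod 4, an assumed rotation).
booklets = [(i, (i + 1) % NUM_BLOCKS) for i in range(NUM_BLOCKS)]

for b, (first, second) in enumerate(booklets, start=1):
    n_items = len(blocks[first]) + len(blocks[second])
    print(f"Booklet {b}: blocks {first + 1} and {second + 1} ({n_items} items)")
```

Rotating blocks in this way keeps each booklet to roughly half the item pool while ensuring that every item is administered to a sizeable share of respondents.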

9.3 Scoring

Items were scored by trained teams in each country according to the scoring rubrics and guidelines designed by the Numeracy team. Double-scoring procedures were implemented on a sample of test booklets in each country to evaluate the consistency of scoring and to detect problems with the scoring instructions; a third scorer arbitrated disagreements. Throughout the scoring process, a member of the Numeracy team answered questions from scorers in the participating countries via an active electronic listserv.
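As an illustration of what a double-scoring consistency check can look like, the following sketch computes per-item exact-agreement rates between the first and second scorers and flags items whose low agreement might signal unclear scoring instructions. The record layout, the example scores, and the 90% flagging threshold are assumptions made for the example, not the procedures actually used in the Pilot.

```python
# Hedged sketch of a double-scoring consistency check on a sample of booklets.
from collections import defaultdict

# Each record: (item_id, score_by_first_scorer, score_by_second_scorer).
# The values below are fabricated for illustration only.
double_scored = [
    ("N01", 1, 1), ("N01", 0, 1), ("N01", 2, 2),
    ("N02", 1, 1), ("N02", 1, 1), ("N02", 0, 0),
]

agree = defaultdict(int)
total = defaultdict(int)
for item, s1, s2 in double_scored:
    total[item] += 1
    agree[item] += int(s1 == s2)

for item in sorted(total):
    rate = agree[item] / total[item]
    flag = "  <-- review scoring rubric" if rate < 0.90 else ""
    print(f"{item}: {agree[item]}/{total[item]} exact agreement ({rate:.0%}){flag}")
```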

9.4 Data analysis

Various statistical analyses were conducted on the available data, including scaling based on Item Response Theory, analysis of item difficulty and scale characteristics using classical test theory, analysis of the frequency of different types of errors across countries, and correlational analysis of the linkages between summary scores on the Numeracy items and BQ variables and scales of interest. Key analyses were performed at the Educational Testing Service in Princeton; some were conducted by members of the Numeracy team and by Statistics Canada.
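To make the classical-test-theory portion of these analyses concrete, the sketch below computes item difficulty (proportion correct) and a corrected item-total correlation for each item from a small response matrix. The matrix is fabricated solely to show the computation; it is not ALL Pilot data, and the sketch does not reproduce the IRT scaling or the cross-country error analyses.

```python
# Illustrative classical-test-theory item statistics: difficulty (p-value)
# and discrimination (corrected item-total correlation). Fabricated data.
import numpy as np

# Rows = respondents, columns = dichotomously scored Numeracy items.
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])

n_items = responses.shape[1]
difficulty = responses.mean(axis=0)        # proportion correct per item
total = responses.sum(axis=1)              # each respondent's total score

for j in range(n_items):
    rest = total - responses[:, j]         # total score excluding item j
    r = np.corrcoef(responses[:, j], rest)[0, 1]
    print(f"Item {j + 1}: difficulty = {difficulty[j]:.2f}, item-total r = {r:.2f}")
```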