9. Methods
9.1 Respondents
In the ALL Pilot study, respondents aged 16 to 65+ were individually tested in their
homes for approximately 90-120 minutes. In each participating country, around 1000-1500
individuals were tested overall, although no respondent was tested in all of the domains
covered by ALL. Given the goals of the Pilot study, respondents did not constitute a
probability sample in each country, although care was taken to recruit individuals from
diverse locations and to stratify and balance the sample by gender, age group, and
educational level.
9.2 Procedure and instruments
All respondents were first given a short "Core" screener test, consisting of four simple
Prose and Document Literacy tasks and two simple Numeracy tasks that were read
aloud by the examiner. Respondents were then given test booklets that included a subset
of items from one or two of the key skill domains, as well as BQ questions. Respondents
wrote their answers inside the test booklet. Those who received Numeracy items were
free at all times to use a calculator and a ruler provided by the examiner.
Each respondent who received a Numeracy test answered only about half of the
81 items evaluated in the Pilot study. The items were divided into four blocks, which were
paired in various permutations to form four booklets, so that each respondent received
two of the four blocks. This arrangement made it possible to test the complete Numeracy
item pool and to evaluate the relationships among all items, even though each respondent
answered only half of them. Across the countries and languages listed above, each
Numeracy item was attempted by an average of about 950 respondents.
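The exact block-to-booklet pairing is not reported here; as a minimal sketch, the Python
code below assumes a hypothetical rotated pairing of four blocks (A-D) into four booklets
and checks two properties such a design relies on: each block appears in exactly two
booklets (so every item reaches roughly half of the respondents), and the blocks remain
linked through shared booklets, which is what allows all items to be placed on a common
scale even though no one answers them all. The item numbers and pairing are illustrative
assumptions, not the actual ALL design.

from itertools import combinations

# Illustrative only: hypothetical item numbers per block and an assumed pairing.
BLOCKS = {
    "A": list(range(1, 21)),
    "B": list(range(21, 41)),
    "C": list(range(41, 61)),
    "D": list(range(61, 82)),   # 81 items in total
}

BOOKLETS = {
    1: ("A", "B"),
    2: ("B", "C"),
    3: ("C", "D"),
    4: ("D", "A"),
}

# Each block should appear in exactly two of the four booklets.
for block in BLOCKS:
    count = sum(block in pair for pair in BOOKLETS.values())
    print(f"block {block} appears in {count} booklets")

# Blocks administered together in at least one booklet: the design is "connected",
# so items in different blocks can still be related through the common scale.
shared = {frozenset(pair) for pair in BOOKLETS.values()}
print("block pairs administered together:",
      sorted(tuple(sorted(p)) for p in shared))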
9.3 Scoring
Items were scored by trained teams in each country according to the scoring rubrics and
guidelines designed by the Numeracy team. Double-scoring procedures were
implemented on a sample of test booklets in each country to evaluate the consistency of
scoring and to detect problems with the scoring instructions. A third scorer arbitrated
disagreements. Throughout the scoring process, a member of the Numeracy team answered
questions from scorers in the participating countries via an active electronic listserv.
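The section does not specify which agreement statistics were examined; purely as an
illustration of the kind of consistency check that double-scoring supports, the sketch
below computes exact agreement and Cohen's kappa for two hypothetical scorers of the
same sampled responses. The scores and function names are assumptions for the example.

from collections import Counter

def exact_agreement(scores_1, scores_2):
    # Proportion of double-scored responses on which the two scorers agree exactly.
    return sum(a == b for a, b in zip(scores_1, scores_2)) / len(scores_1)

def cohens_kappa(scores_1, scores_2):
    # Chance-corrected agreement between two scorers (Cohen's kappa).
    n = len(scores_1)
    p_observed = exact_agreement(scores_1, scores_2)
    freq_1 = Counter(scores_1)
    freq_2 = Counter(scores_2)
    # Expected agreement if the scorers assigned codes independently.
    p_expected = sum(freq_1[c] * freq_2[c] for c in freq_1) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical 0/1 scores for one double-scored item.
scorer_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
scorer_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
print(f"exact agreement: {exact_agreement(scorer_1, scorer_2):.2f}")
print(f"Cohen's kappa:   {cohens_kappa(scorer_1, scorer_2):.2f}")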
9.4 Data analysis
Various statistical analyses were conducted on the available data, including scaling based
on Item Response Theory, analysis of item difficulty levels and scale characteristics using
classical test theory, analysis of the frequency of different types of errors across
countries, and correlational analysis of the linkages between summary scores on the
Numeracy items and BQ variables and scales of interest. Key analyses were performed at
the Educational Testing Service in Princeton, and some were conducted by members of the
Numeracy team and by Statistics Canada.
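The specific procedures and software are not described here; as a hedged sketch of the
classical test theory part of such analyses, the Python code below computes two standard
item statistics from a hypothetical matrix of 0/1 scored responses: the proportion correct
per item (the classical "difficulty" or p-value) and the item-total point-biserial
correlation, computed against the rest score so an item is not correlated with itself. The
data, dimensions, and treatment of booklet-based missingness are illustrative assumptions.

import numpy as np

def classical_item_stats(scores):
    # scores: 2-D array, rows = respondents, columns = items;
    # NaN marks items that were not in a respondent's booklet.
    scores = np.asarray(scores, dtype=float)
    p_correct = np.nanmean(scores, axis=0)          # proportion correct per item
    point_biserial = []
    for j in range(scores.shape[1]):
        taken = ~np.isnan(scores[:, j])             # respondents who saw item j
        item = scores[taken, j]
        # Rest score: total on all other items the respondent attempted.
        rest = np.nansum(np.delete(scores[taken], j, axis=1), axis=1)
        point_biserial.append(np.corrcoef(item, rest)[0, 1])
    return p_correct, np.array(point_biserial)

# Hypothetical scored data: 6 respondents x 4 items.
responses = [
    [1, 0, 1, np.nan],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, np.nan, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
p, rpb = classical_item_stats(responses)
print("proportion correct per item:", np.round(p, 2))
print("item-total point-biserial:  ", np.round(rpb, 2))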