In each case, an international panel of experts reviewed existing theory and approaches to measurement. This review was used to construct an assessment framework that made explicit the factors underlying the relative difficulty of tasks in each domain.

  • small-scale piloting in pairs of countries to confirm key theoretical and measurement assumptions;
  • large-scale development of assessment items in each domain by extensive networks of international experts.

In each case, a broader group of experts was enlisted to use the assessment frameworks to develop items covering the expected range of proficiency and social contexts. The consortium went to great lengths to ensure that development was broadly representative linguistically, culturally and geographically.

In a parallel activity, the assessment frameworks were refined to reflect what was learned through small-scale piloting, and the item pools were bundled into a pilot survey for testing in all participating countries.

  • full-scale piloting of ALL instruments in seven countries and five languages;
  • selection of items for inclusion in the final assessment, refinement of the frameworks and background questionnaire using data from the full pilot.