3. Measurement of problem solving

There are at least three different sources for the design of problem-solving tests: tasks used in psychological research, domain-specific problem-solving tasks, and tasks used in previous large-scale assessments of cross-curricular or practical problem solving. These three possibilities will be examined in the following sections.

For the ALL study, tasks should meet the following requirements: they should tap broad analytical problem-solving abilities and thus be theoretically sound. They should furthermore be embedded in a real-life context that is realistic enough to trigger genuine rather than artificial problem-solving processes, yet one that does not make specialized knowledge a prerequisite. Finally, they should show adequate psychometric properties and be compatible with the constraints imposed by a large-scale assessment.

3.1 Tasks used in psychological research on problem solving

During the 20th century, psychological research in the area of problem solving concentrated on a few experimental paradigms. For example, the famous radiation problem in cancer therapy (Duncker, 1945), the water-jug problems (Luchins, 1942), the "Tower of Hanoi" (Newell and Simon, 1972) and its analogues, Wason's rule induction task (Wason, 1966), traveling-salesman problems, and cryptarithmetic tasks were used again and again in experimental settings (cf. Anderson, 1999). In addition to these puzzle-like problems, psychologists used knowledge-rich tasks such as chess games, geometry problems, algebraic word problems, mechanical reasoning, or computer programming. In the European tradition of problem-solving research, computer simulations of various economic or ecological scenarios were introduced as a means of investigating human behavior in ill-defined, dynamic, intransparent, and complex problem situations (see Frensch and Funke, 1995). Thus, one possible strategy for the design of problem-solving assessment instruments could be to implement one or more of these paradigms. However, the tasks used in experimental research (a) are often well known to the general public, (b) are not appropriate for large-scale assessment, and (c) are not tailored to the life and experiences of the target population. Thus, the challenge would be to adapt these tasks, transform them into appropriate test formats, and contextualize them in a way that is meaningful to subjects across participating countries. The heterogeneity of the tasks raises another problem: mixing, for example, a Tower-of-Hanoi-like problem with an "insight" problem would most probably yield a test with low internal consistency and unknown validity.
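To illustrate the puzzle-like character of these paradigms, the Tower of Hanoi has a well-known recursive solution: to move n disks from a source peg to a target peg, move n − 1 disks to the spare peg, move the largest disk, then move the n − 1 disks onto it. A minimal sketch (function and parameter names are illustrative, not from the original studies):

```python
def hanoi(n, source, target, spare, moves):
    """Append the moves that transfer n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)   # clear the way
    moves.append((source, target))               # move the largest disk
    hanoi(n - 1, spare, target, source, moves)   # restack on top of it

moves = []
hanoi(3, "A", "C", "B", moves)
# The minimal solution for n disks requires 2**n - 1 moves.
print(len(moves))  # 7
```

The well-defined state space and known optimal solution length are precisely what made such tasks attractive for laboratory research, and what limits their realism for a large-scale assessment.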