Methods from classical test theory as well as advanced models from item response
theory (IRT) were applied. In addition, experts classified the items and rated item
features. Most of the analyses were done in preparation of the final item selection and
revision. Therefore, some of the results presented below are based on preliminary data
sets and preliminary scoring procedures. However, comparisons carried out after the
final test analysis revealed that the results are stable. For example, the correlation between
our first version of item difficulty parameters and the final parameters, generated at
Educational Testing Service (ETS) after the item selection and scoring had been finalized,
is .92.
The results of the pilot test are quite conclusive. In the following, the relevant
criteria and analytical procedures for each of the four issues will be described, followed
by a short presentation of the corresponding findings from the ALL pilot study. Thus,
it will become clear how the pilot results were used to select a final set of instruments
and to develop an optimal instrument for the assessment of analytical problem solving.
5.2 A unique, common scale for analytical problem solving
5.2.1 Criteria and expectations
The matrix design of the field trial allowed for an integrated analysis of the long versions
of all four projects. Thus, it was possible to estimate the latent (error-free) correlations
between the projects. We expected these latent correlations to be around or above .90.
Correlations of this size would indicate that the four projects could, in fact, be interpreted
as building blocks of a single, common latent dimension.
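The idea of a latent (error-free) correlation can be illustrated with the classical correction for attenuation, which divides the observed correlation between two scales by the geometric mean of their reliabilities. The pilot analysis itself used an IRT-based matrix design rather than this formula; the sketch below is a simplified classical illustration with hypothetical values.

```python
def latent_correlation(r_observed, rel_x, rel_y):
    """Classical correction for attenuation: estimate the error-free
    correlation between two scales from their observed correlation
    and their reliabilities."""
    return r_observed / (rel_x * rel_y) ** 0.5

# Hypothetical values: an observed correlation of .78 between two
# project blocks with reliabilities of .82 and .85 corresponds to
# a latent correlation of about .93 -- in the range reported below.
print(round(latent_correlation(0.78, 0.82, 0.85), 2))  # -> 0.93
```

With perfectly reliable scales (reliability 1.0) the latent correlation equals the observed one; the lower the reliabilities, the larger the upward correction.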
The classical approach to test and item analysis — calculating item-test correlations
and estimating test reliability with coefficient alpha — could be applied to
the combined short versions of all four projects (I+J+K+L), as these 18 items were
administered to the same group of respondents. According to standards of item
construction, an alpha coefficient above .80 would indicate that all the items — regardless
of the different project contexts — make up a single, consistent dimension.
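Both statistics can be computed directly from a respondents-by-items score matrix. The sketch below uses simulated dichotomous data (not the ALL pilot data) to show coefficient alpha and the part-whole corrected item-test correlations; the data-generating model and all numerical values are illustrative assumptions.

```python
import numpy as np

def coefficient_alpha(scores):
    """Cronbach's coefficient alpha for a (respondents x items) matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def part_whole_corrected(scores):
    """Correlation of each item with the total score of the
    remaining items (i.e. excluding the item itself)."""
    total = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
        for j in range(scores.shape[1])
    ])

# Simulated responses: 500 respondents x 18 items, driven by a
# common latent ability so that the items cohere on one dimension.
rng = np.random.default_rng(0)
ability = rng.normal(size=(500, 1))
difficulty = rng.normal(size=(1, 18))
prob = 1 / (1 + np.exp(-(ability - difficulty)))
scores = (rng.random((500, 18)) < prob).astype(float)

print(round(coefficient_alpha(scores), 2))
print(np.round(part_whole_corrected(scores), 2))
```

Because every item loads on the same simulated ability, alpha comes out high and all part-whole corrected correlations are positive, mirroring the pattern one would expect for a consistent 18-item scale.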
5.2.2 Findings and conclusions
The calculated pairwise latent correlations between the different blocks ranged from
.925 to .959. The combined short versions show sufficiently high consistency
(alpha = .81; part-whole corrected item-test correlations from .23 to .55, with a
median of .38).
Thus, we can conclude that the items from all four projects form a common
latent dimension, i.e., the analytical problem-solving scale. This is true both
for the long and the short versions of the problem-solving instrument. This finding
is very much in line with results from earlier implementations of the project
approach (Ebach, Klieme and Hensgen, 2000; Klieme et al., 2001), where structural
equation models (SEM)
showed that problem-solving tests based on the project approach make up a unique
dimension. This result has important consequences for the validity of the ALL
problem-solving scale. It shows that the problem-solving test does not merely
measure the ability
to cope with certain special, context-dependent planning problems. Instead, the
items actually do tap a general competency for analytical reasoning and decision
making in
complex situations where problem solving is required.