Based on our definition of teamwork, the relevant literature on knowledge tests (Borman, 1991; Dye, Reck and McDaniel, 1993; Hunter, 1986), the domain we sought to measure, and our desire to assess applied knowledge, our questions require respondents to make situational judgments. In personnel selection, situational judgment questions have been shown to predict job performance in both written tests and structured interviews (M. Campion, J. Campion, and Hudson, 1993). Specific to teams, Stevens and Campion (1994b) reported significant criterion-related validities with supervisory and peer ratings of team performance for a thirty-five-item situational judgment test of teamwork knowledge (although this measure was also significantly correlated with respondents' general mental ability). Finally, situational judgment tests have a high degree of face validity for the respondent.

4.4.2 Item development

An item production grid was first constructed to guide item development (refer to Appendix 1.1). The grid was derived from the team skill definitions and the behavioral facets representing each skill. The version in Appendix 1.1 represents the key facets of teamwork in the U.S. and will be modified for different ALL countries. The grid is used to ensure that an adequate number of items is developed to cover the skill domains of interest and to specify clearly what each item is intended to measure.
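The coverage role of the item production grid can be illustrated with a minimal sketch. It assumes the grid is held as a mapping from (skill, facet) cells to a minimum item count. The skill and facet labels, the targets, and the item identifiers below are placeholders for illustration only; they are not the contents of Appendix 1.1.

```python
from collections import Counter

# Hypothetical grid: (skill, facet) -> minimum number of items wanted.
# Labels and targets are placeholders, not the contents of Appendix 1.1.
ITEM_GRID = {
    ("skill_a", "facet_1"): 2,
    ("skill_a", "facet_2"): 2,
    ("skill_b", "facet_1"): 2,
}


def coverage_gaps(grid, items):
    """Return {cell: shortfall} for every grid cell with too few items."""
    counts = Counter((item["skill"], item["facet"]) for item in items)
    return {
        cell: need - counts[cell]
        for cell, need in grid.items()
        if counts[cell] < need
    }


def unmapped_items(grid, items):
    """Return ids of items whose (skill, facet) is not a cell in the grid."""
    return [
        item["id"]
        for item in items
        if (item["skill"], item["facet"]) not in grid
    ]


# Hypothetical draft items, each tagged with the cell it is meant to measure.
draft_items = [
    {"id": "v1_q1", "skill": "skill_a", "facet": "facet_1"},
    {"id": "v1_q2", "skill": "skill_a", "facet": "facet_1"},
    {"id": "v1_q3", "skill": "skill_a", "facet": "facet_2"},
    {"id": "v1_q4", "skill": "skill_c", "facet": "facet_1"},
]

print(coverage_gaps(ITEM_GRID, draft_items))
print(unmapped_items(ITEM_GRID, draft_items))
```

The first check corresponds to ensuring an adequate number of items per skill domain; the second flags any item that does not map to a cell, that is, an item for which it is unclear what it is intended to measure.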

Item construction began with the creation of short vignettes. Each vignette describes a fictitious team performing a fictitious team task. Care was taken to ensure that the vignettes covered both work and non-work team situations, and that each team described conformed to the definition and characteristics of a "team." To date, five vignettes have been created: one focusing on a toy manufacturing team, one on a marketing team, one on a customer service team, and two on community-based teams (one assigned to review school performance and one assigned to clean a park).

Situational judgment items were developed for each vignette. Each item presents a situation, and respondents are asked to rate the effectiveness of each response option on a 5-point scale where 1 indicates "Extremely Bad" and 5 indicates "Extremely Good." To date, eight items have been developed for each vignette, resulting in a total of 40 items. Appendix 1.2 presents several example items. Appendix 1.3 lists all of the items developed thus far.
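As a rough illustration of the item pool described above, the sketch below tallies the items and checks that a set of ratings stays on the 5-point scale. The variable and function names are hypothetical, and only the two scale endpoints named in the text are labeled; the wording of the intermediate scale points is not assumed.

```python
# Endpoints of the 5-point effectiveness scale as given in the text.
RATING_MIN, RATING_MAX = 1, 5
ENDPOINT_LABELS = {RATING_MIN: "Extremely Bad", RATING_MAX: "Extremely Good"}

# The five vignettes created to date (short descriptive names, not official titles).
VIGNETTES = [
    "toy manufacturing team",
    "marketing team",
    "customer service team",
    "community team reviewing school performance",
    "community team cleaning a park",
]
ITEMS_PER_VIGNETTE = 8


def total_items(vignettes, items_per_vignette):
    """Total size of the item pool: vignettes times items per vignette."""
    return len(vignettes) * items_per_vignette


def validate_ratings(ratings):
    """Raise ValueError unless every rating is an integer from 1 to 5."""
    for rating in ratings:
        if not isinstance(rating, int) or not RATING_MIN <= rating <= RATING_MAX:
            raise ValueError(f"rating out of range: {rating!r}")
    return True


print(total_items(VIGNETTES, ITEMS_PER_VIGNETTE))  # 5 vignettes x 8 items = 40
print(validate_ratings([1, 3, 5, 4]))
```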

One issue that was considered, though not specifically accounted for during item development, was item difficulty. Unlike other measures included in ALL (e.g., literacy, numeracy, problem solving), the assessment of teamwork skills (or knowledge of teamwork skills) in the adult population internationally is a new undertaking. Therefore, no research was available to help identify the attributes that distinguish a more difficult teamwork item from a less difficult one. Certainly, varying how easy it is to identify the best response among a series of distractors would affect item difficulty. Though this could be done, the ability to respond to more difficult items constructed in this manner would not necessarily reflect more knowledge of teamwork skills. Such responses may instead reflect a test taker's ability to read, comprehend, and extract the correct information. More importantly, we must acknowledge that the difficulty of teamwork may lie in the execution of team behaviors rather than in the knowledge of what to do. All team members may know what to do in a given team situation, but only the best team members are willing and able to carry out these behaviors in a timely and appropriate fashion that maximizes teamwork. The paper-and-pencil measurement approach used in ALL does not allow a respondent's skills to be assessed in terms of actual outcome criteria.