Fig. 5. The four basic elements of reading ease: content, style, structure, and design.

Now that they had a measure of the difficulty of each passage, they could see which style variables changed as the passages got harder. They used correlation coefficients to show those relationships.
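As a rough illustration of what a correlation coefficient measures here, the following sketch computes a Pearson correlation between one style variable and a passage-level ease score. The data values and the names `avg_sentence_length` and `ease_score` are invented for illustration; they are not Gray and Leary's data, and the text does not say exactly which computation they used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical passages: average sentence length in words, and an
# invented ease score (higher means readers found the passage easier).
avg_sentence_length = [8, 12, 15, 20, 25, 30]
ease_score = [92, 85, 80, 70, 66, 55]

r = pearson_r(avg_sentence_length, ease_score)
print(f"r = {r:.2f}")  # negative: longer sentences go with lower ease
```

A negative value means the variable rises as ease falls, which is how the negative entries in the list of variables should be read.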

Of the 64 countable variables related to reading difficulty, those with correlations of .35 or above (in absolute value) were the following (p. 115):

  1. Average sentence length in words: -.52 (a negative correlation, that is, the longer the sentence the more difficult it is).
  2. Percentage of easy words: .52 (the larger the number of easy words the easier the material).
  3. Number of words not known to 90% of sixth-grade students: -.51
  4. Number of "easy" words: .51
  5. Number of different "hard" words: -.50
  6. Minimum syllabic sentence length: -.49
  7. Number of explicit sentences: .48
  8. Number of first, second, and third-person pronouns: .48
  9. Maximum syllabic sentence length: -.47
  10. Average sentence length in syllables: -.47
  11. Percentage of monosyllables: .43
  12. Number of sentences per paragraph: .43
  13. Percentage of different words not known to 90% of sixth-grade students: -.40
  14. Number of simple sentences: .39
  15. Percentage of different words: -.38
  16. Percentage of polysyllables: -.38
  17. Number of prepositional phrases: -.35

Although none of the variables studied had a correlation higher than .52, the authors knew that by combining variables they could reach higher levels of correlation. Because combining variables that were tightly related to each other did not raise the correlation coefficient, they needed to find which elements were highly predictive but not related to each other.
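The effect of intercorrelation can be sketched with the standard formula for the multiple correlation of one outcome with two predictors. The two predictor correlations, .52 and .48, are taken from the list above; the intercorrelation values of .90 and .10 are assumptions chosen only to show the contrast, and the function name `multiple_r` is hypothetical. This is not Gray and Leary's own calculation.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation R of an outcome with two predictors.

    r_y1, r_y2: correlation of each predictor with the outcome.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return sqrt(r_squared)


# Two predictors with correlations of .52 and .48 with the outcome.
# The intercorrelations below are assumed values, not reported data.
r_tightly_related = multiple_r(0.52, 0.48, 0.90)
r_nearly_unrelated = multiple_r(0.52, 0.48, 0.10)

print(f"predictors correlated .90: R = {r_tightly_related:.3f}")
print(f"predictors correlated .10: R = {r_nearly_unrelated:.3f}")
```

Under these assumed values, the tightly related pair barely improves on the single best predictor, while the nearly unrelated pair raises the combined correlation noticeably.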

Gray and Leary used five of the above variables (numbers 1, 5, 8, 15, and 17) to create a formula that has a correlation of .645 with reading-difficulty scores. An important characteristic of readability formulas is that adding variables may make a formula only slightly more accurate while making it much more difficult to measure and apply. Later formulas that use fewer variables may have higher correlations.