The Principles of Readability

The first readability formula

Bertha Lively and S. L. Pressey (1923) were concerned with the practical problem of selecting science textbooks for junior high school. The books were so overlaid with technical words that teachers spent all class time teaching vocabulary. They argued that it would be helpful to have a way to measure and reduce the "vocabulary burden" of textbooks.

Their article featured the first children's readability formula. It measured the number of different words in each 1,000 words and the number of words not on the Thorndike list of 10,000 words. Their method produced a correlation coefficient of .80 when tested on 700 books.
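The two counts described above can be sketched in a few lines of Python. This is only an illustration, not Lively and Pressey's actual procedure: the function name `vocabulary_burden`, the simple tokenizing rule, and the tiny stand-in word list are all assumptions, and the sketch assumes that the second count is of different words (not running words) missing from the list.

```python
import re


def vocabulary_burden(text, familiar_words, sample_size=1000):
    """Return (different_words, different_words_not_on_list) for a sample.

    familiar_words stands in for a reference list such as Thorndike's;
    it is a hypothetical parameter, not the real 10,000-word list.
    """
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())[:sample_size]
    distinct = set(words)
    unlisted = {w for w in distinct if w not in familiar_words}
    return len(distinct), len(unlisted)


# Toy example with a three-word stand-in for the reference list.
sample = "The cell divides and the nucleus divides"
familiar = {"the", "and", "cell"}
different, not_on_list = vocabulary_burden(sample, familiar)
print(different, not_on_list)
```

A higher value on either count would suggest a heavier vocabulary burden for the sampled passage.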

In reading research, investigators look for correlations instead of causes. A correlation coefficient (r) is a descriptive statistic that ranges from -1.00 to +1.00. Both +1.00 and -1.00 represent a perfect correlation; the sign shows whether the elements are positively or negatively correlated. A coefficient of 1.00 shows that, as one element changes, the other element changes in the same (+) or opposite (-) direction by a corresponding amount. A coefficient of .00 means no correlation, that is, no corresponding relationship through a series of changes.

For example, if a formula predicts a 9th-grade level of difficulty for a 7th-grade text and, at all grade levels, the error is in the same direction and by a corresponding amount, the correlation could be +1.00 or at least quite high. If, on the other hand, a formula predicts a 9th-grade level for a 6th-grade text and an 8th-grade level for a 10th-grade text, with similar variability in both directions, the correlation would be very low, or even 0.00.
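The two cases above can be checked numerically. The sketch below computes the standard Pearson correlation coefficient for two sets of made-up predictions; the grade levels are invented for illustration and do not come from any study, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


# Invented data: the actual grade levels of five texts.
actual = [4, 5, 6, 7, 8]
# A formula that is always two grades too high (error in one direction).
consistent = [6, 7, 8, 9, 10]
# A formula whose errors vary in both size and direction.
erratic = [7, 4, 8, 4, 7]

print(pearson_r(actual, consistent))
print(pearson_r(actual, erratic))
```

With these invented numbers, the consistently biased formula correlates perfectly with the actual grade levels even though every prediction is wrong, while the erratic formula shows no correlation at all. A high correlation therefore indicates consistency, not accuracy of the individual predictions.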

Squaring the correlation coefficient (r²) gives the proportion of the variance accounted for. For example, the Lively and Pressey formula above accounts for 64% (.80² = .64) of the variance in text difficulty.
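The arithmetic is simple enough to verify directly. The short sketch below squares a reported coefficient and prints it as a percentage; `variance_explained` is a hypothetical helper name used only for illustration.

```python
def variance_explained(r):
    """Proportion of variance accounted for by a correlation coefficient r."""
    return r * r


# The coefficient reported for the Lively and Pressey formula.
r = 0.80
print(f"r = {r:.2f}, r squared = {variance_explained(r):.0%}")
```

The sign of r does not matter here: a coefficient of -.80 would account for the same 64% of the variance.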

Other early school formulas

Mabel Vogel and Carleton Washburne (1928) of Winnetka, Illinois carried out one of the most important studies of readability. They were the first to study the structural characteristics of the text and the first to use a criterion based on an empirical evaluation of text. They studied ten different factors including kinds of sentences and prepositional phrases, as well as word difficulty and sentence length. Since, however, many factors correlated highly with one another, they chose four for their new formula.

Following Lively and Pressey, they validated their formula, called the Winnetka formula, against 700 books that had been named by at least 25 out of almost 37,000 children as ones they had read and liked. They also had the mean reading scores of the children, which they used as a difficulty measure in developing their formula. Their new formula correlated highly (r = .845) with the reading test scores.

With this formula, investigators knew that they could objectively match the grade level of a text with the reading ability of the reader. The match was not perfect, but it was better than subjective judgments. The Winnetka formula, the first one to predict difficulty by grade levels, became the prototype of modern readability formulas.

Vogel and Washburne's work stimulated the interest of Alfred S. Lewerenz (1929, 1929a, 1935, 1939), who produced several new readability formulas for the Los Angeles School District.

W. W. Patty and W. I. Painter (1931) found that the year of highest burden in high school is the sophomore year. They also developed a formula to measure the relative difficulty of textbooks based on a combination of word frequency, as determined by the Thorndike list, and vocabulary diversity (the number of different words in a text).