Journal of Literacy Research, Spring 2004

Phonics: A Large Phoneme-Grapheme Frequency Count Revised

Fry, Edward

This study is a summary and simplification of a very large phoneme-grapheme frequency count done by Hanna et al. (1966). Although the results and data from the original study have implications for teaching phonics and spelling, they were presented in a complicated and unwieldy manner. Moreover, the original study is out of print. This study, then, presents a succinct and simplified summary of the Hanna et al. results for researchers and teachers of reading and spelling.

Although phonics has been identified as an essential element of successful literacy instruction in the elementary grades (National Reading Panel, 2000), details about the nature and content of effective phonics (and spelling) programs have not been fully articulated. Questions about the content and sequencing of phonics instruction remain. One way to address them is to examine the phoneme-grapheme content of words used in instructional contexts. Hanna, Hanna, Hodges, and Rudorf (1966) conducted such a study, examining and counting every phoneme-grapheme correspondence in a 17,310-word vocabulary drawn from Thorndike and Lorge (1944). The current interest in phonics makes these nearly raw research data highly relevant to today's teachers, researchers, and curriculum developers in reading, spelling, and linguistics. Unfortunately, the data are reported in a 1,716-page U.S. Office of Education document that has long been out of print. A further complication is the researchers' unique coding system, which tracked such factors as separate counts of phonemes in stressed and unstressed syllables and the position of each phoneme within its syllable.

The present study has reanalyzed those basic data to simplify the original report and make it more usable. It addresses the following questions:

What are the most useful (highest frequency) phoneme-grapheme correspondences?

What are the most frequent ways of spelling those phonemes?

Answers to these questions could lead to better phonics and spelling instruction and could improve the phonics content of both commercial and teacher-made curriculum materials for reading and spelling instruction.

A similar set of questions was posed and answered (Fry, 1964) using an earlier Stanford phoneme-grapheme research project (Moore, 1951) that was based on a 3,000-word count. The Hanna et al. (1966) report used a much larger corpus (over 17,000 words) and a much more sophisticated coding system.

Earlier work in this area has focused on identifying spelling generalizations. For example, Abbott (2000) used the large Hanna et al. count to identify reliable spelling generalizations, basing them on those of Clymer (1963). Another study of generalizations, by Johnston (2001), used her own 3,000-word corpus plus data from Burmeister (1968) and Clymer. The research reported here is not based on generalizations; it is a strict phoneme-grapheme correspondence count.

The Hanna et al. study was one of the largest studies funded by the U.S. Office of Education up to that time. Its corpus consisted of 17,310 different words selected from the Thorndike-Lorge Teacher's Word Book of 30,000 Words (1944). Hanna et al. omitted foreign words, trade names, slang, and rare words. They classified each phoneme into a system of 22 vowels and 30 consonants according to the pronunciations given in a Merriam-Webster dictionary.

Vowel Classification

Hanna et al. began with the Merriam-Webster dictionary's classification of 33 vowel sounds. They soon found it unworkable and reduced it to 22 vowels to suit their computer algorithms. One of the original goals of their study was to determine how well a computer equipped with many algorithms (rules and phoneme-grapheme information) could spell each word correctly, given its dictionary pronunciation. The short answer is about 50% of the words.
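To make the idea of "computer spelling" concrete, here is a deliberately naive sketch in Python. It is not Hanna et al.'s algorithm, which used many rules involving stress and position; it simply writes each phoneme with one assumed "best" grapheme and scores the output against the correct spellings. The phoneme labels, the grapheme table, and the four-word lexicon are hypothetical toy data.

```python
# A naive baseline speller. NOT Hanna et al.'s algorithm, which used many
# rules (stress, position in the syllable, neighboring sounds).
# All data below are hypothetical toy values.

def spell(phonemes, best_grapheme):
    """Spell a word by writing each phoneme with its single 'best' grapheme."""
    return "".join(best_grapheme[p] for p in phonemes)


def accuracy(lexicon, best_grapheme):
    """Fraction of (word, phonemes) pairs the naive speller gets right."""
    correct = sum(spell(phonemes, best_grapheme) == word for word, phonemes in lexicon)
    return correct / len(lexicon)


# Hypothetical one-grapheme-per-phoneme table (phonemes as plain labels).
BEST_GRAPHEME = {"k": "c", "a": "a", "i": "i", "u": "u", "s": "s", "t": "t"}

# Hypothetical lexicon: each word paired with its phoneme sequence.
LEXICON = [
    ("cat", ["k", "a", "t"]),
    ("cut", ["k", "u", "t"]),
    ("sit", ["s", "i", "t"]),
    ("kit", ["k", "i", "t"]),  # /k/ is spelled K here, so the output "cit" is wrong
]

if __name__ == "__main__":
    print(accuracy(LEXICON, BEST_GRAPHEME))  # 0.75
```

Even in this toy case, one spelling per phoneme fails as soon as a phoneme has competing spellings, which helps explain why a full rule-based system still spelled only about half the words correctly.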

The present study has a different purpose: to give teachers and curriculum developers usable, scientifically based information, presented in a comprehensible manner, for developing phonics and spelling programs for beginning or remedial readers and spellers. It is a compromise between the tens of thousands of small facts in the giant Hanna et al. study and the realities of the classroom. After examining the data in this study, a teacher or curriculum developer should be better able to tell which phonics information is of little use and which is valuable. The study may draw criticism from both directions: linguists and phoneticians may find it too simple, while some classroom teachers may find it too complicated. But it does bring to light basic data that have long been buried in a difficult scientific report (Hanna et al., 1966).

The present study further simplifies the Hanna et al. vowel classification system to 17 categories, described below and shown in the tables. This was done to make the system more comprehensible and usable for teachers.

The simplification was not straightforward. For example, many dictionaries do not recognize a Long U; instead, they specify a Long OO sound, as in "moon" or "rule." In the present study I have combined Hanna et al.'s categories U1 and O6. Teachers can call the result Long U or Long OO, whichever suits them.

The Short U, as in "up," is also problematic. Phonetically it is similar to the schwa, the sound of the A in "ago." Technically a schwa occurs only in an unaccented syllable, but I have combined the Short U and the schwa (Hanna et al.'s categories U3 and Schwa) because for all practical purposes, and certainly for beginning readers and spellers, they sound the same.

The letter (grapheme) R and the phoneme /r/ cause considerable difficulty with vowels. When the letter A is followed by R, the A can represent two different phonemes: /ä/ as in "far" and /â/ as in "vary." I have kept these in two separate categories.

The letter O followed by R gives the O a broad sound, /ô/, as in "for." A few other graphemes also produce /ô/, and Hanna et al. split this sound across two categories, O2 and O5. I have consolidated them into the category Broad O.

The letter R also modifies the schwa or Short U sound when it follows E, I, or U, as in "her," "sir," and "fur"; Hanna et al. treat these as separate categories. The present study lists all of these R-modified sounds, which are very much like the /r/ in "red," in the Short U + R vowel category of Table 1.

To summarize, the major changes between the classification system used in the present study and the Hanna et al. study involve consolidating U1 & O6, U3 & Schwa, and O2 & O5. These changes are reflected in Table 1.
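The consolidations listed above amount to pooling the grapheme counts of merged categories. The Python sketch below shows one way to do that, assuming the counts are stored per category as a grapheme-to-frequency mapping. The Hanna et al. codes in the merge table come from the text; the simplified labels on the right, the unmerged "Short A" key, and all counts are hypothetical.

```python
from collections import Counter

# Merges named in the text; the simplified labels are illustrative names only.
MERGE_MAP = {
    "U1": "Long U (Long OO)",
    "O6": "Long U (Long OO)",
    "U3": "Short U (Schwa)",
    "Schwa": "Short U (Schwa)",
    "O2": "Broad O",
    "O5": "Broad O",
}


def consolidate(counts_by_category, merge_map=MERGE_MAP):
    """Pool grapheme counts from Hanna et al. categories into simplified categories.

    counts_by_category: {category code: {grapheme: frequency}}.
    Categories missing from merge_map are carried over under their own key.
    """
    merged = {}
    for category, grapheme_counts in counts_by_category.items():
        target = merge_map.get(category, category)
        # Counter.update adds counts, so a grapheme shared by two merged
        # categories (e.g. "u" below) gets the sum of both frequencies.
        merged.setdefault(target, Counter()).update(grapheme_counts)
    return merged


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not figures from Hanna et al.
    toy = {
        "U1": {"u": 30, "ew": 12},
        "O6": {"oo": 40, "u": 5},
        "O2": {"o": 60},
        "O5": {"au": 20, "o": 8},
        "Short A": {"a": 100},  # hypothetical unmerged category keeps its key
    }
    for category, counts in consolidate(toy).items():
        print(category, dict(counts))
```

The R-modified vowels after E, I, and U could be pooled the same way by adding their category codes to the merge table; the text does not give those codes, so they are left out here.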

Consonant Categories

Although consonants are far less problematic than vowels, they are not free of problems. In general, each consonant letter (grapheme) represents one phoneme, but there are exceptions. For example, three consonant letters represent more than one phoneme: (a) X, as in "box," represents /ks/; (b) Q, which never appears without a U, represents /kw/, as in "quick"; and (c) C represents two sounds: /s/, as in "city," and /k/, as in "cat."

The consonant digraphs CH, SH, voiced TH (as in "this"), and voiceless TH (as in "thin") each represent a distinct phoneme (and should have had their own letters). The digraph WH represents three phonemes: /h/, /w/, and /hw/. For example, the common word "what" can be correctly pronounced /wot/ or /hwot/, but "why" must begin with /hw/ and "who" must begin with /h/.

A few consonant phonemes are often spelled with unexpected graphemes. For example, the /j/ sound is more commonly spelled with G, as in "gem," than with the expected J, as in "just." (See Table 2 for a full presentation of consonants.)

The real work of this study involved producing the tables. They simplify and summarize hundreds of pages of data from the Hanna et al. study and answer basic questions about the significance of phonics content.

Tables 1 and 2 provide all the common spellings (graphemes) for all the phonemes. The frequencies represent how often each phoneme is spelled by a particular grapheme in a 17,310-word corpus. Frequencies less than 10 are omitted; these might properly be considered exceptions as they occur less than 0.006% of the time.

I have tried to give the vowel phonemes popular names to make the information more usable for teachers. The symbols within parentheses, however, are the Hanna et al. categories. The symbols between slash marks are those often used by dictionaries.

If a majority of words follow a common rule, the rule is listed in the Rule column of Table 1. Where no rule is apparent or specified, the correspondence is classified as follows:

Regular: usually the most common use

Unusual: frequency lower than Regular but more than 50

Rare: frequency less than 50 but 10 or more
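The classification above can be expressed as a small labeling function. This Python sketch is an illustration under stated assumptions, not the procedure used to build the tables: it always labels the most common spelling Regular (the text says "usually"), treats a count of exactly 50 as Unusual (the text leaves that value unassigned), and breaks ties for most common in favor of the first grapheme listed. The example counts are hypothetical.

```python
def classify(grapheme_counts):
    """Label the spellings of one phoneme as Regular, Unusual, or Rare.

    grapheme_counts: {grapheme: frequency} for a single phoneme.
    Thresholds follow the text: counts below 10 are omitted, the most common
    spelling is Regular, others are Unusual at 50 or more and Rare at 10-49.
    Assumptions: exactly 50 counts as Unusual; ties for most common go to
    the first grapheme listed (max() returns the first maximal item).
    """
    kept = {g: f for g, f in grapheme_counts.items() if f >= 10}
    if not kept:
        return {}
    regular = max(kept, key=kept.get)
    labels = {}
    for grapheme, freq in kept.items():
        if grapheme == regular:
            labels[grapheme] = "Regular"
        elif freq >= 50:
            labels[grapheme] = "Unusual"
        else:
            labels[grapheme] = "Rare"
    return labels


if __name__ == "__main__":
    # Hypothetical counts for the spellings of one vowel phoneme.
    print(classify({"a": 500, "ai": 80, "ay": 30, "eigh": 5}))
    # {'a': 'Regular', 'ai': 'Unusual', 'ay': 'Rare'}
```

Note that the spelling with a count of 5 simply disappears, mirroring the omission of frequencies below 10 from Tables 1 and 2.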

Although this classification is somewhat arbitrary, Tables 1 and 2 give the actual frequency of each listed correspondence, so users can base instructional decisions on an empirical frequency summary of phoneme-grapheme correspondences.

Tables 3 and 4 simplify the Hanna et al. data further. They answer the question, "What are the most useful (highest frequency) phoneme-grapheme correspondences?" Table 3 ranks the most common vowel phoneme-grapheme correspondences and gives a few common alternate spellings (less common correspondences). Table 4 ranks the consonants; unlike Table 3, it ranks them by grapheme rather than by phoneme. For most consonant correspondences, however, the phoneme and the grapheme are the same.
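The difference between ranking by phoneme-grapheme pair and ranking by grapheme can be sketched in a few lines of Python. The triples below are hypothetical, and summing a grapheme's counts across all the phonemes it spells is an assumption about how a grapheme-based ranking might be built; the text does not spell out the procedure behind Table 4.

```python
from collections import Counter

# Hypothetical (phoneme, grapheme, frequency) triples; not Hanna et al. figures.
TRIPLES = [
    ("/k/", "c", 300),
    ("/s/", "c", 100),
    ("/k/", "k", 150),
    ("/s/", "s", 350),
]


def rank_correspondences(triples):
    """Rank phoneme-grapheme pairs by frequency, highest first (Table 3 style)."""
    return sorted(triples, key=lambda t: t[2], reverse=True)


def rank_by_grapheme(triples):
    """Rank graphemes by total frequency across the phonemes they spell.

    Summing over phonemes is an assumed way to build a Table 4 style ranking.
    """
    totals = Counter()
    for _phoneme, grapheme, freq in triples:
        totals[grapheme] += freq
    return totals.most_common()


if __name__ == "__main__":
    print(rank_correspondences(TRIPLES))
    print(rank_by_grapheme(TRIPLES))  # [('c', 400), ('s', 350), ('k', 150)]
```

In this toy data the letter C ranks first by grapheme even though its most frequent pair, C for /k/, ranks only second, which is the kind of difference a grapheme-based ranking can reveal.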

Discussion

This study validates much that is common in phonics instruction, such as teaching short vowels before long vowels, the final E rule, the open syllable rule, the schwa, and R-modified vowels (see Table 3). The basic consonant correspondences are important, but there are some notable modifications and exceptions (see Table 4).

The many ways each phoneme can be spelled, as shown in Tables 1 and 2, provide content for spelling and reading programs. Because the amount of content must be limited, particularly for beginners, the tables also suggest what might be eliminated and what could be emphasized.

This study does not support teaching phonics or phoneme-grapheme correspondences in alphabetical order. Teaching /b/ is certainly not more important than teaching /r/ or /t/.

The findings of this study, particularly Tables 3 and 4, which show the ranking for vowels and consonants, can be used as a checklist or tool for evaluating published reading and spelling materials. Such a checklist could help district and state curriculum coordinators in developing language arts curricula. It may also help college instructors in developing reading teacher education curricula. It could also assist teachers of English-language learners.

There are about as many ways to teach spelling as there are to teach reading. Phonics is only one approach, and more often it is one part of a broader collection of techniques and content. The results from this study may inform such curricula. For example, this study might help teachers select categories for word sorts (Bear, Invernizzi, Templeton, & Johnston, 2000).

Phonemic awareness has come to prominence partly because of the National Reading Panel's meta-analysis (Ehri et al., 2001) and the summary by Smith, Simmons, and Kameenui (1995). This study provides some content for phonemic awareness instruction.

The National Reading Panel's (2000) meta-analysis of phonics instruction showed some benefit for systematic phonics instruction. The data in this study provide ample content for systematic or incidental phonics programs and lessons, such as the Making Words technique (Cunningham & Cunningham, 1992).

Several limitations of this study should be noted. First, although this study presents all major phoneme-grapheme correspondences, it is not a presentation of every possible phoneme-grapheme correspondence for either reading or spelling.

The present study counts all the phonemes in over 17,000 different words without regard to word frequency. For example, the digraph TH occurs in some very high frequency words such as "the," "this," and "that," yet it occurs in only 411 different words. One could argue that TH is therefore more important than its count of 411 indicates. But weighting each of the 17,000 different words by its frequency of occurrence is well beyond the scope of this study. Teachers usually address this problem by combining phonics instruction with the teaching of high-frequency sight words (e.g., the Instant Words; Fry, 1999). Perhaps a well-funded future researcher will take on this weighting (type/token) problem.
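The type/token distinction can be illustrated with a short Python sketch. It counts how many distinct words contain a letter sequence (types, the kind of count used in this study) and how often those words occur in running text (tokens, the weighting left to future work). The word frequencies are invented for illustration, and matching on spelling substrings is a crude stand-in for a real phoneme-level coding: it would, for example, wrongly count the T and H in "hothouse" as a TH digraph.

```python
def type_and_token_counts(word_freqs, letters):
    """Count words containing `letters` as types and as frequency-weighted tokens.

    word_freqs: {distinct word: frequency in running text} (hypothetical here).
    Type count: each distinct word counts once, as in the present study.
    Token count: each word is weighted by its frequency.
    Substring matching is a crude stand-in for phoneme-level coding.
    """
    matches = [w for w in word_freqs if letters in w]
    return len(matches), sum(word_freqs[w] for w in matches)


if __name__ == "__main__":
    # Hypothetical running-text frequencies.
    freqs = {"the": 1000, "that": 400, "this": 300, "thin": 5, "cat": 50, "run": 40}
    types, tokens = type_and_token_counts(freqs, "th")
    print(types, len(freqs), tokens, sum(freqs.values()))  # 4 6 1705 1795
```

Here TH appears in 4 of 6 word types but in 1,705 of 1,795 tokens (about 95%), which is the gap the TH example above points to.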

The present study does not deal with morphemes or meaning units (e.g., the prefix UN-, which has a very high frequency), rimes (phonograms), or other common letter clusters. As readers mature, they tend to use letter clusters larger than single graphemes (Adams, 1990).

One could object that the word database (Thorndike & Lorge, 1944) is dated. Nevertheless, new words tend to have lower frequencies than common structure words such as "is" or common base words such as "run," so most of the words in the Thorndike list remain relevant. Language does change, but it changes slowly, and neologisms seldom appear in phonics or elementary spelling lessons. Moreover, new words tend to use the same common correspondences; hence, it is unlikely that a newer English word list would substantially change the rank order of correspondences reported here.

Finally, the present study is not a child development study. It does not address which correspondences a beginning reader typically does or should learn first. However, the study does imply that the more common, higher frequency correspondences should be taught first.

Phonics and phonemic awareness are keys to successful literacy acquisition (National Reading Panel, 2000). The results of this study provide reading instructors and curriculum developers with practical information for improving the precision and effectiveness of instruction in these areas.

References

Abbott, M. (2000). Identifying reliable generalizations for spelling words: The importance of multilevel analysis. The Elementary School Journal, 102, 233-245.

Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: MIT Press.

Bear, D. R., Invernizzi, M., Templeton, S., & Johnston, F. (2000). Words their way. Upper Saddle River, NJ: Merrill.

Burmeister, L. E. (1968). Usefulness of phonic generalizations. The Reading Teacher, 21, 349-356.

Clymer, T. (1963). The utility of phonics generalizations in the primary grades. The Reading Teacher, 16, 252-258.

Cunningham, P. M., & Cunningham, J. W. (1992). Making words: Enhancing the invented spelling-decoding connection. The Reading Teacher, 46, 106-107.

Ehri, L. C., Nunes, S. R., Willows, D. M., Schuster, B. V., Yaghoub-Zadeh, Z., & Shanahan, T. (2001). Phonemic awareness instruction helps children learn to read: Evidence from the National Reading Panel's meta-analysis. Reading Research Quarterly, 36, 250-289.

Fry, E. B. (1964). A frequency approach to phonics. Elementary English, 22, 759-766.

Fry, E. B. (1999). 1000 Instant Words: The most common words for teaching reading, writing, and spelling. Westminster, CA: Teacher Created Materials.

Hanna, P. R., Hanna, J. S., Hodges, R. E., & Rudorf, E. H. (1966). Phoneme-grapheme correspondences as cues to spelling improvement. Washington, DC: U.S. Department of Health, Education, and Welfare.

Johnston, F. P. (2001). The utility of phonic generalizations: Let's take another look. The Reading Teacher, 55, 132-142.

Moore, J. T. (1951). Phonetic elements appearing in a 3000 word spelling vocabulary. Unpublished dissertation, Stanford University.

National Reading Panel. (2000). Report of the National Reading Panel; Reports of the subgroups. Washington, DC: National Institute of Child Health and Human Development Clearinghouse.

Smith, S. B., Simmons, D. C., & Kameenui, E. J. (1995). Synthesis of research on phonological awareness principles and implications for reading acquisition. Technical Report 21, National Center to Improve the Tools of Education, University of Oregon (A review of 29 references).

Thorndike, E. L., & Lorge, I. (1944). The teacher's word book of 30,000 words. New York: Teachers College Press.

Edward Fry

Rutgers University

Copyright National Reading Conference Spring 2004