as 0.85 (Bernstein, 1997). Others have used expert knowledge about systematic pronunciation errors made by adult L2 learners in order to diagnose and correct such errors. One such system is the European Community project SPELL for automated assessment and improvement of foreign language pronunciation (Hiller, Rooney, Vaughan, Eckert, Laver, & Jack, 1994). This system uses advanced speech processing and recognition technologies to assess pronunciation errors by L2 learners of English (French or Italian speakers) and provide immediate corrective feedback. One technique for detecting consonant errors induced by inter-language transfer was to include students' L1 pronunciations in the grammar network. In addition to the English /th/ sound, for example, the grammar network also
includes /t/ or /s/, that is, errors typical of Italian speakers of English. This system, although quite simple in its use of ASR technology, can be very effective in diagnosing and correcting known problems of L1 interference. However, it is less effective in detecting rare and more idiosyncratic pronunciation errors. Furthermore, it assumes that the phonetic system of the target language (e.g., English) can be accurately mapped to the learners' native language (e.g., Italian). While this assumption may work well for an Italian learner of English, it certainly does not for a Chinese learner; that is, there are sounds in Chinese that do not resemble any sounds in English.

A system for teaching the pronunciation of Japanese long vowels, the mora nasal, and mora
obstruents was recently built at the University of Tokyo. This system enables students to practice phonemic differences in Japanese that are known to present special challenges to L2 learners. It prompts students to pronounce minimal pairs (e.g., long and short vowels) and returns immediate feedback on segment duration. Based on the limited data available, the system seems quite effective at this particular task. Learners quickly mastered the relevant duration cues, and the time spent on learning these pronunciation skills was well within the constraints of Japanese L2 curricula (Kawai & Hirose, 1997). However, the study provides no data on long-term effects of using the system.

Supra-segmental Feedback. Correct usage of supra-segmental features such as intonation and stress has been
shown to improve the syntactic and semantic intelligibility of spoken language (Crystal, 1981). In spoken conversation, intonation and stress information helps listeners not only to locate phrase boundaries and word emphasis, but also to identify the pragmatic thrust of the utterance (e.g., interrogative vs. declarative). One of the main acoustical correlates of stress and intonation is fundamental frequency (F0); other acoustical characteristics include loudness, duration, and tempo. Most commercial signal processing software packages have tools for tracking and visually displaying F0 contours (see Figure 2). Such displays can be, and have been, used to provide valuable pronunciation feedback to students. Experiments have shown that a visual F0 display of supra-segmental features combined
with audio feedback is more effective than audio feedback alone (de Bot, 1983; James, 1976), especially if the student's F0 contour is displayed along with a native model. The feasibility of this type of visual feedback has been demonstrated by a number of simple prototypes (Abberton & Fourcin, 1975; Anderson-Hsieh, 1994; Hiller et al., 1994; Spaai & Hermes, 1993; Stibbard, 1996). We believe that this technology has good potential for incorporation into commercial CALL systems.

Other types of visual pronunciation feedback include the graphical display of a native speaker's face, the vocal tract, spectrum information, and speech waveforms (see Figure 2). Experiments have shown that a visual display of the talker improves not only word identification accuracy
(Bernstein & Christian, 1996), but also speech rhythm and timing (Markham & Nagano-Madesen, 1997). A large number of commercial pronunciation tutors on the market today offer this kind of feedback. Yet others have experimented with using a real-time spectrogram or waveform display of speech to provide pronunciation feedback. Molholt (1990) and Manuel (1990) report anecdotal success in using such displays, along with guidance on how to interpret them, to improve the pronunciation of supra-segmental features in L2 learners of English. However, the authors do not provide experimental evidence for the effectiveness of this type of visual feedback. Our own experience with real-time spectrum and waveform displays suggests their potential use as pronunciation feedback