Modern technologies in teaching FLT — страница 2
tutor--in other words, the linguistic intelligence of a human--only to conclude that the attempt to create an "intelligent language tutoring system is a fallacy" (p. 11). Because speech technology isn't perfect, it is of no use at all. If it "cannot account for the full complexity of human language," why even bother modeling more constrained aspects of language use (Higgins, 1988, p. vii)? This sort of all-or-nothing reasoning seems symptomatic of much of the latest pedagogical literature on CALL. The quest for a theoretical grounding of CALL system design and evaluation (Chapelle, 1997) tends to lead to exaggerated expectations as to what the technology ought to accomplish. When combined with little or no knowledge of the underlying technology, the inevitable result is disappointment. PRINCIPLES OF ASR TECHNOLOGY Consider the following four scenarios: A court reporter listens to the opening arguments of the defense and types the words into a steno-machine attached to a word-processor. A medical doctor activates a dictation device and speaks his or her patient's name, date of birth, symptoms, and diagnosis into the computer. He or she then pushes "end input" and "print" to produce a written record of the patient's diagnosis. A mother tells her three-year old, "Hey Jimmy, get me my slippers, will you?" The toddler smiles, goes to the bedroom, and returns with papa's hiking boots. A first-grader reads aloud a sentence displayed by an automated Reading Tutor. When he or she stumbles over a difficult word, the system highlights the word, and a voice reads the word aloud. The student repeats the sentence--this time correctly--and the system responds by displaying the next sentence. At some level, all four scenarios involve speech recognition. An incoming speech signal elicits a response from a "listener." In the first two instances, the response consists of a written transcript of the spoken input, whereas in the latter two cases, an action is performed in response to a spoken command. In all four cases, the "success" of the voice interaction is relative to a given task as embodied in a set of expectations that accompany the input. The interaction succeeds when the response--by a machine or human "listener"--matches these expectations. Recognizing and understanding human speech requires a considerable amount of linguistic knowledge: a command of the phonological, lexical, semantic, grammatical, and pragmatic conventions that constitute a language. The listener's command of the language must be "up" to the recognition task or else the interaction fails. Jimmy returns with the wrong items, because he cannot yet verbally discriminate between different kinds of shoes. Likewise, the reading tutor would miserably fail in performing the court-reporter's job or transcribing medical patient information, just as the medical dictation device would be a poor choice for diagnosing a student's reading errors. On the other hand, the human court reporter--assuming he or she is an adult native speaker--would have no problem performing any of the tasks mentioned under (1) through (4). The linguistic competence of an adult native speaker covers a broad range of recognition tasks and communicative activities. Computers, on the other hand, perform best when designed to operate in clearly circumscribed linguistic sub-domains. Humans and machines process speech in fundamentally different ways (Bernstein & Franco, 1996). Complex cognitive processes account for the human ability to associate acoustic signals with meanings and intentions. For a computer, on the other hand, speech is essentially a series of digital values. However, despite these differences, the core problem of speech recognition is the same for both humans and machines: namely, of finding the best match between a given speech sound and its corresponding word string. Automatic speech recognition technology attempts to simulate and optimize this process computationally. Since the early 1970s, a number of different approaches to ASR have been proposed and implemented, including Dynamic Time Warping, template matching, knowledge-based expert systems, neural nets, and Hidden Markov Modeling (HMM) (Levinson & Liberman, 1981; Weinstein, McCandless, Mondshein, & Zue, 1975; for a review, see Bernstein & Franco, 1996). HMM-based modeling applies sophisticated statistical and probabilistic computations to the problem of pattern matching at the sub-word level. The generalized HMM-based approach to speech recognition has proven an effective, if not the most effective, method for creating high-performance
Похожие работы
- Рефераты