Analogical Modeling and the English Past Tense
A Reply to Jaeger et al. 1996

Steve Chandler
Department of English
University of Idaho
Moscow, Idaho 83844-1102
chandler@uidaho.edu

Royal Skousen
English Department
Brigham Young University
Provo, Utah 84602-6280
royal_skousen@byu.edu

7 February 1997
Abstract

In a recent article in Language, Jeri Jaeger and her colleagues have presented behavioral data (reaction times) and PET scan data on the generation of English past-tense forms which they argue are inconsistent with various 'single system' approaches to modeling English verb form morphology, including Skousen's analogical approach. Unfortunately, the article mischaracterizes Skousen's approach so seriously as to make it impossible for Jaeger et al. to test it by comparing their results to the predictions of the model. A more accurate characterization of the analogical approach reveals that far from disconfirming it, the data reported in Jaeger et al. are completely compatible with the published descriptions of it.

Over the past ten years, the acquisition and use of English past-tense verb forms have become the focus of intense study and debate. The debate is important because at issue is nothing less than the proper characterization of the cognitive structures and processes underlying language. In a recent contribution to this debate, published in Language, Jeri Jaeger and her colleagues (1996) have presented behavioral data (reaction times to various linguistic tasks) and PET scan data on the generation of English past-tense verb forms that, they argue, are inconsistent both with the connectionist approaches of Rumelhart and McClelland (1986) and Plunkett and Marchman (1991) and with various other 'single system' approaches, including Skousen's analogical modeling approach (Skousen 1989, 1992). The authors conclude that their study confirms the 'two system' approach of traditional linguistic analysis proposed in Pinker and Prince (1988, 1991) and elsewhere, while disconfirming the single system approaches.

Unfortunately, the study appears so flawed that it is hard for us to know what conclusions to draw from its data. Of more immediate concern to us, however, is that the authors' discussion of one of the alternative theoretical models they claim to have disconfirmed shows such a serious misunderstanding of that theory that we feel compelled to respond to it. Indeed, as we shall explain below, far from disconfirming Skousen's analogical approach, the data reported by Jaeger et al. are completely compatible with it.

Several times in their article, Jaeger et al. repeat a characterization of Skousen's analogical approach which shows clearly that they do not understand the model, and that misunderstanding leads them both to incorrect statements about what the model predicts for their procedures and to incorrect interpretations of their reaction time and PET scan data with respect to the model. The authors group the analogical approach with other so-called 'single-system theories' (p. 453) even though they in fact several times attribute two different processes or systems to it (pp. 457, 477). According to their interpretation, one system simply looks up instances of previously experienced past-tense forms. The other system presumably involves some sort of additional process for analogizing from the pre-existing forms to novel or, in this case, nonce forms. Something like the simple listing and lookup of exemplars assumed by Jaeger et al. is a common interpretation of instance-based cognitive models in the experimental psychology literature on instance-based learning (see Shanks 1995 for a recent critical review of that literature). What Skousen has developed, however, is a mathematically explicit (Skousen 1992) and computationally tested (Skousen 1989) alternative to the simple listing view of instance-based learning.

Based on their understanding of analogical modeling, Jaeger et al. predict that all processing of real verbs ought to occur with approximately equal RTs and ought to activate approximately the same cortical areas, while the additional processing required for the nonce verb forms ought to take longer and involve more cortical area (p. 458). Unfortunately for their conclusions, the interpretation of analogical modeling provided by Jaeger et al. is wrong. While it is true that Skousen's analogical model consults individual instances, or exemplars, of linguistic forms held in long-term memory, what Skousen actually describes is a uniform procedure for analogizing from members of that database to new occurrences, and presumably the procedure operates uniformly in deriving the past forms of familiar verbs as well as novel verb forms. Thus, the model does not simply choose some pre-existing, pre-inflected lexical item to articulate; it goes through an analogical procedure for determining how the new input is to be inflected. Skousen's model describes explicitly how the process chooses among the instances stored in long-term memory to arrive at an analogical set (the set of candidate exemplars that are to be compared to the new occurrence) and how it chooses a specific instance from the analogical set to serve as the model for analogizing to the new occurrence. Now, while the procedure is uniform, it does not follow that either the time required for processing or the cortical areas activated would always be uniform, because different inputs and different tasks will generate different analogical sets and may involve different aspects of the data (instances) stored in long-term memory.
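To make these two steps concrete, the following is a minimal Python sketch of a simplified analogical model, written under assumptions of our own; it is not Skousen's published program (Skousen 1989). We assume that each exemplar's subcontext is the set of feature positions it shares with the new occurrence, that a supracontext is homogeneous when its exemplars either all share one outcome or all fall in a single subcontext, and that each exemplar in a homogeneous supracontext of m exemplars receives m pointers. The five-verb toy database, its (onset, vowel, coda) segmentation, and the outcome labels are hypothetical. The exemplars that receive pointers form the analogical set, and one of them is then chosen at random with probability proportional to its pointers.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database: (onset, vowel, coda) of a verb stem and its
# past-tense pattern. Segmentation and outcome labels are illustrative only.
DATA = [
    (("s", "i", "ng"), "ablaut"),  # sing -> sang
    (("r", "i", "ng"), "ablaut"),  # ring -> rang
    (("w", "i", "ng"), "-ed"),     # wing -> winged
    (("p", "i", "n"), "-ed"),      # pin -> pinned
    (("s", "i", "p"), "-ed"),      # sip -> sipped
]


def analogical_set(given, exemplars):
    """Map exemplar index -> pointer count under the simplified assumptions."""
    n = len(given)
    # Subcontext of each exemplar: the positions where it matches the given context.
    subcontext = [frozenset(i for i in range(n) if feats[i] == given[i])
                  for feats, _ in exemplars]
    pointers = Counter()
    # Supracontexts: every subset of positions that must match the given context.
    for k in range(n + 1):
        for positions in combinations(range(n), k):
            required = frozenset(positions)
            members = [j for j, sub in enumerate(subcontext) if required <= sub]
            if not members:
                continue
            outcomes = {exemplars[j][1] for j in members}
            subcontexts = {subcontext[j] for j in members}
            # Homogeneous: a single outcome, or every member in one subcontext.
            if len(outcomes) == 1 or len(subcontexts) == 1:
                for j in members:
                    # Assumed quadratic measure: one pointer per member of the supracontext.
                    pointers[j] += len(members)
    return dict(pointers)


def outcome_probabilities(pointers, exemplars):
    """Share of all pointers that supports each outcome."""
    totals = Counter()
    for j, count in pointers.items():
        totals[exemplars[j][1]] += count
    grand = sum(totals.values())
    return {o: Fraction(c, grand) for o, c in totals.items()} if grand else {}


def choose_exemplar(pointers, rng=random):
    """Pick one exemplar from the analogical set, weighted by its pointers."""
    indices = list(pointers)
    return rng.choices(indices, weights=[pointers[j] for j in indices], k=1)[0]


if __name__ == "__main__":
    nonce = analogical_set(("gl", "i", "ng"), DATA)
    print(nonce)                                   # {0: 6, 1: 6, 2: 6}
    print(outcome_probabilities(nonce, DATA))      # {'ablaut': Fraction(2, 3), '-ed': Fraction(1, 3)}
    print(analogical_set(("s", "i", "ng"), DATA))  # {0: 2}
```

Applied to the nonce stem gling, the sketch builds an analogical set from sing, ring, and wing; applied to the familiar stem sing, the same procedure yields an analogical set containing only sing itself. The procedure is identical in both runs, but the analogical sets differ.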

Turning now to specific results reported in Jaeger et al., we will explain how the analogical model is completely compatible with those data. The authors report that it took their subjects about 38 msec longer to generate and say a regular past tense for a verb than it took them simply to read a corresponding basic verb aloud, about 56 msec longer to generate and say a past tense for a nonce verb than to read a basic nonce verb aloud, and about 238 msec longer to generate and say an irregular past-tense form than simply to read a basic verb aloud (the read-aloud list mixed regular and irregular English verbs) (p. 464). As they note, these results are consistent with other RTs reported in the literature.

On the basis of their misunderstanding of the analogical model, Jaeger et al. conclude that their RT data are only partially compatible with it (p. 481). They find the slightly longer RTs for the nonce forms than for the regular real verbs compatible with the model, presumably reflecting a slightly longer time for analogizing from known words to the nonce forms. However, they find the much longer RTs for the irregular verbs incompatible with an analogical approach because they assume simple lexical lookup of both regular and irregular forms from the same list of previously experienced exemplars (pp. 458, 477).

Unfortunately, Jaeger et al.'s assumptions about and interpretation of analogical modeling and RTs show clearly that they are unfamiliar with Analogy and Structure (Skousen 1992). In section 16.4 of that work ('Efficiency and Processing Time'), Skousen sets out a simple interpretation of analogical modeling that predicts exactly the kinds of RT data discussed in Jaeger et al. If the model is allowed to remember already determined homogeneous (supra)contexts, two properties follow (Skousen 1992:362):

First, it takes longer to process a non-occurring given context than an occurring one (all other things being equal). And second, the closer a given context is to an exceptionally behaving context, the longer the processing time.

In that section these theoretical results are compared with the RTs found in Glushko (1979) for the pronunciation of spelled words. Glushko found that exceptional pseudowords (such as heaf) took the longest to pronounce, then exceptional actual words (such as deaf), closely followed by regular pseudowords (such as hean), with the fastest pronunciation given by regular actual words (such as dean). The RT results reported in Jaeger et al. for past-tense formation parallel the pronunciation results found by Glushko, except that, unfortunately, Jaeger et al. chose not to test the RTs for nonce verbs that would be close to exceptional actual verbs (i.e., irregular verbs). Glushko, like Jaeger et al., made the oft-repeated bold claim that analogy is unable to predict the appropriate RTs for such tasks, yet what Skousen (1992) demonstrated was that this result follows only when one does not even try to set up an explicit interpretation of analogical modeling that would theoretically predict reaction times.
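The parallel can be checked mechanically, but only as a comparison of rank orders, since Glushko's figures are raw pronunciation latencies while the Jaeger et al. figures are differences from a read-aloud baseline. In the small Python sketch below, the mapping of the three verb conditions onto Glushko's four cells is our own reading of the comparison, and None marks the cell Jaeger et al. did not test.

```python
# Glushko's (1979) conditions, fastest to slowest, as described above:
# lexical status (word vs. pseudoword) crossed with regular vs. exceptional spelling.
GLUSHKO_ORDER = [
    "regular word (dean)",
    "regular pseudoword (hean)",
    "exceptional word (deaf)",
    "exceptional pseudoword (heaf)",
]

# Assumed mapping onto the Jaeger et al. conditions; None marks the untested cell
# (nonce verbs close to irregular verbs).
JAEGER_CELL = {
    "regular word (dean)": "regular verbs",
    "regular pseudoword (hean)": "nonce verbs",
    "exceptional word (deaf)": "irregular verbs",
    "exceptional pseudoword (heaf)": None,
}

# Approximate extra msec to generate and say the past tense, relative to
# reading the basic form aloud (Jaeger et al. 1996:464).
JAEGER_EXTRA_MS = {"regular verbs": 38, "nonce verbs": 56, "irregular verbs": 238}

# The tested Jaeger et al. conditions, listed in Glushko's fastest-to-slowest order.
tested = [JAEGER_CELL[c] for c in GLUSHKO_ORDER if JAEGER_CELL[c] is not None]

# The orders agree if sorting by the Jaeger et al. costs leaves the list unchanged.
same_order = tested == sorted(tested, key=JAEGER_EXTRA_MS.get)
print(tested)      # ['regular verbs', 'nonce verbs', 'irregular verbs']
print(same_order)  # True
```

The comparison says nothing about the missing fourth cell; it only confirms that the three tested conditions fall in the same order as Glushko's corresponding conditions.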

Turning now from the RT data to the PET scan results, the authors again assume that the analogical model looks up an instance from a uniform listing of pre-inflected verb forms and then feeds that instance into an articulatory program. Based on that assumption, they predict that the process ought to occur uniformly in approximately the same cortical areas. Since their results show that different areas, and significantly different amounts of cortical area, were activated for the three sets of test stimuli, they conclude that the analogical approach has been disconfirmed. In particular, they conclude that their finding that the processing of nonce verbs activated less cortical area than did the processing of irregular but real verbs contradicts outright their interpretation of the analogical model, as does the finding that irregular verbs activated larger and very different cortical areas than regular verbs did, even though both presumably involved consulting the same list of pre-inflected verb forms.

We believe, however, that the analogical model and the PET scan data reported in the article support the following alternative interpretation at least equally well. The past-tense forms for the two sets of regular forms (real and nonce) can be derived through the comparison of graphophonemic representations alone. The scan data suggest that even though subjects tried to interpret meanings for the nonce forms, they nonetheless were operating on graphophonemic forms alone. Indeed, in the case of the nonce forms, they had nothing more to work with. Thus, all the analogical procedure had to do was choose a past-tense form and then feed that information into an articulatory program, which possibly already included an articulatory program for the stem waiting in a buffer memory. The authors suggest elsewhere (pp. 470, 476) that their subjects may have been operating with something like this sort of task-specific response strategy. Independent evidence from transcortical sensory dysphasia (Whitaker 1976) and from clang responses to word association tasks (Clark 1971) also suggests that people can transfer phonological input directly to an output articulatory planning program without consulting meaning. Indeed, this issue of the temporal planning of articulatory programs confounds Jaeger et al.'s interpretation of both their RT and PET scan data throughout their article.

In the case of the irregular verbs, the subjects have to go from one lexical input to a different lexical output, mediated possibly, perhaps even probably, by meaning. That is, the subjects read one word and then have to determine analogically how the past tense of that lexical item's meaning is expressed as another word. Thus, even though the analogical process may be the same for the two different tasks (that is, the two different types of verbs), the analogical sets for the irregulars would include meaning information that those for the regulars would not and those for the nonce forms could not. There is, of course, abundant independent evidence that phonological and semantic information about words is stored in different cortical areas (e.g., Ojemann & Creutzfeldt 1987; Hart & Gordon 1992; Manning & Campbell 1992; Damasio & Tranel 1993), and there is therefore nothing inconsistent in the finding that different cortical areas store different kinds of lexical information, which can be used to construct the analogical sets for the different kinds of analogizing tasks.

Analogical modeling can, we believe, be used to predict both paradigmatic and syntagmatic relationships in language. Most of our own work in analogical modeling has involved predicting or interpreting one outcome at a time (such as determining the past tense of a given verb form). As pointed out in Skousen (1995:230), 'language obviously occurs in time and involves the prediction or interpretation of overlapping sequences of sounds, words, phrases, sentence, and discourse elements.' The paradigmatic system developed in Derwing and Skousen (1994) for analogically predicting the English past tense assumes that every distinct past-tense form is predicted within a single unit of processing and production. In other words, no assumption was made in that article about how a given past tense might be implemented in time.

The important point is that analogical modeling presumes (as does every theory of language) some cutting or splicing up of the speech flow. Paradigmatic predictions are used as input for syntagmatic predictions. Analogical modeling does not presume a unitary instance as its database. One could, as a reductio ad absurdum, view a given individual's language experience as an extraordinarily long string of sounds, beginning with birth and going up to the present moment. Obviously, there must be some breaking up of the speech flow into units. What these units may be remains an empirical question, but we believe that analogy can be used to predict both the choice of those paradigmatic units and their sequencing in time.

In concluding, we wish to emphasize that Skousen's analogical model posits a uniform procedure for interpreting input by comparing it analogically with specific instances of similar experiences stored in long-term memory. The general procedure is uniform, but differences in task requirements or in stimuli may cause the procedure to apply to different databases to different ends. We have argued elsewhere in considerable detail that Skousen's analogical approach provides an empirically and theoretically superior alternative both to the traditional rule-governed accounts of language behavior and to the various connectionist approaches addressed by Jaeger and her colleagues (Skousen 1989, 1992, 1995; Chandler 1993, 1995). Skousen (1989) has published a computer program for testing the predictions of his model explicitly, and he has applied it to the interpretation of phonetic features and of morphological data. Robinson (1995) has extended the model to a more general theory of signs, and Jones (1996) has recently applied it to predicting syntagmatic relationships in syntactic and semantic problems of natural language processing. The growing interest in the model shows that it merits serious consideration as an alternative both to symbolic-rule approaches and to connectionist approaches. We believe that the discussion of the analogical approach by Jaeger and her colleagues shows that they have dismissed it too hastily, without first taking care to understand it thoroughly.

References

Chandler, Steve. 1993. Are rules and modules really necessary for explaining language? Journal of Psycholinguistic Research 22.593-606.

---. 1995. Non-declarative linguistics: Some neuropsychological perspectives. Rivista di Linguistica 7.233-248.

Clark, Herbert H. 1971. Word associations and linguistic theory. New horizons in linguistics, ed. by John Lyons, 271-287. Harmondsworth, England: Penguin Books Ltd.

Damasio, Antonio R., and Daniel Tranel. 1993. Nouns and verbs are retrieved with differently distributed neural systems. Proceedings of the National Academy of Sciences, USA, 90.4957-4960.

Derwing, Bruce L., and Royal Skousen. 1994. Productivity and the English past tense. The reality of linguistic rules, ed. by Susan D. Lima, Roberta L. Corrigan, and Gregory K. Iverson, 193-218. Amsterdam: John Benjamins Publishing Co.

Glushko, Robert J. 1979. The psychology of phonography: Reading aloud by orthographic activation and phonological synthesis (Ph.D. dissertation, University of California at San Diego).

Hart, John Jr., and Barry Gordon. 1992. Neural subsystems for object knowledge. Nature 359.60-64.

Jaeger, Jeri J., Alan H. Lockwood, David L. Kemmerer, Robert D. Van Valin, Brian W. Murphy, and Hanif G. Khalak. 1996. A positron emission tomographic study of regular and irregular verb morphology in English. Language 72.451-497.

Jones, Daniel. 1996. Analogical natural language processing. London: UCL Press.

Manning, Lilianne, and Ruth Campbell. 1992. Optic aphasia with spared action naming: A description and possible loci of impairment. Neuropsychologia 30.587-592.

Ojemann, George A., and Otto D. Creutzfeldt. 1987. Language in humans and animals: Contribution of brain stimulation and recording. Handbook of physiology, section 1: the nervous system, vol. v, higher functions of the brain, part 1, ed. by Frederick Plum, 675-699. Bethesda, MD: American Physiological Society.

Pinker, Steven, and Alan Prince. 1988. On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition 28.73-193.

Pinker, Steven, and Alan Prince. 1991. Regular and irregular morphology and the psychological status of rules of grammar. Berkeley Linguistics Society 17.230-251.

Plunkett, Kim, and Virginia A. Marchman. 1991. U-shaped learning and frequency effects in a multi-layered perceptron. Cognition 39.43-102.

Robinson, Derek. 1995. Index and analogy: A footnote to the theory of signs. Rivista di Linguistica 7.249-272.

Rumelhart, David E., and Jay L. McClelland. 1986. On learning the past tenses of English verbs. Parallel distributed processing: Explorations in the microstructure of cognition, vol. 2, ed. by Jay L. McClelland and David E. Rumelhart, 216-271. Cambridge, MA: MIT Press.

Shanks, David R. 1995. The psychology of associative learning. Cambridge: Cambridge University Press.

Skousen, Royal. 1989. Analogical modeling of language. Dordrecht: Kluwer Academic.

---. 1992. Analogy and structure. Dordrecht: Kluwer Academic.

---. 1995. Analogy: A non-rule alternative to neural networks. Rivista di Linguistica 7.213-232.

Whitaker, Harry A. 1976. A case of the isolation of language function. Studies in neurolinguistics, vol. 1, ed. by Harry A. Whitaker and Haiganoosh Whitaker, 1-58. New York: Academic Press.