Analogical Modeling of Language

Royal Skousen
1989
Kluwer Academic Publishers
Dordrecht
ISBN 0-7923-0517-5

INTRODUCTION

One important purpose of this book is to compare two completely different approaches to describing language. The first of these approaches, commonly called structuralist, is the traditional method for describing behavior. Its methods are found in many diverse fields -- from biological taxonomy to literary criticism. A structuralist description can be broadly characterized as a system of classification. The fundamental question that a structuralist description attempts to answer is how a general contextual space should be partitioned. For each context in the partition, a rule is defined. The rule either specifies the behavior of that context or (as in a taxonomy) assigns a name to that context. Structuralists have implicitly assumed that descriptions of behavior should not only be correct, but should also minimize the number of rules and permit only the simplest possible contextual specifications. It turns out that these intuitive notions can actually be derived from more fundamental statements about the uncertainty of rule systems.

Traditionally, linguistic analyses have been based on the idea that a language is a system of rules. Saussure, of course, is well known as an early proponent of linguistic structuralism, as exemplified by his characterization of language as "a self-contained whole and principle of classification" (Saussure 1966:9). Yet linguistic structuralism did not originate with Saussure -- nor did it end with "American structuralism". The Neogrammarian approach to historical change -- "phonetic laws have no exception" -- is clearly structuralist. And it must be recognized that Chomsky himself is a structuralist par excellence. His attack against American structuralists was not an attack against structuralism per se, but instead was an attack against some of the methodological assumptions that these linguists had espoused. For Chomsky (and virtually all other linguists today) there is no doubt that language is rule governed and that language behavior must be accounted for in terms of explicit rules. As a corollary, language acquisition is viewed as learning rules and language change as a change in the rules.

Nonetheless, a number of conceptual and empirical problems arise when we try to describe language in terms of rules. In order to eliminate these difficulties, this book introduces a new way of accounting for language behavior, one that can be called analogical. But unlike the imprecise and impressionistic appeals to "analogy" that have characterized language studies in the past, the analogical approach that this book proposes is based on an explicit definition of analogy. The main problem with traditional analogy is that there is no limit to its use: almost any form can be used to explain the behavior of another form, providing there is some similarity, however meager, between the two forms. Nor does this book use analogy to handle only the cases that the rules cannot account for. Instead, everything is considered analogical, even the cases of complete regularity.

Basically, an analogical description predicts behavior by means of a collection of examples called the analogical set. For a given context x, we construct the analogical set for x by looking through the data for (1) classes of examples that are the most similar to x and (2) more general classes of examples which behave like those examples most similar to x. The probability that a particular kind of occurrence will serve as the analogical model depends on several interrelated factors:

In many cases the predicted behavior is nearly the same no matter whether a rule approach or an analogical one is used, but conceptually the two approaches are vastly different. Some of the important differences between these two approaches are:

STRUCTURALIST APPROACH | ANALOGICAL APPROACH
a system of rules | an extensive data set derived from actual language data
based on types of behavior | based on tokens of behavior
contextual space is partitioned into rule contexts | contextual space remains atomistic
global, macroscopic | local, microscopic
need for a learning strategy to discover the rules from the data | need for a strategy to access the data set and analyze the data
static, rigid | dynamic, flexible
usage: find the correct rule that applies to the given context | usage: find an appropriate example to model behavior after
need to know how the rules interact | need to be able to access data quickly
transitions in predicted behavior ("boundaries") are sharp and precise | transitions in predicted behavior are gradual and fuzzy
rule governed | appears to be rule governed
general predictions can be made from rules alone | predictions can be made only in terms of given contexts
explicit, direct | implicit, indirect
usage is a function of the description | usage is the description

Many of these same distinctions are found in Winograd's notion of "declarative" versus "procedural" knowledge (Winograd 1975:185-191) or in Rumelhart's distinction between factual knowledge ("knowledge that") and procedural knowledge ("knowledge how") (Rumelhart 1979:2). Recent connectionist models of behavior share many procedural properties with analogical models. But as we shall see in the course of this book, there are significant differences between connectionist and analogical approaches to language.

This book concentrates on language, even though many of the results are applicable to more general kinds of behavior. Moreover, I do not discuss the properties of structuralism in any detail; and the underlying statistical basis for my analogical approach is only marginally considered in this book. These fundamental matters are taken up in my book Analogy and Structure. In that work (as yet unpublished) I discuss such topics as measures of uncertainty, optimality of rule descriptions, and the notion of natural statistics in analogical descriptions. Analogy and Structure is an important precursor to this book and I will occasionally summarize its findings in the following pages.

CHAPTER 1: There are three kinds of behavior that a theory of language description must account for: (1) categorical, (2) exceptional/regular, and (3) idiosyncratic. In order to help visualize the differences between these types, I first construct simple artificial examples of these three basic types of behavior. I also show how a rule approach would describe these basic types.

I then turn to the empirical and conceptual problems that rule approaches have. First of all, there is empirical evidence from language behavior that the boundaries between different types of behavior are not well-defined. I consider a number of examples from English: children's use of the indefinite article (a/an), misspellings, morphological extensions, pronunciation of nonce spellings, experiments with voicing onset time, and Labov's semantic experiments.

In addition, there are some conceptual problems with rule approaches. One particular difficulty is the indeterminacy that occurs when either no rule or more than one rule is applicable. Yet evidence from language usage clearly demonstrates that speakers can readily deal with cases of missing information and ill-formed contexts. In addition, rule approaches have difficulty in dealing with redundancy.

CHAPTER 2: I begin this chapter by giving an overview of how the analogical approach works. I then go through a step-by-step description of how to construct the analogical set for a given context. Important constructs are defined: data set, network of pointers, agreement and disagreement, uncertainty, given context, supracontext, homogeneity and heterogeneity, and rule of usage. One important result is that there is no need for the traditional methods of statistical analysis in order to determine contextual heterogeneity. Instead, I propose a very simple but powerful decision procedure that minimizes the number of disagreements. I also develop a few of the properties of analogical descriptions (such as the exponential effect in saturated deterministic fields of data).
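
As a rough illustration of two of the constructs named above, the following Python sketch enumerates the supracontexts of a given context and counts the disagreements among the data items that each supracontext contains. The data set, the variable encoding, and the function names are all hypothetical. Counting a disagreement as an ordered pair of items with different outcomes is an assumption made for this sketch, not the definition developed in chapter 2, and the sketch does not implement the test for homogeneity.

```python
from itertools import product

# Hypothetical toy data set: (context, outcome); each context has three variables.
DATA = [
    (("a", "b", "c"), "x"),
    (("a", "b", "d"), "x"),
    (("a", "e", "c"), "y"),
    (("f", "b", "c"), "x"),
]

WILDCARD = None  # marks a variable that a supracontext ignores


def supracontexts(given):
    """Yield every supracontext of `given`: each variable is either kept or ignored."""
    for mask in product([True, False], repeat=len(given)):
        yield tuple(v if keep else WILDCARD for v, keep in zip(given, mask))


def members(supra, data):
    """Return the data items whose contexts match the supracontext."""
    return [
        (ctx, out)
        for ctx, out in data
        if all(s is WILDCARD or s == c for s, c in zip(supra, ctx))
    ]


def disagreements(items):
    """Count ordered pairs of items whose outcomes differ (an assumed measure)."""
    return sum(1 for _, a in items for _, b in items if a != b)


if __name__ == "__main__":
    for supra in supracontexts(("a", "b", "c")):
        items = members(supra, DATA)
        print(supra, len(items), disagreements(items))
```

A given context with n variables has 2^n supracontexts under this encoding, which is one reason the later chapters care about how quickly the data can be accessed.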

At the end of the chapter, I apply the analogical approach to the three simple examples of categorical, exceptional/regular, and idiosyncratic behavior from chapter 1. The analogical approach captures the basic "rule governedness" of these examples, but also permits leakage across boundaries. I also briefly discuss the various factors that determine the slope of transition at boundaries. And in contrast to rule approaches, an analogical approach can readily handle cases of indeterminacy: behavior can be predicted even when the given context is either "ill-formed" or missing some "crucial" variable. In other words, an analogical approach predicts the kind of behavior that language actually exhibits.

CHAPTER 3: In this chapter I apply the analogical approach to several examples from English. I first discuss problems that arise when constructing the data set for language examples. Two important questions that must be considered are: (1) which variables should be represented in the data set; and (2) should the data set be based on token or type occurrences? After considering these questions, I construct the data sets for three detailed examples. Each example represents one of the three basic types of behavior:

categorical
  • indefinite article
    • a (followed by a consonant)
    • an (followed by a vowel)
exceptional/regular
  • spelling of initial /h/ sound
    • <h> regular case
    • <wh> major exception
    • <j> minor exception
idiosyncratic
  • categorization of labial stops
    (in terms of voicing onset time)
    • /b/ [-107,2] milliseconds
    • /p/ [51,94] milliseconds

In the categorical example, I show how leakage favors the more frequent outcome (that is, a rather than an) -- not only during language acquisition, but also later. Moreover, an analogical approach can usually predict the appropriate form of the indefinite article even when the following segment is masked out by noise or is simply deleted. Whenever the crucial information is missing, the analogical approach uses a combination of redundancy and word recognition to make predictions about behavior.

While considering the exceptional/regular example, I discuss the notion of a gang effect and introduce an explicit definition for this effect. I also show how the gang effect measures the degree of regularity for any given context that actually occurs in the data. Typically, exceptional data occurrences have a gang effect of less than one, whereas regular data occurrences have a gang effect of greater than one.

Finally, in discussing the case of voicing onset time, I show how a continuous variable can be analyzed as a sequence of discrete variables. In addition, I consider how the number of variables affects the rate of transition across the empty contextual space between idiosyncratic occurrences.
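
One way to picture treating a continuous variable as a sequence of discrete variables is a threshold encoding, sketched below in Python. The cut points, the function names, and the choice of a cumulative code (one binary variable per cut point) are assumptions for illustration only; the encoding actually used in chapter 3 may differ.

```python
# Hypothetical cut points in milliseconds; not taken from the book's data.
CUT_POINTS = [-80, -40, 0, 20, 40, 60, 80]


def discretize(vot_ms, cut_points=CUT_POINTS):
    """Encode a voicing onset time as one binary variable per cut point.

    Each variable is 1 if the value is at or above that cut point, else 0.
    """
    return tuple(1 if vot_ms >= cut else 0 for cut in cut_points)


def shared_variables(a, b):
    """Count the discrete variables on which two encodings agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


if __name__ == "__main__":
    for vot in (-107, 2, 30, 51, 94):
        print(vot, discretize(vot))
```

Under this assumed encoding, two nearby values agree on more variables than two distant ones, so similarity falls off step by step instead of all at once.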

CHAPTER 4: One of the most difficult problems in language description has been non-deterministic or probabilistic language behavior. Rule approaches typically account for such behavior by positing probabilistic rules. Even if we suppose that probabilities exist, there is still the very difficult question of how those probabilities are actually learned from the statistics and then used to predict behavior. But in an analogical approach no probabilities are directly postulated; instead, an analogical set of examples is constructed and then one of these examples is randomly selected in order to predict stochastic behavior. Thus it may look as if probabilities are learned, but in fact none are.
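
The idea of predicting stochastic behavior by selecting one example at random can be sketched in a few lines of Python. The analogical set below is invented, and uniform selection among its members is an assumption of this sketch. The point is only that relative frequencies emerge from repeated selection without any probability being stored.

```python
import random
from collections import Counter

# Hypothetical analogical set: each entry stands for one example's outcome.
ANALOGICAL_SET = ["a", "a", "a", "an"]


def predict(analogical_set, rng):
    """Predict an outcome by picking one example at random (assumed uniform)."""
    return rng.choice(analogical_set)


def simulate(analogical_set, trials, seed=0):
    """Repeat the random selection and tally the predicted outcomes."""
    rng = random.Random(seed)
    return Counter(predict(analogical_set, rng) for _ in range(trials))


if __name__ == "__main__":
    print(simulate(ANALOGICAL_SET, 10_000))
```

Over many trials the tally approaches the proportions of outcomes in the set, although the code never computes or stores those proportions.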

In this chapter I also discuss the connectionist models of behavior that have been proposed by McClelland and Rumelhart. Their activation model (now referred to as "parallel distributed processing") is another procedural alternative to declarative rule approaches. Yet there are some serious difficulties with McClelland and Rumelhart's model. I will argue that their approach cannot learn specific probabilities, nor can it adjust to alternative rules of usage or momentarily eliminate specific outcomes from consideration.

In the second half of this chapter, I consider how an analogical approach deals with multivariate data. If a given context actually occurs, the analogical prediction, it turns out, is not very interesting unless we introduce imperfect memory into our analogical approach. I then show that imperfect memory is equivalent to smaller, less powerful levels of statistical significance. Under conditions of imperfect memory an analogical approach can achieve all the statistical properties necessary for predicting multivariate behavior, but without any statistics!
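
Imperfect memory can be pictured as random forgetting of data items before a prediction is made. The following Python sketch assumes that each item is remembered independently with a fixed probability. That probability, the toy data, and the function name are hypothetical and are not taken from the book.

```python
import random

# Hypothetical data items: (context, outcome).
DATA = [(("p", "q"), "x"), (("p", "q"), "y"), (("r", "q"), "x")]


def remembered(data, recall_probability, rng):
    """Return the subset of data items that survive imperfect memory."""
    if not 0.0 <= recall_probability <= 1.0:
        raise ValueError("recall_probability must lie in [0, 1]")
    # rng.random() lies in [0, 1), so probability 1.0 keeps every item
    # and probability 0.0 keeps none.
    return [item for item in data if rng.random() < recall_probability]


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(remembered(DATA, 0.5, rng))
```

Each call yields a different remembered subset, so a prediction for a context that actually occurs in the data is no longer fixed by that single occurrence alone.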

In this chapter I apply the analogical method under conditions of imperfect memory to two examples of multivariate behavior: the well-known case of final-stop deletion in Black English and a complex case of sociolinguistic variation in colloquial Egyptian Arabic. I also consider some of the difficulties that have arisen when variationists have attempted to account for such complex cases of linguistic variation. The main problem here has been the inherent difficulty in describing variation by means of a system of rules, probabilistic or otherwise.

CHAPTER 5: In this chapter I show how the analogical approach deals with a complex case of historical drift in Finnish. There are a couple of dozen Finnish verbs whose past tense forms originally ended in si but now end in ti in the standard language, yet there has been no systematic explanation of how this could have happened. I first show how a restricted principle of homophone avoidance originally changed at least two past tense forms from si to ti. The effect of this minor change in an already sparse field was sufficient to break down the original gang effect of that field. Under conditions of imperfect memory, the analogical approach then predicts the subsequent historical drift, so that over time other verbs in this field have also changed their past tense forms from si to ti. The analogical approach thus accounts for the original instability of certain past tense forms in Finnish. It also predicts the overall stability of the past tense in the modern standard language.

CHAPTER 6: In this concluding chapter I consider the benefits of constructing the analogical set using massively parallel processing. I also consider the parallels between the analogical approach in language and the atomistic approaches of physics.

APPENDICES: At the end of the book I include three appendices. The first lists the phonemic symbols for English that are used in this book. The second appendix gives five data sets that are used extensively in chapters 3 through 5: the indefinite article in English, the spelling of initial /h/ in English, the voicing distinction between /b/ and /p/ in English, terms of address in colloquial Egyptian Arabic, and past tense formation in Finnish. In the third appendix, I provide the computer program that I used to derive the results of this book. The parameters in this program are set so that the program will make analogical predictions for the data on final-stop deletion in Black English (as discussed in section 4.3).