Criteria for Defining Formulaic Sequences

Formulaic sequences have been studied by scholars in many fields, including general linguistics, corpus linguistics, phraseology, lexicography, psycholinguistics, neurolinguistics, and first and second language acquisition.

Each field approaches formulaic sequences from its own theoretical perspective and applies different criteria. According to Schmitt & Carter (2004), corpus linguistics uses the most common criteria: institutionalization, fixedness, non-compositionality and frequency of occurrence.

Specialists in psycholinguistics and language acquisition draw on such criteria as individual participants’ familiarity with sequences and holistic storage of the sequences. Multiple production of the same sequence is used as evidence for the former criterion, and an intact intonation contour for the latter.

Schmitt and Carter (2004) suggest that both linguistic and psycholinguistic criteria should be applied to the definition of formulaic sequences. Besides the criteria named above, there are other possible criteria for deciding whether a word string is a formulaic sequence.

Each criterion is presented separately below. The criteria relate to the form, meaning and use of multi-word sequences.

Transparency of Meaning

Transparency of meaning has been used as one of the criteria for identifying formulaic sequences.

Wray (2004) refers to Erman and Warren (2000), who said that ‘one criterion for identifying such strings is some layer of meaning that transcends the individual words and belongs to the string as a whole’. She describes three positions that linguists have taken in regard to this criterion:

  1. At one end are opaque idioms that are easy to identify because they cannot be constructed or decoded using the grammar of the language. Schmitt et al. (2004b) identify them as ‘self-contained’. Wray’s definition of formulaic sequences treats only this kind of word string as formulaic. However, Read and Nation (2004) refer to Grant’s (2003) extensive study of semantically non-compositional and non-figurative word strings that she called ‘core idioms’. Grant discovered that there are about 104 such idioms in the English language, and only 10 of those have a literal equivalent in the British National Corpus. This evidence suggests that there are very few, if any, instances of truly formulaic sequences in the language. Even those “extremely formulaic” sequences show considerable variation (pull his leg, pulling my leg, etc.). Read and Nation (2004) state that this variability does not mean that those idioms are not formulaic, and they observe that deliberate variation can be used for humorous effect.
  2. According to Wray, other linguists and researchers take a more inclusive view and also allow word strings that are semantically transparent and grammatically irregular to be considered formulaic. This permits such expressions as have a nice day and it’s been great talking to you to be included.
  3. Other linguists assert that any word string should be treated as formulaic despite the absence of an additional layer of meaning or application of rule patterns.

Thus, word strings can be placed on a continuum of semantic transparency and grammatical regularity, with idioms at one end and words with literal use of lexical forms at the other.

Holistic Storage/Processing

Many scholars strongly believe that formulaic sequences are normally stored and processed as whole units. Wray’s (2002) definition of formulaic sequences requires a sequence to be prefabricated, i.e. stored in the mind and retrieved holistically. It does not permit ‘generation or analysis by the language grammar’.

Non-compositional idioms, whose meaning cannot be obtained from their components, can serve as one piece of evidence for holistic acquisition. In their study, Schmitt et al. (2004b) found that semantic/functional transparency played a role in whether recurrent clusters were stored holistically in the mind. According to the authors, the best results were obtained with short, self-contained or semantically transparent units (e.g., you know, to make a long story short), even with non-native speakers.

Spottl & McCarthy (2004) also observed that saliency and opaqueness of meaning affected the performance of their multilingual participants. The participants analyzed most of the expressions, and only some well-known and frequently used formulaic sequences were translated without hesitation, i.e. holistically.

Underwood et al. (2004) examined the efficiency of reading the terminal words in formulaic sequences vs. the same words embedded in non-formulaic contexts. The results favored the terminal words in formulaic sequences, which the authors interpreted as evidence for holistic storage and processing of formulaic sequences.

Schmitt & Underwood (2004) assert that formulaic sequences, like individual words, have different processing burdens. As possible reasons for that, they name ‘saliency of the sequence, transparency of the sequence’s meaning, usefulness for particular speakers and context and […] frequency of occurrence’ (p. 183).

Schmitt et al. (2004b) found no correlation between the frequency of occurrence of word strings in corpus analysis and high performance scores on their dictation test. According to the authors, this means that those clusters were not stored in the mind as whole units.

The results of some other studies in this volume were less conclusive, which leaves room for further research in this area.

Fixedness vs. Flexibility

Read and Nation (2004) distinguish between non-compositionality and fixedness in formulaic sequences. They consider a sequence to be non-compositional when it cannot be interpreted as a literal statement; ‘it may contain individual words that never occur except as part of that expression’ (e.g., spick and span). Fixedness, according to them, refers to the extent to which the order of the words in the sequence can be changed, individual words can be replaced by others, items can be inserted, or items can be inflected.

Some formulaic sequences are totally fixed strings of words (e.g., Ladies and Gentlemen), while others have a number of slots among their fixed elements that allow them to be used flexibly (e.g., it goes without saying that…). These slots, however, often have semantic constraints, which means that only semantically appropriate words or strings of words can be used in the slots.

Schmitt & Carter (2004) expect flexible formulaic sequences to be more prevalent in discourse than fixed ones, because they are adaptable to a wide range of situations. However, this expectation is difficult to confirm because, as they observe, current concordance software does not permit one to identify flexible formulaic sequences.
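The idea of a frame with fixed elements and a constrained slot can be illustrated with a small pattern search. The following is only a sketch, not a method described by any of the authors cited: the pattern PULL_LEG, the helper find_frame and the sample sentence are hypothetical, and the slot is limited to a hand-picked list of possessives. It searches for a frame that is already known, so it does not address the harder problem of discovering flexible sequences in a corpus.

```python
import re

# Hypothetical pattern for the frame "pull <possessive> leg".
# The verb may be inflected and the slot accepts a small closed set of
# possessives; both choices are assumptions made for illustration.
PULL_LEG = re.compile(
    r"\bpull(?:s|ed|ing)?\s+(?:my|your|his|her|our|their)\s+legs?\b",
    re.IGNORECASE,
)


def find_frame(pattern, text):
    """Return every substring of text that matches the frame pattern."""
    return [match.group(0) for match in pattern.finditer(text)]


sample = "Stop pulling my leg. He pulled the rope. She likes to pull his leg."
print(find_frame(PULL_LEG, sample))  # ['pulling my leg', 'pull his leg']
```

The literal string "pulled the rope" is rejected because "the" is not an allowed filler, which mirrors, in a very crude way, the constraint that only appropriate items can occupy a slot.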

Fluency of Articulation

Another criterion of formulaicity is the phonological character of a sequence: fluency and a normal intonation contour, that is, a natural pitch, stress and juncture profile. Formulaic sequences are usually pronounced more fluently and with a coherent intonation contour, as noticed by Schmitt & Carter (2004). Among possible phonological features of formulaic sequences, Read and Nation (2004) name speech rate, stress patterns, pauses, and clarity of articulation. Any deviation from fluent articulation, such as hesitations between words within a cluster, false starts, stumbles and repetitions of parts of words or whole words, suggests that the cluster is not stored holistically. Although fluently articulated reproduction is not a direct measure of holistic storage, as Schmitt et al. (2004b) admitted in their study of recurrent clusters embedded in dictation contexts, they consider it to be evidence that formulaic sequences are stored holistically.

Spottl & McCarthy (2004) found neither pauses nor repetitions in the think-aloud protocols in their study, which they considered to be possible evidence for holistic processing.

Functions/Purposes of Use

Schmitt & Carter (2004) claim that everyday speech situations are full of conventionalized language that serves different functions and includes such speech acts as apologizing, making requests, giving directions, and complaining. Expressions related to these speech acts are expected in certain social situations and thus ease interpersonal communication.

According to Schmitt & Carter (2004), Nattinger and DeCarrico (1992) proposed the wide use of formulaic sequences for functional purposes.

Schmitt & Carter (2004) list the following common functions of formulaic sequences:

  1. Maintaining social interaction in situations when the content is less important than the act of communication itself. In such cases people rely on ‘conventionalized phatic phrases’ or ‘situation-bound utterances’ (Kecskes, 2003), for example, comments about the weather, agreeing with an interlocutor, etc.
  2. Discourse organization, which involves the use of various discourse markers and organizing phrases for both written and spoken discourse: expressing an alternative viewpoint, re-phrasing, providing links to previous utterances (e.g., in other words, to put it another way, as I was saying).
  3. Transaction of information in a precise and efficient manner with the help of technical words or formulaic sequences. This is especially typical of situational/technical-based discourse where exact phraseology is required to avoid any possible misunderstanding, for example, in aviation, medicine, etc.

According to Schmitt & Carter (2004), multi-word sequences are used for the following purposes:

  1. to express a message or idea (The early bird gets the worm = do not procrastinate);
  2. to express functions ([I'm] just looking [thanks] = declining an offer of assistance from a shopkeeper);
  3. to express social solidarity (I know what you mean = agreeing with an interlocutor);
  4. to transact specific information in a precise and understandable way (Wind 28 at 7 = in aviation language).

Jones and Haywood (2004) refer to Wray’s (2002) list of functions:

  1. promotion of the [user’s] interests (this purpose was named by Wray to be the dominant one);
  2. enabling an individual to express identity with a group (e.g., social or academic community);
  3. reducing processing effort for the listener or reader;
  4. expressing individual identity for the speaker or the writer.

Schmitt & Carter (2004) note that further research might reveal other purposes of use and functions of formulaic sequences as there is a variety of communicative situations that require speakers to use patterned language.

Semantic Prosody

Schmitt & Carter (2004) use Sinclair’s (2004) term ‘semantic prosody’ to refer to different shades of meaning (registers/appropriacy markings) that formulaic sequences can carry. They consider it to be the key element of the sequence’s meaning. The example given in the book shows how the usage of the word border becomes syntagmatically constrained when it is found combined with other words. Thus, the majority of instances of bordering on found in the BNC involve negative evaluation of the situation.

Schmitt & Carter (2004) admit that knowledge of this characteristic of formulaic sequences is limited due to the lack of research in this area.
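Semantic prosody is usually inspected by lining up every occurrence of a sequence with its immediate context. The following is a minimal keyword-in-context sketch under stated assumptions: the function kwic and its width parameter are hypothetical, and the sample text is invented for illustration and is not taken from the BNC. Judging whether the context is negatively evaluated is left to the reader of the lines.

```python
def kwic(text, phrase, width=3):
    """Return (left context, match, right context) for each occurrence
    of phrase in text, using simple whitespace tokenization."""
    tokens = text.split()
    target = phrase.lower().split()
    n = len(target)
    lines = []
    for i in range(len(tokens) - n + 1):
        window = [token.lower() for token in tokens[i:i + n]]
        if window == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(tokens[i:i + n]), right))
    return lines


# Invented example text, not corpus data.
sample = ("his behaviour was bordering on rudeness and "
          "a confidence bordering on arrogance")
for left, match, right in kwic(sample, "bordering on", width=2):
    print(f"{left:>15} | {match} | {right}")
```

Reading down the right-hand column of such a display is one way to see which kinds of words tend to follow a sequence.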


Formulaic sequences can be long (You can lead a horse to water, but you can't make him drink) or short (Oh no!), or anything in between. Schmitt & Underwood (2004) found in their study that shorter sequences have a higher frequency of occurrence.
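Frequency of occurrence for sequences of different lengths can be counted directly from tokenized text. The sketch below is an assumption-laden illustration rather than the procedure used in any of the studies cited: the function recurrent_ngrams, the min_count threshold and the toy text are all hypothetical, and a recurrent word string is not necessarily a formulaic sequence.

```python
from collections import Counter


def recurrent_ngrams(tokens, n, min_count=2):
    """Count all n-grams in tokens and keep those that occur
    at least min_count times."""
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in grams)
    return {gram: count for gram, count in counts.items() if count >= min_count}


# Toy text invented for illustration, not corpus data.
tokens = "you know i mean you know what i mean you know".split()
for n in (2, 3):
    print(n, recurrent_ngrams(tokens, n))
```

In this toy text the most frequent two-word string occurs three times while no three-word string occurs more than twice, which is the kind of comparison a length-by-frequency count makes possible.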


