Compositional vs. Frozen Sequences
Jorge Baptista
University of Algarve, Portugal
Lexicon-Grammar Workshop, Beijing, 16-17 Oct. 2004
Introduction
Compound words and frozen expressions constitute
a major part of the lexicon of many languages.
Their definition is not easy, and conceptual and
terminological discussions abound in the literature.
Traditionally, they have been defined on semantic grounds,
by the criterion of non-compositionality:
the global meaning of a multiword
expression cannot be calculated from
the meaning of its individual elements when
they are used separately in the language.
Formal, syntactic (or combinatorial) criteria
Semantically ‘opaque’ compound words:
dog-collar, dogfight
Only ‘half-opaque’ compound words:
dogfish, fish knife, half-life
Semantically ‘transparent’ compound words:
heavy element,
<date> before present (present = 1950).
Spelling rules are just writing conventions
(orthography consecrates writing habits):
fish knife / fish-knife,
fish finger / fish-finger
Formal constraints on word combinations (not
semantically motivated):
e.g. the set of time-related nouns (dawn, morning, afternoon,
sunset, evening, night) and the prepositions and determiners
they combine with:
at noon / *at morning
in the evening / *on the evening
in the morning / *in morning
by morning / by the morning
On the meaning of individual, isolated words:
the meaning of a word is related to the word’s distribution,
i.e. the words it co-occurs with.
One can try to determine the meaning of a given word by
inserting it in several different sentences and,
by carefully controlling formal changes on those
sentences, looking for changes (or
invariance) in meaning.
There is disagreement about what counts as ‘transparent’, ‘half-transparent’ or even ‘opaque’ word combinations.
Intuitions about meaning are almost always vague
and too imprecise to be used in a reproducible way.
Instead, we use syntactic, formal criteria to identify compounds.
These criteria show that words are ‘frozen’ together,
even if the meaning of the combination
is relatively ‘transparent’.
‘Frozen’ means that two or more elements of the expression
do not show any distributional variation.
e.g. the set of time-related nouns above shows an
unpredictable blocking of distributional variation;
the acceptable combinations have to be included in
the lexicon, and therefore they should be treated as
compound lexical units.
Every part-of-speech (PoS) shows both simple
and compound words.
For example, word combinations such as
the man in the street could very well be
accounted for as an indefinite pronoun:
Politicians always cared about the opinion of
the man in the street
Many compound prepositions and
conjunctions have already been included in
current dictionaries:
John stopped in the middle of the street
John came to Paris by way of Madrid
John came to Paris in spite of my warnings against it
John came to Paris because of my warnings
There are some (productive?) rules to produce
compound adjectives:
-like : to be life-like, Algol-like languages
-proof : to be (bullet + water + …) -proof
Other compound adjectives are frozen on purely
combinatorial grounds:
John is (sick and tired + *tired and sick) of that
Moreover, in English, verb + particle combinations
forming phrasal verbs can be considered a
special case of compound verb:
John ran (for a mile)
John ran away (to Brazil)
The batteries are running down
John ran into Mary
John ran off to Brazil
John ran off with a book
John’s lecture ran on
The printer ran out of paper
The truck ran over the dog
John ran through the entire proceeding
Some compound words can be described in a
regular way, by means of finite-state
transducers, as, for example, the (potentially
infinite) set of compound numerals:
one hundred and twenty-one,
twenty-one thousand two hundred and twenty-one
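To make the finite-state idea concrete, here is a minimal Python sketch (the lecture does not specify any tool): a deterministic automaton over word tokens that accepts English compound numerals below one million and outputs their value, so it behaves like a small transducer. The vocabulary tables, the state names and the function numeral_value are hypothetical; covering the potentially infinite set would only require adding states for million, billion, and so on.

```python
# Hypothetical sketch: a deterministic finite-state reader for English
# compound numerals below one million that also outputs their value.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16,
         "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
ACCEPTING = {"UNIT", "SMALL", "HUNDRED", "AFTER_THOUSAND"}


def small_value(token):
    """Value of a token below 100 ('seven', 'twelve', 'twenty-one'), else None."""
    for table in (UNITS, TEENS, TENS):
        if token in table:
            return table[token]
    tens, _, unit = token.partition("-")  # hyphenated tens-unit, e.g. 'twenty-one'
    if tens in TENS and unit in UNITS:
        return TENS[tens] + UNITS[unit]
    return None


def numeral_value(text):
    """Return the integer denoted by `text`, or None if the automaton rejects it."""
    state, block, total = "START", 0, 0
    seen_thousand = False
    for token in text.lower().split():
        n = small_value(token)
        if state in ("START", "AFTER_THOUSAND") and n is not None:
            block, state = n, ("UNIT" if n < 10 else "SMALL")
        elif state == "UNIT" and token == "hundred":
            block, state = block * 100, "HUNDRED"
        elif state in ("HUNDRED", "AFTER_THOUSAND") and token == "and":
            state = "AND"
        elif state in ("HUNDRED", "AND") and n is not None:
            block, state = block + n, "SMALL"
        elif (state in ("UNIT", "SMALL", "HUNDRED") and token == "thousand"
              and not seen_thousand):
            total, block, state = block * 1000, 0, "AFTER_THOUSAND"
            seen_thousand = True
        else:
            return None  # no transition: the string is not a numeral
    return total + block if state in ACCEPTING else None


print(numeral_value("one hundred and twenty-one"))
print(numeral_value("twenty-one thousand two hundred and twenty-one"))
print(numeral_value("twenty-one thousand two hundred and"))
```

This prints 121, 21221 and None: the last string is rejected because the automaton stops after and, which is not a final state.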
There is a high number of compound words in texts,
particularly in scientific and technical texts.
Compounds are meaning units:
they must be identified as a block and not as a
string of simple words;
they have an unpredictable overall meaning that cannot be
directly calculated from the meaning of their
internal elements.
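Identifying compounds as blocks can be sketched with a greedy longest-match lookup against a dictionary of compounds, a common technique in dictionary-based tokenization. The following Python sketch is only an illustration under stated assumptions: the toy dictionary COMPOUNDS and the function segment are hypothetical, and a real system would need a full dictionary of compounds with their inflected forms.

```python
# Hypothetical toy dictionary of compounds, stored as tuples of simple words.
COMPOUNDS = {
    ("square", "root"),
    ("round", "table"),
    ("fish", "knife"),
    ("in", "spite", "of"),
    ("the", "man", "in", "the", "street"),
}
MAX_LEN = max(len(entry) for entry in COMPOUNDS)


def segment(tokens):
    """Group tokens into lexical units, taking the longest compound at each position."""
    units, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, down to two words.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in COMPOUNDS:
                units.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No compound starts here: keep the simple word.
            units.append(tokens[i])
            i += 1
    return units


print(segment("John calculated the square root of 9".split()))
print(segment("Politicians always cared about the opinion of the man in the street".split()))
```

Longest match ignores syntactic context: it would also group round table in a sentence about furniture, which is exactly the kind of ambiguity that only the syntactic environment can resolve.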
In this lecture, we will focus on syntactic
properties that can be used to identify compounds.
Since compounds are a major part of many languages’ lexicon,
the task of retrieving them and describing them in
dictionaries is not trivial, especially if these
dictionaries are meant to be used in natural
language processing.
There are many statistical methods to retrieve compound
(or multiword) lexical units from texts;
the linguist’s task is to validate those word combinations
as compound lexical units and to build the dictionaries
for them.
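One widely used statistic for proposing candidate multiword units is the pointwise mutual information (PMI) of adjacent words, PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The Python sketch below is a hypothetical illustration (the lecture does not name a specific method), run on an invented toy corpus. It shows why the linguist's validation is needed: in this corpus the best-scoring pair is root of, which is not a compound, and square root only comes second.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (in bits)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:  # ignore pairs seen too rarely to be informative
            continue
        p_xy = count / n_bi
        p_x, p_y = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy corpus, for illustration only.
corpus = ("he took the square root of the value and "
          "the square root of nine and a square table").split()
for pair, score in pmi_bigrams(corpus):
    print(pair, round(score, 2))
```

The min_count threshold is an arbitrary choice here; PMI is known to overrate rare pairs, so some frequency cut-off is usually applied before ranking.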
To do this, linguists have to rely on syntactic
properties, which first requires studying the
general syntactic rules of the language.
Only then can linguists find out
the combinatorial constraints on those rules
shown by multiword expressions.
This presentation is structured in two parts:
 first we will present some of the major syntactic
properties distinguishing compound nouns from
ordinary noun phrases; and
 in the second part we will give some examples
of how the same methodology can be applied
to the identification of compound adverbs.
1. Compound nouns.
Probably the best-known case of compounding,
compound nouns constitute the largest of all compound
word classes.
In every domain (scientific, technical, economic,
political, etc.) there is a constant need for coining new
denominations for new objects, tools, concepts,
products and so on, the nouns being the most natural
part-of-speech (PoS) to accommodate
such new designations.
There are compound nouns formed by sequences of
grammatical categories identical to those
appearing in ordinary (i.e. not frozen) noun
phrases:
a nice dog (a dog)
a hot dog (a sandwich)
a square table (a table)
a square root (a mathematical function)
Adam’s orange (an orange)
Adam’s apple (a part of the human body)
There are differences between compounds and free word
combinations, but this distinction is not as clear-cut as dictionaries
and grammars could sometimes lead one to believe.
This presentation will show some of the basic
syntactic properties that can help distinguish
compounds from free word combinations.
Traditionally, compounding has been studied in the framework of
grammar studies (Morphology).
Lexicon-grammar approach:
compounds are described with the very same
tools used to describe the syntax of noun
phrases.
In order to identify a compound as such
it is necessary to check if that particular word
combination shows any constraints
on the combinatorial properties that one would
expect to find in a noun phrase (NP) formed by
the same internal PoS sequence
(G. Gross 1988, 1989).
We compare the grammar of noun phrases to the
syntactic properties of a word combination that is a
candidate for the status of compound word.
Our examples here will consist of already well-known
compound nouns.
By analogy, the same methodology can be extended
to other, more complex, word combinations.
Let’s take the examples square table / square root.
In a free NP with the internal structure Adjective + Noun
(AN), where the adjective is often a free modifier of the noun,
the predicative function of the adjective on the noun
is made explicit by a paraphrase with a relative clause
and the verb be:
a square table : a table that is square
This is not the case with the compound square root:
a square root : *a root that is square
and also with many other compound nouns,
where we say that the adjective loses its predicative function.
Also, free adjectives can be further modified by
an adverb:
a square table : a perfectly square table
a table that is perfectly square
a square root : *a perfectly square root
*a root that is perfectly square
When the AN combination is free, both the adjective and the
noun can vary, provided that basic distributional constraints are
respected. Therefore, table can be replaced by other nouns:
a square (table + door + carpet + …)
in the same way as square can be replaced by other
distributionally similar adjectives:
a (square + oval + triangular + oblong + …) table
However, when an AN combination forms a compound noun,
distributional variation is blocked:
a square (root + *twig + *branch + …)
a (square + *oval + *triangular + *oblong + …) root
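The substitution test can be recorded as data: for each candidate, the substitutes tried in each slot and whether the result was judged acceptable. In this Python sketch the judgments are copied from the examples above; the data layout and the decision rule (a slot is restricted as soon as one distributionally plausible substitute is rejected) are simplifying assumptions, not a procedure stated in the lecture.

```python
# Acceptability judgments from the examples above (True = acceptable).
JUDGMENTS = {
    ("square", "table"): {
        "noun": {"door": True, "carpet": True},
        "adjective": {"oval": True, "triangular": True, "oblong": True},
    },
    ("square", "root"): {
        "noun": {"twig": False, "branch": False},
        "adjective": {"oval": False, "triangular": False, "oblong": False},
    },
}


def restricted_slots(candidate):
    """Slots where at least one plausible substitute was judged unacceptable."""
    return [slot for slot, substitutes in JUDGMENTS[candidate].items()
            if not all(substitutes.values())]


def verdict(candidate):
    """A combination with blocked variation is a candidate compound."""
    return "compound candidate" if restricted_slots(candidate) else "free combination"


for candidate in JUDGMENTS:
    print(" ".join(candidate), "->", verdict(candidate), restricted_slots(candidate))
```

Using "at least one rejection" rather than "all rejected" keeps classifier paradigms such as (square + cubic) root on the restricted side, since the paradigm is closed even if it is not empty.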
Some combinations are ambiguous:
round table may be a free combination or a compound noun;
only the syntactic environment may help to disambiguate it:
I have bought a round table for my dining room
(a piece of furniture)
I have attended a round table on French syntax
(an event)
Even if many compound nouns are
ambiguous with free word combinations,
they are usually much less ambiguous than
simple words.
In a free NP, adjectives are just optional modifiers
of the noun.
They can be deleted without changing the overall
meaning of the NP (nor the meaning of the
sentence where the NP is inserted):
John bought a (E + square) table
However, with some abstract nouns that express
predicates and are hence called predicative nouns
(M. Gross 1981; see below), the presence of a modifier is
often obligatory (Meunier 1981; Giry-Schneider 1995;
Laporte 1997):
He had an immense esteem for tradition
(Henry James, Portrait of a Lady)
*He had esteem for tradition
*He had an esteem for tradition
When the adjective is not a mere modifier of the noun,
usually it cannot be deleted, for it is the AN combination
that forms a compound lexical unit.
This is particularly clear with semantically opaque
compound nouns:
John attended a round table on Chinese Syntax
*John attended a table on Chinese Syntax
John calculated the square root of 9
*John calculated the root of 9
But in some compounds, even frozen adjectives can be zeroed.
For example, most of the time people calculate square
roots, so that in some languages (Portuguese, for
instance), unless otherwise stated, the adjective
quadrada (equivalent to square) can be zeroed without
loss of information:
O João calculou a raiz (E + quadrada) de 9
(John calculated the (E + square) root of 9)
In many other cases, however, the adjective in a
compound noun functions as a classifier of the noun,
distinguishing a particular type of object:
John likes to drink (red + white + … ) wine
In this case, the adjective can be zeroed, with some loss
of information:
John likes to drink (E + red) wine
The classifying function of an adjective can be detected
by means of classifying sentences:
A red wine is a type of wine
NPs with free modifiers cannot enter classifying sentences:
*A square table is a type of table
Of course, compound nouns cannot enter these
sentences either:
*A square root is a type of root
When an adjective functions as a classifier, it is
sometimes possible to see a (usually) small
distributional paradigm:
John calculated the (square + cubic) root of that value
John likes to drink (red + white + … ) wine
which is closed for distributional variation:
John calculated the (square + cubic +
*triangular + *spherical) root of that value
John likes to drink (red + white + *yellow
+ *blue… ) wine
In this sense, AN combinations where the adjective is a
classifier can be described as compound nouns.
The extension of the distributional paradigm of the classifier
adjective can be rather large (acids) and open to the
coining of new terms; or relatively small (teeth and
vertebrae) and closed to further additions:
John poured some (ascorbic + citric + nitric + … )
acid into the solution
The dentist repaired one of my (incisor + canine +
molar + …) teeth
John was injured in one of his (cervical + lumbar
+ …) vertebrae
In the compounds of wine, one finds that many toponyms
(Ntop) designating wine-producing regions can replace wine itself:
John likes to drink a glass of (wine + Porto + Bordeaux + …)
These combinations can be derived from a deleted
occurrence of wine :
John likes to drink a glass of (E + Porto + Bordeaux + …) wine
The number of Ntop wine combinations is
very large (every wine region),
but highly conventional,
determined by extra-linguistic factors.
Extensive lists can be made,
but of small linguistic interest.
Some adjectives combine in a highly exclusive way with
a very small set of nouns (often only one):
This noun is inflected in the nominative case
In these cases, the noun of some AN compounds (but not
all) can be zeroed, leaving the adjective in a (superficial)
noun slot:
This noun is inflected in the nominative (E + case)
The dentist repaired my (canine + molar +…)(E + tooth)
With less ‘exclusive’ adjectives, N can be zeroed depending
on the syntactic context:
John prefers to drink red (E + wine)
to white (E + wine)
This is probably one of the reasons why dictionaries
have classified so many adjectives both as adjectives and
nouns (see M. Gross 1998 for further discussion of this subject).
 This is not always the case:
John was injured in a (*cervical + *lumbar + …)
Or it may depend on the language and the NA combination involved.
For Portuguese, for instance, zeroing of N in a similar case is observed
with some Adj but not others:
O João ficou ferido numa (E + vértebra) (cervical +
*dorsal + *lombar + *sacra)
A particular case of AN combinations involves relation
adjectives, i.e. adjectives derived from nouns, such as
presidential (from President).
These adjectives never allow the formation of the
relative clause, nor the insertion of an adverb:
The presidential address to the Congress
*The address to the Congress that was presidential
*The very presidential address to the Congress
<was very disturbing>
Nouns such as address express predicates and are
therefore called predicative nouns. (M. Gross 1981)
Relation adjectives, such as presidential, when
combined with predicative nouns, do not function as
mere modifiers of the noun. Instead, they are derived
from a complement NP:
The President’s address to the Congress
< was very disturbing >
In this sentence, President is interpreted as an
argument (in this case, the subject) of the predicative
noun address.
 This syntactic and semantic relation between the two
nouns (President – address) is of the same nature as the
relation between a subject and verb, and it has a formal
counterpart in the sentence:
The President made an address to the Congress
We consider this to be an elementary sentence,
whose predicative node is the noun address,
which selects its two arguments (President, Congress).
 In this sentence, to make is a support verb
(Vsup; also called light verb):
 it is devoid of meaning and it functions as a
morphological tool to actualize the predicative noun,
carrying the tense morphemes that the noun cannot carry.
Now, the adjective presidential can enter many other AN
combinations, involving predicative nouns:
The presidential campaign <…>
However, some of these combinations cannot be derived
from the reduction of support verb sentences.
In fact, the NP The presidential campaign above is ambiguous:
(a) if it means ‘the campaign that the President is making’,
the NP is equivalent to:
The president’s campaign <has been extremely violent>
(b) if it means a campaign where many people run for the office of President
(and not necessarily the President himself),
the NP can appear in sentences such as:
The presidential campaign <takes place in September>
Notice that the regularly derived NP cannot appear in this context:
*The president’s campaign takes place in September
It is therefore necessary to study in detail the
properties of all AN combinations where Adj is a
relational adjective and N a predicative noun in order
to determine if this combination can be regularly
derived from an elementary sentence with a support
verb or, else, if this derivation is blocked in some
way, and has become a compound noun.
(A. Monceaux 1999)
The next case illustrates a curious type of blocking
involving relation adjectives.
The relational adjectives solar (from sun) and lunar (from moon) form
AN noun phrases regularly derived from elementary
sentences where moon or sun is an argument of a
predicative noun, such as eclipse:
the eclipse of the (moon + sun) <lasted 20 minutes>
the (lunar + solar) eclipse <lasted 20 minutes>
?*the (moon + sun)’s eclipse <lasted 20 minutes>
*the (moon + sun) eclipse <lasted 20 minutes>
There are, however, many AN combinations that one
cannot derive from moon or sun:
the lunar month <lasts 28 days>
*the moon’s month <lasts 28 days>
*the month of the moon <lasts 28 days>
*the moon month <lasts 28 days>
the solar year <lasts 365.25 days>
*the sun’s year <lasts 365.25 days>
*the year of the sun <lasts 365.25 days>
?*the sun year <lasts 365.25 days>
Finally, some compounds show morphosyntactic
constraints: while their elements can vary in gender
and/or number when used independently, together they
do not show any variation.
For example, national waters is always used in the plural, in
spite of the uncountable nature of water:
They prevented the ship from entering
(national waters + *national water)
There is a certain degree of institutionalization in naming objects and concepts.
Sometimes several, different structures may be available in
the language in order to designate the same concept or
object, but the language retains only one of them.
For example, for a ‘machine used to take photographs’, English could have used:
photographic machine (AN)
photographing machine
(V-ing N, as in washing machine)
photo(graph) machine (NN, as in copy machine)
photographier (N-er, as in photocopier)
Instead, it is the simple word camera
that is used to name this object.
When comparing different languages, one finds that
each may adopt a different strategy, hence, in French:
appareil photo (NN) ‘photo apparatus’
*appareil à photographier (N à V),
*appareil photographique (NA)
*photograph(i)euse / *photograph(i)eur (N-eur)
and in Portuguese:
máquina fotográfica (NA) ‘photographic machine’
*máquina de fotografar (N de V)
*foto-máquina (NN)
*fotografiadora (N-ora) / *fotografadora (V-ora)
In view of these language differences,
many dictionaries used in machine translation
may have to include some word combinations
regardless of their semantic transparency.
When describing different types of compound nouns,
different syntactic properties have to be used to determine
their degree of formal frozenness.
These properties are the very same that are used to
describe the syntactic relations between the elements of a
free noun phrase.
Compound nouns differ from free noun phrases in that they
do not admit some (or any) of these properties.
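Following the lexicon-grammar practice of describing entries by a table of properties, the tests discussed in this section can be gathered into a small matrix, one row per word combination and one column per free-NP property. The Python sketch below fills in only values stated in the examples of this section (None where a property was not discussed) and computes the share of blocked properties as a rough, hypothetical indicator of the degree of formal frozenness; the property names and the ratio are assumptions for illustration, not a measure proposed in the lecture.

```python
from typing import Optional

PROPERTIES = ("relative", "adverb", "deletion", "noun_variation", "adj_variation")

# '+' = the free-NP property holds, '-' = it is (at least partly) blocked,
# None = not discussed in the examples of this section.
MATRIX = {
    "square table": {"relative": "+", "adverb": "+", "deletion": "+",
                     "noun_variation": "+", "adj_variation": "+"},
    "square root": {"relative": "-", "adverb": "-", "deletion": "-",
                    "noun_variation": "-", "adj_variation": "-"},
    "red wine": {"relative": None, "adverb": None, "deletion": "+",
                 "noun_variation": None, "adj_variation": "-"},
}


def frozenness(row: dict) -> Optional[float]:
    """Share of tested free-NP properties that are blocked (hypothetical indicator)."""
    tested = [row[p] for p in PROPERTIES if row[p] is not None]
    if not tested:
        return None
    return tested.count("-") / len(tested)


print(f"{'':14}" + " ".join(f"{p[:8]:>8}" for p in PROPERTIES) + "  frozen")
for entry, row in MATRIX.items():
    cells = " ".join(f"{row[p] or '?':>8}" for p in PROPERTIES)
    print(f"{entry:14}{cells}  {frozenness(row):.2f}")
```

Untested cells are excluded rather than counted as free, so a sparsely described entry such as red wine gets a score from its two documented properties only.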
2. Compound Adverbs.
Compound adverbs pose similar problems.
Simple adverbs are already included in dictionaries
(if we do not consider the adverbs regularly derived from
adjectives with the suffix -ly, such as rapidly),
but many compound adverbs were just left out
or, else, are described as mere expressive word
combinations with no particular lexical status.
The adverbial status of a phrase can be shown by replacing it with a simple adverb:
John is reading Shakespeare (now + at this moment)
For the most part they are formally identical to
prepositional phrases, but several combinatorial
constraints hold between two or more of their elements.
Usually the resulting overall meaning of the expression
cannot be calculated from the sum of the meanings of its
internal elements.
Thus, we find several time adverbs formed with the time-related
noun moment:
<That happened> at (this + that + the) moment
<I was doing this> for the moment
<I didn’t believe it> for a moment
<I did it> on the spur of the moment
<I did it> not a moment too soon
In these adverbs, the combination of preposition and noun is frozen.
If we replace moment with another, almost
synonymous word, instant, most of these combinations
become unacceptable:
<That happened> at (this + that + *the) instant
<I did it> *for the instant
<I didn’t believe it> for an instant
<I did it> *on the spur of the instant
<That happened> ?not an instant too soon
Several adverbs look like ordinary noun phrases:
One moment John was reading quietly, the next
moment he was crying
Some of these NP-like adverbs may derive from the
deletion of a preposition, while others do not:
(At + *on + E) one moment John was reading quietly,
(?*at + ?*on + E) the next (E + moment) he was crying
The current spelling of many simple adverbs
betrays their former status as phrases:
John goes jogging (everyday + every night)
The determiner of the noun can sometimes present some
formal variation, as in:
at (this + that + the) moment, for (a + one) moment
but it becomes frozen when its replacement involves a
clear change in the overall meaning:
John is reading Shakespeare for the moment
I believed for a moment that John was reading
In some adverbs the preposition and the noun may be
frozen but the noun allows for the insertion of modifiers:
<That happened> at that unfortunate moment
<That happened> at the moment we are speaking
<That happened> at this (precise + exact) moment
Some of these insertions may also be frozen:
<That happened> at (this + that + *the) very moment
<That happened> at the (last + *first) moment
<That happened> *at this (imprecise + inexact) moment
or depend on the determiner-modifier combinations
involved (for example, a definite article and a relative clause):
<That happened> at (*this + *that + the)
very moment I was speaking
Other constraints on formal variation can be found:
<John arrived> not a moment too (soon + *late)
Some subordinate clauses function as frozen adverbs
(M. Gross 1986) :
<John will stay in his post> until hell freezes over (= forever)
<John will only get my post> when hens get teeth (= never)
<John will only get my post> when pigs fly (= never)
In these examples, one cannot change any element of the
(frozen) subordinate clause.
Particular cases of frozen subordination
are comparative frozen adverbs,
modifying verbs or adjectives:
<John moves> like a bull in a china shop (clumsily)
<John cried> like Magdalen (very much)
<The crowd rose to its feet> as one man
(together, at the same time)
<John is as fast> as a bullet (= very fast)
<John is as white> as a sheet (= very white)
Notice, in some cases, the absence of the first comparative element (as):
<John is deaf> as a post
Some compound adjectives may have been formed from
such comparative structures:
John is stone deaf
John is deaf (as + like) a stone
but others do not admit this paraphrase:
*John is post deaf
*John is bullet fast
*John is sheet white
There are several compound adverbs that select (or
modify) only a limited set of verbs (or predicates):
John (knows + learned + recited) the poem by heart
The adverb man-to-man can only modify SPEAK-like verbs:
John (spoke + talked) man-to-man to Paul
However, there are often many unpredictable distributional constraints:
*John (chatted + whispered) man-to-man to Paul
*John gossiped man-to-man with Paul
Certain verb-adverb combinations are so constrained that
the adverb can only modify a single verb:
John heard that (E + straight) from the horse’s mouth
(directly from a bona fide source)
Adverbs are optional modifiers of the verb and can usually
be zeroed or replaced by other, simple adverbs, but
these highly constrained combinations are closer to frozen sentences.
Therefore, linguistic description of compound adverbs is not
just a matter of showing their internal word combination
constraints. It also involves representing the way they interact
with the other elements of the sentence.
In this sense, it is not very different from
describing the syntax of simple adverbs.
3. Conclusions
The theoretical and methodological framework of
Lexicon-Grammar has demonstrated the quantitative
importance of compounding in the lexicon of many languages.
Using formal criteria to identify compound words made
clear that most of them show an internal PoS structure
similar to that of ordinary phrases.
Comparing the syntax of free combinations with
restrictions on those formal properties proved to be the
most adequate way of identifying compounds without having
to rely on vague, imprecise, and irreproducible meaning intuitions.
At the same time, it is the very grammar of the language
that comes under scrutiny.
Compounds are not just bizarre word combinations;
they are a clue to the language’s grammar.
Finally, by adopting a formal, taxonomical approach and
by the careful construction of linguistic resources,
Lexicon-Grammar enables researchers working on
different languages to compare their inventories and
their respective syntactic properties
(M. Gross 1984; J. Labelle (ed.) 1995).
These comparative studies constitute a solid base for
many NLP, lexicographic or didactic applications, and
eventually for future machine translation.
ACL, 2003. Proceedings of the Workshop on Multiword Expressions: Analysis, Acquisition and Treatment. Sapporo, Japan: ACL.
ACL, 2004. Proceedings of the Workshop on Multiword Expressions: Integrating Processing. Barcelona, Spain: ACL.
Courtois, B. ; Garrigues, M. ; Gross, G. ; Gross, M. ; Jung, R. ; Mathieu-Colas, M. ; Silberztein, M. ; Vivès, R.
1997. Dictionnaire électronique des noms composés DELAC : Les composants NA et NN. Rapport Technique
du LADL nº 55, Paris : LADL.
Gross, G., 1988. Degré de figement dans les noms composés. Langages 90 : 57-72. Paris : Larousse.
Gross, G., 1990. Définition des noms composés dans un lexique-grammaire. Langue Française 87, Paris :
Gross, G., 1996. Les expressions figées : noms composés et d’autres locutions. Paris : Ophrys.
Gross, M., 1984. A linguistic environment for comparative romance syntax. Papers from the 12th Linguistic
Symposium on Romance Languages, P. Baldi (ed.). pp. 373-416. Amsterdam/Philadelphia: John Benjamins.
Gross, M., 1975. Méthodes en Syntaxe. Paris: Hermann.
Gross, M., 1981. Les bases empiriques de la notion de prédicat sémantique. Langages 63 : 7-52. Paris :
Gross, M., 1986. Grammaire transformationnelle du français. 3- Syntaxe de l’adverbe. Paris : ASSTRIL.
Labelle, J. (ed.), 1995. Lexiques-Grammaires comparés et traitements automatiques. Lingvisticae
Investigationes Supplementa. Amsterdam /Philadelphia: John Benjamins.
Ranchhod, E.; De Gioia, M. 1996. Comparative Romance Syntax: Frozen adverbs in Italian and in Portuguese.
Lingvisticae Investigationes XX-1: 33-85. John Benjamins.