MODELLING LEXICAL PHRASES ACQUISITION IN L2*
    <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#rn0>

CHANIER Thierry (1), COLMERAUER Colette (2), FOUQUERÉ Christophe (3),
ABEILLÉ Anne (4), PICARD Francoise (5), ZOCK Michael (6)

(1) Département de Linguistique, Université Clermont 2, France.
(2) Laboratoire de Langues, Université Aix-Marseille 2, France.
(3) LIPN-URA 1507, Institut Galillée, Université Paris 13, France.
(4) Département de Sciences du Langage, Université Paris 8 , France.
(5) CNRS-GRTC, Marseille, France.
(6) CNRS-LIMSI, Orsay, France.


      1. Introduction

The acquisition of lexical competency is considered one of the major
problems in foreign language learning (L2): in order to use a given
word properly, the learner must have previously assimilated not only
its morphological and syntactic properties, but also its semantic and
pragmatic features. Furthermore, this process of knowledge acquisition
is incremental and nonmonotonic, that is, rule formation is based on
incomplete data; hence, rules may have to be revised completely in the
light of new evidence (data).

The problem we have chosen to focus on within this project is that of
/lexical phrase/ or /semi-frozen phrase/ (SFP)1
<http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#rn1>.
Theoreticians and practitioners of second language acquisition have
stressed their importance: they are frequently used by natives in their
mother tongue, their acquisition is considered to pose problems in second
language learning, and these phrases occur far more often than simple
lexical items2 <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#rn2>.

An SFP is a phrase in which certain parts cannot be altered (these parts
being subject to restricted syntactic variations) without affecting
the original meaning or function of the expression. Take for example an
expression like "abandonner le navire" (abandon the ship). In this case
one can replace the word abandonner with quitter, but not navire with
bateau.

At first glance SFPs may be divided into two sets: phrases with a
/referential meaning/, like idioms, clichés and proverbs [DANL 88; GROS
82; REY 91; COWI 83], and phrases with a /discourse function/, that is,
phrases which can only be pragmatically interpreted, e.g. "How do you
do?" [MANS 83 ; NATTI 88]3
<http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#rn3>.

L2 didacticians provide strong arguments in favour of teaching these
phrases [NATTI 88] : the user may operate on larger chunks rather than
atomic units (words), hence this strategy alleviates the burden of
memory and processing. Furthermore, it allows the learner to focus her
attention on conceptual (discourse structure, coherence) and pragmatic
aspects of the dialogue (social aspects of the interaction, adequacy of
the means chosen). A proficient learner may thus avoid the violation of
certain lexical restrictions, errors of register, and so forth when
producing a discourse.

Given the fact that the linguistic functions of these two sets of SFPs
are fundamentally different, they should be introduced at different
stages during the learning process. It would be very difficult to study
both of them concurrently, which is why we have decided to concentrate
on only one of them, namely, phrases with referential meanings.

An idiom is defined as an expression in which the meanings of the
constituent elements do not appear in the global meaning, e.g. /casser
sa pipe/ (kick the bucket). The non-compositionality of an idiom is
generally considered as the discriminating factor of SFPs. The
processing of idioms by adults and their acquisition by children have
been studied by psychologists. But their studies have been limited so
far to L1. Section 2 provides an overview of the results concerning the
classification of idioms.

With regard to L2 acquisition of SFPs with referential meaning, we
believe that idioms should not be distinguished from the rest, such as
clichés: être gai comme un pinson / be as gay as a lark. Their learning
is often introduced at an advanced level. Mastering such idioms is
generally considered to be difficult in production as well as in
comprehension, because (a) their meaning cannot be deduced from their
components; (b) their syntactic variations are severely constrained
(*casser sa _propre_ pipe4
<http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#rn4>); (c) their
figurative meanings may introduce nuances obvious for natives, but far
from evident for the second language learner (see, for example, the use
of casser sa pipe / kick the bucket vs. to die). For all these reasons it
seems necessary to distinguish SFP acquisition from the acquisition of
single words and collocations, for example: il fait enregistrer ses
bagages / he checks in his luggage. The latter can be semantically
decomposed and have no restrictions with regard to syntactic
variations. It is perfectly possible to produce variations like:

His luggage was checked in.

He checked his luggage in late.

This paper starts by presenting psycholinguistic evidence concerning the
processing and acquisition of SFPs in L1. It then provides a survey of
the work done by linguists and a contrastive analysis of SFPs in two
languages. Finally, it describes our project from a linguistic,
psychological and computational point of view.

The focal points in our study are the following :

  * to compare the views psycholinguists and computational linguists
    have concerning the processes of lexical access and lexical choice;
  * to show the similarities holding between the structure of the
    lexicon in L1 and in L2;
  * to offer a pedagogically realistic approach in vocabulary teaching
    based on these results. 


      2. Acquisition and processing of SFPs

Different researchers have offered different hypotheses concerning the
relationships between SFPs and their constituents, and concerning the
organisation and access of SFPs within the mental lexicon. The results
of this research show that different SFPs are learnt at different
moments. These data apply both to learning SFPs in a natural setting and
to learning them in an institutional environment.


        Processing

Early experiments on the comprehension of SFPs were based on the
assumption that SFPs are /non-decomposable/ and that idioms can be
processed either literally or figuratively. It was claimed that the
figurative meaning of a SFP is directly represented in the mental
lexicon. Each SFP has only one lexical entry, in a similar way as simple
words. Authors disagreed on the starting point and the duration of the
literal or idiomatic processing.

Lexical access of an SFP is based on its meaning. It occurs either (a)
directly before literal processing [GIBB 80]; (b) at the same time as
literal meaning is processed [SWIN 79]; or (c) after the rejection of a
literal interpretation - this may be the case when contextual factors
exclude it as irrelevant [BOBR 73].

These models, like the earlier ones concerning the mental lexicon, are
now considered as oversimplified. They cannot explain certain results
obtained in more recent experiments. Important differences in the
processing of SFPs have been noted. SFPs do not form a unique class of
linguistic items. Researchers studied parameters like semantic
analyzability [CACC 88; GIBB 89b], familiarity, frequency [SCHW 86; POPI
88], syntactic and lexical flexibility [GIBB 89a].

The assumption that an SFP is semantically non-decomposable (an
assumption which was used, among other things, to define SFPs) needs to
be reconsidered. Experiments have shown that in order to understand the
meaning of an SFP people tend to analyse the expression syntactically
and semantically.

The main elements of the phrase are activated during comprehension. The
literal and/or figurative meanings of these elements give access to the
global meaning even when the phrase is transformed. It should also be
noted that the role played by the different components is unequal. The
stronger the intrinsic meaning of the constituents, the easier the
understanding of the SFP will be. Native speakers’ intuitions with
regard to the potential analyzability of an SFP correlate with its
syntactic flexibility, its ease of comprehension and its semantic
productivity [GIBB 89a; GIBB 89b].

Cacciari and her colleagues classify idioms in the following way [CACC
91]5 <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#rn5>:

- _type N, non-analyzable phrases_: syntactic or semantic analysis of
the components is difficult for one of two reasons: (a) the surface
structure does not obey the rules of syntax, or (b) the expression
contains words which do not exist outside of the SFP. By and large
illustrates the former (syntactically odd); _spic_ and span, prendre
la poudre d'_escampette_, _apurer_ un compte and en avoir _marre_ de are
examples of the latter (lexically ill-formed).

- _type AO, analyzable-opaque_: the relationship between the words of
the phrase and the meaning of the whole is unclear. However, every word
may contribute to the meaning and use of the phrase: kick the bucket /
casser sa pipe.

- _type AT, analyzable-transparent_: the relationship between the words
of the phrase and the meaning of the whole is more transparent. This
correspondence is based on the fact that each element has often both a
literal and figurative (mainly metaphoric) meaning: break the ice/
briser la glace, pop the question, spill the beans, rouler sur l'or.

- _type M, quasi-metaphorical_: there is nearly a perfect match between
the literal and the metaphorical interpretation: abandon the ship /
abandonner le navire.
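
The four classes above can be written down as a small data structure,
which is convenient when idioms have to be grouped by class for an
experiment. The following Python sketch is only an illustration: the
names IdiomType, Idiom, EXAMPLES and by_type are hypothetical, and the
class assigned to each example is the one given in the list above.

```python
from dataclasses import dataclass
from enum import Enum


class IdiomType(Enum):
    """The four classes of idioms listed above [CACC 91]."""
    N = "non-analyzable"
    AO = "analyzable-opaque"
    AT = "analyzable-transparent"
    M = "quasi-metaphorical"


@dataclass(frozen=True)
class Idiom:
    form: str            # canonical form of the phrase
    language: str        # "en" or "fr"
    idiom_type: IdiomType


# Examples taken from the classification above.
EXAMPLES = [
    Idiom("by and large", "en", IdiomType.N),
    Idiom("casser sa pipe", "fr", IdiomType.AO),
    Idiom("kick the bucket", "en", IdiomType.AO),
    Idiom("briser la glace", "fr", IdiomType.AT),
    Idiom("spill the beans", "en", IdiomType.AT),
    Idiom("abandonner le navire", "fr", IdiomType.M),
]


def by_type(idioms, idiom_type):
    """Return the forms of all idioms that belong to one class."""
    return [i.form for i in idioms if i.idiom_type is idiom_type]
```

For instance, by_type(EXAMPLES, IdiomType.AO) returns the two
analyzable-opaque examples, casser sa pipe and kick the bucket.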

The research mentioned focused on the relationships between the
canonical form of an SFP (and its meaning) and the various forms it could
take in discourse, or on the methods of analyzing its components.
Recently, possible relationships between SFPs belonging to the same
semantic field have been studied. For example, Gibbs [GIBB 90] has shown
that native speakers distinguish in use between two semantically similar
expressions like blow your stack and bite his head off, depending on the
appropriateness of the metaphor for the discourse (respectively, "anger
is pressurized heat" and "anger is animal behaviour").

We adopt in this study a sort of reverse approach, that is, we start
from the global meaning of the SFP and end in its verbalization. The aim
is to show to what extent the meaning of a SFP is determined by
conceptual metaphors [GIBB 90; CACC 92].


        Acquisition in L1

Recent work on L1 acquisition of idioms casts considerable doubt on the
predominant view that children are unable to understand idioms before
having reached the stage of figurative competence, that is, before the
age of 9 or 10 [CACC 89, 92; GIBB 87]. These experiments also show that
exposure, though being very important for production, is not the main
factor of idiom acquisition. Furthermore, they show that idiom
acquisition does not rely heavily on rote learning.

Levorato and Cacciari identified three stages for acquiring these
expressions [LEVO 92]. At stage one (6-7 years), children are able to
comprehend the idiomatic meaning if the context provides strong cues.
When the child feels that the initial literal interpretation is
irrelevant she will replace it with a figurative interpretation, the
latter being built by using contextual linguistic knowledge. At the
intermediate stage, when the child has enriched her linguistic
competency (by then she is able to handle irony, speech acts,
conventional metaphors,...), she may directly access the figurative
meaning. It should be noted, that the acquisition of idiomatic
expressions requires the comprehension/reconstruction of their
figurative meanings when this is possible. The last step is
characterized by the fact that the child has full control of the idioms,
that is, she starts to use them creatively6
<http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#rn6>. According to
the type of idiom (opaque, transparent, quasi-metaphorical) different
strategies can be observed.

Other experiments focused more on strategies used by children and adults
in L1 in order to understand the meaning of unknown idioms. According to
Cacciari [CACC 92] these strategies vary with the type of idiom: "the
interpretative strategies children thought as more appropriate reflect
the perception of the semantic characteristics and cognitive
complexities of idioms". Semantic transfer is used first for
quasi-metaphorical expressions (muet comme une carpe / as dumb as a fish).
Transparent idioms require mentally performing the action described
(chercher une aiguille dans une botte de foin / to look for a needle in
a haystack). General strategies such as asking adults, performing the
action, or finding examples are preferred in the case of opaque
expressions (être au septième ciel / to be in the seventh heaven). In
this latter case, a lot of miscomprehensions occur, which can be viewed
as a lack of any strategy.


      3. Lexical studies in L1 and L2

We focus here on lexemes as they are found in a dictionary. Our examples
are based on groupings performed by linguists. They range from simple
noun phrases or verb phrases to sentences with an idiomatic
interpretation. We do not consider proverbs and sayings. In order to
gain a better understanding concerning the related semantic phenomena
within a language (such as focalisation, irony, register,…) and the
syntactic-semantic correspondences between two languages (in our case
English and French), we have decided to consider only examples
pertaining to a specific semantic field: deception.

N0 prendre N1 dans ses filets7
<http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#rn7>
N0 avoir N1 dans les grandes largeurs / N0 to pull a fast one on N1
N0 dorer la pilule à N1 / N0 to sugar the pill for N1

Since meaning has been factored out (it is invariant), we can now show,
by varying linguistic and pragmatic factors, how these parameters
influence lexical choice.


        Lexical studies in L1

We start with some general, language independent remarks concerning the
syntactic structures of expressions.

With the exception of non-analyzable phrases, SFPs have standard
syntactic structures, covering simple sentence forms. Gross [GROS 82],
after an extensive study of these expressions in French, concludes that
frozen noun phrases occur much more often in complement position than in
the position of subjects; and that the number of frozen complements is
restricted to two, even though syntax may allow for more. The
expressions of our field of study (deception) have the same
characteristics. Besides this kind of structural information we have to
show how context codetermines the meaning and the choice of a specific
lexical expression. Hence we have to take into account factors like:

- focalisation: N0 avoir N1 / N0 to have N1 on versus N1 se faire avoir
/ N1 to be done.

- shades of meaning (nuances of language)

- preservation of the images induced by context.

For instance, the following expressions in French denote different
intensities of deception:

1. faire une farce (or variations: faire une bonne/belle/petite/grosse... )
2. faire une mauvaise farce
3. faire un coup tordu
4. faire un coup fourré
5. faire un coup de Jarnac
6. faire le coup du père François

While the first two expressions convey the notion of jokes, examples
three and four express the general notion of deception. The last two
expressions add the notion of intensity to the notion of deception.
Instead of translating un coup tordu by a bad trick, and un coup de
Jarnac by a stab in the back, we investigate how the second language
learner marks the degree of deception in the target language.

It is also very important to learn the contexts in which two expressions
having approximately the same meaning can be used. The image expressed
by a lexical phrase may entail many implications concerning the context
(see stylistic classification below). For instance, how does a learner
(L1 or L2) become aware of the difference between the following two
phrases ?

7. il a trouvé la pilule amère versus la pilule était amère à avaler
or simply la pilule était amère or il a avalé la pilule

8. il a avalé des couleuvres or on lui a fait avaler des couleuvres,
il a eu beaucoup de couleuvres à avaler

Obviously, she soon understands that /swallowing bitter pills/ or
/snakes/ is a rather unpleasant experience that you do only when being
forced. Furthermore, she notices that pills are smaller than snakes and
easier to swallow. Finally, she learns that a pill is swallowed only
once but snakes are swallowed one by one. Note that pill is always
singular, whereas snakes are always plural in the idiom. This
information has to be highlighted so that the learner understands
when to use which form. For instance, it is possible to help her
discover these constraints by presenting the following examples:

9. Quand il est rentré de vacances, il a trouvé son bureau sens
dessus-dessous et son ordinateur volé !
	When he came back from holiday, his office was all in a mess and his
computer stolen !
- la pilule a été amère à avaler ! 	- a bitter pill to swallow
- * il a avalé des couleuvres ! 	- he had to swallow snakes

10. Quand il s'est marié, toute sa belle famille lui a reproché ses
origines paysannes et le lui a fait sentir

	When he got married all his family in law resented his countryside
origins and let him know
- * la pilule a été amère à avaler ! 	- a bitter pill to swallow
- il a avalé des couleuvres ! 	- he had to swallow snakes

Furthermore, a person knowing the meanings of farce / trick, pilule
amère / bitter pill and couleuvres / snakes can easily understand the
examples (1,2,9,10), but not (3,4,5,6) which are more "frozen" idiomatic
expressions.

We agree with Gibbs when he states: "The more analysable an idiom (i.e.
the more speakers are aware of these phrases as having separate
meaningful units), the more likely that the expression is syntactically
productive" [GIBB 89a:102]. And we could paraphrase this idea by "the
more learners are aware of these phrases as having separate meaningful
units, the more easily they will understand and learn them" provided
that there are semantic markers (intensity, frequency, actors,..).

Without going too deeply into rhetoric [FONT 77], we would like to
remind the reader of the definitions of a few figures of speech. A
/trope/ is a figure of speech that figuratively enhances the explanation
and the expression of an idea. There are various tropes, metaphors being
the best known. A /metaphor/ is an image that signals a resemblance
between the object described and the idea expressed by the metaphor (he
is a wolf = this man is bold like a wolf). /Metonymy/ is a figure of
speech where a symbol stands for the whole (the crown for the monarch),
whereas in a /synecdoche/ a part stands for the whole or vice versa (50
head of cattle for 50 cows). The figures of speech can also be combined.

They exist in any language.

For example, in French un renard / fox is a metaphor, une fine épice /
spice is a metonymy and an expression like pas folle la guêpe is a
litotes8 <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#rn8>. The
fact that these tropes are linguistic universals is very useful for L2
acquisition.


        Comparison of phrases in L1 and L2

By comparing languages like Italian, English and French one notices that
there are many correspondences between idiomatic expressions, be it at
the lexico-syntactic or metaphorical level [D'ELI 90 ; FREC 85]. This
comes as no surprise, since these countries are also culturally and
linguistically relatively close.

In a comparative study of Italian and French, Conenna has shown [CONN
84] that out of 2000 Italian expressions having the structure V - frozen
direct object complement, more than forty percent could be translated
(nearly) literally into French. Freckleton [FREC 85] studied numerous
idioms, including those built with the English verbs take, play and hit:
nearly eighty percent of the 238 expressions built with the verb take
have an idiomatic correspondence in French, and 20% have a quasi
word-by-word translation.

With regard to play and hit, the corresponding percentages are,
respectively, 77% and 15% out of 60 expressions, and 37% and 3% out of
38 expressions.

Approximately the following correspondences can be given for two languages:

1) _Word-for-word correspondences_ (neglecting marginal differences such
as presence of determiners, possessives, matching of number, i.e.
singular vs. plural )

14. N0 cache ses cartes, son jeu / N0 hides one's cards
15. N1 tombe dans le piège / N1 falls into the trap

Other well known examples falling out of our restricted domain are:

break the ice / rompere il ghiaccio / rompre la glace
take the bull by the horns / prendere il toro per le corna / prendre le
taureau par les cornes

2) _Partial lexical difference_, _similar tropes_

In French, jouer au plus fin means to play the smartest wit, wit remaining
unexpressed, whereas to have a battle of wits is an expression in English.

16. N0 joue au plus fin avec N1 / N0 has a battle of wits with N1
17. N0 traite N1 à la légère / N0 plays fast and loose with N1

Or :

be as gay as a lark / être gai comme un pinson
kill time / passare il tempo / tuer le temps

3) _Structural correspondences, different tropes_

The same idea is expressed by two "tropes" with the same topic N0:

18. N0 joue/ mise sur les deux tableaux
19. N0 runs with the hare and chases with the fox

Or :

hit the road / prendre le large

This case seems to pose problems for a second language learner, yet an
adult will quickly understand, provided that all the vocabulary is
explained (hare, fox, chase). However, a problem may remain in cases
with an opaque reading (18). In this latter case the lexeme tableau has
an (old) unusual meaning, referring to the stages of making professional
progress.

4) _Neither syntactic nor lexical correspondences_

A typical example may be:

take the wind out of N's sails / couper l'herbe sous le pied de N

Compare (19) and (20):

19. pas folle la guêpe!
20. a sharp customer!

(19) is a /metalepsis/, that is, a combination of different layers of
tropes.

Un guêpier / a wasp's nest is a metaphor meaning /a difficult state of
affairs/. When combined with a synecdoche, N1 becomes une guêpe / a wasp.
Finally, an understatement, or litotes, is used in order to express
the animal's clever reaction. (20) seems to be the equivalent of (19),
that is, N1 was to be fooled in a commercial transaction, but being clever
he reacts in an adequate way. In order to express these subtleties, we
have two different images and two different syntactic constructions.

5) _No idiomatic correspondence_

The expression to take the fifth refers to the US constitution, whose
fifth amendment allows people not to answer a question if the
answer may incriminate them. There is no equivalent expression in
Italian, French, or English.


      4. Experiments and modelling of their acquisition in L2

We have already given in section 3 several examples of SFPs for a
restricted domain (deception). Our purpose is to study the SFPs within
this domain, and to specify the conditions under which these expressions
are used in a given language. In order to do so we will build a
structured lexicon. The latter contains conceptual, linguistic and
pragmatic information such as the position of an expression within a
hierarchy, its syntactic constraints, lexical functions in the spirit of
Mel'čuk [MELC 86], communicative function (irony, topicalization) and the
surface form of the expression (lexemes).
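
To make the kinds of information listed above more concrete, the
following Python sketch shows one possible shape for an entry of such a
structured lexicon. It is an illustration under stated assumptions, not
the project's actual representation: the names LexiconEntry and
StructuredLexicon, the field names, and the hierarchy labels ("joke",
"deception", "intense deception") are hypothetical, chosen to mirror the
examples of section 3.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    surface_form: str        # lexemes of the expression
    concept_path: tuple      # position within the hierarchy, root first
    syntactic_constraints: set = field(default_factory=set)
    lexical_functions: dict = field(default_factory=dict)
    communicative_functions: set = field(default_factory=set)


class StructuredLexicon:
    """A flat store of entries that can be queried by concept."""

    def __init__(self):
        self._entries = []

    def add(self, entry):
        self._entries.append(entry)

    def under(self, concept):
        """Surface forms of all entries located under a given concept."""
        return [e.surface_form for e in self._entries
                if concept in e.concept_path]


lexicon = StructuredLexicon()
lexicon.add(LexiconEntry("faire une farce", ("joke",)))
lexicon.add(LexiconEntry("faire un coup tordu", ("deception",)))
lexicon.add(LexiconEntry("faire un coup de Jarnac",
                         ("deception", "intense deception")))
# The plural object is the constraint noted in section 3 for this idiom.
lexicon.add(LexiconEntry("avaler des couleuvres", ("deception",),
                         syntactic_constraints={"object always plural"}))
```

A query such as lexicon.under("deception") then returns the three
expressions filed under that concept, in the order in which they were
added.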

This study should prove useful in order to discover basic aspects of
lexical acquisition in L2. The correspondence of expressions like the
ones given in section 3 should provide us with valuable clues concerning
the learners’ behaviour. We would like to find answers to the following
questions:

  * is it easier to understand and to produce expressions of type 1 and
    2 (the two being quite similar in meaning and form)?
  * what will be prevalent in the choice of an expression, the meaning
    (image) or the linguistic form?
  * are paraphrases a reliable tool to measure the student’s
    comprehension even if she cannot find the exact equivalent in her
    mother tongue?
  * what influence may similarities between two languages (type 2 and 3)
    have on the learning process (positive/negative interferences)?
  * what is the role of correspondences between two languages, what is
    their respective role compared to the other factors mentioned
    above (L1 acquisition)?

In order to answer these questions we have designed transversal and
longitudinal experiments for second-language learners of French and
English. Having the computer run the experiment will remove the semantic
decisions from the intuitions of the researcher and place the burden of
proof on the computational mechanisms [KREU 91].


        Transversal experiments on recognition and production

_Population_

Two different transversal experiments will be run in English and French
with two types of second-language learner: beginners and students of
intermediate level.

_Corpus and objectives_

We will present various lexical phrases that are syntactically,
semantically and pragmatically representative of the different classes.
The experiment will be done along the lines of Levorato and Cacciari
[LEVO 92]: the idiom under study will be the conclusion of an easily
understandable, very short story.

_Hypotheses_

According to Cacciari and Glucksberg [CACC 91] context provides a rich
set of clues for interpreting and producing idioms. Given a specific
kind of learner, differences in acquisition depend on the semantic class
of the idiom (opaque, transparent or metaphorical meaning). This being
so, we would like to compare the relative ease of idiom acquisition in L1
and L2. We would also like to find out to what extent words frequently
used in the mother tongue facilitate the acquisition of these words in a
foreign language. In our experiment learners should be able to perform
syntactic operations on the idioms. The students’ errors should give us
valuable information concerning L2 acquisition.


        Longitudinal experiments: Lexical awareness among L2 adult learners

In a longitudinal study we want to check and to promote lexical
awareness of adult L2 learners. We will build an interactive dictionary
that is meant to foster lexical competency of second-language,
university level students. They need English as an everyday tool for
reading scientific material, and they have to read authentic material
(articles) which forms a specific type of "sublanguage". Scientific
articles offer a good range of expressions and idioms and they use the
same expression over and over.

This environment offers an on-line monolingual dictionary with embedded
information, so that only useful information will be displayed. After
each reading session it will update the student’s personal lexicon by
adding the new words and expressions encountered. This kind of lexicon
will enable the student to memorize words in their context and to link
new contexts to older ones. By the end of two years the student will
have encountered most of the words he needs in their appropriate context.

There are basically three ideas behind this tool:

- have the students deduce the word’s meaning from context (from the
dictionary and from the text itself) and help them to discover the
relationship among words;
- let them use the intuition of their native language in order to
understand the figures of speech and the meaning of idioms;
- promote free association and cognitive awareness in order to enhance
memorization

Another purpose of this approach is to provide adults with a tool that
allows them to build a structured lexicon for metaphoric and idiomatic
expressions. This second goal will be reached by constraining
dictionary and lexicon accesses, which will be carefully tracked, and by
having the student manage a personal lexicon. All interactions
(lexical access, …) will be recorded, and this trace will provide us
with valuable clues concerning the way literal and metaphorical
meanings are understood, and concerning the way learners link new
words to old ones.

The task: at each session the students choose an article of around 800
words and answer a few questions by taking into account the following
instructions:

- read the text entirely;
- point out dubious words or expressions;
- check the meaning of unknown words in the dictionary;
- if necessary, consult the personal lexicon (The personal lexicon
contains, among other things, the associations a user may have when
encountering a given word)

The exercise is done when all questions are answered. If the student
does not understand the meaning of a word or expression he is invited to
consult the dictionary. At the end of the session students are asked to
make "free associations" with the new words. Free associations should
enhance ease of word access. All interactions (words checked in the
dictionary, free associations,…) are recorded in the personal lexicon.
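
The recording of interactions described above can be sketched as
follows. This is a minimal illustration, assuming that an interaction is
either a dictionary lookup or a free association; the class
PersonalLexicon, its method names and the sample sentences are
hypothetical and do not describe the actual tool.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalLexicon:
    # item -> contexts (sentences) in which it was encountered
    contexts: dict = field(default_factory=dict)
    # item -> free associations given by the student
    associations: dict = field(default_factory=dict)
    # chronological trace of interactions: (session, kind, item)
    trace: list = field(default_factory=list)

    def record_lookup(self, session, item, context):
        """Store a dictionary lookup together with its context."""
        self.contexts.setdefault(item, []).append(context)
        self.trace.append((session, "lookup", item))

    def record_association(self, session, item, associated):
        """Store a free association made at the end of a session."""
        self.associations.setdefault(item, []).append(associated)
        self.trace.append((session, "association", item))

    def lookups(self, item):
        """Number of times an item was checked in the dictionary."""
        return sum(1 for _, kind, it in self.trace
                   if kind == "lookup" and it == item)


student = PersonalLexicon()
student.record_lookup(1, "rule out", "This result rules out the first hypothesis.")
student.record_lookup(2, "rule out", "Noise cannot be ruled out as a cause.")
student.record_association(2, "rule out", "exclude")
```

Keeping the contexts per item is what lets a new context be linked to
older ones, and the chronological trace is the material from which
lookup frequencies per item can later be computed.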


        Computer modelling

Data collected from these experiments should help us to gain a better
understanding of the learning and production of lexical phrases in
French and English as a second language. The next (and to some extent
parallel) step will be to define a model of the mental lexicon. This
model is based on the following three parts:

- 1) a description of lexical phrases with a limited number of concepts,
containing all important syntactic, semantic and pragmatic features;

- 2) a model of automatic parsing and generation;

- 3) models of recognition and production strategies for lexical phrases.

The first task can be divided into several subtasks. We will list all
words and expressions of a semantic field (in our case, human
deception). A structural description of the corresponding SFPs is
undertaken in terms of lexicalized TAGs (Tree Adjoining Grammars; for
more details see [ABEI 89]).

The same grammar is used in order to describe the syntactic behaviour of
SFPs as well as other sentences and to parse all of them. Within the TAG
paradigm an SFP corresponds to a simple entry (that is, a
non-compositional entry) which can be made out of several discontinuous
items. Insertion of modifiers can easily be handled: it is thus possible
to distinguish between an insertion that applies to a sub-component of an
SFP, an insertion which modifies the entire expression, or an insertion
that rules out the idiomatic reading. Different kinds of transformations
can also be handled: passivization, topicalization, cleft constructions,
relativization, wh-questions, pronominalization, …

The last two hardly ever apply to SFPs. One important point here is to
measure the extent to which a SFP deviates syntactically from its
literal counterpart, because, if only few of them deviate, we could have
a predictive model of SFPs.
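
The idea of one entry made of several discontinuous items can be
illustrated with a much simpler stand-in than a TAG. The Python sketch
below is not a TAG parser: it only checks that the frozen items of an
entry occur in order in a tokenized (and, by assumption, lemmatized)
sentence, and reports the material inserted between them. The function
name match_frozen is hypothetical.

```python
def match_frozen(frozen, tokens):
    """Match the frozen items of an SFP, in order, against a token list.

    Returns the list of tokens inserted between the first and the last
    frozen item, or None if the frozen items do not all occur in order.
    """
    inserted = []
    i = 0              # index of the next frozen item to find
    started = False    # True once the first frozen item has been seen
    for tok in tokens:
        if i < len(frozen) and tok == frozen[i]:
            i += 1
            started = True
        elif started and i < len(frozen):
            inserted.append(tok)
    return inserted if i == len(frozen) else None
```

With the frozen items casser, sa, pipe, the tokens of "il va casser sa
pipe" match with no insertion, whereas those of "casser sa propre pipe"
match with propre reported as inserted material. Deciding whether such
an insertion modifies a sub-component, modifies the whole expression, or
rules out the idiomatic reading is exactly the part that requires the
grammar, and is left out of this sketch. Since an empty list is a valid
result, a failed match should be tested with "is None".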

SFPs must be described not only in terms of syntax, but also in terms of
semantics and pragmatics.

The semantic description can be done in functional terms, that is, in
terms that spell out relationships holding between SFPs. The result of
such a description should make obvious which SFPs are closely related in
terms of meaning, what relationship holds between an SFP and a general
concept (N0 to have N1 on / N0 deceive N1), or what the relationship is
between an SFP and its word components. In order to achieve this goal we
have to go beyond simple relations such as synonymy and hyponymy. What we
need is a set of lexical functions as rich as the ones provided by the
Meaning-Text Theory [MELC 86]. The pragmatic description will be based
on the most significant parameters governing the use of SFPs in a
communicative setting (register, familiarity, frequency, …).

The four tasks related to part one give only a prescriptive view of the
linguistic knowledge of SFPs in one domain, i.e. the expert/linguist's
view. How does that knowledge relate to the learner's mental lexicon in
L1 and L2? A simplistic way would be to consider the learner's
knowledge as a subset of the expert's knowledge. This is known as the
/overlay approach/ to learner modelling in Intelligent Tutoring Systems
[WENG 87].
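
As a rough illustration of the overlay approach, the sketch below
represents the expert's knowledge as a set of SFPs and the learner's
knowledge as a subset of it. The phrase inventory and the function names
(make_overlay, missing_knowledge) are hypothetical and chosen only for
illustration; they do not describe an implementation from this project.

```python
# Hypothetical expert inventory of SFPs (illustrative entries only).
EXPERT_SFPS = {
    "abandonner le navire",
    "sweeten the pill",
    "kick the bucket",
}


def make_overlay(known_phrases):
    """Return the learner model: the part of the expert set the learner knows."""
    return set(known_phrases) & EXPERT_SFPS


def missing_knowledge(overlay):
    """Return the expert SFPs that the learner overlay does not yet contain."""
    return EXPERT_SFPS - overlay


# "spill the tea" is not in the expert set, so the overlay cannot record it.
learner = make_overlay(["kick the bucket", "spill the tea"])
```

Anything the learner believes that lies outside the expert set is simply
dropped by such a model, which is one reason why the approach can be
called simplistic.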

Moreover, the output of this description should give us a classification
of SFPs w.r.t. different criteria, which can help to structure lexicons
in L1 and L2, and a predictive model of what can be learned most
easily. This classification will be tested along the lines presented
above. We believe that this kind of interaction between the empirical
work done by psychologists (experiments with people) and the simulation
done by computer scientists (description and simulation of a theory) is
a fundamental research strategy. Nearly all experiments done in
psycholinguistics concerning idioms rely on the designers' linguistic
intuitions. These intuitions are usually too scarce to allow for
building a computational model.

The static part of the mental lexicon (covering L1 and L2) can be
divided into roughly two components: declarative, linguistic knowledge
and procedural knowledge. Part 2 will account for the access and
production processes, which in the area of natural language processing
are called parsing and generation. Important problems which should be
addressed are: when parsing an utterance, what factors determine the
recognition of several words as an idiomatic expression rather than a
sequence of words? As previously mentioned, syntactic information can
rule out the idiomatic interpretation, or accept both readings. The point
here is to determine what information blocks the literal interpretation.
Is it the words' meanings, the co-text, etc.?

With regard to generation, the question is what factors determine the
selection of an idiom rather than an expression composed of free
elements. Systems which let the learner discover these factors
empirically have already been proposed in the context of computerized
language learning. For instance, once the content has been determined,
SWIM asks the user to try on his own to find the equivalent linguistic
structure (words and sentence form). It is only then that the system
generates its version [ZOCK 92].

As stressed in section 3, many SFPs are related to tropes, and even if
hearers do not need to apply the trope strategy of interpretation in
order to recognise them in L1, this strategy is still directly
accessible and often used when changing lexical items in expressions
[GIBB 90]. In consequence, another important capacity of the
parser/generator should be to allow for, or to produce, lexical
variations. Consider examples (27) and (28), taken from [CRUS 86: 42]:

27. They tried to sweeten/sugar the pill.
28. They tried to sugar the medicine.

A parser taking (28) as input should be able to access the SFP used in
(27); conversely, a generator should be able to produce both versions,
as they are reasonably close in meaning.
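
The behaviour expected of the parser on (27) and (28) can be sketched as
follows. This is only a minimal illustration under strong assumptions:
the SFP is stored as an ordered list of slots, and the table of
semantically related words (RELATED) is written by hand. All names in
the sketch are hypothetical, and a realistic system would rely on the
syntactic and semantic descriptions discussed above rather than on a
flat word list.

```python
# Hypothetical SFP entry: each slot lists the lexical items attested in it.
SFP_LEXICON = {
    "sweeten the pill": [{"sweeten", "sugar"}, {"pill"}],
}

# Hand-written, purely illustrative table of semantically related words.
RELATED = {"pill": {"medicine"}}


def slot_matches(word, slot):
    """True if word fills the slot directly or is related to one of its items."""
    return word in slot or any(word in RELATED.get(item, set()) for item in slot)


def access_sfp(content_words):
    """Return the SFPs whose slots are all matched, in order, by the words."""
    hits = []
    for name, slots in SFP_LEXICON.items():
        if len(slots) == len(content_words) and all(
            slot_matches(word, slot) for word, slot in zip(content_words, slots)
        ):
            hits.append(name)
    return hits


# Content words of example (28): "They tried to sugar the medicine."
found = access_sfp(["sugar", "medicine"])
```

With this table, the content words of (28) lead to the same entry as
those of (27), whereas an unrelated noun does not.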

Part 3 refers to the dynamic aspects of the mental lexicon. We use the
word "dynamic" in order to refer to the /acquisition/ of SFPs rather
than to their processing in discourse (analysis/generation). Description
of linguistic knowledge (part 1) and the implementation of processes
that can access it (part 2) are necessary components for modelling a
given knowledge state of the learner. In order to capture the dynamic
nature of acquisition we have to represent explicitly the factors that
play a role when moving from one (knowledge) state to another. The
strategies a second language learner develops in order to understand or
produce utterances are an important issue. They can be divided into
first or second-language strategies [IRUJ 86], i.e. strategies that
apply knowledge either of the source or the target language. A typical
example of a target language strategy in generation is the substitution
of semantically related words whenever one does not fully remember a
SFP. In the same situation the computer would apply the processes (part
2) mentioned above, in order to preserve the trope.


      5. Conclusion

Semi-frozen phrases represent a challenge for researchers studying the
acquisition of words and expressions in a foreign language: although
they are frequently used in the mother tongue, their acquisition is
considered difficult in L2. We have offered a framework in order to study
SFPs, to compare them and to explain why they are so difficult for the
second language learner. A lot of research has been done in
psycholinguistics on the processing of idioms in L1. There have also
been some studies on idiom acquisition in L1, but very little work has
been done on the processing of idioms and their acquisition in L2. We
have reviewed some of the results in L1 that seem to be relevant for L2
learning, and characteristic phenomena of lexical acquisition in a
target language such as the correspondence of expressions in the source
language, among which tropes may play a crucial role. We have sketched
some possible experiments that could be run with L2 learners and a
computational model of lexical phrase acquisition. Finally, we have
stressed the need for a computational model in order to move away from
the researchers’ intuitions and to get to grips with some of the
problems that really are at stake in L2 acquisition.


      References

[ABEI 89] Abeillé A., Schabes Y. (1989): "Parsing Idioms in Lexicalized
TAGs". European Chapter of the ACL, Manchester, April.

[BOBR 73] Bobrow S., Bell S. (1973): "On catching on to idiomatic
expressions". /Memory and Cognition/, 1, pp 343-346.

[CACC 88] Cacciari C., Tabossi P. (1988): "The comprehension of idioms".
/Journal of Memory and Language/, 27, pp 668-683.

[CACC 89] Cacciari C., Levorato M.C. (1989): "How children understand
idioms in discourse". /Journal of Child Language/, 16, pp. 387-405.

[CACC 91] Cacciari C., Glucksberg S. (1991): "Understanding Idiomatic
Expressions: the contribution of Word meanings". In /Understanding Word
and Sentence/, Simpson G.B. (ed), Elsevier Science Publishers,
North-Holland. pp 217-240.

[CACC 92] Cacciari C. (1992): "The place of idioms in a literal and
metaphorical world". In /Idioms: processing, structure and
interpretation/, Cacciari C., Tabossi P. (eds), Hillsdale, NJ: Erlbaum.

[CART 88] Carter R., McCarthy M. (eds) (1988): /Vocabulary and Language
Teaching/. Longman.

[CONN 84] Conenna M. (1984): "Les expressions figées en français et en
italien: problèmes lexico-syntaxiques de traduction". /Contrastes/, 10.

[COWI 83] Cowie A.P., Mackin R., McCaig I.R. (1983): /Oxford Dictionary
of Current Idiomatic English/. Volumes 1 and 2. Oxford University Press:
Oxford.

[DANL 88] Danlos L. (ed.) (1988): Les expressions figées. /Langages/,
no. 90, June.

[D’ELI 90] D'Elia C. (1990): "Idioms dictionaries : Italian and
English". /Linguisticae Investigationes/, tome XIV, 2, pp 263-300.

[FONT 77] Fontanier P. (1977): /Les figures du discours/, Flammarion,
Paris, 1st edition 1818.

[FREC 85] Freckleton P. (1985): /Une comparaison des expressions de
l'anglais et du français/. Doctoral Thesis, Université Paris 7: LADL.

[GIBB 80] Gibbs R.W. (1980): "Spilling the beans on understanding and
memory for idioms in context". /Memory and Cognition/, vol 8, pp 149-156.

[GIBB 87] Gibbs R.W. (1987): "Linguistic factors in children's
understanding of idioms". /Journal of Child Language/, vol 14, pp 569-586.

[GIBB 89a] Gibbs R.W., Nayak N.P. (1989): "Psycholinguistic Studies on
the syntactic behavior of idioms". /Cognitive Psychology/, 21, pp 100-138.

[GIBB 89b] Gibbs R.W., Nayak N.P., Cutting C. (1989): "How to kick the
bucket and not decompose: analyzability and idiom processing". /Journal
of Memory and Language/, 28, pp 576-593.

[GIBB 90] Gibbs R.W., O'Brien J.E. (1990): "Idioms and mental imagery:
the metaphorical motivation for idiomatic meaning". /Cognition/, 36, pp
35-68.

[GROS 82] Gross M. (1982): "Une classification des phrases figées du
français". /Revue québécoise de linguistique/, vol 11, no. 2, Montréal:
Presses de l'Université du Québec à Montréal, pp 151-185.

[IRUJ 86] Irujo S. (1986): "Don't put your leg in your mouth: transfer
in the acquisition of idioms in a second language". /TESOL Quarterly/,
vol 20, 2, pp 287-305.

[LEVO 92] Levorato M.C., Cacciari C. (1992): "Children's comprehension
and production of idioms: the role of context and familiarity". /Journal
of Child Language/, in press.

[KREU 91] Kreuz R.J., Graesser A.C. (1991): "Aspects of idiom
interpretation: comment on Nayak and Gibbs". /Journal of Experimental
Psychology/, vol. 120, no. 1, pp. 90-92.

[MANS 83] Manser M.H. (1983): /A dictionary of contemporary idioms/.
Pan Books: London.

[MELC 86] Mel'Cuk I. (1986): /Dictionnaire Explicatif et Combinatoire du
Français Contemporain/, Presses de l'Université de Montréal, Québec.

[NATTI 88] Nattinger J. (1988): "Some current trends in vocabulary
teaching". In /Vocabulary and Language Teaching/, Carter R., McCarthy M.
(eds). Longman.

[POPI 88] Popiel S.J., McRae K. (1988): "The figurative and literal
senses of idioms, or all idioms are not used equally". /Journal of
Psycholinguistic Research/, vol 17, 6, pp 475-487.

[REY 91] Rey A., Chantreau S. (1991): /Dictionnaire des expressions et
locutions/. Dictionnaires Le Robert: Paris.

[SCHW 86] Schweigert W.A. (1986): "The comprehension of familiar and
less familiar idioms". /Journal of Psycholinguistic Research/, vol 15,
1, pp 33-45.

[SWIN 79] Swinney D.A., Cutler A. (1979): "The access and processing of
idiomatic expressions". /Journal of Verbal Learning and Verbal
Behavior/, 18, pp 523-534.

[WENG 87] Wenger E. (1987): /Artificial Intelligence and Tutoring
Systems/. Morgan Kaufmann.

[ZOCK 92] Zock M. (1992): "SWIM or SINK: the problem of communicating
thought". In /The Bridge to International Communication: Intelligent
Tutoring Systems for Foreign Language Learning/, Swartz M., Yazdani M.
(eds). Springer-Verlag, NATO ASI Series, pp. 235-247.

------------------------------------------------------------------------


      Notes

* <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#n0> This project
has been supported by the Parisian joint research board in Cognitive
Sciences (GDR 957 "Sciences Cognitives de Paris") of the French national
research council (CNRS). M Pengelly of Lancaster University, UK, helped
with the translation.

1 <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#n1> Semi-frozen
phrase is a translation of the term expression semi-figée introduced by
the French linguist M. Gross [GROS 82].

2 <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#n2> As an
illustration, research undertaken at the LADL laboratory [GROS 82] has
shown their high proportion in the French language (20 000 verbal frozen
phrases versus 8 000 or 12 000 free verbs; 6 000 adverbial phrases versus
2 000 adverbs; 300 000 or 400 000 compound nouns versus 80 000 simple
nouns).

3 <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#n3> Discourse
oriented phrases are indispensable for opening and maintaining
communication between two speakers (such as formulae of presentation,
apologies, ...).

4 <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#n4> The
insertion of any adjective (propre here) makes the phrase lose its
idiomatic meaning (which we note with a star). The literal meaning may
then be the only one accessible, like to break one's own pipe here.

5 <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#n5> There exist
other classifications (such as [GIBB 89a and b]). Cacciari's is
presented here because it has been used for research on acquisition.

6 <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#n6> Since it
involves a much more difficult task, competence in producing idioms
appears after competence in comprehending them.

7 <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#n7> N0 is the
deceiver and N1 the deceived.

8 <http://lifc.univ-fcomte.fr/RECHERCHE/P7/pub/lars.htm#n8> A litotes is
an understatement: pas folle la guêpe here means "very clever!".