Clearing the undergrowth
and marking out trails…
…challenges in
investigating Keyness
Mike Scott,
School of English
University of Liverpool
Keyness in Text Conference
Certosa di Pontignano, Siena
9:00-10:00, 29 June 2007
This presentation is at
www.lexically.net/downloads/corpus_linguistics
Keyness
1
Purpose



To explore the notion of keyness
and its implications in corpus-based study
with reference to WordSmith
Overview
Keyness, as a new territory, looks
promising and has attracted colonists
and prospectors. It generally appears
to give robust indications of the text’s
aboutness together with indicators of
style.
the text’s aboutness
colonists …
and prospectors
Issues




the issue of text section v. text v. corpus
v. sub-corpus
statistical questions: what exactly can
be claimed?
how to choose a reference corpus
handling related forms such as
antonyms
Machine and Human KWS

Rigotti and Rocci (2002) warn that machine
identification of key words omits all
interpretation of the writer’s intentions,
cannot get at cultural implications and does
not spot the congruity of the meanings of
each section with the next.
metaphors

“In our view, a natural language
text, slippery and vague as it may
be, is not a stone soup where
words float free, tied only to their
multiple associations within a
Foucaultian discourse”
(Rigotti and Rocci, 2002)
Of course it doesn’t actually
understand…
… or know what is “correct”
… only look at what is found
in text
… or context
… whether marked up or not …
<intro>Once upon a time ….</intro>
Context?
Levels of Context
Physical environment
If so

what is the status of the “key words” one
may identify and what is to be done with
them?
Issues
1. the issue of text section v. text v. corpus v. sub-corpus
2. statistical questions: what exactly can be claimed?
3. how to choose a reference corpus
4. handling related forms such as antonyms
5. what is the status of the “key words” one may identify and what is to be done with them?
text section v. text v. corpus
v. sub-corpus



text section: levels 1-5
text: level 6
corpus: levels 7 & 8
But these are often not
clearly differentiated



“text”, level 6: with or without mark-up,
images, sounds?
what do we mean by section, chapter
(4) and other non-linguistically defined
categories (Rastier: “passage”)?
is text itself mutating?
Internet text
Wikipedia homepage (part)
Wikipedia homepage (part)
Wikipedia article (3 parts of
same article)
Wikipedia discussion


from History of the stall article
latest contributor, “Talk” section
statistical issues


p value is a well-established
standard, relying on the notion of
chance, random effects
but


if you run lots of comparisons some
will spuriously (by chance) appear
significant
if we’re operating at the level of word
or cluster, text itself doesn’t consist of
randomly ordered words
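The first caveat can be put in numbers. The sketch below assumes every comparison is an independent test at the same threshold, which is exactly the assumption the second caveat calls into question for running text, so treat the figures as a rough guide only. The function names are illustrative and not part of WordSmith.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of comparisons that look significant by chance alone."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Chance that at least one of n independent tests is spuriously significant."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Hypothetical case: 10,000 word types each compared against a reference corpus.
    print(expected_false_positives(10_000, 0.05))
    print(prob_at_least_one(100, 0.05))
```

With 10,000 word types tested at p < 0.05, some 500 would be expected to pass by chance, which is one reason for working with very small p values in keyword work.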
Implication



there is no statistical defence of the
whole set of KWs
but only of each one
comparing KW p values is not
advisable
Why?
Matrix text, describing a series of troubles affecting a set of crops in a certain place.
weevils and chickpeas will be much rarer words (if not rarer entities in this particular place) and will float to the top of the KW list
[Matrix: troubles (hail, wind, weevils) against crops (peas, chickpeas, potatoes)]
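A small sketch of why the rarer words float to the top. It uses one common two-term formulation of the log likelihood statistic, which may differ in detail from what WordSmith computes. All frequencies and corpus sizes are invented for illustration: each of the six words is given the same frequency in the matrix text, so only its rarity in the reference corpus varies.

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """a, b: frequency in study text and reference corpus; c, d: their sizes in words."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study text
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    total = 0.0
    if a > 0:
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2.0 * total


STUDY_SIZE = 1_000    # invented
RC_SIZE = 1_000_000   # invented
STUDY_FREQ = 5        # invented: same frequency for every word in the matrix text
RC_FREQS = {"hail": 50, "wind": 200, "weevils": 2,
            "peas": 20, "chickpeas": 1, "potatoes": 30}  # invented

scores = {w: log_likelihood(STUDY_FREQ, b, STUDY_SIZE, RC_SIZE)
          for w, b in RC_FREQS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    for word in ranking:
        print(f"{word:10s} {scores[word]:6.2f}")
```

Although all six words matter equally to the text, chickpeas and weevils head the list simply because they are rarest in the reference corpus, which is why comparing KW scores or p values with each other is not advisable.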
choosing a reference corpus





with a mixed-bag RC, the larger the RC the better, but a moderate-sized RC may suffice.
the keyword procedure is fairly robust.
KWs identified even with an obviously absurd RC can be plausible indicators of aboutness, which reinforces the conclusion that keyword analysis is robust.
genre-specific RCs identify rather different KWs.
the aboutness of a text may not be one thing but numerous different ones.
Scott (forthcoming)
related forms



WordSmith can be asked to treat
members of the same lemma as related
and can handle clusters (Biber: lexical
bundles)
but otherwise ignores relations such as



synonymy
antonymy
collocation
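Treating members of the same lemma as related amounts to counting them under one heading before any keyness comparison is made. A minimal sketch, assuming a hand-made lemma list; the mapping and the function name are hypothetical and do not reproduce WordSmith's own lemma-list mechanism.

```python
from collections import Counter

# Hypothetical lemma list: each word form maps to its headword.
LEMMA_OF = {
    "weevils": "weevil",
    "chickpeas": "chickpea",
    "potatoes": "potato",
}


def lemmatised_counts(tokens, lemma_of):
    """Count tokens, pooling any form found in the lemma list under its headword."""
    return Counter(lemma_of.get(t.lower(), t.lower()) for t in tokens)


if __name__ == "__main__":
    tokens = "Weevils attacked the chickpeas and one weevil reached the potatoes".split()
    print(lemmatised_counts(tokens, LEMMA_OF).most_common(3))
```

Synonyms, antonyms and collocates have no such list to rely on, so they would still be counted as unrelated items.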
status of the KW





not intrinsic to the word/cluster but
context-bound
a pointer to specific textual
aboutness
and/or style
statistically arrived at but not
established
sometimes pointing to a pattern
status of the set of KWs



indicative of the more general
aboutness of the source text(s)
and/or style
but (as a set) not statistically
proven
Shakespeare’s KWs
KWs of Hamlet

Characters:
FORTINBRAS, GERTRUDE, GUILDENSTERN, HAMLET, HAMLET'S, HORATIO, LAERTES, OPHELIA, PYRRHUS, ROSENCRANTZ

Places:
DENMARK, NORWAY

Pronouns:
I, IT, T, THEE, THOU

Themes, events:
MADNESS, PLAY, PLAYERS

Other (“unexpected”):
E'EN, LORD, MOST, MOTHER, PHRASE, VERY
Most of these are obvious &
probably uninteresting….

if you know the play you already know



it concerns Hamlet and some other
characters
it’s set in Denmark
Ophelia goes mad.
… but some are puzzling




Why are IT, LORD and MOST
positively key in Hamlet…
if they are negatively key in the other
plays?
Which characters are they most key
of?
Where are they found, how are these
KWs dispersed throughout the play?
IT in Hamlet (1)

In the plays 0.95% (1 word in
100) but


in Hamlet’s speeches 1.48%: an increase of more than 50%
in this one character’s
speeches…
in Horatio’s speeches 2.33%: nearly
250% of the average in this one
character’s speeches.
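The comparisons above are simple ratios of percentages and can be checked directly. The three percentages are those given on the slide; the dictionary and function names are only for illustration.

```python
# Frequency of IT as a percentage of running words (figures from the slide).
IT_PERCENT = {
    "all plays": 0.95,
    "Hamlet's speeches": 1.48,
    "Horatio's speeches": 2.33,
}


def relative_to_baseline(rate: float, baseline: float) -> float:
    """How many times the baseline rate a given rate is."""
    return rate / baseline


if __name__ == "__main__":
    base = IT_PERCENT["all plays"]
    for who in ("Hamlet's speeches", "Horatio's speeches"):
        ratio = relative_to_baseline(IT_PERCENT[who], base)
        print(f"{who}: {ratio:.2f} times the average")
```

Hamlet's rate comes out at about 1.56 times the average for the plays, and Horatio's at about 2.45 times.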
IT in Hamlet (2)

In Hamlet’s speeches, distributed evenly: 173 hits, 14.67 per 1,000 words [dispersion plot]
In Horatio’s speeches: 23.74 per 1,000 words [dispersion plot]
DO in Othello



Nearly twice as frequent as in the
other plays
Characteristic of Iago (nearly twice
as often) and Desdemona (more
than 3 times as often)
DOST characteristic of Othello
(more than 6 times as frequent)
Iago: commanding
Concordance
1 <IAGO> Do thou meet me presently at the
2 knows you not. I'll not be far from you: do you find some occasion to anger
3 time, man. I'll tell you what you shall do. Our general's wife is now the general:
4 vow I here engage my words. <IAGO> Do not rise yet. Witness, you ever-burni
5 out to savage madness. Look! he stirs; Do you withdraw yourself a little while, He
6 speak with me; The which he promis'd. Do but encave yourself, And mark the
7 mind again. This night, Iago. <IAGO> Do it not with poison, strangle her in her
8 him so That I may save my speech. Do but go after And mark how he
9 I am none such. <IAGO> Do not weep, do not weep. Alas the day! <EMILIA> Has
10 I am sure I am none such. <IAGO> Do not weep, do not weep. Alas the day!
Desdemona: conditional
Concordance
11 warrant of thy place. Assure thee, If I do vow a friendship, I'll perform it To the
12 go seek him. Cassio, walk hereabout; If I do find him fit, I'll move your suit And seek
13 tears, my lord? If haply you my father do suspect An instrument of this your
14 and ever did, And ever will, though he do shake me off To beggarly divorcement,
15 Good faith! how foolish are our minds! If I do die before thee, prithee, shroud me In
16 tell me, Emilia, That there be women do abuse their husbands In such gross
Othello’s DOST: questioning –
suspicion
Concordance
1 Ha! I like not that. <OTHELLO> What dost thou say? <IAGO> Nothing, my lord:
2 I love you. <OTHELLO> I think thou dost; And, for I know thou art full of love
3 thy brain Some horrible conceit. If thou dost love me, Show me thy thought.
4 for aught I know. <OTHELLO> What dost thou think? <IAGO> Think, my lord!
5 My noble lord,— <OTHELLO> What dost thou say, Iago? <IAGO> Did Michael
6 He did, from first to last: why dost thou ask? <IAGO> But for a
7 thought Too hideous to be shown. Thou dost mean something: I heard thee say
8 meditations lawful? <OTHELLO> Thou dost conspire against thy friend, Iago, If
9 to me as to thy thinkings, As thou dost ruminate, and give thy worst of
10 know my thoughts. <OTHELLO> What dost thou mean? <IAGO> Good name in
11 but keep 't unknown. <OTHELLO> Dost thou say so? <IAGO> She did
12 Farewell, farewell: If more thou dost perceive, let me know more; Set on
13 My noble lord,— <OTHELLO> If thou dost slander her and torture me, Never
14 you not hurt your head? <OTHELLO> Dost thou mock me? <IAGO> I mock
15 most cunning in my patience; But—dost thou hear?—most bloody. <IAGO>
16 And nothing of a man. <OTHELLO> Dost thou hear, Iago? I will be found most
17 t on the tree. O balmy breath, that dost almost persuade Justice to break her
18 in 's hand. O perjur'd woman! thou dost stone my heart, And mak'st me call
Keyword Clusters



Text-initial sections of
“Hard News” (Guardian 1998-2004)
studying Hoey’s Lexical Priming
theory
Research Questions
Using the hard news corpus,
1. How many 3-5 word clusters are
found to be key in TISC sections?
2. How many are positively and how
many are negatively key?
3. What recurrent patterns can be
found in the two types of key
cluster?
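For readers unfamiliar with clusters, a minimal sketch of what counting 3-5 word clusters involves: every run of 3, 4 or 5 consecutive words is counted. This is an illustration only; it ignores sentence boundaries and punctuation, and WordSmith's own cluster settings are not reproduced here.

```python
from collections import Counter


def clusters(tokens, min_n=3, max_n=5):
    """Count every run of min_n..max_n consecutive tokens."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    found = clusters("it was announced last night".split())
    for cluster, freq in sorted(found.items()):
        print(freq, cluster)
```

A five-word sequence yields six overlapping clusters (three of 3 words, two of 4 and one of 5), so counts of key clusters include a good deal of repetition.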
RQs 1 & 2: Numbers of KW
clusters
using a p value of 0.0000001 and minimum
frequency of 3 and log likelihood statistic,



8,132 key clusters altogether (in 3.2 million
words of text)
of which 7,631 were positively key
and 501 negatively key
though there is repetition as these are 3-5
word n-grams
Research
Question 2
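A sketch of how such a threshold might be applied, assuming the log likelihood score is referred to a chi-square distribution with one degree of freedom (the usual approximation). The function names, and the order in which the frequency and p value filters are applied, are assumptions rather than a description of WordSmith's internals.

```python
import math


def p_value_from_ll(ll: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(ll / 2.0))


def is_key(freq: int, ll: float, min_freq: int = 3, max_p: float = 1e-7) -> bool:
    """Apply the minimum-frequency and p-value thresholds quoted on the slide."""
    return freq >= min_freq and p_value_from_ll(ll) < max_p


def polarity(freq_study: int, size_study: int, freq_ref: int, size_ref: int) -> str:
    """Positively key if relatively more frequent in the study text than in the RC."""
    return "positive" if freq_study / size_study > freq_ref / size_ref else "negative"


if __name__ == "__main__":
    for ll in (3.84, 25.0, 30.0):
        print(ll, p_value_from_ll(ll), is_key(freq=5, ll=ll))
```

On this approximation a cluster needs a log likelihood score of roughly 28 or more to pass at p < 0.0000001, far above the 3.84 needed at p < 0.05.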
RQ 1: Numbers of KW
clusters



Is 8 thousand a large number of
distinct key text-initial clusters?
In the same amount of text there
are 84 thousand 3-5 word clusters
of frequency at least 5 altogether…
about one in 10 is associated with
text initial position at the .0000001
level of significance
RQ 1, continued





… is 1 in 10 a large number to be key?
In the case of SISC (sentences from
paragraphs with only one sentence in),
we get
507 thousand clusters, of which
2,192 are key (1,747 positively and 445
negatively)
which is about 1 in 230
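The two proportions can be checked from the counts given, taking the round figures of 84 thousand and 507 thousand clusters at face value. The function name is illustrative.

```python
def one_in(key_clusters: int, all_clusters: int) -> float:
    """Express a proportion as 'one in N'."""
    return all_clusters / key_clusters


if __name__ == "__main__":
    print(f"TISC: about 1 in {one_in(8_132, 84_000):.1f}")
    print(f"SISC: about 1 in {one_in(2_192, 507_000):.0f}")
```

On these figures key clusters are more than twenty times as common, proportionally, in text-initial sections as in single-sentence paragraphs.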
IT + reporting verb –
positively key
IT WAS ANNOUNCED LAST NIGHT
IT WAS CLAIMED LAST NIGHT
IT WAS CONFIRMED LAST NIGHT
IT IS REVEALED TODAY
IT otherwise negatively key:
IT IS A
IT IS ABOUT
IT IS EXPECTED
IT IS GOING
IT IS ONLY
IT IS POSSIBLE
IT SEEMS TO
Conclusions



keyness is a pointer
to importance
which can be



sub-textual
textual
intertextual
References

















Berber Sardinha, Tony, 1999. Using Key Words in Text Analysis: practical aspects. DIRECT Papers 42, LAEL, Catholic University of São
Paulo.
Berber Sardinha, Tony, 2004. Lingüística de Corpus. Barueri: Manole.
Culpeper, J., 2002. 'Computers, language and characterisation: An Analysis of six characters in Romeo and Juliet'. In: U. Melander-Marttala, C. Östman and M. Kytö (eds.), Conversation in Life and in Literature: Papers from the ASLA Symposium, Association Suedoise de Linguistique Appliquée (ASLA), 15. Universitetstryckeriet: Uppsala, pp. 11-30.
Kemppanen, Hannu 2004. Keywords and Ideology in Translated History Texts: A Corpus-based Analysis. Across Languages and Cultures 5
(1), 89-106
Rigotti, Eddo and Andrea Rocci, 2002. From Argument Analysis to Cultural Keywords (and back again). http://www.ils.com.unisi.ch/articolirigotti-rocci-keywords-published.pdf (accessed May 2007). In F. H. van Eemeren et al, Proceedings of the 5th Conference of the
International Society for the Study of Argumentation. Amsterdam: SicSat. pp. 903-908.
Scott, M., 1996 with new versions in 1997, 1999, 2004, Wordsmith Tools, Oxford: Oxford University Press.
Scott, M., 1997a. "PC Analysis of Key Words -- and Key Key Words", System, Vol. 25, No. 1, pp. 1-13.
Scott, M., 1997b. "The Right Word in the Right Place: Key Word Associates in Two Languages", AAA - Arbeiten aus Anglistik und
Amerikanistik, Vol. 22, No. 2, pp. 239-252.
Scott, M., 2000a. ‘Focusing on the Text and Its Key Words’, in L. Burnard & T. McEnery (eds.), Rethinking Language Pedagogy from a
Corpus Perspective, Volume 2. Frankfurt: Peter Lang., pp. 103-122.
Scott, M. 2000b. Reverberations of an Echo, in B. Lewandowska-Tomaszczyk & P.J. Melia (eds.) PALC’99: Practical Applications in
Language Corpora. Lodz Studies in Language, Volume 1. Frankfurt: Peter Lang., pp. 49-68.
Scott, M., 2001. ‘Mapping Key Words to Problem and Solution’ in M. Scott & G. Thompson (eds.) Patterns of Text: in honour of Michael
Hoey, Amsterdam: Benjamins, pp. 109-127.
Scott, M., 2002. ‘Picturing the key words of a very large corpus and their lexical upshots – or getting at the Guardian’s view of the world’ in
B. Kettemann & G. Marko (eds.) Teaching and Learning by Doing Corpus Analysis, Amsterdam: Rodopi, pp. 43-50 and cd-rom within the
cover of the book.
Scott, M. 2006. "The Importance of Key Words for LSP" in Arnó Macià, E., A. Soler Cervera & C. Rueda Ramos (eds.), Information
Technology in Languages for Specific Purposes: issues and prospects. New York: Springer, pp. 231-243.
Scott, M. (forthcoming). In Search of a Bad Reference Corpus. AHRC Methods Network.
Scott, M. & Tribble, C., 2006. Textual Patterns: keyword and corpus analysis in language education, Amsterdam: Benjamins.
Seale C, Charteris-Black J, Ziebland S. 2006. Gender, cancer experience and internet use: a comparative keyword analysis of interviews
and online cancer support groups. Social Science and Medicine. 62, 10: 2577-2590
Tribble, Chris, 1999, "Genres, keywords, teaching: towards a pedagogic account of the language of project proposals" in L. Burnard & A.
McEnery (eds.) Rethinking Language Pedagogy from a Corpus Perspective: Papers from the Third International Conference on Teaching
and Language Corpora, (Lodz Studies in Language). Hamburg: Peter Lang.