Comparable Corpora for
Terminology
Stella E. O. Tagnin - USP
Corpus Linguistics, Translation and Terminology
New Technologies in Translation - CAPES
Universitat Rovira i Virgili-Universidade de São Paulo
Tarragona
June 9-10, 2009
Comparable Corpora
Natural language in both corpora
 Phraseologies (Conventionality)

 Terminology
 Discourse

Make for acquaintance with research area
 Basic
notions
 “Clues” for further research
Online corpora with
built-in tools
Results of your search - BNC
Your query was salt
Here is a random selection of 50 solutions from the 2943 found...
ABB 131 Add the pimentos, salame and boiling water or stock to the pan
with most, but not all of, the parsley and a little salt and pepper.
ABB 105 A pinch of salt is taken for granted in many cake recipes and
is added simply to bring out the flavour of the other ingredients.
ABB 1332 Return the veal to the pan, add the fresh and dried tomatoes,
rosemary, wine, stock, salt and pepper.
AMU 1667 The sea churned the banalities of his life into flotsam: sheets,
shirts, sandals, books, charts, salt cellar…
B77 931 But it is possible to reduce salt consumption further by ‘placing
the salt shaker at some distance from the table’.
BPG 1548 freshly ground black pepper and salt
C97 618 SALT is the spice of life .
CFS 1681 Substitute LoSalt for common salt, at the table and in cooking
to reduce your family's salt intake.
G36 1259 Sieve flour into a bowl with pinch salt.
Query Results - Cobuild
NOTE: no more than 40 lines will be displayed here, since a threshold has been implemented. If there were more than 40 instances
found, a random selection will have been applied.
are more effective than a pinch of salt. [p] Fold in with a metal spoon,
chutney [/h] 2 oz walnuts [p] 1/4 tsp salt [p] 1/4 tsp cayenne pepper [p] 2
sea `vegetables" of all types. Sea salt also provides some as does sw3
(tel: 071-276 5599). 4 Topiary salt and pepper pots by Swid
add the
saffron and stir. Season with salt and pepper. markets and collected
cartoon animal salt and pepper shakers, plastic cuckoo entertaining
even Hollywood moguls to salt-beef sandwiches in mainly with boiled
water, sugar and salt, can save most diarrhea victims'
salt. [p]
Herb, vegetable and spice salt: compounds of salt with other
few
leaves crisp iceberg lettuce [p] salt,freshly ground black pepper [p] a
great lover of liberally sprinkling salt on her food at the table, thereby
served fried egg and crisp slices of salt pancetta. [p] Caesar salad was
tray at my head.l A large pinch of salt should be applied to this story.
mousse-like. Sift over the flour and salt, then fold in to the eggs and
Lanzarote round potatoes and rock salt. Tomatoes, sweet potatoes,
I took her words with a grain of salt, went home, put the sample on a
T-Score - salt
Collocate   Corpus Freq   Joint Freq   Significance
And         1369241       813          17.533492
Pepper      903           285          16.869713
With        364279        237          9.984592
Lake        2689          93           9.579897
Sugar       2472          90           9.427256
Water       15678         86           8.887077
Black       16881         84           8.744025
City        20496         85           8.711252
Pinch       405           73           8.533166
Ground      8804          64           7.748380
Sea         5756          59           7.509810
Tsp         290           53           7.271002
Flour       676           51           7.119786
Add         5006          52           7.052378
Freshly     404           45           6.694434
Season      16627         51           6.609096
Mutual Information - salt
Collocate    Corpus Freq   Joint Freq   Significance
Pepper       903           285          10.431903
Monosodium   12            3            10.095633
Glutamate    13            3            9.980145
Dampier      18            4            9.925691
Tsp          290           53           9.643600
Pinch        405           73           9.623633
Teaspoon     151           27           9.612068
Paprika      51            9            9.593083
Crinkle      17            3            9.593083
Vinegar      272           40           9.330022
Nutmeg       82            12           9.322967
Sodium       150           18           9.036634
Oregano      43            5            8.991187
Freshly      404           45           8.929159
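The two rankings differ because the two measures reward different things. The sketch below uses the standard textbook formulas for t-score and mutual information; whether the Cobuild tool applies exactly these (for instance with a span correction) is an assumption. The corpus size and the frequency of "salt" are not given on the slides, so NODE_FREQ and N_TOKENS are placeholders; only their ratio (about 4377, a value inferred here from the published figures) affects the result.

```python
import math

def expected_joint(coll_freq, node_freq, n_tokens):
    """Co-occurrences expected by chance if node and collocate were independent."""
    return node_freq * coll_freq / n_tokens

def t_score(joint, coll_freq, node_freq, n_tokens):
    """(observed - expected) / sqrt(observed): favours frequent, well-attested pairs."""
    expected = expected_joint(coll_freq, node_freq, n_tokens)
    return (joint - expected) / math.sqrt(joint)

def mutual_information(joint, coll_freq, node_freq, n_tokens):
    """log2(observed / expected): favours exclusive pairs, even rare ones."""
    expected = expected_joint(coll_freq, node_freq, n_tokens)
    return math.log2(joint / expected)

# Placeholder sizes (assumption, not from the slides): only the ratio matters.
NODE_FREQ = 1_000
N_TOKENS = 4_377_000

# (collocate, corpus frequency, joint frequency) copied from the tables above
rows = [("and", 1369241, 813), ("pepper", 903, 285), ("monosodium", 12, 3)]
for word, coll_freq, joint in rows:
    t = t_score(joint, coll_freq, NODE_FREQ, N_TOKENS)
    mi = mutual_information(joint, coll_freq, NODE_FREQ, N_TOKENS)
    print(f"{word:12s} t={t:6.2f}  MI={mi:6.2f}")
```

Under these assumptions a grammatical word such as "and" gets a high t-score but a low MI, while a rare but exclusive partner such as "monosodium" shows the opposite pattern, which is why the two lists are worth reading together.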
Equivalence in Translation
Pragmatic definition:
 a term that “works” in target text as it “works” in
source text
Translator aims at
 fluent translation to ensure better understanding on
the reader’s part
Therefore he should not
 use an unusual term which might sound strange to
reader or cause ambiguity
Dictionaries

lack of criteria in compiling terms

lack of usage examples

no updating due to rapid development of
scientific and technological research

no coverage of various technical areas
Advantages of a corpus

built according to translator’s needs

can be constantly updated

offers authentic examples of usage

therefore, translator feels secure in his
choice of term to use
CorTec
http://www.fflch.usp.br/dlm/comet/consulta_cortec.html
CorTec (Technical Corpus), part of
 COMET
– Corpus Multilíngüe para
Ensino e Tradução
Fifteen comparable corpora English-Portuguese
 Each corpus approx. 200,000 words in
each language
 Description of content

• Linguistics
• Kidney failure
• Nutritional Supplements
• Computing I & II
• Football (soccer)
• Ecotourism
• Coffee
• Contracts
• Cultural Tourism
• Cooking Recipes I & II
• Astronomy
• Hypertension
• Electromagnetic flowmeters
CorTec built-in tools
http://www.fflch.usp.br/dlm/comet/consulta_cortec.html
Wordlist
• by frequency and alphabetical
Concordancer
• by word or expression (Expression or word equal to)
• by prefixes or beginning of word (Beginning with)
• by suffix or endings (Ending in)
• by parts of words (Containing)
N-gram generator
• combinations of 2, 3 or 4 words
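For readers who want to reproduce these three tools on their own texts, here is a minimal sketch. It is not CorTec's implementation; the function names, the mode labels and the sample sentence (adapted from the recipe concordances) are illustrative assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; accented Portuguese letters are kept."""
    return re.findall(r"\w+", text.lower())

def wordlist(tokens, by="frequency"):
    """Word list sorted by frequency (default) or alphabetically."""
    counts = Counter(tokens)
    if by == "alphabetical":
        return sorted(counts.items())
    return counts.most_common()

# The four search modes of the concordancer, as regex templates
MODES = {
    "equal": "{}",
    "beginning": "{}\\w*",
    "ending": "\\w*{}",
    "containing": "\\w*{}\\w*",
}

def concordance(text, query, mode="equal", width=30):
    """Keyword-in-context lines with `width` characters on each side."""
    body = MODES[mode].format(re.escape(query))
    pattern = re.compile(r"\b" + body + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group()}] {right}")
    return lines

def ngrams(tokens, n):
    """Frequency of every run of n consecutive words (n = 2, 3 or 4 here)."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

SAMPLE = "Onion - 1 medium, finely chopped. Celery - 3 sticks, finely chopped."
TOKENS = tokenize(SAMPLE)
for line in concordance(SAMPLE, "chop", mode="beginning"):
    print(line)
print(ngrams(TOKENS, 2).most_common(3))
```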
Identifying equivalents

by Wordlist

by Collocates

by Translation of collocates

by Concordances

by Context
Identifying equivalents by
Wordlist
http://www.fflch.usp.br/dlm/comet/consulta_cortec.html
Contracts
Wordlist in both languages
 Portuguese: most frequent content word
 contrato - 1832 occurrences
 contract: just 186 times
contrato ≠ contract

English: most frequent content word
 agreement - 1724 occurrences
agreement vs. contrato
1a For the purposes of this Agreement, all merchantable Logs 6" in diameter
1b Constitui objeto do presente contrato o intercâmbio eletrônico de documentos
2a 12.2 This Agreement may be cancelled by either party, at it
2b 13.2 A rescisão deste contrato implicará retenção de créditos decorren
3a The term of this Agreement shall expire June 30, 2001, (the "Term“
3b O presente contrato terá prazo de (xxx), iniciando-se no di
Relinquished Property Contract
Replacement Property Contract
Adhesion contract
42,000 “adhesion contract”
890 “adhesion agreement” (mostly translated sites)
Identifying equivalents by
Translation of Collocates
Cooking Recipes
chopped
shredded
grated
finely
sliced
diced



86 Milk - 600 ml (1 pint) Onion - 2 tbsp, finely chopped Celery - 2 tbsp, finely chopped
87 Soy sauce - 1 tbsp Onion - 1 medium, finely chopped Celery - 3 sticks, finely chopp
88 apples - 450g (1 lb), peeled, cored and finely chopped Onions 225g (8 oz), finely ch
chop = picar
 29 occurrences with “fino” or derived form:
“picad* fin*” (18 occurrences),
“finamente picad*” (11 occurrences).
Results for picad*
most frequent adverb with “picad*” is
“bem” (79 occurrences): “bem picada,
bem picado” etc.
 “picadinh*” (96 occurrences, out of which
10 are “bem picadinha”).
 best equivalences for finely chopped

“bem picad*” or “picadinh*”
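The counts above come from wildcard queries such as “picad* fin*”. A minimal sketch of how such queries can be run outside the online tool follows; the translation of `*` into a regular expression is an assumption about what the wildcard means, and SAMPLE holds only the eight example lines shown for picad*, not the full corpus, so the totals differ from the slide's.

```python
import re

def wildcard_to_regex(query):
    """Compile a query like 'picad* fin*', reading '*' as 'any word characters'."""
    words = [re.escape(w).replace(r"\*", r"\w*") for w in query.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)

def count_hits(query, lines):
    """Total number of matches of the wildcard query over all lines."""
    rx = wildcard_to_regex(query)
    return sum(len(rx.findall(line)) for line in lines)

# Example lines taken from the picad* concordances in this deck
SAMPLE = [
    "2 cebolas médias bem picadas",
    "½ dente de alho bem picado",
    "junte os tomates pelados bem picados.",
    "Calabresa picadinha",
    "100 g de bacon picadinho",
    "2 dentes de alho picadinhos",
    "Polvilhar salsa bem picadinha",
    "½ cebola bem picadinha",
]

for query in ("bem picad*", "picadinh*", "bem picadinh*", "finamente picad*"):
    print(query, count_hits(query, SAMPLE))
```

Note that “bem picad*” also matches “bem picadinha”, so the two candidate equivalents overlap; a real count should decide whether to report them separately.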
Results for picad*








2 cebolas médias bem picadas
½ dente de alho bem picado
junte os tomates pelados bem picados.
Calabresa picadinha
100 g de bacon picadinho
2 dentes de alho picadinhos
Polvilhar salsa bem picadinha
½ cebola bem picadinha
finely sliced
Slice = cortar em fatias (?* fatiar)
Calda 4 laranjas descascadas cortadas em fatias finas
 200 g de cebola cortada em fatias finas
 1 pepino sem sementes cortado em fatias finas
 6 rabanetes, cortados em fatias finas
 Juntar as batatas cortadas em fatias finas.
 Decore a quiche com um alho-poró cru cortado em
rodelas finas.
 1 cebola média cortada em rodelas finas
 400 g de lingüiça portuguesa cortada em rodelas finas

finely diced

50 g de bacon em cubinhos
500 g de peito de frango cozido e cortado em cubinhos
1 tomate sem pele e sementes cortado em cubinhos
1 abacaxi médio cortado em cubinhos
150 g de presunto cozido cortado em cubinhos
1/2 xícara (chá) de queijo prato cortado em cubinhos
1 cebola grande, cortada em cubinhos
100 g de bacon em cubos pequenos
300g de abóbora moranga cortada em cubos pequenos

200 g de bacon cortado em cubos pequenos








finely grated
2 occurrences for cheese
 2 col. (sopa) de queijo parmesão ralado fino
 80g de queijo gruyère ralado fino


2 colheres (sopa) de parmesão ralado
50 g de queijo parmesão, ralado
32 occurrences for cheese
 1 xícara de queijo prato ralado grosso
 2 xícaras de queijo mussarela ralado grosso (200 g)
 Para polvilhar 50 g de queijo parmesão ralado grosso
Vegetables and chocolate
 1 cebola ralada fino
 4 xícaras de repolho ralado fino

Cobertura de chocolate ralado bem fino
Identifying equivalents by
Translation of Collocate
Hypertension
heart → coração & cardi*

Heart : 768 occurrences
heart failure & heart disease
dy fou 154 th ISH reduced the incidence of stroke, heart failure, and 8
si 255 AG increases with advanced age, stroke, heart failure, 9
as
had
706 ol subjects,12 and patients with severe heart failure.18
10
ysfunctio 281 l infarction or in patients with severe heart failure. These
inuria
365 ); acute pulmonary edema, congestive heart failure, left 12
11
e was si 312 ocardial infarction, stroke, congestive heart failure,
hows
495 h hypertension, obesity, and congestive heart failure, the
7

mHpresença de insuficiência cardíaca congestiva, hemorragia cere-2
is quando existe insuficiência cardíaca congestiva associada. Podem causar 3t
ensão severa ou insuficiência cardíaca congestiva associada. Limitações do
Identifying functional equivalents by Concordances
• marcar um gol
• kidney vs. renal
• rim vs. renal
• 3-word clusters
Identifying functional equivalents by Context
• gol contra
Discovering new terms
overtime
injury time
overtime???
49
eam that lost the penalty kicks scores a goal in the overtime period,
1 as the game extended from regulation to a pair of 15-minute overtime
periods.
2 Too many games decided on the free kicks that follow the overtime period.
3 In the 110th minute, early in the second overtime session of a 1-1 tie at a
sold-out Olympic Stadium,
4 score, 1-1, in the 19th minute with a header off a corner kick. In overtime,
the two became involved again, this time with Zidane
5 One minute into the injury time added on to the 30-minute overtime,
6 Their patience almost paid off at the start of the 30-minute overtime as
reserve
acréscimos - prorrogação




5 Quando todos esperavam a prorrogação, os italianos definiram a vitória
nos acréscimos.
6 nos últimos minutos, mas acabou levando mais um gol nos acréscimos
7 mas conseguiu um gol de pênalti marcado por Totti, nos acréscimos da
partida.
8 Van Bronckhorst acabou expulso nos acréscimos por entrada dura em
Tiago.




1 Ahn marcou o gol que eliminou, aos 12min do 2º tempo da prorrogação,
os italianos nas
2 e sacramentou a vitória argentina sobre o México por 2 a 1, já na
prorrogação,
3 mas nada disso foi suficiente para evitar que a partida fosse para a
prorrogação
injury time





1 John Aloisi added one in injury time.
2 Zinedine Zidane's substitution in injury time could mark his World Cup
farewell -- he will m
3 Rahdi Jaidi's header in injury time Wednesday gave Tunisia a 2-2 tie
with Saudi
4 nded dramatically on reserve Oliver Neuville's goal in injury time.
5 Nadj was ejected in first-half injury time, and Domoraud got his second
yellow card in s

6 and Domoraud got his second yellow card in second-half injury time.

Aloisi deu números finais ao placar já nos acréscimos
acréscimos vs. prorrogação


1 os minutos, mas acabou levando mais um gol nos acréscimos. Zidane
recebeu livre na esquerda, deu um corte em Puyol, e ba
2 do parecia que mais uma partida seria decidida na prorrogação, Zidane
cobrou uma falta da direita, a zaga espanhola desviou e


E quando parecia que mais uma partida seria decidida na prorrogação,
Zidane cobrou uma falta da direita, a zaga espanhola desviou e Vieira, livre
na segunda trave, cabeceou firme, colocando a França na frente. A
Espanha partiu para uma pressão nos últimos minutos, mas acabou
levando mais um gol nos acréscimos. Zidane recebeu livre na esquerda,
deu um corte em Puyol, e bateu firme, vencendo o goleiro Casillas, seu
companheiro de Real Madrid.
How to identify equivalents?
1. Wordlist → most frequent words: contrato – agreement
2. Collocates of search word: marcar um gol → score a goal; kidney/renal vs. rim/renal
3. Translation of collocates: finely + chopped/diced/sliced/grated/shredded
4. Cognate collocate: congestive heart failure → insuficiência cardíaca congestiva
5. Context: gol contra → Zaccardo, Aloisi; injury time vs. overtime
Parallel corpora
Studies with parallel corpora
Contrastive Studies
• www.linguateca.pt
• Catálogo de Publicações → Procura de Publicações → COMPARA

Naturalness in language
A contrastive
methodology to avoid
“translationese”
Possible Studies
[Diagram: four subcorpora, Originals English (EO), Originals Portuguese (PO), Translations Portuguese (PT) and Translations English (ET), linked by comparisons numbered 1 to 4]
1. Contrastive Linguistics
2. Contrasting translations
3. Translation strategies and norms
4. “Translationese”
Possible Studies
1. EO vs. PO – Contrastive Linguistics: natural forms in both languages > similarities and differences
2. ET vs. PT – contrasting translations: differences between translations into various languages
3. EO vs. PT; PO vs. ET: translators’ options: strategies and norms
4. EO vs. ET; PO vs. PT: “translationese” – peculiarities of translated language which do not occur in original texts, or do so with different frequency (over/underuse)
Methodology
Starting always with the original
1. EO → PT: survey of translations into Portuguese of study item
2. PO → ET: survey of these equivalents and their translations into English
3. EO → PT: survey of these equivalents and their translations into Portuguese
4. PO → ET: and so on...
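The back-and-forth procedure can be sketched as a loop that alternates between the two translation directions, feeding the equivalents found at one step into the next. This is only an illustration of the idea: the verb-level pairs below are invented toy data, not COMPARA output, and a real study would work on aligned sentences and identify the equivalents by hand.

```python
from collections import Counter

def survey(aligned_pairs, items):
    """Count the renderings found opposite each study item in (original, translation) pairs."""
    found = Counter()
    for original, translation in aligned_pairs:
        if original in items:
            found[translation] += 1
    return found

def bounce(eo_pt, po_et, start, steps=4):
    """Alternate EO->PT and PO->ET surveys, always starting from the originals."""
    items, log = {start}, []
    for step in range(steps):
        pairs = eo_pt if step % 2 == 0 else po_et
        found = survey(pairs, items)
        log.append(found)
        items = set(found)  # the equivalents become the next study items
    return log

# Hypothetical alignments (English original, Portuguese translation)
EO_PT = [("said", "disse"), ("said", "disse"), ("said", "respondeu"),
         ("told", "contou"), ("replied", "respondeu")]
# Hypothetical alignments (Portuguese original, English translation)
PO_ET = [("disse", "said"), ("disse", "told"),
         ("respondeu", "replied"), ("respondeu", "said")]

LOG = bounce(EO_PT, PO_ET, "said", steps=3)
for step, found in enumerate(LOG, start=1):
    print(step, dict(found))
```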
Results
said EO (310) = 0.4% → disse PT (203) = 0.25%
disse PO (936) = 0.23% → said ET (772) = 0.18%
told ET (59) = 0.013%; said + told ET = 0.193%
Conclusions
1. PT has a greater variety of elocution verbs but sticks to natural form in target language
2. ET has low variety of elocution verbs, but even so falls short of naturalness in target language
ET – 1106 said: 772 (disse) – 69.8%
334 (others) – 30.19%
1106/421,725 = 0.26%
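The percentages on this slide can be rechecked with a few lines of arithmetic. The sketch assumes that 421,725 is the token count of the ET subcorpus, which the slide implies but does not state.

```python
ET_TOKENS = 421_725      # assumed size of the ET subcorpus (from "1106/421.725")
SAID_TOTAL = 1106        # all occurrences of "said" in ET
SAID_FROM_DISSE = 772    # those translating "disse"
SAID_FROM_OTHER = SAID_TOTAL - SAID_FROM_DISSE

def pct(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100 * part / whole

print(f"said from disse: {pct(SAID_FROM_DISSE, SAID_TOTAL):.2f}% of all said")
print(f"said from other verbs: {pct(SAID_FROM_OTHER, SAID_TOTAL):.2f}% of all said")
print(f"said overall: {pct(SAID_TOTAL, ET_TOKENS):.2f}% of ET tokens")
print(f"said from disse: {pct(SAID_FROM_DISSE, ET_TOKENS):.2f}% of ET tokens")
```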
Comparable Corpora
Natural language in both corpora
 Phraseologies (Conventionality)

 Terminology
 Discourse

Acquaintance with research area
 Basic
notions
 “Clues” for further research
Triple Corpus
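[Diagram: three corpora, L1 Corpus, L2 Translation Corpus and L2 Corpus. L1 Corpus + L2 Translation Corpus = Parallel Corpus; L1 Corpus + L2 Corpus = Comparable Corpus; L2 Translation Corpus + L2 Corpus = Monolingual Comparable Corpus]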
UNIVERSIDADE DE SÃO PAULO
FACULDADE DE FILOSOFIA, LETRAS E
CIÊNCIAS HUMANAS Departamento de
Letras Modernas
DUBLINERS SOB A LUPA DA LINGÜÍSTICA DE CORPUS:
Uma contribuição para a análise e a avaliação da tradução literária
[Dubliners under the lens of Corpus Linguistics: a contribution to the analysis and evaluation of literary translation]
Lourdes Bernardes Gonçalves
Universidade Federal do Ceará (Depto Letras Estrangeiras)
Advisor: Dra. Stella Esther Ortweiler Tagnin
Ch. III: Analysis of the Literary Text
Research results:
Semantic field of Music: Keywords:
tenor (45.4); concert (43.8); artistes (34.0); concerts (24.9); baritone (22.7); clapping (22.7); song (20.5); opera (17.0); piano (15.7); music (14.6); sing (14.4); musical (14.4); artiste (14.2); accompanist (14.2); waltz (13.8); melody (11.8); singers (11.3).
Conclusion: importance of music in defining characters, the tone of the narrative, and commentary on the action.
The word SHE (684 occurrences as subject, 207 distinct verbs). Concordances:
Volitional verbs (deliberate action): 99.4% do not place the woman in a position of submission or oppression;
Intellective verbs (mental processes): none points to subservience;
Affective verbs (emotions): 99.03% do not point to oppression.
Examples of research driven by Corpus Linguistics:
Keywords
Comparison of dubjj with refcor (positive keywords):
N   WORD      FREQ.   DUBJJ.LST %   FREQ.   REFCOR.LST %   KEYNESS
1   MR        573     0.84          173     0.08           916.1
2   GABRIEL   141     0.21          0                      400.1
3   AUNT      125     0.18          31      0.01           216.3
4   HIS       1,158   1.70          2,158   1.02           193.0
5   KERNAN    66      0.10          0                      187.2
Comparison of dubjj with refcor (negative keywords):
N     WORD     FREQ.   DUBJJ.LST %   FREQ.   REFCOR.LST %   KEYNESS
129   MOTHER   30      0.04          295     0.14           48.6
130   IT       581     0.86          2,808   1.32           100.9
131   HER      790     1.16          3,674   1.73           112.5
132   OH       3                     320     0.15           152.0
133   SHE      695     1.02          4,471   2.10           376.6
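The slides do not say which statistic produced the KEYNESS column. Log-likelihood is one common choice in keyword tools, and the sketch below shows how it could be computed from the frequency columns. The two corpus sizes are assumptions estimated from the percentage columns (for example 573 / 0.84% for dubjj), so the output only approximates the published values. The sign convention (negative when the word is relatively rarer in the study corpus) follows the "she (ch = - 376,6)" notation used in the final remarks.

```python
import math

def keyness(freq_study, freq_ref, size_study, size_ref):
    """Signed log-likelihood keyness of a word in a study corpus vs. a reference corpus."""
    total = freq_study + freq_ref
    exp_study = size_study * total / (size_study + size_ref)
    exp_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    for observed, expected in ((freq_study, exp_study), (freq_ref, exp_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2
    overused = freq_study / size_study >= freq_ref / size_ref
    return ll if overused else -ll

DUBJJ_SIZE = 68_000     # assumption: estimated from FREQ. / DUBJJ.LST %
REFCOR_SIZE = 212_000   # assumption: estimated from FREQ. / REFCOR.LST %

# (word, frequency in dubjj, frequency in refcor) from the two tables
for word, f_study, f_ref in [("MR", 573, 173), ("GABRIEL", 141, 0), ("SHE", 695, 4471)]:
    print(word, round(keyness(f_study, f_ref, DUBJJ_SIZE, REFCOR_SIZE), 1))
```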
Examples of research driven by Corpus Linguistics:
Concordances with Mrs. Kearney:
N Concordance
1 about Mr Fitzpatrick," repeated Mrs Kearney. "I have my cont
2 asked could she do anything. Mrs Kearney looked searchingl
3 er in the language movement. Mrs Kearney was well content
4 other corner of the room were Mrs Kearney and her husband,
5 t way did you treat me?" asked Mrs Kearney. Her face was i
6 a discreet part of the corridor. Mrs Kearney asked him when
7 Everything went on smoothly. Mrs Kearney bought some lovel
8 ing. Miss Devlin had become Mrs Kearney out of spite. She
9 excited. He spoke volubly, but Mrs Kearney said curtly at int
10 r the first year of married life, Mrs Kearney perceived that su
Business and argumentation verbs (22): take note, speak, take
into consideration, explain, see (=understand), determine, learn
Examples of research driven by Corpus Linguistics:
Translation possibilities
Concordances of man in “A Mother”:
N Concordance
1 e smiled and shook his hand. He was a little man , with a white, vacant face. She notice
2 o have treated her like that if she had been a man . But she would see that her daughter
3 ld see that it went in. He was a grey-haired man with a plausible voice and careful mann
4 ell, the second tenor, was a fair-haired little man who competed every year for prizes at t
5 occasion to say to some friend: `My good man is packing us off to Skerries for a few w
6 The bass, Mr Duggan, was a slender young man with a scattered black moustache. He
7 e room by instinct. He was a suave, elderly man who balanced his imposing body, whe
8 ried life, Mrs Kearney perceived that such a man would wear better than a romantic pers
9 ly and the baritone. They were the Freeman man and Mr O'Madden Burke. The Freeman
10 man and Mr O'Madden Burke. The Freeman man had come in to say that he could not
11 ile Mr Holohan was entertaining the Freeman man Mrs Kearney was speaking so animat
Examples of research driven by Corpus Linguistics
Text Aligner
JJ When we knew him first he used to be rather interesting, talking of faints and worms; but I soon grew tired of him and his endless stories about the distillery.
HT No princípio, quando o conhecemos, costumava ser interessante com suas conversas sobre vermes e desmaios, mas logo cansara-me dele e de suas intermináveis histórias a respeito da destilaria.
OS Quando o conhecemos era um sujeito cativante, que falava de bagaço e de serpentinas; mas logo cansei-me dele e de suas histórias intermináveis a respeito do alambique.
Final Remarks
Contribution of the keyword lists:
Mr. (keyness = 916.1) and she (keyness = -376.6)
Role of music in the text of Dubliners
Contribution of the concordances:
Concordances with female subjects
Concordances with Mrs. Kearney as subject
(Mrs. Kearney or she)
Contribution of the alignment:
Side-by-side view of the original and the translations
Sentence-by-sentence analysis of the translations