Advances in Automated Language Classification
ASJP Consortium
Dik Bakker, Lancaster
Overview
Project:
ASJP (Automated Similarity Judgment Program)
ASJP: Automatic Reconstruction
ASJP are:
Sören Wichmann (Germany; Netherlands)
Viveka Velupillai (Germany)
André Müller (Germany)
Robert Mailhammer (Germany)
Hagen Jung (Germany)
Eric Holman (US)
Anthony Grant (UK)
Dmitry Egorov (Russia)
Pamela Brown (US)
Cecil Brown (US)
Dik Bakker (UK; Netherlands)
Overview
Project:
ASJP (Automated Similarity Judgment Program)
Overall goal:
Automatic reconstruction of language relationships
Basis:
Distance matrix between individual languages on basis of
linguistic features
Method:
Lexicostatistics: mass comparison of lexical items
Overview
MAIN GOAL: Reconstruction of Language Relationships
Derived goals (among others):
- Critical assessment and refinement of existing classifications
- Classify newly described and unclassified languages
- Estimate time depths between languages / genera / families
- Search for (ir)regularities in phylogenies
- Test hypotheses (e.g. Atkinson et al 2008; ‘elbow’ phenomenon)
- Experimentally find the best/optimal dating method
- Detect borrowings
Overview
1. The basic list of lexical items
2. Comparing languages
3. Some results: genetic and areal proximity
4. On Inheritance vs Borrowing
5. Conclusions
1. The basic list of lexical items
Lexical items
Word list: Swadesh 100 basic meanings
- Word coined in most languages
- Collected in field work lexicon / grammar
- Inherited rather than borrowed
- Culturally independent
- Stable over time
- Few synonyms
 1. I         21. dog       41. nose      61. die       81. smoke
 2. you       22. louse     42. mouth     62. kill      82. fire
 3. we        23. tree      43. tooth     63. swim      83. ash
 4. this      24. seed      44. tongue    64. fly       84. burn
 5. that      25. leaf      45. claw      65. walk      85. path
 6. who       26. root      46. foot      66. come      86. mountain
 7. what      27. bark      47. knee      67. lie       87. red
 8. not       28. skin      48. hand      68. sit       88. green
 9. all       29. flesh     49. belly     69. stand     89. yellow
10. many      30. blood     50. neck      70. give      90. white
11. one       31. bone      51. breasts   71. say       91. black
12. two       32. grease    52. heart     72. sun       92. night
13. big       33. egg       53. liver     73. moon      93. hot
14. long      34. horn      54. drink     74. star      94. cold
15. small     35. tail      55. eat       75. water     95. full
16. woman     36. feather   56. bite      76. rain      96. new
17. man       37. hair      57. see       77. stone     97. good
18. person    38. head      58. hear      78. sand      98. round
19. fish      39. ear       59. know      79. earth     99. dry
20. bird      40. eye       60. sleep     80. cloud    100. name
Lexical items: further reduction
Early analyses have shown:
- Optimal 40/100 item subset gives same results
→ Less work
→ Less missing data
→ Faster processing; combinatorial explosion:
40 : 100 ~ 3 * 10^7 : 2 * 10^10
Lexical items: stability
Most stable items:
Iteratively throw out the most unstable
item in terms of variation within genera
(3500-4000 years; Dryer 2001; 2005)
E.g. Germanic, Romance, Slavic, …
Formula: S = (E - U)/(100 - U)
(weighted average % matches Eq vs Uneq)
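The formula can be read as a chance correction: E is measured by how far it lies above U, relative to the maximum of 100. A minimal sketch in Python, assuming E and U are both percentages; the function name and the sample values are illustrative and do not come from the project:

```python
def stability(e: float, u: float) -> float:
    """S = (E - U) / (100 - U), with E and U given as percentages."""
    if u >= 100:
        raise ValueError("U must be below 100")
    return (e - u) / (100.0 - u)


# Hypothetical percentages, for illustration only.
print(stability(40.0, 10.0))
```

Under this reading S is 1 when E reaches 100 and 0 when E is no higher than U.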
[Figure: Ethnologue (Goodman-Kruskal) and WALS (Pearson); Stability axis running from ++ to -]
[Slides: the 100-item list repeated with the captions "40 Most Stable" and "Homophones"]
Lexical items: transcription
First phase of project (2007):
Problems with full IPA representation of words:
- data entry via keyboard
- simple programming language (Fortran; Pascal)
→ Recoding to simplified ASJPcode (ASCII only)
Lexical items: transcription
ASJPcode:
7 Vowels
34 Consonants
Operators for:
Nasalization
Labialization
Palatalization
Aspiration
Glottalization
→ (some) complex syllables simplified (VXC → VC)
Abaza (Caucasian):
Meaning   IPA             ASJPcode
PERSON    ʕʷɨʧʼʲʷʕʷɨs     Xw~3Cw"yXw~3s
LEAF      bɣʲɨ            bxy~3
SKIN      ʧʷazʲ           Cw~azy~
HORN      ʧʼʷɨʕʷa         Cw"~3Xw~a
NOSE      pɨnʦʼa          p3nc"a
TOOTH     pɨʦ             p3c
Lexical items
Collected to date:
- Over 2100 languages, dialects and proto-languages
- Mean number of items/language: 36.2 (/40)
Lexical items
Distribution:
Americas:        27%
Eurasia:         23%
Australia/PNG:   18%
Austronesia:     15%
Africa:          14%
Creoles:          2%
Artificial:       1%
[Figure: Languages currently sampled]
Lexical items: transcription
Second phase of project (2008):
Problems with full IPA representation solved:
1. automatic conversion IPA to integer (Python)
2. (semi-)automatic recoding to ASJPcode:
transduction on the basis of a formal grammar
Lexical items: transcription
Abaza (Caucasian):
Meaning:
PERSON
IPA:
ʕʷɨʧʼʲʷʕʷɨs
Decimal:
661 695 616 679 700 690 695 661 695 616 115
ASJPcode:
88 119 126 51 67 34 121 119 126 88 119 126 51 115
( = Xw~3Cw"y~Xw~3s)
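The "Decimal" line is the sequence of Unicode code points of the IPA string. A minimal sketch of that first conversion step in Python; the function name is illustrative, and the project's own script is not shown on the slides:

```python
def to_code_points(word: str) -> list[int]:
    """Return the decimal Unicode code point of every character in word."""
    return [ord(ch) for ch in word]


# Abaza PERSON, written with escapes so the exact characters are unambiguous.
person = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"

print(to_code_points(person))
# [661, 695, 616, 679, 700, 690, 695, 661, 695, 616, 115]
```

The second step, recoding the integers to ASJPcode through a formal grammar, is not reproduced here because the grammar itself is not given.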
Why not run on full IPA?
- correlations IPA ~ ASJP > 0.9
- but: ASJP better fit with classifications
→ IPA too specific
Lexical items: transcription
IPA:
ʕʷɨʧʼʲʷʕʷɨs
Decimal:
661 695 616 679 700 690 695 661 695 616 115
Formal grammar:
A → n661, n695, n616, …
…
PQABC
…
ZPQZ
ASJP++code: ( = any Unicode string )
Optimal level of abstraction for historical phonological reconstruction?
2. Comparing languages
Comparing words
LG        I       YOU     WE
ABAZA     sErE    bErE    Sw~ErE
ABKHAZ    s3      w3      Sw~3
AGUL      zun     wun     cw~un
LDi=3   LDj=4   LDk=3   …   LDmean=3.73
LDi=4   LDj=4   LDk=4   …   LDmean=4.37
Comparing words
Levenshtein Distance
a. between 2 words:
Number of transformations to get from the shorter
form to the longer one (changes, additions)
b. Between 2 languages:
E.g. mean LD for overlapping set (<= 40)
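A minimal sketch of both steps in Python, using the standard dynamic-programming form of Levenshtein distance. It assumes every ASCII character of an ASJPcode string counts as one symbol; the project's own program may treat modifiers such as ~ differently. The function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def mean_ld(words_a: dict, words_b: dict) -> float:
    """Mean LD over the meanings attested in both languages."""
    shared = [m for m in words_a if m in words_b]
    return sum(levenshtein(words_a[m], words_b[m]) for m in shared) / len(shared)


abaza = {"I": "sErE", "YOU": "bErE", "WE": "Sw~ErE"}
abkhaz = {"I": "s3", "YOU": "w3", "WE": "Sw~3"}
agul = {"I": "zun", "YOU": "wun", "WE": "cw~un"}

print([levenshtein(abaza[m], abkhaz[m]) for m in abaza])  # [3, 4, 3]
print([levenshtein(abaza[m], agul[m]) for m in abaza])    # [4, 4, 4]
```

The means quoted on the slides run over up to 40 items, so these three words alone do not reproduce them.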
Comparing words
Levenshtein Distance
Two problems with simple LD:
1. Value depends on length of longest word
→ Normalize: LDN = LD / Lmax
2. Differences between languages in phonological overlap
→ Eliminate ‘noise’: LDND = LDN / LDNdifferent
Both values are then multiplied by 100.
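The two corrections can be sketched as follows. This is a reading of the slide, not the project's code: it assumes LDNdifferent is the mean LDN over all word pairs with different meanings in the same language pair, and that each ASCII character counts as one symbol. On the AVAR/AGUL slide, LDN / LDND = 81.76 / 89.87 ≈ 0.91, which under this reading would be LDNdifferent for that pair. The three-item word lists below are only a toy subset.

```python
from itertools import permutations


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LD normalized by the length of the longer word (range 0..1)."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd(words_a: dict, words_b: dict) -> float:
    """Mean LDN of same-meaning pairs divided by LDNdifferent, times 100."""
    shared = [m for m in words_a if m in words_b]
    same = [ldn(words_a[m], words_b[m]) for m in shared]
    # Assumption: 'noise' level = mean LDN over pairs with different meanings.
    diff = [ldn(words_a[m], words_b[n]) for m, n in permutations(shared, 2)]
    ldn_different = sum(diff) / len(diff)
    return 100 * (sum(same) / len(same)) / ldn_different


avar = {"I": "dun", "YOU": "mun", "FIRE": 'c"a'}
agul = {"I": "zun", "YOU": "wun", "FIRE": 'c"a'}

print(100 * ldn("dun", "zun"))
print(ldnd(avar, agul))
```

At least two shared meanings are needed, otherwise there are no different-meaning pairs to average over.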
Comparing languages
Levenshtein Distance for Language Pair
- Mean of all LDNDs of words in common
- Synonyms (12%):
  - take Minimum pair
  - take Mean
  Experimental option
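The two synonym options can be sketched as follows, here on the normalized distance LDN and with each ASCII character counted as one symbol. The function names and the `how` parameter are illustrative assumptions; the word forms are the Avar and Agul forms for FULL from the comparison slide.

```python
from itertools import product
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    return levenshtein(a, b) / max(len(a), len(b))


def synonym_distance(forms_a: list, forms_b: list, how: str = "min") -> float:
    """Distance for one meaning when either language lists several forms."""
    values = [ldn(a, b) for a, b in product(forms_a, forms_b)]
    return min(values) if how == "min" else mean(values)


avar_full = ['c"ura']
agul_full = ['ac"uf', 'ac"ar']

print(synonym_distance(avar_full, agul_full, how="min"))
print(synonym_distance(avar_full, agul_full, how="mean"))
```

For this particular item both Agul forms are equally distant from the Avar form, so the two options give the same value; they differ whenever one synonym is closer than the other.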
Comparing languages
AVAR (AVA: NAKH-DAGHESTANIAN > AVAR-ANDIC-TSEZIC) / AGUL (AGL: NAKH-DAGHESTANIAN > LEZGIC)
I     : dun=zun         * LDND=36.6
YOU   : mun=wun         * LDND=36.6
HORN  : tLar=k"arC      * LDND=66.0
FIRE  : c"a=c"a         * LDND= 0.0
FULL  : c"ura=ac"uf     * LDND=66.0    ALT: AGL= ac"ar
NEW   : c"iya=c"EyEr    * LDND=55.0    ALT: AGL= c"ayif
COMMON (LDND < 70) = AGL - AVA: 6 (=15.8% of 38)
LD = 4.01 / LDN = 81.76 / LDND = 89.87
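The COMMON line is a simple threshold count. A small worked check in Python, using only the six item values listed on the slide and the total of 38 compared items; the variable names are illustrative, and the values of the other 32 items are not shown in the slides:

```python
THRESHOLD = 70.0
N_COMPARED = 38  # items attested in both AVAR and AGUL, as stated on the slide

# Per-item LDND values as listed on the slide.
listed = {"I": 36.6, "YOU": 36.6, "HORN": 66.0, "FIRE": 0.0, "FULL": 66.0, "NEW": 55.0}

common = [meaning for meaning, value in listed.items() if value < THRESHOLD]
share = 100 * len(common) / N_COMPARED

print(len(common), round(share, 1))  # 6 15.8
```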
Comparing languages
LANG1    LANG2                 FAM1            FAM2            LDND
FRENCH   ARPITAN               INDO-EUROPEAN   INDO-EUROPEAN   55.63
FRENCH   GALICIAN              INDO-EUROPEAN   INDO-EUROPEAN   74.49
FRENCH   ARAGONESE             INDO-EUROPEAN   INDO-EUROPEAN   76.16
FRENCH   FRIULIAN              INDO-EUROPEAN   INDO-EUROPEAN   74.64
FRENCH   ROMANSH_SURSILVAN     INDO-EUROPEAN   INDO-EUROPEAN   77.80
FRENCH   ROMANIAN              INDO-EUROPEAN   INDO-EUROPEAN   74.37
FRENCH   LATIN                 INDO-EUROPEAN   INDO-EUROPEAN   80.07
FRENCH   CATALAN               INDO-EUROPEAN   INDO-EUROPEAN   71.69
FRENCH   ITALIAN               INDO-EUROPEAN   INDO-EUROPEAN   75.91
FRENCH   PORTUGUESE            INDO-EUROPEAN   INDO-EUROPEAN   74.38
FRENCH   SPANISH               INDO-EUROPEAN   INDO-EUROPEAN   80.91
FRENCH   DANISH                INDO-EUROPEAN   INDO-EUROPEAN   93.11
FRENCH   BERNESE_GERMAN        INDO-EUROPEAN   INDO-EUROPEAN   93.18
FRENCH   CIMBRIAN              INDO-EUROPEAN   INDO-EUROPEAN   94.43
FRENCH   BRABANTIC             INDO-EUROPEAN   INDO-EUROPEAN   95.18
FRENCH   NORTH_FRISIAN_AMRUM   INDO-EUROPEAN   INDO-EUROPEAN   95.30
FRENCH   JAMTLANDIC            INDO-EUROPEAN   INDO-EUROPEAN   94.58
FRENCH   LIMBURGISH            INDO-EUROPEAN   INDO-EUROPEAN   94.78
FRENCH   OLD_HIGH_GERMAN       INDO-EUROPEAN   INDO-EUROPEAN   92.70
FRENCH   PLAUTDIETSCH          INDO-EUROPEAN   INDO-EUROPEAN   95.35
FRENCH   NORTHERN_LOW_SAXON    INDO-EUROPEAN   INDO-EUROPEAN   90.87
FRENCH   STELLINGWERFS         INDO-EUROPEAN   INDO-EUROPEAN   92.85
FRENCH   FRANS_VLAAMS          INDO-EUROPEAN   INDO-EUROPEAN   94.08
3. Some results: genetic and areal proximity
Distance Matrix (0.5 * N * (N-1))
       FRE     DUT     GAL     PRT     ENG     …
FRE
DUT    90.93
GAL    71.62   90.00
PRT    74.38   94.61   51.87
ENG    91.17   63.19   91.30   95.18
…
< Excel file >
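Only the lower triangle needs to be computed, which is where the 0.5 * N * (N-1) comes from. A minimal sketch in Python that counts the pairs and expands the five-language sample above into a full symmetric matrix; the function names and the dictionary layout are illustrative assumptions:

```python
def n_pairs(n: int) -> int:
    """Number of distinct language pairs among n languages."""
    return n * (n - 1) // 2


LANGS = ["FRE", "DUT", "GAL", "PRT", "ENG"]

# Lower triangle as read from the matrix on the slide.
LOWER = {
    ("DUT", "FRE"): 90.93,
    ("GAL", "FRE"): 71.62, ("GAL", "DUT"): 90.00,
    ("PRT", "FRE"): 74.38, ("PRT", "DUT"): 94.61, ("PRT", "GAL"): 51.87,
    ("ENG", "FRE"): 91.17, ("ENG", "DUT"): 63.19,
    ("ENG", "GAL"): 91.30, ("ENG", "PRT"): 95.18,
}


def full_matrix(langs: list, lower: dict) -> list:
    """Expand a lower triangle into a square, symmetric matrix with zero diagonal."""
    index = {lang: i for i, lang in enumerate(langs)}
    matrix = [[0.0] * len(langs) for _ in langs]
    for (a, b), dist in lower.items():
        matrix[index[a]][index[b]] = dist
        matrix[index[b]][index[a]] = dist
    return matrix


MATRIX = full_matrix(LANGS, LOWER)

print(n_pairs(len(LANGS)))  # 10
print(n_pairs(2100))        # 2203950
```

With the roughly 2100 word lists collected so far, that is a little over 2.2 million language pairs.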
Tools for Trees
→ Prepare the input file for your preferred phylogenetic software using an editor such as TextPad (www.textpad.com)
→ Run the data using phylogenetic software such as SplitsTree (www.splitstree.org)
→ Choose the most appropriate algorithm (Neighbour Joining for distance data)
→ Prepare the tree for presentation using a tool such as the Tree Explorer of MEGA
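The input file can also be generated rather than typed. A minimal sketch in Python that writes a square distance matrix in a PHYLIP-style layout (taxon count on the first line, then one row per language with the name padded to 10 characters). The layout is an assumption about what the chosen program accepts and should be checked against its documentation; the function name and the three-language sample are illustrative.

```python
def phylip_distance_text(names: list, matrix: list) -> str:
    """Format a square distance matrix as PHYLIP-style text."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        # Names are cut and padded to 10 characters, as in strict PHYLIP files.
        lines.append(f"{name[:10]:<10}" + " ".join(f"{d:8.2f}" for d in row))
    return "\n".join(lines) + "\n"


names = ["FRE", "GAL", "PRT"]
matrix = [
    [0.00, 71.62, 74.38],
    [71.62, 0.00, 51.87],
    [74.38, 51.87, 0.00],
]

text = phylip_distance_text(names, matrix)
print(text)
```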
[Figure: Salishan languages (n=30), NeighborJoining tree]
[Figure: UPGMA and NeighborJoining trees]
NeighborJoining:
- specifically meant for phylogenetic trees
- takes distance as point of departure
- does NOT assume equal rate of change
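To make the procedure concrete, here is a compact sketch of the textbook neighbour-joining algorithm in Python. It is not the code of SplitsTree or MEGA; the function name, the frozenset-keyed distance dictionary and the Newick output with two-decimal branch lengths are choices made for this illustration. The input is the five-language sample from the distance matrix slide.

```python
from itertools import combinations


def neighbor_joining(names: list, dist: dict) -> str:
    """Build an unrooted tree by neighbour joining; return it as a Newick string.

    dist maps frozenset({a, b}) to the distance between a and b.
    """
    d = dict(dist)
    nodes = list(names)

    def get(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(get(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes Q = (n - 2) * d(a, b) - total(a) - total(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * get(*p) - total[p[0]] - total[p[1]])
        # Branch lengths may differ: no equal rate of change is assumed.
        len_a = 0.5 * get(a, b) + (total[a] - total[b]) / (2 * (n - 2))
        len_b = get(a, b) - len_a
        new = f"({a}:{len_a:.2f},{b}:{len_b:.2f})"
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (get(a, c) + get(b, c) - get(a, b))
        nodes = [c for c in nodes if c not in (a, b)] + [new]

    a, b = nodes
    return f"({a},{b}:{get(a, b):.2f});"


names = ["FRE", "DUT", "GAL", "PRT", "ENG"]
dist = {
    frozenset(("FRE", "DUT")): 90.93, frozenset(("FRE", "GAL")): 71.62,
    frozenset(("FRE", "PRT")): 74.38, frozenset(("FRE", "ENG")): 91.17,
    frozenset(("DUT", "GAL")): 90.00, frozenset(("DUT", "PRT")): 94.61,
    frozenset(("DUT", "ENG")): 63.19, frozenset(("GAL", "PRT")): 51.87,
    frozenset(("GAL", "ENG")): 91.30, frozenset(("PRT", "ENG")): 95.18,
}

tree = neighbor_joining(names, dist)
print(tree)
```

On these five values the first pair to be joined is DUT and ENG. Neighbour joining can produce negative branch lengths on noisy data, which this sketch does not correct.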
[Figure: Mayan languages (n=38)]
Calibration of Method
Calibration: best options, parameters, factors:
A. for pure classification:
- existing classifications (Ethnologue; WALS; mainly the well-documented areas)
- expert knowledge of specific areas
→ deviation ±12% → niche!
Calibration of Method
Calibration: best options, parameters, factors:
B. for dating:
- linguistically crucial historic events:
Linguistically crucial events
Date       Historical event                    Linguistic event
c. 250     Goths conquer Dacia                 split of E-W Romance
4th c      Irish invade Scotland               split of Irish-Scottish Gaelic
5th c      German kingdoms in W Roman Empire   breakup of W Romance
5th c      Germans invade Britain              split of English-Frisian
5th-6th c  Britons flee to Brittany            split of Welsh-Breton
400-600    Hieroglyphic evidence               Ch'olan begins to split
768-814    Name of Charlemagne attested        Proto-Slavic
Calibration of Method
Calibration: best options, parameters, factors:
B. for dating:
- linguistically crucial historic events
→ Standard formula (Swadesh):
TimeDepth = log(Similarity) / (2 log(Retention))
→ With Similarity = 1 - LDND:
TimeDepth = log(1 - LDND) / (2 log(Retention))
Linguistically crucial events
Time (millennia ago)  Linguistic event                LDND    Ret
1.75                  split of E-W Romance            0.6753  0.73
1.65                  split of Irish-Scottish Gaelic  0.6687  0.72
1.55                  breakup of W Romance            0.6411  0.72
1.55                  split of English-Frisian        0.6574  0.71
1.50                  split of Welsh-Breton           0.5705  0.75
1.40                  Ch'olan begins to split         0.5369  0.76
1.21                  Proto-Slavic                    0.5877  0.69
MEAN:                                                         0.73
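The Ret column can be reproduced from the Time and LDND columns by solving the dating formula for the retention rate. The sketch below assumes that Similarity is taken as 1 - LDND and that Time is in millennia; both are inferences from the numbers in the table, not statements on the slide. The function and variable names are hypothetical.

```python
import math

# (time depth, LDND, Ret as shown on the slide) for the seven events.
CALIBRATION = [
    (1.75, 0.6753, 0.73),  # split of E-W Romance
    (1.65, 0.6687, 0.72),  # split of Irish-Scottish Gaelic
    (1.55, 0.6411, 0.72),  # breakup of W Romance
    (1.55, 0.6574, 0.71),  # split of English-Frisian
    (1.50, 0.5705, 0.75),  # split of Welsh-Breton
    (1.40, 0.5369, 0.76),  # Ch'olan begins to split
    (1.21, 0.5877, 0.69),  # Proto-Slavic
]


def retention(time_depth, ldnd):
    """Solve TimeDepth = log(1 - LDND) / (2 log r) for the retention rate r."""
    return math.exp(math.log(1.0 - ldnd) / (2.0 * time_depth))


def time_depth(ldnd, r):
    """Estimate the time depth for a language pair, given a retention rate r."""
    return math.log(1.0 - ldnd) / (2.0 * math.log(r))


rates = [retention(t, d) for t, d, _ in CALIBRATION]
mean_rate = sum(rates) / len(rates)
```

Each computed rate agrees with the slide's Ret value to within rounding, and the mean comes out close to the 0.73 shown. Calling `time_depth(ldnd, 0.73)` then gives a date estimate for any other language pair under the same assumptions.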
Calibration of Method
Calibration: best options, parameters, factors:
B. for dating:
- linguistically crucial historic events:
- Standard formula:
TimeDepth = log(1 - LDND) / (2 log(0.73))
Retention 0.73 < 75%
Deeper!
Glottochronology only?
Calibration of method:
Glottochronology: all based on lexical distance
Add other linguistic domains …
WALS Typological database
Best result:
(75% × 40 lexical items) + (25% × 40 Ph/M/S features)
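The 75/25 weighting can be read as a weighted average of two distances. The sketch below assumes that the lexical and the typological distance for a language pair are both already normalised to the same 0 to 1 scale; the slides do not say how the two were scaled or combined beyond the weights, and the function name is hypothetical.

```python
def combined_distance(lexical, typological, lexical_weight=0.75):
    """Weighted average of a lexical and a typological distance (both 0..1)."""
    if not 0.0 <= lexical_weight <= 1.0:
        raise ValueError("lexical_weight must lie between 0 and 1")
    return lexical_weight * lexical + (1.0 - lexical_weight) * typological


# Hypothetical pair: lexical distance 0.6, typological distance 0.2.
example = combined_distance(0.6, 0.2)
```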
4. On Inheritance vs Borrowing
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I    : dun=zun       * LDND=36.6
YOU  : mun=wun       * LDND=36.6
HORN : tLar=k"arC    * LDND=66.0
FIRE : c"a=c"a       * LDND= 0.0
FULL : c"ura=ac"uf   * LDND=66.0
NEW  : c"iya=c"EyEr  * LDND=55.0
→ 6 items < 70.0 → Genetically related !!
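The distances behind such comparisons can be sketched as follows. LDN is the Levenshtein distance between two words divided by the length of the longer word; LDND divides the average LDN of same-meaning pairs by the average LDN of different-meaning pairs. This sketch treats every ASJPcode symbol, including modifiers such as `"`, as one ordinary character, which is a simplifying assumption. It uses only the six Avar/Agul pairs shown above, so its LDND will not reproduce the slide's figures, which rest on the full word lists.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def ldn(a, b):
    """Levenshtein distance normalised by the length of the longer word."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd(words_a, words_b):
    """Mean LDN of same-meaning pairs over mean LDN of different-meaning pairs."""
    shared = [m for m in words_a if m in words_b]
    same = [ldn(words_a[m], words_b[m]) for m in shared]
    diff = [ldn(words_a[m], words_b[n]) for m in shared for n in shared if m != n]
    return (sum(same) / len(same)) / (sum(diff) / len(diff))


avar = {"I": "dun", "YOU": "mun", "HORN": "tLar",
        "FIRE": 'c"a', "FULL": 'c"ura', "NEW": 'c"iya'}
agul = {"I": "zun", "YOU": "wun", "HORN": 'k"arC',
        "FIRE": 'c"a', "FULL": 'ac"uf', "NEW": 'c"EyEr'}
pair_distance = ldnd(avar, agul)
```

For instance, dun and zun differ in one of three symbols, so their LDN is 1/3, while the identical forms for FIRE give 0.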
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE    : uno=unu           * LDND=36.9
TWO    : dos=dos           * LDND= 0.0
PERSON : persona=petsona   * LDND=55.3
STAR   : estreya=estrecas  * LDND=61.2
NIGHT  : noCe=noces        * LDND=68.2
NEW    : nuevo=nueba       * LDND=44.2
→ 6 items < 70.0: RELATED ???
NO!!!
INDO-EUROPEAN < > AUSTRONESIAN
CHANCE? → ~ 5% (i.e. 1 – 2 items)
BORROWING through LANGUAGE CONTACT
Inherited or borrowed?
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
ONE : uno=unu * LDND=36.9
SPA <> CHA:
fam/gen=          0.24/0.82 > 0.03/0.00
phon pattern fit= 12.00 > 0.67
…
Borrowed! SPA > CHA
Borrowing
SPANISH (SPA) INDO-EUROPEAN (128) > ROMANCE
/ CHAMORRO (CHA) AUSTRONESIAN (310) > CHAMORRO
SPA > CHA
Item    Forms             LDND  f/g (SPA > CHA)        swF (SPA > CHA)  ALT: CHA=
TWO     dos=dos            0.0  0.62/1.00 > 0.12/0.00  100.00 > 0.22
PERSON  persona=petsona   55.3  0.20/0.64 > 0.01/0.00   32.40 > 0.13    taotao (0.41/0.00)
STAR    estreya=estrecas  61.2  0.17/0.82 > 0.00/0.00  100.00 > 4.44    puti7on (0.03/0.00)
NIGHT   noCe=noces        68.2  0.23/0.55 > 0.04/0.00  100.00 > 0.10    pw~eNi (0.23/0.00)
NEW     nuevo=nueba       44.2  0.50/0.64 > 0.04/0.00    4.27 > 0.03
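The direction of borrowing in these comparisons can be illustrated with a small decision rule. The slides do not define f/g; this sketch assumes the two numbers are the shares of languages in a language's family and genus that have a similar form for the item. Under that assumption, a form that is well supported on one side and barely supported on the other points to the first language as the donor. The rule, the class and the function names are hypothetical and are not the project's actual procedure.

```python
from typing import NamedTuple, Optional


class Support(NamedTuple):
    family: float  # assumed: share of the family with a similar form
    genus: float   # assumed: share of the genus with a similar form


def likely_donor(a: Support, b: Support) -> Optional[str]:
    """Return 'a' or 'b' for the better-supported side, or None if unclear."""
    if a.family > b.family and a.genus > b.genus:
        return "a"
    if b.family > a.family and b.genus > a.genus:
        return "b"
    return None


# f/g values for ONE (uno=unu) as shown on the slides: SPA first, CHA second.
spa_one = Support(family=0.24, genus=0.82)
cha_one = Support(family=0.03, genus=0.00)
donor = likely_donor(spa_one, cha_one)
```

With the slide's values for ONE the rule picks the Spanish side, matching the SPA > CHA direction shown.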
5. Conclusions
Conclusions
- Method for automatic reconstruction of language relationships
- Assess, discuss and correct existing classifications
- Test hypotheses about genetic distances in time
- Locate potential borrowings
- CORE: incremental lexical database (> 35%)
→ One day: Online
→ Cooperation!!
Holman et al. (forthc. 2008). Explorations in automated language classification. Folia Linguistica.
Brown et al. (forthc. 2008). Automated classification of the world's languages: A description of the method and preliminary results. Sprachtypologie und Universalienforschung.
+ Several working papers
email.eva.mpg.de./~wichmann/ASJPHomePage
?