ETD 2011 – Cape Town
Examining Accesses by
Country, Language and
Area of Knowledge
ETD 2011 – South Africa
Ana Pavani
Laboratório de Automação de Museus, Bibliotecas
Digitais e Arquivos
Departamento de Engenharia Elétrica
Pontifícia Universidade Católica do Rio de Janeiro
Brazil
[email protected]
http://www.maxwell.lambda.ele.puc-rio.br/
This work continues the work presented last year in Austin. The two works differ in the following aspects:
• In 2010 there were 71 data sets; this work considers 85 (20% more)
• East Timor was included because accesses from this country have started
• The UNDP changed the way the HDI is computed, so these data were updated, as were the populations of the countries
• My co-author left the university, so this time I am by myself
ETDs, PUC-Rio, BDTD & NDLTD
PUC-Rio
Rio de Janeiro
Brazil
PUC-Rio is a small private university. It is divided into 3 centers, and each has graduate programs:
• CTCH (Humanities) – 6
• CCS (Social Sciences) – 10
• CTC (Science & Technology) – 10
• The oldest graduate program (EE) started in 1963.
• The newest graduate program is less than 5 years old.
Characteristics of PUC-Rio’s ETD program:
• First published ETD – May 2000
• ETDs became mandatory – Aug 2002
• Number of ETDs – 5,694 (Jun 2011)
    • CTCH – 1,442
    • CCS – 1,291
    • CTC – 2,961
• Yearly average number of defended T&Ds(*) – 590
(*) 2007, 2008, 2009 & 2010; (**) 2006, 2007, 2008 & 2009.
• There is retrospective digitization.
• ETDs are made available in chapters (a graduate school regulation; please don’t ask me the reason! It will change as of Oct 2011)
Measure                                                              All       CTCH      CCS       CTC
Number of ETDs                                                       5,694     1,442     1,291     2,961
Average number of ETDs – June 2004 to June 2011                      3,553.1   888.9     726.1     1,938.3
Average number of partitions – June 2011                             7.3       7.9       7.3       7.0
Average of average numbers of partitions – June 2004 to June 2011    6.93      7.82      6.83      6.57
PUC-Rio’s ETDs, BDTD(*) and NDLTD(**):
• Number of BDTD institutions – 97 (OAI-PMH data providers)
• Number of BDTD metadata records – 170K+ (BDTD is an OAI-PMH data and service provider)
• BDTD records are/were harvested by OCLC and other institutions, and made available worldwide
• Brazilian ETDs are the largest collection in Portuguese available worldwide
(*) BDTD – Biblioteca Digital de Teses e Dissertações = Brazilian Nat’l Consortium.
(**) You must know what NDLTD stands for!!!
Accesses to PUC-Rio’s ETDs:
• Access logs saved since – Jun 2004
• Number of monthly logs when the article was written – 85
pt & es IN THE WORLD
Language      Worldwide   Western languages   Internet
Portuguese    7th         3rd                 6th
Spanish       2nd         1st                 3rd
pt is the official or one of the official languages of:
• Angola
• Brazil
• Cape Verde
• Equatorial Guinea (*)
• East Timor (**)
• Guinea-Bissau
• Macau (***)
• Mozambique
• Portugal
• Sao Tome and Principe
(*) es & pt official
(**) less than 5% of the population know it; it was banned
during the Indonesian rule
(***) UNDP did not publish in the last report; other data
were used
es is the official or one of the official languages of:
• Argentina
• Bolivia
• Chile
• Colombia
• Costa Rica
• Cuba
• Dominican Rep
• Ecuador
• El Salvador
• Equatorial Guinea (*)
• Guatemala
• Honduras
• Mexico
• Nicaragua
• Panama
• Paraguay
• Peru
• Puerto Rico
• Spain
• Uruguay
• Venezuela
Assumptions for the analysis:
• ETDs are very specialized items – people who seek ETDs are highly educated
• es and pt are quite similar languages – educated people who can speak one can read the other
• es- and pt-speakers are potential readers of PUC-Rio’s ETDs
• 2 countries were not considered:
    • Brazil – it is the home country
    • US – there are very large groups of es- and pt-speaking persons, but neither is the language of the country
• 2 groups were defined:
    • “international group” – all countries except Brazil and the US
    • “pt+es group” – all countries that have pt and/or es as one of the official languages
• Factors considered to influence accesses to ETDs:
    • Population size
    • Level of education
    • Access to the Internet
DEALING WITH COUNTRY DIFFERENCES
Sao Tome and Principe has 165K inhabitants
Mexico has 110M inhabitants
Portugal and Spain are in Europe
Angola and Mozambique are in Africa
Portugal has 10M inhabitants
Spain has 45M inhabitants
Argentina and Honduras are in Latin America
Equatorial Guinea has the 2 languages
Quantifying potential accesses from countries that are very different:
• Need to find data on the factors that may influence accesses to ETDs:
    • Population size – easy
    • Level of education – difficult (literacy rates are easy!)
    • Access to the Internet – difficult
    • All data should be considered in the same time frame
• Knowledge that the second and the third factors depend on how developed countries are
• Knowledge that it was necessary to combine the 3 factors
Decision on how to deal with the differences among countries:
• Use UNDP’s HDI (Human Development Index), which contains information on the second and the third factors (HDI combines indicators of life expectancy, education and income; the new way it is computed includes mean years of schooling and expected years of schooling, going beyond literacy rates)
• Decision to combine HDI with the population size
Index I = Population x HDI
Measure             es-speaking countries   pt-speaking countries
Total population    420,281,000             57,858,800
Average HDI         0.707                   0.527
Index I             309,420,871             25,114,111
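A minimal sketch of how Index I might be computed for a group of countries. It assumes, which the slides do not state, that the group value is the sum of Population x HDI over the countries in the group. The `Country` class, the function names and the two sample countries are hypothetical; only the two average HDI values (0.707 and 0.527) come from the figures reported above.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    population: int
    hdi: float


def index_i(country: Country) -> float:
    """Index I = Population x HDI for a single country."""
    return country.population * country.hdi


def group_index(countries: list[Country]) -> float:
    """Assumption: the index of a group is the sum over its countries."""
    return sum(index_i(c) for c in countries)


# Hypothetical countries, used only to show the arithmetic.
sample_group = [
    Country("Country A", 10_000_000, 0.80),
    Country("Country B", 2_000_000, 0.50),
]
sample_total = group_index(sample_group)

# Relative difference between the two average HDI values reported above.
hdi_gap_percent = round((0.707 / 0.527 - 1) * 100, 2)

print(f"Index I of the sample group: {sample_total:,.0f}")
print(f"Average HDI gap: {hdi_gap_percent}%")
```

The second figure reproduces the 34.16% difference in average HDI quoted in the comments that follow.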
Comments:
• 21 es-speaking and 10 pt-speaking countries (Equatorial Guinea was counted in both)
• Average HDI for es-speaking countries is 34.16% higher than the other group
• Population of the es-speaking countries is almost 7.4 times the population of the other group
• Index I for the es-speaking group is 12.36 times the same index for the pt-speaking group
The expectation was to have many more accesses from es-speaking countries than
from pt-speaking countries!!
WORKING WITH DATA AND RESULTS
Information:
• Number of sets of data – 85 (one for each month)
• For each set, 16 variables were computed (examples: number of countries, number of pt-speaking countries, total number of accesses, etc.)
• All data were computed for the complete set and for each of the 3 areas of knowledge
From the sets (collection and areas) side
This analysis focused on the way the whole collection
and each individual set – CTCH, CCS and CTC – were
accessed from countries in different groups.
Results:
• Total number of countries that accessed ETDs – 204
    • CTCH – 183
    • CCS – 183
    • CTC – 189
• Total number in the “international group” – 202
    • CTCH – 181
    • CCS – 181
    • CTC – 187
• Maximum number of countries in the “international group” in a month – 143
    • CTCH – 112
    • CCS – 108
    • CTC – 132
• Maximum number of countries in the “pt+es group” in a month – 28 (maximum possible 30)
    • CTCH – 27
    • CCS – 27
    • CTC – 27
• Number of months with accesses from 100 or more countries – 42
    • CTCH – 18
    • CCS – 15
    • CTC – 32
• Some percentages follow
% accesses                                                  All     CTCH    CCS     CTC
from the international group                                8.48    7.99    7.89    9.12
in the international group from the es+pt-speaking group    69.03   73.27   68.56   66.32
in the es+pt-speaking group from pt-speaking countries      82.07   87.11   84.44   77.35
in the international group from pt-speaking countries       56.65   63.83   57.89   51.30
in the international group from Portugal                    49.74   57.39   49.54   44.69
in the es+pt-speaking group from Portugal                   72.05   78.27   72.26   67.39
in the pt-speaking group from Portugal                      87.89   88.92   85.57   87.12
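The percentages are nested: pt-speaking countries are a subset of the es+pt-speaking group, which is a subset of the international group. A small sanity check, sketched under that reading, multiplies the second and third rows and compares the product with the fourth row. The variable and function names are illustrative only; the numbers are the ones reported above.

```python
# Percentages reported above, in the order All, CTCH, CCS, CTC.
areas = ["All", "CTCH", "CCS", "CTC"]
espt_in_international = [69.03, 73.27, 68.56, 66.32]
pt_in_espt = [82.07, 87.11, 84.44, 77.35]
pt_in_international = [56.65, 63.83, 57.89, 51.30]


def chained_share(outer_percent: float, inner_percent: float) -> float:
    """Share of the outer total held by a subgroup of a subgroup, in percent."""
    return outer_percent * inner_percent / 100.0


# Absolute difference between the chained value and the reported value.
differences = {
    area: abs(chained_share(outer, inner) - reported)
    for area, outer, inner, reported in zip(
        areas, espt_in_international, pt_in_espt, pt_in_international
    )
}

for area, diff in differences.items():
    print(f"{area}: difference of {diff:.4f} percentage points")
```

For these three rows the chained and reported values agree to within 0.01 percentage points in every column.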
Comments:
• Absolute values for CTC are higher – this area has the largest collection (larger than the sum of the others)
• Percentages for CTC are lower, except for accesses from the “international group”
    • Is it more international?
    • It seems that language is not very important in S&T
From the accesses side
This analysis focused on the way accesses behaved
for the complete collection and how they spread
among the sets – CTCH, CCS and CTC.
The collection and the sets have different profiles –
numbers of ETDs and numbers of partitions. For this
reason, normalization was necessary.
Quantifying potential accesses to sets of works with different profiles:
• Sets are very different in:
    • Numbers of ETDs
    • Numbers of partitions per ETD
• This means that numbers of accesses had to be normalized in order to compare accesses to the sets
• This work presents a first attempt to quantify the way the “average ETD” in a set “attracts” accesses
Decision on how to deal with the differences among sets:
• Combine average numbers of ETDs with average of average numbers of partitions
Index EI = 1 / (average number of ETDs x average of average numbers of partitions)

            All        CTCH       CCS        CTC
Index EI    0.000041   0.000144   0.000202   0.000079
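The Index EI values can be reproduced from the collection profile reported earlier (average number of ETDs and average of average numbers of partitions, June 2004 to June 2011). The sketch below is only an illustration; the dictionary layout and the function name are not part of the original work.

```python
# (average number of ETDs, average of average numbers of partitions)
profiles = {
    "All": (3553.1, 6.93),
    "CTCH": (888.9, 7.82),
    "CCS": (726.1, 6.83),
    "CTC": (1938.3, 6.57),
}


def index_ei(avg_etds: float, avg_partitions: float) -> float:
    """Index EI = 1 / (average number of ETDs x average of average partitions)."""
    return 1.0 / (avg_etds * avg_partitions)


# Rounded to six decimal places, as in the table above.
ei = {area: round(index_ei(*profile), 6) for area, profile in profiles.items()}

for area, value in ei.items():
    print(f"{area}: {value:.6f}")
```

A smaller set gets a larger index, so the same raw number of accesses counts for more when it is spread over fewer ETDs and partitions.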
Average numbers of                         All      CTCH     CCS      CTC
Accesses                                   740.40   901.73   755.93   644.32
Accesses from the int group                62.77    72.01    59.63    58.79
Accesses from the es+pt-speaking group     43.33    52.76    40.88    38.99
Accesses from the pt-speaking countries    35.56    45.96    34.52    30.16
Accesses from Portugal                     31.22    41.32    29.54    26.27
• numbers computed as the total numbers of accesses x index EI
• numbers to be viewed as accumulated
• monthly averages can be obtained by dividing by 85
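A short sketch of the three notes above: normalization multiplies a raw total by Index EI, and the monthly average divides the accumulated value by the 85 monthly logs. Because every row of a column is scaled by the same index, ratios between rows are unchanged, so the international share can be recovered from the normalized figures. Function and variable names are illustrative.

```python
MONTHS = 85  # number of monthly logs

# Normalized accumulated values reported above: (all accesses, international group).
normalized = {
    "All": (740.40, 62.77),
    "CTCH": (901.73, 72.01),
    "CCS": (755.93, 59.63),
    "CTC": (644.32, 58.79),
}


def normalize(total_accesses: float, index_ei: float) -> float:
    """Normalized accesses = total number of accesses x Index EI."""
    return total_accesses * index_ei


def monthly_average(accumulated: float, months: int = MONTHS) -> float:
    """Accumulated value divided by the number of monthly logs."""
    return accumulated / months


def share_percent(part: float, whole: float) -> float:
    return 100.0 * part / whole


monthly = {a: round(monthly_average(total), 2) for a, (total, _) in normalized.items()}
international_share = {
    a: round(share_percent(intl, total), 2) for a, (total, intl) in normalized.items()
}

print(monthly)
print(international_share)
```

The recovered shares (8.48, 7.99, 7.89 and 9.12) match the first row of the percentages reported earlier.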
Comments:
• When normalized data are considered, the average number of accesses (per ETD in Science & Technology) from the international group is the lowest among all
    • The same happens with accesses from the es+pt and pt-speaking groups, and from Portugal as well
    • The reason is that ETDs in this group have the lowest average of accesses per ETD among the 3 subsets
• When normalized data are considered, the average number of accesses (per ETD in Humanities) from the international group is the highest among all
    • The same happens with accesses from the es+pt and pt-speaking groups, and from Portugal as well
    • The reason is that ETDs in this group have the highest average of accesses per ETD among the 3 subsets
FINAL COMMENTS
• Percentage-wise, international accesses are the most significant for ETDs in S&T
• At the same time, the “average S&T ETD” “attracts” fewer international accesses than ETDs in other areas of knowledge, and the “average Humanities ETD” “attracts” the most
• In all areas of knowledge, accesses from:
    • es- and/or pt-speaking countries are the most significant
    • pt-speaking countries are the most significant in the “es+pt group”
    • Portugal are the most significant in the pt group
• New ways of defining “attraction” should be examined
Results seem to indicate that language and HDI are important factors in accesses
Thank you!
Muito obrigada!