Language-based Information
and Knowledge Analysis
Professor Khurshid Ahmad
Department of Computing
School of Electronics and Physical Sciences
University of Surrey
e-Science day at the Surrey Research Park, 2 December 2002
1
Talk Outline
 Computing Intelligently
 The complexity of science
 The triumvirate of understanding
 Dealing with information deluge
 The Missing Link: Images and Text
 Need for/of the Grid
 Afterword
2
Computing Intelligently?
Artificially intelligent computing systems attempt to solve
problems based on an interpretation of work in psychology,
neurobiology, linguistics, mathematics and philosophy.
Knowledge
Language; Images
Symbols; Planning;
Learning, Thinking;
Creativity
Intelligence
Cognition
5
The triumvirate of understanding
Knowledge-based
INFORMATION
EXTRACTION
I R
Intelligence based on
SCIENTOMETRICS/
BIBLIOMETRICS
Intelligent:
INFORMATION
RETRIEVAL
Cognition based
INFORMATION
VISUALIZATION
6
The triumvirate of understanding
Major text data bases are online:
MEDLINE (11 million papers);
Physical Review Online Archive (c. 1890 to date);
US Patent Office (all patents from 1900 onwards);
Genome Data bases
7
The triumvirate of understanding
Major text and image data bases are online:
Reuters News (c. 3000 stories per day);
Spectroscopy and analytical data (NIS data bases);
Chemical Abstracts, where currently structure diagrams are
ignored;
Crime-related images with annotated information
8
The triumvirate of understanding
Major text and image data bases are online:
Recently, studies of how science and technology
evolve have been related to issues of business
management, particularly the emergence of
competition, disruptive technologies, and opportunities
for collaboration across disciplines.
9
The triumvirate of understanding
Major text and image data bases are online:
Such methods are used essentially with structured
spatial and temporal data.
Abstract non-spatial and atemporal data, for example
free text as found in journal papers, in various
abstracts data bases (cf. MEDLINE), in electronic mail
comprising user-to-expert communication, or in web-access
patterns, are typically visualised using so-called thematic landscapes.
This would need the GRID.
10
The triumvirate of understanding:
Need for/of the Grid
 Coordinating data sets based on common sets of
metadata: need for standards beyond those for
architecture of the Grid (OGSA)
 Grid-enabling text analysis systems would enable
processing of large volumes of distributed data
 Grids provide the infrastructure for development
of generic computing applications capable of
dealing with and combining results of analysis of
various types of data – language, images, graphs.
12
Computing Intelligently?
A knowledge-based system can be
programmed to reason over a set
of facts, propositions, rules and
rules of thumb and, sometimes,
the system may come to the same
conclusion as a human being.
13
Computing Intelligently – with
rules of thumb about images?
Recognising and reasoning about the visual
environment is something that people do
extraordinarily well.
In these abilities an average three-year-old
makes the most sophisticated computer
vision system look embarrassingly inept.
14
Computing Intelligently – with
rules of thumb about images?
The Vision Problem?
Three-dimensional physical structure in
the scene, containing pictures of objects
related to other (probably) known
objects, which projects into two
dimensional structure in the image.
15
Computing Intelligently – with
rules of thumb about words?
Natural Language. A person’s native tongue;
organic, ambiguous, creative, wilful
Natural Language Processing. Processing of
natural language (e.g., English) by a computer to
facilitate communication with the computer or for
other purposes, such as word processors,
computer-based dictionaries and thesauri,
summarizers, machine translators, text filters,
grammar checkers, and so on.
16
Complexity of science
[Figure: lexical difficulty (vertical axis, -10 to 35) plotted against year of publication (between 1930 and 1990) for Nature, Science and Scientific American.]
18
Complexity of science
Lexical processes used by scientists
involve:
• repetition of lexical items comprising the specific vocabulary of a
subject domain
• inventing new words
• borrowing words from other domains
• re-defining words or terms
Such processes contribute significantly to
the organisation and communication of
tacit and explicit knowledge.
19
Complexity of science
We have developed a computer-based method that compares the
relative occurrence of single words in an English scientific paper (or a
collection or corpus of papers) with the occurrence of the words in a
representative sample of contemporary English.
The British National Corpus is a 100-million-word digital collection
of written and spoken English produced during 1975-1993. Three-quarters
of the text is drawn from (A-level+) natural, social and applied sciences,
from arts and culture, and from commerce and finance. The other quarter
includes works of fiction and popular science.
BNC-type corpora are used extensively in producing dictionaries for
general use.
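The comparison described above can be sketched as a ratio of relative frequencies. This is only an illustration, not the method as implemented at Surrey: the function names, the crude whitespace tokeniser and the two toy texts are assumptions, and the add-one smoothing of the general-language count is also an assumption, made so that a word absent from the reference sample still gets a finite ratio.

```python
from collections import Counter


def tokenise(text):
    """Lower-case the text, split on whitespace and trim punctuation (a crude tokeniser)."""
    stripped = (w.strip(".,;:()'\"") for w in text.lower().split())
    return [w for w in stripped if w]


def relative_frequency_ratio(special_tokens, general_tokens):
    """Return {word: ratio} comparing the relative frequency of each word in a
    specialist text with its relative frequency in a general-language sample.

    Counts in the general sample are smoothed by adding one (an assumption)
    so that unseen words do not cause a division by zero.
    """
    special = Counter(special_tokens)
    general = Counter(general_tokens)
    n_special = len(special_tokens)
    n_general = len(general_tokens)
    ratios = {}
    for word, count in special.items():
        rel_special = count / n_special
        rel_general = (general[word] + 1) / (n_general + 1)
        ratios[word] = rel_special / rel_general
    return ratios


# Toy texts, invented for illustration only.
paper = tokenise("Resonant tunnelling in the diode. The tunnelling current in the diode peaks.")
reference = tokenise("The train went in the tunnel. The cat sat on the mat in the sun.")

ratios = relative_frequency_ratio(paper, reference)
ranked = sorted(ratios, key=ratios.get, reverse=True)
```

Words with a ratio well above 1 are over-represented in the specialist text relative to the reference sample; common function words such as "the" fall below 1 in this toy example.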
20
Complexity of science
• Leo Esaki discovered a new semiconductor device, the
tunnel diode, in 1957.
• The super-fast, current-switching device earned Esaki a
Nobel Prize, and yet technological obstacles hindered
widespread use in conventional, silicon-based circuits.
• Recent developments in tunnel diodes could help chip-makers
boost silicon's speed while further shrinking chips.
• We have developed a text corpus, comprising 100-odd
journal papers published between 1980 and 2000 and
containing over 430,000 words, on the topic of
tunnel diodes or, more precisely, resonant tunnel
diodes.
21
Complexity of science
A lexico-morphological signature of discovery?
Weird/excessive use of tunnel: frequency relative to BNC

Word          Surrey Corpus (a)   British National Corpus (b)
tunnel               50                       1
tunnels               3                       2
tunnelled            70                       1
tunnelling          685                       1

Magnetotunneling does not exist in the British National
Corpus.
22
Complexity of science
A lexico-morphological signature of the discovery of
tunnel diodes?
Lexical ‘productivity’ of tunnel & resonant: frequently
used compound words

Compound word                            Frequency
resonant tunneling                          172
resonant tunneling diodes                    25
resonant tunneling diode                     19
resonant magnetotunneling                    16
resonant tunneling structures                 8
resonant tunneling peak                       8
barrier resonant tunneling structure          6
resonant tunneling structure                  6
resonant tunneling spectroscopy               6
resonant tunneling processes                  4
resonant tunneling system                     4
23
Complexity of science
Lexicomorphological signature: Compound Words
tunneling diode
resonant tunneling diode
unipolar resonant tunneling diode
Same thing?
interband resonant tunneling diode
resonant interband tunneling diode - RITD
delta doped resonant tunneling diode
double-barrier resonant tunneling diode
quantum well resonant tunneling diode
bipolar light-emitting resonant tunneling diode
interband double barrier tunneling diode
24
Complexity of science
Information from journals is passed into patents.
Memory Devices
(9)
Semiconductor Devices
(5)
Tunnel Devices
(3)
Tunnel Diode
Leo Esaki
1980
Tunnel Devices
(1)
Heterojunction Devices
(2)
Semiconductor Devices
(2)
L. L. Chang, L. Esaki, W. E. Howard, R. Ludeke
and N. Schul, MBE in GaAs and AlAs
Journal, J. Vac. Sci. Technol. 10, 655
(1973)
H. Sakaki, L. L. Chang, R. Ludeke, C. A.
Chang, G. A. Sai-Halasz and L. Esaki;
Molecular Beam Epitaxy, Appl. Phys.
Lett. 31, 211 (1977)
C. A. Chang, R. Ludeke, L. Chang, and L.
Esaki, MBE of InGaAs and GaSbAs, Appl.
Phys. Lett. 31, 759 (1977).
25
Complexity of science
Visualising fashions in science and technology:
The movement of iconic terms.
26
The triumvirate of understanding
Knowledge
Intelligence
Cognition
28
The triumvirate of understanding
with apologies to Plato
Knowledge about, knowledge by description:
knowledge of a person, thing, or perception gained
through information or facts about it rather than by
direct experience.
Language; Images
Symbols; Planning;
Learning, Thinking;
Creativity
INTELLIGENCE: An impersonation of
intelligence; an intelligent or
rational being; esp. applied to
one that is or may be
incorporeal; a spirit
COGNITION: The action or
faculty of knowing taken in its
widest sense, including
sensation, perception,
conception, etc., as
distinguished from feeling and
volition.
29
The triumvirate of understanding
with apologies to Aristotle
Knowledge of a person, thing, or other entity (e.g.
sense-datum, universal) by direct experience of it,
as opposed to knowing facts about it. So:
knowledge of, or by, acquaintance.
Language; Images
Symbols; Planning;
Learning, Thinking;
Creativity
INTELLIGENCE: Knowledge
as to events, communicated
by or obtained from
another; information, news,
tidings.
COGNITION: A product
of such an action: a
sensation, perception,
notion, or higher
intuition
30
Dealing with information deluge
• There are over 2,000 news wires produced by Reuters
Financial, together with on-line reports from banks,
brokerage houses and regulatory bodies. Filtering the relevant
from the not-so-relevant is a major problem.
• All major journals in science and technology, together with
pre-prints, textbooks, conference proceedings, technical
reports, research road-maps and (US) patent documents, are
(almost) freely available. Extracting relevant documents
from this intellectual deluge is challenging the limits of
documentation and has a serious impact on innovation and
technology transfer.
33
Dealing with information deluge
• The news report is one of the most commonly occurring
linguistic expressions.
• Despite being a good example of open-world data, a
news report is a contrived artefact:
  • each report has a potentially attention-grabbing
    headline;
  • the opening few sentences generally comprise a
    good summary of the contents of the report;
  • there are slots for the date of origin and slots for
    photographs and other graphic material.
34
Dealing with information deluge
Event
News
Market (Price)
Information
The relationship between Events,
News and Markets (price) through
Information.
35
Dealing with information deluge
[Figure: four index charts, each marked at Sep 11, 2001: Germany DAX (PERF), Nasdaq Composite Index, Japan NIKKEI AVERAGE INDEX (225) and Dow Jones Industrial Average.]
Movement from Feb 2001 to Jan 2002. Note the dip on and around Sep 11th 2001, although all
markets were falling before this.
36
Dealing with information deluge
• Francis Knowles has written about the health
metaphors used in financial news reports:
• markets are full of vigour and are strong, or the
markets are anaemic or are weak (1996);
• most newspapers also use animal metaphors:
there are bull markets and bear markets; the former
refers to expansion, and indirectly to fertility, and the
latter to shy, retiring and grizzly behaviour, much like
that reported about bears in the popular press and in
literature for children.
37
Dealing with information deluge
Mainly Good News Stories
• ‘Naval shipbuilder and military contractor Vosper Thornycroft has
boosted its civil arm by buying facilities manager Merlin
Communications.’ (Nov 14, 2001)
• ‘The FTSE 100 stock index looks set to open stronger today after Wall Street
added to gains seen at the London close and with U.S. stock index futures
boosted by rumours that Osama bin Laden had been captured.’ (Nov 15, 2001)
Rather Bad News Stories
• ‘Heavyweight banking and oil stocks have dropped up the leading share
index as investors bet on fresh interest rate cuts.’ (Nov 21, 2001)
• ‘The European Commission has slashed its official growth forecasts for the euro
zone [..], predicting the most serious slowdown since the 1990s recession,
with lower growth in 2002 than this year.’ (Nov 21, 2001)
38
Dealing with information deluge
We created a corpus of 1,539 English financial texts from one source
(Reuters) on the World Wide Web, published during a 3-month period (Oct
2001-January 2002) and comprising over 310,000 tokens. The corpus comprised
a blend of both short news stories and financial reports. Most of the news is
business news from Britain, with thirty percent of the news from Europe
and the United States.

Week (5-day week)   Good Word Frequency   Bad Word Frequency
1                           58                    40
2                           71                    75
3                           77                    66
4                           73                    59
5                           72                    28
Total                      351                   268

Frequency of good and bad words in Nov 2001. The minimum weekly frequencies are 58 (good)
and 28 (bad); the maximum weekly frequencies are 77 (good) and 75 (bad).
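A weekly count of this kind could be produced along the following lines. This is a minimal sketch under stated assumptions: the slides do not list the actual 'good' and 'bad' words, so the two word lists here are hypothetical, as are the function name `sentiment_counts` and the invented example stories.

```python
from collections import Counter

# Hypothetical word lists: the slides do not give the actual 'good' and 'bad' words.
GOOD_WORDS = {"boosted", "gains", "stronger", "rise"}
BAD_WORDS = {"slashed", "slowdown", "recession", "fall"}


def sentiment_counts(stories):
    """Count good and bad word tokens per week.

    `stories` is an iterable of (week_number, text) pairs.
    Returns {week: Counter({'good': n, 'bad': m})}.
    """
    per_week = {}
    for week, text in stories:
        counts = per_week.setdefault(week, Counter(good=0, bad=0))
        for token in text.lower().split():
            token = token.strip(".,;:'\"()[]")
            if token in GOOD_WORDS:
                counts["good"] += 1
            elif token in BAD_WORDS:
                counts["bad"] += 1
    return per_week


# Invented example stories, for illustration only.
stories = [
    (1, "Shares look set to open stronger after Wall Street added to gains."),
    (1, "Growth forecasts slashed; slowdown feared."),
    (2, "Contractor boosted its civil arm."),
]
weekly = sentiment_counts(stories)
```

Matching on exact word forms keeps the sketch short; a fuller treatment would need to handle inflected forms and negation, which are not addressed here.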
39
Dealing with information deluge
[Figure: ratio (0 to 1.2) of good words and FTSE100 plotted by date (weekdays 1 to 30).]
Market correlation between ‘good’ word frequency and FTSE index.
40
Dealing with information deluge
[Figure: ratio (0 to 1.2) of good words, bad words and FTSE100 plotted by date (weekdays 1 to 30).]
Good and bad word frequency correlated with FTSE 100.
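The slides do not say how the correlation was measured. One common choice, assumed here purely for illustration, is Pearson's correlation coefficient between the daily word-frequency series and the daily index values. The function names and the numbers below are invented and are not the data behind the plots.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def scale_to_max(values):
    """Divide by the maximum so the series peaks at 1 (one way, assumed here,
    to put word counts and index values on a shared 0-to-1 axis)."""
    top = max(values)
    return [v / top for v in values]


# Invented daily values, for illustration only.
good_words = [12, 15, 9, 20, 18]
index_close = [5100.0, 5180.0, 5050.0, 5260.0, 5230.0]

r = pearson(scale_to_max(good_words), scale_to_max(index_close))
```

Scaling each series by its maximum changes only how the curves are drawn together; it leaves the correlation coefficient unchanged.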
41
Dealing with information deluge
Partners
JRC GmbH, Berlin
Finsoft, London
Ibermatica, Madrid
[Diagram labels: Reuters News Feed; FTSE 100 INDEX; SYSTEM QUIRK; Up; Down; Time Series of Up and Down; Generate Signal (Buy / Sell). Two plots show the ratio (0 to 1.2) of good words and FTSE100 by date.]
This work is being carried out under the auspices of the EU-IST sponsored GIDA
project. The project aims to create a novel service type in the financial investment
business. Its novelty lies in the integration of financial analysis with news analysis.
42
Dealing with information deluge
FTSE 100 plotted against ‘bad news’. 20 February 2002 was one of
the lowest days.
The SATISFI system keeps track of news reports with bad (and
good) news.
43
Dealing with information deluge
• SATISFI Sentiment and Time Series: Financial analysis System is
being developed at the University of Surrey for the EU-IST GIDA
Project.
FTSE 100
Good
News
SATISFI is based on our existing text analysis system, System Quirk, together
with programs for time series analysis, text summarisation and organising large
text collections, and programs for creating thesauri and term bases. Systems for
learning the behaviour of the markets are also being developed.
44
Profiting from information
deluge?
See also: http://www.vicefund.com/
45
Dealing with information deluge
We have used a neural computing system that
creates its own categories given a class of
computational objects, say a digitised, computer-understandable
version of a set of news stories: a set of
keywords representing the whole set. Each keyword will
be present in some stories and absent from others.
The system has to be trained on a set of
keywords and creates categories.
Then the system will categorise unseen stories
into the categories it has already created.
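The "computer-understandable version" of a story can be pictured as a vector of keyword flags. The sketch below is an assumption about the representation, not a description of the actual system: the function name `keyword_vector` and the two example stories are invented, while the five keywords are taken from the salient words listed later in the talk.

```python
def keyword_vector(text, keywords):
    """Return a list of 0/1 flags, one per keyword, marking whether it occurs in the text."""
    tokens = {t.strip(".,;:'\"()").lower() for t in text.split()}
    return [1 if k in tokens else 0 for k in keywords]


# Five of the salient words listed in the talk; the stories are invented.
keywords = ["tax", "drug", "fuel", "pollution", "exports"]
stories = [
    "Congress debates a new fuel tax.",
    "Drug enforcement agents seized cocaine in Panama.",
]
vectors = [keyword_vector(story, keywords) for story in stories]
```

Each story becomes a fixed-length vector, which is the kind of input a neural categoriser can be trained on.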
46
Dealing with information deluge
Automatic Categorization of Texts
Based on Keywords Using a Neural
Computing System
Our text corpus consisted of 100 Associated
Press (AP) news wires selected from 10 pre-classified
news categories, shown together
with their icons. The average length of the
articles was 622 words.
47
Dealing with information deluge
Text Categories
1  Bioconversion          6  Exportation of Industry
2  Pollution Recovery     7  Foreign Trade
3  Alternative Fuels      8  Int. Drug Enforcement
4  Fossil Fuels           9  Foreign Car Makers
5  Rain Forests          10  Worldwide Tax Sources
These text categories were used in the TIPSTER – SUMMARY program, but were
not known to our system.
48
Dealing with information deluge
 1  percent         15  mexico       29  mazda        43  enforcement
 2  tax             16  emissions    30  gases        44  warming
 3  billion         17  drugs        31  shale        45  smog
 4  drug            18  fuels        32  deficit      46  ozone
 5  reagan          19  senate       33  export       47  massachusetts
 6  cars            20  auto         34  recycling    48  imports
 7  taxes           21  proposal     35  epa          49  automobile
 8  environmental   22  gasoline     36  honda        50  trafficking
 9  pollution       23  exports      37  methanol
10  fuel            24  vehicles     38  automakers
11  federal         25  ohio         39  panama
12  dukakis         26  greenhouse   40  corp
13  bush            27  dioxide      41  forests
14  congress        28  marine       42  cocaine
Salient single words identified automatically by System Quirk
49
Dealing with information deluge
Results of a Full Text Map trained using exponentially decreased neighbourhood and
learning rate.
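Training with an exponentially decreasing neighbourhood and learning rate can be sketched with a small self-organising map. This is an illustrative toy, not the system used for the results above: the one-dimensional layout, the Gaussian neighbourhood, the parameter values, the function names and the two input vectors are all assumptions.

```python
import math
import random


def best_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda j: sum((wj - xj) ** 2 for wj, xj in zip(weights[j], x)))


def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a one-dimensional self-organising map.

    The learning rate and the neighbourhood radius both shrink exponentially
    with the epoch number t: value(t) = value0 * exp(-t / tau).
    radius0 must be greater than 0.1 for tau to be positive.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    # Choose tau so the radius has shrunk to roughly 0.1 by the last epoch.
    tau = epochs / math.log(radius0 / 0.1)
    for t in range(epochs):
        lr = lr0 * math.exp(-t / tau)
        radius = radius0 * math.exp(-t / tau)
        for x in data:
            winner = best_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood centred on the winning unit.
                h = math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Two invented keyword-presence vectors standing in for two kinds of story.
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
weights = train_som(data)
unit_a = best_unit(weights, data[0])
unit_b = best_unit(weights, data[1])
```

Early in training the wide neighbourhood moves all units together; as the radius and learning rate decay, individual units settle on different inputs, so the two example vectors end up on different units of the map.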
51
Dealing with information deluge
TEXT SUMMARISATION: Surrey’s Program Telepattern
30 full text documents and corresponding summaries were given to 4
assessors to decide whether the summary was acceptable. Results
per participant below.
Evaluation of summary accuracy of 30 texts by 4 defence intelligence assessors

Participant                    YES     NO    “Yes” %
British Telecom                 85     34      71
Univ. of Surrey                 72     48      60
IBM                             71     49      59
SRA                             67     52      56
Centre for InfoRes (Russia)     61     59      51
New Mexico SU                   54     66      45
Univ. of Pennsylvania           51     69      42
National Taiwan Univ.           50     70      42
CGI/Carnegie-Mellon Uni.        39     80      32
Lexis-Nexis                     35     85      29
GE                              31     88      26
Cornell/SabIR                   25     95      14
Intelligent Algorithms          23     96      14
USCalifornia-ISI                14    106      11
Total                          678    997      40

THE PROGRAMS WERE EVALUATED by the US DoD’s TREC AND TIPSTER Programmes
The Missing Link: Images and Text
The administration of justice requires systematic prosecution of the
perpetrators of crime.
One key element in this system is the collection, analysis and
dissemination of information collected safely and securely from the scene
where the crime was committed.
The information comprises images of the scene, the descriptions and
interpretations of these images.
• In a murder case there may be over 2000 scene-of-crime images and the
case can take up to two years to come to court. It is important for these
images to be indexed appropriately and retrieved efficiently.
Scene-of-crime officers (SoCOs) play a key role in the collection
of this vital multi-modal information; they describe the image and
the context in which the images were collected. The police
officers involved in the administration of justice provide the
interpretation.
54
The Missing Link: Images and Text
Collateral texts are written texts or speech (fragments)
closely or loosely related to an image or objects within the
image.
CLOSELY COLLATERAL TEXTS: crime scene report, caption
BROADLY COLLATERAL TEXTS: newspaper article, dictionary definition
The collateral texts are special language texts and comprise
keywords that may help in indexing and retrieving the
images.
55
The Missing Link: Images and Text
The EPSRC-sponsored SoCIS project, involving Universities of
Surrey and Sheffield, is developing methods and techniques for
automatically indexing images with the descriptions provided by
Scene of Crime Officers.
Typical Scene of Crime Images
• Fingerprints showing ridges
• Body on floor showing adjacent table
• 9 mm Browning high power pistol
• Footwear impression in blood
The SoCIS project is investigating how the results of the
project can be generalised such that the methods and
techniques can be applied to an arbitrary domain.
56
The Missing Link: Images and Text
What do SOCOs do now? Forms, forms and more forms
57
The Missing Link: Images and Text
The SoCIS project is developing methods and techniques for automatically
indexing images taken at a crime scene with the descriptions provided by scene of
crime officers.
Five UK Police Forces are working closely with our project: They provide
knowledge of their subject domain, test our system and advise us generally.
• Surrey Police
• South Yorkshire Police
• Metropolitan Police
• Hampshire Constabulary
• Kent Constabulary
58
The Missing Link: Images and Text
The SoCIS project is developing methods and techniques for automatically
indexing images with the descriptions provided by scene of crime officers.
[Screenshot labels: Shape Buttons; Edit Button; Save Button; Select Button; Show All Hotspots Button; Delete Button]
59
DESCRIBING IMAGES – THE LINK BETWEEN
IMAGES AND TEXT, THE MISSING LINK?
SOCIS: A prototype image and text storage and retrieval
system.
• Automatic Labelling (or INDEXING) of images by keywords in the
descriptions provided by the SOCOs.
• Automatic Extraction of terms and their relationship to other terms
(ontology) from the descriptions and other texts.
EVIDENCE
TRACE EVIDENCE
BLOOD
INORGANIC
FIBRE
FIBRE
MANUFACTURED
POLYMERIC FIBRE
DNA
DYE FIBRE
The above hierarchy tree is based on our 0.7 million word forensic
science text corpus
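Labelling images by the keywords in their descriptions amounts to building an inverted index from words to images. The following is a minimal sketch under stated assumptions, not the SoCIS implementation: the stop-word list, the image identifiers and the function names are hypothetical, and the descriptions are modelled on captions shown elsewhere in the talk.

```python
from collections import defaultdict

# Hypothetical stop-word list: words too common to be useful as index terms.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "showing"}


def build_index(descriptions):
    """Map each keyword to the set of image ids whose description contains it."""
    index = defaultdict(set)
    for image_id, text in descriptions.items():
        for token in text.lower().split():
            token = token.strip(".,;:")
            if token and token not in STOP_WORDS:
                index[token].add(image_id)
    return index


def search(index, query):
    """Return the ids of images whose descriptions contain every query word."""
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical image ids; descriptions modelled on captions in the talk.
descriptions = {
    "img1": "Footwear impression in blood",
    "img2": "Body on floor showing adjacent table",
    "img3": "Red and silver knife handle on alleyway floor",
}
index = build_index(descriptions)
floor_images = search(index, "floor")
knife_floor_images = search(index, "knife floor")
```

A term hierarchy such as the one above could then widen a query, for example by also returning images indexed under narrower terms; that step is not shown here.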
60
DESCRIBING IMAGES – THE LINK BETWEEN
IMAGES AND TEXT, THE MISSING LINK?
SANNC: A neural computing system that learns how to
relate textual descriptions with images.
• Automatic Clustering of similar images in an image collection.
• Automatic Identification of the position of objects in an image.
SELF ORGANISING MAP
IMAGE
TEXT
HEBBIAN
NETWORK
Nine millimetre
browning high
power self-loaded
pistol
Nine millimetre
browning high
power self-loaded
pistol
SELF ORGANISING MAP
61
DESCRIBING IMAGES – THE LINK BETWEEN
IMAGES AND TEXT, THE MISSING LINK?
Indexer Variability: Given the image descriptions are in
free text, perhaps each SOCO gives a different
description of the image?
Close up view
of exhibit ABC/3 red and silver knife
handle on alleyway floor adjacent to
building and metal gate.
[SOCO 1 – spontaneous free text:]
IDENTIFICATION
LOCATION
ELABORATION
[1] Close up view of exhibit ABC/3 [.]
[2] Red and silver knife handle.
On alleyway
floor
Adjacent to building
and metal gate
62
DESCRIBING IMAGES – THE LINK BETWEEN
IMAGES AND TEXT, THE MISSING LINK?
Indexer Variability: Given the image descriptions are in free text,
perhaps each SOCO gives a different description of the image?
Not really: there are three ‘structures’ – identification, location and
elaboration. The linguistic description shows little or no variation.
Research continues.
SOCO 5
Close up item 3.
SOCO 7
Close up of item 3 -
SOCO 1
Close up of knife.
SOCO 8
Close up view item 3 -
SOCO 2
Close up view of ex 3
SOCO 4
Close up view of exhibit 3
SOCO 3
Close up view of exhibit ABC/3
SOCO 6
Close view of marker 3
63
Variation amongst SOCOs?
Indexer Variability: Given the image descriptions are in free text,
perhaps each SOCO gives a different description of the image?
Not really: there are three ‘structures’ – identification, location and
elaboration. The linguistic description shows little or no variation.
Research continues.
SOCO 2
a red handled lock knife
SOCO 6
against red handled knife.
SOCO 5
Knife handle.
SOCO 3
red and silver knife handle
SOCO 4
red handled flick knife
SOCO 8
red handled flick knife.
SOCO 7
red penknife.
SOCO 1
Red sides. Metal ends.
64
Need for/of the Grid
• Data Grids: management of large volumes of text, images,
financial data, ...
• Computational Grids: processing of large volumes of such data
• Collaborative Grids: activities in research, e.g. virtual crime
investigation
66
Need for/of the Grid
 Coordinating data sets based on common sets of
metadata: need for standards beyond those for
architecture of the Grid (OGSA)
 Grid-enabling System Quirk would enable
processing of large volumes of distributed data
 Grids provide the infrastructure for development
of generic computing applications capable of
dealing with and combining results of analysis of
various types of data
67
Afterword: The Department of Computing
A research-active Department
• Software Engineering
• Theoretical Computing
• Knowledge Management
• Neural Computing
• Information Extraction and Multi-media Group
• Applied to EPSRC to be involved with e-Science
Programme
• Looking to develop industrial collaborations for ALL
research activities
69
Afterword: The Department of Computing
A Department that has or is looking forward to active collaboration within
the University:
•Computer Vision (CVSSP – the new JIF Lab)
•Satellite Engineering (SSTL – Best Practice)
•Linguistics & Dance
A Department that is looking forward to active collaboration outside the
University with:
the Universities of Sheffield and Southampton, Metropolitan Police College, Queen Mary London
A Department that is looking forward to exploiting its software systems,
especially financial prediction systems and language engineering systems.
70
www.computing.surrey.ac.uk