Generating Impact-Based
Summaries for Scientific
Literature
Qiaozhu Mei, ChengXiang Zhai
University of Illinois at Urbana-Champaign
Motivation
• Fast growth of publications
– >100k papers in DBLP; >10 references per paper
• Summarizing a scientific paper
– Author’s view: abstracts, introductions
• May not be what readers actually perceive
• May change over time
– Reader’s view: the impact of the paper
• Impact Factor: a single number
• Can we summarize the content of the impact?
[Figure: the author’s view of a paper (“proof of xxx; new definition of xxx; apply xxx technique”) can differ from the reader’s view 20 years later (“state-of-the-art algorithm; evaluation metric”).]
What should an impact summary look like?
• Citation contexts → impact, but…
• A citation context describes how other authors view/comment on the paper:
“… They have been also successfully used in part of speech tagging [7], machine translation [3, 5], information retrieval [4, 20], transliteration [13] and text summarization [14]. ... For example, Ponte and Croft [20] adopt a language modeling approach to information retrieval. …”
– Implies the impact
• Similar to anchor text on the web graph, but:
– Usually more than one sentence (informative)
– Usually mixed with discussion/comparison of other papers (noisy)
Our Definition of Impact Summary
• Target: an extractive summary (picked sentences) of the impact of a paper
• Author-picked sentences (abstract, introduction, content): good for a summary, but do not reflect the impact
• Reader-composed sentences (citation contexts): a good signal of impact, but too noisy to be used as the summary
– e.g., “… Ponte and Croft [20] adopt a language modeling approach to information retrieval. …”
– “… probabilistic models, as well as to the use of other recent models [19, 21], the statistical properties …”
• Solution: citation contexts → infer the impact; original content → the summary
Rest of this Talk
• A feasibility study
• A language-modeling-based approach
– Sentence retrieval
– Estimation of impact language models
• Experiments
• Conclusion
Language Modeling in Information Retrieval
[Diagram: each document d1 … dN is represented by a document LM θd1 … θdN, smoothed with the collection LM θC; the query q is represented by a query LM θq; documents are ranked by negative KL divergence, −D(θq || θdi).]
Impact-based Summarization as Sentence Retrieval
[Diagram: each sentence s1 … sN of document D is represented by a sentence LM θs1 … θsN, smoothed with the document LM θD; sentences are ranked against the impact LM θI by negative KL divergence, −D(θI || θsi); the top-ranked sentences form the summary. θI is estimated from D and its citation contexts c1 … cM.]
• Key problem: estimate θI
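The ranking above can be sketched in a few lines. Since −D(θI || θs) equals the cross-entropy term Σw p(w|θI)·log p(w|θs) plus a sentence-independent constant, ranking by that term alone is equivalent. The function names, the Dirichlet prior μ, and the floor for unseen words are illustrative choices, not values from the slides.

```python
import math
from collections import Counter

def sentence_lm(sentence, collection_lm, mu=100.0):
    """Dirichlet-smoothed unigram LM for one sentence (mu is illustrative)."""
    counts = Counter(sentence.split())
    length = sum(counts.values())
    return lambda w: (counts[w] + mu * collection_lm.get(w, 1e-9)) / (length + mu)

def kl_score(impact_lm, sent_lm):
    """Score = sum_w p(w|theta_I) * log p(w|theta_s).
    This is -D(theta_I || theta_s) up to a constant, so it ranks identically."""
    return sum(p * math.log(sent_lm(w)) for w, p in impact_lm.items())

def summarize(sentences, impact_lm, collection_lm, k=3):
    """Pick the k sentences closest to the impact language model."""
    scored = [(kl_score(impact_lm, sentence_lm(s, collection_lm)), s)
              for s in sentences]
    return [s for _, s in sorted(scored, reverse=True)[:k]]
```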
Estimating Impact Language Models
• Interpolation of the document language model and the citation language model
• Citation LM: a weighted mixture of the context LMs θC1 … θCM:
p(w|θC) = (1 / Σj λj) · Σj λj · p(w|θCj)
• Constant coefficient:
p(w|θI) = (1 − λ) · p(w|θD) + λ · p(w|θC)
• Dirichlet smoothing:
p(w|θI) = (c(w,d) + μ · p(w|θC)) / (|d| + μ)
• Set λj with features of cj:
λj = f1(cj) · f2(cj) · f3(cj) · …
where f1(cj) = |cj|, and…
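A minimal sketch of the estimation step, assuming whitespace-tokenized text. The mixture weights λj are passed in precomputed (e.g., from the features below), and the prior μ is an illustrative value, not one from the slides.

```python
from collections import Counter

def citation_lm(contexts, weights):
    """Weighted mixture of citation-context LMs:
    p(w|theta_C) = (1/sum_j lambda_j) * sum_j lambda_j * p(w|theta_Cj)."""
    mix = Counter()
    total = sum(weights)
    for ctx, lam in zip(contexts, weights):
        counts = Counter(ctx.split())
        n = sum(counts.values())
        for w, c in counts.items():
            mix[w] += lam * c / n
    return {w: v / total for w, v in mix.items()}

def impact_lm(doc, contexts, weights, mu=1000.0):
    """Dirichlet-style interpolation with the citation LM:
    p(w|theta_I) = (c(w,d) + mu * p(w|theta_C)) / (|d| + mu)."""
    c_lm = citation_lm(contexts, weights)
    counts = Counter(doc.split())
    length = sum(counts.values())
    vocab = set(counts) | set(c_lm)
    return {w: (counts[w] + mu * c_lm.get(w, 0.0)) / (length + mu)
            for w in vocab}
```

Because the citation mixture is normalized, the resulting impact LM is a proper distribution.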
Specific Feature – Citation-based Authority
• Assumption: a high-authority paper has more trustworthy comments (citation contexts), so its context should weigh more in the impact language model
• Authority → PageRank on the citation graph:
Pg(d) = (1 − α)/N + α · Σ_{d': d' → d} Pg(d') / outDeg(d')
• f2(cj) = Pg(d | cj ∈ d), i.e., the PageRank of the citing paper d that contains context cj
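The authority feature can be computed by plain power iteration over the citation graph. The damping value and iteration count below are conventional choices; the treatment of dangling papers (mass spread uniformly) is a common convention the slides do not specify.

```python
def pagerank(out_links, alpha=0.85, iters=50):
    """Power iteration for Pg(d) = (1-alpha)/N + alpha * sum_{d'->d} Pg(d')/outDeg(d').
    out_links maps each paper to the papers it cites."""
    nodes = set(out_links) | {t for outs in out_links.values() for t in outs}
    n = len(nodes)
    links = {d: out_links.get(d, []) for d in nodes}
    pr = {d: 1.0 / n for d in nodes}
    for _ in range(iters):
        nxt = {d: (1.0 - alpha) / n for d in nodes}
        for d in nodes:
            outs = links[d]
            if outs:
                for t in outs:
                    nxt[t] += alpha * pr[d] / len(outs)
            else:  # dangling node: spread its mass uniformly
                for t in nodes:
                    nxt[t] += alpha * pr[d] / n
        pr = nxt
    return pr
```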
Specific Feature – Citation Context Proximity
… There has been a lot of effort in applying the notion of language modeling and its variations to other problems. For example, Ponte and Croft [20] adopt a language modeling approach to information retrieval. They argue that much of the difficulty for IR lies in the lack of an adequate indexing model. Instead of making prior parametric assumptions about the similarity of documents, they propose a non-parametric approach to retrieval based probabilistic language modeling. Empirically, their approach significantly outperforms traditional tf*idf weighting on two different collections and query sets. …
• Weight citation sentences according to their proximity to the citation label
• f3(cj) = Pr(cj) ∝ 1/α^k
• k = distance to the citation label
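The proximity feature is a geometric decay in the sentence distance k from the citation label. A minimal sketch, assuming sentence-level distances; the decay base α below is illustrative, not a value from the slides.

```python
def proximity_weight(k, alpha=2.0):
    """f3(cj) proportional to 1/alpha**k, where k is the distance (in
    sentences) from the citation label; alpha is an illustrative base."""
    return 1.0 / (alpha ** k)

def weight_context(sentences, label_index, alpha=2.0):
    """Attach a proximity weight to each sentence of a citation context,
    given the index of the sentence holding the citation label."""
    return [(s, proximity_weight(abs(i - label_index), alpha))
            for i, s in enumerate(sentences)]
```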
Experiments
• Gold standard:
– Human-generated summaries
– 14 most-cited papers in SIGIR
• Baselines:
– Random; LEAD (likely to cover abstract/introduction)
– MEAD – single document
– MEAD – document + citations (multi-document)
• Evaluation metrics:
– ROUGE-1, ROUGE-L
(unigram co-occurrence; longest common subsequence)
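For intuition, ROUGE-1 recall reduces to clipped unigram overlap against the reference summary. This is only a sketch: the real ROUGE toolkit adds stemming, stopword options, and multi-reference handling.

```python
from collections import Counter

def rouge_1_recall(candidate, reference):
    """Clipped unigram overlap divided by reference length."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    return overlap / max(1, sum(ref.values()))
```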
Basic Results

Length  Metric  Random  LEAD   MEAD-Doc  MEAD-Doc+Cite  LM (KL-Div)
3       R-1     0.163   0.167  0.301     0.248          0.323 (+7.3%)
3       R-L     0.144   0.158  0.265     0.217          0.299 (+12.8%)
5       R-1     0.230   0.301  0.401     0.333          0.467 (+16.5%)
5       R-L     0.214   0.292  0.362     0.298          0.444 (+22.7%)
10      R-1     0.430   0.514  0.575     0.472          0.649 (+12.9%)
10      R-L     0.396   0.494  0.535     0.428          0.622 (+16.2%)
15      R-1     0.538   0.610  0.685     0.552          0.730 (+6.6%)
15      R-L     0.499   0.586  0.650     0.503          0.705 (+8.5%)
Component Study
• Impact language model:

Metric   Impact LM = Doc LM  Impact LM = Citation LM  Interp. (ConstCoef)  Interp. (Dirichlet)
ROUGE-1  0.529               0.635                    0.643                0.647
ROUGE-L  0.501               0.607                    0.619                0.623

– Document LM << citation-context LM << interpolation (Doc LM, Cite LM)
– Dirichlet interpolation > constant coefficient
Component Study (Cont.)
• Authority and proximity
– Both PageRank and proximity improve results
– PageRank + proximity improves only marginally
– Q: How to combine PageRank and proximity?

                 PageRank = Off  PageRank = On
Proximity = Off  0.685           0.711
Pr(s) = 1/α^k    0.708           0.712
Non-impact-based Summary
Paper = “A study of smoothing methods for language models applied to ad hoc information retrieval”
1. Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition.
2. The basic idea of these approaches is to estimate a language model for each document, and then rank documents by the likelihood of the query according to the estimated language model.
3. On the one hand, theoretical studies of an underlying model have been developed; this direction is, for example, represented by the various kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]).
→ A good big picture of the field (LM for IR), but not about the contribution of the paper (smoothing in LM-based IR)
Impact-based Summary
Paper = “A study of smoothing methods for language models applied to ad hoc information retrieval”
1. Figure 5: Interpolation versus backoff for Jelinek-Mercer (top), Dirichlet smoothing (middle), and absolute discounting (bottom).
2. Second, one can de-couple the two different roles of smoothing by adopting a two-stage smoothing strategy in which Dirichlet smoothing is first applied to implement the estimation role and Jelinek-Mercer smoothing is then applied to implement the role of query modeling.
3. We find that the backoff performance is more sensitive to the smoothing parameter than that of interpolation, especially in Jelinek-Mercer and Dirichlet prior.
→ Specific to smoothing LMs in IR, especially the concrete smoothing techniques (Dirichlet and Jelinek-Mercer)
Related Work
• Text summarization (extractive)
– E.g., Luhn ’58; McKeown and Radev ’95; Goldstein et al. ’99; Kraaij et
al. ’01 (using language modeling)
• Technical paper summarization
– Paice and Jones ’93; Saggion and Lapalme ’02; Teufel and Moens ’02
• Citation context
– Ritchie et al. ’06; Schwartz et al. ’07
• Anchor text and hyperlink structure
• Language Modeling for information retrieval
– Ponte and Croft ’98; Zhai and Lafferty ’01; Lafferty and Zhai ’01
Conclusion
• Novel problem of impact-based summarization
• A language modeling approach
– Citation contexts → impact language model
– Accommodating authority and proximity features
• A feasibility study rather than an optimized system
• Future work
– Optimize features/methods
– Large-scale evaluation
Thanks!
Feature Study
• What we have explored:
– Unigram language models (document; citation contexts)
– Length features
– Authority features
– Proximity features
– Position-based re-ranking
• What we haven’t done:
– Redundancy removal (diversity)
– Deeper NLP features; n-gram features
– Learning to weight features
Scientific Literature with Citations
[Figure: papers linked by citations; each citing paper contributes a citation context, e.g.:
“… While the statistical properties of text corpora are fundamental to the use of probabilistic models, as well as to the use of other recent models [19, 21], the statistical properties …”
“… They have been also successfully used in part of speech tagging [7], machine translation [3, 5], information retrieval [4, 20], transliteration [13] and text summarization [14]. ... For example, Ponte and Croft [20] adopt a language modeling approach to information retrieval. …”]
Language Modeling in Information Retrieval
• Estimate document language models
– Unigram multinomial distribution over words
– θd: {p(w|d)}
• Rank documents by query likelihood
– R(doc, Q) ~ p(q|d), a special case of
– negative KL divergence: R(doc, Q) ~ −D(θq || θd)
• Smooth the document language model
– Interpolation-based: p(w|d) ≈ (1 − λ)·pML(w|d) + λ·p(w|REF)
– Dirichlet smoothing empirically performs well