On the Parallel Complexity of
Hierarchical Clustering and
CC-Complete Problems
Dr. Raymond Greenlaw
School of Computing
Armstrong Atlantic State University
and
Dr. Sanpawat Kantabutra
Department of Computer Science
Chiang Mai University
Outline
•
•
•
•
•
•
•
•
Introduction
Preliminaries
Algorithms for Hierarchical Clustering
Complexity of Hierarchical Clustering
CC-Complete Problems
Conclusions and Open Problems
References
Acknowledgments
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 2
Outline
•
•
•
•
•
•
•
•
Introduction
Preliminaries
Algorithms for Hierarchical Clustering
Complexity of Hierarchical Clustering
CC-Complete Problems
Conclusions and Open Problems
References
Acknowledgments
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 3
Introduction
• Clustering is a division of data into groups of
‘similar’ objects, where each group is given
a more-compact representation.
• Used to model very large data sets.
• Points are more similar to their own cluster
than to points in other clusters.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 4
Introduction
• Useful tool in data mining, where immense
data sets which are difficult to store and to
manipulate are involved.
• Study the parallel complexity of the
hierarchical clustering problem.
• Builds a tree of clusters.
• Sibling clusters in this tree partition the
points associated with their parent.
• Can explore data using various levels of
granularity.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 5
Introduction
• Two widely studied models
– Bottom-Up Starts with single-point clusters and
then recursively merges two or more of the
most-‘appropriate’ clusters.
– Top-Down Starts with one large cluster
consisting of all the data points and then
recursively splits the most-‘appropriate’ cluster.
• In both methods, the process continues until
a desired stopping condition is met such as a
required number of clusters or a diameter
bound of the ‘largest’ cluster.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 6
Introduction
• A variety of sequential versions of
hierarchical-clustering methods have been
studied:
– Cure Guha, et al.: Bottom-Up, good for clusters
having arbitrary shapes or outliers
– Chameleon Karypis et al.: Bottom-Up, relies
heavily on graph partitioning
– Principal Direction Divisive Partitioning Boley:
Top-Down, good for document collections
– Hierarchical Divisive Bisecting k-means
Steinbach: Top-Down
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 7
Introduction
• Address the parallel complexity of
hierarchical clustering.
• Describe known sequential algorithms for
top-down and bottom-up hierarchical
clustering.
• Parallelize top-down, when n points are to
be clustered, provide an O(log n)-time,
n2-processor CREW-PRAM algorithm that
computes the same output as the
corresponding sequential algorithm.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 8
Introduction
• Define a natural decision problem based on
bottom-up hierarchical clustering and add
this Hierarchical Clustering Problem (HCP) to
the list of CC-complete problems, adding a
data mining problem for the first time.
• Show that HCP is one of the computationally
most-difficult problems in the Comparator
Circuit Value Problem (CCVP) class.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 9
Introduction
• Demonstrate that the HCP is very unlikely to
have an NC algorithm.
• In sharp contrast, give an NC algorithm for
the top-down sequential approach.
• Parallel complexities of top-down and
bottom-up are different, unless CC equals
NC.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 10
Outline
•
•
•
•
•
•
•
•
Introduction
Preliminaries
Algorithms for Hierarchical Clustering
Complexity of Hierarchical Clustering
CC-Complete Problems
Conclusions and Open Problems
References
Acknowledgments
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 11
Preliminaries
• Interested in relating the complexity of
hierarchical clustering to that of a problem
involving Boolean circuits containing
comparator gates.
• Comparator gates have two output wires,
the first outputting the minimum and the
second outputting the maximum of its two
inputs.
• Each output has a maximum fanout of one.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 12
Preliminaries
• Based on the comparator gate
• Basis for an entire complexity class
Comparator Circuit Value Problem (CCVP)
• Given: An encoding  of a Boolean circuit 
composed of comparator gates, inputs
x1,…,xn, and a designated output y.
• Problem: Is output y of  TRUE on input
x1,…,xn?
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 13
Preliminaries
• Let P denote the class of all languages decidable in
polynomial time.
• Let NC denote the class of all languages decidable
in poly-logarithmic time using a polynomial number
of processors on a PRAM.
• Let RNC denote the randomized version of NC.
• Let NLOG denote the class non-deterministic
logarithmic space.
• Let CC denote the class of problems that are NC
many-one reducible to CCVP.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 14
Outline
•
•
•
•
•
•
•
•
Introduction
Preliminaries
Algorithms for Hierarchical Clustering
Complexity of Hierarchical Clustering
CC-Complete Problems
Conclusions and Open Problems
References
Acknowledgments
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 15
Algorithms for Hierarchical Clustering
Sequential Algorithms – Bottom-Up
• Input: set of points, distance function,
bound B, and desired number of clusters, k
• Output: set of clusters
• Pair up all points starting with the two
closest ones, then the next remaining two
closest ones, and so on, until all are paired.
• Next, the sets of points X and Y minimizing
dmin(X,Y) over all remaining sets are merged,
until only k sets remain.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 16
Algorithms for Hierarchical Clustering
Sequential Algorithms – Bottom-Up (cont.)
• Assumed that the number of input points is
even.
• There are no restrictions placed on the
distance function.
• In the first phase of the algorithm points are
clustered whose distance is less than or
equal to B.
• Operates in polynomial time.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 17
Algorithms for Hierarchical Clustering
Sequential Algorithms – Top-Down
• Function v(G) takes a graph as its argument
and returns a set that consists of the vertices
from G.
• Input: set of points, a distance function,
and the desired number of clusters k
• Output: set of clusters
• All points start in the same cluster.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 18
Algorithms for Hierarchical Clustering
• Compute a minimum-cost spanning tree.
• Form clusters by repeatedly removing the
highest-cost edge from what remains of a
minimum-cost spanning tree of the graph
corresponding to the initial set of points with
respect to the distance function, until
exactly k sets have been formed.
• Runs in polynomial time.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 19
Algorithms for Hierarchical Clustering
• Top-Down and Bottom-Up have different
parallel complexities, unless CC equals NC.
• Prove that the exact same clusters as
produced by the Sequential (Top-Down)
Hierarchical Clustering Algorithm can be
computed in NC.
• A natural decision problem based on the
Sequential (Bottom-Up) Hierarchical
Clustering Algorithm is CC-complete.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 20
Algorithms for Hierarchical Clustering
• Since a CC-complete problem is very unlikely
to have an NC algorithm and a problem with
an NC algorithm is very unlikely to be
CC-complete, the parallel complexities of
these two sequential algorithms are
different.
• For a fast parallel algorithm for hierarchical
clustering, the algorithm should be based on
the Top-Down approach.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 21
Algorithms for Hierarchical Clustering
• Theorem: Let n denote the number of points
to be clustered. The Parallel (Top-Down)
Hierarchical Clustering Algorithm can be
implemented in O(log n) time using
n2
processors on the CREW PRAM.
• This algorithm is an NC algorithm, which
means that the clusters can be computed
very fast in parallel.
• Any reasonable decision problem based on
this algorithm will be in NC.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 22
Outline
•
•
•
•
•
•
•
•
Introduction
Preliminaries
Algorithms for Hierarchical Clustering
Complexity of Hierarchical Clustering
CC-Complete Problems
Conclusions and Open Problems
References
Acknowledgments
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 23
Complexity of Hierarchical Clustering
Hierarchical Clustering Problem (HCP)
• Given: A set S of n points in Rd, a distance
function dS : S x S  N, the number of
clusters k ≤ n/2  N, a distance bound B,
and two points x, y  S.
• Problem: Are x and y with dS(x, y) ≤ B in the
same cluster C after the first-phase of the
Sequential (Bottom-Up) Hierarchical
Clustering Algorithm?
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 24
Complexity of Hierarchical Clustering
• No restrictions placed on the properties the
distance function must satisfy, the distances
themselves must be natural numbers.
• This version of the problem easily reduces to
the version where the weights come from R+.
• Not concerned with the distance between a
point and itself, the k is the number of
clusters to be formed.
• x and y are required to be no further apart
than the distance bound B.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 25
Complexity of Hierarchical Clustering
Lexicographically First Maximal Matching
Problem (LFMMP)
• Given: An undirected graph G = (V, E) with
an ordering on its edges plus a distinguished
edge e  E.
• Problem: Is e in the lexicographically first
maximal matching of G?
• A matching is maximal if it cannot be
extended.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 26
Complexity of Hierarchical Clustering
• LFMMP is CC-complete [Cook 1982, Mayr and
Subramanian 1992].
• Theorem: The Hierarchical Clustering
Problem is NC many-one reducible to the
Lexicographically First Maximal Matching
Problem, that is, HCP ≤ NC
m LFMMP.
• HCP is in CC.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 27
Complexity of Hierarchical Clustering
• Theorem: The Lexicographically First
Maximal Matching Problem is NC many-one
reducible to the Hierarchical Clustering
Problem, that is, LFMMP ≤ NC
m HCP.
• Proof Sketch: Let G = (V = {1,…,n},E),
ø : E  {1,…,|E|} be an ordering on E, and
e = {u,v}  E be an instance of the LFMMP.
Construct instance of HCP, a set S of n points
p1,…,pn in Rd, a distance function
dS : S x S  N, clusters k ≤ n/2 N, bound B,
and x,y  S.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 28
Complexity of Hierarchical Clustering
• Proof (cont.): Let S = {1,…,n,n+1,…,2n}. Let
V’ = S – V. Define the distance function
between each pair of points in S as follows:
dS(x,y) =

ø({x,y})
if {x,y}
E
=
2|E|
if x  V and y  V’ or vice versa
=
3|E|
if x  V’, y
=
4|E|
if x ≤ n, y ≤ n, x ≠ y, and
{x,y}  E
 V’, and x ≠ y
• Let B = |E|, k = n, and take u and v as our
points
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 29
Complexity of Hierarchical Clustering
• Theorem: The Hierarchical Clustering
Problem is CC-complete.
• This expands the list of CC-complete
problems and adds the first clustering/data
mining problem to the class.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 30
Outline
•
•
•
•
•
•
•
•
Introduction
Preliminaries
Algorithms for Hierarchical Clustering
Complexity of Hierarchical Clustering
CC-Complete Problems
Conclusions and Open Problems
References
Acknowledgments
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 31
CC-Complete Problems
Comparator Circuit Value Problem (CCVP)
• Given: An encoding  of a Boolean circuit 
composed of comparator gates, inputs x1,…,xn, and
a designated output y.
• Problem: Is output y of  TRUE on input x1,…,xn?
• References: [Cook 1982, Mayr and Subramanian
1992]
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 32
CC-Complete Problems
Lexicographically First Maximal Matching Problem
(LFMMP)
• Given: An undirected graph G = (V, E) with an
ordering on its edges plus a distinguished edge
e  E.
• Problem: Is e in the lexicographically first maximal
matching of G?
• References: [Cook 1982, Mayr and Subramanian
1992]
• Remarks: Resembles the Lexicographically First
Maximal Independent Set Problem which is Pcomplete.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 33
CC-Complete Problems
Stable Marriage Problem (SMP)
• Given: A set of n men and a set of n women.
For each person a ranking of the opposite
sex according to their preference for a
marriage partner.
• Problem: Does the given instance of the
problem have a set of marriages that is
stable? The set is stable if there is no
unmatched pair {m, w} such that both m and
w prefer each other to their current
partners.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 34
CC-Complete Problems
Stable Marriage Problem (SMP)
• References: [Mayr and Subramanian 1992,
Subramanian 1989]
• Remarks: If the preference lists are
complete, there is always a solution.
Several variations of the SMP are also known
to be equivalent to the CCVP. The MaleOptimal Stable Marriage Problem finds a
matching in which no man could do any
better in a stable marriage.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 35
CC-Complete Problems
Stable Marriage Stable Pair Problem (SMSPP)
• Given: A set of n men and n women, for each
person a ranking of the opposite sex according to
their preference for a marriage partner, and a
designated couple Alice and Bob.
• Problem: Are Alice and Bob a stable pair for the
given instance of the problem? That is, is it the
case that Alice and Bob are married to each other
in some stable marriage?
• References: [Mayr and Subramanian 1992,
Subramanian 1989]
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 36
CC-Complete Problems
Stable Marriage Minimum Regret Problem
(SMMRP)
• Given: A set of n men and n women, for
each person a ranking of the opposite sex
according to their preference for a marriage
partner, and a natural number k, 1 ≤ k ≤ n.
• Problem: Is there a stable marriage in which
every person has regret at most k? The
regret of a person in a stable marriage is the
position of her mate on her preference list.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 37
CC-Complete Problems
Stable Marriage Minimum Regret Problem
(SMMRP)
• References: [Mayr and Subramanian 1992,
Subramanian 1989]
• Remarks: The goal in this problem is to
minimize the maximum regret of any person.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 38
CC-Complete Problems
Telephone Connection Problem (TCP)
• Given: A telephone line with a fixed channel
capacity k, a natural number l, and a
sequence of calls (s1, f1),…, (sn, fn), where si
(fi) denotes the starting (respectively,
finishing) time of the i-th call. The i-th call
can be serviced at time si if the number of
calls being served at that time is less than k.
If the call cannot be served, it is discarded.
When a call is completed, the channel is
freed up.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 39
CC-Complete Problems
Telephone Connection Problem (TCP)
• Problem: Is the l-th call serviced?
• References: [Ramachandran and Wang 1991]
• Remarks: There is an O(min( n ,k) log n)time EREW-PRAM algorithm that uses n
processors for solving the TCP.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 40
CC-Complete Problems
Internal Diffusion Limited Aggregation
Predication Problem (IDLAPP)
• Given: A time T and a list of moves (t,i,s),
one for each time 0 ≤ t ≤ T indicating that at
time t for particle i, if still active, will visit
site s, plus a designated site d, and a
designated particle p. A particle is active if
it is still moving within the cluster, that is,
the particle has not yet stuck to the cluster
because all of the sites that it has visited so
far were occupied already.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 41
CC-Complete Problems
Internal Diffusion Limited Aggregation
Predication Problem (IDLAPP)
• Problem: Is site d occupied and is site p
active at time T?
• References: [Moore and Machta 2000]
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 42
CC-Complete Problems
Internal Diffusion Limited Aggregation
Predication Square Lattice Problem
• Given: A time T and a list of moves (t,i,s) on
a square lattice, one for each time 0 ≤ t ≤ T
indicating that at time t for particle i, if still
active, will visit site s, plus a designated site
d, and a designated particle p.
• Problem: Is site d on the square latice
occupied and is site p active at time T?
• References: [Moore and Matcha 2000]
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 43
CC-Complete Problems
Hierarchical Clustering Problem (HCP)
• Given: A set S of n points in Rd, a distance
function dS : S x S  N, the number of
clusters k ≤ n/2  N, a distance bound B,
and two points x, y  S.
• Problem: Are x and y with dS(x, y) ≤ B in the
same cluster C after the first-phase of the
Sequential (Bottom-Up) Hierarchical
Clustering Algorithm?
• Reference: [This work 2006]
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 44
Outline
•
•
•
•
•
•
•
•
Introduction
Preliminaries
Algorithms for Hierarchical Clustering
Complexity of Hierarchical Clustering
CC-Complete Problems
Conclusions and Open Problems
References
Acknowledgments
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 45
Conclusions
• A natural decision problem based on bottomup hierarchical clustering is CC-complete.
• Top-down hierarchical clustering is in NC.
• Brings the number of known CC-complete
problems to ten, and shows that the HCP is
unlikely to have a NC algorithm.
• Fast parallel algorithms for hierarchical
clustering should be based on a top-down
approach.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 46
Open Problems
• Is Euclidean HCP CC-complete? (It is in CC.)
• Determine the complexity of the secondphase of the Sequential (Bottom-Up)
Hierarchical Clustering Algorithm.
• Add new problems to the class of
CC-complete problems.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 47
Outline
•
•
•
•
•
•
•
•
Introduction
Preliminaries
Algorithms for Hierarchical Clustering
Complexity of Hierarchical Clustering
CC-Complete Problems
Conclusions and Open Problems
References
Acknowledgments
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 48
References
• [Blumenthal 1953] Theory and Applications of Distance Geometry.
Oxford University Press.
• [Boley 1998] Principal direction divisive partitioning. Data Mining and
Knowledge Discovery, 2(4):325—344.
• [Chong, Han, and Lam 2001] Concurrent threads and optimal parallel
minimum spanning tree algorithms. Journal of the ACM, 48(2):297—
323.
• [Cole 1988] Parallel Merge Sort. SIAM Journal of Computing,
17(4):770—785.
• [Cook 1985] A taxonomy of problems with fast parallel algorithms.
Information and Control, 64(1—3):2—22.
• [Dash, Petrutiu, and Scheuermann 2004] Efficient parallel hierarchical
clustering. Lecture Notes in Computer Science, 3149:363—371.
• [Feder 1992] A new fixed point approach to stable networks and stable
marriages. Journal of Computer and System Sciences, 45(2):233—284.
• [Gibbons 1985] Algorithmic Graph Theory. Cambridge University Press.
• [Greenlaw 1992] A model classifying algorithms as inherently sequential
with applications to graph searching. Information and Computing,
97(2):133—149.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 49
References
• [Greenlaw, Hoover, and Ruzzo 1995] Limits to Parallel Computation: PCompleteness Theory. Oxford University Press.
• [Guha, Rastogi, and Shim 1998] Cure: An efficient clustering algorithm
for large databases. In ACM SIGMOD, pages 378—385, Seattle, WA.
Association for Computing Machinery.
• [Jain and Dubes 1988] Algorithms for Clustering Data. Prentice-Hall.
• [Karypis, Han, and Vumar 1999] CHAMELEON: A hierarchical clustering
algorithm using dynamic modeling. IEEE Computer, 32(8):68—75.
• [Kaufman and Rousseeuw 1990] Finding Groups in Data: An
Introduction to Cluster Analysis. John Wiley and Sons.
• [Li 1990] Parallel algorithms for hierarchical clustering and clustering
validity. IEEE Trans. Pattern Analysis and Machine Intelligence,
12(11):1088—1092.
• [Li and Fang 1989] Parallel clustering algorithms. Parallel Computing,
11(3):275—290.
• [Mayr and Subramanian 1992] The complexity of circuit value and
network stability. Journal of Computer and System Sciences,
44(2):302—323.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 50
References
• [Moore and Machta 2000] Internal diffusion-limited aggregation:
Parallel algorithms and complexity. Journal of Statistical Physics,
99(3/4):661—690.
• [Olson 1995] Parallel algorithms for hierarchical clustering. Parallel
Computing, 21(8):1313—1325.
• [Pólya, Tarjan, and Woods 1983] Notes on Introductory Combinatorics.
Birkhäuser, Boston.
• [Rajasekaran 2005] Efficient parallel hierarchical clustering algorithms.
IEEE Transactions on Parallel and Distributed Systems, 16(6):497—502.
• [Ramachandran and Wang 1991] Parallel algorithms and complexity
results for telephone link simulation. In Proceedings of the Third IEEE
Symposium on Parallel and Distributed Processing, pages 378—375,
Dallas, TX, December. IEEE.
• [Reif (ed) 1993] Synthesis of Parallel Algorithms. Morgan Kaufmann.
• [Sairam, Vitter, and Tamassia 1993] A complexity theoretic approach to
incremental computation. In Finkel, Enjalbert, and Wagner, editors,
STACS 93: 10th Annual Symposium on Theoretical Aspects of Computer
Science, volume 665 of Lecture Notes in Computer Sciences, pages 640—
649, Wurzburg, Germany, Fbruary. Springer-Verlag.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 51
References
• [Steinbach, Karypis, and Kumar 2000] A comparison of document
clustering techniques. In 6th ACM SIGKDD World Text Mining
Conference, Boston, MA. Association for Computing Machinery.
• [Subramanian 1989] A new approach to stable matching
problems. Technical Report STAN-CS-90-1275. Stanford
University, Department of Computer Science.
• [Subramanian 1990] The Computational Complexity of the
Circuit Value and Network Stability Problems, PhD thesis,
Stanford University. Depatment of Computer Science Technical
Report, STAN-CS-90-1311.
• [Tsai, Horng, Lee, Tsai, and Kao 1997] Parallel hierarchical
clustering algorithms on processor arrays with a reconfigurable
bus system. Pattern Recognition, 30(5):801—815.
• [Wu, Horng, and Tsai 2000] Efficient parallel algorithms for
hierarchical clustering on arrays with reconfigurable optical
buses. Journal of Parallel and Distributed Computing,
60(9):1137—1153.
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 52
Outline
•
•
•
•
•
•
•
•
Introduction
Preliminaries
Algorithms for Hierarchical Clustering
Complexity of Hierarchical Clustering
CC-Complete Problems
Conclusions and Open Problems
References
Acknowledgments
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 53
Acknowledgements
• Computer Science Department at Chiang Mai
University, Thailand
• Fulbright Commissions of Thailand and the
United States
• Jim Hoover and Larry Ruzzo for material
from [Greenlaw, Hoover, and Ruzzo 1995]
On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems — Greenlaw and Kantabutra — 54
Descargar

Slide 1