Probabilistic models in cluster analysis

Computational Statistics and Data Analysis 23 (1996) 5-28.

This paper discusses cluster analysis in a probabilistic and inferential framework, as opposed to more exploratory, heuristic or algorithmic approaches. It presents a broad survey of probabilistic models for partition-type, hierarchical and tree-like clustering structures and points to the relevant literature. It shows how suitable clustering criteria or grouping methods may be derived from these models for vector-valued data, dissimilarity matrices and similarity relations. In particular, we discuss hypothesis testing for homogeneity or for a grouping structure, the asymptotic distribution of test statistics, the use of random graph theory, and combinatorial methods for simulating random dendrograms. Our presentation of hierarchical clustering includes, for example, Markovian branching processes and phylogenetic inference based on molecular sequence data.

Keywords: Probabilistic cluster analysis; Partition-type clustering; Hierarchical clustering models; Testing for a clustering structure; Phylogenetic inference
