The course presents the mathematical and statistical theory of classification, discrimination and cluster analysis. In practice these methods are known under names such as 'pattern recognition', 'automatic diagnosis', 'model identification', 'numerical taxonomy' and 'learning with/without a teacher', and they are applied in many domains (medicine, biology, marketing, engineering, speech recognition, image processing, etc.).
Basically, we consider a set of 'individuals' or 'objects' which have certain observable properties (described by variables) and belong to one of several classes (groups, 'types' or populations). Each class is characterized by a special constellation of these variables, e.g., by a class-specific probability distribution. After the available variables have been observed for these objects, the problem consists either (a) in constructing, from these data, a suitable number of 'classes' or 'clusters' which are homogeneous with respect to the observed data (cluster analysis, taxonomy), or (b) in assigning the objects to one of the (constructed or prespecified) classes and thereby distinguishing these classes with the help of the observed variables. These problems may be formulated in various ways, leading to a theoretically interesting and practically helpful range of classification methods.
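The two problems (a) and (b) can be illustrated by a minimal sketch. It assumes Euclidean distance and the k-means criterion, which is only one of many possible clustering criteria and is chosen here purely for illustration; the data, the function names `nearest` and `k_means`, and the starting centers are hypothetical.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

def k_means(points, centers, iterations=20):
    """Plain Lloyd iteration: alternate class assignment and mean update."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a class is empty
                centers[k] = tuple(sum(x) / len(members) for x in zip(*members))
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Hypothetical data: two variables observed on six objects.
objects = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.3)]

# (a) Cluster analysis: construct two classes from the data alone.
centers, labels = k_means(objects, centers=[objects[0], objects[3]])

# (b) Assignment: allocate a new object to one of the constructed classes.
new_label = nearest((4.5, 4.7), centers)
```

Here each class is represented by its mean vector only. A probabilistic formulation would instead describe each class by a class-specific distribution and assign objects to the class with the largest posterior probability.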
The course presents the mathematical and statistical concepts and principles for classifying individuals and leads to a range of theoretically based, or heuristically motivated, classification criteria and clustering strategies.
Contents:
Literature:
BOCK, H.H.: Automatische Klassifikation. Vandenhoeck and Ruprecht, Goettingen, 1974.
FUKUNAGA, K.: Introduction to statistical pattern recognition. Academic Press, New York, 1978.
HAND, D.J.: Discrimination and classification. Wiley, New York, 1986.
NIEMANN, H.: Klassifikation von Mustern. Springer, Heidelberg, 1983.
SEBER, G.A.F.: Multivariate observations. Wiley, New York, 1984.