## ASSO: Analysis of Symbolic Official Data

### Project no. IST-2000-25161, 2001-2003

**Project:**

The project 'Analysis of Symbolic Official Data' (ASSO)
was carried out within the Information Society Technologies (IST) Programme of the
European Union. Its aim was to develop a software system for symbolic data analysis, including the
processing of data stemming from different sources. The result was an improved
version of the SODAS software system, which had been developed in a previous EU project in 1996-1999.

**Remark:**

'Symbolic data' refers to variables whose values are not single numbers or categories (as is typical in
classical data analysis with, e.g., *income* = 2,000 Euro or
*colour* = *green*), but intervals (e.g., *income* = [min, max]),
*sets* of categories (e.g., *colour* = {*green*, *blue*}),
probability distributions or histograms (e.g., the income distribution of a country), etc., possibly with data-dependent restrictions,
logically dependent variables, and missing values.
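To make the kinds of values listed above concrete, the following is a minimal Python sketch of one row of a symbolic data table. The `Interval` class, the variable names, and all figures are illustrative assumptions; they are not taken from SODAS or the ASSO software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Interval-valued observation, e.g. income = [min, max] (hypothetical helper)."""
    low: float
    high: float

    def __post_init__(self):
        # Reject intervals whose bounds are in the wrong order.
        if self.low > self.high:
            raise ValueError("low must not exceed high")


# One row of a symbolic data table: each variable holds a symbolic value
# instead of a single number or category. All figures are made up.
symbolic_row = {
    "income": Interval(1500.0, 3200.0),        # interval
    "colour": frozenset({"green", "blue"}),    # set of categories
    "income_distribution": {                   # histogram: class -> relative frequency
        "low": 0.3,
        "medium": 0.5,
        "high": 0.2,
    },
    "age": None,                               # missing value
}
```

Plain built-in types (`frozenset`, `dict`, `None`) are used here only to keep the sketch short; a real system would also need to represent restrictions and logical dependencies between variables.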

*Symbolic data analysis* generalizes the classical data analysis
methods from statistics (such as principal component analysis, multidimensional scaling,
classification and clustering) to the case of symbolic variables.

In practice, these methods are useful for analyzing data from large databases (where a preprocessing step
may reduce the large amount of classical data to a 'symbolic data table' of a smaller size).
Primary applications include the analysis and visualization of data, data mining, and knowledge discovery in databases (KDD). Another application
is the integration of data and surveys originating from different European countries
with different, country-specific standards, in the framework of EUROSTAT.
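The preprocessing step mentioned above, which condenses many classical records into a smaller symbolic data table, might be sketched in Python as follows. This is an illustration under assumptions: the records, the grouping by `country`, and the `aggregate` function are hypothetical and do not reproduce the procedure implemented in SODAS.

```python
from collections import Counter, defaultdict

# Hypothetical classical records: one row per individual.
records = [
    {"country": "A", "income": 2000, "colour": "green"},
    {"country": "A", "income": 3500, "colour": "blue"},
    {"country": "A", "income": 1200, "colour": "green"},
    {"country": "B", "income": 2800, "colour": "red"},
    {"country": "B", "income": 4100, "colour": "red"},
]


def aggregate(rows, group_key):
    """Summarize classical rows into one symbolic description per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    table = {}
    for name, members in groups.items():
        incomes = [m["income"] for m in members]
        colours = Counter(m["colour"] for m in members)
        table[name] = {
            # Interval-valued variable: [min, max] of the observed incomes.
            "income": (min(incomes), max(incomes)),
            # Set-valued variable: all categories observed in the group.
            "colour": frozenset(colours),
            # Distribution-valued variable: relative frequency per category.
            "colour_freq": {c: n / len(members) for c, n in colours.items()},
        }
    return table


symbolic_table = aggregate(records, "country")
```

Here five classical rows are reduced to two symbolic rows, one per country; for country `A` the income becomes the interval `(1200, 3500)` and the colour becomes the set `{green, blue}`.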

**More information and a software package**

are available on the ASSO website.
The scientific methods and algorithms of the SODAS project are described in:

Bock, H.H., and E. Diday (eds.):
**Analysis of Symbolic Data. Exploratory Methods for Extracting Statistical Information from Complex Data.**
Studies in Classification, Data Analysis, and Knowledge
Organization, Springer Verlag, Heidelberg, 2000, 425 pp., ISBN 3-540-66619-2.