Date: 06-07 April 2022
Location: Rome, Italy
Workshop Description
The workshop illustrates the most popular statistical methods of automatic classification, covering both the hierarchical approach, with reference to agglomerative algorithms, and the non-hierarchical, or partitional, approach. Among the partitional methods, the workshop introduces the geometric methods and also touches on methods based on Gaussian mixture models.
The National Institute for the Analysis of Public Policies (Istituto Nazionale per l’Analisi delle Politiche Pubbliche, INAPP) is a public research organization that carries out analysis, monitoring, and evaluation of labor policies and employment services, education and training policies, social policies, and all other public policies that affect the labor market. INAPP collaborates with European institutions and is part of the Italian Statistical System (Sistema Statistico Nazionale, SISTAN) within which, together with Istat, it represents the only statistical reporting agency.
Prerequisites
Please bring your own laptop with a recent version of R and RStudio installed. Some additional packages are required; they will be introduced and installed during the lab sessions.
Participants are assumed to be familiar with basic statistical tools and with the R environment; no knowledge of advanced statistical modeling or clustering is assumed.
Program
Day 1 | | | |
---|---|---|---|
09:30 - 13:00 | Introduction to the concepts of distance and dissimilarity, and to the measurement of heterogeneity in statistics | Slides | Lab |
13:00 - 14:00 | Lunch break | | |
14:00 - 16:30 | Hierarchical methods: single, average, and complete linkage and Ward’s criterion; choice of the number of groups and evaluation of the classification | Slides | Lab |
Day 2 | | | |
---|---|---|---|
09:30 - 13:00 | Non-hierarchical methods: MacQueen’s algorithm and the k-means method; other geometric partitional algorithms | Slides | Lab |
13:00 - 14:00 | Lunch break | | |
14:00 - 16:30 | Gaussian mixtures: definition of a mixture distribution, criteria for estimating mixture parameters, and determination of classes | Slides | Lab |