NAME

label.pl - Assign labels to clusters in a confusion matrix to maximize agreement

SYNOPSIS

 label.pl [OPTIONS] PRELABEL

Type label.pl --help for a quick summary of options.

DESCRIPTION

Labels the discovered clusters with sense tags such that the maximum number of contexts are correctly assigned.

INPUT

Required Arguments:

PRELABEL

Should be the output of cluto2label.pl.

Sample CLUTO2LABEL format

 2
 //     cord  phone   text   div
 C0:     4       3       0       0
 C1:     2       2       2       2
 C2:     1       3       3       2

 where the 1st line shows the number of unclustered instances = 2

 2nd line shows a space separated list of sense classes starting with // mark.

Each line thereafter shows the sense distribution of the instances belonging to each discovered cluster in the form of a cluster by sense distribution matrix. A cell value at (i,j) in the matrix shows the number of instances belonging to cluster Ci that have the sense tag Sj.

Note that each row begins with the cluster id followed by a colon (:). Also, the number of sense classes on the 2nd line should be the same as the number of columns in the cluster by sense distribution table.
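The format rules above can be illustrated with a short reader. This is only a sketch in Python, not code taken from label.pl or cluto2label.pl; the function name parse_prelabel and the error messages are hypothetical, and it assumes that blank lines may be skipped.

```python
def parse_prelabel(text):
    """Parse PRELABEL text into (unclustered, senses, rows).

    Hypothetical helper for illustration; not part of label.pl.
    """
    # Assumption: blank lines carry no meaning and can be dropped.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # 1st line: number of unclustered instances.
    unclustered = int(lines[0])

    # 2nd line: sense classes, introduced by the // mark.
    header = lines[1].split()
    if header[0] != "//":
        raise ValueError("2nd line must start with //")
    senses = header[1:]

    # Remaining lines: "<cluster id>: <count> <count> ..."
    rows = {}
    for ln in lines[2:]:
        cluster_id, _, counts = ln.partition(":")
        values = [int(v) for v in counts.split()]
        # Each row needs exactly one count per sense class.
        if len(values) != len(senses):
            raise ValueError("row %s has %d columns, expected %d"
                             % (cluster_id, len(values), len(senses)))
        rows[cluster_id.strip()] = values
    return unclustered, senses, rows


SAMPLE = """\
2
//     cord  phone   text   div
C0:     4       3       0       0
C1:     2       2       2       2
C2:     1       3       3       2
"""

unclustered, senses, rows = parse_prelabel(SAMPLE)
print(unclustered, senses, rows["C0"])
```

With the sample above, rows["C2"][1] is the cell (C2, phone), that is, the number of instances in cluster C2 that carry the 'phone' sense tag.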

Optional Arguments:

--help

Displays this message.

--version

Displays the version information.

OUTPUT

The output shows the sense label attached to each discovered cluster, along with a score. The score is the percentage of the total number of instances that are correctly clustered if the clusters are tagged with the suggested sense labels.

Example :

Prelabel file =>

 0
 //      cord    divi    form    phon    prod    text
 C0:     35      26      44      18      23      43
 C1:     64      34      50      43      57      52
 C2:     0       3       1       2       0       3
 C3:     0       0       2       31      0       0
 C4:     1       28      0       4       6       0
 C5:     0       9       3       2       14      2

Label Output =>

 ClusterID -> SenseID
 C0 -> form
 C1 -> cord
 C2 -> text
 C3 -> phon
 C4 -> divi
 C5 -> prod
 Score = 30.67

shows that

 cluster C0 represents the 'form' sense
 cluster C1 represents the 'cord' sense
 cluster C2 represents the 'text' sense
 cluster C3 represents the 'phon' sense
 cluster C4 represents the 'divi' sense
 and cluster C5 represents the 'prod' sense

Also, 30.67% of the total instances are in their correct sense classes if the clusters are tagged with this labeling scheme.
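The labeling and score in this example can be reproduced with a small exhaustive search. The Python sketch below is not label.pl's implementation: it simply tries every one-to-one assignment of senses to clusters, which is practical only for small matrices. The function name best_labeling is hypothetical. The sketch assumes there are no more clusters than senses, and it computes the score over the matrix cells only, since how unclustered instances enter the score is not described here (the count is 0 in this example).

```python
from itertools import permutations


def best_labeling(senses, rows):
    """Return (labels, score) for the one-to-one labeling with most agreement.

    Hypothetical brute-force helper; assumes len(rows) <= len(senses).
    """
    clusters = list(rows)
    total = sum(sum(counts) for counts in rows.values())

    best_hits, best_perm = -1, None
    # Try every way of giving each cluster a distinct sense column.
    for perm in permutations(range(len(senses)), len(clusters)):
        # Instances whose sense tag matches the label of their cluster.
        hits = sum(rows[c][j] for c, j in zip(clusters, perm))
        if hits > best_hits:
            best_hits, best_perm = hits, perm

    labels = {c: senses[j] for c, j in zip(clusters, best_perm)}
    score = 100.0 * best_hits / total if total else 0.0
    return labels, score


senses = ["cord", "divi", "form", "phon", "prod", "text"]
rows = {
    "C0": [35, 26, 44, 18, 23, 43],
    "C1": [64, 34, 50, 43, 57, 52],
    "C2": [0, 3, 1, 2, 0, 3],
    "C3": [0, 0, 2, 31, 0, 0],
    "C4": [1, 28, 0, 4, 6, 0],
    "C5": [0, 9, 3, 2, 14, 2],
}

labels, score = best_labeling(senses, rows)
for cluster, sense in labels.items():
    print(cluster, "->", sense)
print("Score = %.2f" % score)  # prints "Score = 30.67"
```

The matrix holds 600 instances in total, and the labeling above agrees with 44 + 64 + 3 + 31 + 28 + 14 = 184 of them, which gives 184 / 600 = 30.67%.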

AUTHORS

 Ted Pedersen, University of Minnesota, Duluth
 tpederse at d.umn.edu

 Amruta Purandare, University of Pittsburgh

 Anagha Kulkarni, Carnegie-Mellon University

COPYRIGHT

Copyright (c) 2002-2008, Ted Pedersen, Amruta Purandare, Anagha Kulkarni

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.