


NAME - Assign labels to clusters in a confusion matrix to maximize agreement


Run the program with --help for a quick summary of options.


Labels the discovered clusters with sense tags such that the maximum number of contexts is correctly assigned.


Required Arguments:


Should be the output of

Sample CLUTO2LABEL format

 //     cord  phone   text   div
 C0:     4       3       0       0
 C1:     2       2       2       2
 C2:     1       3       3       2

 where the 1st line shows the number of unclustered instances = 2

 2nd line shows a space-separated list of sense classes starting with the // mark.

Each line thereafter shows the sense distribution of the instances belonging to each discovered cluster in the form of a cluster by sense distribution matrix. A cell value at (i,j) in the matrix shows the number of instances belonging to cluster Ci that have the sense tag Sj.

Note that each row begins with the cluster id followed by a colon (:). Also, the number of sense classes on the 2nd line should be the same as the number of columns in the cluster by sense distribution table.
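The format rules above can be illustrated with a minimal Python sketch. It is not part of this program: the function name `parse_prelabel` is hypothetical, and the sketch assumes that any line that is neither the // header nor a cluster row (for example, a leading count of unclustered instances) can simply be skipped.

```python
def parse_prelabel(text):
    """Parse a cluster-by-sense matrix into (senses, {cluster_id: counts})."""
    senses = None
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("//"):
            # Header line: sense classes follow the // mark.
            senses = line[2:].split()
        elif ":" in line:
            # Cluster row: the cluster id precedes the colon.
            cluster_id, counts = line.split(":", 1)
            rows[cluster_id.strip()] = [int(x) for x in counts.split()]
        # Assumption: all other lines are ignored by this sketch.
    if senses is None:
        raise ValueError("no // header line found")
    for cluster_id, counts in rows.items():
        # Each row must have exactly one column per sense class.
        if len(counts) != len(senses):
            raise ValueError(f"{cluster_id}: expected {len(senses)} columns")
    return senses, rows


sample = """
 //     cord  phone   text   div
 C0:     4       3       0       0
 C1:     2       2       2       2
 C2:     1       3       3       2
"""
senses, rows = parse_prelabel(sample)
```

Here `senses` holds the four sense classes from the header and `rows["C2"]` holds the counts for cluster C2. A row with too few or too many columns raises a `ValueError`, which mirrors the column-count requirement stated above.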

Optional Arguments:


Displays this message.


Displays the version information.


The output shows the sense label attached to each discovered cluster, along with a score. The score is the percentage of the total number of instances that are correctly clustered if the clusters are tagged with the suggested sense labels.

Example :

Prelabel file =>

 //      cord    divi    form    phon    prod    text
 C0:     35      26      44      18      23      43
 C1:     64      34      50      43      57      52
 C2:     0       3       1       2       0       3
 C3:     0       0       2       31      0       0
 C4:     1       28      0       4       6       0
 C5:     0       9       3       2       14      2

Label Output =>

 ClusterID -> SenseID
 C0 -> form
 C1 -> cord
 C2 -> text
 C3 -> phon
 C4 -> divi
 C5 -> prod
 Score = 30.67

shows that

 cluster C0 represents the 'form' sense
 cluster C1 represents the 'cord' sense
 cluster C2 represents the 'text' sense
 cluster C3 represents the 'phon' sense
 cluster C4 represents the 'divi' sense
 and cluster C5 represents the 'prod' sense

Also, 30.67% of the total instances are in their correct sense classes if the clusters are tagged with this labeling scheme.
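The example can be checked by hand: the matrix contains 600 instances in total, and the suggested labels account for 44 + 64 + 3 + 31 + 28 + 14 = 184 of them, so the score is 184 / 600 = 30.67%. The Python sketch below reproduces this result. It is only an illustration, since this documentation does not say which search method the program uses; the sketch swaps in an exhaustive search over all one-to-one assignments, which is practical only for a small number of clusters. The name `best_labeling` is hypothetical, and the sketch assumes there are at least as many senses as clusters.

```python
from itertools import permutations

SENSES = ["cord", "divi", "form", "phon", "prod", "text"]
MATRIX = {
    "C0": [35, 26, 44, 18, 23, 43],
    "C1": [64, 34, 50, 43, 57, 52],
    "C2": [0, 3, 1, 2, 0, 3],
    "C3": [0, 0, 2, 31, 0, 0],
    "C4": [1, 28, 0, 4, 6, 0],
    "C5": [0, 9, 3, 2, 14, 2],
}


def best_labeling(senses, matrix):
    """Try every one-to-one cluster-to-sense assignment; keep the best one."""
    clusters = list(matrix)
    total = sum(sum(row) for row in matrix.values())
    best_hits, best_cols = -1, None
    # Each permutation picks a distinct sense column for every cluster.
    for cols in permutations(range(len(senses)), len(clusters)):
        hits = sum(matrix[c][j] for c, j in zip(clusters, cols))
        if hits > best_hits:
            best_hits, best_cols = hits, cols
    labels = {c: senses[j] for c, j in zip(clusters, best_cols)}
    # Score: percentage of all instances that fall in their cluster's sense.
    return labels, round(100.0 * best_hits / total, 2)


labels, score = best_labeling(SENSES, MATRIX)
for cluster, sense in labels.items():
    print(f"{cluster} -> {sense}")
print(f"Score = {score}")
```

For larger matrices, an assignment algorithm such as the Hungarian method would avoid the factorial cost of enumerating every permutation.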


 Ted Pedersen, University of Minnesota, Duluth
 tpederse at

 Amruta Purandare, University of Pittsburgh

 Anagha Kulkarni, Carnegie-Mellon University


Copyright (c) 2002-2008, Ted Pedersen, Amruta Purandare, Anagha Kulkarni

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.