Text-SenseClusters-1.03 (Ted Pedersen)


NAME - Compute the distribution of senses in a Senseval-2 data file


Example, using begin.v-test.xml (found in samples/Data) as the input file:

Output =>

 <sense id="begin%2:30:00::" percent="64.31"/>
 <sense id="begin%2:30:01::" percent="14.51"/>
 <sense id="begin%2:42:04::" percent="21.18"/>
 Total Instances = 255
 Total Distinct Senses=3
 % of Majority Sense = 64.31

Type --help for a quick summary of options


Displays the distribution of senses in a given Senseval-2 file on STDOUT. This information can be used to better understand the data, and also to decide whether to filter out low-frequency senses (using …) or to balance the distribution of senses (using …).
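As a rough illustration of that decision, the Python sketch below reads the `<sense id="S" percent="P"/>` lines printed by the program and lists the senses whose percentage falls below a cutoff. The helper name `low_frequency_senses` and the 5.0% default cutoff are hypothetical choices for illustration; the sketch does not reproduce the filtering or balancing programs themselves.

```python
import re

# Matches one output line of the form <sense id="S" percent="P"/>.
LINE_RE = re.compile(r'<sense id="([^"]*)" percent="([0-9.]+)"/>')


def low_frequency_senses(output_text, cutoff=5.0):
    """Return (sense id, percent) pairs whose percent is below cutoff.

    The cutoff is a hypothetical, user-chosen threshold; lines that are
    not <sense .../> lines (e.g. the totals) are ignored.
    """
    result = []
    for sid, pct in LINE_RE.findall(output_text):
        if float(pct) < cutoff:
            result.append((sid, float(pct)))
    return result


output = """<sense id="begin%2:30:00::" percent="59.49"/>
<sense id="begin%2:42:00::" percent="4.70"/>
<sense id="begin%2:42:03::" percent="3.44"/>
Total Instances = 548"""

print(low_frequency_senses(output))
# [('begin%2:42:00::', 4.7), ('begin%2:42:03::', 3.44)]
```

Where to set the cutoff is a judgment about the data; the sketch only makes the comparison explicit.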


Required Arguments:


SOURCE should be a Senseval-2 formatted file. Sense ids are found by matching the regex /sense\s*id="S"/, where S is a sense id.

An instance with multiple sense ids should appear only once, with one <answer> tag per sense id. For example, if instance IID has two sense ids, SID1 and SID2, it should be formatted in SOURCE as:

 <instance id="IID"> 
 <answer instance="IID" senseid="SID1"/>
 <answer instance="IID" senseid="SID2"/>
        context data goes here ....
 </instance>
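The regexes in this document can be tried on the instance above. The Python sketch below translates /instance id="ID"/ and /sense\s*id="S"/ into `re` patterns. The capture groups, the `[^"]*` part and the helper name `instance_senses` are additions of this sketch, not the program's own code. Because `\s*` also matches zero characters, the sense pattern finds `senseid="SID1"` inside the <answer> tags. The instance therefore yields one instance id but two sense ids.

```python
import re

# Python translations of the documented patterns; the capture groups
# and [^"]* (stop at the closing quote) are assumptions of this sketch.
INSTANCE_RE = re.compile(r'instance id="([^"]*)"')
SENSE_RE = re.compile(r'sense\s*id="([^"]*)"')


def instance_senses(block):
    """Return (instance ids, sense ids) found in one block of SOURCE text."""
    return INSTANCE_RE.findall(block), SENSE_RE.findall(block)


block = """<instance id="IID">
<answer instance="IID" senseid="SID1"/>
<answer instance="IID" senseid="SID2"/>
context data goes here ....
</instance>"""

print(instance_senses(block))
# (['IID'], ['SID1', 'SID2'])
```

The `<answer instance="IID" ...>` attribute is not counted as an instance, because the instance pattern requires ` id="` directly after the word `instance`.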

Optional Arguments:


Displays this message.


Displays the version information.


The output displays:

1. Total number of instances in SOURCE

Instances are counted by matching the regex /instance id=\"ID\"/; each unique instance id is counted once.

2. Total number of distinct sense tags found in SOURCE

Sense tags are found by matching the regex /sense\s*id="S"/.

3. Sense Distribution

For each sense id S found in SOURCE, the output shows a line

<sense id="S" percent="P"/>

where P is the percentage frequency of sense S.

4. % of Majority sense

This is the highest sense percentage found in SOURCE, i.e. the percentage of the most frequent sense.
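The four parts of the output can be sketched end to end in Python. This is not the program's own code. It assumes that SOURCE fits in memory, that instance and sense ids are found with the two regexes described above, and that senses are listed in sorted order, as in the sample outputs. It also assumes that P is computed over the total number of sense tags rather than the number of instances. That last assumption is consistent with the sample output below: 13.38% is not a whole number of its 548 instances. The function name `sense_report` and the toy ids I1, I2, S1 and S2 are hypothetical.

```python
import re
from collections import Counter

# Python translations of the documented regexes; the capture groups and
# [^"]* (stop at the closing quote) are assumptions of this sketch.
INSTANCE_RE = re.compile(r'instance id="([^"]*)"')
SENSE_RE = re.compile(r'sense\s*id="([^"]*)"')


def sense_report(source_text):
    """Return the output lines for Senseval-2 text held in one string."""
    instances = set(INSTANCE_RE.findall(source_text))  # unique instance ids
    counts = Counter(SENSE_RE.findall(source_text))    # tags per sense id
    total_tags = sum(counts.values())  # ASSUMED denominator for P

    lines = []
    percents = {}
    for sid in sorted(counts):  # sorted by sense id, as in the samples
        percents[sid] = 100 * counts[sid] / total_tags
        lines.append(f'<sense id="{sid}" percent="{percents[sid]:.2f}"/>')

    lines.append(f"Total Instances = {len(instances)}")
    lines.append(f"Total Distinct Senses={len(counts)}")
    # Majority sense: the highest percentage (0.00 if no senses were found).
    majority = max(percents.values(), default=0.0)
    lines.append(f"% of Majority Sense = {majority:.2f}")
    return lines


# Hypothetical toy input: I2 carries two senses via two <answer> tags.
toy = """<instance id="I1">
<answer instance="I1" senseid="S1"/>
</instance>
<instance id="I2">
<answer instance="I2" senseid="S1"/>
<answer instance="I2" senseid="S2"/>
</instance>"""

print("\n".join(sense_report(toy)))
# <sense id="S1" percent="66.67"/>
# <sense id="S2" percent="33.33"/>
# Total Instances = 2
# Total Distinct Senses=2
# % of Majority Sense = 66.67
```

In the toy input the two instances carry three sense tags. Under the assumed denominator, S1 accounts for 2 of 3 tags and S2 for 1 of 3.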

Sample Output

 <sense id="begin%2:30:00::" percent="59.49"/>
 <sense id="begin%2:30:01::" percent="13.38"/>
 <sense id="begin%2:42:00::" percent="4.70"/>
 <sense id="begin%2:42:03::" percent="3.44"/>
 <sense id="begin%2:42:04::" percent="18.99"/>
 Total Instances = 548
 Total Distinct Senses=5
 % of Majority Sense = 59.49

This shows that there are 548 instances in total and 5 distinct senses.

The senses are distributed with frequencies of 59.49%, 13.38%, 4.70%, 3.44% and 18.99%, where the majority sense has a frequency of 59.49%.

Each <sense> line shows the percentage frequency of one individual sense.
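The numbers in the sample output can be checked by hand. The Python sketch below confirms that the five percentages sum to 100 and that the majority value is their maximum. It also shows that they cannot all be percentages of the 548 instances. One set of integer counts, inferred here and not given by the source, reproduces all five values over a total of 553 sense tags. If that inference is right, a few instances carry more than one sense, as the multiple-<answer> format allows.

```python
# Percentages copied from the sample output above.
published = {
    "begin%2:30:00::": 59.49,
    "begin%2:30:01::": 13.38,
    "begin%2:42:00::": 4.70,
    "begin%2:42:03::": 3.44,
    "begin%2:42:04::": 18.99,
}

# The five percentages add up to 100, and the majority value is the maximum.
assert round(sum(published.values()), 2) == 100.0
assert max(published.values()) == 59.49

# 13.38% is not a whole number of the 548 instances: the nearest counts,
# 73 and 74, give 13.32% and 13.50%.
print(f"{100 * 73 / 548:.2f} {100 * 74 / 548:.2f}")  # 13.32 13.50

# HYPOTHETICAL counts, inferred for this check and not given by the source,
# that reproduce every published percentage over their total.
inferred = {
    "begin%2:30:00::": 329,
    "begin%2:30:01::": 74,
    "begin%2:42:00::": 26,
    "begin%2:42:03::": 19,
    "begin%2:42:04::": 105,
}
total = sum(inferred.values())
for sid, n in inferred.items():
    assert f"{100 * n / total:.2f}" == f"{published[sid]:.2f}"
print(total)  # 553
```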


 Ted Pedersen, University of Minnesota, Duluth
 tpederse at

 Amruta Purandare,  University of Pittsburgh 


Copyright (c) 2002-2008, Amruta Purandare and Ted Pedersen

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.