sval2plain.pl (Text-SenseClusters-1.03)

NAME

sval2plain.pl - Convert a Senseval-2 data file into plain text format

SYNOPSIS

 sval2plain.pl [OPTIONS] SVAL2

In this example, the Senseval-2 formatted input file begin.v-test.xml contains 255 instances (contexts), as reported by frequency.pl:

 frequency.pl begin.v-test.xml

OUTPUT =>

 <sense id="begin%2:30:00::" percent="64.31"/>
 <sense id="begin%2:30:01::" percent="14.51"/>
 <sense id="begin%2:42:04::" percent="21.18"/>
 Total Instances = 255
 Total Distinct Senses=3
 Distribution={64.31,21.18,14.51}
 % of Majority Sense = 64.31
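The summary above reports each sense as a percentage of all instances, with values that appear to be rounded to two decimals; the distribution is listed in descending order and the majority sense is the largest share. The Python sketch below only illustrates that arithmetic and is not frequency.pl's code: the function name sense_distribution is hypothetical, and it assumes each instance carries exactly one <answer ... senseid="..."/> tag.

```python
import re
from collections import Counter

# Assumption: one <answer ... senseid="..."/> tag per instance.
ANSWER_RE = re.compile(r'<answer\b[^>]*\bsenseid="([^"]+)"')


def sense_distribution(xml_text):
    """Hypothetical helper: return (total, [(sense, percent), ...]).

    Percentages are rounded to two decimals and sorted in descending
    order, so the first entry is the majority sense.
    """
    counts = Counter(ANSWER_RE.findall(xml_text))
    total = sum(counts.values())
    if total == 0:
        return 0, []
    dist = [(sense, round(100.0 * n / total, 2)) for sense, n in counts.items()]
    dist.sort(key=lambda pair: pair[1], reverse=True)
    return total, dist
```

Under these assumptions, the percentage of the majority sense is dist[0][1].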

After conversion to plain text, the output file contains 255 lines, one per context, as wc confirms:

 sval2plain.pl begin.v-test.xml > begin.v-test.txt

 wc begin.v-test.txt

OUTPUT =>

 255   15049   92598 begin.v-test.txt

The file begin.v-test.xml can be found in samples/Data.

Type sval2plain.pl --help for a quick summary of the options.

DESCRIPTION

Converts a given file from Senseval-2 format into plain text format. Each line of the plain text file contains a single context. This is useful when you have Senseval-2 data that you would like to use as feature extraction (training) data, which must be in plain text format.
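As a rough illustration of the conversion, the Python sketch below pulls out the text between each <context> and </context> pair and joins it onto a single line. It is not the Perl implementation of sval2plain.pl: the function name sval2_to_plain is hypothetical, collapsing internal whitespace is an assumption, and the sketch leaves inner markup such as <head> untouched, which may differ from what the real script emits.

```python
import re

# Assumption: each instance's text lies between <context> and </context>.
CONTEXT_RE = re.compile(r"<context\b[^>]*>(.*?)</context>", re.DOTALL)


def sval2_to_plain(xml_text):
    """Hypothetical helper: return one string per instance, in input order."""
    lines = []
    for body in CONTEXT_RE.findall(xml_text):
        # Collapse newlines and runs of spaces so the context fits on one line.
        # Inner tags such as <head> are kept as-is (assumption).
        lines.append(" ".join(body.split()))
    return lines


def convert_file(path):
    """Print the plain-text form of a Senseval-2 file, one context per line."""
    with open(path, encoding="utf-8") as handle:
        for line in sval2_to_plain(handle.read()):
            print(line)
```

Calling convert_file("begin.v-test.xml") would print one line per <context> element, mirroring the one-context-per-line layout described above.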

INPUT

Required Arguments:

SVAL2

Input file in Senseval-2 format that is to be converted into plain text format.

Optional Arguments:

--help

Displays a summary of the command-line options.

--version

Displays the version information.

OUTPUT

sval2plain.pl writes the given SVAL2 file to STDOUT in plain text format, with the context of each instance on a separate line: the i-th line of output contains the context of the i-th instance in the SVAL2 file.
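Because line order follows instance order, a plain-text line can be traced back to its instance by position. The Python sketch below pairs the i-th <instance id="..."> in the Senseval-2 file with the i-th output line and fails if the counts differ. The function name pair_ids_with_lines is hypothetical, and the sketch assumes every instance has an id attribute.

```python
import re

# Assumption: every <instance> tag carries an id="..." attribute.
INSTANCE_RE = re.compile(r'<instance\b[^>]*\bid="([^"]+)"')


def pair_ids_with_lines(xml_text, plain_text):
    """Hypothetical helper: return [(instance_id, line), ...] in order."""
    ids = INSTANCE_RE.findall(xml_text)
    lines = plain_text.splitlines()
    if len(ids) != len(lines):
        raise ValueError(f"{len(ids)} instances but {len(lines)} lines")
    # Position i in the XML corresponds to line i of the plain-text output.
    return list(zip(ids, lines))
```

A list of pairs is returned rather than a dict so that duplicate ids, if any, are not silently merged.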

AUTHOR

 Ted Pedersen, University of Minnesota, Duluth
 tpederse at d.umn.edu

 Amruta Purandare, University of Pittsburgh

COPYRIGHT

Copyright (c) 2002-2008, Ted Pedersen and Amruta Purandare

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.