



semcor-reformat.pl - Reformat SemCor sense tagged files for use by wsd.pl


 semcor-reformat.pl {--semcor DIR | --file FILE [FILE ...]} [--key] 


 semcor-reformat.pl --semcor ~/semcor2.0


This script reads SemCor-formatted files and produces formatted text that can be used as input to wsd.pl. Alternatively, if the --key option is specified, the output also includes the sense number of each word, and this output can be used as a key file.

There are a few sources of SemCor-formatted data, including SemCor itself and the Senseval-2 and Senseval-3 all-words data sets. They have been made available for download by Rada Mihalcea:


Only words that are assigned valid sense numbers are passed through this program; all other words are discarded. In practice, this means that only open-class words that appear in WordNet are passed through. Closed-class words (pronouns, conjunctions, etc.) and other words not in WordNet are discarded.
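The filtering rule can be made concrete with a minimal Python sketch over SemCor-style wf elements. In these elements the attribute values are unquoted and the WordNet sense number is stored in the wnsn attribute.

The sketch assumes that a "valid" sense number is a single positive integer. Values such as 0, or multi-sense entries like 1;2, are therefore skipped. The real script may handle those cases differently. The helper name sense_tagged_words and the sample lines are made up for illustration.

```python
import re

# Matches one <wf ...>word</wf> element; SemCor attribute values are unquoted,
# e.g. <wf cmd=done pos=NN lemma=jury wnsn=1 lexsn=1:14:00::>jury</wf>
WF_RE = re.compile(r"<wf\s+([^>]*)>([^<]*)</wf>")
ATTR_RE = re.compile(r"(\w+)=(\S+)")


def sense_tagged_words(lines):
    """Yield (surface, lemma, pos, sense) for words with a usable sense number.

    Assumption (hypothetical helper): wnsn must be a single positive integer.
    Closed-class words (cmd=ignore, no wnsn) and <punc> elements are dropped.
    """
    for line in lines:
        for match in WF_RE.finditer(line):
            attrs = dict(ATTR_RE.findall(match.group(1)))
            wnsn = attrs.get("wnsn", "")
            if wnsn.isdigit() and int(wnsn) > 0:
                surface = match.group(2)
                yield surface, attrs.get("lemma", surface), attrs.get("pos"), int(wnsn)


# Made-up sample lines in the SemCor style.
sample = [
    "<wf cmd=ignore pos=DT>The</wf>",
    "<wf cmd=done pos=NN lemma=jury wnsn=1 lexsn=1:14:00::>jury</wf>",
    "<wf cmd=done pos=VBD lemma=say wnsn=1 lexsn=2:32:00::>said</wf>",
    "<punc>.</punc>",
]

for token in sense_tagged_words(sample):
    print(token)
```

Only jury and said survive. The determiner and the punctuation carry no sense number and are discarded.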



--semcor DIR

The location of the SemCor directory. This directory contains several sub-directories, including 'brown1' and 'brown2'. Do not specify these sub-directories; specify only the directory that contains them. For example, if /home/user/semcor2.0 contains the brown1 and brown2 directories, specify /home/user/semcor2.0 as the value of this option. Do not use this option together with the --file option.
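The following minimal Python sketch shows how the files under such a directory could be collected. It makes two assumptions:

- Only brown1 and brown2 are read.
- Each of these keeps its files in a tagfiles folder, as in the SemCor 2.0 layout.

The real script may read a different set of sub-directories. The semcor_files helper is hypothetical and is not part of this distribution.

```python
import tempfile
from pathlib import Path


def semcor_files(semcor_dir, subdirs=("brown1", "brown2")):
    """Return the files under <semcor_dir>/<sub>/tagfiles for each sub in subdirs.

    Assumption: the sub-directory names and the 'tagfiles' folder follow the
    SemCor 2.0 layout; missing folders are silently skipped.
    """
    root = Path(semcor_dir).expanduser()
    files = []
    for sub in subdirs:
        tagdir = root / sub / "tagfiles"
        if tagdir.is_dir():
            files.extend(sorted(p for p in tagdir.iterdir() if p.is_file()))
    return files


# Usage: build a throwaway tree and list what the sketch would process.
with tempfile.TemporaryDirectory() as tmp:
    for sub, name in [("brown1", "br-a01"), ("brown2", "br-e22"), ("brownv", "br-a13")]:
        folder = Path(tmp, sub, "tagfiles")
        folder.mkdir(parents=True, exist_ok=True)
        (folder / name).write_text("")
    found = [f"{p.parent.parent.name}/{p.name}" for p in semcor_files(tmp)]

print(found)  # ['brown1/br-a01', 'brown2/br-e22']
```

Passing only the parent directory lets the sketch find the sub-directories itself. This is why the option expects /home/user/semcor2.0 rather than /home/user/semcor2.0/brown1.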


--file FILE [FILE ...]

A SemCor-formatted file to process. Use this instead of --semcor to process only a few SemCor files, or to process Senseval files. When this option is used, multiple files can be specified on the command line. For example:

 semcor-reformat.pl --file br-a01 br-a02 br-k18 br-m02 br-r05

Do not use this option together with --semcor.
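Under this rule, --semcor and --file are alternatives and --key is optional. That is the grammar of the synopsis line:

 {--semcor DIR | --file FILE [FILE ...]} [--key]

As a minimal sketch, the same grammar can be expressed with Python's argparse. This is only an illustration and not the script's own Perl option handling, which this page does not describe. The build_parser function is hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical Python mirror of the synopsis options."""
    parser = argparse.ArgumentParser(prog="semcor-reformat")
    # Exactly one input source must be given, never both.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--semcor", metavar="DIR")
    source.add_argument("--file", metavar="FILE", nargs="+")
    parser.add_argument("--key", action="store_true")
    return parser


parser = build_parser()
args = parser.parse_args(["--file", "br-a01", "br-a02", "br-k18"])
print(args.file, args.key)  # ['br-a01', 'br-a02', 'br-k18'] False

try:
    parser.parse_args(["--semcor", "/home/user/semcor2.0", "--file", "br-a01"])
except SystemExit:
    print("rejected: --semcor and --file together")
```

argparse reports the conflict and exits with status 2. This is one way to turn "do not use both" into an error rather than a convention.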


--key

Generates a key file for use by the allwords-scorer2.pl program, instead of a file that can be used as input to wsd.pl. The allwords-scorer2.pl program can be used to measure the performance of a word sense disambiguation program. See the documentation for scorer2-format.pl and allwords-scorer2.pl for more information.
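The difference between the normal output and the key output can be sketched in Python. Several details here are assumptions for illustration only:

- the lemma#pos layout for normal output;
- the lemma#pos#sense layout for key output;
- the mapping from Penn Treebank tags to WordNet part-of-speech letters;
- the hypothetical format_tokens helper.

See the wsd.pl and scorer2-format.pl documentation for the formats the tools actually use. The tokens are made up.

```python
# Assumed mapping from Penn Treebank tag prefixes to WordNet POS letters.
WN_POS = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}


def format_tokens(tokens, key=False):
    """Join (lemma, penn_pos, sense) tuples into one output line.

    Hypothetical layout: 'lemma#pos' normally, 'lemma#pos#sense' with key=True.
    Tokens whose tag has no WordNet counterpart are skipped.
    """
    out = []
    for lemma, penn_pos, sense in tokens:
        wn_pos = WN_POS.get(penn_pos[:2])
        if wn_pos is None:
            continue
        item = f"{lemma}#{wn_pos}"
        if key:
            item += f"#{sense}"
        out.append(item)
    return " ".join(out)


tokens = [("jury", "NN", 1), ("say", "VBD", 1), ("investigation", "NN", 1)]
print(format_tokens(tokens))            # jury#n say#v investigation#n
print(format_tokens(tokens, key=True))  # jury#n#1 say#v#1 investigation#n#1
```

Both modes share the same filtered tokens. Only the trailing sense number differs, which lets a scorer line the key up against a system's answers.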


 Jason Michelizzi

 Varada Kolhatkar, University of Minnesota, Duluth
 kolha002 at d.umn.edu

 Ted Pedersen, University of Minnesota, Duluth
 tpederse at d.umn.edu

This document was last modified by: $Id: semcor-reformat.pl,v 1.17 2009/05/22 19:16:38 kvarada Exp $


wsd-experiments.pl, scorer2-format.pl, scorer2-sort.pl, allwords-scorer2.pl


Copyright (C) 2005-2008 by Jason Michelizzi and Ted Pedersen

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
