Module Version: 0.2.2


WordNet::BestStem -- get the best guess stem of a word.




  my $best = best_stem( 'roses', {V=>1} );


Based on the assumption that the stem is the form with the highest occurrence frequency in a text corpus. This is of course not always true, but for certain purposes it may be justifiable to treat the most frequent form as the stem.

Finds a word's variant forms and returns the form (with its part of speech) that has the highest frequency according to ICFinder's "information content file", which comes by default with WordNet but can be customized.

ICFinder has frequency counts for the n and v parts of speech but not for a or r. When a or r is involved, the number of senses for each part of speech is used instead of the frequency of the word / part-of-speech pair to choose the form.

Alternatively, best_stem can use a custom word variant frequency table.
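
The selection rule described above can be illustrated with a minimal, self-contained sketch. It does not use WordNet or this module: the %freq table, its counts, and the most_frequent_form helper are hypothetical and exist only for illustration. The layout that best_stem expects for its custom frequency table is not shown here and may differ.

```perl
use strict;
use warnings;

# Hypothetical frequency table: word form => occurrence count.
# The counts are illustrative only, not read from WordNet.
my %freq = (
    rose  => 5,
    rise  => 17,
    roses => 1,
);

# Return the candidate form with the highest count,
# or an empty string if none of the candidates is in the table.
sub most_frequent_form {
    my ($freq, @candidates) = @_;
    my ($best, $best_count) = ('', -1);
    for my $form (@candidates) {
        my $count = $freq->{$form};
        next unless defined $count;
        ($best, $best_count) = ($form, $count) if $count > $best_count;
    }
    return $best;
}

print most_frequent_form(\%freq, qw(rose rise)), "\n";   # prints: rise
```

With these made-up counts, "rise" wins over "rose" simply because its count is higher, which is the same trade-off the assumption above accepts.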



Returns in list context the best guess stem form, part-of-speech, and frequency; returns in scalar context the stem form.

*Note: WordNet does not at the moment have variant forms for very high frequency words, like "what", "the", "would". best_stem returns an empty string in such cases.

Default options (case insensitive):

  V     => 0,         # verbose. for debugging / checking
  FRE   => undef,     # % ref to custom word variant frequency table


  use WordNet::BestStem qw( best_stem );

  print best_stem('misgivings');          # misgiving n 8
  print best_stem('roses');               # rose n 5
  print best_stem('rose');                # rise v 17

Compared to WordNet::stem,

  use WordNet::QueryData;
  use WordNet::stem;

  my $WN = WordNet::QueryData->new();
  my $stemmer = WordNet::stem->new($WN);

  print $stemmer->stemWord('misgivings');  # misgiving
  print $stemmer->stemWord('roses');       # rose
  print $stemmer->stemWord('rose');        # rose rise

Compared to Lingua::Stem::En,

  use Lingua::Stem::En qw( stem );

  $stems = stem( { -words => ['misgivings'] } );
  print @$stems;                          # misgiv

  $stems = stem( { -words => ['roses'] } );
  print @$stems;                          # rose

  $stems = stem( { -words => ['rose'] } );
  print @$stems;                          # rose


Uses contextual information, i.e. the appearances of word forms in the paragraph or corpus, to help choose the stem form.

Default options (case insensitive):

  V     => 0,
  FRE   => undef,    # % ref to custom word variant frequency table
  STEM  => undef,    # % ref to stem_of{string} table per best_stem


  use WordNet::BestStem qw( deluxe_stems );

  my $stemmed_text = deluxe_stems \@text;

or in list context

    # ref to @, %, %, %
  my ($stemmed, $stem_of, $stem_fre, $str_fre) = deluxe_stems \@paragraph;

For two paragraphs / sentences,

  a) beautiful roses i would like a long stem rose
  b) he thinks that average salary rose in the last few years


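  # @a and @b hold the words of paragraphs a) and b)
  my $a_ = deluxe_stems \@a;
  my $b_ = deluxe_stems \@b;
  print "@$a_\n";
    # beautiful rose i would like a long stem rose
  print "@$b_\n";
    # he think that average salary rise in the last few year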

Compared to best_stem,

  my @a_ = map { scalar( best_stem $_ ) || $_ } @a;
  my @b_ = map { scalar( best_stem $_ ) || $_ } @b;
  print "@a_\n";
    # beautiful rose i would like a long stem rise
  print "@b_\n";
    # he think that average salary rise in the last few year
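
The difference between the two results can be sketched as follows. This is not the module's actual algorithm, only one plausible reading of "contextual info": an ambiguous word takes the candidate stem that is already supported by unambiguous words in the same paragraph, and otherwise falls back to the globally preferred stem. The %candidates table and the contextual_stems helper are hypothetical and do not query WordNet.

```perl
use strict;
use warnings;

# Hypothetical candidate table: surface form => possible stems,
# with the globally preferred stem listed first. Illustrative data only.
my %candidates = (
    roses  => ['rose'],
    rose   => ['rise', 'rose'],
    thinks => ['think'],
    years  => ['year'],
);

sub contextual_stems {
    my ($cand, @words) = @_;

    # First pass: count stems of words that have exactly one candidate.
    my %support;
    for my $w (@words) {
        my $c = $cand->{$w} or next;
        $support{ $c->[0] }++ if @$c == 1;
    }

    # Second pass: pick the candidate with the most support in this
    # paragraph; on a tie the first (globally preferred) candidate stays.
    my @out;
    for my $w (@words) {
        my $c = $cand->{$w};
        if (!$c) { push @out, $w; next; }   # unknown word: keep as is
        my $best = $c->[0];
        for my $s (@$c) {
            $best = $s if ($support{$s} // 0) > ($support{$best} // 0);
        }
        push @out, $best;
    }
    return @out;
}

my @a = qw(beautiful roses i would like a long stem rose);
my @b = qw(he thinks that average salary rose in the last few years);
print join(' ', contextual_stems(\%candidates, @a)), "\n";
    # beautiful rose i would like a long stem rose
print join(' ', contextual_stems(\%candidates, @b)), "\n";
    # he think that average salary rise in the last few year
```

In paragraph a) the unambiguous "roses" lends support to the stem "rose", so the final "rose" is not turned into "rise"; paragraph b) offers no such support, so the globally preferred "rise" is used.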


  WordNet


~~~~~~~~~~~~ ~~~~~ ~~~~~~~~ ~~~~~ ~~~ `` ><(((">

Copyright (C) 2009 Maggie J. Xiong < maggiexyz >

All rights reserved. There is no warranty. You are allowed to redistribute this software / documentation as Perl itself.
