NAME

WebService::GoogleHack::Rate - Implements simple semantic relatedness measures and semantic orientation functions.

SYNOPSIS

    use WebService::GoogleHack::Rate;

    #GIVE PATH TO INPUT FILE HERE

    my $INPUTFILE="";

    #GIVE PATH TO TRACE FILE HERE

    my $TRACEFILE="";

    #create an object of type Rate

    my $rate = WebService::GoogleHack::Rate->new(); 

    my $results = $rate->measureSemanticRelatedness1("dog", "cat");

    # The PMI measure is stored in the variable $results, and it can also
    # be accessed as $rate->{'PMI'}

    $results = $rate->predictSemanticOrientation($INPUTFILE, "excellent", "bad", $TRACEFILE);

    # The results can be accessed through either the returned value or the object
    print $results->{'prediction'}."\n";
    print $results->{'PMI Measure'}."\n";
    print $rate->{'prediction'}."\n";
    print $rate->{'PMI Measure'}."\n";

DESCRIPTION

This package uses Google to do some basic natural language processing. For example, given two words, say "knife" and "cut", the module can retrieve a semantic relatedness measure, commonly known as the PMI (pointwise mutual information) measure. The larger the measure, the more related the words are. The package can also predict the semantic orientation of a given paragraph of English text: a positive measure means that the paragraph has a positive meaning, and a negative measure means the opposite.

PACKAGE METHODS

__METHOD__->new()

Purpose: This function creates an object of type Rate and returns a blessed reference.

__METHOD__->init(Params Given Below)

Purpose: This function can be used to initialize the member variables.

Valid arguments are :

  • key

    string. The key to the Google API

  • File_location

    string. The WSDL file name

__METHOD__->measureSemanticRelatedness1(searchString1,searchString2)

Purpose: This function is used to measure the relatedness between two words.

Formula used: log(hits(w1)) + log(hits(w2)) - log(hits(w1w2))

Valid arguments are :

  • searchString1

    string. The search string which can be a phrase or word

  • searchString2

    string. The search string which can be a phrase or word

Returns: Returns the object containing the relatedness measure.
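
The formula above can be tried offline with a minimal sketch. The helper name relatedness1 and the hit counts are hypothetical and are not part of this module. The natural logarithm is assumed, since the formula does not state a base.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WebService::GoogleHack::Rate.
# Evaluates log(hits(w1)) + log(hits(w2)) - log(hits(w1w2)).
sub relatedness1 {
    my ($hits_w1, $hits_w2, $hits_both) = @_;

    # log() of zero is an error in Perl, so give up on non-positive counts.
    for my $count ($hits_w1, $hits_w2, $hits_both) {
        return unless defined $count && $count > 0;
    }
    return log($hits_w1) + log($hits_w2) - log($hits_both);
}

# Made-up hit counts, for illustration only.
my $score = relatedness1(1000, 2000, 50);
print defined $score ? sprintf("%.4f\n", $score) : "undefined\n";
```

The guard returns undef when any count is zero. How the module itself handles that case is not described here.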

__METHOD__->measureSemanticRelatedness2(searchString1,searchString2)

Purpose: This function is used to measure the relatedness between two words.

Formula used: log( hits(w1w2) / (hits(w1) + hits(w2)))

Valid arguments are :

  • searchString1

    string. The search string which can be a phrase or word

  • searchString2

    string. The search string which can be a phrase or word

Returns: Returns the object containing the relatedness measure.
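
As a rough illustration of this second formula, the sketch below evaluates it on made-up hit counts. The helper name relatedness2 is hypothetical, and the natural logarithm is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WebService::GoogleHack::Rate.
# Evaluates log(hits(w1w2) / (hits(w1) + hits(w2))).
sub relatedness2 {
    my ($hits_w1, $hits_w2, $hits_both) = @_;
    my $total = $hits_w1 + $hits_w2;

    # Avoid division by zero and log(0).
    return unless $total > 0 && $hits_both > 0;
    return log($hits_both / $total);
}

# Made-up hit counts, for illustration only.
my $score = relatedness2(1000, 2000, 50);
printf "%.4f\n", $score;
```

If the counts were exact, the joint count could not exceed either individual count, so the ratio would be at most 1/2 and the score negative. Scores closer to zero would then indicate more frequent co-occurrence. Search engine estimates may not obey this bound.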

__METHOD__->measureSemanticRelatedness3(searchString1,searchString2)

Purpose: This function is used to measure the relatedness between two words.

Formula used: log( hits(w1w2) / (hits(w1) * hits(w2)))

Valid arguments are :

  • searchString1

    string. The search string which can be a phrase or word

  • searchString2

    string. The search string which can be a phrase or word

Returns: Returns the object containing the relatedness measure.
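
As documented, this formula is the negation of the one listed for measureSemanticRelatedness1, because log(a / (b * c)) = log(a) - log(b) - log(c). The sketch below checks that numerically. The helper names and hit counts are hypothetical, and the natural logarithm is assumed.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of WebService::GoogleHack::Rate.
# formula1: log(hits(w1)) + log(hits(w2)) - log(hits(w1w2))
# formula3: log(hits(w1w2) / (hits(w1) * hits(w2)))
sub formula1 { my ($h1, $h2, $h12) = @_; return log($h1) + log($h2) - log($h12); }
sub formula3 { my ($h1, $h2, $h12) = @_; return log($h12 / ($h1 * $h2)); }

# Made-up hit counts, for illustration only.
my @counts = (1000, 2000, 50);
my $f1 = formula1(@counts);
my $f3 = formula3(@counts);

printf "formula1 = %.4f, formula3 = %.4f\n", $f1, $f3;
print "formula3 equals -formula1\n" if abs($f1 + $f3) < 1e-9;
```

Under formula 3, higher values correspond to more co-occurrence relative to the individual counts. Under formula 1 as written, the direction is reversed.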

__METHOD__->predictSemanticOrientation(infile,posInf, negInf,trace)

Purpose: This function tries to predict the semantic orientation of a paragraph of text.

Valid arguments are :

  • infile

    string. The location of the review file

  • posInf

    string. A positive inference word, such as "excellent"

  • negInf

    string. A negative inference word, such as "poor"

  • trace

    string. The location of the trace file. If a file name is given, the results are stored in this file

Returns: The PMI measure and the prediction, which is 0 or 1.

__METHOD__->predictWordSentiment(infile,posInf,negInf,html,trace)

Purpose: Given a file containing text, this function tries to find the positive and negative words. The formula used to calculate the sentiment of a word is based on the PMI-IR formula given in Peter Turney's paper.

              hits(word AND "excellent") * hits("poor")
         log2 -----------------------------------------
              hits(word AND "poor") * hits("excellent")

For more information, refer to the paper "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews" by Peter Turney.

  • infile

    string. The input file

  • posInf

    string. A positive word such as "Excellent"

  • negInf

    string. A negative word such as "Bad"

  • html

    string. Set to "true" if you want the results to be HTML formatted

  • trace

    string. Set to a file name if you want the results to be written to that file.

Returns: An HTML or text version of the results.
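
The PMI-IR formula above can be sketched with plain arithmetic on hit counts. The helper names log2 and word_orientation, the argument names, and the counts are all hypothetical. This is not the module's implementation.

```perl
use strict;
use warnings;

# Perl's log() is the natural logarithm; convert to base 2.
sub log2 { my ($x) = @_; return log($x) / log(2); }

# Hypothetical helper, not part of WebService::GoogleHack::Rate.
# Arguments are hit counts:
#   with_pos: hits(word AND posInf)    with_neg: hits(word AND negInf)
#   pos:      hits(posInf)             neg:      hits(negInf)
sub word_orientation {
    my (%hits) = @_;
    my $numerator   = $hits{with_pos} * $hits{neg};
    my $denominator = $hits{with_neg} * $hits{pos};

    # Avoid division by zero and log(0).
    return unless $numerator > 0 && $denominator > 0;
    return log2($numerator / $denominator);
}

# Made-up hit counts, for illustration only.
my $so = word_orientation(with_pos => 400, with_neg => 100, pos => 2000, neg => 1000);
printf "%.2f\n", $so;   # prints 1.00
```

A positive score means the word co-occurs relatively more often with the positive word than with the negative one.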

__METHOD__->predictPhraseSentiment(infile,posInf,negInf,html,trace)

Purpose: Given a file containing text, this function tries to find the positive and negative phrases. The formula used to calculate the sentiment of a phrase is based on the PMI-IR formula given in Peter Turney's paper.

              hits(phrase AND "excellent") * hits("poor")
         log2 -------------------------------------------
              hits(phrase AND "poor") * hits("excellent")

For more information, refer to the paper "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews" by Peter Turney.

  • infile

    string. The input file

  • posInf

    string. A positive word such as "Excellent"

  • negInf

    string. A negative word such as "Bad"

  • html

    string. Set to "true" if you want the results to be HTML formatted

  • trace

    string. Set to a file name if you want the results to be written to that file.

Returns: An HTML or text version of the results.
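
Phrases are rarer than single words, so a co-occurrence count of zero is likely. Turney's paper describes adding 0.01 to the hits to avoid division by zero. Whether this module applies the same adjustment is not stated here. The sketch below applies that constant to the two co-occurrence counts, which is an assumption about where it belongs. The helper name phrase_orientation, the argument names, and the counts are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WebService::GoogleHack::Rate.
# Same ratio as the formula above, with a small constant added to the
# co-occurrence counts so that neither can be zero.
sub phrase_orientation {
    my (%hits) = @_;
    my $smooth = 0.01;

    # The counts for the positive and negative words must be non-zero.
    return unless $hits{pos} > 0 && $hits{neg} > 0;

    my $numerator   = ($hits{with_pos} + $smooth) * $hits{neg};
    my $denominator = ($hits{with_neg} + $smooth) * $hits{pos};
    return log($numerator / $denominator) / log(2);
}

# A phrase never seen with the negative word still gets a finite score.
my $so = phrase_orientation(with_pos => 12, with_neg => 0, pos => 2000, neg => 1000);
printf "%.2f\n", $so;
```

With smoothing, a zero count yields a large but finite score instead of an error.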

AUTHOR

Pratheepan Raveendranathan, <rave0029@d.umn.edu>

Ted Pedersen, <tpederse@d.umn.edu>

BUGS

SEE ALSO

WebService::GoogleHack home page - http://google-hack.sourceforge.net

Pratheepan Raveendranathan - http://www.d.umn.edu/~rave0029/research

Ted Pedersen - www.d.umn.edu./~tpederse

Google-Hack Mailing List <google-hack-users@lists.sourceforge.net>

COPYRIGHT AND LICENSE

Copyright (c) 2005 by Pratheepan Raveendranathan, Ted Pedersen

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.