Jeffrey Baker > Finance-CompanyNames-1 > Finance::CompanyNames

Module Version: 1

NAME

Finance::CompanyNames - Functions for finding company names in English free text

SYNOPSIS

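    use Finance::CompanyNames;

    # Keys are conventionally stock tickers; values are company names.
    my $corps = {
        MSFT => 'Microsoft'
      , INTC => 'Intel'
      # , ... more KEY => 'Company Name' pairs
    };

    Finance::CompanyNames::Init($corps);

    my $freetext = 'Shares of Intel rose while Microsoft fell.';
    my $hashref  = Finance::CompanyNames::Match($freetext);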

DESCRIPTION

Finance::CompanyNames finds company names in English text. The user supplies the company names to look for and the body of text to search, and the module then uses natural language processing techniques to find those names or their variants in the text. For example, if a company is referred to alternately as "XYZ", "XYZ Corp.", "XYZ Corporation", and "The XYZ Corporation", Finance::CompanyNames recognizes all of these variants.
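
The module's own matching relies on its English word and stem lists, and its internals are not described here. Purely to illustrate the variant problem, the following Perl sketch builds a regular expression that accepts a base name with an optional leading "The" and an optional corporate suffix. The helper variant_regex and its suffix list (Corp., Corporation, Inc.) are hypothetical assumptions, not part of Finance::CompanyNames.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Finance::CompanyNames): returns a regex
# matching $base with an optional leading "The" and an optional corporate
# suffix. The suffix list below is an assumption chosen for illustration.
sub variant_regex {
    my ($base) = @_;
    my $suffix = qr/(?:Corp(?:oration|\.)?|Inc\.?)/;
    return qr/\b(?:[Tt]he\s+)?\Q$base\E(?:\s+$suffix)?(?!\w)/;
}

my $re   = variant_regex('XYZ');
my $text = 'XYZ said today that XYZ Corp. and The XYZ Corporation '
         . 'are the same company as XYZ Corporation.';

# Collect every non-overlapping match, scanning left to right.
my @found = $text =~ /($re)/g;
print "$_\n" for @found;
```

A fixed pattern like this cannot handle the open-ended variants that word and stem lists are meant to cover, so treat it only as an illustration of what "recognizing variants" means.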

INTERFACE

Initialization

You must call Finance::CompanyNames::Init() before any other function. Its single argument is a reference to a hash. Conventionally the keys are stock tickers and the values are company names, but you are free to use anything for the keys.
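
Since Init() only needs a hash reference, the key/name pairs can come from any source. The Perl sketch below assumes a simple one-pair-per-line input format ("KEY Company Name") and a hypothetical loader, load_companies; neither is defined by the module. The Init() call itself is left commented out so the example runs without Finance::CompanyNames installed.

```perl
use strict;
use warnings;

# Assumed input format (not defined by Finance::CompanyNames): one
# "KEY Company Name" pair per line; blank lines and "#" comments are skipped.
my $data = <<'END';
MSFT Microsoft
INTC Intel
# Keys need not be tickers.
ACME-01 Acme Widgets
END

# Hypothetical loader: reads key/name pairs from a filehandle into a hash
# and returns a reference to it, the shape Init() expects.
sub load_companies {
    my ($fh) = @_;
    my %corps;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;
        # Split on the first run of whitespace; the rest of the line is the name.
        my ($key, $name) = split ' ', $line, 2;
        $corps{$key} = $name;
    }
    return \%corps;
}

# Read from an in-memory filehandle so the example needs no external file.
open my $fh, '<', \$data or die "cannot open in-memory data: $!";
my $corps = load_companies($fh);
close $fh;

# With the module installed, the reference would then be handed to Init():
#   use Finance::CompanyNames;
#   Finance::CompanyNames::Init($corps);
```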

Searching

Finance::CompanyNames::Match searches a body of text for company names. Its only argument is a scalar containing the text. It returns a reference to a hash whose values are themselves hash references. The keys are the stock ticker symbols of the companies found in the text, or whatever other keys you passed to Init(). Each value is a reference to a hash with two keys, "freq" and "contexts": "freq" is the number of times the company was seen in the text, and "contexts" is a reference to an array of the snippets of text in which the company is mentioned.

For example:

    $rv = {
        INTC => {
            freq     => 10,
            contexts => [
                "blah blah blah blah blah Intel blah blah blah blah",
                "blah Intel Corp. blah blah blah blah blah blah",
            ],
        },
    };
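
To show how this nested structure might be consumed, the following Perl sketch ranks the returned keys by "freq" and prints each company's contexts. The $rv data is invented to match the shape described above; in real use it would be the value returned by Finance::CompanyNames::Match().

```perl
use strict;
use warnings;

# Made-up data in the shape described above; in real use $rv would come
# from Finance::CompanyNames::Match($freetext).
my $rv = {
    INTC => { freq => 2, contexts => [ 'shares of Intel rose', 'Intel Corp. said' ] },
    MSFT => { freq => 1, contexts => [ 'Microsoft announced' ] },
};

# Rank keys by mention count, highest first; break ties alphabetically.
my @ranked = sort {
    $rv->{$b}{freq} <=> $rv->{$a}{freq} || $a cmp $b
} keys %$rv;

for my $key (@ranked) {
    my $hit = $rv->{$key};
    printf "%s: %d mention(s)\n", $key, $hit->{freq};
    print "    $_\n" for @{ $hit->{contexts} };
}
```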

NOTE

Please note that Finance::CompanyNames uses a large amount of memory. It loads a complete English wordlist as well as a list of English root words and their affixes, which takes approximately 20MB of memory on the author's computer. A future version may behave differently. Please mail the author if you have an improvement.

Also note that this module works only with English text, because its included word and stem lists are English.

AUTHORS

Finance::CompanyNames is a product of Gilder, Gagnon, Howe, & Co. LLC. Mail GGHC Skunkworks <cpan@gghcwest.com> regarding this software.

LICENSE

Finance::CompanyNames is distributed under the Artistic License, the same terms under which Perl itself is distributed.
