Module Version: 0.2

NAME

Statistics::Burst - Perl implementation of Kleinberg's Word Bursts algorithm

SYNOPSIS

  use Statistics::Burst;
  my $burstObj=Statistics::Burst::new();
  $burstObj->generateStates(3,.111,2);
  $burstObj->gamma(.5);
  $burstObj->setData(\@gap_space);
  $burstObj->process();
  $statesUsed=$burstObj->getStatesUsed();

DESCRIPTION

This Burst Module is an implementation of Kleinberg's Word Bursts algorithm. The paper describing the algorithm is located at http://www.cs.cornell.edu/home/kleinber/bhs.pdf.

Once a few parameters are set, the driver code passes the object a list of numbers: the time differences (gaps) between consecutive arrivals. For example, if you were modelling people arriving at a store entrance at 8:10, 8:20, 8:25 and 8:30, the list of numbers you would pass to Burst is (10,5,5).
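The store-entrance example can be reproduced with a few lines of Perl. This is only a sketch: the helper names to_minutes and arrival_gaps are made up for illustration and are not part of Statistics::Burst.

```perl
use strict;
use warnings;

# Convert an "H:MM" clock time to minutes since midnight.
sub to_minutes {
    my ($clock) = @_;
    my ($h, $m) = split /:/, $clock;
    return $h * 60 + $m;
}

# Return the gaps (in minutes) between consecutive arrivals.
# Hypothetical helper, not provided by the module.
sub arrival_gaps {
    my @minutes = sort { $a <=> $b } map { to_minutes($_) } @_;
    return map { $minutes[$_] - $minutes[$_ - 1] } 1 .. $#minutes;
}

my @gap_space = arrival_gaps(qw(8:10 8:20 8:25 8:30));
print "@gap_space\n";    # prints "10 5 5"
```

A list built this way can then be handed to setData as a reference, for example \@gap_space.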

Bursts can be used to model the popularity of words by defining what counts as an arrival, and what the arrival rate is, for a word. For instance, you could monitor RSS titles from Slashdot and treat each title that contains the word as an arrival of that word. The arrival rate could then be defined as the number of arrivals of a particular word per day, or as the number of times the word appears per heading (this would usually be less than 1).

With this information you would build the separation lists that will be processed by the Burst module.
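One possible way to build such separation lists is sketched below. The feed data, the minute-based timestamps, and the word_gaps helper are all assumptions made for illustration; the module itself does not fetch or parse feeds.

```perl
use strict;
use warnings;

# Hypothetical feed data: [ minutes since monitoring started, title ].
my @titles = (
    [   0, 'Perl 6 design notes posted' ],
    [  30, 'New Linux kernel released' ],
    [  45, 'Perl modules on CPAN pass a milestone' ],
    [  50, 'Perl conference announced' ],
    [ 120, 'Linux desktop survey' ],
);

# Collect the arrival times of each word, then turn them into gap lists.
# Hypothetical helper, not provided by the module.
sub word_gaps {
    my (@items) = @_;
    my %arrivals;
    for my $item (sort { $a->[0] <=> $b->[0] } @items) {
        my ($time, $title) = @$item;
        my %seen;    # count a word at most once per title
        for my $word (grep { length } split /\W+/, lc $title) {
            push @{ $arrivals{$word} }, $time unless $seen{$word}++;
        }
    }
    my %gaps;
    for my $word (keys %arrivals) {
        my @t = @{ $arrivals{$word} };
        $gaps{$word} = [ map { $t[$_] - $t[$_ - 1] } 1 .. $#t ];
    }
    return \%gaps;
}

my $gaps = word_gaps(@titles);
print "perl:  @{ $gaps->{perl} }\n";     # prints "perl:  45 5"
print "linux: @{ $gaps->{linux} }\n";    # prints "linux: 90"
```

Each value in the returned hash is an array reference, so a single word's list could be passed directly to setData.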

new

  $burst=Statistics::Burst::new();

Returns a burst object.

setState($lambda, [$index])

     $burst->setState(.112);
     $burst->setState(.24,1);

Allows you to set or change the lambda of a particular state. If $index is not specified, a new state is created.

generateStates($count,$rate, $sigma)

     $burst->generateStates(4,.123,1);

Allows the developer to generate the states programmatically. You specify the number of states to create [$count], the initial rate for state 0 [$rate], and [$sigma], the parameter that defines how much the states differ from one another.

The higher the sigma, the larger the difference between states.
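To give a feel for how sigma might spread the states apart, here is a rough sketch. It assumes the geometric scaling used in Kleinberg's paper, where state i has rate $rate * $sigma**$i. This documentation does not state the module's actual formula, so treat both the formula and the name sketch_state_rates as assumptions.

```perl
use strict;
use warnings;

# ASSUMPTION: state i has rate $rate * $sigma**$i, the geometric scaling
# described in Kleinberg's paper. The module's own formula may differ.
sub sketch_state_rates {
    my ($count, $rate, $sigma) = @_;
    return map { $rate * $sigma**$_ } 0 .. $count - 1;
}

# Same arguments as the SYNOPSIS call generateStates(3,.111,2).
printf "%.3f\n", $_ for sketch_state_rates(3, 0.111, 2);
# prints 0.111, 0.222 and 0.444 on separate lines
```

Under this assumption, a sigma of 1 would make every state identical, which is why values above 1 are the interesting ones.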

gamma($gamma)

     $burst->gamma(2);

A parameter for the transition cost. The larger the gamma value, the more expensive it is to move to a higher state.
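The effect of gamma can be illustrated with the cost model from Kleinberg's paper, in which moving up from state i to state j costs (j - i) * gamma * ln(n) for n gaps, and moving down is free. The sketch below assumes the module follows that model; the function name sketch_transition_cost is invented for this example.

```perl
use strict;
use warnings;

# ASSUMPTION: cost model taken from Kleinberg's paper, not from the module source.
# Moving up from state $i to state $j costs ($j - $i) * $gamma * ln($n),
# where $n is the number of gaps; staying put or moving down costs nothing.
sub sketch_transition_cost {
    my ($i, $j, $gamma, $n) = @_;
    return $j > $i ? ($j - $i) * $gamma * log($n) : 0;
}

printf "up one state:  %.3f\n", sketch_transition_cost(0, 1, 0.5, 100);
printf "up two states: %.3f\n", sketch_transition_cost(0, 2, 0.5, 100);
printf "down:          %.3f\n", sketch_transition_cost(2, 0, 0.5, 100);
```

Doubling gamma doubles every upward cost, so the automaton needs a longer or denser run of short gaps before entering a higher state pays off.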

setData($gap_space)

     $burst->setData([4,5,6,3,4,2]);

Sets the data that will be processed.

process

     $burst->process();

Triggers the calculation of the bursts.
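As a rough picture of what such a calculation involves, the following sketch finds the minimum-cost state sequence with a Viterbi-style dynamic program, using the exponential gap model and transition costs from Kleinberg's paper. It is an independent illustration under those assumptions, not the module's code; the function name, the rates (0.1 and 1), and the gamma value are hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: cost model from Kleinberg's paper; the module's internals may differ.
sub sketch_min_cost_states {
    my ($rates, $gamma, $gaps) = @_;
    my $n = @$gaps;
    return [] unless $n;
    my $k    = @$rates;
    my $ln_n = log($n);

    # Cost of seeing gap $x in state $i: -ln( rate * exp(-rate * x) ).
    my $emit = sub { my ($i, $x) = @_; $rates->[$i] * $x - log($rates->[$i]) };
    # Cost of moving from state $i to state $j (only upward moves cost anything).
    my $move = sub { my ($i, $j) = @_; $j > $i ? ($j - $i) * $gamma * $ln_n : 0 };

    # The automaton is taken to start in state 0 before the first gap.
    my @cost = map { $move->(0, $_) + $emit->($_, $gaps->[0]) } 0 .. $k - 1;
    my @back;    # $back[$t][$j] = best state at step $t - 1 given state $j at step $t
    for my $t (1 .. $n - 1) {
        my @next;
        for my $j (0 .. $k - 1) {
            my ($best, $best_cost);
            for my $i (0 .. $k - 1) {
                my $c = $cost[$i] + $move->($i, $j);
                ($best, $best_cost) = ($i, $c)
                    if !defined $best_cost || $c < $best_cost;
            }
            $back[$t][$j] = $best;
            $next[$j] = $best_cost + $emit->($j, $gaps->[$t]);
        }
        @cost = @next;
    }

    # Pick the cheapest final state and walk the back-pointers.
    my $state = 0;
    for my $j (1 .. $k - 1) { $state = $j if $cost[$j] < $cost[$state] }
    my @path = ($state);
    for (my $t = $n - 1; $t >= 1; $t--) {
        $state = $back[$t][$state];
        unshift @path, $state;
    }
    return \@path;
}

my $path = sketch_min_cost_states([0.1, 1], 0.5, [10, 10, 10, 1, 1, 1, 1, 10, 10, 10]);
print "@$path\n";    # prints "0 0 0 1 1 1 1 0 0 0"
```

The run of four short gaps is cheap enough in the faster state to outweigh the one-off cost of moving up, so those steps are labelled as a burst.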

getStatesUsed

  $array_ref=$burst->getStatesUsed();

Returns the states of the automaton for each step in the data set.
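One way to use the result is to collapse the state sequence into burst intervals. The sketch below assumes the returned array reference holds one integer state index per step, with 0 as the base state; the sample sequence and the burst_intervals helper are hypothetical.

```perl
use strict;
use warnings;

# Return [start, end, peak_state] for each run of steps spent above state 0.
# Hypothetical helper, not provided by the module.
sub burst_intervals {
    my ($states) = @_;
    my (@bursts, $start, $peak);
    for my $i (0 .. $#$states) {
        if ($states->[$i] > 0) {
            ($start, $peak) = ($i, 0) unless defined $start;
            $peak = $states->[$i] if $states->[$i] > $peak;
        }
        elsif (defined $start) {
            push @bursts, [ $start, $i - 1, $peak ];
            undef $start;
        }
    }
    # Close a burst that runs to the end of the data.
    push @bursts, [ $start, $#$states, $peak ] if defined $start;
    return @bursts;
}

# Hypothetical sequence standing in for the getStatesUsed() result.
my $states_used = [ 0, 0, 1, 2, 1, 0, 0, 1, 1 ];
printf "steps %d-%d, peak state %d\n", @$_ for burst_intervals($states_used);
# prints "steps 2-4, peak state 2" then "steps 7-8, peak state 1"
```

Step indices here refer to positions in the gap list passed to setData, so they can be mapped back to the original arrival times.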

AUTHOR

Copyright 2004-2005, Tommie M. Jones. All Rights Reserved. This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.