NAME

Search::ContextGraph - Run searches using a contextual network graph

SYNOPSIS

  use Search::ContextGraph;
  
  my $cg = Search::ContextGraph->new();
  $cg->load( 'file.tdm' );
  
  my %results = $cg->search( 'd1', 't34', 't12' );
  

DESCRIPTION

Search a document collection using a spreading activation search. The search algorithm represents the collection as a set of term and document nodes, connected to one another based on a co-occurrence matrix. If a word occurs in a document, we create an edge between the appropriate term and document node. Searches take place by spreading energy from a query node along the edges of the graph according to some simple rules. All result nodes exceeding a threshold T are returned. You can read a full description of this algorithm at http://www.nitle.org/papers/Contextual_Network_Graph.pdf.
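The following is a minimal, self-contained sketch of the idea described above, not the module's actual implementation. The toy collection, the spread() helper, and the propagation rule are all assumptions made for illustration: each node keeps the energy it receives and passes an equal share (energy divided by its degree) to every neighbor except the one the energy came from, stopping once the share is no larger than the threshold. The rules used by Search::ContextGraph itself may differ.

```perl
use strict;
use warnings;

# Hypothetical toy collection: document node => term nodes it contains.
my %docs = (
    d1 => [qw(t1 t2)],
    d2 => [qw(t2 t3)],
    d3 => [qw(t3 t4)],
);

# Build an undirected graph with one edge per term/document co-occurrence.
my %neighbors;
for my $doc ( keys %docs ) {
    for my $term ( @{ $docs{$doc} } ) {
        $neighbors{$doc}{$term} = 1;
        $neighbors{$term}{$doc} = 1;
    }
}

# Assumed rule (not taken from the module): a node accumulates the energy
# it receives and forwards energy/degree to each neighbor other than the
# sender. Propagation stops when that share is <= the threshold.
sub spread {
    my ( $graph, $energy, $node, $amount, $threshold, $from ) = @_;
    $energy->{$node} += $amount;

    my @all    = sort keys %{ $graph->{$node} };
    my $degree = scalar @all;
    return unless $degree;

    my $share = $amount / $degree;
    return if $share <= $threshold;

    my @next = grep { !defined $from || $_ ne $from } @all;
    spread( $graph, $energy, $_, $share, $threshold, $node ) for @next;
    return;
}

my $threshold = 10;
my %energy;
spread( \%neighbors, \%energy, 't1', 100, $threshold );

# Keep every node whose accumulated energy exceeds the threshold T.
my %results = map { $_ => $energy{$_} }
              grep { $energy{$_} > $threshold } keys %energy;

printf "%s => %s\n", $_, $results{$_}
    for sort { $results{$b} <=> $results{$a} || $a cmp $b } keys %results;
```

In this toy graph, document d2 does not contain the query term t1, yet it still receives energy through the shared term t2. That is the expanded recall effect. Document d3 is too far from the query node to receive any energy at this threshold.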

The search engine gives expanded recall (relevant results even when there is no keyword match) without the computational cost and patent issues associated with latent semantic indexing (LSI).

METHODS

new %PARAMS

Object constructor. Parameters include:

debug

Turns verbose mode on when true

energy

Initial energy assigned to the query node. Defaults to 10000.

threshold

Cutoff value for propagating energy to neighbor nodes

[get|set]_threshold

Accessor for threshold value. This value determines how far energy can spread in the graph

[get|set]_initial_energy

Accessor for initial energy value at the query node. The higher this value, the larger the result set
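A short usage sketch for the constructor parameters and accessors described above. The parameter and method names are those documented here; the numeric values are arbitrary illustrations, not recommended settings, and the guarded require is only there so the script still runs on a system where the module is not installed.

```perl
use strict;
use warnings;

# Parameter names come from the list above; the values are arbitrary.
my %params = ( debug => 0, energy => 10000, threshold => 5 );

# Guarded load: skip the calls if the module is not available.
my $have_module = eval { require Search::ContextGraph; 1 };

if ($have_module) {
    my $cg = Search::ContextGraph->new(%params);

    # A larger cutoff should stop energy from spreading as far
    # (inferred from the accessor descriptions above).
    $cg->set_threshold(20);

    # A smaller initial energy should give a smaller result set.
    $cg->set_initial_energy(5000);

    printf "threshold=%s initial_energy=%s\n",
        $cg->get_threshold, $cg->get_initial_energy;
}
else {
    print "Search::ContextGraph is not installed; skipping\n";
}
```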

load TDM_FILE [, DLM_FILE]

Opens and loads a term-document matrix (TDM) file to initialize the graph. Optionally also opens and loads a document link matrix (DLM) file. The TDM encodes term-to-document links, while the DLM holds inter-document links such as hyperlinks or citation data. For notes on these file formats, see the README file. Note that document-document links are NOT YET IMPLEMENTED.

search @NODES

Given a list of nodes, returns a hash of nearest nodes with relevance values, in the format NODE => RELEVANCE, for all nodes above the threshold value. Term nodes are prefixed by 't', document nodes are prefixed by 'd'. The module does not store the terms or documents themselves, so you are responsible for keeping your own map from node identifiers to the values they stand for.
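One way to work with the returned hash is sketched below. The %results data is made up and simply mimics the NODE => RELEVANCE shape; in real use it would come from $cg->search(...). The %doc_title and %term_text lookup tables are hypothetical stand-ins for whatever node map your application maintains.

```perl
use strict;
use warnings;

# Made-up result hash in the NODE => RELEVANCE shape described above.
my %results = ( d7 => 42.5, t12 => 310, d3 => 125, t34 => 18.75 );

# Hypothetical node-to-value maps maintained by the caller.
my %doc_title = ( d3  => 'Annual report', d7  => 'Meeting notes' );
my %term_text = ( t12 => 'budget',        t34 => 'forecast' );

# Separate documents from terms by prefix, then rank by relevance.
my @ranked_docs  = sort { $results{$b} <=> $results{$a} }
                   grep { /^d/ } keys %results;
my @ranked_terms = sort { $results{$b} <=> $results{$a} }
                   grep { /^t/ } keys %results;

print "Documents:\n";
printf "  %-4s %-15s %s\n", $_, $doc_title{$_} // '(unknown)', $results{$_}
    for @ranked_docs;

print "Related terms:\n";
printf "  %-4s %-15s %s\n", $_, $term_text{$_} // '(unknown)', $results{$_}
    for @ranked_terms;
```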

set_debug_mode

Turns verbose comments on if given a true value as its argument

_read_tdm FILE

Opens and reads a term-document matrix (TDM) file. The format for this file is described in the README

_energize NODE, ENERGY

Private method. Assigns a starting energy ENERGY to NODE, and recursively distributes the energy to neighbor nodes.

BUGS

AUTHOR

Maciej Ceglowski <maciej@ceglowski.com>

COPYRIGHT AND LICENSE

(C) 2003 Maciej Ceglowski, John Cuadrado, NITLE

This program may be distributed under the same terms as Perl itself.