NAME

Search::Indexer::Incremental::MD5::Indexer - Incrementally index your files

SYNOPSIS

  use File::Find::Rule ;
  use Search::Indexer::Incremental::MD5::Indexer ;
  
  use Readonly ;
  Readonly my $DEFAULT_MAX_FILE_SIZE_INDEXING_THRESHOLD => 300 << 10 ; # 300KB
  
  # get_perl_word_regex_and_stop_words() must be imported from the module that provides it
  my $indexer 
        = Search::Indexer::Incremental::MD5::Indexer->new
                (
                USE_POSITIONS => 1, 
                INDEX_DIRECTORY => 'text_index', 
                get_perl_word_regex_and_stop_words(),
                ) ;
  
  my @files = File::Find::Rule
                ->file()
                ->name( '*.pm', '*.pod' )
                ->size( "<=$DEFAULT_MAX_FILE_SIZE_INDEXING_THRESHOLD" )
                ->not_name(qr[auto | unicore | DateTime/TimeZone | DateTime/Locale]x)
                ->in('.') ;
  
  $indexer->add_files(FILES => \@files) ;
  $indexer->add_files(FILES => \@more_files) ; # @more_files: any further list of file names
  $indexer = undef ; # release the indexer object

DESCRIPTION

This module implements an incremental text indexer and searcher based on Search::Indexer.

DOCUMENTATION

Given a list of files, this module will allow you to create an indexed text database that you can later query for matches. You can also use the siim command line application installed with this module.

SUBROUTINES/METHODS

new( %named_arguments)

Create a Search::Indexer::Incremental::MD5::Indexer object.

  my $indexer = Search::Indexer::Incremental::MD5::Indexer->new(%named_arguments) ;

Arguments - %named_arguments

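%named_arguments - named arguments; the SYNOPSIS passes USE_POSITIONS and INDEX_DIRECTORY together with the values returned by get_perl_word_regex_and_stop_words()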

Returns - A Search::Indexer::Incremental::MD5::Indexer object

Exceptions -

Incomplete argument list
Error creating index directory
Error creating index metadata database
Error creating a Search::Indexer object

add_files($self, %named_arguments)

Adds the contents of the files passed as arguments to the index database. Files that are already indexed are checked and re-indexed only if their content has changed.
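The incremental behaviour described above can be illustrated with a small, self-contained sketch. It assumes, as the module name suggests, that a change is detected by comparing MD5 digests of the file content; it is not the module's own code, and the names md5_of_file, indexing_state and %recorded_digest are hypothetical.

```perl
use strict ;
use warnings ;
use Digest::MD5 ;
use File::Temp qw(tempfile) ;

# Compute the hex MD5 digest of a file's content.
sub md5_of_file
{
my ($file_name) = @_ ;

open my $fh, '<', $file_name or die "Can't open '$file_name': $!" ;
binmode $fh ;
my $digest = Digest::MD5->new->addfile($fh)->hexdigest ;
close $fh ;

return $digest ;
}

# Hypothetical bookkeeping: file name => digest recorded at the last indexing.
my %recorded_digest ;

# Returns 2 for a new file, 1 for changed content and 0 when up to date,
# mirroring the STATE values documented for add_files.
sub indexing_state
{
my ($file_name) = @_ ;

my $digest = md5_of_file($file_name) ;
my $previous = $recorded_digest{$file_name} ;
$recorded_digest{$file_name} = $digest ;

return 2 unless defined $previous ;
return $previous eq $digest ? 0 : 1 ;
}

# Exercise the sketch on a temporary file.
my ($fh, $file_name) = tempfile(UNLINK => 1) ;
print {$fh} "first version\n" ;
close $fh ;

my @states = (indexing_state($file_name), indexing_state($file_name)) ;

open $fh, '>', $file_name or die "Can't write '$file_name': $!" ;
print {$fh} "second version\n" ;
close $fh ;

push @states, indexing_state($file_name) ;
print "states: @states\n" ; # prints "states: 2 0 1"
```

The three calls see a new file, an unchanged file and a modified file, in that order.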

Arguments %named_arguments

FILES - Array reference - a list of files to add to the index. Each entry can be either:

Scalar - The name of the file to index

Hash reference - this is, for example, useful when you want to index the contents of a tarball

NAME - The name of the file to index
DESCRIPTION - A user-specific description string to be saved in the database

MAXIMUM_DOCUMENT_SIZE - Integer - a warning is displayed for documents larger than this size

DONE_ONE_FILE_CALLBACK - sub reference - called every time a file is handled

$file_name - the name of the file that was handled

$file_description - user-specific description of the file

$file_info - Hash reference

STATE - Boolean -

0 - up to date, no re-indexing necessary

1 - file content changed since last index, re-indexed

ID - integer - document id
TIME - Float - re-indexing time
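As a sketch of how the named arguments above fit together, the following builds an argument hash with both forms of FILES entry and a DONE_ONE_FILE_CALLBACK. The file names, description, id and timing are made up, and the callback is invoked by hand with sample data so the example runs without the module; the description passed for a scalar entry is also an assumption.

```perl
use strict ;
use warnings ;

my @log ;

# Named arguments in the documented shape; all values are illustrative.
my %arguments =
	(
	FILES =>
		[
		'lib/My/Module.pm', # scalar form: just a file name
		{NAME => '/tmp/extracted/README.pod', DESCRIPTION => 'README.pod from archive.tar.gz'},
		],
	MAXIMUM_DOCUMENT_SIZE => 300 << 10, # 300KB
	DONE_ONE_FILE_CALLBACK => sub
		{
		my ($file_name, $file_description, $file_info) = @_ ;

		my $status = $file_info->{STATE} ? 're-indexed' : 'up to date' ;
		push @log, sprintf '%s [%s] id=%d, %s in %.3fs',
			$file_name, $file_description, $file_info->{ID}, $status, $file_info->{TIME} ;
		},
	) ;

# With the module installed, the call would presumably be:
#   $indexer->add_files(%arguments) ;
# Here the callback is called directly with sample data instead.
$arguments{DONE_ONE_FILE_CALLBACK}->
	('lib/My/Module.pm', 'lib/My/Module.pm', {STATE => 1, ID => 7, TIME => 0.042}) ;

print "$_\n" for @log ;
```

Collecting the messages in an array rather than printing inside the callback keeps the reporting separate from the indexing run.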

Returns - Hash reference keyed on the file name

STATE - Integer -

0 - up to date, no re-indexing necessary

1 - file content changed since last index, re-indexed

2 - new file

ID - integer - document id
TIME - Float - re-indexing time

Exceptions

add_file($self, $name, $description)

Arguments

$self - the indexer object
$name - the name of the file to index
$description - a user-specific description of the file

Returns - Hash reference containing

STATE - Integer -

0 - up to date, no re-indexing necessary

1 - file content changed since last index, re-indexed

2 - new file

ID - integer - document id
TIME - Float - re-indexing time

Exceptions

remove_files(%named_arguments)

Removes the contents of the files passed as arguments from the index database.

Arguments %named_arguments

FILES - Array reference - a list of files to remove from the index

DONE_ONE_FILE_CALLBACK - sub reference - called every time a file is handled

$file_name - the name of the file removed

$file_description - description of the file

$file_info - Hash reference

STATE - Boolean -

0 - file not found

1 - file found and removed

ID - integer - document id
TIME - Float - removal time

Returns - Hash reference keyed on the file name

STATE - Boolean -

0 - file found and removed

1 - file not found

ID - integer - document id
TIME - Float - removal time

Exceptions

remove_document_with_id($id, $content)

Removes the document with the given id from the index database.

Arguments

$id - The id of the document to remove from the database
$content - The contents of the document or undef

Returns - Nothing

Exceptions - None

check_indexed_files(%named_arguments)

Checks the index database contents.

Arguments %named_arguments

DONE_ONE_FILE_CALLBACK - sub reference - called every time a file is handled

$file_name - the name of the file being checked

$description - description of the file

$file_info - Hash reference containing

STATE - Integer -

0 - file found and identical

1 - file found, content is different (needs re-indexing)

2 - file not found

ID - integer - document id
TIME - Float - check time

Returns - Hash reference keyed on the file name or nothing in void context

STATE - Integer -

0 - file found and identical

1 - file found, content is different (needs re-indexing)

2 - file not found

ID - integer - document id
TIME - Float - check time
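A returned hash of this shape can be grouped by STATE to see which files need attention. The sketch below works on hard-coded sample data rather than a real call to check_indexed_files; the file names, ids, times and the labels in @state_label are hypothetical.

```perl
use strict ;
use warnings ;

# Sample data in the documented return shape; all values are made up.
my $check_results =
	{
	'lib/A.pm' => {STATE => 0, ID => 1, TIME => 0.001},
	'lib/B.pm' => {STATE => 1, ID => 2, TIME => 0.002},
	'lib/C.pm' => {STATE => 2, ID => 3, TIME => 0.001},
	'lib/D.pm' => {STATE => 1, ID => 4, TIME => 0.003},
	} ;

# Index in this array == STATE value.
my @state_label = ('identical', 'changed', 'missing') ;

# Group the file names by the meaning of their STATE value.
my %files_by_state ;
for my $file_name (sort keys %{$check_results})
	{
	my $label = $state_label[ $check_results->{$file_name}{STATE} ] ;
	push @{ $files_by_state{$label} }, $file_name ;
	}

for my $label (@state_label)
	{
	my @files = @{ $files_by_state{$label} || [] } ;
	printf "%-9s: %s\n", $label, join(', ', @files) ;
	}
```

One plausible follow-up is to pass the 'changed' names to add_files and the 'missing' names to remove_files.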

Exceptions - None

remove_reference_to_unexisting_documents()

Checks the index database contents and removes any reference to documents that don't exist.

Arguments - None

Returns - Array reference containing the names of the documents that don't exist

Exceptions - None

BUGS AND LIMITATIONS

None so far.

AUTHOR

        Nadim ibn hamouda el Khemir
        CPAN ID: NKH
        mailto: nadim@cpan.org

LICENSE AND COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Search::Indexer::Incremental::MD5

You can also look for information at:

AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/Search-Indexer-Incremental-MD5

RT: CPAN's request tracker

Please report any bugs or feature requests to <bug-search-indexer-incremental-md5@rt.cpan.org>.

We will be notified, and then you'll automatically be notified of progress on your bug as we make changes.

Search CPAN

http://search.cpan.org/dist/Search-Indexer-Incremental-MD5
