The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.
package Search::Fulltext;
use strict;
use warnings;
use utf8;

use Carp;

our $VERSION = '1.03';
use Search::Fulltext::SQLite;

sub new {
    my ($class, @args) = @_;
    my %args = ref $args[0] eq 'HASH' ? %{$args[0]} : @args;

    unless ($args{docs}) { croak "'docs' is required for creating new instance of $class" }
    $args{index_file} = ":memory:" unless defined $args{index_file};
    $args{tokenizer}  = "simple"   unless defined $args{tokenizer};
    $args{sqliteh}    = Search::Fulltext::SQLite::->new(
        docs      => $args{docs},
        dbfile    => $args{index_file},
        tokenizer => $args{tokenizer},
    );

    bless {
        %args
    }, $class;
}

sub search {
    my ($self, $query) = @_;
    return [] unless $query;

    my $sqliteh = $self->{sqliteh};
    $sqliteh->search_docids($query);
}

1;
__END__

=encoding utf8

=head1 NAME

Search::Fulltext - Fulltext search module

=head1 SYNOPSIS

    use Search::Fulltext;
    
    my @docs = (
        'I like beer the best',
        'Wine makes people saticefied',  # does not include beer
        'Beer makes people happy',
    );
    
    my $fts = Search::Fulltext->new({
        docs => \@docs,
    });
    my $results = $fts->search('beer');
    is_deeply($results, [0, 2]);         # 1st & 3rd doc include 'beer'
    my $results = $fts->search('beer AND happy');
    is_deeply($results, [2]);            # 3rd doc includes both 'beer' & 'happy'

=head1 DESCRIPTION

L<Search::Fulltext> is a fulltext search module. It can be used in a few steps.

L<Search::Fulltext> has B<pluggable tokenizer> feature, which possibly provides fulltext search for any language.
Currently, B<English> and B<Japanese> fulltext search are officially supported,
although any other languages which have spaces for separating words could be also used.
See L<CUSTOM TOKENIZERS|/CUSTOM_TOKENIZERS> section to learn how to search non-English languages.

B<SQLite>'s B<FTS4> is used as an indexer.
Various queries supported by FTS4 (C<AND>, C<OR>, C<NEAR>, ...) are fully provided.
See L</QUERIES> section for details.

=head1 METHODS

=head2 Search::Fulltext->new

=pod

Creates fulltext index for documents.

=over 4

=item C<@param docs> B<[required]>

Reference to array whose contents are document to be searched.

=item C<@param index_file> B<[optional]>

File path to write fulltext index. By default, on-memory index is used.

=item C<@param tokenizer> B<[optional]>

Tokenizer name to use. C<simple> (default) and C<porter> must be supported.
C<icu> and C<unicode61> could be used if your SQLite libarary used via L<DBD::SQLite> module support them.
See L<http://www.sqlite.org/fts3.html#tokenizer> for more details on FTS4 tokenizers.

Japanese tokenizer C<perl 'Search::Fulltext::Tokenizer::MeCab::tokenizer'> is also available after you install
L<Search::Fulltext::Tokenizer::MeCab> module.

See L<CUSTOM TOKENIZERS|/CUSTOM_TOKENIZERS> section for developing other tokenizers.

=back

=cut

=head2 Search::Fulltext->search

Search terms in documents by query language.

=pod

=over 4

=item C<@returns>

Array of indexes of C<docs> passed through C<< Search::Fulltext->new >> in which C<query> is matched.

=item C<@param query>

Query to search from documents.
See L</QUERIES> section for types of queries.

=back

=cut

=head1 QUERIES

The simplest query would be a term.

    my $results = $fts->search('beer');

Other queries below and combination of them can be also used.

    my $results = $fts->search('beer AND happy');
    my $results = $fts->search('saticefied OR happy');
    my $results = $fts->search('people NOT beer');
    my $results = $fts->search('make*');
    my $results = $fts->search('"makes people"');
    my $results = $fts->search('beer NEAR happy');
    my $results = $fts->search('beer NEAR/1 happy');

See L<http://www.sqlite.org/fts3.html#section_3> for an explanation of each type of query.

B<NOTE:> Some custom tokenizers might not support full of these queries above.
Check the document of each tokenizer before using complex queries.

=head1 CUSTOM TOKENIZERS

Custom tokenizers can be implemented by pure perl thanks to L<DBD::SQLite/Perl_tokenizers>.
L<Search::Fulltext::Tokenizer::MeCab> is an example of custom tokenizers.

See L<DBD::SQLite/Perl_tokenizers> and L<Search::Fulltext::Tokenizer::MeCab> module to learn how to develop custom tokenizers.

=head1 SUPPORTS

Bug reports and pull requests are welcome at L<https://github.com/laysakura/Search-Fulltext> !

=head1 VERSION

Version 1.03

=head1 AUTHOR

Sho Nakatani <lay.sakura@gmail.com>, a.k.a. @laysakura