lib/Dezi/Tutorial.pod - metacpan.org

=pod

=head1 NAME

Dezi::Tutorial - getting started with the Dezi search platform

=head1 Installation

Install the Dezi server from CPAN:

 % cpan -i Dezi
 
Install the Dezi client from CPAN:

 % cpan -i Dezi::Client

=head1 Beginner - Hello World

Start the Dezi server:

 % dezi

In a separate terminal, add a small test document to the index:

 % echo '<doc><title>bar</title>hello world</doc>' > test.xml
 % dezi-client test.xml

Search the index to confirm your test document worked:

 % dezi-client -q bar
 
=head1 Intermediate - The Dezi Demo

The Intermediate tutorial details the specifics behind the Dezi
demo available at L<http://dezi.org/demo>.

=head2 Download the Reuters corpus

The Reuters News Corpus for Text Classification (Reuters-21578)
is a common document corpus used for information retrieval projects.
Other document collections have become more popular since the
Reuters corpus first appeared (e.g. Wikipedia database) but
the Reuters corpus is a nice, medium sized collection for demonstrating
Dezi.

You can find the corpus many places on the internet. The version
used for the demo came from L<http://svn.peknet.com/search_bench/>.
The C<2xml.pl> script at that URL will convert the original SGML
documents to valid XML and split them into about 21k individual documents.

Unpack the tar.gz file somewhere and run the C<2xml.pl> script as described
in the script's comments.

=head2 Create a Swish3 configuration file

As described in L<Dezi::Architecture>, Dezi is based on Swish3
L<http://swish-e.org/swish3/>. You can index the Reuters corpus
with the B<swish3> command that
comes with L<SWISH::Prog> (one of the Dezi dependencies).

First, you'll need a configuration file. Here's the one used
for the Dezi demo:

 DefaultContents XML*
 StoreDescription XML* <text> 10000
 PropertyNameAlias swishtitle title
 MetaNames dates topics people places orgs author swishdocpath
 PropertyNames dates topics people places orgs author dateline
 FuzzyIndexingMode Stemming_en1

Save the file as C<swish.conf>.

More details on Swish3 configuration can be found
at L<http://swish-e.org/docs/swish-config.html>.

=head2 Index the XML

If your Reuters docs are in a directory called C<reuters>,
you can create an index with a command like:

 % swish3 -c swish.conf -F lucy -f dezi.index -i reuters
 
You can index all kinds of document types, not just XML, but for
the purposes of this tutorial, we'll keep it simple.

=head2 Create a Dezi configuration file

Here's the contents of the demo config file, named C<dezi.config.pl>:

 {
    engine_config => {
        facets => { 
            names => [qw( topics people places orgs author )] 
        },
    },
    ui_class    => 'Dezi::UI',
    base_uri    => 'http://dezi.org/demo',
    username    => 'deziuser',
    password    => 'a-secret',
 }

B<NOTE> that the username/password is there to prevent unwanted
modification of the index. Since Dezi supports POST, PUT and DELETE
HTTP actions on an index, it's a good idea to protect an index,
particularly if it is on the open internet.

B<NOTE> too the C<Dezi::UI> class is enabled. That requires a separate
installation from CPAN.

 % cpan -i Dezi::UI

=head2 Start the Dezi server
 
 % dezi --dezi-config dezi.config.pl

From a separate terminal, you can search the index containing text from the Reuters
corpus:

 % dezi-client -q 'some words'

Thanks to the Dezi::UI module, you can also search via a web browser. Assuming
you are running the demo on a local machine, you can point your browser at
L<http://localhost:5000/ui> and explore the index contents graphically.

=head1 Advanced - Roll Your Own

=head2 Write your own client application

 % cat indexer.pl

 #!/usr/bin/env perl
 use strict;
 use warnings;
 
 use Dezi::Client;
 use File::Find;
 
 my $client = Dezi::Client->new( 
    server => 'http://localhost:5000' 
 );
 
 find({ 
    wanted      => \&add_to_index, 
    follow      => 1, 
    no_chdir    => 1,
 }, @ARGV);
 
 my $resp = $client->commit();

 print $resp->content;

 sub add_to_index {
    my $file = $File::Find::name;
    
    # we only want .xml files
    return unless $file =~ m/\.xml$/;
    
    my $resp = $client->index($file);
    if (!$resp->is_success) {
        die "Failed to index $file: " . $resp->status_line;
    }
 }

=head2 Start your Dezi server

 % dezi 

=head2 Run your indexer

In a separate terminal:

 % perl indexer.pl path/to/xml/docs

=head2 Search with dezi-client

After you're done indexing, look for something:

 % dezi-client -q foo

=head1 AUTHOR

Peter Karman, C<< <karman at cpan.org> >>

=head1 BUGS

Please report any bugs or feature requests to C<bug-dezi at rt.cpan.org>, or through
the web interface at L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Dezi>.  I will be notified, and then you'll
automatically be notified of progress on your bug as I make changes.

=head1 SUPPORT

You can find this documentation with the perldoc command.

    perldoc Dezi::Tutorial

You can also look for information at:

=over 4

=item * Website

L<http://dezi.org/>

=item * IRC

#dezisearch at freenode

=item * Mailing list

L<https://groups.google.com/forum/#!forum/dezi-search>

=item * RT: CPAN's request tracker

L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=Dezi>

=item * AnnoCPAN: Annotated CPAN documentation

L<http://annocpan.org/dist/Dezi>

=item * CPAN Ratings

L<http://cpanratings.perl.org/d/Dezi>

=item * Search CPAN

L<http://search.cpan.org/dist/Dezi/>

=back

=head1 COPYRIGHT & LICENSE

Copyright 2011 Peter Karman.

This program is free software; you can redistribute it and/or modify it
under the terms of either: the GNU General Public License as published
by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.

=head1 SEE ALSO

L<Dezi::Client>, L<Search::OpenSearch>, L<SWISH::3>, L<SWISH::Prog::Lucy>,
L<Plack>, L<Lucy>

=cut
	Global
`s`	Focus search bar
`?`	Bring up this help dialog
	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)
	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse
	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)