Module Version: 1.000006 (latest release: SWISH-3-1.000009)

NAME

SWISH::3 - Perl interface to libswish3

SYNOPSIS

 use SWISH::3;
 my $swish3 = SWISH::3->new(
                config      => 'path/to/config.xml',
                handler     => \&my_handler,
                regex       => qr/\w+(?:'\w+)*/,
                );
 $swish3->parse( 'path/to/file.xml' )
    or die "failed to parse file: " . $swish3->error;
 
 printf "libxml2 version %s\n", $swish3->xml2_version;
 printf "libswish3 version %s\n", $swish3->version;

DESCRIPTION

SWISH::3 is a Perl interface to the libswish3 C library.

CONSTANTS

All the SWISH_* constants defined in libswish3.h are available and can be optionally imported with the :constants keyword.

 use SWISH::3 qw(:constants);

See the SWISH::3::Constants section below.

In addition, the SWISH::3 Perl class defines some Perl-only constants:

SWISH_DOC_FIELDS

An array of method names that can be called on a SWISH::3::Doc object in your handler method.

SWISH_TOKEN_FIELDS

An array of method names that can be called on a SWISH::3::Token object.

SWISH_DOC_FIELDS_MAP

A hashref of method names to id integer values. The integer values are assigned in libswish3.h.

SWISH_DOC_PROP_MAP

A hashref of built-in property names to docinfo attribute names. The values of SWISH_DOC_PROP_MAP are the keys of SWISH_DOC_FIELDS_MAP.

FUNCTIONS

default_handler

The handler used if you do not specify one. By default it simply prints the contents of the SWISH::3::Data object to stderr.

CLASS METHODS

new( args )

args should be a list of key/value pairs. See SYNOPSIS.

Returns a new SWISH::3 instance.

xml2_version

Returns the libxml2 version used by libswish3.

version

Returns the libswish3 version.

refcount( object )

Returns the Perl reference count for object.

wc_report( codepoint )

Prints an isw* summary to stderr for codepoint. codepoint should be a positive integer representing a Unicode codepoint.

This prints a report similar to the swish_isw.c example script.

slurp( filename )

Returns the contents of filename as a scalar string. May also be called as an object method.

OBJECT METHODS

parse( filename_or_filehandle_or_string )

Wrapper around parse_file(), parse_buffer() and parse_fh() that tries to Do the Right Thing.

parse_file( filename )

Calls the C function of the same name on filename.

parse_buffer( str )

Calls the C function of the same name on str. Note that str should contain the API headers.

parse_fh( filehandle )

Not yet implemented.

error

Returns the error message from the last call to parse(), parse_file(), parse_buffer() or parse_fh(). If the last call to one of those methods did not produce an error, returns undef.

set_config( swish_3_config )

Set the Config object.

get_config

Returns SWISH::3::Config object.

config

Alias for get_config().

set_analyzer( swish_3_analyzer )

Set the Analyzer object.

get_analyzer

Returns SWISH::3::Analyzer object.

analyzer

Alias for get_analyzer().

set_parser( swish_3_parser )

Set the Parser object.

get_parser

Returns SWISH::3::Parser object.

parser

Alias for get_parser().

set_handler( \&handler )

Set the parser handler CODE ref.

get_handler

Returns a CODE ref for the handler.

set_data_class( class_name )

Default class_name is SWISH::3::Data.

get_data_class

Returns class name.

set_parser_class( class_name )

Default class_name is SWISH::3::Parser.

get_parser_class

Returns class name.

set_config_class( class_name )

Default class_name is SWISH::3::Config.

get_config_class

Returns class name.

set_analyzer_class( class_name )

Default class_name is SWISH::3::Analyzer.

get_analyzer_class

Returns class name.

set_regex( qr/\w+(?:'\w+)*/ )

Set the regex used in tokenize().
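To get a feel for what the pattern from the SYNOPSIS matches, it can be tried in plain Perl before passing it to set_regex(). This is only a sketch of the regex itself: the sample text is made up for illustration, and any further processing that tokenize() applies to the matches is not shown.

```perl
use strict;
use warnings;

# The pattern from the SYNOPSIS: runs of word characters, optionally
# joined by internal apostrophes (contractions, possessives).
my $regex = qr/\w+(?:'\w+)*/;

# Hypothetical sample text, chosen only for illustration.
my $text = "Don't panic: it's the user's guide.";

# In list context, a /g match with no capture groups returns every match.
my @tokens = $text =~ /$regex/g;

print join( '|', @tokens ), "\n";
```

Run as-is, this prints Don't|panic|it's|the|user's|guide, showing that apostrophes inside a word are kept while surrounding punctuation is dropped.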

get_regex

Returns the regex used in tokenize().

regex

Alias for get_regex().

get_stash

Returns the SWISH::3::Stash object used internally by the SWISH::3 object. You typically do not need to access this object as a user of SWISH::3, but if your handler function needs access to other objects, you can store them in the Stash object and retrieve them later.

Example:

 my $s3    = SWISH::3->new( handler => \&handler );
 my $stash = $s3->get_stash();
 $stash->set('my_indexer' => $indexer);
 
 # later..
 sub handler {
     my $data  = shift;
     my $indexer = $data->s3->get_stash->get('my_indexer');
     $indexer->add_doc( $data );
 }

tokenize( string [, metaname, context ] )

Returns a SWISH::3::TokenIterator object representing string. The tokenizer uses the regex defined in set_regex().

tokenize_native( string [, metaname, context ] )

Returns a SWISH::3::TokenIterator object representing string. The tokenizer uses the built-in libswish3 tokenizer, not a regex.
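The iterator returned by either method can be walked with next(), reading each token through the accessors listed under SWISH::3::Token. The helper below is a minimal sketch, not part of the module: token_values is a hypothetical name, and it assumes that next() returns a false value once the tokens are exhausted.

```perl
use strict;
use warnings;

# Collect the string value of every token from an iterator.
# $iterator is expected to behave like a SWISH::3::TokenIterator.
# ASSUMPTION: next() returns a false value when no tokens remain.
sub token_values {
    my ($iterator) = @_;
    my @values;
    while ( my $token = $iterator->next ) {
        push @values, $token->value;
    }
    return @values;
}

# Typical use, given a SWISH::3 object in $swish3:
#   my @words = token_values( $swish3->tokenize('some text to split') );
```

Because the helper only calls next() and value(), it should work the same way with the iterator from tokenize_native().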

DEVELOPER METHODS

ref_cnt

Returns the internal reference count for the underlying C struct pointer.

debug([n])

Get/set the internal debugging level.

describe( object )

Like calling Devel::Peek::Dump on object.

mem_debug

Calls the C function swish_memcount_debug().

get_memcount

Returns the global C malloc counter value.

dump

A wrapper around describe() and Data::Dump::dump().

SWISH::3::Analyzer

new( swish_3_config )

Returns a new SWISH::3::Analyzer instance.

set_regex( qr/\w+/ )

Set the regex used in SWISH::3->tokenize().

get_regex

Returns a qr// regex object.

get_tokenize

Get the tokenize flag. Default is true.

set_tokenize( 0|1 )

Toggle the tokenize flag. Default is true (tokenize contents when file is parsed).

SWISH::3::Config

set_default

set_properties

get_properties

set_metanames

get_metanames

set_mimes

get_mimes

set_parsers

get_parsers

set_aliases

get_aliases

set_index

get_index

set_misc

get_misc

debug

add(file_or_xml)

An alias for add() is merge().

delete

delete() is NOT YET IMPLEMENTED.

read( filename )

write( filename )

SWISH::3::Data

s3

Get the parent SWISH::3 object.

config

Get the parent SWISH::3::Config object.

property( name )

Returns the string value of PropertyName name.

metaname( name )

Returns the string value of MetaName name.

properties

Returns a hashref of name/value pairs.

metanames

Returns a hashref of name/value pairs.

doc

Returns a SWISH::3::Doc object.

tokens

Returns a SWISH::3::TokenIterator object.
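A handler receives one SWISH::3::Data object per parsed document, so the accessors above are what a handler typically works with. The following is a rough sketch under stated assumptions: doc_summary and my_handler are hypothetical names, only methods listed in this document are called, and property values are assumed to be plain strings.

```perl
use strict;
use warnings;

# Build a one-line summary from a SWISH::3::Data-like object, using only
# accessors listed in this document: doc() and properties() on the Data
# object, and uri(), mime() and size() on the Doc object.
sub doc_summary {
    my ($data) = @_;
    my $doc    = $data->doc;
    my $props  = $data->properties;    # hashref of name/value pairs

    my $summary = sprintf '%s [%s, %d bytes]', $doc->uri, $doc->mime, $doc->size;

    # Append properties in sorted order so the output is stable.
    # ASSUMPTION: each property value is a plain string (or undef).
    for my $name ( sort keys %$props ) {
        $summary .= sprintf ' %s=%s', $name, $props->{$name} // '';
    }
    return $summary;
}

# Hypothetical handler: print the summary for each parsed document.
sub my_handler {
    my ($data) = @_;
    print doc_summary($data), "\n";
    return;
}

# To use it:
#   my $swish3 = SWISH::3->new( handler => \&my_handler );
```

Keeping the formatting in a separate function from the handler makes it possible to exercise the logic with a stand-in object, without libswish3 installed.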

SWISH::3::Doc

mtime

Returns the last-modified time as an epoch integer.

size

Returns the size in bytes.

nwords

Returns the number of tokenized words in the Doc.

encoding

Returns the string encoding of Doc.

uri

Returns the URI value.

ext

Returns the file extension.

mime

Returns the mime type.

parser

Returns the name of the parser used (TXT, HTML, or XML).

action

Returns the intended action (e.g., add, delete, update).

SWISH::3::MetaName

new( name )

Returns a new SWISH::3::MetaName instance.

TODO: there are no set methods so this isn't of much use.

id

Returns the id integer.

name

Returns the name string.

bias

Returns the bias integer.

alias_for

Returns the alias_for string.

SWISH::3::MetaNameHash

get( name )

Get the SWISH::3::MetaName object for name.

set( name, swish_3_metaname )

Set the SWISH::3::MetaName for name.

keys

Returns an array of names.

SWISH::3::Property

id

Returns the id integer.

name

Returns the name string.

ignore_case

Returns the ignore_case boolean.

type

Returns the type integer.

verbatim

Returns the verbatim boolean.

max

Returns the max integer.

sort

Returns the sort boolean.

alias_for

Returns the alias_for string.

SWISH::3::PropertyHash

get( name )

Get the SWISH::3::Property object for name.

set( name, swish_3_property )

Set the SWISH::3::Property for name.

keys

Returns an array of names.

SWISH::3::Stash

get( key )

set( key, value )

keys

values

SWISH::3::Token

value

Returns the value string.

meta

Returns the SWISH::3::MetaName object for the Token.

meta_id

Returns the id integer for the related MetaName.

context

Returns the context string.

pos

Returns the position integer.

len

Returns the length in bytes of the Token.

SWISH::3::TokenIterator

next

Returns the next SWISH::3::Token.

SWISH::3::xml2Hash

get( key )

set( key, value )

keys

SWISH::3::Constants

The following constants are imported directly from libswish3 and are defined there.

SWISH_ALIAS
SWISH_BODY_TAG
SWISH_BUFFER_CHUNK_SIZE
SWISH_CASCADE_META_CONTEXT
SWISH_CLASS_ATTRIBUTES
SWISH_CONTRACTIONS
SWISH_DATE_FORMAT_STRING
SWISH_DEFAULT_ENCODING
SWISH_DEFAULT_METANAME
SWISH_DEFAULT_MIME
SWISH_DEFAULT_PARSER
SWISH_DEFAULT_PARSER_TYPE
SWISH_DEFAULT_VALUE
SWISH_DOM_CHAR
SWISH_DOM_STR
SWISH_ENCODING_ERROR
SWISH_ESTRAIER_FORMAT
SWISH_EXT_SEP
SWISH_FALSE
SWISH_FOLLOW_XINCLUDE
SWISH_HEADER_FILE
SWISH_HEADER_ROOT
SWISH_IGNORE_XMLNS
SWISH_INCLUDE_FILE
SWISH_INDEX
SWISH_INDEX_FILEFORMAT
SWISH_INDEX_FILENAME
SWISH_INDEX_FORMAT
SWISH_INDEX_LOCALE
SWISH_INDEX_STEMMER_LANG
SWISH_INDEX_NAME
SWISH_KINOSEARCH_FORMAT
SWISH_LATIN1_ENCODING
SWISH_LOCALE
SWISH_LUCY_FORMAT
SWISH_MAXSTRLEN
SWISH_MAX_FILE_LEN
SWISH_MAX_HEADERS
SWISH_MAX_SORT_STRING_LEN
SWISH_MAX_WORD_LEN
SWISH_META
SWISH_MIME
SWISH_MIN_WORD_LEN
SWISH_PARSERS
SWISH_PARSER_HTML
SWISH_PARSER_TXT
SWISH_PARSER_XML
SWISH_PATH_SEP_STR
SWISH_PREFIX_MTIME
SWISH_PREFIX_URL
SWISH_PROP
SWISH_PROP_DATE
SWISH_PROP_DBFILE
SWISH_PROP_DESCRIPTION
SWISH_PROP_DOCID
SWISH_PROP_DOCPATH
SWISH_PROP_ENCODING
SWISH_PROP_INT
SWISH_PROP_MIME
SWISH_PROP_MTIME
SWISH_PROP_NWORDS
SWISH_PROP_PARSER
SWISH_PROP_RANK
SWISH_PROP_RECCNT
SWISH_PROP_SIZE
SWISH_PROP_STRING
SWISH_PROP_TITLE
SWISH_RD_BUFFER_SIZE
SWISH_SPECIAL_ARG
SWISH_STACK_SIZE
SWISH_SWISH_FORMAT
SWISH_TITLE_METANAME
SWISH_TITLE_TAG
SWISH_TOKENIZE
SWISH_TOKENPOS_BUMPER
SWISH_TOKEN_LIST_SIZE
SWISH_TRUE
SWISH_UNDEFINED_METATAGS
SWISH_UNDEFINED_XML_ATTRIBUTES
SWISH_URL_LENGTH
SWISH_VERSION
SWISH_WORDS
SWISH_XAPIAN_FORMAT

AUTHOR

Peter Karman <perl@peknet.com>

COPYRIGHT

Copyright 2010 Peter Karman.

This file is part of libswish3.

libswish3 is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

libswish3 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

SEE ALSO

http://swish-e.org/

SWISH::Prog