
XWI.pm - class for internal representation of a document record

    use Combine::XWI;
    my $xwi = Combine::XWI->new;

    # single-value record variables
    $xwi->server($server);
    my $stored_server = $xwi->server();

    # original content (passed in and returned as a scalar reference)
    $xwi->content(\$html);
    my $text = ${ $xwi->content() };

    # multiple-value record variables
    $xwi->meta_add($name1, $value1);
    $xwi->meta_add($name2, $value2);
    $xwi->meta_rewind;
    my ($name, $content);
    while (1) {
        ($name, $content) = $xwi->meta_get;
        last unless $name;
        # process $name and $content here
    }

Provides methods for storing and retrieving structured records representing crawled documents.

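Single-value record variables are handled by AUTOLOAD: calling a method with an argument saves that value under the method's name, and calling the same method without arguments retrieves it. For example,
    $xwi->MyVar('My value');
    $t = $xwi->MyVar;
sets $t to 'My value'.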
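
To illustrate the AUTOLOAD mechanism described above, here is a minimal sketch of a class whose undefined methods act as combined setters and getters. The package name Sketch::Record is a hypothetical stand-in, and the code is not taken from Combine::XWI; the real module may store its values differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the Combine::XWI source.
package Sketch::Record;

our $AUTOLOAD;    # Perl stores the fully qualified name of the missing method here

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Every method call without an explicit definition is routed here.
sub AUTOLOAD {
    my ($self, @args) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                     # strip the package qualifier
    return if $name eq 'DESTROY';          # do not treat object destruction as a field
    $self->{$name} = $args[0] if @args;    # act as a setter when a value is passed
    return $self->{$name};                 # always return the current value
}

package main;

my $rec = Sketch::Record->new;
$rec->MyVar('My value');
my $t = $rec->MyVar;
print "$t\n";    # prints "My value"
```

One consequence of this design is that a misspelled accessor name does not raise an error; it silently creates or reads a different field.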
Forget all values.
*_rewind: the next call to *_get will start with the first value.
*_add: stores values into the data structure.
*_get: retrieves values from the data structure.
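
The add/rewind/get pattern can be sketched as a list of stored rows plus a read cursor. This is only an illustration under assumptions: the package Sketch::MultiValue and its method names add, rewind and get are hypothetical, the sample values are invented, and the actual Combine::XWI internals are not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one multiple-value record variable.
package Sketch::MultiValue;

sub new {
    my ($class) = @_;
    return bless { rows => [], pos => 0 }, $class;
}

# Append one row of values (for example a name/content pair).
sub add {
    my ($self, @fields) = @_;
    push @{ $self->{rows} }, [@fields];
    return;
}

# Move the read cursor back so the next get returns the first row.
sub rewind {
    my ($self) = @_;
    $self->{pos} = 0;
    return;
}

# Return the row under the cursor and advance; empty list when exhausted.
sub get {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{rows} };
    return @{ $self->{rows}[ $self->{pos}++ ] };
}

package main;

my $meta = Sketch::MultiValue->new;
$meta->add('author',   'A. Writer');            # invented sample data
$meta->add('keywords', 'crawler, indexing');

$meta->rewind;
my @seen;
while (my ($name, $content) = $meta->get) {
    push @seen, "$name=$content";
}
print join('; ', @seen), "\n";    # prints "author=A. Writer; keywords=crawler, indexing"
```

The sketch ends its loop when get returns an empty list, whereas the loop shown in this documentation stops as soon as the returned name is false.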
Stores the content of Meta-tags.
Takes/Returns 2 parameters: Name, Content
    $xwi->meta_add($name1, $value1);
    $xwi->meta_add($name2, $value2);
    $xwi->meta_rewind;
    my ($name, $content);
    while (1) {
        ($name, $content) = $xwi->meta_get;
        last unless $name;
    }
Extended information from Meta-tags. Not used.
Stores all URLs for this record (i.e. when the same page is reachable under multiple URLs).
Takes/Returns 1 parameter: URL
Stores headings from HTML documents
Takes/Returns 1 parameter: Heading text
Stores links from documents
Takes/Returns 5 parameters: URL, netlocid, urlid, Anchor text, Link type
Stores calculated information, such as genre, language, etc.
Takes/Returns 2 parameters: Name, Value. Both are strings; maximum lengths are Name: 15, Value: 20.
Stores result of topic classification.
Takes/Returns 5 parameters: Class, Absolute score, Normalized score, Terms, Algorithm id
Class, Terms, and Algorithm id are strings; maximum lengths are Class: 50 and Algorithm id: 25.
Absolute score and Normalized score are integers.
Normalized score and Terms are optional and may be replaced with 0 and '' respectively.
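
The constraints on the five classification parameters can be made concrete with a small checking routine. The function topic_args below is a hypothetical helper, not part of Combine::XWI, and the class name and algorithm id in the example are invented sample values; the sketch only applies the placeholders and limits stated above.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Combine::XWI: builds the five-element
# parameter list and checks the documented limits.
sub topic_args {
    my ($class, $abs_score, $norm_score, $terms, $alg_id) = @_;

    $norm_score = 0  unless defined $norm_score;    # optional: 0 as placeholder
    $terms      = '' unless defined $terms;         # optional: '' as placeholder

    die "Class longer than 50 characters\n"        if length($class) > 50;
    die "Algorithm id longer than 25 characters\n" if length($alg_id) > 25;
    for my $score ($abs_score, $norm_score) {
        die "Score '$score' is not an integer\n" unless $score =~ /\A-?\d+\z/;
    }

    return ($class, $abs_score, $norm_score, $terms, $alg_id);
}

# Invented sample values; the two optional fields are left undefined.
my @args = topic_args('CP.Aerodynamics', 42, undef, undef, 'std');
print join('|', @args), "\n";    # prints "CP.Aerodynamics|42|0||std"
```

With the real module, a list prepared this way would presumably be handed to the corresponding *_add method for classification results.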

Combine focused crawler main site http://combine.it.lth.se/

Yong Cao <tsao@munin.ub2.lu.se> v0.05 1997-03-13
Anders Ardö, <anders.ardo@it.lth.se>

Copyright (C) 2005,2006 Anders Ardö
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
See the file LICENCE included in the distribution at http://combine.it.lth.se/