XWI.pm - class for internal representation of a document record
    use Combine::XWI;
    $xwi = new Combine::XWI;

    # single value record variables
    $xwi->server($server);
    my $server = $xwi->server();

    # original content
    $xwi->content(\$html);
    my $text = ${$xwi->content()};

    # multiple value record variables
    $xwi->meta_add($name1,$value1);
    $xwi->meta_add($name2,$value2);
    $xwi->meta_rewind;
    my ($name,$content);
    while (1) {
        ($name,$content) = $xwi->meta_get;
        last unless $name;
    }
Provides methods for storing and retrieving structured records representing crawled documents.
Saves $val under the method name used in the call, via AUTOLOAD. The value can later be retrieved by calling the same method without arguments, e.g.

    $xwi->MyVar('My value');
    $t = $xwi->MyVar;

which sets $t to 'My value'.
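The accessor behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the Combine::XWI source: the package name Sketch::Record is hypothetical, and whether the real module stores values in a plain hash like this is an assumption made only for illustration.

```perl
package Sketch::Record;    # hypothetical stand-in, not Combine::XWI itself
use strict;
use warnings;

our $AUTOLOAD;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Any method that is not defined explicitly ends up here.
sub AUTOLOAD {
    my ($self, @args) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                     # strip the package qualifier
    return if $name eq 'DESTROY';          # do not treat destruction as a field
    $self->{$name} = $args[0] if @args;    # called with a value: store it
    return $self->{$name};                 # return the current value (assumed behaviour)
}

package main;

my $rec = Sketch::Record->new;
$rec->MyVar('My value');
my $t = $rec->MyVar;
print "$t\n";    # prints "My value"
```

One consequence of this style of accessor is that a misspelled method name does not raise an error; it simply reads or creates a different field.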
Forgets all stored values.
Rewinds the read position so that the next *_get call starts with the first value.
Stores values in the data structure.
Retrieves values from the data structure.
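The store, rewind and retrieve operations on multiple-value variables can be pictured as a list with a read cursor. The following is a sketch under stated assumptions rather than the module's actual implementation: the package Sketch::MultiValue is hypothetical, the method names mirror the meta_* calls from the synopsis, and the name meta_reset for the "forget all values" operation is assumed.

```perl
package Sketch::MultiValue;    # hypothetical stand-in, not Combine::XWI itself
use strict;
use warnings;

sub new {
    my ($class) = @_;
    return bless { rows => [], pos => 0 }, $class;
}

# Append one value tuple to the list.
sub meta_add {
    my ($self, @fields) = @_;
    push @{ $self->{rows} }, [@fields];
    return;
}

# Move the read cursor back to the first tuple.
sub meta_rewind {
    my ($self) = @_;
    $self->{pos} = 0;
    return;
}

# Forget all tuples (method name assumed for this sketch).
sub meta_reset {
    my ($self) = @_;
    $self->{rows} = [];
    $self->{pos}  = 0;
    return;
}

# Return the tuple under the cursor and advance; empty list when exhausted.
sub meta_get {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{rows} };
    return @{ $self->{rows}[ $self->{pos}++ ] };
}

package main;

my $list = Sketch::MultiValue->new;
$list->meta_add('author',   'A. Writer');
$list->meta_add('keywords', 'crawler');
$list->meta_rewind;

my @seen;
while (my ($name, $content) = $list->meta_get) {
    push @seen, "$name=$content";
}
print join(', ', @seen), "\n";    # prints "author=A. Writer, keywords=crawler"
```

In this sketch the loop ends when meta_get returns an empty list, so a stored name that happens to be false (such as '0') would not end the iteration early.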
Stores the content of Meta-tags
Takes/Returns 2 parameters: Name, Content
    $xwi->meta_add($name1,$value1);
    $xwi->meta_add($name2,$value2);
    $xwi->meta_rewind;
    my ($name,$content);
    while (1) {
        ($name,$content) = $xwi->meta_get;
        last unless $name;
    }
Extended information from Meta-tags. Not used.
Stores all URLs for this record (i.e. when several URLs refer to the same page)
Takes/Returns 1 parameter: URL
Stores headings from HTML documents
Takes/Returns 1 parameter: Heading text
Stores links from documents
Takes/Returns 5 parameters: URL, netlocid, urlid, Anchor text, Link type
Stores calculated information, like genre, language, etc
Takes/Returns 2 parameters: Name, Value. Both are strings, with maximum lengths Name: 15, Value: 20
Stores result of topic classification.
Takes/Returns 5 parameters: Class, Absolute score, Normalized score, Terms, Algorithm id
Class, Terms, and Algorithm id are strings, with maximum lengths Class: 50 and Algorithm id: 25
Absolute score and Normalized score are integers
Normalized score and Terms are optional and may be replaced with 0 and '' respectively
Combine focused crawler main site http://combine.it.lth.se/
Yong Cao <tsao@munin.ub2.lu.se> v0.05 1997-03-13
Anders Ardö, <anders.ardo@it.lth.se>
Copyright (C) 2005,2006 Anders Ardö
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
See the file LICENCE included in the distribution at http://combine.it.lth.se/