
NAME

Biblio::WebPortal - Perl extension for Digital Library support

SYNOPSIS

  use Biblio::WebPortal;

  my $diglib = mkdiglib($conf);

  $diglib->search( term => 'animal', regexp => 'water' );

  $diglib->asHTML();
  $diglib->asLaTeX();

  print $diglib->navigate(%vars);

DESCRIPTION

Biblio::WebPortal uses Biblio::Thesaurus and a configuration file to manage digital libraries in a simple way. For this purpose, a digital library is defined as a set of searchable catalogs plus an ontology for their subject. The Biblio::WebPortal configuration file lists the catalogs together with the information needed to parse each of them.

For this to be possible, there must be a way to access any kind of catalog: a plain text file, an XML document, an SQL database or anything else. The module solves this by having the user define functions that convert each storage technique into a common, mathematical definition of a catalog. The user therefore supplies four functions that make the module capable of using the catalog:

split the catalog (asList)

Given a string (say, a catalog identifier), the function should return a Perl array with all catalog entries. This array must be the same every time the function is called for the same catalog, so that entry positions can be used as stable indices. The function can treat this string as a filename, an SQL table identifier or anything else it understands.

terms for an entry (asRelations)

Given an entry in the format returned by the previous function, this function should return the terms related to the object catalogued by this entry. These terms are used later for thesaurus integration.

html from the entry (asHTML)

Given an entry, return a piece of HTML code to be embedded when listing records.

text from the entry (asText)

Given an entry, return the searchable text it contains.
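
As an illustration of the four roles, here is a minimal sketch for a hypothetical plain-text catalog: entries are separated by blank lines, each line has the form "Field: value", and subject terms sit on a "Subject:" line separated by semicolons. That layout and the %plain_catalog name are assumptions made for this example. asRelations returns a hash reference from relation type to terms, the same shape used by the asRelations function in the configuration example that follows.

```perl
use strict;
use warnings;

# Hypothetical callbacks for an assumed plain-text catalog format.
my %plain_catalog = (
    # split the catalog: same file => same list, in the same order
    asList => sub {
        my $file = shift;
        open(my $fh, '<', $file) or die "Cannot open $file: $!";
        my $text = do { local $/; <$fh> };   # slurp the whole file
        close $fh;
        return grep { /\S/ } split /\n\s*\n/, $text;
    },
    # terms for an entry: { subject => [terms...] }
    asRelations => sub {
        my $entry = shift;
        my ($subjects) = $entry =~ /^Subject:\s*(.*)$/m;
        return {} unless defined $subjects;
        return { subject => [ split /\s*;\s*/, $subjects ] };
    },
    # html from the entry: one <dt>/<dd> pair per field, values escaped
    asHTML => sub {
        my $entry = shift;
        my %esc  = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
        my $html = '';
        for my $line (split /\n/, $entry) {
            my ($field, $value) = $line =~ /^(\w+):\s*(.*)$/ or next;
            $value =~ s/([&<>])/$esc{$1}/g;
            $html .= "<dt>$field</dt><dd>$value</dd>";
        }
        return "<dl>$html</dl>";
    },
    # text from the entry: field values joined by spaces
    asText => sub {
        my $entry = shift;
        return join ' ', map { /^\w+:\s*(.*)$/ ? $1 : () } split /\n/, $entry;
    },
);
```

Because asList re-reads the file and splits it the same way on every call, entry positions stay stable as long as the file is unchanged, which is what the indexing relies on.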

The following example shows a sample configuration file:

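  my $userconf = {
    catalog   => "/var/library/catalog.xml",
    thesaurus => "/var/library/thesaurus",
    name      => 'libraryName',
    catsyn    => {
       # split the catalog file into its <entry>...</entry> records
       asList => sub { my $file = shift;
                 open(my $fh, '<', $file) or die "Cannot open $file: $!";
                 my $t = do { local $/; <$fh> };
                 return ($t =~ m{(<entry.*?</entry>)}gs); },
       # group the terms of each <rel tipo='...'> element by relation type
       asRelations => sub { my $f = shift;
                 my $data;
                 while ($f =~ m{<rel\s+tipo='(.*?)'>(.*?)</rel>}g)
                    { push @{$data->{$1}}, $2; }
                 return $data; },
       # HTML rendering delegated to an external helper
       asHTML  => sub { my $f = shift; mp::cat::fichacat2html($f) },
       asLaTeX => sub { ... },
       # turn the markup into a comma-separated list of values
       asText  => sub { my $f = shift;
                 $f =~ s{</?\w+}{ }g;
                 $f =~ s/(\s*[\n>"'])+\s*/,/g;
                 $f =~ s/\w+=//g;
                 $f =~ s/\s{2,}/ /g;
                 return $f; } } };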

When mkdiglib is called with this configuration, the module creates a directory named after the name field (libraryName here) and fills it with files of cached data for quick responses. The function returns a library object.

The configuration file can refer to more than one catalog file. This is done with the following syntax:

  my $userconf = {
    thesaurus => "/var/library/thesaurus",
    name      => 'libraryName',
    catalog   => [
      { file => "/var/library/catalog.xml",
        type => {
           asList      => sub { ... },
           asRelations => sub { ... },
           asHTML      => sub { ... },
           asText      => sub { ... },
           asLaTeX     => sub { ... },
        } },
      # several files sharing one format can be given as a list
      { file => ["/var/library/data1.db", "/var/library/data2.db"],
        type => {
           asList      => sub { ... },
           asRelations => sub { ... },
           asHTML      => sub { ... },
           asText      => sub { ... },
        } },
    ] };

After creating the library, it can be opened from another script with the opendiglib function, which receives the base name of the digital library (as the name field of a hash reference, as in the script below). The base name is the directory in which the library was created followed by the name used as its identifier.

The most common way to use the digital library is to build a script like:

  use Biblio::WebPortal;
  use CGI qw/:standard :cgi-lib/;   # :cgi-lib exports Vars()

  my $library = "/var/library/libraryName";
  my %vars = Vars();

  print header;
  my $diglib = Biblio::WebPortal::opendiglib( { name => $library } );

  print $diglib->navigate(%vars);

The following attributes can be used in conjunction with the previous configuration:

scriptname

Use this whenever the navigate method cannot detect the correct script name, to point it to the correct place.

bt_next_txt

Set this attribute to the text you want shown on the link to the next page of search results.

bt_prev_txt

Set this attribute to the text you want shown on the link to the previous page of search results.

Note that the configuration options accepted by the navigate method of Biblio::Thesaurus are also allowed in the configuration file.
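
For illustration, these attributes could sit next to the other keys of the configuration hash, as in the sketch below. The link texts and the script path are hypothetical values, and catsyn is omitted for brevity (it would be defined as in the first configuration example).

```perl
use strict;
use warnings;

# Configuration fragment: only the navigation keys are of interest here;
# all values are hypothetical examples.
my $userconf = {
    catalog     => "/var/library/catalog.xml",
    thesaurus   => "/var/library/thesaurus",
    name        => 'libraryName',
    # catsyn => { ... } as in the first configuration example
    scriptname  => '/cgi-bin/library.cgi',  # only if navigate guesses wrong
    bt_next_txt => 'Next page >>',
    bt_prev_txt => '<< Previous page',
};
```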

Module Interface

The navigate method is used to browse a digital library. It should be called with the hash of variables passed to the CGI script.

The Digital Library directory

When a Biblio::WebPortal object (a digital library) is created, a directory with the name given in the configuration file is created as well. This directory contains a set of files, each holding already processed information.

catalogs.index

This is a text file mapping integers to processed catalogs. Each line consists of a sequential integer (starting at 0), a colon, and the full path to the catalog file.

All other databases will use that integer when referring to a catalog.

   0:/home/user/diglib/catalog1.xml
   1:/home/user/diglib/catalog2.xml
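
Reading catalogs.index only requires splitting each line at the first colon. The sketch below assumes exactly the format described above and returns a hash from catalog number to path; read_catalogs_index is a hypothetical helper, not part of Biblio::WebPortal.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "N:/full/path" lines from catalogs.index
# into a hash reference mapping catalog number => catalog file path.
sub read_catalogs_index {
    my $file = shift;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my %catalog;
    while (my $line = <$fh>) {
        chomp $line;
        # \d+ cannot contain ':', so this splits at the first colon only
        next unless $line =~ /^(\d+):(.+)$/;
        $catalog{$1} = $2;
    }
    close $fh;
    return \%catalog;
}
```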

entry-catalog.index

Another text file, which maps digital library identifiers to entries in each catalog. Biblio::WebPortal assigns a different integer to each entry, whichever catalog it comes from. Each line contains the entry identifier in the digital library, a colon, the identifier of the catalog the entry comes from (as defined in the catalogs.index file), a dot, and the position of the entry within that catalog. Note that this position starts at 0, like the catalog identifiers, while the digital library identifiers in the example below start at 1.

   1:0.0
   2:0.1
   3:0.2
   4:0.3
   5:0.4
   6:1.0
   7:1.1
   8:1.2
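
The numbering can be reproduced from the number of entries asList returns for each catalog. The sketch below assumes identifiers are assigned catalog by catalog, in catalogs.index order, starting at 1 as in the example; entry_catalog_lines and parse_entry_line are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helper: given the entry count of each catalog (in
# catalogs.index order), produce "id:catalog.order" lines.
# Identifiers start at 1; catalog numbers and orders start at 0.
sub entry_catalog_lines {
    my @counts = @_;
    my ($id, @lines) = (1);
    for my $cat (0 .. $#counts) {
        for my $order (0 .. $counts[$cat] - 1) {
            push @lines, "$id:$cat.$order";
            $id++;
        }
    }
    return @lines;
}

# Hypothetical helper: split one line into (id, catalog, order).
sub parse_entry_line {
    my $line = shift;
    my ($id, $cat, $order) = $line =~ /^(\d+):(\d+)\.(\d+)$/
        or die "Malformed entry-catalog line: $line";
    return ($id, $cat, $order);
}
```

With 5 entries in catalog 0 and 3 in catalog 1, entry_catalog_lines(5, 3) yields exactly the eight lines listed above.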

html.db

This is a Berkeley DB file whose keys are the entry identifiers defined in the entry-catalog.index file. For each key, the database stores a pre-calculated HTML version of the entry (built with the asHTML function shown in the previous section).

latex.db

This is a Berkeley DB file whose keys are the entry identifiers defined in the entry-catalog.index file. For each key, the database stores a pre-calculated LaTeX version of the entry (built with the asLaTeX function shown in the previous section).

relation.index

A text file mapping entry identifiers to relation terms. Each line contains the entry identifier, a separator mark, and a list of classification terms.

relations.db

...

relations.list

This is a text file where each line contains a term. These are the classification terms used in all catalogs.

relations.statistics

Contains the same terms as relations.list, but each term is followed by a separator mark and its number of occurrences.
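
The content of relations.list and relations.statistics can be derived by counting the terms each entry yields. This is a minimal sketch, assuming asRelations returns a hash reference from relation type to terms (as in the configuration example) and using a colon as the separator mark, which is not specified here; term_statistics and statistics_lines are hypothetical names.

```perl
use strict;
use warnings;

# Count occurrences of each classification term across entries.
# Each argument is a hash ref { relation_type => [terms...] }.
sub term_statistics {
    my %count;
    for my $relations (@_) {
        for my $terms (values %$relations) {
            $count{$_}++ for @$terms;
        }
    }
    return \%count;
}

# Render sorted "term:count" lines; ':' is an assumed separator.
sub statistics_lines {
    my $count = shift;
    return map { "$_:$count->{$_}" } sort keys %$count;
}
```

The sorted keys of the returned hash give the term list, and statistics_lines adds the counts.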

text.index

Each line of this text file contains the entry identifier, a separator mark, and the text version of the entry, calculated with the asText function shown in the previous section.
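
The search call in the synopsis accepts a regexp argument. As a hedged illustration of the kind of scan these text versions make possible (not a description of how search is implemented), the sketch below filters text.index lines with a regular expression. It assumes the separator mark is a colon; grep_text_index is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: scan text.index lines ("id:text", with ':' as an
# assumed separator) and return the ids whose text matches $regexp.
sub grep_text_index {
    my ($regexp, @lines) = @_;
    my $re = qr/$regexp/;
    my @ids;
    for my $line (@lines) {
        my ($id, $text) = $line =~ /^(\d+):(.*)$/ or next;
        push @ids, $id if $text =~ $re;
    }
    return @ids;
}
```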

thesaurus.log

This is a text file in thesaurus format that maps every classification term found in the catalogs but missing from the thesaurus file to the term 'Others'.

thesaurus.store

This is a dump of the full thesaurus structure in the format of the Storable Perl module. It is used as a cache for quick loading when displaying a navigation-enabled thesaurus web page.
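
The caching idea behind thesaurus.store can be sketched with Storable's store and retrieve: reuse the dump while it is at least as new as its source, and rebuild it otherwise. The cached_structure helper and its build callback are hypothetical; Biblio::WebPortal's own loading code may differ.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Return the structure cached in $store_file, rebuilding it with the
# code ref $build when the cache is missing or older than $source_file.
sub cached_structure {
    my ($source_file, $store_file, $build) = @_;
    # -M is the file's age in days: smaller means more recently modified
    if (-e $store_file && -M $store_file <= -M $source_file) {
        return retrieve($store_file);      # cache is up to date
    }
    my $data = $build->($source_file);     # must return a reference
    store($data, $store_file) or die "Cannot write $store_file";
    return $data;
}
```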

AUTHOR

José João Almeida <jj@di.uminho.pt>

Alberto Simões <albie@alfarrabio.di.uminho.pt>

SEE ALSO

Manpages: Biblio::Thesaurus(3) Biblio::Catalog(3) perl(1)
