Myron Turner > Net-Z3950-AsyncZ-0.10 > doc/Examples.pod

Net::Z3950::AsyncZ By Example

Introduction

Net::Z3950::AsyncZ adds an additional layer of asynchronous support to the Z3950 module through the use of multiple forked processes. Users may also find that it is a useful front end to Z3950. Detailed descriptions of the mechanics, objects, and methods of AsyncZ can be found in the accompanying documentation:

AsyncZ.pod, Options.pod, and Report.pod.

What follows are annotated versions of the example scripts. I start with basic.pl, which uses only the basics needed to run AsyncZ, and move up through the scripts, each of which adds features to the one before it, in this order: basic.pl, basic_pretty.pl, more_pretty.pl, options.pl

Since each script builds upon the one previous, the only script which is quoted in full is basic.pl. For subsequent scripts, I quote the code added to the predecessor.

basic.pl

Main Routine

   use Net::Z3950::AsyncZ qw(isZ_Error);                        # [1]
                                                        
   my @servers = (                                      # [2]
                [ 'amicus.nlc-bnc.ca', 210, 'NL'],      
                ['bison.umanitoba.ca', 210, 'MARION'],
                [ 'library.anu.edu.au', 210, 'INNOPAC' ],
                ['130.17.3.75', 210, 'MAIN*BIBMAST'],                   
                [ 'library.usc.edu', 2200,'unicorn'],
                [ 'z3950.loc.gov', 7090, 'Voyager' ],
                [ 'fc1n01e.fcla.edu', 210, 'FI' ],
                [ 'axp.aacpl.lib.md.us', 210, 'MARION'],
                [ 'jasper.acadiau.ca', 2200, 'UNICORN']
          );

  my $query = '  @attr 1=1003  "Henry James" ';  # [3]

                                                 # [4] 
  my $asyncZ = Net::Z3950::AsyncZ->new( servers=>\@servers,query=>$query,
                                                cb=>\&output );  
  showErrors($asyncZ);                           # [5]

          exit;
        
          #------END MAIN------#  
  1. Load Net::Z3950::AsyncZ and import isZ_Error, which is a class method that we will use in the error handling subroutine showErrors().
  2. Create an array of servers to which we want to send our query. Each element of the array is a reference to a three-element anonymous array: $host, $port, and $database_name. This is the same structure which is used in Net::Z3950.
  3. Create a query. This query uses the YAZ toolkit's PQN ('prefix') syntax.
  4. Create a Net::Z3950::AsyncZ object, using named parameters; in addition to passing servers and query into the constructor, we also pass in a reference to a callback function which will be called by Net::Z3950::AsyncZ whenever new records become available--it will be up to the callback function to output the records to terminal or browser.
  5. Call showErrors, a subroutine which will output error messages, in the event that some of the servers fail to respond or to return records. We pass in the reference to the Net::Z3950::AsyncZ object, which showErrors() will need to access the errors.

Subroutines

the output function
          sub output {
           my($index, $array) = @_;             # [1]
           foreach my $line(@$array) {          # [2]
             print "$line\n" if $line;          # [3]   
            }
           print "\n--------\n\n";    
          }     
  1. Get the two parameters which AsyncZ passes to the output function, namely the index of the server which is reporting back and a reference to an array of record data for the server. This array will contain one or more records.
    See Output Callback in AsyncZ.html.
  2. Each array element represents a separate line of record output: retrieve each line from the array.
  3. If the line is not null, print it. (The check for a null line is a carry over from an earlier state of AsyncZ: I don't think it's needed anymore but I keep it just in case.)
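
To make the callback contract concrete, here is a minimal sketch that runs without any Z39.50 server. The fake_dispatch subroutine and the sample lines are hypothetical stand-ins for what AsyncZ does internally; only the ($index, $array) parameter shape is taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for AsyncZ: calls the callback once per server
# with the server's index and a reference to an array of output lines.
sub fake_dispatch {
    my ($callback, @batches) = @_;
    for my $index (0 .. $#batches) {
        $callback->($index, $batches[$index]);
    }
    return;
}

my @seen;    # collected here so the result is easy to inspect

sub output {
    my ($index, $array) = @_;
    foreach my $line (@$array) {
        next unless $line;              # skip empty or undefined lines
        push @seen, "[$index] $line";
        print "$line\n";
    }
    print "\n--------\n\n";
    return;
}

# Invented sample data; real record lines will look different.
fake_dispatch(\&output,
    [ 'title: The Ambassadors', '', 'author: James, Henry' ],
    [ 'title: The Golden Bowl' ],
);
```

The @seen array exists only so the sketch can be checked; a real callback would simply print.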
the error-handling function
  sub showErrors {
   my $asyncZ = shift;                  # [1]

   print "The following servers have not responded to your query: \n";

   for(my $i=0; $i< $asyncZ->getMaxErrors();$i++) {     # [2]
          my $err = $asyncZ->getErrors($i);             # [3]
          next if !isZ_Error($err);                     # [4]
          print "$servers[$i]->[0]\n";                  # [5]
          print "  $err->[0]->{msg}\n" if $err->[0]->{msg};  # [6]
          print "  $err->[1]->{msg}\n" if $err->[1]->{msg};  # [7]
        }
      
  }
  1. Get the reference to the AsyncZ object.
  2. Get the total number of errors reported and loop through them.
  3. For each server there are two possible errors, since for some servers we make a second attempt to get results if the first attempt fails. (See "Basic Mechanisms of Net::Z3950::AsyncZ" in AsyncZ.pod.) $err is a reference to an anonymous array which may hold 1 or 2 references to Net::Z3950::AsyncZ::ErrMsg objects, which store all the necessary info about these errors. (See Net::Z3950::AsyncZ::ErrMsg in AsyncZ.pod.)

    So, get the errors array for the current index.

  4. Check to see whether in fact an error occurred. We may not have gotten any records back on the first attempt, and consequently we may have an error for attempt 1. But we may have been successful on attempt 2, in which case the first error is nullified. Or we may have had a fatal error on the first attempt, in which case no second attempt was made. isZ_Error will tell us what happened.
  5. If we've got this far, then there's been some kind of error. So, let's tell our user the name of the server that failed to return results: we pick that up from the @servers array, whose entries are [$server, $port, $database].
  6. - 7. Now we can make our own use of the ErrMsg objects. The array reference $err holds up to two ErrMsg objects: $err->[0] is from attempt 1 and $err->[1] from attempt 2. We check to see if error messages have been saved in these objects and, if so, we print them.


basic_pretty.pl

basic_pretty.pl is an upgrade to basic.pl. When you run basic.pl, you get a set of headers which your user doesn't have to see, the records are run together, and various debugging messages are interspersed with the records. basic_pretty.pl rectifies these problems.

Instead of reprinting the entire basic.pl, let's look only at the changes.

Main Routine

   use Net::Z3950::AsyncZ qw(:header :errors);  # [1]
   use Net::Z3950::AsyncZ::Errors qw(suppressErrors);  # [2]
        
        .        .          .       .

   my $asyncZ =
     Net::Z3950::AsyncZ->new(servers=>\@servers,query=>$query,
                   cb=>\&output,                          
                   log=>suppressErrors(),       # [3]             
        );  

Subroutines

basic_pretty output function
  sub output {
   my($index, $array) = @_;

   foreach my $line(@$array) {
     return if noZ_Response($line);       # [1]
     next if isZ_Info($line);             # [2]         
     next if isZ_Header($line);           # [3]     
     (print "\nServer: ", Z_serverName($line), "\n"), 
             next                                   # [4]
             if isZ_ServerName($line);              # [5]

     print "$line\n" if $line;  
    }

   print "\n--------\n\n";    

  }       

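1. noZ_Response checks whether the server failed to respond; if so, there is nothing to print and we return.

2. isZ_Info removes headers.

3. So, too, does isZ_Header.

4. - 5. isZ_ServerName checks to see if this is the header with the server's name in it. If it is, we extract the server's name with Z_serverName, print it for the user's information, and move on to the next line.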

basic_pretty error-handling function
  sub showErrors {
  my $asyncZ = shift;          

   # substitute some general statement for a
   #  system level error instead of something
   #  puzzling to the user like:  'illegal seek'
  my $systemerr = 
    "A system error occurred on the server\n";    # [1]

   print "The following servers have not responded to your query: \n";  

   for(my $i=0; $i< $asyncZ->getMaxErrors();$i++) {
          my $err = $asyncZ->getErrors($i);   
          next if !isZ_Error($err);         
          print "$servers[$i]->[0]\n";  
          if($err->[0]->isSystem()) {                   # [2]
                print $systemerr;               
          }
          else {                                        # [3]
                print "  $err->[0]->{msg}\n" if $err->[0]->{msg};
          }
          if($err->[1] && $err->[1]->isSystem()) {      # [4]
                print $systemerr;                               
          }
          else {
                # messages are strings, so compare with 'ne' rather than '!='
                print "  $err->[1]->{msg}\n"            # [5]
                    if $err->[1]->{msg} && $err->[1]->{msg} ne $err->[0]->{msg};
          }
   }

  }

1. We create a general system-level error message because this time we are going to test for system-level errors and print the general statement to screen instead of system-level error messages, which risk frustrating the user.

2. We use the ErrMsg object's isSystem() method, namely $err->[0]->isSystem(), to test for system-level errors and print the general message if the error is system-level.

3. If it isn't, we output the error message for this error.

4. We check first to make sure that $err->[1] exists: remember, $err->[1] is an error that occurs during the second attempt to query the server, and if the first time around we got a fatal (non-retryable) error, then we will not have an $err->[1]. If there is an $err->[1] and it's a system-level error, then print the general system message.

5. Otherwise, print the $err->[1] message, but only if it is not the same error, and therefore the same message, as the first time around, since there's no point in repeating it.
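
The decision logic in these notes can be tried out in isolation. The following is a minimal sketch under stated assumptions: FakeErrMsg is a hypothetical stand-in for the real ErrMsg class (a blessed hash with a msg field and an isSystem() method), and messages_for is an invented helper, not part of AsyncZ. It compares the text that would be shown to the user, so a repeated system error is also collapsed into one line, which is slightly stricter than showErrors().

```perl
use strict;
use warnings;

# Hypothetical stand-in for an ErrMsg object: a blessed hash with a
# {msg} field and an isSystem() method, as used in showErrors().
package FakeErrMsg;
sub new      { my ($class, %args) = @_; return bless {%args}, $class }
sub isSystem { return $_[0]{system} ? 1 : 0 }

package main;

my $systemerr = "A system error occurred on the server";

# Return the lines to show the user for one server's error array.
sub messages_for {
    my ($err) = @_;
    my ($first, $second) = @$err;
    my @out;

    my $first_text = $first->isSystem ? $systemerr : ($first->{msg} // '');
    push @out, $first_text if length($first_text);

    # The second attempt may not exist; report it only if it adds something.
    if ($second) {
        my $text = $second->isSystem ? $systemerr : ($second->{msg} // '');
        push @out, $text if length($text) && $text ne $first_text;  # string compare
    }
    return @out;
}

# Same message on both attempts: reported once.
my @same = messages_for([
    FakeErrMsg->new(msg => 'Connection refused'),
    FakeErrMsg->new(msg => 'Connection refused'),
]);

# Fatal system error on the first attempt: no second entry at all.
my @fatal = messages_for([ FakeErrMsg->new(msg => 'illegal seek', system => 1) ]);

print "$_\n" for @same, @fatal;
```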

more_pretty.pl

The script more_pretty illustrates the use of the format option.

the more_pretty Main Routine

   my $asyncZ =
    Net::Z3950::AsyncZ->new(servers=>\@servers,query=>$query,cb=>\&output,
                   format=>\&thisRecordRow,  # [1]
                   log=>suppressErrors()

        );  

the more_pretty format function

  use Text::Wrap qw($columns &wrap);            # [1]

  sub thisRecordRow {
    my ($row) = @_;                             # [2]
    $columns = 56;                              # [3]
    my $field = $row->[1];  
    my $indent = ' ' x 25;
    $field = wrap("",$indent, $field)
                   if length($field) > 56;     # [4]
    
    return sprintf("%20s:  %s\n",                             
      $Net::Z3950::AsyncZ::Report::MARC_FIELDS{$row->[0]}, $field);    # [5]

   }
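
The format function can be exercised on its own, without AsyncZ. The sketch below assumes, as the code above suggests, that the function receives a reference to a two-element array holding a MARC tag and the field data. The %FIELD_NAMES table and the record_row name are hypothetical; the real tag-to-label table lives in Net::Z3950::AsyncZ::Report.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical subset of a MARC tag-to-label table.
my %FIELD_NAMES = (
    100 => 'author',
    245 => 'title',
    520 => 'summary',
);

# Takes [ $tag, $data ] and returns one formatted output row.
sub record_row {
    my ($row) = @_;
    my ($tag, $field) = @$row;

    # Localize the package variable itself so Text::Wrap sees the value.
    local $Text::Wrap::columns = 56;
    my $indent = ' ' x 25;          # indentation for continuation lines
    $field = wrap('', $indent, $field) if length($field) > 56;

    my $label = $FIELD_NAMES{$tag} // $tag;   # fall back to the raw tag
    return sprintf("%20s:  %s\n", $label, $field);
}

my $short = record_row([ 245, 'The Turn of the Screw' ]);
my $long  = record_row([ 520, join ' ', ('A ghost story told by a governess') x 4 ]);
print $short, $long;
```

Short fields pass through unchanged; long fields are wrapped, with continuation lines indented by 25 spaces.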

options.pl

   use Net::Z3950::AsyncZ qw(:header :errors asyncZOptions); # [1]
   use Net::Z3950::AsyncZ::Errors qw(suppressErrors);   
                
   my @servers = (
                [ 'amicus.nlc-bnc.ca', 210, 'NL'],              
                ['bison.umanitoba.ca', 210, 'MARION'],          
                [ 'library.anu.edu.au', 210, 'INNOPAC' ],
                ['130.17.3.75', 210, 'MAIN*BIBMAST'],                   
                [ 'library.usc.edu', 2200,'unicorn'],
                [ 'z3950.loc.gov', 7090, 'Voyager' ],
                [ 'fc1n01e.fcla.edu', 210, 'FI' ],
                [ 'axp.aacpl.lib.md.us', 210, 'MARION'],
                [ 'jasper.acadiau.ca', 2200, 'UNICORN']
          );

   my @options = ();                                    # [2]

   for(my $i = 0; $i < @servers; $i++) {                
      $options[$i] = asyncZOptions(num_to_fetch=>1,     # [3]    
                                   format=>\&thisRecordRow);  
      $options[$i]->set_query('  @attr 1=1003  "James Joyce" ')
                          if $i % 2 == 0;               # [4]
   }
       
    $options[0]->set_GRS1();    # amicus        # [5] 
    $options[0]->set_raw_on();                  # [6]
    $options[0]->set_log('amicus.log');         # [7]
    $options[1]->set_raw_on();                  # [8]
    $options[5] = undef;  # z3950.loc.gov       # [9]

    my $query = '  @attr 1=1003  "Henry James" ';  # [10]

    my $asyncZ =
            Net::Z3950::AsyncZ->new(servers=>\@servers,query=>$query,cb=>\&output,
                           log=>suppressErrors(),       # [11]
                            options=>\@options,         # [12]
                            num_to_fetch=>2             # [13]          
                );  
          showErrors($asyncZ);

          exit;

        
          #------END MAIN------#  
[1]

Import asyncZOptions, the class method which returns Net::Z3950::AsyncZ::Option::_params objects-- where we can set options for each server separately.

[2]

Create an array to hold the references to _params objects.

[3]

Loop through the servers, creating a _params object for each. Set num_to_fetch=>1 and format=>\&thisRecordRow for each server.

Note: When you create a _params object for a server, if the num_to_fetch and format options are not set, they will revert to the default values, which are 5 and plain text output, even if you later set these options in the AsyncZ constructor. AsyncZ constructor settings do not apply to num_to_fetch and format if you have previously created a _params object for the server in question.

[4]

For every 2nd server we'll ask for books about James Joyce. The odd-numbered servers will use the query about Henry James at [10]. Unlike the num_to_fetch and format options, a query set in the AsyncZ constructor will apply to any server which does not have a query set for it in a _params object. The rationale behind this is that you usually will be asking one question across all servers.

[5]

We request GRS-1 records from amicus, The National Library of Canada, because this is their default preferredRecordSyntax.

[6]

We ask to have the amicus records returned to us raw, because we might presumably branch off from our output function to a special handler for raw GRS-1 records. (Although in the case of the National Library of Canada GRS-1 records, our GRS-1 handler in Net::Z3950::AsyncZ:Records works fine.)

[7]

Because of our special treatment of amicus records, we set a log to catch any error messages. In the case of logs, the log setting in the AsyncZ constructor will apply to all servers unless a log is specifically set in a server's _params object. The rationale for this is that you probably would want one log file to cover all servers, except in special circumstances.

In the present case, only amicus will get a log; all the other servers will be governed by log=>suppressErrors() in the AsyncZ constructor.

[8]

Since amicus doesn't always respond, let's get some raw output from another server, just for demonstration purposes: $servers[1] is bison.

[9]

I undef the entry for z3950.loc.gov, Library of Congress. This means that the Library of Congress record output will be governed by the AsyncZ constructor and a default _params object which will be created for it.

[10]

Set the query for any servers which don't have a query set in their _params objects.

[11]

Suppress error logs for all servers which don't ask for error logs in their _params objects.

[12]

Set options=>\@options

[13]

Fetch 2 records for any server which does not have a _params object--in this case z3950.loc.gov, Library of Congress.
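
The precedence rules scattered through these notes can be summarized in a small model. This is only a sketch of the rules as described above, not AsyncZ's own code: plain hashes stand in for _params objects, the effective_settings name is hypothetical, and the string 'plain text' is a placeholder for the default format.

```perl
use strict;
use warnings;

# Defaults that a _params object falls back to, per note [3].
my %DEFAULTS = (num_to_fetch => 5, format => 'plain text');

sub effective_settings {
    my ($ctor, $params) = @_;   # $params is undef when no _params object exists
    my %eff;

    for my $opt (qw(num_to_fetch format)) {
        # A _params object shadows the constructor even if the option is unset.
        $eff{$opt} = $params ? ($params->{$opt} // $DEFAULTS{$opt})
                             : ($ctor->{$opt}   // $DEFAULTS{$opt});
    }
    for my $opt (qw(query log)) {
        # These fall back to the constructor's value when not set per server.
        $eff{$opt} = ($params && defined $params->{$opt}) ? $params->{$opt}
                                                          : $ctor->{$opt};
    }
    return \%eff;
}

my %ctor = (query => 'Henry James', log => 'suppressed', num_to_fetch => 2);

my $amicus = effective_settings(\%ctor,
    { num_to_fetch => 1, query => 'James Joyce', log => 'amicus.log' });
my $bison  = effective_settings(\%ctor, { });     # _params with nothing set
my $loc    = effective_settings(\%ctor, undef);   # no _params object

printf "%-7s fetch=%d query=%s log=%s\n",
    $_->[0], @{ $_->[1] }{qw(num_to_fetch query log)}
    for [ amicus => $amicus ], [ bison => $bison ], [ loc => $loc ];
```

In this model only the server without a _params object picks up num_to_fetch=>2 from the constructor, which matches note [13].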

raw.pl

raw.pl illustrates how to access raw records which have not been filtered through Net::Z3950::Record::render().

   use Net::Z3950::AsyncZ qw(:header :errors asyncZOptions prep_Raw get_ZRawRec);  # [1]
   use Net::Z3950::AsyncZ::Errors qw(suppressErrors);           
   my @servers = (

                ['bison.umanitoba.ca', 210, 'MARION'],
                [ 'z3950.loc.gov', 7090, 'Voyager' ],
                [ 'jasper.acadiau.ca', 2200, 'UNICORN']
          );

   
  my @options = (               
   asyncZOptions (raw=>1,num_to_fetch=>3, render=>0), #[2]
   asyncZOptions (raw=>1,num_to_fetch=>3, render=>0),
   asyncZOptions (raw=>1,num_to_fetch=>3, render=>0),
  );

          my $query = '  @attr 1=1003  "James Joyce" ';  
          my $asyncZ =
            Net::Z3950::AsyncZ->new(servers=>\@servers,query=>$query,cb=>\&output,
                           monitor=>45, 
                           maxpipes=>2,    
                           log=>suppressErrors(),
                           options => \@options,            
                );  
          
          exit;
        
          #------END MAIN------#  



          sub output {
           my($index, $array) = @_;
           my $count=0;
           return if noZ_Response($array->[0]); #[3]
           my $recs = prep_Raw($array);         #[4] 

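           while((my $rec = get_ZRawRec($recs))) { #[5]
             my $outfile = "raw_${index}_$count";  #[6]
             # three-argument open with a lexical handle; stop if the file cannot be created
             open my $out, '>', $outfile or die "Cannot open $outfile: $!";
             print {$out} $rec;
             close $out;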
             $count++;
           }
          }
[1]

Import functions from AsyncZ which are needed for error handling, reading headers, and handling unfiltered raw records

[2]

Create _params objects for each of the servers: set raw to true and render to false.

[3]

Check to make sure there has been a response: a no-response header will always consist of an array with one element.

[4]

Prepare the unfiltered records by passing them to prep_Raw(). This subroutine strips the headers from all the records in the current group, creates a single string from the array, and sets markers between each record.

[5]

Fetch one record at a time: get_ZRawRec() is a "get_next" type function, starting with the first record.

[6]

For this example, we'll write each record to a file: so we create a file name for each record as it is fetched and write the output to the file.
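
For readers unfamiliar with the "get_next" pattern, here is a minimal sketch of the idea that needs no server. Everything in it is an assumption made for illustration: the marker string, join_records, and record_iterator are hypothetical, and the separator AsyncZ actually places between records is not documented here. Files go to a temporary directory so the sketch cleans up after itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical marker placed between records.
my $MARKER = "\x1ERECORD\x1E";

# Join records into one string, standing in for what prep_Raw() returns.
sub join_records { return join $MARKER, @_ }

# Return a closure that hands back one record per call, then undef.
sub record_iterator {
    my ($blob) = @_;
    my @queue = split /\Q$MARKER\E/, $blob;
    return sub { return shift @queue };
}

my $dir   = tempdir(CLEANUP => 1);
my $index = 0;                       # server index, as passed to the callback
my $count = 0;
my $next  = record_iterator(join_records('raw record one', 'raw record two'));

my @written;
while (defined(my $rec = $next->())) {
    my $path = File::Spec->catfile($dir, "raw_${index}_$count");
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $rec;
    close $fh or die "Cannot close $path: $!";
    push @written, $path;
    $count++;
}
print "wrote $count files\n";
```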

MARC_HTML.pl

This script demonstrates a number of things having to do with handling of HTML and MARC records. In addition, it gives an example of the use of the option Z3950_options of the _params object.

I reprint the script here, which is fully annotated, and give a few fuller explanations below.

        #!/usr/bin/perl

        ##  This script demonstrates a number of things
        ##      1. how to create your own MARC fields hash by adding fields to %$Net::Z3950::AsyncZ::Report::all
        ##      2. use of the Z3950_options _params option
        ##      3. formatting HTML by starting with the default HTML row format
        ##      4. use of utf8 for unicode output TO browser
        ##

        use Net::Z3950::AsyncZ qw(:header :errors asyncZOptions); 
        use Net::Z3950::AsyncZ::Errors qw(suppressErrors);
        use Net::Z3950::AsyncZ::Report;
        use strict;


        my @servers = (
                        ['128.118.88.200',210,'catalog'],
                        ['bison.umanitoba.ca', 210, 'MARION']

                  );

        
        # [1] create hash of additional MARC fields

        my %my_MARC_fields = (
        651 => "location",
        654 => "terms",
        655 => "genre",
        656 => "occupation",
        760 => "main series",
        762 => "subseries",
        765 => "original language",
        767 => "translation entry",
        770 => "supplement/special issue",
        772 => "supplement parent",
        773 => "host item entry",
        774 => "constituent unit",
        775 => "other edition",
        776 => "add. physical form",
        777 => "issued with",
        780 => "preceding",
        785 => "succeeding",
        786 => "data source",
        787 => "nonspecific rel.",
        800 => "personal name",
        810 => "corporate name",
        811 => "meeting name",
        830 => "uniform title"
        );

        # [2] create a new hash which adds the additional MARC fields to %$Net::Z3950::AsyncZ::Report::all,
        # ($Net::Z3950::AsyncZ::Report::all is a reference to %Net::Z3950::AsyncZ::Report::MARC_FIELDS_ALL)

        my %my_MARC_hash = (%$Net::Z3950::AsyncZ::Report::all, %my_MARC_fields);


        # [3] set options for both servers
        #   --assign \%my_MARC_hash to marc_userdef
        #   --ask for full records, the default is brief, by setting the Z3950 option elementSetName =>'f'.
        #   The 'f' option is used by the Net::Z3950::ResultSet module.  We set this option by
        #   using Z3950_options.  (Options set in the Manager are inherited by the other Z3950 modules.)
        #  --set format to &Net::Z3950::AsyncZ::Report::_defaultRecordRowHTML or else set HTML to true.

        my @options = (
                     asyncZOptions(
                          num_to_fetch=>8, format=>\&Net::Z3950::AsyncZ::Report::_defaultRecordRowHTML,
                          marc_userdef=>\%my_MARC_hash,Z3950_options=>{elementSetName =>'f'}),
                      asyncZOptions(
                          num_to_fetch=>8, HTML=>1,
                          marc_userdef=>\%my_MARC_hash)             
        ); 

        #  [4] set the utf8 option to true--you could also do that above in step 3
        $options[0]->set_utf8(1);
        $options[1]->set_utf8(1);


        # [5] set the query
        my $query = '  @attr 1=1016  "Baudelaire" ';  
   
        # [6] Output headers which notify the browser that this script is outputting utf8
        print "Content-type: text/html;charset=utf-8\n\n";             
        print '<head><META http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>', "\n";

        # [7] send out the query to the servers
                  my $asyncZ =
                    Net::Z3950::AsyncZ->new(servers=>\@servers,query=>$query,cb=>\&output,
                                   options=>\@options, #log=>suppressErrors()
                        );  


                  exit;

                  #------END MAIN------#  


                  sub output {
                   my($index, $array) = @_;

        # [8] stipulate that the output stream is utf8--required!
                   binmode(STDOUT, ":utf8");

        # [9] create a table structure for the rows of <TD>'s which are output by the
        # default format subroutine

                   my $table_started = 0;
                   my $server_found= 0;
                   print "<TABLE><TR><TD>";
                   foreach my $line(@$array) {
                     return if noZ_Response($line);
              
                     next if isZ_Info($line);   # remove internal data                
                     if (isZ_Header($line)) {
                            print '<tr><td>&nbsp;<td>&nbsp;</TABLE>' if $table_started;
                            $table_started = 1;

        # [10] Add space around table elements and set the alignments for the columns
                            print '<TABLE cellspacing = "6" cellpadding="2" border="0" width = "600">';
                            print '<colgroup span="2"><COL ALIGN = "RIGHT" WIDTH="150" VALIGN="TOP"><COL ALIGN="LEFT"></COLGROUP>';
                            next;
                     }
                  
                     my $sn = Z_serverName($line);
                     if($sn && ! $server_found) {                       
                              print "\n<br><br><br><b>Server: ", $sn, "</b><br>\n";
                              $server_found = 1;        
                     }
 
        # [11] substitute a fancier style for the field names            
                     $line =~ s/<TD>/<TD NOWRAP style="color:blue" align="right">/i;
                     print "$line\n" if $line;  
                    }
                  print "</TABLE>";
                  }       
[1]

%my_MARC_fields is drawn from the Library of Congress MARC documentation.

[2]

It is added to %Net::Z3950::AsyncZ::Report::MARC_FIELDS_ALL, which is referenced by $Net::Z3950::AsyncZ::Report::all (and is not itself directly accessible). We create this extended set of fields in order to get as much data as possible, since we are going to be setting elementSetName to 'f', asking for "full" as opposed to "brief" records.

[4]

To use utf8 support, you must have MARC::Charset installed; otherwise, this option will be ignored.

[5]

This query should get us some French accented characters with which to test out utf8 support

[6], [8]

These steps notify the browser that it will be receiving a utf8 stream and notify perl that it should output a utf8 stream. Unless you call binmode(STDOUT,":utf8"), perl will not output the utf8 code.

[9] - [11]

We will be using the default HTML format function, which outputs individual rows of data formatted for insertion into a table. Its structure is:

                <TD>field name<TD>field data

The output() callback takes advantage of this formatting by specifying HTML attributes for the table and by reconstructing one of the <TD> tags.
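
The tag reconstruction works because the substitution has no /g flag, so only the first <TD> (the field-name cell) is rewritten. A small self-contained sketch, using an invented sample row in the default shape and a hypothetical style_row helper:

```perl
use strict;
use warnings;

# Restyle the first <TD> (the field-name cell) of a default-format row.
sub style_row {
    my ($line) = @_;
    # No /g flag: only the first <TD> is replaced, so the data cell is untouched.
    $line =~ s/<TD>/<TD NOWRAP style="color:blue" align="right">/i;
    return $line;
}

my $row    = '<TD>title<TD>Les Fleurs du mal';   # sample row, invented data
my $styled = style_row($row);
print "$styled\n";
```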

AUTHOR

Myron Turner <turnermm@shaw.ca> or <mturner@ms.umanitoba.ca>

COPYRIGHT AND LICENSE

Copyright 2003 by Myron Turner

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
