<HTML><HEAD><TITLE>News Clipper User's Manual</TITLE></HEAD>
<BODY bgColor=#FFFFFF text=#000000>
<H2>
<CENTER>News Clipper User's Manual</CENTER></H2>
<P><A
href="#quickstart">Quick Start</A><BR><A
href="#intro">Introduction</A><BR><A
href="#install">Installation</A><BR><A
href="#config">Configuration</A><BR><A
href="#running">Running News Clipper</A><BR><A
href="#works">How It Works</A><BR><A
href="#tags">The News Clipper Tag Language</A><BR>
</P>
<hr><A name=quickstart>
<H2>Quick Start </H2></A>
<P>
<OL>
<LI>Make sure you have Perl installed on your system.
<LI>Install News Clipper as described in the distribution's README file.
<LI>Make a copy of your favorite web page. (You might want to give it a
non-html extension.)
<LI>Edit the HTML, and insert the following text somewhere: <PRE>&lt;!--
newsclipper
&lt;input name=date&gt;
--&gt;
</PRE>
<LI>Run <TT>NewsClipper.pl -i inputfile -o outputfile</TT>, where
<TT>inputfile</TT> is the file you just edited, and <TT>outputfile</TT>
is the file you want News Clipper to create.
<LI>When News Clipper asks you for permission to download the "date"
handler, answer yes.
<LI>After News Clipper is done, the output file will have the current
date inserted in place of the special tags you entered.
<LI>Visit the <A
href="http://www.newsclipper.com/handlers.htm">handler
webpage</A> for more handlers and a description of handler options.
</LI></OL>
<P></P>
<hr><A
name=intro>
<H2>Introduction </H2></A>
<P>News Clipper is a Perl program that lets people integrate dynamic
information into their web pages. This information might be something simple,
like the date, or complex, like a set of links to recent Usenet postings.
News Clipper allows the user to specify, using an HTML-like syntax, the
source of data, how that data should be filtered, and how that data should
be output. </P>
<P>By separating acquisition of data, filtering of data, and output of
data, web designers are given more freedom to control the presentation of
data. For example, you can specify that all headlines from Yahoo Tech News
that have to do with Microsoft, Linux, or Y2K should be printed in three
columns, with those keywords highlighted. Here's how the HTML might look: <PRE>&lt;!--newsclipper
&lt;input name=yahootopstories source=tech&gt;
&lt;filter name=grep words="microsoft,linux,y2k"&gt;
&lt;filter name=map filter=highlight words="microsoft,linux,y2k"&gt;
&lt;output name=array numcols=3&gt;
--&gt;
</PRE>
<P></P>
<P>Originally News Clipper was designed for a single user (me), but some
effort has been spent to make it more generally useful. DOS/Windows
installation is supported, as are system-wide installation as Perl modules
and global HTML and image caches. Time zones are also supported, for people
whose time zone doesn't correspond to the server's. </P>
<hr><A
name=install>
<H2>Installation </H2></A>
<P>News Clipper is a Perl program. If you are on a Unix-derived operating
system, you should have Perl installed already. If you are on a DOS or
Windows system, you are likely to have more difficulty than the average
user. If you are a Windows user and have never heard of Perl, you might
be in for even more difficulty (i.e., find someone who can help you). </P>
<P>Instructions:
<OL>
<LI>Download the distribution.
<LI>Unzip and untar it.
<LI>Read the README for detailed installation instructions.
<LI>No really, read the README.
<LI>For a system-wide installation, do <TT>perl Makefile.PL</TT>,
<TT>make</TT>, <TT>make install</TT>. (You may need to use nmake or
dmake if you are on a Windows platform.)
<LI>Do "perldoc NewsClipper.pl" to see how to run the script, in
general.
<LI>Run "NewsClipper.pl -i template.txt -o output.html" to see the
script download the handlers and create the file output.html from
template.txt.
<LI>Visit the
<a href="http://www.newsclipper.com/handlers.htm">handlers</A>
webpage for more tags you can use. </LI></OL>
<P></P>
<P>Read the README for more detailed instructions. Also check the <A
href="http://www.newsclipper.com/techsup.htm">FAQ</A>
if you run into problems. </P>
<P></P>
<hr><A
name=config>
<H2>Configuration </H2></A>
<P>For a complete description of all configuration options, run "perldoc
NewsClipper.pl". Here are a few notes regarding the various configuration
parameters. More description can be found in the NewsClipper.cfg file
itself. </P>
<P>When News Clipper is run with the -c switch, the specified file is used
as the configuration file. Otherwise, News Clipper looks for a configuration
file first in ~user/.NewsClipper and then in SYSCONFIGDIR, which is set in
NewsClipper.pl at installation time. </P>
<P>On Windows systems, the TZ environment variable is set in the
configuration file during installation. </P>
<P>For single user installations, the News Clipper modules will not be in
the standard Perl locations. In this case, modulepath in the configuration
file is set to point to them. (This means you don't have to change your
PERL5LIB environment variable or run perl with the -I flag.) </P>
<P>Timeouts are used to prevent News Clipper from running too long, and to
prevent unresponsive remote servers from slowing things down. Set
sockettimeout to the maximum amount of time that you want News Clipper to
wait for a response from a server. Set scripttimeout to the maximum time
that you want News Clipper to run. (Note that scripttimeout should be
about equal to sockettimeout times the number of News Clipper tags in your
input files.) </P>
<P>News Clipper can handle multiple input and output files. Be sure that
the number of input files equals the number of output files. </P>
<P>News Clipper caches remote web pages internally. This means that an ISP
with 100 users using the "lycosweather" handler won't hit the Lycos server
100 times. Also, authors of handlers specify the times that data is
updated on remote servers, and News Clipper will only fetch data if it has
been updated since the last time it was fetched. This is useful for things
like comics, which only update once a day. </P>
<P>The "cacheimages" handler caches remote images locally. When given a
bit of HTML with <img src="URLx">, it caches the image pointed to by
URLx, and substitutes a local URLy in the place of URLx. cacheimages also
deletes old images from the cache after a specified time. These options
can be given default values in the configuration file, which lets system
maintainers provide a global image cache for all users. </P>
<P>News Clipper also allows the user to specify the location where
handlers should be stored. System maintainers can point this value to a
globally accessible directory. Otherwise, it defaults to
~user/.NewsClipper/NewsClipper/Handler, where ~user is the user's home
directory. </P>
<hr><A
name=running>
<H2>Running News Clipper </H2></A>
<P>There are different ways of using the script:
<UL>
<LI>Probably the best way to run the script is from a cron job. To do
this, create a .crontab file with something similar to the
following:<BR><TT>0 7,10,13,16,19,22 * * *
/path/NewsClipper.pl</TT><BR>The first field
is the minute, and the second field is the hour(s). You have to specify
the complete path to the script. Then you can make the output file your
startup page in Netscape.
<LI>You could make cgiwrap call your startup page, but this would mean
having to wait for the script to execute (2 to 30 seconds, depending on
the staleness of the information).
<LI>And you could just run the script manually from the command line...
</LI></UL>
<P></P>
<P>If called as a CGI program, the output file is echoed to standard
output with the text "Content-type: text/html" preceding it. This allows
it to be called dynamically over the net via cgiwrap. For example:<BR>
<a
href="http://www.host.com/cgi-bin/cgiwrap?user=you&amp;script=NewsClipper.pl">http://www.host.com/cgi-bin/cgiwrap?user=you&amp;script=NewsClipper.pl</A>.
</P>
<P>The first time you run the script each day, it may take half a minute or so
to collect the information (depending on network load and the amount of data
to acquire). After that, the script is very fast because it will only
pull data from the net if it needs to. </P>
<hr><A
name=works>
<H2>How It Works </H2></A>
<P>NewsClipper.pl processes command line options, the configuration file,
and input and output files. Each input file is parsed, and when a comment
of the form &lt;!-- newsclipper...--&gt; is found, the comment is parsed
for commands to be executed. </P>
<P>If there is only one command to be executed (an input command),
News Clipper determines the default filter and output handlers from the input
handler. The resulting (expanded) command list will be composed of an
input command, zero or more filter commands, and an output command. </P>
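<P>As a sketch of this expansion, consider the one-command tag from the
Quick Start, shown first below. The expanded form shown second is only an
illustration: it assumes that the date handler's default output handler is
the built-in "string" handler and that it has no default filters, neither
of which is confirmed by this manual. Check the handler's comments for its
actual defaults. </P>
<PRE>&lt;!-- newsclipper
&lt;input name=date&gt;
--&gt;

&lt;!-- newsclipper
&lt;input name=date&gt;
&lt;output name=string&gt;
--&gt;
</PRE>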
<P>During input commands, the cache is checked to see if fresh data still
exists. If not, the data is grabbed from the net, stored in the cache, and
then used by the handler. </P>
<P>Each command is executed, and the results are fed into the next
command. If anything goes wrong, News Clipper inserts a comment in the
output file describing the problem. </P>
<P>If, at any time, a handler cannot be found, News Clipper prompts the
user to download it. The -n flag can be used to tell News Clipper to check
for new versions of handlers, and the -a flag can be used to automatically
download them. </P>
<hr><A
name=tags>
<H2>The News Clipper Tag Language </H2></A>
<P>With the release of News Clipper 7.0, users have much more flexibility
when it comes to choosing how data should be displayed on their web pages.
This is achieved by separating data acquisition, modification, and output
into distinct steps. </P>
<P>A newsclipper tag is composed of three types of commands: <TT>&lt;input
name=...&gt;</TT>, <TT>&lt;filter name=...&gt;</TT>, and <TT>&lt;output
name=...&gt;</TT>. The first part of the command (input, filter, or
output) tells News Clipper how to execute the command. The name attribute
tells News Clipper which handler to use for the command. Additional
attributes can also be specified for the command, and are passed on to the
handler. Each input handler has a set of default filter and output
commands, so if you only specify the input command, the defaults are used. </P>
<P>First off, terminology: a <STRONG>string</STRONG> is a sequence of
characters, possibly containing newlines. Strings can be HTML or regular
text, and it doesn't matter to News Clipper. An <STRONG>array</STRONG> is
an ordered list of items. The items can be anything, even another array. A
<STRONG>hash</STRONG> is an unordered collection of named entries. For
example, a hash might hold three strings named "author", "URL", and
"description". The names in a hash are called the
<STRONG>keys</STRONG>. </P>
<P>One important thing to note is the type of data that is input and
output from each command. For example, if you use an input command that
generates a list of items, and you then try to filter this list with a
filter that expects a single string of data, an error will occur. The
input and output types are documented in the comments of the handler.pm
file located in your handlers directory, and also at the <A
href="http://www.newsclipper.com/handlers.htm">handler
webpage</A>. </P>
<P>There are over 100 handlers that can be used in input commands. Some
handlers also perform filtering and output commands if the data that they
generate is very specific to the handler. The majority of handlers,
however, generate strings, lists, and hashes that can be manipulated using
generic filters and output using generic output handlers. </P>
<P>Below is an example tag: <PRE>&lt;!-- newsclipper
&lt;input name=slashdot type=articles&gt;
&lt;filter name=slashdot type=LinksAndText&gt;
&lt;filter name=limit number=4&gt;
&lt;filter name=map filter=limit number=200 chars&gt;
&lt;output name=array numcols=2 prefix="&lt;p&gt;--&amp;gt;" suffix="&lt;/p&gt;"&gt;
--&gt;
</PRE>
<P></P>
<P>This tag specifies nearly everything, including values that already
have defaults. The first command results in an array of hashes containing
information about the current Slashdot articles. The next command is a
filter, which uses one of the filters in the slashdot handler. The
slashdot filter returns an array of strings, which is then sent to the
generic "limit" filter to reduce the number of strings to four. </P>
<P>At this point, we have an array of four (or fewer) strings containing
Slashdot links and text. The next command is a "map" filter, which applies
another filter to the contents of a data structure. In this case, the map
filter is applying the limit filter to the text in each item of our array.
("number=200 chars" tells the limit filter that we want to limit the
number of characters, not the number of lines, which is the default
behavior for strings.) </P>
<P>The final step is to print the array of shortened strings, so we send
the data to the "array" handler, and tell it to print in two columns using
our own special bullets and spacing. </P>
<P>The output might look something like this:
<TABLE width="100%">
<TBODY>
<TR>
<TD vAlign=top width="50%">
<P>-><A
href="http://www.slashdot.org/articles/99/03/22/134259.shtml">Is Red
Hat the Next Microsoft?</A><BR><A
href="mailto:patdunn@dreamscape.com">Patrick Dunn</A> writes <I>"On
ZDNET's Smart Reseller they have a story about <A
href="http://www.zdnet.com/zdnn/stories/news/0,4586,2229091,00.html">Red
Hat maybe being a mini-Microsoft</A> by it's business
practices."</I> I'd guess that the 2 most common c...</P>
<P>-><A
href="http://www.slashdot.org/articles/99/03/22/1016217.shtml">Mozilla
M3 Release Available Now</A><BR><A
href="mailto:makali@rocketmail.com">Makali</A> writes <I>"Just took
a quick peek at the Sunsite FTP mirror of <A
href="ftp://ftp.mozilla.org/pub/mozilla/releases/m3">ftp://ftp.mozilla.org/pub/mozilla/releases/m3</A>
and <A
href="ftp://sunsite.doc.ic.ac.uk/Mirrors/ftp.mozilla.org/pub/mozilla/releases/M3/">Sunsite.doc.ic.ac.uk</A>
is up and contains tarballs for several platforms. Fetch! "</I>
...</P></TD>
<TD vAlign=top width="50%">
<P>-><A
href="http://www.slashdot.org/articles/99/03/22/0950223.shtml">Wired
on Kipling</A><BR><A href="mailto:dodger@2600.com">The Dodger</A>
writes "The Kipling 'Hacker' luggage debacle gets coverage in <A
href="http://www.wired.com/news/news/culture/story/18616.html">Wired</A>,
along with slightly derogatory references to the Slashdotters'
ability (or rather lack of it) to 'crack ...</P>
<P>-><A
href="http://www.slashdot.org/articles/99/03/22/0934206.shtml">CeBIT
Tidbits</A><BR><A href="mailto:madman3@imfamous.com">MadMan2</A> has
sent us a report from <A href="http://www.messe.de/cb99">CeBIT</A>.
Little bits about bigass Samsung Dimms, Not so upgradable Palm
Pilots, SuSE, AOL-Scape and Applix. Hit the link below to read
MadMan2's machine g...</P></TD></TR></TBODY></TABLE></P>
<P>If all of this seems too complicated, you can just settle for the
default filters and output of the handlers. In the case of Slashdot, you
would do this: <PRE>&lt;!-- newsclipper
&lt;input name=slashdot&gt;
--&gt;
</PRE>
<P></P>
<P>And the default output would look like this:
<TABLE width="100%">
<TBODY>
<TR>
<TD vAlign=top width="50%">
<UL>
<LI><A
href="http://www.slashdot.org/articles/99/03/22/134259.shtml">Is
Red Hat the Next Microsoft?</A>
<LI><A
href="http://www.slashdot.org/articles/99/03/22/1016217.shtml">Mozilla
M3 Release Available Now</A>
<LI><A
href="http://www.slashdot.org/articles/99/03/22/0950223.shtml">Wired
on Kipling</A>
<LI><A
href="http://www.slashdot.org/articles/99/03/22/0934206.shtml">CeBIT
Tidbits</A>
<LI><A
href="http://www.slashdot.org/articles/99/03/22/0928207.shtml">The
Anoraks' New Clothes</A> </LI></UL></TD>
<TD vAlign=top width="50%">
<UL>
<LI><A
href="http://www.slashdot.org/articles/99/03/22/0916204.shtml">Bunny
wins the Oscar</A>
<LI><A
href="http://www.slashdot.org/books/99/03/22/0826250.shtml">Review:<CITE>Developing
Linux Applications with GTK+ and GDK</CITE></A>
<LI><A
href="http://www.slashdot.org/articles/99/03/21/1638230.shtml">Star
Wars Retrospective in NY Times</A>
<LI><A
href="http://www.slashdot.org/articles/99/03/21/1459221.shtml">Yet
Another GNOME Article</A> </LI></UL></TD></TR></TBODY></TABLE></P>
<H3>Built-in Filter Handlers </H3>
<P>Each of these filters comes pre-installed with News Clipper. They will
not be located in your .NewsClipper directory, but in the same location as
the other News Clipper modules. (This location depends on your system
configuration, and whether or not you did a site-wide installation.) </P>
<P><STRONG>&lt;filter name=grep words=X invert&gt; </STRONG><BR>grep is
named after the Unix command for finding lines in a file that contain a
pattern. It takes a string, array, or hash, and returns the data that
contains at least one of a set of words. The "invert" attribute can be used
to return the data that does *not* contain any of the words. (Note that in
the case of a hash, it is the values, not the keys, that are searched.)
</P>
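<P>For instance, a tag along the following lines should print the Yahoo
Tech News headlines that do not mention Microsoft. It is a sketch that
reuses the yahootopstories input and array output from the Introduction;
the choice of word is arbitrary. </P>
<PRE>&lt;!-- newsclipper
&lt;input name=yahootopstories source=tech&gt;
&lt;filter name=grep words="microsoft" invert&gt;
&lt;output name=array numcols=3&gt;
--&gt;
</PRE>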
<P><STRONG>&lt;filter name=selectkeys keys=X invert&gt; </STRONG><BR>Takes
a hash and returns a smaller hash containing only the given keys. "invert"
returns the hash without the given keys instead. </P>
<P><STRONG>&lt;filter name=highlight style=X words=Y&gt;
</STRONG><BR>highlight surrounds the specified words with HTML tags. The
style is "strong" by default. </P>
<P><STRONG>&lt;filter name=limit number=X chars&gt; </STRONG><BR>Accepts a
string, array, or hash, and returns the same. This filter trims the number
of characters, lines, items, or keys to the number specified. "chars" must
be specified if you want to treat strings as sequences of characters
instead of lines. </P>
<P><STRONG>&lt;filter name=hash2array order=X&gt; </STRONG><BR>hash2array
takes a hash and a given key ordering, and returns an array whose items
are the hash values in the specified order. </P>
<P><STRONG>&lt;filter name=map depth=X filter=Y [...]&gt;
</STRONG><BR>Suppose you have an array of strings, and want to apply the
highlight filter to the strings. Unfortunately, highlight doesn't take
arrays of strings. That's what this filter is for. "depth" tells map how
many levels into your data structure to go before applying the filter
given by "filter". Any additional arguments are passed on to the filter.
</P>
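<P>A minimal sketch of that case, reusing the yahootopstories input from
the Introduction and assuming it yields an array of strings: map applies
highlight to each string, and "words" is passed through to highlight.
"depth" is left out, as in the other examples in this manual, so map's
default depth is used. </P>
<PRE>&lt;!-- newsclipper
&lt;input name=yahootopstories source=tech&gt;
&lt;filter name=map filter=highlight words="linux"&gt;
&lt;output name=array numcols=3&gt;
--&gt;
</PRE>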
<P><STRONG>&lt;filter name=cacheimages maxage=X dir=Y url=Z&gt;
</STRONG><BR>Suppose you have an array of HTML image links, and you want
to cache them locally, and translate the links to point to the local
images. Give cacheimages the "dir" to store the images in, the "url" that
corresponds to that dir on the web, and it will download the images and
store them for you. "maxage" tells the filter that it can delete images
older than a certain number of seconds. </P>
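<P>A tag using cacheimages might look like the sketch below. The input
handler name "yourcomichandler", the directory, and the URL are all
placeholders rather than real values; substitute a handler that produces an
array of image links and a directory that your web server actually serves.
A maxage of 86400 seconds corresponds to one day. </P>
<PRE>&lt;!-- newsclipper
&lt;input name=yourcomichandler&gt;
&lt;filter name=cacheimages maxage=86400 dir="/home/you/public_html/imgcache" url="http://www.host.com/~you/imgcache"&gt;
&lt;output name=array&gt;
--&gt;
</PRE>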
<H3>Built-in Output Handlers </H3>
<P><STRONG>&lt;output name=string&gt; </STRONG><BR>Prints a string. </P>
<P><STRONG>&lt;output name=table header=X border=Y&gt; </STRONG><BR>Takes
a two-dimensional array and outputs a table having a border size as given
by "border". "header" allows you to specify whether the top and/or left
sides of the table should be headers. </P>
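<P>One way to obtain a two-dimensional array is to convert an array of
hashes, such as the one produced by the slashdot input in the example tag
earlier, with the selectkeys and hash2array filters. The sketch below does
this under several assumptions: the key names "title" and "url" are
hypothetical (see the handler's comments for the real ones), map is assumed
to apply each filter to every hash in the array, and "keys" and "order" are
assumed to take comma-separated lists in the same way "words" does.
"header" is omitted because its accepted values are not listed here. </P>
<PRE>&lt;!-- newsclipper
&lt;input name=slashdot type=articles&gt;
&lt;filter name=map filter=selectkeys keys="title,url"&gt;
&lt;filter name=map filter=hash2array order="title,url"&gt;
&lt;output name=table border=1&gt;
--&gt;
</PRE>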
<P><STRONG>&lt;output name=array numcols=W prefix=X suffix=Y
separator=Z&gt; </STRONG><BR>Outputs an array of strings. "numcols" is the
number of columns. "prefix", "separator" and "suffix" are strings to print
before, between, and after each item. If prefix is "ul" or "ol", a
bulleted or numbered list is created. </P>
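<P>For example, the following sketch should print up to four Slashdot items
as a single bulleted list. It reuses the input and filter commands from the
example tag earlier in this section and changes only the output command;
"numcols" is left at its default. </P>
<PRE>&lt;!-- newsclipper
&lt;input name=slashdot type=articles&gt;
&lt;filter name=slashdot type=LinksAndText&gt;
&lt;filter name=limit number=4&gt;
&lt;output name=array prefix="ul"&gt;
--&gt;
</PRE>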
<P><STRONG>&lt;output name=thread style=X&gt; </STRONG><BR>Takes a
"thread" data type, like you would see in discussion lists. Outputs using
numbered or unnumbered lists, depending on whether the style is "ol" or
"ul". See the handler's comments for a description of the thread data
type. </P>
</BODY></HTML>