README for makeztxt version 1.62

by John Gruenenfelder (johng@as.arizona.edu)
Wednesday, June 27, 2003

The Weasel Reader website is here:
http://gutenpalm.sf.net



Contents
--------
1. What is makeztxt?
2. Features
3. Using makeztxt (creating)
4. Using makeztxt (deconstructing)
5. List of command line options
  5.1. Options used when creating a zTXT database
  5.2. Options used when deconstructing a zTXT database
6. Compiling makeztxt
7. Miscellaneous Notes





1. What is makeztxt?
------------------------------------------------------------
makeztxt is a simple commandline program that takes a plain ASCII text file
and compresses it into a zTXT database.  makeztxt will remove newline
characters at the end of lines that contain text so that the paragraphs flow
better on the Palm screen.  makeztxt supports the use of regular expressions
to automatically generate a list of bookmarks for you.  Lastly, makeztxt can
also break an existing zTXT file into its components (text, bookmarks,
annotations) and store them into separate files for you.

Please note that as a commandline program, makeztxt is intended for more
advanced users.  There are several very good conversion programs available
that have easy-to-use graphical interfaces.  If you are not experienced using the
DOS/UNIX commandline environment, you may wish to use one of those instead.
You can find links to all the conversion programs at the Weasel website:
  http://gutenpalm.sf.net


*** Important Compatibility Note ***

As of Weasel Reader 1.59.6 the old-style (mode 2) non-random access zTXT
documents are no longer supported.  makeztxt can still create them (this is
not the default), but this is present only for backwards compatibility.




2. Features
------------------------------------------------------------
  * Creates zTXT databases
  * Deconstructs zTXT databases into component pieces
  * Uses regular expressions to automatically generate bookmarks
  * Includes libztxt, a small C library, so you can easily add zTXT creation
      or dissection to your program
  * Create zTXTs that allow on-demand decompression, or get 10% - 15% more
      compression with the original style zTXT.
  * Read regular expressions from a config file (.makeztxtrc)




3. Using makeztxt (creating)
------------------------------------------------------------
Running 'makeztxt --help' will print out the list of command line options and
what their functions are.

The best feature of makeztxt is its ability to use regular expressions to
search the input text for bookmark spots.  This is done with the command line
options -l and -r.
  -l will list all the bookmarks that are generated.
  -r takes a regex as an argument to generate one or more bookmarks.
     You can have as many -r options as you want.
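
As a rough illustration of what several -r options do together, here is a
minimal Python sketch.  It is not makeztxt's code: the function name
find_bookmarks is hypothetical, Python's regex dialect differs from the
GNU/POSIX regex that makeztxt uses, and it assumes a bookmark's offset is the
character position where the match starts.

```python
import re


def find_bookmarks(text, patterns):
    """Collect (offset, title) pairs for every match of every pattern.

    Assumption: the offset is where the match starts, and the title is
    the matched text itself.
    """
    bookmarks = []
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            bookmarks.append((match.start(), match.group(0)))
    # Sort by offset so the list reads in document order.
    return sorted(bookmarks)


sample = "DRAMATIS PERSONAE\n\nACT FIRST\nSCENE I\nsome text\nSCENE II\n"
marks = find_bookmarks(
    sample, ["DRAMATIS PERSONAE", "ACT [A-Z]+", "SCENE [A-Z]+"]
)

# Print a listing similar in spirit to what -l shows.
for offset, title in marks:
    print(f"{offset:<15} {title}")
```

Each pattern is scanned independently and the results are merged, which is
why the order of the -r options should not matter for the final listing in
this sketch.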

A full listing of all of the options to makeztxt can be found in Section 5.

You can also put a list of regular expressions, one per line, in a file called
".makeztxtrc".  This file goes in your home directory, or in the current
directory (if you have no home directory).  A sample .makeztxtrc is included
with the distribution.  You can also explicitly specify which file to read
regular expressions from by using the -R option.

makeztxt can add a list of pre-generated bookmarks given in a file with the -m
option.  Care should be taken to make sure that the bookmark offsets you
specify are valid in the converted text since makeztxt will, by default,
reformat the input text to better flow on a Palm screen (removing many line
breaks).

For annotations, makeztxt can also add pre-generated annotations given in a
file with the -A option.  See Section 5 for information on how this file must
be formatted.

In addition, you can use a two-part regular expression, like
(regexp1)(regexp2).  It will match on the entire line, but the bookmark title
will be only the regexp2 part.

For example:

makeztxt -l -r "(Subject:)(.*)" file.txt

If file.txt contains a number of emails or news articles, this will generate
bookmarks titled with the subject of each article, but without the word
"Subject:".
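
The two-part form can be pictured with capture groups.  The Python sketch
below is only an illustration under stated assumptions: two_part_bookmarks is
a hypothetical helper, the offset is taken to be the start of the whole
match, and trimming whitespace from the title is a choice made here, not
something this README promises.

```python
import re


def two_part_bookmarks(text, pattern):
    """Return (offset, title) pairs, using the second group as the title.

    Assumption: with fewer than two groups, the whole match is the title.
    """
    bookmarks = []
    for match in re.finditer(pattern, text):
        if match.re.groups >= 2:
            title = match.group(2)
        else:
            title = match.group(0)
        # Trimming is a choice made for this sketch only.
        bookmarks.append((match.start(), title.strip()))
    return bookmarks


mail = (
    "From: a@example.org\n"
    "Subject: Hello there\n"
    "\n"
    "Body\n"
    "Subject: Second note\n"
)
subject_marks = two_part_bookmarks(mail, r"(Subject:)(.*)")
for offset, title in subject_marks:
    print(offset, title)
```

The first group anchors the match on the "Subject:" prefix, while only the
text captured by the second group ends up as the bookmark title.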


The following examples show the name of the work, the command line used, and
the first eight bookmarks generated by the command line:

Shakespeare's "King Henry V"
------------------------------------------------------------
>makeztxt -l -t "King Henry V" -r "DRAMATIS PERSONAE" -r "ACT [A-Z]+" \
         -r "SCENE [A-Z]+" 2ws2310.txt

Generated bookmarks
Offset          Title
-----------     --------------------
12097           DRAMATIS PERSONAE
14841           ACT FIRST
14853           SCENE I
19241           SCENE II
33233           ACT II
35118           SCENE I
40805           SCENE II
49553           SCENE III


RL Stevenson's "Treasure Island"
------------------------------------------------------------
>makeztxt -l -t "Treasure Island" -r "PART [A-Z]+" -r "          [0-9]+" \
         treas10.txt

Generated bookmarks
Offset          Title
-----------     --------------------
12005           PART ONE
12422           PART TWO
12836           PART THREE
13087           PART FOUR
13685           PART FIVE
14102           PART SIX
14656           PART ONE
14723           1


Charles Darwin's "On the Origin of Species"
------------------------------------------------------------
>makeztxt -l -t "On the Origin of Species" -r "Introduction\." \
         -r "Chapter [IVX]+" otoos10.txt

Generated bookmarks
Offset          Title
-----------     --------------------
19482           Introduction.
29724           Chapter I
99693           Chapter II
129257          Chapter III
165118          Chapter IV
259640          Chapter V
332498          Chapter VI
399182          Chapter VII





4. Using makeztxt (deconstructing)
------------------------------------------------------------
Running 'makeztxt -d --help' will print out commandline usage for dissecting
zTXT files.  This mode is much simpler than creating a zTXT database, so it
should be much easier to use.  Simply give makeztxt a zTXT PDB file
(filename.pdb) and it will output the uncompressed text data into another file
(filename.txt).  The exact output filename can be specified with the -o
option.

makeztxt can also extract the bookmark list and the annotations from the zTXT
file and output them.  To output a bookmark list, give an output filename with
the -m option.  Similarly, to output a file with the zTXT's annotations, give
an output filename with the -A option.

That's all there is to it.




5. List of command line options
------------------------------------------------------------


5.1. Options used when creating a zTXT database:
-----------------------------------------------

-A/--annofile filename  --  Give makeztxt a file containing annotations that
    will be added into the generated zTXT database.  This file must follow a
    particular format to be understood by makeztxt.  Each annotation is of the
    format:
      1) An annotation begins with a title line:
          Title: My Annotation
         where the text after the colon is the annotation's title with a
         maximum of 20 characters.
      2) The next line is the location in the text of the annotation anchor:
          Offset: 12345
         where the offset value is an absolute character position in the
         *reformatted* text file.
      3) The actual annotation text:
          Annotation: This is the text of my annotation.
         The annotation text will continue after a *single* "Annotation:" line
         until one of the following conditions is met: a) the file ends, b)
         another annotation is started with a "Title:" line, or c) the
         annotation reaches the maximum size of 4096 characters.
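
    To make the annotation file layout concrete, here is a small Python sketch
    of a reader for it.  It is an illustration, not makeztxt's parser: the
    name parse_annotations is hypothetical, and silently truncating overlong
    titles and annotation text is an assumption about how the 20 and 4096
    character limits are applied.

```python
def parse_annotations(lines):
    """Parse Title:/Offset:/Annotation: records into a list of dicts."""
    annotations = []
    current = None
    in_body = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("Title:"):
            # A Title: line always starts a new annotation.
            title = line[len("Title:"):].strip()[:20]
            current = {"title": title, "offset": None, "text": ""}
            annotations.append(current)
            in_body = False
        elif current is None:
            continue  # ignore anything before the first Title: line
        elif not in_body and line.startswith("Offset:"):
            current["offset"] = int(line[len("Offset:"):].strip())
        elif not in_body and line.startswith("Annotation:"):
            current["text"] = line[len("Annotation:"):].strip()
            in_body = True
        elif in_body:
            # Text after the single Annotation: line continues the body.
            current["text"] += "\n" + line
    for annotation in annotations:
        # Assumption: overlong annotation text is cut at 4096 characters.
        annotation["text"] = annotation["text"][:4096]
    return annotations


example = [
    "Title: My Annotation\n",
    "Offset: 12345\n",
    "Annotation: This is the text\n",
    "of my annotation.\n",
    "Title: Second\n",
    "Offset: 20000\n",
    "Annotation: Short note.\n",
]
parsed = parse_annotations(example)
for annotation in parsed:
    print(annotation["offset"], annotation["title"])
```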

-a/--adjust int  --  Control the method of text formatting.  Valid types are
    0, 1, or 2.  Method 0 will compute the average line length through the
    entire file and strip newline characters from any line longer than the
    average.  Method 1 will strip the newline from any line with text in
    it.  Method 2 will leave the text unchanged.  The default is 0.

-b/--length int  --  If adjust method 0 is used, the value given with this
    option is the length a line must be to have its newline stripped.  Using
    this option will override the value calculated by makeztxt.
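
    Adjust method 0 and the -b override can be sketched in Python as follows.
    This is a guess at the idea, not the actual algorithm: the name reflow is
    hypothetical, and it assumes that blank lines count toward the average,
    that a stripped newline becomes a single space, and that a line exactly
    at the threshold is joined.

```python
def reflow(text, min_length=None):
    """Join lines at or above a length threshold with the line after them.

    min_length plays the role of the -b value; when it is None the
    average line length of the whole text is used instead.
    """
    lines = text.split("\n")
    if min_length is None:
        min_length = sum(len(line) for line in lines) / len(lines)
    pieces = []
    for index, line in enumerate(lines):
        pieces.append(line)
        if index == len(lines) - 1:
            break  # nothing follows the final piece
        if line and len(line) >= min_length:
            pieces.append(" ")   # long line: replace its newline
        else:
            pieces.append("\n")  # short or blank line: keep the break
    return "".join(pieces)


sample = (
    "The quick brown fox jumps over\n"
    "the lazy dog and keeps running\n"
    "home.\n"
    "\n"
    "Next paragraph starts here and\n"
    "ends.\n"
)
flowed = reflow(sample)
print(flowed)
```

    Short lines such as "home." keep their newline, so paragraph endings
    survive while the wrapped lines inside a paragraph are joined.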

-h/--help  --  Display command line options and usage information.

-l/--list  --  Display a list of all bookmarks generated by makeztxt or
    specified by the user.  This is useful if you want to make sure your
    regular expressions are generating correct bookmarks.

-L/--launchable  --  Sets the "launchable" attribute in the generated zTXT
    database.  The Launcher apps on a Palm device can use this attribute and
    will display all zTXT documents in the main program listing allowing you
    to launch Weasel and open a specific document by tapping on the document
    directly.  Default is OFF.

-m/--markfile filename  --  Give makeztxt a file containing a pre-generated
    list of bookmarks to add to the generated zTXT database.  The bookmark
    file has a very simple format.  Each line begins with an integer offset
    for the bookmark anchor.  Following that are one or more spaces/tabs.
    Last comes the bookmark title, which occupies the remainder of the line
    up to a maximum of 20 characters.  A line might look like:
       23955   Chapter VII
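
    A reader for this bookmark file layout might look like the Python sketch
    below.  The name parse_markfile is hypothetical, and skipping blank lines
    and truncating titles longer than 20 characters are assumptions rather
    than documented behaviour.

```python
def parse_markfile(lines):
    """Parse 'offset<whitespace>title' lines into (offset, title) pairs."""
    bookmarks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on the first run of spaces/tabs only.
        parts = line.split(None, 1)
        offset = int(parts[0])
        title = parts[1][:20] if len(parts) > 1 else ""
        bookmarks.append((offset, title))
    return bookmarks


listing = [
    "23955   Chapter VII\n",
    "100\tA very long bookmark title that overflows\n",
]
file_marks = parse_markfile(listing)
for offset, title in file_marks:
    print(offset, title)
```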

-n/--nobackup  --  Instructs makeztxt to not set the backup attribute in the
    generated zTXT database.  This attribute, if set, will cause the database
    to be backed up during the next HotSync operation.  Default is to set this
    attribute.

-o/--output filename  --  Explicitly give the output filename which makeztxt
    should use.  If this filename is not given, makeztxt will generate an
    output filename by removing the extension of the input file and replacing
    it with "pdb".  If makeztxt is reading input from standard input this
    option is mandatory.
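
    The default output name rule can be pictured with a couple of lines of
    Python.  The helper default_output_name is hypothetical, and how makeztxt
    treats an input name with no extension is an assumption here.

```python
import os.path


def default_output_name(input_name, extension="pdb"):
    """Swap the input file's extension for the given one."""
    # Assumption: a name without an extension simply gains one.
    base, _old_extension = os.path.splitext(input_name)
    return base + "." + extension


print(default_output_name("2ws2310.txt"))
```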

-R/--regexfile filename  --  By default, makeztxt will attempt to read a set
    of regular expressions from the file .makeztxtrc in the user's home
    directory, or from /etc/makeztxt.conf if that fails.  This option tells
    makeztxt explicitly which file to read the list of regular expressions
    from.  Useful for users on systems with no home directories.

-r/--regex string  --  Supply makeztxt with a regular expression for bookmark
    generation.  string is a valid regex.  This option can be given multiple
    times on the command line, each one adding a new regex.

-t/--title string  --  Specify the title of the generated zTXT database.  The
    database title is stored within the database and is the name which will
    appear under Palm OS.  The title is limited to 32 characters.  If makeztxt
    is reading input from standard input this option is mandatory.

-V/--version  --  Cause makeztxt to print out version information and exit.

-z/--compression int  --  Set the method of compression to be used.  makeztxt
    supports two methods of compression.  Method 1 allows for random access
    within a zTXT document and is the standard method.  Method 2 gives 10-15%
    higher compression but requires that the entire document be decompressed
    before it can be read by the user.  Default is method 1.



5.2. Options used when deconstructing a zTXT database:
-----------------------------------------------------

-d/--deconstruct  --  This option tells makeztxt that you wish to deconstruct
    a zTXT database.  It is required for this mode of operation.

-A/--annofile filename  --  Specify the filename into which makeztxt will
    store any annotations extracted from the input zTXT file.  If this
    option is not given, annotations will not be extracted.

-h/--help  --  Display command line options and usage information.

-m/--markfile filename  --  Specify the filename into which makeztxt will
    store any bookmarks extracted from the input zTXT file.  If this option is
    not given, bookmarks will not be extracted.

-o/--output filename  --  Specify the output file in which makeztxt will store
    the extracted text data.  If this option is not given, makeztxt will
    generate a default filename by removing the extension from the input file
    name and replacing it with "txt".  If makeztxt is reading input from
    standard input this option is mandatory.

-V/--version  --  Cause makeztxt to print out version information and exit.




6. Compiling makeztxt (for great profit!)
------------------------------------------------------------
makeztxt uses zLib v1.1.3 (http://www.info-zip.org/pub/infozip/zlib).
You will need to have zLib compiled for your HOST machine.  All Linux
distributions as well as most other Unices come with zLib, though it is
possible you may be lacking the zLib header files.

You should look in the Makefile to make sure the program names and paths are
okay.

If you are running on Sun hardware, uncomment the PACK line in the Makefile.
makeztxt will not work without this.  If you are getting mysterious crashes,
you might want to try this switch as well.  However, if you are on an x86
system, you should not enable that flag.

If your system does not have GNU regex (Solaris, Cygwin, others) then
uncomment the USEPOSIX line to cause makeztxt to use POSIX regex.

If you are compiling on a Windows system, or any system which makes a
distinction between text and binary files, you'll need to uncomment the
HAVEBINARYFLAG line in order to get valid output from makeztxt.

Lastly, you can uncomment STATICLIBS to statically link against zlib.  This
can be beneficial on Cygwin systems to cut down on the number of DLLs that
need to be distributed.

Now run:

  "make"

You should now have makeztxt.

If you're messing with the source, then maybe you want to help.  If you have
any problems, feel free to email me at johng@as.arizona.edu .  Please use, if
possible, the latest code from the CVS repository.  It can be found at:
  http://sourceforge.net/projects/gutenpalm

If you would like to submit a bug report or a feature request, please make use
of the facilities on Weasel's SourceForge project page.  This allows for much
easier management of bug and feature request tracking.  It also ensures that
your report is not forgotten about.  The project page is at:
  http://sf.net/projects/gutenpalm




7. Miscellaneous Notes
------------------------------------------------------------
  ** The standard "it runs fine for me" disclaimer applies.  I've tested it a
     lot myself, but you can never predict everything.  Still, there's no
     oddball hacking involved so I think the chance of catastrophic Palm
     explosion should be small indeed.  This is not to say that it won't ever
     crash/hang, but if it does...