README for makeztxt version 1.62
by John Gruenenfelder (johng@as.arizona.edu)
Wednesday, June 27, 2003
The Weasel Reader website is here:
http://gutenpalm.sf.net
Contents
--------
1. What is makeztxt?
2. Features
3. Using makeztxt (creating)
4. Using makeztxt (deconstructing)
5. List of command line options
5.1. Options used when creating a zTXT database
5.2. Options used when deconstructing a zTXT database
6. Compiling makeztxt
7. Miscellaneous Notes
1. What is makeztxt?
------------------------------------------------------------
makeztxt is a simple commandline program that takes a plain ASCII text file
and compresses it into a zTXT database. makeztxt will remove newline
characters at the end of lines that contain text so that the paragraphs flow
better on the Palm screen. makeztxt supports the use of regular expressions
to automatically generate a list of bookmarks for you. Lastly, makeztxt can
also break an existing zTXT file into its components (text, bookmarks,
annotations) and store them into separate files for you.
Please note that as a commandline program, makeztxt is intended for more
advanced users. There are several very good conversion programs available
that have easy-to-use graphical interfaces. If you are not experienced with the
DOS/UNIX commandline environment, you may wish to use one of those instead.
You can find links to all the conversion programs at the Weasel website:
http://gutenpalm.sf.net
*** Important Compatibility Note ***
As of Weasel Reader 1.59.6 the old-style (mode 2) non-random access zTXT
documents are no longer supported. makeztxt can still create them (this is
not the default), but this is present only for backwards compatibility.
2. Features
------------------------------------------------------------
* Creates zTXT databases
* Deconstructs zTXT databases into component pieces
* Uses regular expressions to automatically generate bookmarks
* Includes libztxt, a small C library, so you can easily add zTXT creation
or dissection to your program
* Creates zTXTs that allow on-demand decompression, or gets 10% - 15% more
compression with the original style zTXT
* Reads regular expressions from a config file (.makeztxtrc)
3. Using makeztxt (creating)
------------------------------------------------------------
Running 'makeztxt --help' will print out the list of command line options and
what their functions are.
The best feature of makeztxt is its ability to use regular expressions to
search the input text for bookmark spots. This is done with the command line
options -l and -r.
-l will list all the bookmarks that are generated.
-r takes a regex as an argument to generate one or more bookmarks.
You can have as many -r options as you want.
A full listing of all of the options to makeztxt can be found in Section 5.
You can also put a list of regular expressions, one per line, in a file called
".makeztxtrc". This file goes in your home directory, or in the current
directory (if you have no home directory). A sample .makeztxtrc is included
with the distribution. You can also explicitly specify which file to read
regular expressions from by using the -R option.
makeztxt can add a list of pre-generated bookmarks given in a file with the -m
option. Care should be taken to make sure that the bookmark offsets you
specify are valid in the converted text since makeztxt will, by default,
reformat the input text to better flow on a Palm screen (removing many line
breaks).
For annotations, makeztxt can also add pre-generated annotations given in a
file with the -A option. See Section 5 for information on how this file must
be formatted.
In addition, you can use a two-part regular expression, like (regexp1)(regexp2).
It will match on the entire line, but only the regexp2 part will be shown as
the bookmark title.
For example:
makeztxt -l -r "(Subject:)(.*)" file.txt
If file.txt contains a number of emails or news articles, this will generate
bookmarks with the subject of each article, but without the word "Subject:".
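The two-part behaviour can be illustrated with a small Python sketch. This is
not makeztxt's own code: it uses Python's re module, whose syntax differs in
places from GNU and POSIX regex, and bookmark_title() is a hypothetical helper.
It assumes the bookmark title is simply the text matched by the second group,
or the whole match when the pattern has fewer than two groups.

```python
import re


def bookmark_title(pattern, line):
    """Return the bookmark title for `line`, or None if nothing matches.

    Hypothetical helper, not part of makeztxt. Assumption: with a two-group
    pattern the title is the second group; otherwise it is the whole match.
    """
    match = re.search(pattern, line)
    if match is None:
        return None
    if match.re.groups >= 2:
        return match.group(2)
    return match.group(0)


# Two-part pattern: only the text after "Subject:" becomes the title.
print(repr(bookmark_title(r"(Subject:)(.*)", "Subject: Meeting notes")))
# Ordinary pattern: the whole match becomes the title.
print(repr(bookmark_title(r"ACT [A-Z]+", "ACT FIRST")))
```

In this sketch the space that follows "Subject:" stays in the title. Whether
makeztxt trims it is not stated here, so check the result with -l.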
The following examples show the name of the work, the command line used, and
the first eight bookmarks generated by the command line:
Shakespeare's "King Henry V"
------------------------------------------------------------
>makeztxt -l -t "King Henry V" -r "DRAMATIS PERSONAE" -r "ACT [A-Z]+" \
-r "SCENE [A-Z]+" 2ws2310.txt
Generated bookmarks
Offset Title
----------- --------------------
12097 DRAMATIS PERSONAE
14841 ACT FIRST
14853 SCENE I
19241 SCENE II
33233 ACT II
35118 SCENE I
40805 SCENE II
49553 SCENE III
RL Stevenson's "Treasure Island"
------------------------------------------------------------
>makeztxt -l -t "Treasure Island" -r "PART [A-Z]+" -r " [0-9]+" \
treas10.txt
Generated bookmarks
Offset Title
----------- --------------------
12005 PART ONE
12422 PART TWO
12836 PART THREE
13087 PART FOUR
13685 PART FIVE
14102 PART SIX
14656 PART ONE
14723 1
Charles Darwin's "On the Origin of Species"
------------------------------------------------------------
>makeztxt -l -t "On the Origin of Species" -r "Introduction\." \
-r "Chapter [IVX]+" otoos10.txt
Generated bookmarks
Offset Title
----------- --------------------
19482 Introduction.
29724 Chapter I
99693 Chapter II
129257 Chapter III
165118 Chapter IV
259640 Chapter V
332498 Chapter VI
399182 Chapter VII
4. Using makeztxt (deconstructing)
------------------------------------------------------------
Running 'makeztxt -d --help' will print out commandline usage for dissecting
zTXT files. This mode is much simpler than creating a zTXT database, so
it should be much easier to use. Simply give makeztxt a zTXT PDB file
(filename.pdb) and it will output the uncompressed text data into another file
(filename.txt). The exact output filename can be specified with the -o
option.
makeztxt can also extract the bookmark list and the annotations from the zTXT
file and output them. To output a bookmark list, give an output filename with
the -m option. Similarly, to output a file with the zTXT's annotations, give
an output filename with the -A option.
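If you drive makeztxt from a script, the options above can be assembled as in
the following Python sketch. deconstruct_command() is a hypothetical helper,
and the file names are made up for illustration. It uses only the -d, -o, -m
and -A options described in this README and assumes the input file goes last,
as in the creation examples.

```python
def deconstruct_command(pdb_file, text_out=None, marks_out=None, annos_out=None):
    """Build an argument list for deconstructing a zTXT database.

    Hypothetical helper; it only assembles the command, it does not run it.
    """
    cmd = ["makeztxt", "-d"]
    if text_out:
        cmd += ["-o", text_out]    # extracted text
    if marks_out:
        cmd += ["-m", marks_out]   # extracted bookmark list
    if annos_out:
        cmd += ["-A", annos_out]   # extracted annotations
    cmd.append(pdb_file)           # assumption: input file comes last
    return cmd


cmd = deconstruct_command("book.pdb", "book.txt", "book.marks", "book.annos")
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run() if makeztxt is on your
PATH.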
That's all there is to it.
5. List of command line options
------------------------------------------------------------
5.1. Options used when creating a zTXT database:
-----------------------------------------------
-A/--annofile filename -- Give makeztxt a file containing annotations that
will be added into the generated zTXT database. This file must follow a
particular format to be understood by makeztxt. Each annotation is of the
format:
1) An annotation begins with a title line:
Title: My Annotation
where the text after the colon is the annotation's title with a
maximum of 20 characters.
2) The next line is the location in the text of the annotation anchor:
Offset: 12345
where the offset value is an absolute character position in the
*reformatted* text file.
3) The actual annotation text:
Annotation: This is the text of my annotation.
The annotation text will continue after a *single* "Annotation:" line
until one of the following conditions is met: a) the file ends, b)
another annotation is started with a "Title:" line, or c) the
annotation reaches the maximum size of 4096 characters.
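An annotation file in this layout can be produced by a script. The Python
sketch below is only an illustration: format_annotation() is a hypothetical
helper, not part of makeztxt or libztxt. It enforces the 20 and 4096 character
limits given above by refusing over-long input. How makeztxt itself treats
over-long values is not described here, so that choice is an assumption.

```python
MAX_TITLE_CHARS = 20    # title limit stated in this README
MAX_TEXT_CHARS = 4096   # annotation size limit stated in this README


def format_annotation(title, offset, text):
    """Return one annotation entry in the Title/Offset/Annotation layout.

    Hypothetical helper, not part of makeztxt or libztxt.
    """
    if len(title) > MAX_TITLE_CHARS:
        raise ValueError("title longer than 20 characters")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError("annotation longer than 4096 characters")
    if offset < 0:
        raise ValueError("offset must not be negative")
    # Assumption: a continuation line starting with "Title:" would be read
    # as the start of the next annotation, so the sketch rejects it.
    if any(line.startswith("Title:") for line in text.split("\n")[1:]):
        raise ValueError("annotation text contains a 'Title:' line")
    return "Title: {}\nOffset: {}\nAnnotation: {}\n".format(title, offset, text)


entries = [
    format_annotation("My Annotation", 12345,
                      "This is the text of my annotation."),
    format_annotation("Second note", 20480,
                      "Another note,\ncontinued on a second line."),
]
print("".join(entries), end="")
```

Remember that the offsets must refer to positions in the reformatted text,
not in the original input file.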
-a/--adjust int -- Control the method of text formatting. Valid types are
0, 1, or 2. Method 0 will compute the average line length through the
entire file and strip newline characters from any line longer than the
average. Method 1 will strip the newline from any line with text in
it. Method 2 will leave the text unchanged. The default is 0.
-b/--length int -- If adjust method 0 is used, the value given with this
option is the length a line must be to have its newline stripped. Using
this option will override the value calculated by makeztxt.
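The three adjust methods and the -b override can be approximated with the
Python sketch below. It is an illustration only, not makeztxt's actual code.
The function reflow() is hypothetical, and several details are assumptions:
that a stripped newline is replaced by a single space, that the average is
taken over every line including blank ones, and that a line qualifies when it
is strictly longer than the threshold.

```python
def reflow(text, method=0, length=None):
    """Approximate the --adjust methods on a string (hypothetical helper)."""
    if method not in (0, 1, 2):
        raise ValueError("method must be 0, 1, or 2")
    if method == 2:
        return text                       # method 2: leave the text unchanged
    lines = text.split("\n")
    if method == 0 and length is None:
        # Assumption: average over every line, blank lines included.
        length = sum(len(line) for line in lines) / len(lines)
    out = []
    for i, line in enumerate(lines):
        if i == len(lines) - 1:
            out.append(line)              # nothing follows the final piece
        elif method == 0 and len(line) > length:
            out.append(line + " ")        # long line: newline becomes a space
        elif method == 1 and line.strip():
            out.append(line + " ")        # any line with text: same treatment
        else:
            out.append(line + "\n")       # keep the newline
    return "".join(out)


sample = "abcd\nab\nabcde\nx"
print(repr(reflow(sample, method=0, length=3)))   # -b style override
print(repr(reflow("a\nb\n\nc", method=1)))
```

Passing length plays the role of the -b option: it replaces the computed
average as the threshold for method 0.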
-h/--help -- Display command line options and usage information.
-l/--list -- Display a list of all bookmarks generated by makeztxt or
specified by the user. This is useful if you want to make sure your
regular expressions are generating correct bookmarks.
-L/--launchable -- Sets the "launchable" attribute in the generated zTXT
database. The Launcher apps on a Palm device can use this attribute and
will display all zTXT documents in the main program listing, allowing you
to launch Weasel and open a specific document by tapping on the document
directly. Default is OFF.
-m/--markfile filename -- Give makeztxt a file containing a pre-generated
list of bookmarks to add to the generated zTXT database. The bookmark
file has a very simple format. Each line begins with an integer offset
for the bookmark anchor. Following that are one or more spaces/tabs.
Last comes the bookmark title, which occupies the remainder of the line up
to a maximum of 20 characters. A line might look like:
23955 Chapter VII
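A line in this format can be read back with a few lines of Python. The sketch
below is an illustration, not code from makeztxt: parse_bookmark_line() is a
hypothetical helper, and cutting over-long titles down to 20 characters is an
assumption about how the limit is applied.

```python
MAX_TITLE_CHARS = 20   # bookmark title limit stated in this README


def parse_bookmark_line(line):
    """Split one bookmark-file line into (offset, title).

    Hypothetical helper, not part of makeztxt.
    """
    # Split once on the first run of spaces or tabs.
    parts = line.rstrip("\n").split(None, 1)
    if len(parts) != 2:
        raise ValueError("expected '<offset> <title>'")
    offset = int(parts[0])                 # raises ValueError if not a number
    # Assumption: an over-long title is cut to the 20 character limit.
    title = parts[1][:MAX_TITLE_CHARS]
    return offset, title


print(parse_bookmark_line("23955 Chapter VII\n"))
```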
-n/--nobackup -- Instructs makeztxt to not set the backup attribute in the
generated zTXT database. This attribute, if set, will cause the database
to be backed up during the next HotSync operation. Default is to set this
attribute.
-o/--output filename -- Explicitly give the output filename which makeztxt
should use. If this filename is not given, makeztxt will generate an
output filename by removing the extension of the input file and replacing
it with "pdb". If makeztxt is reading input from standard input this
option is mandatory.
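The default naming rule can be mirrored in a script that needs to know where
the output will land. This Python sketch is an illustration only:
default_output_name() is a hypothetical helper, and its handling of input
names that have no extension (the new extension is simply appended) is an
assumption.

```python
import os


def default_output_name(input_name, new_ext="pdb"):
    """Replace the extension of `input_name` with `new_ext`.

    Hypothetical helper mirroring the rule described above.
    """
    root, _old_ext = os.path.splitext(input_name)
    return root + "." + new_ext


print(default_output_name("treas10.txt"))        # creating a zTXT
print(default_output_name("treas10.pdb", "txt")) # deconstructing one
```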
-R/--regexfile filename -- makeztxt will attempt to read a default set of
regular expressions from the file .makeztxtrc in the user's home directory
or from /etc/makeztxt.conf if that fails. This option can be used to tell
makeztxt which file to read the list of regexes from instead. This is useful
for users on systems with no home directories.
-r/--regex string -- Supply makeztxt with a regular expression for bookmark
generation. string is a valid regex. This option can be given multiple
times on the command line, each one adding a new regex.
-t/--title string -- Specify the title of the generated zTXT database. The
database title is stored within the database and is the name which will
appear under Palm OS. The title is limited to 32 characters. If makeztxt
is reading input from standard input this option is mandatory.
-V/--version -- Cause makeztxt to print out version information and exit.
-z/--compression int -- Set the method of compression to be used. makeztxt
supports two methods of compression. Method 1 allows for random access
within a zTXT document and is the standard method. Method 2 gives 10-15%
higher compression but requires that the entire document be decompressed
before it can be read by the user. Default is method 1.
5.2. Options used when deconstructing a zTXT database:
-----------------------------------------------------
-d/--deconstruct -- This option tells makeztxt that you wish to deconstruct
a zTXT database. It is required for this mode of operation.
-A/--annofile filename -- Specify the filename into which makeztxt will
store any annotations extracted from the input zTXT file. If this
option is not given, annotations will not be extracted.
-h/--help -- Display command line options and usage information.
-m/--markfile filename -- Specify the filename into which makeztxt will
store any bookmarks extracted from the input zTXT file. If this option is
not given, bookmarks will not be extracted.
-o/--output filename -- Specify the output file in which makeztxt will store the
extracted text data. If this option is not given, makeztxt will generate
a default filename by removing the extension from the input file name and
replacing it with "txt". If makeztxt is reading input from standard input
this option is mandatory.
-V/--version -- Cause makeztxt to print out version information and exit.
6. Compiling makeztxt (for great profit!)
------------------------------------------------------------
makeztxt uses zLib v1.1.3 (http://www.info-zip.org/pub/infozip/zlib).
You will need to have zLib compiled for your HOST machine. All Linux
distributions as well as most other Unices come with zLib, though it is
possible you may be lacking the zLib header files.
You should look in the Makefile to make sure the program names and paths are
okay.
If you are running on Sun hardware, uncomment the PACK line in the Makefile.
makeztxt will not work without this. If you are getting mysterious crashes,
you might want to try this switch as well. However, if you are on an x86
system, you should not enable that flag.
If your system does not have GNU regex (Solaris, Cygwin, others) then
uncomment the USEPOSIX line to cause makeztxt to use POSIX regex.
If you are compiling on a Windows system, or any system which makes a
distinction between text and binary files, you'll need to uncomment the
HAVEBINARYFLAG line in order to get valid output from makeztxt.
Lastly, you can uncomment STATICLIBS to statically link against zlib. This
can be beneficial on Cygwin systems to cut down on the number of DLLs that
need to be distributed.
Now run:
"make"
You should now have makeztxt.
If you're messing with the source, then maybe you want to help. If you have
any problems, feel free to email me at johng@as.arizona.edu . Please use, if
possible, the latest code from the CVS repository. It can be found at:
http://sourceforge.net/projects/gutenpalm
If you would like to submit a bug report or a feature request, please make use
of the facilities on Weasel's SourceForge project page. This allows for much
easier management of bug and feature request tracking. It also ensures that
your report is not forgotten about. The project page is at:
http://sf.net/projects/gutenpalm
7. Miscellaneous Notes
------------------------------------------------------------
** The standard "it runs fine for me" disclaimer applies. I've tested it a
lot myself, but you can never predict everything. Still, there's no
oddball hacking involved so I think the chance of catastrophic Palm
explosion should be small indeed. This is not to say that it won't ever
crash/hang, but if it does...