NAME

FrameMaker::MifTree - A MIF Parser

VERSION

This document describes version 0.075, released 2 May 2006.

SYNOPSIS

  use FrameMaker::MifTree;
  my $mif = FrameMaker::MifTree->new;
  $mif->parse_miffile('filename.mif');
  @strings = $mif->daughters_by_name('String', recurse => 1);
  print $strings[0]->string;
  $strings[3]->string('Just another new string.');
  $mif->dump_miffile('newmif.mif');

DESCRIPTION

The FrameMaker::MifTree class is implemented as a Tree::DAG_Node subclass, and thus inherits all the methods of that class. Two methods are overridden. Please read Tree::DAG_Node to see what other methods are available.

MIF (Maker Interchange Format) is an Adobe FrameMaker file format in ASCII, consisting of statements that create an easily parsed, readable text file of all the text, graphics, formatting, and layout constructs that FrameMaker understands. Because MIF is an alternative representation of a FrameMaker document, it allows FrameMaker and other applications to exchange information while preserving graphics, document content, and format.

This document does not tell you what the syntax of a MIF file is, nor does it document the meaning of the MIF statements. For this, please read (and re-read) the MIF_Reference.pdf, provided by Adobe.

MifTree not only knows the MIF syntax, but it also has some understanding of the allowed structures (within their contexts) and attribute types. The file FrameMaker/MifTree/MifTreeTags holds all the valid MIF statements and the attribute type for every statement. This file may need some improvement, as it is created by analyzing a large collection of MIF files written by FrameMaker (and an automatic analysis of the MIF Reference, which showed several typos and inconsistencies in that manual). The current file is for MIF version 7.00.

Dependencies

This class implementation depends on the following modules, all available from CPAN:

  • Tree::DAG_Node

  • IO::Tokenized, IO::Tokenized::File, and the custom-made IO::Tokenized::Scalar

  • IO::Stringy (only IO::Scalar is needed)

Overridden Methods

add_daughters(LIST)

Adds a list of daughter objects to a node. Unlike the DAG_Node method, it checks for a valid MIF construct. Only the mother/daughter relationship is checked.

attributes(VALUE)

Unlike the DAG_Node equivalent, the attributes method of the FrameMaker::MifTree class does not require a reference as the attribute. In addition, the method checks that it is called on a leaf, since the MIF structure does not allow attributes on non-ending nodes. The method reads or sets the raw attribute; no string conversion, path encoding/decoding or value extraction is done. To obtain or set one of those values, use the specific "Attribute Methods" mentioned below.

Quick Creators

The following methods can be used instead of the DAG_Node standard methods to build your MIF structure. It's just a lazy way of adding daughters, but it improves readability of your code if you create something like:

  my $mif = FrameMaker::MifTree->new->add_node(
    AFrames => FrameMaker::MifTree->add_node(
      Frame => FrameMaker::MifTree->add_node(
        ImportObject => FrameMaker::MifTree->add_leaf(
          ImportObFileDI => encode_path('c:\bar\foo.eps'))
      ),
      FrameMaker::MifTree->add_node(
        ImportObject => FrameMaker::MifTree->add_leaf(
          ImportObFileDI => encode_path('../../foo/boo.eps'))
      )
    )
  );

add_leaf(MIFSTATEMENT, ATTRIBUTE or GRANDDAUGHTERLIST)

Adds a new daughter to the object. The first argument specifies the name; all following arguments are taken either as the attribute for the leaf, or as a list of granddaughter objects to add to the newly created daughter. (In the MifTree world, newly born daughters mature in split seconds.)

add_node(MIFSTATEMENT, ATTRIBUTE or GRANDDAUGHTERLIST)

An exact synonym for the add_leaf method.

add_facet()

Adds a facet to the object. In DAG_Node tree terms, this is implemented as a leaf with the name "_facet" and a filehandle to a temp file as its attribute.

Search in Tree

$OBJ->daughters_by_name(NAMESTRING, recurse => BOOLEAN)

Finds all daughters named NAMESTRING, either by walking the whole tree below $OBJ ("recurse" is true) or by looking only at the object's own daughters ("recurse" is false or omitted). Omitting "recurse" throws a warning that the search will not recurse; I've spent too much time debugging code where I forgot to add the "recurse" parameter. Returns the first object in scalar context, or a list of all found objects in list context.

Maybe one day I'll add magic to this function so you get the next item if you call the method on the same object without arguments.

Note that "daughter_by_name" is an exact alias for this method.
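The context-dependent return value and the effect of "recurse" can be pictured with a small stand-alone sketch. It does not use FrameMaker::MifTree at all: the hash-based tree and the name sketch_daughters_by_name are hypothetical stand-ins, and the real method may well be implemented differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a MIF tree: plain hashes, not MifTree objects.
my $tree = {
    name      => 'Para',
    daughters => [
        {   name      => 'ParaLine',
            daughters => [
                { name => 'String', attr => 'Hello', daughters => [] },
                { name => 'String', attr => 'World', daughters => [] },
            ],
        },
    ],
};

sub sketch_daughters_by_name {
    my ($node, $name, %opt) = @_;
    my @found;
    for my $daughter (@{ $node->{daughters} }) {
        push @found, $daughter if $daughter->{name} eq $name;
        # Only descend further when the caller asked for it.
        push @found, sketch_daughters_by_name($daughter, $name, %opt)
            if $opt{recurse};
    }
    # First match in scalar context, every match in list context.
    return wantarray ? @found : $found[0];
}

my @all   = sketch_daughters_by_name($tree, 'String', recurse => 1);
my $first = sketch_daughters_by_name($tree, 'String', recurse => 1);
my @near  = sketch_daughters_by_name($tree, 'String');

printf "%d with recurse, %d without, first is %s\n",
    scalar @all, scalar @near, $first->{attr};
```

Without "recurse" the sketch finds nothing here, because the String leaves are granddaughters of the Para node, not daughters.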

$OBJ->daughter_by_name(NAMESTRING, recurse => BOOLEAN)

Alias for "daughters_by_name".

$OBJ->daughters_by_name_and_attr(NAMESTRING, ATTRIBUTE, recurse => BOOLEAN)

Finds all daughters named NAMESTRING that have the raw attribute ATTRIBUTE, either by walking the whole tree below $OBJ ("recurse" is true) or by looking only at the object's own daughters ("recurse" is false or omitted). Returns the first object in scalar context, or a list of all found objects in list context. ATTRIBUTE must be raw data, so use quote, unquote, encode_path and decode_path as appropriate.

If you specify an empty string or undef as the NAMESTRING, this method will just look for ATTRIBUTE.

Note that "daughter_by_name_and_attr" is an exact alias for this method.

$OBJ->daughter_by_name_and_attr(NAMESTRING, ATTRIBUTE, recurse => BOOLEAN)

Alias for "daughters_by_name_and_attr".

$OBJ->find_string(QUOTED_REGEX)

Returns a list of all strings that match QUOTED_REGEX under $OBJ. When called in scalar context, only the first match is returned. The string is in Unicode if the global modifier FrameMaker::MifTree->use_unicode is set (off by default).

$OBJ->charleaves_to_strings()

Changes all the leaves with the name "Char" below $OBJ to their equivalent String leaves. This has no effect on the content of the MIF file; it just makes the file less ambiguous. Returns undef.

$OBJ->fold_strings()

This method folds all consecutive paragraph lines in a paragraph into one paragraph line. If you want to operate on text, first call this method on (part of) the tree. MIF records how text flows over the lines, but FrameMaker does not use this information when it parses the MIF file, so it is safe to remove it. Returns undef.

All "Char" leaves except a "HardReturn" are transformed to their string equivalents. A "HardReturn" character forces a new paragraph line.

Attribute Methods

$OBJ->string(STRING)

Reads or sets the object's attribute as a MIF string. The method just calls quote and unquote as appropriate.

If the global modifier FrameMaker::MifTree->use_unicode is set to true, the string will be converted from Unicode to the FrameMaker character set first. (The method now throws a warning when you specify USE_UNICODE as the second argument.)

$OBJ->pathname(PATHSTRING)

Returns the object's attribute as local pathname, or sets it to the device independent pathname. The method just calls encode_path and decode_path as appropriate. PATHSTRING must also be a local pathname.

$OBJ->abs_pathname(FROMROOT)

Returns the object's attribute as a local pathname. The method just calls decode_path, passing on the FROMROOT argument. Use this method if you want to make sure that you always receive absolute pathnames, independently from what is stored in the attribute.

$OBJ->boolean(BOOLEAN)

Returns or sets the object's TRUE or FALSE value.

$OBJ->measurements(LIST)

Returns or sets a list of measurements. When called in scalar context, only the first measurement is returned. Everything is in the default unit of measurement, which can be set using FrameMaker::MifTree->default_unit. If it is set to the empty string (which is also the default), values are in points. You always get the values without the unit specifier, so you can calculate with them directly. To get a value from the list, do something like:

  my $q;
  $q = FrameMaker::MifTree->new->add_leaf(
    PgfCellMargins => "0.0 pt 1.0 pt 2.0 pt 3.0 pt"
  );
  my $k = ($q->measurements)[1];
  print "k is now: $k\n";           # prints "k is now: 1"

In MIF, a maximum of four values can be supplied, but this is never checked by this method.

$OBJ->percentage(FRACTION)

Returns or sets the object's percentage value as a fraction (1 = 100%).

$OBJ->facet_data()

Returns the object's facet data as a list of lines. (Use syswrite on facet_handle to set the object's data. Not a very elegant implementation, but I consider a facet to be rather esoteric, and we have to be efficient on memory usage as well...)

$OBJ->facet_handle()

Returns the filehandle to the object's facet data. Since the temporary file is sysopened, you should use syswrite instead of print to respect the buffering considerations.

FrameMaker::MifTree->default_unit(UNIT)

This class method returns or sets the global default units of measurement. See convert for a list of valid assignments.

FrameMaker::MifTree's default units of measurement can (and probably will) differ from the default <Units> that are specified in the MIF file.

The default for default_unit is an empty string, which means that no unit specifier will be output, and all values are in "points".

FrameMaker::MifTree->use_unicode(BOOLEAN)

This class method returns or sets whether strings are in Unicode.

Note on Unicode mapping: Most FrameMaker characters map easily to a Unicode equivalent. This is not true, however, for the discretionary hyphen (hexadecimal 04, <Char DiscHyphen>), the FrameMaker "soft hyphen" (hexadecimal 06, <Char SoftHyphen>), and the "do not hyphenate" character (hexadecimal 05, <Char NoHyphen>).

The discretionary hyphen has a null default appearance in the middle of a line. At any intraword break that is used for a line break a hyphen glyph will be shown. Oddly enough this is defined in Unicode as a soft hyphen, and so it maps to the soft hyphen (U+00AD) character.

The soft hyphen in FrameMaker is used for automatically inserted hyphens by the FrameMaker hyphenation algorithm. It has no meaning in the MIF, since FrameMaker will reflow a document upon import. But to preserve it in the Unicode string, it is mapped to the Unicode hyphen character (U+2010). You should remove it with tr/\x{2010}//d if you don't want it.

The NoHyphen is a real control character that just prevents a word from being hyphenated automatically by FrameMaker. To preserve this character when doing a to and fro conversion, I decided to map it to the Unicode zero-width joiner (U+200D).

Everything is controlled from the MifTree/FmCharset file, so make changes there if you don't like my choices. Or better, override the %fmcharset hash.
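The three hyphen mappings described above can be summarized in a short stand-alone sketch. The hash and function names are hypothetical and the code does not read the MifTree/FmCharset file; it only mirrors the choices listed in the text, so treat it as an illustration, not as the module's conversion code.

```perl
use strict;
use warnings;

# Assumed mapping, copied from the description above.
my %hyphen_to_unicode = (
    "\x04" => "\x{00AD}",    # DiscHyphen -> Unicode soft hyphen
    "\x05" => "\x{200D}",    # NoHyphen   -> zero-width joiner
    "\x06" => "\x{2010}",    # SoftHyphen -> Unicode hyphen
);
my %unicode_to_hyphen = reverse %hyphen_to_unicode;

sub sketch_hyphens_to_unicode {
    my ($fm_string) = @_;
    $fm_string =~ s/([\x04-\x06])/$hyphen_to_unicode{$1}/g;
    return $fm_string;
}

sub sketch_hyphens_from_unicode {
    my ($unicode) = @_;
    $unicode =~ s/([\x{00AD}\x{200D}\x{2010}])/$unicode_to_hyphen{$1}/g;
    return $unicode;
}

my $fm_string = "hy\x06phen\x04ation";
my $unicode   = sketch_hyphens_to_unicode($fm_string);

# Drop the automatically inserted hyphens, as suggested in the text.
(my $cleaned = $unicode) =~ tr/\x{2010}//d;

printf "%d characters before cleaning, %d after\n",
    length($unicode), length($cleaned);
print "round trip ok\n"
    if sketch_hyphens_from_unicode($unicode) eq $fm_string;
```

Because each FrameMaker character gets its own Unicode code point, the mapping can be reversed without loss, which is the point of preserving the soft hyphen and NoHyphen at all.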

Tests on Tree Object

$OBJ->is_node()

Tests if the object is a valid MIF node statement. That is, if its name occurs in the %mifnodes hash. Returns a list of valid daughters when a match is found. (In my terminology, "nodes" can have daughters, whereas leaves don't.)

$OBJ->is_leaf()

Tests if the object is a valid MIF leaf statement and thus can have an attribute value. The name is just looked up in the %mifleaves hash.

$OBJ->allows_daughter(DAUGHTEROBJECT)

Checks if a mother object can have a specific daughter object. I just thought this could come in handy when you want to bind one object tree to another.

$OBJ->check_attribute

Checks if the attribute conforms to the type. Currently the following types are defined:

  0xnnn
  ID
  L_T_R_B
  L_T_W_H
  W_H
  W_W
  X_Y
  X_Y_W_H
  boolean
  data
  degrees
  dimension
  empty *)
  integer
  keyword
  number
  pathname
  percentage
  seconds_microseconds
  string
  tagstring
  *) no attribute allowed; some leaves and all non-ending nodes have this

The function returns TRUE if the attribute seems valid, and FALSE if there is an error. Use get_attribute_error to see the error.

$OBJ->get_attribute_error

Returns a meaningful text string if the attribute appears to be invalid.

$OBJ->validate(FROMROOT)

Not yet implemented.

Validates a MIF tree object. If you set FROMROOT to true, the validation starts from $OBJ->root, and special checking is done on the root object. This special behaviour is needed because the method cannot know if a FrameMaker::MifTree object is to represent a complete MIF file, and not just a fragment. So please remember always to set FROMROOT if you want to validate a complete MIF tree, even if $OBJ already points to the root object.

From/to MIF Syntax

LIST = $OBJ->dump_mif()

Dumps out the current tree as a list of MIF statements in valid MIF file syntax. You can write the resulting list to a file. The method tries to mimic the Adobe MIF parser file layout as closely as possible. Please note that this method can be memory intensive, since it creates a whole new copy of your MIF tree in memory. If you just want to write the MIF tree to a file, you may want to use dump_miffile instead.
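To give an idea of what "a list of MIF statements" means, here is a minimal stand-alone sketch that turns a nested structure into lines of MIF text. The hash-based tree, the function name sketch_dump_mif and the exact indentation and "# end of" comments are assumptions made for illustration; the real method works on FrameMaker::MifTree objects and its layout may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in tree; not a FrameMaker::MifTree object.
my $para = {
    name      => 'Para',
    daughters => [
        {   name      => 'ParaLine',
            daughters => [ { name => 'String', attr => q{`Hello World'} } ],
        },
    ],
};

sub sketch_dump_mif {
    my ($node, $depth) = @_;
    $depth //= 0;
    my $indent    = ' ' x $depth;
    my @daughters = @{ $node->{daughters} || [] };

    # A leaf is written on one line together with its raw attribute.
    if (!@daughters) {
        my $attr = defined $node->{attr} ? " $node->{attr}" : '';
        return ("$indent<$node->{name}$attr>");
    }

    # A node opens, lists its daughters one level deeper, then closes.
    my @lines = ("$indent<$node->{name}");
    push @lines, sketch_dump_mif($_, $depth + 1) for @daughters;
    push @lines, "$indent> # end of $node->{name}";
    return @lines;
}

print "$_\n" for sketch_dump_mif($para);
```

The sketch builds the complete list of lines before anything is printed, which is the memory cost the text warns about for large trees.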

LIST = $OBJ->dump_miffile(FILENAME)

Writes the current tree of MIF statements to FILENAME in valid MIF file syntax. The method returns a FALSE result if the file cannot be written.

$OBJ->parse_mif(STRING)

Parses a string of MIF statements into the object. This is also a very quick way to set up an object tree:

  my $new_obj = FrameMaker::MifTree->new();
  $new_obj->parse_mif(<<ENDMIF);
  <MIFFile 7.00># The only required statement
  <Para # Begin a paragraph
  <ParaLine# Begin a line within the paragraph
  <String `Hello World'># The actual text of this document
  > # End of ParaLine statement
  > # End of Para statement
  ENDMIF

This is implemented by tying the scalar to a filehandle and running IO::Tokenized on the resulting handle.

The parser currently has the following limitations:

  • All comments are lost.

  • Macro statements are not (yet) implemented.

  • Include statements are not (yet) implemented.

Maybe I'll do something about it. Someday.
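The general idea of tokenizing MIF text and building a tree from it can be sketched without any of the modules involved. The code below is a simplified illustration, not the module's parser: the function name sketch_parse_mif and the hash-based nodes are hypothetical, it knows nothing about valid statement names, and, like the real parser, it throws comments away.

```perl
use strict;
use warnings;

sub sketch_parse_mif {
    my ($text) = @_;
    my $root  = { name => 'ROOT', daughters => [] };
    my @stack = ($root);

    while ( $text =~ m{\G\s*(?:
          (\#[^\n]*)            # 1: comment, up to the end of the line
        | <(\w+)                # 2: opening of a statement
        | (>)                   # 3: closing of the innermost statement
        | (`(?:[^'\\]|\\.)*')   # 4: quoted MIF string
        | ([^\s<>`\#]+)         # 5: any other bare value
    )}gcx ) {
        my ($comment, $open, $close, $string, $bare) = ($1, $2, $3, $4, $5);
        if ( defined $comment ) {
            next;    # comments are lost
        }
        elsif ( defined $open ) {
            my $node = { name => $open, daughters => [] };
            push @{ $stack[-1]{daughters} }, $node;
            push @stack, $node;
        }
        elsif ( defined $close ) {
            die "Unbalanced closing bracket\n" if @stack == 1;
            pop @stack;
        }
        else {
            # Collect the raw attribute text on the innermost statement.
            my $value = defined $string ? $string : $bare;
            my $node  = $stack[-1];
            $node->{attr}
                = defined $node->{attr} ? "$node->{attr} $value" : $value;
        }
    }
    $text =~ /\G\s*\z/gc or die "Cannot tokenize the rest of the input\n";
    die "Unclosed statement $stack[-1]{name}\n" if @stack > 1;
    return $root;
}

my $tree = sketch_parse_mif(<<'ENDMIF');
<MIFFile 7.00> # The only required statement
<Para
 <ParaLine
  <String `Hello World'>
 >
>
ENDMIF

my $string = $tree->{daughters}[1]{daughters}[0]{daughters}[0];
print "$string->{name}: $string->{attr}\n";
```

A stack of open statements is enough here because MIF statements nest strictly: every ">" closes the statement opened most recently.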

$OBJ->parse_miffile(FILENAME)

Parses a file from disk into a DAG_Node tree structure. See parse_mif for details.

Old-style Functions

All these functions are exported by default.

quote(STRING)

Quotes a string with MIF style quotes, and escapes forbidden characters. Backslashes, backticks, single quotes, greater-than signs and tabs are escaped; non-ASCII values are written in their hexadecimal representation. So:

  Some `symbols': > \Ø¿!

is written as

  `Some \Qsymbols\q: \> \\\xaf \xc0 !'

As a special case, escaped hexadecimals are preserved in the input string. If you want a literal \x00 string, precede it with an extra backslash.

  print quote("\x09 ");     # prints `\x09 ', a forced return in FrameMaker
  print quote("\\x09 ");    # prints `\\x09 '; this will show up literally
                            # as \x09 in FrameMaker

(Note that after emitting a forced return, you must start a new ParaLine.)

If the global modifier $FrameMaker::MifTree::use_unicode is true, the string will be converted from Unicode to the FrameMaker character set.

unquote(STRING)

Performs the opposite action of quote. Surrounding quotes are removed and all escape sequences are converted back into their original characters.

If the global modifier $FrameMaker::MifTree::use_unicode is true, the string will be converted from the FrameMaker character set to Unicode.

$FrameMaker::MifTree::use_unicode can be exported on request.
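The escaping done by quote and undone by unquote can be illustrated with a reduced stand-alone sketch. It covers only the backslash, backtick, single quote and greater-than escapes; tabs, hexadecimal escapes and the FrameMaker character set conversion are deliberately left out. The names sketch_quote and sketch_unquote are hypothetical and this is not the code the module uses.

```perl
use strict;
use warnings;

# MIF escape sequences for four of the forbidden characters.
my %escape   = ( '\\' => '\\\\', '`' => '\\Q', "'" => '\\q', '>' => '\\>' );
my %unescape = ( '\\' => '\\',   'Q' => '`',   'q' => "'",   '>' => '>' );

sub sketch_quote {
    my ($plain) = @_;
    $plain =~ s/([\\`'>])/$escape{$1}/g;
    return '`' . $plain . "'";
}

sub sketch_unquote {
    my ($quoted) = @_;
    $quoted =~ s/\A`//;                          # strip the opening quote
    $quoted =~ s/'\z//;                          # strip the closing quote
    $quoted =~ s/\\([\\Qq>])/$unescape{$1}/g;    # undo the escapes
    return $quoted;
}

my $plain  = q{Some `symbols': > \\};
my $quoted = sketch_quote($plain);

print "$quoted\n";
print sketch_unquote($quoted) eq $plain
    ? "round trip ok\n"
    : "round trip failed\n";
```

Both directions use a single left-to-right substitution, so an escaped backslash is never mistaken for the start of another escape sequence.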

encode_path(STRING)

Encodes path names to the MIF path syntax. Usage:

   $mifPathString = encode_path('D:\Dos\Path\With\Backslashes\Filename');
   $mifPathString = encode_path('..\..\Also\Relative\Path\Is\Allowed\Filename');

The path name must not be in a MIF quoted style. It returns the device independent path name with the quotes.
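As an illustration of the device independent syntax, the following stand-alone sketch encodes the two kinds of Windows path shown above. It is an assumption-laden approximation: the name sketch_encode_path is hypothetical, the output format is inferred from the decode_path usage examples in this document, and only drive letters, ".." and ordinary components are handled.

```perl
use strict;
use warnings;

sub sketch_encode_path {
    my ($path) = @_;
    my $mif = '';
    for my $part ( grep { length } split m{[\\/]}, $path ) {
        if ( $part eq '..' ) {
            $mif .= '<u\>';             # up one level
        }
        elsif ( $part =~ /\A[A-Za-z]:\z/ ) {
            $mif .= '<v\>' . $part;     # volume (drive letter)
        }
        else {
            $mif .= '<c\>' . $part;     # ordinary path component
        }
    }
    return '`' . $mif . "'";            # returned with MIF quotes
}

print sketch_encode_path('D:\Dos\Path\Filename'), "\n";
print sketch_encode_path('..\..\Relative\Filename'), "\n";
```

Splitting on both kinds of slash means the sketch accepts forward slashes as well, but it does not attempt Unix absolute paths or host names.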

decode_path(STRING, [ROOTPATH])

Usage:

   print decode_path ('<v\>C:<c\>Mydir<c\>Subdir<c\>Filename');
   # prints C:/Mydir/Subdir/Filename
   print decode_path ('<u\><u\><c\>Subdir<c\>Filename');
   # prints ../../Subdir/Filename

Currently only Windows path names are supported (meaning that Unix and MacOS style paths remain untested). MIF string quotes are removed. ROOTPATH, if specified, is the path that is prepended if STRING happens to be a relative path.
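The decoding shown in the usage examples can be approximated with a short stand-alone sketch. The name sketch_decode_path is hypothetical; the code handles only the volume, component and up-one-level codes that appear in the examples, and it prepends ROOTPATH as-is without normalizing the result, which may differ from what the module does.

```perl
use strict;
use warnings;

sub sketch_decode_path {
    my ($mif_path, $root) = @_;
    $mif_path =~ s/\A`//;    # remove MIF string quotes, if present
    $mif_path =~ s/'\z//;

    my @parts;
    my $is_absolute = 0;
    while ( $mif_path =~ /<([cuv])\\>([^<]*)/g ) {
        my ($code, $text) = ($1, $2);
        # A leading volume marks the path as absolute.
        $is_absolute = 1 if $code eq 'v' && !@parts;
        push @parts, $code eq 'u' ? '..' : $text;
    }

    my $path = join '/', @parts;
    $path = $root . '/' . $path if defined $root && !$is_absolute;
    return $path;
}

print sketch_decode_path('<v\>C:<c\>Mydir<c\>Subdir<c\>Filename'), "\n";
print sketch_decode_path('<u\><u\><c\>Subdir<c\>Filename'), "\n";
print sketch_decode_path('<u\><c\>Filename', 'C:/Docs/Book'), "\n";
```

The first two calls reproduce the results given in the usage section; the third shows how a root path would be applied to a relative path under the assumptions above.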

convert(VALUE_AND_OLDUNIT, NEWUNIT, SUPPRESSUNIT)

Converts a value in one unit of measurement into another. If you leave out the unit of measurement, it defaults to FrameMaker::MifTree->default_unit (not to the MIF document's default unit of measurement!). The recognized units, with their size in inches, are:

  {
    pt         => 1 / 72,
    point      => 1 / 72,
    '"'        => 1,
    in         => 1,
    mm         => 1 / 25.4,
    millimeter => 1 / 25.4,
    cm         => 1 / 2.54,
    centimeter => 1 / 2.54,
    pc         => 1 / 6,
    pica       => 1 / 6,
    dd         => 0.01483,
    didot      => 0.01483,
    cc         => 12 * 0.01483,
    cicero     => 12 * 0.01483
  }

The optional argument SUPPRESSUNIT determines if the unit of measurement needs to be written in the result. Note that you won't get a unit of measurement included in your result when you leave out NEWUNIT and specify FrameMaker::MifTree->default_unit to be the empty string, even if you set SUPPRESSUNIT to be false. In that case the returned value is in points. So

  FrameMaker::MifTree->default_unit('');
  print convert('12.0 didot');            # prints the value in points: 12.8131
  FrameMaker::MifTree->default_unit('mm');
  print convert('12.0 didot', 'pt', 1);   # also prints 12.8131
  FrameMaker::MifTree->default_unit('pt');
  print convert('12.0 didot', '', 1);     # also prints 12.8131

All values are rounded to 4 decimals.
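The arithmetic behind these results is a conversion through inches using the factors from the table. The stand-alone sketch below reproduces it under simplifying assumptions: the name sketch_convert is hypothetical, a missing unit is always treated as points instead of consulting default_unit, and the unit is left off whenever NEWUNIT is omitted.

```perl
use strict;
use warnings;

# Size of each unit in inches, copied from the table above.
my %inches_per = (
    pt  => 1 / 72,       point      => 1 / 72,
    '"' => 1,            in         => 1,
    mm  => 1 / 25.4,     millimeter => 1 / 25.4,
    cm  => 1 / 2.54,     centimeter => 1 / 2.54,
    pc  => 1 / 6,        pica       => 1 / 6,
    dd  => 0.01483,      didot      => 0.01483,
    cc  => 12 * 0.01483, cicero     => 12 * 0.01483,
);

sub sketch_convert {
    my ($value_and_unit, $new_unit, $suppress_unit) = @_;
    my ($value, $old_unit)
        = $value_and_unit =~ /\A\s*([-+]?[\d.]+)\s*(\S*)\s*\z/
        or die "Cannot parse measurement: $value_and_unit\n";

    # Assumption: points stand in for the module's default unit.
    my $show_unit = !$suppress_unit && defined $new_unit && $new_unit ne '';
    $old_unit = 'pt' if $old_unit eq '';
    $new_unit = 'pt' if !defined $new_unit || $new_unit eq '';
    for my $unit ($old_unit, $new_unit) {
        die "Unknown unit: $unit\n" unless exists $inches_per{$unit};
    }

    # Go through inches, then round to 4 decimals.
    my $result = $value * $inches_per{$old_unit} / $inches_per{$new_unit};
    $result = sprintf('%.4f', $result) + 0;
    return $show_unit ? "$result $new_unit" : $result;
}

print sketch_convert('12.0 didot'), "\n";
print sketch_convert('12.0 didot', 'pt', 1), "\n";
print sketch_convert('1 in', 'mm'), "\n";
```

For 12 didot the sketch computes 12 × 0.01483 × 72 = 12.81312 points, which rounds to the 12.8131 shown in the examples above.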

SEE ALSO

  • Adobe's MIF_Reference.pdf, included in FrameMaker's online documentation.

  • http://www.miffy.com, as this module was formerly called Miffy.pm

AUTHOR

Roel van der Steen, roel-perl@st2x.net

COPYRIGHT AND LICENSE

Copyright 2004 by ITP and Roel van der Steen

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.