NAME

GraphViz2::Marpa::Lexer - A Perl lexer for Graphviz dot files. Output goes to GraphViz2::Marpa::Parser.

Synopsis

o Display help
        perl scripts/lex.pl   -h
        perl scripts/parse.pl -h
        perl scripts/g2m.pl   -h
o Run the lexer
        perl scripts/lex.pl -input_file x.gv -lexed_file x.lex

        x.gv is a Graphviz dot file. x.lex will be a CSV file of lexed tokens.
o Run the parser without running the lexer or the default renderer
        perl scripts/parse.pl -lexed_file x.lex -parsed_file x.parse

        x.parse will be a CSV file of parsed tokens.
o Run the parser and the default renderer
        perl scripts/parse.pl -lexed_file x.lex -parsed_file x.parse -output_file x.rend

        x.rend will be a Graphviz dot file.
o Run the lexer, parser and default renderer
        perl scripts/g2m.pl -input_file x.gv -lexed_file x.lex -parsed_file x.parse -output_file x.rend

Description

GraphViz2::Marpa::Lexer provides a Set::FA::Element-based lexer for http://www.graphviz.org/ dot files.

The output is intended to be input into GraphViz2::Marpa::Parser.

Demo lexer/parser output: http://savage.net.au/Perl-modules/html/graphviz2.marpa/index.html.

State Transition Table: http://savage.net.au/Perl-modules/html/graphviz2.marpa/default.stt.html.

Command line options and object attributes: http://savage.net.au/Perl-modules/html/graphviz2.marpa/code.attributes.html.

My article on this set of modules: http://www.perl.com/pub/2012/10/an-overview-of-lexing-and-parsing.html.

The Marpa grammar as an image: http://savage.net.au/Ron/html/graphviz2.marpa/Marpa.Grammar.svg. This image was created with Graphviz via GraphViz2.

Installation

Install GraphViz2::Marpa as you would for any Perl module:

Run:

        cpanm GraphViz2::Marpa

or run:

        sudo cpan GraphViz2::Marpa

or unpack the distro, and then either:

        perl Build.PL
        ./Build
        ./Build test
        sudo ./Build install

or:

        perl Makefile.PL
        make (or dmake or nmake)
        make test
        make install

Constructor and Initialization

new() is called as my($lexer) = GraphViz2::Marpa::Lexer -> new(k1 => v1, k2 => v2, ...).

It returns a new object of type GraphViz2::Marpa::Lexer.

Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. description([$graph])]):

o description => $graphDescription

Read the Graphviz (dot) graph definition from the command line.

You are strongly encouraged to surround this string with '...' to protect it from your shell.

See also the 'input_file' option to read the description from a file.

The 'description' option takes precedence over the 'input_file' option.

Default: ''.

o input_file => $aDotInputFileName

Read the Graphviz (dot) graph definition from a file.

See also the 'description' option to read the graph definition from the command line.

The 'description' option takes precedence over the 'input_file' option.

Default: ''.

See the distro for data/*.gv.

o lexed_file => $aLexedOutputFileName

Specify the name of a CSV file of lexed tokens to write. This file can be input to the parser.

Default: ''.

The default means the file is not written.

See the distro for data/*.lex.

o logger => $aLoggerObject

Specify a logger compatible with Log::Handler, for the lexer to use.

Default: A logger of type Log::Handler which writes to the screen.

To disable logging, just set 'logger' to the empty string (not undef).

o maxlevel => $logOption1

This option affects Log::Handler.

See the Log::Handler::Levels docs.

Default: 'notice'.

o minlevel => $logOption2

This option affects Log::Handler.

See the Log::Handler::Levels docs.

Default: 'error'.

No lower levels are used.

o report_items => $Boolean

Log the items recognised by the lexer.

Default: 0.

o report_stt => $Boolean

Log the State Transition Table.

Calls "report()" in Set::FA::Element. Set min and max log levels to 'info' for this.

Default: 0.

o stt_file => $sttFileName

Specify which file contains the State Transition Table.

Default: ''.

The default value means the STT is read from the source code of GraphViz2::Marpa::Lexer.

Candidate files are '' and 'data/default.stt.csv'.

The type of this file must be specified by the 'type' option.

If the file name matches /csv$/, the value of the 'type' option is set to 'csv'.

o timeout => $seconds

Run the DFA for at most this many seconds.

Default: 10.

o type => $type

Specify the type of the stt_file: '' for internal STT and 'csv' for CSV.

Default: ''.

The default value means the STT is read from the source code of GraphViz2::Marpa::Lexer.

This option must be used with the 'stt_file' option.

Warning: The 'ods' option is disabled, because I can find no way in LibreOffice to make it operate in ASCII. What happens is that when you type " (i.e. the double-quote character on the keyboard), LibreOffice inserts a different double-quote character, which, when exported as CSV in Unicode format, produces these 3 bytes: 0xe2, 0x80, 0x9c. This means that if you edit the STT, you absolutely must export to a CSV file in ASCII format. It also means that dot identifiers in (normal) double-quotes will never match the double-quotes in the *.ods file.

Methods

description([$graph])

The [] indicate an optional parameter.

Get or set the Graphviz (dot) graph definition.

The value supplied by the 'description' option takes precedence over the value read from the 'input_file'.

See also "input_file()".

'description' is a parameter to "new()". See "Constructor and Initialization" for details.

generate_lexed_file($file_name)

Write the lexed tokens to the named file.

Called as needed by run().

get_graph_from_command_line()

If the caller has requested a graph be parsed from the command line, with the 'description' option to "new()", get it now.

Called as appropriate by "run()".

get_graph_from_file()

If the caller has requested a graph be parsed from a file, with the 'input_file' option to "new()", get it now.

Called as appropriate by "run()".

graph_text([$graph])

The [] indicate an optional parameter.

Get or set the value of the Graphviz (dot) graph definition string.

Called by "get_graph_from_command_line()" and "get_graph_from_file()".

input_file([$graph_file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the file to read the Graphviz (dot) graph definition from.

The value supplied by the 'description' option takes precedence over the value read from the 'input_file'.

See also the "description()" method.

'input_file' is a parameter to "new()". See "Constructor and Initialization" for details.

items()

Returns an arrayref of lexed tokens. Each element of this arrayref is a hashref.

These lexed tokens do not bear a one-to-one relationship to the parsed tokens returned by the "items()" method in GraphViz2::Marpa::Parser. However, they are (necessarily) very similar.

If you provide an output file by using the 'lexed_file' option to "new()", or the "lexed_file()" method, the file will have 2 columns, type and value.

E.g.: If the arrayref looks like:

        ...
        {count => 10, name => '', type => 'open_bracket'   , value => '['},
        {count => 11, name => '', type => 'attribute_id'   , value => 'color'},
        {count => 12, name => '', type => 'equals'         , value => '='},
        {count => 13, name => '', type => 'attribute_value', value => 'red'},
        {count => 14, name => '', type => 'close_bracket'  , value => ']'},
        ...

then the output file will look like:

        "type","value"
        ...
        open_bracket    , "["
        attribute_id    , "color"
        equals          , "="
        attribute_value , "red"
        close_bracket   , "]"
        ...

If you look at the source code for the run() method in GraphViz2::Marpa, you'll see this arrayref can be passed directly as the value of the 'tokens' key in the call to GraphViz2::Marpa::Parser's new().

Usage:

        my($lexer) = GraphViz2::Marpa::Lexer -> new(...);

        # $lexer -> items actually returns an object of type Set::Array.

        if ($lexer -> run == 0)
        {
                my(@items) = @{$lexer -> items};
        }

See also "How is the lexed graph stored in RAM?" in the "FAQ" below. And see any data/*.lex file for sample data.
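
The mapping from the arrayref to the 2-column file can be sketched in plain Perl. This is only an illustration of the layout shown above, not the code used by generate_lexed_file(): the helper name format_items() is hypothetical, the padding width is an assumption (the longest type name), and no escaping of embedded double-quotes is attempted.

```perl
use strict;
use warnings;

# Sample tokens in the hashref format shown above.
my $items =
[
    {count => 10, name => '', type => 'open_bracket',    value => '['},
    {count => 11, name => '', type => 'attribute_id',    value => 'color'},
    {count => 12, name => '', type => 'equals',          value => '='},
    {count => 13, name => '', type => 'attribute_value', value => 'red'},
    {count => 14, name => '', type => 'close_bracket',   value => ']'},
];

# Hypothetical helper: format tokens as 'type , "value"' lines,
# padding each type to the width of the longest type name.
sub format_items
{
    my($items) = @_;
    my($width) = 0;

    for my $item (@$items)
    {
        $width = length($$item{type}) if (length($$item{type}) > $width);
    }

    # The heading line, followed by 1 line per token.
    my(@lines) = ('"type","value"');

    push @lines, sprintf('%-*s , "%s"', $width, $$_{type}, $$_{value}) for @$items;

    return \@lines;
}

print map{"$_\n"} @{format_items($items)};
```

Only the type and value keys reach the file; count and name are dropped.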

And now for a real graph:

Input: data/15.gv:

        digraph graph_15
        {
                node
                [
                        shape = "record"
                ]
                edge
                [
                        color = "red"
                        penwidth = 5
                ]
                node_15_1
                [
                        label = "<f0> left|<f1> middle|<f2> right"
                ]
                node_15_2
                [
                        label = "<f0> one|<f1> two"
                ]
                node_15_1:f0 -> node_15_2:f1
                [
                        arrowhead = "obox"
                ]
        }

Output: data/15.lex:

        "type","value"
        strict              , "no"
        digraph             , "yes"
        graph_id            , "graph_15"
        start_scope         , "1"
        class_id            , "node"
        open_bracket        , "["
        attribute_id        , "shape"
        equals              , "="
        attribute_value     , "record"
        close_bracket       , "]"
        class_id            , "edge"
        open_bracket        , "["
        attribute_id        , "color"
        equals              , "="
        attribute_value     , "red"
        attribute_id        , "penwidth"
        equals              , "="
        attribute_value     , "5"
        close_bracket       , "]"
        node_id             , "node_15_1"
        open_bracket        , "["
        attribute_id        , "label"
        equals              , "="
        attribute_value     , "<f0> left|<f1> middle|<f2> right"
        close_bracket       , "]"
        node_id             , "node_15_2"
        open_bracket        , "["
        attribute_id        , "label"
        equals              , "="
        attribute_value     , "<f0> one|<f1> two"
        close_bracket       , "]"
        node_id             , "node_15_1"
        open_bracket        , "["
        attribute_id        , "port_id"
        equals              , "="
        attribute_value     , "f0"
        close_bracket       , "]"
        edge_id             , "->"
        node_id             , "node_15_2"
        open_bracket        , "["
        attribute_id        , "port_id"
        equals              , "="
        attribute_value     , "f1"
        attribute_id        , "arrowhead"
        equals              , "="
        attribute_value     , "obox"
        close_bracket       , "]"
        end_scope           , "1"

Note the pair:

        open_bracket        , "["
        ...
        close_bracket       , "]"

They start and end each set of attributes, which are of 3 types:

o Node

Node attributes can be specified either at the class level (for all subsequent nodes) or for a specific node.

Class:

        node
        [
                shape = "record" # Attribute.
        ]

Node:

        node_15_1
        [
                label = "<f0> left|<f1> middle|<f2> right" # Attribute.
        ]

Edge:

        node_15_1:f0 -> node_15_2:f1 # Attributes.
        [
                arrowhead = "obox"
        ]
o Edge

Edge attributes can be specified both at the class level and after the second of 2 nodes on an edge.

        edge
        [
                color = "red" # Attribute.
                penwidth = 5  # Attribute.
        ]

and

        node_15_1:f0 -> node_15_2:f1
        [
                arrowhead = "obox" # Attribute.
        ]
o Port/compass point

These only ever occur for one or both of the 2 nodes on an edge, i.e. not at the class or node level:

        node_15_1:f0 -> node_15_2:f1 # Attributes.
        [
                arrowhead = "obox"
        ]
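
Since each set of attributes is bracketed by open_bracket and close_bracket, the sets can be recovered from lexed lines with a small loop. The following is a minimal sketch under stated assumptions: the helper name attribute_sets() is hypothetical, each input line is assumed to follow the 'type , "value"' layout of data/15.lex, and it does not use any code from the distro.

```perl
use strict;
use warnings;

# A few lines in the layout of data/15.lex.
my @lexed =
(
    'class_id            , "edge"',
    'open_bracket        , "["',
    'attribute_id        , "color"',
    'equals              , "="',
    'attribute_value     , "red"',
    'attribute_id        , "penwidth"',
    'equals              , "="',
    'attribute_value     , "5"',
    'close_bracket       , "]"',
);

# Hypothetical helper: return 1 hashref of attributes per [...] set.
sub attribute_sets
{
    my(@lines) = @_;
    my(@sets, $current, $id);

    for my $line (@lines)
    {
        # Split 'type , "value"' into its 2 fields. Skip lines which do not match.
        my($type, $value) = $line =~ /^(\w+)\s*,\s*"(.*)"$/ or next;

        if ($type eq 'open_bracket')
        {
            $current = {};
        }
        elsif ($type eq 'attribute_id')
        {
            $id = $value;
        }
        elsif ($type eq 'attribute_value')
        {
            $$current{$id} = $value if ($current && defined $id);
        }
        elsif ($type eq 'close_bracket')
        {
            push @sets, $current if ($current);

            $current = undef;
        }
    }

    return \@sets;
}

my $sets = attribute_sets(@lexed);

print "$_ = $$sets[0]{$_}\n" for sort keys %{$$sets[0]};
```

Port attributes appear in the lexed output as an ordinary attribute_id of 'port_id', so a loop like this one picks them up without special handling.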

lexed_file([$lex_file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the CSV file of lexed tokens to write. This file can be input to the parser.

'lexed_file' is a parameter to "new()". See "Constructor and Initialization" for details.

log($level, $s)

Calls $self -> logger -> $level($s) if ($self -> logger).

logger([$logger_object])

Here, the [] indicate an optional parameter.

Get or set the logger object.

To disable logging, just set 'logger' to the empty string (not undef), in the call to "new()".

This logger is passed to GraphViz2::Marpa::Lexer::DFA.

'logger' is a parameter to "new()". See "Constructor and Initialization" for details.

maxlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if GraphViz2::Marpa::Lexer or GraphViz2::Marpa::Parser use or create an object of type Log::Handler. See Log::Handler::Levels.

'maxlevel' is a parameter to "new()". See "Constructor and Initialization" for details.

minlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if GraphViz2::Marpa::Lexer or GraphViz2::Marpa::Parser use or create an object of type Log::Handler. See Log::Handler::Levels.

'minlevel' is a parameter to "new()". See "Constructor and Initialization" for details.

new()

See "Constructor and Initialization" for details on the parameters accepted by "new()".

report()

Log the list of items recognized by the DFA.

report_items([$Boolean])

The [] indicate an optional parameter.

Get or set the value which determines whether or not to log the items recognised by the lexer.

'report_items' is a parameter to "new()". See "Constructor and Initialization" for details.

report_stt([$Boolean])

The [] indicate an optional parameter.

Get or set the value which determines whether or not to log the parsed state transition table (STT).

Calls "report()" in Set::FA::Element. Set min and max log levels to 'info' for this.

'report_stt' is a parameter to "new()". See "Constructor and Initialization" for details.

run()

This is the only method the caller needs to call. All parameters are supplied to "new()" (or other methods).

Returns 0 for success and 1 for failure.
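
A minimal end-to-end sketch, using only the options and methods documented here ('description', 'logger', 'timeout', run() and items()). The graph string is illustrative. The module is loaded at run time on the assumption that it may not be installed, in which case the sketch just says so.

```perl
use strict;
use warnings;

# Options documented under "Constructor and Initialization".
my %options =
(
    description => 'digraph g {a -> b}', # A tiny illustrative graph.
    logger      => '',                   # The empty string disables logging.
    timeout     => 10,                   # The documented default.
);

# Load the module at run time, so the sketch still runs if it is not installed.
if (eval{require GraphViz2::Marpa::Lexer; 1})
{
    my($lexer) = GraphViz2::Marpa::Lexer -> new(%options);

    if ($lexer -> run == 0)
    {
        # Success: print each token as type => value.
        print "$$_{type} => $$_{value}\n" for @{$lexer -> items};
    }
    else
    {
        print "Lexing failed\n";
    }
}
else
{
    print "GraphViz2::Marpa::Lexer is not available\n";
}
```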

stt_file([$stt_file_name])

The [] indicate an optional parameter.

Get or set the name of the file containing the State Transition Table.

This option is used in conjunction with the 'type' option to "new()".

If the file name matches /csv$/, the value of the 'type' option is set to 'csv'.

'stt_file' is a parameter to "new()". See "Constructor and Initialization" for details.

timeout([$seconds])

The [] indicate an optional parameter.

Get or set the timeout for how long to run the DFA.

'timeout' is a parameter to "new()". See "Constructor and Initialization" for details.

type([$type])

The [] indicate an optional parameter.

Get or set the value which determines what type of 'stt_file' is read.

'type' is a parameter to "new()". See "Constructor and Initialization" for details.

utils([$aUtilsObject])

Here, the [] indicate an optional parameter.

Get or set the utils object.

Default: An object of type GraphViz2::Marpa::Utils.

FAQ

Are there certain cases I should watch out for?

Yes. Consider these 3 situations and their corresponding lexed output:

o digraph g {...}
        digraph     , "yes"
        graph_id    , "g"
        start_scope , "1"
o The start_scope count must be 1 because it's at the very start of the graph
o subgraph s {...}
        start_subgraph  , "1"
        graph_id        , "s"
        start_scope     , "2"
o The start_scope count must be 2 or more
o When start_scope is preceded by graph_id, it's a subgraph
o Given 'subgraph {...}', the graph_id will be ""
o {...}
        start_scope , "2"
o The start_scope count must be 2 or more
o When start_scope is not preceded by graph_id, it's a stand-alone {...}
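
The 3 situations above can be told apart by looking at each start_scope token and the token just before it. This is a sketch of those rules only: the helper name classify_scopes() and the [type, value] pair representation are assumptions made for the example, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: label each start_scope token using the rules above.
# Tokens are [type, value] pairs in lexer output order.
sub classify_scopes
{
    my(@tokens) = @_;
    my(@labels);

    for my $i (0 .. $#tokens)
    {
        my($type, $value) = @{$tokens[$i]};

        next if ($type ne 'start_scope');

        # The type of the preceding token, or '' at the very start.
        my($previous) = $i > 0 ? $tokens[$i - 1][0] : '';

        if ($value == 1)
        {
            push @labels, 'graph';
        }
        elsif ($previous eq 'graph_id')
        {
            push @labels, 'subgraph';
        }
        else
        {
            push @labels, 'stand-alone';
        }
    }

    return \@labels;
}

my $labels = classify_scopes(
    [digraph        => 'yes'],
    [graph_id       => 'g'],
    [start_scope    => 1],
    [start_subgraph => 1],
    [graph_id       => 's'],
    [start_scope    => 2],
    [start_scope    => 3],
);

print join(', ', @$labels), "\n"; # Prints: graph, subgraph, stand-alone
```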

Why doesn't the lexer/parser handle my HTML-style labels?

Traps for young players:

o The <br /> component must include the '/'. <br align='center'> is not accepted by Graphviz
o The <br />'s attributes must use single quotes because output files use CSV with double quotes

See data/38.* for good examples.

Where are the scripts documented?

In "Scripts" in GraphViz2::Marpa.

Where is the State Transition Table?

I use data/default.stt.ods via LibreOffice, when editing the STT.

Then, I export it to data/default.stt.csv. This file is incorporated into the source code of Lexer.pm, after the __DATA__ token.

Lastly, I run scripts/stt2html.pl, and output the result to html/default.stt.html.

So I ship 3 representations of the STT in the distro.

When the lexer runs, the 'stt_file' and 'type' options to "new()" default to reading the STT - using Data::Section::Simple's function get_data_section() - directly from __DATA__.

Where are the functions named in the STT?

In GraphViz2::Marpa::Lexer::DFA.

How is the lexed graph stored in RAM?

Items are stored in an arrayref. This arrayref is available via the "items()" method, which also has a long explanation of this subject.

These items have the same format as the arrayref of items returned by the items() method in GraphViz2::Marpa::Parser, and the same as in GraphViz2::Marpa::Lexer::DFA.

However, the precise values in the 'type' field of the following hashref vary between the lexer and the parser.

Each element in the array is a hashref:

        {
                count => $integer, # 1 .. N.
                name  => '',       # Unused.
                type  => $string,  # The type of the token.
                value => $value,   # The value from the input stream.
        }

$type => $value pairs used by the lexer are listed here in alphabetical order by $type:

o attribute_id => $id
o attribute_value => $value
o class_id => /^(?:edge|graph|node)$/

This represents 3 special tokens where the author of the dot file used one or more of the 3 words edge, graph, or node, to specify attributes which apply to all such cases. So:

        node [shape = Msquare]

means all nodes after this point in the input stream default to having an Msquare shape. Of course this can be overridden by another such line, or by any specific node having a shape as part of its list of attributes.

See data/51.* for sample code.

o close_bracket => ']'

This indicates the end of a set of attributes.

o digraph => $yes_no

'yes' => digraph and 'no' => graph.

o edge_id => $id

$id is either '->' for a digraph or '--' for a graph.

o end_scope => $brace_count

This indicates the end of the graph, the end of a subgraph, or the end of a stand-alone {...}.

$brace_count increments by 1 each time '{' is detected in the input string, and decrements each time '}' is detected.

o end_subgraph => $subgraph_count

This indicates the end of a subgraph, and follows the subgraph's 'end_scope'.

$subgraph_count increments by 1 each time 'subgraph' is detected in the input string, and decrements each time a matching '}' is detected.

o equals => '='

This separates 'attribute_id' from 'attribute_value'.

The parser does not output this token.

o graph_id => $id

This indicates both the graph's $id and each subgraph's $id.

For graphs and subgraphs, the $id may be '' (the empty string), and in a case such as:

        {
                rank = same
                A
                B
        }

The $id will definitely be ''.

See data/18.gv, data/19.gv, data/53.gv and data/55.gv.

o node_id => $id
o open_bracket => '['

This indicates the start of a set of attributes.

o start_scope => $brace_count

This indicates the start of the graph, the start of a subgraph, or the start of a stand-alone {...}.

$brace_count increments by 1 each time '{' is detected in the input string, and decrements each time '}' is detected.

o start_subgraph => $subgraph_count

This indicates the start of a subgraph, and precedes the subgraph's 'graph_id'.

$subgraph_count increments by 1 each time 'subgraph' is detected in the input string, and decrements each time a matching '}' is detected.

o strict => $yes_no

'yes' => strict and 'no' => not strict.

Consult data/*.gv and the corresponding data/*.lex for many examples.
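
The way $brace_count is attached to start_scope and end_scope can be sketched as follows. This is not the lexer's code (which is driven by the STT). The helper name scope_tokens() is hypothetical, and 2 assumptions are made: the input contains no braces inside quoted strings, and end_scope reports the count before it is decremented, which is inferred from data/15.lex, where the outermost pair is start_scope "1" and end_scope "1".

```perl
use strict;
use warnings;

# Hypothetical helper: emit start_scope/end_scope tokens for a dot string.
# Assumption: the input has no braces inside quoted strings.
sub scope_tokens
{
    my($text)  = @_;
    my($count) = 0;
    my(@tokens);

    for my $brace ($text =~ /([{}])/g)
    {
        if ($brace eq '{')
        {
            # Report the count after incrementing it.
            push @tokens, [start_scope => ++$count];
        }
        else
        {
            # Report the count before decrementing it.
            push @tokens, [end_scope => $count--];
        }
    }

    return \@tokens;
}

my $tokens = scope_tokens('digraph g {a -> b {rank = same c d}}');

print "$$_[0] , \"$$_[1]\"\n" for @$tokens;
```

With this convention, matching start_scope and end_scope tokens carry the same number, so a consumer can pair them without keeping its own stack.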

How does the lexer handle comments?

See the next point.

What are the Limitations of the lexer?

o Comments can be of the form m!^\s*(#|//)!

That is, Bash (Perl) and C++-style line-oriented comments are recognized, and the whole line is discarded.

This happens when the line is read in from a file, and so does not apply to the 'description' parameter to new().

o Comments can be of the form /* ... */

That is, C-style comments are recognized, and the comment is discarded.

This happens via the STT, and so applies to any source of input.

But, no attempt is made to ensure the '/*' and '*/' are not embedded in otherwise non-comment strings, so don't do that.

o What does this mean for trailing comments?

Simply that Bash and C++-style comments appearing on the ends of lines containing dot commands are not handled. So, don't do that either.

o Since comments are discarded, they will never appear in the output

This means that no output file, e.g. *.lex, *.parse or *.rend, will ever retain comments from the input *.gv file.

o Are there any dot files the lexer or parser cannot handle?

Perhaps. Perfection is an extra-cost option... The cost is unknown, but huge donations are welcome.

Actually, according to DOT's HTML-like label definition, http://www.graphviz.org/content/node-shapes#html, you can use <...> instead of "..." to delimit text labels. The lexer as of V 1.02 does not handle this case. That is, the code only recognizes HTML-like labels which are delimited with '<<' and '>>'.
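
The line-oriented comment rule described above, m!^\s*(#|//)!, can be sketched as a simple filter over input lines. The helper name strip_line_comments() is hypothetical, and this is an illustration of the rule rather than the lexer's own file-reading code. Note how a trailing comment survives, as warned above.

```perl
use strict;
use warnings;

# Hypothetical helper: drop whole-line Bash and C++-style comments.
sub strip_line_comments
{
    my(@lines) = @_;

    # Keep only the lines which do not start with optional spaces and then # or //.
    return grep { $_ !~ m!^\s*(#|//)! } @lines;
}

my @kept = strip_line_comments(
    '# A Bash-style comment.',
    '  // A C++-style comment.',
    'digraph g',
    '{',
    '    a -> b // A trailing comment is not removed.',
    '}',
);

print "$_\n" for @kept;
```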

Machine-Readable Change Log

The file CHANGES was converted into Changelog.ini by Module::Metadata::Changes.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=GraphViz2::Marpa.

Author

GraphViz2::Marpa was written by Ron Savage <ron@savage.net.au> in 2012.

Home page: http://savage.net.au/index.html.

Copyright

Australian copyright (c) 2012, Ron Savage.

        All Programs of mine are 'OSI Certified Open Source Software';
        you can redistribute them and/or modify them under the terms of
        The Artistic License, a copy of which is available at:
        http://www.opensource.org/licenses/index.html