The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Algorithm::ConstructDFA2 - Deterministic finite automaton construction

SYNOPSIS

  use Algorithm::ConstructDFA2;

  my $dfa = Algorithm::ConstructDFA2->new(
    input_alphabet     => [ @symbols ],
    input_vertices     => [ qw/ 2 3 4 / ],
    input_edges        => [ [ 2, 3 ], [ 3, 4 ] ],

    vertex_nullable    => sub($vertex)         { ... },
    vertex_matches     => sub($vertex, $input) { ... },

    storage_dsn        => 'dbi:SQLite:dbname=...',
  );

  my $start_id = $dfa->find_or_create_state_id(qw/ 2 /);

  while (my $count = $dfa->compute_some_transitions(1_000)) {
    ...
  }

  my @accepting = $dfa->cleanup_dead_states(sub(@vertices) {
    ...
  });

DESCRIPTION

This module computes deterministic finite automata from equivalent non-deterministic finite automata. The input NFA must be expressed as directed graph with labeled vertices. Vertex labels indicate if vertices match a particular terminal symbol from an input alphabet, or match the empty string, meaning they can be crossed without any input when matching a string.

This is slightly different from how NFA graphs are usually encoded in literature (as graph with labeled edges), but the conversion is straightforward (turn edges into additional vertices). Finding a suitable alphabet is more difficult, Set::IntSpan::Partition can help with that (the module splits sets of sets of terminals like "letters" and "digits" and "hexdigits" into non-overlapping sets, each of which can then be used as a terminal for this module).

DFAs can be exponentially larger than equivalent NFAs; to accomodate large or complicated NFAs, computed data is held in a SQLite database to reduce memory use. Since a DFA is basically just the result of exhaustively computing cross-products, most computation is done in SQL, leaving only minimal Perl code.

CONSTRUCTOR

new(%options)

The %options hash supports the following keys:

input_vertices

Array of vertices (unsigned integers) in the input graph.

input_edges

Array of edges (arrays of two vertices) in the input graph.

input_alphabet

Array of terminal symbols (unsigned integers).

vertex_nullable

Code reference called for each vertex in the input graph. Should return a true value if and only if the vertex matches the empty string.

vertex_matches

Code reference called for each pair of input vertex and input symbol from the input alphabet. Should return a true value if and only if the vertex matches the input symbol.

storage_dsn

Database to use for computations, dbi:SQLite:dbname=:memory: by default.

METHODS

$dfa->find_or_create_state_id(@vertices)

Given a list of vertices, computes a new state, adds it to the automaton if it does not already exist, and returns an identifier for the state. This is used to create a start state in the DFA.

$dfa->compute_some_transitions($limit)

Computes up to $limit additional transitions and returns the number of transitions actually computed. A return value of zero indicates that all transitions have been computed.

$dfa->dead_state_id()

Returns the state identifier for a fixed dead state (from which no accepting configuration can be reached).

$dfa->cleanup_dead_states(\&vertices_accept)

Given a code reference that takes a list of vertices and returns true if and only if the vertices are an accepting configuration, this method changes the automaton so that dead states have no transitions to different dead states.

If, for example, the input NFA has a special "final" vertex that indicates acceptance when reached, the code reference would check if the vertex list contains this vertex.

$dfa->transitions_as_3tuples()

Returns a list of all transitions computed so far as. Transitions are arrays with three identifiers for the source state, the input symbol, and the destination state.

  for my $transition ( $dfa->transitions_as_3tuples() ) {
    my ($src_state, $input, $dst_state) = @$transition;
    ...
  }
$dfa->vertices_in_state($state_id)

Returns a list of vertices in the state $state_id.

$dfa->transitions_as_5tuples()

Returns a list of all transitions computed so far as. Transitions are arrays with five identifiers: the source state, an input vertex included in the source state, the input symbol, the destination state and an input vertex included in the destination state.

  for my $transition ( $dfa->transitions_as_5tuples() ) {
    my ($src_state, $src_vertex, $input, $dst_state, $dst_vertex) =
      @$transition;
    ...
  }

Note that unlike transitions_as_3tuples this omits transitions involving the main dead state.

$dfa->backup_to_file('v0', $file)

Create a backup of the database used to store input and computed data into $file. The first parameter must be v0 and indicates the version of the database schema.

TODO

  • It does not make sense for transitions_as_5tuples and its companions to return a list for large automata. But short of returning the DBI statement handle there does not seem to be a good way to return something more lazy.

  • ...

BUG REPORTS

Please report bugs in this module via http://rt.cpan.org/NoAuth/Bugs.html?Dist=Algorithm-ConstructDFA2

SEE ALSO

ACKNOWLEDGEMENTS

Thanks to Slaven Rezic for bug reports.

AUTHOR / COPYRIGHT / LICENSE

  Copyright (c) 2017-2018 Bjoern Hoehrmann <bjoern@hoehrmann.de>.
  This module is licensed under the same terms as Perl itself.