NAME

Text::Tradition::Collation - a software model for a text collation

SYNOPSIS

  use Text::Tradition;
  my $t = Text::Tradition->new( 
    'name' => 'this is a text',
    'input' => 'TEI',
    'file' => '/path/to/tei_parallel_seg_file.xml' );

  my $c = $t->collation;
  my @readings = $c->readings;
  my @paths = $c->paths;
  my @relationships = $c->relationships;
  
  my $svg_variant_graph = $c->as_svg();

DESCRIPTION

Text::Tradition is a library for the representation and analysis of collated texts, particularly medieval ones. The Collation is the central feature of a Tradition: it is where the text, its sequence of readings, and the relationships between readings are actually kept.

CONSTRUCTOR

new

The constructor. Takes a hash or hashref of the following arguments:

  • tradition - The Text::Tradition object to which the collation belongs. Required.

  • linear - Whether the collation should be linear; that is, whether transposed readings should be treated as two linked readings rather than one, and therefore whether the collation graph is acyclic. Defaults to true.

  • baselabel - The default label for the path taken by a base text (if any). Defaults to 'base text'.

  • wit_list_separator - The string to join a list of witnesses for purposes of making labels in display graphs. Defaults to ', '.

  • ac_label - The extra label to tack onto a witness sigil when representing another layer of path for the given witness - that is, when a text has more than one possible reading due to scribal corrections or the like. Defaults to ' (a.c.)'.

  • wordsep - The string used to separate words in the original text. Defaults to ' '.

ACCESSORS

tradition

linear

wit_list_separator

baselabel

ac_label

wordsep

Simple accessors for collation attributes.

start

The meta-reading at the start of every witness path.

end

The meta-reading at the end of every witness path.

readings

Returns all Reading objects in the graph.

reading( $id )

Returns the Reading object corresponding to the given ID.

add_reading( $reading_args )

Adds a new reading object to the collation. See Text::Tradition::Collation::Reading for the available arguments.

del_reading( $object_or_id )

Removes the given reading from the collation, implicitly removing its paths and relationships.

has_reading( $id )

Predicate to see whether a given reading ID is in the graph.

reading_witnesses( $object_or_id )

Returns a list of sigils whose witnesses contain the reading.

paths

Returns all reading paths within the document - that is, all edges in the collation graph. Each path is an arrayref of [ $source, $target ] reading IDs.
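
As an illustration of the edge-list shape described above, here is a minimal sketch in plain Perl that indexes [ $source, $target ] pairs by source reading. The reading IDs are hypothetical and the list is built by hand rather than taken from a real collation; in practice it would come from $c->paths.

```perl
use strict;
use warnings;

# Hypothetical edge list, shaped like the documented return value of paths:
# each element is an arrayref of [ $source, $target ] reading IDs.
my @paths = (
    [ 'start', 'r1' ],
    [ 'r1',    'r2a' ],
    [ 'r1',    'r2b' ],
    [ 'r2a',   'end' ],
    [ 'r2b',   'end' ],
);

# Index the edges by source so that the successors of any reading
# can be looked up directly.
my %successors;
foreach my $edge ( @paths ) {
    my ( $source, $target ) = @$edge;
    push @{ $successors{$source} }, $target;
}

# A reading with more than one successor is a point where witnesses diverge.
my @forks = sort grep { scalar( @{ $successors{$_} } ) > 1 } keys %successors;
print "Witnesses diverge after: @forks\n";
```

The edge list alone does not say which witnesses follow which edge; path_witnesses provides that for a given edge.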

add_path( $source, $target, $sigil )

Links the given readings in the collation in sequence, under the given witness sigil. The readings may be specified by object or ID.

del_path( $source, $target, $sigil )

Unlinks the given readings in the collation, removing the path between them that belongs to the given witness sigil. The readings may be specified by object or ID.

has_path( $source, $target )

Returns true if the two readings are linked in sequence in any witness. The readings may be specified by object or ID.

relationships

Returns all Relationship objects in the collation.

add_relationship( $reading, $other_reading, $options )

Adds a new relationship of the type given in $options between the two readings, which may be specified by object or ID. Returns a value of ( $status, @vectors) where $status is true on success, and @vectors is a list of relationship edges that were ultimately added. See Text::Tradition::Collation::Relationship for the available options.

register_relationship_type( %relationship_definition )

Add a relationship type definition to this collation. The argument can be either a hash or a hashref, defining the properties of the relationship. For relationship types and their properties, see Text::Tradition::Collation::RelationshipType.

get_relationship_type( $relationship_name )

Retrieve the RelationshipType object for the relationship with the given name.

merge_readings( $main, $second, $concatenate, $with_str )

Merges the $second reading into the $main one. If $concatenate is true, the merged node will carry the text of both readings, joined with $with_str if it is specified. Otherwise a sensible default is used: the empty string if the appropriate 'join_*' flag is set on either reading, or else $self->wordsep.

The first two arguments may be either readings or reading IDs.

merge_related( @relationship_types )

Merge all readings linked with the relationship types given. If any of the selected type(s) is not a colocation, the graph will no longer be linear. The majority/plurality reading in each case will be the one kept.

WARNING: This operation cannot be undone.

compress_readings

Where possible in the graph, compresses plain sequences of readings into a single reading. The sequences must consist of readings with no relationships to other readings, with only a single witness path between them and no other witness paths from either that would skip the other. The readings must also not be marked as nonsense or bad grammar.

WARNING: This operation cannot be undone.

duplicate_reading( $reading, @witlist )

Split the given reading into two, so that the new reading is in the path for the witnesses given in @witlist. If the result is that certain non-colocated relationships (e.g. transpositions) are no longer valid, these will be removed. Returns the newly-created reading.

clear_witness( @sigil_list )

Clear the given witnesses out of the collation entirely, removing references to them in paths, and removing readings that belong only to them. Should only be called via $tradition->del_witness.

OUTPUT METHODS

as_svg( \%options )

Returns an SVG string that represents the graph, generated via as_dot and Graphviz. See as_dot for a list of options. Graphviz (the dot program) must be installed for this method to work.

as_dot( \%options )

Returns a string that is the collation graph expressed in dot (i.e. GraphViz) format. Options include:

  • from

  • to

  • color_common

path_witnesses( $edge )

Returns the list of sigils whose witnesses are associated with the given edge. The edge can be passed as either an array or an arrayref of ( $source, $target ).

as_adjacency_list

Returns a JSON structure that represents the collation sequence graph.

as_graphml

Returns a GraphML representation of the collation. The GraphML will contain two graphs. The first expresses the attributes of the readings and the witness paths that link them; the second expresses the relationships that link the readings. This is the native transfer format for a tradition.

as_csv

Returns a CSV alignment table representation of the collation graph, one row per witness (or per uncorrected witness layer).

as_tsv

Returns a tab-separated alignment table representation of the collation graph, one row per witness (or per uncorrected witness layer).

alignment_table

Return a reference to an alignment table, in a slightly enhanced CollateX format which looks like this:

 $table = { alignment => [ { witness => "SIGIL", 
                             tokens => [ { t => "TEXT" }, ... ] },
                           { witness => "SIG2", 
                             tokens => [ { t => "TEXT" }, ... ] },
                           ... ],
            length => TEXTLEN };
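
To make the structure above concrete, here is a minimal sketch in plain Perl that walks a hand-built table of this shape and reports the positions at which the witnesses disagree. The sigils and token texts are invented, column_at is a hypothetical helper rather than part of this module, and treating an undefined token as a gap is an assumption.

```perl
use strict;
use warnings;

# Hypothetical table in the shape shown above; sigils and texts are made up.
my $table = {
    alignment => [
        { witness => 'A',
          tokens  => [ { t => 'the' }, { t => 'quick' }, { t => 'fox' } ] },
        { witness => 'B',
          tokens  => [ { t => 'the' }, { t => 'brown' }, { t => 'fox' } ] },
    ],
    length => 3,
};

# Hypothetical helper: return a hash of sigil => token text for one
# position (array index) in the table.
sub column_at {
    my ( $tbl, $idx ) = @_;
    my %col;
    foreach my $row ( @{ $tbl->{alignment} } ) {
        my $token = $row->{tokens}->[$idx];
        # Assumption: an undefined token means the witness has a gap here.
        $col{ $row->{witness} } = defined $token ? $token->{t} : undef;
    }
    return %col;
}

# Collect the positions where more than one distinct text occurs.
my @variant_positions;
foreach my $idx ( 0 .. $table->{length} - 1 ) {
    my %col = column_at( $table, $idx );
    my %seen;
    $seen{ defined $_ ? $_ : '' } = 1 for values %col;
    push @variant_positions, $idx if scalar( keys %seen ) > 1;
}
print "Variant positions: @variant_positions\n";
```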

NAVIGATION METHODS

reading_sequence( $first, $last, $sigil, $backup )

Returns the ordered list of readings, starting with $first and ending with $last, for the witness given in $sigil. If a $backup sigil is specified (e.g. when walking a layered witness), it will be used wherever no $sigil path exists. If there is a base text reading, that will be used wherever no path exists for $sigil or $backup.
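
The fallback from $sigil to $backup can be pictured with a minimal sketch in plain Perl. The edge table, reading IDs, and walk_sequence function below are hypothetical stand-ins, not this module's implementation, and the base text fallback is left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical edge table: source reading ID => { witness label => target ID }.
# 'A (a.c.)' is a layered witness that departs from 'A' at one point only.
my %next = (
    start => { 'A' => 'r1',  'B' => 'r1' },
    r1    => { 'A' => 'r2',  'A (a.c.)' => 'r2x', 'B' => 'r2' },
    r2    => { 'A' => 'end', 'B' => 'end' },
    r2x   => { 'A (a.c.)' => 'end' },
);

# Follow $sigil from $first to $last, falling back to $backup wherever
# no edge for $sigil leaves the current reading.
sub walk_sequence {
    my ( $first, $last, $sigil, $backup ) = @_;
    my @seq     = ( $first );
    my $current = $first;
    while ( $current ne $last ) {
        my $edges  = $next{$current} || {};
        my $target = $edges->{$sigil};
        $target = $edges->{$backup} if !defined $target && defined $backup;
        die "No path from $current for $sigil\n" unless defined $target;
        push @seq, $target;
        $current = $target;
    }
    return @seq;
}

my @layer = walk_sequence( 'start', 'end', 'A (a.c.)', 'A' );
print join( ' -> ', @layer ), "\n";
```

In this sketch the layered witness has its own edge only where it differs from the main witness, so every other step is taken along the backup path.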

readings_at_rank( $rank )

Returns a list of readings at a given rank, taken from the alignment table.

next_reading( $reading, $sigil )

Returns the reading that follows the given reading along the given witness path.

prior_reading( $reading, $sigil )

Returns the reading that precedes the given reading along the given witness path.

common_readings

Returns the list of common readings in the graph, i.e. those readings that are shared by all non-lacunose witnesses.

path_text( $sigil [, $start, $end ] )

Returns the text of a witness (plus its backup, if we are using a layer) as stored in the collation. The text is returned as a string, where the individual readings are joined with spaces and the meta-readings (e.g. lacunae) are omitted. Optional specification of $start and $end allows the generation of a subset of the witness text.
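
The joining rule described above can be sketched in plain Perl. The readings here are hypothetical plain hashes with a text and an is_meta flag, standing in for real Reading objects, and join_path_text is an illustrative helper rather than this module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for Reading objects along one witness path.
my @sequence = (
    { text => '#START#',   is_meta => 1 },
    { text => 'in',        is_meta => 0 },
    { text => 'principio', is_meta => 0 },
    { text => '#LACUNA#',  is_meta => 1 },
    { text => 'verbum',    is_meta => 0 },
    { text => '#END#',     is_meta => 1 },
);

# Drop the meta-readings and join the remaining texts with single spaces.
sub join_path_text {
    my @readings = @_;
    return join( ' ', map { $_->{text} } grep { !$_->{is_meta} } @readings );
}

my $text = join_path_text( @sequence );
print "$text\n";
```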

INITIALIZATION METHODS

These are mostly for use by parsers.

make_witness_path( $witness )

Link the array of readings contained in $witness->path (and in $witness->uncorrected_path if it exists) into collation paths. Clear out the arrays when finished.

make_witness_paths

Call make_witness_path for all witnesses in the tradition.

calculate_ranks

Calculate the reading ranks (that is, their aligned positions relative to each other) for the graph. This can only be called on linear collations.

flatten_ranks

A convenience method for parsing collation data. Searches the graph for readings with the same text at the same rank, and merges any that are found.

identical_readings

identical_readings( start => $startnode, end => $endnode )

identical_readings( startrank => $startrank, endrank => $endrank )

Goes through the graph identifying all pairs of readings that appear to be identical, and therefore able to be merged into a single reading. Returns the relevant identical pairs. Can be restricted to run over only a part of the graph, specified either by node or by rank.

calculate_common_readings

Goes through the graph identifying the readings that appear in every witness, apart from witnesses that have a lacuna at that spot. Marks them as common and returns the list.

text_from_paths

Calculate the text array for all witnesses from the path, for later consistency checking. Only to be used if there is no non-graph-based way to know the original texts.

UTILITY FUNCTIONS

common_predecessor( $reading_a, $reading_b )

Find the last reading that occurs in sequence before both the given readings. At the very least this should be $self->start.

common_successor( $reading_a, $reading_b )

Find the first reading that occurs in sequence after both the given readings. At the very least this should be $self->end.

BUGS/TODO

  • Rework XML serialization in a more modular way

LICENSE

This package is free software and is provided "as is" without express or implied warranty. You can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Tara L Andrews <aurum@cpan.org>