Bryan Jurish > PDL-EditDistance-0.06 > PDL::EditDistance

Download:
PDL-EditDistance-0.06.tar.gz

Dependencies

Annotate this POD

CPAN RT

Open  1
View/Report Bugs
Module Version: 0.06   Source  

NAME ^

PDL::EditDistance - Wagner-Fischer edit distance and alignment for PDLs.

SYNOPSIS ^

 use PDL;
 use PDL::EditDistance;

 ##-- input PDLs
 $a = pdl([map { ord($_) } qw(G U M B O)]);
 $b = pdl([map { ord($_) } qw(G A M B O L)]);

 $a1 = pdl([0, map { ord($_) } qw(G U M B O)]);
 $b1 = pdl([0, map { ord($_) } qw(G A M B O L)]);

 ##-------------------------------------------------------------
 ## Levenshtein distance
 $dist          = edit_distance_static($a,$b, 0,1,1,1);
 ($dist,$align) = edit_align_static($a,$b, 0,1,1,1);

 ##-------------------------------------------------------------
 ## Wagner-Fischer distance
 @costs         = ($costMatch=0,$costInsert=1,$costDelete=1,$costSubstitute=2);
 $dist          = edit_distance_static($a,$b, @costs);
 ($dist,$align) = edit_align_static($a,$b, @costs);

 ##-------------------------------------------------------------
 ## General edit distance
 $costsMatch = random($a->nelem+1, $b->nelem+1);
 $costsIns   = random($a->nelem+1, $b->nelem+1);
 $costsDel   = random($a->nelem+1, $b->nelem+1);
 $costsSubst = random($a->nelem+1, $b->nelem+1);
 @costs         = ($costsMatch,$costsIns,$costDel,$costsSubst);
 $dist          = edit_distance_full($a,$b,@costs);
 ($dist,$align) = edit_align_full($a,$b,@costs);

 ##-------------------------------------------------------------
 ## Alignment
 $op_match = align_op_match();      ##-- constant
 $op_del   = align_op_insert1();    ##-- constant
 $op_ins   = align_op_insert2();    ##-- constant
 $op_subst = align_op_substitute(); ##-- constant

 ($apath,$bpath,$pathlen) = edit_bestpath($align);
 ($ai,$bi,$ops,$pathlen)  = edit_pathtrace($align);

 ##-------------------------------------------------------------
 ## Longest Common Subsequence
 $lcs = edit_lcs($a,$b);
 ($ai,$bi,$lcslen) = lcs_backtrace($a,$b,$lcs);

FUNCTIONS ^

_edit_pdl

  Signature: (a(N); [o]apdl(N+1))

Convenience method. Returns a pdl $apdl() suitable for representing $a(), which can be specified as a string, arrays of numbers, or as a PDL. $apdl(0) is always set to zero.

edit_costs

  Signature: (PDL::Type type; int N; int M;
              [o]costsMatch(N+1,M+1); [o]costsIns(N+1,M+1); [o]costsDel(N+1,M+1); [o]costsSubst(N+1,M+1))

Convenience method. Ensures existence and proper dimensionality of cost matrices for inputs of length N and M.

_edit_costs

  Signature: (PDL::Type type; int N1; int M1;
              [o]costsMatch(N1,M1); [o]costsIns(N1,M1); [o]costsDel(N1,M1); [o]costsSubst(N1,M1))

Low-level method. Ensures existence and proper dimensionality of cost matrices for inputs of length N1-1 and M1-1.

edit_costs_static

  Signature: (PDL::Type type; int N; int M;
              staticCostMatch(); staticCostIns(); staticCostSubst();
              [o]costsMatch(N+1,M+1); [o]costsIns(N+1,M+1); [o]costsDel(N+1,M+1); [o]costsSubst(N+1,M+1))

Convenience method.

edit_distance_full

  Signature: (a(N); b(M);
              costsMatch(N+1,M+1); costsIns(N+1,M+1); costsDel(N+1,M+1); costsSubst(N+1,M+1);
              [o]dist(N+1,M+1); [o]align(N+1,M+1))

Convenience method. Compute the edit distance matrix for inputs $a() and $b(), and cost matrices $costsMatch(), $costsIns(), $costsDel(), and $costsSubst(). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings.

_edit_distance_full

  Signature: (a1(N1); b1(M1); costsMatch(N1,M1); costsIns(N1,M1); costsDel(N1,M1); costsSubst(N1,M1); [o]dist(N1,M1))

Low-level method. Compute the edit distance matrix for input PDLs $a1() and $b1() and cost matrices $costsMatch(), $costsIns(), $costsDel(), and $costsSubst().

The first elements of $a1() and $b1() are ignored.

_edit_distance_full does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

edit_align_full

  Signature: (a(N); b(M);
              costsMatch(N+1,M+1); costsIns(N+1,M+1); costsDel(N+1,N+1); costsSubst(N+1,M+1);
              [o]dist(N+1,M+1); [o]align(N+1,M+1))

Convenience method. Compute the edit distance and alignment matrices for inputs $a() and $b(), and cost matrices $costsMatch(), $costsIns(), $costsDel(), and $costsSubst(). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings.

_edit_align_full

  Signature: (a1(N1); b1(M1); costsMatch(N1,M1); costsIns(N1,M1); costsDel(N1,M1); costsSubst(N1,M1); [o]dist(N1,M1); byte [o]align(N1,M1))

Low-level method. Compute the edit distance and alignment matrix for input PDLs $a1() and $b1() and cost matrices $costsMatch(), $costsIns(), $costsDel(), and $costsSubst().

The first elements of $a1() and $b1() are ignored.

_edit_align_full does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

edit_distance_static

  Signature: (a(N); b(M);
              staticCostMatch(); staticCostIns(); staticCostDel(); staticCostSubst();
              [o]dist(N+1,M+1))

Convenience method. Compute the edit distance matrix for inputs $a() and $b() given a static cost schema @costs = ($staticCostMatch(), $staticCostIns(), $staticCostDel(), and $staticCostSubst()). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings. Functionally equivalent to edit_distance_full($matches,@costs,$dist), but slightly faster.

_edit_distance_static

  Signature: (a1(N1); b1(M1); costMatch(); costIns(); costDel(); costSubst(); [o]dist(N1,M1))

Low-level method. Compute the edit distance matrix for input PDLs $a1() and $b1() given a static cost schema @costs = ($costMatch(), $costIns(), $costDel(), $costSubst()). Functionally identitical to _edit_distance_matrix_full($matches,@costs,$dist), but slightly faster.

The first elements of $a1() and $b1() are ignored.

_edit_distance_static does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

edit_align_static

  Signature: (a(N); b(M);
              staticCostMatch(); staticCostIns(); staticCostDel(); staticCostSubst();
              [o]dist(N+1,M+1); [o]align(N+1,M+1))

Convenience method. Compute the edit distance and alignment matrices for inputs $a() and $b() given a static cost schema @costs = ($staticCostMatch(), $staticCostIns(), $staticCostDel(), and $staticCostSubst()). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings. Functionally equivalent to edit_align_full($matches,@costs,$dist), but slightly faster.

_edit_align_static

  Signature: (a1(N1); b1(M1); costMatch(); costIns(); costDel(); costSubst(); [o]dist(N1,M1); byte [o]align(N1,M1))

Low-level method. Compute the edit distance and alignment matrices for input PDLs $a1() and $b1() given a static cost schema @costs = ($costMatch(), $costIns(), $costDel(), $costSubst()). Functionally identitical to _edit_distance_matrix_full($matches,@costs,$dist), but slightly faster.

The first elements of $a1() and $b1() are ignored.

_edit_align_static does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

align_op_insert1

  Signature: ([o]a())

Alignment matrix value constant for insertion operations on $a() string.

align_op_insert1 does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

align_op_insert2

  Signature: ([o]a())

Alignment matrix value constant for insertion operations on $a() string.

align_op_insert2 does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

align_op_match

  Signature: ([o]a())

Alignment matrix value constant for matches.

align_op_match does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

align_op_substitute

  Signature: ([o]a())

Alignment matrix value constant for substitution operations.

align_op_substitute does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

align_op_delete

Alias for align_op_insert1()

align_op_insert

Alias for align_op_insert2()

align_ops

  Signature: ([o]ops(4))

Alignment matrix value constants 4-element pdl (match,insert1,insert2,substitute).a

edit_bestpath

  Signature: (align(N+1,M+1); [o]apath(N+M+2); [o]bpath(N+M+2); [o]pathlen())

Convenience method. Compute best path through alignment matrix $align(). Stores paths for original input strings $a() and $b() in $apath() and $bpath() respectively. Negative values in $apath() and $bpath() indicate insertion/deletion operations. On completion, $pathlen() holds the actual length of the paths.

_edit_bestpath

  Signature: (align(N1,M1); int [o]apath(L); int [o]bpath(L); int [o]len(); int ifinal; int jfinal)

Low-level method. Compute best path through alignment matrix $align() from final index ($ifinal,$jfinal). Stores paths for (original) input strings $a() and $b() in $apath() and $bpath() respectively. Negative values in $apath() and $bpath() indicate insertion/deletion operations. On completion, $pathlen() holds the actual length of the paths.

_edit_bestpath does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

edit_pathtrace

  Signature: ( align(N+1,M+1); [o]ai(L); [o]bi(L); [o]ops(L); [o]$pathlen() )

Convenience method. Compute alignment path backtrace through alignment matrix $align() from final index ($ifinal,$jfinal). Stores raw paths for (original) input strings $a() and $b() in $ai() and $bi() respectively. Unlike edit_bestpath(), null-moves for $ai() and $bi() are not stored here as negative values. Returned pdls ($ai,$bi,$ops) are trimmed to the appropriate path length.

_edit_pathtrace

  Signature: (align(N1,M1); int [o]ai(L); int [o]bi(L); int [o]ops(L); int [o]len(); int ifinal; int jfinal)

Low-level method. Compute alignment path backtrace through alignment matrix $align() from final index ($ifinal,$jfinal). Stores raw paths for (original) input strings $a() and $b() in $ai() and $bi() respectively. Unlike edit_bestpath(), null-moves for $ai() and $bi() are not stored here as negative values. Returned pdls ($ai,$bi,$ops) are trimmed to the appropriate path length.

_edit_pathtrace does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

edit_lcs

  Signature: (a(N); b(M); int [o]lcs(N+1,M+1);)

Convenience method. Compute the longest common subsequence (LCS) matrix for input PDLs $a1() and $b1(). The output matrix $lcs() contains at cell ($i+1,$j+1) the length of the LCS between $a1(0..$i) and $b1(0..$j); thus $lcs($N,$M) contains the length of the LCS between $a() and $b().

_edit_lcs

  Signature: (a1(N1); b1(M1); int [o]lcs(N1,M1))

Low-level method. Compute the longest common subsequence (LCS) matrix for input PDLs $a1() and $b1(). The initial (zeroth) elements of $a1() and $b1() are ignored. The output matrix $lcs() contains at cell ($i,$j) the length of the LCS between $a1(1..$i) and $b1(1..$j); thus $lcs($N1-1,$M1-1) contains the length of the LCS between $a1() and $b1().

_edit_lcs does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

lcs_backtrace

  Signature: (a(N); b(M); int lcs(N+1,M+1); int ifinal(); int jfinal(); int [o]ai(L); int [o]bi(L); int [o]len())

Convenience method. Compute longest-common-subsequence backtrace through LCS matrix $lcs() for original input strings ($a(),$b()) from final index ($ifinal,$jfinal). Stores raw paths for (original) input strings $a() and $b() in $ai() and $bi() respectively.

_lcs_backtrace

  Signature: (a1(N1); b1(M1); int lcs(N1,M1); int ifinal(); int jfinal(); [o]ai(L); [o]bi(L); int [o]len())

Low-level method. Compute longest-common-subsequence backtrace through LCS matrix $lcs() for initial-padded strings ($a1(),$b1()) from final index ($ifinal,$jfinal). Stores raw paths for (original) input strings $a() and $b() in $ai() and $bi() respectively.

_lcs_backtrace does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

ACKNOWLEDGEMENTS ^

Perl by Larry Wall.

PDL by Karl Glazebrook, Tuomas J. Lukka, Christian Soeller, and others.

KNOWN BUGS ^

Probably many.

AUTHOR ^

Bryan Jurish <moocow@cpan.org>

Copyright Policy

Copyright (C) 2006-2014, Bryan Jurish. All rights reserved.

This package is free software, and entirely without warranty. You may redistribute it and/or modify it under the same terms as Perl itself, either Perl 5.14.2, or at your option any later version of Perl 5.

SEE ALSO ^

perl(1), PDL(3perl).

syntax highlighting: