Bryan Jurish > PDL-VectorValued > PDL::VectorValued::Utils

PDL-VectorValued-1.0.6.tar.gz

Dependencies

Annotate this POD

# CPAN RT

 New 1 Open 0
View/Report Bugs
Module Version: 1.0.6

# NAME

PDL::VectorValued::Utils - Low-level utilities for vector-valued PDLs

# SYNOPSIS

``` use PDL;
use PDL::VectorValued::Utils;

##---------------------------------------------------------------------
## ... stuff happens```

# Vector-Based Run-Length Encoding and Decoding

## rlevec

`  Signature: (c(M,N); indx [o]a(N); [o]b(M,N))`

Run-length encode a set of vectors.

Higher-order rle(), for use with qsortvec().

Given set of vectors \$c, generate a vector \$a with the number of occurrences of each element (where an "element" is a vector of length \$M ocurring in \$c), and a set of vectors \$b containing the unique values. As for rle(), only the elements up to the first instance of 0 in \$a should be considered.

Can be used together with clump() to run-length encode "values" of arbitrary dimensions. Can be used together with rotate(), cat(), append(), and qsortvec() to count N-grams over a 1d PDL.

rlevec does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

## rldvec

`  Signature: (int a(N); b(M,N); [o]c(M,N))`

Run-length decode a set of vectors, akin to a higher-order rld().

Given a vector \$a() of the number of occurrences of each row, and a set \$c() of row-vectors each of length \$M, run-length decode to \$c().

Can be used together with clump() to run-length decode "values" of arbitrary dimensions.

rldvec does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

## enumvec

`  Signature: (v(M,N); int [o]k(N))`

Enumerate a list of vectors with locally unique keys.

Given a sorted list of vectors \$v, generate a vector \$k containing locally unique keys for the elements of \$v (where an "element" is a vector of length \$M ocurring in \$v).

Note that the keys returned in \$k are only unique over a run of a single vector in \$v, so that each unique vector in \$v has at least one 0 (zero) index in \$k associated with it. If you need global keys, see enumvecg().

enumvec does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

## enumvecg

`  Signature: (v(M,N); int [o]k(N))`

Enumerate a list of vectors with globally unique keys.

Given a sorted list of vectors \$v, generate a vector \$k containing globally unique keys for the elements of \$v (where an "element" is a vector of length \$M ocurring in \$v). Basically does the same thing as:

` \$k = \$v->vsearchvec(\$v->uniqvec);`

... but somewhat more efficiently.

enumvecg does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

## rleseq

`  Signature: (c(N); indx [o]a(N); [o]b(N))`

Run-length encode a vector of subsequences.

Given a vector of \$c() of concatenated variable-length, variable-offset subsequences, generate a vector \$a containing the length of each subsequence and a vector \$b containg the subsequence offsets. As for rle(), only the elements up to the first instance of 0 in \$a should be considered.

rleseq does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

## rldseq

`  Signature: (int a(N); b(N); [o]c(M))`

Run-length decode a subsequence vector.

Given a vector \$a() of sequence lengths and a vector \$b() of corresponding offsets, decode concatenation of subsequences to \$c(), as for:

``` \$c = null;
\$c = \$c->append(\$b(\$_)+sequence(\$a->type,\$a(\$_))) foreach (0..(\$N-1));```

rldseq does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

## vsearchvec

`  Signature: (find(M); which(M,N); int [o]found())`

Routine for searching N-dimensional values - akin to vsearch() for vectors.

``` \$found   = ccs_vsearchvec(\$find, \$which);
\$nearest = \$which->dice_axis(1,\$found);```

Returns for each row-vector in `\$find` the index along dimension N of the least row vector of `\$which` greater or equal to it. `\$which` should be sorted in increasing order. If the value of `\$find` is larger than any member of `\$which`, the index to the last element of `\$which` is returned.

vsearchvec does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

# Vector-Valued Sorting and Comparison

The following functions are provided for lexicographic sorting of vectors, rsp. axis indices. Note that vv_qsortvec() is functionally identical to the builtin PDL function qsortvec(), but also that the latter is broken in the stock PDL-2.4.3 distribution. The version included here includes Chris Marshall's "uniqsortvec" patch, which is available here:

` http://sourceforge.net/tracker/index.php?func=detail&aid=1548824&group_id=612&atid=300612`

## cmpvec

`  Signature: (a(N); b(N); int [o]cmp())`

Lexicographically compare a pair of vectors.

cmpvec does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

## vv_qsortvec

`  Signature: (a(n,m); [o]b(n,m))`

Drop-in replacement for qsortvec(), which is broken in the stock PDL-2.4.3 release. See PDL::Ufunc::qsortvec.

Vectors with bad components should be moved to the end of the array.

## vv_qsortveci

`  Signature: (a(n,m); indx [o]ix(m))`

Get lexicographic sort order of a matrix \$a() viewed as a list of vectors.

Vectors with bad components should be treated as last in the lexicographic order.

# Vector-Valued Set Operations

The following functions are provided for set operations on sorted vector-valued PDLs.

## vv_union

`  Signature: (a(M,NA); b(M,NB); [o]c(M,NC); int [o]nc())`

Union of two vector-valued PDLs. Input PDLs \$a() and \$b() MUST be sorted in lexicographic order. On return, \$nc() holds the actual number of vector-values in the union.

In scalar context, slices \$c() to the actual number of elements in the union and returns the sliced PDL.

vv_union does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

## vv_intersect

`  Signature: (a(M,NA); b(M,NB); [o]c(M,NC); int [o]nc())`

Intersection of two vector-valued PDLs. Input PDLs \$a() and \$b() MUST be sorted in lexicographic order. On return, \$nc() holds the actual number of vector-values in the intersection.

In scalar context, slices \$c() to the actual number of elements in the intersection and returns the sliced PDL.

vv_intersect does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

## vv_setdiff

`  Signature: (a(M,NA); b(M,NB); [o]c(M,NC); int [o]nc())`

Set-difference (\$a() \ \$b()) of two vector-valued PDLs. Input PDLs \$a() and \$b() MUST be sorted in lexicographic order. On return, \$nc() holds the actual number of vector-values in the computed vector set.

In scalar context, slices \$c() to the actual number of elements in the output vector set and returns the sliced PDL.

vv_setdiff does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

# Sorted Vector Set Operations

The following functions are provided for set operations on flat sorted PDLs with unique values. They may be more efficient to compute than the corresponding implenentations via PDL::Primitive::setops().

## v_union

`  Signature: (a(NA); b(NB); [o]c(NC); int [o]nc())`

Union of two flat sorted unique-valued PDLs. Input PDLs \$a() and \$b() MUST be sorted in lexicographic order and contain no duplicates. On return, \$nc() holds the actual number of values in the union.

In scalar context, reshapes \$c() to the actual number of elements in the union and returns it.

v_union does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

## v_intersect

`  Signature: (a(NA); b(NB); [o]c(NC); int [o]nc())`

Intersection of two flat sorted unique-valued PDLs. Input PDLs \$a() and \$b() MUST be sorted in lexicographic order and contain no duplicates. On return, \$nc() holds the actual number of values in the intersection.

In scalar context, reshapes \$c() to the actual number of elements in the intersection and returns it.

v_intersect does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

## v_setdiff

`  Signature: (a(NA); b(NB); [o]c(NC); int [o]nc())`

Set-difference (\$a() \ \$b()) of two flat sorted unique-valued PDLs. Input PDLs \$a() and \$b() MUST be sorted in lexicographic order and contain no duplicate values. On return, \$nc() holds the actual number of values in the computed vector set.

In scalar context, reshapes \$c() to the actual number of elements in the difference set and returns it.

v_setdiff does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.

# Miscellaneous Vector-Valued Operations

## vv_vcos

`  Signature: (a(M,N);b(M);float+ [o]vcos(N))`

Computes the vector cosine similarity of a dense vector \$b() with respect to each row \$a(*,i) of a dense PDL \$a(). This is basically the same thing as:

` (\$a * \$b)->sumover / (\$a->pow(2)->sumover->sqrt * \$b->pow(2)->sumover->sqrt)`

... but should be must faster to compute, and avoids allocating potentially large temporaries for the vector magnitudes. Output values in \$vcos() are cosine similarities in the range [-1,1], except for zero-magnitude vectors which will result in NaN values in \$vcos().

You can use PDL threading to batch-compute distances for multiple \$b() vectors simultaneously:

```  \$bx   = random(\$M, \$NB);   ##-- get \$NB random vectors of size \$N
\$vcos = vv_vcos(\$a,\$bx);   ##-- \$vcos(i,j) ~ sim(\$a(,i),\$b(,j))```

vv_vcos() will set the bad status flag on the output piddle \$vcos() if it is set on either of the input piddles \$a() or \$b(), but BAD values will otherwise be ignored for computing the cosine similary.

# ACKNOWLEDGEMENTS

• Perl by Larry Wall
• PDL by Karl Glazebrook, Tuomas J. Lukka, Christian Soeller, and others.
• Code for rlevec() and rldvec() derived from the PDL builtin functions rle() and rld() in \$PDL_SRC_ROOT/Basic/Slices/slices.pd
• Code for vv_qsortvec() copied nearly verbatim from the builtin PDL functions in \$PDL_SRC_ROOT/Basic/Ufunc/ufunc.pd, with Chris Marshall's "uniqsortvec" patch. Code for vv_qsortveci() based on the same.

Probably many.

# AUTHOR

Bryan Jurish <moocow@cpan.org>