PDL::Algorithm::Center - Various methods of finding the center of a sample
version 0.06
PDL::Algorithm::Center is a collection of algorithms which specialize in centering datasets.
See "TYPES" for information on the types used in the subroutine descriptions.
$results = sigma_clip(
    center      => Optional [ Center | CodeRef ],
    clip        => Optional [PositiveNum],
    coords      => Optional [Coords],
    dtol        => PositiveNum,
    iterlim     => Optional [PositiveInt],
    log         => Optional [Bool | CodeRef],
    mask        => Optional [ Undef | Piddle_min1D_ne ],
    save_mask   => Optional [Bool],
    save_weight => Optional [Bool],
    nsigma      => PositiveNum,
    weight      => Optional [ Undef | Piddle_min1D_ne ],
);
Center a dataset by iteratively excluding data outside of a radius equal to a specified number of standard deviations. The dataset may be specified as a list of coordinates and optional weights, or as a weight piddle of shape NxM (e.g., an image). If only the weight piddle is provided, it is converted internally into a list of coordinates with associated weights.
To operate on a subset of the input data, specify the mask option.
A PDL::Algorithm::Center::Failure::parameter exception will be thrown if there is a parameter error.
The center of a data set is determined by:
1. clipping (ignoring) the data whose distance to the current center is greater than a specified number of standard deviations
2. calculating a new center by performing a (weighted) centroid of the remaining data
3. calculating the standard deviation of the distance from the remaining data to the center
4. repeat step 1 until either a convergence tolerance has been met or the iteration limit has been exceeded
The initial center may be explicitly specified, or may be calculated by performing a (weighted) centroid of the data.
The initial standard deviation is calculated using the initial center and either the entire dataset, or from a clipped region about the initial center.
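The procedure described above can be sketched in plain Perl, without PDL. This is only an illustration of the technique, not the module's implementation: the names sigma_clip_sketch, centroid, sigma and dist2 are hypothetical, and "standard deviation" is assumed here to mean the weighted RMS distance of the retained points from the center.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Squared Euclidean distance between two points (array refs).
sub dist2 {
    my ( $p, $q ) = @_;
    return sum( map { ( $p->[$_] - $q->[$_] )**2 } 0 .. $#$p );
}

# Weighted centroid of the points whose indices are listed in $idx.
sub centroid {
    my ( $pts, $wts, $idx ) = @_;
    my $total = sum( map { $wts->[$_] } @$idx );
    my @c     = (0) x @{ $pts->[0] };
    for my $i (@$idx) {
        $c[$_] += $wts->[$i] * $pts->[$i][$_] for 0 .. $#c;
    }
    return [ map { $_ / $total } @c ];
}

# Weighted RMS distance from $c (an assumed definition of "sigma").
sub sigma {
    my ( $pts, $wts, $idx, $c ) = @_;
    my $total = sum( map { $wts->[$_] } @$idx );
    return sqrt( sum( map { $wts->[$_] * dist2( $pts->[$_], $c ) } @$idx ) / $total );
}

sub sigma_clip_sketch {
    my (%opt) = @_;
    my ( $pts, $nsigma, $dtol, $iterlim ) = @opt{qw(points nsigma dtol iterlim)};
    my $wts = $opt{weights} // [ (1) x @$pts ];
    my @all = 0 .. $#$pts;

    # Initial center and sigma from the entire dataset.
    my $center = centroid( $pts, $wts, \@all );
    my $sigma  = sigma( $pts, $wts, \@all, $center );

    for my $iter ( 1 .. $iterlim ) {
        # Step 1: clip everything outside nsigma * sigma of the current center.
        my $r2   = ( $nsigma * $sigma )**2;
        my @keep = grep { dist2( $pts->[$_], $center ) <= $r2 } @all;
        return { success => 0, iter => $iter } unless @keep;

        # Steps 2 and 3: new centroid and sigma from the remaining data.
        my $new = centroid( $pts, $wts, \@keep );
        $sigma = sigma( $pts, $wts, \@keep, $new );

        # Step 4: stop once successive centers are closer than dtol.
        my $moved = sqrt( dist2( $new, $center ) );
        $center = $new;
        return { success => 1, center => $center, iter => $iter, nelem => scalar @keep }
          if $moved <= $dtol;
    }
    return { success => 0, center => $center, iter => $iterlim };
}

my $r = sigma_clip_sketch(
    points  => [ [ 0, 0 ], [ 1, 0 ], [ 0, 1 ], [ 1, 1 ], [ 0.5, 0.5 ], [ 10, 10 ] ],
    nsigma  => 1.5,
    dtol    => 1e-3,
    iterlim => 10,
);
printf "converged=%d center=(%g, %g) iterations=%d\n",
  $r->{success}, @{ $r->{center} }, $r->{iter};
```

In this sample the point at (10, 10) lies well outside the first clipping radius, so it is excluded and the center settles on the remaining cluster.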
The following options are available:
center
The initial center. It may be
An array of length N
The array may contain undefined values for each dimension for which the center should be determined by finding the mean of the values in that dimension.
A piddle with shape N (or something that can be coerced into one, see "TYPES"),
A coderef which will return the center as a piddle with shape N. The subroutine is called as
&$center( $coords, $mask, $weight, $total_weight );
with
$coords
A piddle with shape NxM containing M coordinates with dimension N
$mask
A piddle with shape M, essentially a flattened copy of the initial $mask option to "iterate".
$weight
A piddle with shape M, essentially a copy of the initial $weight option to "iterate".
$total_weight
A scalar which is the sum of $mask * $weight
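A coderef following this calling convention might look like the sketch below, which computes a weighted centroid with core PDL operations. The variable names and sample data are illustrative assumptions; only the argument order is taken from the description above.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical callback: weighted centroid of the included coordinates.
my $weighted_centroid = sub {
    my ( $coords, $mask, $weight, $total_weight ) = @_;

    my $w        = $mask * $weight;            # shape M; excluded elements get weight 0
    my $weighted = $coords * $w->dummy(0);     # shape NxM; each coordinate scaled by its weight

    # Sum over the M elements, leaving a piddle of shape N.
    return $weighted->xchg( 0, 1 )->sumover / $total_weight;
};

# Four corners of a square, given as M = 4 points in N = 2 dimensions.
my $coords = pdl( [ 0, 0 ], [ 2, 0 ], [ 2, 2 ], [ 0, 2 ] );
my $mask   = ones(4);
my $weight = ones(4);

print $weighted_centroid->( $coords, $mask, $weight, ( $mask * $weight )->sum ), "\n";
```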
clip
Optional. The clipping radius used to determine the initial standard deviation.
coords
Optional. The coordinates to center. coords is a piddle of shape NxM (or anything which can be coerced into it, see "TYPES") where N is the number of dimensions in the data and M is the number of data elements.
weight may be specified with coords to indicate weighted data.
mask may be specified to indicate that a subset of the coordinates should be operated on.
coords is useful if the data cube is not fully populated; for dense data, use weight instead.
dtol
Optional. If specified, iteration will cease when successive centers are closer than the specified distance.
iterlim
Optional. The maximum number of iterations to run. Defaults to 10.
log
Optional.
If log is true (and not a coderef), a default logger which outputs to STDOUT will be used.
If log is a coderef, it will be called before the first iteration and at the end of each iteration. It is passed a copy of the current iteration's results object; see "Sigma Clip Iteration Results".
mask
Optional. This is a piddle which specifies which coordinates to include in the calculations. Its values are either 0 or 1, where values of 1 indicate coordinates to be included. It defaults to a piddle of all 1's.
When used with coords, mask must be a piddle of shape M, where M is the number of data elements in coords.
If coords is not specified, mask should have the same shape as weight.
save_mask
If true, the mask used in the final iteration will be returned in the iteration result object.
save_weight
If true, the weights used in the final iteration will be returned in the iteration result object.
nsigma
The size of the clipping radius, in units of the standard deviation.
weight
Optional. Data weights. When used with coords, weight must be a piddle of shape M, where M is the number of data elements in coords. If coords is not specified, weight is a piddle of shape NxM, where N is the number of dimensions in the data and M is the number of data elements.
It defaults to a piddle of all 1's.
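The options above might be combined as in the following sketch. It assumes PDL and PDL::Algorithm::Center are installed and calls sigma_clip by its fully qualified name so that no assumption is made about what the module exports; the sample data and option values are purely illustrative.

```perl
use strict;
use warnings;
use PDL;
use PDL::Algorithm::Center;

# M = 6 points in N = 2 dimensions; pdl() given M pairs yields shape NxM.
my $coords = pdl( [ 0, 0 ], [ 1, 0 ], [ 0, 1 ], [ 1, 1 ], [ 0.5, 0.5 ], [ 10, 10 ] );

my $results = PDL::Algorithm::Center::sigma_clip(
    coords  => $coords,
    nsigma  => 1.5,      # clip at 1.5 standard deviations
    dtol    => 0.001,    # stop when the center moves less than this
    iterlim => 20,
);

if ( $results->success ) {
    print "center: ",        $results->center, "\n";
    print "elements used: ", $results->nelem,  "\n";
}
else {
    # The error object stringifies to a failure message.
    print "failed: ", $results->error, "\n";
}
```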
sigma_clip returns an object which includes all of the attributes from the final iteration object (See "Sigma Clip Iterations" ), with the following additional attributes/methods:
iterations
An array of results objects for each iteration.
success
True if the iteration converged, false otherwise.
error
If convergence has failed, this will contain an error object describing the failure. See "Errors".
mask
If the save_mask option is true, this will be the final inclusion mask.
weight
If the save_weight option is true, this will be the final weights.
The results for each iteration are stored in an object with the following attributes/methods:
center
A 1D piddle containing the derived center. The value for the last iteration will be undefined if all of the elements have been clipped.
iter
The iteration index. An index of 0 indicates the values determined before the iterative loop was entered, and reflects the initial clipping and mask exclusion.
nelem
The number of data elements used in the center.
total_weight
The combined weight of the data elements used to determine the center.
sigma
The standard deviation of the clipped data. The value for the last iteration will be undefined if all of the elements have been clipped.
clip
The clipping radius. This will be undefined for the first iteration if the clip option was not specified.
dist
Optional. The distance between the previous and current centers. This is defined only if the dtol option was passed.
$result = iterate(
    center       => Center | CodeRef,
    initialize   => CodeRef,
    calc_center  => CodeRef,
    calc_wmask   => CodeRef,
    is_converged => CodeRef,
    coords       => Coords,
    iterlim      => PositiveInt,
    log          => Optional [CodeRef],
    mask         => Optional [Piddle1D_ne],
    save_mask    => Optional [Bool],
    save_weight  => Optional [Bool],
    weight       => Optional [Piddle1D_ne],
);
A generic iteration loop for centering data using callbacks for calculating centers, included element masks, weight, and iteration completion.
The following options are accepted:
The initial center. It may either be a piddle with shape N (or something that can be coerced into one, see "TYPES") or a coderef which will return the center as a piddle with shape N. The coderef is called as
$initial_center = &$center( $coords, $mask, $weight, $total_weight );
A scalar which is the sum of $mask * $weight.
initialize
This subroutine provides initialization prior to entering the iteration loop. It should initialize the passed iteration object and work storage.
It is invoked as:
&$initialize( $coords, $mask, $weight, $current, $work );
A piddle of shape NxM with the coordinates of each element
$current
A reference to a Hash::Wrap based object containing data for the current iteration. initialize may augment the underlying hash with its own data (but see "Work Space"). The following attributes are provided by iterate:
nelem
The number of included coordinates, $mask->sum.
total_weight
The sum of the weights of the included coordinates, ($mask * $weight)->dsum.
$work
A hashref which the callbacks may use to store temporary data (e.g. work piddles). It is available to all of the callback routines.
calc_center
This subroutine should return a piddle of shape N with the calculated center.
It will be called as:
$center = &$calc_center( $coords, $mask, $weight, $current, $work );
A piddle with shape M containing the current inclusion mask.
A piddle with shape M containing the current weights for the included coordinates.
A reference to a Hash::Wrap based object containing data for the current iteration.
calc_center may augment the underlying hash with its own data (but see "Iteration Objects"). The following attributes are provided by iterate:
nelem
The number of included coordinates, $mask->sum.
total_weight
The sum of the weights of the included coordinates, ($mask * $weight)->dsum.
calc_wmask
This subroutine should determine the current set of included coordinates and their current weights.
&$calc_wmask( $coords, $mask, $weight, $current, $work );
$mask
A piddle with shape M, essentially a flattened copy of the initial $mask option to "iterate". Any changes to it will be discarded at the end of the iteration. Be sure to update $current->nelem if this is changed.
$weight
A piddle with shape M, essentially a copy of the initial $weight option to "iterate". Any changes to it will be discarded at the end of the iteration. Be sure to update $current->total_weight if this is changed.
calc_wmask may augment the underlying hash with its own data (but see "Work Space"). The following attributes are provided by iterate:
nelem
The number of included coordinates, $mask->sum. If $mask is changed this must either be updated or set to the undefined value.
total_weight
The sum of the weights of the included coordinates, ($mask * $weight)->dsum. If $weight is changed this must either be updated or set to the undefined value.
is_converged
This subroutine should return a boolean value indicating whether the iteration has converged.
$bool = &$is_converged( $coords, $mask, $weight, $last, $current, $work );
$last
A reference to a Hash::Wrap based object containing data for the previous iteration. is_converged may augment the underlying hash with its own data (but see "Work Space"). The following attributes are provided by iterate:
nelem
The number of included coordinates.
total_weight
The sum of the weights of the included coordinates.
A reference to a Hash::Wrap based object containing data for the current iteration, with attributes as described above for $last
The is_converged routine is passed references to the actual objects used by iterate to keep track of the iterations. This means that the is_converged routine may manipulate the starting point for the next iteration by altering its $current parameter.
is_converged is called prior to entering the iteration loop with $last set to undef. This allows priming the $current structure, which will be used as $last in the first iteration.
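A convergence callback that follows this calling convention could be sketched as below. The tolerance variable $tol and the convergence criterion (distance between successive centers) are assumptions chosen for illustration; the sketch relies only on the center attribute of the iteration objects and on $last being undef for the priming call.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical tolerance; not a value taken from the module.
my $tol = 1e-3;

my $is_converged = sub {
    my ( $coords, $mask, $weight, $last, $current, $work ) = @_;

    # Priming call made before the loop: there is no previous iteration yet.
    return 0 unless defined $last;

    # An undefined center (e.g. everything clipped) cannot be compared.
    return 0 unless defined( $current->center ) && defined( $last->center );

    # Euclidean distance between the previous and current centers.
    my $dist = sqrt( ( ( $current->center - $last->center )**2 )->sum );

    return $dist <= $tol ? 1 : 0;
};
```

Returning false from the priming call leaves $current untouched, so it simply becomes $last in the first iteration.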
The coordinates to center. coords is a piddle of shape NxM (or anything which can be coerced into it, see "TYPES") where N is the number of dimensions in the data and M is the number of data elements.
A positive integer specifying the maximum number of iterations.
Optional. A subroutine which will be called before the first iteration and at the end of each iteration.
It is invoked as
&$log( $iteration );
where $iteration is a copy of the current iteration object. The object will have at least the following fields:
A piddle of shape N containing the derived center. The value for the last iteration will be undefined if all of the elements have been clipped.
The iteration index
The summed weight of the included coordinates.
There may be other attributes added by the various callbacks (calc_wmask, calc_center, is_converged). See for example, "Sigma Clip Iterations".
Callbacks are provided with Hash::Wrap based objects which contain the data for the current iteration. They should add data to the object's underlying hash which records particulars about their specific operation.
Callbacks are passed Hash::Wrap based iteration objects and a reference to a $work hash. The iteration objects may have additional elements added to them (which will be available to the caller), but callbacks should refrain from storing unnecessary data there, as each new iteration's object is copied from that of the previous iteration.
Instead, use the passed $work hash. It is shared amongst the callbacks, so use it to store data which will not be returned to the caller.
iterate returns an object which includes all of the attributes from the final iteration object (See "Iteration Object" ), with the following additional attributes/methods:
An array of result objects for each iteration.
The value of the center attribute in the last iteration will be undefined if all of the elements have been clipped.
The results for each iteration are stored in an object with the following attributes/methods (in addition to those added by the callbacks).
Before the first iteration:
Extract an initial center from center.
Create a new iteration object.
Call initialize.
Call log
For each iteration:
1. Create a new iteration object by copying the old one.
2. Call calc_wmask with a copy of the initial mask and weights. calc_wmask should update (in place) at least one of them.
3. Update summed weight and number of elements if calc_wmask sets them to undef.
4. Call calc_center with the current mask and weights.
5. Call is_converged with the current mask and weights.
6. Go to step 1 if not converged and the iteration limit has not been reached.
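The sequence of calls above can be mimicked in plain Perl to show how the callbacks interact. This is a sketch under stated assumptions, not the module's code: iterate_sketch is a hypothetical name, plain arrays and hashes stand in for piddles and Hash::Wrap objects, and the placement of the priming is_converged call and the per-iteration log call is inferred from the option descriptions.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

sub iterate_sketch {
    my (%opt) = @_;
    my ( $coords, $mask0, $weight0 ) = @opt{qw(coords mask weight)};
    my %work;    # shared scratch space, never returned to the caller

    # Before the first iteration.
    my $current = {
        iter         => 0,
        center       => $opt{center},
        nelem        => scalar( grep { $_ } @$mask0 ),
        total_weight => sum0( map { $mask0->[$_] * $weight0->[$_] } 0 .. $#$mask0 ),
    };
    $opt{initialize}->( $coords, $mask0, $weight0, $current, \%work );
    $opt{log}->( {%$current} ) if $opt{log};
    $opt{is_converged}->( $coords, $mask0, $weight0, undef, $current, \%work );    # priming call

    my @iterations = ($current);
    my $converged  = 0;
    while ( !$converged && $current->{iter} < $opt{iterlim} ) {
        my $last = $current;
        $current = { %$last, iter => $last->{iter} + 1 };              # step 1
        my ( $mask, $weight ) = ( [@$mask0], [@$weight0] );            # fresh copies
        $opt{calc_wmask}->( $coords, $mask, $weight, $current, \%work );    # step 2

        # Step 3: recompute only what the callback reset to undef.
        $current->{nelem}        //= scalar( grep { $_ } @$mask );
        $current->{total_weight} //= sum0( map { $mask->[$_] * $weight->[$_] } 0 .. $#$mask );

        $current->{center} = $opt{calc_center}->( $coords, $mask, $weight, $current, \%work );    # step 4
        $converged = $opt{is_converged}->( $coords, $mask, $weight, $last, $current, \%work );    # step 5
        $opt{log}->( {%$current} ) if $opt{log};
        push @iterations, $current;
    }
    return { %$current, iterations => \@iterations, success => $converged ? 1 : 0 };
}

# One-dimensional example: drop points farther than a fixed radius from the center.
my @x   = ( 1, 2, 3, 4, 50 );
my $res = iterate_sketch(
    coords     => \@x,
    mask       => [ (1) x @x ],
    weight     => [ (1) x @x ],
    center     => 12,
    iterlim    => 10,
    initialize => sub { my ( $c, $m, $w, $cur, $work ) = @_; $work->{radius} = 15 },
    calc_wmask => sub {
        my ( $c, $m, $w, $cur, $work ) = @_;
        $m->[$_] = 0 for grep { abs( $c->[$_] - $cur->{center} ) > $work->{radius} } 0 .. $#$c;
        $cur->{nelem} = $cur->{total_weight} = undef;    # ask the loop to recompute
    },
    calc_center => sub {
        my ( $c, $m, $w, $cur ) = @_;
        return sum0( map { $m->[$_] * $w->[$_] * $c->[$_] } 0 .. $#$c ) / $cur->{total_weight};
    },
    is_converged => sub {
        my ( $c, $m, $w, $last, $cur ) = @_;
        return 0 unless defined $last;
        return abs( $cur->{center} - $last->{center} ) < 1e-9;
    },
);
print "center=$res->{center} nelem=$res->{nelem} success=$res->{success}\n";
```

Because each iteration object starts as a copy of the previous one, calc_wmask sees the previous center in $cur->{center}, while the clipping radius lives in the shared work hash and is not copied.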
In the description of the subroutines, the following types are specified:
Center
This accepts a non-null, non-empty 1D piddle, or anything that can be converted into one (for example, a scalar, a scalar piddle, or an array of numbers).
CodeRef
A code reference.
PositiveNum
A positive real number.
PositiveInt
A positive integer.
Coords
This accepts a non-null, non-empty 2D piddle, or anything that can be converted or up-converted to it.
Piddle_min1D_ne
This accepts a non-null, non-empty piddle with a minimum of 1 dimension.
Piddle1D_ne
This accepts a non-null, non-empty 1D piddle.
Errors are represented as objects in the following classes:
These are unconditionally thrown as PDL::Algorithm::Center::Failure::parameter objects.
These are stored in the result object's error attribute.
PDL::Algorithm::Center::Failure::iteration::limit_reached
PDL::Algorithm::Center::Failure::iteration::empty
The objects stringify to a failure message.
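Since failures are represented as objects in these classes, a caller could dispatch on the class of the error. The helper below is a hypothetical sketch (describe_failure is not part of the module), and its wording for the empty class is an assumption based on the class name.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: turn an error value into a short description.
sub describe_failure {
    my ($err) = @_;

    return 'no error' unless defined $err;
    return "unexpected error: $err" unless blessed $err;

    return 'bad parameter'
      if $err->isa('PDL::Algorithm::Center::Failure::parameter');
    return 'iteration limit reached'
      if $err->isa('PDL::Algorithm::Center::Failure::iteration::limit_reached');
    return 'no data elements remained'    # assumed meaning of the "empty" class
      if $err->isa('PDL::Algorithm::Center::Failure::iteration::empty');

    return "unexpected error: $err";
}

# Typical use (commented out because it needs a real call to the module):
#   my $results = eval { sigma_clip(%options) };
#   print describe_failure($@), "\n" unless defined $results;              # thrown parameter error
#   print describe_failure( $results->error ), "\n" unless $results->success;   # stored iteration error
```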
Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=PDL-Algorithm-Center or by email to bug-PDL-Algorithm-Center@rt.cpan.org.
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
The development version is on github at https://github.com/djerius/pdl-algorithm-center and may be cloned from git://github.com/djerius/pdl-algorithm-center.git
Diab Jerius <djerius@cpan.org>
This software is Copyright (c) 2017 by Smithsonian Astrophysical Observatory.
This is free software, licensed under:
The GNU General Public License, Version 3, June 2007