Lingua::Diversity::SamplingScheme - storing the parameters of a sampling scheme
This documentation refers to Lingua::Diversity::SamplingScheme version 0.02.
    # Lingua::Diversity::SamplingScheme is used by Lingua::Diversity::Variety.
    use Lingua::Diversity::Variety;

    # Create a new sampling scheme...
    my $sampling_scheme = Lingua::Diversity::SamplingScheme->new(
        'mode'           => 'segmental',
        'subsample_size' => 100,
    );

    # ... Then apply it to a Lingua::Diversity::Variety object.
    Lingua::Diversity::Variety->new(
        'transform'       => 'type_token_ratio',
        'sampling_scheme' => $sampling_scheme,
    );
This class serves as storage for a set of parameters defining a sampling scheme (to be used with a Lingua::Diversity::Variety object). Such a scheme is meant to describe the kind of resampling that should be applied as well as the number of subsamples and their size.
The creator (new()) returns a new Lingua::Diversity::SamplingScheme object. It takes one required and two optional named parameters:
new()
subsample_size
The requested number of unit tokens per subsample (a positive integer). This is the required parameter.
num_subsamples
The number of subsamples to be drawn (a positive integer). Default is 100. Note that this parameter has no effect in segmental mode (see below), since in that case the number of subsamples is the result of the integer division of the text length by the requested subsample size.
mode
Either 'random' (default) or 'segmental'.
Value 'random' means that (i) the order of unit tokens in the text is preserved within a given subsample, and (ii) the probability for a unit token to occur in a given subsample depends only on the requested subsample size (see subsample_size above). For example, from the text "say you say me", the following subsamples of size 3 (and only these) could be generated, each with equal probability: "say you say", "say you me", "say say me", and "you say me".
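The behaviour described for 'random' mode can be illustrated with a minimal stand-alone sketch. It assumes that a subsample is obtained by choosing distinct token positions uniformly at random and then restoring text order. The function name random_subsample is hypothetical and is not part of Lingua::Diversity, whose internal implementation may differ.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper, not part of Lingua::Diversity: draw one subsample of
# $size unit tokens from @$units, keeping the tokens in text order.
sub random_subsample {
    my ( $units, $size ) = @_;
    die "subsample size must be a positive integer\n"
        unless $size =~ /^[1-9][0-9]*$/;
    die "subsample size exceeds text length\n" if $size > @$units;

    # Choose $size distinct positions uniformly at random...
    my @shuffled = shuffle( 0 .. $#{$units} );
    my @chosen   = @shuffled[ 0 .. $size - 1 ];

    # ... then sort them so that the original order of the text is kept.
    my @positions = sort { $a <=> $b } @chosen;
    return [ @{$units}[@positions] ];
}

my @text = qw(say you say me);
print join( ' ', @{ random_subsample( \@text, 3 ) } ), "\n";
```

Each run prints one of the four subsamples listed above. Because positions are drawn without replacement, a token position can never appear twice in the same subsample.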
Value 'segmental' means that subsamples are contiguous, non-overlapping sequences of units in the original text. For example, the text "say you say me" would give rise to exactly two subsamples of size 2: "say you" and "say me". Incomplete subsamples at the end of the text are ignored, so a subsample size of 3 would produce a single subsample in this example ("say you say"). Note that in this mode, the unit and category arrays are assumed to be in the text's order.
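The segmentation rule, including the integer division that determines the number of subsamples, can be sketched as follows. The function name segmental_subsamples is hypothetical and is not part of Lingua::Diversity. This is only an illustration of the rule as documented here, not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::Diversity: cut @$units into
# contiguous, non-overlapping subsamples of $size unit tokens each.
sub segmental_subsamples {
    my ( $units, $size ) = @_;
    die "subsample size must be a positive integer\n"
        unless $size =~ /^[1-9][0-9]*$/;

    # Integer division of text length by subsample size; an incomplete
    # subsample at the end of the text is dropped.
    my $count = int( @$units / $size );

    my @subsamples;
    for my $i ( 0 .. $count - 1 ) {
        my $start = $i * $size;
        push @subsamples, [ @{$units}[ $start .. $start + $size - 1 ] ];
    }
    return @subsamples;
}

my @text = qw(say you say me);
print join( ' ', @$_ ), "\n" for segmental_subsamples( \@text, 2 );
print join( ' ', @$_ ), "\n" for segmental_subsamples( \@text, 3 );
```

With a size of 2 the sketch yields "say you" and "say me". With a size of 3 it yields only "say you say", since the trailing "me" cannot fill a complete subsample.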
Getter and setter for the subsample_size attribute.
Getter and setter for the num_subsamples attribute.
Getter and setter for the mode attribute.
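As a rough illustration of the constraints documented for the three attributes (a required positive-integer subsample_size, num_subsamples defaulting to 100, and mode defaulting to 'random'), the following stand-alone sketch checks a hash of named parameters. The function name check_scheme_params is hypothetical and is not part of Lingua::Diversity. The class itself may validate its attributes differently.

```perl
use strict;
use warnings;

# Hypothetical stand-alone check, not part of Lingua::Diversity.
sub check_scheme_params {
    my (%args) = @_;

    # Documented defaults for the two optional parameters.
    my %params = ( num_subsamples => 100, mode => 'random', %args );

    die "subsample_size is required\n"
        unless defined $params{subsample_size};

    for my $name (qw(subsample_size num_subsamples)) {
        die "$name must be a positive integer\n"
            unless defined $params{$name}
            && $params{$name} =~ /^[1-9][0-9]*$/;
    }

    die "mode must be 'random' or 'segmental'\n"
        unless defined $params{mode}
        && ( $params{mode} eq 'random' || $params{mode} eq 'segmental' );

    return \%params;
}

my $params = check_scheme_params( subsample_size => 100, mode => 'segmental' );
print "$_ = $params->{$_}\n" for sort keys %$params;
```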
This module is part of the Lingua::Diversity distribution.
There are no known bugs in this module.
Please report problems to Aris Xanthos (aris.xanthos@unil.ch).
Patches are welcome.
Aris Xanthos (aris.xanthos@unil.ch)
Copyright (c) 2011 Aris Xanthos (aris.xanthos@unil.ch).
This program is released under the GPL license (see http://www.gnu.org/licenses/gpl.html).
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See also Lingua::Diversity and Lingua::Diversity::Variety.