Algorithm::LossyCount - Memory-efficient approximate frequency count.
version 0.03
    use strict;
    use warnings;
    use feature 'say';
    use Algorithm::LossyCount;

    my @samples = qw/a b a c d f a a d b b c a a .../;

    my $counter = Algorithm::LossyCount->new(max_error_ratio => 0.005);
    $counter->add_sample($_) for @samples;

    my $frequencies = $counter->frequencies;
    say $frequencies->{a};  # Approximate freq. of 'a'.
    say $frequencies->{b};  # Approximate freq. of 'b'.
    ...
Lossy-Counting is an approximate frequency counting algorithm proposed by Manku and Motwani in 2002 (see the "SEE ALSO" section below).
The main advantage of the algorithm is memory efficiency: you can obtain approximate occurrence counts for items with a much smaller memory footprint than exact counting of every item requires. Furthermore, Lossy-Counting is an online algorithm. It can be applied to a data set whose size is not known in advance, and you can retrieve an intermediate result at any time.
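To illustrate how the memory saving comes about, here is a minimal pure-Perl sketch of Lossy Counting as commonly described: the stream is split into buckets of width ceil(1 / max_error_ratio), and at each bucket boundary entries that can no longer be frequent are dropped. The function name lossy_count and its internal data layout are assumptions made for this illustration; this is not the module's actual implementation.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper for illustration only; not part of Algorithm::LossyCount.
sub lossy_count {
    my ($max_error_ratio, @samples) = @_;
    die "max_error_ratio must be positive\n" unless $max_error_ratio > 0;

    my $bucket_width = ceil(1 / $max_error_ratio);
    my %entries;            # sample => [count, maximum possible undercount]
    my $num_samples = 0;

    for my $sample (@samples) {
        $num_samples++;
        my $bucket = ceil($num_samples / $bucket_width);

        if (my $entry = $entries{$sample}) {
            $entry->[0]++;
        }
        else {
            # A new entry may have been missed in all earlier buckets.
            $entries{$sample} = [1, $bucket - 1];
        }

        # At each bucket boundary, drop entries that cannot be frequent.
        if ($num_samples % $bucket_width == 0) {
            for my $key (keys %entries) {
                my ($count, $delta) = @{ $entries{$key} };
                delete $entries{$key} if $count + $delta <= $bucket;
            }
        }
    }

    my %counts = map { $_ => $entries{$_}[0] } keys %entries;
    return \%counts;
}

my $counts = lossy_count(0.25, qw/a a a b a c a d a e/);
printf "%s => %d\n", $_, $counts->{$_} for sort keys %$counts;
# prints "a => 6" and "e => 1"; b, c and d were pruned at bucket boundaries
```

Only the surviving entries are kept in memory, which is why the footprint stays small even for long streams; the price is that each reported count may undercount the true value by at most max_error_ratio times the number of samples seen.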
Constructor. max_error_ratio is the only mandatory parameter; it specifies the acceptable error ratio. Passing zero or a negative number as its value is an error.
Adds the given $sample to the count.
Returns the current result as a HashRef whose keys are samples and whose values are the corresponding counts.
If the optional named parameter support is specified, the returned HashRef contains only samples whose frequency is greater than ($support - $max_error_ratio) * $num_samples.
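As a worked example of the support threshold, the sketch below applies the formula ($support - $max_error_ratio) * $num_samples to a plain hash of counts. The helper filter_by_support and the sample numbers are hypothetical and exist only to show the arithmetic; with the module itself, the filtering is requested through the support named parameter described above.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not part of Algorithm::LossyCount.
sub filter_by_support {
    my ($frequencies, $support, $max_error_ratio, $num_samples) = @_;

    # Samples must exceed this count to be reported.
    my $threshold = ($support - $max_error_ratio) * $num_samples;

    my %kept = map  { $_ => $frequencies->{$_} }
               grep { $frequencies->{$_} > $threshold }
               keys %$frequencies;
    return \%kept;
}

# With 1000 samples, support 0.02 and max_error_ratio 0.005,
# the threshold is roughly (0.02 - 0.005) * 1000 = 15.
my $frequent = filter_by_support({ a => 400, b => 40, c => 9 }, 0.02, 0.005, 1000);
print join(', ', sort keys %$frequent), "\n";    # prints "a, b"
```

Subtracting max_error_ratio from support lowers the cut-off by the largest amount a count can be underestimated, so an item whose true frequency exceeds support times the number of samples is not filtered out by mistake.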
Returns the max_error_ratio you passed to the constructor.
Returns the total number of samples added so far.
Koichi SATOH <sekia@cpan.org>
This software is Copyright (c) 2014 by Koichi SATOH.
This is free software, licensed under:
The MIT (X11) License