# NAME

Math::SZaru::TopEstimator - Statistical estimator of the 'top N' items based on the CountSketch algorithm

# SYNOPSIS

```
use Math::SZaru::TopEstimator;
my $ue = Math::SZaru::TopEstimator->new($num_top_elems_to_track);
$ue->add_weighted_elems("foo" => 10, "bar" => 20);
# add many more elems ...

my $inserted_elems_count = $ue->tot_elems;
my $estimated_list_of_top_elements = $ue->estimate();

# $estimated_list_of_top_elements is now something like:
# [ ["most frequent string", $count], ["second most frequent string", $count],
#   [...], [...], ..., ["nth most frequent string", $count] ]
```

# DESCRIPTION

`Math::SZaru::TopEstimator` provides a statistical estimate of the 'top N' most frequent data items in a stream. It is based on the CountSketch algorithm from "Finding Frequent Items in Data Streams" by Moses Charikar, Kevin Chen, and Martin Farach-Colton (2002).
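To give a rough feel for the CountSketch idea, here is a minimal pure-Perl sketch of the counter update and the median-based estimate. It is an illustration only and not the module's implementation: the helper names (`bucket_and_sign`, `sketch_add`, `sketch_estimate`), the table size, and the use of `Digest::MD5` as the hash source are all assumptions made for this example, and it assumes byte-string items.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Assumed, arbitrary sketch dimensions: an odd number of rows keeps the median simple.
my ($ROWS, $COLS) = (5, 256);
my @sketch = map { [ (0) x $COLS ] } 1 .. $ROWS;

# Hypothetical helper: derive a bucket index and a +1/-1 sign per (row, item).
sub bucket_and_sign {
    my ($row, $item) = @_;
    my ($h, $s) = unpack "N2", md5("$row:$item");
    return ($h % $COLS, ($s & 1) ? 1 : -1);
}

# Add an item with a weight: one signed counter update per row.
sub sketch_add {
    my ($item, $weight) = @_;
    for my $row (0 .. $ROWS - 1) {
        my ($col, $sign) = bucket_and_sign($row, $item);
        $sketch[$row][$col] += $sign * $weight;
    }
}

# Estimate an item's count as the median of its signed counters.
sub sketch_estimate {
    my ($item) = @_;
    my @votes = sort { $a <=> $b } map {
        my ($col, $sign) = bucket_and_sign($_, $item);
        $sign * $sketch[$_][$col];
    } 0 .. $ROWS - 1;
    return $votes[ int($#votes / 2) ];
}

sketch_add("foo", 10);
sketch_add("bar", 20);
printf "%s: about %d\n", $_, sketch_estimate($_) for qw(foo bar);
```

The estimate is exact while no other item shares a bucket in most rows; as more distinct items arrive, collisions turn it into an approximation. A top-N estimator would additionally keep a small candidate list of the items with the largest estimates, which this sketch leaves out.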

# METHODS

## new

Constructor. Expects an integer indicating the number of "top" items to track.

## add_elem

Given a string, adds the string to the TopEstimator.

## add_elems

Same as `add_elem`, but accepts an arbitrary number of strings to insert into the estimator at once.

## add_weighted_elem

Given a string and a count N, adds that string N times to the estimator (with a weight of N). Functionally the same as calling `$est->add_elem("foo") for 1..$N`, but much faster.

## add_weighted_elems

This is to `add_weighted_elem` what `add_elems` is to `add_elem`. Takes a flat list of string, count, string, count, ....
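As a hedged illustration of how the four insertion methods relate, the following sketch feeds the same data into two estimators, once element by element and once with weights. It uses only the calls documented here; the variable names and the sample strings are made up for the example, and no particular output is claimed.

```perl
use strict;
use warnings;
use Math::SZaru::TopEstimator;

# Track the two most frequent items in each estimator.
my $one_by_one = Math::SZaru::TopEstimator->new(2);
$one_by_one->add_elem("foo") for 1 .. 3;    # three single insertions
$one_by_one->add_elems("bar", "bar");       # several strings in one call

my $weighted = Math::SZaru::TopEstimator->new(2);
$weighted->add_weighted_elem("foo", 3);     # "foo" with a weight of 3
$weighted->add_weighted_elems("bar" => 2);  # flat list of string, count pairs

# Both estimators have now seen "foo" three times and "bar" twice.
for my $est ($one_by_one, $weighted) {
    print join(", ", map { "$_->[0] => $_->[1]" } @{ $est->estimate }), "\n";
}
```

The weighted calls are the ones to prefer when the input is already aggregated, since they avoid a method call per occurrence.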

## tot_elems

Returns the total number of elements that were added to the estimator.

## estimate

Returns a reference to an array containing as many records as were configured to be tracked at construction time. The records are ordered by estimated frequency: the n-th record represents the estimated n-th most frequent item in the input stream.

Each record is a reference to an array holding the value (a string) and its estimated total number of occurrences in the input.
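The nested structure can be unpacked as in the following sketch. The `$estimate` value here is hand-written sample data in the documented shape, not real estimator output, and the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

# Hand-written sample in the documented shape: [ [value, count], ... ].
my $estimate = [ [ "foo", 20 ], [ "bar", 10 ], [ "baz", 3 ] ];

# Walk the records in rank order and build report lines.
my @report;
my $rank = 0;
for my $record (@$estimate) {
    my ($value, $count) = @$record;
    push @report, sprintf("%d. %s (~%d)", ++$rank, $value, $count);
}
print "$_\n" for @report;

# Build a value => count lookup when the ranking itself is not needed.
my %count_for = map { ($_->[0] => $_->[1]) } @$estimate;
```

With a real estimator, `$estimate` would simply be the return value of `estimate()`.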

# SEE ALSO

Math::SZaru

# AUTHOR

Steffen Mueller, <smueller@cpan.org>

# COPYRIGHT AND LICENSE

`http://www.apache.org/licenses/LICENSE-2.0`