Grzegorz Rożniecki > Bloom-Filter > Bloom::Filter

Download:
Bloom-Filter-1.2.tar.gz

Dependencies

Annotate this POD

Related Modules

List::Util
Inline::C
Data::Dumper
Bit::Vector
Digest::SHA1
Benchmark::Timer
Math::Random::MT
HTML::Parser
LWP::RobotUA
File::Find::Rule
more...
By perlmonks.org

CPAN RT

Open  1
View/Report Bugs
Module Version: 1.2   Source  

NAME ^

Bloom::Filter - Sample Perl Bloom filter implementation

DESCRIPTION ^

A Bloom filter is a probabilistic algorithm for doing existence tests in less memory than a full list of keys would require. The tradeoff to using Bloom filters is a certain configurable risk of false positives. This module implements a simple Bloom filter with configurable capacity and false positive rate. Bloom filters were first described in a 1970 paper by Burton Bloom, see http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal.

SYNOPSIS ^

        use Bloom::Filter

        my $bf = Bloom::Filter->new( capacity => 10, error_rate => .001 );

        $bf->add( @keys );

        while ( <> ) {
                chomp;
                print "Found $_\n" if $bf->check( $_ );
        }

CONSTRUCTORS ^

new %PARAMS

Create a brand new instance. Allowable params are error_rate, capacity.

init

Calculates the best number of hash functions and optimum filter length, creates some random salts, and generates a blank bit vector. Called automatically by constructor.

ACCESSORS ^

capacity

Returns the total capacity of the Bloom filter

error_rate

Returns the configured maximum error rate

length

Returns the length of the Bloom filter in bits

key_count

Returns the number of items currently stored in the filter

on_bits

Returns the number of 'on' bits in the filter

salts

Returns the list of salts used to create the hash functions

PUBLIC METHODS ^

add @KEYS

Adds the list of keys to the filter. Will fail, return undef and complain if the number of keys in the filter exceeds the configured capacity.

check @KEYS

Checks the provided key list against the Bloom filter, and returns a list of equivalent length, with true or false values depending on whether there was a match.

INTERNAL METHODS ^

_calculate_shortest_filter_length CAPACITY ERR_RATE

Given a desired error rate and maximum capacity, returns the optimum combination of vector length (in bits) and number of hash functions to use in building the filter, where "optimum" means shortest vector length.

_get_cells KEY

Given a key, hashes it using the list of salts and returns an array of cell indexes corresponding to the key.

AUTHOR ^

Originally written by Maciej Ceglowski <maciej@ceglowski.com>. Currently maintained by Grzegorz Rożniecki <xaerxess@gmail.com>.

CONTRIBUTORS ^

Dmitriy Ryaboy <dmitriy.ryaboy@ask.com> (big speedup in February 2007, thanks!)

COPYRIGHT AND LICENSE ^

(c) 2004 Maciej Ceglowski

This is free software, distributed under version 2 of the GNU Public License (GPL).

syntax highlighting: