NAME

Bloom::Faster - Perl extension for the C library libbloom.

INSTALLATION

see INSTALL

SYNOPSIS

  use Bloom::Faster;
  
  # m = ideal vector size.
  # k = number of hash functions to use.

  my $bloom = Bloom::Faster->new({m => 1000000, k => 5});

  # This gives us very tight control of memory usage (a function of m)
  # and performance (a function of k). In most applications, though, we
  # won't know the optimal values of either. In those cases it is much
  # easier to supply:
  #
  # n = number of expected elements to check for duplicates,
  # e = acceptable error rate (probability of a false positive)
  #
  # my $bloom = Bloom::Faster->new({n => 1000000, e => 0.00001});

  while (<>) {
        chomp;
        # Bloom::Faster->add() returns true when the value is a duplicate.
        if ($bloom->add($_)) {
                print "DUP: $_\n";
        }
  }

  if ($bloom->check("foo")) {
    print " foo has been seen ";
  }

  # For backwards-compatibility reasons, we also provide a "test" method.
  # It is EQUIVALENT to the add() method, which makes its name extremely
  # confusing. It is now deprecated and should not be used.


  # serialize to disk
  $bloom->to_file("/path/to/file");

  # read from disk
  my $another_bloom = Bloom::Faster->new("/path/to/another/file");

  # manually free the data structures 
  $bloom->DESTROY;
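
The comments above say that n and e can be supplied in place of m and k. As a rough illustration of how the two pairs relate, here is a sketch using the textbook Bloom filter sizing formulas. Whether libbloom applies exactly these formulas is an assumption, and bloom_params is a hypothetical helper that is not part of Bloom::Faster.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Textbook sizing formulas (an assumption, not taken from libbloom):
#   m = -n * ln(e) / (ln 2)^2    bits in the vector
#   k = (m / n) * ln 2           number of hash functions
# bloom_params is a hypothetical helper, not part of Bloom::Faster.
sub bloom_params {
    my ($n, $e) = @_;
    die "n must be positive\n"                  unless $n > 0;
    die "e must be strictly between 0 and 1\n"  unless $e > 0 && $e < 1;

    my $ln2 = log(2);
    my $m   = ceil(-$n * log($e) / ($ln2 ** 2));   # round the bit count up
    my $k   = int(($m / $n) * $ln2 + 0.5);         # round to nearest integer
    $k = 1 if $k < 1;                              # at least one hash function
    return ($m, $k);
}

my ($m, $k) = bloom_params(1_000_000, 0.00001);
printf "m = %d bits (about %.1f MiB), k = %d\n",
    $m, $m / 8 / 1024 / 1024, $k;
```

The point of the sketch is only that a lower error rate e costs both more memory (larger m) and more hashing per element (larger k).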

DESCRIPTION

Bloom filters are a lightweight duplicate detection algorithm proposed by Burton Bloom (http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal), with applications in stream data processing, among others. Where occasional false positives are acceptable, Bloom filters let us detect duplicates quickly and with very little memory.
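
To make the idea concrete, here is a toy Bloom filter in pure Perl. It is only an illustration of the general technique and is not how libbloom or Bloom::Faster is implemented. The Toy::Bloom class, its MD5-based double hashing, and the sizes used are all assumptions chosen for brevity.

```perl
use strict;
use warnings;

# Toy illustration only: NOT the libbloom implementation.
package Toy::Bloom;
use Digest::MD5 qw(md5);

sub new {
    my ($class, %args) = @_;
    my $self = { m => $args{m}, k => $args{k}, bits => '' };
    # Touch the last bit so the string is pre-sized and zero-filled.
    vec($self->{bits}, $args{m} - 1, 1) = 0;
    return bless $self, $class;
}

# Derive k bit positions from one MD5 digest by double hashing:
# position_i = (h1 + i * h2) mod m.
sub _positions {
    my ($self, $key) = @_;
    my $m = $self->{m};
    my ($h1, $h2) = map { $_ % $m } unpack 'N2', md5($key);
    return map { ($h1 + $_ * $h2) % $m } 0 .. $self->{k} - 1;
}

# True if every position is set: seen before, or a false positive.
sub check {
    my ($self, $key) = @_;
    for my $pos ($self->_positions($key)) {
        return 0 unless vec($self->{bits}, $pos, 1);
    }
    return 1;
}

# Set the key's bits; return true if the key already looked present.
sub add {
    my ($self, $key) = @_;
    my $seen = $self->check($key);
    vec($self->{bits}, $_, 1) = 1 for $self->_positions($key);
    return $seen;
}

package main;

my $toy = Toy::Bloom->new(m => 10_000, k => 5);
for my $word (qw(apple banana apple cherry banana)) {
    print "DUP: $word\n" if $toy->add($word);
}
```

A repeated key is always reported, because its bits were set the first time. A key that was never added is reported only when other keys happen to have set all of its bits, which is the false positive case.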

Memory for the bit vector is allocated in the C layer, but Perl's object system handles the cleanup: when a Bloom::Faster object goes out of scope, the vector pointed to by the C structure is free()d. To do this manually, call the DESTROY method.
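
The scope-based cleanup described above is ordinary Perl destructor behaviour. The following sketch shows it with a hypothetical stand-in class, Demo::Resource, which only records when its DESTROY runs. It does not touch Bloom::Faster or any C memory.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; it only logs when DESTROY is called.
package Demo::Resource;

our @log;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

sub DESTROY {
    my ($self) = @_;
    push @log, "freed $self->{name}";
}

package main;

{
    my $scoped = Demo::Resource->new('scoped');
    # ... use $scoped here ...
}   # the last reference is gone, so DESTROY has run by this point

my $early = Demo::Resource->new('early');
undef $early;   # dropping the only reference also triggers DESTROY

print "$_\n" for @Demo::Resource::log;
# prints "freed scoped" then "freed early"
```

Note that Perl still calls DESTROY automatically when an object is finally released, even if the method was already invoked by hand. Whether a second call is harmless depends on the class.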

A Bloom filter Perl module is currently available on CPAN, but it is slow and cannot handle large vectors. This alternative uses a more efficient C library that can handle very large vectors.

EXPORT

None by default.

Exportable constants

  HASHCNT
  PRIME_SIZ
  SIZ

AUTHOR

Peter Alvaro and Dmitriy Ryaboy, <palvaro@cpan.org> <dvryaboy@cpan.org>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.5 or, at your option, any later version of Perl 5 you may have available.

