Changes for version 0.008 - 2013-01-01

  • Stanislaw Pusep <creaktive@gmail.com>
    • renamed cosine_sim to cosine_cmp (for consistency with minhash_cmp)
    • fixed function name clash (_itoa already present on Win32)

Documentation

compute cosine similarity between two documents
uses MinHash & SpeedyFx to compare large text data
efficiently count unique tokens from a file

Modules

tokenize/hash large numbers of strings efficiently

Examples