
NAME

cosine_cmp - compute cosine similarity between two documents

VERSION

version 0.009

SYNOPSIS

    cosine_cmp [options] FILE1 FILE2

DESCRIPTION

Cosine similarity measures how similar two vectors of an inner product space are by taking the cosine of the angle between them. The cosine of a 0° angle is 1, and it is less than 1 for any other angle; its lowest possible value is -1. The cosine of the angle between two vectors thus indicates whether they point in roughly the same direction. The measure is often used to compare documents in text mining and to measure cohesion within clusters in data mining.

(source)
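The definition above can be illustrated with a minimal Python sketch. It is not the implementation used by cosine_cmp: it assumes plain whitespace-separated word counts as the document vectors, and the names `term_counts` and `cosine_similarity` are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter


def term_counts(text):
    """Turn a text into a sparse vector of lowercase word counts (an assumption of this sketch)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Return the cosine of the angle between two sparse vectors stored as dicts."""
    # Dot product over the keys of the first vector; keys missing from b contribute 0.
    dot = sum(weight * b.get(key, 0) for key, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined for a zero vector; returning 0.0 is a convention chosen here.
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = term_counts("the quick brown fox")
doc2 = term_counts("the quick red fox")
doc3 = term_counts("lorem ipsum")

print(cosine_similarity(doc1, doc1))  # identical documents
print(cosine_similarity(doc1, doc2))  # three of four words shared
print(cosine_similarity(doc1, doc3))  # no words shared
```

Because word counts are never negative, the similarity of two such vectors stays between 0 and 1; the value -1 mentioned above can only occur when vector components may be negative.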

OPTIONS

--help

Show this help message.

--length

Feature vector length (in KB, default: 10).

--seed

Custom seed (integer).

--bits

How many bits represent one character. The default value, 8, sacrifices Unicode handling but is fast and has a small memory footprint. A value of 18 encompasses the Basic Multilingual, Supplementary Multilingual and Supplementary Ideographic planes.
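The arithmetic behind these two values can be checked with a short Python sketch. It only compares Unicode code points against the range a given bit width can address; it makes no claim about how cosine_cmp treats characters that fall outside that range, and `fits_in_bits` is a hypothetical helper named for this example.

```python
def fits_in_bits(char, bits):
    """Return True if the character's code point is below 2**bits."""
    return ord(char) < (1 << bits)


samples = {
    "LATIN CAPITAL LETTER A": "A",            # U+0041
    "CYRILLIC CAPITAL LETTER ZHE": "\u0416",  # U+0416, Basic Multilingual Plane
    "GRINNING FACE": "\U0001F600",            # U+1F600, Supplementary Multilingual Plane
    "CJK IDEOGRAPH U+20000": "\U00020000",    # Supplementary Ideographic Plane
}

for name, char in samples.items():
    code_point = ord(char)
    print(f"U+{code_point:04X} {name}: "
          f"needs {code_point.bit_length()} bits, "
          f"fits in 8: {fits_in_bits(char, 8)}, "
          f"fits in 18: {fits_in_bits(char, 18)}")
```

The Supplementary Ideographic Plane ends at U+2FFFF, which is below 2^18 = 0x40000, so 18 bits are enough to address all three planes named above, while 8 bits only reach code points up to U+00FF.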

SEE ALSO

AUTHOR

Stanislaw Pusep <stas@sysd.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by Stanislaw Pusep.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
