NAME

WWW::Modbot::Test::TextPlausibility - score a post and field based on some text plausibility metrics.

VERSION

Version 0.01

SYNOPSIS

The WWW::Modbot::Test::TextPlausibility module is a WWW::Modbot::Test implementation that looks at a text field and rates how plausible it is as real text. See the WWW::Modbot::Test module for more information about the API.

The plausibility check is meant to detect titles like wXuDFeSzwCF (I'm sure you've seen a few) by looking at the number of spaces and vowels in a word, the number of case shifts (upper to lower or back), and letter frequency. Obviously, it will work best with English words, but text in any alphabetic language should satisfy the same criteria.

You might have to disable it if your forum is in Chinese or Japanese. I'd be interested in any input.

In actuality, it currently looks only at case switches. This might still be a problem with DBCS (double-byte character set) languages, but I don't actually know. It works well in English, though.
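As a rough illustration of the case-switch idea described above, here is a minimal sketch. The helper name count_case_switches is hypothetical and is not part of WWW::Modbot; the module's actual counting logic may differ, for example in how it treats spaces or non-ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the module's own code):
# count transitions between upper- and lower-case ASCII letters.
# Non-letters are skipped, so a switch across a space still counts.
sub count_case_switches {
    my ($text) = @_;
    my $switches = 0;
    my $prev;    # 'U' or 'L' for the previous letter seen
    for my $ch (split //, $text) {
        my $case = $ch =~ /[A-Z]/ ? 'U'
                 : $ch =~ /[a-z]/ ? 'L'
                 :                  undef;
        next unless defined $case;
        $switches++ if defined $prev && $prev ne $case;
        $prev = $case;
    }
    return $switches;
}

print count_case_switches('wXuDFeSzwCF'), "\n";    # prints 7
print count_case_switches('Hello World'), "\n";    # prints 3
```

An ordinary capitalized title produces a switch or so per word, while a random-looking string produces many in a single word, so a ruleset could plausibly compare the count against the length of the text.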

FUNCTIONS

new

The new function doesn't really do anything, but if we don't provide one, Test.pm will try to call itself.

test

Every Test module does its work in a single function, test. It's passed a hashref containing the fields of the post, and a field name which need not be used. The function sets one or more fields in the hashref, which the ruleset then evaluates to arrive at a score. The return value is a list of the fields set by the function (this makes the modules easier to test).

In the case of TextPlausibility, the field named is the one evaluated, and the return value is $field-casesw, that is, the field name with -casesw appended. More could be done (in particular, letter frequency should be a valuable consideration), but even looking only at case switches turned out to be pretty effective against the titles being used the last time I looked at the statistics.
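The contract described above could look roughly like the following sketch. Everything here is an assumption made for illustration: the package name My::Sketch::CaseSwitch is hypothetical, test is assumed to be called as a method, and the counting regex is a stand-in rather than the module's real code. Only the shape (hashref in, $field-casesw set, list of set fields returned) comes from the description.

```perl
use strict;
use warnings;

package My::Sketch::CaseSwitch;    # hypothetical name, not a real Modbot class

# Trivial constructor, in the spirit of the "new" described above.
sub new { my ($class) = @_; return bless {}, $class; }

# Assumed calling convention: $obj->test($post_hashref, $field_name).
sub test {
    my ($self, $post, $field) = @_;
    my $text = defined $post->{$field} ? $post->{$field} : '';

    # Keep ASCII letters only, then count upper/lower transitions.
    (my $letters = $text) =~ s/[^A-Za-z]//g;
    my $switches = 0;
    $switches++ while $letters =~ /(?<=[a-z])[A-Z]|(?<=[A-Z])[a-z]/g;

    # Store the result under "<field>-casesw" and report which key was set.
    my $key = $field . '-casesw';
    $post->{$key} = $switches;
    return ($key);
}

package main;

my $post = { title => 'wXuDFeSzwCF', body => 'An ordinary sentence.' };
my @set  = My::Sketch::CaseSwitch->new->test($post, 'title');
print "$set[0] = $post->{$set[0]}\n";    # prints: title-casesw = 7
```

Returning the list of keys lets a unit test check exactly which fields a module touched without inspecting the whole hashref.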

AUTHOR

Michael Roberts, <michael at despammed.com>

BUGS

Please report any bugs or feature requests to bug-www-modbot at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=WWW-Modbot. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

Copyright 2008 Vivtek, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
