Message::Style - Perl module to perform stylistic analysis of messages
    use Message::Style;

    my $score = Message::Style::score(\@article);
    # or
    my $score = Message::Style::score(@article);
This Perl library analyses an RFC 2822 format message (typically an email message or Usenet post) and produces a score that, in the author's opinion, is a good indication of whether the poster is a fsckwit, and therefore whether their message should be ignored.
The module takes a Usenet article (or other RFC 822 formatted text) and attempts to identify whether the sender is a fsckwit by analysing quoting style, line length, spelling, and various other criteria.
Several things about Usenet posts are annoying, and the scores reflect the "cost" of each. There are Byte Points (bandwidth wasted transmitting pointless material) and Line Points (time wasted scrolling through pointless material). The criteria, and their justifications, are:
Long lines are wrapped by some newsreaders, truncated by others, or shown with a horizontal scrollbar. Whatever the case, they cost the reader extra effort to scroll. A Line Point is given for every block of 80 characters (or part thereof) beyond column 80.
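To make the long-line rule concrete, here is a minimal sketch of how such a count could be computed. The function name `long_line_points` is hypothetical and is not part of Message::Style's documented interface; it also assumes trailing newlines do not count towards line length.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Message::Style's documented interface.
# Scores one Line Point for every 80-character block (or part of one)
# that extends past column 80. Trailing newlines are not counted.
sub long_line_points {
    my (@lines) = @_;
    my $points = 0;
    for my $line (@lines) {
        (my $text = $line) =~ s/\n\z//;
        my $excess = length($text) - 80;
        # Integer ceiling of $excess / 80.
        $points += int(($excess + 79) / 80) if $excess > 0;
    }
    return $points;
}

print long_line_points('x' x 80, 'x' x 81, 'x' x 200), "\n";    # prints 3
```

An 81-character line already earns a full Line Point, because any part of an 80-character block beyond column 80 counts.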
A Content-Type other than plain text (e.g. text/html), or a non-text Content-Encoding, is unreadable to many readers. Byte Points are given for the entire article.
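The sketch below shows one way the Content-Type rule might be checked. It is a hypothetical illustration, not the module's code, and it rests on several assumptions. The name `content_type_byte_points` is invented. Header folding is ignored. Only `7bit` and `8bit` are treated as plain-text encodings, and both the `Content-Encoding` and `Content-Transfer-Encoding` header names are accepted.

```perl
use strict;
use warnings;

# Hypothetical sketch: if the article's Content-Type is not text/plain, or
# its encoding is not (by assumption) 7bit/8bit, score one Byte Point per
# byte of the whole article. Folded header lines are not handled.
sub content_type_byte_points {
    my (@lines) = @_;
    my $bytes = 0;
    $bytes += length for @lines;
    for my $line (@lines) {
        last if $line =~ /^\s*$/;    # a blank line ends the header block
        if ($line =~ /^Content-Type:\s*([^\s;]+)/i) {
            return $bytes if lc($1) ne 'text/plain';
        }
        if ($line =~ /^Content-(?:Transfer-)?Encoding:\s*(\S+)/i) {
            return $bytes if lc($1) !~ /^(?:7bit|8bit)$/;
        }
    }
    return 0;
}
```

An article with no Content-Type header scores nothing here, on the assumption that it is plain text by default.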
Signatures are generally a waste of bandwidth, and long ones need to be paged through. It is considered bad form to have a signature larger than the McQuary limit of 80x4 (four lines of 80 characters). Byte Points and Line Points are therefore scored for every character and line outside the 80x4 box.
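The 80x4 box rule can be sketched as follows. This is a hypothetical illustration, not the module's code. It assumes the signature begins after the last conventional `-- ` delimiter line, and that newlines are not counted as bytes. The name `signature_points` is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical sketch of the McQuary (80x4) signature rule. Assumes the
# signature starts after the last "-- " delimiter line. Characters past
# column 80 in the first four lines, and whole lines beyond the fourth,
# fall outside the box and are scored.
sub signature_points {
    my (@lines) = @_;
    chomp(my @copy = @lines);
    my $start = (grep { $copy[$_] eq '-- ' } 0 .. $#copy)[-1];
    return (0, 0) unless defined $start;
    my @sig = @copy[$start + 1 .. $#copy];
    my ($bytes, $lines) = (0, 0);
    for my $i (0 .. $#sig) {
        if ($i < 4) {
            # Inside the four-line allowance: only the overhang scores.
            my $excess = length($sig[$i]) - 80;
            $bytes += $excess if $excess > 0;
        } else {
            # Beyond the fourth line: the whole line is outside the box.
            $bytes += length $sig[$i];
            $lines++;
        }
    }
    return ($bytes, $lines);    # (Byte Points, Line Points)
}
```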
BUAGs (Big Ugly ASCII Graphics) are those annoying graphics that always seem to come with "cute" extra-long signatures. They are warned about but not scored, since they have already been accounted for by the signature rule (and also because BUAGs in the body of a message are sometimes useful).
A quote is expected to precede the original material, and scoring is based on this. The first four lines of quoted material don't score at all. The original material is then counted in lines and bytes, and quoted material amounting to half of each is also allowed. Beyond that allowance, Byte and Line Points are applied. Top-posted articles are expected to score badly under this heuristic.
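A minimal sketch of the quoting allowance, under stated assumptions. The helper `quote_points` is hypothetical. It expects body lines only, with headers and signature already removed. It treats lines starting with `>` as quoted and ignores blank lines entirely. The module's actual quote detection may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch of the quoting heuristic. Expects body lines only
# (headers and signature already removed). Assumes quoted lines start with
# '>' and that blank lines count as neither quoted nor original material.
sub quote_points {
    my (@body) = @_;
    chomp(my @copy = @body);
    my (@quoted, @original);
    for my $line (@copy) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>/) { push @quoted, $line }
        else               { push @original, $line }
    }
    # The first four quoted lines score nothing at all.
    splice(@quoted, 0, 4);
    my ($q_bytes, $o_bytes) = (0, 0);
    $q_bytes += length for @quoted;
    $o_bytes += length for @original;
    # Quoted material may amount to half the original material's bytes and
    # lines; only the excess is scored.
    my $byte_points = $q_bytes - $o_bytes / 2;
    my $line_points = @quoted - @original / 2;
    return ($byte_points > 0 ? $byte_points : 0,
            $line_points > 0 ? $line_points : 0);
}
```

For example, ten quoted lines of 4 characters followed by two original lines of 6 characters leave six scorable quoted lines (24 bytes). The allowance is 6 bytes and 1 line, giving 18 Byte Points and 5 Line Points. This is the top-posting pattern the heuristic is meant to penalise.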
In addition, Byte and Line scores are multiplied by the number of newsgroups crossposted to.
For final scoring, a Line point equals 40 Byte points.
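One possible way of combining the totals is sketched below. The function `final_score` is hypothetical. It assumes the final score is expressed in Byte Points (each Line Point converted at 40 Byte Points). It also assumes the crosspost count comes from a comma-separated `Newsgroups:` header value, defaulting to one group.

```perl
use strict;
use warnings;

# Hypothetical sketch: combine Byte and Line Points into one score,
# expressed in Byte Points, then multiply by the crosspost count taken
# (by assumption) from a comma-separated Newsgroups: header value.
sub final_score {
    my ($byte_points, $line_points, $newsgroups_header) = @_;
    my $groups = 1;
    if (defined $newsgroups_header) {
        my @names = grep { length } split /\s*,\s*/, $newsgroups_header;
        $groups = @names if @names;
    }
    return ($byte_points + 40 * $line_points) * $groups;
}

print final_score(100, 3, 'uk.local.birmingham,birmingham.misc'), "\n";    # prints 440
```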
Message::Style::score performs a scoring operation on the article, passed either as an array reference or as a list (as in the synopsis), and returns the score.
This module is basically the result of ripping out the core of a really nasty script I wrote early in my Perl career and wrapping the minimum around it to pass CPAN muster. So the code is a bit crufty, although it certainly works and has heard of strict and warnings.
It was, however, reasonably well tested at the time, thanks to plenty of fsckwit source material on birmingham.misc and uk.local.birmingham.
All code and documentation by Peter Corlett <firstname.lastname@example.org>.
Copyright (C) 2000-2004 Peter Corlett <email@example.com>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
This is free software. IT COMES WITHOUT WARRANTY OF ANY KIND.