Mail::SpamAssassin::Plugin::AutoLearnThreshold - threshold-based discriminator for Bayes auto-learning
This plugin implements the threshold-based auto-learning discriminator for SpamAssassin's Bayes subsystem. Auto-learning is a mechanism whereby high-scoring mails (or low-scoring mails, for non-spam) are fed into its learning systems without user intervention, during scanning.
Note that certain tests are ignored when determining whether a message should be trained upon:
Also note that auto-learning occurs using scores from either scoreset 0 or 1, depending on what scoreset is used during message check. It is likely that the message check and auto-learn scores will be different.
The following configuration settings are used to control auto-learning:
The score threshold below which a mail has to score, to be fed into SpamAssassin's learning systems automatically as a non-spam message.
The score threshold above which a mail has to score, to be fed into SpamAssassin's learning systems automatically as a spam message.
Note: SpamAssassin requires at least 3 points from the header, and 3 points from the body to auto-learn as spam. Therefore, the minimum working value for this option is 6.
If the test option autolearn_force is set, the minimum value will remain at 6 points but there is no requirement that the points come from body and header rules. This option is useful for autolearning with rules that are considered to be extremely safe indicators of the spaminess of a message.
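The threshold rules described above can be sketched as a small decision function. This is an illustrative model only, not the plugin's actual Perl code: the function name, parameter names and return values are hypothetical, the strict comparisons follow the wording "below" and "above" used here, and reading the 6-point minimum under autolearn_force as a floor on the total score is an interpretation of the text.

```python
from typing import Optional

# Minimums quoted in the text above.
MIN_HEAD_POINTS = 3.0
MIN_BODY_POINTS = 3.0
MIN_SPAM_POINTS = 6.0


def autolearn_decision(
    score: float,
    head_points: float,
    body_points: float,
    nonspam_threshold: float,
    spam_threshold: float,
    force: bool = False,
) -> Optional[str]:
    """Return "ham", "spam", or None (do not auto-learn).

    Hypothetical helper: models the documented thresholds only.
    """
    # Below the non-spam threshold: learn as ham.
    if score < nonspam_threshold:
        return "ham"

    # Above the spam threshold: learn as spam if the point minimums hold.
    if score > spam_threshold:
        if force:
            # Assumption: with autolearn_force only the 6-point total applies.
            return "spam" if score >= MIN_SPAM_POINTS else None
        if head_points >= MIN_HEAD_POINTS and body_points >= MIN_BODY_POINTS:
            return "spam"

    # Scores between the thresholds, or failing the minimums, are not learned.
    return None
```

With example thresholds of 0.1 and 12.0 (chosen for illustration), a message scoring 15.0 with 14.0 header points and 1.0 body point is not learned, because the body minimum is not met. The same message is learned as spam when force is true.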
With bayes_auto_learn_on_error off, autolearning will be performed even if the Bayes classifier already agrees with the new classification (i.e. it yielded BAYES_00 for what we are now trying to teach it as ham, or BAYES_99 for spam). This is the traditional setting; the default was chosen to retain backward compatibility.
With bayes_auto_learn_on_error turned on, autolearning will be performed only when the Bayes classifier had a different opinion from what the autolearner is now trying to teach it (i.e. it made an error in judgement). This strategy may or may not produce better future classifications, but usually works very well, while also preventing unnecessary overlearning and slowing down database growth.
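The effect of bayes_auto_learn_on_error can be sketched as a gate applied after the autolearner has picked a target class. This is a simplified model with hypothetical names, not the plugin's code. It assumes that "agreement" means exactly the two cases named above (BAYES_00 for ham, BAYES_99 for spam) and that the rule hits are available as a collection of rule names.

```python
from typing import Iterable


def should_train(target: str, rule_hits: Iterable[str], learn_on_error: bool) -> bool:
    """Decide whether to train, given the target class ("ham" or "spam").

    Hypothetical helper: rule_hits holds the names of the rules that fired.
    """
    # Option off: always train, even if Bayes already agrees.
    if not learn_on_error:
        return True

    hits = set(rule_hits)
    # Assumption: agreement is modelled only via BAYES_00 and BAYES_99.
    bayes_agrees = (
        (target == "ham" and "BAYES_00" in hits)
        or (target == "spam" and "BAYES_99" in hits)
    )
    # Option on: train only when Bayes held a different opinion.
    return not bayes_agrees
```

Under this model, a message about to be learned as spam that already hit BAYES_99 is skipped when the option is on, which is what limits database growth.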