Re: Bayes Strategies - SpamAssassin
On 7 Nov 2008, at 23:40, Matt Kettler wrote:
> Neil wrote:
>> I'm wondering about the best way to train my Bayes filter (per-user
>> filtering).
>> I have a Junk folder, and it contains roughly three categories of
>> mail (to my mind, at least):
>> A. Mail SpamAssassin marked spam and auto-learned as spam.
>> B. Mail SpamAssassin marked spam, but did not autolearn.
>> C. Mail SpamAssassin did not mark spam, which I moved in there.
>> So my questions:
>> 1. Would it be bad for me to just run sa-learn on the entire Junk
>> folder; or should I just let auto-learn do its thing and sa-learn
>> false negatives?
> No. It's not bad.
> If SA has already correctly learned the message, it will be skipped. Of
> course, this means it's a waste of time to feed SA messages it's
> learned correctly, but it's not going to hurt anything.
>> 2. Likewise, my Inbox contains just ham; could I run sa-learn on that
>> entire mailbox periodically?
>> 3. Lastly, will it be detrimental (in terms of future accuracy) to
>> sa-learn the same mail more than once, or will SpamAssassin remember
>> it? (I seem to remember reading the latter, but I wasn't sure).
> It will remember.
>> If it does, how long/many previous mails does it remember?
> Currently the bayes_seen mechanism has no expiration, so it will
> remember forever, or until you manually delete bayes_seen.
So then I think my strategy is going to be: sort the mail as usual,
and then every once in a while log into my server and run a script
which will call sa-learn on both mailboxes.
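
For what it's worth, here is a rough sketch of what that script might look like in Python. The mailbox paths are made up for illustration, and the function names are my own; the only sa-learn options it relies on are --spam, --ham and --mbox. It defaults to a dry run that just prints the commands, so nothing gets learned until dry_run is turned off.

```python
import subprocess


def build_sa_learn_commands(junk_path, inbox_path, mbox=False):
    """Return the two sa-learn command lines: Junk as spam, Inbox as ham.

    Pass mbox=True when the paths are mbox files rather than directories
    of individual messages (for example a Maildir "cur" directory).
    """
    extra = ["--mbox"] if mbox else []
    return [
        ["sa-learn", "--spam", *extra, junk_path],
        ["sa-learn", "--ham", *extra, inbox_path],
    ]


def train(junk_path, inbox_path, mbox=False, dry_run=True):
    """Print the sa-learn commands, or run them when dry_run is False."""
    commands = build_sa_learn_commands(junk_path, inbox_path, mbox)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the script if sa-learn exits with an error.
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical paths; substitute the real mailbox locations.
train("/home/neil/Maildir/.Junk/cur", "/home/neil/Maildir/cur")
```

Since already-learned messages are skipped, running this over the whole of both mailboxes each time should be harmless, just slower than it needs to be.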