On 7 Nov 2008, at 23:43, Neil wrote:

> On 7 Nov 2008, at 23:40, Matt Kettler wrote:
>> Neil wrote:
>>> I'm wondering about the best way to train my Bayes filter (per-user
>>> filtering).
>>> I have a Junk folder, and it contains roughly three categories of
>>> mail (to my mind, at least):
>>> A. Mail SpamAssassin marked spam and auto-learned as spam.
>>> B. Mail SpamAssassin marked spam, but did not autolearn.
>>> C. Mail SpamAssassin did not mark spam, which I moved in there.
>>> So my questions:
>>> 1. Would it be bad for me to just run sa-learn on the entire Junk
>>> folder, or should I just let auto-learn do its thing and sa-learn
>>> only the false negatives?

>> No. It's not bad.
>> If SA has already correctly learned the message, it will be skipped.
>> Of course, this means it's a waste of time to feed SA messages it's
>> already learned correctly, but it's not going to hurt anything.
>>> 2. Likewise, my Inbox contains just ham; could I run sa-learn on
>>> that entire mailbox periodically?

>> Ditto.
>>> 3. Lastly, will it be detrimental (in terms of future accuracy) to
>>> sa-learn the same mail more than once, or will SpamAssassin remember
>>> it? (I seem to remember reading the latter, but I wasn't sure).

>> It will remember.
>>> If it does, how long/many previous mails does it remember?

>> Currently the bayes_seen mechanism has no expiration, so it will
>> remember forever, or until you manually delete bayes_seen.

> Thanks.
> So then I think my strategy is going to be: sort the mail as usual,
> and then every once in a while log into my server and run a script
> which will call sa-learn on both mailboxes.
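
For reference, a minimal sketch of the kind of script I have in mind.
The Maildir paths and the learn_all function name are my own
assumptions, not anything SpamAssassin prescribes; for mbox files the
calls would need sa-learn's --mbox option instead of a directory.

```shell
#!/bin/sh
# Sketch only: the paths below are assumed Maildir locations; adjust them.
SA_LEARN="${SA_LEARN:-sa-learn}"                 # command used for training
JUNK_DIR="${JUNK_DIR:-$HOME/Maildir/.Junk/cur}"  # assumed Junk folder
INBOX_DIR="${INBOX_DIR:-$HOME/Maildir/cur}"      # assumed Inbox folder

# Train Bayes on both folders. Messages sa-learn has already learned
# correctly are skipped, so re-running over the same mail is harmless.
learn_all() {
    "$SA_LEARN" --spam "$JUNK_DIR" &&
    "$SA_LEARN" --ham "$INBOX_DIR"
}

# Only train when explicitly asked, e.g.: sh learn.sh --run
if [ "${1:-}" = "--run" ]; then
    learn_all
fi
```

Setting SA_LEARN=echo gives a dry run that prints the arguments
instead of training.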

So maybe this is moving slightly off on a tangent, but:
Why does auto-learn sometimes learn spam with a rating of X, but not
spam with a rating of X+Y? What's its methodology?