On 5 Jul 2007, tom@tacocat.net stated:

> On 7/2/2007, "Nix" wrote:
>
>
>>If you wanted to replace all other scoring mechanisms with the Bayes DB,
>>you'd need a second Bayes DB for this, anyway, or you'd need the tokens
>>corresponding to typically negative-scoring rules to have values which
>>cannot appear in the body of an email. Anything else would enable spammers
>>to force both FPs and FNs by customizing spam appropriately to include
>>suitable NO_FOO/YES_FOO values.
>
> That's why the data is being passed in as a second reference, nothing to
> do with the message. Seems to be working well, but there's some
> optimization to include.


Passing it in as a second reference isn't enough on its own. The
rule-hit tokens also need to be independent of the message-derived
tokens inside the Bayes database itself: i.e., it must be impossible for
a spammer to put anything in the message body that tokenizes to the same
DB entry as a rule-hit token and so influences the score of that token.
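
Here is a minimal sketch in Python of one way to get that independence.
It is not SpamAssassin's actual Bayes code, and every name in it
(body_tokens, rule_tokens, message_tokens, the "body"/"rule" tags) is
made up for illustration. The idea is to key each token by a
(namespace, value) pair, so nothing the body tokenizer emits can ever
equal a rule-hit key, whatever the spammer writes.

```python
# Hypothetical sketch: NOT SpamAssassin's real tokenizer or DB layout.
BODY = "body"   # namespace for attacker-controlled, message-derived tokens
RULE = "rule"   # namespace for tokens derived from rule hits only


def body_tokens(text):
    """Tokens taken from the message text; always tagged BODY."""
    return {(BODY, word) for word in text.split()}


def rule_tokens(rule_hits):
    """Tokens taken from the list of rules that fired; always tagged RULE."""
    return {(RULE, name) for name in rule_hits}


def message_tokens(text, rule_hits):
    """Everything that would be learned or scored for one message."""
    return body_tokens(text) | rule_tokens(rule_hits)


# A spammer writing a rule name into the body only creates a BODY token,
# which is a different key from the RULE token of the same name.
forged = body_tokens("YES_FOO cheap pills")
genuine = rule_tokens(["YES_FOO"])
assert forged.isdisjoint(genuine)
```

A reserved prefix that the body tokenizer is guaranteed never to emit
would do the same job; the pair is just the easiest form to reason
about.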


(btw, Tom, what's wrong with your mailer? ^M characters --- CRCRLF line
terminators on the wire, perhaps? --- a doubled-up Subject line, and two
To: lines, one with fullnames, one without... I cleaned up the ^Ms in
this response.)

--
`... in the sense that dragons logically follow evolution so they would
be able to wield metal.' --- Kenneth Eng's colourless green ideas sleep
furiously