Re: A different approach to scoring spamassassin hits
Tom Allison wrote:
> On Jun 30, 2007, at 1:20 AM, Marc Perkel wrote:
>> Tom Allison wrote:
>>> For some years now there has been a lot of effective spam filtering
>>> using statistical approaches with variations on Bayesian theory.
>>> Some of these are inverse chi-square modifications to Naive Bayes, and
>>> CRM114 and other "languages" have been developed to improve the
>>> statistical scoring of spam. For all of these statistical
>>> processes the spamicity is always between 0 and 1.
>>> Many thanks to those of you who have read this far for your
>>> patience and consideration.
>> Tom, I suggested something similar to that years ago and I'd still
>> like to see it tried out. I wonder what would happen if you stripped
>> out the body and ran Bayes just on the headers and the rules and let
>> Bayes figure it out. You do have to have some points to start with to
>> get Bayes pointed in the right direction. But you could use
>> blacklists and whitelists to do the Bayes training. It also needs more
>> rules to identify ham, not just rules to identify spam.
> I was under the impression that there were ham-centric tests that
> would result in negative scores.
> Ham doesn't try to be evasive. It's pretty easy to identify. Without
> SA tagging much of it falls to <<0.5 and whitelisting would capture
> many of the exceptions.
> As for headers-only testing -- The first five lines of stock spam are
> very telling...
> My question about SA concerns PerMsgStatus (I think): is this the
> place to retrieve all the rules information? I know today you can get
> a list of all the rules that hit, but is that where you would look to
> find all the rules that were attempted? Or is there a better place
> for it?
There are some ham tests in SA but not nearly enough.
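
To make the "Bayes on headers and rules" idea concrete, here is a rough sketch in Python. It is not SpamAssassin's own Bayes code, just a plain naive Bayes classifier that treats each rule hit (or header token) as a feature and returns a spamicity between 0 and 1. The class name, the "RULE:" token prefix, and the rule names are all made up for illustration, and it assumes equal spam/ham priors with add-one smoothing.

```python
import math
from collections import Counter


class RuleHitBayes:
    """Naive Bayes over rule names / header tokens (illustrative only)."""

    def __init__(self):
        # Number of training messages per class that contained each token.
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        """Record one message; label is 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.token_counts[label].update(set(tokens))

    def spamicity(self, tokens):
        """Return a value in (0, 1); 0.5 means no evidence either way."""
        log_odds = 0.0  # assumption: equal prior for spam and ham
        for token in set(tokens):
            # Add-one smoothing so unseen tokens never give a zero probability.
            p_spam = (self.token_counts["spam"][token] + 1) / (self.message_counts["spam"] + 2)
            p_ham = (self.token_counts["ham"][token] + 1) / (self.message_counts["ham"] + 2)
            log_odds += math.log(p_spam) - math.log(p_ham)
        # Clamp to keep math.exp from overflowing on extreme evidence.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical rule names; labels would come from blacklist/whitelist hits.
clf = RuleHitBayes()
clf.train(["RULE:ON_BLACKLIST", "RULE:HTML_ONLY"], "spam")
clf.train(["RULE:ON_BLACKLIST", "RULE:FORGED_SENDER"], "spam")
clf.train(["RULE:ON_WHITELIST", "RULE:HAS_LIST_ID"], "ham")
clf.train(["RULE:HAS_LIST_ID"], "ham")

spam_score = clf.spamicity(["RULE:ON_BLACKLIST"])
ham_score = clf.spamicity(["RULE:HAS_LIST_ID"])
```

Note that a ham-only rule pulls the result below 0.5 just as a spam-only rule pushes it above, which is why more ham rules would help an approach like this.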