double rules hits? - SpamAssassin

  1. double rules hits?

    I have been seeing several occasions where two rules hit for the same
    underlying issue, and it seems that this isn't really desired.

    Example 1: I got ham that had a line with

    dig [some.isp.name.].isphosts.junkemailfilter.com

    in it. It seems giving it 2.3 points for SPOOF_COM2COM is fair, but
    that turns out to be 4.3 because SPOOF_COM2OTH gets 2.0. This ended up
    as a FP because I filter to spam folder at 1, preferring to misclassify
    some list mail to keep my inbox as clean as I can.


    X-Spam-Status: Yes, score=1.7 required=1.0 tests=AWL,BAYES_00,HTML_MESSAGE,
    SPOOF_COM2COM,SPOOF_COM2OTH autolearn=no version=3.2.4
    X-Spam-Report:
    * 2.0 SPOOF_COM2OTH URI: URI contains ".com" in middle
    * 2.3 SPOOF_COM2COM URI: URI contains ".com" in middle and end
    * -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
    * [score: 0.0000]
    * 0.0 HTML_MESSAGE BODY: HTML included in message
    * 0.0 AWL AWL: From: address is in the auto white-list

    Example 2: blacklists

    Here, the mail is spam from a bad source, but since both lists are
    making more or less the same claim, it doesn't seem quite right to add
    the scores. In this case SpamCop says the machine has sent spam, and
    Spamhaus says it is in the XBL for being a compromised box.

    X-Spam-Status: Yes, score=3.6 required=1.0 tests=AWL,BAYES_50,HTML_MESSAGE,
    RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_XBL,RDNS_NONE autolearn=spam version=3.2.4
    X-Spam-Report:
    * 0.0 HTML_MESSAGE BODY: HTML included in message
    * 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
    * [score: 0.5676]
    * 4.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
    * [Blocked - see ]
    * 3.0 RCVD_IN_XBL RBL: Received via a relay in Spamhaus XBL
    * [123.142.103.19 listed in zen.spamhaus.org]
    * 0.1 RDNS_NONE Delivered to trusted network by a host with no rDNS
    * -3.6 AWL AWL: From: address is in the auto white-list



    So, I realize this would be complicated, but I wonder about having a
    score combining function for tests that are making essentially the same
    claim. Perhaps the 4 and 3 above should combine to 5, and the
    SPOOF_COM2* should just be 2.3.


  2. Re: double rules hits?

    On Fri, 2008-05-30 at 16:18 -0400, Greg Troxel wrote:
    > I have been seeing several occasions where two rules hit for the same
    > underlying issue, and it seems that this isn't really desired.
    >
    > Example 1: I got ham that had a line with
    >
    > dig [some.isp.name.].isphosts.junkemailfilter.com
    >
    > in it. It seems giving it 2.3 points for SPOOF_COM2COM is fair, but
    > that turns out to be 4.3 because SPOOF_COM2OTH gets 2.0. This ended up
    > as a FP because I filter to spam folder at 1, [...]


    That is *really* drastic. Much too low, IMHO.

    > preferring to misclassify some list mail [...]


    Do not filter the SA list. We are talking about spam. You will get FPs.

    > to keep my inbox as clean as I can.


    Hmm, why do mailing lists end up in your Inbox anyway, rather than
    being filtered / moved into dedicated mail folders? In most cases,
    doing so without passing these messages through SA is a sensible
    decision...

    > X-Spam-Status: Yes, score=1.7 required=1.0 tests=AWL,BAYES_00,HTML_MESSAGE,
    > SPOOF_COM2COM,SPOOF_COM2OTH autolearn=no version=3.2.4


    This is only an FP because *you* deliberately chose it to be. A score
    of 1.7 can hardly be grounds for complaining about FPs.


    > Example 2: blacklists

    [...]
    > So, I realize this would be complicated, but I wonder about having a
    > score combining function for tests that are making essentially the same
    > claim. Perhaps the 4 and 3 above should combine to 5, and the
    > SPOOF_COM2* should just be 2.3.


    It isn't complicated. You can easily set up meta rules that
    "correct" (reduce, in your case) the score.

    However, please note that, in general, this *is* a strong sign of
    spamminess -- stronger than the sum of both individually. Hence there
    are stock rules like DIGEST_MULTIPLE...
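
    For illustration only, a minimal sketch of such meta rules, e.g. in
    local.cf. The LOCAL_* rule names and the -2.0 offsets are assumptions
    made up for this example, not stock rules; the offsets are picked so
    that the totals match the ones suggested in the original post (2.3 for
    the SPOOF_COM2* pair, 5 for the two blacklists), given the scores shown
    in the reports quoted above.

    ```
    # Hypothetical local rules; names and scores are illustrative assumptions.

    # Example 1: both SPOOF_COM2* rules hit: 2.3 + 2.0 - 2.0 = 2.3
    meta      LOCAL_SPOOF_COM2_BOTH   (SPOOF_COM2COM && SPOOF_COM2OTH)
    describe  LOCAL_SPOOF_COM2_BOTH   Both SPOOF_COM2* rules hit
    score     LOCAL_SPOOF_COM2_BOTH   -2.0

    # Example 2: both blacklists hit: 4.0 + 3.0 - 2.0 = 5.0
    meta      LOCAL_SPAMCOP_AND_XBL   (RCVD_IN_BL_SPAMCOP_NET && RCVD_IN_XBL)
    describe  LOCAL_SPAMCOP_AND_XBL   Listed in both SpamCop and XBL
    score     LOCAL_SPAMCOP_AND_XBL   -2.0
    ```

    Running "spamassassin --lint" after editing should catch syntax
    errors. If the underlying rule scores differ on your installation, the
    offsets would need adjusting accordingly.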


    Also, regarding both examples: The scores have been set through a
    really long and thorough process of evaluating large ham and spam
    corpora. In particular, if two rules are similar in nature and likely
    to trigger together, the *sum* of them is what has proven to be most
    effective in identifying spam while still maintaining a seriously low
    FP rate. In a nutshell: The sum is on purpose.

    Granted, this is with the default threshold of 5, not with a custom
    required_score of 1... Seriously.

    guenther


    --
    char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a \x10\xf4\xf4\xc4";
    main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

