Japanese characters/language blocked by SpamAssassin + amavisd-new


  1. Japanese characters/language blocked by SpamAssassin + amavisd-new

    Hi all,

    I have postfix (ver 2.3.3) with mysql, virtual users, amavisd-new, clamav
    and spamassassin (ver 3.2.5), dcc, pyzor and razor running on centos
    5.1.
    Everything works fine, but spamassassin + amavisd-new frequently give a
    high score to emails coming from Japan (using Japanese
    character/language).
    I tried increasing the score at which emails are killed or blocked as
    spam, but I don't think that is safe for me.

    Here are my configurations.
    local.cf (spamassassin):
    -----------------------
    ok_locales all
    ok_locales en ja ko th zh
    score CHARSET_FARAWAY 3.5
    score CHARSET_FARAWAY_HEADER 2.8
    score HTML_CHARSET_FARAWAY 1.0
    score MIME_CHARSET_FARAWAY 3.5
    use_dcc 1
    dcc_path /usr/local/bin/dccproc
    use_pyzor 1
    pyzor_path /usr/bin/pyzor
    use_razor2 1
    razor_config /var/spool/amavisd/razor-agent.conf
    use_bayes 1
    use_bayes_rules 1
    bayes_auto_learn 1


    Spam tag on /etc/amavisd.conf:
    ------------------------------
    $sa_tag_level_deflt = undef;
    $sa_tag2_level_deflt = 5.0;
    $sa_kill_level_deflt = 8.0;
    $sa_dsn_cutoff_level = 10;


    Sample X-Spam-Status:
    --------------------
    X-Spam-Status: Yes, score=11.732 tag=x tag2=5 kill=8 tests=[AWL=0.404,
    BAYES_99=3.5, DBL_12_LETTER_FLDR=0.2, DBL_12_LETTER_PGIMG=0.2,
    FM_FRM_RN_L_BRACK=2.674, FM_MULTI_ODD2=1.1, FM_WHITEONWHITE=0.45,
    HELO_EQ_JP=1.244, HOST_EQ_JP=1.265, HS_INDEX_PARAM=0.001,
    HTML_IMAGE_RATIO_06=0.001, HTML_MESSAGE=0.001,
    HTML_NONELEMENT_40_50=0.944, MIME_HTML_ONLY=1.457, SARE_RAND_2=2.5,
    SARE_URI_BARGAIN=0.634, SARE_URI_LET_DIG_PIC=1.157,
    USER_IN_WHITELIST_TO=-6]


    Any advice will be appreciated.

    Thanks and regards,

    Fuad NAHDI
    Jakarta, INDONESIA


  2. Re: Japanese characters/language blocked by SpamAssassin + amavisd-new

    Dear Pak Fuad,

    [Translated from Indonesian] It looks like 99_FVGT_meta.cf is what makes
    the score so high, Pak. The config file to use now is
    00_FVGT_File001.cf. As it happens, I no longer use these rules on my
    server. Try running without those rules and test again.




  3. Re: Japanese characters/language blocked by SpamAssassin + amavisd-new

    On Sat, 2008-08-02 at 15:58 +0700, Fuad NAHDI wrote:
    > Hi all,
    >
    > I have postfix (ver 2.3.3) with mysql, virtual users, amavisd-new, clamav
    > and spamassassin (ver 3.2.5), dcc, pyzor and razor running on centos
    > 5.1.
    > Everything works fine but spamassassin + amavisd-new frequently give a
    > high score for emails coming from Japan (using Japanese
    > character/language).


    Sneak preview of the comments below: part of the reason Japanese mail
    scores high on your system is that you trained your Bayes to believe it
    is spam, and that you are seriously punishing senders from .jp domains.
    But read on.


    > Sample X-Spam-Status:
    > --------------------
    > X-Spam-Status: Yes, score=11.732 tag=x tag2=5 kill=8 tests=[AWL=0.404,
    > BAYES_99=3.5, DBL_12_LETTER_FLDR=0.2, DBL_12_LETTER_PGIMG=0.2,


    Your Bayes is trained badly. Use sa-learn to correct it, and learn
    Japanese ham as ham.

    > FM_FRM_RN_L_BRACK=2.674, FM_MULTI_ODD2=1.1, FM_WHITEONWHITE=0.45,


    Neither the DBL_* nor the FM_* rules are part of stock SA, with the
    notable exception of FM_FRM_RN_L_BRACK.

    > HELO_EQ_JP=1.244, HOST_EQ_JP=1.265, HS_INDEX_PARAM=0.001,


    The *_EQ_JP rules are not part of stock SA. Given your complaint, you
    seriously should not use these.

    > HTML_IMAGE_RATIO_06=0.001, HTML_MESSAGE=0.001,
    > HTML_NONELEMENT_40_50=0.944, MIME_HTML_ONLY=1.457, SARE_RAND_2=2.5,


    Bad sending MUA, composing HTML mail with no text/plain part.

    > SARE_URI_BARGAIN=0.634, SARE_URI_LET_DIG_PIC=1.157,
    > USER_IN_WHITELIST_TO=-6]


    SARE_* rules are not part of stock SA.


    > Any advice will be appreciated.


    Train your Bayes, learn ham mail. Also, drop your AWL database and start
    fresh, since it currently maintains an average score of about 12 for
    that particular sender.

    Get rid of third party rules, if they don't apply to your particular
    mail stream. Seriously, reconsider *all* third party rules and review
    their performance on *your* mail. This is a problem you created
    yourself, not an issue with SA.


    Granted, assuming a neutral Bayes score, no AWL and no whitelisting,
    stock SA still scores that example at 5.078, slightly above the tag2
    threshold of 5.
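
    The 5.078 figure can be checked directly against the sample header. What
    follows is a minimal Python sketch, not part of SA or amavisd-new: it
    parses the tests=[...] list and sums the scores per group. The grouping
    is an assumption that simply follows the prefixes named in this reply
    (DBL_*, FM_* except FM_FRM_RN_L_BRACK, *_EQ_JP and SARE_* counted as
    third-party), and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# X-Spam-Status value copied from the sample posted in this thread.
HEADER = (
    "Yes, score=11.732 tag=x tag2=5 kill=8 tests=[AWL=0.404, "
    "BAYES_99=3.5, DBL_12_LETTER_FLDR=0.2, DBL_12_LETTER_PGIMG=0.2, "
    "FM_FRM_RN_L_BRACK=2.674, FM_MULTI_ODD2=1.1, FM_WHITEONWHITE=0.45, "
    "HELO_EQ_JP=1.244, HOST_EQ_JP=1.265, HS_INDEX_PARAM=0.001, "
    "HTML_IMAGE_RATIO_06=0.001, HTML_MESSAGE=0.001, "
    "HTML_NONELEMENT_40_50=0.944, MIME_HTML_ONLY=1.457, SARE_RAND_2=2.5, "
    "SARE_URI_BARGAIN=0.634, SARE_URI_LET_DIG_PIC=1.157, "
    "USER_IN_WHITELIST_TO=-6]"
)


def parse_tests(header):
    """Return {rule_name: score} from the tests=[...] part of the header."""
    inner = re.search(r"tests=\[(.*?)\]", header, re.S).group(1)
    pairs = re.findall(r"([A-Z0-9_]+)=(-?[\d.]+)", inner)
    return {name: float(score) for name, score in pairs}


def origin(rule):
    """Classify a rule. Assumption: grouping follows the reply above,
    it is not an authoritative catalogue of SA rules."""
    if rule == "AWL":
        return "awl"
    if rule.startswith("BAYES_"):
        return "bayes"
    if rule.startswith("USER_IN_WHITELIST"):
        return "whitelist"
    if rule == "FM_FRM_RN_L_BRACK":  # the one FM_* rule named as stock
        return "stock"
    if rule.startswith(("DBL_", "FM_", "SARE_")) or rule.endswith("_EQ_JP"):
        return "third-party"
    return "stock"


def breakdown(header):
    """Sum the scores per origin group, rounded to three decimals."""
    totals = defaultdict(float)
    for rule, score in parse_tests(header).items():
        totals[origin(rule)] += score
    return {group: round(total, 3) for group, total in totals.items()}


if __name__ == "__main__":
    for group, total in sorted(breakdown(HEADER).items()):
        print(f"{group:12s} {total:7.3f}")
```

    For the sample header the stock group sums to 5.078 and the third-party
    group to 8.75. Together with BAYES_99 (3.5), AWL (0.404) and the
    whitelist entry (-6) that gives the reported 11.732.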

    However, properly learning ham will correct this, and a sane AWL
    database will help with future mail, too. If you keep your whitelist,
    you'll easily get the score down below 0.

    Also, you should consider LARTing the sender to use a proper MUA. Or, if
    you run into rules like MIME_HTML_ONLY frequently, adjust the score
    locally to better cope with your particular mail.


    Now, if someone could please translate Doni's reply...

    guenther


    --
    char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a \x10\xf4\xf4\xc4";
    main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
    (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


  4. Re: Japanese characters/language blocked by SpamAssassin + amavisd-new



    Hi guenther,

    Yes, I know it is a problem with my configuration, and I suspected
    neither SA nor amavisd. The thing is, I don't know how to figure it out.
    Your answer is a seriously detailed explanation. Now I understand the
    problem.

    Many thanks for your reply.


    >
    > Now, if someone could please translate Doni's reply...


    He suspects that the 99_FVGT_meta.cf rule file is causing the high
    score, so he suggested replacing it with 00_FVGT_File001.cf. It is a good
    idea, so I will follow his recommendation as well. Thanks, Pak Doni.


    Fuad NAHDI,
    Jakarta, INDONESIA





  5. Re: Japanese characters/language blocked by SpamAssassin + amavisd-new

    On Sun, 2008-08-03 at 01:59 +0700, Fuad NAHDI wrote:
    > Hi guenther,
    >
    > Yes, I know it is a problem with my configuration, and I suspected
    > neither SA nor amavisd.


    Sorry, I did not mean to imply that it is your fault alone, or to take
    stock SA out of the picture. My main point is, though, that the lack of
    Bayes training and the added third-party rules alone accounted for a
    score of 12, while stock SA accounts for 5. I meant to point out that
    the lion's share comes from non-SA rules. And I did admit that stock SA
    with no Bayes training would have resulted in an FP.

    The bottom line is that (a) proper Bayes training is crucial, and
    (b) third-party rules must not be used without a close look at their
    results with respect to your particular mail stream.


    > The thing is I don't know how to figure it out.


    You have the rules that triggered and their scores. Check where these
    rules come from, and whether they perform according to their scores.
    Start with the rules that account for large-ish scores. Remove
    third-party rules that turn out to have a negative impact. Tune
    individual rules' scores if need be.
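
    One way to check where the triggered rules come from is to search the
    rule files for their definitions, biggest scores first. The following is
    a minimal Python sketch under stated assumptions: the directories in
    RULE_DIRS are guesses at common locations and need adjusting to your
    installation, and find_rule_files and rank_by_score are hypothetical
    helper names, not part of SA.

```python
import re
from pathlib import Path

# Assumed locations of *.cf rule files; adjust to your own installation.
RULE_DIRS = [
    "/etc/mail/spamassassin",
    "/usr/share/spamassassin",
    "/var/lib/spamassassin",
]

# Lines that define a rule start with one of these keywords, then the name.
DEFINITION = re.compile(
    r"^\s*(?:header|body|rawbody|full|uri|meta|mimeheader)\s+(\w+)\s"
)


def rank_by_score(scores):
    """Sort (rule, score) pairs so the biggest contributors come first."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


def find_rule_files(rule_names, rule_dirs=RULE_DIRS):
    """Map each rule name to the list of .cf files that define it."""
    wanted = set(rule_names)
    found = {name: [] for name in wanted}
    for directory in rule_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue  # skip locations that do not exist on this host
        for cf in sorted(root.rglob("*.cf")):
            for line in cf.read_text(errors="replace").splitlines():
                match = DEFINITION.match(line)
                if not match:
                    continue
                name = match.group(1)
                if name in wanted and str(cf) not in found[name]:
                    found[name].append(str(cf))
    return found


if __name__ == "__main__":
    # A few of the scores from the sample header in this thread.
    scores = {"SARE_RAND_2": 2.5, "HOST_EQ_JP": 1.265, "HELO_EQ_JP": 1.244}
    locations = find_rule_files(scores)
    for rule, score in rank_by_score(scores):
        print(f"{score:6.3f}  {rule}  {locations[rule] or 'not found'}")
```

    A rule that shows up only in a file you added yourself is a third-party
    rule, and a candidate for review or removal.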


    > Your answer is a seriously detailed explanation. Now I understand the
    > problem.
    >
    > Many thanks for your reply.


    No problem, glad it did help.


    > Jakarta, INDONESIA


    Given your country and the fact that you have a problem with
    Japanese-language mail, you might find it particularly important to
    train on Japanese mail. If you get a lot of JP spam, but only a few
    important hams, this may even include 'sa-learn --forget' on some JP
    spam, while still training the ham.
    Just a guess, though.
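
    A minimal Python sketch of that training step follows. It only wraps
    the sa-learn options --ham, --spam, --forget and --mbox. The paths are
    hypothetical placeholders and sa_learn_command is a made-up helper name.
    It also assumes the commands are run as the user whose Bayes database
    amavisd-new actually uses, which depends on your setup.

```python
import shutil
import subprocess


def sa_learn_command(action, path, mbox=False):
    """Build an sa-learn argv; action is 'ham', 'spam' or 'forget'."""
    if action not in ("ham", "spam", "forget"):
        raise ValueError(f"unknown action: {action}")
    argv = ["sa-learn", f"--{action}"]
    if mbox:
        argv.append("--mbox")  # the path is an mbox file, not a directory
    argv.append(path)
    return argv


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False
    and the sa-learn binary is actually installed."""
    print(" ".join(argv))
    if dry_run or shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical paths: wanted Japanese mail, and JP spam to un-learn.
    run(sa_learn_command("ham", "/path/to/japanese-ham"))
    run(sa_learn_command("forget", "/path/to/jp-spam.mbox", mbox=True))
```

    The dry-run default only prints the commands, so they can be reviewed
    before anything touches the Bayes database.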

    guenther -- who should be sleeping by now

