On Thu, 2008-02-14 at 10:17 -0500, up@3.am wrote:
> We're suddenly getting a ton of spam with koi8-r encoding...I tried to do
> a custom rule for it like this:
> header SUBJ_RUSS_CHAR Subject =~/koi8-r/i
> describe SUBJ_RUSS_CHAR has Russian char encoding
> score SUBJ_RUSS_CHAR 3.5

> I would think the rule would catch it either way...what am I missing?

I guess it's being decoded before matching. The "koi8-r" string isn't
part of the actual subject anyway, just the charset declaration of its
encoding.
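
A minimal sketch of a rule that would see the charset label, assuming
the guess above is right and the header is decoded before normal header
rules run. It matches the raw, undecoded header via the :raw modifier.
The regex is only an illustration and has not been tested against the
mail in question:

    header   SUBJ_RUSS_CHAR  Subject:raw =~ /=\?koi8-r\?/i
    describe SUBJ_RUSS_CHAR  Subject is a koi8-r MIME encoded-word
    score    SUBJ_RUSS_CHAR  3.5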

Instead of writing your own rules to catch these, I suggest using
ok_locales. See the Language Options section of the
Mail::SpamAssassin::Conf documentation.

ok_locales lists the locales you consider acceptable, so to trigger on
Russian only, list all but ru. However, you probably want something
more like en only (all western charsets). Also note that this will
trigger on the headers as well as on the body. grep for CHARSET_FARAWAY
in the rules if you want to adjust the scores.
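
As a rough sketch of the above, assuming a site that only expects mail
in western charsets, local.cf could contain something like the
following. The score values are placeholders for illustration, not
recommendations:

    ok_locales en

    # optional: override the default scores of the charset rules
    score CHARSET_FARAWAY         3.2
    score CHARSET_FARAWAY_HEADER  3.2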


char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a \x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}