On Fri, 2008-02-15 at 11:49 -0500, Rosenbaum, Larry M. wrote:
> > From: Karsten Bräckelmann [mailto:guenther@rudersport.de]
> >
> > I've pointed it out before. Just use ok_locales, which is all about
> > these char sets. No REs, almost no thinking required, no headache. A
> > single line, and you're done.

> What's the best way to test the character set for use in a meta rule?
> We don't want to reject

SA doesn't reject anyway. It merely classifies and tags mail.

> all messages with the Russian (Cyrillic)
> character set, but we may want to use something like
> if (character set is Russian) && (body contains 'xyzzy')

Well, it depends...

If it is OK for you to treat all char sets that you did not list in
ok_locales the same way, then it is just a regular meta rule -- and
based on my understanding of your description re-scoring of the few

> for instance. How would we test the character set?

This, I believe, cannot be done with the current HeaderEval plugin, since
it does not report the char set but treats all unwanted char sets the
same. However, if you need fine-grained rules per char set, it should be
fairly easy to alter the existing plugin, or to write custom rules or a
plugin based on it.
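
A minimal sketch of both variants, untested and resting on assumptions:
all LOCAL_* rule names, the scores, and "ok_locales en" are made up for
illustration. The first part relies on the stock CHARSET_FARAWAY and
CHARSET_FARAWAY_HEADER rules, which fire for any char set outside
ok_locales. The second part assumes the MIMEHeader plugin is loaded and
that matching the declared charset in Content-Type is good enough for
your mail.

    # local.cf
    ok_locales en

    # Sub-rule (leading "__" means it carries no score of its own).
    body       __LOCAL_XYZZY        /\bxyzzy\b/i

    # Variant 1: any char set not in ok_locales, plus the body text.
    meta       LOCAL_FARAWAY_XYZZY  (CHARSET_FARAWAY || CHARSET_FARAWAY_HEADER) && __LOCAL_XYZZY
    describe   LOCAL_FARAWAY_XYZZY  Unwanted char set and body contains xyzzy
    score      LOCAL_FARAWAY_XYZZY  2.0

    # Variant 2: a custom per-char-set rule (Cyrillic only).
    mimeheader __LOCAL_CT_CYRILLIC  Content-Type =~ /charset\s*=\s*"?(?:koi8-r|windows-1251|iso-8859-5)/i
    meta       LOCAL_CYRILLIC_XYZZY __LOCAL_CT_CYRILLIC && __LOCAL_XYZZY
    describe   LOCAL_CYRILLIC_XYZZY Cyrillic char set and body contains xyzzy
    score      LOCAL_CYRILLIC_XYZZY 2.0

Run "spamassassin --lint" after adding rules like these. Variant 2 only
sees the char set a message declares, so it will miss mail that labels
itself as something else.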


char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}