Hello,

I've been trying to figure out how to write rules matching
international spam with non-ASCII characters -- especially languages
with a completely different script (e.g. Cyrillic) -- for SA 3.2.5 on
Perl 5.10.0. Rather than write rules to match words in 3+ character
sets per language (which is a maintenance nightmare and probably prone
to false positives), it looks like 'normalize_charset 1' should allow me
to write the rules once in UTF-8, but that isn't how it works in practice.

I'm not entirely sure why it doesn't Just Work (albeit without support
for case-insensitive matches) as the code stands right now. If I add the
following body rule (the rule is written in UTF-8), it will fail to
match "Привет, мир!" in the body regardless of the body character set
(I tried UTF-8 and KOI8-R):

body TEST_RU /Привет, мир!/

If I change run_generic_tests to put 'use utf8;' at the beginning of the
test body (patch attached), UTF-8 rules work perfectly (and the TEST_RU
rule fires). Is there a better way to do this? If not, is there any
chance of the patch (or something similar) being incorporated into
SpamAssassin?
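
One plausible explanation for the behaviour above, offered as an
assumption and not confirmed against the SpamAssassin source: without
'use utf8;' the rule's regex is compiled from the raw UTF-8 bytes of
the .cf file, one byte per character, while the normalized body is a
string of decoded characters, so the two never line up. The sketch
below is a Python analogy of that mismatch, not SpamAssassin or Perl
code; normalize() is a hypothetical stand-in for what
'normalize_charset 1' is presumed to do.

```python
import re

# The rule text as typed, and the UTF-8 bytes that end up in the .cf file.
RULE_SOURCE = "Привет, мир!"
rule_bytes = RULE_SOURCE.encode("utf-8")


def normalize(raw: bytes, charset: str) -> str:
    """Hypothetical stand-in for charset normalization: bytes -> characters."""
    return raw.decode(charset)


# Analogy for "no 'use utf8'": every byte of the pattern becomes one
# character (Latin-1 style), so the Cyrillic letters turn into byte pairs.
pattern_as_bytes = re.compile(re.escape(rule_bytes.decode("latin-1")))

# Analogy for "with 'use utf8'": the pattern source is decoded as UTF-8.
pattern_as_chars = re.compile(re.escape(rule_bytes.decode("utf-8")))

# The same message body in the two encodings tried in the post.
bodies = {
    "UTF-8": RULE_SOURCE.encode("utf-8"),
    "KOI8-R": RULE_SOURCE.encode("koi8_r"),
}

# For each body: (byte-wise pattern matched?, decoded pattern matched?)
results = {}
for charset, raw in bodies.items():
    text = normalize(raw, charset)
    results[charset] = (
        bool(pattern_as_bytes.search(text)),
        bool(pattern_as_chars.search(text)),
    )

for charset, (bytes_hit, chars_hit) in results.items():
    print(f"{charset}: byte-wise pattern={bytes_hit}, decoded pattern={chars_hit}")
```

If this analogy holds, the byte-wise pattern misses in both encodings
and the decoded pattern hits in both, which is consistent with the
TEST_RU rule failing for UTF-8 and KOI8-R alike until the pattern
source itself is treated as UTF-8.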

--
Ben Winslow