On Sat, 2008-11-01 at 22:54 +0000, Martin Gregorie wrote:
> On Sat, 2008-11-01 at 23:19 +0100, Karsten Bräckelmann wrote:
> > Yes, there is. Your MUA, Evolution, features pre-formatted paragraphs in
> > the Composer. But I don't feel like repeating myself today.

> [...] I must remember to use it selectively to prevent line wrapping.

It's most handy for code snippets, config and logs slightly exceeding
the default line-wrapping width. But I digress...

> > > describe MG_CASINO Casino gambling
> > > body __MG_CAS1 /(csnaio|casino)/i
> > > header __MG_CAS2 Subject =~ /casino/i
> > > header __MG_CAS3 From =~ /casino/i
> > > body __MG_CAS4 /(\$[0-9]+|[0-9]+ *euro|gold|real deal|invite.*play)/i
> > > meta MG_CASINO ((__MG_CAS1||__MG_CAS2||__MG_CAS3)&&__MG_CAS4)
> > > score MG_CASINO 2.0

> >
> > Hmm, it might be worth for local rules, to score at least a few of
> > them on sight with a low score, yet keeping them in the meta. (Yes,
> > single word rules are generally bad, but scoring a From header that
> > contains specific words might help catch these.) I'd enforce word
> > breaks, though.

> ...and reduce the meta score to compensate?

Well, that's up to you. The score is rather arbitrary, so you can
use whatever you feel comfortable with.

Reducing the meta score to compensate might indeed be good. My thought
was to split off part of the score, so that something still hits in
case the meta doesn't match. I guess the word "casino" in either the
Subject or (even stronger) From header might be worth at least 0.2 or
so on its own.
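
A minimal sketch of that split, untested and with made-up names and
scores: MG_CAS_SUBJ and MG_CAS_FROM are hypothetical rule names, and
0.2 / 1.6 are just picked so the total stays at the original 2.0 when
everything hits. The two header rules lose the double underscore prefix
so they can carry a score of their own, yet they stay in the meta:

```
header   MG_CAS_SUBJ  Subject =~ /\bcasino\b/i
describe MG_CAS_SUBJ  Subject mentions casino
score    MG_CAS_SUBJ  0.2

header   MG_CAS_FROM  From =~ /\bcasino\b/i
describe MG_CAS_FROM  From header mentions casino
score    MG_CAS_FROM  0.2

meta     MG_CASINO    ((__MG_CAS1 || MG_CAS_SUBJ || MG_CAS_FROM) && __MG_CAS4)
score    MG_CASINO    1.6
```

Run spamassassin --lint after changing rules to catch syntax slips.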

One note I missed earlier, regarding the quantifiers: Using unbounded
quantifiers can and will be expensive. Wherever possible you should use
bounds. So, rather than /.*/, using /.{0,20}/ with a suitable upper
bound will prevent the RE from backtracking across an entire mail. The
same goes for any occurrence of the + quantifier, of course.
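
Applied to your __MG_CAS4 rule, a bounded variant might look like the
following. The upper bounds (6 digits, 3 spaces, 40 characters between
"invite" and "play") are assumptions on my part, not tested against
real mail, so adjust them to whatever your samples actually need:

```
body __MG_CAS4 /(\$[0-9]{1,6}|[0-9]{1,6} {0,3}euro|gold|real deal|invite.{0,40}play)/i
```

The alternation itself is unchanged; only *, + and .* have been
replaced with explicit ranges.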

> Has the Perl regex syntax changed since Perl4? If it has I think I need
> to get another Perl book before venturing away from the simple subset
> I'm comfortable with.

Yes, it has changed. I'm not positive about Perl 4, but I guess the
changes are mostly additions to the RE syntax. In particular, a "simple
subset" most likely is still valid.

You can find more info than you ever want here:

Assuming you ask because I recommended word boundaries (see Regular
Expressions / Assertions in perlre), here's a rewritten From matching
rule:

  header __MG_CAS3 From =~ /\bcasino\b/i

> > This one would have been flagged as spam when using the default
> > required_score spam threshold of 5.0.

> I'm thinking about reducing that back to the default. I initially set it
> higher while finding out how to use SA.

I see. Something to keep in mind when pondering whether it's actually
worth the effort of writing custom rules. It might not be, if you're
going to use the default anyway.

> > Also, I notice you're apparently
> > not using Bayes, which likely could raise the score above your 6.0
> > threshold, when trained on these.

> Not entirely. Its enabled but I'm only using auto-learn with default
> thresholds. However its probably not doing much at present because I
> recently reset it by deleting the bayes database.

Ah, so that's why it didn't show up. Since dropping your Bayes DB, SA
hasn't learned enough ham and spam yet (it needs 200 each by default).
You should bootstrap and do some initial learning with existing ham and
spam.

Also, as you can see in this example, you specifically should train
low-scoring and missed spam after the initial training. SA did not
auto-learn this one, because it is way below the threshold(s).
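
For reference, these are the settings behind the numbers above, shown
with what I believe are the stock defaults. I'm quoting them from
memory of the Mail::SpamAssassin::Conf documentation, so check the docs
for your version before relying on them:

```
# Bayes results are not used until this many mails have been learned
bayes_min_ham_num                   200
bayes_min_spam_num                  200

# auto-learn only picks up mail scoring below / above these thresholds
bayes_auto_learn                    1
bayes_auto_learn_threshold_nonspam  0.1
bayes_auto_learn_threshold_spam     12.0
```

Anything that lands between the two auto-learn thresholds is never
learned automatically, which is why manual training matters for
low-scoring spam.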

> > On my check the sample also scored 0.8 for SPF_HELO_SOFTFAIL. Plus
> > Pyzor, which is not enabled by default unless you install Pyzor.

> Noted.

Pyzor is more complicated to set up and heavy-weight. The missing
SPF_HELO_SOFTFAIL though likely is simply because you don't have the
Perl Mail::SPF module installed. If you do install it, it should start
working.

> > Oh, and then I got a custom rule worth 0.5 for any single Relay, direct
> > client to MX mail.

> Nope, I'm not seeing that one.

That's because it is a custom rule on my setup.


char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}