Re: Am I an idiot, or is bayes broken on my system?

From: Troy Settle <troy@psknet.com>

Date: Mon, 10 Nov 2008 11:30:27 -0500

I received a piece of junkmail this morning:

http://home.psknet.com/troy/1.txt

In the spam report, I see this: BAYES_00=-2.599

So, I run it through sa-learn with --spam:

Learned tokens from 1 message(s) (1 message(s) examined)

Then, I re-scan it using spamc, and still I get:

BAYES_00=-2.599

What gives? I don't expect the total score to come up much, but the

bayes should at least go from a negative number to a positive number...

shouldn't it?

The answer depends on how many tokens bayes is looking at and how

spammy those tokens are. You can see what bayes thinks about each

token with --debug output. I get BAYES_40 on your message.

% wget http://home.psknet.com/troy/1.txt

% spamassassin -D --test-mode --debug all,bayes < 1.txt 2>&1 | grep bayes:

...

[14389] dbg: bayes: corpus size: nspam = 426975, nham = 53737

[14389] dbg: bayes: token 'Dodge' => 0.999612090680101

[14389] dbg: bayes: token 'sincerely' => 0.999492864983535

[14389] dbg: bayes: token 'decode' => 0.0344385308520192

[14389] dbg: bayes: token 'I'll' => 0.0365668821340277

[14389] dbg: bayes: token 'Perspective' => 0.0404549158471554

...

[14389] dbg: bayes: score = 0.310353325094371
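
To see why a raw score of 0.31 shows up as BAYES_40, here is a rough sketch of how the score is bucketed into the BAYES_* rules. The band boundaries below are my recollection of the stock rule definitions and should be treated as an assumption; check the bayes .cf file shipped with your version. The names BANDS and bayes_rule are made up for this example.

```python
# Assumed lower bounds of the stock BAYES_* score bands.
# Verify against the rules installed on your system.
BANDS = [
    (0.00, "BAYES_00"),
    (0.01, "BAYES_05"),
    (0.05, "BAYES_20"),
    (0.20, "BAYES_40"),
    (0.40, "BAYES_50"),
    (0.60, "BAYES_60"),
    (0.80, "BAYES_80"),
    (0.95, "BAYES_95"),
    (0.99, "BAYES_99"),
]


def bayes_rule(score):
    """Return the name of the band whose lower bound is the highest one <= score."""
    name = BANDS[0][1]
    for lower, rule in BANDS:
        if score >= lower:
            name = rule
    return name


# Raw score taken from the debug output above.
print(bayes_rule(0.310353325094371))
```

So the raw score has to cross a band boundary before the rule name in the report changes.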

After you learn a message as spam, the per-token numbers and the raw score

should increase somewhat, depending on how many times each token has been

seen. I get BAYES_60 on the message after learning.

% sa-learn --spam < 1.txt

% sa-learn --sync

% spamassassin -D --test-mode --debug all,bayes < 1.txt 2>&1 | grep bayes:

[14618] dbg: bayes: corpus size: nspam = 426990, nham = 53737

...

[14618] dbg: bayes: token 'Dodge' => 0.999615320566195

[14618] dbg: bayes: token 'sincerely' => 0.999498371335505

[14618] dbg: bayes: token 'decode' => 0.0348456512323892

[14618] dbg: bayes: token 'I'll' => 0.0366062570517363

[14618] dbg: bayes: token 'Perspective' => 0.0670493467695761

...

[14618] dbg: bayes: token 'omaha' => 0.958

[14618] dbg: bayes: token 'elsasser' => 0.958

[14618] dbg: bayes: token 'riders' => 0.958

...

[14618] dbg: bayes: score = 0.659988861825694
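
The 0.958 values for 'omaha', 'elsasser' and 'riders' look like tokens that had only been seen once, in spam. A minimal sketch of where that number could come from, assuming Robinson-style smoothing f(w) = (s*x + n*p) / (s + n) with s = 0.1 and x = 0.538. Those two constants and the function name are assumptions on my part, so check the Bayes source for your version.

```python
def smoothed_token_prob(n, p, s=0.100, x=0.538):
    """Robinson-style smoothing of a token's spam probability.

    n: number of messages the token has been seen in
    p: unsmoothed spam probability for the token
    s: strength given to the prior (assumed value)
    x: probability assumed for a never-seen token (assumed value)
    """
    return (s * x + n * p) / (s + n)


# A token seen once, only in spam: n = 1, p = 1.0
print(round(smoothed_token_prob(n=1, p=1.0), 3))
```

Under these assumptions a token stays well short of 1.0 until it has been seen many times, which is why a single learn only moves the score a band or two.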

-jeff