I am trying to figure out why almost all spam continues to get through. I
use
Fedora 8,
Evolution 2.12.3, and
spamassassin 3.2.4.

I have marked more than 100 mails as junk and more than 100 as not junk.
Probably more than 200 by now.

I have saved to a file the source of one typical example spam.
This mail contains sequences like

rqz

embedded in the middle of "sensitive" words. That makes the word look like

spa massa ssin

(substitute your favorite merchandise). The sequence above selects white
letters on a white background and, in addition, makes the letters rather
small, two pixels high. In this way the words that would otherwise
trigger a filter rule get split, and the pieces are separated by other
words or letter combinations that do not show up on the screen.
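
To make the trick concrete, here is a minimal Python sketch of what a
reader sees versus what a word-based filter sees. The markup in `sample`
is an assumption for illustration (a font tag with a white color and a
tiny size); the names `VisibleText` and `visible_text` are hypothetical,
and this is not how spamassassin itself parses HTML.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text, skipping anything in a <font> colored like the background."""

    def __init__(self, background="#ffffff"):
        super().__init__()
        self.background = background.lower()
        self.hidden = []   # one flag per open <font>: is its text invisible?
        self.parts = []    # visible text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "font":
            return
        color = dict(attrs).get("color")
        if color:
            # An explicit color decides visibility for this element.
            self.hidden.append(color.lower() == self.background)
        else:
            # No color attribute: inherit from the enclosing <font>, if any.
            self.hidden.append(self.hidden[-1] if self.hidden else False)

    def handle_endtag(self, tag):
        if tag == "font" and self.hidden:
            self.hidden.pop()

    def handle_data(self, data):
        if not (self.hidden and self.hidden[-1]):
            self.parts.append(data)


def visible_text(html, background="#ffffff"):
    """Return only the text a reader would see on the given background."""
    parser = VisibleText(background)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Assumed markup: the filler "rqz" is white-on-white and tiny.
sample = ('spa<font color="#ffffff" size="1">rqz</font>'
          'massa<font color="#FFFFFF" size="1">rqz</font>ssin')

print(visible_text(sample))
```

A filter that looks at all the text, visible or not, gets the word with
the filler still inside it, so a rule matching the plain word never fires.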

Googling around I found a list of Spamassassin tests, including

Area tested: body
Description: HTML font color similar to background
Test name: HTML_FONT_LOW_CONTRAST
Default score:
local: 0.131
net: 0.543
bayes: 0.663
bayes + net: 0.124

(I do not understand these scores. Why are they different? When do they
apply, e.g. does the 'local' value apply if I run "spamassassin --local"?
But if so, why is a low font contrast less significant when --local is
used? etc.)

There was also another test named HTML_FONT_INVISIBLE, but I later found
that this test appears to be associated with earlier versions of spamassassin.

Since Evolution runs "spamc --local", I tried "spamassassin --local" and
looked at the output. Here is one:

X-Spam-Status: No, score=3.4 required=5.0 tests=AWL,DATE_IN_PAST_24_48,
HS_INDEX_PARAM,HTML_MESSAGE,RDNS_NONE autolearn=no version=3.2.4

There is no indication of the low-contrast rule having been triggered.
Should this be so? Is this header supposed to show all tests with
nonzero scores? How can I have spamassassin give me a complete list of
tests with nonzero scores?
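
To compare such headers across several mails, the test names can be
pulled out with a few lines of Python. This is only a sketch: the
function name `parse_spam_status` is hypothetical, and it assumes the
header has the same layout as the one quoted above (verdict, then
key=value fields, with the tests list possibly folded across lines).

```python
import re


def parse_spam_status(header):
    """Return (verdict, score, tests) from an X-Spam-Status header string."""
    value = header.split(":", 1)[1]
    verdict, rest = value.split(",", 1)
    # The tests list may be folded after a comma; rejoin it into one token.
    rest = re.sub(r",\s+", ",", rest)
    fields = dict(item.split("=", 1) for item in rest.split())
    tests = [name for name in fields.get("tests", "").split(",") if name]
    return verdict.strip(), float(fields["score"]), tests


header = (
    "X-Spam-Status: No, score=3.4 required=5.0 tests=AWL,DATE_IN_PAST_24_48,\n"
    " HS_INDEX_PARAM,HTML_MESSAGE,RDNS_NONE autolearn=no version=3.2.4"
)

verdict, score, tests = parse_spam_status(header)
print(verdict, score)
print(tests)
print("HTML_FONT_LOW_CONTRAST" in tests)
```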

I added these lines to my .spamassassin/user_prefs:

score HTML_FONT_INVISIBLE 9.99
score HTML_FONT_LOW_CONTRAST 9.99

but could not see any change.

Then I tried to look at the source code. I found a function
"html_font_invisible", which starts by computing the foreground and
background colors. I inserted an extra line of code to have the function
log its determinations. Here is some of the output:

backgroud:#ffffff foreground:#000000
backgroud:#ffffff foreground:#ffffff
backgroud:#ffffff foreground:#000000
backgroud:#ffffff foreground:#ffffff
backgroud:#ffffff foreground:#000000
backgroud:#ffffff foreground:#ffffff
backgroud:#ffffff foreground:#000000
backgroud:#ffffff foreground:#000000
backgroud:#ffffff foreground:#000000
backgroud:#ffffff foreground:#ffffff
backgroud:#ffffff foreground:#000000

That is, the function assumes the background is white, and correctly
finds that the text color is sometimes black, sometimes white.

This shows that Spamassassin does run that code, and does correctly
determine that some of the text has the same color as the background.
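
For longer logs, the lines can be tallied instead of read by eye. A
minimal sketch, assuming the log format shown above (including the
misspelled "backgroud" key that my debug line prints); the function name
`tally` is hypothetical.

```python
from collections import Counter


def tally(log_text):
    """Count log lines where the foreground equals the background color."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Each line looks like "backgroud:#ffffff foreground:#000000".
        fields = dict(part.split(":", 1) for part in line.split())
        same = fields["backgroud"].lower() == fields["foreground"].lower()
        counts["invisible" if same else "visible"] += 1
    return counts


log = """\
backgroud:#ffffff foreground:#000000
backgroud:#ffffff foreground:#ffffff
backgroud:#ffffff foreground:#000000
backgroud:#ffffff foreground:#ffffff
"""

print(tally(log))
```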

However, finding one's way through all of spamassassin's code is likely
to be a monumental task, so I wish to ask if somebody knows anything
about this problem.

Further googling turned up some discussions showing that the combination
Fedora+Evolution+junk-filtering had more complaints than e.g. Ubuntu.
However, I did not see any resolution (the web server went offline).

Any ideas? Any pointers?

Thanks