At 01:09 PM 7/5/2007 -0700, you wrote:
>You could match on the content type "application/octet-stream" and
>the file extension being ".pdf".


Good idea, but sorry, I should have been clearer (my BIM):
I meant to use that in COMBINATION with OTHER signs, mainly to tell
the two styles apart.
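The content-type check on its own can be sketched with Python's standard email package. This is a minimal sketch, assuming the message is available as raw bytes. It is not a SpamAssassin rule. It walks the MIME parts, keeps attachments whose filename ends in ".pdf", and labels each one by its declared content type, so that "application/octet-stream" PDFs can be separated from "application/pdf" ones. The function name pdf_styles and the style labels are hypothetical. The result is only one signal, meant to be combined with others:

```python
from email import policy
from email.parser import BytesParser


def pdf_styles(raw_message: bytes) -> list[tuple[str, str]]:
    """Return (filename, style) for each PDF attachment in the message.

    Hypothetical labels: "octet-stream" for PDFs declared as
    application/octet-stream, "pdf" for application/pdf, "other" otherwise.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    results = []
    for part in msg.walk():
        # Skip container parts; only leaf parts carry attachments
        if part.is_multipart():
            continue
        filename = part.get_filename()
        # Only consider parts whose attachment name ends in .pdf
        if not filename or not filename.lower().endswith(".pdf"):
            continue
        ctype = part.get_content_type()  # lowercased "maintype/subtype"
        if ctype == "application/octet-stream":
            style = "octet-stream"
        elif ctype == "application/pdf":
            style = "pdf"
        else:
            style = "other"
        results.append((filename, style))
    return results
```

Branching on the returned style, rather than scoring it directly, matches the idea of using the content type only to decide which other checks apply.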

To clear this up, I ran some MassChecks on all 2007 data for my two
most diverse (Muggle) domains. Using your suggested rule above, I got
these results:

    FP rate   FPs   Ham PDFs
     34.66%    96        277
     26.98%    92        341
The first domain is a business, the second an extended family. Quick
eyeball checks of just the small PDFs indicate that this content type
is legitimate and in common use.
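The percentages above are the FP count divided by the total number of ham PDFs for each domain. Here is a small Python check of that arithmetic, using only the figures quoted above. The helper name fp_rate is illustrative:

```python
def fp_rate(false_positives: int, ham_total: int) -> float:
    """Percentage of ham PDFs wrongly flagged, rounded to two decimals."""
    return round(100 * false_positives / ham_total, 2)


# Figures from the two masschecks above: (FPs, total ham PDFs)
print(fp_rate(96, 277))  # business domain        -> 34.66
print(fp_rate(92, 341))  # extended-family domain -> 26.98
```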

That content type is only of interest in the context of differentiating
between the two very distinct styles, which I suspect are produced by
two separate pieces of software. Sorry if I confused anybody.

I also ran a much smaller masscheck on just this month's data for the
domain that received the new style. Using the content type as a branch
point for my own rules, I had zero FPs and a 100% spam kill rate (for
PDFs).

Again, my PDF-specific rules "****tail" is a combination of message
size, Realname, and internal "tags" (via a post-processing filter that
I suspect takes a similar approach to Dallas' SA plugin). On top of
those, I'm getting plenty of hits based on country of origin and
route.
- "Chip"