This is a discussion on Re: Spam PDF - SpamAssassin ; email@example.com wrote:
>> Actually, it didn't. The assertion is that if someone else hadn't seen
>> this exact message first, then SA wouldn't have caught it.
> No, the assertion is that if someone else hadn't seen prior abuse from
> the sending host first (not this exact message), then SA wouldn't have
> caught that particular message. That assertion happens to be true for
> the blacklists, and true for BAYES as well since it would have had to
> have seen headers (since the payload is vastly different) that look like
> this sending host in the recent past and been told that it was SPAM.
Your assertion about bayes is not well supported. It might have been
flagged by bayes for reasons that have _NOTHING_ to do with the received
headers.
>> The PBL (which isn't spamtrap fed, it's collected from ISP published
>> and/or contributed data) would have caught this based upon issues that
>> have nothing at all to do with this message, and most likely nothing at
>> all to do with this current round of spam. It would be based upon the
>> host provider's policy that this host shouldn't send email to the internet.
> Which means, some time, in the past, for whatever reasons that
> particular IP address did something against someone's policy to end up
> on that list. The important part being "in the past".
No, it means that the ISP, or possibly net block user, told Spamhaus
"it's an end user IP address, and not a mail server". There might be
_NO_ previous abuse from that IP address, and they'll still be listed.
The "policy" here is NOT the recipient's policy, but the sending network's.
>> Similarly, the SPAMCOP listing is most likely not related to _this_
>> message. It is more likely an ongoing abuse issue, so the fact that the
>> host fed a spamtrap at spamcop at some point in the past does not mean
>> that they were "lucky to catch this message". The odds are that the
>> SPAMCOP listing has nothing to do with this message.
> Spamcop automatically delists IP addresses over time; to be relisted,
> someone/something has to report new abuse. If you happen to receive the
> message before anyone has reported the new abuse, well, it won't be listed.
It could have been recent abuse from an entirely different message
batch. In other words, maybe that IP sent a standard stock scam
yesterday, and today it sent the pdf spam ... and this person was the
first one to receive that pdf spam message. No previous recipient of
the same message. But they'll still be listed at spamcop.
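To illustrate why a listing is independent of any particular message: a DNSBL lookup is keyed only on the connecting IP address. The following is a minimal Python sketch of how the query name is conventionally built (reversed octets prepended to the list's zone). The helper name dnsbl_query_name and the example address 192.0.2.25 are made up for illustration, and the zone names are the commonly published ones, which should be checked against each list's own documentation.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the DNS name queried for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# Hypothetical sending host; nothing about the message itself is used.
for zone in ("pbl.spamhaus.org", "bl.spamcop.net"):
    print(dnsbl_query_name("192.0.2.25", zone))
# 25.2.0.192.pbl.spamhaus.org
# 25.2.0.192.bl.spamcop.net
```

Nothing from the message body or attachment enters the query, so a host listed for yesterday's stock scam is still listed when it sends today's pdf spam.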
>> I would make the same characterization of BAYES. You don't have to see
>> a specific message in the past in order for BAYES to catch it.
>> Therefore, you're not depending upon "luckily not being the first person
>> to see a given message".
> Explain how BAYES will have any matching tokens to work on if it's from a
> fresh, never before seen by your system, zombie and there's no message
> body other than the attachment? All you have to work with is headers
> which you've never seen before and MIME boundaries which you've never
> seen before.
There are more headers than just the received headers. And, I honestly
don't know whether an attachment's raw data is analyzed by bayes
or not. My assumption is that it is.
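As a rough illustration of the "more headers than just the received headers" point, here is a toy Python sketch that pulls tokens from every header except Received. This is NOT SpamAssassin's actual bayes tokenizer; the function header_tokens, the sample message, and the known_spam_tokens set are all hypothetical, and the only assumption is that a bayes-style filter can learn from header words in general.

```python
from email import message_from_string

def header_tokens(raw_message, skip=("received",)):
    """Return a set of 'header:word' tokens, ignoring headers named in skip."""
    msg = message_from_string(raw_message)
    tokens = set()
    for name, value in msg.items():
        if name.lower() in skip:
            continue
        for word in value.split():
            tokens.add(f"{name.lower()}:{word.lower()}")
    return tokens

# Hypothetical message from a never-before-seen host, attachment only.
raw = (
    "Received: from new-zombie.example.net\n"
    "Subject: Invoice attached\n"
    "X-Mailer: BulkSender 1.0\n"
    "Content-Type: application/pdf\n"
    "\n"
)

# Hypothetical tokens learned earlier from spam sent by other hosts.
known_spam_tokens = {"x-mailer:bulksender", "subject:invoice"}

print(sorted(header_tokens(raw) & known_spam_tokens))
# ['subject:invoice', 'x-mailer:bulksender']
```

Even with the Received header ignored and no message body, the toy filter still finds tokens it has seen before, which is all the argument above requires.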
>> Just resting upon BAYES, BOTNET, and PBL, you're not "lucky to have
>> caught the message because you're a late receiver". You've caught the
>> message due to a combination of policy, misuse, and historical
>> characteristics of spam in general being used to train your system.
> All of which needs prior examples/reporting of messages similar to the
> one you're trying to detect, that's what "historical characteristics of
> spam" means.
BOTNET does _NOT_ need prior reporting. And the prior reporting the PBL
requires has nothing to do with abuse. Further, BAYES does not depend
upon the received headers. But even if you're right about bayes, your
claim that "all of which needs prior..." is at least 2/3 wrong, if not