Can't learn individual copies of identical spam - SpamAssassin




  1. Can't learn individual copies of identical spam

    Every so often I see a run of spams that are identical in content and
    *most* of their headers; the only differences are in the Received:
    headers' datestamps.

    Unfortunately, all too often these runs *also* manage to score BAYES_00.

    The most recent example is the fake "CNN.com Daily Top 10" spams; I'm
    sure most of you can find a copy or two in your inbox or spam folder.
    The *text* part is even legit; it's only the HTML part that links to
    (IIRC) a virus or spyware download.

    In the interests of getting Bayes to score something resembling
    correctly (I'd take even BAYES_50 right now), what can I do to force SA
    to learn from each and every example of such a series? (I've tried
    manually fiddling the Message-ID header, with no luck.)

    For this particular series, I think I've got rules in place to tag them
    despite the BAYES_00 hit, but I'd like to know if there's a general
    solution to be able to properly learn such spam runs.

    (Given the number of this particular series I've fed in, they *should*
    be hitting BAYES_99.)

    -kgd


  2. Re: Can't learn individual copies of identical spam

    On 07.08.08 13:42, Kris Deugau wrote:
    > Every so often I see a run of spams that are identical in content and
    > *most* of their headers; the only differences are in the Received:
    > headers' datestamps.


    A mail loop, or maybe the same mail sent to several addresses that all
    end up with you.

    > Unfortunately, all too often these runs *also* manage to score BAYES_00.
    >
    > The most recent example is the fake "CNN.com Daily Top 10" spams; I'm
    > sure most of you can find a copy or two in your inbox or spam folder.
    > The *text* part is even legit; it's only the HTML part that links to
    > (IIRC) a virus or spyware download.
    >
    > In the interests of getting Bayes to score something resembling
    > correctly (I'd take even BAYES_50 right now), what can I do to force SA
    > to learn from each and every example of such a series? (I've tried
    > manually fiddling the Message-ID header, with no luck.)


    What do you mean? What did SA say when you trained it?

    > For this particular series, I think I've got rules in place to tag them
    > despite the BAYES_00 hit, but I'd like to know if there's a general
    > solution to be able to properly learn such spam runs.
    >
    > (Given the number of this particular series I've fed in, they *should*
    > be hitting BAYES_99.)


    --
    Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
    Warning: I wish NOT to receive e-mail advertising to this address.
    Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
    Linux is like a teepee: no Windows, no Gates and an apache inside...


  3. Re: Can't learn individual copies of identical spam

    Matus UHLAR - fantomas wrote:
    > On 07.08.08 13:42, Kris Deugau wrote:
    >> Every so often I see a run of spams that are identical in content and
    >> *most* of their headers; the only differences are in the Received:
    >> headers' datestamps.

    >
    > A mail loop, or maybe the same mail sent to several addresses that all
    > end up with you.


    Well, that accounts for some of it. (Staff aliases.) But it doesn't
    account for three today, four yesterday, two the day before, three the
    day before that... for several weeks.

    > What do you mean? What did SA say when you trained it?


    Well, if I take a fresh example, and learn that single message, I get
    "Learned 1 message(s)" as expected. If I fiddle the Message-ID in that
    message (I've tried everything from a single-character change to
    complete replacement), and try to learn it again, I get "Learned 0
    message(s)".
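
    One possible explanation for the "Learned 0" result, which nothing in
    this thread confirms: the learner may recognise an already-seen message
    by a fingerprint built from other parts of the message (say the Date
    header, a Received header and the start of the body) in addition to
    Message-ID. The Python sketch below is a toy model of that idea only.
    The names fingerprint and ToyLearner, the "@toy_generated" suffix and
    the choice of hashed fields are all assumptions made for illustration;
    this is not SpamAssassin's code.

```python
import hashlib
from email import message_from_string

# Hypothetical sample message; every header value here is made up.
SAMPLE = (
    "Received: from mx.example.org by mail.example.net; "
    "Thu, 7 Aug 2008 13:40:00 -0400\n"
    "Date: Thu, 7 Aug 2008 13:39:00 -0400\n"
    "Message-ID: <abc123@example.org>\n"
    "Subject: CNN.com Daily Top 10\n"
    "\n"
    "Top stories of the day...\n"
)


def fingerprint(raw: str) -> str:
    """Hash Date, the last Received header and the first 1024 body chars.

    Assumption: which fields go into the hash is a guess for this toy model.
    """
    msg = message_from_string(raw)
    date = msg.get("Date") or "None"
    received = msg.get_all("Received") or ["None"]
    body = msg.get_payload()
    if not isinstance(body, str):  # multipart payloads come back as a list
        body = str(body)
    material = "\0".join([date, received[-1], body[:1024]])
    return hashlib.sha1(material.encode("utf-8")).hexdigest() + "@toy_generated"


class ToyLearner:
    """Remembers the identifiers of learned messages and skips repeats."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def learn(self, raw: str) -> int:
        """Return 1 if the message was learned, 0 if it looked already seen."""
        ids = {fingerprint(raw)}
        message_id = message_from_string(raw).get("Message-ID")
        if message_id:
            ids.add(message_id.strip())
        # Any single matching identifier is enough to treat it as a repeat.
        if ids & self.seen:
            return 0
        self.seen |= ids
        return 1
```

    In this toy model, learning SAMPLE and then a copy with only the
    Message-ID changed is refused the second time, because the fingerprint
    still matches. Changing the Date header as well makes the copy look new.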

    I'm not so much worried about catching them as I am about making sure
    Bayes gets properly trained - if I can't learn more than one example of
    a given series, the Bayes results won't properly reflect the actual
    spamminess of the messages.
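
    To illustrate why the number of learned copies matters, here is a
    deliberately simplified per-token estimate in Python. It is a textbook
    ratio of spam rate to total rate, not the formula SpamAssassin uses
    (the real implementation is more involved), and the counts are
    hypothetical: a token such as "cnn" that already appears in 50 of 1000
    learned ham messages and in none of 1000 learned spam messages.

```python
def token_spam_probability(spam_hits: int, ham_hits: int,
                           n_spam: int, n_ham: int) -> float:
    """Simplified estimate: spam rate divided by (spam rate + ham rate)."""
    spam_rate = spam_hits / n_spam if n_spam else 0.0
    ham_rate = ham_hits / n_ham if n_ham else 0.0
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: treat as neutral
    return spam_rate / (spam_rate + ham_rate)


def after_learning_copies(copies: int, ham_hits: int = 50,
                          n_spam: int = 1000, n_ham: int = 1000) -> float:
    """Token probability after learning `copies` spam messages containing it.

    The default counts are made-up illustration values, not measured data.
    """
    return token_spam_probability(copies, ham_hits, n_spam + copies, n_ham)


if __name__ == "__main__":
    for copies in (1, 5, 20):
        print(copies, round(after_learning_copies(copies), 3))
```

    Under these assumed counts, one learned copy leaves the token looking
    strongly hammy, while twenty learned copies pull it noticeably toward
    the middle. That is the effect lost when every copy after the first is
    refused.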

    -kgd

