spam learning - SpamAssassin


Thread: spam learning

  1. spam learning

    I use evolution as my mail client. Evolution supports spamassassin and
    in the past I let evolution use spamassassin to filter incoming mail.
    Recently, I switched to spam filtering using procmail. The relevant
    section of my .procmailrc file is:

    :0fw: spamc.lock
    * < 256000
    | spamc

    # Mails with a score of 15 or higher are almost certainly spam (with 0.05%
    # false positives according to rules/STATISTICS.txt). Let's put them in a
    # different mbox. (This one is optional.)
    :0:
    * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
    Inbox.Spam

    # All mail tagged as spam (e.g. with a score higher than the set threshold)
    # is moved to "probably-spam".
    :0:
    * ^X-Spam-Status: Yes
    Likely.Spam

    Evolution then simply reads the various mboxes.
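For readers less used to procmail, the routing done by the two delivery recipes above can be sketched in Python. This is only an illustration, assuming the message has already been through spamc; the folder names come from the recipes, while the function name and the "Inbox" fallback (a stand-in for procmail's default mailbox) are hypothetical. Procmail conditions match case-insensitively unless the D flag is given, so the X-Spam-Status check ignores case as well.

```python
import re
from email import message_from_string

# Recipe 1: fifteen or more stars in X-Spam-Level (score >= 15).
HIGH_SCORE = re.compile(r"\*{15}")
# Recipe 2: X-Spam-Status starting with "Yes". Procmail matches
# case-insensitively by default, so this pattern does too.
TAGGED = re.compile(r"Yes", re.IGNORECASE)


def choose_folder(raw_message: str, default: str = "Inbox") -> str:
    """Return the mbox a spamc-processed message would be delivered to.

    'Inbox' is a hypothetical stand-in for procmail's default mailbox.
    """
    msg = message_from_string(raw_message)
    level = msg.get("X-Spam-Level", "")
    status = msg.get("X-Spam-Status", "")
    # Recipes are tried in order and the first match wins.
    if HIGH_SCORE.match(level):
        return "Inbox.Spam"
    if TAGGED.match(status):
        return "Likely.Spam"
    return default
```

A message scoring 14.9 gets 14 stars, so it skips the first recipe and lands in Likely.Spam via its "Yes" status.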

    Here's my question. I tell spamassassin to (re)learn the spam tagged
    messages using evolution. However, the format of the messages now has
    the spamc report with the offending message as an attachment. Is
    spamassassin "smart" enough to recognize the difference between the two
    parts of the message?

    David


  2. Re: spam learning

    David Ronis wrote:

    > I use evolution as my mail client. Evolution supports spamassassin and
    > in the past I let evolution use spamassassin to filter incoming mail.
    > Recently, I switched to spam filtering using procmail.


    [...]

    > Here's my question. I tell spamassassin to (re)learn the spam tagged
    > messages using evolution. However, the format of the messages now has
    > the spamc report with the offending message as an attachment. Is
    > spamassassin "smart" enough to recognize the difference between the two
    > parts of the message?


    http://wiki.apache.org/spamassassin/...nSpamAssassin:

    "It's OK to feed emails with Spamassassin markup into the sa-learn command --
    sa-learn will ignore any standard Spamassassin headers, and if the original
    email has been encapsulated into an attachment it will decapsulate the email.
    In other words sa-learn will undo any changes which Spamassassin has done
    before learning the spam/ham character of the email."

    --
    Sahil Tandon


  3. Re: spam learning

    On Fri, 2008-07-18 at 18:32 -0400, David Ronis wrote:
    > I use evolution as my mail client. Evolution supports spamassassin and
    > in the past I let evolution use spamassassin to filter incoming mail.
    > Recently, I switched to spam filtering using procmail. The relevant
    > section of my .procmailrc file is:


    Good move.

    > :0fw: spamc.lock
    > * < 256000
    > | spamc
    >
    > # Mails with a score of 15 or higher are almost certainly spam (with 0.05%
    > # false positives according to rules/STATISTICS.txt). Let's put them in a
    > # different mbox. (This one is optional.)
    > :0:
    > * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
    > Inbox.Spam
    >
    > # All mail tagged as spam (e.g. with a score higher than the set threshold)
    > # is moved to "probably-spam".
    > :0:
    > * ^X-Spam-Status: Yes
    > Likely.Spam


    Rather weird folder name, huh?

    > Evolution then simply reads the various mboxes.
    >
    > Here's my question. I tell spamassassin to (re)learn the spam tagged
    > messages using evolution. However, the format of the messages now has
    > the spamc report with the offending message as an attachment. Is
    > spamassassin "smart" enough to recognize the difference between the two
    > parts of the message?


    As Sahil already answered: Yes, SA will unwrap the original message and
    strip its own headers before learning.


    Since you've just moved from client side filtering, some additional
    hints:

    If you don't like the message wrapped as an attachment, the option
    'report_safe 0' will prevent this. In that case all reports go into the
    headers and the message is not otherwise altered. This looks much like
    what you are used to, with the notable exception of the additional
    report headers.
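With 'report_safe 0' (set in the SpamAssassin configuration, for example local.cf), the verdict lives only in headers such as X-Spam-Status. As a hedged sketch, assuming the SpamAssassin 3.x header layout "Yes, score=17.3 required=5.0 tests=...", a script could read it back like this; the SpamStatus type and the function name are hypothetical.

```python
import re
from email import message_from_string
from typing import NamedTuple, Optional


class SpamStatus(NamedTuple):
    is_spam: bool
    score: float
    required: float


# Leading part of an X-Spam-Status value, e.g.
# "Yes, score=17.3 required=5.0 tests=BAYES_99 ..."
STATUS_RE = re.compile(r"(Yes|No),\s*score=(-?[\d.]+)\s+required=(-?[\d.]+)")


def parse_spam_status(raw_message: str) -> Optional[SpamStatus]:
    """Return the verdict from X-Spam-Status, or None if absent/unparsable."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Spam-Status")
    if header is None:
        return None
    m = STATUS_RE.match(header)
    if m is None:
        return None
    return SpamStatus(m.group(1) == "Yes", float(m.group(2)), float(m.group(3)))
```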

    Also, learning (Bayes training) now needs to be done server side. The
    client's "Junk" buttons won't work.

    guenther


    --
    char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a \x10\xf4\xf4\xc4";
    main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


  4. Re: spam learning

    Back on-list. If my comment could be misunderstood, others might
    misunderstand it just as well.


    On Sun, 2008-07-20 at 18:42 -0400, David Ronis wrote:
    > On Sun, 2008-07-20 at 22:55 +0200, Karsten Bräckelmann wrote:
    > [snip]
    >
    > > Also, learning (Bayes training) now needs to be done server side. The
    > > client's "Junk" buttons won't work.

    >
    > In this case, the client and server are the same machine. Are you
    > saying that spamc won't look at my user .spamassassin directory when
    > invoked by procmail (via sendmail)?


    No. Actually, I was not talking about spamc at all, but about Bayes
    training. That generally means sa-learn.

    Piping through 'spamc' as a filter in your procmailrc is just fine, and
    spamd (which spamc hands the message to) will use the user's
    ~/.spamassassin/ files.
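What the ':0fw' recipe does can be sketched in Python as a rough illustration: spamc reads the message on stdin and writes the possibly rewritten message to stdout, while spamd does the scanning. The size check mirrors the "* < 256000" condition, and falling back to the original text on failure mirrors procmail's 'w' flag. The function name, the 60-second timeout, and the "spamc not installed" fallback are assumptions added for the sketch.

```python
import shutil
import subprocess

# Mirrors the procmail condition "* < 256000" (bytes).
MAX_SIZE = 256000


def filter_through_spamc(raw: bytes, max_size: int = MAX_SIZE) -> bytes:
    """Pipe a message through spamc, roughly as the ':0fw' recipe does."""
    # Only messages smaller than max_size are filtered.
    if len(raw) >= max_size:
        return raw
    spamc = shutil.which("spamc")
    if spamc is None:
        # Hypothetical fallback: no spamc on PATH, deliver unfiltered.
        return raw
    try:
        result = subprocess.run([spamc], input=raw, capture_output=True, timeout=60)
    except subprocess.TimeoutExpired:
        return raw
    # Like procmail's 'w' flag: if the filter fails, keep the original text.
    if result.returncode != 0:
        return raw
    return result.stdout
```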

    What I was talking about is using 'sa-learn' to train Bayes, which also
    uses the per-user database in that very same directory. Since your
    server and client are the same machine, SA will continue to use the
    Bayes and AWL files it used with your previous setup. No change here: a
    perfectly smooth move of your mail processing chain.

    The one thing that likely *does* change, however, is the ability to use
    your client's "Junk" buttons to actually train your Bayes on
    misclassified mail.


    Let me elaborate on this. Previously, you were using the SA Junk
    plugin in Evolution. Hitting the "Junk" or "Not Junk" button in your
    client actually called SA to learn the mail.

    Where the client is NOT the same machine as the server, these buttons
    will no longer work with server side spam filtering.

    In your case, where the server and client happen to be identical, it
    MIGHT work. It might just as well fail miserably. Evolution supports
    multiple spam filtering backends, but you most likely disabled the SA
    plugin in Evo, because you don't want Evo to process mail in your Inbox
    with SA a second time. Thus, hitting the "Junk" button will still set
    the IMAP flag, but it most likely will NOT make SA learn the mail.

    In a nutshell: With server side spam filtering, client side "buttons"
    will NOT make the server learn the message [1], even if client and
    server happen to be the same machine. If you do server side filtering,
    you need to do server side training (on error) as well.
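Server side training on error usually means running sa-learn against the misfiled mail. A minimal Python sketch, assuming the mail sits in mbox files (sa-learn's --mbox option tells it the input is mbox format) and that it runs as the mail user, so sa-learn updates that user's ~/.spamassassin/ databases. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def sa_learn_command(kind: str, mbox: Path) -> List[str]:
    """Build an sa-learn call that trains Bayes on a whole mbox file."""
    if kind not in ("spam", "ham"):
        raise ValueError(f"kind must be 'spam' or 'ham', not {kind!r}")
    return ["sa-learn", f"--{kind}", "--mbox", str(mbox)]


def train(kind: str, mbox: Path) -> int:
    """Run sa-learn as the current user and return its exit code."""
    if shutil.which("sa-learn") is None:
        raise FileNotFoundError("sa-learn is not on PATH")
    return subprocess.run(sa_learn_command(kind, mbox)).returncode
```

For example, spam that slipped into the inbox would be collected into a (hypothetical) folder and passed as train("spam", Path("Missed.Spam")), while good mail wrongly filed in Likely.Spam would be moved out and trained with "ham".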

    guenther


    [1] Unless you have a custom setup, in which case you know it works.


