collecting mail for sa-learn, how to? - SpamAssassin

Thread: collecting mail for sa-learn, how to?

  1. collecting mail for sa-learn, how to?

    Hi,

    For a mail server handling email for multiple domains, what is the
    typical/recommended way to collect emails that aren't detected as spam so
    they can be processed by sa-learn? Users download mail via POP3, so by the
    time a user sees a mail and decides that it is in fact spam, it has already
    been removed from the mail server. If the user forwards the mail to a
    special mailbox for processing, the mail is obviously now different from
    the original spam (the user is the sender, etc.). Will sa-learn still work
    using this method? If not, what else can I implement that would work?

    thanks for any comments, Andy :P


  2. Re: collecting mail for sa-learn, how to?

    On Friday 11 July 2008 17:29, andys wrote:
    > Hi,


    Hello,

    > For a mail server handling email for multiple domains, what is the
    > typical/recommended way to collect emails that aren't detected as spam so
    > they can be processed by sa-learn? Users download mail via POP3, so by the
    > time a user sees a mail and decides that it is in fact spam, it has already
    > been removed from the mail server. If the user forwards the mail to a
    > special mailbox for processing, the mail is obviously now different from
    > the original spam (the user is the sender, etc.). Will sa-learn still work
    > using this method? If not, what else can I implement that would work?


    This is what I do:
    I forward the unrecognised message to an account that processes the
    message through sal-wrapper.pl. You will find further information here:
    https://po2.uni-stuttgart.de/~rusjako/sal-wrapper

    > thanks for any comments, Andy :P


    Greetings
    Stefan



  3. Re: collecting mail for sa-learn, how to?


    On Mon, 2008-07-14 at 15:48 +0200, Stefan Jakobs wrote:
    > This is what I do:
    > I forward the unrecognised message to an account that processes the
    > message through sal-wrapper.pl. You will find further information here:
    > https://po2.uni-stuttgart.de/~rusjako/sal-wrapper


    Forwarding alters the message, so you will not get reliable results.

    You can, of course, use auto-learn and let SA take care of it.

    If you want your users to classify, the best way is to use IMAP instead
    of POP, and provide server-side training folders that sa-learn can see.
    If IMAP is not an option then this obviously won't work.

    If procmail is in use as the LDA, you could set up a rule to clone to a
    local ham folder to do scheduled training. You could get creative with
    rules and have it collect a randomly-chosen subset of the ham traffic,
    or only train where the score is low and the message is not already
    BAYES_00 or the score is high and the message is not already BAYES_99.
    However, this would be cloning users' mail (even if only temporarily),
    and you should obtain their consent before doing this.
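
    The selection rule described above (train only where the score is low
    and the message is not already BAYES_00, or the score is high and the
    message is not already BAYES_99) can be sketched in Python. This is only
    an illustration under stated assumptions: it expects an X-Spam-Status
    header of the form "score=... tests=...", and the names LOW_SCORE,
    HIGH_SCORE and training_class are hypothetical, as are the threshold
    values.

```python
import re
from email import message_from_string

# Hypothetical thresholds (assumptions, not from the thread); tune per site.
LOW_SCORE = 1.0
HIGH_SCORE = 10.0


def training_class(raw_message):
    """Return 'ham', 'spam', or None for a message SpamAssassin already scanned."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # Folded headers contain newlines and tabs; collapse them to single spaces.
    status = " ".join(status.split())

    score_match = re.search(r"score=(-?\d+(?:\.\d+)?)", status)
    if score_match is None:
        # No score recorded (message was not scanned), so do not train on it.
        return None
    score = float(score_match.group(1))

    # Low score, but Bayes is not yet sure it is ham: worth learning as ham.
    if score <= LOW_SCORE and not re.search(r"\bBAYES_00\b", status):
        return "ham"
    # High score, but Bayes is not yet sure it is spam: worth learning as spam.
    if score >= HIGH_SCORE and not re.search(r"\bBAYES_99\b", status):
        return "spam"
    return None
```

    A delivery-time hook could call training_class() on each message and
    copy it to a ham or spam training folder only when the result is not
    None, which keeps the amount of cloned user mail small.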

    --
    John Hardin KA7OHZ http://www.impsec.org/~jhardin/
    jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
    key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79


  4. Re: collecting mail for sa-learn, how to?

    On Monday 14 July 2008 16:27, John Hardin wrote:
    > On Mon, 2008-07-14 at 15:48 +0200, Stefan Jakobs wrote:
    > > This is what I do:
    > > I forward the unrecognised message to an account that processes the
    > > message through sal-wrapper.pl. You will find further information here:
    > > https://po2.uni-stuttgart.de/~rusjako/sal-wrapper

    >
    > Forwarding alters the message, so you will not get reliable results.


    Sorry, I should have been clearer. The unrecognised message is sent as an
    attachment to the forwarding message. sal-wrapper will "unpack" the
    message from the attachment and feed it to sa-learn.
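
    To illustrate the unpacking step, here is a minimal Python sketch. It is
    not sal-wrapper.pl's actual code; it assumes the mail client forwards
    the original as a message/rfc822 MIME part, and the function name
    extract_attached_messages is hypothetical.

```python
from email import message_from_bytes


def extract_attached_messages(forwarded_bytes):
    """Return the raw bytes of each message/rfc822 part in a forwarded mail."""
    originals = []

    def visit(part):
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the embedded message with its own headers intact.
            originals.append(part.get_payload(0).as_bytes())
        elif part.is_multipart():
            # Descend into multipart containers, but not into attached
            # messages themselves.
            for child in part.get_payload():
                visit(child)

    visit(message_from_bytes(forwarded_bytes))
    return originals
```

    Each returned byte string carries the original sender and headers
    rather than those of the forwarding user, so it is the part that would
    be handed to sa-learn.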



    Greetings
    Stefan



  5. Re: collecting mail for sa-learn, how to?

    andys wrote:
    > Hi,
    > For a mail server handling email for multiple domains, what is the
    > typical/recommended way to collect emails that aren't detected as spam
    > so they can be processed by sa-learn? Users download mail via POP3, so
    > by the time a user sees a mail and decides that it is in fact spam, it
    > has already been removed from the mail server. If the user forwards the
    > mail to a special mailbox for processing, the mail is obviously now
    > different from the original spam (the user is the sender, etc.). Will
    > sa-learn still work using this method? If not, what else can I
    > implement that would work?
    > thanks for any comments, Andy :P

    I have a similar situation here.
    What I do is instruct several key users to move the spam that still
    slips through to a spam folder in their client. I then copy or move
    those folders regularly (once a week or so) over the network to my
    computer, import them all into a folder in my Mozilla Thunderbird, and
    check the mails (because sometimes what users think is spam actually
    isn't). The headers remain intact.

    Then I feed my Thunderbird spam folder (mbox format) to sa-learn.
    I happen to use Thunderbird, which stores mail in the mbox file format,
    but there are programs out there that convert Outlook or Outlook Express
    folders to mbox format, too.
    Many parts of this process can be automated with scripts.
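
    As one possible way to script the last step, the Python sketch below
    builds and runs an sa-learn call for an mbox file using the --spam or
    --ham and --mbox options. The function names are hypothetical, and the
    sketch assumes sa-learn is on the PATH and that the folder has already
    been reviewed by hand.

```python
import mailbox
import subprocess


def count_messages(mbox_path):
    """Return the number of messages in an existing mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def sa_learn_command(mbox_path, kind="spam"):
    """Build the sa-learn argument list for training on one mbox file."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]


def train_from_mbox(mbox_path, kind="spam"):
    """Run sa-learn on the mbox, skipping the call when the folder is empty."""
    if count_messages(mbox_path) == 0:
        return None
    # check=True raises CalledProcessError if sa-learn exits with an error.
    return subprocess.run(sa_learn_command(mbox_path, kind), check=True)
```

    A weekly cron job could call train_from_mbox() on the checked folder,
    for example with a path to the Thunderbird "Junk" mbox file.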

    Regards.
    /Diego

