collecting mail for sa-learn, how to? - SpamAssassin


  1. collecting mail for sa-learn, how to?

    Hi,

    For a mail server handling email for multiple domains, what is the
    typical/recommended way to collect spam that wasn't detected as spam so
    it can be processed by sa-learn? Users download mail via POP3, so by the
    time a user sees a message and decides that it is in fact spam, it has
    already been removed from the mail server. If the user forwards the
    message to a special mailbox for processing, it is obviously now different
    from the original spam (the user is the sender, etc.). Will sa-learn still
    work using this method? If not, what else can I implement that would work?

    thanks for any comments, Andy :P
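
On the forwarding question: an inline forward is rewritten by the user's mail client, so it no longer looks like the original. If the client can forward *as an attachment*, the original normally travels inside a `message/rfc822` part and can be unwrapped before training. A minimal Python sketch of that unwrapping, using only the standard library. The function name `extract_forwarded_originals` is hypothetical, and the approach assumes the client really attaches the original message untouched:

```python
from email import message_from_bytes


def extract_forwarded_originals(raw_forward: bytes) -> list[bytes]:
    """Return the raw bytes of every message/rfc822 attachment in a forward."""
    # The default (compat32) policy keeps headers as they were written.
    wrapper = message_from_bytes(raw_forward)
    originals = []
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822" and part.is_multipart():
            # The payload of a parsed message/rfc822 part is a list holding
            # one embedded message object: the mail as it was attached.
            embedded = part.get_payload(0)
            originals.append(embedded.as_bytes())
    return originals
```

Each returned item could then be written to a file and passed to sa-learn instead of the forward itself.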


  2. Re: collecting mail for sa-learn, how to?


    We have had good luck by setting the email clients of *trusted* users
    to leave their mail on the server for 1 day. The users can then log in to
    their webmail and move the spam to a SPAM folder and a selection of ham
    to a HAM folder. I train bayes on those folders each night.
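
A nightly run over such folders could be sketched as follows. This is a minimal Python sketch under stated assumptions, not the setup described in the post: the Maildir++ layout (`<root>/<user>/.SPAM/cur`), the root path and the helper names `training_commands` and `run_nightly` are all hypothetical. It relies only on `sa-learn --spam` and `sa-learn --ham` accepting a directory of messages as the target:

```python
import subprocess
from pathlib import Path


def training_commands(maildir_root: Path, users: list[str]) -> list[list[str]]:
    """Build one sa-learn command per existing SPAM/HAM folder of each trusted user."""
    commands = []
    for user in users:
        for folder, flag in ((".SPAM", "--spam"), (".HAM", "--ham")):
            # Assumed layout: <root>/<user>/.SPAM/cur and <root>/<user>/.HAM/cur
            target = maildir_root / user / folder / "cur"
            if target.is_dir():
                commands.append(["sa-learn", flag, str(target)])
    return commands


def run_nightly(maildir_root: Path, users: list[str]) -> None:
    """Run the training commands, for example from a nightly cron job."""
    for cmd in training_commands(maildir_root, users):
        subprocess.run(cmd, check=True)
```

Keeping the list of users explicit is one way to enforce the "trusted users only" rule.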

    By retaining the messages I train with for seven days, I can go back and
    relearn any improperly classified messages if needed.
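
Keeping a rolling seven-day copy of what was trained might look like this. It is a minimal sketch, assuming the trained messages are copied as individual files into an archive directory (a hypothetical layout); the helper names are illustrative. To correct a mistake within that window, running sa-learn on the retained file with the right flag (`--spam` or `--ham`) should be enough, since according to the sa-learn documentation it forgets the earlier classification before relearning:

```python
import time
from pathlib import Path
from typing import Optional

RETENTION_DAYS = 7  # taken from the post; adjust as needed


def expired_messages(archive: Path, now: Optional[float] = None,
                     days: int = RETENTION_DAYS) -> list[Path]:
    """Return archived message files whose mtime is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    return sorted(p for p in archive.rglob("*")
                  if p.is_file() and p.stat().st_mtime < cutoff)


def prune(archive: Path) -> int:
    """Delete expired files and return how many were removed."""
    victims = expired_messages(archive)
    for path in victims:
        path.unlink()
    return len(victims)
```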

    The key part is *trusted* users.

    DAve


    --
    Don't tell me I'm driving the cart!


  3. Re: collecting mail for sa-learn, how to?


    On Mon, 2008-07-14 at 12:16 -0400, DAve wrote:
    > We have had good luck by setting the email clients of *trusted* users
    > to leave their mail on the server for 1 day. The users can then log in to
    > their webmail and move the spam to a SPAM folder and a selection of ham
    > to a HAM folder. I train bayes on those folders each night.


    That requires IMAP, though, correct?

    That actually may work for Andy: set up both POP and IMAP, and have
    selected users use IMAP rather than POP, with server-side ham and spam
    training folders. That way not all users have to use IMAP, which avoids
    the resulting storage requirements on the server.


    --
    John Hardin KA7OHZ http://www.impsec.org/~jhardin/
    jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
    key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
    -----------------------------------------------------------------------
    ...every time I sit down in front of a Windows machine I feel as
    if the computer is just a place for the manufacturers to put their
    advertising. -- fwadling on Y! SCOX
    ----------------------------------------------------------------------
    2 days until the 63rd anniversary of the dawn of the Atomic Age


  4. Re: collecting mail for sa-learn, how to?

    John Hardin wrote:
    > That requires IMAP, though, correct?


    That depends on the webmail software he uses and the location and
    permissions of his mailboxes. We use a webmail product that relies on
    IMAP, but some do not require IMAP services to be running.

    >
    > That actually may work for Andy - set up both POP and IMAP, and for
    > selected users have them use IMAP rather than POP and provide them with
    > server-side ham and spam training folders. That won't require all users
    > to use IMAP, with the resulting storage requirements on the server.


    Even if his webmail requires IMAP, he doesn't need to make his users use
    IMAP. We provide IMAP only for webmail, not for mail clients; IMAP
    access is available only on 127.0.0.1. I would think that would work for
    him as well. That is why we have the POP client leave the message on the
    server for 1 day: so that a spam message is still accessible to webmail
    after it arrives in the POP client's mail folder.

    DAve




  5. Re: collecting mail for sa-learn, how to?


    On Mon, 2008-07-14 at 14:11 -0400, DAve wrote:
    > John Hardin wrote:
    > > That requires IMAP, though, correct?

    >
    > That depends on the webmail software he uses


    ...where does Andy mention webmail?

    --
    John Hardin KA7OHZ http://www.impsec.org/~jhardin/


  6. Re: collecting mail for sa-learn, how to?

    DAve wrote:
    > By retaining the messages I train with for seven days, I can go back
    > and relearn any improperly classified messages if needed.
    >
    > The key part is *trusted* users.
    >

    Heh, in my case I really don't like having to re-train anything. When I
    tell sa-learn that a mail is spam, I like to be sure that it is 100% spam.
    That's why I collect spammy mail from a bunch of trusted users every week
    and re-filter it myself before passing it to sa-learn.
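
A manual re-filter pass like this can be made quicker with a one-line summary per candidate message. The following is a minimal sketch, assuming the collected candidates sit in a single mbox file (an assumption, the post does not say how they are stored); `review_lines` is a hypothetical helper built on the standard `mailbox` module:

```python
import mailbox


def review_lines(mbox_path: str) -> list[str]:
    """One 'index | From | Subject' line per message, for a manual check before training."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            f"{i} | {msg.get('From', '?')} | {msg.get('Subject', '(no subject)')}"
            for i, msg in enumerate(box)
        ]
    finally:
        box.close()
```

Anything that does not look like spam in the listing can be removed from the file before it goes to `sa-learn --spam --mbox`.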

    /Diego
    [ Ensign , you may impress *me*. -- Worf ]


  7. Re: collecting mail for sa-learn, how to?

    John Hardin wrote:
    > On Mon, 2008-07-14 at 14:11 -0400, DAve wrote:
    >> John Hardin wrote:
    >>> That requires IMAP, though, correct?

    >> That depends on the webmail software he uses

    >
    > ...where does Andy mention webmail?
    >

    He doesn't, which is why I made no assumption that IMAP would be a
    requirement. It may not be.

    DAve




  8. Re: collecting mail for sa-learn, how to?

    Diego Pomatta wrote:
    > Heh, in my case I really don't like having to re-train anything. I like
    > to be sure when I train that if I tell sa-learn that a mail is spam, it
    > is 100% spam. That's why I weekly collect spammy mail from a bunch of
    > trusted users and re-filter it myself before passing it to sa-learn.
    >


    I haven't had to re-train yet, but keeping the files for a few days just
    in case certainly doesn't hurt, and could prove useful. They could also be
    used to create a new bayes db in a hurry if something goes wrong with your
    existing db.

    DAve



  9. Re: collecting mail for sa-learn, how to?

    DAve wrote:
    > I haven't yet, but keeping the files for a few days just in case
    > certainly doesn't hurt, and could prove useful. They could also be
    > used to create a new bayes db in a hurry if something goes wrong with
    > your existing db.
    >
    > DAve
    >

    Yes, I keep the spam mail in an mbox file for that purpose, too.

    Diego
    ["Scott me up, Beammy!"]


  10. RE: collecting mail for sa-learn, how to?


    Diego Pomatta wrote:
    > Heh, in my case I really don't like having to re-train anything. I like
    > to be sure when I train that if I tell sa-learn that a mail is spam, it
    > is 100% spam. That's why I weekly collect spammy mail from a bunch of
    > trusted users and re-filter it myself before passing it to sa-learn.
    >


    Diego and list,

    Isn't the timeliness of spam training important?

    Isn't spam trained immediately (close to real time) more effective than
    spam trained well after the spammers' mail runs?

    - rh


  11. Re: collecting mail for sa-learn, how to?

    Robert - elists wrote:
    > Diego and list,
    >
    > Isn't the timeliness of the training of spam important?
    >
    > Isn't spam trained immediately (close to realtime) more effective than spam
    > trained well after spammer mail runs?
    >
    > - rh


    In my experience, yes. We train each evening, within hours of the users
    making their selections.

    DAve




  12. Re: collecting mail for sa-learn, how to?


    On Tue, 2008-07-15 at 08:55 -0400, DAve wrote:

    > They could also be used
    > to create a new bayes db in a hurry if something goes wrong with your
    > existing db.


    Absolutely. If you're manually training, you want to retain your training
    corpora to troubleshoot, correct errors, and rebuild from scratch if
    needed.
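
Rebuilding from retained corpora could be sketched like this. It is a minimal sketch, assuming the retained spam and ham are kept as two mbox files (the ham mbox and the file names are assumptions) and that the helper names are hypothetical. It uses the `--backup`, `--clear`, `--spam`, `--ham` and `--mbox` options of sa-learn:

```python
import subprocess


def rebuild_commands(spam_mbox: str, ham_mbox: str) -> list[list[str]]:
    """Commands that wipe the bayes database and relearn it from retained mbox files."""
    return [
        ["sa-learn", "--clear"],                      # drop the existing database
        ["sa-learn", "--spam", "--mbox", spam_mbox],  # relearn the retained spam
        ["sa-learn", "--ham", "--mbox", ham_mbox],    # relearn the retained ham
    ]


def rebuild(spam_mbox: str, ham_mbox: str, backup_path: str) -> None:
    """Back up the current database, then rebuild it from the corpora."""
    with open(backup_path, "wb") as backup:
        # sa-learn --backup writes the dump to standard output.
        subprocess.run(["sa-learn", "--backup"], stdout=backup, check=True)
    for cmd in rebuild_commands(spam_mbox, ham_mbox):
        subprocess.run(cmd, check=True)
```

Taking the backup first means the old database can still be brought back with `sa-learn --restore` if the rebuild turns out worse.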

    --
    John Hardin KA7OHZ http://www.impsec.org/~jhardin/


  13. Re: collecting mail for sa-learn, how to?

    Robert - elists wrote:
    > Diego and list,
    >
    > Isn't the timeliness of the training of spam important?
    >
    > Isn't spam trained immediately (close to realtime) more effective than spam
    > trained well after spammer mail runs?


    It would be even more effective to train your bayes before the spam is
    received. Come on...

    For me, the goal of bayes is to detect mail that is legitimate because
    it resembles legitimate mail. The fact that spammers change their
    practices doesn't matter, because legitimate users do not.

    Of course, learning as fast as possible helps to block new spam, but I am
    not going to watch my mailbox in real time just for that. That would be
    worse than just hitting the delete button.

