Moving ham/spam from Exchange folders to sa-learn? - SpamAssassin


  1. Moving ham/spam from Exchange folders to sa-learn?


    Hi,

    Currently running SA 3.2.5 via MailScanner frontend (CentOS5 box in the DMZ) to
    Exchange2K7. Have set up two public folders for users to dump spam/ham in.
    What's the usual way of moving these messages back to SA for learning? The
    volume isn't that high, so if there was a way to convert .MSG to a format that
    sa-learn understands, I could then just sftp it back onto the CentOS box.

    Any links or tips would be appreciated.

    Thanks.


  2. RE: Moving ham/spam from Exchange folders to sa-learn?

    Henry

    Make sure the spam/ham folders are IMAP folders. Make sure users drag the messages into those folders rather than emailing them, as forwarding will muck up the headers.

    Then grab a Perl script (heck, here's one below) to fetch the messages from those folders and feed them to Bayes.

    Make sure you run this script as the user MailScanner runs as (mailnull, postfix, etc.), not as root.

    #!/usr/bin/perl -w
    use strict;
    use Mail::IMAPClient;
    use Env qw(HOME);
    use Getopt::Long;

    use File::Temp qw/ tempfile tempdir /;

    my $imapserver = "myserver.domain.com";

    # set to 1 to enable imapclient debugging
    my $debug = 0;

    # set to 1 if running under cron (disables output)
    my $cron = 1;

    my $filename;
    my $fh;

    my %options =
    (
        uid => undef,
        pwd => undef
    );

    my $cmdsts = GetOptions ("uid=s" => \$options{uid},
                             "pwd=s" => \$options{pwd});

    if (!$options{uid}) { die "[SPAMASSASSIN] uid not set (-uid=username)\n"; }
    if (!$options{pwd}) { die "[SPAMASSASSIN] pwd not set (-pwd=password)\n"; }

    my $uid = $options{uid};
    my $pwd = $options{pwd};

    # login to imap server
    my $imap = Mail::IMAPClient->new (Server=>$imapserver, User=>$uid,
                                      Password=>$pwd, Debug=>$debug)
        or die "Can't connect to $uid\@$imapserver: $@\n";

    if ($imap)
    {
        # Deal with spam first
        learn_mail ($HOME."/spam/", ".spam", "spam", 0, "--spam --showdots");

        # Now deal with ham
        learn_mail ($HOME."/ham/", ".ham", "ham", 0, "--ham --showdots");
    }
    else
    {
        die "[SPAMASSASSIN] Unable to logon to IMAP mail account! $options{uid}\n";
    }

    exit;

    #
    # read and learn mail from imap server
    #
    # arguments
    # $dir     directory to place retrieved messages in (with trailing slash)
    # $ext     file extension to use on retrieved messages
    # $folder  imap folder name on server
    # $shared  0 if imap folder is in users mailbox
    #          1 if imap folder is in the shared name space
    # $sa_args additional arguments to specify to sa-learn
    #          (e.g. --spam or --ham)
    #
    sub learn_mail {
        my $dir     = shift (@_);
        my $ext     = shift (@_);
        my $folder  = shift (@_);
        my $shared  = shift (@_);
        my $sa_args = shift (@_);

        my $count = 0;

        # tidy up directory before run
        clear_directory ($dir, $ext);

        # read mail from server
        $count = read_mail ($dir, $ext, $folder, $shared);
        if ($count > 0)
        {
            # learn about mail
            sa_learn ($dir, $ext, $sa_args);

            # tidy up files after sa-learn is called
            clear_directory ($dir, $ext);
        }
    }


    #
    # reads mail from an imap folder and saves in a local directory
    #
    # arguments
    # $dir     directory to place retrieved messages in
    # $ext     file extension to use on retrieved messages
    # $folder  imap folder name on server
    # $shared  0 if imap folder is in users mailbox
    #          1 if imap folder is in the shared name space
    sub read_mail {
        my $dir    = shift (@_);
        my $ext    = shift (@_);
        my $folder = shift (@_);
        my $shared = shift (@_);
        my $count  = 0;
        my $target = "";

        if ($shared)
        {
            # use a shared public folder instead
            my ($prefix, $sep) = @{$imap->namespace->[2][0]}
                or die "Can't get shared folder namespace or separator: $@\n";

            $target = $prefix.
                ($prefix =~ /\Q$sep\E$/ || $folder =~ /^\Q$sep/ ? "" : $sep).
                $folder;
        }
        else { $target = $folder; }

        $imap->select ($target) or die "Cannot select $target: $@\n";

        # read through all messages, saving each one to its own temp file
        my @msgs = $imap->search("ALL");
        foreach my $msg (@msgs)
        {
            ($fh, $filename) = tempfile (SUFFIX => $ext, DIR => $dir);
            $imap->message_to_file ($fh, $msg);
            close $fh;
            $count++;
        }

        # flag the retrieved messages as deleted; they are only removed
        # from the folder once it is expunged
        $imap->delete_message (@msgs) if @msgs;

        if ($cron == 0) { print "Retrieved $count messages from $target\n"; }

        return $count;
    }

    #
    # Removes files in directory $dir with extension $ext
    #
    sub clear_directory {
        my $dir = shift (@_);
        my $ext = shift (@_);

        opendir (DIR, $dir) or die "Couldn't open dir: $dir\n";
        my @files = readdir (DIR);
        closedir (DIR);

        foreach my $file (@files)
        {
            # \Q..\E so the dot in the extension is matched literally
            if ($file =~ /\Q$ext\E$/) { unlink ($dir.$file); }
        }
    }


    #
    # execute sa-learn command
    #
    sub sa_learn {
        my $dir     = shift (@_);
        my $ext     = shift (@_);    # unused, kept for a consistent signature
        my $sa_args = shift (@_);

        # adjust this path to wherever sa-learn is installed on your system
        my $learncmd = "/usr/local/bin/sa-learn ".$sa_args." --dir ".$dir;

        if ($cron == 0) { $learncmd .= " --showdots"; }
        else { $learncmd .= " > /dev/null 2>&1"; }

        #
        # Run sa-learn on the directory (via the shell, for the redirection)
        #
        system ($learncmd) == 0 or die "system $learncmd failed: $?";
    }
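
A minimal sketch of a guard for the advice above about not running the script as root. The helper name `run_as_problem` is made up for illustration, and the reason behind the advice (presumably that sa-learn would otherwise update a Bayes database owned by, or only writable for, the wrong user) is an assumption to check against your own setup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a short description of the problem when the
# given effective UID is root, or undef when the UID is acceptable.
sub run_as_problem {
    my ($euid) = @_;
    return "running as root" if $euid == 0;
    return undef;
}

# $> is Perl's built-in variable holding the effective UID of this process.
my $problem = run_as_problem($>);
if (defined $problem) {
    warn "[SPAMASSASSIN] $problem; switch to the MailScanner user first\n";
}
```

In the script itself, a check like this could sit just before the IMAP login and use die instead of warn.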

    --
    Martin Hepworth
    Snr Systems Administrator
    Solid State Logic
    Tel: +44 (0)1865 842300

    > -----Original Message-----
    > From: news [mailto:news@ger.gmane.org] On Behalf Of Henry Kwan
    > Sent: 19 June 2008 03:10
    > To: users@spamassassin.apache.org
    > Subject: Moving ham/spam from Exchange folders to sa-learn?
    >
    >
    > Hi,
    >
    > Currently running SA 3.25 via MailScanner frontend (CentOS5
    > box in the DMZ) to Exchange2K7. Have setup two public
    > folders for users to dump spam/ham in.
    > What's the usual way of moving these messages back to SA for
    > learning? The volume isn't that high so if there was a way
    > to convert .MSG to a format that sa-learn understands, I
    > could then just sftp it back onto the CentOS box.
    >
    > Any links or tips would be appreciated.
    >
    > Thanks.

  3. Re: Moving ham/spam from Exchange folders to sa-learn?

    On 19.06.08 09:18, Martin.Hepworth wrote:

    Please, set up your mailer to wrap lines below 80 characters per line, 72 to
    76 is usually OK.

    > Make sure the spam/ham folders are imap folders. Make sure they drag the
    > messages into that folder and not email them as it'll muck up the headers
    > otherwise.


    Note that Exchange still mucks up headers and often re-encodes the body, so
    learning from these copies may be less effective if you are running SA
    before mail hits Exchange.

    --
    Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
    Warning: I wish NOT to receive e-mail advertising to this address.
    Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
    WinError #99999: Out of error messages.


  4. Re: Moving ham/spam from Exchange folders to sa-learn?

    Martin.Hepworth (solidstatelogic.com) writes:

    > Henry
    >
    > Make sure the spam/ham folders are imap folders. Make sure they drag the
    > messages into that folder and not email them as it'll muck up the headers
    > otherwise.
    >
    > Then grab a perl script (heck here's one below) to get messages from those
    > folders and place into the bayes.
    >
    > Make sure you're running this script as the user mailscanner run's as
    > (mailnull, postfix etc) is not running as root.


    Hi Martin,

    Thanks for the script, but I don't think I can use it, as Exchange2K7 has
    dropped IMAP support for public folders. Or at least that is what this blog
    post from MSFT seems to indicate:

    http://msexchangeteam.com/archive/20...20/419994.aspx

    "# E12's client access server has some limitations in public folder support: no
    IMAP, NNTP, nor OWA access to E12 public folders (OWA access to E2K and E2K3
    public folders will be possible for E12 mailbox users)."

    Perhaps I can track down some type of MSG->mbox/mbx/maildir conversion utility.


  5. Re: Moving ham/spam from Exchange folders to sa-learn?

    Henry Kwan wrote:

    > Thanks for the script but I don't think I can use it as Exchange2K7
    > has dropped IMAP support for public folders. Or least this blog post
    > from MSFT seems to indicate:
    >
    > http://msexchangeteam.com/archive/20...20/419994.aspx


    I don't have any Exchange 2007 experience, but at least on 2003 "public
    folder" and "normal mailbox into which everyone can copy e-mail and to
    which no-one can send e-mail" are two separate concepts. And you can use
    IMAP to read the contents of the latter.

    Unfortunately, setting that up involves configuring Outlook on each
    client PC, so depending on the number of users, this may not be
    practical.

    Hope this helps,

    James.
    --
    E-mail: james@ | Never ask, "Oh, why were things so much better in the old
    aprilcottage.co.uk | days?" It's not an intelligent question.
    | -- Ecclesiastes 7 v. 10


  6. Re: Moving ham/spam from Exchange folders to sa-learn?

    James Wilkinson (aprilcottage.co.uk) writes:

    > Henry Kwan wrote:
    >
    > > Thanks for the script but I don't think I can use it as Exchange2K7
    > > has dropped IMAP support for public folders. Or least this blog post
    > > from MSFT seems to indicate:
    > >
    > > http://msexchangeteam.com/archive/20...20/419994.aspx

    >
    > I don't have any Exchange 2007 experience, but at least on 2003 "public
    > folder" and "normal mailbox into which everyone can copy e-mail and to
    > which no-one can send e-mail" are two separate concepts. And you can use
    > IMAP to read the contents of the latter.


    I still can't figure out if public folders under Exchange2K7 can be IMAP-enabled
    but in the meanwhile, I have been fiddling with the script that Martin posted.

    I ended up creating a mailbox where I could move all the spam/ham into from the
    public folders. Then I would run the script from the SA machine to grab the
    spam/ham. The script dies on me after it grabs the spam (but not the ham):

    system /usr/local/bin/sa-learn --spam --showdots --dir /root/spam/ > /dev/null 2>&1 failed: 32512 at ./grabmail.pl line 180.

    I then ran sa-learn manually and it seemed to succeed:

    [boxen]# sa-learn --spam --progress --dir /root/spam/
    100% [=======================================================================] 12.58 msgs/sec 00m07s DONE
    Learned tokens from 96 message(s) (97 message(s) examined)

    Not quite automated but I could live with this since I probably will only run it
    once a week.

    Thanks.
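
A hedged reading of the 32512 in the error above: Perl's system returns a raw wait status, with the exit code in the high byte, so the number needs decoding before it means much. The sketch below does that; `describe_status` is a name invented here. 32512 is 127 shifted left by 8 bits, and 127 is the code a POSIX shell conventionally returns when it cannot find the command. That would fit sa-learn not living at /usr/local/bin/sa-learn on this box (the manual run used the bare name, found via PATH), but that is an assumption the thread does not confirm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Translate a wait status, as left in $? by system(), into readable text.
sub describe_status {
    my ($status) = @_;
    return "failed to execute" if $status == -1;
    # the low 7 bits hold the signal number if the child was killed
    return sprintf("killed by signal %d", $status & 127) if $status & 127;
    # otherwise the exit code sits in the high byte
    return sprintf("exit code %d", $status >> 8);
}

# 32512 is the value reported in the error message above.
print describe_status(32512), "\n";    # prints: exit code 127
```

Running `which sa-learn` on the SA machine and comparing the result with the path hard-coded in the script would be one way to confirm or rule this out.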

