Re: Bulk spam scan - SpamAssassin
> > spamassassin --mbox scanned.mbox
> No, SA doesn't know how to split up messages for scanning; sa-learn
> is the only SA component that can extract messages from an mbox mail
> folder.
In that case, what does the --mbox option do? Not what I expected.
> If I accidentally mangled my own personal mail flow such that
> everything got put in my system inbox, for instance, I might just move
> my system mailbox file from /var/spool/mail to ~/spammy-inbox, and run
> $ formail -s procmail -m ~/.procmailrc < ~/spammy-inbox
No accident: I've been collecting all inbound and outbound mail with an
"always_bcc" Postfix directive that pushes it through a procmail recipe
and shell script that stores it in a set of mbox files and switches
files when they get near the mbox size limit defined in Postfix.
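For anyone wanting to do the same, here is a minimal sketch of what such a rotating archiver might look like. It is not the script described above: the function `archive_message`, the variables `ARCHIVE_DIR`, `LIMIT_BYTES` and `HEADROOM_PERCENT`, and the `archive.NNNN` file naming are all hypothetical. 51200000 bytes is only Postfix's default `mailbox_size_limit`; `postconf mailbox_size_limit` shows the value actually in force. The sketch assumes each message arrives on stdin already starting with its mbox "From " line, and that the caller (for example a procmail recipe holding a lockfile) stops two copies running at once.

```shell
#!/bin/sh
# Hypothetical sketch: append one message (read on stdin) to a rotating set
# of mbox files, starting a new file before the current one would pass a
# chosen fraction of Postfix's mailbox_size_limit.
# Assumption: the message already begins with its mbox "From " line and the
# caller (e.g. a procmail recipe with a lockfile) prevents concurrent runs.

ARCHIVE_DIR=${ARCHIVE_DIR:-$HOME/mail-archive}
LIMIT_BYTES=${LIMIT_BYTES:-51200000}      # Postfix's default mailbox_size_limit
HEADROOM_PERCENT=${HEADROOM_PERCENT:-90}  # switch files at 90% of the limit

archive_message() {
    mkdir -p "$ARCHIVE_DIR" || return 1
    tmp=$(mktemp) || return 1
    cat > "$tmp"                          # buffer the message to measure it
    msg_size=$(wc -c < "$tmp")
    max=$((LIMIT_BYTES * HEADROOM_PERCENT / 100))

    # Newest file is the highest-numbered archive.NNNN; the numbers are
    # zero-padded, so a plain lexical sort orders them correctly.
    current=$(ls "$ARCHIVE_DIR"/archive.[0-9][0-9][0-9][0-9] 2>/dev/null | sort | tail -n 1)
    if [ -z "$current" ]; then
        current="$ARCHIVE_DIR/archive.0001"
    elif [ $(( $(wc -c < "$current") + $msg_size )) -gt "$max" ]; then
        n=${current##*.}
        # expr reads "0007" as decimal 7, where $(( )) may treat it as octal.
        current=$(printf '%s/archive.%04d' "$ARCHIVE_DIR" "$(expr "$n" + 1)")
    fi

    cat "$tmp" >> "$current" && echo >> "$current"   # blank line ends message
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Wiring it in could be as simple as a procmail recipe with a local lockfile that pipes a copy of each message to a small wrapper which sources this file and calls `archive_message`; that wrapper is likewise an assumption, not part of the setup described above.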
Meanwhile I've built a proper archive system with a loader that can
extract mail from mbox files, split it up and index the messages.
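The splitting step of a loader like that is small. As an illustration only (this is not the loader described above), a hypothetical `split_mbox` function could use awk to write each message to its own numbered file and print a tab-separated index line of file name and Subject. It assumes the usual mbox convention that a message starts with a "From " line at the top of the file or after a blank line, and that body lines beginning "From " were escaped as ">From " when the mbox was written.

```shell
# Hypothetical sketch: split an mbox file into one file per message under
# OUTDIR (000001.eml, 000002.eml, ...) and print "file<TAB>subject" for each.
# Assumes a message starts with "From " at the top of the file or after a
# blank line, and that body lines starting "From " were escaped as ">From ".
split_mbox() {
    mbox=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        # Decide whether this line is a message separator before updating prev.
        { is_sep = ($0 ~ /^From /) && (NR == 1 || prev == ""); prev = $0 }
        is_sep {
            if (out != "") { close(out); print out "\t" subj }
            n++
            out = sprintf("%s/%06d.eml", dir, n)
            subj = ""
            in_header = 1
            next                            # drop the From separator line
        }
        in_header && $0 == "" { in_header = 0 }
        in_header && subj == "" && tolower($0) ~ /^subject:/ {
            subj = substr($0, 9)            # text after "Subject:"
            sub(/^[ \t]+/, "", subj)
        }
        out != "" { print > out }
        END { if (out != "") { close(out); print out "\t" subj } }
    ' "$mbox"
}
```

For example, `split_mbox archive.0001 ./split > index.txt` would leave one file per message in `./split` and one index line per message in `index.txt`.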
I'm pretty certain that some of the mbox files precede me installing SA,
so I'd like to push them through SA before pushing them through the
archive loader and, hopefully, end up with a similar spam-scanned set of messages.
> (I'd move the mailbox out of /var/spool/mail so I didn't keep
> appending old messages to the end of it over and over; some mail
> *does* get delivered there.)
Yes, that makes sense. Thanks for the formail tip. I can build a script
around that to do my scan and refiling job.
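A script along those lines might look something like this minimal sketch. It assumes formail (from the procmail package) is installed and spamd is running; `scan_mbox`, `scan_all`, the `SCAN_CMD` variable (spamc by default) and the `.scanned` suffix are all made up for the example. It also relies on spamc/spamd handing each message back with its "From " line intact, which is worth checking on a small sample before trusting the output. By default spamc falls back to passing a message through unchanged if it cannot talk to spamd (see its `-x` option), so a problem there should show up as unscanned mail rather than lost mail.

```shell
# Hypothetical sketch: spam-scan archived mbox files one message at a time,
# writing NAME.scanned next to each original NAME.
# Assumes formail (from procmail) is installed and that SCAN_CMD, spamc by
# default, reads one message on stdin and writes the result to stdout.
SCAN_CMD=${SCAN_CMD:-spamc}

scan_mbox() {
    in=$1
    out=$in.scanned
    # formail -s starts the command once per message and feeds it that
    # message; the extra "echo" keeps a blank line between output messages.
    formail -s sh -c "$SCAN_CMD; echo" < "$in" > "$out.tmp" &&
        mv "$out.tmp" "$out"
}

scan_all() {
    for f in "$@"; do
        case $f in *.scanned|*.tmp) continue ;; esac   # skip our own output
        [ -f "$f" ] || continue
        echo "scanning $f" >&2
        scan_mbox "$f" || { echo "failed: $f" >&2; return 1; }
    done
}
```

Running `scan_all ~/mail-archive/*` (path hypothetical) would then leave a scanned copy beside each original, ready for the archive loader.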
> Hmm. I'm pretty sure it's pointed out in several places that SA does
> not know how to process more than one message per call, but I've been
> using it long enough that I just know that's how it works.
I'd got that message for SA's normal operation and have looked at the
innards of spamc closely enough to see that it can only handle a single
message at a time. As I said above, it was the --mbox option that
confused me because, in general, an mbox file contains multiple messages.
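A quick way to see how many messages an mbox holds, and to check that a scanned copy has the same count as its original, is a one-line awk count. This is a sketch: `count_mbox_messages` is a hypothetical name, and it assumes the usual convention that a message starts with a "From " line at the top of the file or right after a blank line.

```shell
# Count messages in an mbox file: a message starts with a "From " line at
# the top of the file or right after a blank line (this assumes body lines
# beginning "From " were escaped as ">From " when the mbox was written).
count_mbox_messages() {
    awk '/^From / && (NR == 1 || prev == "") { n++ } { prev = $0 } END { print n + 0 }' "$1"
}
```

Comparing `count_mbox_messages archive.0001` with the count for its scanned copy is a cheap sanity check before loading anything.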
Given that I'm running spamc + spamd, I have two final questions:
- would it be better to use spamc/spamd for the scan in place of spamassassin?
- if spamd is the way to go, do I need to stop my normal mail
system while the scan is running or will spamd keep the two
streams separate? I assume it does, but it's always good to check.