Re: sa-learn journal location for teaching spamassassin on multiple hosts
On Fri, Nov 7, 2008 at 4:45 AM, Samy Ascha, Xel Media B.V. wrote:
> I have recently set up a mailbox and a sa-learn script to start teaching
> SpamAssassin. This was all no problem, but:
> We have an MX group of usually about 3 MTAs, which all run their own content
> filter (amavis) and thus use their own SpamAssassin database. When we
> start teaching SpamAssassin with sa-learn, I need to somehow sync the
> results in the journal to all these hosts.
> I've checked out the --no-sync and --sync options and I think these options
> will give me exactly the tools I need for this job.
> I need to know the location of the journal though and I need to know if
> there are any pitfalls when syncing a SpamAssassin with a journal from
> another one on another server.
> Has anyone got experience with syncing sa-learn between multiple MTAs? How
> did you solve this? Can SA sync with a journal in an arbitrary location, or
> does it look for it in one preconfigured place?
> I hope you have some interesting thoughts about this issue.
Ultimately, you're not syncing 'sa-learn', you're syncing the Bayes
DB that sa-learn (and spamd) writes to. There are a few ways to go
about sharing the Bayesian database. Probably the best bet would be to
store the Bayes DB in MySQL and point SA on all 3 servers at it,
ideally with the database on a 4th server (hey, you can put the AWL
info into MySQL as well... may as well hit that up at the same time).
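For illustration, here is a rough sketch of what pointing SA at a shared MySQL Bayes DB might look like. It only prints a local.cf fragment; it changes nothing on the system. The host name, database name, user, password and the "shared" override user are all made-up placeholders, and it assumes the Bayes and AWL tables have already been created from the MySQL schema files that ship with SpamAssassin.

```shell
#!/bin/sh
# Hypothetical values: replace the host, database name and credentials.
BAYES_DB_HOST="${BAYES_DB_HOST:-db.example.com}"
BAYES_DB_NAME="${BAYES_DB_NAME:-sa_bayes}"
BAYES_DB_USER="${BAYES_DB_USER:-sa_user}"
BAYES_DB_PASS="${BAYES_DB_PASS:-changeme}"

# Print a local.cf fragment that points Bayes (and the AWL) at MySQL.
print_bayes_sql_cf() {
  cat <<EOF
bayes_store_module   Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn        DBI:mysql:${BAYES_DB_NAME}:${BAYES_DB_HOST}
bayes_sql_username   ${BAYES_DB_USER}
bayes_sql_password   ${BAYES_DB_PASS}
# Assumption: every host should learn into one shared token set,
# whatever local user the content filter runs as.
bayes_sql_override_username  shared

auto_whitelist_factory  Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn            DBI:mysql:${BAYES_DB_NAME}:${BAYES_DB_HOST}
user_awl_sql_username   ${BAYES_DB_USER}
user_awl_sql_password   ${BAYES_DB_PASS}
EOF
}

print_bayes_sql_cf
```

The same fragment would go on all 3 MTAs, so an sa-learn run on any one of them ends up in the same database.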
You could probably go the --sync and --no-sync route if you fiddled
with it enough (never tried it), but honestly a single MySQL DB for
Bayes would probably be a lot simpler if you have any experience at
all with MySQL. It's been good for performance for us even when used
on a single server, and it's been pretty bulletproof: in use
for years. The only tip you really need here is to run OPTIMIZE TABLE
every now and then.
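As a sketch of the OPTIMIZE TABLE tip: the snippet below just prints the statements, so it is safe to run anywhere. The table names are assumed from the stock SpamAssassin MySQL schema (plus "awl" if you moved the AWL over too); check them against your own database before piping the output into a mysql client.

```shell
#!/bin/sh
# Assumed table names from the stock schema; override BAYES_TABLES if yours differ.
BAYES_TABLES="${BAYES_TABLES:-bayes_token bayes_seen bayes_vars bayes_expire awl}"

# Emit one OPTIMIZE TABLE statement per table.
optimize_sql() {
  for t in $BAYES_TABLES; do
    echo "OPTIMIZE TABLE $t;"
  done
}

optimize_sql
```

From a weekly cron job this could be fed to the client, for example `optimize_sql | mysql -h db.example.com -u sa_user -p sa_bayes` (host, user and database name are placeholders). The statement may hold locks while it runs, so a quiet hour is probably the safer choice.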
An alternative hacky solution: turn off autolearn on 2 of the 3, and
do sa-learns and autolearning on the 3rd. Then nightly rsync all the
bayes DB files over to the other 2 servers and restart spamd. Not
pretty, but it should work.
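A minimal sketch of that nightly copy, to be run on the learning host. Everything here is an assumption about your layout: the Bayes directory, the peer host names and the restart command are placeholders (with amavis you would restart amavisd rather than spamd). By default it only prints what it would do; set DRY_RUN=0 to actually run the commands. On the other 2 hosts, autolearn would be switched off with `bayes_auto_learn 0` in local.cf.

```shell
#!/bin/sh
# Placeholders: adjust the path, peers and restart command to your setup.
BAYES_DIR="${BAYES_DIR:-/var/lib/amavis/.spamassassin}"
PEERS="${PEERS:-mx2.example.com mx3.example.com}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

push_bayes() {
  # Fold pending journal entries into the DB files before copying them.
  run sa-learn --sync
  for host in $PEERS; do
    run rsync -a "$BAYES_DIR/bayes_toks" "$BAYES_DIR/bayes_seen" "$host:$BAYES_DIR/"
    run ssh "$host" "service spamassassin restart"
  done
}

push_bayes
```

Running `sa-learn --sync` first means the copied files already include whatever was still sitting in the journal, so the journal itself does not need to travel.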