On Apr 21, 2008, at 8:40 AM, Chris St. Pierre wrote:
> On Mon, 21 Apr 2008, Michael Parker wrote:
>
>> select * from bayes_vars;
>
> ...
> 2289 rows in set (0.00 sec)
>
>> What user do you run bayes under on your MXs?
>
> I think you've found the issue. We run as spamd.
>
> # sa-learn -u spamd --dump magic
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 1492123 0 non-token data: nspam
> 0.000 0 660634 0 non-token data: nham
> 0.000 0 73178711 0 non-token data: ntokens
> 0.000 0 1189775610 0 non-token data: oldest atime
> 0.000 0 1208785034 0 non-token data: newest atime
> 0.000 0 0 0 non-token data: last journal sync atime
> 0.000 0 0 0 non-token data: last expiry atime
> 0.000 0 0 0 non-token data: last expire atime delta
> 0.000 0 0 0 non-token data: last expire reduction count
>
> That leads to two issues:
>
> 1. I need to straighten things out and figure out why I've got a
> strange mix of per-user and global data in my Bayes DB. Whee.



If you want a global database, set bayes_sql_override_username and
then run sa-learn -u <user> --clear for everything else (PITA, I
know). Personally, I don't think individual bayes dbs are a problem
if you've got the space and CPU on your database machine. See below
for some solutions.
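
A minimal sketch of that cleanup, assuming the usernames come from the
username column of bayes_vars and that spamd is the global user to
keep. The function name clear_other_users and the KEEP_USER and
DRY_RUN variables are made up for illustration, and by default it only
prints the commands it would run:

```shell
#!/bin/sh
# Sketch only: clear every per-user Bayes dataset except the global one.
# Assumption: usernames arrive one per line on stdin.
KEEP_USER="${KEEP_USER:-spamd}"   # assumed global user to keep
DRY_RUN="${DRY_RUN:-1}"           # 1 = print commands, 0 = run them

clear_other_users() {
    while IFS= read -r user; do
        # Skip blank lines and the user whose data we want to keep.
        case "$user" in
            ''|"$KEEP_USER") continue ;;
        esac
        if [ "$DRY_RUN" = "1" ]; then
            printf 'sa-learn -u %s --clear\n' "$user"
        else
            # </dev/null keeps sa-learn from eating the username list.
            sa-learn -u "$user" --clear </dev/null
        fi
    done
}

# Possible usage (the database name "spamassassin" is hypothetical):
#   mysql -N -B -e 'SELECT username FROM bayes_vars' spamassassin \
#       | clear_other_users
```

Check the dry-run output before setting DRY_RUN=0, since --clear wipes
that user's data.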

>
>
> 2. Does this mean that, if I use per-user Bayes, I have to run
> expiration as each user individually?
>
> Manual expiration was recommended to me a long time ago as a way to
> increase database performance, but it seems like it may not be worth
> it if I have to run N forced expirations, for potentially large values
> of N.
>


That advice holds for DBM-based bayes databases, but generally (with
an exception I'll talk about in a second) MySQL-based bayes expiration
is very fast (just a few seconds). I would go ahead and turn
auto-expire on, after running a manual expire to clear out the current
backlog.
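
For the one-time backlog run with per-user data, something like the
following might do, under the same assumption that usernames are read
one per line (for example from bayes_vars). The function name
expire_all_users and the DRY_RUN variable are illustrative only:

```shell
#!/bin/sh
# Sketch only: force one Bayes expiry run per user.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = run them

expire_all_users() {
    while IFS= read -r user; do
        # Ignore blank lines in the username list.
        case "$user" in
            '') continue ;;
        esac
        if [ "$DRY_RUN" = "1" ]; then
            printf 'sa-learn -u %s --force-expire\n' "$user"
        else
            # </dev/null keeps sa-learn from reading the username list.
            sa-learn -u "$user" --force-expire </dev/null
        fi
    done
}

# Possible usage (the file name users.txt is hypothetical):
#   expire_all_users < users.txt
```

Once the backlog is gone, setting bayes_auto_expire 1 in local.cf
should let expiry happen on its own from then on.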

One reason that expiration slows down is an unoptimized db. I've
found that for my small setup, running an optimization every couple
of weeks gives much better performance. It looks like you get a lot
more traffic, so I would recommend running it more often. With
frequent optimizations and auto-expire your database will stay in
much better shape.
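
As a rough sketch of the optimization step, assuming the table names
from the stock SpamAssassin MySQL schema (adjust BAYES_TABLES if yours
differ). The function name optimize_sql is made up; it only builds the
statement, so you can inspect it before piping it to mysql:

```shell
#!/bin/sh
# Sketch only: build one OPTIMIZE TABLE statement for the Bayes tables.
# Assumption: these are the table names in your schema.
BAYES_TABLES="${BAYES_TABLES:-bayes_token bayes_seen bayes_expire bayes_vars}"

optimize_sql() {
    list=""
    for t in $BAYES_TABLES; do
        # Append with ", " between names, nothing before the first one.
        list="${list:+$list, }$t"
    done
    printf 'OPTIMIZE TABLE %s;\n' "$list"
}

# Possible usage (the database name "spamassassin" is hypothetical):
#   optimize_sql | mysql spamassassin
```

Dropping that pipeline into a weekly or nightly cron job is one way to
keep it running on a schedule.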

Michael


> Thanks for your help.
>
> Chris St. Pierre
> Unix Systems Administrator
> Nebraska Wesleyan University
>