Len,

> FreeBSD 6.2, 2 GHz, 1 GB RAM, Amavisd-new, 10 servers
> 400 KB max msg size to scan
> TIMING shows sa-check taking 85% - 90%
> spamassassin rulesets:
> updates.spamassassin.org saupdates.openprotect.com sought.rules.yerp.org
> We run sa-compile.
> external checks: pyzor, razor, dcc
> In business hours (08:00-17:00), traffic inbound is about 400 msgs/hour
> Traffic outbound, is about 1250 msgs/hour.
> SA RBL activated, no RBLs timing out or long responses
> bayes uses Berkeley db. I was told SQL was faster, but I don't think it
> will matter that much in our case.
>
> amavisd-nanny shows all 10 servers busy, and occasional time outs.
> load average about 10, CPU idle 0%
> WCPU shows the amavis/vscan processes each taking 7% - 10%
> iostat shows spiky disk i/o with 2-3 seconds of 0 KB i/o between spikes
> (disk not saturated), leads us to think a memory disk won't make any
> difference. free + inactive memory totals about 200 - 300 MB (an amavis
> process takes about 75MB), so not memory constrained.
>
> The machine gets overloaded during peak business hours, with the
> postfix-to-vscan delivery delay taking sometime 100s to 1000s of seconds.
> When falls behind, can take hours to catch up.


Great report, you've covered practically all areas!

It is clear that CPU is the bottleneck here,
which rules out DNS, RBL, i/o and memory.
A quick solution is to switch to a dual (or quad) CPU box;
otherwise, read on.

So you are doing 1650 msgs/h on average during working hours,
peaking possibly 50% higher, which is roughly right (on the low side)
for a single processor with many rules (especially third party
rules which are often more CPU hungry than the built-in rules).
The other two CPU consumers in your case are pyzor and bdb-based bayes.
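
For a rough feel of what those numbers mean per message, here is a
back-of-the-envelope sketch in Python. It assumes the CPU is fully busy
(your 0% idle) and that all of it goes into mail processing, which is a
simplification. The 1.5 peak factor is only my guess from above, and the
function name is made up for illustration.

```python
# Figures quoted in the report above (messages per hour, business hours).
INBOUND_PER_HOUR = 400
OUTBOUND_PER_HOUR = 1250
PEAK_FACTOR = 1.5  # "peaking possibly 50% higher": an estimate, not a measurement


def cpu_seconds_per_message(msgs_per_hour: float, cpus: int = 1) -> float:
    """CPU time available per message, assuming the CPUs are fully busy."""
    return cpus * 3600.0 / msgs_per_hour


average = INBOUND_PER_HOUR + OUTBOUND_PER_HOUR
peak = average * PEAK_FACTOR

if __name__ == "__main__":
    for label, rate in (("average", average), ("peak", peak)):
        for cpus in (1, 2):
            budget = cpu_seconds_per_message(rate, cpus)
            print(f"{label}: {rate:.0f} msgs/h, {cpus} CPU(s): "
                  f"{budget:.2f} CPU-seconds per message")
```

On one CPU this leaves a little over 2 seconds of CPU per message on
average and about 1.5 seconds at the assumed peak, which is why a second
CPU gives immediate relief.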

> sa-check, as seen in amavis log line with TIMING [total xxx ms].
> Without going to sa 3.3, I don't know how to break sa-check ms
> into time per sa action.


Luckily you are on FreeBSD, so mixing perl modules from CPAN and
ports is not a problem. I'd suggest that you try installing
the current SA 3.3 (tarball or svn), which, as you've noted, does
provide a timing breakdown in the log - which will likely confirm
my pointing at rules, bdb and pyzor (in this order).

SA 3.3 is currently fully backwards and forwards compatible with 3.2.5,
so you can switch back to a ports-based 3.2.5 at any time if you so choose
(just pkg_delete the directly installed SA, and reinstall SA from ports).

For the past few months 3.3 has been very stable - and here is a paradox:
since some of the developers run current 3.3 on their production boxes and
carefully monitor their logs, and since all bug fixes go into trunk first
(and only some of them are later backported to what will be 3.2.6),
the current version tends to be more bug-free and better tested than
3.2.5 or 3.2.6! Sorry for the heretical thought! Of course the usual
disclaimer holds: no promises that this will stay so in the future, it is
just an observation of the current state of affairs from my viewpoint.

My advice, regardless of the outcome, is to switch Bayes (and AWL)
to MySQL. The documentation is in sql/README, sql/README.bayes
and sql/README.awl. Make sure to use the
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
and not the plain Mail::SpamAssassin::BayesStore::SQL, and that
the storage engine is InnoDB - works well with mysql51 from ports.
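
As a starting point, a local.cf fragment along these lines should do.
Treat it as a sketch: the database names, host, user name and password
below are placeholders I made up, and the tables must first be created
from the schema files described in the READMEs above.

```
# Bayes in MySQL (use the MySQL-specific module, not the generic SQL one)
bayes_store_module      Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn           DBI:mysql:sa_bayes:localhost
bayes_sql_username      sa_user
bayes_sql_password      changeme

# AWL in MySQL
auto_whitelist_factory  Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn            DBI:mysql:sa_awl:localhost
user_awl_sql_username   sa_user
user_awl_sql_password   changeme
```

If you want to keep your existing Bayes data, dump it with
sa-learn --backup before the switch and load it with sa-learn --restore
afterwards.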

Mark