> I saw a number of posts on this list earlier indicating that Bayesian
> filter learning and/or application of learned information wasn't working
> properly if the Bayesian analysis data were stored in a MySQL database


> What's the status of this bug, if it is one, or if it's a
> misconfiguration issue, what should I know to avoid it?


I have been using Bayes with MySQL for about 2 years and have found it
to work perfectly. I have experienced no bugs. By comparison, my previous
configuration with the default DB files did not work well at all.

I installed it according to the manual. It is not a big server (about 15
users), so I use a global database with a fixed user.
My bayes-related and awl-related configuration from local.cf:

bayes_expiry_max_db_size 500000
bayes_sql_override_username mail
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:sa:my-server-name.domain.com
bayes_sql_username
bayes_sql_password

bayes_ignore_header X-Account-Key
bayes_ignore_header X-UIDL
bayes_ignore_header X-Mozilla-Status
bayes_ignore_header X-Mozilla-Status2
bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Status

use_auto_whitelist 1
user_awl_sql_override_username mail
auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn DBI:mysql:sa:my-server.name.domain.com
user_awl_sql_username
user_awl_sql_password
user_awl_sql_table awl

My bayes and awl tables were created according to the manual, but I
added a timestamp column to the awl table and to the bayes_seen table so
that old rows can be expired by date.
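
A rough sketch of what such a date-based expiry could look like. It only
generates the MySQL statements; the column name `lastupdate` and the
180-day cutoff are illustrative assumptions, not values taken from the
setup described above.

```python
# Hypothetical column name; the post does not say what the column is called.
TS_COLUMN = "lastupdate"


def add_timestamp_sql(table):
    """Return MySQL DDL that adds an auto-maintained timestamp column."""
    return (
        f"ALTER TABLE {table} ADD COLUMN {TS_COLUMN} TIMESTAMP NOT NULL "
        "DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP"
    )


def expire_sql(table, max_age_days):
    """Return a MySQL DELETE that drops rows older than max_age_days."""
    return (
        f"DELETE FROM {table} WHERE {TS_COLUMN} < "
        f"DATE_SUB(NOW(), INTERVAL {int(max_age_days)} DAY)"
    )


# Print the statements for both tables mentioned above.
for table in ("awl", "bayes_seen"):
    print(add_timestamp_sql(table) + ";")
    print(expire_sql(table, 180) + ";")
```

The ALTER statements would be run once; the DELETE statements could then
be run periodically, for example from cron through the mysql client.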

Additionally, I added a feature to learn from "spam" and "nonspam" IMAP
folders, into which I manually copy spam or ham that was not already
auto-learnt. I didn't change any of the default scores: 5 is still the
spam threshold and 3.5 is still the BAYES_99 score when used together
with network tests.
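
Learning from IMAP folders along these lines can be sketched as follows.
This is only one possible approach, not the actual script in use here:
it assumes the folders are literally named "spam" and "nonspam", that
`sa-learn` is on the PATH, and that piping a single message to
`sa-learn` on stdin is acceptable. The helper name `learn_folder` is
made up for illustration.

```python
import subprocess

# Assumed mapping from IMAP folder name to the sa-learn mode flag.
FOLDERS = {"spam": "--spam", "nonspam": "--ham"}


def learn_folder(conn, folder, run=subprocess.run):
    """Pipe every message in `folder` to sa-learn; return the message count.

    `conn` is a logged-in imaplib connection (or any object with the same
    select/search/fetch methods). `run` can be swapped out for testing.
    """
    flag = FOLDERS[folder]
    conn.select(folder, readonly=True)       # do not alter message flags
    _typ, data = conn.search(None, "ALL")    # data[0] is e.g. b"1 2 3"
    count = 0
    for num in data[0].split():
        _typ, parts = conn.fetch(num, "(RFC822)")
        raw_message = parts[0][1]            # full message as bytes
        run(["sa-learn", flag], input=raw_message, check=True)
        count += 1
    return count
```

To use it, one would open a connection with `imaplib.IMAP4_SSL(host)`,
call `login(user, password)`, and then call `learn_folder(conn, "spam")`
and `learn_folder(conn, "nonspam")`. Since sa-learn records the messages
it has already learnt, re-running it over the same folders should mostly
skip them.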

An interesting observation: spam messages that are half spam and half
mumbo-jumbo of unrelated random text, presumably meant to confuse Bayes
filters, in fact almost always hit BAYES_99. I can only imagine that the
additional random text is not really random but is taken from a fixed
library that is not very big and does not change very often.

Alex