MySQL and Size Of bayes_expiry_max_db_size - SpamAssassin


  1. MySQL and Size Of bayes_expiry_max_db_size

    Greetings,

    This weekend I created a MySQL db to store my bayes tokens. It seems to be
    working well but I'm a little puzzled by the default size of
    bayes_expiry_max_db_size. I understand that the default size is 150,000
    which seems very low as it took only one day to reach 100,000 tokens.

    Was the default size set that low because of the performance of the default db?
    Is it reasonable to set it to a much higher number considering that I am
    using a SQL db?

    Thanks for any help!

    Nedry


  2. Re: MySQL and Size Of bayes_expiry_max_db_size

    On 5/27/08 at 4:33 PM -0500 Michael Parker wrote:
    >You should adjust it for whatever works best for your user base and
    >the resources you have available on your database.


    Of course. But how would I figure out what works best? How can I tell if
    it is working poorly or very well?

    I'm looking for a way to calculate or experimentally find the sweet spot
    for bayes_expiry_max_db_size. Is there an ideal range? Or a maximum size?
    What happens if the size is too high?

    The server in question has a dual core processor with 2 GB of RAM. There
    are currently about 150 users on this box and growing. SpamAssassin is
    version 3.2.4.

    Any suggestions?

    Nedry


  3. Re: MySQL and Size Of bayes_expiry_max_db_size

    On Wednesday, 28 May 2008 Larry Nedry wrote:
    > But how would I figure out what works best? How can I tell if
    > it is working poorly or very well?


    We use bayes_expiry_max_db_size 2123456 and Bayes is absolutely accurate
    for us. I don't think you can really calculate it; it depends on how many
    different spams/hams you get, and so on how many tokens you need for it to
    be good enough. Over time we experimented with the value a bit, but more
    than 2 million tokens doesn't help any more for us: Bayes is 100% correct
    now. We do train it well, though.
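
    For reference, a minimal sketch of what that setting could look like in
    a site-wide config file. The file name local.cf and the choice to set it
    site-wide are assumptions; the value is simply the one quoted above, not
    a recommendation.

```
# local.cf (sketch; the value is the example from this post)
# Raise the Bayes token limit from the default of 150000.
bayes_expiry_max_db_size 2123456
```

    Running "spamassassin --lint" after editing is a cheap way to catch a
    typo in the option name.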

    Regards, zmi
    --
    // Michael Monnerie, Ing.BSc ----- http://it-management.at
    // Tel: 0676/846 914 666 .network.your.ideas.
    // PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
    // Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
    // Keyserver: www.keyserver.net Key-ID: 1C1209B4



  4. Re: MySQL and Size Of bayes_expiry_max_db_size


    On Wed, May 28, 2008 00:04, Larry Nedry wrote:

    > I'm looking for a way to calculate or experimentally find the sweet spot
    > for bayes_expiry_max_db_size. Is there an ideal range? Or a maximum size?
    > What happens if the size is too high?


    What happens is that the bigger the size, the more ham/spam training needs
    to be performed to have an effect on Bayes.

    The lower the Bayes size, the faster the learning, but it is also a bit
    unstable.

    In short:

    1: if you want manual training, keep the size low
    2: otherwise, raise the Bayes size to compensate for the lack of manual
       training

    Always monitor Bayes anyway; that will show whether it works or not. For
    Bayes autolearn, one can also make the range bigger to get more static
    learning, so if Bayes updates take a lot of time per message, this is how
    to make it quieter.

    Most important is that Bayes is doing it right, e.g. only giving bayes_99
    to spam and bayes_00 to ham.

    Last but not least, make sure an equal number of ham and spam signatures
    have been learned.
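
    The monitoring advice above can be sketched roughly as follows. This
    assumes the counters come from "sa-learn --dump magic", which prints
    "non-token data" lines for nspam, nham and ntokens. The sample text is
    made up for illustration, and parse_magic and bayes_report are
    hypothetical helper names, not part of SpamAssassin.

```python
# Illustrative sample only: the layout mimics `sa-learn --dump magic`
# output, and the numbers are invented for this sketch.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      41250          0  non-token data: nspam
0.000          0      38900          0  non-token data: nham
0.000          0     148212          0  non-token data: ntokens
"""


def parse_magic(dump_text):
    """Return {name: value} for every 'non-token data' line in the dump."""
    stats = {}
    for line in dump_text.splitlines():
        fields, sep, name = line.partition("non-token data:")
        if not sep:
            continue  # not a counter line
        cols = fields.split()
        if len(cols) >= 3:
            # The third column carries the value for these counter lines.
            stats[name.strip()] = int(cols[2])
    return stats


def bayes_report(stats, max_db_size):
    """Summarise how full the token table is and how balanced training is."""
    nspam, nham, ntokens = stats["nspam"], stats["nham"], stats["ntokens"]
    return {
        # Fraction of bayes_expiry_max_db_size currently in use.
        "fill": ntokens / max_db_size,
        # Learned spam per learned ham; 1.0 means equal training.
        "spam_to_ham": nspam / nham if nham else float("inf"),
    }


stats = parse_magic(SAMPLE_DUMP)
report = bayes_report(stats, max_db_size=150000)
print("token fill: {:.1%}".format(report["fill"]))
print("spam/ham ratio: {:.2f}".format(report["spam_to_ham"]))
```

    In practice the dump text would come from running the command (for
    example via subprocess) instead of a hard-coded sample, and the two
    numbers would be logged over time alongside how often bayes_99 and
    bayes_00 hit the wrong kind of mail.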



    Benny Pedersen
    Need more webspace ? http://www.servage.net/?coupon=cust37098


  5. Re: MySQL and Size Of bayes_expiry_max_db_size

    Larry Nedry wrote:
    > Of course. But how would I figure out what works best? How can I tell if
    > it is working poorly or very well?


    Results. Customer/user complaints are always useful (if perhaps
    not really desirable); customer/user *feedback* is critical on
    anything bigger than a trivial personal or very-small-business system.
    You have to feed in a variety of legitimate email - finding spam to feed
    in shouldn't be a problem.

    > I'm looking for a way to calculate or experimentally find the sweet spot
    > for bayes_expiry_max_db_size. Is there an ideal range? Or a maximum size?
    > What happens if the size is too high?


    I've found 600,000 works pretty well on a smallish filter server (about
    the same hardware class as your system, AKA "overkill"); for the
    larger cluster serving between high single-digit and low double-digit
    thousands of accounts, plus filtering outbound mail, I've been playing
    with various settings on and off for several months now. I still
    haven't found a happy balance.

    (Side note - This question in various forms has been asked 3 or 4 times
    in the past month or so - could someone who really knows the Bayes
    innards please speak up? As noted near the beginning of this thread,
    the default number of tokens is too small for anything much bigger than
    purely personal/per-user Bayes.)

    Benny Pedersen's reply a few messages back includes a few points that
    made my own experiments become a lot more coherent; I'll be doing
    further tuning based on that. At the moment, for my usage, I'm looking
    at ~2M tokens as a floor.

    -kgd

