Reduce Bayes DB - SpamAssassin


Thread: Reduce Bayes DB

  1. Reduce Bayes DB


    Hi,

    I would like to reduce the size of my Bayes DB.
    The file size of bayes_seen.MYI is now close to 1 GB.

    # sa-learn -u filter --dump magic
    0.000 0 3 0 non-token data: bayes db version
    0.000 0 38413200 0 non-token data: nspam
    0.000 0 48964536 0 non-token data: nham
    0.000 0 17639510 0 non-token data: ntokens
    0.000 0 1213554746 0 non-token data: oldest atime
    0.000 0 1213597971 0 non-token data: newest atime
    0.000 0 0 0 non-token data: last journal sync atime
    0.000 0 1213597961 0 non-token data: last expiry atime
    0.000 0 43200 0 non-token data: last expire atime delta
    0.000 0 9765 0 non-token data: last expire reduction count

    Yes, that's a somewhat larger site ;-)

    Two months ago the database was about 4 GB, so I purged everything and
    created a new one. Now it is very large again, and I am looking for a way
    to reduce it.

    Greetings

    Frank
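
    The magic values in the dump are easier to judge once the atime fields
    are turned into dates. The following is a minimal Python sketch, assuming
    the dump has been saved as text with the wrapped labels rejoined; the
    helper names parse_magic and to_utc are made up for illustration and are
    not part of SpamAssassin.

```python
from datetime import datetime, timezone

# Sample lines copied from the dump in the post (wrapped labels rejoined).
DUMP = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 38413200 0 non-token data: nspam
0.000 0 48964536 0 non-token data: nham
0.000 0 17639510 0 non-token data: ntokens
0.000 0 1213554746 0 non-token data: oldest atime
0.000 0 1213597971 0 non-token data: newest atime
0.000 0 43200 0 non-token data: last expire atime delta
"""


def parse_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        head, sep, label = line.partition("non-token data:")
        if not sep:
            continue  # not a magic line, skip it
        fields = head.split()
        values[label.strip()] = int(fields[2])
    return values


def to_utc(epoch):
    """Render a Unix timestamp as a readable UTC string."""
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S")


magic = parse_magic(DUMP)
span = magic["newest atime"] - magic["oldest atime"]

print("tokens:      ", magic["ntokens"])
print("oldest atime:", to_utc(magic["oldest atime"]))
print("newest atime:", to_utc(magic["newest atime"]))
print("token window:", round(span / 3600, 1), "hours")
```

    For this dump the oldest and newest atime are roughly twelve hours apart,
    which lines up with the 43200-second last expire atime delta.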




  2. Re: Reduce Bayes DB

    furban wrote:
    > Hi,
    >
    > I would like to reduce the size of my Bayes DB.
    > The file size of bayes_seen.MYI is now close to 1 GB.

    Bayes_seen doesn't auto-expire, but you can usually purge it safely.

    bayes_seen tracks the message IDs that have already been learned, so
    that they are not learned again. As long as you don't retrain the same
    pool of messages over and over, wiping out this table should be safe.

    See also

    https://issues.apache.org/SpamAssass...ug.cgi?id=5652
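
    The caveat about retraining can be made concrete with a toy model. This
    is only a sketch of the idea described above, not SpamAssassin's actual
    code: the ToyLearner class and its methods are invented for illustration,
    and the real learner does more (for example, it handles a message being
    relearned as the other class).

```python
class ToyLearner:
    """Toy stand-in for a Bayes learner with a 'seen' table (hypothetical)."""

    def __init__(self):
        self.seen = {}   # message ID -> "spam" or "ham"
        self.nspam = 0
        self.nham = 0

    def learn(self, msgid, kind):
        """Learn a message once; return False if its ID was already seen."""
        if msgid in self.seen:
            return False  # already learned, so the counters stay unchanged
        self.seen[msgid] = kind
        if kind == "spam":
            self.nspam += 1
        else:
            self.nham += 1
        return True

    def purge_seen(self):
        """Forget which messages were learned; the counters are kept."""
        self.seen.clear()


learner = ToyLearner()
print(learner.learn("<msg-1@example.com>", "spam"))  # first time: learned
print(learner.learn("<msg-1@example.com>", "spam"))  # duplicate: skipped

learner.purge_seen()
print(learner.learn("<msg-1@example.com>", "spam"))  # learned a second time
print("nspam after purge and retrain:", learner.nspam)
```

    In this model, purging the seen table is harmless as long as the same
    messages are not fed in again afterwards; if they are, they get counted
    twice.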


  3. Re: Reduce Bayes DB


    OK,

    it seems I will do the same as I did with the AWL DB. There I added a
    date/time column and delete everything that has not been used for more
    than two months.

    Change the database:
    ALTER TABLE `awl` ADD `lastupdate` TIMESTAMP NOT NULL ;

    Run a cron job:
    echo "USE spamassassin; DELETE FROM awl WHERE lastupdate <= DATE_SUB(SYSDATE(), INTERVAL 2 MONTH);" | mysql

    So I think I can do the same with bayes_seen.

    Frank
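
    The age-based purge can be tried out safely before touching the live
    database. The sketch below uses Python's built-in sqlite3 module with an
    in-memory database as a stand-in for MySQL, so the date arithmetic uses
    SQLite's datetime() instead of DATE_SUB(). The awl table is reduced to
    two columns and the sample addresses are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the awl table; the real table has more columns.
conn.execute(
    "CREATE TABLE awl ("
    " email TEXT NOT NULL,"
    " lastupdate TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)

# Two rows with explicit ages, one row that takes the default timestamp.
samples = [
    ("old@example.com", "-3 months"),
    ("recent@example.com", "-1 days"),
]
for email, offset in samples:
    conn.execute(
        "INSERT INTO awl (email, lastupdate) VALUES (?, datetime('now', ?))",
        (email, offset),
    )
conn.execute("INSERT INTO awl (email) VALUES ('new@example.com')")

# Same idea as the cron job: drop everything not updated for two months.
cursor = conn.execute(
    "DELETE FROM awl WHERE lastupdate <= datetime('now', '-2 months')"
)
deleted = cursor.rowcount
remaining = [row[0] for row in conn.execute("SELECT email FROM awl ORDER BY email")]

print("deleted rows:", deleted)
print("remaining:   ", remaining)
```

    Only the row older than two months is removed; the one-day-old row and
    the row with the default timestamp stay.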




  4. Re: Reduce Bayes DB


    It looks good:

    ALTER TABLE `bayes_seen` ADD `lastupdate` TIMESTAMP NOT NULL ;

    DELETE FROM bayes_seen WHERE lastupdate <= DATE_SUB(SYSDATE(), INTERVAL 2 DAY);

    But there is still a large bayes_token table, also more than 200 MB. Is
    there a way to reduce that as well?
    Does a cron job running
    sa-learn -u filter --force-expire
    help, or is that done automatically?


    Frank







  5. Re: Reduce Bayes DB


    On Mon, June 16, 2008 15:04, furban wrote:

    > Change the database
    > ALTER TABLE `awl` ADD `lastupdate` TIMESTAMP NOT NULL ;
    > So I think I can do the same with bayes_seen.


    Yes, the same can be done with bayes_seen, no problem. Just don't expire
    entries that are only a day old; I keep a six-month backlog.


    Benny Pedersen


  6. Re: Reduce Bayes DB

    > On Mon, June 16, 2008 15:04, furban wrote:
    >
    >> Change the database
    >> ALTER TABLE `awl` ADD `lastupdate` TIMESTAMP NOT NULL ;
    >> So I think I can do the same with bayes_seen.
    >
    > Yes, the same can be done with bayes_seen, no problem. Just don't
    > expire entries that are only a day old; I keep a six-month backlog.

    What good is that definition (without a default value)?

    ALTER TABLE `awl` ADD `lastupdate` TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL;

    It works.


  7. Re: Reduce Bayes DB

    >> On Mon, June 16, 2008 15:04, furban wrote:
    >>
    >>> Change the database
    >>> ALTER TABLE `awl` ADD `lastupdate` TIMESTAMP NOT NULL ;
    >>> So I think I can do the same with bayes_seen.
    >>
    >> Yes, the same can be done with bayes_seen, no problem. Just don't
    >> expire entries that are only a day old; I keep a six-month backlog.
    >
    > What good is that definition (without a default value)?
    >
    > ALTER TABLE `awl` ADD `lastupdate` TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL;
    >
    > It works.


    Even better:

    mysql> ALTER TABLE `bayes_seen` ADD `lastupdate` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP NOT NULL;
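
    The difference between the two definitions is that DEFAULT
    CURRENT_TIMESTAMP only stamps a row when it is inserted, while ON UPDATE
    CURRENT_TIMESTAMP also refreshes the stamp whenever the row is changed.
    The effect can be imitated in Python's built-in sqlite3 module. SQLite
    has no ON UPDATE column clause, so this sketch assumes a trigger in its
    place; the table layout, the trigger name awl_touch and the sample row
    are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified awl table plus a trigger that imitates MySQL's
# ON UPDATE CURRENT_TIMESTAMP for changes to the count column.
conn.executescript(
    """
    CREATE TABLE awl (
        email      TEXT PRIMARY KEY,
        count      INTEGER NOT NULL DEFAULT 0,
        lastupdate TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER awl_touch AFTER UPDATE OF count ON awl
    BEGIN
        UPDATE awl SET lastupdate = CURRENT_TIMESTAMP WHERE email = NEW.email;
    END;
    """
)

# Insert a row with a deliberately old timestamp.
old_stamp = "2008-01-01 00:00:00"
conn.execute(
    "INSERT INTO awl (email, lastupdate) VALUES ('a@example.com', ?)",
    (old_stamp,),
)

# Touching the row refreshes lastupdate through the trigger.
conn.execute("UPDATE awl SET count = count + 1 WHERE email = 'a@example.com'")
count, stamp = conn.execute("SELECT count, lastupdate FROM awl").fetchone()

print("count:", count)
print("lastupdate refreshed:", stamp > old_stamp)
```

    With a column that is refreshed on every change, a purge by age removes
    only entries that have not been touched for the chosen interval, rather
    than entries that were merely created long ago.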

