Querying the AWL - SpamAssassin

  1. Querying the AWL

    I've been happily using SpamAssassin with MIMEDefang for nearly a year now.
    I have a question about controlling and querying the whitelist.

    The per-user automatic whitelist is enabled and clearly doing "something"
    (because it's growing in size) but I can't find much documentation about it.
    Is there any way to query the email addresses stored in the AWL? For
    example, periodically I wouldn't mind going through the AWL and promoting
    addresses to the actual whitelist for each user (and then mark all the
    others as permanently blacklisted) just to "help" the AWL on its way. Also,
    as I use MIMEDefang, this would allow me to implement the blacklist earlier
    without actually calling spamc as I could bounce the email immediately
    following the SMTP FROM clause. Looking through the docs - I can't see any
    way of querying the addresses and scores from the AWL... did I miss
    something?

    Ultimately, what I'm planning on doing, either in my MIMEDefang filter or by
    parsing the sendmail logs every now and then, is to update users' whitelists
    such that any email address emailed *by* a user is automatically added to
    their personal whitelist in user_prefs. Additionally, because most of my
    users use Outlook, I'd periodically synchronise Outlook address books with
    the server. MIMEDefang is configured to bounce email above a certain
    threshold: giving it the users' address books allows this bounce threshold
    to be very low (e.g. 3-5) as MIMEDefang could use a higher bounce threshold
    (e.g. 10) for recognised email addresses - which would hopefully still catch
    SPAM with a forged from address (though from what I can see it's relatively
    rare to get SPAM from a forged address that you actually know).

    Is this:

    a) Sensible (is it a good idea to have a huge number of email addresses in
    user_prefs?)
    b) Has anyone configured SpamAssassin (via MIMEDefang or any other milter)
    to work this way before?

    I've googled around and couldn't see anything. I'm invoking SpamAssassin via
    spamc running version 3.2.5 on Fedora 9 but I don't expect that's relevant
    here.

    Thanks in advance for any advice/tips


    David


  2. Re: Querying the AWL

    David Allsopp wrote:

    > The per-user automatic whitelist is enabled and clearly doing "something"
    > (because it's growing in size) but I can't find much documentation about it.


    perldoc Mail::SpamAssassin::Plugin::AWL
    perldoc Mail::SpamAssassin::AutoWhitelist

    > Is there any way to query the email addresses stored in the AWL? For
    > example, periodically I wouldn't mind going through the AWL and promoting
    > addresses to the actual whitelist for each user (and then mark all the
    > others as permanently blacklisted) just to "help" the AWL on its way.


    If you trust the AWL enough to use it that way, maybe you should
    simply raise the "auto_whitelist_factor". This way the AWL's score
    adjustments will get bigger without the need for new code anywhere.
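
For reference, the AWL plugin documentation describes the adjustment as moving the message's score toward the sender's long-term mean by "auto_whitelist_factor" (default 0.5, range 0 to 1). A minimal sketch of that arithmetic follows; the function name is made up for illustration and this is not SpamAssassin code.

```python
def awl_adjusted_score(score: float, mean: float, factor: float = 0.5) -> float:
    """Regress a message's score toward the sender's long-term mean.

    Formula as given in the AWL plugin documentation:
        finalscore = score + (mean - score) * factor
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    return score + (mean - score) * factor


# A sender whose mail usually scores 2.0 sends a message scoring 8.0.
halfway = awl_adjusted_score(8.0, 2.0, 0.5)   # moves half way to the mean
mean_only = awl_adjusted_score(8.0, 2.0, 1.0)  # factor 1: use the mean
unchanged = awl_adjusted_score(8.0, 2.0, 0.0)  # factor 0: AWL has no effect
```

Raising the factor therefore strengthens the pull toward a sender's history in both directions, which is the effect the promotion scheme is after.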

    Personally I would only be prepared to straight white/blacklist
    for addresses that have a *very* high or low score in the AWL,
    but addresses with very high/low scores will result in a big
    score adjustment from the AWL anyway.

    So promoting addresses from the AWL to white/black lists would
    only help if those lists are either used outside SA or used with
    short circuiting.

    As for promoting the addresses to straight black/white lists
    in SA, I'm not sure whether SA handles partial IP addresses for
    whitelist_from_rcvd, which is what is stored in the AWL.

    > as I use MIMEDefang, this would allow me to implement the blacklist earlier


    This shouldn't be too hard to do if you have the AWL use a SQL
    database.
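
A rough sketch of what such a query could look like, in Python with an in-memory SQLite table as a stand-in. It assumes an "awl" table with the columns username, email, ip, count and totscore; that layout is an assumption here, so check the schema your installation actually uses. The function name and sample rows are invented for illustration.

```python
import sqlite3


def awl_means(conn, username, min_count=3):
    """Return (email, ip, count, mean) rows for one user, lowest mean first.

    Entries seen fewer than min_count times are skipped, since a mean over
    one or two messages says very little about a sender.
    """
    cursor = conn.execute(
        """
        SELECT email, ip, "count", totscore / "count" AS mean
        FROM awl
        WHERE username = ? AND "count" >= ?
        ORDER BY mean
        """,
        (username, max(1, min_count)),
    )
    return cursor.fetchall()


# Stand-in database; a real setup would connect to the AWL database instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE awl (username TEXT, email TEXT, ip TEXT, '
    '"count" INTEGER, totscore REAL)'
)
conn.executemany(
    "INSERT INTO awl VALUES (?, ?, ?, ?, ?)",
    [
        ("david", "friend@example.org", "192.168", 10, -20.0),
        ("david", "spammer@example.net", "10.1", 5, 60.0),
        ("david", "once@example.com", "172.16", 1, 3.0),
        ("other", "friend@example.org", "192.168", 4, 8.0),
    ],
)

for email, ip, count, mean in awl_means(conn, "david"):
    print(f"{email:25} {ip:10} seen {count:3} times, mean {mean:6.2f}")
```

Senders at either end of that list would be the candidates for a straight white/blacklist entry.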

    Otherwise you should be able to do it with the help of
    Mail::SpamAssassin::AutoWhitelist.

    > Looking through the docs - I can't see any
    > way of querying the addresses and scores from the AWL... did I miss
    > something?


    perldoc Mail::SpamAssassin::AutoWhitelist should give some hints.

    You might need to read some source code though.

    > Ultimately, what I'm planning on doing, either in my MIMEDefang filter or by
    > parsing the sendmail logs every now and then, is to update users' whitelists
    > such that any email address emailed *by* a user is automatically added to
    > their personal whitelist in user_prefs.


    I'm doing something similar to this with MIMEDefang and a
    SpamAssassin plugin. See below.

    > Additionally, because most of my
    > users use Outlook, I'd periodically synchronise Outlook address books with
    > the server.


    Do note that the AWL uses (a part of) the IP address of the relay
    as well as the mail address. This information will be missing
    from the MUAs' address books.

    Whitelisting based only on email addresses often leads to false
    negatives (FNs).
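
To illustrate why an address book alone cannot reproduce an AWL entry, here is a small sketch of a lookup key built from both the address and part of the relay's IP. The key format (lower-cased address, then "|ip=", then the first two IPv4 octets) is an assumption for illustration only; read AutoWhitelist.pm to see what your version really stores.

```python
from typing import Optional


def awl_key(address: str, relay_ip: Optional[str], octets: int = 2) -> str:
    """Build a hypothetical AWL-style key from an address and a relay IP."""
    addr = address.strip().lower()
    if relay_ip is None:
        # No usable relay information, e.g. an address-book entry.
        return addr + "|ip=none"
    parts = relay_ip.split(".")
    if len(parts) != 4 or not all(p.isdigit() and int(p) <= 255 for p in parts):
        raise ValueError("expected a dotted-quad IPv4 address")
    prefix = ".".join(parts[:octets])
    return addr + "|ip=" + prefix


usual = awl_key("Friend@Example.org", "192.168.10.20")
same_net = awl_key("friend@example.org", "192.168.99.1")
forged = awl_key("friend@example.org", "203.0.113.7")
from_address_book = awl_key("friend@example.org", None)
```

Mail from the usual network shares one key, while a forged copy arriving through another relay gets a different key and so does not inherit the sender's good history.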

    > Is there any way to query the email addresses stored in the AWL? For
    > example, periodically I wouldn't mind going through the AWL and promoting
    > addresses to the actual whitelist for each user (and then mark all the

    [snip]
    > MIMEDefang is configured to bounce email above a certain
    > threshold: giving it the users' address books allows this bounce threshold
    > to be very low (e.g. 3-5) as MIMEDefang could use a higher bounce threshold
    > (e.g. 10) for recognised email addresses


    You don't actually need to do anything special in SA for this.
    Since MIMEDefang knows who a mail is from, which relay it came
    from, and which local address it is for, you could have MIMEDefang
    use different thresholds depending on this information.
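
The decision itself is small. A minimal sketch in Python follows (a real mimedefang-filter is Perl, so this only shows the logic); the function names, the set of known senders and the two threshold values are all hypothetical.

```python
from typing import Set, Tuple

# (recipient, sender, first two octets of the relay IP); hypothetical layout.
KnownSender = Tuple[str, str, str]


def relay_prefix(relay_ip: str) -> str:
    """Keep only the first two octets so nearby relays count as the same."""
    return ".".join(relay_ip.split(".")[:2])


def bounce_threshold(sender: str, relay_ip: str, recipient: str,
                     known: Set[KnownSender],
                     low: float = 4.0, high: float = 10.0) -> float:
    """Pick the higher threshold only when sender AND relay are recognised."""
    key = (recipient.lower(), sender.lower(), relay_prefix(relay_ip))
    return high if key in known else low


def should_bounce(score: float, sender: str, relay_ip: str, recipient: str,
                  known: Set[KnownSender]) -> bool:
    return score >= bounce_threshold(sender, relay_ip, recipient, known)


known = {("david@example.com", "friend@example.org", "192.168")}
genuine = should_bounce(6.0, "friend@example.org", "192.168.10.20",
                        "david@example.com", known)
forged = should_bounce(6.0, "friend@example.org", "203.0.113.7",
                       "david@example.com", known)
```

With a score of 6.0, the message from the recognised relay passes while the same From address arriving via an unknown relay is bounced.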

    Using the AWL data to adjust the threshold seems odd to me. Since
    the AWL data already adjusts the score, adjusting the threshold
    as well based on the same data will just make the adjustment
    stronger. This can be done more easily and with less code
    by simply adjusting "auto_whitelist_factor" for SA.

    > SPAM with a forged from address (though from what I can see it's relatively
    > rare to get SPAM from a forged address that you actually know).


    Especially if you check the sending relay as well as the mail
    address, which I think you should.

    > b) Has anyone configured SpamAssassin (via MIMEDefang or any other milter)
    > to work this way before?


    Not exactly what you describe, but I have another solution with
    somewhat similar goals.

    * My mimedefang-filter saves information about all *outgoing*
    mail to a SQL database. I then have a SpamAssassin plugin that
    checks to see if incoming mail is likely to be replies to
    outgoing mail.

    * The filter also keeps track of incoming spam/ham (as
    determined by SA) and uses this both to bypass spamassassin and
    to block mail before having to call SA.
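
The first idea can be sketched in a few lines. This is not the code from the filter or plugin mentioned in this post; it is a Python illustration with an invented table layout, invented function names and a hypothetical 30-day window, using SQLite as a stand-in for the SQL database.

```python
import sqlite3

THIRTY_DAYS = 30 * 86400  # hypothetical cut-off, in seconds


def open_log(path=":memory:"):
    """Open (and if needed create) the log of outgoing mail."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outgoing ("
        "local_addr TEXT, remote_addr TEXT, sent_at REAL, "
        "PRIMARY KEY (local_addr, remote_addr))"
    )
    return conn


def record_outgoing(conn, local_addr, remote_addr, sent_at):
    """Remember that a local user mailed a remote address at time sent_at."""
    conn.execute(
        "INSERT OR REPLACE INTO outgoing VALUES (?, ?, ?)",
        (local_addr.lower(), remote_addr.lower(), sent_at),
    )


def looks_like_reply(conn, sender, recipient, now, max_age=THIRTY_DAYS):
    """True if the recipient mailed this sender within the last max_age seconds."""
    row = conn.execute(
        "SELECT sent_at FROM outgoing WHERE local_addr = ? AND remote_addr = ?",
        (recipient.lower(), sender.lower()),
    ).fetchone()
    return row is not None and now - row[0] <= max_age


log = open_log()
record_outgoing(log, "david@example.com", "friend@example.org", sent_at=1000.0)
```

The outgoing side would call record_outgoing for every recipient of a sent message, and the incoming side would use looks_like_reply to lower the score or raise the threshold.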

    Both the filter and the plugin are available at
    http://whatever.frukt.org/mimedefangfilter.text.shtml

    The filter is *huge*, but I hope it's not too hard to find the
    relevant parts of it.

    Regards
    /Jonas
    --
    Jonas Eckerman, FSDB & Fruktträdet
    http://whatever.frukt.org/
    http://www.fsdb.org/
    http://www.frukt.org/

