--Fw8vdPO5iEPGjqL+
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jul 02, 2007 at 01:28:27PM -0700, Jo Rhett wrote:
> Both of these assume I know every person who needs to e-mail me, and
> everything they will send me. Theo, you're active in enough open
> source projects to know better.


Well, you just said you were receiving a large number of "system" type
mails, which for me would all come from my own, well-defined set of
systems.

> Well then we need to alter the code. While bareword domain matching
> might make sense, it doesn't make sense for /a/valid/system/path/file.pl
> for "file.pl" to be checked. Zero hits on spam corpus.


I think this is definitely a section of SA that could
use some work, so ... Patches welcome. As a start,
PerMsgStatus::_get_parsed_uri_list() is the function that goes through
the text looking for hostnames or domains. It looks for both schemed URIs
(http://.../) and schemeless URIs, which is where you're getting hit.

Everything else, such as URIDNSBL, keys off of that.
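
To make the schemeless case concrete, here is a rough Python sketch of the
idea. It is not SpamAssassin's actual Perl code: the regexes, the tiny TLD
set, the find_uri_candidates() name and the skip_path_components flag are
all made up for illustration. It only shows why "file.pl" at the end of a
filesystem path can look like a bare domain (.pl is a valid TLD), and one
possible way a patch might skip tokens that directly follow a "/".

```python
import re

# Hypothetical, deliberately tiny TLD set; a real parser would use a full list.
TLDS = {"com", "net", "org", "pl"}

# Schemed URIs: an explicit scheme followed by non-space characters.
SCHEMED = re.compile(r"\b(?:https?|ftp)://[^\s<>\"']+", re.IGNORECASE)

# Schemeless candidates: a bare "label.label.tld" token.
SCHEMELESS = re.compile(
    r"(?<![\w@.-])((?:[a-z0-9-]+\.)+([a-z]{2,}))(?![\w-])", re.IGNORECASE
)


def find_uri_candidates(text, skip_path_components=False):
    """Return (schemed_uris, schemeless_hosts) found in text."""
    schemed = SCHEMED.findall(text)
    # Blank out schemed URIs so their hostnames are not counted twice.
    rest = SCHEMED.sub(" ", text)

    schemeless = []
    for match in SCHEMELESS.finditer(rest):
        host, tld = match.group(1), match.group(2)
        if tld.lower() not in TLDS:
            continue
        # Assumed fix: a token right after "/" is probably a path component.
        if skip_path_components and match.start() > 0 and rest[match.start() - 1] == "/":
            continue
        schemeless.append(host)
    return schemed, schemeless


sample = "see /a/valid/system/path/file.pl or www.example.com, and http://spam.example.net/x"
schemed, naive = find_uri_candidates(sample)
_, strict = find_uri_candidates(sample, skip_path_components=True)
```

With the naive setting, "file.pl" is reported next to "www.example.com";
with skip_path_components=True only "www.example.com" remains. Whether a
check like that costs hits on a real spam corpus would have to be measured.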


Random thought: URIDNSBL actually has a set of priorities when figuring out
which domains to query. I wonder if the results would be better/worse if
the rules were based on the source type -- at least HTML versus parsed, but
could also be HTML tag, etc.
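
As a very rough sketch of that thought, again in Python and again not
taken from the URIDNSBL plugin: the source-type names, their priority
values, the max_domains cap and the rule-naming scheme below are all
assumptions for illustration. The idea is to remember where each domain
was found, query the highest-priority ones first, and name the rule after
the source type so each kind of source could be scored separately.

```python
# Hypothetical source types; a lower number means "query this first".
SOURCE_PRIORITY = {"a": 0, "img": 1, "parsed": 2}


def pick_domains(found, max_domains=2):
    """found: iterable of (domain, source_type) pairs.

    Keeps the best (lowest-numbered) source per domain and returns at most
    max_domains (domain, source_type) pairs, best priority first.
    """
    best = {}
    for domain, source in found:
        # Unknown source types sort after all known ones.
        prio = SOURCE_PRIORITY.get(source, len(SOURCE_PRIORITY))
        if domain not in best or prio < best[domain][0]:
            best[domain] = (prio, source)
    # Sort by priority, then by name so the order is deterministic.
    ranked = sorted(best.items(), key=lambda item: (item[1][0], item[0]))
    return [(domain, source) for domain, (_, source) in ranked[:max_domains]]


def rule_name(base, source):
    """Build a per-source rule name, e.g. URIBL_EXAMPLE_PARSED."""
    return f"{base}_{source.upper()}"


found = [
    ("file.pl", "parsed"),
    ("example.com", "a"),
    ("example.com", "parsed"),
    ("example.org", "img"),
]
to_query = pick_domains(found, max_domains=2)
```

Here the plain-text-only "file.pl" falls below the cap, and a hit on
"example.com" would fire a rule such as URIBL_EXAMPLE_A rather than a
single source-agnostic rule.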

-- 
Randomly Selected Tagline:
"G: And are you using Windows or a Mac?
T: Neither, I'm using Linux.
G: Oh, you're a power user." - Theo and his ex-ISP


--Fw8vdPO5iEPGjqL+--