This is a discussion on Re: S-P-A-M Extra long domain names rule? - SpamAssassin
I haven't run any real statistics about this, but it's worth realizing
that unless there's a significant number of spams that have this behavior,
a rule probably costs more in resource use than it provides in hits.
A quick:
pcregrep -ri 'http://(?:[^/.]+\.){7}'
in my corpus shows about 20 spam hits in some 245000 mails. There could be
reasons this RE wouldn't hit, but in general I wouldn't bother.
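For scale, 20 hits in 245000 mails is roughly 0.008% of the corpus. The same check can be sketched in Python for anyone without pcregrep at hand. This is only an illustration: the function name `count_deep_uri_files` and the one-message-per-file corpus layout are assumptions, not something from the thread. It searches line by line, as pcregrep does.

```python
import re
from pathlib import Path

# Same pattern as the pcregrep call above, matched case-insensitively (-i).
DEEP_URI = re.compile(r'http://(?:[^/.]+\.){7}', re.IGNORECASE)

def count_deep_uri_files(corpus_dir):
    """Return (hits, total): files with at least one matching line, files scanned.

    Assumes one message per file somewhere below corpus_dir (hypothetical layout).
    """
    hits = total = 0
    for path in Path(corpus_dir).rglob('*'):
        if not path.is_file():
            continue
        total += 1
        # latin-1 never fails to decode, which is convenient for raw mail.
        text = path.read_text(encoding='latin-1')
        if any(DEEP_URI.search(line) for line in text.splitlines()):
            hits += 1
    return hits, total
```

Dividing `hits` by `total` gives the hit rate to weigh against the cost of running the rule on every message.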
On Tue, Apr 22, 2008 at 01:24:37AM +0200, Karsten Bräckelmann wrote:
> On Mon, 2008-04-21 at 22:16 +0200, mouss wrote:
> > untested yet:
>
> > uri URI_DEEP5 m|https?://[\w-]\.[\w-]\.[\w-]\.[\w-]\.[\w-]\.|
> > score URI_DEEP5 0.1
> >
> > uri URI_DEEP6 m|https?://[\w-]\.[\w-]\.[\w-]\.[\w-]\.[\w-]\.[\w-]\.|
> > score URI_DEEP6 1.0
> >
> > uri URI_DEEP7 m|https?://[\w-]\.[\w-]\.[\w-]\.[\w-]\.[\w-]\.[\w-]\.[\w-]\.|
> > score URI_DEEP7 2.0
>
> Beware, those are adding up. Since you didn't anchor the end of the RE
> to ($|/), whatever hits URI_DEEP7 hits the previous ones, too. Effective
> score: 3.1
>
> They don't work anyway. You are testing for single chars between the
> dots. And the '-' should be first in a char class, if it is to represent
> itself. Also, I'd prefer to keep them cleaner and more readable using
> quantifiers, rather than copying parts 7 times...
>
> uri URI_DEEP7 m,https?://([-\w]+\.){6},
>
> The above forces 6 dots, and thus "7 levels". Hits on even longer URIs,
> too -- the same constraint of adding scores applies here.
>
> Oh, and yes -- this one is untested, too.
>
> guenther
>
>
> -- 
> char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
> (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
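The points made in the quoted reply can be checked with a small Python sketch. It is an illustration only, not tested against SpamAssassin itself: the helper names `deep_uri` and `total_score` are made up here, the rule names and scores are taken from the quoted proposal, and the `exact` variant is one assumed way of anchoring the end to ($|/) as the reply suggests.

```python
import re

# The originally proposed form: exactly one character between the dots.
SINGLE_CHAR = re.compile(r'https?://[\w-]\.[\w-]\.[\w-]\.[\w-]\.[\w-]\.')

def deep_uri(groups, exact=False):
    """Pattern for `groups` dot-terminated labels right after the scheme.

    With exact=True a final label followed by ($|/) is required, so hosts
    with more labels no longer match (an assumed reading of the reply).
    """
    tail = r'[-\w]+($|/)' if exact else ''
    return re.compile(r'https?://([-\w]+\.){%d}%s' % (groups, tail))

# Rule name -> (number of label-dot groups, score), as in the quoted proposal.
RULES = {'URI_DEEP5': (5, 0.1), 'URI_DEEP6': (6, 1.0), 'URI_DEEP7': (7, 2.0)}

def total_score(uri, exact=False):
    """Sum the scores of every rule whose pattern hits the URI."""
    return sum(score for groups, score in RULES.values()
               if deep_uri(groups, exact).search(uri))

uri = 'http://a1.b2.c3.d4.e5.f6.example.com/x'
print(SINGLE_CHAR.search(uri))            # None: labels longer than one char
print(round(total_score(uri), 1))         # all three unanchored rules hit
print(round(total_score(uri, True), 1))   # only the exact-depth rule hits
```

On this URI the unanchored rules add up to the 3.1 mentioned in the reply, while the anchored variant scores only the single matching depth. The anchored form does not allow for a port after the host, so it is a starting point rather than a finished rule. In Python's `re`, a hyphen at the end of a class such as `[\w-]` is also taken literally; the sketch uses `[-\w]` only to follow the reply.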
-- 
Randomly Selected Tagline:
Hear Me, California! Tomorrow you vote. Again. Good luck, and I hope
you get the Governor you deserve. I think it was Adlai Stevenson who said
that there's nothing more inspiring in human society than the spectacle
of the democratic process being bizarrely subverted by a well-funded
partisan exploitation of a constitutional loophole. How true that is.
- Adam Felber, http://www.felbers.net/mt/archives/001654.html