uri rules - SpamAssassin


Thread: uri rules

  1. Re: uri rules


  2. Re: uri rules


  3. Re: uri rules


  4. uri rules


    I was surprised that this rule...

    uri CU_CN_LINK /http:..\w+\.cn\b/

    matches not only this...



    but also this...

    KooXoo Buys Kuxun.cn Domain


    First, I did not realize that SpamAssassin's idea of "uri" includes not
    only the uri, but the start tag, end tag, and all in between. That's
    useful but not real clear in Mail::SpamAssassin::Conf.

    Second, I can't figure out how \w+ matches the punctuation and spaces!

    Joseph Brennan
    Columbia University I T


  5. Re: uri rules

    Joseph Brennan wrote:
    >
    > I was surprised that this rule...
    >
    > uri CU_CN_LINK /http:..\w+\.cn\b/
    >
    > matches not only this...
    >
    >
    >
    > but also this...
    >
    >
    > KooXoo Buys Kuxun.cn Domain

    >
    >
    > First, I did not realize that SpamAssassin's idea of "uri" includes not
    > only the uri, but the start tag, end tag, and all in between. That's
    > useful but not real clear in Mail::SpamAssassin::Conf.

    Actually, it doesn't. Your second example contains two URIs as far as
    SpamAssassin is concerned: "http://www.columbia.edu/foo.html" and
    "http://Kuxun.cn". They are tested as two separate URIs.

    Since many email clients "auto-link" domains in text portions, like
    www.google.com, SpamAssassin tries to find text strings that clients
    will treat as URIs and use them in the URI tests as well.

    >
    > Second, I can't figure out how \w+ matches the punctuation and spaces!

    It doesn't.
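
    To make the point above concrete, here is a small sketch that applies
    the rule's pattern to the two URIs named in this reply. It uses Python's
    re module purely for illustration (SpamAssassin itself evaluates rules
    as Perl regular expressions); for this particular pattern the two
    engines should behave the same way. The variable names are made up for
    the example.

```python
import re

# The pattern from the CU_CN_LINK rule. The two dots after "http:" match
# any two characters, which here are the "//".
CU_CN_LINK = re.compile(r"http:..\w+\.cn\b")

# The two URIs described in the reply: the href target, and the bare domain
# from the link text with "http://" added in front of it.
uris = ["http://www.columbia.edu/foo.html", "http://Kuxun.cn"]

# Each URI is tested on its own, so only the second one hits.
matches = [u for u in uris if CU_CN_LINK.search(u)]
print(matches)  # ['http://Kuxun.cn']

# \w+ covers only letters, digits, and underscore, so the pattern cannot
# stretch from the href across spaces and punctuation to reach "Kuxun.cn".
spans_text = CU_CN_LINK.search("http://www.columbia.edu/foo.html KooXoo Buys Kuxun.cn")
print(spans_text)  # None
```

    So the rule fires because the second URI matches by itself, not because
    \w+ crossed any spaces or punctuation.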


  6. Re: uri rules


    Thanks, Mouss and Matt.

    So a uri regexp will match a "http://" that is not there. OK, well...

    Joe Brennan


  7. Re: uri rules

    Matt Kettler wrote:
    > Joseph Brennan wrote:
    >>
    >> I was surprised that this rule...
    >>
    >> uri CU_CN_LINK /http:..\w+\.cn\b/
    >>
    >> matches not only this...
    >>
    >>
    >>
    >> but also this...
    >>
    >>
    >> KooXoo Buys Kuxun.cn Domain

    >>
    >>
    >> First, I did not realize that SpamAssassin's idea of "uri" includes not
    >> only the uri, but the start tag, end tag, and all in between. That's
    >> useful but not real clear in Mail::SpamAssassin::Conf.

    > Actually, it doesn't. Your second example contains two URIs as far as
    > SpamAssassin is concerned: "http://www.columbia.edu/foo.html" and
    > "http://Kuxun.cn". They are tested as two separate URIs.
    >
    > Since many email clients "auto-link" domains in text portions, like
    > www.google.com, SpamAssassin tries to find text strings that clients
    > will treat as URIs and use them in the URI tests as well.
    >


    How so? How does SpamAssassin's URI check decide that Kuxun.cn is a URI,
    as opposed to someone forgetting to add a space after the end of a
    sentence? Is it because it is located within the "a" tag?
    >>
    >> Second, I can't figure out how \w+ matches the punctuation and spaces!

    > It doesn't.
    >
    >



  8. Re: uri rules

    Randy Ramsdell wrote:
    >
    >
    > How so? How does SpamAssassin's URI check decide that Kuxun.cn is a URI,
    > as opposed to someone forgetting to add a space after the end of a
    > sentence?

    Well, CN is a rather strange word to start a sentence with, but
    SpamAssassin doesn't know the difference between an intentional domain
    and a lack of spacing. SpamAssassin is no more selective than some email
    clients are. There's a "word" followed by a "." and a valid TLD, so it
    gets treated as a URI.

    However, it shouldn't linkify things like "experiment.see", because
    "see" isn't a valid TLD.

    > Is it because it is located within the "a" tag?

    The "a" tag has nothing to do with it.

    IIRC, the code that does this runs after all the HTML tags have been
    stripped out, so the tag cannot have anything to do with it (i.e., it
    runs on the same text that "body" rules see).
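
    The behaviour described in this post can be sketched roughly as follows.
    This is not SpamAssassin's actual code, only an illustration of the
    idea: scan tag-stripped text for a "word", a dot, and a valid TLD, and
    treat the result as an http:// URI. The TLD set, the pattern, and the
    function name find_bare_domains are all assumptions made up for this
    example, and a real TLD list is far longer.

```python
import re

# Hypothetical, tiny TLD list for illustration only.
VALID_TLDS = {"com", "cn", "edu", "org", "net"}

# One or more dot-separated labels, ending in a dot plus a run of letters
# that might be a TLD (captured as group 1).
CANDIDATE = re.compile(r"\b[\w-]+(?:\.[\w-]+)*\.([A-Za-z]{2,})\b")

def find_bare_domains(text):
    """Return bare domains found in plain text, rewritten as http:// URIs."""
    found = []
    for m in CANDIDATE.finditer(text):
        # Keep the candidate only if its last label is a known TLD.
        if m.group(1).lower() in VALID_TLDS:
            found.append("http://" + m.group(0))
    return found

# Link text from the thread, as it looks once the HTML tags are gone.
print(find_bare_domains("KooXoo Buys Kuxun.cn Domain"))  # ['http://Kuxun.cn']

# "see" is not in the TLD list, so a missing space here is not linkified.
print(find_bare_domains("Try the experiment.see what happens"))  # []
```

    Under these assumptions, a missing space before a word that happens to
    be a valid TLD would still be picked up, which matches the limitation
    described above.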

