Searching for a list of strings - SpamAssassin



Thread: Searching for a list of strings

  1. Searching for a list of strings


    Hi all,

    I'm looking for a regex to find a list of strings in the body,
    regardless of where they appear.

    Example:

    i am Nice Girl good looking girl who is looking to chat with you.
    email me back at szCic@officiallam.com

    i will reply back with some really nice pics or skype realtime videos.

    The most common phrases are: nice girl, good looking, chat with you, nice
    pics, videos

    So if at least three of them hit, the rule should match.

    Sorry for bothering again.
    --
    View this message in context: http://www.nabble.com/Searching-for-...p19455236.html
    Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


  2. Re: Searching for a list of strings

    patrickbaer wrote:
    > Hi all,
    >
    > I'm looking for some regex to find a list of strings in the body,
    > independent where they are and so on.
    >
    > Example:
    >
    > i am Nice Girl good looking girl who is looking to chat with you.
    > email me back at szCic@officiallam.com
    >
    > i will reply back with some really nice pics or skype realtime videos.
    >
    > The most common phrases are: nice girl, good looking, chat with you, nice
    > pics, videos
    >
    > So if at least three of them hit, the rule should match.
    >
    > Sorry for bothering again.


    On my side, this "nice girl" stuff is mostly caught by the
    Pyzor, Razor, DCC, iXhash, and freemail plugins etc., so phrase
    matches aren't that important. Not many mails of this kind get past
    the RBLs and clamav-milter at the SMTP level.

    --
    Best Regards

    Robert Schetterer

    Germany/Munich/Bavaria


  3. Re: Searching for a list of strings


    This is the email that went through. Nothing about razor though?

    Return-Path:
    Received: from medusa.tvwerk.de ([unix socket])
    by medusa2 (Cyrus v2.2.13-Debian-2.2.13-10.cb1.1) with LMTPA;
    Fri, 12 Sep 2008 14:33:23 +0200
    X-Sieve: CMU Sieve 2.2
    Received: from proxy.tvwerk.de (proxy1 [10.10.10.2])
    by medusa.tvwerk.de (Postfix) with ESMTP id 0B6D51BD7F23
    for ; Fri, 12 Sep 2008 14:33:23 +0200 (CEST)
    Received: from localhost (unknown [10.10.10.66])
    by proxy.tvwerk.de (Postfix) with ESMTP id 0313F304012
    for ; Fri, 12 Sep 2008 14:33:23 +0200 (CEST)
    X-Virus-Scanned: amavisd-new at animoto.intern
    X-Spam-Flag: NO
    X-Spam-Score: 4.427
    X-Spam-Level: ****
    X-Spam-Status: No, score=4.427 tagged_above=-999 required=5
    tests=[BAYES_50=0.001, HTML_MESSAGE=0.001, RCVD_FORGED_WROTE2=4.325,
    RDNS_NONE=0.1]
    Received: from proxy.tvwerk.de ([10.10.10.2])
    by localhost (voodoo.animoto.intern [10.10.10.66]) (amavisd-new, port
    10024)
    with ESMTP id ZBT1zGMIqj33 for ;
    Fri, 12 Sep 2008 14:33:04 +0200 (MEST)
    Received: from furtmair.com (unknown [79.165.217.243])
    by proxy.tvwerk.de (Postfix) with SMTP id 04966304031
    for ; Fri, 12 Sep 2008 14:32:11 +0200 (CEST)
    Received: from 212.203.9.120 (HELO mail3.servernation.nl)
    by tvwerk.de with esmtp ({nChar[8-12]} {nChar[4-6]})
    id 8secZp-Vw7spY-Ee
    for itk@tvwerk.de; Fri, 12 Sep 2008 16:32:12 +0400
    Message-ID: <45c801c914d3$94e76f20$4fa5d9f3@Rowena>
    From: "Rowena Hyatt"
    To: "Cherry Zapata"
    Subject: i need you
    Date: Fri, 12 Sep 2008 16:32:12 +0400
    MIME-Version: 1.0
    Content-Type: multipart/alternative;
    boundary="----=_NextPart_17862_4630_01C914F5.1BF90F20"
    X-Priority: 3
    X-MSMail-Priority: Normal
    X-Mailer: Microsoft Outlook Express 6.00.2900.2180
    X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180

    This is a multi-part message in MIME format.

    ------=_NextPart_17862_4630_01C914F5.1BF90F20
    Content-Type: text/plain;
    charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

    i am Nice Girl good looking girl who is looking to chat with you=2E=20
    email me back at 8TclS@officiallam=2Ecom=20

    i will reply back with some really nice pics or skype realtime videos=2E
    ------=_NextPart_17862_4630_01C914F5.1BF90F20
    Content-Type: text/html;
    charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable



    1">


    i am Nice Girl good looking girl who is looking to chat with you=2E=



    email me back at 3D"mailto:Cp35U@officiallam=2Ecom" szCic@offi=
    ciallam=2Ecom



    i will reply back with some really nice pics or skype realtime videos=2E<=
    /b>

    ------=_NextPart_17862_4630_01C914F5.1BF90F20--





  4. Re: Searching for a list of strings

    On Fri, 2008-09-12 at 05:56 -0700, patrickbaer wrote:
    > Hi all,
    >
    > I'm looking for some regex to find a list of strings in the body,
    > independent where they are and so on.
    >
    > Example:
    >
    > i am Nice Girl good looking girl who is looking to chat with you.
    > email me back at szCic@officiallam.com
    >
    > i will reply back with some really nice pics or skype realtime videos.
    >
    > The most common phrases are: nice girl, good looking, chat with you, nice
    > pics, videos
    >
    > So if at least three of them hit, the rule should match.
    >

    I'm currently using this type of rule set plus a combining meta rule:

    #
    # Fake degrees
    #
    describe MG_DEGREE Mail-order degree offers
    body __MG_D1 /(Bachelors|Bacheelor|Bachellor)/i
    body __MG_D2 /(Masters|Masteer|MasteerMBA|MassterMBA)/i
    body __MG_D3 /(Doctorate|Doctoraate|Doctoorate|Doctor)/i
    body __MG_D4 /weeks.*college graduate/i
    body __MG_D5 /(diploma|Diiploma|Certiificates)/i
    meta MG_DEGREE ((__MG_D1+__MG_D2+__MG_D3+__MG_D4+__MG_D5)>2)
    score MG_DEGREE 4.5

    which is easily extended and adapted to other sets of key words. Each of
    the subordinate rules scores 1 or TRUE when hit, so you can use either
    arithmetic or boolean logic in the meta rule.
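
    To make the "arithmetic or boolean" point concrete, here is a small
    Python sketch (not SpamAssassin code; the function names are made up
    for illustration). It treats each sub-rule result as 0/1 and shows that
    summing and comparing gives the same answer as the much longer boolean
    form, which has to OR together every AND-combination of three rules.

```python
from itertools import combinations

def at_least_arithmetic(hits, n):
    """Arithmetic form: sum the 0/1 sub-rule results, like (A+B+C+D+E) > n-1."""
    return sum(int(h) for h in hits) >= n

def at_least_boolean(hits, n):
    """Boolean form: OR over every AND-combination of n sub-rules."""
    return any(all(combo) for combo in combinations(hits, n))

# Hypothetical results of five sub-rules for one message: three of them hit.
hits = [True, False, True, True, False]
print(at_least_arithmetic(hits, 3))  # True
print(at_least_boolean(hits, 3))     # True
```

    With five sub-rules and a threshold of three, the boolean form already
    needs ten combinations, which is why the arithmetic form is the more
    practical one to write in a meta rule.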

    Debugging may be easier if you omit the double leading underscore in the
    body rules since they will then show up in message headers when hit
    (and add a score of 1.0 to the spam total). In this case the meta rule
    would add emphasis to the scores accumulated by the body rules.

    NOTE: The names are important: if any body rule's name is the same as
    the meta rule or is an exact part of it ( MG_DEG is a subset of
    MG_DEGREE but MG_DEG1 is not) then the rule set will not work as you
    expect: the meta rule will not fire.


    Martin


  5. Re: Searching for a list of strings

    patrickbaer wrote:
    > Hi all,
    >
    > I'm looking for some regex to find a list of strings in the body,
    > independent where they are and so on.
    >
    > Example:
    >
    > i am Nice Girl good looking girl who is looking to chat with you.
    > email me back at szCic@officiallam.com
    >
    > i will reply back with some really nice pics or skype realtime videos.
    >
    > The most common phrases are: nice girl, good looking, chat with you, nice
    > pics, videos
    >
    > So if at least three of them hit, the rule should match.


    "At least three of them" can't be implemented without explicit
    repetition or without a plugin. You can, however, require all of them:

    body __BODY_1 /foo/
    body __BODY_2 /bar/
    body __BODY_3 /blah/
    meta BODY_FOOBAR (__BODY_1 && __BODY_2 && __BODY_3)
    score BODY_FOOBAR 0.01
    describe BODY_FOOBAR contains foo and bar and blah


    >
    > Sorry for bothering again.



  6. Re: Searching for a list of strings

    patrickbaer wrote:

    > I'm looking for some regex to find a list of strings in the body,
    > independent where they are and so on.


    Sounds more like you're looking for a meta rule.

    perldoc Mail::SpamAssassin::Conf
    and search for "meta"

    > The most common phrases are: nice girl, good looking, chat with you, nice
    > pics, videos


    > So if at least three of them hit, the rule should match.


    Do you mean something like the below (untested) rules?

    body __TB_1 /\bnice girl\b/i
    body __TB_2 /\bgood looking\b/i
    body __TB_3 /\bchat with you\b/i
    body __TB_4 /\bnice pics\b/i
    body __TB_5 /\bvideos\b/i
    meta TB (__TB_1+__TB_2+__TB_3+__TB_4+__TB_5)>2
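
    One rough way to sanity-check the phrase list before loading it into
    SpamAssassin is to run the same patterns over the sample text. The
    sketch below uses Python's re module as a stand-in for Perl regexes
    (for these simple literals with \b they should behave alike); the
    names PHRASES, count_hits and would_fire are made up for the example.
    It assumes the body has already been decoded and flattened to one line
    of text, and it matches case-insensitively because the sample
    capitalizes "Nice Girl".

```python
import re

# The five phrases from the original question.
PHRASES = ["nice girl", "good looking", "chat with you", "nice pics", "videos"]

# One word-bounded, case-insensitive pattern per phrase.
PATTERNS = [re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE)
            for p in PHRASES]

def count_hits(text):
    """Count how many distinct phrases occur at least once in text."""
    return sum(1 for pat in PATTERNS if pat.search(text))

def would_fire(text, threshold=2):
    """Mimic the meta rule: fire when more than `threshold` phrases hit."""
    return count_hits(text) > threshold

SAMPLE = ("i am Nice Girl good looking girl who is looking to chat with you. "
          "email me back at szCic@officiallam.com "
          "i will reply back with some really nice pics or skype realtime videos.")

print(count_hits(SAMPLE), would_fire(SAMPLE))  # 5 True
```

    This only checks the regexes and the threshold; it says nothing about
    how the message is decoded or scored by a real SpamAssassin setup.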

    Regards
    /Jonas

    --
    Jonas Eckerman, FSDB & Fruktträdet
    http://whatever.frukt.org/
    http://www.fsdb.org/
    http://www.frukt.org/

