Re: Regular expression to match only strings NOT containing particular words - Unix



  1. Re: Regular expression to match only strings NOT containing particular words

    Jürgen Exner wrote:
    > Dylan Nicholson wrote:
    >> I can write a regular expression that will only match strings that
    >> are NOT the word apple:
    >>
    >> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
    >>
    >> But is there a neater way, and how would I do it to match strings
    >> that are NOT the word apple OR banana? Then what would be needed to
    >> match only strings that do not CONTAIN the word "apple" or "banana"
    >> or "cherry"?

    >
    > !(/apple/ or /banana/ or /cherry/)


    Actually, coming to think of it: there is no good reason to use a RE in the
    first place because you are looking for a literal substring only without any
    of the meta-functionality of REs. The proper tool for that much simpler task
    is index().

    jue
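
A minimal sketch of the substring approach suggested above, written in Python rather than Perl so that it runs as-is. Python's `str.find` stands in for Perl's `index()` here (both return -1 when the substring is absent). The helper name `contains_none` and the word list are made up for illustration.

```python
# Hypothetical word list, taken from the question in this thread.
BAD_WORDS = ("apple", "banana", "cherry")


def contains_none(text, words=BAD_WORDS):
    """Return True when none of the words occurs anywhere in text."""
    # str.find returns -1 when the substring is absent, like Perl's index().
    return all(text.find(word) == -1 for word in words)


if __name__ == "__main__":
    for sample in ("pear and plum", "pineapple juice"):
        print(sample, "->", contains_none(sample))
```

No regex engine is involved, so there are no metacharacters to escape and nothing to backtrack over.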




  2. Re: Regular expression to match only strings NOT containing particular words

    On Oct 20, 2:40 am, "Jürgen Exner" wrote:
    > Jürgen Exner wrote:
    > > Dylan Nicholson wrote:
    > >> I can write a regular expression that will only match strings that
    > >> are NOT the word apple:

    >
    > >> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$

    >
    > >> But is there a neater way, and how would I do it to match strings
    > >> that are NOT the word apple OR banana? Then what would be needed to
    > >> match only strings that do not CONTAIN the word "apple" or "banana"
    > >> or "cherry"?

    >
    > > !(/apple/ or /banana/ or /cherry/)

    >
    > Actually, coming to think of it: there is no good reason to use a RE in the
    > first place because you are looking for a literal substring only without any
    > of the meta-functionality of REs. The proper tool for that much simpler task
    > is index().
    >
    > jue


    Sure, except the regular expression mechanism is already in place as a
    feature of the application. I was just curious if it could be used to
    solve a particular problem.

    Unfortunately "!(/apple/ or /banana/ or /cherry/)" doesn't work with
    Microsoft's .NET regex library.

    Thanks anyway,

    Dylan



  3. Re: Regular expression to match only strings NOT containing particular words

    Dylan Nicholson wrote:
    > On Oct 20, 2:40 am, "Jürgen Exner" wrote:
    >> Actually, coming to think of it: there is no good reason to use a RE
    >> in the first place because you are looking for a literal substring
    >> only without any of the meta-functionality of REs. The proper tool
    >> for that much simpler task is index().

    >
    > Sure, except the regular expression mechanism is already in place as a
    > feature of the application.


    And index() is a function of native Perl itself.

    > Unfortunately "!(/apple/ or /banana/ or /cherry/)" doesn't work with
    > Microsoft's .NET regex library.


    Well, it is native Perl code; there is no need for a .NET regex library.

    jue
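
The Perl expression quoted above does its negation in the host language, not inside the pattern, which is why it cannot be pasted into a field that accepts only a regex. A rough Python equivalent of the same idea, assuming the three words from the question (the names `ANY_WORD` and `lacks_all_words` are illustrative only):

```python
import re

# Assumed word list; re.escape keeps each word literal inside the pattern.
WORDS = ("apple", "banana", "cherry")
ANY_WORD = re.compile("|".join(re.escape(word) for word in WORDS))


def lacks_all_words(text):
    """Match the positive pattern, then invert the result in code."""
    return ANY_WORD.search(text) is None


if __name__ == "__main__":
    for sample in ("pear and plum", "cherry pie"):
        print(sample, "->", lacks_all_words(sample))
```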



  4. Re: Regular expression to match only strings NOT containing particular words

    Hello Dylan,

    > On Oct 20, 2:40 am, "Jürgen Exner" wrote:
    >
    >> Jürgen Exner wrote:
    >>
    >>> Dylan Nicholson wrote:
    >>>
    >>>> I can write a regular expression that will only match strings that
    >>>> are NOT the word apple:
    >>>>
    >>>> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
    >>>>
    >>>> But is there a neater way, and how would I do it to match strings
    >>>> that are NOT the word apple OR banana? Then what would be needed to
    >>>> match only strings that do not CONTAIN the word "apple" or "banana"
    >>>> or "cherry"?
    >>>>
    >>> !(/apple/ or /banana/ or /cherry/)
    >>>

    >> Actually, coming to think of it: there is no good reason to use a RE
    >> in the first place because you are looking for a literal substring
    >> only without any of the meta-functionality of REs. The proper tool
    >> for that much simpler task is index().
    >>
    >> jue
    >>

    > Sure, except the regular expression mechanism is already in place as a
    > feature of the application. I was just curious if it could be used to
    > solve a particular problem.
    >
    > Unfortunately "!(/apple/ or /banana/ or /cherry/)" doesn't work with
    > Microsoft's .NET regex library.



    It isn't ideal, but this will do the trick:

    ^((?!\b(cherry|banana|apple)\b).)*$

    Make sure you set the Singleline option (so that . also matches newlines)
    and unset the Multiline option (so that ^ and $ anchor the whole string
    rather than each line) where appropriate. If the application is under your
    control, it would probably be easier to add a checkbox that inverts the
    match result from success to failure.

    Though as Jue pointed out, it's probably faster and easier to maintain if
    you implement a "bad words" list and use indexOf to see whether any of the
    words occurs somewhere in the string. You might even use \bword\b in a
    regex for that.

    --
    Jesse Houwing
    jesse.houwing at sogeti.nl
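
The lookahead pattern from this post can be tried out with the sketch below. It uses Python's `re` module as a stand-in for the .NET library: `re.DOTALL` plays the role of the Singleline option, and `re.MULTILINE` is left off. The names `NO_WHOLE_WORDS` and `passes` are made up for the example.

```python
import re

# The pattern from the post, unchanged. At every position the negative
# lookahead refuses to continue if one of the whole words starts there.
NO_WHOLE_WORDS = re.compile(
    r"^((?!\b(cherry|banana|apple)\b).)*$",
    re.DOTALL,  # "." also matches newlines; MULTILINE is deliberately off
)


def passes(text):
    """Return True when text contains none of the words as a whole word."""
    return NO_WHOLE_WORDS.search(text) is not None


if __name__ == "__main__":
    for sample in ("pear and plum", "an apple a day", "pineapple juice"):
        print(repr(sample), "->", passes(sample))
```

Because of the `\b` anchors, a string such as "pineapple juice" still passes: "apple" appears only inside a longer word.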



  5. Re: Regular expression to match only strings NOT containing particular words

    On Oct 28, 6:45 am, Jesse Houwing wrote:
    > Hello Dylan,
    >
    > > On Oct 20, 2:40 am, "Jürgen Exner" wrote:

    >
    > >> Jürgen Exner wrote:

    >
    > >>> Dylan Nicholson wrote:

    >
    > >>>> I can write a regular expression that will only match strings that
    > >>>> are NOT the word apple:

    >
    > >>>> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$

    >
    > >>>> But is there a neater way, and how would I do it to match strings
    > >>>> that are NOT the word apple OR banana? Then what would be needed to
    > >>>> match only strings that do not CONTAIN the word "apple" or "banana"
    > >>>> or "cherry"?

    >
    > >>> !(/apple/ or /banana/ or /cherry/)

    >
    > >> Actually, coming to think of it: there is no good reason to use a RE
    > >> in the first place because you are looking for a literal substring
    > >> only without any of the meta-functionality of REs. The proper tool
    > >> for that much simpler task is index().

    >
    > >> jue

    >
    > > Sure, except the regular expression mechanism is already in place as a
    > > feature of the application. I was just curious if it could be used to
    > > solve a particular problem.

    >
    > > Unfortunately "!(/apple/ or /banana/ or /cherry/)" doesn't work with
    > > Microsoft's .NET regex library.

    >
    > It isn't ideal, but this will do the trick:
    >
    > ^((?!\b(cherry|banana|apple)\b).)*$

    Thanks...works great...why do you say it's not ideal? I removed the
    \b's though, as I need to exclude any string that contains "apple",
    regardless of whether it's a separate word.

    >
    > Make sure you set the option SingleLine and unset the option Multiline when
    > appropriate. If the application is under your control, it would probably
    > be easier to add a checkbox which will invert the match result from Success


    Yes, we'll probably do something similar for the next version.

    >
    > Though as Jue pointed out, it's probably faster and easier to maintain when
    > you implement a "bad words" list and use indexOf to see if the string is
    > in there somewhere. You might even use \bword\b in a regex for that.
    >

    If the regex does the job, it's more than adequate for now.


  6. Re: Regular expression to match only strings NOT containing particular words

    Hello Dylan,

    > On Oct 28, 6:45 am, Jesse Houwing wrote:
    >
    >> Hello Dylan,
    >>
    >>> On Oct 20, 2:40 am, "Jürgen Exner" wrote:
    >>>
    >>>> Jürgen Exner wrote:
    >>>>
    >>>>> Dylan Nicholson wrote:
    >>>>>
    >>>>>> I can write a regular expression that will only match strings
    >>>>>> that are NOT the word apple:
    >>>>>>
    >>>>>> ^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$
    >>>>>>
    >>>>>> But is there a neater way, and how would I do it to match strings
    >>>>>> that are NOT the word apple OR banana? Then what would be needed
    >>>>>> to match only strings that do not CONTAIN the word "apple" or
    >>>>>> "banana" or "cherry"?
    >>>>>>
    >>>>> !(/apple/ or /banana/ or /cherry/)
    >>>>>
    >>>> Actually, coming to think of it: there is no good reason to use a
    >>>> RE in the first place because you are looking for a literal
    >>>> substring only without any of the meta-functionality of REs. The
    >>>> proper tool for that much simpler task is index().
    >>>>
    >>>> jue
    >>>>
    >>> Sure, except the regular expression mechanism is already in place as
    >>> a feature of the application. I was just curious if it could be
    >>> used to solve a particular problem.
    >>>
    >>> Unfortunately "!(/apple/ or /banana/ or /cherry/)" doesn't work with
    >>> Microsoft's .NET regex library.
    >>>

    >> It isn't ideal, but this will do the trick:
    >>
    >> ^((?!\b(cherry|banana|apple)\b).)*$
    >>

    > Thanks...works great...why do you say it's not ideal?


    My guess is that it isn't the fastest solution.

    > I removed the
    > \b's though, as I need to exclude any string that contains "apple",
    > regardless of whether it's a separate word.


    OK, I didn't understand that from the original post. In that case you can
    also remove the extra pair of parentheses:

    ^((?!cherry|banana|apple).)*$

    >> Make sure you set the option SingleLine and unset the option
    >> Multiline when appropriate. If the application is under your
    >> control, it would probably be easier to add a checkbox which will
    >> invert the match result from Success

    > Yes, we'll probably do something similar for the next version.
    >
    >> Though as Jue pointed out, it's probably faster and easier to
    >> maintain when you implement a "bad words" list and use indexOf to
    >> see if the string is in there somewhere. You might even use
    >> \bword\b in a regex for that.

    > If the regex does the job, it's more than adequate for now.


    Good. Glad I was of help.

    --
    Jesse Houwing
    jesse.houwing at sogeti.nl
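
A small sketch comparing the patterns discussed in this thread, again using Python's `re` as a stand-in for the .NET engine. `NO_SUBSTRINGS` is the final pattern from the post above. `NOT_EXACT_WORD` is an assumed variant, not posted by anyone in the thread, for the first half of the original question (strings that are not exactly "apple" or "banana"). `HAND_WRITTEN` is the pattern from the first post, included only for comparison. The function names are illustrative.

```python
import re

# Final pattern from the thread: reject any string containing one of the words.
NO_SUBSTRINGS = re.compile(r"^((?!cherry|banana|apple).)*$", re.DOTALL)

# Assumed variant: a single anchored lookahead rejects only strings that
# are exactly one of the words.
NOT_EXACT_WORD = re.compile(r"^(?!(?:apple|banana)$).*$", re.DOTALL)

# Hand-written pattern from the first post, for comparison.
HAND_WRITTEN = re.compile(r"^([^a].*|a[^p].*|ap[^p].*|app[^l].*|apple.+)$")


def has_no_substring(text):
    return NO_SUBSTRINGS.search(text) is not None


def is_not_exact_word(text):
    return NOT_EXACT_WORD.search(text) is not None


if __name__ == "__main__":
    for sample in ("apple", "apples", "pineapple", "app"):
        print(
            repr(sample),
            has_no_substring(sample),
            is_not_exact_word(sample),
            HAND_WRITTEN.search(sample) is not None,
        )
```

One thing the comparison brings out: the hand-written alternation also rejects short prefixes such as "app" and the empty string, because every branch needs at least one more character than the prefix it spells out. The lookahead variant does not have that gap.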


