grep like molasses...

Thread: grep like molasses...

  1. grep like molasses...


    Has anybody else noticed an incredible slowness in grep 2.5.1? (On RedHat
    9, but I don't know how general this is)

    I have a spam filter (of my own) that generates a log looking like this:

    .....
    Subj: =?ISO-2022-JP?B?GyRCQmZPUSQsRnxLXCQsYkMwJiRHJDkkaBsoQg==?=( :-|)
    Probable Spam! (p=1.0000)
    Subj: Re: Visit
    Looks OK (p=0.9999)
    .....

    (The first will have gone to a spam-can, the second to my mailbox.)

    And I have a script that simply counts the entries of each type in the log:

    #! /bin/sh
    echo -n 'Accepted:'
    grep -c 'Looks OK' ~/Mail/spam_log
    echo -n 'Rejected:'
    grep -c 'Probable Spam' ~/Mail/spam_log

    Running this on an almost 8000-line file, the first grep returns a value
    of 160 or so in about 1/2 sec (a bit slow for a 2.4GHz box...), but the
    second one takes nearly 10 sec to return the 3700 or so spam hits!

    I have my own little app ('matt') that can be asked to do the same thing,
    but it's not line-oriented like grep, so it's normally rather slower.
    However, on this machine, if I replace 'grep' by 'matt' in the above
    script, I get the answers instantaneously!

    This phenomenon seems to have been coincident with bringing up this
    new server. I never noticed it on the older (RH 7) box, but it's no
    longer around to check... However, I hauled the log file over to my
    BeOS (:-)) machine that runs grep 2.0, and sure enough it was also
    essentially instantaneous. So something is really odd with this
    installation. Does anyone have any idea what it might be?

    -- Pete --

    --
    ============================================================================
    The address in the header is a Spam Bucket -- don't bother replying to it...
    (If you do need to email, replace the account name with my true name.)

  2. Re: grep like molasses...

    ["Followup-To:" header set to comp.os.linux.misc.]
    On 2006-09-28, Pete wrote:
    > Has anybody else noticed an incredible slowness in grep 2.5.1? (On RedHat
    > 9, but I don't know how general this is)


    > #! /bin/sh
    > echo -n 'Accepted:'
    > grep -c 'Looks OK' ~/Mail/spam_log
    > echo -n 'Rejected:'
    > grep -c 'Probable Spam' ~/Mail/spam_log
    >
    > Running this on an almost 8000-line file, the first grep returns a value
    > of 160 or so in about 1/2 sec (a bit slow for a 2.4GHz box...), but the
    > second one takes nearly 10 sec to return the 3700 or so spam hits!


    > This phenomenon seems to have been coincident with bringing up this
    > new server. I never noticed it on the older (RH 7) box, but it's no
    > longer around to check...


    Have you switched to a(n) UTF-8 locale? Once upon a time I found that grep
    was much slower on en_US.UTF-8 than on en_US, so for certain scripts I set
    LANG=en_US or LANG=C first.
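
A minimal sketch of that workaround applied to the counting script quoted above, assuming a POSIX shell. The names `SPAM_LOG` and `count_matches` are hypothetical and do not come from the thread, and `-F` (fixed-string matching) is an addition that fits here only because both patterns are literal text. Setting `LC_ALL=C` on the grep command line overrides LANG and the other LC_* variables for that one process, so the rest of the session keeps its locale.

```shell
#!/bin/sh
# Hypothetical variable: path to the log, defaulting to the file named in the thread.
SPAM_LOG="${SPAM_LOG:-$HOME/Mail/spam_log}"

# Count the lines of file $2 that contain the literal string $1.
# LC_ALL=C applies to this grep invocation only; -F disables regex parsing.
count_matches() {
    LC_ALL=C grep -c -F -- "$1" "$2"
}

# Only report when the log is actually readable.
if [ -r "$SPAM_LOG" ]; then
    printf 'Accepted: %s\n' "$(count_matches 'Looks OK' "$SPAM_LOG")"
    printf 'Rejected: %s\n' "$(count_matches 'Probable Spam' "$SPAM_LOG")"
fi
```

Exporting `LANG=C` at the top of the script, as suggested above, would also work as long as LC_ALL and LC_CTYPE are not set to something else in the environment.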

    --
    Paul Kimoto
    This message was originally posted on Usenet in plain text. Any images,
    hyperlinks, or the like shown here have been added without my consent,
    and may be a violation of international copyright law.

  3. Re: grep like molasses...

    In article ,
    Paul Kimoto wrote:
    >["Followup-To:" header set to comp.os.linux.misc.]

    [Sorry -- restored the original cross-post because it *might* be
    relevant there...]

    >On 2006-09-28, Pete wrote:
    >> Has anybody else noticed an incredible slowness in grep 2.5.1? (On RedHat
    >> 9, but I don't know how general this is)
    >> [....]

    >
    >Have you switched to a(n) UTF-8 locale? Once upon a time I found that grep
    >was much slower on en_US.UTF-8 than on en_US, so for certain scripts I set
    >LANG=en_US or LANG=C first.


    Bingo! I didn't *specifically* switch to UTF-8 locale, but I noticed
    the other day that RedHat made that decision for me... Did what you
    suggest, and grep went back to 'instantaneous'!
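
Since the UTF-8 setting here came from the distribution's defaults rather than from anything set by hand, it can help to check which locale grep will actually inherit. The sketch below assumes a POSIX shell; `effective_ctype` is a hypothetical helper name. It follows the standard precedence: LC_ALL first, then LC_CTYPE, then LANG, skipping unset or empty values. The final fallback to "C" is an assumption about the implementation default.

```shell
#!/bin/sh
# Print the character-type locale a child process such as grep would inherit.
# Precedence: LC_ALL, then LC_CTYPE, then LANG; unset or empty values are skipped.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'   # assumed implementation default
    fi
}

# Flag UTF-8 locales, whether spelled "UTF-8" or "utf8".
case "$(effective_ctype)" in
    *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*)
        echo "UTF-8 locale in effect: $(effective_ctype)" ;;
    *)
        echo "non-UTF-8 locale in effect: $(effective_ctype)" ;;
esac
```

The `locale` command prints the same information in full; the function only isolates the one value that matters for this slowdown.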

    [Slightly ironically, my 'matt' app that I mentioned in the original post
    is by default UTF-8 aware, and had to be switched to '8-bit' because
    the log is not UTF and contains foreign chars in the subject lines.
    I also found that removing such lines from the log made no difference
    to grep.]

    Many thanks for finding the answer.
    -- Pete --

    --
    ============================================================================
    The address in the header is a Spam Bucket -- don't bother replying to it...
    (If you do need to email, replace the account name with my true name.)
