find lines containing identical strings in two files - Unix

  1. find lines containing identical strings in two files

    I'd like to print out all the lines in the standard output that
    contain 6-character strings that begin with a constant substring AA and
    that are also present as identical AA1234 strings in a saved file.

    For example, if the output, a listing of files on a number of servers
    created with the 'find' command, contains 'AA1234', and if the
    'AA1234' string exists anywhere in the master file, I'd like to have
    the original line displayed in the output (the AA part is a constant,
    but 1234 is a variable part).

    Any idea what combination of the standard Unix utilities (grep, sed,
    awk, find...) could be used to solve the problem?

    z.entropic


  2. Re: find lines containing identical strings in two files

    In article
    <73dbdce2-10a2-4b76-9963-1e172f654f43@c65g2000hsa.googlegroups.com>,
    "z.entropic" wrote:

    > Any idea what combination of the standard Unix utilities (grep, sed,
    > awk, find...) could be used to solve the problem?


    grep -f master_file listing_file

    --
    Barry Margolin, barmar@alum.mit.edu
    Arlington, MA
    *** PLEASE don't copy me on replies, I'll read them in the group ***

  3. Re: find lines containing identical strings in two files

    On Apr 7, 10:04 pm, Barry Margolin wrote:
    > grep -f master_file listing_file


    Unfortunately, that would be too easy... These are embedded strings
    in both lists/files, not separate words, and would have to be
    extracted on the fly, e.g., with sed 's/.*\(AA....\).*/\1/', to create
    a list for comparison... The only marker is the AA string followed by
    4 characters.

    I see now I could use a two- or three-step process: first, extract the
    strings from the std output and save them in a tmp file, then extract
    the strings from the saved master file, then use grep -f f1 f2 to
    compare them... I was thinking more of a one-liner, but will go this
    route if there is nothing simpler.

    The issue is this: how to extract a string from stdout (using the
    above sed expression), pass it directly to grep for comparison with
    another file, and then print the entire input line from which the
    string was originally extracted.

    z.entropic

  4. Re: find lines containing identical strings in two files

    z.entropic wrote:
    > Unfortunately, that would be too easy... These are embedded strings
    > in both lists/files, not separate words, something that would have to
    > be extracted by, e.g, sed 's/.*\(AA....\).*/\1/' on the fly to create
    > a list for comparison... The only marker is the AA string followed by
    > 4 characters.

    ...


    sed 's/.*\(AA....\).*/\1/' saved_file | fgrep -f - search_file

    works with GNU fgrep.

    awk '
    {x=substr($0,index($0,"AA"),6)}
    FILENAME=="-" {a[x]=1;next}
    a[x]==1
    ' - search_file < saved_file

    works on any system.


    --
    Michael Tosch @ hp : com

  5. Re: find lines containing identical strings in two files

    Michael Tosch wrote:
    > awk '
    > {x=substr($0,index($0,"AA"),6)}
    > FILENAME=="-" {a[x]=1;next}
    > a[x]==1
    > ' - search_file < saved_file
    >
    > works on any system.


    Correction:

    awk '
    (i=index($0,"AA"))==0 {next}
    {x=substr($0,i,6)}
    FILENAME=="-" {a[x]=1;next}
    a[x]==1
    ' - search_file < saved_file

    --
    Michael Tosch @ hp : com
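A minimal sketch of how the corrected program behaves, wrapped in a shell function so it can be tried on sample files (the name aa_join and its argument order are assumptions, not from the thread). The operand "-" makes awk read saved_file from standard input and record each AA key in the array a; every search_file line whose key is already in a is then printed unchanged. The new first rule skips lines with no "AA": without it, index() returns 0 and substr() builds a key from the start of the line, so two unrelated lines could match each other.

```shell
# aa_join SAVED_FILE SEARCH_FILE   (hypothetical helper name)
# Print each line of SEARCH_FILE whose "AA" + 4-character string also
# appears in SAVED_FILE, using the corrected awk program from this post.
aa_join() {
    awk '
    # Skip any line, in either file, that contains no "AA" at all.
    (i=index($0,"AA"))==0 {next}
    # Key = "AA" plus the 4 characters after it (first AA on the line).
    {x=substr($0,i,6)}
    # Operand "-" is standard input, i.e. SAVED_FILE: record its keys.
    FILENAME=="-" {a[x]=1;next}
    # Otherwise we are reading SEARCH_FILE: print lines with a known key.
    a[x]==1
    ' - "$2" < "$1"
}
```

Usage: aa_join saved_file search_file. Like the original, it takes only the first AA on each line as the key.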

  6. Re: find lines containing identical strings in two files

    Michael Tosch wrote:
    > Michael Tosch wrote:
    >> sed 's/.*\(AA....\).*/\1/' saved_file | fgrep -f - search_file


    Oh dear, same mistake here: lines without "AA" must not pass through.
    Correction:

    sed -n 's/.*\(AA....\).*/\1/p' saved_file | fgrep -f - search_file

    or with a newer GNU grep:

    grep -o 'AA....' saved_file | fgrep -f - search_file

    --
    Michael Tosch @ hp : com
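A small sketch of why -n and the p flag matter (the function names keys_from and match_lines are hypothetical). Plain sed copies every line it cannot substitute to its output unchanged, so a line without "AA", even an empty one, becomes a pattern for the following grep, and an empty pattern matches every line. With -n, only lines where the substitution succeeded are printed. grep -F is the standard spelling of fgrep; reading patterns via "-f -" is assumed to work as it does in GNU grep.

```shell
# keys_from FILE   (hypothetical helper name)
# Print the "AA" + 4-character string of each line of FILE that has one
# (the last one on the line, since the leading .* is greedy). Lines with
# no match print nothing because of -n and the p flag.
keys_from() {
    sed -n 's/.*\(AA....\).*/\1/p' "$1"
}

# match_lines SAVED_FILE SEARCH_FILE   (hypothetical helper name)
# Print the SEARCH_FILE lines containing any key found in SAVED_FILE.
# Assumes a grep that accepts "-f -" for standard input (e.g. GNU grep).
match_lines() {
    keys_from "$1" | grep -F -f - "$2"
}
```

Dropping -n and p from keys_from and adding a blank line to the saved file would make match_lines print every line of the search file.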

  7. Re: find lines containing identical strings in two files

    On Apr 8, 3:16 pm, Michael Tosch wrote:
    > sed -n 's/.*\(AA....\).*/\1/p' saved_file | fgrep -f - search_file
    >
    > or with a newer GNU grep:
    >
    > grep -o 'AA....' saved_file | fgrep -f - search_file


    Thanks, Michael; for some reason, the expression simply returns the
    extracted string, not the lines matched by grep...

    My actual script is like this:

    find J: -name "AA*.dat" -mtime -1 -exec ls -l {} \; | sed 's/.*\(AA....\).*/\1/' | grep -f active.list

    where J: is a mapped network drive.

    The part of the expression before 'grep' extracts the 6-character
    strings from the output, but then grep fails to match them against the
    lines in active.list that contain these strings and simply prints
    everything that reaches '| grep', and I have no idea why...

    z.entropic

  8. Re: find lines containing identical strings in two files

    z.entropic wrote:
    > Thanks, Michael; for some reason, the expression simply returns the
    > extracted string, not the match by grep...


    I think you have omitted the single "-".

    >
    > My actual script is like this:
    >
    > find J: -name "AA*.dat" -mtime -1 -exec ls -l {} \; | sed 's/.*\(AA....\).*/\1/' | grep -f active.list
    >
    > where J: is a mapped network drive.
    >
    > The part of the expression until 'grep' extracts 6-character strings
    > from the output, but then grep fails to match it with the lines in
    > active.list that contain these strings and prints out the entire
    > output until '| grep'--and I have no idea why...


    ... because grep searches the pipe, i.e. it cannot return more than
    the 6-character strings.

    Therefore it must be
    ... | grep -f - active.list

    so that grep searches active.list and "-" stands for the pipe,
    if you have GNU grep.
    Standard grep treats "-" as a file with that name. In that case
    you must use a temp file:
    find ... > tempfile
    grep -f tempfile active.list


    --
    Michael Tosch @ hp : com
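If the grep at hand treats "-f -" as a file literally named "-", the temp file can be hidden inside a shell function so the command still reads as one pipeline. A minimal sketch, assuming a mktemp command is available (a fixed name such as /tmp/keys.$$ would do otherwise); the function name is hypothetical.

```shell
# grep_keys_via_tempfile TARGET   (hypothetical helper name)
# Save the fixed-string patterns arriving on standard input to a temporary
# file, print the TARGET lines that contain any of them, then clean up.
# Uses only "grep -F -f FILE", so "-f -" support is not required.
grep_keys_via_tempfile() {
    keyfile=$(mktemp) || return 2
    cat > "$keyfile"
    grep -F -f "$keyfile" "$1"
    rc=$?                      # keep grep's exit status (0 = match found)
    rm -f "$keyfile"
    return "$rc"
}
```

With the pipeline from this thread it would be used as:
find J: -name "AA*.dat" -mtime -1 -exec ls -l {} \; | sed -n 's/.*\(AA....\).*/\1/p' | grep_keys_via_tempfile active.list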

  9. Re: find lines containing identical strings in two files

    On Apr 10, 3:16 pm, Michael Tosch wrote:
    > Therefore it must be
    > ... | grep -f - active.list
    >
    > and grep searches the active.list, and "-" is the pipe,
    > if you have GNU grep.
    > Standard grep treats "-" as a file with that name. In this case
    > you must use a tempfile:
    > find ... > tempfile
    > grep -f tempfile active.list


    Aaah... That's what I've been wondering about... Since I use a
    standard grep (MKS Dev ToolKit's utilities), it seems that a one-liner
    may not be possible.

    I appreciate your help.

    z.e.

  10. Re: find lines containing identical strings in two files

    z.entropic wrote:
    > Aaah... That's what I've been wondering about.. Since I use a
    > standard grep (MKS Dev ToolKit's utilities), it seems that a one-liner
    > may not be possible.


    Or take my earlier awk solution: no temp file, and much faster.
    I put the commands on multiple lines for readability;
    you can also put them all on one line, separated by semicolons.

    --
    Michael Tosch @ hp : com
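For reference, a sketch of the corrected awk program on a single line, with semicolons separating the pattern-action items. It reads saved_file on standard input and search_file as a named file, just like the multi-line version. The wrapper name aa_join_oneline is hypothetical.

```shell
# aa_join_oneline SAVED_FILE SEARCH_FILE   (hypothetical helper name)
# Same rules as the multi-line program: skip lines without "AA", key on
# "AA" + 4 chars, record keys from stdin ("-"), print matching lines.
aa_join_oneline() {
    awk '(i=index($0,"AA"))==0{next}; {x=substr($0,i,6)}; FILENAME=="-"{a[x]=1;next}; a[x]==1' - "$2" < "$1"
}
```

Outside a function the same line reads: awk '...' - search_file < saved_file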

  11. Re: find lines containing identical strings in two files

    On Apr 11, 5:24 am, Michael Tosch wrote:
    > Or take my ealier awk solution; without a temp file and much faster.
    > I have put commands on multi-line for better readability.
    > You can put all commands in one line - separated with semicolons.


    ...but your awk script again relies on gawk, with its hyphen for
    streaming input, doesn't it? What if my search_file is the output of
    the ls command, perhaps filtered through grep -v?

    awk '
    (i=index($0,"AA"))==0 {next}
    {x=substr($0,i,6)}
    FILENAME=="-" {a[x]=1;next}
    a[x]==1
    ' - search_file < saved_file

    z.e.

  12. Re: find lines containing identical strings in two files

    z.entropic wrote:

    > On Apr 11, 5:24 am, Michael Tosch wrote:
    >>
    >> Or take my ealier awk solution; without a temp file and much faster.


    > ...but your awk script again uses gawk, with its hyphen for streaming
    > input, doens't it?


    I doubt that any version of awk exists that doesn't treat a filename
    argument of "-" as meaning standard input. It's required by POSIX,
    and even the historical version on Solaris (/bin/awk) does it.

    --
    Geoff Clare
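Given that, the lines to search can come straight from a pipe, which covers the ls / grep -v case asked about above. A sketch, assuming the saved file is named first and standard input is given as "-" second; the FILENAME test from the corrected program is simply reversed. The function name is hypothetical.

```shell
# aa_filter_stdin SAVED_FILE   (hypothetical helper name)
# Print the lines arriving on standard input (e.g. ls output, possibly
# filtered through grep -v) whose "AA" + 4-character string also appears
# in SAVED_FILE.
aa_filter_stdin() {
    awk '
    (i=index($0,"AA"))==0 {next}
    {x=substr($0,i,6)}
    # Any operand other than "-" is SAVED_FILE: record its keys.
    FILENAME!="-" {a[x]=1;next}
    # Operand "-" is standard input: print lines whose key was recorded.
    a[x]==1
    ' "$1" -
}
```

Usage: ls -l | grep -v tmp | aa_filter_stdin saved_file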

  13. Re: find lines containing identical strings in two files

    z.entropic wrote:
    > Any idea what combination of the standard Unix utilities (grep, sed,
    > awk, find...) could be used to solve the problem?


    Well, perhaps not precisely the requested tool, but perl will do it
    quite nicely:

    $ cat saved_file
    AA1234
    AA5678
    $ cat saved_file | ./foo.pl
    AA1234
    AA5678
    $ echo 'AA123456789
    > AA234567
    > AAx' | ./foo.pl
    AA123456789
    $ expand -i4 < foo.pl
    #!/usr/bin/perl

    $^W=1;
    use strict;

    open(SAVED_FILE,'<','saved_file') or
        die "$0: failed to open saved_file, aborting";

    my %matchhash;

    while(<SAVED_FILE>){
        chomp;

        #only consider valid strings
        if(/^AA....$/o){
            #just track the string as a key in our hash
            $matchhash{$_}=undef;
        }
    };
    close(SAVED_FILE) or
        die "$0: failed to close saved_file, aborting";

    (! scalar keys %matchhash) &&
        #nothing to do
        exit 0;

    while(<>){
        /^AA..../o || next;
        if(exists $matchhash{substr($_,0,6)}){
            print;
        };
    };
    $
