find lines containing identical strings in two files - Unix
-
find lines containing identical strings in two files
I'd like to print out all the lines in the standard output that
contain 6-character strings that begin with a constant substring AA and
that are also present as identical AA1234 strings in a saved file.
For example, if the output, a listing of files on a number of servers
created with the 'find' command, contains 'AA1234', and if the
'AA1234' string exists anywhere in the master file, I'd like to have
the original line displayed in the output (the AA part is a constant,
but 1234 is a variable part).
Any idea what combination of the standard Unix utilities (grep, sed,
awk, find...) could be used to solve the problem?
z.entropic
-
Re: find lines containing identical strings in two files
In article
<73dbdce2-10a2-4b76-9963-1e172f654f43@c65g2000hsa.googlegroups.com>,
"z.entropic" wrote:
> I'd like to print out all the lines in the standard output that
> contain 6-character strings that begin with a constant substring AA and
> that are also present as identical AA1234 strings in a saved file.
>
> For example, if the output, a listing of files on a number of servers
> created with the 'find' command, contains 'AA1234', and if the
> 'AA1234' string exists anywhere in the master file, I'd like to have
> the original line displayed in the output (the AA part is a constant,
> but 1234 is a variable part).
>
> Any idea what combination of the standard Unix utilities (grep, sed,
> awk, find...) could be used to solve the problem?
>
> z.entropic
grep -f master_file listing_file
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***
-
Re: find lines containing identical strings in two files
On Apr 7, 10:04 pm, Barry Margolin wrote:
> grep -f master_file listing_file
Unfortunately, that would be too easy... These are embedded strings
in both lists/files, not separate words, something that would have to
be extracted by, e.g., sed 's/.*\(AA....\).*/\1/' on the fly to create
a list for comparison... The only marker is the AA string followed by
4 characters.
I see now I could use a two- or three-step process: first, extract the
strings from the std output and save them in a tmp file, then extract
the strings from the saved master file, then use grep -f f1 f2 to
compare... I was thinking more of a one-liner, but will go this route
if it's not as simple.
The issue is this: how to extract a string from stdout (using the
above sed expression) and pass it directly to grep for comparison with
another file, and then print the entire input line from which this
expression was originally extracted.
z.entropic
-
Re: find lines containing identical strings in two files
z.entropic wrote:
> Unfortunately, that would be too easy... These are embedded strings
> in both lists/files, not separate words, something that would have to
> be extracted by, e.g, sed 's/.*\(AA....\).*/\1/' on the fly to create
> a list for comparison... The only marker is the AA string followed by
> 4 characters.
>
...
sed 's/.*\(AA....\).*/\1/' saved_file | fgrep -f - search_file
works with GNU fgrep.
awk '
{x=substr($0,index($0,"AA"),6)}
FILENAME=="-" {a[x]=1;next}
a[x]==1
' - search_file < saved_file
works on any system.
--
Michael Tosch @ hp : com
-
Re: find lines containing identical strings in two files
Michael Tosch wrote:
> ...
>
>
> sed 's/.*\(AA....\).*/\1/' saved_file | fgrep -f - search_file
>
> works with GNU fgrep.
>
> awk '
> {x=substr($0,index($0,"AA"),6)}
> FILENAME=="-" {a[x]=1;next}
> a[x]==1
> ' - search_file < saved_file
>
> works on any system.
>
>
Correction:
awk '
(i=index($0,"AA"))==0 {next}
{x=substr($0,i,6)}
FILENAME=="-" {a[x]=1;next}
a[x]==1
' - search_file < saved_file
--
Michael Tosch @ hp : com
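
To see the corrected awk script in action, here is a minimal sketch with invented sample data. The file names saved_file and search_file follow the thread; their contents, the temporary directory and the variable awk_out are assumptions made for illustration, and mktemp -d is assumed to be available.

```shell
# Hypothetical sample data; only the file names come from the thread.
tmpdir=$(mktemp -d)

printf '%s\n' 'project AA1234 active' \
              'no key on this line' \
              'project AA5678 active' > "$tmpdir/saved_file"

printf '%s\n' '/srv1/data/AA1234.dat' \
              '/srv2/data/AA9999.dat' \
              '/srv2/notes.txt' \
              '/srv3/AA5678_old.dat' > "$tmpdir/search_file"

awk_out=$(awk '
  (i = index($0, "AA")) == 0 { next }   # skip lines that contain no AA at all
  { x = substr($0, i, 6) }              # key: AA plus the next four characters
  FILENAME == "-" { a[x] = 1; next }    # standard input (saved_file): remember the key
  a[x] == 1                             # search_file: print the line if its key was seen
' - "$tmpdir/search_file" < "$tmpdir/saved_file")

printf '%s\n' "$awk_out"
rm -rf "$tmpdir"
```

With this data the two search_file lines carrying AA1234 and AA5678 are printed whole; the AA9999 line and the line without any AA string are dropped.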
-
Re: find lines containing identical strings in two files
Michael Tosch wrote:
>> sed 's/.*\(AA....\).*/\1/' saved_file | fgrep -f - search_file
>>
Oh dear, same mistake here: without -n and the p flag, sed also passes
through the lines that contain no AA string.
Correction:
sed -n 's/.*\(AA....\).*/\1/p' saved_file | fgrep -f - search_file
or with a newer GNU grep:
grep -o 'AA....' saved_file | fgrep -f - search_file
--
Michael Tosch @ hp : com
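
The difference between the two sed commands can be checked on a few invented lines. This is only a sketch: the sample text and the variable names sample, plain and keys are made up for illustration.

```shell
# Hypothetical input: two lines with an AA key, one without.
sample=$(printf '%s\n' 'report AA1234 final' 'no key on this line' 'AA5678.dat')

# Without -n/p: the line that does not match is printed unchanged,
# so it would end up in the pattern list as well.
plain=$(printf '%s\n' "$sample" | sed 's/.*\(AA....\).*/\1/')

# With -n and the p flag: only lines where the substitution succeeded are printed.
keys=$(printf '%s\n' "$sample" | sed -n 's/.*\(AA....\).*/\1/p')

printf 'plain:\n%s\n\nkeys:\n%s\n' "$plain" "$keys"
```

Note that the greedy leading .* makes sed keep only the last AA string on a line, whereas grep -o (where available) prints every match on its own line.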
-
Re: find lines containing identical strings in two files
On Apr 8, 3:16 pm, Michael Tosch wrote:
> sed -n 's/.*\(AA....\).*/\1/p' saved_file | fgrep -f - search_file
>
> or with a newer GNU grep:
>
> grep -o 'AA....' saved_file | fgrep -f - search_file
Thanks, Michael; for some reason, the expression simply returns the
extracted string, not the match by grep...
My actual script is like this:
find J: -name "AA*.dat" -mtime -1 -exec ls -l {} ; | sed 's/.*\(AA....\).*/\1/' | grep -f active.list
where J: is a mapped network drive.
The part of the expression until 'grep' extracts 6-character strings
from the output, but then grep fails to match it with the lines in
active.list that contain these strings and prints out the entire
output until '| grep'--and I have no idea why...
z.entropic
-
Re: find lines containing identical strings in two files
z.entropic wrote:
> Thanks, Michael; for some reason, the expression simply returns the
> extracted string, not the match by grep...
I think you have omitted the single "-" after -f.
>
> My actual script is like this:
>
> find J: -name "AA*.dat" -mtime -1 -exec ls -l {} ; | sed 's/.*\(AA....\).*/\1/' | grep -f active.list
>
> where J: is a mapped network drive.
>
> The part of the expression until 'grep' extracts 6-character strings
> from the output, but then grep fails to match it with the lines in
> active.list that contain these strings and prints out the entire
> output until '| grep'--and I have no idea why...
>
> z.entropic
... because grep is searching the pipe, so it cannot return anything more
than the 6-character strings.
Therefore it must be
... | grep -f - active.list
so that grep searches active.list and reads the patterns from "-", the pipe,
if you have GNU grep.
A standard grep treats "-" as a file with that name. In that case
you must use a tempfile:
find ... > tempfile
grep -f tempfile active.list
--
Michael Tosch @ hp : com
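
The tempfile route could look roughly like the sketch below. Everything in it beyond the name active.list is an assumption: the listing function stands in for the real find ... -exec ls -l pipeline, the sample lines are invented, and mktemp -d is assumed to exist on the system in use.

```shell
tmpdir=$(mktemp -d)
keyfile="$tmpdir/keys"

# Hypothetical contents of active.list.
printf '%s\n' 'AA1234 widget' 'AA5678 gadget' 'AA0000 retired' > "$tmpdir/active.list"

# Stand-in for the output of the find command in the thread.
listing() {
  printf '%s\n' '-rw-r--r-- 1 u g 10 Apr 10 J:/a/AA1234.dat' \
                '-rw-r--r-- 1 u g 20 Apr 10 J:/b/AA7777.dat'
}

# Step 1: extract the 6-character keys into a temporary file.
listing | sed -n 's/.*\(AA....\).*/\1/p' > "$keyfile"

# Step 2: use the keys as fixed-string patterns; no "-f -" is needed.
matches=$(grep -F -f "$keyfile" "$tmpdir/active.list")

printf '%s\n' "$matches"
rm -rf "$tmpdir"
```

Whichever file supplies the keys, grep prints lines from the file it searches, so this prints lines of active.list. To print lines of the listing instead, the roles would be swapped: extract the keys from active.list and search a saved copy of the listing.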
-
Re: find lines containing identical strings in two files
On Apr 10, 3:16 pm, Michael Tosch wrote:
> Therefore it must be
> ... | grep -f - active.list
>
> and grep searches the active.list, and "-" is the pipe,
> if you have GNU grep.
> Standard grep treats "-" as a file with that name. In this case
> you must use a tempfile:
> find ... > tempfile
> grep -f tempfile active.list
Aaah... That's what I've been wondering about. Since I use a
standard grep (MKS Dev ToolKit's utilities), it seems that a one-liner
may not be possible.
I appreciate your help.
z.e.
-
Re: find lines containing identical strings in two files
z.entropic wrote:
> Aaah... That's what I've been wondering about. Since I use a
> standard grep (MKS Dev ToolKit's utilities), it seems that a one-liner
> may not be possible.
>
> I appreciate your help.
>
> z.e.
Or take my earlier awk solution: it needs no temp file and is much faster.
I split the commands across several lines for readability; you can put
them all on one line, separated with semicolons.
--
Michael Tosch @ hp : com
-
Re: find lines containing identical strings in two files
On Apr 11, 5:24 am, Michael Tosch wrote:
> Or take my earlier awk solution: it needs no temp file and is much faster.
> I split the commands across several lines for readability; you can put
> them all on one line, separated with semicolons.
...but your awk script again uses gawk, with its hyphen for streaming
input, doesn't it? What if my search_file is the output of the ls
command, perhaps filtered through grep -v?
awk '
(i=index($0,"AA"))==0 {next}
{x=substr($0,i,6)}
FILENAME=="-" {a[x]=1;next}
a[x]==1
' - search_file < saved_file
z.e.
-
Re: find lines containing identical strings in two files
z.entropic wrote:
> On Apr 11, 5:24 am, Michael Tosch wrote:
>>
>> Or take my earlier awk solution; without a temp file and much faster.
> ...but your awk script again uses gawk, with its hyphen for streaming
> input, doesn't it?
I doubt if any version of awk exists that doesn't treat a filename
argument of "-" as meaning standard input. It's required by POSIX,
and even the historical version on Solaris (/bin/awk) does it.
--
Geoff Clare
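
For the case asked about, where the lines to be searched come from a pipeline rather than a file, the operands can simply be swapped: name saved_file first and give "-" second. The following is a sketch under assumptions: the listing function stands in for ls, and the sample data, the grep -v filter and the variable piped_out are invented for illustration.

```shell
tmpdir=$(mktemp -d)

# Hypothetical master file.
printf '%s\n' 'AA1234 widget' 'AA5678 gadget' > "$tmpdir/saved_file"

# Stand-in for the ls output mentioned in the question.
listing() {
  printf '%s\n' 'J:/a/AA1234.dat' 'J:/b/AA7777.dat' 'J:/c/AA5678.bak'
}

piped_out=$(listing | grep -v '\.bak$' | awk '
  (i = index($0, "AA")) == 0 { next }   # skip lines that contain no AA at all
  { x = substr($0, i, 6) }              # key: AA plus the next four characters
  FILENAME != "-" { a[x] = 1; next }    # named file (saved_file): collect keys
  a[x] == 1                             # standard input: print lines whose key is known
' "$tmpdir/saved_file" -)

printf '%s\n' "$piped_out"
rm -rf "$tmpdir"
```

Because awk reads its operands in order, the keys from saved_file are all collected before the first line arrives from the pipe.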
-
Re: find lines containing identical strings in two files
z.entropic wrote:
> I'd like to print out all the lines in the standard output that
> contain 6-character strings that begin with a constant substring AA and
> that are also present as identical AA1234 strings in a saved file.
>
> For example, if the output, a listing of files on a number of servers
> created with the 'find' command, contains 'AA1234', and if the
> 'AA1234' string exists anywhere in the master file, I'd like to have
> the original line displayed in the output (the AA part is a constant,
> but 1234 is a variable part).
>
> Any idea what combination of the standard Unix utilities (grep, sed,
> awk, find...) could be used to solve the problem?
Well, perhaps not precisely the requested tool, but perl will do it
quite nicely:
$ cat saved_file
AA1234
AA5678
$ cat saved_file | ./foo.pl
AA1234
AA5678
$ echo 'AA123456789
> AA234567
> AAx' | ./foo.pl
AA123456789
$ expand -i4 < foo.pl
#!/usr/bin/perl
$^W=1;
use strict;
open(SAVED_FILE,'<','saved_file') or
    die "$0: failed to open saved_file, aborting";
my %matchhash;
while(<SAVED_FILE>){
    chomp;
    #only consider valid strings
    if(/^AA....$/o){
        #just track the string as a key in our hash
        $matchhash{$_}=undef;
    }
};
close(SAVED_FILE) or
    die "$0: failed to close saved_file, aborting";
(! scalar keys %matchhash) &&
    #nothing to do
    exit 0;
while(<>){
    /^AA..../o || next;
    if(exists $matchhash{substr($_,0,6)}){
        print;
    };
};
$