Basic grep question - Unix


  1. Basic grep question

    How can grep match an empty line that is followed directly by a line
    beginning with the string 'From', as in the mbox format, i.e.:

    ^$

    combined with:

    ^From

    (I guess this is not it: grep ^$^From mbox)

  2. Re: Basic grep question

    Fred wrote:
    > How can grep match an empty line that is followed directly by a line
    > beginning with the string 'From', as in the mbox format, i.e.:
    >
    > ^$
    >
    > combined with:
    >
    > ^From
    >
    > (I guess this is not it: grep ^$^From mbox)


    grep can hardly search for two consecutive lines,
    but sed can:

    sed -n 'x;G;/^\nFrom/p'
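
    A small sketch of how this one-liner behaves. The sample file and its
    name (sample.mbox) are assumptions made up for illustration; only the
    sed command itself comes from the post.

```shell
# Hypothetical sample: only "From alice" directly follows an empty line.
tmp=$(mktemp -d)
printf 'header\n\nFrom alice\nbody\nFrom bob\n' > "$tmp/sample.mbox"

# -n         : suppress the automatic printing of every line
# x          : swap pattern space and hold space (pattern space = previous line)
# G          : append a newline plus the hold space (the current line)
# /^\nFrom/p : print if the previous line was empty and this one starts with From
matches=$(sed -n 'x;G;/^\nFrom/p' "$tmp/sample.mbox")
printf '%s\n' "$matches"
```

    Each hit is printed as the empty line followed by the From line. Because
    the hold space starts out empty, a 'From' on the very first line of the
    file also counts as a match.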


    --
    Michael Tosch @ hp : com

  3. Re: Basic grep question

    Michael Tosch wrote:

    > Fred wrote:
    > > How can grep match an empty line that is followed directly by a line
    > > beginning with the string 'From', as in the mbox format, i.e.:
    > >
    > > ^$
    > >
    > > combined with:
    > >
    > > ^From
    > >
    > > (I guess this is not it: grep ^$^From mbox)

    >
    > grep can hardly search for two consecutive lines,
    > but sed can:
    >
    > sed -n 'x;G;/^\nFrom/p'
    >
    >


    Thanks for the quick tutorial!

    What does -n stand for?

    Also, how could I run this through csplit to create separate files, each
    running from one match up to the next occurrence?

    Any tips would be greatly appreciated!


  4. Re: Basic grep question

    Fred wrote:
    > Michael Tosch wrote:
    >
    >
    >>Fred wrote:
    >>
    >>>How can grep match an empty line that is followed directly by a line
    >>>beginning with the string 'From', as in the mbox format, i.e.:
    >>>
    >>>^$
    >>>
    >>>combined with:
    >>>
    >>>^From
    >>>
    >>>(I guess this is not it: grep ^$^From mbox)

    >>
    >>grep can hardly search for two consecutive lines,
    >>but sed can:
    >>
    >>sed -n 'x;G;/^\nFrom/p'
    >>
    >>

    >
    >
    > Thanks for the quick tutorial!
    >
    > What does -n stand for?
    >
    > Also, how could I run this through csplit to create separate files, each
    > running from one match up to the next occurrence?
    >
    > Any tips would be greatly appreciated!
    >


    Don't use sed for anything that involves multiple lines; use awk. To
    find the pattern you want with awk:

    awk 'empty{if (/^From/) print; empty=0} /^$/{empty=1}' file

    If you want to create separate files starting at each match, this might
    be what you want. It leaves the empty line at the end of the old file;
    adjust it if you want the empty line in the new file or removed from the
    output:

    awk 'BEGIN {out="file"++nr}
    empty {if (/^From/) {close(out); out="file"++nr} empty=0}
    /^$/ {empty=1} {print > out}' file

    Regards,

    Ed.

  5. Re: Basic grep question

    Ed Morton wrote:

    > Fred wrote:
    > > Michael Tosch wrote:
    > >
    > >
    > >>Fred wrote:
    > >>
    > >>>How can grep match an empty line that is followed directly by a line
    > >>>beginning with the string 'From', as in the mbox format, i.e.:
    > >>>
    > >>>^$
    > >>>
    > >>>combined with:
    > >>>
    > >>>^From
    > >>>
    > >>>(I guess this is not it: grep ^$^From mbox)
    > >>
    > >>grep can hardly search for two consecutive lines,
    > >>but sed can:
    > >>
    > >>sed -n 'x;G;/^\nFrom/p'
    > >>
    > >>

    > >
    > >
    > > Thanks for the quick tutorial!
    > >
    > > What does -n stand for?
    > >
    > > Also, how could I run this through csplit to create separate files,
    > > each running from one match up to the next occurrence?
    > >
    > > Any tips would be greatly appreciated!
    > >

    >
    > Don't use sed for anything that involves multiple lines; use awk. To
    > find the pattern you want with awk:
    >
    > awk 'empty{if (/^From/) print; empty=0} /^$/{empty=1}' file
    >
    > If you want to create separate files starting at each match, this might
    > be what you want. It leaves the empty line at the end of the old file;
    > adjust it if you want the empty line in the new file or removed from the
    > output:
    >
    > awk 'BEGIN {out="file"++nr}
    > empty {if (/^From/) {close(out); out="file"++nr} empty=0}
    > /^$/ {empty=1} {print > out}' file


    Yes, I tested it on an example file and found it works exactly as I had
    imagined. For example:

    --- start of file -----

    From asdfsdaf
    kjdfglkjldfg
    dfglkk

    From sdf
    kdsfjkskksf
    From sdfllskdflkdsf

    --- end of file ---

    The awk procedure created 2 files, as the third occurrence of 'From' did
    not have a blank line before it.
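
    A sketch that reproduces this test in a scratch directory. The file name
    sample.mbox and the temporary directory are assumptions; the awk program
    is the one quoted above, with parentheses added around ++nr purely for
    readability.

```shell
# Work in a throwaway directory, since awk writes file1, file2, ... to the cwd.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Sample data from the post: the third "From" has no blank line before it.
printf 'From asdfsdaf\nkjdfglkjldfg\ndfglkk\n\nFrom sdf\nkdsfjkskksf\nFrom sdfllskdflkdsf\n' > sample.mbox

# Start a new output file whenever a "From" line directly follows an empty line.
awk 'BEGIN {out = "file" (++nr)}
empty {if (/^From/) {close(out); out = "file" (++nr)} empty = 0}
/^$/ {empty = 1}
{print > out}' sample.mbox

# Count the pieces that were written.
count=$(ls file[0-9]* | wc -l | tr -d ' ')
echo "files created: $count"
```

    The empty line ends up as the last line of file1, and the unseparated
    third 'From' stays inside file2.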

    However, in running the script for the intended purpose, which is to try
    and break down and ultimately repair what appears to be a faulty mbox file,
    I found that it did not create any separate files at all, in spite of the
    fact that the mbox contains thousands of messages which all begin with
    'From' and with a blank line before.

    However, the mbox comes from a Windows system, so I just tested opening it
    in the unix pico editor which, upon opening the file, displayed a
    notification that the file has been "Converted from DOS and Mac format".
    After simply saving the file with pico, running the awk script successfully
    split the mbox into thousands of files! Now I only need to piece them
    together again and hope that the final Mozilla mail application where the
    file ultimately belongs will then see it as a valid mbox. With your help I
    may be a bit closer to solving the mystery of the corrupt Mozilla mbox!
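
    The pico round trip presumably worked because each "empty" line in the
    DOS file still carried a carriage return, so /^$/ never matched. A sketch
    of doing the same conversion on the command line, assuming the only
    problem is CRLF line endings (the file names are made up):

```shell
# Hypothetical two-message mbox with DOS (CRLF) line endings.
tmp=$(mktemp -d)
printf 'From a\r\nbody\r\n\r\nFrom b\r\nbody\r\n' > "$tmp/dos.mbox"

# With CRLF the blank line is really "\r", so /^$/ finds nothing.
before=$(awk '/^$/ {n++} END {print n+0}' "$tmp/dos.mbox")

# Strip every carriage return, then count the empty lines again.
tr -d '\r' < "$tmp/dos.mbox" > "$tmp/unix.mbox"
after=$(awk '/^$/ {n++} END {print n+0}' "$tmp/unix.mbox")

echo "empty lines before: $before, after: $after"
```

    Note that tr removes every carriage return in the file, not just those at
    line ends, which is usually harmless for text mail but is still a blunt
    tool.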


  6. Re: Basic grep question

    [...]

    > may be a bit closer to solving the mystery of the corrupt Mozilla mbox!


    maybe not :-(

    Yet after re-combining the files into a single mbox, something still
    tricks Mozilla's (.msf) index into thinking there are only about 20
    messages in the mbox. Could some crummy characters affect the way the
    Mozilla mail application parses the entire mbox? After all, the mbox works
    fine in MUTT before or after re-saving/un-DOS'ing, awk-splitting and
    re-combining. Maybe some weird characters in the file give Mozilla the
    hiccups, and maybe Mozilla is generally a buggy mail application.



  7. Re: Basic grep question

    Fred wrote:
    > [...]
    >
    >> may be a bit closer to solving the mystery of the corrupt Mozilla mbox!

    >
    > maybe not :-(
    >
    > Yet after re-combining the files into a single mbox, something still
    > tricks Mozilla's (.msf) index into thinking there are only about 20
    > messages in the mbox. Could some crummy characters affect the way the
    > Mozilla mail application parses the entire mbox? After all, the mbox works
    > fine in MUTT before or after re-saving/un-DOS'ing, awk-splitting and
    > re-combining. Maybe some weird characters in the file give Mozilla the
    > hiccups, and maybe Mozilla is generally a buggy mail application.
    >
    >


    1.
    What about deleting the .msf file?
    IMHO Mozilla will create a new one. You only lose information about which
    mails have been read.

    2.
    Watch out for Content-Length: lines in the mail headers.
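
    Some mbox variants rely on Content-Length to find where a message ends,
    so a stale value can hide the messages that follow. Whether Mozilla
    behaves this way is an assumption here, not something confirmed in this
    thread. A quick way to compare the number of 'From ' separators with the
    number of Content-Length lines (sample data and file name are made up):

```shell
# Hypothetical two-message mbox; only the first message has Content-Length.
tmp=$(mktemp -d)
printf 'From a@example.com Mon Jan  1 00:00:00 2007\nContent-Length: 5\n\nhello\n\nFrom b@example.com Mon Jan  1 00:00:01 2007\n\nworld\n' > "$tmp/sample.mbox"

# Count message separators and Content-Length lines.
messages=$(grep -c '^From ' "$tmp/sample.mbox")
with_length=$(grep -c '^Content-Length:' "$tmp/sample.mbox")

echo "messages: $messages, Content-Length lines: $with_length"
```

    If the message count here is in the thousands while Mozilla shows about
    20, the separators themselves are probably fine and the headers are the
    next thing to inspect. The grep also counts any body line that happens to
    start with 'Content-Length:', so treat the second number as approximate.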


    --
    Michael Tosch @ hp : com
