** Sort Question ** - Unix


  1. ** Sort Question **

    I have a question I hope some of you could shed some light on.
    I was trying to figure out the proper options to the sort command
    to do a multiple key sort on a file of email addresses. The file is
    basically just a list of primary and alternate email addresses
    separated by whitespace (tab or multiple spaces). Here is an example
    file.

    bob@hotmail.com bigman@yahoo.com
    tony@juno.com patriotsfan@google.com
    adrian@netscape.com pizzaboy@yahoo.com
    stacy@netscape.com redrider@yahoo.com
    johnboy@netscape.com alex@yahoo.com

    Actually I'm only interested in sorting on the primary addresses but
    the first sort key would be the domain and the second key would be the
    user.

    I'm having problems with key 1. Key 1 should begin at the first @
    symbol and extend to the first whitespace, but I think the problem is
    that it is looking at everything to the second occurrence of the @
    symbol. That is what I wasn't sure how to fix. I guess my question is:
    can you, and if so how do you, tell the key to begin at the delimiter
    and end at the first whitespace character? If possible, I would like to
    avoid using the older origin-zero syntax, but if I must, that is fine.

    This is using version 4.5.3 of the sort from the FSF on Red Hat 9
    running bash 2.05b.

    cheers,
    validus
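
The symptom described in this post can be reproduced with a small sketch. The post does not show the command that was actually tried, so the `sort -t@ -k2,2 -k1,1` invocation below is an assumption, and the variable names `data` and `naive` are made up for illustration. With `@` as the separator, field 2 runs up to the next `@`, so it includes the whitespace and the alternate user name.

```shell
# Sample data from the post, with a single space between the two columns.
data='bob@hotmail.com bigman@yahoo.com
tony@juno.com patriotsfan@google.com
adrian@netscape.com pizzaboy@yahoo.com
stacy@netscape.com redrider@yahoo.com
johnboy@netscape.com alex@yahoo.com'

# Assumed attempt: split on '@', sort on field 2 (domain), then field 1 (user).
# Field 2 is really "netscape.com pizzaboy", "netscape.com alex", and so on,
# so ties within a domain are broken by the ALTERNATE user, not the primary.
naive=$(printf '%s\n' "$data" | LC_ALL=C sort -t@ -k2,2 -k1,1)
printf '%s\n' "$naive"
```

Within netscape.com, johnboy ends up ahead of adrian because "alex" sorts before "pizzaboy", which is probably not what was wanted.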


  2. Re: ** Sort Question **

    On Thu, 21 Apr 2005 18:48:08 -0700, validus1 wrote:

    > I have a question I hope some of you could shed some light on.
    > I was trying to figure out the proper options to the sort command
    > to do a multiple key sort on a file of email addresses. The file is
    > basically just a list of primary and alternate email addresses
    > separated by whitespace (tab or multiple spaces). Here is an example
    > file.
    >
    > bob@hotmail.com bigman@yahoo.com
    > tony@juno.com patriotsfan@google.com
    > adrian@netscape.com pizzaboy@yahoo.com
    > stacy@netscape.com redrider@yahoo.com
    > johnboy@netscape.com alex@yahoo.com
    >
    > Actually I'm only interested in sorting on the primary addresses but
    > the first sort key would be the domain and the second key would be the
    > user.
    >
    > I'm having problems with key 1. Key 1 should begin at the first @
    > symbol and extend to the first whitespace, but I think the problem is
    > that it is looking at everything to the second occurrence of the @
    > symbol. That is what I wasn't sure how to fix. I guess my question is
    > can or how do you tell the key to begin at the delimiter and end at the
    > first whitespace character? If possible, I would like to avoid using
    > the older origin-zero syntax
    > but if I must, that is fine.
    >
    > This is using version 4.5.3 of the sort from the FSF on Red Hat 9
    > running bash 2.05b.


    Unfortunately sort only allows a single character to be specified as the
    field separator, with some special case code for '\0' and nothing
    specified. So you have to adapt your data to sort.

    There are 2 approaches
    1) Change your data so it uses a single separator character.
    The most obvious transformation is to change the spaces/tabs into another
    '@' character, then tell sort to split on '@' with -t.


    tr -s ' \t' '@@' < data | sort -t@ -k2,2 -k1,1 | sed 's/@/ /2'

    You can do more fancy stuff to get the output into neater columns if you want.

    2) Add extra data, so it has the information in the manner you want, sort,
    and then remove the extra data.

    sed 's/^\([^@]*@\)\([^ ]*\)/\2 \1&/' data | sort | sed 's/[^@]*@//'
    (the second bracket expression, [^ ], contains a space and a tab character)

    Icarus
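
Both approaches from this reply can be tried side by side. This is a sketch under some assumptions: the sample data mixes tabs and multiple spaces, the names `data`, `tab`, `one` and `two` are made up for illustration, and `LC_ALL=C` is set only to make the ordering predictable.

```shell
# Sample data from the thread, separated by tabs or runs of spaces.
data=$(
  printf 'bob@hotmail.com  bigman@yahoo.com\n'
  printf 'tony@juno.com\tpatriotsfan@google.com\n'
  printf 'adrian@netscape.com   pizzaboy@yahoo.com\n'
  printf 'stacy@netscape.com\tredrider@yahoo.com\n'
  printf 'johnboy@netscape.com  alex@yahoo.com\n'
)
tab=$(printf '\t')   # a literal tab for use inside the sed bracket expression

# Approach 1: squeeze each run of blanks into one '@', sort on '@'-separated
# fields (domain, then user), then turn the second '@' back into a space.
one=$(printf '%s\n' "$data" | tr -s ' \t' '@@' |
      LC_ALL=C sort -t@ -k2,2 -k1,1 | sed 's/@/ /2')

# Approach 2: prefix each line with "domain user@", sort whole lines,
# then strip everything up to and including the first '@'.
two=$(printf '%s\n' "$data" |
      sed "s/^\([^@]*@\)\([^ $tab]*\)/\2 \1&/" |
      LC_ALL=C sort | sed 's/[^@]*@//')

printf '%s\n' "$one"
printf '%s\n' "$two"
```

Approach 1 normalises the gap between the columns to a single space, while approach 2 leaves the original whitespace untouched.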

  3. Re: ** Sort Question **

    On Fri, 22 Apr 2005 at 01:48 GMT, validus1@gmail.com wrote:
    > I have a question I hope some of you could shed some light on.
    > I was trying to figure out the proper options to the sort command
    > to do a multiple key sort on a file of email addresses. The file is
    > basically just a list of primary and alternate email addresses
    > separated by whitespace (tab or multiple spaces). Here is an example
    > file.
    >
    > bob@hotmail.com bigman@yahoo.com
    > tony@juno.com patriotsfan@google.com
    > adrian@netscape.com pizzaboy@yahoo.com
    > stacy@netscape.com redrider@yahoo.com
    > johnboy@netscape.com alex@yahoo.com
    >
    > Actually I'm only interested in sorting on the primary addresses but
    > the first sort key would be the domain and the second key would be the
    > user.
    >
    > I'm having problems with key 1. Key 1 should begin at the first @
    > symbol and extend to the first whitespace, but I think the problem is
    > that it is looking at everything to the second occurrence of the @
    > symbol. That is what I wasn't sure how to fix. I guess my question is
    > can or how do you tell the key to begin at the delimiter and end at the
    > first whitespace character? If possible, I would like to avoid using
    > the older origin-zero syntax
    > but if I must, that is fine.


    TAB=$'\t' ## Use a literal tab with shells that don't support this.
    awk -F "[@ $TAB]" '{printf "%s\t%s\n", $2, $0}' FILE | sort | cut -f2-

    > This is using version 4.5.3 of the sort from the FSF on Red Hat 9
    > running bash 2.05b.


    --
    Chris F.A. Johnson http://cfaj.freeshell.org/shell
    ===================================================================
    My code (if any) in this post is copyright 2005, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License
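
The awk version is a decorate, sort, undecorate pipeline: the domain is prepended as a tab-separated first column, the lines are sorted, and `cut` removes the added column. A sketch on the sample data follows; the names `data`, `tab` and `sorted` are assumptions for illustration, and the field-separator argument has to be in double quotes so that the shell expands the tab variable.

```shell
# Sample data from the thread, separated by tabs or runs of spaces.
data=$(
  printf 'bob@hotmail.com  bigman@yahoo.com\n'
  printf 'tony@juno.com\tpatriotsfan@google.com\n'
  printf 'adrian@netscape.com   pizzaboy@yahoo.com\n'
  printf 'stacy@netscape.com\tredrider@yahoo.com\n'
  printf 'johnboy@netscape.com  alex@yahoo.com\n'
)
tab=$(printf '\t')   # portable stand-in for $'\t'

# Decorate: with '@', space and tab as separators, $2 is the primary domain.
# Sort:     whole-line sort now orders by domain, then by the original line,
#           which begins with the primary user.
# Undecorate: drop the first tab-separated column.
sorted=$(printf '%s\n' "$data" |
         awk -F "[@ $tab]" '{ printf "%s\t%s\n", $2, $0 }' |
         LC_ALL=C sort | cut -f2-)
printf '%s\n' "$sorted"
```

Because only the added column is cut away, the original spacing inside each line, including any tabs, is preserved.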

  4. Re: ** Sort Question **

    Icarus Sparry wrote:
    > On Thu, 21 Apr 2005 18:48:08 -0700, validus1 wrote:
    >>I was trying to figure out the proper options to the sort command
    >>to do a multiple key sort on a file of email addresses. The file is
    >>basically just a list of primary and alternate email addresses
    >>
    >>Actually I'm only interested in sorting on the primary addresses but
    >>the first sort key would be the domain and the second key would be the
    >>user.

    >
    > There are 2 approaches
    > 1) Change your data so it uses a single separator character.
    > The most obvious transformation is to change the spaces/tabs into another
    > '@' character, then tell sort to split on '@' with -t.
    >
    > tr -s ' \t' '@@' < data | sort -t@ -k2,2 -k1,1 | sed 's/@/ /2'


    Or you can change the at-sign into a space, as in
    sed -e '/@/s// /' data | sort -b -i -k2,2 -k1,1 | sed -e '/ /s//@/'
    ('data' would be the name of the input file)

    Kind regards,


    Daniel von Asmuth
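
The at-sign-to-space variant can be checked the same way. This is a sketch on the sample data, with the made-up names `data` and `sorted`; it relies on sort's default blank-separated fields, so no `-t` option is needed.

```shell
# Sample data from the thread, separated by tabs or runs of spaces.
data=$(
  printf 'bob@hotmail.com  bigman@yahoo.com\n'
  printf 'tony@juno.com\tpatriotsfan@google.com\n'
  printf 'adrian@netscape.com   pizzaboy@yahoo.com\n'
  printf 'stacy@netscape.com\tredrider@yahoo.com\n'
  printf 'johnboy@netscape.com  alex@yahoo.com\n'
)

# Turn the first '@' into a space so user and domain become fields 1 and 2,
# sort on domain then user (-b ignores the leading blank of field 2),
# then turn the first space back into '@'.
sorted=$(printf '%s\n' "$data" |
         sed -e '/@/s// /' |
         LC_ALL=C sort -b -i -k2,2 -k1,1 |
         sed -e '/ /s//@/')
printf '%s\n' "$sorted"
```

The empty pattern in `s// /` reuses the preceding address regex, so each sed command touches only the first match on a line.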

