
Thread: splitting file based on field contents

  1. splitting file based on field contents

    I want to split a text file up based on the contents of its first field.

    File is something like
    abcd,123,xyz
    abcd,234,xyz
    abcd,234,pdq
    def,333,aaa
    def,4444,aab
    ghij,333,ddd
    ghij,345,dasfsaf

    and I want to split it up, based on the contents of the first field, with
    each block of matching first fields becoming a separate file. For now, all
    first fields are the same length, but I cannot guarantee that will stay
    true forever, and I will not know the contents of the field ahead of time.

    If possible, I would like to name the files using the contents of that first
    field in the process, but if not, I can handle that myself easily later.

    I thought this was an easy task for csplit, but I did not see any easy way
    to do it in the man page. Any suggestions?

    --
    -Joe Chasan- Magnatech Business Systems, Inc.
    joe - at - magnatechonline -dot- com Hicksville, NY - USA
    http://www.MagnatechOnline.com Tel.(516) 931-4444/Fax.(516) 931-1264

  2. Re: splitting file based on field contents

    Joe Chasan wrote in
    news:20080227104840.A3879@magnatechonline.com:

    > I want to split a text file up based on the contents of its first
    > field. [...]


    Here is a quickie; place it in a progfile and make it executable.

    usage: progfile filetosplit

    It will generate a split.xxxx file for each occurrence of the first field.

    begin----------
    for i in `cat $1 | sed 's/,.*$//' | sort -u`
    do
        grep "^$i," $1 > split.$i
    done
    end-------------
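
    For instance, with the sample data above saved as (hypothetically)
    sample.txt, a run would leave one file per distinct first field:

    $ progfile sample.txt
    $ ls split.*
    split.abcd  split.def  split.ghij
    $ cat split.def
    def,333,aaa
    def,4444,aab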

  3. Re: splitting file based on field contents

    Restrictions on the quickie:

    The first field can be any length, so long as it is at least one character
    long and its contents can be used as a legal Unix filename.

    The first field must end with a comma.


  4. Re: splitting file based on field contents

    Joe Chasan typed (on Wed, Feb 27, 2008 at 10:48:40AM -0500):
    | I want to split a text file up based on the contents of its first
    | field. [...]

    awk -F, '{print > $1}' /input_file
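
    Against the same sample (assuming, again, that it is saved as sample.txt
    rather than /input_file), that leaves one file per distinct first field,
    named after the field itself:

    $ awk -F, '{print > $1}' sample.txt
    $ cat abcd
    abcd,123,xyz
    abcd,234,xyz
    abcd,234,pdq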

    --
    JP

  5. Re: splitting file based on field contents

    On Feb 27, 8:14 am, Marc Champagne wrote:
    > Joe Chasan wrote in news:20080227104840.A3879@magnatechonline.com:
    >
    > > I want to split a text file up based on the contents of its first
    > > field. [...]
    >
    > Here is a quickie; place it in a progfile and make it executable.
    >
    > usage: progfile filetosplit
    >
    > It will generate a split.xxxx file for each occurrence of the first field.
    >
    > begin----------
    > for i in `cat $1 | sed 's/,.*$//' | sort -u`
    > do
    >     grep "^$i," $1 > split.$i
    > done
    > end-------------


    Didn't Unix Review use to feature useless uses of cat? That aside, I
    think the script is clearer and runs faster using "cut":

    for i in `cut -d, -f 1 $1 | sort -u`
    do
    ....
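
    A complete version of that variant, with the loop body carried over
    from the script quoted above, might read:

    for i in `cut -d, -f1 $1 | sort -u`
    do
        grep "^$i," $1 > split.$i
    done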

    --RLR

  6. Re: splitting file based on field contents

    Jean-Pierre Radley wrote in
    news:20080227170510.GB15226@jpradley.jpr.com:

    > awk -F, '{print > $1}' /input_file


    Had never noticed the use of > in awk.

    Hell, you learn something new every day!

    There's ALWAYS more than one way to skin a cat under *nix.

    Marc


  7. Re: splitting file based on field contents


    Marc Champagne wrote:

    > Jean-Pierre Radley wrote:
    >
    >> awk -F, '{print > $1}' /input_file
    >
    > Had never noticed the use of > in awk.
    >
    > There's ALWAYS more than one way to skin a cat under *nix.


    That is really slick, but it just occurred to me to wonder: does that append, like >> in the shell, and if so, will it also create the file if it does not yet exist?

    /me tries it... yeah, it works beautifully: it creates the file and then subsequently appends.
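
    A quick way to see it for yourself, in a scratch directory:

    $ printf 'a,1\na,2\nb,1\n' | awk -F, '{print > $1}'
    $ cat a
    a,1
    a,2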

    --
    Brian K. White brian@aljex.com http://www.myspace.com/KEYofR
    +++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
    filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!





  8. Re: splitting file based on field contents

    Brian K. White wrote (on Wed, Feb 27, 2008 at 08:58:26PM -0500):

    | >> awk -F, '{print > $1}' /input_file
    |
    | That is really slick, but it just occurred to me to wonder: does
    | that append, like >> in the shell?
    |
    | /me tries it... yeah, it works beautifully: it creates the file and
    | then subsequently appends.

    The reason a single > works is that when AWK opens a redirection
    to a file (or to a pipe) it holds it open until it is closed
    explicitly or the program terminates.

    Which introduces a caveat re this solution. The number of redirects
    to files (or to pipes) that can be open at any one time is limited,
    so when that limit is reached, no more can be opened (I don't know
    what happens in that circumstance - whether AWK would quietly go on,
    just not opening any more, or if it would quit with an error).

    From the SCO OSR6 man pages for awk:


    There is a limit to how many files and pipes you can open in an awk
    program (see ``Limitations'' below). Use the close statement to close
    files or pipes:

        close(filename)
        close(command-line)

    Limitations
    ^^^^^^^^^^^
    The following limits exist in this implementation of awk:

         100 fields
        3000 characters per input record
        3000 characters per output record
        3000 characters per field
        3000 characters per printf string
         400 characters per literal string or regular expression
         250 characters per character class
          55 open files or pipes
          ^^^^^^^^^^^^^^^^^^^^^^ (emphasis mine)
        double precision floating point

    Also, the program barfs if it encounters any blank lines, where $1
    would be null (not /dev/null), which you cannot redirect to.
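
    A minimal guard for that is to redirect only when $1 is non-empty,
    e.g.:

    awk -F, '$1 != "" { print > $1 }' input_file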

    Bob

    --
    Bob Stockler +-+ bob@trebor.iglou.com +-+ http://members.iglou.com/trebor

  9. Re: splitting file based on field contents

    On Feb 27, 10:48 am, Joe Chasan wrote:
    > I want to split a text file up based on the contents of its first
    > field. [...]


    I'd really suggest learning a little Perl for this kind of task.
    Perl is ubiquitous now, you can even install it on (ugh) Windows,
    and you hardly even have to think to write programs like this.

    #!/usr/bin/perl
    # The <> reads stdin or a file given on the command line, so if this
    # is "split.pl" then "split.pl yourfile" is how you'd use it.
    while (<>) {
        # Break each line, as it's read, into $stuff[0], $stuff[1], ...
        # based on what's in between the //.
        @stuff = split /,/;
        # Open for append.
        open(O, ">>$stuff[0]");
        print O $_;
        close O;
        # Actually perl will close this when we try to reopen it on the
        # next pass through the loop, so you could leave that line out.
    }

    See "Why I love Perl" at http://aplawrence.com/Unixart/loveperl.html
    also.



  10. Re: splitting file based on field contents

    Top Post . . .

    I just found out what happens with the AWK supplied with SCO
    OSR6 (which doesn't seem to agree exactly with the Limitations
    shown in my previous post, below).

    I made an input file with 130 different first fields (to be sure
    I had enough to test it). I got this error when I ran the one-
    line AWK program:

    awk: aaaat makes too many open files
    input record number 124, file /tmp/nuts.txt
    source line 1 of program << {print > $1} >>

    Bob

    Bob Stockler wrote (on Thu, Feb 28, 2008 at 10:09:02AM -0500):

    | [...]

    --
    Bob Stockler +-+ bob@trebor.iglou.com +-+ http://members.iglou.com/trebor

  11. Re: splitting file based on field contents

    On 28 Feb, 16:34, Bob Stockler wrote:
    > I just found out what happens with the AWK supplied with SCO
    > OSR6 [...]
    >
    > awk: aaaat makes too many open files
    > input record number 124, file /tmp/nuts.txt
    > source line 1 of program << {print > $1} >>


    Install and use gawk: in general, the licensed UNIXes ship shell
    commands that are at least a decade behind the open-source tools.
    SCO, being based on AT&T SysV rather than BSD UNIX, is a prime
    culprit in such instances.
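
    The one-liner itself runs unchanged under gawk, and the gawk manual
    says that when a program needs more simultaneous output files than
    the system allows, it multiplexes the available descriptors rather
    than aborting, so the 55-file ceiling quoted above should not bite:

    gawk -F, '{print > $1}' input_file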

  12. Re: splitting file based on field contents

    Bob Stockler wrote:

    > | | >> awk -F, '{print > $1}' /input_file

    > I just found out what happens with the AWK supplied with SCO
    > OSR6 (which doesn't seem to agree exactly with the Limitations
    > shown in my previous post, below).
    >
    > I made an input file with 130 different first fields (to be sure
    > I had enough to test it). I got this error when I ran the one-
    > line AWK program:
    >
    > awk: aaaat makes too many open files
    > input record number 124, file /tmp/nuts.txt
    > source line 1 of program << {print > $1} >>


    It would help if you used a correct awk program:

    awk -F, '{print > $1; close($1)}' input_file

    >Bela<


  13. Re: splitting file based on field contents

    Bela Lubkin wrote (on Fri, Feb 29, 2008 at 05:39:12PM -0800):

    | > | | >> awk -F, '{print > $1}' /input_file
    |
    | It would help if you used a correct awk program:
    |
    | awk -F, '{print > $1; close($1)}' input_file
    |
    | >Bela<

    But the next time a previously encountered, written-to, and closed
    filename was encountered again, wouldn't it write to the first byte
    of the file rather than appending to it?

    My solution was, in the body of the program, to collect the lines in
    an associative array and then print them in the END section:

    {
        if ( $1 ) {
            if ( array[$1] ) { array[$1] = array[$1] "\n" $0 }
            else             { array[$1] = $0 }
        }
    }
    END {
        for ( name in array ) { print array[name] ; close array[name] }
    }

    Looking at it now, the "close" seems to be superfluous.

    Bob

    --
    Bob Stockler +-+ bob@trebor.iglou.com +-+ http://members.iglou.com/trebor

  14. Re: splitting file based on field contents

    Bob Stockler wrote:

    > | It would help if you used a correct awk program:
    > |
    > | awk -F, '{print > $1; close($1)}' input_file
    >
    > But the next time a previously encountered, written-to, and closed
    > filename was encountered again, wouldn't it write to the first byte
    > of the file rather than appending to it?


    Yes, I wasn't thinking about multiple input lines with the same $1.

    > My solution was, in the body of the program, to collect the lines in
    > an associative array and then print them in the END section:
    >
    > {
    >     if ( $1 ) {
    >         if ( array[$1] ) { array[$1] = array[$1] "\n" $0 }
    >         else             { array[$1] = $0 }
    >     }
    > }
    > END {
    >     for ( name in array ) { print array[name] ; close array[name] }
    > }
    >
    > Looking at it now, the "close" seems to be superfluous.


    I guess there was supposed to be a ">" somewhere, like:

    for ( name in array ) { print array[name] > name ; close name }

    ?

    That would work. awk also has a '>>' operator, you could do:

    awk -F, '{print >> $1; close($1)}' input_file

    That does more close() calls than necessary, which might be a bit of
    a slowdown if you were doing millions of records, but it is not
    important for likely dataset sizes. You would run this in a directory
    that was previously cleaned (unless you _wanted_ to append across runs).

    Next you'll want to make sure $1 is a reasonably legitimate filename.
    Unix isn't too picky: it can't have '\0' in it but neither can awk
    strings, so that's not a problem; and it shouldn't have '/' in it.
    Otherwise someone's going to feed you a dataset where $1 =
    "../../../../etc/passwd" or something nasty like that. It might seem
    useful to be able to have a dataset that's designed to self-distribute
    into subdirectories, but I would just stay away from that. Trying to
    parse $1 into "safe" vs. "unsafe" directory references will get you in
    trouble. Parsing "this has a directory reference, reject it" will do
    until you have a real need for more, which hopefully never comes up.
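
    A minimal sketch of that reject-don't-parse rule, skipping any record
    whose first field is empty or contains a '/':

    awk -F, '$1 == "" || $1 ~ /\// { next }
             { print >> $1; close($1) }' input_file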

    >Bela<


  15. Re: splitting file based on field contents

    Bela Lubkin wrote (on Sat, Mar 01, 2008 at 11:58:47PM -0800):

    | I guess there was supposed to be a ">" somewhere, like:
    |
    | for ( name in array ) { print array[name] > name ; close name }
    |
    | ?

    Yeah . . . too much rush to publish.

    | That would work. awk also has a '>>' operator, you could do:
    |
    | awk -F, '{print >> $1; close($1)}' input_file

    I wasn't aware of the '>>' redirection in awk (and I find it's
    also present in mawk, which is what I usually use). Thanks for
    the enlightenment.

    Bob


    --
    Bob Stockler +-+ bob@trebor.iglou.com +-+ http://members.iglou.com/trebor
