sed/awk question - SCO
sed/awk question
Hello SCO folks,
Is there a way to take the following lines:
100001 ASCII_STRING 1000
999999 ASCII STRING 99999
and have output:
100001 1000
999999 99999
       ^^^^^------ 1000 - 99999 acceptable (4 to 5 digits only)
^^^^^^------------- 100001 - 999999 acceptable (6 digits only)
*ALL* non-numeric characters must be ignored:
123456 ASCII STRING[* space " , # ~ ! etc ]55555[* space " , # ~ ! etc ]
should end up with:
123456 55555
PS:
123456 ASCII_STRING, 555.222.1111 (phone numbers should be ignored)
123456 ASCII STRING 555-222-1111 (phone numbers should be ignored)
I got it as far as the top example, but trying to 'set' the line is
not reliable with all the scenarios. Any feedback is appreciated.
- Jeff H
-
Re: sed/awk question
On Sun, Feb 24, 2008, Jeff Hyman wrote:
>Hello SCO folks,
>
>Is there a way to take the following lines:
>100001 ASCII_STRING 1000
>999999 ASCII STRING 99999
>and have output:
>100001 1000
>999999 99999
>       ^^^^^------ 1000 - 99999 acceptable (4 to 5 digits only)
>^^^^^^------------- 100001 - 999999 acceptable (6 digits only)
>
>*ALL* non-numeric characters must be ignored:
>123456 ASCII STRING[* space " , # ~ ! etc ]55555[* space " , # ~ ! etc ]
>should end up with:
>123456 55555
>
>PS:
>123456 ASCII_STRING, 555.222.1111 (phone numbers should be ignored)
>123456 ASCII STRING 555-222-1111 (phone numbers should be ignored)
I would use Python (or perhaps Perl). This should work:
#!/usr/bin/env python
import re, fileinput, sys
# This pattern matches 6 digits, whitespace, anything, whitespace,
# then 4 or 5 digits (optionally followed by whitespace) at end of line.
matchPattern = re.compile(r'([0-9]{6})\s.*\s([0-9]{4,5})\s*$')
for line in fileinput.input():
    R = matchPattern.match(line.rstrip('\n'))
    if R:
        print('\t'.join(R.groups()))
sys.exit(0)
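Bill's pattern expects whitespace on both sides of the second number, so a line like the [* space " , # ~ ! etc ]55555[* space " , # ~ ! etc ] case from the question, where punctuation touches the number, is dropped. Below is a minimal sketch that relaxes this. It assumes (this is not stated in the thread) that a phone number is any digit group joined to another digit group by '-' or '.'. The name extract is a hypothetical helper, and lines without an acceptable second number are skipped, as in Bill's script.

```python
import re

# Exactly six digits at the start of the line (a seventh digit rejects it).
FIRST = re.compile(r'([0-9]{6})(?![0-9])')
# Four or five digits that do not touch other digits and are not joined to
# another digit group by '-' or '.', which is how this sketch assumes
# phone numbers such as 555-222-1111 or 555.222.1111 look.
SECOND = re.compile(r'(?<![0-9])(?<![0-9][-.])([0-9]{4,5})(?![-.]?[0-9])')


def extract(line):
    """Return 'NNNNNN NNNNN' for an acceptable line, else None (hypothetical helper)."""
    head = FIRST.match(line)
    if head is None:
        return None
    # Search only after the leading field; any punctuation may surround it.
    tail = SECOND.search(line, head.end())
    if tail is None:
        return None
    return head.group(1) + ' ' + tail.group(1)
```

Dropped into the same fileinput loop as Bill's script, printing extract(line) for every line where it is not None gives the requested two-column output.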
Bill
--
INTERNET: bill@celestial.com Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
FAX: (206) 232-9186 Mercer Island, WA 98040-0820; (206) 236-1676
Now if there is one thing that we do worse than any other nation, it is
try and manage somebody else's affairs.
Will Rogers
-
Re: sed/awk question
Jeff Hyman wrote:
> Hello SCO folks,
>
> Is there a way to take the following lines:
> 100001 ASCII_STRING 1000
> 999999 ASCII STRING 99999
> and have output:
> 100001 1000
> 999999 99999
>        ^^^^^------ 1000 - 99999 acceptable (4 to 5 digits only)
> ^^^^^^------------- 100001 - 999999 acceptable (6 digits only)
Jeff,
Not really enough information to give you a complete answer.
On the surface, the following should do what you want:
sed 's/  *.*  */ /' input_file > output_file
From the examples you give, "ASCII_STRING" is separated from the first
and last blocks of numbers by one or more spaces, and there are no
spaces after the last block of numbers:
100001 ASCII_STRING 1000$ <-- The $ marks the end of the line
The search string above is /<space><space>*.*<space><space>*/, i.e.
one or more spaces, anything, then one or more spaces. This catches
single or multiple spaces between the first block of numbers,
"ASCII_STRING", and the last block of numbers. Because .* is greedy,
the match runs from the first space to the last one on the line.
>
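To see why the doubled space in the pattern matters, here is the same substitution rendered with Python's re module. This is a sketch; collapse and collapse_single_space are hypothetical names. count=1 mirrors sed replacing only the first match on each line. POSIX sed picks the leftmost-longest match while Python backtracks, but for this pattern both select the same span. With single spaces, every part of the pattern can match nothing, so the match starts at column 0 and the whole line collapses to one space.

```python
import re


def collapse(line):
    # s/  *.*  */ /: a space followed by " *" (one or more spaces),
    # then anything, then one or more spaces; count=1 matches sed without /g.
    return re.sub(r'  *.*  *', ' ', line, count=1)


def collapse_single_space(line):
    # s/ *.* */ /: every part can match the empty string, so the first
    # match starts at column 0 and covers the whole line.
    return re.sub(r' *.* *', ' ', line, count=1)


for sample in ('100001 ASCII_STRING 1000',
               '999999 ASCII STRING 99999',
               '123456 ASCII STRING 555-222-1111'):
    print(repr(collapse(sample)), repr(collapse_single_space(sample)))
```

The third sample shows that a phone number in the last field passes through untouched, so this substitution alone does not cover the PS cases.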
> *ALL* non-numeric characters must be ignored:
> 123456 ASCII STRING[* space " , # ~ ! etc ]55555[* space " , # ~ ! etc ]
> should end up with:
> 123456 55555
>
> PS:
> 123456 ASCII_STRING, 555.222.1111 (phone numbers should be ignored)
> 123456 ASCII STRING 555-222-1111 (phone numbers should be ignored)
>
> I got it as far as the top example, but trying to 'set' the line is
> not reliable with all the scenarios. Any feedback is appreciated.
>
> - Jeff H
>
>
--
Steve Fabac
S.M. Fabac & Associates
816/765-1670
-
Re: sed/awk question
On Feb 24, 3:23 pm, "Steve M. Fabac, Jr." wrote:
> Jeff Hyman wrote:
> > Hello SCO folks,
>
> > Is there a way to take the following lines:
> > 100001 ASCII_STRING 1000
> > 999999 ASCII STRING 99999
> > and have output:
> > 100001 1000
> > 999999 99999
> >        ^^^^^------ 1000 - 99999 acceptable (4 to 5 digits only)
> > ^^^^^^------------- 100001 - 999999 acceptable (6 digits only)
>
> Jeff,
>
> Not really enough information to give you a complete answer.
>
> On the surface, the following should do what you want:
>
> sed 's/  *.*  */ /' input_file > output_file
>
> From the examples you give, "ASCII_STRING" is separated from the first
> and last blocks of numbers by one or more spaces, and there are no
> spaces after the last block of numbers:
>
> 100001 ASCII_STRING 1000$ <-- The $ marks the end of the line
>
> The search string above is /<space><space>*.*<space><space>*/, i.e.
> one or more spaces, anything, then one or more spaces. This catches
> single or multiple spaces between the first block of numbers,
> "ASCII_STRING", and the last block of numbers. Because .* is greedy,
> the match runs from the first space to the last one on the line.
>
> > *ALL* non-numeric characters must be ignored:
> > 123456 ASCII STRING[* space " , # ~ ! etc ]55555[* space " , # ~ ! etc ]
> > should end up with:
> > 123456 55555
>
> > PS:
> > 123456 ASCII_STRING, 555.222.1111 (phone numbers should be ignored)
> > 123456 ASCII STRING 555-222-1111 (phone numbers should be ignored)
>
> > I got it as far as the top example, but trying to 'set' the line is
> > not reliable with all the scenarios. Any feedback is appreciated.
>
> > - Jeff H
>
> --
> Steve Fabac
> S.M. Fabac & Associates
> 816/765-1670
Your constraints aren't well defined.
But if the input is regular and you want to print the warnings shown
on an unspecified SCO system without installing new software, then awk
is the thing to use, perhaps preprocessed with sed as Steve suggests.
E.g.,
sed 's/  *.*  */ /' input_file | awk '{ print $1 " " $2
    if ($1 !~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/) print "6 digits only"
    if ($2 !~ /^[0-9][0-9][0-9][0-9][0-9]?$/) print "4 or 5 digits only" }'
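The same checks written in Python, in the spirit of Bill's script, might look like the sketch below. check is a hypothetical helper that takes the first and last whitespace-separated fields, so it also works on lines that have not been through sed. It tests digit counts rather than numeric ranges, which is an assumption about what the OP means by "4 to 5 digits".

```python
import re

SIX_DIGITS = re.compile(r'[0-9]{6}')
FOUR_OR_FIVE_DIGITS = re.compile(r'[0-9]{4,5}')


def check(line):
    """Return [output_line, *warnings] for one input line (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 2:
        return ['malformed line: ' + line.strip()]
    # First and last fields, so any text in between is skipped.
    first, second = fields[0], fields[-1]
    result = [first + ' ' + second]
    # fullmatch: the whole field must be digits, with the required count.
    if not SIX_DIGITS.fullmatch(first):
        result.append('6 digits only')
    if not FOUR_OR_FIVE_DIGITS.fullmatch(second):
        result.append('4 or 5 digits only')
    return result
```

Reading lines from sys.stdin and printing '\n'.join(check(line)) for each one reproduces the awk output, one warning per line.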
--RLR