find script? - Redhat

  1. find script?

    I'm in need of a script that would allow me to find the following but
    I've never scripted anything. Hoping to get some help as I just don't
    have time to learn scripting just for this particular task, although
    it's definitely something I want to put on my todo list...

    Find all subfolders with a file.ext1 and file.ext2 but only where the
    letter case of "file" is different. There are three possible ways the
    name can be spelled: file, File and FILE.

    So a subfolder containing both file.ext1 and File.ext2 would be found
    since the case is different, but a subfolder containing FILE.ext1 and
    FILE.ext2 would not, since the case matches.

    Hope that makes sense...

    Thanks,
    Pete.

  2. Re: find script?

    Use the "find" command before going into the scripting world.



  3. Re: find script?

    On Tue, 05 Aug 2008, in the Usenet newsgroup linux.redhat, P1 wrote:

    >I'm in need of script that would allow me to find the following


    Why?

    >but I've never scripted anything. Hoping to get some help as I just
    >don't have time to learn scripting just for this particular task,
    >although it's definitely something I want to put on my todo list...


    [compton ~]$ whatis locate find basename sort uniq
    locate (1) - list files in databases that match a pattern
    find (1) - search for files in a directory hierarchy
    basename (1) - strip directory and suffix from filenames
    sort (1) - sort lines of text files
    uniq (1) - remove duplicate lines from a sorted file
    [compton ~]$

    You don't need a script to do this - it's a "one-liner" where you pipe
    the output of one command to another - in this case, three commands
    via two pipes. The 'find' command can be used in place of 'locate' if
    your system hasn't got an up-to-date locatedb. The 'sort' command
    probably won't be needed.

    When you do get around to learning scripting, start with a HOWTO that
    should be on your system:

    -rw-rw-r-- 1 gferg ldp 31540 Jul 27 2000 Bash-Prog-Intro-HOWTO

    and then go to http://tldp.org/guides.html and pick up copies of the
    "Bash Guide for Beginners" and the "Advanced Bash-Scripting Guide",
    which will steer you on the right way.

    >Find all subfolders with a file.ext1 and file.ext2 but only where the
    >letter case of "file" is different. There are three possible ways the
    >name can be spelled: file, File and FILE.
    >
    >So a subfolder containing both file.ext1 and File.ext2 would be found
    >since the case is different, but a subfolder containing FILE.ext1 and
    >FILE.ext2 would not, since the case matches.
    >
    >Hope that makes sense...


    It doesn't - by the way, we call them directories rather than folders.
    This is the type of question that is posed by an instructor as homework
    to give the student practice in thinking, and using the materials that
    were taught in class. Most people won't answer such questions, because
    the student doesn't learn a thing by having others do their homework
    for them.

    Old guy

  4. Re: find script?

    You're the second person that stated that this sounds like homework.
    Maybe it does, I've never taken a Linux class, been a Windows admin
    for many years, just starting to scratch the surface of Linux. I
    changed the filenames in my description, but the problem is actually a
    real-world one. I migrated a proprietary email system (Gordano GMS)
    from Windows to Linux a couple of weeks ago. Most accounts work fine,
    but a few hundred don't and I found that they don't because the
    INBOX.MBX file case doesn't match the INBOX.IDX file case, so the
    system can't find the index for the mailbox file and hence shows the
    mailbox as empty. I need to find all directories where these two
    files have a different case so that I can fix that problem. In
    Windows, this didn't matter because there is no case sensitivity, but
    obviously in Linux this broke the mailbox. Gordano support is trying
    to figure out how to fix this programmatically also, but they're slow
    as hell so I was trying a different avenue...

    Pete.

    On Wed, 06 Aug 2008 15:01:24 -0500, ibuprofin@painkiller.example.tld
    (Moe Trin) wrote:

    >It doesn't - by the way, we call them directories rather than folders.
    >This is the type of question that is posed by an instructor as homework
    >to give the student practice in thinking, and using the materials that
    >were taught in class. Most people won't answer such questions, because
    >the student doesn't learn a thing by having others do their homework
    >for them.
    >
    > Old guy



  5. Re: find script?

    On Wed, 06 Aug 2008, in the Usenet newsgroup linux.redhat, P1 wrote:

    >You're the second person that stated that this sounds like homework.
    >Maybe it does, I've never taken a Linux class, been a Windows admin
    >for many years, just starting to scratch the surface of Linux.


    If you look through the archives of this, and more usefully, the Usenet
    group 'comp.unix.shell', you'll see this situation quite frequently. My
    neighbor is an instructor at a nearby college, teaching three courses.
    The first is CS-101A Intro To UNIX, and the homework given is often
    artificial, but is intended to expand the skills of the student. By
    the eighth week, some of the problems are posed to cause the student
    to think of rather long "one-liners".

    >Most accounts work fine, but a few hundred don't and I found that they
    >don't because the INBOX.MBX file case doesn't match the INBOX.IDX file
    >case, so the system can't find the index for the mailbox file and hence
    >shows the mailbox as empty. I need to find all directories where these
    >two files have a different case so that I can fix that problem.


    'manually' fixing the problem is a good technique if you are not very
    experienced - you avoid automatically shooting yourself in the foot.

    OK - the tool you are looking for is 'uniq' (man uniq)

    uniq prints the unique lines in a sorted file, discarding all but
    one of a run of matching lines. It can optionally show only lines
    that appear exactly once, or lines that appear more than once.
    uniq requires sorted input because it compares only consecutive lines.

    and the problem then becomes one of creating this list. Now, your original
    post says the files are in the same directories, which suggests we don't
    have to test the path for differences:

    /this/part/is/the/same/INDEX.MBX
    /this/part/is/the/same/iNdEx.IDX
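
    To make the 'uniq' behaviour described above concrete, here is a
    minimal sketch on made-up sample lines (the words are placeholders,
    not mailbox names). It shows why the input has to be sorted first,
    and the difference between 'uniq -u' and 'uniq -d':

```shell
# Sample input: "alpha" appears twice, "beta" and "gamma" once each.
sample='beta
alpha
gamma
alpha'

# uniq compares only neighbouring lines, so the unsorted input looks
# all-unique to it: the two "alpha" lines are not adjacent.
unsorted=$(printf '%s\n' "$sample" | uniq -u)

# After sorting, the duplicates sit next to each other.
once=$(printf '%s\n' "$sample" | sort | uniq -u)      # lines seen exactly once
repeated=$(printf '%s\n' "$sample" | sort | uniq -d)  # one copy of each repeated line

printf 'unsorted, uniq -u:\n%s\n\n' "$unsorted"
printf 'sorted, uniq -u:\n%s\n\n' "$once"
printf 'sorted, uniq -d:\n%s\n' "$repeated"
```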

    OK - the first tool is 'find'. Tell it to start searching at the
    appropriate directory level - perhaps /var/spool/mbox (you don't mention
    the arrangement of the file system) - and to look for regular files
    whose names end in .MBX or .IDX. Printing each file name with the path
    from the search starting point is the default action:

    find /var/spool/mbox -type f \( -name \*.IDX -o -name \*.MBX \)

    If the _pairs_ of files were NOT located in the same directory, but
    rather something like

    /path/to/index/files/INDEX.MBX
    /path/to/mailbox/files/iNdEx.IDX

    you need only tell 'find' to start searching at the appropriate common
    point (/path/to) and add a '-printf "%f\n"' action on the end of the
    'find' command to have it print only 'filename.extension' rather than
    the '/path/to/directory/filename.extension' names.
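
    A small sketch of that '-printf' variant, assuming GNU find (the
    '-printf' action is a GNU extension). The directory and file names
    below are invented for the demonstration:

```shell
# Throwaway tree with the two files in different directories.
root=$(mktemp -d)
mkdir -p "$root/index" "$root/mailbox"
touch "$root/index/INBOX.IDX" "$root/mailbox/inbox.MBX"

# %f prints the file name with the leading directories removed.
names=$(find "$root" -type f \( -name '*.IDX' -o -name '*.MBX' \) -printf '%f\n' | sort)
printf '%s\n' "$names"

rm -rf "$root"
```

    With the directory part gone, the two names can be compared even
    though the files live in different places.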

    Pipe the output to 'cut' to eliminate the extension

    cut -d'.' -f1

    and then do a caseless sort, and pass the result to 'uniq -u'

    sort -f | uniq -u

    So the "one-liner" actually looks like

    find /var/spool/mbox -type f \( -name \*.IDX -o -name \*.MBX \) | cut -d'.' -f1 | sort -f | uniq -u

    That's all one line. Now, hit the man pages for those four commands
    and see what I've done. One _other_ thing I'd look at is to see that
    there are no case errors in the file extensions

    find /var/spool/mbox -type f \( -iname \*.IDX -o -iname \*.MBX \) |
    grep -v .IDX | grep -v .MBX

    which looks for those same file extensions in a caseless manner, then
    culls out those with upper case extensions only.
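
    One caveat with the extension check: in a grep pattern an unescaped
    dot matches any character, so '.IDX' also matches inside a name such
    as 'myIDXnotes'. A hedged variant that escapes the dot and anchors
    the pattern to the end of the line is sketched below. The function
    name and the sample file names are made up for illustration, and
    '-iname' is assumed to be available (it is in GNU find):

```shell
# List files whose .idx/.mbx extension is not entirely upper case.
find_bad_extension_case() {
    find "$1" -type f \( -iname '*.IDX' -o -iname '*.MBX' \) |
        grep -v -e '\.IDX$' -e '\.MBX$'
}

# Throwaway tree: one good extension, two with lower case letters.
root=$(mktemp -d)
touch "$root/INBOX.MBX" "$root/INBOX.idx" "$root/myIDXnotes.Mbx"

odd=$(find_bad_extension_case "$root" | sort)
printf '%s\n' "$odd"

rm -rf "$root"
```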

    A minor caution - running these commands MAY take a lot of resources,
    depending on how large an area you have to search. Try to schedule this
    for when the system isn't overly busy.
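
    Before pointing the one-liner at a large spool, it can be tried on a
    small throwaway tree. This is only a sketch: the function name and
    the 'alice' and 'bob' directories are invented, and the temporary
    directory is created with a dot-free name because 'cut' splits at
    the first dot anywhere in the path:

```shell
# The one-liner from above, wrapped in a function for reuse.
find_case_mismatches() {
    find "$1" -type f \( -name '*.IDX' -o -name '*.MBX' \) |
        cut -d'.' -f1 | sort -f | uniq -u
}

# A dot-free temporary directory, so cut removes only the extension.
root=$(mktemp -d /tmp/mboxdemoXXXXXX)
mkdir -p "$root/alice" "$root/bob"
touch "$root/alice/INBOX.MBX" "$root/alice/INBOX.IDX"   # case matches
touch "$root/bob/INBOX.MBX" "$root/bob/inbox.IDX"       # case differs

result=$(find_case_mismatches "$root")
printf '%s\n' "$result"

rm -rf "$root"
```

    Only the two 'bob' entries should be printed, since the 'alice'
    pair collapses to two identical lines that 'uniq -u' drops. Note
    that a file with no partner at all would be listed as well.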

    >In Windows, this didn't matter because there is no case sensitivity,
    >but obviously in Linux this broke the mailbox.


    Can't imagine why the authors would have decided mixing case would have
    been a good idea just because you can get away with it - but what-ever.

    >Gordano support is trying to figure out how to fix this programmatically
    >also, but they're slow as hell so I was trying a different avenue...


    I hate to tell you this, but we have a cooperative program with several
    universities/colleges providing "work experience" positions - the
    "summer intern" type of job. For the *nix related positions, I expect
    the interns to be able to whip out that 'one-liner' in under five
    minutes. Graduates of that 'CS-101A Intro' class would be _aware_ of
    the solution, but it might take them an hour or two to get it right.
    By the way, this is only one of several ways the problem could have
    been solved - that's a problem and a feature of *nix.

    Old guy

  6. Re: find script?

    On Thu, 07 Aug 2008 15:06:01 -0500, Moe Trin typed this message:

    > find /var/spool/mbox -type f \( -name \*.IDX -o -name \*.MBX \) | cut
    > -d'.' -f1 | sort -f | uniq -u
    >
    > That's all one line. Now, hit the man pages for those four commands and
    > see what I've done. One _other_ thing I'd look at is to see that there
    > are no case errors in the file extensions
    >
    > find /var/spool/mbox -type f \( -iname \*.IDX -o -iname \*.MBX \) |
    > grep -v .IDX | grep -v .MBX
    >


    Ordinarily, I wouldn't comment but ....
    the above '| grep -v .IDX | grep -v .MBX' would only eliminate *.IDX and
    *.MBX, but I think the OP actually wanted to include *Mbx, *MBX,
    *mBX, *mbX, for example.


    Also, the cut -d'.' -f1 would produce
    /home/moetrim/file, /home/moetrim/file1, /home/moetrim/file2/not, etc.
    for the files
    /home/moetrim/file.is.longer.MBx, /home/moetrim/file1.not.that.1.mbx
    and /home/moetrim/file2/not.MbX
    and I think the OP wanted just
    /home/moetrim/
    /home/moetrim/file2
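
    A rough sketch of a variant that prints only the directories, under
    a few assumptions: extensions may be in any case, only the last
    extension is stripped, and no path contains a newline. The function
    name and the sample tree are made up. A file with no partner at all
    also gets its directory listed:

```shell
# Print each directory holding an .mbx/.idx file whose base name has no
# identically-cased partner.
list_mismatched_dirs() {
    find "$1" -type f \( -iname '*.mbx' -o -iname '*.idx' \) |
        while IFS= read -r path; do
            printf '%s\n' "${path%.*}"    # drop only the last extension
        done | sort | uniq -u |
        while IFS= read -r stem; do
            dirname "$stem"               # keep the directory part
        done | sort -u
}

# Throwaway tree: "ok" has matching case, "bad" does not.
root=$(mktemp -d)
mkdir -p "$root/ok" "$root/bad"
touch "$root/ok/INBOX.MBX" "$root/ok/INBOX.IDX"
touch "$root/bad/my.box.MBX" "$root/bad/MY.BOX.IDX"

dirs=$(list_mismatched_dirs "$root")
printf '%s\n' "$dirs"

rm -rf "$root"
```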



  7. Re: find script?

    On Thu, 07 Aug 2008, in the Usenet newsgroup linux.redhat, noi ance
    wrote:

    >Moe Trin typed this message:


    >> P1 wrote:


    >>> Most accounts work fine, but a few hundred don't and I found that
    >>> they don't because the INBOX.MBX file case doesn't match the
    >>> INBOX.IDX file case, so the system can't find the index for the
    >>> mailbox file and hence shows the mailbox as empty. I need to find
    >>> all directories where these two files have a different case so that
    >>> I can fix that problem.


    >> find /var/spool/mbox -type f \( -name \*.IDX -o -name \*.MBX \) | cut
    >> -d'.' -f1 | sort -f | uniq -u


    >> One _other_ thing I'd look at is to see that there are no case errors
    >> in the file extensions
    >>
    >> find /var/spool/mbox -type f \( -iname \*.IDX -o -iname \*.MBX \) |
    >> grep -v .IDX | grep -v .MBX


    >Ordinarily, I wouldn't comment but ....
    >the above '| grep -v .IDX | grep -v .MBX' would only eliminate *.IDX and
    >*.MBX, but I think the OP actually wanted to include *Mbx, *MBX,
    >*mBX, *mbX, for example.


    My interpretation of the need was from an earlier post, where he wrote

    ] Find all subfolders with a file.ext1 and file.ext2 but only where the
    ] letter case of "file" is different. There are three possible ways the
    ] name can be spelled: file, File and FILE.

    Now I took that as meaning that the extension was not the problem, and
    as a result, that '.IDX' and '.MBX' were the desired extensions. Perhaps
    you are interpreting that differently. The effect of the two 'grep -v'
    filters would be to eliminate files with these extensions, and
    anything that is output (which, due to the find, can only be an
    extension with one or more lower case letters) would be something for
    the admin to look at further.

    Note that any "corrections" to the filenames are being done _manually_
    so it is expected that the person doing so is thinking.

    >Also, the cut -d'.' -f1 would produce
    >/home/moetrim/file, /home/moetrim/file1, /home/moetrim/file2/not, etc.
    >for the files
    >/home/moetrim/file.is.longer.MBx, /home/moetrim/file1.not.that.1.mbx
    >and /home/moetrim/file2/not.MbX
    >and I think the OP wanted just
    >/home/moetrim/
    >/home/moetrim/file2


    Mailbox names are generally related to the 'user' name. While RFC2822
    para 3.4.1 (and the earlier RFC0822 para 6.2) merely state that the
    username is a "locally interpreted string" (or 'domain-dependent string'
    in the earlier document), most operating systems specify that the name
    contains alpha-numerics, the dash, and underscore. Dots are STRONGLY
    discouraged because this may confuse some MUAs, as would spaces. If
    the filename does contain dots other than the dot separating the name
    and extension, the 'cut' command could be replaced by a 'sed' command
    at a minor increase in CPU time.
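
    As a sketch of that replacement, one possible 'sed' expression
    removes only the final extension, where 'cut' stops at the first
    dot. The sample path is made up, and the expression is just one way
    to write it:

```shell
path='/home/moetrim/file.is.longer.MBX'

# cut keeps everything before the FIRST dot in the line.
with_cut=$(printf '%s\n' "$path" | cut -d'.' -f1)

# sed removes the LAST dot and whatever follows it, as long as that
# tail contains no further dot or slash.
with_sed=$(printf '%s\n' "$path" | sed 's/\.[^./]*$//')

printf 'cut: %s\n' "$with_cut"
printf 'sed: %s\n' "$with_sed"
```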

    Old guy
