tar: Want to exclude file path, not files matching a pattern - Unix

  1. tar: Want to exclude file path, not files matching a pattern

    I'm using GNU tar to archive file trees. I have an exclude file
    .../Exclude.lst containing the single line "Log" (no spaces or quotes).
    In a real use of tar, I typically have a much larger number of items
    in the exclude file.

    Say the file tree to tar up consists of the following files and
    directories.

    ./Log
    ./subdir
    ./subdir/Log

    I want to exclude ./Log, but not ./subdir/Log. To test this, I issue

    tar cfX - ../Exclude.lst * | tar tf -

    This gives me only

    ./subdir

    The "Log" in ../Exclude.lst matches

    Log
    subdir/Log

    and the 2nd item is excluded in addition to the first (not what I
    want). I only want to exclude the file whose full path is Log. The
    whole problem stems from the fact that "Log" in Exclude.lst is
    interpreted as a pattern rather than a full path.

    A not-so-good way to solve this is to put "./Log" into ../Exclude.lst,
    and issue

    tar cfX - ../Exclude.lst . | tar tf -

    This sticks an extra "./" in front of the file paths of all archived items.
    It doesn't look as clean, but more seriously, it doesn't work with:

    find * -type f | xargs tar cf Archive.tar

    Being able to issue this archive command is good because it skips all
    the entries for directories alone, which I consider cleaner. The more
    important reason, however, is that it makes this possible:

    tar tf OldArchive.tar | xargs tar cf NewArchive.tar

    If I tried this with an OldArchive.tar that contains entries for
    directory names (e.g. subdir/ ), I would end up archiving some
    portions of my file tree many times. For example, if OldArchive.tar
    contained these entries:

    Log
    subdir/
    subdir/Log

    then I would effectively be issuing

    tar cf NewArchive.tar Log subdir subdir/Log

    NewArchive.tar would end up archiving 2 copies of the subdir/Log --
    the first is due to the argument "subdir", and the 2nd is due to the
    argument subdir/Log.

    To avoid this, I would like to have exclude file entries recognized as
    full paths, not just patterns to match against any portion of a full
    path (or at least the end part). I'm surprised that there doesn't
    seem to be a way to do this. Is there something simple that I'm
    overlooking? Thanks.

  2. Re: tar: Want to exclude file path, not files matching a pattern

    > If I tried this with an OldArchive.tar that contains entries for
    > directory names (e.g. subdir/ ), I would end up archiving some
    > portions of my file tree many times.


    See the tar info documentation and read about the '--no-recursion' flag.
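
    As a rough sketch of how that flag applies to the OldArchive.tar to
    NewArchive.tar case: the member list is fed to tar on stdin with -T -
    rather than through xargs, and --no-recursion is placed before -T
    because newer GNU tar releases treat it as position-sensitive. The
    scratch directory and its contents are made up for illustration.

```shell
# Hypothetical scratch tree mirroring the layout from the question.
work=$(mktemp -d)
mkdir -p "$work/tree/subdir"
echo top > "$work/tree/Log"
echo nested > "$work/tree/subdir/Log"

(
  cd "$work/tree" || exit 1
  # The old archive holds a directory entry (subdir/) as well as the files.
  tar -cf ../OldArchive.tar Log subdir
  # Rebuild from the old member list; --no-recursion keeps the subdir/
  # entry from pulling in subdir/Log a second time.
  tar -tf ../OldArchive.tar | tar -cf ../NewArchive.tar --no-recursion -T -
)

# Count how many times subdir/Log appears in the rebuilt archive.
count=$(tar -tf "$work/NewArchive.tar" | grep -cx 'subdir/Log')
echo "subdir/Log entries: $count"
```

    Reading names with -T - also sidesteps xargs splitting a long list
    across several tar invocations.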

    > To avoid this, I would like to have exclude file entries recognized
    > as full paths, not just patterns to match against any portion of a
    > full path (or at least the end part). I'm surprised that there
    > doesn't seem to be a way to do this. Is there something simple that
    > I'm overlooking? Thanks.


    'man find'

    For example, to exclude "Log" from your archive while still including
    subdir/Log:

    find * -path Log -prune -o -print |
    tar -cz -f archive.tar.gz --no-recursion -T-

    You can also pipe the output of find through grep, etc., if you want
    to perform more complex filtering.
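
    A minimal sketch of the grep variant, assuming the exclude file lists
    exact paths relative to the top of the tree, one per line. It relies
    only on -type and -print from find, so it should not depend on -path
    being available. grep -x -F matches whole lines as fixed strings,
    which makes "Log" match the path Log and nothing else. The scratch
    tree is hypothetical, and file names containing newlines would break
    this.

```shell
# Hypothetical scratch tree and exclude list for illustration.
work=$(mktemp -d)
mkdir -p "$work/tree/subdir"
echo top > "$work/tree/Log"
echo nested > "$work/tree/subdir/Log"
printf 'Log\n' > "$work/Exclude.lst"

filtered_listing=$(
  cd "$work/tree" || exit 1
  # List regular files, strip the leading "./", drop lines that exactly
  # equal an entry in the exclude list, and archive what is left.
  find . -type f -print |
    sed 's|^\./||' |
    grep -vxF -f ../Exclude.lst |
    tar -cf - --no-recursion -T - |
    tar -tf -
)
printf '%s\n' "$filtered_listing"
```

    Because only regular files reach tar, the archive carries no separate
    directory entries.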

    -- Lars

    --
    Lars Kellogg-Stedman <8273grkci8q8kgt@jetable.net>
    This email address will expire on 2005-11-23.


  3. Re: tar: Want to exclude file path, not files matching a pattern

    Lars Kellogg-Stedman wrote:
    >>If I tried this with an OldArchive.tar that contains entries for
    >>directory names (e.g. subdir/ ), I would end up archiving some
    >>portions of my file tree many times.

    >
    > See the tar info documentation and read about the '--no-recursion' flag.
    >
    >>To avoid this, I would like to have exclude file entries recognized
    >>as full paths, not just patterns to match against any portion of a
    >>full path (or at least the end part). I'm surprised that there
    >>doesn't seem to be a way to do this. Is there something simple that
    >>I'm overlooking? Thanks.

    >
    > 'man find'
    >
    > For example, to exclude "Log" from your archive while still including
    > subdir/Log:
    >
    > find * -path Log -prune -o -print |
    > tar -cz -f archive.tar.gz --no-recursion -T-
    >
    > You can also pipe the output of find through grep, etc., if you want
    > to perform more complex filtering.


    Lars,

    That's an interesting way to use tar. You're not using any of the
    hierarchical traversal functionality of "tar", you merely use it as a
    "pack rat", packing away whatever file it is told to by "find". This
    gets around the problem of archiving a file twice, when its name is
    presented to tar, as well as when its directory name is presented to
    tar. It also allows the use of the much more flexible selection
    capability of find (and grep, sed, etc.), as you mention.

    I was actually hoping there was a way to get tar to recognize excluded
    paths without resorting to compound commands, but the above is just as
    good. It also avoids an extra file for exclusion patterns. I suppose
    this is an inherent limitation of tar? I thought that recognizing
    excluded paths would be a pretty basic and handy feature for a "tape"
    archiver, but wasn't sure if it was there and I simply didn't see it.
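
    For what it's worth, GNU tar documents an --anchored option that makes
    exclusion patterns match only from the start of a member name. The
    sketch below assumes a GNU tar build that supports it and that the
    option is given before -X, since it affects the patterns that follow
    it. The scratch tree is made up for illustration.

```shell
# Hypothetical scratch tree and exclude list mirroring the question.
work=$(mktemp -d)
mkdir -p "$work/tree/subdir"
echo top > "$work/tree/Log"
echo nested > "$work/tree/subdir/Log"
printf 'Log\n' > "$work/Exclude.lst"

anchored_listing=$(
  cd "$work/tree" || exit 1
  # --anchored: the pattern "Log" must match at the start of the name,
  # so the top-level Log is excluded while subdir/Log is kept.
  tar --anchored -cf - -X ../Exclude.lst * | tar -tf -
)
printf '%s\n' "$anchored_listing"
```

    This keeps the single-command form and the existing exclude file, with
    no "./" prefix on the archived names.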

    DD

  4. Re: tar: Want to exclude file path, not files matching a pattern

    > That's an interesting way to use tar. You're not using any of the
    > hierarchical traversal functionality of "tar", you merely use it as a
    > "pack rat", packing away whatever file it is told to by "find".


    The find-pipe-to-archiver is an idiom that is used heavily with 'cpio'
    (another archiving program). It fits well into the "use a chain of
    tools, each doing one thing well" philosophy.

    -- Lars

    --
    Lars Kellogg-Stedman <8273grkci8q8kgt@jetable.net>
    This email address will expire on 2005-11-23.


  5. Re: tar: Want to exclude file path, not files matching a pattern

    Lars Kellogg-Stedman wrote:
    >>That's an interesting way to use tar. You're not using any of the
    >>hierarchical traversal functionality of "tar", you merely use it as a
    >>"pack rat", packing away whatever file it is told to by "find".

    >
    > The find-pipe-to-archiver is an idiom that is used heavily with 'cpio'
    > (another archiving program). It fits well into the "use a chain of
    > tools, each doing one thing well" philosophy.



    Fiddlesticks. I'm finding that our "find" is not GNU find. Oh well.
    Time to fall back on the exclusion list for tar, starting each
    pattern with "./", i.e. "./Some/File/Path". I can still use
    --no-recursion, as I have my personal version of GNU tar
    (I'm not the admin). Getting a personal version of everything,
    though, can be extremely time-consuming and costly in terms of
    disk space.
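
    A sketch of how that fallback might look, assuming the local find at
    least supports -type and -print and that every exclude entry is
    written with a leading "./". The scratch tree and exclude list are
    hypothetical.

```shell
# Hypothetical scratch tree; the exclude entry carries the "./" prefix.
work=$(mktemp -d)
mkdir -p "$work/tree/subdir"
echo top > "$work/tree/Log"
echo nested > "$work/tree/subdir/Log"
printf './Log\n' > "$work/Exclude.lst"

dotted_listing=$(
  cd "$work/tree" || exit 1
  # find prints ./Log and ./subdir/Log; the pattern "./Log" can only
  # line up with the top-level file, so ./subdir/Log is still archived.
  find . -type f -print |
    tar -cf - --no-recursion -X ../Exclude.lst -T - |
    tar -tf -
)
printf '%s\n' "$dotted_listing"
```

    The archived names keep their "./" prefix with this approach, which is
    the cosmetic cost mentioned in the first post.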

    DD
