NDMP considerations - Storage


Thread: NDMP considerations

  1. NDMP considerations

    Hi Friends,
    I have a task of consolidating some of our datacenters. When done, I
    will have accumulated about 1 petabyte of data on our brand new
    NetApp filers.

    While I am in the scoping phase, I would like to know what
    considerations I should take into account for backing up (full and
    incremental backups) 1 petabyte of data on an ongoing basis.

    What issues will I run into?

    What are best practices?

    How long will this backup take?

    Any other things to watch out for?

    Just a few notes on our architecture... we have a few high-end NetApp
    filers (combinations of 3020s, 960cs, etc.) as tier 1, followed by a
    tier of 270cs for older data.

    I don't have anything spec'd for VTL and backup yet.

    Thank You
    Ludlas.


  2. Re: NDMP considerations

    On Apr 5, 12:19 pm, ludlaslac...@yahoo.com wrote:
    > [original post quoted in full; snipped]


    In fact I don't have hands-on experience with ndmpcopy backups, but
    filer performance may well be affected while ndmpcopy runs, so you
    have to find out whether you have a sufficient off-hours window for
    the backup. Another point: check the average file size. If it is very
    small, ndmpcopy may take much longer; a large number of small files
    on a volume hurts ndmpcopy performance very badly.
    In our environment, a 1.2TB volume with around 30M files takes more
    than 14 hours to complete an ndmpcopy backup, so we take backups from
    a separate Linux box where the storage volumes are mounted.
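    For reference, an ndmpcopy invocation looks roughly like this (a
    sketch only; the filer names, volume paths, and credentials are
    placeholders):

        src> ndmpcopy -sa root:srcpass -da root:dstpass -l 0 srcfiler:/vol/vol1 dstfiler:/vol/vol1_copy

    -l 0 is a full copy; the filer's built-in version accepts only
    levels 0-2 for incrementals.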

    - Raju


  3. Re: NDMP considerations

    On Apr 5, 10:39 am, "Raju Mahala" wrote:
    > [original post quoted; snipped]
    >
    > In our environment, a 1.2TB volume with around 30M files takes more
    > than 14 hours to complete an ndmpcopy backup, so we take backups
    > from a separate Linux box where the storage volumes are mounted.
    >
    > - Raju



    Thanks Raju, that somewhat helps...

    I can't understand, though, why a mere 1.2 TB volume is taking 14
    hours to read. That is way slower than wire speed... We are able to
    read 1.2TB of data much faster than that, even from our slow 270s,
    just over NFS in fact. (One of our modules copies all snapshot
    directories via NFS instead of NDMP right now.)

    Do you know where the bottleneck is? Anyone else out there with
    experiences in this area for such large numbers of files and data?



  4. Re: NDMP considerations

    On 5 Apr 2007 12:19:39 -0700, ludlaslacias@yahoo.com wrote:

    >[earlier posts quoted; snipped]
    >
    >Do you know where the bottleneck is? Anyone else out there with
    >experiences in this area for such large numbers of files and data?
    >



    First, avoid NDMPcopy if possible, particularly for long sync times.
    NDMPcopy is simply a dump, and dumps do not keep the source and
    destination in sync (i.e. anything deleted on the source does not get
    deleted on the destination). If you are intending to eventually
    migrate, ask NetApp for a temporary SnapMirror license and use
    volume-level SnapMirror.
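    If you go the SnapMirror route, the destination-side setup is roughly
    the following (a sketch only; the aggregate, filer, and volume names
    are invented, and exact syntax may vary by ONTAP release):

        dst> vol create mirror_vol aggr1 1500g
        dst> vol restrict mirror_vol
        dst> snapmirror initialize -S srcfiler:src_vol dstfiler:mirror_vol
        dst> snapmirror status

    Once the baseline transfer completes, scheduled updates are driven
    from /etc/snapmirror.conf on the destination.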

    Volume-level SnapMirror will eliminate the bottleneck you are seeing
    entirely. The problem with dump (nothing specific to NDMP) is that it
    has to map every file before it starts writing data. If you have a
    lot of files this can take a very long time. We have some backups
    that don't actually start writing data for 27 hours, but once they
    start they saturate the tape.

    NetApp is very good at file access for NFS and even does some
    read-ahead caching. If you are accessing a lot of data over NFS, its
    ONTAP and WAFL algorithms come into play, whereas on a dump they do
    not (or at least not in the same manner).

    Also, unless you find the NDMPcopy binary with the "infinite
    incremental" patch you can only do 9 levels, just like dump. The
    built-in filer version only allows 3 if I recall correctly.

    The bottleneck is likely what I mentioned, but if you post the quota
    report we can verify.
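    (To pull those numbers, something like this on the filer will do; the
    volume name is a placeholder:

        filer> quota report
        filer> df -i /vol/vol1

    df -i shows inodes used, which is effectively the per-volume file
    count.)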

    ~F

  5. Re: NDMP considerations

    On Apr 5, 12:33 pm, Faeandar wrote:
    > [earlier posts quoted; snipped]
    >
    > ~F


    Thanks Faeandar.

    How long would SnapMirror take to back up 1.2 TB of data? And can
    SnapMirror talk to a standard VTL environment?



  6. Re: NDMP considerations

    On 5 Apr 2007 12:53:42 -0700, ludlaslacias@yahoo.com wrote:


    >
    >How long would SnapMirror take to back up 1.2 TB of data? And can
    >SnapMirror talk to a standard VTL environment?
    >


    It will move 1.2TB at whatever network speed you have. So if you have
    gig it will move at full usable gig speed, and unlike QSM or dump it
    will start immediately. No mapping is needed, as it just moves
    blocks, which include ALL snapshots on the source volume.
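    As a rough worked example, assuming ~100 MB/s of usable gigabit
    throughput:

        1.2 TB = ~1,200,000 MB
        1,200,000 MB / 100 MB/s = 12,000 s, or about 3.3 hours

    versus the 14 hours reported for the same volume via ndmpcopy.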

    The only possible caveat I see on speed is the 270; they are not
    speedy. SnapMirror will still beat whatever speeds you get over NFS,
    though.

    I cannot answer the VTL question, I do not know.

    ~F

  7. Re: NDMP considerations

    On Apr 5, 1:02 pm, Faeandar wrote:
    > [previous post quoted; snipped]
    >
    > ~F


    So when you use SnapMirror, you are using it primarily for moving
    data from one NetApp to another NetApp?

    What do you use to back up that amount of data off of your NetApps?
    If you don't use VTL, where do you send your backups?



  8. Re: NDMP considerations

    On 5 Apr 2007 13:31:49 -0700, ludlaslacias@yahoo.com wrote:

    >>[previous exchange quoted; snipped]

    >
    >So when you use SnapMirror, you are using it primarily for moving
    >data from one NetApp to another NetApp?
    >
    >What do you use to back up that amount of data off of your NetApps?
    >If you don't use VTL, where do you send your backups?
    >



    We use SnapMirror to *replicate* data from one filer to another, not
    back it up. For backups you can use SnapVault to another NetApp and
    back up to tape from there. I'm guessing you could use it with a VTL
    as well, but I do not know.

    SnapVault is quite nice because you can schedule the vaults much more
    simply than with SnapMirror (though technically they both use the
    same underlying technology).
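    A minimal SnapVault setup for flavor (a sketch only; the filer names,
    volume, and qtree paths are invented, and both ends need the
    appropriate SnapVault license):

        primary>   options snapvault.enable on
        secondary> options snapvault.enable on
        secondary> snapvault start -S primary:/vol/vol1/qtree1 /vol/sv_vol/qtree1
        secondary> snapvault snap sched -x sv_vol sv_nightly 7@mon-sun@0

    The last line keeps seven nightly vault snapshots on the secondary,
    pulling from the primary at midnight.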

    Why are you not backing up the filers directly with NDMP over FC or
    even Ethernet?
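    (On the filer side there is not much to set up; a sketch, assuming a
    7G-era system:

        filer> options ndmpd.enable on
        filer> ndmpd status
        filer> sysconfig -t

    sysconfig -t lists the tape drives the filer can see; the backup
    software handles scheduling and cataloging.)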

    ~F

  9. Re: NDMP considerations

    On Apr 5, 2:42 pm, Faeandar wrote:
    > On 5 Apr 2007 13:31:49 -0700, ludlaslac...@yahoo.com wrote:
    >
    >
    >
    > >On Apr 5, 1:02 pm, Faeandar wrote:
    > >> On 5 Apr 2007 12:53:42 -0700, ludlaslac...@yahoo.com wrote:

    >
    > >> >How long would Snapmirror take to backup 1.2 TB of data? And can
    > >> >Snapmirror talk to a standard VTL environment?

    >
    > >> It will move 1.2TB at whatever network speed you have. So if you have
    > >> gig it will move it at full usable gig speed, and unlike QSM or dump
    > >> it will start it immediately. No mapping is needed as it just moves
    > >> blocks, which include ALL snapshots on the source volume.

    >
    > >> The only possible caveat I see to the speed is the 270. They are not
    > >> speedy. SnapMirror will beat whatever speeds you get for NFS though.

    >
    > >> I cannot answer the VTL question, I do not know.

    >
    > >> ~F

    >
    > >So when you use Snapmirror, you are using it primarily for moving data
    > >from one netapp to another netapp?

    >
    > >What do you use to backup that amount of data off of your netapps? If
    > >you dont use VTL, where do you send your backups to?

    >
    > We use SnapMirror to *replicate* data from one filer to another, not
    > back it up. For backups you can use SnapVault to another NetApp and
    > back up to tape from there. I'm guessing you could use it for VTL as
    > well but I do not know.

    When you back up from the NetApp to the tape system, is that NDMP
    over FC at that point?

    >
    > SnapVault is quite nice because you can schedule the vaults much more
    > simply than with SnapMirror (though technically they both use the
    > same underlying technology).
    >
    > Why are you not backing up the filers directly with NDMP over FC or
    > even Ethernet?


    This is exactly what I am trying to set up here... an NDMP over FC
    environment... However, some people had warned about long backup
    times for large amounts of data, and that is what prompted my post...

    Which is, if I had, say, 1.2TB of data, would I also face the 14-hour
    backup cycles like Raju had? Because in our environment, we will
    likely start approaching over 500TB of data within the next few
    months as we consolidate, and within a year be up to a PB of online
    data.

    So while I now understand how efficient SnapMirror is for
    replication, I would like to understand what the NDMP backup
    considerations should be... will that also work at wire speeds? Can I
    do multiple NDMP backups from various filers in parallel, and so on?

    Best,
    Ludlas





  10. Re: NDMP considerations

    On 5 Apr 2007 16:58:47 -0700, ludlaslacias@yahoo.com wrote:


    >> [earlier exchange quoted; snipped]

    >
    >This is exactly what I am trying to set up here... an NDMP over FC
    >environment... However, some people had warned about long backup
    >times for large amounts of data, and that is what prompted my post...


    That is completely dependent on your data set. Post a quota report
    with a file listing and I can give you a ballpark. Also post what
    type of infrastructure you are looking to build:
    2Gb or 4Gb FC?
    Direct to tape or an FC switch?
    What type of tape drives?
    Are all filers going to back up direct to tape, or will you do 3-way
    backups in some cases? (This is where a filer without a tape drive
    does an NDMP dump over Ethernet to a filer that has a tape drive.)
    What backup software are you going to use?

    >
    >Which is, if I had, say, 1.2TB of data, would I also face the 14-hour
    >backup cycles like Raju had? Because in our environment, we will
    >likely start approaching over 500TB of data within the next few
    >months as we consolidate, and within a year be up to a PB of online
    >data.
    >
    >So while I now understand how efficient SnapMirror is for
    >replication, I would like to understand what the NDMP backup
    >considerations should be... will that also work at wire speeds? Can I
    >do multiple NDMP backups from various filers in parallel, and so on?


    I think we're getting somewhere now. There are lots of restrictions
    depending on the backup software.
    For the most part you can't share tape drives between filers except
    via 3-way backups (as mentioned above). This can cause performance
    problems depending on your data, network, infrastructure, etc., but
    it works quite nicely in many cases and is almost as fast as FC; you
    just have a filer doing both network in and tape out for something
    other than its own data.

    Is the time to backup really a problem? What's driving it? I ask
    because the filer backs up from a snapshot, so the need to back up
    fast is not driven by coherency.

    Post a file count or quota report; that will help you understand
    backup implications better than any theoretical discussion of speeds
    and feeds.

    ~F

  11. Re: NDMP considerations

    On Apr 5, 5:25 pm, Faeandar wrote:
    > [earlier posts quoted; snipped]
    >
    > Post a file count or quota report; that will help you understand
    > backup implications better than any theoretical discussion of speeds
    > and feeds.
    >
    > ~F


    The time to backup may actually not be a problem now that I
    understand what you say... although I'd expect the backup to happen
    in a somewhat reasonable amount of time... For example, responding to
    the post by Raju, if we had a 14-hour window to back up just 1.2 TB
    of data, that wouldn't fly with 500TB of data... It's not that it
    needs to be done in 24 hrs or even a few days... it's just that we
    can't have operations that span more than a few weeks.

    Is Raju's environment an anomaly, or is that common? I'm asking
    because I've heard some caution about long backup windows for large
    amounts of data.

    Probably more to the point, volume-level restore also seems to be
    something some people in my old organization had warned me about...
    some volumes are very large, and just to recover a single file, would
    I need to restore an entire volume?



  12. Re: NDMP considerations

    On Apr 6, 12:19 am, ludlaslac...@yahoo.com wrote:
    > [earlier posts quoted; snipped]
    >
    > Do you know where the bottleneck is? Anyone else out there with
    > experiences in this area for such large numbers of files and data?


    Yes, we know the bottleneck. A lot of RCA was done together with the
    NetApp engineer. There are two major causes:
    1) a large number of small files (average file size: ~40KB)
    2) heavy churn of file creation/deletion

    Due to these factors the volume gets fragmented at the file level,
    and because ndmpcopy works at the file level it takes more time to
    read fragmented files. If we use SnapMirror instead it doesn't take
    much time, because that is a block-level copy. Even SnapVault takes
    longer than expected; it is a block-level copy too, but it works at
    the qtree level, so it has to find the corresponding blocks for each
    qtree.

    - Raju


  13. Re: NDMP considerations

    > > Post a file count or quota report; that will help you understand
    > > backup implications better than any theoretical discussion of
    > > speeds and feeds.
    >
    > > ~F

    As I see it, file count and daily churn of file deletion/creation
    (which leads to file fragmentation) will play the major role in total
    backup time through ndmpcopy, whether it runs over FC or Ethernet.
    I have experience with NDMP over Ethernet, and in that case volumes
    with low churn and low file counts finished at close to wire speed,
    but volumes with higher file counts and heavier churn had problems.
    I can't comment on NDMP over FC, but since the bottleneck is the file
    count, which stays the same, I expect the same scenario.


    > The time to backup may actually not be a problem now that I
    > understand what you say... although I'd expect the backup to happen
    > in a somewhat reasonable amount of time... For example, responding to
    > the post by Raju, if we had a 14-hour window to back up just 1.2 TB
    > of data, that wouldn't fly with 500TB of data... It's not that it
    > needs to be done in 24 hrs or even a few days... it's just that we
    > can't have operations that span more than a few weeks.
    >
    > Is Raju's environment an anomaly, or is that common? I'm asking
    > because I've heard some caution about long backup windows for large
    > amounts of data.


    This is not the case with all of our NetApp volumes, so I can't say
    it's common. It's only with some volumes belonging to specific user
    groups, where heavy churn of file creation/deletion occurs with very
    small files. The rest of the volumes are fine and don't have these
    issues.

    > Probably more to the point, volume-level restore also seems to be
    > something some people in my old organization had warned me about...
    > some volumes are very large, and just to recover a single file, would
    > I need to restore an entire volume?


    I am not sure, but I would hope that isn't the case. The restore
    procedure depends on the backup software's policy, and a
    volume-level-only restore would be useless, so I am almost sure it
    isn't the case.

    - Raju


  14. Re: NDMP considerations

    On 5 Apr 2007 23:18:13 -0700, ludlaslacias@yahoo.com wrote:

    >> [earlier posts quoted; snipped]

    >
    >The time to backup may actually not be a problem now that I
    >understand what you say... although I'd expect the backup to happen
    >in a somewhat reasonable amount of time... For example, responding to
    >the post by Raju, if we had a 14-hour window to back up just 1.2 TB
    >of data, that wouldn't fly with 500TB of data... It's not that it
    >needs to be done in 24 hrs or even a few days... it's just that we
    >can't have operations that span more than a few weeks.


    Well, raw backup time is a function of total data and the speed of
    the tape drives. You just can't go faster than the max speed of the
    destination.
    At 30MB/sec you can back up about 2.5TB a day, so that 500TB had
    better be split up nicely or you can count on weeks of backup time.
    A filer can do more than 30MB/sec, so you could have two or more
    drives to make the overall job go faster.
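    Spelled out, assuming a single drive streaming at 30MB/sec:

        30 MB/s * 86,400 s/day = ~2.6 TB/day per drive
        500 TB / 2.5 TB/day = ~200 drive-days

    So even with ten drives running in parallel you are looking at
    roughly three weeks for a full pass.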

    >
    >Is Raju's environment an anomaly, or is that common? I'm asking
    >because I've heard some caution about long backup windows for large
    >amounts of data.


    What caution? You can either back it up in a window or you can't.
    There's not really a cautionary statement there. If you can break up
    the data set to work within your window then great. If not, you still
    need to back it up.

    >
    >Probably more to the point, volume-level restore also seems to be
    >something some people in my old organization had warned me about...
    >some volumes are very large, and just to recover a single file, would
    >I need to restore an entire volume?
    >


    You do not need to restore the entire volume. However, if you do not
    use DAR (Direct Access Recovery) you may end up scanning the entire
    tape set for that backup. One thing to keep in mind about DAR is that
    it is file-level only: if you select a directory you just lost that
    ability, but if you select every file in a directory you will get
    speedy restores.

    ~F

  15. Re: NDMP considerations

    On 6 Apr 2007 02:01:35 -0700, "Raju Mahala"
    wrote:

    >> [earlier posts quoted; snipped]

    >
    >Yes, we know the bottleneck. A lot of RCA was done together with the
    >NetApp engineer. There are two major causes:
    >1) a large number of small files (average file size: ~40KB)
    >2) heavy churn of file creation/deletion
    >
    >Due to these factors the volume gets fragmented at the file level,
    >and because ndmpcopy works at the file level it takes more time to
    >read fragmented files. If we use SnapMirror instead it doesn't take
    >much time, because that is a block-level copy. Even SnapVault takes
    >longer than expected; it is a block-level copy too, but it works at
    >the qtree level, so it has to find the corresponding blocks for each
    >qtree.
    >
    >- Raju



    There is really no fragmentation in WAFL, at least not in the sense
    most people understand it. Because ALL writes on a filer are
    sequential (think NVRAM), and because of the way WAFL functions,
    fragmentation is almost zero in a general-purpose environment.
    However, the more you fill up a volume, the harder the file system
    has to work to figure out where to put things. This is what causes
    performance degradation at 85% full and above.
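    (If you want to check the layout yourself, recent ONTAP releases can
    measure it; a sketch, with the volume name as a placeholder:

        filer> reallocate measure /vol/vol1
        filer> reallocate status /vol/vol1

    This reports an optimization rating for the volume's layout.)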

    Data churn is also not an issue for backups unless you run out of
    snapshot space. Performance may be impacted by the general resource
    usage of other processes, but the churn itself is not an issue.

    ~F
