[9fans] s3venti - Plan9


  1. [9fans] s3venti

    I mentioned in passing some time ago that I was working on a venti
    server that uses Amazon S3 as a storage backend. There is now code in
    /n/sources/contrib/rcbilson/s3venti . Beware sharp edges. I have
    pumped a fair amount of test data through it successfully, but I
    wouldn't recommend trusting anything important to it yet. There is a
    man page.

    I started writing it under plan9, but for irrelevant reasons later
    switched to plan9port, so that's where it's known to work (on Linux,
    at least). I would hope and expect that moving it back to native plan9
    would be a small job.

    Questions and comments are welcome.

  2. Re: [9fans] s3venti

    that's interesting. we initially considered that, but decided on
    S3fs. brucee has been working on it. we will use it to provide
    archiving for rangboom users.

  3. Re: [9fans] s3venti

    neat stuff.

    i took a quick look at pricing -- $0.15/gb/month plus $0.10/gb to transfer
    data in. assuming it's the data motel and it never checks out,
    500GB would cost about $950 to store for a year. but 1GB would cost
    just under $2. this seems nice -- my fs has only 2.5GB of stuff. and even
    at my cost of $100 for the recycled machine, that's $1.60/gb/month.
    but i would need to cache all that locally & have a duplicate copy.
    so what usage scenario do you have in mind for venti/s3?
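
    A rough sketch of the arithmetic behind these figures, using only the
    two rates quoted above. It assumes the $0.10/gb transfer fee is paid
    once, when the data is uploaded, which gives about $950 for 500GB and
    $1.90 for 1GB over the first year. Charging the transfer fee every
    month instead would give $1500 and $3. The function names are made up
    for illustration.

```python
# Rates as quoted in the post; not checked against any current price list.
STORAGE_PER_GB_MONTH = 0.15  # dollars per GB per month
TRANSFER_IN_PER_GB = 0.10    # dollars per GB uploaded


def first_year_cost(gb, months=12):
    """Upload `gb` once, then store it for `months` months."""
    return gb * STORAGE_PER_GB_MONTH * months + gb * TRANSFER_IN_PER_GB


def cost_if_transfer_billed_monthly(gb, months=12):
    """Alternative reading: the transfer fee is added to every month."""
    return gb * (STORAGE_PER_GB_MONTH + TRANSFER_IN_PER_GB) * months


for gb in (1, 2.5, 500):
    once = first_year_cost(gb)
    monthly = cost_if_transfer_billed_monthly(gb)
    print(f"{gb:>6} GB: ${once:.2f} (fee once), ${monthly:.2f} (fee monthly)")
```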

    skip: what are the principles of operation of s3fs? what's the advantage
    over venti?

    - erik


  4. Re: [9fans] s3venti

    > what usage scenario do you have in mind for venti/s3?

    I wanted "set it and forget it" off-site backups, at a reasonable cost
    and without significant capital outlays or maintenance. I.e., mirror
    an existing venti with a cron job, or use it as a target for vbackup.
    As you point out, whether the cost of S3 is reasonable depends on how
    much you have to store, and how much it's worth to you to store it. I
    don't intend to use it for my mp3s, for instance.

    An additional advantage of s3venti is that multiple s3venti servers
    can use the same S3 bucket and exploit redundancies in the data across
    servers. That's not of particular use to me right now, but it seemed
    interesting.

  5. Re: [9fans] s3venti

    On Mon, 11 Feb 2008 11:39:23 EST "Richard Bilson" wrote:
    > > what usage scenario do you have in mind for venti/s3?
    >
    > I wanted "set it and forget it" off-site backups, at a reasonable cost
    > and without significant capital outlays or maintenance. I.e., mirror
    > an existing venti with a cron job, or use it as a target for vbackup.
    > As you point out, whether the cost of S3 is reasonable depends on how
    > much you have to store, and how much it's worth to you to store it. I
    > don't intend to use it for my mp3s, for instance.


    In using S3 for off-site backups I would worry about the time
    to restore a failed disk (apart from the privacy issues). As
    an example restoring a 100GB disk over the 'net at a constant
    300KB/s of download speed can take close to 4 days. Of
    course, these days many people have much more data than that.
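
    The estimate can be checked with a few lines. This sketch assumes
    decimal units (1GB = 10^9 bytes, 1KB = 10^3 bytes) and a perfectly
    constant transfer rate; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def restore_days(size_bytes, rate_bytes_per_second):
    """Days needed to download `size_bytes` at a constant rate."""
    return size_bytes / rate_bytes_per_second / SECONDS_PER_DAY


GB = 10**9  # assumption: decimal gigabytes
KB = 10**3  # assumption: decimal kilobytes

days = restore_days(100 * GB, 300 * KB)
print(f"100GB at 300KB/s: {days:.2f} days")
```

    With binary units instead (2^30 and 2^10) the result comes out
    slightly above 4 days rather than slightly below.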

    Maybe there are other remote backup companies that provide a
    "copy your data to disk and deliver it overnight" service for
    an extra charge.

  6. Re: [9fans] s3venti

    > skip: what are the principles of operation of s3fs? what's the advantage
    > over venti?


    it is easier to do a mirror. there is a limitation on the number
    of buckets, etc that also played into it, and an issue related to the
    fact that we need to encrypt users' data. unfortunately the
    thread that had brucee's (and rsc's i believe) comments on it is on a sick
    kenfs that's being worked on.


  7. Re: [9fans] s3venti

    > and an issue related to the
    > fact that we need to encrypt users' data.


    For the record, s3venti does encrypt blocks that it writes to S3. It
    uses a single key, making it rather vulnerable to dictionary attacks,
    but I haven't come up with a way to do better without changing the
    venti protocol. Suggestions are welcome.

  8. Re: [9fans] s3venti

    Richard Bilson wrote:
    >> and an issue related to the
    >> fact that we need to encrypt users' data.
    >
    > For the record, s3venti does encrypt blocks that it writes to S3. It
    > uses a single key, making it rather vulnerable to dictionary attacks,
    > but I haven't come up with a way to do better without changing the
    > venti protocol. Suggestions are welcome.


    Any sort of encryption which does not change the key from time to time
    is not very secure. If the attacker has enough time, security is not easy
    to get.

    I propose to divide the files to be stored, e.g. into upper and lower
    4-bit nibbles, and put them in different places. In this case each half
    is likely to be worth less on its own, and much more difficult to
    decipher, too.
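
    A minimal sketch of the mechanics of such a split, with made-up
    function names. It only shows how the two halves could be produced and
    rejoined, and says nothing about how much protection the split gives.
    Each half here uses one byte per nibble, so the halves would need to
    be packed two to a byte to avoid doubling the storage.

```python
def split_nibbles(data):
    """Return (upper, lower): the high and low 4 bits of every byte."""
    upper = bytes(b >> 4 for b in data)
    lower = bytes(b & 0x0F for b in data)
    return upper, lower


def join_nibbles(upper, lower):
    """Rebuild the original bytes from the two halves."""
    if len(upper) != len(lower):
        raise ValueError("halves must have the same length")
    return bytes((u << 4) | l for u, l in zip(upper, lower))


upper, lower = split_nibbles(b"venti block")
print(upper.hex(), lower.hex())
print(join_nibbles(upper, lower))
```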
    --
    Dipl.-Math. Wilhelm Bernhard Kloke
    Institut fuer Arbeitsphysiologie an der Universitaet Dortmund
    Ardeystrasse 67, D-44139 Dortmund, Tel. 0231-1084-257
    PGP: http://vestein.arb-phys.uni-dortmund...b/mypublic.key

  9. Re: [9fans] s3venti

    You could reduce your storage bill by using file names to store the data
    through information hiding rather than the content

    http://www.geocities.com/patchnpuki/...ompression.htm

    One of these days ......


  10. Re: [9fans] s3venti

    > You could reduce your storage bill by using file names to store the data
    > through information hiding rather than the content
    >
    > http://www.geocities.com/patchnpuki/...ompression.htm
    >
    > One of these days ......


    my reading of the sla seemed to indicate they count bucket names
    against you.

    - erik


  11. Re: [9fans] s3venti

    > For the record, s3venti does encrypt blocks that it writes to S3. It
    > uses a single key, making it rather vulnerable to dictionary attacks,
    > but I haven't come up with a way to do better without changing the
    > venti protocol. Suggestions are welcome.


    Beware: I am no security expert, I know just enough to be dangerous.

    Ensure you have plenty of entropy - insist on long pass phrases.
    sha1 this with the block number to give you the key for a particular block.
    This at least permutes the venti tree info blocks - its real purpose is
    to ensure the duplicate blocks look different when encrypted but venti doesn't
    have duplicate blocks as such.

    you could repeat the sha1 as it may be possible to infer some
    info given all the sha1s start with the same (or known) prefix -
    the pass phrase (or block number).

    If you are likely to have multiple ventis with the same password on the server
    (one for work stuff and one for home) then stir a random string into the sha1,
    and keep this in factotum; generate this string when the venti is initialised.
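
    A sketch of the per-block key derivation described so far: hash the
    pass phrase together with the block number, optionally stir in a
    random per-venti string, and repeat the sha1. The function name, the
    8-byte encoding of the block number, the order of concatenation and
    the 16-byte salt length are all assumptions made for illustration.

```python
import hashlib
import os


def block_key(passphrase, block_number, salt=b"", rounds=2):
    """Derive a 20-byte key for one block.

    salt is the random string generated once when the venti is
    initialised; rounds=2 repeats the sha1 once over its own output.
    """
    material = salt + passphrase.encode("utf-8") + block_number.to_bytes(8, "big")
    digest = hashlib.sha1(material).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha1(digest).digest()
    return digest


salt = os.urandom(16)  # generated once, then kept somewhere safe
print(block_key("a long pass phrase with plenty of entropy", 42, salt).hex())
```

    The same pass phrase, block number and salt always give the same key,
    while changing any one of them gives an unrelated key.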

    your venti blocks are compressed which gives you some obscurity, guessing
    plaintext is not so easy but huffman tables and the like still stand out.

    If you want to be obsessive you could generate a block of random data, say 64k
    which you hold locally and xor this with your venti blocks before encryption.
    offset your start position in the random data by a value generated from the
    sha1(sha1(blocknumber, passphrase)) (eg the checksum), this would make cracking
    your data much harder.

    Note this block of random data needs to be really random, not a PRBS like rand()
    which is predictable. you could slowly suck bytes from /dev/random on a busy machine.
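
    A sketch of the xor step, under the assumptions that the pad is 64k of
    bytes from the operating system's random source, that the start
    offset is sha1(sha1(blocknumber, passphrase)) reduced modulo the pad
    size, and that reads wrap around at the end of the pad. All names are
    made up for illustration, and the encryption that would follow is not
    shown.

```python
import hashlib
import os

PAD_SIZE = 64 * 1024


def pad_offset(passphrase, block_number):
    """Start position in the pad, from sha1(sha1(blocknumber, passphrase))."""
    inner = hashlib.sha1(
        block_number.to_bytes(8, "big") + passphrase.encode("utf-8")
    ).digest()
    outer = hashlib.sha1(inner).digest()
    return int.from_bytes(outer, "big") % len_pad()


def len_pad():
    return PAD_SIZE


def xor_with_pad(block, pad, offset):
    """XOR block with the pad, starting at offset and wrapping around."""
    n = len(pad)
    return bytes(b ^ pad[(offset + i) % n] for i, b in enumerate(block))


pad = os.urandom(PAD_SIZE)  # held locally, never stored with the blocks
offset = pad_offset("a long pass phrase", 42)
masked = xor_with_pad(b"venti block contents", pad, offset)
print(xor_with_pad(masked, pad, offset))  # xor again to undo
```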

    as ever it's a case of:

    how valuable is it?
    how long do you want to keep it secret?
    who are you trying to keep it secret from?

    caveat emptor

    -Steve
