btree on disk - Unix

This is a discussion on btree on disk - Unix ; Hi, For learning purposes, i want to implement a btree on disk on an unix system(freebsd). I am aware of the existence of dbm/ndbm. But this doesn't solve my purpose. I understand the necessity of a temporary file for write/update ...

+ Reply to Thread
Results 1 to 19 of 19

Thread: btree on disk

  1. btree on disk

    Hi,

    For learning purposes, i want to implement a btree on disk on an unix
    system(freebsd). I am aware of the existence of dbm/ndbm. But this
    doesn't solve my purpose. I understand the necessity of a temporary
    file for write/update purposes. I would like to know in-detail and in-
    depth how this could be achieved(use of cache/shared memroy,etc.). I
    would like to know the concepts,tricks and hints with respect to this.
    Also, pls let me know if there are books which clearly explain how a
    database is implemented, probably with sample code in c/c++ explaining
    the use of shared memory, merging of files after updates etc.

    Any direction in this regard is greatly appreciated.

    Thanks,
    Balaji.


  2. Re: btree on disk

    >For learning purposes, i want to implement a btree on disk on an unix
    >system(freebsd). I am aware of the existence of dbm/ndbm. But this
    >doesn't solve my purpose.


    What purpose? Just learning, or did you have a specific application
    in mind?

    >I understand the necessity of a temporary
    >file for write/update purposes.


    It is not absolutely necessary to use a temporary file to make
    changes in databases that are set up so you can do updates in place.

    >I would like to know in-detail and in-
    >depth how this could be achieved(use of cache/shared memroy,etc.). I


    A database on disk file can be done on disk. Explicit use of cache
    is not necessary (the filesystem is likely to provide it, wanted
    or not). Use of shared memory isn't necessary if you have nothing
    to share it with. You didn't say anything about multiple programs
    trying to update the database at the same time.

    >would like to know the concepts,tricks and hints with respect to this.
    >Also, pls let me know if there are books which clearly explain how a
    >database is implemented, probably with sample code in c/c++ explaining
    >the use of shared memory, merging of files after updates etc.



  3. Re: btree on disk

    On Oct 21, 9:56 pm, gordonb.2c...@burditt.org (Gordon Burditt) wrote:
    > >For learning purposes, i want to implement a btree on disk on an unix
    > >system(freebsd). I am aware of the existence of dbm/ndbm. But this
    > >doesn't solve my purpose.

    >
    > What purpose? Just learning, or did you have a specific application
    > in mind?
    >
    > >I understand the necessity of a temporary
    > >file for write/update purposes.

    >
    > It is not absolutely necessary to use a temporary file to make
    > changes in databases that are set up so you can do updates in place.
    >
    > >I would like to know in-detail and in-
    > >depth how this could be achieved(use of cache/shared memroy,etc.). I

    >
    > A database on disk file can be done on disk. Explicit use of cache
    > is not necessary (the filesystem is likely to provide it, wanted
    > or not). Use of shared memory isn't necessary if you have nothing
    > to share it with. You didn't say anything about multiple programs
    > trying to update the database at the same time.
    >
    >
    >
    > >would like to know the concepts,tricks and hints with respect to this.
    > >Also, pls let me know if there are books which clearly explain how a
    > >database is implemented, probably with sample code in c/c++ explaining
    > >the use of shared memory, merging of files after updates etc.- Hide quoted text -

    >
    > - Show quoted text -


    Hi,

    Thanks for replying to my post. I appreciate it.

    I have a definite application in mind. The application is nothing but
    a web-server/web service. Since i know unix,c,sockets,data structures,
    i wanted to write one of my own, at my free time. I felt this would be
    the optimal way to learn the nuances.

    yes, there would be multiple processes that access the same data.
    hence looking out for a book/this group for further directions.

    Thanks,
    Balaji.


  4. Re: btree on disk

    On Oct 22, 2:33 am, kasthurirangan.bal...@gmail.com wrote:

    > I have a definite application in mind. The application is nothing but
    > a web-server/web service. Since i know unix,c,sockets,data structures,
    > i wanted to write one of my own, at my free time. I felt this would be
    > the optimal way to learn the nuances.


    If you're trying to write a clone of ndbm/gdbm, why not study how
    those programs work first? You're asking an incredibly vague question
    and you seem to be expecting some very specific answers.

    DS


  5. Re: btree on disk

    On Oct 22, 7:38 pm, David Schwartz wrote:
    > On Oct 22, 2:33 am, kasthurirangan.bal...@gmail.com wrote:
    >
    > > I have a definite application in mind. The application is nothing but
    > > a web-server/web service. Since i know unix,c,sockets,data structures,
    > > i wanted to write one of my own, at my free time. I felt this would be
    > > the optimal way to learn the nuances.

    >
    > If you're trying to write a clone of ndbm/gdbm, why not study how
    > those programs work first? You're asking an incredibly vague question
    > and you seem to be expecting some very specific answers.
    >
    > DS


    Thanks. Thats also a great idea. Pls let me know where can i find the
    internals of ndbm/gdbm. Whenever i search in google, i always get the
    usage and not the internals.

    If my words are vague, here are the details.

    I have a simple web server(basically a server). I have just one page,
    divided into parts and any of the part could be added or updated.
    Right now, its just 3 parts and i have no issues doing a seek/write.
    Also, right now its just serial processing.But going forward, i intend
    to use pthreads and also make the parts updatable by another process,
    eventually increasing the number of pages. I understand that btree is
    the best for disk access for large data(assuming more pages are to be
    added). I could very well use LAMP technology, but since i want to
    learn the nuances i am just trying it. Definitely its something like
    're-inventing the wheel'. Since i am doing this at my free time, i do
    not mind it.


  6. Re: btree on disk

    kasthurirangan.balaji@gmail.com wrote:
    > On Oct 22, 7:38 pm, David Schwartz wrote:


    > > If you're trying to write a clone of ndbm/gdbm, why not study how
    > > those programs work first? You're asking an incredibly vague question
    > > and you seem to be expecting some very specific answers.
    > >
    > > DS

    >
    > Thanks. Thats also a great idea. Pls let me know where can i find the
    > internals of ndbm/gdbm. Whenever i search in google, i always get the
    > usage and not the internals.


    Not a tree, but Berstein's CDB has an exceedingly simple layout, and it
    might be a good start:

    http://cr.yp.to/cdb.html

    There's ample documentation.

  7. btree on disk

    WA> Not a tree, but Berstein's CDB has an exceedingly simple
    WA> layout, and it might be a good start:
    WA>
    WA> http://cr.yp.to/cdb.html
    WA>
    WA> There's ample documentation.

    Actually, the Bernstein approach to this would be to not reinvent the
    filesystem. The filesystem is quite capable of holding three parts of
    a web page to be concatenated together, in three files. The rename()
    system call provides atomic update semantics for individual parts, if
    one requires those. And directories provide indexing.

    Witness the operation of the Maildir mailbox format.


  8. Re: btree on disk

    On Oct 24, 9:12 am, J de Boyne Pollard
    wrote:

    > Actually, the Bernstein approach to this would be to not reinvent the
    > filesystem. The filesystem is quite capable of holding three parts of
    > a web page to be concatenated together, in three files. The rename()
    > system call provides atomic update semantics for individual parts, if
    > one requires those. And directories provide indexing.


    Give that man a cigar.

    DS



  9. Re: btree on disk

    J de Boyne Pollard wrote:
    > WA> Not a tree, but Berstein's CDB has an exceedingly simple
    > WA> layout, and it might be a good start:
    > WA>
    > WA> http://cr.yp.to/cdb.html
    > WA>
    > WA> There's ample documentation.
    >
    > Actually, the Bernstein approach to this would be to not reinvent the
    > filesystem. The filesystem is quite capable of holding three parts of
    > a web page to be concatenated together, in three files. The rename()
    > system call provides atomic update semantics for individual parts, if
    > one requires those. And directories provide indexing.


    I agree. I'm guilty of not paying sufficient attention to the OP's request.
    But also I've given up on this fight. I've been railing against gratuitous
    use of SQL databases and other useless data obsfucations for years, but
    often you just get some micro-benchmark thrown back in your face; eventually
    you just get worn down.

    > Witness the operation of the Maildir mailbox format.


    You haven't met the developer of UW-IMAP, yet, have you? Watch out for
    useless micro-benchmarks.


  10. Re: btree on disk

    William Ahern writes:

    [...]

    >> Witness the operation of the Maildir mailbox format.

    >
    > You haven't met the developer of UW-IMAP, yet, have you? Watch out for
    > useless micro-benchmarks.


    The 'developer of UW-IMAP' never provided a 'micro-benchmark' but just
    stated that he doesn't like maildir, because it has an enormously
    larger amount of system calls than other mail access schemes.

    To put this into contemporary language: It will scale extremly bad and
    a sufficiently large installation would be better off using some
    database format that provides indexing et. al. without resorting to
    the kernel file-system support code for this. 'Sufficiently large'
    here depends on the actual kernel being used and on the machine in runs
    on.

    BTW, have you bothered to determine how many users UWash supported,
    using what kernel running on which hardware by the time this quite old
    statement was actually written, and to prove that it was really
    useless back then?



  11. Re: btree on disk

    On Oct 23, 11:30 am, William Ahern
    wrote:
    > kasthurirangan.bal...@gmail.com wrote:
    > > On Oct 22, 7:38 pm, David Schwartz wrote:

    >
    > > > If you're trying to write a clone of ndbm/gdbm, why not study how
    > > > those programs work first? You're asking an incredibly vague question
    > > > and you seem to be expecting some very specific answers.

    >
    > > > DS

    >
    > > Thanks. Thats also a great idea. Pls let me know where can i find the
    > > internals of ndbm/gdbm. Whenever i search in google, i always get the
    > > usage and not the internals.

    >
    > Not a tree, but Berstein's CDB has an exceedingly simple layout, and it
    > might be a good start:
    >
    > http://cr.yp.to/cdb.html
    >
    > There's ample documentation.


    Also see Tridgell's tdb: http://sourceforge.net/projects/tdb/


  12. Re: btree on disk

    William Ahern writes:
    > Rainer Weikusat wrote:
    >> William Ahern writes:

    >
    >> [...]

    >
    >> >> Witness the operation of the Maildir mailbox format.
    >> >
    >> > You haven't met the developer of UW-IMAP, yet, have you? Watch out for
    >> > useless micro-benchmarks.

    >
    >> BTW, have you bothered to determine how many users UWash supported,
    >> using what kernel running on which hardware by the time this quite old
    >> statement was actually written, and to prove that it was really
    >> useless back then?

    >
    > 11 December 2006
    >
    > http://www.washington.edu/imap/docum...rmats.txt.html


    As can be verified (at least now, 2007-10-25 19:14:00 CEST) by taking
    a look into the old directory of the imap ftp server, this file
    appeared in the imap-4.6 distribution, originally dated

    [rw@fever]/tmp/imap-4.6/docs $ls -l formats.txt
    -rw-r--r-- 1 rw rw 8673 1999-06-06 05:35 formats.txt

    [...]

    > On _my_ system Maildir, with 20+ users, has proven far superior to
    > mbox.


    Quoting from the file elleselled above:

    There's a general reason why file/message formats are a bad idea.
    Just about every filesystem in existance serializes file creation and
    deletions because these manipulate the free space map. _This turns out
    to be an enormous problem when you start creating/deleting more than a
    few messages per second;_

    And '20+ users' is certainly within the range of (at most) 'creating a
    few messages per second'.

  13. Re: btree on disk

    Rainer Weikusat wrote:
    > William Ahern writes:


    > > 11 December 2006
    > >
    > > http://www.washington.edu/imap/docum...rmats.txt.html


    > As can be verified (at least now, 2007-10-25 19:14:00 CEST) by taking
    > a look into the old directory of the imap ftp server, this file
    > appeared in the imap-4.6 distribution, originally dated


    > [rw@fever]/tmp/imap-4.6/docs $ls -l formats.txt
    > -rw-r--r-- 1 rw rw 8673 1999-06-06 05:35 formats.txt


    Point being? It's no excuse for continued reliance on specious analysis.

    > > On _my_ system Maildir, with 20+ users, has proven far superior to
    > > mbox.

    >
    > Quoting from the file elleselled above:
    >
    > There's a general reason why file/message formats are a bad idea.
    > Just about every filesystem in existance serializes file creation and
    > deletions because these manipulate the free space map. _This turns out
    > to be an enormous problem when you start creating/deleting more than a
    > few messages per second;_
    >
    > And '20+ users' is certainly within the range of (at most) 'creating a
    > few messages per second'.


    Is the problem less than or greater than the problem of users keeping large
    attachments?

  14. Re: btree on disk

    William Ahern writes:
    > Rainer Weikusat wrote:
    >> William Ahern writes:

    >
    >> > 11 December 2006
    >> >
    >> > http://www.washington.edu/imap/docum...rmats.txt.html

    >
    >> As can be verified (at least now, 2007-10-25 19:14:00 CEST) by taking
    >> a look into the old directory of the imap ftp server, this file
    >> appeared in the imap-4.6 distribution, originally dated

    >
    >> [rw@fever]/tmp/imap-4.6/docs $ls -l formats.txt
    >> -rw-r--r-- 1 rw rw 8673 1999-06-06 05:35 formats.txt

    >
    > Point being?


    Check the previous posting.

    > It's no excuse for continued reliance on specious analysis.


    And answer my question. Additionally, it would be appropriate if you
    could point out the fault in the statement below instead of just
    asserting that there is one, using what I would consider to be a
    loaded term for 'mistaken'[*].

    >> > On _my_ system Maildir, with 20+ users, has proven far superior to
    >> > mbox.

    >>
    >> Quoting from the file elleselled above:
    >>
    >> There's a general reason why file/message formats are a bad idea.
    >> Just about every filesystem in existance serializes file creation and
    >> deletions because these manipulate the free space map. _This turns out
    >> to be an enormous problem when you start creating/deleting more than a
    >> few messages per second;_
    >>
    >> And '20+ users' is certainly within the range of (at most) 'creating a
    >> few messages per second'.

    >
    > Is the problem less than or greater than the problem of users keeping large
    > attachments?


    Since you originally asserted that the statement above would be
    'specious', one would assume that you must know the answer to this
    question.

    So, what is it?

  15. btree on disk

    JdeBP> Actually, the Bernstein approach to this would be to not
    JdeBP> reinvent the filesystem. The filesystem is quite capable
    JdeBP> of holding three parts of a web page to be concatenated
    JdeBP> together, in three files. The rename() system call provides
    JdeBP> atomic update semantics for individual parts, if one
    JdeBP> requires those. And directories provide indexing.

    WA> I agree. I'm guilty of not paying sufficient attention to the
    OP's
    WA> request. But also I've given up on this fight. I've been railing
    WA> against gratuitous use of SQL databases and other useless
    WA> data obsfucations for years, but often you just get some
    WA> micro-benchmark thrown back in your face; eventually you
    WA> just get worn down.

    JdeBP> Witness the operation of the Maildir mailbox format.

    WA> You haven't met the developer of UW-IMAP, yet, have you?
    WA> Watch out for useless micro-benchmarks.

    RW> The 'developer of UW-IMAP' never provided a 'micro-
    benchmark' [...]

    True. However several benchmarks _were_ produced in response. Here
    are some: people.redhat.com./rkeech/maildir-migration.txt> www.decisionsoft.com./pdw/mailbench.html>

    (And although I haven't physically met M. Crispin, I've been around
    for a long time, and I do know who he is. I cited him in one of my
    FGAs back in 2004.)


  16. btree on disk

    WA> It's also a fair assessment, I think, so say that it's a major
    WA> pain that UW-IMAP doesn't support Maildir, and that a
    WA> disinterested third party, with full view of all the other Unix
    WA> software which supports Maildir (mutt, etc), would agree that
    WA> the reticence is a bit puzzling. On _my_ system Maildir, with
    WA> 20+ users, has proven far superior to mbox. But, for various
    WA> reasons, I stick with UW-IMAP and mbox (for non-shell
    WA> users), notwithstanding that the users who insist on keeping
    WA> 10MB attachments in their folders continue to whine incessantly.






    You can make it use Maildir format mailboxes if you want to.


  17. Re: btree on disk

    Rainer Weikusat wrote:
    > William Ahern writes:


    > > Is the problem less than or greater than the problem of users keeping large
    > > attachments?


    > Since you originally asserted that the statement above would be
    > 'specious', one would assume that you must know the answer to this
    > question.
    >
    > So, what is it?


    "specious"

    apparently good or right though lacking real merit; superficially
    pleasing or plausible

    The maildir analysis is specious because it assumes usage patterns that are
    far from universal, and in fact may be becoming less common. And it was
    specious back then as now because it made a universal statement based on a
    set of localized conditions and metrics, which even _then_ were suspect.

    That maildir suffered dismal performance for the usage patterns of interest
    to the developers at the time they made their analysis, I don't doubt. But
    the totality of the argument against maildir rose to more a general level
    argumentation. There arose a more general dispute in the IMAP community.

    Thus, my point about people using "micro-benchmarks" to shoot down simple
    solutions. Often people generalize to the extreme, and unless one has solid
    data about a specific operating condition, *and* that the specific operating
    condition will actually be the normal condition (which is a much harder
    task), then such a person has failed their burden of proof. All else equal,
    in engineering the rule of thumb is to assume the simpler solution, layered
    on established mechanism, to be superior.


  18. Re: btree on disk

    William Ahern writes:
    > Rainer Weikusat wrote:
    >> William Ahern writes:

    >
    >> > Is the problem less than or greater than the problem of users keeping large
    >> > attachments?

    >
    >> Since you originally asserted that the statement above would be
    >> 'specious', one would assume that you must know the answer to this
    >> question.
    >>
    >> So, what is it?

    >
    > "specious"


    The 'what' would refer to the answer.

    > The maildir analysis is specious because it assumes usage patterns that are
    > far from universal, and in fact may be becoming less common.


    The formats.txt does not contain 'a maildir analysis', but just remark
    regarding 'file system overhead' of 'maildir'. And you are still
    talking in soap bubbles, ie you are not stating what the terms you use
    are supposed to mean.

    > And it was specious back then as now because it made a universal
    > statement based on a set of localized conditions and metrics, which
    > even _then_ were suspect.


    In itself, this statement means absolutely nothing. Since I have now
    asked you to detail and substantiate your assertions at least twice,
    and all you have done is that you have looped around to your original,
    factually wrong, statement about the fictional microbenchmark, I'd say
    it is safe to assume that you can neither detail nor substantiate what
    you are actually writing about.

  19. btree on disk

    WA> The maildir analysis is specious because it assumes usage
    WA> patterns that are far from universal, and in fact may be becoming
    WA> less common. And it was specious back then as now because it
    WA> made a universal statement based on a set of localized conditions
    WA> and metrics, which even _then_ were suspect.

    M. Weikusat may not comprehend what you are saying, but I do.
    However, I think that you are being unjust. M. Crispin _didn't_
    overgeneralize from one set of data. His analysis was a deductive
    one, not an inductive one. The flaw that people pointed out was not
    overgeneralization, but the erroneous axiomatic assumption that all
    filesystems behaved in a certain way, from which the deductions were
    then made.

    WA> That maildir suffered dismal performance for the usage patterns
    WA> of interest to the developers at the time they made their
    analysis,
    WA> I don't doubt. But the totality of the argument against maildir
    rose
    WA> to more a general level [of] argumentation. There arose a more
    WA> general dispute in the IMAP community.

    I remember.

    WA> Thus, my point about people using "micro-benchmarks" to shoot
    WA> down simple solutions. Often people generalize to the extreme,
    WA> and unless one has solid data about a specific operating
    condition,
    WA> *and* that the specific operating condition will actually be the
    WA> normal condition (which is a much harder task), then such a
    WA> person has failed their burden of proof. All else equal, in
    WA> engineering the rule of thumb is to assume the simpler
    WA> solution, layered on established mechanism, to be superior.

    It's true that people generalize. But often in my experience lack of
    context and lack of knowledge play a part, too. For example: There
    was a discussion in alt.folklore.computers last month where one poster
    was reluctant to run xyr own example program, on the grounds that it
    would create over 17,000 files in a single directory. (To some of us,
    that's simply a medium-sized news spool directory.) The worry about
    creating that number of files all too often stems from a vague
    knowledge that it is bad, gleaned thirdhand as received wisdom based
    upon sources that were talking about the FAT filesystem format. It's
    not necessarily that people are generalizing from FAT to everything
    else. It is sometimes that they don't know that (a) the books and
    articles that they may have read were only talking about FAT, because
    those books and articles were written at a time and for an audience
    where FAT was the only format available and so the fact that FAT was
    the filesystem format was implicit and often unstated; or that (b)
    other filesystems formats don't have this problem because they don't
    have linear directory organizations like FAT does, and that the
    problem is fomat-specific. They lack the context to realize that the
    old article talking about how bad large directory sizes were was
    written from the viewpoint of someone who only had the option of FAT,
    and they lack the knowledge of filesystem design to know that not all
    filesytem formats are the same in this regard.


+ Reply to Thread