Asynchronous Disk IO on linux - Linux

  1. Asynchronous Disk IO on linux

    Hi all,

    I just read "The C10K problem" and I found this line:

    An important bottleneck in this method is that read() or
    sendfile() from disk blocks if the page is not in core at the moment;
    setting nonblocking mode on a disk file handle has no effect.

    So does that mean:

    1. using select or epoll on disk IO is useless?
    2. There is no non-blocking disk IO at all?

  2. Re: Asynchronous Disk IO on linux

    On Nov 8, 8:27 am, the original poster wrote:
    > Hi all,
    >
    > I just read "The C10K problem" and I found this line:
    >
    > An important bottleneck in this method is that read() or
    > sendfile() from disk blocks if the page is not in core at the moment;
    > setting nonblocking mode on a disk file handle has no effect.


    Right.

    > So does that mean:
    >
    > 1. using select or epoll on disk IO is useless?


    That's correct. The descriptor will always be ready for both reading
    and writing.
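
    The "always ready" behaviour can be checked directly. The following is
    a minimal sketch, not a definitive test: poll_regular_file() is a
    hypothetical helper name, and the path passed in is assumed to be an
    ordinary file on disk. It uses poll() rather than epoll because, on
    Linux, epoll_ctl() rejects regular files with EPERM.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: returns the revents mask that poll() reports for
 * the file at `path`, or -1 on error. The zero timeout means poll() never
 * sleeps; it only reports the current readiness state. */
int poll_regular_file(const char *path)
{
    /* O_NONBLOCK is accepted here but has no effect on a regular file. */
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT };
    int ready = poll(&pfd, 1, 0);
    close(fd);

    if (ready < 0)
        return -1;
    return pfd.revents;
}
```

    For a regular file the returned mask should contain both POLLIN and
    POLLOUT whether or not the data is in the page cache, so readiness
    says nothing about whether a later read() will block.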

    > 2. There is no non-blocking disk IO at all?


    Not anything like non-blocking network I/O. Read the man page for
    'io_setup' and 'aio_read'.

    I personally recommend using threads for this purpose.
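
    One way the thread approach can be combined with an existing
    select/poll/epoll loop is sketched below. This is only an
    illustration under assumptions: struct read_req, submit_read() and
    wait_completion() are hypothetical names, and a pipe is assumed as
    the wake-up channel. A worker thread performs the blocking pread()
    and then writes the request pointer to the pipe, whose read end the
    event loop watches like any socket.

```c
#include <poll.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request record: one positioned read handed to a worker. */
struct read_req {
    int     fd;        /* file to read from */
    void   *buf;       /* destination buffer, owned by the caller */
    size_t  len;       /* number of bytes wanted */
    off_t   off;       /* file offset to read at */
    ssize_t result;    /* filled in by the worker: pread() return value */
    int     notify_fd; /* write end of a pipe watched by the event loop */
};

/* Worker: perform the blocking read, then wake the event loop. */
static void *read_worker(void *arg)
{
    struct read_req *req = arg;
    req->result = pread(req->fd, req->buf, req->len, req->off);
    /* The request pointer itself is sent as the completion token. */
    if (write(req->notify_fd, &req, sizeof req) != (ssize_t)sizeof req)
        req->result = -1;
    return NULL;
}

/* Start the read in a detached thread; returns 0 on success. */
int submit_read(struct read_req *req)
{
    pthread_t tid;
    int err = pthread_create(&tid, NULL, read_worker, req);
    if (err == 0)
        pthread_detach(tid);
    return err;
}

/* Event-loop side: wait until the pipe is readable, then collect one
 * completed request. Returns NULL on error or timeout. */
struct read_req *wait_completion(int pipe_rd, int timeout_ms)
{
    struct pollfd pfd = { .fd = pipe_rd, .events = POLLIN };
    struct read_req *done = NULL;

    if (poll(&pfd, 1, timeout_ms) <= 0)
        return NULL;
    if (read(pipe_rd, &done, sizeof done) != (ssize_t)sizeof done)
        return NULL;
    return done;
}
```

    In a real program the pipe's read end would simply be registered with
    the same epoll set as the sockets, and a fixed pool of workers would
    likely replace the thread-per-request shortcut taken here.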

    DS

  3. Re: Asynchronous Disk IO on linux

    Wouldn't it be better if there were an option to open() like what is
    proposed in http://lkml.org/lkml/2005/3/17/139 ?

    I personally would like to avoid all of the concurrency baggage that
    comes with adding thread usage to my process.

    Does anyone know if this functionality is currently being discussed or
    prototyped within the kernel dev community?

    David

    David Schwartz wrote:
    > I personally recommend using threads for this purpose.


  4. Re: Asynchronous Disk IO on linux

    On Nov 10, 8:20 am, ben...@xdal.org wrote:

    > Wouldn't it be better if there were an option to open() like what is
    > proposed in http://lkml.org/lkml/2005/3/17/139 ?


    I don't think so, since the semantics are not sensible.

    > I personally would like to avoid all of the concurrency baggage that
    > comes with adding thread usage to my process.


    If you don't want concurrency, what are we talking about? If you want
    concurrency, how is the concurrency a bad thing?

    > Does anyone know if this functionality is currently being discussed or
    > prototyped within the kernel dev community?


    Nobody has yet come up with sensible semantics. The reason sockets
    have sensible semantics is that it's very clear what it means for a
    socket to be writable or readable. There is no such obvious meaning
    for a file. What does it mean for a file to be readable?

    Linux does have threads and does have aio_read/aio_write.
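
    For readers who have not used the POSIX interface, a minimal sketch
    of aio_read() follows. The helper names start_read(), read_finished()
    and finish_read() are hypothetical wrappers, not part of any library.
    The request carries its own offset, so there is no ambiguity about
    what "ready" means: the request either is still in progress or has
    completed.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Submit an asynchronous read of `len` bytes at offset `off`.
 * `cb` and `buf` must stay valid until the request completes. */
int start_read(struct aiocb *cb, int fd, void *buf, size_t len, off_t off)
{
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_buf    = buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE; /* no notification; we poll */
    return aio_read(cb);
}

/* Non-blocking check: returns 1 if finished, 0 if still in progress. */
int read_finished(const struct aiocb *cb)
{
    return aio_error(cb) != EINPROGRESS;
}

/* Block until the request completes and return the byte count, or -1
 * with errno set. aio_return() may be called only once per request. */
ssize_t finish_read(struct aiocb *cb)
{
    const struct aiocb *list[1] = { cb };

    while (aio_error(cb) == EINPROGRESS) {
        if (aio_suspend(list, 1, NULL) < 0 && errno != EINTR)
            return -1;
    }

    int err = aio_error(cb);
    ssize_t n = aio_return(cb); /* always reap the request */
    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}
```

    An event loop would call read_finished() between other work, or ask
    for a completion notification instead of polling. Older glibc
    versions need -lrt at link time for these functions.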

    DS

  5. Re: Asynchronous Disk IO on linux

    benoit@xdal.org wrote:
    > Wouldn't it be better if there were an option to open() like what is
    > proposed in http://lkml.org/lkml/2005/3/17/139 ?


    There are obvious typos (decimal instead of octal) in the proposed patch
    for asm-parisc and asm-alpha:
    +#define O_ATOMICREAD 10000000 /* non-blocking file i/o */
    This may indicate that little or no testing has been done, which can
    cast doubt on the rest of the proposal.

    > I personally would like to avoid all of the concurrency baggage that
    > comes with adding thread usage to my process.


    It looks like the implementation just returns -EWOULDBLOCK whenever
    an operation would block. If so, then you have not avoided
    "all of the concurrency baggage", because spinning/time_out/re-try
    is concurrency baggage. Also, it is not obvious that the implementation
    guarantees forward progress. What prevents -EWOULDBLOCK forever
    if user code always retries the same operation?


  6. Re: Asynchronous Disk IO on linux

    Here are the semantics I want (pseudo code):

    fd=open(file, non-blocking);
    n = read(fd, buf, 1000);

    At this point, the kernel will check: "Do I have 1000 bytes available
    to read? If yes, return them. If no, return the number I have and set
    errno to EWOULDBLOCK, then start the process of paging in that data
    from the disk. When it is available, signal READ on the fd for
    epoll()/poll()/select()."

    And if I do:

    n = write(fd, buf, x);

    it does something similar.

    re: aio, I consider that solution messy, especially when mixing it
    with non-blocking socket handling. Since the operations aren't atomic
    (i.e. when I "post" a read, I have to "cancel" it before I can release
    my buffer, etc.), that means I have to do a LOT more management of
    structures after I want to get rid of them. This is likely what I'm
    going to have to use, but it reminds me too much of OVERLAPPED
    madness in the Windows API.

    re: concurrency, obviously I want to have "concurrency"... my point
    was that introducing threads introduces many concurrency issues that
    one does not have in a completely epoll() based processing model. I
    don't want to start worrying about TLS, resource locks, inter-thread
    communication, etc.

    David

  7. Re: Asynchronous Disk IO on linux

    On Nov 10, 1:28 pm, ben...@xdal.org wrote:

    > Here are the semantics I want (pseudo code):
    >
    > fd=open(file, non-blocking);
    > n = read(fd, buf, 1000);
    >
    > at this point, the kernel will check "do I have 1000 bytes available
    > to read? if yes, return them. If no, return the number I have and set
    > errno to EWOULDBLOCK, then start the process of paging in that data
    > from the disk. When it is available, signal READ on the fd for
    > epoll()/poll()/select().


    When what is available? You started out explaining the semantics and
    then stopped. When "it" is available? What's "it"? All 1,000 bytes?
    The next byte?

    What happens if a seek intervenes before the 'select'? What if you
    'select' without calling 'read'?

    > And if I do:
    >
    > n = write(fd, buf, x);
    >
    > it does something similar.


    Which would be? Nobody knows what the semantics for these operations
    should be.

    > re: aio, I consider that solution messy, especially when mixing it
    > with non-blocking socket handling. Since the operations aren't atomic
    > (i.e. when I "post" a read, I have to "cancel" it before I can release
    > my buffer, etc.), that means I have to do a LOT more management of
    > structures after I want to get rid of them. This is likely what I'm
    > going to have to use, but it reminds me too much of OVERLAPPED
    > madness in the Windows API.


    But that is the right way. That solves all the semantic problems and
    shows why normal non-blocking semantics don't work.

    > re: concurrency, obviously I want to have "concurrency"... my point
    > was that introducing threads introduces many concurrency issues that
    > one does not have in a completely epoll() based processing model. I
    > don't want to start worrying about TLS, resource locks, inter-thread
    > communication, etc.


    Then don't do those things. You want a solution that works by magic.

    DS

  8. Re: Asynchronous Disk IO on linux

    I'm going to think about this more and perhaps respond with a more
    thought out proposal. I don't want magic. I want clear, clean and
    atomic operations.

    David

  9. Re: Asynchronous Disk IO on linux

    On Nov 10, 2:22 pm, ben...@xdal.org wrote:

    > I'm going to think about this more and perhaps respond with a more
    > thought out proposal. I don't want magic. I want clear, clean and
    > atomic operations.


    From what I can tell, Windows OVERLAPPED operations or POSIX aio
    operations are the correct semantics for non-blocking operations on
    files. Normal non-blocking semantics just don't work, because
    there's no single notion of "readability" or "writability" that makes
    sense.

    There are many clean, simple ways you can solve this problem. You can
    queue bite-sized write operations to a group of worker threads. You
    can use asynchronous read.

    You complain about the overhead, but then you ask for precisely that
    same overhead. Any non-blocking write is going to require the data be
    stored until it can be committed, but that's precisely what you claim
    is too much overhead. There is no difference in overhead based on
    who does it, and if you do it, you get to control it.

    It sounds like your complaint has nothing whatsoever to do with kernel
    or OS mechanisms. You don't want a new 'open' mode or new write
    semantics. You just want somebody to write an asynchronous file I/O
    library for you.

    I find that really, really odd, since it's so simple.

    Sit down and write what you want. Use either aio or threads. Stop
    complaining and code.

    DS

  10. Re: Asynchronous Disk IO on linux

    benoit@xdal.org writes:
    > fd=open(file, non-blocking);
    > n = read(fd, buf, 1000);
    >
    > at this point, the kernel will check "do I have 1000 bytes available
    > to read? if yes, return them. If no, return the number I have and set
    > errno to EWOULDBLOCK, then start the process of paging in that data
    > from the disk.


    Well, what you want is an interface to request that the kernel reads a
    certain number of bytes from a particular descriptor, starting at a
    particular offset, into memory and informs the application when that
    has been done. This implies that it would be possible to implement a
    'data availability' polling function via pread. But there still
    wouldn't be a way to signal availability of this data without
    explicitly communicating the desired I/O-parameters to the
    kernel. Which means aio. The kernel can be told to post a signal to a
    process upon completion of an aio-event and this notification may
    include an arbitrary, user-supplied integer or pointer value
    (according to SUS). Since 2.6.22, signals can be received via file
    descriptor.

    So, what's your problem? Random-access I/O is more complicated than
    'stream'-I/O because arbitrary 'positions' are involved.
