aio_read/write versus O_NONBLOCK - Unix


Thread: aio_read/write versus O_NONBLOCK

  1. Re: aio_read/write versus O_NONBLOCK

    On May 27, 5:36 pm, phil-news-nos...@ipal.net wrote:

    > | Which is what? Are you suggesting it should say that the file is not
    > | ready for reading (since the data is not available) and that it never
    > | should return ready no matter how long you wait, perhaps unless some
    > | other process coincidentally happens to cause the first few bytes of
    > | data to be resident in memory?


    > Being "ready to read" does not mean the actual data is in RAM. It just means
    > you can proceed to do the read call if your program has an interest in
    > reading something (which presumably it does if you included that descriptor
    > for reading in poll or select).


    If a file is *always* ready to read, whether or not the actual data is
    in RAM, then what's the point of passing the descriptor to 'poll' or
    'select'? To make non-blocking file I/O useful, you have to have some
    rule for when a file is ready to read and when it's not.

    > AIO is not necessary outside of the small thinking box you use.


    Propose a coherent set of semantics. I believe it's impossible. If you
    think it can be done, go ahead and do it. Tell me -- when should a
    file be ready for read and when should a poll block?

    > |> From then on, the kernel
    > |> follows in lock-step; we've swapped the producer-consumer ordering--so that
    > |> an open doesn't necessarily load a page on a whim--and alls well w/ the
    > |> world.
    > |
    > | Exactly. What you really want is AIO semantics, and you could possibly
    > | fake it in ugly and unsatisfying ways with non-blocking semantics.
    >
    > Faked? Just follow the logic and it can work.


    What is the logic? If you open a file and then pass it to 'select' for
    read, what are you waiting for?

    DS

  2. Re: aio_read/write versus O_NONBLOCK

    On Tue, 27 May 2008 19:22:47 -0700 (PDT) David Schwartz wrote:
    | On May 27, 5:36 pm, phil-news-nos...@ipal.net wrote:
    |
    |> | Which is what? Are you suggesting it should say that the file is not
    |> | ready for reading (since the data is not available) and that it never
    |> | should return ready no matter how long you wait, perhaps unless some
    |> | other process coincidentally happens to cause the first few bytes of
    |> | data to be resident in memory?
    |
    |> Being "ready to read" does not mean the actual data is in RAM. It just means
    |> you can proceed to do the read call if your program has an interest in
    |> reading something (which presumably it does if you included that descriptor
    |> for reading in poll or select).
    |
    | If a file is *always* ready to read, whether or not the actual data is
    | in RAM, then what's the point of passing the descriptor to 'poll' or
    | 'select'? To make non-blocking file I/O useful, you have to have some
    | rule for when a file is ready to read and when it's not.

    For each descriptor, or for each process' reference to a descriptor, keep
    a flag that indicates if a read request is active. Another flag would be
    kept for a write request.

    When doing a poll, for each file/disk descriptor, if there is no request
    currently active (and the filesystem is not in an error status) then poll
    will present the file as ready for read and/or write (depending on which
    flags are set).

    When doing a read or write, if the request can be completed immediately
    (for read, the data is already in RAM ... for write, there is buffer space
    eligible to cache the data if the O_SYNC flag is not in effect), then that
    read or write proceeds to completion or partial completion in cases where
    that is allowed and necessary (normally this has not been the case for
    files, but it is also something I believe should be allowed).

    When doing a read or write, if the request cannot be completed immediately
    (for read, the data is not in RAM ... for write, there is insufficient
    buffer space eligible for caching this data or the O_SYNC flag is in effect)
    then the I/O is queued/scheduled/started as appropriate, the descriptor is
    marked active for that operation, errno is set to EAGAIN, and the syscall
    returns -1.

    When doing a read or write, if the associated device or filesystem is in
    a special non-permanent delay condition (such as network recovery for NFS)
    then errno is set to EAGAIN, and the syscall returns -1.

    When the requested I/O completes and the data is in the buffer space, the
    active flag is cleared. Processes in poll with this descriptor selected
    will have that descriptor marked as ready and unblocked to return from
    poll.

    If a process that has done a read and gotten EAGAIN for some data then
    calls a seek function to select different data and calls read again, and
    if that data is available, that read will just return immediately without
    clearing the active flag. If that second data is not available, then it
    will just return EAGAIN. If it now calls poll and blocks there and wakes
    up when the first request is done, and does read at the position for the
    2nd request, it will get EAGAIN, but this time that 2nd request is started.

    A more advanced implementation will use counters instead of flags and allow
    a finite number of queued requests per descriptor.
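
    To illustrate, a minimal sketch of how an application might drive such
    semantics (hypothetical: it assumes the EAGAIN-queues-the-request behaviour
    described above, which no current kernel provides; error handling omitted):

    #include <errno.h>
    #include <fcntl.h>
    #include <poll.h>
    #include <unistd.h>

    /* Hypothetical: under the semantics proposed above, a read() that misses
     * the cache queues the I/O, fails with EAGAIN, and poll() later reports
     * the descriptor ready once the requested block is in RAM. */
    ssize_t read_when_ready(int fd, void *buf, size_t len)
    {
        ssize_t n;

        while ((n = read(fd, buf, len)) < 0 && errno == EAGAIN) {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            if (poll(&pfd, 1, -1) < 0)      /* wait for the queued I/O */
                return -1;
        }
        return n;
    }

    /* The descriptor would be opened non-blocking, e.g.
     * int fd = open(path, O_RDONLY | O_NONBLOCK); */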


    |> AIO is not necessary outside of the small thinking box you use.
    |
    | Propose a coherent set of semantics. I believe it's impossible. If you
    | think it can be done, go ahead and do it. Tell me -- when should a
    | file be ready for read and when should a poll block?

    See above.


    |> |> From then on, the kernel
    |> |> follows in lock-step; we've swapped the producer-consumer ordering--so that
    |> |> an open doesn't necessarily load a page on a whim--and alls well w/ the
    |> |> world.
    |> |
    |> | Exactly. What you really want is AIO semantics, and you could possibly
    |> | fake it in ugly and unsatisfying ways with non-blocking semantics.
    |>
    |> Faked? Just follow the logic and it can work.
    |
    | What is the logic? If you open a file and then pass it to 'select' for
    | read, what are you waiting for?

    See above.


  3. Re: aio_read/write versus O_NONBLOCK

    phil-news-nospam@ipal.net writes:
    > On Tue, 27 May 2008 12:25:10 +0200 Rainer Weikusat wrote:
    > | phil-news-nospam@ipal.net writes:
    > |> On Mon, 26 May 2008 13:33:35 +0200 Rainer Weikusat wrote:
    > |> | phil-news-nospam@ipal.net writes:
    > |> |> On Sun, 25 May 2008 19:18:57 +0200 Rainer Weikusat wrote:
    > |> |> | phil-news-nospam@ipal.net writes:
    > |> |> |> On Sun, 25 May 2008 09:49:07 +0200 Rainer Weikusat wrote:
    > |> |> |> |> Why do you say it cannot work on a file?
    > |> |> |> |
    > |> |> |> | Without doing a read first, no data is ever going to be readable
    > |> |> |> | from the file, because there is no active partner which could provide
    > |> |> |> | it unilaterally.
    > |> |> |>
    > |> |> |> So why not read it first.
    > |> |> |
    > |> |> | ,----
    > |> |> | | The next problem would be that the existing sychronous I/O
    > |> |> | | multiplexing primitives are not designed for random access files but
    > |> |> | | for (implicitly) time-ordered streams, ie it is completely ok to
    > |> |> | | ^^^^^^^^^^^^^^^^^^^^^^
    > |> |> | | create a socket and then call poll to wait until data to read is
    > |> |> | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
    > |> |> | | available. This cannot possibly work on a file.
    > |> |> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    > |> |> | `----
    > |> |>
    > |> |> So why not read it first.
    > |> |> ^^^^^^^^^^^^^^^^^^^^^^^^^
    > |> |
    > |> | Which part of 'create, then poll without read not worky-worky on disk
    > |> | file' is too complicated for you?
    > |>
    > |> Why do you insist on not doing it the way that would work?
    > |
    > | Because I intended to give an example where synchronous
    > | I/O-multiplexing would need to work differently than it usually does
    > | when files were to be supported.
    >
    > You're just giving an example of a way to do things that does not work and
    > ignore how to make things work. This examples nothing.


    This 'examples' that the usual semantics of the call are unsuitable
    for file I/O. This does not mean that you, Mr Ahern, the tooth fairy
    or any other anonymous terrorist cannot 'silently' change the
    multiplexing call to do something different which would make it
    possible to use it with files, but that someone would have to.
    Which is the single point I have been re-iterating for quite some time
    now.

  4. Re: aio_read/write versus O_NONBLOCK

    phil-news-nospam@ipal.net writes:
    > Rainer Weikusat wrote:
    > | William Ahern writes:
    > |> David Schwartz wrote:
    > |
    > | [...]
    > |
    > |>> So what should waiting do in this case? Should it be required to start
    > |>> a read ahead, changing the state of the socket so that there is an
    > |>> asynchronous operation?
    > |>
    > |> The same thing it has always done, return readiness. When an actual I/O
    > |> request blocks, the kernel then has a sufficient hint to queue a
    > |> request;
    > |
    > | That's the more straight-forward 'obvious hack-around': Make the first
    > | call return a fictional result with an unknown probability of being
    > | wrong and assume that the application will then call read, harvest
    > | EAGAIN and go back to waiting, which will actually take place if
    > | necessary on the second call. This has the 'nice' side effect of
    > | breaking each and every piece of code which naively assumed that the
    > | return values of system calls were actually meant in earnest and still
    > | requires the behavious of the subroutines implementing the interface
    > | to be modified because an operation the interface is not suitable for
    > | shall (for some weird reason) be put under its umbrella, aka "all of
    > | my fifteen children are called 'Hey you!!' and I just throw stones at the
    > | one I actually meant to address".
    >
    > So your logic is that existing code ...


    No.

  5. Re: aio_read/write versus O_NONBLOCK

    On May 27, 10:45 pm, phil-news-nos...@ipal.net wrote:

    > For each descriptor, or for each process' reference to a descriptor, keep
    > a flag that indicates if a read request is active. Another flag would be
    > kept for a write request.


    Congratulations, you've proven my point. Non-blocking semantics don't
    work for files, you need AIO. Having requests that are active and
    whose completion you wait for is AIO semantics. Getting hints that
    it's the right time to start a request is non-blocking semantics.

    As I've been saying this whole time, to do discovery for descriptors
    that reference files, you need AIO semantics. Non-blocking semantics
    work for sockets but not for files.

    It's really this simple -- with non-blocking semantics, there is no
    way for the implementation to know *what* read you are waiting for.
    The semantics require there to be one clear notion of "ready for
    read". This is possible for sockets (unreceived data is present or
    there's an error), but as your own comments show, this is impossible
    for files (you need a pending request).

    DS

  6. Re: aio_read/write versus O_NONBLOCK

    On Wed, 28 May 2008 07:22:23 -0700 (PDT) David Schwartz wrote:
    | On May 27, 10:45 pm, phil-news-nos...@ipal.net wrote:
    |
    |> For each descriptor, or for each process' reference to a descriptor, keep
    |> a flag that indicates if a read request is active. Another flag would be
    |> kept for a write request.
    |
    | Congratulations, you've proven my point. Non-blocking semantics don't
    | work for files, you need AIO. Having requests that are active and
    | whose completion you wait for is AIO semantics. Getting hints that
    | it's the right time to start a request is non-blocking semantics.

    I've proven exactly the opposite, that it would work ... if the standard is
    made to allow it to work.


    | As I've been saying this whole time, to do discovery for descriptors
    | that reference files, you need AIO semantics. Non-blocking semantics
    | work for files but not for sockets.

    Or use the semantics I described previously.


    | It's really this simple -- with non-blocking semantics, there is no
    | way for the implementation to know *what* read you are waiting for.
    | The semantics require there to be one clear notion of "ready for
    | read". This is possible for sockets (unreceived data is present or
    | there's an error), but as your own comments show, this is impossible
    | for files (you need a pending request).

    It doesn't need to know what read you are waiting for. It only needs to know
    if there is a currently active read (or sufficient number of them in the cases
    of a more complex implementation that allow multiple requests to be scheduled)
    that would prevent starting a new read.


  7. Re: aio_read/write versus O_NONBLOCK

    On Wed, 28 May 2008 08:41:11 +0200 Rainer Weikusat wrote:
    | phil-news-nospam@ipal.net writes:
    |> On Tue, 27 May 2008 12:25:10 +0200 Rainer Weikusat wrote:
    |> | phil-news-nospam@ipal.net writes:
    |> |> On Mon, 26 May 2008 13:33:35 +0200 Rainer Weikusat wrote:
    |> |> | phil-news-nospam@ipal.net writes:
    |> |> |> On Sun, 25 May 2008 19:18:57 +0200 Rainer Weikusat wrote:
    |> |> |> | phil-news-nospam@ipal.net writes:
    |> |> |> |> On Sun, 25 May 2008 09:49:07 +0200 Rainer Weikusat wrote:
    |> |> |> |> |> Why do you say it cannot work on a file?
    |> |> |> |> |
    |> |> |> |> | Without doing a read first, no data is ever going to be readable
    |> |> |> |> | from the file, because there is no active partner which could provide
    |> |> |> |> | it unilaterally.
    |> |> |> |>
    |> |> |> |> So why not read it first.
    |> |> |> |
    |> |> |> | ,----
    |> |> |> | | The next problem would be that the existing sychronous I/O
    |> |> |> | | multiplexing primitives are not designed for random access files but
    |> |> |> | | for (implicitly) time-ordered streams, ie it is completely ok to
    |> |> |> | | ^^^^^^^^^^^^^^^^^^^^^^
    |> |> |> | | create a socket and then call poll to wait until data to read is
    |> |> |> | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
    |> |> |> | | available. This cannot possibly work on a file.
    |> |> |> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |> |> |> | `----
    |> |> |>
    |> |> |> So why not read it first.
    |> |> |> ^^^^^^^^^^^^^^^^^^^^^^^^^
    |> |> |
    |> |> | Which part of 'create, then poll without read not worky-worky on disk
    |> |> | file' is too complicated for you?
    |> |>
    |> |> Why do you insist on not doing it the way that would work?
    |> |
    |> | Because I intended to give an example where synchronous
    |> | I/O-multiplexing would need to work differently than it usually does
    |> | when files were to be supported.
    |>
    |> You're just giving an example of a way to do things that does not work and
    |> ignore how to make things work. This examples nothing.
    |
    | This 'examples' that the usual semantics of the call are unsuitable
    | for file I/O. This does not mean that you, Mr Ahern, the tooth fairy
    | or any other anonymous terrorist cannot 'silently' change the
    | multiplexing call to do something different which would make it
    | possible to use it with files, but that someone would have to.
    | Which is the single point I have been re-iterating for quite some time
    | now.

    No one is suggesting being silent about it. I've raised the point that the
    standard could have been made to allow non-blocking I/O using the existing
    calls without changing how they work for any blocking case or for any case
    with a non-file/disk descriptor. I've also given the details in a reply
    elsewhere in this thread. Either show where the logic I described cannot
    work, or be silent regarding my logic giving the appearance that you do not
    see why it cannot work.


  8. Re: aio_read/write versus O_NONBLOCK

    phil-news-nospam@ipal.net writes:
    > On Wed, 28 May 2008 08:41:11 +0200 Rainer Weikusat wrote:


    [...]

    > | This 'examples' that the usual semantics of the call are unsuitable
    > | for file I/O. This does not mean that you, Mr Ahern, the tooth fairy
    > | or any other anonymous terrorist cannot 'silently' change the
    > | multiplexing call to do something different which would make it
    > | possible to use it with files, but that someone would have to.
    > | Which is the single point I have been re-iterating for quite some time
    > | now.
    >


    [...]

    > Either show where the logic I described cannot work, or be silent
    > regarding my logic giving the appearance that you do not
    > see why it cannot work.


    The basic question is not whether something could conceivably be
    implemented (everything can be implemented, subject to the laws of
    physics) but whether implementing it would be sensible. IMO,
    overloading the existing synchronous I/O-multiplexing primitives to
    accomplish a loosely-related 'other task' is not sensible, especially
    taking into account that this 'other task' would only be useful in
    fringe cases on computer systems using particular hardware and
    particular filesystem layouts.

  9. Re: aio_read/write versus O_NONBLOCK

    On May 29, 12:03 am, phil-news-nos...@ipal.net wrote:

    > | Congratulations, you've proven my point. Non-blocking semantics don't
    > | work for files, you need AIO. Having requests that are active and
    > | whose completion you wait for is AIO semantics. Getting hints that
    > | it's the right time to start a request is non-blocking semantics.


    > I've proven exactly the opposite, that it would work ... if the standard is
    > made to allow it to work.


    I'm obviously not going to convince you. Essentially, what you've done
    is invented a badly broken version of AIO.

    DS

  10. Re: aio_read/write versus O_NONBLOCK

    phil-news-nospam@ipal.net wrote:

    > It doesn't need to know what read you are waiting for. It only needs to know
    > if there is a currently active read (or sufficient number of them in the cases
    > of a more complex implementation that allow multiple requests to be scheduled)
    > that would prevent starting a new read.


    It does need to know what read you are waiting for, actually, because a file
    is in a sense--to draw a socket analogy--a buffer of infinite length. It
    shouldn't trigger readiness merely because any random block has been paged
    in. OTOH, it's _sensible_ to trigger when the most-likely-to-be-read page is
    resident. (The Kantian argument that the kernel shouldn't lie to userland in
    regards to the consumer-producer swap trick is just so outlandish....)

    The only actual difference is that you want to [attempt to] reap the request
    and initiate a copyout to userland yourself, whereas AIO insists that this
    be done before you're notified of readiness (or rather, completion).

    In the interests of purity AIO renders billions of lines of existing,
    repurposeable code nearly useless. But semantically it's a thing of
    beauty--largely because it ignores lots of real-life issues. (It's also
    similar to Win32 IOCP, yet anecdotally Unix programmers--for better or
    worse--have trouble integrating IOCP into their portable applications, even
    though w/o IOCP the scalability of networking apps in Win32 is nil.)

    Abstractly, nearly any non-blocking networking code can be supplanted by
    AIO. In reality this is not the case. Sockets can do descriptor passing and
    other types of ancillary message passing, like TCP OOB data. This already
    muddies the waters, and these sorts of dilemmas have traditionally been
    resolved in Unix by specifying a lesser common denominator interface.

    Mr. Weikusat makes the point that different resources should, or could, be
    treated differently, and that perhaps AIO shouldn't encompass sockets at all
    (or at least not bother to account for the tangential problems). I take it
    then that he enjoys Win32 programming, which IMO follows his sort of ethos,
    which inevitably manifests in a hodge podge of peculiar interfaces, many
    mutually exclusive--as opposed to vertical.

    At the end of the day, what matters most to me (and I would hope most other
    engineers) is code reusability. One day, a thousand years from now, AIO
    might be just the ticket, and replace--similar to pthread's
    accomplishment--all the myriad buffering libraries. It's not yet, and I
    don't understand why people get so uptight about stepping stones. The usage
    pattern of an initiated read request followed by a seek is so inestimably
    remote beyond a small circle of well understood applications, it boggles my
    mind why it's made into an issue at all.

    Yet, Mr. Weikusat and Schwartz are in good company, along with Torvalds and
    many other reasonable folk.

    Mr. Weikusat gives the "show me the code" challenge. Unfortunately, I try to
    write portable applications. A singular kernel implementation is useless (we
    are talking about a spec, after all). I am writing a portable solution, using
    the well known thread pool pattern, except I get to use sendfile(2) and
    splice(2) optimizations in place of shuttling data through userland. I may
    or may not be able to release that code, but it's straight-forward. I will
    be releasing something called libkq*, a portable kqueue() library which, I
    hope, will eventually handle AIO polling portably and, in conjunction with a
    Win32 AIO/IOCP wrapper, might allow me to make use of AIO as a generic I/O
    interface in my applications. But I _first_ need the stepping stone of
    pollable regular file descriptors, if only because I cannot merely orphan
    all of my existing code and applications.
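
    For a rough idea of the shape of such a worker, here is an illustrative
    sketch (not Mr. Ahern's code) assuming Linux sendfile(2); a real worker
    would check errno and handle short transfers:

    #include <sys/sendfile.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Illustrative thread-pool job: push a region of a regular file to a
     * connected socket without copying the data through userland. */
    struct send_job {
        int    sock_fd;    /* connected client socket */
        int    file_fd;    /* open regular file */
        off_t  offset;     /* where this job starts */
        size_t length;     /* how much to send */
    };

    static void run_send_job(struct send_job *job)
    {
        off_t  off  = job->offset;
        size_t left = job->length;

        while (left > 0) {
            ssize_t n = sendfile(job->sock_fd, job->file_fd, &off, left);
            if (n <= 0)
                break;      /* error or peer gone; real code inspects errno */
            left -= (size_t)n;
        }
    }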


    * kqueue() provides the semantics of lockless event insertion
    across threads, which can be imitated using epoll() with no or minimally
    scoped locking. libevent is too high level for such a task.


  11. Re: aio_read/write versus O_NONBLOCK

    On May 29, 2:14 pm, William Ahern wrote:

    > The usage
    > pattern of an initiated read request followed by a seek is so inestimably
    > remote beyond a small circle of well understood applications, it boggles my
    > mind why it's made into an issue at all.


    Huh? What?

    One of the things that annoys me the most on Windows is that there is
    no equivalent of 'pread' or 'pwrite' and I have to implement my own
    locking around file accesses to prevent one thread from moving the
    pointer and upsetting another thread's read or write operation.
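
    For comparison, with pread each thread passes its own offset and the
    descriptor's shared file position is never moved, so no locking around the
    read is needed (a minimal sketch):

    #include <unistd.h>

    /* Thread-safe positioned read: no lseek(), so concurrent callers on the
     * same descriptor cannot disturb each other's position. */
    ssize_t read_at(int fd, void *buf, size_t len, off_t where)
    {
        /* The racy alternative -- lseek(fd, where, SEEK_SET) followed by
         * read(fd, buf, len) -- is the pattern that needs a lock where
         * pread is missing. */
        return pread(fd, buf, len, where);
    }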

    What you are really arguing is that the usage pattern of concurrent
    logically independent reads and writes to the same file is
    "inestimably remote". So apparently in your world, most applications
    don't, for example, contain or use any kind of database.

    I think the thing you're missing is that applications that use files
    without seeking probably don't need AIO or non-blocking semantics
    anyway. We're only interested in the universe of applications that are
    going out of their way to avoid blocking, and these will likely have
    concurrency in their file access too.

    DS

  12. Re: aio_read/write versus O_NONBLOCK

    David Schwartz wrote:
    > On May 29, 2:14 pm, William Ahern wrote:
    >
    > > The usage
    > > pattern of an initiated read request followed by a seek is so inestimably
    > > remote beyond a small circle of well understood applications, it boggles my
    > > mind why it's made into an issue at all.

    >
    > Huh? What?


    I meant a queued read request (or hint) which is then invalidated by a seek
    on the same descriptor before completion of the original request. That is,
    the implicit argument that non-blocking semantics would lead to too much
    unnecessary I/O, or interpose too much code useless to too many
    applications.

    > One of the things that annoys me the most on Windows is that there is
    > no equivalent of 'pread' or 'pwrite' and I have to implement my own
    > locking around file accesses to prevent one thread from moving the
    > pointer and upsetting another thread's read or write operation.


    And on systems like Linux, when multiple threads are using pread/pwrite on
    the same descriptor it's also useful to tell the I/O scheduler not to do
    read-ahead, unless the fact of their use is a sufficient hint alone.
    Alternatively, each thread could use a separate descriptor, if they intend
    to seek and do long sequential operations. These are imprecise hints to the
    kernel, for sure, but I don't see how such impreciseness detrimentally
    reflects on any argument for non-blocking file I/O request polling. Files,
    unlike sockets and pipes, are susceptible to myriad sets of useful semantics,
    depending on usage. AIO neatly obscures these patterns; it doesn't solve the
    problem of negotiating w/ the kernel.
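
    One way an application can express that sort of hint is posix_fadvise
    (a sketch; the advice is purely advisory and the kernel is free to
    ignore it):

    #include <fcntl.h>

    /* Hint for a descriptor shared by threads doing scattered pread/pwrite:
     * expect random access, so aggressive read-ahead is not worthwhile.
     * Returns 0 on success, an error number otherwise. */
    int hint_random_access(int fd)
    {
        return posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
    }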

    > What you are really arguing is that the usage pattern of concurrent
    > logically independent reads and writes to the same file is
    > "inestimably remote". So apparently in your world, most applications
    > don't, for example, contain or use any kind of database.


    You removed my qualifier specifically meant to encompass your example.
    Databases are a definite, well known use, and whether regular descriptors
    support polling or not they still benefit from hinting to the kernel how the
    resource will be used. It's all the unclassifiable such uses, in various
    applications, which I'm arguing are actually not common at all, because
    aside from the random access requirements of databases (for which shared
    semantics can be well defined), most others fall into the general category
    of open+optional seek+read, read, read.

    > I think the thing your missing is that applications that use files
    > without seeking probably don't need AIO or non-blocking semantics
    > anyway.


    Except for the vast majority of server applications: HTTP, FTP, RTSP, CIFS,
    etc, etc. A filesystem _is_ a database, and file names are hints to the
    kernel about the access pattern to that database. If you're implementing
    your own database on top of that, you should cooperate w/ the kernel to get
    optimal performance--which you're also implicitly doing by opening a
    particular file--in the same way I'm arguing that an application which
    simply desires to read a file as a stream can hint to the kernel by
    requesting non-blocking semantics, which in any event is already a pattern
    accounted for in I/O schedulers.

    All I'm arguing is that, though obviously the universe of software which
    requires these abilities is relatively small, the subset which do concurrent
    _random_ access are far less numerous than the rest, and I don't understand
    why a solution which abstractly addresses both sets, yet which in practice
    impedes existing development practices, should be heralded in a way which
    excludes compatible and practical solutions.

    > We're only interested in the universe of applications that are going out
    > of their way to avoid blocking, and these will likely have concurrency in
    > their file access too.


    The most common sort of concurrency in file access is using multiple file
    descriptors. Sharing a single descriptor is, I would argue, an optimization,
    and an uncommon one, in terms of types of applications which make use of
    it--databases being one example which probably accounts for most, and for
    which reuseable implementations abound. (That open(2) might be expensive is
    beside the point; it registers a consumer with a unique and more predictable
    access behavior.) That's not an argument against pread/pwrite, however,
    because allowing optional non-blocking semantics on regular files doesn't
    intrude on the semantics or even necessarily the implementations of
    pread/pwrite.


  13. Re: aio_read/write versus O_NONBLOCK


    William Ahern wrote:

    > David Schwartz wrote:


    > I meant a queued read request (or hint) which is then invalidated by a seek
    > on the same descriptor before completion of the original request. That is,
    > the implicit argument that non-blocking semantics would lead to too much
    > unnecessary I/O, or interpose too much code useless to too many
    > applications.


    Exactly. This is precisely what would happen in many typical
    applications. Consider a multi-threaded web server that is serving up
    many copies of a file that's too large to fit in memory. Every time it
    gets one 'read' started up for one client, the next one will screw it
    up.

    > And on systems like Linux, when multiple threads are using pread/pwrite on
    > the same descriptor it's also useful to tell the I/O scheduler not to do
    > read-ahead, unless the fact of their use is a sufficient hint alone.
    > Alternatively, each thread could use a separate descriptor, if they intend
    > to seek and do long sequential operations. These are imprecise hints to the
    > kernel, for sure, but I don't see how such impreciseness detrimentally
    > reflects on any argument for non-blocking file I/O request polling. Files,
    > unlike sockets and pipes, are susceptible to myriad sets of useful semantics,
    > depending on usage. AIO neatly obscures these patterns; it doesn't solve the
    > problem of negotiating w/ the kernel.


    Nobody has yet been able to draft useful non-blocking semantics for
    files. All they do is badly reinvent AIO semantics using the function
    calls normally used for non-blocking semantics.

    > > What you are really arguing is that the usage pattern of concurrent
    > > logically independent reads and writes to the same file is
    > > "inestimably remote". So apparently in your world, most applications
    > > don't, for example, contain or use any kind of database.


    > You removed my qualifier specifically meant to encompass your example.
    > Databases are a definite, well known use, and whether regular descriptors
    > support polling or not they still benefit from hinting to the kernel how the
    > resource will be used. It's all the unclassifiable such uses, in various
    > applications, which I'm arguing are actually not common at all, because
    > aside from the random access requirements of databases (for which shared
    > semantics can be well defined), most others fall into the general category
    > of open+optional seek+read, read, read.


    You're missing the point that there's no reason to consider
    applications that couldn't benefit from either AIO, non-blocking, or
    any other type of asynchronous file I/O semantics. Once you're in the
    universe of only programs that can benefit from this type of I/O, the
    percentage that use files in sophisticated ways goes way up.

    > > I think the thing you're missing is that applications that use files
    > > without seeking probably don't need AIO or non-blocking semantics
    > > anyway.


    > Except for the vast majority of server applications: HTTP, FTP, RTSP, CIFS,
    > etc, etc. A filesystem _is_ a database, and file names are hints to the
    > kernel about the access pattern to that database. If you're implementing
    > your own database on top of that, you should cooperate w/ the kernel to get
    > optimal performance--which you're also implicitly doing by opening a
    > particular file--in the same way I'm arguing that an application which
    > simply desires to read a file as a stream can hint to the kernel by
    > requesting non-blocking semantics, which in any event is already a pattern
    > accounted for in I/O schedulers.


    I've written servers for almost all of these protocols. Having to deal
    with the lack of 'pread'/'pwrite' on Windows was a PITA on every
    single one of them. You can't open 8,000 copies of a file just because
    8,000 clients want to transfer it and it doesn't fit in memory. I've
    had to implement special locks on Windows to make the 'seek/read' and
    'seek/write' atomic.

    > All I'm arguing is that, though obviously the universe of software which
    > requires these abilities is relatively small, the subset which do concurrent
    > _random_ access are far less numerous than the rest, and I don't understand
    > why a solution which abstractly addresses both sets, yet which in practice
    > impedes existing development practices, should be heralded in a way which
    > excludes compatible and practical solutions.


    I have yet to see any version of non-blocking semantics for files that
    allows useful concurrency. Being limited to one operation in each
    direction per file or having some horrible ugliness pop up when you do
    that is not useful concurrency in my book.

    > > We're only interested in the universe of applications that are going out
    > > of their way to avoid blocking, and these will likely have concurrency in
    > > their file access too.


    > The most common sort of concurrency in file access is using multiple file
    > descriptors. Sharing a single descriptor is, I would argue, an optimization,
    > and an uncommon one, in terms of types of applications which make use of
    > it--databases being one example which probably accounts for most, and for
    > which reuseable implementations abound. (That open(2) might be expensive is
    > beside the point; it registers a consumer with a unique and more predictable
    > access behavior.) That's not an argument against pread/pwrite, however,
    > because allowing optional non-blocking semantics on regular files doesn't
    > intrude on the semantics or even necessarily the implementations of
    > pread/pwrite.


    We're only talking about highly-optimized applications here. Non-
    optimized applications would have no use whatsoever for asynchronous
    or non-blocking semantics. Very, very few applications use this type
    of semantic to this day and there's nothing they can't do. We're
    talking about super-advanced optimizations only for applications that
    want the very highest degree of concurrency.

    To say that these applications don't share their file descriptors or
    don't have complex access patterns is just crazy. Odds are these costs
    would outweigh the benefits.

    We are talking here about an optimization to use when other
    optimizations don't do enough for you.

    DS

  14. Re: aio_read/write versus O_NONBLOCK

    William Ahern writes:
    > phil-news-nospam@ipal.net wrote:
    >
    >> It doesn't need to know what read you are waiting for. It only needs to know
    >> if there is a currently active read (or sufficient number of them in the cases
    >> of a more complex implementation that allow multiple requests to be scheduled)
    >> that would prevent starting a new read.

    >
    > It does need to know what read you are waiting for, actually, because a file
    > is in a sense--to draw a socket analogy--a buffer of infinite length. It
    > shouldn't trigger readiness merely because any random block has been paged
    > in. OTOH, it's _sensible_ to trigger when the most-likely-to-be-read page is
    > resident.


    So, it should both know what the application wants to read ('most-
    likely-to-be-read'-page) and not know what the application wants to
    read ('It does not need to know what read you are waiting for')?

    > (The Kantian argument that the kernel shouldn't lie to userland in
    > regards to the consumer-producer swap trick is just so
    > outlandish....)


    This remark was intended to refer to the recurring discussions
    regarding whether 'multiplexing call returns "readable"' means 'the
    next read will not block', and consequently, whether use of
    O_NONBLOCK is 'necessary'. Obviously, when the call returns a result
    with a 'random' probability of being wrong to begin with, applications
    which were coded under the assumption that not only will the result
    not be wrong, but that it will even be valid after an unspecified
    amount of time has passed, will 'randomly' block when they
    shouldn't in situations where they would not have blocked with the
    unmodified call. Because this keeps being rediscussed every now and
    then, the conjecture that such applications exist (and even exist in
    numbers) appears justified to me. This means that especially
    're-purposing code' which was written to deal with other types of
    things file descriptors can be associated with, will not necessarily
    be a trivial task.

    OTOH, I agree with the general statement that 'subroutines should
    perform according to their documentation' instead of 'performing
    "tricks" based on assumptions about application coding patterns'.
    'Tricks' are only needed if there is a need to trick someone into
    something in the first place and they tend to be 'tricky', ie
    non-obvious, hard to understand for someone not already knowing about
    them and 'surprising' --- all things I would rather not have in code.

    The suggestion that this could be a question of 'moral' is indeed
    outlandish.

    > The only actual difference is that you want to [attempt to] reap the request
    > and initiate a copyout to userland yourself, whereas AIO insists that this
    > be done before you're notified of readiness (or rather, completion).


    There is another difference: 'Streams' (like sequences
    of TCP segments or UDP datagrams) are time-ordered: Each particular
    unit of data is available at the same location during a particular time. A
    file is 'space-ordered': Each particular unit of data is available at
    a particular location during the same time. Especially, this means that
    for a stream, there is always a 'next' event to wait for and all other
    events will occur one-by-one after the next event. This is not true
    for a set of I/O-operations on a random access file, which can all be
    started at the same time and complete, possibly even in parallel, in
    any order. For the 'copy a file' example, this would mean that the
    second, fourth and sixth 'unit of data' could already be written to
    'the output file' at their respective positions, while the first,
    third and fifth are not yet in memory, using the basic pattern to start
    n (some desired concurrency level) async reads at the same time, start
    an async write whenever a read completes and another read whenever a
    write completes.
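
    A minimal sketch of that pattern using the POSIX aio_* calls (an
    illustration only, not code from the post; errors, short transfers and
    the tail of the file are glossed over):

    #include <aio.h>
    #include <errno.h>
    #include <sys/types.h>

    #define NSLOTS 4                 /* desired concurrency level */
    #define BLK    65536

    enum slot_state { IDLE, READING, WRITING };

    int copy_file_aio(int in, int out, off_t size)
    {
        static struct aiocb cb[NSLOTS];
        static char buf[NSLOTS][BLK];
        enum slot_state state[NSLOTS] = { IDLE };
        off_t next = 0;              /* next offset to schedule a read at */
        int live = 0;

        /* start the first NSLOTS reads at the same time */
        for (int i = 0; i < NSLOTS && next < size; i++, next += BLK) {
            cb[i] = (struct aiocb){ .aio_fildes = in, .aio_buf = buf[i],
                                    .aio_nbytes = BLK, .aio_offset = next };
            if (aio_read(&cb[i]) != 0)
                return -1;
            state[i] = READING;
            live++;
        }

        while (live > 0) {
            const struct aiocb *list[NSLOTS];
            int n = 0;
            for (int i = 0; i < NSLOTS; i++)
                if (state[i] != IDLE)
                    list[n++] = &cb[i];
            aio_suspend(list, n, NULL);     /* wait for any completion */

            for (int i = 0; i < NSLOTS; i++) {
                if (state[i] == IDLE || aio_error(&cb[i]) == EINPROGRESS)
                    continue;
                ssize_t got = aio_return(&cb[i]);

                if (state[i] == READING && got > 0) {
                    /* a read finished: write that block at the same offset */
                    cb[i].aio_fildes = out;
                    cb[i].aio_nbytes = (size_t)got;
                    aio_write(&cb[i]);
                    state[i] = WRITING;
                } else if (state[i] == WRITING && got > 0 && next < size) {
                    /* a write finished: start the next read */
                    cb[i].aio_fildes = in;
                    cb[i].aio_nbytes = BLK;
                    cb[i].aio_offset = next;
                    next += BLK;
                    aio_read(&cb[i]);
                    state[i] = READING;
                } else {
                    state[i] = IDLE;        /* EOF, error, or nothing left */
                    live--;
                }
            }
        }
        return 0;
    }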

    [...]

    > Mr. Weikusat makes the point that different resources should, or
    > could, be treated differently, and that perhaps AIO shouldn't
    > encompass sockets at all (or at least not bother to account for the
    > tangential problems).


    I wasn't writing about AIO but about the existing synchronous
    I/O-multiplexing interfaces. These have been designed for time-ordered
    streams of data and cannot be used for 'space-ordered data sets'
    without modification. This means 'treating different things
    differently' is necessary in any case. The remaining question would be
    whether 'modifying what is already there' or 'adding something new' is
    more desirable. Have you ever heard of a maxim that one should rather
    write a new program than add unrelated functionality to an existing
    program?

    > I take it then that he enjoys Win32 programming, which IMO follows
    > his sort of ethos, which inevitably manifests in a hodge podge of
    > peculiar interfaces, many mutually exclusive--as opposed to
    > vertical.


    The last (and only) version of Windows I have been in closer contact
    with than occasionally helping someone with something was 3.1. I don't
    use it and I don't develop anything for it. I don't even care if code
    is portable to Windows, because unless this is specifically requested,
    I am not willing to deal with any Microsoft weirdnesses.

    > At the end of the day, what matters most to me (and I would hope most other
    > engineers) is code reusability.


    [...]

    The 'engineer' used to be the guy who operated the (steam) engine, did
    you know that? An optimist would hope that 'an engineer' would care
    for technically sensible solutions in preference to minimum-effort
    solutions. But this could presumably be regarded as a German idee
    fixe :->.

    > Mr. Weikusat gives the "show me the code" challenge. Unfortunately, I try to
    > write portable applications. A singular kernel implementation is useless (we
    > are talking about a spec afterall).


    This was a misunderstanding: What I intended to say was 'there are
    apparently two mutually incompatible opinions regarding this
    particular topic, with one of them being "what we have is fine" and
    the other "it isn't"'. Consequently, the only way to settle this
    dispute would be by experiment: Someone implementing the other idea to
    see how it will work out in practice ('opinion' means 'everybody could
    be wrong') and if it would gain any traction based on its actual,
    technical merits.

  15. Re: aio_read/write versus O_NONBLOCK

    On May 30, 9:57 am, Rainer Weikusat wrote:

    > This remark was intended to refer to the recurring discussions
    > regarding whether 'multiplexing call returns "readable"' means 'the
    > next read will not block',


    They don't.

    > and consequently, whether use of
    > O_NONBLOCK is 'necessary'.


    It is if you don't want to block.

    > Obviously, when the call returns a result
    > with a 'random' probability of being wrong to begin with, applications
    > which were coded under the assumption that not only will the result
    > not be wrong, but that it will even be valid after an unspecified
    > amount of time has passed, will 'randomly' block when they
    > shouldn't in situations where they would not have blocked with the
    > unmodified call.


    Of course applications coded with broken assumptions will break. This
    is no different from an application that calls 'statfs' to check the
    free space and then assumes a subsequent write won't run out of space
    because the space was available in the past.

    It is literally impossible for the kernel to guarantee the result of a
    future operation and code that assumes such is broken beyond repair.

    > Because this keeps being rediscussed every now and
    > then, the conjecture that such applications exist (and even exist in
    > numbers) appears justified to me.


    This is a particularly common error, as it happens.

    > This means that especially
    > 're-purposing code' which was written to deal with other types of
    > things file descriptors can be associated with, will not necessarily
    > be a trivial task.


    I agree that this is true, but this argument doesn't support that
    claim. If existing applications are broken, they should be fixed,
    period.

    > OTOH, I agree with the general statement that 'subroutines should
    > perform according to their documentation' instead of 'performing
    > "tricks" based on assumptions about application coding patterns'.
    > 'Tricks' are only needed if there is a need to trick someone into
    > something in the first place and they tend to be 'tricky', ie
    > non-obvious, hard to understand for someone not already knowing about
    > them and 'surprising' --- all things I would rather not have in code.


    That's why you can't make assumptions that might not be valid in
    tricky circumstances. Assuming that because a read wouldn't have
    blocked at time T, a read will not block at time T+1 assumes that
    nothing tricky will happen. Then when something tricky does happen,
    you are screwed. That's why we try to provide interfaces that don't
    require you to make assumptions.

    > The last (and only) version of Windows I have been in closer contact
    > with than occasionally helping someone with something was 3.1. I don't
    > use it and I don't develop anything for it. I don't even care if code
    > is portable to Windows, because unless this is specifically requested,
    > I am not willing to deal with any Microsoft weirdnesses.


    I think this is a mistake. As a result, you won't learn another way of
    doing things, in what way it's better and in what ways it's worse.
    Also, I have found that the more different platforms I test my code
    on, the more platform-independent bugs I find. Compiling and running
    code on as many fundamentally-different platforms as possible
    generates better code.

    To give a simple example, I added some C++ code kind of like this:

    class Bar;

    class Foo
    {
    public:
        void Qux(bool);
        void Qux(int);
        void Qux(const char *);
        void Quux(const Bar *);
    };

    void oops(Foo *j, Bar *k)
    {
        j->Qux(k); // Ack! Supposed to be Quux -- Bar* quietly converts to bool.
    }

    Yes, the compiler converted the pointer to a boolean and no warnings
    or errors were generated. The bug was not caught until the code was
    compiled on VC++ for Windows, and VC++ generated a warning.

    The UNIX code got better because it was being compiled and tested on
    Windows. This happens an awful lot.

    DS

  16. Re: aio_read/write versus O_NONBLOCK

    On Thu, 29 May 2008 08:18:18 -0700 (PDT) David Schwartz wrote:
    | On May 29, 12:03 am, phil-news-nos...@ipal.net wrote:
    |
    |> | Congratulations, you've proven my point. Non-blocking semantics don't
    |> | work for files, you need AIO. Having requests that are active and
    |> | whose completion you wait for is AIO semantics. Getting hints that
    |> | it's the right time to start a request is non-blocking semantics.
    |
    |> I've proven exactly the opposite, that it would work ... if the standard is
    |> made to allow it to work.
    |
    | I'm obviously not going to convince you. Essentially, what you've done
    | is invented a badly broken version of AIO.

    And what is broken about it?


  17. Re: aio_read/write versus O_NONBLOCK

    On May 31, 2:03 am, phil-news-nos...@ipal.net wrote:

    > | I'm obviously not going to convince you. Essentially, what you've done
    > | is invented a badly broken version of AIO.


    > And what is broken about it?


    You have got to be joking. On the very slight chance that you aren't
    joking, I'll make a deal with you. You coherently explain the
    semantics in one place and I'll rip them to shreds.

    Start out by answering this question: what is the definition of "ready
    for read" or "readable" for a file going to be? Consider the following
    cases:

    1) Open file, select for readability.

    2) Open file, select for readability, another thread moves the file
    pointer.

    3) A program needs to read 20 bytes at one position and 30 bytes at
    another position. How does it do this without blocking?

    DS

  18. Re: aio_read/write versus O_NONBLOCK

    On Thu, 29 May 2008 14:14:42 -0700 William Ahern wrote:
    | phil-news-nospam@ipal.net wrote:
    |
    |> It doesn't need to know what read you are waiting for. It only needs to know
    |> if there is a currently active read (or sufficient number of them in the cases
    |> of a more complex implementation that allow multiple requests to be scheduled)
    |> that would prevent starting a new read.
    |
    | It does need to know what read you are waiting for, actually, because a file
    | is in a sense--to draw a socket analogy--a buffer of infinite length. It
    | shouldn't trigger readiness merely because any random block has been paged
    | in. OTOH, it's _sensible_ to trigger when the most-likely-to-be-read page is
    | resident. (The Kantian argument that the kernel shouldn't lie to userland in
    | regards to the consumer-producer swap trick is just so outlandish....)

    Needing to know what part of the file the process is waiting for is merely
    an idealistic notion. The logic I described shows how the kernel only needs
    to keep track of two things, which it already does: 1: which parts of the
    file are already in RAM ... 2: which part(s) of the file it is doing I/O to
    read into RAM. My logic doesn't trigger readiness because a random block is
    paged in (although the idealistic construct might want to do that). Instead,
    it triggers readiness when a block ... whatever it is ... that was requested
    by the process, has been read in. The readiness logic is deferring to the
    caching logic to keep track of what is "in" (what can be instantly accessed).

    I have no idea what you mean about this "Kantian argument".


    | The only actual difference is that you want to [attempt to] reap the request
    | and initiate a copyout to userland yourself, whereas AIO insists that this
    | be done before you're notified of readiness (or rather, completion).

    I don't know what role you are referring to in the "yourself" reference so I
    am unable to determine the meaning of this sentence.


    | In the interests of purity AIO renders billions of lines of existing,
    | repurposeable code nearly useless. But semantically it's a thing of
    | beauty--largely because it ignores lots of real-life issues. (It's also
    | similar to Win32 IOCP, yet anecdotally Unix programmers--for better or
    | worse--have trouble integrating IOCP into their portable applications, even
    | though w/o IOCP the scalability of networking apps in Win32 is nil.)

    It very well may be that AIO itself is more a thing of beauty than the
    legacy I/O call interface. It might well have made things work a lot
    better and cleaner had AIO been used for everything in place of the I/O
    calls most commonly used now. Had the original Unix designers used it
    and only it, I'm sure we would not be having this discussion at all.

    What *I* consider to be a thing of ugliness is mixing two different ways
    to do things. I see the beauty of the conventional I/O interface as more
    a thing of beauty if it is allowed to be complete (as in, allowed to have
    its non-blocking semantics for everything).

    I'm looking not at AIO or CIO (classic I/O) for the beauty, but rather, at
    the whole of the Unix/POSIX interface. The beauty in it for me would be
    to have _one_ kind of interface that can do it all. Either AIO or CIO has
    the potential to do that. My logic explains how CIO can. Currently, POSIX
    allows neither method to realize its full ability.


    | Abstractly, nearly any non-blocking networking code can be supplanted by
    | AIO. In reality this is not the case. Sockets can do descriptor passing and
    | other types of ancillary message passing, like TCP OOB data. This already
    | muddies the waters, and these sorts of dilemmas have traditionally been
    | resolved in Unix by specifying a lesser common denominator interface.
    |
    | Mr. Weikusat makes the point that different resources should, or could, be
    | treated differently, and that perhaps AIO shouldn't encompass sockets at all
    | (or at least not bother to account for the tangential problems). I take it
    | then that he enjoys Win32 programming, which IMO follows his sort of ethos,
    | which inevitably manifests in a hodge podge of peculiar interfaces, many
    | mutually exclusive--as opposed to vertical.

    The big problem with AIO and CIO is that a program cannot use both very
    easily. The reason is AIO has its own process suspension method (that
    being aio_suspend) different from the CIO method (poll or select). What
    would happen if you schedule I/O with AIO and then call poll()?
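
    One well-known bridge (a sketch only; the helper names here are made up,
    not part of any standard) is to ask for a SIGEV_THREAD completion
    notification that writes a byte to a pipe, and to put the pipe's read end
    into the poll() set alongside everything else:

    #include <aio.h>
    #include <signal.h>
    #include <unistd.h>

    /* aio_pipe[0] goes into the poll() set; aio_pipe[1] is written from the
     * completion notification.  Create the pipe once with pipe(2). */
    static int aio_pipe[2];

    static void on_aio_done(union sigval sv)
    {
        char one = 1;
        (void)sv;                    /* sv.sival_ptr is the finished aiocb */
        write(aio_pipe[1], &one, 1); /* wake the poll() loop */
    }

    static int submit_read(struct aiocb *cb, int fd, void *buf,
                           size_t len, off_t off)
    {
        *cb = (struct aiocb){
            .aio_fildes   = fd,
            .aio_buf      = buf,
            .aio_nbytes   = len,
            .aio_offset   = off,
            .aio_sigevent = {
                .sigev_notify          = SIGEV_THREAD,
                .sigev_notify_function = on_aio_done,
                .sigev_value.sival_ptr = cb,
            },
        };
        return aio_read(cb);
    }

    When poll() reports aio_pipe[0] readable, the loop drains the pipe and
    reaps finished requests with aio_error()/aio_return(), so AIO completions
    and ordinary descriptors are at least waited on in one place.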


    | At the end of the day, what matters most to me (and I would hope most other
    | engineers) is code reusability. One day, a thousand years from now, AIO
    | might be just the ticket, and replace--similar to pthread's
    | accomplishment--all the myriad buffering libraries. It's not yet, and I
    | don't understand why people get so uptight about stepping stones. The usage
    | pattern of an initiated read request followed by a seek is so inestimably
    | remote beyond a small circle of well understood applications, it boggles my
    | mind why it's made into an issue at all.

    Maybe it can. But what is still a crucial need is to have a single common
    means for a process to wait on _all_ descriptors (and other resources) at
    one time. If it is possible to have one set of calls for all kinds of I/O
    then that is great. If not, so be it. But the waiting/suspension part does
    need to be in common.


  19. Re: aio_read/write versus O_NONBLOCK

    On Thu, 29 May 2008 13:31:06 +0200 Rainer Weikusat wrote:
    | phil-news-nospam@ipal.net writes:
    |> On Wed, 28 May 2008 08:41:11 +0200 Rainer Weikusat wrote:
    |
    | [...]
    |
    |> | This 'examples' that the usual semantics of the call are unsuitable
    |> | for file I/O. This does not mean that you, Mr Ahern, the tooth fairy
    |> | or any other anonymous terrorist cannot 'silently' change the
    |> | multiplexing call to do something different which would make it
    |> | possible to use it with files, but that someone would have to.
    |> | Which is the single point I have been re-iterating for quite some time
    |> | now.
    |>
    |
    | [...]
    |
    |> Either show where the logic I described cannot work, or be silent
    |> regarding my logic giving the appearance that you do not
    |> see why it cannot work.
    |
    | The basic question is not whether something could conceivably be
    | implemented (everything can be implemented, subject to the laws of
    | physics) but whether implementing it would be sensible. IMO,
    | overloading the existing synchronous I/O-multiplexing primitives to
    | accomplish a loosely-related 'other task' is not sensible, especially
    | taking into account that this 'other task' would only be useful in
    | fringe cases on computer systems using particular hardware and
    | particular filesystem layouts.

    "sensible" is subjective.

    I believe that the logic I described is not overloading anything at all.
    Instead, I believe that what it does is restore the part that was removed
    by the arbitrary decision that non-blocking would not be allowed for file
    or disk I/O.

    I believe that the logic I described is completely sensible. If it had
    been included in the original design, and left in place to this day,
    would you be continually complaining about it?


  20. Re: aio_read/write versus O_NONBLOCK

    On May 31, 2:27 am, phil-news-nos...@ipal.net wrote:

    > Needing to know what part of the file the process is waiting for is merely
    > an idealistic notion. The logic I described shows how the kernel only needs
    > to keep track of two things, which it already does: 1: which parts of the
    > file are already in RAM ... 2: which part(s) of the file it is doing I/O to
    > read into RAM. My logic doesn't trigger readiness because a random block is
    > paged in (although the idealistic construct might want to do that). Instead,
    > it triggers readiness when a block ... whatever it is ... that was requested
    > by the process, has been read in. The readiness logic is deferring to the
    > caching logic to keep track of what is "in" (what can be instantly accessed).


    That won't work for at least two reasons. The most obvious is this:

    1) Long ago the process tried to read a particular block of a file. It
    no longer cares about this particular block nor even really knows what
    block it was. The logic associated with that block in the process has
    long since been cancelled.

    2) Some thread blocks for readiness. The implementation has only two
    choices: it can either signal readiness based on the lost block or not
    do so.

    A) It signals readiness. But this thread has no idea what block it's
    supposed to read. It finds the file readable, but no read it attempts
    works. It spins out of control until the lost block is read in. The
    implementation fails and degenerates to worse than blocking semantics
    in this case.

    B) It does not signal readiness. But how can it distinguish this from
    the case where this thread is picking up the forgotten operation from
    step 1?

    So you cannot be edge-triggered and you cannot be level-triggered.
    What else is there?

    DS
