aio_read/write versus O_NONBLOCK - Unix
-
Re: aio_read/write versus O_NONBLOCK
On May 27, 5:36 pm, phil-news-nos...@ipal.net wrote:
> | Which is what? Are you suggesting it should say that the file is not
> | ready for reading (since the data is not available) and that it never
> | should return ready no matter how long you wait, perhaps unless some
> | other process coincidentally happens to cause the first few bytes of
> | data to be resident in memory?
> Being "ready to read" does not mean the actual data is in RAM. It just means
> you can proceed to do the read call if your program has an interest in
> reading something (which presumably it does if you included that descriptor
> for reading in poll or select).
If a file is *always* ready to read, whether or not the actual data is
in RAM, then what's the point of passing the descriptor to 'poll' or
'select'? To make non-blocking file I/O useful, you have to have some
rule for when a file is ready to read and when it's not.
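For reference, current POSIX answers this one way: regular files always poll as ready for reading and writing, so poll() and select() say nothing about whether the data is in memory. A minimal C sketch, assuming a POSIX system with a writable /tmp; the helper name is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Poll a freshly created, empty regular file with a zero timeout and
   return the revents bits reported, or -1 on error. Regular files always
   poll as readable and writable, so neither the page cache nor the disk
   is consulted. */
static int poll_fresh_regular_file(void)
{
    char path[] = "/tmp/pollfile-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                      /* file goes away on close() */

    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT };
    int n = poll(&pfd, 1, 0);          /* 0 ms timeout: never blocks */
    close(fd);
    return n < 0 ? -1 : pfd.revents;
}
```

POLLIN is reported even though the file is empty; a read() would simply return 0 (end of file).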
> AIO is not necessary outside of the small thinking box you use.
Propose a coherent set of semantics. I believe it's impossible. If you
think it can be done, go ahead and do it. Tell me -- when should a
file be ready for read and when should a poll block?
> |> From then on, the kernel
> |> follows in lock-step; we've swapped the producer-consumer ordering--so that
> |> an open doesn't necessarily load a page on a whim--and all's well w/ the
> |> world.
> |
> | Exactly. What you really want is AIO semantics, and you could possibly
> | fake it in ugly and unsatisfying ways with non-blocking semantics.
>
> Faked? Just follow the logic and it can work.
What is the logic? If you open a file and then pass it to 'select' for
read, what are you waiting for?
DS
-
Re: aio_read/write versus O_NONBLOCK
On Tue, 27 May 2008 19:22:47 -0700 (PDT) David Schwartz wrote:
| On May 27, 5:36 pm, phil-news-nos...@ipal.net wrote:
|
|> | Which is what? Are you suggesting it should say that the file is not
|> | ready for reading (since the data is not available) and that it never
|> | should return ready no matter how long you wait, perhaps unless some
|> | other process coincidentally happens to cause the first few bytes of
|> | data to be resident in memory?
|
|> Being "ready to read" does not mean the actual data is in RAM. It just means
|> you can proceed to do the read call if your program has an interest in
|> reading something (which presumably it does if you included that descriptor
|> for reading in poll or select).
|
| If a file is *always* ready to read, whether or not the actual data is
| in RAM, then what's the point of passing the descriptor to 'poll' or
| 'select'? To make non-blocking file I/O useful, you have to have some
| rule for when a file is ready to read and when it's not.
For each descriptor, or for each process' reference to a descriptor, keep
a flag that indicates if a read request is active. Another flag would be
kept for a write request.
When doing a poll, for each file/disk descriptor, if there is no request
currently active (and the filesystem is not in an error status) then poll
will present the file as ready for read and/or write (depending on which
flags are set).
When doing a read or write, if the request can be completed immediately
(for read, the data is already in RAM ... for write, there is buffer space
eligible to cache the data if the O_SYNC flag is not in effect), then that
read or write proceeds to completion or partial completion in cases where
that is allowed and necessary (normally this has not been the case for
files, but it is also something I believe should be allowed).
When doing a read or write, if the request cannot be completed immediately
(for read, the data is not in RAM ... for write, there is insufficient
buffer space eligible for caching this data or the O_SYNC flag is in effect)
then the I/O is queued/scheduled/started as appropriate, the descriptor is
marked active for that operation, errno is set to EAGAIN, and the syscall
returns -1.
When doing a read or write, if the associated device or filesystem is in
a special non-permanent delay condition (such as network recovery for NFS)
then errno is set to EAGAIN, and the syscall returns -1.
When the requested I/O completes and the data is in the buffer space, the
active flag is cleared. Processes in poll with this descriptor selected
will have that descriptor marked as ready and unblocked to return from
poll.
If a process that has done a read and gotten EAGAIN for some data then
calls a seek function to select different data and calls read again, and
if that data is available, that read will just return immediately without
clearing the active flag. If that second data is not available, then it
will just return EAGAIN. If it now calls poll and blocks there and wakes
up when the first request is done, and does read at the position for the
2nd request, it will get EAGAIN, but this time that 2nd request is started.
A more advanced implementation will use counters instead of flags and allow
a finite number of queued requests per descriptor.
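A user-space model of the semantics proposed above may make them easier to check. This is a hypothetical sketch, not a kernel interface: an array of booleans stands in for the page cache, one flag per descriptor tracks an in-flight read, and sim_complete() plays the device finishing the request. Only the read side of the single-flag variant is modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_PAGES 16               /* hypothetical file size, in pages */

/* Hypothetical per-descriptor state for the proposed semantics. */
struct sim_fd {
    bool resident[SIM_PAGES];      /* which pages are "in RAM" */
    bool read_active;              /* a read request is in flight */
    size_t pending_page;           /* page that request is fetching */
};

/* poll(): readable whenever no read request is active. */
static bool sim_poll_readable(const struct sim_fd *f)
{
    return !f->read_active;
}

/* read() of one page: succeeds at once if the page is resident (the
   active flag is left alone); otherwise starts a request only if none
   is active, and fails with EAGAIN either way. */
static int sim_read(struct sim_fd *f, size_t page)
{
    if (page >= SIM_PAGES) {
        errno = EINVAL;
        return -1;
    }
    if (f->resident[page])
        return 0;
    if (!f->read_active) {
        f->read_active = true;     /* queue I/O for this page */
        f->pending_page = page;
    }
    errno = EAGAIN;
    return -1;
}

/* The device finishes the in-flight request: data lands, flag clears. */
static void sim_complete(struct sim_fd *f)
{
    if (f->read_active) {
        f->resident[f->pending_page] = true;
        f->read_active = false;
    }
}
```

Walking through the seek scenario above: read page 5 (EAGAIN, request started), read resident page 3 (succeeds, flag untouched), read page 9 (EAGAIN, nothing started), complete, then read page 9 again (EAGAIN, and only now does its request start).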
|> AIO is not necessary outside of the small thinking box you use.
|
| Propose a coherent set of semantics. I believe it's impossible. If you
| think it can be done, go ahead and do it. Tell me -- when should a
| file be ready for read and when should a poll block?
See above.
|> |> From then on, the kernel
|> |> follows in lock-step; we've swapped the producer-consumer ordering--so that
|> |> an open doesn't necessarily load a page on a whim--and all's well w/ the
|> |> world.
|> |
|> | Exactly. What you really want is AIO semantics, and you could possibly
|> | fake it in ugly and unsatisfying ways with non-blocking semantics.
|>
|> Faked? Just follow the logic and it can work.
|
| What is the logic? If you open a file and then pass it to 'select' for
| read, what are you waiting for?
See above.
--
|WARNING: Due to extreme spam, googlegroups.com is blocked. Due to ignorance |
| by the abuse department, bellsouth.net is blocked. If you post to |
| Usenet from these places, find another Usenet provider ASAP. |
| Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |
-
Re: aio_read/write versus O_NONBLOCK
phil-news-nospam@ipal.net writes:
> On Tue, 27 May 2008 12:25:10 +0200 Rainer Weikusat wrote:
> | phil-news-nospam@ipal.net writes:
> |> On Mon, 26 May 2008 13:33:35 +0200 Rainer Weikusat wrote:
> |> | phil-news-nospam@ipal.net writes:
> |> |> On Sun, 25 May 2008 19:18:57 +0200 Rainer Weikusat wrote:
> |> |> | phil-news-nospam@ipal.net writes:
> |> |> |> On Sun, 25 May 2008 09:49:07 +0200 Rainer Weikusat wrote:
> |> |> |> |> Why do you say it cannot work on a file?
> |> |> |> |
> |> |> |> | Without doing a read first, no data is ever going to be readable
> |> |> |> | from the file, because there is no active partner which could provide
> |> |> |> | it unilaterally.
> |> |> |>
> |> |> |> So why not read it first.
> |> |> |
> |> |> | ,----
> |> |> | | The next problem would be that the existing synchronous I/O
> |> |> | | multiplexing primitives are not designed for random access files but
> |> |> | | for (implicitly) time-ordered streams, ie it is completely ok to
> |> |> | | ^^^^^^^^^^^^^^^^^^^^^^
> |> |> | | create a socket and then call poll to wait until data to read is
> |> |> | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
> |> |> | | available. This cannot possibly work on a file.
> |> |> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> |> |> | `----
> |> |>
> |> |> So why not read it first.
> |> |> ^^^^^^^^^^^^^^^^^^^^^^^^^
> |> |
> |> | Which part of 'create, then poll without read not worky-worky on disk
> |> | file' is too complicated for you?
> |>
> |> Why do you insist on not doing it the way that would work?
> |
> | Because I intended to give an example where synchronous
> | I/O-multiplexing would need to work differently than it usually does
> | when files were to be supported.
>
> You're just giving an example of a way to do things that does not work and
> ignore how to make things work. This examples nothing.
This 'examples' that the usual semantics of the call are unsuitable
for file I/O. This does not mean that you, Mr Ahern, the tooth fairy
or any other anonymous terrorist cannot 'silently' change the
multiplexing call to do something different which would make it
possible to use it with files, but that someone would have to.
Which is the single point I have been re-iterating for quite some time
now.
-
Re: aio_read/write versus O_NONBLOCK
phil-news-nospam@ipal.net writes:
> Rainer Weikusat wrote:
> | William Ahern writes:
> |> David Schwartz wrote:
> |
> | [...]
> |
> |>> So what should waiting do in this case? Should it be required to start
> |>> a read ahead, changing the state of the socket so that there is an
> |>> asynchronous operation?
> |>
> |> The same thing it has always done, return readiness. When an actual I/O
> |> request blocks, the kernel then has a sufficient hint to queue a
> |> request;
> |
> | That's the more straight-forward 'obvious hack-around': Make the first
> | call return a fictional result with an unknown probability of being
> | wrong and assume that the application will then call read, harvest
> | EAGAIN and go back to waiting, which will actually take place if
> | necessary on the second call. This has the 'nice' side effect of
> | breaking each and every piece of code which naively assumed that the
> | return values of system calls were actually meant in earnest and still
> | requires the behaviour of the subroutines implementing the interface
> | to be modified because an operation the interface is not suitable for
> | shall (for some weird reason) be put under its umbrella, aka "all of
> | my fifteen children are called 'Hey you!!' and I just throw stones at the
> | one I actually meant to address".
>
> So your logic is that existing code ...
No.
-
Re: aio_read/write versus O_NONBLOCK
On May 27, 10:45 pm, phil-news-nos...@ipal.net wrote:
> For each descriptor, or for each process' reference to a descriptor, keep
> a flag that indicates if a read request is active. Another flag would be
> kept for a write request.
Congratulations, you've proven my point. Non-blocking semantics don't
work for files, you need AIO. Having requests that are active and
whose completion you wait for is AIO semantics. Getting hints that
it's the right time to start a request is non-blocking semantics.
As I've been saying this whole time, to do discovery for descriptors
that reference files, you need AIO semantics. Non-blocking semantics
work for sockets but not for files.
It's really this simple -- with non-blocking semantics, there is no
way for the implementation to know *what* read you are waiting for.
The semantics require there to be one clear notion of "ready for
read". This is possible for sockets (unreceived data is present or
there's an error), but as your own comments show, this is impossible
for files (you need a pending request).
DS
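For contrast, the AIO model described here, where the caller names a specific request and waits for that request to complete, looks like this with POSIX AIO. A minimal sketch, assuming a POSIX system; aio_pread_wait() is a hypothetical helper, and with glibc older than 2.34 the program must be linked with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <stdlib.h>   /* mkstemp(), handy when trying this out */
#include <string.h>
#include <unistd.h>

/* Read len bytes at offset from fd with POSIX AIO: submit one request,
   then wait for that particular request. Returns the byte count, or -1
   with errno set. */
static ssize_t aio_pread_wait(int fd, void *buf, size_t len, off_t offset)
{
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;        /* the offset travels with the request */

    if (aio_read(&cb) == -1)
        return -1;

    const struct aiocb *const list[1] = { &cb };
    int err;
    while ((err = aio_error(&cb)) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);  /* EINTR etc.: loop re-checks */

    ssize_t n = aio_return(&cb);   /* reap exactly once, success or not */
    if (err != 0) {
        if (err > 0)
            errno = err;
        return -1;
    }
    return n;
}
```

Because the aiocb names the offset and length, the completion being waited for is unambiguous, which is the information a bare readiness flag does not carry.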
-
Re: aio_read/write versus O_NONBLOCK
On Wed, 28 May 2008 07:22:23 -0700 (PDT) David Schwartz wrote:
| On May 27, 10:45 pm, phil-news-nos...@ipal.net wrote:
|
|> For each descriptor, or for each process' reference to a descriptor, keep
|> a flag that indicates if a read request is active. Another flag would be
|> kept for a write request.
|
| Congratulations, you've proven my point. Non-blocking semantics don't
| work for files, you need AIO. Having requests that are active and
| whose completion you wait for is AIO semantics. Getting hints that
| it's the right time to start a request is non-blocking semantics.
I've proven exactly the opposite, that it would work ... if the standard is
made to allow it to work.
| As I've been saying this whole time, to do discovery for descriptors
| that reference files, you need AIO semantics. Non-blocking semantics
| work for sockets but not for files.
Or use the semantics I described previously.
| It's really this simple -- with non-blocking semantics, there is no
| way for the implementation to know *what* read you are waiting for.
| The semantics require there to be one clear notion of "ready for
| read". This is possible for sockets (unreceived data is present or
| there's an error), but as your own comments show, this is impossible
| for files (you need a pending request).
It doesn't need to know what read you are waiting for. It only needs to know
if there is a currently active read (or a sufficient number of them in the case
of a more complex implementation that allows multiple requests to be scheduled)
that would prevent starting a new read.
-
Re: aio_read/write versus O_NONBLOCK
On Wed, 28 May 2008 08:41:11 +0200 Rainer Weikusat wrote:
| phil-news-nospam@ipal.net writes:
|> On Tue, 27 May 2008 12:25:10 +0200 Rainer Weikusat wrote:
|> | phil-news-nospam@ipal.net writes:
|> |> On Mon, 26 May 2008 13:33:35 +0200 Rainer Weikusat wrote:
|> |> | phil-news-nospam@ipal.net writes:
|> |> |> On Sun, 25 May 2008 19:18:57 +0200 Rainer Weikusat wrote:
|> |> |> | phil-news-nospam@ipal.net writes:
|> |> |> |> On Sun, 25 May 2008 09:49:07 +0200 Rainer Weikusat wrote:
|> |> |> |> |> Why do you say it cannot work on a file?
|> |> |> |> |
|> |> |> |> | Without doing a read first, no data is ever going to be readable
|> |> |> |> | from the file, because there is no active partner which could provide
|> |> |> |> | it unilaterally.
|> |> |> |>
|> |> |> |> So why not read it first.
|> |> |> |
|> |> |> | ,----
|> |> |> | | The next problem would be that the existing synchronous I/O
|> |> |> | | multiplexing primitives are not designed for random access files but
|> |> |> | | for (implicitly) time-ordered streams, ie it is completely ok to
|> |> |> | | ^^^^^^^^^^^^^^^^^^^^^^
|> |> |> | | create a socket and then call poll to wait until data to read is
|> |> |> | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
|> |> |> | | available. This cannot possibly work on a file.
|> |> |> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|> |> |> | `----
|> |> |>
|> |> |> So why not read it first.
|> |> |> ^^^^^^^^^^^^^^^^^^^^^^^^^
|> |> |
|> |> | Which part of 'create, then poll without read not worky-worky on disk
|> |> | file' is too complicated for you?
|> |>
|> |> Why do you insist on not doing it the way that would work?
|> |
|> | Because I intended to give an example where synchronous
|> | I/O-multiplexing would need to work differently than it usually does
|> | when files were to be supported.
|>
|> You're just giving an example of a way to do things that does not work and
|> ignore how to make things work. This examples nothing.
|
| This 'examples' that the usual semantics of the call are unsuitable
| for file I/O. This does not mean that you, Mr Ahern, the tooth fairy
| or any other anonymous terrorist cannot 'silently' change the
| multiplexing call to do something different which would make it
| possible to use it with files, but that someone would have to.
| Which is the single point I have been re-iterating for quite some time
| now.
No one is suggesting being silent about it. I've raised the point that the
standard could have been made to allow non-blocking I/O using the existing
calls without changing how they work for any blocking case or for any case
with a non-file/disk descriptor. I've also given the details in a reply
elsewhere in this thread. Either show where the logic I described cannot
work, or be silent regarding my logic giving the appearance that you do not
see why it cannot work.
-
Re: aio_read/write versus O_NONBLOCK
phil-news-nospam@ipal.net writes:
> On Wed, 28 May 2008 08:41:11 +0200 Rainer Weikusat wrote:
[...]
> | This 'examples' that the usual semantics of the call are unsuitable
> | for file I/O. This does not mean that you, Mr Ahern, the tooth fairy
> | or any other anonymous terrorist cannot 'silently' change the
> | multiplexing call to do something different which would make it
> | possible to use it with files, but that someone would have to.
> | Which is the single point I have been re-iterating for quite some time
> | now.
>
[...]
> Either show where the logic I described cannot work, or be silent
> regarding my logic giving the appearance that you do not
> see why it cannot work.
The basic question is not whether something could conceivably be
implemented (everything can be implemented, subject to the laws of
physics) but whether implementing it would be sensible. IMO,
overloading the existing synchronous I/O-multiplexing primitives to
accomplish a loosely-related 'other task' is not sensible, especially
taking into account that this 'other task' would only be useful in
fringe cases on computer systems using particular hardware and
particular filesystem layouts.
-
Re: aio_read/write versus O_NONBLOCK
On May 29, 12:03 am, phil-news-nos...@ipal.net wrote:
> | Congratulations, you've proven my point. Non-blocking semantics don't
> | work for files, you need AIO. Having requests that are active and
> | whose completion you wait for is AIO semantics. Getting hints that
> | it's the right time to start a request is non-blocking semantics.
> I've proven exactly the opposite, that it would work ... if the standard is
> made to allow it to work.
I'm obviously not going to convince you. Essentially, what you've done
is invented a badly broken version of AIO.
DS
-
Re: aio_read/write versus O_NONBLOCK
phil-news-nospam@ipal.net wrote:
> It doesn't need to know what read you are waiting for. It only needs to know
> if there is a currently active read (or a sufficient number of them in the case
> of a more complex implementation that allows multiple requests to be scheduled)
> that would prevent starting a new read.
It does need to know what read you are waiting for, actually, because a file
is in a sense--to draw a socket analogy--a buffer of infinite length. It
shouldn't trigger readiness merely because any random block has been paged
in. OTOH, it's _sensible_ to trigger when the most-likely-to-be-read page is
resident. (The Kantian argument that the kernel shouldn't lie to userland with
regard to the consumer-producer swap trick is just so outlandish....)
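One way to approximate the readiness test described here ("the most-likely-to-be-read page is resident") from user space is mincore(), a Linux/BSD extension rather than POSIX. A hypothetical sketch, assuming Linux with glibc; the answer is only a snapshot and may be stale by the time the caller acts on it.

```c
#define _DEFAULT_SOURCE               /* mincore() on glibc */
#include <stdlib.h>   /* mkstemp(), handy when trying this out */
#include <sys/mman.h>
#include <unistd.h>

/* Is the page holding `offset` (say, the next byte a sequential reader
   wants) currently in the page cache? Returns 1 if resident, 0 if not,
   -1 on error. */
static int next_page_resident(int fd, off_t offset)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0)
        return -1;
    off_t page_start = offset - (offset % pagesz);  /* mmap needs alignment */
    void *p = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_SHARED, fd, page_start);
    if (p == MAP_FAILED)
        return -1;
    unsigned char vec = 0;
    int rc = mincore(p, (size_t)pagesz, &vec);  /* no page is touched */
    munmap(p, (size_t)pagesz);
    if (rc == -1)
        return -1;
    return vec & 1;                   /* low bit: page is resident */
}
```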
The only actual difference is that you want to [attempt to] reap the request
and initiate a copyout to userland yourself, whereas AIO insists that this
be done before you're notified of readiness (or rather, completion).
In the interests of purity AIO renders billions of lines of existing,
repurposable code nearly useless. But semantically it's a thing of
beauty--largely because it ignores lots of real-life issues. (It's also
similar to Win32 IOCP, yet anecdotally Unix programmers--for better or
worse--have trouble integrating IOCP into their portable applications, even
though w/o IOCP the scalability of networking apps in Win32 is nil.)
Abstractly, nearly any non-blocking networking code can be supplanted by
AIO. In reality this is not the case. Sockets can do descriptor passing and
other types of ancillary message passing, like TCP OOB data. This already
muddies the waters, and these sorts of dilemmas have traditionally been
resolved in Unix by specifying a lesser common denominator interface.
Mr. Weikusat makes the point that different resources should, or could, be
treated differently, and that perhaps AIO shouldn't encompass sockets at all
(or at least not bother to account for the tangential problems). I take it
then that he enjoys Win32 programming, which IMO follows his sort of ethos,
which inevitably manifests in a hodgepodge of peculiar interfaces, many
mutually exclusive--as opposed to vertical.
At the end of the day, what matters most to me (and I would hope most other
engineers) is code reusability. One day, a thousand years from now, AIO
might be just the ticket, and replace--similar to pthread's
accomplishment--all the myriad buffering libraries. It's not yet, and I
don't understand why people get so uptight about stepping stones. The usage
pattern of an initiated read request followed by a seek is so inestimably
remote beyond a small circle of well understood applications, it boggles my
mind why it's made into an issue at all.
Yet, Mr. Weikusat and Schwartz are in good company, along with Torvalds and
many other reasonable folk.
Mr. Weikusat gives the "show me the code" challenge. Unfortunately, I try to
write portable applications. A singular kernel implementation is useless (we
are talking about a spec after all). I am writing a portable solution, using
the well known thread pool pattern, except I get to use sendfile(2) and
splice(2) optimizations in place of shuttling data through userland. I may
or may not be able to release that code, but it's straight-forward. I will
be releasing something called libkq*, which is a portable kqueue() library
which, I hope, will eventually portably handle AIO polling, and in
conjunction with a Win32 AIO/IOCP wrapper might allow me to make use of AIO
as a generic I/O interface in my applications. But I _first_ need the
stepping stone of pollable regular file descriptors, if only because I
cannot merely orphan all of my existing code and applications.
* kqueue() provides the semantics of lockless event insertion
across threads, which can be imitated using epoll() with no or minimally
scoped locking. libevent is too high level for such a task.
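The code described above is not shown in the thread. As a generic illustration of the thread-offload pattern it mentions, here is a minimal sketch, assuming POSIX threads: a helper thread does the blocking pread() and then writes a byte to a pipe, so the event loop waits in poll() on an ordinary pollable descriptor. All names are hypothetical; a real pool would reuse threads and one notification pipe, and the sendfile/splice optimizations are not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdlib.h>   /* mkstemp(), handy when trying this out */
#include <string.h>   /* memcmp(), likewise */
#include <unistd.h>

/* Hypothetical request record handed to a helper thread. */
struct file_req {
    int fd;           /* regular file to read from */
    void *buf;
    size_t len;
    off_t offset;
    ssize_t result;   /* set by the helper thread */
    int notify_wr;    /* write end of the notification pipe */
};

/* Helper thread: blocking pread(), then one byte into the pipe. */
static void *file_req_worker(void *arg)
{
    struct file_req *r = arg;
    r->result = pread(r->fd, r->buf, r->len, r->offset);
    char token = 1;
    (void)write(r->notify_wr, &token, 1);
    return NULL;
}

/* Event-loop side. Returns bytes read, or -1 (the errno of a failed
   pread() stays in the helper thread in this sketch). */
static ssize_t read_via_helper(int fd, void *buf, size_t len, off_t off)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    struct file_req r = { .fd = fd, .buf = buf, .len = len, .offset = off,
                          .result = -1, .notify_wr = p[1] };
    pthread_t t;
    int rc = pthread_create(&t, NULL, file_req_worker, &r);
    if (rc != 0) {
        close(p[0]);
        close(p[1]);
        errno = rc;
        return -1;
    }
    /* In a real loop this pollfd would sit beside the sockets' pollfds. */
    struct pollfd pfd = { .fd = p[0], .events = POLLIN };
    while (poll(&pfd, 1, -1) == -1 && errno == EINTR)
        ;                             /* retry if interrupted */
    char token;
    (void)read(p[0], &token, 1);      /* drain the notification */
    pthread_join(t, NULL);            /* makes r.result visible here */
    close(p[0]);
    close(p[1]);
    return r.result;
}
```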
-
Re: aio_read/write versus O_NONBLOCK
On May 29, 2:14 pm, William Ahern wrote:
> The usage
> pattern of an initiated read request followed by a seek is so inestimably
> remote beyond a small circle of well understood applications, it boggles my
> mind why it's made into an issue at all.
Huh? What?
One of the things that annoys me the most on Windows is that there is
no equivalent of 'pread' or 'pwrite' and I have to implement my own
locking around file accesses to prevent one thread from moving the
pointer and upsetting another thread's read or write operation.
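The property relied on here fits in a few lines: pread() carries its own offset and leaves the descriptor's shared file offset untouched. A minimal sketch, assuming POSIX; the two calls stand in for two logically independent readers and the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if both positioned reads got the right bytes and the shared
   file offset never moved, 0 if not, -1 on setup failure. */
static int pread_keeps_offset(void)
{
    char path[] = "/tmp/pread-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, "ABCDEFGHIJ", 10) != 10 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }
    char a[3], b[3];
    ssize_t na = pread(fd, a, 3, 7);      /* reader 1: offset 7 */
    ssize_t nb = pread(fd, b, 3, 2);      /* reader 2: offset 2 */
    off_t pos = lseek(fd, 0, SEEK_CUR);   /* shared offset: still 0 */
    close(fd);
    return na == 3 && nb == 3 && memcmp(a, "HIJ", 3) == 0
           && memcmp(b, "CDE", 3) == 0 && pos == 0;
}
```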
What you are really arguing is that the usage pattern of concurrent
logically independent reads and writes to the same file is
"inestimably remote". So apparently in your world, most applications
don't, for example, contain or use any kind of database.
I think the thing you're missing is that applications that use files
without seeking probably don't need AIO or non-blocking semantics
anyway. We're only interested in the universe of applications that are
going out of their way to avoid blocking, and these will likely have
concurrency in their file access too.
DS
-
Re: aio_read/write versus O_NONBLOCK
David Schwartz wrote:
> On May 29, 2:14 pm, William Ahern wrote:
>
> > The usage
> > pattern of an initiated read request followed by a seek is so inestimably
> > remote beyond a small circle of well understood applications, it boggles my
> > mind why it's made into an issue at all.
>
> Huh? What?
I meant a queued read request (or hint) which is then invalidated by a seek
on the same descriptor before completion of the original request. That is,
the implicit argument that non-blocking semantics would lead to too much
unnecessary I/O, or interpose too much code useless to too many
applications.
> One of the things that annoys me the most on Windows is that there is
> no equivalent of 'pread' or 'pwrite' and I have to implement my own
> locking around file accesses to prevent one thread from moving the
> pointer and upsetting another thread's read or write operation.
And on systems like Linux, when multiple threads are using pread/pwrite on
the same descriptor it's also useful to tell the I/O scheduler not to do
read-ahead, unless the fact of their use is a sufficient hint alone.
Alternatively, each thread could use a separate descriptor, if they intend
to seek and do long sequential operations. These are imprecise hints to the
kernel, for sure, but I don't see how such impreciseness detrimentally
reflects on any argument for non-blocking file I/O request polling. Files,
unlike sockets and pipes, are susceptible to myriad sets of useful semantics,
depending on usage. AIO neatly obscures these patterns; it doesn't solve the
problem of negotiating w/ the kernel.
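The kind of hint mentioned here can be given with posix_fadvise(): POSIX_FADV_RANDOM for a shared descriptor used with scattered pread() calls, POSIX_FADV_SEQUENTIAL for a reader that streams the file. A minimal sketch, assuming a system that provides posix_fadvise() (Linux does, for example); the helper name is hypothetical and the advice is only a hint the kernel may ignore. Note that the call returns an error number instead of setting errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>   /* mkstemp(), handy when trying this out */
#include <unistd.h>

/* Declare the expected access pattern for the whole file (offset 0,
   length 0 means "to end of file"). Returns 0 or an error number. */
static int hint_access_pattern(int fd, bool random_access)
{
    int advice = random_access ? POSIX_FADV_RANDOM    /* read-ahead likely wasted */
                               : POSIX_FADV_SEQUENTIAL; /* stream reader */
    return posix_fadvise(fd, 0, 0, advice);
}
```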
> What you are really arguing is that the usage pattern of concurrent
> logically independent reads and writes to the same file is
> "inestimably remote". So apparently in your world, most applications
> don't, for example, contain or use any kind of database.
You removed my qualifier specifically meant to encompass your example.
Databases are a definite, well-known use, and whether regular descriptors
support polling or not they still benefit from hinting to the kernel how the
resource will be used. It's all the unclassifiable such uses, in various
applications, which I'm arguing are actually not common at all, because
aside from the random access requirements of databases (for which shared
semantics can be well defined), most others fall into the general category
of open+optional seek+read, read, read.
> I think the thing you're missing is that applications that use files
> without seeking probably don't need AIO or non-blocking semantics
> anyway.
Except for the vast majority of server applications: HTTP, FTP, RTSP, CIFS,
etc, etc. A filesystem _is_ a database, and file names are hints to the
kernel about the access pattern to that database. If you're implementing
your own database on top of that, you should cooperate w/ the kernel to get
optimal performance--which you're also implicitly doing by opening a
particular file--in the same way I'm arguing that an application which
simply desires to read a file as a stream can hint to the kernel by
requesting non-blocking semantics, which in any event is already a pattern
accounted for in I/O schedulers.
All I'm arguing is that, though obviously the universe of software which
requires these abilities is relatively small, the subset which do concurrent
_random_ access are far less numerous than the rest, and I don't understand
why a solution which abstractly addresses both sets, yet which in practice
impedes existing development practices, should be heralded in a way which
excludes compatible and practical solutions.
> We're only interested in the universe of applications that are going out
> of their way to avoid blocking, and these will likely have concurrency in
> their file access too.
The most common sort of concurrency in file access is using multiple file
descriptors. Sharing a single descriptor is, I would argue, an optimization,
and an uncommon one, in terms of types of applications which make use of
it--databases being one example which probably accounts for most, and for
which reusable implementations abound. (That open(2) might be expensive is
beside the point; it registers a consumer with a unique and more predictable
access behavior.) That's not an argument against pread/pwrite, however,
because allowing optional non-blocking semantics on regular files doesn't
intrude on the semantics or even necessarily the implementations of
pread/pwrite.
-
Re: aio_read/write versus O_NONBLOCK
William Ahern wrote:
> David Schwartz wrote:
> I meant a queued read request (or hint) which is then invalidated by a seek
> on the same descriptor before completion of the original request. That is,
> the implicit argument that non-blocking semantics would lead to too much
> unnecessary I/O, or interpose too much code useless to too many
> applications.
Exactly. This is precisely what would happen in many typical
applications. Consider a multi-threaded web server that is serving up
many copies of a file that's too large to fit in memory. Every time it
gets one 'read' for one client started up, the next one will screw it
up.
> And on systems like Linux, when multiple threads are using pread/pwrite on
> the same descriptor it's also useful to tell the I/O scheduler not to do
> read-ahead, unless the fact of their use is a sufficient hint alone.
> Alternatively, each thread could use a separate descriptor, if they intend
> to seek and do long sequential operations. These are imprecise hints to the
> kernel, for sure, but I don't see how such impreciseness detrimentally
> reflects on any argument for non-blocking file I/O request polling. Files,
> unlike sockets and pipes, are susceptible to myriad sets of useful semantics,
> depending on usage. AIO neatly obscures these patterns; it doesn't solve the
> problem of negotiating w/ the kernel.
Nobody has yet been able to draft useful non-blocking semantics for
files. All they do is badly reinvent AIO semantics using the function
calls normally used for non-blocking semantics.
> > What you are really arguing is that the usage pattern of concurrent
> > logically independent reads and writes to the same file is
> > "inestimably remote". So apparently in your world, most applications
> > don't, for example, contain or use any kind of database.
> You removed my qualifier specifically meant to encompass your example.
> Databases are a definite, well-known use, and whether regular descriptors
> support polling or not they still benefit from hinting to the kernel how the
> resource will be used. It's all the unclassifiable such uses, in various
> applications, which I'm arguing are actually not common at all, because
> aside from the random access requirements of databases (for which shared
> semantics can be well defined), most others fall into the general category
> of open+optional seek+read, read, read.
You're missing the point that there's no reason to consider
applications that couldn't benefit from either AIO, non-blocking, or
any other type of asynchronous file I/O semantics. Once you're in the
universe of only programs that can benefit from this type of I/O, the
percentage that use files in sophisticated ways goes way up.
> > I think the thing your missing is that applications that use files
> > without seeking probably don't need AIO or non-blocking semantics
> > anyway.
> Except for the vast majority of server applications: HTTP, FTP, RTSP, CIFS,
> etc, etc. A filesystem _is_ a database, and file names are hints to the
> kernel about the access pattern to that database. If you're implementing
> your own database on top of that, you should cooperate w/ the kernel to get
> optimal performance--which you're also implicitly doing by opening a
> particular file--in the same way I'm arguing that an application which
> simply desires to read a file as a stream can hint to the kernel by
> requesting non-blocking semantics, which in any event is already a pattern
> accounted for in I/O schedulers.
I've written servers for almost all of these protocols. Having to deal
with the lack of 'pread'/'pwrite' on Windows was a PITA on every
single one of them. You can't open 8,000 copies of a file just because
8,000 clients want to transfer it and it doesn't fit in memory. I've
had to implement special locks on Windows to make the 'seek/read' and
'seek/write' atomic.
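The workaround described here, making seek+read atomic with a lock, can be sketched in POSIX terms with lseek(), read() and one mutex shared by every user of the descriptor. A hypothetical sketch (the struct and function names are illustrative); on systems that have pread() it is unnecessary.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>   /* mkstemp(), handy when trying this out */
#include <sys/types.h>
#include <unistd.h>

/* One descriptor shared by many threads, plus the lock all of them must
   take around seek+read. */
struct locked_file {
    int fd;
    pthread_mutex_t lock;
};

/* pread()-like call built from lseek()+read(). Without the lock another
   thread could move the shared offset between the two calls. */
static ssize_t locked_pread(struct locked_file *lf, void *buf,
                            size_t len, off_t offset)
{
    ssize_t n = -1;
    pthread_mutex_lock(&lf->lock);
    if (lseek(lf->fd, offset, SEEK_SET) != (off_t)-1)
        n = read(lf->fd, buf, len);
    pthread_mutex_unlock(&lf->lock);
    return n;
}

/* Demo reader: repeatedly reads one byte at its own offset and records
   whether it ever saw the wrong byte. */
struct reader_arg {
    struct locked_file *lf;
    off_t off;
    char expect;
    int ok;
};

static void *reader(void *p)
{
    struct reader_arg *a = p;
    a->ok = 1;
    for (int i = 0; i < 1000; i++) {
        char c;
        if (locked_pread(a->lf, &c, 1, a->off) != 1 || c != a->expect)
            a->ok = 0;
    }
    return NULL;
}
```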
> All I'm arguing is that, though obviously the universe of software which
> requires these abilities is relatively small, the subset which do concurrent
> _random_ access are far less numerous than the rest, and I don't understand
> why a solution which abstractly addresses both sets, yet which in practice
> impedes existing development practices, should be heralded in a way which
> excludes compatible and practical solutions.
I have yet to see any version of non-blocking semantics for files that
allows useful concurrency. Being limited to one operation in each
direction per file or having some horrible ugliness pop up when you do
that is not useful concurrency in my book.
> > We're only interested in the universe of applications that are going out
> > of their way to avoid blocking, and these will likely have concurrency in
> > their file access too.
> The most common sort of concurrency in file access is using multiple file
> descriptors. Sharing a single descriptor is, I would argue, an optimization,
> and an uncommon one, in terms of types of applications which make use of
> it--databases being one example which probably accounts for most, and for
> which reusable implementations abound. (That open(2) might be expensive is
> beside the point; it registers a consumer with a unique and more predictable
> access behavior.) That's not an argument against pread/pwrite, however,
> because allowing optional non-blocking semantics on regular files doesn't
> intrude on the semantics or even necessarily the implementations of
> pread/pwrite.
We're only talking about highly-optimized applications here. Non-
optimized applications would have no use whatsoever for asynchronous
or non-blocking semantics. Very, very few applications use this type
of semantics to this day and there's nothing they can't do. We're
talking about super-advanced optimizations only for applications that
want the very highest degree of concurrency.
To say that these applications don't share their file descriptors or
don't have complex access patterns is just crazy. Odds are these costs
would outweigh the benefits.
We are talking here about an optimization to use when other
optimizations don't do enough for you.
DS
-
Re: aio_read/write versus O_NONBLOCK
William Ahern writes:
> phil-news-nospam@ipal.net wrote:
>
>> It doesn't need to know what read you are waiting for. It only needs to know
>> if there is a currently active read (or sufficient number of them in the cases
>> of a more complex implementation that allow multiple requests to be scheduled)
>> that would prevent starting a new read.
>
> It does need to know what read you are waiting for, actually, because a file
> is in a sense--to draw a socket analogy--a buffer of infinite length. It
> shouldn't trigger readiness merely because any random block has been paged
> in. OTOH, it's _sensible_ to trigger when the most-likely-to-be-read page is
> resident.
So, it should both know what the application wants to read ('most-
likely-to-be-read'-page) and not know what the application wants to
read ('It does not need to know what read you are waiting for')?
> (The Kantian argument that the kernel shouldn't lie to userland in
> regards to the consumer-producer swap trick is just so
> outlandish....)
This remark was intended to refer to the recurring discussions
regarding whether 'multiplexing call returns "readable"' means 'the
next read will not block', and consequently, whether using
O_NONBLOCK is 'necessary'. Obviously, when the call returns a result
with a 'random' probability of being wrong to begin with, applications
which were coded under the assumption that not only will the result
not be wrong, but that it will even be valid after an unspecified
amount of time has passed, will 'randomly' block when they
shouldn't in situations where they would not have blocked with the
unmodified call. Because this keeps being rediscussed every now and
then, the conjecture that such applications exist (and even exist in
numbers) appears justified to me. This means that especially
're-purposing code' which was written to deal with other types of
things file descriptors can be associated with, will not necessarily
be a trivial task.
OTOH, I agree with the general statement that 'subroutines should
perform according to their documentation' instead of 'performing
"tricks" based on assumptions about application coding patterns'.
'Tricks' are only needed if there is a need to trick someone into
something in the first place and they tend to be 'tricky', i.e.
non-obvious, hard to understand for someone not already knowing about
them and 'surprising' --- all things I would rather not have in code.
The suggestion that this could be a question of 'morals' is indeed
outlandish.
> The only actual difference is that you want to [attempt to] reap the request
> and initiate a copyout to userland yourself, whereas AIO insists that this
> be done before you're notified of readiness (or rather, completion).
There is another difference: 'Streams' (like sequences
of TCP segments or UDP datagrams) are time-ordered: Each particular
unit of data is available at the same location during a particular time. A
file is 'space-ordered': Each particular unit of data is available at
a particular location during the same time. In particular, this means that
for a stream, there is always a 'next' event to wait for and all other
events will occur one-by-one after the next event. This is not true
for a set of I/O-operations on a random access file, which can all be
started at the same time and complete, possibly even in parallel, in
any order. For the 'copy a file' example, this would mean that the
second, fourth and sixth 'unit of data' could already be written to
'the output file' at their respective positions, while the first,
third and fifth are not yet in memory, using the basic pattern to start
n (some desired concurrency level) async reads at the same time, start
an async write whenever a read completes and another read whenever a
write completes.
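The copy pattern just described (n reads in flight, a write started as each read completes, a new read as each write completes) can be sketched with POSIX AIO. This is a minimal C++ example under stated assumptions: the system provides <aio.h> (older glibc needs -lrt at link time), and 'CopySlot', 'submit', 'aio_copy', 'make_scratch_file' and the chunk/depth parameters are hypothetical names chosen for illustration. Because every chunk is written at its own offset, the order in which reads and writes complete does not matter.

```cpp
#include <aio.h>
#include <algorithm>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// One in-flight slot: a control block and the buffer it reads into.
struct CopySlot {
    aiocb cb;
    std::vector<char> buf;
    bool active = false;   // a request is outstanding on this slot
    bool writing = false;  // that request is a write (otherwise a read)
};

// Queue a read or write of `n` bytes at `off` on slot `s`.
static bool submit(CopySlot& s, int fd, off_t off, size_t n, bool is_write) {
    std::memset(&s.cb, 0, sizeof s.cb);
    s.cb.aio_fildes = fd;
    s.cb.aio_offset = off;
    s.cb.aio_buf = s.buf.data();
    s.cb.aio_nbytes = n;
    s.cb.aio_sigevent.sigev_notify = SIGEV_NONE;  // completion found via aio_suspend
    s.writing = is_write;
    s.active = (is_write ? aio_write(&s.cb) : aio_read(&s.cb)) == 0;
    return s.active;
}

// Copy `size` bytes from in_fd to out_fd with up to `depth` requests in
// flight. Reads may complete in any order; each finished chunk is written
// at its own offset, and a finished write frees its slot for the next read.
bool aio_copy(int in_fd, int out_fd, off_t size, size_t chunk, size_t depth) {
    if (chunk == 0 || depth == 0) return size == 0;
    std::vector<CopySlot> slots(depth);
    off_t next = 0;  // first offset not yet claimed by a read
    bool ok = true;

    auto start_read = [&](CopySlot& s) {
        if (next >= size) return true;  // nothing left to claim
        size_t n = static_cast<size_t>(
            std::min<off_t>(static_cast<off_t>(chunk), size - next));
        off_t off = next;
        next += static_cast<off_t>(n);
        return submit(s, in_fd, off, n, false);
    };

    for (CopySlot& s : slots) {
        s.buf.resize(chunk);
        if (ok) ok = start_read(s);
    }

    for (;;) {
        std::vector<const aiocb*> pending;
        for (CopySlot& s : slots)
            if (s.active) pending.push_back(&s.cb);
        if (pending.empty()) break;  // every request has been reaped
        // Sleep until something completes; EINTR just means re-check.
        aio_suspend(pending.data(), static_cast<int>(pending.size()), nullptr);

        for (CopySlot& s : slots) {
            if (!s.active) continue;
            int err = aio_error(&s.cb);
            if (err == EINPROGRESS) continue;
            ssize_t n = aio_return(&s.cb);  // reap exactly once
            s.active = false;
            if (err != 0 || n != static_cast<ssize_t>(s.cb.aio_nbytes)) ok = false;
            if (!ok) continue;  // after a failure, only drain what is queued
            if (!s.writing)
                ok = submit(s, out_fd, s.cb.aio_offset, static_cast<size_t>(n), true);
            else
                ok = start_read(s);
        }
    }
    return ok;
}

// Demo helper: an already-unlinked scratch file containing `contents`.
int make_scratch_file(const std::string& contents) {
    char path[] = "/tmp/scratchXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    ssize_t n = write(fd, contents.data(), contents.size());
    if (n != static_cast<ssize_t>(contents.size())) { close(fd); return -1; }
    return fd;
}
```

After a failure the function stops issuing new requests but still waits for the queued ones, since their buffers must stay valid until the kernel is done with them.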
[...]
> Mr. Weikusat makes the point that different resources should, or
> could, be treated differently, and that perhaps AIO shouldn't
> encompass sockets at all (or at least not bother to account for the
> tangential problems).
I wasn't writing about AIO but about the existing synchronous
I/O-multiplexing interfaces. These have been designed for time-ordered
streams of data and cannot be used for 'space-ordered data sets'
without modification. This means 'treating different things
differently' is necessary in any case. The remaining question would be
whether 'modifying what is already there' or 'adding something new' is
more desirable. Have you ever heard of a maxim that one should rather
write a new program than add unrelated functionality to an existing
program?
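The existing interfaces already show this mismatch: POSIX specifies that regular files always poll (and select) as ready for reading and writing, whether or not the data is in memory, while an empty pipe does not. A small C++ sketch that checks this on a POSIX system; 'poll_now' and 'make_empty_scratch_file' are hypothetical helper names.

```cpp
#include <cstdlib>   // mkstemp
#include <poll.h>
#include <unistd.h>  // pipe, unlink, close

// What poll() reports right now (zero timeout) for `fd`, or -1 on error.
int poll_now(int fd) {
    pollfd p{};
    p.fd = fd;
    p.events = POLLIN | POLLOUT;
    if (poll(&p, 1, 0) < 0) return -1;
    return p.revents;
}

// Demo helper: an empty, already-unlinked scratch file.
int make_empty_scratch_file() {
    char path[] = "/tmp/scratchXXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0) unlink(path);
    return fd;
}
```

Even an empty regular file reports POLLIN, so for files the readiness bit carries no information about whether a read would wait for the disk.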
> I take it then that he enjoys Win32 programming, which IMO follows
> his sort of ethos, which inevitably manifests in a hodge podge of
> peculiar interfaces, many mutually exclusive--as opposed to
> vertical.
The last (and only) version of Windows I have been in closer contact
with than occasionally helping someone with something was 3.1. I don't
use it and I don't develop anything for it. I don't even care if code
is portable to Windows, because unless this is specifically requested,
I am not willing to deal with any Microsoft weirdnesses.
> At the end of the day, what matters most to me (and I would hope most other
> engineers) is code reusability.
[...]
The 'engineer' used to be the guy who operated the (steam) engine, did
you know that? An optimist would hope that 'an engineer' would care
for technically sensible solutions in preference to minimum-effort
solutions. But this could presumably be regarded as a German idee
fixe :->.
> Mr. Weikusat gives the "show me the code" challenge. Unfortunately, I try to
> write portable applications. A singular kernel implementation is useless (we
> are talking about a spec after all).
This was a misunderstanding: What I intended to say was 'there are
apparently two mutually incompatible opinions regarding this
particular topic, with one of them being "what we have is fine" and
the other "it isn't"'. Consequently, the only way to settle this
dispute would be by experiment: Someone implementing the other idea to
see how it will work out in practice ('opinion' means 'everybody could
be wrong') and if it would gain any traction based on its actual,
technical merits.
-
Re: aio_read/write versus O_NONBLOCK
On May 30, 9:57 am, Rainer Weikusat wrote:
> This remark was intended to refer to the recurring discussions
> regarding whether 'multiplexing call returns "readable"' means 'the
> next read will not block',
They don't.
> and consequently, whether using
> O_NONBLOCK is 'necessary'.
It is if you don't want to block.
> Obviously, when the call returns a result
> with a 'random' probability of being wrong to begin with, applications
> with a 'random' probabilty of being wrong to begin with, applications
> which were coded under the assumption that not only will the result
> not be wrong, but that it will even be valid after an unspecified
> amount of time has passed, will 'randomly' block when they
> shouldn't in situations where they would not have blocked with the
> unmodified call.
Of course applications coded with broken assumptions will break. This
is no different from an application that calls 'statfs' to check the
free space and then assumes a subsequent write won't run out of space
because the space was available in the past.
It is literally impossible for the kernel to guarantee the result of a
future operation and code that assumes such is broken beyond repair.
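The point that readiness is only a hint, and that O_NONBLOCK is what actually prevents blocking, fits in a few lines. A minimal C++ sketch assuming a POSIX system; 'set_nonblocking', 'wait_readable', 'read_hint' and the two result constants are hypothetical names.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

// Switch `fd` to non-blocking mode, keeping its other status flags.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    return flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Wait up to `timeout_ms` for poll() to report `fd` readable.
bool wait_readable(int fd, int timeout_ms) {
    pollfd p{};
    p.fd = fd;
    p.events = POLLIN;
    return poll(&p, 1, timeout_ms) > 0 && (p.revents & POLLIN);
}

// Result codes for read_hint (illustrative names).
const ssize_t kWouldBlock = -1;
const ssize_t kReadError = -2;

// Treat readiness as a hint, not a promise: attempt the read on a
// non-blocking descriptor and report "would block" instead of sleeping.
// Returns bytes read, 0 at end of file, kWouldBlock, or kReadError.
ssize_t read_hint(int fd, char* buf, size_t len) {
    ssize_t n = read(fd, buf, len);
    if (n >= 0) return n;
    if (errno == EAGAIN || errno == EWOULDBLOCK) return kWouldBlock;
    return kReadError;
}
```

If two consumers share a pipe and both are told it is readable, the first read can drain it. With O_NONBLOCK the second read returns EAGAIN and the caller goes back to poll(); without it, the second caller would block.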
> Because this keeps being rediscussed every now and
> then, the conjecture that such applications exist (and even exist in
> numbers) appears justified to me.
This is a particularly common error, as it happens.
> The means that especially
> 're-purposing code' which was written to deal with other types of
> things file descriptors can be associated with, will not necessarily
> be a trivial task.
I agree that this is true, but this argument doesn't support that
claim. If existing applications are broken, they should be fixed,
period.
> OTOH, I agree with the general statement that 'subroutines should
> perform according to their documentation' instead of 'performing
> "tricks" based on assumptions about application coding patterns'.
> 'Tricks' are only needed if there is a need to trick someone into
> something in the first place and they tend to be 'tricky', i.e.
> non-obvious, hard to understand for someone not already knowing about
> them and 'surprising' --- all things I would rather not have in code.
That's why you can't make assumptions that might not be valid in
tricky circumstances. Assuming that because a read wouldn't have
blocked at time T, a read will not block at time T+1 assumes that
nothing tricky will happen. Then when something tricky does happen,
you are screwed. That's why we try to provide interfaces that don't
require you to make assumptions.
> The last (and only) version of Windows I have been in closer contact
> with than occasionally helping someone with something was 3.1. I don't
> use it and I don't develop anything for it. I don't even care if code
> is portable to Windows, because unless this is specifically requested,
> I am not willing to deal with any Microsoft weirdnesses.
I think this is a mistake. As a result, you won't learn another way of
doing things, or in what ways it's better and in what ways it's worse.
Also, I have found that the more different platforms I test my code
on, the more platform-independent bugs I find. Compiling and running
code on as many fundamentally-different platforms as possible
generates better code.
To give a simple example, I had some C++ code kind of like this:
struct Bar;
class Foo
{
public:
    void Qux(bool);
    void Qux(int);
    void Qux(const char *);
    void Quux(const Bar *);
};
void Example(Foo *j, Bar *k)
{
    j->Qux(k); // Ack! Supposed to be Quux.
}
Yes, the compiler converted the pointer to a boolean and no warnings
or errors were generated. The bug was not caught until the code was
compiled on VC++ for Windows, and VC++ generated a warning.
The UNIX code got better because it was being compiled and tested on
Windows. This happens an awful lot.
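Compiling on another platform caught this one by luck. A technique not mentioned in the post (a sketch assuming C++11 or later) makes the mistake a compile error on any conforming compiler: declare a deleted template overload that matches any other pointer exactly, so overload resolution prefers it over the pointer-to-bool conversion. 'CanCallQux' and 'void_if_valid' are hypothetical detection helpers that only show which calls are accepted.

```cpp
#include <type_traits>
#include <utility>

struct Bar;

class Foo
{
public:
    void Qux(bool);
    void Qux(int);
    void Qux(const char *);
    void Quux(const Bar *);

    // Any other pointer argument matches this template exactly, so it wins
    // over the implicit pointer-to-bool conversion; being deleted, the call
    // then fails to compile instead of silently calling Qux(bool).
    template <class T>
    void Qux(T *) = delete;
};

// Detection helpers, only to show which calls are accepted.
template <class...>
using void_if_valid = void;

template <class Arg, class = void>
struct CanCallQux : std::false_type {};

template <class Arg>
struct CanCallQux<Arg, void_if_valid<decltype(std::declval<Foo &>().Qux(std::declval<Arg>()))>>
    : std::true_type {};
```

One trade-off: a plain 'char *' argument now also selects the deleted template (the identity match beats the qualification conversion to 'const char *'), so such callers would have to pass a 'const char *' explicitly.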
DS
-
Re: aio_read/write versus O_NONBLOCK
On Thu, 29 May 2008 08:18:18 -0700 (PDT) David Schwartz wrote:
| On May 29, 12:03 am, phil-news-nos...@ipal.net wrote:
|
|> | Congratulations, you've proven my point. Non-blocking semantics don't
|> | work for files, you need AIO. Having requests that are active and
|> | whose completion you wait for is AIO semantics. Getting hints that
|> | it's the right time to start a request is non-blocking semantics.
|
|> I've proven exactly the opposite, that it would work ... if the standard is
|> made to allow it to work.
|
| I'm obviously not going to convince you. Essentially, what you've done
| is invented a badly broken version of AIO.
And what is broken about it?
--
|WARNING: Due to extreme spam, googlegroups.com is blocked. Due to ignorance |
| by the abuse department, bellsouth.net is blocked. If you post to |
| Usenet from these places, find another Usenet provider ASAP. |
| Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |
-
Re: aio_read/write versus O_NONBLOCK
On May 31, 2:03 am, phil-news-nos...@ipal.net wrote:
> | I'm obviously not going to convince you. Essentially, what you've done
> | is invented a badly broken version of AIO.
> And what is broken about it?
You have got to be joking. On the very slight chance that you aren't
joking, I'll make a deal with you. You coherently explain the
semantics in one place and I'll rip them to shreds.
Start out by answering this question: what is the definition of "ready
for read" or "readable" for a file going to be? Consider the following
cases:
1) Open file, select for readability.
2) Open file, select for readability, another thread moves the file
pointer.
3) A program needs to read 20 bytes at one position and 30 bytes at
another position. How does it do this without blocking?
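For comparison, case 3 is exactly what AIO handles directly. A minimal C++ sketch, assuming POSIX AIO is available (older glibc needs -lrt): both reads are queued before either is waited for, and they may complete in either order. 'queue_read', 'read_20_and_30' and 'make_scratch_file' are hypothetical names.

```cpp
#include <aio.h>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Queue a read of `len` bytes at `off` into `buf`; does not wait.
static bool queue_read(aiocb& cb, int fd, char* buf, size_t len, off_t off) {
    std::memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;
    return aio_read(&cb) == 0;
}

// Case 3: 20 bytes at off_a and 30 bytes at off_b. Both reads are queued
// before either is waited for, and they may complete in either order.
bool read_20_and_30(int fd, off_t off_a, off_t off_b, std::string& a, std::string& b) {
    char buf_a[20], buf_b[30];
    aiocb cb_a, cb_b;
    bool queued_a = queue_read(cb_a, fd, buf_a, sizeof buf_a, off_a);
    bool queued_b = queue_read(cb_b, fd, buf_b, sizeof buf_b, off_b);

    // Wait only on requests still in progress; aio_suspend ignores nulls.
    const aiocb* waiting[2] = {queued_a ? &cb_a : nullptr, queued_b ? &cb_b : nullptr};
    for (;;) {
        bool any = false;
        for (const aiocb*& cb : waiting) {
            if (cb && aio_error(cb) != EINPROGRESS) cb = nullptr;  // finished
            if (cb) any = true;
        }
        if (!any) break;
        aio_suspend(waiting, 2, nullptr);  // EINTR just means re-check
    }

    ssize_t n_a = queued_a ? aio_return(&cb_a) : -1;
    ssize_t n_b = queued_b ? aio_return(&cb_b) : -1;
    if (n_a < 0 || n_b < 0) return false;
    a.assign(buf_a, static_cast<size_t>(n_a));
    b.assign(buf_b, static_cast<size_t>(n_b));
    return true;
}

// Demo helper: an already-unlinked scratch file containing `contents`.
int make_scratch_file(const std::string& contents) {
    char path[] = "/tmp/scratchXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    ssize_t n = write(fd, contents.data(), contents.size());
    if (n != static_cast<ssize_t>(contents.size())) { close(fd); return -1; }
    return fd;
}
```

Completed requests are replaced by null entries in the wait list, so the loop sleeps on whichever read is still outstanding rather than spinning on one that already finished.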
DS
-
Re: aio_read/write versus O_NONBLOCK
On Thu, 29 May 2008 14:14:42 -0700 William Ahern wrote:
| phil-news-nospam@ipal.net wrote:
|
|> It doesn't need to know what read you are waiting for. It only needs to know
|> if there is a currently active read (or sufficient number of them in the cases
|> of a more complex implementation that allow multiple requests to be scheduled)
|> that would prevent starting a new read.
|
| It does need to know what read you are waiting for, actually, because a file
| is in a sense--to draw a socket analogy--a buffer of infinite length. It
| shouldn't trigger readiness merely because any random block has been paged
| in. OTOH, it's _sensible_ to trigger when the most-likely-to-be-read page is
| resident. (The Kantian argument that the kernel shouldn't lie to userland in
| regards to the consumer-producer swap trick is just so outlandish....)
Needing to know what part of the file the process is waiting for is merely
an idealistic notion. The logic I described shows how the kernel only needs
to keep track of two things, which it already does: 1: which parts of the
file are already in RAM ... 2: which part(s) of the file it is doing I/O to
read into RAM. My logic doesn't trigger readiness because a random block is
paged in (although the idealistic construct might want to do that). Instead,
it triggers readiness when a block ... whatever it is ... that was requested
by the process, has been read in. The readiness logic is deferring to the
caching logic to keep track of what is "in" (what can be instantly accessed).
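The rule described here can be written down as a toy model, which makes it easier to discuss. This is a hypothetical C++ sketch, not any kernel's code: the 'kernel' tracks (1) cached blocks and (2) blocks it is fetching for this descriptor, a non-blocking read of an uncached block starts I/O and fails (standing in for EAGAIN), and the descriptor becomes readable once a block it requested is resident. 'ProposedFileReadiness' and its members are invented names.

```cpp
#include <set>

// Hypothetical model of the proposed rule; not taken from any kernel.
class ProposedFileReadiness {
public:
    // Non-blocking read of `block`: succeeds if the block is cached;
    // otherwise the "kernel" starts fetching it and the call fails,
    // standing in for EAGAIN.
    bool try_read(long block) {
        if (cached_.count(block)) {
            arrived_.erase(block);  // the caller consumed this arrival
            return true;
        }
        in_flight_.insert(block);
        return false;
    }

    // The disk finished fetching `block` into the page cache.
    void io_complete(long block) {
        if (in_flight_.erase(block)) {
            cached_.insert(block);
            arrived_.insert(block);  // a block this descriptor asked for
        }
    }

    // What poll()/select() would report under the proposal: readable once
    // any block requested through this descriptor has become resident.
    bool readable() const { return !arrived_.empty(); }

private:
    std::set<long> cached_;     // (1) blocks already in RAM
    std::set<long> in_flight_;  // (2) blocks being read into RAM
    std::set<long> arrived_;    // requested blocks that are now resident
};
```

In this model, readable() reflects whichever requested block arrives, not necessarily the one the caller will read next: if block 7 was requested and then abandoned in favour of block 4, completion of block 7 makes the descriptor readable while a read of block 4 still fails.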
I have no idea what you mean about this "Kantian argument".
| The only actual difference is that you want to [attempt to] reap the request
| and initiate a copyout to userland yourself, whereas AIO insists that this
| be done before you're notified of readiness (or rather, completion).
I don't know what role you are referring to in the "yourself" reference so I
am unable to determine the meaning of this sentence.
| In the interests of purity AIO renders billions of lines of existing,
| repurposeable code nearly useless. But semantically it's a thing of
| beauty--largely because it ignores lots of real-life issues. (It's also
| similar to Win32 IOCP, yet anecdotally Unix programmers--for better or
| worse--have trouble integrating IOCP into their portable applications, even
| though w/o IOCP the scalability of networking apps in Win32 is nil.)
It very well may be that AIO itself is more a thing of beauty than the
legacy I/O call interface. It might well have made things work a lot
better and cleaner had AIO been used for everything in place of the I/O
calls most commonly used now. Had the original Unix designers used it
and only it, I'm sure we would not be having this discussion at all.
What *I* consider to be a thing of ugliness is mixing two different ways
to do things. I see the conventional I/O interface as more
a thing of beauty if it is allowed to be complete (as in, allowed to have
its non-blocking semantics for everything).
I'm looking not at AIO or CIO (classic I/O) for the beauty, but rather, at
the whole of the Unix/POSIX interface. The beauty in it for me would be
to have _one_ kind of interface that can do it all. Either AIO or CIO has
the potential to do that. My logic explains how CIO can. Currently, POSIX
allows neither method to realize its full ability.
| Abstractly, nearly any non-blocking networking code can be supplanted by
| AIO. In reality this is not the case. Sockets can do descriptor passing and
| other types of ancillary message passing, like TCP OOB data. This already
| muddies the waters, and these sorts of dilemmas have traditionally been
| resolved in Unix by specifying a lesser common denominator interface.
|
| Mr. Weikusat makes the point that different resources should, or could, be
| treated differently, and that perhaps AIO shouldn't encompass sockets at all
| (or at least not bother to account for the tangential problems). I take it
| then that he enjoys Win32 programming, which IMO follows his sort of ethos,
| which inevitably manifests in a hodge podge of peculiar interfaces, many
| mutually exclusive--as opposed to vertical.
The big problem with AIO and CIO is that a program cannot use both very
easily. The reason is AIO has its own process suspension method (that
being aio_suspend) different from the CIO method (poll or select). What
would happen if you schedule I/O with AIO and then call poll()?
| At the end of the day, what matters most to me (and I would hope most other
| engineers) is code reusability. One day, a thousand years from now, AIO
| might be just the ticket, and replace--similar to pthread's
| accomplishment--all the myriad buffering libraries. It's not yet, and I
| don't understand why people get so uptight about stepping stones. The usage
| pattern of an initiated read request followed by a seek is so inestimably
| remote beyond a small circle of well understood applications, it boggles my
| mind why it's made into an issue at all.
Maybe it can. But what is still a crucial need is to have a single common
means for a process to wait on _all_ descriptors (and other resources) at
one time. If it is possible to have one set of calls for all kinds of I/O
then that is great. If not, so be it. But the waiting/suspension part does
need to be in common.
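One workaround for the aio_suspend/poll split, not proposed in the thread, is to have each AIO completion write a byte to a pipe and put the pipe's read end in the poll() set. A minimal C++ sketch, assuming SIGEV_THREAD notification is supported (it is part of POSIX AIO, but support and link flags such as -lrt vary); 'wake_poller', 'queue_read_with_wakeup', 'read_via_poll' and 'make_scratch_file' are hypothetical names.

```cpp
#include <aio.h>
#include <cerrno>
#include <csignal>
#include <cstdlib>
#include <cstring>
#include <poll.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Completion callback (SIGEV_THREAD): runs on a thread supplied by the AIO
// implementation and wakes the poll() loop by writing a byte to a pipe.
static void wake_poller(sigval sv) {
    char byte = 1;
    ssize_t r = write(sv.sival_int, &byte, 1);
    (void)r;  // a full pipe still leaves the poller readable
}

// Queue a read whose completion makes `wake_fd`'s pipe readable, so the
// caller can wait for it in the same poll() set as sockets and pipes.
bool queue_read_with_wakeup(aiocb& cb, int fd, char* buf, size_t len,
                            off_t off, int wake_fd) {
    std::memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;
    cb.aio_sigevent.sigev_notify = SIGEV_THREAD;
    cb.aio_sigevent.sigev_notify_function = wake_poller;
    cb.aio_sigevent.sigev_value.sival_int = wake_fd;
    return aio_read(&cb) == 0;
}

// Demo: one file read, waited for with poll() rather than aio_suspend().
// Returns the bytes read, or an empty string on failure.
std::string read_via_poll(int fd, size_t len, off_t off) {
    int wake[2];
    if (pipe(wake) != 0) return std::string();
    std::string buf(len, '\0');
    std::string result;
    aiocb cb;
    if (queue_read_with_wakeup(cb, fd, &buf[0], len, off, wake[1])) {
        // A real event loop would add its sockets to this poll set too.
        pollfd p{};
        p.fd = wake[0];
        p.events = POLLIN;
        char byte;
        // Wait for the wake-up byte (retrying after EINTR).
        for (;;) {
            if (poll(&p, 1, -1) > 0 && read(wake[0], &byte, 1) == 1) break;
        }
        ssize_t n = aio_return(&cb);
        if (n >= 0) result.assign(buf, 0, static_cast<size_t>(n));
    }
    close(wake[0]);
    close(wake[1]);
    return result;
}

// Demo helper: an already-unlinked scratch file containing `contents`.
int make_scratch_file(const std::string& contents) {
    char path[] = "/tmp/scratchXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    ssize_t n = write(fd, contents.data(), contents.size());
    if (n != static_cast<ssize_t>(contents.size())) { close(fd); return -1; }
    return fd;
}
```

The caller consumes the wake-up byte before closing the pipe, so the notification callback has already finished its write() by the time the descriptor goes away.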
-
Re: aio_read/write versus O_NONBLOCK
On Thu, 29 May 2008 13:31:06 +0200 Rainer Weikusat wrote:
| phil-news-nospam@ipal.net writes:
|> On Wed, 28 May 2008 08:41:11 +0200 Rainer Weikusat wrote:
|
| [...]
|
|> | This exemplifies that the usual semantics of the call are unsuitable
|> | for file I/O. This does not mean that you, Mr Ahern, the tooth fairy
|> | or any other anonymous terrorist cannot 'silently' change the
|> | multiplexing call to do something different which would make it
|> | possible to use it with files, but that someone would have to.
|> | Which is the single point I have been re-iterating for quite some time
|> | now.
|>
|
| [...]
|
|> Either show where the logic I described cannot work, or be silent
|> regarding my logic giving the appearance that you do not
|> see why it cannot work.
|
| The basic question is not whether something could conceivably be
| implemented (everything can be implemented, subject to the laws of
| physics) but whether implementing it would be sensible. IMO,
| overloading the existing synchronous I/O-multiplexing primitives to
| accomplish a loosely-related 'other task' is not sensible, especially
| taking into account that this 'other task' would only be useful in
| fringe cases on computer systems using particular hardware and
| particular filesystem layouts.
"sensible" is subjective.
I believe that the logic I described is not overloading anything at all.
Instead, I believe that what it does is restore the part that was removed
by the arbitrary decision that non-blocking would not be allowed for file
or disk I/O.
I believe that the logic I described is completely sensible. If it had
been included in the original design, and left in place to this day,
would you be continually complaining about it?
-
Re: aio_read/write versus O_NONBLOCK
On May 31, 2:27 am, phil-news-nos...@ipal.net wrote:
> Needing to know what part of the file the process is waiting for is merely
> an idealistic notion. The logic I described shows how the kernel only needs
> to keep track of two things, which it already does: 1: which parts of the
> file are already in RAM ... 2: which part(s) of the file it is doing I/O to
> read into RAM. My logic doesn't trigger readiness because a random block is
> paged in (although the idealistic construct might want to do that). Instead,
> it triggers readiness when a block ... whatever it is ... that was requested
> by the process, has been read in. The readiness logic is deferring to the
> caching logic to keep track of what is "in" (what can be instantly accessed).
That won't work for at least two reasons. The most obvious is this:
1) Long ago the process tried to read a particular block of a file. It
no longer cares about this particular block nor even really knows what
block it was. The logic associated with that block in the process has
long since been cancelled.
2) Some thread blocks for readiness. The implementation has only two
choices: it can either signal readiness based on the lost block or not
do so.
A) It signals readiness. But this thread has no idea what block it's
supposed to read. It finds the file readable, but no read it attempts
works. It spins out of control until the lost block is read in. The
implementation fails and degenerates to worse than blocking semantics
in this case.
B) It does not signal readiness. But how can it distinguish this from
the case where this thread is picking up the forgotten operation from
step 1?
So you cannot be edge-triggered and you cannot be level-triggered.
What else is there?
DS