select()/write() semantics - Linux

  1. select()/write() semantics


    From select(2):

    A file descriptor is considered ready if it is
    possible to perform the corresponding I/O operation
    (e.g., read(2)) without blocking.

    From pipe(7):

    The precise semantics [ of write ] depend on whether
    the file descriptor is non-blocking (O_NONBLOCK),
    whether there are multiple writers to the pipe,
    and on n, the number of bytes to be written:

    O_NONBLOCK disabled, n <= PIPE_BUF
    All n bytes are written atomically; write(2) may block if there is
    not room for n bytes to be written immediately


    I've noticed that attempts to write n > PIPE_BUF
    on a file descriptor which select has reported as
    being ready will often block. I've never noticed an
    attempt to write <= PIPE_BUF to block, but from the
    documentation, I haven't found a definitive statement
    that such a write will not block. It seems to be
    implied -- since writes of less than PIPE_BUF must be
    atomic, then if select reports that a file descriptor
    is ready for writing, a write of n <= PIPE_BUF bytes
    is guaranteed not to block. On the other hand, a
    pathological case may be that a pipe is only able to
    receive 1 byte, in which case select reports that it
    is ready, but a write of 2 bytes will block. In other
    words, even using select(), you can never be certain
    that a write of more than 1 byte will not block.

    Can anyone find a definitive confirmation in the
    documentation of the following statement:

    If select(2) reports that a file descriptor (that
    does not refer to a socket ) is ready for writing,
    then a write of n <= PIPE_BUF bytes will not block.


  2. Re: select()/write() semantics

    William Pursell writes:
    > From select(2):
    >
    > A file descriptor is considered ready if it is
    > possible to perform the corresponding I/O operation
    > (e.g., read(2)) without blocking.
    >
    > From pipe(7):
    >
    > The precise semantics [ of write ] depend on whether
    > the file descriptor is non-blocking (O_NONBLOCK),
    > whether there are multiple writers to the pipe,
    > and on n, the number of bytes to be written:
    > O_NONBLOCK disabled, n <= PIPE_BUF All n bytes are
    > written atomically; write(2) may block if there is
    > not room for n bytes to be written immediately
    >
    > I've noticed that attempts to write n > PIPE_BUF
    > on a file descriptor which select has reported as
    > being ready will often block.


    That is to be expected. The producer process will likely copy data
    into the pipe buffer until that is full. The consumer is blocked in
    the kernel, waiting for data to arrive in the pipe buffer. After the
    buffer has been filled completely, the writer will go to sleep and
    the read will be awoken.

    > I've never noticed an attempt to write <= PIPE_BUF to block, but
    > from the documentation, I haven't found a definitive statement
    > that such a write will not block. It seems to be
    > implied -- since writes of less than PIPE_BUF must be
    > atomic, then if select reports that a file descriptor
    > is ready for writing, a write of n <= PIPE_BUF bytes
    > is guaranteed not to block.


    Assuming there is only a single process/thread writing to the pipe,
    yes. Otherwise, 'someone else' could be faster, causing another write
    to block until there is room in the buffer again.

    [...]

    > Can anyone find a definitive confirmation in the
    > documentation of the following statement:
    >
    > If select(2) reports that a file descriptor (that
    > does not refer to a socket ) is ready for writing,
    > then a write of n <= PIPE_BUF bytes will not block.


    I can find documentation for the opposite:

    Write requests to a pipe or FIFO shall be handled in the same
    way as a regular file with the following exceptions:

    [...]

    If the O_NONBLOCK flag is clear, a write request may cause the
    thread to block,
    [SUS]

  3. Re: select()/write() semantics

    On 2007-06-14, William Pursell wrote:
    >
    > From select(2):
    >
    > A file descriptor is considered ready if it is
    > possible to perform the corresponding I/O operation
    > (e.g., read(2)) without blocking.
    >
    > From pipe(7):
    >
    > The precise semantics [ of write ] depend on whether
    > the file descriptor is non-blocking (O_NONBLOCK),
    > whether there are multiple writers to the pipe,
    > and on n, the number of bytes to be written:
    > O_NONBLOCK disabled, n <= PIPE_BUF All n bytes are
    > written atomically; write(2) may block if there is
    > not room for n bytes to be written immediately





    > I've noticed that attempts to write n > PIPE_BUF
    > on a file descriptor which select has reported as
    > being ready will often block. I've never noticed an
    > attempt to write <= PIPE_BUF to block, but from the
    > documentation, I haven't found a definitive statement
    > that such a write will not block.


    My copy of pipe(7) offers:

    O_NONBLOCK disabled, n <= PIPE_BUF
    All n bytes are written atomically; write(2) may block if there
    is not room for n bytes to be written immediately

    O_NONBLOCK enabled, n <= PIPE_BUF
    If there is room to write n bytes to the pipe, then write(2)
    succeeds immediately, writing all n bytes; otherwise write(2)
    fails, with errno set to EAGAIN.


    > It seems to be
    > implied -- since writes of less than PIPE_BUF must be
    > atomic, then if select reports that a file descriptor
    > is ready for writing, a write of n <= PIPE_BUF bytes
    > is guaranteed not to block.


    With O_NONBLOCK set, writes never block; sometimes they fail with
    EAGAIN, however.

    > If select(2) reports that a file descriptor (that
    > does not refer to a socket ) is ready for writing,
    > then a write of n <= PIPE_BUF bytes will not block.



    It seems that select only reports room when there is room for
    PIPE_BUF bytes.



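    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/select.h>
    #include <unistd.h>

    char buffer[65536] = "";

    int fds[2];

    int main(void)
    {
        int writeme;
        ssize_t res;
        int saved_errno;

        if (pipe(fds) == -1) {
            perror("pipe");
            return 1;
        }
        writeme = fds[1];   /* fds[0], the read end, is never drained */

        /* Non-blocking: a write that does not fit fails instead of sleeping. */
        fcntl(writeme, F_SETFL, O_NONBLOCK);

        /* Fill most of the pipe in one go. */
        printf("write of 64000 res=%zd\n", write(writeme, buffer, 64000));

        {
            fd_set rd, wr, er;
            struct timeval timeout = { 1, 0 };

            FD_ZERO(&rd);
            FD_ZERO(&wr);
            FD_ZERO(&er);
            FD_SET(writeme, &wr);

            printf("Select : %d\n", select(writeme + 1, &rd, &wr, &er, &timeout));
        }

        printf("write of 1000 res=%zd\n", write(writeme, buffer, 1000));

        /* Save errno right away; printf() is allowed to change it. */
        res = write(writeme, buffer, 1000);
        saved_errno = errno;
        printf("write of 1000 res=%zd\n", res);
        printf("Errno= %d\n", saved_errno);
        errno = saved_errno;
        perror("result");
        return 0;
    }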
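
The select() call in the test program can be factored into a small probe that asks, without waiting, whether a descriptor is reported writable at this instant. A hedged sketch; the name fd_writable_now is hypothetical, and the answer is only a snapshot, not a promise about a later write():

```c
#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if select() reports fd as writable right now, 0 if it does
 * not, and -1 on error. The zero timeout makes select() return at once. */
int fd_writable_now(int fd)
{
    fd_set wr;
    struct timeval zero = { 0, 0 };

    FD_ZERO(&wr);
    FD_SET(fd, &wr);
    return select(fd + 1, NULL, &wr, NULL, &zero);
}
```

Calling it before and after each write in the program above shows at which fill level the kernel stops reporting the pipe as writable.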





    --

    Bye.
    Jasen

  4. Re: select()/write() semantics

    On Jun 13, 10:36 pm, William Pursell wrote:

    > If select(2) reports that a file descriptor (that
    > does not refer to a socket ) is ready for writing,
    > then a write of n <= PIPE_BUF bytes will not block.


    Such a statement would be as false and misleading as saying that "if
    'access' reports that a file can be accessed, a subsequent 'open' will
    not fail" or that "if 'stat' reports that a file does not exist, a
    subsequent 'creat' will not return 'EEXIST'".

    Status functions, like 'select', never ever predict the future. The
    only way to assure that a 'write' doesn't block is to set the
    descriptor non-blocking.
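
Setting the descriptor non-blocking can be sketched as follows. The helper names set_nonblocking and fill_until_refused are made up for this example; the second one only demonstrates that a full pipe answers with EAGAIN instead of putting the caller to sleep:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Adds O_NONBLOCK to the descriptor's status flags. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Writes 512-byte chunks to a non-blocking descriptor until the kernel
 * refuses more. Returns the number of bytes accepted; errno is left as
 * set by the write() that failed. */
long fill_until_refused(int fd)
{
    char chunk[512] = { 0 };
    long total = 0;
    ssize_t n;

    while ((n = write(fd, chunk, sizeof chunk)) > 0)
        total += n;
    return total;
}
```

With the flag set, the worst case after a misleading readiness report is an EAGAIN to handle, not a stalled process.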

    DS


  5. Re: select()/write() semantics

    David Schwartz writes:
    > On Jun 13, 10:36 pm, William Pursell wrote:
    >
    >> If select(2) reports that a file descriptor (that
    >> does not refer to a socket ) is ready for writing,
    >> then a write of n <= PIPE_BUF bytes will not block.


    Contrary to this text, the original statement was

    Can anyone find a definitive confirmation in the
    documentation of the following statement:

    If select(2) reports that a file descriptor (that
    does not refer to a socket ) is ready for writing,
    then a write of n <= PIPE_BUF bytes will not block.

    This means it was a question of whether the statement was true, not an
    assertion of its truth.

    > Such a statement would be as false and misleading as saying that "if
    > 'access' reports that a file can be accessed, a subsequent 'open' will
    > not fail" or that "if 'stat' reports that a file does not exist, a
    > subsequent 'creat' will not return 'EEXIST'".


    These are two bad examples. Both 'access' and 'stat' query the
    filesystem, which can be modified by any other process at any point in
    time. The "I/O readiness state" of a particular file descriptor can
    not generally be changed by independent applications after the
    descriptor has entered a particular "I/O readiness state". For the
    pipe example given, if the process calling select is the only writer,
    the write will succeed without blocking. The same would trivially be
    true for a TCP socket descriptor, for instance. If there is room in
    the socket write buffer, this room will remain available until
    consumed.

  6. Re: select()/write() semantics

    On Jun 17, 12:03 pm, Rainer Weikusat wrote:

    > David Schwartz writes:


    > Contrary to this text, the original statement was
    >
    > Can anyone find a definitive confirmation in the
    > documentation of the following statement:
    >
    > If select(2) reports that a file descriptor (that
    > does not refer to a socket ) is ready for writing,
    > then a write of n <= PIPE_BUF bytes will not block.


    > This means it was a question of whether the statement was true, not an
    > assertion of its truth.


    Right, I'm saying it's not true.

    > > Such a statement would be as false and misleading as saying that "if
    > > 'access' reports that a file can be accessed, a subsequent 'open' will
    > > not fail" or that "if 'stat' reports that a file does not exist, a
    > > subsequent 'creat' will not return 'EEXIST'".


    > These are two bad examples. Both 'access' and 'stat' query the
    > filesystem, which can be modified by any other process at any point in
    > time.


    Exactly. Meanwhile 'select' queries the I/O subsystem, typically about
    network connections. At a minimum, these connections have a remote
    end, which can change the status of those connections at any time.

    > The "I/O readiness state" of a particular file descriptor can
    > not generally be changed by independent applications after the
    > descriptor has entered a particular "I/O readiness state".


    The question is whether you had a guarantee or not. That something
    "generally cannot change" means precisely the same as saying it is
    *NOT* guaranteed not to change.

    > For the
    > pipe example given, if the process calling select is the only writer,
    > the write will succeed without blocking.


    How do you know this?

    > The same would trivially be
    > true for a TCP socket descriptor, for instance. If there is room in
    > the socket write buffer, this room will remain available until
    > consumed.


    How do you know this?

    You are simply taking specific knowledge about specific
    implementations handling specific cases and trying to get a guarantee
    out of that. This simply does not work and has caused real-life code
    to break in the past.

    What you're saying is that you can't think of any way the readiness
    state could change. But at least twice before people have said this,
    and ways the readiness state could change that they didn't think of
    bit people on the ass. (The Linux inetd denial of service attack and
    the various accept deadlocks.)

    "I can't think of a way it can change" is not the same as "it cannot
    change because some standard says so".

    DS


  7. Re: select()/write() semantics

    David Schwartz writes:
    > On Jun 17, 12:03 pm, Rainer Weikusat wrote:
    >> David Schwartz writes:


    [...]

    >> > Such a statement would be as false and misleading as saying that "if
    >> > 'access' reports that a file can be accessed, a subsequent 'open' will
    >> > not fail" or that "if 'stat' reports that a file does not exist, a
    >> > subsequent 'creat' will not return 'EEXIST'".

    >
    >> These are two bad examples. Both 'access' and 'stat' query the
    >> filesystem, which can be modified by any other process at any point in
    >> time.

    >
    > Exactly. Meanwhile 'select' queries the I/O subsystem,


    There is no such thing as 'an I/O subsystem'. 'I/O readiness' is a
    per-descriptor property which is implemented by a struct file method.

    > typically about network connections.


    That something happens 'typically' is different from 'something
    happens always'. Specifically, FIFOs are not network connections, and
    select behaviour on FIFOs was discussed.

    > At a minimum, these connections have a remote end, which can change
    > the status of those connections at any time.


    For connected sockets, the remote end can either send data or
    terminate the connection. It cannot modify anything about already
    received data.

    >> The "I/O readiness state" of a particular file descriptor can
    >> not generally be changed by independent applications after the
    >> descriptor has entered a particular "I/O readiness state".

    >
    > The question is whether you had a guarantee or not. That something
    > "generally cannot change" means precisely the same as saying it is
    > *NOT* guaranteed not to change.


    More precisely, it means that the status may or may not change,
    depending on the specific circumstances of the situation.

    >> For the pipe example given, if the process calling select is the only writer,
    >> the write will succeed without blocking.

    >
    > How do you know this?


    Because there is room in the pipe write buffer and nobody except the one
    process having access to this buffer can change anything about that.

    Below is the kernel implementation for polling on FIFOs:

    static unsigned int
    pipe_poll(struct file *filp, poll_table *wait)
    {
        unsigned int mask;
        struct inode *inode = filp->f_path.dentry->d_inode;
        struct pipe_inode_info *pipe = inode->i_pipe;
        int nrbufs;

        poll_wait(filp, &pipe->wait, wait);

        /* Reading only -- no need for acquiring the semaphore. */
        nrbufs = pipe->nrbufs;
        mask = 0;
        if (filp->f_mode & FMODE_READ) {
            mask = (nrbufs > 0) ? POLLIN | POLLRDNORM : 0;
            if (!pipe->writers && filp->f_version != pipe->w_counter)
                mask |= POLLHUP;
        }

        if (filp->f_mode & FMODE_WRITE) {
            mask |= (nrbufs < PIPE_BUFFERS) ? POLLOUT | POLLWRNORM : 0;
            /*
             * Most Unices do not set POLLERR for FIFOs but on Linux they
             * behave exactly like pipes for poll().
             */
            if (!pipe->readers)
                mask |= POLLERR;
        }

        return mask;
    }
    [../linux/fs/pipe.c]
    (for a general description, try )

    The routine returns 'ready for reading' if pipe->nrbufs is larger
    than zero and 'ready for writing' if pipe->nrbufs is smaller than
    PIPE_BUFFERS. The pipe->nrbufs value is only modified in pipe_read
    (decrements it if a buffer was consumed) and pipe_write (increments it
    if a buffer was added). pipe_write blocks only if pipe->nrbufs becomes
    larger than PIPE_BUFFERS before it has written all the requested data
    to a set of pipe buffers. Since each pipe buffer has a size of
    PIPE_BUF, at least one can be added if pipe_poll returns 'writable'
    and nobody adds pipe buffers except pipe_write, pipe_write will not
    block when writing <= PIPE_BUF octets after poll has returned writable
    provided only a single process can write to the pipe.

    BTW, that I have looked this up in the kernel source was basically for
    sport. The 'room in the buffer' condition of a pipe does not change
    except if data is added to the buffer.

    >> The same would trivially be true for a TCP socket descriptor, for
    >> instance. If there is room in the socket write buffer, this room
    >> will remain available until consumed.

    >
    > How do you know this?


    Strictly speaking, not at all. But that would be the answer to the
    question 'How do you know that Schroedinger's cat will be dead after
    ten years in its box', and the topic of discussion would then be a
    philosophical one. I am willing to assume 'cause and effect' and the
    possibility of deductions that correctly predict future effects as
    given.

    > You are simply taking specific knowledge about specific
    > implementations handling specific cases and trying to get a guarantee
    > out of that.


    I have mentioned two specific situations. A specific device driver
    could, for instance, implement write as 'block the process, delete the
    harddisk contents and make the soundcard cry curses in arabic
    languages'.

    [...]

    > "I can't think of a way it can change" is not the same as "it cannot
    > change because some standard says so".


    That 'some standard says' X should behave such-and-such tells me
    nothing about the actual behaviour of code claimed to implement X.

  8. Re: select()/write() semantics

    David Schwartz writes:
    > On Jun 17, 12:03 pm, Rainer Weikusat wrote:
    >> David Schwartz writes:


    [...]

    >> > Such a statement would be as false and misleading as saying that "if
    >> > 'access' reports that a file can be accessed, a subsequent 'open' will
    >> > not fail" or that "if 'stat' reports that a file does not exist, a
    >> > subsequent 'creat' will not return 'EEXIST'".

    >
    >> These are two bad examples. Both 'access' and 'stat' query the
    >> filesystem, which can be modified by any other process at any point in
    >> time.

    >
    > Exactly. Meanwhile 'select' queries the I/O subsystem,


    There is no such thing as 'an I/O subsystem'. 'I/O readiness' is a
    per-descriptor property which is implemented by a struct file method.

    > typically about network connections.


    That something happens 'typically' is different from 'something
    happens always'. Specifically, FIFOs are not network connections, and
    select behaviour on FIFOs was discussed.

    > At a minimum, these connections have a remote end, which can change
    > the status of those connections at any time.


    For connected sockets, the remote end can either send data or
    terminate the connection. It cannot modify anything about already
    received data.

    >> The "I/O readiness state" of a particular file descriptor can
    >> not generally be changed by independent applications after the
    >> descriptor has entered a particular "I/O readiness state".

    >
    > The question is whether you had a guarantee or not. That something
    > "generally cannot change" means precisely the same as saying it is
    > *NOT* guaranteed not to change.


    More precisely, it means that the status may or may not change,
    depending on the specific circumstances of the situation.

    >> For the pipe example given, if the process calling select is the only writer,
    >> the write will succeed without blocking.

    >
    > How do you know this?


    Because there is room in the pipe write buffer and nobody except the one
    process having access to this buffer can change anything about that.

    Below is the kernel implementation for polling on FIFOs:

    static unsigned int
    pipe_poll(struct file *filp, poll_table *wait)
    {
        unsigned int mask;
        struct inode *inode = filp->f_path.dentry->d_inode;
        struct pipe_inode_info *pipe = inode->i_pipe;
        int nrbufs;

        poll_wait(filp, &pipe->wait, wait);

        /* Reading only -- no need for acquiring the semaphore. */
        nrbufs = pipe->nrbufs;
        mask = 0;
        if (filp->f_mode & FMODE_READ) {
            mask = (nrbufs > 0) ? POLLIN | POLLRDNORM : 0;
            if (!pipe->writers && filp->f_version != pipe->w_counter)
                mask |= POLLHUP;
        }

        if (filp->f_mode & FMODE_WRITE) {
            mask |= (nrbufs < PIPE_BUFFERS) ? POLLOUT | POLLWRNORM : 0;
            /*
             * Most Unices do not set POLLERR for FIFOs but on Linux they
             * behave exactly like pipes for poll().
             */
            if (!pipe->readers)
                mask |= POLLERR;
        }

        return mask;
    }
    [../linux/fs/pipe.c]
    (for a general description, try )

    The routine returns 'ready for reading' if pipe->nrbufs is larger
    than zero and 'ready for writing' if pipe->nrbufs is smaller than
    PIPE_BUFFERS. The pipe->nrbufs value is only modified in pipe_read
    (decrements it if a buffer was consumed) and pipe_write (increments it
    if a buffer was added). pipe_write blocks only if pipe->nrbufs becomes
    equal to PIPE_BUFFERS before it has written all the requested data
    to a set of pipe buffers. Since each pipe buffer has a size of
    PIPE_BUF, at least one can be added if pipe_poll returns 'writable'
    and nobody adds pipe buffers except pipe_write, pipe_write will not
    block when writing <= PIPE_BUF octets after poll has returned writable
    provided only a single process can write to the pipe.
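
The slot-counting argument above can be restated as a toy user-space model. This is only a sketch of the reasoning, not kernel code: the MODEL_ constants (16 slots of 4096 bytes) and both function names are assumptions made for illustration, and merging data into a partly filled slot is deliberately ignored.

```c
#include <stddef.h>

#define MODEL_PIPE_BUFFERS 16   /* assumed number of buffer slots */
#define MODEL_PIPE_BUF 4096     /* assumed size of one slot in bytes */

struct pipe_model {
    int nrbufs;                 /* slots currently occupied */
};

/* Mirrors the POLLOUT test in pipe_poll: writable while a slot is free. */
int model_writable(const struct pipe_model *p)
{
    return p->nrbufs < MODEL_PIPE_BUFFERS;
}

/* A write of 1..MODEL_PIPE_BUF bytes needs at most one free slot.
 * Returns 1 if it completes without waiting, 0 if it would have to wait
 * or falls outside the case being argued. */
int model_write(struct pipe_model *p, size_t n)
{
    if (n == 0 || n > MODEL_PIPE_BUF)
        return 0;
    if (p->nrbufs == MODEL_PIPE_BUFFERS)
        return 0;               /* no free slot: the real write would sleep */
    p->nrbufs++;
    return 1;
}
```

In this model, whenever model_writable() is true a following model_write() of at most MODEL_PIPE_BUF bytes succeeds, provided nothing else changed nrbufs in between. That proviso is exactly the single-writer assumption.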

    BTW, that I have looked this up in the kernel source was basically for
    sport. The 'room in the buffer' condition of a pipe does not change
    except if data is added to the buffer.

    >> The same would trivially be true for a TCP socket descriptor, for
    >> instance. If there is room in the socket write buffer, this room
    >> will remain available until consumed.

    >
    > How do you know this?


    Strictly speaking, not at all. But that would be the answer to the
    question 'How do you know that Schroedinger's cat will be dead after
    ten years in its box', and the topic of discussion would then be a
    philosophical one. I am willing to assume 'cause and effect' and the
    possibility of deductions that correctly predict future effects as
    given.

    > You are simply taking specific knowledge about specific
    > implementations handling specific cases and trying to get a guarantee
    > out of that.


    I have mentioned two specific situations. A specific device driver
    could, for instance, implement write as 'block the process, delete the
    harddisk contents and make the soundcard cry curses in arabic
    languages'.

    [...]

    > "I can't think of a way it can change" is not the same as "it cannot
    > change because some standard says so".


    That 'some standard says' X should behave such-and-such tells me
    nothing about the actual behaviour of code claimed to implement X.

  9. Re: select()/write() semantics

    On Jun 19, 5:04 am, Rainer Weikusat wrote:

    > > Exactly. Meanwhile 'select' queries the I/O subsystem,


    > There is no such thing as 'an I/O subsystem'. 'I/O readiness' is a
    > per-descriptor property which is implemented by a struct file method.


    You are confusing attributes of some particular piece of code with
    attributes of the functions that code implements.

    > > typically about network connections.


    > That something happens 'typically' is different from 'something
    > happens always'. Specifically, FIFOs are not network connections, and
    > select behaviour on FIFOs was discussed.


    Exactly. So nothing about network connections turns into a *guarantee*
    about what select will do.

    > > At a minimum, these connections have a remote end, which can change
    > > the status of those connections at any time.


    > For connected sockets, the remote end can either send data or
    > terminate the connection. It cannot modify anything about already
    > received data.


    How do you know that? Where is that written? What document says that
    there cannot exist a network protocol or non-standard function to
    modify data that has already been received by the operating system but
    not passed to the application?

    You are taking things that you know about specific implementations of
    specific operations and trying to get out of that a guarantee about a
    generic function. That is simply impossible.

    > >> The "I/O readiness state" of a particular file descriptor can
    > >> not generally be changed by independent applications after the
    > >> descriptor has entered a particular "I/O readiness state".


    > > The question is whether you had a guarantee or not. That something
    > > "generally cannot change" means precisely the same as saying it is
    > > *NOT* guaranteed not to change.


    > More precisely, it means that the status may or may not change,
    > depending on the specific circumstances of the situation.


    Exactly.

    > >> For the pipe example given, if the process calling select is the only writer,
    > >> the write will succeed without blocking.

    >
    > > How do you know this?


    > Because there is room in the pipe write buffer and nobody except the one
    > process having access to this buffer can change anything about that.


    How do you know that? Where is it written that there cannot ever exist
    a function to allow another process to modify the data in a pipe
    buffer?

    You are claiming that something is guaranteed based on there not
    existing, to your knowledge, a way to break that guarantee. That is
    just not a valid methodology at all. You have guarantees when the
    standards in fact provide them, not when you can't think of any way
    the guarantee might be violated.

    > Below is the kernel implementation for polling on FIFOs:
    >
    > static unsigned int
    > pipe_poll(struct file *filp, poll_table *wait)
    > {
    > unsigned int mask;
    > struct inode *inode = filp->f_path.dentry->d_inode;
    > struct pipe_inode_info *pipe = inode->i_pipe;
    > int nrbufs;
    >
    > poll_wait(filp, &pipe->wait, wait);
    >
    > /* Reading only -- no need for acquiring the semaphore. */
    > nrbufs = pipe->nrbufs;
    > mask = 0;
    > if (filp->f_mode & FMODE_READ) {
    > mask = (nrbufs > 0) ? POLLIN | POLLRDNORM : 0;
    > if (!pipe->writers && filp->f_version != pipe->w_counter)
    > mask |= POLLHUP;
    > }
    >
    > if (filp->f_mode & FMODE_WRITE) {
    > mask |= (nrbufs < PIPE_BUFFERS) ? POLLOUT | POLLWRNORM : 0;
    > /*
    > * Most Unices do not set POLLERR for FIFOs but on Linux they
    > * behave exactly like pipes for poll().
    > */
    > if (!pipe->readers)
    > mask |= POLLERR;
    > }
    >
    > return mask;
    > }
    >
    > [../linux/fs/pipe.c]
    > (for a general description, try )


    You mean the implementation in one specific version of one particular
    operating system.

    > The routine returns 'ready for reading' if pipe->nrbufs is larger
    > than zero and 'ready for writing' if pipe->nrbufs is smaller than
    > PIPE_BUFFERS. The pipe->nrbufs value is only modified in pipe_read
    > (decrements it if a buffer was consumed) and pipe_write (increments it
    > if a buffer was added). pipe_write blocks only if pipe->nrbufs becomes
    > equal to PIPE_BUFFERS before it has written all the requested data
    > to a set of pipe buffers. Since each pipe buffer has a size of
    > PIPE_BUF, at least one can be added if pipe_poll returns 'writable'
    > and nobody adds pipe buffers except pipe_write, pipe_write will not
    > block when writing <= PIPE_BUF octets after poll has returned writable
    > provided only a single process can write to the pipe.
    >
    > BTW, that I have looked this up in the kernel source was basically for
    > sport. The 'room in the buffer' condition of a pipe does not change
    > except if data is added to the buffer.


    It can't change from external memory pressure? Where is that written?

    > >> The same would trivially be true for a TCP socket descriptor, for
    > >> instance. If there is room in the socket write buffer, this room
    > >> will remain available until consumed.


    > > How do you know this?


    > Strictly speaking, not at all. But that would be the answer to the
    > question 'How do you know that Schroedinger's cat will be dead after
    > ten years in its box', and the topic of discussion would then be a
    > philosophical one. I am willing to assume 'cause and effect' and the
    > possibility of deductions that correctly predict future effects as
    > given.


    That is precisely what you cannot do. It would even be incorrect to
    say, for example, "if 'access' says you can write to a file, a
    subsequent open for writing will not fail so long as the permissions
    are not modified". Why is that wrong? Because there could be many
    other ways the subsequent 'open' could fail, and even if you can't
    think of any, that doesn't mean they don't exist.
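
The usual way around the access()/open() gap is to skip the check and let the operation itself report the outcome. A minimal sketch; open_for_append is a hypothetical helper name:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Opens path for appending with no access() call beforehand: the result
 * of open() is the only answer that counts. Returns the descriptor, or
 * -1 with errno describing the failure. */
int open_for_append(const char *path)
{
    int fd = open(path, O_WRONLY | O_APPEND);

    if (fd == -1) {
        int saved_errno = errno;    /* fprintf() may overwrite errno */

        fprintf(stderr, "open %s: %s\n", path, strerror(saved_errno));
        errno = saved_errno;
    }
    return fd;
}
```

The same pattern applies to write() after select(): treat the readiness report as a hint and be prepared for the call to fail or return early.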

    > > "I can't think of a way it can change" is not the same as "it cannot
    > > change because some standard says so".


    > That 'some standard says' X should behave such-and-such tells me
    > nothing about the actual behaviour of code claimed to implement X.


    Certainly. I never claimed any different.

    At least two times in the past, people have listened to nonsense
    exactly like the nonsense you are spouting and real-world code has
    *BROKEN* because of it.

    DS


  10. Re: select()/write() semantics

    David Schwartz writes:
    > On Jun 19, 5:04 am, Rainer Weikusat wrote:
    >
    >> > Exactly. Meanwhile 'select' queries the I/O subsystem,

    >
    >> There is no such thing as 'an I/O subsystem'. 'I/O readiness' is a
    >> per-descriptor property which is implemented by a struct file method.

    >
    > You are confusing attributes of some particular piece of code with
    > attributes of the functions that code implements.


    No. I am writing about Linux 2.6 and Linux 2.6 does not (currently)
    have a general abstraction named 'I/O subsystem'.

    >> > typically about network connections.

    >
    >> That something happens 'typically' is different from 'something
    >> happens always'. Specifically, FIFOs are not network connections, and
    >> select behaviour on FIFOs was discussed.

    >
    > Exactly. So nothing about network connections turns into a *guarantee*
    > about what select will do.


    Since the original question was about FIFOs, behaviour of (hypothetical)
    network connection is not part of the answer.

    >> > At a minimum, these connections have a remote end, which can change
    >> > the status of those connections at any time.

    >
    >> For connected sockets, the remote end can either send data or
    >> terminate the connection. It cannot modify anything about already
    >> received data.

    >
    > How do you know that? Where is that written?


    It is written in the kernel source and the relevant protocol
    specifications.

    [...]

    >> Below is the kernel implementation for polling on FIFOs:
    >>
    >> static unsigned int
    >> pipe_poll(struct file *filp, poll_table *wait)
    >> {
    >>     unsigned int mask;
    >>     struct inode *inode = filp->f_path.dentry->d_inode;
    >>     struct pipe_inode_info *pipe = inode->i_pipe;
    >>     int nrbufs;
    >>
    >>     poll_wait(filp, &pipe->wait, wait);
    >>
    >>     /* Reading only -- no need for acquiring the semaphore. */
    >>     nrbufs = pipe->nrbufs;
    >>     mask = 0;
    >>     if (filp->f_mode & FMODE_READ) {
    >>         mask = (nrbufs > 0) ? POLLIN | POLLRDNORM : 0;
    >>         if (!pipe->writers && filp->f_version != pipe->w_counter)
    >>             mask |= POLLHUP;
    >>     }
    >>
    >>     if (filp->f_mode & FMODE_WRITE) {
    >>         mask |= (nrbufs < PIPE_BUFFERS) ? POLLOUT | POLLWRNORM : 0;
    >>         /*
    >>          * Most Unices do not set POLLERR for FIFOs but on Linux they
    >>          * behave exactly like pipes for poll().
    >>          */
    >>         if (!pipe->readers)
    >>             mask |= POLLERR;
    >>     }
    >>
    >>     return mask;
    >> }
    >>
    >> [../linux/fs/pipe.c]
    >> (for a general description, try )

    >
    > You mean the implementation in one specific version of one particular
    > operating system.


    Exactly. I am writing about FIFOs as implemented in Linux in a
    newsgroup whose topic is (supposedly) development of applications for
    Linux.
    >
    >> The routine returns 'ready for reading' if pipe->nrbufs is larger
    >> than zero and 'ready for writing' if pipe->nrbufs is smaller than
    >> PIPE_BUFFERS. The pipe->nrbufs value is only modified in pipe_read
    >> (decrements it if a buffer was consumed) and pipe_write (increments it
    >> if a buffer was added). pipe_write blocks only if pipe->nrbufs becomes
    >> equal to PIPE_BUFFERS before it has written all the requested data
    >> to a set of pipe buffers. Since each pipe buffer has a size of
    >> PIPE_BUF, at least one can be added if pipe_poll returns 'writable'
    >> and nobody adds pipe buffers except pipe_write, pipe_write will not
    >> block when writing <= PIPE_BUF octets after poll has returned writable
    >> provided only a single process can write to the pipe.
    >>
    >> BTW, that I have looked this up in the kernel source was basically for
    >> sport. The 'room in the buffer' condition of a pipe does not change
    >> except if data is added to the buffer.

    >
    > It can't change from external memory pressure? Where is that
    > written?


    In the code.


    if (!page) {
        page = alloc_page(GFP_HIGHUSER);
        if (unlikely(!page)) {
            ret = ret ? : -ENOMEM;
            break;
        }
        pipe->tmp_page = page;
    }
    [pipe_write/ ../linux/fs/pipe.c]

    The 'room in the buffer' condition actually means 'pipe_write may
    allocate more memory for this pipe' and if allocating more memory
    isn't possible, the routine returns failure. It may well sleep
    waiting for disk I/O if pages need to be reclaimed, but that isn't
    what is usually meant by 'blocking', namely 'wait for an indefinite
    amount of time until an unrelated application has done something'.
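
    The behaviour described above can also be checked from user space.
    What follows is only a minimal sketch, assuming a POSIX system: it
    fills a fresh pipe with PIPE_BUF-sized writes for as long as select()
    reports the write end as writable, and counts how often write()
    disagrees. The names pipe_probe, writable_now and probe_pipe are made
    up for this illustration, and the write end is set non-blocking purely
    as a safety net so that a disagreement shows up as EAGAIN rather than
    as a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Outcome of the experiment (hypothetical type, for illustration only). */
struct pipe_probe {
    long ok_writes;   /* select() said writable and write() took all bytes */
    long refused;     /* select() said writable but write() gave EAGAIN */
};

/* Ask select() whether fd is writable right now (zero timeout).
 * Returns 1 if writable, 0 if not, -1 on error. */
static int writable_now(int fd)
{
    fd_set wfds;
    struct timeval tv = { 0, 0 };

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    if (select(fd + 1, NULL, &wfds, NULL, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &wfds) ? 1 : 0;
}

/* Fill a fresh pipe with PIPE_BUF-sized writes while select() reports
 * the write end as writable. Returns 0 on success, -1 on an unexpected
 * error. */
static int probe_pipe(struct pipe_probe *out)
{
    int fds[2], fl, rc = 0;
    char buf[PIPE_BUF];

    memset(out, 0, sizeof *out);
    memset(buf, 'x', sizeof buf);
    if (pipe(fds) < 0)
        return -1;
    fl = fcntl(fds[1], F_GETFL);
    if (fl < 0 || fcntl(fds[1], F_SETFL, fl | O_NONBLOCK) < 0)
        rc = -1;

    while (rc == 0) {
        int w = writable_now(fds[1]);
        if (w < 0) {
            rc = -1;
            break;
        }
        if (w == 0)
            break;                      /* select() says the pipe is full */
        ssize_t n = write(fds[1], buf, sizeof buf);
        if (n == (ssize_t)sizeof buf) {
            out->ok_writes++;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            out->refused++;             /* readiness did not hold */
            break;
        } else {
            rc = -1;                    /* anything else is unexpected here */
        }
    }
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

    With the pipe_poll/pipe_write code quoted above, one would expect
    'refused' to stay at zero. That is an observation about one
    implementation with a single writer, not something the sketch can
    prove for other kernels.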

    >> >> The same would trivially be true for a TCP socket descriptor, for
    >> >> instance. If there is room in the socket write buffer, this room
    >> >> will remain available until consumed.

    >
    >> > How do you know this?

    >
    >> Strictly speaking, not at all. But that would be the answer to the
    >> question 'How do you know that Schroedinger's cat will be dead after
    >> ten years in its box', and the topic of discussion would then be a
    >> philosophical one. I am willing to assume 'cause and effect' and the
    >> possibility of deductions that correctly predict future effects as
    >> given.

    >
    > That is precisely what you cannot do.


    Ergo: It is impossible to develop software, because its behaviour
    cannot be predicted.

    Since I am able to type this particular sentence, this conclusion is
    wrong.

    > It would even be incorrect to say, for example, "if 'access' says
    > you can write to a file, a subsequent open for writing will not fail
    > so long as the permissions are not modified". Why is that wrong?


    Because somebody could unlink the file.

    > Because there could be many other ways the subsequent 'open' could
    > fail, and even if you can't think of any, that doesn't mean they
    > don't exist.


    The problem here is that the original claim is too general. If a
    process having the necessary permissions tries to open a file which
    exists for writing, this open will neither fail because the file does
    not exist nor because the process hasn't the necessary permissions to
    open it. That's part of the defined semantics of 'open(2)' and an open
    which 'may or may not open a file, depending on random circumstances'
    wouldn't be particularly useful.

    It does not mean that the cleaning lady will not pull the power cord
    at the same time and it does not mean that an evil alien with a gamma
    ray cannon could not just erase some parts of the contents of the
    system's RAM.

    [...]

    > At least two times in the past, people have listened to nonsense
    > exactly like the nonsense you are spouting


    I am still writing (mostly) of the behaviour of FIFOs on Linux. And
    nothing else. I would add the additional claim that the behaviour of
    each other type of descriptor can be determined, too, provided one is
    willing to limit oneself to a specific part of the observable
    reality. Which I am.

    > and real-world code has *BROKEN* because of it.


    This code has been broken to begin with, because it was written under
    assumptions that even contradicted documented behaviour.

  11. Re: select()/write() semantics


    Rainer Weikusat wrote:

    > David Schwartz writes:


    > > You are confusing attributes of some particular piece of code with
    > > attributes of the functions that code implements.


    > No. I am writing about Linux 2.6 and Linux 2.6 does not (currently)
    > have a general abstraction named 'I/O subsystem'.


    This is bordering on the absurd.

    > >> > typically about network connections.

    > >
    > >> That something happens 'typically' is different from 'something
    > >> happens always'. Specifically, FIFOs are not network connections, and
    > >> select behaviour on FIFOs was discussed.

    > >
    > > Exactly. So nothing about network connections turns into a *guarantee*
    > > about what select will do.


    > Since the original question was about FIFOs, behaviour of (hypothetical)
    > network connection is not part of the answer.


    Yes, it is. The semantics of 'select' are independent of what you're
    selecting on. The 'select' system call does not provide some
    guarantees on some types of descriptors and some on others.

    > >> > At a minimum, these connections have a remote end, which can change
    > >> > the status of those connections at any time.

    > >
    > >> For connected sockets, the remote end can either send data or
    > >> terminate the connection. It cannot modify anything about already
    > >> received data.

    > >
    > > How do you know that? Where is that written?


    > It is written in the kernel source and the relevant protocol
    > specifications.


    Which protocols are relevant exactly? And since when did it make sense
    to make general programming assumptions based on the behavior of one
    particular version of the Linux kernel?

    > > You mean the implementation in one specific version of one particular
    > > operating system.


    > Exactly. I am writing about FIFOs as implemented in Linux in a
    > newsgroup whose topic is (supposedly) development of applications for
    > Linux.


    Right, that is completely and utterly wrong. The particular
    implementation some current version of Linux happens to use is
    irrelevant when the question is whether something is guaranteed to
    happen. Only the specification can provide that kind of guarantee.
    Implementations change. Code should not break if the implementation
    changes.

    > > It can't change from external memory pressure? Where is that
    > > written?

    >
    > In the code.


    Which code? The kernel his program is going to run on next year?

    > Ergo: It is impossible to develop software, because its behaviour
    > cannot be predicted.


    Nonsense. We have specific guarantees that we can use to develop
    software. That's what standards are for.

    What happens to be in the Linux kernel code *today* is not a standard.
    It's the totally wrong place to look when the question is "how should
    I design software so that it works correctly" or "is X guaranteed".

    > > It would even be incorrect to say, for example, "if 'access' says
    > > you can write to a file, a subsequent open for writing will not fail
    > > so long as the permissions are not modified". Why is that wrong?


    > Because somebody could unlink the file.


    Right, but even if you couldn't think of that, it would still be
    wrong. That you can't think of a way something can break is not the
    same as it being guaranteed. At least twice before, in this exact same
    arena, people couldn't think of ways things could break, and they did
    break, and they got screwed.

    > > Because there could be many other ways the subsequent 'open' could
    > > fail, and even if you can't think of any, that doesn't mean they
    > > don't exist.


    > It does not mean that the cleaning lady will not pull the power cord
    > at the same time and it does not mean that an evil alien with a gamma
    > ray cannon could not just erase some parts of the contents of the
    > system's RAM.


    I'm not sure why you bring up such ridiculous things. Even standards
    will be violated if something is actually broken. We are talking about
    things working correctly, not about things that are broken. Of course
    your software will not work correctly if the system is broken.

    > > At least two times in the past, people have listened to nonsense
    > > exactly like the nonsense you are spouting


    > I am still writing (mostly) of the behaviour of FIFOs on Linux. And
    > nothing else. I would add the additional claim that the behaviour of
    > each other type of descriptor can be determined, too, provided one is
    > willing to limit oneself to a specific part of the observable
    > reality. Which I am.


    What about the next version of Linux? You think it's appropriate to
    write applications that read and write to FIFOs that are dependent
    upon a particular Linux kernel version's FIFO implementation? That's
    completely and utterly insane.

    > > and real-world code has *BROKEN* because of it.


    > This code has been broken to begin with, because it was written under
    > assumptions that even contradicted documented behaviour.


    Really? Where was it documented that the kernel might decide to drop a
    UDP datagram *after* it had triggered a read hit from select? That is
    what caused the Linux inetd denial-of-service attack.

    Linux kernels at the time inetd was written never dropped a UDP
    datagram after a 'select' indicated a read hit. Later Linux kernels
    did, as an optimization. Guess what, real-world code broke. Inetd
    didn't set its sockets non-blocking but assumed that a read hit from
    'select' meant a subsequent 'recvfrom' would not block. A packet
    carefully crafted to result in a datagram being dropped after the
    'select' hit but before the 'recvfrom' caused inetd to block in
    'recvfrom' until another UDP packet was sent to the same port.

    Nobody could think of a way this could happen, and at the time inetd
    was written, there was none. But the kernel changed, and inetd didn't.
    Real-world pain resulted from this.
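
    For illustration, the defensive pattern looks roughly like this. It is
    only a minimal sketch, assuming a POSIX system: the descriptor is set
    non-blocking, and a 'recvfrom' that finds nothing after a read hit
    reports EAGAIN instead of blocking. The helper names set_nonblocking,
    recv_ready and demo_recv are made up here, and the demo uses an
    AF_UNIX datagram socketpair as a stand-in for a UDP socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: wait up to timeout_ms for a read hit, then try
 * to receive one datagram. Returns the number of bytes received, or -1
 * with errno set. EAGAIN/EWOULDBLOCK means "nothing to read after all":
 * either select() timed out or the datagram was gone by the time
 * recvfrom() ran. The caller simply goes back to select(). */
static ssize_t recv_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    int r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r < 0)
        return -1;
    if (r == 0) {
        errno = EAGAIN;
        return -1;
    }
    /* A read hit is a hint, not a promise: on a non-blocking socket
     * this call returns immediately either way. */
    return recvfrom(fd, buf, len, 0, NULL, NULL);
}

/* Demo: returns 0 if one datagram is received and the second attempt
 * reports "nothing there" instead of blocking, -1 otherwise. */
static int demo_recv(void)
{
    int sv[2];
    char buf[16];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    int ok = set_nonblocking(sv[0]) == 0
          && send(sv[1], "ping", 4, 0) == 4
          && recv_ready(sv[0], buf, sizeof buf, 100) == 4
          && recv_ready(sv[0], buf, sizeof buf, 0) == -1
          && (errno == EAGAIN || errno == EWOULDBLOCK);
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

    The point of the design is that the caller never depends on the read
    hit still being true when 'recvfrom' runs. If the datagram has
    vanished, the loop just returns to 'select'.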

    What you are doing is utterly indefensible and a violation of the most
    basic programming rules. I am baffled at your insistence on standing
    by it. I hope I don't have to rely on any code you have written. This
    really is that serious. You are misleading people who may wind up
    doing real harm by listening to you.

    Exactly what you are doing was responsible for the Linux inetd denial-
    of-service attack. There was never any reason the kernel could not
    drop a datagram after giving a read hit in 'select' and before you
    could call 'recvfrom'. It just never did that before ... before it
    did. Code that assumed it never would broke, and broke badly.

    DS


  12. Re: select()/write() semantics

    David Schwartz writes:
    > Rainer Weikusat wrote:
    >> David Schwartz writes:
    >> > You are confusing attributes of some particular piece of code with
    >> > attributes of the functions that code implements.

    >
    >> No. I am writing about Linux 2.6 and Linux 2.6 does not (currently)
    >> have a general abstraction named 'I/O subsystem'.

    >
    > This is bordering on the absurd.


    No. It is a correct statement about a specific kernel.

    >> >> > typically about network connections.
    >> >
    >> >> That something happens 'typically' is different from 'something
    >> >> happens always'. Specifically, FIFOs are not network connections, and
    >> >> select behaviour on FIFOs was discussed.
    >> >
    >> > Exactly. So nothing about network connections turns into a *guarantee*
    >> > about what select will do.

    >
    >> Since the original question was about FIFOs, behaviour of (hypothetical)
    >> network connection is not part of the answer.

    >
    > Yes, it is. The semantics of 'select' are independent of what you're
    > selecting on. The 'select' system call does not provide some
    > guarantees on some types of descriptors and some on others.


    But the question was not about 'behaviour of the select system call
    for unspecified types of descriptors as provided by arbitrary
    operating system kernels' but about 'select for FIFOs on Linux'.

    [...]

    I am ignoring the middle-part because I consider it to basically be an
    off topic rant about 'the unknown is unknown'.

    >> > and real-world code has *BROKEN* because of it.

    >
    >> This code has been broken to begin with, because it was written under
    >> assumptions that even contradicted documented behaviour.

    >
    > Really? Where was it documented that the kernel might decide to drop a
    > UDP datagram *after* it had triggered a read hit from select?


    In the select(2) Linux manpage. It may or may not have been there by
    the time this issue was current, I cannot determine this since it is
    too ancient.

    [...]

    > What you are doing is utterly indefensible and a violation of the most
    > basic programming rules. I am baffled at your insistence at standing
    > by it. I hope I don't have to rely on any code you have written. This
    > really is that serious. You are misleading people who may wind up
    > doing real harm by listening to you.


    I have been answering a specific question with a specific answer, have
    not claimed to ever have done anything different and have clearly
    stated which those specific circumstances are.

    I am not your 'modifiable textbook enemy', as you appear to believe,
    this personal attack of yours is entirely unmotivated and way beyond
    anything that could still be called 'decent behaviour of at least
    remotely civilized people'. Unless you are younger than twenty-one and
    have been brought up in a dustbin, you should be really ashamed of
    yourself.




  13. Re: select()/write() semantics

    On Jun 20, 4:00 am, Rainer Weikusat wrote:

    > I have been answering a specific question with a specific answer, have
    > not claimed to ever have done anything different and have clearly
    > stated which those specific circumstances are.


    You are engaged in an amusing attempt to rewrite history. The original
    statement was:

    > If select(2) reports that a file descriptor (that
    > does not refer to a socket) is ready for writing,
    > then a write of n <= PIPE_BUF bytes will not block.


    This is a statement about whether or not an *application* has a
    guarantee in what behavior it will see from APIs provided by the
    kernel to the application.

    Such a guarantee simply cannot come from kernel code. Kernel code can
    change.
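
    An application that does not want to depend on any of this can avoid
    needing the guarantee in the first place. The following is a minimal
    sketch, assuming a POSIX system and a descriptor already set
    non-blocking: write what the kernel accepts and go back to 'select'
    whenever 'write' reports EAGAIN. The names write_all and
    demo_write_all and the timeout parameter are made up for this
    illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes to a non-blocking descriptor.
 * Waits in select() whenever write() reports EAGAIN, for at most
 * timeout_ms per wait. Returns 0 on success, -1 on error with errno set
 * (ETIMEDOUT if the descriptor did not become writable in time). */
static int write_all(int fd, const char *buf, size_t len, int timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n > 0) {                    /* partial writes are fine */
            buf += n;
            len -= (size_t)n;
            continue;
        }
        if (n == 0) {                   /* no progress: do not spin */
            errno = EIO;
            return -1;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* The descriptor is full right now: wait until select() reports
         * it writable, then simply try again. */
        fd_set wfds;
        struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        int r = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (r < 0 && errno != EINTR)
            return -1;
        if (r == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
    }
    return 0;
}

/* Demo: push PIPE_BUF bytes through a pipe with a non-blocking write
 * end. Returns the number of bytes read back, or -1 on error. */
static ssize_t demo_write_all(void)
{
    int fds[2], fl;
    char out[PIPE_BUF], in[PIPE_BUF];
    ssize_t got = -1;

    if (pipe(fds) < 0)
        return -1;
    memset(out, 'x', sizeof out);
    fl = fcntl(fds[1], F_GETFL);
    if (fl >= 0 && fcntl(fds[1], F_SETFL, fl | O_NONBLOCK) == 0
        && write_all(fds[1], out, sizeof out, 1000) == 0)
        got = read(fds[0], in, sizeof in);
    close(fds[0]);
    close(fds[1]);
    return got;
}
```

    Written this way, the code behaves the same whether or not a write
    hit from 'select' turns out to hold for the next 'write'.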

    > I am not your 'modifiable textbook enemy', as you appear to believe,
    > this personal attack of yours is entirely unmotivated and way beyond
    > anything that could still be called 'decent behaviour of at least
    > remotely civilized people'. Unless you are younger than twenty-one and
    > have been brought up in a dustbin, you should be really ashamed of
    > yourself.


    When you see a person advising others who don't know better to walk
    across the freeway with their eyes closed, who then stubbornly
    maintains that it's safe because the freeway behind his house is
    closed to vehicle traffic, politeness is simply not a concern.

    Honestly, that's what you're doing.

    DS

