[PATCH] alternative to sys_indirect, part 1 - Kernel



Thread: [PATCH] alternative to sys_indirect, part 1

  1. Re: [PATCH] alternative to sys_indirect, part 1

    On 4/24/08, David Miller wrote:
    > From: Linus Torvalds
    > Date: Thu, 24 Apr 2008 08:29:14 -0700 (PDT)
    >
    > > On Thu, 24 Apr 2008, Alan Cox wrote:
    > > >
    > > > Given we will never have 2^32 socket types, and in a sense this is part
    > > > of the type why not just use
    > > >
    > > > socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, ...)
    > >
    > > Ok, I have to admit that I find this very appealing. It looks much
    > > cleaner, but perhaps more importantly, it also looks both readable _and_
    > > easier to use for the user-space programmer.
    >
    > Me too.

    But this approach fixes just one of the interfaces. There are 7 or 8
    other interfaces that need to solve the same problem. What about
    those?

    It strikes me to be cleanest to use the same solution for all of them
    -- i.e., new syscalls (seems simplest) or sys_indirect() -- including
    socket().

    --
    I'll likely only see replies if they are CCed to mtk.manpages at gmail dot com
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] alternative to sys_indirect, part 1


    Alan Cox wrote:
    > Believe it or not we have the compute capability between us to not
    > accidentally reassign values we assigned to one thing to something else.


    Once again, this is not about assigned values. This is about the time
    before you get a value assigned. Not every experiment out there will
    have a value assigned before it starts development.

    But it really doesn't matter to me. I'm not the one who would introduce
    the problem. Patch is forthcoming.


    >> Oh really? You open a server socket, use fcntl(FD_CLOEXEC), and then
    >> accept().

    >
    > And your behaviour just became OS specific....


    Not according to POSIX. If some OSes deliberately violate POSIX that's
    their problem. Every POSIX OS up to today returns a new file
    descriptor without the close-on-exec flag set, at all times. Just read
    the spec.

    --
    ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

  3. Re: [PATCH] alternative to sys_indirect, part 1

    On Thu, 24 Apr 2008 09:00:08 -0700 (PDT)
    David Miller wrote:

    > From: Alan Cox
    > Date: Thu, 24 Apr 2008 16:24:44 +0100
    >
    > > BTW in 4.4BSD and derivatives if I remember rightly FD_CLOEXEC *is*
    > > inherited across accept() so I doubt any user space software will be too
    > > upset by such a shift.
    >
    > It actually doesn't.
    >
    > Just like in Linux, no file descriptor flags are inherited.

    NDELAY certainly appears to be inherited, looking at Stevens.

  4. Re: [PATCH] alternative to sys_indirect, part 1

    On 4/24/08, Alan Cox wrote:
    > > You didn't read what I wrote.
    >
    > The feeling is mutual
    >
    > > For those the implementer must ensure that during the development no
    > > value is used which can conflict with any current and future assigned
    > > value and not with any other development.
    >
    > Kernel socket type values are assigned by the kernel team so that
    > isn't a problem.
    >
    > > > Every other property of a socket via accept() is inherited from the
    > > > parent. Making one property different would be bizarre and ugly.
    > >
    > > Implementing this would visibly change existing code and it would
    > > actively violate POSIX. Not a good idea.
    >
    > POSIX has no interface for this new behaviour you propose so that is
    > complete crap. The moment you use one of these features you stepped
    > outside of the POSIX spec - and you know that. If there was an existing
    > standard we wouldn't have a problem.


    Alan, I agree with your analysis of the standard in that last paragraph,
    but I'm still not convinced that having the flag inherited across
    accept() would be good. The problem (IIUC) is that after the
    accept(), a userland programmer might want to immediately change the
    close-on-exec flag (FD_CLOEXEC) for the new descriptor, and there would
    be the same race there that this whole thread is about avoiding.
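
    As a minimal sketch of the window in question (the helper names
    set_cloexec() and accept_then_cloexec() are made up for illustration
    and do not come from any patch in this thread):

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: mark an already-open descriptor close-on-exec. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper: accept a connection, then set FD_CLOEXEC on it. */
int accept_then_cloexec(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);
    if (fd < 0)
        return -1;
    /*
     * Race window: if another thread calls fork() and exec() right here,
     * the new program receives fd, because the flag is not set yet.
     */
    if (set_cloexec(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

    The same gap exists in the other direction: if the flag were inherited
    and the program wanted it cleared, the clearing fcntl() would also come
    one call too late.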

    --
    I'll likely only see replies if they are CCed to mtk.manpages at gmail dot com

  5. Re: [PATCH] alternative to sys_indirect, part 1

    From: Alan Cox
    Date: Thu, 24 Apr 2008 16:24:44 +0100

    > BTW in 4.4BSD and derivatives if I remember rightly FD_CLOEXEC *is*
    > inherited across accept() so I doubt any user space software will be too
    > upset by such a shift.


    It actually doesn't.

    Just like in Linux, no file descriptor flags are inherited.

  6. Re: [PATCH] alternative to sys_indirect, part 1

    > But this approach fixes just one of the interfaces. There are 7 or 8
    > other interfaces that need to solve the same problem. What about
    > those?


    Actually it seems to fix most of them. I accept Jakub's observation we
    need a "paccept()" or similar.

    > It strikes me to be cleanest to use the same solution for all of them
    > -- i.e., new syscalls (seems simplest) or sys_indirect() -- including
    > socket().


    New syscalls make the interface more complex and harder to learn. They
    make it harder to tweak applications neatly to use the new API if
    present. They are not immediately obvious from knowing the existing API.

    What we don't want to do is to end up with a thousand weird system calls
    as Windows NT did where nobody can actually understand chunks of code
    without looking calls up in books as they go.

  7. Re: [PATCH] alternative to sys_indirect, part 1

    > Once again, this is not about assigned values. This is about the time
    > before you get a value assigned. Not every experiment out there will
    > have a value assigned before it starts development.


    And no value used by a random experiment on the internet belongs in
    anyone else's code. When it hits the kernel main tree it becomes definitive
    and will remain so. Until then it remains someone's devel hack.

    The same is true about syscall numbers so your argument on this is
    slightly less than sound.


  8. Re: [PATCH] alternative to sys_indirect, part 1



    On Thu, 24 Apr 2008, Michael Kerrisk wrote:
    >
    > It strikes me to be cleanest to use the same solution for all of them
    > -- i.e., new syscalls (seems simplest) or sys_indirect() -- including
    > socket().


    I certainly don't dislike sys_indirect either, but I've also done user
    mode programming, and when it comes to OS-specific things (and especially
    if they are even _version_-specific) I can tell you that basically nobody
    will ever use them if you cannot decide to use them dynamically.

    Here's an example of a *successful* use of something like that:

    #ifndef O_NOATIME
    #define O_NOATIME 0
    #endif

    static unsigned int sha1_file_open_flag = O_NOATIME;

    ...
        fd = open(filename, O_RDONLY | sha1_file_open_flag);
        if (fd < 0) {
            /* See if it works without O_NOATIME */
            switch (sha1_file_open_flag) {
            default:
                fd = open(filename, O_RDONLY);
                if (fd >= 0)
                    break;
            /* Fallthrough */
            case 0:
                return NULL;
            }

            /* If it failed once, it will probably fail again.
             * Stop using O_NOATIME
             */
            sha1_file_open_flag = 0;
        }
    ...

    See? This is something where I actually used Linux-specific code. And
    dammit, I'm _Linus_. Think of your normal programmer that isn't quite as
    Linux-oriented.

    And that's the problem with anything that isn't flags-based. Once you do
    new system calls, doing the above is really quite nasty. How do you
    statically even _test_ that you have a system call? Now you need to add a
    whole autoconf thing for it existing, and when it does exist you still
    need to test whether it works, and you can't even do it in the slow-path
    like the above (which turns the failure into a fast-path _without_ the
    flag).

    So while I don't dislike the indirect system call, I do think that if we
    can handle a large case of the problems with an added flag to already
    existing system calls, that does have huge advantages. Because it allows
    code like the above, which needs absolutely zero autoconf for linking
    errors etc.
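
    A minimal sketch of the same try-then-fall-back pattern applied to the
    socket() flag suggested earlier in the thread. It assumes the flag is
    spelled SOCK_CLOEXEC (later kernels and glibc do define it under that
    name); socket_cloexec() and sock_cloexec_flag are hypothetical names:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SOCK_CLOEXEC
#define SOCK_CLOEXEC 0   /* headers too old: the flag becomes a no-op */
#endif

/* Starts out as the flag; cleared after the first EINVAL. */
static int sock_cloexec_flag = SOCK_CLOEXEC;

/* Hypothetical helper: socket() with close-on-exec where available. */
int socket_cloexec(int domain, int type, int protocol)
{
    int fd = socket(domain, type | sock_cloexec_flag, protocol);

    if (fd < 0 && sock_cloexec_flag != 0 && errno == EINVAL) {
        /* Assume the kernel rejected the flag; stop using it from now on. */
        sock_cloexec_flag = 0;
        fd = socket(domain, type, protocol);
    }
    if (fd >= 0 && sock_cloexec_flag == 0) {
        /* Non-atomic fallback: the race is back on this path. */
        int flags = fcntl(fd, F_GETFD);
        if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
            close(fd);
            return -1;
        }
    }
    return fd;
}
```

    Like the O_NOATIME code, this treats one EINVAL as evidence that the
    flag is unsupported, and it needs no configure-time test to build.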

    Linus

  9. Re: [PATCH] alternative to sys_indirect, part 1

    On 4/24/08, Alan Cox wrote:
    > On Thu, 24 Apr 2008 09:00:08 -0700 (PDT)
    > David Miller wrote:
    >
    > > From: Alan Cox
    > > Date: Thu, 24 Apr 2008 16:24:44 +0100
    > >
    > > > BTW in 4.4BSD and derivatives if I remember rightly FD_CLOEXEC *is*
    > > > inherited across accept() so I doubt any user space software will be too
    > > > upset by such a shift.
    > >
    > > It actually doesn't.
    > >
    > > Just like in Linux, no file descriptor flags are inherited.
    >
    > NDELAY certainly appears to be inherited, looking at Stevens.

    A while back I did some testing of this point. These were the results I noted:

    FreeBSD 4.8
    O_NONBLOCK and O_ASYNC are inherited
    FD_CLOEXEC is not inherited

    Solaris 8
    O_NONBLOCK and O_ASYNC are inherited
    FD_CLOEXEC is not inherited

    Tru64 5.1 (sep 03, testdrive)
    No F_SETFL flags are inherited
    FD_CLOEXEC is not inherited

    HP-UX 11
    No F_SETFL flags are inherited
    FD_CLOEXEC is not inherited
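
    A check of this kind can be sketched roughly as follows. This is not
    the program used for the results above; it is an illustration that
    uses a UNIX domain socket under /tmp to stay self-contained, and the
    names probe_accept_inheritance() and struct inherit_result are made
    up. Results for other socket families may differ:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical result record for one run of the probe. */
struct inherit_result {
    int nonblock_inherited;  /* O_NONBLOCK seen on the accepted fd */
    int cloexec_inherited;   /* FD_CLOEXEC seen on the accepted fd */
};

/* Returns 0 on success, -1 if any step fails. */
int probe_accept_inheritance(struct inherit_result *res)
{
    struct sockaddr_un addr;
    int lfd = -1, cfd = -1, afd = -1, rc = -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof(addr.sun_path),
             "/tmp/accept-probe-%ld.sock", (long)getpid());
    unlink(addr.sun_path);

    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0)
        goto out;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0)
        goto out;

    /* Set one file status flag and one descriptor flag on the listener. */
    if (fcntl(lfd, F_SETFL, fcntl(lfd, F_GETFL) | O_NONBLOCK) < 0 ||
        fcntl(lfd, F_SETFD, FD_CLOEXEC) < 0)
        goto out;

    /* Connect first so the non-blocking accept() has something queued. */
    cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    res->nonblock_inherited = (fcntl(afd, F_GETFL) & O_NONBLOCK) != 0;
    res->cloexec_inherited = (fcntl(afd, F_GETFD) & FD_CLOEXEC) != 0;
    rc = 0;
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    unlink(addr.sun_path);
    return rc;
}
```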

  10. Re: [PATCH] alternative to sys_indirect, part 1

    On 4/24/08, Alan Cox wrote:
    > > But this approach fixes just one of the interfaces. There are 7 or 8
    > > other interfaces that need to solve the same problem. What about
    > > those?

    >
    >
    > Actually it seems to fix most of them.


    Am I missing something? How? There are a number of system calls that
    have neither a flags argument, nor another argument that we can
    overload (as you propose with socket()). For those, we'd need new
    system calls or sys_indirect().

    > I accept Jakub's observation we
    > need a "paccept()" or similar.


    True, that would be nice.

  11. Re: [PATCH] alternative to sys_indirect, part 1


    Linus Torvalds wrote:
    > So while I don't dislike the indirect system call, I do think that if we
    > can handle a large case of the problems with an added flag to already
    > existing system calls,


    The easy, clean cases I already handled back when. I wouldn't have
    implemented socket this way to preserve the function signature but
    that's just me. It's hopefully over now.

    What remains isn't that easy to fix. We need syscall interface changes.
    Yes, I'd like to avoid them, too. But sometimes the existing
    interfaces are just wrong and now we have to make a decision: new
    syscalls or sys_indirect. No way around it.

    As far as the userlevel interface is concerned, this is not quite the
    same. As explained before, I've anticipated some of the problems.
    signalfd, eventfd have no flags parameter in the syscall but I have them
    in the userlevel interface. I.e., any kernel change will be hidden. At
    least as far as the interface signature is concerned.
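
    For illustration, a user-level flags argument of this kind can carry
    close-on-exec directly. The sketch below assumes a Linux system whose
    <sys/eventfd.h> defines EFD_CLOEXEC, which is newer than this thread;
    make_cloexec_eventfd() is a made-up name:

```c
#include <fcntl.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create an eventfd that is close-on-exec from birth. */
int make_cloexec_eventfd(void)
{
    /* The second argument is the user-level flags parameter. */
    return eventfd(0, EFD_CLOEXEC);
}
```

    The caller's signature is the same whether or not the kernel honours
    the flag, which is the point about the change being hidden.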


    So, the question still is on the table: do you want sys_indirect?

    If yes, then the new sys_accept would use sys_indirect instead of a new
    entry point. If you don't want sys_indirect, then I'll submit a new
    sys_accept syscall (already have the patch here ready to go).

    --
    ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

  12. Re: [PATCH] alternative to sys_indirect, part 1

    On Thu, Apr 24, 2008 at 07:18:43AM -0700, Ulrich Drepper (drepper@redhat.com) wrote:
    > I don't think this is a viable approach because it is not about the
    > range. People can and do select arbitrary values for those types.
    > Until a value is officially recognized and registered it is in fact best
    > to choose a (possibly large) random value to not conflict with anything
    > else. Who can guarantee that whatever bit is chosen for SOCK_CLOEXEC
    > isn't already used by someone?


    The type argument is limited to SOCK_MAX, so the higher half of the word
    can be used for flags. That is much cleaner than implementing socket4()
    for a single bit.
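
    A minimal sketch of that split, assuming a 32-bit argument with the
    type in the low 16 bits and flags in the high 16 bits. The XSOCK_*
    names, the flag value and split_type_arg() are all hypothetical and
    do not reflect any actual kernel definition:

```c
/* Hypothetical layout: low half carries the type, high half carries flags. */
#define XSOCK_TYPE_MASK 0x0000ffffu
#define XSOCK_FLAG_MASK 0xffff0000u
#define XSOCK_CLOEXEC   0x00010000u   /* made-up flag bit */

struct sock_request {
    unsigned int type;    /* plain socket type, e.g. SOCK_STREAM */
    unsigned int flags;   /* flag bits taken from the high half */
};

/* Split the caller's type argument; returns -1 on unknown flag bits. */
int split_type_arg(unsigned int arg, struct sock_request *req)
{
    req->type = arg & XSOCK_TYPE_MASK;
    req->flags = arg & XSOCK_FLAG_MASK;
    if (req->flags & ~XSOCK_CLOEXEC)
        return -1;
    return 0;
}
```

    Rejecting unknown high bits is what lets user space probe for a flag
    at run time and fall back when it is refused.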

    > Add to this that it's not a complete solution (no such hack possible for
    > accept) and I think using a new interface is cleaner(tm).


    It can inherit flags from parent by default.

    --
    Evgeniy Polyakov

  13. Re: [PATCH] alternative to sys_indirect, part 1

    Michael Kerrisk wrote:
    > On 4/24/08, Alan Cox wrote:
    >> > But this approach fixes just one of the interfaces. There are 7 or 8
    >> > other interfaces that need to solve the same problem. What about
    >> > those?
    >>
    >> Actually it seems to fix most of them.
    >
    > Am I missing something? How? There are a number of system calls that
    > have neither a flags argument, nor another argument that we can
    > overload (as you propose with socket()). For those, we'd need new
    > system calls or sys_indirect().


    sys_indirect is a total red herring here, since it won't help one iota
    making the userspace interface comprehensible - it just introduces a
    different calling convention that the C library will have to thunk.

    -hpa

  14. Re: [PATCH] alternative to sys_indirect, part 1


    H. Peter Anvin wrote:
    > sys_indirect is a total red herring here, since it won't help one iota
    > making the userspace interface comprehensible - it just introduces a
    > different calling convention that the C library will have to thunk.


    Nobody ever suggested that sys_indirect is in any way visible at the
    userlevel. It's only meant to solve the problem of changing many
    syscalls (and hence touch lots of arch-specific code). Again, as said
    several times, it could easily be used to fix the existing signalfd and
    eventfd syscalls without any arch-specific changes and no userlevel
    interface changes (the latter since we already have the correct interface).

    Yes, you don't like sys_indirect, we know it. But don't deliberately
    misrepresent the approach.

    --
    ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

  15. Re: [PATCH] alternative to sys_indirect, part 1

    Ulrich Drepper wrote:
    > H. Peter Anvin wrote:
    >> sys_indirect is a total red herring here, since it won't help one iota
    >> making the userspace interface comprehensible - it just introduces a
    >> different calling convention that the C library will have to thunk.

    >
    > Nobody ever suggested that sys_indirect is in any way visible at the
    > userlevel. It's only meant to solve the problem of changing many
    > syscalls (and hence touch lots of arch-specific code). Again, as said
    > several times, it could easily be used to fix the existing signalfd and
    > eventfd syscalls without any arch-specific changes and no userlevel
    > interface changes (the latter since we already have the correct interface).
    >
    > Yes, you don't like sys_indirect, we know it. But don't deliberately
    > misrepresent the approach.
    >


    I wasn't misrepresenting anything. I was pointing out to the parent
    post -- not to you -- that sys_indirect does nothing for
    what *he* was concerned about, which was the comprehensibility of the
    user-level interface.

    -hpa

  16. Re: [PATCH] alternative to sys_indirect, part 1

    On Thu, 24 Apr 2008, Michael Kerrisk wrote:

    > A while back I did some testing of this point. These were the results I noted:
    >
    > FreeBSD 4.8
    > O_NONBLOCK and O_ASYNC are inherited
    > FD_CLOEXEC is not inherited
    >
    > Solaris 8
    > O_NONBLOCK and O_ASYNC are inherited
    > FD_CLOEXEC is not inherited
    >
    > Tru64 5.1 (sep 03, testdrive)
    > No F_SETFL flags are inherited
    > FD_CLOEXEC is not inherited
    >
    > HP-UX 11
    > No F_SETFL flags are inherited
    > FD_CLOEXEC is not inherited


    invent FD_CLOEXEC_INHERITED to handle accept()?

    -dean
