[PATCH] alternative to sys_indirect, part 1 - Kernel
Re: [PATCH] alternative to sys_indirect, part 1
On 4/24/08, David Miller wrote:
> From: Linus Torvalds
> Date: Thu, 24 Apr 2008 08:29:14 -0700 (PDT)
>
>
> >
> >
> > On Thu, 24 Apr 2008, Alan Cox wrote:
> > >
> > > Given we will never have 2^32 socket types, and in a sense this is part
> > > of the type why not just use
> > >
> > > socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, ...)
> >
> > Ok, I have to admit that I find this very appealing. It looks much
> > cleaner, but perhaps more importantly, it also looks both readable _and_
> > easier to use for the user-space programmer.
>
>
> Me too.
But this approach fixes just one of the interfaces. There are 7 or 8
other interfaces that need to solve the same problem. What about
those?
It strikes me to be cleanest to use the same solution for all of them
-- i.e., new syscalls (seems simplest) or sys_indirect() -- including
socket().
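
For reference, Alan's suggestion would look roughly like this from user space. This is a minimal sketch, assuming a kernel and libc that define SOCK_CLOEXEC as a bit that can be OR'ed into the type argument, as proposed above. The helper names open_cloexec_socket() and has_cloexec() are made up for illustration.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a stream socket that is close-on-exec from
 * the moment it exists, so a fork()+exec() in another thread cannot leak it.
 * Assumes SOCK_CLOEXEC is accepted as a flag bit in the type argument. */
int open_cloexec_socket(int domain)
{
    return socket(domain, SOCK_STREAM | SOCK_CLOEXEC, 0);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if it is clear, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

The point of the flag is that there is no second call: the descriptor never exists without close-on-exec set.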
--
I'll likely only see replies if they are CCed to mtk.manpages at gmail dot com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [PATCH] alternative to sys_indirect, part 1
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Alan Cox wrote:
> Believe it or not we have the compute capability between us to not
> accidentally reassign values we assigned to one thing to something else.
Once again, this is not about assigned values. This is about the time
before you get a value assigned. Not every experiment out there will
have a value assigned before it starts development.
But it really doesn't matter to me. I'm not the one who would introduce
the problem. Patch is forthcoming.
>> Oh really? You open a server socket, use fcntl(FD_CLOEXEC), and then
>> accept().
>
> And your behaviour just became OS specific....
Not according to POSIX. If some OSes deliberately violate POSIX that's
their problem. To this day, every POSIX OS returns a new file
descriptor without the close-on-exec flag set, at all times. Just read
the spec.
- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
iD8DBQFIEK+f2ijCOnn/RHQRAtYRAJ9Ve9XSkMriqkHkiCL00wsXzJJbYgCgmqzQ
3uexpcjM0NvU7qgngOs7LDA=
=uNKi
-----END PGP SIGNATURE-----
-
Re: [PATCH] alternative to sys_indirect, part 1
On Thu, 24 Apr 2008 09:00:08 -0700 (PDT)
David Miller wrote:
> From: Alan Cox
> Date: Thu, 24 Apr 2008 16:24:44 +0100
>
> > BTW in 4.4BSD and derivatives if I remember rightly F_CLOEXEC *is*
> > inherited across accept() so I doubt any user space software will be too
> > upset by such a shift.
>
> It actually doesn't.
>
> Just like in Linux, no file descriptor flags are inherited.
NDELAY certainly appears to be, looking at Stevens.
-
Re: [PATCH] alternative to sys_indirect, part 1
On 4/24/08, Alan Cox wrote:
> > You didn't read what I wrote.
>
>
> The feeling is mutual
>
>
> > For those the implementer must ensure that during the development no
> > value is used which can conflict with any current and future assigned
> > value and not with any other development.
>
>
> Kernel socket type values are assigned by the kernel team so that
> isn't a problem.
>
>
> > > Every other property of a socket via accept() is inherited from the
> > > parent. Making one property different would be bizarre and ugly.
> >
> > Implementing this would visibly change existing code and it would
> > actively violate POSIX. Not a good idea.
>
>
> POSIX has no interface for this new behaviour you propose so that is
> complete crap. The moment you use one of these features you stepped
> outside of the POSIX spec - and you know that. If there was an existing
> standard we wouldn't have a problem.
Alan, I agree with your analysis of the standard in that last paragraph,
but I'm still not convinced that having the flag inherited across
accept() would be good. The problem (IIUC) is that after the
accept(), a userland programmer might want to immediately change the
close-on-exec flag (FD_CLOEXEC) for the new descriptor, and there would be
the same race there that this whole thread is about avoiding.
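
The window in question is the gap between the two calls in the sketch below. It shows the "set" direction; clearing an inherited flag right after accept() would have the same shape. The function names set_cloexec() and accept_then_cloexec() are hypothetical, chosen only for this illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set FD_CLOEXEC on an existing descriptor. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* The two-step pattern: accept a connection, then mark it close-on-exec. */
int accept_then_cloexec(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);

    if (fd < 0)
        return -1;

    /* Race window: if another thread calls fork() and exec() right here,
     * the child inherits fd because FD_CLOEXEC is not set yet. */

    if (set_cloexec(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

No amount of care inside accept_then_cloexec() closes that window; it can only be removed by an interface that sets the flag atomically with the creation of the descriptor.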
--
I'll likely only see replies if they are CCed to mtk.manpages at gmail dot com
-
Re: [PATCH] alternative to sys_indirect, part 1
From: Alan Cox
Date: Thu, 24 Apr 2008 16:24:44 +0100
> BTW in 4.4BSD and derivatives if I remember rightly F_CLOEXEC *is*
> inherited across accept() so I doubt any user space software will be too
> upset by such a shift.
It actually doesn't.
Just like in Linux, no file descriptor flags are inherited.
-
Re: [PATCH] alternative to sys_indirect, part 1
> But this approach fixes just one of the interfaces. There are 7 or 8
> other interfaces that need to solve the same problem. What about
> those?
Actually it seems to fix most of them. I accept Jakub's observation we
need a "paccept()" or similar.
> It strikes me to be cleanest to use the same solution for all of them
> -- i.e., new syscalls (seems simplest) or sys_indirect() -- including
> socket().
New syscalls make the interface more complex and harder to learn. They
make it harder to tweak applications neatly to use the new API if
present. They are not immediately obvious from knowing the existing API.
What we don't want to do is to end up with a thousand weird system calls
as Windows NT did where nobody can actually understand chunks of code
without looking calls up in books as they go.
-
Re: [PATCH] alternative to sys_indirect, part 1
> Once again, this is not about assigned values. This is about the time
> before you get a value assigned. Not every experiment out there will
> have a value assigned before it starts development.
And no value used by a random experiment on the internet belongs in
anyone else's code. When it hits the kernel main tree it becomes definitive
and will remain so. Until then it remains someone's devel hack.
The same is true about syscall numbers so your argument on this is
slightly less than sound.
-
Re: [PATCH] alternative to sys_indirect, part 1
On Thu, 24 Apr 2008, Michael Kerrisk wrote:
>
> It strikes me to be cleanest to use the same solution for all of them
> -- i.e., new syscalls (seems simplest) or sys_indirect() -- including
> socket().
I certainly don't dislike sys_indirect either, but I've also done user
mode programming, and when it comes to OS-specific things (and especially
if they are even _version_-specific) I can tell you that basically nobody
will ever use them if you cannot decide to use them dynamically.
Here's an example of a *successful* use of something like that:
    #ifndef O_NOATIME
    #define O_NOATIME 0
    #endif

    static unsigned int sha1_file_open_flag = O_NOATIME;

    ...
        fd = open(filename, O_RDONLY | sha1_file_open_flag);
        if (fd < 0) {
            /* See if it works without O_NOATIME */
            switch (sha1_file_open_flag) {
            default:
                fd = open(filename, O_RDONLY);
                if (fd >= 0)
                    break;
            /* Fallthrough */
            case 0:
                return NULL;
            }

            /* If it failed once, it will probably fail again.
             * Stop using O_NOATIME
             */
            sha1_file_open_flag = 0;
        }
    ...
See? This is something where I actually used Linux-specific code. And
dammit, I'm _Linus_. Think of your normal programmer that isn't quite as
Linux-oriented.
And that's the problem with anything that isn't flags-based. Once you do
new system calls, doing the above is really quite nasty. How do you
statically even _test_ that you have a system call? Now you need to add a
whole autoconf thing for it existing, and when it does exist you still
need to test whether it works, and you can't even do it in the slow-path
like the above (which turns the failure into a fast-path _without_ the
flag).
So while I don't dislike the indirect system call, I do think that if we
can handle a large case of the problems with an added flag to already
existing system calls, that does have huge advantages. Because it allows
code like the above, which needs absolutely zero autoconf for linking
errors etc..
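
The same pattern carries over to the socket() flag discussed in this thread. What follows is a sketch under stated assumptions, not anything from the patch itself: it assumes the flag ends up being called SOCK_CLOEXEC, and that a kernel which does not know the bit rejects the call with EINVAL. The wrapper name socket_cloexec() is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* If the headers do not know the flag, compile it away. */
#ifndef SOCK_CLOEXEC
#define SOCK_CLOEXEC 0
#endif

static int sock_cloexec_flag = SOCK_CLOEXEC;

/* Hypothetical wrapper: try the flag first, fall back at run time. */
int socket_cloexec(int domain, int type, int protocol)
{
    int fd = socket(domain, type | sock_cloexec_flag, protocol);

    if (fd < 0 && sock_cloexec_flag != 0 && errno == EINVAL) {
        /* See if it works without the flag (assumed: old kernel). */
        fd = socket(domain, type, protocol);
        if (fd < 0)
            return -1;
        /* If it failed once, it will probably fail again. */
        sock_cloexec_flag = 0;
    }

    if (fd >= 0 && sock_cloexec_flag == 0) {
        /* Slow path without the flag: set FD_CLOEXEC afterwards.
         * This is the racy two-step the flag is meant to replace. */
        int flags = fcntl(fd, F_GETFD);

        if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
            close(fd);
            return -1;
        }
    }
    return fd;
}
```

Nothing here needs a configure-time check: a missing definition degrades to the old behaviour at compile time, and a missing kernel feature degrades to it after the first failed call.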
Linus
-
Re: [PATCH] alternative to sys_indirect, part 1
On 4/24/08, Alan Cox wrote:
> On Thu, 24 Apr 2008 09:00:08 -0700 (PDT)
>
> David Miller wrote:
>
>
> > From: Alan Cox
> > Date: Thu, 24 Apr 2008 16:24:44 +0100
> >
> > > BTW in 4.4BSD and derivatives if I remember rightly F_CLOEXEC *is*
> > > inherited across accept() so I doubt any user space software will be too
> > > upset by such a shift.
> >
> > It actually doesn't.
> >
> > Just like in Linux, no file descriptor flags are inherited.
>
>
> NDELAY certainly appears to be, looking at Stevens.
A while back I did some testing of this point. These were the results I noted:
FreeBSD 4.8
    O_NONBLOCK and O_ASYNC are inherited
    FD_CLOEXEC is not inherited

Solaris 8
    O_NONBLOCK and O_ASYNC are inherited
    FD_CLOEXEC is not inherited

Tru64 5.1 (sep 03, testdrive)
    No F_SETFL flags are inherited
    FD_CLOEXEC is not inherited

HP-UX 11
    No F_SETFL flags are inherited
    FD_CLOEXEC is not inherited
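
A probe along the following lines can reproduce this kind of check on a given system. It is only a sketch, not the program used for the results above: it uses a Unix-domain socket under /tmp to stay self-contained (results for TCP sockets may differ), and the names inherit_result and probe_accept_inheritance() are invented for the example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

struct inherit_result {
    int nonblock_inherited;  /* 1 if O_NONBLOCK carried over to the accepted fd */
    int cloexec_inherited;   /* 1 if FD_CLOEXEC carried over to the accepted fd */
};

/* Returns 0 and fills *res on success, -1 if any step fails. */
int probe_accept_inheritance(struct inherit_result *res)
{
    struct sockaddr_un addr;
    int lfd = -1, cfd = -1, afd = -1, rc = -1, fl, fdfl;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof(addr.sun_path),
             "/tmp/accept-probe-%ld.sock", (long)getpid());
    unlink(addr.sun_path);

    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0)
        goto out;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0)
        goto out;

    /* Set one file status flag and one descriptor flag on the listener. */
    fl = fcntl(lfd, F_GETFL);
    if (fl < 0 || fcntl(lfd, F_SETFL, fl | O_NONBLOCK) < 0 ||
        fcntl(lfd, F_SETFD, FD_CLOEXEC) < 0)
        goto out;

    cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    /* Read both kinds of flags back from the accepted descriptor. */
    fl = fcntl(afd, F_GETFL);
    fdfl = fcntl(afd, F_GETFD);
    if (fl < 0 || fdfl < 0)
        goto out;

    res->nonblock_inherited = (fl & O_NONBLOCK) != 0;
    res->cloexec_inherited = (fdfl & FD_CLOEXEC) != 0;
    rc = 0;
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    unlink(addr.sun_path);
    return rc;
}
```

Checking O_ASYNC as well would only need one more bit in the F_SETFL call and one more field in the result.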
-
Re: [PATCH] alternative to sys_indirect, part 1
On 4/24/08, Alan Cox wrote:
> > But this approach fixes just one of the interfaces. There are 7 or 8
> > other interfaces that need to solve the same problem. What about
> > those?
>
>
> Actually it seems to fix most of them.
Am I missing something? How? There are a number of system calls that
have neither a flags argument, nor another argument that we can
overload (as you propose with socket()). For those, we'd need new
system calls or sys_indirect().
> I accept Jakub's observation we
> need a "paccept()" or similar.
True, that would be nice.
-
Re: [PATCH] alternative to sys_indirect, part 1
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Linus Torvalds wrote:
> So while I don't dislike the indirect system call, I do think that if we
> can handle a large case of the problems with an added flag to already
> existing system calls,
The easy, clean cases I already handled back when. I wouldn't have
implemented socket this way to preserve the function signature but
that's just me. It's hopefully over now.
What remains isn't that easy to fix. We need syscall interface changes.
Yes, I'd like to avoid them, too. But sometimes the existing
interfaces are just wrong and now we have to make a decision: new
syscalls or sys_indirect. No way around it.
As far as the userlevel interface is concerned, this is not quite the
same. As explained before, I've anticipated some of the problems.
signalfd, eventfd have no flags parameter in the syscall but I have them
in the userlevel interface. I.e., any kernel change will be hidden. At
least as far as the interface signature is concerned.
So, the question still is on the table: do you want sys_indirect?
If yes, then the new sys_accept would use sys_indirect instead of a new
entry point. If you don't want sys_indirect, then I'll submit a new
sys_accept syscall (already have the patch here ready to go).
- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
iD8DBQFIELrj2ijCOnn/RHQRAtewAJ4+826rxwtckEvvOaXdiNSr/5ECPACfWwTn
hgt5EYrrj/imBloPE7DxHJA=
=T6LW
-----END PGP SIGNATURE-----
-
Re: [PATCH] alternative to sys_indirect, part 1
On Thu, Apr 24, 2008 at 07:18:43AM -0700, Ulrich Drepper (drepper@redhat.com) wrote:
> I don't think this is a viable approach because it is not about the
> range. People can and do select arbitrary values for those types.
> Until a value is officially recognized and registered it is in fact best
> to choose a (possibly large) random value to not conflict with anything
> else. Who can guarantee that whatever bit is chosen for SOCK_CLOEXEC
> isn't already used by someone?
The type argument is limited to SOCK_MAX, so the higher half of the word
can be used for flags. It is much cleaner than implementing socket4() for the
single bit.
> Add to this that it's not a complete solution (no such hack possible for
> accept) and I think using a new interface is cleaner(tm).
It can inherit flags from parent by default.
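
Splitting the type word this way could be sketched as below. Every name and constant here (the DEMO_ prefix, the 16/16 split, the bit chosen for the close-on-exec flag) is an assumption made for illustration and is not taken from any kernel header.

```c
/* Hypothetical layout: low 16 bits carry the socket type,
 * high 16 bits carry flags. */
#define DEMO_SOCK_TYPE_MASK   0x0000ffffu
#define DEMO_SOCK_FLAGS_MASK  0xffff0000u
#define DEMO_SOCK_CLOEXEC     0x00010000u  /* made-up bit for illustration */

struct demo_sock_arg {
    unsigned int type;   /* the plain socket type, e.g. SOCK_STREAM */
    unsigned int flags;  /* flag bits from the upper half of the word */
};

/* Separate the combined argument into type and flags. */
struct demo_sock_arg demo_split_type(unsigned int arg)
{
    struct demo_sock_arg out;

    out.type = arg & DEMO_SOCK_TYPE_MASK;
    out.flags = arg & DEMO_SOCK_FLAGS_MASK;
    return out;
}

/* Returns 0 if the argument is acceptable, -1 if the type is out of
 * range (>= sock_max) or a flag bit other than DEMO_SOCK_CLOEXEC is set. */
int demo_check_type(unsigned int arg, unsigned int sock_max)
{
    struct demo_sock_arg a = demo_split_type(arg);

    if (a.type >= sock_max)
        return -1;
    if (a.flags & ~DEMO_SOCK_CLOEXEC)
        return -1;
    return 0;
}
```

Rejecting unknown flag bits matters for the objection raised earlier: an out-of-tree experiment that picked a large random type value would be refused rather than silently reinterpreted.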
--
Evgeniy Polyakov
-
Re: [PATCH] alternative to sys_indirect, part 1
Michael Kerrisk wrote:
> On 4/24/08, Alan Cox wrote:
>> > But this approach fixes just one of the interfaces. There are 7 or 8
>> > other interfaces that need to solve the same problem. What about
>> > those?
>>
>>
>> Actually it seems to fix most of them.
>
> Am I missing something? How? There are a number of system calls that
> have neither a flags argument, nor another argument that we can
> overload (as you propose with socket()). For those, we'd need new
> system calls or sys_indirect().
>
sys_indirect is a total red herring here, since it won't help one iota
making the userspace interface comprehensible - it just introduces a
different calling convention that the C library will have to thunk.
-hpa
-
Re: [PATCH] alternative to sys_indirect, part 1
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
H. Peter Anvin wrote:
> sys_indirect is a total red herring here, since it won't help one iota
> making the userspace interface comprehensible - it just introduces a
> different calling convention that the C library will have to thunk.
Nobody ever suggested that sys_indirect is in any way visible at the
userlevel. It's only meant to solve the problem of changing many
syscalls (and hence touch lots of arch-specific code). Again, as said
several times, it could easily be used to fix the existing signalfd and
eventfd syscalls without any arch-specific changes and no userlevel
interface changes (the latter since we already have the correct interface).
Yes, you don't like sys_indirect, we know it. But don't deliberately
misrepresent the approach.
- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
iD8DBQFIEMPx2ijCOnn/RHQRAr7uAJ0aHkZ+bbjk2nsMhhN2xzslA/yhKgCghi8r
9PZw8zfW5fxTVTfrbsHIII0=
=SmAT
-----END PGP SIGNATURE-----
-
Re: [PATCH] alternative to sys_indirect, part 1
Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> H. Peter Anvin wrote:
>> sys_indirect is a total red herring here, since it won't help one iota
>> making the userspace interface comprehensible - it just introduces a
>> different calling convention that the C library will have to thunk.
>
> Nobody ever suggested that sys_indirect is in any way visible at the
> userlevel. It's only meant to solve the problem of changing many
> syscalls (and hence touch lots of arch-specific code). Again, as said
> several times, it could easily be used to fix the existing signalfd and
> eventfd syscalls without any arch-specific changes and no userlevel
> interface changes (the latter since we already have the correct interface).
>
> Yes, you don't like sys_indirect, we know it. But don't deliberately
> misrepresent the approach.
>
I wasn't misrepresenting anything. I was pointing out to the parent
post -- not to you -- that sys_indirect does neither hide nor hair for
what *he* was concerned about, which was the comprehensibility of the
user-level interface.
-hpa
-
Re: [PATCH] alternative to sys_indirect, part 1
On Thu, 24 Apr 2008, Michael Kerrisk wrote:
> A while back I did some testing of this point. These were the results I noted:
>
> FreeBSD 4.8
>     O_NONBLOCK and O_ASYNC are inherited
>     FD_CLOEXEC is not inherited
>
> Solaris 8
>     O_NONBLOCK and O_ASYNC are inherited
>     FD_CLOEXEC is not inherited
>
> Tru64 5.1 (sep 03, testdrive)
>     No F_SETFL flags are inherited
>     FD_CLOEXEC is not inherited
>
> HP-UX 11
>     No F_SETFL flags are inherited
>     FD_CLOEXEC is not inherited
invent FD_CLOEXEC_INHERITED to handle accept()?
-dean