Re: 2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage - Kernel


  1. Re: 2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage

    As James said, I'm away right now and computer access is limited. However, I'm stuck in the airport and spent some time looking at the code ... Based on what has been found so far, I wonder if the problem isn't a race but a problem of skb->iif never being initialized correctly? To my untrained eye it looks like __netdev_alloc_skb() should be setting skb->iif (like it does for skb->dev) but it currently doesn't.

    Am I barking up the wrong tree here?

    .. paul moore
    .. linux security @ hp
    -----Original Message-----
    From: James Morris
    Date: Wednesday, Dec 26, 2007 7:16 am
    Subject: Re: 2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage
    To: Valdis.Kletnieks@vt.edu
    CC: Andrew Morton, Paul Moore, linux-kernel@vger.kernel.org, Stephen Smalley

    >On Wed, 26 Dec 2007, James Morris wrote:
    >
    >> What does the following say?
    >>
    >> # sestatus && rpm -q selinux-policy
    >
    >Don't worry about that -- I reproduced it with Paul Moore's git tree: git://git.infradead.org/users/pcmoore/lblnet-2.6_testing
    >
    >(under current -mm, the e1000 driver doesn't find my ethernet card & the
    >tcl tests won't run without an external interface).
    >
    >The offending commit is when SELinux is converted to the new ifindex
    >interface:
    >
    > 9c6ad8f6895db7a517c04c2147cb5e7ffb83a315 is first bad commit
    > commit 9c6ad8f6895db7a517c04c2147cb5e7ffb83a315
    > Author: Paul Moore
    > Date: Fri Dec 21 11:44:26 2007 -0500
    >
    > SELinux: Convert the netif code to use ifindex values
    >
    > [...]
    >
    >In some cases (not yet fully identified -- also happens when avahi starts
    >up, although seemingly silently & without obvious issues), SELinux is
    >passed an ifindex of 1515870810, which corresponds to 0x5a5a5a5a, the slab
    >poison value, suggesting a race in the calling code where we're being
    >asked to check an skb which has been freed.
    >
    >The SELinux code is erroring out before performing an access check
    >(perhaps there should be a WARN_ON, at least), so this will affect both
    >permissive and enforcing mode without generating any log messages.
    >
    >Andrew: I suggest dropping the patchset from -mm until Paul gets back from
    >vacation.
    >
    >
    >- James
    >--
    >James Morris
    >
    >
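As a quick check on the arithmetic above, the sketch below confirms that 1515870810 is 0x5a5a5a5a and tests which kernel poison byte the value repeats. The POISON_INUSE (0x5a) and POISON_FREE (0x6b) values and their comments are taken from the kernel's include/linux/poison.h; the helper is_repeated_byte() is hypothetical and exists only for this illustration. This is a userspace sketch, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poison bytes as defined in the kernel's include/linux/poison.h
 * (redefined here so the sketch compiles on its own). */
#define POISON_INUSE 0x5a /* for use-uninitialised poisoning */
#define POISON_FREE  0x6b /* for use-after-free poisoning */

/* Hypothetical helper: true if all four bytes of 'value' equal 'pattern',
 * i.e. the 32-bit value looks like it was read from poisoned memory. */
static bool is_repeated_byte(uint32_t value, uint8_t pattern)
{
    for (int shift = 0; shift < 32; shift += 8) {
        if (((value >> shift) & 0xffu) != pattern)
            return false;
    }
    return true;
}
```

Note that poison.h describes 0x5a as use-uninitialised poisoning and 0x6b as use-after-free poisoning, which is consistent with the uninitialised-value explanation reached later in this thread.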


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: 2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage

    On Thu, 26 Dec 2007, Paul Moore wrote:

    > As James said I'm away right now and computer access is limited.
    > However, I'm stuck in the airport right now and spent some time looking
    > at the code ... Based on what has been found so far I wonder if the
    > problem isn't a race but a problem of skb->iif never being initialized
    > correctly? To my untrained eye it looks like __netdev_alloc_skb()
    > should be setting skb->iif (like it does for skb->dev) but it currently
    > doesn't.


    ->iif will be zeroed during skb allocation, then set during
    netif_receive_skb().
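To illustrate the lifecycle James describes, here is a heavily simplified userspace model. struct model_dev, struct model_skb, model_alloc_skb() and model_receive_skb() are hypothetical stand-ins, not kernel APIs. The allocation step assumes that __alloc_skb() in kernels of this era clears the start of the header with a memset() up to offsetof(struct sk_buff, tail); the receive step fills in ->iif only when it is still zero, which matches the netif_receive_skb() behaviour Paul describes later in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for struct net_device and
 * struct sk_buff, keeping only what the ->iif discussion needs. */
struct model_dev {
    int ifindex;
};

struct model_skb {
    struct model_dev *dev;
    int iif;            /* input ifindex; 0 means "not set yet" */
    unsigned int tail;  /* first field NOT cleared at allocation */
};

/* Allocation path: zero every field that precedes 'tail', modeled on the
 * memset(skb, 0, offsetof(struct sk_buff, tail)) in __alloc_skb(). */
static void model_alloc_skb(struct model_skb *skb)
{
    memset(skb, 0, offsetof(struct model_skb, tail));
}

/* Receive path: fill in ->iif only if nobody has set it yet. */
static void model_receive_skb(struct model_skb *skb)
{
    if (!skb->iif)
        skb->iif = skb->dev->ifindex;
}
```

The catch for this bug: if ->iif already holds a non-zero garbage value when the packet reaches the receive path, the model leaves that value untouched.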


    - James
    --
    James Morris


  3. Re: 2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage

    On Wednesday 26 December 2007 4:52:03 pm James Morris wrote:
    > On Thu, 26 Dec 2007, Paul Moore wrote:
    > > As James said I'm away right now and computer access is limited.
    > > However, I'm stuck in the airport right now and spent some time looking
    > > at the code ... Based on what has been found so far I wonder if the
    > > problem isn't a race but a problem of skb->iif never being initialized
    > > correctly? To my untrained eye it looks like __netdev_alloc_skb()
    > > should be setting skb->iif (like it does for skb->dev) but it currently
    > > doesn't.

    >
    > ->iif will be zeroed during skb allocation, then set during
    > netif_receive_skb().


    So it is ... I didn't look at __alloc_skb() close enough. Thanks.

    --
    paul moore
    linux security @ hp

  4. Re: 2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage

    On Wednesday 26 December 2007 4:52:03 pm James Morris wrote:
    > On Thu, 26 Dec 2007, Paul Moore wrote:
    > > As James said I'm away right now and computer access is limited.
    > > However, I'm stuck in the airport right now and spent some time looking
    > > at the code ... Based on what has been found so far I wonder if the
    > > problem isn't a race but a problem of skb->iif never being initialized
    > > correctly? To my untrained eye it looks like __netdev_alloc_skb()
    > > should be setting skb->iif (like it does for skb->dev) but it currently
    > > doesn't.

    >
    > ->iif will be zeroed during skb allocation, then set during
    > netif_receive_skb().


    I was able to reproduce this bug this morning by running avahi as James did
    and did a little more digging. I don't have a fix yet, but thought I would
    pass along what I've found in case this triggers a moment of clarity to
    someone out there ...

    The skb->iif value appears to be messed up as early as netif_receive_skb(); in
    my case it is set to 196611 (trust me, I don't have that many interfaces in
    my test machine) which causes the ->iif initialization code in
    netif_receive_skb() to be skipped because ->iif is greater than zero. This
    particular packet is locally generated and locally consumed.

    Hopefully I'll have a fix later this afternoon but if someone has a bright
    idea I'd love to hear it. Backtrace is below:

    WARNING: at security/selinux/hooks.c:3805 selinux_socket_sock_rcv_skb()
    Pid: 1454, comm: avahi-daemon Not tainted 2.6.24-rc5 #4
    [] selinux_socket_sock_rcv_skb+0x96/0x3ac
    [] printk+0x1b/0x1f
    [] __print_symbol+0x21/0x2a
    [] security_sock_rcv_skb+0xc/0xd
    [] sock_queue_rcv_skb+0x29/0xce
    [] ipt_do_table+0x423/0x466 [ip_tables]
    [] udp_queue_rcv_skb+0x199/0x201
    [] vsnprintf+0x283/0x450
    [] nf_conntrack_in+0x307/0x3d7 [nf_conntrack]
    [] __udp4_lib_rcv+0x3ee/0x7a7
    [] nf_ct_deliver_cached_events+0x8/0x90 [nf_conntrack]
    [] ipv4_confirm+0x34/0x39 [nf_conntrack_ipv4]
    [] nf_iterate+0x3a/0x6e
    [] ip_local_deliver_finish+0x0/0x191
    [] ip_local_deliver_finish+0x0/0x191
    [] ip_local_deliver_finish+0x112/0x191
    [] ip_rcv_finish+0x254/0x273
    [] ip_rcv_finish+0x0/0x273
    [] ip_rcv+0x1cc/0x1fb
    [] ip_rcv_finish+0x0/0x273
    [] ip_rcv+0x0/0x1fb
    [] netif_receive_skb+0x37d/0x397
    [] process_backlog+0x60/0x92
    [] net_rx_action+0x67/0x118
    [] __do_softirq+0x35/0x75
    [] do_softirq+0x3e/0x8d
    [] local_bh_enable+0x6b/0x79
    [] nf_ct_deliver_cached_events+0x8/0x90 [nf_conntrack]
    [] ipv4_confirm+0x34/0x39 [nf_conntrack_ipv4]
    [] ipv4_confirm+0x0/0x39 [nf_conntrack_ipv4]
    [] nf_iterate+0x3a/0x6e
    [] ip_finish_output+0x0/0x208
    [] nf_hook_slow+0x4d/0xb5
    [] ip_finish_output+0x0/0x208
    [] ip_mc_output+0x172/0x18b
    [] ip_finish_output+0x0/0x208
    [] ip_push_pending_frames+0x2be/0x311
    [] dst_output+0x0/0x7
    [] udp_push_pending_frames+0x298/0x2d7
    [] udp_sendmsg+0x459/0x55c
    [] inet_sendmsg+0x3b/0x45
    [] sock_sendmsg+0xc8/0xe3
    [] autoremove_wake_function+0x0/0x33
    [] sock_sendmsg+0xc8/0xe3
    [] autoremove_wake_function+0x0/0x33
    [] copy_from_user+0x32/0x5e
    [] copy_from_user+0x32/0x5e
    [] sys_sendmsg+0x192/0x1f7
    [] current_fs_time+0x13/0x15
    [] file_update_time+0x21/0x61
    [] pipe_write+0x3cc/0x3d8
    [] do_sync_write+0x0/0x109
    [] do_sync_write+0xc6/0x109
    [] autoremove_wake_function+0x0/0x33
    [] sys_socketcall+0x240/0x261
    [] syscall_call+0x7/0xb
    =======================

    --
    paul moore
    linux security @ hp

  5. Re: 2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage

    On Monday 31 December 2007 12:13:32 pm Paul Moore wrote:
    > On Wednesday 26 December 2007 4:52:03 pm James Morris wrote:
    > > On Thu, 26 Dec 2007, Paul Moore wrote:
    > > > As James said I'm away right now and computer access is limited.
    > > > However, I'm stuck in the airport right now and spent some time looking
    > > > at the code ... Based on what has been found so far I wonder if the
    > > > problem isn't a race but a problem of skb->iif never being initialized
    > > > correctly? To my untrained eye it looks like __netdev_alloc_skb()
    > > > should be setting skb->iif (like it does for skb->dev) but it currently
    > > > doesn't.

    > >
    > > ->iif will be zeroed during skb allocation, then set during
    > > netif_receive_skb().

    >
    > I was able to reproduce this bug this morning by running avahi as James did
    > and did a little more digging. I don't have a fix yet, but thought I would
    > pass along what I've found in case this triggers a moment of clarity to
    > someone out there ...
    >
    > The skb->iif value appears to be messed up as early as netif_receive_skb();
    > in my case it is set to 196611 (trust me, I don't have that many interfaces
    > in my test machine) which causes the ->iif initialization code in
    > netif_receive_skb() to be skipped because ->iif is greater than zero. This
    > particular packet is locally generated and locally consumed.
    >
    > Hopefully I'll have a fix later this afternoon but if someone has a bright
    > idea I'd love to hear it ...


    [NOTE: I added netdev to this thread to gather some input. @netdev folks, the
    problem is that the skb->iif field contains garbage in some cases which is
    causing problems for some new SELinux network code. The exact problem
    probably isn't too important for this discussion; what is important is that
    the skb->iif field contains a non-zero garbage value some of the time on
    incoming packets.]

    I'm pretty certain this is an uninitialized value problem now and not a
    use-after-free issue. The invalid/garbage ->iif value seems to only happen
    on packets that are generated locally and sent back into the stack for local
    consumption, e.g. loopback. These local packets also need to have been
    cloned at some point, either on the output or input path.

    The problem appears to be that skb_clone() does not clear the skb
    structure properly and fails to copy the ->iif value from the original skb to
    the cloned skb. From what I can tell, there are two possible solutions to
    this problem:

    1. Clear all of the cloned skb fields in skb_clone() via memset()
    2. Copy the ->iif field in __copy_skb_header()

    I don't have a good enough understanding of all the details involving skb
    memory management to know if option #1 is a Good Idea or not, but option #2
    seems much simpler and solves the problem of garbage in the ->iif field. My
    preference is to go with option #2 but before I submit a patch does anyone
    think this is the wrong solution?

    --
    paul moore
    linux security @ hp

  6. Re: 2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage

    On Mon, 31 Dec 2007, Paul Moore wrote:

    > I'm pretty certain this is an uninitialized value problem now and not a
    > use-after-free issue. The invalid/garbage ->iif value seems to only happen
    > on packets that are generated locally and sent back into the stack for local
    > consumption, e.g. loopback. These local packets also need to have been
    > cloned at some point, either on the output or input path.


    I think we need to find out exactly what's happening, first.

    > The problem appears to be that skb_clone() does not clear the skb
    > structure properly and fails to copy the ->iif value from the original skb to
    > the cloned skb. From what I can tell, there are two possible solutions to
    > this problem:
    >
    > 1. Clear all of the cloned skb fields in skb_clone() via memset()


    Sounds like it's not going to fly for performance reasons in any case.

    > 2. Copy the ->iif field in __copy_skb_header()


    Seems valid.


    - James
    --
    James Morris


  7. Re: 2.6.24-rc6-mm1 - git-lblnet.patch and networking horkage

    On Monday 31 December 2007 4:46:09 pm James Morris wrote:
    > On Mon, 31 Dec 2007, Paul Moore wrote:
    > > I'm pretty certain this is an uninitialized value problem now and not a
    > > use-after-free issue. The invalid/garbage ->iif value seems to only
    > > happen on packets that are generated locally and sent back into the stack
    > > for local consumption, e.g. loopback. These local packets also need to
    > > have been cloned at some point, either on the output or input path.

    >
    > I think we need to find out exactly what's happening, first.


    The more I've looked at the code this afternoon, the more certain I am that this is the case.
    I've also been running a patched kernel (using option #2 from below) and all
    of the skbs coming up the stack have valid ->iif values. Granted, I haven't
    examined the code from the avahi daemon or the tcl test cases and traced the
    entire code path through the kernel but I _am_ certain that at some point in
    that code path the packet is cloned and due to a problem in skb_clone()
    the ->iif field is not copied correctly causing the problems we have all
    seen.

    How much smoke needs to be coming from the gun?

    > > The problem appears to be that skb_clone() does not clear the
    > > skb structure properly and fails to copy the ->iif value from the
    > > original skb to the cloned skb. From what I can tell, there are two
    > > possible solutions to this problem:
    > >
    > > 1. Clear all of the cloned skb fields in skb_clone() via memset()

    >
    > Sounds like it's not going to fly for performance reasons in any case.


    That was my gut feeling. I was also a little unsure where exactly the correct
    placement should be for the memset() call.

    > > 2. Copy the ->iif field in __copy_skb_header()

    >
    > Seems valid.


    Okay, I'll stick with this approach. I'll post a patch based on
    net-2.6.25 tomorrow as an RFC to see if anyone on netdev has any strong
    feelings. If no one complains, I'll add it to the lblnet git tree.

    --
    paul moore
    linux security @ hp
