2.6.24-rc4-mm1 - Kernel



Thread: 2.6.24-rc4-mm1

  1. Re: 2.6.24-rc4-mm1

    Andrew Morton wrote:
    > On Tue, 11 Dec 2007 13:26:58 -0800
    > "Kok, Auke" wrote:
    >
    >> Andrew Morton wrote:
    >>> On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" wrote:
    >>>
    >>>>> - Lots of device IDs have been removed from the e1000 driver and moved
    >>>>> over
    >>>>> to e1000e. So if your e1000 stops working, you forgot to set
    >>>>> CONFIG_E1000E.
    >>>>>
    >>>>>
    >>>> Wouldn't it make sense to just default this to on if E1000 was on, rather
    >>>> than screwing
    >>>> everybody for no good reason (plus breaking all the automated testing, etc
    >>>> etc)?
    >>>> Much though I love random refactoring, it is fairly painful to just keep
    >>>> changing the
    >>>> names of things.
    >>> (cc netdev and Auke)
    >>>
    >>> Yes, that would be very sensible. CONFIG_E1000E should default to whatever
    >>> CONFIG_E1000 was set to.

    >> which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'. the
    >> Kconfig files do not have defaults in them.

    >
    > I wouldn't be looking at defconfig files - I don't think many people use
    > them. Most people use their previous config, via oldconfig.
    >
    > So what we want here is to give them E1000E if they had previously been
    > using E1000. I don't know how one would do this in Kconfig.


    ditto. I doubt that "select E1000E" would be a good idea here (it might
    not even work), and I can't think of anything else.
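
    (The reason a select is dubious: "select" forces the selected symbol on
    while ignoring that symbol's own dependencies, and the user cannot turn
    it back off, so everyone with E1000 would unconditionally get e1000e as
    well. As a rough Kconfig sketch of the two options, for illustration
    only:

    config E1000
    	tristate "Intel(R) PRO/1000 Gigabit Ethernet support"
    	select E1000E	# forced on whenever E1000 is set; not overridable

    versus

    config E1000E
    	tristate "Intel(R) PRO/1000 PCI-Express Gigabit Ethernet support"
    	default E1000	# merely starts at E1000's value; still overridable

    The default-based variant is what the patch in the next message
    implements.)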

    Auke

  2. Re: 2.6.24-rc4-mm1

    On Tue, 11 Dec 2007 14:17:16 -0800 Kok, Auke wrote:

    > Andrew Morton wrote:
    > > On Tue, 11 Dec 2007 13:26:58 -0800
    > > "Kok, Auke" wrote:
    > >
    > >> Andrew Morton wrote:
    > >>> On Tue, 11 Dec 2007 08:13:52 -0800 "Martin Bligh" wrote:
    > >>>
    > >>>>> - Lots of device IDs have been removed from the e1000 driver and moved
    > >>>>> over
    > >>>>> to e1000e. So if your e1000 stops working, you forgot to set
    > >>>>> CONFIG_E1000E.
    > >>>>>
    > >>>>>
    > >>>> Wouldn't it make sense to just default this to on if E1000 was on, rather
    > >>>> than screwing
    > >>>> everybody for no good reason (plus breaking all the automated testing, etc
    > >>>> etc)?
    > >>>> Much though I love random refactoring, it is fairly painful to just keep
    > >>>> changing the
    > >>>> names of things.
    > >>> (cc netdev and Auke)
    > >>>
    > >>> Yes, that would be very sensible. CONFIG_E1000E should default to whatever
    > >>> CONFIG_E1000 was set to.
    > >> which is "y" for x86 and friends, ppc, arm and ia64 through 'defconfig'. the
    > >> Kconfig files do not have defaults in them.

    > >
    > > I wouldn't be looking at defconfig files - I don't think many people use
    > > them. Most people use their previous config, via oldconfig.
    > >
    > > So what we want here is to give them E1000E if they had previously been
    > > using E1000. I don't know how one would do this in Kconfig.

    >
    > ditto. I doubt that "select E1000E" would be a good idea here (it might
    > not even work), and I can't think of anything else.


    "default E1000" in E1000E seems to work for me.

    ---

    From: Randy Dunlap

    Make E1000E default to the same kconfig setting as E1000,
    at least for -mm testing.

    Signed-off-by: Randy Dunlap
    ---
    drivers/net/Kconfig | 1 +
    1 file changed, 1 insertion(+)

    --- linux-2.6.24-rc4-mm1.orig/drivers/net/Kconfig
    +++ linux-2.6.24-rc4-mm1/drivers/net/Kconfig
    @@ -1986,6 +1986,7 @@ config E1000_DISABLE_PACKET_SPLIT
     config E1000E
     	tristate "Intel(R) PRO/1000 PCI-Express Gigabit Ethernet support"
     	depends on PCI
    +	default E1000
     	---help---
     	  This driver supports the PCI-Express Intel(R) PRO/1000 gigabit
     	  ethernet family of adapters. For PCI or PCI-X e1000 adapters,
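
    (With that one-liner, "make oldconfig" on a .config that has E1000 set
    will propose E1000E with that same value (y, m or n) as its default,
    while anyone who has already answered the E1000E prompt keeps their
    choice.)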

  3. Re: 2.6.24-rc4-mm1

    On Tue, 4 Dec 2007 21:17:01 -0800
    Andrew Morton wrote:

    > Changes since 2.6.24-rc3-mm2:


    2.6.24-rc4-mm1 brought a nice TCP oops on my x86_64 system, while I
    was stress-testing the VM and watching via ssh:

    general protection fault: 0000 [1] SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/irq
    CPU 1
    Modules linked in: nfs lockd nfs_acl rfcomm l2cap bluetooth autofs4 sunrpc ipv6 acpi_cpufreq dm_multipath parport_pc e1000e parport firewire_ohci button i2c_i801 i2c_core i82975x_edac pcspkr firewire_core serio_raw edac_core rtc_cmos floppy crc_itu_t sg sr_mod cdrom pata_marvell ata_piix dm_snapshot dm_zero dm_mirror dm_mod ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
    Pid: 2946, comm: sshd Not tainted 2.6.24-rc4-mm1 #1
    RIP: 0010:[] [] __tcp_rb_insert+0x1a/0x67
    RSP: 0018:ffff810066401c88 EFLAGS: 00010202
    RAX: 6b6b6b6b6b6b6b6b RBX: ffff810076e9f000 RCX: ffff81003ddc9900
    RDX: 6b6b6b6b6b6b6bab RSI: ffff81006ed1b148 RDI: 6b6b6b6b6b6b6b5b
    RBP: ffff81006ed1aa00 R08: ffff810076e9f010 R09: 00000000bef8d64e
    R10: ffffffff81228926 R11: ffffffff8110b2aa R12: ffff810066401de8
    R13: 00000000000000e0 R14: ffff810066401ee8 R15: 0000000000000001
    FS: 00007f1c2c10d780(0000) GS:ffff81007f801578(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000002aabfd3 CR3: 00000000665e3000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process sshd (pid: 2946, threadinfo ffff810066400000, task ffff8100665ce000)
    Stack: ffff81003ddc9900 ffffffff81228b26 0000000000000000 0000000100000000
    ffff810066401ee8 00000000810574da 000004e000000040 000000e0000004e0
    00007f1c2c797620 0000000000000246 0000000066401d60 0000000000000000
    Call Trace:
    [] tcp_sendmsg+0x21f/0xb00
    [] sock_aio_write+0xf8/0x110
    [] do_sync_write+0xc9/0x10c
    [] file_has_perm+0x9a/0xa9
    [] autoremove_wake_function+0x0/0x2e
    [] __lock_acquire+0x50f/0xc8e
    [] lock_release_holdtime+0x27/0x48
    [] vfs_write+0xd9/0x16f
    [] sys_write+0x45/0x6e
    [] tracesys+0xdc/0xe1


    Code: 44 3b 4a 1c 79 10 44 3b 4a 18 78 04 0f 0b eb fe 48 8d 50 10
    RIP [] __tcp_rb_insert+0x1a/0x67
    RSP
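
    (The 0x6b6b6b6b6b6b6b6b pattern filling RAX/RDX/RDI above is the slab
    POISON_FREE byte, so __tcp_rb_insert() was almost certainly chasing a
    pointer through an already-freed skb, i.e. a use-after-free in the new
    rb-tree write-queue bookkeeping.)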


    --
    "Debugging is twice as hard as writing the code in the first place.
    Therefore, if you write the code as cleverly as possible, you are,
    by definition, not smart enough to debug it." - Brian W. Kernighan

  4. Re: 2.6.24-rc4-mm1

    Ilpo Järvinen wrote:
    > On Wed, 5 Dec 2007, David Miller wrote:
    >
    >> From: Reuben Farrelly
    >> Date: Thu, 06 Dec 2007 17:59:37 +1100
    >>
    >>> On 5/12/2007 4:17 PM, Andrew Morton wrote:
    >>>> - Lots of device IDs have been removed from the e1000 driver and moved over
    >>>> to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E.
    >>> This non fatal oops which I have just noticed may be related to this change then
    >>> - certainly looks networking related.
    >>>
    >>> WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
    >>> Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
    >>>
    >>> Call Trace:
    >>> [] tcp_fastretrans_alert+0x229/0xe63
    >>> [] tcp_ack+0xa3f/0x127d
    >>> [] tcp_rcv_established+0x55f/0x7f8
    >>> [] tcp_v4_do_rcv+0xdb/0x3a7
    >>> [] :nf_conntrack:nf_ct_deliver_cached_events+0x75/0x99

    >> No, it's from TCP assertions and changes added by Ilpo to the
    >> net-2.6.25 tree recently.

    >
    > Yeah, this is (very likely) due to the new SACK processing (in
    > net-2.6.25). I'll look at what could go wrong with the fack_count
    > calculations; most likely that's the reason (I had earlier found one
    > out-of-place retransmission segment in one of my test cases, which
    > already indicated that there's something incorrect with them, but I
    > didn't have time to debug it yet).
    >
    > Thanks for the report. Some info on how easily you can reproduce it,
    > plus a couple of sentences about the test case, might be useful later
    > on when evaluating the fix.


    I also got plenty of these when untarring a tarball on NFS.

    C.

    WARNING: at /home/legoater/linux/2.6.24-rc4-mm1/net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
    Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #2

    Call Trace:
    [] tcp_fastretrans_alert+0xb6/0xbf2
    [] tcp_ack+0xdf3/0xfbe
    [] sk_reset_timer+0x17/0x23
    [] tcp_rcv_established+0xf3/0x76d
    [] tcp_v4_do_rcv+0x37/0x3aa
    [] tcp_v4_rcv+0x9a9/0xa76
    [] ip_local_deliver_finish+0x161/0x23c
    [] ip_local_deliver+0x72/0x77
    [] ip_rcv_finish+0x371/0x3b5
    [] ip_rcv+0x292/0x2c6
    [] netif_receive_skb+0x267/0x340
    [] :tg3:tg3_poll+0x5d2/0x89e
    [] net_rx_action+0xd5/0x1ad
    [] __do_softirq+0x5f/0xe3
    [] call_softirq+0x1c/0x28
    [] do_softirq+0x39/0x9f
    [] irq_exit+0x4e/0x50
    [] do_IRQ+0xb7/0xd7
    [] mwait_idle+0x0/0x55
    [] ret_from_intr+0x0/0xf
    [] __atomic_notifier_call_chain+0x20/0x83
    [] mwait_idle+0x48/0x55
    [] enter_idle+0x22/0x24
    [] cpu_idle+0xa1/0xc5
    [] start_secondary+0x3b9/0x3c5

    WARNING: at /home/legoater/linux/2.6.24-rc4-mm1/net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
    Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #2

    Call Trace:
    [] tcp_fastretrans_alert+0xb6/0xbf2
    [] tcp_ack+0xdf3/0xfbe
    [] tcp_data_queue+0x5da/0xb0a
    [] tcp_rcv_established+0xf3/0x76d
    [] tcp_v4_do_rcv+0x37/0x3aa
    [] tcp_v4_rcv+0x9a9/0xa76
    [] ip_local_deliver_finish+0x161/0x23c
    [] ip_local_deliver+0x72/0x77
    [] ip_rcv_finish+0x371/0x3b5
    [] ip_rcv+0x292/0x2c6
    [] netif_receive_skb+0x267/0x340
    [] :tg3:tg3_poll+0x5d2/0x89e
    [] net_rx_action+0xd5/0x1ad
    [] __do_softirq+0x5f/0xe3
    [] call_softirq+0x1c/0x28
    [] do_softirq+0x39/0x9f
    [] irq_exit+0x4e/0x50
    [] do_IRQ+0xb7/0xd7
    [] mwait_idle+0x0/0x55
    [] ret_from_intr+0x0/0xf
    [] __atomic_notifier_call_chain+0x20/0x83
    [] mwait_idle+0x48/0x55
    [] enter_idle+0x22/0x24
    [] cpu_idle+0xa1/0xc5
    [] start_secondary+0x3b9/0x3c5

    WARNING: at /home/legoater/linux/2.6.24-rc4-mm1/net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
    Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #2

    Call Trace:
    [] tcp_fastretrans_alert+0xb6/0xbf2
    [] tcp_ack+0xdf3/0xfbe
    [] tcp_data_queue+0x5da/0xb0a
    [] tcp_rcv_established+0xf3/0x76d
    [] tcp_v4_do_rcv+0x37/0x3aa
    [] tcp_v4_rcv+0x9a9/0xa76
    [] ip_local_deliver_finish+0x161/0x23c
    [] ip_local_deliver+0x72/0x77
    [] ip_rcv_finish+0x371/0x3b5
    [] ip_rcv+0x292/0x2c6
    [] netif_receive_skb+0x267/0x340
    [] :tg3:tg3_poll+0x5d2/0x89e
    [] net_rx_action+0xd5/0x1ad
    [] __do_softirq+0x5f/0xe3
    [] call_softirq+0x1c/0x28
    [] do_softirq+0x39/0x9f
    [] irq_exit+0x4e/0x50
    [] do_IRQ+0xb7/0xd7
    [] mwait_idle+0x0/0x55
    [] ret_from_intr+0x0/0xf
    [] __atomic_notifier_call_chain+0x20/0x83
    [] mwait_idle+0x48/0x55
    [] enter_idle+0x22/0x24
    [] cpu_idle+0xa1/0xc5
    [] start_secondary+0x3b9/0x3c5


  5. Re: 2.6.24-rc4-mm1

    Ilpo Järvinen wrote:
    > On Wed, 5 Dec 2007, Andrew Morton wrote:
    >
    >> On Thu, 06 Dec 2007 17:59:37 +1100 Reuben Farrelly wrote:
    >>
    >>> This non fatal oops which I have just noticed may be related to this change then
    >>> - certainly looks networking related.

    >> yep, but it isn't e1000. It's core TCP.
    >>
    >>> WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
    >>> Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1

    >> Ilpo, Reuben's kernel is talking to you

    >
    > ...Please try the patch below. Andrew, this probably fixes your problem
    > (the packets <= tp->packets_out) as well.


    nah. I got the WARNINGs again with this patch.

    C.

    > Dave, please include this one to net-2.6.25.
    >
    >



  6. tcp_sacktag_one() WARNING (was Re: 2.6.24-rc4-mm1)

    Cedric Le Goater wrote:
    > Ilpo Järvinen wrote:
    >> On Wed, 5 Dec 2007, Andrew Morton wrote:
    >>
    >>> On Thu, 06 Dec 2007 17:59:37 +1100 Reuben Farrelly wrote:
    >>>
    >>>> This non fatal oops which I have just noticed may be related to this change then
    >>>> - certainly looks networking related.
    >>> yep, but it isn't e1000. It's core TCP.
    >>>
    >>>> WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
    >>>> Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
    >>> Ilpo, Reuben's kernel is talking to you

    >> ...Please try the patch below. Andrew, this probably fixes your problem
    >> (the packets <= tp->packets_out) as well.

    >
    > nah. I got the WARNINGs again with this patch.


    I got this new one on 2.6.24-rc5-mm1. It looks similar?

    C.

    WARNING: at /home/legoater/linux/2.6.24-rc5-mm1/net/ipv4/tcp_input.c:1280 tcp_sacktag_one()
    Pid: 0, comm: swapper Not tainted 2.6.24-rc5-mm1 #1

    Call Trace:
    [] tcp_sacktag_walk+0x2bc/0x62a
    [] tcp_sacktag_write_queue+0x595/0xa7c
    [] kfree+0xd4/0xe0
    [] tcp_ack+0x2a7/0xfc7
    [] mark_held_locks+0x47/0x6a
    [] trace_hardirqs_on+0xfe/0x139
    [] tcp_rcv_established+0x66a/0x76d
    [] tcp_v4_do_rcv+0x37/0x3aa
    [] tcp_v4_rcv+0x9a9/0xa76
    [] ip_local_deliver_finish+0x161/0x23c
    [] ip_local_deliver+0x72/0x77
    [] ip_rcv_finish+0x371/0x3b5
    [] ip_rcv+0x292/0x2c6
    [] netif_receive_skb+0x267/0x340
    [] :tg3:tg3_poll+0x5d2/0x89e
    [] net_rx_action+0xd5/0x1ad
    [] __do_softirq+0x5f/0xe3
    [] call_softirq+0x1c/0x28
    [] do_softirq+0x39/0x9f
    [] irq_exit+0x4e/0x50
    [] do_IRQ+0xb7/0xd7
    [] mwait_idle+0x0/0x52
    [] ret_from_intr+0x0/0xf
    [] __atomic_notifier_call_chain+0x20/0x83
    [] mwait_idle+0x48/0x52
    [] enter_idle+0x22/0x24
    [] cpu_idle+0xa1/0xc5
    [] start_secondary+0x3b9/0x3c5

  7. Re: 2.6.24-rc4-mm1 - BUG in tcp_fragment

    Andrew Morton wrote:
    > Temporarily at
    >
    > http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/
    >
    > Will appear later at
    >
    > ftp://ftp.kernel.org/pub/linux/kerne....6.24-rc4-mm1/


    I got this one while compiling on NFS.

    C.

    kernel BUG at /home/legoater/linux/2.6.24-rc4-mm1/include/net/tcp.h:1480!
    invalid opcode: 0000 [1] SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/local_cpus
    CPU 1
    Modules linked in: autofs4 nfs lockd sunrpc tg3 sg joydev ext3 jbd ehci_hcd ohci_hcd uhci_hcd
    Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #3
    RIP: 0010:[] [] tcp_fragment+0x5ee/0x6f7
    RSP: 0018:ffff810147c9f9e0 EFLAGS: 00010217
    RAX: 000000001526c311 RBX: ffff8100c2ce1d00 RCX: ffff810143cc6aa0
    RDX: 0000000000000001 RSI: ffff810102b37b00 RDI: ffff810102b37b50
    RBP: ffff810147c9fa50 R08: 000000000000004a R09: 0000000000000001
    R10: 0000000000000b50 R11: 0000000000000001 R12: ffff81013a575700
    R13: 0000000000000000 R14: ffff810143cc6400 R15: ffff81013a575750
    FS: 0000000000000000(0000) GS:ffff810147c57140(0000) knlGS:0000000000000000
    CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00002ad5d294b000 CR3: 00000000bd11b000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process swapper (pid: 0, threadinfo ffff810147c98000, task ffff810147c89040)
    Stack: ffff810147c9fa00 ffffffff00000000 000005a843cc6400 ffff810143cc6400
    ffff810147c9fa70 ffff8100c2ce1d50 ffff810143cc6590 ffff810143cc6aa0
    1526542100000000 ffff810143cc6400 ffff810143cc6400 ffff81013a575700
    Call Trace:
    [] tcp_retransmit_skb+0xd6/0x713
    [] tcp_xmit_retransmit_queue+0xd0/0x330
    [] tcp_fastretrans_alert+0xb92/0xbf2
    [] tcp_ack+0xdf3/0xfbe
    [] tcp_rcv_established+0x66a/0x76d
    [] tcp_v4_do_rcv+0x37/0x3aa
    [] tcp_v4_rcv+0x9a9/0xa76
    [] ip_local_deliver_finish+0x161/0x23c
    [] ip_local_deliver+0x72/0x77
    [] ip_rcv_finish+0x371/0x3b5
    [] ip_rcv+0x292/0x2c6
    [] netif_receive_skb+0x267/0x340
    [] :tg3:tg3_poll+0x5d2/0x89e
    [] net_rx_action+0xd5/0x1ad
    [] __do_softirq+0x5f/0xe3
    [] call_softirq+0x1c/0x28
    [] do_softirq+0x39/0x9f
    [] irq_exit+0x4e/0x50
    [] do_IRQ+0xb7/0xd7
    [] mwait_idle+0x0/0x55
    [] ret_from_intr+0x0/0xf
    [] __atomic_notifier_call_chain+0x20/0x83
    [] mwait_idle+0x48/0x55
    [] enter_idle+0x22/0x24
    [] cpu_idle+0xa1/0xc5
    [] start_secondary+0x3b9/0x3c5


    Code: 0f 0b eb fe 48 85 f6 74 08 8b 46 6c 3b 41 68 75 55 48 8d 41
    RIP [] tcp_fragment+0x5ee/0x6f7
    RSP
    Kernel panic - not syncing: Aiee, killing interrupt handler!

  8. Re: 2.6.24-rc4-mm1 - BUG in tcp_fragment

    On Thu, 13 Dec 2007, Cedric Le Goater wrote:

    > I got this one while compiling on NFS.
    >
    > C.
    >
    > kernel BUG at /home/legoater/linux/2.6.24-rc4-mm1/include/net/tcp.h:1480!


    I'm not exactly sure which patches you have applied and which you have
    not; with rc4-mm1 there are two patches (the first one was incomplete;
    I assume you had at least that one, based on your other mail) needed to
    really fix the issues in (__|)tcp_reset_fack_counts(...). However,
    there seems to be so much breakage that I have a bit of trouble
    deciding where to start... The situation seems a bit scary :-).

    So, I might soon prepare a revert patch for most of the questionable
    TCP parts and ask Dave to apply it (and drop them fully during the next
    rebase), unless I suddenly figure something out soon which explains
    all/most of the problems; then return to the drawing board. ...As it
    seems that the cumulative ACK processing problem discovered later on
    (which has a rather cumbersome solution with skbs only) will make part
    of the work that's currently in net-2.6.25 quite useless/duplicated
    effort. But thanks anyway for reporting these.


    --
    i.

  9. Re: 2.6.24-rc4-mm1 - BUG in tcp_fragment

    Ilpo Järvinen wrote:
    > On Thu, 13 Dec 2007, Cedric Le Goater wrote:
    >
    >> I got this one while compiling on NFS.
    >>
    >> C.
    >>
    >> kernel BUG at /home/legoater/linux/2.6.24-rc4-mm1/include/net/tcp.h:1480!

    >
    > I'm not exactly sure which patches you have applied and which you
    > have not; with rc4-mm1 there are two patches (the first one was
    > incomplete; I assume you had at least that one, based on your other
    > mail) needed to really fix the issues in (__|)tcp_reset_fack_counts(...).


    Yes, I only have the first patch you sent on lkml on top of
    2.6.24-rc4-mm1, attached below. I didn't see the second one on lkml?

    > However, there seems to be so much breakage that I have a bit of
    > trouble deciding where to start... The situation seems a bit scary :-).


    My network environment seems to reproduce these issues quite easily. If
    you need some testing, just ping me.

    Cheers,

    C.

    > So, I might soon prepare a revert patch for most of the questionable
    > TCP parts and ask Dave to apply it (and drop them fully during the
    > next rebase), unless I suddenly figure something out soon which
    > explains all/most of the problems; then return to the drawing board.
    > ...As it seems that the cumulative ACK processing problem discovered
    > later on (which has a rather cumbersome solution with skbs only) will
    > make part of the work that's currently in net-2.6.25 quite
    > useless/duplicated effort. But thanks anyway for reporting these.
    >
    >


    Subject: [PATCH] [TCP]: Fix fack_count miscountings (multiple places)

    1) fack_count is set incorrectly if the highest sent skb is already
    SACKed (skb->prev won't return it because it's on the other list
    already). This manifests as fackets_out counting errors later on; the
    second-order effects are very hard to track, so this may fix all
    outstanding TCP bug reports.

    2) The prev == NULL check was the wrong way around.

    3) The last skb's fack count was incorrectly skipped by the
    while () {} loop.
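
    (Put differently, the invariant all three fixes restore is that
    fack_count accumulates in send order, across both the plain and the
    SACKed queue. As a rough C sketch, for any adjacent pair prev -> skb
    the following must hold:

    	/* fack_count continues from wherever the previous skb left off */
    	TCP_SKB_CB(skb)->fack_count ==
    		TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);

    Any spot that breaks this relation shows up later as a fackets_out
    miscount.)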

    Signed-off-by: Ilpo Järvinen
    ---
    include/net/tcp.h | 22 ++++++++++++++++------
    1 files changed, 16 insertions(+), 6 deletions(-)

    diff --git a/include/net/tcp.h b/include/net/tcp.h
    index 9dbed0b..11a7e3e 100644
    --- a/include/net/tcp.h
    +++ b/include/net/tcp.h
    @@ -1337,10 +1337,20 @@ static inline struct sk_buff *tcp_send_head(struct sock *sk)
     static inline void tcp_advance_send_head(struct sock *sk, struct sk_buff *skb)
     {
     	struct sk_buff *prev = tcp_write_queue_prev(sk, skb);
    +	unsigned int fc = 0;
    +
    +	if (prev == (struct sk_buff *)&sk->sk_write_queue)
    +		prev = NULL;
    +	else if (!tcp_skb_adjacent(sk, prev, skb))
    +		prev = NULL;
     
    -	if (prev != (struct sk_buff *)&sk->sk_write_queue)
    -		TCP_SKB_CB(skb)->fack_count = TCP_SKB_CB(prev)->fack_count +
    -					      tcp_skb_pcount(prev);
    +	if ((prev == NULL) && !__tcp_write_queue_empty(sk, TCP_WQ_SACKED))
    +		prev = __tcp_write_queue_tail(sk, TCP_WQ_SACKED);
    +
    +	if (prev != NULL)
    +		fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
    +
    +	TCP_SKB_CB(skb)->fack_count = fc;
     
     	sk->sk_send_head = tcp_write_queue_next(sk, skb);
     	if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
    @@ -1464,7 +1474,7 @@ static inline struct sk_buff *__tcp_reset_fack_counts(struct sock *sk,
     {
     	unsigned int fc = 0;
     
    -	if (prev == NULL)
    +	if (prev != NULL)
     		fc = TCP_SKB_CB(*prev)->fack_count + tcp_skb_pcount(*prev);
     
     	BUG_ON((*prev != NULL) && !tcp_skb_adjacent(sk, *prev, skb));
    @@ -1521,7 +1531,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb)
     		skb[otherq] = prev->next;
     	}
     
    -	while (skb[queue] != __tcp_write_queue_tail(sk, queue)) {
    +	do {
     		/* Lazy find for the other queue */
     		if (skb[queue] == NULL) {
     			skb[queue] = tcp_write_queue_find(sk, TCP_SKB_CB(prev)->seq,
    @@ -1535,7 +1545,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb)
     			break;
     
     		queue ^= TCP_WQ_SACKED;
    -	}
    +	} while (skb[queue] != __tcp_write_queue_tail(sk, queue));
     }
     
     static inline void __tcp_insert_write_queue_after(struct sk_buff *skb,
    --
    1.5.0.6

  10. [PATCH net-2.6.25] Revert recent TCP work

    On Fri, 14 Dec 2007, Ilpo Järvinen wrote:

    > So, I might soon prepare a revert patch for most of the questionable
    > TCP parts and ask Dave to apply it (and drop them fully during next
    > rebase) unless I suddently figure something out soon which explains
    > all/most of the problems, then return to drawing board. ...As it seems
    > that the cumulative ACK processing problem discovered later on (having
    > rather cumbersome solution with skbs only) will make part of the work
    > that's currently in net-2.6.25 quite useless/duplicate effort. But thanks
    > anyway for reporting these.


    Hi Dave,

    Could you either drop my recent patches (plus one fix to them from
    Herbert Xu, "[TCP]: Fix crash in tcp_advance_send_head"), i.e. all of
    mine after "[TCP]: Abstract tp->highest_sack accessing & point to next
    skb", from net-2.6.25, or just apply the revert from below and do the
    removal during the next rebase. I think it could even be automated by
    something like this (untested):

    for i in $(cat commits | cut -d ' ' -f 1); do git-rebase --onto $i^ $i; done

    (I've attached the commits list.)
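
    (Each "git-rebase --onto $i^ $i" replays everything after commit $i
    onto $i's parent, i.e. it drops $i from the branch. Note that every
    rebase rewrites the IDs of the commits that follow, so the list would
    need to be processed newest-first for the remaining IDs to stay valid.)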

    I'll resend the small bits that are still useful but that get removed
    in this kind of straightforward operation (I guess it's easier for you
    to track this way, and it makes conflicts a non-problem).

    ...It was buggy as well; I've tried to Cc all the bug reporters I've
    noticed so far. Related bugs include at least these cases:

    These are completely removed by this revert:
      __tcp_rb_insert
      (__|)tcp_reset_fack_counts

    These may still trigger later, due to other, genuine bugs:
      tcp_sacktag_one (I'll rework & resend this soon)
      tcp_fastretrans_alert (fackets_out trap)
      BUG_TRAP(packets <= tp->packets_out) in tcp_mark_head_lost

    --
    i.


    [PATCH net-2.6.25] Revert recent TCP work

    It was recently discovered that there is yet another aspect to
    consider in cumulative ACK processing. This solution wasn't enough to
    handle that; "(arguably) complex" and intrusive changes would still
    have been necessary on top of the complexity this already introduced.
    Another approach is on the drawing board.

    This was somewhat buggy as well; a lot of reports against it were
    filed already :-), but hunting down the causes doesn't seem so
    beneficial anymore.

    Signed-off-by: Ilpo Järvinen
    ---
    include/linux/skbuff.h | 3 -
    include/linux/tcp.h | 4 -
    include/net/tcp.h | 362 ++++------------------------------------------
    net/ipv4/tcp_input.c | 341 ++++++++++++++++++++-----------------------
    net/ipv4/tcp_ipv4.c | 1 -
    net/ipv4/tcp_minisocks.c | 1 -
    net/ipv4/tcp_output.c | 13 +-
    net/ipv6/tcp_ipv6.c | 1 -
    8 files changed, 196 insertions(+), 530 deletions(-)
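
    (In short, the revert removes the rb-tree indexing of the write queue,
    the separate queue for SACKed skbs, and the cached fack_count in
    tcp_skb_cb, and restores the plain linked-list SACK-tagging walk along
    with the old tcp_check_dsack()/first_sack_index logic.)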

    diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
    index f21fee6..c618fbf 100644
    --- a/include/linux/skbuff.h
    +++ b/include/linux/skbuff.h
    @@ -18,7 +18,6 @@
    #include
    #include
    #include
    -#include <linux/rbtree.h>

    #include
    #include
    @@ -254,8 +253,6 @@ struct sk_buff {
    struct sk_buff *next;
    struct sk_buff *prev;

    - struct rb_node rb;
    -
    struct sock *sk;
    ktime_t tstamp;
    struct net_device *dev;
    diff --git a/include/linux/tcp.h b/include/linux/tcp.h
    index 56342c3..08027f1 100644
    --- a/include/linux/tcp.h
    +++ b/include/linux/tcp.h
    @@ -174,7 +174,6 @@ struct tcp_md5sig {

    #include
    #include
    -#include <linux/rbtree.h>
    #include
    #include
    #include
    @@ -321,9 +320,6 @@ struct tcp_sock {
    u32 snd_cwnd_used;
    u32 snd_cwnd_stamp;

    - struct rb_root write_queue_rb;
    - struct rb_root sacked_queue_rb;
    - struct sk_buff_head sacked_queue;
    struct sk_buff_head out_of_order_queue; /* Out of order segments go here */

    u32 rcv_wnd; /* Current receiver window */
    diff --git a/include/net/tcp.h b/include/net/tcp.h
    index 5e6c433..5ec1cac 100644
    --- a/include/net/tcp.h
    +++ b/include/net/tcp.h
    @@ -555,7 +555,6 @@ struct tcp_skb_cb {
    __u32 seq; /* Starting sequence number */
    __u32 end_seq; /* SEQ + FIN + SYN + datalen */
    __u32 when; /* used to compute rtt's */
    - unsigned int fack_count; /* speed up SACK processing */
    __u8 flags; /* TCP header flags. */

    /* NOTE: These must match up to the flags byte in a
    @@ -1191,112 +1190,29 @@ static inline void tcp_put_md5sig_pool(void)
    }

    /* write queue abstraction */
    -#define TCP_WQ_SACKED 1
    -
    -static inline struct sk_buff_head *__tcp_list_select(struct sock *sk, const int queue)
    -{
    - if (queue == TCP_WQ_SACKED)
    - return &tcp_sk(sk)->sacked_queue;
    - else
    - return &sk->sk_write_queue;
    -}
    -
    -static inline struct rb_root *__tcp_tree_select(struct sock *sk, const int tree)
    -{
    - if (tree == TCP_WQ_SACKED)
    - return &tcp_sk(sk)->sacked_queue_rb;
    - else
    - return &tcp_sk(sk)->write_queue_rb;
    -}
    -
    -/* All SACKed except S|R go to a separate skb space */
    -static inline int __tcp_skb_queue_select(const struct sk_buff *skb)
    -{
    - if ((TCP_SKB_CB(skb)->sacked &
    - (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS)) ==
    - TCPCB_SACKED_ACKED)
    - return TCP_WQ_SACKED;
    - else
    - return 0;
    -}
    -
    -static inline void tcp_write_queue_init(struct sock *sk)
    -{
    - tcp_sk(sk)->write_queue_rb = RB_ROOT;
    - tcp_sk(sk)->sacked_queue_rb = RB_ROOT;
    - skb_queue_head_init(&tcp_sk(sk)->sacked_queue);
    -}
    -
    -static inline void __tcp_write_queue_purge(struct sock *sk, int queue)
    +static inline void tcp_write_queue_purge(struct sock *sk)
    {
    struct sk_buff *skb;

    - while ((skb = __skb_dequeue(__tcp_list_select(sk, queue))) != NULL)
    + while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
    sk_stream_free_skb(sk, skb);
    - *__tcp_tree_select(sk, queue) = RB_ROOT;
    -}
    -
    -static inline void tcp_write_queue_purge(struct sock *sk)
    -{
    - __tcp_write_queue_purge(sk, 0);
    - __tcp_write_queue_purge(sk, TCP_WQ_SACKED);
    sk_stream_mem_reclaim(sk);
    }

    -static inline struct sk_buff *__tcp_write_queue_head(struct sock *sk, int queue)
    -{
    - struct sk_buff *skb = __tcp_list_select(sk, queue)->next;
    - if (skb == (struct sk_buff *)__tcp_list_select(sk, queue))
    - return NULL;
    - return skb;
    -}
    -
    static inline struct sk_buff *tcp_write_queue_head(struct sock *sk)
    {
    - return __tcp_write_queue_head(sk, 0);
    -}
    -
    -/* FIXME, this should eventually vanish because callers likely benefit
    - * from scanning the non-SACKed and SACKed spaces separately.
    - */
    -static inline struct sk_buff *tcp_real_queue_head(struct sock *sk)
    -{
    - struct sk_buff *skb, *sacked;
    -
    - skb = tcp_write_queue_head(sk);
    - sacked = __tcp_write_queue_head(sk, TCP_WQ_SACKED);
    -
    - if (skb == NULL)
    - return sacked;
    - if (sacked == NULL)
    - return skb;
    -
    - if (after(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(sacked)->seq))
    - return sacked;
    - return skb;
    -}
    -
    -static inline struct sk_buff *__tcp_write_queue_tail(struct sock *sk, int queue)
    -{
    - struct sk_buff *skb = __tcp_list_select(sk, queue)->prev;
    - if (skb == (struct sk_buff *)__tcp_list_select(sk, queue))
    + struct sk_buff *skb = sk->sk_write_queue.next;
    + if (skb == (struct sk_buff *) &sk->sk_write_queue)
    return NULL;
    return skb;
    }

    static inline struct sk_buff *tcp_write_queue_tail(struct sock *sk)
    {
    - return __tcp_write_queue_tail(sk, 0);
    -}
    -
    -static inline int __tcp_write_queue_empty(struct sock *sk, int queue)
    -{
    - return skb_queue_empty(__tcp_list_select(sk, queue));
    -}
    -
    -static inline int tcp_write_queue_empty(struct sock *sk)
    -{
    - return __tcp_write_queue_empty(sk, 0);
    + struct sk_buff *skb = sk->sk_write_queue.prev;
    + if (skb == (struct sk_buff *) &sk->sk_write_queue)
    + return NULL;
    + return skb;
    }

    static inline struct sk_buff *tcp_write_queue_next(struct sock *sk, struct sk_buff *skb)
    @@ -1304,29 +1220,18 @@ static inline struct sk_buff *tcp_write_queue_next(struct sock *sk, struct sk_bu
    return skb->next;
    }

    -static inline struct sk_buff *tcp_write_queue_prev(struct sock *sk, struct sk_buff *skb)
    -{
    - return skb->prev;
    -}
    -
    -static inline int tcp_skb_adjacent(struct sock *sk, struct sk_buff *skb,
    - struct sk_buff *next)
    -{
    - return TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(next)->seq;
    -}
    -
    -#define tcp_for_write_queue(skb, sk, queue) \
    - for (skb = __tcp_list_select(sk, queue)->next; \
    - (skb != (struct sk_buff *)__tcp_list_select(sk, queue));\
    +#define tcp_for_write_queue(skb, sk) \
    + for (skb = (sk)->sk_write_queue.next; \
    + (skb != (struct sk_buff *)&(sk)->sk_write_queue); \
    skb = skb->next)

    -#define tcp_for_write_queue_from(skb, sk, queue) \
    - for (; (skb != (struct sk_buff *)__tcp_list_select(sk, queue));\
    +#define tcp_for_write_queue_from(skb, sk) \
    + for (; (skb != (struct sk_buff *)&(sk)->sk_write_queue);\
    skb = skb->next)

    -#define tcp_for_write_queue_from_safe(skb, tmp, sk, queue) \
    +#define tcp_for_write_queue_from_safe(skb, tmp, sk) \
    for (tmp = skb->next; \
    - (skb != (struct sk_buff *)__tcp_list_select(sk, queue));\
    + (skb != (struct sk_buff *)&(sk)->sk_write_queue); \
    skb = tmp, tmp = skb->next)

    static inline struct sk_buff *tcp_send_head(struct sock *sk)
    @@ -1336,23 +1241,7 @@ static inline struct sk_buff *tcp_send_head(struct sock *sk)

    static inline void tcp_advance_send_head(struct sock *sk, struct sk_buff *skb)
    {
    - struct sk_buff *prev = tcp_write_queue_prev(sk, skb);
    - unsigned int fc = 0;
    -
    - if (prev == (struct sk_buff *)&sk->sk_write_queue)
    - prev = NULL;
    - else if (!tcp_skb_adjacent(sk, prev, skb))
    - prev = NULL;
    -
    - if ((prev == NULL) && !__tcp_write_queue_empty(sk, TCP_WQ_SACKED))
    - prev = __tcp_write_queue_tail(sk, TCP_WQ_SACKED);
    -
    - if (prev != NULL)
    - fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
    -
    - TCP_SKB_CB(skb)->fack_count = fc;
    -
    - sk->sk_send_head = tcp_write_queue_next(sk, skb);
    + sk->sk_send_head = skb->next;
    if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
    sk->sk_send_head = NULL;
    }
    @@ -1368,78 +1257,9 @@ static inline void tcp_init_send_head(struct sock *sk)
    sk->sk_send_head = NULL;
    }

    -static inline struct sk_buff *__tcp_write_queue_find(struct rb_node *rb_node,
    - __u32 seq)
    -{
    - struct sk_buff *skb = NULL;
    -
    - while (rb_node) {
    - struct sk_buff *tmp = rb_entry(rb_node,struct sk_buff,rb);
    - if (after(TCP_SKB_CB(tmp)->end_seq, seq)) {
    - skb = tmp;
    - if (!after(TCP_SKB_CB(tmp)->seq, seq))
    - break;
    - rb_node = rb_node->rb_left;
    - } else
    - rb_node = rb_node->rb_right;
    -
    - }
    - return skb;
    -}
    -
    -static inline struct sk_buff *tcp_write_queue_find(struct sock *sk, __u32 seq, int tree)
    -{
    - return __tcp_write_queue_find(__tcp_tree_select(sk, tree)->rb_node, seq);
    -}
    -
    -/* Inserts skb into RB-tree root, prev node (ie., the skb before the inserted
    - * one) is returned, which is available as a side-effect from parent of the
    - * last rb_right edge. If no rb_right edge is walked, NULL is returned (tree
    - * does not contain a smaller node).
    - */
    -static struct sk_buff *__tcp_rb_insert(struct sk_buff *skb,
    - struct rb_root *root)
    -{
    - struct rb_node **rb_link, *rb_parent;
    - struct sk_buff *prev = NULL;
    - __u32 seq = TCP_SKB_CB(skb)->seq;
    -
    - rb_link = &root->rb_node;
    - rb_parent = NULL;
    - while (*rb_link) {
    - struct sk_buff *tmp;
    -
    - rb_parent = *rb_link;
    - tmp = rb_entry(rb_parent,struct sk_buff,rb);
    - if (after(TCP_SKB_CB(tmp)->end_seq, seq)) {
    - BUG_ON(!after(TCP_SKB_CB(tmp)->seq, seq));
    - rb_link = &rb_parent->rb_left;
    - } else {
    - rb_link = &rb_parent->rb_right;
    - prev = tmp;
    - }
    - }
    - rb_link_node(&skb->rb, rb_parent, rb_link);
    - rb_insert_color(&skb->rb, root);
    -
    - return prev;
    -}
    -
    -static inline void tcp_rb_insert(struct sk_buff *skb, struct rb_root *root)
    -{
    - __tcp_rb_insert(skb, root);
    -}
    -
    -static inline void tcp_rb_unlink(struct sk_buff *skb, struct rb_root *root)
    -{
    - rb_erase(&skb->rb, root);
    -}
    -
    static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
    {
    - TCP_SKB_CB(skb)->fack_count = 0;
    __skb_queue_tail(&sk->sk_write_queue, skb);
    - tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
    }

    static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
    @@ -1455,90 +1275,9 @@ static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb
    }
    }

    -/* This is only used for tcp_send_synack(), so the write queue should
    - * be empty. If that stops being true, the fack_count assignment
    - * will need to be more elaborate.
    - */
    static inline void __tcp_add_write_queue_head(struct sock *sk, struct sk_buff *skb)
    {
    - BUG_ON(!skb_queue_empty(&sk->sk_write_queue));
    __skb_queue_head(&sk->sk_write_queue, skb);
    - TCP_SKB_CB(skb)->fack_count = 0;
    - tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
    -}
    -
    -/* An insert into the middle of the write queue causes the fack
    - * counts in subsequent packets to become invalid, fix them up.
    - *
    - * FIXME, this definately could be improved!
    - */
    -static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb)
    -{
    - struct sk_buff *prev;
    - struct sk_buff *skb[2] = {NULL, NULL};
    - int queue;
    - unsigned int fc = 0;
    -
    - if (!before(TCP_SKB_CB(inskb)->seq, tcp_sk(sk)->snd_nxt))
    - return;
    -
    - queue = __tcp_skb_queue_select(inskb);
    - skb[queue] = inskb;
    -
    - prev = inskb->prev;
    - if (inskb == __tcp_write_queue_head(sk, queue))
    - prev = NULL;
    -
    - if (((prev != NULL) && !tcp_skb_adjacent(sk, prev, inskb)) ||
    - ((prev == NULL) && (TCP_SKB_CB(inskb)->seq != tcp_sk(sk)->snd_una))) {
    - int otherq = queue ^ TCP_WQ_SACKED;
    -
    - BUG_ON (__tcp_write_queue_empty(sk, otherq));
    - prev = tcp_write_queue_find(sk, TCP_SKB_CB(inskb)->seq - 1,
    - otherq);
    - BUG_ON (prev == NULL || prev == tcp_send_head(sk));
    - skb[otherq] = prev->next;
    - }
    -
    - if (prev != NULL)
    - fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
    -
    - while (skb[queue] != (struct sk_buff *)__tcp_list_select(sk, queue)) {
    - /* Lazy find for the other queue */
    - if (skb[queue] == NULL) {
    - skb[queue] = tcp_write_queue_find(sk, TCP_SKB_CB(prev)->seq,
    - queue);
    - if (skb[queue] == NULL)
    - break;
    - }
    -
    - BUG_ON((prev != NULL) && !tcp_skb_adjacent(sk, prev, skb[queue]));
    -
    - tcp_for_write_queue_from(skb[queue], sk, queue) {
    - if ((prev != NULL) && !tcp_skb_adjacent(sk, prev, skb[queue]))
    - break;
    -
    - if (!before(TCP_SKB_CB(skb[queue])->seq, tcp_sk(sk)->snd_nxt) ||
    - TCP_SKB_CB(skb[queue])->fack_count == fc)
    - return;
    -
    - TCP_SKB_CB(skb[queue])->fack_count = fc;
    - fc += tcp_skb_pcount(skb[queue]);
    -
    - prev = skb[queue];
    - }
    -
    - queue ^= TCP_WQ_SACKED;
    - }
    -}
    -
    -static inline void __tcp_insert_write_queue_after(struct sk_buff *skb,
    - struct sk_buff *buff,
    - struct sock *sk,
    - int queue)
    -{
    - __skb_append(skb, buff, __tcp_list_select(sk, queue));
    - tcp_rb_insert(buff, __tcp_tree_select(sk, queue));
    }

    /* Insert buff after skb on the write queue of sk. */
    @@ -1546,74 +1285,36 @@ static inline void tcp_insert_write_queue_after(struct sk_buff *skb,
    struct sk_buff *buff,
    struct sock *sk)
    {
    - __tcp_insert_write_queue_after(skb, buff, sk, __tcp_skb_queue_select(buff));
    - tcp_reset_fack_counts(sk, buff);
    + __skb_append(skb, buff, &sk->sk_write_queue);
    }

    -/* Insert new before skb on the write queue of sk.
    - *
    - * This is only used for tcp_mtu_probe() new send_head injection. If that
    - * stops being true, needs to consider fack_counts and TCP_WQ_SACKED.
    - */
    -static inline void __tcp_insert_write_queue_before(struct sk_buff *new,
    - struct sk_buff *skb,
    - struct sock *sk)
    +/* Insert skb between prev and next on the write queue of sk. */
    +static inline void tcp_insert_write_queue_before(struct sk_buff *new,
    + struct sk_buff *skb,
    + struct sock *sk)
    {
    - BUG_ON(sk->sk_send_head != skb);
    -
    __skb_insert(new, skb->prev, skb, &sk->sk_write_queue);
    - tcp_rb_insert(new, &tcp_sk(sk)->write_queue_rb);
    - sk->sk_send_head = new;
    -}

    -static inline void tcp_unlink_write_queue(struct sk_buff *skb, struct sock *sk)
    -{
    - int queue = __tcp_skb_queue_select(skb);
    -
    - __skb_unlink(skb, __tcp_list_select(sk, queue));
    - tcp_rb_unlink(skb, __tcp_tree_select(sk, queue));
    + if (sk->sk_send_head == skb)
    + sk->sk_send_head = new;
    }

    -/* Moves skb to queue part of the skb space, a bit fragile, call must be made
    - * prior (important) sacked changes (= ->S and &~R)
    - */
    -static inline void tcp_write_queue_requeue(struct sk_buff *skb,
    - struct sock *sk, int queue)
    +static inline void tcp_unlink_write_queue(struct sk_buff *skb, struct sock *sk)
    {
    - struct sk_buff *prev;
    -
    - /* FIXME, most of hints are to be dropped soon... */
    - if (tcp_sk(sk)->scoreboard_skb_hint == skb)
    - tcp_sk(sk)->scoreboard_skb_hint = skb->next;
    - if (tcp_sk(sk)->forward_skb_hint == skb)
    - tcp_sk(sk)->forward_skb_hint = skb->next;
    - /* ...These have related cnt */
    - if (tcp_sk(sk)->lost_skb_hint == skb)
    - tcp_sk(sk)->lost_skb_hint = NULL;
    - if (tcp_sk(sk)->retransmit_skb_hint == skb)
    - tcp_sk(sk)->retransmit_skb_hint = NULL;
    -
    - /* S|R must not be in SACKed space because of mark_lost_retrans walk */
    - if ((queue == TCP_WQ_SACKED) &&
    - (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS))
    - return;
    -
    - tcp_unlink_write_queue(skb, sk);
    -
    - prev = __tcp_rb_insert(skb, __tcp_tree_select(sk, queue));
    - if (prev == NULL)
    - prev = (struct sk_buff *)__tcp_list_select(sk, queue);
    - __skb_append(prev, skb, __tcp_list_select(sk, queue));
    + __skb_unlink(skb, &sk->sk_write_queue);
    }

    static inline int tcp_skb_is_last(const struct sock *sk,
    const struct sk_buff *skb)
    {
    - BUG_ON(__tcp_skb_queue_select(skb) == TCP_WQ_SACKED);
    -
    return skb->next == (struct sk_buff *)&sk->sk_write_queue;
    }

    +static inline int tcp_write_queue_empty(struct sock *sk)
    +{
    + return skb_queue_empty(&sk->sk_write_queue);
    +}
    +
    /* Start sequence of the highest skb with SACKed bit, valid only if
    * sacked > 0 or when the caller has ensured validity by itself.
    */
    @@ -1628,9 +1329,6 @@ static inline u32 tcp_highest_sack_seq(struct tcp_sock *tp)
    return TCP_SKB_CB(tp->highest_sack)->seq;
    }

    -/* This is somewhat dangerous now, because skb must still be in non-sacked
    - * space
    - */
    static inline void tcp_advance_highest_sack(struct sock *sk, struct sk_buff *skb)
    {
    tcp_sk(sk)->highest_sack = tcp_skb_is_last(sk, skb) ? NULL :
    diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
    index 218754b..616bbcb 100644
    --- a/net/ipv4/tcp_input.c
    +++ b/net/ipv4/tcp_input.c
    @@ -1072,7 +1072,7 @@ static void tcp_update_reordering(struct sock *sk, const int metric,
    * the exact amount is rather hard to quantify. However, tp->max_window can
    * be used as an exaggerated estimate.
    */
    -static int tcp_is_sackblock_valid(struct tcp_sock *tp,
    +static int tcp_is_sackblock_valid(struct tcp_sock *tp, int is_dsack,
    u32 start_seq, u32 end_seq)
    {
    /* Too far in future, or reversed (interpretation is ambiguous) */
    @@ -1089,16 +1089,10 @@ static int tcp_is_sackblock_valid(struct tcp_sock *tp,
    if (after(start_seq, tp->snd_una))
    return 1;

    - return 0;
    -}
    -
    -static int tcp_is_past_dsack_useful(struct tcp_sock *tp,
    - u32 start_seq, u32 end_seq)
    -{
    - if (!tp->undo_marker)
    + if (!is_dsack || !tp->undo_marker)
    return 0;

    - /* ...Past D-SACK must reside below snd_una completely */
    + /* ...Then it's D-SACK, and must reside below snd_una completely */
    if (!after(end_seq, tp->snd_una))
    return 0;

    @@ -1138,7 +1132,7 @@ static void tcp_mark_lost_retrans(struct sock *sk)
    icsk->icsk_ca_state != TCP_CA_Recovery)
    return;

    - tcp_for_write_queue(skb, sk, 0) {
    + tcp_for_write_queue(skb, sk) {
    u32 ack_seq = TCP_SKB_CB(skb)->ack_seq;

    if (skb == tcp_send_head(sk))
    @@ -1155,10 +1149,6 @@ static void tcp_mark_lost_retrans(struct sock *sk)
    (tcp_is_fack(tp) ||
    !before(received_upto,
    ack_seq + tp->reordering * tp->mss_cache))) {
    -
    - if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
    - tcp_write_queue_requeue(skb, sk, TCP_WQ_SACKED);
    -
    TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
    tp->retrans_out -= tcp_skb_pcount(skb);

    @@ -1181,6 +1171,39 @@ static void tcp_mark_lost_retrans(struct sock *sk)
    tp->lost_retrans_low = new_low_seq;
    }

    +static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
    + struct tcp_sack_block_wire *sp, int num_sacks,
    + u32 prior_snd_una)
    +{
    + u32 start_seq_0 = ntohl(get_unaligned(&sp[0].start_seq));
    + u32 end_seq_0 = ntohl(get_unaligned(&sp[0].end_seq));
    + int dup_sack = 0;
    +
    + if (before(start_seq_0, TCP_SKB_CB(ack_skb)->ack_seq)) {
    + dup_sack = 1;
    + tcp_dsack_seen(tp);
    + NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV);
    + } else if (num_sacks > 1) {
    + u32 end_seq_1 = ntohl(get_unaligned(&sp[1].end_seq));
    + u32 start_seq_1 = ntohl(get_unaligned(&sp[1].start_seq));
    +
    + if (!after(end_seq_0, end_seq_1) &&
    + !before(start_seq_0, start_seq_1)) {
    + dup_sack = 1;
    + tcp_dsack_seen(tp);
    + NET_INC_STATS_BH(LINUX_MIB_TCPDSACKOFORECV);
    + }
    + }
    +
    + /* D-SACK for already forgotten data... Do dumb counting. */
    + if (dup_sack &&
    + !after(end_seq_0, prior_snd_una) &&
    + after(end_seq_0, tp->undo_marker))
    + tp->undo_retrans--;
    +
    + return dup_sack;
    +}
    +
    /* Check if skb is fully within the SACK block. In presence of GSO skbs,
    * the incoming SACK may not exactly match but we can find smaller MSS
    * aligned portion of it that matches. Therefore we might need to fragment
    @@ -1214,15 +1237,11 @@ static int tcp_match_skb_to_sack(struct sock *sk, struct sk_buff *skb,
    }

    static int tcp_sacktag_one(struct sk_buff *skb, struct sock *sk,
    - int *reord, int dup_sack)
    + int *reord, int dup_sack, int fack_count)
    {
    struct tcp_sock *tp = tcp_sk(sk);
    u8 sacked = TCP_SKB_CB(skb)->sacked;
    int flag = 0;
    - int fack_count;
    -
    - fack_count = TCP_SKB_CB(skb)->fack_count -
    - TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;

    /* Account D-SACK for retransmitted packet. */
    if (dup_sack && (sacked & TCPCB_RETRANS)) {
    @@ -1274,28 +1293,23 @@ static int tcp_sacktag_one(struct sk_buff *skb, struct sock *sk,
    }
    }

    - fack_count += tcp_skb_pcount(skb);
    - if (!before(TCP_SKB_CB(skb)->seq, tcp_highest_sack_seq(tp))) {
    - WARN_ON((fack_count <= tp->fackets_out) ||
    - (fack_count > tp->packets_out));
    -
    - tcp_advance_highest_sack(sk, skb);
    - tp->fackets_out = fack_count;
    - } else
    - WARN_ON(fack_count > tp->fackets_out);
    -
    - tcp_write_queue_requeue(skb, sk, TCP_WQ_SACKED);
    -
    TCP_SKB_CB(skb)->sacked |= TCPCB_SACKED_ACKED;
    flag |= FLAG_DATA_SACKED;
    tp->sacked_out += tcp_skb_pcount(skb);

    + fack_count += tcp_skb_pcount(skb);
    +
    /* Lost marker hint past SACKed? Tweak RFC3517 cnt */
    if (!tcp_is_fack(tp) && (tp->lost_skb_hint != NULL) &&
    before(TCP_SKB_CB(skb)->seq,
    TCP_SKB_CB(tp->lost_skb_hint)->seq))
    tp->lost_cnt_hint += tcp_skb_pcount(skb);

    + if (fack_count > tp->fackets_out)
    + tp->fackets_out = fack_count;
    +
    + if (!before(TCP_SKB_CB(skb)->seq, tcp_highest_sack_seq(tp)))
    + tcp_advance_highest_sack(sk, skb);
    }

    /* D-SACK. We can detect redundant retransmission in S|R and plain R
    @@ -1303,8 +1317,6 @@ static int tcp_sacktag_one(struct sk_buff *skb, struct sock *sk,
    * are accounted above as well.
    */
    if (dup_sack && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS)) {
    - tcp_write_queue_requeue(skb, sk, TCP_WQ_SACKED);
    -
    TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
    tp->retrans_out -= tcp_skb_pcount(skb);
    tp->retransmit_skb_hint = NULL;
    @@ -1314,14 +1326,14 @@ static int tcp_sacktag_one(struct sk_buff *skb, struct sock *sk,
    }

    static struct sk_buff *tcp_sacktag_walk(struct sk_buff *skb, struct sock *sk,
    + struct tcp_sack_block *next_dup,
    u32 start_seq, u32 end_seq,
    - int dup_sack, int *reord, int *flag,
    - int queue)
    + int dup_sack_in, int *fack_count,
    + int *reord, int *flag)
    {
    - struct sk_buff *next;
    -
    - tcp_for_write_queue_from_safe(skb, next, sk, queue) {
    + tcp_for_write_queue_from(skb, sk) {
    int in_sack = 0;
    + int dup_sack = dup_sack_in;

    if (skb == tcp_send_head(sk))
    break;
    @@ -1330,12 +1342,24 @@ static struct sk_buff *tcp_sacktag_walk(struct sk_buff *skb, struct sock *sk,
    if (!before(TCP_SKB_CB(skb)->seq, end_seq))
    break;

    - in_sack = tcp_match_skb_to_sack(sk, skb, start_seq, end_seq);
    + if ((next_dup != NULL) &&
    + before(TCP_SKB_CB(skb)->seq, next_dup->end_seq)) {
    + in_sack = tcp_match_skb_to_sack(sk, skb,
    + next_dup->start_seq,
    + next_dup->end_seq);
    + if (in_sack > 0)
    + dup_sack = 1;
    + }
    +
    + if (in_sack <= 0)
    + in_sack = tcp_match_skb_to_sack(sk, skb, start_seq, end_seq);
    if (unlikely(in_sack < 0))
    break;

    if (in_sack)
    - *flag |= tcp_sacktag_one(skb, sk, reord, dup_sack);
    + *flag |= tcp_sacktag_one(skb, sk, reord, dup_sack, *fack_count);
    +
    + *fack_count += tcp_skb_pcount(skb);
    }
    return skb;
    }
    @@ -1343,72 +1367,37 @@ static struct sk_buff *tcp_sacktag_walk(struct sk_buff *skb, struct sock *sk,
    /* Avoid all extra work that is being done by sacktag while walking in
    * a normal way
    */
    -static struct sk_buff *tcp_sacktag_skip(struct sock *sk, u32 skip_to_seq)
    +static struct sk_buff *tcp_sacktag_skip(struct sk_buff *skb, struct sock *sk,
    + u32 skip_to_seq)
    {
    - struct sk_buff *skb;
    + tcp_for_write_queue_from(skb, sk) {
    + if (skb == tcp_send_head(sk))
    + break;

    - skb = tcp_write_queue_find(sk, skip_to_seq, 0);
    - if (skb == tcp_write_queue_head(sk))
    - skb = NULL;
    + if (!before(TCP_SKB_CB(skb)->end_seq, skip_to_seq))
    + break;
    + }
    return skb;
    }

    -static int tcp_handle_dsack(struct sock *sk, struct sk_buff *ack_skb,
    - struct tcp_sack_block_wire *sp, u32 *reord,
    - int num_sacks, u32 prior_snd_una)
    +static struct sk_buff *tcp_maybe_skipping_dsack(struct sk_buff *skb,
    + struct sock *sk,
    + struct tcp_sack_block *next_dup,
    + u32 skip_to_seq,
    + int *fack_count, int *reord,
    + int *flag)
    {
    - struct tcp_sock *tp = tcp_sk(sk);
    - struct sk_buff *skb;
    - u32 start_seq_0 = ntohl(get_unaligned(&sp[0].start_seq));
    - u32 end_seq_0 = ntohl(get_unaligned(&sp[0].end_seq));
    - int flag = 0;
    + if (next_dup == NULL)
    + return skb;

    - if (before(start_seq_0, TCP_SKB_CB(ack_skb)->ack_seq)) {
    - flag |= FLAG_DSACKING_ACK;
    - tcp_dsack_seen(tp);
    - NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV);
    -
    - if (!tcp_is_past_dsack_useful(tp, start_seq_0, end_seq_0)) {
    - if (!tp->undo_marker)
    - NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDNOUNDO);
    - else
    - NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDOLD);
    -
    - return flag;
    - }
    -
    - /* D-SACK for already forgotten data... Do dumb counting. */
    - if (!after(end_seq_0, prior_snd_una))
    - tp->undo_retrans--;
    -
    - } else if (num_sacks > 1) {
    - u32 end_seq_1 = ntohl(get_unaligned(&sp[1].end_seq));
    - u32 start_seq_1 = ntohl(get_unaligned(&sp[1].start_seq));
    -
    - if (!after(end_seq_0, end_seq_1) &&
    - !before(start_seq_0, start_seq_1)) {
    - flag |= FLAG_DSACKING_ACK;
    - tcp_dsack_seen(tp);
    - NET_INC_STATS_BH(LINUX_MIB_TCPDSACKOFORECV);
    - if (!tcp_is_sackblock_valid(tp, start_seq_0, end_seq_0)) {
    - /* FIXME, reordering check like in the other place! */
    - NET_INC_STATS_BH(LINUX_MIB_TCPSACKDISCARD);
    - return flag;
    - }
    - }
    + if (before(next_dup->start_seq, skip_to_seq)) {
    + skb = tcp_sacktag_skip(skb, sk, next_dup->start_seq);
    + tcp_sacktag_walk(skb, sk, NULL,
    + next_dup->start_seq, next_dup->end_seq,
    + 1, fack_count, reord, flag);
    }

    - if ((flag & FLAG_DSACKING_ACK) && after(end_seq_0, prior_snd_una)) {
    - skb = tcp_write_queue_find(sk, start_seq_0, TCP_WQ_SACKED);
    - if (skb != NULL)
    - tcp_sacktag_walk(skb, sk, start_seq_0, end_seq_0, 1, reord, &flag, TCP_WQ_SACKED);
    -
    - skb = tcp_write_queue_find(sk, start_seq_0, 0);
    - if (skb != NULL)
    - tcp_sacktag_walk(skb, sk, start_seq_0, end_seq_0, 1, reord, &flag, 0);
    - }
    -
    - return flag;
    + return skb;
    }

    static int tcp_sack_cache_ok(struct tcp_sock *tp, struct tcp_sack_block *cache)
    @@ -1431,7 +1420,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
    int used_sacks;
    int reord = tp->packets_out;
    int flag = 0;
    + int found_dup_sack = 0;
    + int fack_count;
    int i, j;
    + int first_sack_index;

    if (!tp->sacked_out) {
    if (WARN_ON(tp->fackets_out))
    @@ -1439,7 +1431,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
    tcp_highest_sack_reset(sk);
    }

    - flag |= tcp_handle_dsack(sk, ack_skb, sp_wire, &reord, num_sacks, prior_snd_una);
    + found_dup_sack = tcp_check_dsack(tp, ack_skb, sp_wire,
    + num_sacks, prior_snd_una);
    + if (found_dup_sack)
    + flag |= FLAG_DSACKING_ACK;

    /* Eliminate too old ACKs, but take into
    * account more or less fresh ones, they can
    @@ -1452,17 +1447,30 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
    goto out;

    used_sacks = 0;
    - for (i = (flag & FLAG_DSACKING_ACK) ? 1 : 0; i < num_sacks; i++) {
    + first_sack_index = 0;
    + for (i = 0; i < num_sacks; i++) {
    + int dup_sack = !i && found_dup_sack;
    +
    sp[used_sacks].start_seq = ntohl(get_unaligned(&sp_wire[i].start_seq));
    sp[used_sacks].end_seq = ntohl(get_unaligned(&sp_wire[i].end_seq));

    - if (!tcp_is_sackblock_valid(tp, sp[used_sacks].start_seq,
    + if (!tcp_is_sackblock_valid(tp, dup_sack,
    + sp[used_sacks].start_seq,
    sp[used_sacks].end_seq)) {
    - /* Don't count olds caused by ACK reordering */
    - if ((TCP_SKB_CB(ack_skb)->ack_seq != tp->snd_una) &&
    - !after(sp[used_sacks].end_seq, tp->snd_una))
    - continue;
    - NET_INC_STATS_BH(LINUX_MIB_TCPSACKDISCARD);
    + if (dup_sack) {
    + if (!tp->undo_marker)
    + NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDNOUNDO);
    + else
    + NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDOLD);
    + } else {
    + /* Don't count olds caused by ACK reordering */
    + if ((TCP_SKB_CB(ack_skb)->ack_seq != tp->snd_una) &&
    + !after(sp[used_sacks].end_seq, tp->snd_una))
    + continue;
    + NET_INC_STATS_BH(LINUX_MIB_TCPSACKDISCARD);
    + }
    + if (i == 0)
    + first_sack_index = -1;
    continue;
    }

    @@ -1482,11 +1490,16 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
    tmp = sp[j];
    sp[j] = sp[j+1];
    sp[j+1] = tmp;
    +
    + /* Track where the first SACK block goes to */
    + if (j == first_sack_index)
    + first_sack_index = j+1;
    }
    }
    }

    skb = tcp_write_queue_head(sk);
    + fack_count = 0;
    i = 0;

    if (!tp->sacked_out) {
    @@ -1503,6 +1516,11 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
    while (i < used_sacks) {
    u32 start_seq = sp[i].start_seq;
    u32 end_seq = sp[i].end_seq;
    + int dup_sack = (found_dup_sack && (i == first_sack_index));
    + struct tcp_sack_block *next_dup = NULL;
    +
    + if (found_dup_sack && ((i + 1) == first_sack_index))
    + next_dup = &sp[i + 1];

    /* Event "B" in the comment above. */
    if (after(end_seq, tp->high_seq))
    @@ -1514,36 +1532,36 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
    cache++;

    /* Can skip some work by looking recv_sack_cache? */
    - if (tcp_sack_cache_ok(tp, cache) &&
    + if (tcp_sack_cache_ok(tp, cache) && !dup_sack &&
    after(end_seq, cache->start_seq)) {

    /* Head todo? */
    if (before(start_seq, cache->start_seq)) {
    - skb = tcp_sacktag_skip(sk, start_seq);
    - if (skb == NULL)
    - break;
    - skb = tcp_sacktag_walk(skb, sk, start_seq,
    - cache->start_seq, 0,
    - &reord, &flag, 0);
    + skb = tcp_sacktag_skip(skb, sk, start_seq);
    + skb = tcp_sacktag_walk(skb, sk, next_dup, start_seq,
    + cache->start_seq, dup_sack,
    + &fack_count, &reord, &flag);
    }

    /* Rest of the block already fully processed? */
    if (!after(end_seq, cache->end_seq))
    goto advance_sp;

    + skb = tcp_maybe_skipping_dsack(skb, sk, next_dup, cache->end_seq,
    + &fack_count, &reord, &flag);
    +
    /* ...tail remains todo... */
    if (tcp_highest_sack_seq(tp) == cache->end_seq) {
    /* ...but better entrypoint exists! */
    skb = tcp_highest_sack(sk);
    if (skb == NULL)
    break;
    + fack_count = tp->fackets_out;
    cache++;
    goto walk;
    }

    - skb = tcp_sacktag_skip(sk, cache->end_seq);
    - if (skb == NULL)
    - break;
    + skb = tcp_sacktag_skip(skb, sk, cache->end_seq);
    /* Check overlap against next cached too (past this one already) */
    cache++;
    continue;
    @@ -1553,14 +1571,13 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
    skb = tcp_highest_sack(sk);
    if (skb == NULL)
    break;
    + fack_count = tp->fackets_out;
    }
    - skb = tcp_sacktag_skip(sk, start_seq);
    - if (skb == NULL)
    - break;
    + skb = tcp_sacktag_skip(skb, sk, start_seq);

    walk:
    - skb = tcp_sacktag_walk(skb, sk, start_seq, end_seq,
    - 0, &reord, &flag, 0);
    + skb = tcp_sacktag_walk(skb, sk, next_dup, start_seq, end_seq,
    + dup_sack, &fack_count, &reord, &flag);

    advance_sp:
    /* SACK enhanced FRTO (RFC4138, Appendix B): Clearing correct
    @@ -1657,7 +1674,6 @@ int tcp_use_frto(struct sock *sk)
    {
    const struct tcp_sock *tp = tcp_sk(sk);
    struct sk_buff *skb;
    - struct sk_buff *notsacked; /* Or S|R => deny basic F-RTO */

    if (!sysctl_tcp_frto)
    return 0;
    @@ -1669,19 +1685,15 @@ int tcp_use_frto(struct sock *sk)
    if (tp->retrans_out > 1)
    return 0;

    - notsacked = tcp_write_queue_head(sk);
    - /* Not interested in head skb here because F-RTO is reentrable if only
    - * head skb has been retransmitted (equals to multiple RTOs case)
    - */
    - notsacked = tcp_write_queue_next(sk, notsacked);
    - if ((notsacked != NULL) && TCP_SKB_CB(notsacked)->sacked & TCPCB_RETRANS)
    - return 0;
    -
    - tcp_for_write_queue(skb, sk, TCP_WQ_SACKED) {
    + skb = tcp_write_queue_head(sk);
    + skb = tcp_write_queue_next(sk, skb); /* Skips head */
    + tcp_for_write_queue_from(skb, sk) {
    + if (skb == tcp_send_head(sk))
    + break;
    if (TCP_SKB_CB(skb)->sacked&TCPCB_RETRANS)
    return 0;
    - /* Short-circuit when past first non-SACKed skb */
    - if (after(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(notsacked)->seq))
    + /* Short-circuit when first non-SACKed skb has been checked */
    + if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED))
    break;
    }
    return 1;
    @@ -1782,7 +1794,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
    if (tcp_is_reno(tp))
    tcp_reset_reno_sack(tp);

    - tcp_for_write_queue(skb, sk, 0) {
    + tcp_for_write_queue(skb, sk) {
    if (skb == tcp_send_head(sk))
    break;

    @@ -1880,16 +1892,9 @@ void tcp_enter_loss(struct sock *sk, int how)
    tp->sacked_out = 0;
    tp->fackets_out = 0;
    tcp_clear_all_retrans_hints(tp);
    -
    - tcp_for_write_queue(skb, sk, TCP_WQ_SACKED) {
    - /* FIXME, this could be optimized by avoiding tree
    - * deletes
    - */
    - tcp_write_queue_requeue(skb, sk, 0);
    - }
    }

    - tcp_for_write_queue(skb, sk, 0) {
    + tcp_for_write_queue(skb, sk) {
    if (skb == tcp_send_head(sk))
    break;

    @@ -1923,7 +1928,7 @@ static int tcp_check_sack_reneging(struct sock *sk)
    * receiver _host_ is heavily congested (or buggy).
    * Do processing similar to RTO timeout.
    */
    - if ((skb = tcp_real_queue_head(sk)) != NULL &&
    + if ((skb = tcp_write_queue_head(sk)) != NULL &&
    (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)) {
    struct inet_connection_sock *icsk = inet_csk(sk);
    NET_INC_STATS_BH(LINUX_MIB_TCPSACKRENEGING);
    @@ -2122,21 +2127,6 @@ static void tcp_verify_retransmit_hint(struct tcp_sock *tp,
    tp->retransmit_skb_hint = NULL;
    }

    -/* Simple NewReno thing: Mark head LOST if it wasn't yet and it's below
    - * high_seq, stop. That's all.
    - */
    -static void tcp_mark_head_lost_single(struct sock *sk)
    -{
    - struct tcp_sock *tp = tcp_sk(sk);
    - struct sk_buff *skb = tcp_write_queue_head(sk);
    -
    - if (!(TCP_SKB_CB(skb)->sacked & TCPCB_LOST) &&
    - before(tp->snd_una, tp->high_seq)) {
    - TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
    - tp->lost_out += tcp_skb_pcount(skb);
    - }
    -}
    -
    /* Mark head of queue up as lost. With RFC3517 SACK, the packets is
    * is against sacked "cnt", otherwise it's against facked "cnt"
    */
    @@ -2145,10 +2135,6 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit)
    struct tcp_sock *tp = tcp_sk(sk);
    struct sk_buff *skb;
    int cnt;
    - unsigned int fc;
    - unsigned int fack_count_base;
    -
    - fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;

    BUG_TRAP(packets <= tp->packets_out);
    if (tp->lost_skb_hint) {
    @@ -2159,7 +2145,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit)
    cnt = 0;
    }

    - tcp_for_write_queue_from(skb, sk, 0) {
    + tcp_for_write_queue_from(skb, sk) {
    if (skb == tcp_send_head(sk))
    break;
    /* TODO: do this better */
    @@ -2167,18 +2153,9 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit)
    tp->lost_skb_hint = skb;
    tp->lost_cnt_hint = cnt;

    - fc = TCP_SKB_CB(skb)->fack_count;
    - if (tcp_is_fack(tp)) {
    - cnt = fc - fack_count_base + tcp_skb_pcount(skb);
    - } else {
    - if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
    - cnt += tcp_skb_pcount(skb);
    - /* Add SACK blocks between this and skb->prev */
    - if ((skb != tcp_write_queue_head(sk)) &&
    - !tcp_skb_adjacent(sk, skb->prev, skb))
    - cnt += fc - TCP_SKB_CB(skb->prev)->fack_count -
    - tcp_skb_pcount(skb->prev);
    - }
    + if (tcp_is_fack(tp) ||
    + (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
    + cnt += tcp_skb_pcount(skb);

    if (((!fast_rexmit || (tp->lost_out > 0)) && (cnt > packets)) ||
    after(TCP_SKB_CB(skb)->end_seq, tp->high_seq))
    @@ -2189,6 +2166,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit)
    tcp_verify_retransmit_hint(tp, skb);
    }
    }
    + tcp_verify_left_out(tp);
    }

    /* Account newly detected lost packet(s) */
    @@ -2198,7 +2176,7 @@ static void tcp_update_scoreboard(struct sock *sk, int fast_rexmit)
    struct tcp_sock *tp = tcp_sk(sk);

    if (tcp_is_reno(tp)) {
    - tcp_mark_head_lost_single(sk);
    + tcp_mark_head_lost(sk, 1, fast_rexmit);
    } else if (tcp_is_fack(tp)) {
    int lost = tp->fackets_out - tp->reordering;
    if (lost <= 0)
    @@ -2211,8 +2189,6 @@ static void tcp_update_scoreboard(struct sock *sk, int fast_rexmit)
    tcp_mark_head_lost(sk, sacked_upto, fast_rexmit);
    }

    - tcp_verify_left_out(tp);
    -
    /* New heuristics: it is possible only after we switched
    * to restart timer each time when something is ACKed.
    * Hence, we can detect timed out packets during fast
    @@ -2224,7 +2200,7 @@ static void tcp_update_scoreboard(struct sock *sk, int fast_rexmit)
    skb = tp->scoreboard_skb_hint ? tp->scoreboard_skb_hint
    : tcp_write_queue_head(sk);

    - tcp_for_write_queue_from(skb, sk, 0) {
    + tcp_for_write_queue_from(skb, sk) {
    if (skb == tcp_send_head(sk))
    break;
    if (!tcp_skb_timedout(sk, skb))
    @@ -2422,7 +2398,7 @@ static int tcp_try_undo_loss(struct sock *sk)

    if (tcp_may_undo(tp)) {
    struct sk_buff *skb;
    - tcp_for_write_queue(skb, sk, 0) {
    + tcp_for_write_queue(skb, sk) {
    if (skb == tcp_send_head(sk))
    break;
    TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST;
    @@ -2528,8 +2504,11 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
    (tcp_fackets_out(tp) > tp->reordering));
    int fast_rexmit = 0;

    - if (WARN_ON(!tp->packets_out && tp->sacked_out))
    + /* Some technical things:
    + * 1. Reno does not count dupacks (sacked_out) automatically. */
    + if (!tp->packets_out)
    tp->sacked_out = 0;
    +
    if (WARN_ON(!tp->sacked_out && tp->fackets_out))
    tp->fackets_out = 0;

    @@ -2794,7 +2773,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
    s32 seq_rtt = -1;
    ktime_t last_ackt = net_invalid_timestamp();

    - while ((skb = tcp_real_queue_head(sk)) && skb != tcp_send_head(sk)) {
    + while ((skb = tcp_write_queue_head(sk)) && skb != tcp_send_head(sk)) {
    struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
    u32 end_seq;
    u32 acked_pcount;
    diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
    index 5a27e42..652c323 100644
    --- a/net/ipv4/tcp_ipv4.c
    +++ b/net/ipv4/tcp_ipv4.c
    @@ -1849,7 +1849,6 @@ static int tcp_v4_init_sock(struct sock *sk)
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct tcp_sock *tp = tcp_sk(sk);

    - tcp_write_queue_init(sk);
    skb_queue_head_init(&tp->out_of_order_queue);
    tcp_init_xmit_timers(sk);
    tcp_prequeue_init(tp);
    diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
    index e1a0e4a..b61b768 100644
    --- a/net/ipv4/tcp_minisocks.c
    +++ b/net/ipv4/tcp_minisocks.c
    @@ -426,7 +426,6 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,

    tcp_set_ca_state(newsk, TCP_CA_Open);
    tcp_init_xmit_timers(newsk);
    - tcp_write_queue_init(newsk);
    skb_queue_head_init(&newtp->out_of_order_queue);
    newtp->write_seq = treq->snt_isn + 1;
    newtp->pushed_seq = newtp->write_seq;
    diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
    index 6110459..9a985b5 100644
    --- a/net/ipv4/tcp_output.c
    +++ b/net/ipv4/tcp_output.c
    @@ -1207,7 +1207,7 @@ static int tso_fragment(struct sock *sk, struct sk_buff *skb, unsigned int len,

    /* Link BUFF into the send queue. */
    skb_header_release(buff);
    - __tcp_insert_write_queue_after(skb, buff, sk, 0);
    + tcp_insert_write_queue_after(skb, buff, sk);

    return 0;
    }
    @@ -1344,10 +1344,10 @@ static int tcp_mtu_probe(struct sock *sk)
    nskb->csum = 0;
    nskb->ip_summed = skb->ip_summed;

    - __tcp_insert_write_queue_before(nskb, skb, sk);
    + tcp_insert_write_queue_before(nskb, skb, sk);

    len = 0;
    - tcp_for_write_queue_from_safe(skb, next, sk, 0) {
    + tcp_for_write_queue_from_safe(skb, next, sk) {
    copy = min_t(int, skb->len, probe_size - len);
    if (nskb->ip_summed)
    skb_copy_bits(skb, 0, skb_put(nskb, copy), copy);
    @@ -1760,7 +1760,7 @@ void tcp_simple_retransmit(struct sock *sk)
    unsigned int mss = tcp_current_mss(sk, 0);
    int lost = 0;

    - tcp_for_write_queue(skb, sk, 0) {
    + tcp_for_write_queue(skb, sk) {
    if (skb == tcp_send_head(sk))
    break;
    if (skb->len > mss &&
    @@ -1848,7 +1848,6 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
    (skb->len < (cur_mss >> 1)) &&
    (tcp_write_queue_next(sk, skb) != tcp_send_head(sk)) &&
    (!tcp_skb_is_last(sk, skb)) &&
    - (tcp_skb_adjacent(sk, skb, tcp_write_queue_next(sk, skb))) &&
    (skb_shinfo(skb)->nr_frags == 0 && skb_shinfo(tcp_write_queue_next(sk, skb))->nr_frags == 0) &&
    (tcp_skb_pcount(skb) == 1 && tcp_skb_pcount(tcp_write_queue_next(sk, skb)) == 1) &&
    (sysctl_tcp_retrans_collapse != 0))
    @@ -1937,7 +1936,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)

    /* First pass: retransmit lost packets. */
    if (tp->lost_out) {
    - tcp_for_write_queue_from(skb, sk, 0) {
    + tcp_for_write_queue_from(skb, sk) {
    __u8 sacked = TCP_SKB_CB(skb)->sacked;

    if (skb == tcp_send_head(sk))
    @@ -2010,7 +2009,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
    else
    skb = tcp_write_queue_head(sk);

    - tcp_for_write_queue_from(skb, sk, 0) {
    + tcp_for_write_queue_from(skb, sk) {
    if (skb == tcp_send_head(sk))
    break;
    tp->forward_skb_hint = skb;
    diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
    index d576833..0ef9986 100644
    --- a/net/ipv6/tcp_ipv6.c
    +++ b/net/ipv6/tcp_ipv6.c
    @@ -1886,7 +1886,6 @@ static int tcp_v6_init_sock(struct sock *sk)
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct tcp_sock *tp = tcp_sk(sk);

    - tcp_write_queue_init(sk);
    skb_queue_head_init(&tp->out_of_order_queue);
    tcp_init_xmit_timers(sk);
    tcp_prequeue_init(tp);
    --
    1.5.0.6
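
    For anyone wanting to test the revert locally: the message above carries a
    plain git-format diff, so the usual check-then-apply flow works. A minimal
    sketch, assuming the patch was saved as revert-tcp.patch (hypothetical
    name) inside a net-2.6.25 tree:

        # Preview which files the revert touches, then dry-run it
        git apply --stat revert-tcp.patch
        git apply --check revert-tcp.patch
        # No complaints? Apply it to the working tree
        git apply revert-tcp.patch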

  11. Re: [PATCH net-2.6.25] Revert recent TCP work

    From: "Ilpo_Järvinen"
    Date: Fri, 14 Dec 2007 22:14:29 +0200 (EET)

    > Could you either drop my recent patches (plus one fix to them from Herbert
    > Xu, "[TCP]: Fix crash in tcp_advance_send_head"), that is, all of mine after
    > "[TCP]: Abstract tp->highest_sack accessing & point to next skb" in
    > net-2.6.25, or just apply the revert below and do the removal during the
    > next rebase.
    > I think it could even be automated by something like this (untested):
    > for i in $(cat commits | cut -d ' ' -f 1); do git-rebase --onto $i^ $i; done
    > (I've attached the commits list).


    I'll take care of this when I rebase the net-2.6.25 tree later
    today.

    Thanks Ilpo.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/
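
    An editorial aside on the automation quoted in Ilpo's message above:
    git rebase --onto $i^ $i replays every commit after $i onto $i's parent,
    which rewrites history with $i removed. A slightly expanded, equally
    untested sketch, assuming (as in the original) a commits file whose first
    whitespace-separated field on each line is a commit SHA:

        #!/bin/sh
        # Drop each commit listed in 'commits' from the current branch.
        while read sha rest; do
            # Replay the commits that follow $sha onto its parent,
            # i.e. rewrite the branch with $sha removed.
            git rebase --onto "$sha^" "$sha" || exit 1
        done < commits

    One caveat: each iteration rewrites the SHAs of everything after the
    dropped commit, so the listed SHAs only stay valid across iterations if
    the file is ordered newest first; the older entries are then ancestors of
    each drop point and are left untouched.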
