[tbench regression fixes]: digging out smelly deadmen. - Kernel

This is a discussion on [tbench regression fixes]: digging out smelly deadmen. - Kernel ; On Fri, 31 Oct 2008, Eric Dumazet wrote: > Ilpo Järvinen a écrit : > > On Fri, 31 Oct 2008, Eric Dumazet wrote: > > > > > David Miller a écrit : > > > > From: "Ilpo ...

+ Reply to Thread
Page 5 of 5 FirstFirst ... 3 4 5
Results 81 to 92 of 92

Thread: [tbench regression fixes]: digging out smelly deadmen.

  1. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Fri, 31 Oct 2008, Eric Dumazet wrote:

    > Ilpo Järvinen a écrit :
    > > On Fri, 31 Oct 2008, Eric Dumazet wrote:
    > >
    > > > David Miller a écrit :
    > > > > From: "Ilpo Järvinen"
    > > > > Date: Fri, 31 Oct 2008 11:40:16 +0200 (EET)
    > > > > > > Let me remind that it is just a single process, so no ping-pong
    > >> & other
    > > > > > lock related cache effects should play any significant role here,
    > > > no? (I'm
    > > > > > no expert though :-)).
    > > > > > Not locks or ping-pongs perhaps, I guess. So it just sends and
    > > > > receives over a socket, implementing both ends of the communication
    > > > > in the same process?
    > > > > > If hash chain conflicts do happen for those 2 sockets, just
    > > > traversing
    > > > > the chain 2 entries deep could show up.
    > > >
    > > > tbench is very sensible to cache line ping-pongs (on SMP machines of
    > > > course)

    > >
    > > ...Sorry to disappoint you but we were discussion there on my AIM9 tcp_test
    > > results :-).
    > >

    >
    > Well, before you added AIM9 on this topic, we were focusing on tbench
    >
    > Sorry to disappoint you


    It's all Stephen's fault, he added port randomization first... ;-)

    --
    i.

  2. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Fri, 31 Oct 2008 11:45:33 +0100
    Eric Dumazet wrote:

    > David Miller a écrit :
    > > From: "Ilpo Järvinen"
    > > Date: Fri, 31 Oct 2008 11:40:16 +0200 (EET)
    > >
    > >> Let me remind that it is just a single process, so no ping-pong & other
    > >> lock related cache effects should play any significant role here, no? (I'm
    > >> no expert though :-)).

    > >
    > > Not locks or ping-pongs perhaps, I guess. So it just sends and
    > > receives over a socket, implementing both ends of the communication
    > > in the same process?
    > >
    > > If hash chain conflicts do happen for those 2 sockets, just traversing
    > > the chain 2 entries deep could show up.

    >
    > tbench is very sensible to cache line ping-pongs (on SMP machines of course)
    >
    > Just to prove my point, I coded the following patch and tried it
    > on a HP BL460c G1. This machine has 2 quad cores cpu
    > (Intel(R) Xeon(R) CPU E5450 @3.00GHz)
    >
    > tbench 8 went from 2240 MB/s to 2310 MB/s after this patch applied
    >
    > [PATCH] net: Introduce netif_set_last_rx() helper
    >
    > On SMP machine, loopback device (and possibly others net device)
    > should try to avoid dirty the memory cache line containing "last_rx"
    > field. Got 3% increase on tbench on a 8 cpus machine.
    >
    > Signed-off-by: Eric Dumazet
    > ---
    > drivers/net/loopback.c | 2 +-
    > include/linux/netdevice.h | 16 ++++++++++++++++
    > 2 files changed, 17 insertions(+), 1 deletion(-)
    >
    >


    Why bother with last_rx at all on loopback. I have been thinking
    we should figure out a way to get rid of last_rx all together. It only
    seems to be used by bonding, and the bonding driver could do the calculation
    in its receive handling.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Fri, Oct 31, 2008 at 12:57:13PM -0700, Stephen Hemminger (shemminger@vyatta.com) wrote:
    > Why bother with last_rx at all on loopback. I have been thinking
    > we should figure out a way to get rid of last_rx all together. It only
    > seems to be used by bonding, and the bonding driver could do the calculation
    > in its receive handling.


    Not related to the regression: bug will be just papered out by this
    changes. Having bonding on loopback is somewhat strange idea, but still
    this kind of changes is an attempt to make a good play in the bad game:
    this loopback-only optimization does not fix the problem.

    --
    Evgeniy Polyakov
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [tbench regression fixes]: digging out smelly deadmen.

    Evgeniy Polyakov a écrit :
    > On Fri, Oct 31, 2008 at 12:57:13PM -0700, Stephen Hemminger (shemminger@vyatta.com) wrote:
    >> Why bother with last_rx at all on loopback. I have been thinking
    >> we should figure out a way to get rid of last_rx all together. It only
    >> seems to be used by bonding, and the bonding driver could do the calculation
    >> in its receive handling.

    >
    > Not related to the regression: bug will be just papered out by this
    > changes. Having bonding on loopback is somewhat strange idea, but still
    > this kind of changes is an attempt to make a good play in the bad game:
    > this loopback-only optimization does not fix the problem.
    >


    Just to be clear, this change was not meant to be committed.
    It already was rejected by David some years ago (2005, and 2006)

    http://www.mail-archive.com/netdev@v.../msg07382.html

    If you read my mail, I was *only* saying that tbench results can be sensible to
    cache line ping pongs. tbench is a crazy benchmark, and only is a crazy benchmark.

    Optimizing linux for tbench sake would be .... crazy ?

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [tbench regression fixes]: digging out smelly deadmen.

    Hi Eric.

    On Fri, Oct 31, 2008 at 10:03:00PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
    > Just to be clear, this change was not meant to be committed.
    > It already was rejected by David some years ago (2005, and 2006)
    >
    > http://www.mail-archive.com/netdev@v.../msg07382.html
    >
    > If you read my mail, I was *only* saying that tbench results can be
    > sensible to
    > cache line ping pongs. tbench is a crazy benchmark, and only is a crazy
    > benchmark.


    No problem Eric, I just pointed that this particular case is rather
    fluffy, which really does not fix anything. It improves the case, but
    the way it does it, is not the right one imho.
    We would definitely want to eliminate assignment of global constantly
    updated variables in the pathes where it is not required, but in a
    way which does improve the design and implementation, but not to
    hide some other problem.

    Tbench is, well, as is it is quite usual network server
    Dbench side is rather non-optimized, but still it is quite common
    pattern of small-sized IO. Anyway, optimizing for some kind of the
    workload tends to force other side to become slower, so I agree of
    course that any narrow-viewed optimizations are bad, and instead we
    should focus on searching error patter more widerspread.

    --
    Evgeniy Polyakov
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Fri, 31 Oct 2008 16:51:44 -0700 (PDT)
    David Miller wrote:

    > From: Eric Dumazet
    > Date: Fri, 31 Oct 2008 22:03:00 +0100
    >
    > > Evgeniy Polyakov a écrit :
    > > > On Fri, Oct 31, 2008 at 12:57:13PM -0700, Stephen Hemminger (shemminger@vyatta.com) wrote:
    > > >> Why bother with last_rx at all on loopback. I have been thinking
    > > >> we should figure out a way to get rid of last_rx all together. It only
    > > >> seems to be used by bonding, and the bonding driver could do the calculation
    > > >> in its receive handling.
    > > > Not related to the regression: bug will be just papered out by this
    > > > changes. Having bonding on loopback is somewhat strange idea, but still
    > > > this kind of changes is an attempt to make a good play in the bad game:
    > > > this loopback-only optimization does not fix the problem.

    > >
    > > Just to be clear, this change was not meant to be committed.
    > > It already was rejected by David some years ago (2005, and 2006)
    > >
    > > http://www.mail-archive.com/netdev@v.../msg07382.html

    >
    > However, I do like Stephen's suggestion that maybe we can get rid of
    > this ->last_rx thing by encapsulating the logic completely in the
    > bonding driver.


    Since bonding driver doesn't actually see the rx packets, that isn't
    really possible. But it would be possible to change last_rx from a variable
    to an function pointer, so that device's could apply other logic to derive
    the last value. One example would be to keep it per cpu and then take the
    maximum.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [tbench regression fixes]: digging out smelly deadmen.

    From: Eric Dumazet
    Date: Fri, 31 Oct 2008 22:03:00 +0100

    > Evgeniy Polyakov a écrit :
    > > On Fri, Oct 31, 2008 at 12:57:13PM -0700, Stephen Hemminger (shemminger@vyatta.com) wrote:
    > >> Why bother with last_rx at all on loopback. I have been thinking
    > >> we should figure out a way to get rid of last_rx all together. It only
    > >> seems to be used by bonding, and the bonding driver could do the calculation
    > >> in its receive handling.

    > > Not related to the regression: bug will be just papered out by this
    > > changes. Having bonding on loopback is somewhat strange idea, but still
    > > this kind of changes is an attempt to make a good play in the bad game:
    > > this loopback-only optimization does not fix the problem.

    >
    > Just to be clear, this change was not meant to be committed.
    > It already was rejected by David some years ago (2005, and 2006)
    >
    > http://www.mail-archive.com/netdev@v.../msg07382.html


    However, I do like Stephen's suggestion that maybe we can get rid of
    this ->last_rx thing by encapsulating the logic completely in the
    bonding driver.

    > If you read my mail, I was *only* saying that tbench results can be sensible to
    > cache line ping pongs. tbench is a crazy benchmark, and only is a crazy benchmark.
    >
    > Optimizing linux for tbench sake would be .... crazy ?


    Unlike dbench I think tbench is worth cranking up as much as possible.

    It doesn't have a huge memory working set, it just writes mostly small
    messages over a TCP socket back and forth, and does a lot of blocking

    And I think we'd like all of those operating to run as fast as possible.

    When Tridge first wrote tbench I would see the expected things at the
    top of the profiles. Things like tcp_ack(), copy to/from user, and
    perhaps SLAB.

    Things have changed considerably.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [tbench regression fixes]: digging out smelly deadmen.

    Stephen Hemminger wrote:

    >On Fri, 31 Oct 2008 16:51:44 -0700 (PDT)
    >David Miller wrote:

    [...]
    >> However, I do like Stephen's suggestion that maybe we can get rid of
    >> this ->last_rx thing by encapsulating the logic completely in the
    >> bonding driver.

    >
    >Since bonding driver doesn't actually see the rx packets, that isn't
    >really possible. But it would be possible to change last_rx from a variable
    >to an function pointer, so that device's could apply other logic to derive
    >the last value. One example would be to keep it per cpu and then take the
    >maximum.


    I suspect it could also be tucked away in skb_bond_should_drop,
    which is called both by the standard input path and the VLAN accelerated
    path to see if the packet should be tossed (e.g., it arrived on an
    inactive bonding slave).

    Since last_rx is part of struct net_device, I don't think any
    additional bonding internals knowledge would be needed. It could be
    arranged to only update last_rx for devices that are actually bonding
    slaves.

    Just off the top of my head (haven't tested this), something
    like this:

    diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
    index c8bcb59..ed1e58f 100644
    --- a/include/linux/netdevice.h
    +++ b/include/linux/netdevice.h
    @@ -1743,22 +1743,24 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
    struct net_device *dev = skb->dev;
    struct net_device *master = dev->master;

    - if (master &&
    - (dev->priv_flags & IFF_SLAVE_INACTIVE)) {
    - if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
    - skb->protocol == __constant_htons(ETH_P_ARP))
    - return 0;
    -
    - if (master->priv_flags & IFF_MASTER_ALB) {
    - if (skb->pkt_type != PACKET_BROADCAST &&
    - skb->pkt_type != PACKET_MULTICAST)
    + if (master) {
    + dev->last_rx = jiffies;
    + if (dev->priv_flags & IFF_SLAVE_INACTIVE)) {
    + if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
    + skb->protocol == __constant_htons(ETH_P_ARP))
    return 0;
    - }
    - if (master->priv_flags & IFF_MASTER_8023AD &&
    - skb->protocol == __constant_htons(ETH_P_SLOW))
    - return 0;

    - return 1;
    + if (master->priv_flags & IFF_MASTER_ALB) {
    + if (skb->pkt_type != PACKET_BROADCAST &&
    + skb->pkt_type != PACKET_MULTICAST)
    + return 0;
    + }
    + if (master->priv_flags & IFF_MASTER_8023AD &&
    + skb->protocol == __constant_htons(ETH_P_SLOW))
    + return 0;
    +
    + return 1;
    + }
    }
    return 0;
    }


    That doesn't move the storage out of struct net_device, but it
    does stop the updates for devices that aren't bonding slaves. It could
    probably be refined further to only update when the ARP monitor is
    running (the gizmo that uses last_rx).

    -J

    ---
    -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Wed, Oct 29, 2008 at 10:50 AM, Evgeniy Polyakov wrote:
    > On Wed, Oct 29, 2008 at 12:14:05PM +0300, Evgeniy Polyakov (zbr@ioremap.net) wrote:
    >> vanilla 27 : 347.222
    >> no TSO/GSO : 357.331
    >> no hrticks : 382.983
    >> no balance : 389.802
    >> 4403b4 commit : 361.184
    >> dirty_ratio-50 : 361.086
    >> no-sched-tweaks : 361.367
    >>
    >> So, probably, if we revert -tip merge to vanilla .27, add nohrtick patch
    >> and nobalance tweak _only_, and apply naive TSO patch we could bring
    >> system to 400 MB/s. Note, that .22 has 479.82 and .23 454.36 MB/s.

    >
    > And now I have to admit that the very last -top merge did noticebly
    > improve the situation upto 391.331 MB/s (189 in domains, with tso/gso
    > off and naive tcp_tso_should_defer() hange).
    >
    > So we are now essentially at the level of 24-25 trees in my tests.


    That's good and make think whether it would be a good idea to add some
    performance number in each pull request that affect the core part of
    the kernel.

    Ciao,
    --
    Paolo
    http://paolo.ciarrocchi.googlepages.com/
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [tbench regression fixes]: digging out smelly deadmen.

    From: Jay Vosburgh
    Date: Fri, 31 Oct 2008 17:16:33 -0700

    > I suspect it could also be tucked away in skb_bond_should_drop,
    > which is called both by the standard input path and the VLAN accelerated
    > path to see if the packet should be tossed (e.g., it arrived on an
    > inactive bonding slave).
    >
    > Since last_rx is part of struct net_device, I don't think any
    > additional bonding internals knowledge would be needed. It could be
    > arranged to only update last_rx for devices that are actually bonding
    > slaves.
    >
    > Just off the top of my head (haven't tested this), something
    > like this:

    ...
    >
    > That doesn't move the storage out of struct net_device, but it
    > does stop the updates for devices that aren't bonding slaves. It could
    > probably be refined further to only update when the ARP monitor is
    > running (the gizmo that uses last_rx).


    I like this very much.

    Jay can you give this a quick test by just trying this patch
    and removing the ->last_rx setting in the driver you use for
    your test?

    Once you do that, I'll apply this to net-next-2.6 and do the
    leg work to zap all of the ->last_rx updates from the entire tree.

    Thanks!
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. [PATCH net-next-2.6] bonding, net: Move last_rx update into bonding recv logic


    The only user of the net_device->last_rx field is bonding. This
    patch adds a conditional update of last_rx to the bonding special logic
    in skb_bond_should_drop, causing last_rx to only be updated when the ARP
    monitor is running.

    This frees network device drivers from the necessity of updating
    last_rx, which can have cache line thrash issues.

    Signed-off-by: Jay Vosburgh

    diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
    index 56c823c..39575d7 100644
    --- a/drivers/net/bonding/bond_main.c
    +++ b/drivers/net/bonding/bond_main.c
    @@ -4564,6 +4564,8 @@ static int bond_init(struct net_device *bond_dev, struct bond_params *params)
    bond_dev->tx_queue_len = 0;
    bond_dev->flags |= IFF_MASTER|IFF_MULTICAST;
    bond_dev->priv_flags |= IFF_BONDING;
    + if (bond->params.arp_interval)
    + bond_dev->priv_flags |= IFF_MASTER_ARPMON;

    /* At first, we block adding VLANs. That's the only way to
    * prevent problems that occur when adding VLANs over an
    diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
    index 296a865..e400d7d 100644
    --- a/drivers/net/bonding/bond_sysfs.c
    +++ b/drivers/net/bonding/bond_sysfs.c
    @@ -620,6 +620,8 @@ static ssize_t bonding_store_arp_interval(struct device *d,
    ": %s: Setting ARP monitoring interval to %d.\n",
    bond->dev->name, new_value);
    bond->params.arp_interval = new_value;
    + if (bond->params.arp_interval)
    + bond->dev->priv_flags |= IFF_MASTER_ARPMON;
    if (bond->params.miimon) {
    printk(KERN_INFO DRV_NAME
    ": %s: ARP monitoring cannot be used with MII monitoring. "
    @@ -1039,6 +1041,7 @@ static ssize_t bonding_store_miimon(struct device *d,
    "ARP monitoring. Disabling ARP monitoring...\n",
    bond->dev->name);
    bond->params.arp_interval = 0;
    + bond->dev->priv_flags &= ~IFF_MASTER_ARPMON;
    if (bond->params.arp_validate) {
    bond_unregister_arp(bond);
    bond->params.arp_validate =
    diff --git a/include/linux/if.h b/include/linux/if.h
    index 6524684..2a6e296 100644
    --- a/include/linux/if.h
    +++ b/include/linux/if.h
    @@ -65,6 +65,7 @@
    #define IFF_BONDING 0x20 /* bonding master or slave */
    #define IFF_SLAVE_NEEDARP 0x40 /* need ARPs for validation */
    #define IFF_ISATAP 0x80 /* ISATAP interface (RFC4214) */
    +#define IFF_MASTER_ARPMON 0x100 /* bonding master, ARP mon in use */

    #define IF_GET_IFACE 0x0001 /* for querying only */
    #define IF_GET_PROTO 0x0002
    diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
    index 9d77b1d..f1b0dbe 100644
    --- a/include/linux/netdevice.h
    +++ b/include/linux/netdevice.h
    @@ -1742,22 +1742,26 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
    struct net_device *dev = skb->dev;
    struct net_device *master = dev->master;

    - if (master &&
    - (dev->priv_flags & IFF_SLAVE_INACTIVE)) {
    - if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
    - skb->protocol == __constant_htons(ETH_P_ARP))
    - return 0;
    -
    - if (master->priv_flags & IFF_MASTER_ALB) {
    - if (skb->pkt_type != PACKET_BROADCAST &&
    - skb->pkt_type != PACKET_MULTICAST)
    + if (master) {
    + if (master->priv_flags & IFF_MASTER_ARPMON)
    + dev->last_rx = jiffies;
    +
    + if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
    + if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
    + skb->protocol == __constant_htons(ETH_P_ARP))
    return 0;
    - }
    - if (master->priv_flags & IFF_MASTER_8023AD &&
    - skb->protocol == __constant_htons(ETH_P_SLOW))
    - return 0;

    - return 1;
    + if (master->priv_flags & IFF_MASTER_ALB) {
    + if (skb->pkt_type != PACKET_BROADCAST &&
    + skb->pkt_type != PACKET_MULTICAST)
    + return 0;
    + }
    + if (master->priv_flags & IFF_MASTER_8023AD &&
    + skb->protocol == __constant_htons(ETH_P_SLOW))
    + return 0;
    +
    + return 1;
    + }
    }
    return 0;
    }
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [PATCH net-next-2.6] bonding, net: Move last_rx update into bonding recv logic

    From: Jay Vosburgh
    Date: Mon, 03 Nov 2008 18:13:09 -0800

    >
    > The only user of the net_device->last_rx field is bonding. This
    > patch adds a conditional update of last_rx to the bonding special logic
    > in skb_bond_should_drop, causing last_rx to only be updated when the ARP
    > monitor is running.
    >
    > This frees network device drivers from the necessity of updating
    > last_rx, which can have cache line thrash issues.
    >
    > Signed-off-by: Jay Vosburgh


    Applied, thanks a lot Jay.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 5 of 5 FirstFirst ... 3 4 5