disappointing performance with Ethernet bonding on Linux
Hello,
Our group has a small, mostly-homogeneous Linux cluster (5 boxes, two
dual-core Opterons per box, GigE interconnects on PCI-X bus) that
we're using to develop a parallelized engineering code. The code is
compiled against MPICH2. From what we're seeing right now, inter-node
communication (in the form of relatively few bulk-transfers of data)
represents a large chunk of the code's execution time. In an attempt
to improve performance at a modest cost, we implemented NIC (channel)
bonding.
The results that I'm seeing, so far, aren't all that impressive. We
bought a round of dual-port GigE NICs, so we're bonding 3 NICs per
box. We have an 802.3ad-compliant switch (Linksys SLM2024), so we're
running bonding mode=4, but we've tried several of the others, to
little avail. The basic benchmarks we're running (subounce v.1.0) are
only showing sporadic ~10% improvements in bandwidth and latency.
Performance of our primary code of interest has improved by only
5%-20%, which is significantly less than I was expecting.
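A rough way to judge whether 5%-20% is really low: if a fraction of the run time is bandwidth-bound communication and bonding multiplies the effective bandwidth by some factor, Amdahl's law gives an upper bound on the overall speedup. This is a minimal sketch under that assumption (communication time scales as 1/bandwidth, compute time unchanged); latency-bound messages will pull the real figure lower. The function name and the example fractions are hypothetical.

```python
def amdahl_speedup(comm_fraction: float, bandwidth_factor: float) -> float:
    """Upper bound on overall speedup when only the communication part gets faster.

    Assumes communication time scales as 1 / bandwidth (bandwidth-bound,
    not latency-bound transfers) and that compute time is unchanged.
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be between 0 and 1")
    if bandwidth_factor <= 0.0:
        raise ValueError("bandwidth_factor must be positive")
    # The compute share stays the same; the communication share shrinks.
    return 1.0 / ((1.0 - comm_fraction) + comm_fraction / bandwidth_factor)


if __name__ == "__main__":
    # Hypothetical communication shares, with 3 ideally bonded links.
    for f in (0.1, 0.3, 0.5):
        print(f"comm fraction {f:.1f}: at most {amdahl_speedup(f, 3.0):.2f}x")
```

With a 10% communication share, even a perfect 3x bandwidth gain yields only about 7% overall; with 50%, the ceiling is 1.5x.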
The contents of /proc/net/bonding/bond0 are at the end of this post.
At some point, I'll increase the maximum packet size (MTU), which is
currently at the default level, but it seems like something more
fundamental is wrong here. Should the "Number of Ports" perhaps read
higher than "1"?
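On the MTU point, a back-of-the-envelope estimate of TCP payload efficiency per full-size Ethernet frame shows what raising the MTU alone can buy. It assumes IPv4 and TCP headers without options (20 + 20 bytes), no VLAN tag, and the standard 38 bytes of per-frame wire overhead (preamble and start delimiter 8, Ethernet header 14, FCS 4, inter-frame gap 12). The function names are illustrative, and the numbers are computed, not measured.

```python
# Per-frame overhead on the wire, in bytes (assumed: no VLAN tag).
PREAMBLE_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12
WIRE_OVERHEAD = PREAMBLE_SFD + ETH_HEADER + FCS + INTERFRAME_GAP  # 38

# IPv4 + TCP headers without options (assumed; timestamps would add 12).
IP_TCP_HEADERS = 20 + 20


def payload_efficiency(mtu: int) -> float:
    """Fraction of link time spent carrying TCP payload for full-size frames."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + WIRE_OVERHEAD
    return payload / on_wire


def max_goodput_mbit(mtu: int, link_mbit: float = 1000.0) -> float:
    """Best-case TCP goodput on a single link of the given speed."""
    return link_mbit * payload_efficiency(mtu)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} efficient, "
              f"~{max_goodput_mbit(mtu):.0f} Mbit/s on GigE")
```

The raw efficiency gain is only about 4 percentage points (roughly 949 vs. 991 Mbit/s per link); the larger benefit of jumbo frames is usually fewer packets per byte transferred, which this model does not capture. Every NIC and switch port on the path must accept the larger frames.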
Can anyone think of something we might be forgetting to do? Have I
misunderstood what channel bonding is capable of? Any experience or
pointers would be greatly appreciated. Let me know if more info would
be helpful.
Thanks,
Greg
***
[fischega@master BOUNCE]$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v2.6.1 (October 29, 2004)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1
Actor Key: 17
Partner Key: 1
Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:e0:81:43:75:9c
Aggregator ID: 1

Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:13:4c:e8
Aggregator ID: 2

Slave Interface: eth3
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:13:4c:e9
Aggregator ID: 3
***
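On the "Number of ports" question: in 802.3ad mode, slaves that LACP has aggregated together report the same Aggregator ID, and "Number of ports" counts the members of the active aggregator, so with all three links aggregated it would be expected to read 3. In the output above each slave reports a different Aggregator ID (1, 2, 3) and the partner MAC is all zeros, which suggests LACP never came up with the switch (for example, if the switch ports are not configured as an LACP group) and that only eth0 carries bond traffic. The sketch below checks this from the status text. It assumes the field names printed by this driver version (v2.6.1); other versions may lay the file out differently. The function names are hypothetical, and the sample input is an excerpt of the output above.

```python
import re
from typing import Dict, List, Optional, Tuple

# Excerpt of the /proc/net/bonding/bond0 output posted above.
SAMPLE = """\
802.3ad info
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1
Slave Interface: eth0
Aggregator ID: 1
Slave Interface: eth2
Aggregator ID: 2
Slave Interface: eth3
Aggregator ID: 3
"""


def parse_bond_status(text: str) -> Tuple[Optional[int], Dict[str, int]]:
    """Return (active aggregator ID, {slave name: aggregator ID})."""
    active_id: Optional[int] = None
    slaves: Dict[str, int] = {}
    current_slave: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current_slave = line.split(":", 1)[1].strip()
            continue
        m = re.match(r"Aggregator ID:\s*(\d+)", line)
        if m:
            agg = int(m.group(1))
            if current_slave is None:
                # Seen before any slave section: the active aggregator.
                active_id = agg
            else:
                slaves[current_slave] = agg
    return active_id, slaves


def slaves_in_active_aggregator(text: str) -> List[str]:
    """Slaves that share the active aggregator, i.e. can carry bond traffic."""
    active_id, slaves = parse_bond_status(text)
    return [name for name, agg in slaves.items() if agg == active_id]


if __name__ == "__main__":
    # On a live system, read open("/proc/net/bonding/bond0").read() instead.
    active = slaves_in_active_aggregator(SAMPLE)
    _, all_slaves = parse_bond_status(SAMPLE)
    print(f"{len(active)} of {len(all_slaves)} slaves aggregated: {active}")
```

For the posted output this reports only eth0; once LACP negotiates with the switch, all slaves should share one Aggregator ID and the partner MAC should be non-zero.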
Re: disappointing performance with Ethernet bonding on Linux
On 04.03.2008, 22:14, <Greg.A.Fischer@gmail.com> wrote:
> Hello,
>
> Our group has a small, mostly-homogeneous Linux cluster (5 boxes, two
> dual-core Opterons per box, GigE interconnects on PCI-X bus) that
> we're using to develop a parallel-ized engineering code. The code is
> compiled against MPICH2. From what we're seeing right now, inter-node
> communication (in the form of relatively few bulk-transfers of data)
> represents a large chunk of the code's execution time. In an attempt
> to improve performance at a modest cost, we implemented NIC (channel)
> bonding.
Since I don't know much about Ethernet bonding, I will try
to challenge your analysis.
If you have positively measured that
- bonding improves the bandwidth between the nodes, and
- the small latency increase caused by bonding does not hurt,
then my conclusion is that bandwidth may not be your problem.
How did you conclude that bandwidth is the problem? Did you check
whether the long time spent in MPI is actually caused by load imbalance,
i.e. ranks waiting for slower ranks, rather than by the data transfers?
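One way to check this, as a sketch: time the compute phase on each rank separately from the MPI calls (for example with MPI_Wtime() around each phase), gather the per-rank figures, and compare the slowest rank with the average. A ratio well above 1 means much of the apparent "communication" time is ranks idling in MPI until the slowest one arrives, which extra bandwidth cannot fix. The function names and the timings below are hypothetical placeholders, not measurements.

```python
from statistics import mean
from typing import List, Sequence


def imbalance_ratio(compute_times: Sequence[float]) -> float:
    """max / mean of per-rank compute time; 1.0 means perfectly balanced."""
    if not compute_times:
        raise ValueError("need at least one rank")
    avg = mean(compute_times)
    if avg <= 0:
        raise ValueError("compute times must be positive")
    return max(compute_times) / avg


def expected_wait(compute_times: Sequence[float]) -> List[float]:
    """Idle time per rank at the next synchronizing MPI call, assuming
    every rank must wait for the slowest one before proceeding."""
    slowest = max(compute_times)
    return [slowest - t for t in compute_times]


if __name__ == "__main__":
    # Hypothetical per-rank compute times in seconds (placeholders).
    times = [10.0, 10.5, 9.8, 14.0]
    print(f"imbalance ratio: {imbalance_ratio(times):.2f}")
    print("idle time per rank:", [round(w, 2) for w in expected_wait(times)])
```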
Regards
Georg
--
This signature was left intentionally almost blank.
http://www.this-page-intentionally-left-blank.org/
Re: disappointing performance with Ethernet bonding on Linux
In comp.os.linux.networking Greg.A.Fischer@gmail.com wrote:
> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
IIRC this means that any one "flow" will use only one link in the bond,
and a "flow" will be defined as traffic going to a given destination MAC
address.
On the "inbound" side, it will be whatever the switch does; IIRC most
switches will also use the destination MAC address by default.
Depending on the distribution of the MAC addresses in your cluster,
you may or may not get very good distribution of traffic among the
links in your bond.
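To see how "one flow, one link" plays out on a five-node cluster, here is a simplified model of the bonding driver's default "layer2" transmit hash: XOR the source and destination MAC addresses and take the result modulo the number of slaves. Assumptions: only the last byte of each address is used, as in older driver versions (newer kernels hash more of the header); the bond uses eth0's MAC from the post above, as bond0 typically takes its first slave's address; and the four peer MACs are hypothetical. The function names are illustrative.

```python
from typing import Dict, List


def mac_last_byte(mac: str) -> int:
    """Last octet of a colon-separated MAC address, e.g. '..:9c' -> 0x9c."""
    return int(mac.split(":")[-1], 16)


def layer2_slave(src_mac: str, dst_mac: str, n_slaves: int) -> int:
    """Simplified layer2 hash: (src XOR dst) mod slave count, last byte only."""
    return (mac_last_byte(src_mac) ^ mac_last_byte(dst_mac)) % n_slaves


def link_usage(bond_mac: str, peer_macs: List[str],
               n_slaves: int) -> Dict[int, List[str]]:
    """Which peers' outbound traffic each slave link would carry."""
    usage: Dict[int, List[str]] = {i: [] for i in range(n_slaves)}
    for peer in peer_macs:
        usage[layer2_slave(bond_mac, peer, n_slaves)].append(peer)
    return usage


if __name__ == "__main__":
    bond_mac = "00:e0:81:43:75:9c"  # eth0's address from the post
    peers = ["00:e0:81:43:75:a0", "00:e0:81:43:75:a4",  # hypothetical
             "00:e0:81:43:75:a8", "00:e0:81:43:75:ac"]
    for link, dsts in link_usage(bond_mac, peers, 3).items():
        print(f"slave {link}: {len(dsts)} peer(s)")
```

With these addresses two peers share one link and the other links get one peer each. Whatever the distribution, under this model a single bulk transfer to one peer is pinned to one link, so a two-node test cannot exceed one link's bandwidth in this mode.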
rick jones
--
portable adj, code that compiles under more than one compiler
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Re: disappointing performance with Ethernet bonding on Linux
Hello,
On Tue, 4 Mar 2008 13:14:06 -0800 (PST)
Greg.A.Fischer@gmail.com wrote:
[...]
> In an attempt
> to improve performance at a modest cost, we implemented NIC (channel)
> bonding.
>
> The results that I'm seeing, so far, aren't all that impressive. We
> bought a round of dual-port GigE NICs, so we're bonding 3 NICs per
> box. We have an 802.3ad-compliant switch (Linksys SLM2024), so we're
> running bonding mode=4, but we've tried several of the others, to
> little avail. The basic benchmarks we're running (subounce v.1.0) are
> only showing sporadic ~10% improvements in bandwidth and latency.
> Performance of our primary code of interest has only improved by a
> factor of 5%-20%, which is significantly less than I was expecting.
I have never experienced any throughput gain from channel bonding on
Gigabit Ethernet (in contrast to Fast Ethernet).
If you need bonding only for MPI applications, then I would suggest
using Open MPI [1]. This MPI implementation can make use of multiple NICs
without depending on the Linux bonding kernel module. For large
messages you should get an almost ideal speedup, but the latency will
not decrease (little or no speedup for short messages).
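A simple latency-plus-bandwidth model shows why splitting a message across several NICs helps large messages almost linearly but does little for short ones: the fixed per-message latency is not reduced by adding links. This is a sketch only and does not describe how Open MPI actually fragments messages; the 50-microsecond latency and 118 MB/s per-link rate are assumed round numbers for GigE, not measurements, and the function names are hypothetical.

```python
def transfer_time(size_bytes: float, links: int,
                  latency_s: float = 50e-6,
                  link_bytes_per_s: float = 118e6) -> float:
    """Time to move one message split evenly over `links` NICs.

    Model: fixed per-message latency (unchanged by extra links) plus the
    payload divided by the aggregate bandwidth. Defaults are assumptions.
    """
    if links < 1:
        raise ValueError("need at least one link")
    return latency_s + size_bytes / (links * link_bytes_per_s)


def striping_speedup(size_bytes: float, links: int) -> float:
    """Speedup of `links` NICs over a single NIC for one message."""
    return transfer_time(size_bytes, 1) / transfer_time(size_bytes, links)


if __name__ == "__main__":
    for size in (8, 64 * 1024, 16 * 1024 * 1024):
        print(f"{size:>10} bytes: {striping_speedup(size, 3):.2f}x with 3 NICs")
```

Under these assumptions an 8-byte message gains essentially nothing, while a 16 MiB message approaches the ideal 3x.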
Heiko
[1] http://www.open-mpi.org
--
-- A good saying is the truth of an entire book
-- in a single sentence. (Theodor Fontane, 1819-1898)
-- Cluster Computing @ http://www.clustercomputing.de
-- Heiko Bauke @ http://www.mpi-hd.mpg.de/personalhomes/bauke