[PATCH 00/30] SMP-group balancer - take 3 - Kernel
[PATCH 00/30] SMP-group balancer - take 3
Hi,
Another go at SMP fairness for group scheduling.
This code needs some serious testing.
However, on my system performance doesn't tank as much as it used to.
I've run sysbench and volanomark benchmarks.
The machine is a Quad core (Intel Q9450) with 4GB of RAM.
Fedora9 - x86_64
sysbench-0.4.8 + postgresql-8.3.3
volanomark-2.5.0.9 + openjdk-1.6.0
I've used cgroup group scheduling.
cgroup:/ - means all tasks are in the root group
cgroup:/foo - means all tasks are in a subgroup
mkdir /cgroup/foo
for i in `cat /cgroup/tasks`; do
echo $i > /cgroup/foo/tasks
done
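For reference, a minimal C equivalent of the loop above. move_all_tasks() is a hypothetical helper, not part of the patch set; it assumes the cgroup hierarchy is mounted at /cgroup as in the report and, like the shell loop, writes one PID per open/write/close.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper doing what the shell loop does: read PIDs (one per
 * line) from src_tasks and write each one to dst_tasks with its own
 * open/write/close, just like `echo $i > dst_tasks`. A PID that exits in
 * the meantime is rejected by the kernel on write; such failures are
 * skipped rather than treated as fatal, as the shell loop would do.
 * Returns the number of PIDs written, or -1 if a file cannot be opened.
 */
static int move_all_tasks(const char *src_tasks, const char *dst_tasks)
{
	FILE *src;
	long pid;
	int moved = 0;

	assert(src_tasks != NULL && dst_tasks != NULL);

	src = fopen(src_tasks, "r");
	if (!src)
		return -1;

	while (fscanf(src, "%ld", &pid) == 1) {
		FILE *dst = fopen(dst_tasks, "w");

		if (!dst) {
			fclose(src);
			return -1;
		}
		fprintf(dst, "%ld\n", pid);
		/* the buffered write is flushed here; count only successful ones */
		if (fclose(dst) == 0)
			moved++;
	}
	fclose(src);
	return moved;
}
```

Run as root, e.g. move_all_tasks("/cgroup/tasks", "/cgroup/foo/tasks").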
The patches are against tip/auto-sched-next from a few days ago.
---
.25
[root@twins sysbench-0.4.8]# ./doit-psql-256-60sec
1: transactions: 50514 (841.90 per sec.)
2: transactions: 98745 (1645.73 per sec.)
4: transactions: 192682 (3211.31 per sec.)
8: transactions: 192082 (3201.26 per sec.)
16: transactions: 188891 (3147.95 per sec.)
32: transactions: 182364 (3039.12 per sec.)
64: transactions: 169412 (2822.94 per sec.)
128: transactions: 139505 (2323.95 per sec.)
256: transactions: 131516 (2188.98 per sec.)
[root@twins vmark]# LOOP_CLIENT_COUNT=1000 ./loopclient.sh 2>&1 | grep Average
Average throughput = 113350 messages per second
Average throughput = 112230 messages per second
Average throughput = 113125 messages per second
.26-rc
cgroup:/
[root@twins sysbench-0.4.8]# ./doit-psql-256-60sec
1: transactions: 50553 (842.54 per sec.)
2: transactions: 98625 (1643.74 per sec.)
4: transactions: 191351 (3189.12 per sec.)
8: transactions: 193525 (3225.32 per sec.)
16: transactions: 190516 (3175.10 per sec.)
32: transactions: 186914 (3114.96 per sec.)
64: transactions: 178940 (2981.78 per sec.)
128: transactions: 156430 (2606.00 per sec.)
256: transactions: 134929 (2246.63 per sec.)
[root@twins vmark]# LOOP_CLIENT_COUNT=1000 ./loopclient.sh 2>&1 | grep Average
Average throughput = 124089 messages per second
Average throughput = 121962 messages per second
Average throughput = 121223 messages per second
cgroup:/foo
[root@twins sysbench-0.4.8]# ./doit-psql-256-60sec
1: transactions: 50246 (837.43 per sec.)
2: transactions: 97466 (1624.41 per sec.)
4: transactions: 179609 (2993.43 per sec.)
8: transactions: 190931 (3182.07 per sec.)
16: transactions: 189882 (3164.50 per sec.)
32: transactions: 184649 (3077.14 per sec.)
64: transactions: 178200 (2969.46 per sec.)
128: transactions: 158835 (2646.14 per sec.)
256: transactions: 142100 (2366.51 per sec.)
[root@twins vmark]# LOOP_CLIENT_COUNT=1000 ./loopclient.sh 2>&1 | grep Average
Average throughput = 117789 messages per second
Average throughput = 118154 messages per second
Average throughput = 118945 messages per second
.26-rc-smp-group
cgroup:/
[root@twins sysbench-0.4.8]# ./doit-psql-256-60sec
1: transactions: 50137 (835.61 per sec.)
2: transactions: 97406 (1623.41 per sec.)
4: transactions: 170755 (2845.88 per sec.)
8: transactions: 187406 (3123.35 per sec.)
16: transactions: 186865 (3114.18 per sec.)
32: transactions: 183559 (3059.03 per sec.)
64: transactions: 176834 (2946.70 per sec.)
128: transactions: 158882 (2647.04 per sec.)
256: transactions: 145081 (2415.81 per sec.)
[root@twins vmark]# LOOP_CLIENT_COUNT=1000 ./loopclient.sh 2>&1 | grep Average
Average throughput = 121499 messages per second
Average throughput = 120181 messages per second
Average throughput = 119775 messages per second
cgroup:/foo
[root@twins sysbench-0.4.8]# ./doit-psql-256-60sec
1: transactions: 49564 (826.06 per sec.)
2: transactions: 96642 (1610.67 per sec.)
4: transactions: 183081 (3051.29 per sec.)
8: transactions: 187553 (3125.79 per sec.)
16: transactions: 185435 (3090.45 per sec.)
32: transactions: 182314 (3038.25 per sec.)
64: transactions: 174527 (2908.22 per sec.)
128: transactions: 159321 (2654.24 per sec.)
256: transactions: 140167 (2333.82 per sec.)
[root@twins vmark]# LOOP_CLIENT_COUNT=1000 ./loopclient.sh 2>&1 | grep Average
Average throughput = 130208 messages per second
Average throughput = 129086 messages per second
Average throughput = 129362 messages per second
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
[PATCH 09/30] sched: fix sched_domain aggregation
Keeping the aggregate on the first cpu of the sched domain has two problems:
- it could collide between different sched domains on different cpus
- it could slow things down because of the remote accesses
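To make the first problem concrete, consider a hypothetical topology in which cpu 0's MC domain spans cpus {0,1} and cpu 2's next-level domain spans {0-3}. Both spans have first cpu 0, so before this patch both balance passes would use the aggregate stored in tg->cfs_rq[0] and the same per-cpu aggregate_lock, and cpu 2 would also be touching another cpu's data. Keying on the balancing cpu, as get_aggregate(this_cpu, sd) now does, gives each cpu its own slot. The sketch models a span as a bitmask (span_t is a hypothetical stand-in for cpumask_t) and uses the GCC/Clang builtin __builtin_ctzl for first_cpu().

```c
#include <assert.h>

/* Hypothetical stand-in for cpumask_t: bit i set means cpu i is in the span. */
typedef unsigned long span_t;

/* Index of the lowest set bit, i.e. first_cpu(span). */
static int first_cpu_of(span_t span)
{
	assert(span != 0);
	return __builtin_ctzl(span);
}

/* Before the patch: slot = sd->first_cpu, whichever cpu is balancing. */
static int old_slot(int this_cpu, span_t span)
{
	(void)this_cpu;
	return first_cpu_of(span);
}

/* After the patch: slot = the cpu running load_balance() (this_cpu). */
static int new_slot(int this_cpu, span_t span)
{
	(void)span;
	return this_cpu;
}
```

Here "slot" means the index used for both tg->cfs_rq[slot]->aggregate and per_cpu(aggregate_lock, slot).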
Signed-off-by: Peter Zijlstra
---
include/linux/sched.h | 1
kernel/sched.c | 113 +++++++++++++++++++++++---------------------------
kernel/sched_fair.c | 12 ++---
3 files changed, 60 insertions(+), 66 deletions(-)
Index: linux-2.6-2/include/linux/sched.h
===================================================================
--- linux-2.6-2.orig/include/linux/sched.h
+++ linux-2.6-2/include/linux/sched.h
@@ -766,7 +766,6 @@ struct sched_domain {
struct sched_domain *child; /* bottom domain must be null terminated */
struct sched_group *groups; /* the balancing groups of the domain */
cpumask_t span; /* span of all CPUs in this domain */
- int first_cpu; /* cache of the first cpu in this domain */
unsigned long min_interval; /* Minimum balance interval ms */
unsigned long max_interval; /* Maximum balance interval ms */
unsigned int busy_factor; /* less balancing by factor if busy */
Index: linux-2.6-2/kernel/sched.c
===================================================================
--- linux-2.6-2.orig/kernel/sched.c
+++ linux-2.6-2/kernel/sched.c
@@ -1539,12 +1539,12 @@ static int task_hot(struct task_struct *
*/
static inline struct aggregate_struct *
-aggregate(struct task_group *tg, struct sched_domain *sd)
+aggregate(struct task_group *tg, int cpu)
{
- return &tg->cfs_rq[sd->first_cpu]->aggregate;
+ return &tg->cfs_rq[cpu]->aggregate;
}
-typedef void (*aggregate_func)(struct task_group *, struct sched_domain *);
+typedef void (*aggregate_func)(struct task_group *, int, struct sched_domain *);
/*
* Iterate the full tree, calling @down when first entering a node and @up when
@@ -1552,14 +1552,14 @@ typedef void (*aggregate_func)(struct ta
*/
static
void aggregate_walk_tree(aggregate_func down, aggregate_func up,
- struct sched_domain *sd)
+ int cpu, struct sched_domain *sd)
{
struct task_group *parent, *child;
rcu_read_lock();
parent = &root_task_group;
down:
- (*down)(parent, sd);
+ (*down)(parent, cpu, sd);
list_for_each_entry_rcu(child, &parent->children, siblings) {
parent = child;
goto down;
@@ -1567,7 +1567,7 @@ down:
up:
continue;
}
- (*up)(parent, sd);
+ (*up)(parent, cpu, sd);
child = parent;
parent = parent->parent;
@@ -1579,8 +1579,8 @@ up:
/*
* Calculate the aggregate runqueue weight.
*/
-static
-void aggregate_group_weight(struct task_group *tg, struct sched_domain *sd)
+static void
+aggregate_group_weight(struct task_group *tg, int cpu, struct sched_domain *sd)
{
unsigned long rq_weight = 0;
unsigned long task_weight = 0;
@@ -1591,15 +1591,15 @@ void aggregate_group_weight(struct task_
task_weight += tg->cfs_rq[i]->task_weight;
}
- aggregate(tg, sd)->rq_weight = rq_weight;
- aggregate(tg, sd)->task_weight = task_weight;
+ aggregate(tg, cpu)->rq_weight = rq_weight;
+ aggregate(tg, cpu)->task_weight = task_weight;
}
/*
* Compute the weight of this group on the given cpus.
*/
-static
-void aggregate_group_shares(struct task_group *tg, struct sched_domain *sd)
+static void
+aggregate_group_shares(struct task_group *tg, int cpu, struct sched_domain *sd)
{
unsigned long shares = 0;
int i;
@@ -1607,18 +1607,18 @@ void aggregate_group_shares(struct task_
for_each_cpu_mask(i, sd->span)
shares += tg->cfs_rq[i]->shares;
- if ((!shares && aggregate(tg, sd)->rq_weight) || shares > tg->shares)
+ if ((!shares && aggregate(tg, cpu)->rq_weight) || shares > tg->shares)
shares = tg->shares;
- aggregate(tg, sd)->shares = shares;
+ aggregate(tg, cpu)->shares = shares;
}
/*
* Compute the load fraction assigned to this group, relies on the aggregate
* weight and this group's parent's load, i.e. top-down.
*/
-static
-void aggregate_group_load(struct task_group *tg, struct sched_domain *sd)
+static void
+aggregate_group_load(struct task_group *tg, int cpu, struct sched_domain *sd)
{
unsigned long load;
@@ -1630,17 +1630,17 @@ void aggregate_group_load(struct task_gr
load += cpu_rq(i)->load.weight;
} else {
- load = aggregate(tg->parent, sd)->load;
+ load = aggregate(tg->parent, cpu)->load;
/*
* shares is our weight in the parent's rq so
* shares/parent->rq_weight gives our fraction of the load
*/
- load *= aggregate(tg, sd)->shares;
- load /= aggregate(tg->parent, sd)->rq_weight + 1;
+ load *= aggregate(tg, cpu)->shares;
+ load /= aggregate(tg->parent, cpu)->rq_weight + 1;
}
- aggregate(tg, sd)->load = load;
+ aggregate(tg, cpu)->load = load;
}
static void __set_se_shares(struct sched_entity *se, unsigned long shares);
@@ -1649,8 +1649,8 @@ static void __set_se_shares(struct sched
* Calculate and set the cpu's group shares.
*/
static void
-__update_group_shares_cpu(struct task_group *tg, struct sched_domain *sd,
- int tcpu)
+__update_group_shares_cpu(struct task_group *tg, int cpu,
+ struct sched_domain *sd, int tcpu)
{
int boost = 0;
unsigned long shares;
@@ -1677,8 +1677,8 @@ __update_group_shares_cpu(struct task_gr
* \Sum rq_weight
*
*/
- shares = aggregate(tg, sd)->shares * rq_weight;
- shares /= aggregate(tg, sd)->rq_weight + 1;
+ shares = aggregate(tg, cpu)->shares * rq_weight;
+ shares /= aggregate(tg, cpu)->rq_weight + 1;
/*
* record the actual number of shares, not the boosted amount.
@@ -1698,15 +1698,15 @@ __update_group_shares_cpu(struct task_gr
* task went to.
*/
static void
-__move_group_shares(struct task_group *tg, struct sched_domain *sd,
+__move_group_shares(struct task_group *tg, int cpu, struct sched_domain *sd,
int scpu, int dcpu)
{
unsigned long shares;
shares = tg->cfs_rq[scpu]->shares + tg->cfs_rq[dcpu]->shares;
- __update_group_shares_cpu(tg, sd, scpu);
- __update_group_shares_cpu(tg, sd, dcpu);
+ __update_group_shares_cpu(tg, cpu, sd, scpu);
+ __update_group_shares_cpu(tg, cpu, sd, dcpu);
/*
* ensure we never loose shares due to rounding errors in the
@@ -1722,19 +1722,19 @@ __move_group_shares(struct task_group *t
* we need to walk up the tree and change all shares until we hit the root.
*/
static void
-move_group_shares(struct task_group *tg, struct sched_domain *sd,
+move_group_shares(struct task_group *tg, int cpu, struct sched_domain *sd,
int scpu, int dcpu)
{
while (tg) {
- __move_group_shares(tg, sd, scpu, dcpu);
+ __move_group_shares(tg, cpu, sd, scpu, dcpu);
tg = tg->parent;
}
}
-static
-void aggregate_group_set_shares(struct task_group *tg, struct sched_domain *sd)
+static void
+aggregate_group_set_shares(struct task_group *tg, int cpu, struct sched_domain *sd)
{
- unsigned long shares = aggregate(tg, sd)->shares;
+ unsigned long shares = aggregate(tg, cpu)->shares;
int i;
for_each_cpu_mask(i, sd->span) {
@@ -1742,20 +1742,20 @@ void aggregate_group_set_shares(struct t
unsigned long flags;
spin_lock_irqsave(&rq->lock, flags);
- __update_group_shares_cpu(tg, sd, i);
+ __update_group_shares_cpu(tg, cpu, sd, i);
spin_unlock_irqrestore(&rq->lock, flags);
}
- aggregate_group_shares(tg, sd);
+ aggregate_group_shares(tg, cpu, sd);
/*
* ensure we never loose shares due to rounding errors in the
* above redistribution.
*/
- shares -= aggregate(tg, sd)->shares;
+ shares -= aggregate(tg, cpu)->shares;
if (shares) {
- tg->cfs_rq[sd->first_cpu]->shares += shares;
- aggregate(tg, sd)->shares += shares;
+ tg->cfs_rq[cpu]->shares += shares;
+ aggregate(tg, cpu)->shares += shares;
}
}
@@ -1763,21 +1763,21 @@ void aggregate_group_set_shares(struct t
* Calculate the accumulative weight and recursive load of each task group
* while walking down the tree.
*/
-static
-void aggregate_get_down(struct task_group *tg, struct sched_domain *sd)
+static void
+aggregate_get_down(struct task_group *tg, int cpu, struct sched_domain *sd)
{
- aggregate_group_weight(tg, sd);
- aggregate_group_shares(tg, sd);
- aggregate_group_load(tg, sd);
+ aggregate_group_weight(tg, cpu, sd);
+ aggregate_group_shares(tg, cpu, sd);
+ aggregate_group_load(tg, cpu, sd);
}
/*
* Rebalance the cpu shares while walking back up the tree.
*/
-static
-void aggregate_get_up(struct task_group *tg, struct sched_domain *sd)
+static void
+aggregate_get_up(struct task_group *tg, int cpu, struct sched_domain *sd)
{
- aggregate_group_set_shares(tg, sd);
+ aggregate_group_set_shares(tg, cpu, sd);
}
static DEFINE_PER_CPU(spinlock_t, aggregate_lock);
@@ -1790,18 +1790,18 @@ static void __init init_aggregate(void)
spin_lock_init(&per_cpu(aggregate_lock, i));
}
-static int get_aggregate(struct sched_domain *sd)
+static int get_aggregate(int cpu, struct sched_domain *sd)
{
- if (!spin_trylock(&per_cpu(aggregate_lock, sd->first_cpu)))
+ if (!spin_trylock(&per_cpu(aggregate_lock, cpu)))
return 0;
- aggregate_walk_tree(aggregate_get_down, aggregate_get_up, sd);
+ aggregate_walk_tree(aggregate_get_down, aggregate_get_up, cpu, sd);
return 1;
}
-static void put_aggregate(struct sched_domain *sd)
+static void put_aggregate(int cpu, struct sched_domain *sd)
{
- spin_unlock(&per_cpu(aggregate_lock, sd->first_cpu));
+ spin_unlock(&per_cpu(aggregate_lock, cpu));
}
static void cfs_rq_set_shares(struct cfs_rq *cfs_rq, unsigned long shares)
@@ -1815,12 +1815,12 @@ static inline void init_aggregate(void)
{
}
-static inline int get_aggregate(struct sched_domain *sd)
+static inline int get_aggregate(int cpu, struct sched_domain *sd)
{
return 0;
}
-static inline void put_aggregate(struct sched_domain *sd)
+static inline void put_aggregate(int cpu, struct sched_domain *sd)
{
}
#endif
@@ -3604,7 +3604,7 @@ static int load_balance(int this_cpu, st
cpus_setall(*cpus);
- unlock_aggregate = get_aggregate(sd);
+ unlock_aggregate = get_aggregate(this_cpu, sd);
/*
* When power savings policy is enabled for the parent domain, idle
@@ -3743,7 +3743,7 @@ out_one_pinned:
ld_moved = 0;
out:
if (unlock_aggregate)
- put_aggregate(sd);
+ put_aggregate(this_cpu, sd);
return ld_moved;
}
@@ -7337,7 +7337,6 @@ static int __build_sched_domains(const c
SD_INIT(sd, ALLNODES);
set_domain_attribute(sd, attr);
sd->span = *cpu_map;
- sd->first_cpu = first_cpu(sd->span);
cpu_to_allnodes_group(i, cpu_map, &sd->groups, tmpmask);
p = sd;
sd_allnodes = 1;
@@ -7348,7 +7347,6 @@ static int __build_sched_domains(const c
SD_INIT(sd, NODE);
set_domain_attribute(sd, attr);
sched_domain_node_span(cpu_to_node(i), &sd->span);
- sd->first_cpu = first_cpu(sd->span);
sd->parent = p;
if (p)
p->child = sd;
@@ -7360,7 +7358,6 @@ static int __build_sched_domains(const c
SD_INIT(sd, CPU);
set_domain_attribute(sd, attr);
sd->span = *nodemask;
- sd->first_cpu = first_cpu(sd->span);
sd->parent = p;
if (p)
p->child = sd;
@@ -7372,7 +7369,6 @@ static int __build_sched_domains(const c
SD_INIT(sd, MC);
set_domain_attribute(sd, attr);
sd->span = cpu_coregroup_map(i);
- sd->first_cpu = first_cpu(sd->span);
cpus_and(sd->span, sd->span, *cpu_map);
sd->parent = p;
p->child = sd;
@@ -7385,7 +7381,6 @@ static int __build_sched_domains(const c
SD_INIT(sd, SIBLING);
set_domain_attribute(sd, attr);
sd->span = per_cpu(cpu_sibling_map, i);
- sd->first_cpu = first_cpu(sd->span);
cpus_and(sd->span, sd->span, *cpu_map);
sd->parent = p;
p->child = sd;
Index: linux-2.6-2/kernel/sched_fair.c
===================================================================
--- linux-2.6-2.orig/kernel/sched_fair.c
+++ linux-2.6-2/kernel/sched_fair.c
@@ -1403,11 +1403,11 @@ load_balance_fair(struct rq *this_rq, in
/*
* empty group
*/
- if (!aggregate(tg, sd)->task_weight)
+ if (!aggregate(tg, this_cpu)->task_weight)
continue;
- rem_load = rem_load_move * aggregate(tg, sd)->rq_weight;
- rem_load /= aggregate(tg, sd)->load + 1;
+ rem_load = rem_load_move * aggregate(tg, this_cpu)->rq_weight;
+ rem_load /= aggregate(tg, this_cpu)->load + 1;
this_weight = tg->cfs_rq[this_cpu]->task_weight;
busiest_weight = tg->cfs_rq[busiest_cpu]->task_weight;
@@ -1425,10 +1425,10 @@ load_balance_fair(struct rq *this_rq, in
if (!moved_load)
continue;
- move_group_shares(tg, sd, busiest_cpu, this_cpu);
+ move_group_shares(tg, this_cpu, sd, busiest_cpu, this_cpu);
- moved_load *= aggregate(tg, sd)->load;
- moved_load /= aggregate(tg, sd)->rq_weight + 1;
+ moved_load *= aggregate(tg, this_cpu)->load;
+ moved_load /= aggregate(tg, this_cpu)->rq_weight + 1;
rem_load_move -= moved_load;
if (rem_load_move < 0)
--
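The per-cpu share computation that this patch re-keys can be modeled in isolation. The sketch follows the quoted formula (shares = tg shares * rq_weight / (\Sum rq_weight + 1)) and, in the spirit of aggregate_group_set_shares(), adds any shares lost to integer division back on the balancing cpu, which after this patch is tg->cfs_rq[cpu] rather than tg->cfs_rq[sd->first_cpu]. It is a simplification: the "boost" branch of __update_group_shares_cpu() and any clamping are left out, the span is assumed to cover all NR_CPUS_MODEL cpus, and redistribute_shares and NR_CPUS_MODEL are hypothetical names.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4	/* hypothetical machine size for the sketch */

/*
 * Give each cpu a slice of the group's shares proportional to its
 * runqueue weight, then put the rounding remainder on the balancing cpu.
 */
static void redistribute_shares(unsigned long total_shares,
				const unsigned long rq_weight[NR_CPUS_MODEL],
				unsigned long shares[NR_CPUS_MODEL], int cpu)
{
	unsigned long sum_weight = 0, handed_out = 0;
	int i;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);

	for (i = 0; i < NR_CPUS_MODEL; i++)
		sum_weight += rq_weight[i];

	for (i = 0; i < NR_CPUS_MODEL; i++) {
		/* the +1 mirrors the kernel code and avoids dividing by zero */
		shares[i] = total_shares * rq_weight[i] / (sum_weight + 1);
		handed_out += shares[i];
	}

	/* never lose shares to rounding: credit the remainder to this cpu */
	shares[cpu] += total_shares - handed_out;
}
```

With rq weights {1024, 2048, 0, 1024} and 1024 shares balanced from cpu 0, the division hands out 255 + 511 + 0 + 255 = 1021, and the remaining 3 go to cpu 0.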
-
Re: [PATCH 00/30] SMP-group balancer - take 3
* Peter Zijlstra wrote:
> Hi,
>
> Another go at SMP fairness for group scheduling.
>
> This code needs some serious testing.
>
> However, on my system performance doesn't tank as much as it used to.
> I've run sysbench and volanomark benchmarks.
>
> The machine is a Quad core (Intel Q9450) with 4GB of RAM.
> Fedora9 - x86_64
>
> sysbench-0.4.8 + postgresql-8.3.3
> volanomark-2.5.0.9 + openjdk-1.6.0
>
> I've used cgroup group scheduling.
Cool. I have applied your patches to a new temporary topic branch,
tip/sched/devel.smp-group-balance. If that works out fine in testing,
then we can merge it back into sched/devel.
Thanks Peter,
Ingo
-
Re: [PATCH 00/30] SMP-group balancer - take 3
On Fri, Jun 27, 2008 at 01:41:09PM +0200, Peter Zijlstra wrote:
> Hi,
>
> Another go at SMP fairness for group scheduling.
>
> This code needs some serious testing.
>
> However, on my system performance doesn't tank as much as it used to.
> I've run sysbench and volanomark benchmarks.
>
> The machine is a Quad core (Intel Q9450) with 4GB of RAM.
> Fedora9 - x86_64
>
> sysbench-0.4.8 + postgresql-8.3.3
> volanomark-2.5.0.9 + openjdk-1.6.0
>
> I've used cgroup group scheduling.
Some fairness numbers from tip/master:
Kernel compiles with an even number of threads:
/cgroup/a
[dhaval@mordor a]$ time make -j8
real 1m53.033s
user 1m28.785s
sys 0m22.224s
/cgroup/b
[dhaval@mordor b]$ time make -j16
real 1m51.826s
user 1m29.022s
sys 0m21.911s
Kernel compiles with an odd number of threads:
/cgroup/a
[dhaval@mordor a]$ time make -j7
real 1m49.441s
user 1m26.962s
sys 0m21.698s
/cgroup/b
[dhaval@mordor b]$ time make -j13
real 1m50.418s
user 1m26.888s
sys 0m21.508s
Running infinite loops in parallel (5 in one group, 2 in another):
8789 - 8793 belong to /cgroup/a
8794, 8795 belong to /cgroup/b
When we start:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8795 dhaval 20 0 1720 264 212 R 54.6 0.0 0:06.31 test
8794 dhaval 20 0 1720 264 212 R 45.6 0.0 0:06.91 test
8790 dhaval 20 0 1720 264 212 R 23.0 0.0 0:07.29 test
8789 dhaval 20 0 1720 260 212 R 22.6 0.0 0:07.80 test
8791 dhaval 20 0 1720 264 212 R 18.3 0.0 0:07.28 test
8792 dhaval 20 0 1720 260 212 R 18.3 0.0 0:07.01 test
8793 dhaval 20 0 1720 260 212 R 18.0 0.0 0:06.93 test
After some time:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8794 dhaval 20 0 1720 264 212 R 49.9 0.0 0:46.98 test
8795 dhaval 20 0 1720 264 212 R 49.9 0.0 0:52.61 test
8793 dhaval 20 0 1720 260 212 R 20.3 0.0 0:24.96 test
8789 dhaval 20 0 1720 260 212 R 20.0 0.0 0:24.83 test
8790 dhaval 20 0 1720 264 212 R 20.0 0.0 0:24.32 test
8791 dhaval 20 0 1720 264 212 R 20.0 0.0 0:23.29 test
8792 dhaval 20 0 1720 260 212 R 20.0 0.0 0:25.04 test
But these numbers are not very stable. Also, it takes a long time (~1 min)
to converge here.
The results look really good though.
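The top snapshots can be checked against a simple expectation. In the "after some time" snapshot the %CPU column adds up to about 200% (2 x 49.9 + 20.3 + 4 x 20.0 = 200.1), which suggests the test box has two cpus; that is an inference, not something stated in the report. If both groups also have equal cpu.shares (assumed), each group should get half of the machine and each task an equal part of its group's half. ideal_task_pct() is a hypothetical helper computing that; it caps a single task at one full cpu and does not model how spare capacity would be redistributed.

```c
#include <assert.h>

/*
 * Ideal %CPU per CPU-bound task (top-style, 100% == one cpu), assuming
 * every group has equal cpu.shares and all cpus are kept busy.
 */
static double ideal_task_pct(int nr_cpus, int nr_groups, int tasks_in_group)
{
	double pct;

	assert(nr_cpus > 0 && nr_groups > 0 && tasks_in_group > 0);
	pct = 100.0 * nr_cpus / nr_groups / tasks_in_group;
	/* a single task can never use more than one cpu */
	return pct > 100.0 ? 100.0 : pct;
}
```

With two cpus and two groups this gives 20% per task for the five-task group and 50% per task for the two-task group, which is where the numbers above settle.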
--
regards,
Dhaval
-
Re: [PATCH 00/30] SMP-group balancer - take 3
Hi,
I get this at bootup
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2738 check_flags+0x8a/0x12d()
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.26-rc8-tip #5
[] warn_on_slowpath+0x41/0x7b
[] ? trace_hardirqs_off+0xb/0xd
[] ? native_sched_clock+0x8b/0x9d
[] ? __sysctl_head_next+0x98/0x9f
[] ? _spin_unlock+0x1d/0x20
[] ? __sysctl_head_next+0x98/0x9f
[] ? __lock_acquire+0xd96/0xda5
[] check_flags+0x8a/0x12d
[] lock_acquire+0x3b/0x89
[] ? tg_shares_up+0x0/0x170
[] walk_tg_tree+0x2c/0x9f
[] ? walk_tg_tree+0x0/0x9f
[] ? tg_nop+0x0/0x5
[] update_shares+0x54/0x5d
[] try_to_wake_up+0x59/0x22b
[] wake_up_process+0xf/0x11
[] kthread_create+0x68/0x98
[] ? worker_thread+0x0/0xc2
[] __create_workqueue_key+0x19e/0x1ee
[] ? worker_thread+0x0/0xc2
[] init_workqueues+0x4c/0x5d
[] kernel_init+0xcf/0x255
[] ? trace_hardirqs_on_thunk+0xc/0x10
[] ? trace_hardirqs_on_caller+0x10b/0x136
[] ? trace_hardirqs_on_thunk+0xc/0x10
[] ? restore_nocheck_notrace+0x0/0xe
[] ? kernel_init+0x0/0x255
[] ? kernel_init+0x0/0x255
[] kernel_thread_helper+0x7/0x10
=======================
---[ end trace 4eaa2a86a8e2da22 ]---
possible reason: unannotated irqs-on.
irq event stamp: 1892
hardirqs last enabled at (1891): [] trace_hardirqs_on+0xb/0xd
hardirqs last disabled at (1892): [] trace_hardirqs_off+0xb/0xd
softirqs last enabled at (1548): [] __do_softirq+0x13e/0x146
softirqs last disabled at (1541): [] do_softirq+0x3a/0x52
--
regards,
Dhaval
-
Re: [PATCH 00/30] SMP-group balancer - take 3
* Dhaval Giani wrote:
> Hi,
>
> I get this at bootup
>
> ------------[ cut here ]------------
> WARNING: at kernel/lockdep.c:2738 check_flags+0x8a/0x12d()
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.26-rc8-tip #5
Please check the latest tip/master. This is the commit that should fix it:
----------------
| commit 2d452c9b10caeec455eb5e56a0ef4ed485178213
| Author: Ingo Molnar
| Date: Sun Jun 29 15:01:59 2008 +0200
|
| sched: sched_clock_cpu() based cpu_clock(), lockdep fix
|
| Vegard Nossum reported:
|
| > WARNING: at kernel/lockdep.c:2738 check_flags+0x142/0x160()
----------------
Ingo
-
Re: [PATCH 00/30] SMP-group balancer - take 3
On Mon, Jun 30, 2008 at 02:59:56PM +0200, Ingo Molnar wrote:
>
> * Dhaval Giani wrote:
>
> > Hi,
> >
> > I get this at bootup
> >
> > ------------[ cut here ]------------
> > WARNING: at kernel/lockdep.c:2738 check_flags+0x8a/0x12d()
> > Modules linked in:
> > Pid: 1, comm: swapper Not tainted 2.6.26-rc8-tip #5
>
> please check latest tip/master. This is the commit that should fix it:
>
Nope, it does not. I still get:
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2662 check_flags+0x7c/0x10b()
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.26-rc8 #2
[] warn_on_slowpath+0x41/0x5d
[] ? find_usage_backwards+0xb4/0xd5
[] ? find_usage_backwards+0xb4/0xd5
[] ? find_usage_backwards+0xb4/0xd5
[] ? check_usage+0x23/0x58
[] ? check_prev_add_irq+0x71/0x85
[] ? check_prev_add+0x3b/0x17f
[] ? check_prevs_add+0x5a/0xb2
[] ? validate_chain+0xaa/0x29c
[] check_flags+0x7c/0x10b
[] lock_acquire+0x30/0x7e
[] ? tg_shares_up+0x0/0x100
[] walk_tg_tree+0x2c/0x96
[] ? walk_tg_tree+0x0/0x96
[] ? tg_nop+0x0/0x5
[] update_shares+0x42/0x4a
[] try_to_wake_up+0x4c/0x11f
[] wake_up_process+0xf/0x11
[] kthread_create+0x6c/0x9c
[] ? worker_thread+0x0/0xd2
[] ? __spin_lock_init+0x24/0x47
[] create_workqueue_thread+0x2b/0x45
[] ? worker_thread+0x0/0xd2
[] __create_workqueue_key+0x115/0x14d
[] ? kernel_init+0x0/0x93
[] init_workqueues+0x4c/0x5d
[] do_basic_setup+0x8/0x1e
[] kernel_init+0x58/0x93
[] kernel_thread_helper+0x7/0x10
=======================
---[ end trace 4eaa2a86a8e2da22 ]---
possible reason: unannotated irqs-on.
irq event stamp: 10216
hardirqs last enabled at (10215): [] debug_check_no_locks_freed+0x9d/0xa7
hardirqs last disabled at (10216): [] native_sched_clock+0x50/0xb8
softirqs last enabled at (9922): [] __do_softirq+0xdf/0xe6
softirqs last disabled at (9915): [] do_softirq+0x39/0x51
--
regards,
Dhaval
-
Re: [PATCH 02/30] sched: revert the revert of: weight calculations
* Peter Zijlstra [2008-06-27 13:41:11]:
> Try again..
>
> initial commit: 8f1bc385cfbab474db6c27b5af1e439614f3025c
> revert: f9305d4a0968201b2818dbed0dc8cb0d4ee7aeb3
>
> Signed-off-by: Peter Zijlstra
> ---
>
> ---
> kernel/sched.c | 9 +---
> kernel/sched_fair.c | 105 ++++++++++++++++++++++++++++++++----------------
> kernel/sched_features.h | 1
> 3 files changed, 76 insertions(+), 39 deletions(-)
>
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -1342,6 +1342,9 @@ static void __resched_task(struct task_s
> */
> #define SRR(x, y) (((x) + (1UL << ((y) - 1))) >> (y))
>
> +/*
> + * delta *= weight / lw
> + */
> static unsigned long
> calc_delta_mine(unsigned long delta_exec, unsigned long weight,
> struct load_weight *lw)
> @@ -1369,12 +1372,6 @@ calc_delta_mine(unsigned long delta_exec
> return (unsigned long)min(tmp, (u64)(unsigned long)LONG_MAX);
> }
>
> -static inline unsigned long
> -calc_delta_fair(unsigned long delta_exec, struct load_weight *lw)
> -{
> - return calc_delta_mine(delta_exec, NICE_0_LOAD, lw);
> -}
> -
> static inline void update_load_add(struct load_weight *lw, unsigned long inc)
> {
> lw->weight += inc;
> Index: linux-2.6/kernel/sched_fair.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched_fair.c
> +++ linux-2.6/kernel/sched_fair.c
> @@ -334,6 +334,34 @@ int sched_nr_latency_handler(struct ctl_
> #endif
>
> /*
> + * delta *= w / rw
> + */
> +static inline unsigned long
> +calc_delta_weight(unsigned long delta, struct sched_entity *se)
> +{
> + for_each_sched_entity(se) {
> + delta = calc_delta_mine(delta,
> + se->load.weight, &cfs_rq_of(se)->load);
> + }
> +
> + return delta;
> +}
> +
> +/*
> + * delta *= rw / w
> + */
> +static inline unsigned long
> +calc_delta_fair(unsigned long delta, struct sched_entity *se)
> +{
> + for_each_sched_entity(se) {
> + delta = calc_delta_mine(delta,
> + cfs_rq_of(se)->load.weight, &se->load);
> + }
> +
> + return delta;
> +}
> +
These functions could do with better comments.
delta is scaled up as we move up the hierarchy.
Why is calc_delta_weight() different from calc_delta_fair()?
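Reading the quoted code, both helpers walk the same chain of entities (for_each_sched_entity goes from se up through its parents) but apply inverse ratios: calc_delta_weight() multiplies by w/rw at every level, which shrinks a period into the entity's share (sched_slice() uses it that way), while calc_delta_fair() multiplies by rw/w, which inflates a runtime delta (the new __update_curr() uses it for vruntime). The sketch replaces calc_delta_mine()'s fixed-point arithmetic with plain integer division; scale, delta_weight and delta_fair are hypothetical names.

```c
#include <assert.h>

/* Plain-division stand-in for calc_delta_mine(): delta * weight / lw_weight. */
static unsigned long scale(unsigned long delta, unsigned long weight,
			   unsigned long lw_weight)
{
	return (unsigned long)((unsigned long long)delta * weight / lw_weight);
}

/*
 * One entry per level, from the task's entity up to the top-level group
 * entity: se_weight[i] is the entity's weight, rq_weight[i] the total
 * weight of the cfs_rq it is queued on.
 */
static unsigned long delta_weight(unsigned long delta,
				  const unsigned long *se_weight,
				  const unsigned long *rq_weight, int levels)
{
	for (int i = 0; i < levels; i++)
		delta = scale(delta, se_weight[i], rq_weight[i]); /* *= w / rw */
	return delta;
}

static unsigned long delta_fair(unsigned long delta,
				const unsigned long *se_weight,
				const unsigned long *rq_weight, int levels)
{
	for (int i = 0; i < levels; i++)
		delta = scale(delta, rq_weight[i], se_weight[i]); /* *= rw / w */
	return delta;
}
```

For a nice-0 task in a group runqueue of weight 2048, whose group entity (weight 1024) sits on a root runqueue of weight 3072, a 6 ms period becomes a 1 ms slice through delta_weight(), and delta_fair() maps 1 ms back to 6 ms.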
> +/*
> * The idea is to set a period in which each task runs once.
> *
> * When there are too many tasks (sysctl_sched_nr_latency) we have to stretch
> @@ -362,47 +390,54 @@ static u64 __sched_period(unsigned long
> */
> static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> - u64 slice = __sched_period(cfs_rq->nr_running);
> -
> - for_each_sched_entity(se) {
> - cfs_rq = cfs_rq_of(se);
> -
> - slice *= se->load.weight;
> - do_div(slice, cfs_rq->load.weight);
> - }
> -
> -
> - return slice;
> + return calc_delta_weight(__sched_period(cfs_rq->nr_running), se);
> }
>
> /*
> * We calculate the vruntime slice of a to be inserted task
> *
> - * vs = s/w = p/rw
> + * vs = s*rw/w = p
> */
> static u64 sched_vslice_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> unsigned long nr_running = cfs_rq->nr_running;
> - unsigned long weight;
> - u64 vslice;
>
> if (!se->on_rq)
> nr_running++;
>
> - vslice = __sched_period(nr_running);
> + return __sched_period(nr_running);
Do we always return a constant value based on nr_running? Am I
misreading the diff by any chance?
> +}
> +
> +/*
> + * The goal of calc_delta_asym() is to be asymmetrically around NICE_0_LOAD, in
> + * that it favours >=0 over <0.
> + *
> + * -20 |
> + * |
> + * 0 --------+-------
> + * .'
> + * 19 .'
> + *
> + */
> +static unsigned long
> +calc_delta_asym(unsigned long delta, struct sched_entity *se)
> +{
> + struct load_weight lw = {
> + .weight = NICE_0_LOAD,
> + .inv_weight = 1UL << (WMULT_SHIFT-NICE_0_SHIFT)
> + };
Could you please explain this
weight is 1 << 10
and inv_weight is 1 << 22
>
> for_each_sched_entity(se) {
> - cfs_rq = cfs_rq_of(se);
> + struct load_weight *se_lw = &se->load;
>
> - weight = cfs_rq->load.weight;
> - if (!se->on_rq)
> - weight += se->load.weight;
> + if (se->load.weight < NICE_0_LOAD)
> + se_lw = &lw;
Why do we do this?
>
> - vslice *= NICE_0_LOAD;
> - do_div(vslice, weight);
> + delta = calc_delta_mine(delta,
> + cfs_rq_of(se)->load.weight, se_lw);
> }
>
> - return vslice;
> + return delta;
> }
>
> /*
> @@ -419,11 +454,7 @@ __update_curr(struct cfs_rq *cfs_rq, str
>
> curr->sum_exec_runtime += delta_exec;
> schedstat_add(cfs_rq, exec_clock, delta_exec);
> - delta_exec_weighted = delta_exec;
> - if (unlikely(curr->load.weight != NICE_0_LOAD)) {
> - delta_exec_weighted = calc_delta_fair(delta_exec_weighted,
> - &curr->load);
> - }
> + delta_exec_weighted = calc_delta_fair(delta_exec, curr);
> curr->vruntime += delta_exec_weighted;
> }
>
> @@ -609,8 +640,17 @@ place_entity(struct cfs_rq *cfs_rq, stru
>
> if (!initial) {
> /* sleeps upto a single latency don't count. */
> - if (sched_feat(NEW_FAIR_SLEEPERS))
> - vruntime -= sysctl_sched_latency;
> + if (sched_feat(NEW_FAIR_SLEEPERS)) {
> + unsigned long thresh = sysctl_sched_latency;
> +
> + /*
> + * convert the sleeper threshold into virtual time
> + */
> + if (sched_feat(NORMALIZED_SLEEPER))
> + thresh = calc_delta_fair(thresh, se);
> +
> + vruntime -= thresh;
> + }
>
> /* ensure we never gain time by being placed backwards. */
> vruntime = max_vruntime(se->vruntime, vruntime);
> @@ -1111,11 +1151,10 @@ static unsigned long wakeup_gran(struct
> unsigned long gran = sysctl_sched_wakeup_granularity;
>
> /*
> - * More easily preempt - nice tasks, while not making
> - * it harder for + nice tasks.
> + * More easily preempt - nice tasks, while not making it harder for
> + * + nice tasks.
> */
> - if (unlikely(se->load.weight > NICE_0_LOAD))
> - gran = calc_delta_fair(gran, &se->load);
> + gran = calc_delta_asym(sysctl_sched_wakeup_granularity, se);
>
> return gran;
> }
> Index: linux-2.6/kernel/sched_features.h
> ===================================================================
> --- linux-2.6.orig/kernel/sched_features.h
> +++ linux-2.6/kernel/sched_features.h
> @@ -1,4 +1,5 @@
> SCHED_FEAT(NEW_FAIR_SLEEPERS, 1)
> +SCHED_FEAT(NORMALIZED_SLEEPER, 1)
> SCHED_FEAT(WAKEUP_PREEMPT, 1)
> SCHED_FEAT(START_DEBIT, 1)
> SCHED_FEAT(AFFINE_WAKEUPS, 1)
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [PATCH 00/30] SMP-group balancer - take 3
On Mon, Jun 30, 2008 at 08:23:57PM +0530, Dhaval Giani wrote:
> On Mon, Jun 30, 2008 at 02:59:56PM +0200, Ingo Molnar wrote:
> >
> > * Dhaval Giani wrote:
> >
> > > Hi,
> > >
> > > I get this at bootup
> > >
> > > ------------[ cut here ]------------
> > > WARNING: at kernel/lockdep.c:2738 check_flags+0x8a/0x12d()
> > > Modules linked in:
> > > Pid: 1, comm: swapper Not tainted 2.6.26-rc8-tip #5
> >
> > please check latest tip/master. This is the commit that should fix it:
> >
>
> Nope, does not. Still get,
>
Ah, turns out my git-fetch did not work so well. I just pulled the
latest tip, and it seems to have been fixed. Sorry for the noise.
Thanks,
--
regards,
Dhaval
-
Re: [PATCH 02/30] sched: revert the revert of: weight calculations
On Mon, 2008-06-30 at 23:37 +0530, Balbir Singh wrote:
> * Peter Zijlstra [2008-06-27 13:41:11]:
> > /*
> > + * delta *= w / rw
> > + */
> > +static inline unsigned long
> > +calc_delta_weight(unsigned long delta, struct sched_entity *se)
> > +{
> > + for_each_sched_entity(se) {
> > + delta = calc_delta_mine(delta,
> > + se->load.weight, &cfs_rq_of(se)->load);
> > + }
> > +
> > + return delta;
> > +}
> > +
> > +/*
> > + * delta *= rw / w
> > + */
> > +static inline unsigned long
> > +calc_delta_fair(unsigned long delta, struct sched_entity *se)
> > +{
> > + for_each_sched_entity(se) {
> > + delta = calc_delta_mine(delta,
> > + cfs_rq_of(se)->load.weight, &se->load);
> > + }
> > +
> > + return delta;
> > +}
> > +
>
> These functions can do with better comments
you mean like:
/*
* delta *= \Prod_{i} rw_{i} / w_{i} ?
*/
?
> delta is scaled up as we move up the hierarchy
>
> Why is calc_delta_weight() different from calc_delta_fair()?
Because they do the opposite operation.
I agree, though, that perhaps the names could have been chosen better.
I've wondered about that on several occasions but so far have failed to come
up with anything sane.
> > +/*
> > * The idea is to set a period in which each task runs once.
> > *
> > * When there are too many tasks (sysctl_sched_nr_latency) we have to stretch
> > @@ -362,47 +390,54 @@ static u64 __sched_period(unsigned long
> > */
> > static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > {
> > - u64 slice = __sched_period(cfs_rq->nr_running);
> > -
> > - for_each_sched_entity(se) {
> > - cfs_rq = cfs_rq_of(se);
> > -
> > - slice *= se->load.weight;
> > - do_div(slice, cfs_rq->load.weight);
> > - }
> > -
> > -
> > - return slice;
> > + return calc_delta_weight(__sched_period(cfs_rq->nr_running), se);
> > }
> >
> > /*
> > * We calculate the vruntime slice of a to be inserted task
> > *
> > - * vs = s/w = p/rw
> > + * vs = s*rw/w = p
> > */
> > static u64 sched_vslice_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > {
> > unsigned long nr_running = cfs_rq->nr_running;
> > - unsigned long weight;
> > - u64 vslice;
> >
> > if (!se->on_rq)
> > nr_running++;
> >
> > - vslice = __sched_period(nr_running);
> > + return __sched_period(nr_running);
>
> Do we always return a constant value based on nr_running? Am I
> misreading the diff by any chance?
static u64 __sched_period(unsigned long nr_running)
{
u64 period = sysctl_sched_latency;
unsigned long nr_latency = sched_nr_latency;
if (unlikely(nr_running > nr_latency)) {
period = sysctl_sched_min_granularity;
period *= nr_running;
}
return period;
}
it's not exactly constant..
> > +}
> > +
> > +/*
> > + * The goal of calc_delta_asym() is to be asymmetrically around NICE_0_LOAD, in
> > + * that it favours >=0 over <0.
> > + *
> > + * -20 |
> > + * |
> > + * 0 --------+-------
> > + * .'
> > + * 19 .'
> > + *
> > + */
> > +static unsigned long
> > +calc_delta_asym(unsigned long delta, struct sched_entity *se)
> > +{
> > + struct load_weight lw = {
> > + .weight = NICE_0_LOAD,
> > + .inv_weight = 1UL << (WMULT_SHIFT-NICE_0_SHIFT)
> > + };
>
> Could you please explain this
>
> weight is 1 << 10
> and inv_weight is 1 << 22
We have the relation that:
x/weight ~= (x*inv_weight) >> 32
or
inv_weight = (1<<32) / weight
See kernel/sched.c:calc_delta_mine()
When weight is 1<<10 (NICE_0_LOAD), that reduces to 1<<(32-10) = 1<<22.
> >
> > for_each_sched_entity(se) {
> > - cfs_rq = cfs_rq_of(se);
> > + struct load_weight *se_lw = &se->load;
> >
> > - weight = cfs_rq->load.weight;
> > - if (!se->on_rq)
> > - weight += se->load.weight;
> > + if (se->load.weight < NICE_0_LOAD)
> > + se_lw = &lw;
>
> Why do we do this?
You're basically asking what the _asym part is about, right?
So, what this patch does is change the virtual time calculation from:
1 / w, to rw / w
[ actually to: \Prod_{i} rw_{i}/w_{i} ]
Now wakeup_gran() has this asymmetry:
> > /*
> > - * More easily preempt - nice tasks, while not making
> > - * it harder for + nice tasks.
> > */
> > - if (unlikely(se->load.weight > NICE_0_LOAD))
> > - gran = calc_delta_fair(gran, &se->load);
calc_delta_asym() tries to generalize that to the new scheme. As you can
see from the next two patches, the code in this patch isn't perfect. This
patch just restores the status quo from before the revert; the next
patches continue from there.