[PATCH] Make for_each_cpu_mask a bit smaller - Kernel

This is a discussion on [PATCH] Make for_each_cpu_mask a bit smaller - Kernel ; The for_each_cpu_mask loop is used quite often in the kernel. It makes use of two functions: first_cpu and next_cpu. This patch changes for_each_cpu_mask to use only one function: a newly introduced find_next_cpu. Each use of the for_each_cpu_mask constuct then becomes ...

+ Reply to Thread
Results 1 to 15 of 15

Thread: [PATCH] Make for_each_cpu_mask a bit smaller

  1. [PATCH] Make for_each_cpu_mask a bit smaller

    The for_each_cpu_mask loop is used quite often in the kernel. It
    makes use of two functions: first_cpu and next_cpu. This patch
    changes for_each_cpu_mask to use only one function: a newly
    introduced find_next_cpu. Each use of the for_each_cpu_mask
    constuct then becomes a few bytes smaller. An x86_64 defconfig
    kernel is about 2000 bytes smaller with this patch applied:

    text data bss dec hex filename
    5395732 976736 734280 7106748 6c70bc vmlinux.orig
    5393639 976736 734280 7104655 6c688f vmlinux

    Runs fine on qemu UP/SMP x86_64/i386.

    Signed-off-by: Alexander van Heukelum

    ---

    Hello Andrew,

    Could you add this patch to -mm?

    Greetings,
    Alexander

    include/linux/cpumask.h | 14 ++++++++------
    lib/cpumask.c | 6 ++++++
    2 files changed, 14 insertions(+), 6 deletions(-)

    diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
    index 9650806..a760e29 100644
    --- a/include/linux/cpumask.h
    +++ b/include/linux/cpumask.h
    @@ -221,9 +221,11 @@ int __first_cpu(const cpumask_t *srcp);
    #define first_cpu(src) __first_cpu(&(src))
    int __next_cpu(int n, const cpumask_t *srcp);
    #define next_cpu(n, src) __next_cpu((n), &(src))
    +int find_next_cpu_mask(int n, const cpumask_t *srcp);
    #else
    -#define first_cpu(src) ({ (void)(src); 0; })
    -#define next_cpu(n, src) ({ (void)(src); 1; })
    +#define first_cpu(src) ({ (void)(src); 0; })
    +#define next_cpu(n, src) ({ (void)(src); 1; })
    +#define find_next_cpu_mask(n, src) ({ (void)(src); n; })
    #endif

    #ifdef CONFIG_HAVE_CPUMASK_OF_CPU_MAP
    @@ -351,10 +353,10 @@ static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,
    }

    #if NR_CPUS > 1
    -#define for_each_cpu_mask(cpu, mask) \
    - for ((cpu) = first_cpu(mask); \
    - (cpu) < NR_CPUS; \
    - (cpu) = next_cpu((cpu), (mask)))
    +#define for_each_cpu_mask(cpu, mask) \
    + for ((cpu) = 0; \
    + (cpu) = find_next_cpu_mask((cpu), &(mask)), \
    + (cpu) < NR_CPUS; (cpu)++)
    #else /* NR_CPUS == 1 */
    #define for_each_cpu_mask(cpu, mask) \
    for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
    diff --git a/lib/cpumask.c b/lib/cpumask.c
    index bb4f76d..93dd6ca 100644
    --- a/lib/cpumask.c
    +++ b/lib/cpumask.c
    @@ -15,6 +15,12 @@ int __next_cpu(int n, const cpumask_t *srcp)
    }
    EXPORT_SYMBOL(__next_cpu);

    +int find_next_cpu_mask(int n, const cpumask_t *srcp)
    +{
    + return find_next_bit(srcp->bits, NR_CPUS, n);
    +}
    +EXPORT_SYMBOL(find_next_cpu_mask);
    +
    int __any_online_cpu(const cpumask_t *mask)
    {
    int cpu;
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] Make for_each_cpu_mask a bit smaller

    Alexander wrote:
    > This patch
    > changes for_each_cpu_mask to use only one function: a newly
    > introduced find_next_cpu.


    I believe that it's for_each_cpu_mask which is newly introduced,
    not find_next_cpu ... just a typo, granted.

    Any chance that you could make the same change to nodemask.h?
    Where practical, I like to keep cpumask.h and nodemask.h the same.

    --
    I won't rest till it's the best ...
    Programmer, Linux Scalability
    Paul Jackson 1.940.382.4214
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] Make for_each_cpu_mask a bit smaller

    On Sun, May 11, 2008 at 03:50:39PM +0200, Alexander van Heukelum wrote:
    > #if NR_CPUS > 1
    > -#define for_each_cpu_mask(cpu, mask) \
    > - for ((cpu) = first_cpu(mask); \
    > - (cpu) < NR_CPUS; \
    > - (cpu) = next_cpu((cpu), (mask)))
    > +#define for_each_cpu_mask(cpu, mask) \
    > + for ((cpu) = 0; \
    > + (cpu) = find_next_cpu_mask((cpu), &(mask)), \
    > + (cpu) < NR_CPUS; (cpu)++)


    For anyone else having similar cognitive dissonance while reading this
    thinking "But won't the first call to find_next_cpu_mask return a number
    > 0", the answer is "no, find_next_bit returns the next set bit that's
    >= the number passed in, which is why we need both the cpu++ and

    find_next_cpu_mask".

    > +int find_next_cpu_mask(int n, const cpumask_t *srcp)
    > +{
    > + return find_next_bit(srcp->bits, NR_CPUS, n);
    > +}
    > +EXPORT_SYMBOL(find_next_cpu_mask);


    Maybe a better name for this function would help. I can't think of a
    good one right now though.

    --
    Intel are signing my paycheques ... these opinions are still mine
    "Bill, look, we understand that you're interested in selling us this
    operating system, but compare it to ours. We can't possibly take such
    a retrograde step."
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. [RFC/PATCH] Make for_each_node_mask out-of-line

    The for_each_node_mask loop makes use of two inlined functions:
    first_node and next_node. This patch changes for_each_node_mask
    to use only one out-of-line function: find_next_node_mask. An
    x86_64 defconfig kernel is about 1500 bytes smaller with this
    patch applied:

    text data bss dec hex filename
    5395732 976736 734280 7106748 6c70bc vmlinux.orig
    5394174 976736 734280 7105190 6c6aa6 vmlinux

    Signed-off-by: Alexander van Heukelum

    ---

    Hello,
    > Alexander wrote:
    > > This patch
    > > changes for_each_cpu_mask to use only one function: a newly
    > > introduced find_next_cpu.

    >
    > I believe that it's for_each_cpu_mask which is newly introduced,
    > not find_next_cpu ... just a typo, granted.


    I meant find_next_cpu_mask. That was a last-minute change of mind
    about the name :/.

    > Any chance that you could make the same change to nodemask.h?
    > Where practical, I like to keep cpumask.h and nodemask.h the same.


    Sure. This patch introduces lib/nodemask.c, but I'm not quite sure
    if building it should depend on CONFIG_SMP or something else (NUMA?).
    When is MAX_NUMNODES 1?

    It seems to work on qemu x86_64-smp and i386-up.

    I'ld be happy to take a stab at aligning the cpumask and nodemask
    code even more by uninlining some more functions and using stubs
    for the MAX_NUMNODES=1 case.

    Greetings,
    Alexander

    include/linux/nodemask.h | 11 +++++++----
    lib/Makefile | 2 +-
    lib/nodemask.c | 10 ++++++++++
    3 files changed, 18 insertions(+), 5 deletions(-)
    create mode 100644 lib/nodemask.c

    diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
    index 848025c..dbc80f0 100644
    --- a/include/linux/nodemask.h
    +++ b/include/linux/nodemask.h
    @@ -239,6 +239,8 @@ static inline int __next_node(int n, const nodemask_t *srcp)
    return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1));
    }

    +int find_next_node_mask(int n, const nodemask_t *srcp);
    +
    #define nodemask_of_node(node) \
    ({ \
    typeof(_unused_nodemask_arg_) m; \
    @@ -347,10 +349,11 @@ static inline void __nodes_fold(nodemask_t *dstp, const nodemask_t *origp,
    }

    #if MAX_NUMNODES > 1
    -#define for_each_node_mask(node, mask) \
    - for ((node) = first_node(mask); \
    - (node) < MAX_NUMNODES; \
    - (node) = next_node((node), (mask)))
    +#define for_each_node_mask(node, mask) \
    + for ((node) = 0; \
    + (node) = find_next_node_mask((node), &(mask)), \
    + (node) < MAX_NUMNODES; \
    + (node)++)
    #else /* MAX_NUMNODES == 1 */
    #define for_each_node_mask(node, mask) \
    if (!nodes_empty(mask)) \
    diff --git a/lib/Makefile b/lib/Makefile
    index 74b0cfb..48fc85c 100644
    --- a/lib/Makefile
    +++ b/lib/Makefile
    @@ -9,7 +9,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
    proportions.o prio_heap.o ratelimit.o

    lib-$(CONFIG_MMU) += ioremap.o
    -lib-$(CONFIG_SMP) += cpumask.o
    +lib-$(CONFIG_SMP) += cpumask.o nodemask.o

    lib-y += kobject.o kref.o klist.o

    diff --git a/lib/nodemask.c b/lib/nodemask.c
    new file mode 100644
    index 0000000..08341d3
    --- /dev/null
    +++ b/lib/nodemask.c
    @@ -0,0 +1,10 @@
    +#include
    +#include
    +#include
    +#include
    +
    +int find_next_node_mask(int n, const nodemask_t *srcp)
    +{
    + return find_next_bit(srcp->bits, NR_CPUS, n);
    +}
    +EXPORT_SYMBOL(find_next_node_mask);

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH] Make for_each_cpu_mask a bit smaller


    On Sun, 11 May 2008 09:24:40 -0600, "Matthew Wilcox"
    said:
    > On Sun, May 11, 2008 at 03:50:39PM +0200, Alexander van Heukelum wrote:
    > > #if NR_CPUS > 1
    > > -#define for_each_cpu_mask(cpu, mask) \
    > > - for ((cpu) = first_cpu(mask); \
    > > - (cpu) < NR_CPUS; \
    > > - (cpu) = next_cpu((cpu), (mask)))
    > > +#define for_each_cpu_mask(cpu, mask) \
    > > + for ((cpu) = 0; \
    > > + (cpu) = find_next_cpu_mask((cpu), &(mask)), \
    > > + (cpu) < NR_CPUS; (cpu)++)

    >
    > For anyone else having similar cognitive dissonance while reading this
    > thinking "But won't the first call to find_next_cpu_mask return a number
    > > 0", the answer is "no, find_next_bit returns the next set bit that's
    > >= the number passed in, which is why we need both the cpu++ and

    > find_next_cpu_mask".


    That's how it works, indeed.

    > > +int find_next_cpu_mask(int n, const cpumask_t *srcp)
    > > +{
    > > + return find_next_bit(srcp->bits, NR_CPUS, n);
    > > +}
    > > +EXPORT_SYMBOL(find_next_cpu_mask);

    >
    > Maybe a better name for this function would help. I can't think of a
    > good one right now though.


    I can't think of a better name, and there is find_next_bit of which
    find_next_cpu_mask is just a wrapper. I think the name is good enough.

    Greetings,
    Alexander
    --
    Alexander van Heukelum
    heukelum@fastmail.fm

    --
    http://www.fastmail.fm - Access all of your messages and folders
    wherever you are

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [RFC/PATCH] Make for_each_node_mask out-of-line

    Alexander wrote:
    > Sure. This patch introduces lib/nodemask.c, but I'm not quite sure
    > if building it should depend on CONFIG_SMP or something else (NUMA?).
    > When is MAX_NUMNODES 1?


    Well ... I'm pretty sure it made sense to depend on SMP, back when
    it was first added. However that might have changed. I recall
    vaguely that there has been discussion of this CONFIG_SMP dependency
    every year or so, but I don't have the time right now to dig through
    the archives and code to figure it out.

    So ... offhand ... good questions, but I don't have answers.


    > I'ld be happy to take a stab at aligning the cpumask and nodemask
    > code even more by uninlining some more functions and using stubs
    > for the MAX_NUMNODES=1 case.


    That could be good ... though could you co-ordinate with Mike Travis
    first, to minimize the risks of merge conflicts with what he's doing?

    You kernel text space saving in the first patch seemed worth going
    ahead with even if it did conflict a little, and I liked the matching
    nodemask patch, just to keep things in sync. Other nodemask cleanup
    is a little lower priority in my book, so should make a modest effort
    to co-ordinate with more critical patches, to minimize conflict.

    --
    I won't rest till it's the best ...
    Programmer, Linux Scalability
    Paul Jackson 1.940.382.4214
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH] Make for_each_cpu_mask a bit smaller

    On Sun, May 11, 2008 at 06:19:39PM +0200, Alexander van Heukelum wrote:
    >
    > On Sun, 11 May 2008 09:24:40 -0600, "Matthew Wilcox"
    > said:
    > > On Sun, May 11, 2008 at 03:50:39PM +0200, Alexander van Heukelum wrote:
    > > > #if NR_CPUS > 1
    > > > -#define for_each_cpu_mask(cpu, mask) \
    > > > - for ((cpu) = first_cpu(mask); \
    > > > - (cpu) < NR_CPUS; \
    > > > - (cpu) = next_cpu((cpu), (mask)))
    > > > +#define for_each_cpu_mask(cpu, mask) \
    > > > + for ((cpu) = 0; \
    > > > + (cpu) = find_next_cpu_mask((cpu), &(mask)), \
    > > > + (cpu) < NR_CPUS; (cpu)++)

    > >
    > > For anyone else having similar cognitive dissonance while reading this
    > > thinking "But won't the first call to find_next_cpu_mask return a number
    > > > 0", the answer is "no, find_next_bit returns the next set bit that's
    > > >= the number passed in, which is why we need both the cpu++ and

    > > find_next_cpu_mask".

    >
    > That's how it works, indeed.
    >
    > > > +int find_next_cpu_mask(int n, const cpumask_t *srcp)
    > > > +{
    > > > + return find_next_bit(srcp->bits, NR_CPUS, n);
    > > > +}
    > > > +EXPORT_SYMBOL(find_next_cpu_mask);

    > >
    > > Maybe a better name for this function would help. I can't think of a
    > > good one right now though.

    >
    > I can't think of a better name, and there is find_next_bit of which
    > find_next_cpu_mask is just a wrapper. I think the name is good enough.


    How about doing it this way?

    #define for_each_cpu_mask(cpu, mask) \
    for ((cpu) = -1; \
    (cpu) < NR_CPUS; \
    (cpu) = find_next_cpu_mask((cpu), &(mask)))

    int find_next_cpu_mask(int n, const cpumask_t *srcp)
    {
    return find_next_bit(srcp->bits, NR_CPUS, ++n);
    }

    That actually behaves the way I'd expect a function called
    'find_next_cpu_mask' to work. It also abuses the 'for' condtion
    less and might take a little less text space.

    --
    Intel are signing my paycheques ... these opinions are still mine
    "Bill, look, we understand that you're interested in selling us this
    operating system, but compare it to ours. We can't possibly take such
    a retrograde step."
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH] Make for_each_cpu_mask a bit smaller

    On Sun, 11 May 2008 16:01:12 -0600, "Matthew Wilcox"
    said:
    > On Sun, May 11, 2008 at 06:19:39PM +0200, Alexander van Heukelum wrote:
    > >
    > > On Sun, 11 May 2008 09:24:40 -0600, "Matthew Wilcox"
    > > said:
    > > > On Sun, May 11, 2008 at 03:50:39PM +0200, Alexander van Heukelum wrote:
    > > > > #if NR_CPUS > 1
    > > > > -#define for_each_cpu_mask(cpu, mask) \
    > > > > - for ((cpu) = first_cpu(mask); \
    > > > > - (cpu) < NR_CPUS; \
    > > > > - (cpu) = next_cpu((cpu), (mask)))
    > > > > +#define for_each_cpu_mask(cpu, mask) \
    > > > > + for ((cpu) = 0; \
    > > > > + (cpu) = find_next_cpu_mask((cpu), &(mask)), \
    > > > > + (cpu) < NR_CPUS; (cpu)++)
    > > >
    > > > For anyone else having similar cognitive dissonance while reading this
    > > > thinking "But won't the first call to find_next_cpu_mask return a number
    > > > > 0", the answer is "no, find_next_bit returns the next set bit that's
    > > > >= the number passed in, which is why we need both the cpu++ and
    > > > find_next_cpu_mask".

    > >
    > > That's how it works, indeed.
    > >
    > > > > +int find_next_cpu_mask(int n, const cpumask_t *srcp)
    > > > > +{
    > > > > + return find_next_bit(srcp->bits, NR_CPUS, n);
    > > > > +}
    > > > > +EXPORT_SYMBOL(find_next_cpu_mask);
    > > >
    > > > Maybe a better name for this function would help. I can't think of a
    > > > good one right now though.

    > >
    > > I can't think of a better name, and there is find_next_bit of which
    > > find_next_cpu_mask is just a wrapper. I think the name is good enough.

    >
    > How about doing it this way?
    >
    > #define for_each_cpu_mask(cpu, mask) \
    > for ((cpu) = -1; \
    > (cpu) < NR_CPUS; \
    > (cpu) = find_next_cpu_mask((cpu), &(mask)))
    >
    > int find_next_cpu_mask(int n, const cpumask_t *srcp)
    > {
    > return find_next_bit(srcp->bits, NR_CPUS, ++n);
    > }
    >
    > That actually behaves the way I'd expect a function called
    > 'find_next_cpu_mask' to work. It also abuses the 'for' condtion
    > less and might take a little less text space.


    But it does not work.

    It introduces a stray cpu=-1 iteration if cpu happens to be
    (replaced by) a signed variable.

    It skips the entire loop if cpu happens to be unsigned.

    I don't think that using 'for' in a less conventional way
    is bad if it is hidden in a macro, as long as the name of
    the macro makes the intention sufficiently clear.

    I think of find_next_cpu_mask(cpu, mask) as: "find next
    cpu-index in mask, starting at index cpu". And similar
    with find_next_bit.

    As for the text-space argument, I think you might be right.
    Just not on i386/x86_64 where initialising a register to -1
    can be done in three bytes, initialising to 0 in two bytes
    and an increment in one byte :-).

    Greetings,
    Alexander

    > --
    > Intel are signing my paycheques ... these opinions are still mine
    > "Bill, look, we understand that you're interested in selling us this
    > operating system, but compare it to ours. We can't possibly take such
    > a retrograde step."

    --
    Alexander van Heukelum
    heukelum@fastmail.fm

    --
    http://www.fastmail.fm - mmm... Fastmail...

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [PATCH] Make for_each_cpu_mask a bit smaller

    On Mon, May 12, 2008 at 01:04:54PM +0200, Alexander van Heukelum wrote:
    > > #define for_each_cpu_mask(cpu, mask) \
    > > for ((cpu) = -1; \
    > > (cpu) < NR_CPUS; \
    > > (cpu) = find_next_cpu_mask((cpu), &(mask)))
    > >
    > > int find_next_cpu_mask(int n, const cpumask_t *srcp)
    > > {
    > > return find_next_bit(srcp->bits, NR_CPUS, ++n);
    > > }
    > >
    > > That actually behaves the way I'd expect a function called
    > > 'find_next_cpu_mask' to work. It also abuses the 'for' condtion
    > > less and might take a little less text space.

    >
    > But it does not work.


    You're right, of course. Either of these should work:

    #define for_each_cpu_mask(cpu, mask) \
    for ((cpu) = find_next_cpu_mask(-1, &(mask)); \
    (cpu) < NR_CPUS; \
    (cpu) = find_next_cpu_mask((cpu), &(mask)))

    #define for_each_cpu_mask(cpu, mask) \
    for ((cpu) = -1; \
    (cpu) = find_next_cpu_mask((cpu), &(mask)), (cpu) < NR_CPUS; \
    )

    int find_next_cpu_mask(int n, const cpumask_t *srcp)
    {
    return find_next_bit(srcp->bits, NR_CPUS, ++n);
    }

    > I think of find_next_cpu_mask(cpu, mask) as: "find next
    > cpu-index in mask, starting at index cpu". And similar
    > with find_next_bit.


    If you're determined to stick to your original formulation, renaming it
    to find_valid_cpu_mask() would work better.

    > As for the text-space argument, I think you might be right.
    > Just not on i386/x86_64 where initialising a register to -1
    > can be done in three bytes, initialising to 0 in two bytes
    > and an increment in one byte :-).


    While the initialisation may take one extra byte, the increment is
    done in the function, and so takes at least one byte out of the loop.
    Of course on other instruction sets, that's probably 4 bytes out of the
    loop and they can initialise to -1 as cheaply as init to 0, so it's a
    win on non-x86 and neutral on x86.

    --
    Intel are signing my paycheques ... these opinions are still mine
    "Bill, look, we understand that you're interested in selling us this
    operating system, but compare it to ours. We can't possibly take such
    a retrograde step."
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [RFC/PATCH] Make for_each_node_mask out-of-line

    On Sun, 11 May 2008 16:01:04 -0500, "Paul Jackson" said:
    > Alexander wrote:
    > > Sure. This patch introduces lib/nodemask.c, but I'm not quite sure
    > > if building it should depend on CONFIG_SMP or something else (NUMA?).
    > > When is MAX_NUMNODES 1?

    >
    > Well ... I'm pretty sure it made sense to depend on SMP, back when
    > it was first added. However that might have changed. I recall
    > vaguely that there has been discussion of this CONFIG_SMP dependency
    > every year or so, but I don't have the time right now to dig through
    > the archives and code to figure it out.
    >
    > So ... offhand ... good questions, but I don't have answers.
    >
    > > I'ld be happy to take a stab at aligning the cpumask and nodemask
    > > code even more by uninlining some more functions and using stubs
    > > for the MAX_NUMNODES=1 case.

    >
    > That could be good ... though could you co-ordinate with Mike Travis
    > first, to minimize the risks of merge conflicts with what he's doing?


    I believe the x86#testing tree includes Mike's work? The two patches
    in this thread apply fine to current x86#testing.

    > You kernel text space saving in the first patch seemed worth going
    > ahead with even if it did conflict a little, and I liked the matching
    > nodemask patch, just to keep things in sync. Other nodemask cleanup
    > is a little lower priority in my book, so should make a modest effort
    > to co-ordinate with more critical patches, to minimize conflict.


    Thanks for your guidance.

    Greetings,
    Alexander

    > --
    > I won't rest till it's the best ...
    > Programmer, Linux Scalability
    > Paul Jackson 1.940.382.4214

    --
    Alexander van Heukelum
    heukelum@fastmail.fm

    --
    http://www.fastmail.fm - Accessible with your email software
    or over the web

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [RFC/PATCH] Make for_each_node_mask out-of-line

    Alexander van Heukelum wrote:
    > On Sun, 11 May 2008 16:01:04 -0500, "Paul Jackson" said:
    >> Alexander wrote:
    >>> Sure. This patch introduces lib/nodemask.c, but I'm not quite sure
    >>> if building it should depend on CONFIG_SMP or something else (NUMA?).
    >>> When is MAX_NUMNODES 1?

    >> Well ... I'm pretty sure it made sense to depend on SMP, back when
    >> it was first added. However that might have changed. I recall
    >> vaguely that there has been discussion of this CONFIG_SMP dependency
    >> every year or so, but I don't have the time right now to dig through
    >> the archives and code to figure it out.
    >>
    >> So ... offhand ... good questions, but I don't have answers.
    >>
    >>> I'ld be happy to take a stab at aligning the cpumask and nodemask
    >>> code even more by uninlining some more functions and using stubs
    >>> for the MAX_NUMNODES=1 case.

    >> That could be good ... though could you co-ordinate with Mike Travis
    >> first, to minimize the risks of merge conflicts with what he's doing?

    >
    > I believe the x86#testing tree includes Mike's work? The two patches
    > in this thread apply fine to current x86#testing.


    Yes, my only change right now is to use nr_cpu_ids instead of NR_CPUS
    which short cuts a huge chunk out of the back end of the loop. And my
    changes are in sched/latest (which includes x86/latest).

    Thanks,
    Mike

    >
    >> You kernel text space saving in the first patch seemed worth going
    >> ahead with even if it did conflict a little, and I liked the matching
    >> nodemask patch, just to keep things in sync. Other nodemask cleanup
    >> is a little lower priority in my book, so should make a modest effort
    >> to co-ordinate with more critical patches, to minimize conflict.

    >
    > Thanks for your guidance.
    >
    > Greetings,
    > Alexander
    >
    >> --
    >> I won't rest till it's the best ...
    >> Programmer, Linux Scalability
    >> Paul Jackson 1.940.382.4214


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. [PATCHv2] Make for_each_cpu_mask a bit smaller

    The for_each_cpu_mask loop is used quite often in the kernel. It
    makes use of two functions: first_cpu and next_cpu. This patch
    changes for_each_cpu_mask to use only the latter. Because next_cpu
    finds the next eligible cpu _after_ the given one, the iteration
    variable has to be initialized to -1 and next_cpu has to be
    called with this value before the first iteration. An x86_64
    defconfig kernel (from sched/latest) is about 2500 bytes smaller
    with this patch applied:

    text data bss dec hex filename
    6222517 917952 749932 7890401 7865e1 vmlinux.orig
    6219922 917952 749932 7887806 785bbe vmlinux

    The same size reduction is seen for defconfig+MAXSMP

    text data bss dec hex filename
    6241772 2563968 1492716 10298456 9d2458 vmlinux.orig
    6239211 2563968 1492716 10295895 9d1a57 vmlinux

    Signed-off-by: Alexander van Heukelum

    ---

    Hello Mike, Ingo,

    Here is a version of the for_each_cpu_mask change that applies on
    top of sched/latest. I think it is non-intrusive enough to put
    it in sched/latest?

    This version reuses the already-existing api next_cpu instead
    of inventing a new one; initializing the iteration counter to
    -1 was suggested by Matthew Wilcox.

    Greetings,
    Alexander

    include/linux/cpumask.h | 16 ++++++++--------
    1 files changed, 8 insertions(+), 8 deletions(-)

    diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
    index 73434e5..74b748b 100644
    --- a/include/linux/cpumask.h
    +++ b/include/linux/cpumask.h
    @@ -377,10 +377,10 @@ int __any_online_cpu(const cpumask_t *mask);
    #define first_cpu(src) __first_cpu(&(src))
    #define next_cpu(n, src) __next_cpu((n), &(src))
    #define any_online_cpu(mask) __any_online_cpu(&(mask))
    -#define for_each_cpu_mask(cpu, mask) \
    - for ((cpu) = first_cpu(mask); \
    - (cpu) < NR_CPUS; \
    - (cpu) = next_cpu((cpu), (mask)))
    +#define for_each_cpu_mask(cpu, mask) \
    + for ((cpu) = ~((typeof(cpu))0); \
    + (cpu) = next_cpu((cpu), (mask)), \
    + (cpu) < NR_CPUS; )
    #endif

    #if NR_CPUS <= 64
    @@ -394,10 +394,10 @@ int __any_online_cpu(const cpumask_t *mask);
    int __next_cpu_nr(int n, const cpumask_t *srcp);
    #define next_cpu_nr(n, src) __next_cpu_nr((n), &(src))
    #define cpus_weight_nr(cpumask) __cpus_weight(&(cpumask), nr_cpu_ids)
    -#define for_each_cpu_mask_nr(cpu, mask) \
    - for ((cpu) = first_cpu(mask); \
    - (cpu) < nr_cpu_ids; \
    - (cpu) = next_cpu_nr((cpu), (mask)))
    +#define for_each_cpu_mask_nr(cpu, mask) \
    + for ((cpu) = ~((typeof(cpu))0); \
    + (cpu) = next_cpu_nr((cpu), (mask)), \
    + (cpu) < nr_cpu_ids; )

    #endif /* NR_CPUS > 64 */

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [PATCHv2] Make for_each_cpu_mask a bit smaller

    Alexander van Heukelum writes:

    > +#define for_each_cpu_mask(cpu, mask) \
    > + for ((cpu) = ~((typeof(cpu))0); \


    There is no need for such a complicated expression, -1 will work for
    every (arithmetic) type.

    Andreas.

    --
    Andreas Schwab, SuSE Labs, schwab@suse.de
    SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
    PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
    "And now for something completely different."
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. [PATCHv3] Make for_each_cpu_mask a bit smaller

    The for_each_cpu_mask loop is used quite often in the kernel. It
    makes use of two functions: first_cpu and next_cpu. This patch
    changes for_each_cpu_mask to use only the latter. Because next_cpu
    finds the next eligible cpu _after_ the given one, the iteration
    variable has to be initialized to -1 and next_cpu has to be
    called with this value before the first iteration. An x86_64
    defconfig kernel (from sched/latest) is about 2500 bytes smaller
    with this patch applied:

    text data bss dec hex filename
    6222517 917952 749932 7890401 7865e1 vmlinux.orig
    6219922 917952 749932 7887806 785bbe vmlinux

    The same size reduction is seen for defconfig+MAXSMP

    text data bss dec hex filename
    6241772 2563968 1492716 10298456 9d2458 vmlinux.orig
    6239211 2563968 1492716 10295895 9d1a57 vmlinux

    Signed-off-by: Alexander van Heukelum

    ---

    On Mon, May 12, 2008, Andreas Schwab wrote:
    > > +#define for_each_cpu_mask(cpu, mask) \
    > > + for ((cpu) = ~((typeof(cpu))0); \

    >
    > There is no need for such a complicated expression, -1 will work for
    > every (arithmetic) type.


    Indeed, thanks.

    This version applies on top of sched/latest.

    This version reuses the already-existing api next_cpu instead
    of inventing a new one; initializing the iteration counter to
    -1 was suggested by Matthew Wilcox. Now with a -1 instead of
    an overly carefull ~((typeof(cpu))0). "-1" is properly sign-
    extended even if cpu is u64 in a 32-bit environment.

    Greetings,
    Alexander

    include/linux/cpumask.h | 16 ++++++++--------
    1 files changed, 8 insertions(+), 8 deletions(-)

    diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
    index 73434e5..c24a556 100644
    --- a/include/linux/cpumask.h
    +++ b/include/linux/cpumask.h
    @@ -377,10 +377,10 @@ int __any_online_cpu(const cpumask_t *mask);
    #define first_cpu(src) __first_cpu(&(src))
    #define next_cpu(n, src) __next_cpu((n), &(src))
    #define any_online_cpu(mask) __any_online_cpu(&(mask))
    -#define for_each_cpu_mask(cpu, mask) \
    - for ((cpu) = first_cpu(mask); \
    - (cpu) < NR_CPUS; \
    - (cpu) = next_cpu((cpu), (mask)))
    +#define for_each_cpu_mask(cpu, mask) \
    + for ((cpu) = -1; \
    + (cpu) = next_cpu((cpu), (mask)), \
    + (cpu) < NR_CPUS; )
    #endif

    #if NR_CPUS <= 64
    @@ -394,10 +394,10 @@ int __any_online_cpu(const cpumask_t *mask);
    int __next_cpu_nr(int n, const cpumask_t *srcp);
    #define next_cpu_nr(n, src) __next_cpu_nr((n), &(src))
    #define cpus_weight_nr(cpumask) __cpus_weight(&(cpumask), nr_cpu_ids)
    -#define for_each_cpu_mask_nr(cpu, mask) \
    - for ((cpu) = first_cpu(mask); \
    - (cpu) < nr_cpu_ids; \
    - (cpu) = next_cpu_nr((cpu), (mask)))
    +#define for_each_cpu_mask_nr(cpu, mask) \
    + for ((cpu) = -1; \
    + (cpu) = next_cpu_nr((cpu), (mask)), \
    + (cpu) < nr_cpu_ids; )

    #endif /* NR_CPUS > 64 */

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [PATCHv3] Make for_each_cpu_mask a bit smaller


    * Alexander van Heukelum wrote:

    > The for_each_cpu_mask loop is used quite often in the kernel. It makes
    > use of two functions: first_cpu and next_cpu. This patch changes
    > for_each_cpu_mask to use only the latter. Because next_cpu finds the
    > next eligible cpu _after_ the given one, the iteration variable has to
    > be initialized to -1 and next_cpu has to be called with this value
    > before the first iteration. An x86_64 defconfig kernel (from
    > sched/latest) is about 2500 bytes smaller with this patch applied:
    >
    > text data bss dec hex filename
    > 6222517 917952 749932 7890401 7865e1 vmlinux.orig
    > 6219922 917952 749932 7887806 785bbe vmlinux
    >
    > The same size reduction is seen for defconfig+MAXSMP
    >
    > text data bss dec hex filename
    > 6241772 2563968 1492716 10298456 9d2458 vmlinux.orig
    > 6239211 2563968 1492716 10295895 9d1a57 vmlinux


    applied for testing, thanks Alexander.

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread