[RFC PATCH 0/8]: uninline & uninline - Kernel

This is a discussion on [RFC PATCH 0/8]: uninline & uninline - Kernel ; Hi all, I run some lengthy tests to measure cost of inlines in headers under include/, simple coverage calculations yields to 89% but most of the failed compiles are due to preprocessor cutting the tested block away anyway. Test setup: ...

+ Reply to Thread
Page 1 of 2 1 2 LastLast
Results 1 to 20 of 28

Thread: [RFC PATCH 0/8]: uninline & uninline

  1. [RFC PATCH 0/8]: uninline & uninline

    Hi all,

    I run some lengthy tests to measure cost of inlines in headers under
    include/, simple coverage calculations yields to 89% but most of the
    failed compiles are due to preprocessor cutting the tested block away
    anyway. Test setup: v2.6.24-mm1, make allyesconfig, 32-bit x86,
    gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13). Because one inline was
    tested (function uninlined) at a time, the actual benefits of removing
    multiple inlines may well be below what the sum of those individually
    is (especially when something calls __-func with equal name).

    Ok, here's the top of the list (10000+ bytes):

    -110805 869 f, 198 +, 111003 -, diff: -110805 skb_put
    -41525 2066 f, 3370 +, 44895 -, diff: -41525 IS_ERR
    -36290 42 f, 197 +, 36487 -, diff: -36290 cfi_build_cmd
    -35698 1234 f, 2391 +, 38089 -, diff: -35698 atomic_dec_and_test
    -28162 354 f, 3005 +, 31167 -, diff: -28162 skb_pull
    -23668 392 f, 104 +, 23772 -, diff: -23668 dev_alloc_skb
    -22212 415 f, 130 +, 22342 -, diff: -22212 __dev_alloc_skb
    -21593 356 f, 2418 +, 24011 -, diff: -21593 skb_push
    -19036 478 f, 259 +, 19295 -, diff: -19036 netif_wake_queue
    -18409 396 f, 6447 +, 24856 -, diff: -18409 __skb_pull
    -16420 187 f, 103 +, 16523 -, diff: -16420 dst_release
    -16025 13 f, 280 +, 16305 -, diff: -16025 cfi_send_gen_cmd
    -14925 486 f, 978 +, 15903 -, diff: -14925 add_timer
    -14896 199 f, 558 +, 15454 -, diff: -14896 sg_page
    -12870 36 f, 121 +, 12991 -, diff: -12870 le_key_k_type
    -12310 692 f, 7215 +, 19525 -, diff: -12310 signal_pending
    -11640 251 f, 118 +, 11758 -, diff: -11640 __skb_trim
    -11059 111 f, 293 +, 11352 -, diff: -11059 __nlmsg_put
    -10976 209 f, 123 +, 11099 -, diff: -10976 skb_trim
    -10344 125 f, 462 +, 10806 -, diff: -10344 pskb_may_pull
    -10061 300 f, 1163 +, 11224 -, diff: -10061 try_module_get
    -10008 75 f, 341 +, 10349 -, diff: -10008 nlmsg_put

    ~250 are in 1000+ bytes category and ~440 in 500+. Full list
    has some entries without number because given config doesn't
    build them, and therefore nothing got uninlined, and the missing
    entries consists solely of compile failures, available here:

    http://www.cs.helsinki.fi/u/ijjarvin/inlines/sorted

    I made some patches to uninline couple of them (picked mostly
    net related) to stir up some discussion, however, some of them
    are not ready for inclusion as is (see patch descriptions).
    The cases don't represent all top 8 cases because some of the
    cases require a bit more analysis (e.g., config dependant,
    comments about gcc optimizations).

    The tools I used are available here except the site-specific
    distribute machinery (in addition one needs pretty late
    codiff from Arnaldo's toolset because there were some inline
    related bugs fixed lately):

    http://www.cs.helsinki.fi/u/ijjarvin/inline-tools.git/

    I'm planning to run similar tests also on inlines in headers that
    are not under include/ but it requires minor modifications to
    those tools.

    --
    i.

    ps. I'm sorry about the duplicates, old git-send-email's
    8-bit-header problem bit me again. :-(



    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot

    -23668 392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb

    Signed-off-by: Ilpo Järvinen
    ---
    include/linux/skbuff.h | 17 +----------------
    net/core/skbuff.c | 18 ++++++++++++++++++
    2 files changed, 19 insertions(+), 16 deletions(-)

    diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
    index a9f8f15..df3cce2 100644
    --- a/include/linux/skbuff.h
    +++ b/include/linux/skbuff.h
    @@ -1269,22 +1269,7 @@ static inline struct sk_buff *__dev_alloc_skb(unsigned int length,
    return skb;
    }

    -/**
    - * dev_alloc_skb - allocate an skbuff for receiving
    - * @length: length to allocate
    - *
    - * Allocate a new &sk_buff and assign it a usage count of one. The
    - * buffer has unspecified headroom built in. Users should allocate
    - * the headroom they think they need without accounting for the
    - * built in space. The built in space is used for optimisations.
    - *
    - * %NULL is returned if there is no free memory. Although this function
    - * allocates memory it can be called from an interrupt.
    - */
    -static inline struct sk_buff *dev_alloc_skb(unsigned int length)
    -{
    - return __dev_alloc_skb(length, GFP_ATOMIC);
    -}
    +extern struct sk_buff *dev_alloc_skb(unsigned int length);

    extern struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
    unsigned int length, gfp_t gfp_mask);
    diff --git a/net/core/skbuff.c b/net/core/skbuff.c
    index 14f462b..081bffb 100644
    --- a/net/core/skbuff.c
    +++ b/net/core/skbuff.c
    @@ -263,6 +263,24 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
    return skb;
    }

    +/**
    + * dev_alloc_skb - allocate an skbuff for receiving
    + * @length: length to allocate
    + *
    + * Allocate a new &sk_buff and assign it a usage count of one. The
    + * buffer has unspecified headroom built in. Users should allocate
    + * the headroom they think they need without accounting for the
    + * built in space. The built in space is used for optimisations.
    + *
    + * %NULL is returned if there is no free memory. Although this function
    + * allocates memory it can be called from an interrupt.
    + */
    +struct sk_buff *dev_alloc_skb(unsigned int length)
    +{
    + return __dev_alloc_skb(length, GFP_ATOMIC);
    +}
    +EXPORT_SYMBOL(dev_alloc_skb);
    +
    static void skb_drop_list(struct sk_buff **listp)
    {
    struct sk_buff *list = *listp;
    --
    1.5.2.2

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [RFC PATCH 1/8] [NET]: uninline skb_put, de-bloats a lot

    Ilpo Järvinen wrote:
    > ~500 files changed
    > ...
    > kernel/uninlined.c:
    > skb_put | +104
    > 1 function changed, 104 bytes added, diff: +104
    >
    > vmlinux.o:
    > 869 functions changed, 198 bytes added, 111003 bytes removed, diff: -110805
    >
    > This change is INCOMPLETE, I think that the call to current_text_addr()
    > should be rethought but I don't have a clue how to do that.



    I guess __builtin_return_address(0) would work.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. [RFC PATCH 5/8] [NET]: uninline dst_release

    Codiff stats:
    -16420 187 funcs, 103 +, 16523 -, diff: -16420 --- dst_release

    Signed-off-by: Ilpo Järvinen
    ---
    include/net/dst.h | 10 +---------
    net/core/dst.c | 10 ++++++++++
    2 files changed, 11 insertions(+), 9 deletions(-)

    diff --git a/include/net/dst.h b/include/net/dst.h
    index e3ac7d0..bf33471 100644
    --- a/include/net/dst.h
    +++ b/include/net/dst.h
    @@ -158,15 +158,7 @@ struct dst_entry * dst_clone(struct dst_entry * dst)
    return dst;
    }

    -static inline
    -void dst_release(struct dst_entry * dst)
    -{
    - if (dst) {
    - WARN_ON(atomic_read(&dst->__refcnt) < 1);
    - smp_mb__before_atomic_dec();
    - atomic_dec(&dst->__refcnt);
    - }
    -}
    +extern void dst_release(struct dst_entry *dst);

    /* Children define the path of the packet through the
    * Linux networking. Thus, destinations are stackable.
    diff --git a/net/core/dst.c b/net/core/dst.c
    index 7deef48..cc2e724 100644
    --- a/net/core/dst.c
    +++ b/net/core/dst.c
    @@ -259,6 +259,16 @@ again:
    return NULL;
    }

    +void dst_release(struct dst_entry *dst)
    +{
    + if (dst) {
    + WARN_ON(atomic_read(&dst->__refcnt) < 1);
    + smp_mb__before_atomic_dec();
    + atomic_dec(&dst->__refcnt);
    + }
    +}
    +EXPORT_SYMBOL(dst_release);
    +
    /* Dirty hack. We did it in 2.2 (in __dst_free),
    * we have _very_ good reasons not to repeat
    * this mistake in 2.3, but we have no choice
    --
    1.5.2.2

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. [RFC PATCH 8/8] Jhash in too big for inlining, move under lib/

    vmlinux.o:
    62 functions changed, 66 bytes added, 10935 bytes removed, diff: -10869

    ....+ these to lib/jhash.o:
    jhash_3words: 112
    jhash2: 276
    jhash: 475

    select for networking code might need a more fine-grained approach.

    Signed-off-by: Ilpo Järvinen
    ---
    drivers/infiniband/Kconfig | 1 +
    drivers/net/Kconfig | 1 +
    fs/Kconfig | 1 +
    fs/dlm/Kconfig | 1 +
    fs/gfs2/Kconfig | 1 +
    include/linux/jhash.h | 99 +------------------------------------
    init/Kconfig | 2 +
    lib/Kconfig | 6 ++
    lib/Makefile | 1 +
    lib/jhash.c | 116 ++++++++++++++++++++++++++++++++++++++++++++
    net/Kconfig | 1 +
    11 files changed, 134 insertions(+), 96 deletions(-)
    create mode 100644 lib/jhash.c

    diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
    index a5dc78a..421ab71 100644
    --- a/drivers/infiniband/Kconfig
    +++ b/drivers/infiniband/Kconfig
    @@ -2,6 +2,7 @@ menuconfig INFINIBAND
    tristate "InfiniBand support"
    depends on PCI || BROKEN
    depends on HAS_IOMEM
    + select JHASH
    ---help---
    Core support for InfiniBand (IB). Make sure to also select
    any protocols you wish to use as well as drivers for your
    diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
    index f337800..8257648 100644
    --- a/drivers/net/Kconfig
    +++ b/drivers/net/Kconfig
    @@ -2496,6 +2496,7 @@ config CHELSIO_T3
    tristate "Chelsio Communications T3 10Gb Ethernet support"
    depends on PCI
    select FW_LOADER
    + select JHASH
    help
    This driver supports Chelsio T3-based gigabit and 10Gb Ethernet
    adapters.
    diff --git a/fs/Kconfig b/fs/Kconfig
    index d731282..693fe71 100644
    --- a/fs/Kconfig
    +++ b/fs/Kconfig
    @@ -1667,6 +1667,7 @@ config NFSD
    select LOCKD
    select SUNRPC
    select EXPORTFS
    + select JHASH
    select NFSD_V2_ACL if NFSD_V3_ACL
    select NFS_ACL_SUPPORT if NFSD_V2_ACL
    select NFSD_TCP if NFSD_V4
    diff --git a/fs/dlm/Kconfig b/fs/dlm/Kconfig
    index 2dbb422..f27a99a 100644
    --- a/fs/dlm/Kconfig
    +++ b/fs/dlm/Kconfig
    @@ -4,6 +4,7 @@ menuconfig DLM
    depends on SYSFS && (IPV6 || IPV6=n)
    select CONFIGFS_FS
    select IP_SCTP
    + select JHASH
    help
    A general purpose distributed lock manager for kernel or userspace
    applications.
    diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
    index de8e64c..b9dcabf 100644
    --- a/fs/gfs2/Kconfig
    +++ b/fs/gfs2/Kconfig
    @@ -3,6 +3,7 @@ config GFS2_FS
    depends on EXPERIMENTAL
    select FS_POSIX_ACL
    select CRC32
    + select JHASH
    help
    A cluster filesystem.

    diff --git a/include/linux/jhash.h b/include/linux/jhash.h
    index 2a2f99f..14200c6 100644
    --- a/include/linux/jhash.h
    +++ b/include/linux/jhash.h
    @@ -1,25 +1,6 @@
    #ifndef _LINUX_JHASH_H
    #define _LINUX_JHASH_H

    -/* jhash.h: Jenkins hash support.
    - *
    - * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
    - *
    - * http://burtleburtle.net/bob/hash/
    - *
    - * These are the credits from Bob's sources:
    - *
    - * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
    - * hash(), hash2(), hash3, and mix() are externally useful functions.
    - * Routines to test the hash are included if SELF_TEST is defined.
    - * You can use this free for any purpose. It has no warranty.
    - *
    - * Copyright (C) 2003 David S. Miller (davem@redhat.com)
    - *
    - * I've modified Bob's hash to be useful in the Linux kernel, and
    - * any bugs present are surely my fault. -DaveM
    - */
    -
    /* NOTE: Arguments are modified. */
    #define __jhash_mix(a, b, c) \
    { \
    @@ -41,77 +22,12 @@
    * of bytes. No alignment or length assumptions are made about
    * the input key.
    */
    -static inline u32 jhash(const void *key, u32 length, u32 initval)
    -{
    - u32 a, b, c, len;
    - const u8 *k = key;
    -
    - len = length;
    - a = b = JHASH_GOLDEN_RATIO;
    - c = initval;
    -
    - while (len >= 12) {
    - a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
    - b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
    - c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
    -
    - __jhash_mix(a,b,c);
    -
    - k += 12;
    - len -= 12;
    - }
    -
    - c += length;
    - switch (len) {
    - case 11: c += ((u32)k[10]<<24);
    - case 10: c += ((u32)k[9]<<16);
    - case 9 : c += ((u32)k[8]<<8);
    - case 8 : b += ((u32)k[7]<<24);
    - case 7 : b += ((u32)k[6]<<16);
    - case 6 : b += ((u32)k[5]<<8);
    - case 5 : b += k[4];
    - case 4 : a += ((u32)k[3]<<24);
    - case 3 : a += ((u32)k[2]<<16);
    - case 2 : a += ((u32)k[1]<<8);
    - case 1 : a += k[0];
    - };
    -
    - __jhash_mix(a,b,c);
    -
    - return c;
    -}
    +extern u32 jhash(const void *key, u32 length, u32 initval);

    /* A special optimized version that handles 1 or more of u32s.
    * The length parameter here is the number of u32s in the key.
    */
    -static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
    -{
    - u32 a, b, c, len;
    -
    - a = b = JHASH_GOLDEN_RATIO;
    - c = initval;
    - len = length;
    -
    - while (len >= 3) {
    - a += k[0];
    - b += k[1];
    - c += k[2];
    - __jhash_mix(a, b, c);
    - k += 3; len -= 3;
    - }
    -
    - c += length * 4;
    -
    - switch (len) {
    - case 2 : b += k[1];
    - case 1 : a += k[0];
    - };
    -
    - __jhash_mix(a,b,c);
    -
    - return c;
    -}
    -
    +extern u32 jhash2(const u32 *k, u32 length, u32 initval);

    /* A special ultra-optimized versions that knows they are hashing exactly
    * 3, 2 or 1 word(s).
    @@ -119,16 +35,7 @@ static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
    * NOTE: In partilar the "c += length; __jhash_mix(a,b,c);" normally
    * done at the end is not done here.
    */
    -static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
    -{
    - a += JHASH_GOLDEN_RATIO;
    - b += JHASH_GOLDEN_RATIO;
    - c += initval;
    -
    - __jhash_mix(a, b, c);
    -
    - return c;
    -}
    +extern u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval);

    static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
    {
    diff --git a/init/Kconfig b/init/Kconfig
    index 824d48c..3210626 100644
    --- a/init/Kconfig
    +++ b/init/Kconfig
    @@ -601,6 +601,7 @@ config FUTEX
    bool "Enable futex support" if EMBEDDED
    default y
    select RT_MUTEXES
    + select JHASH
    help
    Disabling this option will cause the kernel to be built without
    support for "fast userspace mutexes". The resulting kernel may not
    @@ -718,6 +719,7 @@ config PROFILING

    config MARKERS
    bool "Activate markers"
    + select JHASH
    help
    Place an empty function call at each marker site. Can be
    dynamically changed for a probe function.
    diff --git a/lib/Kconfig b/lib/Kconfig
    index ba3d104..23d5507 100644
    --- a/lib/Kconfig
    +++ b/lib/Kconfig
    @@ -85,6 +85,12 @@ config GENERIC_ALLOCATOR
    boolean

    #
    +# Jenkins has support is selected if needed
    +#
    +config JHASH
    + boolean
    +
    +#
    # reed solomon support is select'ed if needed
    #
    config REED_SOLOMON
    diff --git a/lib/Makefile b/lib/Makefile
    index 23de261..a100a49 100644
    --- a/lib/Makefile
    +++ b/lib/Makefile
    @@ -49,6 +49,7 @@ obj-$(CONFIG_CRC32) += crc32.o
    obj-$(CONFIG_CRC7) += crc7.o
    obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
    obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
    +obj-$(CONFIG_JHASH) += jhash.o

    obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
    obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
    diff --git a/lib/jhash.c b/lib/jhash.c
    new file mode 100644
    index 0000000..ee4083f
    --- /dev/null
    +++ b/lib/jhash.c
    @@ -0,0 +1,116 @@
    +/* jhash.c: Jenkins hash support.
    + *
    + * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
    + *
    + * http://burtleburtle.net/bob/hash/
    + *
    + * These are the credits from Bob's sources:
    + *
    + * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
    + * hash(), hash2(), hash3, and mix() are externally useful functions.
    + * Routines to test the hash are included if SELF_TEST is defined.
    + * You can use this free for any purpose. It has no warranty.
    + *
    + * Copyright (C) 2003 David S. Miller (davem@redhat.com)
    + *
    + * I've modified Bob's hash to be useful in the Linux kernel, and
    + * any bugs present are surely my fault. -DaveM
    + */
    +#include
    +#include
    +#include
    +
    +/* The most generic version, hashes an arbitrary sequence
    + * of bytes. No alignment or length assumptions are made about
    + * the input key.
    + */
    +u32 jhash(const void *key, u32 length, u32 initval)
    +{
    + u32 a, b, c, len;
    + const u8 *k = key;
    +
    + len = length;
    + a = b = JHASH_GOLDEN_RATIO;
    + c = initval;
    +
    + while (len >= 12) {
    + a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
    + b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
    + c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
    +
    + __jhash_mix(a,b,c);
    +
    + k += 12;
    + len -= 12;
    + }
    +
    + c += length;
    + switch (len) {
    + case 11: c += ((u32)k[10]<<24);
    + case 10: c += ((u32)k[9]<<16);
    + case 9 : c += ((u32)k[8]<<8);
    + case 8 : b += ((u32)k[7]<<24);
    + case 7 : b += ((u32)k[6]<<16);
    + case 6 : b += ((u32)k[5]<<8);
    + case 5 : b += k[4];
    + case 4 : a += ((u32)k[3]<<24);
    + case 3 : a += ((u32)k[2]<<16);
    + case 2 : a += ((u32)k[1]<<8);
    + case 1 : a += k[0];
    + };
    +
    + __jhash_mix(a,b,c);
    +
    + return c;
    +}
    +EXPORT_SYMBOL(jhash);
    +
    +/* A special optimized version that handles 1 or more of u32s.
    + * The length parameter here is the number of u32s in the key.
    + */
    +u32 jhash2(const u32 *k, u32 length, u32 initval)
    +{
    + u32 a, b, c, len;
    +
    + a = b = JHASH_GOLDEN_RATIO;
    + c = initval;
    + len = length;
    +
    + while (len >= 3) {
    + a += k[0];
    + b += k[1];
    + c += k[2];
    + __jhash_mix(a, b, c);
    + k += 3; len -= 3;
    + }
    +
    + c += length * 4;
    +
    + switch (len) {
    + case 2 : b += k[1];
    + case 1 : a += k[0];
    + };
    +
    + __jhash_mix(a,b,c);
    +
    + return c;
    +}
    +EXPORT_SYMBOL(jhash2);
    +
    +/* A special ultra-optimized versions that knows they are hashing exactly
    + * 3, 2 or 1 word(s).
    + *
    + * NOTE: In partilar the "c += length; __jhash_mix(a,b,c);" normally
    + * done at the end is not done here.
    + */
    +u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
    +{
    + a += JHASH_GOLDEN_RATIO;
    + b += JHASH_GOLDEN_RATIO;
    + c += initval;
    +
    + __jhash_mix(a, b, c);
    +
    + return c;
    +}
    +EXPORT_SYMBOL(jhash_3words);
    diff --git a/net/Kconfig b/net/Kconfig
    index 6627c6a..0749381 100644
    --- a/net/Kconfig
    +++ b/net/Kconfig
    @@ -6,6 +6,7 @@ menu "Networking"

    config NET
    bool "Networking support"
    + select JHASH
    ---help---
    Unless you really know what you are doing, you should say Y here.
    The reason is that some programs need kernel networking support even
    --
    1.5.2.2

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. [RFC PATCH 7/8] [SCTP]: uninline sctp_add_cmd_sf

    I added inline to sctp_add_cmd and appropriate comment there to
    avoid adding another call into the call chain. This works at least
    with "gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13)". Alternatively,
    __sctp_add_cmd could be introduced to .h.

    net/sctp/sm_statefuns.c:
    sctp_sf_cookie_wait_prm_abort | -125
    sctp_sf_cookie_wait_prm_shutdown | -75
    sctp_sf_do_9_1_prm_abort | -75
    sctp_sf_shutdown_sent_prm_abort | -50
    sctp_sf_pdiscard | -25
    sctp_stop_t1_and_abort | -100
    sctp_sf_do_9_2_start_shutdown | -154
    __sctp_sf_do_9_1_abort | -50
    sctp_send_stale_cookie_err | -29
    sctp_sf_abort_violation | -181
    sctp_sf_do_9_2_shutdown_ack | -154
    sctp_sf_do_9_2_reshutack | -86
    sctp_sf_tabort_8_4_8 | -28
    sctp_sf_heartbeat | -52
    sctp_sf_shut_8_4_5 | -27
    sctp_eat_data | -246
    sctp_sf_shutdown_sent_abort | -58
    sctp_sf_check_restart_addrs | -50
    sctp_sf_do_unexpected_init | -110
    sctp_sf_sendbeat_8_3 | -107
    sctp_sf_unk_chunk | -65
    sctp_sf_do_prm_asoc | -129
    sctp_sf_do_prm_send | -25
    sctp_sf_do_9_2_prm_shutdown | -50
    sctp_sf_error_closed | -25
    sctp_sf_error_shutdown | -25
    sctp_sf_shutdown_pending_prm_abort | -25
    sctp_sf_do_prm_requestheartbeat | -28
    sctp_sf_do_prm_asconf | -75
    sctp_sf_do_6_3_3_rtx | -104
    sctp_sf_do_6_2_sack | -25
    sctp_sf_t1_init_timer_expire | -133
    sctp_sf_t1_cookie_timer_expire | -104
    sctp_sf_t2_timer_expire | -161
    sctp_sf_t4_timer_expire | -175
    sctp_sf_t5_timer_expire | -75
    sctp_sf_autoclose_timer_expire | -50
    sctp_sf_do_5_2_4_dupcook | -579
    sctp_sf_do_4_C | -125
    sctp_sf_shutdown_pending_abort | -32
    sctp_sf_do_5_1E_ca | -186
    sctp_sf_backbeat_8_3 | -27
    sctp_sf_cookie_echoed_err | -300
    sctp_sf_eat_data_6_2 | -146
    sctp_sf_eat_data_fast_4_4 | -125
    sctp_sf_eat_sack_6_2 | -29
    sctp_sf_operr_notify | -25
    sctp_sf_do_9_2_final | -152
    sctp_sf_do_asconf | -64
    sctp_sf_do_asconf_ack | -284
    sctp_sf_eat_fwd_tsn_fast | -160
    sctp_sf_eat_auth | -86
    sctp_sf_do_5_1B_init | -110
    sctp_sf_do_5_1C_ack | -204
    sctp_sf_do_9_2_shutdown | -78
    sctp_sf_do_ecn_cwr | -24
    sctp_sf_do_ecne | -32
    sctp_sf_eat_fwd_tsn | -135
    sctp_sf_do_5_1D_ce | -197
    sctp_sf_beat_8_3 | -28
    60 functions changed, 6184 bytes removed, diff: -6184
    net/sctp/sm_sideeffect.c:
    sctp_side_effects | -3873
    sctp_do_sm | +3429
    2 functions changed, 3429 bytes added, 3873 bytes removed, diff: -444
    kernel/uninlined.c:
    sctp_add_cmd_sf | +35
    1 function changed, 35 bytes added, diff: +35

    vmlinux.o:
    63 functions changed, 3464 bytes added, 10057 bytes removed, diff: -6593

    Signed-off-by: Ilpo Järvinen
    Cc: Vlad Yasevich
    ---
    include/net/sctp/sm.h | 8 ++------
    net/sctp/command.c | 12 +++++++++++-
    2 files changed, 13 insertions(+), 7 deletions(-)

    diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
    index ef9e7ed..6740b11 100644
    --- a/include/net/sctp/sm.h
    +++ b/include/net/sctp/sm.h
    @@ -385,13 +385,9 @@ static inline int ADDIP_SERIAL_gte(__u16 s, __u16 t)
    return (((s) == (t)) || (((t) - (s)) & ADDIP_SERIAL_SIGN_BIT));
    }

    -
    /* Run sctp_add_cmd() generating a BUG() if there is a failure. */
    -static inline void sctp_add_cmd_sf(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj)
    -{
    - if (unlikely(!sctp_add_cmd(seq, verb, obj)))
    - BUG();
    -}
    +extern void sctp_add_cmd_sf(sctp_cmd_seq_t *seq, sctp_verb_t verb,
    + sctp_arg_t obj);

    /* Check VTAG of the packet matches the sender's own tag. */
    static inline int
    diff --git a/net/sctp/command.c b/net/sctp/command.c
    index bb97733..187da2d 100644
    --- a/net/sctp/command.c
    +++ b/net/sctp/command.c
    @@ -51,8 +51,11 @@ int sctp_init_cmd_seq(sctp_cmd_seq_t *seq)

    /* Add a command to a sctp_cmd_seq_t.
    * Return 0 if the command sequence is full.
    + *
    + * Inline here is not a mistake, this way sctp_add_cmd_sf doesn't need extra
    + * calls, size penalty is of insignificant magnitude here
    */
    -int sctp_add_cmd(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj)
    +inline int sctp_add_cmd(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj)
    {
    if (seq->next_free_slot >= SCTP_MAX_NUM_COMMANDS)
    goto fail;
    @@ -66,6 +69,13 @@ fail:
    return 0;
    }

    +/* Run sctp_add_cmd() generating a BUG() if there is a failure. */
    +void sctp_add_cmd_sf(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj)
    +{
    + if (unlikely(!sctp_add_cmd(seq, verb, obj)))
    + BUG();
    +}
    +
    /* Return the next command structure in a sctp_cmd_seq.
    * Returns NULL at the end of the sequence.
    */
    --
    1.5.2.2

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. [RFC PATCH 6/8] [NET]: uninline skb_trim, de-bloats

    -10976 209 funcs, 123 +, 11099 -, diff: -10976 --- skb_trim

    Signed-off-by: Ilpo Järvinen
    ---
    include/linux/skbuff.h | 16 +---------------
    net/core/skbuff.c | 16 ++++++++++++++++
    2 files changed, 17 insertions(+), 15 deletions(-)

    diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
    index c11f248..75d8a66 100644
    --- a/include/linux/skbuff.h
    +++ b/include/linux/skbuff.h
    @@ -1156,21 +1156,7 @@ static inline void __skb_trim(struct sk_buff *skb, unsigned int len)
    skb_set_tail_pointer(skb, len);
    }

    -/**
    - * skb_trim - remove end from a buffer
    - * @skb: buffer to alter
    - * @len: new length
    - *
    - * Cut the length of a buffer down by removing data from the tail. If
    - * the buffer is already under the length specified it is not modified.
    - * The skb must be linear.
    - */
    -static inline void skb_trim(struct sk_buff *skb, unsigned int len)
    -{
    - if (skb->len > len)
    - __skb_trim(skb, len);
    -}
    -
    +extern void skb_trim(struct sk_buff *skb, unsigned int len);

    static inline int __pskb_trim(struct sk_buff *skb, unsigned int len)
    {
    diff --git a/net/core/skbuff.c b/net/core/skbuff.c
    index 05d43fd..b57cadb 100644
    --- a/net/core/skbuff.c
    +++ b/net/core/skbuff.c
    @@ -931,6 +931,22 @@ unsigned char *skb_pull(struct sk_buff *skb, unsigned int len)
    }
    EXPORT_SYMBOL(skb_pull);

    +/**
    + * skb_trim - remove end from a buffer
    + * @skb: buffer to alter
    + * @len: new length
    + *
    + * Cut the length of a buffer down by removing data from the tail. If
    + * the buffer is already under the length specified it is not modified.
    + * The skb must be linear.
    + */
    +void skb_trim(struct sk_buff *skb, unsigned int len)
    +{
    + if (skb->len > len)
    + __skb_trim(skb, len);
    +}
    +EXPORT_SYMBOL(skb_trim);
    +
    /* Trims skb to length len. It can change skb pointers.
    */

    --
    1.5.2.2

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [RFC PATCH 1/8] [NET]: uninline skb_put, de-bloats a lot

    On Wed, 20 Feb 2008 15:47:11 +0200
    "Ilpo Järvinen" wrote:

    > ~500 files changed
    > ...
    > kernel/uninlined.c:
    > skb_put | +104
    > 1 function changed, 104 bytes added, diff: +104
    >
    > vmlinux.o:
    > 869 functions changed, 198 bytes added, 111003 bytes removed, diff: -110805
    >
    > This change is INCOMPLETE, I think that the call to current_text_addr()
    > should be rethought but I don't have a clue how to do that.


    You want to use __builtin_return_address(0)

    >
    > Signed-off-by: Ilpo Järvinen
    > ---
    > include/linux/skbuff.h | 20 +-------------------
    > net/core/skbuff.c | 21 +++++++++++++++++++++
    > 2 files changed, 22 insertions(+), 19 deletions(-)
    >
    > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
    > index 412672a..5925435 100644
    > --- a/include/linux/skbuff.h
    > +++ b/include/linux/skbuff.h
    > @@ -896,25 +896,7 @@ static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
    > return tmp;
    > }
    >
    > -/**
    > - * skb_put - add data to a buffer
    > - * @skb: buffer to use
    > - * @len: amount of data to add
    > - *
    > - * This function extends the used data area of the buffer. If this would
    > - * exceed the total buffer size the kernel will panic. A pointer to the
    > - * first byte of the extra data is returned.
    > - */
    > -static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
    > -{
    > - unsigned char *tmp = skb_tail_pointer(skb);
    > - SKB_LINEAR_ASSERT(skb);
    > - skb->tail += len;
    > - skb->len += len;
    > - if (unlikely(skb->tail > skb->end))
    > - skb_over_panic(skb, len, current_text_addr());
    > - return tmp;
    > -}
    > +extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
    >
    > static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
    > {
    > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
    > index 4e35422..661d439 100644
    > --- a/net/core/skbuff.c
    > +++ b/net/core/skbuff.c
    > @@ -857,6 +857,27 @@ free_skb:
    > return err;
    > }
    >
    > +/**
    > + * skb_put - add data to a buffer
    > + * @skb: buffer to use
    > + * @len: amount of data to add
    > + *
    > + * This function extends the used data area of the buffer. If this would
    > + * exceed the total buffer size the kernel will panic. A pointer to the
    > + * first byte of the extra data is returned.
    > + */
    > +unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
    > +{
    > + unsigned char *tmp = skb_tail_pointer(skb);
    > + SKB_LINEAR_ASSERT(skb);
    > + skb->tail += len;
    > + skb->len += len;
    > + if (unlikely(skb->tail > skb->end))
    > + skb_over_panic(skb, len, current_text_addr());
    > + return tmp;
    > +}
    > +EXPORT_SYMBOL(skb_put);
    > +
    > /* Trims skb to length len. It can change skb pointers.
    > */
    >
    > --
    > 1.5.2.2
    >
    > --
    > To unsubscribe from this list: send the line "unsubscribe netdev" in
    > the body of a message to majordomo@vger.kernel.org
    > More majordomo info at http://vger.kernel.org/majordomo-info.html
    >

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot


    On Feb 20 2008 15:47, Ilpo Järvinen wrote:
    >
    >-23668 392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb
    >
    >-static inline struct sk_buff *dev_alloc_skb(unsigned int length)
    >-{
    >- return __dev_alloc_skb(length, GFP_ATOMIC);
    >-}
    >+extern struct sk_buff *dev_alloc_skb(unsigned int length);


    Striking. How can this even happen? A callsite which calls

    dev_alloc_skb(n)

    is just equivalent to

    __dev_alloc_skb(n, GFP_ATOMIC);

    which means there's like 4 (or 8 if it's long) bytes more on the
    stack. For a worst case, count in another 8 bytes for push and pop or mov on
    the stack. But that still does not add up to 23 kb.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot

    Jan Engelhardt wrote:
    > On Feb 20 2008 15:47, Ilpo Järvinen wrote:
    >> -23668 392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb
    >>
    >> -static inline struct sk_buff *dev_alloc_skb(unsigned int length)
    >> -{
    >> - return __dev_alloc_skb(length, GFP_ATOMIC);
    >> -}
    >> +extern struct sk_buff *dev_alloc_skb(unsigned int length);

    >
    > Striking. How can this even happen? A callsite which calls
    >
    > dev_alloc_skb(n)
    >
    > is just equivalent to
    >
    > __dev_alloc_skb(n, GFP_ATOMIC);
    >
    > which means there's like 4 (or 8 if it's long) bytes more on the
    > stack. For a worst case, count in another 8 bytes for push and pop or mov on
    > the stack. But that still does not add up to 23 kb.



    __dev_alloc_skb() is also an inline function which performs
    some extra work. Which raises the question - if dev_alloc_skb()
    is uninlined, shouldn't __dev_alloc_skb() be uninline as well?
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot


    On Feb 20 2008 17:27, Patrick McHardy wrote:
    >> Striking. How can this even happen? A callsite which calls
    >>
    >> dev_alloc_skb(n)
    >>
    >> is just equivalent to
    >>
    >> __dev_alloc_skb(n, GFP_ATOMIC);
    >>
    >> which means there's like 4 (or 8 if it's long) bytes more on the
    >> stack. For a worst case, count in another 8 bytes for push and pop or mov on
    >> the stack. But that still does not add up to 23 kb.

    >
    > __dev_alloc_skb() is also an inline function which performs
    > some extra work. Which raises the question - if dev_alloc_skb()
    > is uninlined, shouldn't __dev_alloc_skb() be uninline as well?
    >

    I'd like to see the results when {__dev_alloc_skb is externed
    and dev_alloc_skb remains inlined}.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot

    On Wed, 20 Feb 2008, Jan Engelhardt wrote:

    >
    > On Feb 20 2008 17:27, Patrick McHardy wrote:
    > >> Striking. How can this even happen? A callsite which calls
    > >>
    > >> dev_alloc_skb(n)
    > >>
    > >> is just equivalent to
    > >>
    > >> __dev_alloc_skb(n, GFP_ATOMIC);
    > >>
    > >> which means there's like 4 (or 8 if it's long) bytes more on the
    > >> stack. For a worst case, count in another 8 bytes for push and pop or mov on
    > >> the stack. But that still does not add up to 23 kb.


    I think you misunderstood the results, if I uninlined dev_alloc_skb(), it
    _alone_ was uninlined which basically means that __dev_alloc_skb() that is
    inline as well is included inside that uninlined function.

    When both were inlined, they add up to everywhere, and uninlining
    dev_alloc_skb alone mitigates that for both(!) of them in every place
    where dev_alloc_skb is being called. Because __dev_alloc_skb call sites
    are few, most benefits show up already with dev_alloc_skb uninlining
    alone. On the other hand, if __dev_alloc_skb is uninlined, the size
    reasoning you used above applies to dev_alloc_skb callsites, and that
    is definately less than 23kB.

    > > __dev_alloc_skb() is also an inline function which performs
    > > some extra work. Which raises the question - if dev_alloc_skb()
    > > is uninlined, shouldn't __dev_alloc_skb() be uninline as well?


    Of course that could be done as well, however, I wouldn't be too keen to
    deepen callchain by both of them, ie., uninlined dev_alloc_skb would just
    contain few bytes which perform the call to __dev_alloc_skb which has the
    bit larger content due to that "extra work". IMHO the best solution would
    duplicate the "extra work" to both of them on binary level (obviously
    not on the source level), e.g., by adding static inline ___dev_alloc_skb()
    to .h which is inlined to both of the variants. I'm not too sure if inline
    to __dev_alloc_skb() alone is enough in .c file to result in inlining of
    __dev_alloc_skb to dev_alloc_skb (with all gcc versions and relevant
    optimization settings).

    > I'd like to see the results when {__dev_alloc_skb is externed
    > and dev_alloc_skb remains inlined}.


    The results are right under your nose already... ;-)
    See from the list of the series introduction:

    http://marc.info/?l=linux-netdev&m=120351526210711&w=2

    IMHO more interesting number (which I currently don't have) is the
    _remaining_ benefits of uninlining __dev_alloc_skb after
    dev_alloc_skb was first uninlined.


    --
    i.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [RFC PATCH 7/8] [SCTP]: uninline sctp_add_cmd_sf

    Ilpo Järvinen wrote:
    > I added inline to sctp_add_cmd and appropriate comment there to
    > avoid adding another call into the call chain. This works at least
    > with "gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13)". Alternatively,
    > __sctp_add_cmd could be introduced to .h.
    >


    My only concern was performance regressions, but it looks like it
    doesn't effect anything from the few quick runs I've made.

    Since we are putting sctp_add_cmd_sf() on the call stack, we might
    as well get rid of sctp_add_cmd() and reduce it a bit more.

    -vlad

    diff --git a/include/net/sctp/command.h b/include/net/sctp/command.h
    index 10ae2da..4263af8 100644
    --- a/include/net/sctp/command.h
    +++ b/include/net/sctp/command.h
    @@ -205,12 +205,11 @@ typedef struct {
    int sctp_init_cmd_seq(sctp_cmd_seq_t *seq);

    /* Add a command to an sctp_cmd_seq_t.
    - * Return 0 if the command sequence is full.
    *
    * Use the SCTP_* constructors defined by SCTP_ARG_CONSTRUCTOR() above
    * to wrap data which goes in the obj argument.
    */
    -int sctp_add_cmd(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj);
    +void sctp_add_cmd_sf(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj);

    /* Return the next command structure in an sctp_cmd_seq.
    * Return NULL at the end of the sequence.
    diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
    index ef9e7ed..2481173 100644
    --- a/include/net/sctp/sm.h
    +++ b/include/net/sctp/sm.h
    @@ -385,14 +385,6 @@ static inline int ADDIP_SERIAL_gte(__u16 s, __u16 t)
    return (((s) == (t)) || (((t) - (s)) & ADDIP_SERIAL_SIGN_BIT));
    }

    -
    -/* Run sctp_add_cmd() generating a BUG() if there is a failure. */
    -static inline void sctp_add_cmd_sf(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj)
    -{
    - if (unlikely(!sctp_add_cmd(seq, verb, obj)))
    - BUG();
    -}
    -
    /* Check VTAG of the packet matches the sender's own tag. */
    static inline int
    sctp_vtag_verify(const struct sctp_chunk *chunk,
    diff --git a/net/sctp/command.c b/net/sctp/command.c
    index bb97733..3a06513 100644
    --- a/net/sctp/command.c
    +++ b/net/sctp/command.c
    @@ -51,19 +51,16 @@ int sctp_init_cmd_seq(sctp_cmd_seq_t *seq)

    /* Add a command to a sctp_cmd_seq_t.
    * Return 0 if the command sequence is full.
    + *
    + * Inline here is not a mistake, this way sctp_add_cmd_sf doesn't need extra
    + * calls, size penalty is of insignificant magnitude here
    */
    -int sctp_add_cmd(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj)
    +void sctp_add_cmd_sf(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj)
    {
    - if (seq->next_free_slot >= SCTP_MAX_NUM_COMMANDS)
    - goto fail;
    + BUG_ON(seq->next_free_slot >= SCTP_MAX_NUM_COMMANDS);

    seq->cmds[seq->next_free_slot].verb = verb;
    seq->cmds[seq->next_free_slot++].obj = obj;
    -
    - return 1;
    -
    -fail:
    - return 0;
    }

    /* Return the next command structure in a sctp_cmd_seq.
    diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
    index f2ed647..1475a29 100644
    --- a/net/sctp/sm_statefuns.c
    +++ b/net/sctp/sm_statefuns.c
    @@ -3135,12 +3135,8 @@ sctp_disposition_t sctp_sf_operr_notify(const struct sctp_endpoint *ep,
    if (!ev)
    goto nomem;

    - if (!sctp_add_cmd(commands, SCTP_CMD_EVENT_ULP,
    - SCTP_ULPEVENT(ev))) {
    - sctp_ulpevent_free(ev);
    - goto nomem;
    - }
    -
    + sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
    + SCTP_ULPEVENT(ev));
    sctp_add_cmd_sf(commands, SCTP_CMD_PROCESS_OPERR,
    SCTP_CHUNK(chunk));
    }


  14. Re: [RFC PATCH 7/8] [SCTP]: uninline sctp_add_cmd_sf

    On Wed, 20 Feb 2008, Vlad Yasevich wrote:

    > Ilpo Järvinen wrote:
    > > I added inline to sctp_add_cmd and appropriate comment there to
    > > avoid adding another call into the call chain. This works at least
    > > with "gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13)". Alternatively,
    > > __sctp_add_cmd could be introduced to .h.
    > >

    >
    > My only concern was performance regressions, but it looks like it
    > doesn't effect anything from the few quick runs I've made.


    There was one call made anyway, it's a bit hard to see how it would hurt
    to push that BUG() deeper down (in fact, this is one of the easiest case
    in this respect, many other cases elsewhere that need uninlining don't
    currently make any calls with inlines).

    > Since we are putting sctp_add_cmd_sf() on the call stack, we might
    > as well get rid of sctp_add_cmd() and reduce it a bit more.


    IMHO it is definately better solution for archiving the size reduction,
    I just didn't know before that the only sctp_add_cmd call could be
    converted.

    [...snip...]
    > diff --git a/net/sctp/command.c b/net/sctp/command.c
    > index bb97733..3a06513 100644
    > --- a/net/sctp/command.c
    > +++ b/net/sctp/command.c
    > @@ -51,19 +51,16 @@ int sctp_init_cmd_seq(sctp_cmd_seq_t *seq)
    >
    > /* Add a command to a sctp_cmd_seq_t.
    > * Return 0 if the command sequence is full.
    > + *
    > + * Inline here is not a mistake, this way sctp_add_cmd_sf doesn't need extra
    > + * calls, size penalty is of insignificant magnitude here


    This won't be a necessary note anymore. :-)

    [...snip...]

    --
    i.

  15. Re: [RFC PATCH 7/8] [SCTP]: uninline sctp_add_cmd_sf

    Ilpo Järvinen wrote:
    > On Wed, 20 Feb 2008, Vlad Yasevich wrote:
    >
    >> Ilpo Järvinen wrote:
    >>> I added inline to sctp_add_cmd and appropriate comment there to
    >>> avoid adding another call into the call chain. This works at least
    >>> with "gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13)". Alternatively,
    >>> __sctp_add_cmd could be introduced to .h.
    >>>

    >> My only concern was performance regressions, but it looks like it
    >> doesn't effect anything from the few quick runs I've made.

    >
    > There was one call made anyway, it's a bit hard to see how it would hurt
    > to push that BUG() deeper down (in fact, this is one of the easiest case
    > in this respect, many other cases elsewhere that need uninlining don't
    > currently make any calls with inlines).
    >
    >> Since we are putting sctp_add_cmd_sf() on the call stack, we might
    >> as well get rid of sctp_add_cmd() and reduce it a bit more.

    >
    > IMHO it is definately better solution for archiving the size reduction,
    > I just didn't know before that the only sctp_add_cmd call could be
    > converted.


    That one was a really silly one. The chance of not calling BUG() in
    that one case was so small, that it didn't really make any sense from
    the code robustness side.

    >
    > [...snip...]
    >> diff --git a/net/sctp/command.c b/net/sctp/command.c
    >> index bb97733..3a06513 100644
    >> --- a/net/sctp/command.c
    >> +++ b/net/sctp/command.c
    >> @@ -51,19 +51,16 @@ int sctp_init_cmd_seq(sctp_cmd_seq_t *seq)
    >>
    >> /* Add a command to a sctp_cmd_seq_t.
    >> * Return 0 if the command sequence is full.
    >> + *
    >> + * Inline here is not a mistake, this way sctp_add_cmd_sf doesn't need extra
    >> + * calls, size penalty is of insignificant magnitude here

    >
    > This won't be a necessary note anymore. :-)
    >
    > [...snip...]
    >


    Yeah. If you are going to resubmit, feel free to put my Signed-off-by line.

    -vlad
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: [RFC PATCH 0/8]: uninline & uninline

    On Wed, 20 Feb 2008 15:47:10 +0200 "Ilpo J__rvinen" wrote:

    > Ok, here's the top of the list (10000+ bytes):


    This is good stuff - thanks.

    > -41525 2066 f, 3370 +, 44895 -, diff: -41525 IS_ERR


    This is a surprise. I expect that the -mm-only
    profile-likely-unlikely-macros.patch is the cause of this and mainline
    doesn't have this problem.

    If true, then this likely/unlikely bloat has probably spread into a lot of
    your other results and it all should be redone against mainline, sorry

    (I'm not aware of anyone having used profile-likely-unlikely-macros.patch
    in quite some time. That's unfortunate because it has turned up some
    fairly flagrant code deoptimisations)
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [RFC PATCH 8/8] Jhash in too big for inlining, move under lib/

    On Wed, 20 Feb 2008 15:47:18 +0200 "Ilpo Järvinen" wrote:

    > vmlinux.o:
    > 62 functions changed, 66 bytes added, 10935 bytes removed, diff: -10869
    >
    > ...+ these to lib/jhash.o:
    > jhash_3words: 112
    > jhash2: 276
    > jhash: 475
    >
    > select for networking code might need a more fine-grained approach.


    It should be possible to use a modular jhash.ko. The things which you
    have identified as clients of the jhash library are usually loaded as modules.

    But in the case where someone does (say) NFSD=y we do need jhash.o linked into
    vmlinux also. This is doable in Kconfig but I always forget how. Adrian, Sam
    and Randy are the repositories of knowledge here
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [RFC PATCH 8/8] Jhash in too big for inlining, move under lib/

    On Sat, 23 Feb 2008, Andrew Morton wrote:

    > On Wed, 20 Feb 2008 15:47:18 +0200 "Ilpo Järvinen" wrote:
    >
    > > vmlinux.o:
    > > 62 functions changed, 66 bytes added, 10935 bytes removed, diff: -10869
    > >
    > > ...+ these to lib/jhash.o:
    > > jhash_3words: 112
    > > jhash2: 276
    > > jhash: 475
    > >
    > > select for networking code might need a more fine-grained approach.

    >
    > It should be possible to use a modular jhash.ko. The things which you
    > have identified as clients of the jhash library are usually loaded as modules.
    > But in the case where someone does (say) NFSD=y we do need jhash.o linked into
    > vmlinux also. This is doable in Kconfig but I always forget how.


    Ok, even though its not that likely that one lives without e.g.
    net/ipv4/inet_connection_sock.c or net/netlink/af_netlink.c? But maybe
    some guys "really know what they are doing" and can come up with config
    that would be able to build it as module (for other than proof-of-concept
    uses I mean)... :-/

    > Adrian, Sam and Randy are the repositories of knowledge here


    Thanks, I'll consult them in this. I've never needed to do any Kconfig
    stuff so far so it's no surprise I have very little clue... :-)

    I've one question for you Andrew, how would you like this kind of
    cross-subsys toucher to be merged, through you directly I suppose?


    --
    i.

  19. Re: [RFC PATCH 0/8]: uninline & uninline

    On Sat, 23 Feb 2008, Andrew Morton wrote:

    > On Wed, 20 Feb 2008 15:47:10 +0200 "Ilpo J__rvinen" wrote:
    >
    > > -41525 2066 f, 3370 +, 44895 -, diff: -41525 IS_ERR

    >
    > This is a surprise.


    It surprised me as well, there were something like 10 bytes I just
    couldn't explain in IS_ERR size (kernel/uninlined.c: IS_ERR | +25). I was
    to look into it deeper but didn't have the .o's at hand right away, not so
    trivial to store results of 5000+ build results except some carefully
    picked textual output :-)... Hmm, I'll add logging for the disassembly of
    the uninlined stuff into the next run, that won't cost too much...

    > I expect that the -mm-only
    > profile-likely-unlikely-macros.patch is the cause of this and mainline
    > doesn't have this problem.


    Ahaa, this explain it, I suspected that there was something (un)likely
    related elsewhere as well, no wonder why I couldn't reproduce the 25 bytes
    result in my quick copy-pasted non-kernel test...

    > If true, then this likely/unlikely bloat has probably spread into a lot of
    > your other results and it all should be redone against mainline, sorry


    It isn't that troublesome to redo them, it's mainly automatic combined
    with impatient waiting from my behalf :-)... The spreading problem is
    probably true, to some extent. I did some runs also with carefully
    selected CONFIG.*DEBUG.* off under include/net/ earlier, in general it
    made very little difference, if something bloats, it usually does that
    regardless of .config. There are probably couple of exceptions when the
    size is on the boundary where it's very close of being useful to uninline
    it.

    One interesting thing in there was that the largest offenders are quite
    small per call-site but due to vast number of them even a small benefit
    buys off a lot in kernel wide results. I suspect the differences due to
    (un)likely will be negligle, because the IS_ERR with all its trivialness
    is now mostly -15, so anything clearly larger than that will likely still
    be a win x n (where n is quite large).

    Anyway, I'll see when I get the new set of tests running... :-) I'd prefer
    with CONFIG.*DEBUG off to get larger picture about non-debug builds too
    (especially the scatterlist.h things interest me a lot), do you think that
    would do as well?

    Thanks for your comments, I found them very useful.


    --
    i.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [RFC PATCH 8/8] Jhash in too big for inlining, move under lib/

    Andrew Morton writes:
    >
    > It should be possible to use a modular jhash.ko. The things which you
    > have identified as clients of the jhash library are usually loaded as modules.


    For very small functions like this own modules are quite expensive. First
    everything gets rounded up to at least one 4K page (or worse on architectures
    with larger pages). That just wastes some memory.

    But then since modules live in vmalloc space they also need an own
    TLB entry, which are notouriously scarce in the kernel because often user space
    wants to monopolize them all. So if you're unlucky and user space
    is thrashing the TLB just a single call to this hash function will be an own
    TLB miss and quite expensive.

    It would be better to just always link it in for this case.

    -Andi
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 1 of 2 1 2 LastLast