fastest transition from bitmask to the position of it's lowest SETbit - Unix

This is a discussion on fastest transition from bitmask to the position of it's lowest SETbit - Unix ; I want to combine a bitmask M with a byte-aligned variable V to extract a (not necessarily) byte-aligned parameter P (from a network protocol). The bitmask of the same bitsize as V has the (successive) bits set to 1, where ...

+ Reply to Thread
Page 1 of 2 1 2 LastLast
Results 1 to 20 of 30

Thread: fastest transition from bitmask to the position of it's lowest SETbit

  1. fastest transition from bitmask to the position of it's lowest SETbit

    I want to combine a bitmask M with a byte-aligned variable V to extract
    a (not necessarily) byte-aligned parameter P (from a network protocol).

    The bitmask of the same bitsize as V has the (successive) bits set to 1,
    where the parameter is stored, and the rest to 0.

    So basically I wanna apply this math:

    P = (V AND M) SHR O

    where AND is a bitwise AND, SHR is bitwise shift-right and O is the
    order of the lowest set bit in the bitmask.

    For example:
    V = b0110101000110010 (2 bytes)
    M = b0000011111110000 (2 bytes)
    O = ln (16) / ln (2) = 4

    P = b0110101000110010
    AND b0000011111110000
    SHR 4
    = b0000001000110000 SHR 4 = b0000000000100011 = b100011 (= d35)

    Now I am quite at a loss to determine that order O in a single
    mathematical operation other than the logarithm.

    I could of course do this operation in a loop
    while (M AND 1) == 0
    P = P SHR 1
    M = M SHR 1
    end while

    But this would definitely be slower if the parameter is a very long one.

    So - any suggestions to determine the (binary) order of the lowest set
    bit of a variable?

    Thank you in advance!

    Lars

  2. Re: fastest transition from bitmask to the position of it's lowestSET bit

    On May 29, 9:40 am, Lars Uffmann wrote:
    > For example:
    > V = b0110101000110010 (2 bytes)
    > M = b0000011111110000 (2 bytes)
    > O = ln (16) / ln (2) = 4
    >
    > P = b0110101000110010
    > AND b0000011111110000
    > SHR 4
    > = b0000001000110000 SHR 4 = b0000000000100011 = b100011 (= d35)
    >
    > Now I am quite at a loss to determine that order O in a single
    > mathematical operation other than the logarithm.


    You want (V & M) / (M & -M):

    V = b0110101000110010
    M = b0000011111110000
    -M = b1111100000010000 <-- 2's complement
    M&-M = b0000000000010000 <-- lowest bit on
    V&M = b0000001000110000 <-- masked
    /M&-M = b0000000000100011 <-- divide is like shr low_bit_pos

    Jason




  3. Re: fastest transition from bitmask to the position of it's lowestSET bit

    On May 29, 10:33 am, "jason.cipri...@gmail.com"
    wrote:
    > M&-M = b0000000000010000 <-- lowest bit on


    It should be apparent but I forgot to mention you'll have to handle it
    as a special case if M==0. Also note that if you can guarantee that
    your mask bits are contiguous and you want to avoid integer division
    for any reason you can substitute (M & (~M << 1)) for (M / -M), and
    not have to handle M==0 separately as well.

    Jason

  4. Re: fastest transition from bitmask to the position of it's lowestSET bit

    On May 29, 10:53 am, "jason.cipri...@gmail.com"
    wrote:
    > for any reason you can substitute (M & (~M << 1)) for (M / -M), and
    > not have to handle M==0 separately as well.


    Ignore everything after ", and" in that sentence, sorry.

  5. Re: fastest transition from bitmask to the position of it's lowestSET bit

    Lars Uffmann wrote:
    > I want to combine a bitmask M with a byte-aligned variable V to extract
    > a (not necessarily) byte-aligned parameter P (from a network protocol).
    >
    > The bitmask of the same bitsize as V has the (successive) bits set to 1,
    > where the parameter is stored, and the rest to 0.
    >
    > So basically I wanna apply this math:
    >
    > P = (V AND M) SHR O
    >
    > where AND is a bitwise AND, SHR is bitwise shift-right and O is the
    > order of the lowest set bit in the bitmask.
    >
    > For example:
    > V = b0110101000110010 (2 bytes)
    > M = b0000011111110000 (2 bytes)
    > O = ln (16) / ln (2) = 4
    >
    > P = b0110101000110010
    > AND b0000011111110000
    > SHR 4
    > = b0000001000110000 SHR 4 = b0000000000100011 = b100011 (= d35)
    >
    > Now I am quite at a loss to determine that order O in a single
    > mathematical operation other than the logarithm.
    >
    > I could of course do this operation in a loop
    > while (M AND 1) == 0
    > P = P SHR 1
    > M = M SHR 1
    > end while
    >
    > But this would definitely be slower if the parameter is a very long one.
    >
    > So - any suggestions to determine the (binary) order of the lowest set
    > bit of a variable?


    In most situations the masks are predetermined constants:
    You know in advance that you're interested in bits 4-10 of a
    word. If that's your situation, then I'd suggest turning the
    problem around: Instead of starting with the mask and trying
    to derive the 4, start with the 4 and 10:

    #define HIBIT 10
    #define LOBIT 4
    P = (V >> LOBIT) & ~(-1u << (HIBIT - LOBIT));

    (You could add special handling for HIBIT==31 and/or LOBIT==0.)

    But if the masks actually do arrive "out of the blue" and
    you need to deduce LOBIT, a loop is about the best you can do.
    You might speed it up somewhat by biting off more than one
    bit at a time:

    for (m = M; (m & 0xF) == 0; m >>= 4)
    p >>= 4;
    for ( ; (m & 1) == 0; m >>= 1)
    p >>= 1;

    You could pursue this strategy further:

    p = V & M;
    m = M;
    if ((m & 0xFFFF) == 0) {
    m >>= 16;
    p >>= 16;
    }
    if ((m & 0xFF) == 0) {
    m >>= 8;
    p >>= 8;
    }
    if ((m & 0xF) == 0) {
    ...

    You could even resort to a full-scale binary search:

    if ((M & 0x0000FFFF) == 0) {
    if ((M & 0x00FF0000) == 0) {
    if ((M & 0x0F000000) == 0) {
    if ((M & 0x30000000) == 0) {
    if ((M & 0x40000000) == 0)
    p >>= 31;
    else
    p >>= 30;
    }
    else {
    if ((M & 0x10000000) == 0)
    p >>= 29;
    else
    p >>= 28;
    }
    }
    else ...

    It's not a given, though, that any of these will be faster
    than a simple loop. Measure, measure, measure.

    --
    Eric.Sosman@sun.com

  6. Re: fastest transition from bitmask to the position of it's lowestSET bit

    Hey Jason,

    jason.cipriani@gmail.com wrote:
    > You want (V & M) / (M & -M):


    Thank you very much! It's embarrassing but it took me like... 40 minutes
    to fully understand the 2's complement in binary, why it looks exactly
    the way you described and (M & -M) will get me the lowest bit on Your
    solution is quite a bit easier than what I had figured for the lowest
    bit on:
    ((M << 1) XOR M) AND M

    But the crucial part was that of course - I simply have to divide my
    masked value by the value of the lowest bit on. Thanks for pointing that
    out, even though I feel like I cheated on a simple riddle by not solving
    it myself

    Best Regards,

    Lars

  7. Re: fastest transition from bitmask to the position of it's lowestSET bit

    Eric Sosman wrote:
    > [...]
    > But if the masks actually do arrive "out of the blue" and
    > you need to deduce LOBIT, a loop is about the best you can do.
    > [...]


    Oooh! See Jason Cipriani's gimmick; it's so cool
    it assuages my shame in being refuted.

    --
    Eric.Sosman@sun.com

  8. Re: fastest transition from bitmask to the position of it's lowest SET bit

    Eric Sosman writes:
    > Eric Sosman wrote:
    >> [...]
    >> But if the masks actually do arrive "out of the blue" and
    >> you need to deduce LOBIT, a loop is about the best you can do.
    >> [...]

    >
    > Oooh! See Jason Cipriani's gimmick; it's so cool
    > it assuages my shame in being refuted.


    Hmm ... the first time I have seen this particular expression is
    in the ARM-specific part of Linux 2.6 (include/arm/bitops.h):

    #define fls(x) \
    ( __builtin_constant_p(x) ? constant_fls(x) : \
    ({ int __r; asm("clz\t%0, %1" : "=r"(__r) : "r"(x) : "cc"); 32-__r; }) )
    #define ffs(x) ({ unsigned long __t = (x); fls(__t & -__t); })
    #define __ffs(x) (ffs(x) - 1)

    where it (IIRC) appeared shortly after a somewhat heated discussion
    with R. King regarding wether it should be allowed that drivers for
    ARM SoC-devices (in particular, C. Hohnstaedt's 2.6 driver for the
    NPEs being part of Intel IXP4XX-chips) make use of ARM-assembly
    instructions insofar they are particularly well-suited to solving a
    specific problem[*].
    [*] Insofar there is any doubt regarding this: Mr King's position on
    this is/ was, of course, that assembly is arcane, cryptic and
    just disgusting and must not be used in a driver for any reason and
    the people suggestion to use it nevertheless just because it would
    make technical sense are clearly 'non desirable inferior lifeforms to
    stupid for this universe which Must Be Flamed Out Of Existence'
    (Yes, this was me. The IPX4XX-driver I have written for the 2nd
    generation [if ever] hardware of my employer still uses clz directly).

  9. Re: fastest transition from bitmask to the position of it's lowestSET bit

    Lars Uffmann wrote:
    > I want to combine a bitmask M with a byte-aligned variable V to extract
    > a (not necessarily) byte-aligned parameter P (from a network protocol).
    >
    > The bitmask of the same bitsize as V has the (successive) bits set to 1,
    > where the parameter is stored, and the rest to 0.
    >
    > So basically I wanna apply this math:
    >
    > P = (V AND M) SHR O
    >
    > where AND is a bitwise AND, SHR is bitwise shift-right and O is the
    > order of the lowest set bit in the bitmask.


    The method that Jason presented, using division to do the shift, is
    great if you have reasonably fast integer division on your system. If
    you don't then you may want to consider other methods.

    Most C libraries have an "ffs" function that "Finds the First Set" bit
    in a value. (I have a feeling that it's non-standard but common.) gcc
    has a builtin for this that expands to an appropriate single machine
    instruction on platforms that have one (and many do). So you can write:

    P = (V & M) >> (ffs(M)-1)

    Try "man ffs" for details. Note the need for the -1.

    But: unless you're sure that you need the extra performance, stick with
    the obvious loop!


    Phil.

  10. Re: fastest transition from bitmask to the position of it's lowestSET bit

    jason.cipriani@gmail.com wrote:
    > It should be apparent but I forgot to mention you'll have to handle it
    > as a special case if M==0.

    You never know - I might have slipped and missed that, but I'm not gonna
    handle parameters that do not exist (i.e. bitmask = 0x0) - thanks for
    mentioning anyways.

    > Also note that if you can guarantee that your mask bits are contiguous

    Luckily can

    > and you want to avoid integer division for any reason you can substitute
    > (M & (~M << 1)) for (M / -M)


    Erm - I gather that ~M is the 2-complement of M, but then you probably meant
    (M & (~M + 1)) for (M & -M)
    ?

    For your original suggestion doesn't contain M/-M, unless I am just
    unaware of the binary operator rules and the denominator of V&M/(M&-M)
    can be picked apart somehow

    Best Regards,

    Lars

  11. Re: fastest transition from bitmask to the position of it's lowestSET bit

    Eric Sosman wrote:
    > Oooh! See Jason Cipriani's gimmick; it's so cool
    > it assuages my shame in being refuted.


    Thanks for your time anyways, though I had the idea with the loop
    myself - I definitely wanted to use integer arithmetic, for that is a
    set amount (in this case 4) of cpu cycles for the calculation. In a loop
    I would have had to do at least a binary search tree which would have -
    as you demonstrated - involved LOTS more code AND have been slower


    my resulting function now looks like this:

    long extractParam (long *buf, long mask)
    {
    return (*buf & mask) / (mask & -mask);
    }

    It really is *that* easy.

    Cheers!

    Lars

    PS: I'm not handling mask = 0 on purpose, the functions domain just
    doesn't include mask = 0 (undefined).

  12. Re: fastest transition from bitmask to the position of it's lowest SET bit

    Lars Uffmann writes:

    > jason.cipriani@gmail.com wrote:
    >> It should be apparent but I forgot to mention you'll have to handle it
    >> as a special case if M==0.

    > You never know - I might have slipped and missed that, but I'm not
    > gonna handle parameters that do not exist (i.e. bitmask = 0x0) -
    > thanks for mentioning anyways.
    >
    >> Also note that if you can guarantee that your mask bits are contiguous

    > Luckily can
    >
    >> and you want to avoid integer division for any reason you can substitute
    >> (M & (~M << 1)) for (M / -M)

    >
    > Erm - I gather that ~M is the 2-complement of M, but then you probably meant


    ~M is the bitwise inverse of M.

    --
    Måns Rullgård
    mans@mansr.com

  13. Re: fastest transition from bitmask to the position of it's lowestSET bit

    Måns Rullgård wrote:
    > ~M is the bitwise inverse of M.


    Sorry - of course. ~M = (M XOR -1)
    I thought the 2-complement was the same, as opposed to -M

    Just checked wikipedia, cleared up what was still unclear. Thanks for
    pointing that out! My formula correction still applies though, as I was
    thinking 2-complement = bitwise inverse Whatever made me come to that
    assumption...

    Best Regards,

    Lars


  14. Re: fastest transition from bitmask to the position of it's lowestSET bit

    Lars Uffmann wrote:
    > my resulting function now looks like this:
    >
    > long extractParam (long *buf, long mask)
    > {
    > return (*buf & mask) / (mask & -mask);
    > }


    Actually I think I prefer

    long extractParam (void *buf, long mask)
    {
    return (*(long *)buf & mask) / (mask & -mask);
    }

    That way the function doesn't depend on long-aligned values. Can someone
    tell me how a typecast to long* behaves on a
    non-sizeof(long)-byte-aligned address if the system hardware doesn't
    support this? Or is it always possible to use non-aligned variables with
    C compilers if you accept slower processing due to software emulation?

    Thanks!

    Lars

  15. Re: fastest transition from bitmask to the position of it's lowestSET bit

    Phil wrote:

    > Most C libraries have an "ffs" function that "Finds the First Set" bit
    > in a value. (I have a feeling that it's non-standard but common.)


    It's in POSIX/SUS, but is part of the XSI option (i.e. required for
    UNIX conformance, but optional for POSIX conformance).

    --
    Geoff Clare

  16. Re: fastest transition from bitmask to the position of it's lowestSET bit

    Lars Uffmann wrote:
    > Eric Sosman wrote:
    >> Oooh! See Jason Cipriani's gimmick; it's so cool
    >> it assuages my shame in being refuted.

    >
    > Thanks for your time anyways, though I had the idea with the loop
    > myself - I definitely wanted to use integer arithmetic, for that is a
    > set amount (in this case 4) of cpu cycles for the calculation. In a loop
    > I would have had to do at least a binary search tree which would have -
    > as you demonstrated - involved LOTS more code AND have been slower
    >
    >
    > my resulting function now looks like this:
    >
    > long extractParam (long *buf, long mask)
    > {
    > return (*buf & mask) / (mask & -mask);
    > }
    >
    > It really is *that* easy.


    I'd suggest you use `unsigned long' rather than plain
    `long', to avoid problems with negative numbers.

    --
    Eric Sosman
    esosman@ieee-dot-org.invalid

  17. Re: fastest transition from bitmask to the position of it's lowest SET bit

    Lars Uffmann writes:

    > Måns Rullgård wrote:
    >> ~M is the bitwise inverse of M.

    >
    > Sorry - of course. ~M = (M XOR -1)


    That is only true for two's complement representation of negative
    numbers.

    > I thought the 2-complement was the same, as opposed to -M
    >
    > Just checked wikipedia, cleared up what was still unclear.


    The wikipedia coverage of C related matters is atrocious. Many
    statements made there are outright lies, and much of the remainder is
    inaccurate or incomplete. If you want to learn C programming, get a
    real book on the subject and a copy of the C99 standard.

    --
    Måns Rullgård
    mans@mansr.com

  18. Re: fastest transition from bitmask to the position of it's lowest SET bit

    Lars Uffmann writes:

    > Lars Uffmann wrote:
    >> my resulting function now looks like this:
    >> long extractParam (long *buf, long mask)
    >> {
    >> return (*buf & mask) / (mask & -mask);
    >> }

    >
    > Actually I think I prefer
    >
    > long extractParam (void *buf, long mask)
    > {
    > return (*(long *)buf & mask) / (mask & -mask);
    > }
    >
    > That way the function doesn't depend on long-aligned values. Can
    > someone tell me how a typecast to long* behaves on a
    > non-sizeof(long)-byte-aligned address if the system hardware doesn't
    > support this?


    Your programme with crash with a SIGBUS. On machines where this
    doesn't happen, it will run slower, sometimes 100 times slower or
    worse.

    > Or is it always possible to use non-aligned variables with C
    > compilers if you accept slower processing due to software emulation?


    No.

    Again, I suggest getting a good book on C programming and a copy of
    the standard.

    --
    Måns Rullgård
    mans@mansr.com

  19. Re: fastest transition from bitmask to the position of it's lowestSET bit

    Lars Uffmann wrote:
    > Lars Uffmann wrote:
    >> my resulting function now looks like this:
    >>
    >> long extractParam (long *buf, long mask)
    >> {
    >> return (*buf & mask) / (mask & -mask);
    >> }

    >
    > Actually I think I prefer
    >
    > long extractParam (void *buf, long mask)
    > {
    > return (*(long *)buf & mask) / (mask & -mask);
    > }
    >
    > That way the function doesn't depend on long-aligned values.


    ... except that it does! If `buf' points to a mis-aligned
    address, converting it from `void*' to `long*' doesn't magically
    fix the alignment.


    > Can someone
    > tell me how a typecast to long* behaves on a
    > non-sizeof(long)-byte-aligned address if the system hardware doesn't
    > support this?


    I think you're asking to be told how to do X, starting
    with the assumption that X is impossible ...

    > Or is it always possible to use non-aligned variables with
    > C compilers if you accept slower processing due to software emulation?


    The effect of dereferencing an improperly-aligned pointer
    is undefined; in theory, anything at all could happen. In
    practice, I've encountered three behaviors:

    - Hardware trap leading to program termination (most
    common)

    - Hardware trap leading to slow, many-instruction byte-
    by-byte fixup code (less common)

    - Hardware pretends the offending low-order address bits
    were zero, and silently accesses "nearby" memory rather
    than the intended memory (I've seen this once only)

    I imagine you'd be unhappy with all three of these ...

    --
    Eric Sosman
    esosman@ieee-dot-org.invalid

  20. Re: fastest transition from bitmask to the position of it's lowest SET bit

    Eric Sosman writes:

    > Lars Uffmann wrote:
    >> Lars Uffmann wrote:
    >>> my resulting function now looks like this:
    >>>
    >>> long extractParam (long *buf, long mask)
    >>> {
    >>> return (*buf & mask) / (mask & -mask);
    >>> }

    >> Actually I think I prefer
    >> long extractParam (void *buf, long mask)
    >> {
    >> return (*(long *)buf & mask) / (mask & -mask);
    >> }
    >> That way the function doesn't depend on long-aligned values.

    >
    > ... except that it does! If `buf' points to a mis-aligned
    > address, converting it from `void*' to `long*' doesn't magically
    > fix the alignment.
    >
    >> Can someone tell me how a typecast to long* behaves on a
    >> non-sizeof(long)-byte-aligned address if the system hardware doesn't
    >> support this?

    >
    > I think you're asking to be told how to do X, starting
    > with the assumption that X is impossible ...
    >
    >> Or is it always possible to use non-aligned variables with C
    >> compilers if you accept slower processing due to software emulation?

    >
    > The effect of dereferencing an improperly-aligned pointer
    > is undefined; in theory, anything at all could happen. In
    > practice, I've encountered three behaviors:
    >
    > - Hardware trap leading to program termination (most
    > common)
    >
    > - Hardware trap leading to slow, many-instruction byte-
    > by-byte fixup code (less common)
    >
    > - Hardware pretends the offending low-order address bits
    > were zero, and silently accesses "nearby" memory rather
    > than the intended memory (I've seen this once only)


    You missed some:

    - Hardware silently reads a rounded address, and rotates the value
    read according the least significant bits of the address. Some ARM
    variants do this.

    - Hardware does something unspecified, usually requiring multiple
    reads, returning the correct value, only 2-3 times slower than an
    aligned access. Intel x86 does this.

    --
    Måns Rullgård
    mans@mansr.com

+ Reply to Thread
Page 1 of 2 1 2 LastLast