fastest transition from bitmask to the position of it's lowest SETbit  Unix
This is a discussion on fastest transition from bitmask to the position of it's lowest SETbit  Unix ; I want to combine a bitmask M with a bytealigned variable V to extract
a (not necessarily) bytealigned parameter P (from a network protocol).
The bitmask of the same bitsize as V has the (successive) bits set to 1,
where ...
 Forum
 OS Forums
 Unix
 fastest transition from bitmask to the position of it's lowest SETbit

fastest transition from bitmask to the position of it's lowest SETbit
I want to combine a bitmask M with a bytealigned variable V to extract
a (not necessarily) bytealigned parameter P (from a network protocol).
The bitmask of the same bitsize as V has the (successive) bits set to 1,
where the parameter is stored, and the rest to 0.
So basically I wanna apply this math:
P = (V AND M) SHR O
where AND is a bitwise AND, SHR is bitwise shiftright and O is the
order of the lowest set bit in the bitmask.
For example:
V = b0110101000110010 (2 bytes)
M = b0000011111110000 (2 bytes)
O = ln (16) / ln (2) = 4
P = b0110101000110010
AND b0000011111110000
SHR 4
= b0000001000110000 SHR 4 = b0000000000100011 = b100011 (= d35)
Now I am quite at a loss to determine that order O in a single
mathematical operation other than the logarithm.
I could of course do this operation in a loop
while (M AND 1) == 0
P = P SHR 1
M = M SHR 1
end while
But this would definitely be slower if the parameter is a very long one.
So  any suggestions to determine the (binary) order of the lowest set
bit of a variable?
Thank you in advance!
Lars

Re: fastest transition from bitmask to the position of it's lowestSET bit
On May 29, 9:40 am, Lars Uffmann wrote:
> For example:
> V = b0110101000110010 (2 bytes)
> M = b0000011111110000 (2 bytes)
> O = ln (16) / ln (2) = 4
>
> P = b0110101000110010
> AND b0000011111110000
> SHR 4
> = b0000001000110000 SHR 4 = b0000000000100011 = b100011 (= d35)
>
> Now I am quite at a loss to determine that order O in a single
> mathematical operation other than the logarithm.
You want (V & M) / (M & M):
V = b0110101000110010
M = b0000011111110000
M = b1111100000010000 < 2's complement
M&M = b0000000000010000 < lowest bit on
V&M = b0000001000110000 < masked
/M&M = b0000000000100011 < divide is like shr low_bit_pos
Jason

Re: fastest transition from bitmask to the position of it's lowestSET bit
On May 29, 10:33 am, "jason.cipri...@gmail.com"
wrote:
> M&M = b0000000000010000 < lowest bit on
It should be apparent but I forgot to mention you'll have to handle it
as a special case if M==0. Also note that if you can guarantee that
your mask bits are contiguous and you want to avoid integer division
for any reason you can substitute (M & (~M << 1)) for (M / M), and
not have to handle M==0 separately as well.
Jason

Re: fastest transition from bitmask to the position of it's lowestSET bit
On May 29, 10:53 am, "jason.cipri...@gmail.com"
wrote:
> for any reason you can substitute (M & (~M << 1)) for (M / M), and
> not have to handle M==0 separately as well.
Ignore everything after ", and" in that sentence, sorry.

Re: fastest transition from bitmask to the position of it's lowestSET bit
Lars Uffmann wrote:
> I want to combine a bitmask M with a bytealigned variable V to extract
> a (not necessarily) bytealigned parameter P (from a network protocol).
>
> The bitmask of the same bitsize as V has the (successive) bits set to 1,
> where the parameter is stored, and the rest to 0.
>
> So basically I wanna apply this math:
>
> P = (V AND M) SHR O
>
> where AND is a bitwise AND, SHR is bitwise shiftright and O is the
> order of the lowest set bit in the bitmask.
>
> For example:
> V = b0110101000110010 (2 bytes)
> M = b0000011111110000 (2 bytes)
> O = ln (16) / ln (2) = 4
>
> P = b0110101000110010
> AND b0000011111110000
> SHR 4
> = b0000001000110000 SHR 4 = b0000000000100011 = b100011 (= d35)
>
> Now I am quite at a loss to determine that order O in a single
> mathematical operation other than the logarithm.
>
> I could of course do this operation in a loop
> while (M AND 1) == 0
> P = P SHR 1
> M = M SHR 1
> end while
>
> But this would definitely be slower if the parameter is a very long one.
>
> So  any suggestions to determine the (binary) order of the lowest set
> bit of a variable?
In most situations the masks are predetermined constants:
You know in advance that you're interested in bits 410 of a
word. If that's your situation, then I'd suggest turning the
problem around: Instead of starting with the mask and trying
to derive the 4, start with the 4 and 10:
#define HIBIT 10
#define LOBIT 4
P = (V >> LOBIT) & ~(1u << (HIBIT  LOBIT));
(You could add special handling for HIBIT==31 and/or LOBIT==0.)
But if the masks actually do arrive "out of the blue" and
you need to deduce LOBIT, a loop is about the best you can do.
You might speed it up somewhat by biting off more than one
bit at a time:
for (m = M; (m & 0xF) == 0; m >>= 4)
p >>= 4;
for ( ; (m & 1) == 0; m >>= 1)
p >>= 1;
You could pursue this strategy further:
p = V & M;
m = M;
if ((m & 0xFFFF) == 0) {
m >>= 16;
p >>= 16;
}
if ((m & 0xFF) == 0) {
m >>= 8;
p >>= 8;
}
if ((m & 0xF) == 0) {
...
You could even resort to a fullscale binary search:
if ((M & 0x0000FFFF) == 0) {
if ((M & 0x00FF0000) == 0) {
if ((M & 0x0F000000) == 0) {
if ((M & 0x30000000) == 0) {
if ((M & 0x40000000) == 0)
p >>= 31;
else
p >>= 30;
}
else {
if ((M & 0x10000000) == 0)
p >>= 29;
else
p >>= 28;
}
}
else ...
It's not a given, though, that any of these will be faster
than a simple loop. Measure, measure, measure.

Eric.Sosman@sun.com

Re: fastest transition from bitmask to the position of it's lowestSET bit
Hey Jason,
jason.cipriani@gmail.com wrote:
> You want (V & M) / (M & M):
Thank you very much! It's embarrassing but it took me like... 40 minutes
to fully understand the 2's complement in binary, why it looks exactly
the way you described and (M & M) will get me the lowest bit on Your
solution is quite a bit easier than what I had figured for the lowest
bit on:
((M << 1) XOR M) AND M
But the crucial part was that of course  I simply have to divide my
masked value by the value of the lowest bit on. Thanks for pointing that
out, even though I feel like I cheated on a simple riddle by not solving
it myself
Best Regards,
Lars

Re: fastest transition from bitmask to the position of it's lowestSET bit
Eric Sosman wrote:
> [...]
> But if the masks actually do arrive "out of the blue" and
> you need to deduce LOBIT, a loop is about the best you can do.
> [...]
Oooh! See Jason Cipriani's gimmick; it's so cool
it assuages my shame in being refuted.

Eric.Sosman@sun.com

Re: fastest transition from bitmask to the position of it's lowest SET bit
Eric Sosman writes:
> Eric Sosman wrote:
>> [...]
>> But if the masks actually do arrive "out of the blue" and
>> you need to deduce LOBIT, a loop is about the best you can do.
>> [...]
>
> Oooh! See Jason Cipriani's gimmick; it's so cool
> it assuages my shame in being refuted.
Hmm ... the first time I have seen this particular expression is
in the ARMspecific part of Linux 2.6 (include/arm/bitops.h):
#define fls(x) \
( __builtin_constant_p(x) ? constant_fls(x) : \
({ int __r; asm("clz\t%0, %1" : "=r"(__r) : "r"(x) : "cc"); 32__r; }) )
#define ffs(x) ({ unsigned long __t = (x); fls(__t & __t); })
#define __ffs(x) (ffs(x)  1)
where it (IIRC) appeared shortly after a somewhat heated discussion
with R. King regarding wether it should be allowed that drivers for
ARM SoCdevices (in particular, C. Hohnstaedt's 2.6 driver for the
NPEs being part of Intel IXP4XXchips) make use of ARMassembly
instructions insofar they are particularly wellsuited to solving a
specific problem[*].
[*] Insofar there is any doubt regarding this: Mr King's position on
this is/ was, of course, that assembly is arcane, cryptic and
just disgusting and must not be used in a driver for any reason and
the people suggestion to use it nevertheless just because it would
make technical sense are clearly 'non desirable inferior lifeforms to
stupid for this universe which Must Be Flamed Out Of Existence'
(Yes, this was me. The IPX4XXdriver I have written for the 2nd
generation [if ever] hardware of my employer still uses clz directly).

Re: fastest transition from bitmask to the position of it's lowestSET bit
Lars Uffmann wrote:
> I want to combine a bitmask M with a bytealigned variable V to extract
> a (not necessarily) bytealigned parameter P (from a network protocol).
>
> The bitmask of the same bitsize as V has the (successive) bits set to 1,
> where the parameter is stored, and the rest to 0.
>
> So basically I wanna apply this math:
>
> P = (V AND M) SHR O
>
> where AND is a bitwise AND, SHR is bitwise shiftright and O is the
> order of the lowest set bit in the bitmask.
The method that Jason presented, using division to do the shift, is
great if you have reasonably fast integer division on your system. If
you don't then you may want to consider other methods.
Most C libraries have an "ffs" function that "Finds the First Set" bit
in a value. (I have a feeling that it's nonstandard but common.) gcc
has a builtin for this that expands to an appropriate single machine
instruction on platforms that have one (and many do). So you can write:
P = (V & M) >> (ffs(M)1)
Try "man ffs" for details. Note the need for the 1.
But: unless you're sure that you need the extra performance, stick with
the obvious loop!
Phil.


Re: fastest transition from bitmask to the position of it's lowestSET bit
Eric Sosman wrote:
> Oooh! See Jason Cipriani's gimmick; it's so cool
> it assuages my shame in being refuted.
Thanks for your time anyways, though I had the idea with the loop
myself  I definitely wanted to use integer arithmetic, for that is a
set amount (in this case 4) of cpu cycles for the calculation. In a loop
I would have had to do at least a binary search tree which would have 
as you demonstrated  involved LOTS more code AND have been slower
my resulting function now looks like this:
long extractParam (long *buf, long mask)
{
return (*buf & mask) / (mask & mask);
}
It really is *that* easy.
Cheers!
Lars
PS: I'm not handling mask = 0 on purpose, the functions domain just
doesn't include mask = 0 (undefined).

Re: fastest transition from bitmask to the position of it's lowest SET bit
Lars Uffmann writes:
> jason.cipriani@gmail.com wrote:
>> It should be apparent but I forgot to mention you'll have to handle it
>> as a special case if M==0.
> You never know  I might have slipped and missed that, but I'm not
> gonna handle parameters that do not exist (i.e. bitmask = 0x0) 
> thanks for mentioning anyways.
>
>> Also note that if you can guarantee that your mask bits are contiguous
> Luckily can
>
>> and you want to avoid integer division for any reason you can substitute
>> (M & (~M << 1)) for (M / M)
>
> Erm  I gather that ~M is the 2complement of M, but then you probably meant
~M is the bitwise inverse of M.

Måns Rullgård
mans@mansr.com

Re: fastest transition from bitmask to the position of it's lowestSET bit
Måns Rullgård wrote:
> ~M is the bitwise inverse of M.
Sorry  of course. ~M = (M XOR 1)
I thought the 2complement was the same, as opposed to M
Just checked wikipedia, cleared up what was still unclear. Thanks for
pointing that out! My formula correction still applies though, as I was
thinking 2complement = bitwise inverse Whatever made me come to that
assumption...
Best Regards,
Lars

Re: fastest transition from bitmask to the position of it's lowestSET bit
Lars Uffmann wrote:
> my resulting function now looks like this:
>
> long extractParam (long *buf, long mask)
> {
> return (*buf & mask) / (mask & mask);
> }
Actually I think I prefer
long extractParam (void *buf, long mask)
{
return (*(long *)buf & mask) / (mask & mask);
}
That way the function doesn't depend on longaligned values. Can someone
tell me how a typecast to long* behaves on a
nonsizeof(long)bytealigned address if the system hardware doesn't
support this? Or is it always possible to use nonaligned variables with
C compilers if you accept slower processing due to software emulation?
Thanks!
Lars

Re: fastest transition from bitmask to the position of it's lowestSET bit
Phil wrote:
> Most C libraries have an "ffs" function that "Finds the First Set" bit
> in a value. (I have a feeling that it's nonstandard but common.)
It's in POSIX/SUS, but is part of the XSI option (i.e. required for
UNIX conformance, but optional for POSIX conformance).

Geoff Clare

Re: fastest transition from bitmask to the position of it's lowestSET bit
Lars Uffmann wrote:
> Eric Sosman wrote:
>> Oooh! See Jason Cipriani's gimmick; it's so cool
>> it assuages my shame in being refuted.
>
> Thanks for your time anyways, though I had the idea with the loop
> myself  I definitely wanted to use integer arithmetic, for that is a
> set amount (in this case 4) of cpu cycles for the calculation. In a loop
> I would have had to do at least a binary search tree which would have 
> as you demonstrated  involved LOTS more code AND have been slower
>
>
> my resulting function now looks like this:
>
> long extractParam (long *buf, long mask)
> {
> return (*buf & mask) / (mask & mask);
> }
>
> It really is *that* easy.
I'd suggest you use `unsigned long' rather than plain
`long', to avoid problems with negative numbers.

Eric Sosman
esosman@ieeedotorg.invalid

Re: fastest transition from bitmask to the position of it's lowest SET bit
Lars Uffmann writes:
> Måns Rullgård wrote:
>> ~M is the bitwise inverse of M.
>
> Sorry  of course. ~M = (M XOR 1)
That is only true for two's complement representation of negative
numbers.
> I thought the 2complement was the same, as opposed to M
>
> Just checked wikipedia, cleared up what was still unclear.
The wikipedia coverage of C related matters is atrocious. Many
statements made there are outright lies, and much of the remainder is
inaccurate or incomplete. If you want to learn C programming, get a
real book on the subject and a copy of the C99 standard.

Måns Rullgård
mans@mansr.com

Re: fastest transition from bitmask to the position of it's lowest SET bit
Lars Uffmann writes:
> Lars Uffmann wrote:
>> my resulting function now looks like this:
>> long extractParam (long *buf, long mask)
>> {
>> return (*buf & mask) / (mask & mask);
>> }
>
> Actually I think I prefer
>
> long extractParam (void *buf, long mask)
> {
> return (*(long *)buf & mask) / (mask & mask);
> }
>
> That way the function doesn't depend on longaligned values. Can
> someone tell me how a typecast to long* behaves on a
> nonsizeof(long)bytealigned address if the system hardware doesn't
> support this?
Your programme with crash with a SIGBUS. On machines where this
doesn't happen, it will run slower, sometimes 100 times slower or
worse.
> Or is it always possible to use nonaligned variables with C
> compilers if you accept slower processing due to software emulation?
No.
Again, I suggest getting a good book on C programming and a copy of
the standard.

Måns Rullgård
mans@mansr.com

Re: fastest transition from bitmask to the position of it's lowestSET bit
Lars Uffmann wrote:
> Lars Uffmann wrote:
>> my resulting function now looks like this:
>>
>> long extractParam (long *buf, long mask)
>> {
>> return (*buf & mask) / (mask & mask);
>> }
>
> Actually I think I prefer
>
> long extractParam (void *buf, long mask)
> {
> return (*(long *)buf & mask) / (mask & mask);
> }
>
> That way the function doesn't depend on longaligned values.
... except that it does! If `buf' points to a misaligned
address, converting it from `void*' to `long*' doesn't magically
fix the alignment.
> Can someone
> tell me how a typecast to long* behaves on a
> nonsizeof(long)bytealigned address if the system hardware doesn't
> support this?
I think you're asking to be told how to do X, starting
with the assumption that X is impossible ...
> Or is it always possible to use nonaligned variables with
> C compilers if you accept slower processing due to software emulation?
The effect of dereferencing an improperlyaligned pointer
is undefined; in theory, anything at all could happen. In
practice, I've encountered three behaviors:
 Hardware trap leading to program termination (most
common)
 Hardware trap leading to slow, manyinstruction byte
bybyte fixup code (less common)
 Hardware pretends the offending loworder address bits
were zero, and silently accesses "nearby" memory rather
than the intended memory (I've seen this once only)
I imagine you'd be unhappy with all three of these ...

Eric Sosman
esosman@ieeedotorg.invalid

Re: fastest transition from bitmask to the position of it's lowest SET bit
Eric Sosman writes:
> Lars Uffmann wrote:
>> Lars Uffmann wrote:
>>> my resulting function now looks like this:
>>>
>>> long extractParam (long *buf, long mask)
>>> {
>>> return (*buf & mask) / (mask & mask);
>>> }
>> Actually I think I prefer
>> long extractParam (void *buf, long mask)
>> {
>> return (*(long *)buf & mask) / (mask & mask);
>> }
>> That way the function doesn't depend on longaligned values.
>
> ... except that it does! If `buf' points to a misaligned
> address, converting it from `void*' to `long*' doesn't magically
> fix the alignment.
>
>> Can someone tell me how a typecast to long* behaves on a
>> nonsizeof(long)bytealigned address if the system hardware doesn't
>> support this?
>
> I think you're asking to be told how to do X, starting
> with the assumption that X is impossible ...
>
>> Or is it always possible to use nonaligned variables with C
>> compilers if you accept slower processing due to software emulation?
>
> The effect of dereferencing an improperlyaligned pointer
> is undefined; in theory, anything at all could happen. In
> practice, I've encountered three behaviors:
>
>  Hardware trap leading to program termination (most
> common)
>
>  Hardware trap leading to slow, manyinstruction byte
> bybyte fixup code (less common)
>
>  Hardware pretends the offending loworder address bits
> were zero, and silently accesses "nearby" memory rather
> than the intended memory (I've seen this once only)
You missed some:
 Hardware silently reads a rounded address, and rotates the value
read according the least significant bits of the address. Some ARM
variants do this.
 Hardware does something unspecified, usually requiring multiple
reads, returning the correct value, only 23 times slower than an
aligned access. Intel x86 does this.

Måns Rullgård
mans@mansr.com