How to perform a bitwise operation on floating point numbers


Question

I tried this:

float a = 1.4123;
a = a & (1 << 3);

I get a compiler error saying that the operand of & cannot be of type float.

When I do:

float a = 1.4123;
a = (int)a & (1 << 3);

The program compiles and runs. The only problem is that the bitwise operation is done on the integer obtained by truncating the number, not on the bits of the float itself.

The following is also not allowed.

float a = 1.4123;
a = (void*)a & (1 << 3);

I don't understand why int can be cast to void* but not float.

I am doing this to solve the problem described in the Stack Overflow question "How to solve linear equations using a genetic algorithm?".

1
44
5/23/2017 12:02:14 PM

Accepted Answer

At the language level, there's no such thing as a "bitwise operation on floating-point numbers". Bitwise operations in C/C++ work on the value representation of a number, and the value representation of floating-point numbers is not defined in C/C++. Floating-point numbers don't have bits at the level of value representation, which is why you can't apply bitwise operations to them.

All you can do is analyze the bit content of the raw memory occupied by the floating-point number. For that you need to either use a union as suggested below or (equivalently, and only in C++) reinterpret the floating-point object as an array of unsigned char objects, as in

float f = 5;
unsigned char *c = reinterpret_cast<unsigned char *>(&f);
// inspect memory from c[0] to c[sizeof f - 1]
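As a fuller illustration of the snippet above, here is a minimal sketch that walks the bytes of a float through an unsigned char pointer and returns them as hex text. The helper name float_bytes_hex is made up for this example, and the two-digits-per-byte formatting assumes 8-bit bytes. The byte order you see depends on the platform's endianness.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the object representation of f as hex text,
// one two-digit pair per byte, in memory order (assumes 8-bit bytes).
std::string float_bytes_hex(float f) {
    // Viewing any object through unsigned char* is permitted by the aliasing rules.
    const unsigned char *c = reinterpret_cast<const unsigned char *>(&f);
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < sizeof f; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c[i]));
        out += buf;
    }
    return out;
}
```

Calling float_bytes_hex(5.0f) yields a string of 2 * sizeof(float) hex digits; the exact digits depend on the floating-point format and byte order of the machine.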

And please, don't try to reinterpret a float object as an int object, as other answers suggest. That doesn't make much sense, that is illegal, and that is not guaranteed to work in compilers that follow strict-aliasing rules in optimization. The only legal way to inspect memory content in C++ is by reinterpreting it as an array of [signed/unsigned] char.
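If the goal is to apply a mask to the stored bits, one option that stays within byte-wise access is to copy the bytes into an unsigned integer with std::memcpy, operate on that, and copy them back. The sketch below is only an illustration: the function names are invented here, and it assumes that float is 32 bits wide (checked by the static_assert). What the resulting bits mean as a number depends on the floating-point format in use.

```cpp
#include <cstdint>
#include <cstring>

// Assumption of this sketch: float occupies exactly 32 bits.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Copy the bytes of a float into an integer of the same size.
std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Copy the bytes of an integer back into a float.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: bitwise AND on the stored bits of f.
float and_bits(float f, std::uint32_t mask) {
    return bits_to_float(float_to_bits(f) & mask);
}
```

For the question's example this would be called as a = and_bits(a, 1u << 3). On a platform that uses the IEEE 754 single-precision format, and_bits(x, 0x7FFFFFFFu) clears the sign bit and so returns the absolute value of x.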

Also note that you technically aren't guaranteed that floating-point representation on your system is IEEE 754 (although in practice it is unless you explicitly allow it not to be, and then only with respect to -0.0, ±infinity and NaN).
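If your code depends on the IEEE 754 layout, you can at least ask the implementation what it claims. A small sketch using std::numeric_limits follows; the wrapper name float_is_iec559 is made up for this example, and the value reflects what the compiler reports, which may not account for every optimization flag.

```cpp
#include <limits>

// Hypothetical wrapper: true if the implementation reports that float
// follows IEC 559 (the ISO name for IEEE 754).
constexpr bool float_is_iec559() {
    return std::numeric_limits<float>::is_iec559;
}
```

A static_assert(float_is_iec559(), "...") next to any bit-twiddling code turns a silent portability assumption into a compile-time error.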

71
11/13/2014 11:28:50 PM

If you are trying to change the bits in the floating-point representation, you could do something like this:

union fp_bit_twiddler {
    float f;
    int i;
} q;
q.f = a;
q.i &= (1 << 3);
a = q.f;

As AndreyT notes, accessing a union like this invokes undefined behavior, and the compiler could grow arms and strangle you. Do what he suggests instead.


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow