Fastest way to scan for bit pattern in a stream of bits


I need to scan for a 16 bit word in a bit stream. It is not guaranteed to be aligned on byte or word boundaries.

What is the fastest way of achieving this? There are various brute-force methods using tables and/or shifts, but are there any "bit twiddling shortcuts" that cut down the number of calculations by giving a yes/no/maybe answer on whether each byte or word contains the flag as it arrives?

C code, intrinsics, x86 machine code would all be interesting.

11/14/2009 9:11:07 AM

Accepted Answer

Simple brute force is sometimes good enough.

I would precalculate all shifted values of the word and put them in 16 ints, so you get arrays like this (assuming int is twice as wide as short):

 unsigned short pattern = 1234;
 unsigned int preShifts[16];
 unsigned int masks[16];
 int i;
 for(i=0; i<16; i++)
 {
      preShifts[i] = (unsigned int)pattern << i;  // widen first, then shift
      masks[i] = 0xffffu << i;                    // 16-bit window at bit offset i
 }

Then, for every unsigned short you get out of the stream, combine it with the previous short into one unsigned int and compare that, under each of the 16 masks, to the 16 pre-shifted values. If any of them match, you've got one.

So basically like this:

  int numMatch(unsigned short curWord, unsigned short prevWord)
  {
       int numHits = 0;
       // cast before shifting so the high bit never lands in a signed int
       unsigned int combinedWords = ((unsigned int)prevWord << 16) | curWord;

       int i;
       for(i=0; i<16; i++)
       {
             if((combinedWords & masks[i]) == preShifts[i]) numHits++;
       }
       return numHits;
  }

Do note that this can report multiple hits when the pattern is detected more than once in the same bits:

e.g. if the input is 32 bits of 0's and the pattern you want to detect is 16 0's, then the pattern is detected 16 times!
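To make the pairing of previous and current word concrete, here is a minimal sketch of a driver loop over a buffer of 16-bit words. The name `count_hits` and the buffer layout (earlier words first, earlier bits in the more significant positions) are assumptions for illustration, not part of the answer above. The first word is compared directly because it has no predecessor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver: counts every bit offset in `words` at which
 * `pattern` occurs, using the pre-shifted value/mask tables. */
size_t count_hits(const uint16_t *words, size_t n, uint16_t pattern)
{
    uint32_t pre_shifts[16], masks[16];
    size_t hits = 0;

    for (int i = 0; i < 16; i++) {
        pre_shifts[i] = (uint32_t)pattern << i;
        masks[i] = (uint32_t)0xffff << i;
    }

    /* The window lying entirely inside the first word. */
    if (n > 0 && words[0] == pattern)
        hits++;

    /* Each later word is paired with its predecessor; shifts 0..15
     * cover every window that ends inside the current word. */
    for (size_t k = 1; k < n; k++) {
        uint32_t combined = ((uint32_t)words[k - 1] << 16) | words[k];
        for (int i = 0; i < 16; i++) {
            if ((combined & masks[i]) == pre_shifts[i])
                hits++;
        }
    }
    return hits;
}
```

With two all-zero words and an all-zero pattern, this counts all 17 bit offsets at which a 16-bit window fits into 32 bits: 16 from the pair check plus the one inside the first word.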

The time cost of this, assuming it compiles approximately as written, is 16 checks per input word. Per input bit, this does one & and one ==, plus a branch or other conditional increment, and also a table lookup for the mask.

The table lookup is unnecessary; by right-shifting combinedWords instead, we get significantly more efficient asm, as shown in another answer which also shows how to vectorize this with SIMD on x86.
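The right-shift idea can be sketched as below. This is only an illustration of the scalar version (the function name `num_match_shift` is made up here, and the SIMD variant from the other answer is not reproduced): instead of shifting the pattern and mask up to each offset, the combined word is shifted down and its low 16 bits are compared with the pattern.

```c
#include <stdint.h>

/* Hypothetical table-free variant: counts the offsets 0..15 at which
 * `pattern` occurs in the 32-bit value formed by prev_word:cur_word. */
int num_match_shift(uint16_t cur_word, uint16_t prev_word, uint16_t pattern)
{
    uint32_t combined = ((uint32_t)prev_word << 16) | cur_word;
    int hits = 0;

    for (int i = 0; i < 16; i++) {
        /* Truncating to 16 bits plays the role of the mask. */
        hits += (uint16_t)(combined >> i) == pattern;
    }
    return hits;
}
```

The comparison yields 0 or 1, so adding it to the counter avoids an explicit branch in the source; whether the compiler emits a branch is up to the compiler.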

8/22/2019 6:06:24 PM

Here is a trick to speed up the search by a factor of 32, if neither the Knuth-Morris-Pratt algorithm on the alphabet of two characters {0, 1} nor reinier's idea is fast enough.

You can first use a table with 256 entries to check, for each byte in your bit stream, whether it is contained in the 16-bit word you are looking for. You get the table with:

unsigned char table[256];
int i;
for (i=0; i<256; i++)
  table[i] = 0; // initialize with false
for (i=0; i<8; i++)
  table[(word >> i) & 0xff] = 1; // mark contained bytes with true (word is the 16-bit pattern)

You can then find possible positions for matches in the bit stream using

for (i=0; i<length; i++) {
  if (table[bitstream[i]]) {
    // here comes the code which checks if there is really a match
  }
}

As at most 8 of the 256 table entries are nonzero, on average (assuming roughly uniformly distributed bytes) you have to take a closer look at only every 32nd position. Only for this byte (combined with the bytes one before and one after) do you then have to use bit operations or masking techniques, as suggested by reinier, to see if there is a match.
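One way the filter and the follow-up check could fit together is sketched below. It is a sketch under stated assumptions, not the answer's own code: the names `build_table` and `count_matches` are made up, and the stream is assumed to be LSB-first within each byte with earlier bytes less significant (the little-endian view). A candidate byte is combined with its neighbours into a 24-bit window, and the eight alignments that fully contain that byte are tested.

```c
#include <stddef.h>
#include <stdint.h>

/* Mark every byte value that appears inside `word` at shifts 0..7. */
void build_table(uint16_t word, unsigned char table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = 0;
    for (int i = 0; i < 8; i++)
        table[(word >> i) & 0xff] = 1;
}

/* Hypothetical search: counts occurrences of `word` at any bit offset,
 * assuming LSB-first bit order within each byte. */
size_t count_matches(const uint8_t *bitstream, size_t length, uint16_t word)
{
    unsigned char table[256];
    size_t hits = 0;

    build_table(word, table);
    for (size_t n = 0; n + 1 < length; n++) {
        if (!table[bitstream[n]])
            continue; /* cheap rejection for most bytes */

        /* 24-bit window: previous byte, candidate byte, next byte. */
        uint32_t prev = (n > 0) ? bitstream[n - 1] : 0;
        uint32_t window = prev
                        | ((uint32_t)bitstream[n] << 8)
                        | ((uint32_t)bitstream[n + 1] << 16);

        /* Shifts 1..8 are the alignments that fully contain byte n and
         * end in byte n+1; the first byte has no predecessor, so only
         * the byte-aligned shift 8 applies there. */
        for (int s = (n > 0) ? 1 : 8; s <= 8; s++)
            hits += ((window >> s) & 0xffff) == word;
    }
    return hits;
}
```

Each bit offset is examined from exactly one candidate byte, so overlapping occurrences are counted once each rather than twice.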

The code assumes that you use little-endian byte order. The order of the bits within a byte can also be an issue (familiar to anyone who has implemented a CRC32 checksum).

Licensed under: CC-BY-SA with attribution