This is the well know select algorithm. see http://en.wikipedia.org/wiki/Selection_algorithm.
I need it to find the median value of a set of 3x3x3 voxel values. Since the volume is made of a billion voxels and the algorithm is recursive, it better be a little bit fast. In general it can be expected that values are relatively close.
The fastest known algorithm I have tried out so far uses the quick sort partition function. I would like to know if there is a faster one.
I've "invented" a 20% faster one using two heaps, but expected an even faster one using a hash. Before implementing this, I'd like to know if a blitz fast solution already exist out there.
The fact that I'm using floats shouldn't matter since they can be considered as unsigned integer after inverting the sign bit. The order will be preserved.
EDIT: benchmark and source code moved into a separate answer as suggested by Davy Landman. See below for the answer by chmike.
EDIT: The most efficient algorithm so far was referenced below by Boojum as a link to the Fast Median and Bilateral Filtering paper which is now the answer to this question. The first smart idea of this method is to use radix sort, the second is to combine median search of adjacent pixels who share a lot of pixels.
Since it sounds like you're performing a median filter on a large array of volume data, you might want to take a look at the Fast Median and Bilateral Filtering paper from SIGGRAPH 2006. That paper deals with 2D image processing, but you might be able to adapt the algorithm for 3D volumes. If nothing else, it might give you some ideas on how to step back and look at the problem from a slightly different perspective.
The selection algorithm is linear time (O(n)). Complexity-wise you can't do better than linear time, because it takes linear time to read in all the data. So you couldn't have made something that is faster complexity-wise. Perhaps you have something that is a constant factor faster on certain inputs? I doubt it would make much of a difference.
C++ already includes the linear-time selection algorithm. Why not just use it?
std::vector<YourType>::iterator first = yourContainer.begin(); std::vector<YourType>::iterator last = yourContainer.end(); std::vector<YourType>::iterator middle = first + (last - first) / 2; std::nth_element(first, middle, last); // can specify comparator as optional 4th arg YourType median = *middle;
Edit: Technically, that is only the median for a container of odd length. For one of even length, it will get the "upper" median. If you want the traditional definition of median for even length, you might have to run it twice, once for each of the two "middles" at
first + (last - first) / 2 and
first + (last - first) / 2 - 1 and then average them or something.