Algorithms_in_C++ 1.0.0
Set of algorithms implemented in C++.
windowed_median.cpp File Reference

An implementation of a median calculation of a sliding window along a data stream. More...

#include <cassert>
#include <cstdlib>
#include <ctime>
#include <list>
#include <set>
#include <vector>
Include dependency graph for windowed_median.cpp:

Classes

class  probability::windowed_median::WindowedMedian
 A class to calculate the median of a leading sliding window at the back of a stream of integer values. More...
 

Namespaces

namespace  probability
 for std::vector - needed in testing
 
namespace  windowed_median
 Functions for the Windowed Median algorithm implementation.
 

Typedefs

using probability::windowed_median::Window = std::list< int >
 
using probability::windowed_median::size_type = Window::size_type
 

Functions

static void test (const std::vector< int > &vals, int windowSize)
 namespace probability More...
 
int main (int argc, const char *argv[])
 Main function. More...
 

Detailed Description

An implementation of a median calculation of a sliding window along a data stream.

Given a stream of integers, the algorithm calculates the median of a fixed size window at the back of the stream. The leading time complexity of this algorithm is O(log(N), and it is inspired by the known algorithm to [find median from (infinite) data stream](https://www.tutorialcup.com/interview/algorithm/find-median-from-data-stream.htm), with the proper modifications to account for the finite window size for which the median is requested

Algorithm

The sliding window is managed by a list, which guarantees O(1) for both pushing and popping. Each new value is pushed to the window back, while a value from the front of the window is popped. In addition, the algorithm manages a multi-value binary search tree (BST), implemented by std::multiset. For each new value that is inserted into the window, it is also inserted to the BST. When a value is popped from the window, it is also erased from the BST. Both insertion and erasion to/from the BST are O(logN) in time, with N the size of the window. Finally, the algorithm keeps a pointer to the root of the BST, and updates its position whenever values are inserted or erased to/from BST. The root of the tree is the median! Hence, median retrieval is always O(1)

Time complexity: O(logN). Space complexity: O(N). N - size of window

Author
Yaniv Hollander

Function Documentation

◆ main()

int main ( int  argc,
const char *  argv[] 
)

Main function.

Parameters
argccommand line argument count (ignored)
argvcommand line array of arguments (ignored)
Returns
0 on exit

A few fixed test cases

Array of sorted values; odd window size

Array of sorted values - decreasing; odd window size

Even window size

Array with repeating values

Array with same values except one

Array that includes repeating values including negatives

Array with large values - sum of few pairs exceeds MAX_INT. Window size is even - testing calculation of average median between two middle values

Random test cases

Array size in the range [5, 20]

Window size in the range [3, 10]

Random array values (positive/negative)

Testing randomized test

197 {
198
199 /// A few fixed test cases
200 test({1, 2, 3, 4, 5, 6, 7, 8, 9}, 3); /// Array of sorted values; odd window size
201 test({9, 8, 7, 6, 5, 4, 3, 2, 1}, 3); /// Array of sorted values - decreasing; odd window size
202 test({9, 8, 7, 6, 5, 4, 5, 6}, 4); /// Even window size
203 test({3, 3, 3, 3, 3, 3, 3, 3, 3}, 3); /// Array with repeating values
204 test({3, 3, 3, 3, 7, 3, 3, 3, 3}, 3); /// Array with same values except one
205 test({4, 3, 3, -5, -5, 1, 3, 4, 5}, 5); /// Array that includes repeating values including negatives
206
207 /// Array with large values - sum of few pairs exceeds MAX_INT. Window size is even - testing calculation of
208 /// average median between two middle values
209 test({470211272, 101027544, 1457850878, 1458777923, 2007237709, 823564440,
210 1115438165, 1784484492, 74243042, 114807987}, 6);
211
212 /// Random test cases
213 std::srand(static_cast<unsigned int>(std::time(nullptr)));
214 std::vector<int> vals;
215 for (int i = 8; i < 100; i++) {
216 const auto n = 1 + std::rand() / ((RAND_MAX + 5u) / 20); /// Array size in the range [5, 20]
217 auto windowSize = 1 + std::rand() / ((RAND_MAX + 3u) / 10); /// Window size in the range [3, 10]
218 vals.clear();
219 vals.reserve(n);
220 for (int i = 0; i < n; i++) {
221 vals.push_back(rand() - RAND_MAX); /// Random array values (positive/negative)
222 }
223 test(vals, windowSize); /// Testing randomized test
224 }
225 return 0;
226}
T clear(T... args)
T push_back(T... args)
T rand(T... args)
T reserve(T... args)
T srand(T... args)
T time(T... args)
static void test(const std::vector< int > &vals, int windowSize)
namespace probability
Definition: windowed_median.cpp:181
Here is the call graph for this function:

◆ test()

static void test ( const std::vector< int > &  vals,
int  windowSize 
)
static

namespace probability

Self-test implementations

Parameters
valsStream of values
windowSizeSize of sliding window

Comparing medians: efficient function vs. Naive one

181 {
182 probability::windowed_median::WindowedMedian windowedMedian(windowSize);
183 for (const auto val : vals) {
184 windowedMedian.insert(val);
185
186 /// Comparing medians: efficient function vs. Naive one
187 assert(windowedMedian.getMedian() == windowedMedian.getMedianNaive());
188 }
189}
A class to calculate the median of a leading sliding window at the back of a stream of integer values...
Definition: windowed_median.cpp:56
Here is the call graph for this function: