We have a program that performs a comparison of every pair of elements in a database, and then compares the result with the median value of all values. It was suggested that the program should instead make pairwise comparisons within every possible combination of subsets. How many comparisons would we end up making?

For this example, each element in our database will be represented by an integer. We call these elements $P \in \mathbb{Z}$ the population which is then partitioned into all possible distinct pairwise disjoint subsets $\{P_1, P_2, .. , P_k\}$. Each $P_i$ is called a block of \emph{P} and is distinct; the program should not enumerate the same block multiple times. As an extra complication, each pairwise comparison will need to be also made against the block median. So for each block $P_i$ there are two operations performed:

The first operation need only be performed once per block and the second will need to be performed exactly $\|P_i\|$ times, as each block is also a subset and cannot contain duplicates.

The number of ways can the population \emph{P} be divided into $n$ disjoint non-empty subsets is known as a Stirling number of the second kind, or simply a Stirling partition number. (see [1]) This is defined as:

$\stirling{n}{k} =\frac{1}{k!}\sum\limits_{j=0}{k}(-1)^{k-j}\binom{k}{j}j^n$

So when \emph{P} is divided with $k$ partitions where $n$ is the cardinality of the block it follows that the program will need to obtain $k$ median values and then perform $n$ pairwise comparisons:[sterling]

$C^k_{total} = n\stirling{n}{k} + k$

which is the number of ways a set with n elements can be partitioned into disjoint, non-empty subsets. $S$ the number of ways a set with $n$ elements can be partitioned into $k$ disjoint, non-empty subsets. As the partition changes, all of $n$ needs to be recalculated as does the score of the partition $k$. For every different set of partitions we perform $n$ operations plus $k$ operations as it is possible to have anywhere from 0 to "$n$" partitions in the set.

Then the total number of operations over all possible partitions is: C_{total} = \sum_{k=0}^{|P|}{( n\stirling{n}{k} + k)} C_{total} = n\sum_{k=0}^{n}{\stirling{n}{k}} + \frac{n(n+1)}{2}" C_{total} = nB_n + \frac{n(n+1)}{2} Where B_n is the nth Bell number. We can now use this formula to calculate the exact number of comparisons made during the execution of the program

Further Reading