Thursday, July 23, 2009

High Variance Partitioning

Consider all of the partitions of a set N of size N. There are BN of these partitions, where Bi is the Bell number. We want to select partitions that are "different" from other partitions, e.g., to have lots of varied partitions. Additionally, we want partitions of roughly equal size. So, to begin with we'll only consider partitions p of the set into two. The number of such partitions is given by the BN,2 Bell Polynomial, Specifically, the xN/22 term. This is (N choose N/2)/2.

Distance Between Two Partitions


Define the distance δ between two partitions a, b (|b| == |a|) as follows: For each group bi in b, map it to a group ai such that for all groups ai, the total number of elements that differ between ai and it's partner is minimized.

For example, consider the following:
a = {{1,3,5,...}, {2,4,6,...}}
b = {{1,2,5,6,9,10,...}, {3,4,7,8,11,12,...}}

Here, we can match a0 with b0 or b1. Assuming it maps to b0, exactly half of the elements of b0 are in a0 (the integers 1mod4). Similarly, half of the elements of b1 are in a1 (the integers 0mod4). Since each b0 and b1 is of size N/2, and half of the elements are different, there are N/2 total differences; therefore δ(a,b) = N/2. Note that N/2 is the maximum difference of two partitions due to the minimization mentioned above.

Maximizing Differences Between Half Partitions

Now, we can pose some questions about partitioning in half:
  1. How to select k half-partitions such that the total distance between all pairs of partitions is maximized (max δN,k).
  2. What is an algorithm which takes k chosen half-partitions, and finds another partition p such that the total distance between all pairs of the set of the chosen partitions+p is minimized.
You'll notice that question 2 sounds a lot like a greedy formulation to this problem. 1 is really frickin' hard in the sense that there are ((N choose N/2)/2 choose k) choices there, so brute forcing them won't work.

Here is an algorithm which builds partitions which satisfy questions 1 and 2 (sort of) when k < floor(log2(N))
for skip in (2**kk for kk in range(k)):
partition = {0:[], 1:[]}
for n in range(N):
t = int(n/skip) % 2
partition[t].append(n)

The intuition is pretty simple. First, you pick every other element to be in a. Then, the elements 0,1 mod 4. Then, the elements 0,1,2,3 mod 8, etc. Each of these sets is has a difference of N/2 to each other, meaning the average pairwise distance is maximized at N/2.

Now for the Really Hard Questions

Of course, if it were as simple as above, it wouldn't be interesting. So, here are some tougher questions.
  1. For a given N, what is the largest k such that max δN,k is N/2?
  2. Can the k partitions from 1. be found efficiently?
  3. Can the k partitions with max δN,k be found efficiently?
  4. Is the the greedy formulation optimal? More importantly, is there an online way to find k+1 that is "probably good enough?"
  5. Can 4 be done efficiently?
  6. What about 3-way or m-way partitions?
Unfortunately, these answers are elusive. Both wikipedia and google have failed to find any research on this, though it seems like a obvious subject. Here are my thoughts:
  1. It seems like this could be floor(log2(N)), but that's very wrong. A brute-force greedy simulation for N=8 finds 7 partitions with an average distance of N/2 (I bolded the three my above algorithm would find). If the greedy algorithm is sub-optimal, there could be even more partitions.
  2. Perhaps...
  3. No idea.
  4. I think that the greedy formulation is probably sub-optimal, but very close to the real solutions. That is, the greedy algorithm making a mistake will usually be recovered as k grows.
  5. Note that my current algorithm, while labeled as greedy, actually needs to enumerate all partitions and compare them. This is bad. No idea if this is possible either.
  6. I think 3-way and n-way won't be much more complicated than 2-way. Note that we don't really need to consider uneven partitions, because a 3-way split is basically the same as a 1/3 2/3 split.
Thoughts anyone?