Distance Between Two Partitions
Define the distance δ between two partitions a, b (|b| == |a|) as follows: For each group bi in b, map it to a group ai such that for all groups ai, the total number of elements that differ between ai and it's partner is minimized.
For example, consider the following:
a = {{1,3,5,...}, {2,4,6,...}}
b = {{1,2,5,6,9,10,...}, {3,4,7,8,11,12,...}}
Here, we can match a0 with b0 or b1. Assuming it maps to b0, exactly half of the elements of b0 are in a0 (the integers 1mod4). Similarly, half of the elements of b1 are in a1 (the integers 0mod4). Since each b0 and b1 is of size N/2, and half of the elements are different, there are N/2 total differences; therefore δ(a,b) = N/2. Note that N/2 is the maximum difference of two partitions due to the minimization mentioned above.
Maximizing Differences Between Half Partitions
Now, we can pose some questions about partitioning in half:
- How to select k half-partitions such that the total distance between all pairs of partitions is maximized (max δN,k).
- What is an algorithm which takes k chosen half-partitions, and finds another partition p such that the total distance between all pairs of the set of the chosen partitions+p is minimized.
Here is an algorithm which builds partitions which satisfy questions 1 and 2 (sort of) when k < floor(log2(N))
for skip in (2**kk for kk in range(k)):
partition = {0:[], 1:[]}
for n in range(N):
t = int(n/skip) % 2
partition[t].append(n)
The intuition is pretty simple. First, you pick every other element to be in a. Then, the elements 0,1 mod 4. Then, the elements 0,1,2,3 mod 8, etc. Each of these sets is has a difference of N/2 to each other, meaning the average pairwise distance is maximized at N/2.
Now for the Really Hard Questions
Of course, if it were as simple as above, it wouldn't be interesting. So, here are some tougher questions.
- For a given N, what is the largest k such that max δN,k is N/2?
- Can the k partitions from 1. be found efficiently?
- Can the k partitions with max δN,k be found efficiently?
- Is the the greedy formulation optimal? More importantly, is there an online way to find k+1 that is "probably good enough?"
- Can 4 be done efficiently?
- What about 3-way or m-way partitions?
- It seems like this could be floor(log2(N)), but that's very wrong. A brute-force greedy simulation for N=8 finds 7 partitions with an average distance of N/2 (I bolded the three my above algorithm would find). If the greedy algorithm is sub-optimal, there could be even more partitions.
- Perhaps...
- No idea.
- I think that the greedy formulation is probably sub-optimal, but very close to the real solutions. That is, the greedy algorithm making a mistake will usually be recovered as k grows.
- Note that my current algorithm, while labeled as greedy, actually needs to enumerate all partitions and compare them. This is bad. No idea if this is possible either.
- I think 3-way and n-way won't be much more complicated than 2-way. Note that we don't really need to consider uneven partitions, because a 3-way split is basically the same as a 1/3 2/3 split.