The median of a trillion numbers

  • If you know the range of the numbers, then I'd do a binary search on the value. Pick a midpoint, get each server to count how many of its numbers are higher or lower, add up the counts, and play the high-low game. When the two totals are the same, you're done. I think this covers the author's edge cases too. :-)

  • Really? This is 10 minutes of Hadoop time if those 1k machines were in the cluster, and 10 minutes is the top end of the estimate.
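The high-low approach from the first comment can be sketched as follows. This is a minimal single-process simulation, not code from the thread. It assumes integer values with known bounds `lo` and `hi`, and it models each "server" as an in-memory list. The names `count_at_most`, `kth_smallest` and `distributed_median` are hypothetical. It uses a slight variant of the stopping rule: each server counts how many values are at or below the pivot, and the search narrows to the smallest value whose total count reaches the target rank. This variant also handles duplicates and an even number of values.

```python
from typing import List, Sequence


def count_at_most(server: Sequence[int], pivot: int) -> int:
    """Hypothetical per-server step: count local values <= pivot.

    In a real cluster each machine would run this over its own data
    and send back a single integer.
    """
    return sum(1 for x in server if x <= pivot)


def kth_smallest(servers: List[Sequence[int]], k: int, lo: int, hi: int) -> int:
    """Binary search on the value range [lo, hi] for the k-th smallest value (1-indexed).

    Assumes every value is an integer in [lo, hi] and 1 <= k <= total count.
    Returns the smallest v such that at least k values overall are <= v.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        # One "round": ask every server for its count and sum the answers.
        total = sum(count_at_most(s, mid) for s in servers)
        if total >= k:
            hi = mid      # the k-th smallest is at or below mid
        else:
            lo = mid + 1  # too few values at or below mid; go higher
    return lo


def distributed_median(servers: List[Sequence[int]], lo: int, hi: int) -> float:
    """Median of all values across servers, given known integer bounds [lo, hi]."""
    n = sum(len(s) for s in servers)
    if n == 0:
        raise ValueError("no values")
    lower = kth_smallest(servers, (n + 1) // 2, lo, hi)
    if n % 2 == 1:
        return float(lower)
    # Even count: average the two middle values.
    upper = kth_smallest(servers, n // 2 + 1, lo, hi)
    return (lower + upper) / 2


if __name__ == "__main__":
    servers = [[5, 1, 9], [3, 7], [2, 8, 4]]
    print(distributed_median(servers, 0, 10))
```

Each round sends one pivot out and one integer back per server, so the network traffic is tiny. The number of rounds is about log2(hi - lo + 1), which is roughly 64 for 64-bit integers. The expensive part is that every round rescans all of the data on each machine.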