The $5000 Compression Challenge
Poetic justice. I love this quotation from Mike:
"I think that I did not make a mistake except to believe that you truly intended to attempt to compress the data I was sending you."
Mike is happily preying upon people by taking their money to enter what he believes is an impossible contest, but the moment he is outsmarted he appeals to morals and calls into question whether Patrick was acting honestly.
I'm guessing there is some backstory on this newsgroup involving people who would claim to invent compression algorithms that do the impossible. I'm imagining that one day Mike thought "time to get these people to put their money where their mouth is."
Still, it's pretty low to pose the challenge without saying "I'm taking this bet because it's highly unlikely that you can actually succeed. This is explained in the FAQ. Are you sure you want to give me $100?"
Mike Goldman does not need to ever issue a challenge again.
It is WIDELY agreed upon by people who engage in this activity regularly (known as "prop betting") that the spirit of the law does not matter one bit.
He was way out of his element here and Patrick was well within his rights to do what he did.
The pigeonhole principle only applies to a specific compression algorithm.
If you get to custom write the algorithm for the data you can arrange things so that for this datafile it will be smaller, yet larger for any other file, and still meet the principle.
i.e. this is not actually impossible. If you can find ANY redundancy in the data, and you can code a very small decompressor specifically for that, you can probably win.
And due to the nature of randomness, there are always numeric streaks and other patterns in the data. The larger the data file, the better the chance of finding some sort of pattern or streak.
The decompressor does not have to be large either - it would be perfectly legal to use a Perl script, for example (i.e. so you don't have to spend space writing I/O handling code).
Edit: I suspect I may be wrong here.
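For illustration only, here's a toy sketch (in Python rather than Perl) of what such a special-purpose decompressor might look like, assuming - hypothetically - that the random file happened to contain a long run of one byte value; the offset, length, and byte below are made up, and it only pays off if the run removed is longer than this script plus the bookkeeping:

    import sys

    # Made-up constants for illustration: suppose the original file had a run
    # of 200 copies of byte 0x7f starting at offset 123456, and the
    # "compressed" file is the original with that run cut out.
    OFFSET, LENGTH, BYTE = 123456, 200, 0x7f

    data = open(sys.argv[1], "rb").read()
    out = data[:OFFSET] + bytes([BYTE]) * LENGTH + data[OFFSET:]
    open(sys.argv[2], "wb").write(out)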
This is an old story, but a fun one.
To me it is clear that Mike Goldman should have paid up. Patrick clearly completed the challenge placed on him.
While Patrick didn't obey the spirit of the challenge he obeyed the law, and Mike knew full well that the spirit of his challenge was impossible!
Here's another approach:
Let the compressed file contain a hash (say, SHA1) of the original file. The decompression program then generates random files of the chosen size. If a generated file's hash doesn't match the desired hash, delete it. Now run the program for a very long time. The program is likely to eventually reproduce the original file (along with a bunch of files that happen to have the same hash), and you win :)
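A minimal sketch of that hash-based "decompressor", assuming SHA-1 via Python's hashlib; needless to say, the expected running time is astronomical:

    import hashlib
    import os

    def decompress(target_sha1_hex, length):
        # Keep generating random files of the right size until one hashes to
        # the stored SHA-1. Eventually this reproduces the original (or a
        # collision), but "eventually" is longer than the age of the universe
        # for any non-trivial length.
        while True:
            data = os.urandom(length)
            if hashlib.sha1(data).hexdigest() == target_sha1_hex:
                return data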
Previous discussion:
Even without doing any filesystem tricks, the challenge seems quite possible. While it's true that writing a compressor that compresses all input data is impossible, this barrier is removed because the compressor only needs to work on a single file.
It is definitely possible to win this challenge though.
Consider an arbitrarily long series of integers. Somewhere within this series there will be some kind of randomly occurring pattern, since this is a property of an infinite set. E.g. somewhere within the data set there could be the values [1, 2, 3, ... 10] or [1, 3, 9, ... 27] or [1, 2, 4, 8, 16, 32] - it does not matter which of these patterns exists, only that there exists some mathematical pattern in the data.
The chance of there being no pattern in a big enough set of random data is vanishingly small, as there is only a finite number of possible byte combinations (256 values per byte, 256^2 per pair, etc.). I guess a data set of 256^256 bytes would guarantee a pattern, but I'm sure a far smaller size would give 99% confidence.
Once you find a pattern in the data, you can remove that pattern and replace it with code that will recreate the pattern using a data offset. I.e. you remove the pattern from the data completely and replace it with a smaller piece of code that recreates the pattern exactly and inserts it at the correct position.
The key here is that once the data has been generated, it is no longer 'random data' but a fixed input. E.g. you cannot compress a random unknown string of bytes, but you can compress the string [1, 2, 4, 8, 16, ...]
The output data would have all possible mathematical patterns removed from it, and the decompression code would be just a list of mathematical functions and data offset points.
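As a rough sketch of that scheme, here is a toy compressor (my own illustration, not anything from the contest) that looks only for the simplest kind of pattern - the longest arithmetic run of bytes, i.e. a constant step such as 5, 6, 7, 8 or 10, 13, 16 - cuts it out, and records how to rebuild it. Whether the four recorded numbers plus the decompressor ever cost less than the bytes removed is exactly the catch the other commenters point out:

    def find_longest_arithmetic_run(data):
        # Longest stretch where consecutive bytes differ by a constant step
        # (mod 256). Returns (offset, length, step). Assumes non-empty data.
        best = (0, 1, 0)
        i = 0
        while i + 1 < len(data):
            step = (data[i + 1] - data[i]) % 256
            j = i + 1
            while j + 1 < len(data) and (data[j + 1] - data[j]) % 256 == step:
                j += 1
            if j - i + 1 > best[1]:
                best = (i, j - i + 1, step)
            i = j
        return best

    def compress(data):
        off, length, step = find_longest_arithmetic_run(data)
        meta = (off, length, data[off], step)   # where, how long, start byte, step
        return meta, data[:off] + data[off + length:]

    def decompress(meta, stripped):
        off, length, start, step = meta
        run = bytes((start + k * step) % 256 for k in range(length))
        return stripped[:off] + run + stripped[off:]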
If you're interested in efficient compression with English you might like the Hutter Prize (http://prize.hutter1.net/) - a €50,000 prize if you can compress the 100MB file enwik8 to less than the current record of about 16MB. They claim it's an AI problem. I guess it could be if you're compressing a bunch of different English texts, but this challenge has a single text. A program that provides very efficient compression of this text might not be so great for other texts.
Mike's challenge took a specially selected random file. As other people said, this was an attempt to show compression kooks that some files are not compressible (if you include the size of the decompressor).
Makes me wonder if a "decompressor" that generates hundreds of decompressed files by brute force (of which the original file is one) would be a valid solution. You could store a certain section of the original as the "compressed file".
I can't find anything which solidly points to this challenge being 'likely' impossible.
This all is not so simple as pointing at Kolmogorov complexity. Randomness is not so much inherent as relative to the machine on which you're running your program.
http://rjlipton.wordpress.com/2011/06/02/how-powerful-are-ra...
Why even bother with file concatenation? Just use a 3M-character-long filename for an empty file to "compress" 3 MB of data.
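A thought-experiment version of that trick (hypothetical - real filesystems cap names at roughly 255 bytes, so a 3M-character name would never actually be accepted):

    import binascii
    import os

    def compress_to_filename(data, directory):
        # Stash the data in the *name* of an empty file; the file itself is 0 bytes.
        # (Hex doubles the length; base64 would be a bit tighter.)
        name = binascii.hexlify(data).decode("ascii")
        open(os.path.join(directory, name), "wb").close()

    def decompress_from_filename(directory):
        (name,) = os.listdir(directory)
        return binascii.unhexlify(name)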
I don't get it - why didn't he just write a simple algorithm and get the prize? It looks like easy money.
Just post the original data file on the web, then have your "decompression" program download the file. The "compressed" file could simply contain a URL.
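The whole "decompressor" for that cheat fits in a few lines (the URL in the comment below is of course a placeholder):

    from urllib.request import urlopen

    def decompress(compressed_path):
        # The "compressed" file contains nothing but a URL pointing at the
        # original data, e.g. "http://example.com/original.bin".
        url = open(compressed_path).read().strip()
        return urlopen(url).read()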
I did Ticketmaster's file compression. Denny gave me a magazine with LZW. I screwed around with many good ideas to combine with LZW, but that was just a white-trash waste of time. I had the sense to keep it simple, in the end, with straight-up LZW.