Show HN: Homogenius – Packer/unpacker to reduce the size of JSON

  • Out of interest, do you have any size comparisons compared to gzipping the raw homogenous JSON text?

    Edit: I did a quick test by myself, using the first example but repeated for 10,000 objects:

      Raw:                550111
      Homogenius:         150183
      
      Raw gzipped:        1688
      Homogenius gzipped: 419
    
    Interesting to see a ~4x difference between raw JSON and Homogenius JSON, both compressed and uncompressed.

  • I think [Transit][1] by Cognitect addresses this issue as well.

    [1]: https://github.com/cognitect/transit-format#caching

  • I think this needs some sort of marker for the compressed form to make it clear it's not to be digested as is. Maybe `$HMGNS$` or something as the first key of the array?
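
    A guard like that would be cheap to check on the consumer side. A minimal sketch, assuming the commenter's suggested `$HMGNS$` marker (which is not part of the actual format):

```javascript
// Hypothetical marker check. '$HMGNS$' is the commenter's suggestion,
// not something the real format defines.
const MARKER = '$HMGNS$';

function isPacked(payload) {
  return Array.isArray(payload) && payload[0] === MARKER;
}

console.log(isPacked([MARKER, ['id', 'name'], [1, 'a']])); // true
console.log(isPacked([{ id: 1 }]));                        // false
```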

  • Why not just use MessagePack (aka MsgPack)?

    http://msgpack.org/

  • I see two great things about this tool, and not exactly for what it was designed:

    1) Fix bandwidth problems caused by API designer laziness.

    2) Detect API designer laziness and programmer laziness.

    The API designer laziness merits description. Essentially, if you as an API host are sending back tons of redundant data, you are doing a disservice to your users by not passing references to that data in the first place. The results in your API should either include an entities section or links to retrieve more data. (Note: the scale of repeated data I am talking about here is subtrees, not just key/value pairs.)
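
    The "entities section" idea can be illustrated with a small sketch; all the field names here (`posts`, `authorId`, etc.) are made up for the example, not from any particular API:

```javascript
// Instead of embedding the same author subtree in every post,
// store it once in an entities section and reference it by id.
const denormalized = [
  { id: 1, author: { id: 9, name: 'Ada', bio: '...' } },
  { id: 2, author: { id: 9, name: 'Ada', bio: '...' } },
];

const entities = { authors: {} };
const posts = denormalized.map(p => {
  entities.authors[p.author.id] = p.author;   // store the subtree once
  return { id: p.id, authorId: p.author.id }; // keep only a reference
});

const response = { posts, entities };
```

    The client rejoins posts to authors by id, so the redundancy never crosses the wire.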

    The programmer laziness is when someone just sends back JSON.stringify(someVeryLargeObject) instead of making sure only a minimal object is sent back.

    So we can detect this laziness by running this tool and looking for 4x compression gains!

  • If size is a concern, wouldn't a binary message format make more sense?

  • You lose human readability but you win on size. An interesting project, and I can think of a lot of uses for it (saving bandwidth, avoiding timeouts, etc.).

    Thanks for making it; I will use it (in Go, I think).

  • So it's a schema plus interned values? Do you think it might be better on real-world data to use just one mapping of value id to value instead of one per column/key?
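
    That "schema plus interned values" reading can be sketched like this, with one value table per key; this is my interpretation of the question, not Homogenius's actual layout:

```javascript
// Sketch: pack rows as a key header plus one interning table per
// column, replacing each value with an index into its column's table.
// Hypothetical layout, not the tool's real format.
function packInterned(rows) {
  const keys = Object.keys(rows[0]);
  const tables = keys.map(() => []);            // one value table per column
  const body = rows.map(row =>
    keys.map((k, c) => {
      const col = tables[c];
      let idx = col.indexOf(row[k]);
      if (idx === -1) idx = col.push(row[k]) - 1; // intern a new value
      return idx;
    })
  );
  return { keys, tables, body };
}

function unpackInterned({ keys, tables, body }) {
  return body.map(indices =>
    Object.fromEntries(keys.map((k, c) => [k, tables[c][indices[c]]]))
  );
}
```

    A single global table would additionally merge duplicates that appear under different keys, at the cost of every column sharing one larger index space; which wins depends on how values are distributed in real-world data.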

  • I really don't see too many use cases here. Both parties have to buy into it, then at that point you might as well use a binary protocol.

  • Doing this with JSON is a neat idea. I recently learned that Excel workbooks do this using a sharedStrings.xml file in the .xlsx package.

  • No benchmarking or comparisons with gzip.

  • Any plans to add this to npm?

  • like Alan Turing lolol