Hacker News Clone

Introducing the Open Images Dataset

by hurrycane on 9/30/2016, 5:38 PM with 36 comments

by imh on 10/1/2016, 12:30 AM
Lawyers are funny:
>Today, we introduce Open Images, a dataset consisting of ~9 million URLs ... having a Creative Commons Attribution license*.
Then the footnote below:
>* While we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.
I think this might be the most blatant instance I've ever seen of, "We have to write this even though it's essentially impossible for you to actually follow our directions."
by transcranial on 9/30/2016, 8:17 PM
Interesting that the base data consists of URLs. I guess it makes sense given copyright issues. Anybody know the ballpark expected half-life of such URLs?
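One rough way to put a number on that "half-life" is to assume links die off by exponential decay. That is only an assumption, and real link rot may not be that tidy. A minimal Python sketch, where the function names and the sample counts are hypothetical and chosen purely for illustration:

```python
import math


def surviving_fraction(elapsed_years: float, half_life_years: float) -> float:
    """Fraction of URLs still alive after elapsed_years, assuming exponential decay."""
    return 0.5 ** (elapsed_years / half_life_years)


def half_life_from_sample(elapsed_years: float, alive: int, total: int) -> float:
    """Estimate the half-life from one spot check of a URL sample.

    Inverts surviving_fraction: alive / total = 0.5 ** (elapsed / half_life).
    """
    if not 0 < alive < total:
        raise ValueError("need 0 < alive < total to estimate a half-life")
    return elapsed_years * math.log(2) / -math.log(alive / total)


if __name__ == "__main__":
    # Hypothetical spot check: 1,000 sampled URLs, 800 still resolve after 1 year.
    estimate = half_life_from_sample(elapsed_years=1.0, alive=800, total=1000)
    print(f"estimated half-life: {estimate:.2f} years")
```

Re-running the spot check on a fresh random sample every few months would show whether a single half-life describes the data at all.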
by diyseguy on 10/1/2016, 12:49 AM
Any guesses on how large the resulting dataset would be if you actually downloaded all the images? I imagine the URLs will get removed in a hurry as everybody starts automating it.
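A back-of-the-envelope estimate is just the image count times an average file size. The ~9 million figure comes from the announcement; the average sizes below are guesses, not measurements, so treat the result as an order-of-magnitude sketch:

```python
def estimate_dataset_size_tb(num_images: int, avg_image_bytes: float) -> float:
    """Estimated total size in terabytes (1 TB = 10**12 bytes)."""
    return num_images * avg_image_bytes / 1e12


NUM_IMAGES = 9_000_000  # "~9 million URLs" from the announcement

if __name__ == "__main__":
    # Hypothetical average file sizes in kilobytes; the real average is unknown here.
    for avg_kb in (100, 500, 2000):
        size_tb = estimate_dataset_size_tb(NUM_IMAGES, avg_kb * 1000)
        print(f"{avg_kb} KB per image -> about {size_tb:.1f} TB")
```

Sampling a few thousand URLs and averaging the reported `Content-Length` would replace the guessed sizes with a measured one.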
by devindotcom on 9/30/2016, 8:05 PM
First video, now images - wonder if speech and others are on the way?
It's nice that they're doing this, helps advance the art I think. But it also puts a lot of smaller operations in unis sort of under the Google system in that they're best compared to Google's ML work and others using these datasets. It's a small way of stacking the deck to make Google and DeepMind more embedded in the community.
That said, its utility for others surely outweighs the strategic advantage gained here, so I for one welcome these libraries. A lot of work goes into them. Hopefully others will release theirs as well.
by zappo2938 on 10/1/2016, 5:55 AM
I'm glad I'm getting a return on all the effort clicking street signs and store fronts on reCaptcha.
by pilooch on 10/1/2016, 6:53 PM
I've put an efficient downloader here for the interested crowd: https://github.com/beniz/openimages_downloader. It's a fork of the script I used to grab ImageNet.
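For anyone curious what a bulk downloader roughly involves, here is a minimal standard-library sketch. It is not the code from the linked repository; the names `fetch_bytes` and `download_all` are made up for illustration, and it assumes the input is a plain list of image URLs.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw bytes behind a URL (the default network fetcher)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, out_dir, fetch=fetch_bytes, workers: int = 8) -> dict:
    """Download every URL into out_dir in parallel.

    Returns a dict mapping each URL to True (saved or already present)
    or False (the fetch or the write failed).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    def download_one(url: str):
        # Name files by a hash of the URL so reruns can skip finished work.
        target = out / hashlib.sha1(url.encode("utf-8")).hexdigest()
        if target.exists():
            return url, True
        try:
            target.write_bytes(fetch(url))
            return url, True
        except Exception:
            # Dead links and timeouts are expected; record them and move on.
            return url, False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(download_one, urls))
```

Threads fit here because the work is network-bound. Keeping `workers` small and skipping already-saved files makes reruns gentler on the hosts serving the images.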
by dharma1 on 9/30/2016, 10:58 PM
Is there a link to the trained model somewhere?
by rocky1138 on 9/30/2016, 8:24 PM
Are there any other libraries that are similar?
by Omnipresent on 10/1/2016, 1:01 AM
Looking forward to someone trying a TensorFlow CNN on this.