Introducing the Open Images Dataset
- Lawyers are funny:
  > Today, we introduce Open Images, a dataset consisting of ~9 million URLs ... having a Creative Commons Attribution license*.

  Then the footnote below:
  > \* While we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.

  I think this might be the most blatant instance I've ever seen of "we have to write this even though it's essentially impossible for you to actually follow our directions."
- Interesting that the base data consists of URLs. I guess it makes sense given copyright issues. Anybody know the ballpark expected half-life of such URLs?
- Any guesses on how large the resulting dataset would be if you actually downloaded all the images? I imagine the URLs will get removed in a hurry as everybody starts automating it.
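- A rough back-of-envelope estimate: with ~9 million images, the total depends almost entirely on the average per-image size, which the announcement doesn't state. The average sizes below are assumptions, not measured values:

  ```python
  # Back-of-envelope estimate of total download size for ~9M images.
  # The per-image averages are assumptions, not measured values.
  num_images = 9_000_000

  for avg_kb in (100, 300, 500):  # assumed average image sizes in KB
      total_tb = num_images * avg_kb / 1e9  # KB -> TB (decimal units)
      print(f"avg {avg_kb} KB/image -> ~{total_tb:.1f} TB")
  ```

  So anywhere from roughly 1 TB to 5 TB under those assumptions, before accounting for dead links.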
- First video, now images - I wonder if speech and others are on the way?

  It's nice that they're doing this; it helps advance the art, I think. But it also puts a lot of smaller operations at universities sort of under the Google system, in that they're best compared against Google's ML work and others using these datasets. It's a small way of stacking the deck to make Google and DeepMind more embedded in the community.

  That said, its utility for others surely outweighs the strategic advantage gained here, so I for one welcome these datasets. A lot of work goes into them. Hopefully others will release theirs as well.
- I'm glad I'm getting a return on all the effort clicking street signs and store fronts on reCaptcha. 
- I've put an efficient downloader here for the interested crowd: https://github.com/beniz/openimages_downloader It's a fork of the script I used to grab ImageNet.
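- For anyone who wants to roll their own, the core of such a downloader is just concurrent fetches over the URL list. This is a minimal sketch, not the linked tool; the CSV layout (first two columns are image id and URL), the output directory, and the `.jpg` extension are all assumptions:

  ```python
  # Minimal sketch of a concurrent image downloader for a URL list.
  # Assumes a CSV whose first two columns are image id and URL.
  import csv
  import os
  from concurrent.futures import ThreadPoolExecutor
  from urllib.request import urlopen

  def download(row, out_dir="images"):
      image_id, url = row[0], row[1]  # assumed column order
      try:
          data = urlopen(url, timeout=10).read()
          with open(os.path.join(out_dir, image_id + ".jpg"), "wb") as f:
              f.write(data)
          return True
      except Exception:
          return False  # dead links are expected; just skip them

  def download_all(csv_path, workers=16):
      os.makedirs("images", exist_ok=True)
      with open(csv_path) as f:
          rows = list(csv.reader(f))
      with ThreadPoolExecutor(max_workers=workers) as pool:
          return sum(pool.map(download, rows))  # count of successes
  ```

  Threads (rather than processes) are fine here since the work is I/O-bound; returning the success count makes it easy to measure link rot as the dataset ages.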
- Is there a link to the trained model somewhere? 
- Are there any other libraries that are similar? 
- Looking forward to someone trying a TensorFlow CNN on this.