CaptionBot by Microsoft
I found it curious that this Bot is really bad at recognizing apes: chimpanzees and gorillas specifically. I fed it a lot of the images from a Google image search for these animals and more often than not it either doesn't recognize anything or considers them bears.
I don't mean to offend, but I'm left wondering if the creators of image recognition services disincentivize their neural nets from recognizing something as an ape, gorilla or chimpanzee so as to avoid the same mistake Google made when it falsely recognized black people as gorillas [1].
[1] http://blogs.wsj.com/digits/2015/07/01/google-mistakenly-tag...
I fed it the "Wat" meme and it thinks it's Pope Benedict.
>I am not really confident, but I think it's a man is smiling for the camera and they seem . I am 99% sure that's Pope Benedict XVI
Source Image: http://memesvault.com/wp-content/uploads/Wat-Meme-Old-Lady-0...
Needless to say, my errant habits of trying to break stuff shine through once again.
Took a few tries but worth it - my son from the other morning:
Edit: They should have named it CationBot.
"I seem to be under the weather right now. Try again later :(" i.e. we killed it.
CaptionBot team here. Thanks for the images and captions! Please keep sharing them and give us feedback.
I'm super impressed by its response to this image:
I worked in Image Processing and Vision for a long time. If you'd told me 2 years ago that something like this would be possible, I would have laughed you out of the room. But in the last year or so, I've been stunned beyond belief at how well these networks work.
Hmm I can't help but think it should have done a little better with this image http://i.imgur.com/yBNJWKf.png
Feeding noise to a neural network is always fun: https://i.imgur.com/pPdwIGx.png
Pretty good. Can't wait to see how good this tech gets in the next few years.
https://www.dropbox.com/s/ty34c02y1mngyrc/Screenshot%202016-...
I gave this image - https://i.imwx.com/images/maps/truvu/map_specnewsdct-109_lts...
and I got this result "I am not really confident, but I think it's a couple of glass vases with flowers on top of a surfboard."
It's only a matter of time before a repeat of Microsoft's last AI experiment (Tay), when the Internet teaches CaptionBot all of the positions in the Kama Sutra.
Made me chuckle:
It says "any image" but I think they really mean "any photograph", based on the samples as well as the stuff I uploaded to it.
This one is really funny: https://pbs.twimg.com/media/Cf8LJk6WcAAqZ0X.jpg (the image can be found in a default Windows install!)
This is a ton of fun. Cat on a counter... Lol http://m.imgur.com/2tYgmmL
Fun stuff
http://i.imgur.com/kS6sgNT.png
Edit: I was expecting it to think an eel was a snake, but... http://i.imgur.com/EmpRNkA.png
This is no fun to talk about without permalinks to uploaded images/results.
I tried a Magic Eye photo. It didn't see the sailboat at all.
My lab is trying to do something similar for answering questions about images. We have a significantly better system than the current system that's online, but we haven't had a chance to update it yet: http://askimage.org
It is far from perfect, but is near state-of-the-art. I'm guessing it won't hold up to HN.
I like it. https://i.imgur.com/5HPdbSa.png
This is amazing. This is exactly what I needed to get through a long on-call shift.
It is almost as smart as a child. I uploaded my Notre-Dame vacation photo, and the caption was "A person standing in front of a church"... which is close to my son's "mommy standing in front of that church we went to"
It's amazing how wrong this gets some things, and then again it's amazing how right it gets others.
The last one in this set really surprised me:
Ohh I got a good one: "I am not really confident, but I think it's a close up of a plane with a blue umbrella."
It was spot on for 30% of the images, but wildly inaccurate on the rest.
In fact, I assume this is crowdsourced training for the tech.
Kind of disappointing, but at the same time I understand that this task is not trivial at all.
Links to a couple of the initial super impressive research papers on generating captions for images from 2014 and 2015: http://googleresearch.blogspot.com/2014/11/a-picture-is-wort... http://cs.stanford.edu/people/karpathy/deepimagesent/
As far as I know, this was the first research to combine multiple neural nets trained on different data in super cool ways:
"Now, what if we replaced that first RNN and its input words with a deep Convolutional Neural Network (CNN) trained to classify objects in images? Normally, the CNN’s last layer is used in a final Softmax among known classes of objects, assigning a probability that each object might be in the image. But if we remove that final layer, we can instead feed the CNN’s rich encoding of the image into a RNN designed to produce phrases. We can then train the whole system directly on images and their captions, so it maximizes the likelihood that descriptions it produces best match the training descriptions for each image."
AND
"Our alignment model is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding"
It can't recognise a photo of B. Gates. Ok.
Android users: do you get a lack of memory/resources error when you try to take a pic instead of selecting from gallery? It is a silly bug where the camera activity kills the browser activity that called it.
Google: we cannot move forward with 'Progressive Webapps' if you guys don't fix these silly bugs.
Take-picture-do-something is a common feature of webapps, like Mr CaptionBot here!!
Yeah, the tech is not yet ready for prime time.
1) Close-up of a roman coin
- I think it's a banana peel
2) Inverse black-on-white outline drawing of a wolf howling at the moon (logo of comic series Elfquest)
- I am not really confident, but I think it's a close up of two giraffes near a tree.
3) Red-on-black drawing of eight arrows with a circle in the middle (chaos symbol).
- I am not really confident, but I think it's a red and white sign.
4) Red-on-black drawing of a hammer-and-sickle (communism symbol).
- I am not really confident, but I think it's a picture of some sort.
5) Lt. Cmdr. Data laughing, one hand on his chest, the other extending outside the picture.
- I am not really confident, but I think it's a man holding a wii controller and he seems :D
6) Germaine Greer biting the head off a barbie, while shaking another off its ponytail
- I am not really confident, but I think it's a woman eating a doughnut and they seem :D D:
7) Image of a tiny lilac octopus on a black background
- I am not really confident, but I think it's a close up of a doughnut.
8) Red-on-black drawing of an "A" in a circle (anarchy symbol)
- I am not really confident, but I think it's a lamppost
9) Black-and-white picture of actress Liv Ullmann
- I am not really confident, but I think it's a man with a stuffed animal.
10) Portrait of countess Elisabeth Bathory
- I am not really confident, but I think it's a woman wearing a hat and she seems :|
For the record, number (10) is spot on (though with low confidence, so may be just random).
At least it got the tree part right.
Hmm.
A couple of days ago I think there was a post about Google doing a lot of development and research around creating systems that understand / categorize / comment / recognize images.
One thing I took away from reading about it is that Google has billions of images to train it with from all their different ventures.
Does Microsoft have access to anywhere near the same number of pictures?
> I am not really confident, but I think it's a close up of a cat.
Hello kitty: http://i.livescience.com/images/i/000/024/750/i02/tarantula-...
An interesting project, but it fared pretty poorly on all of the images I gave it - the suggestions were wildly outlandish.
I wish services like this would be released without any kind of moral filter on the subjects it classifies.
I uploaded a picture of Michelangelo's David to the service to see what CaptionBot would say about it, and I got back a message "I think this may be inappropriate content so I won't show it."
Tried it with three different pictures, one from clker.com (http://www.clker.com/cliparts/0/3/f/0/1194984730712928848mag...) mistaken for a lamp-post, and two from unsplash (https://unsplash.com/photos/2Ts5HnA67k8 and https://unsplash.com/photos/iIg4F2IWbTM). In the latter two it tells me that it cannot recognise anything. So for me, it isn't there yet....
It classified my picture of a dog as inappropriate content and won't display it. Dang it.
It feels like there are two sides to this: either recognition is amazing, or it is really, really far off.
It seems that after it generates the caption, the caption needs to be fed through some semantic check, so that something like a plane sitting on a book gets rejected as nonsensical and the system tries again.
After all, it really depends on the training data. If the NN never saw a picture of a train ticket, how could it answer correctly? However, it should try to reduce the answer to something more meaningful: for example, instead of two giraffes near a tree, it would ideally have said it's text and attempted OCR.
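A rough sketch of that kind of check, assuming the captioner can return several candidates with confidences. Everything here is hypothetical: the `PLAUSIBLE` table, the `MIN_CONFIDENCE` cutoff, and the triple format are made up for illustration, and a real system would need a far richer notion of what "makes sense".

```python
# Hypothetical table of (subject, relation, object) triples that make sense.
PLAUSIBLE = {
    ("book", "sitting on", "table"),
    ("cat", "sitting on", "counter"),
    ("person", "standing in front of", "church"),
}

FALLBACK = "a picture of some sort"  # generic answer when nothing survives
MIN_CONFIDENCE = 0.3                 # assumed cutoff, not a tuned value


def pick_caption(candidates):
    """Return the most confident candidate that passes the plausibility check.

    candidates: list of (confidence, (subject, relation, object)) in any order.
    """
    for confidence, triple in sorted(candidates, reverse=True):
        if confidence >= MIN_CONFIDENCE and triple in PLAUSIBLE:
            subject, relation, obj = triple
            return f"a {subject} {relation} a {obj}"
    return FALLBACK


candidates = [
    (0.6, ("plane", "sitting on", "book")),  # most confident, but nonsensical
    (0.4, ("book", "sitting on", "table")),
]
print(pick_caption(candidates))
```

Here the top candidate is thrown out and the second one is used. The fallback branch is where something like "it's text, try OCR" could go instead of a generic phrase.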
I gave it a photo of a Cylon [0] and it said "I am not really confident, but I think it's a close up of a motorcycle." Close but not really there; Google's reverse image search does a better job of detection in this case. As an aside, it'd have been really cool if it said it was a picture of a toaster.
[0] http://www.xperiax10.net/wp-content/gallery/cinema_x10/cylon...
CaptionBot doesn't really know what to make of Winged Doom: http://imgur.com/86uwKfa
Pretty impressive - gave it a few profile photos and it did surprisingly well, correctly identifying "A couple walking on a beach at sunset," "a man looking out a window", etc.
It struggled with wildlife photos - a pack of arctic wolves was "a sheep standing in the snow", and penguins swimming was "a bird flying over a body of water" (close but no cigar).
Uploaded a cropped version of Mars in a photo that shows its atmosphere from http://spaceref.com/onorbit/mars-methane-and-mysteries.html
And was told: "I am not really confident, but I think it's a toilet that is in the dark."
I tried a bunch of different images and I got 'two giraffes near a tree' a bunch of times. They were drawn images though.
It's not at all working. Every time, the same thing pops up - "I am under the weather now. Try again later. :("
Definitely not a picture of 2 giraffes near a tree: https://www.dropbox.com/s/ki9p59txh8mk143/Photo%20Apr%2013%2... It's just a Caltrain ticket ¯\_(ツ)_/¯
It doesn't seem to know about rockets. http://imgur.com/mqRuLVq.png
(I tried the SpaceX landing pictures too - it correctly identified "a boat in a large body of water" but ignored the ten-story rocket above said boat.)
Silly Microsoft, should have at least had some caching layer instead of analyzing every image. RIP CaptionBot.
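A minimal sketch of what such a caching layer could look like, assuming identical uploads are byte-for-byte the same file. `CaptionCache` and `slow_captioner` are hypothetical names; nothing here reflects how CaptionBot is actually built.

```python
import hashlib


class CaptionCache:
    """Memoize captions by the SHA-256 of the uploaded bytes."""

    def __init__(self, captioner):
        self._captioner = captioner  # the expensive model call
        self._store = {}
        self.misses = 0              # how often the model actually ran

    def caption(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._captioner(image_bytes)
        return self._store[key]


def slow_captioner(image_bytes):
    # Hypothetical stand-in for the real image analysis.
    return f"an image of {len(image_bytes)} bytes"


cache = CaptionCache(slow_captioner)
cache.caption(b"same meme")
cache.caption(b"same meme")  # served from the cache, model not called
print(cache.misses)
```

This only helps when many people upload the exact same file (popular memes, sample pictures). A re-encoded or resized copy hashes differently and would still hit the model.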
Microsoft has a great tech team, no doubt, but it seems to be lacking in product and market strategy.
I'd like to see photo battles between microsoft and google, as a live game show.
My results ranged from impressive to awful. It recognized Pete Carroll with 96% accuracy from a meme picture where he struts and chews gum. Then it thought a picture of the super bowl field before the game was boats on a table.
"I never felt at home here. This is an awful place to be dropped down halfway”
Gave it a picture of an AR-15 on a shooting bench and it thought it was a bicycle.
Service appears to be down or "under the weather" whatever that means.
Ha, got eerily accurate results. Some funny ones as well, but interesting tech.
Pretty impressed that it got this one, given how the faucet breaks up the outline.
My photos did not do too well. My coral looks like a cake, my lizard looks like a bird, my boy fishing looks like a man next to a river, and a waterfall looks like a close up of Rock.
Hypnotoad is not a "person on a surf board in a skate park." http://imgur.com/2Cf5LKW
I uploaded the sad Michael Jordan meme face and it responded "I think it's Michael Jordan wearing a suit and tie and he seems :(", sounds about right...
So I looked for a random photo on my phone and fed it a picture of a spot on my leg that I'm keeping an eye on. Close-up of a cat apparently. Damn these hairy legs.
Hmmm... I'm not seeing it. https://i.imgur.com/OFPArbf.png
I gave it a picture of a Captcha, and it said it was some giraffes against a fence. :) So at least we know they haven't broken Captcha yet!
This one made me laugh http://imgur.com/u0E5eu5
Close up of a Bicycle... http://imgur.com/BZd088p
This made me laugh: https://imgur.com/PhbyAyK
Surprisingly it does pretty poorly on the images included in Windows XP's "Sample Pictures" folder.
I gave it a statue of Joan of Arc and it thinks it is a motorcycle mirror with a neutral expression...
Wow! I gave it a photo of a kitesurfer and it got it (man flying a kite in a body of water). Amazing!!
Uploaded dick pick. Caption said it was a micro penis :-(
Feed it Deep Dream generated images.
It did not work well for me. I tried to give it an easy one. A picture of a salt and pepper shaker. Here's what it said:
> I am not really confident, but I think it's a cake made to look like a phone.
Nice try m$.