Hacker News Clone

CaptionBot by Microsoft

by afshinmeh on 4/13/2016, 3:39 PM with 160 comments

by jnky on 4/13/2016, 8:03 PM
I found it curious that this bot is really bad at recognizing apes: chimpanzees and gorillas specifically. I fed it a lot of the images from a Google image search for these animals and more often than not it either doesn't recognize anything or considers them bears.
I don't mean to offend, but I'm left wondering if the creators of image recognition services disincentivize their neural nets from recognizing something as an ape, gorilla or chimpanzee so as to avoid the same mistake Google made when it falsely recognized black people as gorillas [1].
[1] http://blogs.wsj.com/digits/2015/07/01/google-mistakenly-tag...
by 6stringmerc on 4/13/2016, 4:25 PM
I fed it the "Wat" meme and it thinks it's Pope Benedict.
>I am not really confident, but I think it's a man is smiling for the camera and they seem . I am 99% sure that's Pope Benedict XVI
Source Image: http://memesvault.com/wp-content/uploads/Wat-Meme-Old-Lady-0...
Needless to say, my errant habits of trying to break stuff shine through once again.
by flatline on 4/13/2016, 5:49 PM
Took a few tries but worth it - my son from the other morning:
https://imgur.com/a/w1Uai
Edit: They should have named it CationBot.
by coldcode on 4/13/2016, 3:59 PM
"I seem to be under the weather right now. Try again later :(" i.e. we killed it.
by gilnahmias on 4/13/2016, 5:08 PM
CaptionBot team here. Thanks for the images and captions! Please keep sharing them and give us feedback.
by jagger27 on 4/13/2016, 4:45 PM
I'm super impressed by its response to this image:
http://i.imgur.com/tc5rz9s.png
by 1024core on 4/13/2016, 6:51 PM
I worked in Image Processing and Vision for a long time. If you'd told me 2 years ago that something like this would be possible, I would have laughed you out of the room. But in the last year or so, I've been stunned beyond belief at how well these networks work.
by satysin on 4/13/2016, 7:12 PM
Hmm I can't help but think it should have done a little better with this image http://i.imgur.com/yBNJWKf.png
by madmoose on 4/13/2016, 9:20 PM
Feeding noise to a neural network is always fun: https://i.imgur.com/pPdwIGx.png
by donutdan4114 on 4/13/2016, 5:52 PM
Pretty good. Can't wait to see how good this tech gets in the next few years.
https://www.dropbox.com/s/ty34c02y1mngyrc/Screenshot%202016-...
by arunitc on 4/13/2016, 3:50 PM
I gave this image - https://i.imwx.com/images/maps/truvu/map_specnewsdct-109_lts...
and I got this result "I am not really confident, but I think it's a couple of glass vases with flowers on top of a surfboard."
by nerdy on 4/13/2016, 4:17 PM
It's only a matter of time before a repeat of Microsoft's last AI experiment (Tay), when the Internet teaches CaptionBot all of the positions in the Kama Sutra.
by Thaxll on 4/13/2016, 4:50 PM
Made me chuckle:
http://imgur.com/GSpanVe
by larrik on 4/13/2016, 3:56 PM
It says "any image" but I think they really mean "any photograph", based on the samples as well as the stuff I uploaded to it.
by jchampem on 4/13/2016, 5:46 PM
This one is really funny https://pbs.twimg.com/media/Cf8LJk6WcAAqZ0X.jpg (the image can be found in a default Windows install!)
by spo81rty on 4/14/2016, 3:50 AM
This is a ton of fun. Cat on a counter... Lol http://m.imgur.com/2tYgmmL
by arprocter on 4/13/2016, 6:31 PM
Fun stuff
http://i.imgur.com/kS6sgNT.png
Edit: I was expecting it to think an eel was a snake, but... http://i.imgur.com/EmpRNkA.png
by mapleoin on 4/13/2016, 3:49 PM
This is no fun to talk about without permalinks to uploaded images/results.
by ulkesh on 4/13/2016, 4:46 PM
I tried a Magic Eye photo. It didn't see the sailboat at all.
by chriskanan on 4/13/2016, 6:14 PM
My lab is trying to do something similar for answering questions about images. We have a significantly better system than the current system that's online, but we haven't had a chance to update it yet: http://askimage.org
It is far from perfect, but is near state-of-the-art. I'm guessing it won't hold up to HN.
by ataylor32 on 4/13/2016, 6:57 PM
I like it. https://i.imgur.com/5HPdbSa.png
by Spivak on 4/13/2016, 10:27 PM
This is amazing. This is exactly what I needed to get through a long on-call shift.
https://m.imgur.com/N72gtoC
by swalsh on 4/13/2016, 4:44 PM
It is almost as smart as a child. I uploaded my Notre-Dame vacation photo, and the caption was "A person standing in front of a church"... which is close to my son's "mommy standing in front of that church we went to"
by verelo on 4/13/2016, 6:30 PM
It's amazing how wrong this gets some things, and then again it's amazing how right it gets others.
The last one in this set really surprised me:
http://imgur.com/a/gLTl4
by Savageman on 4/13/2016, 7:22 PM
Ohh I got a good one: "I am not really confident, but I think it's a close up of a plane with a blue umbrella."
http://imgur.com/FYucrda
by justsaysmthng on 4/13/2016, 4:17 PM
It was spot on for 30% of the images, but wildly inaccurate on the rest.
In fact, I assume this is crowdsourced training for the tech.
Kind of disappointing, but at the same time I understand that this task is not trivial at all.
by andreyk on 4/13/2016, 7:23 PM
Links to a couple of the initial super impressive research papers on generating captions for images from 2014 and 2015: http://googleresearch.blogspot.com/2014/11/a-picture-is-wort... http://cs.stanford.edu/people/karpathy/deepimagesent/
As far as I know, this was the first research to combine multiple neural nets trained on different data in such a cool way:
"Now, what if we replaced that first RNN and its input words with a deep Convolutional Neural Network (CNN) trained to classify objects in images? Normally, the CNN’s last layer is used in a final Softmax among known classes of objects, assigning a probability that each object might be in the image. But if we remove that final layer, we can instead feed the CNN’s rich encoding of the image into a RNN designed to produce phrases. We can then train the whole system directly on images and their captions, so it maximizes the likelihood that descriptions it produces best match the training descriptions for each image."
AND
"Our alignment model is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding"
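To make the first quote concrete, here is a minimal sketch of the encoder/decoder wiring it describes: a fixed-length image encoding (standing in for the CNN output with the final Softmax layer removed) seeds the hidden state of a small RNN, which then emits words greedily. Everything in it is an assumption for illustration: the vocabulary, the dimensions, the `ToyCaptioner` name, and the random untrained weights. It is not the model from either paper, and with untrained weights the caption is gibberish; only the data flow is the point.

```python
import math
import random

# Toy vocabulary; "<end>" stops decoding. All names here are hypothetical.
VOCAB = ["<start>", "a", "person", "standing", "in", "front", "of", "church", "<end>"]
FEATURE_DIM = 4   # size of the stand-in CNN encoding
HIDDEN_DIM = 6    # size of the RNN hidden state


def rand_matrix(rows, cols, rng):
    """Small random weights; a real system would learn these from captions."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyCaptioner:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_img = rand_matrix(HIDDEN_DIM, FEATURE_DIM, rng)  # image encoding -> initial state
        self.w_hh = rand_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)    # recurrent weights
        self.w_emb = rand_matrix(HIDDEN_DIM, len(VOCAB), rng)   # one column per word
        self.w_out = rand_matrix(len(VOCAB), HIDDEN_DIM, rng)   # state -> word scores

    def step(self, hidden, word_index):
        """One RNN step: mix the previous state with the previous word."""
        recurrent = matvec(self.w_hh, hidden)
        embedded = [row[word_index] for row in self.w_emb]
        return [math.tanh(r + e) for r, e in zip(recurrent, embedded)]

    def caption(self, image_features, max_len=8):
        # The image encoding (no Softmax applied) seeds the RNN state.
        hidden = [math.tanh(v) for v in matvec(self.w_img, image_features)]
        word = VOCAB.index("<start>")
        words = []
        for _ in range(max_len):
            hidden = self.step(hidden, word)
            scores = matvec(self.w_out, hidden)
            # Greedy pick over every token except "<start>".
            word = max(range(1, len(VOCAB)), key=scores.__getitem__)
            if VOCAB[word] == "<end>":
                break
            words.append(VOCAB[word])
        return words
```

Calling `ToyCaptioner(seed=0).caption([0.2, -0.1, 0.7, 0.4])` returns a list of at most eight vocabulary words. Training, as the quote says, would adjust all of these weights jointly so that the emitted words match the reference captions.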
by viach on 4/13/2016, 3:59 PM
It can't recognise a photo of B. Gates. Ok.
by bikamonki on 4/13/2016, 5:57 PM
Android users: do you get a lack of memory/resources error when you try to take a pic instead of selecting from gallery? It is a silly bug where the camera activity kills the browser activity that called it.
Google: we cannot move forward with 'Progressive Webapps' if you guys don't fix these silly bugs.
Take-picture-do-something is a common feature of webapps like this one, Mr CaptionBot!!
by YeGoblynQueenne on 4/14/2016, 6:50 AM
Yeah, the tech is not yet ready for prime time.
1) Close-up of a roman coin
- I think it's a banana peel
2) Inverse black-on-white outline drawing of a wolf howling at the moon (logo of comic series Elfquest)
- I am not really confident, but I think it's a close up of two giraffes near a tree.
3) Red-on-black drawing of eight arrows with a circle in the middle (chaos symbol).
- I am not really confident, but I think it's a red and white sign.
4) Red-on-black drawing of a hammer-and-sickle (communism symbol).
- I am not really confident, but I think it's a picture of some sort.
5) Lt. Cmdr. Data laughing, one hand on his chest, the other extending outside the picture.
- I am not really confident, but I think it's a man holding a wii controller and he seems :D
6) Germaine Greer biting the head off a barbie, while shaking another off its ponytail
- I am not really confident, but I think it's a woman eating a doughnut and they seem :D D:
7) Image of a tiny lilac octopus on a black background
- I am not really confident, but I think it's a close up of a doughnut.
8) Red-on-black drawing of an "A" in a circle (anarchy symbol)
- I am not really confident, but I think it's a lamppost
9) Black-and-white picture of actress Liv Ullmann
- I am not really confident, but I think it's a man with a stuffed animal.
10) Portrait of countess Elisabeth Bathory
- I am not really confident, but I think it's a woman wearing a hat and she seems :|
For the record, number (10) is spot on (though with low confidence, so may be just random).
by emanueld on 4/13/2016, 11:19 PM
At least it got the tree part right.
http://m.imgur.com/sIm97r9
by ThinkBeat on 4/13/2016, 4:36 PM
Hmm.
A couple of days ago I think there was a post about Google doing a lot of development and research around creating systems that understand / categorize / comment / recognize images.
One thing I took away from reading about it is that Google has billions of images to train it with from all their different ventures.
Does Microsoft have access to anywhere near the same numbers of pictures?
by chris-at on 4/13/2016, 8:28 PM
> I am not really confident, but I think it's a close up of a cat.
Hello kitty: http://i.livescience.com/images/i/000/024/750/i02/tarantula-...
by debacle on 4/13/2016, 3:53 PM
An interesting project, but it fared pretty poorly on all of the images I gave it - the suggestions were wildly outlandish.
by oh_sigh on 4/14/2016, 6:37 AM
I wish services like this would be released without any kind of moral filter on the subjects it classifies.
I uploaded a picture of Michelangelo's David to the service to see what CaptionBot would say about it, and I got back a message "I think this may be inappropriate content so I won't show it."
by plank on 4/13/2016, 8:11 PM
Tried it with three different pictures, one from clker.com (http://www.clker.com/cliparts/0/3/f/0/1194984730712928848mag...) mistaken for a lamp-post, and two from unsplash (https://unsplash.com/photos/2Ts5HnA67k8 and https://unsplash.com/photos/iIg4F2IWbTM). In the latter two it tells me that it cannot recognise anything. So for me, it isn't there yet....
by woodfordb on 4/13/2016, 8:46 PM
It classified my picture of a dog as inappropriate content and won't display it. Dang it.
by ccozan on 4/13/2016, 10:12 PM
It feels like there are two sides to this: either the recognition is amazing, or it is really, really far off.
It seems that after it generates the caption, the caption needs to be fed to some semantic pipeline, which would notice that a plane sitting on a book does not make sense and try again.
After all, it really depends on the training data. If the NN never saw a picture of a train ticket, how could it answer correctly? However, it should try to reduce the answer to something more meaningful: for example, instead of two giraffes near a tree, it ideally would have said it's text and attempted OCR.
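The idea of a post-caption sanity check could be sketched roughly as below. This is only an illustration of the suggestion, not how CaptionBot works: the `IMPLAUSIBLE` table, the `(caption, confidence, triples)` candidate format, and the `pick_caption` name are all made up, and a real system would need a far richer notion of plausibility than a lookup table.

```python
# Hypothetical (object, relation, object) triples treated as nonsense.
IMPLAUSIBLE = {
    ("plane", "on", "book"),
    ("vases", "on", "surfboard"),
}


def pick_caption(candidates, min_confidence=0.5,
                 fallback="I can't tell what this is."):
    """Return the first candidate that is both confident and plausible.

    candidates: list of (caption, confidence, triples), best guess first,
    where triples lists the object relations the caption asserts.
    """
    for caption, confidence, triples in candidates:
        if confidence < min_confidence:
            continue  # too unsure: try the next guess
        if any(triple in IMPLAUSIBLE for triple in triples):
            continue  # semantically odd: try the next guess
        return caption
    # Nothing survived, so give a generic answer instead of a wild one.
    return fallback
```

For example, given a confident "a plane sitting on a book" followed by a confident "a paper ticket with printed text", the function skips the first and returns the second; if every guess is weak or implausible, it returns the fallback.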
by apocalyptic0n3 on 4/13/2016, 5:47 PM
I gave it a photo of a Cylon [0] and it said "I am not really confident, but I think it's a close up of a motorcycle." Close but not really there; Google's reverse image search has a better detection in this case. As an aside, it'd have been really cool if it said it was a picture of a toaster.
[0] http://www.xperiax10.net/wp-content/gallery/cinema_x10/cylon...
by bgalbraith on 4/13/2016, 5:54 PM
CaptionBot doesn't really know what to make of Winged Doom: http://imgur.com/86uwKfa
by Devthrowaway80 on 4/13/2016, 7:00 PM
Pretty impressive - gave it a few profile photos and it did surprisingly well, correctly identifying "A couple walking on a beach at sunset," "a man looking out a window", etc.
It struggled with wildlife photos - a pack of arctic wolves was "a sheep standing in the snow", and penguins swimming was "a bird flying over a body of water" (close but no cigar).
by icefox on 4/14/2016, 3:31 AM
Uploaded a cropped version of Mars in a photo that shows its atmosphere from http://spaceref.com/onorbit/mars-methane-and-mysteries.html
And was told: "I am not really confident, but I think it's a toilet that is in the dark."
by vdnkh on 4/13/2016, 3:57 PM
I tried a bunch of different images and I got 'two giraffes near a tree' a bunch of times. They were drawn images though.
by lordvissu on 4/13/2016, 5:27 PM
It's not at all working. Every time, the same thing pops up - "I am under the weather now. Try again later. :("
by semerda on 4/13/2016, 5:13 PM
Definitely not a picture of 2 giraffes near a tree: https://www.dropbox.com/s/ki9p59txh8mk143/Photo%20Apr%2013%2... It's just a Caltrain ticket ¯\_(ツ)_/¯
by skykooler on 4/13/2016, 8:22 PM
It doesn't seem to know about rockets. http://imgur.com/mqRuLVq.png
(I tried the SpaceX landing pictures too - it correctly identified "a boat in a large body of water" but ignored the ten-story rocket above said boat.)
by zacharynewton on 4/13/2016, 5:36 PM
Silly Microsoft, should have at least had some caching layer instead of analyzing every image. RIP CaptionBot.
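A caching layer of the kind suggested here might look like the sketch below: hash the uploaded bytes and only run the expensive analysis on a miss. The `CaptionCache` class and the `analyze` callable are hypothetical, the cache is an unbounded in-memory dict, and nothing here reflects how the real service is built. It also only helps when people upload byte-identical files.

```python
import hashlib


class CaptionCache:
    """Memoize captions by the SHA-256 digest of the image bytes."""

    def __init__(self, analyze):
        self._analyze = analyze   # expensive function: bytes -> caption
        self._captions = {}       # digest -> caption (unbounded, sketch only)
        self.misses = 0           # how often the model actually ran

    def caption(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._captions:
            self.misses += 1
            self._captions[key] = self._analyze(image_bytes)
        return self._captions[key]
```

Wrapping the model call as `CaptionCache(run_model)` (with `run_model` standing in for whatever does the analysis) means a popular meme image posted by a hundred users costs one analysis instead of a hundred.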
by jdkanani on 4/13/2016, 5:04 PM
Microsoft has a great tech team, no doubt, but it seems to lack product and market strategy.
by krambo on 4/13/2016, 6:12 PM
I'd like to see photo battles between Microsoft and Google, as a live game show.
by mcheshier on 4/13/2016, 7:20 PM
My results ranged from impressive to awful. It recognized Pete Carroll with 96% confidence from a meme picture where he struts and chews gum. Then it thought a picture of the Super Bowl field before the game was boats on a table.
by joshu on 4/13/2016, 4:48 PM
"I never felt at home here. This is an awful place to be dropped down halfway”
by tomschlick on 4/13/2016, 6:28 PM
Gave it a picture of an AR-15 on a shooting bench and it thought it was a bicycle.
by bingeboy on 4/13/2016, 4:02 PM
Service appears to be down or "under the weather" whatever that means.
by lotso on 4/13/2016, 3:55 PM
Ha, got eerily accurate results. Some funny ones as well, but interesting tech.
by breischl on 4/13/2016, 9:33 PM
Pretty impressed that it got this one, given how the faucet breaks up the outline.
http://puu.sh/ohauF/435af67ac1.jpg
by StephenConnell on 4/14/2016, 3:22 AM
My photos did not do too well. My coral looks like a cake, my lizard looks like a bird, my boy fishing looks like a man next to a river, and a waterfall looks like a close-up of a rock.
by gsbell on 4/13/2016, 8:12 PM
Hypnotoad is not a "person on a surf board in a skate park." http://imgur.com/2Cf5LKW
by jlubawy on 4/14/2016, 6:59 AM
I uploaded the sad Michael Jordan meme face and it responded "I think it's Michael Jordan wearing a suit and tie and he seems :(", sounds about right...
by zarify on 4/14/2016, 12:39 AM
So I looked for a random photo on my phone and fed it a picture of a spot on my leg that I'm keeping an eye on. Close-up of a cat apparently. Damn these hairy legs.
by vic20forever on 4/13/2016, 9:10 PM
Hmmm... I'm not seeing it. https://i.imgur.com/OFPArbf.png
by jedberg on 4/13/2016, 10:02 PM
I gave it a picture of a Captcha, and it said it was some giraffes against a fence. :) So at least we know they haven't broken Captcha yet!
by indatawetrust on 4/14/2016, 1:09 PM
http://i.hizliresim.com/o35z6m.png
by Koopa on 4/13/2016, 10:19 PM
This one made me laugh http://imgur.com/u0E5eu5
by gsbell on 4/13/2016, 8:20 PM
Close up of a Bicycle... http://imgur.com/BZd088p
by asib on 4/14/2016, 12:25 AM
This made me laugh: https://imgur.com/PhbyAyK
by daxfohl on 4/13/2016, 6:50 PM
Surprisingly, it does pretty poorly on the images included in Windows XP's "Sample Pictures" folder.
by monknomo on 4/13/2016, 8:56 PM
I gave it a statue of Joan of Arc and it thinks it is a motorcycle mirror with a neutral expression...
by monk_e_boy on 4/13/2016, 5:39 PM
Wow! I gave it a photo of a kitesurfer and it got it (man flying a kite in a body of water). Amazing!!
by andrewclunn on 4/13/2016, 6:06 PM
Uploaded dick pic. Caption said it was a micro penis :-(
by cabirum on 4/13/2016, 6:18 PM
Feed it Deep Dream generated images.
by jcoffland on 4/13/2016, 7:12 PM
It did not work well for me. I tried to give it an easy one. A picture of a salt and pepper shaker. Here's what it said:
> I am not really confident, but I think it's a cake made to look like a phone.
Nice try m$.