Ask HN: Microsoft Computer Vision API or Google Cloud Vision API?

Hi HN community!

I am trying to decide between Microsoft's CV offering and Google's CV offering for my B2B startup. Any recommendations from people who have tried both?

Background: we want to take images of models uploaded by agencies and derive labels and image properties from them. Face detection would be an added bonus if possible.

  • http://www.clarifai.com/

    Pricing is more friendly than the other two. The API is nice.

    I did like Google's a lot, but the price just wasn't there for me. Especially if you want most processing options.

    Microsoft have been in this game a lot longer, but surprisingly a lot of their cool stuff isn't in their APIs, e.g. the ability to spot similar images and seamlessly stitch them. This stuff was in their maps products a long time ago, and you can download tools of theirs (http://research.microsoft.com/en-us/um/redmond/projects/ice/), but there are no APIs for it. Their basic APIs are just basic, so why not save the dime and go with a smaller player that offers just the basics but does them very well?

  • Depends on the complexity of what you require. I know I might get down-voted for this, but if your task is relatively simple, then roll your own using deep learning. Message me if you want help with this.

    I wouldn't rely on either for my own startup, because I don't think these APIs will have broad appeal; as a result they won't get traction and will be shut down with little warning.

    I could help you if you want - my email is in my profile

  • OpenCV includes face detection, and given a reasonably limited corpus of faces, it performs quite well and quite reliably.

    (Whether you want to use that or use a service depends on how close to your core business this is.)

  • Interesting looking at Google's Vision API overview, where they explicitly state that facial recognition is not available.

    The technology to do this clearly exists, but I gather they are concerned about the potential for abuse. Which makes sense. You could build some very creepy apps with this.

  • A request to those suggesting "why not X" or "consider X" : If you could mention a reason or two favoring X over Y, that'll help OP & future visitors.

  • http://algorithmia.com

    Pay-as-you-go, many APIs, supportive community.

    For your use case you might want to check the Computer Vision tag, specifically the "Illustration Tagger" algorithm.

    https://algorithmia.com/tags/computer%20vision

  • I'm interested in good OCR, preferably local, but I'm close to giving up and using the Google Cloud Vision API. It works well for text that's not perfectly aligned and laid out, unlike e.g. Tesseract or any other local OCR I've used.

    As far as I can tell, clarifai.com doesn't have OCR, and neither does anyone else except MS and G.

  • Develop your code so that the API is pluggable. Try both and decide which works best for you.
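
    A minimal sketch of what "pluggable" could look like, in Python. Everything here is hypothetical: `VisionBackend`, `VisionResult` and the two fake backends are illustrative names, and the fakes return canned values instead of calling any real service. The point is only that the rest of the code depends on one small interface, so swapping providers is a one-line change.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class VisionResult:
    """Provider-neutral result; extend with whatever fields you need."""
    labels: List[str] = field(default_factory=list)
    face_count: int = 0


class VisionBackend(Protocol):
    """The only interface the rest of the application depends on."""
    def analyze(self, image_bytes: bytes) -> VisionResult: ...


class FakeGoogleBackend:
    """Stand-in: a real version would call Google's API here."""
    def analyze(self, image_bytes: bytes) -> VisionResult:
        return VisionResult(labels=["person", "fashion"], face_count=1)


class FakeMicrosoftBackend:
    """Stand-in: a real version would call Microsoft's API here."""
    def analyze(self, image_bytes: bytes) -> VisionResult:
        return VisionResult(labels=["person", "clothing"], face_count=1)


# Registry of available providers (names are assumptions for this sketch).
BACKENDS = {"google": FakeGoogleBackend, "microsoft": FakeMicrosoftBackend}


def get_backend(name: str) -> VisionBackend:
    """Look up a provider by name, e.g. from a config value."""
    return BACKENDS[name]()


def compare(image_bytes: bytes, names: List[str]) -> Dict[str, VisionResult]:
    """Run the same image through several providers for a side-by-side look."""
    return {name: get_backend(name).analyze(image_bytes) for name in names}
```

    With something like this in place, `compare(image, ["google", "microsoft"])` on a sample of your own images gives you a direct basis for deciding.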

  • It really depends what you are attempting to accomplish, and what you wish to detect in the images.

    As you mentioned faces:

    Are you looking for face detection or recognition? Face detection was robustly solved before the advent of DL, using Haar cascades / face models, and is now being pushed a bit further with DL.

    (http://docs.opencv.org/master/d7/d8b/tutorial_py_face_detect...)

    Current cutting edge face recognition systems rely on DL, and the top performing models are one out of Russia (NTechLAB, facenx_large) and one from Google (FaceNet v8). These were the top two performers in the MegaFace challenge - identification with 1M distractors. Truly remarkable results. http://megaface.cs.washington.edu/results/

    As with most DL systems, you will need a massive corpus of labeled faces (e.g. Google's or VKontakte's, which the NTechLab group used).

  • I tried both some time ago for an OCR task. In my brief experience, GCV performs better than Microsoft. Also, the last time I tried, I sometimes got random server errors from Microsoft, so I guess Google's infrastructure is more ready. The downside is that GCV is a bit pricier. Also, neither provides a parameter to set the language model, so that's a minus in my eyes.

  • Is there a publicly accessible API that can geocode photos to a reasonable degree of accuracy? I'd like to be able to decorate digital photos taken before geotagging was a thing with geo data. I figure photos I have taken of St. Mark's Square in Venice have probably been taken a million times by other people, some of whom have probably added GPS coordinates to theirs, so a smart CV offering should be able to figure it out to a sufficient degree of accuracy (for reasonably well photographed and unchanging locations on earth).

    EDIT: I see Google Cloud Vision has landmark detection, that might be useful if the API returns the GPS coordinates of the landmark.
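
    A rough sketch of pulling coordinates out of such a response, assuming the per-image JSON contains a `landmarkAnnotations` list whose entries carry `description` and `locations[].latLng` with `latitude`/`longitude`. That is my understanding of the response shape, so check it against the current docs. The sample response below is made up for illustration, with approximate coordinates; it is not real API output.

```python
from typing import Dict, List, Tuple


def landmark_coordinates(response: Dict) -> List[Tuple[str, float, float]]:
    """Return (name, latitude, longitude) for each landmark location found.

    `response` is assumed to be the per-image part of a landmark
    detection result, already parsed from JSON into a dict.
    """
    found = []
    for annotation in response.get("landmarkAnnotations", []):
        name = annotation.get("description", "")
        for location in annotation.get("locations", []):
            lat_lng = location.get("latLng", {})
            # Skip entries that lack either coordinate.
            if "latitude" in lat_lng and "longitude" in lat_lng:
                found.append((name, lat_lng["latitude"], lat_lng["longitude"]))
    return found


# Made-up sample for illustration only (approximate coordinates).
SAMPLE_RESPONSE = {
    "landmarkAnnotations": [
        {
            "description": "St. Mark's Square",
            "score": 0.9,
            "locations": [{"latLng": {"latitude": 45.434, "longitude": 12.338}}],
        }
    ]
}

print(landmark_coordinates(SAMPLE_RESPONSE))
```

    Note this would give the coordinates of the landmark itself, not of where the photographer was standing, which may or may not be close enough for tagging old photos.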

  • We build customised image labelling solutions that can label many more things, like the type of neckline on a garment or the pattern of a label on a mug, which are not supported by Google or Microsoft.

    Apart from tagging images, we also offer similar-image lookup and image search. Please connect at https://twitter.com/adityapatadia to discuss further.

  • https://sightengine.com

    Alternative solution for image moderation and nudity detection. Simple API and simple pricing.

  • When I last checked, the Google API did not allow identifying specific faces. It can detect faces, but that's it. Clarifai and Microsoft do. Pricing-wise, almost all are the same. In my view, Watson is a complete no-no.

  • I think Microsoft's works better; also try IBM's if you can.

  • Do you know of a service or library for human pose estimation from 2D images? I've seen papers about it, and I would like to try it out.

  • Consider Imagga.

    http://imagga.com/

  • I think it depends on your dataset and application. It should be easy enough to try both.

  • What works well for face detection in low light images?

  • Microsoft's CV API worked better for me.

  • Try the Microsoft Computer Vision API with R and find out: http://thinktostart.com/analyze-face-emotions-r/