Microsoft says its AI can describe images 'as well as people do'

Everyone has, at some point, encountered an automatically generated caption that reads more like robotic gibberish than a description of the photo. In 2016, Google said its artificial intelligence could caption images nearly as well as humans, with 94 percent accuracy.

And while that's a notable milestone on its own, Microsoft isn't just keeping this tech to itself. The new, improved image-captioning capability is now available to all customers through the latest version of the Computer Vision API in Azure Cognitive Services.
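As a rough illustration of how a developer might request a caption from the service, here is a minimal sketch using the Computer Vision REST API's v3.1 "describe" operation; the endpoint, subscription key, and image URL are placeholders you would replace with your own resource's values.

```python
import requests

# Placeholders: substitute your own Azure resource endpoint and key.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
SUBSCRIPTION_KEY = "<your-key>"

def describe_image(image_url: str) -> str:
    """Ask the Computer Vision 'describe' operation for a caption of the image at image_url."""
    response = requests.post(
        f"{ENDPOINT}/vision/v3.1/describe",
        headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
        json={"url": image_url},
    )
    response.raise_for_status()
    # The service returns one or more candidate captions, each with a confidence score.
    captions = response.json()["description"]["captions"]
    return captions[0]["text"] if captions else "(no caption returned)"

if __name__ == "__main__":
    print(describe_image("https://example.com/photo.jpg"))
```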

Meanwhile, Microsoft is building the tool into Seeing AI, its app for people who are blind or have low vision. It will arrive in PowerPoint for the web, Windows, and Mac later this year, as well as in Word and Outlook on desktop platforms.

"[Image captioning] is one of the hardest problems in AI", said Eric Boyd, CVP of Azure AI, in an interview with Engadget.

Microsoft benchmarked the model on nocaps, a test created to encourage the development of image-captioning models that can learn visual concepts from alternative sources of data. When evaluated on it, the system generated captions that were more descriptive and accurate than the human-written captions for the same images, according to results presented in a research paper. Suffice it to say this isn't exactly Skynet, but we can be pretty sure that future Terminators will be able to describe your photo library better than you can...

Nonetheless, Microsoft's innovations will help make the internet a better place for visually impaired users and sighted individuals alike.

The effort was led by Xuedong Huang, a Microsoft technical fellow and the chief technology officer of Azure AI Cognitive Services.

Training the AI model for such a task, an approach the researchers call VIVO (visual vocabulary) pre-training, involves feeding it a dataset of hundreds of thousands of images, each accompanied by word tags rather than full captions. The researchers then fine-tuned the pre-trained model for captioning on a dataset of images that already had full captions; this second step is what taught the model how to compose a sentence.
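To make the two training stages concrete, here is a purely illustrative sketch of what the two kinds of training data could look like; the file names, tags, and captions below are invented for the example, and the real datasets are far larger.

```python
# Purely illustrative: invented file names, tags, and captions.

# Stage 1 (VIVO-style pre-training): images paired with word tags only,
# which teaches the model a visual vocabulary of objects and concepts.
pretraining_examples = [
    {"image": "photo_001.jpg", "tags": ["dog", "frisbee", "grass"]},
    {"image": "photo_002.jpg", "tags": ["woman", "laptop", "coffee"]},
]

# Stage 2 (fine-tuning): images paired with full sentences,
# which teaches the model how to compose a caption.
finetuning_examples = [
    {"image": "photo_003.jpg",
     "caption": "A dog leaps to catch a frisbee on a grassy field."},
]
```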

But while beating a benchmark is significant, the real test for Microsoft's new model will be how it functions in the real world. As Microsoft explains on its blog: "[This] breakthrough in a benchmark challenge is a milestone in Microsoft's push to make its products and services inclusive and accessible to all users."
