Microsoft’s new AI auto-captions images for the visually impaired

Ryan is a senior editor at TechForge Media with over a decade of experience covering the latest technology and interviewing leading industry figures. He can often be sighted at tech conferences with a strong coffee in one hand and a laptop in the other. If it's geeky, he’s probably into it. Find him on Twitter (@Gadget_Ry) or Mastodon (

A new AI from Microsoft aims to automatically caption images in documents and emails so that software for visual impairments can read it out.

Researchers from Microsoft explained their machine learning model in a paper on preprint repository arXiv.

The model uses VIsual VOcabulary pre-training (VIVO) which leverages large amounts of paired image-tag data to learn a visual vocabulary.

A second dataset of properly captioned images is then used to help teach the AI how to best describe the pictures.

“Ideally, everyone would include alt text for all images in documents, on the web, in social media – as this enables people who are blind to access the content and participate in the conversation. But, alas, people don’t,” said Saqib Shaikh, a software engineering manager with Microsoft’s AI platform group.

Overall, the researchers expect the AI to deliver twice the performance of Microsoft’s existing captioning system.

In order to benchmark the performance of their new AI, the researchers entered it into the ‘nocaps’ challenge. As of writing, Microsoft’s AI now ranks first on its leaderboard.

“The nocaps challenge is really how are you able to describe those novel objects that you haven’t seen in your training data?” commented Lijuan Wang, a principal research manager in Microsoft’s research lab.

Developers wanting to get started with building apps using Microsoft’s auto-captioning AI can already do so as it’s available in Azure Cognitive Services’ Computer Vision package.

Microsoft’s impressive SeeingAI application – which uses computer vision to describe an individual’s surroundings for people suffering from vision loss – will be updated with features using the new AI.

“Image captioning is one of the core computer vision capabilities that can enable a broad range of services,” said Xuedong Huang, Microsoft CTO of Azure AI Cognitive Services.

“We’re taking this AI breakthrough to Azure as a platform to serve a broader set of customers,” Huang continued. “It is not just a breakthrough on the research; the time it took to turn that breakthrough into production on Azure is also a breakthrough.”

The improved auto-captioning feature is also expected to be available in Outlook, Word, and PowerPoint later this year.

(Photo by K8 on Unsplash)

Interested in hearing industry leaders discuss subjects like this? Attend the co-located 5G Expo, IoT Tech Expo, Blockchain Expo, AI & Big Data Expo, and Cyber Security & Cloud Expo World Series with upcoming events in Silicon Valley, London, and Amsterdam.

Tags: , , , , , , , , , , , , , ,

View Comments
Leave a comment

Leave a Reply