Data sets are the fundamental building blocks of AI systems, and that isn't likely to change. Without a corpus on which to draw, much as human beings draw on experience every day, models can't learn the relationships that inform their predictions.
But why stop at a single corpus? An intriguing report by ABI Research anticipates that while the total installed base of AI devices will grow from 2.69 billion in 2019 to 4.47 billion in 2024, comparatively few will be interoperable in the short term. Rather than combine the gigabytes to petabytes of data flowing through them into a single AI model or framework, they’ll work independently and heterogeneously to make sense of the data they’re fed.
That’s unfortunate, argues ABI, because of the insights that might be gleaned if they played nicely together. That’s why as an alternative to this unimodality, the research firm proposes multimodal learning, which consolidates data from various sensors and inputs into a single system.
Different modalities can carry complementary information or trends, which often become evident only when all of them are included in the learning process. And learning-based methods that combine signals from multiple modalities can generate more robust inference than would be possible in a unimodal system.
Consider images and text captions. If different words are paired with similar images, those words are likely being used to describe the same things or objects. Conversely, if the same words appear alongside seemingly different images, those images likely depict the same object. Given this, it should be possible for an AI model to predict image objects from text descriptions, and indeed, a body of academic literature has shown this to be the case.
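The intuition above can be sketched with a toy example. In a trained multimodal model, separate image and text encoders map both modalities into a single shared vector space, so matching a picture to a caption reduces to a nearest-neighbor search. The embeddings below are made up for illustration; real systems learn them from paired data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings standing in for the outputs of trained
# image and text encoders that share one vector space.
image_embeddings = {
    "img_dog.jpg": [0.9, 0.1, 0.0],
    "img_car.jpg": [0.0, 0.2, 0.9],
}
caption_embeddings = {
    "a dog playing fetch": [0.8, 0.2, 0.1],
    "a car on the highway": [0.1, 0.1, 0.8],
}

def predict_caption(image_name):
    """Pick the caption whose embedding lies closest to the image's."""
    img_vec = image_embeddings[image_name]
    return max(caption_embeddings,
               key=lambda c: cosine(img_vec, caption_embeddings[c]))

print(predict_caption("img_dog.jpg"))  # -> "a dog playing fetch"
```

Because both modalities live in the same space, the same lookup also runs in reverse, retrieving images from text queries.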
Despite the many advantages of multimodal approaches to machine learning, ABI’s report notes that most platform companies — including IBM, Microsoft, Amazon, and Google — continue to focus predominantly on unimodal systems. That’s partly because it’s challenging to mitigate the noise and conflicts in modalities, and to reconcile the differences in quantitative influence that modalities have over predictions.
Fortunately, there's hope yet for wide multimodal adoption. ABI Research anticipates that the total number of devices shipping with multimodal learning will grow from 3.94 million in 2017 to 514.12 million in 2023, spurred by adoption in the robotics, consumer, health care, and media and entertainment segments. Companies like Waymo are leveraging multimodal approaches to build hyper-aware self-driving vehicles, while teams like the one led by Intel Labs principal engineer Omesh Tickoo are investigating techniques for collating sensor data in real-world environments.
“In a noisy scenario, you may not be able to get a lot of information out of your audio sensors, but if the lighting is good, maybe a camera can give you a little better information,” Tickoo explained to VentureBeat in a phone interview. “What we did is, using techniques to figure out context such as the time of day, we built a system that tells you when a sensor’s data is not of the highest quality. Given that confidence value, it weighs different sensors against each other at different intervals and chooses the right mix to give us the answer we’re looking for.”
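One simple way to realize the idea Tickoo describes is a confidence-weighted average: each sensor's estimate is scaled by a confidence score derived from context, so an unreliable sensor contributes little to the fused answer. This is a minimal sketch, not Intel's implementation; the sensor names, readings, and confidence values are invented for illustration.

```python
def fuse(readings):
    """Fuse per-sensor estimates into one answer, weighting by confidence.

    `readings` maps a sensor name to a tuple (estimate, confidence),
    where confidence in [0, 1] might be derived from context signals
    such as time of day or lighting conditions.
    """
    total_weight = sum(conf for _, conf in readings.values())
    if total_weight == 0:
        raise ValueError("no confident sensor available")
    return sum(est * conf for est, conf in readings.values()) / total_weight

# Hypothetical night-time scene: poor lighting drops the camera's
# confidence, so the microphone's estimate dominates the fused value.
night = {"camera": (0.2, 0.1), "microphone": (0.9, 0.8)}
print(round(fuse(night), 3))  # -> 0.822
```

Weights here are static per call; a running system would recompute them at each interval as conditions change, which is the "different intervals" behavior Tickoo mentions.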
Multimodal learning won’t supplant unimodal learning, necessarily — unimodal learning is highly effective in applications like image recognition and natural language processing. But as electronics become cheaper and compute more scalable, it’ll likely only rise in prominence.
“Classification, decision-making, and HMI systems are going to play a significant role in driving adoption of multimodal learning, providing a catalyst to refine and standardize some of the technical approaches,” said ABI Research chief research officer Stuart Carlaw in a statement. “There is impressive momentum driving multimodal applications into devices.”
Thanks for reading,
AI Staff Writer
P.S. Please enjoy this video of Bill Gates discussing AI at Bloomberg’s New Economy Forum in Beijing, among other topics like climate change and nuclear power.