Waymo has launched its new AI research model, EMMA, aimed at advancing autonomous driving technology. The model, still in the research stages, integrates multimodal learning for improved decision-making and real-time driving performance. Waymo plans to refine EMMA’s capabilities, which could reshape future self-driving systems.
Waymo's EMMA Model: Pioneering Multimodal AI for the Future of Autonomous Driving
According to Teslarati, Waymo, the autonomous ride-hailing division of Google parent company Alphabet, has introduced a new AI research model for its self-driving operations.
Waymo laid out its goals for the research model in two news releases: one on its overall approach to AI and one on its new end-to-end multimodal model for autonomous driving, called EMMA. EMMA, which resembles Tesla's Full Self-Driving (FSD) and other end-to-end approaches, remains in the research stage, according to the company, and is not used in actual vehicles.
“EMMA is research that demonstrates the power and relevance of multimodal models for autonomous driving,” said Drago Anguelov, VP and Head of Research at Waymo. “We are excited to continue exploring how multimodal methods and components can contribute towards building an even more generalizable and adaptable driving stack.”
According to Waymo, the end-to-end approach is expected to eventually allow autonomous vehicles to operate directly from sensor data in real-time driving scenarios, with EMMA drawing on the world knowledge of Google's Gemini language model. The company has also emphasized its use of Large Language Models (LLMs) and Vision-Language Models (VLMs), referring to its broader architecture as the Waymo Foundation Model.
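To make the distinction concrete, the sketch below contrasts a conventional modular pipeline (separate perception, prediction, and planning stages) with an end-to-end driver in the spirit of EMMA, where a single multimodal model maps raw camera frames and a natural-language routing instruction directly to a trajectory. All class and method names here (`ModularStack`, `EndToEndDriver`, `generate_trajectory`, and so on) are hypothetical illustrations, not Waymo's actual code or API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A planned waypoint: (x, y) position in the ego vehicle's frame, in meters.
Waypoint = Tuple[float, float]

@dataclass
class CameraFrame:
    """Hypothetical container for raw pixels from one vehicle camera."""
    pixels: bytes
    camera_id: str

class ModularStack:
    """Conventional pipeline: separate perception, prediction, and planning
    stages, each developed and validated on its own (stubbed here)."""

    def plan(self, frames: List[CameraFrame], route_hint: str) -> List[Waypoint]:
        detections = self._perceive(frames)             # perception stage
        forecasts = self._predict(detections)           # behavior prediction stage
        return self._plan_route(forecasts, route_hint)  # planning stage

    def _perceive(self, frames): return ["vehicle_ahead"]        # stub
    def _predict(self, detections): return ["vehicle_slowing"]   # stub
    def _plan_route(self, forecasts, route_hint):
        return [(0.0, 0.0), (5.0, 0.1), (10.0, 0.3)]              # stub trajectory

class EndToEndDriver:
    """End-to-end approach in the spirit of EMMA: a single multimodal model
    maps raw camera frames plus a natural-language routing instruction
    directly to a driving trajectory."""

    def __init__(self, multimodal_model):
        self.model = multimodal_model  # e.g. a vision-language model

    def plan(self, frames: List[CameraFrame], route_hint: str) -> List[Waypoint]:
        # One forward pass replaces the hand-built stages above.
        return self.model.generate_trajectory(images=frames, instruction=route_hint)
```

The appeal of the second design is that perception, prediction, and planning are learned jointly from data rather than stitched together by hand; the concern raised by its critics, covered below, is that the combined model's decisions are harder to verify.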
Waymo's EMMA Model Advances Autonomous Driving with End-to-End Learning and Chain-of-Thought Reasoning
In the press release announcing EMMA, Waymo outlines the following as essential components of the research effort:
- End-to-End Learning: EMMA generates driving outputs such as planner trajectories, perception objects, and road graph elements directly from raw camera inputs and textual data.
- Unified Language Space: EMMA taps Gemini's world knowledge by encoding non-sensor inputs and outputs as natural language text (see the sketch after this list).
- Chain-of-Thought Reasoning: EMMA uses chain-of-thought reasoning to improve its decision-making, yielding a 6.7% improvement in end-to-end planning performance and providing comprehensible justification for its driving decisions.
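As a rough illustration of the second and third bullets, the sketch below shows how non-sensor inputs (a routing command and ego state) might be serialized as plain text and combined with camera images into a single prompt that asks the model to reason step by step before emitting waypoints. The prompt wording, field names, and the `VisionLanguageModel` interface are assumptions made for illustration, not Waymo's published format.

```python
from typing import List, Protocol

class VisionLanguageModel(Protocol):
    """Minimal interface assumed for any multimodal model (e.g. a VLM)."""
    def generate(self, images: List[bytes], prompt: str) -> str: ...

def build_driving_prompt(route_command: str, speed_mps: float, heading_deg: float) -> str:
    """Encode non-sensor inputs as natural language so a language model's
    world knowledge can be applied to them directly (unified language space)."""
    return (
        "You are controlling an autonomous vehicle.\n"
        f"Routing command: {route_command}\n"
        f"Current speed: {speed_mps:.1f} m/s, heading: {heading_deg:.0f} degrees.\n"
        "First, describe the critical objects in the scene and how they affect "
        "the plan (chain-of-thought rationale).\n"
        "Then output the next 5 waypoints as (x, y) pairs in meters, one per line."
    )

def plan_with_rationale(model: VisionLanguageModel, camera_images: List[bytes],
                        route_command: str, speed_mps: float, heading_deg: float) -> str:
    """Single call that returns both a human-readable rationale and the
    planned trajectory as text, to be parsed downstream."""
    prompt = build_driving_prompt(route_command, speed_mps, heading_deg)
    return model.generate(images=camera_images, prompt=prompt)
```

Requesting the rationale before the waypoints reflects the chain-of-thought setup Waymo describes: the same reasoning text that improves the plan also serves as a human-readable explanation of it.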
“The problem we’re trying to solve is how to build autonomous agents that navigate in the real world,” said Srikanth Thirumalai, Waymo VP of Engineering. “This goes far beyond what many AI companies out there are trying to do.”
However, some have questioned the large-scale end-to-end approach, arguing that generative AI models without substantial safeguards may be too dangerous.
“It’s bandwagoning around something that sounds impressive but is not a solution,” said Sterling Anderson, Aurora Innovation’s Chief Product Officer, in a statement to Automotive News.
End-to-end methods are "a huge risk," according to Mobileye CTO Shai Shalev-Shwartz, particularly when it comes to verifying the decision-making process of vehicles that rely on such a model. It is also worth noting that Waymo is only investigating the approach for now and has no immediate plans to commercialize it.
The announcement follows the recent close of Waymo's $5.6 billion investment round, which valued the company at more than $45 billion. The company is also developing its next generation of self-driving vehicles, based on the Hyundai Ioniq 5, at a new facility in Georgia.