In a revealing Wall Street Journal interview, OpenAI CTO Mira Murati was tight-lipped about the specific data sources used to train Sora, the organization's advanced AI video generator. Amidst growing scrutiny over AI training practices, Murati's reluctance to detail the origins of the data highlights the ongoing debate surrounding copyright and ethical AI development.
OpenAI CTO Evades Detailed Queries on Sora's Training Data Amid Copyright Concerns
OpenAI has kept up that reticence with Sora, the company's upcoming text-to-video generative AI, which has demonstrated the ability to create lifelike videos.
In a video interview with the Wall Street Journal, OpenAI CTO Mira Murati, who was also CEO for two days when Sam Altman was temporarily removed, discussed the company's new technology. The interview was intended to showcase the benefits of Sora and build hype for the upcoming technology. It does that, but Joanna Stern of the WSJ did more than throw softballs; she also asked some difficult questions.
In a three-minute segment, Stern questions Sora's training set. Before the interview, Stern provided OpenAI with some new text descriptions that would be used to create videos for their interview.
"Every time I watch a Sora clip, I wonder what videos this AI model learned from," Stern says. "Did the model see any clips of Ferdinand to know what a bull in a china shop should look like? Was it a fan of Spongebob?"
While she asks these questions, clips from the animated film Ferdinand and the children's television show Spongebob appear side by side with Sora's work, making it difficult not to notice the similarities. The next question is, naturally, what data was used to train Sora?
"We used publicly available data and licensed data," Murati responds.
"So videos on YouTube?" Stern asked. "Videos from Facebook, Instagram? What about Shutterstock? I know you guys have a deal with them."
"I'm actually not sure about that. If they were publicly available, publicly available to use, there might be that data, but I'm not sure. I'm not confident about it," Murati said.
"I'm just not going to go into the details of the data that was used, but it was publicly available or licensed data."
Murati confirmed to Stern after the interview that the licensed data includes Shutterstock content, but her refusal to discuss the topic on camera is telling.
Ethical Quandaries: AI's Content Creation Sparks Copyright Controversy and Artist Concerns
PetaPixel reports that as impressive as generative AI is, the debate over how these companies create visual content, and the likelihood that doing so violates artists' copyrights, remains constant. There have been reports that the people behind AI image generators deliberately target specific artists in their training data under the guise of the work being "publicly available." Even when this is not the case, the ease with which photographers can recreate their own photos, and the fact that iconic images are just as simple to recreate, tells the story.
These AI systems have likely seen and been trained on those copyrighted images, which explains why they can so easily produce their own versions. However, speculation isn't necessary. Midjourney's founder admitted that its AI used a "hundred million" images as a training set without permission. OpenAI admitted that it is "impossible" to train AI without relying on copyrighted content.
That said, Murati is likely aware that using stolen content to train its AI is not something OpenAI wants to admit on the record, so she refuses to answer Stern's question. Her refusal, however, makes it easy to argue that these companies care little about human artists' rights, and it demonstrates how far they will go to further their interests, regardless of the cost.
Photo: Levart_Photographer/Unsplash





