In a revealing Wall Street Journal interview, OpenAI CTO Mira Murati was tight-lipped about the specific data sources used to train Sora, the organization's advanced AI video generator. Amidst growing scrutiny over AI training practices, Murati's reluctance to detail the origins of the data highlights the ongoing debate surrounding copyright and ethical AI development.
OpenAI CTO Evades Detailed Queries on Sora's Training Data Amid Copyright Concerns
That secrecy around training data has carried over to OpenAI's Sora, the company's upcoming text-to-video generative AI model, which has demonstrated the ability to create lifelike, realistic videos.
In a video interview with the Wall Street Journal, OpenAI CTO Mira Murati discussed the company's new technology. Murati is also OpenAI's former interim CEO; she held the role for two days when Sam Altman was temporarily removed. The interview was intended to showcase Sora's benefits and build hype for the upcoming technology. It did that, but the WSJ's Joanna Stern did more than throw softballs; she also asked some difficult questions.
In a three-minute segment, Stern questioned Sora's training set. Before the interview, Stern had given OpenAI some new text prompts, which were used to generate videos for the interview.
"Every time I watch a Sora clip, I wonder what videos this AI model learned from," Stern says. "Did the model see any clips of Ferdinand to know what a bull in a china shop should look like? Was it a fan of SpongeBob?"
As she asks these questions, clips from the animated film Ferdinand and the children's television show SpongeBob SquarePants appear side by side with Sora's output, making the similarities hard to miss. The natural next question: what data was used to train Sora?
"We used publicly available data and licensed data," Murati responded.
"So videos on YouTube?" Stern asked. "Videos from Facebook, Instagram? What about Shutterstock? I know you guys have a deal with them."
"I'm actually not sure about that. If they were publicly available, publicly available to use, there might be that data, but I'm not sure. I'm not confident about it," Murati said.
"I'm just not going to go into the details of the data that was used, but it was publicly available or licensed data."
Murati confirmed to Stern after the interview that the licensed data includes Shutterstock content, but her refusal to discuss the topic on camera is telling.
Ethical Quandaries: AI's Content Creation Sparks Copyright Controversy and Artist Concerns
PetaPixel reports that, as impressive as generative AI is, the debate over how these companies create visual content, and the likelihood that doing so violates artists' copyrights, remains constant. There have been reports that the people behind AI image generators deliberately target specific artists in their training data, justifying it on the grounds that the work is "publicly available." Even when this is not the case, the ease with which these tools can recreate a photographer's images, or iconic photographs, with minimal effort tells the story.
These AI systems have likely seen and been trained on those copyrighted images, which explains why they can so easily recreate versions of them. However, speculation isn't necessary. Midjourney's founder admitted that its AI was trained on a "hundred million" images without permission. OpenAI has also admitted that it is "impossible" to train AI without relying on copyrighted content.
That said, Murati is likely aware that training its AI on stolen content is not something OpenAI wants to admit publicly, so she declined to answer Stern's question directly. Her evasion, however, makes it easy to argue that these companies care little about human artists' rights and will go to great lengths to further their interests, regardless of the cost.
Photo: Levart_Photographer/Unsplash

