Sora (text-to-video model)

Sora
A video generated by Sora of a woman walking down a Tokyo street
Developer(s) OpenAI
Platform OpenAI
Type Text-to-video model
Website openai.com/sora

Sora is a text-to-video model developed by the U.S.-based artificial intelligence (AI) research organization OpenAI. It can generate videos from descriptive prompts and can extend existing videos forwards or backwards in time.[1][2] As of February 2024, it is unreleased and not yet available to the public.[3]

History

Several other text-to-video models had been created prior to Sora, including Meta's Make-A-Video, Runway's Gen-2, and Google's Lumiere, the last of which, as of February 2024, is also still in its research phase.[4][5] OpenAI, the company behind Sora, had released DALL·E 3, the third of its DALL·E text-to-image models, in September 2023.[6]

The team that developed Sora named it after the Japanese word for sky to signify its "limitless creative potential".[1] On February 15, 2024, OpenAI first previewed Sora by releasing several high-definition clips that the model had generated, including an SUV driving down a mountain road, an animation of a "short fluffy monster" next to a candle, two people walking through Tokyo in the snow, and fake historical footage of the California gold rush. The company stated that Sora could generate videos up to one minute long.[5][4] The company then shared a technical report, which highlighted the methods used to train the model.[2][7] OpenAI CEO Sam Altman also posted a series of tweets in which he responded to Twitter users' prompts with videos that Sora generated from them.

OpenAI has stated that it plans to make Sora available to the public but that it would not be soon; it has not specified when.[5][3] The company provided limited access to a small "red team", including experts in misinformation and bias, to perform adversarial testing on the model.[6] The company also shared Sora with a small group of creative professionals, including video makers and artists, to seek feedback on its usefulness in creative fields.[8]

Capabilities and limitations

The technology behind Sora is an adaptation of the technology behind DALL·E 3. According to OpenAI, Sora is a diffusion transformer[9] – a denoising latent diffusion model with a Transformer as the denoiser. A video is generated in latent space by denoising 3D "patches", then transformed to standard space by a video decompressor. Re-captioning is used to augment the training data: a video-to-text model creates detailed captions for the training videos.[7]
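
As a rough illustration of what denoising 3D "patches" operates on, the sketch below splits a toy latent video into blocks that span time, height, and width, and flattens each block into one token. It is not OpenAI's implementation, which has not been published. The function names, the patch sizes, and the single-channel nested-list layout are all assumptions made for illustration, and the Transformer denoiser itself is not shown.

```python
from itertools import product


def patchify(latent, pt, ph, pw):
    """Split a latent video indexed [T][H][W] into flattened 3D patches.

    Hypothetical helper: each patch covers pt frames, ph rows and pw columns,
    and is flattened into one token (a flat list of values).
    """
    T, H, W = len(latent), len(latent[0]), len(latent[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    tokens = []
    # Visit patch origins in (time, height, width) order.
    for t0, h0, w0 in product(range(0, T, pt), range(0, H, ph), range(0, W, pw)):
        token = [
            latent[t0 + dt][h0 + dh][w0 + dw]
            for dt, dh, dw in product(range(pt), range(ph), range(pw))
        ]
        tokens.append(token)
    return tokens


def unpatchify(tokens, T, H, W, pt, ph, pw):
    """Inverse of patchify: write each token back into a [T][H][W] latent."""
    latent = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    origins = product(range(0, T, pt), range(0, H, ph), range(0, W, pw))
    for token, (t0, h0, w0) in zip(tokens, origins):
        offsets = product(range(pt), range(ph), range(pw))
        for value, (dt, dh, dw) in zip(token, offsets):
            latent[t0 + dt][h0 + dh][w0 + dw] = value
    return latent


# Toy latent: 4 frames of 4x6 values (sizes chosen only for the example).
T, H, W = 4, 4, 6
latent = [
    [[float(t * 100 + h * 10 + w) for w in range(W)] for h in range(H)]
    for t in range(T)
]
tokens = patchify(latent, pt=2, ph=2, pw=3)
restored = unpatchify(tokens, T, H, W, pt=2, ph=2, pw=3)
```

In a diffusion transformer, a sequence of tokens like `tokens` would be noised and then iteratively denoised by the Transformer before being reassembled and decoded to pixels. The round trip through `unpatchify` only shows that the patch layout loses no information.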

OpenAI trained the model using publicly available videos as well as copyrighted videos licensed for the purpose, but did not reveal the number or the exact source of the videos.[1] When it previewed the model, OpenAI acknowledged some of Sora's shortcomings, including difficulty simulating complex physics, understanding causality, and differentiating left from right.[10] OpenAI also stated that, in adherence to the company's existing safety practices, Sora will restrict text prompts for sexual, violent, hateful, or celebrity imagery, as well as content featuring pre-existing intellectual property.[6]

Tim Brooks, a researcher on Sora, stated that the model figured out how to create 3D graphics from its dataset alone, while Bill Peebles, also a Sora researcher, said that the model automatically created different video angles without being prompted.[5] According to OpenAI, Sora-generated videos are tagged with C2PA metadata to indicate that they were AI-generated.[1]

Reception

Will Douglas Heaven of the MIT Technology Review called the demonstration videos "impressive", but noted that they must have been cherry-picked and may not be representative of Sora's typical output.[8] American academic Oren Etzioni expressed concerns over the technology's ability to create online disinformation for political campaigns.[1] For Wired, Steven Levy similarly wrote that it had the potential to become "a misinformation train wreck" and opined that its preview clips were "impressive" but "not perfect" and that it "show[ed] an emergent grasp of cinematic grammar" due to its unprompted shot changes. Levy added, "[i]t will be a very long time, if ever, before text-to-video threatens actual filmmaking."[5] Lisa Lacy of CNET called its example videos "remarkably realistic – except perhaps when a human face appears close up or when sea creatures are swimming".[6]

See also

  • VideoPoet

References

  1. Metz, Cade (February 15, 2024). "OpenAI Unveils A.I. That Instantly Generates Eye-Popping Videos". The New York Times. Archived from the original on February 15, 2024. Retrieved February 15, 2024.
  2. Brooks, Tim; Peebles, Bill; Holmes, Connor; DePue, Will; Guo, Yufei; Jing, Li; Schnurr, David; Taylor, Joe; Luhman, Troy; Luhman, Eric; Ng, Clarence Wing Yin; Wang, Ricky; Ramesh, Aditya (February 15, 2024). "Video generation models as world simulators". OpenAI. Archived from the original on February 16, 2024. Retrieved February 16, 2024.
  3. Yang, Angela (February 15, 2024). "OpenAI teases 'Sora,' its new text-to-video AI model". NBC News. Archived from the original on February 15, 2024. Retrieved February 16, 2024.
  4. Mauran, Cecily (February 15, 2024). "OpenAI announces Sora, a wild AI text-to-video model. See it in action". Mashable. Archived from the original on February 15, 2024. Retrieved February 16, 2024.
  5. Levy, Steven (February 15, 2024). "OpenAI's Sora Turns AI Prompts Into Photorealistic Videos". Wired. Archived from the original on February 15, 2024. Retrieved February 16, 2024.
  6. Lacy, Lisa (February 15, 2024). "Meet Sora, OpenAI's Text-to-Video Generator". CNET. Archived from the original on February 16, 2024. Retrieved February 16, 2024.
  7. Edwards, Benj (February 16, 2024). "OpenAI collapses media reality with Sora, a photorealistic AI video generator". Ars Technica. Archived from the original on February 17, 2024. Retrieved February 17, 2024.
  8. Heaven, Will Douglas (February 15, 2024). "OpenAI teases an amazing new generative video model called Sora". MIT Technology Review. Archived from the original on February 15, 2024. Retrieved February 15, 2024.
  9. Peebles, William; Xie, Saining (2023). "Scalable Diffusion Models with Transformers". 2023 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 4172–4182. arXiv:2212.09748. doi:10.1109/ICCV51070.2023.00387. ISBN 979-8-3503-0718-4. ISSN 2380-7504. S2CID 254854389.
  10. Pequeño IV, Antonio (February 15, 2024). "OpenAI Reveals 'Sora': AI Video Model Capable Of Realistic Text-To-Video Prompts". Forbes. Archived from the original on February 15, 2024. Retrieved February 15, 2024.

External links

  • Official website