Not to be outdone by competitors like Google, which recently previewed a text-to-video tool, AI startup OpenAI on Thursday introduced its own text-to-video model, Sora.
Like Google’s Lumiere, Sora’s availability is limited. Unlike Lumiere, Sora can generate videos up to 1 minute long.
Text-to-video has become the latest arms race in generative AI as OpenAI, Google, Microsoft and more look beyond text and image generation and seek to cement their position in a sector projected to reach $1.3 trillion in revenue by 2032 — and to win over consumers who’ve been intrigued by generative AI since ChatGPT arrived a little more than a year ago.
According to a post from OpenAI, maker of both ChatGPT and Dall-E, Sora will be available to “red teamers,” or experts in areas like misinformation, hateful content and bias, who will be “adversarially testing the model.” It will also be available to visual artists, designers and filmmakers, so OpenAI can gain additional feedback from creative professionals. That adversarial testing will be especially important in addressing the potential for convincing deepfakes, a major area of concern in the use of AI to create images and video.
In addition to garnering feedback from outside the organization, the AI startup said it wants to share its progress now to “give the public a sense of what AI capabilities are on the horizon.”
Strengths
One thing that may set Sora apart is its ability to interpret long prompts, including one example that clocked in at 135 words. The sample videos OpenAI shared on Thursday demonstrate that Sora can create a variety of characters and scenes, from people, animals and fluffy monsters to cityscapes, landscapes, zen gardens and even New York City underwater.
This is thanks in part to OpenAI’s past work with its Dall-E and GPT models. Text-to-image generator Dall-E 3 was released in September. CNET’s Stephen Shankland called it “a big step up from Dall-E 2 from 2022.” (OpenAI’s latest AI model, GPT-4 Turbo, arrived in November.)
In…