Thursday, June 20, 2024

How good is OpenAI’s Sora video model — and will it transform jobs?

OpenAI has been showcasing Sora, its artificial intelligence video-generation model, to media industry executives in recent weeks to drum up enthusiasm and ease concerns about the potential for the technology to disrupt specific sectors.

The Financial Times wanted to put Sora to the test, alongside the systems of rival AI video-generation companies Runway and Pika.

We asked executives in advertising, animation and real estate to write prompts to generate videos they might use in their work. We then asked them their views on how such technology may transform their jobs in the future.

Sora has yet to be released to the public, so OpenAI generated the clips itself. It tweaked some of the prompts before sending us the videos, a change it said produced better-quality results.

On Runway and Pika, the initial and tweaked prompts were entered using both companies’ most advanced models. Here are the results.

Charlotte Bunyan, co-founder of Arq, a brand advertising consultant

OpenAI’s revised version of Bunyan’s prompt to create a campaign for a “well-known high street supermarket”:

Pika and Runway’s videos based on Bunyan’s original prompt:

“Sora’s presentation of people was consistent, while the actual visualisation of the fantastical playground was faithfully rendered in terms of the descriptions of the different elements, which others failed to generate.

“It is interesting that OpenAI changed ‘children’ to ‘people’, and I would love to know why. Is it a safeguarding question? Is it harder to represent children because they haven’t been trained on as many? They opted for ‘people’ rather than a Caucasian man with a beard and brown hair, which is what Sora actually generated, which raises questions about bias.

“Pika felt surreal as if you were in a trippy film moment. The children’s version is much better than the League Of Gentlemen surrealness of the adult iteration, but the rest of the environment lacks details from the prompt. I do have a certain fondness for the vibrancy of [Pika’s children’s] version, as it conveys a sense of joy and happiness more strongly than any of the others.”

The video generated by Sora includes multiple elements, such as the banana slide, runner bean frame and watermelon roundabout
A screenshot of the AI-generated video created by Runway, showing two children on a spinning watermelon
The video generated by Runway has distorting limbs throughout

“Runway was very much in the middle. Certainly, in the adult version, there was less glitching, but the representation of the playground elements was lacking.

“I could potentially use the Sora video as a taster of something we could bring to life in a virtual experience. It would demonstrate the playfulness of food. However, you may need to add a human layer to the content by using editing tools.

“These tools will speed up the way we communicate creative ideas and make them more tangible. For example, in the early stages of presenting a concept to a brand, this would make it much easier for clients to understand what it could look like or how it would work.

“My prompt has abstract creative concepts that are harder for these tools. Often, in the world of creativity, you’re trying to create something that hasn’t existed before. I know there is a lot of concern and perhaps negativity about AI taking all of our jobs, but I think we should consider how AI is going to make our jobs easier and relieve some burdens.”

Alex Williams of Escape Studios, an animator whose credits include ‘The Lion King’

Videos generated by OpenAI’s revised prompt:

“It has that slight morphy quality that AI-generated work has, which I don’t think makes it client-ready yet, but that’s something that will get smoothed out.

“Each one is amazing in terms of what it does, but each one [has] obvious mistakes . . . like heads changing shape and flamingos blending into other flamingos — it doesn’t work yet.”

Stills from AI-generated video of flamingos by Runway
Runway’s video had issues with heads changing shape

“It didn’t manage to produce a short film with a beginning, middle and end, so it didn’t do what I hoped it would. On the other hand, what it does in terms of animation is very impressive.

“Since I started in animation in the ‘80s, some very significant technological advances have changed the medium a lot. There’s no question that this is the biggest change I’ve seen in my career.

“I would draw comparisons with the switch from 2D to 3D animation, which happened in the late ‘90s when Toy Story came out. There was a lot of resistance among the hand-drawn animation community to those changes, including me, in the beginning.

“It took me a couple of years to realise I had to embrace this change. We all fought it collectively for a while, but it became the great box office driver. As an industry we do need to embrace technology because you never want to get on the wrong side.”

Ashley Shakibai, production manager at commercial real estate agents OBI Property

Videos generated by prompt for promotional video of a commercial building in Manchester:

“Sora did a reasonable job at the start. The transition will always be tricky, and it struggled with that. But I think the photorealism at the end of the shot was quite pleasing and surprising.

“Technically, the prompt was that people were walking in the building, but that was not shown and there were many other elements it didn’t achieve.

“All Pika has gathered from my prompt is a ‘sunny day’. It has given us some flares and a couple of buildings, but you can’t make out the people.”

A screenshot from AI video generated by Sora of a couple’s faces
Sora generated people with more realistic faces
A screenshot from AI video generated by Runway of a distorted body
Runway distorted people’s features

“I had to laugh when I watched this Runway one. There’s a bit more photorealism but the people are walking forwards and then backwards, so it’s certainly not a believable scene.

“As an industry professional, my expectation is perfection. I am looking for realistic quality video, and AI is probably never going to quite get there.

“At the end of the Sora video, the couple is having a conversation in a coffee shop, looking like they’re enjoying themselves. That would be a shot that we’d use to sell a commercial property space as an amenity nearby.

“We will eventually reach a point where this is an incredibly powerful tool for creators, inevitably eliminating the use of other tools. Sora will seriously challenge stock websites and the role of actors, both of which we use now.

“You must be very careful when adding computer-generated imagery. If it’s not for a purpose, if it’s not believable, it can be too distracting. It is very much at the testing stage.”

Additional reporting by Madhumita Murgia
