Hello From the AI Side – Are NVIDIA’s Maxine and Vid2Vid Cameo the Real Deal?

by
Varun Mehta
July 28, 2021

The lights fade in, a familiar tune plays on the violin, and a suave, bespectacled gentleman appears and introduces himself as ‘The Professor’. Does this ring a bell and leave you excited for what’s to come? But instead of Tokyo, Berlin, Nairobi, and Salvador Dalí masks, you are greeted by ‘The Framer’, ‘The Animator’, and ‘The Compressor’.

You might be wondering if your mind is playing tricks on you. Let us assure you it isn’t. This is not Netflix’s widely popular series ‘Money Heist’, but NVIDIA’s latest innovation. We had you there for a minute, didn’t we?

Imagine you woke up late for a video meeting with tousled hair and crumpled clothes. Then, without a care in the world, you switched on your laptop and turned on the webcam, and boom – you are looking perfectly formal for that morning meeting. No, this is not sorcery, as you may be speculating, but NVIDIA’s Vid2Vid Cameo, powered by NVIDIA’s Maxine.

What is NVIDIA Vid2Vid Cameo?

Let us start by answering the question that’s burning a hole in your mind. What exactly is the NVIDIA Vid2Vid Cameo?

Vid2Vid Cameo is essentially an AI (artificial intelligence) model that employs GANs (generative adversarial networks) to produce realistic talking-head videos from a single image of a person. GANs are a class of machine learning models. They can create images that look like photographs of people, even though those faces do not belong to any real person. A GAN has two key components: the generator model, which produces new data or images, and the discriminator model, which classifies images as real or generated. The two are trained against each other, which is where the ‘adversarial’ in the name comes from.
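To make the tug-of-war between the generator and the discriminator a little more concrete, here is a minimal sketch of the two objectives in plain Python. It is a toy, not NVIDIA’s implementation: the function names are our own, the ‘logits’ are single numbers standing in for a discriminator’s score on one image, and a real GAN would compute these losses over batches inside a deep learning framework.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a raw score (logit) into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob: float, label: float) -> float:
    """Binary cross-entropy between a predicted probability and a 0/1 label."""
    eps = 1e-12  # keeps log() away from zero
    return -(label * math.log(prob + eps) + (1.0 - label) * math.log(1.0 - prob + eps))


def discriminator_loss(real_logit: float, fake_logit: float) -> float:
    """The discriminator wants real images scored as 1 and generated ones as 0."""
    return bce(sigmoid(real_logit), 1.0) + bce(sigmoid(fake_logit), 0.0)


def generator_loss(fake_logit: float) -> float:
    """The generator wants the discriminator to score its output as real (1)."""
    return bce(sigmoid(fake_logit), 1.0)
```

Notice that the same `fake_logit` pulls the two losses in opposite directions: a high score on a generated image is good news for the generator and bad news for the discriminator. Training alternates between the two until the generated images become hard to tell apart from real ones.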

Coming back to Vid2Vid Cameo, the GAN is what creates the ‘talking head’ videos of people. Vid2Vid Cameo also offers state-of-the-art performance thanks to a training dataset of at least 180,000 high-quality videos.

Vid2Vid Cameo was first introduced to the world in October 2020 as part of NVIDIA Maxine, a cloud-based video streaming and conferencing AI platform (more on this later).

Vid2Vid Cameo was designed with video conferencing in mind. All it needs is a single image of a person and a video stream to create a virtual avatar. The model detects 20 key points that capture facial features such as the eyes, nose, ears, and mouth. These points are extracted from the video stream and used to animate the image, allowing you to attend video meetings in your PJs.

Connection Between NVIDIA Vid2Vid Cameo and Maxine

On 5th October 2020, the stage was set for the NVIDIA GPU Technology Conference (GTC) 2020. NVIDIA CEO Jensen Huang made an announcement that was set to change the video conferencing world: a new product called Maxine. The product was a response to the shift to remote working during the COVID-19 pandemic. Remote working led to virtual collaboration on a massive scale, resulting in a whopping 70 million hours of videoconferencing every day. NVIDIA wanted to capitalize on this opportunity; thus, Maxine was born.

In a nutshell, Maxine is a cloud-based video streaming and conferencing platform that brings AI-powered capabilities to everyday web meetings. These AI features include Vid2Vid Cameo, noise cancellation, face alignment, super-resolution, gaze correction, and face relighting.

Speaking about Maxine at its launch during GTC 2020, Ian Buck, Vice President and General Manager of Accelerated Computing at NVIDIA, said, “Video conferencing is now a part of everyday life, helping millions of people work, learn and play, and even see the doctor. NVIDIA Maxine integrates our most advanced video, audio, and conversational AI capabilities to bring breakthrough efficiency and new capabilities to the platforms that are keeping us all connected.”

Maxine offers several key advantages, which include:

End-users do not require any specialized hardware since the data is processed in the cloud.

Real-time AI performance that is optimized depending on the GPU (graphics processing unit) in the system.

Pre-programmed world-class models that offer AR, high-quality audio, and video capabilities.

Programmed for various workflows with in-built capabilities such as video streaming, video decode, transcode, encode, analytics, and conversational AI.

But possibly the most significant impact of NVIDIA Maxine and Vid2Vid Cameo is the reduction of video bandwidth to one-tenth of the current standard. NVIDIA researcher Ming-Yu Liu explained the idea behind this in a research paper: “Instead of sending bulky live video streams from one participant to the other, video conferencing platforms can send data on how the speaker’s key facial points are moving. On the receiver’s side, the GAN model uses this information to synthesize a video that mimics the appearance of the reference image.”
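A quick back-of-the-envelope calculation shows why sending key point data instead of video frames can save so much bandwidth. The sketch below is only that, a sketch: the function names are ours, and the inputs in the usage example (2D key points, 32-bit values, 30 frames per second, a 500 kbps video stream) are hypothetical placeholders rather than measurements. A real system would also send extra data such as the reference image, so the result should not be read as confirming or refuting the one-tenth figure.

```python
def keypoint_stream_kbps(
    num_keypoints: int,
    values_per_keypoint: int,
    bytes_per_value: int,
    fps: float,
) -> float:
    """Raw bitrate, in kilobits per second, of an uncompressed key point stream."""
    bytes_per_frame = num_keypoints * values_per_keypoint * bytes_per_value
    return bytes_per_frame * fps * 8 / 1000.0


def bandwidth_ratio(keypoint_kbps: float, video_kbps: float) -> float:
    """Key point bitrate as a fraction of a conventional video bitrate."""
    return keypoint_kbps / video_kbps


if __name__ == "__main__":
    # All values below are assumptions chosen for illustration only.
    HYPOTHETICAL_VIDEO_KBPS = 500.0
    kp_kbps = keypoint_stream_kbps(
        num_keypoints=20, values_per_keypoint=2, bytes_per_value=4, fps=30
    )
    ratio = bandwidth_ratio(kp_kbps, HYPOTHETICAL_VIDEO_KBPS)
    print(f"key point stream: {kp_kbps:.1f} kbps ({ratio:.1%} of the video stream)")
```

Swap in your own frame rate and video bitrate to see how the ratio moves. The point is simply that a handful of numbers per frame is far lighter than the pixels they describe.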

During the press release for Maxine, Liu added that Maxine and Vid2Vid Cameo would be of great assistance to people who have limited bandwidth but want to enjoy smooth video calls with their friends and family. Liu also mentioned that the software could greatly assist game developers, animators, and photo editors.

Are NVIDIA Maxine and Vid2Vid Cameo the Future?

As is the case with almost every product, Maxine and Vid2Vid Cameo have their share of criticism, which is twofold.

Numerous users have raised the issue of deepfakes, but what exactly are they? Deepfakes are the 21st century’s answer to Photoshop. They use deep learning, a branch of AI, to create fake images, videos, and events, hence the name.

Herein lies the problem for NVIDIA Maxine and Vid2Vid Cameo: a tool that can animate a face from a single image could be misused for phishing, identity theft, cyberbullying, and, worse, obscene deepfake videos.

NVIDIA has taken this concern on board and is building safeguards to prevent any of the incidents mentioned above from occurring. But will they work in the long run? Only time will tell.

Other, more cynical users and technology experts have suggested that NVIDIA Maxine could be a way to drive sales of NVIDIA’s GPUs. This is due to the trade-off between lower bandwidth and higher GPU consumption, which leads to more demand for NVIDIA’s top-class GPUs.

However, you won’t find us complaining about higher GPU consumption when there are other excellent benefits to consider.

With this innovation, the demand for AI and AR is at an all-time high, so how can Mutual Mobile be left behind? As the foremost experts in these fields, we have completed high-quality projects for organizations like Walmart and Builders First Source.

If you require any such automation or app and website development, you know who to contact for excellent tailored solutions.
