

Nvidia Partners with Cornell University to Unveil AI Video Generation Model – VideoLDM

Yusuf Balogun
Yusuf is a law graduate and freelance journalist with a keen interest in tech reporting.


The emergence of artificial intelligence (AI) has been one of the most significant technological advancements of the 21st century. From self-driving cars to virtual assistants and chatbots, AI has become ubiquitous in our daily lives, and its effects on businesses and society at large are immense.

Adding to these developments, Nvidia, the renowned American graphics processing unit manufacturer, has unveiled an AI video generation model named VideoLDM in partnership with researchers from Cornell University. The new model can generate high-resolution videos from text descriptions.

VideoLDM: Nvidia AI Video Generation Model

Given a text description, the model can generate videos at a resolution of up to 2048 x 1280 pixels, at 24 frames per second, with a maximum runtime of 4.7 seconds. The model builds on the Stable Diffusion neural network. Of the 4.1 billion parameters in the Nvidia model, only 2.7 billion were trained on video.
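To put those figures in perspective, the short Python sketch below works out what they imply for a single clip of maximum length. It uses only the numbers quoted above; the function name and the rounding of the frame count are assumptions made for illustration, not details published by Nvidia.

```python
# Figures quoted in the article.
WIDTH, HEIGHT = 2048, 1280        # maximum resolution in pixels
FPS = 24                          # frames per second
MAX_SECONDS = 4.7                 # maximum runtime
TOTAL_PARAMS_B = 4.1              # total parameters, in billions
VIDEO_TRAINED_PARAMS_B = 2.7      # parameters trained on video, in billions


def clip_stats(width, height, fps, seconds):
    """Return (frame count, pixels per frame, total pixels) for one clip.

    The frame count is rounded to the nearest whole frame, which is an
    assumption made here for illustration.
    """
    frames = round(fps * seconds)
    pixels_per_frame = width * height
    return frames, pixels_per_frame, frames * pixels_per_frame


frames, pixels_per_frame, total_pixels = clip_stats(WIDTH, HEIGHT, FPS, MAX_SECONDS)
video_share = VIDEO_TRAINED_PARAMS_B / TOTAL_PARAMS_B

print(f"Frames per clip:         {frames}")
print(f"Pixels per frame:        {pixels_per_frame:,}")
print(f"Pixels per clip:         {total_pixels:,}")
print(f"Share trained on video:  {video_share:.0%}")
```

In other words, a full-length clip is a little over a hundred frames, and roughly two thirds of the parameters were trained on video.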

This is quite modest by the standards of modern AI. Using the Latent Diffusion Model (LDM) approach, the researchers were able to produce a wide range of high-resolution videos that are both diverse and temporally consistent.

VideoLDM Features

The research team from Nvidia and Cornell University highlights two features of the model: personalized video generation and convolutional-in-time synthesis. For personalized text-to-video generation, the temporal layers trained in VideoLDM are inserted into image LDM backbones that were previously fine-tuned on a set of images with DreamBooth.

You can produce slightly longer clips with minimal quality loss by applying the learned temporal layers convolutionally in time. Additionally, the model can generate videos of driving scenes. These videos can run up to 5 minutes at a resolution of 1024×512 pixels.
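The reason a layer applied convolutionally along the time axis can handle longer clips is that its weights do not depend on the number of frames. The toy Python sketch below illustrates that general property only. It is not VideoLDM code: each "frame" is reduced to a single number, and the kernel weights, the edge padding, and the function name are all hypothetical.

```python
def temporal_conv(frames, kernel):
    """Slide a fixed 1D kernel along the time axis.

    The clip is padded by repeating its first and last frames so that the
    output has as many frames as the input. Assumes an odd kernel length.
    """
    pad = len(kernel) // 2
    padded = [frames[0]] * pad + list(frames) + [frames[-1]] * pad
    return [
        sum(weight * padded[t + i] for i, weight in enumerate(kernel))
        for t in range(len(frames))
    ]


# Hypothetical weights standing in for a layer learned on short clips.
kernel = [0.25, 0.5, 0.25]

short_clip = [0.0, 1.0, 2.0, 3.0]            # 4 "frames"
long_clip = [float(t) for t in range(12)]    # 12 "frames"

# The same kernel is reused unchanged on both clip lengths.
short_out = temporal_conv(short_clip, kernel)
long_out = temporal_conv(long_clip, kernel)

print(len(short_out), len(long_out))
```

Because the kernel only ever looks at a small window of neighboring frames, nothing in it has to change when the clip gets longer.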

It is also possible to simulate a specific driving scenario: bounding boxes are used to assemble a scene of interest, a matching starting frame is synthesized, and plausible videos are then generated from it. In addition, the model can generate several plausible rollouts from a single initial frame, providing multimodal predictions of driving scenarios.

The research is being presented at the Conference on Computer Vision and Pattern Recognition (CVPR), taking place June 18-22 in Vancouver. The neural network is currently only a research project, and it is unknown when Nvidia will make something similar available to the general public.


Source: Nvidia
