Google introduces convolutional network that has self-supervised tracking via video colorization

Google Colorizing Videos

Have you not upgraded your website to HTTPS yet? Upgrade NOW.

Google with its Chrome 68 update to show all HTTP websites as NOT SECURE. Avoid Google's penalty by installing an SSL Certificate. Get a DigiCert Standard SSL and secure your website at just $157/year. BUY NOW

Get daily updates straight in your inbox.

Google takes a step further. Researchers at Google with the help of artificial intelligence introduces convolutional network that can not only add color to black and white videos but also can constrain those colors to particular objects, people, and pets in a given frame, like a child coloring within the lines of a flip book.

It has always been a problem for users to track objects in a video via computer vision. Visually tracking objects is challenging because it requires large, labeled tracking datasets for training, which are impractical to annotate at scale. And so Google makes use of a convolutional network that can put a color frame to the gray-scale videos.

We introduce a convolutional network that colorizes grayscale videos, but is constrained to copy colors from a single reference frame. – Carl Vondrick, Research Scientist, Machine Perception.

The Google colorizing videos, as the scientists describe is a convolutional neural network, a kind of neural network that is architecturally well-suited to object tracking and video stabilization. It helps to learn and to follow multiple objects through occlusions.

The first step taken was to teach the algorithm to color the gray-scale movies. Researchers taking clips from kinetics datasets like videos from YouTube that covers a wide range of human action converts the first frame to black and white. Then the neural network was trained by them to predict the original colors in subsequent frames.


It was challenging to train the neural network as the model had to color moving objects and regions and was effectively forced to learn in order to track those objects and regions.

This forces the model to learn an explicit mechanism that we can use for tracking. – Vondrick

In the result, the model can keep tabs on any region specified in the first frame of the video and, if given points of reference, can even track human poses. It is greatly impressive to witness how it outperforms several state-of-the-art colorization techniques.

Source: Google AI Blog

Google introduces convolutional network that has self-supervised tracking via video colorization