Researchers from Disney Research and the University of California, Irvine have shown that an artificial-intelligence-based video compression model has the potential to compete against established video codec technology.
Though still at an early stage of development, the new compression model has apparently yielded lower distortion and smaller bits-per-pixel rates than current coding-decoding algorithms such as H.265 when trained on specialized video content. The researchers said they achieved comparable results on downscaled YouTube videos.
Research lead Stephan Mandt, UCI assistant professor of computer science, explained that ultimately all video compression is a trade-off between file size and image quality, and that aiming for a smaller file size requires accepting some errors. “The hope is that our neural network-based approach does a better trade-off overall between file size and quality.”
The success of a video codec comes down to how well it can predict the contents of the next frame, so that it has less to memorise and store. Current compression algorithms perform this task by trying to compute the linear displacement of small, localized patches relative to their position in the previous frame. In contrast, deep neural networks take a data-centric approach, learning a video’s underlying dynamics from large datasets of video material. These advances in deep learning show promise for shrinking video file sizes in future generations of video compression codecs.
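The patch-displacement idea used by conventional codecs can be sketched as a simple block-matching search. The block and search sizes here are illustrative, not taken from any particular codec:

```python
import numpy as np

def motion_vector(prev, curr, y, x, block=8, search=4):
    """Find where an 8x8 block of `curr` came from in `prev` by exhaustive
    search within +/-`search` pixels, minimizing the sum of absolute differences."""
    target = curr[y:y + block, x:x + block].astype(int)
    best, best_dy, best_dx = np.inf, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > prev.shape[0] or xx + block > prev.shape[1]:
                continue  # candidate block falls outside the previous frame
            sad = np.abs(prev[yy:yy + block, xx:xx + block].astype(int) - target).sum()
            if sad < best:
                best, best_dy, best_dx = sad, dy, dx
    return best_dy, best_dx

# Toy frames: a bright square that shifts 2 pixels to the right between frames.
prev = np.zeros((32, 32), dtype=np.uint8)
prev[8:16, 8:16] = 255
curr = np.zeros((32, 32), dtype=np.uint8)
curr[8:16, 10:18] = 255
print(motion_vector(prev, curr, 8, 10))  # -> (0, -2): the block moved from (8, 8)
```

A real codec repeats this search for every block of every frame and then stores only the motion vectors plus small residuals, which is exactly the per-patch displacement the article describes.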
The team’s approach first downscales the dimensions of the video using a variational autoencoder, which renders each frame into a condensed array of numbers. “You can think of the autoencoder as having an hourglass shape,” Mandt explained. “It has a low-dimensional, compact version of the image in the middle; this is how we compress every frame into something smaller.”
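The hourglass idea, wide input squeezed through a narrow waist, can be illustrated with a linear stand-in. This sketch uses PCA via SVD rather than a trained variational autoencoder, and the sizes (a 64-pixel “frame”, an 8-number code) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 toy "frames" of 64 pixels each, lying near a 4-dimensional subspace,
# mimicking the redundancy a real autoencoder would exploit.
D, K = 64, 8
frames = rng.normal(size=(D, 4)) @ rng.normal(size=(4, 200))

# Top-K principal directions play the role of the learned encoder/decoder weights.
U, _, _ = np.linalg.svd(frames, full_matrices=False)
W = U[:, :K]

encode = lambda frame: W.T @ frame   # 64 numbers in -> 8 numbers out (the narrow waist)
decode = lambda code: W @ code       # 8 numbers back out to a 64-pixel reconstruction

frame = frames[:, 0]
code = encode(frame)
err = np.abs(decode(code) - frame).max()
print(code.shape, err)  # an 8-number code; near-zero error since the data is low-rank
```

The trained neural network does the same squeeze nonlinearly, so it can find compact codes for real images that no linear projection would capture.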
The compression algorithm then uses an AI-based technique called a “deep generative model” to guess the next compressed version of an image given what has gone before.
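In place of the trained deep generative model, a minimal stand-in for “guessing the next compressed image from what has gone before” is a simple linear extrapolation over the latent codes. The function and values below are purely illustrative:

```python
import numpy as np

def predict_next(z_prev2, z_prev1):
    """Guess the next latent code by assuming it keeps changing the same way
    it changed between the two previous frames (the real model is a trained
    neural network; this linear rule is only a sketch of the idea)."""
    return 2 * z_prev1 - z_prev2

# Two consecutive (made-up) latent codes drifting steadily.
z = [np.array([0.0, 1.0]), np.array([0.5, 1.2])]
print(predict_next(z[0], z[1]))  # -> [1.0, 1.4]
```

The better this prediction, the smaller the surprise the codec actually has to store for each new frame.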
The next step, unique to this algorithm, encodes frame content by converting the autoencoder’s array of real numbers into integers, which are far easier to store.
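Rounding real-valued latents to integers can be sketched in a couple of lines. The latent values and step size here are invented for illustration:

```python
import numpy as np

latent = np.array([0.37, -1.24, 2.05, 0.99])  # real-valued autoencoder output (made up)
step = 0.25                                   # quantization step size (illustrative)

q = np.round(latent / step).astype(int)       # integers: easy to store and entropy-code
restored = q * step                           # what the decoder gets back
print(q, restored)  # -> [ 1 -5  8  4] and [ 0.25 -1.25  2.    1.  ]
```

The rounding is where the controlled, accepted error enters: everything after this point can be stored and recovered exactly.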
Finally, lossless compression is applied to the array, allowing for its exact restoration. Crucially, this algorithm is informed by the neural network about which video frame to expect next, making the lossless compression aspect extremely efficient.
Mandt likens the process to losslessly compressing and uncompressing letters drawn from a finite alphabet. Each frame of the video sequence, he explains, has now become a symbol from a discrete, countable alphabet to which lossless compression can be applied, reducing the file size even further.
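The benefit of informing the lossless stage with a prediction can be demonstrated with a toy experiment. This sketch uses the simplest possible predictor (copy the previous frame) and off-the-shelf `zlib` in place of the paper's neural-network-guided entropy coder; the data is synthetic:

```python
import zlib
import numpy as np

rng = np.random.default_rng(1)

# 100 "frames" of 256 quantized latents each; consecutive frames differ only
# slightly, mimicking the temporal redundancy of real video.
frames = np.cumsum(rng.integers(-1, 2, size=(100, 256)), axis=0).astype(np.int16)

# Baseline: losslessly compress the frames as-is.
raw = zlib.compress(frames.tobytes(), 9)

# Predictive coding: guess each frame as a copy of the previous one and
# losslessly compress only the residuals (mostly -1, 0, or 1 here).
residuals = np.diff(frames, axis=0, prepend=np.zeros((1, 256), dtype=np.int16))
predicted = zlib.compress(residuals.tobytes(), 9)

print(len(raw), len(predicted))  # the residual stream compresses much better
```

Because the residuals sum back to the original frames exactly, nothing is lost in this stage; a learned predictor simply makes the residuals even more compressible than a naive copy does.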
The whole process is described as an ‘end-to-end’ video compression algorithm, but it still needs to be developed into a practically applicable version. One way of improving it, Mandt suggests, is to compress the neural network itself along with the video.
“Because the receiver requires a trained neural network for reconstructing the video, you might also have to think about how you transmit it along with the data,” Mandt said. “There are lots of open questions still. It’s a very early stage.”
Could Disney be working on a video compression process that it believes will, in time, improve its Disney+ streaming service? Let us know what you think.
Image Source: AnalyticsInsight.net