Using Machine Learning to Improve Streaming Quality at Netflix

by Chaitanya Ekanadham

One of the common questions we get asked is: “Why do we need machine learning to improve streaming quality?” This is a really important question, especially given the recent hype around machine learning and AI which can lead to instances where we have a “solution in search of a problem.” In this blog post, we describe some of the technical challenges we face for video streaming at Netflix and how statistical models and machine learning techniques can help overcome these challenges.

As we expand rapidly to audiences with diverse viewing behavior, operating on networks and devices with widely varying capabilities, a “one size fits all” solution for streaming video becomes increasingly suboptimal.

At Netflix, we observe network and device conditions for every session, as well as the aspects of the user experience (e.g., video quality) we were able to deliver, which allows us to apply statistical modeling and machine learning in this space.

A richer characterization of network quality would prove useful for analyzing networks (for targeting/analyzing product improvements), determining initial video quality and/or adapting video quality throughout playback (more on that below).

The quality of experience can be measured in several ways, including the initial amount of time spent waiting for video to play, the overall video quality experienced by the user, the number of times playback paused to load more video into the buffer (“rebuffer”), and the amount of perceptible fluctuation in quality during playback.
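As a rough illustration, the four measures listed above could be summarized per session as follows. This is a minimal sketch, not Netflix's actual metric definitions: the `Session` fields, the use of chunk bitrate as a stand-in for video quality, and the count of bitrate switches as a stand-in for perceptible fluctuation are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Session:
    """Hypothetical per-session playback log (field names are assumptions)."""
    startup_delay_s: float      # time spent waiting before video starts
    rebuffer_count: int         # times playback paused to refill the buffer
    bitrates_kbps: List[float]  # bitrate of each played chunk, in order


def summarize_qoe(session: Session) -> Dict[str, float]:
    """Summarize the four quality-of-experience measures for one session."""
    rates = session.bitrates_kbps
    if not rates:
        raise ValueError("session has no played chunks")
    # Average bitrate as a crude proxy for overall video quality.
    mean_bitrate = sum(rates) / len(rates)
    # Count changes between consecutive chunks as a proxy for fluctuation.
    switches = sum(1 for a, b in zip(rates, rates[1:]) if a != b)
    return {
        "startup_delay_s": session.startup_delay_s,
        "rebuffer_count": session.rebuffer_count,
        "mean_bitrate_kbps": mean_bitrate,
        "quality_switches": switches,
    }


summary = summarize_qoe(Session(1.5, 2, [1000, 1000, 3000, 3000, 1000]))
```

A real system would likely weight these measures against each other, since improving one (for example, higher bitrate) can worsen another (more rebuffers).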

Working out which earlier adaptation decision led to a later outcome, such as a rebuffer, is the “credit assignment” problem, a well-known challenge when learning optimal control algorithms. Machine learning techniques (e.g., recent advances in reinforcement learning) have great potential to tackle these issues.
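One standard way reinforcement learning handles credit assignment is the discounted return, which propagates a later reward or penalty back to earlier decisions. The sketch below is purely illustrative and is not taken from the original post: the reward values and the discount factor `gamma` are assumptions chosen to show the mechanism.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every decision step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step sees the discounted sum of what follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Two steps that looked good in isolation (+1 each), followed by a
# hypothetical rebuffer penalty (-10) at the third step.
returns = discounted_returns([1.0, 1.0, -10.0], gamma=0.5)
# returns == [-1.0, -4.0, -10.0]
```

Although the first two decisions each earned an immediate reward of +1, their returns are negative because the later penalty is attributed back to them.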

Statistical models can also improve the streaming experience by predicting what a user will play, so that (part of) it can be cached on the device before the user hits play, enabling the video to start faster and/or at a higher quality.

By combining various aspects of a user's viewing history with recent user interactions and other contextual variables, one can formulate this as a supervised learning problem where we want to maximize the model’s likelihood of caching what the user actually ended up playing, while respecting constraints around resource usage coming from the cache size and available bandwidth.
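To make the constraint side of this concrete, here is a minimal sketch of choosing what to cache once a model has produced play probabilities. It assumes a hypothetical `Candidate` record and uses a simple greedy rule (highest predicted probability per megabyte first). The rule is a heuristic chosen for illustration, not the method described in the post, and it does not always find the optimal set.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical model output for one title (names are assumptions)."""
    title_id: str
    play_probability: float  # predicted chance the user plays this title next
    size_mb: float           # size of the portion we would cache (must be > 0)


def choose_titles_to_cache(candidates: List[Candidate],
                           cache_budget_mb: float) -> List[str]:
    """Greedily pick titles by probability per megabyte within the budget."""
    ranked = sorted(candidates,
                    key=lambda c: c.play_probability / c.size_mb,
                    reverse=True)
    chosen: List[str] = []
    used_mb = 0.0
    for c in ranked:
        # Skip anything that would exceed the cache size constraint.
        if used_mb + c.size_mb <= cache_budget_mb:
            chosen.append(c.title_id)
            used_mb += c.size_mb
    return chosen


picks = choose_titles_to_cache(
    [Candidate("A", 0.5, 100.0), Candidate("B", 0.3, 30.0), Candidate("C", 0.1, 50.0)],
    cache_budget_mb=150.0,
)
# picks == ["B", "A"]
```

A bandwidth constraint could be handled the same way by adding a second budget that each candidate must also fit within.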

By employing predictive modeling to prioritize device reliability issues, we’ve already seen large reductions in overall alert volume while maintaining an acceptably low false negative rate, which we expect to drive substantial efficiency gains for Netflix’s device reliability team.

The aforementioned problems are a sampling of the technical challenges where we believe statistical modeling and machine learning methods can improve the state of the art. Solving these problems is central to Netflix’s strategy as we stream video under increasingly diverse network and device conditions.

Enhancing high-resolution 360 streaming with view prediction

In January 2016, we first launched dynamic streaming, a 360 streaming technology that uses geometry-mapping techniques to stream the highest number of pixels to a person's field of view.

We developed three technologies to tackle this challenge. With these techniques, we were able to create a new encoding system that makes 360 video more accessible under difficult network conditions.

If the client tries to fetch a video chunk 10 seconds ahead of its scheduled playback time, deciding which stream to fetch ought to be based on the predicted view orientation 10 seconds into the future rather than on the current one.

However, a heatmap cannot be directly applied for view prediction, due to several properties of 360 video viewing. To address all these properties in view prediction, we developed the gravitational predictor.

For each point on the map, a kernel whose size is equal to the FOV and whose amplitude is equal to the heat value (i.e., the popularity of that spot as shown on the heatmap) is deducted from the corresponding position on the sphere.

Once the landscape is created, the current focal point resembles a marble on the point of the map where the person is looking, and the marble moves to where most people would look.

The gravitational predictor addressed four factors we were concerned with in predicting the user's view orientation in 360 video. The balance between the influence of a viewer's kinetic momentum and the influence of content is tunable by adjusting the gravitational constant (G) in the physics simulation.
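The marble-on-a-landscape idea can be sketched in a heavily simplified form. The example below is an assumption-laden illustration, not the production predictor: it tracks yaw only (one dimension around the sphere), uses a Gaussian kernel whose width is tied to the FOV, and integrates the motion with fixed Euler time steps. The function names and default values are hypothetical.

```python
import math
from typing import List, Tuple

Hotspot = Tuple[float, float]  # (yaw in degrees, heat value)


def angular_diff(a: float, b: float) -> float:
    """Signed shortest difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def landscape_height(yaw: float, hotspots: List[Hotspot], fov_deg: float) -> float:
    """Each hotspot digs a well whose depth is its heat value."""
    sigma = fov_deg / 2.0  # assumption: Gaussian kernel width tied to the FOV
    return -sum(
        heat * math.exp(-0.5 * (angular_diff(yaw, h_yaw) / sigma) ** 2)
        for h_yaw, heat in hotspots
    )


def predict_yaw(yaw: float, velocity: float, hotspots: List[Hotspot],
                horizon_s: float, g: float = 50.0,
                fov_deg: float = 90.0, dt: float = 0.05) -> float:
    """Roll the 'marble' forward horizon_s seconds and return its yaw.

    velocity is the viewer's current head speed in degrees per second;
    g scales how strongly the content (the landscape slope) pulls the view.
    """
    eps = 0.5  # step in degrees for the numerical slope estimate
    for _ in range(int(round(horizon_s / dt))):
        slope = (landscape_height(yaw + eps, hotspots, fov_deg)
                 - landscape_height(yaw - eps, hotspots, fov_deg)) / (2 * eps)
        velocity += -g * slope * dt          # content pulls toward the wells
        yaw = (yaw + velocity * dt) % 360.0  # momentum carries the view on
    return yaw


# Viewer at rest at 10 degrees, with a popular spot at 40 degrees.
predicted = predict_yaw(10.0, 0.0, [(40.0, 1.0)], horizon_s=1.0)
```

With `g=0.0` the prediction follows the viewer's momentum alone, and larger values of `g` let popular content dominate, which mirrors the tunable balance described above.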

To address this, we first built a system to generate heatmaps using techniques like computer vision, data filtering and aggregation, and temporal and 3D spatial interpolation to help indicate the areas of most interest.

It allows us to provide a scalable predictive solution for videos even in the absence of statistical information, while offering us an option that helps users explore interesting visual content wherever it might be.

We expect that integration and the use of AI-generated saliency maps will make our gravitational models more efficient and ultimately will contribute to higher video quality and better performance.

To facilitate content-dependent streaming, we have made significant improvements to the way we manage image layout in order to make the most efficient use of available pixels in each frame.

Although the flat faces of the cube map help improve image compression, encoded cube map images lack uniform angular distribution within faces.
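The non-uniformity within a cube face follows from basic geometry, which the short sketch below works through along one axis of a face. It assumes a unit cube with the viewer at the center, so a face coordinate `u` in [-1, 1] corresponds to the viewing angle atan(u). The function name is hypothetical.

```python
import math


def relative_sample_density(u: float) -> float:
    """Samples per unit angle at face coordinate u, relative to the face center.

    The viewing angle is theta = atan(u), so d(theta)/du = 1 / (1 + u^2).
    Evenly spaced pixels in u therefore pack (1 + u^2) times as many samples
    into each unit of angle as they do at the center of the face.
    """
    return 1.0 + u * u


center = relative_sample_density(0.0)  # 1.0 at the middle of the face
edge = relative_sample_density(1.0)    # 2.0 at the edge of the face

# Numerical check of the derivative near the edge of the face.
h = 1e-6
numeric = h / (math.atan(1.0 + h / 2) - math.atan(1.0 - h / 2))
```

Along this axis, the edge of a face is sampled about twice as densely per degree as its center, so pixels are spent unevenly across the field of view.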

We discovered, however, that if we cut the top 25 percent and bottom 25 percent of an image formatted in an equirectangular layout so we're left with the middle 90 degrees of the scene, we end up with an image that contains a quite uniform angular sample distribution.

To cover the rest of the sphere, we can use the top and bottom faces from the cube map layout — in particular, we need only the middle circle of these faces, since they connect to the equirectangular portion of the layout.

Now if we place the equirectangular portion of the image and the two circles from the cube map side by side, the middle 90 degrees of the equirectangular have a pretty nice, uniform angular sample distribution, while the flat top and bottom cube map faces solve the equirectangular issue for the poles.
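The arithmetic behind this layout can be checked directly. In an equirectangular image every row has the same number of pixels, while the circle of latitude it represents shrinks with the cosine of the latitude, so rows are horizontally oversampled by a factor of 1/cos(latitude). The sketch below is a worked illustration with hypothetical function names.

```python
import math


def kept_vertical_span_deg(crop_top: float, crop_bottom: float) -> float:
    """Vertical field left after cropping fractions of a 180-degree image."""
    return 180.0 * (1.0 - crop_top - crop_bottom)


def horizontal_oversampling(latitude_deg: float) -> float:
    """Pixels per unit of arc at this latitude, relative to the equator."""
    return 1.0 / math.cos(math.radians(latitude_deg))


span = kept_vertical_span_deg(0.25, 0.25)      # 90.0 degrees remain
at_equator = horizontal_oversampling(0.0)      # 1.0
at_crop_edge = horizontal_oversampling(45.0)   # about 1.41
near_pole = horizontal_oversampling(89.0)      # about 57
```

Within the retained band (latitudes between -45 and +45 degrees) the oversampling never exceeds roughly 1.41, whereas near the poles it grows without bound, which is why those regions are better served by the flat top and bottom cube map faces.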

With content-dependent dynamic streaming, we can see an increase of up to 51 percent in the effective resolution compared with using cube map and view-dependent dynamic streaming at a similar bit rate.

The new projection and the horizontal offset are available on GitHub today: https://github.com/facebook/transform360

The updates we've shared today, including content-dependent streaming and the integration of both our gravitational and AI prediction models, are currently in testing and will be in production later in the year.
