Best Practices For Using A Multi-CDN Strategy: How To Balance, Prioritize and Optimize Traffic

The debate surrounding the use of a multi-CDN strategy has been gaining momentum over the past few months, with more case studies showing how it can be done. For a while now, multiple vendors have offered CDN load balancing as a service, and in that time customers have learned a lot about configuring CDNs to improve quality and match business goals. Used correctly, a multi-CDN strategy gives content owners great advantages, including the ability to better control quality, prevent overage charges, ensure bandwidth commitments are met, and select delivery based on additional requirements. A multi-CDN strategy requires two decisions: first, the criteria used to select the CDN; second, the process by which the switch between CDNs is carried out.

There are many selection strategies that can be used for CDN balancing, and solutions provider NPAW recently shared with me how they explain the process to customers. There are three types of strategies: balanced, prioritized, and optimized. With a balanced strategy, one simply distributes traffic using thresholds (such as traffic served or concurrent streams), spilling over into secondary CDNs once a specified limit is reached. A prioritized scheme defines a criteria hierarchy, which may include the platform, ISP, device, or protocol in use, and each CDN is used up to a certain level. For example, you can better control concurrency in your own delivery network by diverting overflow to a regional or global CDN depending on the amount of incoming traffic or the number of concurrent users. Finally, at the most granular level, an optimized strategy brings performance metrics into the decision-making process. The chosen CDN is the “best”-performing CDN: the one with the highest score across a number of factors, including recent QoE measurements in a specific region, for a specific piece of content, taking into account the ISP and device of the end user intending to access the video.
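To make the three strategies concrete, here is a minimal Python sketch of each selection rule. The CDN names, thresholds, rule shapes, and scoring function are illustrative assumptions, not any vendor's actual API.

```python
from dataclasses import dataclass

@dataclass
class Cdn:
    name: str
    concurrent_streams: int = 0
    max_streams: int = 10_000  # spill-over threshold for the balanced strategy

def select_balanced(primary: Cdn, secondary: Cdn) -> Cdn:
    """Balanced: spill over to the secondary CDN once the primary hits its limit."""
    return primary if primary.concurrent_streams < primary.max_streams else secondary

def select_prioritized(cdns: list[Cdn], *, isp: str, device: str,
                       rules: dict[tuple[str, str], str]) -> Cdn:
    """Prioritized: walk a criteria hierarchy (here just ISP + device) to pick a CDN."""
    preferred = rules.get((isp, device))
    for cdn in cdns:
        if cdn.name == preferred:
            return cdn
    return cdns[0]  # no rule matched: fall back to the default CDN

def select_optimized(cdns: list[Cdn], qoe_score) -> Cdn:
    """Optimized: pick the CDN with the best recent QoE score for this request."""
    return max(cdns, key=qoe_score)
```

In practice the scoring function behind an optimized strategy would aggregate recent QoE measurements (buffering ratio, join time, play failures) per region, content, ISP, and device, rather than the single number shown here.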

By choosing the best-performing CDN for each user/view, OTT platforms and content distributors can significantly reduce the buffering rates, play failures, and join times of their services, which drives more consumption, reduces churn, lengthens play times, and maximizes the user’s quality of experience. The second part of the process is deciding how to actually perform the switch once the CDN has been selected. There are three main ways to execute a switch: DNS routing, client-side plug-ins alone, and client- or server-side API communication.

DNS: A CDN-switching technique that works at the DNS level can be integrated without modifying the app, as it is independent of the application layer. This is a big advantage as far as integration is concerned, although it makes CDN traffic analysis more difficult afterwards. The main benefit (which can also turn into the worst drawback) is that the application knows nothing about the CDN being used and therefore cannot influence the DNS routing.

With DNS routing, the service URL is divided into two parts: the base, which changes every time there is a CDN switch, and the content path, which specifies the video content to be delivered. For VOD streaming, this kind of URL modification poses low risk, but live-stream switching might not be possible given the specific URLs used by some CDNs, where not only the base route changes but also parameters throughout the entire URL.
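The base/content split can be illustrated with a small sketch, assuming hypothetical hostnames. Only the base (host) changes on a switch while the content path and query survive, which is why the VOD case is low-risk:

```python
from urllib.parse import urlsplit, urlunsplit

def switch_cdn(url: str, new_host: str) -> str:
    """Swap the base (host) of a delivery URL while keeping the content part intact."""
    parts = urlsplit(url)
    # Only the network location changes; path, query, and fragment are preserved.
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))
```

A live-stream URL that also embeds CDN-specific tokens or path segments outside the base would break under this simple rewrite, which is exactly the risk described above.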

PLUGINS: CDN switching based on plugins is essentially a third-party software component inside the player that makes the decision to switch between CDNs. When switching between CDNs (or even renditions), this allows QoE metrics and performance issues affecting the user’s device (for example, CPU performance or memory usage) to be taken into account. NPAW says this degree of autonomy, however tempting, is very dangerous because these systems make very important decisions without any knowledge of the wider business context.

Plugin-based switching may make automatic adjustments to meet preset QoE parameters, but since the plugin is unaware of the context of that adjustment, the chosen CDN might not match the business and strategic goals the distributor intended. The bigger risk of having an autonomous system in your player that makes decisions purely on performance, however, is adding another “middleman” that can fail along the critical path of your video delivery.

API: While a bit more complicated to implement than a DNS solution, one of the most important differences between an API-based model and the other strategies is that it is completely scalable. CDN switching based on plugins, for example, places the switching logic in the player, so each new player a customer adopts requires a new integration, with all the time and development cost that entails. By contrast, an API-based solution scales because it runs on the server, so integrating new players is fast, effective, and incurs no additional cost.

Also, if an API-based switching method is used, the communication can be extended from ‘client-server’ to ‘server-server’. In fact, this is the communication method the majority of industry leaders use. The client or server sends a request to, for instance, NPAW’s API, asking which CDN is best for a specific IP and device. NPAW’s API runs its algorithm against the configurations previously made by the customer and returns the CDNs ordered according to the switching method configured, in real time. The client’s API then chooses the CDN and redirects the data flow. Here’s a diagram from NPAW that shows how their Youbora solution works:

[Diagram: client-server vs. server-server communication]
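The server-to-server flow can be sketched as follows. The function names, the score payload, and the ranking rule are assumptions for illustration, not NPAW’s actual API:

```python
def rank_cdns(scores: dict[str, float]) -> list[str]:
    """Order candidate CDNs best-first by their current score."""
    return sorted(scores, key=scores.get, reverse=True)

def choose_cdn(ip: str, device: str, fetch_scores) -> str:
    """Server-side decision: ask the switching API for per-request scores,
    then send the viewer to the top-ranked CDN."""
    scores = fetch_scores(ip, device)  # in production, an HTTPS call to the decisioning API
    return rank_cdns(scores)[0]
```

Because the whole decision lives on the server, adding a new player to the service changes nothing here, which is the scalability argument made above.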

Last but not least, NPAW’s solution sits outside the “critical path,” as their platform operates server-side, not client-side. The player plugins only collect useful information; they do not execute actions, which means they can never cause a total blackout of the service, with the economic costs such an outage would mean for customers.

Content owners I have spoken to have tried and tested many alternatives, including API-based CDN switching. The industry appears to be settling on the view that an API-based switching technique offers far more benefits than the other solutions explored above, including easier customization, lower client-side cost, and greater flexibility. Multi-CDN deployments aren’t new in the industry, but they are getting a lot of traction lately with solutions like NPAW’s and others on the market that let you do it easily, cost-effectively and, most importantly, based on real video QoE data.

  • Tim Napoleon

    A couple of other points to mention. With the switch to HTML5, and several platforms like Android now requiring secure delivery, it is more practical to have an origin and then select the front-side edge to deliver everything (video and other assets such as player CSS and JavaScript). With Flash, managing cross-domain rules was a simple XML file. With HTML5 and secure delivery, very complex security policies come into play. With formats like HLS, the manifest and chunklist will cause security alerts and, depending on browser settings, non-playable links. For security and performance, having all elements (player/manifest/assets) inside a secure URL that you control simplifies deployment. Test the “S” in HTTPS across a wide selection of devices and browsers to make sure all parties have it perfected.

    Second, when deciding when to select the “fastest” CDN, the following flow needs to be factored in. The first time a user hits a site in a session, a lot of files and connections fire to render the page elements and app metadata. The stream chunks and playback get caught up in this initial congestion. The system switches to the backup streaming URL at this point, just as the initial congestion clears, so the backup CDN appears to the analytics to be faster. A way to control for this is to randomize which CDN is responsible for the first load. Almost always, the “backup” CDN will show the majority of its improved performance based on this.
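    The randomization the commenter suggests can be sketched in a few lines of Python (the CDN names are placeholders):

```python
import random

def pick_first_load_cdn(cdns: list[str]) -> str:
    """Assign the congested first-session load to a uniformly random CDN,
    so neither CDN's analytics absorb all of the initial-load penalty."""
    return random.choice(cdns)
```

    With the first load spread evenly, performance comparisons between CDNs reflect steady-state delivery rather than page-render congestion.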

    Lastly, cache control and midgress are huge factors when tuning performance. When you split traffic you will have more cache misses and lower cache-hit rates. Use CDNs that offer a cache hierarchy, meaning they allow edge servers to pull from regional servers that hold files at the mid-point between edge and origin, allowing cache misses to be filled faster. Clients with geographic pockets of customers, for example a large base in Europe and Asia, will see a huge improvement in performance if they origin their content in-region. This is especially true for colder content. With CDN storage regions and cloud providers now servicing globally, I would argue these forward origins will improve performance as much as multiple edge providers will.