Apple’s Adoption Of HEVC Will Drive A Massive Increase In Encoding Costs Requiring Cloud Hardware Acceleration

For the last 10 years, H.264/AVC has been the dominant video codec used for streaming, but with Apple adopting H.265/HEVC in iOS 11 and Google heavily supporting VP9 in Android, a change is on the horizon. Next year the Alliance for Open Media will release its AV1 codec, which will improve video compression efficiency even further. The end result is that the codec market is about to become very fragmented: content owners will soon have to decide whether they need to support three codecs (H.264, H.265, and VP9) instead of just H.264, with AV1 expected to follow in 2019.

As a result of what’s taking place in the codec market, and with consumers demanding better-quality video, content owners, broadcasters and OTT providers are starting to see a massive increase in encoding costs. Because of their complexity, new codecs like H.265 and VP9 need 5x the server costs of H.264, and AV1 currently needs over 20x. The mix of SD, HD and UHD continues to shift toward better quality, e.g. HDR, 10-bit and higher frame rates, and moving from 1080p SDR to 4K HDR raises server encoding cost by 5x. Consumption of 360 video and Facebook’s 6DoF video is also growing, which again increases encoding costs by at least 4x.

If you multiply all these factors together, it’s not hard to do the math and see that for some, encoding costs could increase by 500x over the next few years as new codecs, higher-quality video, 360 video and general demand all grow. If you want to see how I get to that number, here’s the math:

  • 5x the number of minutes to be encoded compared with today
  • 5x the encoding costs for new codecs like VP9 and HEVC over H.264
  • 5x as more video is in higher resolution, higher frame rate and HDR (e.g. 1080p60 SDR vs 4Kp60 HDR is 4x the pixels and roughly 5x the encoding cost)
  • 2x as you now have to support two codecs (H.264 & HEVC or VP9)
  • 2x if you have to support 360 video and Facebook’s 6DoF (Degrees of Freedom)
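As a quick sanity check, the five factors above compound, so they multiply rather than add. A minimal Python sketch; the factor values are simply the estimates from the list (not measurements), and the pixel comparison uses the standard 3840x2160 UHD and 1920x1080 rasters:

```python
from math import prod

# Growth factors taken from the list above (the author's estimates, not measurements).
factors = {
    "more minutes encoded": 5,
    "new codec complexity (VP9/HEVC vs H.264)": 5,
    "higher resolution, frame rate and HDR": 5,
    "supporting two codecs": 2,
    "360 video and 6DoF": 2,
}

total = prod(factors.values())  # the factors compound, so they multiply
print(f"Combined increase: {total}x")

# Pixel count alone: 4K UHD (3840x2160) vs 1080p (1920x1080).
pixel_ratio = (3840 * 2160) / (1920 * 1080)
print(f"4K/1080p pixel ratio: {pixel_ratio:.0f}x")
```

This gives 500x overall, and shows that resolution alone accounts for 4x of the third factor, so in the 1080p60 SDR vs 4Kp60 HDR example the remainder of the 5x is attributed to HDR.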

This is why, over the past year, a new type of accelerator in public clouds, the Field Programmable Gate Array (FPGA), has been growing in the market. Unlike CPUs and GPUs, FPGAs are not programmed using an instruction set but by wiring up an electrical circuit. This is the same way traditional Application Specific Integrated Circuits (ASICs) are designed, but a big difference is that an FPGA can be programmed “in the field”. This means it can be programmed on demand in the cloud, just like CPUs and GPUs. For customers, replacing a software encoder with an FPGA encoder takes a change to just a single line of code, while keeping the benefits of common frameworks like FFmpeg.
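To illustrate the “single line” claim with FFmpeg, here is a small Python sketch that builds two otherwise identical command lines. `libx265` is FFmpeg’s real x265 wrapper; `fpga_hevc_enc` is a hypothetical placeholder, since the actual encoder name depends on the vendor’s FFmpeg plugin, and the 3 Mbps bit rate is just an example value.

```python
from __future__ import annotations

import shlex

def build_ffmpeg_cmd(src: str, dst: str, video_encoder: str) -> list[str]:
    """FFmpeg command for a simple transcode; only the video encoder varies."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", video_encoder,  # the single line that changes
        "-b:v", "3M",           # example target bit rate
        dst,
    ]

software = build_ffmpeg_cmd("in.mp4", "out.mp4", "libx265")
# HYPOTHETICAL encoder name: substitute whatever the FPGA vendor's plugin registers.
fpga = build_ffmpeg_cmd("in.mp4", "out.mp4", "fpga_hevc_enc")

print(shlex.join(software))
print(shlex.join(fpga))
```

Everything else in the pipeline (demuxing, scaling, muxing) stays on the same FFmpeg code path, which is where the “common frameworks” benefit comes from.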

Encoding software such as x265 offers a great many presets that let the user customize settings and trade off overall computing requirements against the size of the encoded video. x265 can produce very high-quality results with the “veryslow” preset, which yields the best compression but at a low coding rate (frames encoded per second) and a considerable cost in encoding resources. On an AWS EC2 c4.8xlarge instance, x265 with this preset delivers only 3 frames per second (fps) of 1080p video. Hence, to deliver 60 fps, 20 c4.8xlarge instances would be required, which would cost around $33 an hour.

By comparison, video compression vendor NGCodec’s encoder running on the AWS EC2 FPGA instance f1.2xlarge can deliver better visual quality than x264 ‘veryslow’ while delivering over 60 fps on a single f1.2xlarge instance. The total cost would be around $3 an hour, including the cost of the f1 instance and the cost of the codec. That is a total savings of over 10x, and it also avoids the complexity of parallelizing live video across multiple C4 instances. This cost and quality benefit is why public cloud providers like Amazon, Baidu, Nimbix, and OVH have already deployed FPGA instances that their customers can use on demand. Many other data center providers tell me they are also developing public FPGA instances, and I expect this trend to continue.
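The arithmetic behind this comparison can be sketched in a few lines of Python. The c4.8xlarge price is the on-demand figure VidGeek quotes in the comments, and the ~$3 FPGA figure is read as an hourly cost; both are assumptions that will drift with region and time. With that price the CPU side comes to about $32 an hour rather than $33, and the savings to roughly 10.6x.

```python
import math

TARGET_FPS = 60              # real-time 1080p60
X265_VERYSLOW_FPS = 3        # article: x265 veryslow on one c4.8xlarge
C4_8XLARGE_PER_HOUR = 1.591  # USD, on-demand price quoted in the comments (assumption)
F1_TOTAL_PER_HOUR = 3.0      # article: ~$3 for f1.2xlarge plus codec, read as hourly

# Assumes throughput scales linearly with instance count, ignoring the
# overhead of splitting one live stream across machines.
instances = math.ceil(TARGET_FPS / X265_VERYSLOW_FPS)
cpu_cost = instances * C4_8XLARGE_PER_HOUR
savings = cpu_cost / F1_TOTAL_PER_HOUR

print(f"{instances} x c4.8xlarge = ${cpu_cost:.2f}/hour")
print(f"FPGA savings: {savings:.1f}x")
```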

I’d be interested to hear what others think of FPGA and welcome their comments below.

  • Tom

    These cost calculations for software-only encoding are not accurate. At typical HEVC video streaming bit rates (around 3 Mbps), x265 with the veryslow preset will encode about 4 frames/second. No experienced video professional tries to achieve 60 FPS with x265’s veryslow preset. More commonly, a streaming video service will use the slower or the slow preset, which deliver twice or four times the frame rate of veryslow respectively, with a very small tradeoff in compression efficiency. For 60 FPS real-time encoding, system operators use one of our faster presets. We’re constantly improving the compute efficiency of our real-time and offline encoding algorithms, and we’ll have a live 4K60P 10-bit encoding demo at IBC.

    A single instance of x265 encoding 1080P video can’t efficiently use all 36 threads of an Amazon EC2 C4.8xlarge instance; you would see less than 50% effective CPU utilization. Streaming movie services use our UHDkit library, which can run multiple encoder instances in parallel, fully and efficiently utilizing a many-core server. UHDkit shares information across encoder instances, encoding multiple bit rate tiers (for adaptive bit rate streaming) twice as efficiently as encoding each bit rate tier individually.

    Your calculations are based on Broadwell Xeon processors that launched 2 years ago. We’re seeing roughly 50% higher performance per core from the current generation Intel Xeon Scalable Processor Family, which will power the forthcoming C5 instances (announced last November). We are working to further optimize x265 on the latest generation Xeons. That performance, combined with more cores per dollar, is massively improving the productivity of software encoding, while fixed-function and FPGA performance and quality remain static.
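Tom’s figures can be turned into instance counts for real-time 1080p60. The sketch below assumes, purely for illustration, one encoder per instance (so it ignores the CPU-utilization issue he raises), that his ~4 fps veryslow figure applies per instance, that slower and slow run 2x and 4x faster, and that the ~50% per-core gain carries over one-to-one to per-instance throughput:

```python
import math

TARGET_FPS = 60
VERYSLOW_FPS = 4.0  # Tom: x265 veryslow at ~3 Mbps, assumed per instance
PRESET_SPEEDUP = {"veryslow": 1.0, "slower": 2.0, "slow": 4.0}  # relative to veryslow
NEWER_XEON_GAIN = 1.5  # "roughly 50% higher performance per core", assumed per instance

def instances_needed(preset: str, newer_xeon: bool = False) -> int:
    """Instances needed for real-time 1080p60, one encoder per instance."""
    fps = VERYSLOW_FPS * PRESET_SPEEDUP[preset]
    if newer_xeon:
        fps *= NEWER_XEON_GAIN
    return math.ceil(TARGET_FPS / fps)

for preset in PRESET_SPEEDUP:
    print(f"{preset:>8}: {instances_needed(preset)} today, "
          f"{instances_needed(preset, newer_xeon=True)} on newer Xeons")
```

Even on these generous assumptions, real-time veryslow still needs around ten instances, which is consistent with the point that nobody runs veryslow for 60 fps.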

    • Tom

      5x number of minutes to be encoded over today – has nothing to do with HEVC.

      5x the encoding costs for new codecs like VP9 and HEVC over H.264 – potentially accurate, depending on your settings, but this will be offset by much lower bandwidth costs, which are often the dominant cost for a video service, and/or a higher quality of experience, which is what drives business results (market share, brand value, subscriber growth, lower churn, higher average selling price, etc.).

      5x as more video is in higher resolution, higher frame rate, HDR (e.g. 1080p60 SDR vs 4Kp60 HDR is 5x pixels) – Increasing the pixel resolution, bit depth or frame rate of a video experience is independent of “the cost of HEVC”. If you want to massively improve the quality of your video experience, without a linear increase in the cost to deliver that higher quality experience, upgrade to HEVC, which won’t require 5x the bandwidth.

      2x as now you have to support two codecs (H.264 & HEVC or VP9) – No, the cost of AVC encoding stays the same. So if HEVC encoding is 5x AVC, you don’t multiply by 2 here, you multiply by 1.2 (add back in the cost of AVC encoding, which is 20% of HEVC).
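Tom’s correction to the two-codec factor is easy to check numerically. This sketch keeps every other factor from the article unchanged (even though Tom disputes several of them) and only swaps 2x for his (HEVC + AVC) / HEVC ratio:

```python
from math import prod

avc = 1.0         # baseline H.264/AVC encoding cost
hevc = 5.0 * avc  # the article's "5x" for HEVC/VP9

# Article: supporting a second codec doubles cost. Tom: the AVC encode adds
# only its own, smaller cost on top of HEVC, i.e. (HEVC + AVC) / HEVC.
two_codec_article = 2.0
two_codec_tom = (hevc + avc) / hevc

article_total = prod([5, 5, 5, two_codec_article, 2])
revised_total = prod([5, 5, 5, two_codec_tom, 2])
print(f"two-codec factor: {two_codec_tom:.1f}x")
print(f"article: {article_total:.0f}x, with Tom's factor: {revised_total:.0f}x")
```

Swapping that single factor brings the headline figure from 500x down to 300x.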

      2x if you have to support 360 video and Facebook’s 6DoF – Again, choosing to upgrade the depth or quality of the video experience you’re offering is independent of the decision to support HEVC. In order to deliver a high-quality 360-degree video experience, a codec that is twice as efficient as H.264 is essential.

      • NGCodec

        Most likely all 3 codecs are going to be required: H.264 for legacy, HEVC for Apple devices and VP9 for Android devices. On a low-cost $50 Android handset, HEVC patent costs are too high, so VP9 and H.264 will be used. Hence 2x the costs to support multiple codecs.

        • Tom

          The majority of Android devices will support HEVC natively, allowing video services to avoid having to support 2 new codecs. The hardware is already in most high-end smartphones, and the technology is already licensed by the leading device OEMs (Samsung, LG, Sony, etc.).

          • NGCodec

            I agree most mobile chipsets have hardware support for both HEVC and VP9 decoding. I don’t see the majority of Android OEMs supporting HEVC; the cost is just too high. Outside of Apple and Samsung, I am not aware of anyone else that has hit the HEVC royalty cap for the essential patents.

          • Tom

            The LG G6 and the Sony Xperia XZ Premium support HEVC. Android smartphone and tablet vendors can’t afford to fall behind Apple in terms of their technical specifications; typically they try to advertise better raw hardware capabilities than Apple. The cost of licensing HEVC patents is not a fully settled matter, but at the end of the process I don’t expect it to be too high, thanks to the RAND obligation of all companies that contributed to the standard. Keep in mind that Apple is also supporting HEIF photos, which use HEVC to store photos with 2.4x the efficiency of JPEG. For image sequences (burst mode), you get another 2 to 5x gain. This photo compression benefit can’t be provided by VP9, and it is as valuable as, or more valuable than, the improvement in video compression. Turning a 128 GB phone into the equivalent of a 200+ GB phone, while cutting your bandwidth consumption for photo and video backup, sharing and enjoyment, is quite a valuable feature. HEVC hardware is already in most smartphones, TVs and connected set-top boxes sold in the past 2 years. When Apple pushes out the free iOS 11, macOS High Sierra, Safari, tvOS and HLS updates in the fall, roughly 1 billion devices will have platform-native HEVC and HEIF capability. The photo and video experience on these devices will be twice as good as on devices that don’t support these formats. I don’t expect leading Android device OEMs to sit on their hands and let Apple double or triple its market share.
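The “128 GB phone into a 200+ GB phone” figure depends on how much of the stored data is photos. A small sketch under simple assumptions: only photo bytes shrink, by the 2.4x figure over JPEG quoted above, and everything else on the device stays the same size; `photo_share` is a hypothetical parameter for the fraction of stored bytes that are photos.

```python
def equivalent_capacity(capacity_gb: float, photo_share: float, ratio: float = 2.4) -> float:
    """Capacity the device 'feels like' when only its photo bytes shrink by `ratio`."""
    if not 0.0 <= photo_share <= 1.0:
        raise ValueError("photo_share must be between 0 and 1")
    # The same content now needs (1 - share) + share / ratio of its original bytes.
    return capacity_gb / ((1.0 - photo_share) + photo_share / ratio)

for share in (0.25, 0.5, 0.62, 1.0):
    print(f"photos = {share:.0%} of bytes -> {equivalent_capacity(128, share):.0f} GB")
```

Under these assumptions, photos would need to make up roughly 62% or more of the stored bytes for the 200 GB figure to hold; if every byte were a photo, the ceiling would be about 307 GB.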

          • NGCodec

            Technically, HEVC and VP9 perform the same within a small margin of error. We are one of the few companies that have built broadcast-quality encoders for both, so we know.
            HEVC licensing is still a mess 5 years after the standard was completed: three major patent pools plus a number of companies that are not in any of the pools. Maybe it will get solved, but for now the majority of the Android market is going to use VP9 instead of HEVC.
            I am expecting to see an open still image profile leveraging VP9/AV1 Intra to compete with HEIF.

          • Kirk

            I am expecting to see an open still image profile leveraging VP9/AV1 Intra to compete with HEIF.

            There may be a new WebP profile with VP9 and/or AV1 support. But VP9 and AV1 could also be added to HEIF. HEIF itself is just an image container format and is independent of the codecs used inside it. Right now HEIF supports JPEG, H.264, and H.265:

            https://github.com/nokiatech/heif

            But it could be extended with VP9 and AV1 support. And, since it’s an image format, software implementations of the decoders will be fine on even very old systems.
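The point that HEIF is a container independent of the codec is visible in the file’s first box. HEIF files use the ISO base media file format, whose leading `ftyp` box carries a major brand and a list of compatible brands; for example, `mif1` marks a generic HEIF image file and `heic` marks HEVC-coded images. A minimal Python sketch that reads those brands, assuming the common 32-bit box size (the 64-bit `largesize` form is not handled) and using a synthetic header rather than a real file:

```python
from __future__ import annotations

import struct

def read_ftyp(data: bytes) -> tuple[str, list[str]]:
    """Return (major_brand, compatible_brands) from a leading ISOBMFF 'ftyp' box."""
    if len(data) < 16:
        raise ValueError("too short for an ftyp box")
    size, box_type = struct.unpack(">I4s", data[:8])  # 32-bit big-endian size + 4-char type
    if box_type != b"ftyp":
        raise ValueError("data does not start with an ftyp box")
    if size < 16 or size > len(data) or (size - 16) % 4:
        raise ValueError("unsupported or malformed ftyp size")
    major = data[8:12].decode("ascii")
    # Bytes 12..16 hold minor_version, which is skipped here.
    compatible = [data[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return major, compatible

# Synthetic header shaped like an HEVC-coded HEIF still image.
body = b"heic" + struct.pack(">I", 0) + b"mif1" + b"heic"
header = struct.pack(">I", 8 + len(body)) + b"ftyp" + body
major, compatible = read_ftyp(header)
print(major, compatible)
```

An image profile built on another codec would presumably be signaled by its own brand in the same box, leaving the container parsing unchanged.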

    • danrayburn

      Hi Tom, these calculations may change of course as new solutions come into the market, but the key word you use is “tradeoff”. That’s defined very differently by companies, and many don’t want a “tradeoff” of any kind. With faster presets some video quality is being lost and that’s not acceptable to many. That said, everyone has a different definition of what’s “acceptable” quality to them, based on the application they are using video for.

      Amazon doesn’t support C5 in its cloud yet, so while I expect more vendors to optimize for it, its impact on the market is limited until the cloud providers support it.

      • Tom

        Dan – you started with the wrong assumptions. I happen to know the x265 settings that many of the largest video services in the world use. I can’t share them with you, but I can assure you that your starting point is not the right one. As you know, there are seriously diminishing returns when you try to extract every last ounce of efficiency out of a video encoder. The encoding efficiency difference between x265’s slow, slower, and veryslow presets is very, very small, but the difference in encoding speed is large. Given all of the tradeoffs involved, more services use settings equivalent to our slow or slower preset than our veryslow preset.

        • Reel Solver

          Tom, your point is well taken, with regards to effectiveness (better encoding) versus efficiency (longer encoding times). With that in mind, I submitted a Streams of Thought column in mid-July for the September issue of Streaming Media magazine talking about that precise tradeoff. Watch for that in the upcoming issue… Tim Siglin

    • NGCodec

      Fundamentally, FPGAs have 10x or more the performance of software. Just as Moore’s law improves CPUs, it will also improve FPGAs. Currently the C4 instance is the fastest processor available on AWS. Running with faster presets loses VQ (visual quality), which increases CDN and storage costs and lowers the consumer experience.

  • VidGeek

    Interesting discussion. A few points:
    1. I’m not aware of VP9 use outside of YouTube, for obvious reasons. Google seems to have little interest in optimizing the libvpx codec because its focus is on AV1. libvpx transcoding speeds are poor, especially at resolutions above HD and at higher bit depths. The HEVC endorsement from Apple is significant not just because of the codec but also because of the protocol change to HLS (the move to fragmented MP4, which brings it much closer to CMAF).
    2. While cloud-based FPGA instances are interesting, they are currently available only in the AWS us-east-1 (N. Virginia) region. The limited regional deployment also implies limited capacity. Finally, I’ve yet to see published pricing for the f1 instance type, but it is likely to be considerably higher than for standard instance types. If we use the GPU-backed instances to guide the pricing comparison, the CPU-only general-purpose c4.8xlarge is $1.591/hour, compared to $2.28/hour for the previous-generation g3.8xlarge GPU-backed instance and $7.20/hour for the current-generation p2.8xlarge GPU-backed instance. These comparisons are imperfect because I’ve chosen similar vCPU sizes, whereas I should really be comparing throughput for a given workload, but that’s not possible. So… like GPUs before them, FPGAs in the cloud are likely to be fast but expensive; that might be fine for certain applications, but I doubt they are a panacea.
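The throughput caveat can be made concrete: what matters is cost per unit of work, not price per vCPU. A minimal sketch, assuming a single instance sustains the stated frame rate; the 3 fps and 4 fps figures come from the article and Tom’s reply, while GPU and FPGA throughput figures are not given here, so they are left as inputs rather than guessed:

```python
C4_8XLARGE_PRICE = 1.591  # USD/hour, the on-demand price quoted above (assumption)

def cost_per_million_frames(price_per_hour: float, fps: float) -> float:
    """Cost of encoding one million frames at a sustained `fps` on one instance."""
    frames_per_hour = fps * 3600
    return price_per_hour / frames_per_hour * 1_000_000

def breakeven_price(ref_price: float, ref_fps: float, candidate_fps: float) -> float:
    """Highest hourly price a faster instance can charge and still match the
    reference instance's cost per frame."""
    return ref_price * candidate_fps / ref_fps

print(f"c4.8xlarge @ 3 fps: ${cost_per_million_frames(C4_8XLARGE_PRICE, 3):.2f} per million frames")
print(f"c4.8xlarge @ 4 fps: ${cost_per_million_frames(C4_8XLARGE_PRICE, 4):.2f} per million frames")
print(f"break-even for a 60 fps instance: ${breakeven_price(C4_8XLARGE_PRICE, 3, 60):.2f}/hour")
```

Read this way, any instance sustaining 60 fps at a price below about $31.82/hour beats a c4.8xlarge running veryslow at 3 fps on cost per frame, before accounting for any quality difference.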

  • Tom

    Oliver – is NGCodec claiming that you can match x264 --preset veryslow quality at 1080P60 real time, or x265 --preset veryslow? This blog post discusses x265 performance as a baseline, but then goes on to say that NGCodec “can deliver better visual quality than x264 ‘veryslow’”. If this is a typo, and you’re actually claiming to have matched or beaten x265’s veryslow preset, you should make your HEVC bitstreams available for all to see and compare side by side with the x265 veryslow bitstreams.

    • NGCodec

      Yes, our Chestnut encoder running at 60 fps on AWS F1 has better VQ than x265 veryslow at the typical bit rates used in an ABR use case. As is usual for commercial encoders, as opposed to open source ones, we make our bitstreams available to our customers under NDA. If any potential customer would like to evaluate them, please reach out to me directly.

  • Ben Waggoner

    I think we are at best decades away from people encoding 360 DoF 60p content in 5x the volume the industry is doing normal video content today! The production cost of that content is also orders of magnitude higher per minute, which will strongly limit the availability of sources to encode. I doubt we’d get to even 5% of content being 360 DoF before we are in the AV2/H.266 era.

    Even today, maybe only 1% of content is UHD and less than that is HDR. The viewership per UHD/HDR title is going to be a lot higher because that tends to be compelling high-budget content. But encoding costs are proportional to what gets encoded, not watched.

    Also, most scripted content is 24p. We see 60p in sports/news/reality programming, but it is a pretty small minority of VOD viewership. As it is, there are only four big budget 60p titles I know will be released this decade (Billy Lynn and the Avatar sequels).
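The point that encoding cost tracks what gets encoded, not what gets watched, amounts to a weighted average over the content mix. A minimal sketch, assuming the ~1% UHD share of content can be read as a share of encoded minutes and applying the article’s 5x cost factor to that slice:

```python
from __future__ import annotations

def blended_multiplier(mix: dict[str, tuple[float, float]]) -> float:
    """Average cost multiplier over encoded minutes.

    `mix` maps a content class to (share of encoded minutes, cost multiplier
    relative to the baseline class).
    """
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * cost for share, cost in mix.values())

# Illustrative mix: ~1% UHD (the estimate above) at the article's 5x cost factor.
mix = {
    "HD/SDR baseline": (0.99, 1.0),
    "UHD/HDR": (0.01, 5.0),
}
print(f"Blended cost multiplier: {blended_multiplier(mix):.2f}x")

# Frame rate matters too: 60p carries 2.5x the frames of 24p.
print(f"60p vs 24p frames: {60 / 24}x")
```

With that mix, the blended increase is about 1.04x rather than the full 5x.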

    Also, no one will do live at anything near --preset veryslow.

    • IO

      I agree, especially with the last line 😉, but I would add a qualifier: at competitive bitrates and for complex scenes. At high bitrates and with non-complex content, it’s easy to show the same performance from fast live hardware encoding and slow offline software encoding.

  • Ryan .

    Microsoft currently uses FPGAs for Bing.