RTC Hardware Coding Requirements

  • Overview
  • Requirements
    • Coding Tools
      • Encoder
      • Decoder
      • Post-processor
    • Throughput
      • 1:1 call
      • Multiparty Call
    • Low-latency
    • Temporal Thinning
    • Rate Control
    • Region of Interest
    • Encoding Quality
    • Error Resilience
    • Receiver Feedback
    • Screencasting

Overview

This document outlines the mandatory feature requirements for a hardware video codec implementation in order to fully support Google’s vision of high-quality real-time communication (RTC), including the WebRTC project.

See the Tech Talk from Google I/O 2012 (40 minutes) for a good overview and current status of WebRTC technology, and how to use the APIs.

Google provides silicon-proven VP8 hardware encoder and decoder IP cores meeting the requirements listed in this document free of charge. Visit the hardware page for more information.

Requirements

Coding Tools

This section lists the VP8 codec tools required for a high-quality RTC experience.

Encoder

The VP8 encoder must support the previous, golden and alternate reference frame types. The encoder API must enable handling of the reference frames as follows:

  • Update a golden reference frame
  • Update an alternate reference frame
  • Update a previous reference frame
  • Use a golden frame as reference
  • Use an alternate frame as reference
  • Use a previous frame as reference
  • Create a non-referenced frame (a.k.a. a droppable frame)

The three frame types are used for two purposes: temporal scalability and error resilience.
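The reference frame operations listed above can be pictured as a set of per-frame flags that the host passes to the encoder. The following is a minimal sketch only: the `FrameFlags` type, its member names and the `is_droppable` helper are hypothetical and do not reproduce any particular encoder API.

```python
from enum import IntFlag


class FrameFlags(IntFlag):
    """Hypothetical per-frame controls; names and values are illustrative."""
    NONE = 0
    UPDATE_LAST = 1     # update the previous reference frame
    UPDATE_GOLDEN = 2   # update the golden reference frame
    UPDATE_ALTREF = 4   # update the alternate reference frame
    REF_LAST = 8        # use the previous frame as reference
    REF_GOLDEN = 16     # use the golden frame as reference
    REF_ALTREF = 32     # use the alternate frame as reference


UPDATE_MASK = (FrameFlags.UPDATE_LAST
               | FrameFlags.UPDATE_GOLDEN
               | FrameFlags.UPDATE_ALTREF)


def is_droppable(flags):
    """A frame that updates no reference buffer can be dropped safely."""
    return not (flags & UPDATE_MASK)


# A non-referenced frame predicts from existing buffers but updates none.
droppable = FrameFlags.REF_LAST | FrameFlags.REF_GOLDEN
golden_refresh = FrameFlags.REF_LAST | FrameFlags.UPDATE_GOLDEN
```

Treating "update" and "use" as independent bits is a design choice of this sketch; it makes the droppable frame simply the case where no update bit is set.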

Furthermore, the encoder needs to support a set of coding tools that helps it meet the quality requirements mentioned later in this document. The encoder must also support multiple simultaneous encoding instances.

It is recommended that the coding system implement a denoiser, either in the ISP (Image Signal Processor) or in the encoder. A motion-compensated temporal denoiser reference implementation is available in the libvpx code: http://code.google.com/p/webm/downloads/list

Decoder

The decoder must support all the coding tools defined in the VP8 bitstream specification. It must also support multiple simultaneous decoding instances.

Post-processor

Temporal post-processing is recommended for improving the quality of base-layer frames when multiple temporal layers are coded, and for removing the popping effect caused by lower-quality keyframes.

In the reference post-processing algorithm, each macroblock in the current frame is compared to the co-located macroblock of the previous frame. If their Sum of Absolute Differences (SAD) exceeds a set threshold T, a weighted averaging operation between the two is applied.

A reference implementation can be found in the libvpx code: http://code.google.com/p/webm/downloads/list
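The comparison described in this section can be sketched as follows. This is not the libvpx implementation: the function names, the equal default weighting and the rounding rule are assumptions made for illustration, and the sketch follows the description exactly as worded above (blend when the SAD exceeds the threshold).

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def postprocess_block(cur, prev, threshold, prev_weight=0.5):
    """Blend cur with the co-located prev block when their SAD exceeds threshold.

    prev_weight is an assumed parameter; the text does not give the weights.
    """
    if sad(cur, prev) <= threshold:
        return [row[:] for row in cur]  # pass the block through unchanged
    cur_weight = 1.0 - prev_weight
    return [[int(c * cur_weight + p * prev_weight + 0.5)
             for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(cur, prev)]
```

Blocks are plain lists of pixel rows, so the same code works for a 16x16 macroblock or for a small test block.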

Throughput

1:1 call

In a one-to-one call, both the encoder and the decoder need to sustain 720p@30fps at 2 Mbps.

Multiparty Call

The multiparty call provides spatial scalability by having each encoder encode and stream three separate bitstreams of different resolutions to the RTC backend server. The server sends each receiving client only one of the three possible streams, based on the client's available bandwidth.

The encoder needs to be capable of simultaneous

  • 720p@30fps, 1.2 Mbps
  • 360p@30fps, 500 kbps
  • 180p@30fps, 100 kbps

throughput. This adds up to 141750 macroblocks per second, at a total bandwidth of 1.8 Mbps.
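The encoder totals can be reproduced with a few lines of arithmetic. The sketch below assumes 16:9 frame sizes (1280x720, 640x360 and 320x180), which the text does not state explicitly, and counts a macroblock as 256 pixels; the variable names are illustrative.

```python
MB_PIXELS = 16 * 16  # a VP8 macroblock covers 16x16 luma pixels

# (width, height, fps, kbps); the 16:9 frame sizes are assumed
ENCODER_STREAMS = [
    (1280, 720, 30, 1200),
    (640, 360, 30, 500),
    (320, 180, 30, 100),
]


def macroblocks_per_second(width, height, fps):
    """Pixel rate divided by the macroblock area."""
    return width * height * fps // MB_PIXELS


total_mb_rate = sum(macroblocks_per_second(w, h, fps)
                    for w, h, fps, _ in ENCODER_STREAMS)
total_kbps = sum(kbps for _, _, _, kbps in ENCODER_STREAMS)
```

Note that this counts fractional macroblock rows. An implementation that pads each frame to whole macroblocks processes slightly more, for example 40x23 = 920 macroblocks per 360p frame instead of 900.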

The decoder needs to be capable of simultaneous

  • 720p@30fps, 1.2 Mbps
  • up to (N-1)x 180p@30fps, 100 kbps

throughput, where N is the maximum number of participants. Using N=20, the total is 243000 macroblocks per second, at a total bandwidth of 3.2 Mbps.
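The decoder load can be checked the same way. In the sketch below the function name and the 16:9 frame sizes are assumptions. The quoted figures appear to correspond to 20 streams of 180p next to the 720p stream; with N-1 = 19 small streams the same arithmetic gives slightly lower totals, so sizing for the quoted figures is the conservative choice.

```python
MB_PIXELS = 16 * 16

HD_MB_RATE = 1280 * 720 * 30 // MB_PIXELS     # one 720p@30fps stream
SMALL_MB_RATE = 320 * 180 * 30 // MB_PIXELS   # one 180p@30fps stream (320x180 assumed)


def decoder_load(small_streams):
    """Return (macroblocks per second, kbps) for one 720p stream plus
    the given number of 180p streams."""
    mb_rate = HD_MB_RATE + small_streams * SMALL_MB_RATE
    kbps = 1200 + small_streams * 100
    return mb_rate, kbps


quoted = decoder_load(20)       # matches the figures in the text
n_minus_one = decoder_load(19)  # N-1 streams for N=20
```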

Both the encoder and the decoder need to be able to report their maximum supported resolution to the host.

Low-latency

The encoder needs to process the data in real time, without buffering frames internally. That is, each frame received from the sensor needs to be encoded and sent out immediately.

Temporal Thinning

RTC applications use temporal thinning to adjust data rates quickly when network bandwidth varies. Frames forming each incremental temporal layer can be dropped without affecting the decoding of lower layers.

The encoder must support up to three temporal layers, and use the reference frame handling operations for implementing them as described below:

  • Layer 0 is encoded as previous frames that use previous frames as references.
  • Layer 1 is encoded as golden frames that use golden or previous frames as references.
  • Layer 2 is encoded as non-referenced frames that use golden, alt-ref or previous frames as references.

A typical frame rate for layers 0, 1 and 2 is 7.5 fps, 7.5 fps and 15 fps respectively. Because the layers are cumulative, each participant can view 7.5 fps video (layer 0 only), 15 fps video (layers 0 and 1) or 30 fps video (all three layers), based on their bandwidth.
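One way to obtain these rates from a 30 fps source is a repeating four-frame pattern. The pattern below and the helper names are assumptions; the text gives only the per-layer frame rates, not the frame order.

```python
# Assumed layer assignment for each frame in a repeating group of four:
# one layer 0 frame, one layer 1 frame and two layer 2 frames.
LAYER_PATTERN = (0, 2, 1, 2)


def layer_of(frame_index):
    """Temporal layer of the frame with the given index."""
    return LAYER_PATTERN[frame_index % len(LAYER_PATTERN)]


def frame_rate(max_layer, source_fps=30.0):
    """Frame rate seen by a receiver that keeps layers 0..max_layer."""
    kept = sum(1 for layer in LAYER_PATTERN if layer <= max_layer)
    return source_fps * kept / len(LAYER_PATTERN)
```

Dropping every frame for which `layer_of` returns 2 halves the frame rate, and dropping layers 1 and 2 leaves a quarter of the frames.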

Rate Control

In order to provide a great user experience over variable network conditions, the VP8 encoder must support the following bitrate adjusting features:

  • Constant bitrate
    • Should have a rate control to adjust the encoded bitrate to be equal to the target bitrate within a set time window
    • Should be able to set a minimum and maximum quantisation parameter
    • Should be able to change the target bitrate on-the-fly within the defined time window (the next time window should have the new target in place)
    • Should be able to change the target rate at any time without generating a key frame
    • Note: inserting stuffing bits is not required for ensuring exact bitrate match at all time instants.
  • Multiple streams and layers
    • Should be able to set different bitrate targets for different streams encoded simultaneously
    • Should be able to set different bitrate targets for different temporal layers
  • Should be able to drop frames if necessary to keep the bit rate within the target
    • Should be able to drop frames from selected layers only, or drop entire layers

Region of Interest

The application can set up a region of interest as a rectangular area in the frame. The encoder should use this information to concentrate the available bitrate on the specified region of interest. Region of interest control is applied only in constant bitrate mode, and can be implemented using VP8 segments.
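VP8 segmentation assigns each macroblock to one of up to four segments, and each segment can carry its own quantiser adjustment. A minimal sketch of turning a rectangle into a per-macroblock segment map follows; the function name, the two-segment layout and the delta values are assumptions chosen for illustration.

```python
MB_SIZE = 16

# Assumed quantiser deltas: segment 0 is background, segment 1 is the ROI.
SEGMENT_QP_DELTA = {0: 0, 1: -10}


def roi_segment_map(width, height, roi):
    """Return rows of segment ids, one id per macroblock.

    roi is (x, y, w, h) in pixels; any macroblock touched by the
    rectangle is placed in segment 1.
    """
    mb_cols = (width + MB_SIZE - 1) // MB_SIZE
    mb_rows = (height + MB_SIZE - 1) // MB_SIZE
    x, y, w, h = roi
    first_col, last_col = x // MB_SIZE, (x + w - 1) // MB_SIZE
    first_row, last_row = y // MB_SIZE, (y + h - 1) // MB_SIZE
    return [[1 if (first_row <= r <= last_row
                   and first_col <= c <= last_col) else 0
             for c in range(mb_cols)]
            for r in range(mb_rows)]
```

A negative delta lowers the quantiser inside the region, so more bits are spent there; the rate control then has to compensate in the background to stay within the constant bitrate target.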

Encoding Quality

On desktop computers, the Google RTC applications use the libvpx encoder library with the command line settings --rt --cpu-used=-5. For a consistent user experience across devices, a hardware encoder

  • Must be able to compress video sequences to the aforementioned target bitrates at the target resolutions
  • Should meet or exceed the quality of libvpx at these resolutions and bitrates.

For quality benchmarking, libvpx can be downloaded at http://code.google.com/p/webm/downloads/list.

Error Resilience

The encoder needs to be able to disable the probability table updates so that the entropy tables can be independent between layers.

The aforementioned list of reference frame update/use operations can also be used for error resilience purposes.

Receiver Feedback

The receiver can send information to the encoder about correctly received and decoded golden and alternate reference frames. The receiver can also send information about corrupted frames when they are detected in the bitstream. The encoder can use this information to encode the output bitstream in a way that allows the receiver to recover from the errors.

When the encoder gets information about corrupted frames, it should try to recover from the corruption either by encoding the corrupted blocks as intra blocks or by using correctly received alternate and/or golden reference frames as references. In constant bitrate mode, the recovery must obey the given bitrate constraints.
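The recovery choice described above can be sketched as simple bookkeeping on the sender side. The class, its method names and the use of integer frame ids are hypothetical; the sketch only shows the decision between predicting from an acknowledged reference and falling back to intra coding.

```python
class FeedbackTracker:
    """Tracks which long-term reference buffers the receiver has confirmed."""

    def __init__(self):
        # Frame id currently stored in each buffer (None until first update).
        self.buffer_frame = {"golden": None, "altref": None}
        self.acked = set()  # frame ids the receiver reported as decoded

    def on_buffer_update(self, buffer, frame_id):
        """Record that the encoder wrote frame_id into the given buffer."""
        self.buffer_frame[buffer] = frame_id

    def on_ack(self, frame_id):
        """Record a receiver acknowledgement for frame_id."""
        self.acked.add(frame_id)

    def recovery_plan(self):
        """Return ("inter", buffers) if any buffer holds an acknowledged
        frame, otherwise ("intra", [])."""
        safe = [name for name, frame_id in self.buffer_frame.items()
                if frame_id is not None and frame_id in self.acked]
        return ("inter", safe) if safe else ("intra", [])
```

When a corruption report arrives, the encoder would call `recovery_plan` and either restrict prediction to the returned buffers or intra-code the affected blocks.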

This feature has been designed to closely match the RTP payload format for VP8 video, defined in http://tools.ietf.org/html/draft-westin-payload-vp8.

Screencasting

In screencasting, a client in the RTC session shares their screen with the other participants, which is useful for seminars, video conferences, teaching, presentations and similar uses. Screencasting supports up to 2560 x 1600 resolution at 5 fps, at 100 kbps. Both the encoder and the decoder should support this resolution.
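For sizing purposes, the screencasting mode can be compared with 720p video using the same macroblock arithmetic. The helper and variable names below are illustrative.

```python
MB_SIZE = 16


def macroblocks_per_frame(width, height):
    """Macroblocks per frame, padding each dimension up to a multiple of 16."""
    return (((width + MB_SIZE - 1) // MB_SIZE)
            * ((height + MB_SIZE - 1) // MB_SIZE))


screen_mb_per_frame = macroblocks_per_frame(2560, 1600)
screen_mb_rate = screen_mb_per_frame * 5           # 5 fps

hd_mb_per_frame = macroblocks_per_frame(1280, 720)
hd_mb_rate = hd_mb_per_frame * 30                  # 30 fps
```

The screencast macroblock rate is lower than that of a single 720p@30fps stream, but each frame holds more than four times as many macroblocks, so the requirement mainly affects frame buffer size and maximum supported dimensions rather than raw throughput.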

Copyright 2010 - 2013
The WebM Project