Last modified: 2016-09-19
Author: Frank Galligan
Define a mechanism for supporting AES encryption in the WebM video container specification.
There is a W3C proposal to add extensions for encrypted media. In order for WebM to be supported, it requires a system-independent way of encrypting the files.
Matroska has support for encrypting certain elements with AES (ContentEncryption element), but does not define how they are encrypted.
AES: Advanced Encryption Standard.
Block cipher: An encryption algorithm that works on fixed-length blocks of data.
Counter Block: The block used to generate the key stream with AES-CTR.
CTR: A mode of AES encryption that uses Counter Blocks to generate a key stream that is then XORed with the plaintext to produce the ciphertext.
IV (Initialization Vector): A non-secret, fixed-size auxiliary input to cryptographic algorithms, used to prevent certain classes of attacks.
Live: Media that is captured and sent to users at a specific time.
CENC: MPEG Common Encryption (ISO/IEC 23001-7).
VOD: Video on demand. Previously recorded media files that are watched when a user decides to watch them.
In this use case, a content distributor wants to serve protected content to users. The users want to watch the encrypted content, while also seeking to other times within the media.
In this use case, the user wants to play back the encrypted content from local storage.
In this use case, encrypted frames may arrive at a client out of order. The client may want to decrypt the frames as soon as they arrive. An example of this use case is WebRTC, which decodes out-of-order video frames.
3.1.1 Use the smallest possible number of encryption parameter combinations, ideally one.
3.1.2 Add as little overhead to the stream data as possible.
3.1.3 Support seeking within VOD files.
3.1.4 Minimize added latency after a seek.
3.1.5 Support live streaming.
3.1.6 Strive for compatibility with CENC.
3.1.7 Lowest possible startup latency.
Having one common encryption for WebM benefits both the delivery side and client consumption.
The WebM common encryption algorithm is AES. The key size is 128 bits. Information on how the blocks are encrypted is stored in the Track element and interleaved with the Block’s data.
A master element named ContentEncAESSettings is added as a sub-element of the ContentEncryption element. It contains elements representing the features of AES and currently has one sub-element, AESSettingsCipherMode, which conveys the block cipher mode used with the AES encryption. AESSettingsCipherMode has one defined value, CTR.
Element Name | L | ID | D | T | Description |
---|---|---|---|---|---|
ContentEncryption | 5 | [50][35] | - | m | Settings describing the encryption used. MUST be present if the value of ContentEncodingType is 1 and absent otherwise. |
ContentEncAESSettings | 6 | [47][E7] | - | m | Settings describing the encryption algorithm used. If ContentEncAlgo != 5 this MUST be absent. |
AESSettingsCipherMode | 7 | [47][E8] | 1 | u | The cipher mode used in the encryption. Predefined values: 1 - CTR |
With these new elements, clients should be able to decrypt frames encrypted with AES.
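The presence rules in the element table can be checked mechanically once the Track header has been parsed. The following is a minimal sketch, assuming the parsed ContentEncoding is available as a plain dict; the dict keys and the function name `validate_encoding` are hypothetical and not part of any WebM library.

```python
# Element IDs and values taken from the element table in this document.
ELEMENT_IDS = {
    "ContentEncryption": 0x5035,
    "ContentEncAESSettings": 0x47E7,
    "AESSettingsCipherMode": 0x47E8,
}
AES_ALGO = 5  # ContentEncAlgo value for AES
CTR_MODE = 1  # AESSettingsCipherMode value for CTR


def validate_encoding(enc: dict) -> list:
    """Return a list of rule violations for one parsed ContentEncoding.

    `enc` is a hypothetical dict {"type": int, "encryption": dict or None};
    the encryption dict may hold "algo", "key_id" and "aes_settings".
    """
    errors = []
    encryption = enc.get("encryption")
    if enc.get("type") == 1 and encryption is None:
        errors.append("ContentEncryption MUST be present when ContentEncodingType is 1")
    if enc.get("type") != 1 and encryption is not None:
        errors.append("ContentEncryption MUST be absent when ContentEncodingType is not 1")
    if encryption is not None:
        aes = encryption.get("aes_settings")
        if encryption.get("algo") != AES_ALGO and aes is not None:
            errors.append("ContentEncAESSettings MUST be absent when ContentEncAlgo != 5")
        if aes is not None and aes.get("cipher_mode") != CTR_MODE:
            errors.append("AESSettingsCipherMode only defines the value 1 (CTR)")
    return errors
```

An empty result means the settings are consistent with the rules stated in the table; it says nothing about whether the key itself is available.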
The following Matroska elements and values are added to the WebM specification.
ContentEncryption
ContentEncAlgo (Supported AES value = 5)
ContentEncKeyID
ContentEncAESSettings
AESSettingsCipherMode (Supported CTR value = 1)

The payload of unencrypted Blocks is comprised of two parts. The first part is the Signal Byte. The last part is frame data.
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Signal Byte  |                                               |
    +-+-+-+-+-+-+-+-+                                               |
    :               Bytes 1..N of unencrypted frame                 :
    |                                                               |
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The payload of a Full-sample Encrypted Block is comprised of three parts. The first part is the Signal Byte. The second part is the IV. The last part of an Encrypted Block payload is frame data. The only part of the Block that is encrypted is the frame data.
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Signal Byte  |                                               |
    +-+-+-+-+-+-+-+-+             IV                                |
    |                                                               |
    |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |               |                                               |
    |-+-+-+-+-+-+-+-+                                               |
    :               Bytes 1..N of encrypted frame                   :
    |                                                               |
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
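The two payload layouts above can be told apart from the Signal Byte alone. The following is a minimal sketch of a reader for unencrypted and Full-sample Encrypted payloads. It assumes the E bit is the least significant bit of the Signal Byte (bit 7 in the Signal Byte diagram) and that the IV is 8 bytes long; the names `BlockPayload` and `parse_block_payload` are hypothetical. Payloads with the P bit set carry extra header fields and are not handled here.

```python
from typing import NamedTuple, Optional

ENCRYPTED_BIT = 0x01  # E bit; assumed to be the least significant bit
IV_SIZE = 8           # the IV is a 64-bit value stored as raw bytes


class BlockPayload(NamedTuple):
    encrypted: bool
    iv: Optional[bytes]  # None for unencrypted Blocks
    frame: bytes         # ciphertext if encrypted, plaintext otherwise


def parse_block_payload(payload: bytes) -> BlockPayload:
    """Split a Block payload into Signal Byte flags, IV and frame data."""
    if not payload:
        raise ValueError("payload is missing the Signal Byte")
    signal = payload[0]
    if not signal & ENCRYPTED_BIT:
        # Unencrypted: the frame immediately follows the Signal Byte.
        return BlockPayload(False, None, payload[1:])
    if len(payload) < 1 + IV_SIZE:
        raise ValueError("encrypted payload is too short to hold an IV")
    # Encrypted: Signal Byte, then IV, then the encrypted frame.
    return BlockPayload(True, payload[1:1 + IV_SIZE], payload[1 + IV_SIZE:])
```

Only the frame bytes returned for an encrypted Block need to be passed to the decryptor; the Signal Byte and IV are stored in the clear.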
The Subsample Encrypted Block format extends the Full-sample format by setting a "partitioned" (P) bit in the Signal Byte. If this bit is set, the EncryptedBlock header shall include an 8-bit integer indicating the number of sample partitions (dividers between clear/encrypted sections), and a series of 32-bit integers in big-endian encoding indicating the byte offsets of such partitions.
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Signal Byte  |                                               |
    +-+-+-+-+-+-+-+-+             IV                                |
    |                                                               |
    |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |               | num_partition |     Partition 0 offset ->     |
    |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
    |     -> Partition 0 offset     |              ...              |
    |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
    |              ...              |    Partition n-1 offset ->    |
    |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
    |    -> Partition n-1 offset    |                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
    |                  Clear/encrypted sample data                  |
    |                                                               |
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The samples shall be partitioned into alternating clear and encrypted sections, always starting with a clear section. Generally for n clear/encrypted sections there shall be n-1 partition offsets. However, if it is required that the first section be encrypted, then the first partition shall be at byte offset 0 (indicating a zero-size clear section), and there shall be n partition offsets.
Please refer to the "Sample Encryption" description of the "Common Encryption" section of the VP Codec ISO Media File Format Binding Specification for more detail on how subsample encryption is implemented.
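The partitioning rule can be sketched as follows. This is an illustration only: it assumes the partition offsets are measured from the start of the sample data, and the function name `parse_partitions` and its return shape are hypothetical.

```python
import struct


def parse_partitions(header: bytes, sample_size: int) -> list:
    """Turn partition offsets into (start, end, is_encrypted) sections.

    `header` starts at the num_partitions byte, i.e. right after the IV.
    Sections alternate clear/encrypted and always start with a clear one.
    """
    num_partitions = header[0]
    # Offsets are 32-bit unsigned integers in big-endian encoding.
    offsets = list(struct.unpack_from(f">{num_partitions}I", header, 1))

    previous = 0
    for offset in offsets:
        if offset < previous or offset > sample_size:
            raise ValueError("partition offsets must be ordered and in range")
        previous = offset

    bounds = [0] + offsets + [sample_size]
    sections = []
    for index in range(len(bounds) - 1):
        is_encrypted = index % 2 == 1  # even sections clear, odd encrypted
        sections.append((bounds[index], bounds[index + 1], is_encrypted))
    return sections
```

With offsets 10 and 30 in a 50-byte sample this yields a clear section, an encrypted section, and a trailing clear section. A single offset of 0 yields a zero-size clear section followed by one encrypted section covering the whole sample, which is the case described above where the first section must be encrypted.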
     0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+
    |X|   RSV   |P|E|
    +-+-+-+-+-+-+-+-+
Partitioned bit (P): If set, the IV is followed by a num_partitions byte, and num_partitions * 32-bit partition offsets. This bit can only be set if the E bit is also set.

The IV MUST be unique for every frame for a given key. The IV SHOULD start with a random value on the first encrypted frame.
The IV MUST be increased by 1 for every encrypted frame. The IV MUST be stored as a raw stream of bytes. When incrementing, the IV should be treated as an unsigned 64-bit number, i.e., if the IV value of the current encrypted frame is 0xFFFFFFFFFFFFFFFF, then the IV value of the next encrypted frame should be 0.
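The IV rules can be illustrated with a short sketch. It assumes the 8 raw IV bytes are interpreted in big-endian order when incrementing; the function names `first_iv` and `next_iv` are hypothetical.

```python
import secrets

IV_SIZE = 8  # 64-bit IV stored as raw bytes


def first_iv() -> bytes:
    """Pick a random starting IV for the first encrypted frame."""
    return secrets.token_bytes(IV_SIZE)


def next_iv(iv: bytes) -> bytes:
    """Return the IV for the next encrypted frame.

    The IV is incremented as an unsigned 64-bit number, so
    0xFFFFFFFFFFFFFFFF wraps around to 0.
    """
    if len(iv) != IV_SIZE:
        raise ValueError("IV must be exactly 8 bytes")
    value = (int.from_bytes(iv, "big") + 1) % (1 << 64)
    return value.to_bytes(IV_SIZE, "big")
```

Because the IV wraps, a muxer that could write more than 2^64 encrypted frames under one key would eventually repeat an IV; in practice the key would be rotated long before that point.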
The Counter Block Format generation is only valid if the stream has ContentEncAlgo=5 and AESSettingsCipherMode=1. If the stream has any values that differ from these, Counter Block Format generation MUST NOT be used.
Every encrypted frame MUST reinitialize the decryptor with a unique Counter Block. Each Counter Block MUST be unique within the same stream for the same encryption key. All Counter Blocks MUST be 16 bytes.
The most significant 8 bytes of the Counter Block are the IV, which is set from the IV data in the encrypted Block. The least significant 8 bytes are the Block Counter, which is initialized to 0.
After encrypting a frame there may be excess key stream data. This data MUST be discarded before the next frame is encrypted.
IV = 0xFFFFFFFFFFFFFFFE
Block Counter = 0x0000000000000000
Counter Block = 0xFFFFFFFFFFFFFFFE0000000000000000
IV = 0xFFFFFFFFFFFFFFFF
Block Counter = 0x0000000000000000
Counter Block = 0xFFFFFFFFFFFFFFFF0000000000000000
IV = 0x0000000000000000
Block Counter = 0x0000000000000000
Counter Block = 0x00000000000000000000000000000000
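The Counter Block values listed above can be reproduced with a few lines of code. This is a minimal sketch with hypothetical function names. It also assumes that, within a frame, the Block Counter increases by 1 for each 16-byte AES block as a big-endian integer, which is the usual CTR convention; the text above only states that the counter starts at 0.

```python
AES_BLOCK_SIZE = 16


def counter_block(iv: bytes, block_counter: int = 0) -> bytes:
    """Build a 16-byte Counter Block: 8-byte IV followed by the Block Counter."""
    if len(iv) != 8:
        raise ValueError("IV must be exactly 8 bytes")
    if not 0 <= block_counter < (1 << 64):
        raise ValueError("Block Counter must fit in 64 bits")
    return iv + block_counter.to_bytes(8, "big")


def counter_blocks_for_frame(iv: bytes, frame_size: int) -> list:
    """List the Counter Blocks needed to cover a frame of `frame_size` bytes.

    The last block may produce more key stream than the frame needs;
    the excess is discarded rather than carried over to the next frame.
    """
    blocks_needed = (frame_size + AES_BLOCK_SIZE - 1) // AES_BLOCK_SIZE
    return [counter_block(iv, i) for i in range(blocks_needed)]
```

Each frame starts again from Block Counter 0 with its own IV, which is why frames can be decrypted independently and out of order.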
Acquiring decryption keys may take longer than some clients deem acceptable. To speed up startup, it is recommended to create Tracks in which the first several frames are unencrypted.
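On the muxing side, this recommendation amounts to writing a Signal Byte of 0 for a chosen number of leading frames. The following is a minimal sketch under stated assumptions: `mux_payloads`, `clear_lead` and the `encrypt` callback are hypothetical names, the E bit is assumed to be the least significant bit of the Signal Byte, and the AES-CTR operation itself is left to the caller.

```python
from typing import Callable, Iterable, Iterator


def mux_payloads(
    frames: Iterable[bytes],
    clear_lead: int,
    first_iv: int,
    encrypt: Callable[[bytes, bytes], bytes],
) -> Iterator[bytes]:
    """Yield Block payloads, leaving the first `clear_lead` frames unencrypted.

    `encrypt(iv, frame)` is a caller-supplied AES-CTR function (hypothetical).
    """
    iv = first_iv
    for index, frame in enumerate(frames):
        if index < clear_lead:
            # Signal Byte 0x00: no IV, frame stored in the clear.
            yield b"\x00" + frame
        else:
            iv_bytes = iv.to_bytes(8, "big")
            # Signal Byte 0x01: IV followed by the encrypted frame.
            yield b"\x01" + iv_bytes + encrypt(iv_bytes, frame)
            iv = (iv + 1) % (1 << 64)  # unsigned 64-bit wraparound
```

The IV only advances for encrypted frames, so the unencrypted lead does not consume IV values. How many frames to leave in the clear is a trade-off between startup latency and how much content is exposed.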
Lacing is not supported.
Version | Comment |
---|---|
1.1 | Add subsample encrypted block and partitioning scheme. |
1.0 | Initial public release. |
0.5 | Changed storing of IV values to be a raw stream of bytes. |
0.4 | Removed HMAC. |
0.3 | Frames may be encrypted or unencrypted. Adding signal byte to every frame. Adding Use Cases. |
0.2 | Changing IV prepended to every frame. |
0.1 | First released revision. All frames encrypted. HMAC prepended to every frame. IV derived from Block timestamp. |