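WebM Container Guidelines

Overview

WebM is a digital multimedia container file format promoted by the open-source WebM Project. It comprises a subset of the Matroska multimedia container format.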

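Goals

Short term:

  • Pick a container format that is ideal for VP8 and the open web.
  • Make it easy for web content providers to create and distribute VP8 video.

Long term: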

  • Foster the popularity of our open format so video can be enjoyed everywhere, with no effort by users.

Naming

Container format name WebM
File extension .webm
MIME type video/webm
Audio-only MIME type audio/webm
Video codec name VP8

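HTML5 Video Type Parameters

Video Codec

  • VP8: vp8.X, where the codec is vp8 and the bitstream is version X.
    • VP8 currently has only bitstream version 0.
    • Matches the fourcc value VP8X for applications that require a fourcc.

Audio Codec

  • Vorbis
    • The MIME type for audio-only files should be “audio/webm”.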

canPlayType Function

  • canPlayType('video/webm') should return “maybe”
  • canPlayType('audio/webm') should return “maybe”
  • canPlayType('video/webm; codecs="vp8, vorbis"') should return “probably”
  • canPlayType('video/webm; codecs="vp8.0, vorbis"') should return “probably”
  • canPlayType('audio/webm; codecs="vorbis"') should return “probably”
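
The expected return values above can be modeled as a small lookup. The following Python sketch is not a browser implementation: the function name can_play_type and its parsing rules are assumptions made for illustration, and it recognizes only the codec strings listed above.

```python
def can_play_type(mime: str) -> str:
    """Return "", "maybe", or "probably" for a MIME type string (illustrative)."""
    base, _, params = mime.partition(";")
    base = base.strip().lower()
    if base not in ("video/webm", "audio/webm"):
        return ""                      # not a WebM type at all
    params = params.strip()
    if not params:
        return "maybe"                 # container known, codecs unspecified
    key, _, value = params.partition("=")
    if key.strip().lower() != "codecs":
        return "maybe"
    codecs = [c.strip().lower() for c in value.strip().strip('"').split(",")]
    # Assumption: audio-only files carry Vorbis only; video files may carry both.
    allowed = {"vorbis"} if base == "audio/webm" else {"vp8", "vp8.0", "vorbis"}
    return "probably" if all(c in allowed for c in codecs) else ""
```

For example, can_play_type('video/webm; codecs="vp8.0, vorbis"') yields "probably", while an unrelated type such as 'video/mp4' yields the empty string.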

Smart Client

One of the major goals is to allow content creators to have advanced playback capabilities, such as fast seeking and fast start using only an HTTP server. To achieve this, the WebM file format guidelines below should be followed when creating content.

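WebM Guidelines

These guidelines currently target files streamed over an HTTP connection, and identify the areas where WebM is stricter than the more permissive Matroska specification (http://www.matroska.org/technical/specs/index.html).

Demuxer and Muxer Guidelines

  • The DocType element is “webm”.
  • The video codec is VP8.
    • The Codec ID is “V_VP8”.
    • VP8 has no CodecPrivate data.
  • The audio codec is Vorbis.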
  • Initial WebM release does not support subtitles.
    • WHATWG / W3C RFC will release guidance on subtitles and other overlays in HTML5 <video> in the near future. WebM intends to follow that guidance.
  • DocTypeReadVersion should follow the Matroska specification.
    • Example: Files with v2 elements should have a DocTypeReadVersion of 2.

Muxer Guidelines

Muxers should treat all guidelines marked “should” in this section as “must”. This will foster consistency across WebM files in the real world.

  • WebM should contain the SeekHead element.
    • Reason: Allows the client to know if the file contains a Cues element.
  • WebM files should include a keyframe-only Cues element.
    • The Cues element should contain only video key frames, to decrease the size of the file header.
    • It is recommended that the Cues element be before any clusters, so that the client can seek to a point in the data that has not yet been downloaded in a single seek operation. Ref: a tool that will put the Cues at the front.
  • All absolute (block + cluster) timecodes must be strictly increasing.
    • All timecodes are associated with the start time of the block.
  • The TimecodeScale element should be set to a default of 1,000,000 nanoseconds.
    • Reason: Allows every cluster to have blocks with positive values up to 32.767 seconds (the 16-bit signed block timecode tops out at 32,767 ms).
  • Key frames should be placed at the beginning of clusters.
    • Having key frames at the beginning of clusters should make seeking faster and easier for the client.
  • Audio blocks that contain the video key frame’s timecode should be in the same cluster as the video key frame block.
  • Audio blocks that have same absolute timecode as video blocks should be written before the video blocks.
  • WebM files must only support pixels for the DisplayUnit element.
  • VP8 frames should be muxed in the SimpleBlock element.
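
The timecode rules above lend themselves to a quick arithmetic check. The Python sketch below shows where the 32.767-second figure comes from and how a muxer might verify ordering. The helper names are hypothetical, and the sketch assumes the strictly-increasing check is applied to the blocks of a single track.

```python
TIMECODE_SCALE_NS = 1_000_000   # recommended default: one tick is 1 ms
INT16_MAX = 32_767              # largest positive relative block timecode

def max_block_offset_seconds(scale_ns=TIMECODE_SCALE_NS):
    """Largest positive offset of a block from its cluster, in seconds."""
    return INT16_MAX * scale_ns / 1_000_000_000

def absolute_timecodes(clusters):
    """clusters: list of (cluster_timecode, [relative block timecodes])."""
    return [cluster_tc + rel for cluster_tc, rels in clusters for rel in rels]

def strictly_increasing(values):
    """True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))
```

With the default scale, max_block_offset_seconds() evaluates to 32.767, so a cluster that spans more than about 32 seconds would need a larger TimecodeScale or must be split.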

VP8 Alternate Reference Frames

When enabled, the VP8 encoder will at its discretion inject a new frame — the alternate reference (AR) — into the output prior to the frame that depends on it. There will be at MOST 1 frame added between I/P-frames. The dependent frame (D) will always be a P-frame. The AR will be marked with the invisible flag by the codec SDK. This frame must be decoded before D, but will produce no output on its own.

To satisfy the monotonically-increasing requirement, the encoder will currently set the AR’s timestamp to 1 less than D’s. This means the time base given to the encoder at configure time must be granular enough to allow this, e.g., at least 2X framerate.

Ideally the AR’s timestamp should be as close as possible to frame D-1 to allow the decoder as much time as possible to decode AR before needing to display D.

Encode Example
Input    F0         F1
Output   I/P   AR   D
PTS      0     1    2

Decode Example
Input    I/P   AR   D
Output   F0         F1
PTS      0          2
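
The timestamp rule for alternate reference frames can be reproduced with a few lines of Python. This is only a model of the behavior described above, not the encoder's actual code: assign_pts and the packet labels are hypothetical, and the sketch assumes an AR packet never appears before the first visible frame.

```python
def assign_pts(packets, ticks_per_frame=2):
    """packets: list of "I/P", "AR", or "D" labels in encoder output order."""
    if ticks_per_frame < 2:
        # The time base must leave room for one tick between visible frames.
        raise ValueError("time base too coarse to fit an AR frame before D")
    pts, frame_index = [], 0
    for kind in packets:
        visible_pts = frame_index * ticks_per_frame
        if kind == "AR":
            pts.append(visible_pts - 1)   # one tick before the dependent frame
        else:
            pts.append(visible_pts)
            frame_index += 1
    return pts

def visible_pts(packets, pts):
    """Timestamps the decoder outputs: AR packets produce no frame."""
    return [p for kind, p in zip(packets, pts) if kind != "AR"]
```

Calling assign_pts(["I/P", "AR", "D"]) gives [0, 1, 2], and filtering out the AR packet leaves [0, 2], matching the encode and decode examples.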

Demuxer Guidelines

  • The demuxer must only open webm DocType files.
  • Once the demuxer deems the header and metadata of the file to be valid for a WebM file and the player starts playing the file, the demuxer must do all it can to parse the file, so playback can occur as correctly as possible.
  • Seeking will be disabled if the webm file does not have a key frame Cues element.
    • The project is considering support for seeking without a Cues element.

Current Implementation Details

At initial release, WebM supports a subset of the Matroska specification. Support for additional Matroska functionality will be under consideration as the project matures.

Following is a more detailed description of the currently supported elements, and the features still being evaluated.

EBML Basics

Priority Element Name Description
Supported EBML Set the EBML characteristics of the data to follow. Each EBML document has to start with this.
Supported EBMLVersion The version of EBML parser used to create the file.
Supported EBMLReadVersion The minimum EBML version a parser has to support to read this file.
Supported EBMLMaxIDLength The maximum length of the IDs you’ll find in this file (4 or less in Matroska).
Supported EBMLMaxSizeLength The maximum length of the sizes you’ll find in this file (8 or less in Matroska). This does not override the element size indicated at the beginning of an element. Elements that have an indicated size which is larger than what is allowed by EBMLMaxSizeLength shall be considered invalid.
Supported DocType A string that describes the type of document that follows this EBML header (‘webm’ in our case).
Supported DocTypeVersion The version of DocType interpreter used to create the file.
Supported DocTypeReadVersion The minimum DocType version an interpreter has to support to read this file.
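
To make the header elements concrete, here is a minimal Python sketch that reads the DocType string from the start of a file. The element IDs (0x1A45DFA3 for the EBML header, 0x4282 for DocType) and the variable-length integer layout come from the EBML and Matroska specifications rather than from this page, and the function names are hypothetical.

```python
EBML_ID = 0x1A45DFA3      # EBML header element (per the EBML specification)
DOCTYPE_ID = 0x4282       # DocType element

def read_vint(data: bytes, pos: int, keep_marker: bool):
    """Read an EBML variable-length integer; return (value, next_pos)."""
    first = data[pos]
    length, mask = 1, 0x80
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    if length > 8:
        raise ValueError("invalid EBML variable-length integer")
    # IDs keep the length marker bit; sizes drop it.
    value = first if keep_marker else first & (mask - 1)
    for b in data[pos + 1 : pos + length]:
        value = (value << 8) | b
    return value, pos + length

def read_doctype(data: bytes) -> str:
    """Return the DocType string found in the EBML header."""
    ident, pos = read_vint(data, 0, keep_marker=True)
    if ident != EBML_ID:
        raise ValueError("not an EBML document")
    size, pos = read_vint(data, pos, keep_marker=False)
    end = pos + size
    while pos < end:
        ident, pos = read_vint(data, pos, keep_marker=True)
        size, pos = read_vint(data, pos, keep_marker=False)
        if ident == DOCTYPE_ID:
            return data[pos : pos + size].decode("ascii")
        pos += size                      # skip elements we do not need
    raise ValueError("DocType element not found")
```

A demuxer following these guidelines would compare the returned string with "webm" before going any further.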

Global Elements (Used throughout the format)

Priority Element Name Description
Undecided CRC-32 The CRC is computed on all the data of the Master element it’s in, regardless of its position. It’s recommended to put the CRC value at the beginning of the Master element for easier reading. All level 1 elements should include a CRC-32.
Supported Void Used to void damaged data, to avoid unexpected behaviors when using damaged data. The content is discarded. Also used to reserve space in a sub-element for later use.
Signature Start
Undecided SignatureSlot Contain signature of some (coming) elements in the stream.
Undecided SignatureAlgo Signature algorithm used (1=RSA, 2=elliptic).
Undecided SignatureHash Hash algorithm used (1=SHA1-160, 2=MD5).
Undecided SignaturePublicKey The public key to use with the algorithm (in the case of a PKI-based signature).
Undecided Signature The signature of the data (until a new.
Undecided SignatureElements Contains elements that will be used to compute the signature.
Undecided SignatureElementList A list consists of a number of consecutive elements that represent one case where data is used in signature. Ex: Cluster
Undecided SignedElement An element ID whose data will be used to compute the signature.
Signature End

Segment

Priority Element Name Description
Supported Segment This element contains all other top-level (level 1) elements. Typically a Matroska file is composed of 1 segment.

Meta Seek Information

Priority Element Name Description
Supported SeekHead Contains the position of other level 1 elements.
Supported Seek Contains a single seek entry to an EBML element.
Supported SeekID The binary ID corresponding to the element name.
Supported SeekPosition The position of the element in the segment in octets (0 = first level 1 element).
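
Because SeekPosition is relative to the segment, a client has to add the offset of the segment's data to get a file position. The Python sketch below shows that step for the Cues element. The Cues ID (0x1C53BB6B) is taken from the Matroska specification, and the function name and the (seek_id, seek_position) pair layout are assumptions for illustration.

```python
CUES_ID = 0x1C53BB6B   # Cues element ID (per the Matroska specification)

def find_cues_offset(seek_entries, segment_data_offset):
    """Return the absolute file offset of Cues, or None if not listed.

    seek_entries: iterable of (seek_id, seek_position) pairs read from SeekHead.
    segment_data_offset: file offset of the first byte after the Segment
    element's ID and size fields.
    """
    for seek_id, seek_position in seek_entries:
        if seek_id == CUES_ID:
            return segment_data_offset + seek_position
    return None
```

A result of None tells the client that the SeekHead does not list a Cues element.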

Segment Information

Priority Element Name Description
Supported Info Contains miscellaneous general information and statistics on the file.
Undecided SegmentUID A randomly generated unique ID to identify the current segment between many others (128 bits).
Undecided SegmentFilename A filename corresponding to this segment.
Undecided PrevUID A unique ID to identify the previous chained segment (128 bits).
Undecided PrevFilename An escaped filename corresponding to the previous segment.
Undecided NextUID A unique ID to identify the next chained segment (128 bits).
Undecided NextFilename An escaped filename corresponding to the next segment.
Undecided SegmentFamily A randomly generated unique ID that all segments related to each other must use (128 bits).
Undecided ChapterTranslate A tuple of corresponding ID used by chapter codecs to represent this segment.
Undecided ChapterTranslateEditionUID Specify an edition UID on which this correspondence applies. When not specified, it means for all editions found in the segment.
Undecided ChapterTranslateCodec The chapter codec using this ID (0: Matroska Script, 1: DVD-menu).
Undecided ChapterTranslateID The binary value used to represent this segment in the chapter codec data. The format depends on the ChapProcessCodecID used.
Supported TimecodeScale Timecode scale in nanoseconds (1,000,000 means all timecodes in the segment are expressed in milliseconds).
Supported Duration Duration of the segment (based on TimecodeScale).
Supported DateUTC Date of the origin of timecode (value 0), i.e. production date.
Undecided Title General name of the segment.
Supported MuxingApp Muxing application or library (“libmatroska-0.4.3”).
Supported WritingApp Writing application (“mkvmerge-0.3.3”).

Cluster

Priority Element Name Description
Supported Cluster The lower level element containing the (monolithic) Block structure.
Supported Timecode Absolute timecode of the cluster (based on TimecodeScale).
Undecided SilentTracks The list of tracks that are not used in that part of the stream. It is useful when using overlay tracks on seeking. Then you should decide what track to use.
Undecided SilentTrackNumber One of the track numbers that are not used from now on in the stream. It could change later if not specified as silent in a further Cluster.
Undecided Position Position of the Cluster in the segment (0 in live broadcast streams). It might help to resynchronise offset on damaged streams.
Supported PrevSize Size of the previous Cluster, in octets. Can be useful for backward playing.
Supported BlockGroup Basic container of information containing a single Block or BlockVirtual, and information specific to that Block/VirtualBlock.
Supported Block Block containing the actual data to be rendered and a timecode relative to the Cluster Timecode.
Undecided BlockVirtual A Block with no data. It must be stored in the stream at the place the real Block should be in display order.
Undecided BlockAdditions Contain additional blocks to complete the main one. An EBML parser that has no knowledge of the Block structure could still see and use/skip these data.
Undecided BlockMore Contain the BlockAdditional and some parameters.
Undecided BlockAddID An ID to identify the BlockAdditional level.
Undecided BlockAdditional Interpreted by the codec as it wishes (using the BlockAddID).
Supported BlockDuration The duration of the Block (based on TimecodeScale). This element is mandatory when DefaultDuration is set for the track. When not written and with no DefaultDuration, the value is assumed to be the difference between the timecode of this Block and the timecode of the next Block in “display” order (not coding order). This element can be useful at the end of a Track (as there is not other Block available), or when there is a break in a track like for subtitle tracks.
Undecided ReferencePriority This frame is referenced and has the specified cache priority. In cache only a frame of the same or higher priority can replace this frame. A value of 0 means the frame is not referenced.
Supported ReferenceBlock Timecode of another frame used as a reference (ie: B or P frame). The timecode is relative to the block it’s attached to.
Undecided ReferenceVirtual Relative position of the data that should be in position of the virtual block.
Undecided CodecState The new codec state to use. Data interpretation is private to the codec. This information should always be referenced by a seek entry.
Undecided Slices Contains slices description.
Undecided TimeSlice Contains extra time information about the data contained in the Block. While there are a few files in the wild with this element, it is no longer in use and has been deprecated. Being able to interpret this element is not required for playback.
Supported LaceNumber The reverse number of the frame in the lace (0 is the last frame, 1 is the next to last, etc). While there are a few files in the wild with this element, it is no longer in use and has been deprecated. Being able to interpret this element is not required for playback.
Undecided FrameNumber The number of the frame to generate from this lace with this delay (allow you to generate many frames from the same Block/Frame).
Undecided BlockAdditionID The ID of the BlockAdditional element (0 is the main Block).
Undecided Delay The (scaled) delay to apply to the element.
Undecided Duration The (scaled) duration to apply to the element.
Supported SimpleBlock Similar to Block but without all the extra information, mostly used to reduce overhead when no extra feature is needed.
Undecided EncryptedBlock Similar to SimpleBlock but the data inside the Block are Transformed (encrypt and/or signed).
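
Since VP8 frames are expected in SimpleBlock elements, a short Python sketch of reading a SimpleBlock payload may help. The byte layout (track number, signed 16-bit relative timecode, flags byte with 0x80 for keyframe and 0x08 for invisible) follows the Matroska specification, not this page. parse_simple_block is a hypothetical name, and the sketch handles only one-byte track numbers and ignores lacing.

```python
import struct

def parse_simple_block(payload: bytes, cluster_timecode: int) -> dict:
    """Decode the header of a SimpleBlock payload (simplified)."""
    first = payload[0]
    if not first & 0x80:
        raise ValueError("only 1-byte track numbers are handled in this sketch")
    track = first & 0x7F
    # The block timecode is a big-endian signed 16-bit offset from the cluster.
    (relative,) = struct.unpack(">h", payload[1:3])
    flags = payload[3]
    return {
        "track": track,
        "timecode": cluster_timecode + relative,
        "keyframe": bool(flags & 0x80),
        "invisible": bool(flags & 0x08),   # set on alternate reference frames
        "frame": payload[4:],
    }
```

The "invisible" flag here is the one the alternate reference frame section refers to: such a block must be decoded but produces no output.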

Track

Priority Element Name Description
Supported Tracks A top-level block of information with many tracks described.
Supported TrackEntry Describes a track with all elements.
Supported TrackNumber The track number as used in the Block Header (using more than 127 tracks is not encouraged, though the design allows an unlimited number).
Supported TrackUID A unique ID to identify the Track. This should be kept the same when making a direct stream copy of the Track to another file.
Supported TrackType A set of track types coded on 8 bits (1: video, 2: audio, 3: complex, 0x10: logo, 0x11: subtitle, 0x12: buttons, 0x20: control).
Supported FlagEnabled Set if the track is used.
Supported FlagDefault Set if that track (audio, video or subs) SHOULD be used if no language found matches the user preference.
Supported FlagForced Set if that track MUST be used during playback. There can be many forced tracks for a kind (audio, video or subs); the player should select the one whose language matches the user preference or the default + forced track. Overlay MAY happen between a forced and non-forced track of the same kind.
Supported FlagLacing Set if the track may contain blocks using lacing.
Undecided MinCache The minimum number of frames a player should be able to cache during playback. If set to 0, the reference pseudo-cache system is not used.
Undecided MaxCache The maximum cache size required to store referenced frames and the current frame. 0 means no cache is needed.
Supported DefaultDuration Number of nanoseconds (i.e. not scaled) per frame.
Undecided TrackTimecodeScale The scale to apply on this track to work at normal speed in relation with other tracks (mostly used to adjust video speed when the audio length differs).
TrackOffset
Undecided MaxBlockAdditionID The maximum value of BlockAddID. A value 0 means there is no BlockAdditions for this track.
Supported Name A human-readable track name.
Supported Language Specifies the language of the track in the Matroska languages form.
Supported CodecID An ID corresponding to the codec, see the codec page for more info.
Supported CodecPrivate Private data only known to the codec.
Supported CodecName A human-readable string specifying the codec.
Undecided AttachmentLink The UID of an attachment that is used by this codec.
Undecided CodecSettings A string describing the encoding setting used.
Undecided CodecInfoURL A URL to find information about the codec used.
Undecided CodecDownloadURL A URL to download the codec used.
Undecided CodecDecodeAll The codec can decode potentially damaged data.
Undecided TrackOverlay Specify that this track is an overlay track for the Track specified (in the u-integer). That means when this track has a gap (see SilentTracks) the overlay track should be used instead. The order of multiple TrackOverlay matters, the first one is the one that should be used. If not found it should be the second, etc.
Undecided TrackTranslate The track identification for the given Chapter Codec.
Undecided TrackTranslateEditionUID Specify an edition UID on which this translation applies. When not specified, it means for all editions found in the segment.
Undecided TrackTranslateCodec The chapter codec using this ID (0: Matroska Script, 1: DVD-menu).
Undecided TrackTranslateTrackID The binary value used to represent this track in the chapter codec data. The format depends on the ChapProcessCodecID used.
Video Start
Supported Video Video settings.
Supported FlagInterlaced Set if the video is interlaced.
Undecided StereoMode Stereo-3D video mode on 2 bits (0: mono, 1: right eye, 2: left eye, 3: both eyes).
Supported PixelWidth Width of the encoded video frames in pixels.
Supported PixelHeight Height of the encoded video frames in pixels.
Supported PixelCropBottom The number of video pixels to remove at the bottom of the image (for HDTV content).
Supported PixelCropTop The number of video pixels to remove at the top of the image.
Supported PixelCropLeft The number of video pixels to remove on the left of the image.
Supported PixelCropRight The number of video pixels to remove on the right of the image.
Supported DisplayWidth Width of the video frames to display.
Supported DisplayHeight Height of the video frames to display.
Supported DisplayUnit Type of the unit for DisplayWidth/Height (0: pixels, 1: centimeters, 2: inches). Pixels only supported.
Supported AspectRatioType Specify the possible modifications to the aspect ratio (0: free resizing, 1: keep aspect ratio, 2: fixed).
Undecided ColourSpace Same value as in AVI (32 bits).
Undecided GammaValue Gamma Value.
Supported FrameRate Number of frames per second. Informational only.
Video End
Audio Start
Supported Audio Audio settings.
Supported SamplingFrequency Sampling frequency in Hz.
Supported OutputSamplingFrequency Real output sampling frequency in Hz (used for SBR techniques).
Supported Channels Number of channels in the track.
Undecided ChannelPositions Table of horizontal angles for each successive channel, see appendix.
Supported BitDepth Bits per sample, mostly used for PCM.
Audio End
Content Encoding Start
Undecided ContentEncodings Settings for several content encoding mechanisms like compression or encryption.
Undecided ContentEncoding Settings for one content encoding like compression or encryption.
Undecided ContentEncodingOrder Tells when this modification was used during encoding/muxing starting with 0 and counting upwards. The decoder/demuxer has to start with the highest order number it finds and work its way down. This value has to be unique over all ContentEncodingOrder elements in the segment.
Undecided ContentEncodingScope A bit field that describes which elements have been modified in this way. Values (big endian) can be OR’ed. Possible values: 1 – all frame contents; 2 – the track’s private data; 4 – the next ContentEncoding (next ContentEncodingOrder; either the data inside ContentCompression and/or ContentEncryption).
Undecided ContentEncodingType A value describing what kind of transformation has been done. Possible values: 0 – compression; 1 – encryption.
Undecided ContentCompression Settings describing the compression used. Must be present if the value of ContentEncodingType is 0 and absent otherwise. Each block must be decompressible even if no previous block is available, so that seeking is not prevented.
Undecided ContentCompAlgo The compression algorithm used. Algorithms that have been specified so far are: 0 – zlib; 1 – bzlib; 2 – lzo1x; 3 – Header Stripping.
Undecided ContentCompSettings Settings that might be needed by the decompressor. For Header Stripping (ContentCompAlgo=3), the bytes that were removed from the beginning of each frame of the track.
Undecided ContentEncryption Settings describing the encryption used. Must be present if the value of ContentEncodingType is 1 and absent otherwise.
Undecided ContentEncAlgo The encryption algorithm used. The value ‘0’ means that the contents have not been encrypted but only signed. Predefined values: 1 – DES; 2 – 3DES; 3 – Twofish; 4 – Blowfish; 5 – AES.
Undecided ContentEncKeyID For public key algorithms this is the ID of the public key the data was encrypted with.
Undecided ContentSignature A cryptographic signature of the contents.
Undecided ContentSigKeyID This is the ID of the private key the data was signed with.
Undecided ContentSigAlgo The algorithm used for the signature. A value of ‘0’ means that the contents have not been signed but only encrypted. Predefined values: 1 – RSA
Undecided ContentSigHashAlgo The hash algorithm used for the signature. A value of ‘0’ means that the contents have not been signed but only encrypted. Predefined values: 1 – SHA1-160; 2 – MD5.
Content Encoding End

Cueing Data

Priority Element Name Description
Supported Cues A top-level element to speed seeking access. All entries are local to the segment.
Supported CuePoint Contains all information relative to a seek point in the segment.
Supported CueTime Absolute timecode according to the segment time base.
Supported CueTrackPositions Contain positions for different tracks corresponding to the timecode.
Supported CueTrack The track for which a position is given.
Supported CueClusterPosition The position of the Cluster containing the required Block.
Supported CueBlockNumber Number of the Block in the specified Cluster.
Undecided CueCodecState The position of the Codec State corresponding to this Cue element. 0 means that the data is taken from the initial Track Entry.
Undecided CueReference The Clusters containing the required referenced Blocks.
Undecided CueRefTime Timecode of the referenced Block.
Undecided CueRefCluster Position of the Cluster containing the referenced Block.
Undecided CueRefNumber Number of the referenced Block of Track X in the specified Cluster.
Undecided CueRefCodecState The position of the Codec State corresponding to this referenced element. 0 means that the data is taken from the initial Track Entry.
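
A client that has parsed the Cues element can pick a seek target with a binary search over CueTime. The Python sketch below assumes the cue points have already been read into a list of (cue_time, cluster_position) pairs sorted by time. That layout and the name find_cue are assumptions for illustration.

```python
from bisect import bisect_right

def find_cue(cues, target_time):
    """Return the last cue at or before target_time, or the first cue if
    target_time precedes all of them. Returns None for an empty list.

    cues: list of (cue_time, cluster_position) pairs sorted by cue_time.
    """
    if not cues:
        return None
    times = [cue_time for cue_time, _ in cues]
    index = bisect_right(times, target_time) - 1
    return cues[max(index, 0)]
```

Because the guidelines recommend keyframe-only cues, the returned cluster position points at a cluster the decoder can start from directly.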

Attachment

Priority Element Name Description
Undecided Attachments Contain attached files.
Undecided AttachedFile An attached file.
Undecided FileDescription A human-friendly name for the attached file.
Undecided FileName Filename of the attached file.
Undecided FileMimeType MIME type of the file.
Undecided FileData The data of the file.
Undecided FileUID Unique ID representing the file, as random as possible.
Undecided FileReferral A binary value that a track/codec can refer to when the attachment is needed.

Chapters

Priority Element Name Description
Undecided Chapters A system to define basic menus and partition data. For more detailed information, look at the Chapters Explanation.
Undecided EditionEntry Contains all information about a segment edition.
Undecided EditionUID A unique ID to identify the edition. It’s useful for tagging an edition.
Undecided EditionFlagHidden If an edition is hidden (1), it should not be available to the user interface (but still to Control Tracks).
Undecided EditionFlagDefault If a flag is set (1) the edition should be used as the default one.
Undecided EditionFlagOrdered Specify if the chapters can be defined multiple times and the order to play them is enforced.
Undecided ChapterAtom Contains the atom information to use as the chapter atom (apply to all tracks).
Undecided ChapterUID A unique ID to identify the Chapter.
Undecided ChapterTimeStart Timecode of the start of Chapter (not scaled).
Undecided ChapterTimeEnd Timecode of the end of Chapter (timecode excluded, not scaled).
Undecided ChapterFlagHidden If a chapter is hidden (1), it should not be available to the user interface (but still to Control Tracks).
Undecided ChapterFlagEnabled Specify whether the chapter is enabled. It can be enabled/disabled by a Control Track. When disabled, the movie should skip all the content between the TimeStart and TimeEnd of this chapter.
Undecided ChapterSegmentUID A segment to play in place of this chapter. Edition ChapterSegmentEditionUID should be used for this segment, otherwise no edition is used.
Undecided ChapterSegmentEditionUID The edition to play from the segment linked in ChapterSegmentUID.
Undecided ChapterPhysicalEquiv Specify the physical equivalent of this ChapterAtom like “DVD” (60) or “SIDE” (50), see complete list of values.
Undecided ChapterTrack List of tracks on which the chapter applies. If this element is not present, all tracks apply.
Undecided ChapterTrackNumber UID of the Track to apply this chapter to. In the absence of a control track, choosing this chapter will select the listed Tracks and deselect unlisted tracks. Absence of this element indicates that the Chapter should be applied to any currently used Tracks.
Undecided ChapterDisplay Contains all possible strings to use for the chapter display.
Undecided ChapString Contains the string to use as the chapter atom.
Undecided ChapLanguage The languages corresponding to the string, in the bibliographic ISO-639-2 form.
Undecided ChapCountry The countries corresponding to the string, same 2 octets as in Internet domains.
Undecided ChapProcess Contains all the commands associated to the Atom.
Undecided ChapProcessCodecID Contains the type of the codec used for the processing. A value of 0 means native Matroska processing (to be defined), a value of 1 means the DVD command set is used. More codec IDs can be added later.
Undecided ChapProcessPrivate Some optional data attached to the ChapProcessCodecID information. For ChapProcessCodecID = 1, it is the “DVD level” equivalent.
Undecided ChapProcessCommand Contains all the commands associated to the Atom.
Undecided ChapProcessTime Defines when the process command should be handled (0: during the whole chapter, 1: before starting playback, 2: after playback of the chapter).
Undecided ChapProcessData Contains the command information. The data should be interpreted depending on the ChapProcessCodecID value. For ChapProcessCodecID = 1, the data correspond to the binary DVD cell pre/post commands.

Tagging

Priority Element Name Description
Undecided Tags Element containing elements specific to Tracks/Chapters. A list of valid tags can be found here.
Undecided Tag Element containing elements specific to Tracks/Chapters.
Undecided Targets Contains all UIDs to which the specified metadata applies. It is left empty to describe everything in the segment.
Undecided TargetTypeValue A number to indicate the logical level of the target (see TargetType).
Undecided TargetType An informational string that can be used to display the logical level of the target like “ALBUM”, “TRACK”, “MOVIE”, “CHAPTER”, etc (see TargetType).
Undecided TrackUID A unique ID to identify the Track(s) the tags belong to. If the value is 0 at this level, the tags apply to all tracks in the Segment.
Undecided EditionUID A unique ID to identify the EditionEntry(s) the tags belong to. If the value is 0 at this level, the tags apply to all editions in the Segment.
Undecided ChapterUID A unique ID to identify the Chapter(s) the tags belong to. If the value is 0 at this level, the tags apply to all chapters in the Segment.
Undecided AttachmentUID A unique ID to identify the Attachment(s) the tags belong to. If the value is 0 at this level, the tags apply to all the attachments in the Segment.
Undecided SimpleTag Contains general information about the target.
Undecided TagName The name of the Tag that is going to be stored.
Undecided TagLanguage Specifies the language of the tag specified, in the Matroska languages form.
Undecided TagDefault Indication to know if this is the default/original language to use for the given tag.
Undecided TagString The value of the Tag.
Undecided TagBinary The values of the Tag if it is binary. Note that this cannot be used in the same SimpleTag as TagString.
