As a case study we'll consider the playback of a video through the <video> tag.
Playback for <video> (and <audio>) starts in blink::HTMLMediaElement in third_party/blink/ and reaches media::WebMediaPlayerImpl in third_party/blink/public/platform/media/ after a brief hop through content::MediaFactory. Each blink::HTMLMediaElement owns a media::WebMediaPlayerImpl for handling play, pause, seeks, and volume changes (among other things).
media::WebMediaPlayerImpl handles or delegates media loading over the network as well as demuxer and pipeline initialization. media::WebMediaPlayerImpl owns a media::PipelineController which manages the coordination of a media::DataSource, media::Demuxer, and media::Renderer during playback.
During a normal playback, the media::Demuxer owned by WebMediaPlayerImpl may be either media::FFmpegDemuxer or media::ChunkDemuxer. The FFmpeg variant is used for standard src= playback, where WebMediaPlayerImpl is responsible for loading bytes over the network. media::ChunkDemuxer is used with Media Source Extensions (MSE), where JavaScript code provides the muxed bytes.
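The choice between the two demuxers can be pictured with a small standalone sketch. Everything below (LoadType, Demuxer, FileDemuxer, ChunkedDemuxer, CreateDemuxer) is a hypothetical stand-in written for illustration. None of it is Chromium's real class layout or signatures; it only mirrors the src= versus MSE split described above.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; not Chromium's real classes.
enum class LoadType { kUrl, kMediaSource };

class Demuxer {
 public:
  virtual ~Demuxer() = default;
  virtual std::string Name() const = 0;
};

// Plays the role of the FFmpeg-backed demuxer: the player itself loads
// bytes over the network (standard src= playback).
class FileDemuxer : public Demuxer {
 public:
  std::string Name() const override { return "FFmpegDemuxer"; }
};

// Plays the role of the MSE demuxer: JavaScript appends the muxed bytes.
class ChunkedDemuxer : public Demuxer {
 public:
  std::string Name() const override { return "ChunkDemuxer"; }
};

// Picks a demuxer based on how the media was attached to the element.
std::unique_ptr<Demuxer> CreateDemuxer(LoadType type) {
  if (type == LoadType::kMediaSource)
    return std::make_unique<ChunkedDemuxer>();
  return std::make_unique<FileDemuxer>();
}
```

Calling CreateDemuxer(LoadType::kMediaSource) in this model yields the chunk-based variant; any other load type falls back to the FFmpeg-style one.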
The media::Renderer is typically media::RendererImpl, which owns and coordinates media::AudioRenderer and media::VideoRenderer instances. Each of these in turn owns a set of media::AudioDecoder and media::VideoDecoder implementations. Each renderer issues an async read to a media::DemuxerStream exposed by the media::Demuxer, which is routed to the right decoder by media::DecoderStream. Decoding is again async, so decoded frames are delivered at some later time to each renderer.
The media/ library contains hardware decoder implementations in media/gpu for all supported Chromium platforms, as well as software decoding implementations in media/filters backed by FFmpeg and libvpx. Decoders are attempted in the order provided via the media::RendererFactory; the first one which reports success will be used for playback (typically the hardware decoder for video).
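The "first one which reports success" rule can be sketched as a simple ordered fallback. This is a simplified, synchronous model under stated assumptions: DecoderCandidate and SelectDecoder are hypothetical names invented for this example, and real decoder initialization in media/ is asynchronous.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one decoder option; not a Chromium type.
struct DecoderCandidate {
  std::string name;
  // Returns true if this decoder can initialize for the given codec.
  std::function<bool(const std::string& codec)> initialize;
};

// Tries candidates in the order given and returns a pointer to the first
// one whose initialization succeeds, or nullptr if every candidate fails.
const DecoderCandidate* SelectDecoder(
    const std::vector<DecoderCandidate>& candidates,
    const std::string& codec) {
  for (const auto& candidate : candidates) {
    if (candidate.initialize(codec))
      return &candidate;
  }
  return nullptr;
}
```

Listing a hardware candidate ahead of a software one reproduces the typical outcome described above: the hardware decoder wins whenever it initializes, and the software decoder is only reached as a fallback.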
Each renderer manages timing and rendering of audio and video via the event-driven media::AudioRendererSink and media::VideoRendererSink interfaces respectively. These interfaces both accept a callback that they will issue periodically when new audio or video frames are required.
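A rough way to picture this pull model: the sink is handed a callback once and invokes it each time it needs more data. The PullSink class below is a hypothetical illustration, not the media::AudioRendererSink interface; its method names and the underflow counter are assumptions made for the example.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical pull-model sink; not media::AudioRendererSink.
class PullSink {
 public:
  // The callback fills `out` with up to out.size() samples and returns
  // how many samples it actually wrote.
  using RenderCallback = std::function<std::size_t(std::vector<float>& out)>;

  explicit PullSink(RenderCallback callback)
      : callback_(std::move(callback)) {}

  // Invoked by the (simulated) platform whenever it needs more samples.
  // Samples the callback could not provide stay as silence (0.0f).
  std::vector<float> Tick(std::size_t frames_needed) {
    std::vector<float> buffer(frames_needed, 0.0f);
    const std::size_t written = callback_(buffer);
    if (written < frames_needed)
      ++underflows_;
    return buffer;
  }

  int underflows() const { return underflows_; }

 private:
  RenderCallback callback_;
  int underflows_ = 0;
};
```

The renderer side never pushes data in this sketch; it only answers requests, which is what lets the sink's own schedule drive the timing.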
On the audio side, again in the normal case, the media::AudioRendererSink is driven via a base::SyncSocket and shared memory segment owned by the browser process. This socket is ticked periodically by a platform level implementation of media::AudioOutputStream within media/audio.
On the video side, the media::VideoRendererSink is driven by async callbacks issued by the compositor to media::VideoFrameCompositor. The media::VideoRenderer will talk to the media::AudioRenderer through a media::TimeSource for coordinating audio and video sync.
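One way to picture audio-driven sync is a clock that advances as audio is played, which the video side consults to decide which queued frame is current. The sketch below is an assumption-laden model: Clock, AudioDrivenClock, Frame, and DropLateFrames are hypothetical names, and the frame-dropping policy shown is a simplification rather than the algorithm Chromium uses.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>

using Micros = std::chrono::microseconds;

// Hypothetical clock interface, standing in for the role the text
// assigns to media::TimeSource. Not the real interface.
class Clock {
 public:
  virtual ~Clock() = default;
  virtual Micros CurrentMediaTime() const = 0;
};

// Media time advances only as audio is reported played.
class AudioDrivenClock : public Clock {
 public:
  void OnAudioPlayed(Micros duration) { now_ += duration; }
  Micros CurrentMediaTime() const override { return now_; }

 private:
  Micros now_{0};
};

struct Frame {
  Micros timestamp;
};

// Drops frames that have been superseded according to the clock and
// returns how many were dropped. Afterwards, the front of the queue is
// the frame that should be on screen.
std::size_t DropLateFrames(std::deque<Frame>& queue, const Clock& clock) {
  const Micros now = clock.CurrentMediaTime();
  std::size_t dropped = 0;
  // The front frame is stale once the next frame's timestamp is reached.
  while (queue.size() > 1 && queue[1].timestamp <= now) {
    queue.pop_front();
    ++dropped;
  }
  return dropped;
}
```

Because the video side only reads the clock, a stall in audio playback pauses frame advancement in this model instead of letting the two streams drift apart.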
With that we've covered the basic flow of a typical playback. When debugging issues, it's helpful to review the internal logs at chrome://media-internals. The internals page contains information about active media::WebMediaPlayerImpl, media::AudioInputController, media::AudioOutputController, and media::AudioOutputStream instances.
Media playback typically involves multiple threads and, in many cases, multiple processes. Media operations are often asynchronous and run in a sandbox. Together, these can make attaching a debugger (e.g. GDB) less efficient than other mechanisms such as logging.
In media we use DVLOG() a lot. It makes filename-based filtering super easy. Within one file, not all logs are created equal. To make log filtering more convenient, use appropriate log levels. Here are some general recommendations:
MediaLog will send logs to about://media-internals, which is easily accessible by developers (including web developers), testers and even users to get detailed information about a playback instance. For guidance on how to use MediaLog, see media/base/media_log.h.
MediaLog messages should be concise and free of implementation details. Error messages should provide clues as to how to fix them, usually by precisely describing the circumstances that led to the error. Use properties, rather than messages, to record metadata and state changes.