In keeping with our tradition of explaining complex things in simple words, and considering that we do a lot of work on projects related to video content on the Internet, I want to address some of the questions about codecs and containers that we get from our clients.

We’ll start with the fact that media information (video and audio) must be stored in a digital format, preferably a compressed one. This is the job of codecs (codec: coder-decoder). They convert media information into digital streams and back.


Strictly speaking, a codec is an implementation of a coding/decoding standard, in other words, a program or algorithm. Still, the word “codec” is often used for the coding standard itself.

Coding standards typically leave the choice of algorithms that implement them to developers, so codecs from different vendors implementing the same standard can vary greatly in speed and quality at the same bit rate. The concept of “quality”, of course, is very subjective, and different people have different opinions about which codec produces better results.

Codecs come in two kinds: with a loss of quality (lossy) and without (lossless). Lossless codecs can only compress down to a certain theoretical limit, and that limit is still large, so the media takes up a lot of space. For example, an hour of music takes up about 400 MB. The exact figure depends on the sampling rate and bit depth (for audio), or the image size and number of frames per second (for video), but on average it’s a lot. Lossy codecs can compress far more, but at the cost of a loss in quality, obviously :-).
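The “hour of music is about 400 MB” figure is easy to sanity-check with back-of-the-envelope arithmetic. A rough sketch, assuming CD-quality PCM audio and a typical lossless compression ratio (both are my assumptions, not figures from the article):

```python
# Rough arithmetic behind the "hour of music ~400 MB" figure.
# Assumed parameters: CD-quality PCM (44.1 kHz, 16-bit, stereo).
sample_rate = 44_100      # samples per second
bytes_per_sample = 2      # 16-bit samples
channels = 2              # stereo
seconds = 3600            # one hour

uncompressed = sample_rate * bytes_per_sample * channels * seconds
print(f"uncompressed: {uncompressed / 1024**2:.0f} MiB")  # about 606 MiB

# Lossless codecs typically reach somewhere around a 0.5-0.7 ratio on
# music (an assumption); that lands near the ~400 MB quoted above.
lossless = uncompressed * 0.65
print(f"lossless (assumed 0.65 ratio): {lossless / 1024**2:.0f} MiB")
```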


Examples of codecs: H.264/MPEG-4 AVC, Theora, VP6, MPEG-2 Audio Layer 3. The first three are video codecs; the last is an audio codec. This list, of course, is far from complete.

The digital stream created by an encoder (the coding half of a codec) needs to be stored somewhere, ideally together with the corresponding audio stream. This is what containers (media containers) do. Besides the streams themselves, they hold a heap of meta-information: duration, description, etc. Their job is to maintain an index of where the parts of each stream are kept, interleave those parts to avoid unnecessary seeking when reading video and audio streams in parallel, map frame numbers to byte offsets, and so on.
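The idea can be shown with a toy sketch. This is not any real container format, just an illustration of the two jobs described above: interleaving chunks of several streams into one file and keeping an index of where each chunk starts.

```python
# Toy "container": interleave chunks of several streams into one blob
# and keep an index of (stream_id, byte offset, length) per chunk.
# This illustrates the principle only; it is not a real format.
import io

def mux(chunks):
    """chunks: list of (stream_id, payload bytes). Returns (blob, index)."""
    out = io.BytesIO()
    index = []
    for stream_id, payload in chunks:
        index.append((stream_id, out.tell(), len(payload)))
        out.write(payload)
    return out.getvalue(), index

def read_chunk(blob, index, n):
    """Fetch chunk n without scanning the whole blob, thanks to the index."""
    stream_id, offset, length = index[n]
    return stream_id, blob[offset:offset + length]

# Interleaved storage: video chunk, audio chunk, video chunk, ...
blob, index = mux([("v", b"frame0"), ("a", b"aud0"), ("v", b"frame1")])
print(read_chunk(blob, index, 2))  # ('v', b'frame1')
```

Because video and audio chunks sit next to each other in playback order, a player reading both streams in parallel advances through the file mostly sequentially.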


Generally, the choice of codec does not depend on the container, although some containers can only store specific codecs. At the same time, some codecs impose requirements on containers. For example, codecs that use B-frames need to be put in containers that support B-frames.

Examples of containers: AVI, MPEG-4 Part 14 (mp4), Matroska, Ogg.

New containers and codecs are created fairly often according to the NIH principle, but some have their own features and are optimized for one purpose or another. For example, the flv container was initially developed to simplify embedding video in swf files and to support the screenshare codec, which was created especially for transferring video streams from a computer monitor. Also, flv stores meta-information with a frame index, which is convenient for “rewinding” (seeking) within videos served via progressive HTTP download. Of course, it must be noted that flv is not the only container with this feature.

This leads us to the question of stream broadcasting over the Internet. The problem is that far from all routers participate in IGMP, so multicast over the Internet doesn’t work. Thus, a separate data stream is sent to every client, and if several clients on one network want to watch the same synchronized stream, several identical copies of the data are sent over the network, one copy per client.
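The cost of this is easy to see with a quick calculation. A sketch with assumed figures (the bitrate and viewer count here are made up for illustration):

```python
# Why the lack of Internet multicast hurts: with unicast, server-side
# bandwidth grows linearly with the number of viewers.
stream_bitrate_mbps = 2.5   # assumed per-stream bitrate
viewers = 1000              # assumed audience size

unicast_total = stream_bitrate_mbps * viewers   # one copy per client
multicast_total = stream_bitrate_mbps           # one copy, replicated by routers

print(f"unicast:   {unicast_total:.1f} Mbit/s")   # 2500.0 Mbit/s
print(f"multicast: {multicast_total:.1f} Mbit/s") # 2.5 Mbit/s
```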

A broadcast streaming server’s task is to return data fast enough for viewing, while accepting commands to start, stop, and rewind a stream and responding accordingly. The stream is returned over one of the streaming protocols, for example RTP/RTSP or RTMP. The streaming server doesn’t “know” anything about the media data itself other than the meta-information, which lets it associate a frame number with a position in the file. Streaming servers don’t include codecs: they return the bitstream from the file essentially as they received it.
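That frame-number-to-byte-position mapping is the one piece of media knowledge a streaming server needs. A minimal sketch, with a made-up index (the offsets here are hypothetical, not from any real file):

```python
# Sketch of the meta-information a streaming server relies on:
# a mapping from frame numbers to byte offsets in the media file.
# Only some frames (e.g. keyframes) are indexed; values are made up.
frame_index = {0: 0, 25: 180_000, 50: 365_000, 75: 552_000}

def seek_offset(frame_index, target_frame):
    """Byte offset of the nearest indexed frame at or before the target."""
    candidates = [f for f in frame_index if f <= target_frame]
    return frame_index[max(candidates)]

# A "rewind to frame 60" command becomes: start sending bytes at this offset.
print(seek_offset(frame_index, 60))  # 365000
```

Note that the server never decodes anything; it just picks a starting offset from the index and ships bytes from there.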

So, in a sense, a streaming protocol itself behaves as a media-data container.