I. The Development of Video Coding Technology
Video coding technology has developed mainly through two series of international standards: the MPEG-x series from ISO/IEC and the H.26x series from ITU-T. From the H.261 video coding recommendation through H.262/3 and MPEG-1/2/4, these standards have pursued a common goal: good image quality at the lowest possible bit rate (or storage capacity). Moreover, as market demand for image transmission has grown, the problem of adapting to different channel transmission characteristics has become increasingly apparent. To solve these problems, the two international standardization organizations, ISO/IEC and ITU-T, jointly developed a new video standard, H.264.
H.261 was the first video coding recommendation, standardizing video coding techniques for conference television and videotelephony applications on ISDN networks. It employs a hybrid coding algorithm that combines inter-frame prediction, which reduces temporal redundancy, with the DCT transform, which reduces spatial redundancy. To match the ISDN channel, the output bit rate is p × 64 kbit/s. When p is small, only images of lower sharpness can be transmitted, which suits face-to-face videotelephony; when p is large (say, p > 6), conference television images of better definition can be transmitted. H.263 is a low-bit-rate image compression standard that technically improves and extends H.261, supporting applications with bit rates below 64 kbit/s. However, H.263 and the later H.263+ and H.263++ have since been developed to support the full range of bit rates, as can be seen from the many image formats they support, such as Sub-QCIF, QCIF, CIF, 4CIF and even 16CIF.
The MPEG-1 standard, developed for video storage and playback on CD-ROM discs, has a bit rate of about 1.2 Mbit/s and can provide 30 frames per second of CIF (352 × 288) quality images. The basic algorithm of the MPEG-1 video coding part is similar to that of H.261/H.263, also using motion-compensated inter-frame prediction, the two-dimensional DCT, and VLC run-length coding. In addition, it introduces the concepts of intra frames (I), predicted frames (P), bidirectionally predicted frames (B), and DC frames (D), further improving coding efficiency. Building on MPEG-1, the MPEG-2 standard made several improvements for higher image resolution and compatibility with digital TV: its motion vectors have half-pixel accuracy; its coding operations (such as motion estimation and the DCT) distinguish between "frame" and "field"; and it introduces coding scalability techniques such as spatial scalability, temporal scalability, and signal-to-noise ratio scalability. The more recent MPEG-4 standard introduces coding based on audio-visual objects (AVO: Audio-Visual Object), which greatly improves the interactive capability and coding efficiency of video communication. MPEG-4 also employs some new techniques, such as shape coding, adaptive DCT, and arbitrary-shape video object coding. The basic video encoder of MPEG-4, however, is still a hybrid encoder similar to H.263.
In short, H.261 is the classic video coding standard, and H.263, used mainly in communications, is its development and will gradually replace it in practice, although the numerous options of H.263 often leave users at a loss. The MPEG family of standards has evolved from storage-media applications toward transmission-media applications, while keeping a core video coding framework consistent with H.261; the compelling "object-based coding" part of MPEG-4 still faces technical barriers and is currently difficult to apply universally. The new video coding recommendation H.264, developed on this basis, overcomes the weaknesses of both: within the hybrid coding framework it introduces new coding methods that improve coding efficiency while remaining practical. Since it was developed jointly by the two major international standardization organizations, its application prospects should be self-evident.
II. Introduction to H.264

H.264 is a new digital video coding standard developed by the Joint Video Team (JVT: Joint Video Team) of ITU-T's VCEG (Video Coding Experts Group) and ISO/IEC's MPEG (Moving Picture Experts Group); it is both ITU-T Recommendation H.264 and Part 10 of ISO/IEC MPEG-4. Drafting began in January 1998, the first draft was completed in September 1999, its test model TML-8 was established in May 2001, and the FCD (Final Committee Draft) of H.264 was adopted at the 5th meeting of the JVT in June 2002. The standard was officially released in March 2003.
Like previous standards, H.264 is a hybrid coding scheme of DPCM plus transform coding. However, it adopts a simple "back to basics" design, obtaining compression performance far better than H.263++ without resorting to many options; it enhances adaptability to various channels by adopting a "network-friendly" structure and syntax that helps with error and packet-loss handling; its target application range is wide, meeting the needs of different bit rates, resolutions, and transmission (storage) scenarios; and its basic system is open for use without royalties.
Technically, the H.264 standard has multiple highlights, such as unified VLC symbol coding; high-precision, multi-mode motion estimation; an integer transform based on 4×4 blocks; and a hierarchical coding syntax. These measures give the H.264 algorithm very high coding efficiency: at the same reconstructed image quality, it can save about 50% of the bit rate compared with H.263. The H.264 code stream structure adapts well to networks, strengthens error recovery, and suits IP and wireless network applications.
III. Technical Highlights of H.264
1. Layered Design
The H.264 algorithm can be divided conceptually into two layers: the Video Coding Layer (VCL), responsible for efficient representation of the video content, and the Network Abstraction Layer (NAL: Network Abstraction Layer), responsible for packaging and transmitting the data in a manner appropriate to the network. A packet-based interface is defined between the VCL and the NAL, and packetization and the corresponding signaling are part of the NAL. In this way, the tasks of high coding efficiency and network friendliness are carried out by the VCL and the NAL, respectively.
The VCL layer includes block-based motion-compensated hybrid coding and some new features. As with previous video coding standards, H.264 does not include pre- and post-processing features in the draft, which increases the flexibility of the standard.
The NAL is responsible for encapsulating the data using the segmentation format of the underlying network, covering framing, signaling of logical channels, use of timing information, and end-of-sequence signals. For example, the NAL supports the transmission of video over circuit-switched channels as well as formats in which video is transmitted over the Internet using RTP/UDP/IP. The NAL consists of its own header information, segment structure information, and the actual payload, i.e. the upper-layer VCL data (if data partitioning is used, the data may consist of several parts).
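As a concrete illustration of the NAL's own header information: in the published standard, each NAL unit begins with a single header byte carrying a forbidden bit, a reference-importance indicator, and the unit type. A minimal Python sketch of parsing that byte (field layout as in the final standard, not necessarily the draft this article describes) might look like:

```python
def parse_nal_header(byte0: int) -> dict:
    """Split the one-byte H.264 NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (byte0 >> 7) & 0x1,  # must be 0 in a valid stream
        "nal_ref_idc": (byte0 >> 5) & 0x3,         # importance as a reference
        "nal_unit_type": byte0 & 0x1F,             # e.g. 5 = IDR slice, 7 = SPS, 8 = PPS
    }

# 0x67 is a typical sequence parameter set (SPS) header byte
print(parse_nal_header(0x67))  # -> ref_idc 3, unit type 7
```

The payload (the VCL data) follows this header byte inside the same NAL unit.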
2. High-Precision, Multi-Mode Motion Estimation
H.264 supports motion vectors with 1/4- or 1/8-pixel precision. At 1/4-pixel accuracy, a 6-tap filter can be used to reduce high-frequency noise; for motion vectors with 1/8-pixel accuracy, a more complex 8-tap filter can be used. When performing motion estimation, the encoder can also select an "enhanced" interpolation filter to improve the prediction.
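To make the 6-tap filter concrete: in the final standard, half-pixel luma samples are interpolated with the taps (1, -5, 20, 20, -5, 1), normalized by 32. A minimal sketch of interpolating one half-pel sample from six horizontal neighbours:

```python
def half_pel(p):
    """Interpolate the half-pel sample between p[2] and p[3] from six
    integer-pel neighbours using the (1, -5, 20, 20, -5, 1)/32 filter."""
    assert len(p) == 6
    acc = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5]
    return min(255, max(0, (acc + 16) >> 5))  # round, then clip to 8-bit range

# In a flat region the interpolated value equals its neighbours:
print(half_pel([100] * 6))  # -> 100
```

Quarter-pel samples are then obtained by averaging neighbouring integer- and half-pel values.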
In H.264 motion prediction, a macroblock (MB) can be divided into sub-blocks as shown in Figure 2, forming block sizes in seven different modes. This flexible, fine-grained multi-mode partitioning better matches the shapes of actual moving objects in the image and greatly improves the accuracy of motion estimation. In this way, each macroblock can contain 1, 2, 4, 8, or 16 motion vectors.
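The motion-vector counts quoted above follow directly from how many sub-blocks of each shape fit in a 16×16 macroblock; a small sketch enumerating the seven uniform partition shapes:

```python
def motion_vectors_per_mb(w: int, h: int) -> int:
    """Number of motion vectors a 16x16 macroblock carries when it is
    split uniformly into w x h sub-blocks (one vector per sub-block)."""
    assert 16 % w == 0 and 16 % h == 0
    return (16 // w) * (16 // h)

# The seven partition modes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
for w, h in [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]:
    print(f"{w}x{h}: {motion_vectors_per_mb(w, h)} motion vector(s)")
```

This reproduces the 1, 2, 4, 8, or 16 vectors per macroblock mentioned above (in the standard, the 8×4/4×8/4×4 shapes arise as sub-partitions of 8×8 blocks and may be mixed within one macroblock).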
H.264 also allows the encoder to use more than one previously coded frame for motion estimation, the so-called multi-frame reference technique. For example, with 2 or 3 just-encoded reference frames, the encoder chooses the one that gives a better prediction for each target macroblock and indicates, for each macroblock, which frame is used for prediction.
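A toy sketch of that per-macroblock choice, using the sum of absolute differences (SAD) as the matching cost (real encoders also weigh the signaling cost, which this simplification omits):

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def pick_reference(target, ref_blocks):
    """Return the index of the candidate block (one per reference frame)
    with the lowest SAD cost -- the frame the macroblock would signal."""
    costs = [sad(target, r) for r in ref_blocks]
    return costs.index(min(costs))

target = [[10, 10], [10, 10]]
refs = [[[0, 0], [0, 0]],      # frame just before: poor match
        [[10, 11], [10, 10]]]  # two frames back: near-perfect match
print(pick_reference(target, refs))  # -> 1
```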
3. 4×4 Block Integer Transform
Similar to previous standards, H.264 applies block-based transform coding to the residual, but the transform is an integer operation rather than a real-valued one, and the process is essentially similar to the DCT. The advantage of this approach is that the encoder and the decoder can use the exact same transform and inverse transform, making it easy to use simple fixed-point arithmetic; in other words, there is no "inverse transform mismatch". The unit of transformation is the 4×4 block, instead of the 8×8 block commonly used in the past. Because the transform block is smaller, moving objects are partitioned more accurately; not only is the computational cost of the transform relatively small, but blocking artifacts at the edges of moving objects are also greatly reduced. To prevent the small block size from producing gray-level differences between blocks in large smooth areas of the image, a second 4×4 transform can be applied to the 16 DC coefficients of the 4×4 luminance blocks of an intra macroblock (one per small block, 16 in total), and a 2×2 transform is applied to the 4 DC coefficients of the 4×4 chrominance blocks (one per small block, 4 in total).
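In the final standard, the core of this 4×4 forward transform is a matrix of small integers, so it can be evaluated exactly with adds and shifts. A minimal sketch, with the per-coefficient scaling and quantization stages omitted:

```python
# Core matrix of the H.264 4x4 forward integer transform
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(x):
    """Y = CF . X . CF^T -- exact in integer arithmetic, so the encoder
    and decoder agree bit-for-bit (no inverse-transform mismatch)."""
    return matmul(matmul(CF, x), transpose(CF))

# A flat 4x4 block produces a single (unscaled) DC coefficient:
print(forward_transform([[1] * 4 for _ in range(4)]))
```

Because every entry of CF is ±1 or ±2, the whole transform needs only additions, subtractions, and one-bit shifts.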
To improve rate-control capability, the quantization step size changes in increments of about 12.5% rather than by a constant amount. Normalization of the transform coefficient magnitudes is handled in the inverse quantization process to reduce computational complexity. To emphasize color fidelity, a smaller quantization step size is used for the chrominance coefficients.
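The ~12.5% figure comes from the step size growing by a factor of 2^(1/6) per quantization parameter (QP) increment, i.e. doubling every 6 steps. A small sketch (the base step of 0.625 at QP 0 is the value used in the published standard; treat it as an assumption here):

```python
BASE_QSTEP = 0.625  # quantizer step size at QP = 0 (final-standard value)

def qstep(qp: int) -> float:
    """Step size grows ~12.5% (a factor of 2**(1/6)) per QP increment,
    doubling every 6 steps."""
    return BASE_QSTEP * 2 ** (qp / 6)

print(qstep(6) / qstep(0))  # -> 2.0 (doubles over 6 steps)
print(qstep(1) / qstep(0))  # -> ~1.122, i.e. roughly a 12.5% increase
```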
4. Unified VLC
H.264 offers two methods of entropy coding. One applies a universal VLC (UVLC: Universal VLC) to all symbols to be coded; the other is context-adaptive binary arithmetic coding (CABAC: Context-Adaptive Binary Arithmetic Coding). CABAC is optional; its coding performance is slightly better than UVLC's, but its computational complexity is also higher. UVLC uses an unbounded set of codewords with a very regular design, so different objects can be encoded with the same code table. This method makes codewords easy to generate, and the decoder can easily recognize the prefix of a codeword, so UVLC can quickly regain synchronization when a bit error occurs.
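The regular structure described above is that of Exp-Golomb codes, the codeword family UVLC is built on in the final standard: M leading zeros, a 1, then an M-bit suffix. A minimal encoder sketch:

```python
def exp_golomb_encode(code_num: int) -> str:
    """Unsigned Exp-Golomb codeword: M zeros, a 1, then the M-bit
    INFO field, where M = floor(log2(code_num + 1))."""
    m = (code_num + 1).bit_length() - 1
    if m == 0:
        return "1"
    info = code_num + 1 - (1 << m)
    return "0" * m + "1" + format(info, f"0{m}b")

for n in range(5):
    print(n, exp_golomb_encode(n))
# 0 -> 1, 1 -> 010, 2 -> 011, 3 -> 00100, 4 -> 00101
```

A decoder only needs to count leading zeros to know the codeword length, which is why resynchronization after a bit error is cheap.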
5. Intra Prediction
Previous H.26x-series and MPEG-x-series standards adopted inter-frame prediction. In H.264, intra prediction is available when encoding intra images: for each 4×4 block (with special handling for edge blocks), each pixel can be predicted by a different weighted sum of the 17 closest previously encoded pixels (some weights may be 0), i.e. the 17 pixels above and to the left of the block containing the pixel. Clearly, this intra prediction is not temporal but a predictive coding algorithm performed in the spatial domain; it removes spatial redundancy between adjacent blocks and achieves more efficient compression.
As shown in Figure 4, a, b, ..., p in the 4×4 block are the 16 pixels to be predicted, and A, B, ..., P are already-encoded pixels. For example, the value at point m can be predicted by (J + 2K + L + 2)/4, or by (A + B + C + D + I + J + K + L)/8, and so on. Depending on the prediction reference points selected, there are 9 different modes for luminance, but intra prediction for chrominance has only 1 mode.
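Two of the simplest 4×4 luminance modes can be sketched directly (mode numbering and the DC rounding term are taken from the final standard, not from this article's figure):

```python
def intra4x4_vertical(top):
    """Mode 0 (vertical): each column copies the reconstructed pixel
    directly above it."""
    assert len(top) == 4
    return [list(top) for _ in range(4)]

def intra4x4_dc(top, left):
    """Mode 2 (DC): every pixel is predicted as the rounded mean of the
    four top and four left neighbours."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]

print(intra4x4_dc([10] * 4, [20] * 4))  # every entry is (40 + 80 + 4) >> 3 = 15
```

The encoder tries the available modes, codes the residual of the best one, and signals the chosen mode to the decoder.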
6. For IP and Wireless Environments
The H.264 draft includes tools for error resilience, facilitating the transmission of compressed video in error-prone, packet-loss-prone environments such as mobile channels or IP channels and making transmission robust.
To combat transmission errors, time synchronization in an H.264 video stream can be achieved by intra-frame image refresh, while spatial synchronization is supported by slice-structured coding. To facilitate resynchronization after an error, a number of resynchronization points are also provided within the video data of one image. In addition, intra-macroblock refresh and multiple reference macroblocks allow the encoder to consider not only coding efficiency but also the characteristics of the transmission channel when determining the macroblock mode.
Besides adapting to the channel bit rate by changing the quantization step size, H.264 often uses data partitioning to cope with changes in channel bit rate. In general, the idea of data partitioning is to generate video data with different priorities in the encoder to support quality of service (QoS) in the network. For example, a syntax-based data partitioning method divides each frame of data into several parts according to importance, allowing the less important information to be discarded when the buffer overflows. A similar temporal data partitioning method can also be used, by employing multiple reference frames in P and B frames.
In wireless communication applications, large bit-rate variations of the wireless channel can be accommodated by changing the quantization accuracy or the spatial/temporal resolution of each frame. In the multicast case, however, the encoder cannot be required to respond to varying bit rates. Therefore, unlike the less efficient Fine Granularity Scalability (FGS) approach adopted in MPEG-4, H.264 uses stream-switching SP frames instead of hierarchical coding.
IV. Performance Comparison of H.264
TML-8, the test model of H.264, was used to compare and test the video coding efficiency of H.264. The PSNR figures provided by the test results clearly show that H.264 performs significantly better than MPEG-4 (ASP: Advanced Simple Profile) and H.263++ (HLP: High Latency Profile).
In comparison tests at six bit rates, the PSNR of H.264 was on average 2 dB higher than MPEG-4 (ASP) and 3 dB higher than H.263++ (HLP). The six test rates and their conditions were:
- 32 kbit/s, 10 f/s frame rate, QCIF format;
- 64 kbit/s, 15 f/s frame rate, QCIF format;
- 128 kbit/s, 15 f/s frame rate, CIF format;
- 256 kbit/s, 15 f/s frame rate, QCIF format;
- 512 kbit/s, 30 f/s frame rate, CIF format;
- 1024 kbit/s, 30 f/s frame rate, CIF format.
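For reference, PSNR, the quality metric these comparisons are expressed in, is computed from the mean squared error between the original and reconstructed pictures. A minimal sketch for 8-bit samples:

```python
import math

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of 8-bit pixel values."""
    assert len(original) == len(reconstructed) > 0
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical pictures
    return 10 * math.log10(peak ** 2 / mse)

print(psnr([100, 100], [100, 110]))  # ~31.1 dB for this toy pair
```

A 2-3 dB PSNR gain at a fixed bit rate, as reported above, is a substantial quality improvement; equivalently, it translates into large bit-rate savings at equal quality.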