Why use interpolation in video coding?
Motion-compensated prediction (MCP) is key to the success of modern video coding standards: it removes the temporal redundancy in video signals and significantly reduces the size of bitstreams. With MCP, the pixels to be coded are predicted from temporally neighboring ones, and only the prediction errors and the motion vectors (MVs) are transmitted. However, because the sampling rate is finite, the true position of the prediction in a neighboring frame may fall off the sampling grid, where the intensity is unknown. The intensities at the positions between the integer pixels, called sub-positions, must therefore be interpolated, and the MV resolution is increased accordingly.
Interpolation in H.264/AVC
In H.264/AVC, the MV resolution is a quarter pixel, so the reference frame is interpolated to 16 times its original size for MCP (4 times in each dimension). As shown in Fig. 1(a), the interpolation defined in H.264 has two stages, which interpolate the half-pixel and the quarter-pixel sub-positions, respectively. The first stage is separable: the sampling rate in one direction is doubled by inserting zero-valued samples and then filtering with a 1-D filter h1, [1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]/32, and the process is then repeated for the other direction. The second stage, which is non-separable, uses bilinear filtering supported by the integer pixels and the interpolated half-pixel values. The impulse and frequency responses of the interpolation filter in H.264/AVC are shown in Fig. 2 (a) and (b), respectively. As can be seen, this interpolation filter is close to ideal: it has a square passband with a cutoff frequency of π/4 in both the horizontal and vertical directions, and very small ripples in the stopband. The filter coefficients are fixed, chosen to fit the general statistics of various video sources.
Fig. 1 Interpolation process of (a) the filter in H.264/AVC, (b) the optimal AIF, and (c) the separable AIF
Fig. 2 Impulse and frequency responses of the normative interpolation filter in H.264/AVC
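The two-stage process described above can be sketched in one dimension. This is only an illustration under stated assumptions, not the normative H.264/AVC procedure: it works in floating point without the rounding and clipping a real codec applies, it pads the borders by repeating the edge sample, and the function names half_pel_upsample and quarter_pel_upsample are made up for this example.

```python
# Half-pel kernel from the text; the result is divided by 32 after filtering.
H1 = [1, 0, -5, 0, 20, 32, 20, 0, -5, 0, 1]


def half_pel_upsample(samples):
    """Stage 1: insert zeros between samples, then filter with H1 / 32.

    Returns 2 * len(samples) - 1 values: integer and half-pel positions.
    Assumption: borders are padded by repeating the edge sample.
    """
    pad = 3
    padded = [samples[0]] * pad + list(samples) + [samples[-1]] * pad
    # Zero insertion: original samples land on the even indices.
    zero_inserted = [0.0] * (2 * len(padded))
    for i, s in enumerate(padded):
        zero_inserted[2 * i] = float(s)
    centre = len(H1) // 2
    out = []
    # Only keep positions between the first and last original sample.
    for k in range(2 * pad, 2 * (pad + len(samples)) - 1):
        acc = sum(c * zero_inserted[k + j - centre] for j, c in enumerate(H1))
        out.append(acc / 32.0)
    return out


def quarter_pel_upsample(samples):
    """Stage 2: bilinear average of neighbouring integer/half-pel values."""
    half = half_pel_upsample(samples)
    out = []
    for a, b in zip(half, half[1:]):
        out.append(a)
        out.append((a + b) / 2.0)
    out.append(half[-1])
    return out
```

Because the centre tap of h1 is 32/32 and every other even-offset tap is zero, the integer samples pass through stage 1 unchanged, and each half-pixel value reduces to the familiar 6-tap combination [1, -5, 20, 20, -5, 1]/32 of its six nearest integer neighbours.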
Review of Adaptive Interpolation Filters (AIF)
Because the statistics of video sources vary over time, some researchers have proposed adaptive interpolation filters (AIF), one of the design elements that make the KTA software significantly outperform the JM reference software. With AIF, the filter coefficients are optimized frame by frame, so that the energy of the MCP error is minimized for each frame. The optimal coefficients are quantized, coded, and transmitted as side information of the associated frame. Our previous post listed all the AIF techniques adopted in KTA. These techniques all use the minimum mean squared error (MMSE) estimator to calculate the coefficients and minimize the MCP error, but they strike different balances among performance, complexity, and side-information size by using different support regions and imposing different symmetry constraints.
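As a rough illustration of the MMSE step, the coefficients that minimize the squared prediction error can be found by solving the least-squares normal equations R h = p, where R is the autocorrelation matrix of the support pixels and p is their cross-correlation with the pixels being predicted. The sketch below is an assumption-laden toy, not the KTA implementation: the names mmse_filter and solve_linear are hypothetical, there is no quantization or symmetry constraint, and a single filter is fitted to one list of (support, target) pairs.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def mmse_filter(supports, targets):
    """Least-squares filter taps.

    supports[k] holds the reference pixels used to predict targets[k].
    Minimizes sum_k (targets[k] - sum_i h[i] * supports[k][i]) ** 2.
    """
    taps = len(supports[0])
    # Autocorrelation matrix R and cross-correlation vector p.
    R = [[sum(s[i] * s[j] for s in supports) for j in range(taps)]
         for i in range(taps)]
    p = [sum(s[i] * t for s, t in zip(supports, targets)) for i in range(taps)]
    return solve_linear(R, p)
```

In an encoder, the supports would be the reference-frame pixels around each motion-compensated position that maps to a given sub-position, and the targets would be the corresponding pixels of the frame being coded.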
The 2-D non-separable AIF, whose interpolation process is shown in Fig. 1(b), increases the spatial sampling rate 16-fold in a single step by zero-insertion, and each sub-position is interpolated directly by filtering the surrounding 6×6 integer pixels. Fig. 3 (a) shows the support region of the 2-D non-separable AIF. Because the spatial statistics are assumed to be isotropic, the filter h is circularly symmetric, and therefore only 1/8 of the coefficients are coded, as shown in Fig. 3 (b).
However, the assumption that the spatial statistics are isotropic may not hold for every frame in a video sequence. The 2-D separable AIF was therefore proposed; it treats the spatial statistics of the horizontal and vertical directions as different, and it reduces the complexity of the 2-D non-separable AIF. The 1-D AIFs for the two directions are designed separately. As shown in Fig. 1(c), the horizontal sampling rate is increased four times by zero-insertion, and a 1-D filter h1 calculated for the current frame is applied. The process is then repeated for the vertical direction using h2.
Fig. 3 2-D non-separable AIF’s (a) support region and (b) coded coefficients
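The difference between the separable and non-separable approaches can be illustrated for a single sub-position. The sketch below uses the direct (polyphase) form, in which one output sample is computed from a 6×6 window of integer pixels, rather than explicit zero-insertion. The filter values H_ROW and H_COL are placeholders borrowed from the H.264 half-pixel taps, not adaptively derived coefficients, and all function names are hypothetical.

```python
# Placeholder 1-D filters (assumption: the H.264 half-pel taps stand in for
# adaptive coefficients). H_ROW filters along rows, H_COL along columns.
H_ROW = [c / 32.0 for c in (1, -5, 20, 20, -5, 1)]
H_COL = [c / 32.0 for c in (1, -5, 20, 20, -5, 1)]


def separable_sample(window, h_row, h_col):
    """Filter each row of the window with h_row, then the results with h_col."""
    row_results = [sum(c * px for c, px in zip(h_row, row)) for row in window]
    return sum(c * r for c, r in zip(h_col, row_results))


def non_separable_sample(window, kernel):
    """Apply a full 2-D kernel of the same shape as the window."""
    return sum(kernel[r][c] * window[r][c]
               for r in range(len(window))
               for c in range(len(window[0])))


def outer_product(h_col, h_row):
    """2-D kernel equivalent to applying h_row and then h_col."""
    return [[a * b for b in h_row] for a in h_col]
```

A separable pair is described by 6 + 6 = 12 taps, whereas a general 6×6 kernel has 36. Every separable pair corresponds to a 2-D kernel (its outer product), but not every 2-D kernel can be factored this way, which is the trade-off between the two designs.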
Directional AIF (D-AIF) further reduces the complexity while still following the process in Fig. 1(b). Each sub-position is supported by at most 12 surrounding pixels arranged in a diagonal cross, as shown in Fig. 4 (a). This allows the correlation along the diagonal directions to be exploited for interpolation. D-AIF is also circularly symmetric; its coded coefficients are shown in Fig. 4 (b). The authors later enhanced D-AIF into E-DAIF, whose support region is adaptively switched between the diagonal cross (see Fig. 4 (a)) and a radial support (see Fig. 5 (a)).
Fig. 4 D-AIF’s (a) support region and (b) coded coefficients
The enhanced AIF (E-AIF) uses 12-tap filters with a radial support to interpolate the sub-positions, as shown in Fig. 5 (a), and adds a 5×5 filter for integer pixels and a DC offset for the integer position and for each sub-position. The horizontal and vertical statistical properties are assumed to differ, so the filter is axisymmetric; its coded coefficients are shown in Fig. 5 (b).
Fig. 5 E-AIF’s (a) support region and (b) coded coefficients
Instead of adaptively calculating the filter coefficients for each frame, the switched interpolation filter with offset (SIFO) enables frame-level switching between fixed interpolation filters and sends a DC offset for each sub-position. When none of the filters is used, the relevant DC offsets are added to the pixels at the sub-positions and at the integer position, in order to compensate for illumination changes.
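The DC-offset step can be sketched as follows. This is a hypothetical illustration rather than the SIFO reference code: the function name apply_dc_offset, the dictionary keyed by the fractional part of the MV, and the clipping to an 8-bit range are all assumptions made for the example.

```python
def apply_dc_offset(pred_block, mv_frac, offsets, max_val=255):
    """Add the DC offset for one sub-position to a predicted block.

    pred_block: rows of predicted pixel values.
    mv_frac: (frac_x, frac_y) in quarter-pixel units; (0, 0) is the
             integer position.
    offsets: hypothetical per-frame table {(frac_x, frac_y): offset}.
    Assumption: results are clipped to [0, max_val].
    """
    offset = offsets.get(mv_frac, 0)
    return [[min(max_val, max(0, p + offset)) for p in row]
            for row in pred_block]
```

For instance, a frame that is uniformly brighter than its reference could be given a positive offset for every position, so that the prediction tracks the brightness change without any change to the filter taps.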