Object-oriented coding in the MPEG-4 standard enables the separate processing of foreground objects and the scene background sprite. Since the background sprite only has to be sent once, transmission bandwidth can be saved. We have found that the counter-intuitive approach of splitting the background into several independent parts can reduce the overall amount of data.
In the general case, the synthesis of a single background sprite is even impossible and the scene background must be sent as multiple sprites instead. For this reason, we developed an algorithm that provides an optimal partitioning of a video sequence into independent background sprites (a multi-sprite), resulting in a significant reduction of the involved coding cost.
Additionally, our sprite-generation algorithm ensures that the sprite resolution is kept high enough to preserve all details of the input sequence, which is a problem especially during camera zoom-in operations. Even though our sprite generation algorithm creates multiple sprites instead of only a single background sprite, it is fully compatible with the existing MPEG-4 standard. The total coding cost for the sprite-VOP is reduced by a factor of about 2.6 or even higher, depending on the sequence.
MPEG-4 sprite coding is based on the projective motion model, which in geometric terms is a plane-to-plane mapping (a homography). Thus, the sprite image can be envisioned as the collective projection of all input images onto a single image plane. This leads to increasing perspective image distortions, the more the camera rotates away from the frontal viewing angle. For a standard sprite, a rotating camera setup looks like this:
Our solution to this problem is to use multiple image planes instead of only a single plane for the sprite. This leads to a set of sprite images, which we denote as multi-sprite. A multi-sprite can be constructed in such a way that image distortions are minimized and arbitrary camera movement can be processed. With multi-sprites, the same camera setup from above can now be handled like this:
In order to generate the multi-sprite, it must be decided how the input sequence should be divided into n ranges (1,p1-1), (p1,p2-1), ..., (pn-1,N). To this end, we compute the sprite-size ||Sa;b|| that would result from synthesizing a sprite for each range (a,b). Using this formulation, the partitioning problem can be formulated as finding the partitioning P={ (ai,bi) } which minimizes the total sum of sprite sizes.
This can be solved very efficiently using dynamic programming, by iteratively computing the costs ci, denoting the above sum upto frame i. Obvisouly, we start with c0=0.
Backtracking the choices that were made during the computations of each ci, we obtain the optimal partitioning.
Additionally, we can impose further constraints in the computation like a maximum sprite-buffer size, or a resolution-preservation constraint, without increasing the computational complexity. For details, see the papers below.
We preprocessed the stefan-sequence by cropping away the black borders, which can introduce artifacts in the generated sprite. The sequence we used has size 346x280 and can be downloaded in raw yuv format (see below). First, a feature-based motion-estimator was applied to obtain the frame-to-frame motion parameters [2]. We provide these parameters in the download section below for your convenience in carrying out comparative studies (the parameters are the 3x3 homography matrices, in row-first order). Note that these are low-accuracy motion-parameters that are not used for sprite construction. For the actual sprite-construction (after multi-sprite partitioning), we refine the motion parameters with a dense matching approach. You can also find the high-accuracy parameters below. These high-accuracy parameters are not inter-image parameters, but relative to the sprite plane. If you use any of these parameters in your research, don't forget to cite the corresponding paper from below.
The following pictures are the obtained multi-sprite images (computed with an additional resolution preservation constraint, see [1]). The reference frame is marked in each picture. Select "View Image" to see the picture in full size.
The sprite partitioning data:
By applying autocalibration techniques [3], the physical camera parameters (rotation angles, focal lengths) can be recovered from the motion data. This enables to get an overview about how the capturing took place.
This sequence shows a more complicated camera motion. For your convenience, we again provide the sequence and motion parameters:
Sprite partitioning data: