Background Estimation for Complex Video Sequences

Knowing the background image of a video scene simplifies the video-object segmentation problem, and several automatic segmentation algorithms therefore require it. The algorithm described on this page was developed with MPEG-4 object coding in mind. As such, it is an offline algorithm that attempts to generate the best possible background for a given short video sequence. For an online background computation, it would have to be modified.

The Algorithm

The algorithm is based on the observation that each small block of the input image falls into one of three categories: it shows a moving object, static foreground, or the static background.

First, moving content is sorted out, as this is easy to do and simplifies the remaining problem. The main part of the algorithm then chooses, for each block, which content "cluster" is the background, the remaining clusters being static foreground. It does so by considering the total time each cluster is visible, how often it reappears, and how coherent the background time-ranges are with those of the neighboring blocks.

Similarity Matrix

In order to find the static background content for each block, the algorithm employs a similarity matrix. Each element M_a,b rates the visual similarity of the block between frames a and b (note that a value of 0 indicates maximum similarity, while 1 indicates minimum similarity). Currently, this is simply the sum of absolute differences over the block. For frames a in which the block contains motion, the matrix elements M_a,x and M_x,a are set to 1 (no similarity).
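A minimal sketch of how such a similarity matrix could be computed, assuming numpy; the function name and the normalization of the sum of absolute differences to [0, 1] are illustrative, not taken from the original implementation:

```python
# Sketch of the block-similarity matrix described above. M[a, b] is the
# mean absolute difference between the block's pixels in frames a and b,
# normalized so that 0 means identical and 1 means maximally dissimilar.
# Rows/columns of frames with detected motion are forced to 1.
import numpy as np

def block_similarity_matrix(blocks, motion_frames=()):
    """blocks: array of shape (num_frames, h, w) holding the same block
    cut out of every frame; motion_frames: frame indices with motion."""
    n = len(blocks)
    flat = blocks.reshape(n, -1).astype(np.float64)
    # Pairwise mean absolute difference, normalized for 8-bit input.
    M = np.abs(flat[:, None, :] - flat[None, :, :]).mean(axis=2) / 255.0
    for a in motion_frames:
        M[a, :] = 1.0   # no similarity for frames containing motion
        M[:, a] = 1.0
    np.fill_diagonal(M, 0.0)  # keep M[k, k] = 0, as the cost update assumes
    return M
```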

The subsequent optimization step computes the set of frames T that most probably shows background. It does so by maximizing the similarity between pairs of frames in T, while minimizing the similarity of pairs of which at least one frame is not in T. More specifically, a cost function combining these two terms is minimized.

The optimization is carried out iteratively by adding or removing single frames k to or from the set T until convergence. The change in cost ΔC_k when adding a frame k to T can be computed efficiently, because only the cost terms involving row and column k of the matrix change.

Note that this is correct because M is symmetric and M_k,k = 0. Conversely, the change in cost when removing a frame k from T is simply ΔC_-k = -ΔC_k.
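Since the exact cost function is not reproduced on this page, the following sketch substitutes a simple thresholded stand-in, C(T) = sum over a,b in T of (M_a,b - tau), with an assumed similarity threshold tau. This is not necessarily the cost of the original algorithm, but it lets the greedy add/remove loop and the cheap delta computation be illustrated: thanks to the symmetry of M and M_k,k = 0, the delta only needs row k of the matrix.

```python
# Greedy sketch of the frame-set optimization described above. The cost
# C(T) = sum_{a in T} sum_{b in T} (M[a,b] - tau) is an illustrative
# stand-in for the original cost function; tau is an assumed threshold.
import numpy as np

def delta_add(M, T, k, tau):
    """Change in cost when adding frame k to T. Uses only row k of M,
    which is valid because M is symmetric and M[k, k] = 0."""
    return 2.0 * sum(M[k, b] for b in T) - (2 * len(T) + 1) * tau

def optimize_background_set(M, tau=0.5, init=frozenset()):
    """Toggle frame membership in T until no single change lowers the
    cost. 'init' may be the solution found for a neighboring block."""
    T = set(init)
    improved = True
    while improved:
        improved = False
        for k in range(len(M)):
            if k not in T and delta_add(M, T, k, tau) < 0:
                T.add(k)
                improved = True
            elif k in T and -delta_add(M, T - {k}, k, tau) < 0:
                T.remove(k)          # removal cost is minus the adding cost
                improved = True
    return T
```

Each accepted toggle strictly decreases the cost, so the loop terminates; initializing from a neighboring block's solution, as described below, only changes the starting set.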

Exploiting Temporal Correlation

Since neighboring blocks will often show background during similar periods of time, one can use the solution found for a neighboring block as the initialization for the optimization of the current block. This not only increases computational efficiency, but also reduces the risk of converging to a local optimum. In some cases, it may even guide the optimization to the "correct" optimum even when this is not the global optimum.

We initialize the optimization with the set of frames that are marked as background for both the block to the left and the block above. As a real-world example shows, these sets are highly correlated.

Background Image Generation

The final background image is generated by a pixel-wise temporal median filter that includes only data from blocks classified as static background. Occasional classification errors are further reduced in this step.
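The generation step might be sketched as follows, assuming numpy; the per-block data layout and function name are illustrative:

```python
# Sketch of the final background generation: a per-pixel median over the
# frames, taking each block's pixels only from frames in which that block
# was classified as static background.
import numpy as np

def build_background(frames, bg_sets, blocksize):
    """frames: (num_frames, H, W) array; bg_sets[(by, bx)]: set of frame
    indices showing background for the block at block coords (by, bx)."""
    n, H, W = frames.shape
    bg = np.zeros((H, W), dtype=frames.dtype)
    for y in range(0, H, blocksize):
        for x in range(0, W, blocksize):
            idx = sorted(bg_sets[(y // blocksize, x // blocksize)])
            block = frames[idx, y:y + blocksize, x:x + blocksize]
            # Temporal median over the selected background frames only.
            bg[y:y + blocksize, x:x + blocksize] = np.median(block, axis=0)
    return bg
```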


The following images show example results for the per-block classification. The classification may not look very accurate for the first image. However, accidentally marking a background block as static foreground or moving hardly hurts the overall performance, as long as not too many foreground blocks are classified as static background. Hence, the parameters have been tuned to prefer classification as foreground.

Comparison to Median Algorithm

Note that the background of the last sequence is impossible to reconstruct, as not all parts of the image are visible in the sequence.


Below, you can find a Linux binary of the algorithm for evaluation purposes. In addition to the described algorithm, the program also implements several baseline algorithms (e.g., median, aging, mode), each with the option to exclude moving regions.

The full list of command line options is:

BackgroundEstimator 0.1 (02 Mar 2004)  / (c) Dirk Farin

Usage: BackgroundEstimator [OPTIONS]... [FILES]...
   -h         --help                  Print help and exit
   -V         --version               Print version and exit
   -aSTRING   --algo=STRING           background reconstruction algorithm (--algo=list for available options)
   -nINT      --history=INT           number of input frames the algorithm may use
   -GDOUBLE   --aging-factor=DOUBLE   set aging factor for 'aging' algorithm
   -FINT      --fuzzyness=INT         set fuzzyness factor for 'mode' algorithm
   -bINT      --blocksize=INT         set block size for 'simmat' algorithm
   -2STRING   --algo2=STRING          second background reconstruction algorithm
              --history2=INT          number of input frames the second algorithm may use (overrides -n)
              --aging-factor2=DOUBLE  set aging factor for 'aging' algorithm (overrides -G)
              --fuzzyness2=INT        set fuzzyness factor for 'mode' algorithm (overrides -F)
              --blocksize2=INT        set block size for 'simmat' algorithm (overrides -b)
   -dINT      --decimate=INT          only use every 'n'th input frame
   -sINT      --speedup=INT           show output only after each 'n'th step
   -rSTRING   --resolution=STRING     resolution of input frames (YUV input only)
   -fSTRING   --format=STRING         input format (yuv, mpeg)
   -oSTRING   --save-output=STRING    write sequence of generated backgrounds to PPM files
   -l         --offline               only generate one background image for the whole sequence (overrides -s)
   -ASTRING   --alpha=STRING          use input file with alpha-channel information
   -t         --transparent-motion    make areas with image motion transparent
   -x         --no-x11                disable X11 output

and the list of available algorithms is:

Available algorithms: aging, average, median, mode, first, last, simmat

Aging:     A current background image is maintained. After each input frame 'f',
           the background 'b' is updated according to 'b <- a*b + (1-a)*f'.
           The aging factor 'a' can be adjusted.
Average:   Compute pixel-wise average in temporal direction (excluding
           transparent pixels).
Median:    Compute median filter in temporal direction (excluding transparent
           pixels).
Mode:      Select the most frequently occurring pixel value (equality is
           considered up to an adjustable fuzzyness difference).
First:     Take the first non-transparent pixel into the background.
Last:      Take the last non-transparent pixel into the background.
SimMat:    Algorithm based on block-similarity matrices and foreground
           time-range prediction (see the Farin ICIP'03 paper for a longer
           explanation).
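The 'aging' rule quoted above is a simple exponential moving average; a minimal numpy sketch (the default factor is an assumption, not the program's default):

```python
# Minimal sketch of the 'aging' rule: the running background 'b' is
# blended with each new frame 'f' as b <- a*b + (1 - a)*f.
import numpy as np

def aging_background(frames, a=0.95):
    """frames: (num_frames, H, W) array; a: aging factor in [0, 1]."""
    b = frames[0].astype(np.float64)
    for f in frames[1:]:
        b = a * b + (1.0 - a) * f
    return b
```

A larger factor 'a' makes the background adapt more slowly, so moving objects leave fewer traces but genuine scene changes take longer to appear.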

The program reads YUV and MPEG sequences. However, for reading MPEG sequences, you will need to have DVDview installed. A typical command line looks like this:

./bkgEstimator.x86.static.bin -f yuv -r 352x288 -o output -l -a simmat hall_cif.yuv

The output is written to output0300.ppm or similar.