Automatic Video Segmentation
Employing Object/Camera Modeling Techniques 

Dirk Farin  
Welcome to the homepage of
my PhD thesis ! On this page, you can access the full text and also get more information for many chapters, like example results and demo movies. Have fun browsing through all of it and don't hessitate to ask for further information in case you got interested. Dirk Farin 
Table of Contents (pdf)  
Chapter
1: Introduction (pdf) This chapter presents a
survey of typical application areas that apply segmentation algorithms.
This includes video editing, compression, content analysis, and 3D
reconstruction applications. A particular focus is on the concept of
objectoriented video coding in the MPEG4 standard. Afterwards, the
requirements are defined for a segmentation system that is compliant
with the MPEG4 videocoding approach. A proposal for such a
segmentation system is made and the main components are briefly
introduced.


Part I: An Automatic Video Segmentation System  
Chapter
2: Projective Geometry (pdf) This chapter gives an
introduction to projective geometry as far as it is required to
understand the subsequent chapters. We start with defining the
projective space and deriving basic operations on points and lines in
it. We proceed by discussing geometric transformations in the
projective plane and in the 3D Euclidean space, including the
projection of 3D space onto a 2D image plane. Finally, we construct a
detailed model of the image formation process for a moving camera.


Chapter
3: Featurebased Motion I: PointCorrespondences (pdf) A featurebased motion estimator
comprises three steps: detection of featurepoints, establishing
featurecorrespondences, and estimation of camera motion parameters. In
this chapter, we give an introduction to featurebased motion
estimation and we cover the first two steps. We provide a survey of
featurepoint detectors and evaluate their accuracy. After that, we
present a fast algorithm for determining correspondences between two
sets of featurepoints.


Chapter
4: Featurebased Motion II: Parameter Estimation (pdf) This chapter describes the
algorithm for computing the parameters of the projective motion model,
including the case of images with multiple independent motions. To
extract the dominant motion model in this case, we apply the RANSAC
algorithm. An evaluation of the robust estimation algorithm shows
that the accuracy of the results in practice is worse than
expected from a theoretic evaluation. However, after analyzing this
discrepancy, we propose a modification to reach the theoretical
performance.


Chapter
5: Background Reconstruction (pdf) In order to synthesize a
background image of a scene recorded with a rotational camera, we first
increase the accuracy of the cameramotion parameters such that
longterm consistency is achieved. After that, the input frames are
combined such that moving foreground objects are removed. We
review some of the existing approaches and present a new algorithm for
background estimation which is designed for difficult sequences where
the background is only visible for a short time period.


Chapter
6: MultiSprite Backgrounds (pdf) The MPEG4 standard enables to
send a background image (sprite) independently from the foreground
objects. Whereas it seems optimal to combine as many images into one
background sprite as possible, we have found that the counterintuitive
approach of splitting the background into several independent parts can
reduce the overall amount of data needed to transmit the background
sprite.
We propose an algorithm that provides an optimal partitioning of a video sequence into independent background sprites, resulting in a significant reduction of the involved coding cost. (Content of this chapter has received two best paper awards.) 

Chapter
7: Background Subtraction (pdf) We discuss the
backgroundsubtraction module, starting with simple independent pixel
classification and then proceeding to more complex tests that include
contextual information to decrease the number classification errors.
Furthermore, typical problems that lead to segmentation errors are
identified and a modification to the segmentation algorithm is provided
to reduce these effects.
Finally, a few postprocessing filters are presented that can remove obvious errors like small clutter regions. 

Chapter
8: Results and Applications (pdf) This chapter describes several
variants how to combine the algorithms that were presented in Chapters
27 into a complete segmentation system. Results are provided
for a large variety of sequences to show the quality of th
segmentation masks, but also to show the limitations of the
proposed approach. Algorithm enhancements to overcome these limitations
are proposed for future work. Finally, we provide examples for applying
segmentation system in MPEG4 video coding, 3D video generation,
video editing, and video analysis.


Part II: Segmentation Using Object Models  
Chapter
9: Object Detection based on GraphModels I: Cartoons (pdf) Sometimes it is desired to
specify which objects should be extracted from a picture. This and
the following chapter present a graphbased model to describe video
objects. The modeldetection algorithm is based on an inexact
graphmatching between the userdefined object model and an
automatically extracted graph covering the input image. We discuss the
modelgeneration process, and we present a matching algorithm
tailored to the object detection in cartoon sequences.


Chapter
10: Object Detection based on GraphModels II: Natural (pdf) This chapter presents an
algorithm for videoobject segmentation that combines motion
information, a highlevel objectmodel detection, and spatial
segmentation into a single framework. This joint approach overcomes the
disadvantages of these algorithms when they are applied independently.
These disadvantages include the low semantic accuracy of spatial color
segmentation, the inexact object boundaries obtained from
objectmodel matching and the often incomplete motion information.


Chapter
11: Manual Segmentation and Signature Tracking (pdf) For some applications it can be
advantageous to use semiautomatic segmentation techniques, in which
the user controls the segmentation manually. A popular approach is
the Intelligent Scissors tool, which uses shortestpath algorithms
to locate the object contour between two usersupplied control
points. We propose a new segmentation tool, where the user specifies a
rough corridor along the object boundary, instead of placing
control points. This tool is based on a new algorithm for computing
circular shortest paths. Finally, the algorithm is extended with a
tracking component for video data.


Part III: From Camera Motion to 3D Models  
Chapter
12: Estimation of Physical Camera Parameters (pdf) We discuss the problem of
calculating the physical motion parameters from the frametoframe
homographies. These cameramotion parameters represent a valuable
feature for sequence classification, or augmentedreality applications
that
insert computergenerated 3D objects into a natural scene. We present two algorithm variants: the first algorithm applies a fast linear optimization approach, whereas the second algorithm uses a nonlinear bundleadjustment algorithm that provides a high accuracy. Both algorithms have been combined with our multisprite technique to support unrestricted camera motion. 

Chapter 13: Camera Calibration for the
Analysis of Sport Videos (pdf) For the indepth analysis of
sports like tennis or soccer, it is required to know the positions
of the players on the tennis court or soccer field. To obtain these
positions, it is necessary to establish the transformation between
image coordinates and realworld coordinates. We employ a model of the
playing field to define the realworld reference coordinate system
and we describe a new algorithm that localize the userdefined
geometric model in an input image in order to find the geometric
transformation between both.


Chapter
14: Panoramic Video and Floor Plan Reconstruction (pdf) Previously,
we have used a planar projection of the scene background. However, the
most frequentlyused model for panoramic images is the cylindrical
panorama. Since cylindrical images are a special kind of image with
geometric distortions, their contents are not always easy to
interpret. Therefore, we propose new visualization technique that
reconstructs the geometry of the room in which the panoramic image
was recorded. Finally, this concept is generalized to reconstruct a
complete floor plan based on multiple panoramic images.
Additionally to the aforementioned improved visualization, this enables
new applications like the presentation of real estate with virtual
tours through the appartment.


Chapter
15: Conclusions (pdf) This thesis has presented various
techniques for videoobject segmentation. An automatic segmentation
system for rotating cameras was presented and extended with
objectmodel controlled segmentation and camera autocalibration. In
this chapter, the achievements are summarized and it is discussed
how these techniques could be enhanced in future research. Finally,
interesting research directions are highlighted that may be
promising approaches for future videosegmentation systems.


Appendices  
Appendix
A: VideoSummarization with Scene Preknowledge (pdf) For long video sequences, it is
not easy to quickly get an overview of the content.
If the video content could be summarized with a collection of representative keyframes, the user could get an immediate overview. This appendix presents a new twostep clusteringbased algorithm. The first stage provides a soft form of shot separation, determining periods of stable content. The second clustering stage then selects the final set of keyframes. Image content which is not desired in the summary can be excluded by providing sample images of shots to exclude. 

Appendix
B: Efficient Computation of Homographies From Four Correspondences
(pdf) The usual way to compute the
parameters of a projective transform from four
point coordinates is to set up an 8x8 equation system. This appendix describes an algorithm to obtain these parameters with only the inversion of a 3x3 equation system. 

Appendix
C: Robust Motion Estimation with LTS and LMedS (pdf) In Chapter 4, we employed the
RANSAC algorithm to calculate globalmotion parameters for a set of
pointcorrespondences between two images. One disadvantage of the
RANSAC algorithm is its dependence on the inlier threshold.
In this chapter, we look at two alternative algorithms: LeastMedianofSquares (LMedS), and LeastTrimmedSquares (LTS). 

Appendix D: Additional Test Sequences (pdf)  
Appendix
E: Color Segmentation Using Region Merging (pdf) In this appendix, we briefly
introduce the regionmerging algorithm and present several merging
criteria and evaluate their performance in terms of noise robustness
and subjective segmentation quality. Furthermore, we introduce a
new merging criterion yielding a better subjective segmentation
quality, and propose to change to merging criterion during
processing to further increase the overall robustness and segmentation
quality. (Content
of this chapter has received a best paper award.)


Appendix F: ShapeBased Analysis of Object
Behaviour (pdf) One example application of
segmentation is to use the obtained object masks to identify the
object. Since objects usually do not appear static, but perform some
motion, this can also be analysed and assigned to subclasses like
"walking human", or "sitting human". We combined a
classification into several predefined classes with a model
of the transition probability between these classes over time. Having a
model that describes the transitions between classes makes the
classification more robust than an independent classification for
each input frame.


References, Summary, and Biography (pdf)  
Thesis statements (Stellingen) (pdf) 
The full text of the PhD thesis is also available as PDF (60 MB, 561 pages, excluding thesis statements).
There are a few printed copies left (two books with partly colored figures). If you would like to receive one of them, simply contact me by .