Library Concepts

The following picture shows the main flow diagram behind all OpenTL-based applications. This scheme follows our model-based tracking pipeline concept, which produces the desired output (an estimate of the state probability density) from raw image data by passing through multiple processing levels. In our framework, any tracking pipeline consists of the following steps:

Another important part of the scheme is the re-initialization module (detector), which is responsible for finding new objects and initializing their state estimates by performing a global search in state space. In OpenTL, this module uses the same facilities as the measurement processing step of the pipeline (pre-processing, multi-modal and multi-camera data association and fusion), but works on a static level, since no state prediction is available yet.

Library Architecture

The picture below shows an abstract organization of functionalities inside OpenTL, where the layers reflect the semantics of each module involved:

Library Modules

Following the previous abstraction, this section describes the internal module implementation of OpenTL, which is also organized hierarchically to reflect the dependencies across modules (indicated by arrows).

Layer 7: User Application

The HighAPI module in the topmost layer is meant to encapsulate the tracking pipeline in a more compact and user-friendly API, with an easier specification of the system and its parameters. This module is currently a work in progress.

Layer 6: Tracking pipeline

Here the main tracking pipeline is realized, through the main abstractions tracker and detector, as well as sensory input and output visualization.
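The interplay between a detector and a tracker can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenTL code: the names Measurement, Estimate, Detector1D, Tracker1D and runPipeline are hypothetical, the state is one-dimensional, and the scalar Kalman-style update is simply one convenient choice of estimator.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-frame measurement; present == false means "not observed".
struct Measurement { bool present; double value; };

// Hypothetical 1D Gaussian state estimate; valid == false means "no target".
struct Estimate { bool valid; double mean; double var; };

// Hypothetical detector: works without any state prediction and simply
// turns an available measurement into an initial estimate.
struct Detector1D {
    double initVar;  // variance assigned to a freshly detected target
    Estimate detect(const Measurement& z) const {
        if (!z.present) return Estimate{false, 0.0, 0.0};
        return Estimate{true, z.value, initVar};
    }
};

// Hypothetical tracker: random-walk prediction followed by a scalar
// Kalman correction (an assumed estimator, chosen for brevity).
struct Tracker1D {
    double processVar;
    double measVar;
    Estimate predict(const Estimate& s) const {
        return Estimate{true, s.mean, s.var + processVar};
    }
    Estimate correct(const Estimate& s, double z) const {
        const double gain = s.var / (s.var + measVar);
        return Estimate{true, s.mean + gain * (z - s.mean), (1.0 - gain) * s.var};
    }
};

// Runs detector and tracker over a sequence of frames. The target is
// dropped after more than maxMisses consecutive unobserved frames, and
// the detector re-initializes it on a later observation.
inline std::vector<Estimate> runPipeline(const std::vector<Measurement>& frames,
                                         const Detector1D& detector,
                                         const Tracker1D& tracker,
                                         int maxMisses) {
    assert(maxMisses >= 0);
    std::vector<Estimate> out;
    Estimate state{false, 0.0, 0.0};
    int misses = 0;
    for (const Measurement& z : frames) {
        if (!state.valid) {
            state = detector.detect(z);          // (re-)initialization
            misses = 0;
        } else {
            state = tracker.predict(state);      // prediction
            if (z.present) {
                state = tracker.correct(state, z.value);  // correction
                misses = 0;
            } else if (++misses > maxMisses) {
                state = Estimate{false, 0.0, 0.0};        // target lost
            }
        }
        out.push_back(state);
    }
    return out;
}
```

Calling runPipeline with a frame sequence that contains gaps shows the hand-over: the estimate is invalid until the first observation, is refined while measurements arrive, and is re-initialized by the detector after the target has been lost.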

Layer 5: Multi-modal visual processing

Layer 5 contains the visual modalities for tracking. They perform all model-based processing operations required for data association and fusion (over both multiple modalities and multiple targets), and deliver output measurements to the trackers and detectors of the upper layer. The layer provides a common abstraction for model- and state-based measurement processing (pre-processing, feature sampling, data association, likelihood or explicit residual computation, data fusion, and feature update after the state update), specialized for each visual modality (color, template, edges, ...). It depends on the ModelMapping module, because it needs to map points and features between object and image space in order to perform matching, sampling, and update operations. It also depends on CvProcess, because modalities use specific image-based processing (e.g. pre-processing, detection, or matching).
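The idea of a common modality abstraction with likelihood computation and fusion can be sketched as below. All names (State, Modality, GaussianModality, fusedLikelihood) are hypothetical and do not reflect the actual OpenTL classes. Fusion by multiplying likelihoods is an assumption that holds only if the modalities are conditionally independent given the state.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

// Hypothetical 1D state hypothesis.
struct State { double x; };

// Hypothetical common interface: every modality scores a state hypothesis
// against its own measurement.
class Modality {
public:
    virtual ~Modality() = default;
    virtual double likelihood(const State& s) const = 0;
};

// Illustrative modality with an unnormalized Gaussian likelihood
// centered on an observed value.
class GaussianModality : public Modality {
public:
    GaussianModality(double observed, double sigma)
        : observed_(observed), sigma_(sigma) {
        assert(sigma > 0.0);
    }
    double likelihood(const State& s) const override {
        const double r = (s.x - observed_) / sigma_;  // normalized residual
        return std::exp(-0.5 * r * r);
    }
private:
    double observed_;
    double sigma_;
};

// Fuses all modalities by multiplying their likelihoods
// (assumes conditional independence given the state).
inline double fusedLikelihood(
    const std::vector<std::unique_ptr<Modality>>& modalities, const State& s) {
    double l = 1.0;
    for (const auto& m : modalities) l *= m->likelihood(s);
    return l;
}
```

With two such modalities observing different values, the fused likelihood peaks between the two observations, which is the behavior one would expect from a tracker that weighs several cues against each other.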

Layer 4: Object-to-sensor space mapping

The fourth layer consists of classes that map between object and sensor (in particular, image) spaces. It also includes advanced GPU-based facilities, for example a sampler of visible model edges from any given viewpoint. In detail, it provides object-to-image and image-to-object mapping facilities, such as geometric point warps and their derivatives, as well as GPU-based rendering and sampling of visible features.
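A geometric point warp with its derivatives can be illustrated with a pinhole camera projection. This is a sketch under assumptions: the pinhole model, the Intrinsics and Projection types, and the projectPoint function are hypothetical and are not taken from OpenTL.

```cpp
#include <array>
#include <cassert>

// Hypothetical pinhole intrinsics: focal lengths and principal point in pixels.
struct Intrinsics { double fx, fy, cx, cy; };

struct Projection {
    std::array<double, 2> pixel;                    // (u, v)
    std::array<std::array<double, 3>, 2> jacobian;  // d(u, v) / d(X, Y, Z)
};

// Projects a 3D point given in camera coordinates onto the image plane and
// returns the analytic Jacobian of the pixel with respect to the point.
inline Projection projectPoint(const Intrinsics& k, double X, double Y, double Z) {
    assert(Z > 0.0);  // the point must lie in front of the camera
    const double invZ = 1.0 / Z;
    Projection p;
    p.pixel[0] = k.fx * X * invZ + k.cx;
    p.pixel[1] = k.fy * Y * invZ + k.cy;
    // Row 0: derivatives of u; row 1: derivatives of v.
    p.jacobian[0][0] = k.fx * invZ;
    p.jacobian[0][1] = 0.0;
    p.jacobian[0][2] = -k.fx * X * invZ * invZ;
    p.jacobian[1][0] = 0.0;
    p.jacobian[1][1] = k.fy * invZ;
    p.jacobian[1][2] = -k.fy * Y * invZ * invZ;
    return p;
}
```

The Jacobian is what a gradient-based tracker would chain with the pose derivatives to relate image residuals to state updates.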

Layer 3: Tracking data and image processing

Layer 3 holds model-free image processing facilities, as well as object data on a higher level than the core module (i.e. target-related data); GPU shaders for model-free image processing are also provided here. None of the functions implemented in this module uses prior models or state hypotheses: examples include edge detection, color conversion, camera image de-Bayering, and invariant keypoint detection. General GPU shader management and standard shaders are also part of this module.
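As an example of a model-free operation, the sketch below computes a Sobel gradient magnitude, a common first step of edge detection. It uses only the pixel values and no model or state hypothesis. The GrayImage type and the sobelMagnitude function are hypothetical; OpenTL's own image classes and its CPU/GPU implementations are not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal single-channel image stored row by row.
struct GrayImage {
    std::size_t width;
    std::size_t height;
    std::vector<float> data;
    float at(std::size_t x, std::size_t y) const { return data[y * width + x]; }
};

// Sobel gradient magnitude for interior pixels; border pixels stay at 0.
inline GrayImage sobelMagnitude(const GrayImage& in) {
    assert(in.data.size() == in.width * in.height);
    GrayImage out{in.width, in.height, std::vector<float>(in.data.size(), 0.0f)};
    if (in.width < 3 || in.height < 3) return out;  // no interior pixels
    for (std::size_t y = 1; y + 1 < in.height; ++y) {
        for (std::size_t x = 1; x + 1 < in.width; ++x) {
            // Horizontal derivative: right column minus left column.
            const float gx =
                (in.at(x + 1, y - 1) + 2.0f * in.at(x + 1, y) + in.at(x + 1, y + 1)) -
                (in.at(x - 1, y - 1) + 2.0f * in.at(x - 1, y) + in.at(x - 1, y + 1));
            // Vertical derivative: bottom row minus top row.
            const float gy =
                (in.at(x - 1, y + 1) + 2.0f * in.at(x, y + 1) + in.at(x + 1, y + 1)) -
                (in.at(x - 1, y - 1) + 2.0f * in.at(x, y - 1) + in.at(x + 1, y - 1));
            out.data[y * in.width + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return out;
}
```

On an image with a vertical step edge, the response is nonzero only at the interior pixels next to the step, and a constant image yields zero everywhere.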

Layer 2: Base data structures and pose representations

The core module contains the base data types and processing classes, including the main Pose abstraction (object-space transformation matrices and Jacobian computation), the Image, and the abstraction for Feature data. Most of the OpenTL data structures are defined inside the cvdata namespace of this module. All of them inherit from the base abstraction CvData and serve a wide range of purposes: state-space representations (which in turn inherit from a general Pose abstraction); image data; shape and appearance models; and visual features of widely varying nature and descriptor type (inheriting from a common abstraction).
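The inheritance pattern described above (a common data base class, a general pose abstraction, and concrete state-space representations that provide transformation matrices and Jacobians) might look roughly like the following sketch. The class names DataObject, PoseBase and RigidPose2D are hypothetical stand-ins and are deliberately not the real OpenTL identifiers; the 2D rigid pose is chosen only because it is compact.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical stand-in for a common base of all data types.
class DataObject {
public:
    virtual ~DataObject() = default;
};

// Hypothetical general pose abstraction: pose parameters map to a
// homogeneous transformation matrix and its parameter derivatives.
class PoseBase : public DataObject {
public:
    virtual std::size_t dof() const = 0;
    virtual Mat3 matrix() const = 0;                 // T(p)
    virtual Mat3 jacobian(std::size_t i) const = 0;  // dT / dp_i
};

// Concrete 2D rigid pose with parameters (theta, tx, ty).
class RigidPose2D : public PoseBase {
public:
    RigidPose2D(double theta, double tx, double ty)
        : theta_(theta), tx_(tx), ty_(ty) {}

    std::size_t dof() const override { return 3; }

    Mat3 matrix() const override {
        const double c = std::cos(theta_);
        const double s = std::sin(theta_);
        Mat3 t{};  // zero-initialized
        t[0][0] = c;  t[0][1] = -s;  t[0][2] = tx_;
        t[1][0] = s;  t[1][1] = c;   t[1][2] = ty_;
        t[2][2] = 1.0;
        return t;
    }

    Mat3 jacobian(std::size_t i) const override {
        assert(i < 3);
        Mat3 d{};  // zero-initialized
        if (i == 0) {  // derivative with respect to theta
            const double c = std::cos(theta_);
            const double s = std::sin(theta_);
            d[0][0] = -s;  d[0][1] = -c;
            d[1][0] = c;   d[1][1] = -s;
        } else if (i == 1) {  // derivative with respect to tx
            d[0][2] = 1.0;
        } else {              // derivative with respect to ty
            d[1][2] = 1.0;
        }
        return d;
    }

private:
    double theta_, tx_, ty_;
};
```

Code in the upper layers could then work against PoseBase alone, so that other state-space representations can be swapped in without changing the callers.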

Layer 1: Matrix computations

The base layer contains facilities for algebra and matrix computation and manipulation, as well as general math utilities. It serves as the foundation of the whole OpenTL software implementation. The math implementation is currently based on that of OpenCV and therefore also uses the Intel IPP library if it is installed. Backends other than the OpenCV math implementation are planned.