4.8 Real-Time Vision Pipeline Frameworks

Why This Matters

An accurate model alone does not create a real-time vision system.

A working system also needs:

input from cameras or files
decode and frame transport
inference and post-processing
display, storage, or downstream communication

That whole chain is called a pipeline.

This section introduces the pipeline-level thinking needed to understand real-time vision systems before moving on to DeepStream and Jetson-specific services.

Learning Objectives

By the end of this section, you should be able to:

explain the main stages of a real-time vision pipeline
compare simple application pipelines with framework-based pipelines
understand the role of OpenCV and GStreamer
write small examples for reading and processing live video
think in terms of end-to-end system flow instead of isolated model calls

Core Concepts / Theory

A Real-Time Pipeline Has Stages

A typical real-time visual pipeline includes:

source input
decode and frame acquisition
preprocessing
model inference
post-processing
rendering, storage, or transmission
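The stages above can be sketched as small functions chained together. This is a minimal, hedged illustration rather than a real vision stack: the frames are synthetic 2x2 lists, the "model" is just a mean-brightness score, and every function name and the threshold value are assumptions made for the example.

```python
from typing import Iterator, List

Frame = List[List[int]]  # stand-in for an image: rows of grayscale pixel values


def source(num_frames: int) -> Iterator[Frame]:
    """Source input and frame acquisition: yield synthetic 2x2 frames."""
    for i in range(num_frames):
        yield [[i, i + 1], [i + 2, i + 3]]


def preprocess(frame: Frame) -> List[float]:
    """Preprocessing: flatten the frame and scale pixel values to [0, 1]."""
    return [pixel / 255.0 for row in frame for pixel in row]


def infer(tensor: List[float]) -> float:
    """Inference placeholder: mean brightness stands in for a model output."""
    return sum(tensor) / len(tensor)


def postprocess(score: float, threshold: float = 0.01) -> dict:
    """Post-processing: turn the raw score into a labeled result."""
    return {"score": score, "bright": score > threshold}


def run_pipeline(num_frames: int) -> List[dict]:
    """Run every frame through all stages and collect results (the 'storage' step)."""
    results = []
    for frame in source(num_frames):
        results.append(postprocess(infer(preprocess(frame))))
    return results


if __name__ == "__main__":
    for result in run_pipeline(3):
        print(result)
```

Keeping each stage as its own function makes it easier to time, replace, or move a single stage later without touching the rest of the chain.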

Why Pipelines Matter

The stages run as a chain, so one unstable or slow stage degrades the whole system, no matter how fast the others are.

For example:

a bad RTSP stream can create dropped frames
slow preprocessing can increase latency
unnecessary display overhead can limit throughput
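One way to find out which stage is hurting the system is to time each stage separately. The sketch below assumes each stage is a plain Python callable that takes one item and returns one item; `time_stages` and `slowest_stage` are hypothetical helper names, and the sleeps in the demo only simulate work.

```python
import time
from collections import defaultdict


def time_stages(stages, items):
    """Run each item through the stages in order, accumulating seconds per stage.

    `stages` is a list of (name, function) pairs.
    """
    totals = defaultdict(float)
    for item in items:
        for name, fn in stages:
            start = time.perf_counter()
            item = fn(item)
            totals[name] += time.perf_counter() - start
    return dict(totals)


def slowest_stage(totals):
    """Return the name of the stage with the largest accumulated time."""
    return max(totals, key=totals.get)


if __name__ == "__main__":
    def fake_preprocess(x):
        time.sleep(0.002)  # simulated work
        return x

    def fake_inference(x):
        time.sleep(0.010)  # simulated work
        return x

    totals = time_stages(
        [("preprocess", fake_preprocess), ("inference", fake_inference)],
        range(5),
    )
    print(totals)
    print("bottleneck:", slowest_stage(totals))
```

Measuring before optimizing matters here: the slow stage is often decode, copy, or display rather than the model itself.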

OpenCV vs GStreamer

OpenCV is convenient for learning and application logic.

GStreamer is stronger when you need robust media pipelines, streaming, and real-time data flow.

Both are useful, but they serve different roles.
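The two can also be combined: when OpenCV is built with GStreamer support, `cv2.VideoCapture` accepts a GStreamer pipeline description together with the `cv2.CAP_GSTREAMER` backend flag, provided the pipeline ends in `appsink`. The sketch below only builds the description string, so it runs without OpenCV installed. The helper name `gst_pipeline` is hypothetical, and whether the capture call works depends on how your OpenCV was built.

```python
def gst_pipeline(*elements: str) -> str:
    """Join GStreamer elements with ' ! ', the link syntax used by gst-launch-1.0."""
    return " ! ".join(element.strip() for element in elements)


# A test-pattern source converted to BGR frames and handed to the application.
TEST_PIPELINE = gst_pipeline(
    "videotestsrc",
    "videoconvert",
    "video/x-raw,format=BGR",
    "appsink",
)

if __name__ == "__main__":
    print(TEST_PIPELINE)
    # Assumption: OpenCV was built with GStreamer support. If so, the string
    # can be opened like any other capture source:
    #
    #   import cv2
    #   cap = cv2.VideoCapture(TEST_PIPELINE, cv2.CAP_GSTREAMER)
    #   ok, frame = cap.read()
```

With this split, GStreamer handles media transport and decode while OpenCV keeps the application logic.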

Key Terms

Source: where frames come from
Decode: converting compressed video into frames
Preprocessing: preparing frames for inference
Post-processing: turning raw predictions into useful output
Latency: end-to-end delay through the pipeline
Throughput: number of frames (or streams) processed per unit of time, usually reported as frames per second (FPS)
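Latency and throughput are related but not interchangeable, and a short calculation shows why. The per-stage times below are made-up illustrative numbers, and the pipelined case is idealized: it assumes every stage runs on its own worker and ignores queueing and copy overhead.

```python
def end_to_end_latency_ms(stage_ms):
    """Latency: a single frame must pass through every stage in turn."""
    return sum(stage_ms.values())


def sequential_fps(stage_ms):
    """Throughput when one frame is fully processed before the next starts."""
    return 1000.0 / sum(stage_ms.values())


def pipelined_fps(stage_ms):
    """Idealized throughput when stages overlap: limited by the slowest stage."""
    return 1000.0 / max(stage_ms.values())


# Illustrative per-stage times in milliseconds (not measurements).
stage_ms = {
    "decode": 5.0,
    "preprocess": 5.0,
    "inference": 20.0,
    "postprocess": 10.0,
}

if __name__ == "__main__":
    print("latency (ms):", end_to_end_latency_ms(stage_ms))  # 40.0
    print("sequential FPS:", sequential_fps(stage_ms))       # 25.0
    print("pipelined FPS:", pipelined_fps(stage_ms))         # 50.0
```

Overlapping the stages doubles throughput in this example, but each frame still takes 40 ms to come out the other end, so latency is unchanged.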

Worked Example / Code Example

Simple OpenCV Live Pipeline

python

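import time

import cv2

# Open the default camera (device index 0).
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Could not open camera 0")

# perf_counter() is monotonic and better suited to short intervals than time().
prev_time = time.perf_counter()

try:
    while True:
        ok, frame = cap.read()
        if not ok:
            # The camera was disconnected or the stream ended.
            break

        # Instantaneous FPS from the time between consecutive frames.
        current_time = time.perf_counter()
        elapsed = current_time - prev_time
        prev_time = current_time
        fps = 1.0 / elapsed if elapsed > 0 else 0.0  # avoid division by zero

        cv2.putText(frame, f"FPS: {fps:.2f}", (20, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("OpenCV Live Pipeline", frame)

        # Press "q" to quit.
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
finally:
    # Release the camera and close windows even if an error occurs.
    cap.release()
    cv2.destroyAllWindows()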

GStreamer Example

bash

gst-launch-1.0 videotestsrc ! videoconvert ! autovideosink

RTSP Input Example