YOLOv8 in Production: Building a Multi-Camera CCTV Anomaly Detection System
YOLOv8 achieves state-of-the-art object detection benchmarks on academic datasets. That's well documented. What's less documented is what happens when you deploy YOLOv8 to process 8 simultaneous CCTV feeds in real time, detect anomalies across zone-based business rules, deliver WebSocket alerts to a security dashboard under 200ms, and keep false positives low enough that security staff don't start ignoring the alerts.
I built a multi-camera CCTV anomaly detection system through four evolutionary phases, from a single-camera prototype to a production system processing 8 simultaneous feeds at 91% mAP on the production evaluation set.
What Anomaly Detection Actually Means in This Context
For this system, anomaly detection means detecting specific pre-defined behavioral patterns that violate business rules:
- Unauthorized access to restricted zones
- Loitering beyond a configurable time threshold
- Crowd density exceeding zone-specific limits
- Object left unattended beyond threshold duration
- People count falling below minimum staffing levels in service areas
These are violations of explicit rules applied to detected objects in defined zones. YOLOv8 provides the object detection foundation. The business logic layer above it defines what constitutes an anomaly.
Phase 1: Single Camera Prototype
Model selection
YOLOv8 comes in five sizes: nano (n), small (s), medium (m), large (l), and extra-large (x). For real-time CCTV processing on an NVIDIA T4 GPU, the two smallest were the realistic candidates:
- YOLOv8n: 45 FPS on T4, 78% mAP on our evaluation set
- YOLOv8s: 28 FPS on T4, 86% mAP on our evaluation set
Dividing that throughput evenly across 8 cameras, YOLOv8s gives 3.5 FPS per camera and YOLOv8n gives 5.6 FPS per camera. Neither was enough. After INT8 quantization, YOLOv8s reached 67 FPS on T4, or 8.4 FPS per camera, which is acceptable for this use case since behavioral anomalies unfold over seconds.
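The per-camera figures are simply aggregate throughput divided by the number of cameras. A minimal sketch of that arithmetic, using the throughput numbers quoted in this section (the helper name `per_camera_fps` is hypothetical):

```python
def per_camera_fps(aggregate_fps: float, num_cameras: int) -> float:
    """Split a model's aggregate throughput evenly across cameras."""
    if num_cameras <= 0:
        raise ValueError("num_cameras must be positive")
    return aggregate_fps / num_cameras


# Aggregate FPS on a T4, as quoted in the text.
measured = {"YOLOv8n": 45, "YOLOv8s": 28, "YOLOv8s INT8": 67}

# Per-camera frame budget with 8 feeds, rounded to one decimal place.
budget = {name: round(per_camera_fps(fps, 8), 1) for name, fps in measured.items()}
```

This assumes every camera gets an equal share of frames; a deployment that prioritizes some feeds would split the budget differently.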
RTSP stream handling
RTSP is the standard for CCTV cameras. Handling RTSP streams reliably requires explicit reconnection logic — cameras go offline, network connections drop. We wrap each stream in a thread that monitors connection health and reconnects with exponential backoff. Camera status is tracked separately from detection.
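The reconnection loop can be sketched roughly as follows. This is an illustration under assumptions, not our production code: `CameraStream`, `backoff_delay`, and the `open_stream` callable are hypothetical names, and the backoff base and cap are placeholder values. In a real deployment `open_stream` might wrap OpenCV's `cv2.VideoCapture`; here it is injected so the sketch stays dependency-free.

```python
import threading
import time
from typing import Callable, Optional


def backoff_delay(failures: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return min(base * (2 ** min(failures, 16)), cap)


class CameraStream(threading.Thread):
    """Keeps one stream alive. `status` can be read without touching detection."""

    def __init__(self,
                 open_stream: Callable[[], Optional[Callable[[], object]]],
                 sleep: Callable[[float], None] = time.sleep):
        super().__init__(daemon=True)
        # open_stream() returns a read() callable, or None if the connect failed.
        self.open_stream = open_stream
        self.sleep = sleep
        self.status = "connecting"
        self.latest_frame = None
        self.stop_event = threading.Event()

    def run(self) -> None:
        failures = 0
        while not self.stop_event.is_set():
            read = self.open_stream()
            if read is not None:
                self.status = "online"
                while not self.stop_event.is_set():
                    frame = read()
                    if frame is None:      # stream dropped mid-session
                        break
                    self.latest_frame = frame
                    failures = 0           # healthy again: reset the backoff
            if self.stop_event.is_set():
                break
            self.status = "offline"
            self.sleep(backoff_delay(failures))
            failures += 1
```

Each camera would get its own `CameraStream`, started with `start()`; the detection loop only reads `latest_frame` and `status`, so a dead camera never blocks the others.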
Phase 2: Multi-Camera Architecture
Frame batching across cameras
Processing each camera in an independent thread with its own model inference wastes GPU resources. Instead, we collect one frame from each active camera, batch them into a single tensor, and run a single batched inference call. The GPU processes the batch in parallel, so the 67 FPS figure is the aggregate throughput shared by all 8 cameras (about 8.4 FPS each).
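A rough sketch of the batching step, assuming each camera thread exposes its newest frame in a shared dictionary. The names `build_batch`, `detect_all`, and `infer_batch` are hypothetical; `infer_batch` stands in for the model call and is expected to return one result per input frame (an Ultralytics `YOLO` object called with a list of images behaves this way).

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def build_batch(latest_frames: Dict[str, Any]) -> Tuple[List[str], List[Any]]:
    """Take the newest frame from every camera that currently has one."""
    camera_ids = [cid for cid, frame in latest_frames.items() if frame is not None]
    return camera_ids, [latest_frames[cid] for cid in camera_ids]


def detect_all(latest_frames: Dict[str, Any],
               infer_batch: Callable[[List[Any]], Sequence[Any]]) -> Dict[str, Any]:
    """Run ONE inference call for all active cameras and map results back."""
    camera_ids, frames = build_batch(latest_frames)
    if not frames:
        return {}
    results = infer_batch(frames)  # single batched call, not one call per camera
    if len(results) != len(frames):
        raise ValueError("infer_batch must return one result per frame")
    return dict(zip(camera_ids, results))
```

Cameras that are offline simply contribute no frame, so the batch shrinks instead of stalling.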
Zone definition
Each camera has configurable detection zones — polygonal regions defined in pixel coordinates, stored in the database and loaded at startup. Changing zone boundaries requires no code changes or redeployment. For each detected object, we calculate zone occupancy using point-in-polygon testing.
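For illustration, here is a minimal ray-casting point-in-polygon test. Which point of the bounding box gets tested is an assumption on my part: this sketch uses the bottom-center of the box (roughly where a person's feet are), and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many polygon edges a rightward ray from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def anchor_point(box: Tuple[float, float, float, float]) -> Point:
    """Bottom-center of an (x1, y1, x2, y2) box, in pixel coordinates."""
    x1, _y1, x2, y2 = box
    return ((x1 + x2) / 2, y2)


def zones_for_box(box: Tuple[float, float, float, float],
                  zones: Dict[str, Sequence[Point]]) -> List[str]:
    """Names of all zones whose polygon contains the box's anchor point."""
    x, y = anchor_point(box)
    return [name for name, polygon in zones.items() if point_in_polygon(x, y, polygon)]
```

Because the polygons are plain coordinate lists, they can be loaded straight from database rows at startup.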
Object tracking
To apply time-based rules (loitering threshold, unattended object duration), we need persistent object identities across frames. We use ByteTrack, a lightweight multi-object tracker that assigns stable IDs across frames even through brief occlusions. Each tracked object maintains: track ID, first/last detection timestamp, current zone, and detection history.
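The per-track state might look something like the sketch below. It takes track IDs as given (the tracker produces them) and only shows the bookkeeping. The `zone_entered_at` field and the `dwell_seconds` helper are my additions for illustration; they are one way to derive time-in-zone from the fields listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrackState:
    track_id: int
    first_seen: float
    last_seen: float
    zone: Optional[str] = None
    zone_entered_at: Optional[float] = None  # assumption: not named in the text
    history: List[Tuple[float, Optional[str]]] = field(default_factory=list)

    def update(self, timestamp: float, zone: Optional[str]) -> None:
        """Record one detection; restart the dwell clock when the zone changes."""
        if zone != self.zone:
            self.zone = zone
            self.zone_entered_at = timestamp if zone is not None else None
        self.last_seen = timestamp
        self.history.append((timestamp, zone))

    def dwell_seconds(self, now: float) -> float:
        """Seconds spent continuously in the current zone (0.0 if in no zone)."""
        if self.zone_entered_at is None:
            return 0.0
        return now - self.zone_entered_at


def observe(tracks: Dict[int, TrackState], track_id: int,
            timestamp: float, zone: Optional[str]) -> TrackState:
    """Create the track on first sight, then fold in the new detection."""
    state = tracks.get(track_id)
    if state is None:
        state = tracks[track_id] = TrackState(track_id, timestamp, timestamp)
    state.update(timestamp, zone)
    return state
```

A loitering check then reduces to comparing `dwell_seconds(now)` against the rule's threshold.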
Phase 3: Business Rules Engine and Alert Generation
Rules are defined in YAML rather than code. Each rule specifies camera ID, zone, object class, condition type, threshold, severity, and optional time windows.
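To make the rule shape concrete, here is a hedged sketch of how a loaded rule might be evaluated against the current tracked objects. The field names follow the YAML example in this section; the `evaluate_rule` function and the object dictionary layout are hypothetical, and `time_window` handling is left out.

```python
from typing import Any, Dict, List


def evaluate_rule(rule: Dict[str, Any], objects: List[Dict[str, Any]]) -> bool:
    """Return True if `rule` is currently violated.

    Each object is a dict with camera_id, zone, object_class and dwell_seconds.
    """
    matching = [
        o for o in objects
        if o["camera_id"] == rule["camera_id"]
        and o["zone"] == rule["zone"]
        and o["object_class"] == rule["object_class"]
    ]
    condition = rule["condition"]
    if condition == "duration_in_zone":
        # Violated when any matching object has stayed beyond the threshold.
        return any(o["dwell_seconds"] > rule["threshold_seconds"] for o in matching)
    if condition == "count_below":
        # Violated when fewer matching objects are present than required.
        return len(matching) < rule["threshold_count"]
    raise ValueError(f"unknown condition type: {condition}")
```

Adding a new condition type means adding one branch here; adding a new rule of an existing type means editing only the YAML.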
rules:
  - name: "loitering_restricted_zone"
    camera_id: "cam_02"
    zone: "server_room_entrance"
    object_class: "person"
    condition: "duration_in_zone"
    threshold_seconds: 30
    alert_severity: "high"
  - name: "low_staffing_checkout"
    camera_id: "cam_05"
    zone: "checkout_area"
    object_class: "person"
    condition: "count_below"
    threshold_count: 2
    alert_severity: "medium"
    time_window: "business_hours"
False positive reduction
Raw detection results from YOLOv8 are noisy. We apply two filters before generating alerts:
- Temporal debouncing: a rule must be continuously triggered for N frames before generating an alert — brief triggers are filtered as noise
- Confidence thresholding: detections below 0.6 confidence are excluded from rule evaluation
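Both filters are small. A minimal sketch, assuming detections arrive as dictionaries with a `confidence` field and that an alert should fire once per continuous streak (the text does not specify repeat behavior, so that part is my assumption; `Debouncer` and `filter_detections` are hypothetical names):

```python
from typing import Any, Dict, List

MIN_CONFIDENCE = 0.6  # threshold quoted in the text


def filter_detections(detections: List[Dict[str, Any]],
                      min_confidence: float = MIN_CONFIDENCE) -> List[Dict[str, Any]]:
    """Drop detections below the confidence threshold before rule evaluation."""
    return [d for d in detections if d["confidence"] >= min_confidence]


class Debouncer:
    """Fires only after a rule has been triggered on N consecutive frames."""

    def __init__(self, required_frames: int):
        self.required_frames = required_frames
        self.streaks: Dict[str, int] = {}

    def update(self, key: str, triggered: bool) -> bool:
        """Feed one frame's result; True exactly when the streak reaches N."""
        if not triggered:
            self.streaks[key] = 0  # any gap resets the streak
            return False
        self.streaks[key] = self.streaks.get(key, 0) + 1
        return self.streaks[key] == self.required_frames
```

The `key` would typically combine rule name and track ID, so one person's flicker does not reset another person's streak. At 8.4 FPS per camera, N = 8 corresponds to roughly one second of continuous triggering.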
Phase 4: Real-Time Dashboard and Alert Delivery
Alert delivery architecture
The rules engine publishes alert events to a Redis channel. A dedicated alert delivery service subscribes and pushes events to connected WebSocket clients in the appropriate security group. The pub-sub pattern decouples detection performance from delivery performance — a slow WebSocket client doesn't affect the detection pipeline.
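The decoupling property is easy to show in miniature. The sketch below is an in-process stand-in built on `asyncio` queues, not the Redis and WebSocket code itself: `AlertHub` is a hypothetical name, and dropping the oldest pending alert for a slow client is an assumed policy rather than something described above.

```python
import asyncio
from typing import Any, Dict, Set


class AlertHub:
    """Fan alerts out to per-client queues, grouped by security group."""

    def __init__(self, max_pending: int = 100):
        self.max_pending = max_pending
        self.groups: Dict[str, Set[asyncio.Queue]] = {}

    def subscribe(self, group: str) -> asyncio.Queue:
        """Register a client; its delivery task reads from the returned queue."""
        queue: asyncio.Queue = asyncio.Queue(maxsize=self.max_pending)
        self.groups.setdefault(group, set()).add(queue)
        return queue

    def unsubscribe(self, group: str, queue: asyncio.Queue) -> None:
        self.groups.get(group, set()).discard(queue)

    def publish(self, group: str, alert: Any) -> None:
        """Never blocks the publisher: a full queue drops its oldest alert."""
        for queue in self.groups.get(group, ()):
            if queue.full():
                queue.get_nowait()  # slow client loses its oldest pending alert
            queue.put_nowait(alert)
```

Because `publish` never awaits a client, the detection side pays the same cost whether every dashboard is keeping up or one of them has stalled.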
Snapshot image handling
Each alert includes a snapshot of the triggering frame with the relevant zone and detected object highlighted, delivered within the 200ms budget:
- Snapshot cropped and resized to dashboard display size at generation time, not delivery time
- Compressed to JPEG quality 75 — readable for identification, fast to transfer
- Uploaded to Azure Blob Storage asynchronously; alert delivered with a time-limited SAS URL (Azure's equivalent of a pre-signed URL)
- Dashboard loads image lazily — alert appears immediately, image loads as available
Production Numbers
- Concurrent CCTV feeds: 8 simultaneous streams on a single NVIDIA T4
- Detection accuracy: 91% mAP on the production evaluation set
- False positive rate: 4% (down from 23% before debouncing and confidence thresholding)
- Alert delivery latency: P95 < 180ms from triggering event to dashboard push
- Frame processing rate: 8.4 FPS per camera (INT8 quantized YOLOv8s on T4)
- Infrastructure cost: 60% lower than a single-camera-per-instance approach
Final Thoughts
Building a production CCTV anomaly detection system is substantially more complex than running YOLOv8 inference on video frames. The detection layer is the starting point, not the destination. The real engineering is in the tracking, the business rules engine, the false positive reduction, and the delivery infrastructure.
The false positive problem is the most underestimated challenge. A system that pages security staff 20 times per shift with false alerts trains them to ignore all alerts — including the real ones. Getting false positives below 5% required more engineering effort than everything else combined.
Solve the false positives first. Everything else is infrastructure.
Frequently Asked Questions
How many CCTV feeds can YOLOv8 process simultaneously in real time?
YOLOv8s quantized to INT8 achieves 67 FPS on an NVIDIA T4 GPU. With frame batching across cameras — collecting one frame per camera into a single batched inference call — this supports 8 simultaneous CCTV feeds at approximately 8.4 FPS per camera, sufficient for behavioral anomaly detection where events unfold over seconds.
How do you reduce false positives in AI-powered CCTV anomaly detection?
Apply two filters before generating alerts: temporal debouncing (require the rule to trigger for N consecutive frames before alerting, filtering brief false triggers as noise) and confidence thresholding (exclude YOLOv8 detections below 0.6 confidence). These two filters reduced our production false positive rate from 23% to 4%, which is the threshold where security staff begin to trust the system.
How do you track objects across video frames for time-based anomaly rules like loitering detection?
Use ByteTrack, a lightweight multi-object tracker that assigns stable IDs to detected objects across frames even through brief occlusions. Each tracked object maintains its track ID, first and last detection timestamps, current zone, and detection history, which enables rules like loitering detection (object in restricted zone for more than N seconds) and unattended object alerts.