HARIS: Hostile Activity Recognition and Intelligence System

01 The problem we're solving

CCTV sees everything; operators cannot.

Most CCTV deployments are reactive. Hours of footage are reviewed only after an incident, cameras do not talk to each other, and the few smart systems that exist flood operators with false alarms or hide behind a single opaque score.

Operator fatigue is the failure mode

A security operator watching 16 camera feeds cannot physically pay attention to all of them. The moment that matters is usually the one nobody was watching.

Sixteen simultaneous feeds can turn even a trained operator into a delayed forensic reviewer.

False-alarm flooding breaks trust

Single detections are noisy in CCTV footage. A useful system must suppress weak signals, bind evidence over time, and show why an alert fired.

Handshake versus strike needs time

A single frame cannot reliably separate harmless contact from hostile motion. HARIS adds the temporal dimension through skeleton windows and tracked identities.

02 In Action

See the system working.

Real CCTV footage, annotated live by HARIS.

03 The pipeline

Four tiers, structured evidence.

The published CORE pipeline is a specialist chain. Each tier solves one sub-problem and passes structured evidence to the next, so alerts can be audited instead of treated as black-box scores.

  1. Tier 0

    Frame-difference motion gate

    Suppresses still frames and opens downstream inference only when motion evidence exists.

  2. Tier 1

    Dual RT-DETR detection

    Runs one fine-tuned weapon detector for gun and knife, plus one COCO-pretrained detector for people.

  3. Tier 2

    BoT-SORT and RTMPose-L

    Maintains person IDs and extracts 17-keypoint COCO skeletons for tracked people.

  4. Tier 3

    ST-GCN and Aggressor Logic Engine

    Classifies 30-frame skeleton windows and emits a JSON evidence chain for operator review.
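As a rough illustration of the Tier 0 idea, the sketch below opens the gate when the mean absolute difference between consecutive grayscale frames crosses a threshold. The frame format (flat lists of 0-255 values), the function names, and the threshold of 4.0 are assumptions made for this example, not the published HARIS implementation.

```python
from typing import Iterable, Iterator, Sequence

Frame = Sequence[int]  # flattened grayscale pixels, 0-255 (assumed format)


def mean_abs_diff(prev: Frame, curr: Frame) -> float:
    """Average per-pixel absolute difference between two equal-sized frames."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of pixels")
    if not curr:
        return 0.0
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def motion_gate(frames: Iterable[Frame], threshold: float = 4.0) -> Iterator[bool]:
    """Yield True for frames that should be passed to downstream inference.

    The first frame has no predecessor, so it is treated as still.
    The threshold is a hypothetical value, not a HARIS setting.
    """
    prev = None
    for frame in frames:
        yield prev is not None and mean_abs_diff(prev, frame) >= threshold
        prev = frame


still = [10] * 16
moved = [10] * 8 + [90] * 8  # half of the pixels change by 80
gate = list(motion_gate([still, still, moved, moved]))
print(gate)
```

Only the third frame opens the gate here, because it is the only one that differs from its predecessor. A production gate would more likely work on downscaled, blurred frames to tolerate sensor noise.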

Stage | Component | What it contributes
Motion | Frame-difference gate | Reduces wasted inference on still footage and makes the rest of the pipeline event-aware.
Detect | Dual RT-DETR | Fine-tuned HARIS weapon detector for gun and knife, paired with COCO-pretrained person detection.
Track and pose | BoT-SORT plus RTMPose-L | Stable person IDs across frames and 17 COCO keypoints per tracked person for skeleton reasoning.
Action | ST-GCN | Skeleton-based action classification over 30-frame temporal windows instead of one-frame guesses.
Reason | Aggressor Logic Engine | Combines detections, tracks, pose, temporal windows, and holder binding into a JSON evidence chain.
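To make the idea of a JSON evidence chain concrete, here is a minimal sketch of what one alert record could look like. Every field name, the `strike` label, and the `demo-0001` identifier are hypothetical. Only the weapon classes (gun, knife) and the 30-frame window length come from the pipeline description above; the actual HARIS schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class WeaponEvidence:
    weapon_class: str                 # "gun" or "knife"
    confidence: float
    holder_track_id: Optional[int]    # None when no holder could be bound


@dataclass
class ActionEvidence:
    track_id: int
    label: str                        # hypothetical label name
    confidence: float
    window_start_frame: int
    window_length: int = 30           # window size stated in the pipeline


@dataclass
class Alert:
    alert_id: str
    motion_gate_open: bool
    weapons: List[WeaponEvidence] = field(default_factory=list)
    actions: List[ActionEvidence] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the whole evidence chain, nested records included."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


alert = Alert(
    alert_id="demo-0001",
    motion_gate_open=True,
    weapons=[WeaponEvidence("knife", 0.71, holder_track_id=3)],
    actions=[ActionEvidence(track_id=3, label="strike", confidence=0.64,
                            window_start_frame=120)],
)
payload = alert.to_json()
print(payload)
```

Keeping each tier's contribution as its own nested record is what lets a reviewer see which stage produced which piece of evidence.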

04 Measured results

Paper numbers, not brochure numbers.

These figures follow the published paper. The runtime latency benchmark was measured on an RTX 3070. Training was performed on an RTX 5070 Ti, and the two hardware contexts are kept separate.

video-level FP reduction

HARIS flagged 2 of 25 benign videos, compared with 6 of 25 for the raw detector.

end-to-end F1

On in-scope UCF-Crime classes: Shooting, Assault, and Fighting. Precision 0.812, recall 0.688.

weapon mAP@50

Aggregate validation mAP@50, with gun 0.768, knife 0.711, precision 0.836, recall 0.690.

curated images

Deduplicated images across 7 source datasets, with leak-audited GroupKFold splits.

mean runtime latency

Measured on RTX 3070: 154.5 ms P95 and 10.1 FPS end-to-end throughput.
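Two of the headline figures can be cross-checked from the numbers quoted in this section. The sketch below assumes the standard harmonic-mean definition of F1 and assumes that "FP reduction" means the relative drop in flagged benign videos. Both are assumptions about how the paper defines its metrics.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_reduction(before: int, after: int) -> float:
    """Fractional drop from `before` to `after`."""
    if before == 0:
        raise ValueError("baseline count must be non-zero")
    return (before - after) / before


# Precision and recall as quoted for the in-scope UCF-Crime classes.
f1 = f1_score(precision=0.812, recall=0.688)
# Benign videos flagged: 6 of 25 for the raw detector, 2 of 25 for HARIS.
fp_drop = relative_reduction(before=6, after=2)

print(f"F1 = {f1:.3f}")
print(f"video-level FP reduction = {fp_drop:.1%}")
```

Under these definitions the script gives an F1 of about 0.745 and a reduction of about 66.7%. Because the quoted precision and recall are already rounded, the third decimal of the paper's F1 may differ, so treat this as a consistency check rather than a quoted result.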

05 Operator-facing features

Built like an operator surface.

The dashboard is designed as a professional DVR and NVR replacement, not a research notebook. Every overlay is toggleable, every threshold is live-tunable, and every alert shows its reasoning.

01

Continuous body overlay

Skeleton and mannequin rendering for every tracked person. When pose estimation drops a frame, a last-valid-pose snapshot holds briefly, then falls back to a generic body glow.
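The fallback order described here (live skeleton, then a briefly held snapshot, then a generic glow) can be sketched as a small per-track state holder. The class name, the mode strings, and the `max_hold_frames` setting are hypothetical; the source only says the snapshot holds "briefly".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pose = List[Tuple[float, float]]  # (x, y) keypoints, 17 per person


@dataclass
class OverlayState:
    """Per-track overlay memory. `max_hold_frames` is a hypothetical setting."""
    max_hold_frames: int = 5
    last_pose: Optional[Pose] = None
    frames_since_pose: int = 0

    def update(self, pose: Optional[Pose]) -> Tuple[str, Optional[Pose]]:
        """Return (render_mode, pose_to_draw) for the current frame."""
        if pose is not None:
            self.last_pose = pose
            self.frames_since_pose = 0
            return "skeleton", pose
        self.frames_since_pose += 1
        if self.last_pose is not None and self.frames_since_pose <= self.max_hold_frames:
            return "held_snapshot", self.last_pose
        return "body_glow", None


state = OverlayState(max_hold_frames=2)
demo_pose = [(0.0, 0.0)] * 17
# Pose present, then three dropped frames, then pose recovered.
modes = [state.update(p)[0] for p in [demo_pose, None, None, None, demo_pose]]
print(modes)
```

With a two-frame hold, the third consecutive dropped frame falls through to the glow, and the overlay returns to the live skeleton as soon as pose estimation recovers.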

02

Weapon-threshold slider

Operators tune confidence sensitivity in real time, with live impact on detection panels, overlay strokes, the auto-flagger, and the threat heatmap timeline.

03

Threat-density heatmap

The scrub bar renders detected threat density across the clip, so operators can scan a long video quickly and jump to the seconds that matter.
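One simple way to build such a timeline is to bin detections by timestamp and normalize by the busiest bin. The sketch below also applies a confidence threshold, since the threshold slider is described as affecting the heatmap. The detection tuple shape, the function name, and the bin count are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[float, float]  # (timestamp_seconds, confidence), assumed shape


def threat_density(
    detections: Sequence[Detection],
    clip_seconds: float,
    n_bins: int,
    threshold: float,
) -> List[float]:
    """Count detections at or above `threshold` per time bin, scaled to 0-1."""
    if clip_seconds <= 0 or n_bins <= 0:
        raise ValueError("clip_seconds and n_bins must be positive")
    counts = [0] * n_bins
    for timestamp, confidence in detections:
        if confidence < threshold or not 0 <= timestamp <= clip_seconds:
            continue
        # Clamp so a detection exactly at the clip end lands in the last bin.
        index = min(int(timestamp / clip_seconds * n_bins), n_bins - 1)
        counts[index] += 1
    peak = max(counts)
    return [c / peak if peak else 0.0 for c in counts]


detections = [(1.0, 0.9), (1.5, 0.4), (7.2, 0.8), (7.9, 0.7), (10.0, 0.95)]
loose = threat_density(detections, clip_seconds=10.0, n_bins=5, threshold=0.3)
strict = threat_density(detections, clip_seconds=10.0, n_bins=5, threshold=0.75)
print(loose)   # [1.0, 0.0, 0.0, 1.0, 0.5]
print(strict)  # [1.0, 0.0, 0.0, 1.0, 1.0]
```

Raising the threshold drops the two weaker detections, which changes the relative weight of the bins. That is the kind of live effect a slider would have on the scrub bar.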

04

Weapon holder binding

Detected weapons are bound to the wrist of the nearest tracked person through pose-based proximity, making crowded-scene ownership easier to audit.
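A minimal sketch of pose-based proximity binding follows. It measures the distance from the weapon box centre to each person's wrists and picks the closest. Indices 9 and 10 are the left and right wrists in the 17-keypoint COCO layout. The distance cutoff, the keypoint-confidence cutoff, and all function names are hypothetical, and the real HARIS rule may use different geometry.

```python
import math
from typing import Dict, List, Optional, Tuple

Keypoint = Tuple[float, float, float]      # (x, y, confidence)
Box = Tuple[float, float, float, float]    # (x1, y1, x2, y2)

LEFT_WRIST, RIGHT_WRIST = 9, 10  # wrist indices in the 17-keypoint COCO layout


def bind_weapon_to_holder(
    weapon_box: Box,
    skeletons: Dict[int, List[Keypoint]],
    max_distance: float = 80.0,       # pixels; hypothetical cutoff
    min_keypoint_conf: float = 0.3,   # hypothetical cutoff
) -> Optional[int]:
    """Return the track ID whose wrist is closest to the weapon box centre.

    Returns None when no sufficiently confident wrist lies within max_distance.
    """
    cx = (weapon_box[0] + weapon_box[2]) / 2
    cy = (weapon_box[1] + weapon_box[3]) / 2
    best_id, best_dist = None, max_distance
    for track_id, keypoints in skeletons.items():
        for index in (LEFT_WRIST, RIGHT_WRIST):
            x, y, conf = keypoints[index]
            if conf < min_keypoint_conf:
                continue
            dist = math.hypot(x - cx, y - cy)
            if dist <= best_dist:
                best_id, best_dist = track_id, dist
    return best_id


def skeleton_with_wrists(left: Keypoint, right: Keypoint) -> List[Keypoint]:
    """Build a 17-keypoint skeleton with only the wrists filled in (demo helper)."""
    points: List[Keypoint] = [(0.0, 0.0, 0.0)] * 17
    points[LEFT_WRIST], points[RIGHT_WRIST] = left, right
    return points


people = {
    1: skeleton_with_wrists((100.0, 200.0, 0.9), (140.0, 205.0, 0.9)),
    2: skeleton_with_wrists((300.0, 210.0, 0.8), (330.0, 215.0, 0.2)),
}
holder = bind_weapon_to_holder((290.0, 200.0, 320.0, 220.0), people)
print(holder)  # 2
```

Returning `None` when no wrist is close enough keeps an unbound weapon visibly unbound instead of forcing a guess, which matters for auditing crowded scenes.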

05

Night and tint modes

Per-clip brightness and contrast boosts for low-light footage, plus customizable tint for washed-out daytime clips. Both persist per operator.

06

Auditable alerts

Every alert carries its evidence: frames, people, weapon class, confidence scores, and the temporal window. Operators can acknowledge an alert, mark it as a false positive, or escalate it.

06 What makes HARIS different

Traceable by design.

HARIS is not a monolithic detector wrapped in a dashboard. The system keeps named model boundaries and exposes the evidence chain to the operator.

01

Specialist pipeline, not monolith

Every decision is traceable to a named sub-model. When HARIS is wrong, we know which stage was wrong and can fix that stage without retraining the whole system.

02

Temporal, not single-frame

Actions are classified over 30-frame skeleton windows. Single-frame detections never carry the whole decision.
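The windowing itself can be sketched with one fixed-length buffer per tracked identity. The 30-frame length comes from the text. The class name and the `stride` setting (how many frames pass between emitted windows) are assumptions for this example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional, Tuple

Skeleton = List[Tuple[float, float]]  # 17 (x, y) keypoints for one frame
WINDOW = 30  # frames per window, as stated above


class SkeletonWindows:
    """Keeps the most recent `window` skeletons for each tracked person."""

    def __init__(self, window: int = WINDOW, stride: int = 10) -> None:
        # `stride` (frames between emitted windows) is a hypothetical setting.
        self.window = window
        self.stride = stride
        self._buffers: Dict[int, Deque[Skeleton]] = defaultdict(
            lambda: deque(maxlen=window)
        )
        self._since_emit: Dict[int, int] = defaultdict(int)

    def push(self, track_id: int, skeleton: Skeleton) -> Optional[List[Skeleton]]:
        """Add one frame; return a full window when one is due, else None."""
        buffer = self._buffers[track_id]
        buffer.append(skeleton)
        self._since_emit[track_id] += 1
        if len(buffer) == self.window and self._since_emit[track_id] >= self.stride:
            self._since_emit[track_id] = 0
            return list(buffer)
        return None


windows = SkeletonWindows()
frame_skeleton = [(0.0, 0.0)] * 17
emitted_at = [
    frame for frame in range(1, 51)
    if windows.push(track_id=7, skeleton=frame_skeleton) is not None
]
print(emitted_at)  # [30, 40, 50]
```

No window is emitted before 30 frames have accumulated for a track, so a classifier fed this way never sees a single-frame input.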

03

API-first, multi-client ready

A clean JSON boundary means future mobile and desktop clients can plug into the same server for portable operator workflows.

04

Honest evaluation

Group-aware splits, source-level deduplication, and published limitations keep the results anchored to what the system can actually do.
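To show what a group-aware split with a leak audit means in practice, here is a small pure-Python sketch. It is a stand-in for illustration, not scikit-learn's `GroupKFold` and not the HARIS data pipeline; the `vidA`-style source identifiers are made up.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def group_k_fold(
    groups: Sequence[Hashable], n_splits: int
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs that keep each group whole.

    Groups are assigned largest-first to the currently smallest fold, a simple
    greedy balance chosen for this sketch.
    """
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for index, group in enumerate(groups):
        members[group].append(index)
    if n_splits < 2 or n_splits > len(members):
        raise ValueError("n_splits must be between 2 and the number of groups")

    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        min(folds, key=len).extend(members[group])

    all_indices = set(range(len(groups)))
    return [(sorted(all_indices - set(fold)), sorted(fold)) for fold in folds]


def has_leak(groups: Sequence[Hashable], train: List[int], test: List[int]) -> bool:
    """True when any group appears on both sides of a split."""
    return bool({groups[i] for i in train} & {groups[i] for i in test})


# Hypothetical source IDs: images from the same video share a group.
source_ids = ["vidA", "vidA", "vidB", "vidC", "vidC", "vidC", "vidD"]
splits = group_k_fold(source_ids, n_splits=2)
leaks = [has_leak(source_ids, train, test) for train, test in splits]
print(leaks)  # [False, False]
```

The audit step is the important part: near-duplicate frames from one source landing in both train and test would inflate validation scores without improving the model.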

07 Roadmap

Published core, planned extensions.

Qwen2.5-VL summaries remain planned, while FaceNet re-ID is treated as an operator-gated live feature outside the published CORE pipeline.

Phase 1 | 2025

Foundations

Problem framing, literature review, initial dataset construction, and first-pass detector and pose integration. Proposal defense passed.

Phase 2 | Q1 2026

Full published pipeline

Tier 0 motion gate, dual RT-DETR, BoT-SORT, RTMPose-L, ST-GCN, and Aggressor Logic Engine running end-to-end with auditable JSON output.

April 2026

Weapon detector v2 and evaluation

Fine-tuned detector, paper metrics, continuous body overlay, threshold controls, threat heatmap, holder binding, and night and tint modes.

Planned

Qwen2.5-VL on-alert summaries

On-alert visual-language summaries that explain scene context in natural language after a structured alert has already fired.

Planned

Aggressor Logic Engine smoothing variant

A temporal-smoothing variant that improves role assignment stability for aggressor, defender, and bystander labels.
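Since this variant is only planned, no implementation details are published. A sliding majority vote is one common way to stabilize per-frame labels, and the sketch below shows that technique under stated assumptions: the window size and the tie-breaking rule are choices made for this example only.

```python
from collections import Counter, deque
from typing import Deque, Iterable, List


def smooth_roles(labels: Iterable[str], window: int = 5) -> List[str]:
    """Replace each label with the majority label over the last `window` frames.

    On a tie, the previous smoothed label is kept if it is among the leaders;
    otherwise the leader seen earliest in the current window wins.
    """
    recent: Deque[str] = deque(maxlen=window)
    smoothed: List[str] = []
    for label in labels:
        recent.append(label)
        counts = Counter(recent)
        top_count = max(counts.values())
        leaders = [name for name, count in counts.items() if count == top_count]
        if smoothed and smoothed[-1] in leaders:
            choice = smoothed[-1]
        else:
            choice = leaders[0]
        smoothed.append(choice)
    return smoothed


raw = ["bystander", "bystander", "aggressor", "bystander",
       "aggressor", "aggressor", "aggressor"]
stable = smooth_roles(raw, window=3)
print(stable)
```

In this run the isolated `aggressor` label at the third frame is suppressed, and the role only switches once `aggressor` holds a majority of the three-frame window. The trade-off is a short delay before a genuine role change shows up.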

Planned

Edge-device deployment

Profiling and optimization for constrained deployment targets beyond the desktop GPU environment used in the paper.

08 Honest limitations

Operational caps are part of the product.

We publish the caps. A system that pretends to have no limits hides those limits from its operators, which is the opposite of what surveillance AI should do.

Top-4 action labels

Tracking applies to everyone in frame, but skeleton-based action classification applies only to the four people with the highest detection confidence.
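The cap amounts to a top-k selection per frame. A minimal sketch, assuming each tracked person is available as a `(track_id, detection_confidence)` pair; that tuple shape and the function name are hypothetical.

```python
import heapq
from typing import List, Tuple

TrackedPerson = Tuple[int, float]  # (track_id, detection_confidence)
MAX_ACTION_SUBJECTS = 4            # cap stated above


def select_action_subjects(
    people: List[TrackedPerson], limit: int = MAX_ACTION_SUBJECTS
) -> List[int]:
    """Return track IDs of the `limit` people with the highest detection confidence."""
    top = heapq.nlargest(limit, people, key=lambda person: person[1])
    return [track_id for track_id, _ in top]


frame_people = [(1, 0.91), (2, 0.55), (3, 0.78), (4, 0.62), (5, 0.97), (6, 0.40)]
subjects = select_action_subjects(frame_people)
print(subjects)  # [5, 1, 3, 4]
```

Tracks 2 and 6 stay tracked and drawn in this example. They simply receive no action label for this frame.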

Video processing cap

The dashboard is scoped for short-clip operator workflows, with a 60-second, 10 FPS default upload cap that can be overridden for evaluation runs.
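One plausible reading of the cap is that an upload is truncated to 60 seconds and sampled at no more than 10 frames per second. The sketch below computes which source frames would be processed under that reading. The interpretation, the function name, and its parameters are assumptions, not documented HARIS behaviour.

```python
from typing import List


def sampled_frame_indices(
    source_fps: float,
    total_frames: int,
    target_fps: float = 10.0,    # default cap from the text
    max_seconds: float = 60.0,   # default cap from the text
) -> List[int]:
    """Return source-frame indices to process under the duration and FPS caps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    duration = min(total_frames / source_fps, max_seconds)
    # Never sample faster than the source actually provides frames.
    effective_fps = min(source_fps, target_fps)
    step = source_fps / effective_fps
    indices: List[int] = []
    for k in range(int(duration * effective_fps)):
        index = int(k * step)
        if index >= total_frames:
            break
        indices.append(index)
    return indices


# A 90-second clip recorded at 30 FPS (2700 frames).
indices = sampled_frame_indices(source_fps=30.0, total_frames=2700)
print(len(indices), indices[:3], indices[-1])  # 600 [0, 3, 6] 1797
```

Under these assumptions the worst case is 600 processed frames per upload, which bounds how long an operator waits for a result.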

Gated re-ID

Tracker re-identification and watchlist matching are gated because the re-ID path adds wall-clock time and has privacy implications.

Pose-confidence gating

Far-field subjects with low-quality skeletons can keep bounding boxes and tracks while dropping action labels.
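A gate of this kind can be as simple as checking keypoint confidences before an action label is allowed through. In the sketch below, the mean-confidence threshold, the visible-keypoint count, and the function name are all hypothetical values chosen for illustration.

```python
from typing import List, Optional, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)


def gate_action_label(
    keypoints: List[Keypoint],
    action_label: str,
    min_mean_conf: float = 0.5,   # hypothetical threshold
    min_visible: int = 10,        # hypothetical count of usable keypoints
    visible_conf: float = 0.3,    # hypothetical per-keypoint cutoff
) -> Optional[str]:
    """Return the action label only when the skeleton is reliable enough.

    Returning None drops the label; the box and track are handled elsewhere
    and are unaffected by this gate.
    """
    if not keypoints:
        return None
    confidences = [conf for _, _, conf in keypoints]
    mean_conf = sum(confidences) / len(confidences)
    visible = sum(conf >= visible_conf for conf in confidences)
    if mean_conf < min_mean_conf or visible < min_visible:
        return None
    return action_label


near = [(0.0, 0.0, 0.85)] * 17
far = [(0.0, 0.0, 0.25)] * 12 + [(0.0, 0.0, 0.6)] * 5
print(gate_action_label(near, "fighting"))  # fighting
print(gate_action_label(far, "fighting"))   # None
```

Dropping the label instead of emitting a low-quality guess keeps far-field noise out of the alert stream while the person remains visible to the operator.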

Knife localization gap

Knife boxes are harder to localize at small scale and in occlusion, which can affect holder binding and confidence.

Long-range and low-resolution limits

Very small people, compressed footage, and poor camera angles still reduce detector, pose, and action-classifier reliability.

RT-DETR | BoT-SORT | RTMPose | ST-GCN | FaceNet | PyTorch