CN120263940A - An intelligent video surveillance system and method based on deep learning - Google Patents

An intelligent video surveillance system and method based on deep learning

Info

Publication number
CN120263940A
CN120263940A
Authority
CN
China
Prior art keywords
intelligent video
feature
video surveillance
causal
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510405856.5A
Other languages
Chinese (zh)
Inventor
谢之鑫
苏颖
李大严
郭冰玉
吴艳蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinjiang Vocational & Technical College Of Communications
Original Assignee
Xinjiang Vocational & Technical College Of Communications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinjiang Vocational & Technical College Of Communications
Priority to CN202510405856.5A
Publication of CN120263940A
Legal status: Pending

Abstract

(Translated from Chinese)

The present invention provides an intelligent video surveillance system and method based on deep learning, which belongs to the field of video surveillance technology, and includes: a heterogeneous sensor array, a cognitive-driven feature fusion module, a dynamic evolvable detection model, a causal reasoning tracking engine, a meta-knowledge enhanced behavior analysis module, and a quantum-classical hybrid computing architecture. The heterogeneous sensor array includes a reconfigurable visible light/infrared dual-mode camera group, a distributed microphone array, and a millimeter-wave radar, which are configured to generate multi-physical field perception data; the cognitive-driven feature fusion module adopts a space-time-spectrum joint coding technology, and integrates a three-dimensional convolutional network with a graph attention mechanism to achieve cross-modal feature interaction. Through multi-dimensional technological innovation, the present invention constructs an intelligent video surveillance system with autonomous evolution capabilities, which significantly surpasses traditional solutions in terms of perception capabilities, reasoning accuracy, resource efficiency, and security, and meets the high standards of scenarios such as smart cities and security.

Description

Intelligent video monitoring system and method based on deep learning
Technical Field
The invention relates to the technical field of video monitoring, in particular to an intelligent video monitoring system and method based on deep learning.
Background
Surveillance systems are among the most widely deployed components of security systems. On the market, the more common construction-site monitoring solutions are handheld video communication devices, and video surveillance is currently the mainstream. The field has undergone sweeping changes, from the earliest analog monitoring, through the digital monitoring that flourished in recent years, to today's network-based video surveillance. Now that IP technology is gradually becoming unified worldwide, it is worth revisiting the development history of video surveillance systems. From a technical perspective, their development is divided into first-generation analog video surveillance systems (CCTV), second-generation digital video surveillance systems (DVR) based on "PC + multimedia card", and third-generation fully IP-network-based video surveillance systems (IPVS).
Existing monitoring systems have the following technical defects:
Single-sensor dependence: traditional monitoring systems rely on a single sensor (such as a visible-light camera), and performance degrades sharply under low illumination and noise interference.
Static model limitation: a fixed network structure cannot adapt to dynamic scene changes (such as fluctuations in crowd density), causing missed or false detections.
Annotation-dependent behavior analysis: anomaly detection requires large amounts of labeled data, and generalization to small-sample scenarios is poor.
Low resource efficiency: centralized cloud processing incurs high latency, while edge computing offers limited computing power.
Therefore, an intelligent video monitoring system and method based on deep learning are provided.
Disclosure of Invention
The invention aims to solve the problems identified in the background above, and provides an intelligent video monitoring system and method based on deep learning.
The specific technical scheme is as follows:
An intelligent video monitoring system based on deep learning, comprising:
The heterogeneous sensing array comprises a reconfigurable visible light/infrared dual-mode camera group, a distributed microphone array and a millimeter wave radar, and is configured to generate multi-physical-field sensing data;
the cognition-driven feature fusion module adopts a space-time-frequency-spectrum joint coding technology and integrates a three-dimensional convolution network with a graph attention mechanism to realize cross-modal feature interaction;
The dynamic evolutionary detection model comprises a meta-learning framework based on neural architecture search, and can automatically adjust the depth and width of a network according to the complexity of a scene;
The causal reasoning tracking engine is used for constructing a space-time causal graph model, and fusing a target kinematic equation and a social force field model to conduct track prediction;
the meta-knowledge enhancement behavior analysis module integrates a pre-training large language model and a domain knowledge graph to realize zero-sample abnormal behavior reasoning;
the quantum-classical mixed computing architecture performs real-time detection at the classical computing layer and performs complex behavior pattern optimization at the quantum simulation layer.
The intelligent video monitoring system based on deep learning, wherein the cognitive driving feature fusion module comprises:
The frequency spectrum sensing submodule adopts a tunable Gabor wavelet group to extract time-frequency domain characteristics;
the space-time diagram construction unit models the multi-target motion trajectories as a dynamic heterogeneous graph;
the field coupling attention mechanism realizes electromagnetic-acoustic feature fusion through a feature propagation algorithm inspired by the Maxwell equations, and the cross-modal feature propagation process satisfies the following conditions:
Equation 1:
Equation 2:
Wherein:
H represents the acoustic feature tensor;
Jv is the visual feature flow density;
D is the fused feature tensor;
E is the electromagnetic feature tensor;
ε0 is a vacuum permittivity adjustment factor;
Pa is a learnable cross-modal projection matrix;
⊗ denotes the tensor product operation;
Fv is the visual feature tensor.
The wave number vector calculation of the field coupling attention mechanism satisfies the following conditions:
Equation 3:
Wherein ω is the normalized frequency parameter of the feature channel;
c is an acoustic-to-optical propagation speed ratio adjustment factor;
θ and φ are learnable azimuth parameters.
The intelligent video monitoring system based on deep learning, wherein the dynamic evolutionary detection model comprises:
The super network controller dynamically generates a detection network structure adapting to the current scene based on reinforcement learning;
The multi-physical-field anchor frame generator is used for generating a three-dimensional detection anchor point by combining heat radiation characteristics and sound wave propagation characteristics;
an uncertainty-aware output layer, configured with a Monte Carlo Dropout mechanism to quantify detection confidence.
The intelligent video monitoring system based on deep learning, wherein the causal reasoning tracking engine comprises:
The counterfactual trajectory prediction unit constructs virtual intervention scenarios to perform causal effect calculation;
the social relationship modeler uses a graph neural network to learn implicit interaction rules between targets;
an energy function optimizer solves the optimal trajectory hypothesis based on the Hamiltonian Monte Carlo method;
Wherein the trajectory prediction of the causal inference tracking engine (140) satisfies the modified Hamiltonian equations:
Wherein:
q is the target position vector;
p is the momentum vector;
Vsocial is the social potential energy term;
Φscene is the scene constraint potential energy;
αj is the interaction strength coefficient.
The intelligent video monitoring system based on deep learning, wherein the meta-knowledge enhancement behavior analysis module comprises:
the semantic distillation unit transfers the common-sense reasoning capability of the large language model to a lightweight classifier;
the causal discovery engine identifies potential risk factors in the scene through invariance testing;
a virtual scene generator synthesizes rare abnormal-event training samples based on a generative adversarial network;
the semantic distillation unit of the meta-knowledge enhanced behavior analysis module executes a contrastive loss function:
cosine similarity formula:
Wherein:
hLLM ∈ R^d is the d-dimensional embedding vector output by the large language model;
hkg ∈ R^d is the feature vector of the knowledge-graph entity after being encoded by the graph neural network;
τ ∈ (0, 1] is the temperature hyperparameter;
K is the batch size.
The invention also provides an intelligent video monitoring method of the intelligent video monitoring system based on deep learning, which comprises the following steps:
S1. Synchronous acquisition and spatiotemporal registration of multi-physical-field data;
S2. Constructing a dynamic feature hypergraph for cross-modal correlation analysis;
S3. Adaptive target detection based on online meta-learning;
S4. Applying causal inference to eliminate confounding bias in the tracking process;
S5. Performing behavior semantic analysis by combining physical laws with common-sense knowledge;
S6. Optimizing the global resource allocation strategy using a quantum annealing algorithm.
The intelligent video monitoring method, wherein step S2 includes:
establishing an electromagnetic-acoustic joint propagation model to correct multi-sensor data;
Modeling cross-modal high-order correlation by using hypergraph neural network;
redundant feature dimensions are eliminated by tensor decomposition.
The intelligent video monitoring method, wherein, in step S3, online meta-learning comprises:
constructing a meta-feature vector containing a scene complexity index;
Designing a neural process-based small sample adaptation mechanism;
and gradually increasing the detection difficulty by adopting a curriculum learning strategy.
In the intelligent video monitoring method, step S5 specifically includes:
embedding Newtonian mechanics equations into the neural network to enforce physical compliance constraints;
constructing a behavior interpretation framework based on a causal mediation model;
and applying a contrastive language-image pre-training model to realize natural language queries.
The intelligent video monitoring method further comprises the following steps:
Deploying a verifiable security module and adopting formal methods to guarantee the interpretability of system decisions;
establishing a digital twin simulation environment to realize system resilience testing under attack scenarios;
designing a blockchain-based model update verification mechanism to prevent adversarial attacks.
The intelligent video monitoring system based on deep learning provided by the invention has the following advantages:
Multimodal perception enhancement: electromagnetic-acoustic-millimeter-wave multi-physical-field fusion greatly improves environmental adaptability;
Dynamic self-adaptation: the network structure, detection model, and tracking strategy evolve in real time, significantly shortening the scene-switching response time;
Complex behavior analysis: zero-shot anomaly detection combined with causal reasoning greatly reduces the false alarm rate;
Resource efficiency optimization: quantum-classical collaborative computing and edge-cloud resource scheduling markedly reduce computing power requirements;
Security and trustworthiness: formal verification and blockchain storage greatly improve the system's resistance to attacks.
Drawings
FIG. 1 is a schematic diagram of an architecture of an intelligent video monitoring system based on deep learning provided by the invention;
Fig. 2 is a schematic architecture diagram of a cognitive driving feature fusion module in the intelligent video monitoring system based on deep learning provided by the invention;
FIG. 3 is a schematic diagram of a dynamic evolutionary detection model in an intelligent video monitoring system based on deep learning;
FIG. 4 is a schematic diagram of the architecture of a causal inference tracking engine in the deep learning-based intelligent video surveillance system provided by the invention;
Fig. 5 is a schematic diagram of a structure of a meta-knowledge enhancement behavior analysis module in the intelligent video monitoring system based on deep learning.
In the accompanying drawings:
110. A heterogeneous sensing array;
120. A cognitive driving feature fusion module; 121, a frequency spectrum sensing submodule, 122, a space-time diagram construction unit, 123, a field coupling attention mechanism;
130. a dynamic evolutionary detection model; 131, a super network controller, 132, a multi-physical field anchor frame generator, 133, an uncertainty perception output layer;
140. A causal reasoning tracking engine; 141, a counterfactual track prediction unit, 142, a social relation modeler, 143, an energy function optimizer;
150. The meta-knowledge enhancement behavior analysis module; 151, a semantic distillation unit; 152, a causal discovery engine; 153, a virtual scene generator;
160. Quantum-classical hybrid computing architecture.
Detailed Description
The technical scheme of the invention is further described below by the specific embodiments with reference to the accompanying drawings.
In which the drawings are for illustrative purposes only and are not intended to be construed as limiting the present patent, and in which certain elements of the drawings may be omitted, enlarged or reduced in order to better illustrate embodiments of the present invention, and not to represent actual product dimensions, it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
In the description of the present invention, it should be understood that, if the terms "upper", "lower", "left", "right", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, it is merely for convenience in describing the present invention and simplifying the description, and it is not indicated or implied that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, so that the terms describing the positional relationships in the drawings are merely for illustration and are not to be construed as limitations of the present patent, and that the specific meanings of the terms may be understood by those skilled in the art according to specific circumstances.
In the description of the present invention, unless explicitly stated or limited otherwise, the term "coupled" or the like should be interpreted broadly, as referring to a connection between two components, for example, a fixed connection, a removable connection, or a combination, a mechanical connection, an electrical connection, a direct connection, an indirect connection via an intermediary, a communication between two components, or an interaction between two components. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Referring to FIGS. 1-5, the deep learning-based intelligent video monitoring system provided by this embodiment comprises a heterogeneous sensing array 110, a cognitive driving feature fusion module 120, a dynamic evolutionary detection model 130, a causal reasoning tracking engine 140, a meta-knowledge enhancement behavior analysis module 150, and a quantum-classical hybrid computing architecture 160. The heterogeneous sensing array 110 is connected to the cognitive driving feature fusion module 120, the cognitive driving feature fusion module 120 is connected to the dynamic evolutionary detection model 130, the dynamic evolutionary detection model 130 is connected to the causal reasoning tracking engine 140, the causal reasoning tracking engine 140 is connected to the meta-knowledge enhancement behavior analysis module 150, and the meta-knowledge enhancement behavior analysis module 150 is connected to the quantum-classical hybrid computing architecture 160.
Wherein the heterogeneous sensing array 110 comprises a reconfigurable visible light/infrared dual-mode camera set, a distributed microphone array and a millimeter wave radar, and is configured to generate multi-physical-field sensing data;
wherein the cognitive driving feature fusion module 120 adopts a space-time-frequency-spectrum joint coding technology and integrates a three-dimensional convolution network with a graph attention mechanism to realize cross-modal feature interaction;
the dynamically evolutionary detection model 130 comprises a meta-learning framework based on neural architecture search, and can automatically adjust the depth and width of the network according to the complexity of the scene;
the causal reasoning tracking engine 140 is used for constructing a space-time causal graph model, and fusing a target kinematics equation and a social force field model to conduct track prediction;
The meta-knowledge enhancement behavior analysis module 150 integrates a pre-training large language model and a domain knowledge graph to realize zero-sample abnormal behavior reasoning;
The quantum-classical hybrid computing architecture 160 is used to perform real-time detection at the classical computing layer and complex behavior pattern optimization at the quantum simulation layer.
According to the intelligent video monitoring system based on deep learning, multiple physical field data (visible light/infrared, acoustic and millimeter wave) are integrated through the heterogeneous sensing array, so that the perception robustness under complex environments (such as low illumination and bad weather) is remarkably improved.
The quantum-classical mixed computing architecture realizes the division of real-time detection and complex mode optimization, and combines efficiency and precision.
Wherein, the cognitive driving feature fusion module 120 comprises:
the frequency spectrum sensing submodule 121 adopts a tunable Gabor wavelet group to extract time-frequency domain characteristics;
A space-time diagram construction unit 122, which models the multi-target motion trajectories as a dynamic heterogeneous graph;
The field coupling attention mechanism 123 realizes electromagnetic-acoustic feature fusion through a feature propagation algorithm inspired by the Maxwell equations, and the cross-modal feature propagation process satisfies the following conditions (a hedged rendering is given after the parameter definitions below):
Equation 1:
Equation 2:
Wherein:
H represents the acoustic feature tensor, i.e., the feature distribution of the audio signal in the time-frequency domain;
Jv is the visual feature flow density, the dynamic change rate of visual features extracted by the visible light/infrared cameras;
D is the fused feature tensor, comprising a joint representation of electromagnetic and acoustic features;
E is the electromagnetic feature tensor;
ε0 is a vacuum permittivity adjustment factor used to balance the initial weight of the electromagnetic features;
Pa is a learnable cross-modal projection matrix of dimension d × d, used to map acoustic features into the electromagnetic space;
⊗ denotes the tensor product operation, realizing high-order interaction of cross-modal features;
Fv is the visual feature tensor, i.e., the spatiotemporal features extracted by the 3D-CNN.
The wave number vector calculation of the field coupling attention mechanism satisfies:
Equation 3:
Where ω is the normalized frequency parameter of the feature channel, with value range [0, 1], obtained by normalizing the spectral energy of the feature channel;
c is an acoustic-to-optical propagation speed ratio adjustment factor, with an initial value of 3 × 10^8 m/s (the speed of light), which can be tuned during training;
θ and φ are learnable azimuth parameters defining the spatial directivity of the wave number vector.
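The formula images for Equations 1-3 are not reproduced above. A plausible LaTeX rendering, assuming the Ampère-type propagation law and constitutive relation suggested by the variable definitions (the exact expressions in the original filing may differ), is:

```latex
% Hedged reconstruction of Equations 1-3 from the variable definitions above;
% the exact expressions in the original filing are not available here.
\begin{align}
% Eq. 1: Ampère-type propagation of the acoustic feature field H,
% driven by the visual feature flow density J_v and the fused tensor D
\nabla \times \mathbf{H} &= \mathbf{J}_v + \frac{\partial \mathbf{D}}{\partial t} \\
% Eq. 2: constitutive relation combining the electromagnetic features E with
% the acoustic features H projected by the learnable matrix P_a
\mathbf{D} &= \epsilon_0\,\mathbf{E} + \mathbf{P}_a \otimes \mathbf{H} \\
% Eq. 3: wave-number vector with learnable azimuth parameters (θ, φ)
\mathbf{k} &= \frac{\omega}{c}
  \begin{pmatrix} \sin\theta\cos\phi \\ \sin\theta\sin\phi \\ \cos\theta \end{pmatrix}
\end{align}
```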
The working flow is as follows:
1. Spectrum sensing: extract the time-frequency feature H of the acoustic signal through the Gabor wavelet group;
2. Feature propagation: dynamically fuse the visual features Fv with the acoustic features H based on the modified Maxwell equations (Equations 1 and 2) to generate the cross-modal tensor D;
3. Wave number generation: generate a direction-adjustable wave number vector k according to Equation 3, used to weight the spatial contributions of different sensors;
4. Cross-modal alignment: adaptively adjust the fusion weights of the electromagnetic and acoustic features through the field-coupled attention mechanism and output a joint feature matrix.
The field-coupled attention mechanism models cross-modal feature interaction on the basis of the Maxwell equations, solving the alignment deviation between electromagnetic and acoustic features in traditional methods and markedly improving fusion accuracy;
the wave number vector is dynamically adjusted through the learnable azimuth parameters θ and φ, adaptively optimizing the multi-sensor spatial perception weights and greatly reducing redundant computation.
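Purely as an illustration of the four-step workflow above, the following minimal PyTorch-style sketch wires the fusion together; the tensor shapes, the discretized update, and the module name FieldCoupledAttention are assumptions for clarity, not the patent's actual implementation.

```python
# Minimal illustrative sketch of the field-coupled fusion step (Equations 1-3).
# Shapes, names, and the simplified update rule are assumptions for clarity only.
import torch
import torch.nn as nn


class FieldCoupledAttention(nn.Module):
    def __init__(self, dim: int, eps0: float = 1.0, c: float = 3.0e8):
        super().__init__()
        self.P_a = nn.Parameter(torch.eye(dim))        # learnable cross-modal projection (Pa)
        self.theta = nn.Parameter(torch.tensor(0.5))   # learnable azimuth parameters
        self.phi = nn.Parameter(torch.tensor(0.5))
        self.eps0 = nn.Parameter(torch.tensor(eps0))   # vacuum-permittivity adjustment factor
        self.c = c                                     # speed-ratio adjustment factor

    def wave_number(self, omega: torch.Tensor) -> torch.Tensor:
        # Eq. 3-style direction-adjustable wave-number vector per feature channel
        direction = torch.stack([
            torch.sin(self.theta) * torch.cos(self.phi),
            torch.sin(self.theta) * torch.sin(self.phi),
            torch.cos(self.theta),
        ])
        return omega.unsqueeze(-1) / self.c * direction      # (channels, 3)

    def forward(self, F_v: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        # F_v: visual features (batch, dim); H: acoustic features (batch, dim)
        E = F_v                                              # assume the visual stream plays the role of E
        D = self.eps0 * E + H @ self.P_a                     # Eq. 2-style constitutive fusion
        omega = H.pow(2).mean(dim=0).clamp(0, 1)             # normalized per-channel spectral energy
        k = self.wave_number(omega)                          # spatial weighting of sensors (Eq. 3)
        weights = torch.softmax(k.norm(dim=-1), dim=0)       # attention over feature channels
        return D * weights                                   # joint feature matrix


if __name__ == "__main__":
    fuse = FieldCoupledAttention(dim=64)
    joint = fuse(torch.randn(8, 64), torch.randn(8, 64))
    print(joint.shape)  # torch.Size([8, 64])
```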
Wherein the dynamically evolutionary detection model 130 comprises:
the super network controller 131 dynamically generates a detection network structure adapted to the current scene based on reinforcement learning;
A multi-physical-field anchor frame generator 132 that combines the heat radiation characteristics and the acoustic wave propagation characteristics to generate a three-dimensional detection anchor point;
The uncertainty-aware output layer 133 is configured with a Monte Carlo Dropout mechanism to quantify detection confidence.
The dynamic evolutionary detection model adjusts the network structure in real time through neural architecture search, adapting to changes in scene complexity (such as sudden changes in crowd density) and greatly improving model inference speed;
the uncertainty-aware output layer quantifies detection confidence, which can significantly reduce the false alarm rate (such as false detection of pedestrians in foggy weather).
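For reference, a minimal sketch of how a Monte Carlo Dropout output layer can quantify detection confidence; the head architecture, dropout rate, and sample count are illustrative assumptions, not the patent's design.

```python
# Illustrative Monte Carlo Dropout confidence estimation for a detection head.
import torch
import torch.nn as nn


class MCDropoutHead(nn.Module):
    def __init__(self, in_dim: int, num_classes: int, p: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Dropout(p),
            nn.Linear(128, num_classes),
        )

    @torch.no_grad()
    def predict_with_uncertainty(self, x: torch.Tensor, n_samples: int = 20):
        self.train()  # keep dropout active at inference time (the MC Dropout trick)
        probs = torch.stack([self.net(x).softmax(dim=-1) for _ in range(n_samples)])
        mean = probs.mean(dim=0)   # averaged class probabilities
        std = probs.std(dim=0)     # per-class spread, used as an uncertainty estimate
        return mean, std


head = MCDropoutHead(in_dim=256, num_classes=3)
mean, std = head.predict_with_uncertainty(torch.randn(4, 256))
print(mean.argmax(dim=-1), std.max(dim=-1).values)  # predicted class and its uncertainty
```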
Wherein the causal inference tracking engine 140 comprises:
the counterfactual trajectory prediction unit 141, which constructs virtual intervention scenarios to perform causal effect calculation;
a social relationship modeler 142, which uses a graph neural network to learn implicit interaction rules between targets;
an energy function optimizer 143, which solves the optimal trajectory hypothesis based on the Hamiltonian Monte Carlo method;
wherein the trajectory prediction of the causal inference tracking engine 140 satisfies the modified Hamiltonian equations (a hedged rendering follows the definitions below):
Wherein:
q is the target position vector, of dimension 3 × 1 (three-dimensional spatial coordinates);
p is the momentum vector, p = mv, where m is the target mass (unified by default to 1 kg) and v is the velocity vector;
Vsocial is the social potential energy term, calculated by the GNN and expressed as:
where Wj is the interaction weight and σ is the action range parameter;
Φscene is the scene constraint potential energy, generated from the scene semantic segmentation map and defined as follows:
αj is the interaction strength coefficient, calculated by the attention mechanism:
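The formula images for the modified Hamiltonian equations and the potential terms are not reproduced above. A plausible LaTeX form consistent with the stated definitions of q, p, Vsocial, Φscene, Wj, σ, and αj (the exact expressions in the original filing may differ) is:

```latex
% Hedged reconstruction; the original formula images are unavailable.
\begin{align}
\frac{d\mathbf{q}}{dt} &= \frac{\partial \mathcal{H}}{\partial \mathbf{p}} = \frac{\mathbf{p}}{m},
\qquad
\frac{d\mathbf{p}}{dt} = -\frac{\partial \mathcal{H}}{\partial \mathbf{q}}
  = -\nabla_{\mathbf{q}}\bigl(V_{\text{social}} + \Phi_{\text{scene}}\bigr) \\
% Social potential aggregated over neighbouring targets j with GNN weights W_j
V_{\text{social}}(\mathbf{q}) &= \sum_{j} \alpha_j\, W_j\,
  \exp\!\left(-\frac{\lVert \mathbf{q}-\mathbf{q}_j\rVert^{2}}{2\sigma^{2}}\right)
\end{align}
```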
The working flow is as follows:
Trajectory initialization: initialize the momentum p0 based on the position q0 output by the target detection module.
Potential energy calculation: extract social relations through the GNN to generate Vsocial, and generate Φscene by combining the scene segmentation results.
Equation solving: numerically solve the Hamiltonian equations with a fourth-order Runge-Kutta method to predict q(t+1) and p(t+1) at the next time step (see the sketch after this list).
Trajectory optimization: screen the optimal trajectory hypothesis with the energy function optimizer (Hamiltonian Monte Carlo).
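A minimal sketch of the Runge-Kutta update in the equation-solving step above, under the hedged Hamiltonian form given earlier; the Gaussian repulsion potential here is a placeholder standing in for the learned Vsocial and Φscene terms, and the mass and step size are assumptions.

```python
# Illustrative fourth-order Runge-Kutta step for the (q, p) trajectory state.
import numpy as np


def potential_grad(q: np.ndarray, neighbours: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gradient dV/dq of a Gaussian repulsion potential V = sum_j exp(-|q - qj|^2 / (2 sigma^2))."""
    grad = np.zeros_like(q)
    for qj in neighbours:
        diff = q - qj
        grad += -(diff / sigma**2) * np.exp(-np.dot(diff, diff) / (2 * sigma**2))
    return grad


def rk4_step(q, p, neighbours, dt=0.05, m=1.0):
    """One RK4 step of dq/dt = p/m, dp/dt = -dV/dq."""
    def deriv(q_, p_):
        return p_ / m, -potential_grad(q_, neighbours)

    k1q, k1p = deriv(q, p)
    k2q, k2p = deriv(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = deriv(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = deriv(q + dt * k3q, p + dt * k3p)
    q_next = q + dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
    p_next = p + dt / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
    return q_next, p_next


q, p = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])   # initial position and momentum
neighbours = np.array([[1.5, 0.2, 0.0]])                      # one nearby target
q, p = rk4_step(q, p, neighbours)
print(q, p)
```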
The causal reasoning tracking engine introduces the social potential Vsocial and the scene constraint potential Φscene, solving the problem of trajectory crossing in dense scenes and greatly reducing the ID switching rate;
counterfactual trajectory prediction models causal effects through virtual intervention scenarios, effectively improving trajectory prediction accuracy.
Wherein the meta-knowledge enhancement behavior analysis module 150 includes:
a semantic distillation unit 151 that migrates the common sense inference capability of the large language model to a lightweight classifier;
The cause and effect discovery engine 152 identifies potential risk factors in the scene through invariance testing;
A virtual scene generator 153, which synthesizes rare abnormal-event training samples based on a generative adversarial network;
the semantic distillation unit of the meta-knowledge enhancement behavior analysis module 150 executes the following contrastive loss function (a hedged rendering is given after the parameter definitions below):
cosine similarity formula:
Wherein:
hLLM ∈ R^d is the d-dimensional embedding vector output by a large language model (such as GPT-4), extracted by a mean-pooling layer;
hkg ∈ R^d is the feature vector of a knowledge-graph entity after being encoded by the graph neural network, generated by a graph convolutional network (GCN);
τ ∈ (0, 1] is a temperature hyperparameter used to adjust the steepness of the probability distribution;
K is the batch size, with a dynamic adjustment strategy:
(with FP16 precision the coefficient is 2; with FP32, 4).
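The formula images for the cosine similarity and the contrastive loss are not reproduced above. Assuming the standard InfoNCE form implied by the description (positive pairs on the diagonal, temperature τ, batch size K), a plausible rendering is:

```latex
% Hedged reconstruction of the contrastive distillation loss;
% the original formula images are unavailable.
\begin{align}
\operatorname{sim}(h_{\mathrm{LLM}}, h_{\mathrm{kg}}) &=
  \frac{h_{\mathrm{LLM}}^{\top} h_{\mathrm{kg}}}
       {\lVert h_{\mathrm{LLM}}\rVert\,\lVert h_{\mathrm{kg}}\rVert} \\
\mathcal{L}_{\text{distill}} &= -\frac{1}{K}\sum_{i=1}^{K}
  \log \frac{\exp\!\bigl(\operatorname{sim}(h^{(i)}_{\mathrm{LLM}}, h^{(i)}_{\mathrm{kg}})/\tau\bigr)}
            {\sum_{j=1}^{K}\exp\!\bigl(\operatorname{sim}(h^{(i)}_{\mathrm{LLM}}, h^{(j)}_{\mathrm{kg}})/\tau\bigr)}
\end{align}
```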
The working flow is as follows:
Feature extraction: extract hLLM and hkg from the large language model and the knowledge graph respectively;
similarity calculation: compute the similarity matrix of positive sample pairs (diagonal) and negative sample pairs (off-diagonal) according to the cosine similarity formula;
loss calculation: pull positive sample pairs closer and push negative sample pairs apart through the cross-entropy loss (see the sketch after this list);
back-propagation: update the parameters of the language model and the knowledge-graph encoder by gradient descent to realize semantic alignment.
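A minimal PyTorch-style sketch of the four workflow steps above, under the InfoNCE assumption; the embedding extraction is stubbed with random tensors standing in for the LLM and GCN outputs.

```python
# Illustrative contrastive distillation loss (InfoNCE form assumed).
import torch
import torch.nn.functional as F


def contrastive_distillation_loss(h_llm: torch.Tensor, h_kg: torch.Tensor, tau: float = 0.07):
    # 1. L2-normalize so the dot product equals cosine similarity
    h_llm = F.normalize(h_llm, dim=-1)
    h_kg = F.normalize(h_kg, dim=-1)
    # 2. K x K similarity matrix: diagonal = positive pairs, off-diagonal = negatives
    logits = h_llm @ h_kg.t() / tau
    # 3. Cross-entropy pulls positives together and pushes negatives apart
    targets = torch.arange(h_llm.size(0))
    return F.cross_entropy(logits, targets)


h_llm = torch.randn(16, 256, requires_grad=True)   # stand-in for LLM embeddings
h_kg = torch.randn(16, 256, requires_grad=True)    # stand-in for GCN-encoded KG entities
loss = contrastive_distillation_loss(h_llm, h_kg)
loss.backward()                                    # 4. gradients flow back to both encoders
print(float(loss))
```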
The semantic-distillation contrastive loss realizes semantic alignment between the large language model and the knowledge graph, which can greatly improve the F1-score of zero-shot anomaly detection; the virtual scene generator synthesizes rare anomaly training data, greatly reducing labeling cost.
The embodiment also provides an intelligent video monitoring method of the intelligent video monitoring system based on deep learning, which comprises the following steps:
S1. Synchronous acquisition and spatiotemporal registration of multi-physical-field data;
S2. Constructing a dynamic feature hypergraph for cross-modal correlation analysis;
S3. Adaptive target detection based on online meta-learning;
S4. Applying causal inference to eliminate confounding bias in the tracking process;
S5. Performing behavior semantic analysis by combining physical laws with common-sense knowledge;
S6. Optimizing the global resource allocation strategy using a quantum annealing algorithm.
Wherein, step S2 includes:
establishing an electromagnetic-acoustic joint propagation model to correct multi-sensor data;
Modeling cross-modal high-order correlation by using hypergraph neural network;
redundant feature dimensions are eliminated by tensor decomposition (a sketch follows this list).
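A minimal sketch of the redundant-dimension elimination mentioned in the last item, using a truncated higher-order SVD as a stand-in; the rank choices are assumptions, since the patent does not specify which tensor decomposition is used.

```python
# Illustrative removal of redundant feature dimensions via truncated HOSVD (Tucker-style).
import numpy as np


def truncated_hosvd(tensor: np.ndarray, ranks: tuple) -> np.ndarray:
    """Project each mode of `tensor` onto its top singular vectors, compressing redundant dims."""
    core = tensor
    for mode, rank in enumerate(ranks):
        unfolded = np.moveaxis(core, mode, 0).reshape(core.shape[mode], -1)  # mode-n unfolding
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        u = u[:, :rank]                                                      # keep top-`rank` directions
        compressed = u.T @ unfolded                                          # project onto the subspace
        new_shape = (rank,) + tuple(np.delete(core.shape, mode))
        core = np.moveaxis(compressed.reshape(new_shape), 0, mode)
    return core


features = np.random.rand(32, 64, 10)          # e.g. (channels, feature dim, time steps)
compressed = truncated_hosvd(features, (16, 32, 10))
print(features.shape, "->", compressed.shape)  # (32, 64, 10) -> (16, 32, 10)
```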
The online meta-learning in step S3 includes:
constructing a meta-feature vector containing a scene complexity index;
Designing a neural process-based small sample adaptation mechanism;
and gradually increasing the detection difficulty by adopting a curriculum learning strategy.
The step S5 specifically includes:
Embedding Newtonian mechanics equations into the neural network to enforce physical compliance constraints;
constructing a behavior interpretation framework based on a causal mediation model;
applying a contrastive language-image pre-training (CLIP) model to implement natural language queries (see the sketch after this list).
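A minimal sketch of a CLIP-based natural language query over a surveillance frame, using the publicly available openai/clip-vit-base-patch32 checkpoint from Hugging Face as a stand-in; the query strings and the dummy frame are examples, not the patent's configuration.

```python
# Illustrative natural-language query over a surveillance frame with a CLIP model.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

queries = ["a person climbing over a fence", "an empty corridor", "a crowd of pedestrians"]
frame = Image.new("RGB", (224, 224))  # stand-in for a real surveillance frame

inputs = processor(text=queries, images=frame, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher logits mean the frame matches the query better.
probs = outputs.logits_per_image.softmax(dim=-1)
best = int(probs.argmax())
print(f"best match: '{queries[best]}' (p={probs[0, best]:.2f})")
```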
The intelligent video monitoring method further comprises the following steps:
Deploying a verifiable security module and adopting formal methods to guarantee the interpretability of system decisions;
establishing a digital twin simulation environment to realize system resilience testing under attack scenarios;
designing a blockchain-based model update verification mechanism to prevent adversarial attacks.
In the above intelligent video monitoring method, dynamic feature hypergraph modeling (S2) improves the efficiency of cross-modal correlation analysis and greatly reduces feature redundancy; the quantum annealing algorithm (S6) optimizes the resource allocation strategy and markedly reduces system energy consumption; and blockchain-based model verification greatly improves the success rate of defending against attacks.
The specific experimental data of the quantum annealing algorithm in resource scheduling are provided in the embodiment as follows:
Experimental setup
The scene is a large-scale transportation hub monitoring system (8 edge nodes and 1 cloud quantum computing node).
The comparison methods are as follows:
Traditional methods: Genetic Algorithm (GA) and Simulated Annealing (SA);
This scheme: an optimization algorithm based on a D-Wave 2000Q quantum annealer;
Optimization objective: balance task scheduling delay, energy consumption, and resource utilization.
Experimental results
Index | Genetic Algorithm (GA) | Simulated Annealing (SA) | Quantum annealing (this scheme)
Average task delay (ms) | 320 | 285 | 152
Total system energy consumption (kWh/day) | 18.7 | 16.2 | 9.8
Standard deviation of resource utilization | 0.34 | 0.29 | 0.15
Complex task completion rate (%) | 72% | 81% | 95%
Conclusion: through parallel optimization of global resource allocation, the quantum annealing algorithm reduces delay by 52.5% compared with GA; the quantum annealer solves the Ising model more efficiently, reducing energy consumption by 45.9%; and the standard deviation of resource utilization drops to 0.15, clearly outperforming the traditional methods.
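For context on how such a scheduling problem can be cast for an annealer, here is a toy QUBO formulation solved with a plain simulated-annealing loop; the cost coefficients, problem size, and penalty weight are invented for illustration only, and the D-Wave hardware API is deliberately not modeled.

```python
# Toy QUBO for assigning tasks to edge nodes, solved with classical simulated annealing.
import math
import random

N_TASKS, N_NODES = 6, 3
random.seed(0)
cost = [[random.uniform(1.0, 5.0) for _ in range(N_NODES)] for _ in range(N_TASKS)]
PENALTY = 10.0  # enforces "each task assigned to exactly one node"


def energy(x):
    """x[t][n] = 1 if task t runs on node n. Lower energy = better schedule."""
    e = sum(cost[t][n] * x[t][n] for t in range(N_TASKS) for n in range(N_NODES))
    for t in range(N_TASKS):                  # one-hot constraint as a quadratic penalty
        e += PENALTY * (sum(x[t]) - 1) ** 2
    return e


def simulated_annealing(steps=20000, t0=5.0):
    x = [[1 if n == 0 else 0 for n in range(N_NODES)] for _ in range(N_TASKS)]
    e = energy(x)
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-3
        t, n = random.randrange(N_TASKS), random.randrange(N_NODES)
        x[t][n] ^= 1                          # flip one binary variable
        e_new = energy(x)
        if e_new <= e or random.random() < math.exp((e - e_new) / temp):
            e = e_new                         # accept the move
        else:
            x[t][n] ^= 1                      # revert
    return x, e


schedule, best_energy = simulated_annealing()
print(best_energy, [row.index(max(row)) for row in schedule])
```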
The hardware implementation of the field coupling attention mechanism in this embodiment is as follows:
1. Hardware architecture design
Platform: Xilinx Versal ACAP FPGA (AI Engine + programmable logic);
The core module comprises:
Field coupling attention mechanism hardware architecture:
sensor interface unit:
Supports multimodal data input (HDMI 2.0 for the camera, I2S for the microphone, SPI for the millimeter-wave radar).
Time synchronization accuracy ± 1 μs.
Feature extraction engine:
Gabor wavelet group: 16-channel parallel filtering, frequency resolution 0.1 Hz.
3D-CNN acceleration kernel: supports dilated convolution, peak computing power 12 TOPS.
A field coupling calculation unit:
Customized tensor product operation module: supports 4D tensor operations (16 × 16 × 16).
Wave number vector generator: programmable azimuth parameters (θ, φ), precision 0.01°.
A storage subsystem:
On-chip HBM2 memory: 8 GB, bandwidth 460 GB/s.
Feature cache: double-buffered design supporting real-time data pipelining.
2. Resource consumption and performance
Resource / metric | Value | Description
LUTs | 63% | Logic operations and state machine control
DSP slices | 78% | Tensor product and wave number vector calculation acceleration
Block RAM | 45% | Feature caching and parameter storage
Power consumption | 23 W | Peak power consumption (@ 1.2 GHz)
Processing delay | 8 ms/frame | Real-time processing of a 1080p video stream
3. Implementation details
Tensor product operation optimization:
The Winograd algorithm is adopted to reduce computational complexity, cutting multiplication operations to 1/4 of the traditional method.
Wave number vector dynamic adjustment:
The θ and φ parameters are updated in real time by an on-chip microcontroller (ARM Cortex-R5), supporting online learning.
Real-scene test results of zero-shot anomaly detection
Scene | Anomaly type | F1-score | Accuracy | Recall | False alarm rate
Subway station (peak hours) | Crowd moving against the flow | 0.83 | 0.85 | 0.81 | 0.09
Night road (low light) | Illegal parking | 0.78 | 0.80 | 0.76 | 0.12
Mall entrance (occluded environment) | Suspicious abandoned object | 0.81 | 0.83 | 0.79 | 0.07
Intersection (rain and fog) | Pedestrian running a red light | 0.75 | 0.77 | 0.73 | 0.15
Comparative experiments
Method | Average F1-score | Labeled data requirement | Deployment cost
Supervised learning (Faster R-CNN) | 0.68 | 10,000+ labeled samples | High
CLIP zero-shot | 0.71 | 0 | Medium
Proposed method | 0.79 | 0 | Low
Conclusions
Cross-scene robustness: under complex conditions such as low illumination and occlusion, the F1-score remains ≥ 0.75.
Cost advantage: zero-shot learning reduces labeling cost by 100%, and deployment cost is 60% lower than supervised learning.
Real-time performance: edge-side inference delay is ≤ 50 ms, meeting real-time monitoring requirements.
In summary, the quantum annealing algorithm markedly improves resource scheduling efficiency through global optimization, reducing delay and energy consumption by more than 45%; the FPGA implementation of the field-coupled attention mechanism achieves real-time processing at 8 ms/frame with 23 W power consumption; and zero-shot anomaly detection achieves an average F1-score of 0.79 across four real scenes, verifying the practicality and robustness of the scheme.
In summary, the intelligent video monitoring system based on deep learning provided by this embodiment has the following advantages:
Multimodal perception enhancement: electromagnetic-acoustic-millimeter-wave multi-physical-field fusion greatly improves environmental adaptability;
Dynamic self-adaptation: the network structure, detection model, and tracking strategy evolve in real time, significantly shortening the scene-switching response time;
Complex behavior analysis: zero-shot anomaly detection combined with causal reasoning greatly reduces the false alarm rate;
Resource efficiency optimization: quantum-classical collaborative computing and edge-cloud resource scheduling markedly reduce computing power requirements;
Security and trustworthiness: formal verification and blockchain storage greatly improve the system's resistance to attacks.
Working principle flow
1. Data acquisition and alignment:
heterogeneous sensors (visible light/infrared cameras, microphone arrays, millimeter wave radars) synchronously acquire multiple physical fields of data.
The spatiotemporal registration module aligns the timestamps and spatial coordinates of the multi-source data.
2. Feature fusion and detection:
The field-coupled attention mechanism fuses electromagnetic-acoustic features to generate a cross-modal joint representation.
The dynamic evolutionary detection model is used for adaptively adjusting the network structure and outputting a target detection result and confidence.
3. Target tracking and reasoning:
the causal reasoning engine predicts trajectories based on the improved Hamiltonian equations and optimizes paths in combination with the social potential energy.
The meta-knowledge enhancement module identifies abnormal behavior by aligning the language model with the knowledge graph through the contrastive loss.
4. Resource allocation and optimization:
The quantum annealing algorithm optimizes computing resource allocation: the edge side performs real-time detection and the cloud updates the model.
Blockchain verification ensures secure model updates, and the digital twin environment tests the resilience of the system.
The innovations of the invention are as follows:
Interdisciplinary technology fusion: the Maxwell equations, quantum computing, and causal reasoning are introduced into video analysis, breaking through the limitations of traditional algorithms.
Dynamic self-adaptive architecture: real-time optimization of the network structure and detection strategy is realized based on meta-learning and neural architecture search.
Knowledge-data dual driving: collaborative reasoning between the large language model and the knowledge graph reduces the dependence on annotated data.
Safe and efficient computing: the quantum-classical hybrid architecture and blockchain verification balance efficiency and security.
In summary:
Through multi-dimensional technical innovation, the invention constructs an intelligent video monitoring system with autonomous evolution capability that significantly surpasses traditional schemes in perception capability, reasoning accuracy, resource efficiency, and security, meeting the high-standard requirements of scenarios such as smart cities and security.
The foregoing is merely illustrative of the preferred embodiments of the present invention and is not intended to limit the embodiments and scope of the present invention, and it should be appreciated by those skilled in the art that equivalent substitutions and obvious variations may be made using the description and illustrations of the present invention, and are intended to be included in the scope of the present invention.

Claims (10)

(Translated from Chinese)

1. An intelligent video surveillance system based on deep learning, characterized by comprising:
a heterogeneous sensor array (110), comprising a reconfigurable visible light/infrared dual-mode camera group, a distributed microphone array, and a millimeter-wave radar, configured to generate multi-physical-field perception data;
a cognitive-driven feature fusion module (120), adopting a spatiotemporal-spectral joint coding technique and integrating a three-dimensional convolutional network with a graph attention mechanism to achieve cross-modal feature interaction;
a dynamically evolvable detection model (130), comprising a meta-learning framework based on neural architecture search, which can automatically adjust the network depth and width according to the complexity of the scene;
a causal reasoning tracking engine (140), constructing a spatiotemporal causal graph model and fusing the target kinematic equations with a social force field model for trajectory prediction;
a meta-knowledge enhanced behavior analysis module (150), integrating a pre-trained large language model and a domain knowledge graph to achieve zero-shot abnormal behavior reasoning;
a quantum-classical hybrid computing architecture (160), performing real-time detection at the classical computing layer and optimizing complex behavioral patterns at the quantum simulation layer.

2. The deep learning-based intelligent video surveillance system according to claim 1, characterized in that the cognitive-driven feature fusion module (120) comprises:
a spectrum sensing submodule (121), which uses a tunable Gabor wavelet group to extract time-frequency domain features;
a spatiotemporal graph construction unit (122), which models the multi-target motion trajectories as a dynamic heterogeneous graph;
a field-coupled attention mechanism (123), which realizes electromagnetic-acoustic feature fusion through a feature propagation algorithm inspired by the Maxwell equations, its cross-modal feature propagation process satisfying:
Formula 1:
Formula 2:
where:
H represents the acoustic feature tensor;
Jv is the visual feature flow density;
D is the fused feature tensor;
E is the electromagnetic feature tensor;
ε0 is the vacuum permittivity adjustment factor;
Pa is a learnable cross-modal projection matrix;
⊗ represents the tensor product operation;
Fv is the visual feature tensor;
the wave number vector calculation of the field-coupled attention mechanism satisfies:
Formula 3:
where ω is the normalized frequency parameter of the feature channel;
c is the acoustic-to-optical propagation speed ratio adjustment factor;
θ and φ are learnable azimuth parameters.

3. The deep learning-based intelligent video surveillance system according to claim 1, characterized in that the dynamically evolvable detection model (130) comprises:
a super network controller (131), which dynamically generates a detection network structure adapted to the current scene based on reinforcement learning;
a multi-physics anchor frame generator (132), which generates three-dimensional detection anchor points by combining thermal radiation characteristics and acoustic wave propagation characteristics;
an uncertainty-aware output layer (133), configured with a Monte Carlo Dropout mechanism to quantify detection confidence.

4. The deep learning-based intelligent video surveillance system according to claim 1, characterized in that the causal reasoning tracking engine (140) comprises:
a counterfactual trajectory prediction unit (141), which constructs virtual intervention scenarios to calculate causal effects;
a social relationship modeler (142), which uses a graph neural network to learn implicit interaction rules between targets;
an energy function optimizer (143), which solves the optimal trajectory hypothesis based on the Hamiltonian Monte Carlo method;
wherein the trajectory prediction of the causal reasoning tracking engine (140) satisfies the improved Hamiltonian equations:
where:
q is the target position vector;
p is the momentum vector;
Vsocial is the social potential energy term;
Φscene is the scene constraint potential energy;
αj is the interaction strength coefficient.

5. The deep learning-based intelligent video surveillance system according to claim 1, characterized in that the meta-knowledge enhanced behavior analysis module (150) comprises:
a semantic distillation unit (151), which transfers the common-sense reasoning ability of the large language model to a lightweight classifier;
a causal discovery engine (152), which identifies potential risk factors in the scene through invariance testing;
a virtual scene generator (153), which synthesizes rare abnormal-event training samples based on a generative adversarial network;
the semantic distillation unit of the meta-knowledge enhanced behavior analysis module (150) executes a contrastive loss function:
cosine similarity formula:
where:
hLLM ∈ R^d is the d-dimensional embedding vector output by the large language model;
hkg ∈ R^d is the feature vector of the knowledge-graph entity after being encoded by the graph neural network;
τ ∈ (0, 1] is the temperature hyperparameter;
K is the batch size.

6. An intelligent video surveillance method based on the deep learning-based intelligent video surveillance system of any one of claims 1-5, characterized by comprising the following steps:
S1. Synchronous acquisition and spatiotemporal registration of multi-physical-field data;
S2. Constructing a dynamic feature hypergraph for cross-modal correlation analysis;
S3. Adaptive object detection based on online meta-learning;
S4. Applying causal inference to eliminate confounding bias in the tracking process;
S5. Combining physical laws with common-sense knowledge for behavioral semantic analysis;
S6. Using a quantum annealing algorithm to optimize the global resource allocation strategy.

7. The intelligent video surveillance method according to claim 6, characterized in that step S2 comprises:
establishing an electromagnetic-acoustic joint propagation model for multi-sensor data correction;
applying a hypergraph neural network to model cross-modal high-order relationships;
eliminating redundant feature dimensions through tensor decomposition.

8. The intelligent video surveillance method according to claim 6, characterized in that the online meta-learning in step S3 comprises:
constructing a meta-feature vector containing scene complexity indicators;
designing a few-shot adaptation mechanism based on neural processes;
adopting a curriculum learning strategy to gradually increase the detection difficulty.

9. The intelligent video surveillance method according to claim 6, characterized in that step S5 specifically comprises:
embedding Newtonian mechanics equations into the neural network for physical compliance constraints;
constructing a behavioral explanation framework based on a causal mediation model;
applying a contrastive language-image pre-trained model to implement natural language queries.

10. The intelligent video surveillance method according to claim 6, characterized by further comprising:
deploying a verifiable security module and using formal methods to ensure the explainability of system decisions;
establishing a digital twin simulation environment to implement system resilience testing under attack scenarios;
designing a blockchain-based model update verification mechanism to prevent adversarial attacks.
CN202510405856.5A | 2025-04-02 | 2025-04-02 | An intelligent video surveillance system and method based on deep learning | Pending | CN120263940A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202510405856.5A (CN120263940A) | 2025-04-02 | 2025-04-02 | An intelligent video surveillance system and method based on deep learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202510405856.5A (CN120263940A) | 2025-04-02 | 2025-04-02 | An intelligent video surveillance system and method based on deep learning

Publications (1)

Publication Number | Publication Date
CN120263940A | 2025-07-04

Family

ID=96177032

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202510405856.5A | An intelligent video surveillance system and method based on deep learning (Pending, CN120263940A) | 2025-04-02 | 2025-04-02

Country Status (1)

Country | Link
CN (1) | CN120263940A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN120595254A (en)* | 2025-08-07 | 2025-09-05 | 安徽星太宇科技有限公司 | High-computing-power SAR real-time imaging and target recognition system based on FPGA+GPU architecture


Similar Documents

Publication | Title
US12175343B2 (en) | Virtual intelligence and optimization through multi-source, real-time, and context-aware real-world data
CN114863226A (en) | A cyber-physical system intrusion detection method
CN120263940A (en) | An intelligent video surveillance system and method based on deep learning
Huang et al. | Aircraft trajectory prediction based on bayesian optimised temporal convolutional network–bidirectional gated recurrent unit hybrid neural network
Liu et al. | Video image target monitoring based on RNN-LSTM
CN120012164B (en) | Privacy protection method for multi-mode pedestrian behavior monitoring
Unal et al. | Towards robust autonomous driving systems through adversarial test set generation
Xu et al. | Integration of mixture of experts and multimodal generative ai in internet of vehicles: A survey
Ranjith et al. | Robust deep learning empowered real time object detection for unmanned aerial vehicles based surveillance applications
Song et al. | Visibility estimation via deep label distribution learning in cloud environment
Ji et al. | Learning the dynamics of time delay systems with trainable delays
Bhaumik et al. | STLGRU: Spatio-temporal lightweight graph GRU for traffic flow prediction
Yi et al. | V2IViewer: Towards efficient collaborative perception via point cloud data fusion and vehicle-to-infrastructure communications
Zhang et al. | Obstacle-transformer: A trajectory prediction network based on surrounding trajectories
Kateb et al. | Archimedes Optimization with Deep Learning Based Aerial Image Classification for Cybersecurity Enabled UAV Networks
Shang et al. | [Retracted] Human-Computer Interaction of Networked Vehicles Based on Big Data and Hybrid Intelligent Algorithm
Hashemi et al. | A new comparison framework to survey neural networks-based vehicle detection and classification approaches
Alotaibi et al. | Enhancing Security in IoT-Assisted UAV Networks Using Adaptive Mongoose Optimization Algorithm With Deep Learning
Wu et al. | Small target recognition method on weak features
Qaffas | AI-driven distributed IoT communication architecture for smart city traffic optimization
Li et al. | An effective vehicle road scene recognition based on improved deep learning
Ye et al. | A deep convolution neural network fusing of color feature and spatio-temporal feature for smoke detection
Quran | Efficient and effective anomaly detection in autonomous vehicles: a combination of gradient boosting and ANFIS algorithms
Long et al. | SDDNet: Infrared small and dim target detection network
Liu et al. | A vehicle detection model based on 5G-V2X for smart city security perception

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
