Movatterモバイル変換


[0]ホーム

URL:


CN119337197A - A group behavior analysis method based on multimodal information fusion and dynamic updating - Google Patents

A group behavior analysis method based on multimodal information fusion and dynamic updating
Download PDF

Info

Publication number
CN119337197A
CN119337197ACN202411207783.0ACN202411207783ACN119337197ACN 119337197 ACN119337197 ACN 119337197ACN 202411207783 ACN202411207783 ACN 202411207783ACN 119337197 ACN119337197 ACN 119337197A
Authority
CN
China
Prior art keywords
data
feature
features
group
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411207783.0A
Other languages
Chinese (zh)
Inventor
李树栋
张欣
吴晓波
方滨兴
曲春屹
姚明俊
冯依林
罗文伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou UniversityfiledCriticalGuangzhou University
Priority to CN202411207783.0ApriorityCriticalpatent/CN119337197A/en
Publication of CN119337197ApublicationCriticalpatent/CN119337197A/en
Pendinglegal-statusCriticalCurrent

Links

Classifications

Landscapes

Abstract

Translated fromChinese

本发明公开了一种多模态信息融合与动态更新的群体行为分析方法,包括:信息整合和动态更新,获取多模态信息并对信息进行预处理,提取特征,对特征进行表示转换、对齐及融合,采用流处理框架进行多模态信息的实时处理和动态更新;引入自学习机制和自优化机制;群体画像生成,从多模态数据中提取人口统计特征、地理位置分布、行为和情感心理特征、社会关系、社交媒体以及经济特征,生成群体画像;群体行为分析;输出可视化,以可视化的方式输出群体行为预测结果,包括行为趋势图、群体聚类图、关联规则图以及异常行为检测图。本发明方法有效整合和分析来自多种来源和形式的多模态数据,引入自学习和自优化机制,最终进行群体行为分析。

The present invention discloses a method for analyzing group behavior by fusion and dynamic update of multimodal information, including: information integration and dynamic update, obtaining multimodal information and preprocessing the information, extracting features, performing representation conversion, alignment and fusion on the features, using a stream processing framework for real-time processing and dynamic update of multimodal information; introducing a self-learning mechanism and a self-optimization mechanism; generating group portraits, extracting demographic characteristics, geographic location distribution, behavior and emotional psychological characteristics, social relations, social media and economic characteristics from multimodal data, generating group portraits; group behavior analysis; output visualization, outputting group behavior prediction results in a visualized manner, including behavior trend graphs, group clustering graphs, association rule graphs and abnormal behavior detection graphs. The method of the present invention effectively integrates and analyzes multimodal data from multiple sources and forms, introduces self-learning and self-optimization mechanisms, and finally performs group behavior analysis.

Description

Group behavior analysis method for multi-mode information fusion and dynamic update
Technical Field
The invention belongs to the technical field of group portraits, and particularly relates to a group behavior analysis method for multi-mode information fusion and dynamic update.
Background
Group portraits have become an important tool in modern information society to understand and analyze specific group behaviors, features, and motivations. The group portrait technology is widely applied to the fields of market research, social media analysis, safety information, public policy formulation and the like. Through group portrayal, organizations and institutions are able to learn more about the internal structure, behavioral patterns, demographics and psychological characteristics of a particular group to make more efficient strategies and decisions.
Natural Language Processing (NLP) and Social Network Analysis (SNA) are important technological means to construct group portraits. NLP technology can extract useful information from unstructured text data, such as emotional tendency, topic attention, views, etc., while SNA technology can reveal relational networks, information propagation paths, key characters, etc. inside a group. However, these prior arts rely on a single data type, may not fully capture the complexity and diversity of group behaviors, may not fully utilize information from multiple data sources, lack the ability for comprehensive analysis, have difficulty fully capturing diverse behavioral patterns and emotional expressions of groups, and exhibit low robustness in the face of data quality problems (e.g., noise, missing data), may be limited in prediction and decision accuracy, and may not fully utilize information in multi-modal data to improve analysis accuracy.
The data for a population typically comes from a variety of sources and forms, including structured data (e.g., demographic information), unstructured data (e.g., text content), semi-structured data (e.g., log files), and multimedia data (e.g., images, audio, video). How to effectively fuse these multi-modal data to form a comprehensive and accurate group representation is the focus of current technological development.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art and provides a group behavior analysis method for multi-mode information fusion and dynamic update.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A multi-mode information fusion and dynamic update group behavior analysis method comprises the following steps:
Information integration and dynamic updating, namely acquiring multi-modal information, preprocessing the information, extracting characteristics, performing representation conversion, alignment and fusion on the characteristics, and performing real-time processing and dynamic updating on the multi-modal information by adopting a stream processing frame;
Introducing a self-learning mechanism and a self-optimizing mechanism;
generating a group portrait, namely extracting demographic characteristics, geographical position distribution, behavioral and emotional psychological characteristics, social relations, social media and economic characteristics from multi-mode data to generate the group portrait;
Group behavior analysis;
And outputting visualization, and outputting group behavior prediction results in a visual mode, wherein the group behavior prediction results comprise a behavior trend graph, a group clustering graph, an association rule graph and an abnormal behavior detection graph.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method is based on a multi-modal data integration, self-learning and self-adaptive optimization mechanism, and based on feature extraction, representation and alignment of multi-modal data of texts, images, audios and videos, the multi-modal data are subjected to deep fusion by combining technical means such as feature splicing, weighted summation and PCA dimension reduction, real-time processing and dynamic updating of the multi-modal data are realized through a stream processing framework APACHE FLINK, emotion analysis is performed by utilizing a BERT model, group behavior prediction is performed by adopting K-means clustering and LSTM time sequence analysis, and reliable decision support is provided for the fields such as marketing, public safety and network safety.
2. The method provides a solution in the aspects of feature extraction and unified representation of multi-mode data such as texts, images, audios and videos, the lack of a unified feature representation method in the prior art leads to insufficient data fusion, and the method realizes depth fusion by carrying out feature extraction and unified normalization on models such as BERT, CNN, MFCC and LSTM, thereby remarkably improving the precision and depth of data integration.
3. The invention adopts APACHE FLINK flow processing frame to realize the real-time processing and dynamic updating of the multi-mode data, and the prior art generally adopts batch processing method, which is difficult to reflect the latest data change in real time.
4. The invention has the capabilities of automatically labeling new data, self-updating a model, learning feedback mechanism, super-parameter optimization, model architecture optimization and computing resource optimization, is less in application and needs manual intervention in the aspects of self-learning and self-optimization in the prior art, and realizes highly intelligent and dynamically adaptive data analysis through technologies of on-line gradient descent, genetic algorithm, neural architecture search and the like.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a flow chart of information integration and dynamic update of the present invention;
FIG. 3 is a self-learning and self-optimizing mechanism analysis flow chart of the present invention;
FIG. 4 is a flow chart of population representation generation and population behavior analysis of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Examples
As shown in FIG. 1, the method for analyzing the group behavior of the multi-modal information fusion and dynamic update comprises the following steps:
s1, information integration and dynamic updating are carried out, multi-mode information is obtained, preprocessing is carried out on the information, characteristics are extracted, representation conversion, alignment and fusion are carried out on the characteristics, and a stream processing frame is adopted for carrying out real-time processing and dynamic updating on the multi-mode information;
As shown in fig. 2, in this embodiment, multi-mode information is obtained and preprocessed, specifically:
Collecting multimodal data from a plurality of data sources including text, images, audio and video, text data in the form of Ti, image data in the form of Ii, audio data in the form of Ai, and video data in the form of Vi;
The method comprises the steps of cleaning and standardizing different types of data, wherein the text data is processed by the following steps:
Dividing words from a text, removing stop words, adding special marks, converting words into corresponding IDs, and regenerating an attention mask, namely generating a vector with the same length as a word ID sequence, indicating which positions are valid words and which are filled, finally generating a segment ID, marking words belonging to a first sentence and words belonging to a second sentence, and finally outputting three vectors, namely Tiid=(id1,id2…idsequence_length), the attention mask and the segment ID, wherein Tiid is the generated vector with the same length as the word ID sequence, and sequence_length is the word sequence length;
The processing procedure of the image data is as follows:
The method comprises the steps of adjusting the size of an image, wherein the output shape is H multiplied by W multiplied by C, H and W are the sizes of target pictures, and C is the number of channels;
according to the formula:
normalizing the adjusted image, wherein,For an image resized to an image, μ is the mean of the image dataset, σ is the standard deviation, and the final output form is the following matrix:
The processing process of the audio data is as follows:
the audio spectrum is obtained using a fast fourier transform according to the following formula:
Wherein,Is an audio signal in the frequency domain, f is the frequency, and N is the number of sampling points;
Denoising using spectral subtraction using the following formula:
Wherein,Is the noise spectrum estimated in the mute section;
Finally obtainThe form is as follows:
Aidenoised=(a1denoised,a2denoised…andenoised)
Each element corresponds to an amplitude value of the denoised audio signal at a time k, and n represents the total length of the denoised audio signal;
The processing process of the video data is as follows:
Video decoding is carried out on the video, the video stream is decoded into a plurality of independent frames, a frame sequence Videcoded is output, the shape is NXHXW XC, wherein N is the number of frames, H and W are the height and width of the frames, and C is the number of color channels;
Frame sampling, namely sampling key frames from a frame sequence, and outputting a sampled frame sequence Visampled;
performing frame pretreatment, and performing size adjustment and normalization operation on each frame to enable the frame to be suitable for model input;
Finally, the preprocessed frame sequence Vipreprocessed is output, and the output form is as follows:
in this embodiment, the text feature extraction is specifically:
Extracting semantic features from text data by using natural language processing NLP, inputting the processed text into a BERT model, wherein the BERT input comprises word IDs, attention masks and segmentation IDs, the input word ID sequence Tiid is firstly converted into word vectors with fixed dimensions through an embedding layer, the shape E epsilon Rsequence_length×hidden_size and the hidden_size are hidden layer dimensions, and then each input vector is calculated in the model through a multi-head self-attention mechanism, and the dependency relationship between the input vector and other positions in the sequence is calculated according to the following formula:
Wherein Q, K, V are respectively a matrix of queries, keys and values, obtained by different linear transformations, dk is a scaling factor, and then the multiple self-attentive outputs are spliced together:
MultiHead(Q,K,V)=Concat(head1,…,headh)WO
Wherein Concat () is a concatenation function, headi=Attention(Qi,Ki,Vi), and finally the vector of each position passes through a feedforward neural network to further process its semantic information:
FFN(x)=max(0,xW1+b1)W2+b2
Wherein W1,W2 is a learnable weight matrix, b1,b2 is a bias term, and x is the input of the last step;
And carrying out residual connection on the output of each layer and the input of each layer, and carrying out layer normalization, wherein the formula is as follows:
LayerNorm(x+SubLayer(x))
Finally outputting two vectors, the first is a sequence feature matrix, a three-dimensional tensor provides a vector representation of each word in the sequence, and the second is a sentence feature vector, a two-dimensional tensor provides an overall vector representation of the whole sequence;
The extraction of image features is specifically as follows:
the method comprises the steps of extracting visual features from image data by utilizing a convolutional neural network CNN deep learning method, inputting a preprocessed image, and firstly performing convolutional operation:
Wherein Fij is a pixel value of the convolution feature map, Wmnc is a weight of the convolution kernel, b is an offset term, Hf and Wf are a height and a width of the convolution kernel, respectively, and C is a channel number of the input image;
Outputting a convolution characteristic diagram F with the shape ofWherein Nf is the number of convolution kernels, representing the depth of the output feature map, i.e. the number of channels;
processing by a ReLU nonlinear activation function to introduce nonlinear characteristics:
Aij=ReLU(Fij)=max(0,Fij)
Wherein Aij is a pixel value in the activated feature map, and the shape of the activated feature map A is the same as that of the convolution feature map;
The feature map is downsampled by using the maximum pooling, the space dimension of the feature map is reduced, and important feature information is reserved at the same time:
Pij=max(A(i,j),A(i+1,j),A(i,j+1),A(i+1,j+1))
the feature map P after pooling has the shape ofWherein H '' x W '' is the height and width after pooling;
after multi-layer convolution, activation and pooling, the extracted feature map is typically flattened into a one-dimensional vector, i.e., Pflat =flat (P), and input to the fully connected layer, further combining and extracting global features:
z=Wfc·Pflat+bfc
Wherein Wfc is the weight matrix of the full connection layer, and bfc is the bias term;
Finally outputting the feature vector of the image;
the audio feature extraction is specifically as follows:
Extracting sound characteristics from the audio data by adopting a Mel Frequency Cepstrum Coefficient (MFCC), and performing short-time Fourier transform on the processed audio:
Where m is the frame index, f is the frequency, N is the number of samples per frame, hop_size is the frame shift, and the frequency domain signal is outputThe shape is a two-dimensional matrix, and the two-dimensional matrix contains spectrum information of a plurality of frames;
applying mel frequency scale:
Wherein, f0 is the normal frequency, fmel is the Mel frequency, and output Mel frequency spectrum Aimel(m,fmel which represents the frequency spectrum information of the audio signal on Mel scale;
Calculating a mel frequency cepstral coefficient:
where c is the coefficient index of the MFCC, taking the first 13 coefficients, i.e., c=13;
the final extracted MFCC features are represented as a two-dimensional array;
The video feature extraction is specifically as follows:
Extracting key frame characteristics and dynamic information from video data by combining image characteristic extraction and action recognition technology, firstly extracting static characteristics of the processed video by using CNN to obtain a static characteristic direction z of each frame, wherein the shape is z epsilon Rfeature_size, then extracting dynamic characteristics by using LSTM, for each time step, sequentially processing input characteristics zt by using LSTM, and combining hidden state ht-1 and cell state Ct-1 of a preamble time step to generate hidden state ht and cell state Ct of the current time step, wherein the processing of each time step is as follows:
ht=ot·tanh(Ct)
finally, static characteristics and dynamic characteristics are obtained.
In this embodiment, the feature is subjected to representation conversion, alignment and fusion, specifically:
converting the extracted features into unified vector representation so as to facilitate comparison and fusion between different modality data;
The method comprises the steps of synchronizing data of different modes in time and space through a time stamp, an event mark or other alignment modes, ensuring the relevance between the data, aligning the modes with time dimension, namely video and audio, according to the minimum time step in the time aspect, aligning the modes based on space anchor points in the space aspect, and normalizing by using zero mean unit variance:
Wherein μX is the mean of the feature vectors, σX is the standard deviation;
The final output normalized feature vector Tnorm、Inorm、Anorm、Vnorm;Tnorm、Inorm、Anorm、Vnorm is the feature vector of the corresponding four modes output after normalization by using zero mean unit variance;
The feature fusion specifically comprises the following steps:
Splicing the feature vectors of different modes together to form a comprehensive feature vector Fconcat∈Rbatch_size×(dimT+dimI+dimA+dimV), wherein the dimension of the new feature vector after splicing is the sum of the dimensions of all modes, the batch_size is the number of data samples processed during each training or reasoning, the specific numerical value is determined according to the actual situation, and the meaning of dimT, dimI, dimA, dimV is the dimension of the feature vector of four modes;
according to the importance of different modes, the weighted summation is carried out on each characteristic vector:
Fweighted=wT·Tnorm+wI·Inorm+wA·Anorm+wV·Vnorm
the weighted comprehensive feature vector Fweighted∈Rbatch_size×(dimT+dimI+dimA+dimV) forms a comprehensive feature, and the weight is determined according to experience and a use scene;
Finally, performing dimension reduction processing, namely performing dimension reduction on the high-dimension features by using PCA to reduce calculation complexity and noise, wherein firstly, input data is subjected to standardization processing:
where μ is the mean of each feature and σ is the standard deviation, normalized data is used to calculate a covariance matrix describing the linear correlation between features:
Outputting a covariance matrix C, and carrying out eigenvalue decomposition on the covariance matrix to obtain eigenvalues and corresponding eigenvectors, wherein the eigenvalues represent the variance of the data in the direction of each eigenvector:
Cvi=λivi
Where λi is the ith eigenvalue and vi is the corresponding eigenvector;
And outputting a eigenvalue vector lambda= [ lambda12,…,λdim ] and an eigenvector matrix V epsilon Rdim×dim, and selecting eigenvectors corresponding to the first n largest eigenvalues as principal components according to the size ordering of the eigenvalues:
Vpca=[v1,v2,…,vn]
where n is the number of principal components selected, determined from the cumulative variance contribution;
projecting the standardized data onto the selected principal component space to obtain a feature representation after dimension reduction:
Fpca=FstdVpca
And finally outputting a matrix of low-dimensional characteristic representations, wherein the matrix comprises the representation of the projection of the original high-dimensional data to the principal component directions, and the data of each sample has a projection value in the principal component directions, and the projection values form new characteristics Fpca∈Rbatch_size×n after the dimension reduction, wherein n is the target dimension after the dimension reduction.
In this embodiment, a stream processing framework is adopted to perform real-time processing and dynamic update of multi-mode information, specifically:
real-time processing and dynamic updating of the multi-modal information is performed using a stream processing framework APACHE FLINK;
Carrying out real-time cleaning and feature extraction on the accessed data stream through the Flink, and assuming that the weight of the model is W (t), expressing the dynamically updated model weight as:
The data streams are aggregated and analyzed in real time through the Flink, the Flink aggregates the data streams in a specific time window through window operation, and the aggregation characteristics in the time window [ t, t+delta t ] are expressed as follows:
generating real-time analysis results by classifying, clustering or other analysis operations on the aggregated features;
the Flink iterative operation continuously optimizes the model or the processing strategy, and updates the parameters of the model or adjusts the feature extraction method through the feedback analysis result, so that the system performs better when processing the next batch of data, and the iterative updating process is expressed as follows:
Wherein,Is the feature after the kth iteration, and α is the update step.
S2, introducing a self-learning mechanism and a self-optimizing mechanism;
as shown in fig. 3, in this embodiment, the introduction of the self-learning mechanism specifically includes:
marking and deducing new data automatically by using trained BERT model, CNN and LSTM, and generating label or classification result automatically;
The self-updating of the model is specifically as follows:
The method comprises the steps of updating model parameters when new data arrives through an online gradient descent method, adapting to dynamic changes of the data, updating weights of the new text data by using online learning expansion of BERT when the new text data arrives, inputting the new data Xnew, a real label ynew and a current model parameter W, and updating a formula through online gradient descent:
wherein, eta is the learning rate,Is the gradient of the loss function with respect to the model parameters. Outputting updated model parameters Wnew;
The learning feedback mechanism is specifically as follows:
Comparing the predicted label with the real label, adjusting the self-adaptive learning rate and the self-adaptive loss function weight, and finally obtaining the adjusted learning rate, loss function or other super parameters;
Input model predictive labelsAnd a real label ytrue, according to the prediction error, adjusting the learning rate and the weight of the loss function, wherein the error calculation formula is as follows:
the learning rate adjustment formula is:
ηnew=ηold·(1-α·ε2)
wherein α is an adjustment coefficient;
and (3) carrying out weight adjustment of a loss function:
λnew=λold+β·ε
Wherein β is an adjustment coefficient;
The adjusted learning rate ηnew and the loss function weight λnew are output.
In this embodiment, the introduction of the self-optimization mechanism specifically includes:
super-parameter optimization, wherein the super-parameter optimization aims at finding out a super-parameter combination which can optimize the performance of the model in a given super-parameter space;
Defining super parameters to be optimized and possible value ranges { theta12,...,θn }, simulating natural evolution to select optimal parameters by using a genetic algorithm, and repeating the processes of selection, crossing and mutation until convergence conditions or a preset maximum algebra are reached;
Model architecture optimization, specifically:
Defining the structural parameters of the neural network, evaluating the neural network architecture Ai randomly or by strategy selection from the search space S:
Wherein,The method is based on a model of a framework Ai, G (Ai) is average loss on a verification set, and the framework A* with the best performance on the verification set is selected as a final optimal framework through multiple sampling and evaluation;
the computing resource optimization is specifically as follows:
The method comprises the steps of firstly inputting the current computing resource state and task demands, including the utilization rate of a CPU and a GPU, the utilization rate of a memory and the priority of a task, adopting a task scheduling algorithm and resource monitoring and allocation to adjust resource allocation, monitoring the utilization rate of the resource in real time, dynamically adjusting the resource allocation to deal with the change of a system and the execution condition of the task, and finally outputting the optimized resource allocation and task scheduling strategy to improve the overall efficiency and response speed of the system.
S3, generating a group portrait, namely extracting demographic characteristics, geographical position distribution, behavioral and emotional psychological characteristics, social relations, social media and economic characteristics from the multi-mode data to generate the group portrait;
as shown in fig. 4, in this embodiment, the demographic feature extraction is specifically:
extracting information including, but not limited to, age, gender, and occupation from text data using a NER model based on conditional random fields, the formula is as follows:
Wherein y is a tag sequence, namely, extracted demographic characteristics, x is an input sequence, namely, text data, fk (y, x) is a characteristic function, lambdak is a weight, Z (x) is a normalization factor, and probability distribution normalization is ensured;
The extracted demographic characteristics generate a demographic characteristic vector Di containing the numerical values of various information;
The geographic position distribution is extracted specifically as follows:
Converting the position described by the text into geographic coordinates by using a geographic coding technology, calculating the square sum WCSS in the minimized cluster by using a K-means clustering method, and identifying a main active area;
The extraction of behavioral and emotional psychological characteristics is specifically as follows:
user behavior is analyzed using time series analysis and frequent pattern mining techniques, and user behavior patterns are analyzed and predicted using an autoregressive moving average model ARIMA:
xt=c+φ1xt-12xt-2+...+φpxt-p+∈t1t-1+...+θqt-q
Where c is a constant term, phip is an autoregressive coefficient, thetaq is a moving average coefficient, Et is an error term;
Mining frequent patterns from behavior data by using an Apriori algorithm, and identifying behavior combinations which occur frequently; the identified behavior pattern and frequent behavior combination generates a behavior feature vector Bi which represents the features of the user on the behavior pattern;
Analyzing emotion tendencies in text data by using an emotion analysis model BERT, extracting main topics in the text data by using an LDA topic modeling method, analyzing attention points and attitudes of groups, and evaluating psychological characteristics of users by combining an OCEAN model based on text analysis.
The social relationship feature extraction is specifically as follows:
Constructing a graph structure, defining nodes and edges, converting social network data into the graph structure, wherein the nodes represent individuals (such as users and devices), and the edges represent relationships (such as friend relationships and communication records) among the individuals;
Identifying a tightly connected node population in a social network based on modular community detection:
Wherein Aij is an element of an adjacency matrix, which indicates whether a connection exists between nodes i and j, ki and kj are degrees of the nodes i and j, m is the total number of edges in the graph, delta (ci,cj) is an indication function, and when i and j belong to the same social area, the value is 1, otherwise, the value is 0;
the social media and economic characteristics are extracted specifically as follows:
Analyzing the media type and frequency of the user contact, analyzing the source and preference of the user acquired information, extracting the expenditure type and the amount from the consumption record, and analyzing the consumption trend;
After demographic characteristics, geographical position distribution, behavioral and emotional psychological characteristics, social relations, social media and economic characteristics are extracted, the group portraits are intuitively displayed in a bar graph, a pie chart, a histogram and a network chart mode, the behavioral trends of the groups are analyzed, potential risks and challenges are estimated, and an entry for downloading analysis reports is provided.
S4, group behavior analysis, as shown in FIG. 4, in this embodiment, the method specifically includes:
Group clustering, specifically:
The number of clusters is determined using the elbow method, and the appropriate number of clusters is determined by calculating the intra-cluster squares sum WCSS at different cluster numbers k to find the k value, i.e., the elbow point, that significantly reduces WCSS:
Wherein Cj is the j-th cluster, muj is the mass center of the cluster, clustering is carried out by using a K-means algorithm to obtain a result so as to analyze group behaviors, cluster centers provide a concise description of the overall characteristics of the cluster, intra-cluster differences represent the consistency and diversity of the characteristics of members in the group, the inter-cluster differences are measured by the distances between different cluster centers, and larger inter-cluster differences mean that the characteristics of different groups are obviously different;
time series and emotion feature analysis, specifically:
The method comprises the steps of taking time sequence features as input, dividing a group into different behavior pattern groups by using a K-means clustering algorithm, carrying out contrast analysis on the groups of different clusters, identifying feature differences of the groups, and analyzing causal relations among the time sequence features by using a method of the Granges causal test, wherein a core formula of the Granges causal test is as follows:
If betaj is obviously different from zero, Xt is considered as the Grandide cause of Yt, finally, the LSTM time sequence model is used for predicting the future behavior trend of the group, and the BERT model is used for emotion analysis to obtain emotion polarity, emotion intensity and emotion trend;
the behavior association analysis specifically comprises the following steps:
converting the text data into a form in which each line represents a transaction, each transaction containing a set of items;
Inputting an Apriori algorithm, counting the occurrence frequency of each item in all transactions, calculating the support degree, screening frequent item sets according to a minimum support degree threshold, generating new candidate item sets by using the frequent item sets, calculating the support degree of the frequent item sets, and repeating the steps of generating the candidate frequent item sets and calculating the support degree until the new frequent item sets cannot be generated;
For each frequent item set, all possible association rules are generated, and for rule a- > B, confidence is calculated:
Confidence(A->B)=Support(A∪B)/Support(A)
screening out meaningful association rules according to the minimum confidence coefficient threshold value;
abnormality detection, specifically:
Calculating the average distance between the individual behaviors and the nearest neighbors by using a K-nearest neighbor algorithm, wherein the average distance is specifically as follows:
Where Xik and Xjk are the values of data points Xi and Xj, respectively, on the ith feature and n is the dimension of the feature;
If the distance exceeds the preset threshold value, the individual is likely to be abnormal, analyzing and explaining each abnormal point, finding out the reason for the deviation from the normal state, visualizing by using a scatter diagram and a box diagram method, and obviously highlighting the abnormal behavior.
S5, outputting a visualization, outputting a group behavior prediction result in a visual mode, wherein the step comprises the following steps:
the behavior trend graph is used for displaying the time sequence change trend of group behaviors, and each broken line represents the change of different behavior modes, so that the dynamic change of the behaviors in different time periods can be observed conveniently;
A group clustering graph, which is to display a clustering result of a group by adopting a two-dimensional or three-dimensional scatter diagram, wherein different colors or shapes represent different group clusters, and a mark in the center of the cluster is used for representing representative characteristics of the clusters so as to be convenient for identifying the similarity and the difference among the groups;
the association rule diagram is used for displaying association relations between behaviors by using the network diagram, nodes represent behaviors or events, connecting lines between the nodes represent association rules of the nodes, and the thickness and the color of the lines represent the strength and the confidence of association;
The abnormal behavior detection diagram is used for displaying abnormal behaviors in the group through a scatter diagram and a box diagram, wherein the scatter diagram is used for intuitively displaying differences between abnormal points and normal behaviors, and the box diagram is used for displaying distribution conditions of data and positions of abnormal values.
It should also be noted that in this specification, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

Translated fromChinese
1.一种多模态信息融合与动态更新的群体行为分析方法,其特征在于,包括:1. A method for analyzing group behavior by multimodal information fusion and dynamic updating, characterized by comprising:信息整合和动态更新,获取多模态信息并对信息进行预处理,提取特征,对特征进行表示转换、对齐及融合,采用流处理框架进行多模态信息的实时处理和动态更新;Information integration and dynamic update: acquiring multimodal information and preprocessing the information, extracting features, performing representation conversion, alignment and fusion on features, and using a stream processing framework for real-time processing and dynamic update of multimodal information;引入自学习机制和自优化机制;Introduce self-learning and self-optimization mechanisms;群体画像生成,从多模态数据中提取人口统计特征、地理位置分布、行为和情感心理特征、社会关系、社交媒体以及经济特征,生成群体画像;Group portrait generation: extracting demographic characteristics, geographic location distribution, behavioral and emotional psychological characteristics, social relations, social media, and economic characteristics from multimodal data to generate group portraits;群体行为分析;Group behavior analysis;输出可视化,以可视化的方式输出群体行为预测结果,包括行为趋势图、群体聚类图、关联规则图以及异常行为检测图。Output visualization: Output group behavior prediction results in a visual way, including behavior trend graphs, group clustering graphs, association rule graphs, and abnormal behavior detection graphs.2.根据权利要求1所述的一种多模态信息融合与动态更新的群体行为分析方法,其特征在于,获取多模态信息并对信息进行预处理,具体为:2. The method for analyzing group behavior by multimodal information fusion and dynamic updating according to claim 1 is characterized in that the multimodal information is obtained and preprocessed, specifically:从多个数据源收集多模态数据,包括文本、图像、音频及视频;文本数据以Ti形式输入,图像数据以Ii形式输入,音频数据以Ai形式输入,视频数据以Vi形式输入;Collect multimodal data from multiple data sources, including text, images, audio and video; text data is input in the form of Ti , image data is input in the form of Ii , audio data is input in the form of Ai , and video data is input in the form of Vi ;对不同类型的数据进行清洗和标准化处理,其中文本数据的处理过程为:Clean and standardize different types of data, and the processing process of text data is as follows:对文本进行分词和去除停用词,添加特殊标记以及将词转换为对应的ID,再生成注意力掩码,即生成一个与词ID序列长度相同的向量,指示哪些位置是有效的词,哪些是填充,最后生成分段ID,标记属于第一句的词和属于第二句的词;最终输出三个向量:Tiid=(id1,id2…idsequence_length)、注意力掩码以及分段ID;其中,Tiid为生成的与词ID序列长度相同的向量,sequence_length为词序列长度;Segment the text and remove stop words, add special tags and convert words to corresponding IDs, then generate an attention mask, that is, generate a vector with the same length as the word ID sequence, indicating which positions are valid words and which are fillers, and finally generate a segment ID to mark the words belonging to the first sentence and the words belonging to the second sentence; finally outputthree vectors:Tiid = (id1 ,id2 ...idsequence_length ), attention mask and segment ID; whereTiid is the generatedvector with the same length as the word ID sequence, and sequence_length is the word sequence length;图像数据的处理过程为:The image data processing process is:对图像进行大小的调整,输出形状为H×W×C,H和W是目标图片的尺寸,C为通道数;Resize the image to an output shape of H×W×C, where H and W are the sizes of the target image and C is the number of channels;按照公式:According to the formula:对调整后的图像进行归一化处理,其中,为对图像进行大小调整后的图像,μ是图像数据集的均值,σ是标准差,最终输出形式如下矩阵:The adjusted image is normalized, where is the image after resizing the image, μ is the mean of the image dataset, σ is the standard deviation, and the final output is in the following matrix:音频数据的处理过程为:The audio data processing process is:使用快速傅里叶变换按照以下公式得到音频频谱:Use fast Fourier transform to get the audio spectrum according to the following formula:其中,是频域中的音频信号,f是频率,N是采样点的数量;in, is the audio signal in the frequency domain, f is the 
frequency, and N is the number of sampling points;采用谱减法利用以下公式进行去噪:The spectral subtraction method is used to remove noise using the following formula:其中,是在静音部分估计得到的噪声频谱;in, is the noise spectrum estimated in the silent part;最终得到形式如下:Finally get The form is as follows:Aidenoised=(a1denoised,a2denoised…andenoised)Aidenoised = (a1denoised , a2denoised ...andenoised )每个元素对应于去噪后音频信号在时间k时刻的振幅值,n表示去噪后音频信号的总长度;Each element corresponds to the amplitude value of the denoised audio signal at time k, and n represents the total length of the denoised audio signal;视频数据的处理过程为:The video data processing process is as follows:对视频进行视频解码,将视频流解码为多个独立的帧,输出帧序列Videcoded,形状为N×H×W×C,其中,N是帧数,H和W是帧的高度和宽度,C是颜色通道数;Decode the video, decode the video stream into multiple independent frames, and output a frame sequenceVidecoded with a shape of N×H×W×C, where N is the number of frames, H and W are the height and width of the frame, and C is the number of color channels;帧采样,从帧序列中采样关键帧,输出采样后的帧序列VisampledFrame sampling, sampling key frames from the frame sequence, and outputting the sampled frame sequenceVisampled ;进行帧预处理,对每帧进行大小调整、归一化操作,使其适合模型输入;Perform frame preprocessing, resize and normalize each frame to make it suitable for model input;最终输出预处理后的帧序列Vipreprocessed,输出形式如下矩阵:The final output is the preprocessed frame sequenceVipreprocessed , and the output form is the following matrix:3.根据权利要求2所述的一种多模态信息融合与动态更新的群体行为分析方法,其特征在于,提取文本特征具体为:3. According to the method for group behavior analysis of multimodal information fusion and dynamic update according to claim 2, it is characterized in that the text features are extracted specifically as follows:使用自然语言处理NLP,从文本数据中提取语义特征,处理后的文本输入BERT模型,BERT的输入包括词ID、注意力掩码和分段ID;输入的词ID序列Tiid首先通过嵌入层,转化为固定维度的词向量,形状为E∈Rsequence_length×hidden_size,hidden_size为隐藏层维度;之后每个输入向量在模型中通过多头自注意力机制,计算其与序列中其他位置的依赖关系,公式如下:Natural language processing (NLP) is used to extract semantic features from text data. The processed text is input into the BERT model. The input of BERT includes word ID, attention mask and segment ID. The input word ID sequence Tiid is first converted into a fixed-dimensional word vector through the embedding layer. The shape is E∈Rsequence_length×hidden_size , where hidden_size is the hidden layer dimension. After that, each input vector is calculated in the model through a multi-head self-attention mechanism to calculate its dependency with other positions in the sequence. The formula is as follows:其中,Q、K、V分别是查询、键和值的矩阵,通过不同的线性变换得到,dk是缩放因子,再将多个自注意力的输出拼接在一起:Among them, Q, K, V are the matrices of query, key and value respectively, obtained by different linear transformations, dk is the scaling factor, and then the outputs of multiple self-attentions are spliced together:MultiHead(Q,K,V)=Concat(head1,…,headh)WOMultiHead(Q,K,V)=Concat(head1 ,...,headh )WO其中,Concat()为拼接操作函数,headi=Attention(Qi,Ki,Vi),最后每个位置的向量通过一个前馈神经网络,进一步处理其语义信息:Among them, Concat() is the concatenation operation function, headi = Attention(Qi ,Ki ,Vi ), and finally the vector of each position passes through a feedforward neural network to further process its semantic information:FFN(x)=max(0,xW1+b1)W2+b2FFN(x)=max(0,xW1 +b1 )W2 +b2其中,W1、W2是可学习的权重矩阵,b1、b2是偏置项,x为上一步的输入;Among them, W1 and W2 are learnable weight matrices, b1 and b2 are bias terms, and x is the input of the previous step;将每一层的输出与其输入进行残差连接,并进行层归一化,公式如下:The output of each layer is residually connected to its input and the layer is normalized. 
The formula is as follows:LayerNorm(x+SubLayer(x))LayerNorm(x+SubLayer(x))最终输出两个向量,第一个是序列特征矩阵,一个三维张量,提供了序列中每个词的向量表示,第二个是句子特征向量,一个二维张量,提供了整个序列的整体向量表示;The final output is two vectors. The first is the sequence feature matrix, a three-dimensional tensor that provides the vector representation of each word in the sequence. The second is the sentence feature vector, a two-dimensional tensor that provides the overall vector representation of the entire sequence.提取图像特征具体为:The specific image features are extracted as follows:利用卷积神经网络CNN深度学习方法,从图像数据中提取视觉特征;输入预处理后的图像,首先进行卷积操作:The convolutional neural network (CNN) deep learning method is used to extract visual features from image data. The preprocessed image is input and the convolution operation is performed first:其中,Fij是卷积特征图的一个像素值,Wmnc是卷积核的权重,b是偏置项,Hf和Wf分别是卷积核的高度和宽度,C是输入图像的通道数;Among them, Fij is a pixel value of the convolution feature map, Wmnc is the weight of the convolution kernel, b is the bias term, Hf and Wf are the height and width of the convolution kernel respectively, and C is the number of channels of the input image;输出卷积特征图F,形状为其中,Nf是卷积核的数量,代表输出特征图的深度,即通道数;Output convolution feature map F, shape is Among them,Nf is the number of convolution kernels, representing the depth of the output feature map, that is, the number of channels;通过ReLU非线性激活函数处理,以引入非线性特征:Through the ReLU nonlinear activation function, nonlinear features are introduced:Aij=ReLU(Fij)=max(0,Fij)Aij =ReLU(Fij )=max(0,Fij )其中,Aij是激活后的特征图中的一个像素值,激活后的特征图A,形状与卷积特征图相同;Among them, Aij is a pixel value in the activated feature map, and the shape of the activated feature map A is the same as the convolution feature map;使用最大池化对特征图进行下采样,减少特征图的空间维度,同时保留重要的特征信息:Use maximum pooling to downsample the feature map to reduce the spatial dimension of the feature map while retaining important feature information:Pij=max(A(i,j),A(i+1,j),A(i,j+1),A(i+1,j+1))Pij =max(A(i,j),A(i+1,j),A(i,j+1),A(i+1,j+1))池化后的特征图P,形状为其中H″×W″是池化后的高度和宽度;The feature map P after pooling has the shape of Where H″×W″ is the height and width after pooling;在经过多层卷积、激活和池化之后,提取到的特征图通常被展平为一维向量,即Pflat=Flatten(P),并输入到全连接层,进一步组合和提取全局特征:After multiple layers of convolution, activation, and pooling, the extracted feature maps are usually flattened into a one-dimensional vector, i.e., Pflat = Flatten(P), and input into the fully connected layer to further combine and extract global features:z=Wfc·Pflat+bfcz=Wfc ·Pflat +bfc其中,Wfc是全连接层的权重矩阵,bfc是偏置项;Among them, Wfc is the weight matrix of the fully connected layer, and bfc is the bias term;最终输出图像的特征向量;The feature vector of the final output image;提取音频特征具体为:The specific audio features extracted are:采用梅尔频率倒谱系数MFCC,从音频数据中提取声音特征,处理后的音频进行短时傅里叶变换:Mel frequency cepstral coefficients (MFCC) are used to extract sound features from audio data, and the processed audio is subjected to short-time Fourier transform:其中,m是帧索引,f是频率,N是每帧的采样点数量,hop_size是帧移,输出频域信号Aistft,形状为二维矩阵,包含多个帧的频谱信息;Where m is the frame index, f is the frequency, N is the number of sampling points per frame, hop_size is the frame shift, and the output frequency domain signal Aistft is in the shape of a two-dimensional matrix, containing the spectrum information of multiple frames;应用梅尔频率标度:Apply Mel frequency scaling:其中,f0是普通频率,fmel是梅尔频率;输出梅尔频谱Aimel(m,fmel),表示音频信号在梅尔尺度上的频谱信息;Where f0 is the normal frequency and fmel is the Mel frequency; the output Mel spectrum Aimel (m, fmel ) represents the spectrum information of the audio signal on the Mel scale;计算梅尔频率倒谱系数:Compute the Mel-frequency cepstral 
coefficients:其中,c是MFCC的系数索引,取前13个系数,即c=13;Where c is the coefficient index of MFCC, taking the first 13 coefficients, that is, c = 13;最终提取的MFCC特征表示为一个二维数组;The final extracted MFCC features are represented as a two-dimensional array;提取视频特征具体为:The specific video features are extracted as follows:结合图像特征提取和动作识别技术,从视频数据中提取关键帧特征和动态信息,处理好的视频首先使用CNN进行静态特征提取,得到每帧的静态特征向z,形状为z∈Rfeature_size,之后使用LSTM进行动态特征提取,对于每一时间步,LSTM依次处理输入特征zt,结合前序时间步的隐藏状态ht-1和细胞状态Ct-1,生成当前时间步的隐藏状态ht和细胞状态Ct每一时间步的处理如下:Combining image feature extraction and action recognition technology, key frame features and dynamic information are extracted from video data. The processed video is first subjected to static feature extraction using CNN to obtain the static feature vector z of each frame, with a shape of z∈Rfeature_size . Then, LSTM is used for dynamic feature extraction. For each time step, LSTM processes the input feature zt in turn, combines the hidden state ht-1 and cell state Ct-1 of the previous time step, and generates the hidden state ht and cell state Ct of the current time step. The processing of each time step is as follows:ht=ot·tanh(Ct)ht = ot ·tanh(Ct )最终得到静态特征和动态特征。Finally, static features and dynamic features are obtained.4.根据权利要求3所述的一种多模态信息融合与动态更新的群体行为分析方法,其特征在于,对特征进行表示转换、对齐及融合,具体为:4. According to claim 3, a method for analyzing group behavior by multimodal information fusion and dynamic update is characterized in that the features are represented, aligned and fused, specifically:将提取的特征转换为统一的向量表示,以便于不同模态数据之间的比较和融合;Convert the extracted features into a unified vector representation to facilitate comparison and fusion between different modality data;通过时间戳、事件标记或其他对齐方式,将不同模态的数据在时间和空间上进行同步,确保数据之间的关联性;时间方面,将具有时间维度的模态,即视频和音频,根据最小时间步进行对齐;空间方面,基于空间锚点进行对齐;使用零均值单位方差归一化:Through timestamps, event markers or other alignment methods, data of different modalities are synchronized in time and space to ensure the relevance between data; in terms of time, the modalities with time dimension, namely video and audio, are aligned according to the minimum time step; in terms of space, alignment is performed based on spatial anchor points; and zero mean unit variance normalization is used:其中,μX是特征向量的均值,σX是标准差;Among them, μX is the mean of the eigenvector and σX is the standard deviation;最终输出归一化后的特征向量Tnorm、Inorm、Anorm、Vnorm;Tnorm、Inorm、Anorm、Vnorm为使用零均值单位方差归一化后输出的对应四个模态的特征向量;The normalized eigenvectors Tnorm , Inorm , Anorm , and Vnorm are finally output; Tnorm , Inorm , Anorm , and Vnorm are the eigenvectors of the corresponding four modes output after normalization using zero mean and unit variance;特征融合具体为:The specific feature fusion is:将不同模态的特征向量拼接在一起,形成一个综合的特征向量Fconcat∈Rbatch_size×(dimT+dimI+dimA+dimV),拼接后,新特征向量的维度为各模态维度的总和;其中,batch_size为每次训练或推理时处理的数据样本数量;dimT、dimI、dimA、dimV的含义为四个模态的特征向量的维度;Concatenate the feature vectors of different modalities to form a comprehensive feature vector Fconcat ∈Rbatch_size×(dimT+dimI+dimA+dimV) . 
After concatenation, the dimension of the new feature vector is the sum of the dimensions of each modality; batch_size is the number of data samples processed during each training or inference; dimT, dimI, dimA, and dimV are the dimensions of the feature vectors of the four modalities.根据不同模态的重要性,对各特征向量进行加权求和:According to the importance of different modes, each eigenvector is weighted and summed:Fweighted=wT·Tnorm+wI·Inorm+wA·Anorm+wV·VnormFweighted =wT ·Tnorm +wI ·Inorm +wA ·Anorm +wV ·Vnorm加权后的综合特征向量Fweighted∈Rbatch_size×(dimT+dimI+dimA+dimV),形成综合特征,权重的选择依据经验和使用场景确定;The weighted comprehensive feature vector Fweighted ∈Rbatch_size×(dimT+dimI+dimA+dimV) forms a comprehensive feature. The choice of weights is determined based on experience and usage scenarios.最后进行降维处理,使用PCA对高维特征进行降维,以减少计算复杂度和噪声,首先要对输入数据进行标准化处理:Finally, we perform dimensionality reduction and use PCA to reduce the dimensionality of high-dimensional features to reduce computational complexity and noise. First, we need to standardize the input data:其中,μ是每个特征的均值,σ是标准差;标准化后的数据用于计算协方差矩阵,这个矩阵描述了各特征之间的线性相关性:Among them, μ is the mean of each feature and σ is the standard deviation; the standardized data is used to calculate the covariance matrix, which describes the linear correlation between the features:输出协方差矩阵C,对协方差矩阵进行特征值分解,得到特征值和对应的特征向量,特征值表示了每个特征向量方向上数据的方差大小:Output the covariance matrix C, perform eigenvalue decomposition on the covariance matrix, and obtain the eigenvalues and corresponding eigenvectors. The eigenvalues represent the variance of the data in the direction of each eigenvector:CviiviCvii vi其中,λi是第i个特征值,vi是对应的特征向量;Among them, λi is the i-th eigenvalue, andvi is the corresponding eigenvector;输出特征值向量λ=[λ12,…,λdim]和特征向量矩阵V∈Rdim×dim,根据特征值的大小排序,选择前n个最大的特征值所对应的特征向量作为主成分:Output eigenvalue vector λ = [λ12 ,…,λdim ] and eigenvector matrix V∈Rdim×dim , sort by eigenvalue, and select the eigenvectors corresponding to the first n largest eigenvalues as principal components:Vpca=[v1,v2,…,vn]Vpca =[v1 ,v2 ,…,vn ]其中,n是选择的主成分数量,根据累积方差贡献率来决定;Among them, n is the number of principal components selected, which is determined by the cumulative variance contribution rate;将标准化后的数据投影到选定的主成分空间上,得到降维后的特征表示:Project the standardized data onto the selected principal component space to obtain the feature representation after dimensionality reduction:Fpca=FstdVpcaFpca =Fstd Vpca最终输出一个低维特征表示的矩阵,其包含了原始高维数据投影到主成分方向上的表示;每个样本的数据在这些主成分方向上都有一个投影值,这些投影值构成了降维后的新特征Fpca∈Rbatch_size×n,其中n是降维后的目标维度。Finally, a low-dimensional feature representation matrix is output, which contains the representation of the original high-dimensional data projected onto the principal component direction; the data of each sample has a projection value in these principal component directions, and these projection values constitute the new feature Fpca ∈Rbatch_size×n after dimensionality reduction, where n is the target dimension after dimensionality reduction.5.根据权利要求4所述的一种多模态信息融合与动态更新的群体行为分析方法,其特征在于,采用流处理框架进行多模态信息的实时处理和动态更新,具体为:5. 
The method for analyzing group behavior by multimodal information fusion and dynamic update according to claim 4 is characterized in that a stream processing framework is used to perform real-time processing and dynamic update of multimodal information, specifically:使用流处理框架Apache Flink进行多模态信息的实时处理和动态更新;Use the stream processing framework Apache Flink for real-time processing and dynamic updating of multimodal information;通过Flink对接入的数据流进行实时清洗和特征提取,假设模型的权重为W(t),则动态更新的模型权重表示为:Flink is used to perform real-time cleaning and feature extraction on the incoming data stream. Assuming that the weight of the model is W(t), the dynamically updated model weight is expressed as:通过Flink进行数据流的实时聚合与分析,Flink通过窗口操作将数据流在特定时间窗口内聚合,在时间窗口[t,t+Δt]内的聚合特征表示为:Flink is used to perform real-time aggregation and analysis of data streams. Flink aggregates data streams within a specific time window through window operations. The aggregation characteristics within the time window [t, t+Δt] are expressed as:通过对聚合后的特征进行分类、聚类或其他分析操作,生成实时的分析结果;Generate real-time analysis results by classifying, clustering or other analysis operations on the aggregated features;Flink迭代操作对模型或处理策略进行持续优化,通过反馈回来的分析结果,更新模型的参数或调整特征提取方法,使系统在处理下一批数据时表现更好,迭代更新过程表示为:Flink iterative operations continuously optimize the model or processing strategy. Based on the feedback of analysis results, the model parameters are updated or the feature extraction method is adjusted to make the system perform better when processing the next batch of data. The iterative update process is expressed as:其中,是第k次迭代后的特征,α是更新步长。in, is the feature after the kth iteration, and α is the update step size.6.根据权利要求1所述的一种多模态信息融合与动态更新的群体行为分析方法,其特征在于,引入自学习机制具体包括:6. The method for analyzing group behavior by multimodal information fusion and dynamic updating according to claim 1, wherein the introduction of a self-learning mechanism specifically comprises:新数据的自动标注,使用已经训练好的BERT模型、CNN及LSTM对新数据进行标注和推断,自动生成的标签或分类结果;输出自动生成的标签或分类结果;Automatically label new data, use the trained BERT model, CNN and LSTM to label and infer new data, and automatically generate labels or classification results; output automatically generated labels or classification results;模型的自我更新,具体为:Self-update of the model, specifically:通过在线梯度下降方法,在新数据到达时更新模型参数,适应数据的动态变化;在新文本数据到达时,使用BERT的在线学习扩展,更新其权重;输入新数据Xnew及其真实标签ynew以及当前模型参数W,在线梯度下降更新公式:Through the online gradient descent method, when new data arrives, the model parameters are updated to adapt to the dynamic changes of data; when new text data arrives, BERT's online learning extension is used to update its weights; input new data Xnew and its true label ynew and the current model parameters W, and the online gradient descent update formula is:其中,η是学习率,是损失函数关于模型参数的梯度;输出更新后的模型参数WnewWhere η is the learning rate, is the gradient of the loss function with respect to the model parameters; outputs the updated model parameters Wnew ;学习反馈机制,具体为:Learning feedback mechanism, specifically:将预测的标签与真实标签进行对比,调整自适应学习率、自适应损失函数权重,最后得到调整后的学习率、损失函数或其他超参数;Compare the predicted labels with the true labels, adjust the adaptive learning rate, adaptive loss function weight, and finally obtain the adjusted learning rate, loss function or other hyperparameters;输入模型预测标签和真实标签ytrue,根据预测误差调整学习率和损失函数权重,误差计算公式为:Input model prediction label and the true label ytrue , adjust the learning rate and loss function weight according to the prediction error, and the error calculation formula is:学习率调整公式为:The learning rate adjustment formula is:ηnew=ηold·(1-α·ε2)ηnew =ηold ·(1-α·ε2 )其中,α是调整系数;Among them, α is the 
6. The method for analyzing group behavior by multimodal information fusion and dynamic update according to claim 1, characterized in that introducing the self-learning mechanism specifically comprises:

Automatic labeling of new data: the already trained BERT model, CNN, and LSTM are used to label and infer on the new data, and the automatically generated labels or classification results are output.

Self-updating of the model, specifically: the model parameters are updated by online gradient descent as new data arrive, adapting to dynamic changes in the data; when new text data arrive, BERT's online-learning extension is used to update its weights. With new data X_new, its true labels y_new, and the current model parameters W as input, the online gradient-descent update is

W_new = W − η·∇_W L(X_new, y_new; W)

where η is the learning rate and ∇_W L is the gradient of the loss function with respect to the model parameters; the updated model parameters W_new are output.

Learning feedback mechanism, specifically: the predicted labels are compared with the true labels, the adaptive learning rate and the adaptive loss-function weights are adjusted, and the adjusted learning rate, loss function, or other hyperparameters are obtained. With the model's predicted labels and the true labels y_true as input, the prediction error ε between them is computed, and the learning rate is adjusted according to

η_new = η_old·(1 − α·ε²)

where α is an adjustment coefficient; the loss-function weight is adjusted as

λ_new = λ_old + β·ε

where β is an adjustment coefficient; the adjusted learning rate η_new and loss-function weight λ_new are output.
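A minimal sketch of the self-learning updates described in this claim, substituting a plain logistic model for BERT/CNN/LSTM; the loss, the definition of the error ε, and the coefficients α and β are illustrative assumptions rather than values from this disclosure.

```python
# One online-gradient-descent step on newly arrived data, followed by the
# feedback adjustment of the learning rate and the loss-function weight.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_step(W, X_new, y_new, eta):
    """W_new = W - eta * grad of a binary cross-entropy loss with respect to W."""
    y_hat = sigmoid(X_new @ W)
    grad = X_new.T @ (y_hat - y_new) / len(y_new)
    return W - eta * grad, y_hat

def feedback_adjust(eta_old, lam_old, y_pred, y_true, alpha=0.5, beta=0.1):
    """eta_new = eta_old*(1 - alpha*eps^2), lam_new = lam_old + beta*eps,
    with eps taken here as the mean absolute prediction error (an assumption)."""
    eps = float(np.mean(np.abs(y_pred - y_true)))
    return eta_old * (1.0 - alpha * eps**2), lam_old + beta * eps

# Example round on a synthetic mini-batch of "new" data.
rng = np.random.default_rng(0)
W = rng.normal(size=16)
X_new, y_new = rng.normal(size=(8, 16)), rng.integers(0, 2, size=8)
eta, lam = 0.05, 1.0
W, y_hat = online_step(W, X_new, y_new, eta)
eta, lam = feedback_adjust(eta, lam, y_hat, y_new)
print("adjusted eta:", round(eta, 4), "adjusted lambda:", round(lam, 4))
```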
7. The method for analyzing group behavior by multimodal information fusion and dynamic update according to claim 1, characterized in that introducing the self-optimization mechanism specifically comprises:

Hyperparameter optimization: the goal of hyperparameter optimization is to find, within a given hyperparameter space, the hyperparameter combination that yields the best model performance. The hyperparameters to be optimized and their possible value ranges {θ_1, θ_2, ..., θ_n} are defined; a genetic algorithm simulating natural evolution is used to select the optimal parameters, and the selection, crossover, and mutation steps are repeated until the convergence condition or the preset maximum number of generations is reached. The finally determined optimal hyperparameter configuration is used by the BERT model, the CNN, and the LSTM.

Model architecture optimization, specifically: the initial neural network architecture design and the datasets used for training and validation are input; the structural parameters of the neural network are defined, and a neural network architecture A_i is selected from the search space S, either randomly or by a strategy, for evaluation, where the model based on architecture A_i is evaluated by G(A_i), the average loss on the validation set. After multiple rounds of sampling and evaluation, the architecture A* = argmin_{A_i ∈ S} G(A_i) with the best performance on the validation set is selected as the final optimal architecture.

Computing resource optimization, specifically: the current computing resource state and the task requirements are input, including CPU and GPU utilization, memory usage, and task priorities; resource allocation is adjusted using task-scheduling algorithms together with resource monitoring and allocation; resource usage is monitored in real time, and resource allocation is dynamically adjusted in response to system changes and task execution; the optimized resource allocation and task-scheduling strategy are output to improve the overall efficiency and response speed of the system.

8. The method for analyzing group behavior by multimodal information fusion and dynamic update according to claim 1, characterized in that extracting demographic features specifically comprises:

A conditional-random-field-based NER model is used to extract information including, but not limited to, age, gender, and occupation from the text data, according to

P(y|x) = (1/Z(x))·exp(Σ_k λ_k·f_k(y, x))

where y is the label sequence, i.e., the extracted demographic features; x is the input sequence, i.e., the text data; f_k(y, x) is a feature function, λ_k is its weight, and Z(x) is the normalization factor ensuring that the probability distribution is normalized. From the extracted demographic features, a demographic feature vector D_i containing the numerical values of each item is generated.

Extracting the geographic location distribution, specifically: geocoding technology converts locations described in text into geographic coordinates; the K-means clustering method, which minimizes the within-cluster sum of squares (WCSS), identifies the main activity areas; finally, the geographic distribution is visualized with a heat map showing the distribution density of the group across different regions.

Extracting behavioral and emotional-psychological features, specifically: time-series analysis and frequent-pattern mining techniques are used to analyze user behavior, and the autoregressive integrated moving average model ARIMA is used to analyze and predict user behavior patterns:

x_t = c + φ_1·x_(t−1) + φ_2·x_(t−2) + ... + φ_p·x_(t−p) + ε_t + θ_1·ε_(t−1) + ... + θ_q·ε_(t−q)

where c is the constant term, φ_p are the autoregressive coefficients, θ_q are the moving-average coefficients, and ε_t is the error term. The Apriori algorithm mines frequent patterns from the behavior data and identifies frequently occurring behavior combinations; from the identified behavior patterns and frequent behavior combinations, a behavior feature vector B_i representing the user's behavioral characteristics is generated. The sentiment analysis model BERT analyzes sentiment tendencies in the text data; the LDA topic modeling method extracts the main topics in the text data to analyze the group's concerns and attitudes; based on the text analysis, the OCEAN model is used to evaluate the users' psychological characteristics.
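For the ARIMA-based behavior-pattern analysis, a short sketch using statsmodels on a synthetic activity series could look like the following; the series, the order (p, d, q), and the forecast horizon are assumptions chosen only for illustration.

```python
# Fit an ARIMA model to a synthetic daily activity series and forecast the trend.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
days = pd.date_range("2024-01-01", periods=120, freq="D")
# Synthetic daily user-activity counts with a weekly rhythm plus noise.
activity = 50 + 10 * np.sin(2 * np.pi * np.arange(120) / 7) + rng.normal(0, 3, 120)
series = pd.Series(activity, index=days)

model = ARIMA(series, order=(2, 0, 1))     # p=2 autoregressive terms, q=1 moving-average term
fitted = model.fit()
forecast = fitted.forecast(steps=14)       # predicted behavior trend for the next 14 days
print(forecast.round(1))
```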
9. The method for analyzing group behavior by multimodal information fusion and dynamic update according to claim 1, characterized in that extracting social-relationship features specifically comprises:

Constructing a graph structure: nodes and edges are defined and the social-network data are converted into a graph structure, where nodes represent individuals and edges represent relationships between individuals; edge types are further distinguished according to the actual data and edge weights are computed; if relationship strengths differ, weights are assigned to the edges.

Modularity-based community detection identifies groups of closely connected nodes in the social network:

Q = (1/2m)·Σ_(i,j) [A_ij − (k_i·k_j)/(2m)]·δ(c_i, c_j)

where A_ij is an element of the adjacency matrix indicating whether nodes i and j are connected, k_i and k_j are the degrees of nodes i and j, m is the total number of edges in the graph, and δ(c_i, c_j) is an indicator function equal to 1 when i and j belong to the same community and 0 otherwise. In this way, the social-network structure and the relationships within the group are analyzed, and influential individuals or nodes in the network are identified.

Extracting social-media and economic features, specifically: the types and frequency of media the users are exposed to are analyzed; the sources and preferences through which users obtain information are analyzed; expenditure types and amounts are extracted from consumption records and consumption trends are analyzed.

After the demographic characteristics, geographic location distribution, behavioral and emotional-psychological characteristics, social relationships, social media, and economic characteristics have been extracted, bar charts, pie charts, column charts, and network diagrams are used to intuitively display the group portrait, analyze the group's behavioral trends, estimate potential risks and challenges, and provide an entry point for downloading the analysis report.
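A hedged sketch of modularity-based community detection with networkx, using its built-in karate-club graph as a stand-in for the constructed social graph; in practice the graph would be built from the extracted relationships and edge weights rather than this toy dataset.

```python
# Community detection by modularity maximization and a simple influence ranking.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

G = nx.karate_club_graph()                       # placeholder social graph: nodes = individuals
communities = greedy_modularity_communities(G)   # greedily maximizes the modularity Q
print("modularity Q =", round(modularity(G, communities), 3))
for idx, members in enumerate(communities):
    print(f"community {idx}: {sorted(members)}")

# Influential individuals: rank nodes by degree centrality across the whole graph.
centrality = nx.degree_centrality(G)
top_nodes = sorted(centrality, key=centrality.get, reverse=True)[:5]
print("most influential nodes:", top_nodes)
```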
10. The method for analyzing group behavior by multimodal information fusion and dynamic update according to claim 1, characterized in that the group behavior analysis specifically comprises:

Group clustering, specifically: the elbow method determines the number of clusters by computing the within-cluster sum of squares (WCSS) for different cluster counts k and finding the elbow point, i.e., the value of k beyond which increasing k no longer yields a significant reduction in the WCSS, thereby determining a suitable number of clusters:

WCSS = Σ_(j=1..k) Σ_(x∈C_j) ||x − μ_j||²

where C_j is the j-th cluster and μ_j is its centroid. The K-means algorithm is then used for clustering, and the results are used to analyze group behavior: the cluster centers provide a concise description of the overall characteristics of each cluster; the within-cluster differences represent the consistency and diversity of the member characteristics inside a group; the between-cluster differences are measured by the distances between cluster centers, and larger between-cluster differences mean that the characteristics of different groups differ markedly.

Time-series and sentiment-feature analysis, specifically: with the time-series features as input, the K-means clustering algorithm divides the population into groups with different behavior patterns; the clustered groups are compared to identify the characteristic differences between them; the Granger causality test analyzes causal relationships between time-series features, its core formula being

Y_t = Σ_(i=1..p) α_i·Y_(t−i) + Σ_(j=1..q) β_j·X_(t−j) + ε_t

where, if β_j is significantly non-zero, X_t is considered a Granger cause of Y_t. Finally, an LSTM time-series model predicts the group's future behavior trends, and the BERT model performs sentiment analysis to obtain sentiment polarity, sentiment intensity, and sentiment trends.

Behavior association analysis, specifically: the text data are converted into a form in which each row represents a transaction and each transaction contains a set of items. The Apriori algorithm counts the frequency of each single item across all transactions, computes its support, and filters frequent single-item sets by a minimum support threshold; new candidate itemsets are generated from the frequent itemsets and their support is computed, and the candidate-generation and support-computation steps are repeated until no new frequent itemsets can be generated. For each frequent itemset, all possible association rules are generated, and for a rule A -> B the confidence is

Confidence(A -> B) = Support(A ∪ B) / Support(A)

Meaningful association rules are then selected according to a minimum confidence threshold.
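The support/confidence computation behind these association rules can be sketched in a few lines of plain Python; the transactions below are illustrative behavior combinations, and a full Apriori implementation would prune candidate itemsets level by level rather than enumerating them all as done here.

```python
# Brute-force support/confidence computation for small transaction sets.
from itertools import combinations

transactions = [
    {"post", "share", "comment"},
    {"post", "share"},
    {"post", "like"},
    {"share", "comment"},
    {"post", "share", "like"},
]
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.6

def support(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))
frequent = [
    frozenset(c)
    for size in (1, 2, 3)
    for c in combinations(items, size)
    if support(set(c)) >= MIN_SUPPORT
]

# Rules A -> B with Confidence(A -> B) = Support(A ∪ B) / Support(A).
for itemset in (s for s in frequent if len(s) > 1):
    for k in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, k)):
            consequent = itemset - antecedent
            conf = support(itemset) / support(antecedent)
            if conf >= MIN_CONFIDENCE:
                print(f"{set(antecedent)} -> {set(consequent)}  confidence={conf:.2f}")
```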
Anomaly detection, specifically: the K-nearest-neighbor algorithm computes the average distance between an individual's behavior and its nearest neighbors, using the distance

d(X_i, X_j) = sqrt( Σ_(k=1..n) (x_ik − x_jk)² )

where x_ik and x_jk are the values of data points X_i and X_j on the k-th feature and n is the feature dimension. If the average distance exceeds a preset threshold, the individual may be an anomaly; each anomalous point is analyzed and explained to find the reason for its deviation from the norm, and the results are visualized with scatter plots and box plots to clearly highlight the abnormal behavior.
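A minimal sketch of the K-nearest-neighbor anomaly check using scikit-learn; the feature matrix, the value of k, and the distance threshold are illustrative assumptions.

```python
# Flag individuals whose mean distance to their k nearest neighbors exceeds a threshold.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(200, 5))          # behavior feature vectors of 200 individuals
X[0] = [6, 6, 6, 6, 6]                       # one deliberately anomalous individual

k, threshold = 5, 3.0
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1 because each point is its own neighbor
distances, _ = nn.kneighbors(X)
mean_dist = distances[:, 1:].mean(axis=1)         # drop the zero self-distance

anomalies = np.where(mean_dist > threshold)[0]
print("flagged individuals:", anomalies, "mean distances:", mean_dist[anomalies].round(2))
```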
Priority Applications (1)

Application Number: CN202411207783.0A; Priority Date: 2024-08-30; Filing Date: 2024-08-30; Title: A group behavior analysis method based on multimodal information fusion and dynamic updating
Publications (1)

Publication Number: CN119337197A; Publication Date: 2025-01-21

Family ID: 94262463

Cited By (4)

* Cited by examiner, † Cited by third party

CN119670872A (priority 2025-02-19, published 2025-03-21), 中影年年(北京)科技有限公司: AIGC-based character behavior prediction method, system and storage medium *
CN119670872B (priority 2025-02-19, published 2025-05-23), 中影年年(北京)科技有限公司: AIGC-based character behavior prediction method, system and storage medium *
CN120298185A (priority 2025-04-03, published 2025-07-11), 交通运输部公路科学研究所: Emergency rescue command and dispatch method, system, electronic device and storage medium based on multimodal data fusion *
CN119988332A (priority 2025-04-17, published 2025-05-13), 北京流金岁月科技有限公司: A monitoring method for multimodal data digital processing process *


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
