Disclosure of Invention
The invention provides a multi-source data fusion method based on Internet of Things technology, which comprises the following steps:
Step 1, accessing multi-source heterogeneous data in an Internet of Things environment into a fusion system;
Step 2, the fusion system performs pre-fusion adaptation processing on the accessed data to obtain a data stream A;
Step 3, extracting feature vectors from the data stream A in a cross-modal manner, and fusing the feature vectors according to dynamically allocated fusion weights to obtain a data stream B;
Step 4, quantifying the fusion error of the data stream B, and adjusting the adaptation parameters in Step 2 and the fusion weights in Step 3 according to the fusion error to form a system optimization closed loop;
Step 5, using the system optimization closed loop to optimize the fusion system until it is optimal, then deploying it to the production environment and performing real-time fusion of the accessed data.
In the above multi-source data fusion method based on Internet of Things technology, the pre-fusion adaptation processing specifically comprises the following sub-steps:
judging whether the pre-fusion adaptation processor is being executed for the first time;
if yes, initializing the adaptation parameters of the pre-fusion adaptation processor based on the data characteristics of the current Internet of Things environment;
if not, fine-tuning the adaptation parameters of the pre-fusion adapter based on the fusion error;
and inputting the standardized access data into the pre-fusion adapter to obtain a data stream A.
In the above multi-source data fusion method based on Internet of Things technology, streaming processing is adopted for the pre-fusion adaptation and fusion of the access data in order to improve data processing speed.
In the above method, extracting feature vectors from the data stream A in a cross-modal manner and fusing them according to dynamically allocated fusion weights to obtain the data stream B specifically comprises the following sub-steps:
extracting feature vectors from the data stream A using a pre-trained cross-modal encoder;
judging whether the feature vector fusion is being executed for the first time;
if yes, initializing the fusion weights of the feature vectors based on an average allocation principle;
if not, fine-tuning the fusion weight of each feature vector based on the fusion error;
and fusing the extracted feature vectors using the allocated fusion weights.
In the above method, quantifying the fusion error of the data stream B specifically comprises the following sub-steps:
creating a fusion error calculation formula by combining the downstream task requirements with the data characteristics before and after fusion;
and calculating the fusion error of the data stream B using the established fusion error calculation formula.
The invention also provides a multi-source data fusion system based on Internet of Things technology, which comprises a data access module, a pre-fusion adaptation processing module, a feature extraction and fusion module, a fusion error feedback module and a system production module;
The data access module is used for accessing the multi-source heterogeneous data in the Internet of things environment into the fusion system;
the pre-fusion adaptation processing module is used for carrying out pre-fusion adaptation processing on the accessed data to obtain a data stream A;
the feature extraction and fusion module is used for extracting feature vectors from the data stream A in a cross-modal manner, and fusing the feature vectors according to dynamically allocated fusion weights to obtain a data stream B;
the fusion error feedback module is used for quantifying fusion errors of the data stream B and feeding the quantified fusion errors back to the pre-fusion adaptation processing module and the feature extraction and fusion module to form a system optimization closed loop;
and the system production module is used for putting the optimized fusion system into a production environment and implementing real-time fusion of the access data.
In the above multi-source data fusion system based on Internet of Things technology, the pre-fusion adaptation processing module specifically comprises an adaptation parameter adjustment sub-module and a processing result output sub-module;
the adaptation parameter adjustment sub-module is used for initializing or fine-tuning the adaptation parameters of the pre-fusion adaptation processor;
and the processing result output sub-module is used for inputting the standardized access data into the pre-fusion adapter to obtain a data stream A.
In the above multi-source data fusion system based on Internet of Things technology, the feature extraction and fusion module specifically comprises a feature extraction sub-module, a fusion weight dynamic allocation sub-module and a feature fusion sub-module;
the feature extraction sub-module is used for extracting feature vectors from the data stream A using the pre-trained cross-modal encoder;
the fusion weight dynamic allocation sub-module is used for initializing or fine-tuning the fusion weights;
and the feature fusion sub-module is used for fusing the extracted feature vectors using the allocated fusion weights.
In the above multi-source data fusion system based on Internet of Things technology, the modules involved in the system optimization closed loop are not deployed in the fusion system in the production environment; the system optimization closed loop dynamically optimizes the fusion system only in the test environment. The modules involved in the closed loop comprise the adaptation parameter adjustment sub-module, the fusion weight dynamic allocation sub-module and the fusion error feedback module. After the fusion system goes into production, the test environment still periodically ingests data from the production environment to dynamically optimize the current fusion system, and the optimized system is migrated back to the production environment once optimization is complete.
The beneficial effect of the method is that multi-modal data of different sources and formats in the Internet of Things environment (such as sensor data, images and text) are effectively integrated, and data integrity and consistency are significantly improved.
Detailed Description
The following description of the embodiments of the present invention is made clearly and fully with reference to the accompanying drawings. The embodiments described are evidently some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Example 1
As shown in fig. 1, a first embodiment of the present application provides a multi-source data fusion method based on the internet of things technology, including:
Step S110, accessing multi-source heterogeneous data in the environment of the Internet of things into a fusion system;
The networked devices in an Internet of Things environment are diverse, so a protocol adapter is arranged at the access port of the system to dynamically convert between different protocols and adapt to different devices. The adapter supports dynamic parsing and standardized encapsulation of various Internet of Things protocols such as MQTT, CoAP and HTTP, making heterogeneous device access compatible;
the accessed data first undergo timestamp synchronization and spatial coordinate mapping, which resolves the spatio-temporal inconsistency of multi-source heterogeneous data.
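As an illustration only, the sketch below shows one way such a protocol adapter and timestamp synchronization might be organized; the class, method and field names (ProtocolAdapter, adapt, client_id, and so on) are hypothetical and not taken from the patent.

```python
# Hypothetical sketch: a protocol adapter that normalizes access data.
# Names and payload layout are illustrative assumptions, not the patent's API.
from datetime import datetime, timezone

class ProtocolAdapter:
    """Dispatch raw device messages by protocol and emit a standardized record."""

    def __init__(self):
        # One parser per supported IoT protocol; new protocols can be registered.
        self.parsers = {
            "mqtt": self._parse_mqtt,
            "coap": self._parse_coap,
            "http": self._parse_http,
        }

    def adapt(self, protocol: str, raw: dict) -> dict:
        payload = self.parsers[protocol.lower()](raw)
        # Timestamp synchronization: convert every source clock to UTC.
        payload["timestamp"] = datetime.fromtimestamp(
            payload["timestamp"], tz=timezone.utc
        ).isoformat()
        return payload

    def _parse_mqtt(self, raw: dict) -> dict:
        return {"device_id": raw["client_id"], "timestamp": raw["ts"], "value": raw["payload"]}

    def _parse_coap(self, raw: dict) -> dict:
        return {"device_id": raw["uri"], "timestamp": raw["time"], "value": raw["body"]}

    def _parse_http(self, raw: dict) -> dict:
        return {"device_id": raw["source"], "timestamp": raw["ts"], "value": raw["data"]}

adapter = ProtocolAdapter()
record = adapter.adapt("mqtt", {"client_id": "sensor-01", "ts": 1700000000, "payload": 23.5})
```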
Step S120, the fusion system carries out pre-fusion adaptation processing on the accessed data to obtain a data stream A;
The accessed data come from various devices in the Internet of Things environment and have different data formats, so an adaptation pass is needed before fusion to ensure the quality of the fused data and improve suitability for downstream analysis tasks. This specifically comprises the following sub-steps:
Step S121, judging whether the pre-fusion adaptation processor is being executed for the first time;
A global variable Flag_JZ is set with an initial value of 0. Flag_JZ = 0 indicates that the pre-fusion adaptation processor is being executed for the first time, and Flag_JZ = 1 indicates that it is not being executed for the first time.
Step S122, if yes, initializing the adaptation parameters of the pre-fusion adaptation processor based on the data characteristics in the current Internet of things environment;
The Internet of Things attribute adaptation parameter d is initialized based on the following quantities: the lowest data dimension in the access data stream; the highest data dimension in the access data stream; an Internet of Things attribute adaptation coefficient; and the information entropy of the output data of the device with the highest total attribute value in the current Internet of Things environment. Here, the Internet of Things attributes refer to attributes that measure the importance of a networked device, such as functional criticality, priority and quantity ratio; their values are preset manually by staff when the device is accessed;
the Internet of Things relevance adaptation parameter is initialized based on the following quantities: the signal-to-noise ratio of X; an adaptation coefficient for the relevance between Internet of Things devices; and the signal-to-noise ratio of the output data of each device k in the set P, where P is the set of devices that have an association relationship with the device with the highest total attribute value and L is the total number of devices contained in P.
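Purely as an illustration, the following hypothetical sketch combines the quantities listed above; the specific combinations (interpolating between the dimension bounds, averaging the related devices' SNRs) are assumptions and not the patent's formulas.

```python
# Hypothetical illustration only: combines the named quantities into an
# initialization; the actual formulas in the patent may differ.
def init_attribute_param(d_min: int, d_max: int, alpha: float, entropy: float) -> int:
    """Pick an adapted dimension d between d_min and d_max, scaled by the
    entropy of the most important device's output (assumed normalized to [0, 1])."""
    d = d_min + alpha * entropy * (d_max - d_min)
    return max(d_min, min(d_max, round(d)))

def init_relevance_param(snr_x: float, beta: float, snr_related: list[float]) -> float:
    """Relate the SNR of the whole access stream X to the average SNR of the
    L devices associated with the most important device."""
    l = len(snr_related)                      # L, size of the associated set P
    avg_related = sum(snr_related) / l if l else snr_x
    return beta * snr_x / max(avg_related, 1e-9)

d0 = init_attribute_param(d_min=8, d_max=128, alpha=0.5, entropy=0.8)
gamma0 = init_relevance_param(snr_x=20.0, beta=1.0, snr_related=[18.0, 22.0, 25.0])
```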
The pre-fusion adaptation processor is expressed in terms of the following quantities: X is the currently accessed multi-source heterogeneous data set, and the output of the processor is the processed data set; two balance coefficients weight the Internet of Things attribute adaptation parameter and the Internet of Things relevance adaptation parameter, respectively; W is the projection matrix of the adapter; N denotes the number of samples in X and D the dimension of X; d is the Internet of Things attribute adaptation parameter, with d < D; the Internet of Things relevance adaptation parameter, the nuclear norm and a denoising autoencoder (DAE) also appear in the expression.
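As a rough illustration under stated assumptions, the sketch below reads the adapter as a learned projection from the original dimension D down to the adapted dimension d, followed by a small denoising autoencoder, with the nuclear norm of W available as a regularization term; the architecture and dimensions are assumptions rather than the patent's exact expression.

```python
# Illustrative sketch only: one plausible reading of "projection matrix W + DAE",
# not the patent's exact expression. Dimensions and architecture are assumptions.
import torch
import torch.nn as nn

class PreFusionAdapter(nn.Module):
    def __init__(self, D: int, d: int):
        super().__init__()
        # Projection matrix W: maps each D-dimensional sample to d dimensions (d < D).
        self.proj = nn.Linear(D, d, bias=False)
        # Small denoising autoencoder (DAE) operating in the projected space.
        self.encoder = nn.Sequential(nn.Linear(d, d // 2), nn.ReLU())
        self.decoder = nn.Linear(d // 2, d)

    def forward(self, x: torch.Tensor, noise_std: float = 0.05) -> torch.Tensor:
        z = self.proj(x)                               # N x d projected samples
        noisy = z + noise_std * torch.randn_like(z)    # corrupt, then reconstruct
        return self.decoder(self.encoder(noisy))

    def nuclear_norm(self) -> torch.Tensor:
        # Nuclear norm of W, usable as a low-rank regularization term.
        return torch.linalg.matrix_norm(self.proj.weight, ord="nuc")

adapter = PreFusionAdapter(D=64, d=16)
stream_a = adapter(torch.randn(32, 64))               # one batch of data stream A
loss_reg = adapter.nuclear_norm()
```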
Step S123, if not, fine-tuning the adaptation parameters of the pre-fusion adapter based on the fusion error;
The fine-tuning formula of the Internet of Things attribute adaptation parameter d is defined in terms of: the fine-tuned Internet of Things attribute adaptation parameter, the learning rate of the pre-fusion adapter, and the quantized fusion error;
the fine-tuning formula of the Internet of Things relevance adaptation parameter is defined analogously and yields the fine-tuned Internet of Things relevance adaptation parameter.
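Purely as an illustration, the sketch below shows one plausible reading of this step, a learning-rate-scaled update driven by the quantized fusion error; the update rule itself is an assumption, not the patent's formula.

```python
# Hypothetical fine-tuning step: a simple error-driven update scaled by the
# adapter's learning rate. The update rule itself is an assumption.
def finetune_adaptation_params(d: float, gamma: float, fusion_error: float,
                               lr: float = 0.01) -> tuple[float, float]:
    """Nudge the attribute parameter d and relevance parameter gamma in
    proportion to the quantized fusion error E."""
    d_new = d - lr * fusion_error
    gamma_new = gamma - lr * fusion_error
    return d_new, gamma_new

d, gamma = finetune_adaptation_params(d=16.0, gamma=1.2, fusion_error=0.35)
```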
Step S124, inputting the standardized access data into the pre-fusion adapter to obtain a data stream A;
To increase data processing speed, both the pre-fusion adaptation and the subsequent fusion of the access data are processed in a streaming manner, so the output is a single data stream, denoted A.
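A minimal sketch of the streaming arrangement, assuming a Python generator pipeline; the stage names are illustrative only.

```python
# Illustrative streaming pipeline: each stage consumes and yields records lazily,
# so adaptation and fusion never materialize the whole data set in memory.
from typing import Iterable, Iterator

def pre_fusion_adapt(records: Iterable[dict], adapter) -> Iterator[dict]:
    for record in records:
        yield adapter.adapt(record["protocol"], record["raw"])   # -> data stream A

def fuse(stream_a: Iterable[dict], fuser) -> Iterator[dict]:
    for item in stream_a:
        yield fuser(item)                                        # -> data stream B

# Usage: stream_b = fuse(pre_fusion_adapt(device_messages, adapter), fuser)
```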
Step S130, extracting feature vectors from the data stream A in a cross-modal manner, and fusing the feature vectors according to the dynamically allocated fusion weights to obtain a data stream B;
The feature vectors of the data stream A are extracted cross-modally, the extracted feature vectors are fused dynamically, and the data stream B is output. This specifically comprises the following sub-steps:
Step S131, extracting feature vectors from the data stream A using a pre-trained cross-modal encoder;
The cross-modal encoder adopted in this embodiment is taken from the CLIP family; other existing cross-modal encoders, such as UNITER or LXMERT, may be selected as required and are not limited here. The extracted feature vectors are stored in a data set C.
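The sketch below uses the Hugging Face CLIP implementation as one concrete choice of cross-modal encoder; the checkpoint name and the way items of data stream A are split into text and image batches are assumptions.

```python
# Sketch using a CLIP checkpoint as the cross-modal encoder; the model name and
# preprocessing are assumptions, not prescribed by the method.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def extract_features(texts: list[str], images: list) -> list[torch.Tensor]:
    """Map text and image items from data stream A into one shared feature space (data set C)."""
    with torch.no_grad():
        text_inputs = processor(text=texts, return_tensors="pt", padding=True)
        text_vecs = model.get_text_features(**text_inputs)
        image_inputs = processor(images=images, return_tensors="pt")
        image_vecs = model.get_image_features(**image_inputs)
    # One feature vector per item, all in the same embedding space.
    return list(text_vecs) + list(image_vecs)
```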
Step S132, judging whether the feature vector fusion is being executed for the first time;
A global variable Flag_M is set with an initial value of 0. Flag_M = 0 indicates that the feature vector fusion step is being executed for the first time, and Flag_M = 1 indicates that it is not.
Step S133, if yes, initializing fusion weights of the feature vectors based on an average allocation principle;
I.e. the fusion weight of each feature vector is 1/n, n being the total number of feature vectors.
Step S134, if not, fine tuning the fusion weight of each feature vector based on the fusion error;
The fine-tuning formula of the fusion weights is defined in terms of: the fine-tuned fusion weight of the i-th feature vector; the learning rate of the downstream model (i.e., the downstream model that performs the analysis task with the help of the fused data); the fusion error E; and a function that maps the weights onto the probability simplex. The input of this function is a mapping vector taken from a set of mapping vectors, each of which must satisfy two conditions: every component is non-negative and the components sum to 1, the t-th component being the t-th entry of the mapping vector.
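Purely as an illustration, the sketch below shows one standard realization of "error-driven step followed by mapping onto the probability simplex", using the Euclidean projection onto the simplex; treating the fusion error's influence as a gradient step is an assumption.

```python
# Illustrative weight fine-tuning: gradient-style step on the weights, then a
# Euclidean projection back onto the probability simplex. The exact patent
# formula is not reproduced here.
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : w_t >= 0, sum_t w_t = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def finetune_weights(weights: np.ndarray, grad: np.ndarray, lr: float) -> np.ndarray:
    # Error-driven step, then map back onto the simplex so the fused features
    # remain a convex combination of the extracted feature vectors.
    return project_to_simplex(weights - lr * grad)

weights = np.full(4, 0.25)                       # average-allocation initialization (1/n)
grad_E = np.array([0.1, -0.05, 0.02, -0.07])     # hypothetical gradient of the fusion error E
weights = finetune_weights(weights, grad_E, lr=0.1)
```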
Step S135, fusing the extracted feature vectors using the allocated fusion weights;
Using the allocated fusion weights, a weighted summation is carried out over the feature vectors in the data set C to obtain the fused feature vectors, and the data stream formed by the fused feature vectors is denoted B.
Step S140, quantifying the fusion error of the data stream B, and adjusting the adaptation parameters in Step S120 and the fusion weights in Step S130 according to the fusion error to form a system optimization closed loop;
To accurately evaluate the quality of the fusion result, the fusion error of the data stream B is comprehensively quantified by combining the downstream task requirements with the data characteristics. This specifically comprises the following sub-steps:
Step S141, creating a fusion error calculation formula by combining the downstream task requirements with the data characteristics before and after fusion;
The fused data stream B should retain the rich information of the data stream A while ensuring that downstream tasks run smoothly. The fusion error E designed for these requirements is built from three terms: an information richness error, a task execution error, and an uncertainty error. These terms are defined in terms of: the information entropy of the data stream A; the information entropy of the j-th item of data in the data stream B; the prediction of the downstream model for that item; the desired output of the downstream model for that item; and the uncertainty score output by a Bayesian neural network for that item, where j ranges from 1 to m and m is the total number of items of data contained in the data stream B.
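Purely as an illustration of the three terms, the sketch below simply sums them, with an entropy gap, a squared prediction error and a mean Bayesian uncertainty as placeholder definitions; all three definitions and the way they are combined are assumptions.

```python
# Hypothetical quantification of the three error terms; the exact formulas and
# their combination are assumptions, not the patent's definitions.
import numpy as np

def entropy(p: np.ndarray) -> float:
    p = p / p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def fusion_error(hist_a: np.ndarray, hists_b: list[np.ndarray],
                 preds: np.ndarray, targets: np.ndarray,
                 uncertainties: np.ndarray) -> float:
    h_a = entropy(hist_a)
    # Information richness error: entropy lost going from stream A to each item of B.
    e_info = float(np.mean([max(h_a - entropy(h), 0.0) for h in hists_b]))
    # Task execution error: distance between downstream predictions and desired outputs.
    e_task = float(np.mean((preds - targets) ** 2))
    # Uncertainty error: mean uncertainty score from a Bayesian neural network.
    e_unc = float(np.mean(uncertainties))
    return e_info + e_task + e_unc
```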
Step S142, calculating the fusion error of the data stream B using the established fusion error calculation formula;
Step S143, feeding the calculated fusion error back to Step S120 and Step S130;
The global variables Flag_JZ and Flag_M need to be set to 1 before feedback, which also marks the start of the system optimization closed loop.
Step S150, using the system optimization closed loop to optimize the fusion system until it is optimal, then deploying it to the production environment and performing real-time fusion of the access data;
The performance of the fusion system is verified in real time based on the fusion error. When the fusion error no longer decreases, or shows no clear downward trend, the performance of the fusion system is considered optimal; at this point the adaptation parameters and fusion weights are migrated to the production environment to perform real-time fusion of the access data in the current Internet of Things environment and provide high-quality fused data for the downstream model.
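A small sketch of the "no longer decreasing" check, assuming a fixed patience window over recent fusion errors; the window size and tolerance are illustrative.

```python
# Illustrative convergence check for the optimization closed loop: stop when the
# fusion error has not improved by more than `tol` over the last `patience` rounds.
def has_converged(error_history: list[float], patience: int = 5, tol: float = 1e-3) -> bool:
    if len(error_history) <= patience:
        return False
    recent_best = min(error_history[-patience:])
    earlier_best = min(error_history[:-patience])
    return earlier_best - recent_best < tol

# Usage: once has_converged(errors) is True, the adaptation parameters and fusion
# weights are frozen and migrated to the production environment.
```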
Example 2
As shown in fig. 2, the second embodiment of the present application provides a multi-source data fusion system based on the internet of things technology, which comprises a data access module 210, a pre-fusion adaptation processing module 220, a feature extraction and fusion module 230, a fusion error feedback module 240 and a system production module 250;
The data access module 210 is configured to access multi-source heterogeneous data in the internet of things environment to the fusion system;
The pre-fusion adaptation processing module 220 is used for performing pre-fusion adaptation processing on the accessed data to obtain a data stream A, and specifically comprises an adaptation parameter adjustment sub-module and a processing result output sub-module;
1. The adaptation parameter adjustment sub-module is used for initializing or fine-tuning the adaptation parameters of the pre-fusion adaptation processor;
First, a global variable Flag_JZ is set with an initial value of 0. Flag_JZ = 0 indicates that the pre-fusion adaptation processor is being executed for the first time, and Flag_JZ = 1 indicates that it is not. If it is being executed for the first time, the adaptation parameters of the pre-fusion adaptation processor are initialized based on the data characteristics of the current Internet of Things environment; if not, the adaptation parameters of the pre-fusion adapter are fine-tuned based on the fusion error.
2. The processing result output sub-module is used for inputting the standardized access data into the pre-fusion adapter to obtain a data stream A;
To increase data processing speed, both the pre-fusion adaptation and the subsequent fusion of the access data are processed in a streaming manner, so the output is a single data stream, denoted A.
The feature extraction and fusion module 230 is used for extracting feature vectors from the data stream A in a cross-modal manner, and fusing the feature vectors according to dynamically allocated fusion weights to obtain a data stream B;
1. The feature extraction sub-module is used for extracting feature vectors from the data stream A using the pre-trained cross-modal encoder;
The cross-modal encoder adopted in this embodiment is taken from the CLIP family; other existing cross-modal encoders, such as UNITER or LXMERT, may be selected as required and are not limited here. The extracted feature vectors are stored in a data set C.
2. The fusion weight dynamic allocation sub-module is used for initializing or fine-tuning the fusion weight;
A global variable Flag_M is set with an initial value of 0. Flag_M = 0 indicates that the feature vector fusion step is being executed for the first time, and Flag_M = 1 indicates that it is not. If it is being executed for the first time, the fusion weights of the feature vectors are initialized based on an average allocation principle, i.e., the fusion weight of each feature vector is 1/n, where n is the total number of feature vectors; if not, the fusion weight of each feature vector is fine-tuned based on the fusion error;
The fine-tuning formula of the fusion weights is defined in terms of: the fine-tuned fusion weight of the i-th feature vector; the learning rate of the downstream model (i.e., the downstream model that performs the analysis task with the help of the fused data); the fusion error E; and a function that maps the weights onto the probability simplex. The input of this function is a mapping vector taken from a set of mapping vectors, each of which must satisfy two conditions: every component is non-negative and the components sum to 1, the t-th component being the t-th entry of the mapping vector.
3. The feature fusion sub-module is used for fusing the extracted feature vectors using the allocated fusion weights;
Using the allocated fusion weights, a weighted summation is carried out over the feature vectors in the data set C to obtain the fused feature vectors, and the data stream formed by the fused feature vectors is denoted B.
The fusion error feedback module 240 is configured to quantize a fusion error of the data stream B, and feed back the quantized fusion error to the pre-fusion adaptation processing module 220, and the feature extraction and fusion module 230, so as to form a system optimization closed loop;
Firstly, establishing a fusion error calculation formula by combining downstream task requirements and data characteristics before and after fusion;
The fused data stream B should retain the rich information of the data stream A while ensuring that downstream tasks run smoothly. The fusion error E designed for these requirements is built from three terms: an information richness error, a task execution error, and an uncertainty error. These terms are defined in terms of: the information entropy of the data stream A; the information entropy of the j-th item of data in the data stream B; the prediction of the downstream model for that item; the desired output of the downstream model for that item; and the uncertainty score output by a Bayesian neural network for that item, where j ranges from 1 to m and m is the total number of items of data contained in the data stream B.
The fusion error of the data stream B is then calculated using the established fusion error calculation formula and fed back to the pre-fusion adaptation processing module 220 and the feature extraction and fusion module 230. The global variables Flag_JZ and Flag_M are set to 1 before feedback, which also marks the start of the system optimization closed loop.
The system production module 250 is used for putting the optimized fusion system into a production environment and implementing real-time fusion of access data;
The performance of the fusion system is verified in real time based on the fusion error. When the fusion error no longer decreases, or shows no clear downward trend, the performance of the fusion system is considered optimal; at this point the adaptation parameters and fusion weights are migrated to the production environment to perform real-time fusion of the access data in the current Internet of Things environment and provide high-quality fused data for the downstream model.
It should be noted that the modules involved in the system optimization closed loop are not deployed in the fusion system in the production environment; the closed loop dynamically optimizes the fusion system only in the test environment. These modules comprise the adaptation parameter adjustment sub-module, the fusion weight dynamic allocation sub-module and the fusion error feedback module. After the fusion system goes into production, the test environment still periodically ingests data from the production environment to dynamically optimize the current fusion system, and the optimized system is migrated back to the production environment once optimization is complete, thereby isolating production from testing.
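To make the module split and the test/production isolation concrete, here is a hypothetical wiring of the five modules; the class names, method names and loop structure are assumptions, and the convergence test reuses the has_converged sketch from Example 1.

```python
# Hypothetical wiring of the five modules; interfaces are assumptions used only
# to illustrate the test-environment closed loop described above.
class FusionSystem:
    def __init__(self, access, adaptation, extraction_fusion,
                 error_feedback=None, production=None):
        self.access = access                        # data access module (210)
        self.adaptation = adaptation                # pre-fusion adaptation module (220)
        self.extraction_fusion = extraction_fusion  # feature extraction and fusion (230)
        self.error_feedback = error_feedback        # fusion error feedback (240), test env only
        self.production = production                # system production module (250)

    def optimize_in_test_env(self, max_rounds: int = 50) -> None:
        """Run the optimization closed loop until the fusion error stops improving."""
        errors = []
        for _ in range(max_rounds):
            stream_a = self.adaptation.process(self.access.ingest())
            stream_b = self.extraction_fusion.fuse(stream_a)
            error = self.error_feedback.quantify(stream_b)
            errors.append(error)
            if has_converged(errors):               # see the convergence sketch above
                break
            self.error_feedback.feed_back(error, self.adaptation, self.extraction_fusion)
        # Only the tuned parameters move to production; the feedback loop stays here.
        self.production.deploy(self.adaptation.params(), self.extraction_fusion.weights())
```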
Corresponding to the above embodiments, the embodiments of the present invention provide a computer device comprising at least one memory and at least one processor;
the memory is used for storing one or more program instructions;
and the processor is used for running the one or more program instructions to perform the multi-source data fusion method based on Internet of Things technology described above.
Corresponding to the above embodiments, the embodiments of the present invention provide a computer readable storage medium, where the computer readable storage medium contains one or more program instructions, where the one or more program instructions are configured to be executed by a processor to perform a multi-source data fusion method based on the internet of things technology.
The disclosed embodiments provide a computer readable storage medium, in which computer program instructions are stored, which when run on a computer, cause the computer to perform a multi-source data fusion method based on the internet of things technology as described above.
In the embodiment of the invention, the processor may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, a register, or another storage medium well known in the art. The processor reads the information in the storage medium and completes the steps of the above method in combination with its hardware.
The storage medium may be memory, for example, may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
The storage media described in embodiments of the present invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in a combination of hardware and software. When the software is applied, the corresponding functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on the computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The foregoing embodiments merely illustrate the general principles of the present invention in further detail and are not to be construed as limiting its scope; any modifications, equivalents, improvements, etc. made based on the teachings of the invention are intended to fall within the scope of the invention.