CN120257113A - An intelligent data management system and method based on multi-source data collection - Google Patents

An intelligent data management system and method based on multi-source data collection

Info

Publication number
CN120257113A
Authority
CN
China
Prior art keywords
data
feature
unstructured
feature extraction
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202510743370.2A
Other languages
Chinese (zh)
Other versions
CN120257113B (en)
Inventor
孙广芝
王淑敏
程越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China National Institute of Standardization
Original Assignee
China National Institute of Standardization
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China National Institute of Standardization
Priority to CN202510743370.2A
Publication of CN120257113A
Application granted
Publication of CN120257113B
Active
Anticipated expiration

Abstract

The invention discloses an intelligent data management system and method based on multi-source data acquisition, relating to the technical field of data processing. Multi-source data is acquired from multiple data sources and an original data set with credibility marks is generated; the data records in the original data set are then classified by data type. For time-series data, trend features are extracted using a segmented multi-model time-series feature extraction framework; for unstructured data, unstructured features are extracted using a feature extraction model based on a multi-layer perceptron; for structured data, adaptive association relationships are established using knowledge graph technology to generate association relationship data. The trend features, unstructured features, and association relationship data are taken as input to a data value evaluation model to generate a dynamically weighted data asset efficacy evaluation, realizing intelligent management and value evaluation of multi-source heterogeneous data.

Description

Intelligent data management system and method based on multi-source data acquisition
Technical Field
The invention relates to the technical field of data processing, in particular to an intelligent data management system and method based on multi-source data acquisition.
Background
With the acceleration of digital transformation, organizations accumulate massive multi-source heterogeneous data, which has become the most important asset for enterprises. However, the data sources are varied, the data types are complex, and the data quality is uneven, so effectively managing the data and fully realizing its value is a great challenge. Data type classification and targeted feature extraction are the key links: only by accurately identifying the data type and applying the corresponding feature extraction technique can the real value and efficacy of a data asset be evaluated scientifically, providing a reliable basis for enterprise decision-making.
Traditional data management systems usually adopt a uniform processing mode that struggles to handle multi-source heterogeneous data, or rely on a single feature extraction method, so the intrinsic value of different types of data cannot be fully mined.
Therefore, the invention provides an intelligent data management system and method based on multi-source data acquisition.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides an intelligent data management system and method based on multi-source data acquisition, which realize intelligent management and value evaluation of multi-source heterogeneous data.
In order to achieve the above purpose, an intelligent data management method based on multi-source data acquisition is provided, which comprises the following steps:
Step one, obtaining multi-source data from multiple data sources and generating an original data set with credibility marks;
Step two, classifying the data records in the original data set by data type, using a data type recognition method that combines rules with a decision tree classifier; records classified as time-series data proceed to step three, unstructured data to step four, and structured data to step five;
Step three, extracting trend features from the time-series data by applying a segmented multi-model time-series feature extraction framework;
Step four, extracting unstructured features from the unstructured data through a feature extraction model based on a multi-layer perceptron;
Step five, establishing adaptive association relationships for the structured data using knowledge graph technology and generating association relationship data;
Step six, taking the trend features, unstructured features, and association relationship data as input to a pre-constructed multi-level fusion data value evaluation model to generate a dynamically weighted data asset efficacy evaluation;
Obtaining multi-source data from multiple data sources and generating an original data set with credibility marks comprises the following steps:
Step 11, acquiring original multi-source data from multiple data sources through a distributed data acquisition network consisting of a plurality of types of data acquisition units;
Step 12, performing standardization processing and credibility evaluation on the collected original multi-source data using data preprocessing units in an edge computing architecture, to form an original data set with credibility marks;
Classifying the data records in the original data set by data type, using the data type recognition method based on the combination of rules and a classifier, comprises the following steps:
Step 21, constructing a data type feature extractor comprising a structural feature extraction rule, a time feature extraction rule and a content feature extraction rule, and extracting key feature vectors required by data type judgment from an original data set;
Step 22, based on the extracted key feature vector, classifying the data record by using a decision tree classifier to generate a classification result, wherein the classification result comprises time sequence data, unstructured data and structured data;
The application of the segmented multi-model time sequence feature extraction framework to extract trend features of the time sequence data comprises the following steps:
Step 31, constructing a segmented multi-model time sequence feature extraction frame comprising a self-adaptive segmentation module, a multi-model feature extraction module and a feature fusion module;
Step 32, dividing continuous time sequence data into data segments with similar statistical characteristics by adopting a similarity measurement method based on dynamic time warping through the self-adaptive segmentation module, dynamically determining the boundaries of the data segments through a sliding window algorithm, and automatically adjusting the window size according to the change characteristics of the time sequence data;
Step 33, through the multi-model feature extraction module comprising a statistic feature extraction channel, a frequency domain feature extraction channel and a shape feature extraction channel, applying a corresponding feature extraction method to each data segment in parallel, and extracting single-dimensional feature vectors of time sequence data from different dimensions;
Step 34, fusing the single-dimensional feature vectors of the different dimensions using the feature fusion module to generate trend features that comprehensively represent the trend characteristics of the time-series data;
Extracting unstructured features from the unstructured data through the feature extraction model based on a multi-layer perceptron comprises the following steps:
Step 41, extracting initial unstructured features from unstructured data by applying a feature extractor based on a statistical method;
Step 42, designing a multi-layer perceptron model that consists of an input layer, a plurality of hidden layers, and an output layer and adopts a fully connected neural network structure, training the multi-layer perceptron model, and mapping the initial unstructured features to an unstructured feature space;
In the multi-layer perceptron model, the number of neurons in the input layer matches the dimension of the initial features, and the hidden layers form a multi-layer structure in which the number of neurons decreases layer by layer;
Step 43, applying the trained multi-layer perceptron model to perform feature extraction on the unstructured data to generate unstructured features in a unified format;
Establishing adaptive association relationships for the structured data using dynamically evolving knowledge graph technology and generating association relationship data comprises the following steps:
Step 51, constructing a domain ontology model and a key entity identification rule, and extracting a key entity and attributes thereof from the structured data;
Step 52, constructing a structural knowledge graph by applying a relation mapping rule based on the key entity and the attribute thereof extracted in the step 51;
Step 53, extracting a hierarchical association relation of the structured data and generating application-oriented association relation data based on the structural knowledge graph, wherein the hierarchical association relation comprises three levels of key entity direct association, path association and group association;
Generating the dynamically weighted data asset efficacy evaluation with the pre-constructed multi-level fusion data value evaluation model includes the following steps:
Step 61, constructing a multi-level data value evaluation model framework comprising a value feature extraction layer, a feature fusion layer and a performance evaluation layer;
Step 62, extracting value indexes from data of different data types through the value feature extraction layers of three parallel feature processing channels including a trend feature processing channel, an unstructured feature processing channel and an association relation processing channel;
Wherein the trend feature processing channel receives the trend features extracted by the segmented multi-model time-series feature extraction framework and extracts the value information contained in the trend features through a convolutional neural network; the unstructured feature processing channel and the association relation processing channel are built on a multi-layer perceptron and a graph neural network;
Step 63, performing self-adaptive weighted fusion on the multidimensional value indexes from the value characteristic extraction layer through a characteristic fusion layer based on a deep neural network to generate comprehensive characteristic representation;
Step 64, based on the comprehensive feature representation, simultaneously predicting the business value, technical value, and innovation value of the data through a multi-task learning framework, and fusing the value evaluation results of the three dimensions into a final data asset efficacy evaluation score.
The intelligent data management system based on multi-source data acquisition comprises an original data collection module, a data classification module, a multi-type feature extraction module and an asset efficiency evaluation module, wherein the modules are electrically connected;
The original data collection module acquires multi-source data from multiple data sources, generates an original data set with credibility marks, and sends the original data set to the data classification module;
The data classification module classifies data types of the data records in the original data set by adopting a data type identification method based on combination of rules and a decision tree classifier, divides time sequence data, unstructured data and structured data, and sends the time sequence data, the unstructured data and the structured data to the multi-type feature extraction module;
The multi-type feature extraction module is used for extracting trend features of the time sequence data by applying a segmented multi-model time sequence feature extraction frame, extracting unstructured features of the unstructured data by using a feature extraction model based on a multi-layer perceptron, establishing self-adaptive association relation of the structured data by using a knowledge graph technology, generating association relation data, and sending the trend features, the unstructured features and the association relation data to the asset efficacy evaluation module;
The asset efficiency evaluation module takes the trend features, unstructured features, and association relationship data as input to a pre-constructed multi-level fusion data value evaluation model to generate a dynamically weighted data asset efficacy evaluation.
Compared with the prior art, the invention has the beneficial effects that:
Through multi-source data acquisition and data type identification based on the combination of rules and a decision tree classifier, the invention applies specialized feature extraction techniques to different types of data. For time-series data, a segmented multi-model time-series feature extraction framework captures the fluctuation, trend, and periodic features of the series. For unstructured data, a feature extraction model based on a multi-layer perceptron mines hidden features in unstructured data such as text and images. For structured data, knowledge graph technology constructs adaptive association relationships that reveal the internal relations among the data. Finally, a multi-level fusion data value evaluation model realizes intelligent management and value evaluation of multi-source heterogeneous data.
Drawings
FIG. 1 is a flow chart of an intelligent data management method based on multi-source data acquisition in embodiment 1 of the invention;
FIG. 2 is a block connection diagram of an intelligent data management system based on multi-source data acquisition in embodiment 2 of the present invention.
Detailed Description
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1, an intelligent data management method based on multi-source data acquisition includes the following steps:
Step one, obtaining multi-source data from multiple data sources and generating an original data set with credibility marks;
Step two, classifying the data records in the original data set by data type, using a data type recognition method that combines rules with a decision tree classifier; records classified as time-series data proceed to step three, unstructured data to step four, and structured data to step five;
Step three, extracting trend features from the time-series data by applying a segmented multi-model time-series feature extraction framework;
Step four, extracting unstructured features from the unstructured data through a feature extraction model based on a multi-layer perceptron;
Step five, establishing adaptive association relationships for the structured data using knowledge graph technology and generating association relationship data;
Step six, taking the trend features, unstructured features, and association relationship data as input to a pre-constructed multi-level fusion data value evaluation model to generate a dynamically weighted data asset efficacy evaluation;
Wherein the multiple data sources include, but are not limited to, industrial sensors, internet of things devices, historical databases, and third party APIs;
In an embodiment of the present invention, obtaining multi-source data from multiple data sources and generating an original data set with credibility marks includes the following steps:
Step 11, acquiring original multi-source data from multiple data sources through a distributed data acquisition network consisting of a plurality of types of data acquisition units;
Specifically, the distributed data acquisition network comprises an industrial sensor acquisition unit, an Internet of things equipment acquisition unit, a history database access unit and a third party API call unit.
The industrial sensor acquisition unit is connected to the various sensors in production lines, equipment, and environment monitoring systems, and acquires physical quantity data such as temperature, pressure, flow, current, vibration, and gas concentration in real time. It establishes connections with the sensors using standardized industrial communication protocols such as Modbus, PROFIBUS, and OPC UA, and reads the physical quantity data at a preset sampling frequency.
The Internet of Things equipment acquisition unit is connected, through wireless communication technologies such as LoRa, NB-IoT, and ZigBee, to Internet of Things terminals distributed at different positions, and acquires data such as equipment state, environmental parameters, and position information;
the history database access unit is connected to various history database systems in the enterprise through a database connector, and comprises a relational database, a time sequence database, a document database and the like, and extracts data with different data types such as a history operation record, an equipment maintenance record, a quality detection record and the like;
The third party API calling unit is connected to an external data service provider through a standardized Web service interface such as REST API, SOAP and the like, and obtains external environment data such as weather data, market information, supply chain state and the like.
In the specific implementation process of the invention, each data acquisition unit executes the data acquisition task according to a preset sampling strategy. For example, the sampling frequency of the industrial sensor acquisition unit is set according to the change rate of the monitoring parameters, the key process parameters such as temperature and pressure are sampled at a high frequency every 10 seconds, the equipment state parameters such as motor current and vibration are sampled at a medium frequency every 30 seconds, and the environmental parameters such as environmental temperature and humidity are sampled at a low frequency every 5 minutes. The sampling frequency of the equipment acquisition unit of the Internet of things is set according to the balance of the service life of the battery of the equipment and the importance of data, and is generally set once every 1-10 minutes. The history database access unit performs incremental data extraction at a frequency of once per hour. The third party API call unit sets reasonable call intervals, typically varying from every 10 minutes to once per day, depending on the access restrictions of the API provider and the frequency of data updates.
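As a non-limiting illustration, the preset sampling strategy described above could be held in a small lookup table. The following Python sketch uses the example intervals given in the text; the names `SAMPLING_INTERVALS_S` and `is_due` are hypothetical and are not part of the described system.

```python
# Hypothetical sampling configuration; intervals (in seconds) follow the
# example values given in the text.
SAMPLING_INTERVALS_S = {
    "process_parameter": 10,       # temperature, pressure: high frequency
    "equipment_state": 30,         # motor current, vibration: medium frequency
    "environment": 5 * 60,         # ambient temperature, humidity: low frequency
    "history_database": 60 * 60,   # incremental extraction once per hour
}


def is_due(category: str, last_sample_ts: float, now_ts: float) -> bool:
    """Return True when the sampling interval of the category has elapsed."""
    return now_ts - last_sample_ts >= SAMPLING_INTERVALS_S[category]
```

Intervals that the text gives only as ranges (Internet of Things devices, third party APIs) are left out of the table, since the concrete value depends on battery life and provider limits.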
The original data collected by all the collection units corresponding to each data source form original multi-source data, wherein the original data comprise industrial sensor data collected by an industrial sensor collection unit, internet of things equipment data collected by an internet of things equipment collection unit, a historical database record collected by a historical database access unit and API return data collected by a third-party API call unit;
Step 12, performing standardization processing and credibility evaluation on the collected original multi-source data using data preprocessing units in an edge computing architecture, to form an original data set with credibility marks;
The standardization processing means that a dedicated data preprocessing unit is configured for each data source; the unit receives the raw data of all acquisition units corresponding to that data source and performs data format conversion, unit unification, and timestamp alignment.
The credibility evaluation is realized by adopting a multidimensional credibility evaluation method, and a credibility score is distributed to each piece of original data.
Specifically, in the data format conversion process, heterogeneous data from different sources is uniformly converted into a standard JSON or Avro format to ensure a consistent data structure. In the unit unification stage, measured values in different units are converted into the system standard units according to predefined unit conversion rules, for example Fahrenheit to Celsius and imperial to metric. In the timestamp alignment process, the time information of all data is uniformly converted into UTC and corrected for the actual delay of data acquisition, so that data from different sources is comparable in the time dimension.
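The unit unification and timestamp alignment steps can be sketched as follows. This is a minimal Python illustration that assumes input timestamps are ISO 8601 strings carrying a UTC offset and that the acquisition delay is known in seconds; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def fahrenheit_to_celsius(value_f: float) -> float:
    """Convert a Fahrenheit reading to the system standard unit (Celsius)."""
    return (value_f - 32.0) * 5.0 / 9.0


def to_utc(local_iso: str, acquisition_delay_s: float = 0.0) -> str:
    """Subtract the known acquisition delay and return a UTC timestamp string."""
    ts = datetime.fromisoformat(local_iso)
    if ts.tzinfo is None:
        # Without an offset the source time zone would have to be guessed.
        raise ValueError("timestamp must carry a UTC offset")
    corrected = ts - timedelta(seconds=acquisition_delay_s)
    return corrected.astimezone(timezone.utc).isoformat(timespec="milliseconds")
```

For example, `to_utc("2025-06-05T08:00:00+08:00")` yields `2025-06-05T00:00:00.000+00:00`.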
Further, the credibility evaluation is based on at least the following factors: data source reliability, evaluated from the historical performance of the source; data freshness, where more recent data is considered more credible; data integrity, checked through the completeness and validity of the data fields; data consistency, where the reasonableness of the data is evaluated by comparison with historical or related data; and sensor health, evaluated from the self-diagnosis information of the sensor.
In a specific implementation, the credibility evaluation adopts a weighted scoring model, and the weight of each evaluation factor is dynamically adjusted according to the requirements of different application scenes. For example, in a real-time control scenario, the data freshness is weighted higher, and in a quality analysis scenario, the data consistency is weighted higher. The final confidence score ranges from 0 to 100, where 0 means completely untrusted and 100 means completely trusted.
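A weighted scoring model of this kind might look like the following Python sketch. The factor names, the two weight profiles, and the assumption that each factor is itself scored from 0 to 100 are illustrative choices, not values taken from the text.

```python
from typing import Mapping


def credibility_score(factors: Mapping[str, float],
                      weights: Mapping[str, float]) -> float:
    """Weighted average of factor scores (each 0-100), clamped to [0, 100]."""
    total = sum(weights[name] for name in factors)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(factors[name] * weights[name] for name in factors) / total
    return max(0.0, min(100.0, score))


# Hypothetical scenario profiles: freshness dominates in real-time control,
# consistency dominates in quality analysis.
REAL_TIME_CONTROL = {"source_reliability": 0.2, "freshness": 0.4,
                     "integrity": 0.15, "consistency": 0.15,
                     "sensor_health": 0.1}
QUALITY_ANALYSIS = {"source_reliability": 0.2, "freshness": 0.1,
                    "integrity": 0.15, "consistency": 0.4,
                    "sensor_health": 0.15}
```

Switching the weight profile changes the score of the same record, which is how the scenario-dependent adjustment described above shows up in practice.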
After standardization and credibility evaluation, the original multi-source data is organized into an original data set with credibility marks, composed of a plurality of data records. Each data record contains at least: a globally unique identifier that uniquely identifies the record; a timestamp accurate to the millisecond; a data source identifier that marks the source type and the specific device; a parameter identifier that uniquely identifies the monitored parameter type; a parameter value, which is the actual measured value after standardization; a credibility score that represents the credibility of the data; and metadata containing information about data acquisition and processing, such as the sampling frequency and processing method.
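One possible in-memory representation of such a data record is sketched below in Python. The field names and the use of a UUID as the globally unique identifier are assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataRecord:
    record_id: str       # globally unique identifier
    timestamp_ms: int    # UTC epoch time, millisecond precision
    source_id: str       # source type and specific device
    parameter_id: str    # monitored parameter type
    value: float         # standardized measurement value
    credibility: float   # credibility score in [0, 100]
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 0.0 <= self.credibility <= 100.0:
            raise ValueError("credibility must lie in [0, 100]")


def make_record(source_id: str, parameter_id: str, value: float,
                credibility: float, when: datetime, **metadata) -> DataRecord:
    """Build a record with a fresh UUID and a millisecond timestamp."""
    return DataRecord(str(uuid.uuid4()), int(when.timestamp() * 1000),
                      source_id, parameter_id, value, credibility, metadata)
```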
A specific example of the present invention is a device health management system for an intelligent manufacturing enterprise. The equipment health management system is connected to various sensors on core production equipment such as CNC (computer numerical control) machine tools, injection molding machines, robots and the like through an industrial sensor acquisition unit, acquires equipment operation parameters in real time, is connected to environment monitoring terminals distributed in workshops through an Internet of things equipment acquisition unit, acquires environment data such as temperature and humidity, dust concentration and the like, is connected to an MES (manufacturing execution system) and an equipment maintenance recording system of an enterprise through a history database access unit, extracts equipment history operation records and maintenance records, and acquires equipment parameter standards and fault feature libraries provided by equipment suppliers through a third party API (application program interface) calling unit. The data preprocessing unit performs standardization processing on the multi-source data, evaluates the data reliability based on factors such as sensor state, data acquisition time, data consistency and the like, and forms a device operation data set with reliability marks.
Further, the data type classification of the data records in the original data set by adopting a data type recognition method based on a combination of rules and a classifier comprises the following steps:
Step 21, constructing a data type feature extractor comprising a structural feature extraction rule, a time feature extraction rule and a content feature extraction rule, and extracting key feature vectors required by data type judgment from an original data set;
In particular, the structural feature extraction rules are used to analyze the organization of the data records, including, but not limited to, whether the data has a fixed key-value pair structure, the number and type distribution of data fields, the nesting hierarchy of the data, and the serialization format of the data. For example, for JSON data, features such as the number of top-level fields, the field type distribution (such as the proportion of string fields and of numeric fields), and the nesting depth are extracted; for CSV data, features such as the number of columns, the proportion of numeric columns, and the proportion of date columns are extracted.
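For JSON input, the structural features named above can be computed with the standard library alone. The following Python sketch assumes that the record is a JSON object at the top level; the feature names are hypothetical.

```python
import json


def nesting_depth(obj) -> int:
    """Depth of nested dicts/lists; scalars have depth 0."""
    if isinstance(obj, dict):
        return 1 + max((nesting_depth(v) for v in obj.values()), default=0)
    if isinstance(obj, list):
        return 1 + max((nesting_depth(v) for v in obj), default=0)
    return 0


def structural_features(raw_json: str) -> dict:
    doc = json.loads(raw_json)
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object at the top level")
    values = list(doc.values())
    n = len(values)
    # bool is a subclass of int in Python, so exclude it from the numeric count.
    numeric = sum(isinstance(v, (int, float)) and not isinstance(v, bool)
                  for v in values)
    strings = sum(isinstance(v, str) for v in values)
    return {
        "top_level_fields": n,
        "string_ratio": strings / n if n else 0.0,
        "numeric_ratio": numeric / n if n else 0.0,
        "nesting_depth": nesting_depth(doc),
    }
```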
Wherein the time feature extraction rules are used to identify time series characteristics of the data including, but not limited to, whether a time stamp field is included, a distribution feature of the time stamps, a time interval regularity between data points, a continuity of the time series, etc. For example, whether a time field conforming to the ISO8601 format exists in the data is detected, the mean and variance of the time intervals of adjacent data points are calculated, the sampling frequency of the time series is analyzed, and the like.
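The interval statistics mentioned above can be sketched in Python as follows. The regularity threshold (standard deviation within 5% of the mean interval) is an assumed value chosen for illustration, and `datetime.fromisoformat` accepts only a subset of ISO 8601 in older Python versions.

```python
from datetime import datetime
from statistics import mean, pvariance
from typing import Sequence


def interval_stats(iso_timestamps: Sequence[str]) -> dict:
    """Mean and variance of the gaps between consecutive timestamps."""
    times = sorted(datetime.fromisoformat(t) for t in iso_timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    mu = mean(gaps)
    var = pvariance(gaps, mu)
    return {
        "mean_interval_s": mu,
        "interval_variance": var,
        # Assumed rule: regular if the standard deviation is within 5% of the mean.
        "is_regular": var <= (0.05 * mu) ** 2,
    }
```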
Wherein the content feature extraction rules are used to analyze the content characteristics of the data, including but not limited to the length distribution of text fields, the statistics of numeric fields, the proportion of binary data, the identification of special format content (e.g., URL, image data, audio data), etc. For example, the average length and coefficient of variation of the text field are calculated, whether HTML tags or XML tags are included is detected, whether Base64 encoded binary data is included is identified, and so on.
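A few of the content features listed above can be approximated with simple heuristics. In the Python sketch below, the markup pattern and the Base64 check (minimum length 16, strict alphabet validation) are assumptions; a short alphanumeric string of suitable length can still be misread as Base64.

```python
import base64
import binascii
import re
from statistics import mean, pstdev
from typing import Sequence

TAG_RE = re.compile(r"<[A-Za-z/][^>]*>")  # rough HTML/XML tag pattern


def looks_like_base64(text: str, min_len: int = 16) -> bool:
    """Heuristic: long enough, length divisible by 4, and strictly decodable."""
    if len(text) < min_len or len(text) % 4 != 0:
        return False
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


def content_features(text_fields: Sequence[str]) -> dict:
    if not text_fields:
        raise ValueError("need at least one text field")
    lengths = [len(t) for t in text_fields]
    avg = mean(lengths)
    return {
        "avg_length": avg,
        "length_cv": pstdev(lengths) / avg if avg else 0.0,  # coefficient of variation
        "has_markup": any(TAG_RE.search(t) for t in text_fields),
        "has_base64": any(looks_like_base64(t) for t in text_fields),
    }
```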
In the implementation process of the invention, the data type feature extractor applies the three types of feature extraction rules to each data record in the original data set to generate a key feature vector containing a plurality of key features. The key feature vector contains a structural feature sub-vector (10-dimensional), a temporal feature sub-vector (8-dimensional) and a content feature sub-vector (12-dimensional), for a total of 30-dimensional features to characterize the type of data record.
Step 22, based on the extracted key feature vector, classifying the data record by using a decision tree classifier to generate a classification result, wherein the classification result comprises time sequence data, unstructured data and structured data;
Specifically, the decision tree classifier is built with the C4.5 algorithm, which uses the information gain ratio as the splitting criterion and constructs the classification model by recursively partitioning the feature space. Each internal node of the decision tree represents a test on a feature, each branch represents an outcome of the test, and each leaf node represents a data type class.
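The splitting criterion can be made concrete with a short Python sketch of the information gain ratio for a discrete feature. It covers only the criterion, not tree construction or the threshold search that C4.5 applies to continuous features; the function names are illustrative.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values: Sequence[Hashable],
               labels: Sequence[Hashable]) -> float:
    """Information gain of the split divided by the split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0
```

A feature such as "has a timestamp field" that separates the classes perfectly receives a gain ratio of 1, while a feature unrelated to the class receives 0.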
In the process of constructing the decision tree classifier, a set of explicit classification rules is first defined as an initial structure of the decision tree based on expert knowledge. For example, data tends to be classified as time series data if it has distinct time series characteristics (timestamp fields exist and time intervals are uniform), as unstructured data if it contains a large amount of unstructured text content or binary coded content, and as structured data if it has a clear key-value pair structure and consists mainly of structured fields.
The decision tree classifier is then trained and optimized using the annotated first training dataset. The first training data set contains a plurality of data records from various data sources in the original data set as typical data samples, and each typical data sample is marked as time sequence data, unstructured data or structured data by a field expert. By training the decision tree classifier, finer classification rules are learned to handle boundary conditions with complex feature combinations.
In a further preferred embodiment of the invention, the classification result of the decision tree classifier can be verified and corrected by combining the context information of the data source and the semantic features of the data content to generate a more accurate classification result of the data type;
Specifically, on the basis of decision tree classification, data source context information and data content semantic features are introduced to verify and correct classification results. The data source context information comprises the source type, the acquisition mode, the expected use and the like of the data, and the semantic features of the data content are extracted through a simple text analysis and pattern matching method and are used for understanding the actual meaning of the data.
For the utilization of data source context information, the types of data typically generated by different data sources are recorded by maintaining a knowledge base of the data sources. For example, industrial sensor acquisition units typically generate time series data, historical database access units may generate structured data or time series data, and third party API call units may return various types of data. In the classifying process, the prior probability in the data source knowledge base is referenced, and the classifying result of the decision tree is adjusted.
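The text does not specify how the prior probabilities adjust the decision tree result. One plausible reading, sketched below in Python, multiplies the class probabilities of the classifier by the prior of the data source and renormalizes; this Bayes-style reweighting and all names in the sketch are assumptions.

```python
from typing import Dict, Mapping


def adjust_with_prior(tree_probs: Mapping[str, float],
                      source_prior: Mapping[str, float]) -> Dict[str, float]:
    """Reweight classifier probabilities by the prior of the data source."""
    joint = {label: p * source_prior.get(label, 0.0)
             for label, p in tree_probs.items()}
    total = sum(joint.values())
    if total == 0:
        # The prior rules out every class: fall back to the classifier output.
        return dict(tree_probs)
    return {label: p / total for label, p in joint.items()}
```

With a tree output of 0.45 for time series and 0.50 for structured data, a sensor prior of 0.9 for time series shifts the final decision to time series data.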
For analysis of semantic features of data content, a set of predefined semantic rules and pattern matching rules may be applied. For example, it is detected whether the data field name contains time-related words (such as "time", "date", etc.), whether the data content conforms to the term pattern of the specific field is identified, whether the data structure conforms to the common data exchange format is determined, etc. These semantic features help to understand the actual use and type of data.
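The field name check can be illustrated with a small Python sketch. It splits field names into tokens so that a name such as `update_count` is not mistaken for a date field; the keyword set and function names are hypothetical.

```python
import re
from typing import List, Mapping

TIME_WORDS = {"time", "date", "datetime", "timestamp", "ts"}  # assumed keywords


def name_tokens(name: str) -> List[str]:
    """Split snake_case, kebab-case, and camelCase names into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def time_field_names(record: Mapping[str, object]) -> List[str]:
    """Return the field names that contain a time-related token."""
    return [name for name in record if TIME_WORDS & set(name_tokens(name))]
```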
In the specific example of the device health management system of the intelligent manufacturing enterprise, the equipment operation data set is classified with the data type recognition method that combines rules and statistical features, as follows. Sensor data from CNC machine tools, injection molding machines and robots (such as continuous measurements of temperature, pressure, vibration and other physical quantities) is recognized as time series data. Text descriptions and image records, such as equipment maintenance records and fault reports, are recognized as unstructured data; such data contains rich unstructured content and requires deep learning models for feature extraction. Tabular data, such as equipment parameter configurations, production plans and bills of materials, is recognized as structured data; such data has a clear relational structure and is suitable for establishing association relationships through knowledge graph technology. With accurate data type classification, the device health management system can select the most suitable processing method for each type of data, so that the value of the data is fully mined and equipment state monitoring, fault early warning and maintenance decisions are supported.
Further, in an embodiment of the present invention, extracting trend features from the time series data by using the segmented multi-model time series feature extraction framework includes the following steps:
Step 31, constructing a segmented multi-model time sequence feature extraction frame comprising a self-adaptive segmentation module, a multi-model feature extraction module and a feature fusion module;
Step 32, dividing continuous time series data into data segments with similar statistical characteristics through the adaptive segmentation module, which adopts a similarity measurement method based on dynamic time warping, dynamically determines the boundaries of the data segments by a sliding window algorithm, and automatically adjusts the window size according to the change characteristics of the time series data.
For example, for data segments with smooth changes, a larger window is used to increase the computational efficiency, and for data segments with severe changes, a smaller window is used to ensure the segmentation accuracy. In practical application, the variance of the data in the sliding window is calculated and compared with a preset threshold value to judge the change degree of the data, the smooth section is judged when the variance is smaller than the threshold value, and the fluctuation section is judged when the variance is larger than or equal to the threshold value.
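The variance test for choosing the window size can be sketched as below. This covers only the window-size decision, not the similarity measurement based on dynamic time warping; the window sizes, the threshold and the name `adaptive_segments` are illustrative assumptions.

```python
from statistics import pvariance

def adaptive_segments(series, threshold, small=4, large=8):
    """Split a series into (start, end, label) segments using a variance test.

    A short probe window is inspected at each position. If its variance is
    below the threshold, the data is treated as smooth and a large window is
    used; otherwise it is treated as fluctuating and a small window is used.
    """
    segments = []
    start = 0
    n = len(series)
    while start < n:
        probe = series[start:start + small]
        variance = pvariance(probe) if len(probe) > 1 else 0.0
        if variance < threshold:
            size, label = large, "smooth"
        else:
            size, label = small, "fluctuating"
        end = min(start + size, n)
        segments.append((start, end, label))
        start = end
    return segments

# Eight flat samples followed by four strongly varying samples.
series = [1.0] * 8 + [0.0, 5.0, -5.0, 5.0]
segments = adaptive_segments(series, threshold=0.5)
```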
Step 33, through the multi-model feature extraction module comprising a statistic feature extraction channel, a frequency domain feature extraction channel and a shape feature extraction channel, applying a corresponding feature extraction method to each data segment in parallel, and extracting single-dimensional feature vectors of time sequence data from different dimensions;
Specifically, the statistical feature extraction channel calculates statistics such as the mean, variance, skewness and kurtosis of each data segment to characterize its basic statistical properties. The frequency domain feature extraction channel converts the time domain data into a frequency domain representation through the fast Fourier transform and extracts the main frequency components of each data segment and their energy distribution, capturing the periodicity and frequency characteristics of the data. The shape feature extraction channel uses the symbolic aggregate approximation (SAX) method to convert each data segment into a symbol sequence and extracts the pattern features of the data segment, characterizing the shape and change patterns of the time series data. Each feature extraction channel operates independently and generates a single-dimensional feature vector for its own dimension.
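The three channels can be illustrated with a dependency-free sketch. Several points are assumptions made for brevity: a naive discrete Fourier transform stands in for the fast Fourier transform (the result is the same, only slower), the SAX alphabet has four letters with breakpoints at the quartiles of the standard normal distribution, and all function names are hypothetical.

```python
import cmath
import math

def statistical_features(seg):
    """Mean, population variance, skewness and kurtosis of one segment."""
    n = len(seg)
    mean = sum(seg) / n
    var = sum((x - mean) ** 2 for x in seg) / n
    std = math.sqrt(var)
    if std == 0:
        return [mean, var, 0.0, 0.0]
    skew = sum(((x - mean) / std) ** 3 for x in seg) / n
    kurt = sum(((x - mean) / std) ** 4 for x in seg) / n
    return [mean, var, skew, kurt]

def frequency_features(seg):
    """Dominant non-DC frequency bin and its share of the spectral energy."""
    n = len(seg)
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(seg))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return [0, 0.0]
    peak = max(range(len(power)), key=power.__getitem__)
    return [peak + 1, power[peak] / total]

def sax_word(seg, n_pieces=4):
    """SAX: z-normalize, average over equal pieces, map each piece to a letter.

    Assumes len(seg) >= n_pieces; samples beyond a multiple of n_pieces are ignored.
    """
    mean, var = statistical_features(seg)[:2]
    std = math.sqrt(var) or 1.0
    z = [(x - mean) / std for x in seg]
    size = len(z) // n_pieces
    paa = [sum(z[i * size:(i + 1) * size]) / size for i in range(n_pieces)]
    breakpoints = [-0.6745, 0.0, 0.6745]  # quartiles of the standard normal
    return "".join("abcd"[sum(v > b for b in breakpoints)] for v in paa)

# A sine wave with two full cycles over 16 samples, and a rising ramp.
sine = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
ramp = [float(i) for i in range(8)]
```

For the sine segment the frequency channel reports bin 2 as dominant, and the ramp is encoded by SAX as a strictly rising word.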
Step 34, fusing the single-dimensional feature vectors with different dimensions by utilizing the feature fusion module to generate trend features comprehensively representing trend characteristics of the time sequence data;
Specifically, the feature fusion module adopts an attention mechanism to carry out weighted fusion on the single-dimensional feature vectors extracted by the three feature extraction channels. Firstly, an initial weight is allocated to each feature extraction channel, and then, the contribution degree of each feature extraction channel to final trend characterization is learned by using labeled time sequence data samples as training data in a supervised learning mode, so that the weight allocation is dynamically adjusted. In the fusion process, the statistical features, the frequency domain features and the shape features are weighted and summed according to the learned weights to form a final trend feature vector. The feature vector not only maintains the statistical characteristics of the original data, but also contains the frequency domain and the shape features, and can comprehensively reflect the trend change features of the time sequence data.
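The weighted summation step can be sketched as follows, assuming the three channel vectors have already been brought to a common length (the text does not say how this is done) and that the learned contribution of each channel is available as a scalar score. The softmax normalization and the names `softmax` and `fuse` are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(channel_vectors, channel_scores):
    """Weighted sum of equally sized channel vectors; returns (fused, weights)."""
    weights = softmax(channel_scores)
    dim = len(channel_vectors[0])
    if any(len(v) != dim for v in channel_vectors):
        raise ValueError("channel vectors must share one dimension")
    fused = [sum(w * v[i] for w, v in zip(weights, channel_vectors)) for i in range(dim)]
    return fused, weights

# Statistical, frequency domain and shape vectors (illustrative values).
vectors = [[3.0, 0.0], [0.0, 3.0], [3.0, 3.0]]
# Equal scores correspond to the initial, untrained weight allocation.
fused, weights = fuse(vectors, [0.0, 0.0, 0.0])
```

During supervised training the scores would be updated so that channels contributing more to the trend characterization receive larger weights.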
In the specific implementation of the invention, the processing flow of the segmented multi-model time series feature extraction framework is as follows. First, the original time series data is input into the adaptive segmentation module to generate a plurality of data segments with similar features. Then, each data segment is input into the three feature extraction channels simultaneously to extract statistical features, frequency domain features and shape features respectively. Finally, the feature fusion module performs weighted fusion of the three types of features and outputs the final trend feature representation. The whole process extracts time series trend features efficiently and comprehensively, and supports subsequent data analysis and decision-making.
In the above example of the device health management system for an intelligent manufacturing enterprise, the segmented multi-model timing feature extraction framework is applied to trend feature extraction of CNC machine spindle vibration signals. The vibration signal which is continuously collected is divided into a plurality of time periods through the self-adaptive segmentation module, and the time periods correspond to different machining states of the machine tool, such as no-load, normal machining, overload and the like. Then, the multi-model feature extraction module extracts features of each period from three dimensions of statistics, frequency domain and shape simultaneously, wherein the statistics features reflect the intensity and stability of vibration signals, the frequency domain features reflect the natural frequency and possible fault frequency of a machine tool spindle, and the shape features capture abnormal modes of vibration waveforms. And finally, the feature fusion module fuses the three types of features into a comprehensive trend feature vector according to the weight learned by the historical fault data. The trend feature vector can reflect the change trend of the running state of the CNC machine tool spindle, and effectively supports subsequent equipment state evaluation and potential fault early warning. For example, when the fused trend characteristic shows that the vibration frequency of the spindle gradually approaches to the critical frequency of the machine tool, an early warning signal is generated to prompt maintenance personnel to check the state of the spindle bearing, so that the early prevention of equipment faults is realized.
Further, the method for extracting unstructured features from the unstructured data based on a feature extraction model of a multi-layer perceptron comprises the following steps:
Step 41, extracting initial unstructured features from unstructured data by applying a feature extractor based on a statistical method;
Specifically, for unstructured data of the text class, a bag-of-words model or the TF-IDF method is applied to extract word frequency features of the text, and an initial vector representation of the text is formed by calculating the frequency of each word in the text and its importance in the whole corpus. To reduce the feature dimension, feature selection methods such as the chi-square test are adopted, and the top N terms with the most discriminative power are retained. In addition, other statistical characteristics of the text, such as average sentence length, punctuation distribution and keyword density, can be extracted to enrich the text representation.
For unstructured data of the image class, classical feature descriptors such as HOG (histogram of oriented gradients), LBP (local binary pattern) and SIFT (scale-invariant feature transform) are applied to extract texture and shape features of the image. The HOG feature captures the contour information of an object by calculating histograms of gradient orientation over local areas of the image; the LBP feature extracts local texture patterns by comparing the intensity of a central pixel with that of its surrounding pixels; and the SIFT feature captures local structural features of the image by detecting key points and the gradient distribution around them. These feature descriptors complement each other and together form a comprehensive feature representation of the image.
For unstructured data of the audio class, acoustic feature descriptors such as MFCC (Mel-frequency cepstral coefficients), chroma features and energy features are applied to extract the spectral and time domain characteristics of the audio signal. The MFCC features model how the human ear perceives sounds of different frequencies and effectively represent the timbre of the audio; the chroma features represent the pitch distribution of the audio; and the energy features reflect the intensity change of the audio signal in the time domain. By calculating statistics of these features, such as mean, variance and kurtosis, a composite feature representation of the audio signal is formed.
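The TF-IDF weighting for text can be illustrated with a small dependency-free sketch. The whitespace tokenizer, the unsmoothed logarithmic IDF and the function name `tfidf` are simplifying assumptions; the chi-square feature selection step is not shown.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of non-empty documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[tok] / len(doc) * idf[tok] for tok in vocab])
    return vocab, vectors

# Illustrative maintenance record snippets.
docs = ["bearing wear detected", "bearing replaced", "gear breakage detected"]
vocab, vectors = tfidf(docs)
```

Terms that occur in fewer documents, such as "wear", receive a higher weight than terms shared across records, such as "bearing".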
Step 42, designing a multi-layer perceptron model consisting of an input layer, a plurality of hidden layers and an output layer, adopting a fully-connected neural network structure, training the multi-layer perceptron model, and mapping initial unstructured features to unstructured feature spaces;
Specifically, a corresponding multi-layer perceptron model architecture is designed for each type of unstructured data. Each multi-layer perceptron model consists of an input layer, a plurality of hidden layers and an output layer, and adopts a fully-connected neural network structure. The number of neurons in the input layer matches the dimension of the initial features. The hidden part generally comprises 2-3 hidden layers whose neuron counts decrease layer by layer, for example 512 neurons in the first layer, 256 in the second and 128 in the third, so that progressively more abstract feature representations are extracted. The number of neurons in the output layer is generally set to 64-128, producing a compact, information-rich feature representation.
In the design of the neural network, each hidden layer uses a ReLU activation function, introduces nonlinear transformation capability and enhances the expression capability of a model, and an output layer directly outputs a linear transformation result without using an activation function or uses a tanh activation function to limit the output range to a [ -1,1] interval. Batch Normalization layers are added between the layers, so that training convergence is accelerated and model stability is improved.
In the model training stage, supervised learning is adopted, and the multi-layer perceptron model is trained on a manually labeled third training data set. The third training data set contains unstructured data samples with class labels that reflect the semantic class or functional attributes of the data. The training goal is for the feature representation output by the model to distinguish effectively between different classes of unstructured data. Training uses mini-batch gradient descent with a batch size of typically 32-128; the loss function is cross-entropy loss (for classification tasks) or contrastive loss (for feature learning tasks); and the optimizer is the Adam algorithm with an initial learning rate of 0.001 and a learning rate decay strategy. To prevent overfitting, dropout (rate 0.3-0.5) and L2 regularization (coefficient 0.0001) are added to the model, and an early stopping strategy monitors validation set performance, stopping training when that performance has not improved for several consecutive rounds.
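The early stopping rule can be sketched independently of any training framework. The patience of three rounds and the names `TRAINING_CONFIG` and `early_stop_epoch` are assumptions; the other configuration values are taken from the ranges stated above.

```python
# Hyperparameters picked from the ranges given in the text; "patience" is an assumption.
TRAINING_CONFIG = {
    "batch_size": 64,
    "initial_learning_rate": 0.001,
    "dropout_rate": 0.3,
    "l2_coefficient": 0.0001,
    "patience": 3,
}

def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None if it never stops.

    Training stops once the validation loss has failed to improve on the best
    value seen so far for `patience` consecutive epochs.
    """
    best = float("inf")
    bad_rounds = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            bad_rounds = 0
        else:
            bad_rounds += 1
            if bad_rounds >= patience:
                return epoch
    return None

# Validation loss improves for three epochs, then stalls.
stop_at = early_stop_epoch(
    [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.69],
    patience=TRAINING_CONFIG["patience"],
)
```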
Step 43, applying the trained multi-layer perceptron model to perform feature extraction on the unstructured data to generate unstructured features in a unified format;
Specifically, for unstructured data to be processed in step two, the initial features are first extracted by the statistical feature extractor of step 41. The initial features are then input into the trained multi-layer perceptron model of the corresponding type, and the activation values of the final layer (output layer) are obtained through forward propagation; these activation values serve as the feature representation of the unstructured data.
In the feature extraction process, multiple levels of feature representations may be acquired for the same piece of unstructured data. In addition to the features of the output layer, the activation values of the last one or more hidden layers may be extracted as intermediate layer features. These intermediate layer features often contain more detailed information that can be complementary to the output layer features to collectively form a multi-scale feature representation.
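The forward pass and the reuse of hidden activations can be sketched in plain Python. The weights below are random rather than trained, the layer sizes are shrunk for readability (the text suggests, for example, 512, 256 and 128 hidden neurons), and batch normalization and dropout are omitted. All names are hypothetical.

```python
import random

def make_layer(n_in, n_out, rng):
    """Random fully-connected layer: (weight rows, zero biases)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    """Affine transform of x by one layer."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def extract_features(x, layers):
    """Forward pass; returns (output layer features, last hidden layer activations)."""
    hidden = x
    for layer in layers[:-1]:
        hidden = relu(dense(hidden, layer))
    output = dense(hidden, layers[-1])  # linear output layer, no activation
    return output, hidden

rng = random.Random(0)
sizes = [20, 16, 8, 4]  # input, two hidden layers, output (illustrative sizes)
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
initial_features = [rng.random() for _ in range(sizes[0])]
output, last_hidden = extract_features(initial_features, layers)
# Multi-scale representation: output features plus intermediate layer features.
multi_scale = output + last_hidden
```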
In the above example of the device health management system of the intelligent manufacturing enterprise, the feature extraction model based on the multi-layer perceptron is applied to unstructured data such as device maintenance records and fault reports. First, the text of the maintenance records is converted into word frequency feature vectors by the TF-IDF method, and statistical features such as text length and technical term density are extracted; for fault images, HOG and LBP features are extracted to capture the texture and damage characteristics of the equipment surface. These initial features are then mapped into semantically rich feature representations using a multi-layer perceptron model trained on historical fault cases. The text-specific multi-layer perceptron model can extract from the maintenance records key semantic features reflecting equipment states (such as 'normal', 'slight anomaly', 'severe fault'), fault types (such as 'bearing wear', 'gear breakage') and maintenance operations (such as 'lubrication', 'replacement parts'), and the image-specific multi-layer perceptron model can extract from the fault images visual features representing the type, degree and position of equipment surface damage. After PCA (principal component analysis) dimension reduction and random-forest-based feature selection, a compact unstructured feature representation of dimension 48 is generated, which is then combined with the trend features of the time series data and the association relationship data of the structured data.
Further, establishing adaptive association relationships for the structured data by adopting dynamically evolving knowledge graph technology and generating the association relationship data includes the following steps:
Step 51, constructing a domain ontology model and a key entity identification rule, and extracting a key entity and attributes thereof from the structured data;
Specifically, the domain ontology model is a concept hierarchy and relationship network constructed from knowledge of the specific domain, and comprises three components: a category hierarchy, relationship type definitions and attribute constraint rules. The category hierarchy defines the main concepts of the domain and their parent-child relationships, such as classification systems for core concepts including equipment types, process parameters and material properties. The relationship type definitions describe the possible modes of association between key entities of different categories, such as the semantic relationships 'contains', 'controls', 'influences' and 'depends on'. The attribute constraint rules specify the attribute characteristics of each kind of key entity, including constraints such as the data type, value range, unit and validity conditions of each attribute.
In the aspect of configuration of the key entity identification rules, the identification rules are defined for each type of key entity by combining category definition in the domain ontology model. These rules include both deterministic rules based on pattern matching, such as accurate recognition by field names, data formats, and value ranges, and probabilistic rules based on machine learning, for determining fuzzy conditions by pre-trained key entity recognition models. For example, the key entity of the production equipment type can be identified by the naming rule of the equipment number and the equipment parameter characteristics, and the key entity of the process parameter type can be identified by the parameter name key words and the distribution characteristics of the parameter values.
In the specific implementation of the invention, the structured data is scanned and analyzed by applying the domain ontology model and the key entity identification rules, and all key entities that meet the conditions are extracted together with their attributes. The extraction process first identifies the type of each key entity in the data, and then parses its attribute values according to the attribute specifications defined in the ontology model. Each identified key entity contains at least a unique identifier, a type tag, a name, an attribute set and a confidence score.
Step 52, constructing a structural knowledge graph by applying a relation mapping rule based on the key entity and the attribute thereof extracted in the step 51;
Specifically, the relationship mapping rules comprise three types of rules: explicit relationship mapping rules, implicit relationship inference rules and composite relationship construction rules. The explicit relationship mapping rules directly convert relationships between key entities that are explicitly identified in the structured data into edges of the structural knowledge graph, for example mapping the "include" relationships in the device hierarchy and the "precedent" and "successor" relationships in the production flow directly into the graph. The implicit relationship inference rules infer associations that may exist between key entities from correspondences among their attribute values, for example establishing an 'association' relationship between key entities of different types that share the same identifier, and a 'similar' relationship between key entities with similar attribute patterns. The composite relationship construction rules build higher-level semantic relationships from combinations of several basic relationships; for example, when key entity A controls key entity B and key entity B affects key entity C, a composite relationship stating that key entity A indirectly affects key entity C is established.
In the implementation process of the invention, the construction of the structural knowledge graph is divided into three stages. First, all the key entities extracted in step 51 are used as nodes of the knowledge graph, and each node retains a type label, a name and an attribute set of the key entity. And secondly, applying an explicit relation mapping rule, and establishing basic relation edges between corresponding key entity nodes according to the existing relation information in the structured data. Each relationship edge contains at least three basic attributes of relationship type, relationship strength and build time. Finally, an implicit relationship inference rule and a compound relationship construction rule are applied, the relationship among potential key entities is found and added, the semantic structure of the knowledge graph is enriched, the newly added relationship side is marked as an inferred relationship, and a confidence score is added to indicate the reliability level of the relationship. Through the above procedure, the key entities and their attributes extracted in step 51 are organized into a structured knowledge graph, in which the nodes represent the key entities, the edges represent the relationships, and together constitute a graphical representation of domain knowledge.
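The graph construction stages can be sketched with plain dictionaries. The sketch implements only the explicit mapping stage and one composite rule (controls followed by influences); the edge field names, the default confidence of 0.5 for inferred edges and the function names are assumptions.

```python
def build_graph(entities, explicit_relations, built_at):
    """Create nodes from key entities and edges from explicit (source, type, target) triples."""
    nodes = {entity["id"]: entity for entity in entities}
    edges = [
        {"source": s, "type": t, "target": d, "strength": 1.0,
         "built_at": built_at, "inferred": False, "confidence": 1.0}
        for s, t, d in explicit_relations
    ]
    return nodes, edges

def add_composite_relations(edges, built_at, confidence=0.5):
    """If A controls B and B influences C, add an inferred 'A indirectly influences C' edge."""
    existing = {(e["source"], e["type"], e["target"]) for e in edges}
    inferred = []
    for first in edges:
        if first["type"] != "controls":
            continue
        for second in edges:
            if second["type"] != "influences" or second["source"] != first["target"]:
                continue
            key = (first["source"], "indirectly_influences", second["target"])
            if key in existing:
                continue
            existing.add(key)
            inferred.append({
                "source": key[0], "type": key[1], "target": key[2],
                "strength": first["strength"] * second["strength"],
                "built_at": built_at, "inferred": True, "confidence": confidence,
            })
    return edges + inferred

entities = [
    {"id": "cooling_system", "type": "equipment", "name": "Cooling system", "attributes": {}},
    {"id": "spindle_temperature", "type": "parameter", "name": "Spindle temperature", "attributes": {}},
    {"id": "machining_accuracy", "type": "parameter", "name": "Machining accuracy", "attributes": {}},
]
relations = [
    ("cooling_system", "controls", "spindle_temperature"),
    ("spindle_temperature", "influences", "machining_accuracy"),
]
nodes, edges = build_graph(entities, relations, built_at="2025-01-01")
all_edges = add_composite_relations(edges, built_at="2025-01-01")
```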
Step 53, extracting a hierarchical association relation of the structured data and generating application-oriented association relation data based on the structural knowledge graph, wherein the hierarchical association relation comprises three levels of key entity direct association, path association and group association;
Specifically, the direct association of key entities refers to the relationship between two key entities in the knowledge graph, which are directly connected through a single edge, and represents the most basic association form. The path association refers to an indirect relation between non-adjacent key entities connected by a path formed by a plurality of edges, and can reflect a more complex association mode. The group association refers to the overall relationship between closely associated key entity sets identified by a community detection algorithm, and represents a modularized structure and a functional partition in the system.
In the implementation process of the invention, the generation of the association relationship data at least comprises three steps. First, for each pair of key entities in the knowledge graph, the direct association strength and type between the key entities are calculated to form a key entity direct association matrix. And secondly, calculating the shortest path and the weighted path between any two key entities by applying a path analysis algorithm to generate a path association data set, wherein the path association data set comprises path length, an intermediate node sequence and comprehensive association strength. And finally, a community detection algorithm based on modularity optimization is applied, the knowledge graph is divided into a plurality of closely related key entity groups, common characteristics of key entities in the groups and interaction modes among the groups are analyzed, and a group related data set is generated.
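The path analysis step can be illustrated with Dijkstra's algorithm. How the comprehensive association strength of a path is defined is not stated in the text; this sketch assumes it is the product of the edge strengths, which turns the search for the strongest path into a shortest path search over costs of -log(strength). Community detection is not shown.

```python
import heapq
import math

def strongest_path(graph, start, goal):
    """Return (node sequence, combined strength) of the strongest path.

    graph maps a node to a list of (neighbor, strength) pairs with strength in (0, 1].
    The combined strength is assumed to be the product of the edge strengths.
    """
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path, math.exp(-cost)
        if node in settled:
            continue
        settled.add(node)
        for neighbor, strength in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(queue, (cost - math.log(strength), neighbor, path + [neighbor]))
    return None, 0.0

# Illustrative strengths between three key entities.
graph = {
    "cooling_system": [("spindle_temperature", 0.9), ("machining_accuracy", 0.5)],
    "spindle_temperature": [("machining_accuracy", 0.8)],
}
path, strength = strongest_path(graph, "cooling_system", "machining_accuracy")
```

Here the indirect path through the spindle temperature (0.9 × 0.8 = 0.72) is stronger than the direct edge (0.5), so it is the one recorded as the path association.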
The generated association relationship data is organized in a standardized data structure. Each association relationship record contains at least: an association subject identifier, which may refer to a single key entity, a key entity pair or a key entity group; an association type, namely direct association, path association or group association; an association strength value between 0 and 1 that represents how tight the association is; an association directivity flag (directed or undirected); the effective time range of the association; and an association confidence score.
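One possible shape for such a record is sketched below. The field names, the use of string timestamps and the assumption that the confidence score also lies between 0 and 1 are illustrative choices, not part of the described structure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class AssociationRecord:
    subject: Tuple[str, ...]   # one entity, an entity pair, or the members of a group
    kind: str                  # "direct", "path" or "group"
    strength: float            # between 0 and 1
    directed: bool             # True for directed, False for undirected
    valid_from: Optional[str]  # start of the effective time range
    valid_to: Optional[str]    # end of the effective time range
    confidence: float          # assumed here to lie between 0 and 1

    def __post_init__(self):
        if self.kind not in ("direct", "path", "group"):
            raise ValueError("kind must be 'direct', 'path' or 'group'")
        if not 0.0 <= self.strength <= 1.0:
            raise ValueError("strength must be between 0 and 1")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

record = AssociationRecord(
    subject=("spindle_temperature", "machining_accuracy"),
    kind="direct",
    strength=0.8,
    directed=True,
    valid_from="2025-01-01",
    valid_to=None,
    confidence=0.9,
)
```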
In the specific example of the device health management system of the intelligent manufacturing enterprise, the knowledge graph technology is applied to establish association relationships for structured data such as equipment parameter configurations, production plans and bills of materials. First, the system extracts from the structured data key entities such as CNC machine tools, injection molding machines and robots, together with attributes such as their operating parameters and maintenance records. Then, a structural knowledge graph is constructed by applying the relationship mapping rules. The graph includes 'equipment-parameter' relationships, such as the relationship between a CNC machine tool and its spindle speed parameter; 'equipment-equipment' relationships, such as the production line relationship between a CNC machine tool and downstream conveying equipment; and 'parameter-parameter' relationships, such as the influence of feed speed on machining precision. As production proceeds, the system continuously collects new equipment operation data and adjusts the relationship strengths in the knowledge graph in real time through a state-aware dynamic association updating mechanism; for example, when the association between rising spindle temperature and falling machining precision is found to be strengthening, the weight of the relationship between these two parameters is increased.
Finally, the system extracts a multi-level correlation structure from the updated knowledge graph to generate correlation data, wherein the correlation data comprises direct correlation (such as 'main shaft temperature' directly influences 'processing precision'), path correlation (such as 'cooling system state' indirectly influences 'processing precision' by influencing 'main shaft temperature'), and group correlation (such as integral correlation between a temperature-related parameter group and a precision-related parameter group).
Further, generating the data asset efficacy evaluation with dynamic weights by means of the pre-constructed data value evaluation model includes the following steps:
Step 61, constructing a multi-level data value evaluation model framework comprising a value feature extraction layer, a feature fusion layer and a performance evaluation layer;
Step 62, extracting value indexes from data of different data types through the value feature extraction layer, which comprises three parallel feature processing channels: a trend feature processing channel, an unstructured feature processing channel and an association relationship processing channel;
In the embodiment of the invention, the trend feature processing channel receives the trend features extracted by the segmented multi-model time series feature extraction framework and extracts the value information they contain through a convolutional neural network, such as the regularity of data change patterns, indirection, and the ability to predict abnormal events. The unstructured feature processing channel receives the unstructured features extracted by the feature extraction model based on the multi-layer perceptron and identifies the key information points and implicit knowledge they contain through an attention mechanism. The association relationship processing channel receives the association relationship data generated by the knowledge graph technology and extracts structural value indexes through a graph neural network, such as the complexity, centrality and connection patterns of the relationships among entities.
Step 63, performing self-adaptive weighted fusion on the multidimensional value indexes from the value characteristic extraction layer through a characteristic fusion layer based on a deep neural network to generate comprehensive characteristic representation;
Specifically, the feature fusion layer adopts a multi-head attention mechanism and a residual error connection structure to realize the fusion of the value indexes extracted by the three processing channels. The multi-head attention mechanism allows the model to pay attention to the value features of different dimensions and learn the interaction relationship between the value features, and the residual connection structure ensures that the original feature information is not lost in the deep network.
In the process of feature fusion, the output features of three processing channels are mapped to feature spaces with the same dimension, and then the attention weights among different features are calculated through a multi-head attention layer. For each feature, the attention mechanism will calculate its relevance to the other features and assign a fusion weight accordingly. The fusion method can adaptively adjust the importance of different types of features under different scenes, for example, trend features can obtain higher weight in scenes with larger data fluctuation, unstructured features can obtain higher weight in scenes needing deep understanding of content, and association relationship features can obtain higher weight in scenes needing comprehensive analysis of a relationship network.
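The attention computation over the three channel outputs can be sketched as follows. This is a deliberate simplification: a single attention head without learned query, key or value projections stands in for the multi-head attention layer, and the inputs are assumed to be already mapped to a common dimension. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(features):
    """Scaled dot-product self-attention over channel vectors, with a residual connection.

    Returns (fused vectors, attention weight matrix). Each row of the matrix
    holds the fusion weights one channel assigns to all channels.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    fused, attention = [], []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        weights = softmax(scores)
        context = [sum(w * v[i] for w, v in zip(weights, features)) for i in range(dim)]
        # Residual connection: keep the original feature alongside the attended context.
        fused.append([c + x for c, x in zip(context, query)])
        attention.append(weights)
    return fused, attention

# Trend, unstructured and association features in a shared 2-dimensional space.
channel_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attention = attention_fuse(channel_features)
```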
Step 64, based on the comprehensive characteristic representation, predicting the business value, the technical value and the innovation value of the data through a multi-task learning framework simultaneously, and fusing the value evaluation results of the three dimensions into a final data asset efficacy evaluation score;
Specifically, the efficacy evaluation layer adopts a multi-task learning framework and comprises three parallel evaluation branches: a business value evaluation branch, a technical value evaluation branch and an innovation value evaluation branch. The business value evaluation branch evaluates the contribution of the data to business decisions, process optimization and revenue improvement. The technical value evaluation branch evaluates the degree to which the data supports technical innovation, problem diagnosis and performance improvement. The innovation value evaluation branch evaluates how far the data inspires new product development, exploration of new modes and knowledge discovery.
Each evaluation branch is formed by a dedicated deep neural network that receives the comprehensive feature representation from the feature fusion layer and outputs a value evaluation score for the corresponding dimension. The multi-task learning framework not only provides multi-dimensional value evaluation; by sharing the feature representation layer, the evaluations of the different dimensions also reinforce one another, which improves the accuracy of the overall evaluation.
The final data asset effectiveness evaluation score is obtained by weighting and fusing the evaluation results of the three dimensions. The fusion weight is dynamically adjusted according to different industries, different application scenes and different evaluation targets. For example, technical value may be weighted higher in a quality control scenario in manufacturing, business value may be weighted higher in a marketing scenario, and innovation value may be weighted higher in a research and development innovation scenario.
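The final weighted fusion can be sketched as below. The scenario names follow the examples in the text, but the weight values, the score values and the names `SCENARIO_WEIGHTS` and `efficacy_score` are illustrative assumptions.

```python
# Hypothetical fusion weights per application scenario (values are assumptions).
SCENARIO_WEIGHTS = {
    "quality_control": {"business": 0.25, "technical": 0.55, "innovation": 0.20},
    "marketing": {"business": 0.60, "technical": 0.20, "innovation": 0.20},
    "research": {"business": 0.20, "technical": 0.25, "innovation": 0.55},
}

def efficacy_score(scores, scenario):
    """Weighted average of the three value scores for the given scenario."""
    weights = SCENARIO_WEIGHTS[scenario]
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total

# Branch outputs for one data asset (illustrative).
scores = {"business": 0.4, "technical": 0.9, "innovation": 0.5}
qc_score = efficacy_score(scores, "quality_control")
marketing_score = efficacy_score(scores, "marketing")
```

The same data asset scores higher in the quality control scenario than in the marketing scenario, because its strongest dimension is technical value.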
In the above example of the equipment health management system of an intelligent manufacturing enterprise, the multi-level fused data value evaluation model receives as input the trend feature vector extracted from the CNC machine spindle vibration signal, the feature vector extracted from unstructured data such as equipment fault reports, and the association relationship data established from structured data such as equipment parameter configurations. The value feature extraction layer extracts, respectively, the early-warning value of the trend features of the vibration signal, the experiential knowledge value in the fault report text, and the optimization potential value in the equipment parameter associations. The feature fusion layer performs weighted fusion of these value features and dynamically adjusts the weight of each feature according to the current focus of equipment health management. The effectiveness evaluation layer then generates a comprehensive value evaluation score for the equipment data, comprising the supporting value for preventive maintenance decisions (business value), the guiding value for fault diagnosis (technical value), and the heuristic value for equipment optimization and improvement (innovation value). Using the evaluation result, enterprises can identify the most valuable equipment monitoring data, optimize the data acquisition strategy, and concentrate limited computing and storage resources on the processing and analysis of high-value data, thereby improving the efficiency and accuracy of equipment health management.
For example, the system can automatically increase the sampling frequency and storage priority of vibration signal data that the evaluation shows to have high early-warning value, strengthen text feature extraction and knowledge graph construction for fault reports that the evaluation shows to have high diagnostic value, and deepen causal relationship analysis and optimization-space mining for parameter association data that the evaluation shows to have high optimization value. This data-value-based intelligent resource allocation strategy allows the equipment health management system to realize the maximum value of its data assets in the most efficient way.
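As a rough illustration of the value-driven resource allocation for vibration data, the following sketch raises the sampling frequency and storage priority of a stream once its early-warning value crosses a threshold. The class `StreamPolicy`, the function `adjust_policy`, the threshold, the boost factor and the frequency cap are all hypothetical; the description does not specify how the adjustment is computed.

```python
from dataclasses import dataclass


@dataclass
class StreamPolicy:
    sampling_hz: float      # current sampling frequency of the signal
    storage_priority: int   # larger value = higher storage priority


def adjust_policy(
    policy: StreamPolicy,
    early_warning_value: float,
    threshold: float = 0.8,    # assumed cut-off for "high" value
    boost: float = 2.0,        # assumed multiplier for the sampling rate
    max_hz: float = 10_000.0,  # assumed hardware limit
) -> StreamPolicy:
    """Return an updated policy for one vibration-signal stream.

    Streams whose evaluated early-warning value is at or above the threshold
    get a higher (capped) sampling rate and one extra level of storage
    priority; all other streams keep their current policy.
    """
    if early_warning_value >= threshold:
        return StreamPolicy(
            sampling_hz=min(policy.sampling_hz * boost, max_hz),
            storage_priority=policy.storage_priority + 1,
        )
    return policy
```

The same pattern could be applied to fault reports and parameter association data by swapping the adjusted resources, for example the depth of text feature extraction or of causal analysis, but those mappings are likewise left open by the description.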
Example 2
As shown in FIG. 2, the intelligent data management system based on multi-source data acquisition comprises an original data collection module, a data classification module, a multi-type feature extraction module and an asset effectiveness evaluation module, wherein the modules are electrically connected;
The original data collection module acquires multi-source data from multiple data sources, generates an original data set with credibility marks, and sends the original data set to the data classification module;
The data classification module classifies the data types of the data records in the original data set using a data type identification method that combines rules with a decision tree classifier, divides the records into time series data, unstructured data and structured data, and sends the time series data, the unstructured data and the structured data to the multi-type feature extraction module;
The multi-type feature extraction module extracts trend features from the time series data by applying a segmented multi-model time series feature extraction framework, extracts unstructured features from the unstructured data using a feature extraction model based on a multi-layer perceptron, establishes adaptive association relationships for the structured data using knowledge graph technology to generate association relationship data, and sends the trend features, the unstructured features and the association relationship data to the asset effectiveness evaluation module;
The asset effectiveness evaluation module takes the trend features, the unstructured features and the association relationship data as input to a pre-constructed multi-level fusion data value evaluation model and generates a dynamic-weight data asset effectiveness evaluation.
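The data flow between the modules can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the record layout (`payload`, `credibility`), the rule-only `classify_record` (which stands in for the combined rule and decision-tree classifier), and the three placeholder extractors (which stand in for the segmented multi-model framework, the multi-layer perceptron model and the knowledge graph construction) are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def classify_record(record: Record) -> str:
    """Rule-only stand-in for the rule + decision-tree classifier (assumption)."""
    payload = record["payload"]
    if isinstance(payload, str):
        return "unstructured"
    if isinstance(payload, list) and payload and all(
        isinstance(p, tuple) and len(p) == 2 for p in payload
    ):
        return "time_series"  # a list of (timestamp, value) pairs
    return "structured"       # assumed to be a dict of attributes


def trend_feature(record: Record) -> float:
    """Placeholder trend feature: average change in value per unit time."""
    points = sorted(record["payload"])
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0) if t1 != t0 else 0.0


def text_feature(record: Record) -> int:
    """Placeholder unstructured feature: whitespace token count."""
    return len(record["payload"].split())


def association_feature(record: Record) -> List[Tuple[str, str, Any]]:
    """Placeholder association data: (entity, attribute, value) triples."""
    payload = record["payload"]
    entity = payload.get("id", "unknown")
    return [(entity, key, value) for key, value in payload.items() if key != "id"]


EXTRACTORS: Dict[str, Callable[[Record], Any]] = {
    "time_series": trend_feature,
    "unstructured": text_feature,
    "structured": association_feature,
}


def run_pipeline(raw_data_set: List[Record]) -> Dict[str, list]:
    """Classify each record, route it to its extractor, and group the results.

    The grouped features are what would be handed to the evaluation module.
    """
    features: Dict[str, list] = defaultdict(list)
    for record in raw_data_set:
        kind = classify_record(record)
        features[kind].append(EXTRACTORS[kind](record))
    return dict(features)
```

In this sketch the credibility mark travels with each record but is not yet used; a fuller version might weight or filter records by it before feature extraction.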
The above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Those skilled in the art should understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its spirit and scope.

Claims (10)

Translated from Chinese
1. An intelligent data management method based on multi-source data collection, characterized in that it comprises the following steps:
Step 1: acquiring multi-source data from multiple data sources and generating an original data set with credibility marks;
Step 2: classifying the data types of the data records in the original data set using a data type identification method based on a combination of rules and a decision tree classifier; for data whose data type is time series data, proceeding to Step 3; for data whose data type is unstructured data, proceeding to Step 4; for data whose data type is structured data, proceeding to Step 5;
Step 3: applying a segmented multi-model time series feature extraction framework to extract trend features from the time series data;
Step 4: extracting unstructured features from the unstructured data using a feature extraction model based on a multi-layer perceptron;
Step 5: using knowledge graph technology to establish adaptive association relationships for the structured data and generating association relationship data;
Step 6: using the trend features, unstructured features and association relationship data as inputs to a pre-built multi-level fusion data value evaluation model to generate a dynamic-weight data asset effectiveness evaluation.
2. The intelligent data management method based on multi-source data collection according to claim 1, characterized in that acquiring multi-source data from multiple data sources and generating an original data set with credibility marks comprises the following steps:
Step 11: collecting original multi-source data from multiple data sources through a distributed data collection network composed of several types of data collection units;
Step 12: using a data preprocessing unit with an edge computing architecture to perform standardization and credibility assessment on the collected original multi-source data, forming an original data set with credibility marks.
3. The intelligent data management method based on multi-source data collection according to claim 2, characterized in that classifying the data types of the data records in the original data set using the data type identification method based on a combination of rules and a classifier comprises the following steps:
Step 21: constructing a data type feature extractor comprising structure feature extraction rules, time feature extraction rules and content feature extraction rules, and extracting from the original data set the key feature vectors required for data type determination;
Step 22: based on the extracted key feature vectors, applying a decision tree classifier to classify the data types of the data records and generate classification results; the classification results include time series data, unstructured data and structured data.
4. The intelligent data management method based on multi-source data collection according to claim 3, characterized in that applying the segmented multi-model time series feature extraction framework to extract trend features from the time series data comprises the following steps:
Step 31: constructing a segmented multi-model time series feature extraction framework comprising an adaptive segmentation module, a multi-model feature extraction module and a feature fusion module;
Step 32: using the adaptive segmentation module, with a similarity measure based on dynamic time warping, to divide the continuous time series data into data segments with similar statistical characteristics; the adaptive segmentation module dynamically determines the data segment boundaries through a sliding window algorithm, and the window size is automatically adjusted according to the changing characteristics of the time series data;
Step 33: using the multi-model feature extraction module, which comprises three feature extraction channels (a statistical feature extraction channel, a frequency domain feature extraction channel and a shape feature extraction channel), to apply the corresponding feature extraction method to each data segment in parallel and extract single-dimension feature vectors of the time series data from different dimensions;
Step 34: using the feature fusion module to fuse the single-dimension feature vectors of the different dimensions and generate trend features that comprehensively characterize the trend of the time series data.
5. The intelligent data management method based on multi-source data collection according to claim 4, characterized in that extracting unstructured features from the unstructured data using the feature extraction model based on a multi-layer perceptron comprises the following steps:
Step 41: applying a feature extractor based on statistical methods to extract initial unstructured features from the unstructured data;
Step 42: designing a multi-layer perceptron model that consists of an input layer, several hidden layers and an output layer and adopts a fully connected neural network structure, and training the multi-layer perceptron model to map the initial unstructured features to an unstructured feature space;
Step 43: applying the trained multi-layer perceptron model to extract features from the unstructured data and generate unstructured features in a unified format.
6. The intelligent data management method based on multi-source data collection according to claim 5, characterized in that the multi-layer perceptron model consists of an input layer, multiple hidden layers and an output layer and adopts a fully connected neural network structure; the number of neurons in the input layer is equal to the dimension of the initial features; the hidden layers adopt a multi-layer structure in which the number of neurons decreases layer by layer.
7. The intelligent data management method based on multi-source data collection according to claim 6, characterized in that using knowledge graph technology to establish adaptive association relationships for the structured data and generating association relationship data comprises the following steps:
Step 51: constructing a domain ontology model and key entity identification rules, and extracting key entities and their attributes from the structured data;
Step 52: based on the key entities and attributes extracted in Step 51, applying relationship mapping rules to construct a structural knowledge graph;
Step 53: based on the structural knowledge graph, extracting the hierarchical association relationships of the structured data and generating application-oriented association relationship data; the hierarchical association relationships comprise three levels: direct association between key entities, path association and group association.
8. The intelligent data management method based on multi-source data collection according to claim 7, characterized in that the pre-construction for generating the dynamic-weight data asset effectiveness evaluation comprises the following steps:
Step 61: constructing a multi-level data value evaluation model architecture comprising a value feature extraction layer, a feature fusion layer and an effectiveness evaluation layer;
Step 62: using the value feature extraction layer, which comprises three parallel feature processing channels (a trend feature processing channel, an unstructured feature processing channel and an association relationship processing channel), to extract value indicators from data of different data types;
Step 63: using a feature fusion layer based on a deep neural network to perform adaptive weighted fusion of the multi-dimensional value indicators from the value feature extraction layer and generate a comprehensive feature representation;
Step 64: based on the comprehensive feature representation, simultaneously predicting the business value, technical value and innovation value of the data through a multi-task learning framework, and fusing the value evaluation results of these three dimensions into the final data asset effectiveness evaluation score.
9. The intelligent data management method based on multi-source data collection according to claim 8, characterized in that the trend feature processing channel receives the trend features extracted by the segmented multi-model time series feature extraction framework and extracts the value information they contain through a convolutional neural network; the unstructured feature processing channel receives the unstructured features extracted by the feature extraction model based on a multi-layer perceptron and identifies the key information points and implicit knowledge they contain through an attention mechanism; the association relationship processing channel receives the association relationship data generated by the knowledge graph technology and extracts structural value indicators such as the complexity, centrality and connection patterns of the relationships between entities through a graph neural network.
10. An intelligent data management system based on multi-source data collection, used to implement the intelligent data management method based on multi-source data collection according to any one of claims 1 to 9, characterized in that it comprises an original data collection module, a data classification module, a multi-type feature extraction module and an asset effectiveness evaluation module, wherein the modules are electrically connected;
the original data collection module acquires multi-source data from multiple data sources, generates an original data set with credibility marks, and sends the original data set to the data classification module;
the data classification module classifies the data types of the data records in the original data set using a data type identification method based on a combination of rules and a decision tree classifier, divides the records into time series data, unstructured data and structured data, and sends the time series data, unstructured data and structured data to the multi-type feature extraction module;
the multi-type feature extraction module applies a segmented multi-model time series feature extraction framework to extract trend features from the time series data, extracts unstructured features from the unstructured data using a feature extraction model based on a multi-layer perceptron, uses knowledge graph technology to establish adaptive association relationships for the structured data and generate association relationship data, and sends the trend features, unstructured features and association relationship data to the asset effectiveness evaluation module;
the asset effectiveness evaluation module uses the trend features, unstructured features and association relationship data as input to a pre-built multi-level fusion data value evaluation model to generate a dynamic-weight data asset effectiveness evaluation.
CN202510743370.2A | 2025-06-05 | 2025-06-05 | An intelligent data management system and method based on multi-source data acquisition | Active | CN120257113B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202510743370.2A (CN120257113B (en)) | 2025-06-05 | 2025-06-05 | An intelligent data management system and method based on multi-source data acquisition

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202510743370.2A (CN120257113B (en)) | 2025-06-05 | 2025-06-05 | An intelligent data management system and method based on multi-source data acquisition

Publications (2)

Publication Number | Publication Date
CN120257113A (en) | 2025-07-04
CN120257113B (en) | 2025-09-12

Family

ID=96185808

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202510743370.2A | Active | CN120257113B (en) | 2025-06-05 | 2025-06-05 | An intelligent data management system and method based on multi-source data acquisition

Country Status (1)

Country | Link
CN (1) | CN120257113B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN120578787A (en) * | 2025-08-01 | 2025-09-02 | 上海龙田数码科技有限公司 | Heterogeneous data monitoring method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116595355A (en) * | 2023-06-07 | 2023-08-15 | 国网安徽省电力有限公司电力科学研究院 | Grid data standardization reconstruction method, system, equipment and storage medium
CN119476499A (en) * | 2025-01-08 | 2025-02-18 | 北京科杰科技有限公司 | Data-driven cross-domain intelligent asset knowledge reasoning and value assessment method and system
CN119494463A (en) * | 2024-09-30 | 2025-02-21 | 南京莱斯网信技术研究院有限公司 | A method for monitoring urban operation indicators based on multi-source heterogeneity
CN119719377A (en) * | 2024-04-23 | 2025-03-28 | 中国人民解放军新疆军区参谋部第二部 | Intelligence collaborative processing platform and potential relationship prediction method based on knowledge graph
CN119831575A (en) * | 2024-12-27 | 2025-04-15 | 湖北兴瑞硅材料有限公司 | Pipeline weld joint management system and management method thereof
CN120015357A (en) * | 2025-04-21 | 2025-05-16 | 广东康软科技股份有限公司 | Health management data mining method and system based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YU LI et al.: "Attention-Based Learning for Predicting Drug-Drug Interactions in Knowledge Graph Embedding Based on Multisource Fusion Information", International Journal of Intelligent Systems, vol. 2024, 2 March 2024 (2024-03-02) *

Also Published As

Publication number | Publication date
CN120257113B (en) | 2025-09-12

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
