Disclosure of Invention
The invention provides a health data processing method and system based on medical big data to solve at least one technical problem.
The application provides a health data processing method based on medical big data, comprising the following steps:
Step S1, acquiring medical basic data through an electronic medical record system, and preprocessing the medical basic data to obtain medical preprocessing data;
Step S2, performing medical data analysis on the medical preprocessing data to obtain medical analysis data, and using the medical analysis data to perform multi-modal data fusion on the medical preprocessing data to obtain medical multi-modal fusion data;
Step S3, performing medical feature extraction on the medical multi-modal fusion data to obtain medical multi-modal feature data;
Step S4, performing associated feature mining on the medical multi-modal feature data to obtain medical associated feature mining data;
Step S5, constructing a time series prediction model from the medical associated feature mining data to obtain a medical health time series prediction data model;
Step S6, performing optimized medical resource allocation according to the medical health time series prediction data model to obtain time series medical resource allocation data.
Through multi-modal data fusion and feature extraction, the invention effectively processes medical basic data from an electronic medical record system and improves the accuracy of data processing. By constructing a time series prediction model and performing associated feature mining, the invention can provide doctors and medical institutions with a decision support tool that helps them evaluate and predict the use of medical resources more accurately on the basis of big data, and thus make more targeted medical decisions. It also gives medical institutions a basis for allocating resources more effectively, optimizing service flows, and improving work efficiency. Accurate, optimized allocation of medical resources can improve both the quality of medical services and patient satisfaction.
Preferably, step S1 is specifically:
Step S11, performing medical data acquisition from the electronic medical record system to obtain medical basic data;
Step S12, performing medical data cleaning on the medical basic data to obtain medical cleaning data;
Step S13, performing medical data desensitization on the medical cleaning data to obtain medical desensitization data;
Step S14, performing medical data quality detection on the medical desensitization data to obtain medical quality report data;
Step S15, performing medical data standardization on the medical quality report data to obtain medical standardized data;
Step S16, performing medical data transformation on the medical standardized data to obtain medical transformation data;
Step S17, performing medical data aggregation on the medical transformation data to obtain medical aggregate data;
Step S18, performing medical data storage conversion on the medical aggregate data to obtain medical preprocessing data.
Through the data cleaning of step S12 and the quality detection of step S14, the invention helps eliminate inaccurate, inconsistent, or missing data, thereby improving the accuracy of subsequent data processing and analysis. The desensitization of step S13 protects patient privacy and sensitive information and helps comply with the ethical and legal requirements for medical data. Steps S15 and S16 convert the data into a unified standard form and apply the necessary data transformations, laying a foundation for subsequent data analysis and mining. The data aggregation of step S17 integrates multi-source data, which improves the availability of the data and the completeness of the information. The data storage conversion of step S18 supports efficient and secure storage and retrieval of the data.
Preferably, the medical data cleansing is performed by a medical data cleansing calculation formula, wherein the medical data cleansing calculation formula is specifically:
In the formula, y is the medical cleaning data, x is the medical basic data, a is the change rate of the medical basic data, n is the total number of data items in the medical basic data, b is the offset of the medical basic data, c is the stability parameter of the medical basic data, d is the noise term of the medical basic data, e is the fluctuation amplitude term of the medical basic data, f is the periodicity parameter of the medical basic data, g is the nonlinearity degree term of the medical basic data, and h is the upper limit value of the medical basic data.
The invention constructs a medical data cleaning calculation formula that preprocesses and cleans the original medical data: it eliminates noise, corrects deviation, stabilizes the data distribution, and retains or extracts important features to meet the needs of subsequent data analysis and modeling. Parameter b corresponds to the offset of the data; it adjusts the starting point of the data during cleaning and removes offsets introduced by data acquisition or other factors, thereby improving the accuracy of data processing. Parameters c and d correspond to the stability and noise terms of the data; they correct original data containing random disturbance or other unstable factors and improve the stability of the data. Parameters e and f correspond to the fluctuation amplitude and periodicity of the data; for strongly periodic medical data such as heart rate and blood pressure, they allow the periodic characteristics to be extracted and retained. Parameter g corresponds to the degree of nonlinearity of the data; nonlinear data can be transformed so that it better matches the assumptions of the subsequent model. Parameter h is the upper limit value of the data and allows abnormal data beyond the normal range to be handled. The parameter values are adjusted according to the specific data characteristics and problem requirements to achieve the best cleaning effect.
Preferably, step S2 is specifically:
Step S21, performing medical data parsing on the medical preprocessing data to obtain medical parsing data;
Step S22, performing preliminary medical data analysis on the medical parsing data to obtain medical analysis data;
Step S23, using the medical analysis data to perform medical data mapping on the medical parsing data to obtain medical mapping data;
Step S24, performing multi-modal fusion data conversion on the medical mapping data to obtain medical conversion data;
Step S25, performing medical data integration on the medical conversion data to obtain medical integrated data;
Step S26, performing medical data fusion on the medical integrated data to obtain medical fusion data;
Step S27, performing medical data reorganization on the medical fusion data to obtain medical reorganization data;
Step S28, performing medical data format conversion on the medical reorganization data to obtain medical multi-modal fusion data.
By parsing and preliminarily analyzing the medical preprocessing data, the invention extracts from the original data the information that is useful for subsequent processing and decision-making, and improves the comprehensiveness of the data. Medical data mapping and format conversion turn the data into a form better suited to subsequent processing and analysis and enhance its interoperability. Medical data integration and fusion unify data from different sources or in different formats, which improves the consistency of the data and reduces the complexity of data processing. Multi-modal fusion data conversion integrates multiple types of data and provides a more comprehensive view of the data. Medical data reorganization optimizes the data structure so that it is better suited to subsequent data analysis and machine learning tasks, which improves data processing efficiency.
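By way of non-limiting illustration, the alignment and merging of two modalities may be sketched as follows. The sketch assumes that alignment is done by patient identifier and nearest timestamp; the field names (`patient_id`, `t`, `text`), the function `fuse_records`, and the one-hour tolerance are hypothetical and are not specified by the method.

```python
from collections import defaultdict

def fuse_records(structured, notes, tolerance_s=3600):
    """Align two modalities by patient ID and nearest timestamp, then merge them."""
    notes_by_patient = defaultdict(list)
    for note in notes:
        notes_by_patient[note["patient_id"]].append(note)

    fused = []
    for row in structured:
        candidates = notes_by_patient.get(row["patient_id"], [])
        # Pick the note closest in time to the structured measurement, if any.
        best = min(candidates, key=lambda n: abs(n["t"] - row["t"]), default=None)
        merged = dict(row)
        if best is not None and abs(best["t"] - row["t"]) <= tolerance_s:
            merged["note_text"] = best["text"]
        else:
            merged["note_text"] = None  # no note close enough in time
        fused.append(merged)
    return fused

# Illustrative data only: timestamps are seconds on an arbitrary clock.
vitals = [
    {"patient_id": "p1", "t": 1000, "heart_rate": 72},
    {"patient_id": "p2", "t": 5000, "heart_rate": 95},
]
notes = [
    {"patient_id": "p1", "t": 1200, "text": "stable"},
    {"patient_id": "p2", "t": 90000, "text": "follow-up"},
]
fused = fuse_records(vitals, notes)
```

In this example the first record receives its note, while the second does not because its only note lies outside the assumed time tolerance.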
Preferably, step S22 is specifically:
Step S221, performing preliminary examination on the medical parsing data to obtain medical preliminary examination data;
Step S222, performing statistical summary processing on the medical preliminary examination data to obtain medical statistical data;
Step S223, performing medical data exploration on the medical statistical data to obtain medical exploration data;
Step S224, performing hypothesis testing on the medical exploration data to obtain medical hypothesis testing data;
Step S225, performing medical data modeling on the medical hypothesis testing data to construct a medical data verification model;
Step S226, performing result interpretation on the medical data verification model to obtain medical analysis data.
The invention divides the medical data processing process into clear sub-steps, which not only helps people understand and track the whole data processing process, but also facilitates the searching and solving of various problems in the data processing process. In step S223, the characteristics and structure of the data can be understood more deeply through a comprehensive exploration of the medical statistics, which is extremely important for subsequent hypothesis testing and model construction. In step S224, by performing hypothesis testing on the medical exploration data, the quality of the data and the reliability of the model can be ensured, and the accuracy of the subsequent data analysis can be improved. In step S225, by establishing the medical data verification model, the characteristics and structure of the data can be visualized, the data can be more intuitively understood and interpreted, and the basis can be provided for subsequent prediction and decision. In step S226, the results of the model are interpreted, which not only allows non-professionals to understand and use the results of the model, but also facilitates the assessment and improvement of the model by professionals.
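As a non-limiting illustration of the statistical summary and hypothesis testing sub-steps, the following sketch summarizes two groups of measurements and computes Welch's t statistic. The choice of Welch's test, the sample values, and the critical value are assumptions made for the example; the method itself does not prescribe a particular test.

```python
import math
import statistics

def summarize(values):
    """Statistical summary of one variable (count, mean, sample standard deviation)."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
    }

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Illustrative systolic blood pressure readings for two hypothetical groups.
group_a = [120.0, 125.0, 130.0, 128.0, 122.0]
group_b = [140.0, 138.0, 145.0, 150.0, 142.0]

summary_a = summarize(group_a)
t_stat, dof = welch_t(group_a, group_b)

# Assumed two-sided critical value at the 5% level for about 7 degrees of freedom.
CRITICAL_T = 2.365
groups_differ = abs(t_stat) > CRITICAL_T
```

The boolean `groups_differ` stands in for the hypothesis testing data passed on to the modeling sub-step.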
Preferably, in step S23, the medical data mapping is performed by a medical data mapping calculation formula, wherein the medical data mapping calculation formula is specifically:
R is medical mapping data, x1 is medical age data, x2 is medical blood pressure data, x3 is medical blood glucose data, x4 is medical cholesterol data, b is a blood glucose level index adaptively adjusted according to x1, o is a medical mapping constant term, x5 is medical weight data, x6 is medical heart rate data, x7 is medical vital capacity data, and x8 is medical body temperature data.
The invention constructs a medical data mapping calculation formula that converts multiple kinds of medical data (such as age, blood pressure, blood glucose, and cholesterol) into unified mapping data R. By taking a number of important medical parameters into account, the formula provides a comprehensive assessment of a patient's health, which helps identify possible health risks and intervene in time. Because the formula maps multiple medical parameters to a single value, it simplifies data processing and analysis and makes the data easier to visualize and interpret. Through parameter b, the formula is adaptive to a degree: the blood glucose level index is adjusted automatically according to the age data x1. The terms log_o(x1) and sin(x2) are nonlinear transformations that help capture nonlinear relationships that may exist in the data. The term tan^-1(x6) transforms the heart rate data, which helps amplify or attenuate the influence of certain parameters on the result to suit different medical scenarios. The term in x8 can emphasize the effect of body temperature on health status, especially when body temperature is abnormal. By taking these human body parameters into account, the calculation formula provides a more accurate and reliable way of mapping medical data.
Preferably, step S3 is specifically:
Step S31, performing medical feature extraction on the medical multi-modal fusion data to obtain preliminary medical multi-modal feature data;
Step S32, performing medical feature selection on the preliminary medical multi-modal feature data to obtain medical multi-modal selection feature data;
Step S33, performing medical feature construction on the medical multi-modal selection feature data to obtain medical multi-modal construction feature data;
Step S34, performing medical feature conversion on the medical multi-modal construction feature data to obtain medical multi-modal conversion feature data;
Step S35, performing medical feature extraction on the medical multi-modal conversion feature data to obtain medical multi-modal extraction feature data;
Step S36, performing medical feature verification and optimization on the medical multi-modal extraction feature data to obtain the medical multi-modal feature data.
By extracting and constructing features from the original medical multi-modal fusion data, the invention can generate new features that may carry information not evident in the original data. Such information can help improve the depth and accuracy of data analysis. Feature selection and feature extraction remove irrelevant or redundant features and reduce the dimensionality of the data, which lowers the complexity and computational burden of subsequent data analysis and model training and improves computational efficiency. Feature verification and optimization check and improve the representativeness and generalization ability of the selected features, thereby improving the stability and prediction accuracy of the data model built afterward.
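The removal of irrelevant or redundant features described above may, as one non-limiting possibility, be sketched as a variance filter followed by a correlation filter. The thresholds, the column names, and the function `select_features` are assumptions for illustration; the method does not fix a particular selection criterion.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(columns, min_variance=1e-6, max_corr=0.95):
    """Drop near-constant columns, then drop columns redundant with one already kept."""
    kept = []
    for name, values in columns.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance < min_variance:
            continue  # irrelevant: carries no information
        if any(abs(pearson(values, columns[k])) > max_corr for k in kept):
            continue  # redundant: almost a copy of a kept feature
        kept.append(name)
    return kept

# Illustrative columns: the second is the first in other units, the third is constant.
columns = {
    "systolic_bp": [120, 135, 150, 128, 142],
    "systolic_bp_kpa": [16.0, 18.0, 20.0, 17.07, 18.93],
    "site_code": [1, 1, 1, 1, 1],
    "glucose": [5.1, 6.8, 5.0, 7.4, 5.5],
}
selected = select_features(columns)
```

Because the filter is order-dependent, the first of two highly correlated columns is the one retained.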
Preferably, step S4 is specifically:
Step S41, performing association rule data format conversion on the medical multi-modal feature data to obtain medical multi-modal feature preparation data;
Step S42, performing frequency scanning on the medical multi-modal feature preparation data to obtain feature frequency data;
Step S43, arranging the feature frequency data in descending order to construct feature FP-tree data;
Step S44, performing conditional FP-tree generation on the feature FP-tree data to obtain feature conditional FP-tree data;
Step S45, performing frequent itemset mining on the feature conditional FP-tree data to obtain frequent itemset data;
Step S46, performing association rule generation on the frequent itemset data to obtain medical associated feature mining data.
Through frequent itemset mining and association rule generation, the invention can reveal hidden patterns or relationships in the data that may not be apparent, or may be overlooked, in the original data. These relationships can further be used for diagnosis, treatment decisions, or understanding the underlying mechanisms of disease. The FP-tree (frequent pattern tree) data structure stores the data set compactly and allows large data sets to be processed more efficiently than with conventional association rule mining algorithms such as Apriori. Association rule mining can identify feature combinations whose members may not be relevant individually but are meaningful when they occur together, which can improve the feature selection process and benefits subsequent data analysis or model training. By understanding and applying the discovered association rules, for example to predict disease progression, treatment plans that better support medical resource optimization can be formulated.
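Steps S42 to S46 correspond closely to the FP-growth procedure, which may be sketched compactly as follows. This is a minimal illustration under assumed inputs: the item names, the minimum support count, and the confidence threshold are hypothetical, and a production implementation would differ.

```python
from collections import Counter, defaultdict
from itertools import combinations

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(transactions, min_count):
    """Frequency scan, descending-order sort, and FP-tree construction."""
    freq = Counter()
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, freq

def fp_growth(transactions, min_count, suffix=frozenset()):
    """Yield (itemset, support count) for every frequent itemset."""
    header, freq = build_tree(transactions, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, freq[item]
        # Conditional pattern base: the prefix path of every node holding `item`.
        conditional = []
        for node in nodes:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Recursing builds the conditional FP-tree for this itemset.
        yield from fp_growth(conditional, min_count, itemset)

def association_rules(support, min_confidence):
    """Generate rules antecedent -> consequent from frequent itemset supports."""
    rules = []
    for itemset, count in support.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                confidence = count / support[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Illustrative feature sets, one per patient record.
records = [
    {"hypertension", "diabetes", "high_ldl"},
    {"hypertension", "diabetes"},
    {"hypertension", "high_ldl"},
    {"diabetes", "high_ldl"},
    {"hypertension", "diabetes", "high_ldl"},
]
itemsets = dict(fp_growth([(r, 1) for r in records], min_count=3))
rules = association_rules(itemsets, min_confidence=0.7)
```

With a minimum support count of 3, the three single features and the three pairs are frequent, while the triple (support 2) is pruned.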
Preferably, step S5 is specifically:
Step S51, performing time series data labeling on the medical associated feature mining data to obtain time series medical data;
Step S52, performing data characteristic analysis on the time series medical data to obtain medical data characteristic analysis data;
Step S53, when the medical data characteristic analysis data are linear time characteristic analysis data, performing first time series prediction model construction on the time series medical data to obtain the medical health time series prediction data model;
Step S54, when the medical data characteristic analysis data are nonlinear time characteristic analysis data, performing second time series prediction model construction on the time series medical data to obtain the medical health time series prediction data model, wherein the first time series prediction model construction and the second time series prediction model construction use different time series prediction model construction methods.
The invention constructs a linear or a nonlinear time series prediction model according to the characteristics of the data, and this strategy of dynamically selecting the prediction model can improve prediction accuracy. The time series prediction model processes medical time series data and predicts future health conditions, which facilitates early intervention and helps prevent disease progression. Predictions on time series medical data also make it possible to provide personalized medical services for each patient; for example, a more effective treatment plan or preventive measures can be drawn up in advance based on the predicted outcome. By predicting future health conditions, healthcare providers can allocate resources more efficiently, for example by scheduling operating rooms or nurses in advance.
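The branch between steps S53 and S54 may be illustrated by the following sketch. It assumes, purely for the example, that the characteristic analysis is the R² of a first-order autoregressive fit, that the first (linear) model is that AR(1) fit, and that the second (nonlinear) model is a nearest-neighbour forecaster; none of these choices, nor the 0.8 threshold, is prescribed by the method.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b, r_squared).

    Assumes the series is not constant.
    """
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
    ss_tot = sum((v - my) ** 2 for v in y)
    return a, b, 1.0 - ss_res / ss_tot

def nearest_neighbour_forecast(series, k=3):
    """Average the successors of the k past values closest to the latest value."""
    last = series[-1]
    nearest = sorted(range(len(series) - 1),
                     key=lambda i: abs(series[i] - last))[:k]
    return sum(series[i + 1] for i in nearest) / len(nearest)

def build_and_forecast(series, linear_threshold=0.8):
    """Choose the model from the characteristic analysis and forecast one step."""
    a, b, r2 = fit_ar1(series)
    if r2 >= linear_threshold:
        return "linear", a + b * series[-1]
    return "nonlinear", nearest_neighbour_forecast(series)

# A steadily rising series is well described by the linear model.
trend = [2.0 * t + 1.0 for t in range(20)]
kind_trend, next_trend = build_and_forecast(trend)

# A logistic-map series is deterministic but poorly described by a linear fit.
value, chaotic = 0.2, []
for _ in range(60):
    chaotic.append(value)
    value = 4.0 * value * (1.0 - value)
kind_chaotic, next_chaotic = build_and_forecast(chaotic)
```

The design point is only that model construction is dispatched on a measured property of the data; any linearity test and any pair of model families could be substituted.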
Preferably, a health data processing system based on medical big data comprises:
a medical data preprocessing module, configured to acquire medical basic data through an electronic medical record system and preprocess the medical basic data to obtain medical preprocessing data;
a multi-modal data fusion module, configured to perform medical data analysis on the medical preprocessing data to obtain medical analysis data, and to use the medical analysis data to perform multi-modal data fusion on the medical preprocessing data to obtain medical multi-modal fusion data;
a medical feature extraction module, configured to perform medical feature extraction on the medical multi-modal fusion data to obtain medical multi-modal feature data;
an associated feature mining module, configured to perform associated feature mining on the medical multi-modal feature data to obtain medical associated feature mining data;
a time series prediction model construction module, configured to construct a time series prediction model from the medical associated feature mining data to obtain a medical health time series prediction data model;
a health resource allocation module, configured to perform optimized medical resource allocation according to the medical health time series prediction data model to obtain time series medical resource allocation data.
The beneficial effects of the invention are as follows. By preprocessing and analyzing the medical basic data, the method extracts the key information in the data and converts it into a format suitable for multi-modal data fusion. This not only improves the usability of the data but also prepares the ground for subsequent data analysis and model construction. Multi-modal data fusion gives the processing a more comprehensive grasp of multiple aspects of the patient's information, which makes subsequent feature extraction and model prediction more comprehensive and accurate. The method then performs in-depth feature extraction, associated feature mining, and time series prediction model construction on the medical data, allowing the patient's health to be understood and predicted in greater depth and with greater accuracy. Medical professionals can thus recognize high-risk medical conditions earlier and intervene accordingly, avoiding or alleviating shortages of medical resources. Because the method operates on medical big data, it can exploit the potential of big data to provide more accurate and personalized health predictions and to support medical health management.
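The chaining of the six modules may be pictured, as a non-limiting sketch, by a pipeline in which each module consumes the output of the previous one. The class `HealthDataPipeline` and the lambda bodies are hypothetical placeholders and do not implement the actual processing of any module.

```python
from typing import Any, Callable, Dict, List, Tuple

class HealthDataPipeline:
    """Runs named modules in order; each module consumes the previous output."""

    def __init__(self) -> None:
        self.modules: List[Tuple[str, Callable[[Any], Any]]] = []

    def add(self, name: str, module: Callable[[Any], Any]) -> "HealthDataPipeline":
        self.modules.append((name, module))
        return self

    def run(self, data: Any) -> Tuple[Any, Dict[str, Any]]:
        outputs: Dict[str, Any] = {}
        for name, module in self.modules:
            data = module(data)
            outputs[name] = data  # keep each intermediate result for inspection
        return data, outputs

# Placeholder modules: each lambda only stands in for the real processing.
pipeline = (
    HealthDataPipeline()
    .add("preprocessing", lambda rows: [r for r in rows if r is not None])
    .add("fusion", lambda rows: [{"value": r} for r in rows])
    .add("feature_extraction", lambda rows: [r["value"] for r in rows])
    .add("association_mining", lambda xs: sorted(set(xs)))
    .add("prediction_model", lambda xs: sum(xs) / len(xs))
    .add("resource_allocation", lambda mean: {"beds": round(mean)})
)
result, outputs = pipeline.run([3, None, 5, 5, 10])
```

Keeping the intermediate outputs makes it possible to inspect what each module produced when a downstream result looks wrong.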
Detailed Description
The technical solution of the present patent is described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated description of them is therefore omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. The functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to FIG. 1 to FIG. 7, the present application provides a health data processing method based on medical big data, comprising the following steps:
Step S1, acquiring medical basic data through an electronic medical record system, and preprocessing the medical basic data to obtain medical preprocessing data;
Specifically, for example, the electronic medical record system collects medical basic data such as patient medical history, physiological indicators, diagnostic results, and treatment plans. Preprocessing of the medical data includes cleaning (removing invalid, duplicate, and erroneous data), normalization (converting data to the same scale), and filling in missing values (by interpolation or prediction), to obtain the medical preprocessing data.
Step S2, performing medical data analysis on the medical preprocessing data to obtain medical analysis data, and using the medical analysis data to perform multi-modal data fusion on the medical preprocessing data to obtain medical multi-modal fusion data;
Specifically, for example, the medical preprocessing data is parsed: key information is extracted from structured data, and unstructured data (such as doctors' notes) is converted into structured data. The parsed data and the preprocessing data are then subjected to multi-modal data fusion. Data fusion may include data alignment (ensuring that the time series of the source data are consistent) and data merging (integrating the source data together).
Step S3, performing medical feature extraction on the medical multi-modal fusion data to obtain medical multi-modal feature data;
Specifically, for example, medical feature extraction is performed using a machine learning algorithm such as principal component analysis (PCA) or linear discriminant analysis (LDA), with the aim of reducing the data dimensionality while retaining the most important feature information, thereby obtaining the medical multi-modal feature data.
Step S4, performing associated feature mining on the medical multi-modal feature data to obtain medical associated feature mining data;
Specifically, for example, the medical multi-modal feature data is analyzed using association rule mining (such as the Apriori or FP-growth algorithm) to find association relations among different features, thereby obtaining the medical associated feature mining data.
Step S5, constructing a time series prediction model from the medical associated feature mining data to obtain a medical health time series prediction data model;
Specifically, for example, a time series prediction model, such as an autoregressive integrated moving average (ARIMA) model or a recurrent neural network (RNN), is constructed from the medical associated feature mining data to obtain the medical health time series prediction data model.
Step S6, performing optimized medical resource allocation according to the medical health time series prediction data model to obtain time series medical resource allocation data.
Specifically, for example, the medical health time series prediction data model is used to predict risk levels: a threshold is set, and the patient's health risk level (such as low, medium, or high risk) is determined from the model's prediction result, thereby obtaining health detection risk level data. Medical resources are then allocated according to the predicted risk levels, which helps address the potential medical problems caused by insufficient or excessive allocation of medical resources.
Specifically, for example, the optimized allocation may cover the following:
Hospital bed optimization: based on historical bed usage data, the prediction model is used to predict bed demand at a particular future time. When a peak is predicted, beds can be temporarily added, or idle beds in other areas can be scheduled in advance, to meet the demand. Conversely, when a trough is predicted, unused beds can be reassigned to where they are needed more, or taken out of service for maintenance and cleaning.
Medical staff scheduling: medical staff are scheduled in advance according to the number of patients predicted by the prediction model. When a large number of patients is predicted, more medical staff can be scheduled on duty in advance. Conversely, when fewer patients are predicted, staffing can be reduced appropriately, or staff can be given time for training and rest.
Medical equipment management: future equipment demand is predicted by the prediction model, and medical equipment is allocated accordingly. For example, when high demand is predicted, more equipment can be scheduled for maintenance and servicing ahead of time so that it works properly when needed. Conversely, when lower demand is predicted, equipment can be upgraded or cleaned.
Drug and material management: the prediction model can help predict future demand for drugs and medical materials. By optimizing their procurement and distribution, expiration and waste of drugs can be reduced while an adequate supply of drugs and materials is ensured at critical times.
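The bed and staffing cases above may be sketched, as a non-limiting example, by turning a per-period patient forecast into a plan. The function `plan_resources`, the 10% bed reserve, and the ratio of six patients per nurse are assumptions chosen for illustration only.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)

def plan_resources(forecast_patients, beds_available,
                   patients_per_nurse=6, reserve_pct=10):
    """Size beds and nurses for each period from the forecast patient count."""
    plan = []
    for period, patients in enumerate(forecast_patients):
        beds_needed = ceil_div(patients * (100 + reserve_pct), 100)
        plan.append({
            "period": period,
            "beds_needed": beds_needed,
            # Peak: beds to add temporarily or borrow from other areas.
            "beds_to_add": max(0, beds_needed - beds_available),
            # Trough: beds that can be released for maintenance and cleaning.
            "beds_free": max(0, beds_available - beds_needed),
            "nurses_on_duty": ceil_div(patients, patients_per_nurse),
        })
    return plan

# Hypothetical forecast for three periods with 100 beds currently available.
plan = plan_resources([80, 120, 45], beds_available=100)
```

In the second period the plan reports a shortfall of 32 beds, while in the third period 50 beds are free for maintenance.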
Through multi-modal data fusion and feature extraction, the invention effectively processes medical basic data from an electronic medical record system and improves the accuracy of data processing. By constructing a time series prediction model and performing associated feature mining, the invention can efficiently predict the medical health state and provide a prediction of an individual's health risk level, which helps detect and prevent health risks in time. The invention can provide doctors and medical institutions with a decision support tool that helps them evaluate and predict the use of medical resources more accurately on the basis of big data, and thus make more targeted medical decisions. It also gives medical institutions a basis for allocating resources more effectively, optimizing service flows, and improving work efficiency. Accurate, optimized allocation of medical resources can improve both the quality of medical services and patient satisfaction.
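The threshold-based determination of the health risk level described for step S6 may be sketched as follows. The cut-off values 0.3 and 0.7, the score scale from 0 to 1, and the patient identifiers are assumptions for illustration and are not values given by the method.

```python
from collections import Counter

def risk_level(score, medium_from=0.3, high_from=0.7):
    """Map a predicted risk score in [0, 1] to a low, medium, or high level."""
    if score >= high_from:
        return "high"
    if score >= medium_from:
        return "medium"
    return "low"

# Hypothetical model outputs keyed by pseudonymous patient identifier.
predicted = {"p1": 0.12, "p2": 0.45, "p3": 0.91, "p4": 0.70}
levels = {pid: risk_level(score) for pid, score in predicted.items()}

# Counts per level can feed the resource allocation step.
counts = Counter(levels.values())
```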
Preferably, step S1 is specifically:
Step S11, medical data acquisition is carried out from the electronic medical record system, so that medical basic data are acquired;
in particular, medical data acquisition is performed, for example, from an electronic medical records system. The data types may include structured tabular data (e.g., patient age, sex, physiological index), and unstructured data (e.g., doctor's notes, visual diagnostic report).
Step S12, medical data cleaning is carried out on the medical basic data, so that medical cleaning data are obtained;
In particular, medical data cleansing is performed, for example, which includes removing invalid, duplicate and erroneous data, and filtering out extraneous information.
Step S13, medical data desensitization is carried out on the medical cleaning data, so that medical desensitization data are obtained;
In particular, medical data desensitization of medical cleaning data, for example, typically involves removing or replacing personal identification information (e.g., name, identification number, etc.) of a patient to preserve privacy of the patient.
Step S14, medical data quality detection is carried out on the medical desensitization data, so that medical quality report data are obtained;
Specifically, for example, medical data quality detection is performed on medical desensitization data. This may include checking the integrity of the data (whether the data is missing), consistency (whether the data is contradictory) and accuracy (whether the data is correct).
Step S15, medical data standardization is carried out on medical quality report data so as to obtain medical standardization data;
In particular, for example, medical data normalization of medical quality report data may involve converting the data to the same scale or range, e.g., normalizing all numeric data to a 0-1 range.
S16, medical data transformation is carried out on the medical standardized data, so that medical transformation data are obtained;
In particular, medical data transformations are performed on medical standardized data, such as converting a non-linear relationship to a linear relationship, or other transformations that facilitate subsequent analysis, for example.
Step S17, medical data aggregation is carried out on the medical transformation data, so that medical aggregation data are obtained;
Specifically, medical data aggregation is performed on medical transformation data, for example. This may involve categorizing and summarizing the data according to certain rules (e.g., time, place, patient population, etc.).
Step S18, performing medical data storage conversion on the medical aggregate data so as to acquire medical pretreatment data.
Specifically, medical data storage conversion of the medical aggregate data may include, for example, storing the data in a format convenient for query and retrieval or uploading the data to a cloud data warehouse, thereby obtaining the medical pretreatment data.
Through steps S12 and S14, the invention cleans the medical data and checks its quality, which helps eliminate inaccurate, inconsistent, or missing data and thereby improves the accuracy of subsequent data processing and analysis. Step S13 desensitizes the data, which protects the patient's privacy and sensitive information and helps meet the ethical and legal requirements that apply to medical data. Steps S15 and S16 convert the data into a unified standard form and apply the necessary data transformations, laying a solid foundation for subsequent data analysis and mining. The data aggregation of step S17 integrates multi-source data, which improves the availability of the data and the completeness of the information. The data storage conversion of step S18 allows the data to be stored and retrieved efficiently and securely.
Preferably, the medical data cleansing is performed by a medical data cleansing calculation formula, wherein the medical data cleansing calculation formula is specifically:
Here, y is the medical cleaning data, x is the medical basic data, a is the change rate of the medical basic data, n is the total number of data items in the medical basic data, b is the offset of the medical basic data, c is the stability parameter of the medical basic data, d is the noise term of the medical basic data, e is the fluctuation amplitude term of the medical basic data, f is the periodicity parameter of the medical basic data, g is the nonlinearity degree term of the medical basic data, and h is the upper limit value of the medical basic data.
The invention constructs a medical data cleaning calculation formula that preprocesses and cleans the original medical data: it eliminates noise, corrects deviation, stabilizes the data distribution, and retains or extracts important features to meet the requirements of subsequent data analysis and modeling. The parameter b corresponds to the offset of the data; it adjusts the starting point of the data during cleaning and eliminates offsets introduced by data acquisition or other factors, thereby improving the accuracy of data processing. The parameters c and d correspond to the stability and noise terms of the data; they correct original data that contain random disturbances or other unstable factors and so improve the stability of the data. The parameters e and f correspond to the fluctuation amplitude and periodicity of the data; they are suited to medical data with strong periodicity, such as heart rate and blood pressure, and allow the periodic characteristics of the data to be extracted and retained. The parameter g corresponds to the degree of nonlinearity of the data, according to which the data can be transformed so that they better match the assumptions of the subsequent model. The parameter h represents the upper limit value of the data and allows abnormal data beyond the normal range to be handled. The values of all parameters are adjusted according to the specific data characteristics and problem requirements to achieve the best data cleaning effect.
Preferably, step S2 is specifically:
Step S21, performing medical data analysis on the medical pretreatment data so as to acquire medical analysis data;
Specifically, medical data parsing of the medical pretreatment data may include, for example, identifying key information in text (e.g., diseases, symptoms, medications) and decoding image data.
Step S22, performing preliminary medical data analysis on the medical analysis data so as to acquire medical analysis data;
Specifically, preliminary medical data analysis of the medical analysis data may include, for example, calculating statistical characteristics (e.g., mean, variance) and detecting outliers.
Step S23, using the medical analysis data to perform medical data mapping on the medical analysis data, so that medical mapping data are obtained;
Specifically, medical data mapping using the medical analysis data may include, for example, mapping disease names to uniform codes according to the ICD coding system.
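A simple lookup table is one way to illustrate mapping disease names to ICD codes. In the sketch below, the table `ICD10_LOOKUP` contains only three ICD-10 categories chosen for illustration; a practical system would presumably load a complete, versioned code list rather than a hand-written dictionary.

```python
# Tiny illustrative subset of ICD-10 categories; not a complete code list.
ICD10_LOOKUP = {
    "essential hypertension": "I10",
    "type 2 diabetes mellitus": "E11",
    "asthma": "J45",
}


def map_to_icd10(disease_name, lookup=ICD10_LOOKUP):
    """Normalize a free-text disease name and return its code, or None if unknown."""
    return lookup.get(disease_name.strip().lower())


codes = [map_to_icd10(n) for n in ["Essential Hypertension", " asthma ", "unknown condition"]]
```

Names that are not in the table map to `None`, so unmapped entries can be flagged for manual review instead of being silently guessed.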
Step S24, performing multi-mode fusion data conversion on the medical mapping data so as to acquire medical conversion data;
Specifically, multi-mode fusion data conversion of the medical mapping data may include, for example, converting image data and text data into a unified numerical representation.
Step S25, medical data integration is carried out on the medical conversion data, so that medical integrated data are obtained;
Specifically, medical data integration of the medical conversion data may include, for example, bringing together the data recorded at different time points for the same patient.
Step S26, medical data fusion is carried out on the medical integrated data, so that medical fusion data are obtained;
Specifically, medical data fusion of the medical integrated data may include, for example, fusing data of different sources and different types to obtain more comprehensive patient information.
Step S27, medical data reorganization is carried out on the medical fusion data, so that medical reorganization data are obtained;
Specifically, medical data reorganization of the medical fusion data may include, for example, reorganizing the data into specific time windows for time series analysis.
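Reorganizing data into time windows is often done with a sliding window that pairs a run of past values with a later value to be predicted. The sketch below shows this idea; the window length, the horizon, and the sample series are assumptions for illustration only.

```python
def sliding_windows(series, window, horizon=1):
    """Split a sequence into (inputs, target) pairs for time series analysis."""
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        # The target lies `horizon` steps after the last input value.
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs


daily_admissions = [12, 15, 11, 14, 18]  # made-up counts
pairs = sliding_windows(daily_admissions, window=3)
```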
Step S28, performing medical data format conversion on the medical reorganization data so as to obtain medical multi-mode fusion data.
Specifically, medical data format conversion of the medical reorganization data may include, for example, converting the data into the input format required by a machine learning model.
By parsing and preliminarily analyzing the medical pretreatment data, the invention extracts from the original data the information that is useful for subsequent processing and decision-making, which improves the comprehensiveness of the data. Medical data mapping and format conversion put the data into a form better suited to subsequent processing and analysis and enhance the interoperability of the data. Medical data integration and fusion unify data from different sources or in different formats, which improves the consistency of the data and reduces the complexity of data processing. Multi-mode fusion data conversion brings multiple types of data together and provides a more comprehensive perspective. Medical data reorganization optimizes the data structure so that it is better suited to subsequent data analysis and machine learning tasks, which improves data processing efficiency.
Preferably, step S22 is specifically:
Step S221, performing preliminary examination on the medical analysis data so as to obtain medical preliminary examination data;
Specifically, the preliminary examination of the medical analysis data may, for example, be performed automatically using artificial intelligence or machine learning techniques that identify and flag potential outliers, erroneous information, or missing values.
Step S222, carrying out data statistics summary processing on the medical preliminary examination data so as to obtain medical statistics data;
Specifically, data statistics summary processing of the medical preliminary examination data may include, for example, calculating the mean, variance, median, mode, maximum, and minimum of the data to obtain a rough picture of the basic condition of the data set.
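These summary statistics can be computed with Python's standard `statistics` module, as in the following sketch. The function name `summarize` and the sample values are illustrative; the variance shown is the sample variance, which is one of several reasonable conventions.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize([70, 72, 72, 75, 81])  # made-up heart-rate readings
```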
Step S223, medical data exploration is carried out on the medical statistical data, so that medical exploration data are obtained;
Specifically, medical data exploration of the medical statistical data may include, for example, drawing histograms, box plots, and scatter plots to better understand the distribution and correlation of the data.
Step S224, carrying out hypothesis testing on the medical exploration data so as to obtain medical hypothesis testing data;
Specifically, hypothesis testing on the medical exploration data may use, for example, a t-test, a chi-square test, or analysis of variance to verify whether certain hypotheses about the data hold.
Step S225, medical data modeling is carried out on the medical hypothesis testing data, so that a medical data verification model is constructed;
Specifically, medical data modeling of the medical hypothesis testing data may use, for example, machine learning algorithms such as linear regression, logistic regression, random forests, or neural networks to build a predictive model or classification model from the data.
Step S226, performing result interpretation on the medical data verification model so as to acquire medical analysis data.
Specifically, result interpretation of the medical data verification model may include, for example, analyzing evaluation metrics such as accuracy, recall, and F1 score, or analyzing the feature importance of the model, in order to understand the predictive or classification ability of the model and the influence of each feature on the model.
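For a binary classification model, the evaluation metrics named above follow directly from the counts of true positives, false positives, and false negatives. The sketch below assumes labels encoded as 0 and 1; the function name and the sample labels are illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = classification_metrics([1, 1, 0, 0, 1, 0, 1, 0], [1, 0, 0, 1, 1, 0, 1, 0])
```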
The invention divides the medical data processing into clearly defined sub-steps, which makes the whole process easier to understand and track and makes problems that arise during processing easier to locate and resolve. In step S223, a comprehensive exploration of the medical statistical data gives a deeper understanding of the characteristics and structure of the data, which is important for the subsequent hypothesis testing and model construction. In step S224, hypothesis testing on the medical exploration data helps check the quality of the data and the reliability of the model and improves the accuracy of subsequent data analysis. In step S225, establishing the medical data verification model makes the characteristics and structure of the data explicit, so that the data can be understood and interpreted more intuitively and a basis is provided for subsequent prediction and decision-making. In step S226, interpreting the results of the model allows non-professionals to understand and use the results and helps professionals assess and improve the model.
Preferably, in step S23, the medical data map is processed by a medical data map calculation formula, wherein the medical data map calculation formula specifically includes:
R is medical mapping data, x1 is medical age data, x2 is medical blood pressure data, x3 is medical blood glucose data, x4 is medical cholesterol data, b is a blood glucose level index adaptively adjusted according to x1, o is a medical mapping constant term, x5 is medical weight data, x6 is medical heart rate data, x7 is medical vital capacity data, and x8 is medical body temperature data.
The invention constructs a medical data mapping calculation formula that converts multiple kinds of medical data (for example, age, blood pressure, blood glucose, and cholesterol) into uniform mapping data R. By taking several important medical parameters into account, the formula can provide a comprehensive assessment of a patient's health, which facilitates identifying possible health risks and intervening in time. Because the formula maps multiple medical parameters to a single value, it simplifies data processing and analysis and makes the data easier to visualize and interpret. Through the parameter b, the formula has a degree of adaptivity and can automatically adjust the blood glucose level index according to the age data x1. The logarithm of x1 and the sine of x2 are nonlinear transformations of the data that help capture nonlinear relationships that may exist in the data. The inverse tangent of x6 likewise transforms the data, which helps amplify or attenuate the influence of certain parameters on the result to suit different medical scenarios. The formula can also emphasize the effect of body temperature x8 on the health condition, especially when the body temperature is abnormal. By taking these human body parameters into account, the calculation formula provides a more accurate and reliable way of mapping medical data.
Preferably, step S3 is specifically:
Step S31, medical characteristic extraction is carried out on the medical multi-mode fusion data, so that preliminary medical multi-mode characteristic data are obtained;
Specifically, medical feature extraction from the medical multi-mode fusion data may include, for example, extracting texture, shape, and intensity features from medical images using image processing algorithms, or extracting keyword, phrase, and topic features from medical record text using natural language processing techniques.
Step S32, medical characteristic selection is carried out on the preliminary medical multi-mode characteristic data, so that the medical multi-mode selection characteristic data are obtained;
Specifically, medical feature selection on the preliminary medical multi-mode feature data may use, for example, the Pearson correlation coefficient, mutual information, the chi-square test, or recursive feature elimination to select, from the many preliminary features, the subset with the greatest influence on the prediction or classification target.
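Of the selection methods listed, the Pearson correlation coefficient is the simplest to illustrate. The sketch below keeps a feature when the absolute value of its correlation with the target reaches a threshold; the threshold of 0.5, the feature names, and the data are assumptions for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / (sd_x * sd_y)


def select_features(features, target, threshold=0.5):
    """Keep the names of features whose |correlation| with the target meets the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]


features = {
    "age": [30, 40, 50, 60],
    "visit_weekday": [1, 3, 3, 1],
}
target = [1.0, 2.0, 3.0, 4.0]  # hypothetical risk score
selected = select_features(features, target)
```

Pearson correlation only captures linear dependence, which is one reason the other listed methods, such as mutual information, may be preferred for nonlinear relationships.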
Step S33, performing medical characteristic construction on the medical multi-mode selection characteristic data so as to acquire medical multi-mode construction characteristic data;
Specifically, medical feature construction on the medical multi-mode selection feature data may use, for example, feature crossing, feature combination, and feature encoding to construct new features from the selected features.
Step S34, performing medical characteristic conversion on the medical multi-mode construction characteristic data so as to obtain medical multi-mode conversion characteristic data;
Specifically, medical feature conversion of the medical multi-mode construction feature data may use, for example, principal component analysis (PCA), linear discriminant analysis (LDA), or an autoencoder to reduce the dimensionality of the constructed features or otherwise transform them.
Step S35, medical characteristic extraction is carried out on the medical multi-mode conversion characteristic data, so that the medical multi-mode extraction characteristic data are obtained;
Specifically, medical feature extraction on the medical multi-mode conversion feature data may use, for example, machine learning algorithms such as decision trees, random forests, or support vector machines to extract, from the converted features, the subset with the greatest influence on the prediction or classification target.
Step S36, medical characteristic verification and optimization are carried out on the medical multi-mode extraction characteristic data, so that the medical multi-mode characteristic data are obtained.
Specifically, medical feature verification and optimization of the medical multi-mode extraction feature data may use, for example, cross-validation, grid search, and model evaluation to verify the extracted features, which are then optimized according to the verification results.
By extracting and constructing features from the original medical multi-mode fusion data, the invention can generate new features that may carry information not directly visible in the original data, which can improve the depth and accuracy of the data analysis. Feature selection and feature extraction remove irrelevant or redundant features and reduce the dimensionality of the data, which lowers the complexity and computational burden of subsequent data analysis and model training and improves computational efficiency. Feature verification and optimization check and improve the representativeness and generalization ability of the selected features, which improves the stability and prediction accuracy of the data model built later.
Preferably, step S4 is specifically:
Step S41, performing association rule data format conversion on the medical multi-mode feature data so as to obtain medical multi-mode feature preparation data;
Specifically, association rule data format conversion of the medical multi-mode feature data may include, for example, converting the feature data into a transaction data set in which each transaction corresponds to a medical record and the items of each transaction are the feature values of that record.
Step S42, frequency scanning is carried out on the medical multi-mode feature preparation data so as to obtain feature frequency data;
Specifically, frequency scanning of the medical multi-mode feature preparation data may include, for example, counting the number of medical records in which each feature appears and taking that count as the frequency of the feature.
Step S43, carrying out descending order construction on the characteristic frequency data so as to construct characteristic FP tree data;
Specifically, descending order construction on the feature frequency data may include, for example, sorting the features from highest to lowest frequency to form a frequency-ordered feature list, from which the feature FP tree is built.
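The frequency scan and the descending ordering can be sketched as follows. This covers only the preparatory passes that precede FP tree construction, not the tree itself; the minimum support of 2, the alphabetical tie-break, and the sample transactions are assumptions for illustration.

```python
from collections import Counter


def frequency_ordered_transactions(transactions, min_support):
    """Count item frequencies, drop infrequent items, and sort each transaction."""
    # Each item is counted at most once per transaction (medical record).
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_support}
    ordered = [
        # Highest frequency first; ties broken alphabetically for determinism.
        sorted((i for i in set(t) if i in frequent), key=lambda i: (-frequent[i], i))
        for t in transactions
    ]
    return frequent, ordered


transactions = [
    ["hypertension", "diabetes", "obesity"],
    ["hypertension", "diabetes"],
    ["hypertension", "asthma"],
]
frequent, ordered = frequency_ordered_transactions(transactions, min_support=2)
```

Because every transaction now lists its items in the same global order, transactions that share frequent items also share prefixes, which is what lets an FP tree store them compactly.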
Step S44, performing conditional FP tree generation on the characteristic FP tree data so as to acquire the characteristic conditional FP tree data;
Specifically, for example, a conditional FP tree is built from the frequency-ordered feature list according to the FP-Growth algorithm.
Step S45, carrying out frequent item set mining on the characteristic conditional FP tree data so as to acquire frequent item set data;
Specifically, for example, frequent item sets that meet a minimum support threshold are mined from the conditional FP tree.
Step S46, carrying out association rule generation on the frequent item set data so as to acquire medical association feature mining data.
Specifically, for example, association rules that meet a minimum confidence threshold are generated from the frequent item sets according to the Apriori principle and are taken as the medical association feature mining data. The Apriori principle rests on the property that if an item set is frequent, then all of its subsets must also be frequent; equivalently, an item set can never occur in a data set more often than any of its subsets.
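Rule generation against a minimum confidence threshold can be illustrated with the brute-force sketch below, where the confidence of a rule A -> B is support(A and B) / support(A). The sketch enumerates item sets exhaustively instead of using an FP tree, so it is practical only for very small examples; the thresholds and transactions are made up.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the item set."""
    return sum(1 for t in transactions if itemset <= t)


def association_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, confidence) rules; assumes min_support >= 1."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = support_count(itemset, transactions)
            if count < min_support:
                continue  # not a frequent item set
            for k in range(1, size):
                for antecedent in combinations(combo, k):
                    a = frozenset(antecedent)
                    confidence = count / support_count(a, transactions)
                    if confidence >= min_confidence:
                        rules.append((tuple(sorted(a)), tuple(sorted(itemset - a)), confidence))
    return rules


transactions = [
    {"hypertension", "diabetes", "obesity"},
    {"hypertension", "diabetes"},
    {"hypertension", "asthma"},
]
rules = association_rules(transactions, min_support=2, min_confidence=0.9)
```

In this toy data set the only rule that passes is diabetes -> hypertension with confidence 1.0; the reverse rule has confidence 2/3 and is rejected.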
Through frequent item set mining and association rule generation, the invention can reveal hidden patterns or relationships in the data that may be inconspicuous or overlooked in the original data. These relationships may further be used for diagnosis, for treatment decisions, or for understanding the underlying mechanisms of disease. The FP tree (frequent pattern tree) data structure stores and processes large-scale data sets more efficiently than conventional association rule mining algorithms such as Apriori. Association rule mining can identify feature combinations whose members may appear irrelevant individually but are meaningful together, which can optimize the feature selection process and facilitate subsequent data analysis or model training. By understanding and applying the discovered association rules, for example to predict disease progression, treatment plans that are better suited to medical resource optimization can be formulated.
Preferably, step S5 is specifically:
Step S51, marking time series data of the medical associated feature mining data so as to obtain time series medical data;
Specifically, time series data labeling of the medical associated feature mining data may include, for example, identifying the recording time of each medical record and using it as the timestamp of the record, thereby converting the medical data into timestamped time-series medical data.
Step S52, performing data characteristic analysis on the time-series medical data so as to obtain medical data characteristic analysis data;
Specifically, data characteristic analysis of the time-series medical data may use, for example, time series analysis tools such as autocorrelation plots and partial autocorrelation plots to study basic properties of the data such as stationarity, periodicity, and trend.
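The values shown in an autocorrelation plot are sample autocorrelations at successive lags. A minimal sketch of that computation is given below, using the common estimator that divides the lagged covariance sum by the total sum of squared deviations; the function name and the sample series are illustrative.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a numeric sequence at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series has no variation to correlate
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


series = [10, 20, 10, 20, 10, 20, 10, 20]  # made-up series with period 2
acf1 = autocorrelation(series, 1)
acf2 = autocorrelation(series, 2)
```

For this alternating series the lag-1 value is strongly negative and the lag-2 value strongly positive, which is the kind of pattern that signals periodicity.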
Step S53, when the medical data characteristic analysis data are linear time characteristic analysis data, performing first time sequence prediction model construction on time sequence medical data so as to obtain a medical health time sequence prediction data model;
Specifically, for example, when the medical data characteristic analysis data are determined to be linear time characteristic analysis data, the first time sequence prediction model is constructed for the time-series medical data using, for example, an autoregressive moving average (ARMA) model or an autoregressive integrated moving average (ARIMA) model, thereby generating the medical health time sequence prediction data model.
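ARMA and ARIMA models are normally fitted with a statistics library. As a much simpler stand-in for the linear case, the sketch below fits a first-order autoregressive model x_t = c + phi * x_(t-1) by ordinary least squares and produces a one-step forecast. It is not an ARMA or ARIMA implementation, and the function names and data are assumptions for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_(t-1); returns (c, phi)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return mean_y, 0.0  # no variation in the lagged values: predict the mean
    phi = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next(series):
    """One-step-ahead forecast from the fitted AR(1) model."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


# Made-up series that follows x_t = 1 + 0.5 * x_(t-1) exactly.
history = [10, 6, 4, 3, 2.5]
c, phi = fit_ar1(history)
next_value = forecast_next(history)
```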
Step S54, when the medical data characteristic analysis data are nonlinear time characteristic analysis data, performing second time sequence prediction model construction on the time sequence medical data so as to obtain a medical health time sequence prediction data model, wherein the first time sequence prediction model construction and the second time sequence prediction model construction adopt different time sequence prediction model construction modes.
Specifically, for example, when the medical data characteristic analysis data are determined to be nonlinear time characteristic analysis data, the second time sequence prediction model is constructed for the time-series medical data using, for example, a neural network, another deep learning model, or a support vector machine, thereby generating the medical health time sequence prediction data model.
According to the invention, a linear or a nonlinear time sequence prediction model is constructed depending on the characteristics of the data, and this strategy of dynamically selecting the prediction model can improve prediction accuracy. The time series prediction model processes medical time series data and predicts future health conditions, which facilitates early intervention and helps prevent disease progression. By predicting time-series medical data, personalized medical services may be provided for each patient; for example, more effective treatment plans may be formulated based on the predicted outcome, or preventive measures may be prepared in advance. By predicting future health conditions, the healthcare provider may also allocate resources more efficiently, for example by scheduling operating rooms or nurses in advance, thereby optimizing the provision of healthcare.
Preferably, the invention further provides a health data processing system based on medical big data, comprising:
a medical data preprocessing module, used for acquiring medical basic data through the electronic medical record system and preprocessing the medical basic data to acquire medical preprocessing data;
a multi-mode data fusion module, used for carrying out medical data analysis on the medical pretreatment data so as to obtain medical analysis data, and carrying out multi-mode data fusion on the medical pretreatment data by utilizing the medical analysis data so as to obtain medical multi-mode fusion data;
a medical characteristic extraction module, used for extracting medical characteristics of the medical multi-mode fusion data so as to acquire the medical multi-mode characteristic data;
an associated feature mining module, used for carrying out associated feature mining on the medical multi-mode characteristic data so as to acquire medical associated feature mining data;
a time sequence prediction model construction module, used for constructing a time sequence prediction model of the medical associated feature mining data so as to acquire a medical health time sequence prediction data model;
and a health resource allocation module, used for optimally allocating the medical resources according to the medical health time sequence prediction data model so as to acquire time sequence medical resource allocation data.
The beneficial effects of the invention are as follows. By preprocessing and analyzing the medical basic data, the method extracts the key information in the data and converts it into a format suitable for multi-mode data fusion, which improves the usability of the data and creates the conditions for subsequent data analysis and model construction. Multi-mode data fusion allows information on multiple aspects of the patient to be captured more comprehensively, so that subsequent feature extraction and model prediction are more comprehensive and accurate. Deep feature extraction, associated feature mining, and time sequence prediction model construction are then performed on the medical data, which allows the patient's health to be understood and predicted in greater depth and with greater accuracy, including a graded prediction of health risk. This enables medical professionals to recognize high-risk medical conditions earlier and intervene accordingly, thereby avoiding or alleviating shortages of medical resources. Because the method operates on medical big data, it can make full use of the potential of big data, provide more accurate and personalized medical health predictions, and provide strong support for medical health management.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.