Data quality improvement method and system based on medical big data
Technical Field
The invention belongs to the technical field of data quality control, and particularly relates to a data quality improvement method and system based on medical big data.
Background
Currently, the closest prior art is as follows. As society develops, the requirements on the quality and accuracy of medical data keep rising. With existing big data technology, conventional software tools cannot manage data quality within an acceptable time, and data quality is uneven.
In summary, the problems of the prior art are as follows: existing medical data are complex in type and low in quality. Because hospitals differ in level, checking the data is difficult and takes too long.
The difficulties in solving these technical problems are as follows: because the data types are complex and the hospitals are numerous, the uploaded data types are not uniform.
The data uploaded by different hospitals differ, so many errors arise while the uploaded data are verified, and data quality is low.
Different scoring standards must be defined according to hospital level, and the standards must be customized to each hospital's services.
The data uploaded by hospitals differ greatly, so verification is time-consuming.
The significance of solving these technical problems is as follows: data standards are defined, and the data uploaded by each hospital are mapped between standards, so that the data uploaded by all hospitals are unified and standardized and can be displayed in an electronic medical record system.
A verification report is provided to help the hospital correct the erroneous relationships listed in it, thereby improving data quality.
Different verification rules and scoring rules are defined according to hospital level, so that hospitals are scored according to their level.
An uploading standard is defined, and a standard conversion is carried out before data acquisition, which standardizes the data, reduces conversion during verification, speeds up verification and shortens verification time.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a data quality improvement method based on medical big data.
The invention is realized in such a way that a data quality improvement method based on medical big data comprises the following steps:
adopting the PDLMV data cleaning framework theory to carry out multi-level data verification, and comprehensively displaying aggregated result data through a data exchange report, a data verification report, a special subject report and the like.
Further, the data quality improvement method based on the medical big data comprises the following steps:
step one, performing calculation based on HIS atomic index values, and performing quality management through field-level checks on specification detail data, non-specification detail data, state data, atomic index summaries and the like;
step two, performing calculation based on platform atomic values, and finely checking residents' personal information and medical service visit records using data collected through a public service platform;
step three, performing calculation based on BI atomic index values, and carrying out directed rule verification on the related basic tables, guided by the atomic indexes;
step four, writing dynamic SQL execution statements, and performing data quality control and statistics based on a Hadoop and hash calculation engine.
Another object of the present invention is to provide a medical big data-based data quality improvement system implementing the medical big data-based data quality improvement method, the medical big data-based data quality improvement system comprising:
a data verification module, used for performing three-path-in-one multi-level data verification with the PDLMV data cleaning framework;
a data exchange module, used for exchanging data with the ETL middleware KETTLE;
an analysis module, used for tracking and analyzing production logs and system logs with Hadoop, hash and other analysis frameworks;
a display module, used for comprehensively displaying the various aggregated result data through a data exchange report, a data verification report and a special subject report;
a data quality control module, used for fully presenting the quality of the data and the data verification problems in terms of data consistency, integrity, normalization and timeliness.
It is another object of the present invention to provide a computer program product stored on a computer readable medium, comprising a computer readable program for providing a user input interface for implementing said method for improving data quality based on medical big data when executed on an electronic device.
Another object of the present invention is to provide a computer-readable storage medium, comprising instructions, which when executed on a computer, cause the computer to execute the method for improving data quality based on medical big data.
In summary, the advantages and positive effects of the invention are as follows. The data quality improvement method based on medical big data controls data quality by verifying the data multiple times with three paths combined. To address the complex types and low quality of current medical data, the invention adopts the PDLMV data cleaning framework theory, realizes multi-level data verification, and comprehensively displays aggregated result data through a data exchange report, a data verification report, a special subject report and the like. The invention controls data quality through an advanced theory and core verification rules. The invention also supports user-defined path templates for searching and relationship maintenance, performs multi-stage aggregation through data marts, and uses Solr with columnar storage and map (key, value) structures for fast search and storage.
Drawings
Fig. 1 is a flow chart of a data quality improvement method based on medical big data provided by an embodiment of the invention.
Fig. 2 is a schematic structural diagram of a data quality improvement system based on medical big data according to an embodiment of the present invention.
In the figure: 1. data verification module; 2. data exchange module; 3. analysis module; 4. display module; 5. data quality control module.
Fig. 3 is a schematic diagram of a data quality improvement method based on medical big data according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a data quality improvement system based on medical big data provided by an embodiment of the invention.
Fig. 5 is a data interface diagram for data quality control monitoring provided by an embodiment of the present invention.
Fig. 6 is a diagram of a scheduling interface of a data quality control program according to an embodiment of the present invention.
Fig. 7 is a diagram of a data verification script execution code interface provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The existing medical data are complex in type and low in data quality.
In order to solve the problems in the prior art, the present invention provides a method and a system for improving data quality based on medical big data, and the present invention is described in detail below with reference to the accompanying drawings.
The data quality improvement method based on the medical big data provided by the embodiment of the invention comprises the following steps:
The PDLMV data cleaning framework theory is adopted to carry out multi-level data verification, and aggregated result data are comprehensively displayed through a data exchange report, a data verification report, a special subject report and the like.
As shown in fig. 1, the data quality improvement method based on medical big data provided by the embodiment of the invention comprises the following steps:
S101: calculation is performed based on HIS atomic index values, and quality management is performed through field-level checks on specification detail data, non-specification detail data, state data, atomic index summaries and the like.
S102: calculation is performed based on platform atomic values, and residents' personal information and medical service visit records are finely checked using data collected through the public service platform.
S103: calculation is performed based on BI atomic index values, and directed rule verification is carried out on the related basic tables, guided by the atomic indexes.
S104: dynamic SQL execution statements are written, and data quality control and statistics are performed based on a Hadoop and hash calculation engine.
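Step S104 can be illustrated with a minimal sketch. It assumes that verification rules are kept as configuration records and rendered into SQL statements at run time. The table `visit_record`, its columns, the rule identifiers and the allowed code set are all hypothetical, and Python's built-in `sqlite3` module stands in for the Hadoop-based engine named in the text, so the sketch shows only the rule-to-SQL idea and not the embodiment's actual scripts.

```python
import sqlite3

# Hypothetical rule configuration: one record per field-level check.
RULES = [
    {"id": "R1", "table": "visit_record", "column": "patient_id", "check": "not_null"},
    {"id": "R2", "table": "visit_record", "column": "gender_code", "check": "in_set",
     "allowed": ["1", "2", "9"]},
]


def build_sql(rule):
    """Render one rule into (sql, params) that counts violating rows."""
    table, column = rule["table"], rule["column"]
    # Identifiers cannot be bound as parameters, so accept only plain names.
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("unsafe identifier in rule " + rule["id"])
    if rule["check"] == "not_null":
        sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL OR {column} = ''"
        return sql, []
    if rule["check"] == "in_set":
        marks = ", ".join("?" for _ in rule["allowed"])
        sql = (f"SELECT COUNT(*) FROM {table} "
               f"WHERE {column} IS NULL OR {column} NOT IN ({marks})")
        return sql, list(rule["allowed"])
    raise ValueError("unknown check type: " + rule["check"])


def run_rules(conn, rules):
    """Execute every rule and return {rule_id: number_of_violations}."""
    results = {}
    for rule in rules:
        sql, params = build_sql(rule)
        results[rule["id"]] = conn.execute(sql, params).fetchone()[0]
    return results


def demo():
    """Load four sample rows into an in-memory database and run the rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visit_record (patient_id TEXT, gender_code TEXT)")
    conn.executemany(
        "INSERT INTO visit_record VALUES (?, ?)",
        [("P001", "1"), (None, "2"), ("P003", "7"), ("", "9")],
    )
    return run_rules(conn, RULES)
```

Keeping the rules as data means a new check can be added by configuration alone; only the values are bound as parameters, while table and column names are validated before being placed in the statement.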
As shown in fig. 2, a data quality improvement system based on medical big data provided by an embodiment of the invention includes:
Data verification module 1: used for performing three-path-in-one multi-level data verification with the PDLMV data cleaning framework.
Data exchange module 2: used for exchanging data with the ETL middleware KETTLE.
Analysis module 3: used for tracking and analyzing production logs and system logs with Hadoop, hash and other analysis frameworks.
Display module 4: used for comprehensively displaying the various aggregated result data through a data exchange report, a data verification report and a special subject report.
Data quality control module 5: used for fully presenting the quality of the data and the data verification problems in terms of data consistency, integrity, normalization and timeliness.
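The four dimensions handled by the data quality control module can be sketched as pass rates over a batch of records. The sketch below is only an illustration under stated assumptions: the record fields (`patient_id`, `admitted`, `discharged`, `uploaded`), the identifier format and the two-day upload deadline are hypothetical and do not come from the embodiment.

```python
import re
from datetime import datetime, timedelta

PATIENT_ID = re.compile(r"^P\d{3}$")  # hypothetical format rule
MAX_DELAY = timedelta(days=2)         # hypothetical upload deadline


def quality_dimensions(records):
    """Return the pass rate (0.0 to 1.0) of each quality dimension."""
    total = len(records)
    if total == 0:
        return {}
    passed = {"integrity": 0, "normalization": 0, "consistency": 0, "timeliness": 0}
    for r in records:
        # Integrity: the required field is present and non-empty.
        passed["integrity"] += r["patient_id"] not in (None, "")
        # Normalization: the field matches the agreed format.
        passed["normalization"] += bool(PATIENT_ID.match(r["patient_id"] or ""))
        # Consistency: related fields do not contradict each other.
        passed["consistency"] += r["admitted"] <= r["discharged"]
        # Timeliness: the record was uploaded within the deadline.
        passed["timeliness"] += r["uploaded"] - r["discharged"] <= MAX_DELAY
    return {name: count / total for name, count in passed.items()}


def day(n):
    """Shorthand for a date in January 2020 (sample data only)."""
    return datetime(2020, 1, n)


SAMPLE = [
    {"patient_id": "P001", "admitted": day(1), "discharged": day(3), "uploaded": day(4)},
    {"patient_id": "", "admitted": day(2), "discharged": day(5), "uploaded": day(6)},
    {"patient_id": "X9", "admitted": day(10), "discharged": day(8), "uploaded": day(9)},
    {"patient_id": "P004", "admitted": day(1), "discharged": day(2), "uploaded": day(10)},
]
```

Calling `quality_dimensions(SAMPLE)` yields one rate per dimension, which a display layer could present alongside the individual failing records.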
Fig. 3 is a schematic diagram of a data quality improvement method based on medical big data according to an embodiment of the present invention.
Data are uploaded from the hospital business library to a front-end library by means such as summarizing, which preserves the original data. The front-end library uploads the data to the big data platform through ESB + ETL for verification, where a second summary is carried out. DATAX then distributes the summarized data and the verification data to each subject repository for use by the application platforms. A third summary is derived from the statistical tables uploaded by the hospital. On the basis of this third summary, a first three-path comparison is carried out, and a report is compiled from the comparison results.
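The three-path comparison can be pictured as checking that the same indicator has the same value in all three summaries. The following is a minimal sketch under that assumption; the function name, the path labels and the indicator names are hypothetical, and the embodiment may compare at a different granularity.

```python
def three_path_compare(front_library, platform, hospital_stats):
    """Compare each indicator across the three summaries.

    Each argument maps an indicator name to its summarized value. An
    indicator is consistent only if all three paths report the same value.
    """
    paths = {
        "front_library": front_library,
        "platform": platform,
        "hospital_stats": hospital_stats,
    }
    report = []
    for indicator in sorted(set().union(*paths.values())):
        values = {name: data.get(indicator) for name, data in paths.items()}
        present = [v for v in values.values() if v is not None]
        consistent = len(present) == 3 and len(set(present)) == 1
        report.append({"indicator": indicator, **values, "consistent": consistent})
    return report


# Sample summaries (illustrative numbers only).
FRONT = {"outpatient_visits": 120, "admissions": 15}
PLATFORM = {"outpatient_visits": 120, "admissions": 14}
STATS = {"outpatient_visits": 120, "admissions": 15, "discharges": 13}
```

In this sample, `admissions` disagrees between two paths and `discharges` is missing from two of them, so both rows would be flagged in the synthesized report while `outpatient_visits` passes.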
Fig. 4 is a schematic diagram of a data quality improvement system based on medical big data provided by an embodiment of the invention.
Data are accumulated on the basis of a medical data knowledge base, and quality control rules, including measurable quality control rules, are formed in the quality control center. Using these rules, data verification is carried out on the medical data stream with the Spark data calculation engine, and a problem report carrying data lineage is produced. Operation and maintenance personnel select the most prominent problem data in order of report priority and trace each one back to the vendor that is the source of the problem data. After the vendor corrects the problem, the data are supplemented or retransmitted. The quality control platform uniformly schedules the retransmitted or supplemented data over the data bus for secondary verification and produces a secondary verification report, from which the final quality control scoring result is generated.
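The tracing and secondary verification loop described above can be sketched in a few lines. The sketch assumes that each violation in the problem report already carries its source vendor and a priority; the field names `vendor`, `priority`, `count` and `rule_id` are hypothetical, as are the two helper functions.

```python
from collections import defaultdict


def problem_report(violations):
    """Group violated rules by source vendor, highest priority first."""
    ordered = sorted(
        violations,
        key=lambda v: (-v["priority"], -v["count"], v["rule_id"]),
    )
    by_vendor = defaultdict(list)
    for v in ordered:
        by_vendor[v["vendor"]].append(v["rule_id"])
    return dict(by_vendor)


def secondary_verification(first_pass, second_pass):
    """Compare violation counts before and after retransmission.

    Both arguments map a rule id to its violation count. A rule is
    resolved if it failed in the first pass and no longer fails.
    """
    resolved = sorted(
        rule for rule, n in first_pass.items()
        if n > 0 and second_pass.get(rule, 0) == 0
    )
    remaining = sorted(rule for rule, n in second_pass.items() if n > 0)
    return {"resolved": resolved, "remaining": remaining}


# Sample violations (illustrative only).
VIOLATIONS = [
    {"rule_id": "R2", "vendor": "vendor_a", "priority": 1, "count": 40},
    {"rule_id": "R1", "vendor": "vendor_a", "priority": 3, "count": 5},
    {"rule_id": "R3", "vendor": "vendor_b", "priority": 2, "count": 12},
]
```

The output of `secondary_verification` corresponds to the secondary report: rules listed under `remaining` would still count against the final quality control score.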
The technical solution of the present invention is further described with reference to the following specific embodiments.
Example:
The data quality improvement method based on medical big data provided by the embodiment of the invention comprises the following steps:
(1) Based on the national medical standards and the localized medical standards of Gansu Province, measurement standards such as HIS atomic index values, field normative rules, business association rules and data consistency verification rules are compiled. Quality control weights are assigned according to the priority of medical business relationships, forming a traceable and measurable quality control scoring standard.
(2) The quality control rules are managed centrally and are adjusted and configured in a unified way; the quality control rules for hospitals of different levels are determined, and the data verification level of each link in the medical data chain is determined.
(3) Data verification is carried out based on the quality control rules, the verification results are synchronized to a quality control scoring rule table, and hospital data quality is scored.
(4) Quality control verification continuously tests the data structure and detects abnormal content, so that the process is controllable and problems are traceable.
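Steps (1) to (3) can be illustrated by a weighted score whose weights depend on the hospital level. The sketch below is only an assumption about how such a score might be computed: the level names, the weight values and the rule category keys are hypothetical, and the embodiment does not disclose its actual weights.

```python
# Hypothetical quality control weights per hospital level.
# The weights for each level sum to 1.0.
WEIGHTS_BY_LEVEL = {
    "tertiary": {
        "atomic_index": 0.4, "field_norm": 0.2,
        "business_assoc": 0.2, "consistency": 0.2,
    },
    "secondary": {
        "atomic_index": 0.3, "field_norm": 0.3,
        "business_assoc": 0.2, "consistency": 0.2,
    },
}


def quality_score(level, pass_rates):
    """Return a 0 to 100 score as the weighted sum of rule pass rates.

    pass_rates maps each rule category to the fraction of checks passed.
    """
    weights = WEIGHTS_BY_LEVEL[level]
    missing = set(weights) - set(pass_rates)
    if missing:
        raise KeyError("missing pass rates for: " + ", ".join(sorted(missing)))
    total = sum(weights[name] * pass_rates[name] for name in weights)
    return round(total * 100, 2)


# Sample verification result for one hospital (illustrative only).
PASS_RATES = {
    "atomic_index": 0.9, "field_norm": 0.8,
    "business_assoc": 1.0, "consistency": 0.5,
}
```

With the same verification result, the two levels produce different scores because each level weights the rule categories differently, which is the effect of scoring according to hospital level.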
The invention is further described below in connection with specific experiments.
The data quality control monitoring data interface is shown in fig. 5.
The data quality control program scheduling interface is shown in fig. 6.
The data verification script execution code interface is shown in Fig. 7.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.