Distributed data extraction method and apparatus, computer device, and storage medium
Technical Field
The present invention relates to computer technologies, and in particular, to a distributed data extraction method and apparatus, a computer device, and a storage medium.
Background
Clinical decision making in the medical industry requires integration of patient data from multiple platforms or multiple data sources, and due to inconsistency between data sources and data structures, different data cannot be directly applied, and secondary processing is required.
At present, data extraction is generally limited to database systems: data with different data structures can be extracted only from database tables through Structured Query Language (SQL). As a result, the data of medical equipment cannot be completely covered, and data extraction efficiency is low and stability is poor.
Disclosure of Invention
The invention provides a distributed data extraction method, a distributed data extraction device, computer equipment and a storage medium, which are used for realizing efficient data acquisition on various data sources.
In a first aspect, an embodiment of the present invention provides a distributed data extraction method, where the method includes:
at least two independent data extraction modules acquire data of at least two data sources, wherein each data extraction module is matched with its corresponding data source;
the data extraction module converts the data into a preset format and sets a data identifier for the data to form collected data;
the data extraction module sends the acquired data to a data receiving end;
and the data receiving end receives the acquired data and stores the acquired data to a preset storage position according to the data identification.
In a second aspect, an embodiment of the present invention further provides a distributed data extraction apparatus, where the apparatus includes:
at least two independent data extraction modules, each matched with a corresponding data source and configured to acquire data of that data source, convert the data into a preset format, set a data identifier for the data to form acquired data, and send the acquired data to a data receiving end;
and the data receiving end is used for receiving the acquired data and storing the acquired data to a preset storage position according to the data identification.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the distributed data extraction method according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the distributed data extraction method provided in any embodiment of the present invention.
According to the embodiments of the present invention, at least two data extraction modules adapted to different data sources are provided and independently acquire the data of at least two data sources. The data is converted into a uniform format and sent to a data receiving end, which stores the received data by category according to the data identifier. This solves the problem that a single data extraction module cannot extract data from different types of data sources, and broadens the range of data sources that can be covered. Because multiple data extraction modules extract data from multiple data sources simultaneously, data extraction efficiency is improved; because the data extraction modules are independent of one another and do not interfere with each other, the stability of data extraction is improved.
Drawings
Fig. 1 is a flowchart of a distributed data extraction method according to a first embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a distributed data extraction apparatus according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a distributed data extraction method according to a first embodiment of the present invention. This embodiment is applicable to situations where data from multiple data sources is acquired in a distributed manner, and is particularly applicable to distributed acquisition of data from multiple data sources in a clinical decision support system in the medical industry. The method specifically comprises the following steps:
S110, at least two independent data extraction modules acquire data of at least two data sources, wherein the data extraction modules are matched with the corresponding data sources.
The data source refers to a device capable of providing data, and may be, for example, a server storing data or a hardware device capable of detecting data. In the medical industry, for example, the hardware device may be, but is not limited to, a ventilator, a sphygmomanometer, or a monitor. The data extraction module is used for extracting data from the data source. For example, if the data source is a server, the data extraction module may extract data by sending a data extraction request, which may take the form of an HTTP (Hypertext Transfer Protocol) message. If the data source is a hardware device, the data extraction module may be connected to the hardware device through a connection line and a connection port, and perform data extraction over that connection.
In this embodiment, each data extraction module is adapted to one data source, and each data extraction module is independent from each other, and fast and efficient data extraction can be performed on multiple different data sources simultaneously in a distributed manner, so that the types of the data sources from which data can be extracted are enlarged, and the data extraction efficiency is improved. Meanwhile, the data extraction modules are independent from one another, and when any data extraction module is abnormal, other data extraction modules cannot be affected, and the stability of data extraction is improved.
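The independence described above can be illustrated with a minimal Python sketch. It assumes each data extraction module runs in its own thread; the function names, the sample readings, and the simulated failure are hypothetical and are not taken from the embodiment.

```python
import threading

def run_extractor(name, extract, results, errors):
    """Run one extraction module; a failure is recorded but never propagates."""
    try:
        results[name] = extract()
    except Exception as exc:  # the fault stays inside this module
        errors[name] = str(exc)

def extract_from_monitor():
    # Hypothetical stand-in for reading a monitor.
    return {"HR": 90, "T": 37}

def extract_from_server():
    # Hypothetical stand-in for a data source that is currently failing.
    raise ConnectionError("server unreachable")

def run_all(extractors):
    """Start every extraction module in its own thread and wait for all of them."""
    results, errors = {}, {}
    threads = [
        threading.Thread(target=run_extractor, args=(name, fn, results, errors))
        for name, fn in extractors.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors

results, errors = run_all({
    "monitor": extract_from_monitor,
    "server": extract_from_server,
})
```

In this sketch the failing server module only adds an entry to `errors`; the monitor module still delivers its readings.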
In this embodiment, the data extraction modules are determined according to the number and types of the data sources of the data to be extracted, and optionally, each data source corresponds to one data extraction module. Optionally, if a new data source exists and a corresponding data extraction module does not exist, a data extraction module adapted to the new data source is established, the whole data extraction system does not need to be modified, and the applicability is strong.
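One possible way to add a module for a new data source without modifying the rest of the system is a registry keyed by data source type. The sketch below is only an illustration under that assumption; the registry, the decorator, and the sample values are hypothetical.

```python
# Hypothetical registry mapping a data source type to its extraction module.
EXTRACTORS = {}

def register(source_type):
    """Decorator that registers an extraction module for one data source type."""
    def decorator(fn):
        EXTRACTORS[source_type] = fn
        return fn
    return decorator

@register("sphygmomanometer")
def extract_blood_pressure():
    return {"SBP": 120, "DBP": 80}  # illustrative values only

@register("monitor")
def extract_monitor():
    return {"HR": 90, "T": 37}  # illustrative values only

# A new data source appears later: only a new module is added,
# and the existing modules are left untouched.
@register("ventilator")
def extract_ventilator():
    return {"RR": 16}  # illustrative value only
```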
It should be noted that one data extraction module is adapted to one data source and can acquire a single item or multiple items of data. For example, if the data source is a sphygmomanometer, the corresponding data extraction module can acquire blood pressure data of a patient; if the data source is a monitor, the data that the corresponding data extraction module can collect includes, but is not limited to, the heart rate, respiration, blood oxygen, and body temperature of the patient. In addition, the same item of data may be acquired from one or more data sources. For example, blood pressure data may be extracted from a sphygmomanometer or from a monitor.
S120, the data extraction module converts the data into a preset format and sets a data identifier for the data to form collected data.
In this embodiment, the data extracted from different data sources is converted into the same format, which improves the uniformity of the data and facilitates the identification, storage, and subsequent application of different data; the data application may be, for example, generating a clinical decision from the extracted data of a patient. For example, the preset format may be: (data type)data. If the data source is a monitor and the extractable data includes heart rate and body temperature, the data in the preset format may be (HR)90 and (T)37.
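As a small illustration of the example format, the following Python sketch renders readings as a data type in parentheses followed by the value. The function name and the dictionary of readings are assumptions made for the example.

```python
def to_preset_format(data_type, value):
    """Render one reading as '(data type)data', for example '(HR)90'."""
    return f"({data_type}){value}"

# Hypothetical raw readings taken from a monitor.
readings = {"HR": 90, "T": 37}
formatted = [to_preset_format(data_type, value) for data_type, value in readings.items()]
```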
In this embodiment, a data identifier is set for the data and is used to distinguish the data. The data identifier may be related to at least one of the data extraction module, the data acquisition time, or the data type, and optionally, the data identifier may be a character string. The collected data comprises the data in the preset format and the data identifier.
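A minimal sketch of how collected data could be assembled is given below. It assumes the data identifier is a character string that joins the module name, the acquisition time, and the data type with a `|` separator; this layout, the class, and the sample values are hypothetical, since the embodiment only requires that the identifier relate to at least one of these items.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CollectedData:
    identifier: str  # data identifier, here a character string
    payload: str     # data in the preset format

def make_identifier(module_id, acquired_at, data_type):
    # Assumed layout: module|acquisition time|data type.
    return f"{module_id}|{acquired_at}|{data_type}"

def collect(module_id, acquired_at, data_type, value):
    """Convert a reading to the preset format and attach its data identifier."""
    return CollectedData(
        identifier=make_identifier(module_id, acquired_at, data_type),
        payload=f"({data_type}){value}",
    )

# Illustrative module name and timestamp.
record = collect("monitor-01", "2024-01-01T08:00:00", "HR", 90)
```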
S130, the data extraction module sends the acquired data to a data receiving end.
Wherein, the data receiving end is used for receiving and storing the extracted data.
S140, the data receiving end receives the acquired data and stores the acquired data to a preset storage position according to the data identifier.
In this embodiment, the data source, the data type, or the acquisition time of the acquired data may be identified by the data identifier, and optionally, the data receiving end may store the data in a classified manner according to the data source or the data type of the acquired data.
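Classified storage by data identifier might look like the following sketch. It assumes a hypothetical identifier layout of `module|acquisition time|data type` and uses an in-memory dictionary in place of the preset storage positions.

```python
from collections import defaultdict

def parse_identifier(identifier):
    """Split an identifier laid out as module|acquisition time|data type (assumed)."""
    source, acquired_at, data_type = identifier.split("|")
    return {"source": source, "time": acquired_at, "type": data_type}

def store(storage, identifier, payload, classify_by="type"):
    """Append the payload to the bucket selected by the identifier."""
    fields = parse_identifier(identifier)
    storage[fields[classify_by]].append(payload)

# In-memory stand-in for the preset storage positions.
storage = defaultdict(list)
store(storage, "monitor-01|2024-01-01T08:00:00|HR", "(HR)90")
store(storage, "monitor-01|2024-01-01T08:00:00|T", "(T)37")
store(storage, "monitor-01|2024-01-01T08:01:00|HR", "(HR)88")
```

Passing `classify_by="source"` instead would group the same records by data source rather than by data type.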
Optionally, after the data receiving end receives the collected data, the method further includes: the data receiving end judges, according to the collection time and the data type of the collected data, whether the collected data contains duplicate data; and if so, deletes the duplicate collected data.
In this embodiment, the same data can be acquired through different data sources, and the data receiving end detects whether repeated acquisition of the same data type exists at the same acquisition time after receiving the acquired data, and if so, the repeated data is deleted, so that the problem of inaccurate clinical decision caused by the repeated data is avoided.
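The duplicate check can be sketched as follows, assuming each record is a dictionary carrying its acquisition time and data type and that the first record received for a given pair is the one kept. The field names and sample records are hypothetical.

```python
def deduplicate(records):
    """Keep the first record for each (acquisition time, data type) pair."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["time"], rec["type"])
        if key in seen:
            continue  # same data type at the same acquisition time: drop it
        seen.add(key)
        unique.append(rec)
    return unique

# Blood pressure reported by two data sources at the same acquisition time.
records = [
    {"source": "sphygmomanometer", "time": "08:00:00", "type": "BP", "value": "120/80"},
    {"source": "monitor", "time": "08:00:00", "type": "BP", "value": "120/80"},
    {"source": "monitor", "time": "08:00:00", "type": "HR", "value": "90"},
]
unique = deduplicate(records)
```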
Optionally, step S140 includes: the data receiving end receives the acquired data, acquires the receiving time of the acquired data, and prestores the acquired data in a storage queue according to the receiving time;
and the data receiving end identifies the data identification of each acquired data, and stores the acquired data to a preset storage position according to the storage sequence of the acquired data in the storage queue and the data identification.
The storage queue is used for pre-storing the acquired data. The acquired data is placed in the storage queue in order of the time at which the data receiving end received it, and is then stored to the preset storage positions in queue order on a first-in first-out basis. This realizes asynchronous storage of the acquired data, solves the problem of data storage blockage caused by an excessive data volume when data is acquired from multiple data sources simultaneously, and improves the stability of the data extraction system.
Optionally, after the data receiving end prestores the acquired data in the storage queue, the data receiving end sends receiving information to the data extraction module to prompt the data extraction module to extract the next data, so that the data extraction speed is increased.
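The pre-storage queue and the receipt notification can be sketched with Python's standard `queue.Queue`, which is first-in first-out. The worker thread, the `None` stop marker, and the lists standing in for the storage positions and for the receipt messages are assumptions of this sketch.

```python
import queue
import threading

storage_queue = queue.Queue()  # FIFO: items leave in the order they were received
stored = []                    # stand-in for the preset storage positions
receipts = []                  # stand-in for receipt messages sent back to modules

def receive(payload, module_id):
    """Pre-store the data, then tell the module it may extract the next data."""
    storage_queue.put(payload)
    receipts.append(module_id)

def storage_worker():
    """Move data from the queue to storage asynchronously, in queue order."""
    while True:
        payload = storage_queue.get()
        if payload is None:  # stop marker used only by this sketch
            break
        stored.append(payload)

worker = threading.Thread(target=storage_worker)
worker.start()
for i, payload in enumerate(["(HR)90", "(T)37", "(HR)88"]):
    receive(payload, f"module-{i}")
storage_queue.put(None)
worker.join()
```

Because `receive` returns as soon as the data is queued, an extraction module does not wait for the slower write to storage.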
In the technical scheme of this embodiment, at least two data extraction modules adapted to different data sources are provided and independently acquire the data of at least two data sources. The data is converted into a uniform format and sent to the data receiving end, which stores the received data by category according to the data identifier. This solves the problem that a single data extraction module cannot extract data from different types of data sources, and broadens the range of data sources that can be covered. Because multiple data extraction modules extract data from multiple data sources simultaneously, data extraction efficiency is improved; because the data extraction modules are independent of one another and do not interfere with each other, the stability of data extraction is improved.
On the basis of the above embodiment, after the data receiving end receives the collected data, the method further includes: the data receiving end detects whether the format of the acquired data conforms to the preset format; if not, it deletes the acquired data and sends format abnormality prompt information to the corresponding data extraction module.
In this embodiment, after the data receiving end receives the acquired data, if it is detected that the format of the received acquired data does not conform to the preset format, the acquired data is deleted, format abnormality prompt information is generated, and the data extraction module adjusts the sending format according to the format abnormality prompt information. The format abnormality prompt information is also used for prompting the user to overhaul the data extraction module.
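A format check of this kind could be sketched with a regular expression. The pattern below assumes the example preset format of a data type in parentheses followed by a numeric value; the pattern, the function name, and the wording of the prompt message are hypothetical.

```python
import re

# Assumed preset format: '(data type)' followed by a number, e.g. '(HR)90'.
PRESET_FORMAT = re.compile(r"^\((?P<type>[A-Za-z]+)\)(?P<value>-?\d+(?:\.\d+)?)$")

def check_format(payload):
    """Return (True, None) if the payload conforms, else (False, prompt message)."""
    if PRESET_FORMAT.match(payload):
        return True, None
    return False, f"format abnormality: {payload!r} is not in the preset format"

ok, prompt = check_format("(HR)90")
bad, bad_prompt = check_format("HR=90")
```

In a fuller system the prompt message would be returned to the extraction module that sent the data so that it can adjust its sending format.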
Optionally, after the data receiving end receives the collected data, the method further includes: the data receiving end determines a data standard range according to the data type of the acquired data; and if the data receiving end detects that the acquired data is outside the data standard range, it deletes the acquired data and generates data abnormality prompt information.
For medical data of the human body, each type of data has a data standard range, which refers to the normal range of that data for the human body. Illustratively, the data standard range of human oral temperature is 36.2-37.2 ℃, and the normal heart rate range of adults is 60-100 beats per minute.
In this embodiment, the data standard range of each data type is pre-stored in the data receiving end, and after the data receiving end receives the acquired data, the data type of the acquired data is identified, and the data standard range is obtained. Detecting whether the collected data is in a data standard range, if so, storing the collected data to a preset storage position, otherwise, determining that the data is abnormal, deleting the collected data, and generating data abnormality prompt information.
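The range check can be sketched as below, using the two illustrative ranges given in this description (oral temperature 36.2-37.2 ℃ and adult heart rate 60-100 per minute). The dictionary, the type codes `T` and `HR`, and the message text are assumptions of the sketch.

```python
# Standard ranges pre-stored at the receiving end (values from the description).
STANDARD_RANGES = {"T": (36.2, 37.2), "HR": (60, 100)}

def verify(data_type, value, storage, notices):
    """Store an in-range value; otherwise discard it and record a prompt."""
    low, high = STANDARD_RANGES[data_type]
    if low <= value <= high:
        storage.append((data_type, value))
        return True
    notices.append(f"data abnormality: {data_type}={value} outside {low}-{high}")
    return False

storage, notices = [], []
verify("HR", 90, storage, notices)   # within 60-100, stored
verify("T", 39.5, storage, notices)  # outside 36.2-37.2, discarded
```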
Whether the data extraction module and the corresponding data source are abnormal is detected according to the data abnormality prompt information. If so, the user is prompted to overhaul them. If the data extraction module and the corresponding data source are determined to be in a normal working state, it is judged whether the acquired data was acquired in real time; if so, it is determined that the physical condition of the patient corresponding to the data source is abnormal, and alarm information is generated.
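The follow-up to a data abnormality prompt amounts to a short decision sequence, sketched below. The function name and the returned labels are hypothetical, and the description does not say what happens when the equipment is normal but the data was not acquired in real time, so the sketch returns `None` for that case as an assumption.

```python
def triage(equipment_abnormal, acquired_in_real_time):
    """Decide the follow-up action after a data abnormality prompt."""
    if equipment_abnormal:
        return "overhaul"  # prompt the user to overhaul module and data source
    if acquired_in_real_time:
        return "alarm"     # equipment is fine, so the reading itself is abnormal
    return None            # not specified in the description; assumed no action

action = triage(equipment_abnormal=False, acquired_in_real_time=True)
```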
In the embodiment, the acquired data is verified through the data standard range, so that the accuracy of the acquired data is improved, and misleading of wrong data to clinical decision is avoided.
In one embodiment, after receiving the collected data, the data receiving end detects whether the format of the collected data conforms to the preset format. If not, the collected data is deleted, and format abnormality prompt information is sent to the corresponding data extraction module; if so, a data standard range is determined according to the data type of the acquired data. If the acquired data is detected to be outside the data standard range, the acquired data is deleted and data abnormality prompt information is generated; if the acquired data is detected to be within the data standard range, the acquired data is stored to a preset storage position.
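The combined sequence of format check, range check, and storage can be sketched as a single function. The regular expression, the range table, and the notice tuples are assumptions of this sketch; data types without a pre-stored range are stored without a range check here, which the description does not address.

```python
import re

PRESET_FORMAT = re.compile(r"^\(([A-Za-z]+)\)(-?\d+(?:\.\d+)?)$")  # assumed format
STANDARD_RANGES = {"T": (36.2, 37.2), "HR": (60, 100)}  # values from the description

def handle(payload, storage, notices):
    """Format check first, then range check, then storage by data type."""
    match = PRESET_FORMAT.match(payload)
    if match is None:
        notices.append(("format abnormality", payload))
        return False
    data_type, value = match.group(1), float(match.group(2))
    bounds = STANDARD_RANGES.get(data_type)
    # Assumption: a type with no pre-stored range skips the range check.
    if bounds is not None and not (bounds[0] <= value <= bounds[1]):
        notices.append(("data abnormality", payload))
        return False
    storage.setdefault(data_type, []).append(value)
    return True

storage, notices = {}, []
for payload in ["(HR)90", "HR:90", "(T)39.5", "(T)36.8"]:
    handle(payload, storage, notices)
```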
Example two
Fig. 2 is a schematic structural diagram of a distributed data extraction apparatus according to a second embodiment of the present invention. Fig. 2 shows one implementation of the apparatus, taking two independent data extraction modules as an example; in other embodiments, the number of data extraction modules is determined according to requirements. The apparatus specifically includes:
at least two independent data extraction modules 210, each matched with a corresponding data source and configured to acquire data of that data source, convert the data into a preset format, set a data identifier for the data to form acquired data, and send the acquired data to a data receiving end;
and a data receiving end 220, configured to receive the acquired data and store the acquired data to a preset storage location according to the data identifier.
Optionally, the data receiving end 220 includes a pre-storing unit and a storing unit; wherein,
the pre-storage unit is used for receiving the acquired data, acquiring the receiving time of the acquired data, and pre-storing the acquired data in a storage queue according to the receiving time;
and the storage unit is used for identifying the data identifier of each acquired data and storing the acquired data to a preset storage position according to the storage sequence of the acquired data in the storage queue and the data identifier.
Optionally, the data receiving end 220 further comprises a deduplication unit, wherein,
the deduplication unit is used for judging, after the acquired data is received, whether duplicate data exists according to the acquisition time and the data type of the acquired data, and deleting the duplicate acquired data if so.
Optionally, the data receiving end 220 further includes a format detecting unit and a first data deleting unit;
the format detection unit is used for detecting whether the format of the acquired data conforms to a preset format or not after the acquired data is received;
and the first data deleting unit is used for deleting the acquired data and sending format abnormality prompt information to the corresponding data extraction module if the format of the acquired data does not conform to the preset format.
Optionally, the data receiving end 220 further includes a data range determining unit and a second data deleting unit;
the data range determining unit is used for determining a data standard range according to the data type of the acquired data after the acquired data are received;
and the second data deleting unit is used for deleting the acquired data and generating data abnormality prompt information if the acquired data is detected to be outside the data standard range.
The distributed data extraction device provided by the embodiment of the invention can execute the distributed data extraction method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the distributed data extraction method.
Example three
Fig. 3 is a schematic structural diagram of a computing device according to a third embodiment of the present invention. FIG. 3 illustrates a block diagram of an exemplary computing device 12 suitable for use in implementing embodiments of the present invention. The computing device 12 shown in FIG. 3 is only one example and should not be taken to limit the scope of use and functionality of embodiments of the present invention.
As shown in fig. 3, computing device 12 may comprise an electronic device with computing processing capabilities, and the types may include, but are not limited to, a terminal device, such as a mobile terminal, a PC, and the like, and a server device, such as a server or a computer cluster, and the like. Components of computing device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computing device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computing device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. Computing device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 3, and commonly referred to as a "hard drive"). Although not shown in FIG. 3, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computing device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computing device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computing device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computing device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through network adapter 20. As shown, network adapter 20 communicates with the other modules of computing device 12 via bus 18. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computing device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 may include, but is not limited to, a Central Processing Unit (CPU) and/or a Graphics Processing Unit (GPU), and executes programs stored in the system memory 28 to perform various functional applications and data processing, such as implementing any of the distributed data extraction methods provided by the embodiments of the present invention: at least two independent data extraction modules acquire data of at least two data sources, wherein the data extraction modules are matched with the corresponding data sources; the data extraction module converts the data into a preset format and sets a data identifier for the data to form collected data; the data extraction module sends the acquired data to a data receiving end; and the data receiving end receives the acquired data and stores the acquired data to a preset storage position according to the data identifier.
Example four
A fourth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the distributed data extraction method according to any embodiment of the present invention: at least two independent data extraction modules acquire data of at least two data sources, wherein the data extraction modules are matched with the corresponding data sources; the data extraction module converts the data into a preset format and sets a data identifier for the data to form collected data; the data extraction module sends the acquired data to a data receiving end; and the data receiving end receives the acquired data and stores the acquired data to a preset storage position according to the data identifier.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.