Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
Data fusion at present generally involves fusing multi-source heterogeneous data. That is, the data generated by each data source is recorded in a data table containing a plurality of dimensions (i.e., a plurality of attributes), the data of each dimension has its own value range and data type, and, because each data source serves a different service, the attributes contained in the corresponding data tables also differ. In the related art, data fusion is implemented entirely in code, which results in low flexibility and a large amount of code.
To solve the above problems, the present application provides a data fusion method. For each source table, mapping configuration information corresponding to the source table (including a mapping relationship between source table fields and target table fields and a conversion rule corresponding to each source table field) is determined based on stored target table fields, and the data in the source table is converted into a target table according to the mapping configuration information. Because the target table corresponding to each source table has the same structure as the final target table, the data of the target table fields in each converted target table can be fused into the final target table according to a pre-configured primary key field of the target table, thereby implementing data fusion.
Based on the above description, it can be seen that, by determining the mapping configuration information corresponding to each source table to implement data fusion, the code amount of data fusion can be reduced, and even if the dimension (i.e. the source table field) of the source table changes, only the mapping configuration information needs to be changed, and the code for implementing data fusion does not need to be modified, so that the flexibility of data fusion can be improved.
The technical solution of the present application will be described in detail with specific examples.
Fig. 1A is a flowchart illustrating an embodiment of a data fusion method according to an exemplary embodiment of the present application, where the data fusion method may be applied to an electronic device, and as shown in fig. 1A, the data fusion method includes the following steps:
Step 101: for each source table, determining mapping configuration information corresponding to the source table based on the stored target table fields, wherein the mapping configuration information comprises the mapping relationship between the source table fields and the target table fields and the conversion rules corresponding to the source table fields.
A source table field is a field included in a source table, and a target table field is a field included in a target table. A conversion rule in the mapping configuration information may be, for example, converting a Chinese name to pinyin (e.g., cn2py(name)), extracting the gender from an identity card number, extracting the age from an identity card number, or a data type conversion. The mapping relationship between a source table field and a target table field may be expressed in SQL, for example "City as address from a", which indicates that the source table field City in source table a is mapped to the target table field address.
It should be noted that the source table fields in the source tables may be consistent or inconsistent, but every source table needs to contain the same primary key field, which uniquely identifies one record and is used for the subsequent data fusion; the primary key field is one of the source table fields. For example, the source tables to be fused are a floating population table and a student archive table. The source table fields in the floating population table are identity card, name, age, sex, and address; the source table fields in the student archive table are school, class, student number, name, identity card, and sex. The primary key field that uniquely identifies one record in both tables is the identity card.
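The mapping configuration information described above can be pictured as a small data structure. The following is a minimal sketch only: the class and function names (FieldMapping, MappingConfig, copy_value, gender_from_id_card, age_from_id_card) are hypothetical and do not come from the application, and the two identity card rules assume the common 18-digit resident identity number layout, in which digits 7 to 14 are the birth date and the 17th digit is odd for male.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, List, Optional

# Hypothetical conversion rules; the names are illustrative, not from the source.
def copy_value(value):
    """Default rule: copy the source value unchanged."""
    return value

def gender_from_id_card(id_number: str) -> str:
    # Assumed layout: the 17th digit is odd for male, even for female.
    return "male" if int(id_number[16]) % 2 == 1 else "female"

def age_from_id_card(id_number: str, today: Optional[date] = None) -> int:
    # Assumed layout: digits 7-14 hold the birth date as YYYYMMDD.
    today = today or date.today()
    born = date(int(id_number[6:10]), int(id_number[10:12]), int(id_number[12:14]))
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - int(before_birthday)

@dataclass(frozen=True)
class FieldMapping:
    source_field: str
    target_field: str
    rule: Callable = copy_value  # conversion rule applied to the source value

@dataclass
class MappingConfig:
    source_table: str
    mappings: List[FieldMapping] = field(default_factory=list)

# Example configuration for a hypothetical source table "a".
config = MappingConfig("a", [
    FieldMapping("City", "address"),
    FieldMapping("id_card", "gender", gender_from_id_card),
    FieldMapping("id_card", "age", age_from_id_card),
])
```

Keeping the rules as plain callables means that a change to a source table field only requires editing the configuration, which is the flexibility the description aims at. The identity number used for testing such rules should be a made-up value.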
For the process of determining the mapping configuration information corresponding to the source table based on the stored target table field, reference may be made to the following description of the embodiment shown in fig. 2A, and details of the process are not described here.
Step 102: converting the data of the source table fields in the source table into the target table corresponding to the source table by using the conversion rules corresponding to the source table fields, according to the mapping relationship between the source table fields and the target table fields corresponding to the source table.
Each source table corresponds to a target table, which may be generated using the stored target table fields. It should be noted that the structures of the target tables corresponding to the source tables are consistent, that is, the fields and field types of the target tables are consistent.
In an exemplary scenario, Table 1 shows an exemplary employee source table whose source table fields are name, age, sex, position, and number, and the target table fields in the corresponding target table are name, age, and city. Assume that the mapping configuration information corresponding to Table 1 includes a mapping relationship between the source field name and the target field name, with the conversion rule of converting a Chinese name to pinyin, and a mapping relationship between the source field age and the target field age, with the conversion rule of copying. The target table obtained by conversion according to this mapping configuration information is shown in Table 2.
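Since the mapping relationship may be expressed in SQL (e.g., "City as address from a"), one way to picture the conversion is as a generated SELECT statement. The sketch below is an assumption about how such a statement could be assembled; build_select is a hypothetical helper, and it performs no identifier quoting or validation.

```python
def build_select(source_table, mappings):
    """Build a SELECT statement from (source_expression, target_field) pairs.

    Sketch only: identifiers are inserted as-is, so the inputs are assumed
    to come from trusted configuration rather than from end users.
    """
    columns = ", ".join(f"{expr} AS {target}" for expr, target in mappings)
    return f"SELECT {columns} FROM {source_table}"

# A conversion rule is written as an expression around the source field.
statement = build_select("a", [("City", "address"), ("cn2py(name)", "name")])
print(statement)
```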
| Name | Age | Sex | Position | Number |
| Zhang San | 25 | Male | Teacher | 1 |
| Li Si | 26 | Female | Doctor | 2 |
| Wang Wu | 42 | Female | Doctor | 3 |
TABLE 1
| Name | Age | City |
| Zhangsan | 25 | |
| Lisi | 26 | |
| Wangwu | 42 | |
TABLE 2
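The conversion from Table 1 to Table 2 can be sketched as follows. This is an illustration under stated assumptions: the source names are assumed to be the Chinese names 张三, 李四, and 王五 (the table gives them in translation), cn2py is replaced by a tiny hard-coded lookup rather than a real pinyin library, and convert_table is a hypothetical helper name.

```python
# Stand-in for the cn2py rule: a lookup that covers only the sample names.
PINYIN = {"张": "zhang", "三": "san", "李": "li", "四": "si", "王": "wang", "五": "wu"}

def cn2py(name):
    return "".join(PINYIN[ch] for ch in name).capitalize()

def copy_value(value):
    return value

def convert_table(source_rows, target_fields, mappings):
    """Apply (source_field, target_field, rule) mappings to every source row."""
    target_rows = []
    for row in source_rows:
        # Every target row has the full target structure; unmapped fields stay empty.
        target = {name: None for name in target_fields}
        for source_field, target_field, rule in mappings:
            target[target_field] = rule(row[source_field])
        target_rows.append(target)
    return target_rows

table1 = [
    {"Name": "张三", "Age": 25, "Sex": "Male", "Position": "Teacher", "Number": 1},
    {"Name": "李四", "Age": 26, "Sex": "Female", "Position": "Doctor", "Number": 2},
    {"Name": "王五", "Age": 42, "Sex": "Female", "Position": "Doctor", "Number": 3},
]
mappings = [("Name", "Name", cn2py), ("Age", "Age", copy_value)]
table2 = convert_table(table1, ["Name", "Age", "City"], mappings)
```

The City field has no mapping in this configuration, so it stays empty in every converted row, matching Table 2.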
Step 103: fusing the data contained in the target table fields in each target table obtained by conversion into a final target table according to the pre-configured primary key field of the target table.
In one embodiment, before step 103 is performed, field priorities may be preconfigured as follows: when an externally input priority configuration command is received, the externally selected target table field is determined, and the source table fields corresponding to that target table field are acquired from the mapping configuration information corresponding to each source table and displayed; externally input priority configuration information for the source table fields corresponding to the target table field is then received and stored.
The user can send a priority configuration command by triggering a "configure priority" button. After receiving the command, the device displays the priority configuration page; after the user selects a target table field in the page, the device displays the source table fields corresponding to it so that the user can configure their priorities. In addition, when the source table fields are displayed, the name of the source table to which each source table field belongs and the mapping rule corresponding to each source table field can also be displayed, so that the user can conveniently configure priorities according to actual requirements.
In an exemplary scenario, as shown in fig. 1B, the target table field selected by the user in the priority configuration page is "AGE", which corresponds to the source table fields "AGE", "PATIENT_AGE", and "AGE" in three source tables, respectively. The user configures the priority by using the "top", "up", and "down" buttons corresponding to each source table field; the larger the priority value, the more the data of that source table field is preferred during fusion.
In one embodiment, before step 103 is performed, the primary key field needs to be configured as follows: when an externally input primary key field configuration command is received, the stored target table fields are displayed, and when an externally selected target table field is received, that target table field is determined as the primary key field.
The user can send a primary key field configuration command by triggering a "configure primary key field" button, and the device displays the stored target table fields after receiving the command, so that the user can select a target table field as the primary key field according to actual requirements. Fig. 1C shows an exemplary primary key field configuration page in which the displayed target table fields are "NAME", "AGE", and "EDUBACK"; from these, the user can select one target table field that has a mapping relationship in every source table as the primary key field.
It should be noted that, because the primary key field is used to uniquely identify a record, the user needs to select, from the target table fields, one that has a mapping relationship in the mapping configuration information of every source table as the primary key field, so that it can be used in the subsequent fusion.
In an embodiment, step 103 may proceed as follows. The data contained in the primary key field is obtained from each target table and added to the final target table. Then, for each piece of data contained in the primary key field: if the data appears in only one target table, the record corresponding to the data is obtained from that target table and added to the final target table; if the data appears in a plurality of target tables, the records corresponding to the data are obtained from the plurality of target tables, the data of each target table field other than the primary key field is determined from the obtained records, and the determined data of each target table field is added to the final target table.
The final target table has the same structure as the target tables obtained through conversion. If a piece of data contained in the primary key field appears in only one target table, there is only one record corresponding to the data, and that record is added directly to the final target table. If the data appears in a plurality of target tables, there are a plurality of records corresponding to the data, and the data of each target table field other than the primary key field needs to be determined from the plurality of records and then added, thereby achieving the purpose of fusion.
In an embodiment, the data of each target table field other than the primary key field may be determined from the obtained plurality of records as follows. For each target table field other than the primary key field, the records containing data of that target table field are determined from the obtained plurality of records. When one record is determined, the data of the target table field in that record is taken as the data of the target table field. When a plurality of records are determined, the priority configuration information of the source table fields corresponding to the target table field is determined, the source table field with the highest priority is selected from the priority configuration information, the record corresponding to that source table field is acquired from the determined plurality of records, and the data of the target table field in that record is taken as the data of the target table field. The record corresponding to the source table field is the one located in the target table that corresponds to the source table containing the source table field with the highest priority.
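The constraint above, that a primary key field must have a mapping relationship in every source table, can be checked mechanically before the user makes a selection. The following sketch assumes each mapping configuration is reduced to a dictionary from source field to target field; primary_key_candidates and all field names in the example are hypothetical.

```python
def primary_key_candidates(target_fields, mapping_configs):
    """Return the target fields that are mapped in every source table.

    mapping_configs: {source_table_name: {source_field: target_field}}
    """
    candidates = []
    for target_field in target_fields:
        mapped_everywhere = all(
            target_field in mapping.values() for mapping in mapping_configs.values()
        )
        if mapped_everywhere:
            candidates.append(target_field)
    return candidates

# Illustrative configuration loosely based on the two example source tables.
configs = {
    "floating_population": {"identity_card": "ID_CARD", "name": "NAME", "age": "AGE"},
    "student_archive": {"identity_card": "ID_CARD", "name": "NAME", "school": "SCHOOL"},
}
candidates = primary_key_candidates(["ID_CARD", "NAME", "AGE", "SCHOOL"], configs)
```

The check only establishes that a field is mapped everywhere; whether it actually identifies a record uniquely (as an identity card does and a name may not) still has to be judged by the user.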
In an exemplary scenario, Tables 3 and 4 show two target tables obtained by conversion, in which the target table field id is the primary key field. Assume that the user has configured a priority of 1 for the source table field corresponding to the target table field name in Table 3, and a priority of 2 for the source table field corresponding to the target table field name in Table 4. Since the data 1000 contained in the primary key field id appears in both Table 3 and Table 4, two records are obtained, and the data of the target table fields name, address, and chief must each be determined from these two records. For the target table field name, both records contain name data; since the source table field with the highest priority belongs to the source table corresponding to Table 4, the record corresponding to that source table field is the Table 4 record (name = Hikvision; chief = Hu), and the data of the target table field name is Hikvision. For the target table field address, only one record contains address data, namely the Table 3 record (name = Hik; address = Hangzhou), so the data of the target table field address is Hangzhou. For the target table field chief, only one record contains chief data, namely the Table 4 record (name = Hikvision; chief = Hu), so the data of the target table field chief is Hu. The two records whose primary key field data is 1000 are thus merged into the final target table, as shown in Table 5.
| id | name | address | chief |
| 1000 | Hik | Hangzhou | |
TABLE 3
| id | name | address | chief |
| 1000 | Hikvision | | Hu |
TABLE 4
| id | name | address | chief |
| 1000 | Hikvision | Hangzhou | Hu |
TABLE 5
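The fusion of Tables 3 and 4 into Table 5 can be sketched as follows. This is a simplified illustration, not the claimed implementation: the function name fuse is hypothetical, and priorities are keyed by target table name, which relies on each source table corresponding to exactly one target table. A larger priority value wins, as in the priority configuration described earlier.

```python
def fuse(target_tables, key, priorities):
    """Fuse converted target tables into one final table.

    target_tables: {table_name: [row dict, ...]}, all rows share one structure
    priorities:    {target_field: {table_name: priority}}, larger value wins
    """
    # Group records by primary key value, remembering the originating table.
    grouped = {}
    for table_name, rows in target_tables.items():
        for row in rows:
            grouped.setdefault(row[key], []).append((table_name, row))

    final = []
    for key_value, records in grouped.items():
        if len(records) == 1:
            # The key appears in only one target table: take the record as-is.
            final.append(dict(records[0][1]))
            continue
        merged = {key: key_value}
        for name in (f for f in records[0][1] if f != key):
            # Keep only the records that actually contain data for this field.
            holders = [(t, r) for t, r in records if r.get(name) not in (None, "")]
            if not holders:
                merged[name] = None
            elif len(holders) == 1:
                merged[name] = holders[0][1][name]
            else:
                ranking = priorities.get(name, {})
                best = max(holders, key=lambda item: ranking.get(item[0], 0))
                merged[name] = best[1][name]
        final.append(merged)
    return final

table3 = [{"id": 1000, "name": "Hik", "address": "Hangzhou", "chief": None}]
table4 = [{"id": 1000, "name": "Hikvision", "address": None, "chief": "Hu"}]
table5 = fuse({"table3": table3, "table4": table4}, "id",
              {"name": {"table3": 1, "table4": 2}})
```

When several records hold data for a field but no priority has been configured, this sketch falls back to the first record encountered; the description does not specify that case.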
In the embodiment of the present application, for each source table, mapping configuration information corresponding to the source table (including a mapping relationship between source table fields and target table fields and a conversion rule corresponding to each source table field) is determined based on stored target table fields, and the data in the source table is converted into a target table according to the mapping configuration information. Because the target table corresponding to each source table has the same structure as the final target table, the data of the target table fields in each converted target table can be fused into the final target table according to the pre-configured primary key field of the target table, thereby implementing data fusion.
Based on the above description, it can be seen that, by determining the mapping configuration information corresponding to each source table to implement data fusion, the code amount of data fusion can be reduced, and even if the dimension (i.e. the source table field) of the source table changes, only the mapping configuration information needs to be changed, and the code for implementing data fusion does not need to be modified, so that the flexibility of data fusion can be improved.
Fig. 2A is a flowchart of another data fusion method according to an exemplary embodiment of the present application, and based on the embodiment shown in fig. 1A, this embodiment exemplarily illustrates how to determine mapping configuration information corresponding to each source table for each source table. As shown in fig. 2A, the data fusion method further includes the following steps:
Step 201: acquiring and displaying the source table fields in the source table and the stored target table fields.
In an embodiment, when the source table field in the source table is acquired, the data type of the source table field may also be determined, and the source table field and the data type are output and displayed. In addition, the data type of the target table field may be displayed along with the stored target table field.
The data type may include a character type, an integer type, and the like. Because the target table field is stored locally, the structure of the corresponding target table can be automatically completed for each source table without editing by a user.
Step 202: receiving and storing the externally input mapping relationship between the source table fields and the target table fields and the conversion rules corresponding to the source table fields.
In an embodiment, after receiving the mapping relationship between the source table field and the target table field and the conversion rule corresponding to the source table field, it may be detected whether the data type of the source table field is consistent with the data type of the target table field, and if not, prompt information indicating that the data types are inconsistent is output, so that a user may modify the data type of the target table field by using the data type output by the conversion rule, or continue to input the conversion rule for data type conversion.
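The data type consistency check described above can be sketched as a small function. The name check_types, the type labels, and the wording of the prompt are assumptions made for illustration; the description only states that a prompt is output when the types are inconsistent.

```python
from typing import Optional

def check_types(source_type: str, target_type: str,
                rule_output_type: Optional[str] = None) -> Optional[str]:
    """Return a prompt message if the mapped types are inconsistent, else None.

    rule_output_type is the type produced by the conversion rule, if the rule
    changes the type; otherwise the source field type is compared directly.
    """
    produced = rule_output_type or source_type
    if produced == target_type:
        return None
    return (f"Data type mismatch: the mapping produces {produced!r} but the "
            f"target field expects {target_type!r}. Change the target field "
            f"type or add a type conversion rule.")

# A character-type source field mapped to an integer-type target field.
message = check_types("string", "integer")
```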
In an exemplary scenario, as shown in fig. 2B, the source table list column contains two source tables, and the user may switch between them to configure the mapping configuration information of each source table. The source table display column displays the source table fields and data types of the currently selected source table; the target table display column displays the stored target table fields and data types. After configuration is completed, the background generates a target table corresponding to the currently selected source table, into which the data in the source table is subsequently converted. Connecting the source table field NAME to the target table field NAME triggers the generation of a mapping relationship between the two, and the conversion rule cn2py(NAME) is entered in the edit box of the connecting line. Connecting the source table field RN to the target table field AGE triggers the generation of a mapping relationship between the two, and the conversion rule sfz_to_birthday_convert(RN) is entered in the edit box of the connecting line, which represents the conversion rule for extracting the age from the identity card.
It should be noted that, for each source table, after the mapping configuration information is configured, a mapping test may be performed to verify whether the mapping configuration information meets the actual requirement, and the implementation process may be: randomly acquiring a preset amount of data from a source table, outputting and displaying the data, converting the acquired data into a target table according to mapping configuration information corresponding to the source table when an externally input test instruction is received, and displaying the converted target table for comparison and verification by a user.
In another exemplary scenario, as shown in fig. 2C, the left area of fig. 2C shows data of the source table 1, and after the user triggers the "sample test" button, the data of the target table 1 shown in the right area of fig. 2C is obtained, and it can be verified whether the mapping configuration information of the source table 1 meets the actual requirement by comparing the source table 1 with the target table 1.
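The mapping test can be sketched as drawing a random sample from the source table and converting only that sample for side-by-side comparison. The helper name sample_test, the seed parameter (added so that a run can be repeated), and the sample data are assumptions for illustration.

```python
import random

def sample_test(source_rows, mappings, sample_size, seed=None):
    """Convert a random sample of source rows for manual verification.

    mappings: list of (source_field, target_field, rule) tuples.
    Returns the sampled source rows and their converted counterparts.
    """
    rng = random.Random(seed)
    sample = rng.sample(source_rows, min(sample_size, len(source_rows)))
    converted = [
        {target: rule(row[source]) for source, target, rule in mappings}
        for row in sample
    ]
    return sample, converted

# Hypothetical source data and a trivial upper-casing rule.
rows = [{"NAME": f"user{i}", "RN": str(i)} for i in range(5)]
sampled, preview = sample_test(rows, [("NAME", "NAME", str.upper)], 3, seed=0)
for before, after in zip(sampled, preview):
    print(before, "->", after)
```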
This completes the flow of fig. 2A, by which the mapping configuration information of a source table can be configured.
Fig. 3 is a hardware block diagram of an electronic device according to an exemplary embodiment of the present application. The electronic device includes a communication interface 301, a processor 302, a machine-readable storage medium 303, and a bus 304, wherein the communication interface 301, the processor 302, and the machine-readable storage medium 303 communicate with each other via the bus 304. The processor 302 may execute the data fusion method described above by reading and executing machine-executable instructions, stored in the machine-readable storage medium 303, that correspond to the control logic of the data fusion method. The specific content of the method is described in the above embodiments and is not repeated here.
The machine-readable storage medium 303 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be volatile memory, non-volatile memory, or a similar storage medium. In particular, the machine-readable storage medium 303 may be a RAM (Random Access Memory), a flash memory, a storage drive (e.g., a hard drive), any type of storage disk (e.g., an optical disk, a DVD, etc.), a similar storage medium, or a combination thereof.
Fig. 4 is a block diagram of an embodiment of a data fusion device according to an exemplary embodiment of the present application, and as shown in fig. 4, the data fusion device includes:
a determining module 410, configured to determine, for each source table, mapping configuration information corresponding to the source table based on a stored target table field, where the mapping configuration information includes a mapping relationship between the source table field and the target table field and a conversion rule corresponding to the source table field;
a conversion module 420, configured to convert, according to the mapping relationship between the source table field and the target table field, data in the source table field into a target table corresponding to the source table by using the conversion rule corresponding to the source table field;
a fusion module 430, configured to fuse, according to a primary key field of a preconfigured target table, data of the target table field in each target table obtained through conversion into a final target table;
wherein the structures of the target table corresponding to each source table and the final target table are consistent.
In an optional implementation manner, the determining module 410 is specifically configured to obtain and display a source table field in the source table and a stored target table field; and receive and store the externally input mapping relationship between the source table field and the target table field and the conversion rule corresponding to the source table field.
In an alternative implementation, the apparatus further comprises (not shown in fig. 4):
a priority configuration module, configured to, before the fusion module fuses the data contained in the target table fields in each target table obtained by conversion into the final target table according to the primary key field of the preconfigured target table, determine an externally selected target table field when an externally input priority configuration command is received, and acquire and display the source table fields corresponding to the target table field from the mapping configuration information corresponding to each source table; and receive and store externally input priority configuration information for the source table fields corresponding to the target table field.
In an optional implementation manner, the fusion module 430 is specifically configured to obtain data included in the primary key field from each target table, and add the obtained data to the final target table; for each piece of data contained in the primary key field, if the data appears in only one target table, acquire the record corresponding to the data from the target table, and add the record corresponding to the data to the final target table; and if the data appears in a plurality of target tables, acquire the records corresponding to the data from the plurality of target tables, determine data of each target table field except the primary key field from the acquired records, and add the determined data of each target table field corresponding to the data to the final target table.
In an optional implementation manner, the fusion module 430 is further specifically configured to, in the process of determining data of each target table field except the primary key field from the obtained multiple records, determine, for each target table field except the primary key field, the records including data of the target table field from the multiple records; when one record is determined, determine the data of the target table field in the record as the data of the target table field; when a plurality of records are determined, determine priority configuration information of the source table fields corresponding to the target table field, select the source table field with the highest priority from the priority configuration information, acquire the record corresponding to that source table field from the determined plurality of records, and determine the data of the target table field in the record as the data of the target table field; the record corresponding to the source table field is located in the target table corresponding to the source table where the source table field with the highest priority is located.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.