Detailed Description
The technical solution of the present invention generally provides a method, an apparatus, a system, and a computer-readable storage medium for performing data acquisition via a data service system. Existing data acquisition can only match a data source one-to-one through specific interface code. By contrast, the invention provides a technical solution that automatically identifies the source and recommends the optimal acquisition template. When a user does not know which acquisition template to use for a particular data acquisition, the solution provides template data as a reference for automatic matching. In some aspects, the user may also attempt acquisition based on the probability of a match.
Furthermore, the acquisition template can automatically perform data acquisition, extraction, cleaning, verification, warehousing, and similar operations, which greatly reduces labor cost. Extraction is needed because the acquired data may have various structures and types. The extraction process converts such complex data into a single type, or into a type that is convenient to process, so that it can be analyzed and processed quickly. Cleaning is needed because much of the data is not valuable: some of it is simply not of interest, and some is erroneous or irrelevant. Such useless data can be removed by a filtering operation, leaving the valid data.
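The extraction and cleaning steps described above can be illustrated with a minimal sketch. It assumes records arrive as Python dicts with heterogeneous, possibly nested values. Extraction flattens each record and converts every value to a string, which serves here as the "single type". Cleaning drops records whose required fields are empty and removes exact duplicates. The field names and the `required` rule are hypothetical examples, not the template format of the invention.

```python
from typing import Any, Iterable


def extract(record: dict[str, Any]) -> dict[str, str]:
    """Flatten nested dicts and convert every value to a string (one easy-to-process type)."""
    flat: dict[str, str] = {}

    def walk(prefix: str, value: Any) -> None:
        if isinstance(value, dict):
            for key, inner in value.items():
                walk(f"{prefix}.{key}" if prefix else key, inner)
        elif value is None:
            flat[prefix] = ""
        else:
            flat[prefix] = str(value).strip()

    walk("", record)
    return flat


def clean(records: Iterable[dict[str, Any]], required: set[str]) -> list[dict[str, str]]:
    """Keep records whose required fields are non-empty; drop exact duplicates."""
    seen: set[tuple[tuple[str, str], ...]] = set()
    result: list[dict[str, str]] = []
    for record in records:
        flat = extract(record)
        if any(not flat.get(field) for field in required):
            continue  # erroneous or irrelevant record: a required field is missing or empty
        key = tuple(sorted(flat.items()))
        if key in seen:
            continue  # duplicate record adds no value
        seen.add(key)
        result.append(flat)
    return result


# Hypothetical raw records: one valid, one missing its account, one duplicate.
raw = [
    {"account": 1001, "balance": {"debit": 50, "credit": 0}},
    {"account": None, "balance": {"debit": 10, "credit": 5}},
    {"account": 1001, "balance": {"debit": 50, "credit": 0}},
]
print(clean(raw, required={"account"}))
```

Converting everything to strings is only one possible target type. A real template would more likely carry per-field types and validation rules.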
The technical solution of the present invention and various embodiments thereof will be described in detail below with reference to the accompanying drawings.
FIG. 1 is a functional block diagram illustrating a data service system 100 according to an embodiment of the present invention. As shown in FIG. 1, the data service system 100 can be divided by function and role into a data layer 110 and an application layer 120. The data layer can be used to identify and save data. In one or more embodiments, the application layer may be further divided by function and role into three functional blocks: task management 122, analysis tool 124, and system management 126. Each of these functional blocks is described in detail below.
First, task management runs through the entire data analysis process. Its operations may include, but are not limited to, creating, viewing, deleting, importing, exporting, and sharing tasks. The content of a task may include data connection, extraction configuration, analysis configuration, template identification, code table identification, log table identification, table field identification, automatic analysis relationship identification, data tag identification, data processing configuration, task start-up, and identification of the task logs related to establishing table relationships. In one or more embodiments, the results of a completed task may be presented by adding tags to the data or by building table relationships.
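The task operations listed above (create, view, delete, export, import, and share) can be sketched as follows. This is a minimal illustration that assumes tasks are shared by exporting them as JSON. The `Task` fields are a hypothetical subset of the task content named above, and `TaskManager` is an illustrative name rather than a component defined by the invention.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Task:
    # Hypothetical task record; the field names are illustrative only.
    name: str
    data_connection: str
    extraction_config: dict = field(default_factory=dict)
    tags: list[str] = field(default_factory=list)


class TaskManager:
    """In-memory task registry supporting create/view/delete/export/import."""

    def __init__(self) -> None:
        self._tasks: dict[str, Task] = {}

    def create(self, task: Task) -> None:
        if task.name in self._tasks:
            raise ValueError(f"task {task.name!r} already exists")
        self._tasks[task.name] = task

    def view(self, name: str) -> Task:
        return self._tasks[name]

    def delete(self, name: str) -> None:
        del self._tasks[name]

    def export_task(self, name: str) -> str:
        # Serialize to JSON so the task can be shared with, or imported by, another user.
        return json.dumps(asdict(self._tasks[name]), sort_keys=True)

    def import_task(self, payload: str) -> Task:
        task = Task(**json.loads(payload))
        self.create(task)
        return task


# Usage: export a task from one manager and import it into another ("sharing").
source = TaskManager()
source.create(Task("finance_ledger", "db://finance/ledger", {"tables": ["bd_account"]}, ["balance"]))
shared = source.export_task("finance_ledger")
receiver = TaskManager()
imported = receiver.import_task(shared)
print(imported == source.view("finance_ledger"))
```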
Second, the main function of the analysis tool may be to re-analyze the results of the automatic execution (e.g., table relationships) after it completes. In one or more embodiments, the re-analysis may include, but is not limited to: filtering empty tables, filtering empty fields, data table analysis, table field analysis, table relationship analysis, table field retrieval, and table field value retrieval. Through this re-analysis, the solution can verify the accuracy of the automatic analysis and can analyze data table relationships, field value annotations, and the like in greater depth.
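The first two re-analysis operations, filtering empty tables and filtering empty fields, can be sketched against an SQLite database using Python's standard `sqlite3` module. SQLite serves only as a convenient stand-in for whatever database the data layer holds. Treating NULL and blank strings as "empty" is an assumption made for this sketch. Table and column names are quoted but assumed to come from the catalog, not from untrusted input.

```python
import sqlite3


def non_empty_tables(conn: sqlite3.Connection) -> list[str]:
    """Names of user tables that contain at least one row."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return [n for n in names if conn.execute(f'SELECT 1 FROM "{n}" LIMIT 1').fetchone() is not None]


def non_empty_fields(conn: sqlite3.Connection, table: str) -> list[str]:
    """Columns of `table` holding at least one value that is neither NULL nor blank."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    keep = []
    for col in columns:
        query = f"""SELECT 1 FROM "{table}" WHERE "{col}" IS NOT NULL AND TRIM(CAST("{col}" AS TEXT)) <> '' LIMIT 1"""
        if conn.execute(query).fetchone() is not None:
            keep.append(col)
    return keep


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (code TEXT, name TEXT, memo TEXT);
    CREATE TABLE empty_log (id INTEGER);
    INSERT INTO account VALUES ('1001', 'Cash', NULL), ('1002', 'Bank', '');
""")
print(non_empty_tables(conn))             # ['account']
print(non_empty_fields(conn, "account"))  # ['code', 'name']
```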
Finally, the main function of system management relates to user management operations (including user login). In one or more embodiments, the functions of user management may include, but are not limited to, message reminders, viewing operation logs, changing the login password, switching the logged-in user, and viewing help documents during task execution. Subsequent updates and maintenance of the data semantic library and the industry semantic library may also be performed in system management. Related system settings and database settings may be made there as well.
The data service system of the present invention has been described above with reference to FIG. 1. The technical solution of the present invention may be implemented between the data layer and the application layer of this system. It collects data tables from the databases in the data layer into the application layer for analysis and processing.
FIG. 2 is a block diagram illustrating a data service system 200 according to an embodiment of the present invention. As shown in FIG. 2, the data service system 200 may include, among other things, a data collection apparatus 210, a source database 220, and a target database 230. According to an embodiment of the present invention, the number of source databases 220 is not limited. There may be a plurality of databases (e.g., a first database, a second database, ..., an Mth database, where M is a positive integer), each of a different type and/or with a different data structure, providing data of different types or data structures. In one embodiment, valid data from the different source databases, which generally have different data structures, are collected into the data collection apparatus according to the data processing requirements. To achieve such efficient data collection, the data collection apparatus may establish one or more data transmission paths between the source databases and the target database (e.g., via various types of data transmission techniques). Further, the data acquisition apparatus of the present invention includes a plurality of acquisition templates, such as the first acquisition template, the second acquisition template, ..., and the Nth acquisition template shown in the figure, where N is a positive integer. Data tables with different data structures can thus be acquired through the plurality of acquisition templates simultaneously or in a certain order. The acquired data tables are processed and then transmitted to the target database.
In some embodiments, depending on functions and requirements, the target database of the present invention may be implemented using Redis (Remote Dictionary Server). Redis supports data persistence: data held in memory can be saved to disk and loaded again for use after a restart. It should be understood, however, that the target database is not limited to Redis. Any database management system that provides safe and reliable storage for structured data may be used as the target database of the present invention.
To further illustrate how the data service system operates, the specific components of the data acquisition apparatus, and the way it processes data with the source database and the target database, are described below.
FIG. 3 is a block diagram illustrating a data service system 300 according to an embodiment of the present invention. As shown in FIG. 3, the data service system 300 includes a data acquisition apparatus 310, a source database 320, and a target database 330. According to an embodiment of the invention, the data acquisition apparatus 310 may include a transmission module 312, one or more acquisition templates 314, an identification module 316, and a matching module 318. As previously mentioned, the source database may be a plurality of databases of different types and data structures, providing data of a plurality of different structures or types. In one implementation, the source database may transmit data with different structures to the target database through the data acquisition apparatus, over the data transmission path established by the transmission module.
To transmit data between the source database and the target database, the data acquisition apparatus may use the identification module to identify characteristics of the source database holding the data to be acquired. These characteristics may include the type or the data structure of the source database. Alternatively or in addition to identifying the characteristics of a source database directly, the identification module may determine the type of the source database from a data backup file, for example a file pre-stored by a user. Thus, when the identification module cannot identify the characteristics of the source database directly, it can identify them from the data structure of the data backup file. Once identification is complete, the data acquisition apparatus uses the matching module to match the identified source database with an appropriate acquisition template. In the solution of the invention, each acquisition template can be matched with the characteristics of one or more source databases. The data of different source databases can therefore be acquired by their matched acquisition templates, and the acquired valid data is transmitted to the target database through the transmission module.
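The matching module's task can be sketched as scoring each candidate template against the identified characteristics, which can also yield the "probability of a match" mentioned at the start of this description. This is a minimal illustration under stated assumptions. A characteristic is modeled as a database type plus a set of column names. The score is simply the share of a template's expected columns found in the source. `Template`, `match_probability`, the type labels, and the column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical template record: the database type it targets and the columns it expects."""
    name: str
    db_type: str
    columns: frozenset


def match_probability(template: Template, db_type: str, columns: set) -> float:
    """Share of the template's expected columns present in the source; 0.0 if the type differs."""
    if template.db_type != db_type or not template.columns:
        return 0.0
    return len(template.columns & columns) / len(template.columns)


def recommend(templates, db_type, columns):
    """Return (template name, probability) pairs for all candidates, best match first."""
    scored = [(t.name, match_probability(t, db_type, columns)) for t in templates]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


TEMPLATES = [
    Template("ledger_v1", "typeA", frozenset({"code", "title", "balance"})),
    Template("ledger_v2", "typeA", frozenset({"code", "title", "balance", "currency"})),
    Template("budget", "typeB", frozenset({"item", "budget"})),
]
SOURCE_COLUMNS = {"code", "title", "balance", "memo"}
print(recommend(TEMPLATES, "typeA", SOURCE_COLUMNS))  # [('ledger_v1', 1.0), ('ledger_v2', 0.75)]
```

Returning a ranked list, rather than only the top hit, leaves room for a user who is unsure which template to use to try the next candidate.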
In the invention, an acquisition template can be configured according to the type or data structure of the source database, and data of the same type can be acquired through a single template. When the structure of the source data changes, only the template settings need to be adjusted, or the acquisition template replaced, to adapt to the change. The application program does not need to be redeveloped. This makes the solution highly extensible and convenient to use, and it can effectively reduce development and maintenance costs. Likewise, when different system software or new data structures are added, an acquisition template can simply be added without modifying the code, or with only minor adjustments to the application code.
Furthermore, the plurality of acquisition templates can be extended as required. For example, a new acquisition template can be created in two steps. First, the matching module determines in advance the primary key/foreign key relationships among a plurality of data tables, or the association between a data table and a database. Second, those relationships or associations (for example, the format of the data table and the corresponding database type) are entered into the newly created template. Thereafter, when the type of the source database or the data structure within it changes, only the acquisition template corresponding to that database needs to be adjusted; no new template has to be created. This makes the acquisition templates of the invention more flexible and effective in use.
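Deriving a new template from primary key/foreign key relationships, as described above, can be sketched for an SQLite source. The sketch uses the `PRAGMA foreign_key_list` statement, which SQLite provides for reading declared foreign keys. The dict-based template layout (`tables`, `relations`) and the `build_template` helper are hypothetical. Other database systems expose the same information through their own catalogs, for example `information_schema`.

```python
import sqlite3


def build_template(conn: sqlite3.Connection, name: str, db_type: str) -> dict:
    """Create a hypothetical template description from the tables and their foreign keys."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    relations = []
    for table in tables:
        # Each row: (id, seq, parent table, from column, to column, on_update, on_delete, match)
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            relations.append({"child": table, "column": row[3],
                              "parent": row[2], "parent_column": row[4]})
    return {"name": name, "db_type": db_type, "tables": tables, "relations": relations}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (code TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE balance (
        id INTEGER PRIMARY KEY,
        account_code TEXT REFERENCES account(code),
        amount REAL
    );
""")
template = build_template(conn, "ledger_template", "sqlite")
print(template["relations"])
```

When the source schema later changes, rerunning the same extraction and updating the stored template corresponds to "adjusting" the template rather than writing new acquisition code.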
In addition, each source database may store a large amount of data. If all the data in the source database were transmitted to the target database, the load on the target database would be excessive and subsequent data analysis would become more difficult. With the acquisition template of the present invention, however, only the valid data is acquired from the source database, rather than all of the data. The acquisition template therefore saves data acquisition time and also allows the data to be used more effectively.
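One way to acquire only valid data is to push the template's column list and validity condition down to the source as a query, instead of copying whole tables. The sketch below assumes this approach and uses SQLite. The `collect_valid` helper, the table, and the "non-zero amount" rule are hypothetical. The condition string is treated as trusted template configuration, while runtime values go through `params` placeholders.

```python
import sqlite3


def collect_valid(conn, table, columns, condition, params=()):
    """Fetch only the columns and rows a template asks for, not the whole table."""
    column_list = ", ".join(f'"{c}"' for c in columns)
    sql = f'SELECT {column_list} FROM "{table}" WHERE {condition}'
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balance (id INTEGER PRIMARY KEY, account_code TEXT, amount REAL, memo TEXT);
    INSERT INTO balance (account_code, amount, memo) VALUES
        ('1001', 50.0, 'opening'), ('1002', 0.0, NULL),
        ('1003', NULL, 'draft'), ('1004', -20.5, NULL);
""")
valid = collect_valid(conn, "balance", ["account_code", "amount"],
                      "amount IS NOT NULL AND amount <> 0")
total = conn.execute("SELECT COUNT(*) FROM balance").fetchone()[0]
print(f"collected {len(valid)} of {total} rows")  # collected 2 of 4 rows
```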
FIG. 4 is a flow diagram illustrating a data acquisition method 400 according to an embodiment of the invention. As mentioned above, data collection transmits valid data collected from a source database to a target database. Because of differences in system design, usage habits, and versions, databases differ in type or data structure, so the data in each database needs a corresponding collection template. The flow of the data acquisition method is described below with reference to FIG. 4.
As shown in FIG. 4, at step 401, the method 400 establishes a direct data transmission path between the source database and the target database. Collected data (including the aforementioned valid data) can be transmitted through this path from the source database to the target database. Next, at step 402, the method 400 identifies characteristics of the source database of the data to be collected, as a reference for searching for and matching a collection template. Here, the characteristics of the source database may include its type or data structure.
At step 403, the method 400 determines, based on the characteristics of the source database, an acquisition template that matches the source database. As previously described, each acquisition template of the present invention may be matched to the type or data structure of one or more source databases. At step 404, the method 400 collects valid data from the corresponding source database through the matched collection template. It then transmits the collected valid data from the source database to the target database via the transmission path.
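Steps 401 to 404 of the method 400 can be traced end to end in a compact sketch. It makes several simplifying assumptions. Both databases are SQLite, and the "transmission path" of step 401 is simply the pair of open connections. The characteristic identified in step 402 is the table's set of column names. A template matches in step 403 when all of its required columns are present. The template names and contents are hypothetical.

```python
import sqlite3

# Hypothetical templates: columns a source table must have, and columns to collect.
TEMPLATES = {
    "ledger_template": {"requires": {"code", "amount"}, "collect": ["code", "amount"]},
    "budget_template": {"requires": {"item", "budget"}, "collect": ["item", "budget"]},
}


def identify_characteristics(conn, table):
    """Step 402: model the table's characteristics as its set of column names."""
    return {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}


def match_template(characteristics):
    """Step 403: first template whose required columns are all present."""
    for name, tpl in TEMPLATES.items():
        if tpl["requires"] <= characteristics:
            return name
    raise LookupError("no matching acquisition template")


def acquire(source, target, table):
    # Step 401 (assumed): the open source/target connections act as the transmission path.
    name = match_template(identify_characteristics(source, table))
    cols = TEMPLATES[name]["collect"]
    # Step 404: collect the template's columns and transmit them to the target.
    rows = source.execute(f'SELECT {", ".join(cols)} FROM "{table}"').fetchall()
    target.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({", ".join(cols)})')
    target.executemany(f'INSERT INTO "{table}" VALUES ({", ".join("?" * len(cols))})', rows)
    target.commit()
    return name, len(rows)


source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE gl (code TEXT, amount REAL, memo TEXT);
    INSERT INTO gl VALUES ('1001', 50.0, 'x'), ('1002', 12.5, NULL);
""")
target = sqlite3.connect(":memory:")
result = acquire(source, target, "gl")
print(result)  # ('ledger_template', 2)
```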
In some embodiments, in addition to data collected from the source database, some data may come from a data backup file pre-stored by the user. In that case, a matching template may not be found directly by identifying the type of the source database. The identification module can nevertheless identify the type of the source database from which the backup file originates, according to the structure (or type) of the backup file. It can then find the matching collection template from the data structure of that data source. The solution of the invention can therefore automatically collect valid data from the source database into the specified target database through the collection template.
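Identifying the originating database type of a backup file can be sketched as a check on the file's leading bytes. The signatures used here are widely used markers: the header of an SQLite 3 database file, the `PGDMP` prefix of a pg_dump custom-format archive, and the first comment line of plain-text mysqldump output. The signature table and `identify_backup` are nonetheless a hypothetical, far-from-complete heuristic, not the identification method of the invention. The returned type would then be used to look up a matching template.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical signature table mapping leading bytes to a database type.
SIGNATURES = [
    (b"SQLite format 3\x00", "sqlite"),  # header of an SQLite 3 database file
    (b"PGDMP", "postgresql"),            # pg_dump custom-format archive
    (b"-- MySQL dump", "mysql"),         # first line of plain-text mysqldump output
]


def identify_backup(path: Path) -> Optional[str]:
    """Guess the source database type of a backup file from its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(64)
    for magic, db_type in SIGNATURES:
        if head.startswith(magic):
            return db_type
    if b"PostgreSQL database dump" in head:  # plain-text pg_dump output
        return "postgresql"
    return None  # unknown: fall back to manual template selection


with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "ledger.db"
    conn = sqlite3.connect(db_file)
    conn.execute("CREATE TABLE account (code TEXT)")
    conn.commit()
    conn.close()
    dump_file = Path(tmp) / "ledger.sql"
    dump_file.write_bytes(b"-- MySQL dump 10.13\n")
    other_file = Path(tmp) / "notes.txt"
    other_file.write_bytes(b"hello")
    results = {p.name: identify_backup(p) for p in (db_file, dump_file, other_file)}

print(results)  # {'ledger.db': 'sqlite', 'ledger.sql': 'mysql', 'notes.txt': None}
```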
In some application scenarios, a large amount of valid data must be collected for a particular project or system, so data from several different source databases may need to be collected simultaneously. To illustrate how the invention applies in such a scenario, an exemplary method for collecting the data tables of two different financial systems is described below with reference to FIGS. 5 and 6. This description is exemplary and is not intended to limit the acquisition scheme of the present invention.
FIG. 5 is a diagram 500 illustrating data table structures according to one embodiment of the invention. As shown in FIG. 5, the A database and the B database may belong to two different financial systems; for example, they may be databases of completely different types with different table structures. As the table structure diagram 500 shows, the data table structure 510 of the A database differs from the data table structure 520 of the B database, so the table field names of the two data tables are written completely differently. When data acquisition is required, two different acquisition templates can be configured for the two databases. Further, each template may be configured with matching fields for comparison with the fields of a data table. When a collection template is configured, these settings can be specified through a query statement. For example, to collect the fields "abo.bd_accaroa", "abo.bd_acchart", and "abo.bd_account" in the data table of the A database, the fields "accaroa", "acchart", and "account" may be entered as setting conditions in the corresponding collection template. The matching fields configured in the template using SQL syntax are as follows:
Similarly, to collect the field "bud_subset_ICOME" in the data table of the B database, it suffices to enter "subset_ICOME" as the setting condition in the collection template. With these settings, the required data of the A database and of the B database can each be acquired through the corresponding acquisition template. The acquired data is then transmitted to the data table 530 of the target database and classified into the same subset, with the field name "accounting management data_subject balance table". When the user wants to analyze this type of data, all of the related data can be retrieved at once from the "accounting management data_subject balance table" subset alone.
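The effect of the setting conditions in this example can be sketched without a database. The sketch assumes a name matches when it ends with one of the template's setting conditions, compared case-insensitively. Every selected table from either database is then filed under the common subset name. The two distractor tables ("abo.bd_voucher", "bud_plan") and the helper names are hypothetical. In SQL, a similar filter could use a `LIKE '%account'` predicate, but note that `_` is itself a wildcard in `LIKE` patterns.

```python
# Hypothetical table listings of the A and B databases (example names plus two distractors).
A_TABLES = ["abo.bd_accaroa", "abo.bd_acchart", "abo.bd_account", "abo.bd_voucher"]
B_TABLES = ["bud_subset_ICOME", "bud_plan"]

SUBSET = "accounting management data_subject balance table"


def matching_tables(table_names, setting_conditions):
    """Names ending with one of the template's setting conditions (case-insensitive)."""
    conditions = [c.lower() for c in setting_conditions]
    return [name for name in table_names if any(name.lower().endswith(c) for c in conditions)]


def classify(selections):
    """Group every selected (database, table) pair under the common target subset."""
    return {SUBSET: [(db, name) for db, names in selections for name in names]}


selected_a = matching_tables(A_TABLES, ["accaroa", "acchart", "account"])
selected_b = matching_tables(B_TABLES, ["subset_ICOME"])
target = classify([("A", selected_a), ("B", selected_b)])
print(target[SUBSET])
```

Suffix matching is deliberately loose. For instance, "account" would also match a hypothetical "bd_subaccount", so a production template would likely need stricter rules.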
In some embodiments, the matching fields may be configured in the collection template using Structured Query Language (SQL). SQL can be used to execute queries against a database, retrieve data from a database, insert new records, update data, delete records, create new databases, create new tables, create stored procedures, create views, and set permissions on tables, stored procedures, and views.
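Several of the SQL operations listed above can be shown in a short, runnable sketch that uses Python's built-in `sqlite3` module. SQLite is chosen only because it needs no server. It does not support stored procedures or permission management (`GRANT`), so those two operations are omitted. The table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Create a table.
cur.execute("CREATE TABLE account (code TEXT PRIMARY KEY, title TEXT, balance REAL)")
# Insert new records.
cur.executemany("INSERT INTO account VALUES (?, ?, ?)",
                [("1001", "Cash", 50.0), ("1002", "Bank", 120.0), ("9999", "Temp", 0.0)])
# Update data.
cur.execute("UPDATE account SET balance = balance + 10 WHERE code = ?", ("1001",))
# Delete records.
cur.execute("DELETE FROM account WHERE balance = 0")
# Create a view.
cur.execute("CREATE VIEW positive_balance AS SELECT code, balance FROM account WHERE balance > 0")
# Query (retrieve) data through the view.
rows = cur.execute("SELECT code, balance FROM positive_balance ORDER BY code").fetchall()
conn.commit()
print(rows)  # [('1001', 60.0), ('1002', 120.0)]
```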
FIG. 6 is a flow chart illustrating a data acquisition method 600 according to another embodiment of the invention. As shown in FIG. 6, at step 601, when data of the A database and the B database are to be collected into the target database at the same time, the method 600 may establish one data transmission path between the A database and the target database and another between the B database and the target database. Next, at step 602, the method 600 identifies characteristics of the A and B databases, for example by using the identification module of the data acquisition apparatus. Here, the characteristics of the A database and the B database may include their types or data structures. In one embodiment, the A and B databases may be of completely different types. The first data table (in the A database) and the second data table (in the B database) may also differ in type and structure. For this reason, a single acquisition template cannot acquire the data in both the first data table and the second data table.
After the characteristics of the databases have been identified, at step 603, the method 600 may determine the acquisition templates that match the A and B databases based on their types and structures (or based on the different types and structures of the first and second data tables). According to an aspect of the invention, such acquisition templates may be established by training on a large amount of data, and each acquisition template may have a type corresponding to the type of data to be acquired. After the identification operation of the method 600, the A database may be matched to the first acquisition template and, similarly, the B database to the second acquisition template. The method 600 may then, according to the requirements of the target database, collect the required valid data from the first data table of the A database using the first collection template and from the second data table of the B database using the second collection template. For example, at step 604_1, the method 600 collects a first subset of the first data table from the A database. Similarly, at step 604_2, the method 600 may collect a second subset of the second data table from the B database.
After these data collection operations are complete, at step 605, the method 600 may transmit the collected first and second subsets to the designated target database via the data transmission paths. Further, in one or more embodiments, the collected data may also be stored in a specific data table to facilitate subsequent analysis and further processing. The technical solution of the present invention thus makes it possible to acquire data from different systems (for example, the different financial systems shown in FIG. 5) and store it in the same target database using a single system that includes two or more acquisition templates.
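The concurrent part of the method 600 (steps 604_1 and 604_2 followed by step 605) can be sketched with Python's `concurrent.futures`. This sketch rests on several assumptions. The A and B databases are stood in for by in-memory dicts. Each template names one table and maps that database's column names onto common target columns. The target is a dict keyed by the subset name. All column names and the `run_method_600` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

SUBSET = "accounting management data_subject balance table"

# Hypothetical stand-ins for the A and B databases: table name -> list of row dicts.
DB_A = {"abo.bd_account": [{"acc_code": "1001", "acc_bal": 50.0},
                           {"acc_code": "1002", "acc_bal": 0.0}]}
DB_B = {"bud_subset_ICOME": [{"subj": "6001", "amt": 30.0}]}

# Each hypothetical template names its table and maps source columns to common target columns.
FIRST_TEMPLATE = {"table": "abo.bd_account", "mapping": {"acc_code": "subject", "acc_bal": "balance"}}
SECOND_TEMPLATE = {"table": "bud_subset_ICOME", "mapping": {"subj": "subject", "amt": "balance"}}


def collect_subset(db, template):
    """Collect one subset: read the template's table and rename columns via its mapping."""
    mapping = template["mapping"]
    return [{mapping[k]: v for k, v in row.items() if k in mapping}
            for row in db[template["table"]]]


def run_method_600(target):
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(collect_subset, DB_A, FIRST_TEMPLATE)    # step 604_1
        second = pool.submit(collect_subset, DB_B, SECOND_TEMPLATE)  # step 604_2
        subsets = first.result() + second.result()
    # Step 605: transmit both subsets into the same subset of the target.
    target.setdefault(SUBSET, []).extend(subsets)
    return target


target = run_method_600({})
print(target[SUBSET])
```

Running the two collections concurrently reflects the "simultaneously" wording of the text. Collecting them one after another would produce the same target contents.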
From the above detailed description, those skilled in the art will understand that the present invention identifies the database type through automatic identification. It then automatically matches an appropriate acquisition template according to the database type and/or the data structure of the data table. Data can therefore be automatically collected, extracted, cleaned, verified, loaded into storage, and so on through the acquisition template, which greatly reduces labor costs. After data has been collected successfully, the data tables in the database are organized and planned. This speeds up operations such as system table translation and table relationship identification, provides a basis and convenience for front-end data collection personnel, and prepares the ground for data management.
Further, based on the above description, those skilled in the art will appreciate that the present invention also discloses a data acquisition apparatus for a data service system, where the data service system includes a source database and a target database. The data acquisition apparatus includes: at least one processor; and at least one memory storing computer program instructions that, when executed by the at least one processor, cause the data acquisition apparatus to perform: establishing a data transmission path between the source database and the target database; identifying characteristics of the source database from which data is to be collected; determining a matching acquisition template according to the characteristics of the source database, where each of one or more acquisition templates matches the characteristics of one of one or more source databases; and collecting data from the source database using the matched acquisition template and transmitting the data to the target database through the data transmission path.
Correspondingly, the invention also discloses a data service system comprising a source database, a target database, and the data acquisition apparatus described above in conjunction with FIGS. 2 and 3, together with its various embodiments. The present invention further discloses a computer-readable storage medium comprising data acquisition program instructions for a data service system. When executed by a processor, these instructions perform the method described in connection with FIGS. 4 to 6 and its embodiments.
It should be appreciated that any module, unit, component, server, computer, terminal, or device executing instructions as exemplified herein may include or otherwise have access to a computer-readable medium, such as a storage medium, a computer storage medium, or a data storage device (removable and/or non-removable), e.g., a magnetic disk, an optical disk, or magnetic tape. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data.
Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an application, a module, or both. Any such computer storage media may be part of, or accessible or connectable to, a device. Any applications or modules described herein may be implemented using computer-readable/executable instructions that may be stored or otherwise maintained by such computer-readable media.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification and claims of this application, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.
The principles of the present invention have been explained above by means of a number of embodiments. This explanation is intended only to help in understanding the method of the present invention and its core idea. The invention is not limited to the embodiments described above, and its embodiments are applicable to various fields of application.