Disclosure of Invention
In view of this, a data route construction method, apparatus, computer device, and storage medium are provided to address the following problems: transferring data between different databases relies on middleware such as Mycat, requires real-time monitoring, and cannot switch dynamic data routes automatically.
A method for constructing a data route comprises the following steps:
acquiring sample data of a data source, querying a feature identifier in the sample data, and obtaining a data structure of the data source according to the feature identifier;
acquiring a data conversion model corresponding to the data structure, and inputting the sample data as parameters into the data conversion model to obtain database storage data corresponding to the sample data;
aggregating the database storage data corresponding to a plurality of sample data of the data source, extracting common items from the database storage data, converting the common items into conditional statements, and aggregating all the conditional statements to obtain a database template corresponding to the data source;
extracting an identifier ID value from the database template, generating a primary key field of a database table according to the identifier ID value, and establishing an index of the database table according to the primary key field;
querying the database table using category feature words in the index of the database table as query conditions to obtain a database category, and obtaining metadata corresponding to the database table according to the database category, wherein the metadata corresponds one-to-one to data routes;
and receiving a data route query request, extracting keyword information from the data route query request, and matching the keyword information against the metadata to obtain a data route corresponding to the data route query request.
In one possible embodiment, acquiring the data conversion model corresponding to the data structure and inputting the sample data as parameters into the data conversion model to obtain the database storage data corresponding to the sample data includes:
acquiring a data conversion model corresponding to the data structure, and dividing the sample data into a test group and a verification group;
inputting the test group into the data conversion model for data conversion to obtain database storage data corresponding to the test group;
inputting the verification group into the data conversion model for data conversion to obtain database storage data corresponding to the verification group;
and comparing the data types of the database storage data corresponding to the test group with those of the database storage data corresponding to the verification group; if they are consistent, obtaining the database storage data corresponding to the sample data; otherwise, performing the data conversion again until the data types are consistent.
In one possible embodiment, extracting the identifier ID value from the database template, generating the primary key field of the database table according to the identifier ID value, and establishing the index of the database table according to the primary key field includes:
extracting the identifier ID values from the database template, and taking the maximum of the identifier ID values as an initial ID value;
adding a preset value to the initial ID value to obtain a real-time identifier ID value;
performing a preset radix conversion on the real-time identifier ID value to obtain the primary key field of the database table;
and clustering the database tables having the same primary key field, assigning a class identifier to each cluster, and aggregating the class identifiers to establish the index of the database table.
In one possible embodiment, querying the database table using the category feature words in the index of the database table as query conditions to obtain the database category, and obtaining the metadata corresponding to the database table according to the database category, wherein the metadata corresponds one-to-one to data routes, includes:
acquiring category feature words in the index of the database table, traversing each database table according to the category feature words, and extracting all database tables containing the category feature words;
if the category feature words of all the database tables are identical, taking that category feature word as the database category; if the category feature words of any two database tables differ, taking the category feature word that occurs most frequently as the database category; and if more than one category feature word has the highest occurrence count, determining the database category by a voting mechanism;
and establishing a metadata acquisition node according to the database category, and extracting metadata corresponding to the database table from the metadata acquisition node.
In one possible embodiment, receiving the data route query request, extracting the keyword information from the data route query request, and matching the keyword information against the metadata to obtain the data route corresponding to the data route query request includes:
acquiring configuration information of the terminal system that sends the data route query request, and acquiring a corresponding database template according to the configuration information of the terminal system;
extracting keyword information from the data route query request, and inputting the keywords as parameters into the database template to obtain data type information;
and acquiring the node positions, in a database, of the metadata corresponding to the data type information, and aggregating the node positions to obtain the data route corresponding to the data route query request.
In one possible embodiment, after receiving the data routing query request, extracting keyword information in the data routing query request, and matching the keyword information with the metadata to obtain a data route corresponding to the data routing query request, the method further includes:
acquiring the access quantity of the data route, and if the access quantity is greater than an access quantity threshold value, marking the data corresponding to the data route as hot data;
after the hot data are cached, the cache address is recorded into the database template, and the database template is applied to generate a data route for accessing the hot data.
A data route constructing device comprises the following modules:
a template generation module, configured to acquire sample data of a data source, query a feature identifier in the sample data, and obtain a data structure of the data source according to the feature identifier; acquire a data conversion model corresponding to the data structure, and input the sample data as parameters into the data conversion model to obtain database storage data corresponding to the sample data; and aggregate the database storage data corresponding to a plurality of sample data of the data source, extract common items from the database storage data, convert the common items into conditional statements, and aggregate all the conditional statements to obtain a database template corresponding to the data source;
a table index creation module, configured to extract the identifier ID value from the database template, generate a primary key field of a database table according to the identifier ID value, and create an index of the database table according to the primary key field;
a metadata acquisition module, configured to query the database table using category feature words in the index of the database table as query conditions to obtain a database category, and obtain metadata corresponding to the database table according to the database category;
and a route acquisition module, configured to receive a data route query request, extract keyword information from the data route query request, and match the keyword information against the metadata to obtain the data route corresponding to the data route query request.
In one possible embodiment, the table index creation module is further configured to:
extract the identifier ID values from the database template, and take the maximum of the identifier ID values as an initial ID value; add a preset value to the initial ID value to obtain a real-time identifier ID value; perform a preset radix conversion on the real-time identifier ID value to obtain the primary key field of the database table; and cluster the database tables having the same primary key field, assign a class identifier to each cluster, and aggregate the class identifiers to establish the index of the database table.
A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the above-described data route construction method.
A storage medium having stored thereon computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above-described data routing construction method.
Compared with the existing mechanism, the method and the device have the advantages that the database template is established to quickly query the metadata of the database table, so that the mode of acquiring the data route from the database is simplified, and the automatic switching of the data route is realized.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Fig. 1 is an overall flowchart of a method for constructing a data route in an embodiment of the present application, and as shown in fig. 1, the method for constructing a data route includes the following steps:
S1, obtaining sample data of a data source, querying a feature identifier in the sample data, and obtaining a data structure of the data source according to the feature identifier;
Specifically, different data sources have different data structures. Raw user data on the Internet is mostly stored and transmitted in structured data formats (for example, xml, json, or binary). However, the raw data comes from many channels, and different data sources often adopt different data structures. For example, card-payment consumption data obtained from bank A is stored in xml format, while telephone bill data obtained from communication company B is stored in json format. The raw data is therefore heterogeneous in structure, highly redundant, and nonlinear, and the storage format required by the database often differs from the raw data format. In addition, different data structures use different feature identifiers: data in xml format contains a large number of start tags and end tags that mark the data content, while data in json format represents data objects with "{ }" and "[ ]" groupings.
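As an illustration of step S1, the following minimal sketch guesses the data structure of a text sample from the feature identifiers named above (start and end tags for xml, "{ }" and "[ ]" groupings for json). The function name detect_data_structure and the "unknown" fallback are assumptions made for this example and are not part of the described method.

```python
import json
import xml.etree.ElementTree as ET


def detect_data_structure(sample: str) -> str:
    """Guess the data structure of a text sample from its feature identifiers."""
    text = sample.strip()
    # JSON objects and arrays are delimited by "{ }" and "[ ]".
    if text.startswith(("{", "[")):
        try:
            json.loads(text)
            return "json"
        except json.JSONDecodeError:
            pass
    # XML content is marked by start tags and end tags.
    if text.startswith("<"):
        try:
            ET.fromstring(text)
            return "xml"
        except ET.ParseError:
            pass
    # Binary and other formats are outside the scope of this sketch.
    return "unknown"
```

The leading character is only a cheap first check; the sample is accepted as json or xml only if the corresponding standard-library parser can actually parse it.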
S2, acquiring a data conversion model corresponding to the data structure, and inputting the sample data as parameters into the data conversion model to obtain database storage data corresponding to the sample data;
specifically, the data conversion model is obtained by training in a machine learning manner, and different data conversion models are required to be adopted for different data structures. When machine learning is adopted, standard data can be input for training, and then other data can be input for checking. Specifically, in the scheme, the machine learning clustering or classification algorithm can be used for logically classifying the sample data, and then the data type of the corresponding database storage data can be obtained according to the result of the logical classification.
In one embodiment, acquiring the data conversion model corresponding to the data structure and inputting the sample data as parameters into the data conversion model to obtain the database storage data corresponding to the sample data includes:
acquiring a data conversion model corresponding to the data structure, and dividing the sample data into a test group and a verification group;
inputting the test group into the data conversion model for data conversion to obtain database storage data corresponding to the test group;
inputting the verification group into the data conversion model for data conversion to obtain database storage data corresponding to the verification group;
and comparing the data types of the database storage data corresponding to the test group with those of the database storage data corresponding to the verification group; if they are consistent, obtaining the database storage data corresponding to the sample data; otherwise, performing the data conversion again until the data types are consistent.
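The test group and verification group comparison above can be sketched as follows. This is a simplified illustration only: the conversion model is represented by an ordinary function (toy_model, hypothetical), the split is a plain half-and-half split, and the names field_types and convert_with_check and the max_rounds limit are assumptions. The source does not specify how the model is adjusted between conversion rounds, so the sketch simply re-runs it.

```python
from typing import Any, Callable, Dict, List, Set

Record = Dict[str, Any]


def field_types(rows: List[Record]) -> Dict[str, Set[str]]:
    """Map each field name to the set of Python type names seen for it."""
    types: Dict[str, Set[str]] = {}
    for row in rows:
        for name, value in row.items():
            types.setdefault(name, set()).add(type(value).__name__)
    return types


def convert_with_check(samples: List[Record],
                       model: Callable[[Record], Record],
                       max_rounds: int = 3) -> List[Record]:
    """Convert both groups and accept the result only if their data types agree."""
    half = len(samples) // 2
    test_group, verification_group = samples[:half], samples[half:]
    for _ in range(max_rounds):
        test_rows = [model(s) for s in test_group]
        verification_rows = [model(s) for s in verification_group]
        if field_types(test_rows) == field_types(verification_rows):
            return test_rows + verification_rows
    raise ValueError("data types of the two groups did not become consistent")


def toy_model(sample: Record) -> Record:
    """Hypothetical stand-in for a trained data conversion model."""
    return {"user": str(sample["user"]), "amount": float(sample["amount"])}
```

A caller would pass the sample records and the model, for example convert_with_check(samples, toy_model), and receive the converted rows of both groups once their field types match.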
S3, summarizing database storage data corresponding to a plurality of sample data of the data source, extracting common items in the database storage data, converting the common items into conditional statements, summarizing all the conditional statements, and obtaining a database template corresponding to the data source;
Specifically, the database storage data corresponding to different sample data from the same data source differs because the parameters in the data differ. After the common items of the database storage data are extracted, "()" is appended to each common item to turn it into a conditional statement, so that a user only needs to supply parameters to obtain the desired data. Extracting the common items across all sample data of a data source yields the database template.
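One possible reading of step S3 is sketched below. It assumes that each item of database storage data is a mapping from field names to values, that the "common items" are the field names present in every record, and that a conditional statement is the field name followed by an empty "()" parameter slot. These representations, and the names build_template and fill, are assumptions for illustration; the source does not fix a concrete syntax.

```python
from typing import Any, Dict, List


def build_template(rows: List[Dict[str, Any]]) -> List[str]:
    """Return one parameterized condition per field shared by every row."""
    if not rows:
        return []
    common = set(rows[0])
    for row in rows[1:]:
        common &= set(row)
    # "()" marks the slot where the caller later supplies a parameter.
    return [f"{name} = ()" for name in sorted(common)]


def fill(condition: str, value: Any) -> str:
    """Place a parameter value into the "()" slot of a conditional statement."""
    return condition.replace("()", f"({value!r})")
```

For two records that share the fields "user" and "amount", build_template yields the conditions for those two fields only, and fill("user = ()", "a") produces a condition with the parameter in place.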
S4, extracting the ID value of the identifier in the database template, generating a primary key field of the database table according to the ID value of the identifier, and establishing an index of the database table according to the primary key field;
Specifically, the database template contains an identifier ID that reflects the type of the database; relational databases and distributed databases are represented by different identifiers, and the identifier ID value may be assigned as a number according to the device fingerprint. The storage location of the database table in the database is obtained according to the primary key field, and the database table index is then obtained from it.
S5, using category feature words in the index of the database table as query conditions, obtaining database categories after querying the database table, and obtaining metadata corresponding to the database table according to the database categories, wherein the metadata corresponds to data routes one by one;
Metadata, also called intermediary data or relay data, is data that describes data (data about data). It mainly describes data properties and supports functions such as indicating storage locations, recording history, resource lookup, and file records. Metadata acts as an electronic catalog: to build the catalog, the contents or features of the data must be described and collected, which in turn assists data retrieval. Metadata can effectively determine the location of a database table, and the data in the database can be queried by analyzing the metadata.
S6, receiving the data route query request, extracting the keyword information in the data route query request, and matching the keyword information with the metadata to obtain the data route corresponding to the data route query request.
Specifically, for a data route query request input by a user, the keywords in the request may be looked up against historical data, and all keyword information contained in the request is then obtained from the lookup results, where each item of metadata corresponds to one keyword.
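The keyword matching of step S6 can be illustrated with a small lookup table. The sketch assumes that a request is free text, that keywords are the words of the request, and that each item of metadata is stored under its single keyword together with a route string. The function name match_routes, the "route" field, and the example route values are hypothetical.

```python
import re
from typing import Dict, List

Metadata = Dict[str, str]


def match_routes(request: str,
                 metadata_by_keyword: Dict[str, Metadata]) -> List[Metadata]:
    """Return the metadata items whose keyword appears in the request."""
    words = re.findall(r"\w+", request.lower())
    matches = []
    for word in words:
        # Each item of metadata corresponds to exactly one keyword.
        if word in metadata_by_keyword and metadata_by_keyword[word] not in matches:
            matches.append(metadata_by_keyword[word])
    return matches


# Hypothetical metadata table; route strings are placeholders.
EXAMPLE_METADATA = {
    "orders": {"table": "orders", "route": "db1/node2"},
    "bills": {"table": "bills", "route": "db2/node1"},
}
```

Because metadata and data routes correspond one-to-one, the "route" field of each matched item gives the data route for the request.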
In this embodiment, by establishing the database template, the manner of acquiring the data route from the database is simplified, and automatic switching of the data route is realized.
Fig. 2 is a schematic diagram of a table index creation process in a data route construction method according to an embodiment of the present application. As shown in the figure, step S4, in which the identifier ID value is extracted from the database template, a primary key field of a database table is generated according to the identifier ID value, and an index of the database table is established according to the primary key field, includes:
S41, extracting the identifier ID values from the database template, and taking the maximum of the identifier ID values as an initial ID value;
When two or more identifiers in the database template share the maximum ID value, that value may be discarded, and the next-largest value, corresponding to another identifier, is used as the initial ID value.
S42, adding a preset value to the initial ID value to obtain a real-time identifier ID value;
Specifically, the preset value can be set flexibly over a wide range, for example 10, 1000, 5000, or 10000, depending on actual needs. For example, it may be configured according to how frequently IDs are created: when the software system performs insert operations frequently, the preset value may be set larger to obtain better performance; when the workload consists mainly of query operations, the preset value may be set smaller.
S43, performing a preset radix conversion on the real-time identifier ID value to obtain a primary key field of the database table;
The preset radix may be binary, decimal, or base 36, and is selected according to actual needs. Different radixes are adopted to suit the structure and storage capacity of the database.
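Steps S41 to S43 amount to simple arithmetic followed by a radix conversion, which the following sketch works through. The names to_base and primary_key and the default preset of 1000 are assumptions for this example, and the tie handling described for S41 (discarding a maximum shared by several identifiers) is omitted for brevity.

```python
import string
from typing import Iterable

# Digit symbols for radixes up to 36: 0-9 followed by a-z.
DIGITS = string.digits + string.ascii_lowercase


def to_base(value: int, base: int) -> str:
    """Render a non-negative integer in the given radix (2 to 36)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    out = []
    while value:
        value, remainder = divmod(value, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def primary_key(id_values: Iterable[int], preset: int = 1000, base: int = 36) -> str:
    """S41: take the maximum; S42: add the preset value; S43: convert the radix."""
    initial = max(id_values)
    real_time = initial + preset
    return to_base(real_time, base)
```

For identifier ID values 7, 42, and 19 with a preset value of 1000, the real-time ID value is 1042, which is "sy" in base 36 and "1042" in decimal. A higher radix gives a shorter key for the same value, which is one reason a particular radix might be preferred for a given storage layout.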
And S44, clustering the database tables with the same primary key fields, giving class identifiers, summarizing the class identifiers, and establishing indexes of the database tables.
Specifically, a tree structure may be adopted when establishing the index of the database table: the common items of the primary key fields of a plurality of database tables are extracted to serve as the parent node of the tree, and the non-common items serve as child nodes, thereby building the index tree.
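A one-level version of such an index tree is sketched below. It assumes that the "common item" of several primary key fields is their longest common prefix, which is only one possible interpretation; the function name build_index_tree and the example keys are hypothetical.

```python
import os
from typing import Dict, List


def build_index_tree(primary_keys: List[str]) -> Dict[str, List[str]]:
    """Use the shared prefix as the parent node and the remainders as child nodes."""
    # os.path.commonprefix compares character by character, so it works on any strings.
    prefix = os.path.commonprefix(primary_keys)
    children = sorted(key[len(prefix):] for key in primary_keys)
    return {prefix: children}
```

For the keys "usr_001", "usr_002", and "usr_010", the parent node is "usr_0" and the child nodes are "01", "02", and "10". A deeper tree could be obtained by applying the same split recursively to groups of child nodes.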
In this embodiment, the database table index is effectively established by performing primary key field conversion on the database template ID value.
Fig. 3 is a schematic diagram of a metadata obtaining process in a data route construction method in an embodiment of the present application. As shown in the figure, step S5, in which the database table is queried using the category feature words in the index of the database table as query conditions to obtain a database category, and the metadata corresponding to the database table is obtained according to the database category, wherein the metadata corresponds one-to-one to data routes, includes:
S51, acquiring category feature words in the index of the database table, traversing each database table according to the category feature words, and extracting all database tables containing the category feature words;
specifically, different index information is recorded in the database table index, some index information is specific to the database table function, and some index information is specific to the database table category. The information in the database table index can be queried according to the category feature words used in the historical data of the database table, and all the database tables with the category feature words are obtained.
S52, if the category feature words of all the database tables are identical, taking that category feature word as the database category; if the category feature words of any two database tables differ, taking the category feature word that occurs most frequently as the database category; and if more than one category feature word has the highest occurrence count, determining the database category by a voting mechanism;
The voting mechanism is commonly used in machine learning algorithms: a plurality of category classifiers are configured or trained in a machine learning model, each classifier judges whether a category feature word is the true category, and a vote is taken over the judgments. The number of classifiers is odd.
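The decision rule of step S52 can be sketched as follows. Classifiers are modeled here as plain functions that receive the tied candidates and return the one they vote for; this interface and the name database_category are assumptions, and real classifiers would be trained models as described above.

```python
from collections import Counter
from typing import Callable, List, Sequence

Classifier = Callable[[List[str]], str]


def database_category(feature_words: Sequence[str],
                      classifiers: Sequence[Classifier] = ()) -> str:
    """Pick the most frequent category feature word; break ties by voting."""
    counts = Counter(feature_words)
    if not counts:
        raise ValueError("no category feature words given")
    top = max(counts.values())
    candidates = sorted(word for word, n in counts.items() if n == top)
    if len(candidates) == 1:
        # Covers both the uniform case and a single most frequent word.
        return candidates[0]
    if len(classifiers) % 2 == 0:
        raise ValueError("an odd number of classifiers is required to break ties")
    votes = Counter(classify(candidates) for classify in classifiers)
    return votes.most_common(1)[0][0]
```

With two tied candidates, an odd number of classifiers always produces a strict majority. With three or more tied candidates the votes themselves can still tie, a case the source does not address; this sketch then returns whichever tied candidate received its first vote earliest.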
And S53, establishing a metadata acquisition node according to the database type, and extracting metadata corresponding to the database table from the metadata acquisition node.
Specifically, different types of databases have different mechanisms for obtaining metadata. For example, the metadata acquisition node of the conventional relational database is unique, while the distributed database may have multiple nodes when acquiring the metadata, that is, the metadata may be stored in different servers. Therefore, by analyzing the database category, metadata acquisition can be performed efficiently.
In this embodiment, the voting mechanism is used to effectively identify the database category, so as to obtain the metadata corresponding to the database table.
In an embodiment, the S6, receiving the data routing query request, extracting keyword information in the data routing query request, and matching the keyword information with the metadata to obtain a data route corresponding to the data routing query request, includes:
acquiring the configuration information of the terminal system which sends the data routing query request, and acquiring a corresponding database template according to the configuration information of the terminal system;
Specifically, different systems use different formats when sending data route query information. Therefore, when querying a data route, the system must first be identified in order to obtain the corresponding metadata.
extracting keyword information from the data routing query request, and inputting the keywords as parameters into the database template to obtain data type information;
Specifically, when the keywords are input into the database template, each keyword may be substituted in turn into each conditional statement in the database template, and the conditional statement is then executed. If execution succeeds, the keyword belongs to that conditional statement. The execution results are then collected, and those related to the data type are selected to obtain the data type information.
And acquiring the node positions of the metadata corresponding to the data type information in a database, summarizing the node positions, and acquiring the data route corresponding to the data route query request.
Specifically, when obtaining a node position, access statistics for database accesses of the data type may be obtained first; a preliminary node position is then set according to the access statistics; and whether to use or discard the preliminary node is then determined by analyzing the database table corresponding to the metadata at that node.
In the embodiment, by using the metadata and the database template, the data routes of the data stored in different databases are accurately obtained.
In one embodiment, after receiving the data routing query request, extracting keyword information in the data routing query request, and matching the keyword information with the metadata to obtain a data route corresponding to the data routing query request, the method further includes:
acquiring the access quantity of the data route, and if the access quantity is greater than an access quantity threshold value, marking the data corresponding to the data route as hot data;
The access volume threshold is obtained from historical data statistics. For example, if the keyword "NBA" is accessed 1000 times per month and the access volume threshold is 500 times per month, the access volume exceeds the threshold and "NBA" is treated as hot data.
After the hot data are cached, the cache address is recorded into the database template, and the database template is applied to generate a data route for accessing the hot data.
The hot data are cached, so that the data with large access amount can be acquired in time, and the acquisition time is saved.
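The hot data handling can be illustrated with a small in-memory tracker. In this sketch a dictionary stands in for the cache whose address would be recorded in the database template, and the class name RouteAccessTracker, its methods, and the loader callback are assumptions made for the example.

```python
from collections import Counter
from typing import Any, Callable, Dict


class RouteAccessTracker:
    """Count accesses per data route and cache data once a route becomes hot."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()
        self.cache: Dict[str, Any] = {}  # route -> cached hot data

    def is_hot(self, route: str) -> bool:
        """A route is hot when its access count is greater than the threshold."""
        return self.counts[route] > self.threshold

    def access(self, route: str, load: Callable[[str], Any]) -> Any:
        """Record one access and return the data, from the cache when available."""
        self.counts[route] += 1
        if route in self.cache:
            return self.cache[route]
        data = load(route)
        if self.is_hot(route):
            self.cache[route] = data
        return data
```

With a threshold of 500 accesses, the 501st access to a route loads the data one last time and caches it; later accesses are served from the cache without calling the loader.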
Technical features mentioned in any of the embodiments or implementation manners corresponding to fig. 1 to 3 are also applicable to the embodiment corresponding to fig. 4 in the present application, and similar parts are not repeated in the following.
A method for constructing a data route in the present application is explained above, and an apparatus for performing the above-mentioned construction of the data route is described below.
Fig. 4 is a block diagram of a data route construction apparatus, which is applicable to the construction of a data route. The data route construction device in the embodiment of the present application can implement the steps corresponding to the data route construction method executed in the embodiment corresponding to fig. 1. The functions realized by the data route constructing device can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the above functions, which may be software and/or hardware. The data route constructing device can comprise a template generating module, a table index creating module, a metadata obtaining module and a route obtaining module.
The template generation module is configured to acquire sample data of a data source, query a feature identifier in the sample data, and obtain a data structure of the data source according to the feature identifier; acquire a data conversion model corresponding to the data structure, and input the sample data as parameters into the data conversion model to obtain database storage data corresponding to the sample data; and aggregate the database storage data corresponding to a plurality of sample data of the data source, extract common items from the database storage data, convert the common items into conditional statements, and aggregate all the conditional statements to obtain a database template corresponding to the data source;
the table index creation module is configured to extract the identifier ID value from the database template, generate a primary key field of a database table according to the identifier ID value, and create an index of the database table according to the primary key field;
the metadata acquisition module is configured to query the database table using category feature words in the index of the database table as query conditions to obtain a database category, and obtain metadata corresponding to the database table according to the database category;
and the route acquisition module is configured to receive a data route query request, extract keyword information from the data route query request, and match the keyword information against the metadata to obtain the data route corresponding to the data route query request.
In some embodiments, the table index creation module is further to:
extract the identifier ID values from the database template, and take the maximum of the identifier ID values as an initial ID value; add a preset value to the initial ID value to obtain a real-time identifier ID value; perform a preset radix conversion on the real-time identifier ID value to obtain the primary key field of the database table; and cluster the database tables having the same primary key field, assign a class identifier to each cluster, and aggregate the class identifiers to establish the index of the database table.
In one embodiment, a computer device is provided, the computer device includes a memory and a processor, the memory stores computer readable instructions, and when executed by the processor, the computer readable instructions cause the processor to execute the steps of the data route construction method in the above embodiments.
In one embodiment, a storage medium storing computer-readable instructions is provided, which when executed by one or more processors, cause the one or more processors to perform the steps of the method for constructing a data route in the above embodiments. Wherein the storage medium may be a non-volatile storage medium.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments describe only some implementations of the present application. Although they are described in specific detail, they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.