Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method and apparatus for a database, a storage medium, and an electronic device, which mainly solve the problem that existing data classification methods still perform hierarchical classification on cold data. Because classifying cold data is wasted effort, it increases the complexity of data classification and degrades the classification effect.
In a first aspect, an embodiment of the present invention provides a data processing method for a database, including:
acquiring all flow data and all query statements of a database;
constructing a virtual table according to each flow data and query statement;
and carrying out hierarchical classification processing on the virtual table.
In a possible implementation manner, the constructing a virtual table according to each flow data and query statement includes:
analyzing each query statement to obtain target table information and target field information, wherein the target table information is information of a table searched by the query statement, and the target field information is information of a field searched by the query statement;
obtaining data information transmitted by each flow data based on each flow data;
and constructing a virtual table according to all the target table information, the target field information and the data information.
In a possible implementation manner, the performing a hierarchical classification process on the virtual table includes:
judging whether each virtual table is a sensitive table, and if the virtual table is a sensitive table, determining the type of the sensitive table; and if the virtual table is not a sensitive table, determining that the virtual table is a non-sensitive table.
In a possible implementation manner, the determining whether each of the virtual tables is a sensitive table includes:
judging whether the target field information of each virtual table is sensitive information, and if any target field information of the virtual table is sensitive information, determining that the virtual table is a sensitive table; and if none of the target field information of the virtual table is sensitive information, determining that the virtual table is not a sensitive table.
In a possible implementation manner, after determining that the virtual table is a sensitive table, the method further includes:
determining the position of the sensitive information of the sensitive table and the type of the sensitive information;
and based on the position of the sensitive information and the type of the sensitive information, finding the sensitive information and marking a label corresponding to the type.
In a possible implementation manner, the determining a location of the sensitive information of the sensitive table includes:
generating corresponding mirror image flow data according to each flow data;
identifying each mirror image flow data to obtain asset information of a database;
establishing an association relation among the asset information, the target table information and the target field information of each database;
and determining the position of the sensitive information of the sensitive table based on the association relation among the asset information, the target table information and the target field information of each database.
In one possible implementation, the hierarchical classification processing on the virtual table includes:
acquiring an association relation among the virtual tables;
determining an association table associated with a first virtual table according to the association relationship among the virtual tables, wherein the first virtual table is any one of all the virtual tables;
and determining the type of the first virtual table according to the target field information of the first virtual table and the associated table.
In a second aspect, an embodiment of the present invention provides a data processing apparatus for a database, including:
the acquisition module is used for acquiring all flow data and query statements of the database;
the generating module is used for generating corresponding mirror image flow data according to each flow data;
the building module is used for constructing a virtual table according to each flow data, the mirror image flow data and the query statement;
and the hierarchical classification module is used for carrying out hierarchical classification processing on the virtual table.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where at least one executable instruction is stored in the computer-readable storage medium, and the executable instruction causes a processor to execute an operation corresponding to the data processing method of the database in any one of the above schemes.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the data processing method of the database in any scheme.
According to the data processing method and apparatus for a database, the storage medium and the electronic device provided by the embodiments of the present invention, the data information searched within a certain time is constructed into a virtual table based on the flow data and the query statements, so that the data information not searched within that time, namely cold data, is excluded. As a result, cold data is not subjected to hierarchical classification when the virtual table is classified, which reduces the complexity of the hierarchical classification of data and improves its effect.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.
It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular is intended to include the plural unless the context clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Exemplary embodiments according to the present invention will now be described in more detail with reference to the accompanying drawings. These exemplary embodiments may, however, be embodied in many different forms and should not be construed as limited to only the embodiments set forth herein. It is to be understood that these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of these exemplary embodiments to those skilled in the art.
In a first aspect, as shown in fig. 1, an embodiment of the present invention provides a data processing method for a database, including:
Step S101: acquiring all flow data and all query statements of the database.
The traffic data refers to a data stream formed between the database server and the user side after the database server responds to a search request initiated by the user and establishes a session with the user side through a network. The user-initiated search request is generated by a user performing a recognizable operation on a computer system, and the network includes but is not limited to a wireless network, the Internet, and the like.
All the flow data of the database refers to all the flow data within a specified time, the specified time may be from the time when the database is established to the current time, or may be a period of time specified by the user, such as one year or one month, and the specified time is not strictly limited in this embodiment.
When a user has a query request, a query statement may be entered through a user interface component provided by a data query system running in the computer device. The query statement may be a Structured Query Language (SQL) statement, or may be another type of statement. Similarly, all query statements refer to all query statements within a specified time, and the specified time may be from the time when the database is established to the current time, or may be a period of time specified by the user, such as one year or one month; the specified time is not strictly limited in this embodiment.
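The selection of query statements within a specified time can be illustrated with a minimal sketch. The record layout (a timestamp paired with a statement), the function name `within_window` and the sample values are assumptions made for illustration only; the embodiment does not prescribe how captured statements are stored.

```python
from datetime import datetime, timedelta

# Hypothetical captured records: (capture time, query statement).
captured = [
    (datetime(2020, 1, 5), "SELECT name FROM customer"),
    (datetime(2020, 11, 20), "SELECT amount FROM sales"),
    (datetime(2020, 12, 10), "SELECT name, gender FROM customer"),
]

def within_window(records, end, window):
    """Keep only statements captured in the interval [end - window, end]."""
    start = end - window
    return [stmt for ts, stmt in records if start <= ts <= end]

# Example: a one-month (30-day) specified time ending on 2020-12-11.
recent = within_window(captured, datetime(2020, 12, 11), timedelta(days=30))
print(recent)
```

Statements outside the window never reach the later steps, which is how data that was not queried in the specified time ends up excluded.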
Step S102: constructing a virtual table according to each flow data and the query statement.
In this embodiment, the virtual table can be constructed by analyzing each flow data and query statement, without obtaining a user name and password to access the database. This avoids a complex access process, avoids the risk of data leakage caused by account leakage, is non-intrusive to the database, and improves the security of the database.
Step S103: carrying out hierarchical classification processing on the virtual table.
According to the data processing method for a database provided by the embodiment of the present invention, the data information searched within a certain time is constructed into a virtual table based on the flow data and the query statements, so that the data information not searched within that time (namely cold data) is excluded. Cold data is therefore not subjected to hierarchical classification when the virtual table is classified, which reduces the complexity of the hierarchical classification of data and improves its effect.
In a possible implementation manner, as shown in fig. 2, step S102 specifically includes:
Step S201: analyzing each query statement to obtain target table information and target field information, wherein the target table information is information of a table searched by the query statement, and the target field information is information of a field searched by the query statement.
The target table information includes, but is not limited to, the table name of the table looked up by the query statement, and the target field information includes, but is not limited to, the field name of the field looked up by the query statement. Both the table name and the field name may be expressed in the form of characters, for example, the table name is "customer information table" and the field name is "name". Of course, the table name and the field name may also be expressed in other forms, which is not limited in the present invention.
When the query statement is an SQL statement, it can be analyzed by a regular expression or by a third-party SQL parsing tool to obtain the target table information and the target field information. When the query statement is another type of statement, a corresponding analysis method is used to obtain the corresponding target table information and target field information.
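As a rough illustration of the regular-expression approach, the following sketch extracts the table name and field names from a simple single-table SELECT statement. The pattern and the function name `parse_select` are assumptions for illustration; the pattern does not handle joins, subqueries, aliases or quoted identifiers, for which a full SQL parser would be more appropriate.

```python
import re

# Matches "SELECT <fields> FROM <table>" for simple single-table queries only.
SELECT_RE = re.compile(
    r"^\s*SELECT\s+(?P<fields>.+?)\s+FROM\s+(?P<table>[A-Za-z_][A-Za-z0-9_]*)",
    re.IGNORECASE | re.DOTALL,
)

def parse_select(statement):
    """Return (table name, list of field names), or None if not a simple SELECT."""
    match = SELECT_RE.match(statement)
    if match is None:
        return None
    fields = [f.strip() for f in match.group("fields").split(",")]
    return match.group("table"), fields

print(parse_select("SELECT customer_name, gender FROM customer_info WHERE id = 1"))
```

The returned table name plays the role of the target table information and the field list plays the role of the target field information.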
Step S202: based on each traffic data, data information transmitted by each traffic data is obtained.
Each flow data is analyzed to obtain the data information it transmits, namely the result data returned for the user's query.
Step S203: constructing a virtual table according to all the target table information, the target field information and the data information.
Based on all the target table information, the target field information and the data information, a corresponding virtual table can be constructed and formed by utilizing the corresponding relation among each target table information, each target field information and each data information.
Illustratively, the target table information includes a "customer information table" and a "sales performance table". The target field information of the "customer information table" includes "customer name" and "gender"; the data information of "customer name" includes "Zhang San" and "Li Si", and the data information of the corresponding "gender" includes "female" and "male". The target field information of the "sales performance table" includes "salesman name" and "sales amount"; the data information of "salesman name" includes "Xiao Ming" and "Xiao Hong", and the data information of the corresponding "sales amount" includes "100 yuan" and "150 yuan". Virtual table 1 corresponding to the customer information table and virtual table 2 corresponding to the sales performance table are constructed accordingly.
Virtual table 1 customer information table
| Customer name | Gender |
| Zhang San | Female |
| Li Si | Male |
Virtual table 2 sales performance table
| Salesman name | Sales amount |
| Xiao Ming | 100 yuan |
| Xiao Hong | 150 yuan |
In this embodiment, corresponding target table information, target field information, and data information are obtained by analyzing each query statement and traffic data to construct a corresponding virtual table, and a user name and a password of a user do not need to be acquired to access a database, so that a complex access process is avoided, a risk of data leakage caused by account leakage is also avoided, and the database is not intrusive. Meanwhile, data which is not inquired in specific time, namely cold data, cannot appear in the virtual table, and therefore grading and classifying processing cannot be carried out on the cold data when the virtual table is subjected to grading and classifying processing in the follow-up process, complexity of grading and classifying data is reduced, and the grading and classifying effect of the data is improved.
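The construction of virtual tables from the parsed information can be sketched as follows. The observation tuples (table name, field names, one result row) and the dictionary layout of a virtual table are illustrative assumptions that mirror the customer and sales example; they are not a data structure specified by the embodiment.

```python
# Hypothetical parsed observations: (table name, field names, one result row).
observations = [
    ("customer information table", ("customer name", "gender"), ("Zhang San", "female")),
    ("customer information table", ("customer name", "gender"), ("Li Si", "male")),
    ("sales performance table", ("salesman name", "sales amount"), ("Xiao Ming", "100 yuan")),
    ("sales performance table", ("salesman name", "sales amount"), ("Xiao Hong", "150 yuan")),
]

def build_virtual_tables(obs):
    """Group observed fields and rows by table name into in-memory virtual tables."""
    tables = {}
    for table, fields, row in obs:
        vt = tables.setdefault(table, {"fields": [], "rows": []})
        for field in fields:
            if field not in vt["fields"]:  # keep first-seen field order
                vt["fields"].append(field)
        vt["rows"].append(dict(zip(fields, row)))
    return tables

virtual_tables = build_virtual_tables(observations)
for name, vt in virtual_tables.items():
    print(name, vt["fields"], len(vt["rows"]))
```

Only tables and fields that actually appear in captured queries and traffic enter `virtual_tables`, so unqueried (cold) data is absent by construction.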
In another possible implementation manner, as shown in fig. 3, step S103 includes:
Step S301: judging whether each virtual table is a sensitive table; if the virtual table is a sensitive table, executing step S302; if the virtual table is not a sensitive table, executing step S303.
Step S302: determining the type of the sensitive table.
Step S303: determining that the virtual table is a non-sensitive table.
In this embodiment, it is first determined whether the virtual table is a sensitive table. If the virtual table is a sensitive table, the table is classified and its sensitivity level is determined; if the virtual table is not a sensitive table, it is directly determined to be a non-sensitive table. A hierarchical classification report can then be formed and displayed to the user for review.
In the above embodiment, as shown in fig. 4, the determining whether each virtual table is a sensitive table specifically includes:
Step S401: judging whether the target field information of each virtual table is sensitive information; if any target field information of the virtual table is sensitive information, executing step S402; if none of the target field information of the virtual table is sensitive information, executing step S403.
Step S402: determining that the virtual table is a sensitive table.
Step S403: determining that the virtual table is not a sensitive table.
Specifically, a sensitive dictionary storing all sensitive words can be preset in the processing system, and each target field information of each virtual table is matched against the sensitive words in the dictionary one by one. If one or more target field information of the virtual table matches a sensitive word, the virtual table is determined to be a sensitive table; if none of the target field information of the virtual table matches any sensitive word in the dictionary, the virtual table is determined not to be a sensitive table. Sensitive words in the sensitive dictionary can be added or deleted by the user according to actual requirements.
For example, assuming that the sensitive words of the sensitive dictionary are "customer name" and "user mobile phone", the target field information "customer name" in virtual table 1 of the above example matches the sensitive word "customer name", so virtual table 1 is determined to be a sensitive table; none of the target field information of virtual table 2 matches "customer name" or "user mobile phone", so virtual table 2 is determined not to be a sensitive table.
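The dictionary matching described above can be sketched with exact string matching. The dictionary contents and the function names are illustrative assumptions; a practical system might normalize field names or use fuzzy matching, which the embodiment leaves open.

```python
# Hypothetical sensitive dictionary; users may add or remove words.
SENSITIVE_WORDS = {"customer name", "user mobile phone"}

def sensitive_fields(fields, dictionary):
    """Return the fields that exactly match a word in the sensitive dictionary."""
    return [field for field in fields if field in dictionary]

def is_sensitive_table(fields, dictionary):
    """A table is sensitive if at least one of its fields is sensitive."""
    return bool(sensitive_fields(fields, dictionary))

table_1_fields = ["customer name", "gender"]
table_2_fields = ["salesman name", "sales amount"]
print(is_sensitive_table(table_1_fields, SENSITIVE_WORDS))
print(is_sensitive_table(table_2_fields, SENSITIVE_WORDS))
```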
Furthermore, the sensitivity level of the sensitive table can be determined according to the quantity of sensitive information: the more sensitive information the sensitive table contains, the higher its sensitivity level. Each sensitivity level corresponds to a threshold on the quantity of sensitive information, and when the quantity of sensitive information contained in the sensitive table reaches the threshold of a sensitivity level, the sensitivity level of the sensitive table can be determined. For example, assuming that the threshold corresponding to the first sensitivity level is 3 and the threshold corresponding to the second sensitivity level is 6, a sensitive table 1 containing 4 pieces of sensitive information in total reaches the first threshold but not the second, so sensitive table 1 is of the first sensitivity level.
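The threshold rule can be sketched as a lookup over ascending thresholds. The sketch assumes that a table takes the highest level whose threshold it reaches and that a count below the lowest threshold yields no level; both points are interpretations made for illustration, since the text only gives the example with thresholds 3 and 6.

```python
# Thresholds from the example, in ascending order: (minimum count, level name).
LEVEL_THRESHOLDS = [(3, "first sensitivity level"), (6, "second sensitivity level")]

def sensitivity_level(count, thresholds=LEVEL_THRESHOLDS):
    """Return the highest level whose threshold is reached, or None if none is."""
    level = None
    for minimum, name in thresholds:
        if count >= minimum:
            level = name
    return level

print(sensitivity_level(4))  # 4 reaches threshold 3 but not 6
```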
In yet another possible implementation manner, as shown in fig. 5, after step S402, the method further includes:
Step S501: determining the position of the sensitive information of the sensitive table and the type of the sensitive information.
The sensitive dictionary also stores the sensitive word type corresponding to each sensitive word, and when field information of the virtual table matches a sensitive word, the type of that sensitive word is determined as the type of the sensitive information. For example, if the type corresponding to the sensitive word "user mobile phone" is "user contact address" and the sensitive information in the virtual table includes "user mobile phone", the type of the field information "user mobile phone" is determined as "user contact address".
Step S502: based on the position of the sensitive information and the type of the sensitive information, finding the sensitive information and marking it with a label corresponding to the type.
The label may be text identical to the type name, or may be a corresponding specific symbol, which is not strictly limited in this application. For example, the label of the type "user contact address" may be "user contact address", or "a".
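A minimal sketch of type lookup and labeling follows. The mapping from "user mobile phone" to "user contact address" and the symbol "a" come from the example above; the second dictionary entry, its type name "personal identity", and the function name `label_fields` are hypothetical additions for illustration.

```python
# Sensitive word -> sensitive word type ("personal identity" is a hypothetical type).
SENSITIVE_TYPES = {
    "user mobile phone": "user contact address",
    "customer name": "personal identity",
}

# Optional mapping from a type to a short symbol used as its label.
TYPE_SYMBOLS = {"user contact address": "a"}

def label_fields(fields, types, symbols=None):
    """Map each sensitive field to its label: the symbol if defined, else the type name."""
    symbols = symbols or {}
    labels = {}
    for field in fields:
        if field in types:
            type_name = types[field]
            labels[field] = symbols.get(type_name, type_name)
    return labels

print(label_fields(["user mobile phone", "gender"], SENSITIVE_TYPES))
print(label_fields(["user mobile phone", "gender"], SENSITIVE_TYPES, TYPE_SYMBOLS))
```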
Specifically, as shown in fig. 6, determining the location of the sensitive information of the sensitive table includes:
Step S601: generating corresponding mirror image flow data according to each flow data.
The mirror image flow data refers to flow data obtained by copying in full, or intercepting part of, the flow data passing through a certain device according to preset conditions, and then transmitting it to a designated receiving device for processing. For example, the flow data may be copied or intercepted by port, by VLAN or by another dimension to obtain the mirror image flow data.
Step S602: identifying each mirror image flow data to obtain the asset information of the database.
The database asset information includes, but is not limited to, the database asset IP, port, database type and the like. The database asset information can thus be compiled by analyzing the mirror image flow data, without having to be compiled in advance.
Step S603: establishing an association relation among the asset information, the target table information and the target field information of each database.
That is, an association relationship is established between each database asset information and the target table information and target field information in the corresponding database.
Step S604: determining the position of the sensitive information of the sensitive table based on the association relation among the asset information, the target table information and the target field information of each database.
Based on the association relationship among the asset information of each database, the target table information and the target field information, the target table information and the sensitive information of the sensitive table and the corresponding database asset information can be found, so that the database where the sensitive table is located can be inquired according to the asset information of the database, then the sensitive table is found according to the target table information of the sensitive table, and then the sensitive information is found in the sensitive table according to the corresponding relationship between the target table information and the sensitive information of the sensitive table, so that the position of the sensitive information is determined.
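The lookup from sensitive information back to its location can be sketched as a search over stored associations. The `Asset` record, the sample IP addresses, ports and database types, and the function name `locate` are all hypothetical values chosen for illustration; the embodiment only states that asset information includes items such as IP, port and database type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Asset:
    """Database asset information recovered from mirrored traffic (illustrative)."""
    ip: str
    port: int
    db_type: str

# Hypothetical associations: (asset, target table, target field).
associations = [
    (Asset("10.0.0.5", 3306, "MySQL"), "customer information table", "customer name"),
    (Asset("10.0.0.5", 3306, "MySQL"), "customer information table", "gender"),
    (Asset("10.0.0.9", 5432, "PostgreSQL"), "sales performance table", "sales amount"),
]

def locate(table, field, assoc):
    """Return every (asset, table, field) entry that holds the given sensitive field."""
    return [(a, t, f) for a, t, f in assoc if t == table and f == field]

for asset, table, field in locate("customer information table", "customer name", associations):
    print(asset.ip, asset.port, asset.db_type, table, field)
```

The asset identifies the database, the table name identifies the sensitive table within it, and the field name identifies the sensitive information within that table.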
In the above-described embodiment, as shown in fig. 7, step S103 further includes:
Step S701: acquiring the association relation among the virtual tables.
The association relationship between the virtual tables can be obtained through a preset algorithm, and the preset algorithm can adopt a join algorithm to associate two or more virtual tables through the target field information relationship between the virtual tables.
For example, given the virtual table A, the virtual table B and the virtual table C below, the shared field "number" indicates that the virtual table A and the virtual table B have an association relationship, and the shared field "address number" indicates that the virtual table B and the virtual table C have an association relationship.
Virtual table A product table
| Number | Product name | Price |
| 12 | Notebook computer | 18999.00 |
| 13 | Mobile phone | 1899.00 |
Virtual table B service subscription information table
| Number | Order time | Address number |
| 12 | 2020/12/11 15:32 | 1000 |
| 13 | 2020/12/11 15:32 | 1000 |
Virtual table C address table
| Address number | Address | Name |
| 1000 | Pudong New Area, Shanghai | Wang Wu |
| 1000 | Hongkou District, Shanghai | Li Jia |
In this step, the association relation among the virtual tables is determined through the target field information, and the association relation obtained in this way is more accurate than that obtained by the existing method of using the primary and foreign keys of the tables. Of course, in other embodiments, the association between the virtual tables may also be realized through other algorithms to obtain the association relationship, which is not limited in this application.
Step S702: determining an association table associated with the first virtual table according to the association relationship among the virtual tables, wherein the first virtual table is any one of all the virtual tables.
The association table of the first virtual table includes a virtual table having a direct association relationship with the first virtual table, and a virtual table having an indirect association relationship with the first virtual table.
Illustratively, continuing with the example in step S701, assuming that the virtual table a is the first virtual table, the virtual table B is a virtual table directly associated with the virtual table a, and the virtual table C is a virtual table indirectly associated with the virtual table a, so that the associated tables of the virtual table a are the virtual table B and the virtual table C.
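The direct and indirect associations in this example can be sketched by linking tables that share a field name and then traversing those links. Treating a shared field name as the association criterion, and using a breadth-first traversal to collect indirect associations, are assumptions made for illustration; the embodiment allows other association algorithms.

```python
from collections import deque

# Field names of the three virtual tables from the example.
tables = {
    "A": {"number", "product name", "price"},
    "B": {"number", "order time", "address number"},
    "C": {"address number", "address", "name"},
}

def direct_links(tables):
    """Link two tables whenever they share at least one field name."""
    links = {name: set() for name in tables}
    names = list(tables)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if tables[first] & tables[second]:
                links[first].add(second)
                links[second].add(first)
    return links

def associated_tables(first, links):
    """Collect tables reachable from `first`, directly or indirectly."""
    seen = {first}
    queue = deque([first])
    while queue:
        current = queue.popleft()
        for neighbor in links[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(first)
    return seen

links = direct_links(tables)
print(sorted(associated_tables("A", links)))
```

Here table B is reached directly from table A through "number", and table C is reached indirectly through table B and "address number".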
Step S703: determining the type of the first virtual table according to the target field information of the first virtual table and the associated table.
This step determines the type of the first virtual table by the target field information of the first virtual table in combination with the target field information of the associated table, the classification being more accurate than determining the type of the first virtual table by only the target field information within the first virtual table.
Illustratively, continuing with the example in step S702, the first virtual table (virtual table a) includes the target field information "product name", "price", the virtual table B includes the target field information "order time", "address number", and the virtual table C includes the target field information "address", "name". In the prior art, whether the first virtual table is a product table or a service ordering information table cannot be accurately determined only by the product name and the price of the first virtual table. In this step, the first virtual table can be determined to be a product table by combining the order time and the address number of the association table virtual table B of the first virtual table, so as to obtain an accurate type.
In a second aspect, an embodiment of the present invention provides a data processing apparatus for a database, as shown in fig. 8, the apparatus includes:
an obtaining module 801, configured to obtain all flow data and query statements of a database;
a building module 802, configured to build a virtual table according to each flow data and query statement;
and a hierarchical classification module 803, configured to perform hierarchical classification processing on the virtual table.
According to the data processing device of the database provided by the embodiment of the invention, the device constructs the data information searched within a certain time into the virtual table based on the data flow and the query statement, so that the data information which is not searched within a certain time, namely cold data, is removed, and the cold data can not be classified in a grading way when the virtual table is classified in a grading way, thereby reducing the complexity of data grading and classification and improving the effect of data grading and classification. Meanwhile, classification and classification are carried out in a virtual table building mode, an original database does not need to be processed, and the invasion of the original database is avoided, so that the safety of the database is improved.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where at least one executable instruction is stored in the computer-readable storage medium, and the executable instruction causes a processor to execute an operation corresponding to the data processing method for a database in any one of the above schemes.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the data processing method for a database in any one of the above schemes.
In particular, the program may include program code comprising computer operating instructions.
The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the invention. The computer device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory is used for storing programs. The memory may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The present invention has been illustrated by the above embodiments, but it should be understood that the above embodiments are for illustrative and descriptive purposes only and are not intended to limit the invention to the scope of the described embodiments. Furthermore, it will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that many variations and modifications may be made in accordance with the teachings of the present invention, which variations and modifications are within the scope of the present invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.