Detailed Description
The application is described in further detail below with reference to the accompanying drawings.
In one exemplary configuration of the application, the terminal, the device of the service network, and the trusted party each include one or more processors (e.g., central processing units (CPUs)), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory. Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PCM), programmable random access memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The device according to the present application includes, but is not limited to, a terminal, a network device, or a device formed by integrating a terminal and a network device through a network. The terminal includes, but is not limited to, any mobile electronic product capable of human-machine interaction with a user (for example, through a touch pad), such as a smartphone or a tablet computer, and the mobile electronic product may run any operating system, such as Android or iOS. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of servers, where the cloud is made up of a large number of computers or network servers based on cloud computing, cloud computing being a type of distributed computing in which a group of loosely coupled computers forms a virtual supercomputer. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPN networks, wireless ad hoc networks, and the like. Preferably, the device may also be a program running on the terminal, the network device, or a device formed by integrating the terminal with the network device, the touch terminal, or the network device with the touch terminal through a network.
Of course, those skilled in the art will appreciate that the above-described devices are merely examples, and that other existing or future devices, if applicable to the present application, are also intended to fall within the scope of the present application and are incorporated herein by reference.
In the description of the present application, the meaning of "a plurality" is two or more unless explicitly defined otherwise.
Fig. 1 shows a flow chart of a method for marking sensitive data according to an embodiment of the application. The method comprises step S11, step S12, and step S13. In step S11, the computer device samples log data of a database to obtain one or more pieces of sampled data. In step S12, the computer device identifies the sampled data according to a classification and grading rule, determines the sensitive classification corresponding to the sampled data, and obtains the sensitivity level information corresponding to the sampled data under that sensitive classification. In step S13, the computer device applies a sensitive mark to at least one piece of sampled data according to the sensitivity level information and stores the at least one piece of sampled data, where the sensitive mark comprises the sensitive classification and the sensitivity level information corresponding to each piece of sampled data.
In step S11, the computer device samples log data of the database to obtain one or more pieces of sampled data. In some embodiments, log data refers to the operation behavior log data of a database, which, for a given operation behavior on the database, includes but is not limited to the behavior time, the behavior content (e.g., reading, inserting, modifying, or deleting stored data in the database), the behavior result (e.g., whether the operation succeeded), and the behavior object (i.e., at least one piece of stored data in the database). In some embodiments, the log data of the database may be sampled at a predetermined sampling rate, and the sampling may be performed at intervals in the chronological order of the behaviors. For example, if the sampling rate is 10%, the log data of 1 operation behavior is collected out of every 10 operation behaviors that occur in the database; that is, if the log data of operation behavior 1 has been collected, the next log data to be collected is that of operation behavior 2, the tenth operation behavior occurring after operation behavior 1 in chronological order. In some embodiments, the sampling may instead be random sampling that disregards order. For example, a plurality of log data for all operation behaviors of the database is collected first; if the sampling rate is 10% and the number of the plurality of log data is Num, then Num × 10% pieces of log data are randomly extracted from the plurality of log data as the sampled data.
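The two sampling manners described for step S11 can be illustrated with a minimal Python sketch. The function names `sample_by_interval` and `sample_randomly`, the `seed` parameter, and the string log entries are assumptions made for illustration only; the application does not prescribe any particular implementation.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_by_interval(logs: Sequence[T], rate: float) -> List[T]:
    """Interval sampling: keep one entry out of every round(1 / rate) entries,
    following the chronological order of the log."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    step = max(1, round(1 / rate))
    return list(logs[::step])


def sample_randomly(logs: Sequence[T], rate: float,
                    seed: Optional[int] = None) -> List[T]:
    """Random sampling: draw round(Num * rate) entries without replacement,
    disregarding order (Num is the number of collected log entries)."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    count = round(len(logs) * rate)
    return random.Random(seed).sample(list(logs), count)


# Hypothetical log of 100 operation behaviors, oldest first.
logs = [f"op-{i}" for i in range(1, 101)]
interval_sample = sample_by_interval(logs, 0.10)
random_sample = sample_randomly(logs, 0.10, seed=0)
```

With a 10% rate, the interval variant keeps the first entry and then every tenth entry after it, while the random variant draws 10 distinct entries from the 100 collected.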
In step S12, the computer device identifies the sampled data according to the classification and grading rule, determines the sensitive classification corresponding to the sampled data, and obtains the sensitivity level information corresponding to the sampled data under that sensitive classification. In some embodiments, the classification and grading rule includes a plurality of predetermined sensitive classifications (for example, a personal identity information class and a personal property information class) and an identification policy for each sensitive classification. The identification policy is used to identify whether data includes the corresponding sensitive classification; specific identification manners include, but are not limited to, regular expression identification, keyword identification, and model feature identification. The classification and grading rule further includes a grading policy, which is used to determine the sensitivity level of the data under a sensitive classification once the data has been identified as including that sensitive classification. The sensitivity level may be represented numerically (for example, the larger the value, the more sensitive or less secure the data) or textually (for example, "lightly sensitive", "moderately sensitive", "severely sensitive"). If, according to the rule, data is identified as including the sensitive classification corresponding to a certain identification policy, the sensitivity level of the data under that sensitive classification is then determined according to the grading policy.
In some embodiments, each sensitive classification may correspond to a single sensitivity level; if data is identified as including the sensitive classification corresponding to a certain identification policy, the sensitivity level corresponding to that sensitive classification is used directly as the sensitivity level of the data under that classification. Alternatively, each sensitive classification may correspond to a plurality of different sensitivity levels, in which case the sensitivity level of the data under the sensitive classification needs to be determined according to the grading policy. Specific determination methods include, but are not limited to, semantic analysis, keyword extraction, and model features. For example, the sensitivity level of the data under the sensitive classification is determined by semantic analysis; or mapping relations between a plurality of keywords and sensitivity levels are set in advance, and the sensitivity level is determined according to the level mapped by the keywords included in the data; or the data is input into a trained sensitivity level model corresponding to the sensitive classification to obtain the sensitivity level output by the model.
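As a hedged illustration of step S12, the following Python sketch models a rule as a list of sensitive classifications, each with a regular-expression and keyword identification step and a keyword-to-level mapping for grading. The class `SensitiveClassification`, the function `classify_and_grade`, the 18-character ID pattern, and the example keywords and levels are all hypothetical; semantic analysis and model-based identification are not shown.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SensitiveClassification:
    name: str
    patterns: List[re.Pattern] = field(default_factory=list)  # regex identification
    keywords: List[str] = field(default_factory=list)         # keyword identification
    default_level: int = 1                                     # single-level case
    keyword_levels: Dict[str, int] = field(default_factory=dict)  # keyword -> level

    def matches(self, text: str) -> bool:
        """Identification step: does the text include this classification?"""
        if any(p.search(text) for p in self.patterns):
            return True
        lowered = text.lower()
        return any(k.lower() in lowered for k in self.keywords)

    def level_for(self, text: str) -> int:
        """Grading step: highest level mapped by any keyword, else the default."""
        lowered = text.lower()
        hits = [lvl for kw, lvl in self.keyword_levels.items() if kw.lower() in lowered]
        return max(hits, default=self.default_level)


def classify_and_grade(text: str,
                       rule: List[SensitiveClassification]) -> List[Tuple[str, int]]:
    """Return (classification name, sensitivity level) for every classification hit."""
    return [(c.name, c.level_for(text)) for c in rule if c.matches(text)]


# Hypothetical rule with two sensitive classifications.
RULE = [
    SensitiveClassification(
        name="personal identity information",
        patterns=[re.compile(r"\b\d{17}[\dXx]\b")],  # assumed 18-character ID format
        keywords=["id_number"],
        default_level=3,
    ),
    SensitiveClassification(
        name="personal property information",
        keywords=["balance", "salary"],
        default_level=1,
        keyword_levels={"balance": 2, "salary": 3},
    ),
]

result = classify_and_grade("UPDATE accounts SET balance = 100 WHERE user = 'u1'", RULE)
```

Here a larger number means more sensitive, matching the numerical representation described above; a textual scale could be substituted by mapping the integers to labels.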
In step S13, the computer device applies a sensitive mark to at least one piece of sampled data according to the sensitivity level information and stores the at least one piece of sampled data, where the sensitive mark includes the sensitive classification and the sensitivity level information corresponding to each piece of sampled data. In some embodiments, sampled data whose sensitivity level meets a predetermined level threshold may be determined to be sensitive data; for example, sampled data whose sensitivity level is greater than or equal to a predetermined level value threshold may be determined to be sensitive data. Alternatively, sampled data identified as including at least one sensitive classification in the classification and grading rule may be determined to be sensitive data. In some embodiments, the sensitive data (i.e., the at least one piece of sampled data) is marked with a corresponding sensitive mark, which includes the sensitive classification and the sensitivity level corresponding to the sensitive data, and the sensitive data is stored. In some embodiments, the user may search the stored sensitive data to obtain the data required, or may perform asset visualization analysis on the stored sensitive data and export a report. In some embodiments, external invocation of the stored sensitive data and/or export of the stored sensitive data is also supported. With the present application, the log data of the database can be intelligently classified and graded through a preset classification and grading rule and automatically marked with the corresponding sensitive marks, and the classification and grading rule used can be selected flexibly, so that an intelligent, process-based workflow is realized.
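The threshold-based marking and storing of step S13 might look like the following Python sketch. The types `SensitiveMark` and `SampledData`, the function `mark_and_store`, and the in-memory list used as storage are assumptions for illustration; a real deployment would use whatever persistent store is available.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SensitiveMark:
    classification: str
    level: int


@dataclass
class SampledData:
    content: str
    classification: Optional[str]  # None when no sensitive classification was identified
    level: int = 0
    mark: Optional[SensitiveMark] = None


def mark_and_store(samples: List[SampledData], level_threshold: int,
                   store: List[SampledData]) -> List[SampledData]:
    """Mark and store samples whose level is at or above the threshold."""
    for sample in samples:
        if sample.classification is None:
            continue  # nothing sensitive was identified in this sample
        if sample.level >= level_threshold:
            sample.mark = SensitiveMark(sample.classification, sample.level)
            store.append(sample)
    return store


samples = [
    SampledData("SELECT name FROM t", None),
    SampledData("SELECT id_number FROM users", "personal identity information", 3),
    SampledData("SELECT nickname FROM users", "personal identity information", 1),
]
stored = mark_and_store(samples, level_threshold=2, store=[])
```

Setting `level_threshold` to the lowest possible level reproduces the alternative in which any sample that includes at least one sensitive classification is treated as sensitive data.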
In some embodiments, step S11 includes the computer device sampling the log data of the database through a plug-in already deployed on the database to obtain one or more pieces of sampled data. In some embodiments, the plug-in can be deployed (or installed) on the database to sample the log data of the database automatically. The plug-in is a semi-invasive software application, so it is low in cost, convenient, and highly extensible; because it acquires the log data of the database directly, its results are also more accurate.
In some embodiments, the method further comprises the computer device determining the classification and grading rule corresponding to the database. In some embodiments, the classification and grading rule to be used for the database needs to be determined first. It may be obtained as the rule that a user selects for the database from a plurality of default classification and grading rules, or it may be determined automatically: semantic analysis is performed on the stored data in the database to determine the stored data features corresponding to the database, and a classification and grading rule matching those features is then determined.
In some embodiments, determining the classification and grading rule corresponding to the database includes obtaining the classification and grading rule that a user selects for the database from a plurality of default classification and grading rules. In some embodiments, a plurality of default classification and grading rules are preset, and the user can freely select one of them as the rule corresponding to the database. For example, the rule may classify according to personal information protection and include sensitive classifications such as personal identity information and personal property information; as another example, the rule may classify according to the telecommunications carrier industry and include sensitive classifications such as user profile, location data, and consumption information.
In some embodiments, determining the classification and grading rule corresponding to the database includes determining the stored data features corresponding to the database by performing semantic analysis on the stored data in the database, and determining the classification and grading rule that matches the stored data features. In some embodiments, the stored data in the database may be semantically analyzed to obtain the stored data features of the database, which characterize what type of data, with what kind of features, the database mainly stores. Based on the stored data features, a classification and grading rule suited to stored data of that type is then automatically determined from a plurality of default classification and grading rules and taken as the classification and grading rule corresponding to the database. For example, if the stored data features indicate that the database mainly stores conversational message text, the suitable rule may be one that classifies according to social information and includes sensitive classifications such as personal chat information, personal space release information, and personal information.
In some embodiments, determining the classification and grading rule that matches the stored data features includes determining the sensitive scene information related to the database according to the stored data features, and determining the classification and grading rule that matches the sensitive scene information. In some embodiments, the sensitive scene information related to the database may be determined according to the stored data features, and the classification and grading rule suited to that sensitive scene may then be automatically determined from a plurality of default classification and grading rules. For example, if the stored data features indicate that the database mainly stores data such as commodity links, commodity prices, and receiving addresses, the sensitive scene related to the database may be determined to be a shopping scene, and a rule suited to the shopping scene, including sensitive classifications such as personal consumption information, personal payment information, personal contact information, and personal address information, may be automatically determined from the plurality of default classification and grading rules.
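The rule-selection embodiments above (user selection, or automatic matching through a sensitive scene) can be sketched in Python as follows. This sketch assumes the stored data features have already been extracted as a set of data-type labels; the tables `SCENE_FEATURES` and `DEFAULT_RULES`, the overlap-count matching, and the function names are hypothetical stand-ins for the semantic analysis the application describes.

```python
from typing import Dict, List, Optional, Set

# Assumed mapping from sensitive scenes to the data types that suggest them.
SCENE_FEATURES: Dict[str, Set[str]] = {
    "shopping": {"commodity link", "commodity price", "receiving address"},
    "social": {"chat message", "space post", "profile"},
}

# Assumed default rules, reduced here to their lists of sensitive classifications.
DEFAULT_RULES: Dict[str, List[str]] = {
    "shopping": [
        "personal consumption information",
        "personal payment information",
        "personal contact information",
        "personal address information",
    ],
    "social": [
        "personal chat information",
        "personal space release information",
        "personal information",
    ],
}


def infer_scene(features: Set[str]) -> Optional[str]:
    """Pick the scene whose expected data types overlap most with the features."""
    best_scene, best_overlap = None, 0
    for scene, expected in SCENE_FEATURES.items():
        overlap = len(features & expected)
        if overlap > best_overlap:
            best_scene, best_overlap = scene, overlap
    return best_scene


def select_rule(features: Set[str], user_choice: Optional[str] = None) -> List[str]:
    """A valid user selection wins; otherwise fall back to scene inference."""
    scene = user_choice if user_choice in DEFAULT_RULES else infer_scene(features)
    return DEFAULT_RULES.get(scene, [])


shopping_rule = select_rule({"commodity price", "receiving address"})
```

An empty list is returned when neither a user selection nor a matching scene is available, leaving the caller to decide on a fallback.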
In some embodiments, the classification and grading rule comprises a classification policy and a sensitivity level policy, and step S12 includes the computer device identifying the sampled data according to the classification policy, determining the sensitive classification corresponding to the sampled data, and determining, according to the sensitivity level policy, the sensitivity level information corresponding to the sampled data under that sensitive classification. In some embodiments, the classification policy generally includes a plurality of preset sensitive classifications and is used to identify whether the sampled data includes one or more of them; specific identification manners include, but are not limited to, regular expression identification, keyword identification, and model feature identification. In some embodiments, one piece of sampled data may correspond to multiple sensitive classifications; that is, if a certain sensitive classification is identified in the sampled data according to the classification policy, it may be used directly as one of the sensitive classifications corresponding to the sampled data. In some embodiments, one piece of sampled data may correspond to only one sensitive classification. If the sampled data is identified as including multiple different sensitive classifications, one of them needs to be determined as the sensitive classification corresponding to the sampled data. For example, each identified sensitive classification has an identification confidence or matching degree, and the sensitive classification with the highest identification confidence or matching degree may be used as the sensitive classification corresponding to the sampled data.
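The two embodiments for handling multiple identified classifications can be sketched as below. The function `resolve_classifications`, its `single` flag, and the example confidence values are assumptions for illustration; how confidences are produced is left to the identification manner in use.

```python
from typing import Dict, List


def resolve_classifications(confidences: Dict[str, float], single: bool) -> List[str]:
    """Given identification confidences per sensitive classification, return
    either every identified classification or only the most confident one."""
    if not confidences:
        return []
    if single:
        return [max(confidences, key=confidences.get)]
    return list(confidences)


# Hypothetical identification result for one piece of sampled data.
hits = {
    "personal identity information": 0.62,
    "personal property information": 0.91,
}
only_one = resolve_classifications(hits, single=True)
all_hits = resolve_classifications(hits, single=False)
```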
In some embodiments, after the sensitivity class corresponding to the sampled data is determined, the sensitivity level corresponding to the sampled data under that sensitivity class is further determined according to the sensitivity level policy. The sensitivity level may be represented numerically (e.g., the larger the value, the more sensitive or less secure the data) or textually (e.g., "lightly sensitive," "moderately sensitive," "severely sensitive"). In some embodiments, the sensitivity level policy may include a sensitivity level corresponding to each sensitivity class, and the sensitivity level of the class to which the sampled data corresponds may be used directly as the sensitivity level of the sampled data under that class. In some embodiments, the sensitivity level policy may instead include a specific manner of determining the sensitivity level for each sensitivity class, including but not limited to semantic analysis, keyword extraction, and model features, and the sensitivity level of the sampled data under its sensitivity class is determined in the manner specified for that class. For example, the sensitivity level is determined by semantic analysis; or mapping relations between a plurality of keywords and sensitivity levels are preset, and the sensitivity level is determined according to the level mapped by the keywords included in the data; or the data is input into a trained sensitivity level model corresponding to the sensitivity class to obtain the sensitivity level output by the model.
In some embodiments, if the sampled data corresponds to a plurality of sensitivity classifications, the sensitivity level of the sampled data under each sensitivity classification needs to be determined first, and then the sensitivity level of the sampled data under the plurality of sensitivity classifications is determined according to the plurality of sensitivity levels.
In some embodiments, the sensitivity level policy includes first sensitivity level information corresponding to each sensitivity category under the category policy, wherein the determining the sensitivity level information corresponding to the sample data under the sensitivity category according to the sensitivity level policy includes obtaining the first sensitivity level information corresponding to the sensitivity category according to the sensitivity level policy, and determining the sensitivity level information corresponding to the sample data under the sensitivity category according to the first sensitivity level information. In some embodiments, each sensitivity category corresponds to a sensitivity level, and the sensitivity level policy includes a first sensitivity level corresponding to each sensitivity category under the category policy. In some embodiments, after determining the sensitivity class corresponding to the sample data, the first sensitivity level corresponding to the sensitivity class may be directly used as the sensitivity level corresponding to the sample data under the sensitivity class, or the first sensitivity level corresponding to the sensitivity class may be input into a predetermined functional relation, and the output of the functional relation may be used as the sensitivity level corresponding to the sample data under the sensitivity class.
In some embodiments, the sensitivity level policy further includes a plurality of sub-sensitivity classifications corresponding to each sensitivity classification under the classification policy and second sensitivity level information corresponding to each sub-sensitivity classification, wherein determining the sensitivity level information corresponding to the sampled data under the sensitivity classification according to the first sensitivity level information includes determining, according to the sensitivity level policy, the sub-sensitivity classification corresponding to the sampled data under the sensitivity classification, and determining the sensitivity level information corresponding to the sampled data under the sensitivity classification according to the first sensitivity level information and the second sensitivity level information corresponding to the sub-sensitivity classification. In some embodiments, each sensitivity classification further corresponds to a plurality of sub-sensitivity classifications, each of which corresponds to a sensitivity level, and the sensitivity level policy further includes these sub-sensitivity classifications and the second sensitivity level corresponding to each of them.
In some embodiments, after the sensitivity class corresponding to the sampled data is determined, the sub-sensitivity class corresponding to the sampled data under that sensitivity class needs to be determined according to the sensitivity level policy. Specific determination manners include, but are not limited to, regular expression recognition, keyword recognition, semantic analysis, and model feature recognition, through which the sensitivity level policy can identify the specific sub-sensitivity class under the sensitivity class to which the sampled data belongs. The sensitivity level of the sampled data under the sensitivity class is then determined according to the first sensitivity level corresponding to the sensitivity class and the second sensitivity level corresponding to the sub-sensitivity class to which the sampled data belongs. For example, if the sensitivity levels are numerical, the average, maximum, or minimum of the first and second sensitivity levels may be used as the sensitivity level of the sampled data under the sensitivity class; alternatively, the first and second sensitivity levels may be input into a predetermined functional relation, and the output of the functional relation used as the sensitivity level of the sampled data under the sensitivity class.
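The combination of a first (classification-level) and a second (sub-classification-level) sensitivity level can be sketched in Python as follows, assuming numerical levels. The names `COMBINERS` and `combine_levels` and the example weighted function are illustrative assumptions; the application leaves the predetermined functional relation open.

```python
from typing import Callable, Dict, Union

Combiner = Callable[[float, float], float]

# The three simple combinations named in the text.
COMBINERS: Dict[str, Combiner] = {
    "mean": lambda first, second: (first + second) / 2,
    "max": max,
    "min": min,
}


def combine_levels(first_level: float, second_level: float,
                   how: Union[str, Combiner] = "max") -> float:
    """Combine the classification's first level with the sub-classification's
    second level, using a named strategy or a custom functional relation."""
    if callable(how):
        return how(first_level, second_level)
    return COMBINERS[how](first_level, second_level)


def weighted(first: float, second: float) -> float:
    """Hypothetical functional relation that favors the sub-classification."""
    return 0.3 * first + 0.7 * second


mean_level = combine_levels(2, 4, "mean")
custom_level = combine_levels(2, 4, weighted)
```

Using `max` is the conservative choice, since the result is never lower than either input level.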
In some embodiments, the sensitivity level policy includes a plurality of sub-sensitivity classifications corresponding to each sensitivity classification under the classification policy and third sensitivity level information corresponding to each sub-sensitivity classification, wherein determining the sensitivity level information corresponding to the sample data under the sensitivity classification according to the sensitivity level policy includes determining the sub-sensitivity classification corresponding to the sample data under the sensitivity classification according to the sensitivity level policy, and determining the sensitivity level information corresponding to the sample data under the sensitivity classification according to the third sensitivity level information corresponding to the sub-sensitivity classification. In some embodiments, the third sensitivity level is the same as or similar to the second sensitivity level described above, and will not be described again. In some embodiments, after determining the sensitivity classification corresponding to the sample data, determining a sub-sensitivity classification corresponding to the sample data under the sensitivity classification according to the sensitivity level policy, and determining the sensitivity level corresponding to the sample data under the sensitivity classification according to only the third sensitivity level corresponding to the sub-sensitivity classification, for example, taking the third sensitivity level corresponding to the sub-sensitivity classification as the sensitivity level corresponding to the sample data under the sensitivity classification.
In some embodiments, the method further comprises the steps of obtaining a rechecking result of a user on stored target sample data, and adjusting a sensitive mark of the target sample data according to the rechecking result if at least one of sensitive classification and sensitive level information corresponding to the target sample data is inconsistent with the rechecking result. In some embodiments, all the sensitive data (i.e., at least one sample data) may be presented to the user for review by the user, or only sensitive data with identification confidence or matching below or equal to a predetermined threshold may be presented to the user for review, where review refers to the user manually checking whether the sensitive marker on which the sensitive data was marked is accurate. In some embodiments, if the rechecking result of the user for the target sensitive data (i.e. the target sampling data) indicates that the marked sensitive mark of the target sensitive data is inaccurate, that is, if at least one of the sensitive classification and the sensitive level corresponding to the target sensitive data is inconsistent with the rechecking result, the sensitive mark of the target sensitive data needs to be adjusted according to the rechecking result, wherein specific adjustment manners include, but are not limited to, modifying the sensitive classification in the sensitive mark, modifying the sensitive level information in the sensitive mark, removing the sensitive mark for the target sensitive data, and canceling storing the target sensitive data.
In some embodiments, adjusting the sensitive mark of the target sample data includes at least one of modifying the sensitive classification in the sensitive mark, modifying the sensitivity level information in the sensitive mark, removing the sensitive mark from the target sample data, and canceling storage of the target sample data. In some embodiments, if the rechecked sensitivity level corresponding to the target sensitive data does not meet the predetermined level threshold (for example, the rechecked sensitivity level is smaller than the predetermined level value threshold), the sensitive mark applied to the target sensitive data is deleted and storage of the target sensitive data is canceled. In some embodiments, if the rechecking result indicates that the target sensitive data does not include any predetermined sensitive classification in the classification and grading rule, the sensitive mark applied to the target sensitive data is deleted and storage of the target sensitive data is canceled.
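The adjustment of a sensitive mark after manual rechecking can be sketched as below. The types `StoredItem` and `ReviewResult`, the function `apply_review`, and the list-based store are hypothetical; the sketch treats "no classification found" and "level below threshold" as the two cases that remove the mark and cancel storage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredItem:
    content: str
    classification: str
    level: int


@dataclass
class ReviewResult:
    classification: Optional[str]  # None: reviewer found no predetermined classification
    level: int = 0


def apply_review(store: List[StoredItem], item: StoredItem,
                 review: ReviewResult, level_threshold: int) -> bool:
    """Apply a rechecking result; return True if the item remains stored."""
    if review.classification is None or review.level < level_threshold:
        store.remove(item)  # drop the mark and cancel storage together
        return False
    # Overwrite the mark with the reviewer's values (a no-op when they agree).
    item.classification = review.classification
    item.level = review.level
    return True


a = StoredItem("SELECT id_number FROM users", "personal property information", 2)
b = StoredItem("SELECT nickname FROM users", "personal identity information", 2)
store = [a, b]
kept_a = apply_review(store, a, ReviewResult("personal identity information", 3), level_threshold=2)
kept_b = apply_review(store, b, ReviewResult(None), level_threshold=2)
```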
In some embodiments, the method further comprises the computer device adjusting the sampling rate of the database according to the sensitivity level information, so that new log data of the database is subsequently sampled at the adjusted sampling rate. In some embodiments, according to the sensitivity levels corresponding to the sampled data, the ratio of the number of target sampled data whose sensitivity level meets a predetermined level threshold (for example, the target sampled data may be the at least one piece of sampled data marked with a sensitive mark, that is, the sensitive data) to the total number of the one or more pieces of sampled data may be calculated; for example, the ratio of the number of sampled data whose sensitivity level is greater than or equal to a predetermined level value threshold to the total number of sampled data collected. If the ratio is greater than or equal to a predetermined first ratio threshold, the original sampling rate of the database may be increased, for example by a default amplitude. Alternatively, the original sampling rate of the database may be decreased based on a predetermined second ratio threshold, for example by a default amplitude, or the amplitude of the adjustment may be determined dynamically. The adjusted sampling rate is then used the next time new log data of the database needs to be sampled.
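A minimal Python sketch of the sampling-rate adjustment follows. It assumes that the rate is raised when the sensitive ratio reaches the first ratio threshold and lowered when the ratio falls to or below the second ratio threshold; the function name, the default threshold values, the fixed `step` amplitude, and the clamping bounds are all assumptions, and a dynamically determined amplitude is not shown.

```python
def adjust_sampling_rate(rate: float, sensitive_count: int, total_count: int,
                         first_ratio_threshold: float = 0.5,
                         second_ratio_threshold: float = 0.1,
                         step: float = 0.05,
                         min_rate: float = 0.01,
                         max_rate: float = 1.0) -> float:
    """Return the sampling rate to use for the next round of sampling."""
    if total_count <= 0:
        return rate  # nothing was sampled, so there is no ratio to act on
    ratio = sensitive_count / total_count
    if ratio >= first_ratio_threshold:
        rate += step   # many sensitive hits: sample more densely
    elif ratio <= second_ratio_threshold:
        rate -= step   # few sensitive hits: sample more sparsely
    # Clamp so the rate always stays a usable fraction.
    return min(max_rate, max(min_rate, rate))


raised = adjust_sampling_rate(0.10, sensitive_count=6, total_count=10)
lowered = adjust_sampling_rate(0.10, sensitive_count=0, total_count=10)
unchanged = adjust_sampling_rate(0.10, sensitive_count=3, total_count=10)
```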
FIG. 2 shows a flow chart of a method for marking sensitive data according to one embodiment of the application.
As shown in Fig. 2, the intelligent data classification and grading system comprises an audit log and database asset acquisition module, a data classification and grading engine, an intelligent policy configuration module, a manual rechecking module, a storage module, and an asset retrieval and analysis module. The audit log and database asset acquisition module samples the log data of the database by means of a plug-in. The intelligent policy configuration module intelligently configures the classification and grading rule used by the data classification and grading engine. The data classification and grading engine uses the classification and grading rule to identify sensitive data in the log data. The manual rechecking module is used for manual rechecking of the identification results for the log data. The storage module stores the identified data. The asset retrieval and analysis module calls the stored assets to retrieve the identified data and performs visual analysis on the identified data.
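How the modules of Fig. 2 might hand data to one another can be sketched in Python with plain callables standing in for each module. The function names `run_pipeline` and `search`, the tuple-based record format, and the stand-in lambdas are assumptions for illustration and do not reflect the actual module interfaces.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Result = Tuple[str, int]       # (sensitive classification, sensitivity level)
Record = Tuple[str, str, int]  # (log entry, classification, level)


def run_pipeline(collect: Callable[[], Iterable[str]],
                 classify: Callable[[str], Optional[Result]],
                 review: Callable[[str, Result], Optional[Result]],
                 storage: List[Record]) -> List[Record]:
    """Acquisition -> classification and grading engine -> manual rechecking -> storage."""
    for entry in collect():
        result = classify(entry)
        if result is None:
            continue  # engine found nothing sensitive
        reviewed = review(entry, result)
        if reviewed is not None:
            storage.append((entry, reviewed[0], reviewed[1]))
    return storage


def search(storage: List[Record], classification: str) -> List[Record]:
    """Retrieval step: look up stored records by sensitive classification."""
    return [record for record in storage if record[1] == classification]


# Stand-ins for the real modules.
collect = lambda: ["SELECT balance FROM accounts", "SELECT 1"]
classify = lambda e: ("personal property information", 2) if "balance" in e else None
review = lambda e, r: r  # reviewer accepts every result unchanged

storage = run_pipeline(collect, classify, review, [])
found = search(storage, "personal property information")
```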
Fig. 3 shows a block diagram of a computer device for marking sensitive data according to one embodiment of the application. The device comprises a first module 11, a second module 12, and a third module 13. The first module 11 is configured to sample log data of a database to obtain one or more pieces of sampled data. The second module 12 is configured to identify the sampled data according to a classification and grading rule, determine the sensitive classification corresponding to the sampled data, and obtain the sensitivity level information corresponding to the sampled data under that sensitive classification. The third module 13 is configured to apply a sensitive mark to at least one piece of sampled data according to the sensitivity level information and to store the at least one piece of sampled data, where the sensitive mark comprises the sensitive classification and the sensitivity level information corresponding to each piece of sampled data.
The first module 11 is configured to sample and collect log data of the database to obtain one or more sample data. In some embodiments, the log data refers to operation behavior log data of the database, including, but not limited to, for a given operation behavior on the database, the behavior time, the behavior content (e.g., reading, inserting, modifying, or deleting stored data in the database), the behavior result (e.g., whether the operation succeeded), the behavior object (i.e., at least one stored data in the database), and the like. In some embodiments, the log data of the database may be sampled at a predetermined sampling rate, and the sampling may be performed at intervals in the chronological order of the behaviors. For example, if the sampling rate is 10%, the log data of 1 operation behavior is collected for every 10 operation behaviors occurring in the database; that is, if the log data of operation behavior 1 is collected, the next log data collected is that of operation behavior 2, which is the tenth operation behavior occurring after operation behavior 1 in chronological order. In some embodiments, the sampling may instead be random sampling without regard to order. For example, a plurality of log data is first collected for all operation behaviors of the database, and, if the sampling rate is 10% and the number of the plurality of log data is Num, Num × 10% log data are randomly extracted from the plurality of log data as the sample data.
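The two sampling modes described above may be sketched as follows. This is a minimal illustration only: the function names sample_by_interval and sample_randomly, the representation of log data as list items, and the optional seed are assumptions made for the example and are not part of the application.

```python
import random

def sample_by_interval(logs, rate):
    """Keep one record out of every round(1 / rate) records, in chronological order."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    step = round(1 / rate)          # e.g. a rate of 0.10 keeps every 10th record
    return logs[::step]

def sample_randomly(logs, rate, seed=None):
    """Draw len(logs) * rate records at random, without regard to order."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    count = int(len(logs) * rate)   # Num * 10% when the rate is 0.10
    return random.Random(seed).sample(logs, count)

# 100 hypothetical operation behaviors, already in chronological order.
logs = [f"op-{i}" for i in range(1, 101)]
interval_sample = sample_by_interval(logs, 0.10)
random_sample = sample_randomly(logs, 0.10, seed=0)
```

With a sampling rate of 10%, the interval mode keeps the first record and then the tenth record after each kept record, while the random mode returns 10 of the 100 records in no particular order.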
The second module 12 is configured to identify the sample data according to the classification and grading rule, determine the sensitive classification corresponding to the sample data, and obtain the sensitivity level information corresponding to the sample data under the sensitive classification. In some embodiments, the classification and grading rule includes a plurality of predetermined sensitive classifications (for example, a personal identity information class and a personal property information class) and an identification policy for each sensitive classification. The identification policy is used for identifying whether the data includes content of the corresponding sensitive classification; specific identification manners include, but are not limited to, regular expression identification, keyword identification, model feature identification, and the like. The classification and grading rule further includes a grading policy, which is used for determining, when the data is identified as including the sensitive classification corresponding to an identification policy, the sensitivity level of the data under that sensitive classification. The sensitivity level may be expressed numerically, in which case, for example, a larger value indicates that the corresponding data is more sensitive or less secure, or it may be expressed textually, for example, as "lightly sensitive", "moderately sensitive", "heavily sensitive", and the like. If the data is identified, according to the classification and grading rule, as including the sensitive classification corresponding to a certain identification policy, the sensitivity level of the data under that sensitive classification is then further determined according to the classification and grading rule.
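By way of illustration, a rule that pairs each sensitive classification with an identification policy and a sensitivity level may be sketched as follows. The 18-character number pattern, the keywords, the level values, and the names SensitiveClassification, RULES and classify are hypothetical and chosen only for the example; the application does not prescribe them.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class SensitiveClassification:
    name: str
    identify: Callable[[str], bool]   # identification policy for this classification
    level: int                        # numeric sensitivity level (larger = more sensitive)

# Hypothetical rule: one regular-expression policy and one keyword policy.
RULES = [
    SensitiveClassification(
        name="personal identity information",
        identify=lambda text: re.search(r"\b\d{17}[\dXx]\b", text) is not None,
        level=3,
    ),
    SensitiveClassification(
        name="personal property information",
        identify=lambda text: any(k in text.lower() for k in ("balance", "salary")),
        level=2,
    ),
]

def classify(text: str) -> list[tuple[str, int]]:
    """Return every (sensitive classification, sensitivity level) matched by the text."""
    return [(rule.name, rule.level) for rule in RULES if rule.identify(text)]
```

A sample that matches no identification policy yields an empty list, which corresponds to data in which no sensitive classification is identified.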
In some embodiments, a sensitive classification may correspond to a single sensitivity level; in this case, if the data is identified as including the sensitive classification corresponding to a certain identification policy, the sensitivity level corresponding to that sensitive classification is used directly as the sensitivity level of the data under that sensitive classification. Alternatively, a sensitive classification may correspond to a plurality of different sensitivity levels; in this case, the sensitivity level of the data under the sensitive classification needs to be determined according to the grading policy. Specific determination methods include, but are not limited to, a semantic analysis method, a keyword extraction method, a model feature method, and the like. For example, the sensitivity level of the data under the sensitive classification is determined by semantic analysis; or a mapping relationship between keywords and sensitivity levels is preset, and the sensitivity level of the data under the sensitive classification is determined according to the sensitivity level mapped by the keywords included in the data; or the data is input into a trained sensitivity level model corresponding to the sensitive classification to obtain the sensitivity level output by the model.
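The keyword-based determination mentioned above may be sketched as follows. The mapping KEYWORD_LEVELS and the function level_from_keywords are hypothetical, and taking the highest level when several keywords are present is an assumption of this sketch rather than something the application specifies.

```python
# Hypothetical preset mapping between keywords and sensitivity levels
# for a single sensitive classification.
KEYWORD_LEVELS = {"password": 3, "phone": 2, "nickname": 1}

def level_from_keywords(text, keyword_levels=KEYWORD_LEVELS, default=0):
    """Return the highest level among the mapped keywords found in the text.

    Using the maximum is an assumption; `default` is returned when no keyword matches.
    """
    lowered = text.lower()
    hits = [level for keyword, level in keyword_levels.items() if keyword in lowered]
    return max(hits, default=default)
```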
The third module 13 is configured to mark at least one sample data with a sensitive mark according to the sensitivity level information and to store the at least one sample data, where the sensitive mark includes the sensitive classification and the sensitivity level information corresponding to each sample data. In some embodiments, sample data whose sensitivity level meets a predetermined level threshold may be determined as sensitive data according to the sensitivity level corresponding to the sample data; for example, sample data whose sensitivity level is greater than or equal to a predetermined level threshold is determined as sensitive data. Alternatively, sample data identified as including at least one sensitive classification in the classification and grading rule may be determined as sensitive data. In some embodiments, the sensitive data (i.e., the at least one sample data) is marked with a corresponding sensitive mark, which includes the sensitive classification and the sensitivity level corresponding to the sensitive data, and the sensitive data is stored. In some embodiments, the user may search the stored sensitive data to obtain the required data, or may perform asset visualization analysis on the stored sensitive data and export a report. In some embodiments, external invocation of the stored sensitive data and/or export of the stored sensitive data is also supported. According to the application, the log data of the database can be intelligently classified and graded by means of a preset classification and grading rule and automatically marked with the corresponding sensitive marks, and the classification and grading rule used can be flexibly selected, so that an intelligent workflow is realized.
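The threshold-based marking and storing described above may be sketched as follows. The names SensitiveMark, LEVEL_THRESHOLD and mark_and_store, the threshold value, and the use of a dictionary as the storage are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SensitiveMark:
    classification: str   # sensitive classification of the sample data
    level: int            # sensitivity level under that classification

LEVEL_THRESHOLD = 2       # hypothetical predetermined level threshold

def mark_and_store(samples, store):
    """Mark and store every sample whose level reaches the threshold.

    samples: iterable of (data, classification, level) tuples.
    store:   dictionary standing in for the storage module.
    """
    for data, classification, level in samples:
        if level >= LEVEL_THRESHOLD:
            store[data] = SensitiveMark(classification, level)
    return store

store = mark_and_store(
    [("row-a", "personal identity information", 3),
     ("row-b", "personal property information", 1)],
    {},
)
```

In this example only "row-a" is stored, because the level of "row-b" is below the hypothetical threshold.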
In some embodiments, the first module 11 is configured to sample and collect log data of a database by using a plug-in deployed on the database to obtain one or more sample data. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and are therefore not described in detail herein but are incorporated herein by reference.
In some embodiments, the apparatus is further configured to determine a classification and grading rule corresponding to the database. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and are therefore not described in detail herein but are incorporated herein by reference.
In some embodiments, the determining the classification and grading rule corresponding to the database includes obtaining the classification and grading rule corresponding to the database that is selected by a user from a plurality of default classification and grading rules. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and are therefore not described in detail herein but are incorporated herein by reference.
In some embodiments, the determining the classification and grading rule corresponding to the database includes determining a stored data feature corresponding to the database by performing semantic analysis on stored data in the database, and determining, according to the stored data feature, a classification and grading rule matching the stored data feature. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and are therefore not described in detail herein but are incorporated herein by reference.
In some embodiments, the determining, according to the stored data feature, the classification and grading rule matching the stored data feature includes determining related sensitive scene information corresponding to the database according to the stored data feature, and determining, according to the related sensitive scene information, a classification and grading rule matching the related sensitive scene information. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and are therefore not described in detail herein but are incorporated herein by reference.
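One possible way of going from stored data features to a sensitive scene and then to a matching rule may be sketched as follows. Here a simple term overlap stands in for the semantic analysis, which the application does not detail; the scene names, feature terms, rule names and the function match_rule are all hypothetical.

```python
# Hypothetical sensitive scenes, each with indicative feature terms and a rule name.
SCENES = {
    "finance": {"terms": {"account", "balance", "card"}, "rule": "finance-rules"},
    "healthcare": {"terms": {"patient", "diagnosis", "prescription"}, "rule": "healthcare-rules"},
}

def match_rule(stored_data_features, default_rule="general-rules"):
    """Pick the rule of the scene whose terms overlap the stored data features most.

    Returns (scene, rule); scene is None when no scene term is present at all.
    """
    features = set(stored_data_features)
    best_scene, best_overlap = None, 0
    for scene, spec in SCENES.items():
        overlap = len(features & spec["terms"])
        if overlap > best_overlap:
            best_scene, best_overlap = scene, overlap
    rule = SCENES[best_scene]["rule"] if best_scene else default_rule
    return best_scene, rule
```

For example, features such as "patient" and "diagnosis" would select the hypothetical healthcare scene and its rule, while features matching no scene fall back to the default rule.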
In some embodiments, the classification and grading rule includes a classification policy and a sensitivity level policy, wherein the second module 12 is configured to identify the sample data according to the classification policy, determine a sensitive classification corresponding to the sample data, and determine sensitivity level information corresponding to the sample data under the sensitive classification according to the sensitivity level policy. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and are therefore not described in detail herein but are incorporated herein by reference.
In some embodiments, the sensitivity level policy includes first sensitivity level information corresponding to each sensitive classification under the classification policy, wherein the determining the sensitivity level information corresponding to the sample data under the sensitive classification according to the sensitivity level policy includes obtaining the first sensitivity level information corresponding to the sensitive classification according to the sensitivity level policy, and determining the sensitivity level information corresponding to the sample data under the sensitive classification according to the first sensitivity level information. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and are therefore not described in detail herein but are incorporated herein by reference.
In some embodiments, the sensitivity level policy further includes a plurality of sub-sensitive classifications corresponding to each sensitive classification under the classification policy and second sensitivity level information corresponding to each sub-sensitive classification, wherein the determining the sensitivity level information corresponding to the sample data under the sensitive classification according to the first sensitivity level information includes determining, according to the sensitivity level policy, the sub-sensitive classification corresponding to the sample data under the sensitive classification, and determining the sensitivity level information corresponding to the sample data under the sensitive classification according to the first sensitivity level information and the second sensitivity level information corresponding to the sub-sensitive classification. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and are therefore not described in detail herein but are incorporated herein by reference.
In some embodiments, the sensitivity level policy includes a plurality of sub-sensitive classifications corresponding to each sensitive classification under the classification policy and third sensitivity level information corresponding to each sub-sensitive classification, wherein the determining the sensitivity level information corresponding to the sample data under the sensitive classification according to the sensitivity level policy includes determining, according to the sensitivity level policy, the sub-sensitive classification corresponding to the sample data under the sensitive classification, and determining the sensitivity level information corresponding to the sample data under the sensitive classification according to the third sensitivity level information corresponding to the sub-sensitive classification. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and are therefore not described in detail herein but are incorporated herein by reference.
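The three forms of sensitivity level policy described above (a level per sensitive classification, a level per sensitive classification combined with a level per sub-sensitive classification, and a level per sub-sensitive classification alone) may be sketched as follows. The policy layout, the example values and the function resolve_level are hypothetical, and combining the first and second sensitivity level information by taking the larger value is an assumption, since the application does not state how the two are combined.

```python
def resolve_level(policy, classification, sub_classification=None):
    """Resolve the sensitivity level of sample data under a sensitive classification."""
    entry = policy[classification]
    first = entry.get("first_level")                           # level of the classification itself
    sub = entry.get("sub_levels", {}).get(sub_classification)  # level of the sub-classification
    if first is not None and sub is not None:
        return max(first, sub)   # assumption: combine first and second level by maximum
    if sub is not None:
        return sub               # sub-classification level used alone
    if first is not None:
        return first             # classification level used directly
    raise KeyError(f"no level defined for {classification!r}")

# Hypothetical policy with a first level and per-sub-classification levels.
POLICY_A = {
    "personal identity information": {
        "first_level": 2,
        "sub_levels": {"ID number": 3, "name": 1},
    },
}
# Hypothetical policy that defines levels only per sub-classification.
POLICY_B = {
    "personal property information": {"sub_levels": {"bank card": 3}},
}
```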
In some embodiments, the device is further configured to obtain a review result given by the user for stored target sample data, and, if at least one of the sensitive classification and the sensitivity level information corresponding to the target sample data is inconsistent with the review result, adjust the sensitive mark of the target sample data according to the review result. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and are therefore not described in detail herein but are incorporated herein by reference.
In some embodiments, the adjusting the sensitive mark of the target sample data includes at least one of modifying the sensitive classification in the sensitive mark, modifying the sensitivity level information in the sensitive mark, removing the sensitive mark from the target sample data, and canceling the storage of the target sample data. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and are therefore not described in detail herein but are incorporated herein by reference.
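The four kinds of adjustment listed above may be sketched as follows. The function apply_review, the dictionary used as storage, and the review fields "discard", "not_sensitive", "classification" and "level" are hypothetical names chosen for the example.

```python
def apply_review(store, key, review):
    """Adjust the stored mark of one sample according to a manual review result."""
    if review.get("discard"):
        del store[key]               # cancel the storage of the target sample data
        return store
    if review.get("not_sensitive"):
        store[key] = None            # remove the mark but keep the data stored
        return store
    mark = store[key]
    store[key] = {
        # Keep the existing value wherever the review does not override it.
        "classification": review.get("classification", mark["classification"]),
        "level": review.get("level", mark["level"]),
    }
    return store

store = {
    "row-a": {"classification": "personal identity information", "level": 3},
    "row-b": {"classification": "personal property information", "level": 2},
    "row-c": {"classification": "personal property information", "level": 2},
}
apply_review(store, "row-a", {"level": 2})             # modify the level only
apply_review(store, "row-b", {"not_sensitive": True})  # remove the mark
apply_review(store, "row-c", {"discard": True})        # cancel storage
```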
In some embodiments, the apparatus is further configured to adjust a sampling rate of the database based on the sensitivity level information such that new log data of the database is subsequently sampled using the adjusted sampling rate. The related operations are the same as or similar to those of the embodiment shown in fig. 1, and thus are not described in detail herein, and are incorporated by reference.
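One possible adjustment of the sampling rate may be sketched as follows. The application only states that the sampling rate is adjusted based on the sensitivity level information; the specific policy shown here (raise the rate when highly sensitive samples are frequent, lower it when none are found), together with the function name and every parameter value, is an assumption made for illustration.

```python
def adjust_sampling_rate(rate, levels, high_level=3, step=0.05,
                         min_rate=0.01, max_rate=1.0):
    """Return an adjusted sampling rate based on recently observed sensitivity levels."""
    if not levels:
        return rate                                   # nothing observed, keep the rate
    high_share = sum(1 for level in levels if level >= high_level) / len(levels)
    if high_share > 0.2:
        rate += step                                  # many highly sensitive samples: sample more
    elif high_share == 0:
        rate -= step                                  # none found: sample less
    # Clamp to the allowed range and round away floating-point noise.
    return round(min(max_rate, max(min_rate, rate)), 4)
```

The adjusted rate would then be used when new log data of the database is subsequently sampled.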
In addition to the methods and apparatus described in the above embodiments, the present application also provides a computer-readable storage medium storing computer code which, when executed, performs a method as described in any one of the preceding claims.
The application also provides a computer program product which, when executed by a computer device, performs a method as claimed in any preceding claim.
The present application also provides a computer device comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding claim.
FIG. 4 illustrates an exemplary system that may be used to implement various embodiments described herein.
In some embodiments, as shown in fig. 4, system 300 can function as any of the devices of the various described embodiments. In some embodiments, system 300 may include one or more computer-readable media (e.g., system memory or NVM/storage 320) having instructions and one or more processors (e.g., processor(s) 305) coupled with the one or more computer-readable media and configured to execute the instructions to implement the modules to perform the actions described in the present application.
For one embodiment, the system control module 310 may include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 305 and/or any suitable device or component in communication with the system control module 310.
The system control module 310 may include a memory controller module 330 to provide an interface to the system memory 315. Memory controller module 330 may be a hardware module, a software module, and/or a firmware module.
The system memory 315 may be used, for example, to load and store data and/or instructions for the system 300. For one embodiment, system memory 315 may include any suitable volatile memory, such as, for example, a suitable DRAM. In some embodiments, the system memory 315 may comprise a double data rate type four synchronous dynamic random access memory (DDR 4 SDRAM).
For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 320 and communication interface(s) 325.
For example, NVM/storage 320 may be used to store data and/or instructions. NVM/storage 320 may include any suitable nonvolatile memory (e.g., flash memory) and/or may include any suitable nonvolatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 320 may include storage resources that are physically part of the device on which system 300 is installed or which may be accessed by the device without being part of the device. For example, NVM/storage 320 may be accessed over a network via communication interface(s) 325.
Communication interface(s) 325 may provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. The system 300 may wirelessly communicate with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 305 may be packaged together with logic of one or more controllers (e.g., memory controller module 330) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be packaged together with logic of one or more controllers of the system control module 310 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 305 may be integrated on the same die as logic of one or more controllers of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic of one or more controllers of the system control module 310 to form a system on chip (SoC).
In various embodiments, system 300 may be, but is not limited to being, a server, workstation, desktop computing device, or mobile computing device (e.g., a laptop computing device, handheld computing device, tablet, netbook, etc.). In various embodiments, system 300 may have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, keyboards, Liquid Crystal Display (LCD) screens (including touch screen displays), non-volatile memory ports, multiple antennas, graphics chips, Application Specific Integrated Circuits (ASICs), and speakers.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, e.g., using Application Specific Integrated Circuits (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software program of the present application may be executed by a processor to perform the steps or functions described above. Likewise, the software programs of the present application (including associated data structures) may be stored on a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. In addition, some steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
Furthermore, portions of the present application may be implemented as a computer program product, such as computer program instructions, which, when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application by way of operation of the computer. Those skilled in the art will appreciate that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, and the like, and accordingly, the manners in which computer program instructions are executed by a computer include, but are not limited to, the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled programs, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed programs. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Communication media includes media whereby a communication signal containing, for example, computer readable instructions, data structures, program modules, or other data, is transferred from one system to another. Communication media may include conductive transmission media such as electrical cables and wires (e.g., optical fibers, coaxial, etc.) and wireless (non-conductive transmission) media capable of transmitting energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules, or other data may be embodied as a modulated data signal, for example, in a wireless medium, such as a carrier wave or similar mechanism, such as that embodied as part of spread spectrum technology. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory such as random access memory (RAM, DRAM, SRAM), and non-volatile memory such as flash memory, various read-only memory (ROM, PROM, EPROM, EEPROM), magnetic and ferromagnetic/ferroelectric memory (MRAM, FeRAM), and magnetic and optical storage devices (hard disk, tape, CD, DVD), or other now known or later developed media capable of storing computer-readable information/data for use by a computer system.
An embodiment according to the application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, cause the apparatus to perform the methods and/or technical solutions according to the embodiments of the application described above.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the apparatus claims can also be implemented by means of one unit or means in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.