Summary of the Invention
The technical task of the present invention is to address the above shortcomings by providing a highly practical method and system for desensitizing government data resources.
The present invention provides a method for desensitizing government data resources, implemented as follows:
Step 1: first establish a desensitization rule base, which stores the rules for determining which information is to be desensitized and the desensitization rules themselves;
Step 2: establish a sensitive dictionary, in which sensitive data information is configured and stored;
Step 3: obtain government data, detect the sensitivity of the data according to the information in the sensitive dictionary, and desensitize the detected sensitive data according to the rules stored in the desensitization rule base.
In Step 1, the desensitization rules stored in the desensitization rule base are specified as follows:
First, the sensitive data is determined;
Then the desensitization rules are set as two kinds, general rules and complex rules, and the desensitized data is divided into two classes, recoverable and irrecoverable. A general rule applies a single fixed processing operation to the sensitive data; a complex rule applies a combination of fixed processing operations to the sensitive data. Recoverable data can be restored to the original sensitive data; irrecoverable data cannot be restored to the original sensitive data.
In the step of determining the sensitive data, the definition may be completed manually. The manually defined sensitive data includes: name, ID card number, address, telephone number, bank account number, email address, city, postal code, password types, organization name, business license number, account number, transaction date, and transaction amount.
The fixed processing operations in the desensitization rules include data replacement, data rearrangement, data encryption, data truncation, data masking, and date offset rounding. Data truncation means retaining only part of the data and deleting the rest; data masking means hiding part of the data and replacing it with characters such as "XXX"; date offset rounding means retaining only the time on the hour.
The sensitive data information stored in the sensitive dictionary includes sensitive word information, privacy field information, classified field information, and red/white/black list information. The red list refers to sensitive enterprises and individuals; the white list refers to enterprises and individuals whose data can be fully opened; the blacklist refers to enterprises and individuals with problems.
Detecting the sensitivity of the data in Step 3 means setting detection rules based on the information in the sensitive dictionary, checking the data for private and classified content, and outputting a data detection report. Setting the detection rules includes: privacy field settings, classified field settings, sensitive word settings, and red/white/black list settings.
After Step 3 is completed, the method further includes Step 4, a step of regular or random risk detection: at a user-defined time period, sensitivity detection is performed again on the most recently desensitized data, and any sensitive data detected is desensitized again according to the rules stored in the desensitization rule base.
A system for desensitizing government data resources, comprising:
A desensitization rule base, which stores the rules for determining which information is to be desensitized and the desensitization rules themselves;
A sensitive dictionary, in which sensitive data information is configured and stored;
A desensitization module, which, for the obtained government data, detects the sensitivity of the data according to the information in the sensitive dictionary and desensitizes the detected sensitive data according to the rules stored in the desensitization rule base.
The desensitization module is configured with a detection unit and a desensitization unit, wherein:
The detection unit sets detection rules based on the information in the sensitive dictionary, checks the data for private and classified content, and outputs a data detection report. Setting the detection rules includes: privacy field settings, classified field settings, sensitive word settings, and red/white/black list settings;
The desensitization unit desensitizes the sensitive data detected by the detection unit. The processing is either a single fixed processing operation or a combination of fixed processing operations. The fixed processing operations include data replacement, data rearrangement, data encryption, data truncation, data masking, and date offset rounding. Data truncation means retaining only part of the data and deleting the rest; data masking means hiding part of the data and replacing it with characters such as "XXX"; date offset rounding means retaining only the time on the hour.
The desensitization module is also configured with a timing unit, which periodically starts risk detection: at the user-defined time period of the timing unit, sensitivity detection is performed again on the most recently desensitized data, and any sensitive data detected is desensitized again according to the rules stored in the desensitization rule base.
The method and system for desensitizing government data resources of the present invention have the following advantages:
The method and system desensitize government information resources that involve sensitive or classified content, transforming the sensitive information according to the desensitization rules so that private and sensitive data is reliably protected. This relieves the data providers and data users involved in opening government information resources of the concern that open data may contain sensitive or classified content, and thereby promotes the breadth and depth of the opening and use of government data resources. The desensitized real data sets can be used safely in development, testing, and other non-production environments, as well as in outsourced environments. The invention is highly practical, widely applicable, and easy to popularize.
Detailed Description of the Embodiments
In order to enable those skilled in the art to better understand the solution of the present invention, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
With the rapid development and innovative application of the mobile Internet, cloud computing, and the Internet of Things, big data has penetrated deeply into every field of the economy and society and has become a strategic asset. Governments and enterprises are strengthening their big data processing capabilities, and China has also proposed expanding the opening of government data resources to society and exploring big data application services. At the same time, however, behaviors such as the theft of trade secrets and the leakage of personal information have followed, and the opening of government data faces security threats.
From a policy perspective, in the opening of government data resources, data involving national security, trade secrets, or personal privacy must not be opened, or may be opened only after desensitization. To date, however, the various departments have not yet formed dedicated data desensitization technologies and specifications.
This patent mainly provides a technique for desensitizing the data resources in government data that involve personal privacy and trade secrets.
As shown in Fig. 1, the present invention provides a method for desensitizing government data resources, implemented as follows:
Step 1: first establish a desensitization rule base, which stores the rules for determining which information is to be desensitized and the desensitization rules themselves;
Step 2: establish a sensitive dictionary, in which sensitive data information is configured and stored;
Step 3: obtain government data, detect the sensitivity of the data according to the information in the sensitive dictionary, and desensitize the detected sensitive data according to the rules stored in the desensitization rule base.
In Step 1, the desensitization rules stored in the desensitization rule base are specified as follows:
First, the sensitive data is determined;
Then the desensitization rules are set as two kinds, general rules and complex rules, and the desensitized data is divided into two classes, recoverable and irrecoverable. A general rule applies a single fixed processing operation to the sensitive data; a complex rule applies a combination of fixed processing operations to the sensitive data. Recoverable data can be restored to the original sensitive data; irrecoverable data cannot be restored to the original sensitive data.
More specifically, the desensitization rule base is established to provide management and configuration of privacy information rules and of relevant laws and regulations.
The defined sensitive data is also called private data. Common sensitive data includes: name, ID card number, address, telephone number, bank account number, email address, city, postal code, password types (such as account inquiry passwords, withdrawal passwords, and login passwords), organization name, business license number, account number, transaction date, and transaction amount. Users can manually define sensitive data according to actual business conditions.
The rules in the desensitization rule base should be divided into general rules and complex rules. General rules are required to be fixed in advance so that they can be called directly to desensitize data. Complex rules are required to be highly composable and adaptable: they can be flexibly matched and adjusted, their parameters are configurable, and their models are reusable.
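The distinction between general and complex rules can be illustrated with a minimal Python sketch. The names `truncate_rule`, `mask_rule`, and `combine` are hypothetical and do not come from the patent; the sketch only assumes that a general rule is a single fixed operation with configurable parameters and that a complex rule is an ordered combination of such operations.

```python
from functools import reduce
from typing import Callable

# A rule maps a sensitive value to its desensitized form.
Rule = Callable[[str], str]

def truncate_rule(keep: int) -> Rule:
    """General rule (hypothetical): keep the first `keep` characters."""
    return lambda value: value[:keep]

def mask_rule(keep_head: int, keep_tail: int, fill: str = "x") -> Rule:
    """General rule (hypothetical): hide the middle, preserving the length."""
    def apply(value: str) -> str:
        hidden = len(value) - keep_head - keep_tail
        if hidden <= 0:
            # Value too short to keep anything safely: mask all of it.
            return fill * len(value)
        return value[:keep_head] + fill * hidden + value[len(value) - keep_tail:]
    return apply

def combine(*rules: Rule) -> Rule:
    """Complex rule (hypothetical): apply several general rules in order."""
    return lambda value: reduce(lambda acc, rule: rule(acc), rules, value)

# A reusable, parameterized complex rule for phone numbers:
# first truncate to 7 characters, then mask everything after the first 3.
phone_rule = combine(truncate_rule(7), mask_rule(3, 0))
print(phone_rule("13811001111"))
```

Building rules as small parameterized functions is one way to obtain the configurable parameters and reusable models the text asks for; a production rule base would presumably also persist these definitions.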
Desensitization rules are classified into two classes, recoverable and irrecoverable. For the recoverable class, the desensitized data can be restored to the original sensitive data by a certain means; such rules mainly refer to the various encryption and decryption algorithm rules. For the irrecoverable class, the desensitized part of the data cannot be restored by any means. Irrecoverable rules can generally be divided into two major kinds, replacement algorithms and generation algorithms. A replacement algorithm replaces the part that needs to be desensitized with defined characters or character strings. A generation algorithm is more complex: it requires the desensitized data to conform to logical rules, that is, to be fake data that looks real.
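The two classes can be contrasted in a short Python sketch. Everything here is an assumption made for illustration: the patent does not name an encryption algorithm, so a toy XOR cipher stands in for a real one (it is not secure), and the fake phone format (11 digits starting with 1) is only an example of data that follows a logical rule.

```python
import base64
import random
from itertools import cycle

def xor_encrypt(value: str, key: bytes) -> str:
    """Recoverable class: toy XOR cipher, NOT secure, stand-in for a real algorithm."""
    data = bytes(b ^ k for b, k in zip(value.encode("utf-8"), cycle(key)))
    return base64.urlsafe_b64encode(data).decode("ascii")

def xor_decrypt(token: str, key: bytes) -> str:
    """Restore the original value from a token produced by xor_encrypt."""
    data = base64.urlsafe_b64decode(token.encode("ascii"))
    return bytes(b ^ k for b, k in zip(data, cycle(key))).decode("utf-8")

def replace_with(value: str, filler: str = "*") -> str:
    """Irrecoverable class, replacement algorithm: overwrite every character."""
    return filler * len(value)

def generate_phone(rng: random.Random) -> str:
    """Irrecoverable class, generation algorithm: a plausible but fake number."""
    return "1" + "".join(rng.choice("0123456789") for _ in range(10))

key = b"demo-key"  # hypothetical key, for illustration only
token = xor_encrypt("13811001111", key)
print(token, xor_decrypt(token, key))
print(replace_with("13811001111"), generate_phone(random.Random()))
```

In practice the recoverable branch would use a vetted cipher from a cryptography library, with the key managed outside the desensitized data set.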
Data desensitization follows two principles: first, retain as much as possible of the information that is meaningful to applications after desensitization; second, prevent cracking by attackers to the greatest extent possible.
The fixed processing operations in the desensitization rules include data replacement, data rearrangement, data encryption, data truncation, data masking, and date offset rounding. Data truncation means retaining only part of the data and deleting the rest; data masking means hiding part of the data and replacing it with characters such as "XXX"; date offset rounding means retaining only the time on the hour.
Further, the above fixed processing operations are described as follows:
Replacement: for example, uniformly replacing the names of female users with A. For internal personnel the integrity of the information is fully preserved, but the result is easy to crack.
Rearrangement: the serial number 12345 is rearranged to 54321, shuffled according to a certain order. Like replacement, the original information can be conveniently restored when needed, but the result is likewise easy to crack.
Encryption: the number 12345 is encrypted as 23456. The degree of security depends on which encryption algorithm is used, which is generally determined by the actual situation.
Truncation: 13811001111 is truncated to 138. Necessary information is discarded to ensure the ambiguity of the data. This is a commonly used desensitization method, but it is often not friendly enough to production use.
Masking: 123456 -> 1xxxx6. Part of the information is retained and the length of the information is kept unchanged, so that the information holder can recognize it more easily, as with the identity information printed on a train ticket.
Date offset rounding: 20130520 12:30:45 -> 20130520 12:00:00. Precision is discarded to ensure the security of the original data; this method generally preserves the time distribution density of the data.
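The six operations above can be sketched in Python so that each reproduces the example given for it. The function names are hypothetical. Reversing the string and shifting each digit by one are simply the most direct readings of the 12345 -> 54321 and 12345 -> 23456 examples, not algorithms stated by the patent, and the digit shift is a toy that offers no real security.

```python
from datetime import datetime

def replace_value(value: str, replacement: str = "A") -> str:
    """Replacement: substitute the whole value with a fixed token."""
    return replacement

def rearrange(value: str) -> str:
    """Rearrangement: reorder characters (assumed here: simple reversal)."""
    return value[::-1]

def shift_digits(value: str, shift: int = 1) -> str:
    """Encryption (toy): shift each ASCII digit by `shift` modulo 10."""
    return "".join(
        str((int(c) + shift) % 10) if c in "0123456789" else c for c in value
    )

def truncate(value: str, keep: int) -> str:
    """Truncation: keep the first `keep` characters and delete the rest."""
    return value[:keep]

def mask(value: str, keep_head: int = 1, keep_tail: int = 1, fill: str = "x") -> str:
    """Masking: hide the middle of the value while preserving its length."""
    hidden = len(value) - keep_head - keep_tail
    if hidden <= 0:
        return fill * len(value)
    return value[:keep_head] + fill * hidden + value[len(value) - keep_tail:]

def round_to_hour(stamp: str, fmt: str = "%Y%m%d %H:%M:%S") -> str:
    """Date offset rounding: keep only the time on the hour."""
    parsed = datetime.strptime(stamp, fmt)
    return parsed.replace(minute=0, second=0).strftime(fmt)

print(rearrange("12345"), shift_digits("12345"), truncate("13811001111", 3))
print(mask("123456"), round_to_hour("20130520 12:30:45"))
```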
The sensitive data information stored in the sensitive dictionary includes sensitive word information, privacy field information, classified field information, and red/white/black list information. Sensitive words include terms with political leanings, violent tendencies, or unhealthy or uncivilized content, as well as terms set according to the user's own needs. Privacy fields are data fields or terms relating to the private data of enterprises and individuals. Classified fields contain confidential information relating to organizations or individuals. The red list refers to sensitive enterprises, individuals, and the like; the white list refers to enterprises, individuals, and the like whose data can be fully opened; the blacklist refers to enterprises, individuals, and the like with problems.
A privacy rule base may also be configured in the sensitive dictionary, for the management and configuration of privacy information rules and relevant laws and regulations.
Detecting the sensitivity of the data in Step 3 means setting detection rules based on the information in the sensitive dictionary, checking the data for private and classified content, and outputting a data detection report. Setting the detection rules includes: privacy field settings, classified field settings, sensitive word settings, and red/white/black list settings.
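The detection step can be sketched as follows. The `SensitiveDictionary` class, the `detect` function, and the report layout are hypothetical. The sketch also assumes that a value found on the white list is skipped and that list membership is an exact match on the field value; the patent does not specify either behavior.

```python
from dataclasses import dataclass, field

@dataclass
class SensitiveDictionary:
    """Hypothetical container for the four kinds of detection settings."""
    sensitive_words: set[str] = field(default_factory=set)
    privacy_fields: set[str] = field(default_factory=set)
    classified_fields: set[str] = field(default_factory=set)
    red_list: set[str] = field(default_factory=set)
    white_list: set[str] = field(default_factory=set)
    black_list: set[str] = field(default_factory=set)

def detect(records: list[dict[str, str]], d: SensitiveDictionary) -> list[dict]:
    """Return a detection report: one entry per field that triggered a rule."""
    report = []
    for row, record in enumerate(records):
        for name, value in record.items():
            if value in d.white_list:
                continue  # assumption: white-listed entities are fully open
            reasons = []
            if name in d.privacy_fields:
                reasons.append("privacy field")
            if name in d.classified_fields:
                reasons.append("classified field")
            if any(word in value for word in d.sensitive_words):
                reasons.append("sensitive word")
            if value in d.red_list:
                reasons.append("red list")
            if value in d.black_list:
                reasons.append("blacklist")
            if reasons:
                report.append({"row": row, "field": name, "reasons": reasons})
    return report
```

Each report entry identifies the row, the field, and the rules that matched, which gives the desensitization unit enough information to choose a rule per field.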
After Step 3 is completed, the method further includes Step 4, a step of regular or random risk detection: at a user-defined time period, sensitivity detection is performed again on the most recently desensitized data, and any sensitive data detected is desensitized again according to the rules stored in the desensitization rule base.
After data desensitization, the known sensitive information has been hidden or processed. However, because the desensitized data retains part of the statistical and structural features of the original data, a certain risk of sensitive information leakage may remain. It is therefore still necessary to control the scope of access in a suitable manner and to prevent data leakage through appropriate security management measures. Here, regular or irregular checks are used: a detection scheme is formulated and a detection report is output, in order to further check data security and to prevent and control the leakage of sensitive data.
In addition, a security audit mechanism needs to be added at each stage of data desensitization to record, strictly and in detail, the relevant information in the data processing, forming a complete data processing record for subsequent problem investigation and data tracing analysis. Once a leak occurs, it can be traced back to the data processing stage in which it occurred.
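The periodic re-detection of Step 4 and the audit record can be sketched together. The functions `recheck` and `run_periodically`, the shape of the audit entries, and the convention that a detector returns (row, field) pairs are all assumptions for illustration; a deployment would more likely use a job scheduler and a durable audit store.

```python
import time
from typing import Callable

Record = dict[str, str]
Finding = tuple[int, str]  # (row index, field name), a hypothetical convention

def recheck(
    records: list[Record],
    detect: Callable[[list[Record]], list[Finding]],
    desensitize: Callable[[str], str],
    audit_log: list[dict],
    now: Callable[[], float] = time.time,
) -> int:
    """Re-run detection on already desensitized data and fix what is found."""
    findings = detect(records)
    for row, name in findings:
        records[row][name] = desensitize(records[row][name])
    # Audit entry: which stage ran, when, and how many fields were reprocessed.
    audit_log.append({"time": now(), "stage": "recheck", "findings": len(findings)})
    return len(findings)

def run_periodically(
    interval_seconds: float,
    rounds: int,
    task: Callable[[], object],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run `task` a fixed number of times, waiting the user-defined period between runs."""
    for _ in range(rounds):
        task()
        sleep(interval_seconds)
```

A caller would bind its own detector and desensitization rule, for example `run_periodically(3600, 24, lambda: recheck(data, my_detect, my_rule, log))`, where `my_detect` and `my_rule` are placeholders for components like those described above.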
As shown in Fig. 2, a system for desensitizing government data resources comprises:
A desensitization rule base, which stores the rules for determining which information is to be desensitized and the desensitization rules themselves;
A sensitive dictionary, in which sensitive data information is configured and stored;
A desensitization module, which, for the obtained government data, detects the sensitivity of the data according to the information in the sensitive dictionary and desensitizes the detected sensitive data according to the rules stored in the desensitization rule base.
The desensitization module is configured with a detection unit and a desensitization unit, wherein:
The detection unit sets detection rules based on the information in the sensitive dictionary, checks the data for private and classified content, and outputs a data detection report. Setting the detection rules includes: privacy field settings, classified field settings, sensitive word settings, and red/white/black list settings;
The desensitization unit desensitizes the sensitive data detected by the detection unit. The processing is either a single fixed processing operation or a combination of fixed processing operations. The fixed processing operations include data replacement, data rearrangement, data encryption, data truncation, data masking, and date offset rounding. Data truncation means retaining only part of the data and deleting the rest; data masking means hiding part of the data and replacing it with characters such as "XXX"; date offset rounding means retaining only the time on the hour.
The desensitization module is also configured with a timing unit, which periodically starts risk detection: at the user-defined time period of the timing unit, sensitivity detection is performed again on the most recently desensitized data, and any sensitive data detected is desensitized again according to the rules stored in the desensitization rule base.
The present invention provides an effective solution and technical means for the desensitization and declassification of data in the opening of government data resources, and makes up for the lack of clear desensitization rules and technology in the opening of government information resources.
The present invention is highly practical, widely applicable, and easy to popularize, and promotes the opening of government data resources in both breadth and depth.
The above specific embodiments are only specific cases of the present invention. The scope of patent protection of the present invention includes but is not limited to the above specific embodiments. Any appropriate change or replacement that conforms to the claims of the method and system for desensitizing government data resources of the present invention and that is made by a person of ordinary skill in the art shall fall within the scope of patent protection of the present invention.