Detailed Description
Example embodiments are described in detail here, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of devices and methods consistent with some aspects of this specification as detailed in the appended claims.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "said", and "the" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
Fig. 1 is a flow diagram of a method for identifying abnormal data access on an open platform, according to an exemplary embodiment of this specification.
The method for identifying abnormal data access on an open platform can be applied to an open platform, whose physical carrier is usually a server or a server cluster.
The open platform exposes a data access interface, and third-party institutions such as enterprises, merchants, and government bodies can connect to the open platform through that data access interface.
From the perspective of business implementation, on the one hand, a third-party institution can provide third-party services to users through the open platform. For example, the open platform can provide a third-party service entry in its client, and users can carry out third-party services through that entry.
On the other hand, a third-party institution can obtain user data from the open platform, for example, the identity information of a user performing a service operation, in order to authenticate the user.
For example, assume the third-party institution is a hospital that connects through the data access interface of the open platform. The open platform can provide a hospital service entry in its client, through which users can carry out hospital-related services such as online registration, online payment, and viewing laboratory test results. While a user carries out these services, the hospital can use information the user holds on the open platform, such as ID card number and mobile phone number, to authenticate the user.
From the perspective of data flow, for the open platform, a user's data access to the open platform while carrying out a third-party service also goes through the data access interface that the third-party service uses to connect to the open platform. In other words, for the open platform, the data access behavior of a third-party service includes both the data access behavior of the third-party service itself and the data access behavior that users carry out through the third-party service.
Referring to Fig. 1, the method for identifying abnormal data access on an open platform may include the following steps:
Step 102: for a third-party service, obtain the original access information of its target data access behavior, where the target data access behavior includes access behavior in which users access data based on the third-party service.
In this embodiment, when judging whether the target data access behavior of a third-party service is abnormal data access behavior, the original access information of that target data access behavior can be obtained.
The timing for judging whether the target data access behavior is abnormal can be set in advance. For example, the judgment can be made on a period of 24 hours, 48 hours, and so on, or when a judgment instruction issued by an administrator is received; this specification places no particular limitation on this.
The original access information may include: access time point, amount of data accessed, login name of the accessing user, permanent residence of the accessing user, and so on.
Step 104: quantify the original access information into target access characteristic parameters of multiple dimensions.
Following step 102, after the original access information is obtained, it can be quantified into access characteristic parameters of multiple dimensions based on preset dimensions. For ease of distinction, these access characteristic parameters are referred to in this specification as target access characteristic parameters.
The preset dimensions can be configured in advance by developers. For example, the preset dimensions may include a data access dimension, a business operation dimension, an accessing-user characteristic dimension, and so on.
Step 106: input the target access characteristic parameters of the multiple dimensions, as input parameters, into a trained identification model.
Step 108: judge whether the target data access behavior is abnormal according to the output of the identification model.
In this embodiment, the identification model may be a supervised model, for example, a neural network model. The identification model may also be an unsupervised model, for example, an Isolation Forest (an outlier detection algorithm) model, a clustering model, and so on.
In this embodiment, the way the output is judged differs across identification models. For example, with an Isolation Forest model, it can be determined whether the output score is greater than a threshold; if it is, the target data access behavior can be determined to be abnormal. Those skilled in the art can make the judgment according to the characteristics of the identification model, and this specification does not enumerate the options here.
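As a minimal sketch of steps 106 and 108, the threshold comparison could look like the following. The names `judge_abnormal`, `score_fn`, and `toy_score` are hypothetical, and `toy_score` is only a stand-in for a trained identification model. The sketch assumes, as in the text, that a higher score means more anomalous; some libraries report scores with the opposite orientation, in which case the comparison would be reversed.

```python
from typing import Callable, Sequence


def judge_abnormal(features: Sequence[float],
                   score_fn: Callable[[Sequence[float]], float],
                   threshold: float) -> bool:
    """Score a feature vector and flag it when the score exceeds the threshold."""
    score = score_fn(features)
    return score > threshold


def toy_score(features: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: the largest feature value."""
    return max(features)


if __name__ == "__main__":
    print(judge_abnormal([0.2, 0.9], toy_score, threshold=0.8))  # True
    print(judge_abnormal([0.2, 0.3], toy_score, threshold=0.8))  # False
```

Keeping the scoring function as a parameter leaves the choice of identification model open, in line with the text.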
As can be seen from the above description, this embodiment identifies the data access behavior of a third-party service in combination with the access information of accessing users, which can improve the accuracy of identifying abnormal data access behavior.
The specific implementation process of this specification is described below in two parts: the training of the identification model, and the application of the trained identification model.
I. Training of the identification model
In this embodiment, when training the identification model, an original identification model can first be selected. The original identification model may be a supervised model or an unsupervised model; the description below takes an unsupervised model as an example.
In this embodiment, historical access information of different third-party services over a period of time can first be obtained. The historical access information is the access information generated when each third-party service historically called the data access interface to access data, and may include: access time point, amount of data accessed, region to which the accessed data belongs, login name of the accessing user, age of the accessing user, and so on. Here, an accessing user is a user who logs in through the open platform to access the third-party service.
After the historical access information is obtained, it can be quantified into historical access characteristic parameters of multiple dimensions.
The dimensions of the historical access characteristic parameters can be configured in advance by developers according to business characteristics, and can later be added to or adjusted according to business characteristics; this specification places no particular limitation on this.
In one example, the target access characteristic parameters of the multiple dimensions may include one or more of the following:
1. Data access parameters. The data access parameters may include: total data access volume, sensitive data access ratio, abnormal-period access ratio, and so on.
The total data access volume can be the total number of data records accessed. Assuming a third-party service accessed 50,000 data records, the total data access volume is 50,000.
The sensitive data access ratio is the ratio of the amount of sensitive data among the accessed data to the total data access volume. Sensitive data usually refers to data containing users' private information, such as ID card numbers and mobile phone numbers.
Again assuming the total data access volume is 50,000, if 10,000 of these 50,000 records are sensitive data, the sensitive data access ratio is 1/5.
It can be understood that the higher the sensitive data access ratio, the more likely the data access behavior is abnormal access, for example, black-market operators exploiting a vulnerability in a third-party service to steal users' private data from the open platform.
The abnormal-period access ratio is the ratio of the amount of data accessed during abnormal periods to the total data access volume. The abnormal periods can be preset.
Taking a hospital service as the third-party service, data access mostly occurs during working hours; if the data access volume during non-working hours surges, this usually indicates abnormal data access.
Assume that 0:00-6:00 and 18:00-24:00 are abnormal periods and the total data access volume is 50,000. If the access volume during 0:00-6:00 is 10,000 and the access volume during 18:00-24:00 is 20,000, the abnormal-period access ratios are 1/5 and 2/5 respectively. Alternatively, the two abnormal periods can be combined, in which case the abnormal-period access ratio is 3/5.
In this embodiment, the data access parameters may also be other parameters related to access volume, such as the data access volume per unit time; this specification places no particular limitation on this.
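The three data access parameters can be sketched as follows. This is only an illustration under assumptions: the function name `data_access_parameters` and the record layout (one `(hour_of_day, is_sensitive)` tuple per accessed record) are hypothetical, and the two abnormal periods are combined into a single ratio. The example scales the 50,000-record case above down to 50 records.

```python
from fractions import Fraction


def data_access_parameters(records, abnormal_hours):
    """Compute total volume, sensitive ratio, and combined abnormal-period ratio.

    records: list of (hour_of_day, is_sensitive) tuples, one per accessed record.
    abnormal_hours: set of hours (0-23) that count as abnormal periods.
    """
    total = len(records)
    if total == 0:
        return {"total": 0,
                "sensitive_ratio": Fraction(0),
                "abnormal_period_ratio": Fraction(0)}
    sensitive = sum(1 for _, is_sensitive in records if is_sensitive)
    abnormal = sum(1 for hour, _ in records if hour in abnormal_hours)
    return {"total": total,
            "sensitive_ratio": Fraction(sensitive, total),
            "abnormal_period_ratio": Fraction(abnormal, total)}


# 0:00-6:00 and 18:00-24:00 are treated as abnormal periods, as in the example.
ABNORMAL_HOURS = set(range(0, 6)) | set(range(18, 24))

# 10 sensitive records at 03:00, 20 records at 20:00, 20 records at 10:00.
example_records = [(3, True)] * 10 + [(20, False)] * 20 + [(10, False)] * 20

if __name__ == "__main__":
    params = data_access_parameters(example_records, ABNORMAL_HOURS)
    print(params["sensitive_ratio"])        # 1/5
    print(params["abnormal_period_ratio"])  # 3/5
```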
2. Business operation parameter. The business operation parameter includes a comparison parameter between the data access volume and the business operation volume of accessing users, such as a ratio. The business operation volume of accessing users can be the number of third-party service links that the accessing users accessed.
Still taking a hospital service as the third-party service, a normal accessing user who performs operations such as online registration usually also accesses other links, for example, a department introduction link or a doctor profile link. If a black-market operator logs in with a stolen account of a legitimate user, it usually accesses the user's private data directly, accessing few or no third-party service links.
This embodiment can therefore use the comparison parameter between the data access volume and the business operation volume of accessing users as one basis for judging abnormal data access behavior.
Assume the accessing users' data access volume on the open platform is 100 and the number of third-party service links they accessed is 1000; the business operation parameter is then 100/1000, i.e., 1/10. If the number of third-party service links accessed is 200, the business operation parameter is 1/2. It can be understood that the larger the business operation parameter, the more likely the data access behavior is abnormal access.
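The ratio form of the business operation parameter can be sketched as below. The function name is hypothetical, and returning `None` when no third-party service links were accessed is an assumption of this sketch; the text does not say how that case should be encoded, although it describes it as typical of stolen-account access.

```python
from fractions import Fraction
from typing import Optional


def business_operation_parameter(data_access_count: int,
                                 link_access_count: int) -> Optional[Fraction]:
    """Ratio of data access volume to the number of service links accessed.

    Returns None when no links were accessed (assumption: the caller handles
    this case separately, since the ratio is undefined).
    """
    if link_access_count == 0:
        return None
    return Fraction(data_access_count, link_access_count)


if __name__ == "__main__":
    print(business_operation_parameter(100, 1000))  # 1/10
    print(business_operation_parameter(100, 200))   # 1/2
```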
3. Accessing-user characteristic parameters. These may include: similar login name ratio, age distribution parameter, permanent residence distribution parameter, login location distribution parameter, active user ratio, clustered login ratio, registration time distribution parameter, and so on.
1) Similar login name ratio
The similar login name ratio can be the ratio of the number of accessing users with similar login names to the total number of accessing users.
In this embodiment, whether accessing users' login names are similar can be judged by means such as semantic recognition; the number of accessing users with similar login names is then counted, and the ratio of that number to the total number of accessing users is calculated.
For example, a black-market operator registers a large number of junk accounts and logs in with them, with login names such as "xxx001@qq.com, xxx002@qq.com, xxx003@qq.com...". By recognizing the semantics of the login names, the number of accessing users with similar login names can be calculated as 200; with a total of 300 accessing users, the similar login name ratio is 2/3. It can be understood that the higher the similar login name ratio, the more likely the data access behavior is abnormal access.
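The text leaves the similarity judgment to semantic recognition and does not specify it further. Purely as an illustration, the sketch below swaps in a much simpler heuristic: digit runs in each login name are replaced with a placeholder, and names that share the resulting template with at least one other name count as similar. The function name and the `min_group_size` parameter are hypothetical.

```python
import re
from collections import Counter
from fractions import Fraction


def similar_login_name_ratio(login_names, min_group_size=2):
    """Share of users whose login name template is shared by enough users.

    Heuristic stand-in for semantic recognition: "xxx001@qq.com" and
    "xxx002@qq.com" both reduce to the template "xxx#@qq.com".
    """
    if not login_names:
        return Fraction(0)
    templates = [re.sub(r"\d+", "#", name.lower()) for name in login_names]
    counts = Counter(templates)
    similar = sum(1 for t in templates if counts[t] >= min_group_size)
    return Fraction(similar, len(login_names))


if __name__ == "__main__":
    names = ["xxx001@qq.com", "xxx002@qq.com", "alice@example.com"]
    print(similar_login_name_ratio(names))  # 2/3
```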
2) Age distribution parameter
In this embodiment, the age distribution parameter can be the ratio of the number of accessing users in each age group to the total number of accessing users.
For example, if the total number of accessing users is 1000, of whom 100 are aged 0-24, 700 are aged 24-45, and 200 are aged 45 or above, the age distribution parameter is: 1/10, 7/10, 2/10.
In this embodiment, it can be understood that if the number of accessing users in a certain age group surges, a black-market operator may have illegally purchased user accounts and be logging in with them in bulk, for example, having purchased a batch of university students' accounts and then logging in en masse to steal users' private data.
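A possible way to compute the age distribution parameter is sketched below. The function name is hypothetical. Because the age groups in the example overlap at their boundaries, the sketch assumes half-open groups: under 24, 24 to under 45, and 45 or above.

```python
from bisect import bisect_right
from fractions import Fraction


def age_distribution(ages, boundaries=(24, 45)):
    """Share of accessing users per age group.

    With boundaries (24, 45) the groups are [0, 24), [24, 45), and [45, inf).
    """
    counts = [0] * (len(boundaries) + 1)
    for age in ages:
        # bisect_right places an age equal to a boundary into the upper group.
        counts[bisect_right(boundaries, age)] += 1
    total = len(ages)
    if total == 0:
        return [Fraction(0)] * len(counts)
    return [Fraction(count, total) for count in counts]


if __name__ == "__main__":
    # One user aged 20, seven aged 30, two aged 50 (the example scaled to 10).
    ages = [20] * 1 + [30] * 7 + [50] * 2
    print([str(share) for share in age_distribution(ages)])  # ['1/10', '7/10', '1/5']
```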
3) Permanent residence distribution parameter
Third-party services usually have a regional attribute. For example, a hospital located in Hangzhou mainly serves users whose permanent residence is Zhejiang Province, and the Beijing provident fund inquiry service is aimed at users whose permanent residence is Beijing.
In this embodiment, taking the province as the unit, the top N permanent-residence provinces of users who use the third-party service most can first be determined, and the proportion of accessing users from these N provinces can then be calculated. Under normal conditions, this proportion is relatively large and stable.
For example, assume the third-party service is a hospital in Hangzhou, N is 2, and the top two permanent-residence provinces of users of the hospital's online services are Zhejiang Province and Jiangsu Province. Under normal circumstances, the proportion of accessing users whose permanent residence is Zhejiang Province or Jiangsu Province is 80%. If that proportion is 20%, abnormal data access behavior very likely exists, for example, a large number of users from Qinghai are using the service.
Of course, the permanent residence distribution parameter can also be determined using other calculation methods; this specification places no particular limitation on this.
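The top-N calculation described above can be sketched in two steps: determine the top N provinces from historical users, then measure what share of current accessing users fall in them. The function names and the sample data are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def top_provinces(historical_provinces, n):
    """The n permanent-residence provinces that appear most often historically."""
    return [province for province, _ in Counter(historical_provinces).most_common(n)]


def residence_ratio(current_provinces, top):
    """Share of current accessing users whose permanent residence is in `top`."""
    if not current_provinces:
        return Fraction(0)
    top_set = set(top)
    hits = sum(1 for province in current_provinces if province in top_set)
    return Fraction(hits, len(current_provinces))


if __name__ == "__main__":
    history = ["Zhejiang"] * 6 + ["Jiangsu"] * 2 + ["Shanghai"] + ["Qinghai"]
    top = top_provinces(history, 2)
    print(top)  # ['Zhejiang', 'Jiangsu']

    today = ["Zhejiang"] + ["Jiangsu"] + ["Qinghai"] * 8
    print(residence_ratio(today, top))  # 1/5
```

The same two-step pattern would apply to the registration time distribution parameter, with registration-time intervals in place of provinces.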
4) Login location distribution parameter
Similar to the permanent residence distribution parameter, under normal circumstances the login location of users accessing a third-party service is usually the same as the location of the third-party service. For example, most accessing users of a Zhejiang hospital log in from Zhejiang Province.
In this embodiment, the proportion of accessing users who log in from a different location can be calculated as the login location distribution parameter. For example, assume a Zhejiang hospital has 1000 accessing users in total, of whom 100 log in from Zhejiang Province; the login location distribution parameter is then 9/10, and the probability that abnormal data access behavior exists is relatively high.
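The off-site login proportion can be sketched as follows, with one login location per accessing user. The function name is hypothetical, and the example scales the 1000-user case down to 10 users.

```python
from fractions import Fraction


def offsite_login_ratio(login_locations, service_location):
    """Share of accessing users whose login location differs from the service's."""
    if not login_locations:
        return Fraction(0)
    offsite = sum(1 for location in login_locations if location != service_location)
    return Fraction(offsite, len(login_locations))


if __name__ == "__main__":
    locations = ["Zhejiang"] * 1 + ["Qinghai"] * 9
    print(offsite_login_ratio(locations, "Zhejiang"))  # 9/10
```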
5) Active user ratio
The active user ratio is the proportion of active users among the accessing users. It can be understood that the smaller the active user ratio, the more likely the data access behavior is abnormal access.
6) Clustered login ratio
The clustered login ratio is the proportion of accessing users who share the same login environment.
In this embodiment, the login environment of accessing users can be obtained. Under normal circumstances, few accessing users share the same login environment, and the ratio is low. If the ratio is high, a large number of accessing users share the same login environment, and the probability that abnormal data access behavior exists is higher.
The login environment may include: login device ID, login device SSID (Service Set Identifier), and so on; this specification places no particular limitation on this.
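One way to read the clustered login ratio is sketched below: each accessing user contributes one login environment, and a user counts as clustered when at least one other user has exactly the same environment. Both the function name and this exact definition are assumptions; the text only states that the ratio covers users with the same login environment.

```python
from collections import Counter
from fractions import Fraction


def clustered_login_ratio(environments):
    """Share of accessing users whose login environment is shared with others.

    environments: one hashable value per accessing user, for example a
    (device_id, ssid) tuple.
    """
    if not environments:
        return Fraction(0)
    counts = Counter(environments)
    shared = sum(count for count in counts.values() if count > 1)
    return Fraction(shared, len(environments))


if __name__ == "__main__":
    envs = [("device-1", "ssid-a")] * 3 + [("device-2", "ssid-b")]
    print(clustered_login_ratio(envs))  # 3/4
```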
7) Registration time distribution parameter
Similar to the permanent residence distribution parameter, registration times can be divided into several intervals in advance, the top M registration-time intervals of users who use the third-party service most can be determined, and the proportion of accessing users in these M intervals can then be calculated; this is not described in further detail here.
In this embodiment, after the historical access information is quantified into historical access characteristic parameters of multiple dimensions, the original identification model can be trained using the historical access characteristic parameters to obtain the trained identification model.
In this embodiment, using an unsupervised model as the original identification model means the historical access information serving as samples does not need to be labeled, which saves a large amount of processing resources and also effectively avoids the cold-start problem.
In one example, to ensure the accuracy of the identification model, the trained identification model can be manually checked after training.
In another example, to ensure the accuracy of the identification model, the identification model can also be retrained after a certain amount of historical access information has accumulated; this specification places no particular limitation on this.
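To make the unsupervised training step concrete without depending on any particular library, the sketch below uses a deliberately simple stand-in rather than an Isolation Forest: it learns the per-dimension mean and standard deviation from unlabeled historical feature vectors and scores a new vector by its largest absolute z-score. The class name `ZScoreModel` and the sample numbers are hypothetical; a production system would more plausibly use one of the model types named in the text.

```python
from statistics import mean, pstdev


class ZScoreModel:
    """Toy unsupervised outlier scorer (a stand-in, not an Isolation Forest)."""

    def fit(self, rows):
        """Learn per-dimension mean and standard deviation from unlabeled rows."""
        columns = list(zip(*rows))
        self.means = [mean(column) for column in columns]
        # A constant column has zero deviation; fall back to 1.0 to avoid
        # dividing by zero.
        self.stds = [pstdev(column) or 1.0 for column in columns]
        return self

    def score(self, row):
        """Largest absolute z-score across dimensions; higher is more unusual."""
        return max(abs(value - m) / s
                   for value, m, s in zip(row, self.means, self.stds))


if __name__ == "__main__":
    history = [[10, 1], [12, 1], [11, 1], [9, 1]]  # hypothetical feature vectors
    model = ZScoreModel().fit(history)
    print(model.score([10, 1]) < model.score([30, 1]))  # True
```

No labels are needed for `fit`, which mirrors the point above about avoiding sample labeling.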
II. Application of the trained identification model
In one example, the access information of each third-party service's data access behavior can be obtained on a 24-hour cycle; for ease of description, this access information is referred to as original access information.
The original access information may include: access time point, amount of data accessed, login name of the accessing user, permanent residence of the accessing user, and so on.
The above cycle may also be 48 hours, 36 hours, and so on; this specification places no particular limitation on this.
In this embodiment, after the original access information is obtained, it can be quantified into target access characteristic parameters of multiple dimensions. For the dimensions and quantification rules of the target access characteristic parameters, refer to the training process of the identification model described above; they are not repeated here.
In this embodiment, the quantified target access characteristic parameters of the multiple dimensions can be input, as input parameters, into the trained identification model, and whether the target data access behavior is abnormal can be judged according to the output of the identification model.
For example, at 0:00 each day, the original access information of each hospital for the previous day (24 hours) can be obtained. For each hospital, the original access information can be quantified into target access characteristic parameters of multiple dimensions, which are input, as input parameters, into the trained identification model; whether the hospital's data access behavior on the previous day was abnormal can then be judged according to the output of the identification model. If it was abnormal, an administrator can be prompted to check.
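The daily check described above can be sketched as a small batch loop. Everything named here is hypothetical: `daily_check`, the shape of the raw access information, and the toy `quantify` and `score` callables, which stand in for the real quantification rules and the trained identification model.

```python
def daily_check(raw_by_service, quantify, score_fn, threshold):
    """Return the IDs of third-party services flagged as abnormal.

    raw_by_service: dict mapping a service ID to its raw access information
    for the previous cycle.
    quantify: callable turning raw access information into a feature vector.
    score_fn: callable returning an anomaly score (higher is more anomalous).
    """
    flagged = []
    for service_id, raw in raw_by_service.items():
        features = quantify(raw)
        if score_fn(features) > threshold:
            flagged.append(service_id)
    return flagged


if __name__ == "__main__":
    # Hypothetical raw information: record counts for the previous day.
    raw = {
        "hospital-a": {"total": 50, "sensitive": 5},
        "hospital-b": {"total": 50, "sensitive": 45},
    }

    def quantify(info):
        return [info["sensitive"] / info["total"]]

    def score(features):
        return max(features)

    print(daily_check(raw, quantify, score, threshold=0.5))  # ['hospital-b']
```

The returned list is what would be surfaced to an administrator for review.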
Corresponding to the foregoing embodiments of the method for identifying abnormal data access on an open platform, this specification also provides embodiments of a device for identifying abnormal data access on an open platform.
The embodiments of the device for identifying abnormal data access on an open platform can be applied on a server. The device embodiments can be implemented in software, or in hardware or a combination of software and hardware. Taking software implementation as an example, as a device in the logical sense, it is formed by the processor of the server on which it resides reading the corresponding computer program instructions from non-volatile memory into memory and running them. At the hardware level, Fig. 2 shows a hardware structure diagram of the server on which the device for identifying abnormal data access on an open platform of this specification resides. In addition to the processor, memory, network interface, and non-volatile memory shown in Fig. 2, the server on which the device resides in an embodiment may also include other hardware according to the actual functions of the server, which is not described further here.
Fig. 3 is a block diagram of a device for identifying abnormal data access on an open platform, according to an exemplary embodiment of this specification.
Referring to Fig. 3, the device 200 for identifying abnormal data access on an open platform can be applied in the server shown in Fig. 2 and includes: an acquiring unit 201, a quantifying unit 202, an input unit 203, and a judging unit 204.
The acquiring unit 201 is configured to obtain, for a third-party service, the original access information of its target data access behavior, where the target data access behavior includes access behavior in which users access data based on the third-party service;
The quantifying unit 202 is configured to quantify the original access information into target access characteristic parameters of multiple dimensions;
The input unit 203 is configured to input the target access characteristic parameters of the multiple dimensions, as input parameters, into a trained identification model;
The judging unit 204 is configured to judge whether the target data access behavior is abnormal according to the output of the identification model.
Optionally, the training process of the identification model includes:
Obtaining historical access information of each third-party service's historical data access behavior;
Quantifying the historical access information into historical access characteristic parameters of multiple dimensions;
Training an original identification model according to the historical access characteristic parameters to obtain the trained identification model.
Optionally, the identification model is an unsupervised model.
Optionally, the target access characteristic parameters of the multiple dimensions include one or more of the following:
Data access parameters of the target data access behavior;
Business operation parameters of the target data access behavior;
Accessing-user characteristic parameters of the target data access behavior.
Optionally, the data access parameters include one or more of the following:
Total data access volume, sensitive data access ratio, abnormal-period access ratio.
Optionally, the business operation parameter includes a comparison parameter between the data access volume and the business operation volume of accessing users;
The business operation volume is the number of third-party service links that the accessing users accessed.
Optionally, the accessing-user characteristic parameters include one or more of the following:
Similar login name ratio, age distribution parameter, permanent residence distribution parameter, login location distribution parameter, active user ratio, clustered login ratio, registration time distribution parameter.
For the implementation of the functions and roles of each unit in the above device, see the implementation of the corresponding steps in the above method; details are not repeated here.
For the device embodiments, since they basically correspond to the method embodiments, refer to the description of the method embodiments for relevant details. The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this specification. Those of ordinary skill in the art can understand and implement this without creative effort.
The systems, devices, modules, or units described in the above embodiments can be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email sending and receiving device, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the foregoing embodiments of the method for identifying abnormal data access on an open platform, this specification also provides a device for identifying abnormal data access on an open platform, which includes a processor and a memory for storing machine-executable instructions. The processor and the memory are usually connected to each other through an internal bus. In other possible implementations, the device may also include an external interface so that it can communicate with other devices or components.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the logic for identifying abnormal data access on an open platform, the processor is caused to:
For a third-party service, obtain the original access information of its target data access behavior, where the target data access behavior includes access behavior in which users access data based on the third-party service;
Quantify the original access information into target access characteristic parameters of multiple dimensions;
Input the target access characteristic parameters of the multiple dimensions, as input parameters, into a trained identification model;
Judge whether the target data access behavior is abnormal according to the output of the identification model.
Optionally, the training process of the identification model includes:
Obtaining historical access information of each third-party service's historical data access behavior;
Quantifying the historical access information into historical access characteristic parameters of multiple dimensions;
Training an original identification model according to the historical access characteristic parameters to obtain the trained identification model.
Optionally, the identification model is an unsupervised model.
Optionally, the target access characteristic parameters of the multiple dimensions include one or more of the following:
Data access parameters of the target data access behavior;
Business operation parameters of the target data access behavior;
Accessing-user characteristic parameters of the target data access behavior.
Optionally, the data access parameters include one or more of the following:
Total data access volume, sensitive data access ratio, abnormal-period access ratio.
Optionally, the business operation parameter includes a comparison parameter between the data access volume and the business operation volume of accessing users;
The business operation volume is the number of third-party service links that the accessing users accessed.
Optionally, the accessing-user characteristic parameters include one or more of the following:
Similar login name ratio, age distribution parameter, permanent residence distribution parameter, login location distribution parameter, active user ratio, clustered login ratio, registration time distribution parameter.
Corresponding to the foregoing embodiments of the method for identifying abnormal data access on an open platform, this specification also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
For a third-party service, obtaining the original access information of its target data access behavior, where the target data access behavior includes access behavior in which users access data based on the third-party service;
Quantifying the original access information into target access characteristic parameters of multiple dimensions;
Inputting the target access characteristic parameters of the multiple dimensions, as input parameters, into a trained identification model;
Judging whether the target data access behavior is abnormal according to the output of the identification model.
Optionally, the training process of the identification model includes:
Obtaining historical access information of each third-party service's historical data access behavior;
Quantifying the historical access information into historical access characteristic parameters of multiple dimensions;
Training an original identification model according to the historical access characteristic parameters to obtain the trained identification model.
Optionally, the identification model is an unsupervised model.
Optionally, the target access characteristic parameters of the multiple dimensions include one or more of the following:
Data access parameters of the target data access behavior;
Business operation parameters of the target data access behavior;
Accessing-user characteristic parameters of the target data access behavior.
Optionally, the data access parameters include one or more of the following:
Total data access volume, sensitive data access ratio, abnormal-period access ratio.
Optionally, the business operation parameter includes a comparison parameter between the data access volume and the business operation volume of accessing users;
The business operation volume is the number of third-party service links that the accessing users accessed.
Optionally, the accessing-user characteristic parameters include one or more of the following:
Similar login name ratio, age distribution parameter, permanent residence distribution parameter, login location distribution parameter, active user ratio, clustered login ratio, registration time distribution parameter.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The above are merely preferred embodiments of this specification and are not intended to limit this specification. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this specification shall fall within the scope of protection of this specification.