Summary of the invention
Embodiments of the present invention address the technical problem of providing a method and system for extracting typical test cases of a tax system, where the extracted test cases cover a wide range of situations and are drawn from real data.
The present invention provides a method for extracting typical test cases of a tax system, comprising:
Determining the final number of test samples;
Obtaining key information from the raw data to determine the dimension of the input data, and normalizing this information; and
Setting the SOM learning parameters, initializing the network, and running the SOM algorithm iteratively until it stabilizes to obtain the stabilized neurons, and selecting, for each neuron, the sample point with the smallest Euclidean distance to that neuron as a final test sample.
Wherein, in the normalizing step, the key information in the raw data includes the tax amount, the late payment fee, the occurrence date, and the difference between the tax occurrence date and the tax declaration date, and the normalized data are denoted as $\mathbf{x} = [x_1, x_2, x_3, \ldots, x_m]^T$, where $m$ is the dimension of the data.
Wherein, the normalizing step further comprises initializing the synapses: $\mathbf{w}_j = [w_{j1}, w_{j2}, w_{j3}, \ldots, w_{jm}]^T$, $j = 1, 2, 3, \ldots, l$, where the synaptic weight vector of each neuron in the network has the same dimension as the input space, and $l$ is the total number of neurons in the network.
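Wherein, the step of initializing the network and running the SOM algorithm iteratively comprises:
Selecting the neuron with the largest inner product $\mathbf{w}_j^T\mathbf{x}$ as the activated neuron, and using the index $i(\mathbf{x})$ to identify the neuron that best matches the input vector $\mathbf{x}$, where $i(\mathbf{x}) = \arg\min_j \|\mathbf{x} - \mathbf{w}_j\|$, $j \in \mathcal{A}$, and $\mathcal{A}$ denotes the lattice of neurons;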
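Letting $h_{j,i(\mathbf{x})}$ denote the topological neighborhood centered on the winning neuron $i$ and containing a set of cooperating neurons, a typical one of which is neuron $j$, and selecting a Gaussian function:
$$h_{j,i(\mathbf{x})} = \exp\left(-\frac{d_{j,i}^2}{2\sigma^2}\right)$$
where, for a one-dimensional lattice, $d_{j,i}$ is an integer equal to $|j - i|$. In the two-dimensional case, $d_{j,i}^2$ can be defined as:
$$d_{j,i}^2 = \|\mathbf{r}_j - \mathbf{r}_i\|^2$$
where $\mathbf{r}_j$ and $\mathbf{r}_i$ are the positions of neurons $j$ and $i$ on the lattice. The width $\sigma$ of the SOM neighborhood decreases over time and can be defined as:
$$\sigma(n) = \sigma_0 \exp\left(-\frac{n}{\tau_1}\right)$$
where $\sigma_0$ is the initial width and $\tau_1$ is a time constant.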
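Using the discrete-time vector form, and assuming that the synaptic weight vector of neuron $j$ at time $n$ is $\mathbf{w}_j(n)$, the updated weight vector $\mathbf{w}_j(n+1)$ at time $n+1$ is defined as:
$$\mathbf{w}_j(n+1) = \mathbf{w}_j(n) + \eta(n)\,h_{j,i(\mathbf{x})}(n)\,\big(\mathbf{x}(n) - \mathbf{w}_j(n)\big)$$
where $\eta(n)$ is the learning-rate parameter. The network is trained in this way until it stabilizes, to obtain the output neurons.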
The present invention also provides a system for extracting typical test cases of a tax system, comprising:
A key information acquisition unit, configured to obtain key information from the raw data to determine the dimension of the input data;
A data normalization unit, configured to normalize the acquired key information;
A synapse initialization unit, configured to initialize the synapses;
An iterative computation unit, configured to set the SOM learning parameters, initialize the network, and run the SOM algorithm iteratively until it stabilizes, to obtain the stabilized neurons; and
A Euclidean distance computation unit, configured to find, for each neuron obtained above, the sample point with the smallest Euclidean distance to that neuron; these sample points are the test cases sought.
Wherein, the key information in the raw data includes the tax amount, the late payment fee, the occurrence date, and the difference between the tax occurrence date and the tax declaration date, and the normalized data are denoted as $\mathbf{x} = [x_1, x_2, x_3, \ldots, x_m]^T$, where $m$ is the dimension of the data.
Wherein, the synaptic weight vector of neuron $j$ is denoted as $\mathbf{w}_j = [w_{j1}, w_{j2}, w_{j3}, \ldots, w_{jm}]^T$, $j = 1, 2, 3, \ldots, l$, where $l$ is the total number of neurons in the network.
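Wherein, the iterative computation unit selects the neuron with the largest inner product $\mathbf{w}_j^T\mathbf{x}$ as the activated neuron and uses the index $i(\mathbf{x})$ to identify the neuron that best matches the input vector $\mathbf{x}$, where $i(\mathbf{x})$ is determined by $i(\mathbf{x}) = \arg\min_j \|\mathbf{x} - \mathbf{w}_j\|$, $j \in \mathcal{A}$, with $\mathcal{A}$ denoting the lattice of neurons. Letting $h_{j,i(\mathbf{x})}$ denote the topological neighborhood centered on the winning neuron $i$ and containing a set of cooperating neurons, a typical one of which is neuron $j$, a Gaussian function is selected:
$$h_{j,i(\mathbf{x})} = \exp\left(-\frac{d_{j,i}^2}{2\sigma^2}\right)$$
where, for a one-dimensional lattice, $d_{j,i}$ is an integer equal to $|j - i|$. In the two-dimensional case, $d_{j,i}^2$ is defined as:
$$d_{j,i}^2 = \|\mathbf{r}_j - \mathbf{r}_i\|^2$$
where $\mathbf{r}_j$ and $\mathbf{r}_i$ are the lattice positions of neurons $j$ and $i$. The width $\sigma$ of the SOM neighborhood decreases over time and is defined as:
$$\sigma(n) = \sigma_0 \exp\left(-\frac{n}{\tau_1}\right)$$
where $\sigma_0$ is the initial width and $\tau_1$ is a time constant.
Using the discrete-time vector form, and assuming that the weight vector of neuron $j$ at time $n$ is $\mathbf{w}_j(n)$, the updated weight vector $\mathbf{w}_j(n+1)$ at time $n+1$ is defined as $\mathbf{w}_j(n+1) = \mathbf{w}_j(n) + \eta(n)\,h_{j,i(\mathbf{x})}(n)\,\big(\mathbf{x}(n) - \mathbf{w}_j(n)\big)$, where $\eta(n)$ is the learning-rate parameter. The network is trained in this way until it stabilizes, to obtain stable output neurons.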
The method and system for extracting typical test cases of a tax system described above have the following advantages: 1) the test samples selected by the present invention are the most representative real data, which avoids the risk that artificially constructed data deviate significantly from reality and increases the credibility of testing; 2) the present invention effectively compresses the large number of repeated data samples in the historical data, which significantly reduces the workload of testers and improves their efficiency while maintaining test quality; 3) the present invention exploits the order-preserving (topology-preserving) mapping property of the SOM algorithm to retain the erroneous samples and boundary samples in the real data, which improves the reliability of testing.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
First, before the embodiments are described, some of the terms used herein need to be explained. For example:
Where terms such as "first" and "second" are used herein to describe various elements, these elements should not be limited by these terms. These terms are used only to distinguish one element from another. Therefore, a "first" element may also be referred to as a "second" element without departing from the teachings of the present invention.
In addition, it should be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or directly coupled to the other element, or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, no intervening elements are present.
The terms used herein are for the purpose of describing particular embodiments only and are not intended to limit the invention. Unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well.
When the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Regarding the embodiments:
The problem addressed by the system and method of the present invention for extracting typical test cases of a tax system is how to select, from a large body of past historical data samples, the most representative samples to serve as field test data. Solving this problem mainly requires selecting test data effectively: the selected data should cover the entire data range of the system, highlight erroneous samples, and effectively cover boundary samples, and on this basis the test set should be kept as small as possible to reduce the burden on testers. The present invention therefore exploits the data-compression and order-preserving mapping properties of the self-organizing map (SOM) neural network, using the SOM algorithm to extract the most representative test samples from large volumes of historical data.
Refer to Fig. 1, which is a flowchart of a preferred embodiment of the method for extracting typical test cases of a tax system according to the present invention. This preferred embodiment of the method comprises:
Step S1: determining the final number of test samples, i.e., the number of neurons in the SOM. In general, this number is determined according to the scale of the project and the volume of historical data.
Step S2: retrieving the raw data, obtaining key information such as the tax amount, the late payment fee, the occurrence date, and the difference between the tax occurrence date and the tax declaration date, determining the dimension $m$ of the input data, and normalizing this information. The result is denoted as:
$$\mathbf{x} = [x_1, x_2, x_3, \ldots, x_m]^T$$
where the synaptic weight vector of each neuron in the network has the same dimension as the input space, and the synaptic weight vector of neuron $j$ is denoted as:
$$\mathbf{w}_j = [w_{j1}, w_{j2}, w_{j3}, \ldots, w_{jm}]^T, \quad j = 1, 2, 3, \ldots, l$$
where $l$ is the total number of neurons in the network.
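The text does not specify which normalization is used. As a non-limiting sketch, assuming per-dimension min-max scaling into [0, 1] (an assumption, not stated in the disclosure), the normalizing step of Step S2 could look as follows; the function name `min_max_normalize` is hypothetical.

```python
from typing import List


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each of the m columns of the quantified samples into [0, 1].

    Min-max scaling is an assumed choice; a constant column maps to 0.0
    so that no division by zero occurs.
    """
    m = len(rows[0])  # input dimension m
    lows = [min(r[k] for r in rows) for k in range(m)]
    highs = [max(r[k] for r in rows) for k in range(m)]
    return [
        [
            (r[k] - lows[k]) / (highs[k] - lows[k]) if highs[k] > lows[k] else 0.0
            for k in range(m)
        ]
        for r in rows
    ]


# Three quantified samples: tax amount, late payment fee, declaration lag in days.
normalized = min_max_normalize([[1200.0, 15.0, 3], [300.0, 0.0, 0], [4500.0, 60.0, 12]])
```

In practice the per-column minima and maxima would also need to be stored so that new records can be scaled consistently with the training data.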
Step S3: setting the SOM learning parameters, initializing the network, and running the SOM algorithm iteratively until it stabilizes. The stabilized neurons are obtained, and for each neuron, the sample point with the minimum Euclidean distance $\|\mathbf{x} - \mathbf{w}_j\|$ to that neuron is selected as a final test sample.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, each of the above steps is described in further detail below:
Step S2 specifically includes: selecting critical data from the historical data and quantifying them. In tax data, the tax amount, the tax rate, the tax collection date, the number of grace days, and the like are all critical data. In addition, since different tax systems emphasize different business logic, the business logic to be tested with emphasis is chosen accordingly, and the data related to that business logic are selected in detail and quantified. The quantified data should be presented in vector form:
$$\mathbf{x} = [x_1, x_2, x_3, \ldots, x_m]^T$$
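Dates cannot enter the SOM directly, so they must be quantified. As a non-limiting sketch, assuming a hypothetical record layout (`TaxRecord` and its field names are illustrative, not taken from any real tax system), one record could be turned into a vector $\mathbf{x}$ with $m = 4$ by encoding the occurrence date as a day count and the declaration lag as a difference in days; the sign convention of the lag is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class TaxRecord:
    # Hypothetical field names; a real tax system defines its own schema.
    tax_amount: float
    late_fee: float
    occurrence_date: date
    declaration_date: date


def quantify(record: TaxRecord) -> List[float]:
    """Map one record to x = [x1, ..., xm]^T with m = 4 (before normalization)."""
    return [
        record.tax_amount,
        record.late_fee,
        float(record.occurrence_date.toordinal()),  # date encoded as a day count
        float((record.declaration_date - record.occurrence_date).days),  # lag in days
    ]


rec = TaxRecord(1200.0, 15.0, date(2015, 3, 1), date(2015, 3, 16))
x = quantify(rec)  # the last component is the 15-day declaration lag
```

Further business-logic fields would simply extend the returned list, which increases $m$ accordingly.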
Once the dimension of the input data has been determined, the synapses are initialized, where the synaptic weight vector of neuron $j$ is denoted as:
$$\mathbf{w}_j = [w_{j1}, w_{j2}, w_{j3}, \ldots, w_{jm}]^T, \quad j = 1, 2, 3, \ldots, l$$
The synaptic weight vector of each neuron in the network has the same dimension as the input space, and $l$ is the total number of neurons in the network.
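The disclosure does not say how the synapses are initialized. As a non-limiting sketch, assuming the inputs were normalized into [0, 1] so that uniform random values in that range are a reasonable starting point (an assumption), the $l$ weight vectors of dimension $m$ could be created as follows; `init_weights` and its `seed` parameter are hypothetical.

```python
import random
from typing import List


def init_weights(l: int, m: int, seed: int = 0) -> List[List[float]]:
    """Return l synaptic weight vectors w_j, each with the input dimension m.

    Values are drawn uniformly from [0, 1), matching the assumed range of the
    normalized inputs; a fixed seed makes runs reproducible.
    """
    rng = random.Random(seed)
    return [[rng.random() for _ in range(m)] for _ in range(l)]


weights = init_weights(l=6, m=4)
```

The only hard requirement is that every $\mathbf{w}_j$ has the same dimension $m$ as $\mathbf{x}$; drawing initial weights from the samples themselves is an equally common alternative.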
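Step S3 specifically includes:
Selecting the neuron with the largest inner product $\mathbf{w}_j^T\mathbf{x}$ as the activated neuron. When all weight vectors are normalized to the same length, the matching criterion based on maximizing the inner product $\mathbf{w}_j^T\mathbf{x}$ is mathematically equivalent to minimizing the Euclidean distance between the vectors $\mathbf{x}$ and $\mathbf{w}_j$. If the index $i(\mathbf{x})$ is used to identify the neuron that best matches the input vector $\mathbf{x}$, then $i(\mathbf{x})$ can be determined by the following condition:
$$i(\mathbf{x}) = \arg\min_j \|\mathbf{x} - \mathbf{w}_j\|, \quad j \in \mathcal{A}$$
where $\mathcal{A}$ denotes the lattice of neurons.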
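The competition step can be illustrated with a short sketch (function names are hypothetical). It computes $i(\mathbf{x})$ by minimum Euclidean distance and, for comparison, by maximum inner product, showing that the two agree when every $\mathbf{w}_j$ has the same norm and may disagree otherwise, since $\|\mathbf{x} - \mathbf{w}_j\|^2 = \|\mathbf{x}\|^2 - 2\mathbf{w}_j^T\mathbf{x} + \|\mathbf{w}_j\|^2$.

```python
import math
from typing import List, Sequence


def best_matching_unit(x: Sequence[float], weights: List[Sequence[float]]) -> int:
    """i(x) = argmin_j ||x - w_j||; on a tie the lowest index wins."""
    return min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))


def max_inner_product_unit(x: Sequence[float], weights: List[Sequence[float]]) -> int:
    """argmax_j w_j^T x; equals best_matching_unit only if all ||w_j|| are equal."""
    return max(range(len(weights)), key=lambda j: sum(a * b for a, b in zip(weights[j], x)))


x = [0.8, 0.6]
unit_norm = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # all weight vectors have length 1
unequal = [[0.8, 0.6], [3.0, 3.0]]                # lengths differ
```

For `unit_norm` both functions pick neuron 2; for `unequal` the distance criterion picks neuron 0 (an exact match) while the inner product picks neuron 1, which is why the distance form is the safer one to implement.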
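Let $h_{j,i(\mathbf{x})}$ denote the topological neighborhood centered on the winning neuron $i$ and containing a set of cooperating neurons, a typical one of which is neuron $j$. A Gaussian function is typically chosen:
$$h_{j,i(\mathbf{x})} = \exp\left(-\frac{d_{j,i}^2}{2\sigma^2}\right)$$
where, for a one-dimensional lattice, $d_{j,i}$ is an integer equal to $|j - i|$. In the two-dimensional case, $d_{j,i}^2$ can be defined as:
$$d_{j,i}^2 = \|\mathbf{r}_j - \mathbf{r}_i\|^2$$
where $\mathbf{r}_j$ and $\mathbf{r}_i$ are the positions of neurons $j$ and $i$ on the lattice. The width $\sigma$ of the SOM neighborhood decreases over time and can be defined as:
$$\sigma(n) = \sigma_0 \exp\left(-\frac{n}{\tau_1}\right)$$
where $\sigma_0$ is the initial width and $\tau_1$ is a time constant.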
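The cooperation step can be sketched as follows, assuming (as non-limiting choices) that the $l$ neurons are placed row-major on a two-dimensional grid with `cols` columns and that the width shrinks exponentially as $\sigma(n) = \sigma_0 e^{-n/\tau_1}$. All function and parameter names are hypothetical.

```python
import math
from typing import Tuple


def lattice_position(j: int, cols: int) -> Tuple[int, int]:
    """Position r_j = (row, col) of neuron j on a row-major grid (assumed layout)."""
    return divmod(j, cols)


def sigma(n: int, sigma0: float, tau1: float) -> float:
    """Neighborhood width at step n: sigma0 * exp(-n / tau1)."""
    return sigma0 * math.exp(-n / tau1)


def neighborhood(j: int, i: int, n: int, cols: int, sigma0: float, tau1: float) -> float:
    """Gaussian h_{j,i}(n) = exp(-d^2 / (2 sigma(n)^2)) with d^2 = ||r_j - r_i||^2."""
    (rj_row, rj_col), (ri_row, ri_col) = lattice_position(j, cols), lattice_position(i, cols)
    d2 = (rj_row - ri_row) ** 2 + (rj_col - ri_col) ** 2
    s = sigma(n, sigma0, tau1)
    return math.exp(-d2 / (2.0 * s * s))


# On a 3-column grid, neuron 4 sits at (1, 1); neuron 5 is one step away, neuron 8 diagonal.
h_winner = neighborhood(4, 4, n=0, cols=3, sigma0=2.0, tau1=100.0)
```

The winner itself always receives $h = 1$; lattice neighbors receive less, and as $n$ grows the neighborhood contracts so that late updates affect mainly the winner.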
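Using the discrete-time vector form, and assuming that the synaptic weight vector of neuron $j$ at time $n$ is $\mathbf{w}_j(n)$, the updated weight vector $\mathbf{w}_j(n+1)$ at time $n+1$ is defined as:
$$\mathbf{w}_j(n+1) = \mathbf{w}_j(n) + \eta(n)\,h_{j,i(\mathbf{x})}(n)\,\big(\mathbf{x}(n) - \mathbf{w}_j(n)\big)$$
where $\eta(n)$ is the learning-rate parameter.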
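A single adaptation step of the update rule can be sketched as below. The neighborhood is passed in as a callable so that any choice of $h_{j,i(\mathbf{x})}(n)$ can be plugged in; `update_weights` and its parameters are hypothetical names.

```python
from typing import Callable, List, Sequence


def update_weights(
    weights: List[List[float]],
    x: Sequence[float],
    winner: int,
    eta: float,
    h: Callable[[int, int], float],
) -> List[List[float]]:
    """Apply w_j(n+1) = w_j(n) + eta(n) * h(j, i(x)) * (x(n) - w_j(n)) to every neuron j.

    Returns new lists and leaves the input weights unchanged.
    """
    return [
        [wk + eta * h(j, winner) * (xk - wk) for wk, xk in zip(w, x)]
        for j, w in enumerate(weights)
    ]


# Winner 0 is pulled fully (h = 1), neuron 1 half as strongly (h = 0.5).
new_w = update_weights([[0.0, 0.0], [1.0, 1.0]], [1.0, 0.0], winner=0, eta=0.5,
                       h=lambda j, i: 1.0 if j == i else 0.5)
# new_w == [[0.5, 0.0], [1.0, 0.75]]
```

Because $0 \le \eta h \le 1$, each new weight is a convex combination of the old weight and the input, so the weights never leave the range spanned by the data.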
The network is trained until it stabilizes to obtain the output neurons; the sample points with the minimum Euclidean distance to these neurons are then computed, and these sample points are the test cases sought.
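The disclosure does not define when the network counts as "stable" or how $\eta(n)$ decays. The following non-limiting sketch combines competition, cooperation, and adaptation into one training loop under these stated assumptions: weights start as randomly drawn samples, $\eta(n) = \eta_0 e^{-n/\tau_2}$, $\sigma(n)$ decays with a small floor to avoid division by zero, and training stops once a full pass moves no weight component by `tol` or more. `train_som` and all default values are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def train_som(data: List[Vector], rows: int, cols: int,
              sigma0: float = 1.0, tau1: float = 300.0,
              eta0: float = 0.5, tau2: float = 300.0,
              tol: float = 1e-4, max_epochs: int = 1000, seed: int = 0) -> List[Vector]:
    """Train a rows x cols SOM on `data` and return the l = rows * cols weight vectors."""
    rng = random.Random(seed)
    l = rows * cols
    weights = [list(rng.choice(data)) for _ in range(l)]  # w_j(0): random samples
    pos = [divmod(j, cols) for j in range(l)]              # lattice position r_j
    n = 0                                                  # discrete time step
    for _ in range(max_epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        max_shift = 0.0
        for idx in order:
            x = data[idx]
            s = max(sigma0 * math.exp(-n / tau1), 1e-3)    # floored neighborhood width
            eta = eta0 * math.exp(-n / tau2)               # decaying learning rate
            # Competition: squared distance has the same argmin as ||x - w_j||.
            i = min(range(l), key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])))
            for j in range(l):
                # Cooperation: Gaussian neighborhood on the 2-D lattice.
                d2 = (pos[j][0] - pos[i][0]) ** 2 + (pos[j][1] - pos[i][1]) ** 2
                h = math.exp(-d2 / (2.0 * s * s))
                # Adaptation: w_j += eta * h * (x - w_j).
                step = [eta * h * (a - b) for a, b in zip(x, weights[j])]
                weights[j] = [b + d for b, d in zip(weights[j], step)]
                max_shift = max(max_shift, max(abs(d) for d in step))
            n += 1
        if max_shift < tol:                                # assumed stability criterion
            break
    return weights


clusters = [[0.1, 0.1], [0.15, 0.05], [0.05, 0.12], [0.9, 0.9], [0.85, 0.95], [0.92, 0.88]]
trained = train_som(clusters, rows=1, cols=2)
```

The grid shape `rows` x `cols` fixes the number of neurons $l$ chosen in Step S1; the stopping rule and decay schedules are tuning choices rather than part of the described method.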
The method for extracting typical test cases of a tax system described above has the following advantages: 1) the test samples selected by the present invention are the most representative real data, which avoids the risk that artificially constructed data deviate significantly from reality and increases the credibility of testing; 2) the present invention effectively compresses the large number of repeated data samples in the historical data, which significantly reduces the workload of testers and improves their efficiency while maintaining test quality; 3) the present invention exploits the order-preserving mapping property of the SOM algorithm to retain the erroneous samples and boundary samples in the real data, which improves the reliability of testing.
Refer to Fig. 2, which is a block diagram of a preferred embodiment of the system for extracting typical test cases of a tax system according to the present invention. This preferred embodiment of the system comprises a key information acquisition unit 1, a data normalization unit 2, a synapse initialization unit 3, an iterative computation unit 5, and a Euclidean distance computation unit 6.
The key information acquisition unit is configured to obtain, from the raw data, key information such as the tax amount, the late payment fee, the occurrence date, and the difference between the tax occurrence date and the tax declaration date, so as to determine the dimension $m$ of the input data.
The data normalization unit is configured to normalize the acquired key information, and the result is denoted as $\mathbf{x} = [x_1, x_2, x_3, \ldots, x_m]^T$.
The synapse initialization unit is configured to initialize the synapses, denoted as $\mathbf{w}_j = [w_{j1}, w_{j2}, w_{j3}, \ldots, w_{jm}]^T$, $j = 1, 2, 3, \ldots, l$, where the synaptic weight vector of each neuron in the network has the same dimension as the input space, and $l$ is the total number of neurons in the network.
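The iterative computation unit sets the SOM learning parameters, initializes the network, and runs the SOM algorithm iteratively until it stabilizes, to obtain the stabilized neurons. Specifically, the iterative computation unit selects the neuron with the largest inner product $\mathbf{w}_j^T\mathbf{x}$ as the activated neuron and uses the index $i(\mathbf{x})$ to identify the neuron that best matches the input vector $\mathbf{x}$, where $i(\mathbf{x})$ can be determined by $i(\mathbf{x}) = \arg\min_j \|\mathbf{x} - \mathbf{w}_j\|$, $j \in \mathcal{A}$, with $\mathcal{A}$ denoting the lattice of neurons. Letting $h_{j,i(\mathbf{x})}$ denote the topological neighborhood centered on the winning neuron $i$ and containing a set of cooperating neurons, a typical one of which is neuron $j$, a Gaussian function is typically chosen:
$$h_{j,i(\mathbf{x})} = \exp\left(-\frac{d_{j,i}^2}{2\sigma^2}\right)$$
where, for a one-dimensional lattice, $d_{j,i}$ is an integer equal to $|j - i|$. In the two-dimensional case, $d_{j,i}^2$ can be defined as $d_{j,i}^2 = \|\mathbf{r}_j - \mathbf{r}_i\|^2$, where $\mathbf{r}_j$ and $\mathbf{r}_i$ are the lattice positions of neurons $j$ and $i$. The width $\sigma$ of the SOM neighborhood decreases over time and can be defined as $\sigma(n) = \sigma_0 \exp(-n/\tau_1)$, where $\sigma_0$ is the initial width and $\tau_1$ is a time constant.
Using the discrete-time vector form, and assuming that the weight vector of neuron $j$ at time $n$ is $\mathbf{w}_j(n)$, the updated weight vector $\mathbf{w}_j(n+1)$ at time $n+1$ is defined as $\mathbf{w}_j(n+1) = \mathbf{w}_j(n) + \eta(n)\,h_{j,i(\mathbf{x})}(n)\,\big(\mathbf{x}(n) - \mathbf{w}_j(n)\big)$, where $\eta(n)$ is the learning-rate parameter. By training the network in this way until it stabilizes, stable output neurons are obtained.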
The Euclidean distance computation unit is configured to find, for each neuron obtained above, the sample point with the smallest Euclidean distance to that neuron; these sample points are the test cases sought.
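The selection performed by the Euclidean distance computation unit can be sketched as follows (the name `select_test_samples` is hypothetical). One design choice here is an assumption not stated in the disclosure: when several neurons share the same nearest sample, that sample is kept only once, so the number of distinct test cases may be smaller than the number of neurons $l$.

```python
import math
from typing import List, Sequence


def select_test_samples(data: List[Sequence[float]], weights: List[Sequence[float]]) -> List[int]:
    """Return indices of the samples nearest to each trained neuron w_j.

    Duplicates are collapsed (assumed behavior); indices appear in neuron order.
    """
    chosen: List[int] = []
    for w in weights:
        k = min(range(len(data)), key=lambda t: math.dist(data[t], w))
        if k not in chosen:
            chosen.append(k)
    return chosen


data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
neurons = [[0.01, 0.0], [0.02, 0.0], [0.98, 1.0]]
test_case_ids = select_test_samples(data, neurons)  # neurons 0 and 1 share sample 0
```

Returning indices rather than copies lets the selected test cases be traced back to the original, un-normalized historical records.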
The system and method for extracting typical test cases of a tax system described above use the self-organizing map algorithm to compress the large number of repeated data samples in the historical data, ensuring that the test samples are concise and effective; they also exploit the order-preserving mapping property of the SOM algorithm to retain the erroneous samples and boundary samples in the real data, improving the reliability of testing.
The above are only embodiments of the present invention and do not thereby limit the scope of the claims of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present invention.