CN1758263A

Movatterモバイル変換

Info

Publication number: CN1758263A
Application number: CN200510061359.0A
Authority: CN
Inventors: 吴朝晖; 杨莹春; 李东东
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2005-10-31
Filing date: 2005-10-31
Publication date: 2006-04-12
Anticipated expiration: 2025-10-31
Also published as: CN100363938C

Abstract

本发明涉及一种基于得分差加权融合的多模态身份识别方法，首先利用一组说话人样本数据，通过原有的传统单模态分类器每个样本相对模板中每个说话人模型的得分；如果得分最高的那个模型和样本属于不同的说话人，则记录下两者的得分差；然后把单个分类器中所有这些差值都累加起来；最后利用各分类器的得分差来确定各个模态的权重。本发明有益的效果是：利用多生物特征进行交叉身份认证，并采用一种修正的基于得分差的加权算法SDWS对两个生物认证模态进行融合，把两种身份认证的结果加以综合。利用两种生物特征信息识别的优点，提高容错性，降低不确定性，克服单个生物特征信息的不完整性，增强识别决策结果的可靠性，使其具有更广泛安全性和适应性。

The present invention relates to a multi-modal identity recognition method based on score difference weighted fusion. Firstly, a set of speaker sample data is used to pass the score of each speaker model in each sample relative template of the original traditional single-mode classifier. ; if the model with the highest score and the sample belong to different speakers, record the score difference between the two; then add up all these differences in a single classifier; finally use the score difference of each classifier to determine each model state weight. The beneficial effects of the present invention are: using multi-biological features for cross-identity authentication, adopting a modified weighted algorithm SDWS based on score difference to fuse two biometric authentication modes, and to synthesize the results of the two identity authentications. Utilize the advantages of two types of biometric information identification, improve fault tolerance, reduce uncertainty, overcome the incompleteness of single biometric information, enhance the reliability of identification decision results, and make it more widely safe and adaptable.

Description

Multi-modal personal identification method based on score difference weighting fusion

Technical field

The present invention relates to the Multiple Classifier Fusion technology, mainly is a kind of multi-modal personal identification method based on score difference weighting fusion.

Background technology

In the application of actual life, the discriminating of identity is a very complicated job, has very strong robustness because it need reach very high performance and requirement.The biological identification technology with people's self physical features as the authentication foundation, fundamentally be different from traditional authentication techniques based on " thing that you had " or " thing known to you ", real with the foundation of people self as authentication, own authentic representative oneself.

In numerous biological identification technology, differentiate it is current two kinds of popular methods based on the identity of sound and image.Application on Voiceprint Recognition, i.e. Speaker Identification does not have and can lose, need not memory and advantage such as easy to use, economic, accurate; Recognition of face then has initiative, non-infringement and many advantages such as user friendly.When this several method uses separately, its separately performance always can be subjected to the constraint of certain extreme value or show instability.So, adopt information fusion to come the advantage of comprehensive each subpattern, be that the reliability of raising identification is a valid approach.

Nearly all multi-modal recognition methods at present all is to carry out on the fusion rank of decision level.According to fusion rule, decision-making level merges generally two kinds of strategies.A kind of is the fixing fusion method of parameter, as the method for average, and ballot method, addition or the like; Another kind is the method that needs parameter training, as Dempster-Shafer, and knowledge and behavior space and naive Bayesian method or the like.

The fusion method of preset parameter can influence performance because of the pairing effect of sorter to a great extent.And the quality of training set and size make the decision level fusion method of parameter training often can not reach theoretic syncretizing effect.

Summary of the invention

The present invention will solve the existing defective of above-mentioned technology, and a kind of multi-modal personal identification method based on score difference weighting fusion is provided.By research to the identification score of single sorter, recognition category and affiliated class score difference as the weights foundation, obtain a kind of new weighting parameters training method " based on the weighting of score difference " SDWS (Scores Difference-BasedWeightedSum Rule) and merged vocal print sorter and people's face sorter, thus the performance of raising Speaker Identification.

The technical solution adopted for the present invention to solve the technical problems: this multi-modal personal identification method based on score difference weighting fusion, at first utilize one group of speaker's sample data, by the score of each speaker model in the relative template of original traditional each sample of single mode sorter; If that model that score is the highest belongs to different speakers with sample, the score of then noting both is poor; Then all these differences in the single sorter are all added up; Utilize the score difference of each sorter to determine the weight of each mode at last.

The technical solution adopted for the present invention to solve the technical problems can also be further perfect.Described traditional single mode sorter is Application on Voiceprint Recognition sorter and recognition of face sorter.Described be divided into sorter belongs to this guess of certain classification to the data of input support.Described score difference is under separation vessel is differentiated error situation, import this moment data former under the classification of the input data supposed of classification and sorter when inconsistent, sorter is to the difference of the support of above-mentioned two classifications.The score difference of described sorter be all speakers differentiate the score of the speaker model that the sample under the error situation belongs to and top score in the single sorter difference and.Described sorter based on the weight of score difference be single separation vessel score difference inverse to the inverse of all separation vessel score differences and ratio.

The effect that the present invention is useful is: utilize multi-biological characteristic (vocal print, people's face) intersects authentication, and adopt a kind of weighting algorithm SDWS of correction that two biological identification mode are merged based on the score difference, comprehensive in addition the result of two kinds of authentications.Utilize the advantage of two kinds of biological information identifications and the field that is suitable for, improve fault-tolerance, reduction is uncertain, overcomes the imperfection of single biological information, strengthens recognition decision result's reliability, makes it have more extensive security and adaptability.

Description of drawings

Fig. 1 is the multi-modal identification system frame diagram based on score difference weighting fusion SDWS of the present invention;

Fig. 2 is the topological structure synoptic diagram of dynamic Bayesian model of the present invention.

Embodiment

The invention will be described further below in conjunction with drawings and Examples: method of the present invention was divided into for three steps.

The first step, Application on Voiceprint Recognition

Speaker Identification is divided into the voice pre-service, feature extraction, and model training is discerned four parts.

1. voice pre-service

The voice pre-service is divided into sample quantization, zero-suppresses and floats, three parts of pre-emphasis and windowing.

A), sample quantization

I. with sharp filter sound signal is carried out filtering, make its nyquist frequency F_NBe 4KHZ;

II., audio sample rate F=2F is set_N

III. to sound signal S_a(t) sample by the cycle, obtain the amplitude sequence of digital audio and video signals

s (n) = s_{a} (\frac{n}{F});

IV. with pulse code modulation (pcm) s (n) is carried out quantization encoding, the quantization means s ' that obtains amplitude sequence (n).

B), zero-suppress and float

I. calculate the mean value s of the amplitude sequence that quantizes;

II. each amplitude is deducted mean value, obtaining zero-suppressing, to float back mean value be 0 amplitude sequence s " (n).

C), pre-emphasis

I., Z transfer function H (the z)=1-α z of digital filter is set^-1In pre emphasis factor α, the value that the desirable ratio of α 1 is slightly little;

II.s " (n) by digital filter, obtain the suitable amplitude sequence s (n) of high, medium and low frequency amplitude of sound signal.

D), windowing

I. frame length N of computing voice frame (32 milliseconds) and the frame amount of moving T (10 milliseconds), satisfy respectively:

\frac{N}{F} = 0.032

\frac{T}{F} = 0.010

Here F is the speech sample rate, and unit is Hz;

II. be that N, the frame amount of moving are T with the frame length, s (n) is divided into a series of speech frame F_m, each audio frame comprises N voice signal sample;

III. calculate the hamming code window function:

IV. to each speech frame F_mAdd hamming code window:

2.MFCC extraction:

A), the exponent number p of Mel cepstrum coefficient is set;

B), be fast fourier transform FFT, time-domain signal s (n) is become frequency domain signal X (k).

C), calculate Mel territory scale:

M_{i} = \frac{i}{p} \times 2595 \log (1 + \frac{8000 / 2.0}{700.0}), (i = 0,1,2, . . ., p)

D), calculate corresponding frequency domain scale:

f_{i} = 700 \times e^{\frac{M_{i}}{2595} \ln 10} - 1, (i = 0,1,2, . . ., p)

E), calculate each Mel territory passage φ_jOn the logarithm energy spectrum:

E_{j} = Σ_{k = 0}^{\frac{K}{2} - 1} φ_{j} (k) {| X (k) |}^{2}

Wherein

Σ_{k = 0}^{\frac{K}{2} - 1} φ_{j} (k) = 1 .

F), be discrete cosine transform DCT

3.DBN model training

Dynamic bayesian network model (DBN) is similar to HMM, is a generation model, and it only needs a people's speech data just can carry out modeling to it, finishes identifying.

The purpose of training is in order to make under given speech data, and the parameter of model can better be described the distribution situation of voice in feature space.Here DBN training mainly lays particular emphasis on the training to model parameter, does not learn at network topology.

A) if likelihood score does not have convergence, and iterations changes B less than preset times) step; Otherwise, change E).

Here convergent definition is:

Converged = \{\begin{matrix} TRUE, if | PreLogLik - CurLogLik | < θ \\ FALSE, otherwize \end{matrix}

The PreLogLik here is meant the likelihood score of back iteration, and CurLogLik is meant the likelihood score of current iteration, and they all are by step C) in forward-backward algorithm traversal obtain.θ is the threshold values of presetting.Default maximum iteration time MAXITER can set arbitrarily.The judgement in this step is to make iteration be unlikely to unconfined to carry out.

B), the ASSOCIATE STATISTICS value of each node empties.

Will empty statistical value before forward-backward algorithm traversal, said here statistical value is meant CPD (conditional probability distribution) to node needed data when learning.

C), pooled observations, carry out forward-backward algorithm traversal, the output likelihood score.

Network is carried out the forward-backward algorithm traversal, make observed reading can make other nodes in the network also can obtain upgrading to the renewal of some node, satisfy locally coherence and global coherency condition, this step has realized in abutting connection with algorithm, and the frame inner structure has been carried out the probability diffusion with COLLECT-EVIDENCE (collecting evidence) and DISTRIBUTE-EVIDENCE (issue evidence).This traversal will be exported the Log likelihood score, at A in step) in will be used to.Used probability output also obtains by this traversal in the identification.

D), according to observed reading, calculate the ASSOCIATE STATISTICS value, upgrade the probability distribution of interdependent node, change A).

According to observed reading, calculate the ASSOCIATE STATISTICS value, the probability distribution of new node more, this is determined by the EM learning algorithm.

E), preserve model.

4. identification

After the user speech input,, obtain a characteristic vector sequence C through feature extraction.Press Bayes rule,, meet model M giving under the given data C_iLikelihood score be,

P (M_{i} | C) = \frac{P (C | M_{i}) * P (M_{i})}{P (C)}

Because without any the knowledge of priori, so we think to all models P (M_i) be identical, i.e. P (M_i)=1/N, i=1,2 ..., N, and concerning all speakers, P (C) is a unconditional probability, also is identical, that is:

P(M_i|C)∝P(C|M_i)

We are converted into the posterior probability of asking model and ask the prior probability of model to data.So, speaker's identification test is exactly to calculate following formula,

i^{*} = \underset{i}{\arg \max} P ({C | M}_{i})

Second step: recognition of face

2 dimension face identification systems mainly comprise image pre-service, feature extraction and three parts of sorter classification.

1. image pre-service

The pretreated general objects of image is to adjust the difference of original image on illumination and geometry, obtains normalized new images.Pre-service comprises the alignment and the convergent-divergent of image.

2.PCA feature extraction

By the pivot conversion, with a low n-dimensional subspace n (pivot subspace) facial image is described, try hard to when rejecting the classification interference components, remain with the discriminant information that is beneficial to classification.

With the standard picture after pretreated as training sample set, and with the covariance matrix of this sample set generation matrix as the pivot conversion:

Σ = \frac{1}{M} Σ_{i = 0}^{M - 1} (x_{i} - μ) {(x_{i} - μ)}^{T}

X wherein_iBe the image vector of i training sample, μ is the average image vector of training sample set, and M is the sum of training sample.If the image size is K * L, then the matrix ∑ has KL * KL dimension.When image is very big, directly calculating the eigenwert and the proper vector that produce matrix will have certain difficulty.As sample number M during less than KL * KL, available svd theorem (SVD) is converted to the calculating of M dimension matrix.

With the eigenwert λ that sorts from big to small₀〉=λ₁〉=... λ_R-1, and establish the vectorial u of being of their characteristics of correspondence_iLike this, each width of cloth facial image can project to by u₀, u₁..., u_M-1In the subspace of opening.Obtained M proper vector altogether, chosen preceding k maximum proper vector, made:

\frac{Σ_{i = 0}^{k} λ_{i}}{Σ_{i = 0}^{M - 1} λ_{i}} = α

Wherein α is called the energy ratio, accounts for the ratio of whole energy for the energy of sample set on preceding k axle.

3. sorter classification

With the arest neighbors sorting technique as component classifier.What distance metric used is the Euclidean distance formula.

The 3rd step: based on the Multiple Classifier Fusion of score difference weighting

Multiple Classifier Fusion algorithm based on the weighting of score difference is divided into the sorter formalized description, trains and discern three parts.

1. sorter formalized description

A), sorter is described: establish D={D₁, D₂..., D_LRepresent a group component sorter;

B), classification is described: establish Ω={ ω₁..., ω_c) represent a category not identify, promptly all possible classification results

C), input: proper vector

D), output: length is the vectorial D of c_i(x)=[d_{I, 1}(x), d_{I, 2}(x) ..., d_{I, c}(x)]^T, d wherein_{I, j}(x) represent D_iBelong to for x

The support .d of this guess_{I, j}(x) normalized to [0,1] interval interior component classifier output, and

Σ_{j = 1}^{c} d_{i, j} (x) = 1

E), the output of all sorters can be synthesized a DP (Decision Profile) matrix:

DP (x) = [\begin{matrix} d_{1,1} (x), d_{1,2} (x), . . ., d_{1, c} (x) \\ . . . \\ d_{i, 1} (x), d_{i, 2} (x), . . ., d_{i, c} (x) \\ . . . \\ d_{L, 1} (x), d_{I, 2} (x), . . ., d_{I, c} (x) \end{matrix}]

In this matrix, the i row element is represented component classifier D_iOutput D_i(x); The j column element represents each component classifier right

Support.

2. training

A), training sample: the training set X={x that N element arranged₁, x₂..., x_N}

B), sorter is to the recognition result of sample:

S (X) = [\begin{matrix} s_{1,1} (X), . . ., S_{1, L} (X) \\ . \\ . \\ . \\ s_{j, 1} (X), . . ., s_{j, L} (X) \\ . \\ . \\ . \\ s_{N, 1} (X), . . ., s_{N, L} (X) \end{matrix}]

S wherein_{J, i}Be sorter D_iTo sample elements x_jThe class that is identified, and if only if

s_j，i＝D_i(x_j)

= s &DoubleLeftRightArrow; d_{i, s} (x_{j}) = \max_{o = 1,2, . ., c} {d_{i, o} (x_{j})}

Here j=1 ..., N is the number of element in the training set; I=1 ... L is that the number .C of sorter is the number of classification, is number to be identified herein.

C), original affiliated classification: the L (X) of sample=[k_{1 ...}, k_N]^T,

D), the score difference SD of i sorter_i(X) be:

{SD}_{i} (X) = Σ_{j = 1}^{N} {SD}_{i}^{j} (x_{j})

= Σ_{j = 1}^{N} Σ_{s_{j, i} &NotEqual; k_{j}} | d_{i, k_{j}} (x_{j}) - d_{i, s_{j, i}} (x_{j}) |

SD_i(X) be under separation vessel is differentiated error situation, import this moment data former under the classification of the input data supposed of classification and sorter s when inconsistent_{J, i}≠ k_j, sorter is to the difference of the support of above-mentioned two classifications.D wherein_{I, j}(x) be element in DP (x) matrix.

E), sorter is based on the weights of score difference:

W_{i} = \frac{{SD}_{i} {(X)}^{- 1}}{Σ_{i = 1}^{L} {SD}_{i} {(X)}^{- 1}}

3. judgement

According to weights, recomputate under the multi-modal state support of each classification:

D(x)＝[d₁(x)，d₂(x)，...，d_c(x)]^T

= {[Σ_{i = 1}^{L} W_{i} * d_{i, 1} (x), Σ_{i = 1}^{L} W_{i} * d_{i, 2} (x), . . . Σ_{i = 1}^{L} W_{i} * d_{i, c} (x),]}^{T}

A plurality of sorters are ω to the classification results of test vector x_sAnd if only if

s = \max_{i = 1, . . . c} d_{i} (x) .

Experimental result

Native system is tested on a multi-modal speech database that comprises 54 user's vocal prints and voice messaging.This database has been gathered the people's face and the voiceprint of 54 students of Zhejiang University (37 schoolboys, 17 schoolgirls).The collecting work of entire database carries out in the environment of low noise bright and clear.In the phonological component, everyone is required to saypersonal information 3 times; The mandarin numeric string, dialect numeric string, english digit string, mandarin word string, each 10 of picture talks, one section of short essay.The voice document form is the wav/nist form, and all standard becomes the 8000Hz sampling rate, the 16bit data.Experiment adopts short essay and personal information as training, and all the other 50 voice are as test.In the facial image part, everyone respectively produces the front and people from side face shines totally 4, and wherein positive according to two, the side is according to two.Experiment employing wherein positive a photograph is trained, and another is tested.

We use the single mode Application on Voiceprint Recognition simultaneously on this storehouse, single mode recognition of face and addition, weighting, ballot method and carried out same experiment based on this several frequently seen decision-making level's blending algorithm of method of behavior knowledge space, be used for and native system (SDWS is based on the blending algorithm of score difference weighting) compares.Wherein Application on Voiceprint Recognition is based on people's phonetic feature, and recognition of face is based on people's face feature.Blending algorithm combines these two kinds of features, and addition and ballot are owned by France in the fixing fusion method of parameter; Weighted sum belongs to the blending algorithm that needs parameter training based on the method for behavior knowledge space.

Single mode vocal print method for distinguishing speek person is based on the first step of this explanation, voice are carried out pre-service after, it is extracted the Mel cepstrum feature, utilize dynamic Bayesian model to speaker's modeling.Dynamically the topology of Bayesian model adopts structure as shown in Figure 2, wherein qⁱ_j, i=1,2,3, j=1,2 ... T represents latent node variable, and each node hypothesis has two discrete values, oⁱ_j, i=1,2,3, j=1,2 ... T is an observer nodes, corresponding to observation vector, has the father node q of Discrete Distributionⁱ_j, satisfy Gaussian distribution.Same, tested speech is carried out rightly with the speaker model of building up after the process of extracting through pre-service and Mel cepstrum feature, obtains the pairing artificial identification person that speaks of the highest model of branch.

The single mode recognition of face goes on foot based on second of this explanation, after facial image is manuallyd locate according to eyes, it is extracted the PCA feature, by comparing the Euclidean distance between the PCA feature, gets the pairing artificial identification person that speaks of nearest feature.

For addition, its thought can be by following formulate:

μ_i(x)＝F(d_1，i(x)，...，d_L，i(x))，i＝1，...，c

Wherein F has represented add operation (Sum), and final classification results is to make μ_iThe ω of maximum i correspondence_i

Weighting algorithm is to grow up on the basis of addition, embodies difference good and bad between each sorter by weight.Here adopt each sorter etc. error rate as its weight.

The basic thought of ballot method is " the minority is subordinate to the majority ".Wherein, the voter is all component classifiers, and the candidate is all possible classification results.Give its candidate's ballot of supporting by the voter, the candidate that poll is maximum wins.

Method based on the behavior knowledge space is to estimate posterior probability under the situation of knowing the component classifier classification results.It need add up the number that each class sample drops on each unit of behavior knowledge space.When using this method, the sample in the training set is divided into different unit, and these unit are that the various combination by all component classifier classification results defines.When a unknown sample need be carried out the branch time-like, all component classifiers all can be known the combination of classification results, can find corresponding unit thus.Then, according to the sample concrete class in this unit, unknown sample is included into the maximum classification of occurrence number.

We are being different under the voice collection of voice content and languages, and single mode identification and above several blending algorithm are assessed.

Assess for performance, select for use discrimination (IR, Identification Rate) to be used as the evaluation criteria of experimental result Speaker Recognition System.

The computing formula of discrimination IR is:

Experimental result is as follows:

Fusion method	Discrimination (%)
	Discrimination (%)					Mandarin	Dialect	English	Vocabulary	Picture talk
	Application on Voiceprint Recognition	84.63	85.55	91.11	87.78	Mandarin	Dialect	English	Vocabulary	Picture talk	87.78
Recognition of face	Application on Voiceprint Recognition	84.63	85.55	91.11	87.78	85.18					87.78
Recognition of face	Addition	85.37	85.18	86.11	85.18	85.18					85
Weighting	Addition	85.37	85.18	86.11	85.18	85.37	85.18	86.67	85.18	85	85
Weighting	SDWS	97.96	97.98	98.89	99.26	85.37	85.18	86.67	85.18	85	98.33
The ballot method	SDWS	97.96	97.98	98.89	99.26	85.18	85.18	85.18	85.18	85.18	98.33
The ballot method	Method based on the behavior knowledge space	89.15	89.68	92.33	90.21	85.18	85.18	85.18	85.18	85.18	88.10

Experimental result shows that the biological authentication method of single mode can't reach discrimination preferably, can not satisfy the requirement of security and robustness.

Under the situation of two Multiple Classifier Fusion, the method for addition and weighting tends to make the advantage of two sorters disappear mutually on the contrary because do not consider the score distribution situation of sorter.

The ballot method has only been considered the category label of each sorter output, and does not consider their error rate, and this has wasted the information of training sample to a certain extent.

Though behavior knowledge space method is to tie up the direct statistics that distributes more than a plurality of sorter results of decision, the decision-making that can make up component classifier is to obtain best result.Yet, because the relative training sample quantity of behavior knowledge space is too huge, being easy to occur undisciplined situation, this is because training set can't be huge to each unit is filled into enough density.

This recognizer can be by the analysis to the sorter score, according under the situation of sorter identification error, difference between the score of the model under the score of the model that the sorter of collecting is judged and the sample, with this weight as sorter, by simple and effective method of weighting sorter is merged in decision-making level, make two kinds of sorters have complementary advantages, to improving a lot on the system performance, head and shoulders above other fusion method, improved about 7.8-13.3% than the method for single mode.Thereby improved the recognition performance of Speaker Identification.

Claims

Translated fromChinese

1、一种基于得分差加权融和的多模态身份识别方法，其特征在于：首先利用一组说话人样本数据，通过原有的传统单模态分类器每个样本相对模版中每个说话人模型的得分；如果得分最高的那个模型和样本属于不同的说话人，则记录下两者的得分差；然后把单个分类器中所有这些差值都累加起来；最后利用各分类器的得分差来确定各个模态的权重。1. A multimodal identity recognition method based on weighted fusion of score differences, characterized in that: first, a set of speaker sample data is used, and each sample is compared to each speaker in the template through the original traditional single-modal classifier. The score of the model; if the model with the highest score and the sample belong to different speakers, record the score difference between the two; then add up all these differences in a single classifier; finally use the score difference of each classifier to Determine the weight of each modality.

2、根据权利要求1所述的基于得分差加权融和的多模态身份识别方法，其特征在于：所述的传统单模态分类器为声纹识别分类器和人脸识别分类器。2. The multi-modal identity recognition method based on score difference weighted fusion according to claim 1, wherein the traditional single-modal classifier is a voiceprint recognition classifier and a face recognition classifier.

3、根据权利要求1所述的基于得分差加权融和的多模态身份识别方法，其特征在于：所述的得分为分类器对输入的数据属于某个类别的这一猜想的支持度。3. The multi-modal identity recognition method based on score difference weighted fusion according to claim 1, wherein the score is the support degree of the classifier to the conjecture that the input data belongs to a certain category.

4、根据权利要求1所述的基于得分差加权融和的多模态身份识别方法，其特征在于：所述的得分差为在分离器判别错误情况下，此时输入数据原所属类别与分类器假设的输入数据的类别不一致时，分类器对上述两个类别的支持度的差值。4. The multi-modal identity recognition method based on weighted fusion of score differences according to claim 1, characterized in that: the score difference is the difference between the original category of the input data and the classifier when the separator makes an error. When the categories of the assumed input data are inconsistent, the difference between the support of the classifier for the above two categories.

5、根据权利要求1所述的基于得分差加权融和的多模态身份识别方法，其特征在于：所述的分类器的得分差为单个分类器中所有说话人判别错误情况下的样本属于的说话人模型的得分与最高得分的差值的和。5. The multimodal identity recognition method based on weighted fusion of score differences according to claim 1, characterized in that: the score difference of the classifiers belongs to the samples in the case of all speaker discrimination errors in a single classifier. The sum of the difference between the speaker model's score and the highest score.

6、根据权利要求1所述的基于得分差加权的多模态身份识别方法，其特征在于：所述的分类器基于得分差的权重为单个分离器得分差的倒数对所有分离器得分差的倒数和的比值。6. The multi-modal identity recognition method based on weighted score difference according to claim 1, characterized in that: the weight of the classifier based on the score difference is the reciprocal of the score difference of a single separator to the weight of the score difference of all separators The ratio of the reciprocal sum.

7、根据权利要求1或2或3或4或5或6所述的基于得分差加权的多模态身份识别方法，其特征在于：基于得分差加权的分类器融合算法分为分类器形式化描述，训练和识别三个部分；7. The multimodal identification method based on score difference weighting according to claim 1 or 2 or 3 or 4 or 5 or 6, characterized in that: the classifier fusion algorithm based on score difference weighting is divided into classifier formalization Describe, train and recognize three parts;

1)、分类器形式化描述1) Formal description of the classifier

A)、分类器描述：设D＝{D₁，D₂，...，D_L}代表一组分量分类器；A), classifier description: Let D={D₁ , D₂ ,..., D_L } represent a group of component classifiers;

B)、类别描述：设Ω＝{ω₁，...，ω_c}代表一组类别标识，即所有可能的分类结果；B), category description: Let Ω={ω₁ ,...,ω_c } represent a set of category identifiers, that is, all possible classification results;

C)、输入：特征向量

C), input: feature vector

D)、输出：长度为c的向量D_i(x)＝[d_i，1(x)，d_i，2(x)，...，d_i，c(x)]^T，其中d_i，j(x)代表D_i对于x属于

这一猜想的支持度，d_i，j(x)是被归一化到[0，1]区间内的分量分类器输出，且D), output: length is the vector D_i (x)=[d_{i, 1} (x), d_{i, 2} (x), ..., d_{i, c} (x)]^{T of length} c, wherein d_{i , j} (x) represents D_i for x belongs to

The support of this conjecture, d_{i, j} (x) is the output of the component classifier normalized to the interval [0, 1], and

{Σ Σ}_{j j = = 11}^{c c} {d d}_{i i,, j j} ((x x)) = = 11;;

E)、所有分类器的输出合成一个DP矩阵：E), the output of all classifiers is synthesized into a DP matrix:

DP DP ((x x)) = = [\begin{matrix} {d d}_{1,1 1,1} ((x x)),, {d d}_{1,2 1,2} ((x x)),, \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot;,, {d d}_{11,, c c} ((x x)) \\ \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; \\ {d d}_{i i,, 11} ((x x)),, {d d}_{i i,, 22} ((x x)),, \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot;,, {d d}_{i i,, c c} ((x x)) \\ \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; \\ {d d}_{L L,, 11} ((x x)),, {d d}_{l l,, 22} ((x x)) \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot;,, {d d}_{l l,, c c} ((x x)) \end{matrix}]

在这个矩阵中，第i行元素代表分量分类器D_i的输出D_i(x)；第j列元素代表每个分量分类器对

的支持度；In this matrix, the i-th row element represents the output D_i (x) of the component classifier D_i ; the j-th column element represents each component classifier pair

the degree of support;

2)、训练2), training

A)、训练样本：有N个元素的训练集合X＝{x₁，x₂，...，x_N}；A), training samples: a training set X={x₁ , x₂ , . . . , x_N } with N elements;

B)、分类器对样本的识别结果：B), the recognition result of the classifier to the sample:

S S ((X x)) = = [\begin{matrix} {s the s}_{11,, 11} ((X x)),, \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot;,, {s the s}_{11,, L L} ((X x)) \\ \cdot \cdot \\ \cdot &Center Dot; \\ \cdot \cdot \\ {s the s}_{j j,, i i} ((X x)),, \cdot \cdot \cdot &Center Dot; \cdot &Center Dot;,, {s the s}_{j j,, L L} ((X x)) \\ \cdot &Center Dot; \\ \cdot \cdot \\ \cdot &Center Dot; \\ {s the s}_{N N,, 11} ((X x)),, \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot;,, {s the s}_{N N,, L L} ((X x)) \end{matrix}]

其中s_j，i为分类器D_i对样本元素x_j所标识的类，当且仅当where s_{j, i} is the class identified by the classifier D_i for the sample element x_j , if and only if

\begin{matrix} {s the s}_{j j,, i i} = = {D D.}_{i i} (({x x}_{j j})) \\ = = s the s &DoubleLeftRightArrow; &DoubleLeftRightArrow; {d d}_{i i,, s the s} (({x x}_{j j})) = = \underset{o o = = 1,2 1,2,, \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot;,, c c}{max max} {{{d d}_{i i,, o o} (({x x}_{j j}))}} \end{matrix}

这里j＝1，...，N是训练集合中元素的数目：i＝1，...L是分类器的数目，C是分类的数目，此处为待识别的人数；Here j=1, ..., N is the number of elements in the training set: i=1, ... L is the number of classifiers, C is the number of classifications, here is the number of people to be identified;

C)、样本原始所属类别：L(X)＝[k₁，...，k_N]^T，

C), the original category of the sample: L(X)=[k₁ ,...,k_N ]^T ,

D)、第i个分类器的得分差SD_i(X)为：D), the score difference SD_i (X) of the i-th classifier is:

\begin{matrix} {SD SD}_{i i} ((X x)) = = {Σ Σ}_{J J = = 11}^{N N} {SD SD}_{i i}^{j j} (({x x}_{j j})) \\ = = {Σ Σ}_{j j = = 11}^{N N} {Σ Σ}_{{s the s}_{j j,, i i} &NotEqual; &NotEqual; {k k}_{j j}} | | {d d}_{i i,, {k k}_{j j}} (({x x}_{j j})) - - {d d}_{i i,, {s the s}_{j j,, i i}} (({x x}_{j j})) | | \end{matrix}

SD_i(X)为在分离器判别错误情况下，此时输入数据原所属类别与分类器假设的输入数据的类别不一致时s_j，i≠k_j，分类器对上述两个类别的支持度的差值。其中d_i，j(x)为DP(x)矩阵中的元素；SD_i (X) is the support degree of the classifier for the_above two categories when the original category of the input data is inconsistent with the category of the input data assumed by the classifier when the separator is wrongly judged, i≠k_j difference. Where d_{i, j} (x) is an element in the DP(x) matrix;

E)、分类器基于得分差的权值：E), the weight of the classifier based on the score difference:

{W W}_{i i} = = \frac{{SD SD}_{i i} {((X x))}^{- - 11}}{{Σ Σ}_{i i = = 11}^{L L} {SD SD}_{i i} {((X x))}^{- - 11}}

3)、判决3), Judgment

根据权值，重新计算多模态状态下，每个类别的支持度：According to the weight, recalculate the support of each category in the multimodal state:

\begin{matrix} D D. ((x x)) = = {[[{d d}_{11} ((x x)),, {d d}_{22} ((x x)),, \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot;,, {d d}_{c c} ((x x))]]}^{T T} \\ = = {[[{Σ Σ}_{i i = = 11}^{L L} {W W}_{i i}^{* *} {d d}_{i i,, 11} ((x x)),, {Σ Σ}_{i i = = 11}^{L L} {W W}_{i i}^{* *} {d d}_{i i,, 22} ((x x)),, \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; {Σ Σ}_{i i = = 11}^{L L} {W W}_{i i}^{* *} {d d}_{i i,, c c} ((x x)),,]]}^{T T} \end{matrix}

多个分类器对测试向量x的分类结果为ω_s当且仅当

s = \max_{i = 1, \cdot \cdot \cdot c} d_{i} (x) .

The classification result of multiple classifiers on the test vector x is ω_s if and only if

the s = \max_{i = 1, &Center Dot; \cdot &Center Dot; c} d_{i} (x) .