dividing the collected face image into n × p blocks, and counting the occurrence times of each mode in each block, that is, counting the feature types in the sub-regions of each block of face image to obtain a face image feature component U ═ consisting of n × p histograms (U ═ n ═ p₁,U₂,…U_n×p)；

Wherein,

molecule P_j(Q) the number of features used to describe that the local binary pattern value in the jth sub-region is Q,

a binary pattern histogram for describing the jth sub-region;

establishing a face image characteristic histogram to provide a data basis for face image retrieval;

then denoising based on median filtering human eye visual characteristics, firstly determining noise points, setting the size of an image R as mxn, and sliding on the image by adopting a window with the size of 3 × 3;

defining the gray value as the central pixel value of the window, and then the value set of all the pixel points in the window is:

w_i，j＝{g(i+k,j+r)|k,r＝(1,0,-1)}；

calculating the average value of pixels in the window

Finding out the maximum gray value and the minimum gray value of the image R, and respectively marking as I_max(m×n)、I_min(m × n). The threshold value for marking the central pixel point is H_i,j；

Then when the gray value of the central pixel point meets the following conditions, the pixel point can be judged as a noise point:

if | g (i, j) -w_m|＞H_i,jIf yes, the pixel point is a noise point;

if | g (I, j) | ═ I_max(m.times.n) or I_min(m × n), the pixel point is a noise point;

for the above conditions, the threshold H is determined according to the noise sensitivity coefficient λ_i,jThe size of (d); defining the noise sensitivity coefficient lambda of the window center pixel point g (i, j) as:

by calculating the noise sensitivity factor lambda_i,jIf | g (i, j) -w is satisfied_m|＞λ_i,jJudging whether the pixel points are noise points or not under the condition;

after dividing image pixel points into two types of noise points and non-noise points, smoothing the image g (i, j) by adopting a NURBS function, and describing the image as a result of discrete convolution of an original image, a k-order spline and an l-order spline function as follows:

wherein, B_k(x-i) and B_l(y-i) are k-order splines and l-order spline convolution templates of NURBS respectively, if g (i, j) is a noise point, a filtering window of 3 multiplied by 3 is taken, filtered values are obtained, and then median filtering is carried out, so that a final value is obtained;

if i takes [0,255], the resolution function is defined as follows:

F_r(i)＝N(i)/max[N(i)]

the membership function defining the target area is:

wherein f (i) is a monotonically increasing function, and satisfies the conditions f (a) 0, f (b) 1; when the gray value is in the interval of [0, a ], the pixel point belongs to a background area, when the gray value is in the interval of [ b, 255], the pixel point belongs to a target area, and when the gray value is in the interval of [ a, b ], the pixel point needs to be further represented by a fuzzy function.

Compared with the prior art, the invention has the following advantages:

the invention provides a mass data processing method which is beneficial to improving the accuracy of face recognition under the conditions of face shielding, low sample quantity and quality and information loss and reducing the operation time of the recognition.

Drawings

Fig. 1 is a flowchart of a mass data processing method according to an embodiment of the present invention.

Detailed Description

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details.

One aspect of the present invention provides a method for processing mass data. Fig. 1 is a flowchart of a mass data processing method according to an embodiment of the present invention.

The mass data storage and retrieval system based on cloud computing comprises data nodes and index nodes. The index nodes maintain indexes of the face image data blocks, mapping relations among the data blocks and data block attributes, and the data nodes store actual face image data blocks by taking different image data owners as units. When the image data owner accesses the storage system, an independent space is obtained. Each data block is assigned a different ChunkID, and each image block and copy is stored on each data node. The index of the face image data block includes the following attributes: ChunkID, name, type, size, image data owner name, access time, and location information thereof.

The user block records ChunkID, sharing mode, image data block name corresponding relation and image data owner name of the image data owner stored in the system. The system acquires the user block by accessing the ChunkID mapped by OwnerID of the owner of the image data and gives the owner of the image data an independent user block space. The user block of the owner of the image data only allows the owner of the image data corresponding to OwnerID to have the right to access all the user blocks, assigns a ChunkID to the user blocks, and stores the user blocks in each data node of the system after the user blocks.

And the data storage and retrieval system is also internally provided with super nodes, the super nodes are provided with high-speed bandwidth and high-performance nodes, each super node maintains a routing table, and the routing table can be adjusted according to the strength of the self capacity. All supernodes in the data storage retrieval system form a storage ring. The routing table of the super node n comprises m rows, each row comprises x items, x represents the strength of the node performance, the kth row comprises x items, x is equally divided into intervals [ n +3i, n +3i +1 ], wherein k is more than 0 and less than m, and each super node dynamically adjusts the x value according to the current capability of the super node. The storage ring is used for routing query requests. Each supernode is responsible for maintaining its predecessor and successor nodes. The successor node of the supernode n is the immediate successor node of the node n on the memory ring, i.e. the first supernode in the clockwise direction from n on the memory ring, and likewise, the predecessor node of n is the immediate predecessor supernode of n on the memory ring. Each super node also maintains a data node list, each item in the routing table of the super node points to one super node, all data nodes from the super node to a subsequent node are recorded in the data node list, and the node pointed in the data node list can be used for backing up data information on the super node or transferring overflowed image data to the node pointed in the data node list when image storage data of the super node overflows. All requested information on the data storage retrieval system is routed through the memory ring. Wherein a query request of a data node is first forwarded to its successor supernode, and then routing can be performed on the storage ring, eventually reaching the destination. The query requests of the supernodes are routed directly on the storage ring.

The indexing mechanism of the face image data block comprises the establishment of an index and block retrieval. The index establishment is used for establishing association between the actual data block and the storage position, establishing association between the retrieval information and the actual storage data block, and storing information of necessary data blocks. The block retrieval comprises retrieving request information sent by an image data owner through a client in an index, and then feeding back a related retrieval result to the client. The block search involves the storage format of the image data block itself, the node composition structure and the block search mode. The invention divides the data block storage address by adopting address segmentation retrieval.

The face image storage format divides address information of a storage data block into 3 segments according to an image data visitor/image data owner/content. The database of the image data visitor records a list of all image data owners owned by the image data visitor; the database of the image data owner maintains all the contents in the image data owner server; the image content service is responsible for storing, deleting and searching image content on the node device. The data of the image content and the index are stored together and saved as one unit. The path of each content is composed of the HASH value of the image content name and the operation time.

The method of the present invention is to divide the keyword identifier into 3 parts. The three parts are respectively an image data visitor k according to the sequence of the keyword identifier from high to low₁Owner k of image data₂Content of image k₃ID of keyword of image content query value according to k₁k₂k₃Are linked in sequence. The absolute value thereof is: i k₁|，|k₂|，|k₃I is calculated according to the following sequence, firstly, the HASH value of the first content query value is calculated according to the sequence of the image content query feature, and | k is taken according to the sequence from high order to low order₁| position as k₁Then calculating the HASH value of the second content query value to take | k₂| position as k₂Then HASH is calculated for the rest of the content query value and k is taken₃| position as k₃The value of (c). Similar queries are arranged in close proximity.

The image data block is divided into s according to its key word when stored, and its ID is divided into s₁/s₂/s₃Arranged on the routing memory ring from high to low 3 parts, s₁Is a field of picture data visitor addresses, s₂Is the image data owner address field, s₃Is a picture content address field. Through data block address segmentation, after an image data owner determines the position of an image data visitor or the image data owner where the data block is, the specific position of the image data block is quickly determined.

The invention records the node storing the visitorID as the image data visitor node for content query, and the node storing the ownerID as the image data owner node for content query. And finally, recording the node storing the content query ID as the image content node of the image content query. The index information of the content query may be stored in all of its image data visitor nodes, image data owner nodes, and image content nodes.

Each data node maintains local storage content, an image data visitor content index table, an image data owner content index table and a content index table, and simultaneously maintains an image data visitor node list, an image data owner node list and an image content node list. Each row in the image data visitor content index table comprises a content query address, the last access time of image content, a plurality of nodes where the image content corresponding to the content query is located and the like; similarly, the content index table of the image data owner and the content index table also contain the items, but the image content node of the content query in the content index table is at the node; the list of image content nodes includes image data content. The image data visitor node list maintains the image data owner node with the highest recent access frequency of the node; the image data owner node list maintains the image content node which has the highest frequency of being accessed recently; the image content node stores content specific content.

Each node operates a local storage server by a local proxy server; the local storage server shares the storage content with other nodes in the network through the data storage and retrieval system platform. And performing content query service on a data storage and retrieval system platform at the bottom layer, providing a retrieval information request and returning a retrieval result of the data storage and retrieval system to the local Agent. All routing is provided through the underlying data storage retrieval system. The content query comprises an image data visitor service, an image data owner service and a content service, and comprises query, insertion and deletion operations. The task of the content query service is to determine whether to forward to the image data owner node or send to the underlying routing mechanism based on the information in the image data visitor node list. And inserting the index information of the corresponding inquiry image data owner node into the corresponding node according to the image data owner node and the image content node information returned by the content inquiry service.

The data storage retrieval system carries the inquired address in the inquiry operation, if the path image content node list of the request node does not have the address similar to the inquiry of the corresponding request content, the inquiry information sent to the data storage retrieval system firstly inquires an image data visitor node, after the image data visitor node is reached, if the corresponding index item is not found, the ID of the image data owner node is calculated, and the inquiry is continuously carried out in the data storage retrieval system from the image data visitor node until the result is found or the image content node is reached. If the image data visitor node list of the request node has a path address similar to the corresponding request content query, the path address is directly forwarded to the corresponding image data owner node, and the query is started from the image data owner node.

According to the block retrieval mode, the block retrieval process can be divided into two steps, wherein firstly, the node issues request content information; and secondly, the data block retrieves the data block according to the request content information and feeds back the request.

The workflow of the node for issuing the request content comprises the following steps:

(1) the image data owner provides a content query request to the file system, the request is sent to a local Agent, and after the request is analyzed, whether the request is to be forwarded to the cloud end is determined according to whether the request can be cached or not;

(2) the local Agent service inquires whether the requested content exists in the local storage service, and if so, the operation goes to (12);

(3) if the requested content is not found in the local storage service, forwarding the request to a content query service;

(4) the content query service checks whether a path address similar to the request content query exists in the image data visitor node list;

(5) if the image data visitor node list has an address similar to the request content query, directly transferring to the corresponding image data owner node;

(6) if the image data visitor node list does not have the query similar to the request content query, the content query service sends the query to a lower-layer data storage and retrieval system module to query the image data owner node of the content query;

(7) the data storage retrieval system finds out the corresponding image data owner node by searching;

(8) the image data visitor node, the image data owner node or the image content node inquires corresponding index tables of the image data visitor node, the image data owner node or the image content node, whether corresponding indexes for inquiring the request content exist or not is confirmed, and if the index for inquiring the request content is not found in the image data visitor node at the moment, the next image data visitor node or the image content node is continuously inquired through the data storage and retrieval system;

(9) the image data visitor node, the image data owner node or the image content node returns the inquired index information to the content inquiry service of the node which initiates the inquiry request; at this time, if the returned result contains the image data visitor node which requests the query, the content query service updates the image data visitor node list.

(10) Forwarding the return information to a local Agent service;

(11) and if the returned data is empty, the local Agent service acquires the content through the cloud end, otherwise, the content is acquired from the corresponding node according to the returned result. Then storing the backup of the content in the local storage service, and issuing corresponding index information to corresponding image data visitor nodes and image data owner nodes;

(12) the content is sent to a file system.

And wherein the block retrieving the requested content and feeding back the information comprises:

and after the content is acquired, the path index information of the content is issued to the corresponding image data visitor node and the image content node. When a node n sends a content query request content, the node firstly checks whether the content needs to be queried and obtained through a lower layer routing mechanism, and if the content needs to be queried, the node checks whether an image data visitor node of the requested content query exists in an image data visitor node list. If the image data visitor node does not request the image data visitor node of the content in the image data visitor node list, at the moment, the visitorID of the content is calculated according to the content query address and sent to the subjacent data storage and retrieval system to query the viewer image content node of the content, through the query, the data storage and retrieval system finds the image data visitor node of the content, then the image data visitor node checks whether the index queried by the content exists in the image data visitor content index table, the image data owner content index table and the content index table, if the index exists, the path image content node returns the address of the node where the requested content is located to the node n, and the node n obtains the content by requesting the proxy node; if the returned proxy node does not have the requested content, the request is sent to the cloud server. Otherwise, if the index does not exist, computing the ownerID and sending the ownerID to the image content node of the viewer, continuing to query from the image data visitor node by the data storage and retrieval system, finding out the image data owner node through query, checking the index table information on the image data owner node, returning a piece of proxy node information to the local node if corresponding index information exists, otherwise continuing to query the next image data owner node, sequentially circulating the storage loop until the index information of the requested content query exists on a certain image data owner node or the image content node of the requested content query is found, and finishing the query. And if the index table of the image content node inquired by the content does not have the index of the requested content, the request is sent to the cloud server. During this query, each image data owner node will return its own address information to the local node n.

If the image data visitor node list has an image data owner node requesting content inquiry, the inquiry is directly forwarded to the image data owner node, the image data owner node checks the index table, if the index exists, a proxy node is returned, otherwise, the next image data owner node is continuously inquired until the image content node is found.

When a node obtains a content, it will issue information to the image data owner node and the image content node issuing the request, informing these nodes that it has the requested content of the content query, and the image data owner node and the image content node update their index tables accordingly. In the query process, if the image data owner node requesting content query is stored in the image data owner node list of the node n, the image data owner node directly forwards the image data owner node to start the query.

In the aspect of face image feature extraction, the invention uses the difference value of the pixel point of the center of the area and the pixel point of the storage ring neighborhood to represent the texture feature value of the pixel point, takes the neighborhood of the pixel point as a texture unit of the image, quantizes the texture unit through binary value to obtain the local texture feature value, obtains the texture feature vector of the description image by counting the texture unit in the image and carrying out normalization operation, and the detailed steps of using the method to carry out feature extraction are as follows:

first, the image is binary-coded. Randomly selecting a region from the collected face image, wherein any pixel point in the region can be described by G (y, z), and the geometric center point can be h_cDescribing the neighborhood pixel point h in the 3 x 3 window₀To h₇The binary conversion processing is performed as follows:

h_d＝t(h₀-h_c)，…t(h₇-h_c)；

wherein

q is set to describe K feature types, Q ∈ (0, 1, 2, …, K-1). Dividing the collected face image into n × p blocks, and counting the occurrence times of each mode in each block, that is, counting the feature types in the sub-region of each block of face image to obtain a face image feature component U ═ U (U) composed of n × p histograms₁,U₂,…U_n×p). Wherein,

a binary pattern histogram for describing the jth sub-region.

A face image feature histogram is built according to the method set forth above, thereby providing a data basis for face image retrieval.

In order to reduce the noise influence in the image processing process, the invention carries out denoising based on the median filtering human eye visual characteristics. First, a noise point is determined, the size of the image R is set to m × n, and a window of 3 × 3 is adopted to slide on the image.

w_i，j＝{g(i+k,j+r)|k,r＝(1,0,-1)}

calculating the average value of pixels in the window

Finding out the maximum gray value and the minimum gray value of the image R, and respectively marking as I_max(m×n)、I_min(m × n). The threshold value for marking the central pixel point is H_i,j。

if | g (i, j) -w_m|＞H_i,jThen the pixel point is a noise point.

If | g (I, j) | ═ I_max(m.times.n) or I_min(m × n), the pixel is a noise point.

For the above conditions, the present invention determines the threshold H according to the noise sensitivity factor λ_i,jThe size of (2). Defining the noise sensitivity coefficient lambda of the central pixel point g (i, j) of the window as

At this time, whether the pixel point is a noise point is judged, and only the calculated noise sensitivity coefficient lambda is needed_i,jIf | g (i, j) -w_m|＞λ_i,jThe condition is satisfied.

After dividing image pixel points into two types of noise points and non-noise points, smoothing the image g (i, j) by using a NURBS function, wherein the image can be regarded as uniform sampling of a curved surface, is a result of discrete convolution of an original image and k-times spline and l-times spline function, and is described as follows:

wherein, B_k(x-i) and B_lAnd (y-i) are k-order splines and l-order spline convolution templates of NURBS respectively, if g (i, j) is a noise point, a 3 multiplied by 3 filtering window is taken, filtered values are obtained, and then median filtering is carried out, so that a final value is obtained.

If i takes [0,255], the resolution function is defined as follows:

F_r(i)＝N(i)/max[N(i)]

the membership function defining the target area is:

where f (i) is a monotonically increasing function, and satisfies the conditions f (a) 0 and f (b) 1. When the gray value is in the interval of [0, a ], the pixel point belongs to a background area, when the gray value is in the interval of [ b, 255], the pixel point belongs to a target area, and when the gray value is in the interval of [ a, b ], the pixel point needs to be further represented by a fuzzy function.

The features of the face image are extracted according to the method, and a human eye visual perception model is established according to the features, so that face image retrieval is realized.

The invention adopts a characteristic operator to carry out the characteristic operation in the neighborhood of the characteristic pointThe peak value of the pixel gradient direction histogram is taken as the main direction of the feature point, and the coordinate axis is rotated as the main direction of the feature point. Calculating two vector histograms H_i(x) And H_j(x) Similarity of (2):

wherein, | | H_iI and H_jAnd | | represents the length of the histogram feature vector.

And then detecting abnormal characteristic point pairs by combining the scale direction, and finally discarding the abnormal pairs by using random sampling consistency. The whole process is to fit an image transformation matrix through the sample data set. Initial sample data n ═ min { n ═ n₀,max{n_s,n_slog₂μn₀}}。n₀Is the number of matched feature points, n, determined according to a K-nearest neighbor algorithm_sTo discard the number of matching feature points before the outlier feature point pair, μ is the tuning parameter. Of the original image (x)₁，y₁) With the target image (x)₂，y₂) The transformation relationship is as follows:

is a transformation matrix with 8 parameters, at least four characteristic feature point pairs are needed to obtain the matrix parameters, the preferred embodiment of the invention adopts a weighted least square method to solve the matrix parameters, and the method is set

K＝[k₁k₂k₃k₄k₅k₆k₇k₈]

L＝-[x₂y₂]^T/μ

The transformation is then:

K＝-[G^TG]^-1G^TL

and (3) firstly setting the initial value of mu to be 1 and obtaining the initial value of K, and then continuously carrying out iterative computation on mu to finally obtain stable K. The specific algorithm is as follows:

(1) randomly extracting matching characteristic point pairs of different planes, and calculating a transformation matrix K of the point pairs;

(2) for the matching point pair (x, y) to be detected, if the condition | K · x-y | < epsilon, epsilon is a tolerance value, the point is an interior point. If the number of the inner points is larger than the set threshold value t, recalculating the matrix K by an iterative weighted least square method, updating the number of the inner points, and if the number of the inner points is smaller than t, returning to the step (1);

(3) and if the maximum number of the inner point sets is determined and is greater than t after W iterations, calculating a transformation matrix K according to the combination of the inner points.

In a further aspect, the invention provides a condition that the threshold is satisfied, and the processing is carried out through the judgment condition to prevent the misjudgment of the noise point. Assuming that coordinates (x, y) and (x, y) represent feature operators of the source image and the target image, respectively, the feature operator of each feature point pair can be obtained in the following manner.

Δx＝x-s_m(x*·cos(Δθ_m)-y*·sin(Δθ_m))

Δy＝y-s_m(x*·cos(Δθ_m)-y*·sin(Δθ_m))

Where Δ x and Δ y represent histogram representations of the extracted features. Four-item(s)_m，Δθ_m，Δx_m，Δy_m) And representing the transformation approximation of the deletion abnormal feature pair, wherein the following conditions are satisfied:

|Δx-Δx_m|>Δx_t；|Δy-Δy_m|>Δy_t

according to the width of the histogram, Δ x_tAnd Δ y_tRepresenting the threshold values for the horizontal and vertical differences of the histogram, respectively.

After the specific points are matched through the image transformation, the decision tree is applied to the positioning of the human face characteristic points. Firstly, the positioning of the face feature points is trained by adopting shape index pixel gray scale features. Two coordinate points are randomly sampled in a local coordinate system established by two reference points, a pixel gray difference value between the two points is made, then a random offset is added to the middle point of the two reference points to generate a characteristic point, and the pixel gray difference value of the two characteristic points is used as a characteristic.

In training the decision tree, the inputs of the tree are the face image I, the shape S composed of the coordinates of the corresponding reference points, and the true shape S' of the reference points, and the output is the predicted offset Δ S of the reference points. Training of the decision tree first determines the split to non-leaf nodes in the tree. And I (p, delta x, delta y) represents the gray value of a (delta x, delta y) pixel point in a local coordinate system established by taking the p-th reference point as an origin in the obtained picture after the shape of the current reference point is subjected to similarity transformation. Setting a segmentation threshold for a current node

Its value range is [ -255, 255 [)]. In a local coordinate system established by taking the reference point p as an origin, the difference of the gray value characteristics of the shape index pixels of two points and a threshold value are taken

Comparing if it is less than the threshold

The training samples are divided into left child nodes, otherwise into right child nodes.

Optimal characteristic function f₀And an optimum threshold value

The selection of (c) can be described by the following formula:

wherein Δ S_LIs composed of

The Δ S portion of time; delta S_RIs composed of

The Δ S portion of time; f (I) ═ I (p, Δ x)₁，Δy₁)-I(p，Δx₂，Δy₂)；

Var(ΔS_L) Variance value, Var (Δ S), representing the offset of the corresponding p-th reference point in the left child node_R) Representing the variance value of the offset of the corresponding pth reference point in the right child node.

For each non-leaf node, selecting a feature function f to extract shape index features of all samples corresponding to the node, and then selecting a threshold

The shape index features are divided, and the training sample (I, S, S') of the current node is divided into a left child node part and a right child node part (I)_L,S_L,S’_L) And (I)_R,S_R,S’_R)。

Each internal node of the decision tree is trained in the manner described above, such that the decision trees trained for each keypoint are combined into a decision forest. And outputting information of samples contained in leaf nodes in the decision tree to be represented as a binary feature vector, and connecting the binary features of all the decision trees in the decision forest back and forth to form a one-dimensional feature vector. The corresponding feature mapping of a face image in the t-th decision forest is represented by the following formula:

δ^t＝{δ^t_i}i＝1,…L

wherein t represents the number of layers where the decision tree is located, and L represents the number of reference points in the face shape. Delta^t_iThe feature vector is formed by connecting binary features extracted from all decision trees corresponding to the ith reference point in series and is called as local binary feature. By extracting delta corresponding to each reference point in the face^t_iAfter characterization, all δ are added^t_iAre connected in series into a final binary feature vector to represent the feature mapping relation delta of the human face^t。

Randomly extracting images from all face images as a training sample set, and taking the rest images as a test sample set; and respectively extracting SIFT features and DCT features from all the training images, wherein the SIFT features comprise SIFT phase features and SIFT amplitude features.

The face image vector is mapped into a high-dimensional feature space F through a nonlinear function phi, and then principal component analysis transformation is carried out in the high-dimensional feature space F. When principal component analysis transformation is carried out, a nonlinear function E meeting kernel conditions is introduced to replace inner product operation of vectors, namely E (x)_i,x_j)＝Φ(x_i)·Φ(x_j). The process of principal component analysis is:

training sample face vector x with m dimensions₁,x₂,...,x_tMapping to a high-dimensional feature space F by using a nonlinear function phi to obtain phi (x)₁),Φ(x₂),...,Φ(x_t)；

In F for phi (x)_i) And (6) carrying out transformation. Solving a characteristic equation l lambda^Φα ═ K α, where K ═ E (x)_i,x_j))_l×iSo as to obtain a feature vector as:

corresponding characteristic value is lambda^Φ₁，λ^Φ₂，…λ^Φ_l(ii) a Taking the first M eigenvalues in the eigenvalues and the corresponding eigenvectors to obtain an eigen matrix M^Φ＝(D^Φ)^1/2(V^Φ)^TWherein:

D^Φ＝diag(λ^Φ₁，λ^Φ₂，…λ^Φ_m)

V^Φ＝(v₁,v₂,...,v_m)

so the training samples are transformed in space F to:

determining a corresponding separation matrix W^Φ；

Mapping any test sample y to a space F to form phi (y), and extracting a feature vector of the test sample y

After the principal component analysis process is completed, obtaining a kernel independent feature vector and a feature subspace; performing feature fusion on the kernel independent features to obtain a one-dimensional feature vector, and finally obtaining all feature vectors of the training sample set; training an SVM model by using the obtained feature vector;

after obtaining the feature vectors of all the test sample sets by adopting the same method, projecting the feature vectors of the test sample sets to subspaces of the test sample sets respectively to obtain kernel independent feature vectors of the test sample sets;

and (4) using the kernel independent feature vector in a trained SVM model for classification test to obtain a primary recognition result of the face image.

The invention preferably further divides the training sample into overlapped blocks, respectively calculates the discrimination rate of each block, then selects the block construction template with higher discrimination rate, filters the training sample, constructs a new dictionary from the filtered training sample, and finally classifies by sparse representation.

Given a set of n samples containing C classes a ═ a₁*,A₂*,…A_n*]。A_iDenotes the ith image matrix. Each training image is divided into k overlapped blocks, and the block matrix of each image is converted into a vector, namely A_i*＝[a_i,1,a_i,2,…a_i,k]. Denote the whole training dictionary set a as a ═ a₁,A₂,…A_n]Wherein A is_iThe ith module vector representing all images.

For each module set A_iBy using

Which represents the corresponding mean vector of the image,

ith module vector a representing all images in class c_c,iC is [1, C ]]. Then module A_iThe discrimination ratios of (1) are as follows:

and sorting the module discrimination rates from high to low, and only keeping the first h modules to construct a template T. The template is used to filter test and training sample images. Filtered training set fA ═ fA₁,fa₂,...,fa_h]In which fa is_iIs the filtered image A_iThe vector of x represents, and h represents the number of templates included in the template.

To further reduce the amount of computation, principal components are extracted by principal component analysis on fA, and a projection matrix P is constructed, then the dimensions of the training image and the test sample y can be further reduced to:

f_pA＝P'fA

f_py＝P'fy

f_py may be represented as f_pLinear combination of A

f_py＝f_pA·X

And X is a sparse matrix, and the test sample is classified into a class corresponding to the minimum reconstruction residual according to the class residual:

wherein

Is a selection function. | () | laces₂Is 1₂And (5) norm constraint.

In summary, the present invention provides a method for processing mass data, which is helpful for improving the accuracy of face recognition under the conditions of face occlusion, low sample quantity and quality, and information loss, and simultaneously reducing the operation time of the recognition.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented in a general purpose computing system, centralized on a single computing system, or distributed across a network of computing systems, and optionally implemented in program code that is executable by the computing system, such that the program code is stored in a storage system and executed by the computing system. Thus, the present invention is not limited to any specific combination of hardware and software.

It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims

1. A mass data processing method is characterized by comprising the following steps:

(2) establishing a human eye visual perception model according to the extracted human face image characteristics, and thus, retrieving the human face image;

before the face image retrieval in the step (2), dividing address information of a storage data block in advance according to three categories of an image data visitor, an image data owner and content; the database of the image data visitor records a list of all image data owners owned by the image data visitor; the database of the image data owner maintains all contents in the image data owner server; image contentThe service is responsible for storing, deleting and searching image content on the node device; the path of each content is composed of the HASH value of the image content name and the operation time; dividing the keyword identifier into 3 parts; the image data visitor k is respectively arranged from high to low bits of the keyword identifier₁Owner k of image data₂Content of image k₃ID of keyword of image content query value according to k₁k₂k₃Are connected in sequence; the absolute value thereof is: i k₁|，|k₂|，|k₃I is calculated according to the following sequence, firstly, the HASH value of the first content query value is calculated according to the sequence of the image content query feature, and | k is taken according to the sequence from high order to low order₁| position as k₁Then calculating the HASH value of the second content query value to take | k₂| position as k₂Then HASH is calculated for the rest of the content query value and k is taken₃| position as k₃A value of (d);

distributing different ChunkIDs for each data block, and storing each image block and copy on each data node; obtaining a user block by accessing ChunkID mapped by OwnerID of an image data owner and giving an independent user block space to the image data owner; the user block of the owner of the image data only allows the owner of the image data corresponding to OwneriD to have the authority to access all the user blocks; when image data block is stored, dividing the ChunkID into s according to the key word division₁、s₂、s₃Arranged on the routing memory ring from high to low 3 parts, s₁Is a field of picture data visitor addresses, s₂Is the image data owner address field, s₃Is an image content address field; and through data block address segmentation, after an image data owner determines the position of an image data visitor or the image data owner where the data block is, the specific position of the image data block is determined.

2. The method of claim 1, wherein the texture feature value of the pixel is characterized by a difference between a region-centered pixel and a storage-ring-neighborhood pixel.

3. The method according to claim 1, wherein in the facial image feature extraction, an image takes a pixel neighborhood as a texture unit, the texture unit is quantized through a binary value to obtain a local texture feature value, and a texture feature vector describing the image is obtained by counting the texture unit in the image and performing normalization operation.