Preferably, in step (2), the basic model parameters of the xgboost model are as follows: a tree model is used as the base classifier, the number of iterations (that is, the number of trees) is set to 500, and the maximum tree depth is set to 12.

Preferably, in step (2), the training parameters of the xgboost model are as follows: the objective function is a logistic objective function for classifying blood donors, the model evaluation function is set to the binary classification error rate, and the learning rate is 0.3. To reduce the influence of overfitting when training each tree, the sampling rate of the data set is set to 90%, and the proportion of selected features is also 90%.
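The preferred parameters of step (2) map naturally onto the parameter names of the Python xgboost library. The sketch below is an assumed mapping, not a configuration taken from the original experiments. It reads "selected feature proportion" as per-tree feature sampling (`colsample_bytree`) and the logistic objective as `binary:logistic`. The names `PARAMS` and `NUM_TREES` are hypothetical.

```python
# Hypothetical parameter set mirroring the preferred embodiment.
# Keys are native xgboost parameter names accepted by xgboost.train().
NUM_TREES = 500  # number of boosting iterations, i.e. number of trees

PARAMS = {
    "booster": "gbtree",             # tree model as the base classifier
    "max_depth": 12,                 # maximum depth of each tree
    "objective": "binary:logistic",  # logistic objective for two classes
    "eval_metric": "error",          # binary classification error rate
    "eta": 0.3,                      # learning rate
    "subsample": 0.9,                # 90% of rows sampled for each tree
    # Assumption: "selected feature proportion" = per-tree column sampling.
    "colsample_bytree": 0.9,
}

if __name__ == "__main__":
    # With xgboost installed, training would look like:
    #   import xgboost as xgb
    #   dtrain = xgb.DMatrix(X_train, label=y_train)
    #   booster = xgb.train(PARAMS, dtrain, num_boost_round=NUM_TREES)
    print(PARAMS, NUM_TREES)
```

With the scikit-learn wrapper, roughly the same settings would be written as `xgboost.XGBClassifier(n_estimators=500, max_depth=12, learning_rate=0.3, subsample=0.9, colsample_bytree=0.9, objective="binary:logistic")`.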

Preferably, in step (3), the training steps are as follows:

(3.1) using past short message recruitment information to track the persons who received the short messages: if a person has a blood donation record within seven days, the person is determined to be an effective blood donor; otherwise, the person is determined to be an ineffective blood donor;

(3.2) constructing the features of the collected effective and ineffective blood donors as of the time the short messages were sent, to obtain the training data of the model;

(3.3) training the xgboost model with the training data as the model input and whether the person is an effective blood donor as the expected output.
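Step (3.1) amounts to a labeling rule over the short message send date and the person's later donation dates. The sketch below is minimal and assumes the seven-day window is counted inclusively from the send date. The text does not say whether the boundary day counts. The function name `label_recipient` is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable

# Response window from the text: a donation within seven days of the SMS.
WINDOW = timedelta(days=7)


def label_recipient(sms_date: date, donation_dates: Iterable[date]) -> int:
    """Return 1 (effective donor) if any donation falls in the window
    [sms_date, sms_date + 7 days], otherwise 0 (ineffective donor).

    Assumption: both window boundaries are inclusive.
    """
    return int(any(sms_date <= d <= sms_date + WINDOW for d in donation_dates))
```

For example, `label_recipient(date(2019, 3, 1), [date(2019, 3, 5)])` returns 1, while a donation on 2019-03-09 would give 0.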

Preferably, step (4) comprises the following sub-steps:

(4.1) collecting blood donors whose donation interval complies with the regulations (the interval between a whole blood or single red blood cell donation and the next whole blood donation is not less than 6 months, and the interval between a single platelet, plasma, or single granulocyte donation and the next whole blood donation is not less than 4 weeks) and whose age complies with the regulations (healthy citizens aged eighteen to fifty-five, or repeat blood donors not older than sixty who have had no blood donation reaction and meet the health examination requirements);

(4.2) obtaining the feature set of each subject from the blood donor feature library; and

(4.3) inputting the basic attributes of each blood donor into the trained xgboost model to obtain the expected output of being an effective blood donor, sorting by the output, and recommending the highest-ranked persons.
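The eligibility screening of sub-step (4.1) can be sketched as a filter over donor records. This is an illustration under several assumptions:

- The `Donor` fields are hypothetical.
- Intervals are checked against a planned whole blood donation.
- "6 months" is treated as six calendar months.
- Ages are whole years.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Donor:  # hypothetical record layout
    donor_id: str
    age: int
    last_donation: Optional[date]  # most recent donation, None if unknown
    last_type: Optional[str]       # "whole_blood", "red_cells", "platelets", ...
    repeat_donor: bool             # has donated more than once
    had_reaction: bool             # any past blood donation reaction
    passed_health_check: bool


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    y, m = divmod(d.month - 1 + months, 12)
    year, month = d.year + y, m + 1
    first_of_next = date(year + month // 12, month % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return date(year, month, min(d.day, last_day))


def age_ok(d: Donor) -> bool:
    if d.age < 18 or d.age > 60:
        return False
    if d.age <= 55:
        return True
    # Ages 56-60: only reaction-free repeat donors who pass the health check.
    return d.repeat_donor and not d.had_reaction and d.passed_health_check


def interval_ok(d: Donor, today: date) -> bool:
    """Check the interval before a planned whole blood donation (assumption)."""
    if d.last_donation is None:
        return True
    if d.last_type in ("whole_blood", "red_cells"):
        earliest = add_months(d.last_donation, 6)
    else:  # platelets, plasma, granulocytes
        earliest = d.last_donation + timedelta(weeks=4)
    return today >= earliest


def eligible(donors: List[Donor], today: date) -> List[Donor]:
    return [d for d in donors if age_ok(d) and interval_ok(d, today)]
```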

Compared with the prior art, the invention uses the blood center's annual blood donation data to construct a basic attribute library of blood donors, obtains training data from past short message recruitment data, and trains an xgboost model. After training, when new recruitment short messages need to be sent, the eligible blood donors are screened and the xgboost model produces reference recognition results. These results are ranked, and recruitment subjects are selected according to the required number of people. Because it is implemented with an xgboost model, the method effectively mines the importance of each attribute and learns the internal relationships among the attributes of blood donation subjects. Experimental results show that the classification precision of the method exceeds 80%, which is clearly superior to the prior art.

Drawings

Fig. 1 is a flow chart of the conventional procedure by which a blood center selects suitable recruitment subjects;

Note: 1. Age regulation: healthy citizens aged eighteen to fifty-five, or repeat blood donors not older than sixty who have had no blood donation reaction and meet the health examination requirements;

2. Blood donation interval regulation: the interval between a whole blood or single red blood cell donation and the next whole blood donation is not less than 6 months, and the interval between a single platelet, plasma, or single granulocyte donation and the next whole blood donation is not less than 4 weeks;

Fig. 2 is a schematic diagram of the xgboost model used in the present invention.

Detailed Description

The technical contents of the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.

The invention provides a method for improving the precision of short message recruitment of blood donors by means of machine learning. It is described in detail below.

A short message recruitment method commonly used by blood centers is shown in Fig. 1. Blood donors who satisfy the specified donation interval and age requirements are selected, mainly from those who donated in the past year. Subjects who have recently taken part in blood donation activities have shown a recent willingness to donate and are reasonable recruitment targets. The time of the last donation is therefore an effective feature for predicting whether a person will participate. According to the blood center's past recruitment experience, occupation and education level are also important factors in whether a person donates blood. Table 1 shows the blood donation record of one blood donor. It contains basic characteristics of the individual, such as age, sex, residence, and blood donation reaction, and this basic information also influences whether the individual donates. After research, the inventors found that a donor's willingness does not depend on a single donation. It is often related to the record of multiple donations, such as the number of donations, the total donation volume, and the donation frequency. Combining experience with experimental study, a feature library with 13 items was designed: age, sex, blood type, recent donation volume, total donation volume, number of donations, donation interval, donation frequency, whether the blood test was qualified, education level, living condition, occupation, and blood donation reaction. The donation frequency is calculated by formula (1).
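Several of the 13 features are aggregates over a donor's donation history. The sketch below assumes a hypothetical `Donation` record and computes the features only from donations made before a reference date. For training, that date would be the short message send date, so the seven-day response window cannot leak into the inputs. Reading "recent donation volume" as the volume of the last donation and "donation interval" as days since the last donation are both assumptions. Donation frequency is left out because formula (1) is not reproduced in this text.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Donation:  # hypothetical record layout
    day: date
    volume_ml: int


def history_features(donations: List[Donation], as_of: date) -> dict:
    """History features using only donations strictly before `as_of`.

    Donation frequency is omitted: it uses formula (1), not reproduced here.
    """
    past = sorted((d for d in donations if d.day < as_of), key=lambda d: d.day)
    if not past:
        return {"recent_volume": 0, "total_volume": 0,
                "donation_count": 0, "days_since_last": None}
    return {
        "recent_volume": past[-1].volume_ml,           # last donation's volume
        "total_volume": sum(d.volume_ml for d in past),
        "donation_count": len(past),
        "days_since_last": (as_of - past[-1].day).days,
    }
```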

TABLE 1 Blood donation record of a blood donor

Blood donation records often contain missing values, such as a missing education level or occupation. These are replaced by default values. A missing education level is replaced by "junior high school and below". A missing occupation is replaced by "other", mainly because occupations are not classified in the blood donation system. A missing living status is replaced by "temporary residence".
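The default-value imputation rule can be written as a small lookup. The field names and category strings below are hypothetical placeholders for whatever codes the blood donation system actually uses.

```python
# Default replacements described in the text; the strings are hypothetical codes.
DEFAULTS = {
    "education": "junior_high_or_below",
    "occupation": "other",
    "living_status": "temporary_residence",
}


def fill_missing(record: dict) -> dict:
    """Return a copy of `record` with missing (None or empty) fields defaulted."""
    filled = dict(record)  # leave the caller's record unchanged
    for field, default in DEFAULTS.items():
        if filled.get(field) in (None, ""):
            filled[field] = default
    return filled
```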

In the embodiment of the present invention, the xgboost model shown in Fig. 2 is used. It takes a decision tree model as the base classifier and uses 500 trees. The model input is the feature data of a blood donor. These data are first classified by the first tree and finally fall into one leaf node, whose value is the output of the first tree. The same operation is then performed with the second tree, and so on. The outputs of all trees are added, and the sum is passed through the logistic function to produce the expected output of being an effective blood donor. To avoid overfitting, 90% of the data set is sampled when fitting each tree.

In one embodiment of the present invention, the input to the xgboost model is assumed to be the feature vector of a blood donor

$$x = (x_1, x_2, \ldots, x_{13}).$$

According to the xgboost model, using the first tree we obtain:

$$f_1(x) = w^{(1)}_{q_1(x)},$$

wherein $q_1(x)$ is the leaf node of the first tree into which the model input $x$ falls, and $w^{(1)}_{q_1(x)}$ is the weight of that leaf node. The second tree performs the same operation as the first tree, and so on by analogy, giving

$$f_2(x) = w^{(2)}_{q_2(x)}, \quad \ldots, \quad f_K(x) = w^{(K)}_{q_K(x)},$$

where $K = 500$ is the number of trees. Adding the leaf weights produced by all trees gives the final predicted target value

$$\hat{y} = \sum_{k=1}^{K} f_k(x).$$

The final goal of recruitment is the expectation that the donor is an effective blood donor. The output is therefore transformed by the logistic function:

$$p = \frac{1}{1 + e^{-\hat{y}}}.$$
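The additive prediction procedure described above can be illustrated with a toy model in plain Python, without the xgboost library. The stumps are hand-made stand-ins for learned trees. `make_stump` and `predict_proba` are hypothetical names. The sketch also omits the `base_score` offset that xgboost adds to the summed leaf weights.

```python
import math
from typing import Callable, List, Sequence

# A "tree" maps a feature vector to the weight of the leaf it falls into,
# i.e. f_k(x) = w_{q_k(x)}. Real xgboost trees are learned; these are toys.
Tree = Callable[[Sequence[float]], float]


def make_stump(feature: int, threshold: float, left: float, right: float) -> Tree:
    """Depth-1 toy tree: left leaf if x[feature] < threshold, else right leaf."""
    return lambda x: left if x[feature] < threshold else right


def predict_proba(trees: List[Tree], x: Sequence[float]) -> float:
    raw = sum(tree(x) for tree in trees)  # y_hat = sum of leaf weights
    return 1.0 / (1.0 + math.exp(-raw))   # logistic transform to a probability
```

For example, with `trees = [make_stump(0, 10, -0.4, 0.4), make_stump(1, 3, 0.4, -0.4)]`, the input `[5, 1]` collects leaf weights -0.4 and 0.4. Their sum is 0, so the predicted probability is 0.5.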

Before the xgboost model is used for recruitment, it must be learned iteratively, i.e., each tree is generated in turn. Generating a tree mainly consists of selecting the best split features and the leaf weights. The model determines the split features and leaf node weights of each tree from the loss function. The calculation depends only on the first and second derivatives of the loss function, so the currently optimal split features and weights can be obtained directly and computation is fast. Training can be implemented with the third-party Python library xgboost. It provides default loss functions and also allows custom loss functions to be defined; in this case, the default loss function performed better. Note that the hyperparameters used in the experiment, such as the number of decision trees and the learning rate, were the values that performed best on the validation set.
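The statement that split features and leaf weights depend only on the first and second derivatives of the loss corresponds to the standard XGBoost closed forms. For a leaf with gradient sum G and Hessian sum H, the optimal weight is w* = -G/(H + λ). The gain of a split is half the difference of the G²/(H + λ) scores, minus γ. The sketch below applies these to the logistic loss. It assumes xgboost's default regularization values (λ = 1, γ = 0), and the function names are hypothetical.

```python
import math


def logistic_grad_hess(labels, raw_scores):
    """First and second derivatives of logistic loss w.r.t. the raw score."""
    g, h = [], []
    for y, s in zip(labels, raw_scores):
        p = 1.0 / (1.0 + math.exp(-s))
        g.append(p - y)          # gradient
        h.append(p * (1.0 - p))  # Hessian
    return g, h


def leaf_weight(G, H, lam=1.0):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Loss reduction from splitting a node into left and right children."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma
```

For example, with labels `[1, 1, 0]` and initial raw scores of 0, each sample has p = 0.5. The gradients are `[-0.5, -0.5, 0.5]` and each Hessian is 0.25. A single leaf would get weight 0.5 / 1.75. Splitting the two positives from the negative has positive gain, so the split would be kept.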

In the invention, model training is the process of determining how the tree models in the formulas are split and what their leaf node weights are. The specific training steps are as follows:

(1) Using past short message recruitment information, track each person who received a short message from the date it was sent. If the person has a blood donation record within seven days, i.e., took part in blood donation, the person is considered an effective blood donor; otherwise, the person is an ineffective blood donor.

(2) Build the blood donor feature library for the collected effective and ineffective blood donors as of the time the short messages were sent, and select the corresponding features to obtain the data set of the model.

(3) Train the xgboost model to determine the optimal split features and leaf node weights of the tree model. The training data from step (2) are the model input, a predefined loss function is used, and whether the person is an effective blood donor is the expected output.

In the embodiment of the invention, long-term short message recruitment records show a total of 95476 short message recruitments carried out from 2016 to 2019. These comprise 56026 effective blood donor records and 39450 ineffective blood donor records. To balance the two classes, a number of effective records equal to the number of ineffective records is selected to construct the data set. The data set is then divided into a training set, a validation set, and a test set. These are used to learn the model's hyperparameters and parameters and to determine the optimal tree model parameters and split features.
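The class balancing and three-way split in this embodiment can be sketched as follows. The text states neither how the effective records were chosen nor the split proportions. Random undersampling, the 70/15/15 split, and the helper name `balance_and_split` are therefore all assumptions.

```python
import random


def balance_and_split(positives, negatives, ratios=(0.7, 0.15, 0.15), seed=0):
    """Undersample positives to len(negatives), then split into train/val/test.

    Assumes positives outnumber negatives, as in the reported data.
    Returns three lists of (record, label) pairs; label 1 = effective donor.
    """
    rng = random.Random(seed)
    pos = rng.sample(positives, len(negatives))  # random undersampling (assumed)
    data = [(x, 1) for x in pos] + [(x, 0) for x in negatives]
    rng.shuffle(data)
    n_train = int(len(data) * ratios[0])
    n_val = int(len(data) * ratios[1])
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]
```

With the reported counts (56026 effective, 39450 ineffective), the balanced data set contains 2 × 39450 = 78900 records before splitting.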

After training, the model can report the importance of each feature for classification. Table 2 gives the four most important features and their importance values. Once training is complete, the model can be used to identify effective blood donors.
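In the Python xgboost package, a trained `Booster` reports feature importance through `Booster.get_score(importance_type=...)`, which returns a dictionary from feature name to score. The helper below is a minimal sketch for extracting a top-four list like Table 2. The helper name and the normalization of scores to shares of the total are assumptions.

```python
from typing import Dict, List, Tuple


def top_features(importance: Dict[str, float], k: int = 4) -> List[Tuple[str, float]]:
    """Return the k highest-scoring features, scores normalized to sum to 1."""
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score / total) for name, score in ranked[:k]]
```

A typical call would be `top_features(booster.get_score(importance_type="gain"))`, where `booster` is a trained model.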

TABLE 2 Top four model features by importance

Next, the specific steps for using the trained xgboost model to recommend blood donation subjects for recruitment are introduced:

(1) Collect blood donors whose age and donation interval comply with the relevant blood donation regulations.

(2) Obtain the features of each subject from the blood donor basic feature library.

(3) Input the basic attributes of each blood donor into the trained xgboost model to obtain the expected output of being an effective blood donor, sort by the output, and recommend the highest-ranked persons.
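Step (3) of the recommendation procedure reduces to sorting donors by predicted probability and keeping the required number. The sketch below assumes the scores are the probabilities returned by a model trained with a logistic objective (for example, from `Booster.predict` on an `xgboost.DMatrix`). The name `recommend` is hypothetical.

```python
from typing import Dict, List


def recommend(scores: Dict[str, float], n: int) -> List[str]:
    """Return the IDs of the n donors with the highest predicted probability."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [donor_id for donor_id, _ in ranked[:n]]
```

For example, `recommend({"a": 0.2, "b": 0.9, "c": 0.6}, 2)` selects `["b", "c"]`.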

Compared with the prior art, the invention uses the blood center's annual blood donation data to construct the blood donor feature library, obtains training data from past short message recruitment data, and trains the xgboost model. After training, when new recruitment short messages need to be sent, the eligible blood donors are screened and the xgboost model produces reference recognition results. These results are ranked, and recruitment subjects are selected according to the required number of people. Because it is implemented with an xgboost model, the method effectively mines the importance of each attribute and learns the internal relationships among the attributes of blood donation subjects. Experimental results show that the classification precision of the method exceeds 80%, which is clearly superior to the prior art. The method improves the accuracy of donation recruitment, thereby saving manpower and material resources and improving the efficiency and quality of recruitment.

The machine learning-based blood donor identification and recruitment method provided by the invention has been described in detail above. It will be apparent to those skilled in the art that any obvious modification made to it without departing from the spirit of the invention infringes the patent right of the invention and bears the corresponding legal liability.

Claims

1. A machine learning-based blood donor identification and recruitment method, characterized by comprising the following steps:

(3) acquiring data on effective/ineffective blood donors from past short messages, and training the xgboost model;

2. The machine learning-based blood donor identification and recruitment method of claim 1, wherein:

in step (1), the designed and calculated features comprise age, sex, blood type, recent donation volume, total donation volume, number of donations, donation interval, donation frequency, whether the blood test was qualified, education level, living condition, occupation, and blood donation reaction, wherein the donation frequency is calculated using formula (1):

3. The machine learning-based blood donor identification and recruitment method of claim 1, wherein: when some attribute values are missing, default values are used to fill them in.

4. The machine learning-based blood donor identification and recruitment method of claim 1, wherein:

in step (2), the basic model parameters of the xgboost model use a tree model as the base classifier, the number of iterations is set to 500, and the maximum tree depth is set to 12.

5. The machine learning-based blood donor identification and recruitment method of claim 1, wherein:

in step (2), the training parameters of the xgboost model are as follows: the objective function of the model is a logistic objective function, the model evaluation function is set to the binary classification error rate, and the learning rate is 0.3; to reduce the influence of overfitting when training each tree, the sampling rate of the data set is set to 90% and the proportion of selected features is also 90%.

6. The machine learning-based blood donor identification and recruitment method of claim 1, wherein:

in step (3), the training steps are as follows:

(3.1) using past short message recruitment information to track the persons who received the short messages: if a person has a blood donation record within seven days, the person is considered an effective blood donor; otherwise, the person is an ineffective blood donor;

(3.2) obtaining training data of the model from the collected features of the effective and ineffective blood donors at the time the short messages were sent;

7. The machine learning-based blood donor identification and recruitment method of claim 1, wherein: step (4) comprises the following sub-steps:

(4.1) collecting blood donors who satisfy both of the following conditions: (a) their donation interval complies with the regulations, namely the interval between a whole blood or single red blood cell donation and the next whole blood donation is not less than 6 months, and the interval between a single platelet, plasma, or single granulocyte donation and the next whole blood donation is not less than 4 weeks; and (b) they are either healthy citizens aged eighteen to fifty-five, or repeat blood donors not older than sixty who have had no blood donation reaction and meet the health examination requirements;