where acu denotes the average accuracy of the classifier; t_i denotes the i-th statistical period; c_i denotes the number of correct classifications in that statistical period, so that c_i/t_i is the accuracy for that period; and p denotes the number of statistical periods.

f. Perform a least-squares linear fit on c_1/t_1, c_2/t_2, ..., c_p/t_p. The slope of the fitted straight line is the accuracy change trend trd of the classifier.

The calculation formula for trd is:

y = [c₁/t₁, c₂/t₂, ..., c_p/t_p]^T
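The slope of the fitted line can be illustrated with a minimal sketch. It assumes the statistical periods are equally spaced and indexed 1..p on the horizontal axis, which the text does not state explicitly; the function name `accuracy_trend` and the sample values are hypothetical.

```python
from typing import Sequence


def accuracy_trend(accuracies: Sequence[float]) -> float:
    """Least-squares slope of the line through (i, accuracies[i-1]), i = 1..p."""
    p = len(accuracies)
    if p < 2:
        raise ValueError("at least two statistical periods are needed")
    xs = range(1, p + 1)  # assumption: periods are indexed 1..p
    x_mean = sum(xs) / p
    y_mean = sum(accuracies) / p
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, accuracies))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# y = [c_1/t_1, ..., c_p/t_p] with hypothetical per-period accuracies
y = [0.80, 0.82, 0.84, 0.86]
trd = accuracy_trend(y)  # positive slope: accuracy is improving
```

A trd greater than or equal to zero means the per-period accuracy is flat or rising; a negative trd means it is falling.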

For trigger detection and the triggering process, refer to the automatic enablement evaluation flow of the classifier shown in Fig. 3.

After an evaluation period ends, if the average accuracy of the classifier is greater than its average accuracy lower limit, the lowest accuracy lower limit difference is smaller than its maximum value, the accuracy fluctuation level is within its upper limit, and the accuracy change trend is greater than or equal to zero, automatic classification of the classifier is enabled and the evaluation period is set to 1.2 times the previous evaluation period, up to the preset evaluation period upper limit (for example, the upper limit may be 2 times the evaluation period).

If the average accuracy of the classifier is greater than its average accuracy lower limit, the lowest accuracy lower limit difference is smaller than its maximum value, and the accuracy fluctuation level is within its upper limit, but the accuracy change trend is less than zero, the classifier remains disabled and the evaluation period is set to 0.8 times the previous evaluation period, down to the preset lower limit (for example, the lower limit may be 0.3 times the evaluation period).

If the average accuracy of the classifier is greater than its average accuracy lower limit and the lowest accuracy lower limit difference is smaller than its maximum value, but the accuracy fluctuation level exceeds its upper limit while the accuracy change trend is greater than or equal to zero, the classifier remains disabled and the evaluation period is set to 0.8 times the previous evaluation period, down to the preset lower limit (for example, the lower limit may be 0.3 times the evaluation period).
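The three enablement cases above can be sketched as a small decision function. This is only an illustration: the names `Evaluation` and `enablement_step` are hypothetical, the upper and lower limits are assumed to be multiples of an initial (base) evaluation period, and combinations the text does not describe are assumed to leave the classifier disabled with the period unchanged.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Evaluation:
    avg_ok: bool          # average accuracy > its lower limit
    min_diff_ok: bool     # lowest accuracy lower limit difference < its maximum
    fluctuation_ok: bool  # fluctuation level within its upper limit
    trend: float          # accuracy change trend trd


def enablement_step(ev: Evaluation, period: float, base_period: float,
                    upper: float = 2.0, lower: float = 0.3) -> Tuple[bool, float]:
    """Return (enable_automatic_classification, next_evaluation_period)."""
    if not (ev.avg_ok and ev.min_diff_ok):
        return False, period  # assumption: case not described in the text
    trend_ok = ev.trend >= 0
    if ev.fluctuation_ok and trend_ok:
        # All four conditions hold: enable and lengthen the period (capped).
        return True, min(period * 1.2, base_period * upper)
    if ev.fluctuation_ok or trend_ok:
        # Exactly one of fluctuation/trend fails: stay disabled, shorten the period.
        return False, max(period * 0.8, base_period * lower)
    return False, period  # assumption: case not described in the text


enabled, next_period = enablement_step(
    Evaluation(avg_ok=True, min_diff_ok=True, fluctuation_ok=True, trend=0.01),
    period=100.0, base_period=100.0)
```

Shortening the period when a condition fails means the classifier is re-evaluated sooner; lengthening it on success reduces evaluation overhead.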

8. The classification proxy server also stores the classification results in a classification result library. The historical classification results accumulated in this library provide labeled training corpora from real user scenarios for training the classifier, so no manual labeling of training corpora is needed. The classification proxy server can use the retained historical manual classification results to keep training the existing classifier, thereby improving classification accuracy.

9. The classifier uses the (natural language text, classification result) pairs in the historical manual classification results to expand its current training set, and then rebuilds an updated version of the classifier model, either with a statistical method (such as SVM or a naive Bayes classifier) or with a neural network algorithm (such as RNN/LSTM) that exploits the sequential order of natural language. Because the updated training set contains richer and more accurate corpus information, the rebuilt classifier has higher accuracy than the old version.

10. The classification proxy server can verify the accuracy of the classifier in several ways after automatic classification is enabled. When the accuracy of the classifier falls below the commercial threshold, manual classification is enabled again to ensure classification accuracy; that is, the information on whether the classifier is automatically enabled, stored in the classifier configuration library, is updated to "not automatically enabled".

After the classifier is automatically enabled, one or more of the following mechanisms may be used to determine whether each automatic classification is correct.

a. Pre-check mechanism

The pre-check mechanism runs before the classification proxy server returns the classification result to the classifier calling unit, and the classification proxy server promptly writes the pre-check result to the classifier configuration library. Correctness is judged in two ways: manual sampling inspection and multi-classifier mutual inspection.

In manual sampling inspection, the classification proxy server randomly selects classification requests and sends them to a human for inspection. When the manual classification result is the same as the classifier's prediction, 1 correct classification is counted; otherwise 0 correct classifications are counted.

In multi-classifier mutual inspection, N (N ≥ 2) equivalent detection classifiers are prepared for each classifier. They use the same corpus data as the corresponding classifier but a different classification algorithm. For example, the main classifier uses an LSTM algorithm with an attention model and the detection classifier uses a Bayesian algorithm, both trained on the same Chinese-vocabulary-based corpus. Alternatively, a detection classifier may use the same algorithm as the corresponding classifier but a different type of corpus; for example, the main classifier uses a Chinese-vocabulary-based corpus and the detection classifier uses a corpus of the corresponding Chinese Pinyin symbols and tones (the corpus format of the classification request being checked can be converted manually). For each classification request, 1 correct classification is counted when the result of the main classifier (the classifier whose result is to be verified) is the same as the dominant result of all classifiers (i.e., the result obtained by the majority of classifiers, including the main classifier). Otherwise, 0.25 correct classifications are counted.
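The multi-classifier mutual inspection count can be sketched as a majority vote. The sketch assumes that "dominant result" means a label returned by a strict majority of all classifiers, the main classifier included; the text does not say how ties are handled, so a tie is treated here as no dominant result. The function name `mutual_inspection_score` and the labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def mutual_inspection_score(main_label: Hashable,
                            detector_labels: Sequence[Hashable]) -> float:
    """Correct count for one request: 1 if the main result is dominant, else 0.25."""
    if len(detector_labels) < 2:
        raise ValueError("the scheme calls for N >= 2 detection classifiers")
    votes = Counter([main_label, *detector_labels])
    total = len(detector_labels) + 1
    # Assumption: dominant = strict majority of all classifiers, main included.
    if votes[main_label] * 2 > total:
        return 1.0
    return 0.25


agree = mutual_inspection_score("refund", ["refund", "billing"])
disagree = mutual_inspection_score("refund", ["billing", "billing"])
```

Here `agree` is 1.0 because two of three classifiers return "refund", while `disagree` is 0.25 because the majority result differs from the main classifier's result.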

b. Post-check mechanism

The post-check mechanism runs after the classification proxy server returns the classification result to the classifier calling unit. The classification proxy server provides the classifier calling unit with a unique number identifying each classification. The classifier calling unit embeds the unique number in the answer offered for selection; the answer usually retains the 2-3 most probable classification results and is sent to the customer service staff or the client to choose from. The result selected by the customer service staff or the client represents their choice for this classification, that is, it provides feedback on the classification. The classification proxy server collects this feedback and uses it to judge whether the automatic classification result of the classifier is correct. The classification proxy server records feedback received within a period of time after the classification occurs (typically 30-60 seconds) in the classifier configuration library; if no feedback is received within the agreed time, no feedback result is recorded, i.e., the post-check mechanism has no result.

Feedback from the customer service staff is called customer service feedback, and its result has high credibility. When the customer service feedback is the same as the classification result with the highest probability, 1 correct classification is counted; otherwise 0 correct classifications are counted.

Feedback from the client is called client feedback, and its result has lower credibility. When the client feedback is consistent with the classification result with the highest probability among the automatic classification results of the classifier, the classifier counts 1 correct classification. When it is inconsistent, the classifier counts 0.4 correct classifications. When the client gives no feedback, the classifier counts 0.5 correct classifications.
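The feedback counts described above can be sketched as follows. The function names and the 60-second default window are illustrative assumptions (the text gives 30-60 seconds as typical); `None` stands for feedback that did not arrive within the window.

```python
from typing import Hashable, Optional


def within_window(classified_at: float, received_at: float,
                  window_seconds: float = 60.0) -> bool:
    """True if feedback arrived inside the agreed window after classification."""
    return 0.0 <= received_at - classified_at <= window_seconds


def customer_service_score(top_label: Hashable,
                           feedback: Optional[Hashable]) -> Optional[float]:
    """High-credibility feedback: 1 if it matches the top result, else 0."""
    if feedback is None:
        return None  # no customer service feedback recorded
    return 1.0 if feedback == top_label else 0.0


def client_score(top_label: Hashable, feedback: Optional[Hashable]) -> float:
    """Lower-credibility feedback: 1 on match, 0.4 on mismatch, 0.5 if absent."""
    if feedback is None:
        return 0.5
    return 1.0 if feedback == top_label else 0.4


score = client_score("refund", "billing")  # client picked a different result
```

The softer penalty for a client mismatch (0.4 rather than 0) reflects the lower credibility the text assigns to client feedback.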

For each automatic classification result, the correct count of the classification is calculated as follows.

● If there is a manual sampling inspection result, its correct count is used.

● If there is no manual sampling inspection, check whether there is a customer service feedback result; if so, its correct count is used as the correct count of this classification.

● If there is no customer service feedback result, the correct count of this classification is calculated as follows:

If the client feedback has a value, the correct count of this classification is:

(correct count of the multi-classifier mutual inspection result + correct count of the client feedback result) / 2

If the client feedback has no value, the correct count of this classification is the correct count of the multi-classifier mutual inspection result.
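The priority order above can be sketched as a single function. The name `correct_count` and its parameters are hypothetical; `None` marks a mechanism that produced no value, and the sketch assumes a multi-classifier mutual inspection count is always available as the fallback.

```python
from typing import Optional


def correct_count(mutual: float,
                  spot_check: Optional[float] = None,
                  service: Optional[float] = None,
                  client: Optional[float] = None) -> float:
    """Combine the per-mechanism counts for one automatic classification."""
    if spot_check is not None:
        return spot_check             # manual sampling inspection wins
    if service is not None:
        return service                # then customer service feedback
    if client is not None:
        return (mutual + client) / 2  # average mutual inspection and client feedback
    return mutual                     # only mutual inspection is available


# Mutual inspection agreed (1.0) but the client chose a different result (0.4).
combined = correct_count(mutual=1.0, client=0.4)
```

Sources are consulted in decreasing order of credibility, so a human judgment overrides any automatic signal.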

After the correct count of each automatic classification is obtained, the accuracy indices within an evaluation period are evaluated using the same method as in step 7. The flow illustrated in Fig. 4 is then used to determine whether to stop automatic classification of the classifier:

At the end of an evaluation period, if the average accuracy of the classifier is greater than its average accuracy lower limit, the lowest accuracy lower limit difference is smaller than its maximum value, the accuracy fluctuation level is within its upper limit, and the accuracy change trend is greater than or equal to zero, automatic classification of the classifier is maintained and the evaluation period is set to 1.2 times the previous evaluation period, up to the preset evaluation period upper limit (for example, the upper limit may be 2 times the evaluation period).

If the average accuracy of the classifier is greater than its average accuracy lower limit, the lowest accuracy lower limit difference is smaller than its maximum value, and the accuracy fluctuation level is within its upper limit, but the accuracy change trend is less than zero, automatic classification of the classifier is maintained and the evaluation period is set to 0.8 times the previous evaluation period, down to the preset lower limit (for example, the lower limit may be 0.3 times the evaluation period).

If the average accuracy of the classifier is greater than its average accuracy lower limit and the lowest accuracy lower limit difference is smaller than its maximum value, but the accuracy fluctuation level exceeds its upper limit while the accuracy change trend is greater than or equal to zero, automatic classification of the classifier is maintained, but the evaluation period is set to 0.8 times the previous evaluation period, down to the preset lower limit (for example, the lower limit may be 0.3 times the evaluation period).

If the average accuracy of the classifier is greater than its average accuracy lower limit and the lowest accuracy lower limit difference is smaller than its maximum value, but the accuracy fluctuation level exceeds its upper limit and the accuracy change trend is less than zero, automatic classification of the classifier is maintained, but the evaluation period is set to 0.5 times the previous evaluation period, down to the preset lower limit (for example, the lower limit may be 0.3 times the evaluation period).
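The four cases above differ only in the factor applied to the evaluation period, which can be laid out as a lookup table. The sketch covers only these cases, in which the average accuracy and lowest accuracy conditions both hold and automatic classification is maintained. The names are hypothetical, and the limits are assumed to be multiples of an initial (base) evaluation period.

```python
from typing import Dict, Tuple

# (fluctuation within upper limit, trend >= 0) -> factor applied to the period
PERIOD_FACTOR: Dict[Tuple[bool, bool], float] = {
    (True, True): 1.2,
    (True, False): 0.8,
    (False, True): 0.8,
    (False, False): 0.5,
}


def next_evaluation_period(fluctuation_ok: bool, trend: float, period: float,
                           base_period: float, upper: float = 2.0,
                           lower: float = 0.3) -> float:
    """Scale the evaluation period and clamp it to the preset limits."""
    factor = PERIOD_FACTOR[(fluctuation_ok, trend >= 0)]
    scaled = period * factor
    # Assumption: limits are expressed relative to the initial evaluation period.
    return min(max(scaled, base_period * lower), base_period * upper)


shortened = next_evaluation_period(False, -0.02, period=100.0, base_period=100.0)
```

With both the fluctuation and trend conditions failing, `shortened` is half the previous period, so the classifier is checked again sooner.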

The natural language classifier system of the invention implements steps 5, 6A and 6B, so that when the accuracy of the classifier has not yet reached the commercial standard, high-accuracy natural language classification can be delivered quickly by introducing a manual platform. Meanwhile, through step 8, a large, correct training data set of real corpora is generated as a by-product of the manual classification work, which greatly reduces the cost of producing training sets manually. Through step 9, the system continuously improves classifier accuracy. Through step 7, once the classifier reaches the commercial standard, the system allows automatic classification to be enabled automatically or manually, so that a fully automatic natural language classifier is ultimately built quickly and at low cost. Step 10 ensures that the accuracy of the classifier remains at a high level.

Claims

1. A natural language classifier system, comprising a classification proxy server, a classifier and a manual classification platform, characterized in that: the classification proxy server is configured to receive a classifier number and a classification source corpus, and to send the classification source corpus to the classifier with the corresponding number; the classifier is configured to classify the classification source corpus and send a classification result to the classification proxy server; the classification proxy server is further configured to determine, after receiving the classification result, whether the classifier is automatically enabled; if the classifier is an automatically enabled classifier, the classification proxy server outputs the classification result; if the classifier is not an automatically enabled classifier, the classification proxy server sends the classification source corpus to the manual classification platform and outputs the classification result returned by the manual classification platform; the system further comprises a classifier configuration library, which is used to store information about the background classifiers, including the classifier number, whether the classifier is automatically enabled, the classifier address, the address of the manual processing queue, and accuracy index information; the classification proxy server queries the classifier configuration library according to the classifier number to obtain the classifier address, and sends the classification source corpus to the classifier at the obtained address; after obtaining the manual classification result, the classification proxy server further compares the manual classification result with the automatic classification result, calculates the classification accuracy of the classifier, and stores the classification accuracy in the classifier configuration library; when the classification accuracy of the classifier reaches a preset threshold, the classification proxy server sets the classifier as an automatically enabled classifier and updates the information on whether the classifier is automatically enabled in the classifier configuration library to automatically enabled.

2. The classifier system as claimed in claim 1, wherein: the system further comprises a classification result library, and the classification proxy server also stores the classification result in the classification result library.

3. The classifier system as claimed in claim 1, wherein: the classifier is further configured to construct a training set from historical classification results and to train continuously.

4. The classifier system as claimed in claim 1, wherein: after automatic classification of the classifier is enabled, the classification proxy server verifies the accuracy of the classifier.