CN109388712A - Industry classification method and terminal device based on machine learning

Publication number
CN109388712A
Authority
CN
China
Prior art keywords
text
vector
industry
training set
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811107159.8A
Other languages
Chinese (zh)
Inventor
吴壮伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811107159.8A
Publication of CN109388712A
Legal status: Pending

Abstract

The present invention provides an industry classification method and terminal device based on machine learning. The method comprises: obtaining a training set in which each text contains business operation information and is labeled with its industry category; performing word segmentation on the text to obtain a vocabulary; obtaining, by feature extraction, a first preset number of words from the vocabulary as keywords; obtaining the word vector of each keyword through a word vector model; averaging the word vectors of all keywords to obtain a first vector; obtaining the maximum word vector among the word vectors of all keywords as a second vector; obtaining the minimum word vector among the word vectors of all keywords as a third vector; forming the feature vector of the text from the first, second and third vectors; training an industry classification model with the training set; and performing industry classification on a text to be classified with the trained model. By classifying industries with machine learning, the invention improves the efficiency and accuracy of industry classification.

Description

Industry classification method and terminal device based on machine learning
Technical field
The invention belongs to the field of computer technology, and in particular relates to an industry classification method and terminal device based on machine learning.
Background
Since the reform and opening-up, China's national economy has developed rapidly, the market economy has continued to flourish, the structure of the national economy has gradually been perfected, and specialization has become increasingly fine-grained. Enterprises are no longer satisfied with vertical development, and large enterprises that span trades and industries and develop horizontally have appeared in one sector after another. Against this background, analyzing and studying the current state of development of economic sectors and industries in the new period is of important reference value for understanding economic development and grasping the development trend of the national economy.
Staff of the National Administration for Code Allocation to Organizations classify each organization (such as legally established enterprises, public institutions, government organs and public organizations) according to information such as the business scope recorded in its electronic file, in combination with GB/T 4754-2017 "Industrial classification for national economic activities" issued by the National Bureau of Statistics, and assign the organization to a designated industry. Because the operations of many enterprises at this stage span multiple industries, classifying industries manually is inefficient and inaccurate.
Summary of the invention
In view of this, the embodiments of the present invention provide an industry classification method and terminal device based on machine learning, to solve the problem that industry classification in the prior art is inefficient and inaccurate.
A first aspect of the embodiments of the present invention provides an industry classification method based on machine learning, comprising:
obtaining a training set, where the training set is a manually labeled collection of texts covering multiple industry categories, and each text in the training set contains business operation information and is labeled with its corresponding industry category;
performing word segmentation on the text to obtain the vocabulary corresponding to the text;
obtaining, by feature extraction, a first preset number of words from the vocabulary as keywords;
for each keyword obtained, obtaining the word vector of the keyword through a word vector model;
averaging the word vectors of all the keywords to obtain a first vector;
obtaining the maximum word vector among the word vectors of all the keywords as a second vector;
obtaining the minimum word vector among the word vectors of all the keywords as a third vector;
forming the feature vector of the text from the first vector, the second vector and the third vector;
training an industry classification model with the training set; and
performing industry classification on a text to be classified with the trained industry classification model.
A second aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps:
obtaining a training set, where the training set is a manually labeled collection of texts covering multiple industry categories, and each text in the training set contains business operation information and is labeled with its corresponding industry category;
performing word segmentation on the text to obtain the vocabulary corresponding to the text;
obtaining, by feature extraction, a first preset number of words from the vocabulary as keywords;
for each keyword obtained, obtaining the word vector of the keyword through a word vector model;
averaging the word vectors of all the keywords to obtain a first vector;
obtaining the maximum word vector among the word vectors of all the keywords as a second vector;
obtaining the minimum word vector among the word vectors of all the keywords as a third vector;
forming the feature vector of the text from the first vector, the second vector and the third vector;
training an industry classification model with the training set; and
performing industry classification on a text to be classified with the trained industry classification model.
A third aspect of the embodiments of the present invention provides a terminal device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
obtaining a training set, where the training set is a manually labeled collection of texts covering multiple industry categories, and each text in the training set contains business operation information and is labeled with its corresponding industry category;
performing word segmentation on the text to obtain the vocabulary corresponding to the text;
obtaining, by feature extraction, a first preset number of words from the vocabulary as keywords;
for each keyword obtained, obtaining the word vector of the keyword through a word vector model;
averaging the word vectors of all the keywords to obtain a first vector;
obtaining the maximum word vector among the word vectors of all the keywords as a second vector;
obtaining the minimum word vector among the word vectors of all the keywords as a third vector;
forming the feature vector of the text from the first vector, the second vector and the third vector;
training an industry classification model with the training set; and
performing industry classification on a text to be classified with the trained industry classification model.
The present invention provides an industry classification method and terminal device based on machine learning. Texts whose industry categories have been labeled manually form the training set, and the content of each text is business operation information. For each text in the training set, keywords are extracted, and the mean, maximum and minimum of the word vectors of all extracted keywords form the feature vector of the text. This takes the semantic content of the different word vectors into account together and gives good semantic depth. The industry classification model is trained on the training set after feature extraction until the training termination condition is reached, and the trained model then classifies texts containing business operation information, which improves the efficiency and accuracy of classification.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an industry classification method based on machine learning provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for obtaining an optimal industry classification model provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of another industry classification method based on machine learning provided by an embodiment of the present invention;
Fig. 4 is a structural block diagram of an industry classification apparatus based on machine learning provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a terminal device provided by an embodiment of the present invention.
Detailed description of the embodiments
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so that the embodiments of the present invention can be understood thoroughly. However, it will be clear to a person skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits and methods are omitted so that unnecessary detail does not obscure the description of the invention.
To illustrate the technical solutions of the invention, specific embodiments are described below.
An embodiment of the present invention provides an industry classification method based on machine learning. With reference to Fig. 1, the method comprises:
S101: obtain a training set, where the training set is a manually labeled collection of texts covering multiple industry categories, and each text in the training set contains business operation information and is labeled with its corresponding industry category.
Each industry category in the training set corresponds to at least one text, and each text maps to exactly one industry category.
The business scope of an operating organization describes the production and business activities or other social activities engaged in by an enterprise, public institution, government organ or self-employed individual, and contains all of its business operation information.
The business scope information of an enterprise can be obtained through several channels. For example, the National Administration for Code Allocation to Organizations assigns a unique identification code to each legally established organization and keeps an electronic file for it. The file records the organization's business scope in detail, and what the business scope records is the organization's business operation information. The business operation information of each organization can of course also be obtained by other means, which the embodiments of the present invention do not limit.
The business operation information of an organization is strongly related to the industry it belongs to, so the organization can be assigned to a corresponding industry on the basis of that information.
In the embodiments of the present invention, the business operation information of multiple organizations can be obtained, with the business operation information of each organization corresponding to one text. The training set of this step is obtained by manually labeling each text with an industry category.
Optionally, an industry category list may be preset. For example, the list may contain all industry categories named in the classification standard "Industrial classification for national economic activities", and each industry category is given an identifier that uniquely identifies it. When the texts in the training set are labeled manually, each text is labeled with the identifier of its industry category according to the one-to-one mapping between the industry categories in the list and their identifiers.
S102: perform word segmentation on the text to obtain the vocabulary corresponding to the text.
Each text in the training set can be segmented with any of a variety of existing word segmentation models to obtain the word list corresponding to the text.
Optionally, the segmentation result may be screened further: preset stop words and punctuation marks are removed from the result, the remaining words are sorted by descending word frequency, and the top second preset number of words are selected. The words obtained by this screening constitute the vocabulary of this step.
Removing preset stop words means removing words such as " " and " " and other preset words that serve no purpose for industry classification. The words remaining after stop words and punctuation marks have been removed are sorted by descending word frequency. Optionally, the top 90% of the words in the sorted result are kept and the bottom 10% are discarded, and the final screening result is used as the vocabulary.
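The optional screening described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes the text has already been segmented into tokens, and the stop-word list, the punctuation set, the function name `build_vocabulary` and the alphabetical tie-breaking are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical stop words and punctuation; the source does not enumerate them.
STOP_WORDS = {"the", "of", "and"}
PUNCTUATION = set(",.;:!?()")


def build_vocabulary(tokens, keep_ratio=0.9):
    """Drop stop words and punctuation, rank by descending frequency, keep the top share."""
    kept = [t for t in tokens if t not in STOP_WORDS and t not in PUNCTUATION]
    counts = Counter(kept)
    # Descending frequency; ties are broken alphabetically (an assumption) for determinism.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    # Keep the leading keep_ratio share of distinct words (90% in the source's example).
    n_keep = max(1, int(len(ranked) * keep_ratio)) if ranked else 0
    return ranked[:n_keep]
```

With `keep_ratio=0.9` the rarest tenth of the distinct words is discarded, matching the 90%/10% split given as an option in the text.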
S103: obtain, by feature extraction, a first preset number of words from the vocabulary as keywords.
TF-IDF (term frequency-inverse document frequency) is a feature extraction and feature weighting technique. From the TF-IDF matrix, the degree of association between a word and a text category can be computed as a score; the higher a word's score, the stronger its ability to discriminate between categories. In this step, TF-IDF is used to compute, for each word in the word list obtained in step S102, its importance to the text. The results are sorted in descending order, and the first preset number of words are selected as keywords. For example, if the word list contains 50 words, the TF-IDF value of each word is computed, the results are sorted in descending order, and the top 10 words are selected as keywords.
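A small sketch of TF-IDF keyword selection follows. The source does not give a formula, so the smoothed inverse document frequency used here, the function name `tfidf_keywords` and the representation of the corpus as a list of token lists are assumptions for illustration only.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, top_k):
    """Rank the words of one document by TF-IDF against a corpus and return the top_k."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        # Document frequency: number of corpus documents containing the word.
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF (an assumption); it stays finite for words absent from the corpus.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = (count / len(doc_tokens)) * idf
    # Descending score, ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

Words that appear in every document receive the lowest weight, so generic terms fall to the bottom of the ranking and the selected keywords tend to be the ones that distinguish one business scope from another.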
S104: for each keyword obtained, obtain the word vector of the keyword through a word vector model.
The word vector of a keyword can be obtained with an existing word vector model. In general, the word vector is a 256-dimensional vector.
S105: average the word vectors of all the keywords to obtain a first vector; obtain the maximum word vector among the word vectors of all the keywords as a second vector; obtain the minimum word vector among the word vectors of all the keywords as a third vector; and form the feature vector of the text from the first vector, the second vector and the third vector.
For example, if the word vector of each keyword in step S105 has 256 dimensions, the feature vector of the text constructed in this step has 256 × 3 = 768 dimensions. The feature vector consists of the first vector, the second vector and the third vector, each of which occupies a contiguous 256-dimensional segment of the feature vector.
The feature vector obtained in this way takes the semantic content of the different keyword word vectors into account together and has greater semantic depth than existing methods of constructing text feature vectors, which improves the accuracy of industry classification.
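The construction of the feature vector can be sketched as below. The source does not say how the "maximum" and "minimum" vectors are defined; this sketch assumes element-wise maximum and minimum across the keyword vectors (max and min pooling), which is one plausible reading. The function name `text_feature_vector` is hypothetical.

```python
def text_feature_vector(keyword_vectors):
    """Concatenate the element-wise mean, max and min of equally sized keyword vectors."""
    if not keyword_vectors:
        raise ValueError("at least one keyword vector is required")
    # Transpose so that each tuple holds one dimension across all keywords.
    columns = list(zip(*keyword_vectors))
    mean_vec = [sum(col) / len(col) for col in columns]
    max_vec = [max(col) for col in columns]  # assumed element-wise maximum
    min_vec = [min(col) for col in columns]  # assumed element-wise minimum
    # First, second and third vectors occupy consecutive segments of the result.
    return mean_vec + max_vec + min_vec
```

For 256-dimensional keyword vectors the result has 768 entries, regardless of how many keywords were selected, so texts of different lengths map to inputs of the same size.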
S106: train an industry classification model with the training set.
In the embodiments of the present invention, the industry classification model is a deep neural network model with four layers: an input layer, a first hidden layer, a second hidden layer and an output layer. The input of the input layer is the feature vector of the text. The first hidden layer contains a first preset number of nodes and the second hidden layer contains a second preset number of nodes; the activation function of both hidden layers is the ReLU function. The output layer gives the probability of each type for the text, and its activation function is the logistic function.
Optionally, the input layer contains one node, and the feature vector of the text obtained in step S105 is the input to that node;
the first hidden layer contains 100 nodes, with dimension 1 × 100, and its activation function is the ReLU function;
the second hidden layer contains 200 nodes, with dimension 1 × 200, and its activation function is the ReLU function;
the activation function of the output layer is the logistic function, and the output is the probability of each industry type. For example, if the industries in the training set are divided into 95 classes, the output layer outputs the probability that the text belongs to each of these 95 classes.
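The forward pass of such a network can be sketched in plain Python. This is an illustration under stated assumptions, not the trained model: the weights are random placeholders, the initialization range and the helper names (`dense`, `init_layer`, `forward`) are hypothetical, and the input is assumed to be the 768-dimensional feature vector of step S105. The output uses an independent logistic (sigmoid) unit per class because the text names a logistic function; a softmax output, which makes the 95 values sum to one, is a common alternative.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def logistic(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer; weights holds one row of input weights per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    # Small random weights and zero biases; the initialization scheme is an assumption.
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """ReLU on the two hidden layers, logistic on the output layer."""
    hidden1 = relu(dense(features, *layers[0]))
    hidden2 = relu(dense(hidden1, *layers[1]))
    return logistic(dense(hidden2, *layers[2]))


rng = random.Random(0)
# 768 inputs -> 100 hidden nodes -> 200 hidden nodes -> 95 industry classes.
layers = [init_layer(768, 100, rng), init_layer(100, 200, rng), init_layer(200, 95, rng)]
scores = forward([0.1] * 768, layers)
predicted_class = max(range(len(scores)), key=scores.__getitem__)
```

The predicted industry is the class with the highest score, which is how the later test and anomaly-detection steps interpret the output.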
Optionally, training the industry classification model with the training set includes: training the model on the training set under a given learning rate, number of training iterations, batch size and termination error until a preset training termination condition is reached, where the preset training termination condition is that the number of training iterations has been reached or that the error of the result (given in the source as the "word segmentation result error") is lower than the termination error.
Further, with reference to Fig. 2, an embodiment of the present invention also provides a method for obtaining an optimal industry classification model, the method comprising:
S1061: establish a plurality of deep neural network models, where, for any two deep neural network classification models among the plurality, the learning rate, number of training iterations, batch size and termination error of the two models are different.
Optionally, for the deep neural network industry classification model provided in step S106, a plurality of deep neural network models are established from different parameters.
For example, the learning rate takes one of the values 0.01, 0.02 and 0.03;
the number of training iterations takes one of the values 500, 1000 and 2000;
the batch size takes one of the values 100, 200 and 500;
the termination error takes one of the values 0.05, 0.1 and 0.5.
A variety of deep neural network classification models can be formed in this way. For example, a learning rate of 0.01, 500 training iterations, a batch size of 100 and a termination error of 0.05 make up one industry classification model.
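The candidate configurations can be enumerated as in the sketch below. It takes the Cartesian product of the four example value lists, which is one reading of the text; the source only says that the models are built from different parameters. The constant names, dictionary keys and the function `candidate_configs` are hypothetical.

```python
from itertools import product

# Example values taken from the text above.
LEARNING_RATES = [0.01, 0.02, 0.03]
TRAINING_ITERATIONS = [500, 1000, 2000]
BATCH_SIZES = [100, 200, 500]
TERMINATION_ERRORS = [0.05, 0.1, 0.5]


def candidate_configs():
    """Return one dict per combination of the four hyperparameters."""
    keys = ("learning_rate", "iterations", "batch_size", "termination_error")
    grid = product(LEARNING_RATES, TRAINING_ITERATIONS, BATCH_SIZES, TERMINATION_ERRORS)
    return [dict(zip(keys, combo)) for combo in grid]
```

Three values for each of four parameters give 3 × 3 × 3 × 3 = 81 configurations, the first of which is the example combination named in the text.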
S1062: train each of the plurality of deep neural network models with the training set.
Each of the deep neural network models of S1061 is trained with the training set until the training termination condition is reached.
S1063: obtain a preset test set.
In the embodiments of the present invention, the test set is obtained in the same way as the training set.
S1064: test each of the plurality of deep neural network models with the preset test set.
S1065: according to the test results, select the deep neural network model with the highest industry classification accuracy to perform industry classification on the text to be classified.
The industry type of every text in the test set is known. Take a test text X whose industry type is agriculture, and input the feature vector of X into the industry classification model. If, among the probabilities computed by the model, the probability that X is of type agriculture is the largest, the model's prediction of the type of X is correct. If that probability is not the largest, for example if the model computes that the probability of animal husbandry is the largest, the model's prediction of the type of X is wrong.
In this way, the accuracy of each industry classification model can be obtained by testing on the test set, and the optimal industry classification model is thereby obtained.
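The test procedure can be sketched as follows: a prediction counts as correct when the class with the largest probability equals the known label, and the model with the highest share of correct predictions is selected. The representation of a model as a function returning a list of class probabilities, and the names `accuracy` and `select_best`, are assumptions made for the example.

```python
def accuracy(predict, test_set):
    """test_set is a list of (feature_vector, true_label) pairs with integer labels."""
    correct = 0
    for features, label in test_set:
        probs = predict(features)
        # The predicted class is the index of the largest probability.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += predicted == label
    return correct / len(test_set)


def select_best(models, test_set):
    """models maps a name to a predict function; returns (name, accuracy) of the best."""
    scored = {name: accuracy(fn, test_set) for name, fn in models.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

In the agriculture example above, label 0 could stand for agriculture and label 1 for animal husbandry; a model that assigns the largest probability to index 1 for text X would be counted as wrong on that text.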
By training and testing industry classification models built from multiple parameter combinations, the embodiment of the present invention obtains the industry classification model with the highest classification accuracy, which further improves the accuracy of industry classification.
The deep neural network model with the highest accuracy so obtained is used to perform industry classification on the text to be classified.
S107: perform industry classification on the text to be classified with the trained industry classification model.
An embodiment of the present invention thus provides an industry classification method based on machine learning. Features are extracted from the texts in the training set, each of which contains the business operation information of an operating organization, to obtain the feature vector of each text; each text is labeled with a manually assigned industry classification identifier. The industry classification model is trained with the feature vectors of the training texts as input, and the trained model performs industry classification on the text to be classified. This achieves automatic industry classification based on business operations, with high classification efficiency and accurate results.
With reference to Fig. 3, an embodiment of the present invention further provides an industry classification method based on machine learning that is applied after the text to be classified has been classified with the trained industry classification model. The method can be used to detect abnormal industry classification results and comprises:
S301: obtain all texts in the training set whose industry category is a first industry category, where the first industry category is any one of the multiple industry categories in the training set.
For example, if the training set contains multiple texts corresponding to the 95 industry categories in the industry category list, this step screens the texts by industry category to obtain all the texts belonging to each category.
S302: perform density clustering on the feature vectors of all texts of the first industry category to obtain the cluster of the first industry category.
The first industry category is any one of the industry categories in the industry list. For example, if the first industry category is agriculture, all texts in the training set labeled as agriculture in step S101 are obtained, say 100 texts, and density clustering is performed on the feature vectors of these 100 agriculture texts, for example with the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. DBSCAN is a representative density-based clustering algorithm. It defines a cluster as the maximal set of density-connected points and divides regions of sufficiently high density into clusters. The cluster obtained for the agriculture industry may also be referred to herein as the portrait of that industry.
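For readers unfamiliar with DBSCAN, a compact standard-library sketch is given below. It is a textbook-style illustration, not the patent's implementation: the neighbourhood radius `eps`, the density threshold `min_pts`, the Euclidean distance and the function names are assumptions, and a production system would more likely rely on an existing library. `math.dist` requires Python 3.8 or later.

```python
import math


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], including idx itself."""
    return [j for j, p in enumerate(points) if math.dist(points[idx], p) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id >= 0, or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return labels
```

Applied to the feature vectors of one industry's training texts, the points labeled with a cluster id form the dense region used in the following steps, while points labeled -1 are treated as noise.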
S303: obtain the center point and radius of the cluster of the first industry category.
S304: if the classification result is that the text to be classified has the highest probability of belonging to the first industry category, compute, from the feature vector of the text to be classified, the distance between the text to be classified and the center point of the cluster of the first industry category.
After a text to be classified has been classified by the method of the embodiments corresponding to Fig. 1 and Fig. 2, the probability that the text belongs to each industry category is obtained. If, for example, the text has the highest probability of belonging to agriculture, the distance between the text and the center point of the cluster corresponding to agriculture is computed from the feature vector of the text. If this distance is greater than the radius of the agriculture cluster obtained in step S303, the text is judged to be an abnormal text.
The result produced by the industry classification model is the probability, computed from the business operation information contained in the text, that the text belongs to each industry category. Even when the model provided by the embodiments of the present invention gives agriculture the largest probability for a text, the business operation information in the text may also include information on several other industries that have little relation to agriculture. In that case, although agriculture has the highest probability, the association between the text and agriculture is not particularly strong, and the text is judged to be an abnormal text.
S305: if the distance is greater than the radius of the cluster, judge the text to be classified to be an abnormal text.
Based on a clustering algorithm, the embodiment of the present invention obtains the cluster corresponding to an industry from the training set. If the industry classification result of a text is that industry, the feature vector of the text is used to judge whether the distance between the text and the cluster of the industry is greater than the radius of the cluster; if it is, the text is judged to be an abnormal text. This provides a further basis for accurate industry classification.
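The distance check of steps S303 to S305 can be sketched as follows. The source does not define the center point or the radius of a cluster; this sketch assumes the center is the mean of the cluster members and the radius is the largest Euclidean distance from the center to any member. Both definitions and the function names are hypothetical.

```python
import math


def cluster_center_and_radius(member_vectors):
    """Assumed definitions: center = mean of members, radius = largest member distance."""
    dims = list(zip(*member_vectors))
    center = [sum(d) / len(d) for d in dims]
    radius = max(math.dist(center, v) for v in member_vectors)
    return center, radius


def is_abnormal(feature_vector, center, radius):
    """A text is flagged when it lies farther from the center than the cluster radius."""
    return math.dist(feature_vector, center) > radius
```

Under these definitions every training text in the cluster lies within the radius, so only texts to be classified that fall outside the region covered by the training cluster are flagged.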
Fig. 4 is a schematic diagram of an industry classification apparatus based on machine learning provided by an embodiment of the present invention. With reference to Fig. 4, the apparatus includes a first acquisition unit 41, a word segmentation unit 42, a second acquisition unit 43, a third acquisition unit 44, a fourth acquisition unit 45, a training unit 46 and a classification unit 47;
the first acquisition unit 41 is configured to obtain a training set, where the training set is a manually labeled collection of texts covering multiple industry categories, each industry category corresponds to at least one text, each text maps to exactly one industry category, and each text in the training set contains business operation information and is labeled with its corresponding industry category;
the word segmentation unit 42 is configured to perform word segmentation on the text to obtain the vocabulary corresponding to the text;
the second acquisition unit 43 is configured to obtain, by feature extraction, a first preset number of words from the vocabulary as keywords;
the third acquisition unit 44 is configured to obtain, for each keyword obtained, the word vector of the keyword through a word vector model;
the fourth acquisition unit 45 is configured to average the word vectors of all the keywords to obtain a first vector, obtain the maximum word vector among the word vectors of all the keywords as a second vector, obtain the minimum word vector among the word vectors of all the keywords as a third vector, and form the feature vector of the text from the first vector, the second vector and the third vector;
the training unit 46 is configured to train an industry classification model with the training set;
the classification unit 47 is configured to perform industry classification on a text to be classified with the trained industry classification model.
Further, which further includes screening unit 48, for removing default stop words and punctuate in word segmentation resultSymbol;Remaining vocabulary is arranged according to word frequency descending, the vocabulary for being arranged in front the second preset number is chosen, obtains the vocabularyTable.
Further, the second acquisition unit 43 is specifically used for: successively being calculated by TF-IDF every in the vocabularyDifferent degree of a vocabulary for the text;Calculated result descending is arranged, the vocabulary for being arranged in front the first preset number is chosenAs keyword.
Further, the trade classification model is deep neural network model, and the deep neural network model includes 4Layer, respectively input layer, the first hidden layer, the second hidden layer and output layer, the input of the input layer are the spy of the textVector is levied, first hidden layer includes the first present count destination node, and second hidden layer includes the second preset numberThe activation primitive of node, first hidden layer and second hidden layer is relu function, and the output layer is the textIndustry type probability, the activation primitive of the output layer is logistics function.
Further, which further includes establishing unit 49 and selection unit 410;
It is described to establish unit 49, multiple deep neural network models are established, for the multiple deep neural network modelIn any two deep neural network model, the learning rates of described two deep neural network models, frequency of training, batch sizeIt is different with termination error;
The training unit 46 is respectively trained the multiple deep neural network model by the training set;
First acquisition unit 41 is also used to obtain default test set;
Taxon 47 respectively tests the multiple deep neural network model by the default test set;
Selection unit 410 chooses the classification highest deep neural network model of accuracy according to test result;
The taxon 47 is specifically used for: by the classification highest deep neural network model of accuracy to described wait divideClass text carries out trade classification.
Further, the apparatus also includes a clustering unit 411, a computing unit 412, and a judging unit 413. The clustering unit 411 is configured to: obtain all texts in the training set whose industry category is a first industry category, where the first industry category is any one of the multiple industry categories in the training set; perform density clustering on the feature vectors of all texts of the first industry category to obtain the cluster of the first industry category; and obtain the center point and radius of that cluster.
If the classification result is that the text to be classified has the highest probability of belonging to the first industry category, the computing unit 412 uses the feature vector of the text to be classified to calculate the distance between the text to be classified and the center point of the cluster of the first industry category.
If the distance is greater than the radius of the cluster, the judging unit 413 judges the text to be classified to be an abnormal text.
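The abnormal-text check can be sketched as below. The density clustering step itself is omitted and the cluster members are simply given. The source does not define how the center point and radius are computed, so taking the center as the mean of the members, the radius as the distance to the farthest member, and Euclidean distance as the metric are all assumptions; the function names are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of the cluster members."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_center_and_radius(vectors):
    """Assumed definitions: center = mean of members, radius = farthest member."""
    center = centroid(vectors)
    radius = max(euclidean(v, center) for v in vectors)
    return center, radius

def is_abnormal(feature_vector, center, radius):
    """A text is flagged when it lies farther from the center than the radius."""
    return euclidean(feature_vector, center) > radius

# Hypothetical 2-dimensional feature vectors of one industry category's cluster.
cluster = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center, radius = cluster_center_and_radius(cluster)
print(is_abnormal([1.5, 1.2], center, radius))
print(is_abnormal([5.0, 5.0], center, radius))
```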
An embodiment of the present invention provides an industry classification apparatus based on machine learning. The apparatus performs feature extraction on the texts in the training set, which contain business information, to obtain the feature vector of each text; each text is annotated with a manually assigned industry classification label. Taking the feature vectors of the texts in the training set as input, the apparatus trains the industry classification model, and the trained model then performs industry classification on the text to be classified. This achieves automatic industry classification based on business information, with high classification efficiency and accuracy.
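The feature vector used by the method is formed from the mean, the maximum, and the minimum of the keyword word vectors. A minimal sketch follows, assuming that "largest" and "smallest" mean the element-wise maximum and minimum and that the three vectors are concatenated; neither choice is confirmed by the text, and the word vectors below are hypothetical.

```python
def text_feature_vector(word_vectors):
    """Concatenate the element-wise mean, max, and min of the keyword vectors."""
    columns = list(zip(*word_vectors))
    mean_vec = [sum(c) / len(c) for c in columns]   # first vector
    max_vec = [max(c) for c in columns]             # second vector (assumed element-wise)
    min_vec = [min(c) for c in columns]             # third vector (assumed element-wise)
    return mean_vec + max_vec + min_vec

# Hypothetical 2-dimensional word vectors for three keywords.
word_vectors = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
features = text_feature_vector(word_vectors)
print(features)
```

Under these assumptions a word vector model of dimension d yields a text feature vector of dimension 3d, regardless of how many keywords were selected.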
Fig. 5 is a schematic diagram of an industry classification terminal device based on machine learning provided by an embodiment of the present invention. As shown in Fig. 5, the industry classification terminal device 5 of this embodiment includes a processor 50, a memory 51, and a computer program 52, such as an industry classification program, that is stored in the memory 51 and can run on the processor 50. When executing the computer program 52, the processor 50 implements the steps in the above embodiments of the machine-learning-based industry classification method, such as steps 101 to 107 shown in Fig. 1, steps 1061 to 1065 shown in Fig. 2, or steps 301 to 305 shown in Fig. 3. Alternatively, when executing the computer program 52, the processor 50 implements the functions of the modules/units in the above apparatus embodiments, such as the functions of modules 41 to 413 shown in Fig. 4.
Illustratively, the computer program 52 can be divided into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to carry out the present invention. The one or more modules/units can be a series of computer program instruction segments capable of performing specific functions; the instruction segments describe the execution of the computer program 52 in the industry classification terminal device 5.
The industry classification terminal device 5 can be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The industry classification terminal device may include, but is not limited to, the processor 50 and the memory 51. Those skilled in the art will understand that Fig. 5 is only an example of the industry classification terminal device 5 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the industry classification terminal device may also include input/output devices, network access devices, a bus, and so on.
The processor 50 can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor can be a microprocessor or any conventional processor.
The memory 51 can be an internal storage unit of the industry classification terminal device 5, such as its hard disk or internal memory. The memory 51 can also be an external storage device of the industry classification terminal device 5, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to the industry classification terminal device 5. Further, the memory 51 may include both an internal storage unit of the industry classification terminal device 5 and an external storage device. The memory 51 is used to store the computer program and the other programs and data required by the industry classification terminal device. The memory 51 can also be used to temporarily store data that has been output or is about to be output.
An embodiment of the present invention also provides a computer-readable storage medium storing a computer program; when executed by a processor, the computer program implements the steps of the machine-learning-based industry classification method described in any of the above embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which can be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The embodiments described above are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of their technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be included within the protection scope of the present invention.

Claims (10)

Translated from Chinese
1. An industry classification method based on machine learning, characterized in that the method comprises:
obtaining a training set, wherein the training set is a manually labeled collection of texts composed of texts of multiple industry categories, and, for any text in the training set, the text includes business information and is labeled with the corresponding industry category;
performing word segmentation on the text to obtain a vocabulary corresponding to the text;
obtaining, through feature extraction, a first preset number of words from the vocabulary as keywords;
for any obtained keyword, obtaining the word vector of the keyword through a word vector model;
averaging the word vectors of all keywords to obtain a first vector;
obtaining the largest word vector among the word vectors of all keywords to obtain a second vector;
obtaining the smallest word vector among the word vectors of all keywords to obtain a third vector;
forming the feature vector of the text from the first vector, the second vector, and the third vector;
training an industry classification model with the training set;
performing industry classification on a text to be classified with the trained industry classification model.
2. The industry classification method according to claim 1, characterized in that, after the word segmentation is performed on the text, the method further comprises:
removing preset stop words and punctuation marks from the word segmentation result;
sorting the remaining words in descending order of word frequency, and selecting the top second preset number of words to obtain the vocabulary.
3. The industry classification method according to claim 1, characterized in that obtaining, through feature extraction, the first preset number of words from the vocabulary as keywords comprises:
computing, by the TF-IDF algorithm, the importance of each word in the vocabulary to the text in turn;
sorting the results in descending order, and selecting the top first preset number of words as keywords.
4. The industry classification method according to any one of claims 1-3, characterized in that the industry classification model is a deep neural network model comprising four layers, namely an input layer, a first hidden layer, a second hidden layer, and an output layer; the input of the input layer is the feature vector of the text; the first hidden layer includes a first preset number of nodes; the second hidden layer includes a second preset number of nodes; the activation function of the first hidden layer and the second hidden layer is the ReLU function; the output layer gives the probability of the industry type of the text; and the activation function of the output layer is the logistic function.
5. The industry classification method according to claim 4, characterized in that the method further comprises:
establishing multiple deep neural network models, wherein, for any two of the multiple deep neural network models, the learning rate, number of training iterations, batch size, and termination error of the two models are all different;
training each of the multiple deep neural network models with the training set;
obtaining a preset test set;
testing each of the multiple deep neural network models with the preset test set;
selecting, according to the test results, the deep neural network model with the highest classification accuracy to perform industry classification on the text to be classified.
6. The industry classification method according to claim 1, characterized in that the method further comprises:
obtaining all texts in the training set whose industry category is a first industry category, wherein the first industry category is any one of the multiple industry categories in the training set;
performing density clustering on the feature vectors of all texts of the first industry category to obtain the cluster of the first industry category;
obtaining the center point and radius of the cluster of the first industry category;
and, after the industry classification is performed on the text to be classified with the trained industry classification model, the method further comprises:
if the classification result is that the text to be classified has the highest probability of belonging to the first industry category, calculating, from the feature vector of the text to be classified, the distance between the text to be classified and the center point of the cluster of the first industry category;
if the distance is greater than the radius of the cluster, judging the text to be classified to be an abnormal text.
7. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
8. A terminal device, characterized in that the terminal device comprises a memory and a processor, the memory stores a computer program that can run on the processor, and the processor implements the following steps when executing the computer program:
obtaining a training set, wherein the training set is a manually labeled collection of texts composed of texts of multiple industry categories, and, for any text in the training set, the text includes business information and is labeled with the corresponding industry category;
performing word segmentation on the text to obtain a vocabulary corresponding to the text;
obtaining, through feature extraction, a first preset number of words from the vocabulary as keywords;
for any obtained keyword, obtaining the word vector of the keyword through a word vector model;
averaging the word vectors of all keywords to obtain a first vector;
obtaining the largest word vector among the word vectors of all keywords to obtain a second vector;
obtaining the smallest word vector among the word vectors of all keywords to obtain a third vector;
forming the feature vector of the text from the first vector, the second vector, and the third vector;
training an industry classification model with the training set;
performing industry classification on a text to be classified with the trained industry classification model.
9. The terminal device according to claim 8, characterized in that the processor, when executing the computer program, further implements the following steps:
removing preset stop words and punctuation marks from the word segmentation result;
sorting the remaining words in descending order of word frequency, and selecting the top second preset number of words to obtain the vocabulary.
10. The terminal device according to claim 8, characterized in that obtaining, through feature extraction, a preset number of words from the vocabulary as keywords comprises:
computing, by the TF-IDF algorithm, the importance of each word in the vocabulary to the text in turn;
sorting the results in descending order, and selecting the top first preset number of words as keywords.
CN201811107159.8A | Priority date: 2018-09-21 | Filing date: 2018-09-21 | A kind of trade classification method and terminal device based on machine learning | Status: Pending | CN109388712A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811107159.8A (CN109388712A (en)) | 2018-09-21 | 2018-09-21 | A kind of trade classification method and terminal device based on machine learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811107159.8A (CN109388712A (en)) | 2018-09-21 | 2018-09-21 | A kind of trade classification method and terminal device based on machine learning

Publications (1)

Publication Number | Publication Date
CN109388712A (en) | 2019-02-26

Family

ID=65418981

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201811107159.8A (Pending; CN109388712A (en)) | A kind of trade classification method and terminal device based on machine learning | 2018-09-21 | 2018-09-21

Country Status (1)

Country | Link
CN (1) | CN109388712A (en)

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190226
