Disclosure of Invention
The invention aims to provide a method and a system for searching disease knowledge through voice, which are based on a preset disease knowledge base and are additionally provided with a voice recognition function and a disease classification learning model, so that an unnecessary input process is saved for a user, the time for filtering and screening information is saved for the user, and the user experience and the information searching precision are improved.
In order to achieve the above object, a first aspect of embodiments of the present invention provides a method for searching knowledge about diseases through voice, the method including:
the disease knowledge search system receives the first voice data, performs first voice preprocessing on the first voice data and generates first sentence audio data;
performing first audio word recognition processing on the first sentence audio data to generate first sentence text data;
performing first semantic tag identification processing on the first sentence text data to generate a first semantic tag data set; the first semantic tag data set comprises first tag type data and a plurality of first semantic tag data;
according to the multiple first semantic label data, performing first disease classification learning processing corresponding to the first label type data to generate multiple first disease name data and corresponding first disease probability data;
according to each first disease name data, inquiring a first name and related information corresponding relation table reflecting the corresponding relation between the disease name and the related information of the disease to generate a corresponding first disease knowledge data set;
forming first search result data by each first disease name data, the first disease probability data corresponding to the first disease name data and the first disease knowledge data set;
and forming a first search result data set by all the first search result data and outputting the first search result data set.
Preferably, the disease knowledge search system receives the first speech data, performs first speech preprocessing on the first speech data, and generates first sentence audio data, including:
a data preprocessing module of the disease knowledge search system receives the first voice data, and performs first audio filtering and noise reduction processing on the first voice data to generate the first sentence audio data.
Preferably, the performing a first audio word recognition process on the first sentence audio data to generate first sentence word data specifically includes:
and the voice recognition module of the disease knowledge search system inputs the first sentence audio data into a first acoustic language recognition model for recognition processing to generate the first sentence text data.
Preferably, the performing a first semantic tag identification process on the first sentence text data to generate a first semantic tag data set specifically includes:
the semantic recognition module of the disease knowledge search system inputs the first sentence character data into a first intelligent word segmentation recognition model for recognition processing to generate a plurality of first word segmentation data;
using the plurality of first participle data to query a first participle and semantic label corresponding relation table reflecting the corresponding relation between the participle and the semantic label to obtain a plurality of first semantic label data;
according to each piece of first semantic label data, inquiring a first semantic label and label type corresponding relation table reflecting the corresponding relation of semantic labels and label types, and generating corresponding first inquiry label type data;
combining the first query tag type data with the same type into a type group in all the first query tag type data, and taking the tag type corresponding to the type group containing the first query tag type data with the largest quantity as the first tag type data;
composing the plurality of first semantic tag data from all of the first semantic tag data; and forming the first semantic tag data set by the first tag type data and the plurality of first semantic tag data.
Further, the querying, by using the plurality of first participle data, a first participle and semantic label correspondence table reflecting correspondence between participles and semantic labels to obtain a plurality of first semantic label data specifically includes:
polling all first participle and semantic label corresponding relation records in the first participle and semantic label corresponding relation table, and taking the currently polled first participle and semantic label corresponding relation record as a first current record; the first participle and semantic label corresponding relation table comprises a plurality of first participle and semantic label corresponding relation records; the first word segmentation and semantic label corresponding relation record comprises first word segmentation information and first semantic label information;
performing first matching processing with the first word segmentation information of the first current record by using the plurality of first word segmentation data; sequentially extracting first word segmentation data from the plurality of first word segmentation data to serve as first current word segmentation data; when the first current participle data is the same as the first participle information, the first matching processing is successful;
and when the first matching processing is successful, extracting the first semantic label information of the first current record to generate the first semantic label data.
Preferably, the performing, according to the plurality of first semantic tag data, first disease classification learning processing corresponding to the first tag type data to generate a plurality of first disease name data and corresponding first disease probability data specifically includes:
a disease learning module of the disease knowledge search system determines a corresponding first disease classification learning model according to the first label type data; inputting the plurality of first semantic label data into the first disease classification learning model for learning to obtain a plurality of groups of first learning output data groups; each set of the first learning output data includes the first disease name data and the corresponding first disease probability data.
Preferably,
the first name and related information corresponding relation table comprises a plurality of first name and related information corresponding relation records; the first name and related information corresponding relation record comprises first disease name information, first disease definition information, first disease symptom information, first disease cause information, first disease diagnosis mode information, first disease clinical expression information and first disease treatment mode information;
the first disease knowledge data set includes at least first disease definition data, first disease symptom data, first disease cause data, first disease diagnosis mode data, first disease clinical presentation data, and first disease treatment mode data.
Preferably, the querying, according to each piece of the first disease name data, a first name and related information correspondence table reflecting a correspondence between disease names and disease related information to generate a corresponding first disease knowledge data set specifically includes:
a disease knowledge extraction module of the disease knowledge search system polls all first name and related information corresponding relation records of the first name and related information corresponding relation table according to each first disease name data, and takes the currently polled first name and related information corresponding relation record as a second current record;
when each first disease name data is the same as the first disease name information of the second current record, extracting the first disease definition information as the corresponding first disease definition data, extracting the first disease symptom information as the corresponding first disease symptom data, extracting the first disease cause information as the corresponding first disease cause data, extracting the first disease diagnosis mode information as the corresponding first disease diagnosis mode data, extracting the first disease clinical expression information as the corresponding first disease clinical expression data, and extracting the first disease treatment mode information as the corresponding first disease treatment mode data from the second current record;
and the corresponding first disease knowledge data set is composed of the first disease definition data, the first disease symptom data, the first disease cause data, the first disease diagnosis mode data, the first disease clinical presentation data and the first disease treatment mode data.
Preferably, before using the first disease classification learning model, the method further comprises:
the model training module of the disease knowledge search system performs learning model training processing on the first disease classification learning model by using a semantic label and epidemic disease name training library; extracting multiple groups of semantic label training data corresponding to specified epidemic disease name training data from the semantic label and epidemic disease name training library, and inputting the semantic label training data into the first disease classification learning model for training to obtain multiple training output data groups; the semantic label and epidemic disease name training library comprises a plurality of semantic label training data and a plurality of epidemic disease name training data; each epidemic disease name training data corresponds to a plurality of semantic label training data; each training output data group comprises training output disease name data and training output disease probability data;
the learning model training is successful when, among the multiple training output data groups, the training output disease name data corresponding to the highest training output disease probability data is the same as the specified epidemic disease name training data and that highest training output disease probability data exceeds a set training probability threshold, and/or when the degree of correlation between the other training output disease name data and the specified epidemic disease name training data exceeds a set training correlation threshold.
Preferably, after the disease knowledge search system outputs the first search result data set, the method further comprises:
a scoring processing module of the disease knowledge search system receives a first set of scoring data; the first set of scoring data comprises a plurality of first scoring data; the first set of scoring data corresponds to the first set of search result data; the first scoring data corresponds to the first search result data;
taking the plurality of first semantic tag data as newly-added semantic label training data;
in the semantic label and epidemic disease name training library, taking the training disease name data corresponding to the first score data with the highest score as target training disease name data;
and adding the newly added semantic tag training data into the semantic tag and epidemic disease name training library, and establishing a corresponding relation between the newly added semantic tag training data and the target training disease name data.
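As a non-limiting illustration, the scoring feedback described above can be sketched in Python as follows. The function name add_feedback, the dictionary layout of the training library, the "disease_name" key and the sample scores are all assumptions made for the example and are not defined by the embodiment; the sketch also assumes at least one score is received and, if the target disease name is not yet in the library, simply creates an entry for it.

```python
def add_feedback(training_library, first_semantic_tags, search_results, scores):
    """Attach the current semantic tags to the best-scored disease name.

    training_library: dict mapping a disease name to a list of tag groups
    (hypothetical layout). scores[i] is the score given to search_results[i].
    """
    # Index of the highest score; on a tie the earliest result wins.
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    target_name = search_results[best_index]["disease_name"]
    # Store the tags as a new training sample linked to the target name.
    training_library.setdefault(target_name, []).append(list(first_semantic_tags))
    return target_name


training_library = {"periodontitis": [[195, 196]]}
search_results = [{"disease_name": "periodontitis"}, {"disease_name": "gingivitis"}]
target = add_feedback(training_library, [195, 196], search_results, [2, 5])
```

Here the second result receives the higher score, so the tags 195 and 196 are recorded against its disease name for later retraining.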
A second aspect of an embodiment of the present invention provides a system for searching knowledge of a disease through speech, the system including:
the data preprocessing module is used for receiving the first voice data, performing first voice preprocessing on the first voice data and generating first sentence audio data;
the voice recognition module is used for performing first audio word recognition processing on the first sentence audio data to generate first sentence text data;
the semantic recognition module is used for performing first semantic tag identification processing on the first sentence text data to generate a first semantic tag data set; the first semantic tag data set comprises first tag type data and a plurality of first semantic tag data;
the disease learning module is used for performing first disease classification learning processing corresponding to the first label type data according to the plurality of first semantic label data to generate a plurality of first disease name data and corresponding first disease probability data;
the disease knowledge extraction module is used for inquiring a first name and related information corresponding relation table reflecting the corresponding relation between the disease name and the related information according to each first disease name data to generate a corresponding first disease knowledge data set;
the search result output module is used for combining each first disease name data, the corresponding first disease probability data and the corresponding first disease knowledge data set into first search result data; and forming a first search result data set by all the first search result data and outputting the first search result data set.
The embodiment of the invention provides a method and a system for searching disease knowledge through voice, which are based on a preset disease knowledge base and are additionally provided with a voice recognition function and a disease classification learning model, so that an unnecessary input process is saved for a user, the time for filtering and screening information is saved for the user, and the user experience and the information searching precision are improved.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a method for searching disease knowledge through voice. Fig. 1 is a schematic diagram of the method for searching disease knowledge through voice according to an embodiment of the present invention; as shown in Fig. 1, the method mainly includes the following steps:
Step 1, a disease knowledge search system receives first voice data, and performs first voice preprocessing on the first voice data to generate first sentence audio data;
the method specifically comprises the following steps: the data preprocessing module of the disease knowledge search system receives the first voice data, and performs first audio filtering and noise reduction processing on the first voice data to generate the first sentence audio data.
Here, the disease knowledge search system may be understood as a system having speech semantic recognition and an intelligent knowledge base; the system comprises a data preprocessing module, a voice recognition module, a semantic recognition module, a disease learning module and a disease knowledge extraction module; the data preprocessing module is used for acquiring, denoising and filtering original voice data; the voice recognition module is used for carrying out voice recognition on the preprocessed audio data to obtain sentence character data; the semantic recognition module carries out word segmentation and disease semantic recognition on the sentence character data, and all disease labels, namely semantic labels, and disease types with the maximum probability, namely label types are counted; the disease learning module is used for determining a disease classification learning model according to the label type, inputting all the counted disease labels into the disease classification learning model for deep learning, and finally obtaining a plurality of possible disease names and corresponding probabilities; the disease knowledge extraction module is used for extracting disease knowledge related to all possible diseases to serve as a final voice search result.
Here, in this step, the first voice data comes from a voice recording device connected to the disease knowledge search system, or from a terminal device or a server storing the original voice data; the data preprocessing module of the disease knowledge search system separates silence and noise from the voice through a Voice Activity Detection (VAD) algorithm, and cancels ambient noise, echo, reverberation, and the like in the voice data by using Least Mean Square (LMS) adaptive filtering, Wiener filtering, and the like.
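As a non-limiting illustration of the silence separation in step 1, the following Python sketch uses a simple frame-energy threshold as a stand-in for a VAD algorithm; it does not implement LMS or Wiener filtering. The function name drop_silent_frames, the frame length, the threshold and the sample values are assumptions made for the example.

```python
def drop_silent_frames(samples, frame_len=4, energy_threshold=0.01):
    """Keep only frames whose mean-square energy reaches the threshold.

    A trailing partial frame (shorter than frame_len) is discarded.
    """
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= energy_threshold:
            kept.extend(frame)
    return kept


# Silence, a short burst of signal, then silence again.
first_voice_data = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
first_sentence_audio_data = drop_silent_frames(first_voice_data)
```

Only the middle frame carries energy above the threshold, so it alone is kept; a practical system would typically smooth the decision over neighbouring frames.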
Step 2, performing first audio word recognition processing on the first sentence audio data to generate first sentence text data;
the method specifically comprises the following steps: the voice recognition module of the disease knowledge search system inputs the first sentence audio data into a first acoustic language recognition model for recognition processing to generate the first sentence text data.
Here, the first acoustic language recognition model used by the voice recognition module of the disease knowledge search system commonly takes one of the following forms: 1) an acoustic language recognition model composed of a Hidden Markov Model (HMM) + Gaussian Mixture Model (GMM) + N-Gram language model/Chinese Language Model (CLM); 2) an acoustic language recognition model composed of HMM + Deep Neural Network (DNN) + N-Gram/CLM; the first acoustic language recognition model extracts feature data from the input first sentence audio data, performs pronunciation matching on the feature data to obtain the pronunciation data sequence with the maximum probability, and performs character and word recognition on the pronunciation data sequence to obtain the word string with the maximum probability, namely the first sentence text data.
Step 3, performing first semantic tag identification processing on the first sentence text data to generate a first semantic tag data set;
wherein the first set of semantic tag data comprises first tag type data and a plurality of first semantic tag data;
here, the semantic recognition module of the disease knowledge search system extracts a tag type and a semantic tag related to a known disease from the first sentence text data;
the method specifically comprises the following steps: step 31, a semantic recognition module of the disease knowledge search system inputs the first sentence character data into a first intelligent word segmentation recognition model for recognition processing to generate a plurality of first word segmentation data;
here, the first intelligent word segmentation recognition model used by the semantic recognition module of the disease knowledge search system is an algorithm model based on Natural Language Processing (NLP), and commonly used are: a forward Maximum Matching (MM) algorithm model, a Reverse Maximum Matching (RMM) algorithm model, a Bi-directional Maximum Matching (BM) algorithm model, an HMM algorithm model, and a Conditional Random Field (CRF) algorithm model;
here, NLP is the body of techniques in computer science and artificial intelligence for processing, understanding, and using human language, so as to achieve effective communication between humans and computers; NLP can basically be divided into two parts: natural language decomposition processing and natural language generation processing; the embodiment of the invention mainly relates to the natural language decomposition processing part, specifically to extracting participles from the first sentence text data by using the first intelligent word segmentation recognition model based on NLP; a participle is a word of the minimum unit in a segment of text information, and a segment of text information comprises a plurality of participles;
for example, the first sentence text data is "my toothache and swollen gum"; the first sentence text data is segmented and refined by using the first intelligent word segmentation recognition model, with nouns and verbs used as refinement additions, and the finally obtained first participle data are respectively: "my", "tooth", "pain", "get", "tooth", "bed", "swelling", plus the refinement additions "toothache" and "gum swelling";
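As a non-limiting illustration of one of the algorithm models listed above, the following Python sketch implements forward maximum matching. The function name forward_max_match, the vocabulary and the input string are assumptions made for the example; an ASCII string without spaces stands in for an unsegmented sentence, and a character that matches no vocabulary entry is emitted as a single-character participle.

```python
def forward_max_match(text, vocab, max_len=None):
    """Segment text by repeatedly taking the longest vocabulary match
    that starts at the current position (forward maximum matching)."""
    if max_len is None:
        max_len = max((len(word) for word in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


vocab = {"tooth", "toothache", "gum", "swelling"}  # hypothetical dictionary
first_participles = forward_max_match("toothachegumswelling", vocab)
```

Because the longest match is preferred, "toothache" is taken as one participle rather than "tooth" followed by unmatched characters.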
step 32, using the plurality of first participle data to query a first participle and semantic label corresponding relation table reflecting the corresponding relation between the participle and the semantic label to obtain a plurality of first semantic label data;
the first participle and semantic label corresponding relation table comprises a plurality of first participle and semantic label corresponding relation records; the first word segmentation and semantic label corresponding relation record comprises first word segmentation information and first semantic label information;
the method specifically comprises the following steps: polling the corresponding relation record of the first participle and the semantic label in the corresponding relation table of the first participle and the semantic label, and taking the corresponding relation record of the currently polled first participle and the semantic label as a first current record;
performing first matching processing between the plurality of first participle data and the first participle information of the first current record; sequentially extracting first participle data from the plurality of first participle data to serve as first current participle data; when the first current participle data is the same as the first participle information, the first matching processing is successful; when the first matching processing is successful, extracting the first semantic label information of the first current record to generate first semantic label data;
here, the corresponding relation table of the first participle and the semantic tag used by the semantic recognition module of the disease knowledge search system may be a database relation table or a data file; the first participle and semantic label corresponding relation table is used for carrying out disease semantic labeling processing on the natural language words, so that redundant data generated by repeated expression and approximate expression can be reduced; semantic tags here are actually tags related to disease symptoms, e.g., 195 for poor dental nerve perception, 196 for gum pathology, 197 for tooth bleeding symptoms, 279 for chest discomfort, 280 for poor breathing, etc.;
for example, the first participle and semantic label corresponding relation table is shown in Table 1, and the plurality of first participle data are respectively: "my", "tooth", "pain", "get", "tooth", "bed", "swelling", plus "toothache" and "gum swelling"; two first semantic label data are obtained: 195 and 196;
Table 1
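As a non-limiting illustration of step 32, the polling and matching described above can be sketched in Python as follows. The function name query_semantic_tags and the records shown are hypothetical and do not reproduce the contents of the first participle and semantic label corresponding relation table; each record is assumed to be a pair of participle information and semantic label information.

```python
def query_semantic_tags(participles, records):
    """Poll every record and emit its semantic tag when any participle
    equals the record's participle information."""
    tags = []
    for record_word, record_tag in records:       # first current record
        for current in participles:               # first current participle
            if current == record_word:            # matching succeeds
                tags.append(record_tag)
                break                             # one tag per matched record
    return tags


# Hypothetical records: (participle information, semantic label information)
records = [("toothache", 195), ("gum swelling", 196), ("chest tightness", 279)]
participles = ["my", "tooth", "pain", "get", "tooth", "bed", "swelling",
               "toothache", "gum swelling"]
first_semantic_tags = query_semantic_tags(participles, records)
```

With these assumed records the lookup yields the two tags of the example, 195 and 196; participles with no record contribute nothing.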
Step 33, according to each first semantic tag data, inquiring a first semantic tag and tag type corresponding relation table reflecting the semantic tag and tag type corresponding relation, and generating corresponding first inquiry tag type data;
the first semantic label and label type corresponding relation table comprises a plurality of first semantic label and label type corresponding relation records; the first semantic label and label type corresponding relation record comprises second semantic label information and first label type information;
the method specifically comprises the following steps: polling a first semantic label and label type corresponding relation record of a first semantic label and label type corresponding relation table, and taking the currently polled first semantic label and label type corresponding relation record as a second current record;
when each first semantic label data is the same as second semantic label information of a second current record, extracting first label type information of the second current record as corresponding first query label type data;
here, the first semantic tag and tag type correspondence table used by the semantic identification module of the disease knowledge search system may be a database relationship table or a data file; querying the disease type corresponding to the disease semantic label through the first semantic label and label type corresponding relation table, wherein the disease type is actually a large class, for example, 11 represents dental related diseases, 21 represents heart related diseases, 31 represents respiratory related diseases, and the like;
for example, the table of correspondence between the first semantic tag and the tag type is shown in table two, and two first semantic tag data: 195 and 196, the two first query tag type data obtained are 11, 11;
Table 2
Step 34, merging the first query tag type data with the same type into a type group in all the first query tag type data, and taking the tag type corresponding to the type group containing the most first query tag type data as the first tag type data;
here, the disease category with the highest probability is selected from the disease categories;
for example, the two first semantic tag data are 195 and 196, and the corresponding first query tag type data are 11 and 11; merging them generates a single type group containing 11 and 11; since this type group contains the largest number of first query tag type data and its tag type is 11, the first tag type data is 11;
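As a non-limiting illustration of steps 33 and 34, the type lookup and the selection of the largest type group can be sketched in Python as follows. The function name select_tag_type and the mapping tag_to_type are assumptions made for the example; only the entries for tags 195 and 196 follow the example above, and the remaining entries are hypothetical.

```python
from collections import Counter


def select_tag_type(semantic_tags, tag_to_type):
    """Map each semantic tag to its tag type and return the type whose
    group is largest; return None when no tag has a known type."""
    query_types = [tag_to_type[tag] for tag in semantic_tags if tag in tag_to_type]
    if not query_types:
        return None
    # Counter groups equal types; most_common(1) picks the largest group
    # (on a tie, the type that was encountered first).
    return Counter(query_types).most_common(1)[0][0]


# 195 -> 11 and 196 -> 11 follow the example; other entries are hypothetical.
tag_to_type = {195: 11, 196: 11, 197: 11, 279: 21, 280: 31}
first_tag_type = select_tag_type([195, 196], tag_to_type)
```

The embodiment does not state how a tie between type groups is resolved; the first-encountered rule here is only a property of this sketch.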
step 35, forming a plurality of first semantic tag data by all the first semantic tag data; the first semantic tag data set is composed of first tag type data and a plurality of first semantic tag data.
Here, after steps 31-35 of step 3, the semantic recognition module of the disease knowledge search system has performed further semantic analysis on the first sentence text data obtained in step 2, and the obtained first semantic tag data set includes the maximum-probability disease category, that is, the first tag type data, and all semantic tags related to symptoms extracted from the original sentence.
Step 4, according to the plurality of first semantic tag data, performing first disease classification learning processing corresponding to the first tag type data to generate a plurality of first disease name data and corresponding first disease probability data;
the method specifically comprises the following steps: a disease learning module of the disease knowledge search system determines a corresponding first disease classification learning model according to the first label type data; inputting a plurality of first semantic label data into a first disease classification learning model for learning to obtain a plurality of groups of first learning output data groups; each set of first learning output data includes first disease name data and corresponding first disease probability data.
Here, the disease knowledge search system may have a plurality of disease classification learning models, such as a dental disease classification learning model for dental-related diseases, a heart disease classification learning model for heart-related diseases, a respiratory disease classification learning model for respiratory-related diseases, and the like; before each disease classification learning model is used, the model training module of the disease knowledge search system needs to train it to maturity by using the semantic label and epidemic disease name training library; the algorithm model commonly adopted by the disease classification learning model is a random forest model, which classifies the input data and yields a plurality of possible classification results together with the probability of each result; for example, when the first tag type data is 11, the corresponding disease classification learning model, namely the dental disease classification learning model, is selected, and learning is performed on the two first semantic tag data 195 and 196; the final calculation results are: periodontitis with a probability of 44%, gingivitis with a probability of 10.27%, pulpitis with a probability of 8.57%, and caries with a probability of 4.11%.
Step 5, according to each first disease name data, inquiring a first name and related information corresponding relation table reflecting the corresponding relation between the disease name and the related information of the disease, and generating a corresponding first disease knowledge data set;
the first name and related information corresponding relation table comprises a plurality of first name and related information corresponding relation records; the first name and related information corresponding relation record comprises first disease name information, first disease definition information, first disease symptom information, first disease cause information, first disease diagnosis mode information, first disease clinical expression information and first disease treatment mode information; the first disease knowledge data set at least comprises first disease definition data, first disease symptom data, first disease cause data, first disease diagnosis mode data, first disease clinical presentation data and first disease treatment mode data;
the method specifically comprises the following steps: a disease knowledge extraction module of the disease knowledge search system polls all first name and related information corresponding relation records of the first name and related information corresponding relation table according to each first disease name data, and takes the currently polled first name and related information corresponding relation record as a second current record;
when each first disease name data is the same as the first disease name information of the second current record, extracting first disease definition information from the second current record as corresponding first disease definition data, extracting first disease symptom information as corresponding first disease symptom data, extracting first disease cause information as corresponding first disease cause data, extracting first disease diagnosis mode information as corresponding first disease diagnosis mode data, extracting first disease clinical expression information as corresponding first disease clinical expression data, and extracting first disease treatment mode information as corresponding first disease treatment mode data;
and a corresponding first disease knowledge data set is composed of first disease definition data, first disease symptom data, first disease cause data, first disease diagnosis mode data, first disease clinical presentation data and first disease treatment mode data.
Here, the disease knowledge extraction module of the disease knowledge search system uses a first name and related information correspondence table which is actually a disease knowledge base, which may be a relational database, a form set composed of a plurality of database relationship tables, or a file set composed of a plurality of data files; in the first name and related information corresponding relation table, each first name and related information corresponding relation record records related information of a disease, including name, definition, common symptoms, etiology and inducement, diagnosis mode, clinical manifestation, treatment mode and the like; by taking the first disease name data as a query keyword, all relevant information can be extracted by querying the corresponding relation between the first name and the relevant information;
for example, step 4 yields 4 groups of first disease name data and corresponding first disease probability data: periodontitis with a probability of 44%, gingivitis with a probability of 10.27%, pulpitis with a probability of 8.57%, and caries with a probability of 4.11%; step 5 then yields 4 first disease knowledge data sets, one each for periodontitis, gingivitis, pulpitis and caries, each including the definition, common symptoms, etiology and inducement, diagnosis mode, clinical manifestation, treatment mode and the like of the disease.
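The polling lookup of step 5 can be illustrated with a minimal Python sketch. It assumes the correspondence table is held in memory as a list of records; the field names, the `make_record` helper and the placeholder texts are hypothetical and stand in for whatever the actual disease knowledge base stores.

```python
from typing import Dict, List, Optional

# Hypothetical field names mirroring the six kinds of related information.
KNOWLEDGE_FIELDS = (
    "definition",
    "symptoms",
    "causes",
    "diagnosis_mode",
    "clinical_manifestation",
    "treatment_mode",
)


def make_record(name: str) -> Dict[str, str]:
    """Build a placeholder record; a real record would hold curated medical text."""
    record = {"name": name}
    for field_name in KNOWLEDGE_FIELDS:
        record[field_name] = f"<{field_name} of {name}>"
    return record


# Hypothetical in-memory stand-in for the correspondence table.
CORRESPONDENCE_TABLE: List[Dict[str, str]] = [
    make_record(name)
    for name in ("periodontitis", "gingivitis", "pulpitis", "caries")
]


def extract_knowledge(
    disease_name: str, table: List[Dict[str, str]]
) -> Optional[Dict[str, str]]:
    """Poll every record and return the knowledge data set of the first match."""
    for current_record in table:
        if current_record["name"] == disease_name:
            return {f: current_record[f] for f in KNOWLEDGE_FIELDS}
    return None  # no record carries this disease name


# One knowledge data set per disease name produced by step 4.
knowledge_sets = {
    name: extract_knowledge(name, CORRESPONDENCE_TABLE)
    for name in ("periodontitis", "gingivitis", "pulpitis", "caries")
}
print(knowledge_sets["periodontitis"]["definition"])
```

A linear poll matches the description above; with a relational database the same lookup would more likely be a keyed query on the disease name.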
Step 6, forming first search result data from each first disease name data, its corresponding first disease probability data and its corresponding first disease knowledge data set.
For example, from the 4 groups of first disease name data and corresponding first disease probability data and the 4 first disease knowledge data sets, the disease knowledge extraction module of the disease knowledge search system obtains 4 first search result data:
1st first search result data: periodontitis, probability 44%, disease knowledge data set for periodontitis (including definition, common symptoms, etiology and inducement, diagnosis mode, clinical manifestation, treatment mode, etc.);
2nd first search result data: gingivitis, probability 10.27%, disease knowledge data set for gingivitis (including definition, common symptoms, etiology and inducement, diagnosis mode, clinical manifestation, treatment mode, etc.);
3rd first search result data: pulpitis, probability 8.57%, disease knowledge data set for pulpitis (including definition, common symptoms, etiology and inducement, diagnosis mode, clinical manifestation, treatment mode, etc.);
4th first search result data: caries, probability 4.11%, disease knowledge data set for caries (including definition, common symptoms, etiology and inducement, diagnosis mode, clinical manifestation, treatment mode, etc.).
Step 7, forming a first search result data set from all the first search result data and outputting the first search result data set.
Here, the disease knowledge extraction module of the disease knowledge search system assembles all the obtained first search result data into a first search result data set to be fed back to the user.
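Steps 6 and 7 can be sketched in Python as follows. The `SearchResult` class, the `build_result_set` function and the placeholder knowledge texts are hypothetical; the sort by descending probability is an assumption made for presentation, since the description does not state an output order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SearchResult:
    """One first search result data: name, probability and knowledge data set."""
    disease_name: str
    probability_percent: float
    knowledge: Dict[str, str] = field(default_factory=dict)


def build_result_set(
    predictions: List[Tuple[str, float]],
    knowledge_sets: Dict[str, Dict[str, str]],
) -> List[SearchResult]:
    """Step 6: pair each prediction with its knowledge set. Step 7: collect them."""
    results = [
        SearchResult(name, probability, knowledge_sets.get(name, {}))
        for name, probability in predictions
    ]
    # Assumption: present the most probable disease first.
    results.sort(key=lambda r: r.probability_percent, reverse=True)
    return results


# Names and probabilities taken from the example in the text.
predictions = [
    ("periodontitis", 44.0),
    ("gingivitis", 10.27),
    ("pulpitis", 8.57),
    ("caries", 4.11),
]
# Placeholder knowledge sets; real ones would come from the knowledge base.
knowledge_sets = {
    name: {"definition": f"<definition of {name}>"} for name, _ in predictions
}

result_set = build_result_set(predictions, knowledge_sets)
for rank, result in enumerate(result_set, start=1):
    print(rank, result.disease_name, f"{result.probability_percent}%")
```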
In addition, the disease knowledge search system further comprises a model training module; in the embodiment of the invention, before each disease classification learning model is put into use, the model training module needs to train it using the semantic label and epidemic disease name training library, and the training process is briefly described as follows:
Step A1, the model training module of the disease knowledge search system extracts, from the semantic label and epidemic disease name training library, multiple groups of semantic label training data corresponding to specified epidemic disease name training data, inputs the semantic label training data into a first disease classification learning model for training, and obtains multiple training output data sets;
the semantic label and epidemic disease name training library comprises a plurality of semantic label training data and a plurality of epidemic disease name training data; each epidemic disease name training data corresponds to a plurality of semantic label training data; the training output data set includes training output disease name data and training output disease probability data.
The training data in the semantic label and epidemic disease name training library are verified data, that is, the correspondence between the semantic label training data and the epidemic disease name training data has been verified to be correct; the data in the library may be valid test data provided by a third-party testing organization, or medical data acquired from a medical institution; the larger the amount of training data and the more accurate the correspondence, the higher the precision of the trained model.
Step A2, the training of the learning model is successful when, among the multiple training output data sets, the training output disease name data corresponding to the training output disease probability data with the highest probability is the same as the specified epidemic disease name training data and that highest probability exceeds the set training probability threshold, and/or the correlation between the training output disease name data and the specified epidemic disease name training data exceeds the set training correlation threshold.
This describes the conditions for terminating training: provided that the specified disease name appears among the outputs with the highest probability, that probability must also be high enough, the reference being that it exceeds the set training probability threshold; the correlation between the other classification results and the main classification result may also be considered, since the higher the correlation, the higher the calculation precision of the model, the reference being that the correlation exceeds the set training correlation threshold.
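The termination condition of step A2 can be expressed as a small Python check. This is a sketch under stated assumptions: the function name, the threshold values and the way the correlation is supplied are hypothetical, the text does not specify how the correlation is computed, and the "and/or" is read here as "the correlation test is applied in addition whenever a correlation value is provided".

```python
from typing import List, Optional, Tuple


def training_succeeded(
    outputs: List[Tuple[str, float]],
    target_name: str,
    probability_threshold: float,
    correlation: Optional[float] = None,
    correlation_threshold: Optional[float] = None,
) -> bool:
    """Return True when the training output meets the success condition."""
    # Pick the training output with the highest probability.
    top_name, top_probability = max(outputs, key=lambda item: item[1])
    # The most probable name must be the specified disease name.
    if top_name != target_name:
        return False
    probability_ok = top_probability > probability_threshold
    # Without a correlation value, only the probability test applies.
    if correlation is None or correlation_threshold is None:
        return probability_ok
    # Assumption: when a correlation is supplied, both tests must pass.
    return probability_ok and correlation > correlation_threshold


# Illustrative outputs reusing the example figures, written as fractions.
outputs = [
    ("periodontitis", 0.44),
    ("gingivitis", 0.1027),
    ("pulpitis", 0.0857),
    ("caries", 0.0411),
]
# 0.4, 0.7 and 0.8 are hypothetical values chosen for the illustration.
print(training_succeeded(outputs, "periodontitis", 0.4))
print(training_succeeded(outputs, "periodontitis", 0.4, 0.7, 0.8))
```

The looser reading, in which either test alone suffices, would replace the final `and` with `or`.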
In addition, the disease knowledge search system further includes a score processing module; after the first search result data set is output, the score processing module automatically enriches the semantic label and epidemic disease name training library according to the user's scores of the output results, as specifically described below:
step B1, a scoring processing module of the disease knowledge search system receives a first scoring data set;
wherein the first set of scoring data comprises a plurality of first scoring data; the first set of scoring data corresponds to the first set of search result data; the first score data corresponds to the first search result data.
For example, after displaying the 4 first search result data to the user, the disease knowledge search system also provides the user with an evaluation function with three levels: best (most consistent), general (generally consistent) and not (not consistent); if the user scores the 1st first search result data as most consistent, the 2nd as generally consistent, and the 3rd and 4th as not consistent, the score processing module obtains the 4 first score data in the first score data set as: best, general, not, not.
Step B2, taking the plurality of first semantic tag data as newly added semantic label training data; taking, in the semantic label and epidemic disease name training library, the training disease name data corresponding to the first score data with the highest score as target training disease name data; and adding the newly added semantic label training data to the semantic label and epidemic disease name training library and establishing a correspondence between the newly added semantic label training data and the target training disease name data.
For example, if, among the 4 first score data of the first score data set, the highest score (most consistent) belongs to the 1st first search result data, whose disease name data is "periodontitis", this step adds the two first semantic tag data obtained from the user's current voice input, 195 and 196, to the semantic label and epidemic disease name training library and associates them with the training disease name data "periodontitis" in the library; in effect, this adds valid training data to the semantic label and epidemic disease name training library.
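Steps B1 and B2 can be sketched in Python as follows. The training library is modelled as a dictionary from disease name to a list of semantic tag groups, and the numeric ranking of the three score levels is an assumption; the function and variable names are hypothetical.

```python
from typing import Dict, List

# Assumed ordering of the three score levels, lowest to highest.
SCORE_RANK = {"not": 0, "general": 1, "best": 2}


def enrich_training_library(
    library: Dict[str, List[List[int]]],
    semantic_tags: List[int],
    result_names: List[str],
    scores: List[str],
) -> str:
    """Attach the current semantic tags to the best-scored disease name."""
    # Index of the search result with the highest score (first one on ties).
    best_index = max(range(len(scores)), key=lambda i: SCORE_RANK[scores[i]])
    target_name = result_names[best_index]
    # Record the tags as a new training group for the target disease name.
    # setdefault also covers a name missing from the library, which is an
    # assumption; the text expects the name to be present already.
    library.setdefault(target_name, []).append(list(semantic_tags))
    return target_name


# Values from the example in the text.
library: Dict[str, List[List[int]]] = {"periodontitis": []}
result_names = ["periodontitis", "gingivitis", "pulpitis", "caries"]
scores = ["best", "general", "not", "not"]

target = enrich_training_library(library, [195, 196], result_names, scores)
print(target, library[target])
```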
A second embodiment of the present invention provides a system for searching knowledge of diseases by using voice, which implements the functions of the disease knowledge search system in the above embodiment. Fig. 2 is a schematic structural diagram of the system for searching knowledge of diseases by using voice; as shown in Fig. 2, the system 20 mainly includes: a data preprocessing module 201, a voice recognition module 202, a semantic recognition module 203, a disease learning module 204, a disease knowledge extraction module 205, and a search result output module 206.
The data preprocessing module 201 is configured to receive first voice data, perform first voice preprocessing on the first voice data, and generate first sentence audio data.
The voice recognition module 202 is configured to perform a first audio character recognition process on the first sentence audio data to generate first sentence character data.
The semantic recognition module 203 is configured to perform a first semantic tag identification process on the first sentence character data to generate a first semantic tag data set; the first semantic tag data set includes first tag type data and a plurality of first semantic tag data.
The disease learning module 204 is configured to perform a first disease classification learning process corresponding to the first tag type data according to the plurality of first semantic tag data, and generate a plurality of first disease name data and corresponding first disease probability data.
The disease knowledge extraction module 205 is configured to query, according to each first disease name data, a first name and related information correspondence table that reflects a correspondence between disease names and related information, and generate a corresponding first disease knowledge data set.
The search result output module 206 is configured to combine each first disease name data, and the corresponding first disease probability data and first disease knowledge data set thereof into first search result data; and forming a first search result data set by all the first search result data and outputting the first search result data set.
Here, in the system for searching knowledge of disease through voice provided in the second embodiment of the present invention, the functions of the modules are the same as those of the corresponding modules of the disease knowledge search system in the first embodiment, and are not described again here.
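The data flow through modules 201 to 206 can be pictured with the following Python sketch. It is a structural illustration only: each module is represented by a caller-supplied function, and the stub functions, the tag type value "type_a" and the placeholder strings are hypothetical rather than part of the described system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class DiseaseKnowledgeSearchSystem:
    preprocess: Callable[[bytes], bytes]                      # module 201
    recognize_speech: Callable[[bytes], str]                  # module 202
    recognize_semantics: Callable[[str], Tuple[str, List[int]]]      # module 203
    classify: Callable[[str, List[int]], List[Tuple[str, float]]]   # module 204
    extract_knowledge: Callable[[str], Dict[str, str]]        # module 205

    def search(self, voice_data: bytes) -> List[Dict[str, object]]:
        """Run the pipeline and assemble the result set (module 206)."""
        sentence_audio = self.preprocess(voice_data)
        sentence_text = self.recognize_speech(sentence_audio)
        tag_type, tags = self.recognize_semantics(sentence_text)
        predictions = self.classify(tag_type, tags)
        return [
            {
                "name": name,
                "probability_percent": probability,
                "knowledge": self.extract_knowledge(name),
            }
            for name, probability in predictions
        ]


# Stub modules for illustration; real ones would wrap actual models.
system = DiseaseKnowledgeSearchSystem(
    preprocess=lambda voice: voice,
    recognize_speech=lambda audio: "<recognized sentence>",
    recognize_semantics=lambda text: ("type_a", [195, 196]),
    classify=lambda tag_type, tags: [("periodontitis", 44.0), ("gingivitis", 10.27)],
    extract_knowledge=lambda name: {"definition": f"<definition of {name}>"},
)

results = system.search(b"\x00\x01")
print([r["name"] for r in results])
```

Passing the modules in as functions keeps the sketch independent of any particular speech recognition or classification library.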
The embodiment of the invention provides a method and a system for searching disease knowledge through voice, which are based on a preset disease knowledge base and are additionally provided with a voice recognition function and a disease classification learning model, so that an unnecessary input process is saved for a user, the time for filtering and screening information is saved for the user, and the user experience and the information searching precision are improved.
Those of skill in the art will further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.