CN112735475A - Method and system for searching disease knowledge through voice - Google Patents

Method and system for searching disease knowledge through voice

Info

Publication number
CN112735475A
Authority
CN
China
Prior art keywords
data
disease
semantic
training
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011567638.5A
Other languages
Chinese (zh)
Other versions
CN112735475B (en)
Inventor
游峰磊
李响
刘沛丰
胡鑫平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Borui Tongyun Technology Co ltd
Original Assignee
Beijing Borui Tongyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Borui Tongyun Technology Co ltd
Priority to CN202011567638.5A
Publication of CN112735475A
Application granted
Publication of CN112735475B
Status: Active
Anticipated expiration

Abstract

An embodiment of the invention relates to a method and a system for searching disease knowledge through voice. The method comprises: preprocessing first voice data to generate first sentence audio data; performing first audio text recognition processing on the first sentence audio data to generate first sentence text data; performing first semantic tag identification processing on the first sentence text data to generate a first semantic tag data set, which comprises first tag type data and a plurality of first semantic tag data; performing, on the plurality of first semantic tag data, first disease classification learning processing corresponding to the first tag type data to generate a plurality of first disease name data and corresponding first disease probability data; generating a corresponding first disease knowledge data set from each first disease name data; forming first search result data from each first disease name data together with its disease probability data and disease knowledge data set; and outputting the first search result data set. The embodiment spares the user unnecessary input, shortens the time spent filtering information, and improves user experience and search precision.

Description

Method and system for searching disease knowledge through voice
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a system for searching disease knowledge through voice.
Background
Elderly people pay close attention to diseases and health information and often search for it. At present, searching is done mainly by text input, and the user must personally filter a massive number of search results. This is difficult for the elderly. On the one hand, because of eyesight problems, they type slowly and with a high error rate, which degrades the search results. On the other hand, fully filtering a large amount of information takes a long time, which affects their health.
Disclosure of Invention
The invention aims to provide a method and a system for searching disease knowledge through voice, which are based on a preset disease knowledge base and are additionally provided with a voice recognition function and a disease classification learning model, so that an unnecessary input process is saved for a user, the time for filtering and screening information is saved for the user, and the user experience and the information searching precision are improved.
In order to achieve the above object, a first aspect of embodiments of the present invention provides a method for searching knowledge about diseases through voice, the method including:
the disease knowledge search system receives the first voice data, performs first voice preprocessing on the first voice data and generates first sentence audio data;
performing first audio text recognition processing on the first sentence audio data to generate first sentence text data;
performing first semantic tag identification processing on the first sentence text data to generate a first semantic tag data set; the first semantic tag data set comprises first tag type data and a plurality of first semantic tag data;
according to the multiple first semantic label data, performing first disease classification learning processing corresponding to the first label type data to generate multiple first disease name data and corresponding first disease probability data;
according to each first disease name data, inquiring a first name and related information corresponding relation table reflecting the corresponding relation between the disease name and the related information of the disease to generate a corresponding first disease knowledge data set;
forming first search result data by each first disease name data, the first disease probability data corresponding to the first disease name data and the first disease knowledge data set;
and forming a first search result data set by all the first search result data and outputting the first search result data set.
Preferably, the step in which the disease knowledge search system receives the first voice data and performs first voice preprocessing on it to generate the first sentence audio data specifically includes:
a data preprocessing module of the disease knowledge search system receives the first voice data and performs first audio filtering and noise reduction processing on the first voice data to generate the first sentence audio data.
Preferably, the performing of first audio text recognition processing on the first sentence audio data to generate the first sentence text data specifically includes:
a voice recognition module of the disease knowledge search system inputs the first sentence audio data into a first acoustic language recognition model for recognition processing to generate the first sentence text data.
Preferably, the performing of first semantic tag identification processing on the first sentence text data to generate the first semantic tag data set specifically includes:
a semantic recognition module of the disease knowledge search system inputs the first sentence text data into a first intelligent word segmentation recognition model for recognition processing to generate a plurality of first word segmentation data;
using the plurality of first word segmentation data to query a first word segmentation and semantic label correspondence table, which reflects the correspondence between segmented words and semantic labels, to obtain a plurality of first semantic label data;
querying, for each piece of first semantic label data, a first semantic label and label type correspondence table, which reflects the correspondence between semantic labels and label types, to generate corresponding first query tag type data;
grouping all the first query tag type data by type, and taking the tag type of the type group that contains the most first query tag type data as the first tag type data;
taking all of the first semantic tag data as the plurality of first semantic tag data; and forming the first semantic tag data set from the first tag type data and the plurality of first semantic tag data.
Further, the using of the plurality of first word segmentation data to query the first word segmentation and semantic label correspondence table to obtain the plurality of first semantic label data specifically includes:
polling all first word segmentation and semantic label correspondence records in the first word segmentation and semantic label correspondence table, and taking the currently polled record as a first current record; the table comprises a plurality of first word segmentation and semantic label correspondence records; each record comprises first word segmentation information and first semantic label information;
performing first matching processing between the plurality of first word segmentation data and the first word segmentation information of the first current record: first word segmentation data are extracted in sequence from the plurality of first word segmentation data as first current word segmentation data, and the first matching processing succeeds when the first current word segmentation data is the same as the first word segmentation information;
and when the first matching processing succeeds, extracting the first semantic label information of the first current record to generate the first semantic label data.
Preferably, the performing, according to the plurality of first semantic tag data, first disease classification learning processing corresponding to the first tag type data to generate a plurality of first disease name data and corresponding first disease probability data specifically includes:
a disease learning module of the disease knowledge search system determines a corresponding first disease classification learning model according to the first label type data; inputting the plurality of first semantic label data into the first disease classification learning model for learning to obtain a plurality of groups of first learning output data groups; each set of the first learning output data includes the first disease name data and the corresponding first disease probability data.
Preferably,
the first name and related information corresponding relation table comprises a plurality of first name and related information corresponding relation records; the first name and related information corresponding relation record comprises first disease name information, first disease definition information, first disease symptom information, first disease cause information, first disease diagnosis mode information, first disease clinical expression information and first disease treatment mode information;
the first disease knowledge data set includes at least first disease definition data, first disease symptom data, first disease cause data, first disease diagnosis mode data, first disease clinical presentation data, and first disease treatment mode data.
Preferably, the querying, according to each piece of the first disease name data, a first name and related information correspondence table reflecting a correspondence between disease names and disease related information to generate a corresponding first disease knowledge data set specifically includes:
a disease knowledge extraction module of the disease knowledge search system polls all first name and related information corresponding relation records of the first name and related information corresponding relation table according to each first disease name data, and takes the currently polled first name and related information corresponding relation record as a second current record;
when each first disease name data is the same as the first disease name information of the second current record, extracting the first disease definition information as the corresponding first disease definition data, extracting the first disease symptom information as the corresponding first disease symptom data, extracting the first disease cause information as the corresponding first disease cause data, extracting the first disease diagnosis mode information as the corresponding first disease diagnosis mode data, extracting the first disease clinical expression information as the corresponding first disease clinical expression data, and extracting the first disease treatment mode information as the corresponding first disease treatment mode data from the second current record;
and the corresponding first disease knowledge data set is composed of the first disease definition data, the first disease symptom data, the first disease cause data, the first disease diagnosis mode data, the first disease clinical presentation data and the first disease treatment mode data.
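The polling lookup described above can be sketched as follows. This is a minimal illustration, assuming the correspondence table is held in memory as a list of dictionaries; the function name `extract_knowledge` and the field names in `KNOWLEDGE_FIELDS` are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Optional

# Hypothetical field names standing in for the six kinds of disease information.
KNOWLEDGE_FIELDS = (
    "definition",
    "symptoms",
    "causes",
    "diagnosis",
    "clinical_presentation",
    "treatment",
)


def extract_knowledge(
    disease_name: str, records: List[Dict[str, str]]
) -> Optional[Dict[str, str]]:
    """Poll the records and copy the six fields of the first matching one."""
    for record in records:  # each polled record plays the role of the "second current record"
        if record["name"] == disease_name:
            return {field: record[field] for field in KNOWLEDGE_FIELDS}
    return None  # no record carries this disease name
```

A database-backed table would replace the loop with a keyed query; the linear scan is kept here only because it mirrors the polling wording of the text.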
Preferably, before using the first disease classification learning model, the method further comprises:
a model training module of the disease knowledge search system performs learning model training processing on the first disease classification learning model by using a semantic label and epidemic disease name training library: multiple groups of semantic label training data corresponding to specified epidemic disease name training data are extracted from the training library and input into the first disease classification learning model for training, yielding multiple training output data sets; the semantic label and epidemic disease name training library comprises a plurality of semantic label training data and a plurality of epidemic disease name training data; each epidemic disease name training data corresponds to a plurality of semantic label training data; each training output data set comprises training output disease name data and training output disease probability data;
the training of the learning model is successful when, among the multiple training output data sets, the training output disease name data with the highest training output disease probability data is the same as the specified epidemic disease name training data and that highest probability exceeds a set training probability threshold, and/or the degree of correlation between the other training output disease name data and the specified epidemic disease name training data exceeds a set training correlation threshold.
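The first success condition can be sketched as a small check. This is an assumption-laden illustration: the function name `training_succeeded` is hypothetical, and the alternative correlation-based condition is omitted because the text does not say how the degree of correlation is computed.

```python
from typing import List, Tuple


def training_succeeded(
    outputs: List[Tuple[str, float]],
    target_name: str,
    prob_threshold: float,
) -> bool:
    """Check only the probability-based criterion.

    outputs holds (disease name, probability) pairs from one training pass.
    """
    if not outputs:
        return False
    # Pick the output with the highest probability.
    best_name, best_prob = max(outputs, key=lambda pair: pair[1])
    return best_name == target_name and best_prob > prob_threshold
```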
Preferably, after the disease knowledge search system outputs the first search result data set, the method further comprises:
a scoring processing module of the disease knowledge search system receives a first set of scoring data; the first set of scoring data comprises a plurality of first scoring data; the first set of scoring data corresponds to the first set of search result data; the first scoring data corresponds to the first search result data;
taking the plurality of first semantic label data as newly added semantic label training data;
in the semantic label and epidemic disease name training library, taking the training disease name data corresponding to the first score data with the highest score as target training disease name data;
and adding the newly added semantic tag training data into the semantic tag and epidemic disease name training library, and establishing a corresponding relation between the newly added semantic tag training data and the target training disease name data.
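The feedback step above might look like the sketch below. It assumes the training library is a dictionary from disease name to a list of semantic tag groups and that one score is given per search result; the name `add_feedback` and this data layout are hypothetical.

```python
from typing import Dict, List, Sequence


def add_feedback(
    library: Dict[str, List[List[int]]],
    result_names: Sequence[str],
    scores: Sequence[float],
    new_tags: Sequence[int],
) -> str:
    """Attach the query's semantic tags to the highest-scored disease name."""
    if not result_names or len(result_names) != len(scores):
        raise ValueError("need exactly one score per search result")
    # Index of the search result that received the highest score.
    best = max(range(len(scores)), key=lambda i: scores[i])
    target = result_names[best]
    # Record the new tag group under the target disease name.
    library.setdefault(target, []).append(list(new_tags))
    return target
```

On a tie, `max` keeps the first highest-scored result; the text does not specify tie handling, so this is an arbitrary choice.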
A second aspect of an embodiment of the present invention provides a system for searching knowledge of a disease through speech, the system including:
the data preprocessing module is used for receiving the first voice data and performing first voice preprocessing on the first voice data to generate first sentence audio data;
the voice recognition module is used for performing first audio text recognition processing on the first sentence audio data to generate first sentence text data;
the semantic recognition module is used for performing first semantic tag identification processing on the first sentence text data to generate a first semantic tag data set; the first semantic tag data set comprises first tag type data and a plurality of first semantic tag data;
the disease learning module is used for performing first disease classification learning processing corresponding to the first label type data according to the plurality of first semantic label data to generate a plurality of first disease name data and corresponding first disease probability data;
the disease knowledge extraction module is used for inquiring a first name and related information corresponding relation table reflecting the corresponding relation between the disease name and the related information according to each first disease name data to generate a corresponding first disease knowledge data set;
the search result output module is used for combining each first disease name data, the corresponding first disease probability data and the corresponding first disease knowledge data set into first search result data; and forming a first search result data set by all the first search result data and outputting the first search result data set.
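How the six modules chain together can be sketched as a pipeline of callables. This is only a structural illustration: the class names `SearchResult` and `DiseaseSearchPipeline`, the field names, and the callable signatures are assumptions, and each module is left as a pluggable function rather than a real recognizer or classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SearchResult:
    disease_name: str
    probability: float
    knowledge: Dict[str, str]


@dataclass
class DiseaseSearchPipeline:
    preprocess: Callable[[bytes], bytes]          # data preprocessing module
    recognize: Callable[[bytes], str]             # voice recognition module
    tag: Callable[[str], Tuple[int, List[int]]]   # semantic recognition module
    classify: Callable[[int, List[int]], List[Tuple[str, float]]]  # disease learning module
    lookup: Callable[[str], Dict[str, str]]       # disease knowledge extraction module

    def search(self, voice: bytes) -> List[SearchResult]:
        """Search result output module: run the stages and bundle the results."""
        audio = self.preprocess(voice)
        text = self.recognize(audio)
        tag_type, tags = self.tag(text)
        ranked = self.classify(tag_type, tags)
        return [SearchResult(name, prob, self.lookup(name)) for name, prob in ranked]
```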
The embodiment of the invention provides a method and a system for searching disease knowledge through voice, which are based on a preset disease knowledge base and are additionally provided with a voice recognition function and a disease classification learning model, so that an unnecessary input process is saved for a user, the time for filtering and screening information is saved for the user, and the user experience and the information searching precision are improved.
Drawings
FIG. 1 is a diagram illustrating a method for searching knowledge about diseases through speech according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a system for searching disease knowledge through voice according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a method for searching disease knowledge through voice. As shown in FIG. 1, which is a schematic diagram of the method, the method mainly includes the following steps:
Step 1, a disease knowledge search system receives first voice data and performs first voice preprocessing on the first voice data to generate first sentence audio data;
specifically: a data preprocessing module of the disease knowledge search system receives the first voice data and performs first audio filtering and noise reduction processing on the first voice data to generate the first sentence audio data.
Here, the disease knowledge search system may be understood as a system with speech and semantic recognition and an intelligent knowledge base. It comprises a data preprocessing module, a voice recognition module, a semantic recognition module, a disease learning module and a disease knowledge extraction module. The data preprocessing module acquires, denoises and filters the original voice data. The voice recognition module performs speech recognition on the preprocessed audio data to obtain sentence text data. The semantic recognition module performs word segmentation and disease semantic recognition on the sentence text data, collecting all disease labels (the semantic labels) and determining the most probable disease type (the label type). The disease learning module selects a disease classification learning model according to the label type and inputs all collected disease labels into that model for deep learning, finally obtaining a plurality of possible disease names and their probabilities. The disease knowledge extraction module extracts the disease knowledge related to all possible diseases as the final voice search result.
Here, in this step, the first voice data comes from a voice recording device connected to the disease knowledge search system, or from a terminal device or server that stores original voice data. The data preprocessing module of the disease knowledge search system separates silence and noise from the speech using a Voice Activity Detection (VAD) algorithm, and cancels ambient noise, echo, reverberation and the like in the voice data using Least Mean Square (LMS) adaptive filtering, Wiener filtering and similar methods.
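To make the silence-separation idea concrete, the sketch below trims leading and trailing silence with a fixed energy threshold. It is a deliberately simple stand-in for a VAD algorithm and does not implement LMS or Wiener filtering; the function names, frame length and threshold are all assumptions chosen for illustration.

```python
from typing import List


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def trim_silence(
    samples: List[float], frame_len: int = 4, threshold: float = 0.01
) -> List[float]:
    """Keep the span from the first to the last frame whose energy exceeds the threshold."""
    energies = frame_energies(samples, frame_len)
    voiced = [i for i, energy in enumerate(energies) if energy > threshold]
    if not voiced:
        return []  # nothing above the threshold: treat the input as silence
    start = voiced[0] * frame_len
    end = (voiced[-1] + 1) * frame_len
    return samples[start:end]
```

Practical detectors typically use adaptive thresholds and spectral features; a fixed threshold is used here only to keep the example short.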
Step 2, performing first audio text recognition processing on the first sentence audio data to generate first sentence text data;
specifically: a voice recognition module of the disease knowledge search system inputs the first sentence audio data into a first acoustic language recognition model for recognition processing to generate the first sentence text data.
Here, commonly used forms of the first acoustic language recognition model used by the voice recognition module are: 1) a model composed of a Hidden Markov Model (HMM) + Gaussian Mixture Model (GMM) + N-Gram language model/Chinese Language Model (CLM); 2) a model composed of HMM + Deep Neural Network (DNN) + N-Gram/CLM. The first acoustic language recognition model extracts feature data from the input first sentence audio data, performs pronunciation matching on the feature data to obtain the pronunciation data sequence with maximum probability, and performs character and word recognition on that sequence to obtain the word string with maximum probability, namely the first sentence text data.
Step 3, performing first semantic tag identification processing on the first sentence text data to generate a first semantic tag data set;
wherein the first semantic tag data set comprises first tag type data and a plurality of first semantic tag data;
here, the semantic recognition module of the disease knowledge search system extracts, from the first sentence text data, a tag type and semantic tags related to known diseases;
specifically: step 31, the semantic recognition module of the disease knowledge search system inputs the first sentence text data into a first intelligent word segmentation recognition model for recognition processing to generate a plurality of first word segmentation data;
here, the first intelligent word segmentation recognition model used by the semantic recognition module of the disease knowledge search system is an algorithm model based on Natural Language Processing (NLP); commonly used models are: a forward Maximum Matching (MM) algorithm model, a Reverse Maximum Matching (RMM) algorithm model, a Bi-directional Maximum Matching (BM) algorithm model, an HMM algorithm model, and a Conditional Random Field (CRF) algorithm model;
here, NLP is a body of techniques in computer science and artificial intelligence for processing, understanding and using human language so as to achieve effective communication between humans and computers; NLP can broadly be divided into two parts: natural language decomposition processing and natural language generation processing; the embodiment of the invention mainly involves the decomposition part, specifically extracting segmented words from the first sentence text data using the first intelligent word segmentation recognition model based on NLP; a segmented word is the smallest word unit in a piece of text, and a piece of text comprises a plurality of segmented words;
for example, the first sentence text data is "my toothache and swollen gum"; the first intelligent word segmentation recognition model segments and refines it, with nouns and verbs added as refinement items, and the first word segmentation data finally obtained are: "my", "tooth", "pain", "get", "tooth", "bed", "swelling", plus "toothache" and "gum swelling";
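Of the segmentation models listed above, forward maximum matching is the simplest to illustrate. The sketch below is a minimal version under stated assumptions: the vocabulary is a hypothetical set of known words, unspaced Latin text stands in for Chinese text, and the refinement step that adds combined nouns and verbs is not reproduced.

```python
from typing import List, Set


def forward_max_match(text: str, vocabulary: Set[str], max_len: int) -> List[str]:
    """Greedy left-to-right segmentation that always takes the longest known word."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a known word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

Reverse maximum matching follows the same pattern but scans from the end of the text, and the bi-directional variant compares the two results.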
step 32, using the plurality of first word segmentation data to query a first word segmentation and semantic label correspondence table, which reflects the correspondence between segmented words and semantic labels, to obtain a plurality of first semantic label data;
the first word segmentation and semantic label correspondence table comprises a plurality of first word segmentation and semantic label correspondence records; each record comprises first word segmentation information and first semantic label information;
specifically: polling the first word segmentation and semantic label correspondence records in the table, and taking the currently polled record as a first current record;
performing first matching processing between the plurality of first word segmentation data and the first word segmentation information of the first current record: first word segmentation data are extracted in sequence from the plurality of first word segmentation data as first current word segmentation data, and the first matching processing succeeds when the first current word segmentation data is the same as the first word segmentation information; when the first matching processing succeeds, the first semantic label information of the first current record is extracted to generate first semantic label data;
here, the first word segmentation and semantic label correspondence table used by the semantic recognition module of the disease knowledge search system may be a database relation table or a data file; the table is used to attach disease semantic labels to natural language words, which reduces redundant data produced by repeated and near-synonymous expressions; the semantic labels here are in fact labels related to disease symptoms, e.g., 195 for poor dental nerve perception, 196 for gum pathology, 197 for tooth bleeding symptoms, 279 for chest discomfort, 280 for poor breathing, etc.;
for example, with the first word segmentation and semantic label correspondence table shown in Table 1 and the first word segmentation data "my", "tooth", "pain", "get", "tooth", "bed", "swelling", plus "toothache" and "gum swelling", two first semantic label data are obtained: 195 and 196;
Table 1 (Figure BDA0002861427170000101, image not reproduced)
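The record-by-record matching of step 32 can be sketched as below. Because the contents of Table 1 are not available in the text, the mapping used in the usage example (which word yields 195 and which yields 196) is an assumption, as are the function name `lookup_semantic_tags` and the choice to drop duplicate labels.

```python
from typing import List, Sequence, Tuple


def lookup_semantic_tags(
    tokens: Sequence[str], table: Sequence[Tuple[str, int]]
) -> List[int]:
    """Poll every (word, label) record and collect labels of words found in tokens."""
    tags: List[int] = []
    for record_word, record_tag in table:  # the "first current record"
        for token in tokens:               # the "first current word segmentation data"
            # Duplicate labels are skipped; the text does not say how repeats are handled.
            if token == record_word and record_tag not in tags:
                tags.append(record_tag)
    return tags
```

With the hypothetical table `[("toothache", 195), ("gum swelling", 196)]`, the segmented words of the example sentence produce the labels 195 and 196.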
Step 33, according to each first semantic tag data, inquiring a first semantic tag and tag type corresponding relation table reflecting the semantic tag and tag type corresponding relation, and generating corresponding first inquiry tag type data;
the first semantic label and label type corresponding relation table comprises a plurality of first semantic label and label type corresponding relation records; the first semantic label and label type corresponding relation record comprises second semantic label information and first label type information;
the method specifically comprises the following steps: polling a first semantic label and label type corresponding relation record of a first semantic label and label type corresponding relation table, and taking the currently polled first semantic label and label type corresponding relation record as a second current record;
when each first semantic label data is the same as second semantic label information of a second current record, extracting first label type information of the second current record as corresponding first query label type data;
here, the first semantic tag and tag type correspondence table used by the semantic identification module of the disease knowledge search system may be a database relationship table or a data file; querying the disease type corresponding to the disease semantic label through the first semantic label and label type corresponding relation table, wherein the disease type is actually a large class, for example, 11 represents dental related diseases, 21 represents heart related diseases, 31 represents respiratory related diseases, and the like;
for example, the table of correspondence between the first semantic tag and the tag type is shown in table two, and two first semantic tag data: 195 and 196, the two first query tag type data obtained are 11, 11;
Table 2 (Figure BDA0002861427170000111, image not reproduced)
Step 34, merging the first query tag type data with the same type into a type group in all the first query tag type data, and taking the tag type corresponding to the type group containing the most first query tag type data as the first tag type data;
here, the most probable disease category is selected from the candidate disease categories;
for example, the two first semantic tag data are 195 and 196, and the corresponding first query tag type data are 11 and 11; these form a single type group containing 11, 11; this group contains the most first query tag type data and its tag type is 11, so the first tag type data is 11;
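Steps 33 and 34 amount to a lookup followed by a majority vote, which can be sketched with `collections.Counter`. The function name `dominant_tag_type` is hypothetical, and apart from 195 and 196 mapping to type 11 (taken from the example), any mapping passed in is an assumption because Table 2 is not available in the text.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def dominant_tag_type(
    tags: Sequence[int], tag_to_type: Dict[int, int]
) -> Optional[int]:
    """Map each semantic tag to its type and return the most frequent type."""
    types = [tag_to_type[tag] for tag in tags if tag in tag_to_type]
    if not types:
        return None  # no tag could be mapped to a type
    # most_common(1) returns the type of the largest group.
    return Counter(types).most_common(1)[0][0]
```

When two groups are equally large, `Counter.most_common` returns the type that was seen first; the text does not define a tie-breaking rule.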
step 35, forming a plurality of first semantic tag data by all the first semantic tag data; the first semantic tag data set is composed of first tag type data and a plurality of first semantic tag data.
Here, through steps 31-35 of step 3, the semantic recognition module of the disease knowledge search system performs further semantic analysis on the first sentence text data obtained in step 2; the resulting first semantic tag data set includes the most probable disease category, that is, the first tag type data, and all symptom-related semantic tags extracted from the original sentence.
Step 4, according to the plurality of first semantic tag data, performing first disease classification learning processing corresponding to the first tag type data to generate a plurality of first disease name data and corresponding first disease probability data;
the method specifically comprises the following steps: a disease learning module of the disease knowledge search system determines a corresponding first disease classification learning model according to the first label type data; inputting a plurality of first semantic label data into a first disease classification learning model for learning to obtain a plurality of groups of first learning output data groups; each set of first learning output data includes first disease name data and corresponding first disease probability data.
Here, the disease knowledge search system may have a plurality of disease classification learning models, such as a dental disease classification learning model for dental-related diseases, a cardiac disease classification learning model for heart-related diseases, a respiratory disease classification learning model for respiratory-related diseases, and the like. Before each disease classification learning model is used, the model training module of the disease knowledge search system must train it to maturity using the semantic label and epidemic disease name training library. The disease classification learning model commonly adopts a random forest model, which classifies the input data and outputs a plurality of possible classification results together with the probability of each. For example, if the first label type data is 11, the corresponding disease classification learning model, namely the dental disease classification learning model, is selected, and learning is performed on the two first semantic label data 195 and 196; the final results are: periodontitis with a probability of 44%, gingivitis with a probability of 10.27%, pulpitis with a probability of 8.57%, and caries with a probability of 4.11%.
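The selection of a model by label type and the ranked output can be sketched as follows. The classifier here is a plain tag-overlap scorer used as a stand-in, not a random forest, and its scores will not reproduce the percentages quoted above; the names `TagOverlapModel` and `classify` and the training tags in any usage are hypothetical and carry no medical meaning.

```python
from typing import Dict, List, Sequence, Tuple


class TagOverlapModel:
    """Stand-in classifier (not a random forest): scores by shared tags."""

    def __init__(self, training: Dict[str, List[int]]):
        self.training = training  # disease name -> known semantic tags

    def predict(self, tags: Sequence[int]) -> List[Tuple[str, float]]:
        # Count how many query tags each disease shares, then normalize to sum to 1.
        scores = {
            name: len(set(tags) & set(known))
            for name, known in self.training.items()
        }
        total = sum(scores.values())
        if total == 0:
            return []
        ranked = [(name, s / total) for name, s in scores.items() if s > 0]
        return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def classify(
    tag_type: int, tags: Sequence[int], models: Dict[int, TagOverlapModel]
) -> List[Tuple[str, float]]:
    """Pick the model registered for the label type and rank its outputs."""
    if tag_type not in models:
        raise KeyError(f"no classification model for label type {tag_type}")
    return models[tag_type].predict(tags)
```

Keeping one model per label type, as the text describes, lets each model be trained only on the symptoms of its own disease category.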
Step 5, according to each first disease name data, inquiring a first name and related information corresponding relation table reflecting the corresponding relation between the disease name and the related information of the disease, and generating a corresponding first disease knowledge data set;
the first name and related information corresponding relation table comprises a plurality of first name and related information corresponding relation records; the first name and related information corresponding relation record comprises first disease name information, first disease definition information, first disease symptom information, first disease cause information, first disease diagnosis mode information, first disease clinical expression information and first disease treatment mode information; the first disease knowledge data set at least comprises first disease definition data, first disease symptom data, first disease cause data, first disease diagnosis mode data, first disease clinical presentation data and first disease treatment mode data;
the method specifically comprises the following steps: a disease knowledge extraction module of the disease knowledge search system polls all first name and related information corresponding relation records of the first name and related information corresponding relation table according to each first disease name data, and takes the currently polled first name and related information corresponding relation record as a second current record;
when each first disease name data is the same as the first disease name information of the second current record, extracting first disease definition information from the second current record as corresponding first disease definition data, extracting first disease symptom information as corresponding first disease symptom data, extracting first disease cause information as corresponding first disease cause data, extracting first disease diagnosis mode information as corresponding first disease diagnosis mode data, extracting first disease clinical expression information as corresponding first disease clinical expression data, and extracting first disease treatment mode information as corresponding first disease treatment mode data;
and a corresponding first disease knowledge data set is composed of first disease definition data, first disease symptom data, first disease cause data, first disease diagnosis mode data, first disease clinical presentation data and first disease treatment mode data.
Here, the first name and related information correspondence table used by the disease knowledge extraction module of the disease knowledge search system is in effect a disease knowledge base, which may be a relational database, a set of database relationship tables, or a set of data files; in the first name and related information correspondence table, each first name and related information correspondence record holds the related information of one disease, including its name, definition, common symptoms, etiology and inducement, diagnosis mode, clinical manifestation, treatment mode and the like; by using the first disease name data as the query keyword, all of the related information can be extracted from the first name and related information correspondence table;
for example, 4 sets of first disease name data and corresponding first disease probability data are obtained from step 4: periodontitis with a probability of 44%, gingivitis with a probability of 10.27%, pulpitis with a probability of 8.57%, and caries with a probability of 4.11%; step 5 then yields 4 first disease knowledge data sets, one each for periodontitis, gingivitis, pulpitis and caries, each including definitions, common symptoms, etiologies and causes, diagnostic modalities, clinical manifestations, treatment modalities, etc.
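The polling lookup of Step 5 can be sketched as follows. This is a minimal illustration, assuming the correspondence table is held in memory as a list of dictionaries; the field names in `KNOWLEDGE_FIELDS` and the function `extract_knowledge` are hypothetical and merely mirror the six information items listed in the text.

```python
# Field names are assumptions mirroring the six information items in the text.
KNOWLEDGE_FIELDS = (
    "definition", "symptoms", "causes",
    "diagnosis", "clinical_presentation", "treatment",
)


def extract_knowledge(disease_names, table):
    """Return {disease name: knowledge data set} for names found in the table.

    `table` is a list of dicts, each with a "name" key plus KNOWLEDGE_FIELDS.
    """
    knowledge = {}
    for name in disease_names:
        for record in table:  # the "second current record" being polled
            if record["name"] == name:
                knowledge[name] = {f: record[f] for f in KNOWLEDGE_FIELDS}
                break  # first matching record wins
    return knowledge
```

The linear scan mirrors the polling described in the text; a knowledge base backed by a relational database would more likely use an indexed query on the disease name.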
Step 6, forming first search result data by each first disease name data, the corresponding first disease probability data and the corresponding first disease knowledge data set.
For example, the disease knowledge extraction module of the disease knowledge search system obtains 4 first search result data from 4 sets of first disease name data and corresponding first disease probability data, and 4 first disease knowledge data sets:
1st first search result data: periodontitis, probability 44%, disease knowledge data set for periodontitis (including definitions, common symptoms, etiologies and causes, diagnostic modalities, clinical manifestations, treatment modalities, etc.);
2nd first search result data: gingivitis, probability 10.27%, disease knowledge data set for gingivitis (including definitions, common symptoms, etiologies and causes, diagnostic modalities, clinical manifestations, treatment modalities, etc.);
3rd first search result data: pulpitis, probability 8.57%, disease knowledge data set for pulpitis (including definitions, common symptoms, etiologies and causes, diagnostic modalities, clinical manifestations, treatment modalities, etc.);
4th first search result data: caries, probability 4.11%, disease knowledge data set for caries (including definitions, common symptoms, etiologies and causes, diagnostic modalities, clinical manifestations, treatment modalities, etc.).
Step 7, forming a first search result data set by all the first search result data and outputting the first search result data set.
Here, the disease knowledge extraction module of the disease knowledge search system assembles all the obtained first search result data into a first search result data set to be fed back to the user.
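Steps 6 and 7 amount to joining the classifier output with the knowledge data sets. The sketch below shows one way to do this; the function `build_search_results`, the dictionary keys, and the sort by probability are assumptions for illustration (the text shows the results in descending probability order but does not state a sorting rule).

```python
def build_search_results(classified, knowledge):
    """Combine (name, probability) pairs with knowledge sets into a result set.

    classified: list of (disease_name, probability) from Step 4.
    knowledge:  dict mapping disease_name -> knowledge data set from Step 5.
    """
    results = []
    for name, probability in classified:
        results.append({
            "disease_name": name,
            "probability": probability,
            # A name missing from the knowledge base gets an empty set here;
            # the text does not say how that case is handled.
            "knowledge": knowledge.get(name, {}),
        })
    # Sorting by probability is an assumption made for this sketch.
    results.sort(key=lambda r: r["probability"], reverse=True)
    return results
```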
In addition, the disease knowledge search system further comprises a model training module, and in the embodiment of the invention, before each disease classification learning model is put into use, the model training module needs to use the semantic label and the epidemic disease name training library to train each disease classification learning model, wherein the training process is briefly described as follows:
Step A1, a model training module of the disease knowledge search system extracts multiple groups of semantic label training data corresponding to specified epidemic disease name training data from the semantic label and epidemic disease name training library, and inputs the semantic label training data into the first disease classification learning model for training to obtain multiple training output data sets;
the semantic label and epidemic disease name training library comprises a plurality of semantic label training data and a plurality of epidemic disease name training data; each epidemic disease name training data corresponds to a plurality of semantic label training data; the training output data set includes training output disease name data and training output disease probability data.
The training data in the semantic label and the epidemic disease name training database are verified data, wherein the corresponding relation between the semantic label training data and the epidemic disease name training data is verified to be correct; the data of the semantic label and the epidemic disease name training library can be effective test data provided by a third-party testing organization, and can also be medical data acquired from a medical organization; the larger the training data amount is, the more accurate the corresponding relation is, and the higher the precision of the trained model is.
Step A2, when the training output disease name data corresponding to the training output disease probability data with the highest probability among the multiple training output data sets is the same as the specified epidemic disease name training data, and the training output disease probability data with the highest probability exceeds the set training probability threshold, and/or the correlation between the training output disease name data and the specified epidemic disease name training data exceeds the set training correlation threshold, the training of the learning model is successful.
Here, the conditions for terminating model training are described: on the premise that the specified disease name data appears in the output with the highest probability, that probability must be high enough, with exceeding the set training probability threshold as the criterion; in addition, the correlation between the other classification results and the main classification result may be considered, since a higher correlation indicates higher calculation accuracy of the model, with exceeding the set training correlation threshold as the criterion.
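The Step A2 stopping condition can be sketched as a simple predicate. The function `training_succeeded` and its parameters are hypothetical. The text does not define how correlation is measured, so the sketch takes a caller-supplied function; and where the text says "and/or", the sketch adopts the stricter "and" reading whenever a correlation function is given.

```python
def training_succeeded(outputs, target_name, prob_threshold,
                       correlation=None, corr_threshold=None):
    """Check the Step A2 stopping condition for one specified disease name.

    outputs: list of (disease_name, probability) produced by the model.
    correlation: optional callable (name, target_name) -> float; supplied by
        the caller because the text does not define the measure.
    """
    if not outputs:
        return False
    top_name, top_prob = max(outputs, key=lambda pair: pair[1])
    top_ok = top_name == target_name and top_prob > prob_threshold
    if correlation is None or corr_threshold is None:
        return top_ok
    others = [name for name, _ in outputs if name != top_name]
    corr_ok = all(correlation(name, target_name) > corr_threshold
                  for name in others)
    # The source says "and/or"; this sketch requires both conditions.
    return top_ok and corr_ok
```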
In addition, the disease knowledge search system further includes a score processing module, and after the first search result data set is output, the score processing module automatically enriches the semantic label and the epidemic disease name training library according to the score of the user on the output result, which is specifically described as follows:
Step B1, a scoring processing module of the disease knowledge search system receives a first scoring data set;
wherein the first set of scoring data comprises a plurality of first scoring data; the first set of scoring data corresponds to the first set of search result data; the first score data corresponds to the first search result data.
For example, after the disease knowledge search system displays the 4 first search result data to the user, it also provides the user with an evaluation function with three rating levels: best (most consistent), general (generally consistent), and not (not consistent); if the user rates the 1st first search result data as best, the 2nd as general, and the 3rd and 4th as not, the score processing module obtains 4 first score data in the first score data set: best, general, not, not.
Step B2, taking the plurality of first semantic label data as newly added semantic label training data; in the semantic label and epidemic disease name training library, taking the training disease name data corresponding to the first score data with the highest score as target training disease name data; and adding the newly added semantic label training data to the semantic label and epidemic disease name training library, and establishing a correspondence between the newly added semantic label training data and the target training disease name data.
For example, if, among the 4 first score data of the first score data set, the highest score (best) belongs to the 1st first search result data, whose disease name data is "periodontitis", this step adds the two first semantic label data obtained from the user's current voice input, 195 and 196, to the semantic label and epidemic disease name training library and associates them with the training disease name data "periodontitis" in the library; in effect, this adds valid training data to the semantic label and epidemic disease name training library.
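Steps B1 and B2 can be sketched as follows. The layout of the training library (a dictionary from disease name to a list of label tuples), the numeric ranks in `SCORE_RANK`, and the function `enrich_training_library` are assumptions for illustration; only the three rating levels and the labels 195 and 196 come from the text.

```python
# Rating levels from the example; the numeric ranks are assumed.
SCORE_RANK = {"best": 2, "general": 1, "not": 0}


def enrich_training_library(library, semantic_labels, results, scores):
    """Append the current labels under the best-scored disease name.

    library: dict mapping disease name -> list of label tuples (assumed layout).
    results: list of disease names in display order.
    scores:  list of rating strings aligned with `results`.
    Returns the chosen disease name, or None if the scores are unusable.
    """
    if not results or len(results) != len(scores):
        return None
    # On ties, the earliest (highest-ranked) result is chosen.
    best_index = max(range(len(scores)), key=lambda i: SCORE_RANK[scores[i]])
    target = results[best_index]
    library.setdefault(target, []).append(tuple(semantic_labels))
    return target
```

A production system would presumably also decide what to do when no result is rated best; the text does not cover that case, and this sketch simply uses whichever rating is highest.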
A second embodiment of the present invention provides a system for searching disease knowledge through voice, which implements the functions of the disease knowledge search system in the above embodiment. Fig. 2 is a schematic structural diagram of the system; as shown in Fig. 2, the system 20 mainly includes: a data preprocessing module 201, a voice recognition module 202, a semantic recognition module 203, a disease learning module 204, a disease knowledge extraction module 205, and a search result output module 206.
The data preprocessing module 201 is configured to receive first voice data, perform first voice preprocessing on the first voice data, and generate first sentence audio data.
The voice recognition module 202 is configured to perform a first audio character recognition process on the first sentence audio data to generate first sentence character data.
The semantic recognition module 203 is configured to perform a first semantic tag identification process on the first sentence character data to generate a first semantic tag data set; the first semantic tag data set includes first tag type data and a plurality of first semantic tag data.
The disease learning module 204 is configured to perform a first disease classification learning process corresponding to the first tag type data according to the plurality of first semantic tag data, and generate a plurality of first disease name data and corresponding first disease probability data.
The disease knowledge extraction module 205 is configured to query, according to each first disease name data, a first name and related information correspondence table that reflects a correspondence between disease names and related information, and generate a corresponding first disease knowledge data set.
The search result output module 206 is configured to combine each first disease name data, and the corresponding first disease probability data and first disease knowledge data set thereof into first search result data; and forming a first search result data set by all the first search result data and outputting the first search result data set.
Here, in the system for searching disease knowledge through voice provided in the second embodiment of the present invention, each module has the same function as the corresponding module of the disease knowledge search system in the first embodiment, so the functions are not described again.
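The data flow between modules 201 to 206 can be sketched as a chain of function calls. The function `run_search` and its parameter names are hypothetical; each module is passed in as a callable so that the sketch makes no claim about how any individual module is implemented.

```python
def run_search(voice_data, preprocess, recognize_speech, recognize_semantics,
               classify, extract_knowledge, build_results):
    """Chain modules 201 to 206 in order; every module is a supplied callable."""
    sentence_audio = preprocess(voice_data)                 # module 201
    sentence_text = recognize_speech(sentence_audio)        # module 202
    tag_type, labels = recognize_semantics(sentence_text)   # module 203
    classified = classify(tag_type, labels)                 # module 204
    names = [name for name, _ in classified]
    knowledge = extract_knowledge(names)                    # module 205
    return build_results(classified, knowledge)             # module 206
```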
The embodiment of the invention provides a method and a system for searching disease knowledge through voice, which are based on a preset disease knowledge base and are additionally provided with a voice recognition function and a disease classification learning model, so that an unnecessary input process is saved for a user, the time for filtering and screening information is saved for the user, and the user experience and the information searching precision are improved.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for searching knowledge of a disease by speech, the method comprising:
the disease knowledge search system receives the first voice data, performs first voice preprocessing on the first voice data and generates first sentence audio data;
performing first audio word recognition processing on the first sentence audio data to generate first sentence word data;
performing first semantic tag identification processing on the first statement text data to generate a first semantic tag data set; the first set of semantic tag data comprises first tag type data and a plurality of first semantic tag data;
according to the multiple first semantic label data, performing first disease classification learning processing corresponding to the first label type data to generate multiple first disease name data and corresponding first disease probability data;
according to each first disease name data, inquiring a first name and related information corresponding relation table reflecting the corresponding relation between the disease name and the related information of the disease to generate a corresponding first disease knowledge data set;
forming first search result data by each first disease name data, the first disease probability data corresponding to the first disease name data and the first disease knowledge data set;
and forming a first search result data set by all the first search result data and outputting the first search result data set.
2. The method for searching knowledge about diseases through voice according to claim 1, wherein the disease knowledge searching system receives first voice data and performs first voice preprocessing on the first voice data to generate first sentence audio data, and specifically comprises:
and a data preprocessing module of the disease knowledge search system receives the first voice data, and performs first audio filtering and noise reduction processing on the first voice data to generate the first statement audio data.
3. The method for searching knowledge about diseases through voice according to claim 1, wherein the performing a first audio word recognition process on the first sentence audio data to generate first sentence word data specifically includes:
and the voice recognition module of the disease knowledge search system inputs the first sentence audio data into a first acoustic language recognition model for recognition processing to generate the first sentence text data.
4. The method for searching knowledge about diseases through voice according to claim 1, wherein the performing a first semantic tag recognition process on the first sentence text data to generate a first semantic tag data set specifically includes:
the semantic recognition module of the disease knowledge search system inputs the first sentence character data into a first intelligent word segmentation recognition model for recognition processing to generate a plurality of first word segmentation data;
using the plurality of first participle data to query a first participle and semantic label corresponding relation table reflecting the corresponding relation between the participle and the semantic label to obtain a plurality of first semantic label data;
according to each piece of first semantic label data, inquiring a first semantic label and label type corresponding relation table reflecting the corresponding relation of semantic labels and label types, and generating corresponding first inquiry label type data;
combining the first query tag type data with the same type into a type group in all the first query tag type data, and taking the tag type corresponding to the type group containing the first query tag type data with the largest quantity as the first tag type data;
composing the plurality of first semantic tag data from all of the first semantic tag data; and forming the first semantic tag data set by the first tag type data and the plurality of first semantic tag data.
5. The method of claim 4, wherein the using the plurality of first participle data to query a first participle and semantic label corresponding relation table reflecting the corresponding relation between the participle and the semantic label to obtain a plurality of first semantic label data specifically comprises:
polling all first participle and semantic label corresponding relation records in the first participle and semantic label corresponding relation table, and taking the currently polled first participle and semantic label corresponding relation record as a first current record; the first participle and semantic label corresponding relation table comprises a plurality of first participle and semantic label corresponding relation records; the first word segmentation and semantic label corresponding relation record comprises first word segmentation information and first semantic label information;
performing first matching processing with the first word segmentation information of the first current record by using the plurality of first word segmentation data; sequentially extracting first word segmentation data from the plurality of first word segmentation data to serve as first current word segmentation data; when the first current participle data is the same as the first participle information, the first matching processing is successful;
and when the first matching processing is successful, extracting the first semantic label information of the first current record to generate the first semantic label data.
6. The method of claim 1, wherein the performing a first disease classification learning process corresponding to the first tag type data according to the plurality of first semantic tag data to generate a plurality of first disease name data and corresponding first disease probability data specifically includes:
a disease learning module of the disease knowledge search system determines a corresponding first disease classification learning model according to the first label type data; inputting the plurality of first semantic label data into the first disease classification learning model for learning to obtain a plurality of groups of first learning output data groups; each set of the first learning output data includes the first disease name data and the corresponding first disease probability data.
7. The method for searching knowledge of illness by speech according to claim 1,
the first name and related information corresponding relation table comprises a plurality of first name and related information corresponding relation records; the first name and related information corresponding relation record comprises first disease name information, first disease definition information, first disease symptom information, first disease cause information, first disease diagnosis mode information, first disease clinical expression information and first disease treatment mode information;
the first disease knowledge data set at least comprises first disease definition data, first disease symptom data, first disease cause data, first disease diagnosis mode data, first disease clinical presentation data and first disease treatment mode data;
the querying, according to each of the first disease name data, a first name and related information correspondence table that reflects a correspondence between a disease name and disease related information, and generating a corresponding first disease knowledge data set specifically include:
a disease knowledge extraction module of the disease knowledge search system polls all first name and related information corresponding relation records of the first name and related information corresponding relation table according to each first disease name data, and takes the currently polled first name and related information corresponding relation record as a second current record;
when each first disease name data is the same as the first disease name information of the second current record, extracting the first disease definition information as the corresponding first disease definition data, extracting the first disease symptom information as the corresponding first disease symptom data, extracting the first disease cause information as the corresponding first disease cause data, extracting the first disease diagnosis mode information as the corresponding first disease diagnosis mode data, extracting the first disease clinical expression information as the corresponding first disease clinical expression data, and extracting the first disease treatment mode information as the corresponding first disease treatment mode data from the second current record;
and the corresponding first disease knowledge data set is composed of the first disease definition data, the first disease symptom data, the first disease cause data, the first disease diagnosis mode data, the first disease clinical presentation data and the first disease treatment mode data.
8. The method for searching knowledge of illness through speech of claim 6, wherein prior to using the first illness classification learning model, the method further comprises:
the model training module of the disease knowledge search system performs learning model training processing on the first disease classification learning model by using semantic labels and an epidemic disease name training library; extracting multiple groups of semantic label training data corresponding to the specified epidemic disease name training data from the semantic label and epidemic disease name training library, and inputting the semantic label training data into the first disease classification learning model for training to obtain multiple groups of training output data groups; the semantic label and epidemic disease name training library comprises a plurality of semantic label training data and a plurality of epidemic disease name training data; each epidemic disease name training data corresponds to a plurality of semantic label training data; the training output data set comprises training output disease name data and training output disease probability data;
when the training output disease name data corresponding to the training output disease probability data with the highest probability among the multiple groups of training output data sets is the same as the specified epidemic disease name training data, and the training output disease probability data with the highest probability exceeds a set training probability threshold, and/or the correlation between the other training output disease name data and the specified epidemic disease name training data exceeds a set training correlation threshold, the training of the learning model is successful.
9. The method for searching knowledge of illness through speech of claim 8, wherein after the illness knowledge search system outputs the first search result data set, the method further comprises:
a scoring processing module of the disease knowledge search system receives a first set of scoring data; the first set of scoring data comprises a plurality of first scoring data; the first set of scoring data corresponds to the first set of search result data; the first scoring data corresponds to the first search result data;
taking the plurality of first semantic label data as newly-added semantic label training data;
in the semantic label and epidemic disease name training library, taking the training disease name data corresponding to the first score data with the highest score as target training disease name data;
and adding the newly added semantic tag training data into the semantic tag and epidemic disease name training library, and establishing a corresponding relation between the newly added semantic tag training data and the target training disease name data.
10. A system for searching knowledge of diseases by voice, the system comprising:
the data preprocessing module is used for receiving the first voice data, performing first voice preprocessing on the first voice data and generating first statement audio data;
the voice recognition module is used for performing first audio character recognition processing on the first sentence audio data to generate first sentence character data;
the semantic identification module is used for carrying out first semantic tag identification processing on the first statement character data to generate a first semantic tag data set; the first set of semantic tag data comprises first tag type data and a plurality of first semantic tag data;
the disease learning module is used for performing first disease classification learning processing corresponding to the first label type data according to the plurality of first semantic label data to generate a plurality of first disease name data and corresponding first disease probability data;
the disease knowledge extraction module is used for inquiring a first name and related information corresponding relation table reflecting the corresponding relation between the disease name and the related information according to each first disease name data to generate a corresponding first disease knowledge data set;
the search result output module is used for combining each first disease name data, the corresponding first disease probability data and the corresponding first disease knowledge data set into first search result data; and forming a first search result data set by all the first search result data and outputting the first search result data set.
CN202011567638.5A | 2020-12-25 | 2020-12-25 | Method and system for searching disease knowledge through voice | Active | CN112735475B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011567638.5A (CN112735475B (en)) | 2020-12-25 | 2020-12-25 | Method and system for searching disease knowledge through voice

Publications (2)

Publication Number | Publication Date
CN112735475A (en) | 2021-04-30
CN112735475B (en) | 2023-02-21

Family

ID=75616691

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202011567638.5A (Active, CN112735475B (en)) | Method and system for searching disease knowledge through voice | 2020-12-25 | 2020-12-25

Country Status (1)

Country | Link
CN (1) | CN112735475B (en)

Also Published As

Publication number | Publication date
CN112735475B (en) | 2023-02-21

Similar Documents

Publication | Title
CN112750465B (en) | Cloud language ability evaluation system and wearable recording terminal
Syed et al. | Automated recognition of Alzheimer's dementia using bag-of-deep-features and model ensembling
CN105845134B (en) | Spoken language evaluation method and system for freely reading question types
CN108491486B (en) | Method, device, terminal equipment and storage medium for simulating patient inquiry dialogue
Lipping et al. | Crowdsourcing a dataset of audio captions
CN109637674B (en) | Method, system, medium, and apparatus for automatically obtaining answers to health care questions
Levitan et al. | Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection
Gontijo et al. | Grapheme-phoneme probabilities in British English
CN109192194A (en) | Voice data mask method, device, computer equipment and storage medium
CN113689951B (en) | Intelligent diagnosis guidance method, system and computer-readable storage medium
Tasnim et al. | Depac: a corpus for depression and anxiety detection from speech
CN106897559A (en) | A kind of symptom and sign class entity recognition method and device towards multi-data source
CA3179063A1 (en) | Machine learning systems and methods for multiscale Alzheimer's dementia recognition through spontaneous speech
CN113593523B (en) | Speech detection method and device based on artificial intelligence and electronic equipment
Sindhu et al. | Automatic speech and voice disorder detection using deep learning - a systematic literature review
CN110675292A (en) | Child language ability evaluation method based on artificial intelligence
Wagner et al. | Applying cooperative machine learning to speed up the annotation of social signals in large multi-modal corpora
WO2022257630A1 (en) | Risk detection method and apparatus based on multi-modal concealed information test
Ren et al. | Evaluation of the pain level from speech: Introducing a novel pain database and benchmarks
CN118116532A (en) | RTHD electronic medical record generation method and man-machine interaction system
CN112735475B (en) | Method and system for searching disease knowledge through voice
CN113761899B (en) | A medical text generation method, device, equipment and storage medium
CN110033778B (en) | A real-time recognition and correction system for lying state
Hsiao et al. | A Text-Dependent End-to-End Speech Sound Disorder Detection and Diagnosis in Mandarin-Speaking Children
CN112562856B (en) | Method and system for searching health knowledge through voice

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
