CN108804423A - Medical text feature extraction and automatic matching method and system - Google Patents


Info

Publication number: CN108804423A
Authority: CN (China)
Prior art keywords: word, medical, vector, modular, distance
Legal status: Granted (as listed by Google Patents; an assumption, not a legal conclusion)
Application number: CN201810537989.8A
Other languages: Chinese (zh)
Other versions: CN108804423B
Inventors: 陈娴娴, 丁睿, 汤时虎
Current assignee: Shenzhen Ping An Medical Health Technology Service Co Ltd (as listed by Google Patents; may be inaccurate)
Original assignee: Ping An Medical and Healthcare Management Co Ltd

Events:
Application filed by Ping An Medical and Healthcare Management Co Ltd
Priority to CN201810537989.8A
Publication of CN108804423A
Application granted
Publication of CN108804423B
Legal status: Active

Abstract

This disclosure relates to a medical text feature extraction and automatic matching method and system. The method includes: Step 1, extracting medical text from externally input medical data and performing word segmentation on it, to obtain medical words to be matched against the modular (standard) words in a controlled term list; Step 2, for each medical word, obtaining through a word-vector operation the N-dimensional vector of each morpheme in the medical word, forming an M × N matrix for the medical word, where M is the number of morphemes in the medical word; Step 3, reducing the M × N matrix of the medical word to a vector, generating the reduced vector; Step 4, calculating the vector distance between the reduced vector and the vector of each modular word in the controlled term list; Step 5, sorting the calculated vector distances in ascending order and selecting the one or more modular words closest to the reduced vector as candidate modular words.

Description

Medical text feature extraction and automatic matching method and system
Technical field
The present invention relates to the field of Internet service technology, and in particular to a medical text feature extraction and automatic matching method and system.
Background
Connecting external medical text big data with the internal information of commercial entities (such as insurance institutions) is a hot research topic in the health field in the context of big data and deep learning. For external medical data, classification according to existing category standards plays a very important role in standardizing the connection between internal and external information.
Because medical entities differ considerably from ordinary entities, open-domain entity recognition designed for general corpora is difficult to apply to medical text, and judging medical terms requires professionals, which greatly increases the cost of identifying and matching medical text.
Currently, most conventional methods rely on search engines, text similarity, simple edit distance and the like, supplemented by manually extracted rules. They suffer from mutually conflicting rules, cumbersome operation, very low efficiency and a very low matching rate.
Because rules pervade these methods, they port poorly between scenarios: switching scenarios requires rewriting the rule code, the workload is huge, and later decision models cannot be supported. Moreover, as massive medical big data keeps pouring in, its scale far exceeds what traditional rule extraction can handle.
Deep learning has made major progress in recent years and has been shown to discover complex structure in high-dimensional data. Word vectors (word embeddings) have in recent years commonly replaced the traditional bag-of-words representation, solving the curse of dimensionality that bag-of-words representations bring. However, no technique has yet been reported that normalizes and matches medical text on the basis of word vectors.
Summary of the invention
In view of the above problems of the prior art, the inventors made the present invention, which achieves semantic recognition below the word level in medical text and improves the generalization ability of the model, so that it can switch freely between scenarios, is highly portable, and greatly reduces human-resource consumption.
According to an embodiment of the invention, a medical text feature extraction and automatic matching method is provided, characterized by comprising the following steps:
Step 1: extracting medical text from externally input medical data and performing word segmentation on it, to obtain medical words to be matched against the modular words in a controlled term list;
Step 2: for each medical word, obtaining through a word-vector operation the N-dimensional vector corresponding to each morpheme in the medical word, forming an M × N matrix corresponding to the medical word, where M is the number of morphemes contained in the medical word;
Step 3: reducing the M × N matrix corresponding to the medical word to a vector, generating the reduced vector;
Step 4: calculating the vector distance between the reduced vector and the vector corresponding to each modular word in the controlled term list;
Step 5: sorting the calculated vector distances in ascending order, and selecting from the modular words of the controlled term list the one or more modular words closest to the reduced vector as candidate modular words.
According to an embodiment of the invention, the method further comprises the following step:
Step 6: calculating the logical inclusion distance between the medical word and each candidate modular word, and taking the candidate modular word with the smallest logical inclusion distance as the modular word finally matched to the medical word.
Optionally, according to an embodiment of the invention, the method further comprises the following step:
Step 6: calculating the edit distance between the medical word and each candidate modular word, and taking the candidate modular word with the smallest edit distance as the modular word finally matched to the medical word.
Optionally, according to an embodiment of the invention, the method further comprises the following step:
Step 6: calculating the logical inclusion distance and the edit distance between the medical word and each candidate modular word, computing a weighted sum of the two, and taking the candidate modular word with the largest weighted sum as the modular word finally matched to the medical word.
According to an embodiment of the invention, in step 3 the dimensionality reduction is performed by pooling, the pooling method being one or more of mean pooling, max pooling and min pooling;
when one of mean pooling, max pooling and min pooling is used, the M × N matrix corresponding to the medical word is reduced to a 1 × N vector, which serves as the reduced vector;
when several of them are used, the pooled vectors are concatenated to form the reduced vector.
According to an embodiment of the invention, the method further comprises, after step 1:
Step 1-1: determining by text comparison whether the medical word is identical to some modular word in the controlled term list; if so, matching the medical word to that modular word and ending the method directly.
According to an embodiment of the invention, the modular words and the medical words carry attribute labels, and in step 4 the vector distance is calculated only between the reduced vector of the medical word and the vectors of those modular words in the controlled term list that carry the same attribute label as the medical word.
According to an embodiment of the invention, the vector distance is the Euclidean distance.
According to an embodiment of the invention, a medical text feature extraction and automatic matching system for executing the above method is further provided, characterized by comprising a word segmentation module, a word vector module, a dimensionality reduction module and a matching module, wherein:
the word segmentation module is used to extract medical text from externally input medical data and to perform word segmentation on it, obtaining medical words to be matched against the modular words in the controlled term list;
the word vector module is used to obtain, through a word-vector operation, the N-dimensional vector corresponding to each morpheme in the medical word, forming an M × N matrix, where M is the number of morphemes contained in the medical word;
the dimensionality reduction module is used to reduce the M × N matrix corresponding to the medical word to a vector, generating the reduced vector;
the matching module is used to:
calculate the vector distance between the reduced vector and the vector corresponding to each modular word in the controlled term list;
sort the calculated vector distances in ascending order and select, from the modular words of the controlled term list, the one or more modular words closest to the reduced vector as candidate modular words;
calculate the logical inclusion distance and/or the edit distance between the medical word and each candidate modular word, and select one of the candidate modular words, according to the result, as the modular word finally matched to the medical word.
According to an embodiment of the invention, a computer-readable storage medium is further provided, on which a program for the above method is stored; when the program is executed by a processor, the steps of the method are performed.
The beneficial effects of the invention mainly lie in the following. Feature extraction efficiency is improved and semantic recognition below the word level in medical text is achieved; because the model is not rule-based, its generalization ability is greatly improved, so it can switch freely between scenarios, is highly portable and greatly reduces human-resource consumption. In our tests on matching data, the direct matching rate of the raw data set was below 8%, while the matching rate after the standardized automatic matching system stabilized at 85%, essentially without manual support. The dynamic, incremental data-structure normalization system helps feed back non-Chinese character information promptly, further improving recognition over the feedback cycle. Compared with a single vector-distance criterion, the effect is clearly improved; at the same time, using Euclidean distance as a first-stage screen greatly reduces the number of text comparison operations, which saves computing resources and increases computing speed. Since word vectors require no annotation work and carry the semantic information of the vocabulary, human-resource consumption is greatly reduced, as is the burden and difficulty of the discrimination work that medical terms otherwise require of professionals.
Brief description of the drawings
Fig. 1 is a flow chart of the medical text feature extraction and automatic matching method according to an embodiment of the invention;
Fig. 2 is a partial flow chart of the medical text feature extraction and automatic matching method according to another embodiment of the invention;
Fig. 3 is a conceptual schematic diagram of the word vectorization in the medical text feature extraction and automatic matching method according to another embodiment of the invention;
Fig. 4 is a functional block diagram of the medical text feature extraction and automatic matching system according to an embodiment of the invention;
Fig. 5 is a schematic diagram of the operating environment of the system on which the application program is installed, according to an embodiment of the invention.
Detailed description of the embodiments
The implementation of the technical solution is described in further detail below with reference to the drawings.
Those skilled in the art will appreciate that, although the following description involves many technical details of embodiments of the invention, these are only examples illustrating the principle of the invention and do not imply any limitation. The invention can be applied in situations other than the technical details exemplified below without departing from its principle and spirit.
In addition, to keep the description from becoming tedious, some technical details that can be found in prior-art material may be omitted, simplified or adapted in this specification; this will be understood by those skilled in the art and does not affect the sufficiency of the disclosure.
Embodiments for carrying out the invention are described below in the following order: 1. summary of the inventive concept; 2. medical text feature extraction and automatic matching method (Figs. 1 to 3); 3. medical text feature extraction and automatic matching system (Fig. 4); 4. the system on which the application program is installed according to an embodiment of the invention, and the computer-readable medium storing the application program (Fig. 5).
1. Summary of the inventive concept
The invention relates to an unsupervised model for automatic feature extraction from Chinese text, based on Word2Vec and Euclidean distance and combined with normalization by logical inclusion distance and edit distance. It mainly covers the following aspects:
1. Using the word-embedding method of the Skip-Gram model, the characters in the internal medical text to be matched are mapped to specific coordinate points in an N-dimensional space. This vectorizes Chinese text in an unsupervised setting and achieves semantic recognition: the algorithm can automatically identify the semantic information in the text, and vectorizing the Chinese text removes, at the source, the conventional methods' need for a large number of rules;
2. On this basis, a vector matrix is built for each text string or short phrase; longitudinal (column-wise) max, min and mean feature pooling reduces its dimensionality and captures the key features of the text, and the best list of match candidates is then computed with Euclidean distance;
3. The best matching item is obtained by a weighted calculation of logical inclusion distance and edit distance, effectively improving the matching rate between external medical big data and the target classification;
4. Before the model input, a dynamic, incremental data-structure normalization system based on active learning can be set up to exclude non-Chinese characters that interfere with semantic recognition, mainly digits and letters of the various writing systems, special characters, English aliases and the like.
The realization of the above inventive concept is illustrated below with embodiments.
2. Medical text feature extraction and automatic matching method
Figs. 1 and 2 are (partial) flow charts of the medical text feature extraction and automatic matching method according to embodiments of the invention.
As shown in Fig. 1, an embodiment of the invention provides a medical text feature extraction and automatic matching method, which mainly comprises the following steps:
Step S100: extracting medical text from externally input medical data and performing word segmentation on it, to obtain medical words to be matched against the modular words in a controlled term list;
Step S200: obtaining, through a word-vector operation, the N-dimensional vector corresponding to each morpheme in the medical word, forming an M × N matrix, where M is the number of morphemes contained in the medical word;
Step S300: reducing the M × N matrix, by unidirectional pooling, to an N-dimensional vector corresponding to the medical word;
Step S400: calculating the Euclidean distance between the N-dimensional vector corresponding to the medical word and the N-dimensional vector corresponding to each modular word in the controlled term list;
Step S500: sorting the calculated Euclidean distances and selecting, from the modular words in the controlled term list, several modular words with the smallest Euclidean distance to the medical word's N-dimensional vector, as candidate modular words;
Step S600: calculating the logical inclusion distance and/or the edit distance between the medical word and each candidate modular word, computing the weighted sum of the logical inclusion distance and the edit distance, and taking the candidate modular word with the largest weighted sum as the modular word finally matched to the medical word.
The N-dimensional vector corresponding to each modular word in the controlled term list is obtained by applying steps S200 and S300 to that modular word, in the same way as for the medical word.
Specifically, in step S100 the word segmentation may, for example, be performed by matching the medical text against the entries of a Medical Dictionary, yielding the split medical words, such as "asthma", "tumour", and so on.
Optionally, in step S100, text comparison is used to determine whether the medical word is identical to some modular word in the controlled term list; if so, the medical word is matched to that modular word and the method ends directly.
For example, the Medical Dictionary may contain 500,000 disease names (the disease names used in production).
The matching may use any of the following methods:
1) forward maximum matching (scanning left to right);
2) backward maximum matching (scanning right to left);
3) minimum segmentation (minimizing the number of words cut from each sentence);
4) bidirectional maximum matching (scanning twice, left to right and right to left).
Those skilled in the art will appreciate that these methods can be implemented by various known approaches or algorithms and can also be combined; details are not described here.
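As an illustration of methods 1), 2) and 4) above (minimum segmentation is not shown), the following is a minimal sketch assuming the Medical Dictionary is a plain Python set and a hypothetical maximum word length `max_len`; the tie-breaking rule for the bidirectional variant (fewer words, then fewer single-character words) is a common heuristic, not one the patent specifies.

```python
def forward_max_match(text, dictionary, max_len=6):
    """Left-to-right scan that always takes the longest dictionary word at the cursor."""
    words, i = [], 0
    while i < len(text):
        # Try the longest window first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len=6):
    """Right-to-left scan that always takes the longest dictionary word ending at the cursor."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.insert(0, piece)
                j -= size
                break
    return words


def bidirectional_max_match(text, dictionary, max_len=6):
    """Run both scans; prefer fewer words, then fewer single-character words (ties go to forward)."""
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)

    def cost(words):
        return (len(words), sum(1 for w in words if len(w) == 1))

    return fwd if cost(fwd) <= cost(bwd) else bwd


# Hypothetical four-entry dictionary standing in for the production Medical Dictionary.
medical_dictionary = {"支气管", "哮喘", "支气管哮喘", "肿瘤"}
print(bidirectional_max_match("支气管哮喘伴肿瘤", medical_dictionary))
```

With this toy dictionary the call prints `['支气管哮喘', '伴', '肿瘤']`: the longest entry "bronchial asthma" wins over its shorter parts. A production segmenter would load the full dictionary and tune `max_len` to its longest entry.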
In step S200, the word-vectorization operation may use the Word2vec tool. Word vectors, also called word embeddings, convert the words of natural language into dense vectors that a computer can process, and words with similar meanings are mapped to nearby positions in the vector space.
The word-vectorization operation finds a suitable position vector for each word in the embedding space; this vector reflects aspects of the word's syntax and semantics.
As an example, step S100 may further include:
Step S101: excluding forbidden characters from the medical text, including digits and letters of the various writing systems, special characters, English aliases and the like. These forbidden characters can be filtered out with pre-set filtering rules.
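The patent does not disclose its pre-set filtering rules. The sketch below shows one hypothetical rule, assuming that everything outside the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as a forbidden character.

```python
import re

# Hypothetical filtering rule: drop every run of characters outside the basic
# CJK Unified Ideographs block (digits, Latin letters, punctuation, symbols).
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def strip_forbidden(text):
    """Remove characters treated here as noise for character-level semantic matching."""
    return NON_CHINESE.sub("", text)


print(strip_forbidden("肺癌 stage IV!"))
```

The call prints `肺癌`. This blunt rule also turns `2型糖尿病(T2DM)` into `型糖尿病`, losing the meaningful "2", which is why a rule set like the one described (maintained incrementally through feedback) would need exceptions rather than a single regular expression.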
Specifically, as shown in Fig. 2, in step S200 the word vectorization is an unsupervised word-vectorization operation based on Word2Vec, which mainly comprises the following steps:
Step S201: first learning from a large corpus to identify "positive sample words" and "negative sample words";
Step S202: continually pulling positive sample words closer together, to a degree that depends on their current distance, while continually pushing negative sample words farther apart, also to a degree that depends on their current distance;
Step S203: finding a suitable position vector for each word in the embedding space; this vector reflects aspects of the word's syntax and semantics.
That is, the larger the cosine of the angle between the position vectors of two characters, the more similar they are, and the more likely the two characters are to form a word together or to be near-synonyms; conversely, the smaller the cosine, the less similar they are, and the less likely they are to form a word or to be near-synonyms. The distance between the vectors is also taken into account.
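The two measures mentioned above are the standard cosine similarity and Euclidean distance. For two d-dimensional character vectors u and v:

```latex
\cos(\mathbf{u},\mathbf{v}) = \frac{\sum_{i=1}^{d} u_i v_i}{\sqrt{\sum_{i=1}^{d} u_i^2}\,\sqrt{\sum_{i=1}^{d} v_i^2}},
\qquad
\operatorname{dist}(\mathbf{u},\mathbf{v}) = \sqrt{\sum_{i=1}^{d} (u_i - v_i)^2}
```

For example, with the d = 2 vectors given for "swollen" and "tumor" in the worked example, the differences are 0.193917 and -0.303673, so the distance is sqrt(0.193917^2 + 0.303673^2), approximately 0.360.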
The terms "positive sample word" and "negative sample word" have the following meanings:
Positive sample words: characters or words that frequently appear within the same window; their semantic similarity is high and they conform to grammatical logic, e.g. characters that form words together, such as "swollen" and "tumor", or "heavy breathing" and "asthma".
Negative sample words: characters or words that rarely appear within the same window, i.e. that do not conform to grammatical logic and have low semantic similarity, e.g. "swollen" and "black", or "heavy breathing" and "pain", which do not form words.
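Steps S201 to S203 correspond to the skip-gram negative-sampling objective used by Word2Vec. The sketch below performs one stochastic gradient step for a single (center, positive) pair and one negative sample in plain Python. The vectors and learning rate are made-up illustration values; real training (for example with gensim) uses many negatives per pair over a large corpus and adds subsampling and learning-rate decay, which are omitted here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sgns_step(center, positive, negative, lr=0.5):
    """One skip-gram negative-sampling update; returns the new (center, positive, negative).

    The pull on the positive pair is scaled by (1 - sigmoid(score)) and the push on the
    negative pair by sigmoid(score), so the size of each move depends on how close the
    vectors currently are (steps S202/S203).
    """
    g_pos = sigmoid(dot(center, positive)) - 1.0  # below zero: pulls the pair together
    g_neg = sigmoid(dot(center, negative))        # above zero: pushes the pair apart
    # All gradients are computed from the vectors as they were before this update.
    grad_c = [g_pos * p + g_neg * n for p, n in zip(positive, negative)]
    grad_p = [g_pos * c for c in center]
    grad_n = [g_neg * c for c in center]
    new_c = [c - lr * g for c, g in zip(center, grad_c)]
    new_p = [p - lr * g for p, g in zip(positive, grad_p)]
    new_n = [n - lr * g for n, g in zip(negative, grad_n)]
    return new_c, new_p, new_n


c0, p0, n0 = [0.1, 0.2], [0.3, 0.1], [-0.2, 0.4]
c1, p1, n1 = sgns_step(c0, p0, n0)
print(dot(c0, p0), dot(c1, p1))  # positive-pair score before and after
print(dot(c0, n0), dot(c1, n1))  # negative-pair score before and after
```

After one step the positive-pair score rises from 0.05 to about 0.091 and the negative-pair score falls from 0.06 to about -0.008, which is the "pull closer / push away" behaviour described above.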
As an example, character vectors are obtained through Word2Vec training (here a "morpheme" corresponds to one Chinese character). The dimension d of the character vectors is a hyperparameter, set to d = 100 in the model; for ease of understanding, d = 2 is used in the example below, as shown in Fig. 3.
swollen [0.498006, -2.489054], tumor [0.691923, -2.792727]: cos = 0.999, distance = 0.360
swollen [0.498006, -2.489054], big [-0.340440, -0.981898]: cos = 0.862, distance = 1.725
swollen [0.498006, -2.489054], red [1.092340, -3.372209]: cos = 0.993, distance = 1.064
swollen [0.498006, -2.489054], upper arm [-4.788107, 2.656263]: cos = -0.647, distance = 7.376
swollen [0.498006, -2.489054], split [-4.193781, 2.289126]: cos = -0.642, distance = 6.696
swollen [0.498006, -2.489054], fold [-3.655881, 2.100383]: cos = -0.658, distance = 6.190
bone [-3.781678, 2.185360], tumor [0.691923, -2.792727]: cos = -0.693, distance = 6.692
bone [-3.781678, 2.185360], big [-0.340440, -0.981898]: cos = -0.189, distance = 4.676
bone [-3.781678, 2.185360], red [1.092340, -3.372209]: cos = -0.742, distance = 7.392
bone [-3.781678, 2.185360], upper arm [-4.788107, 2.656263]: cos = 0.999, distance = 1.111
bone [-3.781678, 2.185360], split [-4.193781, 2.289126]: cos = 0.999, distance = 0.424
bone [-3.781678, 2.185360], fold [-3.655881, 2.100383]: cos = 0.999, distance = 0.151
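The listed similarities can be recomputed directly from the two-dimensional vectors. This sketch uses only the standard library (`math.dist` needs Python 3.8 or later) and a few of the example pairs.

```python
import math

# Two-dimensional character vectors copied from the worked example above (d = 2).
VEC = {
    "swollen": (0.498006, -2.489054),
    "tumor": (0.691923, -2.792727),
    "red": (1.092340, -3.372209),
    "bone": (-3.781678, 2.185360),
    "fold": (-3.655881, 2.100383),
}


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def compare(w1, w2):
    """Return (cosine similarity, Euclidean distance) for two characters."""
    a, b = VEC[w1], VEC[w2]
    return cosine(a, b), math.dist(a, b)


for w1, w2 in [("swollen", "tumor"), ("swollen", "red"), ("bone", "fold")]:
    c, d = compare(w1, w2)
    print(f"{w1}-{w2}: cos={c:.3f}, distance={d:.3f}")
```

The recomputed values agree with the listed ones to within about 0.001. They also show why distance is considered alongside the angle: cosine barely separates "swollen/tumor" (0.999) from "swollen/red" (0.993), while the Euclidean distances (0.360 versus 1.064) separate them clearly.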
In this way, the character vector of each character in a medical word can be obtained.
Specifically, in step S300 the unidirectional pooling is longitudinal (column-wise) pooling, namely one of mean pooling, max pooling and min pooling, with an M × 1 pooling window;
For example, with one of the above pooling methods, a 10 × 100 matrix (formed by ten 1 × 100 character vectors) can be reduced to a 1 × 100 vector.
Optionally, all three pooling methods can be used to reduce the 10 × 100 matrix (formed by ten 1 × 100 character vectors) to three 1 × 100 vectors, which are then concatenated into a 1 × 300 vector, so that the later matching step compares it with the 1 × 300 vector corresponding to each modular word in the controlled term list.
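The column-wise pooling and concatenation described above can be sketched as follows, with the matrix represented as a list of M rows (one character vector per row). The order in which the pooled vectors are concatenated (mean, max, min) is an assumption; the patent does not state it, but it must be the same for medical words and modular words.

```python
def column_pool(matrix, mode):
    """Pool an M x N matrix (list of M rows) down to one N-dimensional vector, column by column."""
    columns = list(zip(*matrix))  # N columns, each holding M values
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "min":
        return [min(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def pool_and_concat(matrix, modes=("mean", "max", "min")):
    """Apply several poolings and concatenate the results (e.g. 3 x 100 -> 300 dimensions)."""
    out = []
    for mode in modes:
        out.extend(column_pool(matrix, mode))
    return out


# The two-character word formed by "swollen" and "tumor", using the d = 2 vectors above.
word_matrix = [[0.498006, -2.489054], [0.691923, -2.792727]]
print(pool_and_concat(word_matrix))
```

For this 2 × 2 matrix the result is a 6-dimensional vector: the mean [0.5949645, -2.6408905], followed by the max [0.691923, -2.489054] and the min [0.498006, -2.792727]. Note that all three poolings ignore character order, so two words made of the same characters in a different order receive the same vector.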
Optionally, the modular words and the medical words carry attribute labels, and in step S400 the Euclidean distance is calculated between the N-dimensional vector corresponding to the medical word and the N-dimensional vectors of only those modular words in the controlled term list that carry the same attribute label as the medical word.
Specifically, in step S400, after text vectorization (i.e., once the above steps have normalized each medical word to the same dimension as the modular words), the Euclidean distance formula is applied against all canonical names to calculate the Euclidean distance between the N-dimensional vector corresponding to the medical word and the N-dimensional vector corresponding to each modular word in the controlled term list.
As an example, the medical word can be vectorized as a 1 × 300 vector, expressed as follows: A = [1.393092, 1.349219, …, 1.311361, -2.02858, -0.15119, …, -1.24318, -0.44072, 0.98503, …, -0.05916]
The modular words (such as disease names) in the controlled term list each have a corresponding 1 × 300 vector, expressed as follows:
B1 = [0.395221, 0.45926, …, -3.252446, -3.020052, 4.52419, …, 2.214458, -1.4547, -1.98543, …, 2.56514]
……
Bk = [1.393092, 1.349219, …, 1.311361, -2.02858, -0.15119, …, -1.24318, -0.44072, 0.98503, …, -0.05916]
……
By calculating the Euclidean distances, it can be found that the modular word closest to A (with the smallest Euclidean distance) is Bk. In this example, the Euclidean distance between Bk and A is 0, indicating an exact match.
Specifically, in step S500 the calculated Euclidean distances are sorted in ascending order, and the several modular words in the controlled term list with the smallest Euclidean distance to the N-dimensional vector corresponding to the medical word are selected as candidate modular words;
As an example, the 10 modular words with the shortest Euclidean distance are selected as the candidate set.
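Steps S400 and S500, including the optional attribute-label restriction, can be sketched as below. The function name, the dictionary-based inputs and the label values are hypothetical; the default `k=10` follows the example above.

```python
import math


def top_k_candidates(query_vec, term_vectors, k=10, query_label=None, term_labels=None):
    """Return the k modular words whose vectors are closest (Euclidean) to query_vec.

    term_vectors maps each modular word to its pooled vector (same length as query_vec).
    If query_label and term_labels are both given, only modular words carrying the same
    attribute label are compared, as in the optional attribute-labeling variant.
    """
    scored = []
    for term, vec in term_vectors.items():
        if query_label is not None and term_labels is not None:
            if term_labels.get(term) != query_label:
                continue
        scored.append((math.dist(query_vec, vec), term))
    scored.sort()  # ascending distance; ties are broken by the term text
    return [(term, dist) for dist, term in scored[:k]]


terms = {"term_a": [0.0, 0.0], "term_b": [3.0, 4.0], "term_c": [1.0, 0.0]}
labels = {"term_a": "drug", "term_b": "disease", "term_c": "disease"}
print(top_k_candidates([0.0, 0.0], terms, k=2))
print(top_k_candidates([0.0, 0.0], terms, k=2, query_label="disease", term_labels=labels))
```

Sorting every distance is fine for a sketch; for a controlled term list of hundreds of thousands of entries, `heapq.nsmallest` or an approximate nearest-neighbour index would avoid the full sort. A distance of 0 means the pooled vectors are identical, which with order-insensitive pooling does not by itself prove the strings are identical; the text comparison in step S100 covers that case.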
Specifically, in step S600 the logical inclusion distance expresses the degree of overlap between the medical word and the modular word, for example as the number of identical characters; the edit distance expresses the minimum number of edit operations needed to turn the medical word into the modular word; in the weighted sum, the weight of the logical inclusion distance is twice the weight of the edit distance.
Optionally, the logical inclusion distance or the edit distance may also be used alone.
Specifically, the edit distance, also known as the Levenshtein distance, is the minimum number of edit operations required to turn one string into another. The permitted edit operations are replacing one character with another, inserting a character and deleting a character. In general, the smaller the edit distance, the more similar the two strings.
For example, after the Euclidean-distance screening, the externally input medical word "palm-of-hand local epidermal tissue contusion" has a small distance to the modular words "centre-of-palm local epidermal tissue contusion" and "centre-of-palm local epidermal tissue contusion tumour". The edit distances are calculated next.
The edit distance from "palm-of-hand local epidermal tissue contusion" to "centre-of-palm local epidermal tissue contusion" is 1, and to "centre-of-palm local epidermal tissue contusion tumour" it is 3. Based on this, the former may be selected as the matching result.
The concepts of Euclidean distance, logical inclusion distance and edit distance are well known in the art and, for brevity, are not described in detail here.
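A minimal sketch of the step S600 re-ranking follows. Two points are assumptions, not taken from the patent: the logical inclusion distance is modelled as the number of shared characters (a multiset intersection), and, because the candidate with the largest weighted sum wins while a smaller edit distance is better, the edit distance enters the score with a negative sign. The 2:1 weighting follows the text.

```python
from collections import Counter


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free when equal)
        prev = curr
    return prev[-1]


def overlap(a, b):
    """Number of shared characters (multiset intersection), used here as the inclusion measure."""
    return sum((Counter(a) & Counter(b)).values())


def best_match(word, candidates, w_overlap=2.0, w_edit=1.0):
    """Pick the candidate with the largest weighted score; edit distance counts against it."""
    def score(candidate):
        return w_overlap * overlap(word, candidate) - w_edit * levenshtein(word, candidate)
    return max(candidates, key=score)


print(levenshtein("kitten", "sitting"))
print(best_match("abcde", ["xbcdefgh", "xbcde"]))
```

The first call prints 3 and the second prints `xbcde`: both candidates share four characters with the query, but one substitution beats one substitution plus three insertions, mirroring the 1 versus 3 comparison in the example above. The functions operate character by character, so they apply unchanged to Chinese strings.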
3. Medical text feature extraction and automatic matching system
According to an embodiment of the invention, a medical text feature extraction and automatic matching system is further provided for executing each step of the method of this application. As shown in Fig. 4, the system mainly comprises a word segmentation module, a word vector module, a dimensionality reduction module and a matching module.
The word segmentation module is used to extract medical text from externally input medical data and to perform word segmentation on it, obtaining medical words to be matched against the modular words in the controlled term list;
The word vector module is used to obtain, through a word-vector operation, the N-dimensional vector corresponding to each morpheme in the medical word, forming an M × N matrix, where M is the number of morphemes contained in the medical word;
The dimensionality reduction module is used to reduce the M × N matrix, by unidirectional pooling, to an N-dimensional vector corresponding to the medical word;
The matching module is used to:
calculate the Euclidean distance between the N-dimensional vector corresponding to the medical word and the N-dimensional vector corresponding to each modular word in the controlled term list;
sort the calculated Euclidean distances and select, from the modular words in the controlled term list, several modular words with the smallest Euclidean distance to the medical word's N-dimensional vector, as candidate modular words;
calculate the logical inclusion distance and/or the edit distance between the medical word and each candidate modular word, compute the weighted sum of the logical inclusion distance and the edit distance, and take the candidate modular word with the largest weighted sum as the modular word finally matched to the medical word.
In addition, the various embodiments of the invention may also be implemented as software modules or as computer-readable instructions stored on one or more computer-readable media; when executed by a processor or device component, the computer-readable instructions carry out the various embodiments of the invention. Likewise, any combination of software modules, computer-readable media and hardware components is contemplated by the invention. The software modules may be stored on any type of computer-readable storage medium, such as RAM, EPROM, EEPROM, flash memory, registers, hard disks, CD-ROM or DVD.
4. System with the application program installed, according to an embodiment of the invention
Fig. 5 shows the running environment of the system with the application program installed, according to an embodiment of the invention.
In this embodiment, the system with the application program installed is installed and runs on an electronic device. The electronic device may be a computing device such as a desktop computer, notebook, palmtop computer, or server. The electronic device may include, but is not limited to, a memory, a processor, and a display. The figure shows only an electronic device with these components; it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.
In some embodiments, the memory may be an internal storage unit of the electronic device, such as its hard disk or main memory. In other embodiments, the memory may be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device. The memory may also include both an internal storage unit and an external storage device of the electronic device. The memory stores the application software installed on the electronic device and various kinds of data, such as the program code of the system with the application program installed. The memory may also be used to temporarily store data that has been output or is about to be output.
In some embodiments, the processor may be a central processing unit (CPU), a microprocessor, or another data processing chip, used to run the program code stored in the memory or to process data, for example to execute the system with the application program installed.
In some embodiments, the display may be an LED display, a liquid crystal display, a touch liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display is used to show information processed in the electronic device and to present a visual user interface, such as an application menu interface or an application icon interface. The components of the electronic device communicate with one another over a system bus.
From the description of the embodiments above, those skilled in the art will clearly understand that the methods in the embodiments can be implemented by software together with the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. On this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied as a software product. The software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the embodiments of the present application.
That is, according to an embodiment of the invention, a computer-readable storage medium is also provided, on which is stored a program for executing the method according to an embodiment of the invention; when the program is executed by a processor, it performs each step of the method.
From the foregoing, it will be appreciated that specific embodiments of the invention have been described herein for purposes of illustration, but that various modifications may be made without departing from the scope of the invention. Those skilled in the art will understand that the steps depicted in the flowcharts, or the operations and routines described herein, may be varied in many ways. More specifically, the order of steps may be rearranged, steps may be executed in parallel, steps may be omitted, other steps may be included, and routines may be combined or omitted in various ways. Accordingly, the invention is limited only by the appended claims.

Claims (10)

CN201810537989.8A | priority 2018-05-30 | filed 2018-05-30 | Medical text feature extraction and automatic matching method and system | Active | CN108804423B (en)

Priority Applications (1)

Application number | Priority date | Filing date | Title
CN201810537989.8A (CN108804423B) | 2018-05-30 | 2018-05-30 | Medical text feature extraction and automatic matching method and system

Applications Claiming Priority (1)

Application number | Priority date | Filing date | Title
CN201810537989.8A (CN108804423B) | 2018-05-30 | 2018-05-30 | Medical text feature extraction and automatic matching method and system

Publications (2)

Publication number | Publication date
CN108804423A | 2018-11-13
CN108804423B | 2023-09-08

Family ID: 64089361

Family Applications (1)

Application number | Title | Priority date | Filing date
CN201810537989.8A (Active, CN108804423B) | Medical text feature extraction and automatic matching method and system | 2018-05-30 | 2018-05-30

Country Status (1)

Country | Link
CN (1) | CN108804423B (en)

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104572892A * | 2014-12-24 | 2015-04-29 | 中国科学院自动化研究所 | Text classification method based on cyclic convolution network
US20170277783A1 * | 2016-03-28 | 2017-09-28 | Oki Electric Industry Co., Ltd. | Ontology processing device and a non-transitory computer-readable storage medium
CN106021272A * | 2016-04-04 | 2016-10-12 | 上海大学 | Keyword automatic extraction method based on distributed expression word vector calculation
CN107977352A * | 2016-10-21 | 2018-05-01 | 富士通株式会社 | Information processor and method
CN108073565A * | 2016-11-10 | 2018-05-25 | 株式会社Ntt都科摩 | The method and apparatus and machine translation method and equipment of words criterion
CN106933806A * | 2017-03-15 | 2017-07-07 | 北京大数医达科技有限公司 | The determination method and apparatus of medical synonym
CN106970910A * | 2017-03-31 | 2017-07-21 | 北京奇艺世纪科技有限公司 | A kind of keyword extracting method and device based on graph model
CN107122413A * | 2017-03-31 | 2017-09-01 | 北京奇艺世纪科技有限公司 | A kind of keyword extracting method and device based on graph model
CN107315734A * | 2017-05-04 | 2017-11-03 | 中国科学院信息工程研究所 | A kind of method and system for becoming pronouns, general term for nouns, numerals and measure words standardization based on time window and semanteme
CN107247780A * | 2017-06-12 | 2017-10-13 | 北京理工大学 | A kind of patent document method for measuring similarity of knowledge based body
CN107291693A * | 2017-06-15 | 2017-10-24 | 广州赫炎大数据科技有限公司 | A kind of semantic computation method for improving term vector model
CN107562715A * | 2017-07-18 | 2018-01-09 | 阿里巴巴集团控股有限公司 | Term vector processing method, device and electronic equipment
CN107562717A * | 2017-07-24 | 2018-01-09 | 南京邮电大学 | A kind of text key word abstracting method being combined based on Word2Vec with Term co-occurrence
CN107562792A * | 2017-07-31 | 2018-01-09 | 同济大学 | A kind of question and answer matching process based on deep learning
CN107577668A * | 2017-09-15 | 2018-01-12 | 电子科技大学 | Social media non-standard word correcting method based on semanteme
CN107679144A * | 2017-09-25 | 2018-02-09 | 平安科技(深圳)有限公司 | News sentence clustering method, device and storage medium based on semantic similarity
CN107862058A * | 2017-11-10 | 2018-03-30 | 北京百度网讯科技有限公司 | Method and apparatus for generating information
CN107957993A * | 2017-12-13 | 2018-04-24 | 北京邮电大学 | The computational methods and device of english sentence similarity

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109582955A * | 2018-11-14 | 2019-04-05 | 金色熊猫有限公司 | Method, apparatus and medium for standardizing medical terms
CN109582955B * | 2018-11-14 | 2023-04-07 | 金色熊猫有限公司 | Method, apparatus and medium for standardizing medical terms
CN110517747A * | 2019-08-30 | 2019-11-29 | 志诺维思(北京)基因科技有限公司 | Pathological data processing method, device and electronic equipment
CN110517747B * | 2019-08-30 | 2022-06-03 | 志诺维思(北京)基因科技有限公司 | Pathological data processing method and device and electronic equipment
CN110931090A * | 2019-11-26 | 2020-03-27 | 太平金融科技服务(上海)有限公司 | Disease data processing method and device, computer equipment and storage medium
CN111160012A * | 2019-12-26 | 2020-05-15 | 上海金仕达卫宁软件科技有限公司 | Medical term recognition method and device and electronic equipment
CN111160012B * | 2019-12-26 | 2024-02-06 | 上海金仕达卫宁软件科技有限公司 | Medical term identification method and device and electronic equipment
CN113689923A * | 2020-05-19 | 2021-11-23 | 北京平安联想智慧医疗信息技术有限公司 | Medical data processing apparatus, system and method
CN111680168A * | 2020-05-29 | 2020-09-18 | 平安银行股份有限公司 | Text feature semantic extraction method and device, electronic equipment and storage medium
CN111680168B * | 2020-05-29 | 2024-06-28 | 平安银行股份有限公司 | Text feature semantic extraction method and device, electronic equipment and storage medium
CN112115715A * | 2020-09-04 | 2020-12-22 | 北京嘀嘀无限科技发展有限公司 | Natural language text processing method and device, storage medium and electronic equipment
CN111968744A * | 2020-10-22 | 2020-11-20 | 深圳大学 | Bayesian optimization-based parameter optimization method for stroke and chronic disease model
CN112800183A * | 2021-02-25 | 2021-05-14 | 国网河北省电力有限公司电力科学研究院 | Content name data processing method and terminal device
CN112800183B * | 2021-02-25 | 2023-09-26 | 国网河北省电力有限公司电力科学研究院 | Content name data processing method and terminal equipment
CN113223657A * | 2021-06-01 | 2021-08-06 | 联仁健康医疗大数据科技股份有限公司 | Medicine information processing method and device, electronic equipment and storage medium

Also Published As

Publication number | Publication date
CN108804423B (en) | 2023-09-08

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
TA01 | Transfer of patent application right | Effective date of registration: 2022-05-25
  Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province
  Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.
  Address before: Room 12G, Area H, 666 Beijing East Road, Huangpu District, Shanghai 200001
  Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.
GR01 | Patent grant |
