CN109065015B

Info

Publication number: CN109065015B
Application number: CN201810844009.9A
Authority: CN
Inventors: 国家喜; 吴及; 李承程; 吕萍; 岳阔; 赵湖勇; 李群
Original assignee: Tsinghua University; iFlytek Co Ltd
Current assignee: Tsinghua University; iFlytek Co Ltd
Priority date: 2018-07-27
Filing date: 2018-07-27
Publication date: 2021-06-08
Anticipated expiration: 2038-07-27
Also published as: CN109065015A

Abstract

The application discloses a data acquisition method, a device, equipment and a readable storage medium. A question-answer node set corresponding to a target project is obtained, the set containing question information corresponding to the target project, so that a machine acquires data automatically on the basis of the question information. The loss of acquired data caused by questions being missed during manual acquisition is avoided, and the efficiency of machine acquisition is greatly improved compared with manual acquisition.

Description

Data acquisition method, device and equipment and readable storage medium

Technical Field

The present application relates to the field of natural language processing technologies, and in particular, to a data collection method, apparatus, device, and readable storage medium.

Background

With the development of the times, we have now entered the era of data. Every industry needs to accumulate basic data to support higher-level decisions.

For example, collecting interrogation content usually requires gathering the answers of the person being questioned in question-and-answer form, and finally forming an interrogation record. The interrogation record can serve as supporting material for the trial of subsequent cases. As another example, a doctor can learn a patient's disease onset and course of treatment through question-and-answer communication with the patient, and compile a case record. Case collection is one of the important bases for disease diagnosis, and the case record serves as supporting material for that diagnosis.

Research shows that, in existing projects, data acquisition in question-and-answer form is carried out manually: a questioner asks a question, the respondent gives the corresponding answer, and the questioner manually records the question and the answer. This mode of acquisition is clearly affected by the questioner's personal experience and state, and for complex projects data is easily lost because the questioner fails to consider everything. Manual acquisition is also inefficient.

Disclosure of Invention

In view of this, the present application provides a data acquisition method, apparatus, device and readable storage medium, which are used to solve the problems of easy data acquisition loss, high cost and low efficiency in the existing manual data acquisition.

In order to achieve the above object, the following solutions are proposed:

a method of data acquisition, comprising:

acquiring a question-answer node set corresponding to a target project to be subjected to data acquisition, wherein the question-answer node set comprises question-answer nodes corresponding to the target project, and the question-answer nodes comprise question information;

selecting question and answer nodes from the question and answer node set, and outputting question information contained in the selected question and answer nodes;

and acquiring answer information fed back to the output question information to obtain answer information corresponding to the question-answer node.

Preferably, the selecting of question and answer nodes from the question and answer node set includes:

and selecting question and answer nodes from the question and answer node set according to a preset inquiry sequence of each question and answer node corresponding to the target project.

Preferably, the selecting a question-answer node from the question-answer node set according to a preset query sequence of each question-answer node corresponding to the target item includes:

and selecting the question and answer nodes from head to tail according to the sorting sequence of the question and answer nodes in the question and answer node set, wherein the sorting sequence of the question and answer nodes in the question and answer node set is consistent with the inquiry sequence.

Preferably, the question-answer node further includes a next question-answer node slot for storing an index of a next question-answer node of the question-answer nodes determined according to the question order;

selecting question and answer nodes from the question and answer node set according to a preset inquiry sequence of each question and answer node corresponding to the target project, wherein the method comprises the following steps:

and when obtaining the answer information corresponding to the currently selected question-answer node and determining that a next question-answer node needs to be selected, taking the question-answer node corresponding to the index of the next question-answer node stored in a next question-answer node slot contained in the currently selected question-answer node as the next question-answer node.
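The slot-based selection above can be illustrated with a minimal Python sketch. The class name `Node`, the field `next_index`, and the cycle guard are assumptions made for the example; the sketch only follows the stored indexes and omits the step of waiting for each answer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    question: str
    next_index: Optional[int] = None  # next question-answer node slot


def query_order(nodes: List[Node], start: int = 0) -> List[str]:
    """Follow next-node slots from `start` and return questions in query order."""
    order: List[str] = []
    seen = set()
    index: Optional[int] = start
    while index is not None and index not in seen:  # guard against cycles
        seen.add(index)
        order.append(nodes[index].question)
        index = nodes[index].next_index
    return order


# The storage order differs from the query order; the slots define the latter.
nodes = [Node("Q-a", next_index=2), Node("Q-c"), Node("Q-b", next_index=1)]
print(query_order(nodes))
```

An empty slot (`None`) ends the traversal in this sketch.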

Preferably, the selecting of question and answer nodes from the question and answer node set includes:

determining the node characteristics of the question and answer nodes according to question information and answer information of the question and answer nodes aiming at each selected question and answer node in the question and answer node set;

combining the node characteristics of each selected question and answer node into a node characteristic set according to the sequence of selection;

inputting the node feature set into a preset node selection model to obtain an index of a next question and answer node output by the node selection model;

the node selection model is obtained by taking a node characteristic training data set formed by combining the node characteristic training data of the selected question and answer nodes corresponding to the target project according to the selection sequence as a training sample and taking the index of the next to-be-selected question and answer node as a sample label for training.

Preferably, the question-answer node further includes a next question-answer node slot for storing an index of the next question-answer node;

the selecting of the question and answer nodes from the question and answer node set further comprises:

when answer information corresponding to the currently selected question and answer node is obtained and a next question and answer node is determined to be selected, judging whether an index of the next question and answer node is stored in a next question and answer node slot contained in the currently selected question and answer node;

if so, taking the question-answer node corresponding to the index of the next question-answer node stored in the next question-answer node slot contained in the currently selected question-answer node as the next question-answer node;

and if not, executing the operation of determining the node characteristics of the question-answer nodes according to the question information and the answer information of the question-answer nodes aiming at each selected question-answer node in the question-answer node set.
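The combination of slot lookup and model prediction described above might be sketched as follows. This is an illustrative sketch only: `select_next`, `toy_encode`, and `toy_model` are hypothetical names, and the two toy functions merely stand in for the node coding model and the node selection model, whose actual form is not assumed here.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Feature = Tuple[float, ...]


def select_next(
    next_slot: Optional[int],
    history: Sequence[Tuple[str, str]],
    encode: Callable[[str, str], Feature],
    model: Callable[[List[Feature]], int],
) -> int:
    """Return the index of the next question-answer node.

    `next_slot` is the content of the current node's next-node slot.
    `history` holds (question, answer) pairs of the nodes selected so far,
    in selection order.
    """
    if next_slot is not None:  # slot filled: use the stored index
        return next_slot
    # Slot empty: build the node feature set in selection order
    features = [encode(question, answer) for question, answer in history]
    return model(features)  # the model outputs the next node index


def toy_encode(question: str, answer: str) -> Feature:
    """Hypothetical stand-in for the node coding model."""
    return (float(len(question)), float(len(answer)))


def toy_model(features: List[Feature]) -> int:
    """Hypothetical stand-in for the node selection model."""
    return len(features)


history = [("Do you have a cough?", "yes"), ("For how long?", "three days")]
```

With a filled slot the stored index is returned unchanged; with an empty slot the decision is delegated to the model.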

Preferably, the determining the node characteristics of the question-answering node according to the question information and the answer information of the question-answering node includes:

inputting preset node coding models by taking the question information and the answer information of the question-answer nodes as input data, wherein the node coding models are models capable of extracting features of the input data and predicting project results of third-party projects according to the extracted features, and the third-party projects are projects applying data collected by the target projects;

and acquiring the characteristics of the node coding model extracted from the input data as the node characteristics of the question and answer nodes.

Preferably, the outputting the question information included in the selected question and answer node includes:

if the question information is in a text form, outputting the question information contained in the selected question and answer node in the text form, or performing voice synthesis on the question information contained in the selected question and answer node and outputting the synthesized question information in the voice form;

and if the question information is in a voice form, outputting the question information contained in the selected question and answer node in the voice form, or performing voice transcription on the question information contained in the selected question and answer node and outputting the transcribed text-form question information.

Preferably, the obtaining of the answer information fed back to the output question information to obtain the answer information corresponding to the question-answer node includes:

acquiring answer information in a voice form fed back to the output question information, and transcribing the answer information into answer information in a text form; or,

acquiring answer information in an image form fed back to the output question information, and performing image text recognition on the answer information to obtain answer information in a text form; or,

acquiring answer information in a text form fed back to the output question information;

and standardizing the acquired answer information to obtain the standard answer information corresponding to the question-answer node.

Preferably, the question-answer node further comprises a question type slot for storing the type of the question information;

the step of standardizing the acquired answer information to obtain standard answer information corresponding to the question-answer node includes:

if the type of the question information corresponding to the acquired answer information is determined, according to the question type slot, to be a yes/no question, determining the standard answer information to be positive or negative according to whether the acquired answer information contains positive or negative keywords;

and if the type of the question information corresponding to the acquired answer information is determined, according to the question type slot, to be a description type question, taking the acquired answer information as the standard answer information.

Preferably, the question-answering node further comprises a candidate answer slot for storing candidate answer information matched with the question information;

the step of standardizing the acquired answer information to obtain standard answer information corresponding to the question-answer node further comprises the following steps:

if the type of the question information corresponding to the acquired answer information is determined to be a selection type question according to the question type slot, calculating the similarity between the acquired answer information and each piece of candidate answer information stored in the candidate answer slot;

and determining standard answer information from the candidate answer information according to the similarity.

Preferably, the target project includes any one or more of a case collection project, an interrogation content collection project, and an interview data collection project.

Preferably, if the target item is a case collection item, the generation process of the question and answer node set corresponding to the target item includes:

acquiring symptom terms related to department diseases according to the department diseases corresponding to case acquisition items;

collecting question and answer data related to the symptom terms from medical question and answer resources, and organizing the data into question information and answer information;

and converting the organized question information into nodes, and forming the question-answer node set from the resulting nodes according to a preset inquiry flow.

A data acquisition device comprising:

a question-answer node set acquisition unit, used for acquiring a question-answer node set corresponding to a target project to be subjected to data acquisition, wherein the question-answer node set comprises question-answer nodes corresponding to the target project, and the question-answer nodes comprise question information;

the question-answer node selection unit is used for selecting question-answer nodes from the question-answer node set;

the question information output unit is used for outputting the question information contained in the selected question and answer node;

and the answer information acquisition unit is used for acquiring answer information fed back to the output question information to obtain answer information corresponding to the question-answer node.

Preferably, the question and answer node selecting unit includes:

and the sequential selection unit is used for selecting question and answer nodes from the question and answer node set according to the preset inquiry sequence of each question and answer node corresponding to the target project.

Preferably, the sequential selection unit includes:

and the in-set sequence selection unit is used for selecting the question and answer nodes from head to tail according to the sequence of the question and answer nodes in the question and answer node set, and the sequence of the question and answer nodes in the question and answer node set is consistent with the query sequence.

Preferably, the question-answer node further includes a next question-answer node slot for storing an index of a next question-answer node of the question-answer nodes determined according to the question order; the sequential selection unit includes:

and the index selection unit is used for taking the question-answer node corresponding to the index of the next question-answer node stored in the next question-answer node slot contained in the currently selected question-answer node as the next question-answer node when the answer information corresponding to the currently selected question-answer node is obtained and the next question-answer node is determined to be selected.

Preferably, the question and answer node selecting unit includes:

a node characteristic determining unit, configured to determine, for each question-answer node selected in the question-answer node set, a node characteristic of the question-answer node according to question information and answer information of the question-answer node;

the characteristic combination unit is used for combining the node characteristics of each selected question and answer node into a node characteristic set according to the selection sequence;

the node selection model prediction unit is used for inputting the node feature set into a preset node selection model to obtain an index of a next question-answer node output by the node selection model;

the question and answer node selecting unit further comprises:

the question-answer node slot judging unit is used for judging whether an index of a next question-answer node is stored in a next question-answer node slot contained in the currently selected question-answer node or not when answer information corresponding to the currently selected question-answer node is obtained and the next question-answer node is determined to be selected; if yes, executing a question-answer node slot using unit, and if not, executing the node characteristic determining unit;

and the question-answer node slot using unit is used for taking the question-answer node corresponding to the index of the next question-answer node stored in the next question-answer node slot contained in the currently selected question-answer node as the next question-answer node.

Preferably, the node characteristic determination unit includes:

a node coding model prediction unit, configured to input a preset node coding model by using the question information and the answer information of the question and answer node as input data, where the node coding model is a model that can perform feature extraction on the input data and predict a project result of a third-party project according to the extracted features, and the third-party project is a project that applies data acquired by the target project;

and the node coding model feature extraction unit is used for acquiring the features of the node coding model extracted from the input data as the node features of the question and answer nodes.

Preferably, the question information output unit includes:

a first question information output subunit, configured to output, in a text form, question information included in the selected question and answer node, if the question information is in the text form, or perform voice synthesis on the question information included in the selected question and answer node, and output the synthesized question information in the voice form;

and the second question information output subunit is used for outputting the question information contained in the selected question and answer node in a voice form if the question information is in the voice form, or performing voice transcription on the question information contained in the selected question and answer node and outputting the transcribed question information in a text form.

Preferably, the answer information acquisition unit includes:

a voice answer information acquisition subunit, configured to acquire answer information in a voice form fed back to the output question information, and transcribe the answer information into answer information in a text form; or,

an image answer information acquisition subunit, configured to acquire answer information in an image form fed back to the output question information, and perform image text recognition on the answer information to obtain answer information in a text form; or,

a text answer information acquisition subunit, configured to acquire answer information in a text form fed back to the output question information;

and the standardization processing unit is used for standardizing the acquired answer information to obtain standard answer information corresponding to the question-answer node.

the standardization processing unit includes:

the first standardization processing subunit is used for determining the standard answer information to be positive or negative according to whether the acquired answer information contains positive or negative keywords, if the type of the question information corresponding to the acquired answer information is determined to be a yes/no question according to the question type slot;

and the second standardization processing subunit is used for taking the acquired answer information as standard answer information if the type of the question information corresponding to the acquired answer information is determined to be a description type question according to the question type slot.

the standardization processing unit further includes:

the third standardization processing subunit is used for calculating the similarity between the acquired answer information and each piece of candidate answer information stored in the candidate answer slot if the type of the question information corresponding to the acquired answer information is determined to be a selection type question according to the question type slot; and determining standard answer information from the candidate answer information according to the similarity.

Data acquisition equipment, comprising a memory and a processor;

the memory is used for storing programs;

the processor is configured to execute the program to implement the steps of the data acquisition method.

A readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the data acquisition method as described above.

It can be seen from the foregoing technical solutions that, in the data acquisition method provided in the embodiments of the present application, a question-answer node set corresponding to a target project is obtained, where the set includes question-answer nodes corresponding to the target project and each question-answer node includes question information. Question-answer nodes are selected from the set and the question information they contain is output, so that a user can feed back answer information for that question information; the fed-back answer information is then acquired as the answer information corresponding to the question-answer node. Because the question-answer node set contains the question information corresponding to the target project, the machine can acquire data automatically on the basis of that question information. The loss of acquired data caused by questions being missed during manual acquisition does not occur, and the efficiency of machine acquisition is greatly improved compared with manual acquisition.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 is a flowchart of a data acquisition method disclosed in an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a data acquisition device according to an embodiment of the present application;

Fig. 3 is a block diagram of a hardware structure of data acquisition equipment disclosed in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The data acquisition scheme provided by the embodiments of the present application can be applied to data acquisition equipment, that is, an intelligent terminal such as a computer, a mobile phone, or a server. The intelligent terminal can exchange data with the user, and the modes of interaction include, but are not limited to, voice, text, and images. The scheme of this embodiment can be used for any target project conducted in question-and-answer form, such as a case collection project, an interrogation content collection project, or an interview data collection project. It should be noted that the purpose of a case collection project is to obtain a case record, which is not a disease diagnosis result but supporting material that assists a doctor in diagnosing disease.

Next, the data acquisition method of an embodiment of the present application is described with reference to Fig. 1. As shown in Fig. 1, the method may include:

Step S100, obtaining a question-answer node set corresponding to a target project to be subjected to data acquisition.

The question-answer node set comprises question-answer nodes corresponding to the target project, and the question-answer nodes comprise question information.

As previously mentioned, the target project may be any project that requires data acquisition in question-and-answer form. Different target projects have different question-answer node sets. The question-answer node set comprises all question-answer nodes corresponding to the target project, and each question-answer node comprises corresponding question information. The question information can be understood as the description of a question, such as "whether abdominal pain occurs" or "whether chest pain is continuous or intermittent". The question-answer node set includes at least one question-answer node, and generally includes several.

Taking a case collection project as an example of the target project, the target project can be further subdivided into sub-projects; for example, it can be divided by clinical department into case collection sub-projects for the different departments.

The question information that needs to be collected is determined in advance for each target project, and the question-answer node set corresponding to the target project is constructed on the basis of that question information. Once the target project requiring data acquisition is determined, the pre-generated question-answer node set corresponding to it can be obtained directly.
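A question-answer node set of this kind can be pictured as a simple data structure. The following Python is an illustrative sketch only: the class name `QANode`, its field names, and the type labels `"yes_no"` and `"selection"` are assumptions made for the example, and every slot other than the question information is optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QANode:
    """One question-answer node; only `question` is mandatory."""
    question: str                        # question information
    question_type: Optional[str] = None  # question type slot (hypothetical labels)
    candidates: List[str] = field(default_factory=list)  # candidate answer slot
    next_index: Optional[int] = None     # next question-answer node slot
    answer: Optional[str] = None         # filled in during acquisition


def build_node_set() -> List[QANode]:
    """Return a pre-generated node set for a hypothetical case collection project."""
    return [
        QANode("Do you have abdominal pain?", "yes_no", next_index=1),
        QANode("Is the chest pain continuous or intermittent?", "selection",
               candidates=["continuous", "intermittent"]),
    ]


node_set = build_node_set()
```

Obtaining the set in step S100 then amounts to looking up the pre-built list for the chosen target project.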

Step S110, selecting question and answer nodes from the question and answer node set, and outputting question information contained in the selected question and answer nodes.

Specifically, question-answer nodes may be selected from the question-answer node set, and the question information included in the selected question-answer nodes may be output. Nodes may be selected one at a time, or several nodes may be selected at once.

It will be appreciated that the question information may take various forms, such as text or voice. If the question information is in text form, it can be output directly as text, for example displayed on a screen for the user to read. Alternatively, the text-form question information may be speech-synthesized and the synthesized voice-form question information output. Specifically, the synthesized voice-form question information can be played through a loudspeaker for the user to listen to.

Further, if the form of the question information is a voice form, the question information may be output in a voice form. In addition, the question information can be subjected to voice transcription, and the transcribed text-form question information can be output.

Of course, the above only illustrates several output forms of the question information. The question information may also be output in other forms, as long as the user is able to learn the question information.

Step S120, obtaining the answer information fed back to the output question information to obtain the answer information corresponding to the question-answer node.

Specifically, the question information included in the selected question and answer node is output in the previous step, and on the basis, the user can feed back answer information according to the output question information. In this step, the answer information fed back to the output question information is obtained, and the answer information corresponds to the output question information, that is, corresponds to the question-answering node where the question information is located, so that the answer information corresponding to the question-answering node can be obtained.

It is understood that the answer information obtained in this step may be in voice form, in text form, or in another form such as an image. Taking the case collection project as an example, the patient can feed back answer information by voice or text, and can also provide an examination or laboratory test report as answer information.

Because question-answer nodes are selected from the question-answer node set in the previous step and the question information they contain is output, repeating this step yields the answer information corresponding to each question-answer node in the set. The answer information corresponding to the question-answer nodes constitutes the data collected for the target project.

According to the data acquisition method provided by the embodiment of the application, the question-answer node set corresponding to the target project is obtained, the set comprises the question information corresponding to the target project, automatic data acquisition by the machine based on the question information is achieved, the loss of acquired data caused by questions being missed during manual acquisition is avoided, and the efficiency of machine acquisition is greatly improved compared with manual acquisition.
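Steps S100 to S120 can be summarized as a short loop. The sketch below is a simplification under stated assumptions: the node set is reduced to a list of question strings, selection is head to tail, and `ask` is a hypothetical callback that outputs a question and returns whatever answer is fed back.

```python
from typing import Callable, Dict, List


def collect(questions: List[str], ask: Callable[[str], str]) -> Dict[str, str]:
    """Run steps S110 and S120 over a node set obtained in step S100."""
    record: Dict[str, str] = {}
    for question in questions:            # S110: select a node, output its question
        record[question] = ask(question)  # S120: obtain the answer fed back
    return record


# Canned answers stand in for a real user in this example.
canned = {
    "Do you have abdominal pain?": "yes",
    "Is the chest pain continuous or intermittent?": "intermittent",
}
record = collect(list(canned), canned.__getitem__)
```

Because the loop walks the whole set, no question is skipped, which is the property the method relies on to avoid missing data.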

In an embodiment of the present application, a process of obtaining the answer information fed back to the output question information in step S120 to obtain the answer information corresponding to the question-answer node is described.

It has been described above that the answer information may be in various forms such as a voice form, an image form, a text form, etc. In order to edit the answer information conveniently, the embodiment converts the acquired answer information in various forms into a text form, and specifically includes:

1) If the answer information is in a voice form, the embodiment acquires the answer information in the voice form fed back to the output question information, and transcribes it into answer information in a text form.

Specifically, to improve the accuracy of voice transcription, voice training data may be obtained in advance for the target project and labeled with the corresponding text content, and a voice transcription model may then be trained on the voice training data and that text content. The trained voice transcription model can subsequently be used to transcribe the answer information into text form.

Taking case collection as the target project, the voice training data may be a collection of voice recordings of real patients answering doctors' questions during consultations.

2) If the answer information is in an image form, the embodiment acquires the answer information in the image form fed back to the output question information, and performs image text recognition on the answer information to recognize the answer information in the text form.

Specifically, the present embodiment may employ OCR (Optical Character Recognition) technology to perform text recognition on the answer information in image form, and obtain the recognized answer information in text form.

3) If the answer information is in a text form, the embodiment directly acquires the answer information in the text form fed back to the output question information.

4) Further, the acquired answer information is standardized to obtain standard answer information corresponding to the question-answer node.

After the answer information in text form is obtained, it can be further standardized to obtain the standard answer information corresponding to the question-answer node.
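The conversion of voice, image, and text answers into text form could be organized as a small dispatch function. This is a sketch under assumptions: `to_text` and the `converters` mapping are hypothetical names, and `fake_transcribe` and `fake_ocr` are placeholders for a trained voice transcription model and an OCR engine, neither of which is implemented here.

```python
from typing import Callable, Dict


def to_text(answer: object, form: str,
            converters: Dict[str, Callable[[object], str]]) -> str:
    """Convert an answer in voice, image, or text form into text."""
    if form == "text":
        return str(answer)           # text answers are taken directly
    if form not in converters:
        raise ValueError(f"unsupported answer form: {form}")
    return converters[form](answer)  # voice -> transcription, image -> OCR


def fake_transcribe(audio: object) -> str:
    """Placeholder for a voice transcription model."""
    return "intermittent"


def fake_ocr(image: object) -> str:
    """Placeholder for image text recognition."""
    return "recognized report text"


converters = {"voice": fake_transcribe, "image": fake_ocr}
```

Passing the converters in as arguments keeps the dispatch logic independent of any particular speech or OCR library.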

In an optional embodiment, the question-answer node may further include a question type slot for storing the type of the question information. In this embodiment, the type of each piece of question information corresponding to the target item may be determined in advance and recorded in the question type slot of the corresponding question-answer node.

The question information may be of various types; common types include yes/no questions, description-type questions, and selection-type questions.

In this embodiment, if it is determined from the question type slot that the question information corresponding to the acquired answer information is a yes/no question, the standard answer information is determined to be positive or negative according to whether the acquired answer information contains positive or negative keywords.

Specifically, this embodiment may compile positive keywords and negative keywords in advance. Positive keywords include, for example: is, has …; negative keywords include, for example: not, none ….

The acquired answer information is matched against the two classes of keywords. If the answer information contains a positive keyword, the standard answer information is determined to be positive; if it contains a negative keyword, the standard answer information is determined to be negative.
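
As a non-limiting illustration, the keyword matching for yes/no questions might be sketched as follows. The keyword lists extend the examples given above with a few hypothetical entries, the function name `normalize_yes_no` is invented for this sketch, and the rule that a negative keyword takes priority when both classes match is an assumption, since the embodiment does not specify how such conflicts are resolved.

```python
import re

# Illustrative keyword lists; only "is", "has", "not", "none" come from the
# description, the remaining entries are hypothetical additions.
POSITIVE_KEYWORDS = {"yes", "is", "has", "have"}
NEGATIVE_KEYWORDS = {"no", "not", "none", "never"}


def normalize_yes_no(answer):
    """Map a free-text answer to 'positive', 'negative', or None."""
    tokens = set(re.findall(r"[a-z']+", answer.lower()))
    # Assumption: negative keywords win, because "is not" contains both classes.
    if tokens & NEGATIVE_KEYWORDS:
        return "negative"
    if tokens & POSITIVE_KEYWORDS:
        return "positive"
    return None  # no keyword matched; the answer cannot be standardized here
```

Matching on whole tokens rather than substrings avoids treating a word such as "nothing" as containing "not".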

Further, if it is determined from the question type slot that the question information corresponding to the acquired answer information is a description-type question, the acquired answer information is used as the standard answer information.

Specifically, for the description-type question, the acquired answer information may be directly used as the standard answer information.

In another optional embodiment, the question-answer node may further include a candidate answer slot for storing candidate answer information matching the question information. Specifically, for some question information the candidate answer information is fixed. For example, for the question information "continuous or intermittent chest pain", the corresponding candidate answer information may include "continuous" and "intermittent".

On this basis, if it is determined that the type of the question information corresponding to the obtained answer information is a selection type question according to the question type slot, the similarity between the obtained answer information and each piece of candidate answer information stored in the candidate answer slot can be calculated.

Further, according to the size of the similarity, standard answer information is determined from the candidate answer information.

Specifically, one candidate answer information with the maximum similarity may be selected as the standard answer information, or topN candidate answer information with the maximum similarity may be selected as the standard answer information.
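
As a non-limiting illustration, choosing the standard answer information from similarity scores might look like the sketch below. The function name `select_standard_answers` and the dictionary layout (candidate answer mapped to its similarity) are assumptions made for this example only.

```python
def select_standard_answers(similarities, top_n=1):
    """Return the top_n candidate answers ranked by descending similarity."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [candidate for candidate, _ in ranked[:top_n]]


# Hypothetical similarity scores for the chest pain example.
scores = {"continuous": 0.82, "intermittent": 0.31}
best = select_standard_answers(scores)             # single best candidate
top_two = select_standard_answers(scores, top_n=2)  # topN variant
```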

Optionally, the similarity between the answer information and each piece of candidate answer information may be calculated as follows:

A. Perform word segmentation on the answer information and on the candidate answer information, respectively.

Word segmentation may be performed with a word segmentation model. Specifically, word segmentation labels are added to answer information training data corresponding to the target item, and a word segmentation model is trained on the labeled results. The trained word segmentation model is then used to segment the answer information and the candidate answer information, respectively.

B. Remove stop words from the segmentation results of the answer information and the candidate answer information to obtain processed answer information and processed candidate answer information.

C. Calculate the semantic similarity between the processed answer information and the processed candidate answer information.

Specifically, the word vector of each word in the processed answer information and the word vector of each word in each piece of processed candidate answer information are obtained from a word vector model. A vector distance is then calculated from these word vectors and used as the similarity between the processed answer information and the processed candidate answer information.

In this step, semantic similarity between the processed answer information and each processed candidate answer information is obtained.
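
As a non-limiting illustration, the segmentation, stop-word removal, and similarity calculation might be sketched as follows. Several simplifications are assumed here and are not taken from the embodiment: whitespace splitting stands in for the trained word segmentation model, the stop-word list and the three-dimensional word vectors are toy values, and the vector distance is taken to be the cosine similarity of averaged word vectors.

```python
import math

# Toy stop-word list and word vectors, invented for illustration only.
STOP_WORDS = {"the", "is", "it", "a", "of"}
WORD_VECTORS = {
    "pain": [0.9, 0.1, 0.0],
    "continuous": [0.1, 0.9, 0.1],
    "constant": [0.2, 0.8, 0.1],
    "intermittent": [0.1, 0.1, 0.9],
}


def preprocess(text):
    """Whitespace segmentation followed by stop-word removal."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def sentence_vector(words):
    """Average the vectors of the words found in the lookup table."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def semantic_similarity(answer, candidate):
    """Cosine similarity between the averaged word vectors of both texts."""
    a = sentence_vector(preprocess(answer))
    b = sentence_vector(preprocess(candidate))
    if a is None or b is None:
        return 0.0  # no known words, so no evidence of similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With these toy vectors, the answer "the pain is constant" scores higher against "continuous" than against "intermittent".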

In another embodiment of the present application, taking the target item as case collection as an example, a process of acquiring, in step S100, a question-answer node set corresponding to the target item to be subjected to data collection is described, where the process may include:

S1, acquiring symptom terms related to department diseases according to the department diseases corresponding to the case collection item.

The case collection project can correspond to a plurality of departments, such as internal medicine, surgery and the like. The disease of each department can be predetermined, so that the symptom term related to the department disease can be obtained according to the department disease corresponding to the case collection item in the step.

Specifically, a set of symptom terms associated with a department disease may be obtained from medical resource data by a data mining method. The medical resource data includes medical books and other medical data available on the network. Optionally, the relevant diseases are first obtained, for example from medical textbooks by department name. The description content related to each disease is then extracted from the medical resource data. Finally, symptom terms are labeled in the extracted description content by a sequence labeling method to obtain the symptom term set.

Symptom terms include, for example: headache, fever, abdominal pain, etc.

Optionally, for the acquired symptom term set, a frequent itemset algorithm may be employed to obtain the topM symptom terms that co-occur most frequently with the department disease.

S2, collecting question and answer data related to the symptom terms from medical question and answer resources, and organizing the question and answer data into question information and answer information.

S3, converting the organized question information into nodes, and forming a question-answer node set from the resulting nodes according to a preset inquiry flow.

Specifically, each question information may correspond to one question-answer node, and the question-answer nodes may be grouped into a question-answer node set according to the inquiry flow.

Optionally, in this embodiment, a question type slot may be set in the question-answering node, and the type corresponding to the question information is filled in the question type slot.

Further optionally, in this embodiment, a candidate answer slot may be further set in the question-answering node, and the answer information corresponding to the question information is filled into the candidate answer slot.

Still further optionally, since the inquiry flow is determined, the order of the question-answer nodes can also be determined. A next question-answer node slot may therefore be set in each question-answer node and filled with the index of the node that follows the current question-answer node according to the inquiry flow, so that the next question-answer node can later be determined from this slot.

The question-answer node set generated in this embodiment may be stored in a list form, or may be stored in a tree structure, and the storage form is not particularly limited.
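
As a non-limiting illustration, a question-answer node with the three optional slots, stored in list form, might be represented as in the sketch below. The class name `QANode`, its field names, the type labels, and the sample questions are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QANode:
    question: str                        # question information
    question_type: Optional[str] = None  # question type slot
    candidate_answers: List[str] = field(default_factory=list)  # candidate answer slot
    next_index: Optional[int] = None     # next question-answer node slot
    answer: Optional[str] = None         # filled in during data collection


# A question-answer node set in list form; list positions serve as node indexes.
node_set = [
    QANode("Do you have chest pain?", "yes_no", next_index=1),
    QANode("Is the chest pain continuous or intermittent?", "selection",
           ["continuous", "intermittent"], next_index=2),
    QANode("Please describe any other symptoms.", "description"),
]
```

A tree structure could be obtained from the same class by letting the next-node slot depend on the answer; that variant is not shown here.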

In another embodiment of the present application, a process of selecting question and answer nodes from the question and answer node set in step S110 is described.

The embodiment of the application discloses several different ways of selecting question and answer nodes from a question and answer node set, and each implementation way is respectively introduced as follows:

The first way:

A query order may be preset for the question-answer nodes corresponding to the target item. In this embodiment, the question-answer nodes may then be selected from the question-answer node set according to the preset query order.

Specifically, the preset query sequence may be embodied in various forms, such as:

1) The order of the question-answer nodes in the question-answer node set is kept consistent with the query order. On this basis, the question-answer nodes can be selected from first to last according to their order in the question-answer node set.

2) As described above for the generation of the question-answer node set, a question-answer node may include a next question-answer node slot for storing the index of the next question-answer node determined according to the query order. For example, suppose the question-answer nodes are ordered A-B-C-D according to the query order. The index of question-answer node B may then be filled into the next question-answer node slot of question-answer node A. Question-answer nodes B, C, and D are handled similarly.

With this arrangement, when the answer information corresponding to the currently selected question-answer node has been obtained and it is determined that a next question-answer node is to be selected, the question-answer node whose index is stored in the next question-answer node slot of the currently selected question-answer node is taken as the next question-answer node.
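
As a non-limiting illustration, following the next question-answer node slot through the A-B-C-D example might be sketched as follows. The dictionary layout and the function name `traverse` are assumptions made for this example; an empty slot (`None`) is taken to mean that the query order ends at that node.

```python
# Each node stores its question and the index of the next node (None = end).
nodes = {
    "A": {"question": "Question A", "next": "B"},
    "B": {"question": "Question B", "next": "C"},
    "C": {"question": "Question C", "next": "D"},
    "D": {"question": "Question D", "next": None},
}


def traverse(nodes, start):
    """Yield node indexes by following the next question-answer node slot."""
    current = start
    while current is not None:
        yield current
        current = nodes[current]["next"]


order = list(traverse(nodes, "A"))
```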

The second way:

For some types of target items, the query order of the corresponding question-answer nodes cannot be predetermined. The next question-answer node has to be determined from the question-answer nodes that have already been traversed. For such target items, this embodiment provides a scheme that predicts the next question-answer node with a deep neural network model, as detailed below:

1) For each selected question-answer node in the question-answer node set, determine the node features of the question-answer node according to its question information and answer information.

Specifically, the currently selected question-answer node and the question-answer nodes selected before it are defined as the selected question-answer nodes. For each selected question-answer node, its node features are determined according to its question information and answer information.

Optionally, a node coding model may be used to determine the node features of the question-answer nodes.

Specifically, an item that applies the data collected by the target item is defined as a third-party item. Taking a case collection item as the target item, the disease type can be determined from the collected case data, so disease diagnosis can serve as the third-party item. As another example, if the target item is an interrogation content collection item, a criminal determination can be made from the collected interrogation content, so criminal determination can serve as the third-party item.

Based on this, the node coding model may be a model that takes the question information and answer information of the target item as input data, extracts features from the input data, and predicts the result of the third-party item from the extracted features. The node coding model may take the form of a bidirectional long short-term memory (LSTM) neural network or another form of model.

Based on the node coding model, the process of determining the node characteristics of the question-answering node may include:

a. inputting the question information and the answer information of the question-answer node into a preset node coding model as input data;

b. acquiring the features that the node coding model extracts from the input data as the node features of the question-answer node.

In addition, other ways of determining the node characteristics of the question-answering node can be used. For example, a word vector set corresponding to question information and answer information of the question-answer nodes is determined, and the word vector set is used as node characteristics of the question-answer nodes.

2) And combining the node characteristics of each selected question and answer node into a node characteristic set according to the selection sequence.

Specifically, the node features of the question and answer nodes may be in the form of feature vectors, and then the feature vectors of each selected question and answer node may be combined in this step, and combined into a feature vector matrix according to the sequence of selection.

3) And inputting the node feature set into a preset node selection model to obtain an index of a next question and answer node output by the node selection model.

Specifically, the node selection model may be trained in advance. During training, a node feature training data set, formed by combining the node feature training data of the selected question-answer nodes of the target item in selection order, is used as a training sample, and the labeled index of the next question-answer node to be selected is used as the sample label. The node selection model may take the form of a unidirectional long short-term memory (LSTM) network or another form of model.

Based on the trained node selection model, the node feature set can be input into the model to obtain the index of the next question-answer node output by the model.

The output of the node selection model may be a vector whose number of dimensions equals the number of question-answer nodes in the question-answer node set, with each dimension corresponding to exactly one question-answer node in the set. In the vector output by the node selection model, the dimensions corresponding to the already selected question-answer nodes are removed, the dimension with the maximum value is found among the remaining dimensions, and the index of the question-answer node corresponding to that dimension is used as the index of the next question-answer node.
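
As a non-limiting illustration, the post-processing of the model output (excluding already selected nodes and taking the maximum of the rest) might be sketched as follows. The function name `pick_next_node` and the example scores are assumptions; the scores stand in for whatever the trained node selection model would output.

```python
def pick_next_node(scores, selected):
    """Return the index of the highest-scoring node not yet selected.

    `scores` has one entry per node in the question-answer node set;
    `selected` holds the indexes of nodes that have already been asked.
    """
    remaining = [i for i in range(len(scores)) if i not in selected]
    if not remaining:
        return None  # every node has already been selected
    return max(remaining, key=lambda i: scores[i])


# Hypothetical model output for a set of four nodes; node 1 was already asked.
next_index = pick_next_node([0.10, 0.70, 0.15, 0.05], selected={1})
```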

The method for predicting the next question-answer node based on the model provided by the embodiment considers the node characteristics of each selected question-answer node, and combines the node selection model trained according to the training data, so that the index of the next question-answer node can be accurately predicted.

The third way:

For some types of target items, the query order among some of the question-answer nodes in the corresponding question-answer node set can be predetermined, while the query order among the other question-answer nodes cannot. In this case, the two schemes above may be combined, which may specifically include:

S1, when the answer information corresponding to the currently selected question-answer node has been obtained and it is determined that a next question-answer node is to be selected, judging whether the index of the next question-answer node is stored in the next question-answer node slot of the currently selected question-answer node; if so, executing S2; otherwise, executing S3.

Specifically, if the index of the next question-answer node is stored in the next question-answer node slot of the question-answer node, this indicates that the next question-answer node can be determined according to the predetermined query order; otherwise, it cannot be so determined and may instead be predicted with the node selection model.

S2, taking the question-answer node whose index is stored in the next question-answer node slot of the currently selected question-answer node as the next question-answer node.

S3, for each selected question-answer node in the question-answer node set, determining the node features of the question-answer node according to its question information and answer information.

S4, combining the node features of the selected question-answer nodes into a node feature set in selection order.

S5, inputting the node feature set into a preset node selection model to obtain the index of the next question-answer node output by the node selection model.
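
As a non-limiting illustration, the combined selection logic of S1 to S5 might be sketched as follows. The function name `choose_next`, the dictionary layout of the nodes, and the callable `predict_scores` are assumptions; `predict_scores` is a hypothetical stand-in that hides feature extraction and the node selection model behind a single call returning one score per node.

```python
def choose_next(nodes, current, selected, predict_scores):
    """Use the next-node slot when it is filled, otherwise ask the model.

    `nodes` is a list of dicts with an optional "next" index;
    `selected` lists the indexes of the nodes asked so far.
    """
    stored = nodes[current].get("next")
    if stored is not None:
        # S1 -> S2: the query order was predetermined for this node.
        return stored
    # S3 to S5: features of the selected nodes -> model -> one score per node.
    scores = predict_scores(selected)
    remaining = [i for i in range(len(nodes)) if i not in selected]
    return max(remaining, key=lambda i: scores[i]) if remaining else None


# Hypothetical node set: only node 0 has a predetermined successor.
example_nodes = [{"next": 1}, {"next": None}, {}, {}]


def fake_model(selected):
    """Placeholder for the trained node selection model."""
    return [0.0, 0.1, 0.2, 0.9]
```

In this example, node 0 leads to node 1 through its slot, while the successor of node 1 is taken from the placeholder model's scores.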

The data acquisition device provided by the embodiment of the present application is described below, and the data acquisition device described below and the data acquisition method described above may be referred to correspondingly.

Referring to fig. 2, fig. 2 is a schematic structural diagram of a data acquisition device disclosed in the embodiment of the present application. As shown in fig. 2, the apparatus may include:

a question-answer node set acquisition unit 11, configured to acquire a question-answer node set corresponding to a target project to be subjected to data acquisition, wherein the question-answer node set comprises question-answer nodes corresponding to the target project, and each question-answer node comprises question information;

a question-answer node selecting unit 12, configured to select a question-answer node from the question-answer node set;

a question information output unit 13, configured to output the question information included in the selected question-answer node;

and an answer information acquiring unit 14, configured to acquire answer information fed back in response to the output question information, and obtain the answer information corresponding to the question-answer node.

Optionally, the question and answer node selecting unit may include:

Optionally, the sequentially selecting unit may include:

Optionally, the question-answer node may further include a next question-answer node slot, configured to store an index of a next question-answer node of the question-answer nodes determined according to the question-answer sequence. Based on this, the sequentially selecting unit may include:

Optionally, the question and answer node selecting unit may include:

Optionally, the question-answer node may further include a next question-answer node slot, which is used to store an index of the next question-answer node. Based on this, the question answering node selecting unit may further include:

Optionally, the node characteristic determining unit may include:

Optionally, the question information output unit may include:

Optionally, the answer information obtaining unit may include:

Optionally, the question-answering node may further include a question type slot for storing the type of the question information. Based on this, the normalization processing unit may include:

Optionally, the question-answering node may further include a candidate answer slot for storing candidate answer information matched with the question information. Based on this, the normalization processing unit may further include:

The data acquisition device provided by the embodiment of the application can be applied to data acquisition equipment, such as a PC terminal, a cloud platform, a server cluster and the like. Optionally, fig. 3 shows a block diagram of a hardware structure of the data acquisition device. Referring to fig. 3, the hardware structure of the data acquisition device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;

in the embodiment of the application, the number of each of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 communicate with one another through the communication bus 4;

the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention;

the memory 3 may include a high-speed RAM memory, and may further include a non-volatile memory, such as at least one disk memory;

wherein the memory stores a program and the processor can call the program stored in the memory, the program for:

Optionally, the detailed functions and extended functions of the program may be as described above.

Embodiments of the present application further provide a readable storage medium, where a program suitable for being executed by a processor may be stored, where the program is configured to:

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of data acquisition, comprising:

2. The method of claim 1, wherein the question and answer node further comprises a next question and answer node slot for storing an index of a next question and answer node;

3. The method according to claim 1, wherein the determining the node characteristics of the question-answering node according to the question information and the answer information of the question-answering node comprises:

4. The method according to claim 1, wherein the obtaining of answer information fed back to the output question information to obtain answer information corresponding to the question-answer node comprises:

5. The method of claim 4, wherein the question and answer node further comprises a question type slot for storing the type of question information;

6. The method according to claim 5, wherein the question answering node further comprises a candidate answer slot for storing candidate answer information matching the question information;

7. The method according to any one of claims 1 to 6, wherein the target items include any one or more of a case collection item, an interview content collection item, and an interview data collection item.

8. The method according to claim 7, wherein the target item is a case collection item, and the generation process of the question-answering node set corresponding to the target item includes:

9. A data acquisition device, comprising:

10. The apparatus according to claim 9, wherein the answer information acquisition unit includes:

11. A data acquisition device comprising a memory and a processor;

the memory is used for storing programs;

the processor, configured to execute the program, implementing the steps of the data acquisition method according to any one of claims 1 to 8.

12. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the data acquisition method according to any one of claims 1 to 8.