Detailed Description
The present application relates to an artificial intelligence technology, and in order to make the objects, technical solutions, and advantages of the present application more clearly understood, the present application will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The depression grade testing system provided by the application can be applied to the application environment shown in fig. 1. When a user needs to monitor his or her own degree of depression in real time, answer sheet data and answer video data are provided to a depression grade testing system arranged in a terminal through interaction with the terminal 104. The depression grade testing system first obtains the answer sheet data and the answer video data of the user to be evaluated through a questionnaire data acquisition module; the answer score is then counted according to the answer sheet data through a video data processing module, and video frame extraction is carried out on the answer video data to obtain a video frame set corresponding to the answer video data; each video frame in the video frame set is then input into a trained face analysis model through a model analysis module to obtain the expression category and eyeball identification data of each video frame, and the parameters to be evaluated of the user to be evaluated are obtained according to the answer score and the expression category and eyeball identification data of each video frame; finally, the parameters to be evaluated are input into a trained depression grade test model through a model evaluation module to obtain a depression grade test result. The terminal 104 may be, but is not limited to, various personal computers, notebook computers, smart phones, and tablet computers.
In one embodiment, as shown in fig. 2, a depression rating test system is provided, which is illustrated by taking the application of the system to the terminal in fig. 1 as an example, and includes a questionnaire data acquisition module 202, a video data processing module 204, a model analysis module 206, and a model evaluation module 208;
the questionnaire data acquisition module 202 acquires answer sheet data and answer video data of the user to be evaluated.
The answer sheet data refers to data answered by the user to be evaluated for the questionnaire, namely answers of the questionnaire. The answer video data refers to the facial information of the user to be evaluated, which is acquired in real time when the user to be evaluated answers, namely the answer video of the user to be evaluated, which is recorded in the answer process of the user to be evaluated.
Specifically, the questionnaire data acquisition module acquires answer sheet data and answer video data of the user to be evaluated. For example, the answer sheet data may be answers submitted online by the user to be evaluated through a terminal provided with the depression grade test system, and the answer video data may be a video recorded online during the answering process of the user to be evaluated.
For example, the questionnaire may specifically contain 7 items: 1. being unable to fall asleep, unable to stay asleep, or sleeping too much; 2. feeling fatigued and lacking energy; 3. having no appetite or eating excessively; 4. being dissatisfied with oneself (feeling that one is a failure), or feeling that one has let oneself or one's family down; 5. being unable to concentrate, for example when reading or watching television; 6. moving or speaking slowly enough to attract the attention of others, or being restless due to fidgetiness; 7. thinking that death or hurting oneself is a solution. Each item has four answer options, respectively: A. not at all; B. a few days; C. more than half the days; D. almost every day.
The video data processing module 204 counts the answer score according to the answer sheet data, and performs video frame extraction on the answer video data to obtain a video frame set corresponding to the answer video data.
The answer score is the score corresponding to the answers given by the user to be evaluated to the questionnaire. Specifically, each answer option has a corresponding score, and the answer score can be obtained by summing the scores corresponding to the answers. For example, when each answer has four options, the four options are assigned scores of 1, 2, 3 and 4 according to the correspondence between the options and the severity of depression, and the scores corresponding to the answers are summed to obtain the answer score.
Specifically, the video data processing module counts the answer scores according to the answer questionnaire data and preset scores corresponding to the answers, and performs video frame extraction on the answer video data to obtain a video frame set corresponding to the answer video data.
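As a purely illustrative sketch of the scoring and frame-extraction step described above (not part of the claimed implementation), the following Python code sums preset option scores and samples frames from the answer video; the option-to-score table, the sampling interval and the use of OpenCV are assumptions for illustration only.

```python
import cv2  # OpenCV, assumed here only to illustrate frame extraction

# Hypothetical score table: options A-D mapped to 1-4 as described above.
OPTION_SCORES = {"A": 1, "B": 2, "C": 3, "D": 4}

def count_answer_score(answers):
    """Sum the preset score of each selected option to obtain the answer score."""
    return sum(OPTION_SCORES[a] for a in answers)

def extract_video_frames(video_path, sample_every_n=5):
    """Read the answer video and keep every n-th frame as the video frame set."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % sample_every_n == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames

# Example usage with answers to the 7-item questionnaire described above.
score = count_answer_score(["B", "C", "A", "D", "B", "A", "C"])  # -> 16
```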
The model analysis module 206 inputs each video frame in the video frame set into the trained facial analysis model to obtain the expression category and eyeball identification data of each video frame, and obtains the parameters to be evaluated of the user to be evaluated according to the answer score and the expression category and eyeball identification data of each video frame.
The trained face analysis model is used for analyzing face information of the user to be evaluated, and comprises a face recognition network, an expression classification network and an eyeball recognition network. The face recognition network is used for recognizing a face image from a video frame; for example, the face recognition network may specifically be a network that performs face detection using skin color, or a pre-trained target detection network. The expression classification network is used for classifying the expression of the recognized face image to obtain the expression category. The eyeball recognition network is used for carrying out eyeball positioning on the recognized face image to obtain eyeball identification data. For example, the expression categories may be seven categories, specifically angry, disgust, fear, worry, hurry, surprise, and normal. The parameters to be evaluated refer to parameters used for evaluating the degree of depression of the user to be evaluated.
Specifically, the model analysis module inputs each video frame in the video frame set into a trained face analysis model, performs face recognition on each video frame through a face recognition network in the face analysis model to obtain a face image corresponding to each video frame, inputs the face image into an expression classification network and an eyeball recognition network in the face analysis model respectively, performs expression classification on the face image through the expression classification network to obtain an expression category of the video frame, and performs eyeball positioning on the face image through the eyeball recognition network to obtain eyeball recognition data.
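The per-frame analysis flow described above may be sketched as follows; the three network callables are hypothetical stand-ins for the trained face recognition, expression classification and eyeball recognition networks and do not represent the actual implementation.

```python
def analyze_video_frames(frames, face_net, expression_net, eyeball_net):
    """Run each frame through face recognition, then expression classification
    and eyeball recognition. The *_net arguments are hypothetical callables
    wrapping the trained sub-networks of the facial analysis model."""
    expression_categories = []
    eyeball_records = []
    for frame in frames:
        face_image = face_net(frame)           # face recognition: crop the face region
        if face_image is None:                 # no face found in this frame
            expression_categories.append(None)
            eyeball_records.append(None)
            continue
        expression_categories.append(expression_net(face_image))  # e.g. "normal"
        eyeball_records.append(eyeball_net(face_image))           # eyeball identification data
    return expression_categories, eyeball_records
```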
Specifically, the model analysis module calculates a first time proportion of the video frames corresponding to each expression category to the answer video data according to the expression category of each video frame, determines the eyeball gaze direction and the number of blinks of the user for each video frame according to the eyeball identification data, calculates a second time proportion of the video frames corresponding to each eyeball gaze direction to the answer video data according to the eyeball gaze direction of each video frame, and obtains the parameters to be evaluated of the user to be evaluated according to the answer score, the first time proportion, the second time proportion and the number of blinks of the user.
The model evaluation module 208 inputs the parameters to be evaluated into the trained depression grade test model to obtain a depression grade test result.
Wherein the depression grade test result is used for characterizing the degree of depression of the user. Specifically, the depression grade test results include no depression, moderate depression, major depression and the like.
Specifically, the model evaluation module inputs the parameters to be evaluated into the trained depression grade test model to obtain a depression grade test label, the depression grade test label can characterize the degree of depression, and the model evaluation module analyzes the depression grade test label to obtain a depression grade test result.
The depression grade test system obtains answer sheet data and answer video data of the user to be evaluated through the questionnaire data acquisition module; counts the answer score according to the answer sheet data and extracts video frames from the answer video data through the video data processing module to obtain a video frame set corresponding to the answer video data; inputs each video frame in the video frame set into a trained face analysis model through the model analysis module to obtain the expression category and eyeball identification data of each video frame, and obtains the parameters to be evaluated of the user to be evaluated according to the answer score and the expression category and eyeball identification data of each video frame; and inputs the parameters to be evaluated into a trained depression grade test model through the model evaluation module to obtain a depression grade test result. The system can thus test the user's depression grade according to the answer sheet data and the answer video data, so that the user can monitor his or her own degree of depression in real time simply by providing answer sheet data and answer video data.
In one embodiment, as shown in fig. 3, the system further includes a facial analysis model training module 302, where the facial analysis model training module obtains sample face data carrying expression category labels and eyeball identification feature point labels, and trains the initial facial analysis model according to the sample face data to obtain a trained facial analysis model.
The initial face analysis model comprises a face recognition network, an initial expression classification network and an initial eyeball recognition network. The expression category label is a label representing a preset expression category. The eyeball identification feature point label is a label for positioning the eyeball identification feature point, namely, a label of the eyeball identification feature point.
Specifically, the facial analysis model training module acquires sample face data carrying expression category labels and eyeball identification feature point labels, first trains the face recognition network using the sample face data, and then takes the sample face data as input and the expression category labels corresponding to the sample face data as supervised learning labels to carry out supervised training on the expression classification network in the initial facial analysis model, obtaining a trained expression classification network. Meanwhile, the facial analysis model training module takes the sample face data as input and the eyeball identification feature point labels corresponding to the sample face data as supervised learning labels to carry out supervised training on the initial eyeball recognition network in the initial facial analysis model, obtaining a trained eyeball recognition network.
Further, when supervised training is carried out on the expression classification network in the initial facial analysis model, feature extraction is carried out on sample face data through a feature extraction network in the expression classification network to obtain a feature map corresponding to the sample face data, convolution processing is carried out on the feature map through a convolution network in the expression classification network to obtain the probability that the sample face data belongs to each preset expression class, an expression class analysis result is output according to the probability that the sample face data belongs to each preset expression class, and finally network parameters of the expression classification network are adjusted through comparison of the expression class analysis result and the expression class label to obtain the trained expression classification network.
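As a minimal, hypothetical sketch of such supervised training (the actual network structure is not limited by this example), an expression classifier with a feature extraction stage followed by a classification stage could be trained as follows, assuming 48x48 grayscale face crops and seven preset expression categories:

```python
import torch
import torch.nn as nn

class ExpressionClassifier(nn.Module):
    """Illustrative expression classification network: a small feature
    extraction stage followed by a classification head."""
    def __init__(self, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(                       # "feature extraction network"
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(                      # produces per-class scores
            nn.Flatten(), nn.Linear(32 * 12 * 12, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

def train_expression_net(model, loader, epochs=10, lr=1e-3):
    """Supervised training with expression category labels as targets."""
    criterion = nn.CrossEntropyLoss()                         # compares predictions with labels
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for faces, labels in loader:                          # sample face data + category labels
            optimizer.zero_grad()
            loss = criterion(model(faces), labels)            # adjust parameters by comparison
            loss.backward()
            optimizer.step()
    return model
```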
In this embodiment, a trained facial analysis model can be obtained by acquiring sample face data carrying expression category labels and eyeball identification feature point labels and training the initial facial analysis model according to the sample face data.
In one embodiment, the model analysis module is further configured to input each video frame in the video frame set into a face recognition network in a trained face analysis model to obtain a face image corresponding to each video frame, input the face image into an expression classification network in the trained face analysis model to obtain an expression category of the video frame, input the face image into an eyeball recognition network in the trained face analysis model, determine eyeball recognition feature points through the eyeball recognition network, and position eyeballs according to the eyeball recognition feature points to obtain eyeball recognition data.
The eyeball identification feature points comprise the upper eyelid, the lower eyelid, the pupil and the like, and the eyeball can be positioned through the eyeball identification feature points. The eyeball identification data include whether an eyeball is identified, the position of the identified eyeball in the face image, and the like. Further, when no eyeball is identified in a frame, it indicates that the user is blinking, so the number of blinks of the user can be obtained by counting the frames in which no eyeball is identified.
Specifically, the model analysis module inputs each video frame in the video frame set into a face recognition network in a trained face analysis model to obtain a face image corresponding to each video frame, inputs the face image into an expression classification network in the trained face analysis model, performs feature extraction on the face image through a feature extraction network in the expression classification network to obtain a face feature map corresponding to the face image, performs convolution processing on the face feature map through a convolution network in the expression classification network to obtain probabilities that the face image belongs to each preset expression category, sorts the probabilities that the face image belongs to each preset expression category, selects the preset expression category corresponding to the maximum probability value as the expression category corresponding to the face image, and obtains the expression category of the video frame. Meanwhile, the model analysis module inputs the face image into an eyeball recognition network in the trained face analysis model, eyeball recognition characteristic points are determined through the eyeball recognition network, and eyeballs are positioned according to the eyeball recognition characteristic points to obtain eyeball recognition data.
In this embodiment, the expression category of the video frame is obtained by inputting the face image into the expression classification network in the trained face analysis model, the eyeball recognition feature points are determined by the eyeball recognition network, and the eyeball is positioned according to the eyeball recognition feature points to obtain the eyeball recognition data, so that the expression category and eyeball recognition data of each video frame can be acquired.
In one embodiment, the model analysis module is further configured to count a first time proportion of the video frames corresponding to each expression category to the answer video data according to the expression category of each video frame, determine an eyeball gaze direction and a user blinking number of each video frame according to the eyeball identification data, count a second time proportion of the video frames corresponding to each eyeball gaze direction to the answer video data according to the eyeball gaze direction of each video frame, and obtain a to-be-evaluated parameter of the to-be-evaluated user according to the answer score, the first time proportion, the second time proportion and the user blinking number.
The eyeball gaze direction includes the front direction, the lower left direction, the lower right direction, the upper left direction, the upper right direction, and the like. When the eyeball gaze direction is determined, the up-down direction and the left-right direction are judged separately: the eyeball gaze direction can be determined from the position of the pupil within the eyeball, and a threshold can be set to decide the deviation tendency of the pupil. The number of blinks of the user can be determined by whether the eyeball is recognized; when the eyeball is not recognized, it indicates that the user is blinking at that moment.
For example, when the eyeball gaze direction is directly ahead, the pupil should be located at the center of the eyeball, so the distances between the pupil and the upper and lower boundaries of the eyeball should be the same, and the distances between the pupil and the left and right boundaries of the eyeball should also be the same. The deviation tendency of the pupil in the up-down direction can therefore be determined by the distances between the pupil and the upper and lower boundaries of the eyeball, the deviation tendency of the pupil in the left-right direction can be determined by the distances between the pupil and the left and right boundaries of the eyeball, and the eyeball gaze direction can be determined according to the deviation tendency of the pupil in the up-down direction and the deviation tendency of the pupil in the left-right direction.
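A minimal sketch of this threshold-based judgment is given below; the 0.15 relative threshold and the pixel-coordinate convention are assumed values for illustration only.

```python
def gaze_direction(pupil_x, pupil_y, eye_left, eye_right, eye_top, eye_bottom,
                   threshold=0.15):
    """Estimate the gaze direction from the pupil's offset inside the eyeball
    region. Coordinates are in image pixels; the pupil is treated as centred
    when its relative offset stays within the assumed threshold."""
    width = eye_right - eye_left
    height = eye_bottom - eye_top
    # Relative offsets in [-0.5, 0.5]; 0 means the pupil sits at the centre.
    dx = (pupil_x - eye_left) / width - 0.5
    dy = (pupil_y - eye_top) / height - 0.5
    horizontal = "left" if dx < -threshold else "right" if dx > threshold else ""
    vertical = "upper" if dy < -threshold else "lower" if dy > threshold else ""
    if not horizontal and not vertical:
        return "front"
    return (vertical + " " + horizontal).strip()   # e.g. "lower left", "upper right"
```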
Specifically, the model analysis module calculates a first time proportion of the video frames corresponding to each expression category to the answer video data according to the expression category of each video frame, obtains the eyeball feature data corresponding to each video frame and the number of video frames in which an eyeball is recognized according to the eyeball recognition data, determines the eyeball gaze direction of each video frame according to the eyeball feature data, obtains the number of blinks of the user according to the number of video frames in which an eyeball is recognized, calculates a second time proportion of the video frames corresponding to each eyeball gaze direction to the answer video data according to the eyeball gaze direction of each video frame, and takes the answer score, the first time proportion, the second time proportion and the number of blinks of the user as the parameters to be evaluated of the user to be evaluated.
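The assembly of the parameters to be evaluated described above can be sketched as follows; the data structures and field names are hypothetical and serve only to illustrate how the time proportions and the blink count are counted over the video frame set.

```python
from collections import Counter

def build_parameters(answer_score, expression_categories, gaze_directions,
                     eyeball_recognized_flags):
    """Assemble the parameters to be evaluated: answer score, per-category and
    per-direction time proportions, and blink count. One entry per video frame
    is assumed in each input list."""
    total = len(expression_categories)
    expr_counts = Counter(c for c in expression_categories if c is not None)
    gaze_counts = Counter(d for d in gaze_directions if d is not None)
    first_time_proportion = {c: n / total for c, n in expr_counts.items()}
    second_time_proportion = {d: n / total for d, n in gaze_counts.items()}
    # Frames in which no eyeball was recognized are counted as blinks.
    blink_count = sum(1 for recognized in eyeball_recognized_flags if not recognized)
    return {
        "answer_score": answer_score,
        "first_time_proportion": first_time_proportion,
        "second_time_proportion": second_time_proportion,
        "blink_count": blink_count,
    }
```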
In this embodiment, a first time proportion can be obtained through analysis of expression categories of each video frame, a second time proportion and user blinking times can be obtained through analysis of eyeball identification data, and then, according to the answer score, the first time proportion, the second time proportion and the user blinking times, acquisition of parameters to be evaluated of a user to be evaluated can be achieved.
In one embodiment, the model analysis module is further configured to obtain eyeball feature data and identifiable eyeball video frame numbers corresponding to the video frames according to the eyeball identification data, determine an eyeball gaze direction of each video frame according to a pupil position in the eyeball feature data, and obtain the blink frequency of the user according to the identifiable eyeball video frame numbers.
The pupil position refers to the position of the pupil in the eyeball.
Specifically, the model analysis module obtains the eyeball characteristic data corresponding to each video frame according to the eyeball identification data, where the eyeball characteristic data include the pupil position. The model analysis module determines the deviation tendency of the pupil according to the pupil position, and the eyeball gaze direction of each video frame can be determined according to the deviation tendency of the pupil. The number of video frames in which no eyeball is recognized is determined according to the number of video frames in which an eyeball is recognized and the total number of video frames in the video frame set, and this number is taken as the number of blinks of the user.
In this embodiment, the eyeball characteristic data and the recognizable eyeball video frame number corresponding to each video frame are obtained according to the eyeball identification data, the eyeball gaze direction of each video frame is determined according to the pupil position in the eyeball characteristic data, the user blinking number is obtained according to the recognizable eyeball video frame number, and the acquisition of the eyeball gaze direction and the user blinking number of each video frame can be realized.
In one embodiment, as shown in fig. 3, the system further includes a test model training module 304, where the test model training module obtains sample training data carrying a label of the depression grade, and performs supervised learning training on the initial depression grade test model according to the sample training data to obtain a trained depression grade test model.
The sample training data comprises a parameter set used for evaluating the depression degree of each sample user, and the parameter set comprises parameters such as a first time proportion of a video frame corresponding to each expression category to the answer video data, a second time proportion of a video frame corresponding to each eyeball gaze direction to the answer video data, answer scores and the blinking times of the user. The depression rating label is used to characterize the degree of depression of the sample user. For example, the depression rating label may specifically be no depression. As another example, the depression rating label may specifically be moderate depression. As another example, the depression rating label may specifically be major depression.
Specifically, the test model training module acquires sample training data carrying a depression grade label, constructs a sample training vector of each sample user according to the sample training data carrying the depression grade label, performs supervised learning training on the initial depression grade test model by taking the sample training vector as input and the depression grade label of each sample user as a supervised learning label, and obtains a trained depression grade test model.
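As a hedged illustration, the depression grade test model could be any supervised classifier; the sketch below uses logistic regression purely as an example, and the vector layout (answer score, blink count, seven expression proportions, five gaze-direction proportions) is an assumption rather than a prescribed format.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_sample_vector(sample):
    """Flatten one sample user's parameter set into a fixed-order training vector.
    The proportion fields are assumed to be fixed-order lists (7 expression
    categories, 5 gaze directions) so all vectors have the same length."""
    return np.array(
        [sample["answer_score"], sample["blink_count"]]
        + list(sample["first_time_proportion"])
        + list(sample["second_time_proportion"])
    )

def train_depression_model(samples, labels):
    """Supervised training with depression grade labels (e.g. 0 = no depression,
    1 = depression) as the learning targets."""
    X = np.stack([build_sample_vector(s) for s in samples])
    y = np.array(labels)
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model
```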
In this embodiment, a trained depression grade test model can be obtained by acquiring sample training data carrying depression grade labels and performing supervised learning training on the initial depression grade test model according to the sample training data.
In one embodiment, the model evaluation module is further configured to input the parameter to be evaluated into the trained depression grade test model to obtain a depression grade test label, and determine a depression grade test result according to the depression grade test label.
Wherein the depression rating test label is used to characterize whether the user to be evaluated is depressed or not. For example, the depression rating test label may specifically be 0 and 1, where 0 represents no depression and 1 represents depression.
Specifically, the model evaluation module constructs a corresponding vector to be evaluated according to the parameter to be evaluated, inputs the vector to be evaluated into the trained depression grade test model to obtain a depression grade test label, and determines a depression grade test result according to the depression grade test label.
In this embodiment, the parameter to be evaluated is input into the trained depression grade test model to obtain a depression grade test label, and the depression grade test result is determined according to the depression grade test label, so that the depression grade test result can be determined.
In one embodiment, the model evaluation module is further configured to obtain a depression rating test result as no depression when the depression rating test label is characterized as no depression.
In one embodiment, the model evaluation module is further configured to obtain a depression grade probability corresponding to the depression grade test model when the depression grade test label is characterized as having depression, and determine the depression grade test result according to the depression grade probability.
Specifically, the depression grade probability refers to a probability value generated by the depression grade test model in the process of analyzing the parameters to be evaluated, and can be used to characterize the degree of depression of the user to be evaluated. For example, a correspondence between the depression grade probability and the depression grade test result may be preset; when the depression grade test label is characterized as having depression, the depression grade test result can be determined by obtaining the depression grade probability corresponding to the depression grade test model and querying the preset correspondence according to the depression grade probability. As a further example, the correspondence may specifically be that when the depression grade probability is between 0.5 and 0.75, the corresponding depression grade test result is moderate depression, and when the depression grade probability is greater than 0.75, the corresponding depression grade test result is major depression.
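A small sketch of this mapping, using the example thresholds given above (the cut-offs and label encoding are illustrative, not fixed by this application):

```python
def depression_grade_result(test_label, grade_probability):
    """Map the depression grade test label and grade probability to a result,
    using the example correspondence above (0.5-0.75 -> moderate, >0.75 -> major)."""
    if test_label == 0:                   # label characterized as no depression
        return "no depression"
    if grade_probability > 0.75:
        return "major depression"
    if 0.5 <= grade_probability <= 0.75:
        return "moderate depression"
    return None                           # this range is not specified in the text
```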
In the present embodiment, the determination of the depression level test result can be achieved by acquiring the depression level probability corresponding to the depression level test model and determining the depression level test result according to the depression level probability.
In one embodiment, as shown in fig. 3, the depression rating test system further comprises an information recommending module 306 that recommends depression rating test result information when the depression rating test label is characterized by depression.
Wherein, the depression grade test result information refers to hospital information and doctor information.
Specifically, when the depression grade test label is characterized by depression, the information recommending module recommends the depression grade test result information to the user to be evaluated. Further, the recommended information of the depression level test result may be high-quality hospital information and doctor information that are crawled in advance by a crawler technique. Furthermore, the information recommendation module can push a geographic position acquisition prompt to the user to be evaluated, when receiving feedback of agreeing to acquire the geographic position, the geographic position of the user to be evaluated is acquired, and intelligent recommendation is performed according to the geographic position of the user to be evaluated, the high-quality hospital information and the doctor information which are crawled in advance. For example, the recommendation may be sorted according to the distance between the hospital address corresponding to the high-quality hospital information and the doctor information and the geographic location of the user to be evaluated.
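As an illustrative sketch of distance-based sorting (the distance measure and record format are assumptions, not part of the claimed recommendation logic):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def recommend_hospitals(user_location, hospitals):
    """Sort pre-collected hospital/doctor records by distance from the user's
    geographic location; each record is assumed to carry 'lat' and 'lon' fields."""
    lat, lon = user_location
    return sorted(hospitals, key=lambda h: haversine_km(lat, lon, h["lat"], h["lon"]))
```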
In the present embodiment, the user to be evaluated can be helped to make further appointments and treatments by recommending the information of the depression level test result.
In one embodiment, as shown in fig. 3, the depression rating test system further comprises a calibration module 308 that calibrates the video recording of the user to be evaluated.
Specifically, the calibration module can detect an evaluation request from the user to be evaluated. When the evaluation request is detected, the calibration module pushes a text display window and a video window to the terminal for display, where the text display window is used for displaying the questionnaire data and can be repositioned by the user to be evaluated as required, and the video window is used for carrying out video recording calibration on the user to be evaluated so as to ensure that comprehensive user video data can be recorded during the answering process.
In this embodiment, by performing video recording calibration on the user to be evaluated, comprehensive user video data can be recorded, so that accurate evaluation of the user to be evaluated is realized.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to fall within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.