Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the art described above. Therefore, the invention aims to provide a drinking capacity prediction system based on gene screening, which provides a drinking capacity judgment standard, quantifies the individual drinking capacity, gives more intuitive and valuable drinking capacity evaluation and drinking advice according to the physical condition of a user, and improves the user experience.
In order to achieve the above object, an embodiment of the present invention provides a drinking capacity prediction system based on gene screening, including:
a user gene acquisition module, configured to acquire gene data of a user;
a drinking capacity prediction model interface module, connected with the user gene acquisition module and configured to input the gene data of the user into a drinking capacity prediction model to predict the drinking capacity of the user and obtain a first prediction result of the drinking capacity;
a drinking advice database module, connected with the drinking capacity prediction model interface module and configured to give drinking advice according to the first prediction result of the drinking capacity of the user;
a query module, connected with the drinking capacity prediction model interface module and configured to query the drinking advice corresponding to a drinking segment based on the relationship among the drinking-capacity-related genes, the drinking segments and the drinking capacity in the drinking capacity prediction model;
and a display module, connected with the drinking advice database module and configured to display the gene data of the user, the prediction result of the drinking capacity and the drinking advice.
In the drinking capacity prediction system based on gene screening, gene data of a user are acquired and input into the drinking capacity prediction model, which is built as a decision tree classification model, to obtain a first prediction result of the drinking capacity. The drinking advice database module gives drinking advice according to the first prediction result, and the display module displays the gene data of the user, the prediction result of the drinking capacity and the drinking advice. A more intuitive and valuable drinking capacity evaluation and drinking advice are thus given according to the physical condition of the user, which improves the user experience. The query module queries the drinking advice corresponding to a drinking segment based on the relationship among the drinking-capacity-related genes, the drinking segments and the drinking capacity in the drinking capacity prediction model. The system thereby provides a standard for judging drinking capacity, quantifies individual drinking capacity, and makes it convenient for the user to look up his or her drinking capacity and learn the related knowledge.
According to some embodiments of the invention, the user gene acquisition module comprises:
a first acquisition module, configured to acquire saliva of the user;
a first processing module, connected with the first acquisition module and configured to extract DNA from the saliva of the user and perform gene sequencing on the extracted DNA;
and a second processing module, connected with the first processing module and configured to perform bioinformatics analysis after gene sequencing to obtain the genotypes of the designated sites, and to format the genotype data of the designated sites rs1229984 and rs671.
According to some embodiments of the invention, the drinking capacity prediction model is constructed by using a machine learning model, and the construction comprises:
S1, acquiring the relationship between each sample and its drinking capacity, dividing the samples into a first preset number of drinking segments according to drinking capacity, and establishing a first database according to the relationship between each sample and its drinking capacity and the drinking segments;
S2, acquiring gene data of the samples and formatting the gene data;
and S3, constructing the drinking capacity prediction model according to the formatted gene data of the samples and the first database.
According to some embodiments of the invention, acquiring and formatting the gene data of the samples comprises:
S21, collecting saliva of the samples;
S22, extracting DNA from the saliva of the samples, and performing gene sequencing on the extracted DNA;
S23, processing the gene data obtained by gene sequencing to obtain, for each sample, the genotypes of the gene loci related to drinking capacity;
and S24, formatting each gene locus into a number according to its genotype.
According to some embodiments of the invention, gene locus screening is performed on the formatted gene data of the samples, comprising:
S241, for each candidate division of the first database, calculating the purity improvement value or uncertainty reduction value of the resulting data subsets relative to the data set before division;
S242, selecting the gene locus N and its characteristic value n having the maximum purity improvement value or the maximum uncertainty reduction value, taking the gene locus N as a node, and dividing the first database into two sub data sets according to the grouping of the characteristic value n of the gene locus N;
S243, calculating in turn the purity improvement value or uncertainty reduction value of the characteristic value of each gene locus in the two sub data sets; selecting the gene locus M and its characteristic value m having the maximum purity improvement value or the maximum uncertainty reduction value, taking the gene locus M as a child node, and splitting the sub data set again according to the grouping of the characteristic value m of the gene locus M;
and S244, stopping splitting when the purity of a divided sub data set is greater than a preset purity threshold or its uncertainty value is smaller than a preset uncertainty threshold, and finally obtaining the gene loci related to drinking capacity and the relationship between the gene loci and the drinking segments.
According to some embodiments of the invention, the system further comprises: a user terminal, connected with the display module and configured to receive the related data information sent by the display module so that the user can conveniently view it.
According to some embodiments of the invention, the system further comprises: a second acquisition module, configured to acquire second information that affects the drinking capacity of the user, where the second information includes: disease history, type of drink, alcohol content, and drinking frequency;
a calculation module, connected with the drinking capacity prediction model interface module, the second acquisition module and the drinking advice database module, and configured to calculate a second prediction result of the drinking capacity from the first prediction result and the second information according to a preset algorithm;
and the drinking advice database module is further configured to give drinking advice according to the second prediction result of the drinking capacity of the user.
According to some embodiments of the invention, the preset algorithm comprises:
calculating the amount of ethanol in the first prediction result of the drinking capacity:
V1 = A × c
wherein A is the drinking capacity (ml) output by the drinking capacity prediction model based on the gene data of the user; c is the preset alcohol concentration (% vol) in the drinking capacity prediction model;
calculating the amount of ethanol in the second prediction result of the drinking capacity:
V2 = V1 × d × t × f
wherein d is the correlation coefficient between the disease history of the user and the drinking capacity; t is the correlation coefficient between the type of drink and the drinking capacity; f is the correlation coefficient between the drinking frequency and the drinking capacity;
calculating the drinking capacity of the second prediction result:
wherein c_u is the alcohol content (% vol) input by the user.
According to some embodiments of the invention, the drinking capacity prediction system based on gene screening employs an SOA system architecture, which comprises:
a data layer, configured to acquire user data from a data source;
a data platform layer, connected with the data layer and configured to provide a development environment for data management;
and a service layer, connected with the data platform layer and configured to receive the data information sent by the data platform layer and provide services for data consumers according to the data information.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
A drinking capacity prediction system based on genetic screening according to an embodiment of the present invention will be described with reference to FIGS. 1 to 8.
FIG. 1 is a block diagram of a drinking capacity prediction system based on gene screening according to an embodiment of the present invention; as shown in fig. 1, an embodiment of the present invention provides a drinking capacity prediction system based on gene screening, including:
a user gene acquisition module 1, configured to acquire gene data of a user;
a drinking capacity prediction model interface module 2, connected with the user gene acquisition module 1 and configured to input the gene data of the user into a drinking capacity prediction model to predict the drinking capacity of the user and obtain a first prediction result of the drinking capacity;
a drinking advice database module 3, connected with the drinking capacity prediction model interface module 2 and configured to give drinking advice according to the first prediction result of the drinking capacity of the user;
a query module 4, connected with the drinking capacity prediction model interface module 2 and configured to query the drinking advice corresponding to a drinking segment based on the relationship among the drinking-capacity-related genes, the drinking segments and the drinking capacity in the drinking capacity prediction model;
and a display module 5, connected with the drinking advice database module 3 and configured to display the gene data of the user, the prediction result of the drinking capacity and the drinking advice.
In the drinking capacity prediction system based on gene screening, gene data of a user are acquired and input into the drinking capacity prediction model, which is built as a decision tree classification model, to obtain a first prediction result of the drinking capacity. The drinking advice database module gives drinking advice according to the first prediction result, and the display module displays the gene data of the user, the prediction result of the drinking capacity and the drinking advice. A more intuitive and valuable drinking capacity evaluation and drinking advice are thus given according to the physical condition of the user, which improves the user experience. The query module queries the drinking advice corresponding to a drinking segment based on the relationship among the drinking-capacity-related genes, the drinking segments and the drinking capacity in the drinking capacity prediction model. The system thereby provides a standard for judging drinking capacity, quantifies individual drinking capacity, and makes it convenient for the user to look up his or her drinking capacity and learn the related knowledge.
FIG. 2 is a block diagram of a user gene acquisition module according to one embodiment of the present invention; as shown in fig. 2, the user gene acquisition module 1 includes:
a first acquisition module 11, configured to acquire saliva of the user;
a first processing module 12, connected with the first acquisition module 11 and configured to extract DNA from the saliva of the user and perform gene sequencing on the extracted DNA;
and a second processing module 13, connected with the first processing module 12 and configured to perform bioinformatics analysis after gene sequencing to obtain the genotypes of the designated sites, and to format the genotype data of the designated sites rs1229984 and rs671.
The working principle and the beneficial effects of the technical scheme are as follows: DNA extraction and chip sequencing are performed on the saliva of the user, the rs1229984 and rs671 gene loci of the user are identified, and the genotype data of the designated sites rs1229984 and rs671 are formatted to obtain the gene data related to the drinking capacity of the user, which is convenient and fast.
FIG. 3 is a flow chart of a method of constructing a drinking capacity prediction model according to an embodiment of the present invention; as shown in fig. 3, the drinking capacity prediction model is constructed by using a machine learning model, and the construction includes:
S1, acquiring the relationship between each sample and its drinking capacity, dividing the samples into a first preset number of drinking segments according to drinking capacity, and establishing a first database according to the relationship between each sample and its drinking capacity and the drinking segments;
S2, acquiring gene data of the samples and formatting the gene data;
and S3, constructing the drinking capacity prediction model according to the formatted gene data of the samples and the first database.
The working principle and the beneficial effects of the technical scheme are as follows: the drinking capacity of each sample is obtained through a questionnaire survey and analyzed, the drinking capacity is divided into a first preset number of drinking segments, and a first database is established. Grading drinking capacity into drinking segments quantifies it concretely, which helps to provide more valuable drinking advice. The gene data of the samples are then formatted, and the drinking capacity prediction model is constructed according to the formatted gene data of the samples and the first database.
FIG. 4 is a flow chart of the processing of gene data of a sample according to one embodiment of the present invention; as shown in fig. 4, acquiring and formatting the gene data of the samples includes:
S21, collecting saliva of the samples;
S22, extracting DNA from the saliva of the samples, and performing gene sequencing on the extracted DNA;
S23, processing the gene data obtained by gene sequencing to obtain, for each sample, the genotypes of the gene loci related to drinking capacity;
and S24, formatting each gene locus into a number according to its genotype.
The working principle and the beneficial effects of the technical scheme are as follows: to obtain the gene data of a sample, DNA extraction, gene sequencing and genotyping are performed on the saliva of the sample. The gene sequencing method includes at least one of chip sequencing, second-generation sequencing, third-generation sequencing, PCR sequencing and panel sequencing. The genotypes of the gene loci related to drinking capacity are finally obtained for each sample, and, in order to calculate the influence of each gene locus on drinking capacity effectively, each gene locus is formatted into a number according to its genotype. Illustratively, the wild type is 0, the heterozygous mutant type is 1, and the homozygous mutant type is 2. For example, at the rs1229984 gene locus, CC is the homozygous mutant type and is formatted as 2, TT is the wild type and is formatted as 0, and CT is the heterozygous mutant type and is formatted as 1; at the rs671 gene locus, AA is the homozygous mutant type and is formatted as 2, GG is the wild type and is formatted as 0, and AG is the heterozygous mutant type and is formatted as 1.
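By way of non-limiting illustration, the formatting of step S24 may be sketched in Python as follows. The mapping values are those given above; the names GENOTYPE_CODES, encode_genotype and encode_sample are hypothetical, and treating "TC" the same as "CT" (by sorting the two alleles) is an assumption that is not stated in the embodiments.

```python
# Numeric codes taken from the description:
# wild type -> 0, heterozygous mutant -> 1, homozygous mutant -> 2.
GENOTYPE_CODES = {
    "rs1229984": {"TT": 0, "CT": 1, "CC": 2},
    "rs671": {"GG": 0, "AG": 1, "AA": 2},
}


def encode_genotype(locus: str, genotype: str) -> int:
    """Return the numeric code of a genotype at the given locus."""
    # Assumption: allele order is irrelevant, so "TC" is read as "CT".
    normalized = "".join(sorted(genotype.upper()))
    return GENOTYPE_CODES[locus][normalized]


def encode_sample(sample: dict) -> list:
    """Encode one sample as [rs1229984 code, rs671 code]."""
    return [encode_genotype(locus, sample[locus]) for locus in GENOTYPE_CODES]


# Example: a sample that is CT at rs1229984 and GG at rs671.
features = encode_sample({"rs1229984": "CT", "rs671": "GG"})  # [1, 0]
```

The resulting feature vectors are the kind of formatted gene data that step S3 combines with the first database.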
FIG. 5 is a flowchart of the screening for gene loci related to drinking capacity according to one embodiment of the present invention; as shown in fig. 5, gene locus screening is performed on the formatted gene data of the samples, which includes:
S241, for each candidate division of the first database, calculating the purity improvement value or uncertainty reduction value of the resulting data subsets relative to the data set before division;
S242, selecting the gene locus N and its characteristic value n having the maximum purity improvement value or the maximum uncertainty reduction value, taking the gene locus N as a node, and dividing the first database into two sub data sets according to the grouping of the characteristic value n of the gene locus N;
S243, calculating in turn the purity improvement value or uncertainty reduction value of the characteristic value of each gene locus in the two sub data sets; selecting the gene locus M and its characteristic value m having the maximum purity improvement value or the maximum uncertainty reduction value, taking the gene locus M as a child node, and splitting the sub data set again according to the grouping of the characteristic value m of the gene locus M;
and S244, stopping splitting when the purity of a divided sub data set is greater than a preset purity threshold or its uncertainty value is smaller than a preset uncertainty threshold, and finally obtaining the gene loci related to drinking capacity and the relationship between the gene loci and the drinking segments.
The working principle and the beneficial effects of the technical scheme are as follows: the purity and uncertainty of the data set before and after division are measured by calculating at least one of the information gain, the information gain ratio and the Gini coefficient. When purity and uncertainty are determined from the Gini coefficient, a larger Gini coefficient means higher uncertainty of the data and lower sample purity, indicating that the target samples account for a smaller proportion of the total samples in the data set; a smaller Gini coefficient means lower uncertainty of the data and higher sample purity, indicating that the target samples account for a larger proportion of the total samples in the data set. When the Gini coefficient is smaller than a preset value, the purity of the divided sub data set is greater than the preset purity threshold or its uncertainty value is smaller than the preset uncertainty threshold, so splitting stops, and the gene loci related to drinking capacity and the relationship between the gene loci and the drinking segments are finally obtained. For example, when the Gini coefficient is equal to 0, all samples in the data set belong to the same class.
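By way of non-limiting illustration, the Gini-based selection of steps S241 and S242 may be sketched in plain Python as follows. The sketch assumes that the Gini coefficient referred to above is the Gini impurity 1 − Σ p_k² used by decision tree classifiers, and that each candidate division is an equality test on one formatted genotype value; the function names gini, weighted_gini and best_split are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2); 0 means every label is the same class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Size-weighted impurity of the two subsets produced by a division."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def best_split(rows, labels):
    """Return (impurity, feature index, value) of the best equality test.

    Each row is a list of formatted genotypes; a candidate division puts the
    samples with row[feature] == value on one side and the rest on the other.
    """
    best = None
    for feature in range(len(rows[0])):
        for value in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] == value]
            right = [y for row, y in zip(rows, labels) if row[feature] != value]
            if not left or not right:
                continue  # this test does not divide the data set
            score = weighted_gini(left, right)
            if best is None or score < best[0]:
                best = (score, feature, value)
    return best
```

Choosing the division with the lowest weighted impurity is equivalent to choosing the largest uncertainty reduction, because the impurity of the data set before division is the same for every candidate.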
In one embodiment, as shown in fig. 8, it is determined whether the result of the rs671 gene locus of a sample is GG, i.e., whether the formatted value of the rs671 gene locus of the sample is 0, and the first database is divided accordingly into a first data set and a second data set: in the first data set the result of the rs671 gene locus is GG, and in the second data set it is AA or AG. The Gini coefficient of the characteristic value of each gene locus is calculated in the first data set and the second data set, and the gene locus A with the minimum Gini coefficient and its characteristic value a are selected; the gene locus A is used as a child node, and the data set is split again according to the grouping of the characteristic value a of the gene locus A. For example, in the first data set it is determined whether the rs1229984 gene locus of a sample is CC or CT; when the result is False, the rs1229984 gene locus of the sample is TT, i.e., the samples in this group have GG at the rs671 gene locus and TT at the rs1229984 gene locus, and, as shown in Table 1, the drinking segment is segment 8. When every group contains samples of a single class, i.e., the Gini coefficient is 0, splitting stops, and the gene loci related to drinking capacity and the relationship between the gene loci and the drinking segments are finally obtained. Dividing the genes related to drinking capacity into drinking segments according to genotype makes the result easy to remember, accurately reflects the correspondence between genotype and drinking capacity, is clear at a glance, and improves the user experience.
According to some embodiments of the present invention, the gene loci related to drinking capacity include the rs1229984 gene locus and the rs671 gene locus. The rs1229984 gene locus is located on the ADH1B gene: when its result is the TT type, alcohol dehydrogenase activity is strong and ethanol is metabolized quickly; when it is the CT type, alcohol dehydrogenase activity is moderate and ethanol is metabolized at a moderate rate; when it is the CC type, alcohol dehydrogenase activity is weak and ethanol is metabolized slowly. The rs671 gene locus is located on the ALDH2 gene: when its result is the GG type, acetaldehyde dehydrogenase activity is strong and acetaldehyde is metabolized quickly; when it is the GA or AA type, acetaldehyde dehydrogenase activity is weak and acetaldehyde is metabolized slowly.
According to some embodiments of the invention, a decision tree classification model is selected to construct the drinking capacity prediction model.
The algorithm comprises the following steps:
programming in Python and calling the DecisionTreeClassifier class of scikit-learn (sklearn) to carry out data mining and construct the drinking capacity prediction model;
main parameter settings of DecisionTreeClassifier:
criterion='gini': the Gini coefficient is selected as the measure of node split quality;
splitter='best': the best split point is sought among all the features;
max_depth=None: sets the maximum depth of the decision tree; None means that the depth is not limited, and the tree grows until the samples on each leaf node belong to the same class;
min_samples_split=2: an internal node must contain at least 2 samples to be split;
min_samples_leaf=1: the minimum number of samples on a leaf node is 1.
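By way of non-limiting illustration, the parameter settings listed above may be passed to scikit-learn as sketched below. The training data are hypothetical: only the pairing of rs671 GG with rs1229984 TT to segment 8 and of rs671 AA to segment 0 follows the description, and the remaining segment labels are placeholders chosen for illustration. The random_state argument is an added assumption for reproducibility and is not among the listed settings.

```python
from sklearn.tree import DecisionTreeClassifier

# Columns: formatted rs1229984 genotype, formatted rs671 genotype
# (0 = wild type, 1 = heterozygous mutant, 2 = homozygous mutant).
X = [[0, 0], [1, 0], [2, 0],
     [0, 1], [1, 1], [2, 1],
     [0, 2], [1, 2], [2, 2]]

# Hypothetical drinking segments. Only [0, 0] -> 8 and rs671 == 2 -> 0 come
# from the description; every other label is a placeholder, not real data.
y = [8, 7, 5, 4, 2, 1, 0, 0, 0]

model = DecisionTreeClassifier(
    criterion="gini",       # Gini impurity as the split-quality measure
    splitter="best",        # search for the best split among all features
    max_depth=None,         # grow until the leaves are pure
    min_samples_split=2,    # a node needs at least 2 samples to be split
    min_samples_leaf=1,     # a leaf may hold a single sample
    random_state=0,         # assumption: fixed seed for reproducibility
)
model.fit(X, y)

# Predict the drinking segment of a user who is TT at rs1229984 and GG at rs671.
predicted = model.predict([[0, 0]])
```

Because the depth is not limited and every feature vector here is distinct, the fitted tree reproduces the training labels exactly; with real questionnaire data, samples sharing a genotype may carry different labels, in which case a leaf returns the majority class.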
Finally, the relationship between the rs1229984 and rs671 gene loci and the drinking capacity is obtained; when the first preset number is 9, the relationship is shown in Table 1.
Table 1
In one embodiment, the first preset number is 7, i.e., there are 7 drinking segments, and the relationship between the rs1229984 and rs671 gene loci and the drinking capacity is shown in Table 2.
Table 2
The working principle and the beneficial effects of the technical scheme are as follows: when the drinking segment is 0, three cases are included: 1. the rs1229984 gene locus is CC and the rs671 gene locus is AA; 2. the rs1229984 gene locus is TT and the rs671 gene locus is AA; 3. the rs1229984 gene locus is CT and the rs671 gene locus is AA. The drinking segments are named discontinuously, for example segments 3 and 6 are absent. Discontinuous naming allows the name of a drinking segment to match the specific drinking capacity; for example, when the drinking segment is segment 9, the drinking capacity of the user is more than 9.
FIG. 6 is a block diagram of a drinking capacity prediction system based on gene screening according to still another embodiment of the present invention; as shown in fig. 6, the system further includes: a user terminal 6, connected with the display module 5 and configured to receive the related data information sent by the display module 5 so that the user can conveniently view it.
According to some embodiments of the invention, the system further comprises: a second acquisition module 7, configured to acquire second information that affects the drinking capacity of the user, where the second information includes: disease history, type of drink, alcohol content, and drinking frequency;
a calculation module 8, connected with the drinking capacity prediction model interface module 2, the second acquisition module 7 and the drinking advice database module 3, and configured to calculate a second prediction result of the drinking capacity from the first prediction result and the second information according to a preset algorithm;
and the drinking advice database module 3 is further configured to give drinking advice according to the second prediction result of the drinking capacity of the user.
The working principle and the beneficial effects of the technical scheme are as follows: the first prediction result of the drinking capacity, output by the drinking capacity prediction model based on the gene data of the user, is corrected by the second information that affects the drinking capacity of the user in practice, where the second information includes: disease history, type of drink, alcohol content, and drinking frequency. For example, as shown in Table 2, if the gene data of the user are CC at the rs1229984 gene locus and GG at the rs671 gene locus, the drinking capacity of the user is predicted to be segment 7, that is, the user can drink more than 7 (taking 50° white spirit as an example); however, if the user has recently suffered from a stomach illness, the user should not drink, since drinking can easily cause gastric perforation and seriously damage health. Similarly, the type of drink, the alcohol content and the drinking frequency may also affect the prediction of the user's drinking capacity. The second prediction result of the drinking capacity is calculated according to the preset algorithm, so that a more effective prediction can be made according to the actual condition of the user and the prediction result is more accurate.
According to some embodiments of the invention, the preset algorithm comprises:
calculating the amount of ethanol in the first prediction result of the drinking capacity:
V1 = A × c
wherein A is the drinking capacity (ml) output by the drinking capacity prediction model based on the gene data of the user; c is the preset alcohol concentration (% vol) in the drinking capacity prediction model;
calculating the amount of ethanol in the second prediction result of the drinking capacity:
V2 = V1 × d × t × f
wherein d is the correlation coefficient between the disease history of the user and the drinking capacity; t is the correlation coefficient between the type of drink and the drinking capacity; f is the correlation coefficient between the drinking frequency and the drinking capacity;
calculating the drinking capacity of the second prediction result:
wherein c_u is the alcohol content (% vol) input by the user.
When the user is a patient with a stomach illness, liver disease, or cardiovascular or cerebrovascular disease, is pregnant, or is taking cold medicine, hypnotics or tranquilizers, the correlation coefficient d between the disease history of the user and the drinking capacity is 0, i.e., the user must not drink; for other users, the correlation coefficient d takes a value between 0 and 1. The correlation coefficient t between the type of drink and the drinking capacity is shown in Table 3, and the correlation coefficient f between the drinking frequency and the drinking capacity is shown in Table 4. Through the preset algorithm, the first prediction result of the drinking capacity is corrected and the second prediction result of the drinking capacity is calculated, so that a more effective prediction can be made according to the actual condition of the user, the prediction result is more accurate, the most appropriate drinking advice is given to the user, and the user experience is improved.
Table 3
| Type of drink | Correlation coefficient t |
| White spirit | 1 |
| Beer | 1.5 |
| Wine | 1.8 |
Table 4
| Drinking frequency | Correlation coefficient f |
| Daily | 0.3 |
| Once every three days | 0.6 |
| Once every 7 days | 0.8 |
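By way of non-limiting illustration, the preset algorithm may be sketched in Python as follows, using the coefficients of the two tables above. The final step is an assumption: the formula for the drinking capacity of the second prediction result is not reproduced in the text, and the sketch takes it to be the second ethanol amount V2 divided by the user's alcohol content c_u, mirroring V1 = A × c. The function name second_prediction, the dictionary keys and the example inputs are hypothetical, and concentrations are written as volume fractions (0.50 for 50% vol).

```python
# Correlation coefficient t by type of drink and f by drinking frequency,
# as listed in the tables above.
TYPE_COEFF = {"white spirit": 1.0, "beer": 1.5, "wine": 1.8}
FREQ_COEFF = {"daily": 0.3, "every 3 days": 0.6, "every 7 days": 0.8}


def second_prediction(a_ml, c, d, drink_type, frequency, c_u):
    """Return the corrected drinking capacity in ml.

    a_ml: drinking capacity A (ml) output by the prediction model
    c:    preset alcohol concentration of the model, as a volume fraction
    d:    disease-history coefficient, 0 (must not drink) to 1
    c_u:  alcohol content of the user's drink, as a volume fraction
    """
    v1 = a_ml * c                                   # V1 = A x c
    v2 = v1 * d * TYPE_COEFF[drink_type] * FREQ_COEFF[frequency]  # V2 = V1 x d x t x f
    # Assumption: convert the ethanol amount back into a drink volume.
    return v2 / c_u


# Hypothetical user: the model outputs 400 ml at 50% vol, there is no relevant
# disease history (d = 1), and the user drinks 5% vol beer once every 7 days.
capacity_ml = second_prediction(400, 0.50, 1.0, "beer", "every 7 days", 0.05)
```

With these hypothetical inputs the corrected capacity comes to about 4800 ml of beer, and any user whose coefficient d is 0 receives a capacity of 0 regardless of the other inputs.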
FIG. 8 is a block diagram of an SOA system architecture according to one embodiment of the invention; as shown in fig. 8, the drinking capacity prediction system based on gene screening employs an SOA system architecture, which includes:
a data layer 9, configured to acquire user data from a data source;
a data platform layer 10, connected with the data layer 9 and configured to provide a development environment for data management;
and a service layer 14, connected with the data platform layer 10 and configured to receive the data information sent by the data platform layer 10 and provide services for data consumers according to the data information.
The working principle and the beneficial effects of the technical scheme are as follows: based on a service-oriented architecture (SOA), the system can achieve good performance, security and reliability. The different functional units of the application (called services) are separated and connected through well-defined interfaces and contracts, which improves working efficiency and makes the system feasible, flexible and extensible. The services are connected in a loosely coupled manner, and the SOA system architecture is used to manage the services, coordinate the interactions among them, and let users access them conveniently.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.