
Technical Field
The invention relates to the technical field of disease prevention and control, and in particular to a lag analysis method for the onset risk of viral diarrhea in children and a lag analysis device for the onset risk of viral diarrhea in children.
Background
Acute gastroenteritis is a common disease of the human digestive tract, characterized mainly by vomiting, diarrhea and fever. It has many causes, including bacteria, viruses and parasites. Viral diarrhea is a common digestive system disease caused by a range of human enteric viruses, such as rotavirus, norovirus, adenovirus and astrovirus. Children under 5 years old are the main susceptible population, and viruses are regarded as the principal pathogens of severe acute diarrhea in children worldwide and as one of the leading causes of child death in developing countries.
Studies have shown that norovirus is a typical food-borne virus that spreads easily through unclean water and unclean food, while rotavirus can form aerosols with pollutant particles in the air and spread by the fecal-oral route, by contact with contaminated surfaces and by similar routes. Because these transmission routes are so close to everyday life, viral diarrhea is widespread across the world and occurs in every season of the year. Viral diarrhea can be attributed to a variety of meteorological, hydrological and other environmental factors. In the existing evidence, low temperature and dry conditions promote the transmission of rotavirus, which shows winter seasonality in temperate climates, whereas the epidemic pattern of norovirus is less regular: its peak may shift by weeks or months between seasons, so its seasonality is highly variable. In England and Wales, children born in summer were found to have a higher risk of confirmed rotavirus infection. Studies in the United Kingdom, the Netherlands, Turkey, Australia, Germany, India, Costa Rica, Nepal and other regions have shown that the risk of rotavirus diarrhea is negatively correlated with air temperature, although in some regions, such as Bangladesh, the risk of rotavirus outbreaks rises at high temperatures. Increases in river runoff and river water level promote norovirus outbreaks. Some studies of seafood farming environments have shown that solar radiation, water temperature, salinity and similar factors affect seafood that acts as a host of norovirus, and thereby affect outbreaks of food-borne norovirus in the population.
In China, associations between environmental factors and general acute gastroenteritis or bacterial diarrhea have been widely reported, but reports on environmental factors and viral diarrhea remain few. Wang, P. studied seasonal variation in hospitalizations for norovirus and rotavirus infection in Hong Kong, China, and found that rotavirus tends to break out in winter, while norovirus is more strongly associated with summer. In the same study, extreme precipitation carried a higher risk of norovirus infection than trace rainfall, but a lower risk of rotavirus infection. Gao, Y. investigated ambient temperature and the burden of viral diarrhea infection in Wuxi, China, and found that low temperature promotes outbreaks of viral diarrhea, consistent with studies from other parts of the world. Ye, Q. investigated the association of rotavirus infection rates in children in Hangzhou, China, with air temperature and air pollutants. Besides confirming the negative correlation between temperature and rotavirus infection rate, that study found that changes in temperature significantly affect the rotavirus detection rate. Notably, the authors found that air pollutants such as PM2.5 and PM10 significantly increase the risk of rotavirus infection, and they observed dose, lag and cumulative effects.
Researchers have already pointed out the value of Internet search data for tracking and detecting infectious diseases. In the United States, for example, Google search data can report influenza trends two weeks in advance. Other researchers have used search query data to detect the incidence of infectious diseases such as dengue fever, Ebola virus disease, and hand, foot and mouth disease. Liu, K. applied Spearman correlation to norovirus incidence data and a composite Baidu index at different time lags, and built an exponential curve model to fit data on the 2014 norovirus epidemic in Zhejiang Province, China; the study found that each one-unit increase in the mean composite Baidu index was associated with a 2.15-fold increase in the risk of norovirus infection. Because Internet-based surveillance systems draw on social media, search engine query data and news reports, Internet search data can improve the sensitivity and timeliness of health event detection. However, biases and external disturbances such as media coverage, Internet usage behavior and regional policy all affect the predictive accuracy of search query data, so detecting infectious disease from Internet search data alone has clear limitations.
Summary of the Invention
To overcome the shortcomings of the prior art, the technical problem addressed by the invention is to provide a lag analysis method for the onset risk of viral diarrhea in children that improves the accuracy of infectious disease surveillance, supports correct prediction and early warning of outbreaks, and helps regional health departments carry out prevention and control of viral diarrhea outbreaks.
The technical solution of the invention is a lag analysis method for the onset risk of viral diarrhea in children, comprising the following steps:
(1) Descriptive analysis is applied to the daily viral diarrhea cases and to each variable, covering the mean, standard deviation and time series. A Pearson correlation test is then used to evaluate the relationship between the daily number of viral diarrhea infections and each factor, giving both the correlation and its significance. Factors whose correlation coefficient with the case data exceeds 0.1 in absolute value at a significance level below 0.05 are selected for further analysis;
(2) For all selected meteorological factors, principal component analysis is used to reduce the data dimension, and the principal components are extracted as elements for building the regression model, each principal component being given by formula (1):
where z_i is a principal component of the meteorological factors, x_j is the series data of each meteorological factor, α_j is the loading of each factor on each principal component after the principal component analysis, and n is the number of selected principal components;
(3) The principal components that contribute most to the meteorological parameters are taken as the new meteorological factor components, chosen so that their cumulative contribution rate exceeds 90%. For all selected air quality factors, the same principal component analysis is applied to obtain the principal components with the highest contribution rate, yielding the new air quality factors;
(4) For all selected Baidu search data series, formula (2) is used to obtain the composite Baidu search index:
where BDI is the composite Baidu search index, x_i is each selected Baidu search data series, β_i is the Pearson correlation coefficient between that series and the epidemic series, and n is the number of selected Baidu search data series;
(5) The processed factors are incorporated into the model. The distributed lag non-linear model (DLNM) is a regression model based on lag effects, given by formula (3):
where E(Y_t) is the expected daily number of viral diarrhea cases, cb(x_i) is the cross-basis matrix of each meteorological principal component, cb(x_j) is the cross-basis matrix of each air quality principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. The exposure dimension of each factor uses a natural cubic spline with knots at the 25%, 50% and 75% quantiles on the logarithmic scale, and its initial degree of freedom df is set to 3. A lag period of 21 days is used when building the cross-basis matrix of each factor, and the initial df of the lag dimension is set to 3. The confounders of the model include a date factor, a day-of-week factor and a season factor, where ns is a natural cubic spline used to control the long-term trend of the time variable, and the initial df of the time factor is set to 7 per year.
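Formulas (1) to (3) are cited in the steps above, but their bodies do not appear in this text. The block below is a reconstruction from the accompanying symbol definitions only and should not be read as the original formulas. The intercept \gamma_0, the summation limits p and q, the log link, and the terms DOW_t and Season_t are assumptions introduced here; whether formula (2) is additionally normalized is not known. In (1), \alpha_j is written as in the definitions, although the loadings differ from one principal component to the next.

```latex
% (1) a principal component as a linear combination of the factor series
z_i = \sum_{j=1}^{n} \alpha_j x_j

% (2) composite Baidu search index, weighted by Pearson coefficients
\mathrm{BDI} = \sum_{i=1}^{n} \beta_i x_i

% (3) distributed lag non-linear model with confounders
\log E(Y_t) = \gamma_0 + \sum_{i=1}^{p} cb(x_i) + \sum_{j=1}^{q} cb(x_j)
            + cb(\mathrm{BDI}) + ns(\mathrm{time}, df)
            + \mathrm{DOW}_t + \mathrm{Season}_t
```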
By combining Internet query data with traditional surveillance, the invention uses the lag dependence of the onset risk of viral diarrhea in children under 5 years old in temperate regions of China on selected meteorological factors, air quality factors and Internet search data. It thereby offers perspectives on that risk from both the external natural environment and social activity, improves the accuracy of infectious disease surveillance, supports correct prediction and early warning of outbreaks, and helps regional health departments carry out prevention and control of viral diarrhea outbreaks.
A lag analysis device for the onset risk of viral diarrhea in children is also provided, comprising:
a data collection and selection module, configured to apply descriptive analysis to the daily viral diarrhea cases and to each variable, covering the mean, standard deviation and time series, then to use a Pearson correlation test to evaluate the relationship between the daily number of viral diarrhea infections and each factor, giving both the correlation and its significance, and to select for further analysis the factors whose correlation coefficient with the case data exceeds 0.1 in absolute value at a significance level below 0.05;
a data dimension reduction module, configured to apply principal component analysis to all selected meteorological factors to reduce the data dimension and to extract the principal components as elements for building the regression model, each principal component being given by formula (1):
where z_i is a principal component of the meteorological factors, x_j is the series data of each meteorological factor, α_j is the loading of each factor on each principal component after the principal component analysis, and n is the number of selected principal components;
an air quality factor acquisition module, configured to take the principal components that contribute most to the meteorological parameters as the new meteorological factor components, chosen so that their cumulative contribution rate exceeds 90%, and to apply the same principal component analysis to all selected air quality factors to obtain the principal components with the highest contribution rate, yielding the new air quality factors;
a Baidu search data module, configured to obtain the composite Baidu search index from all selected Baidu search data series using formula (2):
where BDI is the composite Baidu search index, x_i is each selected Baidu search data series, β_i is the Pearson correlation coefficient between that series and the epidemic series, and n is the number of selected Baidu search data series;
a model building module, configured to incorporate the processed factors into the model, the distributed lag non-linear model (DLNM) being a regression model based on lag effects and given by formula (3):
where E(Y_t) is the expected daily number of viral diarrhea cases, cb(x_i) is the cross-basis matrix of each meteorological principal component, cb(x_j) is the cross-basis matrix of each air quality principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. The exposure dimension of each factor uses a natural cubic spline with knots at the 25%, 50% and 75% quantiles on the logarithmic scale, and its initial degree of freedom df is set to 3. A lag period of 21 days is used when building the cross-basis matrix of each factor, and the initial df of the lag dimension is set to 3. The confounders of the model include a date factor, a day-of-week factor and a season factor, where ns is a natural cubic spline used to control the long-term trend of the time variable, and the initial df of the time factor is set to 7 per year.
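The screening rule of the data collection and selection module (|r| > 0.1 and p < 0.05) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the toy series and the factor names are hypothetical, and the p-value uses a normal approximation to the t statistic, which is only reasonable for long daily series such as the multi-year data described here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value for r, via a normal approximation to the
    t statistic (an assumption made for this sketch)."""
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))

def screen_factors(cases, factors, r_min=0.1, alpha=0.05):
    """Keep factor names with |r| > r_min and p < alpha."""
    kept = []
    for name, series in factors.items():
        r = pearson_r(series, cases)
        p = approx_p_value(r, len(cases))
        if abs(r) > r_min and p < alpha:
            kept.append(name)
    return kept

# Hypothetical toy data: 60 days of case counts and two candidate factors.
cases = [float((i * 7) % 11 + i // 4) for i in range(60)]
factors = {
    # Built to be strongly negatively correlated with the case counts.
    "mean_temperature": [-c + (i % 3) for i, c in enumerate(cases)],
    # An unrelated periodic series.
    "noise": [float((i * 37) % 13) for i in range(60)],
}
selected = screen_factors(cases, factors)
```

A statistics library that reports an exact p-value for the Pearson test would normally replace `approx_p_value`; the thresholds 0.1 and 0.05 are the ones stated in the text.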
Brief Description of the Drawings
FIG. 1 is a flow chart of the lag analysis method for the onset risk of viral diarrhea in children according to the invention.
Detailed Description
As shown in FIG. 1, the lag analysis method for the onset risk of viral diarrhea in children comprises the following steps:
(1) Descriptive analysis is applied to the daily viral diarrhea cases and to each variable, covering the mean, standard deviation and time series. A Pearson correlation test is then used to evaluate the relationship between the daily number of viral diarrhea infections and each factor, giving both the correlation and its significance. Factors whose correlation coefficient with the case data exceeds 0.1 in absolute value at a significance level below 0.05 are selected for further analysis;
(2) For all selected meteorological factors, principal component analysis is used to reduce the data dimension, and the principal components are extracted as elements for building the regression model, each principal component being given by formula (1):
where z_i is a principal component of the meteorological factors, x_j is the series data of each meteorological factor, α_j is the loading of each factor on each principal component after the principal component analysis, and n is the number of selected principal components;
(3) The principal components that contribute most to the meteorological parameters are taken as the new meteorological factor components, chosen so that their cumulative contribution rate exceeds 90%. For all selected air quality factors, the same principal component analysis is applied to obtain the principal components with the highest contribution rate, yielding the new air quality factors;
(4) For all selected Baidu search data series, formula (2) is used to obtain the composite Baidu search index:
where BDI is the composite Baidu search index, x_i is each selected Baidu search data series, β_i is the Pearson correlation coefficient between that series and the epidemic series, and n is the number of selected Baidu search data series;
(5) The processed factors are incorporated into the model. The distributed lag non-linear model (DLNM) is a regression model based on lag effects, given by formula (3):
where E(Y_t) is the expected daily number of viral diarrhea cases, cb(x_i) is the cross-basis matrix of each meteorological principal component, cb(x_j) is the cross-basis matrix of each air quality principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. The exposure dimension of each factor uses a natural cubic spline with knots at the 25%, 50% and 75% quantiles on the logarithmic scale, and its initial degree of freedom df is set to 3. A lag period of 21 days is used when building the cross-basis matrix of each factor, and the initial df of the lag dimension is set to 3. The confounders of the model include a date factor, a day-of-week factor and a season factor, where ns is a natural cubic spline used to control the long-term trend of the time variable, and the initial df of the time factor is set to 7 per year.
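Steps (2) and (3) keep the leading principal components until their cumulative contribution rate exceeds 90%. The sketch below illustrates that selection rule in plain Python. It is an illustration under stated assumptions: the factors are standardized first (so the analysis runs on the correlation matrix, which the text does not specify), eigenpairs are found by power iteration with deflation rather than a library routine, and all function names and toy series are hypothetical.

```python
import math

def standardize(columns):
    """Centre each column and scale it to unit sample variance.
    Assumes no column is constant."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out.append([(v - mean) / sd for v in col])
    return out

def correlation_matrix(z):
    """Correlation matrix of already standardized columns."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z]
            for ci in z]

def top_eigenpair(mat, iters=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    k = len(mat)
    v = [1.0 / (i + 1) for i in range(k)]  # arbitrary non-uniform start
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * mat[i][j] * v[j] for i in range(k) for j in range(k))
    return lam, v

def select_components(columns, threshold=0.90):
    """Return loadings, component scores and the cumulative contribution
    of the fewest leading components whose contribution exceeds threshold."""
    z = standardize(columns)
    mat = correlation_matrix(z)
    k = len(mat)
    total = sum(mat[i][i] for i in range(k))
    loadings, explained = [], 0.0
    while explained / total <= threshold and len(loadings) < k:
        lam, v = top_eigenpair(mat)
        loadings.append(v)
        explained += lam
        # Deflate: remove the component just found before the next search.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(k)]
               for i in range(k)]
    n_obs = len(z[0])
    scores = [[sum(v[j] * z[j][t] for j in range(k)) for t in range(n_obs)]
              for v in loadings]
    return loadings, scores, explained / total

# Hypothetical toy factors: two nearly collinear series and one unrelated.
days = range(40)
temperature = [float(i % 10) for i in days]
surface_temp = [2.0 * t + 0.1 * ((i * 7) % 3) for i, t in zip(days, temperature)]
humidity = [float((i * 13) % 7) for i in days]
loadings, scores, ratio = select_components(
    [temperature, surface_temp, humidity])
```

With these toy inputs the two collinear series collapse into one component and the unrelated series needs a second, so two components are retained. Each row of `scores` is a new factor series of the kind that would enter the regression model.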
By combining Internet query data with traditional surveillance, the invention uses the lag dependence of the onset risk of viral diarrhea in children under 5 years old in temperate regions of China on selected meteorological factors, air quality factors and Internet search data. It thereby offers perspectives on that risk from both the external natural environment and social activity, improves the accuracy of infectious disease surveillance, supports correct prediction and early warning of outbreaks, and helps regional health departments carry out prevention and control of viral diarrhea outbreaks.
Preferably, in step (5), the model is fitted with a quasi-Poisson family in order to control for overdispersion.
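Overdispersion means that the variance of the daily counts exceeds their mean, which a plain Poisson model does not allow. A common way to quantify it is the Pearson dispersion estimate, the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below computes that estimate; the observed counts, fitted means and parameter count are hypothetical values chosen for illustration, not results from the invention.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom.

    Values well above 1 indicate overdispersion relative to Poisson.
    """
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)

# Hypothetical daily counts and fitted means from some count model.
observed = [3, 0, 7, 2, 12, 1, 9, 4]
fitted = [4.0, 2.0, 5.0, 3.0, 6.0, 2.5, 6.5, 4.0]
phi = pearson_dispersion(observed, fitted, n_params=2)
```

In a quasi-Poisson fit the coefficient estimates match the Poisson ones, while the standard errors are inflated by the square root of this dispersion estimate.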
Preferably, in step (5), a sensitivity analysis is carried out on the model by changing the df values of the factor cross-basis matrices and of the date term, and by adding or removing the season factor; the candidate models are evaluated by the Akaike information criterion (AIC) to determine the final df values.
Preferably, in the subgroup analysis, children under 5 years old are grouped by sex and age, and the same model is applied to each group.
Under the Law of the People's Republic of China on the Prevention and Treatment of Infectious Diseases, viral diarrhea is a Class C infectious disease. Since 2003, the Chinese government has operated a national notifiable infectious disease reporting system, which requires clinicians to report each patient's personal information online to the Chinese Center for Disease Control and Prevention, on a standardized form, within 24 hours of diagnosis. Preferably, in step (1), viral diarrhea case data for Jilin Province from 2014 to 2019 are collected from the Chinese Center for Disease Control and Prevention; each case record contains the patient's sex, age, date of onset and type of causative virus. The meteorological data set is provided by the China Meteorological Data Sharing Service System and comprises evaporation (mm), precipitation (mm), sunshine duration, three surface temperature series (mean, maximum and minimum surface temperature, °C), three air pressure series (mean, maximum and minimum pressure, hPa), two relative humidity series (mean and minimum relative humidity, %), three air temperature series (mean, maximum and minimum air temperature, °C) and three wind speed series (mean wind speed, maximum wind speed and extreme wind speed, m/s). Each of these meteorological factors is averaged arithmetically by day over the 30 monitoring stations in Jilin Province, giving time series for 17 meteorological factors for the province. Air quality data are obtained from the China Air Quality Online Monitoring and Analysis Platform as daily time series of seven air quality factors (AQI and the concentrations of PM2.5, PM10, CO, NO2, SO2 and O3) for the nine prefecture-level administrative units of Jilin Province; each factor is averaged arithmetically by day over the nine monitored areas, giving seven air quality time series for the province. Baidu is the search engine with the largest market share in China. The Institute of Viral Diseases of the Chinese Center for Disease Control and Prevention provided up to 20 keywords covering symptoms, causative agents, and prevention and treatment products related to viral diarrhea, and the Baidu search index time series of all these keywords for Jilin Province over the corresponding period are selected.
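The daily arithmetic averaging over monitoring stations described above can be sketched as follows. The record layout `(date, station_id, value)`, the station identifiers and the readings are hypothetical, and skipping missing readings is an assumption, since the text does not say how gaps are handled.

```python
from collections import defaultdict

def daily_province_mean(records):
    """Average one factor over all stations for each day.

    records: iterable of (date, station_id, value) tuples.
    Returns a dict mapping date to the arithmetic mean of that day's values.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for date, _station, value in records:
        if value is None:  # assumption: missing readings are skipped
            continue
        sums[date] += value
        counts[date] += 1
    return {d: sums[d] / counts[d] for d in sorted(sums)}

# Hypothetical mean air temperature readings (degrees Celsius).
records = [
    ("2014-01-01", "station_A", -10.0),
    ("2014-01-01", "station_B", -14.0),
    ("2014-01-02", "station_A", -8.0),
    ("2014-01-02", "station_B", None),
]
province_series = daily_province_mean(records)
```

Applying the same function to each of the 17 meteorological factors and the 7 air quality factors would give one provincial series per factor.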
Those of ordinary skill in the art will understand that all or some of the steps of the method in the above embodiment can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the method in the above embodiment; the storage medium may be ROM/RAM, a magnetic disk, an optical disc, a memory card or the like. Accordingly, the invention also includes a lag analysis device for the onset risk of viral diarrhea in children that corresponds to the method of the invention, the device generally being expressed as functional modules corresponding to the steps of the method. The device comprises:
a data collection and selection module, configured to apply descriptive analysis to the daily viral diarrhea cases and to each variable, covering the mean, standard deviation and time series, then to use a Pearson correlation test to evaluate the relationship between the daily number of viral diarrhea infections and each factor, giving both the correlation and its significance, and to select for further analysis the factors whose correlation coefficient with the case data exceeds 0.1 in absolute value at a significance level below 0.05;
a data dimension reduction module, configured to apply principal component analysis to all selected meteorological factors to reduce the data dimension and to extract the principal components as elements for building the regression model, each principal component being given by formula (1):
where z_i is a principal component of the meteorological factors, x_j is the series data of each meteorological factor, α_j is the loading of each factor on each principal component after the principal component analysis, and n is the number of selected principal components;
an air quality factor acquisition module, configured to take the principal components that contribute most to the meteorological parameters as the new meteorological factor components, chosen so that their cumulative contribution rate exceeds 90%, and to apply the same principal component analysis to all selected air quality factors to obtain the principal components with the highest contribution rate, yielding the new air quality factors;
a Baidu search data module, configured to obtain the composite Baidu search index from all selected Baidu search data series using formula (2):
where BDI is the composite Baidu search index, x_i is each selected Baidu search data series, β_i is the Pearson correlation coefficient between that series and the epidemic series, and n is the number of selected Baidu search data series;
a model building module, configured to incorporate the processed factors into the model, the distributed lag non-linear model (DLNM) being a regression model based on lag effects and given by formula (3):
where E(Y_t) is the expected daily number of viral diarrhea cases, cb(x_i) is the cross-basis matrix of each meteorological principal component, cb(x_j) is the cross-basis matrix of each air quality principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. The exposure dimension of each factor uses a natural cubic spline with knots at the 25%, 50% and 75% quantiles on the logarithmic scale, and its initial degree of freedom df is set to 3. A lag period of 21 days is used when building the cross-basis matrix of each factor, and the initial df of the lag dimension is set to 3. The confounders of the model include a date factor, a day-of-week factor and a season factor, where ns is a natural cubic spline used to control the long-term trend of the time variable, and the initial df of the time factor is set to 7 per year.
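A cross-basis combines a basis over exposure values with a basis over lags, and the lag dimension starts from a matrix of lagged exposures. The sketch below builds only that raw lag matrix for the 21-day lag period named above; it does not apply the natural cubic spline bases, which in practice would come from a dedicated package such as the `crossbasis` function of the R package dlnm. The function name and the toy series are hypothetical.

```python
def lag_matrix(series, max_lag=21):
    """Matrix of lagged exposures.

    Row t holds series[t], series[t-1], ..., series[t-max_lag];
    entries before the start of the series are None.
    """
    rows = []
    for t in range(len(series)):
        rows.append([series[t - lag] if t - lag >= 0 else None
                     for lag in range(max_lag + 1)])
    return rows

# Hypothetical exposure series: 30 consecutive daily values.
exposure = [float(v) for v in range(30)]
lagged = lag_matrix(exposure)
```

The first 21 rows contain missing entries, so those days drop out of the fit unless earlier exposure data are available.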
Preferably, the model building module fits the model with a quasi-Poisson family in order to control for overdispersion.
Preferably, the model building module carries out a sensitivity analysis on the model by changing the df values of the factor cross-basis matrices and of the date term, and by adding or removing the season factor, and evaluates the candidate models by the Akaike information criterion (AIC) to determine the final df values.
Preferably, in the subgroup analysis, children under 5 years of age are grouped by sex and age, and the same model is applied to each group.
Preferably, the data collection and selection module collects viral diarrhea case data for Jilin Province from 2014 to 2019 from the Chinese Center for Disease Control and Prevention; each case record includes the patient's sex, age, date of onset and type of pathogenic virus. The meteorological data set is provided by the China Meteorological Data Sharing Service System and includes evaporation, precipitation, sunshine duration, three sets of surface temperature data, three sets of air pressure data, two sets of relative humidity data, three sets of air temperature data and three sets of wind speed data. Each of these meteorological factors, as monitored at 30 monitoring stations in Jilin Province, is arithmetically averaged by day to obtain time series data for 17 meteorological factors for Jilin Province. The air quality data are obtained from the China Air Quality Online Monitoring and Analysis Platform: time series data are obtained for 7 air quality factors (daily AQI, PM2.5 concentration, PM10 concentration, CO concentration, NO2 concentration, SO2 concentration and O3 concentration) for 9 municipal-level administrative units in Jilin Province, and each factor is arithmetically averaged by day across the 9 monitored areas to obtain time series data for 7 air quality factors for Jilin Province. The Institute for Viral Disease Control and Prevention of the Chinese Center for Disease Control and Prevention provides up to 20 keywords covering symptoms, pathogenic factors, and prevention and treatment products related to viral diarrhea, and the Baidu search index time series data of all these keywords for Jilin Province over the corresponding period are selected.
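The per-day arithmetic averaging across monitoring sites described above can be sketched as follows. The record layout, the site identifiers and the sample temperatures are hypothetical, and skipping missing readings is an assumption made here, since the text does not state how gaps are handled.

```python
from collections import defaultdict


def daily_mean_across_sites(records):
    """Average one factor across all sites for each day.

    records is an iterable of (date, site_id, value) tuples;
    a value of None marks a missing reading and is skipped (assumption).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for date, _site_id, value in records:
        if value is None:
            continue
        sums[date] += value
        counts[date] += 1
    return {date: sums[date] / counts[date] for date in sorted(sums)}


# Hypothetical air temperature readings from two stations.
records = [
    ("2014-01-01", "S1", -12.0),
    ("2014-01-01", "S2", -14.0),
    ("2014-01-02", "S1", -10.0),
    ("2014-01-02", "S2", None),
]
province_series = daily_mean_across_sites(records)
print(province_series)
```

Running the same function once per factor would yield the 17 meteorological and 7 air quality province-level series.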
The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Any simple modification, equivalent change or variation made to the above embodiments in accordance with the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.
| Application Number | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202110218435.3A | CN112951442B (en) | 2021-02-23 | 2021-02-23 | Hysteresis analysis method and device for child viral diarrhea onset risk |
| Publication Number | Publication Date |
|---|---|
| CN112951442A (en) | 2021-06-11 |
| CN112951442B (en) | 2022-09-23 |
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN202110218435.3A | Active | CN112951442B (en) | 2021-02-23 | 2021-02-23 | Hysteresis analysis method and device for child viral diarrhea onset risk |
| Country | Link |
|---|---|
| CN (1) | CN112951442B (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114519308A (en)* | 2022-02-22 | 2022-05-20 | 河南大学 | Method for determining river water and underground water interconversion lag response time influenced by river water and sand regulation |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103793621A (en)* | 2014-03-06 | 2014-05-14 | 上海市浦东新区疾病预防控制中心 | Comprehensive dysentery monitoring platform |
| CN104008164A (en)* | 2014-05-29 | 2014-08-27 | 华东师范大学 | Generalized regression neural network based short-term diarrhea multi-step prediction method |
| CN111415752A (en)* | 2020-03-01 | 2020-07-14 | 集美大学 | A prediction method of hand, foot and mouth disease integrating meteorological factors and search index |
| CN111430040A (en)* | 2020-03-03 | 2020-07-17 | 广东省公共卫生研究院 | Hand-foot-and-mouth disease epidemic situation prediction method based on case, weather and pathogen monitoring data |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA2901275C (en)* | 2013-02-15 | 2023-10-17 | Battelle Memorial Institute | Use of web-based symptom checker data to predict incidence of a disease or disorder |
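| Title |
|---|
| Analysis of surveillance results of viral diarrhea in children in Jilin City, 2007-2008; Zhao Yong et al.; Chinese Journal of Laboratory Diagnosis; 2011-11-25 (No. 11); 82-85* |
| Analysis of the epidemic and etiological characteristics of 997 cases of other infectious diarrhea; Guo Xuehong; China Health Standard Management; 2020-04-15 (No. 07); 26-28* |
| Application of principal component regression analysis to the relationship between bacillary dysentery and meteorological factors; Liao Hongxiu et al.; Modern Preventive Medicine; 2009-03-10 (No. 05); 813-815* |
| Influence of meteorological factors on other infectious diarrheal diseases; Tao Yan et al.; Journal of Lanzhou University (Natural Sciences); 2015-10-15 (No. 05); 646-651* |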
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |