CN112951442B - Hysteresis analysis method and device for child viral diarrhea onset risk - Google Patents

Hysteresis analysis method and device for child viral diarrhea onset risk

Publication number: CN112951442B
Authority: CN (China)
Prior art keywords: data, factor, meteorological, viral diarrhea, factors
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN202110218435.3A
Other languages: Chinese (zh)
Other versions: CN112951442A (en)
Inventors: 艾丹妮, 路文高, 杨健, 宋红
Current assignee: Beijing Institute of Technology BIT (the listed assignees may be inaccurate)
Original assignee: Beijing Institute of Technology BIT
Application filed by: Beijing Institute of Technology BIT
Priority to: CN202110218435.3A
Publications: CN112951442A (application), CN112951442B (granted)


Abstract

The lag analysis method and device for the onset risk of viral diarrhea in children can improve the accuracy of infectious disease surveillance, support correct prediction and early warning of outbreaks, and help regional health departments better carry out prevention and control of viral diarrhea outbreaks. The method comprises the following steps: (1) using descriptive analysis to summarize the mean, standard deviation and time series of the daily viral diarrhea cases and of each variable, and performing a Pearson correlation test; (2) applying principal component analysis to all selected meteorological factors to reduce dimensionality, and extracting principal components as elements for building a regression model; (3) selecting the several principal components with the highest contribution rates to the meteorological parameters as new meteorological factor components and, for all selected air quality factors, applying the same principal component analysis and taking the principal component with the highest contribution rate as the new air quality factor; (4) obtaining a composite Baidu search index; (5) constructing a distributed lag nonlinear model.

Description

Translated from Chinese
Lag (hysteresis) analysis method and device for the onset risk of viral diarrhea in children

Technical field

The invention relates to the technical field of disease prevention and control, and in particular to a lag analysis method for the onset risk of viral diarrhea in children and a lag analysis device for the onset risk of viral diarrhea in children.

Background

Acute gastroenteritis is a common disease of the human digestive tract, characterized mainly by vomiting, diarrhea and fever. It has many causes, including bacteria, viruses and parasites. Viral diarrhea is a common digestive system disease caused by a variety of human enteric viruses, such as rotavirus, norovirus, adenovirus and astrovirus. The susceptible population is mainly children under 5 years old. Viruses are considered the main pathogens of severe acute diarrhea in children worldwide and one of the leading causes of child death in developing countries.

Studies have shown that norovirus is a typical food-borne virus that spreads easily through contaminated water and food, while rotavirus can form aerosols with pollutant particles in the air and spreads via the fecal-oral route, contact with contaminated material and similar routes. Because these transmission routes are closely tied to daily life, viral diarrhea is widespread globally and occurs in every season of the year. Viral diarrhea can be attributed to various environmental factors, including meteorological and hydrological ones. Existing evidence indicates that low temperature and drought promote rotavirus transmission, which shows winter seasonality in temperate climates. The epidemic pattern of norovirus is less regular: its peak may shift by weeks or months between seasons, so its seasonality is highly variable. One study found that in England and Wales children born in summer have a higher risk of confirmed rotavirus infection. Studies in the United Kingdom, the Netherlands, Turkey, Australia, Germany, India, Costa Rica, Nepal and other regions show that the risk of rotavirus diarrhea is negatively correlated with air temperature, although in some regions, such as Bangladesh, the risk of rotavirus outbreaks rises at high temperatures. Increases in river runoff and river water level promote norovirus outbreaks. Some studies of seafood farming environments show that solar radiation, water temperature, salinity and other factors can affect marine products that act as norovirus hosts, which in turn affects food-borne norovirus outbreaks in the population.

In China, associations between environmental factors and general acute gastroenteritis or bacterial diarrhea have been widely reported, but reports on environmental factors and viral diarrhea remain scarce. Wang, P. studied seasonal variation in hospital admissions for norovirus and rotavirus infection in Hong Kong, China, and found that rotavirus tends to break out in winter, whereas norovirus is more strongly associated with summer. Meanwhile, extreme precipitation carried a higher risk of norovirus infection than trace rainfall, but a lower risk of rotavirus infection. Gao, Y. investigated ambient temperature and the burden of viral diarrhea infection in Wuxi, China, and showed that low temperature promotes viral diarrhea outbreaks, consistent with studies elsewhere in the world. Ye, Q. investigated the association of rotavirus infection rates in children in Hangzhou, China with air temperature and air pollutants. Besides confirming the negative correlation between temperature and rotavirus infection rate, that study found that temperature changes significantly affect the rotavirus detection rate. Notably, the authors found that air pollutants such as PM2.5 and PM10 can significantly increase the risk of rotavirus infection, and they observed dose, lag and cumulative effects.

Researchers have already pointed out the value of using Internet search data to track and detect infectious diseases. In the United States, for example, Google search data can report influenza trends two weeks in advance. Other researchers have used search query data to monitor the incidence of infectious diseases such as dengue fever, Ebola virus disease, and hand, foot and mouth disease. Liu, K. used composite Baidu indices at different time lags together with norovirus incidence data, applying the Spearman correlation method to build an exponential curve model fitted to data on the 2014 norovirus epidemic in Zhejiang Province, China. The study found that each one-unit increase in the average composite Baidu index was associated with a 2.15-fold increase in the risk of norovirus infection. Because Internet-based surveillance systems draw on social media, search engine query data and news reports, Internet search data can improve the sensitivity and timeliness of health event detection. However, biases and problems arising from external interference such as media coverage, Internet usage behavior and regional policy all affect the predictive accuracy of search engine query data, so relying on Internet search data alone to detect the occurrence of infectious diseases has limitations.

Summary of the invention

To overcome the shortcomings of the prior art, the technical problem addressed by the invention is to provide a lag analysis method for the onset risk of viral diarrhea in children, which can improve the accuracy of infectious disease surveillance, support correct prediction and early warning of outbreaks, and help regional health departments better carry out prevention and control of viral diarrhea outbreaks.

The technical solution of the invention is a lag analysis method for the onset risk of viral diarrhea in children, comprising the following steps:

(1) Descriptive analysis is used to summarize the mean, standard deviation and time series of the daily viral diarrhea cases and of each variable. A Pearson correlation test is then used to evaluate the relationship between the daily number of viral diarrhea infections and each factor, giving the strength and significance of each correlation. Factors whose correlation coefficient with the case data has an absolute value above 0.1 and a significance level below 0.05 are selected for further analysis;

(2) For all selected meteorological factors, principal component analysis is used to reduce the dimensionality of the data, and the extracted principal components serve as elements for building the regression model. Each principal component is given by formula (1):

[Formula (1): equation image BDA0002948441550000031, not reproduced in this text]

where z_i is a principal component of the meteorological factors, x_j is the time series of each meteorological factor, α_j is the loading of each factor on each principal component after principal component analysis, and n is the number of selected principal components;

(3) The several principal components with the highest contribution rates to the meteorological parameters are selected as the new meteorological factor components, such that their cumulative contribution rate exceeds 90%. For all selected air quality factors, the same principal component analysis is applied and the principal component with the highest contribution rate is taken as the new air quality factor;

(4) For all selected Baidu search data series, the composite Baidu search index is obtained with formula (2):

[Formula (2): equation image BDA0002948441550000041, not reproduced in this text]

where BDI is the composite Baidu search index, x_i is each selected Baidu search data series, β_i is the Pearson correlation coefficient between each data series and the epidemic (case) series, and n is the number of selected Baidu search data series;

(5) The processed factors above are incorporated into model construction. The distributed lag nonlinear model is a regression model based on lag effects, given by formula (3):

[Formula (3): equation image BDA0002948441550000042, not reproduced in this text]

where E(Y_t) is the expected daily number of viral diarrhea cases, cb(x_i) is the cross-basis matrix of each meteorological principal component, cb(x_j) is the cross-basis matrix of each air quality principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. The variable space of each factor is modeled with a natural cubic spline, with knots placed at the 25%, 50% and 75% quantiles on the logarithmic scale and an initial degrees-of-freedom (df) value of 3. A lag period of 21 days is used when building the cross-basis matrix for each factor, and the initial df of the lag dimension is set to 3. The confounders in the model are a date factor, a day-of-week factor and a seasonal factor; ns denotes a natural cubic spline used to control the long-term trend of the time variable, and the initial df of the time factor is set to 7 per year.
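The screening rule in step (1) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the p-value uses the Fisher z-transform, a large-sample approximation chosen here only to keep the sketch free of third-party dependencies (the text does not say how significance is computed).

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear association
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transform (ASSUMPTION: approximation)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def screen_factors(cases, factors, min_abs_r=0.1, alpha=0.05):
    """Keep factors with |r| > min_abs_r and p < alpha against the case series."""
    kept = {}
    for name, series in factors.items():
        r = pearson_r(series, cases)
        p = approx_p_value(r, len(cases))
        if abs(r) > min_abs_r and p < alpha:
            kept[name] = (r, p)
    return kept
```

The thresholds 0.1 and 0.05 come from step (1); everything else, including the dictionary layout of `factors`, is an assumption made for the example.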

The invention combines Internet query data with traditional surveillance. By characterizing the lag dependence of the onset risk of viral diarrhea in children under 5 years old in temperate regions of China on selected meteorological factors, air quality factors and Internet search data, it provides multiple perspectives on that risk, including the external natural environment and social activity. It thereby helps regional health departments better carry out prevention and control of viral diarrhea outbreaks, improves the accuracy of infectious disease surveillance, and supports correct prediction and early warning of outbreaks.

A lag analysis device for the onset risk of viral diarrhea in children is also provided, comprising:

a data collection and selection module, configured to use descriptive analysis to summarize the mean, standard deviation and time series of the daily viral diarrhea cases and of each variable, then to use a Pearson correlation test to evaluate the relationship between the daily number of viral diarrhea infections and each factor, giving the strength and significance of each correlation, and to select for further analysis the factors whose correlation coefficient with the case data has an absolute value above 0.1 and a significance level below 0.05;

a data dimension reduction module, configured to apply principal component analysis to all selected meteorological factors to reduce the dimensionality of the data and to extract principal components as elements for building the regression model, each principal component being given by formula (1):

[Formula (1): equation image BDA0002948441550000051, not reproduced in this text]

where z_i is a principal component of the meteorological factors, x_j is the time series of each meteorological factor, α_j is the loading of each factor on each principal component after principal component analysis, and n is the number of selected principal components;

an air quality factor acquisition module, configured to select the several principal components with the highest contribution rates to the meteorological parameters as the new meteorological factor components, such that their cumulative contribution rate exceeds 90%, and, for all selected air quality factors, to apply the same principal component analysis and take the principal component with the highest contribution rate as the new air quality factor;

a Baidu search data module, configured to obtain, for all selected Baidu search data series, the composite Baidu search index with formula (2):

[Formula (2): equation image BDA0002948441550000061, not reproduced in this text]

where BDI is the composite Baidu search index, x_i is each selected Baidu search data series, β_i is the Pearson correlation coefficient between each data series and the epidemic (case) series, and n is the number of selected Baidu search data series;

a model construction module, configured to incorporate the processed factors above into model construction, where the distributed lag nonlinear model is a regression model based on lag effects, given by formula (3):

[Formula (3): equation image BDA0002948441550000062, not reproduced in this text]

where E(Y_t) is the expected daily number of viral diarrhea cases, cb(x_i) is the cross-basis matrix of each meteorological principal component, cb(x_j) is the cross-basis matrix of each air quality principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. The variable space of each factor is modeled with a natural cubic spline, with knots placed at the 25%, 50% and 75% quantiles on the logarithmic scale and an initial degrees-of-freedom (df) value of 3. A lag period of 21 days is used when building the cross-basis matrix for each factor, and the initial df of the lag dimension is set to 3. The confounders in the model are a date factor, a day-of-week factor and a seasonal factor; ns denotes a natural cubic spline used to control the long-term trend of the time variable, and the initial df of the time factor is set to 7 per year.
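The component selection performed by the dimension reduction and air quality modules can be illustrated with a short sketch. It assumes the eigenvalues (explained variances, sorted in descending order) and the loadings have already been produced by some principal component analysis routine, which is not shown. The helper names are hypothetical, and the double-indexed loading α_ij is one plausible reading of the α_j in the text, since formula (1) itself is not reproduced.

```python
def select_components(explained_variance, threshold=0.90):
    """Number of leading components whose cumulative contribution rate
    first exceeds `threshold` (input sorted from largest to smallest)."""
    total = sum(explained_variance)
    cumulative = 0.0
    for count, variance in enumerate(explained_variance, start=1):
        cumulative += variance / total
        if cumulative > threshold:
            return count
    return len(explained_variance)


def component_scores(loadings, observations):
    """Project each daily observation onto each retained component.

    ASSUMPTION: z_i = sum_j alpha_ij * x_j, where loadings[i][j] is the
    loading of factor j on component i and each observation is one day's
    (standardized) factor values.
    """
    return [
        [sum(a * x for a, x in zip(component, row)) for component in loadings]
        for row in observations
    ]
```

For the meteorological factors the text keeps as many components as `select_components` returns at the 90% threshold; for the air quality factors it keeps only the first component, which corresponds to passing a single row of loadings to `component_scores`.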

Description of the drawings

FIG. 1 is a flow chart of the lag analysis method for the onset risk of viral diarrhea in children according to the invention.

Detailed description

As shown in FIG. 1, the lag analysis method for the onset risk of viral diarrhea in children comprises the following steps:

(1) Descriptive analysis is used to summarize the mean, standard deviation and time series of the daily viral diarrhea cases and of each variable. A Pearson correlation test is then used to evaluate the relationship between the daily number of viral diarrhea infections and each factor, giving the strength and significance of each correlation. Factors whose correlation coefficient with the case data has an absolute value above 0.1 and a significance level below 0.05 are selected for further analysis;

(2) For all selected meteorological factors, principal component analysis is used to reduce the dimensionality of the data, and the extracted principal components serve as elements for building the regression model. Each principal component is given by formula (1):

[Formula (1): equation image BDA0002948441550000071, not reproduced in this text]

where z_i is a principal component of the meteorological factors, x_j is the time series of each meteorological factor, α_j is the loading of each factor on each principal component after principal component analysis, and n is the number of selected principal components;

(3) The several principal components with the highest contribution rates to the meteorological parameters are selected as the new meteorological factor components, such that their cumulative contribution rate exceeds 90%. For all selected air quality factors, the same principal component analysis is applied and the principal component with the highest contribution rate is taken as the new air quality factor;

(4) For all selected Baidu search data series, the composite Baidu search index is obtained with formula (2):

[Formula (2): equation image BDA0002948441550000072, not reproduced in this text]

where BDI is the composite Baidu search index, x_i is each selected Baidu search data series, β_i is the Pearson correlation coefficient between each data series and the epidemic (case) series, and n is the number of selected Baidu search data series;

(5) The processed factors above are incorporated into model construction. The distributed lag nonlinear model is a regression model based on lag effects, given by formula (3):

[Formula (3): equation image BDA0002948441550000081, not reproduced in this text]

where E(Y_t) is the expected daily number of viral diarrhea cases, cb(x_i) is the cross-basis matrix of each meteorological principal component, cb(x_j) is the cross-basis matrix of each air quality principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. The variable space of each factor is modeled with a natural cubic spline, with knots placed at the 25%, 50% and 75% quantiles on the logarithmic scale and an initial degrees-of-freedom (df) value of 3. A lag period of 21 days is used when building the cross-basis matrix for each factor, and the initial df of the lag dimension is set to 3. The confounders in the model are a date factor, a day-of-week factor and a seasonal factor; ns denotes a natural cubic spline used to control the long-term trend of the time variable, and the initial df of the time factor is set to 7 per year.
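Step (4) describes the composite Baidu search index in terms of the keyword series x_i and their Pearson coefficients β_i, but the formula itself is not reproduced in this text. The sketch below therefore assumes the simplest form consistent with those definitions, a correlation-weighted sum computed day by day. Both that form and the function name are assumptions, and the actual formula (2) may include a normalization that is not shown here.

```python
def composite_baidu_index(keyword_series, correlations):
    """Day-by-day correlation-weighted sum of keyword search series.

    ASSUMPTION: BDI_t = sum_i beta_i * x_{i,t}. Formula (2) is not
    available in the text, so this is an illustrative reading only.
    keyword_series: list of equal-length daily series, one per keyword.
    correlations:   Pearson coefficient of each series with the case series.
    """
    if len(keyword_series) != len(correlations):
        raise ValueError("one correlation coefficient is needed per series")
    n_days = len(keyword_series[0])
    if any(len(series) != n_days for series in keyword_series):
        raise ValueError("all keyword series must cover the same days")
    return [
        sum(beta * series[t] for beta, series in zip(correlations, keyword_series))
        for t in range(n_days)
    ]
```

Weighting by the correlation coefficient lets keywords that track the case series more closely contribute more to the index, which matches the role β_i is given in the where-clause.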

The invention combines Internet query data with traditional surveillance. By characterizing the lag dependence of the onset risk of viral diarrhea in children under 5 years old in temperate regions of China on selected meteorological factors, air quality factors and Internet search data, it provides multiple perspectives on that risk, including the external natural environment and social activity. It thereby helps regional health departments better carry out prevention and control of viral diarrhea outbreaks, improves the accuracy of infectious disease surveillance, and supports correct prediction and early warning of outbreaks.
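As a rough illustration of the 21-day lag period in step (5), the sketch below builds the lagged exposure matrix from which a cross-basis is normally computed. It is not the patent's implementation: the function name `lag_matrix` is hypothetical, and the spline bases over the exposure and lag dimensions (natural cubic splines in the text) are omitted.

```python
def lag_matrix(series, max_lag=21):
    """Lagged exposure matrix for one factor.

    Each row corresponds to a day t and holds
    series[t], series[t-1], ..., series[t-max_lag].
    The first `max_lag` days have no complete history and are skipped.
    """
    return [
        [series[t - lag] for lag in range(max_lag + 1)]
        for t in range(max_lag, len(series))
    ]
```

Each row has `max_lag + 1` entries (lags 0 through 21), so a series of N days yields N - 21 usable rows. A full cross-basis would then apply one basis across the values in each column and another across the lag index.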

Preferably, in step (5), to control for overdispersion, the link function in the model is a quasi-Poisson function.
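In generalized linear model practice, quasi-Poisson is usually described as a Poisson-type variance function with a log link and a dispersion parameter estimated from the data. That framing is general background, not wording from the patent. A minimal sketch of the usual Pearson estimate of that dispersion follows; the function name is hypothetical and the fitted means are assumed to come from a model fit that is not shown.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom.

    observed: daily case counts
    fitted:   model-fitted means (all strictly positive)
    n_params: number of estimated regression coefficients
    """
    residual_df = len(observed) - n_params
    if residual_df <= 0:
        raise ValueError("need more observations than parameters")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / residual_df
```

A value well above 1 indicates more day-to-day variation than a plain Poisson model allows, which is the situation the quasi-Poisson choice is meant to handle.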

Preferably, in step (5), sensitivity analysis is performed on the model by changing the df values of the factor cross-basis matrices and of the date term and by adding or removing the seasonal factor; the model is evaluated with the Akaike information criterion (AIC) to determine the final df values.
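The comparison described above amounts to refitting the model under each candidate setting and keeping the one with the lowest criterion value. A minimal sketch of that bookkeeping follows, assuming each candidate fit already supplies a log-likelihood and a parameter count. The helper names and the dictionary layout are hypothetical. Note also that a quasi-likelihood fit does not have an ordinary likelihood, so in practice a quasi-AIC variant is often substituted; the text names only AIC.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(candidates):
    """Return the label with the lowest AIC and all scores.

    candidates: mapping of label -> (log_likelihood, n_params),
    one entry per df setting tried in the sensitivity analysis.
    """
    scores = {label: aic(ll, k) for label, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Lower AIC is better: a larger df is kept only if the gain in fit outweighs the penalty of two points per added parameter.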

Preferably, for subgroup analysis, children under 5 years old are grouped by sex and age, and the same model is applied to each group.

According to the Law of the People's Republic of China on the Prevention and Control of Infectious Diseases, viral diarrhea is a Class C infectious disease. Since 2003 the Chinese government has operated a national notifiable infectious disease reporting system, which requires clinicians to report a patient's personal information online to the Chinese Center for Disease Control and Prevention, on a standardized form, within 24 hours of confirming a diagnosis. Preferably, in step (1), viral diarrhea case data for Jilin Province from 2014 to 2019 are collected from the Chinese Center for Disease Control and Prevention; each case record contains the patient's sex, age, date of onset and pathogenic virus category.

The meteorological data set is provided by the China Meteorological Data Sharing Service System and comprises evaporation (mm), precipitation (mm), sunshine duration, three surface temperature series (mean, maximum and minimum, in degrees Celsius), three air pressure series (mean, maximum and minimum, in hPa), two relative humidity series (mean and minimum, in percent), three air temperature series (mean, maximum and minimum, in degrees Celsius) and three wind speed series (mean, maximum and extreme, in m/s). Each of these meteorological factors, as recorded at 30 monitoring stations in Jilin Province, is arithmetically averaged by day to give time series of 17 meteorological factors for the province.

Air quality data are obtained from the China Air Quality Online Monitoring and Analysis Platform: the daily AQI and the PM2.5, PM10, CO, NO2, SO2 and O3 concentrations, seven air quality factors in total, for nine municipal-level administrative units in Jilin Province. Each factor is arithmetically averaged by day across the nine monitoring areas to give seven air quality time series for the province.

Baidu is the search engine with the largest market share in China. The Institute of Viral Diseases of the Chinese Center for Disease Control and Prevention provided up to 20 keywords covering symptoms, pathogenic factors, and prevention and treatment products related to viral diarrhea; the Baidu search index time series of all these keywords for Jilin Province over the corresponding period are selected.
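The daily arithmetic averaging across monitoring stations (30 for meteorology, nine areas for air quality) can be sketched as below. The record layout of (date, station, value) tuples and the handling of missing readings are assumptions made for the example; the text does not say how gaps are treated.

```python
from collections import defaultdict


def daily_mean(records):
    """Average one factor across stations for each day.

    records: iterable of (date, station_id, value) tuples.
    ASSUMPTION: a value of None marks a missing reading and is skipped,
    so that day is averaged over the stations that did report.
    Returns a dict mapping date -> mean, ordered by date.
    """
    totals = defaultdict(lambda: [0.0, 0])  # date -> [running sum, count]
    for date, _station, value in records:
        if value is None:
            continue
        totals[date][0] += value
        totals[date][1] += 1
    return {date: total / count for date, (total, count) in sorted(totals.items())}
```

Running this once per factor gives the province-level series described above, for example 17 calls for the meteorological factors and seven for the air quality factors.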

Those of ordinary skill in the art will understand that all or part of the steps of the method in the above embodiment can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the method; the storage medium may be ROM/RAM, a magnetic disk, an optical disc, a memory card or the like. Accordingly, the invention also includes a lag analysis device for the onset risk of viral diarrhea in children, corresponding to the method and generally expressed as functional modules corresponding to the method steps. The device comprises:

a data collection and selection module, configured to use descriptive analysis to summarize the mean, standard deviation and time series of the daily viral diarrhea cases and of each variable, then to use a Pearson correlation test to evaluate the relationship between the daily number of viral diarrhea infections and each factor, giving the strength and significance of each correlation, and to select for further analysis the factors whose correlation coefficient with the case data has an absolute value above 0.1 and a significance level below 0.05;

a data dimensionality reduction module, configured to apply principal component analysis to all selected meteorological factors to reduce the dimensionality of the data, and to extract the principal components as elements for constructing the regression model, each principal component being given by formula (1):

Figure BDA0002948441550000101

where z_i is a principal component of the meteorological factors, x_j is the series data of each meteorological factor, α_j is the loading of each factor on each principal component after principal component analysis, and n is the number of selected principal components;

an air quality factor acquisition module, configured to select the several principal components with the highest contribution rates to the meteorological parameters as the new meteorological factor components, such that the cumulative contribution rate of the selected principal components exceeds 90%, and, for all selected air quality factors, to apply the same principal component analysis method to obtain the principal components with the highest contribution rates, yielding the new air quality factors;

a Baidu search data module, configured to obtain the composite Baidu search keyword from all selected Baidu search data series using formula (2):

Figure BDA0002948441550000111

where BDI is the composite Baidu search index, x_i is each selected Baidu search data series, β_i is the Pearson correlation coefficient between each data series and the epidemic series, and n is the number of selected Baidu search data series;

a model building module, configured to incorporate the processed factors above into model construction, the distributed lag nonlinear model being a regression model based on lag effects, given by formula (3):

Figure BDA0002948441550000112

where E(Y_t) is the daily number of viral diarrhea cases, cb(x_i) is the cross-basis matrix of each meteorological factor principal component, cb(x_j) is the cross-basis matrix of each air quality factor principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index. Natural cubic spline functions are used for the space of each factor, with spline knots placed at the 25%, 50%, and 75% quantiles on the logarithmic scale and the initial degrees of freedom (df) set to 3. A lag period of 21 days is used when building the cross-basis matrix for each factor, with the initial df for the lag dimension set to 3. The confounders in the model include a date factor, a day-of-week factor, and a seasonal factor, where ns is the natural cubic spline function used to control the long-term trend of the time variable; the initial df for the time factor is set to 7 per year.
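Two of the modules above lend themselves to a short numerical sketch: keeping principal components until their cumulative contribution rate exceeds 90%, and building the 21-day lag structure that a cross-basis matrix is constructed from. The code below is illustrative only and rests on several assumptions: factors are standardized before the analysis, "loadings" are taken to be the eigenvector coefficients, and the lag dimension uses a simple polynomial basis as a stand-in for the natural cubic spline named in the text. The resulting terms are linear in the exposure, so they show only the lag dimension of a cross-basis. All function names are hypothetical.

```python
import numpy as np

def pca_components(X, threshold=0.90):
    """Return (scores, loadings, cumulative) for the fewest leading components
    whose cumulative contribution rate exceeds `threshold`.
    X has shape (days, factors)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize (assumption)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]                 # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Index of the first component at which the cumulative rate exceeds threshold.
    k = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    k = min(k, len(eigvals))
    loadings = eigvecs[:, :k]
    return Z @ loadings, loadings, cumulative[:k]

def lag_matrix(x, max_lag=21):
    """Row t holds x[t], x[t-1], ..., x[t-max_lag]; NaN where history is missing."""
    x = np.asarray(x, dtype=float)
    Q = np.full((len(x), max_lag + 1), np.nan)
    for lag in range(max_lag + 1):
        Q[lag:, lag] = x[:len(x) - lag]
    return Q

def lag_basis(max_lag=21, df=3):
    """Polynomial basis over lags 0..max_lag with `df` columns (1, l, l^2, ...).
    This replaces the natural cubic spline of the text for brevity."""
    lags = np.arange(max_lag + 1) / max_lag
    return np.vander(lags, df, increasing=True)

def distributed_lag_terms(x, max_lag=21, df=3):
    """Compress the lagged exposure history into `df` regression columns."""
    return lag_matrix(x, max_lag) @ lag_basis(max_lag, df)
```

Each column of the PCA scores could be passed to `distributed_lag_terms` to obtain regression columns for that component; the first 21 rows contain NaN because a full lag history is not yet available for those days.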

Preferably, in the model building module, the link function in the model is a quasi-Poisson function, in order to control for overdispersion.
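Overdispersion here means that the variance of the daily counts exceeds their mean, which a plain Poisson model does not allow. A quasi-Poisson fit estimates a dispersion parameter for this; one common estimator is the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below assumes fitted means are already available from some regression routine; the function name is hypothetical.

```python
def pearson_dispersion(y, mu, n_params):
    """Estimate the dispersion parameter of a count model.

    y: observed daily counts; mu: fitted means (all > 0);
    n_params: number of estimated regression coefficients.
    Values well above 1 suggest overdispersion relative to Poisson.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)
```

For example, counts `[0, 2, 4, 6]` with a constant fitted mean of 3 and one parameter give a dispersion of 20/9, roughly 2.2.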

Preferably, the model building module performs sensitivity analysis on the model by changing the df values of the factor cross-basis matrices and of the date term and by adding or removing the seasonal factor, and evaluates the model according to the Akaike information criterion (AIC) to determine the final df values.
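The df selection described above amounts to fitting several candidate models and keeping the one with the smallest AIC. The sketch below shows only the comparison step and assumes a plain Poisson log-likelihood; a quasi-Poisson model has no full likelihood, so in practice a quasi-likelihood variant of AIC is often used instead. The function names and the candidate labels are hypothetical.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y given fitted means mu (all > 0)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik

def best_by_aic(candidates):
    """candidates: {label: (loglik, n_params)}; returns the label with lowest AIC."""
    return min(candidates, key=lambda label: aic(*candidates[label]))
```

A candidate with one extra parameter is preferred only if it raises the log-likelihood by more than 1, which is the trade-off the AIC encodes.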

Preferably, in subgroup analysis, children under 5 years of age are grouped by sex and age, and the same model is applied to each group.

Preferably, the data collection and selection module collects viral diarrhea case data for Jilin Province from 2014 to 2019 from the Chinese Center for Disease Control and Prevention, each case record including the patient's sex, age, date of onset, and type of pathogenic virus. The meteorological data set is provided by the China Meteorological Data Sharing Service System and includes evaporation, precipitation, sunshine duration, three sets of surface temperature data, three sets of air pressure data, two sets of relative humidity data, three sets of air temperature data, and three sets of wind speed data; each of these meteorological factors, as monitored at 30 monitoring points in Jilin Province, is arithmetically averaged by day to obtain time series data for 17 meteorological factors in Jilin Province. The air quality data are obtained from the China Air Quality Online Monitoring and Analysis Platform as daily time series of 7 air quality factors (AQI and the PM2.5, PM10, CO, NO2, SO2, and O3 concentrations) for 9 municipal-level administrative units in Jilin Province; each air quality factor is arithmetically averaged by day across the 9 monitoring areas to obtain time series data for 7 air quality factors in Jilin Province. The Institute of Viral Diseases of the Chinese Center for Disease Control and Prevention provides up to 20 keywords covering symptoms, pathogenic factors, and prevention and treatment products related to viral diarrhea, and the Baidu search index time series of all these keywords for Jilin Province over the corresponding period are selected.
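Once the daily series are collected, the data collection and selection module keeps only the factors whose Pearson correlation with the case series exceeds 0.1 in absolute value at a significance level below 0.05. A minimal sketch of that screening follows. It assumes the p-value may be approximated with a normal tail, which is reasonable for several years of daily observations but is not stated in the source; the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_screen(cases, factors, r_min=0.1, alpha=0.05):
    """Return {name: (r, p)} for factors with |r| > r_min and p < alpha.

    cases: daily case counts; factors: {name: daily series of equal length}.
    """
    n = len(cases)
    selected = {}
    for name, series in factors.items():
        r = pearson_r(cases, series)
        if abs(r) >= 1.0:
            p = 0.0
        else:
            # Test statistic for H0: no correlation. With many observations
            # the t distribution is close to normal (approximation).
            t = r * math.sqrt((n - 2) / (1.0 - r * r))
            p = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided normal tail
        if abs(r) > r_min and p < alpha:
            selected[name] = (r, p)
    return selected
```

With an exact t distribution the p-values would differ slightly for short series, so the approximation should be replaced by a statistics library routine when the sample is small.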

The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Any simple modification, equivalent change, or variation made to the above embodiments according to the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims (10)

Translated from Chinese
1. A hysteresis analysis method for the onset risk of viral diarrhea in children, characterized in that it comprises the following steps:
(1) using descriptive analysis to compute the mean, standard deviation, and time series of daily viral diarrhea cases and of each variable, and then applying the Pearson correlation test to assess the relationship between the daily number of viral diarrhea infections and each factor, so as to determine the correlation between the case data and each factor and its significance, and selecting for further analysis the factors whose correlation coefficient with the case data exceeds 0.1 in absolute value at a significance level below 0.05;
(2) applying principal component analysis to all selected meteorological factors to reduce the dimensionality of the data, and extracting the principal components as elements for constructing the regression model, each principal component being given by formula (1):
Figure FDA0002948441540000012
where z_i is a principal component of the meteorological factors, x_j is the series data of each meteorological factor, α_j is the loading of each factor on each principal component after principal component analysis, and n is the number of selected principal components;
(3) selecting the several principal components with the highest contribution rates to the meteorological parameters as the new meteorological factor components, such that the cumulative contribution rate of the selected principal components exceeds 90%, and, for all selected air quality factors, applying the same principal component analysis method to obtain the principal components with the highest contribution rates, yielding the new air quality factors;
(4) obtaining the composite Baidu search keyword from all selected Baidu search data series using formula (2):
Figure FDA0002948441540000011
where BDI is the composite Baidu search index, x_i is each selected Baidu search data series, β_i is the Pearson correlation coefficient between each data series and the epidemic series, and n is the number of selected Baidu search data series;
(5) incorporating the processed factors above into model construction, the distributed lag nonlinear model being a regression model based on lag effects, given by formula (3):
Figure FDA0002948441540000021
where E(Y_t) is the daily number of viral diarrhea cases, cb(x_i) is the cross-basis matrix of each meteorological factor principal component, cb(x_j) is the cross-basis matrix of each air quality factor principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index; natural cubic spline functions are used for the space of each factor, with spline knots placed at the 25%, 50%, and 75% quantiles on the logarithmic scale and the initial degrees of freedom (df) set to 3; a lag period of 21 days is used when building the cross-basis matrix for each factor, with the initial df for the lag dimension set to 3; the confounders in the model include a date factor, a day-of-week factor, and a seasonal factor, where ns is the natural cubic spline function used to control the long-term trend of the time variable, and the initial df for the time factor is set to 7 per year.
2. The hysteresis analysis method for the onset risk of viral diarrhea in children according to claim 1, characterized in that in step (5), the link function in the model is a quasi-Poisson function, in order to control for overdispersion.
3. The hysteresis analysis method for the onset risk of viral diarrhea in children according to claim 2, characterized in that in step (5), sensitivity analysis is performed on the model by changing the df values of the factor cross-basis matrices and of the date term and by adding or removing the seasonal factor, and the model is evaluated according to the Akaike information criterion (AIC) to determine the final df values.
4. The hysteresis analysis method for the onset risk of viral diarrhea in children according to claim 3, characterized in that in subgroup analysis, children under 5 years of age are grouped by sex and age, and the same model is applied to each group.
5. The hysteresis analysis method for the onset risk of viral diarrhea in children according to claim 4, characterized in that in step (1), viral diarrhea case data for Jilin Province from 2014 to 2019 are collected from the Chinese Center for Disease Control and Prevention, each case record including the patient's sex, age, date of onset, and type of pathogenic virus; the meteorological data set is provided by the China Meteorological Data Sharing Service System and includes evaporation, precipitation, sunshine duration, three sets of surface temperature data, three sets of air pressure data, two sets of relative humidity data, three sets of air temperature data, and three sets of wind speed data, and each of these meteorological factors, as monitored at 30 monitoring points in Jilin Province, is arithmetically averaged by day to obtain time series data for 17 meteorological factors in Jilin Province; the air quality data are obtained from the China Air Quality Online Monitoring and Analysis Platform as daily time series of 7 air quality factors (AQI and the PM2.5, PM10, CO, NO2, SO2, and O3 concentrations) for 9 municipal-level administrative units in Jilin Province, and each air quality factor is arithmetically averaged by day across the 9 monitoring areas to obtain time series data for 7 air quality factors in Jilin Province; the Institute of Viral Diseases of the Chinese Center for Disease Control and Prevention provides up to 20 keywords covering symptoms, pathogenic factors, and prevention and treatment products related to viral diarrhea, and the Baidu search index time series of all these keywords for Jilin Province over the corresponding period are selected.
6. A hysteresis analysis device for the onset risk of viral diarrhea in children, characterized in that it comprises:
a data collection and selection module, configured to use descriptive analysis to compute the mean, standard deviation, and time series of daily viral diarrhea cases and of each variable, and then to apply the Pearson correlation test to assess the relationship between the daily number of viral diarrhea infections and each factor, so as to determine the correlation between the case data and each factor and its significance, and to select for further analysis the factors whose correlation coefficient with the case data exceeds 0.1 in absolute value at a significance level below 0.05;
a data dimensionality reduction module, configured to apply principal component analysis to all selected meteorological factors to reduce the dimensionality of the data, and to extract the principal components as elements for constructing the regression model, each principal component being given by formula (1):
Figure FDA0002948441540000041
where z_i is a principal component of the meteorological factors, x_j is the series data of each meteorological factor, α_j is the loading of each factor on each principal component after principal component analysis, and n is the number of selected principal components;
an air quality factor acquisition module, configured to select the several principal components with the highest contribution rates to the meteorological parameters as the new meteorological factor components, such that the cumulative contribution rate of the selected principal components exceeds 90%, and, for all selected air quality factors, to apply the same principal component analysis method to obtain the principal components with the highest contribution rates, yielding the new air quality factors;
a Baidu search data module, configured to obtain the composite Baidu search keyword from all selected Baidu search data series using formula (2):
Figure FDA0002948441540000042
where BDI is the composite Baidu search index, x_i is each selected Baidu search data series, β_i is the Pearson correlation coefficient between each data series and the epidemic series, and n is the number of selected Baidu search data series;
a model building module, configured to incorporate the processed factors above into model construction, the distributed lag nonlinear model being a regression model based on lag effects, given by formula (3):
Figure FDA0002948441540000051
where E(Y_t) is the daily number of viral diarrhea cases, cb(x_i) is the cross-basis matrix of each meteorological factor principal component, cb(x_j) is the cross-basis matrix of each air quality factor principal component, and cb(BDI) is the cross-basis matrix of the composite Baidu search index; natural cubic spline functions are used for the space of each factor, with spline knots placed at the 25%, 50%, and 75% quantiles on the logarithmic scale and the initial degrees of freedom (df) set to 3; a lag period of 21 days is used when building the cross-basis matrix for each factor, with the initial df for the lag dimension set to 3; the confounders in the model include a date factor, a day-of-week factor, and a seasonal factor, where ns is the natural cubic spline function used to control the long-term trend of the time variable, and the initial df for the time factor is set to 7 per year.
7. The hysteresis analysis device for the onset risk of viral diarrhea in children according to claim 6, characterized in that in the model building module, the link function in the model is a quasi-Poisson function, in order to control for overdispersion.
8. The hysteresis analysis device for the onset risk of viral diarrhea in children according to claim 7, characterized in that the model building module performs sensitivity analysis on the model by changing the df values of the factor cross-basis matrices and of the date term and by adding or removing the seasonal factor, and evaluates the model according to the Akaike information criterion (AIC) to determine the final df values.
9. The hysteresis analysis device for the onset risk of viral diarrhea in children according to claim 8, characterized in that in subgroup analysis, children under 5 years of age are grouped by sex and age, and the same model is applied to each group.
10. The hysteresis analysis device for the onset risk of viral diarrhea in children according to claim 9, characterized in that the data collection and selection module collects viral diarrhea case data for Jilin Province from 2014 to 2019 from the Chinese Center for Disease Control and Prevention, each case record including the patient's sex, age, date of onset, and type of pathogenic virus; the meteorological data set is provided by the China Meteorological Data Sharing Service System and includes evaporation, precipitation, sunshine duration, three sets of surface temperature data, three sets of air pressure data, two sets of relative humidity data, three sets of air temperature data, and three sets of wind speed data, and each of these meteorological factors, as monitored at 30 monitoring points in Jilin Province, is arithmetically averaged by day to obtain time series data for 17 meteorological factors in Jilin Province; the air quality data are obtained from the China Air Quality Online Monitoring and Analysis Platform as daily time series of 7 air quality factors (AQI and the PM2.5, PM10, CO, NO2, SO2, and O3 concentrations) for 9 municipal-level administrative units in Jilin Province, and each air quality factor is arithmetically averaged by day across the 9 monitoring areas to obtain time series data for 7 air quality factors in Jilin Province; the Institute of Viral Diseases of the Chinese Center for Disease Control and Prevention provides up to 20 keywords covering symptoms, pathogenic factors, and prevention and treatment products related to viral diarrhea, and the Baidu search index time series of all these keywords for Jilin Province over the corresponding period are selected.
Application CN202110218435.3A; priority date 2021-02-23; filing date 2021-02-23; title: Hysteresis analysis method and device for child viral diarrhea onset risk; status: Active; publication: CN112951442B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110218435.3A, CN112951442B (en) | 2021-02-23 | 2021-02-23 | Hysteresis analysis method and device for child viral diarrhea onset risk

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110218435.3A, CN112951442B (en) | 2021-02-23 | 2021-02-23 | Hysteresis analysis method and device for child viral diarrhea onset risk

Publications (2)

Publication Number | Publication Date
CN112951442A (en) | 2021-06-11
CN112951442B (en) | 2022-09-23

Family

ID=76246474

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202110218435.3A | Active | CN112951442B (en) | 2021-02-23 | 2021-02-23 | Hysteresis analysis method and device for child viral diarrhea onset risk

Country Status (1)

Country | Link
CN (1) | CN112951442B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114519308A (en)* | 2022-02-22 | 2022-05-20 | Henan University | Method for determining river water and underground water interconversion lag response time influenced by river water and sand regulation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103793621A (en)* | 2014-03-06 | 2014-05-14 | Shanghai Pudong New Area Center for Disease Control and Prevention | Comprehensive dysentery monitoring platform
CN104008164A (en)* | 2014-05-29 | 2014-08-27 | East China Normal University | Generalized regression neural network based short-term diarrhea multi-step prediction method
CN111415752A (en)* | 2020-03-01 | 2020-07-14 | Jimei University | A prediction method of hand, foot and mouth disease integrating meteorological factors and search index
CN111430040A (en)* | 2020-03-03 | 2020-07-17 | Guangdong Provincial Institute of Public Health | Hand-foot-and-mouth disease epidemic situation prediction method based on case, weather and pathogen monitoring data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CA2901275C (en)* | 2013-02-15 | 2023-10-17 | Battelle Memorial Institute | Use of web-based symptom checker data to predict incidence of a disease or disorder

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
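Analysis of surveillance results for viral diarrhea in children in Jilin City, 2007-2008; Zhao Yong et al.; Chinese Journal of Laboratory Diagnosis; 2011-11-25 (No. 11); 82-85*
Analysis of the epidemiological and etiological characteristics of 997 cases of other infectious diarrheal diseases; Guo Xuehong; China Health Standard Management; 2020-04-15 (No. 07); 26-28*
Application of principal component regression analysis to the relationship between bacillary dysentery and meteorological factors; Liao Hongxiu et al.; Modern Preventive Medicine; 2009-03-10 (No. 05); 813-815*
Influence of meteorological factors on other infectious diarrheal diseases; Tao Yan et al.; Journal of Lanzhou University (Natural Sciences); 2015-10-15 (No. 05); 646-651*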

Also Published As

Publication number | Publication date
CN112951442A (en) | 2021-06-11

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
