Technical Field
The present invention relates to the field of network data security, and in particular to a system and method for preventing web page tracking.
Background
At present, many websites use web tracking technology. Online advertising platforms (such as Taobao Alliance) and large social networking sites record users' browsing data based on their online behavior, mine user preferences, and deliver targeted advertisements or customized services. However, users' private data is also recorded by third-party platforms without their knowledge, which creates a risk that user information will be misused.
This is a particular concern for organizations. If the online behavior of an organization (such as an enterprise, a government body, or a military institution) is tracked and recorded by a third party, that data may reveal what the organization is working on, and may even expose trade secrets, military secrets, or state secrets. In many cases, therefore, users do not want to be tracked.
Some web browsers currently on the market provide a Do Not Track (DNT) function: once the user enables it, a flag is added to the HTTP request header to tell the website or advertiser that the user declines to be tracked. Whether tracking actually stops, however, is still up to the website or advertiser. To analyze user data and deliver targeted advertisements, websites and advertisers sometimes choose to ignore the flag, and so can continue to track the user.
Summary of the Invention
The technical problem to be solved by the present invention is to address the shortcomings of the related art by providing a system and method for preventing web page tracking that can effectively prevent users' browsing data from being tracked.
The technical solution adopted by the present invention to solve the above technical problem is to provide a system for preventing web page tracking, comprising: a setting module for setting identification rules and interception rules; an identification module for detecting HTTP requests and identifying, according to the set identification rules, whether an HTTP request contains web page tracking information; and an interception module for intercepting, according to the interception rules, each HTTP request identified as containing web page tracking information, so as to prevent the web page tracking information from being sent to a tracking server.
A method for preventing web page tracking is also provided, comprising:
setting identification rules and interception rules;
detecting HTTP requests;
identifying, according to the set identification rules, HTTP requests that contain web page tracking information; and
intercepting, according to the interception rules, the HTTP requests identified as containing web page tracking information, so as to prevent the web page tracking information from being sent to a tracking server.
The beneficial effect of the present invention is as follows: preset identification rules are used to determine whether an HTTP request contains web page tracking information, and if it does, the HTTP request is intercepted. This prevents the web page tracking information from being transmitted to the tracking server and thus effectively prevents user data from being tracked.
Brief Description of the Drawings
The present invention is further described below with reference to the accompanying drawings and embodiments. In the drawings:
FIG. 1 is a schematic diagram of the modules of a system for preventing web page tracking, and of the operating environment of the system, according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for preventing web page tracking according to an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
FIG. 1 is a schematic diagram of the modules of a system 1 for preventing web page tracking, and of the operating environment of the system 1, according to an embodiment of the present invention. The system 1 includes a setting module 11, an identification module 12, and an interception module 13. The system 1 may run independently on an electronic device 2, or may be attached to a network system such as a firewall. The electronic device 2 may be connected to at least one tracking server 3 in a wired or wireless manner.
The setting module 11 sets at least one identification rule and one interception rule. The identification rule is used to identify whether a Hypertext Transfer Protocol (HTTP) request contains web page tracking information. The identification rule includes at least one keyword.
In this embodiment, the keyword is a specific character string in the Uniform Resource Locator (URL) of an HTTP request, and that string indicates that the HTTP request carries web page tracking information. Current web tracking tools, such as Google Analytics and Baidu Statistics, work by adding a piece of collection code to a website. When a user visits the website, the collection code executes in the browser and dynamically inserts a link to a script file stored on the tracking server 3. The browser requests the script file from the tracking server 3 through this link, and the request also carries a tracking cookie that records a user identifier (ID). The requested script file then executes in the browser and collects user information, such as the title of the page currently being viewed, the URL of the previous page, and the cookies of the current page. The browser encodes the collected information into HTTP parameters of a URL that points to a transparent image on the tracking server, and then generates an HTTP request that includes this URL and the tracking cookie. The tracking information in the URL, together with the data in the tracking cookie, constitutes the web page tracking information contained in the HTTP request. Transmitting the HTTP request delivers this information to the tracking server 3, which is how user data is collected.
Such a URL contains a marker indicating that it carries web page tracking information. For example, in the URL "http://xxx.com/a.gif?a=x&b=x", the prefix "http://xxx.com/a.gif" is the keyword that identifies the URL as carrying web page tracking information.
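The beacon mechanism described above can be illustrated with a minimal Python sketch. It is not taken from any real tracking tool: the function names `build_beacon_url` and `has_tracking_prefix` are hypothetical, the parameter names `a` and `b` simply mirror the example URL, and the prefix check assumes that the keyword is the URL with its query string removed.

```python
from urllib.parse import urlencode, urlsplit

# Beacon endpoint taken from the example URL in the text; it acts as the keyword.
BEACON_PREFIX = "http://xxx.com/a.gif"


def build_beacon_url(page_title, previous_url):
    """Encode collected page data as query parameters of the beacon URL.

    This imitates what a tracking script does in the browser (hypothetical
    parameter names "a" and "b").
    """
    params = {"a": page_title, "b": previous_url}
    return BEACON_PREFIX + "?" + urlencode(params)


def has_tracking_prefix(url, prefix=BEACON_PREFIX):
    """Return True when the URL, without its query string, equals the prefix."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}" == prefix


url = build_beacon_url("Home page", "http://example.org/prev")
print(url)
print(has_tracking_prefix(url))
```

The collected values vary from request to request, but the part of the URL before the query string stays fixed, which is why it can serve as a keyword.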
Each tracking server uses a different string in the URL to mark the presence of web page tracking information. The setting module 11 can therefore collect in advance the marker strings used by the various tracking tools and store them as keywords.
In other embodiments, the keyword may instead be a specific character string in the cookie information, header information, or body of the HTTP request, where that string indicates that the HTTP request contains web page tracking information.
The identification module 12 detects HTTP requests and identifies, according to the set identification rules, whether an HTTP request contains web page tracking information. The identification module 12 treats any HTTP request that matches a keyword as an HTTP request containing web page tracking information. The match may be an exact match of the keyword or a regular-expression match against the keyword. In this embodiment, the identification module 12 determines whether an HTTP request matches by checking whether its URL includes a set keyword. In other embodiments, the identification module 12 may look for the keyword in other parts of the HTTP request, such as the cookie information, header information, or body.
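The matching behavior of the identification module can be sketched as follows. This is an illustrative sketch only: the classes `HttpRequest` and `RecognitionRule` and the function `contains_tracking_info` are hypothetical names, and "exact match" is interpreted here as a literal substring search, which is an assumption rather than something the specification states.

```python
import re
from dataclasses import dataclass, field


@dataclass
class HttpRequest:
    """Simplified view of an HTTP request (hypothetical structure)."""
    url: str
    headers: dict = field(default_factory=dict)
    cookie: str = ""
    body: str = ""


@dataclass
class RecognitionRule:
    """One keyword plus the request part it is checked against."""
    keyword: str
    part: str = "url"       # "url", "cookie", "header", or "body"
    is_regex: bool = False  # False: literal match, True: regular expression

    def matches(self, request):
        if self.part == "url":
            text = request.url
        elif self.part == "cookie":
            text = request.cookie
        elif self.part == "header":
            text = "\n".join(f"{k}: {v}" for k, v in request.headers.items())
        else:
            text = request.body
        if self.is_regex:
            return re.search(self.keyword, text) is not None
        return self.keyword in text


def contains_tracking_info(request, rules):
    """A request counts as tracking traffic if any rule matches it."""
    return any(rule.matches(request) for rule in rules)


rules = [
    RecognitionRule("http://xxx.com/a.gif"),
    RecognitionRule(r"uid=\d+", part="cookie", is_regex=True),
]
print(contains_tracking_info(HttpRequest("http://xxx.com/a.gif?a=x&b=x"), rules))
```

Keeping the request part and the match mode on each rule lets one rule set cover both the URL-based embodiment and the cookie, header, and body variants.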
The interception module 13 intercepts, according to the interception rules, each HTTP request identified as containing web page tracking information, so as to prevent the web page tracking information from being sent to the tracking server. The ways in which the interception module 13 may handle such a request include: aborting the TCP connection; discarding the data packets of the HTTP request; scrubbing or replacing the web page tracking information contained in the HTTP request; or replying with an HTTP response code that indicates failure.
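The four handling options can be sketched as a small dispatch function. The names `Action` and `intercept` are hypothetical, the status code 403 is an assumed choice of failure code (the specification does not name one), and scrubbing is simplified here to dropping the query string and the Cookie header. Aborting the connection and discarding packets are modeled only by their shared outcome, namely that nothing is forwarded.

```python
from enum import Enum
from urllib.parse import urlsplit, urlunsplit


class Action(Enum):
    ABORT_CONNECTION = "abort"       # abort the TCP connection
    DROP_PACKETS = "drop"            # silently discard the request packets
    SCRUB = "scrub"                  # remove tracking data, then forward
    REPLY_FAILURE = "reply_failure"  # answer with a failing HTTP status


def intercept(url, headers, action):
    """Apply one interception action.

    Returns (forwarded_request, reply_status). forwarded_request is None
    when nothing is sent on to the tracking server.
    """
    if action in (Action.ABORT_CONNECTION, Action.DROP_PACKETS):
        return None, None
    if action is Action.REPLY_FAILURE:
        return None, 403  # assumed failure code
    # SCRUB: strip the query string and the Cookie header before forwarding.
    parts = urlsplit(url)
    clean_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    clean_headers = {k: v for k, v in headers.items() if k.lower() != "cookie"}
    return (clean_url, clean_headers), None


result = intercept(
    "http://xxx.com/a.gif?a=x&b=x",
    {"Host": "xxx.com", "Cookie": "uid=1"},
    Action.SCRUB,
)
print(result)
```

In every branch the tracking parameters and the tracking cookie never reach the tracking server, which is the property the interception rule is meant to guarantee.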
FIG. 2 is a flowchart of a method for preventing web page tracking according to an embodiment of the present invention.
In step S201, the setting module 11 sets the identification rules and the interception rules.
In step S202, the identification module 12 detects an HTTP request.
In step S203, the identification module 12 identifies whether the HTTP request contains web page tracking information. If it does, step S204 is executed; if it does not, the procedure ends.
In step S204, the interception module 13 intercepts, according to the interception rules, the HTTP request identified as containing web page tracking information, so as to prevent the web page tracking information from being sent to the tracking server 3.
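Steps S201 to S204 can be tied together in a compact sketch. The function names `set_rules` and `handle_request`, the dictionary layout of the rules, and the returned status strings are all hypothetical. The specification says only that the procedure ends when no tracking information is found, so the "passed" result is an assumption about what happens to an ordinary request.

```python
def set_rules():
    """S201: set the identification rules (keywords) and the interception rule."""
    return {"keywords": ["http://xxx.com/a.gif"], "action": "drop"}


def handle_request(url, rules):
    """S202 to S204 for one detected HTTP request, identified by its URL."""
    # S203: does the URL contain any configured keyword?
    if not any(keyword in url for keyword in rules["keywords"]):
        return "passed"  # no tracking information, so the procedure ends
    # S204: apply the configured interception rule.
    return "intercepted:" + rules["action"]


rules = set_rules()
for url in ("http://xxx.com/a.gif?a=x&b=x", "http://site.test/index.html"):
    print(url, "->", handle_request(url, rules))
```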
The above describes only preferred embodiments of the present invention. The scope of protection of the present invention is not limited to these embodiments, and all technical solutions that fall within the concept of the present invention belong to its scope of protection. It should be noted that improvements and refinements made by a person of ordinary skill in the art without departing from the principle of the present invention are also to be regarded as falling within the scope of protection of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310039947.9A (CN103118024B) | 2013-02-01 | 2013-02-01 | System and method for preventing web page tracking |
| Publication Number | Publication Date |
|---|---|
| CN103118024A | 2013-05-22 |
| CN103118024B | 2016-09-28 |
| Code | Title | Description |
|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C14 | Grant of patent or utility model | |
| GR01 | Patent grant | |
| TR01 | Transfer of patent right | Effective date of registration: 20200612. Address after: Nanshan District Xueyuan Road in Shenzhen city of Guangdong province 518000 No. 1001 Nanshan Chi Park building A1 layer. Patentee after: SANGFOR TECHNOLOGIES Inc. Address before: 518000 Nanshan Science and Technology Pioneering service center, No. 1 Qilin Road, Guangdong, Shenzhen 418, 419. Patentee before: Sangfor Network Technology (Shenzhen) Co., Ltd. |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160928 |