CN115587364B

Info

Publication number: CN115587364B
Application number: CN202211232579.5A
Authority: CN
Inventors: 潘祖烈; 刘翎翔; 沈毅; 于璐; 陈远超
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2022-10-10
Filing date: 2022-10-10
Publication date: 2023-07-14
Anticipated expiration: 2042-10-10
Also published as: CN115587364A

Abstract

The invention discloses a method and device for locating firmware vulnerability input points based on front-end and back-end correlation analysis. The method comprises the following steps: performing text analysis on the front-end script files of the firmware to extract front-end data information, which includes API interface information; performing reverse analysis on the back-end binary files of the firmware to extract back-end data information, which includes constant strings and the function call information of those constant strings; performing association analysis on the API interface information and the constant strings to obtain keywords shared by the front end and back end; and identifying I/O interaction functions among the functions that call the shared keywords, then taking the API interface corresponding to each shared keyword called by an I/O interaction function as a firmware vulnerability input point. Compared with SaTC, the method can effectively reduce false positives caused by manually defined factors, improve vulnerability detection efficiency, and reduce the running cost of vulnerability analysis tools.

Description

Translated from Chinese

Firmware vulnerability input point locating method and device based on front-end and back-end correlation analysis

Technical field

The invention belongs to the field of network security, and in particular relates to a method and device for locating firmware vulnerability input points based on front-end and back-end correlation analysis, used to locate sensitive input points for binary vulnerabilities in the firmware of Internet of Things devices.

Background

With the arrival of the Internet of Things (IoT) era, IoT devices such as routers, network cameras, and smart cars have become part of everyday life. The IoT industry continues to expand: the number of IoT devices worldwide is growing at about 20% per year and is expected to reach 41.6 billion by 2025. As the number of devices grows, however, security standards for IoT devices have not kept pace, and manufacturers' weak security awareness compounds the problem, leaving the IoT with serious security risks.

According to the statistical report of the CNVD vulnerability platform operated by CNCERT, 3,047 general-purpose IoT device vulnerabilities were recorded in 2020 (a year-on-year increase of 28%), along with 2,141 event-type vulnerabilities. These include buffer overflow and command injection vulnerabilities caused by device programs improperly handling data transmitted from the client. Once exploited by a malicious user, such vulnerabilities can damage the server or even allow remote control.

IoT device programs differ from common PC applications: their user-facing interfaces are usually deployed at the periphery (for example, a Web management interface). Users send requests through these interfaces to the back-end Web server, which dispatches each request to a specific application after receiving it. How many back-end applications process user input, and how those applications interact with one another, is therefore a complete black box to a tester at the front end.

The binary static analysis tool karonte narrows the set of binaries to analyze: it finds the binaries that process data from the Web management interface and uses them as targets for subsequent taint analysis. In identifying these border binaries, however, it ignores the front end, which leads to too many false positives. SaTC builds on karonte by taking the front end into account. It establishes a shared keyword matching model over the front-end script files (html, xml, js) and the back-end binaries: one side of the match is the API parameters in the front-end script files, the other is the visible strings in the back-end binaries, and binaries with many matches become the targets of subsequent taint analysis. Its extraction of front-end and back-end data is nevertheless coarse. Front-end extraction does not consider the tags in the front-end script files, and back-end extraction does not consider whether a string can actually be referenced by the program, so the data to be analyzed is both large and imprecise. Moreover, when locating sensitive input points, SaTC does not consider whether a function that takes a shared keyword as a parameter is an I/O interaction function, so its analysis produces many false positives.

Summary of the invention

The purpose of the present invention is to provide a method and device for locating firmware vulnerability input points based on front-end and back-end correlation analysis, so as to effectively narrow the scope of traditional binary vulnerability analysis of IoT device firmware and reduce false positives.

One aspect of the present invention discloses a method for locating firmware vulnerability input points, including:

performing text analysis on the front-end script files of the firmware to extract front-end data information, the front-end data information including API interface information;

performing reverse analysis on the back-end binary files of the firmware to extract back-end data information, the back-end data information including constant strings and the function call information of the constant strings;

performing association analysis on the API interface information and the constant strings to obtain keywords shared by the front end and back end;

identifying I/O interaction functions among the functions that call the shared keywords, and taking the API interface corresponding to each shared keyword called by an I/O interaction function as a firmware vulnerability input point.

In some examples, performing reverse analysis on the back-end binary files of the firmware to extract back-end data information includes: disassembling the binary program with a reverse-engineering tool, and extracting the constant strings and the information of the functions that take these constant strings as parameters.

In some examples, performing association analysis on the API interface information and the constant strings further includes: generating a triple (url, [binary, func, addr], keyword), which indicates that the API interface corresponding to the shared keyword keyword can be reached through the access address url, and that this API interface is recognized at the reference address addr in the binary file binary and is passed as a parameter to the function func.

In some examples, identifying I/O interaction functions among the functions that call the shared keywords includes: identifying I/O interaction functions according to the number of times a calling function calls the shared keywords.

In some examples, identifying I/O interaction functions among the functions that call the shared keywords includes: identifying a calling function whose name contains the characters get or find as an I/O interaction function.

In some examples, performing text analysis on the front-end script files of the firmware to extract front-end data information includes: extracting form tags from html script files using regular expressions.

In some examples, performing text analysis on the front-end script files of the firmware to extract front-end data information includes: parsing xml script files into a tree data structure, and extracting the information in the leaf nodes through depth-first traversal as the target information of xml file data extraction.

In some examples, performing reverse analysis on the back-end binary files of the firmware to extract back-end data information includes applying two layers of filtering to the extracted strings: performing cross-reference analysis on the string addresses to keep the strings that are cross-referenced from the .text section, and removing strings that do not conform to Web front-end API naming conventions.

In some examples, the method further includes: according to the function call information, filtering out strings for which the functions referencing the string pointer are unrelated to I/O input.

Another aspect of the present invention discloses a device for locating firmware vulnerability input points, including:

a front-end data information extraction module, configured to perform text analysis on the front-end script files of the firmware to extract front-end data information, the front-end data information including API interface information;

a back-end data information extraction module, configured to perform reverse analysis on the back-end binary files of the firmware to extract back-end data information, the back-end data information including constant strings and the function call information of the constant strings;

an association analysis module, configured to perform association analysis on the API interface information and the constant strings to obtain keywords shared by the front end and back end;

an identification module, configured to identify I/O interaction functions among the functions that call the shared keywords, and to take the API interface corresponding to each shared keyword called by an I/O interaction function as a firmware vulnerability input point.

Compared with SaTC, the present invention does not require input-sensitive functions to be defined manually. The I/O interaction functions dedicated to reading Web front-end input are obtained from the results of the correlation analysis, which effectively reduces the false positives caused by manually defined factors.

In addition, the present invention does not need to analyze the program execution flow that precedes a sensitive input point. It only needs to find the Web front-end API interface corresponding to that sensitive input point; sending a request related to that API is enough to reach the sensitive input point, which improves the efficiency of vulnerability detection.

Finally, the present invention reduces the granularity of subsequent vulnerability analysis from whole binary programs to code fragments, and can therefore reduce the running overhead of vulnerability analysis tools.

Description of drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic workflow diagram of a method for locating firmware vulnerability input points based on front-end and back-end correlation analysis according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of keywords shared by the front end and back end;

Fig. 3 is a schematic diagram of triple generation;

Fig. 4 is a schematic diagram of the composition of a device for locating firmware vulnerability input points based on front-end and back-end correlation analysis according to an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

At present, vulnerability mining for IoT device firmware focuses mainly on the binary level. IoT firmware contains hundreds or even thousands of binary programs, and there is no sound basis for judging which of them should be the targets of vulnerability mining. Considering that the user input interfaces of IoT firmware binaries are usually deployed in a Web interface, the present invention proposes a method for locating firmware vulnerability input points based on front-end and back-end correlation analysis. Fig. 1 is a schematic flowchart of the method according to an embodiment of the present invention. As shown in the figure, the method includes:

Step 101: perform text analysis on the front-end script files of the firmware to extract front-end data information, the front-end data information including API interface information;

By language type, the front-end script files of IoT device firmware mainly comprise html, xml, and js script files. Each script file is parsed according to its syntax in order to extract the relevant data.

(1) html file data extraction

SaTC extracts information from html mainly by using regular expressions to match the id and name values that appear in the text. Under html syntax, these values are not directly related to user input, so the extracted data is fairly redundant. This method therefore uses regular expressions together with the form tags of html syntax, extracting form input tags (<input>), drop-down list tags (<select>), and the like. Unlike the values SaTC extracts, the values of these tags reflect user-input attributes.
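As an illustration of the form-tag approach, the following minimal Python sketch collects the name attributes of <input> and <select> tags with regular expressions. The patterns, the function name extract_form_names, and the sample page are assumptions made for this example; the patent does not give its actual expressions.

```python
import re

# Matches opening <input> or <select> tags, which carry user-supplied values.
FORM_TAG = re.compile(r"<(input|select)\b([^>]*)>", re.IGNORECASE)
# Pulls the name attribute out of a tag's attribute string.
NAME_ATTR = re.compile(r"""\bname\s*=\s*["']?([\w.\-]+)""", re.IGNORECASE)

def extract_form_names(html: str) -> list[str]:
    """Return the name attributes of form controls, in document order, without duplicates."""
    names = []
    for _tag, attrs in FORM_TAG.findall(html):
        match = NAME_ATTR.search(attrs)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names

# Hypothetical page: the <div> has a name attribute but is not a form control.
sample = '''
<div id="banner" name="layout"></div>
<form action="/goform/setLang" method="post">
  <input type="text" name="langType">
  <select name="timeZone"><option>UTC</option></select>
</form>
'''
print(extract_form_names(sample))  # ['langType', 'timeZone']
```

Restricting the match to form controls is what drops the "layout" value, which a plain id/name match would have kept.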

(2) xml file data extraction

In SaTC, xml file data is also extracted with regular expressions. The present invention instead parses the file into a tree data structure and extracts the information in the leaf nodes through depth-first traversal as the target information of xml file data extraction. Compared with SaTC, this approach does not depend on extensive manual, experience-based analysis.
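The tree-based extraction can be sketched with Python's standard xml.etree.ElementTree module. The function name leaf_values and the sample document are assumptions for illustration, and the text does not say whether the tag, the text, or both are kept from each leaf, so this sketch returns both.

```python
import xml.etree.ElementTree as ET

def leaf_values(xml_text: str) -> list[tuple[str, str]]:
    """Depth-first traversal returning (tag, text) for every leaf element."""
    root = ET.fromstring(xml_text)
    leaves = []
    stack = [root]
    while stack:
        node = stack.pop()
        children = list(node)
        if not children:
            leaves.append((node.tag, (node.text or "").strip()))
        else:
            # Push children in reverse so they are visited in document order.
            stack.extend(reversed(children))
    return leaves

# Hypothetical request template of the kind found in firmware web roots.
sample = "<SetWanSettings><Type>DHCP</Type><DNS><Primary>8.8.8.8</Primary></DNS></SetWanSettings>"
print(leaf_values(sample))  # [('Type', 'DHCP'), ('Primary', '8.8.8.8')]
```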

(3) js file data extraction

The present invention uses the same method as SaTC here, namely parsing the text into an abstract syntax tree. By traversing each Literal node, the interface functions that js commonly uses to send API requests are found, and their argument values are extracted as the js file data.

Step 102: perform reverse analysis on the back-end binary files of the firmware to extract back-end data information, the back-end data information including constant strings and the function call information of the constant strings;

In IoT devices, the back-end files consist mainly of cgi binary programs, which are essentially ELF files. The present invention therefore uses the IDA reverse-engineering tool to preload the ELF files, and then develops a plug-in on top of the interface that the tool exposes to users in order to extract the back-end data information. The back-end data information extracted consists mainly of constant strings and the function call information of those strings.

(1) Constant strings

In an ELF file, the rodata section stores constant strings, such as global variables defined as const in the source program. In firmware binaries, strings related to network connections (such as HTTP, POST, CONTENT) are mostly stored in this section. Starting at the start address of the rodata section, the characters stored at each address are read in turn; reading "\x00" marks the end of a string, and the next address is the start of a new string. Extracting strings this way yields each string's value and its address.
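The scan described above can be sketched in plain Python over a byte blob standing in for the rodata section. The function name extract_strings, the base address, the minimum length, and the printable-ASCII check are assumptions added for this example; in the patent the extraction runs inside an IDA plug-in.

```python
def extract_strings(rodata: bytes, base_addr: int, min_len: int = 2) -> list[tuple[int, str]]:
    """Split a rodata byte blob on NUL terminators and return (address, string) pairs."""
    results = []
    start = 0
    for offset, byte in enumerate(rodata):
        if byte != 0:
            continue
        chunk = rodata[start:offset]
        # Assumption: keep only printable ASCII runs of a useful length.
        if len(chunk) >= min_len and all(0x20 <= b < 0x7F for b in chunk):
            results.append((base_addr + start, chunk.decode("ascii")))
        start = offset + 1  # the next string begins right after the terminator
    return results

# Hypothetical section contents loaded at a hypothetical address.
blob = b"langType\x00POST\x00\x01\x02\x00HTTP/1.1\x00"
for addr, text in extract_strings(blob, base_addr=0x4A0000):
    print(hex(addr), text)
```

Each result keeps the address alongside the value because the later cross-reference analysis is keyed on the string's address.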

To ensure that these strings are valid, they then pass through two layers of filtering. First, cross-reference analysis is performed on the string addresses, keeping only the strings that are cross-referenced from the .text section (which satisfies the data-processing requirement). Second, strings whose names contain characters such as '%', '#', or '$' (which do not conform to Web front-end API naming conventions) are removed.
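A minimal sketch of the two filtering layers follows. It assumes the cross-references have already been exported from the disassembler into a dictionary; that dictionary, the function name filter_strings, and all addresses are hypothetical. The forbidden set contains only the three characters the text names, although the original list is open-ended.

```python
FORBIDDEN = set("%#$")  # characters named in the text; the original list is open-ended

def filter_strings(strings, xrefs, text_start, text_end):
    """Keep strings referenced from .text whose names contain no forbidden character.

    strings: iterable of (address, value) pairs
    xrefs:   dict mapping a string address to the addresses that reference it
    """
    kept = []
    for addr, value in strings:
        # Layer 1: at least one reference must come from inside the .text range.
        referenced_from_text = any(text_start <= ref < text_end for ref in xrefs.get(addr, ()))
        # Layer 2: the name must not contain characters invalid in front-end API names.
        if referenced_from_text and not (FORBIDDEN & set(value)):
            kept.append((addr, value))
    return kept

# Hypothetical data: one usable keyword, one format string, one unreferenced string.
strings = [(0x4A0000, "langType"), (0x4A0009, "%s: %d"), (0x4A0010, "unused")]
xrefs = {0x4A0000: [0x401234], 0x4A0009: [0x401300]}
print(filter_strings(strings, xrefs, text_start=0x400000, text_end=0x480000))
```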

(2) Function call information

Using the .text cross-reference addresses obtained during constant string extraction, the basic block containing each address is retrieved, and the functions that take a pointer to the string as a parameter are identified from that basic block.

The string set is then reduced again according to the calling functions: if the only functions that reference a string pointer are ones unrelated to I/O input, such as printf or strcmp, the string is filtered out.
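This second reduction can be sketched as a set difference. The function name drop_non_io_strings and the sample data are assumptions, and NON_IO_FUNCS holds only the two examples the text names; a practical list would presumably be longer.

```python
NON_IO_FUNCS = {"printf", "strcmp"}  # examples from the text; not an exhaustive list

def drop_non_io_strings(callers_by_string: dict[str, set[str]]) -> dict[str, set[str]]:
    """Discard strings whose every referencing function is in NON_IO_FUNCS."""
    return {s: funcs for s, funcs in callers_by_string.items() if funcs - NON_IO_FUNCS}

# Hypothetical call information: string value -> names of functions that receive it.
callers = {
    "langType": {"WebsGetVar", "strcmp"},  # kept: WebsGetVar is not on the list
    "debug": {"printf"},                   # dropped: only referenced by printf
}
print(drop_non_io_strings(callers))
```

A string with an empty caller set is also dropped here, which matches the intent that a string must reach some function that could be reading input.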

In IoT devices, the front-end pages consist of script files such as html, xml, and js, while the back end consists of ELF binary programs. The front-end script files mainly display the Web pages and pass user requests to the back end; the back-end binaries mainly handle data exchange and the storage and retrieval of site data. Because the front end and back end are written in two different kinds of language (the front end in interpreted languages, the back end in compiled languages), and the back end is an already compiled program without source code or similar information, the present invention uses a different data extraction method for each. For the front-end script files, the text is parsed according to its language format to extract the parameter names of the get and post requests that a user can submit on the page. For the back-end binaries, an existing reverse-engineering tool disassembles the program, and the constant strings and the functions that take those strings as parameters are extracted. Compared with the data SaTC extracts from the front end, the present invention considers the tag names related to user input. Compared with the data SaTC extracts from the back end, the present invention considers only constant strings that are referenced from the executable section of the binary, so the resulting back-end data is more precise and complete.

Step 103: perform association analysis on the API interface information and the constant strings to obtain keywords shared by the front end and back end;

The front-end API strings obtained through text analysis are associated (matched by identical name) with the string constants obtained from the back-end binaries through reverse engineering, yielding the shared keywords, denoted keyword.

As shown in Fig. 2, the front end has an API input interface named langType. The user enters data through this interface, an http packet is sent to the back-end binary program, and there the WebsGetVar function identifies the interface and reads what the user entered.

Analysis of this process shows that the back-end binary also reads user input by recognizing the API interface name. A front-end/back-end correlation model can therefore be built around the identically named strings extracted from both sides. As shown in Fig. 3, the request parameter names extracted by text parsing are compared with the constant strings extracted by reverse engineering to generate a triple (url, [binary, func, addr], keyword). The triple indicates that the API interface corresponding to the shared keyword keyword can be reached through the access address url, and that this API interface is recognized at the reference address addr in the binary file binary and is passed as a parameter to the function func.
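The same-name matching and triple generation can be sketched as a dictionary intersection. The BackendRef type, the associate function, and every URL, binary name, and address below are hypothetical values chosen for illustration; only langType and WebsGetVar come from the text.

```python
from typing import NamedTuple

class BackendRef(NamedTuple):
    binary: str  # binary file containing the reference
    func: str    # function that receives the string as a parameter
    addr: int    # address of the reference

def associate(frontend: dict[str, str], backend: dict[str, list[BackendRef]]):
    """Build (url, [binary, func, addr], keyword) triples by exact name matching.

    frontend: keyword -> url of the page or endpoint that exposes it
    backend:  keyword -> references found in the back-end binaries
    """
    triples = []
    for keyword in sorted(frontend.keys() & backend.keys()):
        for ref in backend[keyword]:
            triples.append((frontend[keyword], [ref.binary, ref.func, ref.addr], keyword))
    return triples

frontend = {"langType": "/goform/setLang", "bannerId": "/index.html"}
backend = {
    "langType": [BackendRef("httpd", "WebsGetVar", 0x41F3A0)],
    "POST": [BackendRef("httpd", "strcmp", 0x40B110)],
}
print(associate(frontend, backend))
```

Only langType appears on both sides, so a single triple is produced; bannerId and POST have no counterpart and are discarded.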

Step 104: identify I/O interaction functions among the functions that call the shared keywords, and take the API interface corresponding to each shared keyword called by an I/O interaction function as a firmware vulnerability input point.

Not every triple obtained in the preceding steps can serve as a basis for identifying the sensitive input points of a binary program. A triple can be used in later stages only when its function is an I/O interaction function, so I/O interaction functions are identified in order to improve the accuracy of subsequent analysis.

Specifically, the present invention establishes the following rules to identify the I/O interaction functions that a binary program uses to read data sent from the Web front end:

(1) such functions are called on shared keywords more often than other kinds of functions;

(2) such functions include both commonly used library functions such as getenv and nvram_get and vendor-specific functions such as Cisco's get_cgi, TP-Link's httpGetEnv, and NETGEAR's find_var. Taken together, a function whose name contains the characters get or find can be regarded as an I/O interaction function.

An example algorithm for identifying I/O interaction functions is as follows:
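A minimal Python sketch of the two rules above, offered as an illustration and not as the patent's own listing. It combines the rules with a logical "or", as the claims do. The top_n threshold for rule (1), the function names identify_io_functions and input_points, and all sample triples are assumptions.

```python
from collections import Counter

def identify_io_functions(triples, top_n=1):
    """Return the names of functions judged to be I/O interaction functions."""
    counts = Counter(ref[1] for _url, ref, _keyword in triples)
    # Rule (1): functions that receive shared keywords most often.
    frequent = {name for name, _count in counts.most_common(top_n)}
    # Rule (2): function names containing "get" or "find".
    named = {name for name in counts if "get" in name.lower() or "find" in name.lower()}
    return frequent | named

def input_points(triples, io_funcs):
    """Keep only the triples whose function is an I/O interaction function."""
    return [t for t in triples if t[1][1] in io_funcs]

# Hypothetical triples: an unnamed function used three times, find_var once, log_msg once.
triples = [
    ("/goform/setLang", ["httpd", "sub_41B2F0", 0x401000], "langType"),
    ("/goform/setWan", ["httpd", "sub_41B2F0", 0x401040], "wanType"),
    ("/goform/setWan", ["httpd", "sub_41B2F0", 0x401080], "wanIp"),
    ("/goform/setDns", ["httpd", "find_var", 0x4010C0], "dns1"),
    ("/goform/setDns", ["httpd", "log_msg", 0x401100], "dns1"),
]
io_funcs = identify_io_functions(triples)
print(sorted(io_funcs))
print(len(input_points(triples, io_funcs)))
```

In this sample, sub_41B2F0 is selected by frequency alone and find_var by name alone, while the log_msg triple is discarded, so four of the five triples remain as candidate input points.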

Fig. 4 is a schematic diagram of the composition of a device 400 for locating firmware vulnerability input points based on front-end and back-end correlation analysis according to an embodiment of the present invention. As shown in the figure, the device includes:

a front-end data information extraction module 401, configured to perform text analysis on the front-end script files of the firmware to extract front-end data information, the front-end data information including API interface information;

a back-end data information extraction module 402, configured to perform reverse analysis on the back-end binary files of the firmware to extract back-end data information, the back-end data information including constant strings and the function call information of the constant strings;

an association analysis module 403, configured to perform association analysis on the API interface information and the constant strings to obtain keywords shared by the front end and back end;

an identification module 404, configured to identify I/O interaction functions among the functions that call the shared keywords, and to take the API interface corresponding to each shared keyword called by an I/O interaction function as a firmware vulnerability input point.

The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, and that some or all of their technical features can be replaced with equivalents, without such modifications or replacements causing the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for locating firmware vulnerability input points, characterized by comprising:

performing association analysis on the API interface information and the constant strings to obtain keywords shared by the front end and back end; wherein performing association analysis on the API interface information and the constant strings comprises: generating a triple (url, [binary, func, addr], keyword), the triple indicating that the API interface corresponding to the shared keyword keyword can be reached through the access address url, and that this API interface is recognized at the reference address addr in the binary file binary and is passed as a parameter to the function func;

identifying I/O interaction functions among the functions that call the shared keywords, and taking the API interface corresponding to each shared keyword called by an I/O interaction function as a firmware vulnerability input point; wherein identifying I/O interaction functions among the functions that call the shared keywords comprises: identifying a calling function as an I/O interaction function when it calls the shared keywords more often than other kinds of functions, or identifying a calling function whose name contains the characters get or find as an I/O interaction function.

2. The method for locating firmware vulnerability input points according to claim 1, characterized in that performing reverse analysis on the back-end binary files of the firmware to extract back-end data information comprises: disassembling the binary program with a reverse-engineering tool, and extracting the constant strings and the information of the functions that take these constant strings as parameters.

3. The method for locating firmware vulnerability input points according to claim 1, characterized in that performing text analysis on the front-end script files of the firmware to extract front-end data information comprises: extracting form tags from html script files using regular expressions.

4. The method for locating firmware vulnerability input points according to claim 1, characterized in that performing text analysis on the front-end script files of the firmware to extract front-end data information comprises: parsing xml script files into a tree data structure, and extracting the information in the leaf nodes through depth-first traversal as the target information of xml file data extraction.

5. The method for locating firmware vulnerability input points according to claim 1, characterized in that performing reverse analysis on the back-end binary files of the firmware to extract back-end data information comprises applying two layers of filtering to the extracted strings: performing cross-reference analysis on the string addresses to keep the strings that are cross-referenced from the .text section, and removing strings that do not conform to Web front-end API naming conventions.

6. The method for locating firmware vulnerability input points according to claim 5, characterized by further comprising: according to the function call information, filtering out strings for which the functions referencing the string pointer are unrelated to I/O input.

7. A device for locating firmware vulnerability input points, characterized by comprising:

an association analysis module, configured to perform association analysis on the API interface information and the constant strings to obtain keywords shared by the front end and back end; wherein the association analysis performed by the association analysis module on the API interface information and the constant strings comprises: generating a triple (url, [binary, func, addr], keyword), the triple indicating that the API interface corresponding to the shared keyword keyword can be reached through the access address url, and that this API interface is recognized at the reference address addr in the binary file binary and is passed as a parameter to the function func;

an identification module, configured to identify I/O interaction functions among the functions that call the shared keywords, and to take the API interface corresponding to each shared keyword called by an I/O interaction function as a firmware vulnerability input point; wherein identifying I/O interaction functions among the functions that call the shared keywords comprises: identifying a calling function as an I/O interaction function when it calls the shared keywords more often than other kinds of functions, or identifying a calling function whose name contains the characters get or find as an I/O interaction function.