CN114943078B - File identification method and device - Google Patents

File identification method and device

Info

Publication number
CN114943078B
Authority
CN
China
Prior art keywords
file
identified
hash
virus
header
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210588323.1A
Other languages
Chinese (zh)
Other versions
CN114943078A (en)
Inventor
郭玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Security Technologies Co Ltd
Original Assignee
New H3C Security Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Security Technologies Co Ltd
Priority to CN202210588323.1A
Publication of CN114943078A
Application granted
Publication of CN114943078B
Active
Anticipated expiration

Abstract

The application provides a file identification method and device, and relates to the technical field of security. The method comprises the steps of carrying out file identification on a received file to be identified, extracting a file header from the file to be identified when the file to be identified is a file of a set type, carrying out hash calculation on target information in the file header to obtain a local hash value, matching the local hash value with a virus hash feature library, wherein the virus hash feature library comprises virus feature hash values, and confirming that the file to be identified is a virus file when matching is successful. By adopting the method, the accuracy of the virus detection result is improved under the condition that more memory is not required to be occupied when the file passing through the network equipment is subjected to virus detection.

Description

File identification method and device
Technical Field
The present application relates to the field of security technologies, and in particular, to a method and an apparatus for identifying a file.
Background
DPI (Deep Packet Inspection) deep security is a security mechanism that detects and controls the network traffic passing through a network device based on application-layer information. As network security threats grow increasingly complex, many malicious behaviors (e.g., worms, spam, vulnerability exploits, etc.) are hidden in the application-layer payload of data packets. Traditional security protection relies only on network-layer and transport-layer detection and cannot meet network security requirements. Therefore, a network device must have a DPI function to detect and control application-layer information, so as to ensure the security of the data content and improve the security of the network.
At present, antivirus detection mainly detects virus files transmitted in a network through pattern string matching and full-text hashing. Pattern string matching detects malicious files through static signature scanning, but it requires large-scale signature rules to be configured, which occupies more memory and places higher demands on the memory of the network device. Full-text hashing places strict requirements on file integrity: if packets arrive out of order during file transmission, the file hash is computed incorrectly, which affects the accuracy of the virus file detection result.
Therefore, how to improve the accuracy of the virus detection result without occupying more memory when performing virus identification on files passing through a network device is one of the technical problems to be considered.
Disclosure of Invention
In view of this, the present application provides a method and apparatus for identifying a file, which are used to improve the accuracy of a virus detection result without occupying more memory when detecting a virus for a file passing through a network device.
Specifically, the application is realized by the following technical scheme:
according to a first aspect of the present application, there is provided a file identification method comprising:
carrying out file identification on the received file to be identified;
when the file to be identified is a file of a set type, extracting a file header from the file to be identified;
carrying out hash calculation on the target information in the file header to obtain a local hash value;
Matching the local hash value with a virus hash feature library, wherein the virus hash feature library comprises feature hash values of viruses;
and when the matching is successful, confirming that the file to be identified is a virus file.
Optionally, the file to be identified comprises an executable file under a Windows operating system, wherein the file header comprises an image file header and an optional image header;
performing hash calculation on the target information in the file header to obtain a local hash value, including:
Carrying out hash calculation on the image file header to obtain an intermediate hash value;
Extracting, from the optional image header and according to the machine type code, target optional image header information matched with the machine type code;
And carrying out hash calculation according to the intermediate hash value and the target optional image header information to obtain the local hash value.
Optionally, the file identifying method provided in this embodiment further includes:
And when the matching is unsuccessful, carrying out pattern string feature matching on the file to be identified so as to identify whether the file to be identified is a virus file or not.
Optionally, the file identifying method provided in this embodiment further includes:
when the local hash value is not successfully matched with the virus hash feature library or the file to be identified is not identified as a virus file when pattern string feature matching is carried out on the file to be identified, carrying out full-text hash calculation on the file to be identified to obtain a hash result, and identifying whether the file to be identified is a virus file according to the hash result.
Optionally, extracting the header from the file to be identified includes:
And disassembling the file to be identified by using the file analysis plug-in corresponding to the set type so as to extract the file header in the file to be identified.
According to a second aspect of the present application, there is provided a document identifying apparatus comprising:
the identification module is used for carrying out file identification on the received file to be identified;
The extraction module is used for extracting a file header from the file to be identified when the file to be identified is a file of a set type;
The hash calculation module is used for carrying out hash calculation on the target information in the file header to obtain a local hash value;
The first matching module is used for matching the local hash value with a virus hash characteristic library, wherein the virus hash characteristic library comprises characteristic hash values of viruses;
And the confirming module is used for confirming that the file to be identified is a virus file when the matching result of the first matching module is that the matching is successful.
Optionally, the file to be identified comprises an executable file under a Windows operating system, wherein the file header comprises an image file header and an optional image header;
The hash calculation module is specifically configured to perform hash calculation on the image file header to obtain an intermediate hash value, extract target optional image header information matched with a machine type code from the optional image header according to the machine type code, and perform hash calculation on the intermediate hash value and the target optional image header information to obtain the local hash value.
Optionally, the file identifying apparatus provided in this embodiment further includes:
And the second matching module is used for carrying out pattern string feature matching on the file to be identified when the matching result of the first matching module is that the matching is unsuccessful, so as to identify whether the file to be identified is a virus file.
Optionally, the file identifying apparatus provided in this embodiment further includes:
And the third matching module is used for carrying out full-text hash calculation on the file to be identified to obtain a hash result when the matching result of the first matching module is unsuccessful or the second matching module does not identify the file to be identified as a virus file, provided that the file to be identified is a complete file, and for identifying whether the file to be identified is a virus file according to the hash result.
Optionally, the extracting module is specifically configured to disassemble the file to be identified by using a file parsing plug-in corresponding to the setting type, so as to extract a file header in the file to be identified.
According to a third aspect of the present application there is provided an electronic device comprising a processor and a machine-readable storage medium storing a computer program executable by the processor, the processor being caused by the computer program to perform the method provided by the first aspect of the embodiment of the present application.
According to a fourth aspect of the present application there is provided a machine-readable storage medium storing a computer program which, when invoked and executed by a processor, causes the processor to carry out the method provided by the first aspect of the embodiments of the present application.
The embodiment of the application has the beneficial effects that:
According to the file identification method and device provided by the embodiments of the application, file identification is performed on the received file to be identified; when it is identified as a file of a set type, the file header is extracted from it, hash calculation is performed on target information in the file header to obtain a local hash value, the local hash value is matched with a virus hash feature library, and when the matching is successful, the file to be identified is confirmed to be a virus file. Only a local hash over the file header of the file to be identified is needed, and the resulting local hash value can be matched directly against the virus hash feature library to identify whether the file is a virus file. Therefore, when virus detection is performed on files passing through the network device, the accuracy of the virus detection result is improved without occupying more memory. In addition, since the hash does not need to be computed over the whole file to be identified, file identification is faster.
Drawings
FIG. 1 is a schematic flow chart of a file identification method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a document identification apparatus according to an embodiment of the present application;
Fig. 3 is a schematic hardware structure of an electronic device for implementing a file identification method according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the corresponding listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. The term "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.
The file identification method provided by the application is described in detail below.
Referring to Fig. 1, Fig. 1 is a flowchart of a file identification method provided by the present application. The method may be applied to a network security device, which may be, but is not limited to, a firewall, etc. When the network security device implements the method, the method may include the following steps:
S101, carrying out file identification on the received file to be identified.
In this step, after the traffic enters the network security device, the file transmitted in the traffic is identified, and for convenience of description, the transmitted file may be referred to as the file to be identified.
Optionally, before the file to be identified is identified, application identification may be performed on the stream, and when the application of the set protocol is identified, step S101 is performed, that is, file identification is performed on the file to be identified of the application conforming to the set protocol.
Specifically, in some scenarios only application control is needed: once the application corresponding to a data stream is identified, the data stream is considered safe to a certain extent and deep packet inspection is not performed, which improves packet processing performance to a certain extent and saves packet processing time. In such a scenario, to further improve the security of the data streams entering the network, this embodiment proposes executing the flow shown in Fig. 1 after the application is identified.
In addition, other scenarios may also require application identification. To adapt to their actual needs while changing their implementation flow as little as possible, application identification is performed first, and the flow shown in Fig. 1 is executed afterwards.
It should be noted that the above-mentioned set protocol may be, but is not limited to, HTTP, FTP (File Transfer Protocol), and the like.
S102, when the file to be identified is a file of a set type, extracting a file header from the file to be identified.
In this step, the file type of the file to be identified is identified first. Because certain types of executable files account for a very large proportion of traffic and viruses generally infect such files, the file types that require virus identification are set in advance in order to ensure the security of files and of the network. When the file type of the file to be identified is one of the set types, virus detection needs to be performed on it, and the file header is then extracted from the file to be identified.
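The type check in this step can be illustrated with a minimal sketch. The magic-number table, the type names, and the `SET_TYPES` configuration below are illustrative assumptions and are not taken from the application; an actual device would likely use a richer file-type detector.

```python
from typing import Optional

# Hypothetical magic-number table (assumption for illustration only).
MAGIC_NUMBERS = {
    b"MZ": "pe",            # DOS/PE executables start with "MZ"
    b"PK\x03\x04": "zip",   # ZIP-based compressed files
    b"%PDF": "pdf",
}

# Hypothetical configuration: the file types set for header-based virus detection.
SET_TYPES = {"pe"}


def identify_file_type(first_bytes: bytes) -> Optional[str]:
    """Return a coarse file type from the leading bytes, or None if unknown."""
    for magic, file_type in MAGIC_NUMBERS.items():
        if first_bytes.startswith(magic):
            return file_type
    return None


def needs_header_check(first_bytes: bytes) -> bool:
    """True when the file is of a set type, so its header should be extracted."""
    return identify_file_type(first_bytes) in SET_TYPES
```

Only files for which `needs_header_check` returns True would proceed to header extraction; other files would skip the local hash stage.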
S103, carrying out hash calculation on the target information in the file header to obtain a local hash value.
In this step, the target information is feature information used for virus identification, that is, information that may change when a file is infected by a virus. On this basis, the target information is extracted from the file header, and hash calculation is performed on it to obtain the local hash value.
S104, matching the local hash value with a virus hash characteristic library.
Wherein the virus hash feature library comprises feature hash values of viruses.
In this step, in order to identify whether a virus exists in a file, a virus feature library is preconfigured, and virus features of the currently existing virus are subjected to hash calculation, so that a feature hash value of the virus is obtained, and a virus hash feature library is generated. On the basis, when the local hash value is matched with the virus hash feature library, if the virus hash feature library comprises the local hash value, the local hash value is confirmed to be successfully matched with the virus hash feature library, step S105 is executed, namely the file to be identified is a virus file, and if the virus hash feature library does not comprise the local hash value, the local hash value is confirmed to be not matched with the virus hash feature library.
The virus hash feature library can be dynamically updated, and as viruses increase, the virus features of the newly added viruses are subjected to hash calculation to obtain the feature hash values of the newly added viruses, and then the feature hash values are updated into the virus hash feature library.
It should be noted that the virus hash feature library may further include a virus identifier of a virus, that is, a correspondence between the virus identifier and a virus feature is recorded in the virus hash feature library, so that when the local hash value is matched with the virus hash feature library, it can be identified whether a file to be identified has a virus or not, and also which virus belongs to the file to be identified. Specifically, when the local hash value is confirmed to be in the virus hash feature library, viruses in the files to be identified are confirmed, and meanwhile, the virus identification of the viruses in the files to be identified can be determined based on the corresponding relation between the virus identification and the virus features. Therefore, the accuracy of the virus identification result can be improved, and the user can conveniently execute effective countermeasure based on the identified virus after the virus identification result is displayed to the user.
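As a rough illustration of such a library, the sketch below keeps a mapping from feature hash value to virus identifier, so that a successful match also tells which virus was found. The class name, the method names, and the sample virus identifier are hypothetical; MD5 is used only because the description names it as one possible hash algorithm.

```python
import hashlib
from typing import Dict, Optional


class VirusHashLibrary:
    """Maps a feature hash value (hex string) to a virus identifier."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def add(self, virus_feature: bytes, virus_id: str) -> str:
        """Hash a virus feature and record it; returns the feature hash value.

        Calling this for newly found viruses models the dynamic update.
        """
        feature_hash = hashlib.md5(virus_feature).hexdigest()
        self._entries[feature_hash] = virus_id
        return feature_hash

    def match(self, local_hash: str) -> Optional[str]:
        """Return the virus identifier on a hit, or None when there is no match."""
        return self._entries.get(local_hash)


# Usage with made-up data: the feature bytes and identifier are placeholders.
library = VirusHashLibrary()
library.add(b"example-virus-header-feature", "Virus.Example.A")
local_hash = hashlib.md5(b"example-virus-header-feature").hexdigest()
matched_id = library.match(local_hash)
```

A hash-keyed mapping keeps each lookup at roughly constant cost, whatever the number of virus features recorded.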
And S105, when the matching is successful, confirming that the file to be identified is a virus file.
According to the file identification method provided by the application, file identification is performed on the received file to be identified; when it is identified as a file of a set type, the file header is extracted from it, hash calculation is performed on target information in the file header to obtain a local hash value, the local hash value is matched with a virus hash feature library, and when the matching is successful, the file to be identified is confirmed to be a virus file. Only a local hash over the file header of the file to be identified is needed, and the resulting local hash value can be matched directly against the virus hash feature library to identify whether the file is a virus file. Therefore, when virus detection is performed on files passing through the network device, the accuracy of the virus detection result is improved without occupying more memory. In addition, since the hash does not need to be computed over the whole file to be identified, file identification is faster.
Alternatively, the files to be identified may be, but are not limited to, executable files, office files, compressed files, and the like.
Optionally, the set type may be, but is not limited to, a PE executable file under the Windows operating system. In that case, the file header includes an image file header and an optional image header, and step S103 may be executed as follows: hash calculation is performed on the image file header to obtain an intermediate hash value; target optional image header information matched with the machine type code is extracted from the optional image header according to the machine type code; and hash calculation is performed according to the intermediate hash value and the target optional image header information to obtain the local hash value.
Optionally, step S102 may be executed as follows: the file to be identified is disassembled by using the file parsing plug-in corresponding to the set type, so as to extract the file header of the file to be identified.
Specifically, the file parsing plug-in corresponding to the file to be identified may be used to disassemble it. When the identified file type is a PE executable file, the image file header and the optional image header can be parsed out when the file parsing plug-in for PE executable files parses and identifies the file.
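One way to picture per-type parsing plug-ins is a registry keyed by file type. The registry, decorator, and function names below are assumptions made for illustration; the offsets follow the publicly documented PE layout (the PE signature offset is stored at 0x3C, followed by a 4-byte signature, a 20-byte image file header, and an optional header whose size is stored in the image file header).

```python
import struct
from typing import Callable, Dict, Optional

HeaderParser = Callable[[bytes], Optional[bytes]]
PARSER_PLUGINS: Dict[str, HeaderParser] = {}  # hypothetical plug-in registry


def register_parser(file_type: str):
    """Decorator that registers a header parser for one file type."""
    def decorator(func: HeaderParser) -> HeaderParser:
        PARSER_PLUGINS[file_type] = func
        return func
    return decorator


@register_parser("pe")
def parse_pe_header(data: bytes) -> Optional[bytes]:
    """Return the bytes from the PE signature through the optional header.

    Returns None when the data is not a PE file or the header is incomplete.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    if len(data) < pe_offset + 24:
        return None
    # SizeOfOptionalHeader sits 16 bytes into the 20-byte image file header.
    optional_size = struct.unpack_from("<H", data, pe_offset + 20)[0]
    end = pe_offset + 24 + optional_size
    return data[pe_offset:end] if len(data) >= end else None


def extract_header(file_type: str, data: bytes) -> Optional[bytes]:
    """Dispatch to the plug-in for this file type, if one is registered."""
    parser = PARSER_PLUGINS.get(file_type)
    return parser(data) if parser else None
```

Supporting another set type would then only require registering one more parser function.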
On this basis, because traffic enters the network security device continuously, when the file to be identified is identified, the image file header is identified first, following the order of the content in the file header, and the optional image header is then identified as the remaining content of the file header of the file to be identified is received.
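Because the header arrives piece by piece, the hash can be updated incrementally as payloads are received. The following is a minimal sketch under two assumptions that the description does not state: the header size is known in advance, and the header bytes arrive in order. The class name is hypothetical.

```python
import hashlib


class StreamingHeaderHasher:
    """Feeds header bytes into MD5 as payloads arrive, ignoring later bytes."""

    def __init__(self, header_size: int) -> None:
        self._remaining = header_size
        self._md5 = hashlib.md5()

    def feed(self, payload: bytes) -> bool:
        """Consume one payload; returns True once the whole header was hashed."""
        chunk = payload[: self._remaining]
        self._md5.update(chunk)
        self._remaining -= len(chunk)
        return self._remaining == 0

    def hexdigest(self) -> str:
        """Hash of the header bytes consumed so far."""
        return self._md5.hexdigest()
```

Once `feed` returns True, the rest of the file no longer needs to be buffered for the local hash.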
It should be noted that, since the machine type codes supported by viruses differ in bit width, the target optional image header information used for the second hash is selected according to the bit width of the machine type code. That is, the machine type code is parsed from the image file header once the image file header has been parsed; after the optional image header is extracted, the target optional image header information consistent with the bit width of the parsed machine type code is extracted from it; and hash calculation is then performed based on the intermediate hash value and the target optional image header information to obtain the local hash value. The local hash value is thereby adapted to the machine type code supported by the virus, so that whether the file to be identified is a virus file can be identified accurately based on the local hash value.
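The two-stage calculation can be sketched as follows. This is an illustration only: the description does not specify which optional image header fields make up the target information or how the intermediate hash value is combined with it, so the sketch assumes the whole optional header is used when its magic number matches the bit width of the machine type code, and that the two inputs are concatenated before the second MD5. The machine codes and magic numbers are the publicly documented PE values.

```python
import hashlib
import struct
from typing import Optional

MACHINE_I386 = 0x014C    # 32-bit x86
MACHINE_AMD64 = 0x8664   # 64-bit x64
# Optional-header magic expected for each machine type (PE32 vs PE32+).
EXPECTED_MAGIC = {MACHINE_I386: 0x10B, MACHINE_AMD64: 0x20B}


def local_hash(image_file_header: bytes, optional_header: bytes) -> Optional[str]:
    """Two-stage MD5 over the image file header and the matching optional header."""
    # Stage 1: hash the image file header to get the intermediate hash value.
    intermediate = hashlib.md5(image_file_header).digest()
    # Machine is the first 2-byte field of the image file header.
    machine = struct.unpack_from("<H", image_file_header, 0)[0]
    # Magic is the first 2-byte field of the optional image header.
    magic = struct.unpack_from("<H", optional_header, 0)[0]
    if EXPECTED_MAGIC.get(machine) != magic:
        return None  # bit width of the optional header does not match the machine
    # Stage 2 (assumed combination): hash intermediate value plus target information.
    return hashlib.md5(intermediate + optional_header).hexdigest()
```

Returning None on a bit-width mismatch is a design choice of this sketch; a device could instead fall back to the later detection stages.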
Optionally, based on any one of the above embodiments, in this embodiment, when the matching is unsuccessful, pattern string feature matching is performed on the file to be identified, so as to identify whether the file to be identified is a virus file.
Specifically, when matching based on the local hash value is unsuccessful, this indicates only to a certain extent that the file to be identified is not a virus file. To detect viruses in the file to be identified more reliably, this embodiment proposes performing pattern string feature matching on the file to be identified to further confirm whether it is a virus file, thereby further improving the virus identification result for the file to be identified.
It should be noted that pattern string feature matching may be implemented with reference to existing methods, which is not limited in this embodiment.
Further, when the local hash value is not successfully matched with the virus hash feature library or the file to be identified is not identified as a virus file when pattern string feature matching is performed on the file to be identified, performing full-text hash calculation on the file to be identified to obtain a hash result, and identifying whether the file to be identified is a virus file according to the hash result.
Specifically, when the matching result based on the local hash value and the virus hash feature library is that the matching result is not successful, in order to more accurately identify the virus of the file to be identified, the embodiment proposes that whether the current file to be identified is complete is confirmed, if the current file to be identified is a complete file, hash calculation is performed on the file to be identified to obtain a hash result, and then whether the file to be identified is a virus file is confirmed according to the hash result, so that the accuracy of the virus detection result of the file is further improved.
Likewise, when pattern string feature matching does not identify the file to be identified as a virus file, neither the local hash matching nor the pattern string feature matching has succeeded. In this case, this embodiment proposes confirming whether the current file to be identified is complete; if it is a complete file, hash calculation is performed on the file to be identified to obtain a hash result, and whether the file to be identified is a virus file is then confirmed according to the hash result, thereby further improving the accuracy of the virus detection result for the file.
It should be noted that virus identification based on the complete file to be identified may be implemented according to existing methods, which is not limited in this embodiment.
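The fallback order described above (local hash first, then pattern string feature matching, then full-text hash for complete files) can be sketched as a simple cascade. The function name, its parameters, and the plain substring search used for the pattern stage are illustrative assumptions; a real device would use its existing pattern-matching engine.

```python
import hashlib
from typing import Iterable, Optional, Set


def detect_virus(
    local_hash: Optional[str],
    file_bytes: bytes,
    is_complete: bool,
    header_hashes: Set[str],
    patterns: Iterable[bytes],
    full_hashes: Set[str],
) -> Optional[str]:
    """Return the name of the stage that flagged the file, or None if clean."""
    # Stage 1: local (header) hash against the virus hash feature library.
    if local_hash is not None and local_hash in header_hashes:
        return "local-hash"
    # Stage 2: pattern string feature matching (naive substring search here).
    if any(pattern in file_bytes for pattern in patterns):
        return "pattern-string"
    # Stage 3: full-text hash, only meaningful for a complete file.
    if is_complete and hashlib.md5(file_bytes).hexdigest() in full_hashes:
        return "full-text-hash"
    return None
```

Ordering the stages from cheapest to most expensive means most virus files are flagged before the whole file has to be hashed or scanned.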
Alternatively, the hash calculation may be performed by, but not limited to, employing a message digest MD5 algorithm, or the like.
It should be noted that the parsed image file header may include, but is not limited to, a Machine type code Machine, a number of segments in the file to be identified, a size of an optional image header, and the like.
When the file to be identified is a PE executable file, the extracted file header may be denoted as PE File Header. On this basis, when the file parsing plug-in corresponding to PE executable files disassembles the PE executable file, information such as the DOS header, the DOS stub, and the PE signature can be parsed out in addition to the image file header and the optional image header.
First, the contents of the image file header are described as follows:
The machine type code, denoted as Machine, is the unique machine code used by each CPU and may correspond to, but is not limited to, 32 bits, 64 bits, etc.; for example, the machine code compatible with 32-bit Intel x86 chips is 0x14C.
The number of sections in the file to be identified, denoted as Number Of Sections, indicates the number of sections present in the PE file, such as the data and text sections listed in the PE section table.
The size of the optional image header, denoted as Size Of Optional Header, indicates the overall size of the optional image header structure (PE Option Header).
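For reference, the image file header fields listed above can be unpacked as in the sketch below. The field layout follows the publicly documented 20-byte COFF file header (little-endian); the Python type and function names are illustrative and not part of the application.

```python
import struct
from typing import NamedTuple


class ImageFileHeader(NamedTuple):
    machine: int                  # Machine type code, e.g. 0x14C for 32-bit x86
    number_of_sections: int       # Number Of Sections
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int  # Size Of Optional Header
    characteristics: int


def parse_image_file_header(raw: bytes) -> ImageFileHeader:
    """Unpack the 20-byte image file header that follows the PE signature."""
    return ImageFileHeader(*struct.unpack_from("<HHIIIHH", raw, 0))
```

The `machine` and `size_of_optional_header` fields are the ones needed to select and delimit the optional image header.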
The optional image header is denoted as PE Option Header, and its structure is IMAGE_OPTIONAL_HEADER32. The optional image header has 9 main members, specifically:
Magic, the magic number, which indicates whether the file to be identified is a 32-bit or 64-bit PE file.
Address Of Entry Point, the program entry address, which indicates the address of the code that the program executes first.
Image Base, the image base address, which is used for mapping the real address of the PE file in the memory space and indicates the preferred load address of the file (the virtual memory range of a 32-bit process is 0 to 7FFFFFFF).
Section Alignment, the memory alignment granularity, i.e. the alignment granularity when the PE file is mapped to memory, and File Alignment, the disk alignment granularity, i.e. the alignment granularity of the PE file when stored on disk. The former defines the minimum unit of a section in memory, and the latter defines the minimum unit of a section in the disk file.
Size Of Image, the total size of the PE file image in memory, i.e. the size of the space occupied by the PE image in virtual memory.
Size Of Headers, the size of the entire PE header, i.e. the total size of the DOS header, PE signature, standard PE header, optional PE header, and section table.
Subsystem, the subsystem used by the user interface, which distinguishes system driver files from ordinary executable files.
Number Of Rva And Sizes, the number of data directory entries, i.e. the number of elements in the Data Directory array.
Data Directory, the data directory table, an array of IMAGE_DATA_DIRECTORY structures.
It should be noted that a virus file may be determined according to information such as the program entry address, section structure addresses, the timestamp, and the size of the space occupied by the PE image in virtual memory.
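The scalar members of the optional image header listed above can be read with fixed offsets, as in this sketch. The offsets are those of the publicly documented 32-bit IMAGE_OPTIONAL_HEADER32 layout; a 64-bit (PE32+) header uses different offsets and is not handled here. The table and function names are illustrative.

```python
import struct
from typing import Dict

# (field name, byte offset, struct format) within IMAGE_OPTIONAL_HEADER32.
PE32_FIELDS = [
    ("Magic", 0, "<H"),
    ("AddressOfEntryPoint", 16, "<I"),
    ("ImageBase", 28, "<I"),
    ("SectionAlignment", 32, "<I"),
    ("FileAlignment", 36, "<I"),
    ("SizeOfImage", 56, "<I"),
    ("SizeOfHeaders", 60, "<I"),
    ("Subsystem", 68, "<H"),
    ("NumberOfRvaAndSizes", 92, "<I"),
]


def parse_optional_header32(raw: bytes) -> Dict[str, int]:
    """Read the scalar members that precede the Data Directory array."""
    if len(raw) < 96:
        raise ValueError("optional header is shorter than 96 bytes")
    return {
        name: struct.unpack_from(fmt, raw, offset)[0]
        for name, offset, fmt in PE32_FIELDS
    }
```

The Data Directory array starts at offset 96 in this layout and holds `NumberOfRvaAndSizes` entries of 8 bytes each.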
With the file identification method provided by any embodiment of the application, the local hash is insensitive to local modifications of a file: when a virus file is changed (for example, a code section is changed or appended data is changed), its file header information remains fixed, so similar variant viruses can be detected by the same rule, which improves the practicability and generality of the file identification method provided by the application. In addition, to reduce the false alarm rate, when a large number of samples are analyzed to extract local hash rules, rules that are prone to false alarms, such as those for packed, bundled, or infected PE files, are removed; the removed cases can be covered by complementary detection through full-text hash and pattern string feature matching, which further improves the accuracy of file identification.
With the file identification method provided by any embodiment of the application, local hash rules aggregate to a certain degree: compared with the full-text hash algorithm, a single local hash rule can cover a large number of virus samples, which reduces memory use.
In addition, for the local hash matching method, since the PE file information of interest is located at the head of the file, only the first several bytes of the PE file need to be computed and processed. Hash calculation and matching over the whole file, or AC (Aho-Corasick) matching over the whole file, are not needed, which greatly reduces CPU utilization and greatly improves file identification performance.
Furthermore, since the local hash can in most scenarios be computed from the first packet, any file identification method provided by the application can improve the recognition rate even when out-of-order packets (including IP-layer, TCP, and application-layer disorder) are not reassembled.
Based on the same inventive concept, the application also provides a file identification device corresponding to the file identification method. For the implementation of the file identification device, reference may be made to the description of the file identification method above, which is not repeated here.
Referring to Fig. 2, which is a schematic structural diagram of a file identification device according to an exemplary embodiment of the application, the device is disposed in a network security device and includes:
an identifying module 201, configured to perform file identification on a received file to be identified;
an extracting module 202, configured to extract a file header from the file to be identified when the file to be identified is a file of a set type;
a hash calculation module 203, configured to perform hash calculation on target information in the file header to obtain a local hash value;
a first matching module 204, configured to match the local hash value with a virus hash feature library, where the virus hash feature library includes feature hash values of viruses; and
a confirming module 205, configured to confirm that the file to be identified is a virus file when the matching result of the first matching module is that the matching is successful.
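The division of work among modules 201 to 205 can be sketched as follows. This is a minimal illustration, not the device itself: the class name, the injected callables, the toy `MZ` check, the 64-byte "header", and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class FileIdentificationDevice:
    identify: Callable[[bytes], str]                    # role of identifying module 201
    extract_header: Callable[[bytes], Optional[bytes]]  # role of extracting module 202
    virus_hashes: Set[str]                              # library consulted by module 204
    set_type: str = "pe"                                # hypothetical label for the set type

    def hash_header(self, header: bytes) -> str:
        """Role of hash calculation module 203 (SHA-256 is an assumption)."""
        return hashlib.sha256(header).hexdigest()

    def is_virus(self, data: bytes) -> bool:
        """True only when the local hash matches; False means 'not confirmed here'."""
        if self.identify(data) != self.set_type:
            return False
        header = self.extract_header(data)
        if header is None:
            return False
        # Roles of first matching module 204 and confirming module 205.
        return self.hash_header(header) in self.virus_hashes


def toy_identify(data: bytes) -> str:
    return "pe" if data[:2] == b"MZ" else "other"


def toy_extract(data: bytes) -> Optional[bytes]:
    return data[:64] if len(data) >= 64 else None


sample = b"MZ" + bytes(62) + b"payload"
device = FileIdentificationDevice(
    toy_identify, toy_extract, {hashlib.sha256(sample[:64]).hexdigest()}
)
```

A `False` result in this sketch only means that the local hash step did not confirm a virus; further matching steps may still apply.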
Optionally, based on the above embodiment, the file to be identified in this embodiment includes an executable file under a Windows operating system, and the file header includes an image file header and an optional image header;
the hash calculation module 203 is specifically configured to perform hash calculation on the image file header to obtain an intermediate hash value, extract, according to the machine type code, target optional image header information matching the machine type code from the optional image header, and perform hash calculation according to the intermediate hash value and the target optional image header information to obtain the local hash value.
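The two-stage calculation can be sketched for a PE file as below. Several points are assumptions rather than details from the application: SHA-256 is used as the hash, the two stages are combined by hashing the intermediate digest concatenated with the selected bytes, and the hypothetical table `OPTIONAL_BYTES_BY_MACHINE` selects the leading 96 bytes of the optional header for i386 and 112 bytes for AMD64 (the sizes of the PE32 and PE32+ optional headers without data directories).

```python
import hashlib
import struct
from typing import Optional

# Hypothetical selection rule: leading optional-header bytes to hash per machine type.
OPTIONAL_BYTES_BY_MACHINE = {0x014C: 96, 0x8664: 112}


def pe_local_hash(data: bytes) -> Optional[str]:
    """Return a local hash over PE header fields, or None if the headers are unusable."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    coff = e_lfanew + 4
    if data[e_lfanew:coff] != b"PE\0\0" or len(data) < coff + 20:
        return None
    file_header = data[coff:coff + 20]  # 20-byte image file header
    (machine,) = struct.unpack_from("<H", file_header, 0)
    (opt_size,) = struct.unpack_from("<H", file_header, 16)
    wanted = OPTIONAL_BYTES_BY_MACHINE.get(machine)
    if wanted is None or opt_size < wanted:
        return None
    opt_start = coff + 20
    target = data[opt_start:opt_start + wanted]
    if len(target) < wanted:
        return None  # the optional header is not fully present in the buffer
    # Stage 1: intermediate hash over the image file header.
    intermediate = hashlib.sha256(file_header).digest()
    # Stage 2: combine with the machine-specific optional header information.
    return hashlib.sha256(intermediate + target).hexdigest()
```

Because nothing after the selected header bytes is read, two files that differ only in their sections or appended data yield the same value in this sketch.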
Optionally, based on any one of the foregoing embodiments, the file identification device provided in this embodiment further includes:
a second matching module (not shown in the figure), configured to perform pattern-string feature matching on the file to be identified when the matching result of the first matching module is that the matching is unsuccessful, so as to identify whether the file to be identified is a virus file.
Further, the file identification device provided in this embodiment further includes:
a third matching module (not shown in the figure), configured to perform full-text hash calculation on the file to be identified to obtain a hash result when the matching result of the first matching module 204 is that the matching is unsuccessful, or when the matching result of the second matching module (not shown in the figure) is that the file to be identified is not identified as a virus file, and to identify whether the file to be identified is a virus file according to the hash result.
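The fallback order of the three matching modules can be sketched as a simple cascade. The function name and parameters are hypothetical, a naive substring search stands in for a real multi-pattern matcher, SHA-256 stands in for the unspecified full-text hash, and the `is_complete` guard follows the condition stated in the claims that full-text hashing applies to a complete file.

```python
import hashlib
from typing import Iterable, Optional, Set


def cascade_identify(
    data: bytes,
    local_digest: Optional[str],
    local_library: Set[str],
    patterns: Iterable[bytes],
    full_library: Set[str],
    is_complete: bool = True,
) -> Optional[str]:
    """Return the name of the stage that flagged the file, or None if no stage did."""
    # Stage 1: local hash of the file header (first matching module).
    if local_digest is not None and local_digest in local_library:
        return "local-hash"
    # Stage 2: pattern-string feature matching (second matching module).
    if any(pattern in data for pattern in patterns):
        return "pattern-string"
    # Stage 3: full-text hash, only meaningful for a complete file (third matching module).
    if is_complete and hashlib.sha256(data).hexdigest() in full_library:
        return "full-hash"
    return None
```

Ordering the stages from cheapest to most expensive means the whole file is scanned or hashed only when the header-based check has not already produced a match.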
Optionally, based on any one of the foregoing embodiments, the extracting module 202 is specifically configured to parse the file to be identified by using a file parsing plug-in corresponding to the set type, so as to extract the file header of the file to be identified.
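One way to picture per-type parsing plug-ins is a registry keyed by file type. The registry, the decorator, and the `"pe"` label below are hypothetical, and the PE plug-in simply returns the image file header together with the optional image header.

```python
import struct
from typing import Callable, Dict, Optional

HeaderParser = Callable[[bytes], Optional[bytes]]
PARSERS: Dict[str, HeaderParser] = {}  # hypothetical plug-in registry


def register(file_type: str) -> Callable[[HeaderParser], HeaderParser]:
    """Decorator that registers a parsing plug-in for one file type."""
    def decorator(parser: HeaderParser) -> HeaderParser:
        PARSERS[file_type] = parser
        return parser
    return decorator


@register("pe")
def parse_pe_header(data: bytes) -> Optional[bytes]:
    """Return the image file header plus optional image header, or None."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return None
    start = e_lfanew + 4
    if len(data) < start + 20:
        return None
    (opt_size,) = struct.unpack_from("<H", data, start + 16)
    end = start + 20 + opt_size
    return data[start:end] if len(data) >= end else None


def extract_header(file_type: str, data: bytes) -> Optional[bytes]:
    """Dispatch to the plug-in for the given type; None if no plug-in is registered."""
    parser = PARSERS.get(file_type)
    return parser(data) if parser is not None else None
```

Supporting another set type in this sketch would only require registering one more parser function.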
In the file identification device provided by any embodiment of the application, file identification is performed on the received file to be identified; when it is identified as a file of a set type, a file header is extracted from it, hash calculation is performed on target information in the file header to obtain a local hash value, the local hash value is matched with a virus hash feature library, and when the matching succeeds, the file to be identified is confirmed to be a virus file. Only a local hash calculation on the file header is needed, and the resulting local hash value can be matched against the virus hash feature library to identify whether the file to be identified is a virus file. Therefore, when virus detection is performed on files passing through a network device, the accuracy of the detection result is improved without occupying more memory. In addition, because no hash calculation over the whole file to be identified is required, file identification is faster.
Based on the same inventive concept, the embodiments of the present application provide an electronic device, which may be, but is not limited to, the network security device described above. As shown in fig. 3, the electronic device includes a processor 301 and a machine-readable storage medium 302, the machine-readable storage medium 302 storing a computer program executable by the processor 301, the processor 301 being caused by the computer program to perform a file identification method provided by any of the embodiments of the present application. The electronic device further comprises a communication interface 303 and a communication bus 304, wherein the processor 301, the communication interface 303 and the machine readable storage medium 302 perform communication with each other via the communication bus 304.
The communication bus of the electronic device mentioned above may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The machine-readable storage medium 302 may be a memory, which may include random access memory (Random Access Memory, RAM), DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory), or non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
For the electronic device and the machine-readable storage medium embodiments, the description is relatively simple, and reference should be made to the description of the method embodiments for relevant points, since the method content involved is substantially similar to that of the method embodiments described above.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in a process, method, article, or apparatus that comprises the element.
The implementation process of the functions and roles of each unit/module in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be repeated here.
For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The above described apparatus embodiments are merely illustrative, wherein the units/modules illustrated as separate components may or may not be physically separate, and the components shown as units/modules may or may not be physical units/modules, i.e. may be located in one place, or may be distributed over a plurality of network units/modules. Some or all of the units/modules may be selected according to actual needs to achieve the purposes of the present solution. Those of ordinary skill in the art will understand and implement the present application without undue burden.
The foregoing describes only preferred embodiments of the application and is not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the application shall fall within its scope of protection.

Claims (8)

Translated from Chinese
1. A file identification method, comprising:
performing file identification on a received file to be identified;
when the file to be identified is a file of a set type, extracting a file header from the file to be identified;
performing hash calculation on target information in the file header to obtain a local hash value;
matching the local hash value with a virus hash feature library, wherein the virus hash feature library includes feature hash values of viruses;
when the matching is successful, confirming that the file to be identified is a virus file;
wherein the file to be identified includes an executable file under the Windows operating system, and the file header includes an image file header and an optional image header;
and wherein performing hash calculation on the target information in the file header to obtain the local hash value comprises:
performing hash calculation on the image file header to obtain an intermediate hash value;
extracting, according to a machine type code, target optional image header information matching the machine type code from the optional image header;
performing hash calculation based on the intermediate hash value and the target optional image header information to obtain the local hash value.
2. The method according to claim 1, further comprising:
when the matching is unsuccessful, performing pattern string feature matching on the file to be identified to identify whether the file to be identified is a virus file.
3. The method according to claim 2, further comprising:
when the local hash value fails to match the virus hash feature library, or when the file to be identified is not identified as a virus file by the pattern string feature matching, and the file to be identified is a complete file, performing full-text hash calculation on the file to be identified to obtain a hash result, and identifying whether the file to be identified is a virus file according to the hash result.
4. The method according to claim 1, wherein extracting the file header from the file to be identified comprises:
parsing the file to be identified using a file parsing plug-in corresponding to the set type to extract the file header from the file to be identified.
5. A file identification device, comprising:
an identification module, configured to perform file identification on a received file to be identified;
an extraction module, configured to extract a file header from the file to be identified when the file to be identified is a file of a set type;
a hash calculation module, configured to perform hash calculation on target information in the file header to obtain a local hash value;
a first matching module, configured to match the local hash value with a virus hash feature library, wherein the virus hash feature library includes feature hash values of viruses;
a confirmation module, configured to confirm that the file to be identified is a virus file when the matching result of the first matching module is a successful match;
wherein the file to be identified includes an executable file under the Windows operating system, and the file header includes an image file header and an optional image header;
and wherein the hash calculation module is specifically configured to perform hash calculation on the image file header to obtain an intermediate hash value; extract, according to a machine type code, target optional image header information matching the machine type code from the optional image header; and perform hash calculation based on the intermediate hash value and the target optional image header information to obtain the local hash value.
6. The device according to claim 5, further comprising:
a second matching module, configured to perform pattern string feature matching on the file to be identified when the matching result of the first matching module is an unsuccessful match, to identify whether the file to be identified is a virus file.
7. The device according to claim 6, further comprising:
a third matching module, configured to, when the matching result of the first matching module is an unsuccessful match, or the matching result of the second matching module is that the file to be identified is not identified as a virus file, and the file to be identified is a complete file, perform full-text hash calculation on the file to be identified to obtain a hash result, and identify whether the file to be identified is a virus file according to the hash result.
8. The device according to claim 5, wherein
the extraction module is specifically configured to parse the file to be identified using the file parsing plug-in corresponding to the set type, so as to extract the file header from the file to be identified.
CN202210588323.1A | 2022-05-27 | 2022-05-27 | File identification method and device | Active | CN114943078B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210588323.1A (CN114943078B (en)) | 2022-05-27 | 2022-05-27 | File identification method and device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210588323.1A (CN114943078B (en)) | 2022-05-27 | 2022-05-27 | File identification method and device

Publications (2)

Publication Number | Publication Date
CN114943078A (en) | 2022-08-26
CN114943078B (en) | 2025-09-05

Family

ID=82910035

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202210588323.1A | Active | CN114943078B (en) | 2022-05-27 | 2022-05-27 | File identification method and device

Country Status (1)

Country | Link
CN (1) | CN114943078B (en)

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
