BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to computer networks, and more particularly to insider threat detection in computer networks.
2. Background Art
In many situations, network and security analysts need to map observed network events to the users that generated them. However, many events produced by devices such as network-based intrusion detection systems (NIDSs) and firewalls only identify the source of the event as an Internet Protocol (IP) address of the originating host. Unfortunately, IP addresses cannot be statically mapped to users within most internal corporate local area networks (LANs). The commonly used Dynamic Host Configuration Protocol (DHCP) dynamically leases IP addresses to hosts on a first-come, first-served basis and for limited durations. In addition, many organizations take advantage of Microsoft Windows Roaming Profiles to permit their mobile users to effectively operate from any workstation. Mapping IP addresses to users is particularly important for insider threat detection, which requires knowledge of the user behind the observed behavior.
Passive fingerprinting allows the identification of the host operating system by observing the TCP/IP protocol and welcome banners associated with well known services (e.g., telnet). More recently, passive fingerprinting has allowed detection of applications running on a host by detecting and analyzing network protocols in use. However, passive fingerprinting does not allow the passive attribution of anonymous network events to their associated users.
What are needed therefore are methods for passively attributing anonymous network events to their associated users.
BRIEF SUMMARY OF THE INVENTION
Systems, methods, and computer program products for passively attributing anonymous network events to their associated users are provided herein.
Embodiments of the present invention include filtering network events occurring over a pre-determined time interval to generate a filtered event list. Filtering of the events may be done according to one or more parameters. Based on the filtered event list and the event attribution approach, anonymous network events are attributed to users associated with events in the filtered event list.
In an embodiment, event attribution includes attributing an anonymous network event to a user associated with a nearest-neighbor event relative to the anonymous network event. The nearest-neighbor event may be determined based on time proximity or distance relative to the anonymous event.
In another embodiment, event attribution includes attributing an anonymous network event to a user associated with an event in the filtered event list, wherein that user maximizes an event attribution function.
In a further embodiment, event attribution includes determining a first potential attribution user for an anonymous network event based on a nearest-neighbor attribution approach; determining a second potential attribution user for the anonymous network event based on an event attribution function approach; and comparing the first and second potential attribution users to determine the attribution of the anonymous event.
Embodiments of the present invention can be performed off-line or in real-time.
Embodiments of the present invention can be used, for example, by organizations to complement network intrusion detection systems (NIDSs), network forensic analysis tools (NFATs), and security information management systems (SIMSs). As noted above, NIDSs can only monitor network activity by IP address and would thus benefit from methods according to embodiments of the present invention to increase their monitoring capabilities. Similarly, network forensic analysis tools that analyze system network packets and security information management systems that analyze events from security devices would benefit from methods according to the present invention.
Further embodiments, features, and advantages of the present invention, as well as the structure and operation of the various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES
The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.
FIG. 1 is an example that illustrates a method for passively attributing anonymous network events to users according to an embodiment of the present invention.
FIG. 2 is an example that illustrates another embodiment of the method of FIG. 1.
FIG. 3 is a process flowchart of the methods of FIGS. 1 and 2.
FIG. 4 is an example that illustrates another method for passively attributing anonymous network events to users according to an embodiment of the present invention.
FIG. 5 is a process flowchart of the method of FIG. 4.
FIG. 6 is a process flowchart of another method for passively attributing anonymous network events to users according to the present invention.
FIG. 7 illustrates an example computer useful for implementing components of the invention.
The present invention will be described with reference to the accompanying drawings. Generally, the drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.
DETAILED DESCRIPTION OF EMBODIMENT(S)
FIG. 1 is an example 100 that illustrates a method for passively attributing anonymous network events to users, according to an embodiment of the present invention.
Table 101 includes a list of network events that occurred over a pre-determined time interval over a network. Associated with each event in table 101 are an event type, a user, an attribution type, an Internet Protocol (IP) address, and a timestamp.
The event type represents an action performed by the event. For example, event 107 is associated with an action to send an email over the network.
The user represents an identity of a network user who is thought to have performed the event. Typically, an event is associated with a user with a given degree of certainty. In table 101, this is described by the attribution type of the event, which represents a level of confidence in the association between the event and its user. In example 100, events may be directly attributed, indirectly attributed, or un-attributed. Directly attributed events are attributed with high confidence to their associated users. For example, an event can be directly attributed to a user if it occurs within a network protocol session that is preceded by a successful user authentication. Indirectly attributed events are attributed with less confidence to their associated users but with enough confidence to be attributed. For example, an event can be indirectly attributed to a user by using certain indicators that suggest some confidence that the user performed the event. On the other hand, un-attributed events lack user attribution. Alternatively, events may be either attributed or un-attributed. In such an embodiment, each event may be associated with a user with a confidence level between 0 and 1. The confidence level is compared to a pre-defined threshold to determine whether the event is attributed or un-attributed.
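The two-state alternative described above can be sketched in Python as follows. This is only an illustration: the threshold value of 0.8, the function name, and the choice to treat a confidence equal to the threshold as attributed are assumptions, since the description says only that the threshold is pre-defined.

```python
from typing import Optional

# Hypothetical value; the description only says the threshold is pre-defined.
ATTRIBUTION_THRESHOLD = 0.8


def attribution_status(user: Optional[str], confidence: float,
                       threshold: float = ATTRIBUTION_THRESHOLD) -> str:
    """Classify an event as "attributed" or "un-attributed".

    `confidence` is the confidence level (0 to 1) with which the event is
    associated with `user`. Treating confidence == threshold as attributed
    is an assumption of this sketch.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if user is not None and confidence >= threshold:
        return "attributed"
    return "un-attributed"
```

Under these assumptions, an event associated with a user at confidence 0.95 is treated as attributed, while the same event at confidence 0.3 is treated as un-attributed and becomes a candidate for the attribution methods described below.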
Referring back to table 101, the IP address associated with an event represents the IP address where the event originated or is performed. The timestamp represents the time of occurrence of the event.
According to an embodiment of the present invention, events that are not directly attributed undergo a process by which they become directly attributed to a user. In the case of indirectly attributed events, the attribution process usually labels the events with the identity of the same users to which the events were indirectly attributed. Event attribution is described below with respect to an un-attributed event, though the same method is applicable to indirectly attributed events.
The method illustrated in example 100 seeks to attribute event 106, which is an un-attributed “search query”, to a network user. Accordingly, the pre-determined time interval spanned by the events in table 101 is set according to the timestamp associated with event 106. For example, the time interval is set so that it is centered around the timestamp associated with event 106. It is noted that, for ease of illustration, only nine events 102-110 are shown in table 101. In an actual implementation, table 101 may include a larger number of events, which, for example, may span several hours of network time.
The method in example 100 works by filtering the list of events contained in table 101 to generate a filtered event list 111. In the embodiment of example 100, table 101 is filtered according to IP address and attribution type so that only events with direct attribution and originating at the same IP address as event 106 are included in filtered event list 111 (in addition to event 106). Note, for example, that events 104, 105, and 108 are filtered out because they occur at a different IP address than where event 106 occurred. Similarly, events 103, 107, and 109 are filtered out because they are indirectly attributed to their associated users.
According to example 100, filtered event list 111 includes only two directly attributed events 102 and 110 that also occurred at the same IP address as event 106. Event 102 is directly attributed to User1. Event 110 is directly attributed to User2. As such, event 106 may be attributed to either User1 or User2. In an embodiment, the method attributes event 106 to the user associated with the nearest-neighbor event relative to event 106.
In example 100, the nearest-neighbor event relative to event 106 is determined by selecting the event in filtered event list 111 that is closest in time to event 106. Accordingly, event 110 is the nearest-neighbor event relative to event 106, and event 106 is attributed to User2. This is because event 110 is approximately 3 minutes apart from event 106, while event 102 is approximately 14 minutes apart from event 106. Alternatively, the nearest-neighbor event relative to event 106 is determined by selecting the event in filtered event list 111 that is closest in distance, measured in event count, to event 106. In example 100, however, event 106 is equidistant from events 102 and 110 (three events apart in both cases) and time proximity would need to be used to resolve the nearest-neighbor determination.
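The filtering and time-based nearest-neighbor selection of example 100 can be sketched in Python as shown below. The contents of table 101 are not reproduced in this text, so the event list is a hypothetical stand-in: the IP addresses, User3, the timestamps, and every event type other than "search query" and "email send" are assumptions, chosen only so that event 102 falls about 14 minutes before event 106 and event 110 about 3 minutes after it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    number: int            # reference numeral used in the description
    event_type: str
    user: Optional[str]    # None for un-attributed events
    attribution: str       # "direct", "indirect", or "none"
    ip: str
    t: float               # seconds from an arbitrary origin


# Hypothetical stand-in for table 101 (values are illustrative only).
TABLE_101 = [
    Event(102, "login", "User1", "direct", "10.0.0.5", 0.0),
    Event(103, "print", "User1", "indirect", "10.0.0.5", 150.0),
    Event(104, "login", "User3", "direct", "10.0.0.9", 300.0),
    Event(105, "web request", "User3", "indirect", "10.0.0.9", 450.0),
    Event(106, "search query", None, "none", "10.0.0.5", 840.0),
    Event(107, "email send", "User2", "indirect", "10.0.0.5", 900.0),
    Event(108, "web request", "User3", "indirect", "10.0.0.9", 930.0),
    Event(109, "print", "User2", "indirect", "10.0.0.5", 960.0),
    Event(110, "login", "User2", "direct", "10.0.0.5", 1020.0),
]


def filter_direct_same_ip(events: List[Event], target: Event) -> List[Event]:
    """Keep directly attributed events that share the target's IP address."""
    return [e for e in events
            if e.ip == target.ip
            and e.attribution == "direct"
            and e.number != target.number]


def nearest_in_time(events: List[Event], target: Event) -> Optional[Event]:
    """Return the filtered event closest in time to the target, if any."""
    candidates = filter_direct_same_ip(events, target)
    if not candidates:
        return None
    return min(candidates, key=lambda e: abs(e.t - target.t))


target = TABLE_101[4]                      # event 106
neighbor = nearest_in_time(TABLE_101, target)
```

With this illustrative data the filtered list holds events 102 and 110, and `neighbor` is event 110, so event 106 would be attributed to User2, in line with example 100.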
FIG. 2 is an example 200 that illustrates another embodiment of the method of FIG. 1. Similar to example 100, the method in example 200 works by filtering the list of events in table 101 to generate a filtered event list 201. Note that table 101 is only filtered by IP address to retain only those events occurring at the same IP address as event 106. As such, filtered event list 201 contains both directly and indirectly attributed events (in addition to event 106, which is sought to be attributed).
The method of example 200 also attributes event 106 to the user associated with the nearest-neighbor event relative to event 106. However, in example 200, the nearest-neighbor event is determined by selecting the nearest event in distance to event 106 that is directly attributed, in a chronological ordering of the events in filtered event list 201. In other words, the method of example 200 considers the relative ordering of events in filtered event list 201 to determine the nearest-neighbor event relative to event 106. Alternatively, the nearest-neighbor event relative to event 106 is determined by selecting the event in filtered event list 201 that is closest in time to event 106.
As illustrated in FIG. 2, each event in filtered event list 201 is assigned a relative position denoted by an event number. The nearest-neighbor event is then determined by comparing the positions of directly attributed events relative to the position of event 106. In example 200, only events 102 and 110 are directly attributed. Event 102 is assigned event number 1 and is separated from event 106 (event number 3) by a single event. On the other hand, event 110 is assigned event number 6 and is separated from event 106 by two events. Accordingly, event 102 is closer in distance to event 106 than event 110 and is thus the nearest-neighbor event relative to event 106. As such, in example 200, event 106 is attributed to the same user, User1, as event 102.
In cases where the event being attributed is at an equal distance from the two nearest directly attributed events, other nearest-neighbor determination methods including time proximity can be used.
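A possible Python sketch of the position-based selection of example 200, including the time-proximity tie-break mentioned above, follows. The event list is a hypothetical stand-in for filtered event list 201: the timestamps and the event types other than "search query" and "email send" are assumptions, and only the chronological order and attribution types follow the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    number: int            # reference numeral used in the description
    event_type: str
    user: Optional[str]    # None for un-attributed events
    attribution: str       # "direct", "indirect", or "none"
    t: float               # seconds from an arbitrary origin


# Hypothetical stand-in for filtered event list 201 (same IP address only).
LIST_201 = [
    Event(102, "login", "User1", "direct", 0.0),
    Event(103, "print", "User1", "indirect", 150.0),
    Event(106, "search query", None, "none", 840.0),
    Event(107, "email send", "User2", "indirect", 900.0),
    Event(109, "print", "User2", "indirect", 960.0),
    Event(110, "login", "User2", "direct", 1020.0),
]


def nearest_by_position(filtered: List[Event], target: Event) -> Optional[Event]:
    """Pick the directly attributed event nearest in chronological position.

    Ties in position are broken by time proximity, as suggested above.
    """
    ordered = sorted(filtered, key=lambda e: e.t)
    position = {e.number: i for i, e in enumerate(ordered)}
    target_pos = position[target.number]
    direct = [e for e in ordered if e.attribution == "direct"]
    if not direct:
        return None
    return min(direct, key=lambda e: (abs(position[e.number] - target_pos),
                                      abs(e.t - target.t)))


target = LIST_201[2]                       # event 106
neighbor = nearest_by_position(LIST_201, target)
```

Here event 102 sits two positions from event 106 and event 110 sits three positions away, so `neighbor` is event 102 and event 106 would be attributed to User1, as in example 200.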
Note that examples 100 and 200 result in different attributions of event 106 based on the approach used for nearest-neighbor event determination. The invention is not limited to the example methods of FIGS. 1 and 2. As would be understood, other variations of nearest-neighbor determination can be used.
FIG. 3 is a process flowchart 300 corresponding to the methods of FIGS. 1 and 2. Process flowchart 300 begins in step 302, which includes filtering network events occurring over a pre-determined time interval according to IP address and/or event attribution type to generate a filtered event list. In an embodiment, other event characteristics can be used to filter network events in step 302.
In an embodiment, the filtering includes determining network events occurring within the pre-determined time interval that originate from the same IP address as the anonymous network event and/or that have direct and/or indirect attribution to associated users. Network events can be directly attributed, indirectly attributed, or un-attributed. As described above, a directly attributed event is one that is attributed to a given user with high confidence. This may be due to a successful authentication event, for example, such as a login. An un-attributed event is an anonymous event. Indirectly attributed events are those with some type of user context. For example, an “email send” event with the sender's email address in the email can be an indirectly attributed event.
The pre-determined time interval is selected according to a timestamp associated with the anonymous network event. In an embodiment, the time interval is centered around the timestamp associated with the anonymous event. The width of the time interval may be a function of the rate of occurrence of network events.
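The interval selection can be sketched in Python as follows. The description does not say how the width depends on the event rate, so the rule used here (widen the window until it is expected to hold a target number of events, clamped between a minimum and a maximum) and all of its constants are assumptions made purely for illustration.

```python
from typing import List, Tuple


def window_width(events_per_second: float,
                 target_event_count: int = 200,
                 min_width: float = 60.0,
                 max_width: float = 4 * 3600.0) -> float:
    """Hypothetical rule: a width (in seconds) expected to hold about
    `target_event_count` events, clamped to [min_width, max_width]."""
    if events_per_second <= 0:
        return max_width
    return max(min_width, min(max_width, target_event_count / events_per_second))


def events_in_window(events: List[Tuple[float, str]],
                     anchor_time: float,
                     width: float) -> List[Tuple[float, str]]:
    """Keep (timestamp, label) pairs inside a window centered on anchor_time."""
    half = width / 2.0
    return [e for e in events if anchor_time - half <= e[0] <= anchor_time + half]
```

For instance, `events_in_window(events, anchor_time=100.0, width=120.0)` keeps only the events whose timestamps lie between 40 and 160 seconds.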
Step 304 includes attributing the anonymous network event to a user associated with a nearest-neighbor event relative to the anonymous network event in the filtered event list.
In an embodiment, step 304 includes attributing the anonymous network event to a user associated with an event in the filtered event list having direct attribution and that is nearest in distance to the anonymous network event in a chronological ordering of the filtered event list. Attribution according to this embodiment is illustrated, for example, in FIG. 2.
In another embodiment, step 304 includes attributing the anonymous network event to a user associated with an event in the filtered event list having direct attribution and that is nearest in time to the anonymous network event. Attribution according to this embodiment is illustrated, for example, in FIG. 1.
In practice, events are attributed to users through user identifiers that are associated with the users. For example, the user “John Smith” may have an account userid of “jsmith” that is used to attribute events performed by the userid to the actual user. At the same time, emails sent from the email account “john.smith@some.company” are also events by the same user “John Smith”. It is important that these events are attributed to the same user identity and not be identified as performed by different users. In an embodiment, the different user identifiers (e.g., jsmith, john.smith@some.company, etc.) are normalized to a common form (e.g., jsmith) through the use of a lookup table that maps all the different identifiers associated with a user to this common form.
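A minimal Python sketch of such a lookup table is shown below, using the identifiers from the example above. The table name, the function name, and the case-insensitive matching are assumptions of this sketch, not details given in the description.

```python
from typing import Optional

# Maps every known identifier of a user to that user's common form.
IDENTIFIER_TO_COMMON_FORM = {
    "jsmith": "jsmith",
    "john.smith@some.company": "jsmith",
}


def normalize_identifier(identifier: str) -> Optional[str]:
    """Return the common form for an identifier, or None if it is unknown.

    Matching is case-insensitive and ignores surrounding whitespace,
    which is an assumption of this sketch.
    """
    return IDENTIFIER_TO_COMMON_FORM.get(identifier.strip().lower())
```

Both `normalize_identifier("jsmith")` and `normalize_identifier("John.Smith@some.company")` resolve to the same common form, so a login event and an "email send" event by John Smith are counted for the same user.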
FIG. 4 is an example 400 that illustrates another method for passively attributing anonymous network events to users according to an embodiment of the present invention.
The method in example 400 works by filtering the list of events in table 101 to generate a filtered event list 401. Note that table 101 is filtered, as in example 200, by IP address to retain only those events occurring at the same IP address as event 106. As such, filtered event list 401 contains both directly and indirectly attributed events (in addition to event 106, which is sought to be attributed).
The method then attributes event 106 to a user associated with an event in filtered event list 401, where that user maximizes an event attribution function. In example 400, there are only two distinct users, User1 and User2, that are associated with events in filtered event list 401. As such, the method of example 400 determines which of User1 or User2 results in a higher value of an event attribution function and attributes event 106 to that user. In an embodiment, if both event attribution function values are lower than a given threshold, the event remains un-attributed.
In an embodiment, the event attribution function value for a given user is related to the events associated with that user in filtered event list 401. For example, the event attribution function value may be a function of certain characteristics of those events including event type, event attribution type, and/or event proximity to the event being attributed (event 106 in example 400).
In example 400, events 102 and 103 are attributed to User1. On the other hand, events 107, 109, and 110 are attributed to User2. Each of users User1 and User2 has one directly attributed event associated with it, namely events 102 and 110, respectively. In an embodiment, the event attribution function value is calculated for a given user as a sum of the form:
$$\sum_{e_j \in S(u)} K(e_i, e_j) \qquad (1)$$
where $K$ is a kernel function, $e_i$ represents the event being attributed, and $S(u)$ is the sequence of events associated with that given user in the filtered event list.
The kernel function $K(e_i, e_j)$ calculates a value for a given event $e_j$ with respect to event $e_i$. In an embodiment, the kernel function factors in the event type, the event attribution type, and the time proximity of event $e_j$ relative to event $e_i$. For example, the kernel function may be of the form:
$$K(e_i, e_j) = \omega_j \, e^{-\gamma (t_i - t_j)^2} \qquad (2)$$
wherein $\omega_j$ represents a weight associated with event $e_j$ according to event type and/or attribution type, $t_j$ represents the time of occurrence of event $e_j$, $t_i$ represents the time of occurrence of the anonymous event, and $\gamma$ represents a width of the kernel function.
In an example implementation, an event $e_j$ is assigned a weight $\omega_j$ of 1.0 if it is directly attributed and of 0.9 if it is indirectly attributed. The weight correlates with the confidence level associated with the attribution of the event. Accordingly, in example 400, for $\gamma$ equal to $5 \times 10^{-5}$, the event attribution function value for User1 and User2 with respect to event 106, calculated according to equation (1), would be approximately equal to $4.7 \times 10^{-11}$ and 1.036, respectively. Event 106 is therefore attributed to User2. In another implementation, the weight of an indirectly attributed event also varies according to the event type.
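Equations (1) and (2) can be sketched in Python as shown below, using the weights of 1.0 and 0.9 and the value of γ given above. The timestamps, expressed in seconds, are hypothetical stand-ins for those of filtered event list 401, so the computed scores are not expected to reproduce the values quoted for example 400; only the ranking of the two users is illustrated. The function names and the tuple layout are likewise assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

GAMMA = 5e-5                                  # kernel width parameter from the text
WEIGHTS = {"direct": 1.0, "indirect": 0.9}    # weights from the text

# (user, attribution type, timestamp in seconds); timestamps are hypothetical.
LIST_401: List[Tuple[str, str, float]] = [
    ("User1", "direct", 0.0),       # event 102
    ("User1", "indirect", 150.0),   # event 103
    ("User2", "indirect", 900.0),   # event 107
    ("User2", "indirect", 960.0),   # event 109
    ("User2", "direct", 1020.0),    # event 110
]
T_106 = 840.0                                 # hypothetical time of event 106


def kernel(t_i: float, t_j: float, weight: float, gamma: float = GAMMA) -> float:
    """Equation (2): weight * exp(-gamma * (t_i - t_j)^2)."""
    return weight * math.exp(-gamma * (t_i - t_j) ** 2)


def attribution_scores(filtered: List[Tuple[str, str, float]], t_i: float,
                       gamma: float = GAMMA) -> Dict[str, float]:
    """Equation (1): sum the kernel over each user's events."""
    scores: Dict[str, float] = defaultdict(float)
    for user, attribution, t_j in filtered:
        scores[user] += kernel(t_i, t_j, WEIGHTS[attribution], gamma)
    return dict(scores)


def attribute(filtered: List[Tuple[str, str, float]], t_i: float,
              threshold: float = 0.0, gamma: float = GAMMA) -> Optional[str]:
    """Return the user with the largest score, or None if it is below threshold."""
    scores = attribution_scores(filtered, t_i, gamma)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


scores = attribution_scores(LIST_401, T_106)
winner = attribute(LIST_401, T_106)
```

Because the Gaussian kernel decays quickly with the squared time difference, the three User2 events within a few minutes of event 106 dominate the two User1 events that occurred more than ten minutes earlier, and `winner` is User2.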
FIG. 5 is a process flowchart 500 of the method of FIG. 4. Process flowchart 500 begins in step 502, which includes filtering network events occurring over a pre-determined time interval according to one or more of IP address and event attribution type to generate a filtered event list.
In an embodiment, the filtering includes determining network events occurring within the pre-determined time interval that originate from the same IP address as the anonymous network event and/or that have direct and/or indirect attribution to associated users. As described above, network events can be directly attributed, indirectly attributed, or un-attributed.
The pre-determined time interval is selected according to a timestamp associated with the anonymous network event. In an embodiment, the time interval is centered around the timestamp associated with the anonymous event. The width of the time interval may be a function of the rate of network events.
Step 504 includes attributing the anonymous network event to a user associated with an event in the filtered event list, wherein the user maximizes an event attribution function.
In an embodiment, step 504 includes calculating, for each user associated with an event in the filtered event list, an event attribution function value; and selecting a user having the largest event attribution function value to associate with the anonymous network event. In an embodiment, the event attribution function value, for each user, is related to events associated with the user within the pre-determined time interval. Further, the event attribution function value may be related to one or more of the event type, event attribution type, and event time proximity relative to the anonymous network event of the events associated with the user within the pre-determined time interval.
In an embodiment, the event attribution function value is calculated according to:
$$\sum_{e_j \in S(u)} K(e_i, e_j) \qquad (3)$$
wherein $e_i$ represents the anonymous network event, $S(u)$ represents a set of events associated with a given user in the filtered event list, and $K$ represents a kernel function.
In an embodiment, the kernel function K is according to:
$$K(e_i, e_j) = \omega_j \, e^{-\gamma (t_i - t_j)^2} \qquad (4)$$
wherein $\omega_j$ represents a weight associated with an event according to event type and/or attribution type, $t_j$ represents the time of occurrence of the event, $t_i$ represents the time of occurrence of the anonymous event, and $\gamma$ represents a width of the kernel function.
Directly attributed events are assigned greater weight than indirectly attributed or un-attributed events. In an embodiment, directly attributed events are assigned a weight of 1.0 and un-attributed events are assigned a weight of 0.0. Indirectly attributed events are assigned weights between 0 and 1 depending on event type. For example, indirectly attributed “print” events may be assigned a weight of 0.999, indirectly attributed “email send” events may be assigned a weight of 0.99, and “FTP” events may be assigned a weight of 0.9.
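The weighting scheme just described can be sketched in Python as a small lookup. The three per-type weights are the ones given above; the default weight for indirectly attributed event types that are not listed, and the function and table names, are assumptions of this sketch.

```python
# Weights for indirectly attributed events, by event type (from the text).
INDIRECT_WEIGHTS = {
    "print": 0.999,
    "email send": 0.99,
    "ftp": 0.9,
}

# Assumption: fallback weight for indirect event types not listed above.
DEFAULT_INDIRECT_WEIGHT = 0.9


def event_weight(attribution: str, event_type: str) -> float:
    """Return the kernel weight for an event.

    Directly attributed events weigh 1.0, un-attributed events 0.0, and
    indirectly attributed events a value between 0 and 1 by event type.
    """
    if attribution == "direct":
        return 1.0
    if attribution == "indirect":
        return INDIRECT_WEIGHTS.get(event_type.lower(), DEFAULT_INDIRECT_WEIGHT)
    return 0.0
```

A weight obtained this way can take the place of the weight term in equation (4), so that, for example, an indirectly attributed "print" event counts almost as much as a directly attributed event.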
FIG. 6 is a process flowchart 600 of another method for passively attributing anonymous network events to users according to the present invention. Process flowchart 600 begins in step 602, which includes filtering network events occurring over a pre-determined time interval according to IP address and/or event attribution type to generate a filtered event list.
Step 604 includes determining a first potential attribution user in the filtered event list, wherein the first potential attribution user is associated with a nearest-neighbor event relative to the anonymous network event in the filtered event list. In an embodiment, step 604 implements a method according to process flowchart 300 of FIG. 3.
Step 606 includes determining a second potential attribution user in the filtered event list, wherein the second potential attribution user maximizes an event attribution function. In an embodiment, step 606 implements a method according to process flowchart 500 of FIG. 5.
Step 608 includes attributing the anonymous network event to the first or second potential attribution user when the first and second potential attribution users are the same user. Alternatively, step 608 includes maintaining the anonymous network event un-attributed if the first and second potential attribution users are different or if the weight calculated for the un-attributed event using the event attribution function is less than a specified threshold.
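The decision made in step 608 can be sketched in Python as follows. The function name and argument layout are assumptions; the two candidate users are taken to come from steps 604 and 606, and the score is taken to be the event attribution function value computed for the second candidate.

```python
from typing import Optional


def combine_attributions(nearest_neighbor_user: Optional[str],
                         function_user: Optional[str],
                         function_score: float,
                         threshold: float) -> Optional[str]:
    """Attribute only when both approaches agree and the score is high enough.

    Returns the agreed user, or None to leave the event un-attributed.
    """
    if nearest_neighbor_user is None or function_user is None:
        return None
    if nearest_neighbor_user != function_user:
        return None
    if function_score < threshold:
        return None
    return nearest_neighbor_user
```

Requiring agreement trades coverage for confidence: an event such as event 106, for which the position-based nearest-neighbor approach of example 200 and the attribution function of example 400 pick different users, would remain un-attributed under this rule.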
Embodiments of the present invention such as methods according to process flowcharts 300, 500, and 600, for example, can be performed off-line or in real-time.
Embodiments of the present invention can be used by organizations to complement network-based intrusion detection systems (NIDSs), network forensic analysis tools (NFATs), and security information management systems (SIMSs). As noted above, NIDSs can only monitor network activity by IP address and would thus benefit from methods according to the present invention to increase their monitoring capabilities. Similarly, network forensic analysis tools that analyze system network packets and security information management systems that analyze events from security devices would benefit from methods according to the present invention. In both cases, knowing the identity of the user account associated with a given event helps provide analysts the information needed to effectively respond to the observed activity.
Example Computer Implementation
In an embodiment of the present invention, the system and components of the present invention described herein are implemented using well known computers, such as computer 702 shown in FIG. 7.
The computer 702 can be any commercially available and well known computer capable of performing the functions described herein, such as computers available from International Business Machines, Apple, Sun, HP, Dell, Compaq, Digital, Cray, etc.
The computer 702 includes one or more processors (also called central processing units, or CPUs), such as a processor 706. The processor 706 is connected to a communication bus 704.
The computer 702 also includes a main or primary memory 708, such as random access memory (RAM). The primary memory 708 has stored therein control logic 728A (computer software), and data.
The computer 702 also includes one or more secondary storage devices 710. The secondary storage devices 710 include, for example, a hard disk drive 712 and/or a removable storage device or drive 714, as well as other types of storage devices, such as memory cards and memory sticks. The removable storage drive 714 represents a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup, etc.
The removable storage drive 714 interacts with a removable storage unit 716. The removable storage unit 716 includes a computer useable or readable storage medium 724 having stored therein computer software 728B (control logic) and/or data. Removable storage unit 716 represents a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, or any other computer data storage device. The removable storage drive 714 reads from and/or writes to the removable storage unit 716 in a well known manner.
The computer 702 also includes input/output/display devices 722, such as monitors, keyboards, pointing devices, etc.
The computer 702 further includes a communication or network interface 718. The network interface 718 enables the computer 702 to communicate with remote devices. For example, the network interface 718 allows the computer 702 to communicate over communication networks or mediums 724B (representing a form of a computer useable or readable medium), such as LANs, WANs, the Internet, etc. The network interface 718 may interface with remote sites or networks via wired or wireless connections.
Control logic 728C may be transmitted to and from the computer 702 via the communication medium 724B. More particularly, the computer 702 may receive and transmit carrier waves (electromagnetic signals) modulated with control logic 730 via the communication medium 724B.
Any apparatus or manufacture comprising a computer useable or readable medium having control logic (software) stored therein is referred to herein as a computer program product or program storage device. This includes, but is not limited to, the computer 702, the main memory 708, the secondary storage devices 710, the removable storage unit 716 and the carrier waves modulated with control logic 730. Such computer program products, having control logic stored therein that, when executed by one or more data processing devices, cause such data processing devices to operate as described herein, represent embodiments of the invention.
The invention can work with software, hardware, and/or operating system implementations other than those described herein. Any software, hardware, and operating system implementations suitable for performing the functions described herein can be used.
CONCLUSION
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.