Detecting Malicious Communication Activity in Communications Networks
Field of the Invention
This invention relates to a method for detecting malicious communication activity between user devices in an electronic communications network. In particular, the invention relates to the detection of malware which propagates between user devices by sending electronic communications to contacts stored in infected devices.
Background of the Invention
Malicious software, often referred to as malware, is an ever present threat to the security of electronic networks across the globe. Since the first releases of relatively simple malware into the public domain in the early 1980s in the form of the so-called Apple viruses, hackers have persistently challenged network security developments, leading to increasingly sophisticated threats mirrored by further improvements in security.
Different types of malware exist. The three main types are viruses, worms and Trojan Horses. A computer virus attaches itself to a program or file so it can spread from one computer to another when an infected file is transferred. Much like human viruses, computer viruses can range in severity; some viruses cause only mildly annoying effects while others can damage hardware, software, or files. It is important to note that a virus cannot spread without a human action to keep it going. Worms are similar to viruses by design, and can be considered as a sub-class of viruses. Worms spread from computer to computer but, unlike viruses, have the ability to travel without any help from a human. Worms first appeared in the late 1990s and have become increasingly common. They may spread by scanning infected devices for stored contact details of other devices, such as email addresses stored in an address book or such like. The worms then send versions of themselves on to infect other devices, which can rapidly lead to system and/or network overload. Trojan Horses may at first glance appear to be useful software but can actually cause damage once installed or run on a computer. Unlike viruses and worms, Trojan Horses do not reproduce by infecting other files, nor do they self-replicate.
The spread of malware to a system can be controlled to a certain extent by the detection of the malware before it infects the system. Systems following such approaches commonly scan for signatures that correspond with known malware and quarantine or delete any incoming data containing such signatures before the malware can infect the system. International patent application WO 03/101037 describes one such system from Symantec(TM), a provider of anti-virus software applications. Here executions of computer malware are analysed to develop register signatures for the malware. The register signatures specify the set of outputs the malware produces when executed with a given set of inputs. A malware detection system holds a database of the register signatures, against which incoming files can be scanned. Such systems require the maintenance of malware signature databases and rely on such databases being up-to-date for system protection. The development of polymorphic malware, which has the ability to modify itself as it propagates, frequently challenges the effectiveness of such systems.
A different approach to malware control is to detect the operation of malware once it has already infected a system and then to apply counter-measures in order to reduce the malware spreading. Such an approach was adopted in European patent application EP 1280039, which describes an email client which serves to detect mass-mailing malware by detecting whether an email is being sent to more than a threshold number of addressees from within the address book of that email client, or whether more than a predetermined number of substantially identical emails are being sent by that email client. The sending of email messages to a substantial proportion of the addressees within an address book is a characteristic indicative of mass-mailing malware and preventative action can then be taken.
A server-based approach is taken in software provided by Network Associates Inc., called Outbreak Manager(TM). The software operates upon an email server to detect patterns of email traffic behaviour indicative of a malware outbreak and progressively to apply counter-measures against that outbreak. This activity necessarily places a data processing load upon the email server and can therefore degrade email service performance.
Malware outbreaks were initially concentrated on data networks such as the Internet, but in the first half of 2004, the first malware to propagate over a mobile telephone network was reported. The malware, known as Cabir, was designed to attack smartphones running the Symbian(TM) operating system. Cabir propagated by using the Bluetooth(TM) short-range wireless capabilities of an infected phone to scan for other vulnerable phones to send itself onto. Cabir was relatively easy to contain if users disabled the short-range wireless interfaces on their phones, except for communication with trusted parties. Since then, however, malware such as that known as CommWarrior has been designed to access the contacts list on infected mobile phones and mail copies of itself to victims in the form of multimedia messages.
It would therefore be desirable to provide a system for protecting user devices such as mobile phones and their associated networks from the propagation of malicious software.
Summary of the Invention
In accordance with a first aspect of the present invention, there is provided a method for detecting malicious communication activity between user devices in an electronic communications network, said method comprising the steps of receiving source and destination data for a first plurality of electronic communications made between user devices in said network, storing contact data for a plurality of user devices in said network, the contents of said contact data including identifying data for user devices in said network, said identifying data being derived from said received data, receiving further source and destination data for a second plurality of electronic communications made between user devices in said network, and analysing said further received data and said stored contact data for communication patterns indicative of malicious electronic communication activity between user devices in said network.
Hence, by use of the present invention, the propagation of malware in an electronic communications network can be detected in the network. Source and destination data from electronic communications made across the network can be used to derive contact data such as telephone numbers, email addresses and Internet Protocol (IP) addresses. These contact data are intended to model the contents of the actual contact lists stored on user devices, without the need to individually access each of the user devices themselves. Downloading the actual contact lists from each user device in the network would put undesirable strain on valuable network resources. Furthermore, providing for such remote access may itself create security issues, especially in a mobile telecommunications environment.
Once the models have been created, source and destination data from further electronic communications can be analysed for suspicious patterns of activity with reference to the contact list models. Malware which propagates by scanning infected user devices for contacts of other user devices and sending copies of itself on to infect those other devices can therefore be detected by employing the present invention. Preferably, the present invention is used in a data network such as the Internet to protect against malware propagating by email, or in a mobile telecommunications network to protect against malware propagating by text, multimedia messaging, file transfer (for example peer-to-peer) or email.
There is further provided apparatus and computer software adapted to perform the method of the present invention.
In accordance with a further aspect of the present invention, there is provided apparatus for use in detecting malicious communication activity between user devices in an electronic communications network, said apparatus being adapted to receive source and destination data for a first plurality of electronic communications made between user devices in said network, store contact data for a plurality of user devices in said network, the contents of said contact data including identifying data for user devices in said network, said identifying data being derived from said received data, receive further source and destination data for a second plurality of electronic communications made between user devices in said network, and analyse said further received data and said stored contact data for communication patterns indicative of malicious electronic communication activity between user devices in said network.
In accordance with a yet further aspect of the present invention, there is provided computer software for use in detecting malicious communication activity between user devices in an electronic communications network, said computer software being adapted to receive source and destination data for a first plurality of electronic communications made between user devices in said network, store contact data for a plurality of user devices in said network, the contents of said contact data including identifying data for user devices in said network, said identifying data being derived from said received data, receive further source and destination data for a second plurality of electronic communications made between user devices in said network, and analyse said further received data and said stored contact data for communication patterns indicative of malicious electronic communication activity between user devices in said network.
Further features and advantages of the invention will become apparent from the following description of preferred embodiments of the invention, given by way of example only, which is made with reference to the accompanying drawings.
Brief Description of the Drawings
Figure 1 is a diagram showing the storing of identifying data in contact lists for a mobile communications network arranged in accordance with an embodiment of the present invention.
Figure 2 is a diagram showing messages being sent from one user device during an analysis period for a mobile communications network arranged in accordance with an embodiment of the present invention.
Figure 3 is a diagram showing messages being sent from another user device during an analysis period for a mobile communications network arranged in accordance with an embodiment of the present invention.
Figure 4 illustrates an example address book model according to an embodiment of the invention.
Detailed Description of the Invention
Figure 1 shows an electronic communications network 30 according to an embodiment of the invention. In this embodiment, the electronic communications network 30 is a mobile telecommunications network and user devices 22, 24, 26 and 28 are able to communicate with each other via telephone calls, emails, Short Message Service (SMS) messages, Multimedia Messaging Service (MMS) messages, Instant Messaging (IM) chat messages or other such forms of communication. The user devices 22, 24, 26 and 28 shown in this embodiment are mobile telephones with circuit-switched and packet-switched communications capabilities. Such telephones may be analogue or digital telephones, and/or Voice over Internet Protocol (VoIP) telephones.
The invention also comprises a network node such as a security server 20, which is suitably connected (not shown) to the network 30. Security server 20 is responsible for the main data processing functionality of the invention in the form of the modelling and analysis stages which are explained below. Note, however, that this functionality may be carried out by more than one suitably connected server, and these servers (not shown) may be situated remotely from each other.
Each user device has some form of memory store which is populated with one or more contact lists for associates that the user of the device has communicated with in the past or may wish to communicate with in the future. Such a store may for example be in the form of an electronic address book containing identifying data, i.e. address data such as telephone numbers, email addresses, Internet Protocol (IP) addresses etc., and other data such as corresponding names, for the associates.
In preferred embodiments, the present invention includes a modelling stage, where the contents of these memory stores for a plurality of user devices in the network are modelled without actually accessing the memory stores on the user devices themselves. Accessing each user device would use up bandwidth, which is a valuable resource in a mobile telecommunications environment, and furthermore providing for such remote access may itself create security issues. Indeed, such access may not be available. In accordance with this embodiment of the invention, a contact list model is set up and stored at a network node such as the security server 20. The security server 20 receives source and destination data for a first plurality of electronic communications between user devices in the network. The security server 20 then stores contact data dependent on these received communications in order to build up a set of contact data for the electronic communications made by each user in the network. The set of contact data can be in the form of a plurality of contact list models, each associated with a different user device in the network.
The premise used here is that if a user uses their user device to initiate electronic communication with another user device, then the user initiating the communication is likely to have identifying data for the recipient user device stored in a contact list on their user device. The more a user contacts another user, the more likely that this is the case. A user would tend to store such data for easy retrieval, rather than having to enter the identifying data each time for every communication.
In one embodiment of the present invention, if a user has used their user device to initiate a predetermined minimum requirement in relation to contact with another user device, then data identifying the recipient user device is added to the contact list model associated with the initiating user device. The predetermined minimum requirement may for example be one, two, three or more instances of contact. Adding data identifying a recipient user device to a contact list model associated with a given user device may alternatively, or in addition, depend on the frequency with which the recipient user device has been contacted by the given device in a given period of time, the type of the communication, the time of the communication, or other such metrics.
A contact list model associated with a given user and their corresponding user device should then contain data identifying a number of user devices of associates with which that given user has initiated communication in the past. In this way the contact list modelling helps to build a picture of the pattern of electronic communications being made in the network. The modelled contact lists can then be used as an approximation to the contact lists stored in the actual user devices, hereinafter also referred to as actual contact lists. Malware active in the network which propagates by scanning for identifying data in contact lists on infected user devices and sending copies of itself on to other user devices in the network would then produce communication patterns which are recognisable from the contact list models. Once the contact list models have been set up, analysis of communications in the network for suspicious communication patterns, with reference to the contact list models, allows detection of malware activity.
The modelling stage of this embodiment of the invention, which involves the process of storing data identifying user devices in contact list models associated with other user devices, is now described by way of example, with reference to Figure 1. In Figure 1, a first user device 22 is identified by address data A, a second user device 24 is identified by address data B, a third user device 26 is identified by address data C and a fourth user device 28 is identified by address data D.
Table 58 represents the contact list models held in the security server 20. If the user device in question is initiating one or more communications, then the address data will be source address data, as shown in the second column of table 58. If the identified user device is the recipient of one or more communications, then the address data will be destination address data, as shown in the third column of table 58. Note that, in this embodiment, the address data is in the form of telephone dialling numbers, in particular Mobile Station International Subscriber Directory Numbers (MSISDNs). Tables 32, 34, 36 and 38 represent the respective actual contact lists stored in user devices 22, 24, 26 and 28. The rule used here to model the contact lists is the following: "Data identifying a user device is stored in the contact list model associated with a given user device, if the total number of electronic communications, where the given user device is the source and the other user device is the destination, is greater than or equal to a minimum threshold count α."
For the purposes of this example the threshold count α is set to two. The term electronic communications here can be taken to mean SMS messages, MMS messages, email messages and telephone calls and other suitable communication methods.
In this particular example, user devices 22, 24, 26 and 28 have exchanged a certain number of SMS messages, MMS messages and telephone calls illustrated by arrows in Figure 1. The key 40 shows which arrow types indicate which type of electronic communication.
Data identifying user devices 24 and 26 is stored in the contact list model associated with user device 22, because user device 22 has sent one MMS message 44 and made one telephone call 42 to user device 24, and has also sent two SMS messages 46, 48 to user device 26. Accordingly, identifying data B and C is stored in the contact list model associated with user device 22, as the number of communications made to each of these user devices is equal to the required threshold count of 2. This can be seen in the first row of contact list model table 58.
The contact list model for user device 24, as seen in the second row of contact list model table 58, can be seen to be empty since user device 24 has not sent any message or made any telephone calls to any other user devices in the network.
Since user device 26 has sent one SMS message 50 and one MMS message 52 to user device 22, data (A) identifying user device 22 is stored in the contact list model associated with user device 26, as the number of communications made to this user device is equal to the required threshold count of 2. This can be seen in the third row of contact list model table 58.
Although data identifying user device 24 is stored in the actual contact list of user device 26, data (B) identifying user device 24 is not present in the contact list model for user device 26. Indeed, user device 26 has not sent any SMS messages or made any telephone calls to user device 24, and has only communicated with user device 24 by sending one MMS message 60. This level of communication is below the required threshold count of 2.
Finally, the contact list model for user device 28 contains data (A) identifying user device 22, because user device 28 has made two telephone calls 54, 56 to user device 22, which equals the required threshold count of 2. This can be seen in the fourth row of contact list model table 58.
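By way of a purely illustrative sketch (in Python; the record layout, function name and comments are assumptions of this example rather than features of the invention), the modelling rule applied above can be expressed as a simple counting operation which reproduces the contact list models of table 58:

```python
from collections import Counter, defaultdict

ALPHA = 2  # minimum threshold count used in the Figure 1 example

# (source, destination) pairs for the first plurality of communications,
# as illustrated by the arrows in Figure 1 (A=device 22, B=24, C=26, D=28).
communications = [
    ("A", "B"),  # telephone call 42
    ("A", "B"),  # MMS message 44
    ("A", "C"),  # SMS message 46
    ("A", "C"),  # SMS message 48
    ("C", "A"),  # SMS message 50
    ("C", "A"),  # MMS message 52
    ("C", "B"),  # MMS message 60 (below threshold, so B is not modelled for C)
    ("D", "A"),  # telephone call 54
    ("D", "A"),  # telephone call 56
]

def build_contact_list_models(records, alpha=ALPHA):
    """Store destination data in the source device's contact list model once
    the number of communications to that destination reaches threshold alpha."""
    counts = Counter(records)
    models = defaultdict(set)
    for (source, destination), n in counts.items():
        if n >= alpha:
            models[source].add(destination)
    return models

models = build_contact_list_models(communications)
# Expected, matching table 58: A -> {B, C}, B -> empty, C -> {A}, D -> {A}
for device in "ABCD":
    print(device, sorted(models.get(device, set())))
```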
In this example, it can be seen that the contact list models do not provide an exact match to the actual contact lists stored on the user devices themselves, but only an approximation of the actual contact lists. However, the number of communications made here between the user devices is very low and in practice the number of communications used for modelling purposes would be higher, i.e. a more statistically valid sample size would be used, thus providing more accurate modelling.
Once contact list models have been derived and stored, an analysis stage is used in preferred embodiments of the invention to identify malicious communications activity caused by malware and to identify malware propagation through the analysis of further received source and destination data for a second plurality of electronic communications between user devices in the network. This analysis stage may be carried out at one or more network nodes such as security server 20 and if a suspicious communication pattern is detected at security server 20, then a malicious activity detection state can be triggered and counter-measures undertaken to contain the malware responsible.
In one embodiment of the invention, when a malicious activity detection state is triggered, a plurality of service providers or network operators may be sent a signal, for example some form of message, by security server 20, informing them of possible malicious activity. The message may include details of the possible malicious activity, such as the suspected source or mode of propagation of the malware.
Once informed of the malicious activity detection stage, service providers or network operators may initiate a variety of counter-measures in order to try to contain or stop the malware activity.
In an alternative embodiment of the invention, security server 20 itself may alone initiate counter-measures or in another alternative embodiment of the invention, both the server and also the service providers or network operators may initiate a combination of counter-measures.
What counter-measures are initiated may depend on what is known about the profile of the malware, i.e. how it propagates, how early it was detected, what devices or systems it is designed to attack, what damage it causes etc. Counter-measures may include quarantining one or more user devices or putting them on a black list for which no incoming or outgoing communications are authorised. The service which malware is suspected of using to propagate may be suspended for all user devices, or just for user devices suspected of being infected by malware. Other measures may for example involve transmitting a message informing a user that their user device may be infected by malware. Another counter-measure may be to contact user devices in the contact list model of possibly infected devices, informing them to be aware that they may receive malware in the near future and action to take if they do. Counter-measures may initially be concentrated on users with large contact list models as their user devices may be more likely to cause widespread propagation of the malware.
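A minimal, purely illustrative sketch of how such counter-measures might be organised at a security server is given below; the class name, method names and message wording are hypothetical and are not prescribed by the invention:

```python
class CounterMeasures:
    """Hypothetical counter-measure dispatcher for a security server."""

    def __init__(self):
        self.blacklist = set()            # devices with no communications authorised
        self.suspended_services = set()   # e.g. "MMS" suspended network-wide

    def quarantine(self, device_id):
        """Black-list a user device suspected of being infected."""
        self.blacklist.add(device_id)

    def suspend_service(self, service):
        """Suspend the service the malware is suspected of using to propagate."""
        self.suspended_services.add(service)

    def warn_user(self, device_id):
        """Message informing a user that their device may be infected."""
        return f"Device {device_id}: your device may be infected by malware."

    def warn_contacts(self, contact_model):
        """Inform devices in the contact list model of a possibly infected device."""
        return [f"Device {c}: you may receive a malicious message shortly."
                for c in sorted(contact_model)]
```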
The analysis stage involves looking for communication patterns that indicate a predetermined relationship between the further received data and the stored contact list model data. The predetermined relationship may be a required minimum correlation between the destination data for communications initiated by a user device and the identifying data stored in the contact list model associated with that device. The correlation may involve a predetermined minimum proportion of correspondence between the destination data for communications initiated by a user device and the identifying data for user devices stored in the contact list model associated with that user device.
If malware has infected a user device, the malware may access the actual contact list stored on the infected device and send out messages to all, a large number or a high proportion of the contacts it finds there. In the analysis stage, the contact list model for that user device can be consulted to see if the communication pattern for the messages indicates suspicious activity.
Note that, whilst the main contact list on a user device is the user's contact book, or address book, malware can also propagate by accessing other contact lists on a user device such as an outbox, a sent folder, a call history list or any other stored data that contains identifying data for other devices.
The analysis stage may involve identifying a plurality of a first type of suspect user device, which will be referred to herein as "malware suspect" user devices. A user device is identified as a malware suspect user device if a communication pattern indicative of malicious activity is detected in electronic communications initiated by that user device. As described above, such a communication pattern may be the user device sending out messages to all, a certain proportion or above a certain threshold of the contacts contained in the contact list model associated with that device.
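As a minimal sketch under stated assumptions (the 0.8 default proportion and the function name are illustrative values chosen for this example, not prescribed thresholds), this identification may be expressed as follows:

```python
def is_malware_suspect(destinations_contacted, contact_model, min_proportion=0.8):
    """Flag a device that has messaged at least `min_proportion` of the user
    devices identified in its contact list model during the analysis period.
    The 0.8 default is an illustrative value only."""
    if not contact_model:
        return False
    contacted = contact_model & set(destinations_contacted)
    return len(contacted) / len(contact_model) >= min_proportion
```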
The analysis stage may further involve identifying a plurality of a second type of suspect user device, which will be referred to herein as "propagation suspect" user devices. A user device is identified as a propagation suspect user device if it has a predetermined association with a malware suspect user device and is thus suspected of being involved in the propagation of the malware. The predetermined association may comprise a malware suspect user device initiating one or more electronic communications to a propagation suspect user device, i.e. a propagation suspect may be defined as a user device which has received one or more communications from a malware suspect during an analysis period. The predetermined association may alternatively, or in addition, comprise a propagation suspect user device being identified in the contact list model associated with the malware suspect user device.
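A corresponding illustrative sketch for identifying propagation suspects, assuming (purely for this example) that the per-device data is held in dictionaries keyed by device identifier:

```python
def propagation_suspects(malware_suspects, sent_during_analysis, contact_models):
    """Identify propagation suspects: devices that received one or more
    communications from a malware suspect during the analysis period, or
    that are identified in a malware suspect's contact list model."""
    suspects = set()
    for device in malware_suspects:
        suspects.update(sent_during_analysis.get(device, set()))
        suspects.update(contact_models.get(device, set()))
    return suspects
```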
The reasoning here is that if malware is effectively spreading, not only will infected user devices tend to send messages to many of the contacts stored in their actual contact lists in a short period of time, but some of these contacts may also then become infected. If the contact list model is a reasonable approximation of the actual contact list, then further received electronic communication data may be analysed for a predicted communication pattern based on the model. If the pattern is present in the further received electronic communication data, then it can reasonably be assumed, or at least suspected, that this is due to malware activity.
Malware suspect user devices can first be identified, accompanied by the identification of propagation suspect user devices. Which user devices are both malware suspect user devices and propagation suspect user devices can also then be identified. If this process reveals that the number of user devices which have been identified as both malware suspect user devices and also propagation suspect user devices is high compared to the number of malware suspect user devices, this would suggest malware activity in the network. Deciding what constitutes "high" here may involve a certain threshold ratio or other suitable metric.
In one embodiment of the invention, the analysis stage may be carried out over a certain period of time and the triggering threshold count α set accordingly. In a further embodiment of the invention, the analysis stage may be followed by a further modelling stage in order to update the contact list models using the further received data. In still further embodiments of the invention, the analysis stage may then continue for yet further received data, and so on in a repeating cycle as more and more electronic communication occurs in the network.
The updating of the contact list models in the modelling stage may involve a time-based sliding window over the communication data, or any other windowing of the data which could act to give less weight to older data and more weight to newer data, thus keeping the contact list models as relevant as possible. Other windowing of communication data may be employed, for example giving more weight to communications made during certain time periods in the day or on certain days. In an embodiment of the invention, the analysis stage may be performed substantially contemporaneously with the reception of data, i.e. the detection system may operate approximately in real-time and react to the passage of each electronic communication through the network.
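One possible realisation of such a time-based sliding window, given purely as an assumption rather than a prescribed implementation, is to retain only communication records whose timestamps fall within the window when updating the models (the 30-day window length is an illustrative value, and timestamps are assumed to be timezone-aware):

```python
from datetime import datetime, timedelta, timezone

def records_in_window(records, window=timedelta(days=30), now=None):
    """Keep only (source, destination, timestamp) records falling inside a
    sliding time window, so that older data drops out of the contact list
    models and newer data carries more weight."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    return [r for r in records if r[2] >= cutoff]
```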
In the case of an embodiment of the invention implemented in a mobile telecommunications network, the analysed data may be data extracted directly from the network signalling path, for example Signalling System #7 (SS7), GPRS Tunnelling Protocol (GTP) and/or Session Initiation Protocol (SIP) message data. The data may be extracted by an entity such as, in the case of SS7 signalling, an SS7 switch or, in the case of GTP signalling, a GPRS Support Node (GSN) or, in the case of SIP signalling, a Session Border Controller (SBC). Alternatively, a packet sniffer may be used to extract information from data packets being sent through the network in the signalling or media path.
In an alternative embodiment of the invention, the analysis stage may be performed offline following the modelling stage where contact list data is stored. The detection system may thus still determine if malware is propagating through the network, but only after an elapsed period of time.
In further alternative embodiments of the invention, a combination of both real-time and offline approaches may be used for the modelling and analysis stages. For example, the modelling stage may be performed offline and the analysis stage may be performed online or vice versa.
In the case of an embodiment of the invention implemented in a fixed or mobile telecommunications network, the data for analysis may for example be extracted from Call Data Records (CDRs), which are usually created at the end of calls and at other chargeable events such as messaging events. CDRs are used by network operators for monitoring telephone usage, billing reconciliation, network management and other such purposes. In this embodiment, such CDRs are also copied to the security server 20 for use in the modelling and/or analysis stages.
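CDR formats are operator-specific; purely for illustration, the sketch below assumes a simple comma-separated layout (a hypothetical format) and extracts the fields used by the modelling and analysis stages:

```python
from datetime import datetime

def parse_cdr_line(line):
    """Parse one hypothetical comma-separated CDR into the fields used by the
    modelling and analysis stages: source, destination, timestamp and event
    type. Real CDR layouts differ between network operators."""
    source, destination, timestamp, event_type = line.strip().split(",")[:4]
    return source, destination, datetime.fromisoformat(timestamp), event_type

# Example with fictional MSISDNs
record = parse_cdr_line("447700900001,447700900002,2007-01-15T09:30:00,SMS")
```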
The analysis stage, i.e. the process of analysing further received data in relation to stored contact data for communication patterns indicative of malicious electronic communication activity is now explained by way of an example. Figure 2 is a diagram of an example communication scenario, namely showing SMS messages initiated by user device 22 during an exemplary analysis period. Many of the elements in Figure 2 which are also present in Figure 1 are given the same labels in both figures and will not be described again here for the sake of brevity.
In Figure 2, arrows represent the SMS messages sent from user device 22 during the exemplary analysis period. Here no telephone calls were made and no MMS messages were sent, which is reflected in the key 40 only referring to SMS messages. It should be noted that MMS messages or emails may conceivably have been included in a similar manner to SMS messages in this example, but were excluded for the sake of simplicity of explanation.
The analysis of further data may differentiate between different modes of communication. For example, the analysis may exclude certain modes of communications which are included in the process of producing the contact list models. For example, a mode of communication may not be included in the analysis stage if it is known that using current technology, malware cannot be propagated using that mode, for example telephone calls.
Figure 2 shows that user device 22 has sent two SMS messages in total, 66, 68. These messages include one to user device 24 and the other to user device 26 respectively. As user device 22 has initiated communication with, i.e. has sent SMS messages to a predetermined number or proportion of the user devices identified in its contact list model (in this case, to all of those user devices identified in it), the detection system will identify user device 22 as a malware suspect user device. Indeed, a malware suspect user device may be defined as a user device which sends messages to more than a specified number or proportion of the user devices identified in its contacts list model during an analysis period. Since the contact list model associated with user device 22 contains data identifying user devices 24 and 26, both user devices 24 and 26 are identified as propagation suspect user devices.
The contact list model table 58 in Figure 2 has two additional columns 62, 64, which show that user device 22 has been identified as a malware suspect user device and also that user devices 24 and 26 have been identified as propagation suspect user devices.
Figure 3 is a diagram of the example communication scenario shown in Figure 2, but now also showing SMS messages initiated by user device 26 during the same exemplary analysis period. As with Figure 2, the only electronic communications shown in Figure 3 are SMS messages, by the same reasoning.
Figure 3 shows that user device 26 has sent three SMS messages in total, 70, 72 and 74. These messages include two messages 70, 74 to user device 22 and one message 72 to user device 24 respectively. As user device 26 has initiated communication with a predetermined proportion of its contact list model (in this case, to all of the user devices identified in it), the detection system has identified user device 26 as a malware suspect user device, as shown in the third column of table 58. Since the contact list model associated with user device 26 contains data identifying user device 22, user device 22 is identified as a propagation suspect user device, as shown in the fourth column of table 58.
During the exemplary analysis period, user devices 24 and 28 have not sent any messages to other user devices. Therefore user devices 24 and 28 are not identified as malware suspect user devices. To determine if malware is propagating, the detection system can then calculate the number or proportion of user devices which have been identified as both malware suspect user devices and propagation suspect user devices, compared to those user devices which have been identified as malware suspect user devices. If this number or proportion is larger than a threshold, say 75%, then a malicious activity detection state can be triggered to indicate that there is a detected probability that malware is propagating.
In the above example the number of user devices which have been identified as both malware suspect user devices and propagation suspect user devices is two and the number of user devices which have been identified as malware suspect user devices is two. The associated proportion in this example is thus 100% and a malicious activity detection state can be triggered.
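The calculation for this scenario can be followed end to end in the following illustrative snippet; the data structures mirror the Figures 2 and 3 example, while the variable names and the reuse of the 75% example threshold are assumptions of this sketch:

```python
# Contact list models from Figure 1 (table 58)
models = {"22": {"24", "26"}, "24": set(), "26": {"22"}, "28": {"22"}}

# Destinations of SMS messages sent during the analysis period (Figures 2 and 3)
sent = {"22": {"24", "26"}, "24": set(), "26": {"22", "24"}, "28": set()}

# Malware suspects: devices that messaged all of their modelled contacts
malware = {d for d, m in models.items()
           if m and len(m & sent[d]) / len(m) >= 1.0}

# Propagation suspects: devices associated with a malware suspect
propagation = (set().union(*(models[d] | sent[d] for d in malware))
               if malware else set())

both = malware & propagation
proportion = len(both) / len(malware) if malware else 0.0
print(sorted(malware), sorted(propagation), proportion)  # ['22', '26'] ['22', '24', '26'] 1.0
print("detection state triggered:", proportion >= 0.75)  # True
```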
The examples described above involve only four user devices. In reality, a network would have many more and application of the invention should be extended beyond the examples accordingly.
During the modelling stage of the invention, only a percentage of the actual address book of each user device may be modelled, for example η%.
Indeed, if a user has stored contact details of an associate in the address book of his device but never communicates with this associate, these contact details will never be added to the contact list model associated with his device. The mean value of η among users may be evaluated empirically by modelling the address books of user devices for which the content of their actual address book is known. A sample may for example be obtained from users who agree to provide the list of contacts stored in their address books, which can then be compared to the contact list models which have been generated.
Approximately η% of communications (for example messages) which are due to malware activity may typically be sent to recipient devices associated with contacts which are in the address book models that have been created during the modelling stage.
In alternative embodiments of the invention, a malware suspect may alternatively be defined as a user device whose communication pattern follows a formula as per the following inequality range:
α ≤ θ = (Communications sent to contacts in the Address Book Model) / (Total communications sent during the period analysed) ≤ λ

where α ≤ η ≤ λ.

The metric θ in the above inequality range gives a measure of how many communications are sent to contacts present in the address book models compared to the total number of communications sent.
If a user is infected by malware the value of θ may be expected to be close to the value of η, since η represents approximately the percentage of the address book of the user that has been modelled. If it is assumed that the malware is sending a message to all contacts stored in the address book, θ may be approximately equal to η. If the malware sends messages only to a subset of contacts stored in the address book, θ should also be approximately equal to η, since the malware may typically send to selected contacts which are chosen at random relative to the contact list model. However, since η is a mean calculated empirically, the detection mechanism may use two rules: α ≤ θ ≤ λ and α ≤ η ≤ λ, from which it can be seen that η and θ may be of approximately the same order.
Introducing the α parameter as a minimum in the above inequality range can help to avoid cases where a user device is identified as a malware suspect when it has initiated communication with too few of the contacts present in its address book model compared to the total number of communications it has initiated. Indeed, it may be assumed that the malware will not propagate only among the non-frequent contacts which will not be present in the address book model. Typical values for α may range from 1 to 10%.
Introducing the λ parameter as a maximum in the above inequality range can help to avoid the scenario where a user device is identified as a malware suspect when it has initiated communication with too high a proportion of the contacts present in its address book model compared to the total number of communications it has initiated. Indeed, it may be assumed that the malware will not propagate only among the frequent contacts which will be present in the address book model. Typical values for λ may range from 40 to 90%.
As an example, say η was evaluated as being η = 25% (i.e. the decimal value 0.25), and the limits in the above formula are set at α = 5% and λ = 50%. Further, a user device A has actual address book entries corresponding to other user devices B, ..., K as follows:
and an address book model as follows:
and communications (messages in this example) initiated by user device A during an analysis period are as follows:
A to B
A to G
A to I
A to J
A to F
In this case, the total number of communications (messages in this example) sent by user device A during the analysis period is 5 and the number of messages sent to contacts in the address book model of user device A is 2. Therefore the numerator in the above metric is 2 and the denominator is 5, such that 5% ≤ θ = 2/5 ≤ 50%. As 2/5 (= 0.4) lies between 5% (0.05) and 50% (0.5), user device A is identified as a malware suspect according to the definition in this alternative embodiment of the invention. This alternative definition of a malware suspect may be used instead of, or in addition to, the definition of a malware suspect in other embodiments of the invention described herein.
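This alternative test can be expressed compactly as follows. Because the full address book tables are not reproduced above, the model contents shown in the snippet are a hypothetical choice consistent with the worked example (exactly two of the five destinations are modelled contacts):

```python
def theta(sent_destinations, address_book_model):
    """Proportion of sent communications addressed to contacts present in
    the address book model."""
    if not sent_destinations:
        return 0.0
    in_model = sum(1 for d in sent_destinations if d in address_book_model)
    return in_model / len(sent_destinations)

def is_malware_suspect_theta(sent_destinations, address_book_model,
                             alpha=0.05, lam=0.50):
    """Alternative malware suspect test: alpha <= theta <= lambda."""
    return alpha <= theta(sent_destinations, address_book_model) <= lam

sent = ["B", "G", "I", "J", "F"]
model = {"B", "G", "C", "D"}                  # hypothetical model contents
print(theta(sent, model))                     # 0.4
print(is_malware_suspect_theta(sent, model))  # True: 0.05 <= 0.4 <= 0.50
```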
In further alternative embodiments of the invention, additional information relating to communication patterns between user devices in the network may be employed. This additional information can be used to identify a further type of user device which exhibits suspicious qualities, i.e. whose communication patterns indicate suspicious communication behaviour. User devices identified as such will be referred to herein as "behaviour suspects".
Use of communication behavioural information may involve considering which communication service is used for the communications, for example an email message service, SMS message service, MMS message service, telephone call service, chat message service and/or file transfer service.
Use of communication behavioural information may involve considering the type of the contact data, for example whether the contact data relates to a mobile telephone number type, a premium rate number type, a fixed line number type or such like.
The frequency with which each user device communicates using each service (calls, MMS or SMS, etc.) may be employed, for example the most and least frequently used service. This could also involve identifying the frequency of communications to each recipient device for each service. As an example, a user device may frequently communicate with a particular recipient by telephone call only. If that user device was then to send one or more SMS messages to that particular recipient, this could indicate suspicious communication behaviour.
Use of communication behavioural information may also involve analysing communication patterns over different time intervals. This may help to determine intervals in which a user device is more likely to legitimately communicate with one or more recipient devices. This may help in differentiating communications due to malware activity from legitimate communications and hence help to identify behaviour suspects.
An example of identifying behaviour suspects is now described in relation to Figure 4. Table 1 in Figure 4 shows an address book model for a source user device with an exemplary telephone number of 01700000000. As seen in Table 1, this source user device currently has three contacts in its address book model with respective telephone numbers 01711111111, 01722222222 and 01233333333, as shown in the second row of the table. The third row of Table 1 shows the type of the telephone numbers of the contacts (mobile, fixed line, premium rate numbers).
The fourth row of Table 1 shows a breakdown of the different types of communication (telephone calls, SMS messages and MMS messages) initiated by the source user device.
The fifth row of Table 1 shows a breakdown of the different time intervals in which communications were initiated by the source user device.
The exemplary time intervals here are midnight till 7am, 7am till 12 midday, 12 midday till 7pm and 7pm till midnight, but other suitable numbers of intervals and times may equally be employed.
Thus, the source user device can be seen to have made 17 calls to Contact 1, 4 calls and 10 SMSs to Contact 2, and 5 calls to Contact 3. It can also be seen that 10 of the 17 calls to Contact 1 were made between 7am and 12 midday and the other 7 calls to Contact 1 were made between 12 midday and 7pm, etc.
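Purely for illustration, the behavioural information of Table 1 may be held per source device in a nested structure such as the following; the field names are assumptions, the number types are inferred from the example prefixes, and only the time-interval breakdown stated above for Contact 1 is filled in:

```python
# Hypothetical per-source behavioural profile mirroring Table 1 in Figure 4.
profile_01700000000 = {
    "01711111111": {
        "number_type": "mobile",              # assumed
        "by_service": {"call": 17, "SMS": 0, "MMS": 0},
        "by_interval": {"00-07": 0, "07-12": 10, "12-19": 7, "19-24": 0},
    },
    "01722222222": {
        "number_type": "mobile",              # assumed
        "by_service": {"call": 4, "SMS": 10, "MMS": 0},
        "by_interval": {},                    # breakdown not reproduced above
    },
    "01233333333": {
        "number_type": "fixed line",          # assumed from the prefix
        "by_service": {"call": 5, "SMS": 0, "MMS": 0},
        "by_interval": {},                    # breakdown not reproduced above
    },
}
```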
The type of the contact's telephone number may also be used to detect abnormal behaviour. Indeed, a user device which tries to send an SMS to a fixed line number could be suspicious. It should be noted that it may be possible, either now or in the future, to send and receive SMS and MMS messages via fixed line telephones, in which case the above analysis may be adjusted accordingly.
Additional behavioural information such as that exemplified in Table 1 in Figure 4 may be used to define more complex and sophisticated models which may help to improve identification of behaviour suspects during the malware activity identification process.
Other embodiments of the invention may include defining a number of alarms which may be employed to identify behaviour suspects in the network. Examples of such alarms include the following:
• Fixed destination alarm: raised when a user device sends one or more messages (SMS and/or MMS) to a fixed line telephone number. If it becomes possible in the future for fixed line telephones to receive SMS and/or MMS messages then this alarm may be modified accordingly.
• Higher usage alarm: raised when a user device has a messaging usage (SMS or MMS) which is higher during an analysis period than during a modelling period, for example 15 times higher.
• New messaging service user alarm: raised when a user device begins to use a messaging service (SMS and/or MMS) during an analysis period, when no such service was used during a modelling period.
• Time alarm: raised when a user device exhibits, during an analysis period, a higher communication activity pattern than the average exhibited during a modelling period, for example 10 times higher.
A user device may be considered to be a behaviour suspect according to the above alarms alone or in combination.
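The alarms listed above could be checked along the following lines; the per-period statistics dictionaries, the function signature and the use of the 15 times and 10 times example factors as fixed multipliers are assumptions of this sketch:

```python
def check_alarms(modelling, analysis, fixed_line_numbers):
    """Return the behaviour-suspect alarms raised for one user device.
    `modelling` and `analysis` are assumed dicts of per-period statistics."""
    alarms = []
    # Fixed destination alarm: message sent to a fixed line telephone number
    if any(dest in fixed_line_numbers for dest in analysis["message_destinations"]):
        alarms.append("fixed destination")
    # Higher usage alarm: messaging usage e.g. 15 times higher than when modelling
    if (modelling["messages_sent"] > 0 and
            analysis["messages_sent"] >= 15 * modelling["messages_sent"]):
        alarms.append("higher usage")
    # New messaging service user alarm: messaging only used in the analysis period
    if modelling["messages_sent"] == 0 and analysis["messages_sent"] > 0:
        alarms.append("new messaging service user")
    # Time alarm: activity e.g. 10 times higher than the modelling-period average
    if analysis["communications_sent"] >= 10 * modelling["average_communications"]:
        alarms.append("time")
    return alarms
```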
Embodiments of the invention described previously included identifying a user device as being infected with malware when it is both a malware suspect and a propagation suspect. If communication behavioural analysis is employed, a user device may be identified as infected with malware if it is both a malware suspect and a behavioural suspect. This may involve determining the number of user devices which are both malware suspect user devices and behaviour suspect user devices in comparison with the number of user devices which are malware suspect devices.
The above embodiments are to be understood as illustrative examples of the invention. Further embodiments of the invention are envisaged.
The propagation of types of malware other than those which have the ability to propagate by themselves can also be detected by use of the present invention. Say a user receives a communication on their user device which tells the user to browse to a certain internet location and download a certain file. If the user then follows these instructions, the downloaded file may contain a copy of a Trojan Horse which can then infect their user device. Although the malware does not propagate itself directly, the Trojan Horse has managed to trick the user into helping it propagate. Such communication behaviour can be detected during the analysis stage of the present invention.
The present invention may also be applied to detect malware which does not propagate purely by targeting contact lists stored on user devices. The malware may propagate in close proximity by looking for other user devices to infect over a short-range wireless interface such as Bluetooth(TM). If the malware also sends MMS messages, to propagate or for any other purposes, to members of contact lists over a mobile telephone network, then the present invention can detect such communication behaviour and hence help control further propagation of the malware.
The present invention is also able to detect malware which propagates via the transfer of data files over a communications network. Such transfers may involve use of the File Transfer Protocol (FTP), Secure Shell File Transfer Protocol (SFTP), File Service Protocol (FSP), or other such suitable protocols. The transfers may be part of file sharing arrangements and may involve peer-to-peer or client-server transfer models or hybrids thereof.
It should be noted that the particular rule and threshold count described in relation to Figure 1 were chosen for reasons of explanation. Other rules and thresholds are envisaged, which may vary depending on various factors, including the network type, the number of users in the network and even the type of user devices themselves.
Further, the exact levels at which a malicious activity detection state is triggered may also change according to these or other factors. Other metrics may also be used for this triggering, i.e. comparing identified malware suspect and propagation suspect user devices with identified malware suspect user devices is not the only metric that may be applied. For example, the number of user devices which are identified as malware suspect user devices may be used as another indicator of malicious activity. The analysis may include looking for a high number of calls to the same (or similar) telephone number that is known to be used by malware, possibly at the same (or similar) times, from more than one device which may have been identified as a malware suspect. This may also include looking at calls to such numbers from user devices which have been identified as propagation suspect user devices.
The above description of the modelling and analysis stages of the invention concentrated on source and destination data extracted from communications in the network. Data associated with the timing of the communications may also be used in either or both of the modelling and analysis stages, for example to produce time dependent thresholds. The analysis may also consider the time sequence of communications; for example, if one communication was performed shortly after another, this may indicate that one was sent as a result of the other, possibly due to malware activity. The time sequence analysis may be used to help identify malware suspect user devices and propagation suspect user devices.
Data associated with the size of communications may also be factored into either or both of the modelling and analysis stages. For example, if several communications are all of the same size, or over a certain size, this may be an indicator of malware activity.
Probabilistic techniques may be used in the present invention, possibly involving different techniques for the modelling and analysis stages. This may involve assigning different probabilities to different communications or combinations of communications, possibly using Bayesian techniques. For example, probabilities may be different for a telephone call than for an SMS message, or could vary with the time of day. A further example may be that if a communication is responded to somehow, for example by a later call or message in the opposite direction, then higher probabilities could be assigned.
In order to reduce the number of falsely triggered malicious activity detection states, some form of de minimis rule may be applied. For example, a user device may only be identified as a malware suspect user device if the number of user devices identified in the contact list model associated with that user device is larger than a minimum threshold. This may avoid suspecting that a user device is potentially infected by malware merely because it has sent a message to the only user device stored in its contact list model.
The contact list model associated with a user device may be limited to a certain data size or may only hold data identifying a certain number of other user devices. This may for example include only storing data identifying the most frequently contacted user devices or a statistically valid sample of user devices. This limiting of contact lists may serve to keep memory requirements within manageable levels and help reduce associated hardware costs.
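Both of these refinements can be sketched briefly; the minimum model size of 3 and the cap of 100 entries below are hypothetical values chosen only for illustration:

```python
from collections import Counter

MIN_MODEL_SIZE = 3    # de minimis threshold (illustrative value)
MAX_MODEL_SIZE = 100  # cap on stored contacts per device (illustrative value)

def passes_de_minimis(contact_model):
    """Only treat a device as a potential malware suspect if its contact
    list model identifies more than a minimum number of other devices."""
    return len(contact_model) > MIN_MODEL_SIZE

def cap_model(contact_counts: Counter, max_size=MAX_MODEL_SIZE):
    """Keep only the most frequently contacted devices in the model."""
    return {dest for dest, _ in contact_counts.most_common(max_size)}
```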
The above discussion refers primarily to an embodiment of the invention where the communications network is a mobile telecommunications network.
However, in other embodiments of the invention, the invention may alternatively, or additionally, be applied to other types of user devices such as personal computers (PCs) or personal digital assistants (PDAs) which can be connected to a data network such as the Internet, or a combination of such networks and devices. The communications may be made using emails, VoIP services, Session Initiation Protocol sessions or instant messaging services such as Yahoo! Messenger(TM). In these embodiments of the invention, data for both the offline and real-time analysis stages of the invention may be accessed via one or more servers, which may be dedicated email servers, or from packet switching routers or such like. The detection system may then be applied to the detection of malware propagating for example by email which uses address books of email clients stored on personal computers or such like.
The system may also involve a combination of such networks, for example in tackling malware propagating via VoIP technology, or possibly using wireless access point networks such as WiFi.
The processing for the detection system may be distributed over more than one network node or server and should not be limited to the single security server 20 as shown in the figures.
If it becomes known that a particular piece of malware is on the loose that targets a particular communication type, for example one which only propagates using MMS messages, it is envisaged that the various parameters in the detection system may be altered accordingly to tune the system more finely against attacks from such malware.
The above discussion has assumed that the actual contact lists being modelled in the modelling stage are stored on the user devices themselves, but this may not always be the case. A user may have a contact list that is remotely stored somewhere in the network. The user may then access the contact list from more than one type of device and thus has the convenience of keeping just one contact list up-to-date, instead of a separate contact list for each device. The invention is also applicable to this type of scenario, since if the user can access the contact list remotely, then malware may be able to do so as well.
Note that, whilst in the above-described embodiments the historical contact data held in the security system is in the form of contact list models which are individual to particular user devices, the historical contact data may be held in other forms, such as a searchable list of source and destination data for communications made by all users.
The process of identifying user devices as being infected with malware on the basis of malware suspects and propagation suspects is described above. The process of identifying user devices as being infected with malware on the basis of malware suspects and behaviour suspects is also described above. Either of the two processes may be used alone, or alternatively both processes may be used in combination. It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.