ANOMALY DETECTION FOR ASSISTED LIVING
The present invention is concerned with the interpretation of location and activity information, such as that collected by a device carried on the person. The use of such a device is proposed for monitoring the activities of vulnerable individuals, but is not limited thereto.
Technology has, over the years, increased the effective independence of vulnerable individuals. For instance, emergency alarms have been developed to enable an individual to live alone, without direct and intrusive monitoring by a carer. At first, such alarms needed to be hard-wired and actuated physically, such as by a pull-cord, but nowadays it is increasingly common to provide devices such as pagers which can be activated by an individual if a concern arises. Such a concern might be that the individual has suffered a fall and cannot get back on their feet, or is experiencing a health crisis such as chest pains (which might be symptomatic of a serious medical condition).
One drawback of all the techniques identified above is the need for physical activation by the individual. That is, the devices do not provide a monitoring facility in the normal sense: the individual concerned still needs to be in a physical condition to reach for and activate a device.
For instance, International Patent Application WO/2005/008914 describes a system for implementing mobile care-giving and so-called "intelligent assistance". The system comprises a mobile communications device configured to communicate with a base station. Various embodiments of the mobile communications device are illustrated, each including a variety of sensors, for monitoring environmental or patient oriented criteria. The mobile communications device refers data back to the base station.
However, alarm conditions are raised only in response to user actuation, as there is no facility for processing of information to determine, without user intervention, whether a situation has arisen which may give rise to concern for the wellbeing of the user. User actuation of an alarm may be designed ergonomically to be as simple as possible, in recognition of the fact that, in a personal health crisis, a user may be unable to attend to a complex electronic device. However, although a simple actuation may be provided, this may still be too much for a confused or unwell person to tackle.
Moreover, for an alarm to be raised, the user would need to be aware of the source of the alarm situation, and a user in the midst of erratic behaviour such as might be caused by a medical emergency may not be able to make a sound judgement in that regard.
It is therefore desirable to detect a set of circumstances which could be evidence of an alarm situation without the need for action by the user of a monitoring device or the like. It is appropriate in such circumstances to consider the teachings of other technical fields, to determine whether a suitable arrangement has ever been proposed.
Automatic pre-emptive fault detection is commonplace in purely mechanical systems, for example as discussed in US Patent Application US2009043441A1. That disclosure describes a system for predicting failures within a car. Of course, it will be appreciated that mechanical systems, with appropriate measurement equipment for collecting data from which fault conditions can be ascertained, are some distance from the field of the present invention. In the present field, it is highly desirable that the collection of data be sufficiently unobtrusive to avoid confusion or discomfort to an elderly or vulnerable human subject. Data collection techniques acceptable and suitable in a mechanical setting may be highly inappropriate and considered obtrusive in a human care setting.
As well as this question regarding data collection, the detection of anomalies in collected data is not fully taught in the prior art. Anomaly detection is an umbrella term for a range of related techniques that aim to analyse large quantities of data to find points or sequences of points that fall outside the patterns within the rest of the data.
US200510101250A1 describes a system that can track and collect information about a user's location and trigger an alarm based on a fixed set of rules. If the subject matter of that disclosure were implemented for monitoring the well-being of an elderly or vulnerable person, a rule might be written that an alarm should be raised if a user is immobile for 8 hours in a room other than a bedroom. Although this can be useful in limited circumstances, it is difficult to accommodate a patient's welfare within fixed, pre-designed rules of this nature.
Alternative approaches to the detection of anomalies must therefore be sought.
"Towards parameter-free data mining" (E. Keogh, S. Lonardi, and C. A. Ratanamahatana, Proceedings of the 10th ACM SIGKKD International Conference on Knowledge Discovery and Data Mining, ACM Press New York NY, USA 206-215) discloses a technique derived from Kolmogorov Complexity measures to compute an abstract difference metric between two sequences of symbols, called the Compression-based Dissimilarity Measure (CDM).
This metric is based on the information content of the sequences, and can be simply approximated using standard compression algorithms such as DEFLATE or bzip2, which have freely available implementations: zlib and libbzip2.
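By way of illustration only, the following Python sketch shows how compressed length can stand in for information content, using the standard-library zlib (DEFLATE) and bz2 (libbzip2) bindings. The helper name compressed_length and the sample sequences are assumptions made for this example and do not form part of the disclosure.

```python
import bz2
import random
import zlib


def compressed_length(data: bytes, method: str = "zlib") -> int:
    """Approximate the information content of `data` by its compressed size in bytes."""
    if method == "zlib":
        return len(zlib.compress(data, 9))  # DEFLATE, maximum compression level
    if method == "bz2":
        return len(bz2.compress(data, 9))  # Burrows-Wheeler based (libbzip2)
    raise ValueError(f"unknown compression method: {method}")


# A highly regular sequence (one short pattern repeated) and an irregular one of equal length.
regular = b"KLBH" * 500
rng = random.Random(0)  # fixed seed so the example is repeatable
irregular = bytes(rng.choice(b"KLBH") for _ in range(2000))

for method in ("zlib", "bz2"):
    print(method, compressed_length(regular, method), compressed_length(irregular, method))
```

With either compressor the regular sequence shrinks to far fewer bytes than the irregular one, reflecting its lower information content.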
It will be appreciated, by review of relevant literature in the field, that various CDMs have been proposed. Equally, compression based similarity measures (CSMs) have been discussed, such as in "Compression and Machine Learning: A New Perspective on Feature Space Vectors" (D. Sculley and Carla E. Brodley, Tufts University, 2006).
The present disclosure is not intended to imply any limitation as to the type of measure which can be used in connection with an implementation of the invention. Further, where reference is made to the term CDM in this document, the reader will understand that the alternative use of a CSM is not excluded.
It is important to note (and as appreciated in several of the above references) that any CDM will be an approximation of an abstract theoretical measure such as that proposed by Kolmogorov complexity theory. As far as is currently known, the exact value of such a measure cannot be computed, so the purpose of a CDM is to approximate the abstract theoretical measure using a particular specified compression algorithm. The better the compression algorithm is for the particular choice of data, the closer is the approximation. Once a particular compression method has been chosen, the value of the CDM for that method is exact, and there is no further approximation.
Aspects of the invention provide an installation, such as for use in a domestic environment, to collect approximate location data for a target. This data can then be analysed locally to spot unusual patterns of movement of the target. These could indicate sudden health problems or other changes in behaviour.
Unusual patterns in data, also considered as anomalies, can be determined in one embodiment on the basis of the information complexity of sequences of collected data.
A first aspect of the invention provides a method of monitoring a measurable criterion for the existence of an anomaly condition, comprising initially collecting a sequence of measurements of said criterion to provide a training sequence; in real time collecting a sequence of measurements of said criterion to provide a monitoring sequence; selecting, from said monitoring sequence, samples included in an observation interval, said samples making an observation sequence; determining a dissimilarity measure of said observation sequence with said training sequence; and on the basis of said dissimilarity measure, determining for the presence of an anomaly in said observation sequence.
A second aspect of the invention provides a method of monitoring a measurable criterion for the existence of an anomaly condition therein, comprising collecting a sequence of measurements of said criterion to provide a monitoring sequence; selecting, from said monitoring sequence, samples included in an observation interval, said samples making an observation sequence; determining a dissimilarity measure of said observation sequence with a stored training sequence; and on the basis of said dissimilarity measure, determining for the presence of an anomaly in said observation sequence.
The determining for the presence of an anomaly may comprise comparing said dissimilarity measure with a threshold, and making said determination on the basis of said comparing.
The method may include generating an alarm condition on the basis of determining that an anomaly is present.
The determining a dissimilarity measure may comprise compressing said observation sequence, compressing said training sequence, compressing a concatenation of said observation sequence and said training sequence, and comparing said compressed concatenation with the sum of the compressed training and observation sequences.
The comparing may comprise determining a ratio of said sum to said compressed concatenation, said ratio being a measure of dissimilarity.
The method may include providing a facility for user input in response to determining an anomaly and, in response to a user input action indicating that said determined anomaly is not representative of an actual anomaly, feeding back the observation sequence leading to said detection into said training sequence. The feeding back may comprise concatenating said observation sequence to said training sequence, to result in a new training sequence.
A third aspect of the invention provides apparatus for monitoring a measurable criterion for the existence of an anomaly condition, comprising: training data collection means for initially collecting a sequence of measurements of said criterion to provide a training sequence; measurement data collection means for collecting, in real time, a sequence of measurements of said criterion to provide a monitoring sequence; sampling means for selecting, from said monitoring sequence, samples included in an observation interval, said samples making an observation sequence; dissimilarity determining means for determining a dissimilarity measure of said observation sequence with said training sequence; and anomaly detection means for determining, on the basis of said dissimilarity measure, for the presence of an anomaly in said observation sequence.
A fourth aspect of the invention provides apparatus for monitoring a measurable criterion for the existence of an anomaly condition therein, comprising measurement data collection means for collecting a sequence of measurements of said criterion to provide a monitoring sequence; sampling means for selecting, from said monitoring sequence, samples included in an observation interval, said samples making an observation sequence; dissimilarity determining means for determining a dissimilarity measure of said observation sequence with a stored training sequence; and anomaly detection means for determining, on the basis of said dissimilarity measure, for the presence of an anomaly in said observation sequence.
The anomaly detection means may comprise comparison means for comparing said dissimilarity measure with a threshold, said anomaly detection means being operable to make said determination on the basis of said comparing.
The apparatus may comprise alarm means operable to generate an alarm condition on the basis of determining that an anomaly is present.
The dissimilarity determining means may comprise compression means operable to compress said observation sequence, to compress said training sequence, and to compress a concatenation of said observation sequence and said training sequence, and comparing means operable to compare said compressed concatenation with the sum of the compressed training and observation sequences. The comparing means may be operable to determine a ratio of said sum to said compressed concatenation, said ratio being a measure of dissimilarity.
The apparatus may include user actuable input means providing a facility for user input in response to determining an anomaly and wherein the apparatus is responsive to a user input action indicating that said determined anomaly is not representative of an actual anomaly, to feed back the observation sequence leading to said detection into said training sequence. In the event of said feeding back, the apparatus may be operable to concatenate said observation sequence to said training sequence, to result in a new training sequence.
Aspects of the invention may be implemented as a method performed on a bespoke apparatus, or a distributed system designed for the task, or as bespoke apparatus, as the case may be. Alternatively, the invention may be implemented by way of a suitable computer program product, for execution by an appropriate computer with general purpose communications facilities. The computer program product may be in the form of a storage medium, which may be an optical disk, magnetic medium, or readable mass storage device such as Flash memory or other ROM. It may also be in the form of a computer receivable signal. The computer program product may comprise a self contained executable program, or computer executable instructions to make use of expected program resources on the host device, i.e. a program making calls of library functions.
Data may be obtained either via RFID tags worn on the person or from mobile phone transmission signal strength, or from other sources.
A specific embodiment of the invention will now be described, with reference to the accompanying drawings, in which: Figure 1 illustrates a domestic communications installation in accordance with a specific embodiment of the invention, including a home gateway and a mobile station; Figure 2 schematically illustrates, in further detail, the home gateway of the installation illustrated in figure 1; and Figure 3 is a flow diagram of a process performed at the home gateway, in this specific embodiment, to detect the existence of an anomaly in use or behaviour of the mobile station in the installation.
As illustrated in figure 1, a home installation of an embodiment of the invention is shown, comprising a home gateway 10 and a mobile station 20. The mobile station is, in this embodiment of the invention, a mobile telephone, though a pager, a bespoke electronic device (such as an electronic tag to be fitted to a limb of a person to be monitored) or any other electronic means operable to support wireless communication with a base station can be used in alternative implementations.
The home installation, comprising the home gateway 10 and the mobile station 20, is illustrated in a schematic plan of a domestic premises. This is illustrative of one possible implementation of the invention.
The home gateway 10 is illustrated in further detail in figure 2. It comprises a processor 100, a random access memory 102 and a mass storage medium 104. A core gateway unit 106 is operable to establish communications with an external entity via a digital subscriber line, such as using asymmetric digital subscriber line (ADSL) technology. A wireless communications unit 108 (with associated antenna 109) is operable to establish wireless communication with other devices in its vicinity, thereby establishing its function as a femtocell.
A communications bus 110 provides a communication facility between the various other components of the home gateway 10.
The mass storage medium 104 stores a computer program which comprises computer executable instructions, executable by the processor 100, to enable use of the home gateway 10 in accordance with the present embodiment of the invention. In use, as is usual for the execution of a computer program, stored instructions are read from the mass storage medium 104 (which may, in relative terms, have a slow access speed) and loaded into the memory 102. This allows rapid execution of such instructions by the processor 100 as required. With appropriate division of the software into manageable portions of code, the time taken to load a portion of code from mass storage into working memory can be masked by the execution of the program itself. The process carried out by the processor 100 in accordance with the computer program will be described in due course.
The computer program may have been factory installed in the mass storage medium 104, or may alternatively have been downloaded to the home gateway 10 after supply to the user. The computer program could also be supplied on a physical medium such as a readable optical disk.
Moreover, as will be familiar to those skilled in the art, an installation disk, comprising an optical physical medium, could be supplied which stores computer executable instructions operable to cause a computer to install the computer program onto the home gateway 10. Such an installation disk could be executed on the home gateway (if it has optical disk reading facilities) or on another computer connected temporarily thereto (such as via an Ethernet connection). Such techniques will be known to the reader.
The present embodiment of the invention is designed to support a system in which a user has a home gateway (on which this embodiment of the invention can run) which may include femto-cell/RFID features which could support localisation information.
In this scenario a significant quantity of information could be collected locally. This would include phone call start times, call durations, incoming calls. In addition, transmission strength (of the signal emitted by the mobile station 20 as collected at the base station 10) could be used as an approximate measure of the location of the mobile station. This is information that is already collected by network operators, but could have more relevance to the user.
Supplementary location information could be supplied from RFID tags possibly attached to the mobile station 20, with tag reader hardware and software in the base station 10.
Securing such data could pose a problem, as it would be undesirable for an external observer to be able to intercept information identifying the presence and location of an individual. This might compromise the security of the individual or, indeed, the process in which the present arrangement is implemented. Therefore, standard encryption and authentication procedures for accessing any stored data, and authentication for those receiving updates, are essential. Certain encryption methods (a substitution cipher would be the simplest example) would not alter the complexity of the stored sequences, so it is possible to encode the data prior to analysis with no need ever to decode it.
An example implementation will now be described. This implementation focuses on an entirely location based application with a very coarse granularity (only locating the individual down to single rooms), but this could be extended to more complex situations and multiple individuals.
Each room within a house is given a unique symbol from a set S. It is then possible to map these symbols to strings of standard alphanumeric characters for implementation purposes. That is, while the set S might contain symbols such as 'A' or 'B', these could represent meaningful natural language descriptions of locations such as "living room", "kitchen" and so on. The location of the individual of interest is measured regularly.
For the present application, the interval between location measurements could be of the order of 2-5 minutes. This produces a sequence of symbols from S. The training and analysis of these sequences proceed as follows, with reference to figure 3.
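A minimal Python sketch of such a mapping follows. The room names, the letters assigned to them and the helper encode_locations are hypothetical, chosen only to show how a time-ordered series of room-level location readings can become a byte string of symbols from S, ready for compression.

```python
# Hypothetical room-to-symbol table; the rooms and letters are illustrative only.
ROOM_SYMBOLS = {
    "living room": "L",
    "kitchen": "K",
    "bedroom": "B",
    "bathroom": "H",
    "hallway": "W",
}


def encode_locations(readings: list[str]) -> bytes:
    """Turn a time-ordered list of room names into a byte string of symbols from S."""
    return "".join(ROOM_SYMBOLS[room] for room in readings).encode("ascii")


readings = ["bedroom", "bedroom", "bathroom", "kitchen", "kitchen", "living room"]
print(encode_locations(readings))  # b'BBHKKL'
```

Using single-byte symbols keeps the compressed lengths directly comparable between sequences; finer subdivisions of a room could simply be given further symbols.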
1. In step S1-2, a lengthy sequence of training data D is obtained that is thought to represent normal behaviour of the individual. This could be from a period when the individual is under an increased level of supervision, to ensure that no serious problems occur during this time.
2. An observation interval l is chosen, which is a number of observations sufficiently long to show a meaningful sequence of behaviours, yet short enough to allow appropriate intervention.
3. In step S1-4, a pre-processing stage is carried out in which the CDM of each subsequence of D of length l with the whole sequence D is computed. This gives information on the expected variation of this metric over sequences of normal behaviour. This only needs to be done once, or whenever the system needs to be recalibrated, such as for a change in the parameter l. The resulting values form a sample T, which can be summarised by mean, variance, maximum and minimum statistics if necessary. In particular, a threshold value t can be computed such that samples are considered anomalous if their CDM is above this level.
4. During the analysis phase, in step S1-6, a sliding window of the most recent l observations is kept and the CDM of each such window with D is computed. An anomaly is detected if this value is much larger than those obtained for the subsequences of D, i.e. if it is larger than t.
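Steps 3 and 4 above might be sketched in Python as follows, assuming that the symbol sequences are held as byte strings, that DEFLATE (via zlib) is the chosen compressor, and that the CDM is the ratio C(xy)/(C(x) + C(y)) of Keogh et al. The function names calibrate and monitor are hypothetical, and the threshold is supplied by the caller rather than derived here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): DEFLATE-compressed length in bytes."""
    return len(zlib.compress(data, 9))


def cdm(x: bytes, y: bytes) -> float:
    """CDM(x, y) = C(xy) / (C(x) + C(y))."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


def calibrate(training: bytes, l: int) -> list[float]:
    """Step 3: CDM of every length-l subsequence of D with the whole of D (the sample T)."""
    return [cdm(training[i:i + l], training) for i in range(len(training) - l + 1)]


def monitor(stream: bytes, training: bytes, l: int, threshold: float):
    """Step 4: slide a window of the last l symbols over the stream.

    Yields (index of newest symbol, CDM of window with D, whether it exceeds the threshold).
    """
    window = bytearray()
    for index, symbol in enumerate(stream):
        window.append(symbol)
        if len(window) > l:
            del window[0]  # drop the oldest observation
        if len(window) == l:
            score = cdm(bytes(window), training)
            yield index, score, score > threshold


# Illustrative use with a synthetic, strictly periodic training sequence.
D = b"BBBBHKKLLLKB" * 50
T = calibrate(D, 24)
threshold = max(T)  # one simple choice; a mean + k standard deviations rule is another
stream = b"BBBBHKKLLLKB" * 3 + b"HHHHHHHHHHHH"
for index, score, anomalous in monitor(stream, D, 24, threshold):
    if anomalous:
        print("possible anomaly ending at sample", index, round(score, 3))
```

Recomputing C(D) for every window is wasteful but keeps the sketch short; since D is fixed between recalibrations, its compressed length could be cached.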
Several tests are possible; for example, the CDM values of a sequence of such windows can be compared with the sample T using standard two-sample t-test techniques. Steps S1-8 and S1-10 denote the different responses to the two outcomes of this test.
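One way of applying such a two-sample test, sketched here on the assumption that the window CDM values and the sample T are plain lists of floats, is Welch's t statistic, a form of the two-sample t-test that does not assume equal variances. The function name welch_t and the critical value CRITICAL_T are assumptions for illustration; in practice the critical value would be chosen for the required significance level.

```python
import math
import statistics

CRITICAL_T = 3.0  # hypothetical decision level; choose for the required significance


def welch_t(recent: list[float], baseline: list[float]) -> float:
    """Welch's two-sample t statistic comparing recent window CDMs with the sample T.

    Positive values mean the recent windows are, on average, more dissimilar than normal.
    Each list needs at least two values, and they must not both be constant.
    """
    mean_diff = statistics.mean(recent) - statistics.mean(baseline)
    standard_error = math.sqrt(
        statistics.variance(recent) / len(recent)
        + statistics.variance(baseline) / len(baseline)
    )
    return mean_diff / standard_error


baseline_t = [0.60, 0.62, 0.61, 0.59, 0.63, 0.60]  # illustrative calibration sample T
recent_windows = [0.80, 0.78, 0.83, 0.81]          # illustrative CDMs of the latest windows
t_stat = welch_t(recent_windows, baseline_t)
print(round(t_stat, 1), "anomaly" if t_stat > CRITICAL_T else "normal")
```

A one-sided comparison is used because only unusually high dissimilarity suggests a departure from normal behaviour.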
With reference to step 2, the decision is taken on the basis of the context of the implementation. For instance, in the context of residential care of the elderly, chronically sick or vulnerable, it will be recognised that it is of no use to detect behaviour indicative of a heart attack several hours after the event itself. As such, the choice of observation interval could well depend on the type of medical scenario that the system is intended to detect. It should be noted that there are two parameters in the system, namely the sampling frequency f and the observation interval l. The sampling frequency f describes how often samples of location data are collected.
This disclosure makes no explicit recommendation on this, but between 2 and 5 observations per minute may be appropriate. The observation interval is the sliding window of samples that give sufficient information to enable detection of an anomaly, i.e. that something has happened to measured behaviour to give an indication that something is wrong.
There is no operational risk in sampling more often than necessary (a high f) or in choosing l too large. This would only result in wasted computational time and data storage; the predictions arising from such a configuration would be the same as for a shorter observation interval and less frequent sampling.
It will be apparent to the reader, from the present disclosure, that a sampling interval of thirty seconds (that is, two samples per minute) and an observation interval of 3 hours (that is, l = 360) would capture most location-based behaviour in a home. It will be appreciated that these figures are merely suggestions and do not imply any limitation on the scope of implementation of the present invention.
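The window length quoted above follows from dividing the 3-hour observation interval by the 30-second sampling interval:

```latex
l = \frac{3\ \text{h} \times 60\ \text{min/h}}{0.5\ \text{min per sample}}
  = \frac{180\ \text{min}}{0.5\ \text{min}}
  = 360\ \text{samples}
```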
Concerning step 3, since a CDM is an approximation of the abstract theoretical measure discussed in the literature on Kolmogorov complexity theory, an example of such a CDM and its calculation is described here. In this example, the CDM distance is defined as:

$$\mathrm{CDM}(x, y) = \frac{C(xy)}{C(x) + C(y)}$$

In the above equation, x and y are two sequences, C(x) denotes the length of the compressed version of sequence x, and xy denotes the concatenated sequence "x followed by y".
In practice, as an example, the compression algorithm chosen can be DEFLATE, as implemented in the zlib library, or the bzip2 algorithm, as implemented in libbzip2. Compression lengths would then be computed by compressing a byte sequence representing each symbol sequence and counting the length of the output.
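The definition above can be sketched in Python using the standard-library zlib and bz2 modules, which wrap the DEFLATE and libbzip2 compressors respectively. The names compressed_size and cdm, the maximum compression level 9 and the sample sequences are assumptions for this example.

```python
import bz2
import random
import zlib

COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),  # DEFLATE (zlib)
    "bz2": lambda data: bz2.compress(data, 9),    # Burrows-Wheeler (libbzip2)
}


def compressed_size(data: bytes, method: str = "zlib") -> int:
    """C(x): length in bytes of the compressed form of x."""
    return len(COMPRESSORS[method](data))


def cdm(x: bytes, y: bytes, method: str = "zlib") -> float:
    """CDM(x, y) = C(xy) / (C(x) + C(y)), where xy is x followed by y."""
    return compressed_size(x + y, method) / (
        compressed_size(x, method) + compressed_size(y, method)
    )


routine = b"BBHKKLLLKB" * 20
rng = random.Random(1)  # fixed seed so the example is repeatable
unrelated = bytes(rng.choice(b"BHKLW") for _ in range(200))
for method in COMPRESSORS:
    print(method, round(cdm(routine, routine, method), 3), round(cdm(routine, unrelated, method), 3))
```

Comparing a sequence with itself gives a value well below 1, because the second copy costs little extra to compress; for unrelated sequences the concatenation saves little, so the value tends towards 1. The exact figures depend on the compressor chosen, which is why a single method should be used throughout.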
With reference to step 4, an anomaly is stated as having been detected if the difference is "much larger" than those in the sample set T. The phrase "much larger" merits discussion. A simple method of evaluating this condition would compute the mean and standard deviation of the sample distances, and raise an alarm if the distance of the current interval is more than k standard deviations above the mean. The value of k could be increased to avoid false positives; a suitable initial choice would be k = 3.
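A minimal sketch of this rule in Python, using the standard-library statistics module, follows. It tests only the upper side (scores above mean(T) plus k standard deviations), on the assumption that unusually low dissimilarity is not a cause for concern; the function names and sample values are hypothetical.

```python
import statistics


def anomaly_threshold(sample_t: list[float], k: float = 3.0) -> float:
    """Upper limit mean(T) + k * stdev(T); window scores above it are treated as anomalous."""
    return statistics.mean(sample_t) + k * statistics.stdev(sample_t)


def is_anomalous(score: float, sample_t: list[float], k: float = 3.0) -> bool:
    """Raise an alarm only when the window's CDM lies more than k standard deviations above the mean."""
    return score > anomaly_threshold(sample_t, k)


sample_t = [0.60, 0.62, 0.61, 0.59, 0.63, 0.60]  # illustrative values only
print(round(anomaly_threshold(sample_t), 3))
print(is_anomalous(0.80, sample_t), is_anomalous(0.62, sample_t))
```

Raising k makes the rule more conservative, trading fewer false alarms for slower detection of genuine changes.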
An optional feedback mechanism may be included (not illustrated), whereby sequences detected as anomalous but which turn out to be false alarms can be fed back and appended to D to increase the range of behaviour considered non-anomalous. Such an update would require a re-computation of T, but this would be expected to occur only infrequently. The training data D could also be supplemented by every sequence that is judged to be non-anomalous, but this would increase the computation significantly, and would have less effect on the sample T. More generally, the present invention is concerned with automatic detection of deviation from common practice, from which anomalies can be identified and which can be processed and considered ("triaged") from a central location.
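The optional feedback mechanism might be sketched as follows. The class AnomalyModel and its method report_false_alarm are hypothetical names; the sketch assumes zlib-based CDM values and byte-string sequences, and simply appends a window confirmed as a false alarm to D before recomputing T in full.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): DEFLATE-compressed length in bytes."""
    return len(zlib.compress(data, 9))


def cdm(x: bytes, y: bytes) -> float:
    """CDM(x, y) = C(xy) / (C(x) + C(y))."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


def calibrate(training: bytes, l: int) -> list[float]:
    """Sample T: CDM of every length-l subsequence of D with D."""
    return [cdm(training[i:i + l], training) for i in range(len(training) - l + 1)]


class AnomalyModel:
    """Holds the training sequence D and its calibration sample T."""

    def __init__(self, training: bytes, l: int):
        self.training = training
        self.l = l
        self.sample_t = calibrate(training, l)

    def report_false_alarm(self, window: bytes) -> None:
        """Append a window confirmed as normal behaviour to D, then recompute T in full."""
        self.training += window
        self.sample_t = calibrate(self.training, self.l)


model = AnomalyModel(b"BBHKKLLLKB" * 20, l=30)
model.report_false_alarm(b"HHHHHHHHHH" * 3)  # e.g. a long but harmless stay in the bathroom
print(len(model.training), len(model.sample_t))
```

Because the window is concatenated directly to D, the recomputed T also includes a few subsequences that straddle the join; this is harmless in the sketch but could be avoided if it proved significant.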
The input device can be a fairly commonplace mobile phone, which could be worn on a lanyard or in a holster for easy access and to ensure that the location information is relevant.
More accurate location information that includes subdivisions of each room (such as that which could be given by an RFID tag) can be added to this method very simply.
The anomaly detection process is flexible enough to allow for different circumstances by altering monitoring frequencies and observation window lengths.
By using the implementation as set out above, operation involves a computation which is simple and relies on standard algorithms. This results in a low complexity computational requirement.
The above embodiment is but one example of the invention, and the specific features thereof are described to illustrate only, and are not intended to limit the scope of protection sought. Instead, the scope of the invention should be taken from the claims appended hereto, which may be interpreted at least in part in light of the description and drawings.