Detailed Description
The present invention provides methods of collecting affective information as a particular user views a scene, interpreting that information, and associating it with a captured image of the scene for later use of the affective information and the image together with information derived from the image. The present invention also provides methods of acquiring and correlating available non-image data related to an image in a manner useful for personal health care and safety.
Information representing the user's psychological, physiological, and behavioral reactions to a particular scene or image of a scene is referred to herein as affective information. Affective information includes raw physiological and behavioral signals (e.g., galvanic skin response, heart rate, facial expressions, etc.) as well as their psychological interpretation (e.g., likes, dislikes, etc.) and the emotional classifications associated with them (e.g., fear, anger, happiness, etc.). When the psychological reaction of the user changes, the user's affective information changes as well. Such psychological reactions occur, for example, when the user suddenly sees a dangerous accident, a deliberate action, or a beautiful landscape.
Interpretation of affective information can provide several levels of user preference (e.g., the degree to which the user likes or dislikes the scene). It can also provide an indication of the relative importance of the scene to the user. Furthermore, affective information can be analyzed in terms of the specific emotion triggered by the scene (such as happiness, sadness, fear, anger, etc.).
A scene is defined as what the observer sees. It may refer to a place where an action or event occurs, a collection of objects seen by a viewer, a series of actions and events, a landscape or a portion of a landscape, etc. A scene recorded or displayed by an image capture device is referred to as an image of the scene. Examples of image capture devices include digital cameras, hand-held video cameras, wearable cameras, cameras traditionally used to record still or moving pictures on film, analog video cameras, and the like. The user can view the scene directly, through the viewfinder of the camera, or through a camera preview screen that serves as a viewfinder.
The term image as used herein includes, but is not limited to, still images, moving images, multi-perspective images such as stereoscopic images or other depth images, and other forms of immersive still and moving images.
The information derived from the image of the scene relates to knowledge about the scene, such as location, type of location, description and classification of events, and knowledge about scene composition, e.g. color, objects, people, etc., that can be extracted from the scene image.
Non-image data refers to other types of information available in connection with an image. Examples of non-image data related to an image are the date and time of the captured image provided by the camera.
People capture images of various scenes for different purposes and applications. Capturing memorable events is one example of an activity common to ordinary people, professional photographers, and journalists. These events are meaningful or emotionally important to an individual or a group of individuals. Images of such events attract special attention, elicit recollection, trigger emotions, or, in general terms, produce psychological reactions. These psychological reactions are often accompanied by physiological and/or behavioral changes.
Affective tagging is defined as the process of determining and storing affective information together with an image of a particular scene. When the affective information is stored together with a user identifier, it is referred to herein as "personal affective information". The user identifier may be any type of information capable of identifying a particular user. The user identifier may be a personal identification number, such as a globally unique identifier (GUID), a user number, a social security number, or the like. The user identifier may also be a legal full name, a nickname, a computer user name, or the like. Alternatively, the user identifier may include a facial image or a description thereof, a fingerprint image or a description thereof, a retinal scan image, or the like. The user identifier may also be an Internet address, a cell phone number, or another identifier.
When the personal affective information is stored together with the corresponding image, it is called a "personal affective tag". The affective information and the user identifier are types of image "metadata", a term used to denote any non-image information related to an image. Other types of image metadata that can be incorporated into the personal affective tag include information derived from the scene image and non-image information such as the time of image capture, the type of capture device, the location of capture, the date of capture, the capture parameters, and the image editing history.
Personal affective information can be associated with a digital image by storing it in an image file, for example using a Tagged Image File Format (TIFF) IFD within an Exif image file. Alternatively, affective information can be stored in one or more application segments of a JPEG (Joint Photographic Experts Group) file containing the first image (or alternatively the second image) in accordance with the JPEG standard ISO 10918-1 (ITU-T.81). Thus, a single industry-standard image file can contain both the first image, compressed according to the JPEG standard and stored in normal JPEG format, and affective information stored in an appropriate format that is ignored by normal JPEG readers. In yet another variation, the personal affective information can be stored in a database separate from the image. The information may also be stored with security and access-rights information to prevent unauthorized access.
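As an illustration of the separate-file option described above, the following Python sketch writes a personal affective tag into a sidecar file keyed by the image identifier and the user identifier. The field names and file layout are illustrative assumptions, not structures defined by the Exif or JPEG standards.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def write_personal_affective_tag(image_path, user_id, affective_data, out_dir="affective_tags"):
    """Store a personal affective tag in a sidecar file keyed by image and user identifiers.

    `affective_data` is a dict of affective values, e.g. {"preference": 0.7}.
    Field names here are illustrative only; they are not defined by the Exif/JPEG standards.
    """
    tag = {
        "image_identifier": Path(image_path).name,
        "user_identifier": user_id,
        "viewed_at": datetime.now(timezone.utc).isoformat(),
        "affective_information": affective_data,
    }
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    # One JSON file per (image, user) pair, so a normal image reader ignores it entirely.
    tag_file = out / f"{Path(image_path).stem}_{user_id}.json"
    tag_file.write_text(json.dumps(tag, indent=2))
    return tag_file

# Example: tag a hypothetical image with a preference level for user "guid-1234"
# write_personal_affective_tag("IMG_0042.jpg", "guid-1234", {"preference": 0.62})
```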
The affective tag can be created manually or automatically as a user views a particular scene or image of a scene using an image capture device. In the case of manually created affective tags, the user can use camera buttons, a touch-screen display, or a voice recognition interface to provide his or her reaction to the scene. For example, when surprised, the user may press a button on the camera that indicates a "surprise" reaction, or simply speak a keyword such as "Wow!".
In the case of automatic affective tagging, an image capture system for personal health and security monitoring (hereinafter referred to simply as an image capture system) can use one or a combination of the following signals to gather affective information for subsequent analysis (a sketch of one way such signals might be bundled follows the list below):
- eye movement characteristics (e.g., duration of eye fixation, pupil size, blink rate, gaze direction, eye acceleration, features and parameters extracted from eye movement patterns, their complexity, etc.);
- biometric or physiological responses (e.g., galvanic skin response (GSR), hand temperature, heart rate, electromyogram (EMG), breathing patterns, electroencephalogram (EEG), brain imaging signals, etc.);
- facial expressions (e.g., smiling, frowning, etc.);
- speech characteristics (e.g., volume, pace, pitch, etc.);
- body gestures, including facial movements (e.g., pinching the bridge of the nose, rubbing around the ears, etc.).
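The following Python sketch shows one hypothetical way the signal classes listed above could be bundled into a single record per viewing; the field names and units are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AffectiveSample:
    """One time-stamped bundle of raw affective signals; every field is optional
    because an implementation may use any one signal or a combination of them."""
    timestamp_s: float
    eye_fixation_ms: Optional[float] = None       # duration of the current eye fixation
    pupil_diameter_mm: Optional[float] = None
    skin_conductance_uS: Optional[float] = None   # galvanic skin response
    heart_rate_bpm: Optional[float] = None
    facial_expression: Optional[str] = None       # e.g. "smile", "frown"
    speech_volume_db: Optional[float] = None
    body_gesture: Optional[str] = None            # e.g. "hand to face"

@dataclass
class AffectiveRecord:
    """Samples gathered while one user views one scene image."""
    user_identifier: str
    image_identifier: str
    samples: List[AffectiveSample] = field(default_factory=list)
```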
According to one embodiment of the present invention described below, affective information is determined based on facial expression, length of fixation time and galvanic skin response.
Referring to FIGS. 1a-1c, three exemplary embodiments of image capture systems made in accordance with the present invention are shown. The system depicted in FIG. 1a includes a handheld image capture device 6 owned by a particular user 2 viewing a scene 4, either directly or through a viewfinder 24 or a camera preview screen 22 (these items can also serve communication and information-feedback purposes). The image capture device 6 may be a digital camera, a hand-held digital video camera, a wearable video camera, or the like. Examples of wearable implementations of the image capture device 6 are shown in FIGS. 1b and 1c. These wearable embodiments have a display 21 for communication and feedback, either mounted on the frame 29 carrying the image capture device 6 as shown in FIG. 1b, or as a separate component connected to the image capture device 6 by wire or wirelessly, as shown in FIG. 1c.
The image capture device 6 comprises a capture module 8 for capturing the scene 4. The capture module 8 includes a viewfinder lens (not shown), an image sensor (not shown), and an A/D converter (not shown). The capture module 8 also includes a microphone (not shown), an audio amplifier (not shown), and an audio A/D converter (not shown). The capture module 8 provides digital still or moving image signals and associated digital audio signals. The image capture device 6 also includes a central processing unit (CPU) 14 and a digital storage device 12 that can store high-resolution image files, including digital still or digital moving images and their associated metadata, provided by the capture module 8. The digital storage device 12 may be a miniature hard disk, flash EPROM memory, or another type of digital storage device.
The image capture device 6 shown in the figures includes a communication module 18, such as a wireless modem or other communication interface, that exchanges data, including still and moving images, via a communication service provider such as an Internet service provider 20. The communication module 18 may use a standard radio-frequency wireless interface, such as the well-known Bluetooth interface or the IEEE 802.15 interface. Alternatively, the communication module 18 may exchange information with other devices using infrared, laser, or other optical communication methods. Still alternatively, the image capture device 6 may have a communication module 18 adapted to use data-exchange hardware such as a USB (Universal Serial Bus) cable, an IEEE 1394 cable, or another electrical data path such as a wire, cable assembly, or waveguide, or an optical data path, to exchange information including digital images and affective information between the image capture device 6 and other equipment.
Referring to FIG. 1a, the communication module 18 is connected to the preview display 22 to display text messages and to provide video communication with the communication service provider 20. In the embodiments shown in FIGS. 1b and 1c, the text messages and video information received by the user are displayed on a communication display 21 mounted on the wearable embodiment of the image capture device 6. As an example, the display 21 may be an EG-7 model invisible qVGA monitor sold by MicroOptical Corporation of Westwood, Massachusetts, USA, mounted on the frame. FIG. 1c shows another example in which the display 21 may be worn on the wrist as part of a portable communication module 23.
To obtain affective information, the image capture device 6 includes a controller 13 and a set of sensors 15 for detecting physiological signals of the user 2. The user 2 may input affective information via a controller 13, which may for example comprise a manual control button, a touch screen, a voice recognition interface or a body gesture recognition interface, etc.
Affective information can also be collected by the sensor set 15. In the embodiment shown in FIG. 1a, the sensor set 15 includes a galvanic skin response sensor 16 mounted on the surface of the image capture device 6. In a wearable embodiment, as shown in FIG. 1b, the galvanic skin response sensor 16 may be mounted anywhere on the exterior of the image capture device 6. Here, the galvanic skin response sensor 16 is mounted on a sidepiece 29 of a conventional frame 28 used to support the glasses. The sensor set 15 may also include a blood-vessel sensor 17 integrated into the sidepiece 29 so that, in use, it lies adjacent to the temporal artery of the user's head, facilitating measurement of body temperature and heart rate. The sensor set 15 may also include a vibration sensor 19, as shown in FIG. 1b, located beside or in contact with the ear. The vibration sensor 19 can detect both sounds produced by the user and sounds from other sources. Any of the sensors of the sensor set 15 may be installed in whatever layout is advantageous for its use. Any of the sensors of the sensor set 15 can be miniaturized so that its presence does not affect the appearance of the wearable embodiment of the image capture device 6. For example, in the embodiment shown in FIG. 1c, the sensor 16 for detecting galvanic skin response is part of the wearable image capture device 6 and is mounted in the bridge 26 of a conventional frame.
In other embodiments, the sensor set 15 may include neural sensors and other devices suitable for monitoring the electrical activity of nerve cells in order to interact with the environment. Examples of such devices are the Brain Communicator and the Muscle Communicator sold by Neural Signals, Inc. of Atlanta, Georgia, USA. These devices monitor, respectively, the electrical signals of nerve cells and the signals emitted by the cranial nerves, with the aim of detecting the signals that would, for example, cause movement in an able-bodied person. These signals are sent to a computer, where software interprets them into usable information. Usefully, this technology can be used to detect both affective information and other information that can be used to determine affective information. For example, neural activity along the nerve carrying sound information from the ear can be monitored and used to determine the sound information the observer actually heard at the time of an event.
The image capture device 6 further comprises a user video camera 10 for recording video images of the eye movements, pupil size, facial expressions, and so on of user 2. The user video camera 10 may include a conventional charge-coupled device (CCD) imager, a complementary metal-oxide-semiconductor imager, or a charge-injection device. Other imaging technologies may also be used. The images captured by the user video camera 10 may include video images used to form an image of the user or of certain features of the user's face. The images captured by the user video camera 10 may also include other forms of video images from which affective information can be obtained. For example, a complete digital image of the eyes of user 2 is not required to represent eye position and pupil size. Other imaging technologies with lower resolution or non-linear imaging geometries may be used instead in order to reduce cost or simplify the imaging structure.
The video images captured by the user video camera 10 are stored in the digital storage device 12 and then processed by the CPU 14. The user video camera 10 may, for example, be a camera sensitive to infrared light. In such an embodiment, a set of infrared light-emitting diodes (infrared LEDs) directs infrared light toward the user's pupils. The user video camera 10 detects the infrared signal reflected from the user's eyes, and the pupils are then tracked in the facial image of the user. One example of a suitable user video camera 10 is the Blue Eyes camera sold by International Business Machines Corporation (IBM) of Armonk, New York, USA; another example is the Eyegaze System sold by LC Technologies of Fairfax, Virginia, USA.
Other useful embodiments of user video camera 10 are described in detail in commonly assigned U.S. patent application serial No. 10/303,978 entitled "camera system with eye monitoring".
The user video camera 10 may be mounted on or located inside the hand-held image capture device 6, as shown in FIG. 1a; on the frame of the wearable image capture device 6, as shown in FIG. 1b; or on a remote portion of the eyeglass frame of the wearable image capture device 6, as shown in FIG. 1c. In the case of FIG. 1c, the user video camera 10 is particularly suited to capturing a variety of facial features of the user, including pupil size and eye or brow movements. In the case shown in FIG. 1b, it is particularly useful for capturing eye movements and other eye characteristics. The user video camera 10 may also be separate from the image capture device 6; in this embodiment, the user video camera 10 may comprise any image capture device capable of capturing an image of the user of the image capture device 6 and transmitting that image to the image capture device 6. The transfer of images from a remote user video camera 10 can be done wirelessly using any known wireless communication system.
Facial feature tracking may be performed using various algorithms, such as the one described in the article entitled "Facial Feature Tracking for Eye-Head Controlled Human Computer Interface" by Ko et al., Proceedings of IEEE TENCON, 1999, pages 72-75. This algorithm, which can perform real-time facial feature tracking, builds a complete graph from candidate blocks identified in the processed facial image and then computes a similarity measure for each pair of blocks. The pair of blocks with the greatest similarity is taken to be the eyes. Based on the position of the eyes, the mouth, lip corners, and nostrils are located, and the located features are then tracked.
An example of a wearable image capture device 6 with a user video camera 10 suitable for recording eye movements can be found, for example, in "Oculomotor Behavior and Perceptual Strategies in Complex Tasks" by Pelz et al., Vision Research, 41 (2001), beginning at page 3587. The authors describe a wearable, lightweight eye-tracking system in the form of head-mounted goggles comprising an infrared-emitting module, a miniature video eye camera, and a beam splitter that aligns the camera coaxially with the emitted light beam. Light retro-reflected from the retina illuminates the pupil and produces a bright-pupil image. An external mirror folds the optical path toward the front of the goggles, where a hot mirror directs the infrared light toward the eye and reflects the eye image back to the eye camera; a second miniature camera is mounted on the goggles to capture the scene from the user's point of view.
In FIGS. 1b and 1c, the user video camera 10 consists of two parts so that features of both eyes can be captured simultaneously. It will be apparent, however, that the user video camera 10 may instead be a single unit that captures only one eye of user 2, or one that captures features of both eyes simultaneously.
The image capture device 6 is provided with appropriate software that the CPU 14 uses for creating and using personal affective information. This software is typically stored in the digital storage device 12 and can be loaded and updated using the communication module 18. In addition, the software enables the CPU 14 to perform image processing and analysis of the non-affective information that can be extracted from the scene images provided by the capture module 8 and stored in the digital storage device 12. The digital storage device 12 may also hold an individual user profile, a dedicated database containing information that summarizes characteristics of the reactions of user 2 (e.g., quantitative information about typical reaction patterns to certain scenes or situations), together with a software program that enables the CPU 14 to access this database when creating and using personal affective information. Such quantitative information may comprise, for example, a cumulative distribution of the user's reactions to scenes or situations, characterizing, for instance, how much the user prefers such scenes or situations. The individual user profile can be queried by the CPU 14 and updated with new information learned about the reactions of user 2.
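A minimal sketch of such an individual user profile follows; it keeps running averages of affective measures so that relative values (the current value divided by the user's average) can be derived later. All names are illustrative and not taken from the text.

```python
class IndividualUserProfile:
    """Minimal sketch of the individual user profile described above: it keeps a
    running average of each affective measure (e.g. smile size, fixation time,
    skin conductance response) so that relative values can be computed later.
    Attribute and method names are illustrative, not taken from the patent text."""

    def __init__(self, user_identifier):
        self.user_identifier = user_identifier
        self._sums = {}    # measure name -> running sum
        self._counts = {}  # measure name -> number of samples

    def update(self, measure, value):
        self._sums[measure] = self._sums.get(measure, 0.0) + value
        self._counts[measure] = self._counts.get(measure, 0) + 1

    def average(self, measure, default=1.0):
        n = self._counts.get(measure, 0)
        return self._sums[measure] / n if n else default

    def relative(self, measure, value):
        # Relative value = current value divided by this user's average for the measure.
        return value / self.average(measure)

# profile = IndividualUserProfile("guid-1234")
# profile.update("smile_size", 0.35)
# profile.relative("smile_size", 0.42)   # > 1.0 means a larger-than-average smile
```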
It will be clear that all elements and components of the image capture device 6 discussed above may be implemented as a single unit with the image capture device 6 or as physically separate units connected to each other by wires or wirelessly.
Various embodiments of methods by which the image capture device 6 determines affective information based on the analysis of facial expressions (e.g., extracting a degree of preference from facial expressions, or extracting an emotion classification and its degree of distinctiveness from facial expressions) are described next. Other embodiments represent methods of determining affective information from physiological information, such as extracting a degree of interest from pupil size and eye movements, or a degree of excitement from galvanic skin response. Further embodiments represent methods of determining affective information using an emotion classification in combination with a degree of excitement. The embodiment may be selected according to the particular application. For example, determining affective information based on the degree of preference is very useful for certain kinds of therapy, since the purpose of the therapy is to facilitate and promote favorable experiences. Likewise, for safety applications or other types of therapy, it is important to detect events that cause negative emotions.
Referring to FIGS. 2a-2b, a flow chart is shown illustrating an embodiment of the present invention for providing affective information based on the degree of preference of a particular user for images of a particular scene. In this embodiment, the affective information is determined from the facial expression of the particular user.
User 2 first starts image capture device 6 (step 110) and, in one embodiment, a software application implementing the method of the present invention is already installed in image capture device 6 and automatically begins operation at step 112. Alternatively, the user 2 may manually launch the application by using appropriate control buttons (not shown) on the image capture device 6.
The user 2 enters a personal identifier (ID) and password (step 114). In an alternative embodiment, the user video camera 10, together with facial recognition software, automatically determines the identity of the user and provides an appropriate user identifier, such as the user's name, personal identification number, or other identifier. In another alternative embodiment, the user identification data can be obtained from an external data source, such as the radio transceiver of the communication module 18 used by the image capture device 6. In yet another alternative embodiment, the image capture device 6 is pre-programmed with a particular user identifier, and step 114 is not required.
User 2 determines the list of recipients to which the image capture device 6 will send images together with affective and non-affective information (step 115). The following is an example of a list of possible recipient categories, arranged according to the expected frequency of communication:
1) a personal database;
2) a family member;
3) an agent contact;
4) a health service provider;
5) a security agency; and/or
6) a local or regional emergency services system.
This list also reflects the degree of urgency in its numerical order: the larger the number, the higher the urgency. This information is used by the communication module 18 depicted in FIGS. 1a-1c.
To determine the emotional reaction of user 2 as he or she observes a scene, the image capture device 6 optionally offers a selection of signals to be recorded (step 116). The user 2 selects the desired signal, in this case the facial expression (step 118). In an alternative embodiment, the image capture device 6 is pre-programmed with one or more affective signals, eliminating the need for steps 116 and 118.
The user 2 then directs the image capture device 6 to compose the scene to be captured. The capture module 8 captures the first image of the scene (step 120) and, at the same time, the user video camera 10 captures the first facial image of user 2 (step 130).
The image capture device 6 temporarily stores the scene image (step 122) and automatically analyzes it in various respects (step 123). The purpose of such analysis may be to detect specific things in the scene, for example things known to cause a characteristic reaction in the user or to present a threat to the user. This image analysis is done using available image processing or image understanding algorithms. One such algorithm is disclosed in commonly assigned U.S. Patent No. 6,282,317, which describes a method for automatically determining the main subject in photographic images. It produces a belief map by identifying flesh, faces, sky, grass, etc. as semantic saliency features together with "structural" saliency features related to color, texture, brightness, etc., and then combining the two. Another image processing technique is disclosed in commonly assigned U.S. Patent Publication No. US 2002/0076100 A1, filed December 14, 2000 by Luo, entitled "Image Processing Method for Detecting Human Figures in a Digital Image", the disclosure of which is incorporated herein by reference. This method can be used to detect human figures in digital color images. The image is first segmented into non-overlapping regions of homogeneous color or texture; candidate regions of human skin color and candidate regions of human faces are then detected; and then, for each candidate face region, a human figure is constructed by grouping nearby regions having human skin color, guided by a pre-determined graphical model of the human figure. The presence of people, or of specific people, in a scene can also be established using facial recognition algorithms such as the one described by Liu et al. in "Face Recognition Using Kernel-Based Fisher Discriminant Analysis", Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition, pages 197-201, 2002.
As for the facial image captured in step 130, the image capture device 6 temporarily stores the facial image (step 132) and automatically analyzes the facial expression of user 2 (step 134). Facial expression analysis may use a published facial expression recognition algorithm, such as the one described by Essa et al. in the paper entitled "Facial Expression Recognition Using a Dynamic Model and Motion Energy", Proceedings of the ICCV 95 conference (Cambridge, Massachusetts, 1995). That algorithm is based on knowledge of the probability distribution of facial muscle activations associated with each expression and a detailed physical model of the skin and muscles. The physics-based model recognizes facial expressions by comparing muscle activations detected in the video signal with typical muscle activations obtained from a video database of emotional expressions.
Facial expression analysis can also be performed using other publicly available algorithms. One example is found in "Detection, Tracking, and Classification of Action Units in Facial Expression" by Lien et al., Robotics and Autonomous Systems, Vol. 31, beginning at page 131. Another is found in the article entitled "Measuring Facial Expressions by Computer Image Analysis" by Bartlett et al., Psychophysiology, 36 (1999), pages 253-263. These algorithms are based on recognizing specific facial actions, the basic muscle movements described by Ekman et al. in the "Facial Action Coding System" published by Consulting Psychologists Press, Inc. (Palo Alto, California, USA). In the Facial Action Coding System (FACS), any facial expression can be represented as a combination of basic facial actions. For example, a natural smile can be represented by two basic facial actions: 1) the corners of the mouth are raised by the muscle called the zygomaticus major, and 2) the skin around the eyes is crinkled by the muscle called the orbicularis oculi. Thus, when raised mouth corners and crinkling around the eyes are detected in the video signal, it means that the person is smiling. When a smile is detected on the face of user 2, the outcome of the facial expression analysis is that the face of user 2 is recognized as smiling; when no smile is detected, it is recognized as not smiling.
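A minimal sketch of the FACS-style rule just described, assuming an upstream detector that reports FACS action unit numbers (AU 12 is the standard code for the lip-corner raise driven by the zygomaticus major, AU 6 for the eye crinkle driven by the orbicularis oculi):

```python
def is_smile(detected_action_units):
    """FACS-style rule sketched from the text above: a natural smile is signalled
    when both the lip-corner raise (FACS action unit 12, zygomaticus major) and
    the eye crinkle (FACS action unit 6, orbicularis oculi) are detected."""
    return {6, 12}.issubset(detected_action_units)

# is_smile({6, 12})  -> True   (corner lift plus eye wrinkles: recognized as a smile)
# is_smile({12})     -> False  (a corner lift alone is not treated as a natural smile)
```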
The image capture device 6 determines the smile size (step 136). If no smile is detected, the smile size is zero. If a smile is detected in a given image, the smile size for that image can be determined as the maximum distance between the corners of the mouth during the first three seconds after the onset of the smile, divided by the distance between the eyes of user 2. The distance between the eyes is determined using the facial recognition algorithm mentioned above. The ratio between the mouth size and a measure related to the size of the head, such as the distance between the eyes, is needed because the size of the mouth extracted from the facial image depends on the distance between user 2 and the user video camera, the position of the head, and so on. The distance between the eyes of user 2 is used to account for this dependence; however, other measures, such as the height or width of the face, the area of the face, and others, may also be used.
The image capture device 6 then determines the degree of preference (step 138). If no smile is detected, the smile size and hence the degree of preference are equal to zero. If a smile is detected, the absolute degree of preference equals the smile size, and the relative degree of preference equals the smile size divided by the average smile size for user 2. The average smile size can be continuously updated and stored in the digital storage device 12 as part of the individual user profile of user 2. The individual user profile is queried for the smile size data and updated with the new average smile size (step 139).
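A sketch, under stated assumptions, of the smile-size and degree-of-preference computations described above; the input structure for the tracked mouth corners is an assumption introduced for illustration.

```python
import math

def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def smile_size(mouth_corner_tracks, eye_distance):
    """Smile size as described above: the maximum distance between the mouth
    corners observed during the first three seconds of the smile, normalized by
    the distance between the user's eyes so that the measure does not depend on
    how far the face is from the user video camera.

    `mouth_corner_tracks` is a list of (t_seconds, left_corner_xy, right_corner_xy)
    samples starting at smile onset; this structure is illustrative."""
    widths = [euclid(left, right) for t, left, right in mouth_corner_tracks if t <= 3.0]
    return (max(widths) / eye_distance) if widths else 0.0

def degree_of_preference(smile, average_smile):
    """Absolute preference equals the smile size; relative preference divides it
    by the user's average smile size kept in the individual user profile."""
    absolute = smile
    relative = smile / average_smile if average_smile > 0 else 0.0
    return absolute, relative
```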
The obtained degree of preference is compared with a criterion (step 140). The criterion is established so as to reflect the importance of the image information extracted in step 123 and of the affective information extracted in step 138. Such a criterion may be defined, for example, by a logical OR expression: that is, the criterion of step 140 is satisfied if relevant information is detected in the scene image, or if the threshold for the affective information (in the example of FIG. 2, the degree of preference) is exceeded, or if both relevant scene information and the affective threshold are detected together. It will be apparent that the criterion of step 140 may be arranged to give priority to either of the two sources of information.
In one embodiment, the criterion may reflect only the affective information, that is, the significance of the degree of preference. In this embodiment, the obtained degree of preference is compared to a threshold established by or for user 2. If the obtained degree of preference exceeds the threshold, the image capture device 6 creates a personal affective tag for the corresponding image that indicates the degree of preference for that particular captured image (step 144).
In another embodiment, the threshold for the degree of preference may also be established automatically from the individual user profile, for example on the basis of a previously accumulated probability distribution of the user's degrees of preference. Such a probability may be 0.5, so that the threshold corresponds to a preference value reached in at least 50% of cases. Alternatively, the personal affective tag can include a value selected from a range of preference values so as to distinguish the relative degree of preference between the various captured images.
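The following sketch shows one way a personal threshold could be derived from a previously accumulated distribution of preference values, as described above; it uses a simple empirical quantile, which is an assumption rather than the text's exact procedure.

```python
def threshold_from_history(past_values, cumulative_probability=0.5):
    """Derive a personal threshold from the previously accumulated distribution of
    a user's preference values: the returned value is exceeded in roughly
    (1 - cumulative_probability) of past cases. With 0.5 this is the median; with
    0.9 (the value suggested later for a health care provider) only about 10% of
    images would exceed it. An empirical-quantile sketch, not a prescribed method."""
    if not past_values:
        return 0.0
    ordered = sorted(past_values)
    index = min(int(cumulative_probability * len(ordered)), len(ordered) - 1)
    return ordered[index]

# threshold_from_history([0.1, 0.2, 0.4, 0.5, 0.9], 0.5)  -> 0.4
```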
If the criterion is met, the image capture device 6 stores the corresponding image and the personal affective tag indicating the degree of preference within the image file containing the scene image, as part of the image metadata (step 146). Alternatively, the personal affective tag indicating the degree of preference can be stored in a separate file in association with the user identifier and the image identifier. In addition, information about the moment user 2 viewed a certain image (i.e., the moment of capture) can also be recorded as a separate entry in the personal affective tag.
In another embodiment, the raw facial image is stored as affective information, either in a separate file on the image capture device 6 together with the image identifier and the user identifier, or in the personal affective tag as part of the image metadata, for later analysis and optional use by a separate system. For example, the scene image and the raw facial image can be communicated using the communication module 18 (see FIG. 1) and the Internet service provider 20 to a separate desktop computer (not shown) or computer server (not shown), which can perform the analysis described earlier in connection with steps 134 and 138.
The recipient is identified (step 147). In one embodiment, the recipient may be a personal database with an e-mail address provided by the Internet service provider. In another embodiment, there may be multiple recipients on the list, including a personal database, a health service provider, friends, family members, a security agency, and so on. The recipients may also be selected automatically based on the results of analyzing the affective information, the image data, and the non-image data. In that case, as part of step 147, such selection may include, for example, comparing a value of the affective information, such as the degree of preference from step 138, to a predetermined threshold for each recipient on the recipient list. In yet another embodiment, a threshold corresponding to each recipient is established automatically from the individual user profile, for example on the basis of a previously accumulated probability distribution of the user's degrees of preference. In one embodiment, the cumulative probability for the health care provider may be chosen as 0.9, so that the corresponding threshold is exceeded in only 10% of cases. In other embodiments, the personal affective tag can include a value selected from a range of preference values so that the relative degree of preference between various captured images can be distinguished. In yet another embodiment, the recipients may be selected solely on the results of the scene image analysis, or on a combination of the scene information and the affective information, depending on the criterion constructed in step 140.
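A sketch of recipient selection by per-recipient thresholds, as described for step 147; the recipient names follow the list given earlier, while the threshold values themselves are invented for illustration.

```python
# Illustrative per-recipient thresholds on the preference (or other affective) value,
# ordered by increasing urgency as in the recipient list above; the numbers are
# assumptions, not values prescribed by the text.
RECIPIENT_THRESHOLDS = [
    ("personal database", 0.0),
    ("family member", 0.5),
    ("agent contact", 0.6),
    ("health service provider", 0.9),
    ("security agency", 0.95),
    ("emergency services", 0.99),
]

def select_recipients(affective_value, thresholds=RECIPIENT_THRESHOLDS):
    """Compare the affective value against each recipient's personal threshold;
    every recipient whose threshold is met receives the image, its personal
    affective tag and the other metadata."""
    return [name for name, threshold in thresholds if affective_value >= threshold]

# select_recipients(0.92) -> ['personal database', 'family member', 'agent contact',
#                             'health service provider']
```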
The corresponding image, the personal affective tag, and other image metadata are transmitted via the communication module 18, for example through the Internet service provider 20, to a designated recipient such as a personal database of digital images (step 148). The personal image database may be stored, for example, on a separate desktop computer (not shown) or a computer server (not shown).
In another embodiment, the corresponding images, personal affective tags, image metadata, including derived image information, are sent to a physician or other healthcare provider for additional analysis or evaluation of the user's emotional response to a particular situation. The corresponding images, personal affective tags, image metadata, including derived image information, can also be sent to a plurality of support networks, including family members or local emergency services centers.
Feedback information is displayed on the preview screen 22 or the communication display 21 of the camera (step 150). The information is generated automatically by suitable software and may include the scene image, the determined degree of preference, or both. It may also include, or consist solely of, a sound signal, a pre-recorded voice message, or computer-generated speech or imagery. In a further embodiment, the feedback information comes from a physician or from various support networks in order to facilitate therapy or to provide assistance to user 2, in connection with which an interactive communication process may be initiated.
If the obtained degree of preference is below the threshold, the facial image of the user and the scene image are deleted (step 142). If the obtained degree of preference is below the threshold and user 2 is still viewing the same scene or a captured image of the scene, for example on the preview screen 22, the image capture device 6 can optionally capture the next facial image and repeat steps 132 through 140 to determine whether user 2 has changed his or her facial expression while viewing the same scene or captured image of the scene.
If the threshold is set to zero, all scene images captured by the image capture device 6 and the corresponding affective information (the degree of preference or, in other embodiments, the raw facial image) are stored permanently, either in a separate image file together with the image identifier and the user identifier, or in a personal affective tag as part of the image metadata.
As long as the power remains on, the process of capturing and analyzing the next image of the scene (steps 120-123) and simultaneously determining and storing the personal affective tag (steps 130-146) is repeated (step 126).
As long as user 2 keeps the image capture device 6 powered on (step 126), the image capture device 6 continues to record images of the scene 4 using the capture module 8 and facial images of user 2 using the user video camera 10. If the power is turned off, the image capture device 6 stops recording the scene images and facial images, and the process of creating affective tags ends (step 128).
The degree of preference can also be used in a digital imaging system to rank images in a systematic and continuous manner according to how much a particular user prefers them, as described in commonly assigned U.S. patent application Ser. No. 10/036,113, filed December 26, 2001 by Matraszek et al., entitled "Method for Creating and Using Affective Information in a Digital Imaging System", or in commonly assigned U.S. patent application Ser. No. 10/036,123 by Matraszek et al., entitled "Method for Using Affective Information Recorded With Digital Images for Producing an Album Page".
The degree of preference for a scene image may also be determined in a binary manner. When a smile is detected at step 136, the corresponding image is classified as preferred, with a binary degree of preference equal to 1. Conversely, when no smile is detected, the image is classified as not preferred, with a binary degree of preference equal to zero.
The affective information so determined, represented by the binary degree of preference, is then stored in a personal affective tag that includes the user identifier, as part of the image metadata. It can also be stored in the digital storage device 12 in a separate file together with the image identifier and the user identifier. In addition, affective information in the form of the actual image of the user's facial expression can be stored in a separate file in association with the image identifier and the user identifier.
In another embodiment, the captured image is transmitted by the image capture device 6 to the Internet service provider 20 only if the affective information exceeds a threshold, such as a threshold set for the relative smile size. As a result, only images that exceed the preference threshold are stored in the user's personal image database. In this embodiment, metadata indicating that the file satisfies the threshold is stored in the image file.
Referring to FIGS. 3a-3b, a flow chart of another embodiment of the present invention is shown. In this embodiment, the affective information is determined from the emotion classification of the user's reaction to the captured image. In this embodiment, the affective information is again obtained from an analysis of the user's facial expressions.
Facial expressions may be classified according to a broader range of emotion categories, such as "happiness", "sadness", "disgust", "surprise", and so on. An algorithm for such facial expression classification is described by Dailey et al. in the paper entitled "EMPATH: A Neural Network that Categorizes Facial Expressions". The algorithm classifies facial expressions into six basic emotion categories, "happiness", "sadness", "fear", "anger", "disgust", and "surprise", based on a developed feed-forward neural network consisting of three layers of neurons. The network performs three stages of processing: perceptual analysis, object representation, and categorization. The neurons in the first layer have properties similar to those of complex cells of the visual cortex. The units of the second layer extract regularities from the data. The third layer outputs values for the six basic emotion categories. As a result, each facial expression is encoded as six numbers, one per emotion. The numbers for the different emotions are all positive and sum to one, so they can be interpreted as probabilities.
The following method can classify emotions based on the user's facial expressions and, further, provide a range of values for these classifications; more specifically, the degree of "distinctiveness" of the emotion classification is determined, as represented in FIG. 3. The degree of "distinctiveness" of an emotion classification reflects a measure of the uniqueness or "purity" of a particular emotion, as opposed to the ambiguity or fuzziness of the emotion. In common language, such an ambiguous emotion is often referred to as a "mixed feeling". This "distinctiveness" attribute can be understood by analogy with color saturation.
Steps 210 through 232 of the embodiment of FIGS. 3a-3b generally correspond to steps 110 through 132 of the embodiment of FIGS. 2a-2b.
In this embodiment, however, the image capture device 6 automatically analyzes the facial expression of user 2 using the method of Dailey et al. described above (step 234). As a result, the expression of user 2 is associated with six numbers, one for each basic emotion.
The emotion category (EC) is determined by selecting the category with the largest value (step 236). For example, if the values are 0.5, 0.01, 0.2, 0.1, 0.19, and 0 for "happiness", "sadness", "fear", "anger", "disgust", and "surprise", respectively, the determined emotion category is happiness, because it corresponds to the largest value, 0.5. Consequently, scenes that evoke happy facial expressions are assigned the "happiness" emotion category, scenes that evoke sad facial expressions are assigned the "sadness" emotion category, and so on. When several categories share the same largest value, one of them can be chosen at random to correspond to the facial expression; alternatively, additional affective or non-affective information can be used to help select a category.
The degree of distinctiveness of the emotion category is then determined by the image capture device 6 (step 238). The degree of distinctiveness, DD_EC, is computed from the six emotion values established in step 236, denoted for convenience N1, N2, N3, N4, N5, and N6. In the present invention the degree of distinctiveness of the identified emotion category EC is determined by the following expression:

DD_EC = sqrt(N1^2 + N2^2 + N3^2 + N4^2 + N5^2 + N6^2)

DD_EC obtained in this way is the absolute degree of distinctiveness for the emotion category EC. The relative degree of distinctiveness is defined as the absolute degree of distinctiveness divided by the average DD_EC for the corresponding emotion category for the particular user. The average DD_EC can be continuously updated and stored in the digital storage device 12 as part of the individual user profile of user 2. The individual user profile is queried for, and updated with, the average degree of distinctiveness for the emotion category EC (step 239).
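A sketch of steps 236-238, assuming the root-sum-of-squares form of DD_EC given above (itself a reconstruction); the category names and helper names are illustrative.

```python
import math

EMOTION_CATEGORIES = ["happiness", "sadness", "fear", "anger", "disgust", "surprise"]

def classify_emotion(numbers):
    """Pick the emotion category with the largest of the six numbers N1..N6
    produced by the facial-expression network (step 236)."""
    return EMOTION_CATEGORIES[numbers.index(max(numbers))]

def degree_of_distinctiveness(numbers):
    """Degree of distinctiveness DD_EC computed from N1..N6. The root-sum-of-squares
    form used here is an assumption consistent with the text: it equals 1.0 for a
    'pure' expression (one category holds all the probability) and falls toward
    1/sqrt(6) for a maximally mixed expression."""
    return math.sqrt(sum(n * n for n in numbers))

def relative_distinctiveness(numbers, average_dd_for_category):
    return degree_of_distinctiveness(numbers) / average_dd_for_category

# Example values from the text (happiness, sadness, fear, anger, disgust, surprise):
# classify_emotion([0.5, 0.01, 0.2, 0.1, 0.19, 0.0])          -> 'happiness'
# degree_of_distinctiveness([0.5, 0.01, 0.2, 0.1, 0.19, 0.0]) -> approx. 0.58
```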
The obtained degree of distinctiveness is compared with a criterion (step 240), similar to the comparison previously described for FIG. 2. The criterion is constructed to reflect the importance of the image information and the affective information extracted in steps 223 and 238. It may be defined, for example, by a logical OR expression: that is, the criterion of step 240 is satisfied if relevant information is detected in the scene image, or if the threshold on the affective information (here, the degree of distinctiveness) is exceeded, or if both relevant scene information is detected and the distinctiveness threshold is exceeded. It is clear that the criterion of step 240 can be set so as to give priority to either of the two information sources.
In one embodiment, the criterion may reflect only the importance of the affective information, i.e., the degree of distinctiveness. In such an embodiment, the obtained degree of distinctiveness is compared to a threshold established by or for user 2. If the obtained degree of distinctiveness is above the threshold, the image capture device 6 creates a personal affective tag for the corresponding image that indicates the emotion category and its degree of distinctiveness for that particular captured image (step 244).
In another embodiment, the distinctiveness threshold may also be established automatically from the individual user profile, for example on the basis of a previously accumulated probability distribution of the degrees of distinctiveness of user 2 for the particular emotion category. Such a probability may be 0.5, so that the threshold corresponds to a degree of distinctiveness reached in at least 50% of cases. Alternatively, the personal affective tag can include a value selected from a range of distinctiveness values so that differences in the relative degree of distinctiveness between the various captured images can be expressed.
The image capture device 6 stores the corresponding image and the personal affective tag, which indicates the emotion category with its degree of distinctiveness, within the image file containing the scene image, as part of the image metadata (step 246). Alternatively, the personal affective tag indicating the emotion category with its degree of distinctiveness can be stored in a separate file in association with the user identifier and the image identifier. In addition, information about the date on which the user viewed a particular image (i.e., the moment of capture) can be entered as a separate entry in the personal affective tag.
In another embodiment, the raw facial image is stored as affective information, either in a separate file on the image capture device 6 together with the image identifier and the user identifier, or in the personal affective tag as part of the image metadata, for later analysis, optionally using a separate system. For example, the scene image and the raw facial image can be transmitted using the wireless modem 18 (see FIG. 1) and the Internet service provider 20 to a separate desktop computer (not shown) or computer server (not shown), which can perform the analysis previously described in connection with steps 234-238.
The recipient is identified (step 247) in a manner similar to step 147 of FIGS. 2a and 2b, with the emotion category and its degree of distinctiveness used as the source of affective information.
The corresponding image, the personal affective tag, and other image metadata are sent via the communication module 18 to the Internet service provider 20 or to another recipient on the communication network, such as a personal digital image database (step 248). The personal image database may be stored on a separate desktop computer (not shown) or computer server (not shown).
In further embodiments, the corresponding image, personal affective tag, and image metadata including the derived image information are sent to a physician or other medical service provider for additional analysis or evaluation of the user's specific emotional response to the specific context (step 248). The corresponding images, personal affective tags, and image metadata along with the derived image information can also be sent to a plurality of support networks, including family members or local emergency services centers.
Feedback information is displayed on the camera's preview screen 22 or on the communication display 21 (step 250). The information is generated automatically by the corresponding software and may include the scene image, the determined emotion category with its degree of distinctiveness, or both. It may also include, or consist solely of, a sound signal, a pre-recorded voice message, or computer-generated speech. In a further embodiment, the feedback can be sent by a physician or by various support networks to facilitate therapy or targeted assistance to user 2, thereby initiating an interactive communication process.
If the criteria specified at step 240 cannot be met, for example, the obtained degree of distinctiveness is below a threshold, the user's facial image and scene image are deleted.
If the obtained degree of distinctiveness is below the threshold while user 2 is still viewing the same scene or captured image of the scene, e.g., the image of preview screen 22, image capture device 6 can optionally capture the next facial image and repeat the operations of steps 232 through 240 to determine if user 2 has changed his facial expression while user 2 is viewing the scene or captured image of the scene.
If the threshold is set to zero, all scene images recorded by the image capture device 6 and the corresponding affective information (the emotion category with its degree of distinctiveness or, in other embodiments, the raw facial image) are stored permanently as affective information, either in a separate file on the image capture device 6 together with the image identifier and the user identifier, or in a personal affective tag as part of the image metadata.
As long as the power remains on, the process of capturing and analyzing the next scene image (steps 220-223) and of determining and storing the personal affective tag for the captured image (steps 230-246) is repeated (step 226).
As long as user 2 keeps the image capture device 6 powered on, the image capture device 6 continues to record images of the scene 4 using the capture module 8 and to capture facial images using the user video camera 10. If the power is turned off, the image capture device 6 stops recording scene images and facial images, and the affective tagging process ends (step 228).
In the embodiments discussed above, affective information is extracted from the facial features of user 2. FIGS. 4a and 4b present a flow chart illustrating a further embodiment of the invention in which affective information is provided based on a physiological characteristic, namely the degree of interest as indicated by eye fixation time. In this embodiment, the degree of interest is determined from the eye fixation time, i.e., the time that the eyes of user 2 dwell at a particular location in the scene before moving to another location.
According to the data described by Lang et al. in "Looking at Pictures: Affective, Facial, Visceral, and Behavioral Reactions", Psychophysiology, 30 (1993), pages 261-273, viewing time is, on average, linearly related to the degree of interest or attention of the viewer. Thus, according to this relationship, fixation time can be used to interpret the degree to which the user is interested in a certain area of a scene. In the cited publication of Lang et al., viewing time was compared only with the degree of interest in images of third-party scenes. In the present invention, fixation time information is stored directly, as an assessment of the scene and of the first-party scene image, in a personal affective tag, either as part of the image metadata or in a separate file in association with the user identifier and the image identifier.
In the embodiment of FIGS. 4a and 4b, steps 310 through 328 generally correspond to steps 110 through 128 of FIGS. 2a and 2b, with one difference: at step 318 the user selects the "fixation time" signal. Alternatively, the image capture device 6 may be pre-programmed to capture "fixation time" information.
In this embodiment, the user video camera 10 on the image capture device 6 captures samples of eye images of one eye of user 2 while user 2 views the scene during a time window (e.g., 30 seconds) during composition, capture, and/or immediately after capture when the captured image is played back (step 330). In some embodiments, the time window can be modified by user 2.
At the same time, the scene image is analyzed in step 323, as already described in detail above in step 123 of fig. 2a and 2 b.
The coordinates of the user 2 eye gaze direction are stored along with the sampling rate, e.g. 60Hz (step 332). In some embodiments, user 2 may modify the sampling rate. The sampling rate may also be modified based on other factors such as the rate of change of eye gaze over time, the rate of change of scene content, or the amount of memory available to store emotion data.
The raw gaze coordinates are grouped into eye fixations (step 334). An eye fixation is usually defined as a period of at least 50 milliseconds during which the gaze coordinates do not change by more than 1 degree of visual angle. For each fixation, the start time, the end time, and the gaze coordinates are determined. In addition, an average pupil size can be determined for each fixation. The duration of each eye fixation is measured from its start and end times (step 336).
The image capture device 6 determines the degree of interest for each eye fixation (step 338). The absolute degree of interest is defined as the corresponding fixation time. The relative degree of interest is defined as the fixation time divided by the average fixation time for the particular user. The average fixation time can be continuously updated and stored in the digital storage device 12 as part of the individual user profile of user 2. The individual user profile is queried for, and refreshed with, the average fixation time of user 2 (step 339).
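A sketch of the fixation grouping of step 334 and the degree-of-interest computation of step 338, using a simple dispersion rule for the 1-degree / 50-millisecond definition given above; the data layout and helper names are assumptions.

```python
def segment_fixations(gaze_samples, max_dispersion_deg=1.0, min_duration_s=0.05):
    """Group raw gaze samples into eye fixations as defined above: a fixation is a
    period of at least 50 ms during which the gaze coordinates stay within about
    1 degree of visual angle. `gaze_samples` is a list of (t_seconds, x_deg, y_deg).
    A simple dispersion-based sketch; real implementations differ in detail."""
    fixations, current = [], []
    for sample in gaze_samples:
        current.append(sample)
        xs = [s[1] for s in current]
        ys = [s[2] for s in current]
        if (max(xs) - min(xs)) > max_dispersion_deg or (max(ys) - min(ys)) > max_dispersion_deg:
            candidate, current = current[:-1], [sample]
            if candidate and candidate[-1][0] - candidate[0][0] >= min_duration_s:
                fixations.append(candidate)
    if current and current[-1][0] - current[0][0] >= min_duration_s:
        fixations.append(current)
    # For each fixation report start time, end time, mean gaze coordinates and duration.
    summary = []
    for fix in fixations:
        start, end = fix[0][0], fix[-1][0]
        mean_x = sum(s[1] for s in fix) / len(fix)
        mean_y = sum(s[2] for s in fix) / len(fix)
        summary.append({"start": start, "end": end, "x": mean_x, "y": mean_y,
                        "duration": end - start})
    return summary

def degree_of_interest(fixation_duration, average_fixation_time):
    """Absolute interest is the fixation time itself; relative interest divides it
    by the user's average fixation time from the individual user profile."""
    relative = fixation_duration / average_fixation_time if average_fixation_time > 0 else 0.0
    return fixation_duration, relative
```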
Subsequent steps 340 through 350 correspond, respectively, to steps 140 through 150 and 240 through 250 described in the previous embodiments shown in FIGS. 2a, 2b, 3a, and 3b, except for the type of information recorded in the personal affective tag in step 344: in the present embodiment, the affective tag records the degree of interest.
In one embodiment, the image capture device 6 stores the degree of interest in a personal affective tag, as part of the image metadata, together with the corresponding image (step 346). The data stored in the image metadata may contain the personal affective information itself, or data indicating the location of a file containing the personal affective information. In addition, date information about the user viewing a certain image can be recorded as a separate entry in the personal affective tag.
In another embodiment, the scene image and the unprocessed eye images are stored. The raw eye images can then be analyzed later, either by the CPU 14 or by a processor (not shown) in a separate device.
If the obtained degree of interest is below the threshold, the eye images and the scene image are deleted (step 342).
In another embodiment, if the obtained level of interest is below the threshold set at step 340, while the user 2 is still viewing the same captured image, e.g., the captured image on the preview screen 22, the image capture device 6 may optionally capture another segment of the eye image and repeat steps 332 through 340 to determine whether the user 2 has changed the level of interest in the captured image.
If the threshold is set to zero, all scene images recorded by the image capture device 6 and the corresponding affective information (the degree of interest or, in another embodiment, the raw eye images) can be stored as affective information, either in a separate file on the image capture device 6 together with the image identifier and the user identifier, or in a personal affective tag as part of the image metadata.
In an alternative embodiment, the user video camera 10 and central processor 14 may be used to obtain additional information from images of at least one eye of the user. Examples of such information include, but are not limited to, eye acceleration, tear information, eye temperature, iris reticulation pattern, blood vessel pattern, and blood vessel size. This information may be used to determine the personal identification, emotional state, and/or health of user 2. The information may be stored as part of the sentiment tag.
A further source of affective information is the physiological signals generated by the user 2. FIGS. 5a and 5b show an embodiment of the invention in which affective information is determined from a physiological signal. In this embodiment, the physiological signal is a skin conductance signal, and the affective information derived from the skin conductance signal is expressed as a degree of excitement.
The change in skin conductance is a measure of the galvanic skin response. The magnitude of the change in skin conductance reflects the strength of the response to an event (the scene being viewed or an image of the scene). As described by Lang et al. in the paper "Looking at pictures: affective, facial, visceral, and behavioral reactions", published in the journal Psychophysiology, 1993, 30, pages 261-273, changes in skin conductance depend on the arousal the image elicits in the viewer: the higher the conductance, the lower the arousal or excitement, whereas the lower the conductance, the higher the arousal. The magnitude of the skin conductance response may also be used to determine interest or attention.
In the present embodiment, the method of steps 410 through 428 generally corresponds to steps 110 through 128 of FIGS. 2a and 2b, the only difference being that at step 418 the user may manually instruct the image capture device 6 to capture galvanic skin response information as at least part of the affective information. Alternatively, the image capture device 6 may be pre-programmed to capture galvanic skin response information. The image capture device 6 measures the galvanic skin response signal using the physiological sensor 16 for the duration of a time window, for example a time window of 5 seconds (step 430). In some embodiments, the time window may be modified by the user. An example of a galvanic skin response sensor 16 is the SC-Flex/Pro sensor used with the ProComp detection system manufactured by Thought Technology Ltd. of Chazy, N.Y., USA.
The galvanic skin response conductance signal is sampled at a sampling rate, for example 60 Hz, and stored (step 432). In one embodiment, the sampling rate may be modified by user 2. The sampling rate may also be based on other factors, such as the rate of change of the scene content, the rate of change over time of the galvanic skin response, or the capacity of the memory available for storing affective data. The galvanic skin response skin conductance signal is filtered to reduce noise in the data (step 434). The amplitude of the galvanic skin response signal is then determined (step 436).
The image capture device 6 determines the excitement level (step 438). The absolute excitement level for the scene is equal to the amplitude of the filtered galvanic skin response conductance signal. The relative excitement level is defined as the amplitude of the galvanic skin response signal divided by the average galvanic skin response signal for the particular user. The average skin conductance response may be continually refreshed and stored in the digital storage device 12 as part of the individual user profile. To calculate the relative excitement level, the average conductance response information is retrieved from the user profile, and the skin conductance response information in the individual user profile is then refreshed (step 439).
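A minimal sketch of steps 432 through 438, under stated assumptions, is given below in Python. The patent does not specify the noise filter, so a short moving average is used purely as a placeholder, and the "amplitude" is taken as the peak-to-trough range of the filtered segment; function and variable names are hypothetical.

```python
import numpy as np

def excitement_levels(gsr_samples, average_response, fs=60, window_s=5):
    """Derive absolute and relative excitement from one skin conductance segment.

    gsr_samples      -- raw conductance samples for the time window (e.g. 5 s at 60 Hz)
    average_response -- the user's running average response amplitude (from the profile)
    """
    x = np.asarray(gsr_samples[: fs * window_s], dtype=float)
    kernel = np.ones(5) / 5.0                        # placeholder smoothing filter
    filtered = np.convolve(x, kernel, mode="same")   # step 434: reduce noise
    amplitude = filtered.max() - filtered.min()      # step 436: response amplitude
    absolute = amplitude                             # step 438: absolute excitement
    relative = amplitude / average_response if average_response > 0 else 0.0
    return absolute, relative
```

In practice the running average stored in the individual user profile would be updated with each new amplitude before the next segment is processed (step 439).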
The obtained degree of excitement is compared to a criterion that reflects both the affective information determined in step 438 and the image information extracted in step 423 (step 440). Such a criterion may be defined by a logical expression using OR or AND. That is, the criterion of step 440 is satisfied if the relevant image information is detected, or if the threshold for the affective information (in this case, the excitement level) is exceeded, or only if both the relevant image information is detected and the excitement-level threshold is exceeded. Clearly, the criterion of step 440 can be set so as to give priority to either of the two information sources.
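The following fragment, offered only as an illustration (the mode names and function signature are hypothetical), shows how such a logical criterion combining the two information sources might be evaluated:

```python
def criterion_met(image_info_detected: bool,
                  excitement: float,
                  excitement_threshold: float,
                  mode: str = "or") -> bool:
    """Step 440 style criterion combining image information and affective information.

    mode == "or":  satisfied if the relevant image information is detected
                   OR the excitement threshold is exceeded.
    mode == "and": satisfied only if both conditions hold.
    """
    threshold_exceeded = excitement > excitement_threshold
    if mode == "and":
        return image_info_detected and threshold_exceeded
    return image_info_detected or threshold_exceeded
```

Giving priority to one of the two sources simply amounts to choosing the mode, or to setting the corresponding threshold very low or very high.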
In one embodiment, the criterion may reflect only the importance of the affective information, i.e., the excitement level. In this embodiment, the obtained excitement level is compared to a threshold established by or for user 2 (step 440). If the obtained excitement level exceeds the threshold, the image capture device 6 creates a personal affective tag for the corresponding image that indicates the excitement level for that particular captured image (step 444). In another embodiment, the threshold for the excitement level may also be established automatically from the personal user profile, for example based on a previously accumulated probability distribution of the user's excitement levels. The cumulative probability may be set to 0.5, so that the threshold for the excitement level corresponds to a value that is exceeded in 50% of cases. Alternatively, the personal affective tag can include a value selected from a range of excitement values so that the relative excitement of the various captured images can be differentiated.
If the criteria of step 440 are met, the image capture device 6 stores the corresponding image and the personal affective tag indicating the excitement level as part of the image metadata in an image file containing the scene image (steps 444 and 446). Alternatively, the personal affective tag indicating the excitement level may be stored in a separate file in association with the user identifier and the image identifier. In addition, the date on which the user observed the particular image (i.e., at the time of capture) may be recorded as a separate item in the personal affective tag.
In another embodiment, the raw galvanic skin response signal is stored as an affective signal either in a separate file in the image capture device 6 along with the image identifier and the user identifier, or in a personal affective tag as part of the image metadata, for later analysis, optionally by a separate system. For example, the scene image and the raw galvanic skin response signal may be transmitted using the wireless communication module 18 (see FIG. 1) and the Internet service provider 20 to a separate desktop computer (not shown) or computer server (not shown) capable of performing the analysis previously described in steps 434 through 438.
The recipients are identified in a manner similar to step 147 of FIGS. 2a and 2b, based on the affective information determined above (step 447). In one embodiment, the recipient may be a personal database having an e-mail or web address provided by an Internet provider. In another embodiment, the recipient may be a health care service provider or a security department. In yet another embodiment, there may be multiple recipients selected from a list including personal databases, health service providers, friends, family members, security departments, and the like. Recipients may also be selected automatically based on an analysis of the affective information, the image data, and the non-image data. In that case, as part of step 447, the decision may include, for example, comparing the affective information value, i.e., the excitement level determined at step 438, with a threshold previously specified for each recipient on the recipient list.
In another embodiment, the preference threshold set for each recipient is automatically established based on the individual user profile, such as a probability distribution of previously accumulated user preference levels. In one embodiment, a cumulative probability of 0.9 may be selected for the health service provider, so that the threshold will correspond to a value that is exceeded in only 10% of cases.
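As a hypothetical sketch of this recipient-selection logic (the recipient names, function names and example values are illustrative only), each recipient's threshold can be taken as the quantile of the user's accumulated values at that recipient's cumulative probability, and a recipient is selected whenever the current value exceeds it:

```python
import numpy as np

def threshold_from_profile(history, cumulative_probability):
    """Threshold taken from the user's accumulated values; it is exceeded in
    roughly (1 - cumulative_probability) of cases."""
    return float(np.quantile(np.asarray(history, dtype=float), cumulative_probability))

def select_recipients(affective_value, history, recipient_probabilities):
    """Return recipients whose individually derived threshold is exceeded."""
    return [name for name, p in recipient_probabilities.items()
            if affective_value > threshold_from_profile(history, p)]

# Example: with p = 0.9 the health service provider is contacted only for the
# top 10% of this user's accumulated values; the personal database gets everything.
history = [0.2, 0.4, 0.5, 0.6, 0.7, 0.9, 1.1, 1.3, 1.6, 2.0]
print(select_recipients(1.7, history,
                        {"personal_database": 0.0, "health_service_provider": 0.9}))
```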
In yet another embodiment, the personal affective tag can include a value selected from a range of preference values, such that the relative preference levels between the various captured images can be differentiated. In various embodiments, the recipients may be selected based solely on the results of the scene image analysis, or the recipients may be selected based on a combination of scene information and affective information, depending on the criteria established in step 440.
The corresponding image, personal affective tag, and other image metadata are transmitted, using the communication module 18 and a communication network such as that provided by the Internet service provider 20, to an identified recipient, such as a personal database of digital images (step 448). The personal image database may be stored, for example, on a separate desktop computer (not shown) or computer server (not shown).
In another embodiment, the corresponding images, personal affective tags, image metadata (including derived image information) are sent to a physician or other health service provider for additional analysis or evaluation of the user's specific affective response to a specific context. The corresponding images, personal affective tags, image metadata (including derived image information) can also be sent to a number of support networks, including family members and local emergency services.
The feedback information is displayed on the camera preview screen 22 or the communication screen 21 (step 450). The information is generated by a suitable software program and includes an image of the scene, the determined degree of excitement, or both. The information may also include, or consist solely of, sound signals, pre-recorded voice messages, computer-generated speech, or images. In another embodiment, the feedback may be sent by the physician or by the support networks to facilitate treatment.
If the obtained excitement level is below the threshold, the galvanic skin response signal and the scene image of user 2 are deleted (step 442). In another embodiment, if the obtained excitement level is below the threshold and if the user 2 is still viewing (e.g., on the preview screen 22) the same scene or a captured image of the scene, the image capture device 6 may selectively capture the next galvanic skin response segment and repeat steps 432 to 440 to determine if the user 2 has changed his galvanic skin response while the user 2 is viewing the captured image.
If the threshold is set to zero, all scene images captured by the image capture device 6 and the corresponding affective information (the excitement level or, in another embodiment, the raw galvanic skin response signal) will be permanently stored as affective information, either in a separate file in the image capture device 6 along with the image identifier and the user identifier, or in a personal affective tag as part of the image metadata.
Clearly, different users 2 have different physiological and facial responses to an image. Some users may exhibit strong physiological responses but only mild facial responses; other users may exhibit mild physiological responses and strong facial responses; still other users may exhibit mild physiological and facial responses. By combining different types of affective information, a more reliable measure of the emotional response of the user 2 to the scene can be obtained. The following embodiment therefore combines physiological and facial response information for the analysis of affective information.
Referring to FIGS. 6a and 6b, there is shown a flow chart illustrating an embodiment of the present invention for providing affective information based on a combination of the affective signals described with respect to FIGS. 3a, 3b, 5a and 5b, namely the degree of distinctiveness of the determined emotional classification and the degree of excitement, which are further combined to obtain an integral measure of emotional response.
In the present embodiment, method steps 510 through 528 correspond to method steps 110 through 128 of FIGS. 2a and 2b, with the only difference that in step 518 the user selects the "integrated facial expression and galvanic skin response" signal (hereinafter the "integrated" signal), or, alternatively, the image capture device 6 is pre-programmed to use the "integrated" signal.
Thus, the image capture device 6 captures an image of the face of user 2 and skin conductance information (steps 530 and 531, respectively).
The image capture device 6 determines the degree of distinctiveness (DD_EC) of the emotional classification from the facial expression, as previously described with respect to FIGS. 3a and 3b, steps 232 through 238 (step 532). The degree of excitement (DE) is determined by the image capture device 6 from the skin conductance in the same manner as steps 432 through 438 of FIGS. 5a and 5b (step 536).
Image capture device 6 determines the magnitude of the emotional response (step 538). This can be done in various ways. For example, the magnitude of emotional response can be expressed as the sum of two quantities:
Emotional response = DD_EC + DE
Information about the particular emotion experienced by the user may be obtained by consulting the emotion classification EC.
In another embodiment, the magnitude of the emotional response is determined by the square root of the sum of the squares of the two quantities.
In yet another embodiment, the magnitude of the emotional response may be calculated as a weighted sum of the two quantities:
Emotional response = w_DD · DD_EC + w_DE · DE
where the weights w_DD and w_DE are determined from the standard deviation of each normalized (divided by its maximum value) signal previously acquired for the particular user. In this case, the larger the standard deviation of a signal, the larger the weight of that signal's contribution to the measure of emotional response; conversely, the lower the standard deviation of a given signal, the lower the weight of its contribution. The reason for this dependency is the assumption that the standard deviation of a particular measure for a particular user reflects the degree of individual variation between different scenes. This means that the signal with the largest standard deviation has the greatest discriminating ability and, for that particular user, best reflects the emotional response.
For example, if different scenes cause large changes in the facial expression of user A but only small changes in skin conductance, the weight w_DD applied to the degree of distinctiveness of the emotional classification (DD_EC) determined from the facial expression may be larger than the weight w_DE applied to the degree of excitement (DE) determined from the skin conductance. On the other hand, if different scenes cause only small changes in the facial expression of user B but large changes in skin conductance, the relationship between the weights is reversed. Data regarding the maximum values and standard deviations of the corresponding signals are obtained from the individual user profile (step 539). This information is then used to update the individual user profile.
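The three combination rules described above (simple sum, root-sum-of-squares, and weighted sum with weights derived from the standard deviations of the normalized signal histories) can be sketched as follows; this is an illustrative interpretation only, and the function and argument names are hypothetical:

```python
import math

def emotional_response(dd_ec, de, dd_history=None, de_history=None, method="sum"):
    """Combine distinctiveness (DD_EC) and excitement (DE) into one magnitude.

    method == "sum":      DD_EC + DE
    method == "rss":      sqrt(DD_EC**2 + DE**2)
    method == "weighted": w_DD * DD_EC + w_DE * DE, where each weight is the
                          standard deviation of that signal after normalizing
                          the user's previously acquired values by their maximum.
    """
    if method == "sum":
        return dd_ec + de
    if method == "rss":
        return math.sqrt(dd_ec ** 2 + de ** 2)

    assert dd_history and de_history, "histories are required for the weighted method"

    def normalized_std(history):
        peak = max(history)
        xs = [v / peak for v in history] if peak else list(history)
        mean = sum(xs) / len(xs)
        return math.sqrt(sum((v - mean) ** 2 for v in xs) / len(xs))

    w_dd = normalized_std(dd_history)   # larger spread -> larger weight
    w_de = normalized_std(de_history)
    return w_dd * dd_ec + w_de * de
```

For a user whose facial expression varies much more across scenes than her skin conductance, normalized_std of the DD_EC history will dominate, which reproduces the weighting behavior described for users A and B above.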
As in the other embodiments described previously, the obtained magnitude of emotional response is compared to a criterion that reflects the importance of both the affective information determined in step 538 and the image information extracted in step 523 (step 540).
If the criteria are met, the image capture device 6 stores the corresponding image and the personal affective tag reflecting the magnitude of emotional response as part of the image metadata in an image file containing the scene image (steps 544 and 546). Alternatively, the personal affective tag indicating the magnitude of emotional response can be stored in a separate file in association with the image identifier and the user identifier. Further, the date on which the user observed the particular image (i.e., at the time of capture) may also be recorded as a separate item in the personal affective tag.
In another embodiment, the raw galvanic skin response signal and the facial image are stored as affective information either in a separate file on the image capture device 6 along with the image identifier and the user identifier, or in a personal affective tag as part of the image metadata, for later analysis, optionally by a separate system. For example, the scene image, the facial image and the raw galvanic skin response signal may be transmitted using the wireless communication module 18 (see FIG. 1) and the Internet service provider 20 to a separate desktop computer (not shown) or computer server (not shown) that can perform the analysis previously described with respect to steps 532 through 538.
The recipient is identified (step 547) in a manner similar to step 147 of FIGS. 2a and 2b above. In one embodiment, the recipient may be a personal database having an e-mail address or a web site address provided by the Internet service provider 20. In another embodiment, the recipient may be a health service provider or a security department. In yet another embodiment, there may be multiple recipients selected from a list including personal databases, health service providers, friends, family members, security departments, and the like. Recipients may be selected automatically based on an analysis of the affective information, the image data, and the non-image data. In this case, the decision made as part of step 547 may consist, for example, of comparing the affective information value (e.g., the magnitude of emotional response determined in step 538) with a corresponding threshold previously specified for each recipient on the recipient list.
In another embodiment, the threshold specified for the magnitude of emotional response for each recipient is automatically established based on the individual user profile (e.g., a probability distribution of previously accumulated emotional responses of the user). In one embodiment, a cumulative probability of 0.9 may be chosen for the health service provider, so that the threshold value established for the magnitude of emotional response would be equivalent to a value that would be exceeded in only 10% of cases.
In yet another embodiment, the personal affective tag can include a value selected from a range of response values so that the relative magnitudes of emotional response between the various captured images can be differentiated. In various embodiments, recipients may be selected based solely on the results of the scene image analysis or based on a combination of scene information and affective information, depending on the criteria established in step 540.
The corresponding image, personal affective tag, and other image metadata are sent to the identified recipient, e.g., a personal database of digital images, using the communication module 18 and the Internet service provider 20 (step 548). The personal database of images may be stored on a separate desktop computer (not shown) or computer server (not shown).
In another embodiment, the corresponding images, personal affective tags, and other image metadata and derived image information are sent to a physician or other health service provider for additional analysis of the user's specific emotional response to a particular situation. The corresponding images, personal affective tags, and other image metadata and derived image information can also be sent to a number of support networks, including family members.
The feedback information is displayed on the preview screen 22 or the communication screen 21 of the camera (step 550). The information is automatically generated by a suitable software program and may include an image of the scene, the determined magnitude of emotional response, or both. The information may also include, or consist solely of, sound signals, pre-recorded voice messages, computer-generated speech, or images.
In another embodiment, feedback may be sent to the physician or multiple support networks to facilitate treatment.
If the obtained emotional response magnitude is below the threshold, the facial image, the galvanic skin response of the user, and the scene image are deleted (step 542).
In another embodiment, if the obtained magnitude of emotional response is below the threshold and user 2 is still viewing the same scene or a captured image of the scene, for example on the preview screen 22, the image capture device 6 may optionally capture the next facial image, record the next galvanic skin response segment, and repeat steps 532 through 540 to determine whether user 2 has changed his or her facial expression and galvanic skin response while viewing the scene or the captured image of the scene.
If the threshold is set to zero, all scene images recorded by the image capture device 6 and the corresponding affective information (the magnitude of emotional response or, in another embodiment, the raw facial images and galvanic skin response signals) will be permanently stored as affective information, either in a separate file on the image capture device 6 along with the image identifier and the user identifier, or in a personal affective tag as part of the image metadata.
In another embodiment, different combinations of facial expressions, eye characteristics, and physiological responses may be used to create personal affective tags that classify scenes according to a broad range of emotional categories, such as "happy", "afraid", "angry", and the like. Table 1 gives examples of such classifications.
Table 1: emotion classification based on facial expressions, eye features and physiological responses
Different combinations of the signals described with respect to fig. 2a, 2b, 3a, 3b, 4a, 5a and 5b or other affective signals (e.g., derived from speech, EEG, brain scan, eye movement, eye image and others) can be used to create personal affective tags to classify scenes according to a wider range of affective categories.
A range of values within these categories can be used to further classify the images, for example as very happy, somewhat happy, moderate, somewhat sad, and very sad, etc.
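One simple, purely illustrative way to obtain such graded labels is to qualify the determined emotional category with a band of the determined magnitude; the band boundaries and label strings below are arbitrary examples, not values taken from the patent:

```python
def graded_label(emotion_class: str, magnitude: float,
                 bands=((1.5, "very"), (0.75, "somewhat"), (0.0, "slightly"))) -> str:
    """Refine an emotional category ("happy", "sad", ...) with a magnitude band."""
    for lower_bound, qualifier in bands:
        if magnitude >= lower_bound:
            return f"{qualifier} {emotion_class}"
    return emotion_class

print(graded_label("happy", 1.8))   # -> "very happy"
print(graded_label("sad", 0.9))     # -> "somewhat sad"
```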
The determined affective information, represented by the emotional classification, is then stored in a personal affective tag including the user identifier, as part of the image metadata. The determined affective information represented by the emotional classification may also be stored in a separate file on the computer together with the image identifier and the user identifier.
The computer program for creating and using personal affective tags on the image capture device can be recorded on one or more storage media, for example, magnetic storage media such as a diskette (e.g., floppy disk) or tape; optical storage media such as optical disks, optical tape, or machine-readable bar codes; solid state electronic storage such as Random Access Memory (RAM), or Read Only Memory (ROM); or any other physical device or medium employed to store a computer program comprising instructions which can implement the methods of the present invention.
The personal affective tag can also include information indicating the relative magnitude of the emotional response. As previously mentioned, the relative magnitude of the emotional response may be determined based solely on the affective information. Alternatively, affective and non-affective information can be combined to determine the relative magnitude of the emotional response. Examples of image metadata have been described above and include date and time information and location information (e.g., information obtained from the Global Positioning System (GPS) or a similar electronic locator). Analysis of the image itself may also be used as a source of non-affective information that can influence the relative importance level. As previously mentioned, the presence of particular subject matter in a scene is readily identified by current image processing and image understanding algorithms. For example, an algorithm for automatically determining main subjects in a photographic image is described in commonly assigned U.S. Patent No. 6,282,317, entitled "Method for automatic determination of main subjects in photographic images", filed by LUO on December 31, 1998. As described in the LIU et al. article cited previously, facial recognition algorithms that determine the presence of people, or of a particular group of people, in a scene may be used to increase the relative magnitude of the emotional response. Such algorithms may also be used to selectively process the image in order to improve its quality and accentuate the subject matter, as described in European Patent No. EP 1,211,637 filed by LUO et al., so that the image can be shared with selected persons or sent to the relevant authorities for security reasons.
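A hedged sketch of how affective and non-affective information might be combined into a single relative importance value is given below; the boost factors and the idea of multiplying by a face-count term are illustrative assumptions, not a formula disclosed in the patent:

```python
def relative_importance(emotional_response: float,
                        faces_detected: int = 0,
                        main_subject_score: float = 0.0,
                        face_boost: float = 0.25,
                        subject_weight: float = 0.5) -> float:
    """Combine the affective magnitude with non-affective image cues.

    Each detected face and a strong main-subject score raise the relative
    importance of the image; the specific weights are arbitrary examples.
    """
    return emotional_response * (1.0 + face_boost * faces_detected) \
           + subject_weight * main_subject_score
```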
FIG. 7 illustrates an embodiment of the present invention in which a plurality of image capture devices 6, each operating as illustrated in FIGS. 1 through 6, are used for personal safety and health monitoring purposes. Information is obtained from each device, and the details of an event and the human responses to it are reconstructed by integrating the scene images and affective information from multiple users. Thus, in steps 610 and 611, the user 602 and the user 603 turn on the power of their respective image capture devices 6.
As previously described with respect to steps 112 and 118 of FIGS. 2a and 2b, the users 602 and 603 enter their identification data, configure the signal settings and enter their recipient lists (steps 612 and 613). A user ID and a password may be used as identification data. In an alternative embodiment, the user video camera 10 is used in conjunction with facial recognition software to automatically determine the identity of the user and provide an appropriate user identifier, such as the user's name, personal identification number, or fingerprint data. In another alternative embodiment, the image capture device 6 is pre-programmed with user identifier data, so that no user identifier data need be entered.
The image capture device 6 of the first user 602 obtains an image of an event and the reaction of the first user (steps 614 and 616).
At the same time, a similar image capture device 6 of the second user 603 acquires a different image of the same event and the second user's reaction (steps 615 and 617).
The analysis of the event image is performed automatically, at step 618 for the first user and at step 619 for the second user. The analysis of the scene is performed using a process similar to that previously described in step 123. The processing is directed to identifying particular subject matter or to classifying the scene image, for example as an accident, a crime scene, an explosion, etc. The images may be further analyzed to determine the security status or to identify the persons involved. The scene image and any affective information can also be sent directly, without analysis, to an information processing center where such analysis is performed.
The reactions of the users 602 and 603, such as physiological responses (e.g., galvanic skin response), eye movement data (e.g., fixation time), facial expressions, or combinations thereof, are analyzed (steps 620 and 621) using an analysis process similar to steps 132 through 139, 232 through 239, 432 through 439, or 532 through 539 of FIGS. 2 through 6. In one embodiment, the result of this processing is to combine the degree of distinctiveness of the emotional classification obtained from the facial image of the user with the degree of excitement determined from the galvanic skin response signal in order to determine the magnitude of emotional response. In an alternative embodiment, the affective information in the form of raw signals is transmitted to a central station for analysis.
The results of the scene analysis and of the analysis of the users' reactions are compared to criteria, at step 622 for the first user and at step 623 for the second user. The criteria may reflect relevant image data, a magnitude of emotional response relative to a predefined threshold, or a combination of both types of data, similar to those previously described in steps 140, 240, 340, 440 and 540 of FIGS. 2, 3, 4, 5 and 6, respectively.
If the criteria are met, the personal affective tag, the corresponding scene image and the non-image data (e.g., the date and GPS signals) are sent to an appropriate recipient, such as an emergency processing center. This is done at step 624 for the first user and at step 625 for the second user. The information is also sent to the personal user databases.
The information received from the image capture devices 6 of the different users is analyzed and compared (step 626). One way of comparing such information is to sort it based on the GPS signals. If the GPS signals from the two devices indicate that the two users are at the same location, the image data, affective information, and other non-image data are combined to reconstruct a "multi-view" of the original event. The information may also be bundled together for later use in investigating related events.
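As an illustration of the comparison in step 626 (a sketch only: the data structure, the fixed angular distance used as a proximity test, and all names are assumptions, not part of the disclosure), reports can be clustered by approximate GPS co-location before their images and affective values are merged into a multi-view of the event:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Report:
    user_id: str
    lat: float
    lon: float
    image_id: str
    affective_value: float

def group_by_location(reports: List[Report],
                      max_separation_deg: float = 0.001) -> List[List[Report]]:
    """Cluster reports whose GPS coordinates roughly coincide. A fixed angular
    distance stands in for a proper geodesic distance, purely for illustration."""
    clusters: List[List[Report]] = []
    for r in reports:
        for cluster in clusters:
            ref = cluster[0]
            if (abs(r.lat - ref.lat) <= max_separation_deg
                    and abs(r.lon - ref.lon) <= max_separation_deg):
                cluster.append(r)   # same event location: add to the multi-view
                break
        else:
            clusters.append([r])    # new location: start a new cluster
    return clusters
```

Reports from the two users of FIG. 7 would fall into one cluster when their GPS readings coincide, and that cluster's images and affective values can then be examined together.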
An appropriate reaction is then taken (step 628). One example of such a reaction is dispatching police or an ambulance to the scene. Another example is providing feedback information to the user in the form of a visual or voice message on the preview screen 22. In the event of a disaster, the feedback information may include information guiding the user to a safe location.
In the embodiments described above, the images and the image capture system have been described in the form of digital images and a digital image capture system. Consistent with the principles of the present invention, an image of a scene may instead be captured in analog electronic form or on an optical medium such as photographic film or motion picture film. When an image is captured in one of these forms, the data representing the affective information can be recorded along with the associated image by recording the affective information separately from the image and identifying the image with which the information is associated by means of an identification code. Alternatively, the affective information can be encoded and recorded in association with the analog electronic image. In the case of photographic film, the affective information can be recorded on the film in optical or magnetic form. The affective information can also be recorded in an electronic memory associated with the film.
In accordance with the invention, the affective information has been described as being obtained at capture, at the time of capture, or during capture. As used herein, these terms encompass any period of time during which an image is composed or captured. Such time periods may also include the period immediately following capture, in which a captured image, or an image representative of the captured image, is reviewed in a quick review or preview manner, such as described in commonly assigned U.S. Patent No. 6,441,854, entitled "Electronic camera with quick review of last captured image", filed by Fellegara et al. on February 20, 1997, and commonly assigned U.S. Patent Application No. 09/012,144, entitled "Electronic camera with quick review and quick erase features", filed by Napoli et al. on January 20, 1998.