BACKGROUND

The present invention relates to the field of human-to-machine interfacing, more particularly to a multisensory channel approach for translating human emotions in a computing environment.
Human interactions and interfaces involving machines have been rapidly evolving as computing devices assume an ever more prominent role in business and social settings. From a business perspective, virtual meetings save time, expense, and hassle compared to in-person meetings. From an entertainment perspective, on-line virtual spaces (e.g., SECOND LIFE, ACTIVE WORLD, OPENSIM, WORLD OF WARCRAFT, etc.) provide a rich interactive marketplace, which can be customized for preferences of a user. From a socialization perspective, computing devices pervasively enable family and friends to maintain contact with one another.
One current weakness with interactions between humans and machines relates to emotional expressiveness. A common way for communicating emotions within electronic correspondence is to type a description of emotions within plain text, often preceded with an “emote” tag. Emoticons, which are icons or graphics denoting specific emotions, can also be used to exchange emotions within electronic correspondence.
Existing emotion conveyance techniques lack low level application integration and generally exist within a distinct communication channel isolated from the remainder of a running application. Additionally, today's emotion conveyance techniques are manually driven by a user, requiring explicit user input conveying the user's emotional state. Humans, however, are often reluctant to honestly convey their emotional state to others, even when capable of an accurate self-assessment, and such self-assessment can itself be problematic.
SUMMARY

One aspect of the disclosure includes a method and computer program product for incorporating human emotions in a computing environment. In this aspect, sensory inputs of a user can be received by a computing device. At least one of the sensory inputs can include a physiological input providing a physiological measurement from a body of the user. Each sensory input can be processed in a unique one of a set of standards-defined sensory channels, each corresponding to a specific emotion dimension. Processing the sensory inputs can transform the physiological measurement into an emotion dimension value. The emotion dimension values from each of the sensory channels can be aggregated to generate at least one emotion datum value, which is a standards-defined value for an emotional characteristic of the user. The emotion datum value can be a value independent of any sensory capture device and independent of any single one of the standards-defined sensory channels. A programmatic action driven by the emotion datum value can be performed.
Another aspect of the disclosure is for a system for incorporating human emotions in a computing environment. The system can include a set of discrete sensory channels, a set of in-channel processors, and a sensory aggregator. Each of the discrete sensory channels can be a standards-defined sensory channel corresponding to a specific emotion dimension. Sensory input handled within the sensory channels can include physiological input providing a physiological measurement from a body of the user. The in-channel processors can process sensory input specific to the channel and can generate emotion dimension values from the sensory input. Each emotion dimension value can be one that has been transformed to be independent of idiosyncrasies of a sensory capture device from which the sensory input was originally obtained. The sensory aggregator can aggregate emotion dimension values generated in a per-channel basis by the in-channel processors to generate at least one emotion datum value. The emotion datum value can be a standards-defined value for an emotional characteristic of a user from whom the sensory input was gathered. The emotion datum value can be a value independent of any single one of the standards-defined sensory channels and can be an application independent value that is able to be utilized by a set of independent applications to discern emotions of the user and to cause application specific code of the independent applications to be reactive to changes in sensory aggregator generated emotion datum values.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 shows use of multiple sensory channels that process raw sensory inputs, which are aggregated to translate human emotions into a computing environment in accordance with an embodiment of the inventive arrangements disclosed herein.
FIG. 2 provides further details for operations performed by an in-channel processor, a sensory aggregator, and/or an emotion data consumer in accordance with an embodiment of the inventive arrangements disclosed herein.
FIG. 3 is a flow chart of a method for processing sensory input through multiple sensory channels to generate standardized emotion datum values in accordance with an embodiment of the inventive arrangements disclosed herein.
DETAILED DESCRIPTION

This disclosure provides an approach for translating human emotions in a computing environment, where the approach utilizes multiple sensory channels. Each sensory channel can be associated with a specific raw input, which can be processed in-channel. The raw input can include physiological data (e.g., brain signal readings, heart rate, blood pressure, etc.) captured by physiological sensors, input manually entered by a user via a peripheral (keyboard, mouse, microphone, etc.), and other environmental inputs (e.g., video, audio, etc.) gathered by capture devices. Input can be optionally processed to filter out abnormalities, to normalize human specific input to baseline values, and to abstract sensory data from specifics of a capture device, thereby converting sensory channel input into a device independent form. Results from each sensory channel can be aggregated to determine a current emotional state of a user. Aggregated emotional information can be fed into one or more user interactive applications, which can programmatically react to the emotional information, thus providing an ability to tailor output and programmatic responses to a user's emotional state.
Historically, accessing human emotions via computer program products has occurred in an ad hoc manner using proprietary processing techniques. In contrast, the disclosure provides a systematic and extensible framework capable of adapting and refining itself as new emotion capture devices and analysis techniques emerge. Use of multiple different sensory channels provides a scalable solution, which is adaptable for a variety of different physiological sensors and other sensory capture devices. Initial processing of raw data can be handled by special purpose analysis engines, which may exist remote from a computing device with which a user interacts. An ability to remotely process discrete sensory channels is significant, as some analysis processes (e.g., facial expression analysis, voice stress analysis, semantic analysis of manually input content, etc.) for ascertaining user emotions can include resource intensive computing operations. In one embodiment, in-channel processing of input and/or aggregation of input from multiple channels can be performed as a software service, such as a Web service.
Processed output per sensory channel can be quantified in a discrete, manageable, and standardized format. Channel specific input can be further processed by a sensory aggregator, which can accept standardized channel input. The sensory aggregator can combine and weigh the per channel inputs to produce aggregate emotional data, which can also conform to a defined standard. For example, a universal format can be established for various emotions, which can be fed into an application in an easy to consume, standardized fashion. Thus, in a series of defined, repeatable, scalable, and standardized stages, complex raw sensory input can be captured, processed, aggregated, and distilled into a standardized usable format able to be consumed by emotion-reactive applications.
As will be appreciated by one skilled in the art, the disclosure may be embodied as a system, method, or computer program product. Accordingly, the disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Any combination of one or more computer usable or computer readable medium(s) may be utilized. The computer usable or computer readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer readable medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CDROM), an optical storage device, or a magnetic storage device.
Computer program code for carrying out operations of the disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
The disclosure is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
FIG. 1 shows use of multiple sensory channels 120 that process raw sensory inputs 122, which are aggregated to translate human emotions into a computing environment in accordance with an embodiment of the inventive arrangements disclosed herein. More specifically, one or more sensory capture devices 110 can capture raw (or pre-processed) sensory input 122 from a user 105. Each distinct sensory input 122 can be initially handled within a distinct sensory channel 120, where an in-channel processor 130 can transform the sensory input 122 to processed input 124 accepted by a sensory aggregator 132. Hence, the in-channel processor 130 can receive raw (minimally processed, or pre-processed) sensory data 122 and can produce channel processed input 124.
In one embodiment, the input 122 can be pushed (from a sensory capture device 110) to the in-channel processor 130, although pull (from processor 130) based embodiments are also contemplated, as are embodiments where an intermediary data cache is established between the sensory capture device 110 and processor 130. The sensory data 122 can be received as digital or analog data, where analog data can be converted to a digital form. In one embodiment, the sensory inputs 122 can be optionally sampled, where some types of inputs 122 can require more or less sampling than others. Additionally, an input unit (e.g., a unit or set of data to be processed in a processing cycle) received and/or handled by an in-channel processor 130 can be variable. In one embodiment, the channel processed input 124 can be a change reactive data element, which is only conveyed from the in-channel processor 130 when the generated value has changed from a previously determined value (and/or when a change from a last determined value exceeds an established (but configurable) variance threshold).
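By way of a non-limiting illustration, the following Java sketch shows one possible way such a change reactive data element could be realized, where a newly computed emotion dimension value is only conveyed downstream when it differs from the last reported value by more than a configurable variance threshold. The class and field names are hypothetical and are not part of any defined standard.

// Hypothetical sketch of change-reactive reporting by an in-channel processor.
public class ChangeReactiveChannelOutput {

    private final double varianceThreshold; // configurable per channel
    private Double lastReportedValue;       // null until a first value is reported

    public ChangeReactiveChannelOutput(double varianceThreshold) {
        this.varianceThreshold = varianceThreshold;
    }

    /**
     * Returns the new emotion dimension value if it should be conveyed to the
     * sensory aggregator, or null if the change from the last reported value
     * does not exceed the configured variance threshold.
     */
    public Double report(double newValue) {
        if (lastReportedValue == null
                || Math.abs(newValue - lastReportedValue) > varianceThreshold) {
            lastReportedValue = newValue;
            return newValue;   // convey channel processed input downstream
        }
        return null;           // suppress; no meaningful change
    }
}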
The sensory aggregator 132 can use the input 124 to generate standardized emotion data 126, which an emotion data consumer 134 utilizes to produce emotion adjusted output 128 presentable upon output device 135. In one embodiment, sensory aggregator 132 can be triggered each time a new input 124 is received, which results in a new emotion datum 126 instance being selectively generated and pushed to the consumers 134. In another embodiment, aggregator 132 can evaluate newly received inputs 124 in a cyclic fashion and/or responsive to a received update command. Further, embodiments are contemplated where one or more data consumers 134 pull data from aggregator 132, as are embodiments where an intermediary data cache receives datum 126 pushed from aggregator 132, which are later pulled (responsive to cache directed queries) by one or more data consumers 134.
The sensory aggregator 132 can include an algorithm that weights each input 124, where weights are configurable. For example, the aggregator 132 can weigh a voice derived input 124 more heavily than a heart rate input 124, which is weighed more heavily than a facial expression based input 124, which is weighed more heavily than a skin response based input 124, which is weighed more heavily than a brain signal input 124. In one embodiment, aggregator 132 can initially classify inputs 124 as being indicative of either positive emotions (e.g., happy, excited, calm, etc.) or negative emotions (e.g., sad, bored, frantic, etc.), where standardized emotion datum 126 result from combining positive and negative scores for a given standardized emotion element and assessing whether the resultant score exceeds a previously established certainty threshold representing a minimal certainty required before an emotion element is established (and standardized emotion datum 126 for that element is generated).
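The following Java sketch illustrates, in a non-limiting fashion, one possible weighting scheme of the kind described above, in which per channel scores for a single emotion element are combined using configurable weights and a datum is only produced when the combined score meets a certainty threshold. The names, weights, and thresholds shown are assumptions made for the illustration.

import java.util.Map;

// Illustrative weighting scheme for the sensory aggregator; not a defined standard.
public class WeightedEmotionAggregator {

    private final Map<String, Double> channelWeights; // e.g., "voice" -> 0.35
    private final double certaintyThreshold;          // minimum combined score

    public WeightedEmotionAggregator(Map<String, Double> channelWeights,
                                     double certaintyThreshold) {
        this.channelWeights = channelWeights;
        this.certaintyThreshold = certaintyThreshold;
    }

    /**
     * Combines per-channel scores for a single emotion element (e.g., "happy").
     * Positive scores indicate the emotion is present, negative scores that it
     * is absent; a datum is produced only when the weighted sum is certain enough.
     */
    public Double aggregate(Map<String, Double> perChannelScores) {
        double combined = 0.0;
        for (Map.Entry<String, Double> entry : perChannelScores.entrySet()) {
            double weight = channelWeights.getOrDefault(entry.getKey(), 0.0);
            combined += weight * entry.getValue();
        }
        return combined >= certaintyThreshold ? combined : null;
    }
}

Consistent with the example above, the weight map could, for instance, assign a larger weight to the voice derived channel than to the heart rate channel, and progressively smaller weights down to the brain signal channel.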
Different techniques can be used by the sensory aggregator 132 to handle conflicting inputs 124. For example, in one embodiment, when conflicts exist, conflicting inputs 124 having low corresponding weights can be discarded. In another embodiment, only inputs expected to have a high correspondence (e.g., heart rate and respiratory rate) can trigger a conflicting situation when an expected correspondence is lacking, in which case conflicting inputs 124 can be discarded from computations (all other inputs 124 being evaluated by aggregator 132). Additionally, recent patterns can be taken into consideration by the sensory aggregator 132. For example, sudden changes in input values 124 can be critiqued to determine whether the change indicates a shift in an emotion of the user 105, which can be highly significant, or indicates the existence of a data abnormality, which should be largely ignored.
Additionally, since results produced by the sensory aggregator 132 can be used for different purposes, different configurable modes can exist, which produce different results 126. For example, one processing mode can bias sensory aggregator 132 to generate a relatively smooth set of emotions that gradually transition over time. Such a mode may be preferred by a consumer 134 wanting data 126 indicative of general emotional states of a user 105, but wishing to minimize dramatic emotional shifts (e.g., such a mode can minimize emotional “thrashing”). One way of implementing such a mode is to establish relatively high variance thresholds for the aggregator 132 and/or to implement a smoothing function, where emotional datum 126 are only reported when consistent for a set duration and when a very high level of confidence in the accuracy of the datum 126 exists. In contrast, a different processing mode can encourage rapid reporting of user 105 emotional shifts, which may be significant/anticipated when user 105 is being intentionally presented with output designed to elicit an emotional response. In general, the sensory aggregator 132 can be implemented in a highly flexible and configurable manner to generate output 126 suitable for needs of different emotion data consumers 134.
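A minimal, non-limiting sketch of the smoothing oriented mode follows, assuming that a datum is only reported once the same candidate emotion has persisted for a configurable duration with at least a configurable confidence; the class, method, and parameter names are illustrative only.

// Hypothetical smoothing mode: an emotion datum is only reported once the same
// candidate value has persisted for a minimum duration with high confidence.
public class SmoothingModeReporter {

    private final long minStableMillis;     // required duration of consistency
    private final double minConfidence;     // required certainty of the datum

    private String candidateEmotion;
    private long candidateSince;

    public SmoothingModeReporter(long minStableMillis, double minConfidence) {
        this.minStableMillis = minStableMillis;
        this.minConfidence = minConfidence;
    }

    /** Returns the emotion to report, or null if it is not yet stable enough. */
    public String offer(String emotion, double confidence, long nowMillis) {
        if (confidence < minConfidence || !emotion.equals(candidateEmotion)) {
            // Low confidence or a different emotion restarts the stability clock.
            candidateEmotion = emotion;
            candidateSince = nowMillis;
            return null;
        }
        return (nowMillis - candidateSince) >= minStableMillis ? emotion : null;
    }
}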
In one embodiment, generated emotion datum 126 can be normalized for standard values. The standard values can be collected from data of several users. In one configuration, standard values themselves can be based upon historical data obtained from a sample set of users connected to sensory capture devices 110. This historical data can be analyzed for patterns. The historical data can optionally be continuously collected, updated, and analyzed so that the standard values themselves and patterns associated with these values evolve over time. A best-fit baseline based upon the standard values can be initially used as a starting point for a user 105, where this starting point can be optionally tuned over time. That is, user 105 specific attributes can be recorded and adjusted for during aggregator 132 processes.
In one embodiment, user 105 specific attributes can be continuously updated using a learning algorithm, so that for a given user 105 the accuracy of the generated datum 126 should increase over time. In another embodiment, instead of implementing a learning algorithm, a calibration algorithm can be used, where a system is initially calibrated for a specific user 105 to improve accuracy, but where the system does not necessarily dynamically adapt (learn) to user 105 specific idiosyncrasies over time. A calibration state can even vary from session to session, assuming that input 124 varies in a relatively consistent manner depending upon different moods of the user 105 (which the calibration determines).
Although it is possible to produce emotion adjusted output 128 for display (via device 135) to a user 105, such a step is optional. In some instances, for example, user 105 emotion data 126 can be captured (possibly without explicit user 105 involvement/knowledge) for use in studies (e.g., marketing or psychology studies concerned with user emotion responses to computer presented stimuli) or for business purposes (e.g., call center responses can be transferred away from agents exhibiting hostile emotions when engaged in service calls).
Emotion adjusted output 128 has a wide variety of application areas. For example, users participating in any virtual environment, such as Web based social networking sites or virtual communities (e.g., SECOND LIFE, OPEN SIM, any Massively Multiplayer Online Role-Playing Game (MMORPG), etc.), can communicate, where communications are laden with automatically determined and dynamically changing emotional expressions. Marketing and/or advertising firms can capture and process emotional data from a focus group exposed to a possible advertisement to obtain quantified data indicative of the advertisement's success from an emotional perspective. People with physical/medical conditions, which make expression of emotions challenging, can better express their emotions through machine translation achievable via this disclosure.
Numerous embodiments, including networked embodiment 151 and stand-alone embodiment 150, exist for transforming the raw sensory input 122 to produce the standardized emotion data 126 using one or more computing devices executing computer code products. In networked embodiment 151, a user interactive computing device 152 can be linked to sensory capture devices 110, which generate the raw sensory input 122. The raw sensory input 122 can be conveyed over a network 105 to zero or more channel processing servers 154, each including an in-channel processor 130. Channel processed input 124 can be conveyed over network 105 to an aggregation server 156 running sensory aggregator 132. One or more application servers 158, each executing an emotion data consumer 134 computer program product, can consume standardized emotion data 126. The application server 158 can interface with the user interactive computing device 152 over network 105. Input and output (including emotion adjusted output 128) can be directly conveyed between the user interactive computing device 152 and the application server 158. Embodiment 151 is just one contemplated networked embodiment; others exist.
Optional use of remotely located networked resources for channel specific processes provides a capability to offload intensive computing operations to specialized devices. For example, facial analysis and speech analysis to ascertain an emotion of a user can both be resource intensive analysis operations, which can be handled by a server (e.g., server 154) having sufficient resources dedicated to that type of analysis. An ability to use networked devices to off-load processing tasks also permits a commoditization of one or more functions described herein as a software service. Thus, a competitive marketplace of services for ascertaining emotions of a user can arise, where providers compete to provide sensory channel specific processing 130, output customizations 128 based on standardized emotion data 126, and/or channel aggregation (e.g., sensory aggregator 132) services. When network options are used in an embodiment of the disclosure, user specific settings for emotion interpretation (per channel and/or in aggregate) can be centrally maintained and utilized regardless of which of many end computing devices (e.g., user interactive computing device 152) a user 105 is using.
In one embodiment, implementation specifics of the networked embodiment 151 can conform to a service oriented architecture (SOA). That is, a loose coupling of services with operating systems, programming languages, and other technologies that underlie applications (e.g., emotion data consumers 134) can exist for processing and aggregating sensory input within sensory channels 120 to generate emotion data 126 and to produce emotion adjusted output 128. For example, the functions of the in-channel processor 130, sensory aggregator 132, and other functions described herein (for example, selecting a channel for handling a type of sensory input and routing the input accordingly, as shown by items 315 and 320 of FIG. 3) can be implemented as distinct units of services. Developers can make these services accessible (such as via a Universal Description, Discovery and Integration (UDDI) repository or other service directory), where other developers can reuse and/or modify the services. The services can communicate with each other by passing data from one service to another and/or by coordinating activity between two or more services. In one embodiment, the SOA used to implement the functions described herein can conform to an open standard. For example, at least a portion of the functions described herein can be implemented in a standards based fashion within a JAVA 2 ENTERPRISE EDITION (J2EE) application server, such as an IBM WEBSPHERE server.
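By way of a non-limiting illustration, the in-channel processing function could be described by a Java service contract along the following lines, which could then be published to a service directory and deployed within an application server; the operation name, parameter types, and identifiers shown are assumptions made for this sketch only.

// Illustrative service contract for exposing in-channel processing as a
// discrete, reusable service in an SOA deployment.
public interface InChannelProcessingService {

    /**
     * Transforms one unit of raw sensory input (122) from a named capture
     * device into a device independent emotion dimension value (124).
     *
     * @param channelId  identifier of the sensory channel (emotion dimension)
     * @param deviceId   identifier of the sensory capture device
     * @param rawPayload the raw (or pre-processed) sensory data
     * @return the channel processed emotion dimension value
     */
    double process(String channelId, String deviceId, byte[] rawPayload);
}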
Despite many possible advantages of offloading some processing to a network, stand-alone embodiments (e.g., embodiment 150) are possible using techniques described herein. Stand-alone (or partially stand-alone) capabilities can be beneficial in many situations (e.g., when a device 152 is offline, when configurable settings restrict access to emotion laden information, etc.). In stand-alone embodiment 150, a user interactive device 152 can include in-channel processor 130, sensory aggregator 132, and an emotion data consumer 134.
Each computing device 152, 154, 156, 158 of FIG. 1 can be implemented as one or more computing devices (e.g., including stand-alone as well as distributed computing devices), which is shown as computing device 160. The computing device 160 can include physical devices and/or virtual devices (implemented in a layer of abstraction over hardware using virtualization techniques). Device 160 can execute computer program products, which include software and firmware 170, using underlying hardware 162. The hardware 162 can include a processor 164, a volatile memory 165, and a non-volatile memory 166 linked via a bus 167. Additional components, such as network interface cards/ports (not shown), can be included in the hardware 162. For example, hardware 162 of the user-interactive computing device 152 and/or devices connected to network 105 geographically proximate to user 105 can include one or more sensory capture devices 110 and/or can include a port linked to a sensory capture device 110. The port can include a wired or wireless linkage for exchanging data. Wired ports can include, but are not limited to, a universal serial bus (USB) port, a serial port (e.g., keyboard, mouse), a parallel port (e.g., a parallel printer port), a firewire port, and the like. Wireless ports can include wireless transceivers conforming to a BLUETOOTH, WIRELESS USB, ZIGBEE, WIFI, or other wireless data exchange standard.
The software/firmware 170 can optionally include an operating system 172 upon which applications 173 execute. Thus, a device 160 can be a general purpose device hosting/running emotion transformation specific computer program products. In another embodiment, device 160 can be a special purpose device lacking a discrete operating system 172. Instead, the device 160 can be dedicated (purposed) for a single purpose, such as performing in-channel processing or sensory aggregation in a highly optimized manner. Whether implemented on top of an operating system 172 or not, the applications 173 can include, but are not limited to, in-channel processor 130, sensory aggregator 132, and/or emotion data consumer 134.
Each sensory capture device 110 can produce sensory input 122. The sensory input 122 can include physiological data 189, user input 181, and environmental input 185. Physiological data 189 can include human physiological data of user 105, which can include measurements taken from a system of a human body (e.g., cardiovascular system, muscular system, circulatory system, nervous system, respiratory system, excretory system, endocrine system, digestive system, and the like). Further, physiological data 189 can include data from the fields of biochemistry, biophysics, biomechanics, and the like. Physiological data 189 can be obtained from a physiological sensor 188.
Physiological sensors 188 can include any device or peripheral able to capture data specific to a body of user 105, which can be invasive or non-invasive as well as active or passive. Physiological sensors 188 can include, but are not limited to, blood pressure sensors (e.g., sphygmomanometer), pulse detectors, brain-wave (P300, Mu, Alpha, etc.) sensors (e.g., electroencephalography sensors, magnetoencephalography sensors, hemoencephalography sensors, etc.), electromyography sensors, skin response sensors, fluid sensors (e.g., blood metabolite sensors, oxygen saturation in body tissue sensors, etc.), skin temperature sensors, respiration sensors, conductive ink and textile sensors, and the like.
Environmental input 185 can include data passively obtained from an environment proximate to the user 105. For example, environmental input 185 can include images, video, and audio captured by an audio/video capture device 184. Environmental input 185 can also include an environmental temperature of an environment proximate to the user 105, input from pressure sensors/scales, motion sensor input, and the like. In one embodiment, the environmental input 185 can include images/video of a face of a user 105, which is processed to discern a facial expression of the user 105. Other body language interpretation analytics can also be performed to determine sensory input (e.g., body language can be analyzed to determine if user 105 is nervous, calm, indecisive, etc.). Environmental input 185 can include speech analyzed for voice patterns indicative of emotions of the user 105.
User input 181 can include information intentionally and manually entered by a user 105 using a user input peripheral 180 (e.g., mouse, joystick, keyboard, microphone, touch pad, etc.). In one embodiment, metadata of the user input 181 can be evaluated to determine emotions of user 105 (e.g., typing pattern analysis, hand steadiness when manipulating a joystick/pointer, etc.). In another embodiment, semantic content of the user input 181 can be analyzed to ascertain emotions of user 105.
The different inputs 122 (including inputs 181, 185, 189) can be combined within a single sensory channel 120 to accurately determine a given emotional dimension (associated with a channel 120) of a user 105. For example, one channel 120 or emotional dimension can be for semantically analyzed content of input 181, another emotional dimension can be for facial/body analysis results, another emotional dimension can be for voice analysis results, another for respiratory sensor results, another for brain signal results, another for muscular analysis results, another for circulatory results, another for excretory system results, and the like. The actual categories established for each emotional dimension are able to vary. It can be significant, however, that each emotional dimension (associated with a discrete sensory channel 120) be defined, so that sensory input 122 can be processed in a channel specific fashion. This permits channel processed input 124 to be generated, which is abstracted from specifics of the sensory capture device 110 and which may be normalized in a user 105 independent fashion. Thus, channel processed input 124 can be handled in a uniform manner by the sensory aggregator 132.
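A non-limiting Java sketch of such routing follows, in which each raw input type is mapped to the discrete sensory channel (emotion dimension) that handles it; the channel names and input type tags are illustrative assumptions rather than a defined standard.

import java.util.Map;

// Hypothetical routing of raw sensory input to a discrete sensory channel.
public class SensoryChannelRouter {

    /** One channel per emotion dimension, as described above. */
    public enum Channel {
        SEMANTIC_CONTENT, FACIAL_AND_BODY, VOICE_ANALYSIS,
        RESPIRATORY, BRAIN_SIGNAL, MUSCULAR, CIRCULATORY
    }

    // Example mapping from an input type tag to the channel that handles it.
    private static final Map<String, Channel> ROUTES = Map.of(
            "typed-text", Channel.SEMANTIC_CONTENT,
            "video-face", Channel.FACIAL_AND_BODY,
            "audio-voice", Channel.VOICE_ANALYSIS,
            "heart-rate", Channel.CIRCULATORY,
            "eeg", Channel.BRAIN_SIGNAL);

    /** Selects the sensory channel for a raw input, or null if unrecognized. */
    public Channel route(String inputType) {
        return ROUTES.get(inputType);
    }
}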
To elaborate upon the inputs 122 by example, one potential input 122 can include a heart rate input, which is a type of physiological data 189. A heart rate monitor (device 110; sensor 188) can consist of two parts, a transmitter attached to a belt worn around a chest of user 105 and a receiver worn around the wrist of the user 105. As the user's heart beats, the electrical signal that causes the heart muscle to contract can be detected through the skin via the chest belt. The belt can transmit an electromagnetic signal containing heart rate data to the wrist receiver, which displays the heart rate to a user. This data can also be fed into a communicatively linked computing device 160, which sends the heart rate data (possibly after pre-processing it) to a suitable in-channel processor 130 for handling.
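By way of a non-limiting illustration, the heart rate handling within its sensory channel could resemble the following Java sketch, in which a reading is normalized against a user specific resting baseline to yield a device independent dimension value in a fixed range; the baseline and scaling choices are assumptions made for the example.

// Illustrative transformation of a heart-rate reading (physiological data 189)
// into a device independent, baseline-normalized emotion dimension value.
public class HeartRateChannelProcessor {

    private final double restingBaselineBpm; // user-specific baseline, e.g. from calibration

    public HeartRateChannelProcessor(double restingBaselineBpm) {
        this.restingBaselineBpm = restingBaselineBpm;
    }

    /**
     * Maps beats-per-minute into a value in [0, 1], where 0 means at or below
     * the resting baseline and 1 means roughly double the baseline.
     */
    public double toDimensionValue(double beatsPerMinute) {
        double relative = (beatsPerMinute - restingBaselineBpm) / restingBaselineBpm;
        return Math.max(0.0, Math.min(1.0, relative));
    }
}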
In another example, the input 122 can include a measurement of electrical activity produced by the brain of the user 105, as recorded by electronics of an electroencephalography (EEG) device (e.g., device 110; device 188) positioned proximate to the user's scalp. The electrical signals of the user's brain can be present at a microvolt level and can be amplified using the EEG. The amplified signals can be digitized and conveyed as input 122 to in-channel processor 130.
In still another example, the input 122 can include voice input, which is processed in accordance with a set of one or more voice recognition and/or voice analysis programs. The voice input can be semantically processed for key words and/or phrases indicative of a corresponding emotion. Voice levels, pitches, inflections, speaking rate, and the like can also be analyzed to discern emotions of the speaker (e.g., user 105). The input 122 examples above are provided as illustrative examples only, and are not intended to be limiting.
In FIG. 1, network 105 can include any hardware, software, and firmware necessary to convey digital content encoded within carrier waves. Content can be contained within analog or digital signals and conveyed through data or voice channels and can be conveyed over a local area network (LAN) or a wide area network (WAN). The network 105 can include local components and data pathways necessary for communications to be exchanged among computing device components and between integrated device components and peripheral devices. The network 105 can also include network equipment, such as routers, data lines, hubs, and intermediary servers which together form a packet-based network, such as the Internet or an intranet. The network 105 can further include circuit-based communication components and mobile communication components, such as telephony switches, modems, cellular communication towers, and the like. The network 105 can include line based and/or wireless communication pathways.
Each of the computing devices 152-158 can include and/or be communicatively linked to one or more data stores. Each data store can be physically implemented within any type of hardware including, but not limited to, a magnetic disk, an optical disk, a semiconductor memory, a digitally encoded plastic memory, a holographic memory, or any other recording medium. Each data store can be a stand-alone storage unit as well as a storage unit formed from a set of physical devices, which may be remotely located from one another. Additionally, information can be stored within each data store in a variety of manners. For example, information can be stored within a database structure or can be stored within one or more files of a file storage system, where each file may or may not be indexed for information searching purposes.
FIG. 2 provides further details for operations performed by the in-channel processor 130, the sensory aggregator 132, and/or the emotion data consumer 134 in accordance with an embodiment of the inventive arrangements disclosed herein. Details provided in FIG. 2 include specifics for one contemplated implementation of the components and/or elements of FIG. 1. FIG. 2 is not to be construed as limiting the disclosure in any manner, and alternative implementation elements consistent with the scope of the claims presented herein are to be considered within scope of this disclosure.
Data shown in FIG. 1 includes sensory input 122, channel processed input 124, standardized emotion datum 126, and emotion adjusted output 128. Although each input and output 122-128 of FIG. 1 includes sample values, these values are included for illustrative purposes and are not to be construed to limit a scope of this disclosure. For example, although the sensory input 122 is shown as being a data instance for blood pressure, which includes a systolic and diastolic measure, a variety of other sensory input 122 types can be handled by processor 130, as previously described.
The input/output 122-128 can include numerous attributes defining a data instance. These attributes can include an input category, a name/identifier, a value, a strength, a certainty, and the like. These attributes can be defined by a known standard and can be extensible. Further, the input/output 122-128 specific attributes can be encoded and conveyed among devices using standardized formats, such as being XML encoded and conveyed via TCP/IP based protocols.
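The following Java sketch illustrates, without limitation, a data carrier holding the attributes named above together with one hypothetical XML serialization; the class, attribute, and element names are illustrative and do not represent a published standard.

// Hypothetical representation of a standardized emotion datum (126).
public class EmotionDatum {

    private final String category;   // e.g., "positive"
    private final String name;       // e.g., "happy"
    private final double value;      // normalized value
    private final double strength;   // intensity of the emotion
    private final double certainty;  // confidence in the assessment

    public EmotionDatum(String category, String name,
                        double value, double strength, double certainty) {
        this.category = category;
        this.name = name;
        this.value = value;
        this.strength = strength;
        this.certainty = certainty;
    }

    /** Serializes the datum to a simple XML form for conveyance over TCP/IP. */
    public String toXml() {
        return String.format(
            "<emotionDatum category=\"%s\" name=\"%s\" value=\"%.2f\" "
            + "strength=\"%.2f\" certainty=\"%.2f\"/>",
            category, name, value, strength, certainty);
    }
}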
In-channel processor 130 can include a filter 210, a sensory data processing component 212, a learning and calibration module 214, a dimensional emotion evaluation component 216, a data store 220, and/or other such components. The data store 220 can include device specific data 222, user specific data 224, configurable channel specific rules 226, and mappings to a specific standard 228 to which the generated processed input 124 conforms.
The filter 210 can adjust input 122 before it is processed by the sensory data processing component 212. These adjustments can include, but are not limited to, sampling input 122, analog-to-digital converting input 122, pre-processing input 122, format converting input 122, etc.
The sensory data processing component 212 can transform a raw value into a score/value usable by processor 130. The score generated by component 212 can be device independent, having been adjusted using device specific data (for a sensory capture device 110). The score can also be adjusted for a specific user based upon user specific data 224. Thus, user specific abnormalities present in data 122 can be removed and/or taken into account.
For example, when the processed input 122 is voice input, user specific language, dialect, grammar patterns, and speech characteristics can be included in data 224 and used by a speech to text converter and/or a speech pattern analyzer (both being instances of processing component 212). In another example, the processed input 122 can include image data that can be analyzed for patterns (by processing component 212) to determine emotions from a user's facial expressions (which can use user specific data 224 to increase accuracy).
In one embodiment, a set of configurable channel rules 226 can be established. These rules 226 can be used to adjust behavior of the sensory data processing component 212. The rules 226 can be user/administrator defined, sensory capture device 110 specific, as well as application (e.g., emotion data consumer) specific.
The learning and calibration module214 can adjust values of theprocessor130 to improve accuracy over time. The values can be part of an automated feedback cycle, which results in a learning capability for the processor. For each user,historical data122,124 can be maintained. This historical data can be analyzed for patterns to ascertain a likelihood of emotions being correctly and/or incorrectly evaluated. Sensory data processing212 parameters, channel rules226, and other datum can be tuned based on the historical data to improve an accuracy of the channel processedinput124 for a specific individual. In one embodiment, a specific training routine can be implemented, which feeds a known set of test data for a user and having a known set of desired results specific to the user into the in-channel processor130, which results in the module214tuning processor130 for user specifics.
The dimensional emotion evaluation component 216 can convert a score/value computed by the processing component 212 into a standardized value/form. Component 216 can use to-standard mapping data 228. In one embodiment, component 216 can convert data obtained from a source not conforming to a desired standard (e.g., information from a standard non-compliant Web site) into a standardized value of channel processed input 124.
Sensory aggregator 132 can include a dimension weighing aggregator 230, an aggregate emotion evaluation module 232, a learning and calibration module 234, a data store 240, and/or other such components. The data store 240 can include user specific data 244, configurable aggregator rules 246, and mappings to a specific standard 248 to which the standard emotion datum 126 conforms.
The dimension weighing aggregator 230 can weigh different emotion dimensions differently relative to each other. In one embodiment, a user data set including user specific data 244 can be established, which permits the weighing to occur in a user specific manner. For example, some humans' facial expressions are more easily and accurately read than others, so aggregator 230 can adjust weights based upon user specific data 244 accordingly.
The aggregate emotion evaluation module 232 can include one or more algorithms. In one embodiment, code of the module 232 can be adjusted/changed based upon configurable aggregator rules 246. Different algorithms/computations can be more accurate than others based upon human-to-human variances, which module 232 can adjust for, driven by data 244. The to-standard mapping 248 data can be used to transform an internal representation of emotion datum into a defined standard. In one embodiment, aggregator 132 can adjust for multiple different standards (or to changes in standards) using mapping data 248 of the relevant standards/standard changes.
The learning and calibration module 234 can adjust values of the aggregator 132 to improve accuracy over time. The values can be part of an automated feedback cycle, which results in a learning capability for the aggregator 132. For each user, historical data 124, 126 can be maintained. This historical data can be analyzed for patterns to ascertain a likelihood of emotions being correctly and/or incorrectly evaluated. Weighing (230) and aggregating (232) parameters, aggregator rules 246, and other parameters can be tuned based on the historical data to improve an accuracy of the standard emotion datum 126 for a specific individual. In one embodiment, a specific training routine can be implemented, which feeds a known set of test data for a user, having a known set of desired results specific to the user, into the sensory aggregator 132, which results in the module 234 tuning aggregator 132 for user specifics.
Emotion data consumer 134 can include application code 250, event triggers 252 tied to emotion datum values, output handler 254, data store 260, and/or other such components. The data store 260 can include output plug-ins 264, configurable consumer rules 266, and mappings 268 from a defined standard to application specific actions.
The application code 250 of the emotion data consumer 134 can be specific to a particular application, such as an IM client/server application, a virtual world application, a social networking application, and the like. Event triggers 252, which respond to information contained in standardized emotion datum 126, can be linked to the application code 250. For example, when a given emotion has a strength and certainty greater than a defined threshold, programmatic actions of the application code 250 can be triggered.
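A non-limiting Java sketch of such an event trigger 252 follows, in which an application supplied action is fired when an incoming datum matches a named emotion and exceeds strength and certainty thresholds; the class, method, and field names are assumptions made for the illustration.

import java.util.function.Consumer;

// Illustrative event trigger (252): fires an application specific action when
// an emotion datum's strength and certainty both exceed configured thresholds.
public class EmotionEventTrigger {

    /** Minimal emotion datum carrier used by this sketch. */
    public record Datum(String name, double strength, double certainty) {}

    private final String emotionName;        // e.g., "frustrated"
    private final double strengthThreshold;
    private final double certaintyThreshold;
    private final Consumer<Datum> action;    // hook into application code 250

    public EmotionEventTrigger(String emotionName, double strengthThreshold,
                               double certaintyThreshold, Consumer<Datum> action) {
        this.emotionName = emotionName;
        this.strengthThreshold = strengthThreshold;
        this.certaintyThreshold = certaintyThreshold;
        this.action = action;
    }

    /** Evaluates an incoming datum and fires the linked action if it qualifies. */
    public void onDatum(Datum datum) {
        if (emotionName.equals(datum.name())
                && datum.strength() >= strengthThreshold
                && datum.certainty() >= certaintyThreshold) {
            action.accept(datum);
        }
    }
}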
The output handler 254 can alter application output based upon datum 126. For example, handler 254 can generate text, images, sounds, and the like that correspond to a given emotion datum 126, which are applied to an application context (e.g., programmatically linked to code 250).
The plug-ins 264 can provide a means for extending actions taken by consumer 134 responsive to the standardized emotion data 126. For example, an output plug-in 264 can adjust an avatar's displayed image from a standard image set to a set of images having customized facial expressions dynamically adjusted to correspond to the data 126. Similarly, the configurable consumer rules 266 can permit users/administrators to adjust behavior of application code 250 and/or output handler 254 in a tailored manner. In one embodiment, the consumer 134 can be capable of accepting standards based emotion datum 126 for more than one standard. The from-standards mapping 268 can be modified to permit consumer 134 to respond to these different standards and/or changes in a single standard handled by consumer 134.
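By way of a non-limiting illustration, an output plug-in 264 could be described by a Java contract along the following lines, shown here with a hypothetical avatar expression implementation; the method names and file naming convention are assumptions made for this sketch.

// Illustrative output plug-in contract (264): implementations adjust
// application output in response to standardized emotion data.
public interface EmotionOutputPlugin {

    /** Human-readable identifier for configuration (consumer rules 266). */
    String pluginId();

    /**
     * Maps a standardized emotion name and strength to an output artifact,
     * such as the file name of an avatar image with a matching expression.
     */
    String render(String emotionName, double strength);
}

// Example implementation: picks an avatar image by emotion name and intensity.
class AvatarExpressionPlugin implements EmotionOutputPlugin {
    public String pluginId() { return "avatar-expression"; }
    public String render(String emotionName, double strength) {
        return "avatar_" + emotionName + (strength > 0.7 ? "_intense" : "") + ".png";
    }
}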
FIG. 3 is a flow chart of a method 300 for processing sensory input through multiple sensory channels to generate standardized emotion datum values in accordance with an embodiment of the inventive arrangements disclosed herein. Method 300 can be performed in context of a system shown in FIG. 1 and FIG. 2 and/or performed in context of any other suitable computing environment. In method 300, one or more sensory capture devices can receive sensory input from a user. The captured sensory input can include physiological data from a physiological sensor as well as environmental input and/or user input.
In step 310, the captured sensory input can be received by a computing device. In step 312, the sensory input can be optionally filtered as needed and/or appropriate. The filtering can occur locally on a device and can remove anomalous input at its source, so that it is not processed within a sensory channel. In one embodiment, local pre-processing actions can occur in addition to and/or instead of the filtering. The pre-processing can, for example, digitize, sample, aggregate, average, and otherwise adjust the raw sensory input before it is sent in-channel for processing.
In step 315, a sensory channel for the received input can be determined. Different standards-defined types of sensory channels can exist, where each is dedicated to handle a specific emotional dimension. An emotional dimension can correspond to one or more defined types or categories of sensory input. The input can be routed to a processor associated with the determined sensory channel, as shown in step 320. Input can be continuously received, which is shown by the flow of the method 300 selectively branching from step 320 to step 310.
After sensory input has been routed, the receiving processor can optionally define a processing unit, which is a set of data (such as a time window of data) to be analyzed, as shown by step 325. In one embodiment, the processing unit can include a single value. In another embodiment, the processing unit can include two or more input values.
In step 330, the unit of input can be processed in the sensory channel to transform the unit of input into an emotion dimension value. In step 335, the emotion dimension value can be optionally normalized from a unique user specific value to a standard user value. In one embodiment, this normalization can represent an adjustment of the process or algorithm that generates the dimension value, as opposed to being a post processing of the dimension value to achieve a substantially equivalent result. In step 340, capture device specifics can be optionally abstracted to make the emotion dimension value a device independent one. This abstraction can occur within code of an algorithm that generates the emotion dimension value, or as a pre or post processing step.
The generated emotion dimension value, which has been generated and/or processed within a sensory channel to this stage, can be conveyed to a sensory aggregator. Input can be continuously processed within sensory channels and dimension values can be continuously sent to a sensory aggregator, which is expressed by method 300 selectively proceeding from step 345 to step 325.
In step 350, after a sensory aggregator has received dimension values from multiple channels, an algorithm can execute that aggregates/weighs dimensional values to create a standardized emotion datum value. In optional step 355, the emotion datum value can be converted from a value unique to a user to a standardized value that is user independent. In optional step 360, the emotion datum value can be adjusted based upon configurable rules. Either of the steps 355 or 360 can be performed within code of the algorithm that generates the emotion datum value and/or can be performed as a pre or post processing step. After it is generated, the emotion datum value can be made available to one or more applications, as shown by step 365. The process that generates the emotion datum value can be continuous, which is shown by the method 300 flow proceeding from step 365 to step 350.
One or more applications can respond to the emotion datum values as these values are generated and made accessible. In step 370, one or more of these applications can consume (or utilize) these emotion datum values. These values can be programmatically linked to application specific events. The firing of these events can result in application specific programmatic actions being performed, as shown by step 375.
The programmatic actions can, for example, generate output (viewable by one or more users) which communicates an emotion of the user from whom the sensory input was captured to one or more other users. The programmatic actions can, in another example, alter logic of an application with which a user interacts in reaction to an emotion condition of a user. For instance, in a call center environment, emotions of agents can be monitored/determined and calls can be automatically routed to other agents (based upon code of a call center application responding to emotion triggered events) whenever one agent becomes agitated, angry, or experiences another emotion that may be detrimental to professional handling of a call.
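The call center scenario could, for example, be realized by consumer side code along the lines of the following non-limiting Java sketch, in which calls are routed away from an agent whose emotion datum indicates agitation above a threshold; the CallQueue interface and the emotion name used are assumptions made for the illustration, not part of any real call center product.

// Hypothetical consumer-side reaction for the call-center example.
public class AgentEmotionMonitor {

    /** Minimal stand-in for a call routing facility. */
    public interface CallQueue {
        void excludeAgent(String agentId);
        void includeAgent(String agentId);
    }

    private final CallQueue queue;
    private final double agitationThreshold;

    public AgentEmotionMonitor(CallQueue queue, double agitationThreshold) {
        this.queue = queue;
        this.agitationThreshold = agitationThreshold;
    }

    /** Called whenever a new emotion datum for an agent becomes available. */
    public void onAgentEmotion(String agentId, String emotion, double strength) {
        if ("agitated".equals(emotion) && strength >= agitationThreshold) {
            queue.excludeAgent(agentId);   // stop sending new calls to this agent
        } else {
            queue.includeAgent(agentId);   // agent is calm; resume normal routing
        }
    }
}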
Method 300 can include additional and/or alternative processing steps not explicitly shown in FIG. 3. For example, although not expressed in FIG. 3, the method 300 can include a learning and calibration step, where an in-channel processor and/or aggregator can be trained/calibrated to increase processing accuracy. The processing of method 300 can be designed to occur in real-time, near real-time, and/or after appreciable processing delays depending upon implementation specifics.
The diagrams in FIGS. 1-3 illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be performed substantially concurrently, or the blocks may sometimes be performed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.