In the context of human–computer interaction, a modality is the classification of a single independent channel of input/output between a computer and a human. Such channels may differ in sensory nature (e.g., visual vs. auditory)[1] or in other significant aspects of processing (e.g., text vs. image).[2] A system is designated unimodal if it implements only one modality, and multimodal if it implements more than one.[1] When multiple modalities are available for some tasks or aspects of a task, the system is said to have overlapping modalities; when multiple modalities are available for the same task, the system is said to have redundant modalities. Multiple modalities can be used in combination to provide complementary methods that may be redundant but convey information more effectively.[3] Modalities are generally defined in two directions: computer-to-human and human-to-computer.
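The distinctions above can be made concrete with a small sketch. This is a hypothetical model (the class, task names, and modality names are illustrative, not from any standard API) in which a system maps each task to the set of modalities that can serve it; a system is multimodal when it uses more than one channel overall, and a task has redundant modalities when more than one channel can perform it.

```python
from dataclasses import dataclass, field

@dataclass
class InteractiveSystem:
    """Hypothetical model: maps each task to the modalities that can serve it."""
    task_modalities: dict[str, set[str]] = field(default_factory=dict)

    def all_modalities(self) -> set[str]:
        # Union of every channel used anywhere in the system.
        if not self.task_modalities:
            return set()
        return set().union(*self.task_modalities.values())

    def is_multimodal(self) -> bool:
        # Multimodal: more than one independent channel is implemented.
        return len(self.all_modalities()) > 1

    def redundant_tasks(self) -> list[str]:
        # A task has redundant modalities when several channels can serve it.
        return [task for task, mods in self.task_modalities.items() if len(mods) > 1]

system = InteractiveSystem({
    "enter text": {"keyboard", "speech"},   # redundant: two channels, one task
    "view output": {"screen"},
})
print(system.is_multimodal())    # True
print(system.redundant_tasks())  # ['enter text']
```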
Computers utilize a wide range of technologies to communicate and send information to humans:
Any human sense can serve as a computer-to-human modality. However, sight and hearing are the most commonly employed, since they can transmit information faster than the other senses: 250 to 300[4] and 150 to 160[5] words per minute, respectively. Though less commonly implemented as a computer-to-human modality, touch (tactition) can achieve an average of 125 wpm[6] through the use of a refreshable Braille display. More common forms of tactile output are smartphone and game-controller vibrations.
Computers can be equipped with various types of input devices and sensors that allow them to receive information from humans. Common input devices are often interchangeable if they have a standardized method of communication with the computer and afford practical adjustments to the user. Certain modalities can provide a richer interaction depending on the context, and having options for implementation allows for more robust systems.[7]
With the increasing popularity of smartphones, the general public is becoming comfortable with more complex modalities. Motion and orientation sensing are commonly used in smartphone mapping applications, speech recognition in virtual assistant applications, and computer vision in camera applications that scan documents and QR codes.
Having multiple modalities in a system gives more affordance to users and can contribute to a more robust system. It also allows for greater accessibility for users who work more effectively with certain modalities. Multiple modalities can serve as a backup when certain forms of communication are not possible, especially in the case of redundant modalities, in which two or more modalities communicate the same information. Certain combinations of modalities can enrich a computer-to-human or human-to-computer interaction, because each modality may be more effective at expressing one form or aspect of information than the others.
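The backup role of redundant modalities can be sketched as a simple fallback dispatcher. This is an illustrative example only (the `Channel` class and channel names are hypothetical): the same message can travel over more than one output modality, so when one channel is unavailable, another carries the information.

```python
class Channel:
    """Minimal stand-in for an output modality (names are illustrative)."""
    def __init__(self, name, available):
        self.name = name
        self._available = available

    def available(self):
        return self._available

    def send(self, message):
        print(f"[{self.name}] {message}")

def deliver(message, channels):
    """Send via the first available modality; redundant channels act as backup."""
    for channel in channels:
        if channel.available():
            channel.send(message)
            return channel.name
    raise RuntimeError("no output modality available")

# Audio is unavailable (e.g., a muted device), so the visual channel takes over.
chosen = deliver("Battery low", [Channel("audio", False), Channel("visual", True)])
print(chosen)  # visual
```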
There are six types of cooperation between modalities; they help define how a combination, or fusion, of modalities conveys information more effectively.[8]
Complementary-redundant systems use multiple sensors to form a single understanding or dataset; the more effectively the information can be combined without duplicating data, the more effectively the modalities cooperate. Multiple modalities for communication are common, particularly in smartphones, and their implementations often work together toward the same goal, for example a gyroscope and an accelerometer working together to track movement.[8]