This application claims priority from U.S. provisional patent application serial No. 61/284,622, filed December 22, 2009, which is hereby incorporated by reference.
Detailed Description
The present application is directed to systems and methods for human verification by contextually iconic visual public Turing tests. The following description contains specific information pertaining to the implementation of the present invention. One skilled in the art will recognize that the present invention may be implemented in a manner different from that specifically discussed in the present application. Moreover, some of the specific details of the invention are not discussed in order not to obscure the invention. The specific details not described in the present application are within the knowledge of a person of ordinary skill in the art. The drawings in the present application and their accompanying detailed description are directed to merely exemplary embodiments of the invention. In the interest of brevity, other embodiments of the invention that use the principles of the present invention are not specifically described in the present application and are not specifically illustrated by the present drawings.
FIG. 1 shows a schematic diagram of an image for performing a contextually iconic visual public Turing test for human verification, according to one embodiment of the invention. As shown in the display 100 of FIG. 1, the iconic visual image or picture is selected to match a particular context. A puzzle such as that shown in FIG. 1 may be displayed to the user as a prerequisite to registering with an online community focused on classic Disney animation. Because the user is registering to participate in such a community, it may be reasonable to assume that the user is somewhat familiar with the classic Disney characters shown in display 100 of FIG. 1, allowing the user to easily recognize and select the correct answers with minimal effort.
Because human verification challenges can be tailored to a specific context, users can enjoy a new game-like experience, taking advantage of built-in audience recognition through familiar and recognizable characters and subject matter. Rather than struggling with arbitrary, boring, context-free verification challenges that use unfamiliar words and phrases rendered as ambiguous, difficult-to-read text, as with traditional CAPTCHAs, the user may instead select from friendly graphical visual cues originating from common and well-known content providers or brands. As a result, content providers may see increased user retention, as users remain fully engaged and may actually enjoy the verification step rather than perceiving verification as an unrelated chore standing apart from the desired content.
If the user is unfamiliar with the characters shown in the display 100 of FIG. 1, the challenge question may be rephrased as a more general or universal question. For example, the user may click an "I don't know" button to indicate a lack of familiarity with the characters. In response, instead of requiring specific character names as shown in FIG. 1, the user may instead be asked to "find dogs, ducks and elephants". Thus, the puzzle shown in FIG. 1 can also serve as a general puzzle applicable even in ambiguous contexts, since the puzzle can be tailored to each particular user's knowledge of and familiarity with the content. For example, the Disney character puzzle shown in FIG. 1 may be suitable for children- and family-oriented websites, even those not directly related to classic Disney animation. In this manner, contextually iconic visual public Turing tests may be provided as a common service to third parties, relieving them of having to develop and manage their own human verification systems.
As shown in the display 100 of FIG. 1, techniques may be used to strengthen defenses against automated systems while reducing the cognitive load on humans. For example, as shown in pictures 5 and 6, individual pictures may feature a plurality of distinct characters or objects. This feature advantageously increases the difficulty of automated image recognition, which must now detect several distinct and possibly obscured objects in a particular scene. On the other hand, providing pictures with multiple characters allows additional flexibility to accommodate different human responses. For example, while "1, 2, 8" is shown as the response in FIG. 1, alternative responses may instead include "2, 5" or "1, 2, 5", because picture 5 includes both "Donald" and "Goofy". Because users may differ in their cognitive processes and puzzle-solving strategies depending on age, culture, personality, or other factors, providing several different valid answers helps accommodate a wide range of human responses.
While the embodiment shown in the display 100 of FIG. 1 allows the freedom to select multiple pictures before submitting an answer, alternative embodiments may use, for example, a combination-lock concept. In this alternative embodiment, the user may be prompted to answer a series of questions one after another. For example, a combination lock with three questions may successively require the user to first find Donald, then find Minnie, and finally find Goofy. Only at the end of all three questions may the system inform the user whether the answers are correct or incorrect; the detailed results may be hidden to prevent disclosure of information that could be used for cheating. Additionally, if a particular user provides several incorrect answers within a short period of time, time-based restrictions may be enforced to deter automated entry attempts.
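The combination-lock flow and the time-based restriction described above might be sketched as follows. This is a minimal illustration only, not the application's implementation; the question texts, answer key, lockout threshold, and window length are all assumed for the example.

```python
import time

class CombinationLockChallenge:
    """Sketch of a sequential 'combination lock' challenge with a
    time-based lockout. All parameters here are illustrative assumptions."""

    def __init__(self, questions, answer_key, max_failures=3, window_seconds=60):
        self.questions = questions    # e.g. ["Find Donald", "Find Minnie", "Find Goofy"]
        self.answer_key = answer_key  # e.g. [5, 2, 1] -- one picture number per question
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failures = []            # timestamps of recent failed attempts

    def locked_out(self, now=None):
        now = time.time() if now is None else now
        recent = [t for t in self.failures if now - t < self.window_seconds]
        return len(recent) >= self.max_failures

    def attempt(self, picks, now=None):
        """Evaluate all picks at once; only pass/fail is revealed, never
        which individual question was missed."""
        if self.locked_out(now):
            return "locked"
        if picks == self.answer_key:
            return "pass"
        self.failures.append(time.time() if now is None else now)
        return "fail"
```

Reporting only an aggregate pass/fail result, as in `attempt`, keeps an automated guesser from learning which of the three questions it answered correctly.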
Alternatively, rather than selecting from a grid of pictures, the user may be required to directly identify a particular picture by typing a response or by selecting from a list of options. For example, the user may be prompted to type the name of the character in picture 1, "Goofy", or the user may be presented with a drop-down menu of different character names, with "Goofy" provided as one of the selectable menu options. To deter brute-force attacks, the drop-down menu may include several incorrect alternatives.
In another embodiment, the challenge question may be posed in a completely visual or symbolic way, without requiring the user to understand any written content. This may be particularly suitable for younger audiences or for audiences who do not speak English. For example, an animated guide may be displayed demonstrating to the user that the object of the exercise is to match particular characters to particular pictures. Rather than requiring the user in writing to "find Donald, Minnie, and Goofy", the challenge question may be displayed as, for example, an image of Donald, an image of Minnie, and an image of Goofy, with equals signs and question marks, or another set of commonly understood symbols, indicating that the user should match these images to the displayed pictures. Of course, to thwart automated systems, the images selected for the challenge question may differ from those shown in the pictures. To make the goal more apparent, an example puzzle may first be displayed to the user, and the process of selecting a solution may be demonstrated by, for example, animating a mouse pointer or playing back a demonstration video. Thus, generic visual cues may be used to present challenge questions, providing a friendly user interface that does not depend on comprehension of any single written language.
As shown in the display 100 of FIG. 1, the pictures are arranged in a user-friendly three-by-three grid, which may advantageously be mapped to the numeric keypad of a keyboard, mobile phone, remote control, or another input device. Alternative embodiments may use any other arrangement of pictures to suit a particular input device, display, or other aspect of the operating environment. For example, a server generating the puzzle shown in the display 100 of FIG. 1 may detect the user's operating environment to directly map a numeric keypad to the corresponding pictures for input. Thus, the user may simply type numbers directly to answer the puzzle. Of course, conventional pointing devices, such as a mouse, trackpad, or touch screen, may still be supported for picture selection.
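The keypad mapping described above can be sketched as follows, assuming the row-major one-through-nine numbering of the grid in FIG. 1; the function name and the zero-based cell convention are choices made for the example.

```python
# Map a phone-style numeric keypad digit to a cell of the three-by-three
# picture grid of FIG. 1 (row-major, pictures numbered 1 through 9).
GRID_ROWS, GRID_COLS = 3, 3

def key_to_cell(key: int) -> tuple:
    """Return the zero-based (row, col) of the picture a keypad digit selects."""
    if not 1 <= key <= GRID_ROWS * GRID_COLS:
        raise ValueError("key outside grid")
    return divmod(key - 1, GRID_COLS)
```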
An optional non-visual verification method may also be provided for accessibility. Conventionally, this is done by providing a spoken phrase and requiring the user to transcribe it. However, the inventive concepts can be extended to these audio verification methods as well. For example, the voice of a recognizable actor or character may recite a particular phrase, and the user may be prompted to transcribe the phrase and further identify the actor or character who spoke it. Alternatively, the user may be prompted to provide non-written feedback, such as selecting a picture of the actor or character matching the spoken phrase, thereby allowing users with limited language skills to verify successfully. Thus, as an alternative to iconic visuals, contextually recognizable voice-overs also provide enhanced user engagement by taking advantage of built-in audience awareness and familiarity with the speaking style and intonation of a particular character. Moreover, the additional requirement to provide a specific character or actor name or identity can serve as an effective deterrent to automated systems, as sophisticated audio analysis may be required to produce a correct automated response.
Moving to FIG. 2, FIG. 2 shows a schematic diagram of a database table containing data for a contextually iconic visual public Turing test for human verification, according to one embodiment of the invention. Each visual icon shown in the display 100 of FIG. 1 may have one or more associated tags and may be stored in a database for organized access and retrieval. Each entry in database table 200 may include, for example, a unique identifier ("ID"), a path to an associated image file (not shown in FIG. 2), an associated area or subject matter ("Area"), a group owner ("Group"), and descriptive tags ("Tags") describing the visual content of the image file. The "Tags" field may be implemented using one or more secondary tables, as is known in the art. The path to the image file may use an obfuscation technique, such as one-way hashing, to prevent guessing from the file name or other plain-text content. In addition, the individual images may be kept hidden from the end user by, for example, generating a single image file with all of the selected images embedded. Further, randomization of image placement, image size, and file names, along with other obfuscation techniques, may be used to thwart automated systems.
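A minimal sketch, in Python with SQLite, of how database table 200 and a secondary "Tags" table might be laid out, together with a one-way-hashed image path. The table names, column names, and the server-side secret are illustrative assumptions, not details from the application.

```python
import hashlib
import sqlite3

# Assumed schema: one row per image, with tags normalized into a
# secondary table keyed by image id, as described for table 200.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE images (
        id   INTEGER PRIMARY KEY,
        path TEXT NOT NULL,
        area TEXT NOT NULL,
        grp  TEXT NOT NULL
    );
    CREATE TABLE tags (
        image_id INTEGER REFERENCES images(id),
        tag      TEXT NOT NULL
    );
""")

def obscured_path(plain_name: str, secret: str = "server-secret") -> str:
    """One-way hash of the file name so the served URL reveals nothing
    about the pictured character (the secret value is an assumption)."""
    return hashlib.sha256((secret + plain_name).encode()).hexdigest() + ".png"

# Entry matching ID 2 of table 200: picture 2 of FIG. 1, tagged
# "Minnie", "Mouse", and "Female".
conn.execute("INSERT INTO images VALUES (?, ?, ?, ?)",
             (2, obscured_path("minnie"), "Common Characters", "Disney"))
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [(2, "Minnie"), (2, "Mouse"), (2, "Female")])
```

Because a SHA-256 hex digest contains only the characters 0-9 and a-f, the character name can never leak through the generated file name.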
In addition to static or still-picture images, alternative embodiments may animate characters or other objects using, for example, HTML5, Adobe Flash, Microsoft Silverlight, JavaScript, or other methods of displaying dynamic content. Animated content, such as movie files, vector animation, or real-time rendered content, may be more difficult for automated systems to analyze and may also present a more appealing visual image to the user relative to standard still-picture content. However, in situations where bandwidth or processing power is limited, such as on mobile phones, still-picture images may be preferred for network load or performance reasons.
For simplicity, the pictures or icons 1 through 9 in the display 100 of FIG. 1 correspond directly to the database entries 1 through 9 in the database table 200 of FIG. 2. However, in alternative embodiments, images may be randomly selected from database table 200 using particular criteria, such as "select 9 random entries from the Common Characters area in the Disney group", i.e., from database entries 1 through 11. Additionally, while only 18 entries are shown in database table 200 for simplicity, alternative embodiments may include a sufficiently large number of entries to defeat the brute-force matching techniques of automated systems.
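Such a criteria-based random selection might look like the following sketch, with an in-memory list of dictionaries standing in for the rows of database table 200; the field names and helper are assumptions for illustration.

```python
import random

# Stand-ins for the 18 rows of database table 200: entries 1-11 are
# Disney "Common Characters", entries 12-18 belong to another group.
entries = ([{"id": i, "area": "Common Characters", "group": "Disney"}
            for i in range(1, 12)] +
           [{"id": i, "area": "Characters", "group": "ABC"}
            for i in range(12, 19)])

def select_puzzle_entries(entries, area, group, count=9, rng=random):
    """Pick `count` random entries matching the contextual criteria,
    e.g. 9 random entries from the Common Characters area, Disney group."""
    pool = [e for e in entries if e["area"] == area and e["group"] == group]
    if len(pool) < count:
        raise ValueError("not enough entries for this context")
    return rng.sample(pool, count)  # sampling without replacement
```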
In addition to the Disney characters shown in database entries 1 through 11, database entries 12 through 18 concern images related to the ABC program "Desperate Housewives". This set of images may be applied contextually, for example, to a discussion forum for "Desperate Housewives". Since viewers of the "Desperate Housewives" program will be familiar with the characters, identifying the characters should be a trivial exercise for potential forum registrants but a difficult task for automated systems. Beyond animated characters or live actors, any arbitrary objects may be used for the images linked within the database table 200. Thus, another set may comprise images of classic cars, which may be used to verify registration with a forum for classic car enthusiasts. In this way, the database table 200 may accommodate image sets for several different contexts, thereby allowing selection of the most relevant human verification content for a particular user audience or for a particular subject or brand.
As shown in database table 200, each particular entry may include several tags, each describing some general or specific aspect of the respective referenced image. For example, the image associated with ID 2, shown as picture or icon number 2 in the display 100 of FIG. 1, is associated with the tags "Minnie", "Mouse", and "Female". When a puzzle question is generated, any associated tags may be used as selection criteria. Because tags may be implemented using secondary tables, additional attributes, such as, for example, the category or type of a tag, may also be embedded within the database to help address differing levels of knowledge and comfort among users. Thus, the user may be asked to identify the specific character "Minnie", or simply to find a "Mouse", or to find "Female" characters. As discussed above, more specific questions may be asked first, with more general questions held in reserve as a fail-safe.
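The specific-first, general-as-fail-safe questioning described above can be sketched as a fallback over tag categories. The specificity ordering and the question phrasing are assumptions for the example.

```python
# Assumed ordering of tag categories from most to least specific; each
# press of "I don't know" advances one level toward a more general question.
TAG_SPECIFICITY = ["name", "species", "gender"]

def next_question(image_tags, level):
    """image_tags: tag-category -> tag for one image, e.g. the ID 2 entry
    {"name": "Minnie", "species": "Mouse", "gender": "Female"}.
    Returns the challenge phrasing for the given fallback level."""
    if level >= len(TAG_SPECIFICITY):
        raise ValueError("no more general question available")
    return "Find " + image_tags[TAG_SPECIFICITY[level]]
```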
Moving to FIG. 3, FIG. 3 shows a schematic diagram of a system for performing a contextually iconic visual public Turing test for human verification, in accordance with one embodiment of the present invention. The diagram 300 of FIG. 3 includes a verification server 310, a database 320, images 330, a content provider 340, a client 350, an input device 360, and a display 370. The verification server 310 includes a processor 311. The display 100 of FIG. 1 may correspond to the display 370 of FIG. 3, and the database table 200 of FIG. 2 may be included in the database 320 of FIG. 3.
The diagram 300 of FIG. 3 shows a representative network configuration in which a third-party content provider 340 utilizes a verification server 310 to verify whether an accessing client is human-controlled or automated. However, alternative embodiments may combine the functionality of verification server 310 and content provider 340 into a single entity. A public network, such as the Internet, can support the communication links between the components of diagram 300. Continuing with the example discussed above in connection with FIGS. 1 and 2, the content provider 340 may provide a public discussion forum targeted at children and families. The public discussion forum may provide features such as polls, message boards, social networking, and other services that may be adversely affected if exposed to automated systems or non-human control. For example, a bot may be programmed to generate fake accounts and vote multiple times to manipulate poll results, or to disseminate spam, malware, and other malicious content through the provided message board and social networking features. To prevent such behavior, it is desirable to verify whether a client is human-controlled or automated, and to grant access only to human-controlled clients.
Thus, prior to providing a user account to the client 350, the content provider 340 should verify that the client 350 is human-controlled rather than an automated system or bot. By prior mutual arrangement, for example, the content provider 340 may thus request that the verification server 310 determine whether the client 350 is controlled by a human. As previously discussed, the verification server 310 may select entries from the database 320 based on the particular context of the content provider 340. Because content provider 340 serves a children- and family-friendly audience, verification server 310 may contextually select entries from database 320 related to classic Disney animation, which may be readily recognized by the target audience. As previously discussed, a selection query, such as "select 9 random entries from the Common Characters area in the Disney group", may be executed against database 320. Each entry may be linked to an image file stored in images 330 or may be stored directly in database 320.
After retrieving the entries produced by the selection query against the database 320, the verification server 310 may then use the retrieved entries to generate a challenge question and a corresponding answer set for display to the client 350 via the display 370. In an alternative embodiment providing audio for accessibility, the images 330 may be supplemented by audio files, and the display 370 may be supplemented by an audio output device such as headphones or speakers. The user may then submit a response to the challenge question using the input device 360, which may comprise a keypad, remote control, mouse, touch screen, or any other input device. Verification server 310 may then determine whether the submission from client 350 matches the answer set and inform content provider 340 accordingly. Assuming a positive result, the content provider 340 may then allow the client 350 to register a new user account for full community participation.
Moving to FIG. 4, FIG. 4 shows a flowchart describing steps by which a contextual graphical visual public Turing test for human verification may be performed, according to one embodiment of the invention. Certain details and features have been left out of flowchart 400 that are apparent to a person of ordinary skill in the art. For example, a step may comprise one or more substeps or may involve specialized equipment or materials, as known in the art. Although steps 410 through 460 indicated in flowchart 400 are sufficient to describe one embodiment of the present invention, other embodiments of the invention may utilize steps different from those shown in flowchart 400.
Referring to step 410 of flowchart 400 in FIG. 4 and diagram 300 of FIG. 3, step 410 of flowchart 400 includes processor 311 of verification server 310 receiving a request from content provider 340 to verify whether client 350 is under human control. Continuing with the example discussed above, content provider 340 may comprise a web server providing a children- and family-oriented discussion forum and community. The client 350 may access the content provider 340 over the Internet using a web browser and may express an interest in registering a new user login. Before the content provider 340 allows the client 350 to register as a new user, it may send a request to the verification server 310 to verify whether the client 350 is controlled by a human. In this manner, a check against automated systems may be provided.
Referring to step 420 of flowchart 400 in FIG. 4 and diagram 300 of FIG. 3, step 420 of flowchart 400 includes processor 311 of verification server 310 selecting, by contextual criteria, a plurality of images from database 320, each having one or more associated tags. For example, a table similar to database table 200 of FIG. 2 may be included in database 320. The entries in database table 200 may include image links (not shown) referencing image files stored in images 330 of FIG. 3. As previously discussed, the contextual criteria may include the intended audience, subject matter, or brand of the content provider 340. In this example, because the intended audience comprises children and families, the selection may narrow its focus to the "Common Characters" area of the "Disney" group. As shown in database table 200, each entry or image has one or more associated tags. Continuing with the example discussed above in connection with FIG. 1, step 420 may select 9 database entries corresponding to the images matching ID numbers 1 through 9.
Referring to step 430 of flowchart 400 in FIG. 4 and diagram 300 of FIG. 3, step 430 of flowchart 400 includes processor 311 of verification server 310 generating a challenge question and a corresponding answer set based on the associated tags of a subset of the plurality of images selected in step 420. For example, as discussed above in connection with FIG. 1, an iconic visual three-by-three grid corresponding to the images selected in step 420 may be displayed, and the question may be formulated by asking the user to find one or more images from the grid matching selected and displayed tags. In the example shown in FIG. 1, the question asks the user to find three images from the three-by-three grid based on character name tags. As previously discussed, a number of alternative implementations may also be used, such as a combination-lock process, selection from a drop-down menu, a request to type or write a reply identifying an image, or a requirement to transcribe an audio recording of a phrase and provide the identity of the speaker. Furthermore, as previously discussed, challenge questions may be constructed such that there are multiple correct answers in the corresponding answer set. For example, some images may be associated with multiple tags, allowing them to serve as correct answers for any of their associated tags.
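A sketch of how step 430's answer set could admit multiple correct answers follows. The tag assignments are assumptions chosen to be consistent with FIG. 1 as described in the text (picture 1 shows Goofy, picture 2 shows Minnie, picture 5 shows both Donald and Goofy, picture 8 shows Donald).

```python
from itertools import product

def generate_challenge(images, requested_tags):
    """images: {picture_number: set of tags}. Returns the question text and
    the set of all valid picture selections (as frozensets)."""
    question = "Find " + ", ".join(requested_tags)
    # For each requested tag, the set of pictures carrying that tag.
    per_tag = [frozenset(n for n, tags in images.items() if t in tags)
               for t in requested_tags]
    # Every way of picking one matching picture per requested tag is valid,
    # so a single picture carrying two tags can satisfy both at once.
    answer_set = {frozenset(combo) for combo in product(*per_tag)}
    return question, answer_set
```

With the FIG. 1-style data, the set of valid selections includes both the three-picture answers and the two-picture answer in which picture 5 covers "Donald" and "Goofy" simultaneously.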
Referring to step 440 of flowchart 400 in FIG. 4 and diagram 300 of FIG. 3, step 440 of flowchart 400 includes processor 311 of verification server 310 displaying the plurality of images from step 420 and the challenge question from step 430 to client 350. Thus, a display 370 connected to the client 350 may show an interface similar to the display 100 of FIG. 1. Alternatively, if the challenge question from step 430 is audio-based, a set of speakers or headphones connected to client 350 may serve as the output channel instead.
Referring to step 450 of flowchart 400 in FIG. 4 and diagram 300 of FIG. 3, step 450 of flowchart 400 includes processor 311 of verification server 310 receiving from client 350 a submission of an answer to the challenge question displayed in step 440. Thus, a user of client 350 may select three pictures using input device 360, as shown in display 100 of FIG. 1, and click the "Submit answer" button. Alternatively, if provided, the user may click the "I don't know" button to restart the process at step 430, where the challenge question is reformulated using more general criteria, such as requiring selection of animal categories rather than specific character names.
Referring to step 460 of flowchart 400 in FIG. 4 and diagram 300 of FIG. 3, step 460 of flowchart 400 includes processor 311 of verification server 310 replying to the request received in step 410 by verifying whether the answer submitted in step 450 is contained in the answer set generated in step 430, thereby determining whether the client 350 is human-controlled. In the example shown in FIG. 1, the submission of pictures 1, 2, and 8 is indeed contained within the answer set, and processor 311 may report to content provider 340 that client 350 is likely human and should be allowed to register as a new user. Otherwise, the verification server 310 may report to the content provider 340 that the client 350 failed human verification. At this point, content provider 340 may request that verification server 310 restart the process from step 420 to give client 350 another opportunity. A mandatory limit on the number of retries possible within a given time period may be imposed to prevent brute-force attacks by automated systems.
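The verification-and-retry logic of step 460 might be sketched as follows; the return values and the retry limit are illustrative assumptions, not part of the described embodiment.

```python
def verify_submission(submission, answer_set, retries_used, max_retries=3):
    """Compare a submission against the answer set and report the outcome,
    with a retry cap to frustrate brute-force attempts."""
    if frozenset(submission) in answer_set:
        return "human"      # report success to the content provider
    if retries_used + 1 >= max_retries:
        return "blocked"    # retry budget exhausted within the time window
    return "retry"          # allow another puzzle, restarting from step 420
```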
From the above description of the invention it is manifest that various techniques can be used for implementing the concepts of the present invention without departing from its scope. Moreover, although the present invention has been described with specific reference to certain embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention. As such, the described embodiments are to be considered in all respects as illustrative and not restrictive. It will also be understood that the invention is not limited to the particular embodiments described herein, but is capable of many rearrangements, modifications, and substitutions without departing from the scope of the invention.