US10180989B2 - Generating and executing query language statements from natural language


Info

Publication number
US10180989B2
Authority
US
United States
Prior art keywords
conditions
tags
search query
processor
generate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US14/808,138
Other versions
US20170024443A1 (en)
Inventor
Yigal S. Dayan
Josemina M. Magdalen
Irit Maharian
Victoria Mazel
Oren Paikowsky
Andrei Shtilman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US14/808,138
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION: assignment of assignors interest (see document for details). Assignors: SHTILMAN, ANDREI; PAIKOWSKY, OREN; DAYAN, YIGAL S.; MAGDALEN, JOSEMINA M.; MAHARIAN, IRIT; MAZEL, VICTORIA
Priority to US15/140,839 (US10169471B2)
Publication of US20170024443A1
Application granted
Publication of US10180989B2
Active
Adjusted expiration


Abstract

Techniques for generating query language statements for a document repository are described herein. An example method includes detecting a search query corresponding to a document repository and generating a modified search query by adding atomic tags to the search query, the atomic tags being based on prior knowledge obtained by static analysis of the document repository and semantic rules. The method also includes generating enriched tags based on combinations of the atomic tags and any previously identified enriched tags and generating a first set of conditions based on combinations of the atomic tags and the generated enriched tags and generating a second set of conditions based on free-text conditions. The method also includes generating the query language statements based on the first set of conditions and the second set of conditions and displaying a plurality of documents from the document repository that satisfy the query language statements.

Description

BACKGROUND
The present invention relates to query language statements, and more specifically, but not exclusively, to generating and executing query language statements.
SUMMARY
According to an embodiment described herein, a method for generating query language statements for a document repository comprises detecting, via a processor, a search query corresponding to a document repository. The method can also include generating, via the processor, a modified search query by adding atomic tags to the search query, the atomic tags being based on prior knowledge obtained by static analysis of the document repository and semantic rules. Additionally, the method can include generating, via the processor, enriched tags based on combinations of the atomic tags and any previously identified enriched tags and adding the generated enriched tags to the modified search query. Furthermore, the method can include generating, via the processor, a first set of conditions based on combinations of the atomic tags and the generated enriched tags and generating a second set of conditions based on free-text conditions and reconciling, via the processor, the first set of conditions based on identified contradictions. The second set of conditions can correspond to terms of the search query that are not associated with any of the first set of conditions, which can result in a more focused and accurate retrieval of the relevant documents. The method can also include generating, via the processor, the query language statements corresponding to the search query, the query language statements based in part on the first set of conditions and the second set of conditions, and displaying, via the processor, a plurality of documents from the document repository that satisfy the query language statement.
According to another embodiment, a system for generating a query language statement can include a processor to detect a search query corresponding to a document repository and generate a modified search query by adding atomic tags to the search query, the atomic tags being based on an entity list and semantic rules. The processor can also generate enriched tags based on combinations of the atomic tags and any previously identified enriched tags and add the generated enriched tags to the modified search query. The processor can also generate a first set of conditions based on combinations of the atomic tags and the generated enriched tags and generate a second set of conditions based on free-text conditions. Furthermore, the processor can reconcile the first set of conditions based on identified contradictions and generate the query language statements corresponding to the search query, the query language statements based in part on the first set of conditions and the second set of conditions. The second set of conditions can correspond to terms of the search query that are not associated with any of the first set of conditions, which can result in a more focused and accurate retrieval of the relevant documents. Moreover, the processor can display a plurality of documents from the document repository that satisfy the query language statement.
In yet another embodiment, a computer program product for generating a query language statement can include a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is non-transitory. The program instructions, when executed by a processor, can cause the processor to detect, via the processor, a search query corresponding to a document repository and generate, via the processor, a modified search query by adding atomic tags to the search query, the atomic tags being based on prior knowledge obtained by static analysis of the document repository and semantic rules. The program instructions can also cause the processor to generate, via the processor, enriched tags based on combinations of the atomic tags and any previously identified enriched tags and add the generated enriched tags to the modified search query. A first set of conditions based on combinations of the atomic tags and the generated enriched tags and a second set of conditions based on free-text conditions may also be generated via the processor. The program instructions can also cause the processor to reconcile, via the processor, the first set of conditions based on identified contradictions and generate, via the processor, the query language statements corresponding to the search query, the query language statements based in part on the first set of conditions and the second set of conditions. The second set of conditions can correspond to terms of the search query that are not associated with any of the first set of conditions, which can result in a more focused and accurate retrieval of the relevant documents. Furthermore, the program instructions can cause the processor to display, via the processor, a plurality of documents from the document repository that satisfy the query language statement based on a score.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts a block diagram of a cloud computing node that can generate a query language statement according to an embodiment described herein;
FIG. 2 depicts a cloud computing environment that can generate a query language statement according to an embodiment described herein;
FIG. 3 depicts abstraction model layers used to implement techniques for generating a query language statement according to an embodiment described herein;
FIG. 4 is a process flow diagram of an example method that can generate a query language statement according to an embodiment described herein;
FIG. 5 is a block diagram illustration of an example system for generating a query language statement according to an embodiment described herein; and
FIG. 6 is a tangible, non-transitory computer-readable medium that can generate a query language statement according to an embodiment described herein.
DETAILED DESCRIPTION
Retrieving data from document repositories based on natural language search queries can be imprecise and cumbersome. For example, a natural language search query can include ambiguous words or phrases that prevent the search query from identifying the appropriate documents. The techniques described herein convert a natural language search query (also referred to as a search query) into formal constraints based on repository content and structure (as determined by the repository static analysis), domain knowledge, personal information, and rules. The formal constraints can be used to generate a query language statement to retrieve and display documents from a document repository.
It is understood in advance that although this disclosure includes a detailed description on cloud computing, implementation of the teachings recited herein is not limited to a cloud computing environment. Rather, embodiments of the present invention are capable of being implemented in conjunction with any other type of computing environment now known or later developed.
Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service. This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.
Characteristics are as follows:
On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider.
Broad network access: capabilities are available over a network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported providing transparency for both the provider and consumer of the utilized service.
Service Models are as follows:
Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS): the capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
Infrastructure as a Service (IaaS): the capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
Deployment Models are as follows:
Private cloud: the cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.
Community cloud: the cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on-premises or off-premises.
Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds).
A cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure comprising a network of interconnected nodes.
Referring now to FIG. 1, a schematic of an example of a cloud computing node that can generate a query language statement is shown. Cloud computing node 100 is only one example of a suitable cloud computing node and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the invention described herein. Regardless, cloud computing node 100 is capable of being implemented and/or performing any of the functionality set forth hereinabove.
In cloud computing node 100 there is a computer system/server 102, which is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 102 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.
Computer system/server 102 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system/server 102 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
As shown in FIG. 1, computer system/server 102 in cloud computing node 100 is shown in the form of a general-purpose computing device. The components of computer system/server 102 may include, but are not limited to, one or more processors or processing units 104, a system memory 106, and a bus 108 that couples various system components including system memory 106 to processor 104.
Bus 108 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer system/server 102 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 102, and it includes both volatile and non-volatile media, removable and non-removable media.
System memory 106 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 110 and/or cache memory 112. Computer system/server 102 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 114 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus 108 by one or more data media interfaces. As will be further depicted and described below, memory 106 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
Program/utility 116 having a set (at least one) of program modules, such as a query module 118, a tag module 120, a condition module 122, and an output module 124, may be stored in memory 106 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. The query module 118, tag module 120, condition module 122, and output module 124 generally carry out the functions and/or methodologies of embodiments of the invention as described herein.
Computer system/server 102 may also communicate with one or more external devices 126 such as a keyboard, a pointing device, a display 128, etc.; one or more devices that enable a user to interact with computer system/server 102; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 102 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 130. Still yet, computer system/server 102 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 132. As depicted, network adapter 132 communicates with the other components of computer system/server 102 via bus 108. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 102. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.
In some embodiments, the query module 118 can detect a search query corresponding to a document repository. A document repository, as referred to herein, can include any collection of emails or documents, and the like. A document repository may not include a collection of websites in some examples. The search query can attempt to retrieve documents from the document repository based on words or conditions in the search query.
In some embodiments, the tag module 120 can generate a modified search query by adding atomic tags to the search query, the atomic tags based on prior knowledge obtained by static analysis of the document repository, semantic-aware rules, and enrichment rules (also referred to as “the Enrichment Rules Engine”). For example, the tag module 120 can analyze the document repository to detect prior knowledge, such as a multitude of various fields with specific meaning, and generate specific word and phrase lists (also referred to herein as entity lists) from the document repository. The entity lists can relate to certain aspects of the domain or document repository, which can include atomic tags that indicate associations between terms in a search query and additional related terms. In some embodiments, the tag module 120 can also generate enriched tags based on combinations of previously found atomic tags and add the enriched tags to the modified search query. The enriched tags can include any suitable combination of atomic tags and previously identified enriched tags. For example, the enriched tags can include consecutive atomic tags, or any other suitable sequence of atomic and previously identified enriched tags.
In some embodiments, the condition module 122 (also referred to as the “Condition Rules Engine”) can generate a set of atomic conditions based on the enriched tags from the tag module 120. The condition module 122 contains a “Condition Rules Engine” and the logic to combine the atomic conditions into an abstract condition structure. In some examples, the condition module 122 can add free-text search constraints for a portion of a query that is not covered by abstract conditions and boiler-plate phrases. The free-text constraints are used to search entire documents for terms from the search query that do not match the atomic conditions. Unlike typical search engines, the free-text conditions can be limited to those parts of the query that have not been otherwise covered by the conditions found in the “Enrichment Rules Engine”, which can increase the accuracy of the results. Based on the set of atomic conditions, a combination of conditions is generated in such a way that the atomic conditions do not contradict each other. In some examples, the condition module 122 can also reconcile the combination of conditions based on identified contradictions. For example, the condition module 122 can detect that certain combinations of conditions are illogical. Accordingly, the condition module 122 can indicate that the combination of the conditions is invalid and should be reorganized in order to reconcile a contradiction between conditions. For example, an “and” condition can be converted into an “or” condition.
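By way of example only, and not limitation, the reconciliation of contradictory conditions can be sketched in Python as follows. The sketch assumes that a contradiction arises when two different equality conditions on the same single-valued field are joined by “and”. The function name reconcile, the tuple layout, and the field names are hypothetical and do not appear in the embodiments.

```python
def reconcile(conditions):
    """Take (field, value) equality conditions that are all AND-ed together
    and return a nested structure in which contradictory conditions on the
    same field are regrouped under OR.

    Assumption: each field holds a single value per document, so
    "sender = alice AND sender = bob" can never be satisfied.
    """
    by_field = {}
    for field, value in conditions:
        values = by_field.setdefault(field, [])
        if value not in values:          # drop exact duplicates
            values.append(value)

    clauses = []
    for field, values in by_field.items():
        if len(values) == 1:
            clauses.append((field, values[0]))
        else:
            # Contradiction detected: convert the "and" into an "or".
            clauses.append(("OR", [(field, v) for v in values]))
    return ("AND", clauses)


conditions = [("sender", "alice"), ("sender", "bob"), ("type", "email")]
reconciled = reconcile(conditions)
print(reconciled)
```

In this sketch, the contradictory pair on the sender field becomes an “or” group, while the non-conflicting type condition remains under the top-level “and”.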
In some embodiments, the output module 124 can generate the query language statements corresponding to the search query, the query language statements based on the conditions generated by the condition module 122, and display, via the processor, a plurality of documents from the document repository that satisfy the query language statements.
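By way of illustration only, the following Python sketch shows one way conditions could be turned into a query language statement and executed. The embodiments do not name a particular query language; SQL and the standard-library sqlite3 module are used here purely as an assumption, and the table name documents, the columns sender and body, and the function to_statement are hypothetical.

```python
import sqlite3

# Hypothetical allow-list of structured fields. Column names cannot be bound
# as parameters, so they are validated before being placed in the statement.
ALLOWED_FIELDS = {"sender"}


def to_statement(structured, free_text, table="documents"):
    """Build a parameterized SELECT from structured and free-text conditions."""
    clauses, params = [], []
    for cond in structured:
        if cond["field"] not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {cond['field']}")
        clauses.append(f"{cond['field']} = ?")
        params.append(cond["value"])
    for cond in free_text:
        # Free-text terms are searched in the document body (assumed column).
        clauses.append("body LIKE ?")
        params.append(f"%{cond['value']}%")
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT id, sender, body FROM {table} WHERE {where}", params


# A tiny in-memory repository used only to exercise the generated statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER, sender TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        (1, "alice", "budget forecast for Q3"),
        (2, "bob", "budget forecast for Q3"),
        (3, "alice", "lunch plans"),
    ],
)

sql, params = to_statement(
    structured=[{"field": "sender", "value": "alice"}],
    free_text=[{"value": "budget"}],
)
rows = conn.execute(sql, params).fetchall()
for row in rows:
    print(row)
```

Values are passed as bound parameters rather than concatenated into the statement, which keeps the generated statement safe even when the search query contains quotation marks.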
It is to be understood that the block diagram of FIG. 1 is not intended to indicate that the computing system/server 102 is to include all of the components shown in FIG. 1. Rather, the computing system/server 102 can include fewer or additional components not illustrated in FIG. 1 (e.g., additional memory components, embedded controllers, additional modules, additional network interfaces, etc.). Furthermore, any of the functionalities of the query module 118, tag module 120, condition module 122, and output module 124 may be partially, or entirely, implemented in hardware and/or in the processing unit (also referred to herein as processor) 104. For example, the functionality may be implemented with an application specific integrated circuit, or in logic implemented in the processor 104, among others.
Referring now to FIG. 2, illustrative cloud computing environment 200 that can generate a query language statement is depicted. As shown, cloud computing environment 200 comprises one or more cloud computing nodes 100 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 102A, desktop computer 102B, laptop computer 102C, and/or automobile computer system 102N may communicate. Nodes 100 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof. This allows cloud computing environment 200 to offer infrastructure, platforms and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices 102A-N shown in FIG. 2 are intended to be illustrative only and that computing nodes 100 and cloud computing environment 200 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser, among others).
Referring now to FIG. 3, a set of functional abstraction layers used to implement techniques for generating a query language statement provided by cloud computing environment 200 (FIG. 2) and node 100 (FIG. 1) is shown. It should be understood in advance that the components, layers, and functions shown in FIG. 3 are intended to be illustrative only and embodiments of the invention are not limited thereto. As depicted, the following layers and corresponding functions are provided:
Hardware and software layer 302 includes hardware and software components. Examples of hardware components include mainframes, in one example IBM® ZSERIES systems; RISC (Reduced Instruction Set Computer) architecture based servers, in one example IBM PSERIES systems; IBM XSERIES systems; IBM BLADECENTER systems; storage devices; networks and networking components. Examples of software components include network application server software, in one example IBM WEBSPHERE application server software; and database software, in one example IBM DB2 database software. (IBM, ZSERIES, PSERIES, XSERIES, BLADECENTER, WEBSPHERE, and DB2 are trademarks of International Business Machines Corporation registered in many jurisdictions worldwide).
Virtualization layer 304 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers; virtual storage; virtual networks, including virtual private networks; virtual applications and operating systems; and virtual clients.
In one example, management layer 306 may provide the functions described below. Resource provisioning provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment. Metering and pricing provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may comprise application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. User portal provides access to the cloud computing environment for consumers and system administrators. Service level management provides cloud computing resource allocation and management such that required service levels are met. Service Level Agreement (SLA) planning and fulfillment provides pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.
Workloads layer 308 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which may be provided from this layer include: mapping and navigation; software development and lifecycle management; virtual classroom education delivery; data analytics processing; transaction processing; and techniques performed by the query module 118, tag module 120, output module 124, and condition module 122.
FIG. 4 is an example of a method that can generate a query language statement. The method 400 can be implemented with any suitable computing device, such as the computing system/server 102 of FIG. 1.
At block 402, a query module 118 can detect a search query corresponding to a document repository. In some embodiments, the document repository can include a collection of emails, a collection of documents, and the like. In some embodiments, the search query can be a request to locate data stored in the document repository. For example, the search query may include a word, phrase, date, or any other suitable information that can be used to identify documents to be retrieved. In some embodiments, the document repository may not correspond to a collection of websites.
At block 404, the tag module 120 can generate a modified search query by adding atomic tags to the search query, the atomic tags being based on prior knowledge obtained by static analysis of the document repository and semantic rules. For example, the tag module 120 can detect structured data conditions based on atomic tags that are to be added to the search query to increase the probability of identifying the documents corresponding to the search query. In some examples, the atomic tags can be based on prior knowledge obtained by static analysis of the document repository such as previously generated entities lists, facets, and relationships between entities. A facet, as referred to herein, can include a document repository field that contains a restricted number of values. To discover facets, the document repository may be processed in advance to identify facets as well as non-facet metadata fields that have a limited range of values and can therefore be considered to be facets. A ‘facet’ word list may be created automatically from the limited list of values or enums. The entity lists can associate any suitable number of words or phrases from the search query with additional terms that share a common characteristic. In addition, the tag module 120 can include an “enrichment rules engine” that may make use of semantic knowledge, for example words associated with a time period such as the terms minute, hour, day, week, month, quarter, and year, among others. Similarly, a word list may include various words associated with months such as January, February, March, etc. In some embodiments, any suitable number of word lists can be generated and searched for associated terms to be used as atomic tags for the search query. Accordingly, each term or phrase in the search query can be tagged or associated with any number of terms identified from an entities list or any other suitable source for tagging the search query or any combination thereof.
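By way of example only, the facet discovery and atomic tagging described above can be sketched in Python as follows. The sketch assumes documents are represented as dictionaries of string fields and that a field qualifies as a facet when it has at most max_values distinct values. The names discover_facets, add_atomic_tags, SEMANTIC_LISTS, and the FACET: tag prefix are hypothetical.

```python
from collections import defaultdict


def discover_facets(documents, max_values=5):
    """Static analysis: return {field: values} for fields whose number of
    distinct values is small enough to be treated as a facet."""
    values = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            if isinstance(value, str):
                values[field].add(value.lower())
    return {f: v for f, v in values.items() if len(v) <= max_values}


# Hypothetical semantic word lists, following the time-period and month
# examples in the description.
SEMANTIC_LISTS = {
    "TIME_UNIT": {"minute", "hour", "day", "week", "month", "quarter", "year"},
    "MONTH": {"january", "february", "march"},
}


def add_atomic_tags(query, word_lists):
    """Attach to each query term the names of all word lists containing it."""
    tagged = []
    for term in query.lower().split():
        tags = sorted(name for name, words in word_lists.items() if term in words)
        tagged.append((term, tags))
    return tagged


documents = [
    {"sender": "alice", "type": "email", "body": "budget review for march"},
    {"sender": "bob", "type": "memo", "body": "quarterly results"},
    {"sender": "alice", "type": "email", "body": "lunch on friday"},
]

# "body" has three distinct values and is therefore not treated as a facet.
facets = discover_facets(documents, max_values=2)
word_lists = {f"FACET:{name}": vals for name, vals in facets.items()}
word_lists.update(SEMANTIC_LISTS)

tagged = add_atomic_tags("email from alice last month", word_lists)
print(tagged)
```

Terms such as “from” and “last” receive no tag in this sketch and would remain available for later rules or free-text conditions.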
In some embodiments, the tag module 120 can include a finite state machine that tags the terms of the search query.
In some embodiments, the atomic tags are also identified based on semantic rules. A rule of the Enrichment Rules Engine, as referred to herein, can include a name of an action to invoke, followed by conditions that trigger the action. In some embodiments, the tag module 120 can detect a semantic rule that results in the generation of an enriched atomic tag. The Enrichment Rules Engine is described in greater detail below in relation to FIG. 5.
At block 406, the tag module 120 can generate enriched tags based on combinations of the existing (e.g., atomic and enriched) tags and add the generated enriched tags to the modified search query. In some examples, the enriched tags can include a combination or sequence of existing tags. For example, the enriched tags can indicate a relationship between consecutive atomic and/or enriched tags. As discussed above, in some embodiments, each existing tag in a rule can represent a semantic group or an entities list. In some examples, each word in the modified search query can be associated with any suitable number of atomic and/or enriched tags.
At block 408, the condition module 122 can generate a first set of conditions based on combinations of the atomic and enriched tags and generate a second set of conditions based on free-text conditions. For example, the first set of conditions can indicate logical expressions that are to be satisfied by the enriched tags. The conditions can be derived from atomic and/or enriched tags that indicate date ranges, numerical ranges, and the like. In some embodiments, terms in the search query may not be associated with a tag. The terms not associated with a tag can be searched within documents using free-text conditions. For example, the terms that are not associated with a tag can be used in a text search within the document repository. Accordingly, the second set of conditions can be restricted to the parts of the query that are not covered by the first set of conditions, which results in a more focused and accurate retrieval of the relevant documents.
At block 410, the condition module 122 can reconcile the first set of conditions based on identified contradictions. For example, two or more conditions from the first set of conditions may violate a logical presumption or expression. In some embodiments, the condition module 122 can reconcile the first set of conditions by recombining atomic conditions so that they will not contradict each other. For example, atomic conditions can be recombined using “or” conditions rather than “and” conditions and vice versa.
At block 412, the output module 124 can generate query language statements corresponding to the search query. In some embodiments, the query language statements can be based on the structured and free form conditions. For example, the output module 124 can use the structured and free form conditions to detect any suitable number of fields in the document repository that are to be searched for documents satisfying the search query. In some embodiments, the output module 124 may detect a separate query language statement for each field of the document repository and combine the query language statements in any suitable fashion. For example, the query language statements can be joined conjunctively, disjunctively, or any combination thereof.
At block 414, the output module 124 can display a plurality of documents from the document repository that satisfy the query language statement. For example, the output module 124 can display any suitable number of documents from the document repository that match conditions associated with the query language statements. In some examples, the output module 124 can determine results of the query language statements that do not exceed a quality threshold, generate relaxation rules, and modify the query language statement based on the relaxation rules. The quality threshold can indicate whether the documents returned from the document repository include relevant information pertaining to the search query. In some embodiments, the relaxation rules can broaden the query language statement to return a larger number of documents from the document repository. Additional information pertaining to techniques for generating modified search queries, tags, and query language statements is included below in relation to FIG. 5.
The method 400 can include any suitable number of additional operations. For example, the output module 124 can also generate a score for the query language statement, wherein the score corresponds to a characteristic of the search query. In some embodiments, the score indicates that the search query references a field search term or a document search term, the score indicating a preference for search query language statements that correspond to the field search term. For example, the score can indicate that the terms of the search query correspond to more of the first set of conditions related to combinations of atomic and enriched tags than the second set of conditions related to free-text conditions. In some examples, a field search term corresponds to a condition based on tags, while a document search term corresponds to a free-text condition. In other embodiments, the query language statement comprises joining at least two queries for documents of a same type with a logical disjunction or at least two queries for documents of a different type with a logical conjunction.
FIG. 5 is a process flow diagram illustrating techniques for generating a query statement using a tag module. The method 500 of FIG. 5 can be implemented with any suitable computing device such as the computing system/server 102 of FIG. 1.
In some embodiments, a query module, such as the query module 118 of FIG. 1, can detect a natural language search query 502 and send the natural language search query 502 to a tag module 508. In some examples, the tag module 508 can be implemented with any suitable module such as the tag module 120 of FIG. 1. The natural language query can include any suitable number of words written in a natural language that are to be used for a search query. A search query can request information from a document repository, storage devices, or the internet, and the like. The tag module 508 can generate a query language statement by performing various techniques such as entities list tagging 510, quick tagging 514, numerical tagging 516, quick reference tagging 518, Enrichment Rules Engine tagging 520, tag map 522, disambiguation 524, generating new tags 526, span determination 528, Condition Rules Engine tagging 532, and semantic ambiguity tagging 534. These various tagging techniques are described in greater detail below.
As discussed above, the tag module 508 can generate a modified search query by adding structured data (also referred to herein as tags) to the search query. Tags can include any suitable terms or logical expressions that can be added to the natural language query. The tags can improve the accuracy of the results returned by performing a search. In some embodiments, the tags can be based on word lists (also referred to herein as entities lists) 506 from a static analysis database 504 and semantic rules. The tag module 508 can perform entities list tagging 510 by using entities lists 506 to tag and parse the natural language query 502. For example, the tag module 508 can tag the natural language search query 502 based on a word and/or phrase list that can be predefined or predetermined for a document repository. Each entities list 506 may represent a tag, and query words found in a word list 506 may be tagged with an identifier for that word list. For example, a word list “period.txt” may contain the terms “minute,” “hour,” “day,” “week,” “month,” “quarter,” “year,” and the like. In another example, a word list “month.txt” may contain the words “January,” “February,” “March,” etc. If a word or phrase from a word list 506 is found in the natural language search query 502, the word or phrase may be tagged with a list name. For example, a natural language search query 502 containing “the email sent last week of January” may be tagged “the email sent last <TIME QUALIFIER> week <PERIOD> of January <MONTH>.”
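The word-list tagging described above can be illustrated with a minimal sketch. The dictionary layout, the function name `tag_query`, and the contents of the "TIME QUALIFIER" list are assumptions made for illustration only; the text does not specify how the word lists 506 are stored or scanned.

```python
# Hypothetical in-memory word lists keyed by tag name. The PERIOD and MONTH
# entries follow the examples in the text; TIME QUALIFIER contents are assumed.
WORD_LISTS = {
    "TIME QUALIFIER": ["last", "next"],
    "PERIOD": ["minute", "hour", "day", "week", "month", "quarter", "year"],
    "MONTH": ["january", "february", "march"],
}

def tag_query(query, word_lists=WORD_LISTS):
    """Append <LIST NAME> after every query word that appears in a word list."""
    out = []
    for word in query.split():
        out.append(word)
        key = word.lower().strip(".,")
        for tag_name, words in word_lists.items():
            if key in words:
                out.append(f"<{tag_name}>")
    return " ".join(out)

print(tag_query("the email sent last week of January"))
```

This sketch handles single words only; multi-word phrases would need a phrase-aware scan.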
In some examples, a text word or phrase may correspond to several tags and each word or phrase in a word list 506 may be associated with a more abstract entity depending on the concept that the word list represents. For example, a word list 506 may be identified as “date_near.txt” containing the following words and phrases and numerical qualifiers: “the day before yesterday; −2,” “yesterday; −1,” “today; 0,” “tomorrow; +1,” “the day after tomorrow; +2,” and the like. In this example, each word or phrase is mapped to a number representing an offset from “today.” This number is available within the generated tag for subsequent date calculations.
Another example of entities list tagging 510 by the tag module 508 can include identifying various writings of the same name that appear in the natural language search query 502. For example, a name may be shortened to a nickname, rather than a full name. In this scenario, a name list among the word lists 506 may include “nicknames.txt” containing two fields in each line, wherein the first field includes the word as encountered in the query and the second field includes an alias for the word from a document repository such as the static analysis resource 504 (e.g., this information is not crafted manually but generated automatically by the static analysis). For example, the “nicknames.txt” word list may include “alex; alexis,” “allie; alice,” “elsie; alice,” “lisa; alice,” “allie; alicia,” “elsie; alicia,” “lisa; alicia,” and the like.
In some examples, an entities list among the entities lists 506 may include phrases that generate an open-ended date range. For example, a word list may include words or phrases such as “later than,” “no later than,” “no sooner than,” and the like. In this scenario, tagging may contain information to facilitate generation of an open range, such as which side of the range should be opened, whether the boundary is included or excluded from the range, and so on.
In some embodiments, the tag module 508 can use a finite state machine 512 to generate tags using quick tagging 514 techniques. For example, the finite state machine 512 can generate tags that include a tag name, such as “date_near_tag,” a tag value, such as “yesterday,” and in some cases a mapping or numerical value, such as “−1,” as well as a location/span of the tag in the natural language search query 502. For example, if a natural language search query 502 includes “the day before yesterday,” two “date_near_tag” tags may be generated by the finite state machine 512. The first tag may include the value “the day before yesterday,” the mapping “−2,” and a location from word number one of the search query 502 to word number four of the search query 502. The second tag may include the value “yesterday,” the mapping “−1,” and a location from word number four to word number four of the search query 502.
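As a rough illustration of the tag structure described above (name, value, mapping, and span), the following sketch tags “date_near” phrases with 1-based word locations. It uses a plain sliding-window scan rather than a finite state machine, and the `Tag` class and `tag_date_near` function are hypothetical names introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    name: str      # e.g. "date_near_tag"
    value: str     # matched phrase
    mapping: int   # offset from "today" in days
    start: int     # 1-based index of the first word covered
    end: int       # 1-based index of the last word covered

# Entries follow the "date_near.txt" example in the text.
DATE_NEAR = {
    "the day before yesterday": -2,
    "yesterday": -1,
    "today": 0,
    "tomorrow": 1,
    "the day after tomorrow": 2,
}

def tag_date_near(query):
    """Return one Tag per occurrence of a date_near phrase, overlaps included."""
    words = query.lower().split()
    tags = []
    for phrase, offset in DATE_NEAR.items():
        parts = phrase.split()
        n = len(parts)
        for i in range(len(words) - n + 1):
            if words[i:i + n] == parts:
                tags.append(Tag("date_near_tag", phrase, offset, i + 1, i + n))
    return tags

for tag in tag_date_near("the day before yesterday"):
    print(tag)
```

Both overlapping tags are kept at this stage; choosing between them is left to a later disambiguation step.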
In some embodiments, the tag module 508 can also implement numerical tagging 516. For example, the tag module 508 can scan the natural language search query 502 for numbers and generate appropriate numeric tags. In this example, sequences identified as numbers are tagged with a “number” tag, and mapped to the associated number. In some cases, numbers may also be tagged as “number_cardinal,” “number_ordinal,” “day,” “day_cardinal,” “day_ordinal,” “year,” and the like. For example, the phrase “3rd” may be tagged with the tags “number” and “number_ordinal.” To generate these tags, a tag module 508, such as the tag module 120 of FIG. 1, may make use of mappings found by previously generated tags. For example, the text phrase “twenty-three” may already be tagged by a word list at block 510 with the mapping “23.” Further, in some cases, short matches may be removed in favor of longer ones. For example, adjacent numeric phrases such as “two-hundred” “and” “twenty three” may be combined into a single number.
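A minimal sketch of numeric tagging for digit-based tokens is shown below. The regular expression and the function name `numeric_tags` are assumptions; spelled-out numbers such as “twenty-three” are assumed to be handled by a word list and are not covered here.

```python
import re

def numeric_tags(token):
    """Return (tag names, numeric mapping) for one token, or ([], None) if not numeric."""
    m = re.fullmatch(r"(\d+)(st|nd|rd|th)?", token.lower())
    if not m:
        return [], None
    kind = "number_ordinal" if m.group(2) else "number_cardinal"
    return ["number", kind], int(m.group(1))

print(numeric_tags("3rd"))
print(numeric_tags("23"))
```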
In some embodiments, the tag module 508 can also set a quick reference 518. In some examples, each detected tag name can be mapped to a list of corresponding tags. For example, a “date_near_tag” tag name may be mapped to multiple corresponding tags. Specifically, a tag for “yesterday” and a tag for “the day before yesterday” can be mapped or associated together.
In some examples, the tag module 508 can generate tags using a finite state machine (FSM) 512. In this embodiment, the tag module 508 may tag a natural language search query 502 relatively quickly in comparison to tagging that is not based on the finite state machine 512. The FSM 512 may be used in conjunction with quick tagging 514 techniques. In some examples, the quick tagging 514 may be generated via the FSM 512 based on word lists. In this example, a single pass on the natural language search query with the FSM 512 can tag the words and phrases of the natural language search query 502 with the words and/or phrases in one or more of the word lists 506.
In some examples, the tag module 508 can use a rule engine 520 to generate a rule system 521 that searches for sequences of tags and/or original words from the natural language search query 502. The rule system 521 can include any number of rules that can detect adjacent tags that are at a given proximity from each other. For example, a word list from among the word lists 506 may be “next_last.txt” containing the phrase “the last,” or similar. A second word list from the word lists 506 may be “periods.txt” as defined above. A rule (<Action>, . . . last, next, Number_Cardinal, periods) may be added to the rule engine 520 that can find word sequences such as “the next five weeks” or “the last thirty days.” Each tag name specified in the rule represents a semantic group or a word list. The rule searches for any words or phrases from the first word list which are followed by any words or phrases from the second word list, and so on. More specifically, the rule can detect occurrences of the tags from a tag map or sequence of tags, and compatible tag combinations. When a match is found (tag conditions are identified) an action can be triggered by a rule. In short, the rule finds valid combinations of tags, and for each combination of tags, the rule triggers an action passing the combination of tags as input for the action.
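The sequence matching performed by such a rule can be sketched as follows. The representation of a tagged query as `(word, set_of_tag_names)` pairs and the function name `match_rule` are assumptions for illustration; the sketch only matches strictly consecutive single words and returns the matched spans, where a real engine would invoke the rule's action for each match.

```python
def match_rule(rule, tagged_words):
    """Return 1-based (start, end) spans where consecutive words carry the rule's tags in order."""
    n = len(rule)
    spans = []
    for i in range(len(tagged_words) - n + 1):
        window = tagged_words[i:i + n]
        if all(name in tags for name, (_, tags) in zip(rule, window)):
            spans.append((i + 1, i + n))
    return spans

# Hypothetical tagged form of "emails from last thirty days".
tagged = [
    ("emails", set()),
    ("from", set()),
    ("last", {"next_last"}),
    ("thirty", {"number", "number_cardinal"}),
    ("days", {"periods"}),
]
print(match_rule(("next_last", "number_cardinal", "periods"), tagged))
```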
In some embodiments, the rule engine 520 can include two types of engine actions. First, a new tag may be added to the tag map. The new tag may be available for matching by subsequent rules of the rule engine 520. Second, structured conditions may be generated that may later be used to formulate structured query language (SQL) queries that correspond to the natural language search query 502.
In some examples, a rule can contain a name of an action to invoke, followed by conditions that trigger the action. For example, a rule may include makeDate/simple; . . . date_near. In this rule, “date_near” may be a condition that was found in the natural language search query 502. The rule component “makeDate/simple” may be the action of this rule, meaning that a routine will be invoked to create a new <DATE> tag. This rule is triggered by a single condition: the existence of the tag “date_near.” As discussed above, this tag is created when the natural language search query 502 contains such phrases as “the day before yesterday.” When the rule is triggered, the tags included in the rule can be associated with the “makeDate” action. Besides triggering tags, the action can receive an optional context parameter, in this case the string “simple.” In some examples, a generic “makeDate” action can use this context string to perform various tasks. In one example, a natural language search query 502 including “the day before yesterday” can result in the action being invoked twice, once for the tag that covers “yesterday” and once for the tag that covers “the day before yesterday.” In one embodiment, the “makeDate” action results in an inspection of the tags corresponding to the natural language search query 502 (in this example a single “date_near” tag) and extraction of the mapping associated with the tag (in this example the offset from today).
In some examples, when invoked for “yesterday” the “makeDate” action can find an offset value equal to negative one. For “the day before yesterday,” the “makeDate” action can detect an offset value equal to negative two. The “makeDate” action can send the offset to a date utility that returns a time range structure denoted by two (begin, end) date structures: getPeriodRange(PERIOD period, int offset, int numberOfItems). This date utility receives a unit of time (day, month, week etc.), an offset from current time, and the number of time units requested. To receive a range for “the day before yesterday” a call getPeriodRange(PERIOD.DAY, −2, 1) can be initiated. The returned time range is delimited by (begin, end) dates. It is possible to create an open-ended time range by setting one of the two dates to a small or large constant.
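A date utility of this shape could look roughly like the sketch below. It is not the getPeriodRange implementation itself: the Python signature, the explicit `today` parameter, and the treatment of a week as a rolling seven-day window without calendar alignment are all assumptions made to keep the example small and deterministic.

```python
from datetime import date, datetime, time, timedelta

# Assumed unit lengths in days; calendar alignment (week or month boundaries) is omitted.
DAYS_PER_UNIT = {"day": 1, "week": 7}

def get_period_range(period, offset, number_of_items, today):
    """Return (begin, end) covering number_of_items units, starting offset units from today."""
    unit = DAYS_PER_UNIT[period]
    first_day = today + timedelta(days=offset * unit)
    last_day = first_day + timedelta(days=number_of_items * unit - 1)
    begin = datetime.combine(first_day, time.min)
    end = datetime.combine(last_day, time(23, 59, 59))
    return begin, end

# "the day before yesterday", relative to an arbitrary reference date.
begin, end = get_period_range("day", -2, 1, today=date(2013, 8, 5))
print(begin, end)
```

An open-ended range could be represented in the same structure by substituting `datetime.min` or `datetime.max` for one of the two bounds.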
In some examples, the tag module 508 can enrich a tag map as indicated at block 522. For example, the “makeDate” action can be used for enriching the tag map with a new DATE tag. A DATE tag contains, for example, in addition to the normal tag attributes, a structured date range object with start and end dates. In one example, the action is invoked twice, and two DATE tags are added to the tag map. The first tag has the same sentence location as the word “yesterday” (location word number four with a span to word number four) and includes a date range covering yesterday. The second tag has the same sentence location as the phrase “the day before yesterday” (location word number one with a span to word number four) and includes a date range for two days in the past.
In some embodiments, the tag module 508 can disambiguate 524 meanings by removing overlapping definitions. For example, since the phrase “the day before yesterday” may be preferred to the term “yesterday”, the tag module 508 can periodically invoke a cleanup rule that scans DATE tags for overlaps and removes the shorter spans. This can be done by inserting a special action into the rules system: CLEANUP; DATE.
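The overlap cleanup can be sketched as a longest-span-first filter. The function name `cleanup` and the representation of tags as inclusive `(start, end)` word spans are assumptions for illustration.

```python
def cleanup(spans):
    """Drop any span that overlaps a longer kept span; spans are inclusive (start, end) pairs."""
    kept = []
    # Visit longer spans first so that they win over the shorter spans they overlap.
    for span in sorted(spans, key=lambda s: s[1] - s[0], reverse=True):
        if all(span[1] < k[0] or span[0] > k[1] for k in kept):
            kept.append(span)
    return sorted(kept)

# "yesterday" at (4, 4) overlaps "the day before yesterday" at (1, 4).
print(cleanup([(4, 4), (1, 4)]))
```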
In some examples, the tag module 508 can generate new tags 526 based on previous findings. Generating new tags enables the incremental creation of complex expressions and creating new rules from tags. For example, once a DATE tag has been added to the tag map, subsequent rules can make use of the new DATE tag. In another example, the tag module 508 can generate a rule for flexible formatting of a date. For example, a context string may contain formatting characters. For a rule that finds nine/ninth of November 2012, the rule may include “formattedDate/DxmY; DAY; WORD/of; month; YEAR;” wherein DxmY are formatting characters. The syntax WORD/of indicates a search for an occurrence of the word “of” in the natural language search query 502. In another example, to parse “the first/last 2 weeks in this quarter”, the tag module 508 can use the rule: “makeDate/period_in_period; first_last; NUMBER_CARDINAL; periods; in_of; DATE.” In one example, this rule may be too broad. In general, rules can trigger illogical input phrases such as “the first two years of this month” or “the third month of January.” Therefore, the tag module 508 performs an extra validation by determining the graininess of each period (whether the period is best expressed in days, weeks, months, quarters or years) and verifies that the first period fits within the second period. A new tag may be added if the two input periods are compatible.
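The graininess validation can be approximated by ranking period units and requiring the inner unit to be strictly finer than the outer one. The ranking table and the function name `fits_within` are assumptions; this sketch ignores the count of units (for example, whether twenty weeks fit in one quarter).

```python
# Assumed ordering of period units from finest to coarsest.
GRAIN = {"day": 0, "week": 1, "month": 2, "quarter": 3, "year": 4}

def fits_within(inner_unit, outer_unit):
    """True when the inner period is strictly finer-grained than the outer period."""
    return GRAIN[inner_unit] < GRAIN[outer_unit]

print(fits_within("week", "quarter"))  # "the first 2 weeks in this quarter"
print(fits_within("year", "month"))    # "the first two years of this month"
```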
In another example, generating a new tag 526 can include broadening or expanding a date range. For example, a phrase “no later than July of last year,” can result in the generation of a rule that includes “extendDate/after_inclusive; phrase/no later than/2; DATE.” In this rule, the syntax “/2” indicates that a distance of one or two words between the phrase and the DATE tag is allowed. A context string “after_inclusive” is passed to the action to indicate that the open range should include the original date range (July is included in the date range).
In another example, generating new tags 526 can include adding inexact dates, such as dates matching “the end of July.” In this example, a rule may include “makeDate/fuzzy; phrase/the end of; DATE.” The actual end period used is configurable and depends on the size of the date range that is being modified. In another example, a single range may be formed from two date ranges. For example, the phrase “beginning on March the second and ending at the end of next April” may be used to form a single range. The rule may include “extendDate/merge_1×2; WORD/beginning; on_at; DATE; phrase/and ending; on_at; DATE.” In this scenario, the rule may be generalized further by using tags that contain synonyms for “beginning” and “ending.”
In some cases, the tag module 508 can determine the span 528 (sentence location) of a new tag for better accuracy in tag generation and disambiguation. Generally, the span of a new tag is the span of the input conditions. For example, if “last” is at location (1,1) of the natural language search query 502 and “Friday” is at location (2,2) of the natural language search query 502, then the rule “makeWeekday; next_last; weekday;” may generate a DATE tag with the location (1,2). The first number of the parenthetical can indicate the location of a first word in the natural language search query 502 and the second number of the parenthetical can indicate the location of a last word in the natural language search query 502.
In some embodiments, when interpreting the natural language search query 502, the resulting span may be less than the span of the conditions that generated it. For example, consider the phrases “created Friday” and “expires Friday”. The phrase “created Friday” refers to a past date while the phrase “expires Friday” refers to a future date. A date rule that has conditions associated with words that precede “Friday” can be useful, as long as these conditions are not a part of the new date tag. In one embodiment, the tag module 508 creates word lists, such as ‘hint_past.txt’ and ‘hint_future.txt,’ which indicate whether to expect a past or future date. Tags that start with ‘hint_’ help form the Rule Engine condition but do not contribute to the span of the new tag. The tag module 508 can then add two rules: “makeWeekday/future; hint_future//4; weekday;” and “makeWeekday/past; hint_past//4; weekday.” These rules look for a past hint (e.g., ‘created’) or future hint (e.g., ‘expires’) four or fewer words before the weekday.
In some examples, an extra context parameter indicates if the makeWeekday action is to create a past or future date. For example, a word list such as ‘hint_’ can indicate not to include the first input span. The action can generate a correct DATE tag for “Friday” with the same location as the word “Friday”. The new tag will take precedence over the word “Friday” during disambiguation.
As discussed above, the tag module 508 can generate structured conditions based on enriched tags, as indicated at block 530. A second rule engine 532 can include derived rules that trigger an action based on combinations of tags, wherein the actions create a set of structured conditions that can later serve as a base for SQL queries. For example, conditions for dates may include “dateCondition/date_sent_handler,” “date_sent//4; DATE,” wherein “date_sent” is a word list containing words such as send, sent, copied to, cc-ed. DATE is a time range discovered by the previous rule engine and inserted into the set of tags. If the word ‘sent’ is followed by a date at a maximum distance of 4, a condition is created on a date range by calling the dateCondition method. The tag module 508 can then receive a context parameter, ‘date_sent_handler.’ This string is a handle to a list of repository classes and attributes in the document repository that can be used to formulate the condition.
In some embodiments, documents can be added to the document repository accompanied by extensible markup language (XML) text that facilitates text search. In such examples, the tag module 508 can refer to a list of xpaths to be searched (rather than a list of fields to search). Specifying an abstract handle to attributes and xpaths allows a rigid separation between the universal rules and the document repository being queried. Repository-specific information can be sequestered separately, which enables connecting to new repositories without making any changes to generated rules.
In the example above, the definition for date_sent_handler may contain the repository information: “date_sent_handler; Email/SentOn; ICCMail3/ICCMailDate.” In this example, two repository fields are mentioned. This indicates that the tag module 508 will be creating two conditions, one condition corresponding to the SentOn field in Email documents, and one condition that corresponds to the ICCMailDate field in ICCMail3 documents. Each generated date condition uses a calculated time range to specify a start and end time for the date field. At the final stage, the condition can be translated to a SQL statement such as: “WHERE (SentOn>=20130728T000000Z AND SentOn<=20130803T235959Z).”
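The translation from a handler and a time range to per-field WHERE clauses can be sketched as follows. The `HANDLERS` dictionary and the function name `date_conditions` are hypothetical; the field names and the timestamp format follow the example in the text.

```python
from datetime import datetime

# Hypothetical handler table: handle name -> list of (docClass, date field) pairs.
HANDLERS = {
    "date_sent_handler": [("Email", "SentOn"), ("ICCMail3", "ICCMailDate")],
}

def date_conditions(handler, begin, end):
    """Return one WHERE clause per document class registered for the handler."""
    fmt = "%Y%m%dT%H%M%SZ"
    return {
        doc_class: f"WHERE ({field}>={begin.strftime(fmt)} AND {field}<={end.strftime(fmt)})"
        for doc_class, field in HANDLERS[handler]
    }

clauses = date_conditions(
    "date_sent_handler",
    datetime(2013, 7, 28, 0, 0, 0),
    datetime(2013, 8, 3, 23, 59, 59),
)
for doc_class, clause in clauses.items():
    print(doc_class, clause)
```

Keeping the field names in a separate table mirrors the separation between universal rules and repository-specific information described above.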
Other examples of conditions generated by this rule engine 532 are discussed below. In some examples, conditions may include documents sent from/to a specific person or having a specific mimetype or facet. As discussed above, a facet can include a document repository field that contains a restricted number of values. To discover facets, the document repository may be processed in advance to identify facets as well as non-facet metadata fields that have a limited range of values and can therefore be considered to be facets. A ‘facet’ word list may be created automatically from these values. Each line of the word list can include a value and a mapping to the fields where the value appears. The facet word list can be incorporated into the general FSM 512. When a facet value is tagged in the natural language search query 502, a condition can be generated to look for this value in the relevant metadata fields of the document repository.
In some embodiments, the tag module 508 can eliminate semantic ambiguity 534. For example, expressions such as “between August and September 1999” can be parsed two ways: “between ((August and September) 1999)” and “between (August and (September 1999)).” The rule engine 532 can score the former tag when the tag is generated so that the former tag is preferred to the latter tag. The scoring of a tag is described in greater detail below.
In some examples, conditions can include abstract constraints which can be transformed into a structured query language (SQL) condition to filter search results. For example, a condition can use tags to detect a date or date range that is to be transformed into a SQL condition statement. In some embodiments, conditions generated by the tag module 508 can be added to a temporary holding area, as indicated at block 536. Since generated conditions may not be compatible with each other, the conditions can be stored in the temporary holding area 536 for further processing. In some cases, a natural language search query 502 can be complemented with free text conditions. In this scenario, parts of the natural language search query 502 that could not be parsed into structured conditions are converted into free text conditions. Free text can be searched in all text fields or in specific fields. For example, a rule that triggers on “WORD/with; *; in_title;” can generate a condition to search for the wildcard words in the title or subject fields. In some examples, the tag module 508 removes stop words and boilerplate phrases that have been tagged. Removing boilerplate expressions may be performed using the same rule mechanism including creating a word list called “skip_verb.txt” containing phrases such as “I want,” “please give me,” “get,” and the like. As another example, a word list called “skip_object.txt” containing phrases such as “the document,” “email,” “files,” and the like may be created. In some cases, a word may be inserted at the beginning of the natural language search query 502 to enable conditions to be generated corresponding to the beginning of the natural language search query 502. Then, the rule “removeFreetext; WORD/^; skip_verb; skip_object;” may capture and remove a large number of boilerplate expressions from the beginning of the natural language search query 502.
In one example, this rule may not actually remove the boilerplate phrases from the natural language search query 502. The boilerplate phrases are still available for other types of tagging and condition generation, but the boilerplate phrases may not be sent to a free-text search.
A tag module 508 can also generate a final condition structure 538. In one implementation, the final structure 538 (also referred to as a query language statement) can include multiple levels. In a first level, a query to the repository may include different SQL queries, wherein each SQL query selects for a specific document class. In a second level, each of these SQL queries may have several AND clauses. In a third level, each item in an AND clause may have several conditions “ORed” together.
In some cases, compatible conditions may be generated from the temporary holding area 536. In some cases, heuristics for generating conditions may be implemented. For example, if there are several conditions found for the same docClass (e.g., “Document Class”) attribute (or xpath), the conditions can be disjunctively combined. Otherwise, if a docClass has conditions based on different attributes, the conditions can be conjunctively combined. Furthermore, if two docClasses are related (one class is derived from the other class), an attempt is made to merge their conditions, and a single SQL query is generated on the most-derived object. In other words, least-derived docClasses attempt to “donate” their conditions to derived docClasses that have conditions. If such a donation could not be made, a separate SQL query is generated for those least-derived docClasses.
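The combination heuristic for a single docClass (OR within one attribute, AND across attributes) can be sketched as follows. The function name `build_where` and the `(attribute, SQL fragment)` input format are assumptions; merging across related docClasses is not covered.

```python
from collections import defaultdict

def build_where(conditions):
    """Combine (attribute, sql_fragment) pairs: OR within an attribute, AND across attributes."""
    by_attr = defaultdict(list)
    for attr, fragment in conditions:
        by_attr[attr].append(fragment)
    groups = ["(" + " OR ".join(frags) + ")" for frags in by_attr.values()]
    return " AND ".join(groups)

print(build_where([
    ("From", "From='alice'"),
    ("From", "From='alicia'"),
    ("SentOn", "SentOn>=20130728T000000Z"),
]))
```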
In some cases, a donation may not always be made, as the following example will illustrate. Consider a least-derived document class ‘Document’ with two derived classes, ‘Email’ & ‘Record’. The Document class has the attribute ‘Creator’ while the Record class has the attribute ‘DeclaredBy’ and the Email class has the attribute ‘From.’ In this case donating ‘Document’ attributes to ‘Email’ does not violate any conditions or rules. The ‘Document.Creator’ and ‘Record.DeclaredBy’ attributes complement each other, so the conditions can be conjunctively combined. One example may be an SQL example: “SELECT d.* FROM Record d WHERE (d.Creator=‘alice’) AND (d.DeclaredBy=‘bob’).” However, conditions on ‘Document.Creator’ and ‘Email.From’ are incompatible. Although ‘Email’ is derived from ‘Document’, in some examples, one of these two fields can be populated, depending on whether a Document class instance or Email class instance is detected. Therefore a SQL query like “SELECT d.* FROM Email d WHERE (d.Creator=‘john’) AND (d.From=‘john@my.com’)” may fail, and this type of merger may not be allowed. The techniques described herein handle this issue by keeping a list of incompatible docClass_Attribute pairs. Incompatible conditions will not be joined into the same SQL query but will generate separate SQL queries (one for Document.Creator and another for Email.From).
In some cases, full queries can be composed from the condition structure. The structure can be converted to SQL statements appropriate and conforming to the document repository. The final results may be federated from the returned result sets. If no results are returned, or their score is low, it is possible to relax some of the conditions and try again. In some examples, the tag module 508 can specify a relaxed alternative for some of the conditions as the conditions are being generated. For example, the phrase “please show me the email I sent to John a week ago” can be relaxed in two ways. In a first way, the approximated range of “a week ago” may be expanded. In a second way, repository instances of “John,” and not just instances which are in close affinity to the sender, may be queried, wherein the affinities were determined by static analysis 504 of the document repository. In some cases, it is also possible to take a condition that searches for a value in a specific field, and change the condition into a global free text search.
As referenced above, the tag module 508 can generate scores for tags and/or rules using scoring heuristics. In some embodiments, one aspect of the system is a weighting algorithm which is meant to enhance the shallow parser outcome. In some cases, facet scores may be boosted. When a value or enum alias is found in the search query 502, a condition may be generated that searches for the value in the appropriate document class attributes based on the static analysis 504 of the document repository. Heuristic scores can be generated based on various considerations. In some examples, the score can be based on whether the search query contains (besides the facet value) a “booster” tag that relates to the docClass or the docClass attribute where that facet appears. For example, the tag ‘email’ hints that the search can include fields that belong to the Email docClass. In some cases, closeness of a booster tag to the facet value in the search query may be a factor in generating heuristic scores. In this scenario, a close proximity may boost the condition's score. In some cases, whether there is more than one booster tag for this value may be a factor in generating heuristic scores.
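One way the booster-tag heuristic could be realized is sketched below. The formula (a base score plus a bonus divided by token distance) and the name proximity_boost are assumptions made for illustration; the description above does not specify a particular weighting.

```python
def proximity_boost(tokens, facet_index, booster_tags, base=1.0, bonus=1.0):
    """Score a condition for the facet value at tokens[facet_index].

    Each booster tag found elsewhere in the query adds a bonus that shrinks
    with its token distance from the facet value, so closer boosters and
    multiple boosters both raise the score.
    """
    score = base
    for i, token in enumerate(tokens):
        if token in booster_tags and i != facet_index:
            score += bonus / abs(i - facet_index)
    return score
```

For the query “show me the email from john” with ‘john’ as the facet value and ‘email’ as the booster tag, the booster is two tokens away and contributes half of the bonus.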
Additional score boosters may take into account the following considerations: how much of the query is covered by structured metadata conditions; how much of the query is covered by free text conditions; how many condition terms are in the search query; the ranking returned by the free-text search; the existence of boosted terms found in the title over terms found in the body; how many results were returned by the SQL query; the depth of the docClass being searched; and the like. In some cases, queries that contain more condition terms, relate to a more specific docClass (a derived docClass rather than the generic Document docClass), and return a small number of results can get an extra boost before they are federated into the final set of results. Further, the tag module 508 can begin with a restrictive set of conditions, and relax the conditions if no results or very low-scoring results are returned.
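The extra boost applied before federation can be sketched as follows. The QueryResult fields, the weights, and the small_result_limit threshold are hypothetical values chosen for illustration only; they cover just three of the considerations listed above (number of condition terms, docClass depth, and result count).

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    sql: str
    condition_terms: int  # how many condition terms the query contains
    class_depth: int      # 0 for the generic Document docClass, larger if derived
    hits: int             # number of results the SQL query returned


def boosted_score(q, small_result_limit=10):
    """Hypothetical weighting: reward more condition terms, deeper docClasses,
    and small, non-empty result sets."""
    score = float(q.condition_terms) + 0.5 * q.class_depth
    if 0 < q.hits <= small_result_limit:
        score += 1.0
    return score


def federate(results):
    """Order query results by boosted score, highest first."""
    return sorted(results, key=boosted_score, reverse=True)
```

Under these assumed weights, a two-term query against a derived docClass that returns three documents outranks a one-term query against Document that returns hundreds.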
It will be appreciated that the embodiments described above and illustrated in the drawings represent only a few of the many ways of implementing embodiments of the present invention. The environment of the present invention embodiments may include any number of computer or other processing systems (e.g., client or end-user systems, server systems, etc.) and databases or other repositories arranged in any desired fashion, where the present invention embodiments may be applied to any desired type of computing environment (e.g., cloud computing, client-server, network computing, mainframe, stand-alone systems, etc.). The computer or other processing systems employed by the present invention embodiments may be implemented by any number of any personal or other type of computer or processing system (e.g., desktop, laptop, PDA, mobile devices, etc.), and may include any commercially available operating system and any combination of commercially available and custom software (e.g., query module, tag module, condition module, etc.). These systems may include any types of monitors and input devices (e.g., keyboard, mouse, voice recognition, etc.) to enter and/or view information.
It is to be understood that the software (e.g., query module, tag module, condition module, etc.) of the present invention embodiments may be implemented in any desired computer language and could be developed by one of ordinary skill in the computer arts based on the functional descriptions contained in the specification and flow charts illustrated in the drawings. Further, any references herein of software performing various functions generally refer to computer systems or processors performing those functions under software control. The computer systems of the present invention embodiments may alternatively be implemented by any type of hardware and/or other processing circuitry.
The various functions of the computer or other processing systems may be distributed in any manner among any number of software and/or hardware modules or units, processing or computer systems and/or circuitry, where the computer or processing systems may be disposed locally or remotely of each other and communicate via any suitable communications medium (e.g., LAN, WAN, Intranet, Internet, hardwire, modem connection, wireless, etc.). For example, the functions of the present invention embodiments may be distributed in any manner among the various end-user/client and server systems, and/or any other intermediary processing devices. The software and/or algorithms described above and illustrated in the flow charts may be modified in any manner that accomplishes the functions described herein. In addition, the functions in the flow charts or description may be performed in any order that accomplishes a desired operation.
The software of the present invention embodiments (e.g., query module, tag module, condition module, etc.) may be available on a non-transitory computer useable medium (e.g., magnetic or optical mediums, magneto-optic mediums, floppy diskettes, CD-ROM, DVD, memory devices, etc.) of a stationary or portable program product apparatus or device for use with stand-alone systems or systems connected by a network or other communications medium.
The communication network may be implemented by any number of any type of communications network (e.g., LAN, WAN, Internet, Intranet, VPN, etc.). The computer or other processing systems of the present invention embodiments may include any conventional or other communications devices to communicate over the network via any conventional or other protocols. The computer or other processing systems may utilize any type of connection (e.g., wired, wireless, etc.) for access to the network. Local communication media may be implemented by any suitable communication media (e.g., local area network (LAN), hardwire, wireless link, Intranet, etc.).
The system may employ any number of any conventional or other databases, data stores or storage structures (e.g., files, databases, data structures, data or other repositories, etc.) to store information (e.g., tags and conditions). The database system may be implemented by any number of any conventional or other databases, data stores or storage structures (e.g., files, databases, data structures, data or other repositories, etc.) to store information (e.g., tags and conditions). The database system may be included within or coupled to the server and/or client systems. The database systems and/or storage structures may be remote from or local to the computer or other processing systems, and may store any desired data (e.g., tags and conditions).
The present invention embodiments may employ any number of any type of user interface (e.g., Graphical User Interface (GUI), command-line, prompt, etc.) for obtaining or providing information (e.g., objects, fields, and values), where the interface may include any information arranged in any fashion. The interface may include any number of any types of input or actuation mechanisms (e.g., buttons, icons, fields, boxes, links, etc.) disposed at any locations to enter/display information and initiate desired actions via any suitable input devices (e.g., mouse, keyboard, etc.). The interface screens may include any suitable actuators (e.g., links, tabs, etc.) to navigate between the screens in any fashion.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “includes”, “including”, “has”, “have”, “having”, “with” and the like, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Referring now to FIG. 6, a block diagram is depicted of an example of a tangible, non-transitory computer-readable medium that can generate a query language statement. The tangible, non-transitory, computer-readable medium 600 may be accessed by a processor 602 over a computer interconnect 604. Furthermore, the tangible, non-transitory, computer-readable medium 600 may include code to direct the processor 602 to perform the operations of the current method.
The various software components discussed herein may be stored on the tangible, non-transitory, computer-readable medium 600, as indicated in FIG. 6. For example, a query module 606 can detect a search query corresponding to a document repository. The search query can attempt to retrieve documents from the document repository based on words or conditions in the search query. In some embodiments, the tag module 608 can generate a modified search query by adding atomic tags to the search query, the atomic tags based on prior knowledge obtained by static analysis of the document repository and semantic rules. For example, the tag module 608 can analyze the document repository to detect prior knowledge, such as wordlists, which can include tags that indicate associations between terms in a search query and additional related terms. In some embodiments, the tag module 608 can also generate enriched tags based on combinations of the atomic tags and add the enriched tags to the modified search query. The enriched tags can be based on any suitable combination of atomic and enriched tags.
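The tagging steps attributed to the tag module can be sketched in a few lines of Python. The wordlist entries, tag names, and enrichment rules below are invented for illustration and stand in for prior knowledge that would come from static analysis of an actual repository.

```python
# Hypothetical wordlist: query term -> atomic tag.
WORDLISTS = {
    "email": "DOCCLASS_EMAIL",
    "sent": "ACTION_SEND",
    "john": "PERSON",
}

# Hypothetical enrichment rules: set of required tags -> enriched tag.
ENRICH_RULES = [
    ({"DOCCLASS_EMAIL", "ACTION_SEND"}, "EMAIL_SENT"),
    ({"EMAIL_SENT", "PERSON"}, "EMAIL_SENT_TO_PERSON"),
]


def atomic_tags(query):
    """Attach an atomic tag to every query word found in the wordlist."""
    return [(w, WORDLISTS[w]) for w in query.lower().split() if w in WORDLISTS]


def enriched_tags(atomic):
    """Apply the rules repeatedly until no new tag appears, so that an
    enriched tag can build on previously generated enriched tags."""
    known = {tag for _, tag in atomic}
    enriched = []
    changed = True
    while changed:
        changed = False
        for required, new_tag in ENRICH_RULES:
            if required <= known and new_tag not in known:
                known.add(new_tag)
                enriched.append(new_tag)
                changed = True
    return enriched
```

In this sketch the second rule fires only after the first has produced EMAIL_SENT, which mirrors enriched tags being based on combinations of atomic and enriched tags.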
In some embodiments, the condition module 610 can generate a first set of conditions based on combinations of the atomic tags and enriched tags and generate a second set of conditions based on free-text conditions. The first set of conditions can correspond to terms in a search query that match atomic and/or enriched tags. The second set of conditions can correspond to terms in a search query that do not match atomic and/or enriched tags. In some examples, the condition module 610 can also reconcile the first set of conditions based on identified contradictions. For example, the condition module 610 can detect that conditions violate a logical expression. Accordingly, the condition module 610 can indicate that a combination of conditions is invalid and is to be reorganized to reconcile a contradiction between conditions. In some embodiments, the second set of conditions is restricted to the parts of the query that are not covered by the first set of conditions, which can result in a more focused and accurate retrieval of the relevant documents.
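The division into a first (structured) and a second (free-text) set of conditions can be illustrated as below. The STOPWORDS set, the tagged mapping, and the function name are assumptions for the sake of the example; contradiction reconciliation is not shown.

```python
# Hypothetical stopword list; such words produce no condition at all.
STOPWORDS = {"from", "about", "the"}


def split_conditions(tokens, tagged):
    """Split query tokens into structured and free-text conditions.

    tagged maps a token to an (attribute, value) condition derived from its
    tags. Tokens covered by a structured condition are excluded from the
    free-text set, so the two sets never overlap.
    """
    structured = [tagged[t] for t in tokens if t in tagged]
    free_text = [t for t in tokens if t not in tagged and t not in STOPWORDS]
    return structured, free_text
```

For the query “email from john about budget”, only ‘budget’ is left for the free-text search once ‘email’ and ‘john’ have been consumed by structured conditions.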
In some embodiments, the output module 612 can generate the query language statements corresponding to the search query, the query language statements being based on the conditions, and can display, via the processor, a plurality of documents from the document repository that satisfy the query language statements. In some examples, the query language statements can be joined by a logical conjunction or logical disjunction to identify documents in the document repository that match the search query and are to be displayed.
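A minimal sketch of joining by conjunction or disjunction follows. It assumes one possible policy, in line with the claim language: conditions targeting the same document type are joined with OR, and the resulting per-type groups are joined with AND. The function name and the representation of a query as a (doc_type, condition) pair are hypothetical.

```python
from collections import defaultdict


def join_queries(queries):
    """Join (doc_type, condition_string) pairs into one boolean expression.

    Conditions on the same document type are OR-ed together; the per-type
    groups are then AND-ed, in order of first appearance.
    """
    by_type = defaultdict(list)
    for doc_type, condition in queries:
        by_type[doc_type].append(condition)
    groups = ["(" + " OR ".join(conds) + ")" for conds in by_type.values()]
    return " AND ".join(groups)
```

For two Email conditions and one Record condition, the sketch produces an expression of the form (A OR B) AND (C).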
It is to be understood that any number of additional software components not shown in FIG. 6 may be included within the tangible, non-transitory, computer-readable medium 600, depending on the specific application.
The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (18)

What is claimed is:
1. A system for generating a query language statement comprising:
a processor to:
detect a search query corresponding to a document repository;
generate a modified search query by adding atomic tags to the search query, the atomic tags being based on an entity list and semantic rules;
generate enriched tags based on combinations of atomic tags and any previously identified enriched tags and add the generated enriched tags to the modified search query;
generate a first set of conditions based on combinations of the atomic tags and the generated enriched tags and generate a second set of conditions based on free-text conditions, the second set of conditions to correspond to terms in the search query that are not associated with any of the first set of conditions;
reconcile the first set of conditions based on identified contradictions;
generate the query language statements corresponding to the search query, the query language statements based in part on the first set of conditions and the second set of conditions; and
display a plurality of documents from the document repository that satisfy the query language statements, wherein
the generating the query language statements comprises:
joining at least two queries for documents of a same type with a logical disjunction or at least two queries for documents of a different type with a logical conjunction.
2. The system of claim 1, wherein the query language statements correspond to a score based on a characteristic of the search query.
3. The system of claim 2, wherein the score indicates that the search query references a field search term or a document search term, the score indicating a preference for search query language statements that correspond to the field search term.
4. The system of claim 1, wherein the entity list comprises prior knowledge obtained by static analysis of the document repository.
5. The system of claim 1, wherein the processor generates the semantic rules, each semantic rule indicating an action, the action comprising generating a new atomic or enriched tag or generating a condition.
6. The system of claim 5, wherein the processor:
determines that results of the query language statement do not exceed a quality threshold;
generates relaxation rules; and
modifies the query language statements based on the relaxation rules.
7. A computer program product for generating a query language statement, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
detect, via the processor, a search query corresponding to a document repository;
generate, via the processor, a modified search query by adding atomic tags to the search query, the atomic tags based on prior knowledge obtained by static analysis of the document repository and semantic rules;
generate, via the processor, enriched tags based on combinations of the atomic tags and any previously identified enriched tags and add the generated enriched tags to the modified search query;
generate, via the processor, a first set of conditions based on combinations of the atomic tags and the generated enriched tags and generate a second set of conditions based on free-text conditions, the second set of conditions corresponding to terms in the search query that are not associated with any of the first set of conditions;
reconcile, via the processor, the first set of conditions based on identified contradictions;
generate, via the processor, the query language statements corresponding to the search query, the query language statements based in part on the first set of conditions and the second set of conditions; and
display, via the processor, a plurality of documents from the document repository that satisfy the query language statements based on a score, wherein
the program instructions further cause the processor to join at least two queries for documents of a same type with a logical disjunction or at least two queries for documents of a different type with a logical conjunction.
8. The computer program product of claim 7, wherein the score indicates that the terms of the search query correspond to more of the first set of conditions than the second set of conditions.
9. The computer program product of claim 7, wherein the program instructions cause the processor to identify the atomic tags from an entity list, the entity list comprising the prior knowledge obtained by static analysis of the document repository.
10. The computer program product of claim 7, wherein the program instructions cause the processor to generate the semantic rules, each semantic rule indicating an action, the action comprising generating a new atomic or enriched tag or generating a condition.
11. A system for generating a query language statement comprising:
a processor to:
detect a search query corresponding to a document repository;
generate semantic rules, each semantic rule indicating an action, the action comprising generating a new atomic or enriched tag or generating a condition;
generate a modified search query by adding atomic tags to the search query, the atomic tags being based on an entity list and the semantic rules;
generate enriched tags based on combinations of atomic tags and any previously identified enriched tags and add the generated enriched tags to the modified search query;
generate a first set of conditions based on combinations of the atomic tags and the generated enriched tags and generate a second set of conditions based on free-text conditions, the second set of conditions to correspond to terms in the search query that are not associated with any of the first set of conditions;
reconcile the first set of conditions based on identified contradictions;
generate the query language statements corresponding to the search query, the query language statements based in part on the first set of conditions and the second set of conditions; and
display a plurality of documents from the document repository that satisfy the query language statements, wherein
the generating the query language statements comprises:
joining at least two queries for documents of a same type with a logical disjunction or at least two queries for documents of a different type with a logical conjunction.
12. The system of claim 11, wherein the query language statements correspond to a score based on a characteristic of the search query.
13. The system of claim 12, wherein the score indicates that the search query references a field search term or a document search term, the score indicating a preference for search query language statements that correspond to the field search term.
14. The system of claim 11, wherein the entity list comprises prior knowledge obtained by static analysis of the document repository.
15. The system of claim 11, wherein the processor:
determines that results of the query language statement do not exceed a quality threshold;
generates relaxation rules; and
modifies the query language statements based on the relaxation rules.
16. A computer program product for generating a query language statement, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
detect, via the processor, a search query corresponding to a document repository;
generate, via the processor, semantic rules, each semantic rule indicating an action, the action comprising generating a new atomic or enriched tag or generating a condition;
generate, via the processor, a modified search query by adding atomic tags to the search query, the atomic tags based on prior knowledge obtained by static analysis of the document repository and the semantic rules;
generate, via the processor, enriched tags based on combinations of the atomic tags and any previously identified enriched tags and add the generated enriched tags to the modified search query;
generate, via the processor, a first set of conditions based on combinations of the atomic tags and the generated enriched tags and generate a second set of conditions based on free-text conditions, the second set of conditions corresponding to terms in the search query that are not associated with any of the first set of conditions;
reconcile, via the processor, the first set of conditions based on identified contradictions;
generate, via the processor, the query language statements corresponding to the search query, the query language statements based in part on the first set of conditions and the second set of conditions; and
display, via the processor, a plurality of documents from the document repository that satisfy the query language statements based on a score, wherein
the generating the query language statements comprises:
joining at least two queries for documents of a same type with a logical disjunction or at least two queries for documents of a different type with a logical conjunction.
17. The computer program product of claim 16, wherein the score indicates that the terms of the search query correspond to more of the first set of conditions than the second set of conditions.
18. The computer program product of claim 16, wherein the program instructions cause the processor to identify the atomic tags from an entity list, the entity list comprising the prior knowledge obtained by static analysis of the document repository.
US14/808,138 | 2015-07-24 | 2015-07-24 | Generating and executing query language statements from natural language | Active, expires 2035-09-23 | US10180989B2 (en)

Priority Applications (2)

Application Number | Publication | Priority Date | Filing Date | Title
US14/808,138 | US10180989B2 (en) | 2015-07-24 | 2015-07-24 | Generating and executing query language statements from natural language
US15/140,839 | US10169471B2 (en) | 2015-07-24 | 2016-04-28 | Generating and executing query language statements from natural language

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
US14/808,138 | US10180989B2 (en) | 2015-07-24 | 2015-07-24 | Generating and executing query language statements from natural language

Related Child Applications (1)

Application Number | Relation | Publication | Priority Date | Filing Date | Title
US15/140,839 | Continuation | US10169471B2 (en) | 2015-07-24 | 2016-04-28 | Generating and executing query language statements from natural language

Publications (2)

Publication Number | Publication Date
US20170024443A1 (en) | 2017-01-26
US10180989B2 (en) | 2019-01-15

Family

ID=57836356

Family Applications (2)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US14/808,138 | Active, expires 2035-09-23 | US10180989B2 (en) | 2015-07-24 | 2015-07-24 | Generating and executing query language statements from natural language
US15/140,839 | Active | US10169471B2 (en) | 2015-07-24 | 2016-04-28 | Generating and executing query language statements from natural language

Family Applications After (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US15/140,839 | Active | US10169471B2 (en) | 2015-07-24 | 2016-04-28 | Generating and executing query language statements from natural language

Country Status (1)

Country | Link
US (2) | US10180989B2 (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10332511B2 (en) | 2015-07-24 | 2019-06-25 | International Business Machines Corporation | Processing speech to text queries by optimizing conversion of speech queries to text
US10180989B2 (en) | 2015-07-24 | 2019-01-15 | International Business Machines Corporation | Generating and executing query language statements from natural language
US10078632B2 (en) * | 2016-03-12 | 2018-09-18 | International Business Machines Corporation | Collecting training data using anomaly detection
US10713300B2 (en) * | 2017-11-03 | 2020-07-14 | Google Llc | Using distributed state machines for human-to-computer dialogs with automated assistants to protect private data
US20190138996A1 (en) * | 2017-11-03 | 2019-05-09 | Sap Se | Automated Intelligent Assistant for User Interface with Human Resources Computing System
US11573990B2 (en) | 2017-12-29 | 2023-02-07 | Entefy Inc. | Search-based natural language intent determination
US10565189B2 (en) * | 2018-02-26 | 2020-02-18 | International Business Machines Corporation | Augmentation of a run-time query
CN108388650B (en) * | 2018-02-28 | 2022-11-04 | 百度在线网络技术(北京)有限公司 | Search processing method and device based on requirements and intelligent equipment
US11416481B2 (en) * | 2018-05-02 | 2022-08-16 | Sap Se | Search query generation using branching process for database queries
CN108920452B (en) * | 2018-06-08 | 2022-05-17 | 北京明略软件系统有限公司 | Information processing method and device
US12001800B2 (en) | 2018-09-13 | 2024-06-04 | Feedzai - Consultadoria e Inovação Tecnológica, S.A. | Semantic-aware feature engineering
CN111753142B (en) * | 2019-03-28 | 2023-08-01 | 北京百度网讯科技有限公司 | Tag-based retrieval method and device
CN112286927A (en) * | 2019-07-25 | 2021-01-29 | 北京中关村科金技术有限公司 | Method, device and storage medium for inquiring user data
CN111177501B (en) * | 2019-12-13 | 2023-11-17 | 杭州首展科技有限公司 | Label processing method, device and system
US11263269B2 (en) * | 2020-01-06 | 2022-03-01 | International Business Machines Corporation | Expert-system translation of natural-language input into atomic requirements
CN111327679B (en) * | 2020-01-19 | 2022-06-17 | 苏宁云计算有限公司 | A rule parsing method and device
CN111552792B (en) * | 2020-04-30 | 2023-11-21 | 中国建设银行股份有限公司 | Information query method and device, electronic equipment and storage medium
CN111597205B (en) * | 2020-05-26 | 2024-02-13 | 北京金堤科技有限公司 | Template configuration method, information extraction device, electronic equipment and medium
CN111984883B (en) * | 2020-08-11 | 2024-05-14 | 北京百度网讯科技有限公司 | Label mining method, device, equipment and storage medium
US12099536B2 (en) * | 2020-09-23 | 2024-09-24 | Entigenlogic Llc | Extracting knowledge from a knowledge database
CN112749200A (en) * | 2020-12-21 | 2021-05-04 | 北京百分点科技集团股份有限公司 | Crowd screening method and device
US20230195764A1 (en) * | 2021-12-17 | 2023-06-22 | CrushBank Technology Inc. | Automatic real-time information augmentation for support tickets
CN114969193B (en) * | 2022-05-16 | 2025-04-01 | 成都数之联科技股份有限公司 | A method, system, device and medium for generating chart data
CN115952303A (en) * | 2022-12-26 | 2023-04-11 | 上海金融期货信息技术有限公司 | Atomization-based smart label system management and application system
CN118733687B (en) * | 2024-09-04 | 2025-02-14 | 杭州玳数科技有限公司 | A real-time labeling method and device based on Flink
US12254029B1 (en) | 2024-10-15 | 2025-03-18 | AskTuring.AI Inc. | Machine learning architecture for contextual data retrieval
CN119201986B (en) * | 2024-11-27 | 2025-03-11 | 北京火山引擎科技有限公司 | Method, apparatus, device, medium and program product for retrieving information

Citations (50)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5265065A (en) | 1991-10-08 | 1993-11-23 | West Publishing Company | Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query
US6182029B1 (en) | 1996-10-28 | 2001-01-30 | The Trustees Of Columbia University In The City Of New York | System and method for language extraction and encoding utilizing the parsing of text data in accordance with domain parameters
US6516312B1 (en) | 2000-04-04 | 2003-02-04 | International Business Machine Corporation | System and method for dynamically associating keywords with domain-specific search engine queries
US20030126136A1 (en) * | 2001-06-22 | 2003-07-03 | Nosa Omoigui | System and method for knowledge retrieval, management, delivery and presentation
US6631346B1 (en) | 1999-04-07 | 2003-10-07 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for natural language parsing using multiple passes and tags
US20030212664A1 (en) * | 2002-05-10 | 2003-11-13 | Martin Breining | Querying markup language data sources using a relational query processor
US20040015548A1 (en) | 2002-07-17 | 2004-01-22 | Lee Jin Woo | Method and system for displaying group chat sessions on wireless mobile terminals
US20040153435A1 (en) * | 2003-01-30 | 2004-08-05 | Decode Genetics Ehf. | Method and system for defining sets by querying relational data using a set definition language
US20060053096A1 (en) * | 2004-09-08 | 2006-03-09 | Oracle International Corporation | Natural language query construction using purpose-driven template
US20060074634A1 (en) | 2004-10-06 | 2006-04-06 | International Business Machines Corporation | Method and apparatus for fast semi-automatic semantic annotation
US20060106769A1 (en) * | 2004-11-12 | 2006-05-18 | Gibbs Kevin A | Method and system for autocompletion for languages having ideographs and phonetic characters
US7236972B2 (en) * | 2002-01-14 | 2007-06-26 | Speedtrack, Inc. | Identifier vocabulary data access method and system
US20080097748A1 (en) * | 2004-11-12 | 2008-04-24 | Haley Systems, Inc. | System for Enterprise Knowledge Management and Automation
US20080133479A1 (en) * | 2006-11-30 | 2008-06-05 | Endeca Technologies, Inc. | Method and system for information retrieval with clustering
US7548847B2 (en) | 2002-05-10 | 2009-06-16 | Microsoft Corporation | System for automatically annotating training data for a natural language understanding system
US20100049502A1 (en) | 2000-07-24 | 2010-02-25 | Microsoft Corporation | Method and system of generating reference variations for directory assistance data
US20100076972A1 (en) | 2008-09-05 | 2010-03-25 | Bbn Technologies Corp. | Confidence links between name entities in disparate documents
US20100104087A1 (en) | 2008-10-27 | 2010-04-29 | International Business Machines Corporation | System and Method for Automatically Generating Adaptive Interaction Logs from Customer Interaction Text
US20100145902A1 (en) | 2008-12-09 | 2010-06-10 | Ita Software, Inc. | Methods and systems to train models to extract and integrate information from data sources
US20100293195A1 (en) | 2009-05-12 | 2010-11-18 | Comcast Interactive Media, Llc | Disambiguation and Tagging of Entities
US7983914B2 (en) | 2005-08-10 | 2011-07-19 | Nuance Communications, Inc. | Method and system for improved speech recognition by degrading utterance pronunciations
US20110224982A1 (en) | 2010-03-12 | 2011-09-15 | c/o Microsoft Corporation | Automatic speech recognition based upon information retrieval methods
US20110313757A1 (en) | 2010-05-13 | 2011-12-22 | Applied Linguistics Llc | Systems and methods for advanced grammar checking
US20110320203A1 (en) | 2004-07-22 | 2011-12-29 | Nuance Communications, Inc. | Method and system for identifying and correcting accent-induced speech recognition difficulties
US20120166182A1 (en) * | 2009-06-03 | 2012-06-28 | Ko David H | Autocompletion for Partially Entered Query
US20120215533A1 (en) | 2011-01-26 | 2012-08-23 | Veveo, Inc. | Method of and System for Error Correction in Multiple Input Modality Search Engines
US20120323574A1 (en) | 2011-06-17 | 2012-12-20 | Microsoft Corporation | Speech to text medical forms
US20130006613A1 (en) | 2010-02-01 | 2013-01-03 | Ginger Software, Inc. | Automatic context sensitive language correction using an internet corpus particularly for small keyboard devices
US20130080152A1 (en) | 2011-09-26 | 2013-03-28 | Xerox Corporation | Linguistically-adapted structural query annotation
US20130166598A1 (en) * | 2011-12-27 | 2013-06-27 | Business Objects Software Ltd. | Managing Business Objects Data Sources
US20130332162A1 (en) | 2012-06-08 | 2013-12-12 | Apple Inc. | Systems and Methods for Recognizing Textual Identifiers Within a Plurality of Words
US20140012580A1 (en) | 2012-07-09 | 2014-01-09 | Nuance Communications, Inc. | Detecting potential significant errors in speech recognition results
US20140019435A1 (en) * | 2012-07-16 | 2014-01-16 | Politecnico Di Milano | Method and system of management of queries for crowd searching
US8688447B1 (en) | 2013-08-21 | 2014-04-01 | Ask Ziggy, Inc. | Method and system for domain-specific noisy channel natural language processing (NLP)
US20140093845A1 (en) | 2011-10-26 | 2014-04-03 | Sk Telecom Co., Ltd. | Example-based error detection system for automatic evaluation of writing, method for same, and error detection apparatus for same
US8694305B1 (en) | 2013-03-15 | 2014-04-08 | Ask Ziggy, Inc. | Natural language processing (NLP) portal for third party applications
US8700404B1 (en) | 2005-08-27 | 2014-04-15 | At&T Intellectual Property Ii, L.P. | System and method for using semantic and syntactic graphs for utterance classification
US20140129218A1 (en) | 2012-06-06 | 2014-05-08 | Spansion Llc | Recognition of Speech With Different Accents
US20140163959A1 (en) | 2012-12-12 | 2014-06-12 | Nuance Communications, Inc. | Multi-Domain Natural Language Processing Architecture
US20140201202A1 (en) | 2008-05-01 | 2014-07-17 | Chacha Search, Inc | Method and system for improvement of request processing
US20140244246A1 (en) | 2013-02-26 | 2014-08-28 | Honeywell International Inc. | System and method for correcting accent induced speech transmission problems
US20140330819A1 (en) | 2013-05-03 | 2014-11-06 | Rajat Raina | Search Query Interactions on Online Social Networks
US20140344173A1 (en) * | 2013-04-02 | 2014-11-20 | Kpmg Llp | System and method for creating executable policy rules for execution on rules-based engines
US8914277B1 (en) | 2011-09-20 | 2014-12-16 | Nuance Communications, Inc. | Speech and language translation of an utterance
US20150046371A1 (en) | 2011-04-29 | 2015-02-12 | Cbs Interactive Inc. | System and method for determining sentiment from text content
US20150081658A1 (en) * | 2013-09-18 | 2015-03-19 | Ims Health Incorporated | System and method for fast query response
US20160012020A1 (en) | 2014-07-14 | 2016-01-14 | Samsung Electronics Co., Ltd. | Method and system for robust tagging of named entities in the presence of source or translation errors
US20160371392A1 (en) * | 2015-06-17 | 2016-12-22 | Qualcomm Incorporated | Selectively indexing data entries within a semi-structured database
US20170024459A1 (en) | 2015-07-24 | 2017-01-26 | International Business Machines Corporation | Processing speech to text queries by optimizing conversion of speech queries to text
US20170024431A1 (en) | 2015-07-24 | 2017-01-26 | International Business Machines Corporation | Generating and executing query language statements from natural language

Patent Citations (51)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5265065A (en) | 1991-10-08 | 1993-11-23 | West Publishing Company | Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query
US6182029B1 (en) | 1996-10-28 | 2001-01-30 | The Trustees Of Columbia University In The City Of New York | System and method for language extraction and encoding utilizing the parsing of text data in accordance with domain parameters
US6631346B1 (en) | 1999-04-07 | 2003-10-07 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for natural language parsing using multiple passes and tags
US6516312B1 (en) | 2000-04-04 | 2003-02-04 | International Business Machine Corporation | System and method for dynamically associating keywords with domain-specific search engine queries
US20100049502A1 (en) | 2000-07-24 | 2010-02-25 | Microsoft Corporation | Method and system of generating reference variations for directory assistance data
US20030126136A1 (en) * | 2001-06-22 | 2003-07-03 | Nosa Omoigui | System and method for knowledge retrieval, management, delivery and presentation
US7236972B2 (en) * | 2002-01-14 | 2007-06-26 | Speedtrack, Inc. | Identifier vocabulary data access method and system
US20030212664A1 (en) * | 2002-05-10 | 2003-11-13 | Martin Breining | Querying markup language data sources using a relational query processor
US7548847B2 (en) | 2002-05-10 | 2009-06-16 | Microsoft Corporation | System for automatically annotating training data for a natural language understanding system
US20040015548A1 (en) | 2002-07-17 | 2004-01-22 | Lee Jin Woo | Method and system for displaying group chat sessions on wireless mobile terminals
US20040153435A1 (en) * | 2003-01-30 | 2004-08-05 | Decode Genetics Ehf. | Method and system for defining sets by querying relational data using a set definition language
US20110320203A1 (en) | 2004-07-22 | 2011-12-29 | Nuance Communications, Inc. | Method and system for identifying and correcting accent-induced speech recognition difficulties
US20060053096A1 (en) * | 2004-09-08 | 2006-03-09 | Oracle International Corporation | Natural language query construction using purpose-driven template
US20060074634A1 (en) | 2004-10-06 | 2006-04-06 | International Business Machines Corporation | Method and apparatus for fast semi-automatic semantic annotation
US20080097748A1 (en) * | 2004-11-12 | 2008-04-24 | Haley Systems, Inc. | System for Enterprise Knowledge Management and Automation
US20060106769A1 (en) * | 2004-11-12 | 2006-05-18 | Gibbs Kevin A | Method and system for autocompletion for languages having ideographs and phonetic characters
US7983914B2 (en) | 2005-08-10 | 2011-07-19 | Nuance Communications, Inc. | Method and system for improved speech recognition by degrading utterance pronunciations
US8700404B1 (en) | 2005-08-27 | 2014-04-15 | At&T Intellectual Property Ii, L.P. | System and method for using semantic and syntactic graphs for utterance classification
US20080133479A1 (en) * | 2006-11-30 | 2008-06-05 | Endeca Technologies, Inc. | Method and system for information retrieval with clustering
US20140201202A1 (en) | 2008-05-01 | 2014-07-17 | Chacha Search, Inc | Method and system for improvement of request processing
US20100076972A1 (en) | 2008-09-05 | 2010-03-25 | Bbn Technologies Corp. | Confidence links between name entities in disparate documents
US20100104087A1 (en) | 2008-10-27 | 2010-04-29 | International Business Machines Corporation | System and Method for Automatically Generating Adaptive Interaction Logs from Customer Interaction Text
US20100145902A1 (en) | 2008-12-09 | 2010-06-10 | Ita Software, Inc. | Methods and systems to train models to extract and integrate information from data sources
US20100293195A1 (en) | 2009-05-12 | 2010-11-18 | Comcast Interactive Media, Llc | Disambiguation and Tagging of Entities
US20120166182A1 (en) * | 2009-06-03 | 2012-06-28 | Ko David H | Autocompletion for Partially Entered Query
US20130006613A1 (en) | 2010-02-01 | 2013-01-03 | Ginger Software, Inc. | Automatic context sensitive language correction using an internet corpus particularly for small keyboard devices
US20110224982A1 (en) | 2010-03-12 | 2011-09-15 | c/o Microsoft Corporation | Automatic speech recognition based upon information retrieval methods
US20110313757A1 (en) | 2010-05-13 | 2011-12-22 | Applied Linguistics Llc | Systems and methods for advanced grammar checking
US20120215533A1 (en) | 2011-01-26 | 2012-08-23 | Veveo, Inc. | Method of and System for Error Correction in Multiple Input Modality Search Engines
US20150046371A1 (en) | 2011-04-29 | 2015-02-12 | Cbs Interactive Inc. | System and method for determining sentiment from text content
US20120323574A1 (en) | 2011-06-17 | 2012-12-20 | Microsoft Corporation | Speech to text medical forms
US8914277B1 (en) | 2011-09-20 | 2014-12-16 | Nuance Communications, Inc. | Speech and language translation of an utterance
US20130080152A1 (en) | 2011-09-26 | 2013-03-28 | Xerox Corporation | Linguistically-adapted structural query annotation
US20140093845A1 (en) | 2011-10-26 | 2014-04-03 | Sk Telecom Co., Ltd. | Example-based error detection system for automatic evaluation of writing, method for same, and error detection apparatus for same
US20130166598A1 (en) * | 2011-12-27 | 2013-06-27 | Business Objects Software Ltd. | Managing Business Objects Data Sources
US20140129218A1 (en) | 2012-06-06 | 2014-05-08 | Spansion Llc | Recognition of Speech With Different Accents
US20130332162A1 (en) | 2012-06-08 | 2013-12-12 | Apple Inc. | Systems and Methods for Recognizing Textual Identifiers Within a Plurality of Words
US20140012580A1 (en) | 2012-07-09 | 2014-01-09 | Nuance Communications, Inc. | Detecting potential significant errors in speech recognition results
US20140019435A1 (en) * | 2012-07-16 | 2014-01-16 | Politecnico Di Milano | Method and system of management of queries for crowd searching
US20140163959A1 (en) | 2012-12-12 | 2014-06-12 | Nuance Communications, Inc. | Multi-Domain Natural Language Processing Architecture
US20140244246A1 (en) | 2013-02-26 | 2014-08-28 | Honeywell International Inc. | System and method for correcting accent induced speech transmission problems
US8694305B1 (en) | 2013-03-15 | 2014-04-08 | Ask Ziggy, Inc. | Natural language processing (NLP) portal for third party applications
US20140344173A1 (en) * | 2013-04-02 | 2014-11-20 | Kpmg Llp | System and method for creating executable policy rules for execution on rules-based engines
US20140330819A1 (en) | 2013-05-03 | 2014-11-06 | Rajat Raina | Search Query Interactions on Online Social Networks
US8688447B1 (en) | 2013-08-21 | 2014-04-01 | Ask Ziggy, Inc. | Method and system for domain-specific noisy channel natural language processing (NLP)
US20150081658A1 (en) * | 2013-09-18 | 2015-03-19 | Ims Health Incorporated | System and method for fast query response
US20160012020A1 (en) | 2014-07-14 | 2016-01-14 | Samsung Electronics Co., Ltd. | Method and system for robust tagging of named entities in the presence of source or translation errors
US20160371392A1 (en) * | 2015-06-17 | 2016-12-22 | Qualcomm Incorporated | Selectively indexing data entries within a semi-structured database
US20170024459A1 (en) | 2015-07-24 | 2017-01-26 | International Business Machines Corporation | Processing speech to text queries by optimizing conversion of speech queries to text
US20170025120A1 (en) | 2015-07-24 | 2017-01-26 | International Business Machines Corporation | Processing speech to text queries by optimizing conversion of speech queries to text
US20170024431A1 (en) | 2015-07-24 | 2017-01-26 | International Business Machines Corporation | Generating and executing query language statements from natural language

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
Androutsopoulos et al., "Natural Language Interfaces to Databases - An Introduction", arXiv:cmp-lg/9503016v2, Mar. 16, 1995, pp. 1-50.
Cucerzan et al., "Spelling correction as an iterative process that exploits the collective knowledge of web users", retrieved at http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/CucerZan.pdf, Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, a meeting of SIGDAT, a Special Interest Group of the ACL, held in conjunction with ACL, Jul. 25-26, 2004, pp. 8.
List of IBM Patents or Patent Applications Treated as Related, Apr. 28, 2016.
List of IBM Patents or Patent Applications Treated as Related, Jul. 24, 2015, 1 page.
List of IBM Patents or Patent Applications Treated as Related, May 4, 2016, 2 pages.
Michael Tjalve, "Accent Features and Idiodictionaries: On Improving Accuracy for Accented Speakers in ASR", University College London, PhD in Experimental Phonetics, Mar. 2007, 236 pages.
Tablan et al., "A Natural Language Query Interface to Structured Information", Department of Computer Science, University of Sheffield, pp. 1-15, 2008.
Yanli Zheng et al., "Accent Detection and Speech Recognition for Shanghai-Accented Mandarin", INTERSPEECH, 2005, pp. 217-220.

Also Published As

Publication number | Publication date
US20170024431A1 (en) | 2017-01-26
US10169471B2 (en) | 2019-01-01
US20170024443A1 (en) | 2017-01-26

Similar Documents

Publication | Publication Date | Title
US10169471B2 (en) | | Generating and executing query language statements from natural language
US10776082B2 (en) | | Programming environment augment with automated dialog system assistance
US10055410B1 (en) | | Corpus-scoped annotation and analysis
US10303689B2 (en) | | Answering natural language table queries through semantic table representation
US11226960B2 (en) | | Natural-language database interface with automated keyword mapping and join-path inferences
US20200334313A1 (en) | | Personalizing a search of a search service
JP2018533126A (en) | | Method, system, and computer program product for a natural language interface to a database
US11630833B2 (en) | | Extract-transform-load script generation
US10592304B2 (en) | | Suggesting application programming interfaces based on feature and context analysis
US11151323B2 (en) | | Embedding natural language context in structured documents using document anatomy
US10198501B2 (en) | | Optimizing retrieval of data related to temporal based queries
US20190163781A1 (en) | | Learning user synonyms from sequenced query sessions
US11487801B2 (en) | | Dynamic data visualization from factual statements in text
US20180089569A1 (en) | | Generating a temporal answer to a question
WO2023103814A1 (en) | | Extracting query-related temporal information from unstructured text documents
US11940953B2 (en) | | Assisted updating of electronic documents
US10776408B2 (en) | | Natural language search using facets
US11222051B2 (en) | | Document analogues through ontology matching
US11443101B2 (en) | | Flexible pseudo-parsing of dense semi-structured text
US20190065583A1 (en) | | Compound q&a system
US10956436B2 (en) | | Refining search results generated from a combination of multiple types of searches
US12254033B2 (en) | | Search in knowledge graphs
US11995070B2 (en) | | Query expression error detection and correction
US11238088B2 (en) | | Video management system
US11822591B2 (en) | | Query-based granularity selection for partitioning recordings

Legal Events

Date | Code | Title | Description

AS | Assignment
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: DAYAN, YIGAL S.; MAGDALEN, JOSEMINA M.; MAHARIAN, IRIT; AND OTHERS; SIGNING DATES FROM 20150716 TO 20150721; REEL/FRAME: 036178/0091

STCF | Information on status: patent grant
Free format text: PATENTED CASE

MAFP | Maintenance fee payment
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
Year of fee payment: 4

