US20040163035A1 - Method for automatic and semi-automatic classification and clustering of non-deterministic texts


Info

Publication number
US20040163035A1
Authority
US
United States
Prior art keywords
recited
documents
word sequences
data mining
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/771,315
Inventor
Assaf Ariel
Michael Brand
Itsik Horowitz
Ofer Shochet
Itzik Stauber
Dror Ziv
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Credit Suisse AG
Original Assignee
Verint Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2003-02-05
Filing date
2004-02-05
Publication date
2004-08-19
Application filed by Verint Systems Inc
Priority to US10/771,315
Assigned to VERINT SYSTEMS INC.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ARIEL, ASSAF; BRAND, MICHAEL; HOROWITZ, ITSIK; SHOCHET, OFER; STAUBER, ITZIK; ZIV, DROR DANIEL
Publication of US20040163035A1
Assigned to LEHMAN COMMERCIAL PAPER INC., AS ADMINISTRATIVE AGENT: SECURITY AGREEMENT. Assignors: VERINT SYSTEMS INC.
Assigned to CREDIT SUISSE AS ADMINISTRATIVE AGENT: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LEHMAN COMMERCIAL PAPER INC., VERINT SYSTEMS INC.
Assigned to VERINT SYSTEMS INC., VERINT AMERICAS INC., VERINT VIDEO SOLUTIONS INC.: RELEASE OF SECURITY INTEREST. Assignors: CREDIT SUISSE AG
Abandoned (current legal status)

Abstract

Non-deterministic text with average word recognition precision below 50% is processed utilizing non-textual differences between words or sequences of words in the text to provide more useful information to users by resolving more than two decision options. One or more indexes that indicate non-textual differences between n-word sequences, where n is a positive integer, may be generated for use in data mining that considers the non-textual differences. Alternatively, multiple indexes may be generated using different data mining techniques that may or may not utilize non-textual differences and then the results produced by the different data mining techniques may be merged to identify non-textual differences. These techniques may be used in classifying, labeling, categorizing, filtering, clustering, or retrieving documents, or in discovering salient terms in a set of documents.
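
The abstract's first approach, an index that records non-textual differences between n-word sequences, can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the (word, confidence) input format, and the choice to score an n-word sequence by the product of its word confidences are all assumptions made for illustration.

```python
from collections import defaultdict


def build_confidence_index(documents, n=1):
    """Map each n-word sequence to {doc_id: [confidence, ...]}.

    `documents` maps a document id to a list of (word, confidence) pairs,
    as a recognizer might emit (hypothetical format). The confidence of an
    n-word sequence is assumed to be the product of its word confidences.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, tokens in documents.items():
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            sequence = " ".join(word for word, _ in window)
            confidence = 1.0
            for _, word_confidence in window:
                confidence *= word_confidence
            index[sequence][doc_id].append(confidence)
    return index


def rank_documents(index, query):
    """Rank documents by the summed confidence of the query's occurrences."""
    postings = index.get(query, {})
    scores = {doc_id: sum(confs) for doc_id, confs in postings.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only; the confidences are made up.
documents = {
    "call-1": [("cancel", 0.9), ("my", 0.6), ("account", 0.8)],
    "call-2": [("cancel", 0.2), ("the", 0.7), ("cancel", 0.3)],
}
index = build_confidence_index(documents, n=1)
ranking = rank_documents(index, "cancel")
```

In this example a plain term count would favor call-2, which contains "cancel" twice, whereas the confidence-weighted score ranks call-1 first because its single occurrence was recognized with much higher confidence. The score is a graded value rather than a yes/no match.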

Claims (105)

What is claimed is:
1. A document processing method, comprising:
processing documents derived from at least one of spontaneous and conversational expression and containing non-deterministic text with average word recognition precision below 50 percent, said processing utilizing non-textual differences between n-word sequences in the documents to resolve more than two decision options, where n is a positive integer.
2. A method as recited in claim 1, wherein said processing includes data mining of the documents.
3. A method as recited in claim 2, wherein said data mining includes retrieving at least one of the documents utilizing the non-textual differences between the n-word sequences in the documents.
4. A method as recited in claim 2, wherein said data mining includes extracting parameters from the documents, utilizing the non-textual differences between said n-word sequences.
5. A method as recited in claim 4, wherein said data mining further includes producing graphic results indicating the relations between the parameters extracted from the documents.
6. A method as recited in claim 4, wherein at least one of the parameters extracted from the documents is an assessment of relevance to a query based on the non-textual differences between the n-word sequences.
7. A method as recited in claim 4, wherein at least one of the extracted parameters is an assessment of a hidden variable that cannot be fully determined from information existing in the document.
8. A method as recited in claim 4, wherein at least one of the extracted parameters is the assessment of the document's relevance to a category.
9. A method as recited in claim 2, wherein said processing includes categorizing the documents.
10. A method as recited in claim 9, wherein said categorizing includes use of at least one algorithm to detect salient terms in the documents based on non-linguistic differences between the n-word sequences.
11. A method as recited in claim 2, further comprising clustering the documents.
12. A method as recited in claim 11, wherein said clustering includes discovering salient terms in the documents based on non-linguistic differences between the n-word sequences.
13. A method as recited in claim 11, wherein said clustering includes assessing a relation between the n-word sequences based on non-textual differences.
14. A method as recited in claim 4, wherein said data mining includes establishing relations between the parameters extracted from the documents.
15. A method as recited in claim 1, wherein the non-textual differences between the n-word sequences relate to recognition confidence of the n-word sequences.
16. A method as recited in claim 1, further comprising at least one of classifying and filtering the documents as the documents are received.
17. A method as recited in claim 1, further comprising labeling the documents as the documents are received.
18. A method as recited in claim 1, further comprising displaying information related to at least one of the documents, including at least some of the non-textual differences between the n-word sequences.
19. A method as recited in claim 18, wherein said displaying uses at least one of gray scaling, color, font-size and font style to indicate at least some of the non-textual differences between the n-word sequences.
20. A method as recited in claim 18, wherein said displaying selectively displays portions of the at least one of the documents based on confidence of accuracy of words displayed.
21. A method as recited in claim 18, wherein said displaying further displays salient terms in the at least one of the documents based on said processing of confidence levels of the salient terms that resolves more than two decision options.
22. A method as recited in claim 21, wherein a number of the salient terms are available for display and said displaying is further based on the number of the salient terms available for display and available space for display of the salient terms.
23. A method as recited in claim 1, further comprising:
receiving user input indicating errors in recognition; and
replacing at least one word in the document with a corrected word based on the user input and setting the confidence levels of the corrected word to indicate high recognition accuracy.
24. A method as recited in claim 1, further comprising generating the documents by automatic speech recognition of audio signals received via a telephone system.
25. A method as recited in claim 1, further comprising generating the documents by automatic character recognition.
26. A method as recited in claim 1, further comprising generating the documents by a fact extraction system.
27. A method as recited in claim 1, wherein said processing includes
applying different data mining techniques, each of which does not indicate non-textual differences; and
merging results of the different data mining techniques to obtain results that are dependent on the non-textual differences between the n-word sequences.
28. A method as recited in claim 27, wherein the different data mining techniques include at least one of retrieving, categorizing, filtering, classifying, labeling and clustering documents without utilization of any non-textual differences between the n-word sequences.
29. A method as recited in claim 27,
wherein said applying uses a plurality of different algorithms to transform non-deterministic text into standard text documents usable in text mining, and
wherein the data mining techniques operate on the standard text documents.
30. A method as recited in claim 29,
wherein said processing further includes generating a plurality of indexes of the standard text documents, and
wherein the data mining techniques operate on the indexes to obtain the results.
31. A method as recited in claim 30, wherein the data mining techniques include
receiving a query; and
retrieving the results relevant to the query.
32. A method as recited in claim 30, wherein the data mining of at least some of the different indexes is performed by data mining software that does not output non-textual differences.
33. A method as recited in claim 29, wherein the different algorithms are thresholding algorithms using different confidence thresholds to determine omitted words that fall below the confidence thresholds.
34. A method as recited in claim 1, further comprising:
receiving user input indicating a change in labeling of at least one document; and
replacing at least part of information provided by at least one label for the at least one document based on the user input.
35. A document processing method, comprising:
producing at least one index of n-word sequences in documents derived from at least one of spontaneous and conversational expression and containing non-deterministic text with average word recognition precision below 50 percent, utilizing non-textual differences between the n-word sequences, where n is a positive integer; and
processing the documents based on the non-textual differences between the n-word sequences in the at least one index, where said processing resolves more than two decision options.
36. A method as recited in claim 35, wherein the non-textual differences between the n-word sequences relate to recognition confidence of the n-word sequences.
37. At least one computer readable medium storing instructions for controlling at least one computer system to perform a document processing method comprising:
processing documents derived from at least one of spontaneous and conversational expression and containing non-deterministic text with average word recognition precision below 50 percent, said processing utilizing non-textual differences between n-word sequences in the documents, where n is a positive integer and said processing resolves more than two decision options.
38. At least one computer readable medium as recited in claim 37, wherein said processing includes data mining of the documents.
39. At least one computer readable medium as recited in claim 38, wherein said data mining includes retrieving at least one of the documents utilizing the non-textual differences between the n-word sequences in the documents.
40. At least one computer readable medium as recited in claim 38, wherein said data mining includes
extracting parameters from the documents, utilizing the non-textual differences between said n-word sequences; and
establishing relations between the parameters extracted from the documents.
41. At least one computer readable medium as recited in claim 40, wherein said data mining further includes producing graphic results indicating the relations between the parameters extracted from the documents.
42. At least one computer readable medium as recited in claim 40, wherein at least one of the parameters extracted from the documents is an assessment of relevance to a query based on the non-textual differences between the n-word sequences.
43. At least one computer readable medium as recited in claim 40, wherein at least one of the extracted parameters is an assessment of a hidden variable that cannot be fully determined from information existing in the document.
44. At least one computer readable medium as recited in claim 40, wherein at least one of the extracted parameters is the assessment of the document's relevance to a category.
45. At least one computer readable medium as recited in claim 38, wherein said processing includes categorizing the documents.
46. At least one computer readable medium as recited in claim 45, wherein said categorizing includes use of at least one algorithm to detect salient terms in the documents based on non-linguistic differences between the n-word sequences.
47. At least one computer readable medium as recited in claim 38, further comprising clustering the documents.
48. At least one computer readable medium as recited in claim 47, wherein said clustering includes discovering salient terms in the documents based on non-linguistic differences between the n-word sequences.
49. At least one computer readable medium as recited in claim 47, wherein said clustering includes assessing a relation between the n-word sequences based on non-textual differences.
50. At least one computer readable medium as recited in claim 37, wherein the non-textual differences between the n-word sequences relate to recognition confidence of the n-word sequences.
51. At least one computer readable medium as recited in claim 37, further comprising at least one of classifying and filtering the documents as the documents are received.
52. At least one computer readable medium as recited in claim 37, further comprising labeling the documents as the documents are received.
53. At least one computer readable medium as recited in claim 37, further comprising displaying information related to at least one of the documents, including at least some of the non-textual differences between the n-word sequences.
54. At least one computer readable medium as recited in claim 53, wherein said displaying uses at least one of gray scaling, color, font-size and font style to indicate at least some of the non-textual differences between the n-word sequences.
55. At least one computer readable medium as recited in claim 53, wherein said displaying selectively displays portions of the at least one of the documents based on confidence of accuracy of words displayed.
56. At least one computer readable medium as recited in claim 53, wherein said displaying further displays salient terms in the at least one of the documents based on said processing of confidence levels of the salient terms that resolves more than two decision options.
57. At least one computer readable medium as recited in claim 56, wherein a number of the salient terms are available for display and said displaying is further based on the number of the salient terms available for display and available space for display of the salient terms.
58. At least one computer readable medium as recited in claim 37, further comprising:
receiving user input indicating errors in recognition; and
replacing at least one word in the document with a corrected word based on the user input and setting the confidence levels of the corrected word to indicate high recognition accuracy.
59. At least one computer readable medium as recited in claim 37, further comprising generating the documents by automatic speech recognition of audio signals received via a telephone system.
60. At least one computer readable medium as recited in claim 37, further comprising generating the documents by automatic character recognition.
61. At least one computer readable medium as recited in claim 37, further comprising generating the documents by a fact extraction system.
62. At least one computer readable medium as recited in claim 37, wherein said processing includes
applying different data mining techniques, each of which does not indicate non-textual differences; and
merging results of the different data mining techniques to obtain the non-textual differences between the n-word sequences.
63. At least one computer readable medium as recited in claim 62, wherein the different data mining techniques include at least one of retrieving, categorizing, filtering, classifying, labeling and clustering documents without utilization of any non-textual differences between the n-word sequences.
64. At least one computer readable medium as recited in claim 62,
wherein said applying uses a plurality of different algorithms to transform non-deterministic text into standard text documents usable in text mining, and
wherein the data mining techniques operate on the standard text documents.
65. At least one computer readable medium as recited in claim 64,
wherein said processing further includes generating a plurality of indexes of the standard text documents, and
wherein the data mining techniques operate on the indexes to obtain the results.
66. At least one computer readable medium as recited in claim 65, wherein the data mining techniques include
receiving a query; and
retrieving the results relevant to the query.
67. At least one computer readable medium as recited in claim 66, wherein the data mining of at least some of the different indexes is performed by data mining software that does not output non-textual differences.
68. At least one computer readable medium as recited in claim 64, wherein the different algorithms are thresholding algorithms using different confidence thresholds to determine omitted words that fall below the confidence thresholds.
69. At least one computer readable medium as recited in claim 37, further comprising:
receiving user input indicating a change in labeling of at least one document; and
replacing at least part of information provided by at least one label for the at least one document based on the user input.
70. At least one computer readable medium for controlling at least one computer system to perform a document processing method, comprising:
producing at least one index of n-word sequences in documents derived from at least one of spontaneous and conversational expression and containing non-deterministic text with average word recognition precision below 50 percent, utilizing non-textual differences between the n-word sequences, where n is a positive integer; and
processing the documents based on the non-textual differences between the n-word sequences in the at least one index, where said processing resolves more than two decision options.
71. At least one computer readable medium as recited in claim 70, wherein the non-textual differences between the n-word sequences relate to recognition confidence of the n-word sequences.
72. An apparatus for processing documents, comprising:
processing means for processing documents derived from at least one of spontaneous and conversational expression and containing non-deterministic text with average word recognition precision below 50 percent, said processing utilizing non-textual differences between n-word sequences in the documents, where n is a positive integer and said processing resolves more than two decision options.
73. An apparatus as recited in claim 72, wherein said processing means comprises index means for producing at least one index of the n-word sequences utilizing the non-textual differences between the n-word sequences.
74. An apparatus as recited in claim 73, wherein said processing means comprises data mining means for retrieving at least one of the documents utilizing the at least one index.
75. An apparatus as recited in claim 74,
wherein said data mining means comprises:
parameter extraction means for extracting parameters from the documents, utilizing the non-textual differences between said n-word sequences; and
relations establishment means for establishing relations between the parameters extracted from the documents, and
wherein said apparatus further comprises display means for producing graphic results indicating the relations between the parameters extracted from the documents.
76. An apparatus as recited in claim 75, wherein at least one of the extracted parameters is an assessment of a hidden variable that cannot be fully determined from information existing in the at least one of the documents.
77. An apparatus as recited in claim 72, wherein the non-textual differences between the n-word sequences relate to recognition confidence of the n-word sequences.
78. An apparatus as recited in claim 72, wherein said processing means comprises categorizing means for categorizing the documents utilizing at least one algorithm based on non-linguistic differences between the n-word sequences.
79. An apparatus as recited in claim 72, wherein said processing means comprises clustering means for clustering the documents by assessing a relation between the n-word sequences based on non-textual differences.
80. An apparatus as recited in claim 72, wherein said processing means comprises means for at least one of classifying and filtering the documents as the documents are received.
81. An apparatus as recited in claim 72, further comprising display means for displaying information related to at least one of the documents, including at least some of the non-textual differences between the n-word sequences.
82. An apparatus as recited in claim 81, wherein said display means selectively displays portions of the at least one of the documents based on confidence of accuracy of words displayed.
83. An apparatus as recited in claim 72,
further comprising input means for receiving user input indicating errors in recognition, and
wherein said processing means comprises means for replacing at least one word in the at least one of the documents with a corrected word based on the user input and setting the confidence levels of the corrected word to indicate high recognition accuracy.
84. An apparatus as recited in claim 72, coupled to a telephone system and further comprising automatic speech recognition means for generating the documents by automatic speech recognition of audio signals received via the telephone system.
85. An apparatus as recited in claim 72, further comprising automatic character recognition means for generating the documents by automatic character recognition.
86. An apparatus as recited in claim 72, wherein said processing means comprises:
data mining means for applying different data mining techniques, each of which does not indicate non-textual differences; and
merge means for merging results of the different data mining techniques to obtain the non-textual differences between the n-word sequences.
87. An apparatus as recited in claim 86, wherein said data mining means includes means for at least one of retrieving, categorizing, filtering, classifying, labeling and clustering documents without utilization of any non-textual differences between the n-word sequences.
88. An apparatus as recited in claim 87, wherein said data mining means uses a plurality of different algorithms to transform non-deterministic text into standard text documents usable in text mining and the data mining techniques operate on the standard text documents.
89. An apparatus as recited in claim 87,
further comprising indexing means for generating a plurality of indexes of the standard text documents, and
wherein said data mining means uses the different indexes in applying the different data mining techniques.
90. An apparatus as recited in claim 89,
further comprising input means for receiving a query; and
wherein said data mining means further includes retrieving means for retrieving the results relevant to the query.
91. A data processing system, comprising:
at least one server to process documents, derived from at least one of spontaneous and conversational expression and containing non-deterministic text with word recognition precision of less than 50 percent, utilizing non-textual differences between n-word sequences, where n is a positive integer.
92. A data processing system as recited in claim 91, wherein said at least one server includes an indexing server producing at least one index of the n-word sequences utilizing the non-textual differences between the n-word sequences.
93. A data processing system as recited in claim 92, wherein said indexing server retrieves at least one of the documents utilizing data mining of the at least one index.
94. A data processing system as recited in claim 91,
wherein said at least one server extracts parameters from the documents, utilizing the non-textual differences between said n-word sequences, and establishes relations between the parameters extracted from the documents, and
wherein said data processing system further comprises at least one display device producing graphic results indicating the relations between the parameters extracted from the documents.
95. A data processing system as recited in claim 94, wherein at least one of the extracted parameters is an assessment of a hidden variable that cannot be fully determined from information existing in the at least one of the documents.
96. A data processing system as recited in claim 91, wherein the non-textual differences between the n-word sequences relate to recognition confidence of the n-word sequences.
97. A data processing system as recited in claim 91, further comprising at least one display device displaying information related to at least one of the documents, including at least some of the non-textual differences between the n-word sequences.
98. A data processing system as recited in claim 97, wherein said at least one display device selectively displays portions of at least one of the documents based on confidence of accuracy of words displayed.
99. A data processing system as recited in claim 91, wherein said at least one server applies different data mining techniques, each of which does not indicate non-textual differences and merges results of the different data mining techniques to obtain the non-textual differences between the n-word sequences.
100. A data processing system as recited in claim 99, wherein said at least one server uses a plurality of different algorithms to transform non-deterministic text into standard text documents usable in text mining and the data mining techniques operate on the standard text documents.
101. A data processing system as recited in claim 100, wherein said at least one server generates a plurality of indexes of the standard text documents and uses the different indexes in applying the different data mining techniques.
102. A data processing system as recited in claim 99, wherein the different data mining techniques include at least one of retrieving, categorizing, filtering, classifying, labeling and clustering documents without utilization of any non-textual differences between the n-word sequences.
103. A data processing system as recited in claim 102, wherein said at least one server uses a plurality of different algorithms to transform non-deterministic text into standard text documents usable in text mining and the data mining techniques operate on the standard text documents.
104. A data processing system as recited in claim 91,
further comprising at least one user terminal providing user input indicating errors in recognition in a document, and
wherein said at least one server replaces at least one word in the document with a corrected word based on the user input and sets confidence levels of the corrected word to indicate high recognition accuracy.
105. A data processing system as recited in claim 91, further comprising at least one of an automatic speech recognition unit, an automatic character recognition unit and a fact extraction unit to generate the documents from data that on average produces word recognition precision of less than 50 percent.
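
Claims 27 to 33 describe applying confidence-blind data mining to several thresholded versions of the text and merging the results. A minimal Python sketch of that idea follows, under stated assumptions: the threshold values, the function names, and the merged score (the fraction of thresholds at which a document still matches) are hypothetical and are not specified by the claims.

```python
def threshold_transcript(tokens, threshold):
    """Produce standard text by omitting words below the confidence threshold."""
    return [word for word, confidence in tokens if confidence >= threshold]


def keyword_search(texts, term):
    """Confidence-blind retrieval: ids of documents whose text contains term."""
    return {doc_id for doc_id, words in texts.items() if term in words}


def merged_search(documents, term, thresholds=(0.25, 0.5, 0.75)):
    """Run the plain search once per threshold and merge the hit sets.

    The merged score is the fraction of thresholds at which the document
    still matches, so it can take len(thresholds) + 1 distinct values.
    """
    hits = {doc_id: 0 for doc_id in documents}
    for threshold in thresholds:
        texts = {
            doc_id: threshold_transcript(tokens, threshold)
            for doc_id, tokens in documents.items()
        }
        for doc_id in keyword_search(texts, term):
            hits[doc_id] += 1
    return {doc_id: count / len(thresholds) for doc_id, count in hits.items()}


# Illustrative data only; the confidences are made up.
documents = {
    "call-1": [("refund", 0.9), ("please", 0.4)],
    "call-2": [("refund", 0.6), ("later", 0.8)],
    "call-3": [("refund", 0.3)],
    "call-4": [("hello", 0.95)],
}
scores = merged_search(documents, "refund")
```

With three thresholds the merged score can take four values (0, 1/3, 2/3, 1), so the merge recovers a graded result even though each individual search only answers match or no match.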
US10/771,315 | 2003-02-05 | 2004-02-05 | Method for automatic and semi-automatic classification and clustering of non-deterministic texts | Abandoned | US20040163035A1 (en)

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
US10/771,315 | US20040163035A1 (en) | 2003-02-05 | 2004-02-05 | Method for automatic and semi-automatic classification and clustering of non-deterministic texts

Applications Claiming Priority (2)

Application Number | Publication | Priority Date | Filing Date | Title
US44498203P | - | 2003-02-05 | 2003-02-05 | -
US10/771,315 | US20040163035A1 (en) | 2003-02-05 | 2004-02-05 | Method for automatic and semi-automatic classification and clustering of non-deterministic texts

Publications (1)

Publication Number | Publication Date
US20040163035A1 (en) | 2004-08-19

Family

ID=32869303

Family Applications (4)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US10/771,409 | Active (2027-03-11) | US7792671B2 (en) | 2003-02-05 | 2004-02-05 | Augmentation and calibration of output from non-deterministic text generators by modeling its characteristics in specific environments
US10/771,315 | Abandoned | US20040163035A1 (en) | 2003-02-05 | 2004-02-05 | Method for automatic and semi-automatic classification and clustering of non-deterministic texts
US12/059,660 | Abandoned | US20080183468A1 (en) | 2003-02-05 | 2008-03-31 | Augmentation and calibration of output from non-deterministic text generators by modeling its characteristics in specific environments
US12/876,207 | Expired - Lifetime | US8195459B1 (en) | 2003-02-05 | 2010-09-06 | Augmentation and calibration of output from non-deterministic text generators by modeling its characteristics in specific environments

Family Applications Before (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US10/771,409 | Active (2027-03-11) | US7792671B2 (en) | 2003-02-05 | 2004-02-05 | Augmentation and calibration of output from non-deterministic text generators by modeling its characteristics in specific environments

Family Applications After (2)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US12/059,660 | Abandoned | US20080183468A1 (en) | 2003-02-05 | 2008-03-31 | Augmentation and calibration of output from non-deterministic text generators by modeling its characteristics in specific environments
US12/876,207 | Expired - Lifetime | US8195459B1 (en) | 2003-02-05 | 2010-09-06 | Augmentation and calibration of output from non-deterministic text generators by modeling its characteristics in specific environments

Country Status (4)

Country | Link
US (4) | US7792671B2 (en)
EP (2) | EP1590798A2 (en)
IL (1) | IL170065A (en)
WO (2) | WO2004072780A2 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20060067578A1 (en) * | 2004-09-30 | 2006-03-30 | Fuji Xerox Co., Ltd. | Slide contents processor, slide contents processing method, and storage medium storing program
US20070050445A1 (en) * | 2005-08-31 | 2007-03-01 | Hugh Hyndman | Internet content analysis
US20080027888A1 (en) * | 2006-07-31 | 2008-01-31 | Microsoft Corporation | Optimization of fact extraction using a multi-stage approach
US20090012970A1 (en) * | 2007-07-02 | 2009-01-08 | Dror Daniel Ziv | Root cause analysis using interactive data categorization
US20090249253A1 (en) * | 2008-03-31 | 2009-10-01 | Palm, Inc. | Displaying mnemonic abbreviations for commands
US20090248647A1 (en) * | 2008-03-25 | 2009-10-01 | Omer Ziv | System and method for the quality assessment of queries
US20110072052A1 (en) * | 2008-05-28 | 2011-03-24 | Aptima Inc. | Systems and methods for analyzing entity profiles
US8725732B1 (en) * | 2009-03-13 | 2014-05-13 | Google Inc. | Classifying text into hierarchical categories
US20210312123A1 (en) * | 2020-04-03 | 2021-10-07 | Jon Ward | Systems and Methods For Cloud-Based Productivity Tools
US11244011B2 (en) * | 2015-10-23 | 2022-02-08 | International Business Machines Corporation | Ingestion planning for complex tables
US20230028717A1 (en) * | 2020-08-27 | 2023-01-26 | Capital One Services, Llc | Representing Confidence in Natural Language Processing

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
EP1590798A2 (en)* | 2003-02-05 | 2005-11-02 | Verint Systems Inc. | Method for automatic and semi-automatic classification and clustering of non-deterministic texts
US7856355B2 (en)* | 2005-07-05 | 2010-12-21 | Alcatel-Lucent Usa Inc. | Speech quality assessment method and system
US20070078806A1 (en)* | 2005-10-05 | 2007-04-05 | Hinickle Judith A | Method and apparatus for evaluating the accuracy of transcribed documents and other documents
US20100027768A1 (en)* | 2006-11-03 | 2010-02-04 | Foskett James J | Aviation text and voice communication system
US8126891B2 (en)* | 2008-10-21 | 2012-02-28 | Microsoft Corporation | Future data event prediction using a generative model
US8379801B2 (en)* | 2009-11-24 | 2013-02-19 | Sorenson Communications, Inc. | Methods and systems related to text caption error correction
US9070360B2 (en)* | 2009-12-10 | 2015-06-30 | Microsoft Technology Licensing, Llc | Confidence calibration in automatic speech recognition systems
US8930189B2 (en) | 2011-10-28 | 2015-01-06 | Microsoft Corporation | Distributed user input to text generated by a speech to text transcription service
US9870520B1 (en)* | 2013-08-02 | 2018-01-16 | Intuit Inc. | Iterative process for optimizing optical character recognition
FR3010809B1 (en)* | 2013-09-18 | 2017-05-19 | Airbus Operations Sas | METHOD AND DEVICE FOR AUTOMATIC MANAGEMENT ON BOARD AN AIRCRAFT AUDIO MESSAGE AIRCRAFT.
US11481087B2 (en)* | 2014-03-27 | 2022-10-25 | Sony Corporation | Electronic device and method for identifying input commands of a user
US9858923B2 (en)* | 2015-09-24 | 2018-01-02 | Intel Corporation | Dynamic adaptation of language models and semantic tracking for automatic speech recognition
CN108777141B (en)* | 2018-05-31 | 2022-01-25 | 康键信息技术(深圳)有限公司 | Test apparatus, test method, and storage medium
CN110110303A (en)* | 2019-03-28 | 2019-08-09 | 苏州八叉树智能科技有限公司 | Newsletter archive generation method, device, electronic equipment and computer-readable medium
US12001206B2 (en) | 2020-01-16 | 2024-06-04 | Honeywell International Inc. | Methods and systems for remote operation of vehicles using hands-free functionality
CN111581455B (en)* | 2020-04-28 | 2023-03-21 | 北京字节跳动网络技术有限公司 | Text generation model generation method and device and electronic equipment
CN114637829B (en)* | 2022-02-21 | 2024-09-24 | 阿里巴巴(中国)有限公司 | Recorded text processing method, device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5625748A (en)* | 1994-04-18 | 1997-04-29 | Bbn Corporation | Topic discriminator using posterior probability or confidence scores
US6397181B1 (en)* | 1999-01-27 | 2002-05-28 | Kent Ridge Digital Labs | Method and apparatus for voice annotation and retrieval of multimedia data
US20020178002A1 (en)* | 2001-05-24 | 2002-11-28 | International Business Machines Corporation | System and method for searching, analyzing and displaying text transcripts of speech after imperfect speech recognition
US6598054B2 (en)* | 1999-01-26 | 2003-07-22 | Xerox Corporation | System and method for clustering data objects in a collection
US20040083101A1 (en)* | 2002-10-23 | 2004-04-29 | International Business Machines Corporation | System and method for data mining of contextual conversations

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5550930A (en)* | 1991-06-17 | 1996-08-27 | Microsoft Corporation | Method and system for training a handwriting recognizer at the time of misrecognition
GB9709341D0 (en)* | 1997-05-08 | 1997-06-25 | British Broadcasting Corp | Method of and apparatus for editing audio or audio-visual recordings
AU2001245927A1 (en)* | 2000-03-24 | 2001-10-08 | Dragon Systems, Inc. | Lexical analysis of telephone conversations with call center agents
US6839667B2 (en)* | 2001-05-16 | 2005-01-04 | International Business Machines Corporation | Method of speech recognition by presenting N-best word candidates
US6963834B2 (en) | 2001-05-29 | 2005-11-08 | International Business Machines Corporation | Method of speech recognition using empirically determined word candidates
EP1590798A2 (en)* | 2003-02-05 | 2005-11-02 | Verint Systems Inc. | Method for automatic and semi-automatic classification and clustering of non-deterministic texts

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5625748A (en)* | 1994-04-18 | 1997-04-29 | Bbn Corporation | Topic discriminator using posterior probability or confidence scores
US6598054B2 (en)* | 1999-01-26 | 2003-07-22 | Xerox Corporation | System and method for clustering data objects in a collection
US6397181B1 (en)* | 1999-01-27 | 2002-05-28 | Kent Ridge Digital Labs | Method and apparatus for voice annotation and retrieval of multimedia data
US20020178002A1 (en)* | 2001-05-24 | 2002-11-28 | International Business Machines Corporation | System and method for searching, analyzing and displaying text transcripts of speech after imperfect speech recognition
US20040083101A1 (en)* | 2002-10-23 | 2004-04-29 | International Business Machines Corporation | System and method for data mining of contextual conversations

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US7698645B2 (en)* | 2004-09-30 | 2010-04-13 | Fuji Xerox Co., Ltd. | Presentation slide contents processor for categorizing presentation slides and method for processing and categorizing slide contents
US20060067578A1 (en)* | 2004-09-30 | 2006-03-30 | Fuji Xerox Co., Ltd. | Slide contents processor, slide contents processing method, and storage medium storing program
US20070050445A1 (en)* | 2005-08-31 | 2007-03-01 | Hugh Hyndman | Internet content analysis
US20080027888A1 (en)* | 2006-07-31 | 2008-01-31 | Microsoft Corporation | Optimization of fact extraction using a multi-stage approach
US7668791B2 (en) | 2006-07-31 | 2010-02-23 | Microsoft Corporation | Distinguishing facts from opinions using a multi-stage approach
US20090012970A1 (en)* | 2007-07-02 | 2009-01-08 | Dror Daniel Ziv | Root cause analysis using interactive data categorization
US9015194B2 (en)* | 2007-07-02 | 2015-04-21 | Verint Systems Inc. | Root cause analysis using interactive data categorization
US20090248647A1 (en)* | 2008-03-25 | 2009-10-01 | Omer Ziv | System and method for the quality assessment of queries
US20090249253A1 (en)* | 2008-03-31 | 2009-10-01 | Palm, Inc. | Displaying mnemonic abbreviations for commands
US9053088B2 (en)* | 2008-03-31 | 2015-06-09 | Qualcomm Incorporated | Displaying mnemonic abbreviations for commands
US12216687B2 (en) | 2008-05-28 | 2025-02-04 | Aptima, Inc. | Systems and methods for analyzing entity profiles
US9123022B2 (en) | 2008-05-28 | 2015-09-01 | Aptima, Inc. | Systems and methods for analyzing entity profiles
US9594825B2 (en) | 2008-05-28 | 2017-03-14 | Aptima, Inc. | Systems and methods for analyzing entity profiles
US20110072052A1 (en)* | 2008-05-28 | 2011-03-24 | Aptima Inc. | Systems and methods for analyzing entity profiles
US11461373B2 (en) | 2008-05-28 | 2022-10-04 | Aptima, Inc. | Systems and methods for analyzing entity profiles
US8725732B1 (en)* | 2009-03-13 | 2014-05-13 | Google Inc. | Classifying text into hierarchical categories
US11244011B2 (en)* | 2015-10-23 | 2022-02-08 | International Business Machines Corporation | Ingestion planning for complex tables
US20210312123A1 (en)* | 2020-04-03 | 2021-10-07 | Jon Ward | Systems and Methods For Cloud-Based Productivity Tools
US11687710B2 (en)* | 2020-04-03 | 2023-06-27 | Braincat, Inc. | Systems and methods for cloud-based productivity tools
US11720753B2 (en)* | 2020-08-27 | 2023-08-08 | Capital One Services, Llc | Representing confidence in natural language processing
US20230028717A1 (en)* | 2020-08-27 | 2023-01-26 | Capital One Services, Llc | Representing Confidence in Natural Language Processing

Also Published As

Publication number | Publication date
EP1590796A1 (en) | 2005-11-02
US20080183468A1 (en) | 2008-07-31
WO2004072955A1 (en) | 2004-08-26
IL170065A (en) | 2013-02-28
WO2004072780A3 (en) | 2004-11-11
WO2004072780A2 (en) | 2004-08-26
US7792671B2 (en) | 2010-09-07
US20040158469A1 (en) | 2004-08-12
US8195459B1 (en) | 2012-06-05
EP1590798A2 (en) | 2005-11-02

Similar Documents

Publication | Publication Date | Title
US20040163035A1 (en) | | Method for automatic and semi-automatic classification and clustering of non-deterministic texts
US10431214B2 (en) | | System and method of determining a domain and/or an action related to a natural language input
US7415409B2 (en) | | Method to train the language model of a speech recognition system to convert and index voicemails on a search engine
US11182435B2 (en) | | Model generation device, text search device, model generation method, text search method, data structure, and program
CN108197282B (en) | | File data classification method and device, terminal, server and storage medium
US7272558B1 (en) | | Speech recognition training method for audio and video file indexing on a search engine
CN101533401B (en) | | Voice data retrieval system and voice data retrieval method
US9229974B1 (en) | | Classifying queries
US8126897B2 (en) | | Unified inverted index for video passage retrieval
US20230214579A1 (en) | | Intelligent character correction and search in documents
US20100070263A1 (en) | | Speech data retrieving web site system
CN107748784B (en) | | Method for realizing structured data search through natural language
US20040249808A1 (en) | | Query expansion using query logs
CN109446376B (en) | | A method and system for classifying speech by word segmentation
JP2013521567A (en) | | System including client computing device, method of tagging media objects, and method of searching a digital database including audio tagged media objects
CN111881283B (en) | | Business keyword library creation method, intelligent chat guiding method and device
CN109508441B (en) | | Method and device for realizing data statistical analysis through natural language and electronic equipment
US20220058213A1 (en) | | Systems and methods for identifying dynamic types in voice queries
CN108710653A (en) | | One kind, which is painted, originally reads aloud order method, apparatus and system
CN113177061B (en) | | Searching method and device and electronic equipment
CN119128120B (en) | | Consultation retrieval method and system based on demand label configuration
JP3921837B2 (en) | | Information discrimination support device, recording medium storing information discrimination support program, and information discrimination support method
WO2006118360A1 (en) | | Issue trend analysis system
CN111090977A (en) | | Intelligent writing system and intelligent writing method
CN113722447B (en) | | Voice search method based on multi-strategy matching

Legal Events

Date | Code | Title | Description

AS | Assignment
Owner name: VERINT SYSTEMS INC., NEW YORK
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ARIEL, ASSAF; BRAND, MICHAEL; HOROWITZ, ITSIK; AND OTHERS; REEL/FRAME: 014967/0368; SIGNING DATES FROM 20040129 TO 20040202

AS | Assignment
Owner name: LEHMAN COMMERCIAL PAPER INC., AS ADMINISTRATIVE AG
Free format text: SECURITY AGREEMENT; ASSIGNOR: VERINT SYSTEMS INC.; REEL/FRAME: 019588/0613
Effective date: 20070525

AS | Assignment
Owner name: CREDIT SUISSE AS ADMINISTRATIVE AGENT, NEW YORK
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: VERINT SYSTEMS INC.; LEHMAN COMMERCIAL PAPER INC.; REEL/FRAME: 022793/0888
Effective date: 20090604

STCB | Information on status: application discontinuation
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS | Assignment
Owner name: VERINT AMERICAS INC., NEW YORK
Free format text: RELEASE OF SECURITY INTEREST; ASSIGNOR: CREDIT SUISSE AG; REEL/FRAME: 026206/0340
Effective date: 20110429

Owner name: VERINT SYSTEMS INC., NEW YORK
Free format text: RELEASE OF SECURITY INTEREST; ASSIGNOR: CREDIT SUISSE AG; REEL/FRAME: 026206/0340
Effective date: 20110429

Owner name: VERINT VIDEO SOLUTIONS INC., NEW YORK
Free format text: RELEASE OF SECURITY INTEREST; ASSIGNOR: CREDIT SUISSE AG; REEL/FRAME: 026206/0340
Effective date: 20110429
