US20140207786A1 - System and methods for computerized information governance of electronic documents - Google Patents

System and methods for computerized information governance of electronic documents

Info

Publication number
US20140207786A1
Authority
US
United States
Prior art keywords
documents
document
retention
classifier
classifiers
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/062,233
Inventor
Liad Tal-Rothschild
Yiftach Ravid
Amir Milo
Warwick SHARP
Theresa BEAUMONT
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Israel Research and Development 2002 Ltd
Original Assignee
Equivio Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-01-22
Filing date
2013-10-24
Publication date
2014-07-24
Application filed by Equivio Ltd
Priority to US14/062,233
Assigned to EQUIVIO LTD.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MILO, AMIR; RAVID, YIFTACH; SHARP, WARWICK; TAL-ROTHSCHILD, LIAD
Assigned to EQUIVIO LTD.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignor: BEAUMONT, THERESA
Publication of US20140207786A1
Assigned to MICROSOFT ISRAEL RESEARCH AND DEVELOPMENT (2002) LTD: MERGER (SEE DOCUMENT FOR DETAILS). Assignor: EQUIVIO LTD
Legal status: Abandoned (current)


Abstract

An information governance system comprising a plurality of classifiers which employ cutoffs for classifying at least a portion of a population of incoming documents as documents to be retained and documents to be discarded in accordance with a corresponding plurality of pre-defined retention schedules; training apparatus for training said classifiers based on relevance inputs provided by a human information governance expert regarding a training set of documents within a universe of documents to be governed; and apparatus operative to automatically cause any classified document to be retained and subsequently discarded in accordance with its pre-defined retention schedule including discarding only documents that (a) have been classified as documents to be discarded and (b) have not been classified as documents to be retained, and to automatically cause any document which could not be classified to be retained as gray area data until further notice.
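The retain/discard rule described in the abstract can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `RetentionClassifier` and `Decision` types, the single score cutoff per classifier, the toy keyword scorer, and the choice to keep a document that matches several retention categories for the longest applicable period are all hypothetical assumptions. It shows the three outcomes: retain (then discard when the schedule ends) if any retention category matches, discard only if the document matched discard categories and no retention category, and keep as gray area data if no classifier claims the document.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional


@dataclass
class RetentionClassifier:
    """Hypothetical: one classifier per retention category."""
    category: str
    score: Callable[[str], float]  # relevance score for a document's text
    cutoff: float                  # a score >= cutoff puts the document in this category
    retain_days: Optional[int]     # None marks a "discard" category


@dataclass
class Decision:
    action: str                    # "retain", "discard" or "gray"
    discard_on: Optional[date] = None


def decide(text: str, classifiers: List[RetentionClassifier], today: date) -> Decision:
    matched = [c for c in classifiers if c.score(text) >= c.cutoff]
    retain_hits = [c for c in matched if c.retain_days is not None]
    if retain_hits:
        # Any retention match wins over discard; keep for the longest
        # applicable schedule (an assumption), then discard.
        longest = max(c.retain_days for c in retain_hits)
        return Decision("retain", today + timedelta(days=longest))
    if matched:
        # Classified only into discard categories: safe to discard.
        return Decision("discard")
    # No classifier claimed the document: gray area, kept until further notice.
    return Decision("gray")


def keyword_score(*words: str) -> Callable[[str], float]:
    """Toy scorer: fraction of the keywords present in the text."""
    return lambda text: sum(w in text.lower() for w in words) / len(words)


classifiers = [
    RetentionClassifier("contracts", keyword_score("agreement", "party"), 0.5, 7 * 365),
    RetentionClassifier("newsletters", keyword_score("newsletter", "unsubscribe"), 0.5, None),
]
today = date(2014, 1, 1)
for doc in ["This agreement binds each party.",
            "Weekly newsletter: click to unsubscribe.",
            "Lunch on Friday?"]:
    print(doc, "->", decide(doc, classifiers, today))
```

Letting a retention match override a discard match mirrors condition (b) above: a document is discarded only when no classifier has marked it for retention.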


Claims (19)

1. An information governance system comprising:
a plurality of classifiers which employ cutoffs for classifying at least a portion of a population of incoming documents as documents to be retained and documents to be discarded in accordance with a retention policy comprising a corresponding plurality of pre-defined retention schedules;
training apparatus for training said classifiers based on relevance inputs provided by a human information governance expert regarding a training set of documents within a universe of documents to be governed; and
retain/discard apparatus operative to automatically cause any classified document to be retained and subsequently discarded in accordance with its pre-defined retention schedule including discarding only documents that (a) have been classified as documents to be discarded and (b) have not been classified as documents to be retained, and to automatically cause any document which could not be classified to be retained as gray area data until further notice.
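Claim 1 says the classifiers employ cutoffs and are trained from a human expert's relevance inputs, but does not say how a cutoff is chosen. One common approach, assumed here rather than taken from the patent, is to pick the lowest classifier score at which precision on the expert-labeled training set still meets a target. The function name `choose_cutoff`, the toy data, and the precision target are hypothetical.

```python
from typing import List, Optional


def choose_cutoff(scores: List[float], labels: List[bool],
                  target_precision: float) -> Optional[float]:
    """Lowest score cutoff whose precision on the expert-labeled set
    meets target_precision, or None if no cutoff does."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best: Optional[float] = None
    relevant = 0
    for k, (score, is_relevant) in enumerate(ranked, start=1):
        relevant += int(is_relevant)
        # Only evaluate cutoffs between distinct scores, so documents with
        # tied scores are never split across the cutoff.
        if k < len(ranked) and ranked[k][0] == score:
            continue
        # Precision of "score >= this cutoff" among the expert-labeled documents.
        if relevant / k >= target_precision:
            best = score  # scores descend, so later hits are lower cutoffs
    return best


# Expert relevance inputs for five training documents (toy data).
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [True, True, False, True, False]
print(choose_cutoff(scores, labels, 0.75))  # prints 0.4
```

Taking the lowest qualifying cutoff keeps as many documents in the category as possible while holding precision at the target; a deployment could just as well fix a recall target instead.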
8. An information governance method comprising:
generating a plurality of classifiers for classifying electronic documents into a corresponding plurality of documentation retention categories;
running training iterations thereby to improve at least one of the plurality of classifiers;
classifying a repository of electronic documents using said plurality of classifiers and running a logarithmic stratified sampling-based Quality Assurance process to compute precision in cases of low or unknown richness, including ordering documents by their ranks, then partitioning the ranks into slices [0, p], [p, 2p], [2p, 4p], ..., and randomly selecting documents to represent each slice, thereby to generate Quality Assurance results;
if the Quality Assurance results are not deemed good enough, improving the classifier and returning to one of said running steps;
if the Quality Assurance results are good enough, using the last classifier to implement a plurality of document retention settings corresponding to said plurality of documentation retention categories.
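The logarithmic stratified sampling step in claim 8 can be illustrated with a short sketch. It assumes 0-based ranks with rank 0 as the highest-scoring document, uses half-open slices [0, p), [p, 2p), [2p, 4p), ... so that no boundary rank is counted twice, and stands a `judge` callback in for the human reviewer. The names `log_slices`, `estimate_precision_curve`, `per_slice` and `seed` are hypothetical. Each slice's sampled hit rate is scaled up to the slice size, and precision at each slice boundary is the running estimate of relevant documents divided by the number of documents above that boundary.

```python
import random
from typing import Callable, List, Sequence, Tuple


def log_slices(n: int, p: int) -> List[Tuple[int, int]]:
    """Half-open rank slices [0, p), [p, 2p), [2p, 4p), ... covering ranks 0..n-1."""
    if p <= 0:
        raise ValueError("p must be positive")
    slices: List[Tuple[int, int]] = []
    start, end = 0, p
    while start < n:
        slices.append((start, min(end, n)))
        start, end = end, end * 2
    return slices


def estimate_precision_curve(ranked_docs: Sequence, p: int, per_slice: int,
                             judge: Callable[[object], bool],
                             seed: int = 0) -> List[Tuple[int, float]]:
    """Estimated precision of the top `end` documents at every slice boundary."""
    if per_slice <= 0:
        raise ValueError("per_slice must be positive")
    rng = random.Random(seed)
    curve: List[Tuple[int, float]] = []
    est_relevant = 0.0
    for start, end in log_slices(len(ranked_docs), p):
        size = end - start
        # Randomly pick documents to represent this slice.
        sample = rng.sample(range(start, end), min(per_slice, size))
        hits = sum(1 for i in sample if judge(ranked_docs[i]))
        # Scale the sample's hit rate up to the whole slice.
        est_relevant += size * hits / len(sample)
        curve.append((end, est_relevant / end))
    return curve


# Toy low-richness collection: 1,000 ranked documents, only the top 50 relevant.
docs = [{"rank": i, "relevant": i < 50} for i in range(1000)]
for cutoff, precision in estimate_precision_curve(docs, p=10, per_slice=20,
                                                  judge=lambda d: d["relevant"]):
    print(f"top {cutoff:4d}: estimated precision {precision:.3f}")
```

Because slice widths double, the number of slices (and so the review effort) grows roughly with log2(N/p) rather than with N, while the narrow top slices, where relevant documents tend to sit when richness is low, are sampled densely.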
19. A computer program product, comprising a non-transitory tangible computer readable medium having computer readable program code embodied therein, said computer readable program code adapted to be executed to implement an information governance method comprising:
generating a plurality of classifiers for classifying electronic documents into a corresponding plurality of documentation retention categories;
running training iterations thereby to improve at least one of the plurality of classifiers;
classifying a repository of electronic documents using said plurality of classifiers and running a logarithmic stratified sampling-based Quality Assurance process to compute precision in cases of low or unknown richness, including ordering documents by their ranks, then partitioning the ranks into slices [0, p], [p, 2p], [2p, 4p], ..., and randomly selecting documents to represent each slice, thereby to generate Quality Assurance results;
if the Quality Assurance results are not deemed good enough, improving the classifier and returning to one of said running steps;
if the Quality Assurance results are good enough, using the last classifier to implement a plurality of document retention settings corresponding to said plurality of documentation retention categories.
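The train/QA/deploy loop shared by claims 8 and 19 can be sketched as a per-category control loop. The callbacks `train_round` and `qa_precision`, the `target` threshold, the `max_rounds` cap, and the representation of a retention setting as a number of days are hypothetical; in practice `qa_precision` could be a sampling-based precision estimate checked by an expert reviewer.

```python
from typing import Callable, Dict, Optional, Tuple, TypeVar

Model = TypeVar("Model")


def train_until_qa_passes(categories: Dict[str, int],
                          train_round: Callable[[str, Optional[Model]], Model],
                          qa_precision: Callable[[str, Model], float],
                          target: float,
                          max_rounds: int = 10) -> Dict[str, Tuple[Model, int]]:
    """Return {category: (last classifier, retention days)} once every
    category's classifier passes QA; raise if one never does."""
    deployed: Dict[str, Tuple[Model, int]] = {}
    for category, retention_days in categories.items():
        model: Optional[Model] = None
        for _ in range(max_rounds):
            model = train_round(category, model)         # one more training iteration
            if qa_precision(category, model) >= target:  # QA good enough?
                break
        else:
            # QA never passed: do not deploy a retention setting for this category.
            raise RuntimeError(f"{category}: QA target {target} not reached")
        deployed[category] = (model, retention_days)     # use the last classifier
    return deployed


# Toy stand-ins: a "model" is just the number of training rounds so far,
# and QA precision improves by 0.25 per round.
settings = train_until_qa_passes(
    {"contracts": 7 * 365, "hr_records": 10 * 365},
    train_round=lambda cat, m: (m or 0) + 1,
    qa_precision=lambda cat, m: m / 4,
    target=0.75,
)
print(settings)
```

The `for ... else` raises when no round reaches the target, so a classifier that failed QA is never used to drive a retention setting.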

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
US14/062,233 | US20140207786A1 (en) | 2013-01-22 | 2013-10-24 | System and methods for computerized information governance of electronic documents

Applications Claiming Priority (2)

Application Number | Publication | Priority Date | Filing Date | Title
US201361755242P | - | 2013-01-22 | 2013-01-22 | -
US14/062,233 | US20140207786A1 (en) | 2013-01-22 | 2013-10-24 | System and methods for computerized information governance of electronic documents

Publications (1)

Publication Number | Publication Date
US20140207786A1 (en) | 2014-07-24

Family

ID=51208555

Family Applications (3)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US14/062,233 | Abandoned | US20140207786A1 (en) | 2013-01-22 | 2013-10-24 | System and methods for computerized information governance of electronic documents
US14/161,159 | Abandoned | US20140207782A1 (en) | 2013-01-22 | 2014-01-22 | System and method for computerized semantic processing of electronic documents including themes
US14/161,221 | Active (2034-05-03) | US10002182B2 (en) | 2013-01-22 | 2014-01-22 | System and method for computerized identification and effective presentation of semantic themes occurring in a set of electronic documents

Family Applications After (2)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US14/161,159 | Abandoned | US20140207782A1 (en) | 2013-01-22 | 2014-01-22 | System and method for computerized semantic processing of electronic documents including themes
US14/161,221 | Active (2034-05-03) | US10002182B2 (en) | 2013-01-22 | 2014-01-22 | System and method for computerized identification and effective presentation of semantic themes occurring in a set of electronic documents

Country Status (1)

Country | Link
US (3) | US20140207786A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10002182B2 (en) | 2013-01-22 | 2018-06-19 | Microsoft Israel Research And Development (2002) Ltd | System and method for computerized identification and effective presentation of semantic themes occurring in a set of electronic documents
JP2020113035A (en)* | 2019-01-11 | 2020-07-27 | Kabushiki Kaisha Toshiba | Classification support system, classification support device, learning device, classification support method, and program
US10902066B2 (en)* | 2018-07-23 | 2021-01-26 | Open Text Holdings, Inc. | Electronic discovery using predictive filtering
US11023828B2 (en) | 2010-05-25 | 2021-06-01 | Open Text Holdings, Inc. | Systems and methods for predictive coding
US20230418980A1 (en)* | 2022-06-28 | 2023-12-28 | Cisco Technology, Inc. | Intent-based enterprise data management for simplified data governance

Families Citing this family (17)

Publication number | Priority date | Publication date | Assignee | Title
CN104731828B (en)* | 2013-12-24 | 2017-12-05 | Huawei Technologies Co., Ltd. | A kind of cross-cutting Documents Similarity computational methods and device
CN105893611B (en)* | 2016-04-27 | 2020-04-07 | Nanjing University of Posts and Telecommunications | Method for constructing interest topic semantic network facing social network
US11222266B2 (en) | 2016-07-15 | 2022-01-11 | Intuit Inc. | System and method for automatic learning of functions
US10579721B2 (en) | 2016-07-15 | 2020-03-03 | Intuit Inc. | Lean parsing: a natural language processing system and method for parsing domain-specific languages
US10275444B2 (en) | 2016-07-15 | 2019-04-30 | At&T Intellectual Property I, L.P. | Data analytics system and methods for text data
US11049190B2 (en) | 2016-07-15 | 2021-06-29 | Intuit Inc. | System and method for automatically generating calculations for fields in compliance forms
US20180053120A1 (en)* | 2016-07-15 | 2018-02-22 | Intuit Inc. | System and method for identifying a subset of total historical users of a document preparation system to represent a full set of test scenarios based on statistical analysis
US10725896B2 (en) | 2016-07-15 | 2020-07-28 | Intuit Inc. | System and method for identifying a subset of total historical users of a document preparation system to represent a full set of test scenarios based on code coverage
US10666792B1 (en)* | 2016-07-22 | 2020-05-26 | Pindrop Security, Inc. | Apparatus and method for detecting new calls from a known robocaller and identifying relationships among telephone calls
US10255283B1 (en)* | 2016-09-19 | 2019-04-09 | Amazon Technologies, Inc. | Document content analysis based on topic modeling
US10558657B1 (en) | 2016-09-19 | 2020-02-11 | Amazon Technologies, Inc. | Document content analysis based on topic modeling
US10922621B2 (en)* | 2016-11-11 | 2021-02-16 | International Business Machines Corporation | Facilitating mapping of control policies to regulatory documents
JP6946081B2 (en)* | 2016-12-22 | 2021-10-06 | Canon Inc. | Information processing equipment, information processing methods, programs
CA3080551C (en)* | 2017-10-27 | 2022-10-11 | Intuit Inc. | System and method for identifying a subset of total historical users of a document preparation system to represent a full set of test scenarios based on statistical analysis
US11068932B2 (en)* | 2017-12-12 | 2021-07-20 | Wal-Mart Stores, Inc. | Systems and methods for processing or mining visitor interests from graphical user interfaces displaying referral websites
US11163956B1 (en) | 2019-05-23 | 2021-11-02 | Intuit Inc. | System and method for recognizing domain specific named entities using domain specific word embeddings
US11783128B2 (en) | 2020-02-19 | 2023-10-10 | Intuit Inc. | Financial document text conversion to computer readable operations

Citations (7)

Publication number | Priority date | Publication date | Assignee | Title
US20030195937A1 (en)* | 2002-04-16 | 2003-10-16 | Kontact Software Inc. | Intelligent message screening
US20040098389A1 (en)* | 2002-11-12 | 2004-05-20 | Jones Dumont M. | Document search method with interactively employed distance graphics display
US20050060643A1 (en)* | 2003-08-25 | 2005-03-17 | Miavia, Inc. | Document similarity detection and classification system
US20070043774A1 (en)* | 2001-06-27 | 2007-02-22 | Inxight Software, Inc. | Method and Apparatus for Incremental Computation of the Accuracy of a Categorization-by-Example System
US20100254615A1 (en)* | 2009-04-02 | 2010-10-07 | Check Point Software Technologies, Ltd. | Methods for document-to-template matching for data-leak prevention
US20100332428A1 (en)* | 2010-05-18 | 2010-12-30 | Integro Inc. | Electronic document classification
US20110040837A1 (en)* | 2009-08-14 | 2011-02-17 | Tal Eden | Methods and apparatus to classify text communications

Family Cites Families (33)

Publication number | Priority date | Publication date | Assignee | Title
US6208988B1 (en) | 1998-06-01 | 2001-03-27 | Bigchalk.Com, Inc. | Method for identifying themes associated with a search query using metadata and for organizing documents responsive to the search query in accordance with the themes
US7421395B1 (en)* | 2000-02-18 | 2008-09-02 | Microsoft Corporation | System and method for producing unique account names
US20020065857A1 (en)* | 2000-10-04 | 2002-05-30 | Zbigniew Michalewicz | System and method for analysis and clustering of documents for search engine
US20020111792A1 (en)* | 2001-01-02 | 2002-08-15 | Julius Cherny | Document storage, retrieval and search systems and methods
US7158961B1 (en) | 2001-12-31 | 2007-01-02 | Google, Inc. | Methods and apparatus for estimating similarity
DE10239321B3 (en) | 2002-08-27 | 2004-04-08 | Pari GmbH Spezialisten für effektive Inhalation | Aerosol therapy device
US7424427B2 (en)* | 2002-10-17 | 2008-09-09 | Verizon Corporate Services Group Inc. | Systems and methods for classifying audio into broad phoneme classes
AU2005264153B2 (en) | 2004-07-21 | 2012-04-05 | Microsoft Israel Research And Development (2002) Ltd | A method for determining near duplicate data objects
US9218623B2 (en)* | 2005-12-28 | 2015-12-22 | Palo Alto Research Center Incorporated | System and method for providing private stable matchings
US7765212B2 (en)* | 2005-12-29 | 2010-07-27 | Microsoft Corporation | Automatic organization of documents through email clustering
WO2007086059A2 (en) | 2006-01-25 | 2007-08-02 | Equivio Ltd. | Determining near duplicate 'noisy' data objects
US7587418B2 (en) | 2006-06-05 | 2009-09-08 | International Business Machines Corporation | System and method for effecting information governance
EP2039089B1 (en)* | 2006-06-29 | 2009-07-29 | International Business Machines Corporation | Method and system for low-redundancy e-mail handling
US7873640B2 (en)* | 2007-03-27 | 2011-01-18 | Adobe Systems Incorporated | Semantic analysis documents to rank terms
GB2450546A (en) | 2007-06-29 | 2008-12-31 | Philip Giokas | Metered dispensation of fluid
US9317593B2 (en)* | 2007-10-05 | 2016-04-19 | Fujitsu Limited | Modeling topics using statistical distributions
US20090198677A1 (en) | 2008-02-05 | 2009-08-06 | Nuix Pty.Ltd. | Document Comparison Method And Apparatus
WO2009102765A2 (en) | 2008-02-11 | 2009-08-20 | Nuix North America Inc. | Parallelization of electronic discovery document indexing
US8280886B2 (en)* | 2008-02-13 | 2012-10-02 | Fujitsu Limited | Determining candidate terms related to terms of a query
US8209665B2 (en)* | 2008-04-08 | 2012-06-26 | Infosys Limited | Identification of topics in source code
US7930306B2 (en) | 2008-04-30 | 2011-04-19 | Msc Intellectual Properties B.V. | System and method for near and exact de-duplication of documents
US8346685B1 (en) | 2009-04-22 | 2013-01-01 | Equivio Ltd. | Computerized system for enhancing expert-based processes and methods useful in conjunction therewith
US8392175B2 (en)* | 2010-02-01 | 2013-03-05 | Stratify, Inc. | Phrase-based document clustering with automatic phrase extraction
US8510257B2 (en)* | 2010-10-19 | 2013-08-13 | Xerox Corporation | Collapsed gibbs sampler for sparse topic models and discrete matrix factorization
US8316030B2 (en)* | 2010-11-05 | 2012-11-20 | Nextgen Datacom, Inc. | Method and system for document classification or search using discrete words
US20120215749A1 (en) | 2011-02-08 | 2012-08-23 | Pierre Van Beneden | System And Method For Managing Records Using Information Governance Policies
US9269053B2 (en) | 2011-04-28 | 2016-02-23 | Kroll Ontrack, Inc. | Electronic review of documents
WO2013037044A1 (en)* | 2011-09-12 | 2013-03-21 | Tian Lu | System and method for automatic segmentation and matching of customers to vendible items
AU2013234865B2 (en) | 2012-03-23 | 2018-07-26 | Bae Systems Australia Limited | System and method for identifying and visualising topics and themes in collections of documents
US9355170B2 (en)* | 2012-11-27 | 2016-05-31 | Hewlett Packard Enterprise Development Lp | Causal topic miner
US20140195518A1 (en) | 2013-01-04 | 2014-07-10 | Opera Solutions, Llc | System and Method for Data Mining Using Domain-Level Context
US20140207786A1 (en) | 2013-01-22 | 2014-07-24 | Equivio Ltd. | System and methods for computerized information governance of electronic documents
US20150113388A1 (en) | 2013-10-22 | 2015-04-23 | Qualcomm Incorporated | Method and apparatus for performing topic-relevance highlighting of electronic text


Cited By (8)

Publication number | Priority date | Publication date | Assignee | Title
US11023828B2 (en) | 2010-05-25 | 2021-06-01 | Open Text Holdings, Inc. | Systems and methods for predictive coding
US11282000B2 (en) | 2010-05-25 | 2022-03-22 | Open Text Holdings, Inc. | Systems and methods for predictive coding
US10002182B2 (en) | 2013-01-22 | 2018-06-19 | Microsoft Israel Research And Development (2002) Ltd | System and method for computerized identification and effective presentation of semantic themes occurring in a set of electronic documents
US10902066B2 (en)* | 2018-07-23 | 2021-01-26 | Open Text Holdings, Inc. | Electronic discovery using predictive filtering
US12299051B2 (en) | 2018-07-23 | 2025-05-13 | Open Text Holdings, Inc. | Systems and methods of predictive filtering using document field values
JP2020113035A (en)* | 2019-01-11 | 2020-07-27 | Kabushiki Kaisha Toshiba | Classification support system, classification support device, learning device, classification support method, and program
US20230418980A1 (en)* | 2022-06-28 | 2023-12-28 | Cisco Technology, Inc. | Intent-based enterprise data management for simplified data governance
US12361169B2 (en)* | 2022-06-28 | 2025-07-15 | Cisco Technology, Inc. | Intent-based enterprise data management for simplified data governance

Also Published As

Publication number | Publication date
US20140207783A1 (en) | 2014-07-24
US20140207782A1 (en) | 2014-07-24
US10002182B2 (en) | 2018-06-19

Similar Documents

Publication | Title
US20140207786A1 (en) | System and methods for computerized information governance of electronic documents
US8706742B1 (en) | System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
US9881080B2 (en) | System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
US10891321B2 (en) | Systems and methods for performing a computer-implemented prior art search
US12332954B2 (en) | Systems and methods for intelligent content filtering and persistence
US8533194B1 (en) | System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
US9268851B2 (en) | Ranking information content based on performance data of prior users of the information content
US20160034512A1 (en) | Context-based metadata generation and automatic annotation of electronic media in a computer network
CN109033200A (en) | Method, apparatus, equipment and the computer-readable medium of event extraction
US8832126B2 (en) | Custodian suggestion for efficient legal e-discovery
CN107992633A (en) | Electronic document automatic classification method and system based on keyword feature
EP4229499A1 (en) | Artificial intelligence driven document analysis, including searching, indexing, comparing or associating datasets based on learned representations
CN107463935A (en) | Application class methods and applications sorter
CN117648635B (en) | Sensitive information classification and classification method and system and electronic equipment
CN116127194A (en) | Enterprise recommendation method
CN118862036B (en) | An intelligent archive management system and method based on big data
CN113920366A (en) | A comprehensive weighted master data identification method based on machine learning
CN115982429B (en) | Knowledge management method and system based on flow control
CN109871429A (en) | A Short Text Retrieval Method Based on Wikipedia Classification and Explicit Semantic Features
CN107577690A (en) | The recommendation method and recommendation apparatus of magnanimity information data
Alshahrani et al. | Borsah: An arabic sentiment financial tweets corpus
CN119808794B (en) | A big data intelligent analysis method and system based on AI
CN118796954B (en) | Classified management method and device for government affair data
RU2843081C1 (en) | Method of clustering model pre-training, or training, or additional training
RU2842206C1 (en) | Computer device for pre-training, or training, or additional training of classification model and/or clustering model

Legal Events

Code | Title | Description

AS | Assignment | Owner name: EQUIVIO LTD., ISRAEL
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: TAL-ROTHSCHILD, LIAD; RAVID, YIFTACH; MILO, AMIR; AND OTHERS; REEL/FRAME: 031558/0818
Effective date: 2013-10-24

AS | Assignment | Owner name: EQUIVIO LTD., ISRAEL
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: BEAUMONT, THERESA; REEL/FRAME: 032560/0170
Effective date: 2014-03-08

AS | Assignment | Owner name: MICROSOFT ISRAEL RESEARCH AND DEVELOPMENT (2002) L
Free format text: MERGER; ASSIGNOR: EQUIVIO LTD; REEL/FRAME: 039491/0361
Effective date: 2016-02-21

STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

