EP3686756A1 - Method and apparatus for grouping data records - Google Patents


Info

Publication number
EP3686756A1
Authority
EP
European Patent Office
Prior art keywords
data records
textually
textual
similarity
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP19153803.2A
Other languages
German (de)
French (fr)
Inventor
Ahmed Fouad Saleh SALHIN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sage UK Ltd
Original Assignee
Sage UK Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sage UK Ltd
Priority to EP19153803.2A (EP3686756A1)
Priority to US16/752,341 (US11372896B2)
Publication of EP3686756A1
Legal status: Withdrawn (current)

Abstract

A system and computer-implemented method of grouping data records for subsequent data record searching. A level of textual similarity of data records in a group of data records is determined using matching and identifying a textual similarity metric between pairs of data records in the group of data records, and clustering the data records to form groups of textually similar data records. The groups of textually similar data records are provided to a computerised log of textually-matched data records. Further, temporally-repeating data records in the computerised log are identified. If all the data records in a group of temporally-repeating textually similar data records have a level of temporal and textual similarity above an overall similarity threshold, they are provided to a computerised log of temporally and textually-matched data records, for example for searching or future forecasting.
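
The pairwise similarity and clustering steps summarised above can be sketched as follows. This is a minimal illustration only, not the method disclosed in the application: the use of Python's `difflib.SequenceMatcher` as the textual similarity metric, the single-linkage grouping, the `0.8` threshold, and the names `textual_similarity` and `cluster_records` are all assumptions introduced for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def textual_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1].

    SequenceMatcher is an assumed stand-in; the source does not name a metric.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_records(texts: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Group records linked by a pairwise similarity at or above the threshold.

    Single-linkage clustering implemented with a small union-find structure.
    """
    parent = list(range(len(texts)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of records and merge the groups of similar pairs.
    for i, j in combinations(range(len(texts)), 2):
        if textual_similarity(texts[i], texts[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect records by the root of their group.
    clusters: dict[int, list[str]] = {}
    for i, text in enumerate(texts):
        clusters.setdefault(find(i), []).append(text)
    return list(clusters.values())
```

With hypothetical records such as `"ACME LTD INVOICE 1001"`, `"ACME LTD INVOICE 1002"` and `"COFFEE SHOP LONDON"`, the two invoice descriptions would fall into one group and the third record into a group of its own.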

Description

Claims (15)

  1. A system for grouping data records for subsequent data record searching, the system comprising:
    a computer-implemented data record textual match analysis module arranged to:
    determine a level of textual similarity of data records in a group of data records; and
    if the data records in the group of data records have a level of textual similarity above a textual similarity threshold, provide the data records to a computerised log of textually-matched data records; and
    if the data records in the group of data records have a level of textual similarity below the textual similarity threshold, provide the data records to a computer-implemented data record grouping module for further similarity analysis;
    a computer-implemented data record grouping module arranged to perform the further similarity analysis by being arranged to:
    identify a textual similarity metric between pairs of data records in the group of data records;
    record the textual similarity metric of the group of data records;
    cluster the data records of the group of data records using the recorded textual similarity metric to form one or more clusters of textually similar data records; and
    provide the one or more clusters of textually similar data records to the computerised log of textually-matched data records;
    a computer-implemented repeating data record identifier module arranged to identify temporally-repeating data records of the textually-matched data records recorded in the computerised log of textually-matched data records; and
    a computer-implemented quality control data record grouping module arranged to:
    analyse the temporally-repeating textually-matched data records in the group for textual and temporal similarity, and
    if all the data records in the group of temporally-repeating textually similar data records have a level of temporal and textual similarity above an overall similarity threshold, provide the temporally-repeating textually similar data records to a computerised log of temporally and textually-matched data records.
  2. The system of any preceding claim, wherein the repeating data record identifier module is arranged to identify temporally-repeating data records of the textually matched data records by:
    analysing the timestamp portions of the textually matched data records to determine a time separation between pairs of the textually matched data records which are temporally consecutive;
    determining if the textually matched data records comprise timestamp portions separated by regular time intervals;
    if the textually matched data records comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as recurrent temporally-repeating textually matched data records; and
    if the textually matched data records do not comprise timestamp portions separated by regular time intervals, categorising the textually matched data records as non-recurrent temporally-repeating textually matched data records.
  3. A computer-implemented method of grouping data records for subsequent data record searching, the method comprising:
    determining a level of textual similarity of data records in a group of data records;
    if the data records in the group of data records have a level of textual similarity above a textual similarity threshold, providing the data records to a computerised log of textually-matched data records; and
    if the data records in the group of data records have a level of textual similarity below the textual similarity threshold, providing the data records for further similarity analysis, the further similarity analysis comprising:
    identifying a textual similarity metric between pairs of data records in the group of data records;
    recording the textual similarity metric of the group of data records;
    clustering the data records of the group of data records using the recorded textual similarity metric to form one or more groups of textually similar data records; and
    providing the one or more groups of textually similar data records to the computerised log of textually-matched data records;
    identifying temporally-repeating data records of the textually-matched data records recorded in the computerised log of textually-matched data records; and
    analysing the temporally-repeating textually-matched data records in the group for textual and temporal similarity, and
    if all the data records in the group of temporally-repeating textually similar data records have a level of temporal and textual similarity above an overall similarity threshold, providing the temporally-repeating textually similar data records to a computerised log of temporally and textually-matched data records.
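
The timestamp analysis recited in claim 2 (time separation between temporally consecutive records, then a regular-interval check) might look like the sketch below. It is illustrative only: the claims do not say how "regular" is judged, so the median-based comparison, the `tolerance_days` parameter, the minimum of three records, and the function names are assumptions made for this example.

```python
from datetime import date
from statistics import median


def time_separations(dates: list[date]) -> list[int]:
    """Return the gaps, in days, between temporally consecutive records."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def categorise(dates: list[date], tolerance_days: int = 3) -> str:
    """Label a group of textually matched records as recurrent or non-recurrent.

    Assumption: intervals count as regular when every gap lies within
    tolerance_days of the median gap, and at least two gaps are available.
    """
    gaps = time_separations(dates)
    if len(gaps) < 2:
        return "non-recurrent"
    typical = median(gaps)
    regular = all(abs(gap - typical) <= tolerance_days for gap in gaps)
    return "recurrent" if regular else "non-recurrent"
```

A tolerance of a few days lets monthly records (gaps of 28 to 31 days) count as regular; a stricter or looser value would be a design choice for the data at hand.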
EP19153803.2A | 2019-01-25 | 2019-01-25 | Method and apparatus for grouping data records | Withdrawn | EP3686756A1 (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
EP19153803.2A / EP3686756A1 (en) | 2019-01-25 | 2019-01-25 | Method and apparatus for grouping data records
US16/752,341 / US11372896B2 (en) | 2019-01-25 | 2020-01-24 | Method and apparatus for grouping data records

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
EP19153803.2A / EP3686756A1 (en) | 2019-01-25 | 2019-01-25 | Method and apparatus for grouping data records

Publications (1)

Publication Number | Publication Date
EP3686756A1 (en) | 2020-07-29

Family

ID=65236908

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
EP19153803.2A (Withdrawn) / EP3686756A1 (en) | Method and apparatus for grouping data records | 2019-01-25 | 2019-01-25

Country Status (2)

Country | Link
US (1) | US11372896B2 (en)
EP (1) | EP3686756A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP2020154828A (en) * | 2019-03-20 | 2020-09-24 | Fujitsu Limited (富士通株式会社) | Data complement program, data complement method and data complement device
US20200380335A1 (en) * | 2019-05-30 | 2020-12-03 | AVAST Software s.r.o. | Anomaly detection in business intelligence time series
US11557288B2 (en) * | 2020-04-10 | 2023-01-17 | International Business Machines Corporation | Hindrance speech portion detection using time stamps
US11720601B2 (en) * | 2020-07-02 | 2023-08-08 | Sap Se | Active entity resolution model recommendation system
US11605390B2 (en) * | 2020-09-01 | 2023-03-14 | Malihe Eshghavi | Systems, methods, and apparatus for language acquisition using socio-neuorocognitive techniques
US12009107B2 (en) * | 2020-09-09 | 2024-06-11 | Optum, Inc. | Seasonally adjusted predictive data analysis
US11580119B2 (en) * | 2020-09-22 | 2023-02-14 | Cognism Limited | System and method for automatic persona generation using small text components
US11823666B2 (en) * | 2021-10-04 | 2023-11-21 | International Business Machines Corporation | Automatic measurement of semantic similarity of conversations
US11768860B2 (en) | 2021-11-03 | 2023-09-26 | International Business Machines Corporation | Bucketing records using temporal point processes
AU2022460169A1 (en) * | 2022-05-27 | 2024-12-19 | Xero Limited | Methods and systems for predicting cash flow
US12361415B2 (en) * | 2022-11-15 | 2025-07-15 | Discover Financial Services | Computing systems and methods for identifying and providing information about recurring transactions
CN116127078B (en) * | 2023-04-19 | 2023-07-21 | Jilin University (吉林大学) | A large-scale extremely weakly supervised multi-label policy classification method and system
US12321358B1 (en) | 2023-11-30 | 2025-06-03 | Truist Bank | Database management systems
US12430319B2 (en) | 2023-11-30 | 2025-09-30 | Truist Bank | Proactive database management systems
US20250181592A1 (en) * | 2023-11-30 | 2025-06-05 | Truist Bank | Database management systems

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20110179017A1 (en) * | 2010-01-20 | 2011-07-21 | Microsoft Corporation | Detecting spiking queries
EP2767911A1 (en) * | 2013-02-13 | 2014-08-20 | BAE Systems PLC | Data storage and retrieval
US20180113928A1 (en) * | 2016-10-21 | 2018-04-26 | International Business Machines Corporation | Multiple record linkage algorithm selector
US20180181895A1 (en) * | 2016-12-23 | 2018-06-28 | Yodlee, Inc. | Identifying Recurring Series From Transactional Data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10353756B2 (en) * | 2016-10-11 | 2019-07-16 | Oracle International Corporation | Cluster-based processing of unstructured log messages
US10726501B1 (en) * | 2017-04-25 | 2020-07-28 | Intuit Inc. | Method to use transaction, account, and company similarity clusters derived from the historic transaction data to match new transactions to accounts
US10489348B2 (en) * | 2017-07-17 | 2019-11-26 | Alteryx, Inc. | Performing hash joins using parallel processing
US10747785B2 (en) * | 2017-11-01 | 2020-08-18 | Mad Street Den, Inc. | Method and system for efficient clustering of combined numeric and qualitative data records

Also Published As

Publication number | Publication date
US11372896B2 (en) | 2022-06-28
US20200242134A1 (en) | 2020-07-30

Similar Documents

Publication | Title
US11372896B2 (en) | Method and apparatus for grouping data records
CN111445028B (en) | AI-driven transaction management system
US10977293B2 (en) | Technology incident management platform
CA3120412A1 (en) | An automated and dynamic method and system for clustering data records
EP3555750B1 (en) | Garbage collection for data storage
CN112380321B (en) | Primary and secondary database allocation method based on bill knowledge graph and related equipment
CN116579804A (en) | Holiday commodity sales prediction method, holiday commodity sales prediction device and computer storage medium
US20230066770A1 (en) | Cross-channel actionable insights
CN118761842B (en) | Information security risk assessment management method and device for online transaction
CN114860819A (en) | Method, device, equipment and storage medium for constructing business intelligent system
US20240265456A1 (en) | Discovering values for metrics of entities from non-standardized datasets
EP3489838A1 (en) | Method and apparatus for determining an association
Gao | RETRACTED: Implementation of a dynamic planning algorithm in accounting information technology administration
US11941651B2 (en) | LCP pricing tool
CN115062858A (en) | User complaint behavior prediction method, device, equipment and storage medium
Sadula | Integrating Big Data Analytics with US SEC Financial Statement Datasets and the Critical Examination of the Altman Z’-Score Model
CN118820325B (en) | Account period data processing method, system, equipment and medium based on Microsoft 365
US20250258836A1 (en) | Audit tracking and metadata-based data skipping in data processing pipelines
US20250272482A1 (en) | Automated electronic document creation through machine learning
US20250278675A1 (en) | Maching learning systems
Risen | Technological challenges in accounting and finance (big data)
Fonseka et al. | Use of data warehousing to analyze customer complaint data of Consumer Financial Protection Bureau of United States of America
Ayyavaraiah | Data Mining For Business Intelligence
Uchôa de Araújo | The role of Data Preprocessing in Forecasting of Spare Parts Sales: A Case Study from the Mining industry Using Customer Equipment and Sales Data
Alatrista-Salas et al. | Algorithms For Anomaly Detection on Time Series: A Use Case on Banking Data

Legal Events

Code | Title | Description
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
AX | Request for extension of the European patent | Extension state: BA ME
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE
17P | Request for examination filed | Effective date: 20210107
RBV | Designated contracting states (corrected) | Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS
17Q | First examination report despatched | Effective date: 20210604
RAP3 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: SAGE (UK) LIMITED
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN
18W | Application withdrawn | Effective date: 20221118

