US20180096018A1 - Reducing processing for comparing large metadata sets - Google Patents

Reducing processing for comparing large metadata sets

Info

Publication number
US20180096018A1
Authority
US
United States
Prior art keywords
database
data
fields
field
data record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/281,171
Inventor
Victor Cazin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-09-30
Filing date
2016-09-30
Publication date
2018-04-05
Application filed by Microsoft Technology Licensing LLC
Priority to US15/281,171
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC. Assignment of assignors interest (see document for details). Assignors: CAZIN, VICTOR
Publication of US20180096018A1
Priority to US16/239,597 (US11275723B2)
Abandoned (current legal status)

Abstract

A computer system compares databases, where each data record includes values for a plurality of fields, by applying database operations on database tables of the data records to reduce computations being performed. For each data record in a reference set, candidate matches are identified from among the data records in a target set by performing a join operation on the sets on one or more selected, normalized fields from the sets. This significantly reduced set of candidate matches can be further analyzed to determine how similar they are to the data record from the reference set.
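
As a rough illustration of the candidate-selection step summarized above, the following Python sketch normalizes one field in each set and then uses an equi-join to find candidate matches. Everything specific here is an assumption made for the example and is not taken from the patent: the in-memory SQLite database standing in for the database tables, the table names `reference_set` and `target_set`, the use of a `title` column as the selected field, and the particular normalization rule.

```python
import re
import sqlite3


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    This rule is a hypothetical choice for the example only.
    """
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(cleaned.split())


def candidate_pairs(reference, target):
    """Return (reference_id, target_id) pairs whose normalized titles are equal."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reference_set (id TEXT, title TEXT, norm TEXT)")
    conn.execute("CREATE TABLE target_set (id TEXT, title TEXT, norm TEXT)")
    conn.executemany(
        "INSERT INTO reference_set VALUES (?, ?, ?)",
        [(rec_id, title, normalize(title)) for rec_id, title in reference],
    )
    conn.executemany(
        "INSERT INTO target_set VALUES (?, ?, ?)",
        [(rec_id, title, normalize(title)) for rec_id, title in target],
    )
    # The join keeps only pairs that share a normalized key, so later
    # comparisons never see the full reference-by-target cross product.
    rows = conn.execute(
        "SELECT r.id, t.id FROM reference_set AS r "
        "JOIN target_set AS t ON r.norm = t.norm "
        "ORDER BY r.id, t.id"
    ).fetchall()
    conn.close()
    return rows


reference = [("r1", "The Matrix"), ("r2", "Blade Runner")]
target = [("t1", "the matrix!"), ("t2", "Alien"), ("t3", "THE  MATRIX")]
pairs = candidate_pairs(reference, target)
print(pairs)
```

With two reference records and three target records, a pairwise comparison would examine six pairs; here only the pairs produced by the join would go on to detailed comparison.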

Claims (20)

What is claimed is:
1. A computer system comprising:
computer storage, wherein a first database and a second database are stored in the computer storage,
the first database comprising a first plurality of first data records, each first data record comprising a first plurality of first data fields storing metadata describing an item from among a first plurality of items,
the first plurality of data fields comprising at least a first identifier field, a first index field, and at least two more first metadata fields, each first identifier field for each first data record for an item storing a value representing an identifier for the item in the first database, each first index field for each first data record for an item storing a value representing a description of the item,
the second database comprising a second plurality of second data records, each second data record comprising a second plurality of second data fields storing metadata describing an item from among a second plurality of items,
the second plurality of data fields comprising at least a second identifier field, a second index field, and at least two more second metadata fields, each second identifier field for each second data record for an item storing a value representing an identifier for the item in the second database, each second index field for each second data record for an item storing a value representing a description of the item,
wherein, for similar items in the first database and the second database, the value representing the identifier for the similar item in the first identifier field may be different from the value representing the identifier for the similar item in the second identifier field,
a processing system comprising a memory and a processor, the processing system operable to access the computer storage and being configured by computer program instructions to be operative to:
normalize the values in the first index field of the first data records and the values in the second index field of the second data records;
select a candidate set based on the first database and the second database using the normalized first index field and the normalized second index field, to generate a table combining the first metadata fields and the second metadata fields, wherein the table comprises, for each data record in the first database having a similar data record in the second database, and for each similar data record, a row combining data fields for the data record from the first database with data fields for the similar data record from the second database, wherein the similar data record is a data record that has a value in the normalized second index field which is equal to the value in the normalized first index field for the data record in the first database;
for each row in the generated table, compare values of first metadata fields in the row to values of corresponding second metadata fields in the row to generate scores, and store the scores as data fields in the row;
for each row in the generated table, determine whether the data record from the second database corresponding to the row is equivalent to the data record from the first database corresponding to the row, based on the generated scores stored in the row.
2. The computer system of claim 1, wherein selecting the candidate set comprises performing a join operation of the first database and the second database on the normalized first index field and the normalized second index field to generate the table.
3. The computer system of claim 1, wherein selecting the candidate set uses a plurality of normalized first index fields and a plurality of normalized second index fields.
4. The computer system of claim 3, wherein selecting the candidate set comprises performing a first join operation of the first database and the second database on the normalized first index field and the normalized second index field to generate a first intermediate candidate table, performing a second join operation of the first database and the second database on a second normalized first index field and a second normalized second index field to generate a second intermediate candidate table, and performing a union of the first intermediate candidate table and the second intermediate candidate table to generate the table.
5. The computer system of claim 1, wherein determining comprises applying first criteria to the scores indicative of a match of a first priority, and applying second criteria to the scores indicative of a match of a second priority lower than the first priority, and storing an indication of the match and the priority of the match.
6. The computer system of claim 2, wherein comparing values of a first metadata field to a corresponding second metadata field in a row to generate a score comprises applying an edit distance to text values stored in the first and second metadata fields.
7. The computer system of claim 6, wherein comparing values of a first metadata field to a corresponding second metadata field in a row to generate a score comprises determining if a text value stored in the first metadata field is contained in a text value stored in the second metadata field.
8. An article of manufacture comprising:
a computer storage device,
computer program instructions stored on the computer storage which, when processed by a computer, configure the computer to comprise:
computer storage, wherein a first database and a second database are stored in the computer storage,
the first database comprising a first plurality of first data records, each first data record comprising a first plurality of first data fields storing metadata describing an item from among a first plurality of items,
the first plurality of data fields comprising at least a first identifier field, a first index field, and at least two more first metadata fields, each first identifier field for each first data record for an item storing a value representing an identifier for the item in the first database, each first index field for each first data record for an item storing a value representing a description of the item,
the second database comprising a second plurality of second data records, each second data record comprising a second plurality of second data fields storing metadata describing an item from among a second plurality of items,
the second plurality of data fields comprising at least a second identifier field, a second index field, and at least two more second metadata fields, each second identifier field for each second data record for an item storing a value representing an identifier for the item in the second database, each second index field for each second data record for an item storing a value representing a description of the item,
wherein, for similar items in the first database and the second database, the value representing the identifier for the similar item in the first identifier field may be different from the value representing the identifier for the similar item in the second identifier field,
a processing system comprising a memory and a processor, the processing system operable to access the computer storage and being configured by computer program instructions to be operative to:
normalize the values in the first index field of the first data records and the values in the second index field of the second data records;
select a candidate set based on the first database and the second database using the normalized first index field and the normalized second index field, to generate a table combining the first metadata fields and the second metadata fields, wherein the table comprises, for each data record in the first database having a similar data record in the second database, and for each similar data record, a row combining data fields for the data record from the first database with data fields for the similar data record from the second database, wherein the similar data record is a data record that has a value in the normalized second index field which is equal to the value in the normalized first index field for the data record in the first database;
for each row in the generated table, compare values of first metadata fields in the row to values of corresponding second metadata fields in the row to generate scores, and store the scores as data fields in the row;
for each row in the generated table, determine whether the data record from the second database corresponding to the row is equivalent to the data record from the first database corresponding to the row, based on the generated scores stored in the row.
9. The article of manufacture of claim 8, wherein selecting the candidate set comprises performing a join operation of the first database and the second database on the normalized first index field and the normalized second index field to generate the table.
10. The article of manufacture of claim 8, wherein selecting the candidate set uses a plurality of normalized first index fields and a plurality of normalized second index fields.
11. The article of manufacture of claim 10, wherein selecting the candidate set comprises performing a first join operation of the first database and the second database on the normalized first index field and the normalized second index field to generate a first intermediate candidate table, performing a second join operation of the first database and the second database on a second normalized first index field and a second normalized second index field to generate a second intermediate candidate table, and performing a union of the first intermediate candidate table and the second intermediate candidate table to generate the table.
12. The article of manufacture of claim 8, wherein determining comprises applying first criteria to the scores indicative of a match of a first priority, and applying second criteria to the scores indicative of a match of a second priority lower than the first priority, and storing an indication of the match and the priority of the match.
13. The article of manufacture of claim 9, wherein comparing values of a first metadata field to a corresponding second metadata field in a row to generate a score comprises applying an edit distance to text values stored in the first and second metadata fields.
14. The article of manufacture of claim 9, wherein comparing values of a first metadata field to a corresponding second metadata field in a row to generate a score comprises determining if a text value stored in the first metadata field is contained in a text value stored in the second metadata field.
15. A computer-implemented process performed by a computer program executing on a computer, the computer including computer storage, wherein a first database and a second database are stored in the computer storage,
the first database comprising a first plurality of first data records, each first data record comprising a first plurality of first data fields storing metadata describing an item from among a first plurality of items,
the first plurality of data fields comprising at least a first identifier field, a first index field, and at least two more first metadata fields, each first identifier field for each first data record for an item storing a value representing an identifier for the item in the first database, each first index field for each first data record for an item storing a value representing a description of the item,
the second database comprising a second plurality of second data records, each second data record comprising a second plurality of second data fields storing metadata describing an item from among a second plurality of items,
the second plurality of data fields comprising at least a second identifier field, a second index field, and at least two more second metadata fields, each second identifier field for each second data record for an item storing a value representing an identifier for the item in the second database, each second index field for each second data record for an item storing a value representing a description of the item,
wherein, for similar items in the first database and the second database, the value representing the identifier for the similar item in the first identifier field may be different from the value representing the identifier for the similar item in the second identifier field,
the computer further including a processing system comprising a memory and a processor, the processing system operable to access the computer storage and being configured by computer program instructions to perform the process comprising:
normalizing the values in the first index field of the first data records and the values in the second index field of the second data records;
selecting a candidate set based on the first database and the second database using the normalized first index field and the normalized second index field, to generate a table combining the first metadata fields and the second metadata fields, wherein the table comprises, for each data record in the first database having a similar data record in the second database, and for each similar data record, a row combining data fields for the data record from the first database with data fields for the similar data record from the second database, wherein the similar data record is a data record that has a value in the normalized second index field which is equal to the value in the normalized first index field for the data record in the first database;
for each row in the generated table, comparing values of first metadata fields in the row to values of corresponding second metadata fields in the row to generate scores, and storing the scores as data fields in the row;
for each row in the generated table, determining whether the data record from the second database corresponding to the row is equivalent to the data record from the first database corresponding to the row, based on the generated scores stored in the row.
16. The computer-implemented process of claim 15, wherein selecting the candidate set comprises performing a join operation of the first database and the second database on the normalized first index field and the normalized second index field to generate the table.
17. The computer-implemented process of claim 15, wherein selecting the candidate set uses a plurality of normalized first index fields and a plurality of normalized second index fields.
18. The computer-implemented process of claim 17, wherein selecting the candidate set comprises performing a first join operation of the first database and the second database on the normalized first index field and the normalized second index field to generate a first intermediate candidate table, performing a second join operation of the first database and the second database on a second normalized first index field and a second normalized second index field to generate a second intermediate candidate table, and performing a union of the first intermediate candidate table and the second intermediate candidate table to generate the table.
19. The computer-implemented process of claim 15, wherein determining comprises applying first criteria to the scores indicative of a match of a first priority, and applying second criteria to the scores indicative of a match of a second priority lower than the first priority, and storing an indication of the match and the priority of the match.
20. The computer-implemented process of claim 16, wherein comparing values of a first metadata field to a corresponding second metadata field in a row to generate a score comprises applying an edit distance to text values stored in the first and second metadata fields.
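
The scoring and tiered-decision steps recited in the claims above (edit distance, containment, and first- and second-priority criteria) can be sketched informally in Python as follows. This is only an illustrative sketch under stated assumptions: the field names `ref_title`, `tgt_title`, `ref_artist`, and `tgt_artist`, the score names, and the thresholds in `classify` are all hypothetical and are not specified by the claims.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the usual row-by-row recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def score_row(row: dict) -> dict:
    """Compare corresponding fields of a joined row and store the scores in it."""
    scored = dict(row)
    scored["title_distance"] = edit_distance(row["ref_title"], row["tgt_title"])
    scored["artist_contained"] = row["ref_artist"] in row["tgt_artist"]
    return scored


def classify(scored: dict):
    """Return 1 or 2 for a first- or second-priority match, else None.

    The thresholds are hypothetical; stricter criteria are checked first.
    """
    if scored["title_distance"] == 0 and scored["artist_contained"]:
        return 1
    if scored["title_distance"] <= 2 and scored["artist_contained"]:
        return 2
    return None


rows = [
    {"ref_title": "blue train", "tgt_title": "blue train",
     "ref_artist": "coltrane", "tgt_artist": "john coltrane"},
    {"ref_title": "blue train", "tgt_title": "blue trane",
     "ref_artist": "coltrane", "tgt_artist": "john coltrane"},
    {"ref_title": "blue train", "tgt_title": "blue moon",
     "ref_artist": "coltrane", "tgt_artist": "billie holiday"},
]
priorities = [classify(score_row(row)) for row in rows]
print(priorities)
```

Keeping the scores in the row, as the claims describe, means the match criteria can be changed and re-applied without recomputing the field comparisons.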
US15/281,171, priority date 2016-09-30, filing date 2016-09-30: Reducing processing for comparing large metadata sets. Abandoned. US20180096018A1 (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
US15/281,171, US20180096018A1 (en) | 2016-09-30 | 2016-09-30 | Reducing processing for comparing large metadata sets
US16/239,597, US11275723B2 (en) | 2016-09-30 | 2019-01-04 | Reducing processing for comparing large metadata sets

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
US15/281,171, US20180096018A1 (en) | 2016-09-30 | 2016-09-30 | Reducing processing for comparing large metadata sets

Related Child Applications (1)

Application Number | Title | Priority Date | Filing Date
US16/239,597, Continuation, US11275723B2 (en) | Reducing processing for comparing large metadata sets | 2016-09-30 | 2019-01-04

Publications (1)

Publication Number | Publication Date
US20180096018A1 (en) | 2018-04-05

Family

ID=61758719

Family Applications (2)

Application Number | Title | Priority Date | Filing Date
US15/281,171, Abandoned, US20180096018A1 (en) | Reducing processing for comparing large metadata sets | 2016-09-30 | 2016-09-30
US16/239,597, Active 2037-09-26, US11275723B2 (en) | Reducing processing for comparing large metadata sets | 2016-09-30 | 2019-01-04

Family Applications After (1)

Application Number | Title | Priority Date | Filing Date
US16/239,597, Active 2037-09-26, US11275723B2 (en) | Reducing processing for comparing large metadata sets | 2016-09-30 | 2019-01-04

Country Status (1)

Country | Link
US (2) | US20180096018A1 (en)

Also Published As

Publication number | Publication date
US20190205294A1 (en) | 2019-07-04
US11275723B2 (en) | 2022-03-15

Legal Events

Date | Code | Title | Description
 | AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: CAZIN, VICTOR; REEL/FRAME: 039902/0489. Effective date: 20160930
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE

