TW201727516A


Info

Publication number: TW201727516A
Application number: TW105139708A
Authority: TW
Inventors: Yong-Kai Zhou; Hong-Feng Chai; Shuo He; Dong-Jie He; Guo-Bao Liu; Hua Cai
Original assignee: China Unionpay Co Ltd
Priority date: 2015-12-02
Filing date: 2016-12-01
Publication date: 2017-08-01
Also published as: CN105590066B; CN105590066A; TWI664538B; WO2017092696A1

Abstract

A method for safe integration of big data, comprising: a first party and a second party negotiating the associated fields, the data items required by each party, and a sorting rule; screening out, on the basis of the data items required by the first party and the second party, a first to-be-integrated data set and a second to-be-integrated data set from a first data set and a second data set, respectively; sorting the first to-be-integrated data set and the second to-be-integrated data set according to the sorting rule, and removing from each of them the data corresponding to the associated fields; submitting the first to-be-integrated data set and the second to-be-integrated data set to a third-party computing platform so as to form an integrated data set; and the third-party computing platform generating a result data set by analyzing and computing on the integrated data set. The invention effectively prevents private data from being leaked while accomplishing the integration of big data, facilitating information sharing on the premise of ensuring data security.

Description


Secure big data fusion method that does not disclose private data

The invention relates to a secure big data fusion method.

With the introduction of the national "Internet Plus" strategy, the demand for big data fusion across industries has become increasingly urgent. On the one hand, institutions welcome big data sharing: fusing different types of data can yield new analysis results, giving the value of the data a multiplier effect. On the other hand, both parties worry about leaking private data during fusion, because the final analysis result is often only a statistical conclusion, yet the fusion computation requires every record-level detail of the data to be exposed to the other party. This problem has become a major obstacle to big data collaboration and sharing between industries.

Therefore, those skilled in the art desire a reliable, secure big data fusion method that effectively shields private data.

An object of the present invention is to provide a secure big data fusion method that effectively shields private data.

To achieve the above object, the present invention provides the following technical solution: a secure big data fusion method for fusing a first data set stored by a first party with a second data set stored by a second party, the method comprising the following steps:
a) the first party and the second party negotiate the associated field, the data items each party requires, and the sorting rule;
b) based on the data items each party requires, a first data set to be fused and a second data set to be fused are filtered out of the first data set and the second data set, respectively;
c) the first and second data sets to be fused are each sorted according to the sorting rule, and the data corresponding to the associated field is removed from each of them;
d) the first party and the second party each submit their data set to be fused to a third-party computing platform, to form a fused data set;
e) the third-party computing platform analyzes and computes on the fused data set to generate a result data set.

Preferably, the third-party computing platform is independent of both the first party and the second party.

Preferably, after the analysis and calculation are complete, the first and second data sets to be fused are deleted from the computing system.

The secure big data fusion method provided by the embodiments of the present invention effectively prevents the leakage of private data while achieving big data fusion, promotes information sharing on the premise of ensuring data security, and broadens the breadth and depth of application of big data fusion technology. In addition, the method is simple to implement and inexpensive to deploy, which favors its adoption across the industry.

S10~S50: Steps

FIG. 1 is a schematic flowchart of the secure big data fusion method according to a first embodiment of the present invention.

It should be noted that, in each embodiment disclosed herein, the first party stores the first data set in a first database, and the second party stores the second data set in a second database.

The first and second data sets record different information, for example the activity of multiple users in different settings. The two data sets share an intersection of information, for example user identity information, which can be extracted to serve as the associated field.

The present invention provides various embodiments for performing big data fusion on the first and second data sets.

As shown in FIG. 1, the first embodiment of the present invention provides a secure big data fusion method comprising the following steps.

Step S10: the first party and the second party negotiate the associated field, the data items each party requires, and the sorting rule.

Specifically, the first party and the second party hold a negotiation session and agree on the associated field, the data items each party requires, and the sorting rule.

The data items each party requires include the data items the first party expects to obtain indirectly from the second party through the fusion, and the data items the second party expects to obtain indirectly from the first party. From these required data items, the negotiation session can determine which users' information each party is interested in, and the parties can then agree on the identity information of those users.

The associated field represents the intersection of the information in the first and second data sets. It can be taken directly from any one or more of the following: the user's identity information; information about the card the user holds; and/or other identifying information that uniquely determines the user.

The sorting rule determines the order in which each data set to be fused is sorted in the subsequent fusion process. Once set, the sorting rule cannot be changed arbitrarily; it can only be changed through another negotiation session. Because both data sets are sorted according to the agreed rule, the correspondence between the data items in the first and second data sets to be fused is also determined.
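As an illustration of what the negotiation session settles, the sketch below models the agreed terms as an immutable record. The class `FusionAgreement` and its fields (`associated_field`, `agreed_users`, `sort_descending`) are hypothetical names chosen for this example; the patent does not prescribe any data structure. Freezing the record mirrors the requirement that the sorting rule cannot be changed except through a new negotiation, which is modelled here by building a new record.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class FusionAgreement:
    """Terms both parties agree on in the negotiation session (illustrative)."""
    associated_field: str          # field shared by both data sets, e.g. a user ID
    agreed_users: frozenset        # identities of the users both parties care about
    sort_descending: bool = False  # assumed sorting rule: order by associated field


agreement = FusionAgreement("user_id", frozenset({"u1", "u2", "u3"}))

# An in-place change to the sorting rule is rejected.
try:
    agreement.sort_descending = True
    changed_in_place = True
except FrozenInstanceError:
    changed_in_place = False

# A renegotiation produces a new agreement instead of mutating the old one.
renegotiated = replace(agreement, sort_descending=True)
```

Representing the sorting rule as a single flag is a simplification; any deterministic ordering that both parties can compute independently would serve the same purpose.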

The negotiation session can be initiated by either the first party or the second party, with the other party responding. Alternatively, it can be initiated by an independent entity module distinct from both parties: on receiving its instruction, the first and second parties conduct the negotiation session directly and notify the entity module once the session is complete.

Step S20: based on the data items each party requires, a first data set to be fused and a second data set to be fused are filtered out of the first data set and the second data set, respectively.

Specifically, based on the required data items determined in the negotiation session, the first data set to be fused is filtered out of the first data set, and the second data set to be fused is filtered out of the second data set. It will be understood that the two data sets to be fused contain the same number of data items, and that each data item in the first data set to be fused has a corresponding data item in the second, and vice versa.
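A minimal sketch of the filtering in step S20 follows, assuming each data set is a list of dictionaries and that the negotiation produced a set of agreed user identities. The function name `select_for_fusion`, the field names, and the sample records are all invented for illustration.

```python
def select_for_fusion(data_set, associated_field, agreed_users):
    """Keep only the records whose associated field names an agreed user."""
    return [row for row in data_set if row[associated_field] in agreed_users]


# Hypothetical source data held separately by each party.
first_data_set = [
    {"user_id": "u3", "monthly_spend": 120},
    {"user_id": "u1", "monthly_spend": 300},
    {"user_id": "u9", "monthly_spend": 50},
    {"user_id": "u2", "monthly_spend": 80},
]
second_data_set = [
    {"user_id": "u2", "trips": 4},
    {"user_id": "u7", "trips": 1},
    {"user_id": "u1", "trips": 9},
    {"user_id": "u3", "trips": 2},
]
agreed_users = {"u1", "u2", "u3"}

first_to_fuse = select_for_fusion(first_data_set, "user_id", agreed_users)
second_to_fuse = select_for_fusion(second_data_set, "user_id", agreed_users)
```

With these inputs both filtered sets hold three records covering the same users, which is the precondition the later positional matching relies on.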

Step S30: the first and second data sets to be fused are each sorted according to the sorting rule, and the data corresponding to the associated field is removed from each of them.

Step S30 comprises a sorting step and a removal step.

In one specific implementation, the sorting step may include: the first party and the second party each sort their own data set to be fused according to the sorting rule.

The removal step may include: the first party and the second party each remove the data corresponding to the associated field from their own data set to be fused.

After the removal step, the first and second data sets to be fused no longer contain user identity information, so private information is effectively shielded; and after the sorting step, the data items in the two data sets have a definite one-to-one correspondence.
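The sorting and removal steps can be sketched as follows. This example assumes the agreed sorting rule is "order by the associated field", which is only one possible rule; the patent leaves the rule to the negotiation. The function name `sort_and_strip` and the sample records are hypothetical.

```python
def sort_and_strip(to_fuse, associated_field, descending=False):
    """Sort by the associated field, then drop that field from every record."""
    ordered = sorted(to_fuse, key=lambda row: row[associated_field],
                     reverse=descending)
    return [
        {key: value for key, value in row.items() if key != associated_field}
        for row in ordered
    ]


# Hypothetical filtered data held by the first party before step S30.
first_to_fuse = [
    {"user_id": "u3", "monthly_spend": 120},
    {"user_id": "u1", "monthly_spend": 300},
    {"user_id": "u2", "monthly_spend": 80},
]

first_stripped = sort_and_strip(first_to_fuse, "user_id")
```

Each party runs this locally, so the identifying column never leaves its own database; only the order of the remaining rows carries the link between the two sides.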

Step S40: the first party and the second party each submit their data set to be fused to a computing platform set up by a third party, to form a fused data set.

Specifically, the first party submits the first data set to be fused, as obtained after the sorting and removal steps, to the third-party computing platform over a dedicated communication line; at the same time, the second party does the same with the second data set to be fused. The third-party computing platform is independent of both the first party and the second party.

Then, following the order produced by the sorting step, the data items in the first data set to be fused are combined one-to-one with the data items in the second data set to be fused to generate new data items, which together form the fused data set.
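On the third-party platform, the one-to-one combination amounts to a positional join. The sketch below assumes both submissions are lists of dictionaries with non-overlapping field names; `fuse` and the sample values are illustrative only.

```python
def fuse(first_stripped, second_stripped):
    """Combine records position by position into new fused records."""
    if len(first_stripped) != len(second_stripped):
        raise ValueError("submissions must contain the same number of records")
    # Assumes the two sides use distinct field names; on a clash the
    # second party's value would overwrite the first party's.
    return [{**a, **b} for a, b in zip(first_stripped, second_stripped)]


# Hypothetical submissions, already sorted and stripped of the associated field.
first_stripped = [{"monthly_spend": 300}, {"monthly_spend": 80}, {"monthly_spend": 120}]
second_stripped = [{"trips": 9}, {"trips": 4}, {"trips": 2}]

fused = fuse(first_stripped, second_stripped)
```

The length check matters because a positional join silently misaligns every later record if either side submits one row too many or too few.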

The fused data set contains user activity information from both the first party and the second party but no user identity information, so the third party cannot tell which user performed these activities.

Step S50: the third-party computing platform analyzes and computes on the fused data set to generate a result data set.

In step S50, the third-party computing platform analyzes the fused data set and generates a result data set. The result data set may be the outcome of statistical analysis and is entirely different from the first and second data sets to be fused. It can be returned to the first and second parties, neither of which can reconstruct the original data from it.
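To make the idea of a statistical result data set concrete, the following sketch groups fused records by a threshold and reports only counts and averages. The grouping criterion, the field names, and the function `summarize` are assumptions made for this example; the patent does not specify which analysis is run.

```python
from statistics import mean


def summarize(fused, threshold=4):
    """Report count and mean spend for frequent and infrequent travellers."""
    groups = {"frequent": [], "infrequent": []}
    for row in fused:
        label = "frequent" if row["trips"] >= threshold else "infrequent"
        groups[label].append(row["monthly_spend"])
    return {
        label: {"count": len(spends), "mean_spend": mean(spends)}
        for label, spends in groups.items()
        if spends  # skip empty groups, since mean() rejects empty input
    }


# Hypothetical fused data set held by the third-party platform.
fused = [
    {"monthly_spend": 300, "trips": 9},
    {"monthly_spend": 80, "trips": 4},
    {"monthly_spend": 120, "trips": 2},
    {"monthly_spend": 60, "trips": 1},
]

result = summarize(fused)
```

In practice the groups would need to be large enough that an aggregate does not reveal a single record; the tiny sample here is only meant to show the shape of the output.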

Further, after the analysis and calculation are complete, the third-party computing platform may delete the first and second data sets to be fused, which better protects the security and privacy of the data.

The secure big data fusion method of this embodiment shields users' identity information while achieving big data fusion, thereby effectively preventing the leakage of private data. The method is secure, reliable, and simple to implement.

In a further improved implementation of the above embodiment, step S10 may also include: the first party indicates to the second party the fields in the first data set that involve user privacy information or otherwise need protection. Correspondingly, step S30 also includes: removing the data corresponding to those fields from the first data set to be fused.

Similarly, the second party may indicate to the first party the fields in the second data set that involve user privacy information or otherwise need protection.
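The improved implementation can be sketched as one extra removal pass over each party's data set to be fused. The function `strip_fields`, the field names `phone` and `home_city`, and the sample records are hypothetical.

```python
def strip_fields(to_fuse, protected_fields):
    """Drop every field a party has declared private or in need of protection."""
    blocked = set(protected_fields)
    return [
        {key: value for key, value in row.items() if key not in blocked}
        for row in to_fuse
    ]


# Hypothetical first-party records that still carry sensitive columns.
first_to_fuse = [
    {"monthly_spend": 300, "phone": "555-0101", "home_city": "A"},
    {"monthly_spend": 80, "phone": "555-0102", "home_city": "B"},
]

# Fields the first party declared as protected during negotiation.
first_protected = ["phone", "home_city"]

first_safe = strip_fields(first_to_fuse, first_protected)
```

The second party would apply the same function to its own data set with its own list of protected fields.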

This improved implementation strengthens the protection of user privacy information and is particularly suitable where data protection requirements are high.

The above description covers only preferred embodiments of the present invention and is not intended to limit its scope of protection. Those skilled in the art may make various modifications without departing from the spirit of the invention and the appended claims.

Claims


A secure big data fusion method for fusing a first data set stored by a first party with a second data set stored by a second party, the method comprising the following steps:
a) the first party and the second party negotiate an associated field, the data items each party requires, and a sorting rule;
b) based on the data items each party requires, a first data set to be fused and a second data set to be fused are filtered out of the first data set and the second data set, respectively;
c) the first data set to be fused and the second data set to be fused are each sorted according to the sorting rule, and the data corresponding to the associated field is removed from each of them;
d) the first party and the second party each submit their data set to be fused to a third-party computing platform, to form a fused data set;
e) the third-party computing platform analyzes and computes on the fused data set to generate a result data set.

The method of claim 1, wherein the third-party computing platform is independent of both the first party and the second party.

The method of claim 1, wherein step e) further comprises: after the analysis and calculation are complete, deleting the first data set to be fused and the second data set to be fused from the computing system.

The method of claim 1, wherein the first data set and the second data set respectively record different activity information of multiple users, and the associated field comprises: the user's identity information; information about the card the user holds; and/or identifying information that uniquely determines the user.

The method of claim 4, wherein step a) further comprises: the first party indicating to the second party the fields in the first data set that involve user privacy information; and step c) further comprises: removing the data corresponding to those fields from the first data set to be fused.