Hanten Chang¹ &
Hiroyasu Ando²

Abstract

Collecting large amounts of data is beneficial in machine learning because it yields models that are less biased. In many cases, similar data are distributed among organizations, and integrating them is difficult owing to privacy and cost concerns. Integrating such distributed data without transferring the original data leads to the concept of data collaboration, which combines data held by different organizations in a secure manner. We propose a method in which a distance matrix of the original data, obtained using data common to the organizations, is shared in order to learn neighbor information of the original data. Specifically, the proposed method robustly integrates distributed data, with quality comparable to that of the concatenated raw data, even when the amount of data in each organization is small and the data bias is large. In addition, the proposed method is applicable to data contaminated by noise. To demonstrate its effectiveness, we performed a classification task on open biological data divided into several pieces and found that the classification results for the divided data were as precise as when all data were available. Finally, we show that, as a by-product, the robustness of the method against noise improves the anonymity of the original data.
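The idea of sharing distances rather than raw records can be illustrated with a small sketch. It is not the authors' algorithm, whose details are not given in the abstract. It assumes that each organization computes Euclidean distances from its private rows to a set of common rows, adds Gaussian noise, and shares only the resulting matrix. The function names, array sizes, and noise level are hypothetical.

```python
import numpy as np


def distances_to_common(private, common):
    """Euclidean distance from every private row to every common row."""
    diff = private[:, None, :] - common[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def perturbed_distances(private, common, noise_std, rng):
    """Distance matrix with additive Gaussian noise (assumed perturbation)."""
    d = distances_to_common(private, common)
    return d + rng.normal(0.0, noise_std, size=d.shape)


rng = np.random.default_rng(0)
common = rng.normal(size=(5, 3))  # hypothetical data held by every organization
org_a = rng.normal(size=(4, 3))   # private to organization A, never shared
org_b = rng.normal(size=(6, 3))   # private to organization B, never shared

# Each organization shares only its perturbed distance matrix.
shared_a = perturbed_distances(org_a, common, noise_std=0.05, rng=rng)
shared_b = perturbed_distances(org_b, common, noise_std=0.05, rng=rng)

# Stacking the shared matrices gives one row per record, with one column
# per common point, which a downstream classifier could use as features.
integrated = np.vstack([shared_a, shared_b])
print(integrated.shape)  # (10, 5)
```

In this sketch a larger `noise_std` hides the private rows more strongly at the cost of less accurate neighbor information, which is the trade-off the abstract refers to.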

Acknowledgements

The present study was supported in part by the New Energy and Industrial Technology Development Organization (NEDO), by the Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research Nos. 19K12198 and 17H03280, and by JST MIRAI JPMJMI19B.

Author information

Authors and Affiliations

Division of Policy and Planning Sciences, Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba, Ibaraki, Japan
Hanten Chang
Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba, Ibaraki, Japan
Hiroyasu Ando

Corresponding author

Correspondence to Hanten Chang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Privacy, Data Protection and Digital Identity” guest edited by Fernando Boavida, Andrea Praitano and Georgios V. Lioudakis.

About this article

Cite this article

Chang, H., Ando, H. Privacy-Preserving Data Sharing by Integrating Perturbed Distance Matrices. SN COMPUT. SCI. 1, 121 (2020). https://doi.org/10.1007/s42979-020-00127-w

Received: 15 November 2019
Accepted: 27 March 2020
Published: 15 April 2020
DOI: https://doi.org/10.1007/s42979-020-00127-w
