Abstract
Collecting large amounts of data is beneficial in machine learning because it yields models that are less biased. In many cases, however, similar data are distributed among organizations and are difficult to integrate owing to privacy and cost concerns. Integrating such distributed data without handing over the original data leads to the concept of data collaboration, which combines data held by different organizations in a secure manner. We propose a method in which a distance matrix of the original data, obtained using data common to the organizations, is shared in order to learn neighbor information of the original data. Specifically, the proposed method robustly integrates distributed data, with quality comparable to that of the combined raw data, even when the amount of data in each organization is small and the data bias is large. In addition, the proposed method is applicable to data contaminated by noise. To demonstrate the effectiveness of the proposed method, we performed a classification task on open biological data divided into several pieces and found that the classification results for the divided data were as precise as when all data were available. Finally, we show that the robustness of the method against noise improves the anonymity of the original data as a by-product.
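The idea of sharing perturbed distance matrices instead of raw records can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given in the abstract. It assumes that each organization computes Euclidean distances from its private rows to a shared set of common (anchor) points, adds Gaussian noise, and shares only the result, and that a plain k-nearest-neighbor vote stands in for the classifier. The synthetic data, the names `distances_to_anchors`, `perturb`, and `knn_predict`, and all parameter values are hypothetical.

```python
import numpy as np

def distances_to_anchors(X, anchors):
    """Euclidean distance from every row of X to every row of anchors."""
    diff = X[:, None, :] - anchors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def perturb(D, scale, rng):
    """Add Gaussian noise to a distance matrix and clip at zero.
    The noise model is an illustrative assumption."""
    return np.clip(D + rng.normal(0.0, scale, size=D.shape), 0.0, None)

def knn_predict(train_feats, train_labels, query_feats, k=3):
    """Majority vote among the k nearest training rows."""
    d = distances_to_anchors(query_feats, train_feats)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in train_labels[nearest]])

rng = np.random.default_rng(0)

# Synthetic two-class data standing in for the private records.
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 5)),
               rng.normal(4.0, 1.0, size=(40, 5))])
y = np.array([0] * 40 + [1] * 40)

# Common data available to every organization (assumed random here).
anchors = rng.uniform(-2.0, 6.0, size=(10, 5))

# Split the rows among four hypothetical organizations.
parts = np.array_split(rng.permutation(len(X)), 4)

# Each organization shares only its perturbed distance matrix.
shared = [perturb(distances_to_anchors(X[idx], anchors), 0.1, rng)
          for idx in parts]
features = np.vstack(shared)
labels = np.concatenate([y[idx] for idx in parts])

# The analyst classifies using the integrated distance features alone.
pred = knn_predict(features[:60], labels[:60], features[60:], k=3)
accuracy = float((pred == labels[60:]).mean())
print(f"held-out accuracy on shared distance features: {accuracy:.2f}")
```

In this sketch the raw rows of `X` never leave their organization; only distances to the common points do. Increasing the noise `scale` reveals less about the original records at the cost of blurrier neighbor information.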
Acknowledgements
The present study was supported in part by the New Energy and Industrial Technology Development Organization (NEDO) and by the Japan Society for the Promotion of Science (JSPS), Grants-in-Aid for Scientific Research Nos. 19K12198, 17H03280 and JST MIRAI JPMJMI19B.
Author information
Authors and Affiliations
Division of Policy and Planning Sciences, Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba, Ibaraki, Japan
Hanten Chang
Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba, Ibaraki, Japan
Hiroyasu Ando
Corresponding author
Correspondence to Hanten Chang.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Privacy, Data Protection and Digital Identity” guest edited by Fernando Boavida, Andrea Praitano and Georgios V. Lioudakis.
About this article
Cite this article
Chang, H., Ando, H. Privacy-Preserving Data Sharing by Integrating Perturbed Distance Matrices. SN COMPUT. SCI. 1, 121 (2020). https://doi.org/10.1007/s42979-020-00127-w