Abstract
This paper studies a link-text algorithm that models scientific documents through citation influences and applies it to document clustering and influence prediction. Most existing link-text algorithms ignore the fact that cited documents influence a citing document to different degrees. In fact, citation influences reveal the latent structure of a citation network, which describes knowledge flow more accurately than the original citation structure. In this study, a citation influence is modeled as a weight in a linear combination that approximates the text of a document by the content of the documents it cites. We then present a novel matrix factorization algorithm, called Citation-Influences-Text Nonnegative Matrix Factorization (CIT-NMF), which incorporates text and citations to obtain better document representations by learning influence weights. In addition, an efficient optimization method is derived to solve the resulting optimization problem. Experimental results on several real datasets show satisfactory improvements over the baseline models.
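To make the linear-combination idea concrete, the following is a minimal sketch and not the CIT-NMF algorithm itself. It assumes each document is a nonnegative term-frequency vector and estimates nonnegative influence weights for one citing document with a standard multiplicative update for nonnegative least squares. The function name `influence_weights`, the toy vectors, and the iteration count are all hypothetical choices made for illustration.

```python
def influence_weights(citing, cited, iters=500, eps=1e-12):
    """Estimate nonnegative weights w so that sum_j w[j] * cited[j] ~ citing.

    citing: list of floats, term vector of the citing document (assumed >= 0).
    cited:  list of term vectors, one per cited document (assumed >= 0).
    This is an illustrative stand-in, not the optimization from the paper.
    """
    k, n = len(cited), len(citing)
    w = [1.0 / k] * k  # start from uniform influence

    # Precompute C^T x and C^T C, where the columns of C are the cited vectors.
    ctx = [sum(cited[j][t] * citing[t] for t in range(n)) for j in range(k)]
    ctc = [[sum(cited[i][t] * cited[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]

    for _ in range(iters):
        # Multiplicative update: w_j <- w_j * (C^T x)_j / (C^T C w)_j.
        # Weights stay nonnegative because every factor is nonnegative.
        denom = [sum(ctc[j][i] * w[i] for i in range(k)) + eps for j in range(k)]
        w = [w[j] * ctx[j] / denom[j] for j in range(k)]
    return w


# Toy data: the citing vector is built as 0.75 * first + 0.25 * second,
# so the recovered weights should be close to those two values.
cited = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
citing = [1.5, 1.0, 0.5]
weights = influence_weights(citing, cited)
print(weights)
```

In this toy setting a larger weight indicates that the cited document accounts for more of the citing document's text, which is the sense in which the abstract speaks of differing citation influences.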
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61672128).
Author information
Authors and Affiliations
School of Software, Dalian University of Technology, Dalian, China
Yue Qian, Yu Liu & Xiujuan Xu
Department of Computing, Macquarie University, Macquarie Park, NSW, 2109, Australia
Quan Z. Sheng
Corresponding author
Correspondence to Yu Liu.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Qian, Y., Liu, Y., Xu, X. et al. Leveraging citation influences for Modeling scientific documents. World Wide Web 23, 2281–2302 (2020). https://doi.org/10.1007/s11280-020-00796-w