- Ashish Sharma (ORCID: 0000-0001-6636-4219),
- Gaurav Sharma,
- Fatima A. Asiri,
- Javed Khan Bhutto &
- Abdulwasa Bakr Barnawi
Abstract
Generative Adversarial Networks (GANs) have advanced rapidly in recent years. GANs are a class of machine learning models designed for generative tasks, such as creating realistic images, music, or text, and they have become a powerful tool in deep learning. This paper surveys the major GAN variants, including StackGAN, AttnGAN, MirrorGAN, and CycleGAN, with a focus on one specific, less-explored application: generating a human face from a given text description. The paper also reviews the various text-embedding techniques, together with their associated models, advantages, and disadvantages. We further show how the performance of these models can be improved simply by improving the embeddings, or, in the case of MirrorGAN, by pre-training the models.
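As background for the survey, the adversarial objective underlying all of the GAN variants discussed here, introduced by Goodfellow et al. (cited in the references below), is a minimax game between a generator G and a discriminator D:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Here D(x) is the discriminator's estimate that a sample x is real, and G(z) maps a noise vector z to a synthetic sample; text-to-image variants such as StackGAN and MirrorGAN additionally condition both networks on a text embedding.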
References
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems. 2013; p. 26.
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in Neural Information Processing Systems. 2014;p. 27.
Herdade S, Kappeler A, Boakye K, Soares J. Image captioning: Transforming objects into words. In: Advances in Neural Information Processing Systems; 2019. pp. 11135–11145.
Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas DN. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In: The IEEE International Conference on Computer Vision (ICCV); 2017.
Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas DN. Stackgan++: Realistic image synthesis with stacked generative adversarial networks. IEEE Trans Pattern Anal Mach Intell. 2019;41(8):1947–62.
Xu T, Zhang P, Huang Q, Zhang H, Gan Z, Huang X, He X. Attngan: Fine-grained text to image generation with attentional generative adversarial networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2018.
Qiao T, Zhang J, Xu D, Tao D. Mirrorgan: Learning text-to-image generation by redescription. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2019.
Chen X, Qing L, He X, Luo X, Xu Y. FTGAN: A fully-trained generative adversarial network for text to face generation. CoRR. 2019. arXiv:1904.05729.
Reed S, Akata Z, Yan X, Logeswaran L, Schiele B, Lee H. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396. 2016.
Yan X, Yang J, Sohn K, Lee H. Attribute2image: Conditional image generation from visual attributes. In: European Conference on Computer Vision. Springer; 2016. pp. 776–791.
Odena A, Olah C, Shlens J. Conditional image synthesis with auxiliary classifier GANs. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70; 2017. pp. 2642–2651. JMLR.org.
Lu Y, Tai Y-W, Tang C-K. Attribute-guided face generation using conditional CycleGAN. In: Proceedings of the European Conference on Computer Vision (ECCV); 2018. pp. 282–297.
Oord A, Kalchbrenner N, Espeholt L, Kavukcuoglu K, Vinyals O, Graves A. Conditional image generation with pixelcnn decoders. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R, editors. Advances in neural information processing systems, vol. 29. New York: Curran Associates Inc.; 2016. p. 4790–8.
Castelle M. The social lives of generative adversarial networks. In: FAT*, p. 413; 2020.
Gui J, Sun Z, Wen Y, Tao D, Ye J. A review on generative adversarial networks: Algorithms, theory, and applications. arXiv preprint arXiv:2001.06937. 2020.
Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process. 1997;45(11):2673–81.
Cheng J, Chen Y-PP, Li M, Jiang Y-G. Tc-gan: Triangle cycle-consistent gans for face frontalization with facial features preserved. In: Proceedings of the 27th ACM International Conference on Multimedia; 2019. pp. 220–228.
Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434. 2016.
Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. arXiv:1701.07875. 2017.
Karras T, Aila T, Laine S, Lehtinen J. Progressive growing of GANs for improved quality, stability, and variation. arXiv:1710.10196. 2018.
Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, June 16–20, 2019. Computer Vision Foundation / IEEE; 2019. pp. 4401–4410. https://doi.org/10.1109/CVPR.2019.00453
Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, editors. Advances in Neural Information Processing Systems 30; 2017. pp. 6626–6637.
Schroff F, Kalenichenko D, Philbin J. Facenet: A unified embedding for face recognition and clustering. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015.
Cho K, Merrienboer B, Gülçehre Ç, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR. 2014. arXiv:1406.1078.
Rice JR. The algorithm selection problem. In: Advances in Computers, vol. 15. Elsevier; 1976. pp. 65–118. https://doi.org/10.1016/S0065-2458(08)60520-3
Devlin J, Chang M, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR. 2018. arXiv:1810.04805.
Ramos J. Using tf-idf to determine word relevance in document queries. In: Proceedings of the First Instructional Conference on Machine Learning. Piscataway, NJ; 2003. vol. 242, pp. 133–142
Pennington J, Socher R, Manning CD. Glove: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); 2014. pp. 1532–1543.
Reimers N, Gurevych I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084. 2019.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016.
Hong S, Yang D, Choi J, Lee H. Inferring semantic layout for hierarchical text-to-image synthesis. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2018.
Acknowledgements
The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University, Kingdom of Saudi Arabia for funding this work through Large Groups Project under Grant No. R.G.P 2/88/44.
Author information
Gaurav Sharma
Present address: Department of CSE, Banasthali University, Jaipur, India
Ashish Sharma and Gaurav Sharma have contributed equally to this work.
Authors and Affiliations
Department of CSE, Manipal University Jaipur, Jaipur, India
Ashish Sharma
Information Systems Department College of Computer Sciences, King Khalid University, Abha, Saudi Arabia
Fatima A. Asiri
Department of Electrical Engineering, College of Engineering, King Khalid University, Abha, Saudi Arabia
Javed Khan Bhutto & Abdulwasa Bakr Barnawi
Corresponding authors
Correspondence to Ashish Sharma or Gaurav Sharma.
Ethics declarations
Conflict of interest
On behalf of all authors of the manuscript entitled “Optimized Mirror Generative Adversarial Network with BERT Neural Architecture for Text Caption to Image Conversion”, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection "Advanced Computing: Innovations and Applications" guest edited by Sanjay Madria, Parteek Bhatia, Priyanka Sharma and Deepak Garg.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sharma, A., Sharma, G., Asiri, F.A. et al. Optimized Mirror Generative Adversarial Network with BERT Neural Architecture for Text Caption to Image Conversion. SN COMPUT. SCI. 5, 334 (2024). https://doi.org/10.1007/s42979-024-02609-7