
Optimized Mirror Generative Adversarial Network with BERT Neural Architecture for Text Caption to Image Conversion

  • Original Research
  • Published in SN Computer Science (2024)

Abstract

Generative Adversarial Networks (GANs) are a class of machine learning models designed for generative tasks such as creating realistic images, music, or text, and the field has advanced rapidly in the past few years. This paper surveys the principal GAN variants, including StackGAN, AttentionalGAN, MirrorGAN, and CycleGAN, and focuses on one comparatively unexplored application: generating a human face from a given text description. The paper also reviews the main text-embedding techniques, along with their advantages, disadvantages, and associated models. Finally, we show how model performance can be improved simply by using better embeddings or, in the case of MirrorGAN, by pre-training the model.
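As a concrete illustration of the abstract's central point, that the text encoder is a swappable component whose quality directly affects the generated image, the sketch below encodes a caption with a frozen pre-trained BERT and conditions a toy generator on the resulting [CLS] vector. This is a minimal sketch, not the authors' implementation: the ConditionalGenerator class, its layer sizes, and the 64x64 output resolution are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's architecture): condition a
# GAN generator on a BERT sentence embedding of the input caption.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()  # BERT stays frozen here; only the generator would be trained

def embed_caption(caption: str) -> torch.Tensor:
    """Return the 768-d [CLS] embedding of a caption from frozen BERT."""
    inputs = tokenizer(caption, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state[:, 0]  # shape (1, 768)

class ConditionalGenerator(nn.Module):
    """Toy conditional generator: [noise, text embedding] -> 64x64 RGB."""
    def __init__(self, z_dim: int = 100, text_dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 3 * 64 * 64),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        x = torch.cat([z, text_emb], dim=1)  # concatenate noise and condition
        return self.net(x).view(-1, 3, 64, 64)

# Usage: one image conditioned on a face description.
emb = embed_caption("a young woman with long blonde hair and blue eyes")
z = torch.randn(1, 100)
fake_image = ConditionalGenerator()(z, emb)  # shape (1, 3, 64, 64)
```

Swapping a weaker encoder (e.g., averaged word2vec vectors) for BERT changes only embed_caption, which is what makes embedding quality an easy lever for improving text-to-image GANs; a MirrorGAN-style pipeline would further cascade generators and add a redescription (image-to-text) branch on top of this conditioning.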



Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University, Kingdom of Saudi Arabia, for funding this work through the Large Groups Project under Grant No. R.G.P 2/88/44.

Author information

Author notes
  1. Gaurav Sharma

    Present address: Department of CSE, Banasthali University, Jaipur, India

  2. Ashish Sharma and Gaurav Sharma have contributed equally to this work.

Authors and Affiliations

  1. Department of CSE, Manipal University Jaipur, Jaipur, India

    Ashish Sharma

  2. Information Systems Department, College of Computer Sciences, King Khalid University, Abha, Saudi Arabia

    Fatima A. Asiri

  3. Department of Electrical Engineering, College of Engineering, King Khalid University, Abha, Saudi Arabia

    Javed Khan Bhutto & Abdulwasa Bakr Barnawi

Authors
  1. Ashish Sharma
  2. Gaurav Sharma
  3. Fatima A. Asiri
  4. Javed Khan Bhutto
  5. Abdulwasa Bakr Barnawi

Corresponding authors

Correspondence to Ashish Sharma or Gaurav Sharma.

Ethics declarations

Conflict of interest

On behalf of all authors of the manuscript entitled "Optimized Mirror Generative Adversarial Network with BERT Neural Architecture for Text Caption to Image Conversion", the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection "Advanced Computing: Innovations and Applications" guest edited by Sanjay Madria, Parteek Bhatia, Priyanka Sharma and Deepak Garg.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sharma, A., Sharma, G., Asiri, F.A. et al. Optimized Mirror Generative Adversarial Network with BERT Neural Architecture for Text Caption to Image Conversion. SN COMPUT. SCI. 5, 334 (2024). https://doi.org/10.1007/s42979-024-02609-7
