
Optimized Mirror Generative Adversarial Network with BERT Neural Architecture for Text Caption to Image Conversion

  • Original Research
  • Published in SN Computer Science (2024)

Abstract

Generative Adversarial Networks (GANs) are a class of machine learning models designed for generative tasks such as creating realistic images, music, or text, and the field has advanced rapidly in the past few years. This paper surveys the principal GAN variants, including StackGAN, AttentionalGAN, MirrorGAN, and CycleGAN, and focuses on one comparatively unexplored application: generating a human face from a given text description. The paper also reviews the main text-embedding techniques, along with their advantages, disadvantages, and associated models. Finally, we show how model performance can be improved simply by using better embeddings or, in the case of MirrorGAN, by pre-training the model.
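As a concrete illustration of the abstract's central point, that the text encoder is a swappable component whose quality directly affects the generated image, the sketch below encodes a caption with a frozen pre-trained BERT and conditions a toy generator on the resulting [CLS] vector. This is a minimal sketch, not the authors' implementation: the ConditionalGenerator class, its layer sizes, and the 64x64 output resolution are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's architecture): condition a
# GAN generator on a BERT sentence embedding of the input caption.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()  # BERT stays frozen here; only the generator would be trained

def embed_caption(caption: str) -> torch.Tensor:
    """Return the 768-d [CLS] embedding of a caption from frozen BERT."""
    inputs = tokenizer(caption, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state[:, 0]  # shape (1, 768)

class ConditionalGenerator(nn.Module):
    """Toy conditional generator: [noise, text embedding] -> 64x64 RGB."""
    def __init__(self, z_dim: int = 100, text_dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 3 * 64 * 64),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        x = torch.cat([z, text_emb], dim=1)  # concatenate noise and condition
        return self.net(x).view(-1, 3, 64, 64)

# Usage: one image conditioned on a face description.
emb = embed_caption("a young woman with long blonde hair and blue eyes")
z = torch.randn(1, 100)
fake_image = ConditionalGenerator()(z, emb)  # shape (1, 3, 64, 64)
```

Swapping a weaker encoder (e.g., averaged word2vec vectors) for BERT changes only embed_caption, which is what makes embedding quality an easy lever for improving text-to-image GANs; a MirrorGAN-style pipeline would further cascade generators and add a redescription (image-to-text) branch on top of this conditioning.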



Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University, Kingdom of Saudi Arabia, for funding this work through the Large Groups Project under Grant No. R.G.P 2/88/44.

Author information

Author notes
  1. Gaurav Sharma

    Present address: Department of CSE, Banasthali University, Jaipur, India

  2. Ashish Sharma and Gaurav Sharma have contributed equally to this work.

Authors and Affiliations

  1. Department of CSE, Manipal University Jaipur, Jaipur, India

    Ashish Sharma

  2. Information Systems Department, College of Computer Sciences, King Khalid University, Abha, Saudi Arabia

    Fatima A. Asiri

  3. Department of Electrical Engineering, College of Engineering, King Khalid University, Abha, Saudi Arabia

    Javed Khan Bhutto & Abdulwasa Bakr Barnawi

Authors
  1. Ashish Sharma
  2. Gaurav Sharma
  3. Fatima A. Asiri
  4. Javed Khan Bhutto
  5. Abdulwasa Bakr Barnawi

Corresponding authors

Correspondence to Ashish Sharma or Gaurav Sharma.

Ethics declarations

Conflict of interest

On behalf of all authors of the manuscript entitled "Optimized Mirror Generative Adversarial Network with BERT Neural Architecture for Text Caption to Image Conversion", the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection "Advanced Computing: Innovations and Applications" guest edited by Sanjay Madria, Parteek Bhatia, Priyanka Sharma and Deepak Garg.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sharma, A., Sharma, G., Asiri, F.A. et al. Optimized Mirror Generative Adversarial Network with BERT Neural Architecture for Text Caption to Image Conversion. SN COMPUT. SCI. 5, 334 (2024). https://doi.org/10.1007/s42979-024-02609-7
