Image Captioning using Convolutional Neural Networks and Long Short Term Memory Cells
Hitoishi Das

Hitoishi Das*, Department of Computer Science and Engineering, ICFAI Foundation for Higher Education, Hyderabad (Telangana), India.

Manuscript received on 03 January 2022 | Revised Manuscript received on 18 April 2022 | Manuscript published on 30 May 2022 | PP: 91-95 | Volume-11 Issue-1, May 2022. | Retrieval Number: 100.1/ijrte.E67410110522 | DOI: 10.35940/ijrte.E6741.0511122
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license.

Abstract: This paper discusses an efficient approach to captioning a given image using a combination of a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells. Image captioning is a task at the intersection of deep learning and computer vision that deals with generating relevant captions for an input image. Research in this area includes tuning the hyperparameters of the CNN and the RNN so that the generated captions are as accurate as possible. In outline, the image is given as input to the CNN, which outputs a feature map; this feature map is then passed as input to the RNN, which outputs a sentence describing the image. Research in image captioning is relevant because it demonstrates the capability of an encoder-decoder network in which a CNN encodes the image and an RNN decodes it into text, and it may open many pathways for further research on different types of neural networks.
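The encoder-decoder data flow outlined in the abstract can be illustrated with a minimal, untrained sketch in plain Python, with no deep learning library. It is not the paper's model, and several choices are assumptions made only for illustration: a single convolutional layer with four random 3×3 filters, global average pooling of the feature maps into an image feature vector, that vector used to initialize the LSTM hidden state, a hypothetical eight-word vocabulary, and greedy word-by-word decoding. Because every weight is random, the emitted words are meaningless; the sketch only shows the shapes and steps of CNN encoding followed by LSTM decoding.

```python
import math
import random

# Hypothetical toy vocabulary (an assumption, not from the paper).
VOCAB = ["<start>", "<end>", "a", "dog", "cat", "on", "grass", "sitting"]

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image, as CNN layers compute it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def relu_map(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def cnn_encoder(image, kernels):
    """Encoder: one conv layer + ReLU, then global average pooling (one feature per filter)."""
    feature_maps = [relu_map(conv2d(image, k)) for k in kernels]
    features = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]
    return feature_maps, features

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""
    def __init__(self, input_dim, hidden_dim, rng):
        # One weight matrix per gate, applied to the concatenation [x, h].
        self.W = {k: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation [x, h]
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

def caption_image(image, max_len=6, seed=0):
    """Hypothetical pipeline: CNN features initialize the LSTM, which decodes greedily."""
    rng = random.Random(seed)
    kernels = [rand_matrix(3, 3, rng) for _ in range(4)]  # 4 filters -> 4 image features
    feature_maps, features = cnn_encoder(image, kernels)
    embed_dim, hidden_dim = 5, 8
    embeddings = {w: [rng.uniform(-1, 1) for _ in range(embed_dim)] for w in VOCAB}
    init_proj = rand_matrix(hidden_dim, len(features), rng)
    out_proj = rand_matrix(len(VOCAB), hidden_dim, rng)
    lstm = LSTMCell(embed_dim, hidden_dim, rng)
    h = [math.tanh(v) for v in matvec(init_proj, features)]  # image -> initial hidden state
    c = [0.0] * hidden_dim
    token, words = "<start>", []
    for _ in range(max_len):
        h, c = lstm.step(embeddings[token], h, c)
        logits = matvec(out_proj, h)
        # Greedy choice over every token except "<start>" (index 0).
        token = VOCAB[max(range(1, len(VOCAB)), key=lambda k: logits[k])]
        if token == "<end>":
            break
        words.append(token)
    return feature_maps, words

if __name__ == "__main__":
    image = [[(r * col) % 7 / 6.0 for col in range(8)] for r in range(8)]
    maps, words = caption_image(image)
    print(len(maps), len(maps[0]), len(maps[0][0]))  # 4 6 6
    print(" ".join(words))  # random, untrained output
```

With an 8×8 input and 3×3 filters, each "valid" feature map is 6×6. In a trained captioning model the filters, embeddings, and LSTM weights would be learned from image-caption pairs, and the greedy step is often replaced by beam search.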
Keywords: Captioning, Deep Learning, Encoders, Decoders, Convolutional Neural Networks, Recurrent Neural Networks, Computer Vision
Scope of the Article: Deep Learning