Paper Title
Image to Bengali Caption Generation Using Deep CNN and Bidirectional Gated Recurrent Unit
Paper Authors
Paper Abstract
There is very little notable research on generating image descriptions in the Bengali language. About 243 million people speak Bengali, making it the 7th most spoken language on the planet. The purpose of this research is to propose a CNN- and Bidirectional-GRU-based architecture that generates natural language captions in Bengali from an image. Bengali people can use this research to break the language barrier and better understand each other's perspectives. It will also help many blind people with their everyday lives. This paper uses an encoder-decoder approach to generate captions. We use a pre-trained deep convolutional neural network (DCNN), the InceptionV3 image-embedding model, as the encoder for analysis, classification, and annotation of the dataset's images, and a Bidirectional Gated Recurrent Unit (BGRU) layer as the decoder to generate captions. Argmax and beam search are used to produce the highest-quality captions. A new dataset called BNATURE is used, which comprises 8000 images with five captions per image. It is used for training and testing the proposed model. We obtained BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR scores of 42.6, 27.95, 23.66, 16.41, and 28.7, respectively.
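The abstract contrasts argmax (greedy) decoding with beam search for caption generation. As a minimal illustration of why beam search can find better captions than greedy decoding, the sketch below implements generic beam search over summed log-probabilities. The `step` function, toy vocabulary, and probabilities are hypothetical stand-ins for the decoder's next-token distribution, not the paper's actual model.

```python
import math
from typing import Callable, List, Tuple

def beam_search(step: Callable[[List[int]], List[Tuple[int, float]]],
                start: int, end: int, beam_width: int = 3,
                max_len: int = 10) -> List[int]:
    """Return the highest-scoring token sequence.

    `step(seq)` yields (token, probability) candidates for the next token.
    Sequences are scored by their cumulative log-probability, as is
    standard in caption decoding; beam_width=1 reduces to greedy argmax.
    """
    beams = [([start], 0.0)]                  # (sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end:                # finished captions carry over unchanged
                candidates.append((seq, score))
                continue
            for tok, p in step(seq):
                candidates.append((seq + [tok], score + math.log(p)))
        # keep only the `beam_width` best partial captions
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == end for seq, _ in beams):
            break
    return beams[0][0]

# Toy distribution (0 = <start>, 3 = <end>): the locally best first token (1)
# leads to a weaker full caption than the locally worse token (2).
probs = {
    (0,):   [(1, 0.6), (2, 0.4)],
    (0, 1): [(3, 0.3)],                       # total prob 0.6 * 0.3 = 0.18
    (0, 2): [(3, 0.9)],                       # total prob 0.4 * 0.9 = 0.36
}
step = lambda seq: probs[tuple(seq)]

greedy = beam_search(step, start=0, end=3, beam_width=1)  # argmax decoding
beam   = beam_search(step, start=0, end=3, beam_width=2)
print(greedy)  # [0, 1, 3]
print(beam)    # [0, 2, 3] -- the higher-probability caption overall
```

With `beam_width=1` the search commits to the locally best token and misses the globally better caption; a wider beam keeps both prefixes alive long enough to compare complete sequences.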