A Study of ConvNeXt Architectures for Enhanced Image Captioning
Abstract:
This study evaluates ConvNeXt, an advanced convolutional architecture for computer vision, as the feature extractor in an image captioning system. We integrated ConvNeXt with a Long Short-Term Memory (LSTM) network that includes a visual attention module and assessed its performance under several settings: different ConvNeXt versions for feature extraction, different learning rates during training, and training with and without teacher forcing. Experiments used the MS COCO 2014 dataset, with top-5 accuracy and BLEU as evaluation metrics. Using ConvNeXt in image captioning yielded notable performance gains. In BLEU-4 scores, ConvNeXt outperformed existing benchmarks by 43.04% for models using soft attention and by 39.04% for those using hard attention. ConvNeXt also surpassed models based on vision transformers and data-efficient image transformers by 4.57% and 0.93%, respectively, in BLEU-4 scores. Compared with systems using ResNet-101, ResNet-152, VGG-16, ResNeXt-101, and MobileNet V3 as encoders, ConvNeXt improved top-5 accuracy by 6.44%, 6.46%, 6.47%, 6.39%, and 6.68%, and reduced loss by 18.46%, 18.44%, 18.46%, 18.24%, and 18.72%, respectively.
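The abstract names top-5 accuracy as an evaluation metric but does not say how it was computed. The sketch below shows one common reading for caption decoders, assumed here rather than taken from the study: a prediction counts as correct when the reference word is among the five highest-scoring vocabulary entries at that decoding position. The function name `top_k_accuracy` and the toy scores are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int = 5,
) -> float:
    """Percentage of positions whose target index is among the k highest scores.

    Assumption: one row of vocabulary scores per decoding position, and one
    reference word index per position. This is not confirmed by the study.
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not targets:
        return 0.0
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores at this position.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if target in top_k:
            hits += 1
    return 100.0 * hits / len(targets)


# Toy data: 3 decoding positions over a hypothetical 6-word vocabulary.
toy_scores = [
    [0.1, 0.5, 0.2, 0.9, 0.3, 0.05],
    [0.7, 0.1, 0.6, 0.2, 0.3, 0.4],
    [0.2, 0.3, 0.1, 0.05, 0.8, 0.6],
]
toy_targets = [3, 1, 4]

if __name__ == "__main__":
    print(top_k_accuracy(toy_scores, toy_targets, k=5))
```

In the toy data the second position misses even at k=5, because its reference word has the lowest score of the six, so two of the three positions count as correct.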
Publication year:
2024
Keywords:
- Artificial intelligence
- Computer vision
- ConvNeXt
- Deep learning
- Image analysis
- Image captioning
- Image description generation
- Image understanding
- Natural language processing
Source:
Scopus
Document type:
Article
Status:
Restricted access
Knowledge areas:
- Computer vision
- Computer science
Dewey subject areas:
- Special computer methods
- Computer science
- Computer programming, programs, data, security
Sustainable Development Goals:
- SDG 4: Quality education
- SDG 17: Partnerships for the goals
- SDG 9: Industry, innovation and infrastructure