Unlocking language barriers: Assessing pre-trained large language models across multilingual tasks and unveiling the black box with Explainable Artificial Intelligence
Highlights
• Comprehensive assessment of cutting-edge pre-trained LLMs across six NLP tasks.
• Multilingual performance analysis of LLMs in diverse resource settings.
• Application of XAI to reveal LLMs' inner workings and enhance explainability.
Abstract
Large Language Models (LLMs) have revolutionized many industrial applications and opened new research directions across many fields. Conventional Natural Language Processing (NLP) techniques, for instance, are no longer necessary for many text-based tasks, including polarity estimation, sentiment and emotion classification, and hate speech detection. However, training a language model for domain-specific tasks is highly costly and requires substantial computational power, which restricts the practical use of such models for standard tasks. This study therefore provides a comprehensive analysis of the latest pre-trained LLMs, without fine-tuning, to evaluate their effectiveness on various NLP applications. Five language models are employed on six distinct NLP tasks (emotion recognition, sentiment analysis, hate speech detection, irony detection, offensiveness detection, and stance detection) across 12 languages spanning low-, medium-, and high-resource settings. Generative Pre-trained Transformer 4 (GPT-4) and Gemini Pro outperform state-of-the-art models, achieving average F1 scores of 70.6% and 68.8%, respectively, on the Tweet Sentiment Multilingual dataset, compared to the state-of-the-art average F1 score of 66.8%. The study further interprets the predictions of the LLMs using Explainable Artificial Intelligence (XAI). To the best of our knowledge, this is the first study to employ explainability on pre-trained language models.
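To make the evaluation protocol concrete, the sketch below shows one way a pre-trained LLM could be queried zero-shot (i.e., with no fine-tuning) for multilingual sentiment labels and scored with a macro-averaged F1. This is an illustrative assumption, not the authors' exact pipeline: the `client.complete` call, the prompt wording, and the label set are hypothetical stand-ins for whichever model API (e.g., GPT-4 or Gemini Pro) and dataset conventions are used.

```python
from sklearn.metrics import f1_score

LABELS = ["negative", "neutral", "positive"]  # assumed label set for sentiment

def classify_zero_shot(client, text: str) -> str:
    """Ask a pre-trained LLM to label one tweet, with no fine-tuning."""
    prompt = (
        "Classify the sentiment of the following tweet as one of "
        f"{', '.join(LABELS)}. Reply with the label only.\n\nTweet: {text}"
    )
    # `client.complete` is a hypothetical method; adapt to the SDK of the
    # model under evaluation (OpenAI, Gemini, etc.).
    reply = client.complete(prompt).strip().lower()
    return reply if reply in LABELS else "neutral"  # fallback for unparseable output

def evaluate(client, texts: list[str], gold: list[str]) -> float:
    """Score zero-shot predictions against gold labels with macro F1,
    one common choice for multi-class tweet sentiment benchmarks."""
    preds = [classify_zero_shot(client, t) for t in texts]
    return f1_score(gold, preds, average="macro", labels=LABELS)
```

Under this setup, running `evaluate` per language and averaging the resulting scores would yield a cross-lingual average F1 of the kind reported above.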