Unlocking language barriers: Assessing pre-trained large language models across multilingual tasks and unveiling the black box with Explainable Artificial Intelligence
Highlights
• Comprehensive assessment of cutting-edge pre-trained LLMs across six NLP tasks.
• Multilingual performance analysis of LLMs in diverse resource settings.
• Application of XAI to reveal LLMs' inner workings and enhance explainability.
Abstract
Large Language Models (LLMs) have revolutionized many industrial applications and opened new research directions across many fields. Conventional Natural Language Processing (NLP) techniques, for instance, are no longer necessary for many text-based tasks, including polarity estimation, sentiment and emotion classification, and hate speech detection. However, training a language model for domain-specific tasks is highly costly and requires substantial computational power, which restricts the practical use of such models for standard tasks. This study therefore provides a comprehensive analysis of the latest pre-trained LLMs, without fine-tuning, to evaluate their effectiveness on various NLP applications. Five language models are employed on six distinct NLP tasks (emotion recognition, sentiment analysis, hate speech detection, irony detection, offensiveness detection, and stance detection) across 12 languages spanning low-, medium-, and high-resource settings. Generative Pre-trained Transformer 4 (GPT-4) and Gemini Pro outperform state-of-the-art models, achieving average F1 scores of 70.6% and 68.8%, respectively, on the Tweet Sentiment Multilingual dataset, compared to the state-of-the-art average F1 score of 66.8%. The study further interprets the predictions of the LLMs using Explainable Artificial Intelligence (XAI). To the best of our knowledge, this is the first study to employ explainability on pre-trained language models.
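To make the evaluation protocol concrete, the sketch below shows one way a pre-trained LLM could be queried zero-shot (i.e., with no fine-tuning) for multilingual sentiment labels and scored with a macro-averaged F1. This is an illustrative assumption, not the authors' exact pipeline: the `client.complete` call, the prompt wording, and the label set are hypothetical stand-ins for whichever model API (e.g., GPT-4 or Gemini Pro) and dataset conventions are used.

```python
from sklearn.metrics import f1_score

LABELS = ["negative", "neutral", "positive"]  # assumed label set for sentiment

def classify_zero_shot(client, text: str) -> str:
    """Ask a pre-trained LLM to label one tweet, with no fine-tuning."""
    prompt = (
        "Classify the sentiment of the following tweet as one of "
        f"{', '.join(LABELS)}. Reply with the label only.\n\nTweet: {text}"
    )
    # `client.complete` is a hypothetical method; adapt to the SDK of the
    # model under evaluation (OpenAI, Gemini, etc.).
    reply = client.complete(prompt).strip().lower()
    return reply if reply in LABELS else "neutral"  # fallback for unparseable output

def evaluate(client, texts: list[str], gold: list[str]) -> float:
    """Score zero-shot predictions against gold labels with macro F1,
    one common choice for multi-class tweet sentiment benchmarks."""
    preds = [classify_zero_shot(client, t) for t in texts]
    return f1_score(gold, preds, average="macro", labels=LABELS)
```

Under this setup, running `evaluate` per language and averaging the resulting scores would yield a cross-lingual average F1 of the kind reported above.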