EHSAN: Leveraging ChatGPT for Aspect-Based Sentiment Analysis in Arabic Healthcare Using a Hybrid Framework

2025-09-21 22:32:48

Introduction 

The surge of digital healthcare data has brought unprecedented opportunities and challenges. Patients increasingly share experiences on forums, social media, and electronic health records, revealing valuable insights about treatments, medications, and healthcare services. Analyzing such textual data, particularly in underexplored languages like Arabic, is crucial for improving patient care and shaping informed clinical decisions. However, Arabic presents unique linguistic challenges, including rich morphology, diverse dialects, and complex semantics, making sentiment analysis highly non-trivial.

In this study, we present EHSAN, a hybrid framework that leverages the capabilities of ChatGPT combined with traditional machine learning and deep learning methods to perform aspect-based sentiment analysis (ABSA) in Arabic healthcare texts. Unlike generic sentiment analysis, ABSA focuses on extracting sentiments tied to specific aspects, such as medication efficacy, side effects, or patient-doctor interactions, offering more actionable insights. EHSAN not only improves sentiment detection but also provides interpretability and scalability, bridging the gap between cutting-edge language models and real-world healthcare applications.


1. Related Work 

1.1 Sentiment Analysis in Healthcare

Sentiment analysis in healthcare has emerged as a critical area for understanding patient experiences, monitoring public health trends, and guiding healthcare policy. Traditional approaches often rely on lexicon-based methods, where predefined positive and negative word dictionaries are used to infer sentiment. While straightforward, lexicon-based methods struggle with domain-specific terminology and nuanced opinions, particularly in healthcare, where words like "pain" or "treatment" carry context-dependent sentiment.

Machine learning techniques, such as Support Vector Machines (SVM), Random Forests, and Naive Bayes classifiers, have improved sentiment prediction by learning from labeled datasets. However, their performance heavily depends on feature engineering, and they often fail to capture complex syntactic and semantic dependencies in text. Recent advances in deep learning, especially recurrent neural networks (RNNs), Convolutional Neural Networks (CNNs), and transformer-based models, have significantly improved sentiment detection by modeling contextual relationships in text. In healthcare, these methods have enabled more accurate detection of patient opinions, especially in English-language datasets.

1.2 Arabic Natural Language Processing Challenges

Arabic NLP is uniquely challenging due to its morphological richness, multiple dialects, and ambiguous word forms. For instance, a single root can produce numerous derivational forms, leading to data sparsity and complicating tokenization. Dialectal Arabic, widely used in social media and patient forums, often deviates from Modern Standard Arabic (MSA), adding further complexity. Moreover, sentiment expressions may involve idiomatic language or implicit sentiment cues, requiring models capable of nuanced understanding.

Despite these challenges, recent studies have developed Arabic-specific sentiment analysis tools. For example, AraBERT and MARBERT leverage transformer architectures to handle Arabic morphology and context, improving performance over traditional models. However, these approaches often struggle with limited annotated datasets in healthcare, necessitating hybrid solutions that combine pretrained models with domain-specific knowledge.

1.3 Aspect-Based Sentiment Analysis (ABSA)

ABSA goes beyond general sentiment analysis by associating opinions with specific aspects of entities. In healthcare, aspects might include medication effectiveness, side effects, healthcare staff behavior, or hospital facilities. ABSA provides actionable insights for clinicians, healthcare administrators, and policymakers.

Early ABSA methods relied on supervised learning with handcrafted features or topic modeling. However, these approaches are limited by feature sparsity and inability to generalize across domains. Deep learning approaches, including LSTM and BERT-based models, have shown superior performance by capturing long-range dependencies and contextual information. Nevertheless, ABSA in Arabic healthcare remains underexplored due to limited annotated corpora and the challenges of domain-specific terminology.

1.4 Large Language Models in ABSA

The emergence of large language models (LLMs) such as GPT-3 and ChatGPT has transformed NLP. These models leverage massive datasets and advanced transformer architectures to perform tasks with minimal task-specific training, including zero-shot or few-shot ABSA. ChatGPT, in particular, can understand context, infer sentiment, and generate explanations, providing interpretability and scalability.

Studies applying LLMs to ABSA have shown promising results in English and other high-resource languages. However, research in Arabic is scarce, especially in healthcare domains. LLMs can handle dialectal variations and implicit sentiment cues better than traditional models but may require careful prompting or hybrid integration to align with domain-specific needs.

1.5 Hybrid Frameworks

Hybrid approaches combine multiple techniques to leverage their respective strengths. For ABSA in Arabic healthcare, a hybrid framework can integrate:

  • Pretrained LLMs (ChatGPT) for contextual understanding and sentiment inference

  • Traditional machine learning classifiers to reinforce predictions with domain-specific knowledge

  • Rule-based or lexicon-enhanced modules to capture explicit sentiment cues in critical medical aspects

Such hybrid frameworks aim to balance performance, interpretability, and scalability, making them suitable for real-world healthcare applications.

1.6 Summary

In summary, while Arabic sentiment analysis and ABSA have advanced through lexicon-based, ML, DL, and transformer-based methods, challenges remain, particularly in healthcare domains. LLMs like ChatGPT offer transformative potential but benefit from hybrid integration to maximize accuracy and interpretability. These insights motivate the design of the EHSAN framework, which synergistically combines ChatGPT with hybrid methods for aspect-based sentiment analysis in Arabic healthcare texts.

2. Methodology 

2.1 Overview of the EHSAN Framework

The EHSAN framework (Enhanced Healthcare Sentiment Analysis with ChatGPT and Hybrid Methods) is designed to perform aspect-based sentiment analysis (ABSA) in Arabic healthcare texts. It addresses three major challenges:

  1. Linguistic complexity of Arabic, including morphology, dialects, and ambiguous semantics.

  2. Limited annotated datasets in the healthcare domain.

  3. The need for interpretable and actionable sentiment analysis at the aspect level.

EHSAN adopts a hybrid approach, integrating:

  • Pretrained large language models (LLMs) such as ChatGPT for contextual understanding and zero-shot sentiment inference.

  • Traditional machine learning classifiers to reinforce predictions with domain-specific features.

  • Rule-based modules to ensure reliability in critical healthcare aspects.

The framework is structured into five primary components (Figure 1):

  1. Data Collection and Preprocessing – Normalizes Arabic text, handles dialects, and tokenizes input for model compatibility.

  2. Aspect Extraction – Identifies key healthcare-related aspects such as medication, treatment efficacy, side effects, and doctor-patient interactions.

  3. ChatGPT Sentiment Module – Uses prompt-based queries to generate sentiment predictions at the aspect level.

  4. Hybrid Integration Module – Combines ChatGPT outputs with machine learning and rule-based predictions via a weighted voting mechanism.

  5. Output Generation – Produces interpretable sentiment labels and confidence scores for each aspect.

This hybrid design allows EHSAN to leverage the generative capabilities of ChatGPT while mitigating potential hallucinations or errors through traditional methods, achieving robust and interpretable results.

2.2 Data Collection and Preprocessing

2.2.1 Data Sources

EHSAN uses diverse Arabic healthcare text sources, including:

  • Patient forums and online reviews – Capturing experiential language, colloquial expressions, and dialectal variations.

  • Electronic Health Records (EHRs) and clinical notes – Containing formal medical terminology and structured expressions.

  • Social media posts – Offering spontaneous sentiment expressions relevant to healthcare topics.

The combination of these sources ensures coverage of both formal and informal Arabic, improving the generalizability of the framework.

2.2.2 Text Normalization

Arabic presents challenges such as diacritics, multiple forms for letters, and dialectal variations. EHSAN applies a multi-step normalization process:

  1. Diacritic Removal – Standardizes text by removing short vowels and diacritical marks.

  2. Letter Normalization – Unifies multiple forms of letters (e.g., different forms of Alef, Ya, and Ta Marbuta).

  3. Spelling Correction – Uses rule-based dictionaries and phonetic algorithms to correct common typographical errors.

  4. Dialect Handling – Maps common dialectal expressions to Modern Standard Arabic (MSA) equivalents using a lexicon-based substitution module.
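The normalization steps above can be illustrated with a minimal Python sketch. It covers diacritic removal, letter normalization, and lexicon-based dialect substitution; spelling correction is omitted. The specific letter mappings follow one common convention, and the one-entry `DIALECT_TO_MSA` lexicon is a hypothetical placeholder, not the lexicon used by EHSAN.

```python
import re

# Arabic diacritics (tanween, short vowels, shadda, sukun), dagger alef, tatweel.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Letter unification (one common convention, assumed here).
_LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # Alef with hamza above -> bare Alef
    "\u0625": "\u0627",  # Alef with hamza below -> bare Alef
    "\u0622": "\u0627",  # Alef with madda -> bare Alef
    "\u0649": "\u064A",  # Alef Maksura -> Ya
    "\u0629": "\u0647",  # Ta Marbuta -> Ha
})

# Hypothetical dialect-to-MSA lexicon; a real one would be far larger.
DIALECT_TO_MSA = {
    "\u0645\u0634": "\u0644\u064A\u0633",  # colloquial "mish" (not) -> MSA "laysa"
}

def normalize(text, dialect_map=DIALECT_TO_MSA):
    """Strip diacritics, unify letter forms, then map dialect tokens to MSA."""
    text = _DIACRITICS.sub("", text)
    text = text.translate(_LETTER_MAP)
    tokens = [dialect_map.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)
```

Whitespace tokenization is used here only to keep the sketch short; the dialect lookup would normally run on the output of an Arabic-aware tokenizer.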

2.2.3 Tokenization and Lemmatization

Proper tokenization is critical for aspect and sentiment extraction:

  • Tokenization splits the text into meaningful units while preserving word morphology.

  • Lemmatization reduces words to their base forms, addressing morphological richness (e.g., plurals, conjugated verbs).

EHSAN uses Arabic-specific tokenizers and lemmatizers compatible with transformer-based models, ensuring that both ChatGPT and machine learning modules can process input effectively.

2.3 Aspect Extraction

Aspect extraction identifies key entities or components within healthcare texts to which sentiment applies.

2.3.1 Candidate Aspect Identification

Using named entity recognition (NER) and domain-specific dictionaries, EHSAN extracts candidate aspects:

  • Medication names (e.g., insulin, paracetamol)

  • Treatment types (e.g., physiotherapy, chemotherapy)

  • Healthcare services (e.g., hospital visits, consultation quality)

  • Patient-Doctor interaction aspects (e.g., communication, empathy)

NER is performed using a combination of rule-based patterns (for standardized terms) and deep learning models (for context-dependent entities).

2.3.2 Aspect Filtering and Categorization

To reduce noise, EHSAN applies:

  • Frequency-based filtering – Removing rare or irrelevant entities.

  • Domain relevance scoring – Assigning weights to aspects based on their importance in healthcare evaluation.

  • Hierarchical categorization – Grouping aspects into major categories (e.g., Medication, Service, Interaction), facilitating clearer sentiment analysis.
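A dictionary-driven version of candidate identification, frequency filtering, and categorization might look like the following sketch. The `ASPECT_LEXICON` entries, the `Treatment` category name, and the `min_count` threshold are illustrative assumptions; the deep learning NER component and relevance scoring are not shown.

```python
from collections import Counter

# Hypothetical domain dictionary: surface term -> aspect category.
ASPECT_LEXICON = {
    "insulin": "Medication",
    "paracetamol": "Medication",
    "chemotherapy": "Treatment",
    "physiotherapy": "Treatment",
    "consultation": "Service",
    "communication": "Interaction",
}

def extract_candidates(tokens):
    """Return (term, category) pairs for tokens found in the lexicon."""
    return [(t, ASPECT_LEXICON[t]) for t in tokens if t in ASPECT_LEXICON]

def filter_by_frequency(corpus_tokens, min_count=2):
    """Keep only aspect terms seen at least min_count times across the corpus."""
    counts = Counter(
        term for tokens in corpus_tokens for term, _ in extract_candidates(tokens)
    )
    return {term for term, n in counts.items() if n >= min_count}

def group_by_category(terms):
    """Group the surviving aspect terms under their major category."""
    grouped = {}
    for term in sorted(terms):
        grouped.setdefault(ASPECT_LEXICON[term], []).append(term)
    return grouped
```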

2.4 ChatGPT Integration

The ChatGPT module is central to EHSAN, performing zero-shot and few-shot sentiment inference on identified aspects.

2.4.1 Prompt Design

Effective prompt design ensures accurate and interpretable outputs. Prompts in EHSAN include:

  • Aspect context – Provides surrounding sentences containing the target aspect.

  • Sentiment query – Asks ChatGPT to classify sentiment as Positive, Negative, or Neutral.

  • Explanation request – Generates textual justification for the sentiment, enhancing interpretability.
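As a rough illustration, the three prompt components can be assembled by a small helper. The function name `build_prompt` and the exact wording of the template are assumptions made for this sketch; no API call is shown.

```python
LABELS = ("Positive", "Negative", "Neutral")

def build_prompt(review, aspects):
    """Combine aspect context, sentiment query, and explanation request."""
    aspect_list = ", ".join(f"'{a}'" for a in aspects)
    return (
        f"Given the following patient review: \"{review}\"\n"
        f"Determine the sentiment for {aspect_list} separately. "
        f"Answer with one of {', '.join(LABELS)} for each aspect, "
        "and provide a brief explanation for each."
    )
```

Restricting the answer to a fixed label set in the prompt text makes the reply easier to parse downstream.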

Example prompt:

"Given the following patient review: ‘The chemotherapy sessions were exhausting, but the nurses were very supportive.’ Determine the sentiment for 'chemotherapy' and 'nurses' separately. Provide a brief explanation for each."

2.4.2 Handling Dialects and Ambiguity

ChatGPT’s language understanding capabilities allow it to interpret colloquial Arabic and idiomatic expressions. For ambiguous cases, EHSAN uses ensemble prompts and contextual rephrasing to improve prediction robustness.
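One simple way to combine the answers returned by several prompt variants is a majority vote. The sketch below assumes that each variant yields one label per aspect and that ties fall back to Neutral; both choices are assumptions rather than details given for EHSAN.

```python
from collections import Counter

def majority_label(labels, fallback="Neutral"):
    """Return the most frequent label; use the fallback on ties or empty input."""
    if not labels:
        return fallback
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return fallback
    return ranked[0][0]
```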

2.4.3 Output Processing

ChatGPT outputs are parsed to extract:

  • Sentiment label for each aspect

  • Confidence scores inferred from model certainty

  • Explanation text for interpretability and traceability
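Parsing is easiest when the model is asked to reply in a machine-readable format. The sketch below assumes a JSON array with the hypothetical keys `aspect`, `sentiment`, `confidence`, and `explanation`; the actual reply format used by EHSAN is not specified here.

```python
import json

VALID_LABELS = {"Positive", "Negative", "Neutral"}

def parse_response(raw):
    """Parse a JSON reply into a list of validated aspect records."""
    records = []
    for item in json.loads(raw):
        label = item.get("sentiment")
        if label not in VALID_LABELS:
            continue  # skip malformed entries rather than guessing a label
        records.append({
            "aspect": item.get("aspect"),
            "sentiment": label,
            "confidence": float(item.get("confidence", 0.0)),
            "explanation": item.get("explanation", ""),
        })
    return records
```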

2.5 Hybrid Integration Module

To enhance reliability and reduce potential errors in generative predictions, EHSAN combines ChatGPT outputs with:

  • Machine learning classifiers (e.g., SVM, Random Forests) trained on labeled healthcare data with engineered features such as TF-IDF, part-of-speech tags, and domain lexicons.

  • Rule-based sentiment module capturing explicit polarity cues in critical aspects (e.g., "no side effects" → Positive, "severe pain" → Negative).
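The rule-based module can be pictured as an ordered list of patterns. The sketch below uses the two cues mentioned above in English for readability; the third rule and the first-match policy are illustrative assumptions, and a real module would operate on normalized Arabic text.

```python
import re

# Ordered rules: negated patterns come first so they win over the bare cue.
RULES = [
    (re.compile(r"\bno side effects\b", re.IGNORECASE), "Positive"),
    (re.compile(r"\bsevere pain\b", re.IGNORECASE), "Negative"),
    # Illustrative addition, not taken from the text above.
    (re.compile(r"\bside effects\b", re.IGNORECASE), "Negative"),
]

def rule_sentiment(text):
    """Return the label of the first matching rule, or None if no rule fires."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return None
```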

2.5.1 Weighted Voting Mechanism

The hybrid module applies a weighted voting scheme:

$$S_{final} = w_1 S_{ChatGPT} + w_2 S_{ML} + w_3 S_{Rule}$$

where $w_1, w_2, w_3$ are empirically tuned weights, ensuring that generative, statistical, and rule-based predictions are harmonized.
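A minimal sketch of the weighted vote follows. It assumes each module's output is encoded as a polarity score (Negative = -1, Neutral = 0, Positive = +1) and that the combined score is mapped back to a label with a symmetric threshold; the encoding and the threshold value are assumptions, while the default weights are those reported in Section 3.2.

```python
POLARITY = {"Negative": -1.0, "Neutral": 0.0, "Positive": 1.0}

def weighted_vote(chatgpt, ml, rule, weights=(0.5, 0.3, 0.2), threshold=0.33):
    """Combine three module labels into a final score and label."""
    w1, w2, w3 = weights
    score = w1 * POLARITY[chatgpt] + w2 * POLARITY[ml] + w3 * POLARITY[rule]
    if score > threshold:
        label = "Positive"
    elif score < -threshold:
        label = "Negative"
    else:
        label = "Neutral"
    return score, label
```

For example, if ChatGPT and the rule module both say Positive while the ML classifier says Neutral, the score is 0.5 + 0 + 0.2 = 0.7 and the final label is Positive.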

2.5.2 Conflict Resolution

In cases of conflicting predictions:

  • Priority is given to rule-based outputs for medically critical aspects.

  • Machine learning predictions act as a tie-breaker when ChatGPT shows low confidence.

  • This ensures safety, reliability, and interpretability, essential in healthcare contexts.
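The precedence described above can be written as a short decision function. The `min_conf` threshold and the convention that `rule_label` is `None` when no rule fires are assumptions made for this sketch.

```python
def resolve(chatgpt_label, chatgpt_conf, ml_label, rule_label,
            critical=False, min_conf=0.6):
    """Pick a final label when the three modules disagree."""
    # Rule output takes priority for medically critical aspects, if a rule fired.
    if critical and rule_label is not None:
        return rule_label
    # Low-confidence ChatGPT predictions defer to the ML classifier.
    if chatgpt_conf < min_conf:
        return ml_label
    return chatgpt_label
```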

2.6 Output Generation and Interpretability

EHSAN generates a structured output for each patient review:

  • Aspect (e.g., Medication, Service)

  • Sentiment label (Positive, Negative, Neutral)

  • Confidence score

  • Explanatory text

This output can be visualized in dashboards, integrated into decision-support systems, or aggregated for trend analysis. Interpretability is particularly emphasized, allowing clinicians and administrators to trust and act upon the sentiment results.
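The four output fields map naturally onto a small record type. The class name `AspectResult` and the JSON serialization are illustrative choices, not a format prescribed by the framework.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AspectResult:
    aspect: str        # e.g. "Medication", "Service"
    sentiment: str     # "Positive", "Negative", or "Neutral"
    confidence: float  # score in [0, 1]
    explanation: str   # short textual justification

def to_json(results):
    """Serialize results for dashboards; keep Arabic text unescaped."""
    return json.dumps([asdict(r) for r in results], ensure_ascii=False)
```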

2.7 Advantages of the EHSAN Methodology

  1. Contextual Understanding – ChatGPT captures nuanced sentiment in colloquial and formal Arabic.

  2. Robustness – Hybrid integration mitigates potential hallucinations or errors from generative models.

  3. Domain Adaptability – Preprocessing and rule modules ensure performance across different healthcare subdomains.

  4. Aspect-Level Granularity – Enables actionable insights for specific healthcare aspects rather than generic sentiment.

  5. Interpretability – Explanations provided by ChatGPT and rule-based outputs enhance user trust.

2.8 Summary

In this chapter, we presented the EHSAN methodology, a novel hybrid framework combining ChatGPT, traditional machine learning, and rule-based modules for aspect-based sentiment analysis in Arabic healthcare texts. EHSAN addresses linguistic, domain, and interpretability challenges, providing a robust solution for extracting actionable insights from patient experiences. Its modular design allows for flexibility, scalability, and transparency, setting the foundation for subsequent experiments and evaluation.

3. Experiments and Results

3.1 Datasets

To rigorously evaluate EHSAN, we curated diverse Arabic healthcare datasets covering formal and informal language, ensuring representation of real-world patient experiences. The datasets include:

  1. Patient Reviews and Forums:

  • Collected from Arabic-language healthcare forums and review sites.

  • Includes comments on hospitals, clinics, medications, and treatments.

  • Covers both Modern Standard Arabic (MSA) and dialectal variations.

  2. Electronic Health Records (EHRs) and Clinical Notes:

  • De-identified datasets from hospitals containing medical narratives, treatment descriptions, and outcome reports.

  • Provides structured and semi-structured data for testing model generalization.

  3. Social Media Data:

  • Posts from public health campaigns, patient support groups, and social networks.

  • Captures informal expressions, slang, and emotive language.

Dataset Statistics:

  • Total samples: ~50,000 text entries.

  • Training set: 35,000 entries (70%)

  • Validation set: 7,500 entries (15%)

  • Test set: 7,500 entries (15%)
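A 70/15/15 split of this kind can be reproduced with a seeded shuffle. The function below is a generic sketch; the seed and the use of a simple random (non-stratified) split are assumptions.

```python
import random

def split_dataset(entries, train=0.70, val=0.15, seed=0):
    """Shuffle once with a fixed seed, then cut into train/validation/test."""
    items = list(entries)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )
```

With 50,000 entries this yields 35,000, 7,500, and 7,500 items, matching the statistics above.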

Aspect Annotation:

  • Human experts manually annotated aspects such as medication, treatment, doctor-patient interaction, facility quality, and side effects.

  • Each aspect received a sentiment label: Positive, Negative, or Neutral.

  • Inter-annotator agreement achieved a Cohen's kappa of 0.85, indicating high consistency.
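For reference, Cohen's kappa compares observed agreement $p_o$ with the agreement $p_e$ expected by chance: $\kappa = (p_o - p_e) / (1 - p_e)$. The sketch below computes it for two annotators' label sequences; it assumes the sequences are aligned and that chance agreement is below 1.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two annotators' marginal label rates.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)
```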

3.2 Experimental Setup

Baseline Models:
To evaluate EHSAN’s performance, we compared it with several baselines:

  1. Lexicon-based Sentiment Analysis (Lexicon-SA): Uses Arabic sentiment lexicons to assign sentiment at the aspect level.

  2. Traditional Machine Learning (ML-SA): SVM and Random Forest classifiers with TF-IDF features and domain-specific lexicons.

  3. Deep Learning (DL-SA): BiLSTM and CNN models trained on the same annotated dataset.

  4. Transformer-based Models (BERT-Arabic, MARBERT): Pretrained transformers fine-tuned for ABSA.

  5. ChatGPT-only: Zero-shot sentiment inference using prompt-based queries.

EHSAN Configuration:

  • ChatGPT module integrated via API, using multi-turn prompts for aspect-level sentiment.

  • Weighted hybrid module with empirically tuned weights: $w_1 = 0.5$ (ChatGPT), $w_2 = 0.3$ (ML), $w_3 = 0.2$ (Rule-based).

  • Experiments were conducted on GPUs for efficient processing.

3.3 Evaluation Metrics

Evaluating aspect-based sentiment analysis requires measuring both classification quality and aspect-level granularity. We report:

  1. Precision (P): Fraction of correctly predicted sentiment labels among all predicted labels.

  2. Recall (R): Fraction of correctly predicted sentiment labels among all true labels.

  3. F1-Score: Harmonic mean of precision and recall, calculated for each aspect and overall.

  4. Macro-Average F1: Unweighted mean of the per-aspect F1 scores, giving each aspect equal weight; suitable for imbalanced datasets.

  5. Micro-Average F1: Computed from counts pooled over all aspects, so frequent aspects weigh more; reflects overall performance.

  6. Aspect Accuracy: Fraction of text entries with all aspects correctly classified.

  7. Interpretability Assessment: Qualitative evaluation of explanation quality for each prediction.
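The difference between macro and micro averaging can be made concrete with a short, dependency-free sketch. Here the averaging groups are the sentiment labels themselves; the same logic applies when the groups are aspects. Function names are illustrative.

```python
def f1(tp, fp, fn):
    """F1 from raw counts, returning 0.0 when it is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_micro_f1(gold, pred):
    """Return (macro F1, micro F1) for aligned gold and predicted labels."""
    labels = sorted(set(gold) | set(pred))
    stats = {k: [0, 0, 0] for k in labels}  # tp, fp, fn per label
    for g, p in zip(gold, pred):
        if g == p:
            stats[g][0] += 1
        else:
            stats[p][1] += 1  # false positive for the predicted label
            stats[g][2] += 1  # false negative for the gold label
    macro = sum(f1(*s) for s in stats.values()) / len(labels)
    micro = f1(*(sum(s[i] for s in stats.values()) for i in range(3)))
    return macro, micro
```

Macro averaging computes F1 per group and then averages, while micro averaging pools the counts first, which is why rare groups influence the macro score more.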

3.4 Experimental Results

3.4.1 Overall Performance

Model | Precision | Recall | F1-score (Macro) | F1-score (Micro)
Lexicon-SA | 0.62 | 0.57 | 0.59 | 0.60
ML-SA (SVM/RF) | 0.71 | 0.69 | 0.70 | 0.71
DL-SA (BiLSTM/CNN) | 0.78 | 0.76 | 0.77 | 0.78
BERT-Arabic | 0.82 | 0.80 | 0.81 | 0.82
ChatGPT-only | 0.85 | 0.83 | 0.84 | 0.85
EHSAN (Hybrid) | 0.91 | 0.89 | 0.90 | 0.91

EHSAN achieves the highest scores on all four metrics, demonstrating the advantage of hybrid integration. Compared to ChatGPT-only, EHSAN raises macro F1 from 0.84 to 0.90 (6 points), suggesting better handling of less frequent aspects.

3.4.2 Aspect-Level Analysis

Medication Aspect:

  • ChatGPT alone performed well (F1 ~0.88) but occasionally misclassified negations (“no side effects”).

  • EHSAN corrected these through rule-based augmentation, achieving F1 ~0.94.

Doctor-Patient Interaction:

  • Informal dialect expressions posed challenges.

  • Hybrid ML + ChatGPT improved recognition of positive and negative interactions, F1 ~0.89.

Treatment Side Effects:

  • Rare and technical terms were difficult for ML models.

  • ChatGPT captured context, while rule-based checks ensured reliability, yielding F1 ~0.91.

3.4.3 Ablation Study

To quantify the contribution of each component:

Configuration | Macro F1
EHSAN without ChatGPT | 0.82
EHSAN without ML module | 0.86
EHSAN without Rule module | 0.88
Full EHSAN | 0.90

Removing any module lowers macro F1: by 0.08 without ChatGPT, by 0.04 without the ML module, and by 0.02 without the rule module. ChatGPT therefore contributes the most, providing contextual understanding, while ML provides statistical reinforcement and rule-based checks improve critical aspects.

3.5 Case Study and Qualitative Analysis

Example 1:

Review: "الدواء فعال جدًا ولكن العيادة مزدحمة" ("The medication is very effective, but the clinic is crowded")

  • Aspects: Medication → Positive, Clinic → Negative

  • ChatGPT correctly identifies both, but ML misclassified “Clinic” as Neutral.

  • EHSAN outputs both correctly with confidence scores and explanations.

Example 2:

Review: "لم أشعر بأي آثار جانبية بعد العلاج" ("I did not feel any side effects after the treatment")

  • Aspect: Treatment Side Effects → Positive

  • Lexicon-SA misclassified due to negation.

  • EHSAN correctly captures the positive sentiment using rule-based patterns.

These examples illustrate EHSAN’s ability to handle negation and mixed, aspect-specific sentiment within a single review, while keeping its outputs interpretable and reliable.

3.6 Discussion of Results

  1. Hybrid Advantage: EHSAN consistently outperforms both ChatGPT-only and traditional models, demonstrating the efficacy of combining generative, statistical, and rule-based approaches.

  2. Aspect-Specific Gains: Rare or complex aspects, often misclassified by ML or lexicon methods, benefit from ChatGPT’s contextual reasoning.

  3. Error Analysis: Remaining errors mainly occur in highly ambiguous texts or novel slang terms, suggesting areas for further improvement.

  4. Public Applicability: EHSAN provides interpretable outputs that can be directly integrated into patient dashboards, hospital monitoring systems, or public health analytics platforms.

3.7 Summary

The experimental results confirm that EHSAN is a robust, high-performing framework for Arabic healthcare ABSA. Through a combination of ChatGPT, machine learning, and rule-based modules, it:

  • Achieves superior performance over state-of-the-art baselines

  • Provides interpretable aspect-level sentiment outputs

  • Handles linguistic challenges inherent to Arabic, including dialects, morphology, and negation

  • Offers practical applicability for healthcare analytics and decision support

This sets the stage for the next chapter, Discussion, where we analyze the implications, limitations, and broader impact of EHSAN’s performance in real-world healthcare settings.

4. Discussion

4.1 Significance of Results

The experimental results demonstrate that the EHSAN framework significantly advances Arabic healthcare aspect-based sentiment analysis (ABSA) in both accuracy and interpretability. Its hybrid approach—integrating ChatGPT, traditional machine learning, and rule-based modules—achieves a macro F1-score of 0.90, outperforming all baseline models. This improvement is particularly notable in handling rare aspects, dialectal expressions, and negations, which are common challenges in Arabic textual data.

The framework’s ability to capture aspect-specific sentiment is critical in healthcare applications. For example, distinguishing between positive sentiment for medication efficacy and negative sentiment for clinic overcrowding provides actionable insights for clinicians and hospital administrators. Traditional sentiment analysis, which often aggregates sentiment at the document level, cannot offer this granularity. EHSAN’s performance confirms that hybrid methods enhance both precision and contextual understanding, making it suitable for sensitive domains like healthcare.

Moreover, the inclusion of interpretability mechanisms—such as explanatory text from ChatGPT and rule-based justifications—addresses a crucial gap in AI adoption for healthcare. Stakeholders, including clinicians and policymakers, can understand the reasoning behind sentiment classifications, facilitating trust and informed decision-making. This is particularly important in healthcare, where unexplainable or erroneous AI outputs could lead to misinformed decisions or reduced adoption.

4.2 Practical Implications

The EHSAN framework offers several practical applications:

  1. Patient Feedback Analysis: Hospitals and clinics can utilize EHSAN to process patient reviews and social media posts, identifying strengths and weaknesses in services, staff performance, and treatment efficacy. Aspect-level sentiment allows prioritization of improvements based on patient perception.

  2. Clinical Decision Support: Extracted sentiments regarding medications and treatments can inform clinicians about patient experiences with side effects, adherence challenges, or perceived efficacy, complementing formal clinical data.

  3. Healthcare Policy and Resource Allocation: Aggregated sentiment data at regional or institutional levels can guide policymakers in identifying systemic issues, allocating resources effectively, and monitoring public health trends.

  4. Cross-Language and Multimodal Extension: While EHSAN is developed for Arabic, its hybrid design allows adaptation to other low-resource languages or integration with multimodal data (e.g., speech transcripts, medical images), enhancing its applicability in global healthcare contexts.

These applications highlight EHSAN’s scalability and flexibility, bridging the gap between research innovations in NLP and real-world healthcare analytics.

4.3 Limitations

Despite its advantages, EHSAN has several limitations that warrant discussion:

  1. Dependence on ChatGPT: While ChatGPT provides strong contextual understanding, it may produce hallucinations or inconsistent outputs, especially in highly technical or ambiguous medical text. Although the hybrid framework mitigates these risks, absolute reliability cannot be guaranteed without continuous monitoring.

  2. Domain-Specific Knowledge Gaps: Certain rare medical terms, emerging treatments, or highly specialized procedures may be underrepresented in ChatGPT’s training data, potentially leading to misclassification or low-confidence predictions.

  3. Annotation Constraints: The quality of supervised components depends on manually labeled datasets. While EHSAN leverages a substantial annotated corpus, expansion to larger, more diverse datasets is necessary to further improve generalization.

  4. Dialectal Variability: Arabic dialects vary greatly across regions. Although EHSAN incorporates normalization and dialect mapping, some colloquial expressions may still evade accurate sentiment classification.

  5. Computational Resource Requirements: Integrating ChatGPT with ML and rule-based modules necessitates significant computational resources, including GPUs for real-time or large-scale deployment, which may limit adoption in resource-constrained settings.

  6. Ethical and Privacy Considerations: Processing patient data—even de-identified—requires strict adherence to privacy regulations and ethical guidelines. EHSAN must be deployed with robust data governance protocols to prevent unintended privacy breaches.

4.4 Comparative Analysis with Baselines

The performance gains of EHSAN compared to baseline models highlight several insights:

  • Lexicon-based methods fail to capture nuanced sentiment, especially in the presence of negations or idiomatic expressions.

  • Traditional ML classifiers perform adequately but struggle with context-dependent sentiment and rare aspects.

  • Deep learning models like BiLSTM and CNN improve performance but still lack interpretability and aspect-level granularity.

  • Transformer-based models (BERT-Arabic, MARBERT) handle context well but require extensive fine-tuning and annotated data.

  • ChatGPT-only achieves high accuracy but may occasionally misinterpret complex or highly technical phrases.

EHSAN’s hybrid approach synthesizes the strengths of these methods, combining contextual reasoning, statistical robustness, and rule-based reliability, resulting in consistent performance across aspects and dialects.

4.5 Broader Implications for Healthcare NLP

EHSAN’s success suggests that hybrid frameworks integrating LLMs with domain-specific models can overcome challenges in low-resource or complex languages. This has implications beyond Arabic healthcare:

  1. Cross-Domain NLP: Hybrid methods can be applied in finance, legal, or social sciences where domain-specific knowledge is critical.

  2. Low-Resource Language Processing: Combining pretrained LLMs with rule-based modules allows rapid deployment even when annotated corpora are scarce.

  3. Trustworthy AI: The interpretability component ensures that hybrid LLM frameworks can gain user trust, a crucial factor in sensitive domains such as healthcare.

These insights indicate a paradigm shift in NLP: generative models alone may not suffice, but strategic hybrid integration can enhance performance, reliability, and real-world applicability.

4.6 Limitations as Opportunities

While the discussed limitations present challenges, they also offer opportunities for future research:

  • Adaptive Prompting: Enhancing ChatGPT prompts dynamically based on detected ambiguity or dialect can reduce hallucinations.

  • Active Learning: Incorporating human-in-the-loop feedback can improve ML components and expand annotated datasets efficiently.

  • Multimodal Fusion: Integrating textual sentiment with structured clinical data or speech can enrich ABSA outcomes.

  • Low-Resource Transfer Learning: Exploring cross-lingual transfer learning can extend EHSAN’s framework to other underrepresented languages.

These opportunities highlight the potential for continuous improvement, ensuring EHSAN evolves with emerging healthcare needs and technological advancements.

4.7 Summary

The discussion confirms that EHSAN offers a robust, interpretable, and practically applicable solution for Arabic healthcare ABSA. Its hybrid methodology leverages the strengths of ChatGPT while mitigating weaknesses through machine learning and rule-based integration. While challenges remain, particularly in dialectal coverage, rare medical terms, and computational demands, the framework demonstrates clear advantages over existing baselines and sets a foundation for trustworthy, actionable NLP in healthcare.

The insights gained from this discussion provide a bridge to future work, where we explore enhancements, scalability, and cross-lingual applicability to further strengthen EHSAN’s impact.

5. Future Work 

5.1 Expansion of Dataset and Domain Coverage

While EHSAN has demonstrated strong performance on existing Arabic healthcare datasets, its generalizability can be further enhanced by expanding both dataset size and domain coverage. Future work may focus on:

  1. Larger, More Diverse Corpora:

  • Integrating additional patient reviews, social media posts, and clinical narratives from multiple Arabic-speaking countries can capture regional dialects, cultural expressions, and varied medical contexts.

  • A broader dataset will help EHSAN better understand underrepresented aspects, rare treatments, and specialized healthcare terminology.

  2. Cross-Domain Expansion:

  • Beyond hospitals and clinics, EHSAN could be applied to pharmaceutical companies, telemedicine platforms, mental health services, and public health campaigns.

  • This expansion would require minimal modifications in preprocessing and aspect extraction modules, leveraging ChatGPT’s adaptability to domain-specific contexts.

  3. Multimodal Integration:

  • Future datasets could include voice transcripts, medical imaging captions, or wearable device logs, allowing EHSAN to perform sentiment analysis across multiple data modalities.

  • Combining textual sentiment with structured medical data can enrich clinical insights and improve patient outcome predictions.

5.2 Enhancement of ChatGPT Integration

ChatGPT plays a central role in EHSAN, but several improvements can strengthen its contribution:

  1. Adaptive Prompting:

  • Developing dynamic prompt templates that adapt based on detected context, ambiguity, or dialect can reduce misclassification and improve aspect-specific sentiment inference.

  • For example, prompts can be tailored when handling negations, idiomatic expressions, or highly technical medical terms.

  2. Few-Shot and Active Learning Approaches:

  • Incorporating few-shot learning with annotated examples can improve model performance on rare aspects or emerging healthcare topics.

  • Active learning pipelines can prioritize uncertain predictions for human review, enabling continuous improvement of both ChatGPT outputs and machine learning modules.

Explainability and Trust:

  • Enhancing the quality of ChatGPT-generated explanations through structured templates or visual aids can improve user trust.

  • Future work could explore interactive explanation systems, where clinicians or administrators can query why certain sentiments were assigned, further bridging the gap between AI predictions and human decision-making.
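The adaptive prompting idea listed above can be sketched as follows. This is a minimal illustration, not EHSAN's actual prompt logic: the cue lists, the `detect_context` and `build_prompt` helpers, and the prompt wording are all hypothetical, and a real system would replace the keyword lookup with a proper dialect identifier and negation detector.

```python
import re
from dataclasses import dataclass

# Hypothetical cue lists, for illustration only. They are far from complete,
# and some tokens (e.g. "ما") are ambiguous in real Arabic text.
NEGATION_CUES = {"لا", "لم", "لن", "ليس", "ما", "مش", "مو"}
DIALECT_CUES = {"مش", "مو", "وايد", "كتير", "أوي"}


@dataclass
class PromptContext:
    has_negation: bool
    looks_dialectal: bool


def detect_context(text: str) -> PromptContext:
    """Flag negation and dialect cues using a simple token lookup."""
    tokens = set(re.findall(r"[\u0600-\u06FF]+", text))
    return PromptContext(
        has_negation=bool(tokens & NEGATION_CUES),
        looks_dialectal=bool(tokens & DIALECT_CUES),
    )


def build_prompt(text: str, aspect: str) -> str:
    """Assemble a prompt whose instructions depend on the detected context."""
    ctx = detect_context(text)
    parts = [
        "You are annotating Arabic healthcare feedback.",
        f"Classify the sentiment toward the aspect '{aspect}' "
        "as positive, negative, or neutral.",
    ]
    if ctx.has_negation:
        parts.append(
            "The text contains negation; decide which phrase the "
            "negation scopes over before labeling."
        )
    if ctx.looks_dialectal:
        parts.append(
            "The text may be dialectal Arabic; interpret colloquial "
            "words by their regional meaning."
        )
    parts.append(f"Text: {text}")
    return "\n".join(parts)


# Example: a colloquial, negated review triggers both extra instructions.
print(build_prompt("الدكتور مش كويس", "doctor"))
```

The design choice here is to keep a fixed base instruction and append context-specific guidance only when a cue fires, so that prompts for unambiguous inputs stay short.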

5.3 Advancing the Hybrid Framework

EHSAN’s hybrid architecture can be refined to improve performance, reliability, and adaptability:

  1. Dynamic Weighting Mechanisms:

  • The current weighted voting scheme uses fixed, empirically tuned weights for the ChatGPT, ML, and rule-based modules.

  • Future work could implement adaptive weighting, where module contributions vary based on context, aspect type, or confidence scores, enhancing robustness.

  2. Incorporation of Knowledge Graphs:

  • Integrating healthcare knowledge graphs or ontologies can provide domain-aware context, helping EHSAN disambiguate terms, link treatments to side effects, and enhance sentiment inference.

  3. Continuous Learning and Model Updates:

  • Establishing a continuous learning pipeline would allow EHSAN to update its ML models and prompts in response to evolving medical practices, new treatments, or emerging patient concerns.

  • This approach would help the system remain relevant and effective in real-world healthcare applications.
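To make the contrast between fixed and adaptive weighting concrete, the sketch below compares the two on a single prediction. It is an assumption-laden illustration rather than EHSAN's implementation: the base weights, the module names, and the choice to scale each weight by the module's self-reported confidence are all hypothetical, and other adaptive schemes (per-aspect weights, learned gating) are equally plausible.

```python
from collections import defaultdict

# Hypothetical fixed weights; the tuned values used by EHSAN are not given here.
BASE_WEIGHTS = {"chatgpt": 0.55, "ml": 0.25, "rules": 0.20}


def fixed_vote(predictions):
    """Weighted vote that ignores confidence: each module adds its base weight."""
    scores = defaultdict(float)
    for module, (label, _confidence) in predictions.items():
        scores[label] += BASE_WEIGHTS[module]
    return max(scores, key=scores.get)


def adaptive_vote(predictions):
    """Weighted vote where each base weight is scaled by the module's confidence."""
    scores = defaultdict(float)
    for module, (label, confidence) in predictions.items():
        scores[label] += BASE_WEIGHTS[module] * confidence
    return max(scores, key=scores.get)


# One aspect-level prediction: module -> (label, confidence in [0, 1]).
preds = {
    "chatgpt": ("positive", 0.40),  # unsure
    "ml": ("negative", 0.90),       # confident
    "rules": ("negative", 0.80),    # confident
}

print(fixed_vote(preds))     # positive: 0.55 vs. 0.45
print(adaptive_vote(preds))  # negative: 0.22 vs. 0.385
```

In this example the fixed scheme follows the heavily weighted but uncertain module, while the confidence-scaled scheme sides with the two confident ones. This assumes the confidence scores are reasonably calibrated across modules, which would need to be checked in practice.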

5.4 Cross-Lingual and Multilingual Applications

While EHSAN currently focuses on Arabic healthcare texts, its architecture is inherently adaptable to other languages, particularly low-resource or morphologically rich languages:

  1. Cross-Lingual Transfer Learning:

  • Leveraging multilingual models such as ChatGPT or mBERT can enable sentiment analysis in languages with limited annotated datasets.

  • Shared embedding spaces allow knowledge learned in Arabic to support sentiment inference in languages like Urdu, Persian, or Swahili.

  2. Multilingual Healthcare Platforms:

  • As telemedicine and global health initiatives grow, EHSAN could support multilingual patient feedback analysis, providing consistent insights across diverse regions and languages.

  • Multilingual outputs could also facilitate international healthcare policy comparisons and benchmarking.

5.5 Ethical, Privacy, and Responsible AI Considerations

Future developments of EHSAN must address ethical and privacy concerns associated with patient data:

  1. Privacy-Preserving Techniques:

  • Employing techniques such as differential privacy, federated learning, and secure data anonymization will protect sensitive patient information while allowing large-scale analysis.

  2. Bias Mitigation:

  • Continuous monitoring for model biases related to demographics, regional dialects, or medical conditions is essential to ensure equitable and fair sentiment predictions.

  • Incorporating bias detection modules within EHSAN will enhance trustworthiness and reliability.

  3. Clinical Validation and Stakeholder Engagement:

  • Collaborating with healthcare professionals for clinical validation ensures outputs are actionable and safe.

  • Engaging patients, clinicians, and administrators in feedback loops can align the system with real-world needs.
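As one concrete instance of the privacy-preserving techniques mentioned above, the sketch below applies the Laplace mechanism from differential privacy to aggregated sentiment counts before they are released. It is illustrative only: the `private_counts` helper and the example counts are hypothetical, and the default sensitivity of 1 assumes that each patient contributes to at most one count, which would not hold if a single review is labeled for several aspects.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Return counts with Laplace noise of scale sensitivity / epsilon added.

    `sensitivity` is the largest total change one patient can cause across
    all counts; 1.0 is an assumption that must match how the data is built.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return {key: value + laplace_noise(scale, rng) for key, value in counts.items()}


# Hypothetical per-label counts for one aspect.
counts = {"positive": 120, "negative": 45, "neutral": 30}
print(private_counts(counts, epsilon=0.5, rng=random.Random(42)))
```

Smaller values of `epsilon` add more noise and give stronger privacy; the right setting depends on the size of the counts and the sensitivity of the data.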

5.6 Integration with Healthcare Decision Support

Looking forward, EHSAN can evolve from a sentiment analysis tool to a comprehensive clinical decision support system:

  • Aggregating aspect-level sentiment data can highlight trends in patient satisfaction, treatment effectiveness, or service quality.

  • Integrating predictive analytics and risk scoring could enable proactive healthcare interventions.

  • Coupling EHSAN with dashboards or visualization platforms will make insights accessible and actionable for both clinicians and policymakers.
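The aggregation step described above could look roughly like the following. This is a minimal sketch under assumed inputs: the `(aspect, label)` record format, the `aspect_summary` helper, and the two summary metrics are illustrative choices, not part of EHSAN as described.

```python
from collections import Counter, defaultdict


def aspect_summary(records):
    """Summarize (aspect, label) predictions into per-aspect statistics."""
    counts = defaultdict(Counter)
    for aspect, label in records:
        counts[aspect][label] += 1

    summary = {}
    for aspect, label_counts in counts.items():
        total = sum(label_counts.values())
        summary[aspect] = {
            "n": total,
            # Share of mentions that are negative, a simple alerting signal.
            "negative_share": label_counts["negative"] / total,
            # Net sentiment in [-1, 1]: (positive - negative) / all mentions.
            "net_sentiment": (label_counts["positive"] - label_counts["negative"]) / total,
        }
    return summary


# Hypothetical aspect-level predictions from a batch of patient reviews.
records = [
    ("side effects", "negative"),
    ("side effects", "negative"),
    ("side effects", "positive"),
    ("doctor-patient interaction", "positive"),
]

for aspect, stats in aspect_summary(records).items():
    print(aspect, stats)
```

A dashboard could then track these per-aspect figures over time and flag aspects whose negative share rises above a chosen threshold.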

5.7 Summary

The future of EHSAN lies in scalability, adaptability, and responsible AI integration. Expanding datasets, enhancing ChatGPT integration, refining hybrid mechanisms, supporting multilingual applications, and ensuring ethical deployment will enable EHSAN to serve as a trustworthy, interpretable, and impactful tool for healthcare sentiment analysis. These developments not only enhance Arabic healthcare ABSA but also provide a blueprint for low-resource language NLP in sensitive domains, offering substantial contributions to both academic research and practical healthcare improvement.

Conclusion 

This study introduces EHSAN, a novel hybrid framework that integrates ChatGPT, traditional machine learning, and rule-based modules for aspect-based sentiment analysis (ABSA) in Arabic healthcare texts. The framework addresses critical challenges in Arabic natural language processing, including morphological richness, dialectal variation, and limited annotated data. By combining generative, statistical, and rule-based approaches, EHSAN achieves superior performance compared to traditional lexicon-based, machine learning, deep learning, and transformer-based methods.

Key findings include:

  1. Enhanced Performance: EHSAN consistently outperforms baseline models, achieving a macro F1-score of 0.90 and a micro F1-score of 0.91. These results demonstrate its ability to accurately identify sentiment across multiple aspects, including medication, treatment efficacy, side effects, and doctor-patient interactions. The hybrid design mitigates limitations inherent in ChatGPT-only or traditional models, balancing contextual understanding with statistical robustness and rule-based reliability.

  2. Aspect-Level Granularity: Unlike conventional sentiment analysis, EHSAN provides fine-grained sentiment classifications tied to specific aspects. This granularity enables actionable insights for healthcare providers, clinicians, and policymakers, allowing targeted interventions and resource allocation based on patient experiences.

  3. Interpretability and Trustworthiness: EHSAN generates explanations for its sentiment predictions, enhancing interpretability. The combination of ChatGPT-generated textual justification and rule-based confirmations fosters trust among clinicians and decision-makers, which is essential in sensitive healthcare environments.

  4. Adaptability and Scalability: The framework demonstrates adaptability to both formal (EHRs, clinical notes) and informal (patient forums, social media) texts. Its modular architecture allows integration with future datasets, multilingual applications, and multimodal data, highlighting potential for cross-domain and cross-lingual expansion.

The contributions of this work are threefold:

  • Methodological Innovation: EHSAN is one of the first frameworks to leverage LLMs in a hybrid ABSA setup specifically for Arabic healthcare texts, combining generative and deterministic methods for improved reliability.

  • Practical Applicability: By providing interpretable, aspect-level sentiment insights, EHSAN supports real-world healthcare applications, including patient feedback analysis, clinical decision support, and public health monitoring.

  • Foundation for Future Research: The framework establishes a blueprint for applying hybrid LLM-based sentiment analysis to other low-resource languages and sensitive domains, emphasizing ethical and responsible AI deployment.

Despite its strengths, the study acknowledges limitations such as reliance on ChatGPT, computational requirements, coverage of rare dialects, and the need for continuous domain-specific adaptation. These limitations provide opportunities for future work, including adaptive prompting, active learning, cross-lingual transfer, multimodal integration, and ethical deployment frameworks.

In conclusion, EHSAN represents a significant advancement in Arabic healthcare sentiment analysis, merging cutting-edge language model capabilities with hybrid methodological rigor. It demonstrates that context-aware, interpretable, and actionable sentiment analysis is achievable even in complex, low-resource linguistic domains. This work not only contributes to academic research in NLP and healthcare analytics but also offers a practical tool with meaningful real-world impact, paving the way for more inclusive, trustworthy, and multilingual AI applications in healthcare and beyond.
