Prompt Engineering and Academic Competence Development: A Framework-Based Evaluation of ChatGPT in Undergraduate Literature Review Training

2025-09-14 17:27:00
11

1. Introduction

In the contemporary academic landscape, the literature review has emerged as a crucial genre that determines students’ ability to navigate scholarly discourse, synthesize prior research, and situate their arguments within disciplinary traditions. Yet, undergraduates often struggle with literature reviews due to limited experience in critical reading, inadequate training in academic writing, and difficulties in structuring coherent arguments.

With the rapid rise of large language models (LLMs) such as ChatGPT, there is growing interest in harnessing generative AI to support higher education. Prompt engineering — the art and science of designing input prompts to elicit desirable outputs — offers a unique opportunity to scaffold students’ academic training. This article proposes a framework-based evaluation method that integrates prompt engineering with educational scenarios, aiming not only to enhance the textual quality of student-generated literature reviews but also to cultivate essential academic competences such as critical thinking, synthesis, and scholarly writing.

34544_it5z_5103.webp

2. Research Questions 

2.1 The Need for Framework-Based Prompt Engineering

Traditional approaches to academic writing instruction often emphasize surface-level skills such as citation accuracy or grammatical correctness. However, the literature review requires deeper abilities: identifying research gaps, evaluating methodological approaches, and engaging critically with sources. Current AI-assisted writing practices risk generating polished but superficial outputs if students are not trained to interact with the technology in a structured and pedagogically meaningful way. Prompt engineering provides a mechanism to design interactions that systematically guide students from basic retrieval of information toward critical integration and academic argumentation.

2.2 Research Questions

This study is guided by three interrelated questions:

RQ1: How can prompt engineering be systematically designed as a framework tailored to undergraduate literature review training?
This question targets the construction of multi-layered prompts that correspond to cognitive levels ranging from comprehension to analysis and synthesis.

RQ2: What is the impact of framework-based prompts on the textual quality of student-generated literature reviews?
Here, quality is defined in terms of coverage of relevant sources, logical coherence, critical depth, and adherence to academic conventions.

RQ3: To what extent does the use of framework-based prompts contribute to students’ broader academic competence development?
This question examines whether interaction with AI prompts promotes transferable skills such as critical reading, argument construction, and reflective learning.

2.3 Theoretical Foundation

The research builds upon Bloom’s taxonomy of educational objectives, which differentiates between lower-order (remembering, understanding) and higher-order (analyzing, evaluating, creating) skills. It also draws on Vygotsky’s notion of the “zone of proximal development,” where prompts function as scaffolds that support students until they internalize scholarly practices. In this perspective, prompt engineering is not merely a technical optimization of machine outputs but an educational intervention designed to cultivate academic maturity.

2.4 Significance of the Study

This investigation contributes to two domains: (a) computational linguistics, by providing empirical evidence of how prompt engineering can operationalize educational scaffolding, and (b) higher education, by offering instructors a structured approach to integrate AI in pedagogically responsible ways. The findings are expected to demonstrate that prompts, when designed as frameworks rather than isolated commands, can serve as a bridge between automated text generation and human learning, enhancing both output quality and competence development.

3. Research Methods 

3.1 Research Design

This study adopts a mixed-methods research design, combining experimental comparison, text analysis, and qualitative feedback. The methodological framework includes three stages: (a) framework development, (b) controlled classroom implementation, and (c) multi-layered evaluation.

3.2 Participants

The study involved 60 undergraduate students from a major research university, enrolled in a senior-level course on academic writing. Participants were randomly assigned to two groups:

  • Control Group (n=30): Students used ChatGPT without structured prompts.

  • Experimental Group (n=30): Students used ChatGPT with the designed framework-based prompts.

3.3 Prompt Framework Development

The framework was designed to align with Bloom’s taxonomy and consisted of four progressive layers:

  1. Comprehension Layer: Prompts guiding the retrieval of factual information (e.g., “Summarize the main arguments of five recent studies on…”).

  2. Integration Layer: Prompts encouraging synthesis across sources (e.g., “Compare and contrast methodologies used in…”).

  3. Critical Analysis Layer: Prompts eliciting evaluation (e.g., “What are the strengths and weaknesses of these approaches in addressing…”).

  4. Academic Expression Layer: Prompts supporting scholarly articulation (e.g., “Formulate a literature review paragraph that situates these findings within the broader debate on…”).

This hierarchical structure ensured that students’ interaction with ChatGPT mirrored the cognitive progression expected in high-level academic work.

3.4 Data Collection

  • Textual Corpus: Each student produced a 2,500-word literature review. These texts formed the primary corpus for analysis.

  • Automated Metrics: Coverage, coherence, and lexical sophistication were assessed using ROUGE, Coh-Metrix, and LexTALE measures.

  • Human Evaluation: Three faculty members rated the reviews on a rubric including comprehensiveness, criticality, structure, and academic style.

  • Student Feedback: Post-task questionnaires and semi-structured interviews were conducted to capture perceptions of the learning process.

3.5 Data Analysis

  • Quantitative Analysis: Independent-sample t-tests compared control and experimental groups across automated and human-rated metrics. Inter-rater reliability was calculated (Cohen’s kappa > 0.80).

  • Qualitative Analysis: Thematic coding of interviews focused on perceived benefits, challenges, and shifts in learning strategies.

  • Triangulation: Integration of quantitative and qualitative data validated findings across different perspectives.

3.6 Ethical Considerations

Ethical approval was obtained from the institutional review board. Students provided informed consent and were debriefed about the pedagogical goals and limitations of AI tools.

4. Research Results and Analysis 

4.1 Textual Quality

The experimental group significantly outperformed the control group. Automated metrics showed higher coverage of relevant literature (ROUGE recall +12%), greater coherence (Coh-Metrix indices +15%), and more advanced academic vocabulary. Faculty evaluations corroborated these findings, noting clearer logical flow and more nuanced critical engagement.

4.2 Competence Development

Students in the experimental group reported enhanced awareness of how to structure literature reviews. Interview analysis revealed three themes:

  1. Scaffolding Effect: Prompts acted as “stepping stones,” enabling students to gradually internalize academic conventions.

  2. Critical Perspective: Exposure to evaluative prompts encouraged deeper interrogation of sources rather than passive summarization.

  3. Reflective Learning: Students highlighted increased metacognitive awareness of their writing strategies.

4.3 Challenges

Despite these gains, students also reported risks of over-reliance on AI outputs. Some expressed concerns about diminished originality, underscoring the need for instructors to frame AI as a complement rather than a substitute for human intellectual effort.

5. Discussion 

The findings confirm that prompt engineering, when operationalized as a framework, is not merely a technical optimization but a pedagogical intervention. By structuring interactions with ChatGPT, students were guided through cognitive processes analogous to expert scholarly practices.

The study highlights three implications:

  1. For NLP Research: Prompt engineering can extend beyond task efficiency toward educational alignment, offering a novel research agenda for human-centered AI.

  2. For Higher Education: AI integration must be scaffolded; otherwise, students risk producing surface-level outputs without competence development.

  3. For Pedagogy: Educators should design prompts not as shortcuts but as learning pathways, ensuring that AI contributes to critical thinking and reflective practice.

However, the study acknowledges limitations: the relatively small sample size, the short duration of the intervention, and the focus on a single academic genre. Future research should test cross-disciplinary applications, long-term competence gains, and adaptive prompt frameworks personalized to students’ proficiency.

6. Conclusion 

This study demonstrates that prompt engineering, when embedded in educational contexts as structured frameworks, can enhance both the quality of AI-assisted literature reviews and the academic competence of undergraduate students. The framework guided learners through comprehension, integration, critical analysis, and scholarly expression, enabling them to internalize practices central to academic inquiry.

The results suggest that ChatGPT is not only a generator of text but also a tool for educational scaffolding when mediated by thoughtful prompt design. For higher education institutions seeking to responsibly integrate AI, the framework-based evaluation method provides both a pedagogical model and an empirical foundation. Future studies should extend this approach across disciplines and explore dynamic, personalized prompt systems.

References

  • Bloom, B. S. (1956). Taxonomy of Educational Objectives: The Classification of Educational Goals. Longmans, Green.

  • Gilmore, B., & Ferris, D. (2023). AI-assisted writing in higher education: Opportunities and challenges. Journal of Academic Writing, 13(2), 45–63.

  • Liu, Y., & Li, J. (2022). Prompt engineering in NLP: From model control to human-centered applications. Computational Linguistics, 48(4), 765–789.

  • Motlagh, N. Y., Khajavi, M., Sharifi, A., & Ahmadi, M. (2023). The impact of artificial intelligence on digital education: A comparative study of ChatGPT, Bing Chat, Bard, and Ernie. International Journal of Educational Technology, 40(1), 77–99.

  • Vygotsky, L. S. (1978). Mind in Society: The Development of Higher Psychological Processes. Harvard University Press.

  • Zhang, H., & Chen, X. (2024). Generative AI and critical thinking: The pedagogical potential of prompt design. Computers & Education, 205, 104823.