Making Sense of Data: AI-based NLP Tools for Education Research

For most researchers—even those with some experience in data analysis or who have taken statistics courses—deciding on and applying appropriate statistical methods is still challenging (Pallant, 2020).

When you have to analyze your data and are not sure how to proceed, what do you do? Do you open a statistics book? Ask your supervisor or experienced colleagues? Search social media and the web? Or, if it is available, do you try artificial intelligence? Would you use AI only for guidance, or let it run the whole analysis?

This blog post looks at why researchers need support in data analysis, why many turn to AI-based Natural Language Processing (NLP) tools, how these tools can help at different stages of analysis, and what skills researchers should build to work with AI effectively.

Why researchers need support in data analysis

Data analysis is a complex task that requires both technical knowledge and methodological thinking (Creswell & Creswell, 2018). Even simple datasets may include missing values, errors, or outliers that require careful preparation (Field, 2018). Many researchers do not have strong training in statistics, which often creates anxiety and a lack of confidence (Onwuegbuzie & Wilson, 2003). Time constraints and limited access to expert consultants make this harder, especially in smaller or less-funded institutions (Cabrera & McDougall, 2013). Some researchers also see statistics as a secondary part of research, which reduces their motivation to engage with it (Gal & Ginsburg, 1994). These challenges explain why accessible support in data analysis is so important.

Why researchers turn to AI-based NLP tools

There are many sources of support for data analysis, such as books, tutorials, social media and web resources, and academic advisors or colleagues. Most recently, AI-based NLP tools have become very popular. NLP is a branch of Artificial Intelligence (AI) and Machine Learning (ML) that focuses on enabling computers to understand, generate, and interact through human language (Hirschberg & Manning, 2015). Well-known examples include chat-based systems such as ChatGPT, which allow researchers to ask questions in plain language and receive immediate feedback.

These tools provide fast, on-demand help that fits tight research schedules. They allow researchers to interact in natural language, without needing advanced software or coding skills. Many are low-cost or free, which makes them more accessible than professional consultants. AI systems are also improving quickly, which increases their usefulness across different types of research tasks (Floridi & Chiriatti, 2020). Another reason may be that some researchers prefer using AI to avoid the discomfort of asking others for help (Bohns & Flynn, 2010). For these reasons, AI tools are now widely used as an easy and convenient support system.

Capabilities and limits of AI-based NLP tools

AI-based NLP tools can support many parts of data analysis. They can clean and organize data, identify patterns, and summarize large sets of text. They can also suggest interpretations or help draft parts of research reports (Gale, 1987; Žižka et al., 2019; Young et al., 2018). However, they have clear limits. AI usually lacks deep contextual understanding and domain expertise. It can reflect biases in its training data (Shah & Sureja, 2025). It cannot judge ethical issues such as meaningfulness, privacy, and consent (Bankins & Formosa, 2023). The quality of AI output depends on clear input, and poor prompts often lead to poor results. Finally, many AI systems work like a “black box,” offering little transparency about how answers are produced (von Eschenbach, 2021). For this reason, AI should not replace human expertise but rather complement it.

AI support across data analysis steps

The process of data analysis usually follows a series of steps, moving from collecting raw information to reporting results. Classic frameworks describe these stages as data collection, cleaning and preparation, exploration, hypothesis building, modeling and analysis, interpretation, and reporting (Tukey, 1977; Creswell & Creswell, 2018). Each step requires different skills and decisions, and mistakes at one stage can affect the quality of the entire study. AI can assist in these steps, but human oversight remains essential (Shneiderman, 2022). This shared approach combines AI’s efficiency with human judgment and expertise.

Data collection: AI plays a key role in the data revolution by enhancing data collection within big data, open data, and evolving infrastructures. It automates gathering information from digital sources, sensors, and databases, making large-scale data more accessible for research. Still, researchers must ensure ethical use and data quality when relying on AI-driven collection (Kitchin, 2014).

Data cleaning and preparation: Detecting errors or missing values is one of AI’s strengths, but researchers should always confirm corrections (De Waal et al., 2012).
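To make this step concrete, here is a minimal sketch of the kind of screening an AI tool automates: flagging missing entries and extreme values in a single variable. The z-score cut-off of 2.5 and the `screen_column` helper are illustrative choices, not from the post, and a large outlier can inflate the standard deviation and mask itself — one more reason the researcher must confirm any correction.

```python
import statistics

def screen_column(values, z_threshold=2.5):
    """Flag missing entries and extreme values in one variable.

    Returns the indices of missing values and of values whose
    z-score exceeds the threshold (2.5 is a common screening
    cut-off, not a universal rule). Note: an extreme value
    inflates the standard deviation, so it can mask itself.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)
    outliers = [
        i for i, v in enumerate(values)
        if v is not None and sd > 0 and abs(v - mean) / sd > z_threshold
    ]
    return missing, outliers

# Example: test scores with one missing entry and one extreme value
scores = [72, 68, None, 75, 70, 74, 71, 200, 69, 73]
missing, outliers = screen_column(scores)
print(missing)   # indices of missing values
print(outliers)  # indices of suspected outliers
```

Whether the flagged value 200 is a data-entry error or a genuine observation is exactly the judgment the AI cannot make for you.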

Exploratory analysis: In this stage, AI’s ability to summarize and visualize data helps detect patterns or anomalies. It can interpret tables, graphs, and outputs from analysis, providing summaries and potential insights, but final interpretation should be validated by the researcher (St. Amant & Cohen, 1998).
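The descriptive summaries mentioned here can be sketched with nothing more than the standard library; the `describe` helper below is a hypothetical illustration of what an AI assistant produces in this stage, not code from the post.

```python
import statistics

def describe(values):
    """Basic descriptives for one numeric variable:
    n, mean, median, standard deviation, min, max."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical reading scores for eight students
reading = [45, 52, 49, 61, 55, 47, 58, 50]
summary = describe(reading)
# A large gap between mean and median, or a very wide range,
# hints at skew or anomalies worth a closer look.
print(summary)
```

An AI tool can compute and narrate such a table in seconds; deciding whether a mean–median gap matters for the research question remains the researcher’s call.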

Hypothesis building: Instead of providing answers, AI may highlight possible patterns that inspire new hypotheses, but researchers decide which are meaningful (Yao et al., 2025).

Deciding the appropriate method: Suggestions for statistical methods can be generated by AI based on the data type and research questions. However, evaluating appropriateness and assumptions remains the researcher’s responsibility (Schwarz, 2025).
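The kind of suggestion described above can be imagined as a very rough decision heuristic. The sketch below encodes standard textbook rules of thumb (it is not from the post, and real method choice also depends on assumptions such as normality, variance homogeneity, and sample size — precisely the checks left to the researcher).

```python
def suggest_test(outcome, groups, paired=False):
    """Rough textbook heuristic mapping a design to a candidate test.

    outcome: "continuous" or "categorical"; groups: number of groups
    being compared; paired: whether observations are matched.
    The suggestion is a starting point, not a verdict: assumptions
    must still be checked before the test is applied.
    """
    if outcome == "continuous":
        if groups == 1:
            return "one-sample t-test"
        if groups == 2:
            return "paired t-test" if paired else "independent-samples t-test"
        return "one-way ANOVA"
    if outcome == "categorical":
        return "McNemar test" if (groups == 2 and paired) else "chi-square test"
    return "consult a statistician"

print(suggest_test("continuous", 2))   # independent-samples t-test
print(suggest_test("continuous", 3))   # one-way ANOVA
```

AI assistants apply richer versions of this mapping, but the final "is this test appropriate for my data?" question cannot be delegated.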

Modeling and analysis: Support for replicating models or tuning parameters shows the usefulness of AI at this stage. Yet its limits become clear with more complex tasks – tools like ChatGPT may produce repetitive or incomplete solutions, making human judgment and verification essential (Prandner et al., 2025; Schwarz, 2025).

Interpretation: While outputs may be accurate, meaningful interpretation still requires human insight. Even advanced tools such as ChatGPT-4 often lack precision and contextual understanding, so theoretical conclusions must come from the researcher (Sporek & Konieczny, 2025).

Reporting: Drafting, formatting, and revising reports can be streamlined by AI, but it cannot take full responsibility for accuracy, interpretation, or compliance with research standards. Human researchers must review and finalize all outputs to ensure correct and meaningful reporting (Andersen et al., 2025).

Skills for working effectively with AI

To use AI responsibly, researchers need certain skills. Data literacy is key for understanding data types, quality, and methods (Carlson et al., 2011). Basic statistical knowledge helps them check analysis outputs even when provided by AI (Garfield et al., 2008). Methodological proficiency is also important; researchers should understand research design, data collection strategies, and how analysis decisions relate to research questions and hypotheses (Creswell & Creswell, 2018).

Some literacy skills—such as statistical literacy, digital literacy, and AI literacy—are essential for understanding methods, navigating tools, and using AI effectively. Critical thinking and problem solving allow researchers to question and refine AI-generated results (Saddhono et al., 2024). Ethical awareness ensures responsible handling of privacy, bias, and transparency issues (Jobin et al., 2019). Finally, clear prompt writing is important to guide AI effectively (Federiakin et al., 2024). These skills help researchers combine AI tools with scientific rigor. Importantly, becoming proficient with these tools takes practice. While the learning curve may initially reduce efficiency, familiarity with AI over time can significantly increase the net benefits of its use.
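The point about clear prompt writing can be made concrete with a structured template. The `build_analysis_prompt` helper below is a hypothetical illustration: stating the goal, the data, and the constraints up front tends to produce more useful answers than a vague "how do I analyze this?"

```python
def build_analysis_prompt(goal, data_description, constraints):
    """Assemble a structured prompt for an AI assistant.

    Spelling out the task, the data, and the constraints gives the
    model the context it needs; asking it to state its assumptions
    makes its reasoning easier to check. (Hypothetical template.)
    """
    return (
        f"Task: {goal}\n"
        f"Data: {data_description}\n"
        f"Constraints: {constraints}\n"
        "Please list the analysis steps and state any assumptions "
        "you make before suggesting a method."
    )

prompt = build_analysis_prompt(
    goal="Compare exam scores between two teaching methods",
    data_description="60 students, scores 0-100, two independent groups",
    constraints="Report effect size; explain checks for test assumptions",
)
print(prompt)
```

The same discipline applies when prompting in plain conversation: context first, request second, and an explicit invitation for the tool to expose its assumptions.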

Ethical considerations

In addition to the limits mentioned earlier, using AI in research comes with ethical risks. AI may amplify existing biases in data (Mehrabi et al., 2021; Shah & Sureja, 2025) and can raise fairness concerns (Barocas et al., 2023). Its “black box” nature makes transparency and accountability difficult, and sensitive data may not be fully protected by AI systems (von Eschenbach, 2021). Over-reliance on AI may also erode human skills or allow mistakes to go unnoticed (Karamuk, 2025). Reproducibility may suffer if AI use is not well documented. Researchers, therefore, need to apply ethical standards carefully when working with AI.

Conclusion and recommendations

AI-based NLP tools can make data analysis more accessible and efficient. But they cannot replace human expertise and ethical responsibility. Researchers need skills in data literacy, statistics, critical thinking, problem solving, AI literacy, prompt writing, and ethics to use AI effectively.

At the same time, researchers should carefully weigh whether using NLP tools provides a net benefit. While these tools can accelerate tasks, effective use still requires time for accurate prompting, checking outputs, filtering hypotheses, and reviewing conclusions. In some cases, this effort may equal or even exceed the time saved. Therefore, choosing to use AI should be a conscious decision, guided by the nature of the task, the researcher’s skills, and the standards of the research community.

A hybrid model that combines AI’s speed with human insight can improve both quality and trust in research. With thoughtful use, AI can help researchers manage data analysis while keeping high scientific standards.

Key messages

  • Researchers need support in data analysis because the process is complex, often stressful, and requires both technical knowledge and methodological skills that many researchers lack.
  • Common solutions for support include consulting statistics books, online resources, supervisors, or colleagues—but recently, many researchers have increasingly turned to AI-based NLP tools for quick, accessible guidance.
  • AI offers valuable help across different stages of data analysis, from data collection to reporting. However, its limitations in context, accuracy, and ethical judgment mean it cannot replace human expertise.
  • The most effective approach is human–AI collaboration, where AI provides efficiency and automation while researchers contribute interpretation, ethical oversight, and scientific rigor—supported by skills in data literacy, statistics, critical thinking, AI literacy, and ethics, as well as clear institutional guidelines.

ECER 2025 –

Prof. Dr. Ergul Demir

Department of Measurement and Evaluation, Ankara University

Prof. Dr. Ergul Demir currently works at the Department of Measurement and Evaluation, Ankara University, as a professor and a senior researcher. His focus is on psychometric modelling, including Item Response Theory and its applications, multivariate data analysis, and advanced research methods. Most recently, he has been working on ‘Data Science in Psychology and Education’ and ‘AI integration into psychometrics and educational assessment’.

ECER Belgrade 2025

Since the first ECER in 1992, the conference has grown into one of the largest annual educational research conferences in Europe. In 2025, the EERA family heads to Serbia for ECER and ERC.

08 - 09 September 2025 - Emerging Researchers' Conference
09 - 12 September 2025 - European Conference on Educational Research

Find out about fees and registration here.

In Belgrade, the conference theme is Charting the Way Forward: Education, Research, Potentials and Perspectives

There is no doubt that education has a central role in society, but what it is meant to achieve is contested politically as well as scientifically. Even more debate surrounds the question of how educational research should shape the future of educational practice. The important but sensitive role educational research occupies in that regard should be the promotion of a better understanding of the contemporary and future world of education, as is expressed in EERA’s aim.

Emerging Researchers' Conference - Belgrade 2025

The Emerging Researchers' Conference (ERC) precedes ECER and is organised by EERA's Emerging Researchers' Group. Emerging researchers are uniquely supported to discuss and debate topical and thought-provoking research projects in relation to the ECER themes, trends and current practices in educational research year after year. The high-quality academic presentations during the ERC are evidence of the significant participation and contributions of emerging researchers to the European educational research community.

By participating in the ERC, emerging researchers have the opportunity to engage with world-class educational research and to learn about priorities and developments from notable regional and international researchers and academics. The ERC is purposefully organised to include special activities and workshops that provide emerging researchers with varied opportunities for networking, creating global connections, exchanging knowledge, and sharing the latest insights on topics of their interest. Submissions to the ERC are handed in via the standard submission procedure.

Prepare yourself to be challenged, excited and inspired.

References and further reading

Andersen, J. P., Degn, L., Fishberg, R., Graversen, E. K., Horbach, S. P., Schmidt, E. K., Schneider, J. W., & Sørensen, M. P. (2025). Generative Artificial Intelligence (GenAI) in the research process – A survey of researchers’ practices and perceptions. Technology in Society, 81, 102813. https://doi.org/10.1016/j.techsoc.2025.102813

Bankins, S., & Formosa, P. (2023). The ethical implications of Artificial Intelligence (AI) for meaningful work. Journal of Business Ethics, 185, 725–740. https://doi.org/10.1007/s10551-023-05339-7

Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and machine learning: Limitations and opportunities. MIT press. https://fairmlbook.org/

Bohns, V. K., & Flynn, F. J. (2010). “Why didn’t you just ask?” Underestimating the discomfort of help-seeking. Journal of Experimental Social Psychology, 46(2), 402–409. https://doi.org/10.1016/j.jesp.2009.12.015

Cabrera, J., & McDougall, A. (2013). Statistical consulting. Springer New York, NY. https://doi.org/10.1007/978-1-4757-3663-2

Carlson, J., Fosmire, M., Miller, C. C., & Nelson, M. S. (2011). Determining data information literacy needs: A study of students and research faculty. Portal: Libraries and the Academy, 11(2), 629–657. https://doi.org/10.1353/pla.2011.0022

Creswell, J. W., & Creswell, J. D. (2018). Research design: Qualitative, quantitative, and mixed methods approaches (5th ed.). Thousand Oaks, California, SAGE Publications, Inc.

De Waal, T., Pannekoek, J., & Scholtus, S. (2012). The editing of statistical data: Methods and techniques for the efficient detection and correction of errors and missing values. WIREs Computational Statistics, 4(2), 204–210. https://doi.org/10.1002/wics.1194

Federiakin, D., Molerov, D., Zlatkin-Troitschanskaia, O., & Maur, A. (2024). Prompt engineering as a new 21st century skill. Frontiers in Education, 9, 1366434. https://doi.org/10.3389/feduc.2024.1366434

Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). Sage.

Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30(4), 681–694. https://doi.org/10.1007/s11023-020-09548-1

Gal, I., & Ginsburg, L. (1994). The role of beliefs and attitudes in learning statistics: Towards an assessment framework. Journal of Statistics Education, 2(2). https://doi.org/10.1080/10691898.1994.11910471

Gale, W. A. (1987). Statistical applications of artificial intelligence and knowledge engineering. The Knowledge Engineering Review, 2(4), 227-247. https://doi.org/10.1017/S0269888900004136

Garfield, J., Ben-Zvi, D., Chance, B., Medina, E., Roseth, C., & Zieffler, A. (2008). Developing students’ statistical reasoning: Connecting research and teaching practice. Springer Science & Business Media. https://doi.org/10.1007/978-1-4020-8383-9

Hirschberg, J., & Manning, C. D. (2015). Advances in natural language processing. Science, 349, 261–266. https://doi.org/10.1126/science.aaa8685

Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), 389–399. https://doi.org/10.1038/s42256-019-0088-2

Karamuk, E. (2025). The automation trap: Unpacking the consequences of over-reliance on AI in education and its hidden costs. In Pitfalls of AI integration in education: Skill obsolescence, misuse, and bias (pp. 151–174). IGI Global Scientific Publishing.

Kitchin, R. (2014). The data revolution: Big data, open data, data infrastructures & their consequences. SAGE Publications Ltd. https://doi.org/10.4135/9781473909472

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), 1–35. https://doi.org/10.1145/3457607

Onwuegbuzie, A. J., & Wilson, V. A. (2003). Statistics anxiety: Nature, etiology, antecedents, effects, and treatments. Teaching in Higher Education, 8(2), 195–209. https://doi.org/10.1080/1356251032000052447

Pallant, J. (2020). SPSS survival manual (7th ed.). McGraw-Hill Education. https://doi.org/10.4324/9781003117452

Prandner, D., Wetzelhütter, D., & Hese, S. (2025). ChatGPT as a data analyst: An exploratory study on AI-supported quantitative data analysis in empirical research. Frontiers in Education, 9, 1417900. https://doi.org/10.3389/feduc.2024.1417900

Saddhono, K., Suhita, R., Rakhmawati, A., Rohmadi, M., & Sukmono, I. K. (2024). AI and literacy: Developing critical thinking and analytical skills in the digital era. International Conference on IoT, Communication and Automation Technology (ICICAT), Gorakhpur, India, pp. 360–365. https://doi.org/10.1109/ICICAT62666.2024.10922871

Schwarz, J. (2025). The use of generative AI in statistical data analysis and its impact on teaching statistics at universities of applied sciences. Teaching Statistics, 47(2), 118–128. https://doi.org/10.1111/test.12398

Shah, M., & Sureja, N. A. (2025). Comprehensive review of bias in deep learning models: Methods, impacts, and future directions. Archives of Computational Methods in Engineering, 32, 255–267. https://doi.org/10.1007/s11831-024-10134-2

Shneiderman, B. (2022). Human-centered AI. Oxford University Press.

Sporek, P., & Konieczny, M. (2025). Artificial intelligence versus human analysis: Interpreting data in elderly fat reduction study. Advances in Integrative Medicine, 12(1), 13–18. https://doi.org/10.1016/j.aimed.2024.12.011

St. Amant, R., & Cohen, P. R. (1998). Intelligent support for exploratory data analysis. Journal of Computational and Graphical Statistics, 7(4), 545–558. https://doi.org/10.1080/10618600.1998.10474794

Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.

von Eschenbach, W.J. (2021). Transparency and the black box problem: Why we do not trust AI. Philosophy & Technology, 34, 1607–1622. https://doi.org/10.1007/s13347-021-00477-0

Yao, L., Yin, H., Yang, C., Han, S., Ma, J., Graff, J. C., Wang, C.-Y., Jiao, Y., Ji, J., Gu, W., & Wang, G. (2025). Generating research hypotheses to overcome key challenges in the early diagnosis of colorectal cancer – Future application of AI. Cancer Letters, 620, 217632. https://doi.org/10.1016/j.canlet.2025.217632

Young, T., Hazarika, D., Poria, S., & Cambria, E. (2018). Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine, 13(3), 55–75. https://doi.org/10.1109/MCI.2018.2840738

Žižka, J., Dařena, F., & Svoboda, A. (2019). Text Mining with Machine Learning: Principles and Techniques (1st ed.). CRC Press. https://doi.org/10.1201/9780429469275