Researchers from Freie Universität Berlin, the Leibniz Institute for Science and Mathematics Education, and Boston College offer a new perspective on results from large-scale educational assessments and a more meaningful way of comparing outcomes
No 067/2021 from Apr 23, 2021
Professor Steffi Pohl, a researcher at Freie Universität Berlin, is part of an international team taking an in-depth look at large-scale educational assessments such as PISA (Programme for International Student Assessment). In a new study, the team analyzed PISA results from a different perspective. Their findings show that when it comes to assessing student performance in school, it is not just academic ability that makes a difference; outcomes also reflect complex test-taking behavior. When taking tests, students use a mixture of skill and strategy, and these strategies can vary: some students prioritize speed over accuracy when answering exam questions, while for others the opposite is true. The researchers argue that at present it is not always possible to determine clearly which aspect of student behavior most influences PISA outcomes, or to what extent assessments fail to factor in these aspects equally for all students, something that could have particular implications for country rankings. The research findings were published in the journal Science.
Large-scale assessments like PISA have a huge impact on policy-makers in the field of education. Since the Organization for Economic Cooperation and Development (OECD) published the results of the first PISA study in December 2001, public debate on education in Germany, a country that performed comparatively poorly in that study, has centered on primary and secondary education before broadening to include pre-schools and kindergartens along with the teacher training system.
The authors of the new study suggest that it is important to differentiate between the various factors that play a role in answering questions in exams. These factors include accuracy (whether the answer to a question is correct), speed (how long it takes to answer a question), and response propensity (how many of the available questions a student attempts in the time allowed). To measure student test-taking strategies, the researchers use log data from computerized testing. They examine, for example, the amount of time a student spends on each question and use statistical models to describe the underlying response process. These techniques make it possible to get a more precise picture of how exactly students answer questions. The study also shows that there are differences between countries in terms of the typical strengths of their students. In some countries, students tend to work carefully with a high degree of accuracy, but because they spend so much time ensuring their answers are correct, they answer fewer questions overall. In other countries, students tend to complete all the questions quickly, but this leads to them making more mistakes.
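The three components described above can be illustrated with a small sketch. The log format, variable names, and numbers below are invented for illustration; they are not the actual PISA log schema or the authors' statistical model, which uses far more sophisticated modeling of response times and omissions.

```python
# Hypothetical log records: (student_id, item_id, answered_correctly, seconds_spent).
# This schema is illustrative only, not the real PISA log-data format.
logs = [
    ("A", 1, True, 45), ("A", 2, True, 60), ("A", 3, True, 80),
    ("B", 1, True, 20), ("B", 2, False, 15), ("B", 3, True, 18),
    ("B", 4, False, 12), ("B", 5, True, 10),
]

TEST_LENGTH = 6  # total items available in the test booklet (illustrative)

def summarize(student, records):
    """Compute the three test-taking components for one student."""
    rows = [r for r in records if r[0] == student]
    attempted = len(rows)
    correct = sum(1 for r in rows if r[2])
    time_spent = sum(r[3] for r in rows)
    return {
        "accuracy": correct / attempted,        # share of attempted items answered correctly
        "speed": attempted / time_spent,        # items answered per second (higher = faster)
        "propensity": attempted / TEST_LENGTH,  # share of available items attempted
    }

# Student A works carefully: perfect accuracy, but attempts only half the items.
print(summarize("A", logs))
# Student B works quickly: attempts nearly everything, but with more mistakes.
print(summarize("B", logs))
```

Even this toy example reproduces the trade-off the researchers describe: a single score based on correct answers alone would conflate student A's caution with student B's haste.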
The research team has suggested using a composite scoring system that takes into account all three key aspects of test-taking behavior, rather than basing scores on just one aspect (for example, accuracy). They argue that if this system were to be implemented, it would be easier to understand the various factors that play a role in good examination performance. This in turn would allow for a better understanding of student ability and thus enable more targeted interventions. The approach would also make it possible to compare countries in a more nuanced way. The list of country rankings can change dramatically depending on which aspects of student test-taking behavior are weighted most heavily.
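How strongly the choice of weights can reshuffle a ranking can be shown with a minimal sketch. The country names, component values, and weights below are invented for illustration and do not come from the study.

```python
# Invented country-level averages on the three components, each scaled to 0-1.
# These numbers are illustrative, not real PISA data.
countries = {
    "Country X": {"accuracy": 0.90, "speed": 0.40, "propensity": 0.60},
    "Country Y": {"accuracy": 0.70, "speed": 0.85, "propensity": 0.90},
}

def composite(scores, w_acc, w_speed, w_prop):
    """Weighted combination of the three test-taking components."""
    return (w_acc * scores["accuracy"]
            + w_speed * scores["speed"]
            + w_prop * scores["propensity"])

def ranking(w_acc, w_speed, w_prop):
    """Countries ordered from highest to lowest composite score."""
    return sorted(countries,
                  key=lambda c: composite(countries[c], w_acc, w_speed, w_prop),
                  reverse=True)

print(ranking(0.8, 0.1, 0.1))  # accuracy-heavy weighting favors careful Country X
print(ranking(0.2, 0.4, 0.4))  # speed/propensity-heavy weighting favors fast Country Y
```

With accuracy weighted heavily, the careful country comes out on top; shift the weight toward speed and response propensity and the order flips, which is exactly the sensitivity to weighting choices the paragraph above describes.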
Pohl, S., Ulitzsch, E., & von Davier, M. (2021). Reframing country rankings in educational assessments. Science, 372(6540), 338-340. doi: 10.1126/science.abd3300
Steffi Pohl, Department of Education and Psychology, Freie Universität Berlin, Tel.: +49 30 838-62926, Email: firstname.lastname@example.org