AI supercharges scientific output while quality slips
AI is flooding science with well-written papers, boosting productivity worldwide while blurring the line between real breakthroughs and empty polish.
- Date: December 24, 2025
- Source: Cornell University
- Summary: AI writing tools are supercharging scientific productivity, with researchers posting up to 50% more papers after adopting them. The biggest beneficiaries are scientists who don't speak English as a first language, potentially shifting global centers of research power. But there's a downside: many AI-polished papers fail to deliver real scientific value. This growing gap between slick writing and meaningful results is complicating peer review, funding decisions, and research oversight.
After ChatGPT became widely available in late 2022, many researchers started telling colleagues they could get more done with these new artificial intelligence tools. At the same time, journal editors reported a surge of smoothly written submissions that did not seem to add much scientific value.
A new Cornell study suggests those informal reports point to a broader change in how scientists prepare manuscripts. The researchers found that large language models (LLMs) such as ChatGPT can increase paper output, with especially strong benefits for scientists who are not native English speakers. But the growing volume of AI-written text is also making it harder for key decision-makers to tell meaningful work apart from low-value content.
"It is a very widespread pattern, across different fields of science -- from physical and computer sciences to biological and social sciences," said Yian Yin, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science. "There's a big shift in our current ecosystem that warrants a very serious look, especially for those who make decisions about what science we should support and fund."
The findings appear in a paper titled "Scientific Production in the Era of Large Language Models," published Dec. 18 in Science.
How the Cornell Team Measured AI Use in Research Papers
To examine how LLMs are influencing scientific publishing, Yin's team compiled more than 2 million papers posted from January 2018 through June 2024 across three major preprint platforms. Those sites are arXiv, bioRxiv and Social Science Research Network (SSRN). Together, they represent the physical sciences, life sciences and social sciences, and they host studies that have not yet been through peer review.
The researchers took papers posted before 2023, which were presumed to be written by humans, and compared them with AI-generated text. From that comparison, they built a model designed to flag papers likely written with help from LLMs. Using this detector, they estimated which authors were probably using LLMs for writing, tracked how many papers those scientists posted before and after adopting the tools, and then checked whether the papers were later accepted by scientific journals.
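The study does not publish its detector, but the general approach described above — learn word-usage differences between a human-written corpus and an AI-generated one, then score new documents — can be sketched with a toy Naive Bayes-style log-odds classifier. Everything here (the corpora, the word-level features, the smoothing) is an illustrative assumption, not the Cornell team's actual model:

```python
from collections import Counter
import math

def train(human_docs, llm_docs):
    """Fit per-word log-odds from two labeled corpora (add-one smoothing).
    Positive weight means the word is more typical of the LLM corpus."""
    h = Counter(w for d in human_docs for w in d.lower().split())
    l = Counter(w for d in llm_docs for w in d.lower().split())
    vocab = set(h) | set(l)
    nh, nl = sum(h.values()), sum(l.values())
    return {w: math.log((l[w] + 1) / (nl + len(vocab)))
             - math.log((h[w] + 1) / (nh + len(vocab)))
            for w in vocab}

def llm_score(weights, doc):
    """Sum of per-word log-odds; positive suggests 'more LLM-like' text."""
    return sum(weights.get(w, 0.0) for w in doc.lower().split())

# Tiny hypothetical corpora, purely for illustration.
human_docs = ["we measured the effect and report the raw numbers",
              "the data show a small but significant effect"]
llm_docs = ["this paper delves into the multifaceted landscape of the field",
            "furthermore it is important to note the multifaceted implications"]

weights = train(human_docs, llm_docs)
print(llm_score(weights, "delves into the multifaceted landscape"))  # positive
print(llm_score(weights, "we report the raw numbers"))               # negative
```

A production detector would of course use far richer features and vastly more data; the point is only that the pre-2023 corpus serves as the "human" reference class against which later papers are scored.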
Big Productivity Gains, Especially for Non-Native English Speakers
The results showed a clear productivity jump linked to apparent LLM use. On arXiv, scientists flagged as using LLMs posted roughly one-third more papers than those who did not appear to use AI. On bioRxiv and SSRN, the increase exceeded 50%.
The boost was largest for scientists who write in English as a second language and face extra hurdles communicating technical work in a foreign language. Depending on the preprint site, researchers affiliated with Asian institutions posted between 43.0% and 89.3% more papers after the detector suggested they began using LLMs, compared with similar researchers who did not appear to adopt the technology. Yin expects the advantage could eventually shift global patterns of scientific productivity toward regions that have been held back by the language barrier.
AI Search May Broaden What Scientists Cite
The study also pointed to a potential benefit during literature searches and citation building. When researchers look for related work to cite, Bing Chat -- described as the first widely adopted AI-powered search tool -- performed better at surfacing newer papers and relevant books than traditional search tools. Traditional tools, by contrast, were more likely to return older and more heavily cited sources.
"People using LLMs are connecting to more diverse knowledge, which might be driving more creative ideas," said first author Keigo Kusumegi, a doctoral student in the field of information science. He plans future research to test whether AI use is associated with more innovative and interdisciplinary science.
A New Problem for Peer Review and Research Evaluation
Even as LLMs help individuals produce more manuscripts, the same tools can make it harder for others to judge what is truly strong science. In human-written papers, more complex writing, including longer sentences and bigger words, has often been a useful signal of higher-quality research. Across arXiv, bioRxiv and SSRN, papers likely written by humans that scored highly on a writing-complexity test were also the most likely to be accepted by journals.
That pattern looked different for papers likely written with LLM assistance. Even when those AI-flagged papers scored high on writing complexity, they were less likely to be accepted by journals. The researchers interpret this as a sign that polished language may no longer reliably reflect scientific value, and that reviewers may be rejecting some of these papers despite strong-sounding writing.
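The article does not specify how writing complexity was measured, but the cues it names — longer sentences and bigger words — can be approximated with a crude surface proxy. The metric below is an assumption for illustration only, not the study's actual test:

```python
import re

def surface_complexity(text):
    """Return (mean words per sentence, mean characters per word).
    A rough proxy for the 'longer sentences, bigger words' signal;
    sentence splitting on terminal punctuation is deliberately naive."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    mean_sentence_len = len(words) / len(sentences)
    mean_word_len = sum(len(w) for w in words) / len(words)
    return mean_sentence_len, mean_word_len

simple = "The cat sat. The dog ran."
dense = ("Notwithstanding considerable methodological heterogeneity, "
         "the analysis demonstrates robustness.")
print(surface_complexity(simple))  # short sentences, short words
print(surface_complexity(dense))   # one long sentence, long words
```

The study's point is precisely that once LLMs can inflate such surface scores at no cost, metrics like this one stop predicting whether a paper will survive peer review.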
Yin said this gap between writing quality and research quality could have serious consequences. Editors and reviewers may struggle more to identify the most valuable submissions, while universities and funding agencies may find that raw publication counts no longer reflect scientific contribution.
What Comes Next for Research on Generative AI
The researchers emphasize that these findings are observational. As a next step, they hope to test cause and effect using approaches such as controlled experiments, including designs where some scientists are randomly assigned to use LLMs and others are not.
Yin is also organizing a symposium on the Ithaca campus scheduled for March 3-5, 2026. The event will explore how generative AI is changing research and how scientists and policymakers can guide those changes.
As AI becomes more common for writing, coding and even generating ideas, Yin expects its influence to expand, effectively turning these systems into a kind of co-scientist. He argues that policymakers should update rules to keep pace with the fast-moving technology.
"Already now, the question is not, have you used AI? The question is, how exactly have you used AI and whether it's helpful or not."
Study Authors and Funding
Co-authors include Xinyu Yang, a doctoral student in the field of computer science; Paul Ginsparg, professor of information science in Cornell Bowers and of physics in the College of Arts and Sciences, and founder of arXiv; and Mathijs de Vaan and Toby Stuart of the University of California, Berkeley.
The research was supported by the National Science Foundation.
Story Source:
Materials provided by Cornell University. Note: Content may be edited for style and length.
Journal Reference:
- Keigo Kusumegi, Xinyu Yang, Paul Ginsparg, Mathijs de Vaan, Toby Stuart, Yian Yin. Scientific production in the era of large language models. Science, 2025; 390 (6779): 1240 DOI: 10.1126/science.adw3000