**UNSAFE AT ANY SEED**
AI search engines give incorrect answers at an alarming 60% rate, study says
CJR study shows AI search services misinform users and ignore publisher exclusion requests.
BENJ EDWARDS - MAR 13, 2025 9:16 PM
A new study from Columbia Journalism Review's Tow Center for Digital Journalism finds serious accuracy issues with generative AI models used for news searches. The research tested eight AI-driven search tools equipped with live search functionality and discovered that the AI models incorrectly answered more than 60 percent of queries about news content. Researchers Klaudia Jaźwińska and Aisvarya Chandrasekar noted in their report that roughly 1 in 4 Americans now uses AI models as alternatives to traditional search engines. This raises serious concerns about reliability, given the substantial error rate uncovered in the study.
Error rates varied notably among the tested platforms. Perplexity provided incorrect information in 37 percent of the queries tested, whereas ChatGPT Search incorrectly identified 67 percent (134 out of 200) of articles queried. Grok 3 demonstrated the highest error rate, at 94 percent.
*Chart from the CJR report: Generative search tools were often confidently wrong in our study. The Tow Center asked eight generative search tools to identify the source article, the publication, and URL for 200 excerpts extracted from news articles by 20 publishers. Each square represents the citation behavior of a response.*
For the tests, researchers fed direct excerpts from actual news articles to the AI models, then asked each model to identify the article's headline, original publisher, publication date, and URL. They ran 1,600 queries across the eight different generative search tools. The study highlighted a common trend among these AI models: rather than declining to respond when they lacked reliable information, the models frequently provided confabulations—plausible-sounding incorrect or speculative answers. The researchers emphasized that this behavior was consistent across all tested models, not limited to just one tool.
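The report does not publish its test harness, but the structure of the experiment is simple enough to sketch. The code below is a hypothetical illustration only: `query_search_tool`, the `Excerpt` fields, and the exact-match scoring are all assumptions standing in for however CJR actually queried the tools and graded their answers.

```python
from dataclasses import dataclass

@dataclass
class Excerpt:
    text: str       # verbatim passage from a news article
    headline: str   # ground-truth fields the tool must recover
    publisher: str
    date: str
    url: str

def query_search_tool(tool: str, prompt: str) -> dict | None:
    """Stand-in for each tool's real interface: returns the parsed
    answer fields, or None if the tool declines to answer.
    (Stubbed here; the study queried the tools' live interfaces.)"""
    return None  # placeholder: treats every query as a declination

def run_study(tools: list[str], excerpts: list[Excerpt]) -> dict:
    # 8 tools x 200 excerpts = 1,600 queries, as in the CJR study
    results = {t: {"correct": 0, "wrong": 0, "declined": 0} for t in tools}
    for tool in tools:
        for ex in excerpts:
            prompt = ("Identify the headline, original publisher, publication "
                      f"date, and URL of the article this excerpt is from:\n{ex.text}")
            answer = query_search_tool(tool, prompt)
            if answer is None:
                results[tool]["declined"] += 1
            elif answer.get("headline") == ex.headline and answer.get("url") == ex.url:
                results[tool]["correct"] += 1  # simplistic exact-match scoring
            else:
                results[tool]["wrong"] += 1
    return results
```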
Surprisingly, premium paid versions of these AI search tools fared even worse in certain respects. Perplexity Pro ($20/month) and Grok 3's premium service ($40/month) confidently delivered incorrect responses more often than their free counterparts. Though these premium models correctly answered a higher number of prompts, their reluctance to decline to answer when uncertain drove higher overall error rates.
**Issues with citations and publisher control**
The CJR researchers also uncovered evidence suggesting some AI tools ignored Robots Exclusion Protocol settings (robots.txt), which publishers use to tell crawlers which content is off-limits. For example, Perplexity's free version correctly identified all 10 excerpts from paywalled National Geographic content, despite National Geographic explicitly disallowing Perplexity's web crawlers.
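For background, the Robots Exclusion Protocol is just a plain-text robots.txt file at a site's root; a well-behaved crawler fetches it and checks whether its user-agent token is permitted before requesting a page. Python's standard library can perform that check, as in the sketch below (the article URL is a placeholder; "PerplexityBot" is the user-agent token Perplexity publishes for its crawler):

```python
from urllib import robotparser

# A compliant crawler consults robots.txt before fetching a page.
rp = robotparser.RobotFileParser()
rp.set_url("https://www.nationalgeographic.com/robots.txt")
rp.read()  # download and parse the live robots.txt

# "PerplexityBot" is Perplexity's published crawler token; the article
# URL below is a placeholder for illustration.
url = "https://www.nationalgeographic.com/science/article/example"
if rp.can_fetch("PerplexityBot", url):
    print("robots.txt permits fetching", url)
else:
    print("robots.txt disallows fetching", url)
```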
Even when these AI search tools cited sources, they often directed users to syndicated versions of content on platforms like Yahoo News rather than original publisher sites. This occurred even in cases where publishers had formal licensing agreements with AI companies. URL fabrication emerged as another significant problem. More than half of citations from Google's Gemini and Grok 3 led users to fabricated or broken URLs resulting in error pages. Of 200 citations tested from Grok 3, 154 resulted in broken links.
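Broken or fabricated citation URLs of the kind the study describes are straightforward to screen for mechanically. A minimal sketch, assuming the cited URLs have already been extracted from each tool's responses (the `check_citations` helper is illustrative, not part of the study's methodology):

```python
import requests

def check_citations(urls: list[str]) -> list[tuple[str, str]]:
    """Return (url, status) pairs, flagging links that error out.
    The URL list is assumed to come from the tools' cited sources."""
    results = []
    for url in urls:
        try:
            # HEAD keeps the check lightweight; follow redirects because
            # syndicated or moved content often redirects before resolving.
            resp = requests.head(url, allow_redirects=True, timeout=10)
            status = "ok" if resp.status_code < 400 else f"broken ({resp.status_code})"
        except requests.RequestException:
            status = "unreachable"  # DNS failure, timeout, fabricated domain
        results.append((url, status))
    return results

# Usage: broken = [r for r in check_citations(cited_urls) if r[1] != "ok"]
```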
These issues put publishers in a difficult position: blocking AI crawlers can mean losing attribution entirely, while permitting them allows widespread reuse without driving traffic back to publishers' own websites.
Mark Howard, chief operating officer at Time magazine, expressed concern to CJR about ensuring transparency and control over how Time's content appears via AI-generated searches. Despite these issues, Howard sees room for improvement in future iterations, stating, "Today is the worst that the product will ever be," citing substantial investments and engineering efforts aimed at improving these tools. However, Howard also did some user shaming, suggesting it's the user's fault if they aren't skeptical of free AI tools' accuracy: "If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them."
OpenAI and Microsoft provided statements to CJR acknowledging receipt of the findings but did not directly address the specific issues. OpenAI noted its promise to support publishers by driving traffic through summaries, quotes, clear links, and attribution. Microsoft stated it adheres to Robot Exclusion Protocols and publisher directives.
The latest report builds on previous findings published by the Tow Center in November 2024, which identified similar accuracy problems in how ChatGPT handled news-related content. For more detail on the fairly exhaustive report, check out Columbia Journalism Review's website.