As of 2026-05-18. Verification tooling is improving fast — specific products and model names will shift. The workflow below uses durable principles. For current tool guides and the exact code, see thepromptbench.com.
Why this matters
An AI assistant's confident answer is a hypothesis, not a fact. It might be right; it might be a hallucination. The fluency of the answer tells you nothing about its truth.
For low-stakes questions (drafting an email, brainstorming, getting a quick explanation of a topic you'll re-read later) you can mostly let this slide. For high-stakes questions — anything you'll act on, publish, send to someone, or base a decision on — you need to fact-check.
Fact-checking isn't theoretical. Real consequences of unverified AI output, all from public incidents over the past several years:
- Lawyers sanctioned for filing briefs with cases AI invented.
- Companies bound by promises their chatbots made.
- Medical errors when AI summaries were trusted without verification.
- Reputational damage from publishing AI articles with fabricated quotes and statistics.
The good news: fact-checking has gotten dramatically easier. Modern grounded models can verify claims in seconds against current web sources. The skill is knowing what to check and how.
What to fact-check (and what to skip)
You can't verify everything. The skill is identifying which claims need verification.
High priority to check:
- Specific numbers (statistics, prices, percentages, populations).
- Specific dates (when an event happened, when a paper was published).
- Specific names (of people, products, organizations, models, versions).
- Direct quotes attributed to named people.
- Citations (paper titles, book titles, author names, URLs).
- Causal claims ("X causes Y") backed by specific studies.
- Anything you'll act on financially, medically, legally.
Lower priority:
- General principles and definitions (usually well-trained).
- Common-knowledge facts (capitals, basic science, mainstream history).
- Structural framing of an argument.
- Stylistic choices.
- Drafts that you'll re-read carefully before using.
The 80/20 is: spend your fact-checking effort on the specific quantitative and named claims, less on the framing.
The workflow
A practical sequence for verifying an AI answer:
1. Extract the claims
Read the AI's output and list the specific factual claims. Be explicit. If the AI wrote "The unemployment rate fell to 3.5% in March 2024 according to BLS data" — that's actually three claims:
- The unemployment rate was 3.5% in March 2024.
- The number fell (was higher previously).
- The Bureau of Labor Statistics is the source.
Each is independently verifiable.
2. Pick the right tool for each claim
Different claims need different verification:
For published statistics (employment, inflation, populations): go directly to the issuing organization. BLS, Eurostat, World Bank, IMF, Our World in Data. These primary sources are authoritative and free.
For scientific claims: find the original paper. Google Scholar, the journal website, or the preprint server (arXiv, bioRxiv). Read the abstract and methods, not a press release.
For news events: cross-check across multiple independent reputable outlets. A claim in only one outlet, especially a partisan or low-credibility one, is weak evidence.
For legal/regulatory claims: official government sources. The actual statute, the court ruling, the regulator's website. Not a news summary.
For product/technology claims: official documentation, manufacturer's website, the technical spec, the changelog.
For citations of specific papers, books, articles: search the title. If you can't find the source the AI claimed, it's likely a hallucination.
For general factual claims without a specific source: use a grounded search model.
3. Use grounded models for cross-checks
When you don't already know the primary source, a grounded model is the fast path. Options:
- Perplexity (https://www.perplexity.ai/) — pure-play grounded search engine. Returns answers with inline citations. The Pro tier and Sonar Pro API are designed for fact-checking workflows.
- ChatGPT with browsing enabled — works similarly.
- Claude with web search — Anthropic's grounded mode.
- Google AI Mode / Gemini with web — Google's grounded answers.
- Bing Copilot — grounded with web search.
For each, the pattern is: ask the same factual question, see what sources are returned, click through, verify the source actually says what's claimed.
The grounded model's answer isn't itself the truth — the linked source is. The grounded model is the search tool; the source is the evidence.
4. Click through and read the source
This is the step that gets skipped most often. Don't trust a citation just because it exists — open it. Read the relevant paragraph. Confirm the claim is in fact what the source says.
Common patterns where AI gets sources wrong even with grounding:
- Source is real, but doesn't say what the AI claimed.
- Source says it, but with caveats the AI dropped.
- Source says it, but the source itself is unreliable (low-quality blog, predatory journal, content farm).
- Numbers in source don't match what AI quoted (rounding, year mix-up, different metric).
The act of clicking through and reading is the fact-check. Without it, you're trusting the AI's summary of a source you haven't seen.
5. Use code execution for math
For any computation in an AI's answer — percentages, conversions, projections, anything involving numbers — don't trust the math. Either redo it yourself or have the AI execute the calculation in code.
Modern tools (ChatGPT, Claude, Gemini) offer code execution. Ask the AI to compute via Python or JavaScript, which actually runs and produces verifiable results. Pure language model arithmetic is unreliable.
6. Cross-check across two different models
For especially high-stakes claims, ask the same question to two different model families (e.g., Claude and ChatGPT, or Perplexity and Gemini). Different training, different weights, different hallucinations.
- Both agree, with sources: high confidence.
- Both agree, without sources: medium confidence; do go check the underlying source.
- They disagree: at least one is wrong. Investigate.
- One says "I don't know": that's actually a signal — pay attention to the cautious answer.
A scalable pattern: Sonar Pro via OpenRouter
For articles published on this site (and at thepromptbench.com), the editorial workflow includes an automated fact-check pass:
- The article is written without web access — drafted from general knowledge.
- The article is sent to
perplexity/sonar-provia the OpenRouter API. - The model performs grounded search across current web sources and returns structured findings: each specific claim labeled as
ok,outdated,wrong,unverifiable, ormissing, with source URLs for corrections. - The article is edited to fix flagged claims.
- The corrected article is published.
The tool is open and reusable. In this repository, see tools/fact-check.mjs for the implementation. Run with:
node --env-file=.env tools/fact-check.mjs --site feynmanpedia --slug my-article
The cost is a few cents per article-check; the time is ~30-60 seconds; the output is a structured JSON file of findings to act on.
This isn't unique — fact-check pipelines exist at major publishers and AI companies. The point is: the operational pattern works at scale, and you can build a personal version of it without much effort. For longer guides on building this kind of pipeline, see thepromptbench.com.
Where automated fact-checking still fails
Automated fact-checking is helpful but not magical. Specific weaknesses:
Recently changed facts. If the grounded model's search index hasn't updated yet (some lag is normal), recent changes will be missed.
Contested facts. Where sources disagree, the grounded model picks one. Sometimes the wrong one. Manual judgment is still needed for contested topics.
Domain-specific accuracy. Grounded models are generalists. For deeply specialized topics (legal precedent, scientific frontier research, niche technical specs), they can miss nuance that a human expert would catch.
Subtle misrepresentation. A claim can be technically true but misleading. Automated fact-checking catches outright falsehoods better than slight distortions.
Citation laundering. Sometimes sources cite each other in loops, all tracing back to a single weak original. Following the chain takes more care than a single fact-check.
The right model: automated fact-checking catches the obvious errors, leaving you a much smaller set of judgment calls. It doesn't replace careful editing — but it eliminates the bulk of routine errors.
What "verified" actually means
Some honesty: even with all this, complete verification of every claim isn't realistic. Editorial pipelines, including this one, end up with a confidence gradient:
- Explicitly checked against a primary source: high confidence. Should be the default for numerical, named, dated, and quoted claims.
- Cross-checked across multiple grounded models or summaries: medium-high confidence. Reasonable for general factual claims.
- Drawn from training knowledge, not specifically checked: low confidence. Acceptable for principles and well-known facts; risky for specifics.
A useful habit: flag your own confidence. When you write or speak based on an AI answer, distinguish "I checked this against the BLS" from "the AI told me this and I haven't checked." Both are sometimes appropriate; conflating them is how errors propagate.
A simple personal workflow
If you don't want to build pipelines but want to fact-check casual AI answers:
- Ask the AI your question.
- Read the answer. Identify any specific claims you'd actually rely on.
- Open Perplexity (or your grounded model of choice). Ask the same factual question.
- Compare sources. Click through to the primary source if it matters.
- Update your mental model based on what the primary source actually says.
Takes 2-3 minutes. Catches most fabrications. Worth the time for anything important.
If you'd like a guided 5-minute course on AI fact-checking workflows, NerdSip can generate one on your phone.
The takeaway
An AI answer is a hypothesis to verify, not a fact to repeat. The workflow: extract specific claims, find primary sources for the high-stakes ones, use grounded search models for cross-checks, click through and read what sources actually say, use code execution for math, and cross-check across models for the critical claims. For scaled or repeated workflows, automated fact-checking with grounded models like Perplexity Sonar Pro (via OpenRouter or directly) works well — it's the same pattern used to verify the article you're reading. The deeper tooling and pipeline guides live at thepromptbench.com; the everyday habit is just "click through and read the source."