What hallucination actually is
When a language model produces confident-sounding text that's factually wrong, it's "hallucinating." The term is borrowed from psychology, but it's a bit misleading — the model isn't experiencing anything. It's just generating its best-guess plausible continuation, and the continuation happens not to match reality.
Examples:
- Asked who wrote a non-existent academic paper, the model invents a plausible-sounding author and citation.
- Asked the temperature at a specific city on a specific date, the model gives a confident number that's nowhere near the actual value.
- Asked to summarize a book the model has never read, it produces a plausible-sounding summary that doesn't match the actual book.
The output is fluent. The output is confident. The output is wrong. That's hallucination.
Why it happens
LLMs are trained to predict the next word (technically, the next "token") in text, given the previous words. They learn what's plausible, not what's true.
If the training data contained many sentences like "Albert Einstein was born in 1879 in Ulm," the model learns to fill in those facts when prompted. The pattern-matching produces correct answers.
But what if you ask about something that's poorly represented in the training data, contradicted across sources, or specific to a date the model can't have known about? The model still generates the most plausible-sounding continuation, drawing on patterns from similar contexts. That continuation is fabricated, yet it has the grammar, vocabulary, and confident style of the training corpus. It just doesn't match reality.
There's no "is this true?" check anywhere in the process. The model is doing one thing — sampling from a learned probability distribution over next tokens — and "truth" doesn't factor in.
Why models don't just say "I don't know"
This is the most common follow-up question. The answer involves both training and inference dynamics.
Training data is mostly confident. The internet, books, and articles are dominated by declarative sentences. People who don't know things tend not to write authoritative-sounding paragraphs about them. So the model learns to imitate confident text more than humble text.
Post-training tries to compensate. Models like ChatGPT and Claude go through "instruction fine-tuning" and "reinforcement learning from human feedback" (RLHF) where humans rate responses. Higher ratings go to responses that admit uncertainty when appropriate. This helps but doesn't fully fix the issue.
The model doesn't have a separate "knowing" channel. When it's not sure, there's no internal signal saying so. It's just generating the next plausible token. Some research is exploring "calibration" — getting the model to estimate its own confidence — but it remains an open problem.
Newer models (GPT-4, Claude 4.x, Gemini 2+) say "I don't know" or "I'm not sure" more often than older ones. But they still hallucinate confidently in many cases.
Where hallucination is most dangerous
Common hallucination domains:
Specific facts. Names of people, dates, numbers, citations, URLs, addresses. These are the easiest to spot and verify.
Recent events. Anything after the model's training cutoff. The model will sometimes confidently make up information about events it can't possibly know about.
Highly technical claims. When the model is asked about specialized topics (medicine, law, advanced physics), small errors can pass as expert knowledge because the response sounds right.
Reasoning chains. Plausible-sounding logic that contains a hidden flaw. Easier to miss than wrong facts because each step looks reasonable.
Citations and quotes. Fabricated case law and academic citations have been one of the most embarrassing examples. Lawyers have been sanctioned for filing briefs containing AI-generated case citations that didn't exist. Researchers have submitted papers with fake references.
For consequential decisions, never trust an unverified LLM output.
How hallucination can be reduced
Several approaches are being deployed:
Retrieval-Augmented Generation (RAG). Instead of relying on what the model "knows," look up real information and feed it to the model as context. The model then generates text grounded in the retrieved documents. This dramatically reduces fact-hallucination because the model has real source material to draw from.
This is how most production "AI assistants" work today: a search step finds relevant information, and the LLM is asked to generate an answer using it. Hallucination drops from frequent to rare for questions the retrieved documents actually cover.
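In code, the pattern is mostly prompt construction: retrieve, paste the sources into the prompt, and instruct the model to stay inside them. A minimal sketch, where `keyword_overlap`, `retrieve`, `build_rag_prompt`, and `call_llm` are all illustrative names rather than any particular library's API:

```python
def keyword_overlap(doc, query):
    # Crude relevance score: number of shared words. Real systems use
    # embedding or keyword search engines; this just makes the sketch run.
    return len(set(doc.lower().split()) & set(query.lower().split()))

def retrieve(query, documents, k=3):
    # Return the k documents most relevant to the query.
    return sorted(documents, key=lambda d: keyword_overlap(d, query), reverse=True)[:k]

def build_rag_prompt(query, documents):
    context = "\n\n".join(retrieve(query, documents))
    return (
        "Answer the question using ONLY the sources below. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# call_llm() stands in for whatever chat-completion API you use:
# answer = call_llm(build_rag_prompt("When did the library open?", my_documents))
```

The grounding comes from the prompt, not from the model: if the answer isn't in the retrieved sources, the instruction to say "I don't know" is the only thing standing between you and a fabrication, which is why retrieval quality matters so much.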
Tool use. Modern LLMs can call external tools — calculators, search engines, code interpreters, specialized APIs. For factual questions, they're trained to use these rather than rely on internal knowledge.
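Under the hood this is usually a dispatch loop: the model either answers directly or emits a structured request, the application runs the tool, and the result is fed back in. A sketch of that loop, assuming the model returns tool requests as JSON (the format and the `call_llm` helper are illustrative, not any vendor's actual API):

```python
import json

def calculator(expression):
    # Deliberately restricted: digits and arithmetic operators only,
    # so eval() cannot run arbitrary code.
    if not set(expression) <= set("0123456789+-*/(). "):
        raise ValueError("unsupported expression")
    return eval(expression)

TOOLS = {"calculator": calculator}

def run_with_tools(user_question, call_llm):
    # The model is prompted to either answer directly or request a tool call
    # as JSON, e.g. {"tool": "calculator", "input": "17 * 23"}.
    reply = call_llm(user_question)
    try:
        request = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain-text answer, no tool requested
    if not isinstance(request, dict) or request.get("tool") not in TOOLS:
        return reply
    result = TOOLS[request["tool"]](request["input"])
    # Feed the tool's result back so the final answer is grounded in it
    # instead of in the model's internal guess.
    return call_llm(f"{user_question}\nTool result: {result}\nFinal answer:")
```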
Confidence calibration. Active research area: get the model to estimate its uncertainty and abstain when uncertain.
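One simple version of this, assuming your API exposes per-token log-probabilities (many do, though the field names vary), is to abstain whenever the model's own probability for its answer is low. A sketch with an illustrative threshold:

```python
import math

def answer_or_abstain(answer_text, token_logprobs, threshold=0.7):
    # token_logprobs: log-probability the model assigned to each token of its
    # own answer, as returned by APIs that expose logprobs.
    avg_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    if avg_prob < threshold:
        return "I'm not confident about this; please verify it independently."
    return answer_text
```

Token probability is a crude and often poorly calibrated proxy for factual confidence, which is exactly why calibration is still a research problem rather than a solved feature.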
Verifiable reasoning. Some approaches force the model to lay out its reasoning step by step ("chain of thought") so errors can be caught.
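In practice this is often just prompt structure. A sketch of one common phrasing (the wording is illustrative, not a standard):

```python
def chain_of_thought_prompt(question):
    # Ask for numbered steps so a reviewer (human or automated)
    # can check each one for a hidden flaw.
    return (
        f"{question}\n\n"
        "Think step by step. Number each step, state any fact or assumption "
        "the step relies on, and put the final answer on its own line "
        "starting with 'Answer:'."
    )
```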
Specialized models. Models fine-tuned on specific domains (medical, legal, scientific) hallucinate less within those domains because the training distribution is closer to the use case.
None of these eliminates hallucination entirely. But they reduce it significantly for many use cases.
The deeper architectural issue
Hallucination is a symptom of a deeper structural feature of LLMs: they don't have an explicit world model.
When asked a question, humans typically retrieve relevant memories, check them against what they already know, notice gaps, and either answer or admit ignorance. People have a (rough, imperfect) sense of what they know and don't know.
LLMs don't have this. They have a single end-to-end function that maps input text to output text. There's no separate "facts" database, no consistency checking, no explicit metacognition. The architecture doesn't include these things, so the model can't perform them.
Hybrid approaches that combine LLMs with structured knowledge bases, factuality checkers, or symbolic reasoning systems may eventually reduce hallucination further. But for end-to-end neural language models, some hallucination is built into the architecture.
How to use LLMs anyway
Given hallucinations exist, here's how to use LLMs responsibly:
- Verify facts. Anything load-bearing should be double-checked against a primary source.
- Use them for tasks where being approximately right is enough. Brainstorming, drafting, summarizing pre-supplied documents, code generation (where you can test the output — see the sketch after this list).
- Be wary of citations and numbers. Especially fake citations to non-existent papers.
- Prefer tool-augmented systems. ChatGPT with web browsing, Claude with code execution, or Perplexity with citations are much more reliable than a vanilla LLM.
- Treat the output as a draft. It's a starting point, not an authoritative answer.
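As one concrete version of "test the output": treat model-written code as untrusted until it passes checks you wrote yourself. A minimal sketch, where `model_written_slugify` stands in for whatever the model generated for you:

```python
# Suppose the model generated this function for you (illustrative example).
def model_written_slugify(title):
    return "-".join(title.lower().split())

# Your own tests, written independently of the model, decide whether to keep it.
def test_slugify():
    assert model_written_slugify("Hello World") == "hello-world"
    assert model_written_slugify("  Extra   spaces ") == "extra-spaces"

test_slugify()
print("generated code passed the checks you wrote")
```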
In domains where hallucination matters most (medicine, law, technical specifications), don't rely on LLM output without human expert review.
The takeaway
LLMs hallucinate because they're trained to produce plausible-sounding text, and there's no internal verification of truth. When prompts ask for information that's well-represented in training data, they're usually right. When the information is missing, contradictory, or post-training-cutoff, they generate confident-sounding fabrication. Mitigation through retrieval, tool use, and confidence calibration helps but doesn't eliminate the issue. Until architectures change fundamentally, hallucination is a feature of how LLMs work, not a bug to be fully fixed. Knowing this is the first defense.