Should I trust an AI's answer?

Conditionally. For general understanding, structure, or first drafts, AI is usually reliable. For specific facts (numbers, dates, names, the current state of fast-moving topics), it is unreliable enough that you should verify against primary sources. Reliability also varies by model and task: frontier models with web access are noticeably more reliable than older models running offline. Treat AI like an enthusiastic colleague whose work you'd still skim before forwarding.

Is AI replacing thinking?

Only if you let it. Used well, AI offloads tedious parts (formatting, research summaries, first drafts, explaining unfamiliar jargon) and frees attention for the parts that matter (judgment, decisions, what to actually do). Used badly, it substitutes for thinking and you stop noticing when its answers are wrong. The skill is recognizing which mode you're in and adjusting.

Which AI should I use?

It depends on the task, and the answer shifts constantly. For long-form reasoning and writing, frontier models from Anthropic (Claude) and OpenAI (ChatGPT) are typically strongest. For up-to-date information, models with grounded web search (Perplexity, Google's Gemini with search, ChatGPT with browsing) outperform pure language models. For technical and coding tasks, both Claude and ChatGPT are widely used. For privacy-sensitive work, local models running on your own hardware are an option, covered in detail at [thepromptbench.com](https://thepromptbench.com/local-models/). The 'best' model in any specific month is a moving target.

How to Use AI Well — Principles for a Tool That Sometimes Invents Things

As of 2026-05-18. The AI landscape shifts month by month; specific model names and capabilities in this article may move on. For current guides on individual models and tools, see thepromptbench.com.

A short, accurate description

Modern AI assistants (ChatGPT, Claude, Gemini, others) are large language models: programs trained on enormous amounts of text that learned to predict what word should come next. With enough training, the ability to predict text well becomes the ability to write essays, answer questions, summarize documents, draft emails, write code, and discuss ideas.

They are useful enough that hundreds of millions of people use them weekly. They are also imperfect enough that you can't trust everything they say. The skill of "using AI well" is mostly the skill of getting the useful parts without being misled by the wrong parts.

A useful mental model: think of an AI assistant as a confident intern.

It writes pretty well, faster than you.
It knows a surprising amount about a wide range of topics.
It can be wrong in ways that LOOK confident.
It will make things up rather than say "I don't know" unless you remind it not to.
Its work is worth checking before you forward it.

If you'd hand-edit an intern's draft before sending it to a client, you should hand-edit an AI's draft before sending it anywhere that matters. If you'd accept an intern's quick summary of a topic for your own reading, you can accept an AI's similar summary. The difference is calibration: a senior person knows when they need to verify and when they don't.

What AI is genuinely good at

Some patterns where modern AI consistently helps:

First drafts. Writing emails, articles, code, presentations, plans. The blank page is the hardest part of writing for many people; AI eliminates it. The first draft will need editing, but you've skipped the slowest step.

Summarization. Compress a long document into key points. Pull the main argument from a research paper. Distill a long conversation. AI is good at this, with the caveat that it can occasionally invent details — verify critical numbers against the source.

Explaining unfamiliar territory. Need a quick orientation to a new field, technology, or concept? AI is excellent at this. Ask it to explain like you're a beginner, then ask follow-up questions about specifics. (This is exactly how Feynmanpedia gets built, with editorial fact-checking on top.)

Brainstorming. Ten ways to think about a problem. Counterarguments to a position. Names for a project. The AI's outputs aren't the final answer but they expand your search space.

Translation. Especially between widely spoken languages. Quality is high, and often better than that of older machine translation systems.

Code assistance. Writing snippets, explaining unfamiliar libraries, debugging, code review. Especially useful for developers working in less-familiar languages or frameworks.

Formatting and restructuring. Turn this list into a table. Convert this Markdown to HTML. Rewrite this in a more formal tone. AI is fast and reliable for these mechanical transformations.

Practice partner. Roleplay an interview. Pretend to be a customer with a complaint. Quiz me on this material. AI doesn't get tired and won't judge your bad first attempt.

What AI is bad at

Equally important:

Specific facts you can't easily verify. Dates, names, numbers, quotes, statistics, exact technical specifications. AI will fluently generate plausible-sounding values that may be wrong. Default to checking these against a primary source.

Anything after the training cutoff. Models have a training cutoff and don't know what has happened since (unless they have web search). They sometimes don't know what their own cutoff is. They can confidently answer questions about recent events using stale or imagined information.

Math at scale. Arithmetic on small numbers is fine. Long calculations, complex equations, financial computations — AI makes errors. Use a calculator, spreadsheet, or code execution instead. Many AI tools now offer integrated code execution which avoids this.
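
As an illustration of "use code instead", here is a minimal sketch of a financial calculation done in Python rather than asked of a chatbot. The function name `compound_balance` and the example figures (10,000 at 5% for 3 years) are made up for this example. It assumes simple yearly compounding, which may not match how a real account works.

```python
from decimal import Decimal, ROUND_HALF_UP


def compound_balance(principal: str, annual_rate: str, years: int) -> Decimal:
    """Balance after `years` of yearly compounding, rounded to cents.

    Amounts are passed as strings so Decimal stores them exactly.
    """
    balance = Decimal(principal)
    rate = Decimal(annual_rate)
    for _ in range(years):
        balance *= 1 + rate
    return balance.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical figures: 10,000 at 5% per year for 3 years.
print(compound_balance("10000", "0.05", 3))  # 11576.25
```

The point is less the formula than the workflow: the computer does the arithmetic deterministically, and you can rerun it with different inputs and get the same kind of answer every time.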

Reasoning about itself. AI is unreliable when asked about its own capabilities, what it can and can't do, its limitations, its training. The answers sound authoritative but are often wrong.

Citations. AI without grounded web search frequently invents plausible-sounding citations — book titles, paper authors, URLs that don't exist. Even with grounded search, double-check that linked sources actually say what's attributed.

Sustained complex reasoning. For multi-step problems with many constraints, AI can lose track or make errors mid-chain. Newer "reasoning" models (with extended thinking) handle these better but aren't infallible.

Anything involving real-time or local data. What's the weather. What's my account balance. Did this happen this morning. These need real-world integrations, not language models alone.

Strong moral judgment in your specific situation. AI is calibrated to give cautious, generic advice. For decisions specific to your life, your values, your stakes — AI's input is one signal, not the answer.

The core habits

A few practices that compound across uses:

1. Be specific in what you ask for.

Don't write: "Write me an email."

Do write: "Write a polite but firm email to my landlord saying the dishwasher has been broken for two weeks and I need it fixed by Friday. Keep it under 100 words. Don't be apologetic."

The more concrete you are, the closer to useful the output is. (Deep dive on prompting at thepromptbench.com/prompting/.)

2. Iterate; don't accept the first answer.

The first output is rarely the best. "Make it shorter." "Less corporate." "Add an example." "Try a different angle." Modern AI handles refinement well. Treat the conversation as a dialogue, not a query.

3. Ask for the AI's reasoning when it matters.

"Walk me through why you think that." This both helps you evaluate the answer and gives the model a chance to catch its own errors. Many models perform meaningfully better when asked to explain their thinking step by step.

4. Ask for sources — and check them.

"What's the source for that number?" If the AI cites a source, verify the source exists and says what the AI claimed. If the AI can't give a source, treat the claim as a hint, not a fact. (More in how to fact-check an AI answer.)

5. Use grounded search for current information.

For anything time-sensitive (recent news, current prices, latest research), use an AI tool with web search, such as Perplexity, ChatGPT with browsing, Claude with web search, or Gemini. Pure language models tend to hallucinate dates and "facts" about anything after their training cutoff.

6. Verify high-stakes outputs differently than low-stakes ones.

Drafting a personal email? Read it and send. Writing a legal contract clause, a medical decision, a financial calculation, code that handles real money? Stop. Verify against primary sources. Get a human expert to review.

7. Know which model you're using.

Frontier models (current Claude, ChatGPT) are noticeably more reliable than older models. Free tiers often use smaller models. Browser-based "AI" features can be very different from a top-tier API model. If you're getting bad outputs, check whether you're on the latest version. (Comparison tables at thepromptbench.com/models/.)
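
Habit 4 above (ask for sources, then check them) can be made a little less tedious with a small helper. The sketch below only pulls the links out of an AI answer so you can open each one yourself. It does not check whether a page exists or says what the AI claimed, which still has to be done by hand. The function name `extract_urls`, the simple URL pattern, and the sample answer (including its "40%" figure and example.com links) are all invented for illustration.

```python
import re

# Deliberately simple pattern: stops at whitespace and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def extract_urls(answer: str) -> list[str]:
    """Return each distinct URL found in `answer`, in order of first appearance."""
    seen: list[str] = []
    for match in URL_PATTERN.findall(answer):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


# Made-up AI answer, for illustration only.
answer = (
    "Usage grew 40% last year (https://example.com/report). "
    "See also https://example.org/data, and again https://example.com/report."
)

for url in extract_urls(answer):
    print("check by hand:", url)
```

If an answer contains a claim with no link at all, that claim falls into the "hint, not fact" category described in habit 4.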

What good AI use looks like in practice

A handful of concrete examples:

Doctor, looking up a drug interaction: asks AI for a summary, then verifies against the official drug-interaction database. AI accelerates the lookup but isn't trusted as the only source.

Manager, writing a performance review: drafts bullet points, has AI expand into prose, edits the prose heavily to add specific examples and adjust tone. AI saves drafting time; the manager owns the content and judgment.

Researcher, exploring a new field: uses AI for orientation ("explain the key papers in topic X"), then reads the actual papers. AI as table of contents; primary sources as truth.

Developer, learning a new framework: asks AI how to do common tasks, tests the code, reads framework docs for details. AI as fast first explanation; docs as reference truth.

Writer, struggling with structure: dumps a messy draft, asks AI to suggest a clearer structure, picks the parts that work, writes the polished version themselves. AI as collaborator on structure; the writer owns the voice.

Student, studying for an exam: uses AI to quiz themselves on material from textbook chapters. AI as practice partner; the textbook as canonical content. (See Feynman method for the broader study technique.)

What these have in common: AI is contributing to the work, not doing the work alone. The human is making the calls that matter.

What bad AI use looks like

Common failure modes:

Forwarding AI output unchecked. Especially common with summaries, reports, and legal drafts. Whatever errors the AI made are now yours. The most-cited example is the 2023 case of a US lawyer who submitted a brief citing cases that did not exist; the AI had fabricated them.

Believing the AI on time-sensitive facts. "What's the current capital gains tax rate?" "What's Apple's latest iPhone called?" "Who won the election?" Pure language models often answer these confidently and wrongly. Use grounded search.

Skipping the verification step on numbers. AI-hallucinated percentages, financial figures, and statistical claims sound authoritative. Always check them against the source.

Letting AI think for you on decisions that matter. AI can lay out options for a life decision; it can't actually have your values, weigh your trade-offs, or live with the result. Use it for input, not for choice.

Trusting AI on itself. "Can you do X?" "Are you allowed to discuss Y?" "How were you trained?" Answers are usually polite, plausible, and unreliable.

Outsourcing judgment. "Should I take this job?" "Should I trust this person?" AI doesn't know your specific situation or values. It will give generic advice that sounds tailored. Make your own call.

A note on getting better at this

AI literacy is a real skill. It's not just about knowing which model is best this month — it's about developing intuitions for when AI is being reliable vs hallucinating, which kinds of tasks suit it, and how to phrase requests so you get useful output.

The fastest way to build the skill: use it for things you already know about. You'll see where it's right, where it's wrong, where it confidently invents. Then extrapolate that calibration to areas you know less about — but always verify when it matters.

For the deeper craft of prompting (system prompts, structured outputs, retrieval-augmented generation, agent workflows, local models), thepromptbench.com is the dedicated resource. This article is the everyday-user version; that's the working-engineer version.

If you'd like a guided 5-minute course on using AI tools effectively, NerdSip can generate one on your phone.

The takeaway

Modern AI is a competent collaborator that occasionally invents things confidently. Use it like a confident intern: lean on it for drafting, summarizing, brainstorming, explaining; verify for specific facts, dates, numbers, names, and anything time-sensitive. The compound skills are writing clear prompts, asking the AI to show its reasoning, asking for sources and checking them, and using grounded-search models for current information. Used well, it's a multiplier; used badly, it substitutes for thinking. The deeper prompting craft lives at thepromptbench.com; this article is the everyday-citizen orientation.
