AI · info · Updated 2026-04-17 · Originally published 2026-04-02 · 18 min read

Your AI Chatbot Is a Fancy Calculator. Here Is Why.

LLMs are next-token prediction engines, not reasoning machines. A technical takedown of AI sentience claims with implications for cybersecurity, social engineering, and threat intelligence.

Phillip (Tre) Bucchi · Founder, Valtik Studios · Penetration Tester

Founder of Valtik Studios. Penetration tester. Based in Connecticut, serving US mid-market.

Let me take some of the AI panic down a notch

Every few weeks a new "AI is sentient" claim goes viral. A chatbot said something eerily self-aware. A model refused a task and "begged" not to be shut down. A researcher leaked a conversation that sounds like HAL 9000.

I understand the instinct. I've read those same conversations. They're weird. They don't feel like talking to a regular piece of software. But they also aren't evidence of anything like human consciousness, and the more you understand about what the model is actually doing under the hood, the less sense the "sentient" framing makes.

This post walks through the technical reality of what happens when you send a prompt to ChatGPT, Claude, or Gemini. Not to downplay the impressive capabilities, which are real, but to separate those capabilities from the sci-fi framing that keeps making this debate incoherent.

What an LLM does

Strip away the marketing and the breathless headlines. A large language model does one thing: it predicts the next token in a sequence.

A token is a chunk of text, typically 3 to 4 characters. The model's vocabulary contains roughly 50,000 to 100,000 tokens. When you type a prompt, the model converts your text into tokens, passes them through dozens of transformer layers using self-attention (matrix multiplication that learns which words relate to which other words) and feed-forward networks, then produces a probability distribution over all possible next tokens. It samples one, appends it, and repeats.

No reasoning engine. No world model. No internal monologue. No goals. Just matrix multiplication producing a probability distribution over the next token.
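To make that loop concrete, here is a minimal sketch in Python. The toy logits_table below stands in for the trained transformer; a real model computes those scores with attention and feed-forward layers over the entire context, but the outer loop is the same: score every token in the vocabulary, softmax into probabilities, sample one, append it, repeat.

import numpy as np

# Toy stand-in for a trained LLM: a table of next-token scores (logits).
# A real model replaces this lookup with dozens of transformer layers,
# but the generation loop around it is identical.
VOCAB = ["<s>", "the", "cat", "sat", "on", "mat", "."]
IDX = {tok: i for i, tok in enumerate(VOCAB)}

rng = np.random.default_rng(0)
logits_table = rng.normal(size=(len(VOCAB), len(VOCAB)))  # fake "weights"

def next_token_probs(tokens):
    """Probability distribution over the whole vocabulary for the next token."""
    logits = logits_table[IDX[tokens[-1]]]      # score every candidate token
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    return exp / exp.sum()

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        choice = rng.choice(len(VOCAB), p=probs)  # sample one token
        tokens.append(VOCAB[choice])              # append and go again
    return " ".join(tokens)

print(generate(["<s>", "the"]))

Everything the chatbot "says," including the eerie, self-aware-sounding stuff, comes out of that loop, one token at a time.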

The Chinese Room

In 1980, philosopher John Searle published the most famous thought experiment in philosophy of mind. You're locked in a room. Chinese characters arrive. You have a rulebook: see this sequence, respond with that sequence. You follow the rules perfectly. People outside believe you understand Chinese. But you understand nothing. Symbol manipulation without comprehension.

An LLM is the room. The trained weights are the rulebook. The tokens are the characters. Syntax without semantics.

Stochastic parrots

In 2021, Bender, Gebru, and colleagues published "On the Dangers of Stochastic Parrots" at ACM FAccT. Core argument: LLMs stitch together linguistic forms from the training data according to probabilistic patterns, without any reference to meaning. A parrot produces convincing speech without understanding. An LLM produces convincing text the same way.

The paper raised a critical concern about deceptive fluency. When output is grammatically perfect and contextually appropriate, humans attribute understanding. This is cognitive bias, not evidence of intelligence.

Emergent abilities are a measurement artifact

In 2023, Stanford researchers showed that 92% of claimed "emergent abilities" used discontinuous metrics that gave 0 for imperfect answers. With continuous metrics measuring partial correctness, the emergence disappeared. Performance improved smoothly with scale. No phase transition. The emergence narrative drove billions in investment on a false premise.
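Here is a toy illustration of that measurement effect. The numbers are invented and this is not Schaeffer et al.'s exact setup; assume per-token accuracy improves smoothly with scale, and the benchmark only gives credit when a 10-token answer is exactly right.

import numpy as np

# Invented numbers to show the shape of the effect, not real measurements.
scales = np.logspace(0, 4, 9)                         # pretend model scales
per_token_acc = 0.5 + 0.45 * (np.log10(scales) / 4)   # smooth 0.50 -> 0.95

answer_len = 10
exact_match = per_token_acc ** answer_len             # all-or-nothing metric

for s, cont, disc in zip(scales, per_token_acc, exact_match):
    print(f"scale {s:>8.0f}   per-token {cont:.2f}   exact-match {disc:.3f}")

The continuous metric climbs steadily from 0.50 to 0.95. The all-or-nothing metric sits near zero for most of the range and then shoots up at the largest scales, which is exactly the kind of jump that gets labeled "emergence."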

Scaling has a ceiling

Each additional order of magnitude of compute, a tenfold increase, yields only 1 to 2 percentage points of benchmark improvement. Training data is running out. Inside the labs, consensus is shifting: more data and compute alone won't produce AGI.
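A purely illustrative sketch of what those diminishing returns look like. The constants are assumptions, not measurements from any lab; the point is the shape of the curve when benchmark score grows with the logarithm of compute.

import math

# Illustrative only: if score grows with log10(compute), every extra tenfold
# increase in compute buys the same small, fixed increment.
def benchmark_score(compute_flops, base=70.0, points_per_tenfold=1.5):
    return base + points_per_tenfold * math.log10(compute_flops / 1e21)

for flops in (1e21, 1e22, 1e23, 1e24, 1e25):
    print(f"{flops:.0e} FLOPs -> {benchmark_score(flops):.1f} points")

Under that assumption, 10,000 times more compute moves the score by about six points. Argue over the constants all you like; the curve bends the same way.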

Yann LeCun (Meta): "We're not going to get to human-level AI by scaling LLMs." Gary Marcus: "LLMs aren't AGI. And on their own never will be." Yoshua Bengio: doesn't believe current systems are sentient.

The energy reality

Training GPT-3 consumed 284,000 kilowatt-hours. GPT-4 reportedly cost over $100 million. A single ChatGPT query uses 0.34 watt-hours. Inference accounts for over 90% of total power consumption. This is a calculator that requires a power plant.
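A quick back-of-the-envelope using that 0.34 watt-hour figure. The daily query volume below is an assumption for illustration, not a published number.

WH_PER_QUERY = 0.34                      # per-query figure cited above
QUERIES_PER_DAY = 1_000_000_000          # assumed: ~1 billion queries a day

daily_mwh = WH_PER_QUERY * QUERIES_PER_DAY / 1_000_000   # Wh -> MWh
yearly_gwh = daily_mwh * 365 / 1_000                     # MWh -> GWh
print(f"{daily_mwh:,.0f} MWh per day, {yearly_gwh:,.0f} GWh per year")

That works out to roughly 340 MWh a day, about 124 GWh a year, for inference alone, before a single training run is counted.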

Why this matters

Misplaced trust: people share sensitive information with chatbots believing the system understands confidentiality.
Security implications: AI-generated phishing works because people overestimate AI capabilities.
Market distortion: AGI hype diverts resources away from problems that can actually be solved.

AI is a powerful tool. LLMs are genuinely useful. But they're tools, not minds.

Sources

  1. Vaswani et al., "Attention Is All You Need" (2017)
  2. Searle, "Minds, Brains, and Programs" (1980)
  3. Bender, Gebru et al., "On the Dangers of Stochastic Parrots" (2021)
  4. Schaeffer et al., "Are Emergent Abilities of Large Language Models a Mirage?" (2023), Stanford HAI
  5. LeCun, "Meta Chief AI Scientist Questions LLMs" (2025)
  6. Marcus, "The Great AI Retrenchment" (2025)
ai security · llm · threat intelligence · social engineering · machine learning · technical explainer · research
