TL;DR

Machines finally understand human language. In response, developers stripped grammar out of their prompts to cut API costs. What the community now calls caveman prompting is behaviorally indistinguishable from 1990s keyword search, except the model is vastly more capable and the constraint is purely economic. This piece traces why it happened, why it works, and whether it’s permanent.

For roughly forty years, the central challenge in computing was getting machines to understand how humans speak. Not how we write code, but how we actually talk: vague, contextual, full of assumed meaning and grammatical shortcuts that make sense to anyone who grew up speaking a language and none at all to a parser scanning for exact string matches.

The early search engines were brutal about this. If you wanted to find a cheap Italian restaurant in Madrid in 1999, you didn’t type “What’s a good Italian restaurant in Madrid that won’t destroy my budget?” You typed “Italian restaurant Madrid cheap.” The machine couldn’t handle the rest. We adapted. We spoke in compressed noun strings, dropping verbs and prepositions and anything that looked like context. It felt undignified, but it worked.

Then natural language processing actually got good. Transformers. BERT. Semantic search. The machines started to understand what you meant, not just what you typed. You could ask a real question and get a real answer. The problem, after decades, was finally solved.

And then something funny happened.

The economics of a sentence

A language model doesn’t read your words the way you do. It breaks them into tokens, sub-word fragments, and processes each one through billions of parameters, calculating attention weights across every other token in the context. Every character costs compute. Every word costs money.
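
You can see the difference directly with a tokenizer. Here's a minimal sketch using OpenAI's open-source tiktoken library and the cl100k_base encoding used by GPT-4-class models; exact counts will vary by tokenizer, but the gap between conversational and compressed phrasing is consistent:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by GPT-4-class models.
enc = tiktoken.get_encoding("cl100k_base")

conversational = ("What's a good Italian restaurant in Madrid "
                  "that won't destroy my budget?")
compressed = "Italian restaurant Madrid cheap"

for label, text in [("conversational", conversational),
                    ("compressed", compressed)]:
    print(f"{label}: {len(enc.encode(text))} tokens")
```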

Sam Altman disclosed in April 2025 that the cost of processing “please” and “thank you” across OpenAI’s user base runs to tens of millions of dollars annually. A study using a Llama 3 8B model on an NVIDIA H100 measured 0.245 watt-hours per “thank you” interaction: roughly equivalent to powering a 5-watt LED for three minutes. Multiply across billions of interactions and you have a real number on a power grid.

At the API level, verbosity is a financial liability. Claude 3.5 Sonnet costs $3 per million input tokens. An automated support system handling ten thousand tickets a day with a polite, conversational system prompt can run hundreds of dollars more per month than one using stripped-down telegraphic commands. The grammar is not free.
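
The arithmetic is easy to sketch. The token counts below (450 for the polite prompt, 120 for the telegraphic one) are assumptions for illustration, not measurements; the pricing is Claude 3.5 Sonnet's published input rate:

```python
# Back-of-envelope cost of prompt verbosity at API scale.
# The per-prompt token counts are assumed, not measured.
PRICE_PER_MTOK = 3.00        # USD per million input tokens (Claude 3.5 Sonnet)
TICKETS_PER_DAY = 10_000
DAYS_PER_MONTH = 30

verbose_prompt_tokens = 450  # polite, conversational system prompt (assumed)
telegraphic_tokens = 120     # stripped-down equivalent (assumed)

def monthly_cost(tokens_per_call: int) -> float:
    calls = TICKETS_PER_DAY * DAYS_PER_MONTH
    return calls * tokens_per_call / 1_000_000 * PRICE_PER_MTOK

delta = monthly_cost(verbose_prompt_tokens) - monthly_cost(telegraphic_tokens)
print(f"monthly difference: ${delta:,.2f}")  # roughly $300/month at these figures
```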

Something familiar in the regression

Developers started optimising their prompts. They cut the pleasantries. They stripped conjunctions, transitional phrases, the conversational scaffolding. A prompt that once read “Could you please look at this contract and tell me if there are any hidden liabilities? Keep it brief and avoid legal jargon” became: Risk_Audit_MSA: Senior_Legal_Orchestrator: Zero_Legalese; Identify_Hidden_Liability; Bullet_Logic_Only.
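
For concreteness, here is how a prompt in that style might be wired into a real API call using the anthropic Python SDK. The prompt contents are illustrative, and contract_text is a placeholder:

```python
# Sketch: a telegraphic system prompt in an API call.
# Prompt text is illustrative, not a recommendation.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

contract_text = "..."  # the contract under review goes here

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=("Risk_Audit_MSA: Senior_Legal_Orchestrator: Zero_Legalese; "
            "Identify_Hidden_Liability; Bullet_Logic_Only"),
    messages=[{"role": "user", "content": contract_text}],
)
print(response.content[0].text)
```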

The community has started calling this “caveman prompting.” Token compression. Telegraphese for the API era.

In the 1990s, we typed “Italian restaurant Madrid cheap” because the machine couldn’t parse anything richer. Now we write something barely more structured because the machine charges by the word. The surface constraint is different. The behavioral adaptation is nearly the same.

There’s a direct precedent from before the internet. The electrical telegraph, billed by the character as it scaled across continents in the mid-19th century, produced an entire compressed style: ruthless about pronouns and filler, optimised for meaning per cent spent. Efficient. To an untrained eye, almost unreadable.

What we called “search speak” in the early internet era and “telegraphese” in the Victorian era, we now call prompt engineering. The machine has changed. The economic pressure to compress has not.

Where the analogy breaks down

Early keyword search worked through literal string matching. The engine looked for the exact words you typed and returned documents that contained them. “Italian restaurant Madrid” worked because those strings appeared in web pages. The machine wasn’t understanding anything; it was counting.

Caveman prompting works for the opposite reason. The model is powerful enough to reconstruct meaning from a fragment. When you write Risk_Audit_MSA: Zero_Legalese; Identify_Hidden_Liability, the model doesn’t pattern-match against that exact string. It infers relationships, intent, and expected output from the compressed signal, using the same transformer self-attention mechanism that lets it interpret full, grammatically correct prose.

The behavioral pattern looks like a regression. The underlying mechanism is an advance. We’re talking like cavemen to one of the most capable language systems ever built, and it understands us perfectly. That’s the part I keep coming back to.

The practical reality

Developers running production agentic systems report 60 to 80 percent reductions in token usage when switching to telegraphic prompts, often with no measurable drop in output quality. On tasks with strict logical requirements, such as code generation or data extraction, some report improvements: the compressed prompt forces the model’s attention onto functional variables rather than the social texture of the request.

Microsoft’s LLMLingua has automated this into an algorithm. A smaller language model pre-processes verbose prompts before they reach the primary model, stripping grammatical filler while preserving semantic content. At 20x compression ratios, benchmarks show only 1.5 points of performance degradation on reasoning tasks, while inference latency drops by more than half.
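
In code, the pipeline looks something like the sketch below, based on the llmlingua package's published interface. Exact argument names and defaults may differ between versions, and target_token is the compression budget handed to the smaller model:

```python
# pip install llmlingua
# Sketch based on the llmlingua project's README; exact arguments
# may vary between package versions.
from llmlingua import PromptCompressor

compressor = PromptCompressor()  # loads a small model to score token importance

verbose_prompt = (
    "Could you please take a careful look at this master services agreement "
    "and tell me if there are any hidden liabilities I should worry about? "
    "Keep the answer brief and avoid legal jargon wherever possible."
)

result = compressor.compress_prompt(verbose_prompt, target_token=20)
print(result["compressed_prompt"])  # stripped prompt sent to the primary model
print(result["origin_tokens"], "->", result["compressed_tokens"])
```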

The logical endpoint, apparently, is automated caveman translation.

The question worth asking

In one survey, 67 percent of users said they use polite language when interacting with AI. 55 percent do it because it feels like the right thing to do. And 12 percent do it to stay on the AI’s good side in the event of some future reckoning.

The prompt engineers are not among them.

The question is whether caveman prompting is temporary, something that dissolves as model costs fall toward zero and context windows expand, or the beginning of something more durable: a compressed human-to-machine grammar, alien to non-practitioners but standard in technical work.

My instinct, from watching SEO for twenty years, is that economic pressure produces behavioral change that outlasts the original constraint. We still write title tags in the compressed style we developed when character counts were a hard limit. I doubt we’ll go back to verbose prompts when tokens get cheaper.

We spent forty years teaching machines to understand us. We got maybe five years of conversational parity before the economics forced a new adaptation. The machines understand us fine. We just can’t afford to say so.


FAQ

What is caveman prompting?

Caveman prompting is a developer technique that strips natural language prompts down to compressed, grammar-free instruction strings to reduce the number of tokens sent to an LLM API. Instead of “Please review this contract for hidden liabilities,” you write something like Risk_Audit_MSA: Identify_Hidden_Liability; Zero_Legalese. The name comes from the way the input resembles fragmented prehistoric speech rather than fluent prose. Developers report token reductions of 60 to 80 percent with no meaningful loss in output quality on structured tasks.

Why does token count matter when using LLMs?

Language models charge by the token at the API level. A token is roughly three-quarters of a word in English. Every token processed, including punctuation, pronouns, polite phrasing, and conversational filler, incurs a real compute cost. At scale, a verbose system prompt running across thousands of daily API calls adds up to significant spend. Sam Altman publicly noted in 2025 that processing “please” and “thank you” across OpenAI’s user base costs tens of millions of dollars annually.

Is caveman prompting the same as 1990s keyword search?

Behaviorally, yes. Both involve humans compressing natural language into abbreviated fragments to communicate with a machine. But the underlying mechanism is the opposite. Early keyword search worked through literal string matching: the engine looked for those exact words in documents. Caveman prompting works because transformer models are so semantically capable that they reconstruct intent from compressed input. We’re relying on the AI being smart enough to fill the gap, not limited enough that it needs explicit syntax.

Does caveman prompting hurt output quality?

On structured tasks like code generation, data extraction, or logic-heavy analysis, the evidence suggests it doesn’t, and sometimes helps. The compressed prompt directs the model’s attention to functional variables rather than parsing social context. Microsoft’s LLMLingua compression algorithm achieves 20x compression with only a 1.5-point drop on reasoning benchmarks. On open-ended creative or conversational tasks, the tradeoffs are less clear, and this is where the out-of-distribution argument, that models trained on natural prose may perform worse on fragmented input, carries more weight.

Will compressed prompting become a permanent standard?

My honest view is yes, in technical environments. Economic incentives tend to embed behavioral change even after the original constraint is lifted. SEO practitioners still write compressed, keyword-dense title tags, a habit built when search engines were literal matchers, even though modern search understands intent. The more interesting question is whether caveman prompting stays confined to developers or spreads into the broader population as AI interfaces and their underlying economics become more visible to non-technical users.