June 20, 2026 · 7 min read

From Tokenmaxxing to Tokenoptimizing

AI cost · Agents · Michii

Here is a contradiction on every AI invoice I look at: the price of a token has barely moved, yet the bill keeps climbing, and fast. Both are true at once, and the reason is simple once you see it. The cost of AI is going to keep increasing. Not because anyone is overpaying, but for structural reasons that aren't going to reverse. So the useful question isn't how to make the bill small. It's what each token actually buys, because, as we'll see, not all tokens buy the same thing.

The paradox has a name

In the 1860s, William Jevons noticed something strange about coal. As steam engines got more efficient, England didn't burn less coal — it burned more. Cheaper, more efficient use of the resource made it so useful that total consumption rose faster than efficiency improved. The same happened with computing, with bandwidth, with cloud storage. Every time a unit got cheaper, we found so many new uses for it that total spend went up, not down.

Tokens are next, and this is the engine behind the rising bill. The per-token price is flat to falling. Models get cheaper and smarter every quarter. But that is exactly why spend rises: cheaper, smarter agents get pointed at far more work than expensive, dumber ones ever could. The model didn't get more expensive. The system around it did — and we built a lot more of it.

2024 versus 2026

A 2024 task was one model call costing cents; a 2026 task is a dynamic agent system — a main agent spawning subagents and an evaluator, each with its own context, costing dollars

In 2024, one task was basically one call to the model. A question went in, an answer came out, maybe with a web search attached. A typical interaction cost a couple of cents. You could budget on a napkin: requests per day times two cents. Nobody needed a token strategy.
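The 2024 napkin math really does fit in one multiplication. A minimal sketch, assuming the flat "couple of cents" per interaction from the text and a made-up volume of 10,000 requests a day:

```python
# 2024-style napkin budget: one task is one model call at a flat price.
# $0.02 is the "couple of cents" from the text; the request volume
# used below is a made-up illustration.


def napkin_budget(requests_per_day: int, cost_per_call_usd: float = 0.02) -> float:
    """Daily spend when every request is exactly one model call."""
    return requests_per_day * cost_per_call_usd


daily = napkin_budget(10_000)
print(f"${daily:,.2f} per day")  # prints "$200.00 per day"
```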

In 2026, one task is a whole team of agents. A main agent plans the work, then spawns helper agents to handle pieces of it, sometimes creating new ones mid-task. Another agent checks the results and sends failures back to be redone. Every one of these agents re-reads its growing history on every step, while reaching for tools — web, code, files, a browser, memory.
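To see why the same model ends up costing dollars instead of cents, here is a rough token model of that setup. It is a sketch under loud assumptions: each agent re-reads its entire history on every step, the history grows by a fixed number of tokens per step, only input tokens are counted (outputs and tool results are ignored), and every number in the example is invented. The `Agent` class is hypothetical, not any framework's API.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent: a starting prompt plus a history that grows each step."""

    name: str
    steps: int
    base_context: int     # tokens in the prompt before the first step
    tokens_per_step: int  # tokens each step appends to the history
    children: list["Agent"] = field(default_factory=list)  # subagents, evaluators

    def own_tokens(self) -> int:
        """Input tokens this agent reads, re-reading its full history every step."""
        total = 0
        history = self.base_context
        for _ in range(self.steps):
            total += history
            history += self.tokens_per_step
        return total

    def total_tokens(self) -> int:
        """This agent plus everything it spawned; each child has its own context."""
        return self.own_tokens() + sum(c.total_tokens() for c in self.children)


# 2024: one call, one prompt.
single_call = Agent("single call", steps=1, base_context=2_000, tokens_per_step=0)

# 2026: a main agent, three helpers, and an evaluator (all numbers invented).
helpers = [
    Agent(f"helper-{i}", steps=20, base_context=3_000, tokens_per_step=1_500)
    for i in range(3)
]
evaluator = Agent("evaluator", steps=10, base_context=8_000, tokens_per_step=1_000)
main = Agent("main", steps=30, base_context=5_000, tokens_per_step=2_000,
             children=[*helpers, evaluator])

print(f"single call: {single_call.total_tokens():,} tokens")  # 2,000
print(f"agent team:  {main.total_tokens():,} tokens")         # 2,180,000
```

With these made-up numbers the team reads about 1,090 times the tokens of the single call, with the same model in the middle, and roughly four fifths of it is accumulated history being read again.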

The model in the middle is the same model. The system around it is unrecognizable. That is where the money goes now — not into the model, into everything wrapped around it. And that wrapping only grows as agents take on harder work.

Gasoline, distance, and mileage

The way I keep it straight is to think of a road trip. It has exactly three variables.

The model price is the gasoline price. It's roughly flat. It's not your lever, and it's not the source of your rising bill.

The distance is how far an agent can travel on its own before it needs you, and distance is growing fast. METR, a research group that measures the length of tasks AI can finish unsupervised (timed by how long those tasks take a skilled human), found that this reach has been doubling on a steady cadence for years, and lately even faster. That part isn't a metaphor; it's a measured trend. And distance is exactly what you want to grow, because it's where the value is. A longer trip is the whole point of having a car.
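A steady doubling cadence is plain exponential growth. The sketch below assumes a constant doubling period and an invented starting horizon; both are placeholders, not METR's published figures, and a constant period ignores the recent speed-up mentioned above.

```python
import math


def horizon_minutes(months: float, start_minutes: float, doubling_months: float) -> float:
    """Unsupervised task length after `months`, if it doubles every `doubling_months`."""
    return start_minutes * 2 ** (months / doubling_months)


def months_to_reach(target_minutes: float, start_minutes: float, doubling_months: float) -> float:
    """Inverse of the above: months until the horizon reaches `target_minutes`."""
    return doubling_months * math.log2(target_minutes / start_minutes)


# Placeholder inputs: a 30-minute horizon today, doubling every 6 months.
for m in (0, 12, 24, 36):
    print(f"month {m:>2}: {horizon_minutes(m, 30, 6):,.0f} min")

# A 40-hour work week is 2,400 minutes.
print(f"work-week tasks in ~{months_to_reach(2_400, 30, 6):.0f} months")
```

Because the growth is exponential, the doubling period matters more than the starting point: halving it doubles the number of doublings in any window.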

The mileage is the only one of the three you actually control. Think of it as how much of your fuel turns into forward motion:

mileage = the tokens that produced the result ÷ all the tokens you burned
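One way to make that ratio concrete is to log tokens per step and mark which steps fed into the final result. The sketch assumes you can make that call for each step, which is the hard part in practice; `Step` and its `contributed` flag are hypothetical names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Step:
    tokens: int        # tokens burned on this step (input + output)
    contributed: bool  # did this step feed into the final result?


def mileage(steps: list[Step]) -> float:
    """Tokens that produced the result divided by all tokens burned."""
    total = sum(s.tokens for s in steps)
    if total == 0:
        raise ValueError("no tokens burned, mileage is undefined")
    useful = sum(s.tokens for s in steps if s.contributed)
    return useful / total


run = [
    Step(20_000, contributed=True),    # plan that was followed
    Step(50_000, contributed=False),   # re-read history, no new work
    Step(30_000, contributed=True),    # code that shipped
    Step(100_000, contributed=False),  # dead-end attempt
]
print(f"mileage: {mileage(run):.0%}")  # prints "mileage: 25%"
```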

It's a number you want to push up. The denominator is everything you burn, and most of the waste is the agent re-reading its whole history on every step, hauling all that weight along for the ride. Tokenmaxxing was flooring the gas to see how far the car could go. Tokenoptimizing is tuning the engine so more of every tank turns into distance.

So the trip cost is simply fuel price × distance ÷ mileage. Read it as a direction, not a calculation: the further the agent travels, the more you pay; the better your mileage, the less you pay; and the fuel price barely moves either way. And because the car gets heavier the longer it drives, re-reading its own history at every step, a long task costs far more than a short one, not just a little more. If every step re-reads all the steps before it, the reading grows roughly with the square of the task's length, so twice the steps means roughly four times the reading.
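Both halves of that paragraph fit in a few lines. In the sketch below, distance is measured in useful tokens, so dividing by mileage recovers the total tokens burned; the per-token price is an invented placeholder. The second function assumes each step adds a fixed number of tokens and re-reads everything before it, which turns the reading cost into an arithmetic series.

```python
def trip_cost(price_per_token: float, distance_tokens: float, mileage: float) -> float:
    """fuel price x distance / mileage, with distance measured in useful tokens."""
    if not 0 < mileage <= 1:
        raise ValueError("mileage must be in (0, 1]")
    return price_per_token * distance_tokens / mileage


def reread_tokens(steps: int, tokens_per_step: int) -> int:
    """Tokens read if step i re-reads all i chunks of history so far (i = 1..steps)."""
    # 1 + 2 + ... + steps = steps * (steps + 1) / 2, always an integer.
    return tokens_per_step * steps * (steps + 1) // 2


price = 3 / 1_000_000  # placeholder: $3 per million tokens
print(f"25% mileage: ${trip_cost(price, 50_000, 0.25):.2f}")
print(f"50% mileage: ${trip_cost(price, 50_000, 0.50):.2f}")

short_task = reread_tokens(10, 2_000)
long_task = reread_tokens(20, 2_000)
print(f"twice the steps, {long_task / short_task:.1f}x the re-reading")
```

Doubling mileage halves the cost at the same distance, while doubling the step count multiplies the re-reading by about 3.8 here, a ratio that approaches 4 as tasks get longer.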

Not all tokens mean the same thing

Three kinds of spend: discovery spend pays off in learning (keep it uncapped), outcome spend pays off in the result (measure per outcome), waste spend pays off in nothing (drive toward zero)

This is the idea that changes how you manage the whole thing. The word "tokens" on your invoice hides three completely different kinds of spend. Discovery spend is finding out what works — a new idea, a new way of using agents, something nobody's tried; the payoff isn't a deliverable, it's learning, and one run might reveal a trick that saves a thousand hours. Outcome spend is producing a known result — the finished report, the fixed bug; the payoff is the thing itself. Waste spend buys nothing — dead weight dragged along, an oversized model doing a tiny job, parallel attempts that all hit the same dead end.
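In practice this means tagging spend before adding it up. A minimal sketch: the `Spend` enum and the ledger entries are hypothetical, and deciding which tag a run deserves is a judgment call the code cannot make for you.

```python
from collections import defaultdict
from enum import Enum


class Spend(Enum):
    DISCOVERY = "discovery"  # pays off in learning
    OUTCOME = "outcome"      # pays off in the result
    WASTE = "waste"          # pays off in nothing


def totals_by_kind(entries: list[tuple[Spend, float]]) -> dict[Spend, float]:
    """Sum tagged spend per kind instead of into one "tokens" line."""
    totals: dict[Spend, float] = defaultdict(float)
    for kind, usd in entries:
        totals[kind] += usd
    return dict(totals)


ledger = [
    (Spend.DISCOVERY, 120.0),  # trying a new way of splitting work across agents
    (Spend.OUTCOME, 300.0),    # finished reports
    (Spend.WASTE, 80.0),       # oversized model doing a one-line job
    (Spend.OUTCOME, 150.0),    # fixed bugs
    (Spend.WASTE, 50.0),       # parallel attempts that hit the same dead end
]

totals = totals_by_kind(ledger)
for kind in Spend:
    print(f"{kind.value:>9}: ${totals.get(kind, 0.0):,.2f}")
```

The totals come out to $120 of discovery, $450 of outcomes, and $130 of waste: the same $700 an invoice would show as a single line.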

The mistake nearly everyone makes is to see one line called "tokens," panic at the total, and reach for a cap. But a cap cuts all three blindly. Cap discovery and you blind your future; cap outcomes and you throttle your present. Only waste should be cut, and a blanket limit is exactly the tool that can't tell the difference. This is why saying no to your best engineer can't be treated as a token question: you don't know whether they're spending on discovery, which you'd be insane to cap, or leaking waste, which you should fix without rationing the person. The invoice doesn't tell you. You have to know the meaning of the spend.
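The difference between a blanket cap and a waste cut shows up clearly when both bring the bill to the same number. The sketch assumes a cap bites proportionally across all spend (in reality it usually cuts whatever happens to run last, which is no better), and the dollar figures are invented.

```python
def blanket_cap(totals: dict[str, float], cap: float) -> dict[str, float]:
    """Scale every kind of spend down by the same factor until the total fits."""
    spent = sum(totals.values())
    if spent <= cap:
        return dict(totals)
    factor = cap / spent
    return {kind: usd * factor for kind, usd in totals.items()}


def cut_waste(totals: dict[str, float]) -> dict[str, float]:
    """Drive waste to zero and leave discovery and outcomes untouched."""
    return {kind: 0.0 if kind == "waste" else usd for kind, usd in totals.items()}


spend = {"discovery": 120.0, "outcome": 450.0, "waste": 130.0}
trimmed = cut_waste(spend)
capped = blanket_cap(spend, cap=sum(trimmed.values()))  # same $570 bill

for kind in spend:
    print(f"{kind:>9}: capped ${capped[kind]:7.2f}   waste cut ${trimmed[kind]:7.2f}")
```

Both bills come to $570, but the cap removes about $22 of discovery and $84 of outcomes while leaving about $106 of waste in place.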

So the most important thing a leader does is decide which mode a piece of work is in. Experimentation is buying discovery: keep it uncapped, and judge it on whether it found something worth keeping, not on cost per token. This is where tokenmaxxing was always right: you throw things at the wall because you don't yet know what sticks. Production is buying outcomes: here you measure hard, cost per finished task, never cost per token. The sharpest version teams have landed on is hours saved: estimate how long a task would have taken a person without AI, compare it with the time it took with AI, and count only the runs that actually produced something. Anthropic, METR, and Cognition have each published methods doing exactly this: turning tokens into hours, and hours into dollars. This is where tokenoptimizing lives, and it kicks in only after you already know the agents work. Explore, then exploit.

The job is a constant sorting: which work is still an experiment, which has earned promotion into measured production, and which should simply be killed. Confuse experimentation with production and budgeting feels impossible, because you're applying one rule to two activities with opposite definitions of success.
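The hours-saved metric is easy to compute once each production run records an estimate and a result flag. This is a minimal sketch of the idea as described here, not Anthropic's, METR's, or Cognition's actual method; the `Run` fields, the hourly rate, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    human_hours: float     # estimated time for a person without AI
    ai_hours: float        # time it actually took with AI, review included
    produced_result: bool  # did the run ship something usable?
    token_cost_usd: float


def hours_saved(runs: list[Run]) -> float:
    """Human estimate minus time with AI, counting only runs that produced something.

    Runs where AI was slower contribute a negative number rather than zero.
    """
    return sum(r.human_hours - r.ai_hours for r in runs if r.produced_result)


def cost_per_finished_task(runs: list[Run]) -> float:
    """All token spend (failed runs still cost money) over the runs that shipped."""
    finished = sum(1 for r in runs if r.produced_result)
    if finished == 0:
        raise ValueError("no finished tasks yet")
    return sum(r.token_cost_usd for r in runs) / finished


runs = [
    Run(human_hours=8.0, ai_hours=1.5, produced_result=True, token_cost_usd=12.0),
    Run(human_hours=4.0, ai_hours=0.5, produced_result=True, token_cost_usd=5.0),
    Run(human_hours=6.0, ai_hours=2.0, produced_result=False, token_cost_usd=9.0),
]
hourly_rate_usd = 100.0  # placeholder cost of an hour of a person's work

saved = hours_saved(runs)
print(f"hours saved: {saved:.1f}  (~${saved * hourly_rate_usd:,.0f})")
print(f"cost per finished task: ${cost_per_finished_task(runs):.2f}")
```

The failed run adds nothing to hours saved but still counts toward cost: its $9 is spread across the two runs that shipped.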

The bottom line

The cost of intelligence is going to zero. The cost of autonomy is not — it's rising, and it will keep rising, because distance has no ceiling and every saving gets spent on more work. So accept it first: your AI bill is going up, and a flat one usually means a company hasn't figured AI out yet. Budget for that. But rising doesn't mean unlimited — money is finite, and control still matters. The trap is thinking control means a smaller bill. It means knowing, for every token, whether it bought discovery, an outcome, or nothing at all — and cutting only the waste.

Fuel price isn't your lever. Distance is the value. Mileage is the engineering — and how you tune that mileage, trimming the bill without slowing the team down, is a subject of its own. That's the next post. The companies that win the next few years won't be the ones that spent the least — they'll likely be the ones that spent the most, and knew exactly what every token bought.