1M tokens: long context becomes operational and builders need a playbook
Bigger context helps, but it also amplifies mistakes and lock-in. Learn how to budget, version, and control long context without losing optionality.
Long context used to feel like a neat demo. You could paste a few PDFs, ask for a summary, get something plausible, and move on. AI is rapidly moving beyond that.
In February 2026, long context started behaving less like a feature and more like a substrate that other workflows can sit on. Anthropic is pushing the size of the window into everyday use. OpenAI is pushing durability so work can continue even when the window fills.
Different bets. Same direction.
Long context is becoming a default, not a special case
Anthropic’s move is straightforward: make the context window big enough that it changes day-to-day practice.
With Claude Sonnet 4.6 (Feb 17, 2026), Anthropic put a 1M-token context window into beta and positioned it as part of the standard experience for many users, not a niche add-on. The signal is hard to miss. This is not being saved for a tiny set of “power users.”
Earlier in the month, Anthropic set expectations with Claude Opus 4.6 (Feb 5, 2026), including the same 1M-token context (beta) framing while highlighting that very large prompts can carry premium pricing once you push beyond a certain threshold. That detail is more than billing trivia. It tells builders where friction shows up first.
This is what normalization looks like: the capability ships broadly, and the economics nudge you to be intentional once you lean into the extreme end.
OpenAI’s bet: keep the work going when context grows
OpenAI’s February shift is less about expanding the headline context number and more about keeping long-running work from falling apart.
In the OpenAI platform, the workflow problem shows up as “threads that keep breaking.” You hit a limit, you truncate, you lose nuance, you restart, you drift.
OpenAI is addressing that by making sessions more durable through platform-level context management. The OpenAI API changelog captured these shifts as they rolled out, and OpenAI’s guidance around compaction is explicit about the goal: keep agents running across long timelines without repeatedly collapsing under their own history.
The centerpiece is server-side compaction. Compaction is not “delete the past and hope for the best.” It is the system compressing prior interaction into a smaller working state so the agent can continue the job.
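To make the idea concrete, here is a minimal sketch of what compaction does conceptually: older turns collapse into a compact summary state while recent turns stay verbatim. This is an illustration, not OpenAI's actual implementation; the stub summarizer and the ~4-characters-per-token estimate are assumptions, and a real system would use a model call to write the summary.

```python
# Illustrative sketch of compaction: compress older turns into one compact
# summary entry while keeping the most recent turns verbatim.

def rough_tokens(text: str) -> int:
    # Crude token estimate: roughly 4 characters per token.
    return max(1, len(text) // 4)

def compact(history: list[dict], budget: int, keep_recent: int = 4) -> list[dict]:
    """Compress everything but the last `keep_recent` turns into one summary."""
    if sum(rough_tokens(m["content"]) for m in history) <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # Stand-in summarizer: a real system would use an LLM call here.
    summary = " | ".join(m["content"][:40] for m in old)
    return [{"role": "system", "content": f"[compacted] {summary}"}] + recent

history = [{"role": "user", "content": f"step {i}: " + "x" * 200} for i in range(10)]
compacted = compact(history, budget=100)
print(len(compacted))  # 5: one summary entry plus the four most recent turns
```

The key property to notice: the summary is lossy, which is exactly why compaction can freeze an early mistake into the session's continuing memory, a point the article returns to below.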
OpenAI also published practical patterns for long-running work, including Skills and a hosted shell tool for agents. The message is operational: if you want agents to do real work over time, the platform needs mechanisms that make that work survivable.
Put together, this is why long context is starting to feel like infrastructure. You can buy a bigger window, or you can design for endurance. In practice, many teams will do both.
What changed in the last two weeks, in plain terms
Two things clicked at once:
Long context got normalized through mainstream model defaults and broader access to huge windows.
Durable workflows got productized through compaction and agent-focused tooling patterns.
That combination changes what “building with AI” means on an ordinary Tuesday. Less ceremony. Fewer resets. More continuity.
Why builders and operators should care
When context is large enough to use without constant curation, entire categories of work move from “AI-assisted” to “AI-native.”
Codebase work gets simpler. With very large context windows, you can include meaningful chunks of a repo along with failing tests, coding standards, and architectural notes. That reduces the number of retrieval hops and the amount of orchestration you need just to get started. Retrieval still matters, but the day-to-day “RAG tax” shrinks for many tasks.
Operational troubleshooting becomes less brittle. Incident response often depends on narrative: timelines, partial logs, configs, and the messy sequence of what happened. Big context helps you include the story without spending half your time curating it. Durability mechanisms like compaction help keep one agent thread alive across the full incident instead of restarting and losing momentum.
Policy and compliance work fits the shape of long context. Comparing documents, spotting conflicts, proposing edits, and tracking changes is work that is slow, expensive, and easy to under-resource. Long context is naturally suited to this kind of work, which is exactly why governance and control questions get sharper as adoption rises.
The practical improvement is not magic. It is fewer moving parts. Fewer retrieval failures. Fewer broken multi-step threads.
The hidden footguns: cost, truth, and security
Long context can make you faster. It can also make you wrong at higher velocity.
Cost can creep. Vendors are already signaling that extremely large prompts may carry premium tiers. Even if the per-token math looks modest, the “just throw everything in” habit can turn into a quiet monthly budget leak. If you track cloud spend, you need to track context spend too.
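A back-of-envelope estimator makes the "quiet budget leak" visible. The prices, premium threshold, and multiplier below are placeholder assumptions for illustration, not any vendor's real rates.

```python
# Back-of-envelope context spend estimator with an assumed premium tier
# for very large prompts. All numbers here are illustrative placeholders.

def monthly_context_cost(tokens_per_request: int,
                         requests_per_day: int,
                         usd_per_million_tokens: float,
                         premium_threshold: int = 200_000,
                         premium_multiplier: float = 2.0) -> float:
    rate = usd_per_million_tokens
    if tokens_per_request > premium_threshold:
        rate *= premium_multiplier  # assume large prompts bill at a premium tier
    daily = tokens_per_request * requests_per_day * rate / 1_000_000
    return round(daily * 30, 2)

# "Just throw everything in": 800k-token prompts, 50 requests a day.
print(monthly_context_cost(800_000, 50, 3.0))  # 7200.0
# Curated bundles at the same volume: 60k tokens per request.
print(monthly_context_cost(60_000, 50, 3.0))   # 270.0
```

Even with made-up rates, the shape of the result holds: habits, not per-token prices, drive the bill.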
Bigger context increases exposure to context poisoning. The more material you feed an agent, the more opportunities there are for malicious instructions to sneak in through docs, tickets, READMEs, or pasted emails. This is most dangerous when the workflow is agentic, meaning the model can take actions such as running commands, not just generating text.
Compaction can preserve mistakes. A compacted state is a lossy compression. If the agent infers something incorrectly early, compaction can freeze that incorrect assumption into the continuing memory of the session. The compaction guide makes this clear by framing compaction as context management, not as a truth guarantee.
Long context amplifies. It amplifies productivity. It amplifies errors. It amplifies whatever you do not control.
The power and control angle people avoid
When long context works well, it encourages a tempting architecture: centralize everything into one vendor’s prompt.
That has consequences.
Data gravity becomes lock-in. If your workflow is “upload the whole repo” or “paste the full contract set,” switching providers becomes more than changing an endpoint. You end up rebuilding trust, redaction practices, governance, and internal comfort around where data lives.
Policy enforcement becomes structurally easier. The more work context a provider sees, the easier it becomes to apply centralized rules, logging, and future compliance hooks. This can be benign, but it is still leverage.
Compute centralizes capability. Serving massive context well is expensive. That naturally favors large platforms and makes it harder for smaller competitors to match the full experience even if model quality converges.
None of that means “avoid hosted long context.” It means treat it like any dependency that can become a choke point.
How to use long context without losing optionality
Here’s a practical playbook that optimizes for autonomy without requiring perfection.
1) Version your context.
Treat each context bundle like a build artifact. Store exactly which docs, commits, tickets, and logs went in. If output quality shifts, you can diff inputs and isolate why.
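A content-addressed manifest is one lightweight way to do this. The field names and 12-character hash prefixes below are illustrative choices, not a standard.

```python
# Minimal sketch: treat a context bundle like a build artifact by recording
# a content-addressed manifest of exactly which inputs went in.
import hashlib
import json

def bundle_manifest(sources: dict[str, str]) -> dict:
    """Map each input (doc, commit, ticket, log) to a hash of its content."""
    entries = {name: hashlib.sha256(text.encode()).hexdigest()[:12]
               for name, text in sources.items()}
    bundle_id = hashlib.sha256(
        json.dumps(entries, sort_keys=True).encode()).hexdigest()[:12]
    return {"bundle_id": bundle_id, "inputs": entries}

v1 = bundle_manifest({"README.md": "install steps", "ticket-42": "login bug"})
v2 = bundle_manifest({"README.md": "install steps", "ticket-42": "login bug, fixed"})

# Diffing manifests isolates which input changed when output quality shifts.
changed = [k for k in v1["inputs"] if v1["inputs"][k] != v2["inputs"][k]]
print(changed)  # ['ticket-42']
```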
2) Build a context budget and enforce it.
Set thresholds like “no more than X tokens of proprietary code per request” or “no customer data outside a private deployment.” It’s boring. It prevents accidental sprawl.
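Enforcement can be a simple gate in front of every request. The limits and the crude ~4-characters-per-token estimate below are illustrative; substitute your own policy numbers and a real tokenizer.

```python
# Sketch of a context budget gate. Limits are illustrative policy values.
BUDGET = {"max_total_tokens": 150_000, "max_proprietary_tokens": 30_000}

def rough_tokens(text: str) -> int:
    # Crude token estimate: roughly 4 characters per token.
    return max(1, len(text) // 4)

def check_budget(parts: list[dict]) -> list[str]:
    """Each part: {'text': ..., 'proprietary': bool}. Returns violations."""
    total = sum(rough_tokens(p["text"]) for p in parts)
    proprietary = sum(rough_tokens(p["text"]) for p in parts if p["proprietary"])
    violations = []
    if total > BUDGET["max_total_tokens"]:
        violations.append(f"total {total} > {BUDGET['max_total_tokens']}")
    if proprietary > BUDGET["max_proprietary_tokens"]:
        violations.append(f"proprietary {proprietary} > {BUDGET['max_proprietary_tokens']}")
    return violations

parts = [{"text": "x" * 200_000, "proprietary": True},
         {"text": "public docs", "proprietary": False}]
print(check_budget(parts))  # flags the proprietary-token overage
```

Boring, as promised. Run it before every send and log every violation.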
3) Keep a local preprocessing layer.
Do summarization, redaction, and deduplication locally, then send only what’s needed. This pairs well with durability techniques because you reduce the chance the platform is compressing junk into long-lived state.
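A minimal sketch of that local pass, covering redaction and deduplication. The regex patterns are illustrative stand-ins, not a complete redaction policy, and a production layer would add summarization and far broader secret detection.

```python
# Sketch of a local preprocessing pass: redact obvious secrets and drop
# duplicate chunks before anything leaves your environment.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
API_KEY = re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{8,}\b")  # illustrative pattern

def preprocess(chunks: list[str]) -> list[str]:
    seen, out = set(), []
    for chunk in chunks:
        clean = API_KEY.sub("[REDACTED_KEY]", EMAIL.sub("[REDACTED_EMAIL]", chunk))
        if clean not in seen:  # deduplicate after redaction
            seen.add(clean)
            out.append(clean)
    return out

chunks = ["contact ops@example.com",
          "token sk-abc123def456",
          "contact ops@example.com"]
print(preprocess(chunks))
# ['contact [REDACTED_EMAIL]', 'token [REDACTED_KEY]']
```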
4) Use two-pass truth checks for big-context answers.
First pass: do the work.
Second pass: require the model to point to specific excerpts from the supplied material, with identifiers, to justify each key claim. If it cannot, you do not ship it.
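The second pass can be partly mechanical: check that every cited excerpt actually appears in the supplied material under the claimed identifier. The claim/citation shape below is an assumption for illustration.

```python
# Sketch of a mechanical citation check: every excerpt the model cites must
# appear verbatim in the supplied source it names.

def verify_citations(claims: list[dict], sources: dict[str, str]) -> list[str]:
    """Each claim: {'claim': ..., 'doc': ..., 'excerpt': ...}. Returns failures."""
    failures = []
    for c in claims:
        doc = sources.get(c["doc"], "")
        if c["excerpt"] not in doc:
            failures.append(c["claim"])
    return failures

sources = {"incident-log": "03:12 deploy started. 03:19 error rate spiked."}
claims = [
    {"claim": "errors spiked after deploy", "doc": "incident-log",
     "excerpt": "03:19 error rate spiked"},
    {"claim": "rollback fixed it", "doc": "incident-log",
     "excerpt": "03:40 rollback completed"},
]
print(verify_citations(claims, sources))  # ['rollback fixed it'] — do not ship
```

An exact-substring check is deliberately strict; it catches hallucinated excerpts at the cost of rejecting paraphrases, which is the right trade for key claims.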
5) Keep an exit ramp.
Even if you standardize on one hosted provider, maintain a fallback path for core tasks, especially anything sensitive. You are not chasing purity. You are buying leverage.
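In code, an exit ramp can be as small as a thin routing layer, so that switching is a config change rather than a rewrite. The provider functions and routing rule below are illustrative; the hosted call is stubbed to simulate an outage.

```python
# Sketch of an exit ramp: route requests through a thin provider interface
# with a fallback path, and pin sensitive tasks to the fallback.

def hosted_provider(prompt: str) -> str:
    raise ConnectionError("hosted provider unavailable")  # simulated outage

def local_fallback(prompt: str) -> str:
    return f"[local] handled: {prompt}"

def complete(prompt: str, sensitive: bool = False) -> str:
    # Sensitive tasks never leave the fallback path.
    providers = [local_fallback] if sensitive else [hosted_provider, local_fallback]
    for provider in providers:
        try:
            return provider(prompt)
        except ConnectionError:
            continue
    raise RuntimeError("no provider available")

print(complete("summarize the contract set"))       # falls back to local
print(complete("review the M&A docs", sensitive=True))  # never tries hosted
```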
Bottom line
Long context is crossing the line from feature to workflow substrate.
Anthropic is betting that 1M-token context belongs in everyday use, as shown in Claude Sonnet 4.6. OpenAI is betting that durable agents need platform-level memory management, with compaction and long-running agent patterns documented across the API changelog and agent workflow guidance like Skills and the hosted shell tool.
The teams that win will treat long context like infrastructure: budgeted, tested, versioned, and scoped. The teams that lose will treat it like magic and pour their working memory into a dependency they do not control.