AI agents become platforms in 2026: how to avoid lock-in
AI agents are evolving from demos into platforms with durable state, execution runtimes, retries, and standards. Here’s what’s changing and why it matters.

For the last two years, “agent” mostly meant a chat loop plus a handful of tools. It looked great in a demo, then fell apart the moment you asked it to do real work for more than a few minutes. Context swelled. State went missing. Retries turned into spaghetti. And every team quietly rebuilt the same brittle scaffolding.
That era is ending.
The signal in February 2026 is not “agents got smarter.” The signal is that major vendors are finally shipping the platform primitives that agents always needed: durable state, execution environments, packaging standards, scheduling, retries, and system-managed memory.
Once those primitives exist, agents stop being apps. They become platforms other apps are built on, and they are controlled by whoever owns the runtime.
Durable execution beats clever prompting
If you want to predict where agent infrastructure is going, ignore the hype and look at what teams keep rebuilding anyway.
You keep seeing the same list because the bottleneck was never prompt cleverness.
A place to run: a sandbox, terminal, container, or workflow worker.
A place to remember: state, memory, logs, artifacts.
A way to repeat safely: scheduling, retries, idempotency.
A way to standardize behavior: skills, conventions, policies.
A way to observe and govern: audit trails, approvals, boundaries.
When those are missing, you do not have an agent. You have a prompt that occasionally calls tools.
This month, OpenAI and Cloudflare both shipped pieces that move agents from “scripts” to “systems.”
OpenAI’s move: agent platforms inside the Responses API
OpenAI’s newest agent direction is easiest to understand as three primitives that show up repeatedly across their guidance and docs: Skills, Shell, and Compaction. The practical framing is “long-running agents that do real work,” not toy prompts, as laid out in their write-up on Skills, Shell, and compaction for long-running agents.
Skills: behavior becomes a versioned dependency
OpenAI’s Skills guide defines a skill as a versioned bundle of files anchored by a required SKILL.md manifest. Skills are modular, reusable, and explicitly positioned as compatible with an “open Agent Skills standard.”
That sounds benign, even helpful, because it is helpful. It also marks the beginning of a packaging ecosystem. Once workflows depend on skill bundles, the platform that resolves and executes those bundles controls the supply chain.
That can become productivity gold. It can also become an app store with compliance hooks.
OpenAI’s cookbook example on using skills in the API makes the packaging model concrete, including how skills are mounted into an execution environment.
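If you treat skill bundles like dependencies, you can pin them like dependencies. Here is a minimal sketch, not an OpenAI API: hash a bundle's contents and refuse to mount one that drifted from the pin. The only structural assumption borrowed from the Skills guide is the required SKILL.md manifest; everything else is illustrative.

```python
import hashlib
from pathlib import Path

def bundle_digest(bundle_dir: str) -> str:
    """Hash every file in a skill bundle, in a stable order,
    so the bundle can be pinned like any other dependency."""
    root = Path(bundle_dir)
    if not (root / "SKILL.md").is_file():
        raise ValueError("not a skill bundle: missing SKILL.md manifest")
    h = hashlib.sha256()
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h.update(path.relative_to(root).as_posix().encode())
            h.update(path.read_bytes())
    return h.hexdigest()

def verify_pinned(bundle_dir: str, pinned: str) -> bool:
    """Refuse to mount a bundle whose contents drifted from the pin."""
    return bundle_digest(bundle_dir) == pinned
```

The pinned digest can live in your own lockfile, next to package versions, so a skill update is a reviewed change rather than a silent one.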
Shell: the agent gets a terminal, and the runtime becomes the product
The Shell tool documentation makes the shift explicit: models can run commands in hosted containers managed by OpenAI, or through a local shell runtime that you host.
This is not a convenience feature. It moves “agentic work” from text generation into execution.
Run tests. Edit files. Compile. Format. Fetch dependencies. Generate artifacts. Turn plans into actual changes.
Once a vendor hosts your execution environment, they sit at the choke point for what runs, what is logged, what is blocked, and what is billable. OpenAI’s own materials show how skills are mounted into the shell environment, which tightens the loop between “packaged behavior” and “managed execution.”
Compaction: long-running memory becomes a managed service
Compaction is OpenAI’s answer to a hidden tax in agent systems: context windows grow until cost and latency explode. Their Compaction guide frames compaction as reducing context size while preserving the state needed for later turns.
OpenAI also shipped a dedicated endpoint to compact a response.
Here is the platform implication. Memory management moves server-side. Agents get cheaper and more reliable. Your agent’s continuity and working memory also become entangled with vendor-specific state handling.
That entanglement is useful. It is also sticky.
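One way to get the benefit without the entanglement is to keep a client-side fallback you own. A toy sketch, not OpenAI's compaction endpoint: keep the system message and the most recent turns, and fold everything older into a summary stub. All names here are illustrative, and a real version would summarize the elided turns with a model rather than dropping them.

```python
def compact_history(messages: list[dict], max_chars: int = 8000,
                    keep_recent: int = 6) -> list[dict]:
    """Client-side fallback for context compaction: keep the system
    message and the most recent turns, fold everything older into a
    single summary stub."""
    def size(msgs):
        return sum(len(m["content"]) for m in msgs)
    if size(messages) <= max_chars:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    summary = {"role": "system",
               "content": f"[compacted: {len(old)} earlier turns elided]"}
    return system + [summary] + recent
```

Even if you never run this path, having it means the vendor's compaction is a convenience, not a dependency.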
What these primitives enable in practice
Put Skills, Shell, and Compaction together and you can build a worker that looks less like a chatbot and more like a daemon.
It runs for hours or days.
It executes commands in a controlled environment.
It loads standardized skills, meaning your code style, runbooks, escalation rules, and operational conventions can live as reusable bundles instead of pasted prompts.
It compacts history so it stays inside budget, both technically and financially.
That is a platform. Platforms tend to accrete policy.
OpenAI’s public API changelog shows these primitives rolling into core surfaces, including support for Skills in the Responses API, the Hosted Shell tool, and networking in containers.
Cloudflare’s move: agents as Durable Objects, not request handlers
Cloudflare is pushing a parallel thesis: an agent is a persistent, stateful execution environment.
In Cloudflare’s Agents docs, each agent runs on a Durable Object, described as a stateful micro-server with its own SQL database, WebSockets, and scheduling.
That is a very specific bet: stop building “agents” as stateless request handlers that reconstruct context on every call. Start building them as long-lived software entities that keep their own state and can coordinate work over time.
The underlying Durable Objects model is documented in Cloudflare’s Durable Objects overview.
Cloudflare’s public implementation is not vague about the ambition. The cloudflare/agents repository describes agents as persistent, stateful execution environments with built-in support for state, storage, lifecycle, real-time communication, scheduling, AI model calls, MCP, workflows, and more.
This is how Cloudflare tends to build platforms: put primitives in the runtime, make them globally deployable, then sell operational simplicity.
Recent releases underline that this is “system software,” not demos. Cloudflare’s changelog entry for Agents SDK v0.5.0 highlights built-in retry utilities, including exponential backoff and jitter.
Cloudflare also documents retries directly in their Retries reference.
And their release notes and docs call out per-connection protocol controls, aimed at mixed client types, which is the kind of unglamorous feature you ship when you expect real production traffic.
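The retry pattern itself is small enough to own outright. A generic sketch of exponential backoff with full jitter, not the Agents SDK API:

```python
import random
import time

def retry(fn, attempts: int = 5, base: float = 0.5, cap: float = 30.0):
    """Exponential backoff with full jitter: before each retry, sleep a
    random amount between 0 and min(cap, base * 2**attempt)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The jitter matters: without it, a fleet of agents retrying a shared dependency hammers it in synchronized waves.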
The control layer: why platform agents change the power balance
A platform is not just infrastructure. It is leverage.
When agents are apps, you can swap models, tools, hosting, and memory strategies with moderate pain.
When agents become platforms, switching costs move up the stack:
Workflow format lock-in: skills schemas, tool definitions, memory semantics, and how state is serialized.
Runtime lock-in: hosted execution environments, schedulers, and state stores.
Policy lock-in: what the platform allows, blocks, logs, retains, and audits.
Billing lock-in: per-call pricing can turn into per-workflow pricing, with “reliability features” bundled into premium tiers.
The same “safety” language that used to apply to model outputs now applies to execution. Once the agent can run commands and persist state, governance becomes a justification for tighter chokepoints. If your agent platform controls the runtime, “responsible use” can quietly become permissioned capability.
This does not require conspiracy. It is incentives. Platform owners like recurring revenue, compliance moats, and ecosystem control. Agents are simply the next interface where those incentives can be enforced.
How to get the benefits without surrendering the keys
If you care about autonomy, your goal is not to reject agent platforms. Your goal is to design your system so you can leave.
Here are practical patterns that keep you portable while still getting the upside.
Keep state outside the vendor
Use vendor memory features for convenience, but treat your own database as the source of truth.
Store task state, artifacts, and decisions where you control retention and auditing. If you use server-side state, make sure you can replay and reconstruct critical workflows without it.
For OpenAI specifically, it helps to think in terms of “server-managed state as an accelerator, not a dependency.” Their conversation state guide makes it clear that conversation state can be persisted server-side, which is useful, but you can still design your system so your durable truth lives elsewhere.
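One concrete shape for "your own source of truth" is an append-only ledger of task events in a database you run. A minimal sketch using SQLite; the schema and step names are illustrative:

```python
import json
import sqlite3

def open_ledger(path: str = "agent_ledger.db") -> sqlite3.Connection:
    """Your own append-only record of what the agent did and decided."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS task_events (
        task_id TEXT, step TEXT, payload TEXT,
        at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)""")
    return db

def record(db, task_id: str, step: str, payload: dict) -> None:
    db.execute(
        "INSERT INTO task_events (task_id, step, payload) VALUES (?, ?, ?)",
        (task_id, step, json.dumps(payload)))
    db.commit()

def replay(db, task_id: str) -> list[tuple[str, dict]]:
    """Reconstruct a workflow with no vendor-side state at all."""
    rows = db.execute(
        "SELECT step, payload FROM task_events WHERE task_id = ? ORDER BY rowid",
        (task_id,)).fetchall()
    return [(step, json.loads(payload)) for step, payload in rows]
```

If `replay` can rebuild every workflow you care about, vendor memory becomes a cache, not a hostage.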
Separate planning from execution
Let a hosted model plan, but keep execution local for sensitive tasks.
OpenAI’s Shell tool explicitly supports a local runtime option in addition to hosted containers.
That is an off-ramp. Use it. Treat “hosted shell” as a convenience layer, not the only path to ship.
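The split can be enforced mechanically: whatever the hosted planner proposes, a local executor refuses to run anything that is not explicitly allowlisted. A minimal sketch; the allowlist contents are placeholders for your own policy:

```python
import shlex
import subprocess

ALLOWED = {"pytest", "git", "ls", "echo"}  # placeholder policy

def run_planned_command(command: str) -> subprocess.CompletedProcess:
    """Execute a planner-proposed command locally, but only if the
    program is on an explicit allowlist. Everything else is refused."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"refused: {command!r} is not allowlisted")
    return subprocess.run(argv, capture_output=True, text=True, timeout=300)
```

The point is the asymmetry: the model can propose anything, but only your code decides what actually executes.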
Use open orchestration for the platform layer you want to own
If you need durable execution and retries without surrendering your runtime, you can adopt orchestration primitives you control.
Two commonly cited options in this space are:
LangGraph on GitHub, which frames itself around building long-running, stateful agents with an orchestration layer you can run and evolve.
Temporal on GitHub, a durable execution platform built around resilient workflows with automatic handling of intermittent failures and retries.
These are not “agent frameworks” in the marketing sense. They are control surfaces you can own.
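The core idea behind durable execution is small enough to see whole: checkpoint each completed step in storage you own, and a rerun skips everything already finished. A toy sketch of the pattern, not LangGraph or Temporal:

```python
def run_workflow(steps: dict, checkpoints: dict) -> dict:
    """Run named steps in order, skipping any step whose result is
    already checkpointed. `checkpoints` stands in for a durable store
    (a table, a file); a crash mid-run loses nothing already finished."""
    for name, fn in steps.items():
        if name not in checkpoints:
            checkpoints[name] = fn()
    return checkpoints
```

Real systems add retries, timeouts, and distributed workers on top, but this is the contract you are buying: completed work survives failure.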
Maintain a local inference baseline
Even if you use hosted models for peak tasks, keep a local model path for routine work and sensitive text. That gives you leverage and a fallback.
Projects like llama.cpp exist specifically to make local inference practical across a wide range of hardware.
A local baseline also makes it easier to spot when vendor-side changes shift behavior. You can compare outputs, latency, and cost trajectories against something you control.
Regression-test your agent like production software
Compaction and long context change behavior over time. Skills evolve. Tool policies drift. Runtimes update.
Treat prompts, skills, and tool policies as versioned dependencies.
Build a harness that runs the same task suite weekly and alerts on drift. Log the same metrics you would log for any production system: success rates, time-to-completion, retry counts, cost per task, and the shape of failure modes.
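The comparison logic in such a harness does not need to be fancy. A minimal sketch of a drift check against a stored baseline; the tolerance and the pass/fail metric are illustrative:

```python
def detect_drift(results: dict[str, bool], baseline: dict[str, bool],
                 tolerance: float = 0.05) -> list[str]:
    """Compare this week's task suite against a stored baseline.
    Returns alerts: tasks that newly fail, plus an overall-rate alarm."""
    alerts = [f"regression: {task}" for task, ok in results.items()
              if not ok and baseline.get(task, False)]
    rate = sum(results.values()) / len(results)
    base_rate = sum(baseline.values()) / len(baseline)
    if rate < base_rate - tolerance:
        alerts.append(f"success rate dropped: {base_rate:.0%} -> {rate:.0%}")
    return alerts
```

Run it on a schedule and an upstream change to compaction, skills, or tool policy shows up as a diff you can investigate, not a mystery.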
The takeaway
Agents are graduating from apps to platforms because the real bottleneck was never “model IQ.” It was durable execution: state, memory, retries, runtimes, and standards.
That is good news for builders who want agents that actually finish jobs.
It is also a new control layer where vendors can centralize workflows, policies, and pricing.
Use the platforms if they help you ship. Just design your system so you can walk away with your state, your logs, your artifacts, and your execution path intact.
That is how you get agent leverage without becoming someone else’s tenant.