WebSockets and AI: Why LLMs Are Moving Beyond SSE

by Matthew O'Riordan • Published on March 10, 2026 • Updated March 10, 2026

WebSockets are having a moment in AI. Not because they’re new, but because the AI industry is rediscovering that persistent, bidirectional connections are the right primitive for complex real-time interactions.

The story is worth understanding. It starts with SSE, runs into walls, and lands on WebSockets, with an emerging infrastructure layer called Durable Sessions building on top.

SSE Was the Starting Point

When ChatGPT launched in late 2022, it popularized a simple pattern: stream tokens from server to client as the model generates them. Server-Sent Events was the obvious choice. It’s simple, works over standard HTTP, and does one thing well: push data from server to client.

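// The early pattern: SSE for token streaming
const eventSource = new EventSource('/api/chat?prompt=Hello');

eventSource.onmessage = (event) => {
  const token = JSON.parse(event.data);
  if (token.done) {
    // Assumed end-of-stream marker. Without close(), EventSource would
    // reconnect and re-send the prompt once the server ends the response.
    eventSource.close();
    return;
  }
  appendToResponse(token.text);
};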

For the first wave of AI chatbots, this was fine. Take a prompt, stream back a response. SSE handled it cleanly.

But AI products didn’t stay simple for long.

Where SSE Breaks Down

The limitations show up fast once you move beyond a basic chat interface.

There’s No Client-to-Server Channel

SSE is one-way. Server pushes to client, that’s it. The client has no way to send messages back over the same connection. So every client action needs its own separate HTTP request:

Cancelling a generation? Separate POST request.
Steering an agent mid-task? Another endpoint.
Confirming or rejecting a tool call? Different path entirely.
Sending follow-up context during generation? Yet another request.

You end up coordinating state between the SSE stream and a growing set of HTTP endpoints. It works, but the complexity ramps up fast.
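
To make the coordination cost concrete, here is a minimal sketch of the client-side bookkeeping this pattern needs. The endpoint paths, the generationId field, and the sideChannelRequest helper are all hypothetical and not taken from any particular API. The point is only that each action becomes its own request, and each request has to name the stream it refers to.

```javascript
// Hypothetical endpoints: one per client action, all separate from the SSE stream.
const ACTION_ENDPOINTS = {
  cancel: '/api/chat/cancel',
  steer: '/api/agent/steer',
  approve: '/api/tools/approve',
  context: '/api/chat/context',
};

// Build the fetch() arguments for a client action. Because the SSE stream
// cannot carry it, every request has to name the generation it targets.
function sideChannelRequest(action, generationId, payload = {}) {
  const url = ACTION_ENDPOINTS[action];
  if (!url) throw new Error(`Unknown action: ${action}`);
  return {
    url,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ generationId, ...payload }),
    },
  };
}

// Usage (browser or Node 18+):
//   const { url, init } = sideChannelRequest('cancel', 'gen_123');
//   await fetch(url, init);
```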

Connection Drops Are Brutal

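When an SSE connection drops (and on mobile networks that happens constantly), the in-flight response is interrupted. The browser’s EventSource does reconnect automatically and sends a Last-Event-ID header, but that only helps if the server assigned event IDs and kept the stream around to replay. Many LLM streaming endpoints do neither, and streams read with fetch over POST don’t reconnect at all. So the client reconnects, but the generation that was in progress may have completed, partially completed, or failed, and there’s no way to pick up where you left off unless you build it yourself.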

Client                          Server
  |                               |
  |<---- SSE: token stream -------|
  |<---- "The answer is"... ------|
  |                               |
  |  x Connection drops           |
  |                               |
  |---- Reconnect --------------->|
  |                               |
  |  What happened to the rest    |
  |  of the response?             |
  |                               |
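
Answering that question takes server-side support. One common approach, sketched here under assumptions, is to give every SSE event an id field and keep a short replay buffer. A browser EventSource sends the last id it saw in the Last-Event-ID request header when it reconnects, so the server can replay what was missed. The ReplayBuffer class and its capacity are illustrative and not part of any library.

```javascript
// Format one event in the SSE wire format: an id line, a data line, a blank line.
function formatSseEvent(id, data) {
  return `id: ${id}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Illustrative in-memory buffer of recent events for one generation.
class ReplayBuffer {
  constructor(capacity = 1000) {
    this.capacity = capacity;
    this.events = []; // [{ id, data }], oldest first
    this.nextId = 1;
  }

  // Store an event and return the id the server should send with it.
  append(data) {
    const id = this.nextId++;
    this.events.push({ id, data });
    if (this.events.length > this.capacity) this.events.shift();
    return id;
  }

  // Events the client missed, given the Last-Event-ID header value
  // (undefined or empty on a first connection, so everything is returned).
  since(lastEventId) {
    const last = Number(lastEventId) || 0;
    return this.events.filter((e) => e.id > last);
  }
}
```

On reconnect, the server would write the output of since() through formatSseEvent() before resuming live tokens. A real deployment would also need the buffer to live somewhere every server instance can reach, which is exactly the kind of state SSE leaves to you.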

No Multi-Device or Multi-Tab Awareness

Open a conversation in one browser tab, then open it in another. With SSE, those are completely independent streams. Confirm a tool call in one tab and the other has no idea. Start on your phone, switch to your laptop, and you’re starting from scratch.

Agent Workflows Need Both Directions

Modern AI isn’t “prompt in, text out” anymore. Agent frameworks like LangGraph, CrewAI, and AutoGen create workflows where the agent proposes actions and waits for human approval, where multiple agents coordinate with a human supervising, where background tasks finish after the user has moved on.

These patterns are fundamentally bidirectional. Trying to force them through a unidirectional protocol creates fragile systems with a lot of duct tape.

Why WebSockets Are a Better Fit

WebSockets solve these problems at the protocol level. One persistent, bidirectional connection handles every interaction pattern:

// WebSocket: one connection handles everything
const ws = new WebSocket('wss://api.example.com/ai/session');

// Receive streamed tokens, tool calls, status updates
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  switch (msg.type) {
    case 'token':
      appendToResponse(msg.text);
      break;
    case 'tool_call':
      showApprovalUI(msg.tool, msg.args);
      break;
    case 'agent_status':
      updateAgentProgress(msg.status);
      break;
  }
};

// Send user actions over the same connection
function approveToolCall(callId) {
  ws.send(JSON.stringify({ type: 'approve', callId }));
}

function cancelGeneration() {
  ws.send(JSON.stringify({ type: 'cancel' }));
}

function steerAgent(instruction) {
  ws.send(JSON.stringify({ type: 'steer', instruction }));
}

The connection maintains state for the session’s lifetime. The server knows which client is connected, what conversation is active, what the current state is. No re-authentication on every message, no coordinating across separate request channels.
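
On the server, the other half of this exchange is a dispatcher for the client’s messages. The sketch below assumes the same illustrative message types as the client code above (approve, cancel, steer) and a plain session object whose shape is made up for this example. It is independent of any WebSocket library, so raw is whatever string your server’s message handler receives.

```javascript
// Hypothetical shape of the state kept for one connection.
function createSession() {
  return {
    pendingToolCalls: new Set(),
    approvedToolCalls: new Set(),
    instructions: [],
    cancelled: false,
  };
}

// Apply one client message to the session state.
// Returns a reply object to send back, or null when no reply is needed.
function handleClientMessage(session, raw) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { type: 'error', reason: 'invalid_json' };
  }

  switch (msg?.type) {
    case 'approve':
      // Only approve calls the agent is actually waiting on.
      if (!session.pendingToolCalls.has(msg.callId)) {
        return { type: 'error', reason: 'unknown_call', callId: msg.callId };
      }
      session.pendingToolCalls.delete(msg.callId);
      session.approvedToolCalls.add(msg.callId);
      return { type: 'agent_status', status: 'tool_approved', callId: msg.callId };
    case 'cancel':
      session.cancelled = true;
      return { type: 'agent_status', status: 'cancelled' };
    case 'steer':
      session.instructions.push(msg.instruction);
      return null;
    default:
      return { type: 'error', reason: 'unknown_type' };
  }
}
```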

When the server manages session state over WebSockets, it can broadcast updates to every connected client. Approve a tool call on your phone and your laptop tab updates instantly. Doing the same with SSE means pairing every stream with out-of-band POST endpoints and a separate server-side fan-out, which is hard to keep clean.
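
A minimal sketch of the broadcast step, assuming each connected client is represented by an object with a send(string) method, as a WebSocket is. The SessionHub name and its methods are made up for illustration.

```javascript
// Tracks every connection attached to one conversation and fans messages out.
class SessionHub {
  constructor() {
    this.clients = new Set();
  }

  join(client) {
    this.clients.add(client);
  }

  leave(client) {
    this.clients.delete(client);
  }

  // Send a message to every client, optionally skipping the one that caused it.
  // Returns how many clients received it.
  broadcast(message, except = null) {
    const payload = JSON.stringify(message);
    let delivered = 0;
    for (const client of this.clients) {
      if (client === except) continue;
      try {
        client.send(payload);
        delivered++;
      } catch {
        // A failed send usually means the socket is gone, so drop the client.
        this.clients.delete(client);
      }
    }
    return delivered;
  }
}
```

When the phone approves a tool call, the server would call broadcast({ type: 'tool_approved', callId }, phoneSocket) so the laptop tab learns about it without polling.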

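And at the wire level, WebSocket frames carry only 2 to 14 bytes of header: 2 bytes for a small server-to-client frame, 6 for a small client-to-server frame, which must be masked. SSE’s per-event framing is similarly small, since its HTTP headers are sent once per stream, but every client action in the SSE model is a separate HTTP request with a full set of headers. When an agent workflow involves frequent approvals, cancellations, and steering messages, that difference adds up.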
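
The per-message framing cost can be worked out directly. The sketch below follows the frame layout in RFC 6455 (a 2-byte base header, an extended length field for larger payloads, and a 4-byte masking key on client-to-server frames) and compares it with a single-line SSE event written as a data field followed by a blank line. The function names are illustrative.

```javascript
// Header size of one WebSocket frame (RFC 6455, section 5.2).
function wsFrameHeaderBytes(payloadBytes, masked) {
  let size = 2; // FIN/opcode byte + mask bit and 7-bit length byte
  if (payloadBytes > 65535) size += 8; // 64-bit extended length
  else if (payloadBytes > 125) size += 2; // 16-bit extended length
  if (masked) size += 4; // masking key, required on client-to-server frames
  return size;
}

// Framing bytes around one single-line SSE event: "data: " + payload + "\n\n".
function sseEventOverheadBytes() {
  return 'data: '.length + '\n\n'.length;
}
```

For a 20-byte token message this gives 2 bytes server-to-client and 6 bytes client-to-server for WebSockets, and 8 bytes for an SSE event.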

The Ecosystem Is Signaling This Shift

This isn’t just a theoretical argument. The AI ecosystem is actively moving in this direction.

Frameworks are abstracting away SSE. The Vercel AI SDK moved its chat hooks onto a pluggable ChatTransport interface, with HTTP streaming as the default implementation rather than the only one. TanStack AI introduced a ConnectionAdapter for swapping transport layers. AG-UI was designed from the start with pluggable transport. Framework authors don’t build these abstractions unless they’ve seen SSE hit real limits.
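
The idea behind these abstractions can be sketched in a few lines. The interface below is hypothetical and is not the actual API of the Vercel AI SDK, TanStack AI, or AG-UI. It only assumes that a transport is any object with send(message) and subscribe(handler), so application code does not care whether SSE plus POST or a WebSocket sits underneath.

```javascript
// Application code written against the hypothetical transport shape.
function createChat(transport) {
  const tokens = [];
  const unsubscribe = transport.subscribe((msg) => {
    if (msg.type === 'token') tokens.push(msg.text);
  });
  return {
    sendPrompt: (text) => transport.send({ type: 'prompt', text }),
    cancel: () => transport.send({ type: 'cancel' }),
    text: () => tokens.join(''),
    close: unsubscribe,
  };
}

// One possible implementation: wrap anything WebSocket-like
// (an object with send() and an assignable onmessage).
function webSocketTransport(ws) {
  const handlers = new Set();
  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data);
    for (const handler of handlers) handler(msg);
  };
  return {
    send: (message) => ws.send(JSON.stringify(message)),
    subscribe: (handler) => {
      handlers.add(handler);
      return () => handlers.delete(handler);
    },
  };
}
```

Swapping transports then means writing another small adapter with the same two methods, while createChat stays unchanged.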

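MCP moved off its standalone SSE transport. The Model Context Protocol, which is becoming a standard for AI tool integration, deprecated its HTTP+SSE transport in favor of Streamable HTTP, where a single endpoint accepts client POSTs and can optionally stream a response back as SSE. It isn’t a move to WebSockets, but it is a clear signal that the protocol layer needs to evolve past a long-lived, server-push-only connection.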

The production pattern is consistent. Teams building AI products start with SSE because it’s the simplest option. Then they migrate to WebSockets once they need human-in-the-loop approval, cross-device continuity, connection resilience, or multi-agent coordination. It’s become a predictable migration path.

Durable Sessions: The Layer Above Transport

Here’s the next problem. Even with WebSockets, what happens to session state when connections break?

A WebSocket connection is ephemeral. When it drops, the state tied to that connection can disappear unless your application explicitly manages persistence and recovery. For AI interactions where a single agent task might run for minutes, losing that state isn’t acceptable.
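
On the client, managing recovery explicitly usually means two things: reconnecting with backoff, and telling the server where to resume. A minimal sketch, assuming the server tags each message with an increasing seq number and accepts a hypothetical resume message. Neither is part of the WebSocket protocol itself.

```javascript
// Exponential backoff with a cap: 500ms, 1s, 2s, ... up to maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Tracks the last sequence number seen so a new connection can ask for the rest.
class ResumeTracker {
  constructor(sessionId) {
    this.sessionId = sessionId;
    this.lastSeq = 0;
  }

  // Returns true for a new message, false for a duplicate replayed after reconnect.
  accept(msg) {
    if (msg.seq <= this.lastSeq) return false;
    this.lastSeq = msg.seq;
    return true;
  }

  // First message to send on every (re)connection.
  resumeMessage() {
    return { type: 'resume', sessionId: this.sessionId, afterSeq: this.lastSeq };
  }
}
```

In the browser this would be wired into ws.onclose (schedule a new WebSocket after backoffDelayMs(attempt)) and ws.onopen (send resumeMessage()). Production code would normally add random jitter to the delay, and the server still has to keep the missed messages somewhere.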

This is driving an emerging infrastructure category called Durable Sessions: a persistent, stateful layer that sits between AI agents and users, outliving any single connection. If Durable Execution (Temporal, Inngest) made backends crash-proof, Durable Sessions make the user experience crash-proof.

+-----------------------------------------------------+
|                  AI Agent / LLM                      |
|         (OpenAI, Anthropic, LangGraph...)            |
+---------------------------+-------------------------+
                            |
                            v
+-----------------------------------------------------+
|              Durable Session Layer                   |
|                                                      |
|   Session          Connection        State           |
|   Persistence      Resilience        Sync            |
|                                                      |
|   Survives disconnects, works across devices,        |
|   resumes interrupted streams, syncs state           |
+---------------------------+-------------------------+
                            |
                            v
+-----------------------------------------------------+
|                 Transport Layer                      |
|            (WebSockets / HTTP / SSE)                 |
|                                                      |
|   Bidirectional, low-latency, persistent connection  |
+---------------------------+-------------------------+
                            |
                            v
+-----------------------------------------------------+
|                   User Devices                       |
|                                                      |
|    Phone          Laptop          Desktop            |
|    (Tab 1)        (Tab 1)         (Tab 2)            |
|                                                      |
|    All connected to the same durable session         |
+-----------------------------------------------------+

A durable session provides:

Connection resilience - the session persists server-side; clients reconnect and resume exactly where they left off
Cross-device continuity - start on your phone, pick it up on your laptop, full state preserved
Async completion - if an agent finishes work after the user disconnects, the result is there when they come back
Multi-client sync - multiple tabs or devices can observe and interact with the same session simultaneously
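
To make these properties concrete, here is a toy in-memory model of a durable session: an append-only event log with sequence numbers that exists independently of any connection. It is a sketch under obvious assumptions (one process, no real persistence, hypothetical names such as DurableSession and attach) and is not how any particular provider implements it.

```javascript
class DurableSession {
  constructor() {
    this.log = []; // every event ever published, in order
    this.clients = new Set(); // currently connected devices or tabs
  }

  // Called by the agent side. Works even when no client is connected,
  // which is what makes async completion possible.
  publish(event) {
    const entry = { ...event, seq: this.log.length + 1 };
    this.log.push(entry);
    for (const client of this.clients) client.send(JSON.stringify(entry));
    return entry.seq;
  }

  // Called when a device or tab connects. Replays everything after
  // afterSeq, then keeps the client subscribed to live events.
  attach(client, afterSeq = 0) {
    for (const entry of this.log) {
      if (entry.seq > afterSeq) client.send(JSON.stringify(entry));
    }
    this.clients.add(client);
  }

  // Called when a connection closes. The log is untouched.
  detach(client) {
    this.clients.delete(client);
  }
}
```

A phone that drops after event 2 calls attach(socket, 2) on reconnect and receives only what it missed, while a laptop joining for the first time calls attach(socket) and gets the full history.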

Durable Session Providers

The category is still emerging, but two providers are out in front with different approaches:

Ably AI Transport is a fully featured durable session layer purpose-built for AI. Built on WebSockets, it provides resumable token streaming, multi-device session continuity, human-in-the-loop workflows, and agent coordination. It’s the complete infrastructure layer between AI agents and users. Because WebSockets are the primary transport, you get bidirectional communication, low latency, and real-time state broadcasting out of the box.

ElectricSQL Durable Streams focuses on the persistence and data sync side of durable sessions. Electric provides durable streams and database-to-client sync using HTTP-based streaming (not WebSockets), which works well with CDN infrastructure. It’s strong on the data persistence layer, but narrower in scope. It doesn’t provide the bidirectional communication or real-time coordination that more interactive AI workflows need.

For the full picture on the durable sessions category and the growing set of vendors supporting it, see durablesessions.ai.

Where This Is Heading

The pattern playing out across the industry is pretty clear. Developers start with SSE because it’s simple. As their AI product grows, they hit the limits: no bidirectionality, no connection resilience, no multi-device support. They migrate to WebSockets.

Now there’s a third step emerging. Rather than building session persistence and state management on top of raw WebSockets yourself, durable session layers provide those capabilities out of the box.

If you’re starting a new AI project today, it’s worth considering a durable session layer from the beginning rather than following this migration path the hard way. Connection resilience, multi-device continuity, and resumable streams aren’t features you bolt on later. They’re fundamental to a good AI user experience. And the most capable providers in this space are building on WebSockets, because bidirectional communication is the right foundation for how humans and AI agents actually interact.

A protocol standardized in 2011 is turning out to be exactly the right fit for one of the most demanding categories of software being built today. That’s a good sign for WebSockets.

Frequently Asked Questions

Why are AI applications switching from SSE to WebSockets?

SSE only supports server-to-client streaming. AI agents need bidirectional communication for tool calls, user interrupts, context updates, and multi-turn conversations. WebSockets provide full-duplex connections that support these patterns.

What are durable sessions for AI?

Durable sessions add a persistent session layer on top of the transport, most often WebSockets. If a connection drops during an AI response, the session resumes without losing context or repeating work, and other tabs or devices can attach to the same session. This is critical for long-running AI agent tasks.

Should I use SSE or WebSockets for LLM token streaming?

Use SSE for simple one-way token streaming where the user sends a prompt and waits for a response. Use WebSockets when you need tool use, interrupts, multi-turn agents, or collaborative AI features that require bidirectional messaging.
